Publications and Conference Presentations

Automated classification of X-rays as normal/abnormal using a high-sensitivity deep learning algorithm

1. Centre for Advanced Research in Imaging, Neuroscience and Genomics;  2. Qure.ai, Mumbai

Presented: 03 March 2019, European Congress of Radiology (ECR)

Purpose

The majority of chest X-rays (CXRs) performed globally are normal, yet radiologists spend significant time ruling these scans out. We present a Deep Learning (DL) model trained specifically to classify CXRs as normal or abnormal, potentially reducing the time and cost associated with reporting normal studies.

Methods

A DL algorithm was developed and trained on 1,150,084 CXRs and their corresponding reports. A retrospectively acquired independent test set of 430 CXRs (285 abnormal, 145 normal) was analyzed by the algorithm, which classified each CXR as normal or abnormal. Ground truth for the independent test set was established by a sub-specialist chest radiologist with 8 years’ experience, who reviewed every CXR with reference to the existing report. Algorithm output was compared against the ground truth, and summary statistics were calculated.

Results

The algorithm correctly classified 376 (87.44%) CXRs with sensitivity of 97.19% (95% CI - 94.54% to 98.78%) and specificity of 68.28% (95% CI - 60.04% to 75.75%). There were 46 (10.70%) false positives and 8 (1.86%) false negatives (FNs). Out of 8 FNs, 3 were designated as clinically insignificant (mild, inactive fibrosis) and 5 as significant (rib fractures, pneumothorax).

Conclusion

High-sensitivity DL algorithms can potentially be deployed for a primary read of CXRs, enabling radiologists to spend appropriate time on abnormal cases and thereby saving time and cost in reporting CXRs, especially in non-emergency situations. More in-depth prospective trials are required to ascertain the overall impact of such algorithms.

Watch recorded presentation at ECR (sign-up required)

Automated Detection and Localization of Pneumocephalus in Head CT scan

1. Qure.ai, Mumbai;  2. CT & MRI Center, Nagpur, India

Presented: 01 March 2019, European Congress of Radiology (ECR)

Purpose

Pneumocephalus, the accumulation of air within the intracranial space, can lead to midline shift and compression of the brain. In this work, we detail the development of deep learning algorithms for automated detection and localization of pneumocephalus in head CT scans.

Methods

First, to localize the intracranial space in a given head CT scan, a skull-stripping algorithm was developed using a randomly sampled anonymized dataset of 78 head CT scans (1608 slices). We sampled another anonymized dataset containing 83 head CT scans (3546 slices) with pneumocephalus and 310 normal head CT scans, the latter randomly sampled to represent the natural distribution. The 3546 slices (932 of which showed pneumocephalus) were annotated for pneumocephalus regions. A U-Net-based deep neural network was then trained on these scans to predict the pneumocephalus region. The predicted region was refined by removing regions outside the intracranial space identified by the skull-stripping algorithm, and the refined region was used to extract features. Using these features, a random forest was trained to classify the presence of pneumocephalus in a scan. Areas under the receiver operating characteristic curve (AUC) were used to evaluate the algorithms.
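
The refinement and scan-level classification steps lend themselves to a compact illustration. Below is a minimal sketch, assuming the U-Net outputs a per-voxel probability map and the skull-stripping step yields a binary intracranial mask; the feature choices, array shapes, and synthetic data are illustrative assumptions, not the published implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def refine_and_featurize(prob_map, brain_mask, voxel_volume_ml=0.004, threshold=0.5):
    """Mask out predictions outside the intracranial space, then summarize
    the refined region as scan-level features (volume, max/mean probability)."""
    refined = prob_map * brain_mask                  # drop predictions outside the brain
    binary = refined > threshold
    volume_ml = float(binary.sum()) * voxel_volume_ml
    mean_prob = float(refined[binary].mean()) if binary.any() else 0.0
    return [volume_ml, float(refined.max()), mean_prob]

# Synthetic stand-ins for U-Net outputs and intracranial masks (32 scans).
rng = np.random.default_rng(0)
probs = rng.random((32, 16, 64, 64))                 # per-voxel probabilities
masks = rng.random((32, 16, 64, 64)) > 0.3           # binary intracranial masks
labels = rng.integers(0, 2, size=32)                 # scan-level ground truth

X = np.array([refine_and_featurize(p, m) for p, m in zip(probs, masks)])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict_proba(X[:3])[:, 1])                # scan-level pneumocephalus scores
```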

Results

An independent dataset of 1891 head CT scans (40 with pneumocephalus) was used to test the above algorithms. The AUC for scan-level predictions was 0.89, with a sensitivity of 0.80 and a specificity of 0.83.

Conclusion

In this work, we showed the efficacy of deep learning algorithms in accurately localizing and classifying pneumocephalus in head CT scans.

Watch recorded presentation at ECR (sign-up required)

Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

1. Qure.ai, Mumbai;  2. CT & MRI Center, Nagpur, India;  3. Department of Radiology, Mayo Clinic, Rochester, MN;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi

Presented: 28 February 2019, European Congress of Radiology (ECR)

Purpose

To validate a set of deep learning algorithms for automated detection of key findings from non-contrast head CT scans: intracranial hemorrhage and its subtypes, calvarial fractures, midline shift, and mass effect.

Methods

We retrospectively collected a dataset containing 313,318 head CT scans, of which a random subset (the Qure25k dataset) was used to validate the algorithms and the rest to develop them. An additional dataset (the CQ500 dataset) was collected from different centres to further validate the algorithms. Patients with postoperative defects or aged under 7 years were excluded from all datasets. Three independent radiologists read each scan in the CQ500 dataset. The original clinical radiology report and the consensus of the readers were considered as gold standards for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were used to evaluate the algorithms.

Results

After exclusion, the Qure25k dataset contained 21,095 scans (mean age 43 years; 43% female) while the CQ500 dataset consisted of 491 scans (mean age 48 years; 36% female). On the Qure25k dataset, the algorithms achieved an AUC of 0.92 for detecting intracranial hemorrhage (0.90 intraparenchymal, 0.96 intraventricular, 0.92 subdural, 0.93 extradural, and 0.90 subarachnoid). On the CQ500 dataset, the AUC was 0.94 for intracranial hemorrhage (0.95, 0.93, 0.95, 0.97, and 0.96, respectively). AUCs on the Qure25k dataset were 0.92 for calvarial fractures, 0.93 for midline shift, and 0.86 for mass effect, while AUCs on the CQ500 dataset were 0.96, 0.97, and 0.92, respectively.

Conclusion

This study demonstrates that deep learning algorithms can identify head-CT scan abnormalities requiring urgent attention with high AUCs.

Watch recorded presentation at ECR (sign-up required)

Prospective evaluation of a deep learning algorithm deployed in an urban imaging centre to notify clinicians of head CT scans with critical abnormalities

1. Qure.ai, Mumbai;  2. CT & MRI Center, Nagpur, India

Presented: 28 February 2019, European Congress of Radiology (ECR)

Purpose

Non-contrast head CT scans are the primary imaging modality for evaluating patients with trauma or stroke. While the results of deep learning algorithms that identify head CT scans containing critical abnormalities have been published in retrospective studies, the effects of deploying such an algorithm in a real-world setting, with mobile notifications to clinicians, remain unstudied. In this prospective study, we evaluated the performance of such an automated triage system in an urban 24-hour imaging facility.

Methods

We developed an accurate deep neural network algorithm that identifies and localizes intracranial bleeds, cranial fractures, mass effect, and midline shift on non-contrast head CT scans. The algorithm was deployed at a clinical imaging facility in conjunction with an on-premise module that automatically selects eligible scans from the PACS and uploads them to the cloud-based algorithm for processing. Once a scan is processed, the cloud algorithm returns an additional series, viewable as an overlay on the original, and a text notification to the radiologist with preview images. Mobile notifications facilitated confirmation of the detected abnormalities. We studied the performance of the automated system over 60 days.
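
As a rough illustration of the plumbing involved, here is a hedged sketch of how such an on-premise module might poll for eligible studies and hand them to a remote inference endpoint. The endpoint URL, the `pacs_client` interface, and `notify_radiologist` are hypothetical placeholders, not the deployed system:

```python
import time
import requests  # generic HTTP client; any equivalent would do

API_URL = "https://example.invalid/api"   # hypothetical inference endpoint

def eligible(study):
    """Select non-contrast head CTs only (simplified filter)."""
    return (study["modality"] == "CT" and study["body_part"] == "HEAD"
            and not study.get("contrast", False))

def watch(pacs_client, notify_radiologist, poll_seconds=30):
    """Poll a (hypothetical) PACS client, send eligible scans for processing,
    store the returned overlay series, and notify on critical findings."""
    seen = set()
    while True:
        for study in pacs_client.new_studies():
            if study["uid"] in seen or not eligible(study):
                continue
            seen.add(study["uid"])
            resp = requests.post(f"{API_URL}/process", files={"dicom": study["data"]})
            result = resp.json()
            pacs_client.store_series(study["uid"], result["overlay_series"])
            if result["critical"]:
                notify_radiologist(study["uid"], result["preview_images"])
        time.sleep(poll_seconds)
```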

Results

748 CT scans were acquired over 60 days, of which 194 were non-contrast head CT scans; these were evaluated by a senior radiologist. Sensitivity, specificity, AUC, and average time to notification for head CT scans with critical abnormalities were 0.90 (95% CI 0.74-0.98), 0.86 (0.80-0.91), 0.97 (0.92-1.00), and 3.2 minutes, respectively.

Conclusion

An automated triage system deployed in a radiology facility results in rapid notification of critical scans with a low false-positive rate, and may be used to expedite treatment initiation.

Watch recorded presentation at ECR (sign-up required)

Deep Learning for Infarct Detection and Localization from Head CT Scans

1. Qure.ai, Mumbai;  2. Department of Radiology, Mayo Clinic, Rochester, MN

Presented: 28 February 2019, European Congress of Radiology (ECR)

Purpose

The purpose of this study was to use a deep learning algorithm to detect and localize subacute and chronic ischemic infarcts on head CT scans for use in automated volumetric progression tracking.

Methods

We sampled 308 head CT scans (11,840 slices) reported with chronic or subacute infarcts. The infarcted regions in the 11,840 infarct-positive slices were marked. We trained a segmentation algorithm to predict a heatmap of infarct lesions. The heatmap was used to derive scan-level features representative of lesion density and volume, which were used to train a random forest to predict scan-level probabilities of chronic infarct. Area under the receiver operating characteristic curve (AUC) was used to evaluate scan-level predictions.
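
The heatmap-to-features step can be sketched in a few lines; the specific summary statistics below (volume above threshold, mean lesion probability, slice coverage) are assumptions standing in for the study's actual feature set:

```python
import numpy as np

def heatmap_features(heatmap, voxel_volume_ml=0.004, threshold=0.5):
    """Summarize a per-voxel infarct heatmap as scan-level features."""
    lesion = heatmap > threshold
    volume_ml = float(lesion.sum()) * voxel_volume_ml       # predicted lesion volume
    density = float(heatmap[lesion].mean()) if lesion.any() else 0.0
    slices_hit = int(lesion.reshape(lesion.shape[0], -1).any(axis=1).sum())
    return np.array([volume_ml, density, float(heatmap.max()), slices_hit])

heatmap = np.random.default_rng(1).random((16, 64, 64))     # stand-in for model output
print(heatmap_features(heatmap))                            # feeds the random forest
```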

Results

The algorithm was validated on an independent dataset of 1610 head CT scans containing 78 chronic and 9 subacute infarcts, as well as 45 chronic intracranial hemorrhages and 6 glioblastomas. The distribution of infarct-affected territories was 52.9% MCA, 33.3% PCA, 9.3% ACA, and 4.7% vertebrobasilar. The algorithm yielded an AUC of 0.8474 (95% CI 0.7964-0.8984) for scan-level predictions. It identified 8 of 9 subacute infarcts (88.89% recall) and 70 of 78 chronic infarcts (89.74% recall). The eight missed chronic infarcts included 3 lacunar and 2 hemorrhagic infarcts. The volumes of predicted infarct lesions ranged from 1 mL to 526 mL, with a mean predicted volume of 55.60 mL.

Conclusion

The study demonstrates the capability of deep learning algorithms to accurately differentiate infarcts from infarct mimics.

Watch recorded presentation at ECR (sign-up required)

Automated Detection of Midline Shift and Mass Effect from Head CT Scans using Deep Learning

1. Qure.ai, Mumbai;  2. CT & MRI Center, Nagpur, India;  3. Department of Radiology, Mayo Clinic, Rochester, MN;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi

Presented: 28 February 2019, European Congress of Radiology (ECR)

Purpose

Mass effect and midline shift are among the most critical and time-sensitive abnormalities that can be readily detected on a head CT scan. We describe the development and validation of deep learning algorithms to automatically detect these abnormalities.

Methods

We labeled slices from 699 anonymized non-contrast head CT scans for the presence or absence of mass effect and midline shift in each slice. The number of scans (slices) with mass effect was 320 (3143) and with midline shift was 249 (2074). We used these labels to train a modified ResNet18, a popular convolutional neural network, to predict softmax-based confidences for the presence of mass effect and midline shift in a slice. We modified the network by using two parallel fully connected (FC) layers in place of a single FC layer. The slice-level confidences were combined using a random forest to predict the scan-level confidence for the presence of mass effect and midline shift. A separate dataset (the CQ500 dataset) was collected for validation of the algorithm. Three senior radiologists independently read each scan in this dataset, and the consensus of their opinions was used as the gold standard. We used areas under receiver operating characteristic curves (AUC) to evaluate the algorithm.
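
A minimal PyTorch sketch of the architectural change described above: a stock ResNet18 with its single fully connected layer replaced by two parallel FC heads, one per finding. The torchvision starting point and the binary (absent/present) output per head are assumptions consistent with the description, not the exact published code:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoHeadResNet18(nn.Module):
    """ResNet18 backbone with parallel FC heads, one per finding."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        in_features = backbone.fc.in_features      # 512 for ResNet18
        backbone.fc = nn.Identity()                # drop the original single FC layer
        self.backbone = backbone
        self.mass_effect_head = nn.Linear(in_features, 2)    # absent / present
        self.midline_shift_head = nn.Linear(in_features, 2)

    def forward(self, x):
        feats = self.backbone(x)
        # One softmax confidence distribution per finding, per slice.
        return (torch.softmax(self.mass_effect_head(feats), dim=1),
                torch.softmax(self.midline_shift_head(feats), dim=1))

model = TwoHeadResNet18()
ct_slices = torch.randn(4, 3, 224, 224)            # stand-in batch of CT slices
mass_conf, shift_conf = model(ct_slices)
print(mass_conf.shape, shift_conf.shape)           # torch.Size([4, 2]) each
```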

Results

The CQ500 dataset contained 491 scans, of which 99 had mass effect and 47 had midline shift. The AUC for detecting mass effect was 0.92 (95% CI 0.89-0.95) and for detecting midline shift was 0.97 (95% CI 0.94-0.99).

Conclusion

We show that a deep learning algorithm can be trained to accurately detect mass effect and midline shift from head CT scans.

Watch recorded presentation at ECR (sign-up required)

Deep learning algorithms for detection of critical findings in head CT scans - A retrospective study

1. Qure.ai, Mumbai, India;  2. CT & MRI Center, Dhantoli, Nagpur, India;  3. Department of Radiology, Mayo Clinic, Rochester, MN, USA;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi, India

Published: 11 October 2018, The Lancet

Background

Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from these scans: intracranial haemorrhage and its types (ie, intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect.

Methods

We retrospectively collected a dataset containing 313 318 head CT scans together with their clinical reports from around 20 centres in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop algorithms. An additional validation dataset (CQ500 dataset) was collected in two batches from centres that were different from those used for the development and Qure25k datasets. We excluded postoperative scans and scans of patients younger than 7 years. The original clinical radiology report and consensus of three independent radiologists were considered as gold standard for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were primarily used to assess the algorithms.

Findings

The Qure25k dataset contained 21 095 scans (mean age 43 years; 9030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients). On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91–0·93) for detecting intracranial haemorrhage (0·90 [0·89–0·91] for intraparenchymal, 0·96 [0·94–0·97] for intraventricular, 0·92 [0·90–0·93] for subdural, 0·93 [0·91–0·95] for extradural, and 0·90 [0·89–0·92] for subarachnoid). On the CQ500 dataset, AUC was 0·94 (0·92–0·97) for intracranial haemorrhage (0·95 [0·93–0·98], 0·93 [0·87–1·00], 0·95 [0·91–0·99], 0·97 [0·91–1·00], and 0·96 [0·92–0·99], respectively). AUCs on the Qure25k dataset were 0·92 (0·91–0·94) for calvarial fractures, 0·93 (0·91–0·94) for midline shift, and 0·86 (0·85–0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92–1·00), 0·97 (0·94–1·00), and 0·92 (0·89–0·95), respectively.

Interpretation

Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility to use these algorithms to automate the triage process.

Read full paper

Deep learning in chest radiography: Detection of findings and presence of change

1. Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Harvard Medical School, Boston, Massachusetts, United States of America;  2. Division of Diagnostic Radiology, Department of Diagnostic and Therapeutic Radiology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand;  3. Qure.ai, 101 Raheja Titanium, Goregaon East, Mumbai, India

Published: 04 October 2018, PLOS One

Background

Deep learning (DL) based solutions have been proposed for interpretation of several imaging modalities, including radiography, CT, and MR. For chest radiographs, DL algorithms have found success in the evaluation of abnormalities such as lung nodules, pulmonary tuberculosis, cystic fibrosis, pneumoconiosis, and the location of peripherally inserted central catheters. Chest radiography is the most commonly performed radiological test for a multitude of non-emergent and emergent clinical indications. This study aims to assess the accuracy of a DL algorithm for detection of abnormalities on routine frontal chest radiographs (CXR), and for assessment of stability or change in findings over serial radiographs.

Methods and Findings

We processed 874 de-identified frontal CXR from 724 adult patients (>18 years) with DL (Qure AI). Scores and prediction statistics from DL were generated and recorded for the presence of pulmonary opacities, pleural effusions, hilar prominence, and enlarged cardiac silhouette. To establish a standard of reference (SOR), two thoracic radiologists assessed all CXR for these abnormalities. Four other radiologists (test radiologists), unaware of the SOR and DL findings, independently assessed the presence of radiographic abnormalities. A total of 724 radiographs were assessed for detection of findings. A subset of 150 radiographs with follow-up examinations was used to assess change over time. Data were analyzed with receiver operating characteristic analyses and post-hoc power analysis.

Results

About 42% (305/724) of CXR had no findings according to the SOR; single and multiple abnormalities were seen in 23% (168/724) and 35% (251/724) of CXR, respectively. There was no statistical difference between DL and SOR for all abnormalities (p = 0.2-0.8). The area under the curve (AUC) for DL and the test radiologists ranged between 0.837-0.929 and 0.693-0.923, respectively. DL had its lowest AUC (0.758) for assessing changes in pulmonary opacities over follow-up CXR. The presence of chest wall implanted devices negatively affected the accuracy of the DL algorithm for evaluation of pulmonary and hilar abnormalities.

Conclusions

The DL algorithm can aid in the interpretation of CXR findings and their stability over follow-up CXR. However, in its present version, it is unlikely to replace radiologists, owing to its limited specificity for categorizing specific findings.

Read full paper

Can Artificial Intelligence Reliably Report Chest X-Rays? Radiologist Validation of an Algorithm trained on 1.2 Million X-Rays

1. Qure.ai, Mumbai, India;  2. Columbia Asia Radiology Group, Bengaluru, India

Published: 19 July 2018

Background and Objectives

Chest x-rays are the most commonly performed, cost-effective diagnostic imaging tests ordered by physicians. A clinically validated, automated artificial intelligence system that can reliably separate normal from abnormal would be invaluable in addressing the problem of reporting backlogs and the lack of radiologists in low-resource settings. The aim of this study was to develop and validate a deep learning system to detect chest x-ray abnormalities.

Methods

A deep learning system was trained on 1.2 million x-rays and their corresponding radiology reports to identify abnormal x-rays and the following specific abnormalities: blunted costophrenic angle, calcification, cardiomegaly, cavity, consolidation, fibrosis, hilar enlargement, opacity and pleural effusion. The system was tested versus a 3-radiologist majority on an independent, retrospectively collected de-identified set of 2000 x-rays. The primary accuracy measure was area under the ROC curve (AUC), estimated separately for each abnormality as well as for normal versus abnormal reports.

Results

The deep learning system demonstrated an AUC of 0.93 (CI 0.92-0.94) for detection of abnormal scans, and AUCs (CI) of 0.94 (0.92-0.97), 0.88 (0.85-0.91), 0.97 (0.95-0.99), 0.92 (0.82-1), 0.94 (0.91-0.97), 0.92 (0.88-0.95), 0.89 (0.84-0.94), 0.93 (0.92-0.95), 0.98 (0.97-1), and 0.93 (0.87-0.99) for the detection of blunted CP angle, calcification, cardiomegaly, cavity, consolidation, fibrosis, hilar enlargement, opacity, and pleural effusion, respectively.

Conclusions

Our study shows that a deep learning algorithm trained on a large quantity of labelled data can accurately detect abnormalities on chest x-rays. As these systems further increase in accuracy, the feasibility of using artificial intelligence to extend the reach of chest x-ray interpretation and improve reporting efficiency will increase in tandem.

Read full paper

Machine Learning Methods Improve Prognostication, Identify Clinically Distinct Phenotypes, and Detect Heterogeneity in Response to Therapy in a Large Cohort of Heart Failure Patients

1. Section of Cardiovascular Medicine and Center for Outcomes Research, Yale University School of Medicine New Haven, CT;  2. Department of Cardiology, Karolinska Institutet Department of Medicine and Karolinska University Hospital, Stockholm, Sweden;  3. Qure.ai, Mumbai, India;  4. Department of Medicine and Health Sciences, Linköping University, Linköping, Sweden;  5. Duke Clinical Research Institute, Duke University, Durham, NC

Published: 12 April 2018, Journal of the American Heart Association

Background

Whereas heart failure (HF) is a complex clinical syndrome, conventional approaches to its management have treated it as a singular disease, leading to inadequate patient care and inefficient clinical trials. We hypothesized that applying advanced analytics to a large cohort of HF patients would improve prognostication of outcomes, identify distinct patient phenotypes, and detect heterogeneity in treatment response.

Methods and Results

The Swedish Heart Failure Registry is a nationwide registry collecting detailed demographic, clinical, laboratory, and medication data and linked to databases with outcome information. We applied random forest modeling to identify predictors of 1‐year survival. Cluster analysis was performed and validated using serial bootstrapping. Association between clusters and survival was assessed with Cox proportional hazards modeling and interaction testing was performed to assess for heterogeneity in response to HF pharmacotherapy across propensity‐matched clusters. Our study included 44 886 HF patients enrolled in the Swedish Heart Failure Registry between 2000 and 2012. Random forest modeling demonstrated excellent calibration and discrimination for survival (C‐statistic=0.83) whereas left ventricular ejection fraction did not (C‐statistic=0.52): there were no meaningful differences per strata of left ventricular ejection fraction (1‐year survival: 80%, 81%, 83%, and 84%). Cluster analysis using the 8 highest predictive variables identified 4 clinically relevant subgroups of HF with marked differences in 1‐year survival. There were significant interactions between propensity‐matched clusters (across age, sex, and left ventricular ejection fraction and the following medications: diuretics, angiotensin‐converting enzyme inhibitors, β‐blockers, and nitrates, P < 0.001, all).
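
A toy sketch of the two analytic steps named above: a random forest for 1-year survival followed by cluster analysis over its top predictors. The data are synthetic, and the bootstrap validation, Cox modeling, and propensity matching are omitted:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20))                 # stand-in clinical covariates
y = rng.integers(0, 2, size=500)               # 1-year survival outcome

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top8 = np.argsort(rf.feature_importances_)[-8:]          # 8 most predictive variables
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X[:, top8])
print(np.bincount(clusters))                   # sizes of the 4 phenotype clusters
```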

Conclusions

Machine learning algorithms accurately predicted outcomes in a large data set of HF patients. Cluster analysis identified 4 distinct phenotypes that differed significantly in outcomes and in response to therapeutics. Use of these novel analytic approaches has the potential to enhance effectiveness of current therapies and transform future HF clinical trials.

Read full paper

Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur;  3. Department of Radiology, Mayo Clinic, Rochester, MN;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi

Published: 13 March 2018

Importance

Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms.

Objective

To develop and validate a set of deep learning algorithms for automated detection of the following key findings from non-contrast head CT scans: intracranial hemorrhage (ICH) and its types, intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH), and subarachnoid (SAH) hemorrhages; calvarial fractures; midline shift; and mass effect.

Design And Settings

We retrospectively collected a dataset containing 313,318 head CT scans along with their clinical reports from various centers. A part of this dataset (Qure25k dataset) was used to validate and the rest to develop algorithms. Additionally, a dataset (CQ500 dataset) was collected from different centers in two batches B1 & B2 to clinically validate the algorithms.

Main Outcomes And Measures

Original clinical radiology report and consensus of three independent radiologists were considered as gold standard for Qure25k and CQ500 datasets respectively. Area under receiver operating characteristics curve (AUC) for each finding was primarily used to evaluate the algorithms.

Results

Qure25k dataset contained 21,095 scans (mean age 43.31; 42.87% female) while batches B1 and B2 of CQ500 dataset consisted of 214 (mean age 43.40; 43.92% female) and 277 (mean age 51.70; 30.31% female) scans respectively. On Qure25k dataset, the algorithms achieved AUCs of 0.9194, 0.8977, 0.9559, 0.9161, 0.9288 and 0.9044 for detecting ICH, IPH, IVH, SDH, EDH and SAH respectively. AUCs for the same on CQ500 dataset were 0.9419, 0.9544, 0.9310, 0.9521, 0.9731 and 0.9574 respectively. For detecting calvarial fractures, midline shift and mass effect, AUCs on Qure25k dataset were 0.9244, 0.9276 and 0.8583 respectively, while AUCs on CQ500 dataset were 0.9624, 0.9697 and 0.9216 respectively.

Conclusions And Relevance

This study demonstrates that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention. This opens up the possibility to use these algorithms to automate the triage process. They may also provide a lower bound for quality and consistency of radiological interpretation.

Read full paper

Efficacy of deep learning for screening pulmonary tuberculosis

1. Qure.ai, Mumbai

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Chest X-rays (CXRs), being highly sensitive, serve as a screening tool in TB diagnosis. Though there are no classical features diagnostic of TB on CXR, a few patterns can be used as supportive evidence. In resource-limited settings, developing deep learning algorithms for CXR-based TB screening could reduce diagnostic delay. Our algorithm screens for 8 abnormal patterns (TB tags): pleural effusion, blunted CP angle, atelectasis, fibrosis, opacity, nodules, calcification, and cavity. It reports ‘No Abnormality Detected’ if none of these patterns are present on the CXR.

Methods

An anonymized dataset of 423,218 CXRs with matched radiologist reports (spanning 22 X-ray machine models from 9 manufacturers across 166 centres in India) was used to generate training data for the deep learning models. Natural language processing techniques were used to extract TB tags from these reports. Deep learning systems were trained to predict the probability of the presence or absence of each TB tag, along with heatmaps that highlight abnormal regions in the CXR for each positive result.
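
A toy sketch of the report-to-label step: rule-based extraction of TB tags from free-text reports. The keyword patterns and negation handling here are drastically simplified assumptions, not the study's NLP system:

```python
import re

TB_TAGS = ["pleural effusion", "blunted cp", "atelectasis", "fibrosis",
           "opacity", "nodule", "calcification", "cavity"]
NEGATIONS = re.compile(r"\b(no|without|absent|negative for)\b[^.;]*")

def extract_tags(report: str) -> dict:
    """Label a report 1/0 per TB tag, skipping negated mentions."""
    text = report.lower()
    # Drop clauses that negate findings before matching tag keywords.
    affirmative = NEGATIONS.sub(" ", text)
    return {tag: int(tag in affirmative) for tag in TB_TAGS}

print(extract_tags("Fibrosis in right upper zone. No pleural effusion."))
# {'pleural effusion': 0, ..., 'fibrosis': 1, ...}
```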

Results

We validated the screening algorithm on 3 datasets external to our training set: two public datasets maintained by the NIH (from Montgomery County and Shenzhen) and a third from NIRT, India. The areas under the receiver operating characteristic curve (AUC-ROC) for TB prediction were 0.91, 0.87, and 0.83, respectively.

Conclusion

Training on a diversified dataset enabled good performance on samples from completely different demographics. After further validation of its robustness against variation, the system can be deployed at scale to significantly improve current systems for TB screening.

Watch recorded presentation at ECR (sign-up required)

Automated detection of intra- and extra-axial haemorrhages on CT brain images using deep neural networks

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

To develop and validate a deep neural network-based algorithm for automated, rapid and accurate detection of the following haemorrhages from head CT: intracerebral (ICH), subdural (SDH), extradural (EDH), and subarachnoid (SAH).

Methods

An anonymised database of head CTs was searched for non-contrast scans reported with any of ICH, SDH, EDH, or SAH, and for scans reported with none of these. Each slice of these scans was manually tagged with the haemorrhages visible in that slice. In all, 3040 scans (116,227 slices) were annotated; the numbers of scans (slices) with ICH, SDH, EDH, SAH, and none of these were 781 (6957), 493 (6593), 742 (6880), 561 (5609), and 944 (92,999), respectively. Our deep learning model is a modified ResNet18 with 4 parallel final fully connected layers, one for each haemorrhage type. This model was trained on slices from the annotated dataset to make slice-level decisions. Random forests were then trained, with the ResNet’s softmax outputs for all slices in a scan as features, to make scan-level decisions.

Results

A separate set of 2993 scans, uniformly sampled from the database without any exclusion criterion, was used to test the scan-level decisions. The numbers of scans with ICH, SDH, EDH, and SAH in this set were 123, 58, 41, and 62, respectively. Areas under the receiver operating characteristic curve (AUC) for scan-level decisions on ICH, SDH, EDH, and SAH were 0.91, 0.90, 0.90, and 0.90, respectively. The algorithm takes less than 1 s to produce a decision for a scan.

Conclusion

Deep learning can accurately detect intra- and extra-axial haemorrhages from head CTs.

Watch recorded presentation at ECR (sign-up required)

Identifying pulmonary consolidation in Chest X Rays using deep learning

1. Qure.ai, Mumbai

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Chest x-rays are widely used to identify pulmonary consolidation because they are highly accessible, cheap and sensitive. Automating the diagnosis in chest x-rays can reduce diagnostic delay, especially in resource-limited settings.

Methods

An anonymised dataset of 423,218 chest X-rays with corresponding reports (collected from 166 centres across India, spanning 22 X-ray machine variants from 9 manufacturers) was used for training and validation. X-rays with consolidation were identified from their reports using natural language processing techniques. Images were preprocessed to a standard size and normalised to remove source dependency. Deep residual neural networks were trained on these images: multiple models were trained on various selective subsets of the dataset, along with one model trained on the entire dataset. The scores yielded by each of these models are passed through a 2-layer neural network to generate the final probability of consolidation being present in an X-ray.
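
The final ensembling step can be sketched as a small stacking network; the layer sizes, number of base models, and synthetic scores below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ScoreCombiner(nn.Module):
    """2-layer network mapping per-model scores to a final probability."""
    def __init__(self, n_models=4, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_models, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # probability of consolidation
        )

    def forward(self, scores):                    # scores: (batch, n_models)
        return self.net(scores).squeeze(1)

combiner = ScoreCombiner()
base_scores = torch.rand(8, 4)                    # stand-in base-model scores
print(combiner(base_scores))                      # (8,) consolidation probabilities
```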

Results

The model was validated and tested on a test dataset uniformly sampled from the parent dataset without any exclusion criteria. Sensitivity and specificity for the tag were observed to be 0.81 and 0.80, respectively. The area under the receiver operating characteristic curve (AUC-ROC) was 0.88.

Conclusion

Deep learning can be used to diagnose pulmonary consolidation in chest X-rays, with models trained on a generalised dataset containing samples from multiple demographics. This model performs better than a model trained on a controlled dataset and is suited to a real-world setting where X-ray quality may not be consistent.

Watch recorded presentation at ECR (sign-up required)

Automatic detection of generalised cerebral atrophy using deep neural networks from head CT scans

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Features of generalised cerebral atrophy on brain CT images are a marker of neurodegenerative diseases of the brain. Our study aims at automated diagnosis of generalised cerebral atrophy on brain CT images using deep neural networks, thereby offering an objective early diagnosis.

Methods

An anonymised dataset containing 78 head CT scans (1608 slices) was used to train and validate a skull-stripping algorithm: the intracranial region was marked out slice by slice in each scan, and a U-Net-based deep neural network was trained on these annotations to strip the skull from each slice. A second anonymised dataset containing 2189 CT scans (231 scans with atrophy) was used to train and validate an atrophy detection algorithm. First, an image registration technique was applied to the predicted intracranial region to align all scans to a standard head CT scan. Parenchymal and CSF volumes were calculated by thresholding Hounsfield units within the intracranial region. The ratio of CSF volume to parenchymal volume in each slice of the aligned CT scan, together with the age of the patient, was used as features to train a random forest algorithm that decides whether the scan shows generalised cerebral atrophy.
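
A compact sketch of the volumetry step: thresholding Hounsfield units inside the intracranial mask to separate CSF from parenchyma and form the per-slice ratio features. The HU cutoffs here are typical textbook ranges assumed for illustration, not necessarily the study's thresholds:

```python
import numpy as np

def csf_parenchyma_ratios(ct_hu, brain_mask, csf_range=(0, 15), parenchyma_range=(15, 45)):
    """Per-slice CSF/parenchyma volume ratios from an aligned head CT.

    ct_hu: (slices, H, W) array in Hounsfield units; brain_mask: binary intracranial mask.
    """
    csf = brain_mask & (ct_hu >= csf_range[0]) & (ct_hu < csf_range[1])
    parenchyma = brain_mask & (ct_hu >= parenchyma_range[0]) & (ct_hu < parenchyma_range[1])
    csf_counts = csf.reshape(csf.shape[0], -1).sum(axis=1)
    par_counts = parenchyma.reshape(parenchyma.shape[0], -1).sum(axis=1)
    return csf_counts / np.maximum(par_counts, 1)      # avoid division by zero

ct = np.random.default_rng(2).integers(-100, 80, size=(16, 64, 64))  # toy HU volume
mask = np.ones_like(ct, dtype=bool)
features = np.append(csf_parenchyma_ratios(ct, mask), 67)   # ratios + patient age
print(features.shape)                                       # (17,) feature vector
```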

Results

An independent set of 3000 head CT scans (347 scans with atrophy) was used to test the algorithm. The area under the receiver operating characteristic curve (AUC) for scan-level decisions was 0.86. Prediction for each patient takes less than 45 s.

Conclusion

Deep convolutional networks can accurately detect generalised cerebral atrophy given a CT scan.

Watch recorded presentation at ECR (sign-up required)

Clinical validation of a deep learning algorithm for quantification of the idiopathic pulmonary fibrosis pattern

1. Qure.ai, Mumbai;  2. Jankharia Imaging Centre;  3. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi;  4. Department of Pulmonology, Hinduja Hospital and Research Centre, Mumbai, India

Presented: 02 March 2018, European Congress of Radiology (ECR)

Purpose

Radiologists are currently ill equipped to precisely estimate disease burden and track the progression of idiopathic pulmonary fibrosis (IPF). Development of an automated method for IPF segmentation is challenging, due to the complexity of the fibrosis pattern and degree of variation between patients. Deep neural networks are machine learning algorithms that overcome these challenges. We describe the development and validation of a novel deep learning method to quantify the IPF pattern.

Methods

We used high-resolution chest CT scans from 23 patients with IPF as training data. The fibrosis pattern was marked out on 60 slices per scan. The annotated scans, along with 6 additional normal scans, were used to train a convolutional neural network to outline the IPF disease pattern. Segmentation accuracy was measured using the Dice score. For each patient, the percentage of the lungs affected by IPF was calculated. An independent set of 50 scans was used for clinical validation: disease volume was independently estimated by 2 thoracic radiologists blinded to the algorithm estimate, and algorithm-derived estimates were correlated with the radiologist estimates of disease volume.

Results

A 3-dimensional neural network architecture coupled with 2-dimensional post-processing of each slice produced the most accurate segmentation, with a Dice score of 0.77. The correlation between the algorithm-derived disease volume estimate and the average radiologist estimate was 0.92; inter-radiologist correlation was 0.89. Radiologist estimates of disease volume varied by 5.5% (range 0-15%).

Conclusion

We demonstrate that a deep neural network, trained using expert-annotated images, can accurately quantify the percentage of lung volume affected by IPF.

Watch recorded presentation at ECR (sign-up required)

Automated detection and localisation of skull fractures from CT scans using deep learning

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 02 March 2018, European Congress of Radiology (ECR)

Purpose

To develop and validate a deep learning-based algorithm pipeline for fast detection and localisation of skull fractures from non-contrast CT scans. All kinds of skull fractures (undisplaced, depressed, comminuted, etc.) were included as part of the study.

Methods

An anonymized and annotated dataset of 350 scans (11,750 slices) with skull fractures was used to generate candidate proposals for fractures. A stacked network pipeline was used for candidate generation: a fully convolutional network (U-Net) for ROI generation and a deep convolutional network (ResNet18) for ROI classification, which yielded fracture probabilities for the generated candidates. A separate deep learning model was trained to detect haemorrhages at the scan level; its output was used as a proxy for clinical information. Candidate features such as size, probability, and depth for the 5 most probable fracture candidates, along with the haemorrhage model confidence (p_haemorrhage), were combined to train a random forest classifier to detect fractures at the scan level. When a fracture was predicted, the most probable candidate(s) were used for localisation.

Results

A separate set of 2971 scans, uniformly sampled from the database with no exclusion criterion, was used to test the scan-level decisions; 108 of these scans were reported as skull fracture cases. The area under the receiver operating characteristic curve (AUC-ROC) for scan-level fracture decisions was 0.83 with p_haemorrhage as a feature and 0.72 without it. Free-response ROC analysis yielded a sensitivity of 0.9 at 2.85 false positives per scan. Prediction for each patient takes less than 30 s.

Conclusion

A deep learning-based pipeline can accurately detect and localise skull fractures, and can be used to triage patients for the presence of skull fractures.

Watch recorded presentation at ECR (sign-up required)

Variation in practice patterns and outcomes across United Network for Organ Sharing allocation regions

1. Department of Cardiovascular Medicine, Yale School of Medicine, New Haven, CT;  2. Qure.ai, Mumbai, India;  3. Center for Outcomes Research, Yale University School of Medicine New Haven, CT

Published: 22 January 2018, Clinical Cardiology

Background

The number of heart transplants performed is limited by organ availability and is managed by the United Network for Organ Sharing (UNOS). Efforts are underway to make organ disbursement more equitable as demand increases.

Hypothesis

Significant variation exists in contemporary patterns of care, wait times, and outcomes among patients undergoing heart transplantation across UNOS regions.

Methods

We identified adult patients undergoing first, single‐organ heart transplantation between January 2006 and December 2014 in the UNOS dataset and compared sociodemographic and clinical profiles, wait times, use of mechanical circulatory support (MCS), status at time of transplantation, and 1‐year survival across UNOS regions.

Results

We analyzed 17 096 patients undergoing heart transplantation. There were no differences in age, sex, renal function, and peripheral vascular resistance across regions; however, there was 3‐fold variation in median wait time (range, 48–166 days) across UNOS regions. Proportion of patients undergoing transplantation with status 1A ranged from 36% to 79% across regions (P < 0.01), and percentage of patients hospitalized at time of transplantation varied from 41% to 98%. There was also marked variation in MCS and inotrope utilization (28%–57% and 25%–58%, respectively; P < 0.001). Durable ventricular assist device implantation varied from 20% to 44% (P < 0.001), and intra‐aortic balloon pump utilization ranged from 4% to 18%.

Conclusions

Marked differences exist in patterns of care across UNOS regions that generally trend with differences in waitlist time. Novel policy initiatives are required to address disparities in access to allografts and ensure equitable and efficient allocation of organs.

Read full paper

Deep Neural Networks to Identify and Localize Intracerebral Hemorrhage and Midline Shift in CT Scans of Brain

1. Columbia Asia Radiology Group, Bengaluru;  2. Qure.ai, Mumbai

Presented: 26 November 2017, Radiological Society of North America (RSNA)

Purpose

CT scans of the brain are often the frontline investigation in acute conditions of the brain, particularly strokes. Treatment outcomes largely depend on quick and accurate interpretation of these scans. A vital feature for illustrating the severity of damage is midline shift, which indicates high intracerebral (IC) hemorrhage pressure and can be fatal. Our study aims at designing a deep convolutional network for detection, fast segmentation, and quantification of IC hemorrhage, devising an algorithm for midline shift measurement, and identifying the cerebral hemisphere affected by the detected hemorrhage.

Methods

The anonymized and annotated dataset had 39 brain CT scans (16 with IC hemorrhage). A deep neural network was trained slice by slice to segment hemorrhage. The network has a fully convolutional encoder and decoder with skip connections in between for better localization. 26 scans (589 slices) were used for training and 13 scans (282 slices) for validation. Features extracted from each patient’s complete IC hemorrhage segmentation output were used to train a decision tree for the final diagnosis. The ideal midline was drawn using the center of mass in the bone window and the anterior bone protrusion at the level of the foramen of Monro; this, together with asymmetry in tissue densities, gave the displaced midline and the midline shift. The affected hemisphere was identified using the displaced midline and the hemorrhage’s center of mass. Accuracy was measured using the receiver operating characteristic (ROC) curve and the Dice score.
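
A deliberately crude sketch of the geometric idea, assuming binary skull and brain-tissue masks per axial slice: take the ideal midline from the skull's center of mass and measure shift as the horizontal offset of the tissue's center of mass. The actual method uses the anterior bone protrusion at the foramen of Monro and tissue-density asymmetry, which this toy version does not reproduce:

```python
import numpy as np

def midline_shift_mm(skull_mask, brain_mask, pixel_mm=0.5):
    """Crude per-slice midline shift estimate from binary masks.

    skull_mask / brain_mask: (H, W) boolean arrays for one axial slice.
    """
    ideal_x = np.argwhere(skull_mask)[:, 1].mean()    # ideal midline from skull symmetry
    actual_x = np.argwhere(brain_mask)[:, 1].mean()   # tissue center of mass
    return (actual_x - ideal_x) * pixel_mm            # signed shift; sign gives the side

# Toy slice: skull ring centered at x=32, brain shifted 3 px to the right.
yy, xx = np.mgrid[:64, :64]
r2 = (xx - 32) ** 2 + (yy - 32) ** 2
skull = (r2 < 30 ** 2) & (r2 > 27 ** 2)
brain = (xx - 35) ** 2 + (yy - 32) ** 2 < 25 ** 2
print(round(midline_shift_mm(skull, brain), 2))       # ~1.5 mm shift to the right
```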

Results

100 scans were separately collected over 2 weeks and used for testing; 6 of them had hemorrhage. Sensitivity and specificity for the diagnosis of hemorrhage were 100% and 98.9%, respectively. ROC analysis revealed an area under the curve of 0.994. The model took 3 seconds on average to segment one CT scan. The mean Dice score over all test scans was 0.988, and 0.80 for the 6 scans with hemorrhage. Midline shift and the affected hemisphere were both identified with 100% accuracy.

Conclusion

In this work, we trained a deep convolutional network to detect and quantify IC hemorrhage in brain CT scans. We also measured midline shift and identified the affected hemisphere. The processing pipeline was fully automatic.

Clinical Relevance

Automated detection of hemorrhage and midline shift can help rapidly distinguish between ischemic and hemorrhagic strokes, enabling faster decision-making and treatment.

RSNA 2017 Program

Generating Heatmaps to Visualize the Evidence of Deep Learning Based Diagnosis of Chest X-Rays

1. Qure.ai, Mumbai;  2. Columbia Asia Radiology Group, Bengaluru

Presented: 26 November 2017, Radiological Society of North America (RSNA)

Purpose

For radiologists to develop confidence in a deep learning diagnostic algorithm, it is essential that the algorithm be able to visually demonstrate the evidence for the diagnosis or disease tag. We describe the development of a method that highlights the region(s) of a chest X-ray (CXR) responsible for a deep learning algorithm diagnosis.

Methods

Using 24,384 CXRs, we trained 18-layer deep residual convolutional neural networks to predict whether a chest X-ray was normal or abnormal, and to detect the presence of ‘cardiomegaly’, ‘opacity’, and ‘pleural effusion’ in a CXR. We then applied a method called prediction difference analysis for visualization and interpretation of the trained models. The contribution of each patch in the image is estimated as the degree by which the prediction changes if that patch is replaced with an average normal patch. This method was used to generate a relevance score for each pixel, which is then visualized as a heat map.
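
The patch-replacement idea can be illustrated with a bare-bones occlusion loop: substitute each patch with a reference patch and record how much the model's output drops. The window size, stride, and zero-image reference are assumptions; the published prediction difference analysis marginalizes patches more carefully than this sketch:

```python
import torch

@torch.no_grad()
def patch_relevance(model, image, reference, patch=16, stride=8):
    """Heatmap of prediction drop when each patch is replaced by `reference`.

    image, reference: (1, C, H, W) tensors; model returns abnormality probability.
    """
    base = model(image).item()
    _, _, H, W = image.shape
    heat = torch.zeros(H, W)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            perturbed = image.clone()
            perturbed[..., y:y+patch, x:x+patch] = reference[..., y:y+patch, x:x+patch]
            drop = base - model(perturbed).item()     # big drop => patch was evidence
            heat[y:y+patch, x:x+patch] += drop
    return heat

# Toy model and inputs just to show the call signature.
model = lambda t: torch.sigmoid(t.mean(dim=(1, 2, 3)))
img, ref = torch.rand(1, 1, 64, 64), torch.zeros(1, 1, 64, 64)
print(patch_relevance(model, img, ref).shape)         # torch.Size([64, 64])
```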

Results

We used a 60-20-20 split for the train, validation, and test sets. The trained neural networks showed areas under the ROC curve of 0.89, 0.92, 0.84, and 0.91 for tagging abnormal, cardiomegaly, opacity, and pleural effusion, respectively, on the test set. The visualization pipeline was used to generate heatmaps highlighting the enlarged heart, opacities, and fluid corresponding to the cardiomegaly, opacity, and pleural effusion tags.

Conclusion

We trained and tested a deep learning algorithm which accurately classifies and assigns clinically relevant tags to CXRs. Further, we applied a visualization method that generates heatmaps highlighting the most relevant parts of the CXR. The visualization method is broadly applicable to other kinds of X-rays, and to other deep learning algorithms. Future work will focus on formally validating the accuracy of the visualization, by measuring overlap between radiologist annotation and algorithm-generated heatmap.

Clinical Relevance

Heatmaps highlighting evidence for disease tags will provide clinical users with crucial visual cues that could ease their decision to accept or reject a deep learning based chest x-ray diagnosis.

RSNA 2017 Program

2D-3D Fully Convolutional Neural Networks for Cardiac MR Segmentation

1. Qure.ai, Mumbai

Published: 31 July 2017

Abstract

In this paper, we develop 2D and 3D segmentation pipelines for fully automated cardiac MR image segmentation using Deep Convolutional Neural Networks (CNN). Our models are trained end-to-end from scratch on the ACDC 2017 challenge dataset, comprising 100 studies, each containing cardiac MR images in the end-diastole and end-systole phases. We show that both our segmentation models achieve near state-of-the-art performance in terms of distance metrics and have convincing accuracy in terms of clinical parameters. A comparative analysis is provided by introducing a novel Dice loss function and its combination with cross-entropy loss. By exploring different network structures and comprehensive experiments, we discuss several key insights for obtaining optimal model performance, which is also central to the theme of this challenge.
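
In the spirit of the loss described above, here is a minimal soft Dice loss and a weighted combination with cross-entropy for the binary case; the smoothing constant and weighting are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1.0):
    """1 - soft Dice coefficient for binary segmentation.

    logits: (N, 1, H, W) raw outputs; target: (N, 1, H, W) in {0, 1}.
    """
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def combined_loss(logits, target, alpha=0.5):
    """Weighted sum of Dice and binary cross-entropy."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return alpha * soft_dice_loss(logits, target) + (1 - alpha) * bce

logits = torch.randn(2, 1, 32, 32, requires_grad=True)
target = (torch.rand(2, 1, 32, 32) > 0.5).float()
print(combined_loss(logits, target))
```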

Read full paper

Improving Boundary Classification for Brain Tumor Segmentation and Longitudinal Disease Progression

1. University of Southern California, Los Angeles, USA;  2. Qure.ai, Mumbai;  3. Dhristi Inc., Palo Alto, USA

Published: 12 April 2017

Abstract

Tracking the progression of brain tumors is a challenging task, due to the slow growth rate and the combination of different tumor components, such as cysts, enhancing patterns, edema, and necrosis. In this paper, we propose a deep neural network-based architecture that performs automatic segmentation of brain tumors and focuses on improving accuracy at the edges of these different classes. We show that enhancing the loss function to give more weight to edge pixels significantly improves the neural network’s accuracy at classifying boundaries. In the BRATS 2016 challenge, our submission placed third on the task of predicting progression for the complete tumor region.

Read full paper

A Deep-Learning Based Approach for Ischemic Stroke Lesion Outcome Prediction

1. University of Southern California, Los Angeles, USA;  2. Qure.ai, Mumbai;  3. Dhristi Inc., Palo Alto, USA

Published: 17 October 2016, ISLES 2016 proceedings

Abstract

The ISLES 2016 challenge aims to address two important aspects of ischemic stroke lesion treatment prediction. The first aspect relates to segmenting the brain MRI to identify the areas with lesions, and the second relates to predicting the actual clinical outcome in terms of the patient’s degree of disability. The input data consist of acute MRI scans and additional clinical data such as TICI scores, time since stroke, and time to treatment. To address this challenge we take a deep-learning based approach. In particular, we first focus on the segmentation task and use an automatic segmentation model that consists of a Deep Neural Network (DNN). The DNN takes as input the MRI images and outputs the segmented image, automatically learning the latent underlying features during the training process. The DNN architectures we consider utilize many convolutional layers with small kernels, e.g., 3x3. This approach requires fewer parameters to estimate, and allows one to learn and generalize from the somewhat limited amount of data that is provided. One of the architectures we are currently utilizing is based on the U-Net [1], which is an all-convolutional network. It acts as an auto-encoder that first “encodes” the input image by applying combinations of convolutional and pooling operations. This is followed by a “decoding” step that up-scales the encoded images while performing convolutions. The all-convolutional architecture of the U-Net allows it to handle input images of different dimensions, as in the challenge dataset. In our experiments, we found that this architecture yielded excellent performance on the previous ISLES 2015 dataset. Although the modalities in the 2016 challenge are different, our initial training experiments have yielded promising segmentation results. Our next steps involve addressing the regression challenge. There is a limited amount of labeled data for this task. Our approach will be to include these outcomes as part of the segmentation training directly. This will allow the DNN to learn latent features that can directly help with the classification task.
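
A compact sketch of the all-convolutional encoder-decoder pattern described above, with small 3x3 kernels and a skip connection; the depth and channel counts are toy values, not the architecture used for the challenge:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    """Two 3x3 convolutions, the small-kernel pattern noted above."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)                 # 16 skip + 16 upsampled channels
        self.head = nn.Conv2d(16, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                         # "encode": convs + pooling
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))   # "decode" with skip
        return self.head(d1)                      # per-pixel lesion logits

net = TinyUNet()
mri = torch.randn(1, 1, 96, 96)                   # any input with even H and W works
print(net(mri).shape)                             # torch.Size([1, 1, 96, 96])
```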

Read full paper