Publications and Conference Abstracts

Deep learning algorithms for detection of critical findings in head CT scans - A retrospective study

1. Qure.ai, Mumbai, India;  2. CT & MRI Center, Dhantoli, Nagpur, India;  3. Department of Radiology, Mayo Clinic, Rochester, MN, USA;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi, India

Published: 11 October 2018, The Lancet

Background

Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from these scans: intracranial haemorrhage and its types (ie, intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect.

Methods

We retrospectively collected a dataset containing 313 318 head CT scans together with their clinical reports from around 20 centres in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop algorithms. An additional validation dataset (CQ500 dataset) was collected in two batches from centres that were different from those used for the development and Qure25k datasets. We excluded postoperative scans and scans of patients younger than 7 years. The original clinical radiology report and consensus of three independent radiologists were considered as gold standard for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were primarily used to assess the algorithms.
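
The primary metric throughout is the AUC with a 95% CI. Below is a minimal sketch of how such estimates can be computed; the paper does not publish its statistical code, so the function name and the percentile-bootstrap choice are illustrative assumptions.

```python
# Minimal sketch: AUC with a percentile-bootstrap 95% CI, one common way
# to produce estimates like those reported in the paper. Names are
# illustrative, not taken from the study's code.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=2000, seed=0):
    """Point-estimate AUC plus a percentile-bootstrap 95% CI."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    point = roc_auc_score(y_true, y_score)
    boots = []
    n = len(y_true)
    while len(boots) < n_boot:
        idx = rng.integers(0, n, n)       # resample scans with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                      # a bootstrap AUC needs both classes
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)

# Example: y_true = 1 where the gold standard marks intracranial haemorrhage
# auc, (lo, hi) = auc_with_ci(y_true, model_scores)
```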

Findings

The Qure25k dataset contained 21 095 scans (mean age 43 years; 9030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients). On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91–0·93) for detecting intracranial haemorrhage (0·90 [0·89–0·91] for intraparenchymal, 0·96 [0·94–0·97] for intraventricular, 0·92 [0·90–0·93] for subdural, 0·93 [0·91–0·95] for extradural, and 0·90 [0·89–0·92] for subarachnoid). On the CQ500 dataset, AUC was 0·94 (0·92–0·97) for intracranial haemorrhage (0·95 [0·93–0·98], 0·93 [0·87–1·00], 0·95 [0·91–0·99], 0·97 [0·91–1·00], and 0·96 [0·92–0·99], respectively). AUCs on the Qure25k dataset were 0·92 (0·91–0·94) for calvarial fractures, 0·93 (0·91–0·94) for midline shift, and 0·86 (0·85–0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92–1·00), 0·97 (0·94–1·00), and 0·92 (0·89–0·95), respectively.

Interpretation

Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility to use these algorithms to automate the triage process.

Read full paper

Deep learning in chest radiography: Detection of findings and presence of change

1. Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Harvard Medical School, Boston, Massachusetts, United States of America;  2. Division of Diagnostic Radiology, Department of Diagnostic and Therapeutic Radiology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand;  3. Qure.ai, 101 Raheja Titanium, Goregaon East, Mumbai, India

Published: 04 October 2018, PLOS One

Background

Deep learning (DL) based solutions have been proposed for interpretation of several imaging modalities including radiography, CT, and MR. For chest radiographs, DL algorithms have found success in the evaluation of abnormalities such as lung nodules, pulmonary tuberculosis, cystic fibrosis, pneumoconiosis, and the location of peripherally inserted central catheters. Chest radiography represents the most commonly performed radiological test for a multitude of non-emergent and emergent clinical indications. This study aims to assess the accuracy of a DL algorithm for detection of abnormalities on routine frontal chest radiographs (CXR), and for assessment of stability or change in findings over serial radiographs.

Methods and Findings

We processed 874 de-identified frontal CXR from 724 adult patients (> 18 years) with DL (Qure AI). Scores and prediction statistics from DL were generated and recorded for the presence of pulmonary opacities, pleural effusions, hilar prominence, and enlarged cardiac silhouette. To establish a standard of reference (SOR), two thoracic radiologists assessed all CXR for these abnormalities. Four other radiologists (test radiologists), unaware of SOR and DL findings, independently assessed the presence of radiographic abnormalities. A total of 724 radiographs were assessed for detection of findings. A subset of 150 radiographs with follow-up examinations was used to assess change over time. Data were analyzed with receiver operating characteristics analyses and post-hoc power analysis.

Results

About 42% (305/724) of CXR had no findings according to SOR; single and multiple abnormalities were seen in 23% (168/724) and 35% (251/724) of CXR, respectively. There was no statistical difference between DL and SOR for all abnormalities (p = 0.2–0.8). The area under the curve (AUC) for DL and test radiologists ranged between 0.837–0.929 and 0.693–0.923, respectively. DL had the lowest AUC (0.758) for assessing changes in pulmonary opacities over follow-up CXR. Presence of chest wall implanted devices negatively affected the accuracy of the DL algorithm for evaluation of pulmonary and hilar abnormalities.

Conclusions

The DL algorithm can aid in interpretation of CXR findings and their stability over follow-up CXR. However, in its present version, it is unlikely to replace radiologists due to its limited specificity for categorizing specific findings.

Read full paper

Can Artificial Intelligence Reliably Report Chest X-Rays? Radiologist Validation of an Algorithm trained on 1.2 Million X-Rays

1. Qure.ai, Mumbai, India;  2. Columbia Asia Radiology Group, Bengaluru, India

Published: 19 July 2018

Background and Objectives

Chest x-rays are the most commonly performed, cost-effective diagnostic imaging tests ordered by physicians. A clinically validated, automated artificial intelligence system that can reliably separate normal from abnormal would be invaluable in addressing the problem of reporting backlogs and the lack of radiologists in low-resource settings. The aim of this study was to develop and validate a deep learning system to detect chest x-ray abnormalities.

Methods

A deep learning system was trained on 1.2 million x-rays and their corresponding radiology reports to identify abnormal x-rays and the following specific abnormalities: blunted costophrenic angle, calcification, cardiomegaly, cavity, consolidation, fibrosis, hilar enlargement, opacity and pleural effusion. The system was tested versus a 3-radiologist majority on an independent, retrospectively collected de-identified set of 2000 x-rays. The primary accuracy measure was area under the ROC curve (AUC), estimated separately for each abnormality as well as for normal versus abnormal reports.
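
The training labels above come from the radiology reports via an NLP step whose rules are not published. Below is a hedged sketch of the general keyword-plus-negation approach such a step might take; the keyword lists and negation cues are assumptions.

```python
# Illustrative sketch of rule-based label extraction from radiology
# reports. The study's actual NLP system is not published; the keyword
# lists and the crude negation handling below are assumptions.
import re

FINDINGS = {
    "cardiomegaly": ["cardiomegaly", "enlarged cardiac silhouette"],
    "pleural_effusion": ["pleural effusion"],
    "consolidation": ["consolidation"],
    "opacity": ["opacity", "opacities"],
}
NEGATION = re.compile(r"\b(no|without|absent|negative for)\b")

def extract_labels(report: str) -> dict:
    """Sentence-wise keyword matching with crude negation handling."""
    labels = {name: 0 for name in FINDINGS}
    for sentence in report.lower().split("."):
        for name, keywords in FINDINGS.items():
            hit = next((k for k in keywords if k in sentence), None)
            if hit is None:
                continue
            # a negation cue before the keyword voids the mention
            if NEGATION.search(sentence[: sentence.find(hit)]):
                continue
            labels[name] = 1
    return labels

# extract_labels("No pleural effusion. Mild cardiomegaly.")
# -> {"cardiomegaly": 1, "pleural_effusion": 0, "consolidation": 0, "opacity": 0}
```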

Results

The deep learning system demonstrated an AUC of 0.93 (95% CI 0.92–0.94) for detection of abnormal scans, and AUCs (95% CI) of 0.94 (0.92–0.97), 0.88 (0.85–0.91), 0.97 (0.95–0.99), 0.92 (0.82–1.00), 0.94 (0.91–0.97), 0.92 (0.88–0.95), 0.89 (0.84–0.94), 0.93 (0.92–0.95), and 0.98 (0.97–1.00) for the detection of blunted CP angle, calcification, cardiomegaly, cavity, consolidation, fibrosis, hilar enlargement, opacity, and pleural effusion, respectively.

Conclusions

Our study shows that a deep learning algorithm trained on a large quantity of labelled data can accurately detect abnormalities on chest x-rays. As these systems further increase in accuracy, the feasibility of using artificial intelligence to extend the reach of chest x-ray interpretation and improve reporting efficiency will increase in tandem.

Read full paper

Machine Learning Methods Improve Prognostication, Identify Clinically Distinct Phenotypes, and Detect Heterogeneity in Response to Therapy in a Large Cohort of Heart Failure Patients

1. Section of Cardiovascular Medicine and Center for Outcomes Research, Yale University School of Medicine New Haven, CT;  2. Department of Cardiology, Karolinska Institutet Department of Medicine and Karolinska University Hospital, Stockholm, Sweden;  3. Qure.ai, Mumbai, India;  4. Department of Medicine and Health Sciences, Linköping University, Linköping, Sweden;  5. Duke Clinical Research Institute, Duke University, Durham, NC

Published: 12 April 2018, Journal of the American Heart Association

Background

Whereas heart failure (HF) is a complex clinical syndrome, conventional approaches to its management have treated it as a singular disease, leading to inadequate patient care and inefficient clinical trials. We hypothesized that applying advanced analytics to a large cohort of HF patients would improve prognostication of outcomes, identify distinct patient phenotypes, and detect heterogeneity in treatment response.

Methods and Results

The Swedish Heart Failure Registry is a nationwide registry collecting detailed demographic, clinical, laboratory, and medication data and is linked to databases with outcome information. We applied random forest modeling to identify predictors of 1‐year survival. Cluster analysis was performed and validated using serial bootstrapping. Association between clusters and survival was assessed with Cox proportional hazards modeling, and interaction testing was performed to assess for heterogeneity in response to HF pharmacotherapy across propensity‐matched clusters. Our study included 44 886 HF patients enrolled in the Swedish Heart Failure Registry between 2000 and 2012. Random forest modeling demonstrated excellent calibration and discrimination for survival (C‐statistic=0.83) whereas left ventricular ejection fraction did not (C‐statistic=0.52): there were no meaningful differences per strata of left ventricular ejection fraction (1‐year survival: 80%, 81%, 83%, and 84%). Cluster analysis using the 8 highest predictive variables identified 4 clinically relevant subgroups of HF with marked differences in 1‐year survival. There were significant interactions between propensity‐matched clusters (across age, sex, and left ventricular ejection fraction and the following medications: diuretics, angiotensin‐converting enzyme inhibitors, β‐blockers, and nitrates, P < 0.001, all).
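
As a rough sketch of the analysis shape described above, under stated assumptions: the abstract does not name the clustering algorithm, so k-means is used purely for illustration, and for a binary 1-year outcome the C-statistic coincides with the ROC AUC.

```python
# Sketch of the pipeline shape: random forest for 1-year survival,
# C-statistic for discrimination, clustering on the top predictors.
# The 8-predictor count and k=4 follow the abstract; everything else
# (model settings, k-means itself) is an illustrative assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

def prognosis_and_phenotypes(X_train, y_train, X_val, y_val, n_top=8, k=4):
    """X_* are numpy feature arrays; y_* are binary 1-year outcomes."""
    # 1) prognostication: for a binary 1-year outcome, the C-statistic
    #    equals the area under the ROC curve
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)
    c_stat = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])

    # 2) phenotyping: cluster patients on the most predictive variables
    top = np.argsort(rf.feature_importances_)[::-1][:n_top]
    Z = StandardScaler().fit_transform(X_train[:, top])
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    return c_stat, top, clusters
```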

Conclusions

Machine learning algorithms accurately predicted outcomes in a large data set of HF patients. Cluster analysis identified 4 distinct phenotypes that differed significantly in outcomes and in response to therapeutics. Use of these novel analytic approaches has the potential to enhance effectiveness of current therapies and transform future HF clinical trials.

Read full paper

Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur;  3. Department of Radiology, Mayo Clinic, Rochester, MN;  4. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi

Published: 13 March 2018

Importance

Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms.

Objective

To develop and validate a set of deep learning algorithms for automated detection of the following key findings from non-contrast head CT scans: intracranial hemorrhage (ICH) and its types, namely intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH) and subarachnoid (SAH) hemorrhages; calvarial fractures; midline shift; and mass effect.

Design And Settings

We retrospectively collected a dataset containing 313,318 head CT scans along with their clinical reports from various centers. A part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop the algorithms. Additionally, a dataset (CQ500 dataset) was collected from different centers in two batches (B1 and B2) to clinically validate the algorithms.

Main Outcomes And Measures

The original clinical radiology report and the consensus of three independent radiologists were considered the gold standard for the Qure25k and CQ500 datasets, respectively. Area under the receiver operating characteristic curve (AUC) for each finding was primarily used to evaluate the algorithms.

Results

The Qure25k dataset contained 21,095 scans (mean age 43.31 years; 42.87% female), while batches B1 and B2 of the CQ500 dataset consisted of 214 (mean age 43.40 years; 43.92% female) and 277 (mean age 51.70 years; 30.31% female) scans, respectively. On the Qure25k dataset, the algorithms achieved AUCs of 0.9194, 0.8977, 0.9559, 0.9161, 0.9288 and 0.9044 for detecting ICH, IPH, IVH, SDH, EDH and SAH, respectively. The corresponding AUCs on the CQ500 dataset were 0.9419, 0.9544, 0.9310, 0.9521, 0.9731 and 0.9574. For detecting calvarial fractures, midline shift and mass effect, AUCs on the Qure25k dataset were 0.9244, 0.9276 and 0.8583, respectively, while AUCs on the CQ500 dataset were 0.9624, 0.9697 and 0.9216, respectively.

Conclusions And Relevance

This study demonstrates that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention. This opens up the possibility to use these algorithms to automate the triage process. They may also provide a lower bound for quality and consistency of radiological interpretation.

Read full paper

Efficacy of deep learning for screening pulmonary tuberculosis

1. Qure.ai, Mumbai

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Chest x-rays are widely used to identify pulmonary consolidation because they are highly accessible, cheap and sensitive. Automating the diagnosis in chest x-rays can reduce diagnostic delay, especially in resource-limited settings.

Methods

An anonymised dataset of 423,218 chest x-rays with corresponding reports (collected from 166 centres across India, spanning 22 x-ray machine variants from 9 manufacturers) is used for training and validation. X-rays with consolidation are identified from their reports using natural language processing techniques. Images are preprocessed to a standard size and normalised to remove source dependency. Deep residual neural networks are trained on these images: multiple models are trained on selective subsets of the dataset, along with one model trained on the entire dataset. The scores yielded by these models are passed through a 2-layer neural network to generate the final probability of consolidation in an x-ray.
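
A minimal sketch of the final stacking step follows, assuming each subset model emits one consolidation score per x-ray; only the 2-layer combiner itself comes from the abstract, and the hidden-layer width is an assumption.

```python
# Sketch of the score-stacking step: per-model consolidation scores are
# combined by a small 2-layer network into one probability.
import torch
import torch.nn as nn

class ScoreStacker(nn.Module):
    """Maps k model scores for one x-ray to a consolidation probability."""
    def __init__(self, n_models: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_models, hidden),   # layer 1: mix the model scores
            nn.ReLU(),
            nn.Linear(hidden, 1),          # layer 2: one logit per x-ray
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(scores)).squeeze(-1)

# scores = torch.tensor([[0.7, 0.4, 0.9]])   # outputs of 3 subset models
# prob = ScoreStacker(n_models=3)(scores)    # final P(consolidation)
```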

Results

The model is validated and tested on a test dataset uniformly sampled from the parent dataset without any exclusion criteria. Sensitivity and specificity for the consolidation tag were 0.81 and 0.80, respectively, and the area under the receiver operating characteristic curve (AUC-ROC) was 0.88.

Conclusion

Deep learning can be used to diagnose pulmonary consolidation in chest x-rays with models trained on a generalised dataset with samples from multiple demographics. This model performs better than a model trained on a controlled dataset and is suited to a real-world setting where x-ray quality may not be consistent.

Watch recorded presentation at ECR (sign-up required)

Automated detection of intra- and extra-axial haemorrhages on CT brain images using deep neural networks

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

To develop and validate a deep neural network-based algorithm for automated, rapid and accurate detection of the following haemorrhages from head CT: intracerebral (ICH), subdural (SDH), extradural (EDH) and subarachnoid (SAH).

Methods

An anonymised database of head CTs was searched for non-contrast scans reported with any of ICH, SDH, EDH or SAH, and for scans reported with none of these. Each slice of these scans was manually tagged with the haemorrhages visible in that slice. In all, 3040 scans (116,227 slices) were annotated; the numbers of scans (slices) with ICH, SDH, EDH, SAH and none of these were 781 (6957), 493 (6593), 742 (6880), 561 (5609) and 944 (92,999), respectively. Our deep learning model is a modified ResNet18 with 4 parallel final fully connected layers, one for each of the haemorrhages. This model was trained on the slices from the annotated dataset to make slice-level decisions. Random forests were trained, with the ResNet's softmax outputs for all the slices in a scan as features, to make scan-level decisions.
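
A minimal sketch of the slice-level architecture as described, using a stock torchvision ResNet18 trunk with 4 parallel fully connected heads. Input handling (for example replicating the single CT channel to 3) and all other details are assumptions.

```python
# Sketch of the slice-level model: a ResNet18 trunk with 4 parallel
# fully connected heads, one per haemorrhage type. Assumes each CT
# slice is replicated to 3 channels to match the stock trunk.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HaemorrhageSliceNet(nn.Module):
    HEADS = ("ICH", "SDH", "EDH", "SAH")

    def __init__(self):
        super().__init__()
        trunk = resnet18(weights=None)
        in_features = trunk.fc.in_features   # 512 for ResNet18
        trunk.fc = nn.Identity()             # strip the original classifier
        self.trunk = trunk
        # one independent 2-way (present/absent) head per haemorrhage type
        self.heads = nn.ModuleDict(
            {name: nn.Linear(in_features, 2) for name in self.HEADS}
        )

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.trunk(x)
        return {name: head(feats) for name, head in self.heads.items()}

# Per-slice softmax outputs from this model, stacked across a scan, are
# the features the scan-level random forests would consume.
```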

Results

A separate set of 2993 scans, uniformly sampled from the database without any exclusion criteria, was used to test the scan-level decisions. The numbers of scans with ICH, SDH, EDH and SAH in this set were 123, 58, 41 and 62, respectively. Areas under the receiver operating characteristic curve (AUC) for scan-level decisions on ICH, SDH, EDH and SAH were 0.91, 0.90, 0.90 and 0.90, respectively. The algorithm takes less than 1 s to produce a decision for a scan.

Conclusion

Deep learning can accurately detect intra- and extra-axial haemorrhages from head CTs.

Watch recorded presentation at ECR (sign-up required)

Identifying pulmonary consolidation in Chest X Rays using deep learning

1. Qure.ai, Mumbai

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Chest x-rays are widely used to identify pulmonary consolidation because they are highly accessible, cheap and sensitive. Automating the diagnosis in chest x-rays can reduce diagnostic delay, especially in resource-limited settings.

Methods

An anonymised dataset of 423,218 chest x-rays with corresponding reports (collected from 166 centres across India, spanning 22 x-ray machine variants from 9 manufacturers) is used for training and validation. X-rays with consolidation are identified from their reports using natural language processing techniques. Images are preprocessed to a standard size and normalised to remove source dependency. Deep residual neural networks are trained on these images: multiple models are trained on selective subsets of the dataset, along with one model trained on the entire dataset. The scores yielded by these models are passed through a 2-layer neural network to generate the final probability of consolidation in an x-ray.
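
A minimal sketch of the preprocessing step above; the abstract states only the goal (standard size, source independence), so the target size and the per-image z-score normalisation below are assumptions.

```python
# Illustrative preprocessing: resize every x-ray to one standard size
# and z-score normalise intensities so machine/source differences are
# reduced. Target size and normalisation scheme are assumptions.
import numpy as np
import cv2

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize to size x size and z-score normalise one grayscale x-ray."""
    img = cv2.resize(image.astype(np.float32), (size, size))
    mean, std = img.mean(), img.std()
    return (img - mean) / (std + 1e-6)   # epsilon guards constant images
```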

Results

The model is validated and tested on a test dataset uniformly sampled from the parent dataset without any exclusion criteria. Sensitivity and specificity for the consolidation tag were 0.81 and 0.80, respectively, and the area under the receiver operating characteristic curve (AUC-ROC) was 0.88.

Conclusion

Deep learning can be used to diagnose pulmonary consolidation in chest x-rays with models trained on a generalised dataset with samples from multiple demographics. This model performs better than a model trained on a controlled dataset and is suited to a real-world setting where x-ray quality may not be consistent.

Watch recorded presentation at ECR (sign-up required)

Automatic detection of generalised cerebral atrophy using deep neural networks from head CT scans

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 04 March 2018, European Congress of Radiology (ECR)

Purpose

Features of generalised cerebral atrophy on brain CT images are a marker of neurodegenerative diseases of the brain. Our study aims at automated detection of generalised cerebral atrophy on brain CT images using deep neural networks, thereby offering an objective early diagnosis.

Methods

An anonymised dataset containing 78 head CT scans (1608 slices) was used to train and validate a skull-stripping algorithm. The intracranial region was marked out slice by slice in each scan, and a U-Net-based deep neural network was trained on these annotations to strip the skull from each slice. A second anonymised dataset containing 2189 CT scans (231 scans with atrophy) was used to train and validate an atrophy detection algorithm. First, an image registration technique was applied to the predicted intracranial region to align all scans to a standard head CT scan. The parenchymal and CSF volumes were calculated by thresholding Hounsfield units within the intracranial region. The ratio of CSF volume to parenchymal volume from each slice of the aligned CT scan, together with the age of the patient, was used as the feature set to train a random forest algorithm that decides whether the scan shows generalised cerebral atrophy.
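
A minimal sketch of the volumetric feature described above; the Hounsfield-unit cut-offs separating CSF from parenchyma are assumptions, as the abstract does not state them.

```python
# Within the stripped intracranial mask, CSF and parenchyma are
# separated by HU thresholds and their volume ratio computed. The
# cut-offs (roughly 0-15 HU for CSF, 15-80 HU for parenchyma) are
# illustrative assumptions.
import numpy as np

def csf_parenchyma_ratio(volume_hu: np.ndarray, brain_mask: np.ndarray,
                         csf_range=(0, 15), parenchyma_range=(15, 80)):
    """Per-scan CSF-to-parenchyma volume ratio from a head CT in HU."""
    inside = volume_hu[brain_mask.astype(bool)]
    csf = np.count_nonzero((inside >= csf_range[0]) & (inside < csf_range[1]))
    par = np.count_nonzero(
        (inside >= parenchyma_range[0]) & (inside < parenchyma_range[1]))
    return csf / max(par, 1)

# This ratio (per aligned slice) plus patient age are the features the
# abstract feeds to the random forest for the atrophy decision.
```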

Results

An independent set of 3000 head CT scans (347 scans with atrophy) was used to test the algorithm. Area under the receiver operating characteristic curve (AUC) for scan-level decisions was 0.86. Prediction for each patient takes less than 45 s.

Conclusion

Deep convolutional networks can accurately detect generalised cerebral atrophy given a CT scan.

Watch recorded presentation at ECR (sign-up required)

Clinical validation of a deep learning algorithm for quantification of the idiopathic pulmonary fibrosis pattern

1. Qure.ai, Mumbai;  2. Jankharia Imaging Centre;  3. Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi;  4. Department of Pulmonology, Hinduja Hospital and Research Centre, Mumbai, India

Presented: 02 March 2018, European Congress of Radiology (ECR)

Purpose

Radiologists are currently ill-equipped to precisely estimate disease burden and track the progression of idiopathic pulmonary fibrosis (IPF). Development of an automated method for IPF segmentation is challenging due to the complexity of the fibrosis pattern and the degree of variation between patients. Deep neural networks are machine learning algorithms that can overcome these challenges. We describe the development and validation of a novel deep learning method to quantify the IPF pattern.

Methods

We used high-resolution chest CT scans from 23 patients with IPF as training data. The fibrosis pattern was marked out on 60 slices per scan. The annotated scans, together with 6 additional normal scans, were used to train a convolutional neural network to outline the IPF disease pattern. Segmentation accuracy was measured using the Dice score, and for each patient the percentage of lung affected by IPF was calculated. An independent set of 50 scans was used for clinical validation: disease volume was independently estimated by 2 thoracic radiologists blinded to the algorithm's estimate, and algorithm-derived estimates were correlated with the radiologist estimates of disease volume.
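
For reference, the Dice score used above compares two binary masks; a minimal sketch:

```python
# Dice = 2|A ∩ B| / (|A| + |B|) for predicted and annotated masks.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6):
    """Overlap score between two binary masks, in [0, 1]."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# The per-patient disease burden then follows as
#   100 * fibrosis_mask.sum() / lung_mask.sum()
```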

Results

A 3-dimensional neural network architecture coupled with 2-dimensional post-processing of each slice produced the most accurate segmentation, with a Dice score of 0.77. The correlation between algorithm-derived disease volume estimate and average radiologist estimates was 0.92. Inter-radiologist correlation was 0.89. Radiologist estimates of disease volume varied by 5.5% (range 0-15%).

Conclusion

We demonstrate that a deep neural network, trained using expert-annotated images, can accurately quantify the percentage of lung volume affected by IPF.

Watch recorded presentation at ECR (sign-up required)

Automated detection and localisation of skull fractures from CT scans using deep learning

1. Qure.ai, Mumbai;  2. CT and MRI center, Nagpur

Presented: 02 March 2018, European Congress of Radiology (ECR)

Purpose

To develop and validate a deep learning-based algorithm pipeline for fast detection and localisation of skull fractures from non-contrast CT scans. All kinds of skull fractures (undisplaced, depressed, comminuted, etc) were included in the study.

Methods

An anonymized and annotated dataset of 350 scans (11,750 slices) with skull fractures was used to generate candidate proposals for fractures. A stacked network pipeline was used for candidate generation: a fully convolutional network (U-Net) for ROI generation followed by a deep convolutional network for ROI classification. The final ROI classification model (ResNet18) yielded fracture probabilities for the candidates generated by the fully convolutional network. A separate deep learning model was trained to detect haemorrhages at the scan level, used as a proxy for clinical information. Fracture candidate features (size, probability and depth of the 5 most probable fracture candidates), combined with the haemorrhage model's confidence p_haemorrhage, were used to train a random forest classifier that detects fractures at the scan level. When a fracture was predicted, the most probable candidate(s) were used for localization.
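
A hedged sketch of how the scan-level feature vector described above might be assembled; the field names and the zero-padding for scans with fewer than 5 candidates are assumptions.

```python
# Size, probability and depth of the 5 most probable fracture
# candidates, plus the haemorrhage model's confidence p_haemorrhage,
# concatenated into one feature vector for a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def scan_features(candidates, p_haemorrhage, top_k=5):
    """candidates: list of dicts with 'prob', 'size', 'depth' per ROI."""
    top = sorted(candidates, key=lambda c: c["prob"], reverse=True)[:top_k]
    feats = []
    for c in top:
        feats += [c["prob"], c["size"], c["depth"]]
    feats += [0.0] * (3 * top_k - len(feats))  # pad scans with < k candidates
    feats.append(p_haemorrhage)
    return np.array(feats)

# clf = RandomForestClassifier(n_estimators=300).fit(X_scans, y_fracture)
```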

Results

A separate set of 2971 scans, uniformly sampled from the database with no exclusion criteria, was used to test the scan-level decisions; 108 of these scans were reported as skull fracture cases. Area under the receiver operating characteristic curve (AUC-ROC) for scan-level fracture decisions was 0.83 with p_haemorrhage as a feature and 0.72 without it. Free-response receiver operating characteristic (FROC) analysis yielded a sensitivity of 0.9 at 2.85 false positives per scan. Prediction for each patient takes less than 30 s.

Conclusion

A deep learning-based pipeline can accurately detect and localize skull fractures, and can be used to triage patients for the presence of skull fractures.

Watch recorded presentation at ECR (sign-up required)

Variation in practice patterns and outcomes across United Network for Organ Sharing allocation regions

1. Department of Cardiovascular Medicine, Yale School of Medicine, New Haven, CT;  2. Qure.ai, Mumbai, India;  3. Center for Outcomes Research, Yale University School of Medicine New Haven, CT

Published: 22 January 2018, Clinical Cardiology

Background

The number of heart transplants performed is limited by organ availability and is managed by the United Network for Organ Sharing (UNOS). Efforts are underway to make organ disbursement more equitable as demand increases.

Hypothesis

Significant variation exists in contemporary patterns of care, wait times, and outcomes among patients undergoing heart transplantation across UNOS regions.

Methods

We identified adult patients undergoing first, single‐organ heart transplantation between January 2006 and December 2014 in the UNOS dataset and compared sociodemographic and clinical profiles, wait times, use of mechanical circulatory support (MCS), status at time of transplantation, and 1‐year survival across UNOS regions.

Results

We analyzed 17 096 patients undergoing heart transplantation. There were no differences in age, sex, renal function, or pulmonary vascular resistance across regions; however, there was 3‐fold variation in median wait time (range, 48–166 days) across UNOS regions. The proportion of patients undergoing transplantation with status 1A ranged from 36% to 79% across regions (P < 0.01), and the percentage of patients hospitalized at time of transplantation varied from 41% to 98%. There was also marked variation in MCS and inotrope utilization (28%–57% and 25%–58%, respectively; P < 0.001). Durable ventricular assist device implantation varied from 20% to 44% (P < 0.001), and intra‐aortic balloon pump utilization ranged from 4% to 18%.

Conclusions

Marked differences exist in patterns of care across UNOS regions that generally trend with differences in waitlist time. Novel policy initiatives are required to address disparities in access to allografts and ensure equitable and efficient allocation of organs.

Read full paper

Deep Neural Networks to Identify and Localize Intracerebral Hemorrhage and Midline Shift in CT Scans of Brain

1. Columbia Asia Radiology Group, Bengaluru;  2. Qure.ai, Mumbai

Presented: 26 November 2017, Radiological Society of North America (RSNA)

Purpose

CT scans of the brain are often the frontline investigation in acute conditions of the brain, particularly stroke. Treatment outcomes largely depend on quick and accurate interpretation of these scans. A vital feature illustrating the severity of damage is midline shift, which indicates raised pressure from intracerebral (IC) hemorrhage and can be fatal. Our study aims at designing a deep convolutional network for detection, fast segmentation and quantification of IC hemorrhage, and at devising an algorithm for midline shift measurement and identification of the cerebral hemisphere affected by the detected hemorrhage.

Methods

The anonymized and annotated dataset had 39 CT scans of the brain (16 with IC hemorrhage). A deep neural network was trained slice by slice to segment hemorrhage. The network has a fully convolutional encoder and decoder with skip connections in between for better localization. 26 scans (589 slices) were used for training and 13 scans (282 slices) for validation. Features extracted from each patient's complete IC hemorrhage segmentation output were used to train a decision tree for the final diagnosis. The ideal midline was drawn using the center of mass in the bone window and the anterior bone protrusion at the level of the foramen of Monro; this, along with asymmetry in tissue densities, gave the displaced midline and the midline shift. The affected hemisphere was identified using the displaced midline and the hemorrhage's center of mass. Accuracy was measured using the receiver operating characteristic (ROC) curve and the Dice score.
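
A minimal geometric sketch of the shift measurement, assuming the two ideal-midline landmarks and a displaced-midline point have already been detected; the landmark detection itself is not shown and all names are illustrative.

```python
# The ideal midline is the line through the skull's center of mass and
# the anterior bone protrusion; the shift is the perpendicular distance
# of a detected displaced-midline point from that line.
import numpy as np

def midline_shift_mm(com_xy, protrusion_xy, displaced_xy, mm_per_px):
    """Perpendicular distance (mm) from a displaced point to the ideal midline."""
    p = np.asarray(com_xy, dtype=float)          # skull center of mass
    q = np.asarray(protrusion_xy, dtype=float)   # anterior bone protrusion
    d = np.asarray(displaced_xy, dtype=float)    # displaced-midline point
    direction = (q - p) / np.linalg.norm(q - p)  # unit vector along midline
    offset = d - p
    perp = offset - offset.dot(direction) * direction
    return float(np.linalg.norm(perp)) * mm_per_px
```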

Results

A separate set of 100 scans was collected over 2 weeks and used for testing; 6 of them had hemorrhage. Sensitivity and specificity for the diagnosis of hemorrhage were 100% and 98.9%, respectively. ROC analysis revealed an area under the curve of 0.994. The model took 3 seconds on average to segment one CT scan. The mean Dice score was 0.988 over all test scans and 0.80 over the 6 scans with hemorrhage. Midline shift and the affected hemisphere were both identified with 100% accuracy.

Conclusion

In this work, we trained a deep convolutional network to detect and quantify IC hemorrhage in brain CT scans. We also measured midline shift and identified the affected hemisphere. The processing pipeline was fully automatic.

Clinical Relevance

Automated detection of hemorrhage and midline shift can help rapidly distinguish between ischemic and hemorrhagic strokes, enabling faster decision-making and treatment.

RSNA 2017 Program

Generating Heatmaps to Visualize the Evidence of Deep Learning Based Diagnosis of Chest X-Rays

1. Qure.ai, Mumbai;  2. Columbia Asia Radiology Group, Bengaluru

Presented: 26 November 2017, Radiological Society of North America (RSNA)

Purpose

For radiologists to develop confidence in a deep learning diagnostic algorithm, it is essential that the algorithm be able to visually demonstrate the evidence for the diagnosis or disease tag. We describe the development of a method that highlights the region(s) of a chest X-ray (CXR) responsible for a deep learning algorithm diagnosis.

Methods

Using 24,384 CXRs, we trained 18-layer deep residual convolutional neural networks to predict whether a chest X-ray was normal or abnormal, and to detect the presence of 'cardiomegaly', 'opacity', and 'pleural effusion' in a CXR. We then applied a method called prediction difference analysis for visualization and interpretation of the trained models. The contribution of each patch in the image is estimated as the degree by which the prediction changes if that patch is replaced with an average normal patch. This method was used to generate a relevance score for each pixel, which is then visualized as a heat map.
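
A minimal sketch of the patch-replacement idea described above; the sliding-window mechanics, patch size and stride here are assumptions, and the paper's exact procedure may differ.

```python
# Slide a patch over the image, replace it with the average normal
# patch, and record how much the model's probability drops: pixels
# under patches that matter get high relevance.
import numpy as np

def relevance_heatmap(image, predict_prob, mean_patch, patch=16, stride=8):
    """image: 2-D grayscale array; predict_prob: callable image -> probability."""
    h, w = image.shape
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    base = predict_prob(image)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            probe = image.copy()
            probe[y:y + patch, x:x + patch] = mean_patch
            diff = base - predict_prob(probe)   # evidence removed by the patch
            heat[y:y + patch, x:x + patch] += diff
            counts[y:y + patch, x:x + patch] += 1
    return heat / np.maximum(counts, 1)
```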

Results

We used a 60-20-20 split for the train, validation and test sets. The trained neural networks showed areas under the ROC curve of 0.89, 0.92, 0.84 and 0.91 for tagging abnormal, cardiomegaly, opacity and pleural effusion, respectively, on the test set. The visualization pipeline was used to generate heatmaps highlighting the enlarged heart, the opacities and the fluid corresponding to the cardiomegaly, opacity and pleural effusion tags.

Conclusion

We trained and tested a deep learning algorithm which accurately classifies and assigns clinically relevant tags to CXRs. Further, we applied a visualization method that generates heatmaps highlighting the most relevant parts of the CXR. The visualization method is broadly applicable to other kinds of X-rays, and to other deep learning algorithms. Future work will focus on formally validating the accuracy of the visualization, by measuring overlap between radiologist annotation and algorithm-generated heatmap.

Clinical Relevance

Heatmaps highlighting evidence for disease tags will provide clinical users with crucial visual cues that could ease their decision to accept or reject a deep learning based chest x-ray diagnosis.

RSNA 2017 Program

2D-3D Fully Convolutional Neural Networks for Cardiac MR Segmentation

1. Qure.ai, Mumbai

Published: 31 July 2017

Abstract

In this paper, we develop 2D and 3D segmentation pipelines for fully automated cardiac MR image segmentation using deep convolutional neural networks (CNNs). Our models are trained end-to-end from scratch on the ACDC (Automated Cardiac Diagnosis Challenge) 2017 dataset, comprising 100 studies, each containing cardiac MR images in the end-diastole and end-systole phases. We show that both our segmentation models achieve near state-of-the-art performance in terms of distance metrics and convincing accuracy in terms of clinical parameters. A comparative analysis is provided by introducing a novel Dice loss function and its combination with cross-entropy loss. By exploring different network structures through comprehensive experiments, we discuss several key insights for obtaining optimal model performance, which is also central to the theme of this challenge.
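
A minimal sketch of a soft Dice loss and its combination with cross-entropy, in the spirit of the abstract; this is the standard formulation, not necessarily the paper's novel variant, and the mixing weight is an assumption.

```python
# Soft Dice loss over class probabilities, plus a weighted combination
# with cross-entropy.
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """logits: (N, C, H, W); target: (N, H, W) integer class labels."""
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, probs.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)                      # sum over batch and space, per class
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

def combined_loss(logits, target, alpha=0.5):
    # alpha balances region overlap (Dice) against per-pixel accuracy (CE)
    return alpha * soft_dice_loss(logits, target) + \
           (1 - alpha) * F.cross_entropy(logits, target)
```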

Read full paper

Improving Boundary Classification for Brain Tumor Segmentation and Longitudinal Disease Progression

1. University of Southern California, Los Angeles, USA;  2. Qure.ai, Mumbai;  3. Dhristi Inc., Palo Alto, USA

Published: 12 April 2017

Abstract

Tracking the progression of brain tumors is a challenging task, due to the slow growth rate and the combination of different tumor components, such as cysts, enhancing patterns, edema and necrosis. In this paper, we propose a deep neural network-based architecture that performs automatic segmentation of brain tumors, focusing on improving accuracy at the edges of these different classes. We show that enhancing the loss function to give more weight to the edge pixels significantly improves the neural network's accuracy at classifying the boundaries. In the BRATS 2016 challenge, our submission placed third on the task of predicting progression for the complete tumor region.
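
A minimal sketch of one way to up-weight edge pixels in the loss, using a morphological gradient of the label map to find boundaries; the weighting scheme and the edge weight value are assumptions, not the paper's exact formulation.

```python
# Build a per-pixel weight map that up-weights class-boundary pixels
# and use it in a weighted cross-entropy.
import torch
import torch.nn.functional as F

def edge_weight_map(target: torch.Tensor, edge_weight: float = 5.0):
    """target: (N, H, W) integer labels -> (N, H, W) pixel weights."""
    t = target.float().unsqueeze(1)
    # morphological gradient: dilation minus erosion marks class boundaries
    dil = F.max_pool2d(t, 3, stride=1, padding=1)
    ero = -F.max_pool2d(-t, 3, stride=1, padding=1)
    edges = (dil != ero).squeeze(1).float()
    return 1.0 + (edge_weight - 1.0) * edges

def edge_weighted_ce(logits, target):
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (per_pixel * edge_weight_map(target)).mean()
```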

Read full paper

A Deep-Learning Based Approach for Ischemic Stroke Lesion Outcome Prediction

1. University of Southern California, Los Angeles, USA;  2. Qure.ai, Mumbai;  3. Dhristi Inc., Palo Alto, USA

Published: 17 October 2016, ISLES 2016 proceedings

Abstract

The ISLES 2016 challenge aims to address two important aspects of ischemic stroke lesion treatment prediction. The first aspect relates to segmenting the brain MRI to identify the areas with lesions, and the second relates to predicting the actual clinical outcome in terms of the patient's degree of disability. The input data consist of acute MRI scans and additional clinical information such as TICI scores, time since stroke, and time to treatment. To address this challenge we take a deep-learning based approach. In particular, we first focus on the segmentation task and use an automatic segmentation model that consists of a deep neural network (DNN). The DNN takes as input the MRI images and outputs the segmented image, automatically learning the latent underlying features during the training process. The DNN architectures we consider utilize many convolutional layers with small kernels, e.g., 3x3. This approach requires fewer parameters to estimate, and allows one to learn and generalize from the somewhat limited amount of data that is provided. One of the architectures we are currently utilizing is based on the U-Net [1], which is an all-convolutional network. It acts as an auto-encoder that first "encodes" the input image by applying combinations of convolutional and pooling operations. This is followed by the "decoding" step that up-scales the encoded images while performing convolutions. The all-convolutional architecture of the U-Net allows it to handle input images of different dimensions, as in the challenge dataset. In our experiments, we found that this architecture yielded excellent performance on the previous ISLES 2015 dataset. Although the modalities in the 2016 challenge are different, our initial training experiments have yielded promising segmentation results. Our next steps involve addressing the regression challenge. There is a limited amount of labeled data for this task. Our approach will be to include these outcomes as part of the segmentation training directly. This will allow the DNN to learn latent features that can directly help with the classification task.
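
A minimal sketch of the U-Net-style encoder-decoder described above; the depth, channel widths and layer choices are illustrative, not the submission's actual configuration.

```python
# Tiny U-Net: convolution + pooling on the way down, upsampling +
# convolution on the way up, with skip connections between matching
# resolutions. Input height/width must be divisible by 4 at this depth.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, w=32):
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, w), block(w, 2 * w)
        self.pool = nn.MaxPool2d(2)
        self.mid = block(2 * w, 4 * w)
        self.up2 = nn.ConvTranspose2d(4 * w, 2 * w, 2, stride=2)
        self.dec2 = block(4 * w, 2 * w)
        self.up1 = nn.ConvTranspose2d(2 * w, w, 2, stride=2)
        self.dec1 = block(2 * w, w)
        self.head = nn.Conv2d(w, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                  # encode at full resolution
        e2 = self.enc2(self.pool(e1))      # encode at 1/2 resolution
        m = self.mid(self.pool(e2))        # bottleneck at 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(m), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)               # per-pixel class logits
```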

Read full paper