Automated classification of X-rays as normal/abnormal using a high-sensitivity deep learning algorithm

Purpose

The majority of chest X-rays (CXRs) performed globally are normal, yet radiologists spend significant time ruling out these scans. We present a deep learning (DL) model trained specifically to classify CXRs as normal or abnormal, potentially reducing the time and cost associated with reporting normal studies.

Methods

A DL algorithm was developed using 1,150,084 CXRs and their corresponding reports. A retrospectively acquired independent test set of 430 CXRs (285 abnormal, 145 normal) was analyzed by the algorithm, which classified each X-ray as normal or abnormal. Ground truth for the independent test set was established by a sub-specialist chest radiologist with 8 years’ experience, who reviewed every chest X-ray image with reference to the existing report. Algorithm output was compared against ground truth and summary statistics were calculated.

Results

The algorithm correctly classified 376 (87.44%) CXRs, with a sensitivity of 97.19% (95% CI 94.54% to 98.78%) and a specificity of 68.28% (95% CI 60.04% to 75.75%). There were 46 (10.70%) false positives and 8 (1.86%) false negatives (FNs). Of the 8 FNs, 3 were designated as clinically insignificant (mild, inactive fibrosis) and 5 as significant (rib fractures, pneumothorax).
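
The reported summary statistics follow directly from the stated confusion matrix (285 abnormal scans with 8 FNs, 145 normal scans with 46 FPs); a minimal sketch, with an illustrative function name not taken from the study:

```python
# Reconstructing the reported summary statistics from the stated confusion
# matrix: 285 abnormal scans with 8 FNs, 145 normal scans with 46 FPs.
def summary_stats(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100 * (tp + tn) / total
    sensitivity = 100 * tp / (tp + fn)   # recall on abnormal scans
    specificity = 100 * tn / (tn + fp)   # recall on normal scans
    return round(accuracy, 2), round(sensitivity, 2), round(specificity, 2)

# 285 abnormal with 8 FNs -> TP = 277; 145 normal with 46 FPs -> TN = 99
acc, sens, spec = summary_stats(tp=277, fn=8, tn=99, fp=46)
print(acc, sens, spec)  # 87.44 97.19 68.28
```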

Conclusion

High-sensitivity DL algorithms can potentially be deployed for a primary read of CXRs, enabling radiologists to spend appropriate time on abnormal cases and thereby saving time and cost of reporting CXRs, especially in non-emergency situations. More in-depth prospective trials are required to ascertain the overall impact of such algorithms.

Automated Detection and Localization of Pneumocephalus in Head CT Scans

Purpose

Pneumocephalus, the accumulation of air in the intracranial space, can lead to midline shift and compression of the brain. In this work, we detail the development of deep learning algorithms for automated detection and localization of pneumocephalus in head CT scans.

Methods

First, to localize the intracranial space in a given head CT scan, a skull-stripping algorithm was developed using a randomly sampled anonymized dataset of 78 head CT scans (1608 slices). We then sampled a second anonymized dataset containing 83 head CT scans (3546 slices) with pneumocephalus and 310 normal head CT scans, randomly sampled to represent the natural distribution. These 3546 slices (932 of which had pneumocephalus) were annotated for pneumocephalus regions. A U-Net-based deep neural network was then trained on these scans to accurately predict the pneumocephalus region. The predicted pneumocephalus region was refined by removing regions outside the intracranial space identified by the skull-stripping algorithm, and features were extracted from the refined region. Using these features, a random forest was trained to classify the presence of pneumocephalus in a scan. Areas under receiver operating characteristic curves (AUC) were used to evaluate the algorithms.
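
The refinement and feature-extraction steps can be sketched as follows; representing masks as coordinate sets and the particular features (volume, affected slices, maximum slice area) are illustrative assumptions, not the study's implementation:

```python
# Hedged sketch: predicted pneumocephalus voxels are kept only if they fall
# inside the intracranial mask from the skull-stripping model, and simple
# features are then derived for the random forest. Masks are shown here as
# sets of (slice, row, col) voxel coordinates; the study used image volumes.
def refine_and_featurize(predicted_mask, intracranial_mask, n_slices):
    refined = predicted_mask & intracranial_mask        # drop extracranial hits
    per_slice = [0] * n_slices
    for z, _, _ in refined:
        per_slice[z] += 1                               # lesion area per slice
    return {"volume": sum(per_slice),
            "affected_slices": sum(1 for a in per_slice if a > 0),
            "max_slice_area": max(per_slice)}

pred = {(0, 5, 5), (0, 6, 5), (1, 9, 9), (2, 0, 0)}     # toy prediction
skull = {(0, 5, 5), (0, 6, 5), (1, 9, 9)}               # toy intracranial mask
print(refine_and_featurize(pred, skull, n_slices=3))
# {'volume': 3, 'affected_slices': 2, 'max_slice_area': 2}
```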

Results

An independent dataset of 1891 head CT scans (40 with pneumocephalus) was used to test the above algorithms. AUC for the scan-level predictions was 0.89; sensitivity and specificity were 0.80 and 0.83, respectively.

Conclusion

In this work, we showed the efficacy of deep learning algorithms in accurately localizing and classifying pneumocephalus in head CT scans.

Deep Learning for Infarct Detection and Localization from Head CT Scans

Purpose

The purpose of this study was to use a deep learning algorithm to detect and localize subacute and chronic ischemic infarcts on head CT scans for use in automated volumetric progression tracking.

Methods

We sampled 308 head CT scans (11840 slices) that were reported with chronic or subacute infarct. The infarcted regions in the 11840 infarct-positive slices were marked. We trained a segmentation algorithm to predict a heatmap of infarct lesions. The heatmap was used to derive scan-level features representative of lesion density and volume, which were used to train a random forest predicting scan-level probabilities of chronic infarct. Area under the receiver operating characteristic curve (AUC) was used to evaluate scan-level predictions.
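
A minimal sketch of deriving lesion-density and lesion-volume features from a per-slice heatmap; the threshold, voxel volume and feature names here are assumptions, not taken from the study:

```python
# Illustrative featurization of a segmentation heatmap: voxels above a
# confidence threshold contribute to a volume proxy, and their mean
# probability serves as a density feature for the scan-level classifier.
def scan_features(heatmap, voxel_ml, thresh=0.5):
    """heatmap: list of per-slice probability lists -> feature dict."""
    lesion = [p for sl in heatmap for p in sl if p >= thresh]
    volume_ml = len(lesion) * voxel_ml                      # lesion volume
    density = sum(lesion) / len(lesion) if lesion else 0.0  # mean confidence
    return {"volume_ml": round(volume_ml, 3), "mean_prob": round(density, 3)}

hm = [[0.9, 0.2], [0.7, 0.6], [0.1, 0.1]]   # toy 3-slice heatmap
print(scan_features(hm, voxel_ml=0.5))
# {'volume_ml': 1.5, 'mean_prob': 0.733}
```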

Results

The algorithm was validated on an independent dataset of 1610 head CT scans containing 78 chronic and 9 subacute infarcts, 45 chronic ICHs, and 6 glioblastomas. The distribution of infarct-affected territories was 52.9% MCA, 33.3% PCA, 9.3% ACA and 4.7% vertebrobasilar. The algorithm yielded an AUC of 0.8474 (95% CI 0.7964–0.8984) for scan-level predictions. It identified 8 of 9 subacute infarcts (88.89% recall) and 70 of 78 chronic infarcts (89.74% recall). The eight missed chronic infarcts included 3 lacunar and 2 hemorrhagic infarcts. The volumes of predicted infarct lesions ranged from 1 mL to 526 mL, with a mean predicted volume of 55.60 mL.

Conclusion

The study demonstrates the capability of deep learning algorithms to accurately differentiate infarcts from infarct mimics.

Automated Detection of Midline Shift and Mass Effect from Head CT Scans using Deep Learning

Purpose

Mass effect and midline shift are among the most critical and time-sensitive abnormalities that can be readily detected on head CT scans. We describe the development and validation of deep learning algorithms to automatically detect these abnormalities.

Methods

We labeled slices from 699 anonymized non-contrast head CT scans for the presence or absence of mass effect and midline shift in each slice. The number of scans (slices) with mass effect was 320 (3143) and with midline shift 249 (2074). We used these labels to train a modified ResNet18, a popular convolutional neural network, to predict softmax-based confidences for the presence of mass effect and midline shift in a slice. We modified the network by using two parallel fully connected (FC) layers in place of a single FC layer. The slice-level confidences were combined using a random forest to predict the scan-level confidence for the presence of mass effect and midline shift. A separate dataset (the CQ500 dataset) was collected to validate the algorithm. Three senior radiologists independently read each scan in this dataset, and the consensus of the readers’ opinions was used as the gold standard. We used areas under receiver operating characteristic curves (AUC) to evaluate the algorithm.
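
One common way to turn a variable number of slice-level confidences into a fixed-length feature vector for a scan-level random forest is to keep the top-k scores plus a summary statistic; the study does not specify its exact features, so this is a hedged sketch:

```python
# Hedged sketch of slice-to-scan aggregation: scans have varying slice
# counts, so the top-k slice confidences (padded for short scans) plus the
# overall mean give a fixed-length feature vector for the random forest.
def slice_to_scan_features(confidences, k=5):
    top = sorted(confidences, reverse=True)[:k]
    top += [0.0] * (k - len(top))                  # pad short scans with zeros
    mean = sum(confidences) / len(confidences)
    return top + [mean]                            # k top scores + overall mean

feats = slice_to_scan_features([0.25, 0.75, 0.5, 0.5], k=3)
print(feats)  # [0.75, 0.5, 0.5, 0.5]
```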

Results

The CQ500 dataset contained 491 scans, of which 99 had mass effect and 47 had midline shift. AUC for detecting mass effect was 0.92 (95% CI 0.89–0.95) and for detecting midline shift was 0.97 (95% CI 0.94–0.99).

Conclusion

We show that a deep learning algorithm can be trained to accurately detect mass effect and midline shift from head CT scans.

Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

Purpose

To validate a set of deep learning algorithms for automated detection of key findings from non-contrast head CT scans: intracranial hemorrhage and its subtypes, calvarial fractures, midline shift and mass effect.

Methods

We retrospectively collected a dataset containing 313,318 head CT scans, of which a random subset (the Qure25k dataset) was used to validate the algorithms and the rest to develop them. An additional dataset (the CQ500 dataset) was collected from different centers to further validate the algorithms. Patients with post-operative defects or age < 7 years were excluded from all datasets. Three independent radiologists read each scan in the CQ500 dataset. The original clinical radiology report and the consensus of the readers were considered the gold standards for the Qure25k and CQ500 datasets, respectively. Areas under receiver operating characteristic curves (AUCs) were used to evaluate the algorithms.

Results

After exclusion, the Qure25k dataset contained 21,095 scans (mean age 43; 43% female) while the CQ500 dataset consisted of 491 scans (mean age 48; 36% female). On the Qure25k dataset, the algorithms achieved an AUC of 0.92 for detecting intracranial hemorrhage (0.90 intraparenchymal, 0.96 intraventricular, 0.92 subdural, 0.93 extradural, and 0.90 subarachnoid). On the CQ500 dataset, the AUC was 0.94 for intracranial hemorrhage (0.95, 0.93, 0.95, 0.97, and 0.96, respectively). AUCs on the Qure25k dataset were 0.92 for calvarial fractures, 0.93 for midline shift, and 0.86 for mass effect, while AUCs on the CQ500 dataset were 0.96, 0.97 and 0.92, respectively.

Conclusion

This study demonstrates that deep learning algorithms can identify head-CT scan abnormalities requiring urgent attention with high AUCs.

Prospective evaluation of a deep learning algorithm deployed in an urban imaging centre to notify clinicians of head CT scans with critical abnormalities

Purpose

Non-contrast head CT scans are the primary imaging modality for evaluating patients with trauma or stroke. While results of deep learning algorithms that identify head CT scans containing critical abnormalities have been published in retrospective studies, the effects of deploying such an algorithm in a real-world setting, with mobile notifications to clinicians, remain unstudied. In this prospective study, we evaluated the performance of such an automated triage system in an urban 24-hour imaging facility.

Methods

We developed a deep neural network algorithm that identifies and localizes intracranial bleeds, cranial fractures, mass effect and midline shift on non-contrast head CT scans. The algorithm was deployed in a clinical imaging facility in conjunction with an on-premise module that automatically selects eligible scans from the PACS and uploads them to the cloud-based algorithm for processing. Once processing is complete, the cloud algorithm returns an additional series, viewable as an overlay over the original, and sends a text notification with preview images to the radiologist. Mobile notifications facilitated confirmation of the detected abnormalities. We studied the performance of the automated system over 60 days.

Results

748 CT scans were acquired over 60 days, of which 194 were non-contrast head CT scans; these were evaluated by a senior radiologist. Sensitivity, specificity, AUC and average time to notification for head CT scans with critical abnormalities were 0.90 (95% CI 0.74–0.98), 0.86 (0.80–0.91), 0.97 (0.92–1.00) and 3.2 minutes, respectively.

Conclusion

An automated triage system in a radiology facility results in rapid notification of critical scans with a low false positive rate, and may be used to expedite treatment initiation.

Efficacy of deep learning for screening pulmonary tuberculosis

Purpose

Chest X-rays (CXRs), being highly sensitive, serve as a screening tool in TB diagnosis. Though there are no classical features diagnostic of TB on CXR, there are a few patterns that can be used as supportive evidence. In resource-limited settings, developing deep learning algorithms for CXR-based TB screening could reduce diagnostic delay. Our algorithm screens for 8 abnormal patterns (TB tags): pleural effusion, blunted CP angle, atelectasis, fibrosis, opacity, nodules, calcification and cavity. It reports ‘No Abnormality Detected’ if none of these patterns are present on the CXR.

Methods

An anonymized dataset of 423,218 CXRs with matched radiologist reports (from 166 centres in India, spanning 22 X-ray machine models from 9 manufacturers) was used to generate training data for the deep learning models. Natural language processing techniques were used to extract TB tags from these reports. Deep learning systems were trained to predict the probability of the presence or absence of each TB tag, along with heatmaps that highlight abnormal regions in the CXR for each positive result.
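
Rule-based extraction with simple negation handling is one way such tags can be pulled from report text; the keywords and negation pattern below are illustrative assumptions, as the study's actual NLP pipeline is not described:

```python
# Hedged sketch of rule-based tag extraction from free-text reports: split
# into sentences, match tag keywords, and suppress matches in negated
# sentences. Keyword list and negation terms are illustrative only.
import re

TB_TAGS = ["pleural effusion", "atelectasis", "fibrosis", "opacity",
           "nodule", "calcification", "cavity"]
NEGATIONS = re.compile(r"\b(no|without|absent|resolved)\b")

def extract_tags(report):
    tags = set()
    for sentence in re.split(r"[.;]", report.lower()):
        for tag in TB_TAGS:
            if tag in sentence and not NEGATIONS.search(sentence):
                tags.add(tag)
    return tags

print(extract_tags("Fibrosis in right upper zone. No pleural effusion."))
# {'fibrosis'}
```

A production pipeline would also need to handle negation scope, uncertainty phrases and synonyms, which this sketch deliberately omits.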

Results

We validated the screening algorithm on 3 datasets external to our training set: two public datasets maintained by the NIH (from Montgomery and Shenzhen) and a third from NIRT, India. The area under the receiver operating characteristic curve (AUC-ROC) for TB prediction was 0.91, 0.87 and 0.83, respectively.

Conclusion

Training on a diversified dataset enabled good performance on samples from completely different demographics. After further validation of its robustness against variation, the system can be deployed at scale to significantly improve current systems for TB screening.

Automated detection of intra- and extra-axial haemorrhages on CT brain images using deep neural networks

Purpose

To develop and validate a deep neural network-based algorithm for automated, rapid and accurate detection of the following haemorrhages on head CT: intracerebral (ICH), subdural (SDH), extradural (EDH) and subarachnoid (SAH).

Methods

An anonymised database of head CTs was searched for non-contrast scans reported with any of ICH, SDH, EDH or SAH, and for scans reported with none of these. Each slice of these scans was manually tagged with the haemorrhages visible in that slice. In all, 3040 scans (116227 slices) were annotated, of which the numbers of scans (slices) with ICH, SDH, EDH, SAH and none of these were 781 (6957), 493 (6593), 742 (6880), 561 (5609) and 944 (92999), respectively. Our deep learning model is a modified ResNet18 with 4 parallel final fully connected layers, one for each haemorrhage. This model was trained on the slices from the annotated dataset to make slice-level decisions. Random forests were trained, with the ResNet’s softmax outputs for all the slices in a scan as features, to make scan-level decisions.
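
The "parallel final fully connected layers" design amounts to one shared feature vector feeding four independent 2-way softmax heads; a toy sketch with hand-picked weights (the real model learns these end to end):

```python
# Hedged sketch of parallel classification heads: a shared feature vector is
# passed through one independent linear layer + 2-way softmax per haemorrhage
# type. Weights and features here are toy values, not trained parameters.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def parallel_heads(features, heads):
    """heads: {name: (weight_rows, biases)} -> {name: P(present)}."""
    out = {}
    for name, (w, b) in heads.items():
        logits = [sum(wi * f for wi, f in zip(row, features)) + bi
                  for row, bi in zip(w, b)]
        out[name] = softmax(logits)[1]        # probability of "present" class
    return out

feats = [0.5, -1.0]                            # toy shared feature vector
heads = {"ICH": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
         "SDH": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])}
probs = parallel_heads(feats, heads)
print(round(probs["SDH"], 2))  # 0.5  (all-zero head gives equal logits)
```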

Results

A different set of 2993 scans, uniformly sampled from the database without any exclusion criterion, was used to test the scan-level decisions. The numbers of scans with ICH, SDH, EDH and SAH in this set were 123, 58, 41 and 62, respectively. Areas under the receiver operating characteristic curve (AUC) for scan-level decisions for ICH, SDH, EDH and SAH were 0.91, 0.90, 0.90 and 0.90, respectively. The algorithm takes less than 1 s to produce a decision for a scan.

Conclusion

Deep learning can accurately detect intra- and extra-axial haemorrhages from head CTs.

Identifying pulmonary consolidation in chest X-rays using deep learning

Purpose

Chest X-rays are widely used to identify pulmonary consolidation because they are highly accessible, cheap and sensitive. Automating this diagnosis can reduce diagnostic delay, especially in resource-limited settings.

Methods

An anonymised dataset of 423,218 chest X-rays with corresponding reports (collected from 166 centres across India, spanning 22 X-ray machine variants from 9 manufacturers) was used for training and validation. X-rays with consolidation were identified from their reports using natural language processing techniques. Images were preprocessed to a standard size and normalised to remove source dependency. Deep residual neural networks were trained on these images. Multiple models were trained on various selective subsets of the dataset, along with one model trained on the entire dataset. The scores yielded by each of these models were passed through a 2-layer neural network to generate the final probability of the presence of consolidation in an X-ray.
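
The final ensembling step can be sketched as a small 2-layer network over the per-model scores; the weights and layer sizes below are toy values chosen by hand, not the trained parameters:

```python
# Hedged sketch of the 2-layer ensembling network: the consolidation scores
# from the sub-models pass through a ReLU hidden layer and a sigmoid output
# to give one final probability. All weights here are toy values.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ensemble(scores, w1, b1, w2, b2):
    hidden = [max(0.0, sum(w * s for w, s in zip(row, scores)) + b)
              for row, b in zip(w1, b1)]          # ReLU hidden layer
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)                          # final P(consolidation)

scores = [0.8, 0.7, 0.9]                           # outputs of the sub-models
p = ensemble(scores, w1=[[1, 1, 1], [1, -1, 0]], b1=[0.0, 0.0],
             w2=[1.0, -1.0], b2=-1.0)
print(0.0 < p < 1.0)  # True
```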

Results

The model was validated and tested on a dataset uniformly sampled from the parent dataset without any exclusion criteria. Sensitivity and specificity for the tag were 0.81 and 0.80, respectively. The area under the receiver operating characteristic curve (AUC-ROC) was 0.88.

Conclusion

Deep learning can be used to diagnose pulmonary consolidation in chest X-rays with models trained on a generalised dataset containing samples from multiple demographics. This model performs better than a model trained on a controlled dataset and is suited to real-world settings where X-ray quality may not be consistent.

Automatic detection of generalised cerebral atrophy using deep neural networks from head CT scans

Purpose

Features of generalised cerebral atrophy on brain CT images are a marker of neurodegenerative diseases of the brain. Our study aims at automated diagnosis of generalised cerebral atrophy on brain CT images using deep neural networks, thereby offering an objective early diagnosis.

Methods

An anonymised dataset containing 78 head CT scans (1608 slices) was used to train and validate a skull-stripping algorithm. The intracranial region was marked out slice by slice in each scan, and a U-Net-based deep neural network was trained on these annotations to strip the skull from each slice. A second anonymised dataset containing 2189 CT scans (231 scans with atrophy) was used to train and validate an atrophy detection algorithm. First, an image registration technique was applied to the predicted intracranial region to align all scans to a standard head CT scan. The parenchymal and CSF volumes were calculated by thresholding Hounsfield units within the intracranial region. The ratio of CSF volume to parenchymal volume from each slice of the aligned CT scan, together with the age of the patient, were used as features to train a random forest algorithm that decides whether the scan shows generalised cerebral atrophy.
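
The CSF-to-parenchyma ratio feature can be sketched as Hounsfield unit (HU) thresholding inside the intracranial mask; the HU windows below are typical textbook ranges and are assumptions, not necessarily the study's thresholds:

```python
# Hedged sketch of the per-slice CSF/parenchyma ratio: intracranial voxels
# are counted by HU window (CSF roughly 0-15 HU, parenchyma roughly 20-45 HU;
# illustrative ranges) and the ratio serves as an atrophy feature.
def csf_parenchyma_ratio(hu_values):
    """hu_values: HU of intracranial voxels in one slice -> CSF/parenchyma."""
    csf = sum(1 for v in hu_values if 0 <= v <= 15)          # fluid density
    parenchyma = sum(1 for v in hu_values if 20 <= v <= 45)  # brain tissue
    return csf / parenchyma if parenchyma else float("inf")

slice_hu = [5, 10, 30, 35, 40, 28, 12, 33]   # toy intracranial voxel HUs
print(round(csf_parenchyma_ratio(slice_hu), 2))  # 0.6
```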

Results

An independent set of 3000 head CT scans (347 with atrophy) was used to test the algorithm. Area under the receiver operating characteristic curve (AUC) for scan-level decisions is 0.86. Prediction for each patient takes less than 45 s.

Conclusion

Deep convolutional networks can accurately detect generalised cerebral atrophy on head CT scans.