Deep-ASPECTS: A Segmentation-Assisted Model for Stroke Severity Measurement

Abstract

A stroke occurs when an artery in the brain ruptures and bleeds or when the blood supply to the brain is cut off. Because of the rupture or obstruction, blood and oxygen cannot reach the brain's tissues, resulting in tissue death. The middle cerebral artery (MCA) is the largest cerebral artery and the most commonly damaged vessel in stroke. The rapid onset of a focal neurological deficit caused by interruption of blood flow in the territory supplied by the MCA is known as an MCA stroke. The Alberta Stroke Program Early CT Score (ASPECTS) is used to estimate the extent of early ischemic changes in patients with MCA stroke. This study proposes a deep learning-based method to score CT scans for ASPECTS. Our work has three highlights. First, we propose a novel medical image segmentation method for stroke detection. Second, we show the effectiveness of an AI solution for fully automated ASPECTS scoring with reduced diagnosis time for a given non-contrast CT (NCCT) scan. Our algorithms achieve a Dice similarity coefficient of 0.64 for MCA anatomy segmentation and 0.72 for infarct segmentation. Lastly, we show that our model's performance is in line with the inter-reader variability between radiologists.
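The Dice similarity coefficient reported above is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference segmentation mask. A minimal NumPy sketch (the toy masks are illustrative, not study data):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection) / (pred.sum() + target.sum() + eps)

# Toy 1-D example: 3 overlapping positives out of 4 predicted and 4 true
pred = np.array([1, 1, 1, 1, 0, 0])
true = np.array([0, 1, 1, 1, 1, 0])
print(round(dice_coefficient(pred, true), 2))  # 0.75
```

A Dice of 1.0 means perfect overlap and 0.0 means no overlap, which is why 0.64 for fine MCA anatomy and 0.72 for infarcts represent moderate agreement with the reference annotations.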

The Early Diagnostic Value of Chest X-Ray Scanning with the Help of Artificial Intelligence in Heart Failure (ART-IN-HF): The First Outcomes

BACKGROUND

A rapid and accurate diagnosis of heart failure (HF) is of utmost importance to decrease mortality and reduce medical expenditure. Our project aims to identify potential HF patients by detecting cardiomegaly and pleural effusion in their chest X-rays with the help of artificial intelligence (AI).

METHODS

Using AI, we scanned the anonymized chest X-rays of patients who were at least 45 years old and had presented to any department of the hospital except cardiology, cardiovascular surgery, and the emergency department. Patients in whom AI detected both cardiomegaly and pleural effusion in their X-rays were invited to our cardiology clinic for further, definitive HF diagnostic tests.

RESULTS

5623 subjects were scanned, and 119 of them (2.1%) had both cardiomegaly and pleural effusion. We reached 57 of the 119 patients and diagnosed HF in 49 of the 57 (86%) according to the 2021 ESC HF guidelines. The mean left ventricular EF was 42±13%, the median NT-proBNP level was 4218 pg/ml (Q1: 1947 pg/ml; Q3: 10674 pg/ml), and the mean age of the HF patients was 70±10 years.

CONCLUSION

The ART-IN-HF project is the first initiative to use chest X-ray scanning for the early diagnosis of HF by detecting both cardiomegaly and pleural effusion in chest X-rays with the aid of AI. Most of the patients who had both cardiomegaly and pleural effusion were diagnosed with HF. AI might be useful for detecting HF via chest X-ray scanning in undiagnosed HF patients.

Deep Learning-Based Software Tools for Tuberculosis Detection in Chest X-Ray Images

Abstract

Deep learning is a rising phenomenon in data analysis, counted among the top ten innovative technologies. It allows multi-layered computational models to learn representations of data with multiple levels of abstraction, and it currently provides the best solutions for visual object recognition, including tuberculosis (TB) detection in X-ray images. Given the shortage of qualified radiologists, these new technologies increase the capacity to improve overall TB diagnosis and treatment. This paper reviews deep learning-based software tools, such as CAD4TB, qXR, and Lunit INSIGHT, used for detecting chest X-ray (CXR) abnormalities. The CAD4TB software highlights the abnormal region as a heat-map image; qXR identifies 15 abnormalities in abnormal chest X-ray images; and Lunit INSIGHT CXR identifies ten abnormalities in chest radiographs. Deep learning-based computer-aided detection (CAD) systems used in real diagnostic services must be validated by a human observer. The accuracy of CAD software is analysed using the receiver operating characteristic (ROC) curve. Deep learning-based systems in medicine must meet ethical standards and have their results validated.

Segmentation and Measurement of Ventricular and Cranial Vault Volumes in 15,223 Subjects Using Artificial Intelligence

PURPOSE

True volumetric measurement of the cerebral ventricles could potentially improve the diagnosis and follow-up of hydrocephalus. However, manual image segmentation is too laborious a process to apply routinely. In this work, we utilize deep learning algorithms to provide an automated solution for precise ventricular segmentation and volume quantification in a large dataset.

METHOD AND MATERIALS

A large database of head computed tomographic (CT) scans was utilized to train and validate convolutional neural network models. Lateral ventricles were manually annotated in 103 scans, which were randomly split with a training-validation ratio of 4:1. One U-Net model was trained to segment the lateral ventricles in each slice, and another to segment the cranial vault. Each model was validated against the manually annotated images using the Dice index. Both networks were then used to segment and quantify the ventricular and cranial vault volumes of a large random sample from our database.

RESULTS

Both the ventricular and the cranial vault U-Net models showed high fidelity to the manual annotation (Dice 0.909 and 0.983, respectively). They were then applied to a subset of 15,223 head CT scans from our database. Among these scans, 1,999 (13.1%) had a radiological report of cerebral atrophy and 1,404 (9.2%) of hydrocephalus. The median age was 40 years, and 41.5% were female. Patients reported to have cerebral atrophy were confirmed by the U-Nets to have larger ventricles. Cranial vault volume increased until the 10-20 age group, after which a plateau was observed. Conversely, ventricular volume increased with age without a plateau, evidently due to cerebral atrophy. Males had 13% larger cranial vault volumes than females. The ventricle-to-vault ratio showed no difference between the sexes at younger ages, but increased more steeply with age in men than in women.

CONCLUSION

This is the first study to measure ventricular volumes in such a large dataset, which has been made possible using artificial intelligence. Here we provide a robust method to establish normal values for ventricular volumes and a tool to routinely report these volumes on CT scans and evaluate for hydrocephalus. 

CLINICAL RELEVANCE/APPLICATION

Deep learning tools can provide ventricular volume measurements on head CTs efficiently and with high fidelity, and can assist in the diagnosis of hydrocephalus. 

Clinical Context Improves the Performance of AI Models for Cranial Fracture Detection

PURPOSE

Clinical history plays a vital role in a physician's or radiologist's diagnosis. However, when training AI models, clinical history or the presence of an abnormality that correlates with the target abnormality is not generally considered. In this study, we use scalp hematoma as additional clinical context when training the models and study the accuracy (AUC and average precision) of a fracture detection AI model before and after adding this clinical context.

METHOD AND MATERIALS

Using 141,105 studies, we trained a convolutional neural network (CNN) to detect cranial fractures on non-contrast head CT scans. Scalp hematoma is considered by physicians to be a good indicator for diagnosing fractures. We confirmed this by automated natural language processing (NLP) analysis of a large number of reports; scalp hematoma is therefore a good candidate for improving AI algorithms that detect fractures. A logistic regression model was trained to detect a cranial fracture, using the presence of a scalp hematoma and the output probability of the CNN as inputs. The original CNN by itself (Model 1) and the combined CNN-logistic regression algorithm (Model 2) were tested on an independent set containing 18,200 scans. We used area under the ROC curve (AUC) and average precision (AP), a precision-based metric that decreases as the false positive rate increases, as evaluation metrics.
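The fusion step described above, a logistic regression over the image model's output probability and a binary hematoma flag, can be sketched as follows. All data here are synthetic stand-ins generated to mimic the reported correlation (hematoma more frequent when a fracture is present); the variable names are illustrative, not the study's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins: fracture label, image-only CNN probability, hematoma flag
fracture = rng.random(n) < 0.1
cnn_prob = np.clip(np.where(fracture, 0.55, 0.25) + rng.normal(0, 0.2, n), 0, 1)
hematoma = (rng.random(n) < np.where(fracture, 0.5, 0.1)).astype(float)

# "Model 2": logistic regression over [CNN probability, hematoma presence]
X = np.column_stack([cnn_prob, hematoma])
clf = LogisticRegression().fit(X, fracture)
combined_prob = clf.predict_proba(X)[:, 1]

print(f"image-only AUC {roc_auc_score(fracture, cnn_prob):.3f}")
print(f"combined   AUC {roc_auc_score(fracture, combined_prob):.3f}")
```

Because the hematoma flag carries genuine signal in this simulation, the combined score separates the classes at least as well as the image-only probability, mirroring the direction of the study's result.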

RESULTS

Analysis of 141,105 reports confirmed that scalp hematoma was present in 49.8% of scans with fractures and, conversely, that fractures were present in 29.8% of scans with scalp hematoma. The CNN with images as its sole input reached an AUC of 0.9599 and an AP of 0.7952. Adding scalp hematoma as a feature increased the AUC to 0.9666; the AP, however, increased significantly to 0.8190.

CONCLUSION

Using a simple probabilistic algorithm to add clinical context to a CNN resulted in a significant improvement in AP. Because the AUC is saturated, there is no significant difference in AUC. The results show a significant decrease in the false positive rate without impacting sensitivity.

CLINICAL RELEVANCE/APPLICATION

Like radiologists, deep learning models can be more accurate when they incorporate clinical context in addition to image analysis. 

AUC and Enriched Datasets are Not Good Enough Anymore: Presenting an Alternative Metric to Evaluate Radiology AI Models

PURPOSE

The area under the receiver operating characteristic curve (AUC) is commonly used to evaluate and select artificial intelligence (AI) models for radiology. Artificially balanced/enriched datasets are also usually used to estimate AUC, to narrow confidence intervals for a given sample size. In this work, we show that such evaluation of model performance has reached saturation and propose alternative performance evaluation schemes.

METHOD AND MATERIALS

The receiver operating characteristic (ROC) curve plots the false positive rate (1 − specificity) on the x-axis against the model's sensitivity at different thresholds on the y-axis. Similarly, the precision-recall curve (PRC) plots recall (sensitivity) on the x-axis against precision (positive predictive value) on the y-axis. AUC is defined as the area under the ROC curve, while average precision (AP) is defined as the area under the PRC. To illustrate the proposed evaluation scheme, two different high-performance models to detect fractures on head CT scans were created. In addition, two datasets were created: one by uniformly sampling scans, and one by artificially enriching for scans with fractures. AUCs and APs were computed for the model-dataset pairs. We propose that AP computed on the uniformly sampled dataset is more useful for model selection than the other options.
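The qualitative contrast between the two metrics under the two sampling schemes can be reproduced in a small simulation. The score distributions below are assumptions chosen for illustration, not the study's fracture models: AUC is insensitive to prevalence, while AP drops on the low-prevalence (uniformly sampled) set because precision is penalized by the many negatives:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)

def make_dataset(n: int, prevalence: float):
    """Simulated model scores: positives score higher on average than negatives."""
    y = rng.random(n) < prevalence
    scores = np.clip(rng.normal(loc=np.where(y, 0.75, 0.35), scale=0.15), 0, 1)
    return y, scores

# Enriched (50% positive) vs. uniformly sampled (5% positive) test sets
y_enr, s_enr = make_dataset(5000, 0.50)
y_uni, s_uni = make_dataset(5000, 0.05)

print(f"AUC  enriched {roc_auc_score(y_enr, s_enr):.2f}  "
      f"uniform {roc_auc_score(y_uni, s_uni):.2f}")
print(f"AP   enriched {average_precision_score(y_enr, s_enr):.2f}  "
      f"uniform {average_precision_score(y_uni, s_uni):.2f}")
```

The same underlying score distributions give nearly identical AUCs on both sets, but a visibly lower AP on the uniformly sampled set, which is the headroom the abstract argues should be used for model selection.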

RESULTS

AUCs for all four (model, dataset) pairs were >92%, and for both datasets the difference in AUC between the models was less than 3%. APs on the enriched dataset were high for both models (95% and 92%, respectively). However, APs on the uniformly sampled dataset were lower than expected (80% and 69%, respectively). The difference in the models' performance was largest (11%) when measured using AP on the uniformly sampled dataset.

CONCLUSION

AUC, although a commonly used performance metric, saturates early and is therefore not suitable for selecting among high-performance models (i.e., AUC > 0.9). Similarly, model selection using artificially enriched datasets is not good practice, as both AUC and AP saturate early on such datasets. Average precision measured on a uniformly sampled dataset exposes the deficiencies in a model's performance well and is therefore a better metric for model selection.

CLINICAL RELEVANCE/APPLICATION

Average precision and uniformly sampled datasets should be used to evaluate artificial intelligence models in radiology instead of AUC and enriched datasets. 

Automated Classification of X-Rays as Normal/Abnormal Using a High-Sensitivity Deep Learning Algorithm

Purpose

The majority of chest X-rays (CXRs) performed globally are normal, and radiologists spend significant time ruling out these scans. We present a deep learning (DL) model trained specifically to classify CXRs as normal or abnormal, potentially reducing the time and cost associated with reporting normal studies.

Methods

A DL algorithm trained on 1,150,084 CXRs and their corresponding reports was developed. A retrospectively acquired independent test set of 430 CXRs (285 abnormal, 145 normal) was analyzed by the algorithm, which classified each X-ray as normal or abnormal. Ground truth for the independent test set was established by a sub-specialist chest radiologist with 8 years' experience, who reviewed every chest X-ray image with reference to the existing report. Algorithm output was compared against the ground truth, and summary statistics were calculated.

Results

The algorithm correctly classified 376 (87.44%) CXRs, with a sensitivity of 97.19% (95% CI: 94.54%-98.78%) and a specificity of 68.28% (95% CI: 60.04%-75.75%). There were 46 (10.70%) false positives and 8 (1.86%) false negatives (FNs). Of the 8 FNs, 3 were designated as clinically insignificant (mild, inactive fibrosis) and 5 as significant (rib fractures, pneumothorax).
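These summary statistics follow directly from the implied confusion-matrix counts (TP = 285 − 8 = 277 abnormal CXRs flagged, TN = 145 − 46 = 99 normal CXRs cleared). A quick arithmetic check:

```python
# Confusion-matrix counts implied by the independent test set (285 abnormal, 145 normal)
tp, fn = 277, 8    # abnormal CXRs: correctly flagged vs. missed
tn, fp = 99, 46    # normal CXRs: correctly cleared vs. false positives

sensitivity = tp / (tp + fn)               # 277/285
specificity = tn / (tn + fp)               # 99/145
accuracy = (tp + tn) / (tp + fn + tn + fp) # 376/430

print(f"sensitivity {sensitivity:.2%}  specificity {specificity:.2%}  accuracy {accuracy:.2%}")
# sensitivity 97.19%  specificity 68.28%  accuracy 87.44%
```

The high-sensitivity / moderate-specificity profile is the intended operating point for a rule-out tool: it minimizes missed abnormal studies at the cost of some false alarms.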

Conclusion

High-sensitivity DL algorithms can potentially be deployed for the primary read of CXRs, enabling radiologists to spend appropriate time on abnormal cases and saving time, and thereby the cost, of reporting CXRs, especially in non-emergency situations. More in-depth prospective trials are required to ascertain the overall impact of such algorithms.

Automated Detection and Localization of Pneumocephalus in Head CT scan

Purpose

Pneumocephalus, the accumulation of air in the intracranial space, can lead to midline shift and compression of the brain. In this work, we detail the development of deep learning algorithms for the automated detection and localization of pneumocephalus in head CT scans.

Methods

First, to localize the intracranial space in a given head CT scan, a skull-stripping algorithm was developed using a randomly sampled anonymized dataset of 78 head CT scans (1608 slices). We then assembled another anonymized dataset containing 83 head CT scans (3546 slices) with pneumocephalus and 310 normal head CT scans, randomly sampled to represent the natural distribution. The 3546 slices (932 of which had pneumocephalus) were annotated for pneumocephalus regions, and a U-Net-based deep neural network was trained on these scans to predict the pneumocephalus region. The predicted region is refined by removing areas outside the intracranial space identified by the skull-stripping algorithm, and features are extracted from the refined region. Using these features, a random forest was trained to classify the presence of pneumocephalus in a scan. The area under the receiver operating characteristic curve (AUC) was used to evaluate the algorithms.
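The refinement step, intersecting the predicted pneumocephalus mask with the intracranial mask from skull stripping, followed by per-scan feature extraction, can be sketched as below. The two features shown (lesion volume and number of involved slices) are illustrative stand-ins; the study's actual feature set is not specified:

```python
import numpy as np

def refine_prediction(pneumo_mask: np.ndarray, intracranial_mask: np.ndarray) -> np.ndarray:
    """Keep only predicted pneumocephalus voxels inside the intracranial space."""
    return np.logical_and(pneumo_mask.astype(bool), intracranial_mask.astype(bool))

def extract_features(mask: np.ndarray) -> np.ndarray:
    """Per-scan features for a downstream classifier (illustrative):
    total lesion volume in voxels, and number of slices involved."""
    volume = mask.sum()
    slices_involved = (mask.sum(axis=(1, 2)) > 0).sum()  # axis 0 = slice index
    return np.array([volume, slices_involved])

# Toy 3-slice "scan": one predicted blob inside the skull, one spurious voxel outside
pred = np.zeros((3, 4, 4), dtype=bool)
pred[1, 1:3, 1:3] = True   # inside the intracranial space
pred[2, 0, 0] = True       # outside it (removed by refinement)
intracranial = np.zeros((3, 4, 4), dtype=bool)
intracranial[:, 1:3, 1:3] = True

refined = refine_prediction(pred, intracranial)
print(extract_features(refined))  # [4 1]
```

Such per-scan features would then feed the random forest that makes the final scan-level presence/absence call.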

Results

An independent dataset of 1891 head CT scans (40 with pneumocephalus) was used to test the above algorithms. The AUC for scan-level predictions was 0.89, with an observed sensitivity of 0.80 and specificity of 0.83.

Conclusion

In this work, we showed the efficacy of deep learning algorithms in accurately localizing and classifying pneumocephalus in head CT scans.

Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

Purpose

To validate a set of deep learning algorithms for automated detection of key findings on non-contrast head CT scans: intracranial hemorrhage and its subtypes, calvarial fractures, midline shift, and mass effect.

Methods

We retrospectively collected a dataset containing 313,318 head CT scans, of which a random subset (the Qure25k dataset) was used to validate the algorithms and the rest to develop them. An additional dataset (the CQ500 dataset) was collected from different centers to further validate the algorithms. Patients with postoperative defects or age < 7 years were excluded from all datasets. Three independent radiologists read each scan in the CQ500 dataset. The original clinical radiology report and the consensus of the readers were considered gold standards for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were used to evaluate the algorithms.

Results

After exclusion, the Qure25k dataset contained 21,095 scans (mean age 43; 43% female), while the CQ500 dataset consisted of 491 scans (mean age 48; 36% female). On the Qure25k dataset, the algorithms achieved an AUC of 0.92 for detecting intracranial hemorrhage (0.90 for intraparenchymal, 0.96 for intraventricular, 0.92 for subdural, 0.93 for extradural, and 0.90 for subarachnoid hemorrhages). On the CQ500 dataset, the AUC was 0.94 for intracranial hemorrhage (0.95, 0.93, 0.95, 0.97, and 0.96, respectively). AUCs on the Qure25k dataset were 0.92 for calvarial fractures, 0.93 for midline shift, and 0.86 for mass effect, while AUCs on the CQ500 dataset were 0.96, 0.97, and 0.92, respectively.

Conclusion

This study demonstrates that deep learning algorithms can identify head-CT scan abnormalities requiring urgent attention with high AUCs.

Prospective evaluation of a deep learning algorithm deployed in an urban imaging centre to notify clinicians of head CT scans with critical abnormalities

Purpose

Non-contrast head CT scans are the primary imaging modality for evaluating patients with trauma or stroke. While results of deep learning algorithms that identify head CT scans containing critical abnormalities have been published in retrospective studies, the effects of deploying such an algorithm in a real-world setting, with mobile notifications to clinicians, remain unstudied. In this prospective study, we evaluated the performance of such an automated triage system in an urban 24-hour imaging facility.

Methods

We developed an accurate deep neural network algorithm that identifies and localizes intracranial bleeds, cranial fractures, mass effect, and midline shift on non-contrast head CT scans. The algorithm was deployed in a clinical imaging facility in conjunction with an on-premise module that automatically selects eligible scans from the PACS and uploads them to the cloud-based algorithm for processing. Once a scan is processed, the cloud algorithm returns an additional series, viewable as an overlay over the original, and sends a text notification with preview images to the radiologist. The mobile notifications facilitated confirmation of the detected abnormalities. We studied the performance of the automated system over 60 days.

Results

748 CT scans were acquired over the 60 days, of which 194 were non-contrast head CT scans; these were evaluated by a senior radiologist. The sensitivity, specificity, AUC, and average time to notification for head CT scans with critical abnormalities were 0.90 (95% CI 0.74-0.98), 0.86 (0.80-0.91), 0.97 (0.92-1.00), and 3.2 minutes, respectively.

Conclusion

An automated triage system in a radiology facility resulted in rapid notification of critical scans with a low false positive rate, and may be used to expedite the initiation of treatment.