Performance of a Chest Radiography AI Algorithm for Detection of Missed or Mislabeled Findings: A Multicenter Study


Studies estimate that chest X-rays (CXRs) account for around 25% of annual diagnostic imaging procedures, and in the United States alone they make up almost half (44%) of all radiographic images. Their broad use in medical practice for a variety of cardiothoracic conditions can be attributed to several factors, including accessibility, portability, familiarity, and low cost compared with other imaging tests.

CXRs do, however, have a significant misinterpretation rate of up to 30%. Earlier investigations found that inter-radiologist and physician concordance for CXRs was as low as 78%.

Missed radiographic findings can have serious consequences. For instance, one study showed that 19% of lung tumors that appeared as pulmonary nodules on CXRs were overlooked. Such missed findings can be fatal for patients.


The goal of this multi-center study was to find out whether a CXR AI algorithm can spot missed or mislabeled CXR findings in radiology reports.

This study searched multi-institutional radiology databases totaling 13 million reports to identify all CXR reports from the past 22 years. The data were collected from centers including Massachusetts General Hospital (MGH), Brigham and Women’s Hospital (BWH), Faulkner Health Center (FH), Martha’s Vineyard Hospital (MVH), Salem Hospital (NSMC), Newton-Wellesley Hospital (NWH), and Spaulding Rehabilitation Hospital (SRH).


The study focuses on the functionality and practical application of qXR, Qure’s CXR AI. A team of expert radiologists used Qure’s CE-approved qXR in this retrospective multi-center study to investigate whether it can reduce the frequency of errors in detection and side-labeling of radiographic findings.


  • The study included 279 CXRs from multiple sites in the US whose reports carried an addendum for a missed or mislabeled finding.
  • qXR identified a critical abnormality in ~90% of the missed/mislabeled CXRs, with zero false positives.
  • The study concluded that qXR could identify critical findings on chest X-rays, including malignant nodules.


The qXR AI algorithm had high sensitivity (96%), specificity (100%), and accuracy (96%) for detecting all missed and mislabeled CXR findings. These results suggest that Qure’s qXR can reduce the frequency of errors in detection and side-labeling of radiographic findings.

Identifying malignant nodules on chest X-rays: A validation study of radiologist versus artificial intelligence diagnostic accuracy


Three and a half million anonymized X-rays were gathered from 45 locations worldwide (in-hospital and outpatient settings), and qXR was initially trained on this dataset. We used an independent dataset of 13,426 chest X-rays from radiologists’ reports. The test dataset included 213,459 X-rays chosen at random from the pool of 3.5 million X-rays. The development dataset was built from the X-rays of the remaining patients.


qXR is deep learning-based software used to identify nodules and malignant nodules on X-rays. We observed moderate to substantial agreement even when observations were made on normal X-rays.
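Terms such as "moderate" and "substantial" agreement are commonly tied to Cohen's kappa (on the Landis and Koch scale, 0.41 to 0.60 is moderate and 0.61 to 0.80 is substantial). The summary does not say which agreement statistic was used, so the following is only a minimal sketch of how kappa could be computed for two readers. The function name and the example reads are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (observed - expected) / (1.0 - expected)


# Hypothetical reads (1 = nodule, 0 = no nodule); illustrative only.
ai_reads = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
radiologist_reads = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(ai_reads, radiologist_reads)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when most X-rays in a sample are normal.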


qXR presented a high area under the curve (AUC) of 0.99 with a 95% confidence interval calculated with the Clopper–Pearson method. The specificity obtained with qXR was 0.90, and the sensitivity was 1 at the operating threshold. The sensitivity value of qXR in detecting nodules was 0.99, and the specificity ranged from 0.87 to 0.92, with AUC ranging between 0.98 and 0.99. The malignant nodules were detected with a sensitivity ranging from 0.95 to 1.00, specificity between 0.96 and 0.99, and AUC from 0.99 to 1. The sensitivity of radiologists 1 and 2 was between 0.74 and 0.76, with a specificity ranging from 0.98 to 0.99. In detecting the malignant nodules, specificity ranged between 0.98 and 0.99, and sensitivity fell between 0.88 and 0.94.
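The Clopper–Pearson method mentioned above gives an exact binomial confidence interval for a proportion such as sensitivity or specificity. The sketch below is one way to compute it using only the standard library, by solving the two binomial tail equations with bisection. The function names and the example counts (45 of 50) are assumptions for illustration, not figures from the study, and the direct summation is only suitable for modest sample sizes.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, iters=100):
    """Bisection on [0, 1] for a function that is positive at 0 and negative at 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: half - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p) - half)
    return lower, upper


# Hypothetical example: 45 of 50 positive cases correctly flagged.
low, high = clopper_pearson(45, 50)
print(f"sensitivity 0.90, 95% CI {low:.3f} to {high:.3f}")
```

Because the interval is exact rather than based on a normal approximation, it stays within [0, 1] even when the observed proportion is at or near 1, as with the sensitivity values reported here.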


A machine learning model can be used as a passive tool to find incidental cases of lung cancer or as a triaging tool that accelerates the patient journey through the standard care pipeline for lung cancer.

Using artificial intelligence to risk stratify COVID19 patients based on chest X-ray findings


Deep learning-based radiological image analysis could facilitate use of chest x-rays as a triaging tool for COVID-19 diagnosis in resource-limited settings. This study sought to determine whether a modified commercially available deep learning algorithm (M-qXR) could risk stratify patients with suspected COVID-19 infections. 


A dual-track clinical validation study was designed to assess the clinical accuracy of M-qXR. The algorithm evaluated all chest X-rays (CXRs) performed during the study period for abnormal findings and assigned a COVID-19 risk score. Four independent radiologists served as radiological ground truth. The M-qXR algorithm output was compared against radiological ground truth and summary statistics for prediction accuracy were calculated. In addition, patients who underwent both PCR testing and CXR for suspected COVID-19 infection were included in a co-occurrence matrix to assess the sensitivity and specificity of the M-qXR algorithm.


625 CXRs were included in the clinical validation study. 98% of total interpretations made by M-qXR agreed with ground truth (p = 0.25). M-qXR correctly identified the presence or absence of pulmonary opacities in 94% of CXR interpretations. M-qXR’s sensitivity, specificity, PPV, and NPV for detecting pulmonary opacities were 94%, 95%, 99%, and 88%, respectively. M-qXR correctly identified the presence or absence of pulmonary consolidation in 88% of CXR interpretations (p = 0.48). M-qXR’s sensitivity, specificity, PPV, and NPV for detecting pulmonary consolidation were 91%, 84%, 89%, and 86%, respectively. Furthermore, 113 PCR-confirmed COVID-19 cases were used to create a co-occurrence matrix between M-qXR’s COVID-19 risk score and COVID-19 PCR test results. The PPV and NPV of a medium to high M-qXR COVID-19 risk score for predicting a positive COVID-19 PCR test result were estimated to be 89.7% and 80.4%, respectively.
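For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV, NPV, and accuracy follow from the four cells of a 2x2 table comparing an algorithm with ground truth. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow. They are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table.

    tp/fn: ground-truth positives the algorithm flagged / missed.
    tn/fp: ground-truth negatives the algorithm cleared / flagged.
    Assumes every row and column of the table is non-empty.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are right
        "npv": tn / (tn + fn),          # share of negative calls that are right
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common the finding is in the population being read, which is why the same algorithm can show different predictive values across cohorts.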


M-qXR was found to have comparable accuracy to radiological ground truth in detecting radiographic abnormalities on CXR suggestive of COVID-19. 

Evaluation of chest X-Ray with automated interpretation algorithms for mass tuberculosis screening in prisons


The World Health Organization (WHO) recommends systematic tuberculosis (TB) screening in prisons. Evidence is lacking for accurate and scalable screening approaches in this setting. 


To assess the diagnostic accuracy of artificial intelligence-based chest x-ray interpretation algorithms for TB screening in prisons. 


Prospective TB screening study in three prisons in Brazil from October 2017 to December 2019. We administered a standardized questionnaire, performed chest x-ray in a mobile unit, and collected sputum for confirmatory testing using Xpert MTB/RIF and culture. We evaluated x-ray images using three algorithms (CAD4TB version 6, LunitTB and qXR) and compared their diagnostic accuracy. We utilized multivariable logistic regression to assess the effect of demographic and clinical characteristics on algorithm accuracy. Finally, we investigated the relationship between abnormality scores and Xpert semi-quantitative results. 

Measurements and Main Results

Among 2,075 incarcerated individuals, 259 (12.5%) had confirmed TB. All three algorithms performed similarly overall with AUCs of 0.87-0.91. At 90% sensitivity, only LunitTB and qXR met the WHO Target Product Profile requirements for a triage test, with specificity of 84% and 74%, respectively. All algorithms had variable performance by age, prior TB, smoking, and presence of TB symptoms. LunitTB was the most robust to this heterogeneity, but nonetheless failed to meet the TPP for individuals with previous TB. Abnormality scores of all three algorithms were significantly correlated with sputum bacillary load. 
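Comparing algorithms "at 90% sensitivity" means choosing, for each algorithm, the score threshold that reaches that sensitivity and then reading off the specificity. The sketch below illustrates one way to do this and to check the result against the WHO TPP minimums for a triage test (at least 90% sensitivity and 70% specificity). The function names and the scores are hypothetical, and the study's own threshold-selection procedure may differ.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Find the highest threshold whose sensitivity meets the target.

    A case is called positive when its score is >= threshold.
    Returns (threshold, sensitivity, specificity).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    # Walk thresholds from strictest to most lenient; sensitivity only rises.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
            return threshold, sensitivity, tn / negatives
    raise ValueError("target sensitivity must be at most 1")


def meets_triage_tpp(sensitivity, specificity, min_sens=0.90, min_spec=0.70):
    """Check the WHO TPP minimums for a triage test."""
    return sensitivity >= min_sens and specificity >= min_spec


# Hypothetical abnormality scores: 10 confirmed TB cases, then 10 non-cases.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.58, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]
labels = [1] * 10 + [0] * 10

threshold, sens, spec = specificity_at_sensitivity(scores, labels)
print(threshold, sens, spec, meets_triage_tpp(sens, spec))
```

Fixing sensitivity first puts the algorithms on a common footing, so the remaining difference in specificity reflects how many unnecessary confirmatory tests each would trigger.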


Automated x-ray interpretation algorithms can be an effective triage tool for TB screening in prisons. However, their specificity is insufficient in individuals with previous TB. 

Independent evaluation of 12 artificial intelligence solutions for the detection of tuberculosis


There have been few independent evaluations of computer-aided detection (CAD) software for tuberculosis (TB) screening, despite the rapidly expanding array of available CAD solutions. We developed a test library of chest X-ray (CXR) images which was blindly re-read by two TB clinicians with different levels of experience and then processed by 12 CAD software solutions. Using Xpert MTB/RIF results as the reference standard, we compared the performance characteristics of each CAD software against both an Expert and Intermediate Reader, using cut-off thresholds which were selected to match the sensitivity of each human reader. Six CAD systems performed on par with the Expert Reader (Qure.ai, DeepTek, Delft Imaging, JF Healthcare, OXIPIT, and Lunit) and one additional software (Infervision) performed on par with the Intermediate Reader only. Qure.ai, Delft Imaging and Lunit were the only software to perform significantly better than the Intermediate Reader. The majority of these CAD software showed significantly lower performance among participants with a past history of TB. The radiography equipment used to capture the CXR image was also shown to affect performance for some CAD software. TB program implementers now have a wide selection of quality CAD software solutions to utilize in their CXR screening initiatives.

Artificial intelligence matches subjective severity assessment of pneumonia for prediction of patient outcome and need for mechanical ventilation – a cohort study


To compare the performance of artificial intelligence (AI) and Radiographic Assessment of Lung Edema (RALE) scores from frontal chest radiographs (CXRs) for predicting patient outcomes and the need for mechanical ventilation in COVID-19 pneumonia. Our IRB-approved study included 1367 serial CXRs from 405 adult patients (mean age 65 ± 16 years) from two sites in the US (Site A) and South Korea (Site B). We recorded information pertaining to patient demographics (age, gender), smoking history, comorbid conditions (such as cancer, cardiovascular and other diseases), vital signs (temperature, oxygen saturation), and available laboratory data (such as WBC count and CRP). Two thoracic radiologists performed the qualitative assessment of all CXRs based on the RALE score for assessing the severity of lung involvement. All CXRs were processed with a commercial AI algorithm to obtain the percentage of the lung affected with findings related to COVID-19 (AI score). Independent t- and chi-square tests were used in addition to multiple logistic regression with Area Under the Curve (AUC) as output for predicting disease outcome and the need for mechanical ventilation. The RALE and AI scores had a strong positive correlation in CXRs from each site (r2 = 0.79–0.86; p < 0.0001). Patients who died or received mechanical ventilation had significantly higher RALE and AI scores than those with recovery or without the need for mechanical ventilation (p < 0.001). Patients with a more substantial difference in baseline and maximum RALE scores and AI scores had a higher prevalence of death and mechanical ventilation (p < 0.001). The addition of patients’ age, gender, WBC count, and peripheral oxygen saturation increased the outcome prediction from 0.87 to 0.94 (95% CI 0.90–0.97) for RALE scores and from 0.82 to 0.91 (95% CI 0.87–0.95) for the AI scores. 
The AI algorithm is as robust a predictor of adverse patient outcome (death or need for mechanical ventilation) as subjective RALE scores in patients with COVID-19 pneumonia.
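The AUC values used here to compare RALE and AI scores can be read as the probability that a randomly chosen patient with an adverse outcome has a higher score than a randomly chosen patient without one. The sketch below computes AUC for a single score in exactly that pairwise way. It does not reproduce the study's multiple logistic regression, and the function name and patient values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical severity scores (percentage of lung affected) and outcomes
# (1 = died or needed mechanical ventilation); illustrative only.
severity = [10, 35, 40, 60, 80]
adverse = [0, 0, 1, 0, 1]
auc = roc_auc(severity, adverse)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that separates the two outcome groups perfectly.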

Early Evaluation of an Ultra-Portable X-ray System for Tuberculosis Active Case Finding


X-ray screening is an important tool in tuberculosis (TB) prevention and care, but access has historically been restricted by its immobile nature. As recent advancements have improved the portability of modern X-ray systems, this study represents an early evaluation of the safety, image quality and yield of using an ultra-portable X-ray system for active case finding (ACF). We reported operational and radiological performance characteristics and compared image quality between the ultra-portable and two reference systems. Image quality was rated by three human readers and by an artificial intelligence (AI) software. We deployed the ultra-portable X-ray alongside the reference system for community-based ACF and described TB care cascades for each system. The ultra-portable system operated within advertised specifications and radiologic tolerances, except on X-ray capture capacity, which was 58% lower than the reported maximum of 100 exposures per charge. The mean image quality rating from radiologists for the ultra-portable system was significantly lower than the reference (3.71 vs. 3.99, p < 0.001). However, we detected no significant differences in TB abnormality scores using the AI software (p = 0.571), nor in any of the steps along the TB care cascade during our ACF campaign. Despite some shortcomings, ultra-portable X-ray systems have significant potential to improve case detection and equitable access to high-quality TB care. 

Tuberculosis detection from chest x-rays for triaging in a high tuberculosis-burden setting: an evaluation of five artificial intelligence algorithms


Artificial intelligence (AI) algorithms can be trained to recognise tuberculosis-related abnormalities on chest radiographs. Various AI algorithms are available commercially, yet there is little impartial evidence on how their performance compares with each other and with radiologists. We aimed to evaluate five commercial AI algorithms for triaging tuberculosis using a large dataset that had not previously been used to train any AI algorithms. 


Individuals aged 15 years or older presenting or referred to three tuberculosis screening centres in Dhaka, Bangladesh, between May 15, 2014, and Oct 4, 2016, were recruited consecutively. Every participant was verbally screened for symptoms and received a digital posterior-anterior chest x-ray and an Xpert MTB/RIF (Xpert) test. All chest x-rays were read independently by a group of three registered radiologists and five commercial AI algorithms: CAD4TB (version 7), InferRead DR (version 2), Lunit INSIGHT CXR (version 4.9.0), JF CXR-1 (version 2), and qXR (version 3). We compared the performance of the AI algorithms with each other, with the radiologists, and with the WHO's Target Product Profile (TPP) of triage tests (≥90% sensitivity and ≥70% specificity). We used a new evaluation framework that simultaneously evaluates sensitivity, proportion of Xpert tests avoided, and number needed to test to inform implementers’ choice of software and selection of threshold abnormality scores. 
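To make the three quantities in this evaluation framework concrete, the sketch below computes them for one threshold under an assumed triage workflow: only people whose AI abnormality score is at or above the threshold go on to an Xpert test. The definitions used (tests avoided as the share of people not referred, and number needed to test as Xpert tests run per Xpert-positive person found) are this sketch's reading of the terms, and the function name and data are hypothetical.

```python
def triage_summary(scores, xpert_positive, threshold):
    """Summarise AI triage at one threshold.

    scores: AI abnormality score per person.
    xpert_positive: 1 if the person's Xpert result is positive, else 0.
    People with score >= threshold are referred for Xpert (assumed workflow).
    """
    total_positive = sum(xpert_positive)
    if total_positive == 0:
        raise ValueError("need at least one Xpert-positive person")
    referred = [y for s, y in zip(scores, xpert_positive) if s >= threshold]
    found = sum(referred)
    return {
        # Share of Xpert-positive people who are referred by the AI.
        "sensitivity": found / total_positive,
        # Share of the screened population that skips the Xpert test.
        "xpert_tests_avoided": 1 - len(referred) / len(scores),
        # Xpert tests run for each Xpert-positive person found.
        "number_needed_to_test": len(referred) / found if found else float("inf"),
    }


# Hypothetical scores: 10 Xpert-positive people, then 10 Xpert-negative people.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.58, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]
xpert = [1] * 10 + [0] * 10

summary = triage_summary(scores, xpert, threshold=0.55)
print(summary)
```

Running the same summary across a range of thresholds shows the trade-off an implementer faces: raising the threshold avoids more Xpert tests but lowers sensitivity.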


Chest x-rays from 23 954 individuals were included in the analysis. All five AI algorithms significantly outperformed the radiologists. The areas under the receiver operating characteristic curve were 90·81% (95% CI 90·33–91·29) for qXR, 90·34% (89·81–90·87) for CAD4TB, 88·61% (88·03–89·20) for Lunit INSIGHT CXR, 84·90% (84·27–85·54) for InferRead DR, and 84·89% (84·26–85·53) for JF CXR-1. Only qXR (74·3% specificity [95% CI 73·3–74·9]) and CAD4TB (72·9% specificity [72·3–73·5]) met the TPP at 90% sensitivity. All five AI algorithms reduced the number of Xpert tests required by 50% while maintaining a sensitivity above 90%. All AI algorithms performed worse among older age groups (>60 years) and people with a history of tuberculosis. 


AI algorithms can be highly accurate and useful triage tools for tuberculosis detection in high-burden regions, and outperform human readers. 

Pattern of abnormalities amongst chest X-rays of adults undergoing computer-assisted digital chest X-ray screening for tuberculosis in Peri-Urban Blantyre, Malawi: A cross-sectional study


The prevalence of diseases other than tuberculosis (TB) detected during chest X-ray screening is poorly described in sub-Saharan Africa. Computer-assisted digital chest X-ray technology is available for TB screening and has the potential to be a screening tool for non-communicable diseases as well. Low- and middle-income countries are in a transition period where the burden of non-communicable diseases is increasing, but health systems are mainly focused on addressing infectious diseases. 


Participants were adults undergoing computer-assisted chest X-ray screening for tuberculosis in a community-wide tuberculosis prevalence survey in Blantyre, Malawi. Adults with abnormal radiographs by field radiographer interpretation were evaluated by a physician in a community-based clinic. X-ray classifications were compared to classifications of a random sample of normal chest X-rays by radiographer interpretation. Radiographic features were classified using WHO Integrated Management for Adult Illnesses (IMAI) guidelines. All radiographs taken at the screening tent were analysed by the qXR v2.0 software. 


5% (648/13,490) of adults who underwent chest radiography were identified to have an abnormal chest X-ray by the radiographer. 387 (59.7%) of the participants attended the X-ray clinic, and another 387 randomly sampled normal X-rays were available for comparison. Participants who were referred to the community clinic had a significantly higher HIV prevalence than those who had been identified to have a normal CXR by the field radiographer (90 [23.3%] vs. 43 [11.1%] p-value < 0.001). The commonest radiographic finding was cardiomegaly (20.7%, 95% CI 18.0–23.7). One in five (81/387) chest X-rays were misclassified by the radiographer. The overall mean qXR v2.0 score for all reviewed X-rays was 0.23 (SD 0.20). There was a high concordance of cardiomegaly classification between the physician and the computer-assisted software (109/118, 92.4%). 


There is a high burden of cardiomegaly on chest X-ray at the community level, much of which is in patients with diabetes, heart disease and high blood pressure. Cardiomegaly on chest X-ray may be a potential tool for screening for cardiovascular NCDs at the primary care level as well as in the community.