Published 05 Aug 2023

Benchmarking the diagnostic test accuracy of certified AI products for screening pulmonary tuberculosis in digital chest radiographs: Preliminary evidence from a rapid review and meta-analysis

Authors: David Hua, Khang Nguyen, Neysa Petrina, Noel Young, Jin-Gun Cho, Adeline Yap, Simon K. Poon

Performance by AI Product:

qXR v2 (2 studies)

Sensitivity: 0.944 (95% CI: 0.887–0.973)
Specificity: 0.692 (95% CI: 0.549–0.805)
DOR: 3.63 (95% CI: 3.17–4.09)
PPV: 0.259 (95% CI: 0.252–0.265)
NPV: 0.988 (95% CI: 0.988–0.989)
AUC: 0.923

Lunit INSIGHT CXR v3.1 (2 studies)

Sensitivity: 0.853 (95% CI: 0.787–0.901)
Specificity: 0.646 (95% CI: 0.627–0.665)
DOR: 2.37 (95% CI: 1.96–2.78)
PPV: 0.256 (95% CI: 0.2555–0.2564)
NPV: 0.968 (95% CI: 0.9677–0.9682)
AUC: 0.672

CAD4TB v3.07 (4 studies)

Sensitivity: 0.917 (95% CI: 0.848–0.956)
Specificity: 0.371 (95% CI: 0.336–0.408)
DOR: 1.91 (95% CI: 1.4–2.47)
PPV: 0.203 (95% CI: 0.202–0.203)
NPV: 0.962 (95% CI: 0.9618–0.9622)
AUC: 0.423
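
As a rough cross-check of the figures above, the diagnostic odds ratio can be recomputed from each product's pooled sensitivity and specificity. The sketch below is a back-of-envelope calculation, not the study's meta-analytic model, so exact agreement is not expected; the function and variable names are illustrative. One observation, which the source does not state: the listed DOR values lie close to the natural logarithm of the ratio computed this way, which suggests they may be reported on a log scale.

```python
import math

# Pooled point estimates (sensitivity, specificity) copied from the lists above.
POOLED = {
    "qXR v2": (0.944, 0.692),
    "Lunit INSIGHT CXR v3.1": (0.853, 0.646),
    "CAD4TB v3.07": (0.917, 0.371),
}


def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive result among TB cases divided by the odds among non-cases."""
    odds_positive_in_cases = sensitivity / (1 - sensitivity)
    odds_positive_in_non_cases = (1 - specificity) / specificity
    return odds_positive_in_cases / odds_positive_in_non_cases


def summarise():
    """Return {product: (DOR, ln DOR)} from the pooled point estimates."""
    rows = {}
    for name, (sens, spec) in POOLED.items():
        dor = diagnostic_odds_ratio(sens, spec)
        rows[name] = (dor, math.log(dor))
    return rows


if __name__ == "__main__":
    for name, (dor, log_dor) in summarise().items():
        print(f"{name}: DOR = {dor:.1f}, ln(DOR) = {log_dor:.2f}")
```

Small gaps between these values and the listed ones are expected, because a pooled DOR is normally estimated across studies rather than derived from pooled sensitivity and specificity.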

Conclusion:
AI products for TB screening demonstrate high pooled sensitivity (0.853–0.944) but modest specificity (0.371–0.692).
qXR v2 showed the best performance overall, with the highest sensitivity (0.944), specificity (0.692), and AUC (0.923) of the three products.
CAD4TB v3.07 had the lowest specificity (0.371), meaning it flagged the largest share of TB-negative radiographs as positive.
More clinical studies, especially in low TB burden settings, are needed to improve generalizability and refine AI thresholds for better specificity.
The study underscores the growing potential of AI in TB detection but highlights how much performance varies between products.
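
To illustrate why specificity and setting matter, the sketch below applies Bayes' rule to the pooled sensitivity and specificity at two TB prevalences. The prevalence values (1% and 10%) are hypothetical, chosen only for illustration; they are not taken from the study, and the function names are likewise assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screened population with the given TB prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def false_positives_per_1000(specificity, prevalence):
    """Expected number of TB-negative people flagged per 1000 screened."""
    return 1000 * (1 - specificity) * (1 - prevalence)


if __name__ == "__main__":
    # Pooled (sensitivity, specificity) from the conclusion above.
    products = {
        "qXR v2": (0.944, 0.692),
        "CAD4TB v3.07": (0.917, 0.371),
    }
    # Hypothetical prevalences, NOT from the study.
    for prevalence in (0.01, 0.10):
        for name, (sens, spec) in products.items():
            ppv, npv = predictive_values(sens, spec, prevalence)
            fp = false_positives_per_1000(spec, prevalence)
            print(f"prev={prevalence:.0%} {name}: PPV={ppv:.3f} "
                  f"NPV={npv:.3f} FP/1000={fp:.0f}")
```

At a fixed sensitivity and specificity, PPV falls as prevalence falls, which is one reason results from high-burden settings may not carry over directly to low-burden ones.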
