Performance by AI Product:
qXR v2 (2 studies)
Sensitivity: 0.944 (95% CI: 0.887–0.973)
Specificity: 0.692 (95% CI: 0.549–0.805)
DOR: 3.63 (95% CI: 3.17–4.09)
PPV: 0.259 (95% CI: 0.252–0.265)
NPV: 0.988 (95% CI: 0.988–0.989)
AUC: 0.923
Lunit INSIGHT CXR v3.1 (2 studies)
Sensitivity: 0.853 (95% CI: 0.787–0.901)
Specificity: 0.646 (95% CI: 0.627–0.665)
DOR: 2.37 (95% CI: 1.96–2.78)
PPV: 0.256 (95% CI: 0.2555–0.2564)
NPV: 0.968 (95% CI: 0.9677–0.9682)
AUC: 0.672
CAD4TB v3.07 (4 studies)
Sensitivity: 0.917 (95% CI: 0.848–0.956)
Specificity: 0.371 (95% CI: 0.336–0.408)
DOR: 1.91 (95% CI: 1.40–2.47)
PPV: 0.203 (95% CI: 0.202–0.203)
NPV: 0.962 (95% CI: 0.9618–0.9622)
AUC: 0.423
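The diagnostic odds ratio (DOR) combines sensitivity and specificity into one number: the odds of a positive result in people with TB divided by the odds of a positive result in people without TB. A short Python sketch, using only the pooled sensitivity and specificity listed above, recomputes it for each product. For qXR v2 the DOR comes out near 37.9, and its natural log (about 3.63) matches the reported DOR. This suggests, although the source does not say so, that the DOR figures above are on the natural-log scale. The small gaps for Lunit (2.36 vs 2.37) and CAD4TB (1.87 vs 1.91) could arise if the DOR was pooled separately rather than derived from the pooled sensitivity and specificity.

```python
import math

# Pooled sensitivity and specificity exactly as listed above.
PRODUCTS = {
    "qXR v2": (0.944, 0.692),
    "Lunit INSIGHT CXR v3.1": (0.853, 0.646),
    "CAD4TB v3.07": (0.917, 0.371),
}


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = (TP/FN) / (FP/TN) = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)


for name, (sens, spec) in PRODUCTS.items():
    dor = diagnostic_odds_ratio(sens, spec)
    # Natural log, for comparison with the DOR values reported above.
    print(f"{name}: DOR = {dor:.2f}, ln(DOR) = {math.log(dor):.2f}")

# qXR v2: DOR = 37.87, ln(DOR) = 3.63
# Lunit INSIGHT CXR v3.1: DOR = 10.59, ln(DOR) = 2.36
# CAD4TB v3.07: DOR = 6.52, ln(DOR) = 1.87
```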
Conclusion:
AI products for TB screening showed high sensitivity (0.853–0.944) but only modest specificity (0.371–0.692).
qXR v2 performed best overall, with the highest sensitivity (0.944), specificity (0.692), and AUC (0.923), although these estimates come from only two studies and its specificity CI is wide (0.549–0.805).
CAD4TB v3.07 had the lowest specificity (0.371) and the lowest AUC (0.423); a specificity of 0.371 means that roughly 63% of people without TB would be flagged as positive (false positives).
More clinical studies are needed, especially in low-TB-burden settings, to improve generalizability and to refine AI thresholds for better specificity.
Overall, the study underscores the growing potential of AI for TB detection while highlighting variability across AI products.
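Unlike sensitivity and specificity, PPV and NPV depend on how common TB is in the screened population, which is one reason setting matters for generalizability. A minimal Python sketch, assuming a hypothetical pre-test prevalence of 10% (not a figure from the source, which does not state the prevalence behind its PPV/NPV values), converts each product's pooled sensitivity and specificity into predictive values and expected false positives per 1,000 people screened. The helper names `screening_outcomes` and `predictive_values` are illustrative only.

```python
# Hypothetical pre-test TB prevalence; chosen for illustration only.
PREVALENCE = 0.10
N_SCREENED = 1000

# Pooled sensitivity and specificity as listed above.
PRODUCTS = {
    "qXR v2": (0.944, 0.692),
    "Lunit INSIGHT CXR v3.1": (0.853, 0.646),
    "CAD4TB v3.07": (0.917, 0.371),
}


def screening_outcomes(sens, spec, prevalence, n):
    """Expected TP, FP, FN, TN counts among n people screened."""
    with_tb = prevalence * n
    without_tb = (1.0 - prevalence) * n
    tp = sens * with_tb
    fn = (1.0 - sens) * with_tb
    fp = (1.0 - spec) * without_tb
    tn = spec * without_tb
    return tp, fp, fn, tn


def predictive_values(sens, spec, prevalence):
    """PPV and NPV via Bayes' theorem; n cancels, so use n = 1."""
    tp, fp, fn, tn = screening_outcomes(sens, spec, prevalence, 1.0)
    return tp / (tp + fp), tn / (tn + fn)


for name, (sens, spec) in PRODUCTS.items():
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    _, fp, _, _ = screening_outcomes(sens, spec, PREVALENCE, N_SCREENED)
    print(f"{name}: PPV = {ppv:.3f}, NPV = {npv:.3f}, "
          f"false positives per {N_SCREENED} screened = {fp:.0f}")
```

Under this assumed 10% prevalence, CAD4TB v3.07 would produce about 566 false positives per 1,000 people screened, compared with about 277 for qXR v2. Lowering `PREVALENCE` to mimic a low-burden setting reduces PPV for every product, which shows why threshold tuning for specificity matters there.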
