Published 19 Aug 2024

Independent evaluation of the accuracy of 5 artificial intelligence software for detecting lung nodules on chest X-rays

Authors: Kirill Arzamasov, Yuriy Vasilev, Maria Zelenova, Lev Pestrenin, Yulia Busygina, Tatiana Bobrovskaya, Sergey Chetverikov, David Shikhmuradov, Andrey Pankratov, Yury Kirpichev, Valentin Sinitsyn, Irina Son, Olga Omelyanskaya

The study analyzed 7,670,212 record pairs from radiological exams conducted between 2020 and 2022 during the Moscow Computer Vision Experiment.

For AI performance evaluation, a final dataset of 100 CXR images (50 with lung nodules, 50 without) was selected based on inclusion and exclusion criteria.

Results:

Three AI solutions (Celsus, Lunit INSIGHT CXR, and qXR) met or exceeded vendor specifications; the highest AUC was 0.956 (95% CI: 0.918–0.994). When radiologists assessed the AI's segmentation and classification of nodules, performance dropped, and the highest AUC was 0.812 (95% CI: 0.744–0.879). All AI services achieved 100% specificity in the second and third evaluation stages.
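To give a feel for how wide a 95% confidence interval around an AUC is on a test set of this size, the sketch below applies the Hanley-McNeil approximation to an AUC of 0.956 with 50 positive and 50 negative images. This summary does not say which method the authors used, so the choice of approximation and the function name `hanley_mcneil_ci` are assumptions made for illustration, and the result is not expected to match the published interval exactly.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley & McNeil, 1982).

    Illustrative only: the study may have used a different method
    (for example DeLong or bootstrap), which this summary does not specify.
    """
    q1 = auc / (2 - auc)            # P(two random positives both outrank one negative)
    q2 = 2 * auc ** 2 / (1 + auc)   # P(one positive outranks two random negatives)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    # Clip to [0, 1] because an AUC cannot leave that range.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Values taken from the text: best AUC 0.956 on 50 nodule / 50 non-nodule images.
low, high = hanley_mcneil_ci(0.956, n_pos=50, n_neg=50)
print(f"approximate 95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval comes out at roughly 0.91 to 1.00, similar in width to the reported 0.918–0.994. A half-width of about 0.04 is what a 100-image test set allows at this AUC level.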

Conclusion:

To ensure the reliability of AI for lung nodule detection, validation using high-quality datasets and expert radiologist assessments is essential. Despite high AUC values in automated detection, AI models underperformed when evaluated for segmentation and classification accuracy. Developers should enhance model accuracy before considering standalone AI use in clinical practice.

