The study analyzed 7,670,212 record pairs from radiological exams conducted between 2020 and 2022 during the Moscow Computer Vision Experiment.
For AI performance evaluation, a final dataset of 100 CXR images (50 with lung nodules, 50 without) was selected based on inclusion and exclusion criteria.
Three AI solutions (Celsus, Lunit INSIGHT CXR, and qXR) met or exceeded vendor specifications, with the highest AUC reaching 0.956 (95% CI: 0.918–0.994).
When radiologists assessed AI segmentation and classification of nodules, performance dropped, with the highest AUC at 0.812 (95% CI: 0.744–0.879).
All AI services achieved 100% specificity in the second and third evaluation stages.
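For readers unfamiliar with the reported metrics, the following is a minimal sketch of how an AUC, an approximate 95% confidence interval, and specificity can be computed for a balanced test set such as the 100-image dataset described above. The study does not state which confidence-interval method or decision thresholds were used. The Hanley-McNeil approximation, the function names, and the example scores below are all assumptions made for illustration, so the output is not expected to reproduce the published intervals.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil standard error).

    Assumption: the study may have used a different method (e.g. DeLong).
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    # Clip to the valid AUC range [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


def specificity(neg_scores, threshold):
    """Share of nodule-free images scored below the decision threshold."""
    true_negatives = sum(1 for s in neg_scores if s < threshold)
    return true_negatives / len(neg_scores)


if __name__ == "__main__":
    # Hypothetical AI confidence scores, NOT data from the study.
    with_nodule = [0.91, 0.84, 0.77, 0.66, 0.42]
    without_nodule = [0.58, 0.35, 0.21, 0.12, 0.05]

    auc = auc_mann_whitney(with_nodule, without_nodule)
    low, high = hanley_mcneil_ci(auc, len(with_nodule), len(without_nodule))
    print(f"AUC = {auc:.3f} (approx. 95% CI: {low:.3f} to {high:.3f})")
    print(f"Specificity at 0.6 = {specificity(without_nodule, 0.6):.2f}")
```

Because specificity depends on the chosen threshold while AUC does not, a service can reach 100% specificity at one operating point and still show a modest AUC.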
Ensuring the reliability of AI for lung nodule detection requires validation on high-quality datasets and assessment by expert radiologists. Despite high AUC values in automated detection, AI models underperformed when evaluated for segmentation and classification accuracy. Developers should improve model accuracy before standalone AI use is considered in clinical practice.