For radiologists to develop confidence in a deep learning diagnostic algorithm, the algorithm must be able to visually demonstrate the evidence for its diagnosis or disease tag. We describe the development of a method that highlights the region(s) of a chest X-ray (CXR) responsible for a deep learning algorithm's diagnosis.
Using 24,384 CXRs, we trained 18-layer deep residual convolutional neural networks to predict whether a CXR was normal or abnormal, and to detect the presence of ‘cardiomegaly’, ‘opacity’, and ‘pleural effusion’ in a CXR. We then applied a method called prediction difference analysis to visualize and interpret the trained models. The contribution of each patch in the image is estimated as the degree to which the prediction changes when that patch is replaced with an average normal patch. This method was used to generate a relevance score for each pixel, which is subsequently visualized as a heatmap.
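The patch-replacement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prediction_difference_map`, the `predict` callable, the `mean_normal` image, the patch size, and the stride are all hypothetical, and the relevance measure used here (the raw drop in predicted probability, averaged over every patch covering a pixel) is an assumption, since the text does not specify how the prediction change is quantified.

```python
def prediction_difference_map(image, predict, mean_normal, patch=4, stride=4):
    """Return a per-pixel relevance map for a 2D image given as a list of rows.

    image       -- 2D list of pixel intensities
    predict     -- hypothetical callable mapping an image to a probability
    mean_normal -- 2D list, same shape as image, holding the average normal image
    """
    height, width = len(image), len(image[0])
    baseline = predict(image)
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]

    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Replace one patch with the corresponding average normal patch.
            occluded = [row[:] for row in image]
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = mean_normal[r][c]

            # Assumed relevance measure: drop in predicted probability.
            change = baseline - predict(occluded)

            # Credit the change to every pixel inside the patch.
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    total[r][c] += change
                    count[r][c] += 1

    # Average over all patches that covered each pixel.
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(width)]
            for r in range(height)]
```

A stride smaller than the patch size makes patches overlap, which gives a smoother map at the cost of more forward passes through the model.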
We used a 60-20-20 split for the training, validation, and test sets. On the test set, the trained neural networks achieved an area under the ROC curve of 0.89, 0.92, 0.84, and 0.91 for tagging abnormal, cardiomegaly, opacity, and pleural effusion, respectively. The visualization pipeline was used to generate heatmaps highlighting the enlarged heart, the opacities, and the fluid corresponding to the cardiomegaly, opacity, and pleural effusion tags.
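For readers less familiar with the evaluation setup, the following sketch shows one way a 60-20-20 split and an area under the ROC curve could be computed. Both helpers, `split_60_20_20` and `roc_auc`, are hypothetical names; the random shuffle, the seed, and the rounding of the split sizes are assumptions, and the pairwise AUC formula is a simple quadratic-time version chosen for clarity rather than speed.

```python
import random

def split_60_20_20(n_items, seed=0):
    """Shuffle item indices and split them 60/20/20 (sizes rounded down,
    with the remainder going to the test set)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * 60 // 100
    n_val = n_items * 20 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under these rounding assumptions, splitting 24,384 images gives 14,630 training, 4,876 validation, and 4,878 test images; the actual set sizes used in the study are not stated.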
We trained and tested a deep learning algorithm that accurately classifies CXRs and assigns clinically relevant tags to them. Further, we applied a visualization method that generates heatmaps highlighting the most relevant parts of the CXR. The visualization method is broadly applicable to other kinds of X-rays and to other deep learning algorithms. Future work will focus on formally validating the accuracy of the visualization by measuring the overlap between radiologist annotations and algorithm-generated heatmaps.
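One possible way to quantify such overlap is intersection over union (IoU) between a thresholded heatmap and a binary annotation mask. The sketch below is only an illustration: the choice of IoU, the function name `heatmap_iou`, and the threshold value are assumptions, as the overlap measure to be used is not specified.

```python
def heatmap_iou(heatmap, annotation, threshold=0.5):
    """Intersection over union between a thresholded heatmap and a
    binary annotation mask, both given as 2D lists of equal shape."""
    intersection = 0
    union = 0
    for heat_row, mask_row in zip(heatmap, annotation):
        for heat, mask in zip(heat_row, mask_row):
            highlighted = heat >= threshold   # pixel flagged by the algorithm
            annotated = bool(mask)            # pixel marked by the radiologist
            if highlighted and annotated:
                intersection += 1
            if highlighted or annotated:
                union += 1
    # Assumed convention: two empty masks have an overlap of 0.0.
    return intersection / union if union else 0.0
```

The result depends strongly on the threshold, so any formal validation would need to fix or sweep that value.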
Heatmaps highlighting the evidence for disease tags will provide clinical users with crucial visual cues that could ease their decision to accept or reject a deep learning-based CXR diagnosis.