Time is Brain: AI helps cut down stroke diagnosis time in the Himalayan foothills

Stroke is a leading cause of death, and stroke care is limited by the availability of specialized medical professionals. In this post, we describe a physician-led stroke unit model established at Baptist Christian Hospital (BCH) in Assam, India. Enabled with qER, Qure’s AI-driven automated CT brain interpretation tool, BCH can quickly determine the next steps in treatment, and we examine the implications for clinical outcomes.

qER at a Stroke unit

Across the world, stroke is a leading cause of death, second only to ischemic heart disease. According to the World Stroke Organization (WSO), 13.7 million new strokes occur each year and there are about 80 million stroke survivors globally. In India, as per the Health of the Nation’s State Report, the incidence rate is 119 to 152 per 100,000, and the case fatality rate is 19 to 42% across the country.

Catering to tea plantation workers in and around the town of Tezpur, the Baptist Christian Hospital, Tezpur (BCH) is a 130-bed secondary care hospital in the northeastern state of Assam in India. The hospital is a unit of the Emmanuel Hospital Association, New Delhi. From humble beginnings offering basic dispensary services, the hospital grew to become one of the best healthcare providers in Assam, and it is heavily involved in academic and research work at both national and international levels.

Nestled below the Himalayas and interspersed with large tea plantations, Assam has an indigenous population and tea garden workers with a high prevalence of hypertension, the largest single risk factor for stroke, reported at between 33% and 60.8%. Anecdotal reports and hospital-based studies indicate a huge burden of stroke in Assam, a significant portion of which is addressed by Baptist Hospital. A recent study showed that hemorrhagic strokes account for close to 50% of the cases here, compared to only about 20% of the strokes in the rest of India.

Baptist Christian Hospital, Tezpur. Source

Challenges in Stroke Care

One of the biggest obstacles in Stroke Care is the lack of awareness of stroke symptoms and the late arrival of the patient, often at smaller peripheral hospitals, which are not equipped with the necessary scanning facilities and the specialists, leading to a delay in effective treatment.

The doctors and nurses of the Stroke Unit at BCH, Tezpur were trained online by specialist neurologists, and in turn trained the rest of the team on a protocol that included stroke clinical assessment, monitoring of risk factors and vital parameters, and other supportive measures such as swallow assessment, in addition to starting the rehabilitation process and advising on long-term care at home. A study done at Tezpur indicated that after the Stroke Unit was established, there was a significant improvement in quality of life along with a reduction in deaths compared to the pre-Stroke Unit phase.

This is a crucial development in stroke care, especially in low and middle income countries (LMICs) like India, because it strengthens the smaller peripheral hospitals which lack specialists and are almost always the first stop for patients in emergencies like stroke.

Stroke pathway barriers

This representative image details the acute stroke care pathway. Source

The guidelines for management of acute ischemic stroke involve capturing a non-contrast CT (NCCT) study of the brain, along with CT or MRI angiography and perfusion, and thrombolysis (administration of rTPA, recombinant tissue plasminogen activator) within 4.5 hours of symptom onset. Equipped with a CT machine and teleradiology reporting, the physicians at BCH provide primary intervention for these stroke cases after a basic NCCT and may refer them to a tertiary facility, as applicable. They follow a Telestroke model: in cases where thrombolysis is required, the ER doctors consult with neurologists at a more specialized center, and decisions are made after sharing the NCCT images via phone-based media like WhatsApp, while severe cases of head trauma are referred to faraway tertiary facilities for further management. Studies of a physician-based Stroke Unit model in Tezpur have shown an improvement in treatment outcomes.
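As a small illustration of the time arithmetic behind the 4.5-hour window, the sketch below computes how much of the window remains for a given symptom onset time. The function names are hypothetical and this is not clinical software; it only shows the calculation.

```python
from datetime import datetime, timedelta

# Window quoted in the text: thrombolysis within 4.5 hours of symptom onset.
THROMBOLYSIS_WINDOW = timedelta(hours=4.5)


def time_left_in_window(symptom_onset: datetime, now: datetime) -> timedelta:
    """Return the time remaining in the window (negative once it has passed)."""
    return symptom_onset + THROMBOLYSIS_WINDOW - now


def within_window(symptom_onset: datetime, now: datetime) -> bool:
    """True while the window is still open."""
    return time_left_in_window(symptom_onset, now) >= timedelta(0)
```

For example, a patient with onset at 08:00 who is scanned at 11:00 has 1.5 hours of the window left, which is why a report that arrives in minutes rather than an hour matters.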

How is Qure.ai helping BCH with stroke management?

BCH and Qure have worked closely since the onset of the COVID-19 pandemic, especially at a time when confirmatory RT-PCR kits were in limited supply. qXR, Qure’s AI-aided chest X-ray solution, proved to be a beneficial addition for identifying COVID-19 suspects, especially asymptomatic ones, and for their treatment and management, beyond its role in comprehensive chest screening.

qER messages

AI at BCH

In an effort to improve the workflow of stroke management and care at the Baptist hospital, qER, an FDA approved and CE certified software that can detect 12 abnormalities, was deployed. The abnormalities, including five types of intracranial hemorrhage, cranial fractures, mass effect, midline shift, infarcts, hydrocephalus and atrophy, are reported within 1-2 minutes of the CT being taken. qER has been trained on CT scans from more than 22 different CT machine models, which makes it hardware agnostic. In addition to offering a pre-populated radiology report, the HIPAA compliant qER solution is also able to label and annotate the abnormalities in the key slices.

Since qER integrates seamlessly with the existing technical framework of the site, the deployment of the software, along with setting up a messaging group for the site, was completed in less than an hour. Soon after, within minutes of taking the head CT, qER analyses were available in the PACS worklist, along with messaging alerts that the physicians and medical team could review on their mobile phones.

The aim of this pilot project was to evaluate how qER could add value to a secondary care center where the responsibility for deciding on medical intervention falls on the physicians, based on a teleradiology report that reaches them in 15-60 minutes. As is well established in stroke care, every minute saved is precious.

Physician using qER

At the outset, there were apprehensions amongst the medical team about the performance of the software and its efficacy in improving the workflow. However, this is what they have to say about qER after 2 months of operation:

“qER is good as it alerts the physicians in a busy casualty room even without having to open the workstation. We know if there are any critical issues with the patient” – Dr. Jemin Webster, a physician at Tezpur.

He goes on to explain how qER helps draw the attention of the emergency room doctors and nurses to critical cases that need intervention or, in some instances, referral. It boosts the confidence of the treating doctors in making the right judgement in the clinical decision-making process. It also helps draw the teleradiology support’s attention to the notified critical scans, as well as to the scans of stroke cases that are in the window period for thrombolysis. Dr. Jemin also sees the potential of qER in the workflow of high-volume, multi-specialty referral centers, where coordination between multiple departments is required.

The Way Ahead

A technology solution like qER can reduce the time to diagnosis in emergencies like stroke or trauma and boost the confidence of the Stroke Unit, even in the absence of specialists. The qER platform can help stroke neurologists in Telestroke settings access great quality scans even on their smartphones and guide the treating doctors on thrombolysis and further management. Scaling up this technology to stroke units and MSUs can empower peripheral hospitals to manage acute stroke, especially in LMICs.

We intend to conduct an observational time-motion study to analyze the Door-to-Needle time with qER intervention via instant reports and phone alerts, as we work through the required approvals. Also in the pipeline is a performance comparison of qER reporting against the radiologist report as ground truth, along with a comparison of clinical outcomes and these parameters before and after the introduction of qER into the workflow. We also plan to extend the pilot project to Padhar Mission Hospital, MP and the Shanthibhavan Medical Center, Simdega, Jharkhand.

The Qure team is also working on a comprehensive stroke platform aimed at improving stroke workflows in LMICs and low-resource settings.

Engineering Radiology AI for National Scale in the US

vRad, a large US teleradiology practice, and Qure.ai have been collaborating for more than a year on a large-scale radiology AI deployment. In this blog post, we describe the engineering that goes into scaling radiology AI. We discuss adapting AI for extreme data diversity, the DICOM protocol and software engineering.

vRad and Qure.ai have been collaborating for more than a year on a large-scale prospective validation of qER, Qure.ai’s model for detecting intracranial hemorrhages (ICH). vRad is a large teleradiology practice – 500+ radiologists serving over 2,000 facilities in the United States – representing patients from nearly all states. vRad uses an in-house built RIS and PACS that processes over 1 million studies a month, with the majority of those studies being XR or CT. Of these, about 70,000 CT studies a month get processed by Qure.ai’s algorithms. This collaboration has produced interesting insights into the challenges of implementing AI on such a large scale. Our earlier work together is published elsewhere at Imaging Wire and vRad’s blog.

Models that are accurate on extremely diverse data

Before we discuss the accuracy of models, we have to start with how we actually measure it at scale. In this respect, we have leveraged our experience from prior AI endeavors. vRad runs the imaging models during validation in parallel with production flows. As an imaging study is ingested into the PACS, it is sent directly to validation models for processing. In turn, as soon as the radiologist on the platform completes their report for the scan, we use it to establish the ground truth. We used our Natural Language Processing (NLP) algorithms to automatically read these reports to assign whether the current scan is positive or negative for ICH. Thus, the sensitivity and specificity of a model can be measured in real-time this way on real-world data.
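The measurement described above can be sketched as follows, assuming each study has already been reduced to two booleans: the model's ICH call and the label that NLP extracted from the radiologist's report. The function name and data layout are illustrative assumptions, not the production pipeline.

```python
def sensitivity_specificity(model_positive, report_positive):
    """Compare model calls against report-derived ground truth.

    Both arguments are equal-length sequences of booleans, one per study.
    Returns (sensitivity, specificity); a value is None when it is
    undefined because there are no positive (or no negative) studies.
    """
    tp = fp = tn = fn = 0
    for predicted, truth in zip(model_positive, report_positive):
        if truth and predicted:
            tp += 1          # bleed reported, model flagged it
        elif truth and not predicted:
            fn += 1          # bleed reported, model missed it
        elif not truth and predicted:
            fp += 1          # no bleed reported, model flagged one
        else:
            tn += 1          # no bleed reported, model agreed
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Because the counts are simple running totals, they can be updated as each report is completed, which is what makes a real-time readout possible.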

AI models often perform well in the lab, but when tried in a real-world clinical workflow, they do not live up to expectations. This is due to a combination of problems. The idea of a diverse, heterogeneous cohort of patients is well discussed in the space of medical imaging. In this case, Qure.ai’s model was measured with a cohort of patients representative of the entire US population – with studies from all 50 states flowing through the model and being reported against.

Less commonly discussed are the challenges posed by data that is unique to a hospital or even to a specific imaging device. vRad receives images from over 150,000 unique imaging devices in over 2,000 facilities. At a study level, different facilities can have many different study protocols – varying amounts of contrast, varying radiation dosages, varying slice thicknesses, and other considerations can change how well a human radiologist can evaluate a study, let alone the AI model.

Just like human radiologists, AI models do their best if they see consistent images at pixel level despite the data diversity. Nobody would want to recalibrate their decision process just because different manufacturers chose to use different post-processing techniques. For example, image characteristics of a thin slice CT scan are quite different from a 5mm thick scan with the former being considerably noisier. Both AI and doctors are sure to be confused if asked to decide whether those subtle hyperdense dots that they see on a thin slice scan are just noise or symptoms of diffuse axonal injury. Therefore, we invested considerably in making sure the diverse data is pre-processed into highly consistent raw pixel data. We discuss more in the following section.
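The post does not say how the pre-processing was done. One common way to narrow the gap between thin and thick slices, shown here purely as an assumed illustration, is to average groups of adjacent thin slices into thicker slabs, which also averages out part of the noise.

```python
import numpy as np


def average_to_thick_slices(volume, thin_mm, thick_mm):
    """Average adjacent thin slices into approximately thick_mm slabs.

    volume is an array shaped (num_slices, height, width). Trailing slices
    that do not fill a whole slab are dropped in this simple sketch.
    """
    group = max(1, int(round(thick_mm / thin_mm)))   # thin slices per slab
    usable = (volume.shape[0] // group) * group
    if usable == 0:
        raise ValueError("not enough slices to form one thick slab")
    trimmed = volume[:usable].astype(np.float32)
    # Reshape to (num_slabs, group, H, W) and average over each group.
    return trimmed.reshape(-1, group, *volume.shape[1:]).mean(axis=1)
```

With 0.625 mm input and a 5 mm target, for instance, eight thin slices would contribute to each output slab.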

A thin slice CT (left) vs a thick slice one (right)

DICOM, AI, and interoperability

Dealing with patient and data diversity is a major part of building AI models. The AI model not only has to be generalizable at the pixel level, but it must also make sure the right pixels are fed into it. The first problem is well documented in the AI literature but the second one, not so much. As traditional AI imaging models are trained to work on natural images (think cat photos), they deal with simple data formats like PNG or JPEG. However, medical imaging is highly structured and complex and contains orders of magnitude more data than natural images. DICOM is the file format and standard used for storing and transferring medical images.

While DICOM is a robust and well-adopted standard, implementation details vary. DICOM tags often differ greatly from facility to facility, and private tags vary from manufacturer to manufacturer. These differences, along with encodings and other imaging-device-specific variations, require that any piece of software, including an AI model, be robust and good at error handling. After a decade of receiving DICOM from all over the U.S., the vRad PACS still runs into new unique configurations and implementations a few times a year, so we are uniquely sensitive to the challenges.
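To make the error-handling point concrete, here is a minimal sketch of defensive parsing for a single tag, Slice Thickness. It assumes the raw value has already been read from the file and may arrive as a number, a padded string, a backslash-separated multi-value string, a list, or nothing at all. The function name and the set of cases handled are illustrative assumptions.

```python
import math


def parse_slice_thickness(raw):
    """Return slice thickness in mm as a float, or None if it is unusable."""
    if isinstance(raw, (list, tuple)):
        raw = raw[0] if raw else None        # take the first of several values
    if raw is None:
        return None
    try:
        # DICOM separates multiple string values with a backslash.
        value = float(str(raw).strip().split("\\")[0])
    except ValueError:
        return None                          # empty or non-numeric text
    if not math.isfinite(value) or value <= 0:
        return None                          # physically meaningless
    return value
```

Returning None instead of raising lets the caller decide whether to fall back to another source of the same information, such as the spacing between slice positions.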

A taste of DICOM diversity: shown are random study descriptions used to represent CT brain

We realized that we need another machine learning model to solve this interoperability problem itself. How do we recognize that this particular CT image is not a brain image even if the description of images says so? How do we make sure the complete brain is present in the image before we decide there is a bleed in it? Variability of DICOM metadata doesn’t allow us to write simple rules which can work at scale. So, we have trained another AI model based on metadata and pixels which can make the above decisions for us.

These challenges harken back to classic healthcare interoperability problems. In a survey by Philips, the majority of younger healthcare professionals indicated that improved interoperability between software platforms and healthcare practices is important for their workplace satisfaction. Interestingly, these are the exact challenges medical imaging AI has to solve for it to work well. So, AI generalizability is just another name for healthcare interoperability. Given how we used machine learning and computer vision to solve the interoperability problems for our AI model, solving wider interoperability problems might well involve AI itself.

AI Software Engineering

But even after those generalizability/interoperability challenges are overcome, a model must be hosted in some manner, often in a Docker-based solution, frequently written in Python. And like the model, this wrapper must scale. It must handle calls to the model and return results, as well as log information about the health of the system, just like any other piece of software. As a model goes live on a platform like vRad’s, common problems that we see are memory overflows, underperforming throughput, and other “typical” software problems.

Although these problems look quite similar to traditional “software problems”, the root cause is quite different. For the scalability and the reliability of traditional software, the bottleneck usually boils down to database transactions. Take Slack, an enterprise messaging platform, for example. What’s the most compute-intensive thing Slack app does? It looks up the chat typed previously by your colleague from a database and shows it to you. Basically, a database transaction. The scalability of Slack usually means scalability and reliability of these database transactions. Given how databases have been around for years, this problem is fairly well solved with off-the-shelf solutions.

For AI-enabled software, the most compute-intensive task is not a database transaction but running an AI model, which is arguably more intensive than a database lookup. Given how new deep learning is, the ecosystem around it is not yet well developed. This makes AI model deployment and engineering hard, and the problem is being tackled by big names like Google (TensorFlow), Facebook (Torch), and Microsoft (ONNX). Because these are open source, we actively contribute to them and make them better as we come across problems.

Different as the root cause of the engineering challenges is, the process to tackle them is surprisingly similar. After all, engineers’ approaches to building bridges and rockets are not all that different; they just require different tools. To make our AI scale to vRad, we followed traditional software engineering best practices, including highly tested code and frequent updates. As soon as we identify an issue, we patch it and write a regression test to make sure we never come across it again. Docker has made deployment and updates easy and consistent.

Automated slack alerts

We get automated alerts of the errors and fix them proactively

Integration to clinical workflow

Another significant engineering challenge we solved was bending clinical software to our will. DICOM is a messy communication standard and lacks some important features. For example, DICOM features no acknowledgement signal that the complete study has been sent over the network. Another example is the lack of standardization in how a given study is described – what fields are used and what phrases are used to describe what the study represents. The work Qure.ai and vRad collaborated on required intelligent mapping of study descriptions and modality information throughout the platform – from the vRad PACS through the Inference Engine running the models to the actual logic in the model containers themselves.
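Since there is no end-of-study signal, a receiver has to infer completeness. A common workaround, sketched here as an assumption rather than a description of the vRad or Qure.ai implementation, is to treat a study as complete once no new image has arrived for a quiet period. The class name and the 60-second default are hypothetical.

```python
class StudyCompletionTracker:
    """Infer that a study is complete after a quiet period with no new images."""

    def __init__(self, quiet_seconds=60.0):
        self.quiet_seconds = quiet_seconds
        self._last_seen = {}   # study UID -> time the latest image arrived
        self._counts = {}      # study UID -> number of images received

    def image_received(self, study_uid, timestamp):
        """Record one incoming image for a study."""
        self._last_seen[study_uid] = timestamp
        self._counts[study_uid] = self._counts.get(study_uid, 0) + 1

    def pop_complete(self, now):
        """Return {study_uid: image_count} for studies that have gone quiet."""
        done = [uid for uid, seen in self._last_seen.items()
                if now - seen >= self.quiet_seconds]
        result = {}
        for uid in done:
            del self._last_seen[uid]
            result[uid] = self._counts.pop(uid)
        return result
```

The trade-off is latency against safety: a short quiet period starts inference sooner but risks running on a partial study, which is one reason a check that the whole brain is present remains useful downstream.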

Many AI image models and solutions on the market today integrate with PACS and worklists, but one unique aspect of Qure.ai and vRad’s work is the sheer scale of the undertaking. vRad’s PACS ingests millions of studies a year, around 1 billion individual images annually. The vRad platform, including the PACS, RIS, and AI Inference Engine, routes those studies to the right AI models and the right radiologists. Radiologists perform thousands of reads each night, and NLP helps them report and analyzes those reports for continual feedback to both radiologists and AI models, as well as for monitoring. Qure.ai’s ICH model plugged into the platform and demonstrated robustness as well as impressive sensitivity and specificity.

During vRad and Qure.ai’s validation, we were able to run hundreds of thousands of studies in parallel with our production workloads, validating that the model and the solution for hosting it not only generalized in terms of sensitivity and specificity but also overcame the other technical challenges that are often issues in large-scale deployments of AI solutions.

Morphology of the Brain: Changes in Ventricular and Cranial Vault Volumes in 15000 subjects with Aging and Hydrocephalus

This post is Part 1 of a series that uses large datasets (15,000+) coupled with deep learning segmentation methods to review and maybe re-establish what we know about normal brain anatomy and pathology. Subsequent posts will tackle intra-cranial bleeds, their typical volumes and locations across similarly sized datasets.

Brain ventricular volume has been quantified by post-mortem studies [1] and pneumoencephalography. When CT and subsequently MRI became available, facilitating non-invasive observation of the ventricular system, larger datasets could be used to study these volumes. Typical subject numbers in recent studies have ranged from 50 to 150 [2-6].

Now that deep learning segmentation methods have enabled automated precise measurements of ventricular volume, we can re-establish these reference ranges using datasets that are 2 orders of magnitude larger. This is likely to be especially useful for age group extremes – in children, where very limited reference data exist and the elderly, where the effects of age-related atrophy may co-exist with pathologic neurodegenerative processes.

To date, no standard has been established regarding the normal ventricular volume of the human brain. The Evans index and the bicaudate index are linear measurements currently being used as surrogates to provide some indication that there is abnormal ventricular enlargement [1]. True volumetric measures are preferable to these indices for a number of reasons [7, 8] but have not been adopted so far, largely because of the time required for manual segmentation of images. Now that automated precise quantification is feasible with deep learning, it is possible to upgrade to a more precise volumetric measure.
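For context on what a linear surrogate looks like, the Evans index is conventionally the ratio of the maximum width of the frontal horns to the maximum inner diameter of the skull, with values above roughly 0.3 commonly taken to suggest ventricular enlargement. The 0.3 figure is the commonly quoted convention, not a number from this post, and the function below is only a sketch.

```python
def evans_index(frontal_horn_width_mm, inner_skull_diameter_mm):
    """Ratio of maximal frontal horn width to maximal inner skull diameter."""
    if inner_skull_diameter_mm <= 0:
        raise ValueError("inner skull diameter must be positive")
    return frontal_horn_width_mm / inner_skull_diameter_mm
```

Both inputs are single widths measured on one axial slice, so the index cannot see enlargement that happens elsewhere in the ventricular system, which is the limitation a true volume avoids.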

Such quantitative measures will be useful in the monitoring of patients with hydrocephalus, and as an aid to diagnosing normal pressure hydrocephalus. In the future, automated measurements of ventricular, brain and cranial volumes could be displayed alongside established age- and gender-adjusted normal ranges as a standard part of radiology head CT and MRI reports.

Methods and Results

To train our deep learning model, lateral ventricles were manually annotated in 103 scans. We split these scans randomly with a ratio of 4:1 for training and validation respectively. We trained a U-Net to segment lateral ventricles in each slice. Another U-Net model was trained to segment cranial vault using a similar process. Models were validated using DICE score metric versus the annotations.
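The validation metric and the 4:1 split can be sketched as below. This assumes binary masks held as NumPy arrays and scans identified by simple IDs; the function names and the fixed seed are illustrative, not taken from the original pipeline.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0                      # both masks empty: perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def split_scans(scan_ids, train_fraction=0.8, seed=0):
    """Randomly split scan IDs into training and validation lists (4:1 by default)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(list(scan_ids)))
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut].tolist(), shuffled[cut:].tolist()
```

Applied to 103 scans, this split leaves 82 for training and 21 for validation.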

Anatomy             DICE Score
Lateral Ventricles  0.909
Cranial Vault       0.983

A validation set of about 20 scans might not represent all the anatomical and pathological variations in the population. Therefore, we visually verified that the resulting models worked despite pathologies like hemorrhages and infarcts or surgical implants such as shunts. We show some representative scans and model outputs below.

Focal ventricle dilatation

30 year old male reported with 'focal dilatation of left lateral ventricle.'

Mild Hydrocephalus

7 year old female child reported with 'mild obstructive hydrocephalus'

28 year old male reported with fracture and hemorrhages

Shunt

36 year old male reported with an intraventricular mass and with a VP shunt

To study lateral ventricular and cranial vault volume variation across the population, we randomly selected 14,153 scans from our database. This selection contained only 208 scans with hydrocephalus reported by the radiologist. Since we wanted to study ventricle volume variation in patients with hydrocephalus, we added 1,314 additional scans reported with ‘hydrocephalus’. We excluded those scans for which age/gender metadata were not available. In total, our analysis dataset contained 15,223 scans, whose demographic characteristics are shown in the table below.

Characteristic                         Value
Number of scans                        15223
Females                                6317 (41.5%)
Age: median (interquartile range)      40 (24 – 56) years
Scans reported with cerebral atrophy   1999 (13.1%)
Scans reported with hydrocephalus      1404 (9.2%)

Dataset demographics and prevalences.

A histogram of the age distribution is shown below. There are reasonable numbers of subjects (>200) in all age and sex groups, which supports the generalizability of our analysis.

age histogram

We ran the trained deep learning models and measured lateral ventricular and cranial vault volumes for each of the 15223 scans in our database. Below is the scatter plot of all the analyzed scans.
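Turning a segmentation mask into a volume is a matter of counting voxels and multiplying by the physical size of one voxel. The sketch below assumes a binary mask and voxel spacing in millimetres taken from the scan metadata; the function names are illustrative.

```python
import numpy as np


def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres.

    spacing_mm is (slice_spacing, row_spacing, column_spacing) in mm;
    1 ml equals 1000 cubic millimetres.
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0


def relative_volume(ventricle_ml, cranial_vault_ml):
    """Ventricular volume as a fraction of cranial vault volume."""
    if cranial_vault_ml <= 0:
        raise ValueError("cranial vault volume must be positive")
    return ventricle_ml / cranial_vault_ml
```

The ratio in the second function is the kind of normalized quantity that makes volumes comparable across head sizes.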

Scatter plot

In this scatter plot, the x-axis is the lateral ventricular volume while the y-axis is the cranial vault volume. Scans with atrophy are marked in orange, while scans with hydrocephalus are marked in green. Patients with atrophy lie to the right of the majority of individuals, indicating larger ventricles in these subjects. Patients with hydrocephalus lie at the extreme right, with ventricular volumes even higher than those with atrophy.

To make this relationship clearer, we have plotted the distribution of ventricular volume for patients without hydrocephalus or atrophy and for patients with one of these.

ventricular volume distribution

Interestingly, the hydrocephalus distribution has a very long tail, while the distribution of patients with neither hydrocephalus nor atrophy has a narrower peak.

Next, let us observe cranial vault volume variation with age and sex. Bands around solid lines indicate interquartile range of cranial vault volume of the particular group.

cranial vault volume variation

An obvious feature of this plot is that the cranial vault increases in size until the age of 10-20, after which it plateaus. The cranial vault of males is approximately 13% larger than that of females. Another interesting point is that the cranial vault in males grows until the age group of 15-20, while in females it stabilizes at ages 10-15.

Now, let’s plot variation of lateral ventricles with age and sex. As before, bands indicate interquartile range for a particular age group.

lateral ventricular volume variation

This plot shows that ventricles grow in size as one ages. This may be explained by the fact that the brain naturally atrophies with age, leading to relative enlargement of the ventricles. This information can be used as a normal range of ventricle volume for a particular age and gender. Ventricle volume outside this normal range can be indicative of hydrocephalus or a neurodegenerative disease.
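Using the interquartile range of an age and sex group as a reference band can be sketched as follows. The reference volumes in the usage note are made-up toy values, and the function names are hypothetical.

```python
import numpy as np


def interquartile_range(volumes_ml):
    """Return the 25th and 75th percentiles of a group's volumes."""
    q1, q3 = np.percentile(np.asarray(volumes_ml, dtype=float), [25, 75])
    return float(q1), float(q3)


def describe_volume(volume_ml, reference_volumes_ml):
    """Place one measurement relative to its reference group's band.

    Returns (status, low, high) where status is "below", "within" or "above".
    """
    low, high = interquartile_range(reference_volumes_ml)
    if volume_ml > high:
        status = "above"
    elif volume_ml < low:
        status = "below"
    else:
        status = "within"
    return status, low, high
```

With a toy reference group of [20, 28, 40, 54, 60] ml, a measurement of 88 ml is reported as above the band. By construction, half of any reference group falls outside its own interquartile range, so a flag of this kind is a prompt for review rather than a finding in itself.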

While the above plot showed the variation of lateral ventricle volumes across age and sex, it might be easier to visualize the proportion of the lateral ventricles relative to the cranial vault volume. This also has a normalizing effect across sexes, since the difference in ventricular volumes between sexes might be due to the difference in cranial vault sizes.

relative lateral ventricular volume variation

This plot looks similar to the previous one, with the ratio of the ventricular volume to the cranial vault volume increasing with age. Until the age of 30-35, males and females have similar relative ventricular volumes. After that age, however, males tend to have a larger relative ventricular size than females. This is in line with prior research which found that males are more susceptible to atrophy than females [10].

We can incorporate all this analysis into our automated report. For example, the following is the CT scan of a 75-year-old patient and our automated report.

 

CT scan of a 75 Y/M patient.

qER Analysis Report
===================

Patient ID: KSA18458
Patient Age: 75Y
Patient Sex: M

Preliminary Findings by Automated Analysis:

- Infarct of 0.86 ml in left occipital region.
- Dilated lateral ventricles.
  This might indicate neurodegenerative disease/hydrocephalus.
  Lateral ventricular volume = 88 ml.
  Interquartile range for male >=75Y patients is 28 - 54 ml.

This is a report of preliminary findings by automated analysis.
Other significant abnormalities may be present.
Please refer to final report.

Our auto generated report. Added text is indicated in bold.

Discussion

The question of how to establish the ground truth for these measurements still remains to be answered. For this study, we use DICE scores versus manually outlined ventricles as an indicator of segmentation accuracy. Ventricle volumes annotated slice-wise by experts are an insufficient gold standard, not only because of scale but also because of the lack of precision. The places where these algorithms are most likely to fail (and which therefore need more testing) are anatomical variants and pathology that might alter the structure of the ventricles. We have tested some common co-occurring pathologies (hemorrhage), but it would be interesting to see how well the method performs on scans with congenital anomalies and other conditions such as subarachnoid cysts (which caused an earlier machine-learning-based algorithm to fail [9]).

  • Recording ventricular volume on reports is useful for future reference and for monitoring ventricular size in individuals with varying pathologies such as traumatic brain injury and colloid cysts of the third ventricle.
  • It provides an objective measure to follow ventricular volumes in patients who have had shunts and can help in identifying shunt failure.
  • Establishing the accuracy of these automated segmentation methods also paves the way for more nuanced neuroradiology research on a scale that was not previously possible.
  • One can use the data in relation to the cerebral volume and age to define hydrocephalus, atrophy and normal pressure hydrocephalus.

References

  1. Evans, William A. “An encephalographic ratio for estimating ventricular enlargement and cerebral atrophy.” Archives of Neurology & Psychiatry 47.6 (1942): 931-937.
  2. Matsumae, Mitsunori, et al. “Age-related changes in intracranial compartment volumes in normal adults assessed by magnetic resonance imaging.” Journal of neurosurgery 84.6 (1996): 982-991.
  3. Scahill, Rachael I., et al. “A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging.” Archives of neurology 60.7 (2003): 989-994.
  4. Hanson, J., B. Levander, and B. Liliequist. “Size of the intracerebral ventricles as measured with computer tomography, encephalography and echoventriculography.” Acta Radiologica. Diagnosis 16.346_suppl (1975): 98-106.
  5. Gyldensted, C. “Measurements of the normal ventricular system and hemispheric sulci of 100 adults with computed tomography.” Neuroradiology 14.4 (1977): 183-192.
  6. Haug, G. “Age and sex dependence of the size of normal ventricles on computed tomography.” Neuroradiology 14.4 (1977): 201-204.
  7. Toma, Ahmed K., et al. “Evans’ index revisited: the need for an alternative in normal pressure hydrocephalus.” Neurosurgery 68.4 (2011): 939-944.
  8. Ambarki, Khalid, et al. “Brain ventricular size in healthy elderly: comparison between Evans index and volume measurement.” Neurosurgery 67.1 (2010): 94-99.
  9. Yepes-Calderon, Fernando, Marvin D. Nelson, and J. Gordon McComb. “Automatically measuring brain ventricular volume within PACS using artificial intelligence.” PloS one 13.3 (2018): e0193152.
  10. Gur, Ruben C., et al. “Gender differences in age effect on brain atrophy measured by magnetic resonance imaging.” Proceedings of the National Academy of Sciences 88.7 (1991): 2845-2849.