
Qure’s AI for detecting risk of heart failure

Every year, approximately 17.9 million lives are lost to cardiovascular diseases (CVD), the leading cause of death across the world. Reported rates of heart failure misdiagnosis range from 16.1% in hospital settings to 68.5% in GP referral settings. In the EU alone, the economic burden of cardiovascular diseases exceeds €210 Bn.

A systematic analysis of 10 studies done across 5 countries found that patient groups with comorbidities such as COPD, and the elderly in nursing homes, were more likely to have unrecognized heart failure.

Turkey is known to have a higher prevalence of heart failure and atrioventricular septal defect (AVSD) than the Western world. With millions of chest radiographs taken annually for a host of reasons, a tool that could screen these images for early signs of heart failure risk could be ground-breaking for care and patient outcomes.

Output generated by Chest X-ray AI Solution

Enlargement of heart in cases of heart failure

At the start of 2021, the Department of Cardiology at the Mersin University Faculty of Medicine initiated a study under Digital Transformation with Artificial Intelligence in Health, with support from AstraZeneca Turkey, to use Qure's AI solutions to understand the role of AI in predicting heart failure early from incidental findings on chest X-rays. Only patients who had not previously been suspected of or identified with signs of heart failure were included in the study.

After risk assessment on chest radiographs using the AI tool, the department approached at-risk patients for follow-up tests. A large number of patients were identified as at-risk, but since the study was conducted during pandemic restrictions, not all of them returned to the hospital for follow-up. Of the high-risk patients who did come for follow-up tests, 86% were confirmed as heart failure patients through confirmatory diagnostics such as NT-proBNP and echocardiography.

The results of this year-long exercise have the potential to change the use of AI in cardiology altogether.

Prof. Dr. Ahmet Çelik, President of the Heart Failure Working Group of the Turkish Society of Cardiology and Principal Investigator of this research, said,

“In this study, which was carried out for the early diagnosis of heart failure, the power of artificial intelligence to predict heart failure by looking at lung X-rays was realized with a sensitivity of 89.1 percent and a selectivity of 86.4 percent. More importantly 65.3 percent of patients diagnosed with heart failure had Preserved Ejection Fraction Heart Failure which is difficult to diagnose.”

Qure’s AI solution has been found to have over 95% sensitivity for both cardiomegaly and pleural effusion. It could be a game-changer as a silent reader, flagging risk without increasing the workload of healthcare professionals or adding significant cost by changing care pathways. It could screen chest radiographs taken worldwide on unsuspecting cases, adding thousands of undiagnosed patients to the cardiology risk assessment, diagnosis, and eventually treatment pathway. With a well-thought-through system for detection and diagnosis, this technology could mean more lives saved with minimal additional investment.

AstraZeneca Middle East and Africa Region Medical Director Dr. Viraj Rajadhyaksha stated,

“By applying advanced artificial intelligence and machine learning approaches to patients who go to different units for many reasons, this project will enable them to touch the lives of patients who are diagnosed early and to meet the right treatments much earlier. The results of the research have the potential to create an early detection tool for heart failure for the first time in the world.”

The AstraZeneca team aims to expand the project nationally and apply it to every chest X-ray taken. There has been research exploring AI for X-ray-based cardiac failure detection in study settings; however, the potential impact on patients has not been demonstrated at such a scale before. This opens up opportunities for further focused research to establish protocols for bringing the technology into clinical practice.

Prof. Dr. Ahmet Çamsari, the Rector of Mersin University, is a strong believer in the potential of AI to impact the diagnostic pathway for patients. He said,

“Our project will be one of the first projects where artificial intelligence is used in the early diagnosis of undiagnosed and suspected heart failure patients in our country and even in the world. In line with the results obtained, we aim to expand the project nationally and apply it to every lung x-ray taken. Again, we hope that these systems can be used in other fields such as radiological oncology and that artificial intelligence projects that touch the lives of patients can be implemented.”


Aarthi Scans: Scripting tele-radiology growth in India

The rapidly increasing need for radiology diagnostic and image-interpretation services around the world has brought two major issues to light: a shortage of radiologists and a dearth of specialized expertise.

Building reliable communication and image transfer systems to tap into the expertise of off-site radiologists can solve these issues to some extent. Hospitals, mobile imaging firms, urgent care clinics, and even some private practices around the world are increasingly using tele-radiology. Tele-radiology improves patient care by allowing radiologists to provide services without being physically present at the imaging site, giving patients round-the-clock access to trained specialists.

Tele-radiology is significantly less expensive than having a radiologist on-site. These services are typically priced per exam, with costs as low as $1 per X-ray. Tele-radiology has transformed the practice of many radiology clinics around the world, allowing them to deliver results faster and, by facilitating access to the radiologist, adding enormous value to the diagnostic process.

To set up a tele-radiology system between two centres, one requiring a radiologist's services and one providing them, the following elements are needed:

  1. Modality – a system that captures the medical image and can send it in the preferred format, i.e., DICOM
  2. PACS – a system that stores, sends, and receives medical images (DICOMs) and is identifiable by a unique address (IP address, port number, etc.)
  3. Gateway – a medium that handles communication between the two centers (source and destination): it receives the medical images from the source and sends back the output in the required format using API calls. To comply with data security and Protected Health Information (PHI) standards, the Gateway can de-identify and re-identify confidential information.
  4. API Hub – a single place where all the Application Programming Interfaces (APIs) can be published and shared with external parties/clients by the tele-radiology service provider
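The Gateway's de-identification step described above can be sketched in a few lines. This is a minimal illustration, not Qure's actual implementation: it works on a plain dictionary of metadata rather than real DICOM tags, and the tag names and tokenisation scheme are assumptions for the example.

```python
# Minimal sketch of a gateway de-identification step: PHI fields are
# replaced with opaque tokens before metadata leaves the source centre,
# and restored from a locally held lookup table when results return.
import hashlib

# Illustrative field names; a real gateway would operate on standard
# DICOM attributes such as PatientName (0010,0010).
PHI_TAGS = {"PatientName", "PatientID", "PatientBirthDate"}

def deidentify(metadata: dict, secret: str):
    """Strip PHI fields, returning clean metadata plus a lookup table
    that never leaves the source centre."""
    lookup, clean = {}, {}
    for tag, value in metadata.items():
        if tag in PHI_TAGS:
            token = hashlib.sha256(f"{secret}:{tag}:{value}".encode()).hexdigest()[:16]
            lookup[token] = value
            clean[tag] = token
        else:
            clean[tag] = value
    return clean, lookup

def reidentify(metadata: dict, lookup: dict) -> dict:
    """Restore original PHI values from the locally held lookup table."""
    return {tag: lookup.get(value, value) for tag, value in metadata.items()}

study = {"PatientName": "DOE^JOHN", "PatientID": "12345", "Modality": "CR"}
clean, table = deidentify(study, secret="site-token")
assert clean["Modality"] == "CR" and clean["PatientName"] != "DOE^JOHN"
assert reidentify(clean, table) == study
```

Keeping the lookup table only at the source centre is what allows the destination to process fully anonymised studies while results can still be matched back to the right patient.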

Introducing Aarthi Scans: India’s largest Tele-radiology service provider

India, with its population of around 1.44 billion, currently has roughly one radiologist per 100,000 people. Most of rural India still lacks adequate radiological services and personnel, and not all imaging centers have subspecialty expertise, so tele-radiology plays a significant role in quality diagnostics.

Aarthi Scans and Labs, one of India's largest diagnostic chains, was started in the year 2000 by Mr. Govindarajan. Today Aarthi Scans has more than 100 branches across 10 states. In 2011, they started their tele-radiology services to provide quick reporting for emergency cases and nighttime reporting, and to ensure continuous reporting even when radiologists go on leave. Their radiologists review over 200,000 CTs, 250,000 MRIs, and 2.1 million chest X-rays annually.

“At that point of time in 2011, when we started tele-radiology, telemedicine as a concept had not evolved in India. Radiologists giving reports without being present at the scanning location was viewed with skepticism. But once the referring doctors started viewing the benefits of tele-radiology, like nighttime reports, subspecialty reporting, they were impressed. Radiologists also needed a lot of convincing to report tele-radiology images. We standardised and digitised patient history, records, improved communication channels between tele-radiologists and radiographer in scanning sites and there was a slow and steady adoption by the Radiologist community. Our PACS vendor – Mr Ravindran from Innowave Healthcare Technologies helped us a great deal in solving the workflow related issues and helping us choose the right technology for us.”

– Mr. Govindarajan, Aarthi Scans and Labs

Aarthi Scans has taken a step forward by incorporating Artificial Intelligence (AI) into their reporting procedure, demonstrating their commitment to staying on the cutting edge of technology. The ratio of one radiologist to more than 100,000 people in India has resulted in stress on radiology reporting, scan misreads, and reporting delays. Any solution that can help radiologists work with less strain and greater productivity is always welcome at the technologically advanced setup at Aarthi Scans, and AI can add immense value in this scenario.

Qure's qXR, a chest X-ray interpretation software and a CE Class II certified product, has been installed in Aarthi Scans diagnostic centers. The most common application of qXR in this setting is radiologist assistance: triaging scans with abnormalities on the worklist. The images are scanned and interpreted in under a minute. All scans in which qXR identifies findings are saved as drafts in the radiology worklist for further assessment and reporting. The report is generated in natural language, significantly reducing the typing time that constitutes a large portion of reporting time. Thanks to this triaging mechanism and reading assistance by qXR, the final report is released in 30% less time.

"Qure is a leading solution provider, and we validated a few solutions before choosing. We chose qXR because the accuracy in categorising a study as normal or abnormal is very high (95%)! We have been using qXR in our day-to-day radiological practice across India in all our branches. We are huge fans of qXR's accuracy and utility."

– Dr. Arunkumar Govindarajan, Director, Aarthi Scans and Labs

Technical Integration

Qure's PACS gateway for acquiring the chest X-rays is integrated with Freedom Nano PACS, which is present in every Aarthi Scans center. Each center is authenticated with a unique token for the transmission of studies and their corresponding results. The end-to-end transmission is supported by API calls that communicate with the Qure API hub to send the studies to the qXR AI models for processing. The AI interpretations are sent back to Freedom Nano PACS, where the radiologist can view the results from the individual centers. API- and HTTPS-based communication keeps the data secure even on the cloud.
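The per-centre, token-authenticated hand-off described above can be sketched as follows. The endpoint path, header names, and payload shape are illustrative assumptions, not Qure's actual API; the request is assembled as a dictionary rather than sent, to keep the sketch offline.

```python
# Sketch of an authenticated study upload to an AI hub over HTTPS.
# URL, headers, and payload structure are hypothetical examples.
import base64
import json

API_HUB = "https://api.example-hub.com/v1"  # placeholder URL
CENTRE_TOKEN = "unique-centre-token"        # one token issued per centre

def build_upload_request(study_uid: str, dicom_bytes: bytes) -> dict:
    """Assemble the HTTPS request that would forward a study to the AI
    models; returned as a dict here instead of being transmitted."""
    return {
        "url": f"{API_HUB}/studies",
        "headers": {
            # The bearer token authenticates the sending centre
            "Authorization": f"Bearer {CENTRE_TOKEN}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "study_uid": study_uid,
            "dicom": base64.b64encode(dicom_bytes).decode(),
        }),
    }

req = build_upload_request("1.2.840.113619.2.55", b"\x00\x00DICM")
assert req["headers"]["Authorization"].startswith("Bearer ")
```

In a deployment, the gateway would POST this request and later receive the AI interpretation on a callback or polling endpoint, routing it back into the PACS worklist.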

Since partnering with Aarthi Scans four months ago, qXR has processed over 45,000 scans and triaged the 55% of scans with abnormal findings. On a daily basis, qXR processes around 200 chest X-rays.

Scans classified as normal by qXR can be evaluated by the radiologist more quickly, leaving more time to review the abnormal scans and cutting overall reporting time; this has reduced the TAT by 30%. The qXR Secondary Capture output also uses contours to localize anomalies in the lungs, enabling radiologists to recognize abnormalities more accurately without spending as much time as on a regular read.

“Finalising a chest X-ray report as ‘normal’ is like passing through the valley of uncertainty for every radiologist, and qXR is like that friendly colleague who assists you with a second opinion or confirmation without bias. qXR saves our radiologists’ time and removes doubt while reporting. qXR has resulted in a 30% reduction in reporting time for our radiologists.”

– Dr. Aarthi, Director, Aarthi Scans and Labs

Insights into incorporating AI into the practice – Dr. Arunkumar Govindarajan

“Learning the basics of Artificial Intelligence (AI) has helped me a lot to understand its inner workings and terminology. To start and go deeper into AI, one can take Coursera courses like “AI for Everyone” by Andrew Ng and “AI for Medical Diagnosis” by DeepLearning.AI. There are a lot of AI solutions out there; one can research on Google to find which will suit your patients’ and radiologists’ needs. Once you fix a good AI vendor –

  • do your own validation and provide transparent feedback
  • partner with a good IT & PACS vendor and fix a workflow suited to your organization’s needs. AI integration into PACS takes a bit of effort from the AI and PACS vendor. Be present during the meetings to ease the process and quickly resolve doubts”

What’s Next?

After successful operations with qXR in all our centers, we will next deploy Qure's AI solution qER for detecting brain abnormalities from head CT scans.

“Time is Brain and quick qER report never goes in vain” – Dr. Arunkumar Govindarajan, Director, Aarthi Scans and Labs


qCT-Lung: Catching lung cancer early

In this blog, we will unbox qCT-Lung, our latest AI-powered product that analyses chest CT scans for lung cancer. At Qure, we have always taken a holistic approach towards building solutions for lung health. qXR provides automated interpretation of chest X-rays and is complemented by qTrack, a disease and care pathway management platform with AI at its core. qCT-Lung augments our lung health suite with the ability to detect lung nodules and emphysema on chest CTs and analyze their malignancy. It can quantify and track nodules over subsequent scans. qCT-Lung is a CE certified product.

qCT-Lung banner



Medical imaging has seen some of the biggest healthcare advancements in artificial intelligence (AI), and lung health has been at the forefront of these improvements. Lung health has also been a key domain of our product portfolio. We've built AI algorithms like qXR, which provides automated interpretation of chest X-rays. We augmented its capabilities with qTrack, our AI-powered disease management platform, which solves for active case finding and tracking patients in care pathways. These applications have empowered healthcare practitioners at all stages of the patient journey in TB, COVID-19, and lung cancer screenings.

We’re adding a new member to our lung health suite: qCT-Lung. Its AI-powered algorithms can interpret chest CTs for findings like lung nodules & emphysema, and analyze their malignancy. It empowers clinicians to detect lung cancer in both screening programs as well as opportunistic screening settings.

qXR & qCT-Lung’s abilities to support clinicians with detection of lung cancer on chest X-rays & CTs complement qTrack’s disease management & patient tracking capability. Together, they round up our lung health portfolio to make it a comprehensive, powerful & unique offering.

Lung Cancer – The most fatal cancer

Lung cancer is the second most common cancer in both men and women; 2.2 million people were diagnosed with it worldwide in 2020 [1]. With 1.74 million deaths, lung cancer is also the leading cause of cancer-related deaths (18.4%), resulting in more deaths than the second and third deadliest cancers combined (colorectal, 9.2%, and stomach, 8.2%).

Future projections don't look good either: lung cancer incidence is projected to rise by 38% and mortality by 39% by 2030 [2].

There are two main types of lung cancer:

  • Non-small cell lung cancer (NSCLC): NSCLC accounts for 80-85% of all lung cancer cases. Its major subtypes are adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. They are grouped together because of similarities in treatment and prognosis.
  • Small cell lung cancer (SCLC): SCLC tends to grow and spread faster than NSCLC; 10-15% of all lung cancers are SCLC.

There are also cancers that start in other organs (like the breast) and spread to the lung, but these are not classified as lung cancer.

Early detection & outcomes

Survival rates

The 5-year survival rate measures what percentage of people live at least 5 years after the cancer is found. The 5-year survival rates for NSCLC and SCLC are as follows [4]:

Lung Cancer Survival rates


The data shows that lung cancer mortality can be reduced significantly if detected & treated early.

Early detection

Data from England shows that the chance of surviving for at least a year decreases from 90% to 20% between the earliest and most advanced stages of lung cancer [5]. The WHO describes two components of early detection [6]:

Early diagnosis

Early identification of cancer results in better response to treatment, greater chances of survival, less morbidity, and less expensive treatment. It comprises three components:

  • awareness of the early symptoms of lung cancer, like persistent cough, coughing up blood, pain when breathing, continuous breathlessness, loss of appetite, and unexplained weight loss [7]
  • access to clinical evaluation and diagnostic services
  • timely referral to treatment services


Screening

Screening is aimed at identifying individuals with findings suggestive of lung cancer before they have developed symptoms. Further tests are then conducted to establish whether a diagnosis should be pursued or a referral for treatment made. Screening is effective because the symptoms of lung cancer often do not appear until the disease is already at an advanced stage.

Lung Cancer Screening Programs

Screening programs use regular chest X-rays and low-dose CT (LDCT) scans to examine people at higher risk of developing lung cancer. CT scans have proven more effective than X-rays, resulting in a 20% reduction in lung cancer-specific deaths [2]. However, X-rays are more accessible and cheaper, and thus remain important in low-income settings.

The U.S. Preventive Services Task Force (USPSTF) recommends yearly lung cancer screening with LDCT for people who [9]:

  • Have a 20 pack-year or more smoking history, and
  • Smoke now or have quit within the past 15 years, and
  • Are between 50 and 80 years of age.
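The three USPSTF criteria above combine with a logical AND, which makes them easy to express as a single predicate. This is a sketch of the eligibility logic only, not clinical software; the function name and parameters are illustrative.

```python
# The USPSTF yearly LDCT screening criteria as one boolean check:
# 20+ pack-years AND (current smoker OR quit within 15 years) AND age 50-80.
from typing import Optional

def uspstf_ldct_eligible(age: int, pack_years: float,
                         smokes_now: bool,
                         years_since_quit: Optional[float]) -> bool:
    """True when yearly LDCT screening is recommended under the
    criteria listed above."""
    smoking_status_ok = smokes_now or (
        years_since_quit is not None and years_since_quit <= 15
    )
    return 50 <= age <= 80 and pack_years >= 20 and smoking_status_ok

# A 62-year-old with a 30 pack-year history who quit 10 years ago qualifies:
assert uspstf_ldct_eligible(age=62, pack_years=30,
                            smokes_now=False, years_since_quit=10)
# A 45-year-old current smoker does not (below the age threshold):
assert not uspstf_ldct_eligible(age=45, pack_years=30,
                                smokes_now=True, years_since_quit=None)
```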

Challenges in radiology screening today

Chest CTs are more accurate than chest X-rays for identifying thoracic abnormalities, thanks to the absence of superimposition and their greater contrast and spatial resolution. However, there are many challenges in identifying and reporting lung cancer on chest CTs. These challenges can be divided into the following categories:


Misdiagnosis

A study revealed that 42.5% of malpractice suits against radiologists are due to failure to diagnose lung cancer [14]. These lawsuits can cost as much as $10M [15]. Misdiagnosis can occur for two reasons [11]:

  • Lesion characteristics: Small dimensions, poor conspicuousness, ill-defined margins, and central location are the most common lesion characteristics that lead to missed lung cancers.
  • Observer error: There are multiple sources of observer error:
    • Recognition error: missed detection of lesions.
    • Decision-making error: the characteristics of a detected malignant lesion are inaccurately interpreted as benign or normal.
    • Satisfaction-of-search error: the observer fails to keep searching for further abnormalities after identifying an initial one, typically by ceasing the search early in a positive exam or by focusing on the wrong part of the exam.

Analysis & Tracking

After detecting a lesion, a major challenge is to analyse its characteristics and determine malignancy. Even when a lesion's malignancy is determined correctly, tracking it over subsequent scans is challenging for screening programs due to the lack of appropriate CAD software and tools.

Structured reporting & Follow-ups

Structured reporting helps categorize results and recommend follow-ups based on the likelihood of malignancy, considering the size, appearance, and growth of the lesion. Further, volume measurement and volume doubling time (VDT) have been proposed in the management protocol of the NELSON lung cancer screening trial [13]. All these metrics are challenging to calculate and report in the absence of appropriate tools, which makes it hard to standardize follow-up recommendations based on guidelines like the Fleischner Society criteria or Lung-RADS scores.
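Volume doubling time is the kind of metric that is tedious by hand but trivial with a tool. Assuming exponential growth, VDT = Δt · ln 2 / ln(V₂/V₁), with volumes often approximated from diameter by treating the nodule as a sphere (V = π d³ / 6). A worked sketch:

```python
# Volume doubling time (VDT) under an exponential growth assumption,
# as used in volumetric protocols like the NELSON trial.
import math

def sphere_volume(diameter_mm: float) -> float:
    """Approximate nodule volume (mm^3) from diameter, assuming a sphere."""
    return math.pi * diameter_mm ** 3 / 6.0

def volume_doubling_time(v1: float, v2: float, days_between: float) -> float:
    """Days for the nodule to double in volume: dt * ln 2 / ln(v2 / v1)."""
    return days_between * math.log(2) / math.log(v2 / v1)

# Example: a 6 mm nodule growing to 7 mm over 90 days.
v1, v2 = sphere_volume(6.0), sphere_volume(7.0)
vdt = volume_doubling_time(v1, v2, 90)
assert 130 < vdt < 140  # roughly 135 days: fast growth, hence suspicious
```

Note how a visually modest 1 mm diameter change corresponds to a ~59% volume increase, which is why volumetry is more sensitive to growth than caliper measurements.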

Detecting relevant co-findings

Certain other pulmonary findings, like COPD (chronic obstructive pulmonary disease), are independent risk factors for lung cancer. Lung cancer screening subjects have a high prevalence of COPD, which accounts for significant morbidity and mortality.
One of the major benefits of quantifying emphysema (a type of COPD) in lung cancer screening patients is earlier diagnosis and treatment of COPD with smoking-cessation strategies, potentially leading to fewer COPD-related hospitalizations.

Time constraints

Interpreting CT scans is time-intensive. A CT scan can have 16 to 320 slices, compared with one or two images in an X-ray, and radiologists spend 5-10 minutes interpreting and reporting each CT scan.

For chest CTs, detecting small nodules through hundreds of slices consumes a lot of time. There are tools that help with some of these issues but none of them solve for lung cancer screening comprehensively.

qCT-Lung: AI powered lung nodule interpretation tool

qCT-Lung empowers lung cancer screening programs and facilitates opportunistic screening by detecting malignant lesions using AI. It aims to help clinicians with all the issues discussed in the previous section: misdiagnosis, analysis, reporting, detection of co-findings, and time constraints. The algorithm is trained on more than 200,000 chest CTs and can detect, analyze, monitor, and auto-report lung nodules.
This is how qCT-Lung assists clinicians in interpreting chest CTs for lung nodules:

Detecting & Quantifying Lesions


Secondary Capture with detected nodule

qCT-Lung can distinguish lung lesions from complex anatomical structures on chest CTs and minimize the instances of lung cancers going undetected by preventing nodules from being overlooked on scans. Faster and more accurate detection helps decrease time to treatment and improves patient outcomes.


  • Detects lung nodules as small as 3 mm with high accuracy (sensitivity of 95% with fewer than one false positive per scan)
  • Detects emphysema
  • Reduces the chance of missed nodules
  • Auto quantification of diameter & volume

Analysis & Growth Monitoring

Nodule Analysis & Malignancy Risk


qCT-Lung analyzes nodule characteristics to determine malignancy. The algorithm also assigns a malignancy risk score to each nodule, helping clinicians plan treatment.


  • Analyses localization, spiculation, size, calcification & texture (solid, sub-solid & ground glass nodules)
  • Calculates Malignancy Risk Score
  • Measures volumetry and tracks growth of nodules
  • Predicts nodule volume doubling time
  • Precisely quantifies response to treatment

Reporting Assistance

Pre-filled report with suggested follow-ups


qCT-Lung offers clinicians faster reporting through pre-populated results, reducing the time to further diagnosis and treatment. It can also recommend timelines for follow-up scans.


  • Automates reporting to save time and reduce reporting workload
  • Pre-fed with the Lung-RADS and Fleischner Society guidelines to suggest follow-ups
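The follow-up suggestion logic can be pictured as a lookup from assessment category to recommended interval. The mapping below is a simplified condensation of the Lung-RADS categories for illustration; the real guideline adds size, growth, and PET/CT qualifiers, and this is not Qure's actual rule set.

```python
# Simplified Lung-RADS category -> suggested follow-up mapping.
# Condensed for illustration; the full guideline has many qualifiers.
FOLLOW_UP = {
    "1":  "Continue annual screening with LDCT in 12 months",
    "2":  "Continue annual screening with LDCT in 12 months",
    "3":  "LDCT in 6 months",
    "4A": "LDCT in 3 months; PET/CT may be considered",
    "4B": "Diagnostic chest CT, PET/CT and/or tissue sampling",
    "4X": "Diagnostic chest CT, PET/CT and/or tissue sampling",
}

def suggest_follow_up(category: str) -> str:
    """Return the follow-up suggestion for a Lung-RADS category."""
    return FOLLOW_UP.get(category.upper(), "Category not recognised")

assert suggest_follow_up("3") == "LDCT in 6 months"
```

Encoding the guideline as data rather than branching logic makes it straightforward to update when a new guideline version is adopted.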

Modifiable Results

qCT-Lung also offers a lung nodule reporting platform designed for screening programs. It enables clinicians to choose which nodules to include in the report and to add new ones. The platform pre-populates the image viewer with nodules identified by qCT-Lung; clinicians can then exclude nodules from this list or add new ones. The final list, after these changes, is sent to the RIS.


The platform empowers physicians to modify the results generated by qCT-Lung and report on what matters most to them.

Qure’s Lung Health suite: A 3-pronged approach

Qure's Lung Health Suite

We have built an end-to-end portfolio for managing lung cancer screening in all kinds of resource settings. Lung cancer screening has many challenges: while CT is the recommended imaging modality, resource-limited settings must depend on X-rays for their cost benefit and easy availability, and patient tracking, disease management, and long-term follow-up for high-risk individuals are also challenging. Our comprehensive lung health suite addresses these challenges.

  1. qXR – our chest X-ray interpretation algorithm detects lung nodules on X-rays with high accuracy.
  2. qCT-Lung does the same on chest CTs.
  3. qTrack is built and designed for community screening to track an individual’s disease and manage care pathways.

Together, these solutions can help in active case screening, monitoring disease progression, reducing turn-around-time, linking care to treatment, & improving care pathways.

Write to us at to integrate qCT-Lung in your lung nodule management pathway.


  1. Key Statistics for Lung Cancer
  2. Lung Cancer Fact Sheet
  3. What Is Lung Cancer?
  4. Lung Cancer Survival Rates
  5. Cancer Research UK: Why is early diagnosis important?
  6. WHO: Fact Sheet on Cancer
  7. NHS UK: Lung Cancer Symptoms
  8. Can Lung Cancer Be Found Early?
  9. CDC: Who Should Be Screened for Lung Cancer?
  10. National Lung Screening Trial Research Team, Aberle DR, Berg CD, et al. “The National Lung Screening Trial: overview and study design.” Radiology. 2011;258(1):243–253.
  11. del Ciello A, et al. “Missed lung cancer: when, where, and why? Diagnos.” Intervent. Radiol. 2017;23:118–126. doi: 10.5152/dir.2016.16187.
  12. Widmann, G. “Challenges in implementation of lung cancer screening—radiology requirements.” memo 12, 166–170 (2019).
  13. Dong Ming Xu, Hester Gietema, Harry de Koning, René Vernhout, Kristiaan Nackaerts, Mathias Prokop, Carla Weenink, Jan-Willem Lammers, Harry Groen, Matthijs Oudkerk, Rob van Klaveren, “Nodule management protocol of the NELSON randomised lung cancer screening trial”, Lung Cancer, Volume 54, Issue 2, 2006, Pages 177-184, ISSN 0169-5002
  14. Baker SR, Patel RH, Yang L, Lelkes VM, Castro A 3rd. “Malpractice suits in chest radiology: an evaluation of the histories of 8265 radiologists.” J Thorac Imaging. 2013 Nov;28(6):388-91.
  15. HealthImaging: Lung cancer missed on CT prompts $10M lawsuit against U.S. government


Time is Brain: AI helps cut down stroke diagnosis time in the Himalayan foothills

Stroke is a leading cause of death, and stroke care is limited by the availability of specialized medical professionals. In this post, we describe a physician-led stroke unit model established at Baptist Christian Hospital (BCH) in Assam, India. Enabled with qER, Qure's AI-driven automated head CT interpretation tool, BCH can quickly and easily determine the next steps in treatment, and we examine the implications for clinical outcomes.

qER at a Stroke unit

Across the world, stroke is a leading cause of death, second only to ischemic heart disease. According to the World Stroke Organization (WSO), 13.7 million new strokes occur each year and there are about 80 million stroke survivors globally. In India, as per the Health of the Nation's States Report, stroke has an incidence rate of 119 to 152 per 100,000 and a case fatality rate of 19 to 42% across the country.

Catering to tea plantation workers in and around the town of Tezpur, Baptist Christian Hospital, Tezpur (BCH) is a 130-bed secondary care hospital in the northeastern state of Assam, India. The hospital is a unit of the Emmanuel Hospital Association, New Delhi. From humble beginnings offering basic dispensary services, it grew to become one of the best healthcare providers in Assam, heavily involved in academic and research work at both national and international levels.

Nestled below the Himalayas and interspersed with large tea plantations, the region's Assamese indigenous population and tea garden workers show a prevalence of hypertension, the largest single risk factor for stroke, reportedly between 33% and 60.8%. Anecdotal reports and hospital-based studies indicate a huge burden of stroke in Assam, a significant portion of which is addressed by the Baptist hospital. A recent study showed that hemorrhagic strokes account for close to 50% of the cases here, compared with only about 20% of strokes in the rest of India.

Baptist Christian Hospital

Baptist Christian Hospital, Tezpur. Source

Challenges in Stroke Care

One of the biggest obstacles in stroke care is the lack of awareness of stroke symptoms and the late arrival of patients, often at smaller peripheral hospitals that are not equipped with the necessary scanning facilities and specialists, leading to delays in effective treatment.

The doctors and nurses of the Stroke Unit at BCH, Tezpur were trained online by specialist neurologists, who in turn trained the rest of the team on a protocol that included stroke clinical assessment, monitoring of risk factors and vital parameters, and other supportive measures like swallow assessment, in addition to starting the rehabilitation process and advising on long-term care at home. A study done at Tezpur indicated that after the Stroke Unit was established, there was a significant improvement in quality of life along with a reduction in deaths compared to the pre-Stroke Unit phase.

This is a crucial development in stroke care, especially in low- and middle-income countries (LMICs) like India, to strengthen the smaller peripheral hospitals that lack specialists and are almost always the first stop for patients in emergencies like stroke.

Stroke pathway barriers

This representative image details the acute stroke care pathway. Source

The guidelines for management of acute ischemic stroke involve capturing a non-contrast CT (NCCT) study of the brain along with CT or MRI angiography and perfusion, and thrombolysis (administration of rTPA, tissue plasminogen activator) within 4.5 hours of symptom onset. Equipped with a CT machine and teleradiology reporting, the physicians at BCH provide primary intervention for these stroke cases after a basic NCCT and may refer them to a tertiary facility, as applicable. They follow a telestroke model: in cases where thrombolysis is required, the ER doctors consult with neurologists at a more specialized center, and decisions are made after sharing the NCCT images via phone-based media like WhatsApp, while severe cases of head trauma are referred for further management to faraway tertiary facilities. Studies of a physician-based stroke unit model in Tezpur have shown an improvement in treatment outcomes.

How is Qure helping BCH with stroke management?

BCH and Qure have worked closely since the onset of the COVID-19 pandemic, especially at a time when confirmatory RT-PCR kits were limited. qXR, Qure's AI-aided chest X-ray solution, proved to be a beneficial addition for identifying COVID-19 suspects, particularly asymptomatic ones, and for their treatment and management, beyond its role in comprehensive chest screening.

qER messages


In efforts to improve the workflow of stroke management and care at the Baptist hospital, qER, an FDA approved and CE certified software that can detect 12 abnormalities, was deployed. The abnormalities, including five types of intracranial hemorrhage, cranial fractures, mass effect, midline shift, infarcts, hydrocephalus, and atrophy, are detected within 1-2 minutes of the CT being taken. qER has been trained on CT scans from more than 22 different CT machine models, making it hardware agnostic. In addition to offering a pre-populated radiology report, the HIPAA compliant qER solution can also label and annotate the abnormalities in the key slices.

Since qER integrates seamlessly with the existing technical framework of the site, deployment of the software was completed in less than an hour, along with setting up a messaging group for the site. Soon after, within minutes of a head CT being taken, qER analyses were available in the PACS worklist, along with messaging alerts for the physicians’ and medical team’s review on their mobile phones.
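The alert flow described above can be sketched as a simple routing rule: when an analysis lands in the worklist, critical findings trigger a message for the care team. This is an illustrative sketch only; the finding names, the `compose_alert` helper, and the message format are assumptions, not qER's actual interface.

```python
# Hypothetical sketch of the alert flow: check an AI result for critical
# findings and, if any are present, compose an alert message for the
# messaging group. Finding names are illustrative.

CRITICAL_FINDINGS = {"intracranial_hemorrhage", "midline_shift", "fracture", "infarct"}

def compose_alert(study_id, findings):
    """Return an alert string if any critical finding is flagged, else None."""
    flagged = sorted(f for f, present in findings.items()
                     if present and f in CRITICAL_FINDINGS)
    if not flagged:
        return None
    return f"Study {study_id}: critical findings - {', '.join(flagged)}"

# Example: a scan flagged for hemorrhage and midline shift
alert = compose_alert("CT-1042", {
    "intracranial_hemorrhage": True,
    "midline_shift": True,
    "atrophy": False,
})
print(alert)
```

A normal scan returns `None`, so only critical cases interrupt a busy casualty room, which matches the workflow the physicians describe below.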

The aim of this pilot project was to evaluate how qER could add value to a secondary care center where responsibility for determining medical intervention falls on the physicians, based on a teleradiology report that reaches them within 15-60 minutes. As is established in stroke care, every minute saved is precious.

Baptist Christian Hospital

Physician using qER

At the outset, there were apprehensions among the medical team about the software’s performance and its efficacy in improving the workflow. However, this is what they have to say about qER after two months of operation:

“qER is good as it alerts the physicians in a busy casualty room even without having to open the workstation. We know if there are any critical issues with the patient” – Dr. Jemin Webster, a physician at Tezpur.

He goes on to explain how qER grabs the attention of the emergency room doctors and nurses for critical cases that need intervention or, in some instances, referral. It boosts the confidence of the treating doctors in making the right judgement in the clinical decision-making process. It also draws the teleradiology team’s attention to the notified critical scans, as well as the scans of stroke cases that are within the window period for thrombolysis. Dr. Jemin also sees the potential of qER in the workflow of high-volume, multi-specialty referral centers, where coordination between multiple departments is required.

The Way Ahead

A technology solution like qER can reduce the time to diagnosis in emergencies like stroke or trauma and boost the confidence of the stroke unit, even in the absence of specialists. The qER platform can help stroke neurologists in telestroke settings access high-quality scans even on their smartphones and guide the treating doctors on thrombolysis and further management. Scaling this technology up to stroke units and MSUs can empower peripheral hospitals to manage acute stroke, especially in LMICs.

We intend to conduct an observational time-motion study to analyze the door-to-needle time with qER intervention via instant reports and phone alerts as we work through the required approvals. Also in the pipeline is a performance comparison of qER reporting against the radiologist’s report as ground truth, along with a comparison of clinical outcomes and these parameters before and after introduction of qER into the workflow. We also plan to extend the pilot project to Padhar Mission Hospital, MP, and the Shanthibhavan Medical Center, Simdega, Jharkhand.
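The planned time-motion analysis boils down to comparing door-to-needle (DTN) time distributions before and after the qER alerts. A minimal sketch of that comparison, using made-up example values (not study data):

```python
# Illustrative sketch of a door-to-needle (DTN) comparison.
# The numbers below are invented for demonstration only.
from statistics import median

dtn_before = [62, 75, 58, 90, 70]   # minutes, hypothetical pre-qER cases
dtn_after = [41, 55, 38, 60, 47]    # minutes, hypothetical post-qER cases

print("median DTN before:", median(dtn_before))  # 70
print("median DTN after:", median(dtn_after))    # 47
```

In the actual study, each timestamp would come from the instant reports and phone alerts logged by the system rather than hand-entered lists.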

The Qure team is also working on a comprehensive stroke platform aimed at improving stroke workflows in LMICs and low-resource settings.


Smarter City: How AI is enabling Mumbai battle COVID-19

When the COVID-19 pandemic hit Mumbai, one of the most densely populated cities in the world, the Municipal Corporation of Greater Mumbai (MCGM) promptly embraced newer technologies while creatively utilising available resources. Here is a deeper dive into how the versatility of chest X-rays and Artificial Intelligence helped the financial capital of India in its efforts to contain this pandemic.

The COVID-19 pandemic is one of the most demanding adversities that the present generation has had to witness and endure. The highly virulent novel Coronavirus has posed a challenge like no other to the most sophisticated healthcare systems the world over. Given the brisk transmission, it was only a matter of time before the virus spread to Mumbai, the busiest city of India, with a population more than 1.5 times that of New York.

The resilient Municipal Corporation of Greater Mumbai (MCGM) swiftly sprang into action, devising multiple strategies to test, isolate, and treat in an attempt to contain the pandemic and avoid significant damage. Given their availability and effectiveness, chest X-rays were identified as an excellent tool to rule in cases that needed further testing, ensuring that no suspected case was missed. Though Mumbai saw a steeper rise in cases than any other city in India, MCGM’s efforts across various touchpoints in the city were augmented using Qure’s AI-based X-ray interpretation tool, qXR, and the extension of its capabilities and benefits.

In the latter half of June, MCGM launched the MISSION ZERO initiative, a public-private partnership supported by the Bill & Melinda Gates Foundation, Bharatiya Jain Sanghatana (BJS), Desh Apnayen, and CREDAI-MCHI. Mobile vans with qXR-equipped digital X-ray systems were stationed outside various quarantine centers in the city. Individuals identified by on-site physicians from various camps as being at high risk of COVID-19 infection were directed to these vans for further examination. Based on their clinical and radiological indications, the screened individuals were requested to proceed for isolation, RT-PCR testing, or continued isolation in the quarantine facility. Our objective was to reduce the load on the centers by continuously monitoring patients and discharging those who had recovered, making room for new patients to be admitted and ensuring optimal utilization of resources.

A patient being screened in a BJS van equipped with qXR

The approach adopted by MCGM was multi-pronged to ascertain that no step of the pandemic management process was overlooked:

  • Triaging high-risk and vulnerable individuals and increasing case detection in a mass screening setting to contain community transmission (11.4% of individuals screened)
  • Managing patients in critical care units to control mortality rates
  • Supporting the existing healthcare framework by launching the MISSION ZERO initiative and using chest X-ray based screening for optimum utilization of beds at quarantine centers

Learn more about qXR COVID in our detailed blog here

Triaging and Improvement in Case Finding

Kasturba Hospital and HBT Trauma Center were among the first few COVID-19 testing centers in Mumbai. However, due to the overwhelming caseload, it was essential that they triage individuals flowing into fever clinics for optimal utilization of testing kits. The two centers used conventional analog film-based X-ray machines: one for the standard OPD setting and another portable system for COVID isolation wards.

From early March, both these hospitals adopted:

  1. qXR software – our AI-powered chest X-ray interpretation tool provided the COVID-19 risk score based on the condition of the patient’s lungs
  2. qTrack – our newly launched disease management platform

The qTrack mobile app is a simple, easy-to-use tool that interfaces qXR results with the user. The qTrack app digitizes film-based X-rays and provides real-time interpretation using deep learning models. The X-ray technician simply clicks a picture of the X-ray against a view box via the app and receives the AI reading corresponding to the uploaded X-ray. The app is a complete workflow management tool, with the provision to register patients and capture all relevant information along with the X-ray. The attending physicians and the hospital Deans were given separate access to the Qure portal so that they could instantly access AI analyses of the X-rays from their respective sites, from the convenience of their desktops or mobile phones.
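The capture-and-interpret workflow above can be summarised in a few lines: register the patient, attach the photographed film, and record the AI reading against that record. This is a hedged sketch only; `StudyRecord`, `fake_inference`, and the result fields are invented stand-ins, since qTrack's real API is not described here.

```python
# Minimal sketch of the qTrack-style workflow: register a study,
# attach the digitized film, and store the AI reading with it.
# The inference function is a stand-in, not the real qXR service.
from dataclasses import dataclass, field

@dataclass
class StudyRecord:
    patient_id: str
    image_path: str
    ai_result: dict = field(default_factory=dict)

def fake_inference(image_path):
    # Placeholder for the AI reading returned after upload.
    return {"abnormal": True, "covid19_risk": "high"}

def capture_and_interpret(patient_id, image_path, infer=fake_inference):
    """Register the photographed film and attach the AI reading."""
    record = StudyRecord(patient_id, image_path)
    record.ai_result = infer(image_path)
    return record

rec = capture_and_interpret("PAT-001", "xray_photo.jpg")
print(rec.ai_result["covid19_risk"])
```

Keeping the patient registration and the AI result on one record is what lets physicians pull up a complete study from the portal later.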

qXR app in action at Kasturba Hospital


Triaging in Hotspots and Containment Zones

When the city went into lockdown along with the rest of the world as a measure to contain the spread of infection, social distancing guidelines were imposed across the globe. However, social distancing is not a luxury that the second-most densely populated city in the world could always afford. It is not uncommon to have several families living in close quarters within various communities, easily making them high-risk areas and, soon, containment zones. With more than 50% of COVID-19 positive cases being asymptomatic, it was imperative to test aggressively, especially in densely populated areas, to identify individuals at high risk of infection so that they could be institutionally quarantined to prevent and contain community transmission.

Workflow for COVID-19 management in containment zones using qXR


The BMC van involved in mass screenings and qXR in action in the van


Patient Management in Critical Care Units

As the global situation worsened, the commercial capital of the country saw a steady rise in the number of positive cases. MCGM, very creatively and promptly, revived the previously closed down hospitals and converted large open grounds in the city into dedicated COVID-19 centers in record time with their own critical patient units. The BKC MMRDA grounds, NESCO grounds, NSCI (National Sports Council of India) Dome, and SevenHills Hospital are a few such centers.


The COVID-19 center at NESCO is a 3000-bed facility with 100+ ICU beds, catering primarily to patients from Mumbai’s slums. With several critical patients admitted here, it was important for Dr. Neelam Andrade, the facility head, and her team to monitor patients closely, keep a check on their disease progression, and ensure they acted quickly. qXR helped Dr. Andrade’s team by providing instant automated reporting of the chest X-rays. It also captured all clinical information, enabling the center to make its process completely paperless.

The patient summary screen of qXR web portal


“Since the patients admitted here are confirmed cases, we take frequent X-rays to monitor their condition. qXR gives instant results and this has been very helpful for us to make decisions quickly for the patient on their treatment and management.”

– Dr Neelam Andrade, Dean, NESCO COVID centre

SevenHills Hospital, Andheri

Located in the heart of the city’s suburbs, SevenHills Hospital was one of the first hospitals revived by MCGM as part of its COVID-19 response measures.

The center played a critical role on two accounts:

  1. Patients were referred to the hospital for RT-PCR testing from door-to-door screening by MCGM. If found positive, they were admitted at the center itself for quarantine and treatment.
  2. With close to 1000 beds dedicated to COVID-19 patients alone, the doctors needed assistance to easily manage critical patients and monitor their cases closely.

As with all COVID-19 cases, chest X-rays of the admitted patients were taken periodically to ascertain their lung condition and monitor the progress of the disease. All X-rays were read the next day by the head radiologist, Dr. Bhujang Pai, and released to the patient only after his review and approval. This meant that on most mornings, Dr. Pai was tasked with reading and reporting 200-250 X-rays, if not more. This is where qXR simplified his work.

Initially, we deployed the software on one of the two chest X-ray systems. However, after stellar feedback from Dr. Pai, our technology was installed on both machines. In this manner, an AI pre-read was available for all chest X-rays of COVID-19 patients at the center.

Where qXR adds most value:

  • several crucial indications are reported by qXR
  • the percentage of lung affected helps quantify improvement or deterioration in the patient’s lungs and provides an objective assessment of the patient’s condition
  • the pre-filled PDF report, downloadable from the Qure portal, makes it easier to finalize the radiology report prior to releasing it to the patient, especially in a high-volume setting

Dr. Pai reviews and finalizes the qXR report prior to signing it off


“At SevenHills hospital, we have a daily load of ~220 Chest X-rays from the admitted COVID-19 cases, sometimes going up to 300 films per day. Having qXR has helped me immensely in reading them in a much shorter amount of time and helps me utilise my time more efficiently. The findings from the software are useful to quickly pickup the indications and we have been able to work with the team, and make suitable modifications in the reporting pattern, to make the findings more accurate. qXR pre-fills the report which I review and edit, and this facilitates releasing the patient reports in a much faster and efficient manner. This obviously translates into better patient care and treatment outcomes. The percentage of lung involvement which qXR analyses enhances the Radiologist’s report and is an excellent tool in reporting Chest radiographs of patients diagnosed with COVID infection.”

– Dr Bhujang Pai, Radiology Head, SevenHills Hospital

Challenges and learnings

During the course of the pandemic, Qure has assisted MCGM by providing AI analyses for thousands of chest X-rays of COVID-19 suspects and patients. This has been possible through continued collaboration with key stakeholders within MCGM, who were happy to assist in the process and provide the necessary approvals and documentation to initiate work. However, the sites posed different challenges owing to their varied nature and the limitations that came with them.

We had to navigate various technical challenges, like interrupted network connections and the lack of an IT team, especially at the makeshift COVID centers. We crossed these hurdles repeatedly to ensure that X-rays from these centers were processed seamlessly within the stipulated timeframe, and that the X-ray systems in use were serviced and functioning without interruption. Close coordination with the on-ground team, and cooperation from their end, was crucial to keeping the engagement smooth.

This pandemic has been a revelation in many ways. In addition to reiterating that a virus sees no class or creed, it also forced us to move beyond our comfort zones and take our blinders off. Owing to limitations posed by the pandemic and subsequent movement restrictions, every single deployment of qXR was done entirely remotely. This included end-to-end activities: coordination with key stakeholders, planning and execution of the software deployment, training of on-ground staff and physicians using the portal/mobile app, and continuous operations support.

Robust and smart technology truly made it possible to implement what we had conceived and hoped for. Proving yet again that if we are to move ahead, it has to be a healthy partnership between technology and humanity.

Qure is supported by ACT Grants and India Health Fund for joining MCGM’s efforts for the pandemic response using qXR for COVID-19 management.


An AI Upgrade during COVID-19: Stories from the most resilient healthcare systems in Rural India

When the pandemic hit the world without discretion, it caused health systems to crumble across the globe. While a large focus was on strengthening them in urban cities, rural areas struggled to cope. In this blog, we highlight our experience working with some of the best healthcare centers in rural India that deliver healthcare to the last mile. We describe how they embraced AI technology during this pandemic, and how it made a difference in their workflow and patient outcomes.

2020 will be remembered as the year of the COVID-19 pandemic. Affecting every corner of the world without discretion, it has caused unprecedented chaos and put healthcare systems under enormous stress. The majority of COVID-19 transmissions take place due to asymptomatic or mildly symptomatic cases. While global public health programs have steadily created evolving strategies for integrative technologies for improved case detection, there is a critical need for consistent and rigorous testing. It is at this juncture that the impact of Qure’s AI-powered chest X-ray screening tool, qXR, was felt across large testing sites such as hospital networks and government-led initiatives.

In India, Qure joined forces with the Indian Government to combat COVID-19, and qXR found its value in diagnostic aid and critical care management. With the assistance of investor groups like ACT Grants and India Health Fund, we extended support to a number of sites, strengthening the urban systems fighting the virus in hotspots and containment zones. Unfortunately, by this time, the virus had already moved to rural areas, crumbling primary healthcare systems that were already overburdened and resource-constrained.

Discovering the undiscovered healthcare providers

Technologies are meant to improve the quality of human lives, and access to quality healthcare is one of the most basic necessities. To further our work with hospitals and testing centers across the world, we asked ourselves whether more hospitals could benefit from the software in optimising testing capability. Through our physicians, we reached out to healthcare provider networks and social impact organisations that could potentially use the software for triaging and optimisation. In the process, we discovered an entirely new segment, very different from the well-equipped urban hospitals we had been operating in so far, and interacted with a few physicians dedicated to delivering quality and affordable healthcare through these hospitals.

Working closely with the community public health systems, these secondary care hospitals act as a vital referral link to tertiary hospitals. Some are located in isolated tribal areas and address the needs of large catchment populations, hosting close to 100,000 OPD visits annually. They already faced a significant burden of TB and now had to cope with the COVID-19 crisis. With testing facilities often located far away, diagnosis time stretches by days, which is unfortunate because chest X-rays are crucial for primary investigation prior to confirmatory tests, mainly due to limitations in testing capacity. Sufficient testing kits have not reached many parts of rural India as yet!

“I have just finished referring a 25-year-old who came in respiratory distress, flagged positive on X-ray with positive rapid antigen test to Silchar Medical College and Hospital (SMCH), which is 162kms away from here. The number of cases here in Assam is increasing”

Dr. Roshine Koshy, Makunda Christian Leprosy and General Hospital in Assam.


On the left: Chinchpada mission hospital, Maharashtra; Right: Shanti Bhavan Medical Center, Jharkhand.

When we first reached out to these hospitals, we were struck by the heroic vigour with which they were already handling the COVID-19 crisis despite their limited resources. We spoke to the doctors, caregivers, and IT experts across these hospitals, and they had the utmost clarity from the very beginning on how the technology could help them.

Why do they need innovations?

Patients regularly present with no symptoms or atypical ones and conceal their travel history due to the stigma associated with COVID-19. Owing to the ambiguous nature of COVID-19 presentation, there is a possibility of missing subtle findings. Beyond the risk of direct contact with the patient, this puts the healthcare team, their families, and other vulnerable patients at risk.

qXR bridges underlying gaps in these remote, isolated, and resource-constrained regions around the world. Perhaps the most revolutionary, life-saving aspect is that, in less than a minute, qXR generates an AI analysis of whether the X-ray is normal or abnormal, along with a list of 27+ abnormalities including COVID-19 and TB. With qXR’s assistance, X-rays suggestive of a high risk of COVID-19 are flagged, enabling quick triaging and isolation of these suspects until negative RT-PCR confirmatory results are received. As the prognosis changes with comorbidities, alerting the referring physician by phone of life-threatening findings like pneumothorax is an added advantage.
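The triage logic described above reduces to a small decision rule: urgent findings alert the physician immediately, high COVID-19 risk routes the patient to isolation and testing, and everything else goes to routine review. The sketch below is illustrative; the threshold, field names, and action labels are assumptions, not qXR's actual output schema.

```python
# Hedged sketch of the triage rule: urgent findings first, then
# COVID-19 risk, else routine review. All names and the 0.8
# threshold are illustrative.

URGENT_FINDINGS = {"pneumothorax"}

def triage(result, covid_threshold=0.8):
    """Map one AI result dict (finding -> score in [0, 1]) to an action."""
    if any(result.get(f, 0) >= 0.5 for f in URGENT_FINDINGS):
        return "alert_physician_immediately"
    if result.get("covid19_risk", 0) >= covid_threshold:
        return "isolate_and_test"
    return "routine_review"

print(triage({"covid19_risk": 0.92}))                      # isolate_and_test
print(triage({"pneumothorax": 0.7, "covid19_risk": 0.3}))  # alert_physician_immediately
print(triage({"covid19_risk": 0.2}))                       # routine_review
```

Checking urgent findings before the COVID-19 score matters: a pneumothorax needs the referring physician's attention regardless of the infection risk.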

Overview of results generated by qXR


Due to the lack of radiologists and other specialists in their own or neighbouring cities, clinicians often play multiple roles (physician, obstetrician, surgeon, intensivist, anaesthetist), and it is normal in these hospitals for the same doctors to investigate, treat, and operate on those in need. Detecting any at-risk case prior to a surgical procedure is important for necessitating RT-PCR confirmation and further action.

Enabling the solution and the impact

These hospitals have been in the service of the local communities for decades, with a mix of healthcare and community outreach services. Heavily dependent on funding, these setups often have to navigate severe financial crises in their mission to continue catering to people at the bottom of the pyramid. Amidst the tribal belt in Jharkhand, Dr. George Mathew (former Principal, CMC Vellore), Medical Director of Shantibhavan Medical Center in Simdega, faced the herculean task of providing food and accommodation for all his healthcare and non-healthcare staff, as they were ostracised by their families owing to the stigma attached to COVID-19 care. The lack of PPE kits and other protective gear also pushed these sites to innovate and produce them in-house.


Left: the staff of Shanti Bhavan medical center making the essentials for protecting themselves in-house; Right: staff protecting themselves and a patient.

qXR was introduced to these passionate professionals, and other staff were sensitized to the technology. After their buy-in, we on-boarded 11 of these hospitals, working closely with their IT teams on secure protocols, deployment, and training of the staff, all in a span of 2 weeks. A glimpse of the hospitals below:

  • Padhar Hospital (Betul District, rural Madhya Pradesh): a 200-bed multi-speciality charitable hospital that engages in a host of community outreach activities in nearby villages, involving education, nutrition, maternal and child health programs, mental health, and cancer screening.
  • Chinchpada Mission Hospital (Nandurbar, Maharashtra): a secondary care hospital serving the Bhil tribal community. Patients travel up to 200 km from the interiors of Maharashtra to avail affordable, high-quality care.
  • The Baptist Christian Hospital (Tezpur, Assam): a 200-bed secondary care hospital in the northeastern state of Assam.
  • Makunda Christian Leprosy & General Hospital (Bazaricherra, Assam): caters to the tribal regions, situated in a district with a Maternal Mortality Rate (MMR) as high as 284 per 100,000 live births and an Infant Mortality Rate (IMR) of 69 per 1,000 live births. They conduct 6,000 deliveries and perform 3,000 surgeries annually.
  • Shanti Bhavan Medical Center (Simdega, Jharkhand): a secondary hospital catering to a remote tribal district, managed entirely by 3-4 doctors who actively multitask to ensure the highest quality care for their patients. The nearest tertiary care hospital is approximately 100 km away. Currently a COVID-19 designated center, they also actively see many TB cases.

Others include hospitals in Khariar, Odisha; Dimapur, Nagaland; Raxaul, Bihar and so on.

Initially, qXR was used to process X-rays of cases with COVID-19-like symptoms, with results interpreted and updated within a minute. Soon the doctors found it useful in the OPD as well, and the solution’s capability was extended to all patients presenting with ailments that required a chest X-ray. Alerts on every suspect are provided immediately, based on the likelihood of disease predicted by qXR, along with information on other suggestive findings. The reports are compiled and integrated on our patient workflow management solution, qTrack. Given the resource constraints for viewing X-rays on dedicated workstations, the results are also made available in real time via the qTrack mobile application.

qTrack app and web

Left: qTrack app used by physicians to view results in real time while attending to patients and performing routine work; Right: qTrack web used by physicians and technicians for instantaneous reporting.

“It is a handy tool for our junior medical officers in the emergency department, as it helps in quick clinical decision making. The uniqueness of the system being speed, accuracy, and the details of the report. We get the report moment the x rays are uploaded on the server. The dashboard is very friendly to use. It is a perfect tool for screening asymptomatic patients for RT PCR testing, as it calculates the COVID-19 risk score. This also helps us to isolate suspected patients early and thereby helping in infection control. In this pandemic, this AI system would be a valuable tool in the battleground”

Dr Jemin Webster, Tezpur Baptist Hospital

Once the preliminary chest X-ray screening is done, hospitals equipped with COVID-19 rapid tests get them done right away, while the others send samples to the closest testing facility, which may be more than 30 miles away, with results available in 4-5 days or more. None of these hospitals has an RT-PCR testing facility yet!

qXR Protocol

At Makunda Hospital, Assam, qXR is used as an additional input in the diagnostic workflow for managing a patient as a COVID-19 case. They have streamlined their workflow so that the X-ray technicians take digital X-rays, upload the images to qXR, and follow up and alert the doctors. Meanwhile, physicians can access reports, review images, and make clinical corroboration wherever they are through qTrack, managing patients without undue delay.


Dr. Roshine Koshy using qXR system during her OPD to review and take next course of action

“One of our objectives as a clinical team has been to ensure that care for non-COVID-19 patients is not affected as much as possible as there are no other healthcare facilities providing similar care. We are seeing atypical presentations of the illness, patients without fever, with vague complaints. We had one patient admitted in the main hospital who was flagged positive on the qXR system and subsequently tested positive and referred to a higher center. All the symptomatic patients who tested positive on the rapid antigen test have been flagged positive by qXR and some of them were alerted because of the qXR input. Being a high volume center and the main service provider in the district, using as a triaging tool will have enormous benefits in rural areas especially where there are no well-trained doctors”

– Dr. Roshine Koshy, Makunda Christian Leprosy and General Hospital in Assam.

Our users experienced a number of changes in the short span since qXR was introduced into their existing workflows, including:

  • Empowering front-line physicians and caregivers to make quick decisions
  • Enabling diagnosis by triaging patients immediately for Rapid Antigen or RT-PCR tests
  • Identifying asymptomatic cases that would otherwise have been missed
  • Ensuring the safety of health workers and other staff
  • Reducing the risk of disease transmission

At Padhar Hospital, Madhya Pradesh, in addition to triaging suspected COVID-19 cases, qXR assists doctors in managing pre-operative patients, as their medicine department also takes care of pre-anaesthesia checkups. qXR helps identify and flag suspected cases scheduled for procedures; these are deferred until diagnosis, or handled with appropriate additional safety measures in case of an emergency.

“We are finding it quite useful since we get a variety of patients, both outpatients and inpatients. And anyone who has a short history of illness and has history suggestive of SARI, we quickly do the chest X-ray and if the Qure app shows a high COVID-19 score, we immediately refer the patient to the nearby district hospital for RT-PCR for further management. Through the app we are also able to pick up asymptomatic suspects who hides their travel history or positive cases who have come for second opinion, to confirm and/or guide them to the proper place for further testing and isolation”

– Dr Mahima Sonwani, Padhar Hospital, Betul, Madhya Pradesh


Left: technician capturing X-ray in Shanti Bhavan medical center; Right: Dr. Jemine Webster using qXR solution in Baptist hospital, Tezpur

In some high TB burden settings, like Simdega in Jharkhand, qXR is used as a surveillance tool for screening and triaging tuberculosis cases, in addition to COVID-19 and other lung ailments.

“We are dependent on chest X-rays to make the preliminary diagnosis in both these conditions before we perform any confirmatory test. There are no trained radiologists available in our district or our neighbouring district and struggle frequently to make accurate diagnosis without help of a trained radiologist. The AI solution provided by Qure, is a perfect answer for our problem in this remote and isolated region. I strongly feel that the adoption of AI for Chest X-ray and other radiological investigation is the ideal solution for isolated and human resource deprived regions of the world”

– Dr.George Mathew, Medical Director, Shanti Bhavan Medical Centre

Currently, qXR processes close to 150 chest X-rays a day from these hospitals, enabling quick diagnostic decisions for lung diseases.

Challenges: Several hospitals had very basic technological infrastructure, with poor internet connectivity and limited IT systems for running supporting software. They were anxious about potential viruses or crashes on the computer where our software was installed. Most of these teams also had limited exposure to working with such software. However, they were extremely keen to learn, adapt, and even propose solutions to overcome these infrastructural limitations. Engineers from the customer success team at Qure deployed the software gateways carefully, ensuring no interruption to the hospitals’ existing functioning.


At Qure, we have worked closely with public health stakeholders in recent years, and it is rewarding to hear the experiences and stories of impact from these physicians. To strengthen their armor in the fight against the pandemic, even in such resource-limited settings, we will continue to expand our software solutions, making qXR available across primary, secondary, and tertiary hospitals. Meetings, deployments, and training will be done remotely, providing a seamless experience. It is reassuring to hear these words:

“Qure’s solution is particularly attractive because it is cutting edge technology that directly impacts care for those sections of our society who are deprived of many advances in science and technology simply because they never reach them! We hope that this and many more such innovative initiatives would be encouraged so that we can include the forgotten masses of our dear people in rural India in the progress enjoyed by those in the cities, where most of the health infrastructure and manpower is concentrated”

Dr. Ashita Waghmare, Chinchpada hospital

Democratizing healthcare through innovations! We will be publishing a detailed study soon.


Improving performance of AI models in presence of artifacts

Our deep learning models have become really good at recognizing hemorrhages from head CT scans. However, real-world performance is sometimes hampered by several external factors, both hardware-related and human-related. In this blog post, we analyze how acquisition artifacts are responsible for performance degradation and introduce two methods we tried to solve this problem.

Medical Imaging is often accompanied by acquisition artifacts which can be subject related or hardware related. These artifacts make confident diagnostic evaluation difficult in two ways:

  • by overlaying on abnormalities, making them visually less obvious.
  • by mimicking an abnormality.

Some common examples of artifacts are:

  • Clothing artifact - due to clothing on the patient at acquisition time. In fig 1 below, a button on the patient's clothing looks like a coin lesion on a chest X-ray (marked by the red arrow).

clothing artifact

Fig 1. A button mimicking a coin lesion in a chest X-ray, marked by the red arrow. Source.

  • Motion artifact - due to voluntary or involuntary subject motion during acquisition. Severe artifacts from voluntary motion would usually call for a rescan, but involuntary motion (such as respiration or cardiac motion) or minimal subject movement can produce artifacts that go undetected and mimic a pathology. See fig 2, where subject movement has produced motion artifacts that mimic a subdural hemorrhage (SDH).

motion artifact

Fig 2. Artifact due to subject motion, mimicking a subdural hemorrhage in a Head CT. Source

  • Hardware artifact - see fig 3. This artifact is caused by air bubbles in the cooling system, producing subtle irregular dark bands in the scan that can be misidentified as cerebral edema.

hardware artifact edema

Fig 3. A hardware-related artifact, mimicking cerebral edema (marked by yellow arrows). Source

Here we investigate motion artifacts that look like SDH in Head CT scans. These artifacts increase the false positive (FP) predictions of subdural hemorrhage models. We confirmed this by quantitatively analyzing the FPs of our AI model deployed at an urban outpatient center: the FP rates were higher for this data than for our internal test dataset.
These false positives arise from the lack of artifact-ridden data in the training set. It is practically difficult to acquire and include scans containing every variety of artifact in the training set.

artifact mistaken for sdh

Fig 4. The model identifies an artifact slice as SDH because of similarity in shape and location: both are hyperdense areas close to the cranial bones.

We tried to solve this problem in the following two ways.

  • Making the models invariant to artifacts, by explicitly including artifact images into our training dataset.
  • Discounting slices with artifact when calculating the probability of bleed in a scan.

Method 1. Artifact as an augmentation using Cycle GANs

We reasoned that the artifacts were misclassified as bleeds because the model had not seen enough artifact scans during training.
The number of images containing artifacts in our annotated training dataset is relatively small, but we have access to many unannotated artifact scans acquired from centers with older CT scanners (motion artifacts are more prevalent on older scanners with poor in-plane temporal resolution). If we could generate artifact-ridden versions of all the annotated images in our training dataset, we could effectively augment the training set and make the model invariant to artifacts.
We decided to use a Cycle GAN to generate new training data containing artifacts.

Cycle GAN [1] is a generative adversarial network used for unpaired image-to-image translation. It serves our purpose because we have an unpaired translation problem: domain X contains our training-set CT images with no artifacts and domain Y contains artifact-ridden CT images.
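The key idea that makes unpaired translation work is the cycle-consistency loss: translating an image to the other domain and back should reconstruct the original. Below is a toy numpy sketch of that objective, with trivial scaling functions standing in for the real clean-to-artifact and artifact-to-clean generator CNNs; the weight `lam=10` follows the paper's default, everything else is illustrative.

```python
import numpy as np

def cycle_consistency_loss(G, F, a, b, lam=10.0):
    """L1 cycle loss: F(G(a)) should reconstruct a, and G(F(b)) should
    reconstruct b. G maps domain A -> B, F maps domain B -> A."""
    loss_a = np.abs(F(G(a)) - a).mean()   # forward cycle:  A -> B -> A
    loss_b = np.abs(G(F(b)) - b).mean()   # backward cycle: B -> A -> B
    return lam * (loss_a + loss_b)

# Toy "generators": scaling maps standing in for the CNN generators.
G = lambda x: 2.0 * x      # clean (A) -> artifact (B)
F = lambda x: 0.5 * x      # artifact (B) -> clean (A)

a = np.ones((4, 4))        # stand-in for a clean CT slice
b = np.ones((4, 4))        # stand-in for an artifact-ridden CT slice

# G and F are perfect inverses here, so both cycles reconstruct exactly.
print(cycle_consistency_loss(G, F, a, b))  # 0.0
```

In the full Cycle GAN, this term is added to the two adversarial losses, which is what pushes `G` to actually produce artifact-looking images rather than identity copies.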

cycle gan illustration

Fig 5. Cycle GAN used to convert a short clip of a horse into that of a zebra. Source

We curated a domain A dataset of 5000 images without artifacts and a domain B dataset of 4000 images with artifacts, and used these to train the Cycle GAN.

Unfortunately, the quality of the generated images was not very good (see fig 6).
The generator was unable to capture all the variety in the CT dataset and introduced artifacts of its own, rendering it useless for augmenting the dataset. The Cycle GAN authors note that generator performance is worse when the transformation involves geometric changes (e.g. dog to cat, apples to oranges) than when it involves color or style changes. Introducing artifacts is more complex than a color or style change because it must distort existing geometry, which could be one reason the generated images contain extra artifacts.

cycle gan images

Fig 6. A sampling of images generated by the Cycle GAN. real_A are input images and fake_B are the artifact images generated by the Cycle GAN.

Method 2. Discounting artifact slices

In this method, we trained a model to identify slices with artifacts and show that discounting these slices made the subdural hemorrhage (SDH) model robust to artifacts.
A manually annotated dataset was used to train a convolutional neural network (CNN) to detect whether a CT slice contains artifacts. The original SDH model was also a CNN, predicting whether a slice contains SDH. The probabilities from the artifact model were used to discount slices containing artifacts, so that only artifact-free slices contributed to the scan-level score for the presence of a bleed.
See fig 7.

Method 2 illustration

Fig 7. Method 2: using a trained artifact model to discount artifact slices while calculating SDH probability.
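The post does not spell out the exact discounting formula, but one plausible sketch of the idea is to weight each slice's SDH probability by its probability of being artifact-free, then pool over slices; both the weighting and the max-pooling here are assumptions for illustration, not the exact rule used in the deployed model.

```python
import numpy as np

def scan_bleed_score(p_sdh, p_artifact):
    """Scan-level SDH score that discounts artifact-ridden slices.

    p_sdh, p_artifact: per-slice probabilities from the SDH and artifact
    CNNs. Each slice's SDH probability is weighted by (1 - artifact
    probability) and the maximum over slices is taken as the scan score.
    (The exact weighting/pooling rule is an assumption; the post does not
    specify the formula used in production.)
    """
    p_sdh = np.asarray(p_sdh, dtype=float)
    p_artifact = np.asarray(p_artifact, dtype=float)
    return float(np.max(p_sdh * (1.0 - p_artifact)))

# A scan where the only high-SDH slice is also flagged as an artifact:
p_sdh      = [0.05, 0.08, 0.95, 0.08]
p_artifact = [0.01, 0.02, 0.90, 0.03]
# The artifact slice's contribution drops from 0.95 to ~0.095, so the
# scan is no longer flagged as a confident positive.
print(scan_bleed_score(p_sdh, p_artifact))
```

With this kind of rule, an SDH-positive slice that is artifact-free keeps its full probability, which is why sensitivity is preserved while artifact-driven false positives fall.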


Our validation dataset contained 712 head CT scans, of which 42 contained SDH. The original SDH model predicted 35 false positives and no false negatives. Quantitative analysis of the FPs confirmed that 17 (48%) of them were due to CT artifacts. Our trained artifact model had a slice-wise AUC of 0.96. The proposed modification to the SDH model reduced the FPs to 18 (a decrease of 48%) without introducing any false negatives. Thus, with method 2, all scan-wise FPs due to artifacts were corrected.

In summary, using method 2, we improved the precision of SDH detection from 54.5% to 70% while maintaining a sensitivity of 100%.

confusion matrices

Fig 8. Confusion matrices before and after using the artifact model for SDH prediction

See fig 9. for model predictions on a representative scan.

artifact discount explanation

Fig 9. Model predictions for a few representative slices in a scan falsely flagged as positive by the original SDH model

A drawback of method 2 is that if SDH and an artifact are present in the same slice, the SDH could be missed.


Using a Cycle GAN to augment the dataset with artifact-ridden scans would solve the problem by enriching the training set with both SDH-positive and SDH-negative scans overlaid with artifacts, but our current experiments do not produce realistic-looking synthesized images. The alternative we used meanwhile reduces the high false positive rate due to artifacts while maintaining the same sensitivity.


  1. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks by Jun-Yan Zhu et al.


Morphology of the Brain: Changes in Ventricular and Cranial Vault Volumes in 15000 subjects with Aging and Hydrocephalus

This post is Part 1 of a series that uses large datasets (15,000+) coupled with deep learning segmentation methods to review and maybe re-establish what we know about normal brain anatomy and pathology. Subsequent posts will tackle intra-cranial bleeds, their typical volumes and locations across similarly sized datasets.

Brain ventricular volume has been quantified by post-mortem studies [1] and by pneumoencephalography. When CT and subsequently MRI became available, facilitating non-invasive observation of the ventricular system, larger datasets could be used to study these volumes. Typical subject numbers in recent studies have ranged from 50 to 150 [2–6].

Now that deep learning segmentation methods have enabled automated, precise measurements of ventricular volume, we can re-establish these reference ranges using datasets that are two orders of magnitude larger. This is likely to be especially useful at the age-group extremes: in children, where very limited reference data exist, and in the elderly, where the effects of age-related atrophy may co-exist with pathologic neurodegenerative processes.

To date, no standard has been established for the normal ventricular volume of the human brain. The Evans index and the bicaudate index are linear measurements currently used as surrogates to provide an indication of abnormal ventricular enlargement [1]. True volumetric measures are preferable to these indices for a number of reasons [7, 8] but have not been adopted so far, largely because of the time required for manual segmentation of images. Now that automated, precise quantification is feasible with deep learning, it is possible to upgrade to a more precise volumetric measure.
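For concreteness, the Evans index mentioned above is simply a ratio of two linear measurements taken on the same axial slice. A minimal sketch (measurement values are made up for illustration; the ~0.3 cutoff is the commonly used threshold, not something this post defines):

```python
def evans_index(frontal_horn_width_mm, max_internal_skull_diameter_mm):
    """Evans index: maximal width of the frontal horns of the lateral
    ventricles divided by the maximal internal diameter of the skull,
    both measured on the same axial slice."""
    return frontal_horn_width_mm / max_internal_skull_diameter_mm

ei = evans_index(42.0, 120.0)   # hypothetical measurements in mm
print(round(ei, 2))             # 0.35
# Values above ~0.3 are the commonly used cutoff suggesting
# ventricular enlargement.
print(ei > 0.3)                 # True
```

The weakness is visible in the sketch itself: two 1-D calipers stand in for a 3-D structure, which is exactly why the volumetric measures discussed here are preferable.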

Such quantitative measures will be useful in the monitoring of patients with hydrocephalus, and as an aid to diagnosing normal pressure hydrocephalus. In the future, automated measurements of ventricular, brain and cranial volumes could be displayed alongside established age- and gender-adjusted normal ranges as a standard part of radiology head CT and MRI reports.

Methods and Results

To train our deep learning model, lateral ventricles were manually annotated in 103 scans. We split these scans randomly in a 4:1 ratio into training and validation sets. We trained a U-Net to segment the lateral ventricles in each slice. Another U-Net was trained to segment the cranial vault using a similar process. Models were validated using the Dice score against the annotations.

Anatomy | Dice Score
Lateral Ventricles | 0.909
Cranial Vault | 0.983
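The Dice score used for validation is the standard overlap metric between a predicted and an annotated binary mask; a short numpy sketch:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|).
    eps avoids division by zero when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection) / (pred.sum() + target.sum() + eps)

# Two overlapping toy "masks": 3 voxels each, 2 voxels shared.
pred   = [1, 1, 1, 0, 0]
target = [0, 1, 1, 1, 0]
print(round(dice_score(pred, target), 3))  # 0.667
```

A score of 1.0 means perfect overlap; the 0.909 and 0.983 in the table above indicate close agreement with the manual annotations.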

A validation set of about 20 scans might not represent all the anatomical/pathological variation in the population. We therefore visually verified that the resulting models worked despite pathologies like hemorrhages/infarcts and surgical implants such as shunts. Some representative scans and model outputs are shown below.

Focal ventricle dilatation

30 year old male reported with 'focal dilatation of left lateral ventricle.'

Mild Hydrocephalus

7 year old female child reported with 'mild obstructive hydrocephalus'

Mild Hydrocephalus

28 year old male reported with fracture and hemorrhages


36 year old male reported with an intraventricular mass and with a VP shunt

To study lateral ventricular and cranial vault volume variation across the population, we randomly selected 14,153 scans from our database. This selection contained only 208 scans with hydrocephalus reported by the radiologist. Since we wanted to study ventricle volume variation in patients with hydrocephalus, we added 1314 additional scans reported with ‘hydrocephalus’. We excluded those scans for which age/gender metadata were not available.
In total, our analysis dataset contained 15223 scans whose demographic characteristics are shown in the table below.


Number of scans | 15223
Females | 6317 (41.5%)
Age: median (interquartile range) | 40 (24–56) years
Scans reported with cerebral atrophy | 1999 (13.1%)
Scans reported with hydrocephalus | 1404 (9.2%)

Dataset demographics and prevalences.

A histogram of the age distribution is shown below. There are reasonable numbers of subjects (>200) in all age and sex groups, which helps ensure that our analysis is generalizable.

age histogram

We ran the trained deep learning models and measured lateral ventricular and cranial vault volumes for each of the 15223 scans in our database. Below is the scatter plot of all the analyzed scans.

Scatter plot

In this scatter plot, the x-axis is lateral ventricular volume and the y-axis is cranial vault volume. Scans with atrophy are marked in orange and scans with hydrocephalus in green. Patients with atrophy lie to the right of the majority of individuals, indicating larger ventricles, while patients with hydrocephalus lie at the extreme right, with ventricular volumes even higher than those with atrophy.

To make this relationship clearer, we plotted the distribution of ventricular volume for patients with neither hydrocephalus nor atrophy against that for patients with one of the two.

ventricular volume distribution

Interestingly, the hydrocephalus distribution has a very long tail, while the distribution for patients with neither hydrocephalus nor atrophy has a narrower peak.

Next, let us observe cranial vault volume variation with age and sex. Bands around solid lines indicate interquartile range of cranial vault volume of the particular group.

cranial vault volume variation

An obvious feature of this plot is that the cranial vault increases in size until the age of 10–20, after which it plateaus. The cranial vault of males is approximately 13% larger than that of females. Another interesting point is that the male cranial vault keeps growing until the 15–20 age group, while in females it stabilizes at ages 10–15.

Now, let’s plot variation of lateral ventricles with age and sex. As before, bands indicate interquartile range for a particular age group.

lateral ventricular volume variation

This plot shows that the ventricles grow with age. This may be explained by the fact that the brain naturally atrophies with age, leading to relative enlargement of the ventricles. These curves can serve as normal ranges of ventricular volume for a given age and sex; a volume outside the normal range can be indicative of hydrocephalus or a neurodegenerative disease.

While the above plot shows the variation of lateral ventricular volume across age and sex, it may be easier to visualize the ventricles' volume relative to the cranial vault volume. This also has a normalizing effect across sexes, since the difference in ventricular volumes between sexes might simply reflect the difference in cranial vault sizes.

relative lateral ventricular volume variation

This plot looks similar to the previous one, with the ratio of ventricular volume to cranial vault volume increasing with age. Until the age of 30–35, males and females have relatively similar relative ventricular volumes; after that, males tend to have a larger relative ventricular size than females. This is in line with prior research finding that males are more susceptible to atrophy than females [10].

We can incorporate all of this analysis into our automated report. For example, the following is the CT scan of a 75-year-old patient and our automated report.


CT scan of a 75 Y/M patient.

qER Analysis Report

Patient ID: KSA18458
Patient Age: 75Y
Patient Sex: M

Preliminary Findings by Automated Analysis:

- Infarct of 0.86 ml in left occipital region.
- Dilated lateral ventricles.
  This might indicate neurodegenerative disease/hydrocephalus.
  Lateral ventricular volume = 88 ml.
  Interquartile range for male >=75Y patients is 28 - 54 ml.

This is a report of preliminary findings by automated analysis.
Other significant abnormalities may be present.
Please refer to final report.

Our auto generated report. Added text is indicated in bold.
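The added text in the report boils down to a range check against the age- and sex-matched interquartile range. A minimal sketch, using the numbers from the sample report above; flagging anything above the upper quartile is an assumption for illustration, as the post does not state the exact decision rule:

```python
def ventricle_flag(volume_ml, iqr):
    """Compare a measured lateral-ventricular volume against the
    age/sex-matched interquartile range (lo, hi).

    The rule below (flag any volume above the upper quartile) is a
    hypothetical simplification of the real reporting logic.
    """
    lo, hi = iqr
    if volume_ml > hi:
        return ("Dilated lateral ventricles. This might indicate "
                "neurodegenerative disease/hydrocephalus.")
    return "Lateral ventricular volume within interquartile range."

# Values from the sample qER report: 88 ml vs. 28-54 ml for males >= 75Y.
print(ventricle_flag(88, (28, 54)))
```

In practice the reference ranges would be looked up from the population curves shown earlier, per age group and sex.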


The question of how to establish the ground truth for these measurements remains open. For this study, we use Dice scores against manually outlined ventricles as an indicator of segmentation accuracy. Slice-wise expert annotations are an imperfect gold standard, not only because of scale but also because of limited precision. The places where these algorithms are most likely to fail (and therefore need the most testing) are anatomical variants and pathologies that alter the structure of the ventricles. We have tested some common co-occurring pathologies (hemorrhage), but it would be interesting to see how well the method performs on scans with congenital anomalies and other conditions such as subarachnoid cysts (which caused an earlier machine-learning-based algorithm to fail [9]).

  • Recording ventricular volume in reports is useful for future reference and for monitoring ventricular size in individuals with pathologies such as traumatic brain injury and colloid cysts of the third ventricle.
  • It provides an objective measure to follow ventricular volumes in patients who have had shunts and can help in identifying shunt failure.
  • Establishing the accuracy of these automated segmentation algorithms also paves the way for more nuanced neuroradiology research on a scale that was not previously possible.
  • One can use these data in relation to cerebral volume and age to define hydrocephalus, atrophy and normal pressure hydrocephalus.


  1. Evans, William A. “An encephalographic ratio for estimating ventricular enlargement and cerebral atrophy.” Archives of Neurology & Psychiatry 47.6 (1942): 931-937.
  2. Matsumae, Mitsunori, et al. “Age-related changes in intracranial compartment volumes in normal adults assessed by magnetic resonance imaging.” Journal of neurosurgery 84.6 (1996): 982-991.
  3. Scahill, Rachael I., et al. “A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging.” Archives of neurology 60.7 (2003): 989-994.
  4. Hanson, J., B. Levander, and B. Liliequist. “Size of the intracerebral ventricles as measured with computer tomography, encephalography and echoventriculography.” Acta Radiologica. Diagnosis 16.346_suppl (1975): 98-106.
  5. Gyldensted, C. “Measurements of the normal ventricular system and hemispheric sulci of 100 adults with computed tomography.” Neuroradiology 14.4 (1977): 183-192.
  6. Haug, G. “Age and sex dependence of the size of normal ventricles on computed tomography.” Neuroradiology 14.4 (1977): 201-204.
  7. Toma, Ahmed K., et al. “Evans’ index revisited: the need for an alternative in normal pressure hydrocephalus.” Neurosurgery 68.4 (2011): 939-944.
  8. Ambarki, Khalid, et al. “Brain ventricular size in healthy elderly: comparison between Evans index and volume measurement.” Neurosurgery 67.1 (2010): 94-99.
  9. Yepes-Calderon, Fernando, Marvin D. Nelson, and J. Gordon McComb. “Automatically measuring brain ventricular volume within PACS using artificial intelligence.” PloS one 13.3 (2018): e0193152.
  10. Gur, Ruben C., et al. “Gender differences in age effect on brain atrophy measured by magnetic resonance imaging.” Proceedings of the National Academy of Sciences 88.7 (1991): 2845-2849.


Challenges of Development & Validation of Deep Learning for Radiology

We recently published an article on our deep learning algorithms for Head CT in The Lancet, the first AI in medical imaging paper to be published in this journal.
The article describes the development and validation of these algorithms.
In this blog, I explain some of the challenges we faced in this process and how we solved them. These challenges are fairly general and should apply to any research involving AI and radiology images.


3D Images

The first challenge we faced in the development process is that CT scans are three-dimensional (3D). There is a plethora of research on two-dimensional (2D) images, but far less on 3D images. You might ask: why not simply use 3D convolutional neural networks (CNNs) in place of 2D CNNs? Notwithstanding their computational and memory requirements, 3D CNNs have been shown to be inferior to 2D CNN-based approaches on a similar problem (action recognition).

So how do we solve it? We need not reinvent the wheel when there is a lot of literature on a similar problem: action recognition, the classification of the action present in a given video.
Why is action recognition similar to 3D volume classification? The temporal dimension in videos is analogous to the Z dimension in a CT.

Left: an example Head CT scan. Right: an example video from an action recognition dataset. The Z dimension in the CT volume is analogous to the time dimension in the video.

We took a foundational work from the action recognition literature and modified it for our purposes. Our modification was to incorporate slice-level (frame-level, in video terms) labels into the network, because the action recognition literature has the comfort of pretrained 2D CNNs, which we do not share.

High Resolution

The second challenge was that CT has high resolution, both spatially and in bit depth. We simply downsample the CT to a standard pixel spacing. But what about bit depth? Deep learning doesn't work well with data that is not normalized to [-1, 1] or [0, 1]. We solved this the way a radiologist would: windowing. Windowing restricts the dynamic range to a certain interval (e.g. [0, 80]) and then normalizes it. We applied three windows and passed them as channels to the CNNs.
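Windowing is just a clip-and-rescale on Hounsfield units. A minimal numpy sketch; the brain window [0, 80] comes from the text, while the blood/subdural and bone (center, width) values below are typical settings, not necessarily the exact ones used in the paper:

```python
import numpy as np

def apply_window(hu, center, width):
    """Clip a CT slice (in Hounsfield units) to a window, scale to [0, 1]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

def windowed_channels(hu):
    """Stack brain, blood/subdural and bone windows as CNN input channels.
    The last two (center, width) pairs are assumed typical values."""
    windows = [(40, 80),     # brain window: [0, 80] HU, as in the text
               (75, 215),    # blood/subdural (assumed)
               (600, 2800)]  # bone (assumed)
    return np.stack([apply_window(hu, c, w) for c, w in windows], axis=0)

hu_slice = np.array([[-1000.0, 0.0], [60.0, 1000.0]])  # toy 2x2 "slice"
channels = windowed_channels(hu_slice)
print(channels.shape)  # (3, 2, 2): three windows as channels
print(channels[0])     # brain window: [[0, 0], [0.75, 1]]
```

Each channel then plays the same role as an RGB channel in a natural-image CNN, so a standard 2D architecture can consume the slice directly.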

Windows: brain, blood/subdural and bone


This approach allows the model to account for cross-window effects. For example, a large scalp hematoma visible in the brain window might indicate a fracture underneath it; similarly, a fracture visible in the bone window is usually correlated with an extra-axial bleed.

Other Challenges

There are few other challenges that deserve mention as well:

  1. Class imbalance: we addressed this with weighted sampling and loss weighting.
  2. Lack of pretraining: there is no pretrained model like ImageNet for medical images. We found that using ImageNet weights actually hurt performance.


Once the algorithms were developed, validation had its own challenges.
Here are the key questions we started with: do our algorithms generalize well to CT scans outside the development dataset?
Do they also generalize to CT scans from a different source altogether? How do they compare to radiologists without access to clinical history?

Low prevalences and statistical confidence

The validation looks simple enough: acquire scans (from a different source), get them read by radiologists, and compare their reads with the algorithms'.
But the statistical design is a challenge, because the prevalence of abnormalities tends to be low; it can be as low as 1% for some abnormalities. Our key metrics for evaluating the algorithms are sensitivity, specificity, and the AUC, which depends on both. Sensitivity is the troublemaker: we have to ensure there are enough positives in the dataset for sufficiently narrow 95% confidence intervals (CI). The required number of positive scans turns out to be ~80 for a CI of ±10% at an expected sensitivity of 0.7.

If we were to choose a randomly sampled dataset, the number of scans to be read would be ~80 / prevalence rate = 8000. With three readers per scan, the total number of reads is 8k × 3 = 24k, a prohibitively large dataset to get read by radiologists. We therefore cannot use a randomly sampled dataset; we have to somehow enrich the number of positives.


To enrich a dataset with positives, we have to find the positives among all the available scans: like searching for a needle in a haystack. Fortunately, scans usually have a clinical report associated with them, so we just have to read the reports and pick the positive ones. Even better, we can have an NLP algorithm parse the reports and randomly sample the required number of positives. We chose this path.
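The actual NLP algorithm is not described in this post; as a toy stand-in, even a keyword matcher with crude negation handling conveys the idea of triaging reports into positives and negatives (the finding list and regex below are illustrative assumptions, nothing like the real system):

```python
import re

# Hypothetical target findings; the real system covers many more,
# with proper negation and uncertainty detection.
FINDINGS = ["hemorrhage", "fracture", "midline shift"]
NEGATION = re.compile(r"\bno (evidence of )?(?P<finding>\w[\w ]*)")

def report_is_positive(report):
    """Flag a report mentioning any target finding, skipping simple
    negations like 'no evidence of hemorrhage'."""
    text = report.lower()
    negated = NEGATION.search(text)
    for finding in FINDINGS:
        if finding in text:
            if negated and finding in negated.group("finding"):
                continue
            return True
    return False

print(report_is_positive("Acute subdural hemorrhage with midline shift."))  # True
print(report_is_positive("No evidence of hemorrhage. Normal study."))       # False
```

Reports flagged positive by such a parser can then be randomly sampled to build the enriched batch, which is what B2 below is.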

We collected the dataset in two batches, B1 and B2. B1 contained all the head CT scans acquired in a month, while B2 was the algorithmically selected dataset. So B1 mostly contained negatives while B2 contained a lot of positives. This approach removed the selection bias that might have been present had the scans been picked manually: for example, if positive scans were picked by cursory glances at the scans themselves, subtle positive findings would be missing from the dataset.

Prevalences of the findings in batches B1 and B2. Observe the low prevalences of findings in uniformly sampled batch B1.


We called this enriched dataset the CQ500 dataset (C for CARING and Q for Qure). It contained 491 scans after exclusions. Three radiologists independently read the scans in the dataset, and the majority vote was taken as the gold standard. We randomized the order of the reads to minimize recall of follow-up scans and to blind the readers to the batches of the dataset.

We have made this dataset and the radiologists' reads public under a CC-BY-NC-SA license. Other researchers can use it to benchmark their algorithms. I think it can also support clinical research, such as measuring the concordance of radiologists on various tasks.

In addition to the CQ500 dataset, we validated the algorithms on a much larger, randomly sampled dataset: the Qure25k dataset, containing 21095 scans. Ground truths were clinical radiology reports, from which we extracted structured data using the NLP algorithm. This dataset satisfies the statistical requirements, but each scan was read by only a single radiologist, who had access to clinical history.


Finding | CQ500 AUC (95% CI) | Qure25k AUC (95% CI)
Intracranial hemorrhage | 0.9419 (0.9187–0.9651) | 0.9194 (0.9119–0.9269)
Intraparenchymal | 0.9544 (0.9293–0.9795) | 0.8977 (0.8884–0.9069)
Intraventricular | 0.9310 (0.8654–0.9965) | 0.9559 (0.9424–0.9694)
Subdural | 0.9521 (0.9117–0.9925) | 0.9161 (0.9001–0.9321)
Extradural | 0.9731 (0.9113–1.0000) | 0.9288 (0.9083–0.9494)
Subarachnoid | 0.9574 (0.9214–0.9934) | 0.9044 (0.8882–0.9205)
Calvarial fracture | 0.9624 (0.9204–1.0000) | 0.9244 (0.9130–0.9359)
Midline shift | 0.9697 (0.9403–0.9991) | 0.9276 (0.9139–0.9413)
Mass effect | 0.9216 (0.8883–0.9548) | 0.8583 (0.8462–0.8703)

AUCs of the algorithms on the both datasets.

The table above shows the AUCs of the algorithms on the two datasets. Note that the AUCs are directly comparable, because AUC is prevalence-independent. AUCs on the CQ500 dataset are generally better than those on the Qure25k dataset. This might be because:

  1. Ground truths in the Qure25k dataset incorporated clinical information unavailable to the algorithms, so the algorithms appear to perform worse.
  2. The majority vote of three reads is a better ground truth than a single read.
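The prevalence-independence of AUC can be checked numerically: computing AUC as the probability that a random positive outscores a random negative (the Mann-Whitney formulation), duplicating the negatives changes the prevalence but leaves the AUC untouched. A small sketch with toy scores:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """AUC as P(random positive score > random negative score),
    with ties counted as 1/2 (Mann-Whitney U formulation)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

pos = [0.9, 0.8, 0.6]            # toy model scores for positive scans
neg = [0.7, 0.4, 0.3, 0.2]       # toy model scores for negative scans

base = auc(pos, neg)
doubled = auc(pos, neg * 2)      # halve the prevalence: same AUC
print(base, doubled)
```

Sensitivity and specificity at a fixed threshold are also prevalence-independent, but precision is not, which is why AUC is the safer cross-dataset comparison here.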

ROC curves

ROC curves for the algorithms on the Qure25k (blue) and CQ500 (red) datasets. TPR and FPR of radiologists are also plotted.

Shown above are the ROC curves on both datasets, with the readers' TPR and FPR also plotted. We observe that radiologists tend to be either highly sensitive or highly specific for a particular finding. The algorithms have yet to beat radiologists, on this task at least! But they should nonetheless be useful for triage or for notifying physicians.


Deep Learning for Videos: A 2018 Guide to Action Recognition

Medical images like MRIs and CTs (3D images) are very similar to videos: both encode 2D spatial information over a third dimension. Much like diagnosing abnormalities from 3D images, action recognition from videos requires capturing context from the entire video rather than just from each frame.

Fig 1: Left: an example Head CT scan. Right: an example video from an action recognition dataset. The Z dimension in the CT volume is analogous to the time dimension in the video.

In this post, I summarize the literature on action recognition from videos. The post is organized into three sections –

  1. What is action recognition and why is it tough
  2. Overview of approaches
  3. Summary of papers

What is action recognition and why is it tough?

The action recognition task involves identifying different actions from video clips (sequences of 2D frames), where the action may or may not be performed throughout the entire duration of the video. This seems like a natural extension of image classification: classify multiple frames and aggregate the predictions. But despite the stratospheric success of deep learning architectures in image classification (ImageNet), progress in architectures for video classification and representation learning has been slower.

What made this task tough?

  1. Huge Computational Cost
    A simple 2D convolutional net for classifying 101 classes has just ~5M parameters, whereas the same architecture inflated to a 3D structure has ~33M parameters. It takes 3 to 4 days to train a 3D ConvNet on UCF101 and about two months on Sports-1M, which makes extensive architecture search difficult and overfitting likely [1].
  2. Capturing long context
    Action recognition involves capturing spatiotemporal context across frames. Additionally, the captured spatial information has to be compensated for camera movement. Even strong spatial object detection doesn't suffice, as the motion information also carries finer details. There is a local as well as a global context in the motion information that needs to be captured for robust predictions. For example, consider the video representations shown in Figure 2. A strong image classifier can identify the human and the water body in both videos, but only the nature of the temporal periodic action differentiates front crawl from breast stroke.

    Fig 2: Left: Front crawl. Right: Breast stroke. Capturing temporal motion is critical to differentiate these two seemingly similar cases. Also notice how the camera angle suddenly changes in the middle of the front crawl video.

  3. Designing classification architectures
    Designing architectures that can capture spatiotemporal information involve multiple options which are non-trivial and expensive to evaluate. For example, some possible strategies could be

    • One network for capturing spatiotemporal information vs. two separate ones for each spatial and temporal
    • Fusing predictions across multiple clips
    • End-to-end training vs. feature extraction and classifying separately
  4. No standard benchmark
    The most popular benchmark datasets have long been UCF101 and Sports1M. Searching for a reasonable architecture on Sports1M is extremely expensive. For UCF101, although the number of frames is comparable to ImageNet, the high spatial correlation among the videos makes the actual diversity in training much smaller. Also, given the similar theme (sports) across both datasets, generalization of benchmarked architectures to other tasks remained a problem. This has lately been addressed with the introduction of the Kinetics dataset [2].

    Sample illustration of UCF-101. Source.

It must be noted that abnormality detection from 3D medical images does not involve all the challenges mentioned here. The major differences between action recognition and medical imaging are:

  1. In medical imaging, the temporal context may not be as important as in action recognition. For example, detecting hemorrhage in a head CT scan requires much less temporal context across slices; an intracranial hemorrhage can be detected from a single slice. By contrast, detecting a lung nodule in a chest CT involves capturing 3D context, since nodules, bronchi, and vessels all look like circular objects in 2D slices. It is only when the 3D context is captured that nodules can be seen as spherical objects, as opposed to cylindrical objects like vessels.
  2. In action recognition, most research ideas use pretrained 2D CNNs as a starting point for drastically better convergence. For medical images, such pretrained networks are unavailable.

Overview of approaches

Before deep learning came along, most traditional CV approaches to action recognition could be broken down into three broad steps:

  1. Local high-dimensional visual features that describe a region of the video are extracted, either densely [3] or at a sparse set of interest points [4, 5].
  2. The extracted features are combined into a fixed-size video-level description. One popular variant of this step encodes features at the video level using a bag of visual words (derived with hierarchical or k-means clustering).
  3. A classifier, such as an SVM or a random forest, is trained on the bag of visual words for the final prediction.
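The three steps above can be sketched as a toy pipeline. The random descriptors below are stand-ins for real local features (e.g. HOG/HOF around interest points); names and dimensions are illustrative:

```python
# Minimal bag-of-visual-words pipeline sketch (random stand-in descriptors).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Step 1 (assumed done): each video yields a variable number of local descriptors.
videos = [rng.normal(size=(int(rng.integers(50, 100)), 16)) for _ in range(10)]
labels = np.array([0, 1] * 5)

# Step 2: build a visual vocabulary with k-means, then histogram-encode each video.
k = 8
vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(videos))

def encode(descriptors):
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()  # normalized bag-of-visual-words histogram

X = np.stack([encode(v) for v in videos])

# Step 3: train a linear SVM on the fixed-size video-level descriptors.
clf = LinearSVC().fit(X, labels)
print(X.shape)  # (10, 8)
```

The fixed-size histogram is what makes a variable-length video digestible by a standard classifier.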

Of these algorithms, which use shallow hand-crafted features in step 1, improved Dense Trajectories (iDT) [6], which uses densely sampled trajectory features, was the state of the art. Meanwhile, 3D convolutions had been applied as-is to action recognition in 2013, without much success [7]. Soon after, in 2014, two breakthrough research papers were released which form the backbone for all the papers we are going to discuss in this post. The major difference between them was the design choice around combining spatiotemporal information.

Approach 1: Single Stream Network

In this work [June 2014], the authors – Karpathy et al. – explore multiple ways to fuse temporal information from consecutive frames using 2D pre-trained convolutions.


Fig 3: Fusion Ideas Source.

As can be seen in Fig 3, consecutive frames of the video are presented as input in all setups. Single frame uses a single architecture that fuses information from all frames at the last stage. Late fusion uses two networks with shared parameters, spaced 15 frames apart, and also combines predictions at the end. Early fusion combines information in the first layer by convolving over 10 frames. Slow fusion fuses at multiple stages, a balance between early and late fusion. For final predictions, multiple clips were sampled from the entire video and their prediction scores were averaged.
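A minimal PyTorch sketch of the early fusion variant, where T frames are stacked along the channel axis so the very first convolution fuses time (the layer sizes are illustrative, not the paper's exact architecture):

```python
# Early fusion sketch: temporal information is merged in the first conv layer.
import torch
import torch.nn as nn

T = 10  # number of consecutive frames fused in the first layer

early_fusion = nn.Sequential(
    nn.Conv2d(3 * T, 64, kernel_size=7, stride=2, padding=3),  # sees all T frames at once
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 101),  # UCF101 has 101 classes
)

clip = torch.randn(2, 3 * T, 112, 112)   # batch of 2 clips, frames stacked as channels
logits = early_fusion(clip)
print(logits.shape)  # torch.Size([2, 101])
```

Late and slow fusion differ only in where this merge happens in the network, not in the input itself.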

Despite extensive experimentation, the authors found that the results were significantly worse than those of state-of-the-art hand-crafted feature based algorithms. There were multiple reasons attributed to this failure:

  1. The learnt spatiotemporal features didn’t capture motion features
  2. The dataset being less diverse, learning such detailed features was tough

Approach 2: Two Stream Networks

In this pioneering work [June 2014] by Simonyan and Zisserman, the authors build on the failures of the previous work by Karpathy et al. Given the difficulty deep architectures have in learning motion features, the authors explicitly modeled motion features in the form of stacked optical flow vectors. So instead of a single network for spatial context, this architecture has two separate networks: one for spatial context (pre-trained) and one for motion context. The input to the spatial net is a single frame of the video. The authors experimented with inputs to the temporal net and found that bi-directional optical flow stacked across 10 successive frames performed best. The two streams were trained separately and combined using an SVM. The final prediction was the same as in the previous paper, i.e. averaging across sampled clips.
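The late-fusion idea can be sketched as follows. The tiny stand-in CNNs and shapes are illustrative (the paper used much deeper nets, and also reports an SVM-based fusion variant); here the two streams' class scores are simply averaged:

```python
# Two-stream inference sketch: spatial net on one RGB frame, temporal net on
# 2L stacked optical-flow channels (L frames, x/y components).
import torch
import torch.nn as nn

def tiny_cnn(in_ch, n_classes=101):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
    )

L = 10
spatial_net = tiny_cnn(3)        # single RGB frame
temporal_net = tiny_cnn(2 * L)   # stacked horizontal/vertical flow fields

rgb = torch.randn(4, 3, 224, 224)
flow = torch.randn(4, 2 * L, 224, 224)

# Late fusion: average the class probabilities from the two streams.
scores = (spatial_net(rgb).softmax(-1) + temporal_net(flow).softmax(-1)) / 2
print(scores.shape)  # torch.Size([4, 101])
```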


Fig 4: Two stream architecture Source.

Though this method improved on the single stream method by explicitly capturing local temporal movement, there were still a few drawbacks:

  1. Because video-level predictions were obtained by averaging predictions over sampled clips, long range temporal information was still missing from the learnt features.
  2. Since training clips are sampled uniformly from videos, they suffer from a false label assignment problem: the ground truth of each clip is assumed to be the same as the ground truth of the video, which may not hold if the action happens only for a small duration within the entire video.
  3. The method involved pre-computing optical flow vectors and storing them separately. Also, the two streams were trained separately, implying that end-to-end training on-the-go was still a long road ahead.


The papers that follow are, in a way, evolutions of these two papers (single stream and two stream):

  1. LRCN
  2. C3D
  3. Conv3D & Attention
  4. TwoStreamFusion
  5. TSN
  6. ActionVlad
  7. HiddenTwoStream
  8. I3D
  9. T3D

The recurrent theme across these papers is summarized in the figure below. All of the papers are improvisations on top of these basic ideas.


Recurrent theme across papers. Source.

For each of these papers, I list their key contributions, explain them, and show their benchmark scores on UCF101-split1.


  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description
  • Donahue et al.
  • Submitted on 17 November 2014
  • Arxiv Link

Key Contributions:

  • Building on previous work by using RNN as opposed to stream based designs
  • Extension of encoder-decoder architecture for video representations
  • End-to-end trainable architecture proposed for action recognition


In a previous work by Ng et al. [9], the authors had explored the idea of using LSTMs on separately trained feature maps to see if they could capture temporal information from clips. Sadly, they concluded that temporal pooling of convolutional features proved more effective than an LSTM stacked on top of trained feature maps. In the current paper, the authors build on the same idea of using LSTM blocks (decoder) after convolution blocks (encoder), but train the entire architecture end-to-end. They also compared RGB and optical flow as input choices and found that a weighted score of predictions based on both inputs was best.


Fig 5: Left: LRCN for action recognition. Right: Generic LRCN architecture for all tasks Source.


During training, 16-frame clips are sampled from the video. The architecture is trained end-to-end with RGB or optical flow of the 16-frame clips as input. The final prediction for each clip is the average of predictions across each time step. The final video-level prediction is the average of predictions from each clip.
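A rough PyTorch sketch of the encoder-decoder idea: a per-frame CNN encoder feeds an LSTM, and predictions are averaged over time steps. The tiny encoder and dimensions are illustrative stand-ins, not the paper's network:

```python
# LRCN-style sketch: CNN encoder per frame -> LSTM -> per-step predictions averaged.
import torch
import torch.nn as nn

class LRCN(nn.Module):
    def __init__(self, n_classes=101, feat_dim=64, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(                 # 2D CNN applied to each frame
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)                     # hidden state at every time step
        return self.head(out).mean(dim=1)             # average predictions over T

clip = torch.randn(2, 16, 3, 112, 112)                # 16-frame clips, as in the paper
logits = LRCN()(clip)
print(logits.shape)  # torch.Size([2, 101])
```

Because the whole module is differentiable, the encoder and LSTM train jointly, which is exactly the end-to-end property the paper emphasizes.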

Benchmarks (UCF101-split1):

  • 82.92 – Weighted score of flow and RGB inputs
  • 71.1 – Score with just RGB

My comments:

Even though the authors proposed end-to-end training frameworks, there were still a few drawbacks:

  • False label assignment as video was broken to clips
  • Inability to capture long range temporal information
  • Using optical flow meant pre-computing flow features separately

Varol et al. in their work [10] tried to compensate for the stunted temporal range by using a lower spatial resolution and longer clips (60 frames), which led to significantly better performance.


  • Learning Spatiotemporal Features with 3D Convolutional Networks
  • Du Tran et al.
  • Submitted on 02 December 2014
  • Arxiv Link

Key Contributions:

  • Repurposing 3D convolutional networks as feature extractors
  • Extensive search for best 3D convolutional kernel and architecture
  • Using deconvolutional layers to interpret model decision


In this work the authors built upon the work by Karpathy et al. However, instead of using 2D convolutions across frames, they used 3D convolutions on the video volume. The idea was to train these vast networks on Sports1M and then use them (or an ensemble of nets with different temporal depths) as feature extractors for other datasets. They found that a simple linear classifier like an SVM on top of an ensemble of extracted features worked better than the state-of-the-art algorithms. The model performed even better if hand-crafted features like iDT were used additionally.


Differences in C3D paper and single stream paper Source.

The other interesting part of the work was using deconvolutional layers (explained here) to interpret the decisions. They found that the net focused on spatial appearance in the first few frames and tracked the motion in subsequent frames.


During training, five random 2-second clips are extracted from each video, with the ground truth being the action reported for the entire video. At test time, 10 clips are randomly sampled and their predictions are averaged for the final prediction.


3D convolution where convolution is applied on a spatiotemporal cube.
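The core operation can be shown in a couple of lines, assuming the usual (batch, channels, time, height, width) layout and the 3×3×3 kernels the paper's architecture search converged on:

```python
# C3D-style 3D convolution sketch: one kernel slides over a spatiotemporal cube.
import torch
import torch.nn as nn

conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(3, 3, 3), padding=1)

clip = torch.randn(1, 3, 16, 112, 112)   # (B, C, T, H, W): a 16-frame RGB clip
features = conv3d(clip)                  # temporal extent is preserved by padding
print(features.shape)  # torch.Size([1, 64, 16, 112, 112])
```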

Benchmarks (UCF101-split1):

82.3C3D (1 net) + linear SVM
85.2C3D (3 nets) + linear SVM
90.4C3D (3 nets) + iDT + linear SVM

My comments:

Long range temporal modeling was still a problem. Moreover, training such huge networks is computationally expensive – especially for medical imaging, where pre-training from natural images doesn’t help a lot.

Note: Around the same time, Sun et al. [11] introduced the concept of factorized 3D conv networks (FSTCN), exploring the idea of breaking 3D convolutions into spatial 2D convolutions followed by temporal 1D convolutions. The 1D convolution, placed after the 2D conv layer, was implemented as a 2D convolution over the temporal and channel dimensions. FSTCN had comparable results on the UCF101 split.
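The factorization can be sketched with Conv3d kernels of shape (1, k, k) and (k, 1, 1), a common equivalent way to express a 2D spatial conv followed by a 1D temporal conv (channel sizes here are illustrative, not FSTCN's exact layers):

```python
# Factorized spatiotemporal convolution sketch: spatial conv, then temporal conv.
import torch
import torch.nn as nn

factorized = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=(1, 3, 3), padding=(0, 1, 1)),   # 2D spatial conv
    nn.ReLU(),
    nn.Conv3d(64, 64, kernel_size=(3, 1, 1), padding=(1, 0, 0)),  # 1D temporal conv
)

clip = torch.randn(1, 3, 16, 56, 56)     # (B, C, T, H, W)
out = factorized(clip)
print(out.shape)  # torch.Size([1, 64, 16, 56, 56])
```

The factorized pair has far fewer parameters than a full 3×3×3 kernel with the same channel counts, which is the main appeal of the idea.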


FSTCN paper and the factorization of 3D convolution Source.

Conv3D & Attention

  • Describing Videos by Exploiting Temporal Structure
  • Yao et al.
  • Submitted on 25 April 2015
  • Arxiv Link

Key Contributions:

  • Novel 3D CNN-RNN encoder-decoder architecture which captures local spatiotemporal information
  • Use of an attention mechanism within a CNN-RNN encoder-decoder framework to capture global context


Although this work is not directly related to action recognition, it was a landmark work for video representations. In this paper the authors use a 3D CNN + LSTM as the base architecture for a video description task. On top of the base, the authors use a pre-trained 3D CNN for improved results.


The setup is almost the same as the encoder-decoder architecture described in LRCN, with two differences:

  1. Instead of passing features from the 3D CNN as-is to the LSTM, the 3D CNN feature maps for the clip are concatenated with stacked 2D feature maps for the same set of frames to enrich the representation {v1, v2, …, vn} for each frame i. Note: the 2D & 3D CNNs used are pre-trained, not trained end-to-end as in LRCN.
  2. Instead of averaging temporal vectors across all frames, a weighted average is used to combine the temporal features. The attention weights are decided based on the LSTM output at every time step.
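The weighted average in point 2 can be sketched as follows. The scoring layer and dimensions are illustrative assumptions, not the paper's exact parameterization; the key point is that the weights depend on the decoder state at the current step:

```python
# Attention-weighted temporal pooling sketch: a plain mean of v1..vn is replaced
# by a weighted average whose weights are computed from the LSTM state h.
import torch
import torch.nn as nn

feat_dim, hidden, n = 64, 128, 16
v = torch.randn(1, n, feat_dim)          # per-frame features {v1..vn}
h = torch.randn(1, hidden)               # decoder LSTM state at the current step

score = nn.Linear(feat_dim + hidden, 1)  # relevance of each frame given h
e = score(torch.cat([v, h.unsqueeze(1).expand(-1, n, -1)], dim=-1)).squeeze(-1)
alpha = e.softmax(dim=-1)                # attention weights, sum to 1 over frames

context = (alpha.unsqueeze(-1) * v).sum(dim=1)   # weighted average of frame features
print(context.shape)  # torch.Size([1, 64])
```

Because alpha is recomputed at every decoding step, different output words can attend to different parts of the video.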

Attention Mechanism

Attention mechanism for action recognition. Source.


Network used for video description prediction

My comments:

This was one of the landmark works of 2015, introducing the attention mechanism for video representations for the first time.


  • Convolutional Two-Stream Network Fusion for Video Action Recognition
  • Feichtenhofer et al.
  • Submitted on 22 April 2016
  • Arxiv Link

Key Contributions:

  • Long range temporal modeling through better long range losses
  • Novel multi-level fused architecture


In this work, the authors take the base two stream architecture, add two novel approaches, and demonstrate a performance increase without any significant increase in the number of parameters. The authors explore the efficacy of two major ideas:

  1. Fusion of the spatial and temporal streams (how and when) – for a task discriminating between brushing hair and brushing teeth, the spatial net can capture the spatial dependency in a video (whether it’s hair or teeth) while the temporal net can capture the presence of periodic motion for each spatial location. Hence it’s important to map the spatial feature map pertaining to, say, a particular facial region to the temporal feature map for the corresponding region. To achieve this, the nets need to be fused at an early layer, such that responses at the same pixel position are put in correspondence, rather than fusing at the end (as in the base two stream architecture).
  2. Combining temporal net output across time frames so that long term dependency is also modeled.


Everything from the two stream architecture remains almost the same, except:

  1. As described in the figure below, the outputs of the conv_5 layer from both streams are fused by conv+pooling. There is another fusion at the final layer. The final fused output is used for the spatiotemporal loss evaluation.


    Possible strategies for fusing spatial and temporal streams. The one on right performed better. Source.

  2. For temporal fusion, the output from the temporal net, stacked across time and fused by conv+pooling, is used for the temporal loss.


Two stream fusion architecture. There are two paths one for step 1 and other for step 2 Source.
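The conv fusion step can be sketched as a channel-wise concatenation followed by a 1×1 convolution, one of the fusion strategies compared in the paper (shapes are illustrative; the paper additionally applies 3D pooling after fusion):

```python
# Conv fusion sketch: spatial and temporal conv_5 maps are stacked channel-wise
# and mixed by a learnable 1x1 convolution, so responses at the same pixel
# position are put in correspondence.
import torch
import torch.nn as nn

spatial_map = torch.randn(1, 512, 14, 14)   # conv_5 output, spatial stream
temporal_map = torch.randn(1, 512, 14, 14)  # conv_5 output, temporal stream

fuse = nn.Conv2d(1024, 512, kernel_size=1)  # learns how to weight the two streams
fused = fuse(torch.cat([spatial_map, temporal_map], dim=1))
print(fused.shape)  # torch.Size([1, 512, 14, 14])
```

Compared with simple sum or max fusion, the 1×1 conv lets the network learn per-channel correspondences between appearance and motion features.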

Benchmarks (UCF101-split1):

  • 94.2 – TwoStreamFusion + iDT

My comments:
The authors established the supremacy of the TwoStreamFusion method as it improved the performance over C3D without the extra parameters used in C3D.


  • Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
  • Wang et al.
  • Submitted on 02 August 2016
  • Arxiv Link

Key Contributions:

  • Effective solution aimed at long range temporal modeling
  • Establishing the usage of batch normalization, dropout and pre-training as good practices


In this work the authors improved on the two stream architecture to produce state-of-the-art results. There were two major differences from the original paper:

  1. They suggest sampling clips sparsely across the video to better model long range temporal signals, instead of random sampling across the entire video.
  2. For final prediction at video-level authors explored multiple strategies. The best strategy was
    1. Combining scores of temporal and spatial streams (and other streams if other input modalities are involved) separately by averaging across snippets
    2. Fusing score of final spatial and temporal scores using weighted average and applying softmax over all classes.

The other important part of the work was establishing the problem of overfitting (due to small dataset sizes) and demonstrating the usage of now-prevalent techniques like batch normalization, dropout and pre-training to counter it. The authors also evaluated two new input modalities as alternatives to optical flow – namely warped optical flow and RGB difference.


During training and prediction, a video is divided into K segments of equal duration. Snippets are then sampled randomly from each of the K segments. The rest of the steps remain similar to the two stream architecture, with the changes mentioned above.
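The sparse sampling scheme can be sketched in a few lines of plain Python (the frame count and K below are illustrative):

```python
# TSN-style sparse sampling sketch: one snippet index is drawn from each of K
# equal segments of the video, so the samples always span the whole duration.
import random

def sample_snippets(n_frames, k=3, seed=0):
    random.seed(seed)  # fixed seed only so the sketch is reproducible
    bounds = [n_frames * i // k for i in range(k + 1)]   # K equal segments
    return [random.randrange(bounds[i], bounds[i + 1]) for i in range(k)]

idx = sample_snippets(n_frames=300, k=3)
print(idx)  # one frame index drawn from each third of the video
```

The per-snippet scores are then averaged (the "segmental consensus"), so the video-level prediction always reflects the full temporal extent rather than one random clip.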


Temporal Segment Network architecture. Source.

Benchmarks (UCF101-split1):

  • 94.0 – TSN (input RGB + Flow)
  • 94.2 – TSN (input RGB + Flow + Warped flow)

My comments:

The work attempted to tackle two big challenges in action recognition – overfitting due to small dataset sizes and long range temporal modeling – and the results were really strong. However, the problem of pre-computing optical flow and related input modalities remained at large.


  • ActionVLAD: Learning spatio-temporal aggregation for action classification
  • Girdhar et al.
  • Submitted on 10 April 2017
  • Arxiv Link

Key Contributions:

  • Learnable video-level aggregation of features
  • End-to-end trainable model with video-level aggregated features to capture long term dependency


In this work, the most notable contribution by the authors is the usage of learnable feature aggregation (VLAD), as compared to normal aggregation using maxpool or avgpool. The aggregation technique is akin to a bag of visual words. There is a vocabulary of multiple learned anchor points (say c1, …, ck) representing k typical action (or sub-action) related spatiotemporal features. The output from each stream of the two stream architecture is encoded in terms of k "action word" features, each feature being the difference of the stream output from the corresponding anchor point for any given spatial or temporal location.


ActionVLAD – bag of action-based visual "words". Source.

Average or max-pooling represents the entire distribution of points as a single descriptor, which can be sub-optimal for representing a whole video composed of multiple sub-actions. In contrast, the proposed video aggregation represents an entire distribution of descriptors with multiple sub-actions by splitting the descriptor space into k cells and pooling inside each cell.
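A bare-bones sketch of the soft-assignment residual aggregation (dimensions and initializations are illustrative; in the paper the anchors are learned end-to-end and the assignment comes from a learnable conv layer):

```python
# VLAD-style aggregation sketch: descriptors are soft-assigned to k anchors and
# the residuals (descriptor minus anchor) are summed per cell.
import torch

n, d, k = 200, 64, 8                      # descriptors, feature dim, action "words"
x = torch.randn(n, d)                     # spatiotemporal descriptors from a stream
anchors = torch.randn(k, d)               # anchor points c1..ck (learnable in practice)

# Soft assignment of each descriptor to each anchor (softmax over negative distances).
assign = (-torch.cdist(x, anchors)).softmax(dim=1)        # (n, k)

# Aggregate residuals x - c_j inside each cell j.
residuals = x.unsqueeze(1) - anchors.unsqueeze(0)         # (n, k, d)
vlad = (assign.unsqueeze(-1) * residuals).sum(dim=0)      # (k, d) video descriptor
print(vlad.shape)  # torch.Size([8, 64])
```

The resulting k×d descriptor keeps one summary per cell, so distinct sub-actions are not collapsed into a single pooled vector.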


While max or average pooling work well for similar features, they do not adequately capture the complete distribution of features. ActionVLAD clusters the appearance and motion features and aggregates their residuals from the nearest cluster centers. Source.


Everything from the two stream architecture remains almost the same, except for the addition of the ActionVLAD layer. The authors experimented with multiple positions for the ActionVLAD layer, with late fusion after the conv layers working out as the best strategy.

Benchmarks (UCF101-split1):

  • 93.6 – ActionVLAD + iDT

My comments:
The use of VLAD as an effective way of pooling was already proved long back. The extension of the same in an end-to-end trainable framework made this technique extremely robust and state-of-the-art for most action recognition tasks in early 2017.


  • Hidden Two-Stream Convolutional Networks for Action Recognition
  • Zhu et al.
  • Submitted on 2 April 2017
  • Arxiv Link

Key Contributions:

  • Novel architecture for generating optical flow input on-the-fly using a separate network


The usage of optical flow in the two stream architecture made it mandatory to pre-compute optical flow for each sampled frame beforehand, adversely affecting storage and speed. This paper advocates the usage of an unsupervised architecture to generate optical flow for a stack of frames.

Optical flow estimation can be regarded as an image reconstruction problem. Given a pair of adjacent frames I1 and I2 as input, the CNN generates a flow field V. Using the predicted flow field V and I2, I1 can then be reconstructed by inverse warping, and the difference between I1 and its reconstruction is minimized.
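The reconstruction objective can be sketched with PyTorch's `grid_sample`, which performs the inverse warping. Normalizing the flow into the [-1, 1] grid coordinates is the standard convention; shapes are illustrative, and the zero flow below is just a placeholder for MotionNet's prediction:

```python
# Unsupervised flow-learning sketch: warp I2 with the predicted flow and
# penalize the photometric difference to I1 (no flow labels needed).
import torch
import torch.nn.functional as F

B, C, H, W = 1, 3, 32, 32
I1, I2 = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
flow = torch.zeros(B, 2, H, W)            # stand-in for MotionNet's (dx, dy) output

# Base sampling grid in [-1, 1] coordinates, shifted by the normalized flow.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)         # (1, H, W, 2)
offset = torch.stack([flow[:, 0] / ((W - 1) / 2), flow[:, 1] / ((H - 1) / 2)], dim=-1)

I1_rec = F.grid_sample(I2, grid + offset, align_corners=True)  # inverse warping
loss = F.l1_loss(I1_rec, I1)              # photometric reconstruction error
print(I1_rec.shape)  # torch.Size([1, 3, 32, 32])
```

Since `grid_sample` is differentiable, minimizing this loss back-propagates into the flow-predicting network, which is what makes the training unsupervised.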


The authors explored multiple strategies and architectures to generate optical flow with the highest fps and the fewest parameters without hurting accuracy much. The final architecture was the same as the two stream architecture, with the following changes:

  1. The temporal stream now has the optical flow generation net (MotionNet) stacked on top of the usual temporal stream architecture. The input to the temporal stream is now consecutive frames instead of pre-computed optical flow.
  2. There’s an additional multi-level loss for the unsupervised training of MotionNet.

The authors also demonstrate improvement in performance using TSN based fusion instead of conventional architecture for two stream approach.


HiddenTwoStream – MotionNet generates optical flow on-the-fly. Source.

Benchmarks (UCF101-split1):

  • 89.8 – Hidden Two Stream
  • 92.5 – Hidden Two Stream + TSN

My comments:
The major contribution of the paper was to improve speed and associated cost of prediction. With automated generation of flow, the authors relieved the dependency on slower traditional methods to generate optical flow.


  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
  • Carreira et al.
  • Submitted on 22 May 2017
  • Arxiv Link

Key Contributions:

  • Combining 3D based models into two stream architecture leveraging pre-training
  • Kinetics dataset for future benchmarking and improved diversity of action datasets


This paper takes off from where C3D left off. Instead of a single 3D network, the authors use two different 3D networks, one for each stream of the two stream architecture. Also, to take advantage of pre-trained 2D models, the authors repeat the 2D pre-trained weights along a 3rd dimension. The spatial stream input now consists of frames stacked along the time dimension instead of the single frames of the basic two stream architecture.
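The weight "inflation" trick can be sketched in a few lines, assuming a pre-trained 2D kernel of shape (out, in, k, k): it is repeated along a new temporal axis and rescaled, so that a static ("boring") video initially produces the same activations as the 2D net:

```python
# I3D-style weight inflation sketch: bootstrap a 3D kernel from a 2D one.
import torch

w2d = torch.randn(64, 3, 7, 7)            # stand-in for pre-trained 2D conv weights
T = 7                                     # temporal kernel size after inflation

w3d = w2d.unsqueeze(2).repeat(1, 1, T, 1, 1) / T   # (out, in, T, k, k)
print(w3d.shape)  # torch.Size([64, 3, 7, 7, 7])

# Sanity check: summing the inflated kernel over time recovers the 2D kernel,
# which is why a temporally constant input reproduces the 2D activations.
print(torch.allclose(w3d.sum(dim=2), w2d, atol=1e-4))
```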


The setup is the same as the basic two stream architecture, but with a 3D net for each stream.

Benchmarks (UCF101-split1):

  • 93.4 – Two Stream I3D
  • 98.0 – ImageNet + Kinetics pre-training

My comments:

The major contribution of the paper was the demonstration of evidence towards the benefit of using pre-trained 2D conv nets. The Kinetics dataset, open-sourced along with the paper, was the other crucial contribution.


  • Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
  • Diba et al.
  • Submitted on 22 Nov 2017
  • Arxiv Link

Key Contributions:

  • Architecture to combine temporal information across variable depth
  • Novel training architecture & technique to supervise transfer learning between 2D pre-trained net to 3D net


The authors extend the work done on I3D but suggest a single stream 3D DenseNet based architecture with a multi-depth temporal pooling layer (Temporal Transition Layer) stacked after dense blocks to capture different temporal depths. The multi-depth pooling is achieved by pooling with kernels of varying temporal sizes.
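The multi-depth temporal pooling idea can be sketched as parallel pooling branches with different temporal kernel sizes, concatenated channel-wise. This stand-alone module is illustrative only; in T3D the TTL sits inside a DenseNet and its kernel depths differ per stage:

```python
# TTL-style multi-depth temporal pooling sketch: several temporal pooling
# kernels run in parallel and their outputs are concatenated on channels.
import torch
import torch.nn as nn

class MultiDepthTemporalPool(nn.Module):
    def __init__(self, depths=(1, 3, 5)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.AvgPool3d(kernel_size=(d, 1, 1), stride=1, padding=(d // 2, 0, 0))
            for d in depths                # one branch per temporal depth
        )

    def forward(self, x):                  # x: (B, C, T, H, W)
        return torch.cat([p(x) for p in self.pools], dim=1)

x = torch.randn(1, 32, 16, 7, 7)
out = MultiDepthTemporalPool()(x)
print(out.shape)  # torch.Size([1, 96, 16, 7, 7])
```

Each branch summarizes motion over a different temporal window, so the concatenated features carry short- and longer-range temporal context at once.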


TTL Layer along with rest of DenseNet architecture. Source.

Apart from the above, the authors also devise a new technique for supervising transfer learning between pre-trained 2D conv nets and T3D. The 2D pre-trained net and T3D are both presented frames and clips, which may or may not come from the same video. The architecture is trained to predict 0/1 accordingly, and the error from this prediction is back-propagated through the T3D net so as to effectively transfer knowledge.


Transfer learning supervision. Source.


The architecture is basically a 3D modification of DenseNet [12] with added variable temporal pooling.

Benchmarks (UCF101-split1):

  • 91.7 – T3D + Transfer
  • 93.2 – T3D + TSN

My comments:

Although the results don’t improve on the I3D results, that can mostly be attributed to the much lower model footprint as compared to I3D. The most novel contribution of the paper was the supervised transfer learning technique.


  1. ConvNet Architecture Search for Spatiotemporal Feature Learning by Du Tran et al.
  2. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
  3. Action recognition by dense trajectories by Wang et al.
  4. On space-time interest points by Laptev
  5. Behavior recognition via sparse spatio-temporal features by Dollar et al.
  6. Action Recognition with Improved Trajectories by Wang et al.
  7. 3D Convolutional Neural Networks for Human Action Recognition by Ji et al.
  8. Large-scale Video Classification with Convolutional Neural Networks by Karpathy et al.
  9. Beyond Short Snippets: Deep Networks for Video Classification by Ng et al.
  10. Long-term Temporal Convolutions for Action Recognition by Varol et al.
  11. Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks by Sun et al.
  12. Densely Connected Convolutional Networks by Huang et al.