at the Forefront of ML Research and Applications in Healthcare

December 24, 2020

In the past four years there has been an explosion of research on the application of AI in biology-related fields, particularly in medicine, pharmaceutical development and healthcare administration.

According to the NIH’s National Library of Medicine, published papers on deep learning, NLP, computer vision and reinforcement learning in this field have been growing at over 50% per annum since 2017 and papers published in the last year already account for 25% of all output in this area in the past 10 years.

Specific advancements and new applications of AI technology are being demonstrated in the areas of drug discovery and screening,1,2,3 prediction of drug success likelihood,4 prediction of disease outcomes,5 guidance on how novel molecules may be synthesized in the lab,6,7 image analysis of microscopy,8 improved diagnostics9 and prediction of health outcomes based on changes in nutrition.10

AI Research: Causality and Explainability

We are also seeing fundamental research taking place in two areas of AI that will have outsized effects on healthcare use cases in the future: causal AI and AI explainability. 

Causal AI will expand the scope of AI models from finding correlations in data (and successfully generalizing them to previously unseen data) to making actual causal inferences. This is already being used to improve the ability of models to perform accurate differential diagnoses in a range of medical contexts.11 AI explainability is likewise crucial for improving the trustworthiness and transparency of AI systems in healthcare.12

Availability of Data

In support of these advances, there has been significant progress in the availability of data that can be used for medical and pharmaceutical research. Examples include open-source datasets such as Stanford University’s Medical ImageNet, a petabyte-scale repository of diagnostic imaging studies used for developing intelligent image analysis systems; the Protein Data Bank (PDB) from Brookhaven National Laboratory, which now contains over 150,000 unique structures; and Recursion’s RxRx datasets, with hundreds of thousands of new images and new cell types released annually.

Data Privacy

Given the sensitive and protected nature of medical records, however, the majority of data in healthcare applications will likely always remain private. To deal with this issue, research is also ongoing in the area of federated learning, a technique for allowing machine learning to be conducted while maintaining data privacy and secrecy. Research papers on this topic grew almost 5x from 2018 to 2019, and more papers were published on federated learning in the first half of 2020 than in all of 2019.13
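
As a rough illustration of the idea, the sketch below shows federated averaging, one common federated learning scheme, in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the one-dimensional linear model, and the helper names `local_update`, `federated_average` and `make_client_data` are hypothetical and are not taken from any of the studies cited here. Each simulated client trains on its own records and shares only model weights; the raw data never leaves the client.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Train y = w * x + b by gradient descent on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Combine client weights, weighting each client by its dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(rng, n):
    """Hypothetical synthetic records following y = 2x + 1 plus small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)))
    return data


rng = random.Random(0)
# Three simulated clients (e.g. hospitals), each holding its own data.
clients = [make_client_data(rng, n) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(50):
    # Each client starts from the current global model and trains locally.
    updates = [local_update(global_weights, data) for data in clients]
    # The server only ever sees the returned weights, never the records.
    global_weights = federated_average(updates, [len(d) for d in clients])

print("learned (w, b):", global_weights)
```

In a real deployment the clients would be separate institutions, and further protections such as secure aggregation or differential privacy are typically layered on top, since shared weights alone can still leak information about the underlying data.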

Startups and Collaborative Efforts

Much of this research into new applications of AI in medicine, healthcare administration and pharmaceutical development has been collaborative in nature. We are currently seeing a proliferation of startups partnering with established pharmaceutical companies, companies conducting joint research with universities, and academic consortia and industry groups forming collaborative groups to focus on these topics. These collaborations hold tremendous potential but also present their own challenges in terms of culture, data security and infrastructure management.

Healthcare Administration

It is also important to note that activity in this area is not confined to research. Real-world trials of AI-discovered drugs are currently in progress – both Sumitomo Dainippon in Japan and Sanofi in France are conducting phase-1 clinical trials on molecules developed using AI techniques.14,15

Much work still needs to be done, however, for this new AI revolution to truly succeed. The regulatory environment must be updated to allow for approval of new AI-developed drugs, services and devices, and insurance policies must be modified to allow for reimbursement of AI-enabled procedures and AI-developed drugs.

Incorporating AI into Research Practices and Workflows

Finally, research practices must be updated for evaluating new AI health interventions. A recent review of 20,000 AI studies found that fewer than 1% of them had sufficiently high-quality design and reporting. Studies frequently suffer from poor data quality, a lack of external validation by independent research groups and a lack of generalizability to new datasets.16

Going forward, it is clear that AI will be used for more than drug identification and disease diagnosis. It will also be used to improve patient outcomes and efficiency across the healthcare industry: identifying workflow optimization strategies; reducing failure of delivery, overtreatment, mispricing, fraud and abuse; integrating with smart health record organization and retrieval; and even improving sales and marketing for healthcare organizations. The picture that is emerging for virtually every member of the healthcare industry is that AI transformation will be central going forward.

Involvement in Pharmaceutical Research and Healthcare

We have been actively participating in this exciting period for AI in healthcare by conducting fundamental research, developing and deploying targeted AI solutions, and setting up and managing ML operations and infrastructure to solve a range of operational, privacy and efficiency issues. Our work has taken the form of direct client engagements with both enterprises and SMBs, collaborations with well-known startups in the space, and joint work with government research organizations.

One recent example is the development of DeepCycle,17 a new technology for modeling the lifecycle of cells with applications in medical and cancer research, which was conducted in coordination with the European Molecular Biology Laboratory (EMBL), Neuromation Chief Research Officer Sergey Nikolenko and Senior AI Researcher Alexander Rakhlin.

“For the first time ever, we have been able to develop distributed representations of cell images that actually have a closed cell cycle progression in time. These representations can be used to identify the ‘cell clock’, i.e., current ‘age’ of a cell, which may have important implications across the medical field,” said Nikolenko.

The DeepCycle method was developed using approximately 2.6 million microscopy images of canine kidney cells and by applying “transfer learning,” an approach to modeling that uses knowledge gained in one problem to bootstrap a different or related problem. The team in this case started with a computer vision model pre-trained on public data containing over a million common images, and then refined the model for their cell tasks, which made it possible to distinguish between microscopy images of cells.
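
The pre-train-then-refine pattern described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the DeepCycle implementation: a small logistic regression stands in for the computer vision model, both datasets are synthetic, and names such as `train`, `log_loss` and `make_task` are hypothetical.

```python
import math
import random


def predict_logit(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def log_loss(weights, data):
    """Mean logistic loss over (features, label) pairs; labels are 0 or 1."""
    total = 0.0
    for features, label in data:
        z = predict_logit(weights, features)
        margin = z if label == 1 else -z
        total += math.log1p(math.exp(-margin))
    return total / len(data)


def train(weights, data, lr=0.5, steps=200):
    """Full-batch gradient descent that starts from the given weights."""
    weights = list(weights)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for features, label in data:
            z = predict_logit(weights, features)
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for i, f in enumerate(features):
                grads[i] += error * f / len(data)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


def make_task(rng, n, slope, offset):
    """Hypothetical synthetic task: label is 1 when x1 + slope * x2 > offset."""
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 + slope * x2 > offset else 0
        data.append(((x1, x2, 1.0), label))  # trailing 1.0 is the bias feature
    return data


rng = random.Random(0)
source = make_task(rng, 500, slope=1.0, offset=0.0)  # large "public" dataset
target = make_task(rng, 20, slope=0.8, offset=0.1)   # small, related dataset

# Step 1: pre-train on the large source dataset.
pretrained = train([0.0, 0.0, 0.0], source)
# Step 2: refine the pre-trained weights with a few steps on the target data.
fine_tuned = train(pretrained, target, steps=20)
# Baseline: the same few steps on the target data, starting from zero.
from_scratch = train([0.0, 0.0, 0.0], target, steps=20)

print("pre-trained, before refining:", log_loss(pretrained, target))
print("pre-trained, after refining: ", log_loss(fine_tuned, target))
print("trained from scratch:        ", log_loss(from_scratch, target))
```

The printed losses show how the refined model and the from-scratch baseline fare on the small target dataset. In practice the pre-trained model would be a deep network, and often only some of its layers would be updated during refinement.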

Furthermore, by using our MLOps Platform for the entire project, the team was able to manage the entire ML lifecycle, including experiment tracking, hyperparameter tuning, remote debugging, distributed training, and model deployment and monitoring. With the platform, researchers were able to streamline infrastructure management, optimize hosting and compute costs, and accelerate development and deployment of this important new technology.

Crucially, the team was able to install, set up and manage the entire ML pipeline on EMBL’s own infrastructure by tunneling in, without ever having to set foot in their offices and without the data ever being visible to our team. This means that all data remained on their servers and in their control with all privacy and security measures in place – an important capability with numerous applications in the healthcare space.

Another significant project we completed in the space was conducted in collaboration with our long-time partners Insilico Medicine – with Neuromation contributing to the creation of druGAN, an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties. This work demonstrated for the first time the improved effectiveness of using a customized adversarial autoencoder in this context over previous variational autoencoder techniques for accurately capturing the structure of large molecular databases.

In the area of computer vision applications in health, we have also completed several interesting projects, including accurate surgical tool identification and health code compliance monitoring, which we will discuss in more detail in an upcoming whitepaper devoted to this topic.

As this exciting area of AI research and development continues to advance, our team of AI researchers and specialists at Neuromation looks forward to continuing to contribute and collaborate with companies in pharmaceutical development and biotech, health maintenance organizations (HMOs), insurance providers, research universities, and government and non-governmental healthcare organizations to advance the goal of improving health outcomes for all of humanity.

  1. Cell, February 2020: A Deep Learning Approach to Antibiotic Discovery
  2. arXiv: Principal Neighbourhood Aggregation for Graph Nets
  3. Journal of Medicinal Chemistry, June 2020: Machine learning on DNA-encoded libraries: A new paradigm for hit-finding
  4. BioRxiv pre-print, August 2020: Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery
  5. Nature Medicine, May 2020: Predicting conversion to wet age-related macular degeneration using deep learning
  6. arXiv: Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning
  7. arXiv: Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network
  8. EMBL release
  9. Nature, October 2020: International evaluation of an AI system for breast cancer screening
  10. Nature, October 2020: Human postprandial responses to food and potential for precision nutrition
  11. Nature, September 2020: Improving the accuracy of medical diagnosis with causal machine learning
  12. arXiv, October 2019: Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
  13. Google Scholar search
  14. IR News
  15. Financial Times, January 2020
  16. The Lancet, Digital Health
  17., October 2020