Data Scientists Should Do Drugs!

By Keith Black, PhD, CFA, CAIA, FDP

Now that this attention-grabbing headline has drawn you in, let me clarify. Data scientists should not partake in illegal drugs. Data scientists should participate in pharmacological research, as artificial intelligence and machine learning can add value, even when the data scientist does not have a background or training in physics, biology, chemistry, or medicine.

The CAIA Association and FDP Institute had a recent conversation with Woody Sherman, the CSO of Silicon Therapeutics. While many of us can be left behind in a discussion of computational drug discovery, it seems that almost everyone today is a budding epidemiologist trying to better understand the prevention and spread of COVID-19, so let’s continue.

Molecular dynamics sounds complicated, but watching animations of how drug molecules move and interact with body chemistry is quite fascinating. The dangerous thing about COVID-19 is the spike proteins that reach out and grab your lungs and won’t let go. If scientists can figure out how to prevent these proteins from interacting with, and binding to, other molecules in our body, then the impact of the virus can be vastly reduced. Simulating how molecules move, and whether they bind to other molecules in our body, is computationally intense. Physical scientists and data scientists are working together to screen millions of potential formulations, seeking to maximize efficacy and minimize side effects.

But how can data scientists contribute to the effort without a background in medicine or physics? For those looking to make novel predictions, design completely new drugs, and provide insights, intensive training in the hard sciences is required. Of course, there is a limited supply of PhDs in physics, medicine, and biochemistry who can do this groundbreaking research.

As you can see in Sherman’s chart above, there are a number of ways that data scientists trained only in artificial intelligence and machine learning can contribute to pharmacological research. In fact, the availability and sophistication of data scientists are growing with the publication of open source algorithms from Facebook and Google, as well as from R and Python libraries. In the @DataCamp class on AI Fundamentals, we learn that some machine learning models can be implemented in eight lines or fewer of Python code. @DataRobot allows data scientists to build, implement, and interpret machine learning models through a GUI without requiring any Python coding by the analyst.
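To give a sense of how short a working model can be, here is a minimal sketch of a one-nearest-neighbor classifier using only the Python standard library. It is not taken from the DataCamp course; the function name `predict`, the two-number feature vectors, and the "active"/"inactive" labels are all hypothetical and chosen purely for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def predict(train, query):
    """Return the label of the training example closest to `query`."""
    _, label = min(train, key=lambda row: dist(row[0], query))
    return label

# Hypothetical toy data: (features, label) pairs, not real measurements.
train = [((1.0, 1.0), "inactive"), ((1.2, 0.8), "inactive"),
         ((4.0, 4.2), "active"), ((3.8, 4.5), "active")]

print(predict(train, (3.9, 4.0)))
```

A real project would use far more data and a tested library, but the core idea of "find the most similar known example and copy its label" fits comfortably in a handful of lines.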

The key to applying AI and ML approaches is lots and lots of data. Data scientists can use these pattern recognition techniques when given a data set of well-known diseases, such as cancers. For example, radiologists looking for tumors can be aided significantly by deep learning and pattern recognition programs. “Artificial intelligence won’t necessarily replace radiologists, but it will replace radiologists who don’t use artificial intelligence,” says Raymond Liu, MD, associate radiologist at Massachusetts General Hospital. “The idea is that artificial intelligence will be an augmentation tool.”[i] Liu states that routine brain MRIs using deep learning can predict molecular markers with over 90% accuracy, which is far superior to what human radiologists can achieve working without this technology. That is, we are benefiting from the merger between human intelligence and artificial intelligence.

A common theme in our FDP webinars is that diverse teams can be quite productive. Whether in finance or medicine, we need to team the subject matter experts with data scientists. While the data scientists are expert in machine learning and artificial intelligence, they can make significant contributions without a deep background in the subject being analyzed.

At Silicon Therapeutics, the physicists and chemists design the new molecules and provide instructions on how to model the interactions between molecules and diseases as well as the parameters to simulate variations of these novel medicines. Starting with a relatively small number of ideas, physics and quantum mechanics models build millions of potential molecules and turn them over to the data scientist team. Because these molecules are very expensive to physically build, the first task of the AI/ML team is to suggest which molecules have the most promise to meet the pharmacological goals for the new formulation. AI and ML models can be built and run much more quickly than models based on physics. Many of these models are run just a couple of towns over from the Amherst, Massachusetts, headquarters of the FDP Institute, where the Massachusetts Green High Performance Computing Center in Holyoke, MA, houses supercomputers that are shared by numerous universities and researchers in New England.
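The screening step described above can be sketched roughly as follows: score a small sample of candidates with a slow, physics-style calculation, fit a cheap model to those scores, and use the cheap model to rank everything else. This is only an illustration under stated assumptions, not Silicon Therapeutics’ actual method. The two-number "descriptor" per molecule, the scoring formula in `expensive_score`, and the nearest-neighbor surrogate are all hypothetical stand-ins.

```python
import random
from math import dist

def expensive_score(descriptor):
    """Hypothetical stand-in for a slow physics-based calculation (higher is better)."""
    x, y = descriptor
    return -((x - 0.7) ** 2 + (y - 0.3) ** 2)

def surrogate_score(labeled, descriptor, k=3):
    """Cheap estimate: average score of the k most similar already-scored molecules."""
    nearest = sorted(labeled, key=lambda item: dist(item[0], descriptor))[:k]
    return sum(score for _, score in nearest) / len(nearest)

def shortlist(labeled, candidates, top_n=5, k=3):
    """Rank unscored candidates by the surrogate and keep the most promising ones."""
    ranked = sorted(candidates,
                    key=lambda d: surrogate_score(labeled, d, k),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical pool of 2,000 candidate "molecules", each reduced to two numbers.
rng = random.Random(0)
pool = [(rng.random(), rng.random()) for _ in range(2000)]

# Pay for the expensive calculation on only 100 of them.
labeled = [(d, expensive_score(d)) for d in pool[:100]]
candidates = pool[100:]

best = shortlist(labeled, candidates, top_n=5)
print(best)
```

The design choice to notice is the division of labor: the expensive model is run on a small sample, and the fast model decides which of the remaining candidates deserve the expensive treatment next.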

The future of drug discovery is highly dependent on continued innovations in computing. One of the key innovations is the use of graphics processing units (GPUs). Commonly used by professional video game players or Bitcoin miners, GPUs have accelerated the run time of pharmacological models from months to just hours or days. The Folding@Home program encourages gamers to allow scientific research projects to access their GPUs when their PCs are idle. Other technologies accelerating drug discovery efforts include cloud computing, quantum computing, and neuromorphic computing.

Finally, ML can help with the design of clinical trials by identifying and optimizing the use of multiple populations of people with different genetics. That is, the potential drug discovery needs to be tested on a wide variety of patients. The broader the genetic characteristics of a population helped by a drug, the greater the potential of the drug, both to help the world medically and to help the researcher financially. While standard drug research takes 2 to 5 years to reach human trials and at least another 5 years of human trials before reaching FDA approval, there are hopes that these advances in medicine and computing can somewhat predict or tame COVID-19 by the end of the year. Eventually these breakthroughs in physical and computing research can lead to personalized medicine where a specific person’s genetics are analyzed to offer a customized diet and medicinal regimen to optimize their health.

Even amateur epidemiologists can contribute to this effort. Artificial intelligence plus human intelligence are better together than separate. Physical scientists can be helped by data scientists. We’re all in this together. We all can lend a hand, even without a PhD in biochemistry or physics.

Woody Sherman has a PhD in Computational Chemistry from MIT. If you like watching visual simulations of molecular dynamics or want to verify or disprove my layman’s explanation of computational drug discovery, you are encouraged to hear how this works directly from Dr. Sherman.

Keith Black, PhD, CFA, CAIA, FDP is Managing Director of Content Strategy at CAIA Association. Follow him on Twitter and LinkedIn.

To find the schedule for future FDP webinars, see here.

More information on the FDP exam can be found here.

[i] Artificial Intelligence in Radiology: Friend or Foe. Whitney Palmer, Diagnostic Imaging, October 2018.
