January 10, 2019

Experimenting with AI to improve drug discovery

Anti-cancer drugs, antibiotics and other medicines have long been key tools at doctors’ disposal for treating patients with a range of diseases. However, the first step in finding molecules that eventually make it to the clinic is both expensive and time-consuming, so anything researchers can do to find new drugs more quickly and less expensively is crucial to developing better therapies.

A group of UW-Madison researchers with backgrounds in drug discovery, math, computer science and statistics have received a UW2020 grant to develop new computational tools to virtually screen compounds, including those that target cancer-driving proteins, looking for the ones that are most promising to begin testing. Using these tools, the researchers are hoping to reduce the cost and scale of early-stage drug discovery.

“We need to find a way of making that process more efficient so that it can be available to more academic researchers,” said chemist Scott Wildman, PhD, an associate scientist at the UW Carbone Cancer Center’s Small Molecule Screening Facility. “If we can use the computational tools to reduce that upfront cost, then government funding agencies can get a lot more projects through for the same amount of money.”

Wildman and his colleague, Spencer Ericksen, PhD, work with researchers from all over campus on both experimental screening and computational projects, looking for the best machine learning algorithms for drug discovery. The researchers start by inputting large amounts of experimental data (from real-world, non-virtual screens) and telling the computer which molecules are “hits” and which are not. Then, the computer “learns” how to make predictions based on these confirmed results, and the new models can be used to make predictions about previously untested small molecules.
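As a rough illustration of that train-then-predict workflow, the toy sketch below labels a few made-up molecules as hits or non-hits and then ranks untested ones by how closely they resemble each group. Everything in it is an assumption made for illustration: the fingerprints are invented sets of feature IDs, the compound names are hypothetical, and the nearest-neighbor scoring rule is a simple stand-in, not the model the SMSF researchers used.

```python
# Toy illustration only: fingerprints are invented sets of feature IDs and
# the scoring rule is a simple nearest-neighbor stand-in, not the
# researchers' actual model.

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def score(candidate, training):
    """Similarity to the closest known hit minus similarity to the closest known non-hit."""
    best_hit = max(tanimoto(candidate, fp) for fp, is_hit in training if is_hit)
    best_miss = max(tanimoto(candidate, fp) for fp, is_hit in training if not is_hit)
    return best_hit - best_miss

def rank_untested(untested, training):
    """Return untested compound names, most promising first."""
    return sorted(untested, key=lambda name: score(untested[name], training), reverse=True)

# "Training" data: (fingerprint, was it a hit in the lab screen?)
training = [
    ({1, 2, 3, 4}, True),
    ({1, 2, 3, 9}, True),
    ({5, 6, 7, 8}, False),
    ({6, 7, 8, 10}, False),
]

# Hypothetical compounds that have not been tested in the lab
untested = {
    "cmpd_A": {1, 2, 3, 5},
    "cmpd_B": {5, 6, 7, 9},
    "cmpd_C": {1, 2, 6, 7},
}

ranking = rank_untested(untested, training)
print(ranking)
```

In a real screen, the top of such a ranking would be the short list sent on for lab testing.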

For example, in a recent study, SMSF researchers worked with UW Carbone member Anthony Gitter, PhD, and UW SMPH Associate Dean for Basic Research Jim Keck, PhD, to identify small molecules that could potentially disrupt protein interactions important to DNA replication and repair, processes essential to cell division. They first conducted a lab-based screen of 75,000 compounds to identify the subset of compounds that were active in their experiments. Then, they used those experimental results to train models to make predictions on another 25,000 previously untested compounds.

“The hit rate was incredible,” Ericksen said. “We found that our best virtual screening method identified 37 of the 54 experimentally determined active compounds within the top 250 predictions.”
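To put those numbers in perspective, the short calculation below compares the hit rate in the top 250 predictions with the hit rate expected from picking compounds at random. It assumes that all 54 active compounds came from the 25,000-compound prediction set, which the figures above suggest but do not state outright; the variable names are illustrative.

```python
# Assumption: the 54 actives were all found within the 25,000 compounds
# that the models made predictions on.
total_screened = 25_000
total_actives = 54
top_n = 250
actives_in_top_n = 37

# Fraction of all actives recovered in the top 250 (about 0.685)
recall = actives_in_top_n / total_actives

# Hit rate among the top 250 predictions (0.148)
hit_rate_top = actives_in_top_n / top_n

# Hit rate expected from random picks (0.00216)
baseline_rate = total_actives / total_screened

# How many times better than random the ranking performed (about 68.5)
enrichment = hit_rate_top / baseline_rate

print(f"recall: {recall:.1%}, top-250 hit rate: {hit_rate_top:.1%}, "
      f"baseline: {baseline_rate:.2%}, enrichment: {enrichment:.1f}x")
```

Under that assumption, testing only the top 1 percent of the ranked list would recover roughly two-thirds of the actives.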

The researchers are now in the process of using the machine learning algorithm derived from that dataset of 100,000 molecules – 75,000 experimentally tested and 25,000 virtually tested – to predict on 10 million new compounds.

SMSF researchers also compared different machine learning techniques to see if one is better than the others. Previous studies of virtual drug discovery touted the superior capabilities of computationally heavy “deep” learning techniques, but Ericksen, Wildman and colleagues found a much simpler learning model worked better at predicting hits with Keck’s specific proteins of interest. Whereas a deep learning model requires a large computing cluster, like UW-Madison’s Center for High Throughput Computing, the simpler model could be run on any laptop, meaning that virtual screening methods could be accessible to more researchers than had previously been assumed.

“It’s not always obvious which mathematical model or algorithm you should be using,” Wildman said. “We’re trying to figure out when one of these techniques is going to be better than the other.”

And while the role of artificial intelligence in drug discovery is not going away anytime soon, it will never be the only tool. For example, one problem the scientists are working on is what to do when researchers have a very promising target protein but little is known about its structure or how small molecules might interact with it. Also, computers will never completely replace lab verification.

“I think machine learning, deep learning, artificial intelligence in drug discovery will turn out to be a useful tool, but it won’t solve everything,” Wildman said. “These tools make predictions and those predictions still don’t mean anything until we do an actual experiment to show whether it’s real.”