Researchers in the lab of Vijay Pande, PhD, have developed an algorithm that predicts the properties of molecules. It could be useful in the early stages of drug design, when a researcher must decide which of several promising candidate molecules to study further.
Such algorithms are customarily trained on anywhere from thousands to trillions of data points. This new algorithm produced fairly accurate predictions of molecules' toxicity and side effects from fewer than 30 data points. The study was published this week in ACS Central Science.
The researchers compare this type of deep learning, called one-shot learning, to the way a toddler can recognize giraffes after seeing a picture of one only once. People are quite good at one-shot learning, but computers struggle to match that ability.
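To make "learning from a handful of examples" concrete, here is a minimal sketch of predicting a label for a new molecule from a tiny labeled set. It is not the study's method: the published approach uses deep learning, while this sketch swaps in a fixed, hand-written similarity measure (Tanimoto similarity on sets of substructure "bits") and a similarity-weighted vote. The fingerprints, labels, and function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical fingerprint: the set of substructure bits that are "on" for a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by all bits present."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict(query: Fingerprint, support: List[Tuple[Fingerprint, int]]) -> float:
    """Similarity-weighted vote over a tiny labeled support set.

    Labels are 1 (e.g. toxic) or 0 (non-toxic). Returns a score in [0, 1];
    higher means the query looks more like the positive examples.
    """
    weights = [tanimoto(query, fp) for fp, _ in support]
    total = sum(weights)
    if total == 0:
        return 0.5  # no overlap with any example: stay undecided
    return sum(w * label for w, (_, label) in zip(weights, support)) / total


# Four labeled examples stand in for the "fewer than 30 data points".
support = [
    (frozenset({1, 2, 3}), 1),
    (frozenset({1, 2, 4}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({7, 8, 10}), 0),
]

print(predict(frozenset({1, 2, 5}), support))   # 1.0: resembles the toxic examples
print(predict(frozenset({7, 8, 11}), support))  # 0.0: resembles the non-toxic ones
```

A learned approach would replace the fixed `tanimoto` function with a similarity that is itself trained, which is where deep learning comes in.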
As Pande, who is a professor of chemistry and senior author of the paper, explained in a recent Stanford News release, the use of just a few data points is crucial for applying artificial intelligence to drug design: "The issue is, once you have thousands of examples in drug design, you probably already have a successful drug."
The release explains:
Other researchers have successfully applied one-shot learning to image recognition and genomics, but applying it to problems relevant to drug development is a bit different. Whereas pixels and bases are fairly natural types of data to feed into an algorithm, properties of small molecules aren’t.
To make molecular information more digestible, the researchers first represented each molecule in terms of the connections between atoms (what a mathematician would call a graph). This step highlighted intrinsic properties of the chemical in a form that an algorithm could process.
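The graph idea can be sketched in a few lines: atoms become nodes, bonds become edges, and each atom's description is updated with information from its bonded neighbors. This is an illustrative sketch only, not DeepChem's API; the function names, the one-hot element features, and the single "sum your neighbors" step are simplifying assumptions. Ethanol is shown with hydrogens omitted.

```python
from typing import List, Tuple


def build_adjacency(n_atoms: int, bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Turn an undirected bond list into an adjacency list (atoms = nodes, bonds = edges)."""
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def neighbor_sum(features: List[List[float]], adj: List[List[int]]) -> List[List[float]]:
    """One message-passing step: each atom's new vector is its own vector plus its neighbors'."""
    out = []
    for i, neighbors in enumerate(adj):
        vec = list(features[i])
        for j in neighbors:
            vec = [a + b for a, b in zip(vec, features[j])]
        out.append(vec)
    return out


# Ethanol (CH3-CH2-OH), heavy atoms only: C0 - C1 - O2
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
one_hot = {"C": [1.0, 0.0], "O": [0.0, 1.0]}  # features: [is_carbon, is_oxygen]
features = [one_hot[a] for a in atoms]

adj = build_adjacency(len(atoms), bonds)
print(adj)                          # [[1], [0, 2], [1]]
print(neighbor_sum(features, adj))  # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
```

After one step, the middle carbon's vector records that it is a carbon bonded to both a carbon and an oxygen, information its bare element label did not carry. Repeating such steps lets each atom's description reflect progressively larger neighborhoods of the molecule.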
The researchers admit that they were surprised by their success, but they've wasted no time seeing what else the approach can do. The Pande lab is already testing the algorithm on different chemical compositions for solar cells, and the team has made the code freely available through the DeepChem library.
Photo by L.A. Cicero