New antibiotics are desperately needed: Machine learning could help

Scientists have created an algorithm that works to generate and refine DNA sequences that are likely to code for antimicrobial proteins.

As the threat of antibiotic resistance looms, microbiologists aren't the only ones thinking up new solutions. James Zou, PhD, a biomedical data scientist at Stanford, is applying machine learning to create an algorithm that generates thousands of entirely new DNA sequences with the intent to one day create antimicrobial proteins.

The algorithm, called Feedback GAN, essentially acts as a mass producer of different DNA snippets. And while these sequence attempts are somewhat random, the algorithm isn't working blindly. It bases the new candidate peptides, which are short chains of amino acids (a protein is a longer chain of amino acids), on previous research that lays out the DNA sequences most likely to be associated with antimicrobial properties.

For now, these templates — which don't exist in nature — are theoretical, generated on a computer. But in the face of rising concerns about microbe resistance, Zou emphasizes that it's critical to think about solutions that don't already exist.

"We chose to pursue antimicrobial proteins because it's a very important, high impact problem that's also a relatively tractable problem for the algorithm," said Zou. "There are already existing tools that we incorporate into our system that evaluate if a new sequence is likely to have the properties of a successful antimicrobial protein."

Feedback GAN builds on that, working to incorporate just the right balance of random chance and precision.

A paper detailing the algorithm was published online in Nature Machine Intelligence. Anvita Gupta, a computer science student, is the first author; Zou is the senior author.

Gupta and Zou's algorithm doesn't just churn out new combinations of DNA; it also actively refines itself, learning what works and what doesn't through a feedback loop. After the algorithm spits out a wide range of DNA sequences, it runs a trial-and-error learning process that sifts through the peptide suggestions. Based on their resemblance to known antimicrobial peptides, the "good" ones get fed back into the algorithm, both to inform the DNA sequences it generates in the future and to be refined themselves.

"There's a built-in arbiter and by having this feedback loop, the system learns to model newly generated sequences after those that are deemed likely to have antimicrobial properties," said Zou. "So the idea is both individual peptide sequences and the generation of the sequences gets better and better."
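
For readers who think in code, the feedback loop can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: the real generator is a trained neural network and the real arbiter is a trained predictor of antimicrobial activity. Here both are hypothetical stand-ins (toy_generator, toy_analyzer), and the rule of keeping the top 10 sequences per round, like every other parameter, is an assumption made for the example.

```python
import random

BASES = "ACGT"

def toy_analyzer(seq):
    """Placeholder arbiter: the fraction of 'A' bases in the sequence.
    Arbitrary on purpose; a real analyzer would predict antimicrobial activity."""
    return seq.count("A") / len(seq)

def toy_generator(training_set, rng, mutation_rate=0.1):
    """Placeholder generator: copy a training sequence and randomly change some bases.
    In Feedback GAN this role is played by a trained neural network."""
    parent = rng.choice(training_set)
    return "".join(
        rng.choice(BASES) if rng.random() < mutation_rate else base
        for base in parent
    )

def feedback_loop(training_set, rounds=20, batch_size=50, top_k=10, seed=0):
    """Generate a batch, score it, and feed the best candidates back into the
    training set, dropping the oldest entries so its size stays fixed."""
    rng = random.Random(seed)
    training_set = list(training_set)
    size = len(training_set)
    for _ in range(rounds):
        batch = [toy_generator(training_set, rng) for _ in range(batch_size)]
        keepers = sorted(batch, key=toy_analyzer, reverse=True)[:top_k]
        training_set = (training_set + keepers)[-size:]
    return training_set

def mean_score(seqs):
    """Average analyzer score across a collection of sequences."""
    return sum(toy_analyzer(s) for s in seqs) / len(seqs)

# Start from 100 random 30-base sequences and run the loop.
init_rng = random.Random(1)
initial = ["".join(init_rng.choice(BASES) for _ in range(30)) for _ in range(100)]
final = feedback_loop(initial)
print(f"mean score before: {mean_score(initial):.2f}, after: {mean_score(final):.2f}")
```

Because each round's best candidates become the raw material for the next round, the average score of the training set drifts upward over time, which is the "better and better" behavior Zou describes.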

Zou has also considered another core component of hypothetical proteins: protein folding. Proteins contort into very specific structures linked to their functions. An algorithm could create the perfect sequence, but unless it can fold up, it's useless — like the cogs of a clock strewn on a table.

In this vein, Zou can tweak the algorithm so that instead of analyzing a propensity for antimicrobial properties, it determines the likelihood of correct folding.

"We can actually do these two things in parallel where we look at antimicrobial properties of one sequence and folding likelihood of another," said Zou. "We run both so that we're optimizing either the antimicrobial properties or its ability to fold."

Next, Zou hopes to merge the two variations of the algorithm to create peptide sequences that are optimized for both their microbe-killing abilities and their ability to fold into a genuine protein.
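
One way to picture the difference between the two variations, and what merging them might look like, is to treat the scoring step as a swappable part. The Python sketch below is purely illustrative and rests on assumptions: the two scoring functions are hypothetical placeholders, and requiring a sequence to pass every scorer is just one simple way to combine criteria, not necessarily the approach Zou's team will take.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]

def select_for_feedback(batch: Sequence[str],
                        scorers: Sequence[Scorer],
                        threshold: float = 0.5) -> List[str]:
    """Keep only the sequences that meet the threshold under every scorer."""
    return [seq for seq in batch
            if all(score(seq) >= threshold for score in scorers)]

# Hypothetical placeholder scorers; real ones would be trained predictors.
def toy_antimicrobial_score(seq: str) -> float:
    return seq.count("A") / len(seq)

def toy_folding_score(seq: str) -> float:
    return seq.count("G") / len(seq)

batch = ["AAGG", "AAAA", "GGGG", "ACGT"]

# The two current variations: optimize one property at a time.
only_antimicrobial = select_for_feedback(batch, [toy_antimicrobial_score])
only_folding = select_for_feedback(batch, [toy_folding_score])

# A merged version: a sequence must satisfy both criteria to be fed back.
both = select_for_feedback(batch, [toy_antimicrobial_score, toy_folding_score])
```

In this toy batch, only "AAGG" clears both bars, which shows why optimizing for two properties at once is a stricter test than optimizing for either alone.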
