
Using AI to find disease-causing genes

Researchers are using a new artificial intelligence-based program to help identify genes that underlie diseases.

A new artificial intelligence program is helping scientists quickly sift through thousands of data sets and millions of papers to home in on genes that underlie disease, drastically condensing a search process that once took months.

Using computer software, scientists can scan the entire genomes (an organism's full set of DNA) of mice that model human diseases. The goal, said Gary Peltz, MD, PhD, professor of anesthesiology, perioperative and pain medicine at Stanford Medicine, is to identify the genetic mutations that cause those diseases and to open new doors for harnessing genetics to develop disease treatments.

But to do that, scientists must search through massive genomic data sets, a process that yields more false positives than researchers care to admit. It's also time-intensive. Peltz wanted to make genetic discovery easier, faster and more accurate.

To this end, Peltz and postdoctoral fellow Zhuoqing Fang, PhD, created an automated program that processes DNA sequence data and can analyze more than 10,000 data sets (in this case, of mouse disease traits) at once. The program also combs through 29 million published papers and assesses possible links between genes and disease traits. It then narrows that information down to the genes that may be contributing to a given disease.

In a recently published paper in the journal Bioinformatics, the team used their automated pipeline to identify genes linked to diabetes and obesity, as well as to cataract formation, in mice.

I spoke with Peltz about what motivated his team to combine AI and gene discovery, how this new computer pipeline works and what this may mean for the future of medicine.

What did your lab do to improve the gene discovery process?

We began analyzing large data sets detailing disease phenotypes, or physical traits, from many different types of mice. We then performed genome-wide association studies, which identify correlations between genes and phenotypes that are specific for certain diseases.

But this requires analyzing huge amounts of incoming data, which creates a lot of work. You end up with a lot of genes that may just be randomly linked with disease susceptibility or resistance. However, that correlation doesn't mean that a gene is actually involved in causing or protecting against that disease. So, you have to sort through those correlations. I'd spend three months reading the literature on the different genes, trying to figure out which ones were most likely to be involved in a given disease.
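To make the "correlation" step concrete, here is a minimal sketch of a single-variant association scan, assuming each mouse strain carries one of two alleles per gene and has one measured trait value. It uses Welch's t-statistic as a generic stand-in; it is not the lab's actual GWAS method, and the gene names and numbers are invented for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic comparing mean phenotype between two allele groups."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # No spread within either group, so the noise level cannot be estimated.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def scan_genes(genotypes, phenotype):
    """Rank genes by the strength (|t|) of their allele-phenotype association.

    genotypes: dict mapping a gene name to a list of alleles (0 or 1), one per strain
    phenotype: list of trait values in the same strain order
    """
    results = {}
    for gene, alleles in genotypes.items():
        a = [p for g, p in zip(alleles, phenotype) if g == 0]
        b = [p for g, p in zip(alleles, phenotype) if g == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # need at least two strains per group to estimate variance
        results[gene] = welch_t(a, b)
    return sorted(results.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: six strains, two made-up genes.
    phenotype = [1.0, 1.2, 0.9, 3.1, 3.3, 2.9]
    genotypes = {"GeneA": [0, 0, 0, 1, 1, 1], "GeneB": [0, 1, 0, 1, 0, 1]}
    for gene, t in scan_genes(genotypes, phenotype):
        print(gene, round(t, 2))
```

Run across thousands of genes and traits, a scan like this flags many genes by chance alone, which is exactly why the correlations still have to be sorted through afterward.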

How does your new automated pipeline make this process more manageable?

Being a little bit lazy, I asked: "Couldn't we get the computer to help sort through these genetic correlations?" That's the basis for this pipeline.

Zhuoqing Fang, a co-author of the study, had a lot of experience with AI and computer programs. What he did was really nothing short of phenomenal. His work uses AI to assess the likelihood that a candidate gene is involved in disease development.

The AI program identifies genes correlated with disease traits in mice, such as those for diabetes and cataracts. Fang downloaded 29 million papers, and the AI program read all of them and determined whether a candidate gene was mentioned in a paper about a particular disease. In other words, the program looks for the co-occurrence of gene X and disease Y in a paper.
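The co-occurrence idea can be illustrated with plain keyword matching. This is a simplified sketch, assuming each paper is available as a text string; the pipeline described here uses an AI program rather than regular expressions, and real text mining would also need gene synonyms and disease vocabularies. The example sentences are made up.

```python
import re


def cooccurrence_count(papers, gene, disease):
    """Count papers whose text mentions both the gene symbol and the disease term."""
    # Word boundaries on both sides of the gene symbol, so "Nid1" does not match "Nid10".
    gene_pat = re.compile(rf"\b{re.escape(gene)}\b", re.IGNORECASE)
    # Leading boundary only for the disease term, so "cataract" also matches "cataracts".
    disease_pat = re.compile(rf"\b{re.escape(disease)}", re.IGNORECASE)
    return sum(1 for text in papers if gene_pat.search(text) and disease_pat.search(text))


if __name__ == "__main__":
    # Invented toy sentences, not real abstracts.
    papers = [
        "A Nid1 mutation is associated with cataract in cattle.",
        "Nid1 expression in basement membranes.",
        "Cataract risk factors in aging.",
    ]
    print(cooccurrence_count(papers, "Nid1", "cataract"))
```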

And since many human diseases result from interactions between proteins, we also used the AI program to analyze a database of protein-protein interactions. That gives us a better sense of whether a protein produced by a candidate gene is related to a disease.
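One simple way to use an interaction database is to ask whether a candidate's protein directly interacts with proteins already linked to the disease. The sketch below assumes the database is available as a list of interacting pairs; the protein names are hypothetical, and this neighbor check is only one illustrative option, not necessarily what the pipeline does.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected protein-protein interaction graph from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def disease_neighbors(graph, candidate, known_disease_proteins):
    """Return known disease-associated proteins that directly interact with the candidate."""
    # graph.get avoids inserting an empty entry for proteins absent from the database.
    return graph.get(candidate, set()) & set(known_disease_proteins)


if __name__ == "__main__":
    edges = [("P1", "P2"), ("P1", "P3"), ("P4", "P5")]  # hypothetical interactions
    graph = build_network(edges)
    print(disease_neighbors(graph, "P1", ["P2", "P5"]))
```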

The algorithm then compiles this data, which is used to assess the strength of the relationship between a candidate gene and a disease phenotype.
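How the evidence is compiled is not spelled out here, so the following is only a hypothetical sketch: a weighted sum of the genetic association strength, a log-scaled literature count and the number of disease-linked interaction partners. The `Evidence` fields, the weights and the log scaling are all illustrative assumptions.

```python
from dataclasses import dataclass
from math import log1p


@dataclass
class Evidence:
    gene: str
    association: float       # strength of the genetic association, e.g. |t|
    cooccurring_papers: int  # papers mentioning the gene and the disease together
    disease_neighbors: int   # interacting proteins already linked to the disease


def score(e, w_assoc=1.0, w_lit=1.0, w_ppi=0.5):
    """Hypothetical combined score; the weights are arbitrary placeholders."""
    # log1p dampens very large paper counts so one heavily studied gene doesn't dominate.
    return w_assoc * e.association + w_lit * log1p(e.cooccurring_papers) + w_ppi * e.disease_neighbors


def rank(evidence):
    """Order candidate genes from strongest to weakest combined evidence."""
    return sorted(evidence, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [Evidence("GeneA", 3.0, 12, 2), Evidence("GeneB", 3.5, 0, 0)]
    for e in rank(candidates):
        print(e.gene, round(score(e), 2))
```

In this toy example, GeneB has the slightly stronger genetic signal, but GeneA ranks higher because literature and interaction evidence also support it, which mirrors the idea of letting independent lines of evidence filter the candidates.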

Has the pipeline helped you identify genes that are associated with a given disease?

In one instance, we saw that some, but not all, mice had a very high incidence of a mouse version of cataracts. So we asked a simple question: "What's the difference between the mice that develop cataracts and those that do not?" The AI program identified a gene called Nid1, which is active in cells that sit right in a membrane located inside your eye. The program found this by locating a paper that described a type of mutation just like the one we saw in the mice; in that case, it caused cataracts in cattle.

It really lets the cream rise to the top and filters out a lot of junk.

What do you hope to do with this computational pipeline?

I hope it will enable us to make additional genetic discoveries, which can be the basis for new diagnostic tests and therapies that will improve patient care. Right now, we're using our AI program to analyze mouse genomes. However, this tool can also be used to analyze other types of data sets, such as human genetic data or lists of proteins, that could further reveal how genetic changes cause diseases.

Understanding the genetic architecture underlying disease susceptibility could enable doctors to diagnose diseases in their patients more precisely and to use a person's genetic information to identify the best treatment or develop a disease prevention plan. It could also help guide new drug development. If AI can drive your car, why can't we use AI to make genetic discoveries?

