When Shayla Haddock’s doctors tested her for a rare genetic disease in 2012, they couldn’t pinpoint a diagnosis. Her lifelong symptoms — which include club feet, short stature, unusual facial features and congenital deafness — led her doctors to suspect a disease-causing gene mutation. But for children like Shayla, finding the culprit among 3 billion base pairs of DNA can be very difficult. Each case takes 20 to 40 hours of analysis by a trained geneticist after gene sequencing has been done, and around 75 percent of patients don’t get a diagnosis on the first try.
As I described in a recent story, Shayla’s case was eventually solved by a team of Stanford computer scientists who devised an automated way to compare patients’ symptoms and mutated genes to information in existing databases of genetic diseases. In early 2016, Shayla was found to have a disease reported in the medical literature a few years earlier, only two weeks after her doctors initially told her family they couldn’t find an answer.
Now, the same Stanford team, led by computer scientist and genomicist Gill Bejerano, PhD, has taken another step toward ending diagnostic “near misses”: They’ve developed a more granular tool that automatically evaluates single-letter mistakes in the genetic code. The new tool, which is called M-CAP and is described in a paper published today in Nature Genetics, uses a machine-learning algorithm to classify genetic variants according to whether they are likely to cause disease. It’s freely available online for non-commercial purposes to geneticists around the world.
"If you take a pool of all the nastiest mutations in our genome, tens of thousands of changes implicated in causing severe early childhood disease, and compare them to all the variants in healthy people’s genomes, they look very different," Bejerano told me.
The basic problem, he explained, is that everyone has about 10,000 tiny blips or variants in the protein-making parts of their genetic codes where one base pair in the DNA differs from the typical human genetic sequence. Nearly all of these blips are harmless. But in kids who have had inexplicable symptoms from birth, there is a reasonable chance that one or two of these small genetic changes explains their illness.
In their evaluations, geneticists try to zero in on the one or two changes that are most likely to cause disease. For instance, they ignore genetic variants that are common in the general population, since we expect rare diseases to be caused by rare mutations. So far, they’ve been winnowing the list of changes that must be evaluated by hand down to around 300 per patient. M-CAP chops the list further, to about 120 variants, and Bejerano’s team expects it will become even more specific as genetic-disease research advances.
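For readers who think in code, the winnowing described above can be pictured as two filters applied in sequence: drop variants that are common in the general population, then keep only those a classifier scores as likely to be harmful. The sketch below is purely illustrative and is not M-CAP itself. The `Variant` fields, the variant names, and both cutoffs (`max_freq`, `min_score`) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str               # hypothetical label for the variant
    population_freq: float  # fraction of the general population carrying it
    pathogenicity: float    # hypothetical classifier score, 0 (benign) to 1 (harmful)


def shortlist(variants, max_freq=0.01, min_score=0.5):
    """Return rare, high-scoring variants, most suspicious first.

    Both thresholds are illustrative assumptions, not published cutoffs.
    """
    # Step 1: rare diseases are expected to come from rare mutations,
    # so variants that are common in the population are ignored.
    rare = [v for v in variants if v.population_freq <= max_freq]
    # Step 2: keep only the variants the classifier considers likely harmful.
    flagged = [v for v in rare if v.pathogenicity >= min_score]
    return sorted(flagged, key=lambda v: v.pathogenicity, reverse=True)


# Made-up example data for one patient.
variants = [
    Variant("variant_A", 0.20, 0.90),    # common in the population: dropped
    Variant("variant_B", 0.001, 0.10),   # rare but scored as benign: dropped
    Variant("variant_C", 0.0005, 0.95),  # rare and high-scoring: kept
    Variant("variant_D", 0.002, 0.60),   # rare and high-scoring: kept
]

for v in shortlist(variants):
    print(v.name, v.pathogenicity)
```

The shorter the list that survives both filters, the fewer variants a geneticist has to review by hand.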
Importantly, M-CAP is much more accurate than older methods for automatically sorting genetic variants. Such methods mislabel one-quarter to one-third of disease-causing mutations as harmless. M-CAP makes this error only 5 percent of the time.
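The error rate quoted here is what statisticians call a false-negative rate: out of all the variants known to cause disease, the share a tool wrongly calls harmless. A minimal sketch of that arithmetic follows. The label strings and the toy data are invented for illustration, with counts chosen only to mirror the one-quarter and 5 percent figures above; they are not data from the study.

```python
def false_negative_rate(true_labels, predicted_labels):
    """Fraction of truly disease-causing variants that were called benign."""
    # Look only at the predictions made for variants known to be pathogenic.
    calls_on_pathogenic = [
        predicted
        for truth, predicted in zip(true_labels, predicted_labels)
        if truth == "pathogenic"
    ]
    if not calls_on_pathogenic:
        raise ValueError("no pathogenic variants to evaluate")
    missed = sum(1 for call in calls_on_pathogenic if call == "benign")
    return missed / len(calls_on_pathogenic)


# Toy data: 20 known disease-causing variants and 5 known harmless ones.
truth = ["pathogenic"] * 20 + ["benign"] * 5

# A hypothetical older tool that misses 5 of the 20 pathogenic variants.
older_tool = ["benign"] * 5 + ["pathogenic"] * 15 + ["benign"] * 5
# A hypothetical newer tool that misses only 1 of the 20.
newer_tool = ["benign"] * 1 + ["pathogenic"] * 19 + ["benign"] * 5

print(false_negative_rate(truth, older_tool))
print(false_negative_rate(truth, newer_tool))
```

In this made-up example the older tool misses a quarter of the disease-causing variants, while the newer one misses one in twenty.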
"Our challenge was to try to make the shortest list we could of all the variants that look particularly nasty, not just rare and potentially functional," Bejerano said. "But even more important is making sure that we don't tell people that disease-causing mutations are benign."
Stay tuned for more about the challenges of diagnosing rare genetic diseases in my upcoming feature story in Stanford Medicine magazine. It'll be out in mid-November.
Previously: Automating genetic analysis could speed diagnosis of rare diseases, Individuals' medical histories predicted by non-coding DNA in Stanford study and Crying without tears unlocks the mystery of a new genetic disease
Photo by Petra B. Fritz