As whole-genome sequencing gains ground, researchers and clinicians are struggling with how best to interpret the results to improve patient care. After all, three billion base pairs are a lot to sift through, even with powerful computers. Now genomicist Gill Bejerano, PhD, and research associate Harendra Guturu, PhD, have published a study in PLoS Computational Biology showing that computer algorithms and tools previously developed in the Bejerano lab (including GREAT, which I've written about here before) can help researchers home in on important regulatory regions and predict which are likely to contribute to disease.
When they tried their technique on five people who agreed to publicly share their genome sequences and medical histories, they found it to be surprisingly prescient. From our release:
Using this approach to study the genomes of the five individuals, Guturu, Bejerano and their colleagues found that one of the individuals who had a family history of sudden cardiac death had a surprising accumulation of variants associated with "abnormal cardiac output"; another with hypertension had variants likely to affect genes involved in circulating sodium levels; and another with narcolepsy had variants affecting parasympathetic nervous system development. In all five cases, GREAT reported results that jibed with what was known about that individual's self-reported medical history, and that were rarely seen in the more than 1,000 other genomes used as controls.
Bejerano and Guturu focused on a subset of regulatory regions that control gene expression. As I explained:
The researchers focused their analyses on a relatively small proportion of each person's genome -- the sequences of regulatory regions that have been faithfully conserved among many species over millions of years of evolution. Proteins called transcription factors bind to regulatory regions to control when, where and how genes are expressed. Some regulatory regions have evolved to generate species-specific differences -- for example, mutating in a way that changes the expression of a gene involved in foot anatomy in humans -- while other regions have stayed mostly the same for millennia. [...]
All of us have some natural variation in our genome, accumulated through botched DNA replication, chemical mutation and simple errors that arise when each cell tries to successfully copy 3 billion nucleotides prior to each cell division. When these errors occur in our sperm or egg cells, they are passed to our children and perhaps grandchildren. These variations, called polymorphisms, are usually, but not always, harmless.
The researchers used a software tool called PRISM to predict where in the regulatory regions transcription factors would bind, and whether an individual carried a polymorphism in that site that is likely to disrupt this binding. They then used GREAT to determine the most common set or sets of biological pathways controlled by nearby genes, and cross-referenced this information with what was known about that person's medical history.
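For readers who like to see the logic spelled out, here is a minimal Python sketch of the two steps described above. It is not the PRISM or GREAT code, and it makes several assumptions for illustration: the toy position weight matrix `TOY_PWM`, the score threshold, and the function names are all hypothetical, and the enrichment step uses a plain gene-based hypergeometric test as a stand-in rather than whatever statistics the actual tools apply.

```python
import math

# Hypothetical log-odds position weight matrix for a 4-bp binding motif.
# The numbers are invented for illustration only.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -1.2, "T": -0.8},
    {"A": -0.9, "C": -1.1, "G": 1.4, "T": -1.0},
    {"A": -0.7, "C": -1.0, "G": -1.1, "T": 1.3},
]


def pwm_score(site, pwm):
    """Sum the per-position scores of a candidate binding site."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(column[base] for column, base in zip(pwm, site))


def disrupts_binding(ref_site, alt_site, pwm, threshold):
    """True if the reference allele scores as a binding site
    but the individual's variant allele falls below the threshold."""
    return (pwm_score(ref_site, pwm) >= threshold
            and pwm_score(alt_site, pwm) < threshold)


def enrichment_p_value(hits, sample_size, annotated, total):
    """Hypergeometric upper tail: the chance of seeing at least `hits`
    annotated genes when drawing `sample_size` genes at random from
    `total` genes, of which `annotated` belong to the pathway."""
    upper = min(sample_size, annotated)
    tail = sum(
        math.comb(annotated, i) * math.comb(total - annotated, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / math.comb(total, sample_size)


if __name__ == "__main__":
    # Step 1: does a single-base change weaken a predicted site?
    print(disrupts_binding("ACGT", "ATGT", TOY_PWM, threshold=4.0))
    # Step 2: are genes near disrupted sites enriched for one pathway?
    # (Made-up counts: 5 of 20 affected genes are in a 100-gene pathway
    # out of 20,000 genes total.)
    print(enrichment_p_value(hits=5, sample_size=20, annotated=100, total=20000))
```

A small p-value in the second step would suggest that the genes near an individual's disrupted binding sites cluster in one pathway more than chance would predict, which is the kind of signal that can then be compared with a medical history.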
The research is a fascinating example of flipping the traditional approach to genetic disease on its head. Rather than taking a patient with a medical condition and looking to see if he or she has a mutation known to cause that disease, the researchers looked at the genome first and predicted the clinical outcome for the five participants. As Bejerano, who is a member of Stanford's Artificial Intelligence Lab, Child Health Research Institute, Neurosciences Institute, Cancer Institute and Bio-X, explained, "The beauty of having whole genomes available for study is that you can then ask completely agnostic questions. We set out to find hidden layers of susceptibility in the regulatory regions of these genomes. We were very pleased that our analysis gave such clear and significant associations between the mutations and medical histories."