Strength in numbers: Harnessing public gene data to answer a diverse range of research questions

Nature News today takes a closer look at how Stanford systems-medicine chief Atul Butte, MD, PhD, and colleagues are mining a mountain of data to make medical discoveries, including new uses for old drugs and insights into the genetics of diabetes.

The article details how researchers are repurposing datasets and outlines the potential benefits of doing so:

Butte and his team are now using publicly available data to answer a diverse range of questions — [Purvesh Khatri, a post-doctoral research fellow in Butte's lab], for instance, hopes to discover secrets behind kidney-transplant rejection. “We don’t do wet lab experiments for discovery,” he says. Those are for validating hypotheses. The beauty of analysing data from multiple experiments is that biases and artefacts should cancel out between data sets, helping true relationships to stand out, Butte says. “There is safety in numbers.”

And those numbers are rising rapidly. Since 2002, many scientific journals have required that data from gene-expression studies be deposited in public databases such as GEO, which is maintained by the National Center for Biotechnology Information in Bethesda, Maryland, and ArrayExpress, a large gene-expression repository at the European Bioinformatics Institute (EBI) in Hinxton, UK. Some time in the next few weeks, the number of deposited data sets will top one million (see ‘Data dump’).

The result is an unprecedented resource that promises to drive down costs and speed up progress in understanding disease. Gene-sequence data are already shared extensively, but expression data are more complex and can reveal which genes are the most active in, say, liver versus brain cells, or in diseased versus healthy tissue. And because studies often look at many genes, researchers can repurpose the data sets, asking questions other than those posed by the original researchers.

Read more about Butte's work and the challenge of mastering the growing biomedical “big data” deluge in the latest issue of Stanford Medicine.

Previously: Mining medical discoveries from a mountain of ones and zeroes, The data deluge: A report from Stanford Medicine magazine, Stanford’s Atul Butte discusses outsourcing research online at TEDMED and Health-care experts discuss opportunities and challenges of mining ‘big data’ in health care
Photo by Susan Moberly, Wellcome Images
