Digging into diversity to understand diabetes

Studying the human genome -- the code that determines how the body is put together and operates -- has helped scientists decipher the root of many diseases. Even so, there are still holes (some might say gaping ones) in our knowledge of genetic disease.

That's particularly true when it comes to the causes and risk factors that lead to genetically complex diseases, such as Type 2 diabetes. Researchers call these complex polygenic diseases because they arise from hundreds of small changes to the genome combined with a person's environment and lifestyle.

In the sea of every individual's unique genome, finding the precise variants that give rise to such diseases can be, simply put, a massive challenge.

One of the most powerful ways to mine this trove of data is a genome-wide association study (GWAS), in which researchers measure genetic variation at millions of locations in the genome, then study how these variations differ between people with and without a given disease. Not all variants change one's risk for a disease, and even if a variant does confer risk, its impact is often quite small. That's why large sample sizes are required to robustly identify variations that are linked to a health risk.
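To make the sample-size point concrete, here is a minimal sketch of a single-variant comparison between people with and without a disease. It is only an illustration, not the method used in the study: the allele frequencies are made up, the `counts` helper is hypothetical, and a plain chi-square test on allele counts stands in for the more elaborate statistical models real GWAS pipelines use. The 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import math

def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Compare allele counts between cases and controls at one variant.

    Returns (odds_ratio, chi_square, p_value) from a 2x2 chi-square test
    with one degree of freedom and no continuity correction.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

def counts(n_people, alt_freq):
    """Hypothetical helper: (alt, ref) allele counts for n_people, two alleles each."""
    alleles = 2 * n_people
    alt = round(alleles * alt_freq)
    return alt, alleles - alt

# Made-up frequencies: the variant is carried on 32% of alleles in cases
# and 30% of alleles in controls, a small effect.
small = allele_association(*counts(500, 0.32), *counts(500, 0.30))
large = allele_association(*counts(50_000, 0.32), *counts(50_000, 0.30))

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, assumed here

for label, (odds_ratio, chi_square, p_value) in (("small", small), ("large", large)):
    significant = p_value < GENOME_WIDE_THRESHOLD
    print(f"{label}: odds ratio={odds_ratio:.3f}, p={p_value:.2e}, significant={significant}")
```

In this toy setup the odds ratio is identical in both runs; only the number of people changes, and only the larger sample pushes the p-value below the threshold.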

Most of the studies conducted to date have used existing genetic data, which has largely been collected from European countries and mostly reflects people of European ancestry. This lack of diversity raises the question: Are these studies' findings applicable to everyone?

In a study published May 12 in Nature Genetics, Stanford Medicine diabetes researcher Anna Gloyn, DPhil, working with a large group of international collaborators, analyzed data from 181,000 people with Type 2 diabetes and more than 1 million people with healthy blood sugar levels. Their study is the largest and most diverse of its kind, including patients from South Asian, Southeast Asian, Black and Latino ancestry groups. The study linked more than 117 genes to an increased risk of Type 2 diabetes -- 40 of which had not been identified in earlier studies.

Gloyn's research focuses on how these diabetes-associated DNA changes influence how the insulin-producing cells of the pancreas work and alter a person's risk of developing diabetes. I spoke with Gloyn to learn more about the study and why diverse datasets matter in research.

What new insights into Type 2 diabetes are gained from having a genetic database that includes a more diverse population of patients?

Most human genome-wide genetic studies over the past 20 years have focused on European populations, and that focus has a major limitation: It teaches us primarily about how diabetes develops in Europeans. Of course, diabetes affects everybody around the world and, although there are similarities in risk between populations, there are also important differences. To understand diabetes and treat it equitably, we need to use diverse populations in our studies.

There's also a scientific advantage to broadening the data pool. GWAS results are complicated because they tell only part of the genetic story. The data can point us toward a particular region of the genome, but they can't always pinpoint an individual genetic variant that changes risk for diabetes. Including diverse populations with different genome structures can help us reduce the number of contenders for the "causal variant," the genetic variation that actually confers increased risk.

Data from European populations can tell us that a variant of interest is one of 100 variants residing in a particular "neighborhood" of the genome. Diverse populations have different neighborhood borders, so comparing the regions where variants overlap can reduce the search space to a more granular level, leaving us with perhaps 10 variants, or in some cases a single variant, of interest to study further.
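The neighborhood comparison described above can be pictured as a set intersection. The sketch below is a deliberate simplification with made-up variant labels and hypothetical group names; actual fine-mapping relies on statistical methods that weigh the evidence for each variant rather than on a strict overlap.

```python
def shared_candidates(neighborhoods):
    """Return the variants that appear in every group's candidate neighborhood."""
    candidate_sets = [set(variants) for variants in neighborhoods.values()]
    return set.intersection(*candidate_sets)

# Made-up variant labels; each group's neighborhood has different borders.
neighborhoods = {
    "group_a": {f"var{i}" for i in range(1, 101)},    # var1..var100, 100 candidates
    "group_b": {f"var{i}" for i in range(41, 131)},   # var41..var130
    "group_c": {f"var{i}" for i in range(1, 51)}      # var1..var50
               | {f"var{i}" for i in range(200, 221)},  # plus var200..var220
}

remaining = shared_candidates(neighborhoods)
print(len(neighborhoods["group_a"]), "candidates from one group alone")
print(len(remaining), "candidates shared by all groups:", sorted(remaining, key=lambda v: int(v[3:])))
```

Here one group on its own leaves 100 contenders, while the overlap across all three leaves 10 (var41 through var50).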

How do we figure out which variants in the region are truly causative?

Even if we can reduce the number of potentially causal variants from 100 variants down to 10, we still have to identify which of these 10 is the culprit.

If one of the 10 variants is in a gene that codes for a protein, that's a sign that it could be important. But most of the time, the variants are not so easy to interpret. In diabetes, we know that there is a suite of tissues in our body that are particularly important for controlling our blood sugar levels, including pancreatic islet cells as well as fat, liver and muscle tissues. If we find a variant sitting in a region of DNA that is important for controlling protein production in one of these tissues, that's a hint that it's much more likely to be causal.

We do that kind of study a lot -- it's called a functional GWAS. It's a great way of prioritizing a variant. Of course, it doesn't prove causation. For that, we need to run experiments in our lab in diabetes-relevant cells to determine how these genetic changes impact the actual functioning of glucose metabolism in tissues.

Can we use these variants to better screen for diabetes?

Although genetic risk scores are not used routinely in clinical practice, they are used in research and commercial settings. For example, companies like 23andMe perform in-house genetic prediction for Type 2 diabetes using their own genetic risk scores, which are informed by their own data and by publicly available data.

Personally, I know that my genetic risk is above average and, as I get older, I will be even more careful about managing my weight and lifestyle. Identifying people who are at greatest risk for developing Type 2 diabetes could allow us to focus often-limited health care resources on these individuals to help prevent its development.

What's next for this research?

First, researchers can use this data to develop tests for predicting genetic risk of Type 2 diabetes that can be deployed across more diverse populations, not just among Europeans.

Second, the variants found in this study are a rich resource for laboratory researchers, like myself, trying to drill down on mechanisms for what happens in the body and ultimately leads to diabetes. Each of the diabetes-associated variants is a clue that may reveal pathways important to the underlying biology of diabetes. The idea is that researchers in both academia and pharmaceutical companies can then use that information to identify safe and effective targets for therapeutic development.

Photo by Jakub Krechowicz