
Stanford researchers use data mining to show safety of peripheral artery disease treatment

Every day, doctors across the country take down reams of information about their patients. Those notes are a treasure trove of information about preventive treatments, side effects of drugs and drug interactions. But most of this information has been hard to access - until now.

In the past, researchers who really wanted the data have combed manually through individual records, which can run to hundreds of pages. When I was an epidemiologist at the California Department of Public Health, for example, I worked on a large study that looked at prenatal hospital records for infectious disease lab testing - and our team of a dozen researchers took months to go through a few hundred records. It’s not an efficient approach for most diseases medical researchers might want to study. Recently launched national efforts also recognize this limitation and call for turning information from regular doctor visits into data points that can show what is best for patients.

So when I learned that Nigam Shah, MBBS, PhD, a biomedical informatics professor at Stanford, has been looking at ways to pull out information electronically from clinical notes, my ears perked up.

In a paper published today in the journal PLOS ONE, Shah and two co-authors used a new methodology to answer a nagging question about the safety of Cilostazol. The co-authors are Nicholas Leeper, MD, a Stanford cardiologist and vascular medicine specialist, and Anna Bauer-Mehren, PhD, an informaticist who recently moved from Stanford to Roche Germany. Cilostazol is the only drug with the American Heart Association’s highest effectiveness rating - Class 1A - for treating the symptoms of peripheral arterial disease, a condition that affects millions of Americans. Regulators have feared that the drug might have side effects on the cardiovascular system that could lead to death, so it has historically carried a “black-box warning.” As a result, its use has been limited.

Looking at a specialized system that includes health-research data from millions of patients seen at Stanford Hospital over 18 years, the researchers found no evidence that patients with peripheral arterial disease who took Cilostazol suffered the side effects that worried doctors, compared with patients who didn't receive the drug. By querying these records, the researchers also identified a subset of patients they felt were at highest risk - a group that is often excluded from company-sponsored trials - and found no evidence of the side effects in that group either.

In April, Shah published a related paper describing his method. As he said then, the challenging part is developing an accurate list of key terms related to the condition researchers want to study, and then using that keyword list to fish the relevant patients from the thousands that come to the hospital:

“If you ask any audience related to health care how much of the clinical knowledge is bundled up in text, you won’t get an answer below 70 percent,” said Shah. “If 70 to 80 percent of the data is locked up in text notes, we asked ourselves, ‘What would be a good way to unlock it?’”

Today’s paper is one example of how data “unlocked” from previous patients’ records can aid doctors making decisions for future patients. Shah’s method simplifies that process immensely and makes it possible to ask many more research questions.

Shah and Leeper both say that further research about Cilostazol’s safety is needed, but this study is a good building block because it suggests further clinical studies are likely to be safe for the patients who participate. Finding that out any other way would have been expensive and labor-intensive.

The paper has been published, fittingly, on the second day of the Big Data in Biomedicine conference on the Stanford campus, in which Shah is participating. He says he’s eager to see how such methods can be used to derive practice-based evidence from large clinical data warehouses - a key aspect of the vision for a national learning health system.

Rina Shaikh-Lesko is a science-writing intern for the medical school's Office of Communication & Public Affairs. She is a student in the Science Communication Program at the University of California-Santa Cruz.

Previously: A call to use the “tsunami of biomedical data” to preserve life and enhance health, Atul Butte discusses why big data is a big deal in biomedicine, Mining data from patients’ charts to identify harmful drug reactions, Thousands of previously unknown drug side effects and interactions identified by Stanford study and Unexpected drug interactions identified by Stanford data mining
