Health-care providers know that a wealth of valuable information is trapped in the free-text notes on patients' charts. But collecting and interpreting that data on a large scale remains an unsolved challenge. Now researchers at Stanford have taken a step forward in mining patient-based information: using existing language-analysis methods, they identified drug side effects ahead of official alerts from the Food and Drug Administration.
My colleague writes in a release:
Although their application is new, their information-gathering methods are based on well-established text-processing techniques. The approach is also simpler and faster than current strategies used in the same arena, said [engineering research associate Paea LePendu, PhD, the lead author of the paper]. Content is first grouped via “ontologies,” which are information graphs organized by associative relationships instead of a rigid linear structure. For example, melanoma is a kind of skin cancer, and so is Kaposi’s sarcoma; because the ontology records that “skin cancer” encompasses both, a search for skin cancer also picks up notes that mention either one. The system also de-identifies patient information along the way, so sensitive data, such as names and addresses, isn’t revealed. Using these methods, LePendu said, the team can process 11 million clinical notes in about seven hours on hardware no more powerful than a laptop computer, a pace that other programs can’t match.
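To make the ontology idea concrete, here is a minimal Python sketch of expanding a concept through “is-a” links before searching notes. The tiny ontology, the function names `expand_concept` and `note_mentions`, and the plain substring matching are all hypothetical illustrations, not the Stanford team’s actual code or term sets.

```python
from collections import deque

# Hypothetical toy ontology: each parent concept maps to its direct "is-a" children.
IS_A_CHILDREN = {
    "skin cancer": ["melanoma", "kaposi's sarcoma"],
    "melanoma": ["acral lentiginous melanoma"],
}


def expand_concept(concept: str) -> set[str]:
    """Return the concept plus every descendant reachable via is-a links (breadth-first)."""
    found = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in IS_A_CHILDREN.get(current, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def note_mentions(note: str, concept: str) -> bool:
    """True if the note mentions the concept or any descendant (naive, case-insensitive substring match)."""
    text = note.lower()
    return any(term in text for term in expand_concept(concept))


# Made-up example notes.
notes = [
    "Pt with history of Kaposi's sarcoma, stable.",
    "Biopsy consistent with melanoma.",
    "Seasonal allergies, no other complaints.",
]
print([note_mentions(n, "skin cancer") for n in notes])
# → [True, True, False]
```

Neither matching note contains the words “skin cancer”; both are found only because the ontology links their terms to that parent concept.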
The information is also current: It’s generated from what is observed and recorded in the hospital or doctor’s office. That’s an advantage over the FDA’s Adverse Event Reporting System (AERS), which relies on patients and health providers to make the extra effort of reporting adverse events.
The researchers developed the computerized method to sift through the contents of clinical notes in electronic medical records and used it to examine how often specific drugs and diseases were mentioned in roughly 10 million notes for about 1.8 million patients over 15 years. The goal was to organize these notes into a data-mining substrate they refer to as a patient-feature matrix. “Everyone is excited about the prospect of ‘big data’ mining on electronic health record data,” Shah said. “We demonstrate it in practice.”
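A patient-feature matrix can be pictured as one row per patient and one column per term, with a cell set when that patient’s notes mention the term. The Python sketch below builds such a matrix and counts patients whose notes mention both a drug and a possible adverse event. The vocabulary, patient IDs and notes are made up, and the helper names are hypothetical; this is an illustration of the data structure, not the researchers’ pipeline.

```python
# Hypothetical term vocabulary (columns of the matrix).
FEATURES = ["rofecoxib", "myocardial infarction", "metformin"]

# Made-up, already de-identified notes keyed by a hypothetical patient ID.
notes_by_patient = {
    "p1": ["Started rofecoxib for knee pain.", "Admitted with myocardial infarction."],
    "p2": ["Continues metformin; glucose controlled."],
    "p3": ["Rofecoxib discontinued.", "No chest pain."],
}


def build_patient_feature_matrix(notes, features):
    """Return (patient_ids, matrix) where matrix[i][j] is 1 if patient i's notes mention features[j]."""
    patient_ids = sorted(notes)
    matrix = []
    for pid in patient_ids:
        text = " ".join(notes[pid]).lower()  # pool all of a patient's notes
        matrix.append([1 if term in text else 0 for term in features])
    return patient_ids, matrix


def co_mention_count(matrix, features, drug, event):
    """Count patients (rows) whose notes mention both the drug and the event."""
    d, e = features.index(drug), features.index(event)
    return sum(1 for row in matrix if row[d] and row[e])


patients, matrix = build_patient_feature_matrix(notes_by_patient, FEATURES)
print(patients, matrix)
print(co_mention_count(matrix, FEATURES, "rofecoxib", "myocardial infarction"))
```

For simplicity the sketch ignores negation (“No chest pain”) and the order in which the drug and the event appear; a real pharmacovigilance analysis would likely need to handle both before comparing co-mention counts against a baseline.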