Perhaps you're familiar (or very familiar) with PubMed, the go-to database for biomedical research. Or, perhaps you've spent a rainy Sunday exploring historical anatomical images. What you might not know is that they're both administered by the National Library of Medicine, an institute within the National Institutes of Health that traces its roots to a collection of textbooks in the library of the Army Surgeon General in 1836.
Now it's the largest biomedical library in the world, whose influence extends throughout biomedicine. I spoke recently with its director, Patricia Flatley Brennan, PhD, about the library, trends in big data and more. She is scheduled to speak at the Big Data in Biomedicine Conference, which will be held on campus May 24 and 25.
Can you tell me about the library's work with big data?
First of all, we collect, catalog and disseminate information, such as through PubMed. We have a number of other kinds of databases, including databases of genetic sequences and phenotypes, of medications and of medical images, among others.
We're extending that into data science, but the library isn't going to be the big data repository in the sky. Most of the data is too big to move around. The library will become as much a pointer to other collections as to its own collections. We also provide best practices and resources to help a university, for example, build a data repository and make that accessible to the public.
One of our roles is to ensure there is a common terminology to label health data and that terminology is not a barrier to interoperability. We have units that maintain vocabularies so that, for example, one test for blood glucose can be compared with another test for blood glucose.
The idea of the NIH being the nexus of health data really requires that we think about how to do that in a sustainable fashion.
What do you mean by sustainability?
Sustainability has a lot to do with findability. One aspect is making sure the data is in an interoperable, reusable, labeled, clean and computable form. Sustainability also encompasses economics. It's expensive to make data available and secure. Some data needs to be available instantaneously, such as information on recently reported cases of salmonella. But not all data needs to be available. How fast will one need the data and for how long? Another aspect is considering what to do about data enrichment: by marrying databases or adding new information, researchers create two versions of the database, which requires a whole different catalog.
What are you planning to discuss at the meeting?
I'm going to talk, in part, about personal data libraries. We have to find a way to explain it that helps people see their own value and role in it, to connect the clinicians and researchers with patients or people. People are experts in everyday living... The experience of each person and their health, their personal and very intimate experience, goes into a personal data library. A personal data library includes information about me that is key to my health, such as from a wearable device, but it also includes knowing more about the world that could pertain to me. For example, what if I experience headaches every Tuesday at 4 p.m. and that's the day the local plant lets out its exudate? That information could be useful in my personal health library or diary.
Can I collect that information so that I might gain insights into my health, and then store it in a safe way? Should you entrust it to an individual clinician or to an institution? When you are managing the information for yourself, you are in charge. We have to open up the scientific community's mind to thinking about data in a very different way.