
Stanford librarians and volunteers help safeguard government data

If you want to make a history buff sigh, mention the Library of Alexandria, a center of learning that may have held hundreds of thousands of ancient scrolls, or Turkey’s Library of Celsus, pictured here. Like all the great libraries of antiquity, they were eventually destroyed.

Modern digital information — stored on servers or in physical libraries — is just as vulnerable to destruction. But there are several efforts to make data more secure by backing it up, including a project supported by volunteers and librarians at Stanford, according to James Jacobs, a U.S. government information librarian at Stanford.

One project, called the End of Term (EOT) Presidential Harvest, was created in 2008 by the Library of Congress, the Internet Archive (which runs the Wayback Machine) and other archiving partners. At the end of every presidential term, the team of data specialists, professional librarians and volunteers searches and indexes all government information available, mainly at .gov or .mil websites. The 2008 and 2012 EOT efforts indexed and copied millions of webpages of government information.

The effort has intensified this year due to uncertainty about the Trump administration's plans. Just days into the new administration, substantive changes have been made to the Department of Agriculture and Environmental Protection Agency websites. And any potential disappearance of health data from the Centers for Disease Control and Prevention, the U.S. Census Bureau or the National Institutes of Health, for example, could affect researchers at the School of Medicine and elsewhere.

Nigam Shah, MBBS, PhD, an associate professor of medicine who specializes in analyzing biomedical data, said he knows many Stanford researchers whose work depends on government health data, including the National Inpatient Sample dataset and the National Health and Nutrition Examination Survey.

But downloading databases can be hit or miss, Jacobs said. “The internet is a messy place.” For example, the data in a database might be hidden behind search or GIS interfaces. And if those databases are taken offline, for all practical purposes, they cease to exist, Jacobs said.

For this year's effort, 300 volunteers nationally (up from 30 in years past) have jumped in to help since mid-October, Jacobs said. The EOT team is planning to work through the end of March, so if you have a favorite database, you can still nominate it to be archived.

There are also other efforts to save government data, such as Climate Mirror, Data Refuge and the Azimuth Climate Data Backup Project.

Previously: Stanford Medicine conference provided a big look at big data and Countdown to Big Data in Biomedicine: Building bridges for massive amounts of information
Photo by Benh Lieu Song
