The use of huge amounts of data for research hinges on a key premise: that the privacy of the individuals involved will be protected. But, as a recent Stanford News article reminds us, extracting personal information from certain types of data might not be that difficult.
A team of computer scientists here showed that it's possible to discover personal details from telephone metadata, the records of numbers called and call lengths that some law enforcement agencies gather without warrants. The article explains:
The computer scientists built a smartphone application that retrieved the previous call and text message metadata – the numbers, times and lengths of communications – from more than 800 volunteers’ smartphone logs. In total, participants provided records of more than 250,000 calls and 1.2 million texts. The researchers then used a combination of inexpensive automated and manual processes to illustrate both the extent of the reach – how many people would be involved in a scan of a single person – and the level of sensitive information that can be gleaned about each user.
From a small selection of the users, the Stanford researchers were able to infer, for instance, that a person who placed several calls to a cardiologist, a local drugstore and a cardiac arrhythmia monitoring device hotline likely suffers from cardiac arrhythmia. Another study participant likely owns an AR semiautomatic rifle, based on frequent calls to a local firearms dealer that prominently advertises AR semiautomatic rifles and to the customer support hotline of a major firearm manufacturer that produces these rifles.
Many would consider these details private, although they don't involve names, ages or other traditional markers of identity.
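To make that kind of inference concrete, here is a minimal Python sketch of how call metadata alone could point to a sensitive detail. It is not the researchers' method, which combined automated and manual processes. The phone numbers, the `DIRECTORY` lookup table, the call log and the matching rule are all invented for illustration.

```python
from collections import Counter

# Hypothetical directory mapping phone numbers to business categories.
# Numbers and categories are invented for illustration only.
DIRECTORY = {
    "555-0101": "cardiologist",
    "555-0102": "pharmacy",
    "555-0103": "cardiac device hotline",
    "555-0104": "pizza delivery",
}

# Hypothetical call log for one user: (number called, duration in seconds).
# No names or call contents appear here, only metadata.
CALL_LOG = [
    ("555-0101", 310),
    ("555-0101", 120),
    ("555-0102", 45),
    ("555-0103", 600),
    ("555-0104", 30),
    ("555-0199", 75),  # number not listed in the directory
]


def category_counts(call_log, directory):
    """Count calls per business category, ignoring unlisted numbers."""
    return Counter(directory[num] for num, _ in call_log if num in directory)


def suggests_cardiac_condition(counts):
    """Toy rule (an assumption): calls to both a cardiologist and a
    cardiac device hotline hint at a heart condition."""
    return counts["cardiologist"] > 0 and counts["cardiac device hotline"] > 0


counts = category_counts(CALL_LOG, DIRECTORY)
print(counts.most_common())
print(suggests_cardiac_condition(counts))
```

Even this toy rule flags the sample user, although the log holds no names, ages or call contents. That is the core concern: a public business listing is enough to turn bare numbers into a personal profile.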
Study co-author Patrick Mutchler, a Stanford computer science graduate student, reflects on the findings, which appeared in the Proceedings of the National Academy of Sciences:
I was somewhat surprised by how successfully we inferred sensitive details about individuals. It feels intuitive that the businesses you call say something about yourself. But when you look at how effectively we were able to identify that a person likely had a medical condition, which we consider intensely private, that was interesting.
There's bound to be much conversation about data and privacy issues here later this week: The Big Data in Biomedicine Conference is happening on the Stanford Medicine campus this Wednesday and Thursday.
Previously: Countdown to Big Data in Biomedicine: Building bridges for massive amounts of information; Countdown to Big Data in Biomedicine: Genomic data sharing is key, says UCSC's David Haussler; and Locking the door on big-data risks to privacy
Photo by geralt