About 6,000 tweets are sent every second, and they aren’t all about celebrities. Posts about health or illness can be tremendously valuable to health care professionals, allowing them to track trends, spot epidemics and assess the quality of services provided by health facilities, to name just a few uses.
But how can researchers make sense of this flood of data? To find out, I spoke with Sidhartha Sinha, MD, an assistant professor of medicine at Stanford, who analyzes social media posts to better understand patient and societal perceptions.
What sparked your interest in online data?
While there are certainly downsides to working with unstructured data from sources such as social media and online patient forums, there are also tremendous advantages, including the scope of patients we can reach.
For example, in our current work analyzing data from an online forum for patients with inflammatory bowel disease (IBD), we are able to access tens of thousands of posts. These patients describe a variety of issues around their experience with the disease, such as therapy side effects (some of which have not been seen before and may offer early insights), psychosocial issues with chronic disease, and opinions regarding treatments and interventions.
By analyzing this data, we are in effect listening to these patients’ experiences and hopefully gaining insights to better treat the disease.
I understand you've also used online data to better understand public sentiment — could you describe that?
One of the most important things health care providers do is try to prevent disease, and one of the best ways to do this is disease screening. However, millions of people do not get age-appropriate screening for diseases such as breast cancer or colon cancer.
My group’s initial work focused on understanding perceptions of cancer screening tools. Understanding how people feel about these screening interventions, particularly on the scale we’re able to examine using social media, allows us not only to identify barriers but also to ascertain methods that work.
How did you do that?
Tens of thousands of tweets mentioning screening tests are created weekly. And while there are clear limitations to the quality of the data and its generalizability, the sheer volume of data that we can access is much larger than what most other means can provide, including conventional surveys, which carry their own significant limitations. So we developed and validated a machine learning algorithm to classify sentiment (positive, negative, or neutral) around mentions of three common cancer screening tools: colonoscopy, mammography and Pap smears.
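For readers curious what a three-way sentiment classifier can look like in code, here is a minimal sketch. The interview does not say which algorithm Sinha's group built, so this swaps in a simple multinomial Naive Bayes model as a stand-in. The training sentences, the `NaiveBayes` class and the `tokenize` helper are all hypothetical, invented purely for illustration; none of it is the study's data or method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set, invented for illustration only.
# A real classifier would need thousands of hand-labeled posts.
TRAIN = [
    ("my colonoscopy is tomorrow and i fear the pain", "negative"),
    ("dreading the prep for my colonoscopy, so much pain", "negative"),
    ("mammogram done and all clear, so relieved and grateful", "positive"),
    ("grateful my pap smear results came back clear", "positive"),
    ("clinic offers mammogram appointments on weekdays", "neutral"),
    ("guidelines list colonoscopy as one screening option", "neutral"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of posts
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_posts = sum(self.label_counts.values())
        # Words never seen in training carry no signal, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_posts in self.label_counts.items():
            # Start from the log prior, then add each word's log likelihood.
            score = math.log(n_posts / total_posts)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN)
print(model.predict("so much fear and pain"))
```

A model like this only counts words, which is why terms such as "fear" and "pain" can dominate a prediction. Validating it against posts labeled by people, as the group describes doing for its own algorithm, is what makes the output trustworthy.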
We found more negative sentiment expressed for colonoscopy and more positive sentiment for mammography. For example, the words ‘fear’ and ‘pain’ were commonly associated with negative sentiment. We also found that posts that were negative in sentiment spread more rapidly through social media than positive posts.
How are these findings being used?
Knowing the types of postings that reach more users, and some of the common issues expressed in them, could certainly influence how professional societies develop outreach interventions to improve engagement with preventive health efforts.
Based on our initial findings, we are developing additional algorithms to hopefully better understand patient and societal perceptions of disease. We are also now engaged with professional societies such as the Crohn’s and Colitis Foundation to provide organizations with improved methods to understand patient needs and promote health.
Image by Max Pixel