
IU researchers use Jetstream to study social media posts with misinformation about the COVID-19 vaccine

IU’s John Bryden, Matthew DeVerna, and Francesco Pierri study the effects of Twitter posts that are spreading misinformation about the COVID-19 vaccine.

Research and discovery May 6, 2021

Have you ever wondered how misinformation on social media affects our everyday decisions, such as whether we are willing to get the COVID-19 vaccine?

A team of researchers at the Observatory on Social Media (OSoMe, pronounced “awesome”) at Indiana University is setting out to answer this exact question. Led by John Bryden, executive director, the team has developed a tool called CoVaxxy. CoVaxxy visualizes the relationship between COVID-19 vaccine adoption and online (mis)information, specifically on Twitter. Bryden’s team members, Matthew DeVerna, an informatics Ph.D. student, and Francesco Pierri, a visiting researcher from Politecnico Di Milano, have contributed an enormous amount of work to this project.


John Bryden, executive director of IU’s Observatory on Social Media

Bryden and his team are using a carefully selected list of keywords obtained through a snowball sampling technique to identify English language Twitter posts related to COVID-19 vaccine adoption. They designed their architecture to collect and process large quantities of these data. The architecture is hosted on Indiana University’s National Science Foundation (NSF)-funded Jetstream cloud servers.
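The article does not describe how the snowball sampling was carried out, so the following is only a minimal sketch of one common form of the technique: start from seed keywords and repeatedly add terms that co-occur with the current keyword set. The function name `snowball_keywords`, the `min_count` threshold, and the toy posts are all hypothetical; in practice, candidate terms would usually be reviewed by hand before being added.

```python
from collections import Counter


def snowball_keywords(posts, seeds, min_count=2, max_rounds=3):
    """Grow a keyword set from seeds using term co-occurrence.

    posts: list of token lists (one list per post); hypothetical input format.
    A term is added when it appears in at least `min_count` posts that
    already match the current keyword set.
    """
    keywords = set(seeds)
    for _ in range(max_rounds):
        co_occurring = Counter()
        for tokens in posts:
            token_set = set(tokens)
            if token_set & keywords:
                # Count each candidate term once per matching post.
                co_occurring.update(token_set - keywords)
        new_terms = {term for term, n in co_occurring.items() if n >= min_count}
        if not new_terms:
            break  # nothing new cleared the threshold; stop early
        keywords |= new_terms
    return keywords


# Toy data for illustration only, not taken from the CoVaxxy dataset.
posts = [
    ["vaccine", "pfizer"],
    ["vaccine", "pfizer", "dose"],
    ["pfizer", "moderna"],
    ["moderna", "pfizer"],
    ["weather", "rain"],
]
result = snowball_keywords(posts, {"vaccine"})
```

Here the seed "vaccine" pulls in "pfizer" in the first round, which in turn pulls in "moderna" in the second; "dose" never reaches the threshold and unrelated terms are never considered.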

To maintain the integrity of the data, Bryden and his team incorporated redundancy. They maintain two streamer virtual machines (VMs) in different U.S. states so that if one suffers a fault, data from the other can be used. Both instances connect to Twitter’s filtered stream Application Programming Interface (API) to collect Twitter posts matching any of the keywords in real time. Data from the two streamer instances are then collated on a general-purpose VM where Bryden and his team run their data analysis.

Virtual Machine server architecture for the CoVaxxy project.
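Because both streamers receive the same filtered stream, collating their output means keeping each post only once. The sketch below shows one way this could be done, assuming each post is a dictionary with a unique `id` field; the function name `collate_streams` and the sample records are hypothetical and are not the team's actual code.

```python
from typing import Iterable, Iterator


def collate_streams(*streams: Iterable[dict]) -> Iterator[dict]:
    """Yield each post once across redundant streamers, keyed on its ID."""
    seen_ids = set()
    for stream in streams:
        for post in stream:
            post_id = post["id"]  # assumption: every post carries a unique "id"
            if post_id in seen_ids:
                continue  # already captured by another streamer
            seen_ids.add(post_id)
            yield post


# Toy records: streamer B missed post 1, streamer A missed post 3.
streamer_a = [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
streamer_b = [{"id": "2", "text": "second"}, {"id": "3", "text": "third"}]
merged = list(collate_streams(streamer_a, streamer_b))
```

With this approach, a gap in one streamer's output is filled automatically by the other, and posts seen by both appear only once in the merged result.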

When deciding where to host their architecture, Bryden said, “Initially we thought, this [Jetstream] looks good, but we didn’t know if it was right for us. However, George Turner and Peg Lindenlaub from Research Technologies (RT) spent time talking with us about how it would work and they alleviated a lot of our concerns about the amount of data space we’d have available.” 

So far, Bryden is very happy with his experience with Jetstream. “With Jetstream, it’s very easy to set up and we can configure the machines how we want them,” he said. “We found that it’s really useful and powerful because we can essentially set up different machines to do different roles and we have complete control over the machines ourselves.”

Francesco Pierri, a visiting researcher from Politecnico Di Milano

The data are also copied over to Indiana University’s Slate-Project high performance file system. In the future, Bryden intends to use Slate-Project and Indiana University’s Carbonate supercomputer to run machine learning analyses on the tweets, either to detect anti-vaccine sentiment or to find people who are posting anti-vaccine messages. This type of work will take a lot of processing power, and Bryden believes that Slate-Project and Carbonate are powerful enough to enable him to complete it.

Matthew DeVerna, an IU Informatics Ph.D. student

The CoVaxxy dashboard presents graphs that visualize the percentage of people in a given state who are unwilling to accept the vaccine. Other graphs show the percentage of related hashtag usage, the percentage of all posted tweets that link to low-credibility and mainstream sources, and the most shared low-credibility and mainstream sources in the past week.

Bryden and his team wrote most of the software they use to comb through every Twitter post and identify which websites each post links to. On average, they collect and analyze around 600,000 to 700,000 Twitter posts a day, and on some days significantly more.
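As a rough illustration of identifying linked websites, the sketch below extracts the domain from each URL and tallies it against source lists. The domain sets, the function names `domain_of` and `tally_sources`, and the sample URLs are placeholders invented for this example; the article does not say which lists or code the team uses.

```python
from collections import Counter
from urllib.parse import urlparse

# Placeholder source lists (hypothetical domains, not real classifications).
LOW_CREDIBILITY = {"lowcred.example"}
MAINSTREAM = {"news.example"}


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def tally_sources(urls):
    """Count linked URLs by source category."""
    counts = Counter()
    for url in urls:
        domain = domain_of(url)
        if domain in LOW_CREDIBILITY:
            counts["low_credibility"] += 1
        elif domain in MAINSTREAM:
            counts["mainstream"] += 1
        else:
            counts["other"] += 1
    return counts


urls = [
    "https://www.lowcred.example/story-1",
    "https://news.example/article?id=7",
    "https://NEWS.example/another",
    "https://blog.example/post",
]
counts = tally_sources(urls)
```

Dividing each count by the total number of linked URLs would give the kind of percentage a dashboard could chart over time.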

The CoVaxxy dashboard tracks and quantifies credible information and misinformation narratives over time, as well as their sources and related popular keywords. By collecting and displaying these data, Bryden and his team hope to encourage the public to be more vigilant about the information they consume on their daily social media feeds in the fight against COVID-19.
