
Disinformation Research Is About to Get Harder

Photo by Shahadat Rahman via Unsplash.

August 21, 2022

Disinformation researchers face a common problem: collecting data. With Meta’s plan to shut down CrowdTangle in response to public relations issues, online disinformation research is poised to get harder and more legally dubious, write DisinfoLab’s Aaraj Vij, Thomas Plant, and Alyssa Nekritz.

Disinformation researchers face a common problem: collecting data. Social media companies are notoriously reluctant to release their data to the public—forcing researchers to pursue workarounds like open-source or self-coded web scraping scripts that straddle the line between legal and illegal data collection. Without support from social media platforms, the capabilities of such scripts are limited. With Meta’s plan to shut down CrowdTangle—one of the only publicly available data collection tools supported by a social media company—in response to public relations (PR) issues, online disinformation research is poised to get harder and more legally dubious.

To understand the value of CrowdTangle, consider DisinfoLab’s recent study on disinformation resiliency in Eastern Europe. For its analysis of reactions to misleading articles in Hungary, Poland, and Estonia, the lab relied heavily on CrowdTangle’s Link Checker, one of several products offered by the service. Link Checker allowed DisinfoLab to track and quantify the most influential Facebook posts sharing disinformation. While CrowdTangle captured user engagement with those posts, DisinfoLab collected the comments themselves using the facebook-scraper Python package. With these metrics, DisinfoLab was able to gauge commenters’ support for or skepticism of disinformation in each country.
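For researchers who want to reproduce this workflow programmatically, CrowdTangle also exposes a Links endpoint that mirrors the Link Checker dashboard, and facebook-scraper can pull comments from individual posts. The sketch below is a minimal illustration of that pipeline under stated assumptions: the API token and article URL are hypothetical placeholders, and the exact response and comment field names may differ by CrowdTangle API or facebook-scraper version.

```python
# Minimal sketch of the workflow described above, assuming a valid
# CrowdTangle API token and a recent version of facebook-scraper.
# Field names ("postUrl", "statistics", "comments_full") reflect common
# versions of the API/package and may need adjusting.
import requests
from facebook_scraper import get_posts

CT_TOKEN = "YOUR_CROWDTANGLE_API_TOKEN"                  # hypothetical placeholder
ARTICLE_URL = "https://example.com/misleading-article"   # hypothetical placeholder

# 1. Ask CrowdTangle which public Facebook posts shared the article
#    (the programmatic counterpart of the Link Checker dashboard).
resp = requests.get(
    "https://api.crowdtangle.com/links",
    params={"token": CT_TOKEN, "link": ARTICLE_URL, "count": 100},
    timeout=30,
)
resp.raise_for_status()
posts = resp.json().get("result", {}).get("posts", [])

# 2. Rank posts by total interactions to find the most influential sharers.
def total_interactions(post):
    stats = post.get("statistics", {}).get("actual", {})
    return sum(v for v in stats.values() if isinstance(v, int))

top_posts = sorted(posts, key=total_interactions, reverse=True)[:10]

# 3. Scrape the comments on those posts with facebook-scraper.
#    (Recent package versions accept post_urls=...; older ones may not.)
post_urls = [p["postUrl"] for p in top_posts if p.get("postUrl")]
for post in get_posts(post_urls=post_urls, options={"comments": True}):
    for comment in post.get("comments_full") or []:
        print(comment.get("comment_text", ""))
```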

CrowdTangle expedites data collection and lowers barriers to entry for social media research, making it an indispensable tool for studying disinformation. First, CrowdTangle democratizes data collection, allowing researchers without coding proficiency or technical support to access aggregate data across social media posts. Absent CrowdTangle, these researchers would have to build their own collection tools on top of Facebook’s Graph API.

Second, CrowdTangle accelerates data collection even for researchers who are proficient in coding. The tool offers a single, consistent way to collect Facebook data without building and testing custom collection scripts. Those scripts would also have to maneuver around obstacles such as the Graph API’s rate limit, which caps how many requests a program can make to Facebook in a given time period; CrowdTangle’s rate limit allows significantly more queries. Although it is possible to emulate some of CrowdTangle’s functionality with Facebook’s Graph API, doing so is tedious, time-intensive, and requires substantial coding knowledge.
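As a rough illustration of that extra work, here is a minimal sketch of what a Graph API poller with rate-limit backoff might look like. The page ID, access token, API version, and requested fields are placeholder assumptions, and the retry logic is deliberately simplified rather than a complete implementation.

```python
# Minimal sketch of fetching a page's posts via Facebook's Graph API
# with crude rate-limit handling. The page ID, token, and API version
# are hypothetical placeholders; real code would also page through
# results and persist them.
import time
import requests

ACCESS_TOKEN = "YOUR_GRAPH_API_TOKEN"   # hypothetical placeholder
PAGE_ID = "123456789"                   # hypothetical placeholder
URL = f"https://graph.facebook.com/v17.0/{PAGE_ID}/posts"

def fetch_posts(max_retries=5):
    params = {"access_token": ACCESS_TOKEN,
              "fields": "message,created_time,shares"}
    for attempt in range(max_retries):
        resp = requests.get(URL, params=params, timeout=30)
        body = resp.json()
        error = body.get("error")
        if error is None:
            return body.get("data", [])
        # Codes 4, 17, and 32 are commonly documented as rate-limit
        # errors; back off exponentially and retry instead of hammering
        # the API.
        if error.get("code") in (4, 17, 32):
            time.sleep(2 ** attempt)
            continue
        raise RuntimeError(f"Graph API error: {error}")
    raise RuntimeError("Rate limit not lifted after retries")

if __name__ == "__main__":
    for post in fetch_posts():
        print(post.get("created_time"), (post.get("message") or "")[:80])
```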

Although CrowdTangle facilitates social media research, companies face mixed incentives when it comes to supporting these studies. On the one hand, Meta’s support of CrowdTangle enabled not only DisinfoLab’s study on disinformation resiliency in Eastern Europe, but also investigations into health misinformation on Facebook, Russian-linked influence operations in Africa, and the online presence of right-wing actors during the 2018 Swedish elections. These studies offer unique, data-backed insight into the ways disinformation spreads online and, in turn, how to combat it.

On the other hand, studies enabled by tools like CrowdTangle often expose social media companies to greater scrutiny by uncovering the societal ills those platforms support or exacerbate. For example, a summer 2020 CrowdTangle-powered investigation found that right-wing commentators received “more engagement on their Facebook pages than mainstream news outlets”—prompting public backlash against Meta. The company responded to the criticism by arguing that a post’s engagement is not necessarily indicative of its reach.

Unfortunately, Meta has decided that the research enabled by CrowdTangle is not worth the PR risk. In fall 2020, The Economist released a CrowdTangle-powered study finding that, relative to the overall media ecosystem, Facebook page engagement skewed towards right-wing media outlets. The report prompted an email chain between Meta executives with the subject line “The trouble with CrowdTangle.” In the chain, executives expressed concerns such as “our own tools are helping journos to consolidate the wrong narrative.” Some executives, seeking to shake the bad press, asked whether CrowdTangle should release reach data for Facebook posts, since it was right-wing-skewed engagement data that sat at the heart of the controversy. Brandon Silverman, founder and head of CrowdTangle at the time, explained that releasing reach data could in fact exacerbate the PR crisis, because misleading articles on Facebook had some of the greatest reach. In April 2021, Meta broke up the CrowdTangle team—severely stalling updates and support for the service. Then in June 2022, Meta confirmed its plans to close CrowdTangle altogether.

These PR incentives make it untenable to rely on social media companies to volunteer their data or support data-sharing tools like CrowdTangle for disinformation research. Ensuring that academics and policy analysts can legally, reliably, and easily retrieve social media data will require shifting the incentives driving private companies. One such approach is the proposed Platform Transparency and Accountability Act (PTAA), which would require social media platforms to share internal platform data with vetted external researchers through a federally regulated process. The key principles behind the proposal—federal regulation, vetted research, and user privacy in data sharing—set a strong foundation for legislation targeted at retrieving policy-relevant data from social media platforms. With CrowdTangle shutting down soon, it is critical to disinformation research that the federal government act quickly.

About Aaraj Vij:
Aaraj Vij is the co-founder of VerbaAI LLC, a software startup dedicated to strengthening our information environment. He previously directed W&M DisinfoLab, leading multidisciplinary research on social media, artificial intelligence, and foreign malign influence campaigns.
About Thomas Plant:
Thomas Plant is an Associate Product Manager at Accrete AI and co-founder of William & Mary’s DisinfoLab, the nation’s first undergraduate disinformation research lab.
About Alyssa Nekritz:
Alyssa Nekritz is Managing Editor and Disinformation Analyst for DisinfoLab, a student-led think tank at the College of William & Mary’s Global Research Institute. She is a junior pursuing a B.A. in International Relations with a minor in Data Science at the College of William & Mary.
The views presented in this article are the authors’ own and do not necessarily represent the views of any other organization.