Data Resources

OSoMe is home to a number of datasets that are available to the IU research community. Many of these data resources, including the Botometer Pro API and the many datasets on Zenodo, are also available to non-IU researchers.

For more information or to gain access to any of the datasets, use the Contact Form to make a Data Request. Please include details such as your research purpose, the specific dataset you are interested in, and any relevant affiliations.

Public Data

  • Bluesky Archive - An archive of the Bluesky firehose from 2023 to the present. The dataset is updated daily and is currently 675GB compressed and growing.
  • Mastodon Archive - An archive of posts from Mastodon starting in August 2023. Data were initially collected from only two of the largest Mastodon instances; we now stream from more than 50 of the most active instances. The archive is updated daily and, as of November 12, 2024, is 475GB compressed.
  • Botometer Pro API - Use the official Botometer API to retrieve bot scores from Botometer X or to calculate new bot scores from user-provided data.
  • Publication Data - Datasets that are publicly available on Zenodo to accompany various publications. These datasets cover a range of social media studies, including misinformation spread, bot behavior, and content analysis.
  • Vendor Purchased Bot Raw Data - Raw data used to construct the vendor-purchased-2019 dataset.
  • Media Bias Fact Check list - A snapshot of news sources graded by Media Bias Fact Check.

Indiana University Affiliates Only

  • Meta Content Library - Public Facebook, Instagram, and Threads data available through Meta's secure interface. Data include public posts, comments, and engagement metrics. OSoMe currently has access, but other researchers must apply independently; we can assist with this process.
  • TikTok Research API - The TikTok Research API allows access to account data and content metadata. OSoMe has access to this API and can assist researchers in gaining their own access.
  • NewsGuard - OSoMe has a subscription to NewsGuard data, which can be used by OSoMe-affiliated faculty. Alternatively, we can assist you in acquiring a subscription for your own research.
  • Tavern - OSoMe maintains a large archive of historical data, sampled through a 10% stream from a major social media platform. Available to IU researchers upon request.
  • CoVaxxy Raw Data - Raw data used to construct the CoVaxxy Tweet IDs dataset.
  • Trains Raw Data - Raw data used in the paper "The Manufacture of Partisan Echo Chambers by Follow Train Abuse on Twitter."
  • 2022 Midterm Election Raw Data - Raw data used, in part, to construct datasets in the MEIU22 collection.