Data Resources

OSoMe hosts a number of datasets that are available to the Indiana University (IU) research community. Many of these resources, including the Botometer Pro API and the datasets on Zenodo, are also available to researchers outside IU.

For more information or to gain access to any of the datasets, use the Contact Form to make a Data Request. Please include details such as your research purpose, the specific dataset you are interested in, and any relevant affiliations.

Public Data

  • Bluesky
    • Firehose Archive - An archive of the Bluesky firehose from August 2023 to the present, updated daily.
    • Day 1 Archive - A snapshot of Bluesky collected at the end of April 2024. It contains everything posted on Bluesky from Day 1 (November 17, 2022) through approximately April 30, 2024 that was still visible at the time of collection.
    • Note: Both Bluesky archives consist of data that is publicly available from Bluesky and is not subject to terms of service that restrict its sharing. The data contains personally identifiable information and is not anonymized, so researchers should seek review from their institutional review board (IRB) before using it. When processing and cleaning the data, users should also honor any deletion requests recorded in the firehose to respect user privacy.
  • Mastodon Archive - An archive of posts from Mastodon starting in August 2023. Collection initially covered only 2 of the largest Mastodon instances and now streams from more than 50 of the most active instances. The archive is updated daily and, as of November 12, 2024, is 475 GB compressed.
  • Botometer Pro API - The official Botometer API, used to retrieve bot scores from Botometer X or to calculate new bot scores from user-provided data.
  • Publication Data - Datasets published on Zenodo to accompany various publications. They cover a range of social media studies, including misinformation spread, bot behavior, and content analysis.
  • Vendor Purchased Bot Raw Data - Raw data used to construct the vendor-purchased-2019 dataset.
  • Media Bias Fact Check list - A snapshot of news sources graded by Media Bias Fact Check.
  • IO Datasets - Labeled datasets for research on information operations (IO).
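The Bluesky note asks users to honor deletion requests from the firehose when cleaning the data. That step can be sketched in Python as follows, assuming a hypothetical, simplified record layout rather than the archives' actual schema: each post is a dict with a `uri` field of the form `at://<did>/<collection>/<rkey>`, and each delete event is a dict with `repo` (the author's DID) and `path` (`<collection>/<rkey>`), loosely mirroring how AT Protocol commit operations identify records. Account-level deletions are not handled here.

```python
"""Sketch: drop Bluesky records that were later deleted, per firehose delete events.

Hypothetical, simplified layout (not the archives' actual schema):
- post:         {"uri": "at://<did>/<collection>/<rkey>", ...}
- delete event: {"repo": "<did>", "path": "<collection>/<rkey>"}
Account-level deletions are NOT handled in this sketch.
"""

from typing import Iterable


def deleted_uris(delete_events: Iterable[dict]) -> set[str]:
    """Build the set of at:// URIs named by the delete events."""
    return {f"at://{ev['repo']}/{ev['path']}" for ev in delete_events}


def apply_deletions(posts: Iterable[dict], delete_events: Iterable[dict]) -> list[dict]:
    """Return only the posts whose URI does not appear in any delete event."""
    gone = deleted_uris(delete_events)
    return [p for p in posts if p["uri"] not in gone]


if __name__ == "__main__":
    posts = [
        {"uri": "at://did:plc:alice/app.bsky.feed.post/3k1", "text": "hello"},
        {"uri": "at://did:plc:bob/app.bsky.feed.post/3k2", "text": "bye"},
    ]
    deletes = [{"repo": "did:plc:alice", "path": "app.bsky.feed.post/3k1"}]
    # Prints ['at://did:plc:bob/app.bsky.feed.post/3k2']
    print([p["uri"] for p in apply_deletions(posts, deletes)])
```

Building the set of deleted URIs first keeps filtering to a single pass over the posts with constant-time lookups. For archives too large to hold in memory, the same membership check could be applied while streaming through the files.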

Indiana University Affiliates Only

  • Meta Content Library - Public Facebook, Instagram, and Threads data available through Meta's secure interface, including public posts, comments, and engagement metrics. OSoMe currently has access, but other researchers must apply independently; we can assist with this process.
  • TikTok Research API - Access to TikTok account data and content metadata. OSoMe has access to this API and can help researchers obtain their own access.
  • NewsGuard - OSoMe subscribes to NewsGuard data, which OSoMe-affiliated faculty can use. Alternatively, we can help you acquire a subscription for your own research.
  • Tavern - A large archive of historical data maintained by OSoMe, collected from a 10% sample stream of a major social media platform. Available to IU researchers upon request.
  • CoVaxxy Raw Data - Raw data used to construct the CoVaxxy Tweet IDs dataset.
  • Trains Raw Data - Raw data used in the paper "The Manufacture of Partisan Echo Chambers by Follow Train Abuse on Twitter."
  • 2022 Midterm Election Raw Data - Raw data used, in part, to construct datasets in the MEIU22 collection.