IC2S2 2024 Tutorial

Exploring Emerging Social Media: Acquiring, Processing, and Visualizing Data with Python and OSoMe Web Tools

July 17, 2024

IC2S2 Registration (closes July 3, 2024): https://ic2s2-2024.org/register

In the digital age, social media platforms have become central to how people interact and communicate. Computational social science, and social media research in particular, has produced important results such as detecting bots, identifying suspicious activity, and uncovering narratives. Underlying these findings is the combination of large-scale data and network science techniques that reveal how users connect and interact.

The landscape of computational social science research has shifted significantly in recent years. New restrictions in the data access policies of widely used platforms limit the types of research that can be conducted. At the same time, emerging platforms that offer open data access, like Bluesky and Mastodon, have surged in popularity, opening opportunities for investigation. Additionally, the rapid development of large language models (LLMs) provides new ways to represent and understand published content. The Observatory on Social Media (OSoMe) addresses these challenges and opportunities by developing data acquisition tools for emerging platforms, providing historical and synthetic datasets, and developing novel data analysis tools and techniques.

This tutorial aims to guide participants through these new developments, highlighting the current approaches for accessing social media data, including the use of OSoMe's infrastructure to acquire social media data or generate data from a model of a social media platform, and methodologies to understand this data. Attendees will learn to build various network types, extending beyond traditional interactions like replies and re-posts to include co-post and co-hashtag networks, enabling diverse data representations for different research needs. The tutorial will cover network science techniques, including basic network features, centrality measures, and community detection, along with techniques for building and analyzing text-based embeddings, such as those generated by the Sentence-BERT method. The tutorial will also cover techniques to extract narratives and attribute content-aware labels to communities.
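
As a rough illustration of the network types mentioned above, the sketch below builds a re-post network and a co-hashtag network from a handful of toy posts using only the Python standard library. The post fields (`user`, `repost_of`, `hashtags`) are hypothetical and do not reflect any platform's schema, and the co-hashtag network is read here as linking users who used the same hashtag, which is one possible definition and not necessarily the one used in the tutorial.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy posts; field names are hypothetical, not a real platform schema.
posts = [
    {"user": "alice", "repost_of": None,    "hashtags": ["ic2s2", "networks"]},
    {"user": "bob",   "repost_of": "alice", "hashtags": ["ic2s2", "networks"]},
    {"user": "carol", "repost_of": "alice", "hashtags": ["ic2s2", "python"]},
    {"user": "bob",   "repost_of": "carol", "hashtags": ["python"]},
    {"user": "dave",  "repost_of": "alice", "hashtags": []},
]


def repost_network(posts):
    """Directed, weighted edges: (reposter, original author) -> count."""
    edges = Counter()
    for post in posts:
        if post["repost_of"] is not None:
            edges[(post["user"], post["repost_of"])] += 1
    return edges


def cohashtag_network(posts):
    """Undirected, weighted edges between users who share hashtags.

    The weight of an edge is the number of distinct hashtags both users used
    (an assumption made for this sketch).
    """
    users_by_tag = defaultdict(set)
    for post in posts:
        for tag in post["hashtags"]:
            users_by_tag[tag].add(post["user"])

    edges = Counter()
    for users in users_by_tag.values():
        # Sort so each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return edges


repost_edges = repost_network(posts)
cohashtag_edges = cohashtag_network(posts)

# Weighted in-degree: how many times each user was re-posted.
times_reposted = Counter()
for (_, author), weight in repost_edges.items():
    times_reposted[author] += weight
```

Edge lists like these can be loaded into a graph library such as igraph for the centrality and community analyses the tutorial covers.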

Moreover, participants will be guided through advanced visualization tools like Helios-Web, enhancing their ability to explore and disseminate their findings effectively. The tutorial will be conducted in Python and utilize Jupyter notebooks preloaded with datasets and scripts. These materials will be open-source and available on GitHub, providing participants with a toolkit to kickstart or advance their social media research endeavors.

This tutorial is suited for anyone with an interest in social media analysis, across a wide range of disciplines and expertise. While participants with prior knowledge of Python will get the most out of the session, the tutorial is designed to be inclusive and accessible to those with varying levels of technical proficiency. OSoMe tools will be available via web interfaces, and the prepared Python environment is structured to allow easy modifications for processing different datasets.


  • Filipi N. Silva, Research Scientist, Observatory on Social Media

  • Bao Tran Truong, Ph.D. candidate in Informatics, Observatory on Social Media

  • Wanying Zhao, Ph.D. candidate in Complex Networks and Systems, Indiana University

  • Kai-Cheng Yang, Postdoctoral research associate at the Network Science Institute, Northeastern University

Preliminary Program

Demonstration of the OSoMe tools and data acquisition (60 min)

  • Utilizing OSoMe tools for analyzing and acquiring data, including the network tool, Top FIBers, OSoMe Mastodon Search, Botometer-X, etc.
  • Acquiring data from an emerging platform using OSoMe infrastructure.
  • Generating synthetic data from SimSoM, a minimal model that simulates information-sharing on a social media platform.

Building Networks and Embeddings, Simple Analysis (60 min)

  • Preprocessing, filtering, cleaning.
  • Constructing interaction networks (re-post, reply, mention).
  • Building co-hashtag and co-post networks.
  • Conducting simple analyses using igraph.
  • Generating embeddings of posts using BERT.
  • Employing similarity measures to find posts with similar content.
  • Illustrating a classification task using embeddings.
  • Demonstrating the use of semantic axes to illustrate polarity.
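
The similarity and semantic-axis steps listed above can be sketched with plain Python. The vectors below are made-up three-dimensional stand-ins for real post embeddings, which would normally come from a sentence-embedding model. The semantic axis is built as the difference between two pole vectors and posts are scored by cosine similarity to that axis; this is one common construction and is assumed here, not taken from the tutorial materials.

```python
import math

# Toy vectors standing in for post embeddings (values are made up).
embeddings = {
    "post_a": [0.9, 0.1, 0.0],
    "post_b": [0.8, 0.2, 0.1],
    "post_c": [0.0, 0.1, 0.9],
}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def most_similar(query, embeddings):
    """Name of the other post closest to `query` by cosine similarity."""
    candidates = (name for name in embeddings if name != query)
    return max(candidates, key=lambda name: cosine(embeddings[query], embeddings[name]))


def semantic_axis(positive_pole, negative_pole):
    """Axis pointing from the negative pole toward the positive pole."""
    return [p - n for p, n in zip(positive_pole, negative_pole)]


# Use two posts as the poles of the axis (a hypothetical choice for the demo).
axis = semantic_axis(embeddings["post_a"], embeddings["post_c"])

# Positive scores lean toward post_a's pole, negative toward post_c's pole.
scores = {name: cosine(vec, axis) for name, vec in embeddings.items()}
nearest_to_a = most_similar("post_a", embeddings)
```

In practice each pole would usually be the mean embedding of several seed texts, which makes the axis less sensitive to any single example.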

Visualization of Networks and LLM Integration to Explore Narratives (60 min)

  • Using Helios-Web for visualizing user networks and post embeddings.
  • Extracting and analyzing communities.
  • Identifying and discussing the narratives prevalent in each community, based on their content.
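
As a lightweight stand-in for the content-based labeling step above, the sketch below assigns each community its most distinctive terms using a simple TF-IDF-style score. This is a deliberately simple heuristic swapped in for illustration and is not the LLM-based approach the session covers. The community identifiers and post texts are made up.

```python
import math
from collections import Counter

# Toy data: community id -> texts posted by its members (made up).
community_posts = {
    "c0": ["vaccine rollout news", "vaccine safety study", "new vaccine clinic"],
    "c1": ["election results tonight", "election turnout news", "vote count update"],
}


def tokenize(text):
    """Naive whitespace tokenizer; real text needs proper preprocessing."""
    return text.lower().split()


def distinctive_terms(community_posts, top_k=2):
    """Top terms per community, scored by tf * log(n_communities / df)."""
    term_counts = {
        community: Counter(tok for text in texts for tok in tokenize(text))
        for community, texts in community_posts.items()
    }
    n_communities = len(term_counts)

    # Number of communities in which each term appears.
    community_freq = Counter()
    for counts in term_counts.values():
        community_freq.update(counts.keys())

    labels = {}
    for community, counts in term_counts.items():
        scored = {
            term: tf * math.log(n_communities / community_freq[term])
            for term, tf in counts.items()
        }
        # Sort by descending score, then alphabetically for stable ties.
        ranked = sorted(scored, key=lambda term: (-scored[term], term))
        labels[community] = ranked[:top_k]
    return labels


labels = distinctive_terms(community_posts)
```

Terms that appear in every community, such as "news" in this toy data, score zero and so never dominate a label.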