Event Details


PhD Dissertation Defense - Siyi Guo

Wed, Jun 18, 2025
2:00 PM - 3:30 PM
Location: RTH 114
Title: Learning Social Representations for Causal Understanding of Heterogeneous and Dynamic Online Behavior

Date and Time: Wednesday, June 18th, 2025 | 2:00pm

Location: RTH 114

Committee Members: Kristina Lerman, Emilio Ferrara, Urbashi Mitra

Abstract: Social media has become a dominant force in shaping public discourse, civic engagement, and individual behavior, offering a rich but challenging environment for studying human beliefs, behaviors, and decision-making at scale. However, modeling user behavior on these platforms is complicated by multiple challenges: the massive volume of data, multi-modality, heterogeneity across users and platforms, rapidly evolving dynamics, scarcity of annotations, and difficulty in causal analysis from observational data.

This dissertation presents a framework for monitoring, explaining, modeling, and intervening in online user behavior that addresses the challenges of handling complex and dynamic social media data. First, to understand users' reactions to real-world events in a dynamic online environment, we propose an unsupervised methodology for detecting and explaining collective emotional reactions to events, leveraging transformer-based affect modeling and topic-guided explanations. Second, we introduce SoMeR, a self-supervised, multi-view user representation learning framework that captures diverse user behaviors across text, temporal patterns, profiles, and networks, and generalizes across platforms and tasks. Third, we develop DAMF, a domain-adaptive moral foundation inference model that enables robust supervised language modeling from heterogeneous annotated datasets through adversarial training and label distribution balancing. Finally, we propose CausalDANN, a novel framework for estimating causal effects of direct text interventions using LLM-generated counterfactuals and domain adaptation to mitigate distributional shifts.

Together, these contributions advance computational social science by addressing core challenges in tracking and modeling the temporal dynamics and heterogeneity of online user behavior. The methods developed in this thesis integrate causal inference, representation learning, and time series analysis to enable scalable, generalizable, and causally grounded understanding of social media data. In this thesis, I demonstrate their utility by applying the tools to detect and explain online reactions to offline events, identify and forecast harmful behaviors such as coordinated campaigns and hateful speech, understand the evolution of polarized discussions, infer moral values expressed in online language, and evaluate the causal impact of language on moral judgment. This work offers practical tools for researchers and policymakers seeking to better understand and engage with digital populations in complex, polarized, and fast-changing online environments.