Event Details
PhD Thesis Proposal - Bhavya Vasudeva
Thu, Nov 06, 2025
3:00 PM - 5:00 PM
Location: GCS 302C
Title: Towards Demystifying Modern Deep Learning: From Architecture and Optimization to Emergent Capabilities
 
Committee Members: Vatsal Sharan, Robin Jia, Haipeng Luo, Mahdi Soltanolkotabi, Christos Thrampoulidis
 
Abstract: Modern deep learning systems, particularly large language models (LLMs), are now deeply embedded in real-world infrastructure, making their reliability and scientific understanding essential. This work seeks to uncover how the architecture, the optimization algorithms, and the training data and objectives shape the generalization and emergent abilities of LLMs. First, since most LLMs are based on the transformer architecture, we characterize its inductive bias toward learning low-sensitivity functions and analyze the training dynamics of the self-attention mechanism, the core component of transformers. Second, as most modern large-scale models are trained with adaptive optimizers like Adam and, more recently, spectrum-aware optimizers like Muon, we investigate the reasons and mechanisms underlying their effectiveness relative to SGD. Third, we study in-context learning, the remarkable ability of LLMs to learn tasks from a few demonstrations provided in context. This line of work examines the inductive biases of in-context learning in transformer-based language models, as well as how these models disentangle and compose latent concept representations in different settings. The proposed work aims to explain how such extrapolative generalization arises, and specifically how the ability to learn in context emerges in models trained on open-ended text. Together, these studies aim to strengthen the scientific foundations of modern deep learning.

