Event Details

PhD Dissertation Defense - Ming-Chang Chiu

Mon, Jul 21, 2025
11:00 AM - 1:00 PM
Location: RTH 306
Title: From Subgroup Discrepancy to Agentic Learning: A Data-Driven Approach to Building Robust and Contextually Aligned Vision-Language Models
 
 
Committee Members: Xuezhe Ma (Chair), Haipeng Luo, Daniel E. O'Leary
 
 
Abstract: Recent advances in vision-language models (VLMs) have significantly improved machines’ ability to jointly understand visual and linguistic information, enabling impressive performance across image captioning, visual question answering, and multimodal reasoning. However, building VLMs that are robust, fair, and perceptually grounded—especially under real-world variability in color, environment, and background context—remains an open challenge. The focus of my research is to develop a learning framework that equips VLMs with perceptual fidelity, fairness awareness, and self-improving capability to function reliably in unconstrained and high-stakes environments.
 
Specifically, I aim to develop a VLM training and evaluation framework that is: (1) diagnostic: by uncovering subgroup discrepancies and spurious correlations that are hidden by average accuracy, (2) perceptually grounded: by benchmarking and training on datasets that emphasize color vision, environment understanding, and low-level visual cues, (3) instructional: by leveraging medium-grained, templated prompts that align model outputs with interpretable visual concepts, and (4) agentic: by enabling models to identify failure cases, consult external tools, and iteratively refine their own training without relying on superior teacher models. This dissertation presents a series of projects—FlowAug, ColorSense, MEGACOIN, and AIDE—that collectively advance the development of diagnostic, perceptually aligned, instruction-tuned, and self-improving VLMs.