PhD Thesis Proposal - Arka Sadhu
Fri, Dec 03, 2021
12:00 PM - 2:00 PM
Title: Grounding Language in Images and Videos
Thesis Committee members: Prof. Ram Nevatia, Prof. Xiang Ren, Prof. Yan Liu, Prof. Stefanos Nikolaidis, Prof. Toby Mintz.
Abstract: Language grounding in images and videos -- the task of associating linguistic symbols with perceptual experiences and actions -- is fundamental to developing multi-modal models that can understand and jointly reason over images, videos, and text.
It has garnered wide interest from multiple disciplines such as computer vision, natural language processing, and robotics. An essential element of work in this space is formulating tasks that each isolate a particular phenomenon inherent in image or video understanding, thereby encouraging the community to develop more robust models. In this thesis proposal, I will present four vision-language tasks developed during the course of my Ph.D.: grounding unseen words, spatio-temporal localization of entities in a video, video question-answering, and visual semantic role labeling in videos. For each of these tasks, I will further discuss the development of corresponding datasets, evaluation protocols, and model frameworks.
Zoom Link: https://usc.zoom.us/j/92383912262?pwd=N25ETlRMVFRiWTlKdGxtN09UVHhlQT09