Event Details

Grant Schoenebeck (University of Michigan) - Eliciting Informative Text Evaluations with Large Language Models

Thu, Oct 09, 2025
11:00 AM - 12:00 PM
Location: GCS 202 C
Speaker: Grant Schoenebeck, University of Michigan

Talk Title: Eliciting Informative Text Evaluations with Large Language Models

Abstract: In a wide variety of contexts, including peer grading, peer review, and crowd-sourcing (e.g., evaluating LLM outputs), we would like to design mechanisms that reward agents for producing high-quality responses that are accurate and strategically robust. Unfortunately, in many situations, computing rewards by comparing to a ground truth or gold standard is cumbersome, costly, or impossible. Other methods, such as “LLM-as-a-judge”, are typically manipulable.
Peer prediction mechanisms, which use a peer report as a reference, motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, such as multiple-choice answers or scalar values. We aim to broaden these techniques to the larger domain of text-based reports, drawing on recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms, as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media.
I will introduce mechanisms that utilize LLMs as predictors, mapping from one agent’s report to a prediction of her peer’s report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. Empirically, our mechanisms demonstrate competitive correlations with human scores compared to the state-of-the-art GPT-4o Examiner, and outperform all other baselines. Additionally, they are more robust against strategic manipulation. Finally, on an ICLR dataset, our mechanisms can differentiate three quality levels (human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews) in terms of expected scores.

Biography: Grant Schoenebeck is an associate professor at the University of Michigan in the School of Information. His work has recently focused on developing and analyzing systems for eliciting and aggregating information from a diverse group of agents with varying information, interests, and abilities by combining ideas from theoretical computer science, machine learning, and economics (e.g., game theory, mechanism design, and information design). More generally, his recent work has been about incentives and (machine) learning in a variety of contexts.

Host: Yan Liu