Agentic AI Evaluation & Governance
Second Convening on Agentic AI in partnership with The Brookings Institution and University of California, Berkeley.
The Evaluation Problem for Agentic AI
Agentic AI systems are being deployed into production environments (managing software codebases, operating in security infrastructure, and acting autonomously on behalf of users) without agreed-upon methods for assessing whether they are ready, reliable, or safe enough for the contexts they are entering. The absence of robust evaluation methods and governance frameworks creates real exposure: for deployers who cannot anticipate failure modes, for policymakers whose accountability frameworks depend on verifiable claims, for the research community trying to advance the field of agentic AI, and for people who could be harmed by malfunctioning systems.
On May 4, 2026, Carnegie Mellon University, the University of California, Berkeley, and The Brookings Institution will convene researchers, practitioners, and policymakers for the second in a series of workshops on agentic AI evaluation and governance. The first convening, held in October 2025, focused on foundational questions: how should agentic AI be defined, what makes it uniquely difficult to evaluate relative to other AI systems, and what does the current evaluation ecosystem look like? This event aims to move the conversation from definitions to practice. What does evaluation actually look like when practitioners have to deploy these systems in production? What does the expanding role of agentic AI in security-sensitive and infrastructure-critical contexts mean for how we think about evaluation requirements? And what would it take to build the shared evaluation infrastructure (benchmarks, standards, third-party capacity) that both adoption and governance depend on?
The day will be anchored by live demonstrations of deployed agentic systems. Structured scenario exercises will ask participants to apply evaluation frameworks to realistic failure cases drawn from these systems. Hosted by AIMSEC, the convening builds on ongoing work at Carnegie Mellon University, NIST, and across partner networks to advance the national capacity for AI test and evaluation.
About the Series
This convening is the second in a series organized by The Brookings Institution, Carnegie Mellon University, and UC Berkeley. The series is oriented around three goals: building consensus on foundational questions in agentic AI measurement and evaluation; developing a concrete research roadmap for addressing those questions across the full agentic AI stack; and assembling a sustained, multidisciplinary network of researchers, evaluators, and policymakers capable of carrying that work forward. Findings from each convening contribute to a joint research agenda and a policy and governance framework grounded in measurement practice.
Questions? Email: aimsec@andrew.cmu.edu
Project Leads
CARNEGIE MELLON UNIVERSITY
W. W. Cooper and Ruth F. Cooper Professor of Management Science and Information Systems, Heinz College and Department of Engineering and Public Policy - Carnegie Mellon University
Distinguished Service Professor of Design and Innovation and Sr. Innovation Advisor, Heinz College - Carnegie Mellon University
Project Manager, AI Measurement Science and Engineering Research Center, Heinz College - Carnegie Mellon University
THE BROOKINGS INSTITUTION

Director – Brookings Artificial Intelligence and Emerging Technology Initiative, Senior Fellow – Global Economy and Development - Brookings Institution

Associate Director – Artificial Intelligence and Emerging Technology Initiative - The Brookings Institution

Program Assistant - Artificial Intelligence and Emerging Technology Initiative - The Brookings Institution
UNIVERSITY OF CALIFORNIA, BERKELEY

Professor in Computer Science - UC Berkeley, Co-Director - UC Berkeley Center for Responsible Decentralized Intelligence
