Technical Talks

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem
Video will be populated after the conference
ABOUT THE TALK
- Lightning Talks
Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth, and traditional testing approaches break down completely. This talk walks through our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll show why existing evaluation methods fail, from infrastructure mimicry to LLM-generated tests, and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.

Co-Founder & CTO
Willem Pienaar
Cleric
Willem is the Co-Founder and CTO of Cleric, an AI Site Reliability Engineer that autonomously investigates and resolves production issues. He also created the Feast Feature Store, an open source project widely adopted for ML feature management. Prior to Cleric, Willem was a Principal Engineer at Tecton and led the ML Platform at Gojek.