Technical Talks

Willem Pienaar
Co-Founder & CTO | Cleric

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

ABOUT THE TALK
  • Lightning Talks

Not all AI agent use cases are created equal. Code generation agents can be tested against clear benchmarks, but operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth, and traditional testing approaches break down completely.

This talk walks through our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail, from infrastructure mimicry to LLM-generated tests, and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.
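
To make the idea of deterministic, reproducible failure scenarios concrete, here is a minimal Python sketch. It is not Cleric's implementation, which the abstract does not describe. It shows one assumed reading of the approach: a seeded generator injects a fault into a toy service graph, so the root cause is known by construction and an agent's answer can be scored the same way on every run. The service names, the `Scenario` type, and the functions `generate_scenario`, `score`, and `baseline_agent` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical service graph: each service maps to the services it calls.
DEPENDENCIES = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}


@dataclass(frozen=True)
class Scenario:
    seed: int
    root_cause: str          # known only because the simulator injected it
    logs: Tuple[str, ...]    # the symptoms an agent is allowed to see


def affected_services(root: str) -> List[str]:
    """Return the root service plus every service that transitively calls it."""
    affected = {root}
    changed = True
    while changed:
        changed = False
        for service, deps in DEPENDENCIES.items():
            if service not in affected and any(d in affected for d in deps):
                affected.add(service)
                changed = True
    return sorted(affected)


def generate_scenario(seed: int) -> Scenario:
    """Build a failure scenario that is fully determined by the seed."""
    rng = random.Random(seed)  # local RNG, no dependence on global state
    root = rng.choice(sorted(DEPENDENCIES))
    logs = [f"{svc}: error rate elevated" for svc in affected_services(root)]
    rng.shuffle(logs)  # symptoms arrive in a seeded order, not causal order
    return Scenario(seed=seed, root_cause=root, logs=tuple(logs))


def score(agent: Callable[[Tuple[str, ...]], str], seeds: Iterable[int]) -> float:
    """Fraction of scenarios where the agent names the injected root cause."""
    scenarios = [generate_scenario(s) for s in seeds]
    correct = sum(agent(sc.logs) == sc.root_cause for sc in scenarios)
    return correct / len(scenarios)


def baseline_agent(logs: Tuple[str, ...]) -> str:
    """Toy stand-in for an agent: blame the failing service with no failing dependency."""
    failing = {line.split(":")[0] for line in logs}
    candidates = [
        s for s in sorted(failing)
        if not any(d in failing for d in DEPENDENCIES[s])
    ]
    return candidates[0]
```

Calling `score(baseline_agent, range(20))` replays the same twenty scenarios on every run, so a change in the score reflects a change in the agent rather than in the test environment.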

Willem Pienaar

Co-Founder & CTO

Cleric

Willem is the Co-Founder and CTO of Cleric, an AI Site Reliability Engineer that autonomously investigates and resolves production issues. He also created the Feast Feature Store, an open source project widely adopted for ML feature management. Prior to Cleric, Willem was a Principal Engineer at Tecton and led the ML Platform at Gojek.