Technical Talks

Willem Pienaar

Co-Founder & CTO | Cleric

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

ABOUT THE TALK

Lightning Talks

Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth, so traditional testing approaches fail. This talk walks through our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll show why existing evaluation methods, from infrastructure mimicry to LLM-generated tests, fall short, and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.

Willem is the Co-Founder and CTO of Cleric, an AI Site Reliability Engineer that autonomously investigates and resolves production issues. He also created the Feast Feature Store, an open source project widely adopted for ML feature management. Prior to Cleric, Willem was a Principal Engineer at Tecton and led the ML Platform at Gojek.
