Eval Agents: How to Solve Error Cascades in Agents

Technical Talks

Agents and RAG chatbots are multi-turn AI systems, meaning they interact back and forth with humans. These systems face a fundamental challenge: errors compound and cascade with each interaction. In this talk, we'll walk through real-world examples of agents failing in spectacular ways when a single step goes wrong: overconfidence, manipulation, looping actions, and more. We'll then examine how agent builders use "eval agents" tuned on real-world interactions to evaluate agents, and even use them as verifiers to improve performance in production! By the end of the talk, you'll understand the new world of trajectory evaluation needed to evaluate agents accurately.

Dhruv Singh

Co-founder & CTO | HoneyHive AI

Dhruv Singh is the Co-Founder and CTO of HoneyHive AI. Previously, he was a Software Engineer at Microsoft, where he contributed to frameworks for LLM developers on Microsoft's OpenAI Innovation team and worked on projects in the Office of the CTO. During his time at Microsoft, he won the Codex Innovation Challenge organized by the Office of the CTO. Earlier, he interned at Otsuka Pharmaceutical Companies (U.S.) and Genomic Prediction in 2018, and served as a Software Engineering Intern at Microsoft in 2019. He holds a Bachelor of Science degree in Computer Science from Columbia University.
