Technical Talks

From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray
ABOUT THE TALK
- MLOps & Platforms
As machine learning workloads grow increasingly complex, distributed training across thousands of nodes presents significant challenges. This talk explores how the Ray library ecosystem tackles critical issues in multi-node ML training, focusing on development, orchestration, and comprehensive observability. Attendees will learn about innovative solutions for tracking system data, managing potential failure points, and implementing robust observability workflows that persist critical information.

Software Engineer
Nikita Vemuri
Anyscale
Nikita Vemuri is a software engineer at Anyscale, where she focuses on developing observability features across Ray and the Anyscale platform to help developers debug and monitor their large-scale AI workloads. She joined Anyscale as one of the early engineers and has contributed to multiple initiatives across the platform stack over the last 4 years. A UC Berkeley grad, she earned both her bachelor's and master's degrees in Electrical Engineering and Computer Science and conducted research at Berkeley’s RISE Lab.
Discover the data foundations powering today's AI breakthroughs. Join leading minds as we explore both cutting-edge AI and the infrastructure behind it. Reserve your spot before tickets sell out!