DON'T MISS THE BEST AI EVENT THIS YEAR

3 DAYS • APRIL 22-24

WHERE DATA MEETS INTELLIGENCE

Experience 3 DAYS of no bullsh*t technical talks & awesome networking with the brightest minds in data & AI in Oakland, CA. 

Speakers From

Replit, MotherDuck, Vectara, Snowflake, OpenAI, and others

Every AI breakthrough starts with data. We’re the premier technical event spotlighting cutting-edge AI and the data stack that powers it.

JOIN YOUR TRIBE

Our attendees are AI engineers, founders, CTOs, AI researchers, Heads of Data, and investors who are all building the future of data.

Days

+

Technical Attendees

+

Deep-Dive Talks

Featured Keynotes

Naveen Rao • VP of AI • Databricks
Denis Yarats • Co-Founder & CTO • Perplexity
Aaron Katz • Co-Founder & CEO • ClickHouse
Martin Casado • General Partner • a16z
Sharon Zhou • Founder & CEO • Lamini
Michele Catasta • President • Replit
Jake Brill • Head of Product - Integrity • OpenAI
Rachad Alao • Senior Engineering Director • Meta
Julien Le Dem • Principal Engineer • Datadog
Joseph Gonzalez • Professor • RunLLM & UC Berkeley
Krishnaram Kenthapadi • Chief Scientist, Clinical AI • Oracle Health
George Mathew • Managing Director • Insight Partners
Daniel Olmedilla • Distinguished Engineer, AI & Trust • LinkedIn
Sumti Jairath • Chief Architect • SambaNova Systems

Naveen Rao
VP of AI
Databricks
Keynotes
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
As organizations navigate the AI revolution, the traditional boundaries between data infrastructure and AI systems are blurring. This session explores the critical convergence point where data management meets machine intelligence, examining how this intersection is reshaping enterprise technology stacks. Our keynote panelists bring complementary perspectives from operating and investing in this rapidly evolving landscape. Drawing from experiences spanning neuroscience, hardware architecture, product development, and venture capital, they'll unpack the technical and strategic considerations for organizations building modern data + AI platforms. Join us for an insightful discussion on how this convergence enables scalable AI adoption, the architectural patterns emerging across industries, and what the future holds as data infrastructure and AI capabilities become increasingly interdependent in driving business transformation.
Denis Yarats
Co-Founder & CTO
Perplexity
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
This keynote panel features Denis Yarats and Joseph Gonzalez, two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through the LM-Sys and Gorilla projects, and after successfully launching Turi based on his doctoral work, he now brings his expertise in machine learning and robotics to RunLLM.com. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.
Aaron Katz
Co-Founder & CEO
ClickHouse
Keynotes
Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Join us for an authentic fireside chat that cuts through industry hype to explore how real-time data infrastructure is transforming analytics and AI. We'll examine the technical hurdles of processing data at scale in real time and the architectural decisions enabling performance breakthroughs. The discussion covers data processing evolution, performance bottlenecks in distributed systems, observability innovations, and approaches for maintaining consistency while increasing throughput. Gain insights into how real-time processing creates competitive advantages for business intelligence and AI applications, with honest assessments of implementation challenges and practical solutions.

Martin Casado
General Partner
a16z
Keynotes
Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Sharon Zhou
Founder & CEO
Lamini
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Michele Catasta
President
Replit
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Jake Brill
Head of Product - Integrity
OpenAI
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
Rachad Alao
Senior Engineering Director
Meta
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Julien Le Dem
Principal Engineer
Datadog
Keynotes
The Deconstructed Database and the Advent of the Open Data Lake
In 2018, Julien Le Dem described how the components of databases, distributed or not, were being commoditized as individual parts that anyone could recombine into use-case-specific engines. Given one's constraints, they could leverage those components to build a query engine that solves a specific problem much faster than building everything from the ground up. He called this idea "the Deconstructed Database" and spoke about it at a previous edition of Data Council. Fast forward to today, the big data ecosystem has matured and evolved from a melting pot of competing projects into a more composable ecosystem organized around a few open source standards. It's been incredible to see the vision he outlined in his talk crystallize with the adoption of key components like Parquet, Arrow, Iceberg, Calcite, Substrait and OpenLineage. These tools, and others like them, provide an interoperability layer that enables harnessing data for many purposes without creating silos.

In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and more importantly, the contracts that keep them decoupled and composable.
Joseph Gonzalez
Professor
RunLLM & UC Berkeley
GenAI Applications
AGI Is Already Here (But It's Not What You Think)
The Future of AGI: Building Compound AI Systems
Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Krishnaram Kenthapadi
Chief Scientist, Clinical AI
Oracle Health
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
George Mathew
Managing Director
Insight Partners
Keynotes
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
Daniel Olmedilla
Distinguished Engineer, AI & Trust
LinkedIn
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Sumti Jairath
Chief Architect
SambaNova Systems
Keynotes
Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.

This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.

CURATING TRACK SPEAKERS. STAY TUNED.

100+ Speakers

Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

Talk Schedule

Lloyd Tabb
Founder/Former CTO - Looker & Co-creator of Malloy
Meta
Analytics & BI
Building Blocks: Reusing Queries in Semantic Data Modeling
Data exploration is like a sophisticated Lego set, where strategic piece selection transforms understanding. This session delves into advanced semantic data modeling, revealing how reusing queries creates more powerful, intelligent building blocks that enhance comprehension for both humans and AI. Attendees will learn how to move beyond traditional tables and measures, revolutionizing their approach to data analysis and uncovering deeper insights through innovative modeling techniques.
Hannes Mühleisen
Co-Creator of DuckDB
DuckDB Labs
Data Eng & Infrastructure
Liberate Analytical Data Management with DuckDB
DuckDB Analytics Engine: High-Performance Data Processing Without Limits
Discover how DuckDB's revolutionary in-process analytical engine transforms data warehouse capabilities through a lightweight, versatile architecture. The engine features state-of-the-art vectorized query processing, morsel-driven parallelism, and advanced memory management that scales from embedded devices to powerful servers. This talk dives deep into DuckDB's innovative design principles, implementation strategies, and optimization techniques that enable previously impossible use cases on single nodes. Learn from real-world applications and performance benchmarks demonstrating DuckDB's impact on modern data analytics workflows.
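To make "morsel-driven parallelism" concrete, here is a minimal Python sketch of the general idea, not DuckDB's implementation or API: a column is split into fixed-size morsels, idle workers pick up whichever morsel is pending next, and partial aggregates are merged at the end. The morsel size, the function names, and the filtered-sum query are hypothetical choices for illustration. Because of CPython's global interpreter lock, the threads here demonstrate the scheduling pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence, Tuple

MORSEL_SIZE = 10_000  # hypothetical morsel size, chosen only for illustration


def morsels(n_rows: int, size: int = MORSEL_SIZE) -> Iterator[Tuple[int, int]]:
    """Split the row range [0, n_rows) into fixed-size (start, end) morsels."""
    for start in range(0, n_rows, size):
        yield start, min(start + size, n_rows)


def filtered_sum_morsel(values: Sequence[int], threshold: int, bounds: Tuple[int, int]) -> int:
    """Evaluate SUM(v) WHERE v > threshold over one morsel (a contiguous slice of rows)."""
    start, end = bounds
    chunk = values[start:end]
    return sum(v for v in chunk if v > threshold)


def parallel_filtered_sum(values: Sequence[int], threshold: int, workers: int = 4) -> int:
    """Workers pull pending morsels from the pool's queue; partial sums are merged at the end."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda bounds: filtered_sum_morsel(values, threshold, bounds),
            morsels(len(values)),
        )
        return sum(partials)
```

For example, `parallel_filtered_sum(list(range(100_000)), 50_000)` returns the sum of 50,001 through 99,999. Small, uniform morsels are what let a fast worker simply take more of them when another worker falls behind.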
Nikunj Handa
Product Lead
OpenAI
Foundation Models
OpenAI’s Responses API: A New Foundation for Building with Models & Tools
Last month, OpenAI introduced the Responses API: a programmatic agent API businesses can use to perform a wide variety of tasks. With this new primitive, we radically simplified integration, transforming what previously took hundreds of lines of code into just a few. Built from the ground up based on insights from thousands of developers who have used the Chat Completions and Assistants APIs, Responses reimagines simplicity, performance, and flexibility, enabling seamless integration of advanced reasoning, multimedia inputs, and multi-step workflows. In this talk, I'll walk through the design decisions and engineering challenges behind Responses. You'll learn how we anticipated developer needs to create an API uniquely engineered for agent-like use cases, capable of handling simultaneous tool calls and seamless multi-turn conversations. We'll explore key features like built-in state management, semantic streaming, intelligent token truncation, and support for hosted tools (e.g., web search, file search, and computer operations) that significantly reduce complexity and enhance real-time interactions. We'll also talk about how Responses empowers developers to build faster, smarter, and more responsive AI applications than ever before, driving the next wave of intelligent, agentic experiences.
Ravin Kumar
Senior Researcher
Google DeepMind
Foundation Models
Models as Tools: My Perspective On the Matter
You can look at GenAI from many perspectives. For me, the perspective shifts depending on whether I'm building products, training foundation models, or using GenAI day to day. However, most people aren't doing all these things. For the audience here, I suggest focusing on one practical angle: LLMs as tools. In this talk, I'll share how, from this perspective, LLMs are just like any other tool. Starting from this view grounds you in a realistic perspective before moving into the more exciting, hype-laden aspects of this new technology.
Hadley Wickham
Chief Scientist
Posit
Data Sci & Algos
LLMs for Data Science
Obviously, everyone is super excited about LLMs right now, and while there's a large element of hype in their popularity, they are also genuinely useful. In this talk, I'll give a roundup of data science things that I've found LLMs particularly useful for, broken up into three broad categories: writing code, writing prose, and rectangling fundamentally non-rectangular data (e.g. text, images, videos, audio).
Paige Bailey
AI Developer Experience Engineer
Google
Workshops
Introduction to Google DeepMind's Models: Gemini 2.0, Imagen 3, and Veo
This intensive workshop is designed for developers eager to explore the cutting-edge capabilities of Google's latest AI tools. Participants will gain hands-on experience working with the Gemini APIs, Google AI Studio, Veo 2, and Imagen 3, enabling them to build intelligent applications and generate stunning creative content. We'll also cover how to use Gemini 2.0 in developer tools like Cursor, Sourcegraph Cody, and more.
Ryan Blue
Creator of Apache Iceberg, Member of Technical Staff
Databricks
Data Eng & Infrastructure
Why is Everyone Talking about Apache Iceberg™? (From the Original Creator of Apache Iceberg)
This talk is a primer for Apache Iceberg™ from one of its original creators. In this talk Ryan Blue, CEO of Tabular (now part of Databricks) and the original creator of Apache Iceberg, discusses its origin and why it's even more relevant today. Ryan will discuss the early days of Apache Iceberg at Netflix, how the project evolved at Tabular, and how Tabular (now part of Databricks) will continue its mission of creating a universal format. Attendees will gain an understanding of Apache Iceberg and how open table formats like it are changing the analytic database industry.
Ganesh Ramanarayanan
VP Engineering
Hex
Analytics & BI
Multi-Modal Compute for Data Analytics
Following their groundbreaking Data Council 2022 presentation, Hex continues to push notebook technology boundaries with an innovative approach to data analytics. This session delves into their unique, fully parallelized, multi-modal backend, revealing how sophisticated computational techniques are transforming data processing. Attendees will explore cutting-edge methods that redefine performance, flexibility, and computational efficiency in modern data workflows, gaining insights into the next generation of analytical computing.
Raghotham Murthy
Software Engineer, Llama
Meta
Foundation Models
Building LLM Applications with Llama Stack
In this talk, Raghotham describes what it takes to build production-grade LLM applications. Unlike regular applications, LLM applications are non-deterministic and require a unique set of building blocks to support the full software development lifecycle, from building to testing to deploying to monitoring to then improving the application. We will show how Llama Stack can be used to build and improve LLM applications in different environments: local development, cloud, on-prem, and mobile.
Tanya Bragin
VP Product
ClickHouse
Databases
Unbundling of the Cloud Data Warehouse
The era of proprietary cloud data warehouses in the last decade has revealed critical challenges: performance bottlenecks, escalating costs, and vendor lock-in. This session examines how open-source technologies and data lake standards are transforming the modern data stack. Explore how platforms like ClickHouse, Iceberg, and other open technologies are providing organizations with flexible, cost-effective alternatives to monolithic cloud data warehouses, enabling more diverse and efficient data workflows.
Ethan Rosenthal
Member of Technical Staff
Runway
Foundation Models
Building a Data Foundation for Multimodal Foundation Models
While it is often easier to simply throw more data at a problem, scale is not all you need when building multimodal foundation models. Data quality continues to be just as important as data quantity, and supporting “data-centric AI” requires lowering the barrier to data curation as much as possible. However, multimodal data curation presents unique requirements compared to conventional machine learning or business intelligence data management systems. The data is heterogeneous, ranging from scalars to embedding arrays to entire compressed videos. While the dataset sizes in terms of number of rows are not quite Big Data™, the number of bytes is massive with high columnar variance. Given the storage size, it’s infeasible to construct and copy new training datasets for each model training job; training jobs must query the core datasets without copying them. Finally, large scale distributed training jobs require fast random access which bumps up against limitations of typical solutions like partitioned parquet files. In this talk, I will discuss how we built a petabyte-scale, multimodal feature lakehouse. This lakehouse supports analytical querying as well as serving features for large scale distributed training jobs, such as those that were used for training Runway’s recent foundation models like Gen-3 Alpha.
Tengyu Ma
Co-Founder & CEO
Voyage AI
AI Engineering
RAG In 2025: State Of The Art And The Road Forward
Enterprise RAG Systems: Building Robust LLM Knowledge Integration | Master advanced techniques in Retrieval-Augmented Generation (RAG) for enterprise-scale language models. Learn strategies to overcome common RAG pipeline challenges including brittle parsers, suboptimal chunking, and manual query tuning. Deep dive into cutting-edge embedding models and reranking systems that enable automated, scalable knowledge retrieval. Discover practical approaches to building production-ready RAG systems that deliver consistent, high-quality results while minimizing maintenance overhead and manual optimization.
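The two-stage pattern the abstract mentions, embedding retrieval followed by reranking, can be sketched with toy data. The vectors below are made up, and `overlap_score` is a hypothetical placeholder for a real reranking model. The cheap cosine-similarity stage narrows the corpus to the top-k candidates, and the more expensive scorer reorders only those candidates.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, k):
    """First stage: cheap vector similarity over the whole corpus."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return ranked[:k]


def rerank(query_text, candidates, score_fn):
    """Second stage: a costlier scorer applied only to the top-k candidates."""
    return sorted(candidates, key=lambda doc: score_fn(query_text, doc["text"]), reverse=True)


def overlap_score(query, text):
    # Hypothetical stand-in for a reranker model: count shared lowercase terms.
    return len(set(query.lower().split()) & set(text.lower().split()))


corpus = [
    {"text": "Iceberg table format spec", "vec": [0.9, 0.1, 0.0]},
    {"text": "reranking models improve RAG precision", "vec": [0.7, 0.6, 0.1]},
    {"text": "GPU memory hierarchy", "vec": [0.0, 0.2, 0.9]},
]
top = retrieve([1.0, 0.4, 0.0], corpus, k=2)
final = rerank("how do reranking models help RAG", top, overlap_score)
print([doc["text"] for doc in final])
# ['reranking models improve RAG precision', 'Iceberg table format spec']
```

Here the reranker flips the order produced by vector similarity alone, which is the kind of correction a second stage is meant to provide.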
Charles Frye
Developer Advocate
Modal Labs
AI Engineering
What Every Data Scientist Needs To Know About GPUs
GPU Optimization for Data Scientists: Essential Knowledge from Silicon to PyTorch | Comprehensive guide to GPU architecture and optimization for modern machine learning workloads. Learn critical GPU concepts from hardware fundamentals to high-level frameworks, with focus on performance tuning for neural networks. Master practical techniques for optimizing system latency and throughput in popular ML frameworks including PyTorch, vLLM, and RAPIDS. Essential knowledge for data scientists and ML engineers working with GPU-accelerated workloads.
Shreya Rajpal
Co-Founder & CEO
Guardrails
AI Engineering
The Future Of Guardrails
AI Safety and Guardrails: Enterprise Framework for Reliable Generative AI | Explore next-generation approaches to implementing guardrails and safety measures in production AI systems, including RAG-enhanced chatbots and autonomous agents. Learn systematic methodologies for risk assessment, reliability monitoring, and failure prevention in enterprise AI deployments. Discover practical frameworks for implementing robust safety controls and guardrails across different AI architectures. Features case studies demonstrating improved system reliability and reduced risks through structured safety protocols and monitoring systems.
Eno Reyes
CTO
Factory
AI Engineering
Building Reliable Agentic AI Systems
Building Reliable Agentic AI Systems: Design Principles for Complex Autonomous Software | Explore cutting-edge approaches to designing reliable AI systems that operate autonomously in unpredictable environments. Learn architectural patterns from robotics, cybernetics, and biological systems for building predictable outcomes from non-deterministic components. Deep dive into practical strategies for implementing reliable agentic systems, with focus on stability, error handling, and performance monitoring. Discover emerging patterns for creating AI systems that achieve reliable results despite underlying stochastic processes.
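The abstract does not name specific patterns, so the following is an assumption: one classic reliability-engineering technique for getting predictable outcomes from non-deterministic components is redundancy with majority voting. This worked example assumes each run is independently correct with probability `p`. Real LLM samples often fail in correlated ways, so treat the numbers as an upper bound on what voting can buy.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent runs is correct,
    assuming each run is correct with probability p (use odd n)."""
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
# 1 0.7
# 3 0.784
# 5 0.8369
```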
Sharon Zhou
Founder & CEO
Lamini
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
This keynote panel features Denis Yarats and Joseph Gonzalez -- two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through LM-Sys and Gorilla projects, now bringing his expertise in machine learning and robotics to RunLLM.com after successfully launching Turi based on his doctoral work. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.
Mike Driscoll
Co-Founder & CTO
Rill Data
Analytics & BI
A SQL-Based Metrics Layer for DuckDB and Clickhouse
The ability to aggregate raw data into summarized metrics and slice them across dimensions is at the core of analytics teams' work. This session reveals how Rill has developed a metrics layer that declares metrics entirely with SQL expressions, overcoming traditional limitations of metrics management. By leveraging DuckDB and Clickhouse, attendees will discover how to generate multi-dimensional OLAP cubes, implement real-time data access with sub-second performance, and create uniform dashboards through a BI-as-code philosophy. Learn how to define, manage, and secure metrics using an innovative SQL-based approach that transforms raw data into powerful, actionable insights.
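The core idea of declaring metrics as plain SQL expressions and then slicing them by dimensions can be sketched in a few lines. This is not Rill's definition syntax. The `METRICS` dictionary and `compile_metrics_query` function are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB or ClickHouse so the example runs without extra dependencies.

```python
import sqlite3

# Hypothetical metric definitions: each metric is just a SQL aggregate expression.
METRICS = {
    "revenue": "SUM(amount)",
    "orders": "COUNT(*)",
    "avg_order_value": "SUM(amount) / COUNT(*)",
}


def compile_metrics_query(table, metrics, dimensions):
    """Build a GROUP BY query that slices the chosen metrics by the chosen dimensions.
    Names are interpolated directly, so only use trusted, pre-declared definitions."""
    dims = ", ".join(dimensions)
    selected = ", ".join(f"{METRICS[m]} AS {m}" for m in metrics)
    return f"SELECT {dims}, {selected} FROM {table} GROUP BY {dims} ORDER BY {dims}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "a", 10.0), ("east", "b", 30.0), ("west", "a", 5.0)],
)
sql = compile_metrics_query("sales", ["revenue", "orders", "avg_order_value"], ["region"])
print(conn.execute(sql).fetchall())
# [('east', 40.0, 2, 20.0), ('west', 5.0, 1, 5.0)]
```

Because each metric is defined once, every dashboard that slices `revenue` by `region` or `product` reuses the same expression, which is the consistency argument behind a metrics layer.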
Bryan Bischof
Head of AI
Theory Ventures
Data Sci & Algos
Failure Is A Funnel
LLM Quality Engineering: From Slop to Production | Learn systematic approaches to evaluating and improving LLM performance, with focus on transforming experimental models into production-ready systems. Master practical frameworks for quality assessment, iterative improvement, and building robust deployment pipelines. Features proven strategies for identifying failure patterns and establishing reliable production environments.
Nuno Campos
Founding Engineer
LangChain
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Michele Catasta
President
Replit
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
This keynote panel features Denis Yarats and Joseph Gonzalez -- two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through LM-Sys and Gorilla projects, now bringing his expertise in machine learning and robotics to RunLLM.com after successfully launching Turi based on his doctoral work. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.
Jake Brill
Head of Product - Integrity
OpenAI
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
Rachad Alao
Senior Engineering Director
Meta
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
Julien Le Dem
Principal Engineer
Datadog
Keynotes
The Deconstructed Database and the Advent of the Open Data Lake
In 2018, Julien Le Dem described how the components of databases, distributed or not, were being commoditized as individual parts that anyone could recombine into use-case-specific engines. Given one's constraints, they could leverage those components to build a query engine that solves a specific problem much faster than building everything from the ground up. He called this idea "the Deconstructed Database" and spoke about it at a previous edition of Data Council. Fast forward to today, the big data ecosystem has matured and evolved from a melting pot of competing projects into a more composable ecosystem organized around a few open source standards. It's been incredible to see the vision he outlined in his talk crystallize with the adoption of key components like Parquet, Arrow, Iceberg, Calcite, Substrait and OpenLineage. These tools, and others like them, provide an interoperability layer that enables harnessing data for many purposes without creating silos.

In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and more importantly, the contracts that keep them decoupled and composable.
Joseph Gonzalez
Professor
RunLLM & UC Berkeley
GenAI Applications
AGI Is Already Here (But It's Not What You Think)
The Future of AGI: Building Compound AI Systems | Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
This keynote panel features Denis Yarats and Joseph Gonzalez -- two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through LM-Sys and Gorilla projects, now bringing his expertise in machine learning and robotics to RunLLM.com after successfully launching Turi based on his doctoral work. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.
Krishnaram Kenthapadi
Chief Scientist, Clinical AI
Oracle Health
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
George Mathew
Managing Director
Insight Partners
Keynotes
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
As organizations navigate the AI revolution, the traditional boundaries between data infrastructure and AI systems are blurring. This session explores the critical convergence point where data management meets machine intelligence, examining how this intersection is reshaping enterprise technology stacks. Our keynote panelists bring complementary perspectives from operating and investing in this rapidly evolving landscape. Drawing from experiences spanning neuroscience, hardware architecture, product development, and venture capital, they'll unpack the technical and strategic considerations for organizations building modern data + AI platforms. Join us for an insightful discussion on how this convergence enables scalable AI adoption, the architectural patterns emerging across industries, and what the future holds as data infrastructure and AI capabilities become increasingly interdependent in driving business transformation.
Daniel Olmedilla
Distinguished Engineer, AI & Trust
LinkedIn
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
Sumti Jairath
Chief Architect
SambaNova Systems
Keynotes
Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.

This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.

Han-chung Lee
Machine Learning Director
Moody's Analytics
Foundation Models
The Model is the Product
In the realm of machine learning, AI, and deep learning, the intelligence embedded within a system—the model—stands as the primary product and key differentiator. This talk explores how the intelligence component has evolved to become the central selling point across technological eras. We will examine the historical progression of how intelligence capabilities have increasingly defined product value, transforming from hardware differentiators like "Intel Inside" during the PC era, to software advantages, and now to model-centric offerings in today's AI landscape. The intelligence layer has become not just a feature but the core product itself. Additionally, we'll analyze how the definition of "model" itself has evolved alongside technological advancement, reshaping what constitutes a system's core value. Companies now face a strategic bifurcation: pursue a model-centric approach or focus on distribution-centered strategies. Each path carries distinct trade-offs, risks, and opportunities in today's competitive AI marketplace. Through case studies of industry leaders and emerging players, we'll demonstrate how the fundamental principle—"the model is the product, the distribution is the moat"—is reshaping competitive dynamics and business strategies across sectors.
Julian Hyde
Senior Staff Engineer
Google
Analytics & BI
More Than Query: Future Directions of Query Languages, from SQL to Morel
“Never bet against SQL,” the saying goes. But what exactly do we want from a query language, and will SQL always be the right tool for the job? What separates a query language from a regular programming language like Python or a framework like Apache Spark? This talk looks at recent efforts to extend SQL with measures and pipe syntax, and then gives an introduction to Morel. Morel is an exciting language that combines the strong type system and expressive power of a functional programming language with the efficiency of a declarative query language. Morel can express not just queries but also data-intensive programming, logic programming and mathematical optimization, and has the potential to replace today’s data frameworks. This talk explores the many ways that we use query languages today – from simple lookup queries and transactions to data engineering, data science and analytics – and related areas such as data-intensive programming, mathematical optimization and logic programming.
Pedram Navid
Head of Data Engineering & DevRel
Dagster Labs
Lightning Talks
Write Less More: How Dagster Rebuilt Our Docs from the Ground Up
Documentation can become a critical pain point for technical teams, transforming from a helpful resource into a maintenance nightmare. In this candid session, Dagster reveals their radical approach to documentation reconstruction, demonstrating how a complete ground-up rebuild can revolutionize user experience. Attendees will dive deep into the strategic decision to completely overhaul their documentation, exploring the challenges of incremental improvements and the transformative power of a fresh perspective. Learn how radical rethinking can turn documentation from a source of user frustration into a powerful communication tool that truly serves the community.
Yusuf Ozuysal
Director of Engineering, AI
Snowflake
Workshops
AI Your Way with All-In-One Access
Break bread with us while exploring the latest in LLM inference! Whether you’re a startup or seasoned developer, building with AI requires quick, easy access to top-tier models—without juggling multiple subscriptions. Snowflake is (now) the only platform where you can access Claude 3.6/3.7 Sonnet, GPT-4, o3-mini, and OpenAI embeddings through a single, Cloud Service Provider-agnostic API. We'll explore how a unified gateway for all your essential models can streamline AI pipelines at scale. Plus, our research team will showcase cutting-edge innovations in OSS model inference, pushing the boundaries of throughput and latency at the Pareto frontier. Join us for a unique Lunch & Learn where you'll experience the latest AI innovations firsthand and provide feedback that shapes our product roadmap.
Paul Dix
Founder & CTO
InfluxData
Databases
Building InfluxDB 3 Core: A Real-Time Columnar DB and Data Processor on Object Storage
InfluxDB 3 Core reimagines time series databases with a ground-up Rust rewrite using Apache Arrow, DataFusion, and Parquet. This session explores an innovative diskless architecture that leverages object storage for persistence, featuring a sophisticated caching system enabling real-time data ingestion and querying. Attendees will discover how an embedded Python VM transforms the database into a comprehensive data collector, monitoring agent, and data transformation platform.
Hamel Husain
Machine Learning Consultant
Parlance Labs
Foundation Models
The Model is Not the Product
This Data Council 2025 talk is in development. Check back soon! 
Chenggang Wu
Co-Founder & CTO
RunLLM
GenAI Applications
AGI Is Already Here (But It's Not What You Think)
The Future of AGI: Building Compound AI Systems | Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.
Alexa Garrison
VP Data & Business Operations
Splice
AI & Data Culture
Building High-Impact Data Teams in an AI-Driven World
This Data Council 2025 talk explores how organizations can build strong data teams and empower them to drive impactful decision-making, regardless of size or resources. More details to be announced... 
Etienne Dilocker
CTO
Weaviate
Databases
The Agentic Database: A New Way to Interact with Your Data
For decades, database interactions have been constrained by traditional Create, Read, Update, and Delete (CRUD) operations, but the emergence of AI agents is poised to revolutionize this paradigm. This session explores a transformative approach to database interaction, where databases become collaborative partners capable of understanding complex, natural language commands. Attendees will discover how future databases might interpret sophisticated requests like "Translate all documents to Spanish and summarize them" or "Extract the 2024 Sales numbers and map out their correlation with events and feature releases." By moving beyond vector search and similarity matching, this talk reveals a groundbreaking vision of databases as intelligent, context-aware systems that can comprehend, process, and execute nuanced human instructions.
Samuel Colvin
Founder
Pydantic
Workshops
Pydantic: An Opinionated Blueprint for the Future of GenAI Applications
AI application development doesn't require reinventing software engineering. This transformative talk presents a practical blueprint for building maintainable AI systems using existing tools like Pydantic as the foundation. Learn how to implement critical components: strict data validation at API levels, self-correction mechanisms for enhanced accuracy, automated schema generation for LLM tool calls, continuous evaluation frameworks, and comprehensive observability solutions. Through concrete examples and code snippets, discover how familiar tools can create robust AI applications without unnecessary complexity. Perfect for developers looking to integrate AI functionality into larger software systems efficiently.
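The validation-plus-self-correction loop described above can be sketched without any AI framework. In Pydantic itself the schema would be a `BaseModel` subclass and a failure would raise `ValidationError`. To keep this sketch dependency-free, it hand-rolls the check with the standard `json` module instead. `REQUIRED_FIELDS`, `validate`, `call_with_self_correction`, and `fake_model` are all hypothetical names. The key idea is that the validation error message is fed back to the model on the next attempt.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"city": str, "temperature_c": float}


def validate(raw):
    """Parse and type-check model output; return (value, error_message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            return None, f"missing field {field!r}"
        if not isinstance(data[field], expected):
            return None, f"field {field!r} must be {expected.__name__}"
    return data, None


def call_with_self_correction(generate, prompt, max_attempts=3):
    """Retry generation, passing the previous validation error back each time."""
    feedback = None
    for _ in range(max_attempts):
        value, error = validate(generate(prompt, feedback))
        if error is None:
            return value
        feedback = error
    raise ValueError(f"no valid output after {max_attempts} attempts: {feedback}")


def fake_model(prompt, feedback):
    # Hypothetical model stub: wrong type first, corrected once it sees feedback.
    if feedback is None:
        return '{"city": "Oakland", "temperature_c": "18"}'
    return '{"city": "Oakland", "temperature_c": 18.0}'


print(call_with_self_correction(fake_model, "Weather in Oakland as JSON"))
# {'city': 'Oakland', 'temperature_c': 18.0}
```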
Andy Pavlo
Assistant Professor of Databaseology
Carnegie Mellon University
Databases
What Goes Around Comes Around... and Around...
Doesn't it feel like there is always a new crop of database management systems pushing the idea that the relational model is outdated and SQL is dying? Vector database proponents have recently taken up this mantle, fueled by AI/ML technologies. Before that, NoSQL users claimed RM/SQL was insufficient for "webscale" applications. And in the 1990s, object-oriented database vendors wanted developers to switch to their systems. Database history doesn't repeat, but it rhymes. In this talk, Professor Andy Pavlo presents the 60-year history of data modeling research and demonstrates why RM/SQL is the preferred default choice for database applications of any size. All efforts to completely replace the data model or query language have failed. Instead, SQL absorbed the best ideas from these alternative approaches and remains relevant for modern applications.
Dhruv Singh
Co-founder & CTO
HoneyHive AI
AI Engineering
Eval Agents: How to Solve Error Cascades in Agents
Agents or RAG chatbots are multi-turn AI systems. Multi-turn means interacting back-and-forth with humans. These systems face a fundamental challenge: errors compound and cascade with each interaction. In this talk, we'll go through real-world examples of agents failing in spectacular ways when one step goes wrong - overconfidence, manipulation, looping actions, and more. After doing so, we'll examine how agent builders use "eval agents" tuned on real-world interactions to evaluate agents and even use them as verifiers to improve performance in production! By the end of the talk, you'll have learned about the new world of trajectory evaluation needed to evaluate agents accurately.
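A quick worked example shows why errors compound across turns. It assumes, as a simplification, that every step succeeds independently with the same probability. The 0.95 figure is illustrative, not from the talk. Even a step that works 95% of the time leaves a 20-step trajectory failing more often than not, which is why evaluating whole trajectories rather than single responses matters.

```python
def trajectory_success(step_success, turns):
    """Chance that every step of a multi-turn trajectory succeeds, assuming
    independent steps with the same per-step success probability."""
    return step_success**turns


for turns in (1, 5, 10, 20):
    print(turns, round(trajectory_success(0.95, turns), 3))
# 1 0.95
# 5 0.774
# 10 0.599
# 20 0.358
```

In practice, failures are rarely independent: one bad step often makes later steps more likely to fail. So this calculation understates the cascading effect rather than overstating it.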
George Fraser
Co-Founder & CEO
Fivetran
Workshops
Look Ma, No Data Warehouse!

Modern data lakes promise affordability and scalability, but using them can be a headache. Cloud data warehouses make querying easy, but they come with a hefty price tag and extra complexity. What if you could get the same ease of use without the cost and lock-in?

In this session, we’ll show you how to leverage open-source software to build a fully functional, queryable analytics powerhouse using DuckDB, Fivetran, and Polaris Catalog. We’ll walk through how to:

1. Load data that is automatically converted to Iceberg open table format

2. Run SQL queries using DuckDB’s new Iceberg extension

3. Run transformations directly on data stored in your data lake with a new dbt adapter

4. Get started easily with a practical, hands-on demo

No vendor lock-in, no unnecessary complexity—just an open-source-powered approach to enabling advanced analytics and AI. If your data warehouse is holding you back or eating away at your budget, this session is for you!

Simon Eskildsen
Co-Founder
Turbopuffer
Data Eng & Infrastructure
Billion-Scale Vector Search on Object Storage
Vector Search at Scale: How Notion Built Billion-Vector Search Infrastructure | Explore the architecture behind Notion's enterprise-scale vector search system, powering one of the largest semantic search implementations in production. Learn advanced techniques in embedding pipeline design, distributed vector processing, and optimal storage strategies using Spark and Turbopuffer. This technical deep-dive covers LSM indexing, RAG (Retrieval Augmented Generation) implementation, and practical approaches to query optimization. Discover battle-tested strategies for building and scaling production-ready vector search systems capable of handling billions of vectors with high performance and reliability.
Vishnu Vasanth
Co-Founder & CEO
e6data
Workshops
Everything Everywhere All at Once: Object Store Native
Discover how e6data’s lakehouse compute engine runs complex and high-concurrency SQL analytics and AI workloads 10x faster than all leading engines at 1/3rd the cost—all with zero data movement. Learn how e6data’s atomically scalable lakehouse architecture helps achieve sub-second latencies even under heavy concurrency. This technical deep-dive covers e6data’s atomically scalable K8s native architecture, disaggregated compute design, and open table format integration showing the future of SQL analytics and AI workloads through real-world performance benchmarks and production case studies. Learn how an object-store-native approach unlocks “everything, everywhere, all at once” in modern data ecosystems.
Niko Grupen
Head of Applied Research
Harvey
GenAI Applications
Legal Agency: Building Domain-Specific Agents for Enterprise
Building agents for real-world knowledge work requires a delicate balance of AI and Human-Computer Interaction (HCI): one has to understand frontier model capabilities, translate them into a framework for agent behavior (with the right primitives, guardrails, etc.), and then place them in an intuitive product surface that is interactive and transparent. The complexity of attaining this balance is magnified in vertical problem spaces that require significant domain expertise, such as law. This talk will share insights and best practices from building at the bleeding edge of the application layer. We'll explore how to leverage domain expertise to map model problems to legal problems (and, importantly, how to evaluate them), how to create a framework for vertical agents that mirrors human processes, and why, despite LLMs being the star of the show, traditional engineering and machine learning practices are essential for maximizing quality and reliability in production environments.
Dillon Morrison
Director of Product Management
Sigma Computing
Workshops
Text-to-SQL Is Not the Answer: How to Effectively Use AI for Analytics
Think BI is dead? Will natural language replace the dashboard? Sigma's Wednesday morning workshop breaks down why generative AI is a powerful supplement to, not a replacement for, BI practices, and examines how to effectively embed AI into your analytics workflows.
Natacha Crooks
Assistant Professor
UC Berkeley
Databases
From Concurrency Control to Concurrent Scheduling
This Data Council 2025 talk is in development. Please check back soon for updates!
Rachel Lee Nabors
Former React Core
Meta
AI Engineering
AI Cram Session
Machine Learning Fundamentals: From RAG to Deep Learning for Beginners. This session offers a comprehensive introduction to essential machine learning concepts, including RAG (Retrieval-Augmented Generation), neural networks, and foundational math principles. Learn complex ML concepts through engaging visual explanations and intuitive metaphors from an experienced technical educator. The session is aimed at developers, analysts, and technical professionals looking to understand modern AI terminology and architecture, and it features practical examples and visual guides from the creator of React's educational platform, making advanced concepts accessible to technical audiences.
Chenyu Qiu
Senior Applied Scientist
Uber
Data Sci & Algos
Scalable Continuous Monitoring for Large-scale A/B Experimentation
This talk reveals how Uber's A/B testing framework and continuous experiment monitoring have revolutionized experimental analytics at scale. We'll demonstrate our solution to the "peeking problem" that plagues traditional experiment monitoring approaches and showcase our automated platform, which processes thousands of monitoring analyses daily using regression-adjusted estimators with anytime-valid inference. This advanced statistical methodology eliminates 95% of noise without sacrificing true signals, enabling early experiment detection and performance insights. Learn how our Spark-powered computational framework efficiently batches experiments and metrics for scalable processing. We'll share real-world case studies showing how this system has transformed data-driven decision making at Uber, minimizing undetected regressions and accelerating product innovation across our global platform.
Ori Soen
CEO
Montara
Workshops
Analytics and the Dark Side of the Analytics Development Lifecycle
In this session, we examine how the Analytics Development Lifecycle (ADLC) introduced essential structure to data workflows but unintentionally created organizational bottlenecks by limiting data warehouse access to engineers. Our speaker shares how innovative data teams are successfully enabling analysts, product managers, and data scientists to migrate their work to data warehouse tables while maintaining strong data governance and quality standards. Discover practical DataOps strategies that balance democratized data access with the structured quality assurance processes that modern enterprises require for effective data management.
Franck Pachot
Developer Advocate
MongoDB
Databases
The Modern Database Debate: PostgreSQL and MongoDB
Which database should you choose? This question has evolved from theoretical debates to practical decisions based on facts. Technology has advanced significantly: SQL databases now support JSON, while NoSQL databases have integrated ACID properties. PostgreSQL and MongoDB represent the most common choices today, and both are widely adopted as standard APIs for managed database services. We will explore the differences between these approaches, comparing interactive SQL transactions with document-based design, examining the performance implications of their internal storage, and considering how team expertise influences the choice. Our goal is to clarify how to use each option effectively to meet modern applications' agility, scalability, and performance requirements, helping you select the database your team can use most efficiently.
Parham Parvizi
Founding Data Architect
Prospective
Workshops
A Local-First Approach to Extremely Fast Streaming Visualization

Modern data workloads demand fast, interactive, and scalable visualization—without the cost and complexity of server-side rendering. The local-first approach leverages modern browser capabilities, WebAssembly, and in-browser computation to achieve high-performance analytics while reducing cloud costs.

In this workshop, we’ll explore:

1. Why Local-First? The benefits of running everything client-side for cost-efficient, scalable visualization across thousands of users.
2. WebAssembly (WASM) for Data Apps: How Perspective harnesses WASM to power ultra-fast, browser-native analytics and even replace traditional Docker-based containers for data workloads.
3. Perspective + DuckDB: A full in-browser analytics stack that enables high-speed querying and visualization without a backend.
4. Streaming Data with InfluxDB: How to visualize high-frequency, real-time IoT and log data with sub-second latency.
5. Databricks + Perspective: Enhancing large-scale analytics with interactive dashboards inside Jupyter notebooks.

Through live coding and guided exercises, attendees will build their own browser-native analytics dashboards, connect to real-time data streams, and learn Perspective’s API in Python, Node.js, and Rust.

Difficulty level: Intermediate. Some experience with Python, JavaScript, and data analytics will be helpful, but beginners can follow along with guided exercises.

To participate, bring a laptop with: 
Git
VS Code
Docker
Python (3.8+)
Node.js (16+)

Wenjing Zheng
Data Science Manager
Roblox
Data Sci & Algos
Causal Inference Methods for Bridging Experiments and Strategic Impact
While experimentation gives us clean effect measures, connecting those results to real-world business decisions is messy. In this talk, I’ll walk through two case studies at Roblox that highlight this challenge and explore some causal inference methods that help bridge the gap.

The first focuses on attributing observed year-over-year business growth to product launches. The strategic need here is twofold: to understand how much of our growth is driven by the innovations we shipped, and to reconcile different measurements of business performance (experiment results and long-term growth trends) into a coherent narrative. The core challenge is isolating product impact from organic growth (the growth that would have occurred in the absence of these launches) in the topline metrics we observe.

The second case study addresses how to generalize A/B test results to a broader population without requiring an explicit evaluation of covariate shift between the experiment and target populations, making the approach scalable across experiments and surfaces. This framing is essential for fair comparisons across product areas that vary in reach and in how amenable they are to metric movement, enabling more effective prioritization across teams.

Together, these cases reflect a broader goal: building a common measurement language that connects local experimental results to global business impact, so organizations can make more strategic, data-informed decisions.
Doron Porat
Co-Founder & CEO
Lakeway
AI & Data Culture
AI Is Going to Break Your Data Platform: Are You Ready?
AI isn't just another workload; it's an unpredictable force disrupting data operations. This isn't evolution; it's collision. Traditional platforms assume stability, but AI workloads introduce volatility everywhere: in queries, users, and purposes. We need a new playbook. The way we optimize, govern, and structure data must evolve before AI forces our hand. The cracks are already forming: workloads are becoming chaotic, query patterns unpredictable, and latency constraints tighter. Pre-joins and aggregations matter, but existing optimization strategies won't hold at AI scale. This talk breaks down what's coming, what's at risk, and how to build AI-ready data platforms that don't just survive change but thrive on it.
Oriol Mirosa
Director, Data Solutions
Brooklyn Data Co
AI & Data Culture
Data Governance is NOT the Governance of Data!
This talk challenges the misleading notion that data governance is about controlling information rather than managing relationships between people. It explores why traditional data management frameworks fail when they overlook the human element, and instead presents a relationship-centered governance model that aligns roles and responsibilities across organizations. Drawing on enterprise data governance case studies, attendees will discover practical strategies for embedding effective governance workflows without creating bottlenecks, transforming data management strategy from control-focused to people-empowering while maintaining appropriate data quality standards and compliance requirements.
Willem Pienaar
Co-Founder & CTO
Cleric
Lightning Talks
Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse through inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth. Traditional testing approaches break down completely. This talk breaks down our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail—from infrastructure mimicry to LLM-generated tests—and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.

Mitul Tiwari
Co-founder & CTO
Stealth
GenAI Applications
TapeAgents: A Powerful Framework For Building And Optimizing AI Agents
TapeAgents: Advanced Framework for Observable AI Development. Discover ServiceNow's open-source framework for building transparent, debuggable AI agents with comprehensive action recording and replay capabilities. Learn how TapeAgents' recording system enables unprecedented visibility into agent behavior, streamlined debugging, and data-driven optimization. Master practical techniques for building robust AI agents with built-in observability and performance analysis tools. The session also covers implementation strategies for creating production-ready agents with enhanced reliability and maintainability.
Timothy Chan
Head of Data
Statsig
Data Sci & Algos
Unlocking A/B Testing For B2B
B2B Experimentation: Advanced A/B Testing Beyond Consumer Applications. Learn enterprise-grade experimentation strategies from Statsig's work with leading B2B platforms, including Notion, Figma, and Atlassian. Master specialized statistical approaches designed for B2B contexts, which address unique challenges in sample sizes, user behaviors, and impact measurement. Discover practical frameworks for implementing robust experimentation systems that deliver reliable insights for enterprise products. The session features real-world case studies demonstrating successful B2B testing methodologies and their impact on product development.
David Wilson
Co-Founder & CEO
Hunch Tools
GenAI Applications
Designing & Engineering a Viral Multi-Model AI Workflow: From Prototype to 300K Users in Two Weeks
When Hunch's viral LinkedIn year-in-review AI generator reached 300,000 users and processed more than 1 trillion tokens in two weeks, its multi-model architecture faced extreme scaling challenges. This case study reveals how a simple prototype evolved into a production-scale AI system overnight. Discover Hunch's technical blueprint, featuring orchestration of multiple LLMs across OpenAI, Anthropic, and Google models, critical infrastructure scaling solutions, and how the team achieved an 85% cost reduction through optimized model selection and prompt engineering. Learn from their 26 rapid iterations, which simultaneously improved output quality and decreased costs. This presentation shares practical patterns for AI workflow orchestration that balance quality, cost, and reliability at scale. Gain actionable engineering strategies for building resilient, scalable AI applications that maintain performance under unpredictable growth, plus vital lessons about system failure points when success arrives faster than expected.
Ofer Mendelevitch
Head of Developer Relations
Vectara
Workshops
Building Enterprise Agentic RAG Applications with Reduced Hallucinations
As AI continues to evolve, agentic frameworks are becoming essential tools for developing intelligent, autonomous systems that can reason, plan, and act dynamically. In this workshop, we will explore how to leverage Vectara’s Agentic RAG framework to build context-aware AI assistants and agents with reduced hallucinations that enhance productivity and automate enterprise workflows. We will provide a step-by-step walkthrough of building Agentic RAG applications, delve into the technical details with a real-world example, and discuss the challenges developers might face, such as reducing hallucinations. Whether you are an AI developer, researcher, or enthusiast, this workshop will equip you with the practical skills to harness agentic AI for your enterprise.
Lindsay Murphy
Director, Head of Data
Hiive
AI & Data Culture
No More BS: How (and When) to Really Leverage AI
Successful AI implementation hinges on a solid foundation of data quality and governance, and the current hype often overshadows the critical practical considerations needed to achieve that foundation. Moreover, while AI holds immense potential, it's crucial to evaluate whether it's truly the optimal solution for a given business problem, as simpler, more established methods may be equally or more effective. We present a practical framework to assess whether AI is the optimal solution, and encourage some good old-fashioned critical thinking. Join Colleen Tartow and Lindsay Murphy for a data-driven conversation exploring AI's true viability.
Jake Thomas
Manager, Data Foundations
Okta
ML OPs & Platforms
Embedding OLAP, Everywhere: Lessons from Okta
Okta's innovative journey from processing trillions of events with mini serverless databases to embedding OLAP across its systems reveals a transformative approach to data processing. This session explores how embedded database systems are reshaping traditional data warehousing, demonstrating how small databases can create enormous value beyond analytics. Attendees will discover the strategic shift that's bringing databases back into application engineering and driving unprecedented innovation.
Tobias Lunt
Co-Founder & Data Scientist
Development Data Lab
Lightning Talks
Putting Data to Work for Global Urban Development
Imagine transforming the lives of billions by reimagining urban data infrastructure. Development Data Lab is pioneering a revolutionary approach to urban policy and planning, addressing the critical gap in decision-ready data for the world's developing cities. By integrating diverse data sources—including satellite imagery, administrative records, household surveys, and AI-powered text analysis—this innovative project creates a unified geographic framework for understanding urban challenges. The team demonstrates how emerging data technologies can generate near-real-time, actionable insights to tackle complex issues like urban sprawl, air pollution, poverty, education, mobility, and migration. Learn how a mission-driven approach can leverage incremental technological improvements and AI-assisted development to create outsized impact for global urban communities.
Marck Vaisman
Global AI Solutions Architect
Microsoft
AI Engineering
Revolutionize AI Engineering with AutoGen
Microsoft AutoGen: Scale and Automate Enterprise AI Development. Discover Microsoft's open-source framework for building and orchestrating production-ready AI agent systems. Learn practical implementation strategies for automating complex AI workflows, reducing development time, and optimizing resource utilization. The session features real-world case studies demonstrating AutoGen's impact on development efficiency, model performance, and cost reduction across various industries, and includes hands-on examples of system integration, agent orchestration, and workflow automation for enterprise AI applications.
Elias DeFaria
Co-Founder & VP of Product
SDF
Data Eng & Infrastructure
Why dbt Acquired SDF: How a Small Team Built True SQL Comprehension
At SDF, we built a multi-dialect SQL compiler that resolves proprietary SQL dialects like Snowflake and BigQuery into a unified logical plan. This breakthrough technology, now part of dbt following the acquisition, unlocks immense value in developer experience, data governance, and cost optimization—enabling seamless cross-engine workflows. In this talk, Elias, co-founder of SDF, will dive into how we built the compiler, the challenges of normalizing complex dialects, and the transformative potential for data practitioners. He'll conclude with an exclusive look at upcoming dbt features powered by this technology, reshaping how teams approach analytics.
Mickey Liu
Software Engineer
Notion
Data Eng & Infrastructure
Billion-Scale Vector Search on Object Storage
Vector Search at Scale: How Notion Built Billion-Vector Search Infrastructure. Explore the architecture behind Notion's enterprise-scale vector search system, which powers one of the largest semantic search implementations in production. Learn advanced techniques in embedding pipeline design, distributed vector processing, and optimal storage strategies using Spark and Turbopuffer. This technical deep-dive covers LSM indexing, RAG (Retrieval-Augmented Generation) implementation, and practical approaches to query optimization. Discover battle-tested strategies for building and scaling production-ready vector search systems that handle billions of vectors with high performance and reliability.
Sumedh Sakdeo
Senior Staff Software Engineer
LinkedIn
Data Eng & Infrastructure
Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach
Optimizing Iceberg Tables: Advanced Data Layout Strategies for Enterprise Data Lakes. Master data layout optimization techniques for managing large-scale Iceberg deployments with 100K+ tables. Learn comprehensive approaches to multi-objective optimization, balancing storage efficiency with query performance through intelligent file management and compaction strategies. This session covers practical implementation of table scoring algorithms, automated optimization workflows, and real-world performance insights from OpenHouse deployments. It also includes detailed case studies and benchmarks using LST-Bench, demonstrating measurable improvements in query performance and storage efficiency.
Jesus Camacho
Principal Engineering Manager
Microsoft
Data Eng & Infrastructure
Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach
Optimizing Iceberg Tables: Advanced Data Layout Strategies for Enterprise Data Lakes. Master data layout optimization techniques for managing large-scale Iceberg deployments with 100K+ tables. Learn comprehensive approaches to multi-objective optimization, balancing storage efficiency with query performance through intelligent file management and compaction strategies. This session covers practical implementation of table scoring algorithms, automated optimization workflows, and real-world performance insights from OpenHouse deployments. It also includes detailed case studies and benchmarks using LST-Bench, demonstrating measurable improvements in query performance and storage efficiency.
Ciro Greco
Founder
Bauplan
Data Sci & Algos
Python Over Data Lakes: Declarative Environments, Data Management And Other Things With Feathers
Python Data Lake Reproducibility: Building Deterministic Pipelines at Scale. Learn advanced techniques for creating reproducible data workflows across distributed environments using Python, Iceberg, Arrow, and Docker. Master declarative approaches to managing code versions, data dependencies, and runtime configurations in complex data lake architectures. Discover practical solutions for decoupling compute, storage, and execution environments while maintaining deterministic results. The session includes implementation strategies using open-source tools for building efficient, scalable data pipelines with an improved developer experience.
Joseph Powers
Principal Data Scientist
Intuit
Data Sci & Algos
Going Bayes: Shifting Our Testing Methods To Reflect Our Priorities
Bayesian A/B Testing at Scale: How Intuit Revolutionized Experiment Design. Discover how Intuit transformed its experimentation framework using Bayesian risk-based testing to achieve 60% faster results. Learn the practical implementation of risk threshold algorithms that optimize for business outcomes rather than traditional error rates. Master strategies for organizational adoption of advanced statistical methods across Analytics, Product, and Marketing teams. The session features a detailed case study of a successful enterprise-wide statistical transformation, including implementation challenges and measurable outcomes.
Marcel Kornacker
Co-Founder & CTO
Pixeltable
ML OPs & Platforms
Introducing Pixeltable: Open Source Data Infrastructure for Multimodal AI
Traditional AI infrastructure creates complexity by forcing data teams to juggle multiple specialized systems, fragmenting workflows and increasing operational costs. Marcel Kornacker, founder of Apache Impala and Apache Parquet, introduces Pixeltable, an open-source solution that revolutionizes AI data infrastructure through a declarative, incremental approach. This session reveals how a unified platform can solve common AI data challenges by bringing together data, computation, and models in a single, integrated interface. Attendees will discover how Pixeltable provides automatic versioning, enables incremental updates, and streamlines pipeline management for ML engineers, data scientists, and infrastructure teams seeking to overcome traditional data processing limitations.
Saif Ur-Rehman
Data Engineering Lead
Basecamp Research
Lightning Talks
Engineering Earth's Largest Biological Data Pipeline
Basecamp Research is pioneering a groundbreaking mission to map the unknown biological world, addressing the staggering fact that over 99.9% of life on Earth remains undiscovered. This session unveils an unprecedented biological data pipeline that surpasses all publicly available scientific data collected over the past century. By creating a comprehensive digital twin of Earth's life, the team is developing next-generation biological foundation models with applications spanning pharmaceutical research, deep learning, and scientific discovery. Attendees will explore how a global biological data supply chain, spanning five continents, is generating billions of biological labels and producing state-of-the-art AI models that outperform research from Google, DeepMind, and Genentech.
Jonathan Jin
Staff Machine Learning Engineer
Hinge
ML OPs & Platforms
Trimming the Long Tail of Production Model Ownership at Hinge
Beyond model performance lies a critical challenge in machine learning: comprehensive model ownership. This talk examines how focusing on the often-overlooked "long tail" of machine learning infrastructure can dramatically improve operational efficiency and innovation. Staff Engineer Jonathan Jin from Hinge's AI Platform team will reveal how addressing challenges like observability, feature access, and model refinement creates a "golden path" that empowers teams to continuously innovate. Attendees will learn how strategic infrastructure development can transform machine learning from a performance-driven practice into a holistic, sustainable one.
Madison Faulkner
Principal & Head of Data Science
NEA
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems. Discover how AI agents and foundation models are transforming data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval-Augmented Generation) systems that power modern AI applications. This session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. It also shares practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Hamilton Ulmer
UI Engineer & Designer
MotherDuck
Analytics & BI
Instant Preview Mode: Real-Time Feedback to Make SQL Data Exploration Fly
Imagine writing SQL queries that give you instant visual feedback, transforming your entire data exploration experience. In this talk, you'll see how MotherDuck's Instant Preview Mode breaks through traditional development barriers by providing real-time results as you type. Powered by cutting-edge client-side query parsing and DuckDB-WASM, this technology eliminates the frustrating write-run-debug cycle that's slowed down data professionals for years. You'll see how we've created a system that not only accelerates query iteration but makes working with SQL—especially AI-generated queries—feel more intuitive and responsive than ever before.
Vignesh Chadramohan
Engineering Manager
DoorDash
ML OPs & Platforms
Internals of SlateDB: An Embedded Key-Value Store Built on Object Storage
Object storage platforms like S3 and Azure Blob Storage have transformed data systems, enabling new architectural paradigms. This session explores SlateDB, an embeddable storage engine built in Rust that leverages object storage's unique properties. Attendees will dive into how conditional writes, checkpoints, transactions, and remote compaction can be implemented, discovering insights that extend beyond SlateDB to broader data system design and implementation.
Nikita Vemuri
Software Engineer
Anyscale
ML OPs & Platforms
From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray
As machine learning workloads grow increasingly complex, distributed training across thousands of nodes presents significant challenges. This talk explores how the Ray library ecosystem tackles critical issues in multi-node ML training, focusing on development, orchestration, and comprehensive observability. Attendees will learn about innovative solutions for tracking system data, managing potential failure points, and implementing robust observability workflows that persist critical information.
Ethan Brown
Director, Data & Applied Science
Twitch / AWS
GenAI Applications
Building an LLM-Powered Analytics Slack Bot at Twitch
The best way to beat a wave of automation is to surf it. With this principle in mind, the data team at Amazon IVS / Twitch Video has developed an LLM-powered data analytics bot to augment their data operations. The bot integrates with Slack, allowing employees to interact with data tools through a familiar chat interface. It performs a range of tasks including SQL query generation, chat summarization, and account lookups. This talk provides a practical walkthrough of the implementation, demonstrating how teams can build similar solutions using standard AWS services.
John Bagnall
Senior Data Product Manager
Matillion
Lightning Talks
Humanizing Data Architecture: How Design Thinking Transforms Data Strategy
As organizations embrace increasingly complex data architectures like data mesh and data fabric, a critical challenge emerges: how do we ensure these sophisticated technical solutions genuinely serve human needs? This session introduces design thinking as a transformative framework for developing data strategies that balance technical excellence with profound user-centricity. Through practical examples and deep case studies, explore how empathy, innovative problem-solving, and iterative feedback can revolutionize data architecture. Attendees will learn to apply design thinking's core principles—understanding stakeholder needs, articulating human-centered problems, generating innovative solutions, rapid prototyping, and continuous improvement—to create data products that are not just technically sophisticated, but truly meaningful and accessible to their users.
CL Kao
Founder
Recce
Lightning Talks
Data Engineering Is Not Software Engineering, Until It Is
Modern Data Engineering: Bridging DevOps, MLOps, and Software Development | This technical session examines how modern data engineering is evolving beyond traditional software engineering practices, focusing on data pipeline architecture, testing frameworks, and deployment strategies. Through real-world case studies from dbt (data build tool) implementations and SQLMesh data transformation workflows, the presentation explores how data teams are adopting GitOps methodologies, continuous integration, and version control for data-centric systems. As artificial intelligence and machine learning operations become central to software development, these emerging data engineering practices are reshaping how teams approach data quality, system validation, and production deployment. The session will demonstrate how differences in ETL pipeline feedback loops and data testing environments are driving new best practices for managing enterprise data systems, while offering insights into the future convergence of CI/CD, data governance, and MLOps practices.
Avi Press
CEO
Scarf
Lightning Talks
Open Source Success: Learnings from 1 Billion Downloads
This data-driven analysis examines user behavior patterns across 1 billion open source package downloads, spanning 2000+ GitHub repositories and open source projects tracked through Scarf Analytics. The research reveals critical insights for open source maintainers and OSS business leaders, covering package management trends, download metrics, and documentation strategy. By analyzing global distribution patterns, software packaging formats, and community engagement metrics, the presentation provides actionable strategies for open source project growth, user adoption, and sustainable business development in the open source ecosystem. The findings highlight how successful OSS projects leverage download analytics, developer documentation, and community metrics to drive project adoption and monetization.
Michael Cohen
Global Chief Data & Analytics Officer
Plus Company
Lightning Talks
The Art of Data: Reimagining Creative Processes with Data Culture
This session tackles a persistent challenge in creative industries: why do artists often see data as the enemy of creativity, and how can we change that perception? Drawing from hands-on experience, the presentation explores how organizations can transform data from a creative constraint into an inspiration catalyst. We'll dive into practical strategies for building data literacy among creative teams, showcase compelling examples of data storytelling in artistic contexts, and demonstrate how leading creative professionals are using analytics to amplify rather than stifle their artistic vision. Learn how successful organizations are bridging the gap between data teams and creatives, fostering a culture where intuition and analytics work in harmony to drive more impactful creative outcomes.
Dylan Perez Neider
Sr. Solutions Engineer
Sigma Computing
Workshops
Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics
Think BI is dead? Will natural language replace the dashboard? Sigma's Wednesday morning workshop breaks down why generative AI is a powerful supplement to, not a replacement for, BI practices, and examines how to effectively embed AI into your analytics workflows.
Dadi Atar
VP Product
Montara
Workshops
Analytics and the dark side of the Analytics Development Lifecycle
In this insightful session, we examine how the Analytics Development Lifecycle (ADLC) introduced essential structure to data workflows but unintentionally created organizational bottlenecks by limiting data warehouse access to engineers. Our speaker shares how innovative data teams are successfully enabling analysts, product managers, and data scientists to migrate their work to data warehouse tables while maintaining strong data governance and quality standards. Discover practical DataOps strategies that balance democratized data access with the structured quality assurance processes that modern enterprises require for effective data management.
Sudarsan Lakshmi
Head of Engineering
e6data
Workshops
Everything Everywhere All at Once: Object Store Native
Discover how e6data’s lakehouse compute engine runs complex, high-concurrency SQL analytics and AI workloads 10x faster than all leading engines at one-third the cost, all with zero data movement. Learn how e6data’s atomically scalable lakehouse architecture helps achieve sub-second latencies even under heavy concurrency. This technical deep-dive covers e6data’s Kubernetes-native architecture, disaggregated compute design, and open table format integration, showing the future of SQL analytics and AI workloads through real-world performance benchmarks and production case studies. Learn how an object-store-native approach unlocks “everything, everywhere, all at once” in modern data ecosystems.
Beto Ferreira De Almeida
Staff Engineer
Preset
AI & Data Culture
Data Should be Invisible

The modern data landscape is dominated by complexity: tables, schemas, pipelines, warehouses, and more. Yet the most successful data platforms share a common principle—they make data itself invisible to the end user. When data infrastructure functions optimally, it's like good plumbing: you only notice it when something breaks. Organizations often fixate on the mechanics of data while losing sight of what truly matters: metrics, dimensions, and semantics. When users engage with meaningful abstractions rather than technical details, they make better decisions faster. In this talk, you'll learn strategies for making data invisible through real-world abstraction success stories, designing effortless interactions, and implementing governance through abstraction. Walk away with practical ways to assess your data stack, advocate for user-centric approaches, and measure progress—making your data platform not just powerful, but invisible in all the right ways.

Josh Curl
Co-Founder & CTO
Hightouch
AI & Data Culture
Bridging the AI Implementation Gap: Strategies for Embedding Data Professionals with Business Units

At the foundation of AI project failures lies a critical gap between data teams and business reality. On top of this gap, data quality issues, unexpected privacy concerns, and tools that don't align with actual business problems arise to hinder or block implementation. As we've built our own AI product—AI Decisioning—and implemented it with customers, we've learned that successful AI implementations depend on embedding data teams within business units. Embedding doesn't mean breaking apart your data team and dispersing it throughout every other department. It means establishing focused partnerships where data team members are deeply integrated into business teams' daily workflows and decision-making processes while remaining connected to the central data organization. This embedding creates a virtuous cycle: data teams gain deep domain knowledge, business professionals see improved data quality and gain data facility, and together, data and business teams implement AI solutions that solve real problems. In this talk, we'll share concrete examples of how data teams (especially data scientists) and marketing have worked together in successful AI Decisioning implementations. We’ll derive strategies to implement this organizational pattern and enable a company to move from analytics to actions and from data teams as service providers to active collaborators. While our case studies focus primarily on marketing partnerships, the embedded partnership model we present applies equally to other business functions including product development, operations, and customer service teams.

Tomás Kofman
Co-Founder & CEO
Not Diamond
Lightning Talks
How to Build Your Own Model Router

Building Cost-Effective LLM Routers: Boost Accuracy 25% While Cutting Costs 90% | This session reveals how to build intelligent model routers that dynamically direct inputs to the optimal large language model (LLM) for each specific task. Attendees will learn practical implementation strategies for multi-model LLM systems that significantly improve performance metrics—achieving up to 25% higher accuracy while reducing operational costs by as much as 90%. The presentation covers essential routing methodologies, evaluation frameworks, and scalable architectures for production deployments. Developers and ML engineers will gain actionable insights for overcoming technical challenges in multi-model LLM systems, optimizing both performance and cost-efficiency in generative AI applications. Perfect for teams looking to maximize ROI from their AI infrastructure while maintaining high-quality outputs.
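The core routing idea, sending each input to the cheapest model expected to handle it well, can be illustrated with a minimal sketch that assumes a quality predictor already exists. This is not Not Diamond's method: `Candidate`, `route`, `toy_predictor`, the model names, and the prices are all hypothetical. The router picks the cheapest model predicted to clear a quality threshold and falls back to the strongest model when none does.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price in dollars


def route(prompt: str, candidates: Sequence[Candidate],
          predict_quality: Callable[[str, Candidate], float],
          min_quality: float) -> Candidate:
    """Return the cheapest candidate predicted to meet min_quality.

    If no candidate clears the bar, fall back to the one with the
    highest predicted quality.
    """
    scored = [(predict_quality(prompt, c), c) for c in candidates]
    good_enough = [c for q, c in scored if q >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda c: c.cost_per_1k_tokens)
    return max(scored, key=lambda qc: qc[0])[1]


def toy_predictor(prompt: str, c: Candidate) -> float:
    """Hypothetical heuristic: long or code-like prompts count as 'hard'."""
    hard = len(prompt) > 200 or "def " in prompt
    easy_q, hard_q = {"small-model": (0.9, 0.5), "large-model": (0.95, 0.9)}[c.name]
    return hard_q if hard else easy_q


models = [Candidate("small-model", 0.1), Candidate("large-model", 2.0)]
print(route("What is 2 + 2?", models, toy_predictor, 0.8).name)            # small-model
print(route("def f(x):\n    return x", models, toy_predictor, 0.8).name)  # large-model
```

In practice the predictor would be a trained model evaluated on held-out data; the threshold is the knob that trades cost against accuracy.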

Diptanu Gon Choudhury
Founder & CEO
Tensorlake
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Gleb Mezhanskiy
Co-Founder & CEO
Datafold
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Nathan Sooter
Sr. Manager, RevOps Analytics & Insights
1Password
Analytics & BI
Go-To-Market Data Enrichment: Practical Strategies to Drive Business Value
Let’s be frank, data teams and sales teams don’t always see eye to eye. Data teams see Salesforce as a swamp of messy, user-generated chaos. Sales teams often see data teams as a slow-moving black box. The result? Frustration on both sides and a missed opportunity to drive business value. But what if data teams weren’t just seen as pipeline or dashboard builders, but as strategic partners in revenue growth? In this session, we’ll show how simple, no-fuss data engineering and enrichment can transform the way sales teams trust and use data. Through real-world examples, like cleaning up CRM records with the power of LLMs, we’ll explore how small but intentional changes in ingestion and modeling can change the perception of a data team. We'll explore practical strategies to make your work more visible, valuable, and aligned with the GTM team.
Margaret Quigley
ex-Cohere Head of Data Acquisition
MQ Consulting
Lightning Talks
Ethical Data Acquisition & Sales in the AI Age
Learn strategies for sitting on both sides of a data acquisition negotiation table: not just how to evaluate and price new data for training and evaluating AI/LLMs, but also how to efficiently package and sell it as a data owner. Margaret will cover real-world examples from her 7 years of experience running Data Acquisition & GTM teams with leading AI companies and data vendors that touch on nuances within ethics & due diligence, security & storage, and transparency & accountability.
Jonathan Mortensen
CEO
Confident Security
ML OPs & Platforms
The Unofficial Guide to Apple’s Private Cloud Compute
In October 2024, Apple released a new private AI technology called “Private Cloud Compute” onto millions of devices. It brings the same level of privacy and security that a local device offers, but on an “untrusted” remote server. This talk discusses how Private Cloud Compute represents a paradigm shift in confidential computing and explores the core advancements that allowed it to go mainstream. We’ll explore its novel architecture, which allows developers to run sensitive, multi-tenant workloads with cryptographically provable privacy guarantees at scale and at reasonable cost. Attendees will leave with an understanding of how to leverage this technology for data and AI applications where privacy and security are paramount.
Skip Everling
Head of Developer Relations
Kolena
Workshops
AI-Powered Automation: Supercharge Data-Intensive Workflows with Intelligent Agents
In today’s fast-paced, data-heavy industries, crucial information is often buried in PDFs, contracts, compliance reports, and other unstructured sources, slowing down decision-making and increasing risk. Join us for an interactive workshop where we’ll showcase how to use AI agents to automate data-intensive workflows for analysts, compliance officers, underwriters, diligence teams, and knowledge workers. In this hands-on session, you’ll learn how AI can automate repetitive tasks, freeing your team to focus on high-value analysis; enhance accuracy and consistency, reducing errors and ensuring data integrity; accelerate decision-making with faster data extraction and smarter insights; and scale operations, handling complex tasks across massive document sets with ease. We’ll walk through real-world use cases, including compliance assessments, contract analysis, risk evaluation, and more, to demonstrate how AI agents can streamline workflows, reduce bottlenecks, and drive smarter decisions. If you deal with large volumes of documents, this workshop is for you. Walk away with actionable strategies to boost productivity, reduce risk, and gain a competitive edge.
Jacob Matson
Developer Advocate
MotherDuck
Workshops
More Than a Vibe: AI-Driven SQL that Actually Works
In this hands-on workshop, we will demonstrate how AI can empower you to "vibe code": using AI to write accurate SQL, enabled by the magic of MotherDuck & DuckDB. Participants will work with a real-life spatial dataset to tackle real-world challenges and see firsthand how AI-driven DuckDB SQL can turn data handling into a rapid, low-risk, interactive process. By the end of the workshop, participants will have experienced an end-to-end workflow: from ingesting and querying spatial data with DuckDB/MotherDuck, to refining query results with AI, to presenting insights through Python visualizations. This session is designed to help you confidently incorporate AI into your coding process, transforming how you approach data analysis and decision-making in real-world business scenarios.

Key Components of the Workshop:

1. Dataset Handling: Participants will work with a spatial dataset to evaluate potential locations for opening a new BBQ restaurant. Thanks to MotherDuck, the dataset is easily brought down locally in a highly compressed format, ensuring a quick and safe environment for experimentation.

2. Live AI-Assisted Coding: The workshop will feature a live demonstration where an AI tool iteratively generates SQL queries. Rather than pre-defining metrics, the AI assists in exploring and defining the spatial parameters necessary to identify the optimal restaurant location—a process that mirrors real-world, dynamic decision-making.

3. Real-Time Data Visualization: As queries are refined and executed, Python will be used to chart the results on the fly. Utilizing uv for environment management alongside visualization libraries such as Seaborn and Matplotlib, participants will see how spatial insights are translated into clear, actionable charts.

4. Iterative, Low-Risk Workflow: The session emphasizes a low-risk, experimental approach. If the AI-generated code isn't perfect, no harm is done—files can be quickly deleted or corrected, encouraging a creative, hands-on learning environment where trial and error lead to deeper understanding.
Cole Bowden
Developer Advocate
Firebolt
Workshops
The Power of Low Latency Data for AI Apps
Retrieval-augmented generation (RAG) has transformed AI applications by grounding responses with external data. It can be better. By pairing RAG with low latency SQL analytics, you can enrich responses with instant insights, leading to a more interactive and insightful user experience with fresh, data-driven intelligence. In this talk, we’ll demo how low latency SQL combined with an AI application can deliver speed, accuracy, and trust.
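The pattern described here, retrieval plus a fresh SQL aggregate at request time, can be sketched with Python's built-in `sqlite3` module standing in for a low-latency analytics engine. This is an illustrative assumption, not Firebolt's API: the `orders` table, `fresh_metrics`, and `build_prompt` are hypothetical, and the retrieval step is assumed to have already produced the passages.

```python
import sqlite3
from typing import List


def fresh_metrics(conn: sqlite3.Connection, product_id: str) -> str:
    """Run a small aggregate query at request time so the context is current."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return f"{product_id}: {count} orders, total {total:.2f}"


def build_prompt(question: str, retrieved_passages: List[str], metrics: str) -> str:
    """Combine retrieved documents with live SQL results into one prompt."""
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        f"Context documents:\n{context}\n\n"
        f"Live metrics (queried just now):\n{metrics}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("widget", 10.0), ("widget", 5.5), ("gadget", 3.0)],
)
metrics = fresh_metrics(conn, "widget")
prompt = build_prompt("How is the widget selling?", ["Widget launched in March."], metrics)
print(prompt)
```

The retrieved passages supply background, while the SQL result supplies numbers that were true at the moment of the request; query latency then directly bounds how fresh and responsive the answer can be.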
Rui Lopes
Head of AI
DataLinks
Workshops
Powering AI Workflows with Tabular Graphs
DataLinks is the new semantic layer for AI systems. Join us in this workshop to gain a concise overview of our entity-linking technology, backed by two dynamic demonstrations. First, we will enable you to experience firsthand how our intuitive user interface simplifies complex data integration, visualization, and exploration, enabling rapid discoverability and seamless dataset linkage. Then we invite you to discover the flexibility of our API and Python SDK, designed for developers to effortlessly integrate automated entity resolution and graph-based insights into their workflows and applications. Finally, we'll show how to leverage our platform for natural-language search over your data, enabling AutoRAG for your application.
Issac Roth
Co-Founder & CEO
Orama
Lightning Talks
OramaCore: A Search Database with LLMs Built-In
In this fast-paced talk, we’ll dive right into why the world needed another database, this time with multiple LLMs and a JavaScript engine right in the same process. A database that runs on GPUs? Why, why, why? It turns out this is the ultimate platform for agentic AI, like the SaaS copilots and answer engines that developers create with Orama. That’s why! We’ll look at the construction of the database, the algorithms involved, how we made it fast, and a little bit of what you can do with it. OramaCore is open source and just released!
Alexy Khraborov
AI/ML Community Architect
Neo4j
Lightning Talks
OAKS: Open Agentic Knowledge Stack

The first two years of the GenAI revolution are bending the OSS way: Open Source models have reached state of the art, and most of the ecosystem around AI is open-source. The key to AI adoption is properly organizing and using business knowledge. In industry, LLMs give way to Small Specialized Models (SSMs), utilized by Domain Expert Agents (DXAs). Their work should be structured according to the domain requirements, requiring structured output. Organizing and using domain knowledge for AI has long been a domain of Knowledge Graphs (KG). At Neo4j, we are in a moment where our KG leadership powers the rise of GraphRAG, a better context traversal that we lead alongside Microsoft, Amazon, Google, and other GenAI partners. We also integrate with many OSS AI startups to build a better AI stack around GraphRAG. Neo4j has joined LFAI to bridge the enterprise AI adoption with startup innovation, centered around structured knowledge. In this talk we describe OAKS, a set of projects, communities, and technologies that comprise the Open Agentic AI Knowledge Stack. We show where the most value will be created and how the OSS AI ecosystems come together to build and deliver it.

OAKS consists of structured input, knowledge transformation, and structured output. We show the Agentic AI architectures emerging around AI memory, graph-based agentic workflows, and frameworks including scalable message passing, knowledge encapsulation, and colocated knowledge and computation for web-scale routing. We invite the community to join us!

Anant Agarwal
Staff Software Engineer & Engineering Lead
Instacart
Data Eng & Infrastructure
Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Building High-Throughput Data Orchestration: Instacart's Journey to 20M Daily Workflows | Explore how Instacart built an enterprise-grade orchestration system handling 20 million daily workflows across diverse technical domains. Learn implementation details of their cloud-native platform combining Apache Airflow and Temporal for robust scheduling and execution. Deep dive into YAML-based workflow definitions, GitOps deployment patterns, and observability solutions that enable reliable scaling. Practical insights from years of production experience, applicable to both startups and enterprises building scalable data infrastructure.

Skyler Thomas
Co-Founder & CTO
Cake AI
GenAI Applications
Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source
RAG systems that work in the real world are not just the trivial extract, vector search, and rerank systems that the simplistic "Introductions to RAG" suggest. After this talk, you will understand how to think about the design and construction of real-world RAG and GraphRAG systems that can scale to hundreds of millions of documents or billions of vectors. You will learn about the complex orchestration of multiple libraries. You will also learn how to use tools and frameworks built on open standards like OpenTelemetry or OpenInference to help you monitor and debug these complex RAG orchestrations. Topics will include scalable RAG/GraphRAG architectures, complex extraction flows, and embedding model and re-ranking considerations. We will dive deep into integration between various libraries like Ray, LangChain, LlamaIndex, DSPy, Phoenix, Weaviate, PgVector, GraphRAG, LangGraph, Airflow, KFP and vLLM to form cohesive solutions that actually scale. We will discuss the patterns and anti-patterns Cake has learned building and deploying these systems for real customers. If time permits, we will address advanced topics like complex table detection and extraction for financial data, complex agentic flows to handle heterogeneous datasets, etc.
Brenna Buuck
Developer Evangelist
MinIO
Lightning Talks
The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse
Data Lakehouse Architecture: Unifying Batch and Real-Time Data Processing | Is your organization stuck choosing between batch and streaming? The reality is, you probably need both. This session explores how modern data lakehouse architectures are breaking down the false dichotomy between batch and real-time processing. We'll examine how innovative organizations are using lakehouse platforms to handle everything from millisecond-latency queries to massive batch analytics jobs on a single unified platform. Learn how this hybrid approach is transforming data infrastructure, reducing complexity, and enabling teams to build more flexible, future-proof data systems.
Colleen Tartow
Senior Director, Enterprise Data Engineering
Capital One
AI & Data Culture
No More BS: How (and When) to Really Leverage AI
Successful AI implementation hinges on a solid foundation of data quality and governance, and the current hype often overshadows the critical practical considerations needed to achieve that foundation. Moreover, while AI holds immense potential, it's crucial to evaluate whether it's truly the optimal solution for a given business problem, as simpler, more established methods may be equally or more effective. We present a practical framework to assess whether AI is the optimal solution, and encourage some good old-fashioned critical thinking. Join Colleen Tartow and Lindsay Murphy for a data-driven conversation exploring AI's true viability.
Marco Slot
Software Imagineer
Crunchy Data
Databases
Converging Database Architectures: DuckDB in PostgreSQL
Traditionally divided between transactional and analytical systems, databases are converging through innovative architectural approaches. This talk explores the fusion of PostgreSQL and DuckDB, demonstrating how embedding an OLAP database into an OLTP system can simplify data platforms. Attendees will learn about the motivations, challenges, and substantial benefits of creating a unified system capable of high-throughput transactions, fast analytical queries, and seamless data processing across different paradigms.
Anil Sadineni
Principal Software Engineer
1upHealth
Lightning Talks
A Modern Data Stack in Healthcare
The US Healthcare industry faces complex data exchange challenges, with legacy standards creating massive processing burdens. This session explores how emerging technologies like FHIR can transform healthcare data management by leveraging modern data stack approaches. Attendees will discover innovative strategies for addressing unique healthcare data challenges, including cross-entity data contracts, identity management, and end-to-end lineage preservation. Learn how technologies from social media, advertising, and finance can revolutionize healthcare data processing, overcoming traditional interoperability and scalability limitations.
Dr. Greg Michaelson
Co-Founder & Chief Product Officer
Zerve AI
Workshops
Scaling GenAI & Agentic Workflows for practical solutions with Zerve
Enterprises investing in Generative AI (GenAI) or Agentic Workflows need more than just cutting-edge models; they need scalable, cost-efficient systems that deliver real business impact. In this session we’ll show how Zerve unlocks the full potential of GenAI using its distributed computing engine, The Fleet. You’ll learn how enterprises as advanced as Canal+ and NASA, as well as cutting-edge startups, are streamlining AI development, reducing infrastructure costs, and transforming GenAI into a scalable, high-impact business solution.

100+ Speakers

Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

Talk Schedule

Lloyd Tabb
Founder/Former CTO - Looker & Co-creator of Malloy
Meta
Analytics & BI
Building Blocks: Reusing Queries in Semantic Data Modeling
Data exploration is like a sophisticated Lego set, where strategic piece selection transforms understanding. This session delves into advanced semantic data modeling, revealing how reusing queries creates more powerful, intelligent building blocks that enhance comprehension for both humans and AI. Attendees will learn how to move beyond traditional tables and measures, revolutionizing their approach to data analysis and uncovering deeper insights through innovative modeling techniques.
Hannes Mühleisen
Co-Creator of DuckDB
DuckDB Labs
Data Eng & Infrastructure
Liberate Analytical Data Management with DuckDB
DuckDB Analytics Engine: High-Performance Data Processing Without Limits | Discover how DuckDB's revolutionary in-process analytical engine transforms data warehouse capabilities through a lightweight, versatile architecture. The engine features state-of-the-art vectorized query processing, morsel-driven parallelism, and advanced memory management that scales from embedded devices to powerful servers. This talk dives deep into DuckDB's innovative design principles, implementation strategies, and optimization techniques that enable previously impossible use cases on single nodes. Learn from real-world applications and performance benchmarks demonstrating DuckDB's impact on modern data analytics workflows.
Nikunj Handa
Product Lead
OpenAI
Foundation Models
OpenAI’s Responses API: A New Foundation for Building with Models & Tools
Last month, OpenAI introduced the Responses API: a programmatic agent API businesses can use to perform a wide variety of tasks. With this new primitive, we radically simplified integration, transforming what previously took hundreds of lines of code into just a few. Built from the ground up based on insights from thousands of developers who have used the Chat Completions and Assistants APIs, Responses reimagines simplicity, performance, and flexibility, enabling seamless integration of advanced reasoning, multimedia inputs, and multi-step workflows. In this talk, I'll walk through the design decisions and engineering challenges behind Responses. You'll learn how we anticipated developer needs to create an API uniquely engineered for agent-like use cases, capable of handling simultaneous tool calls and seamless multi-turn conversations. We'll explore key features like built-in state management, semantic streaming, intelligent token truncation, and support for hosted tools (e.g., web search, file search, and computer use) that significantly reduce complexity and enhance real-time interactions. And we’ll talk about how Responses empowers developers to build faster, smarter, and more responsive AI applications than ever before, driving the next wave of intelligent, agentic experiences.
Naveen Rao
VP of AI
Databricks
Keynotes
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
As organizations navigate the AI revolution, the traditional boundaries between data infrastructure and AI systems are blurring. This session explores the critical convergence point where data management meets machine intelligence, examining how this intersection is reshaping enterprise technology stacks. Our keynote panelists bring complementary perspectives from operating and investing in this rapidly evolving landscape. Drawing from experiences spanning neuroscience, hardware architecture, product development, and venture capital, they'll unpack the technical and strategic considerations for organizations building modern data + AI platforms. Join us for an insightful discussion on how this convergence enables scalable AI adoption, the architectural patterns emerging across industries, and what the future holds as data infrastructure and AI capabilities become increasingly interdependent in driving business transformation.
Ravin Kumar
Senior Researcher
Google DeepMind
Foundation Models
Models as Tools: My Perspective On the Matter
You can look at GenAI from many perspectives. For me, the perspective shifts depending on whether I'm building products, training foundation models, or simply using GenAI day to day. However, most people aren't doing all of these things. For this audience, I suggest focusing on one practical angle: LLMs as tools. In this talk, I'll share how, from this perspective, LLMs are just like any other tool. Starting here ensures you begin from a grounded, realistic footing before moving into the more exciting, hype-laden aspects of this new technology.
Denis Yarats
Co-Founder & CTO
Perplexity
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
This keynote panel features Denis Yarats and Joseph Gonzalez, two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through the LMSYS and Gorilla projects, and after successfully launching Turi based on his doctoral work, he now brings his expertise in machine learning and robotics to RunLLM.com. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.
Hadley Wickham
Chief Scientist
Posit
Data Sci & Algos
LLMs for Data Science
Obviously everyone is super excited about LLMs right now, and while there's a large element of hype in their popularity, they are also genuinely useful. In this talk I'll give a roundup of data science tasks that I've found LLMs particularly useful for, broken up into three broad categories: writing code, writing prose, and rectangling fundamentally non-rectangular data (e.g. text, images, videos, audio).
Aaron Katz
Co-Founder & CEO
ClickHouse
Keynotes
Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Join us for an authentic fireside chat that cuts through industry hype to explore how real-time data infrastructure is transforming analytics and AI. We'll examine technical hurdles in processing data at scale in real-time and the architectural decisions enabling performance breakthroughs. The discussion covers data processing evolution, performance bottlenecks in distributed systems, observability innovations, and approaches for maintaining consistency while increasing throughput. Gain insights into how real-time processing creates competitive advantages for business intelligence and AI applications, with honest assessments of implementation challenges and practical solutions.

Paige Bailey
AI Developer Experience Engineer
Google
Workshops
Introduction to Google DeepMind's Models: Gemini 2.0, Imagen 3, and Veo
This intensive workshop is designed for developers eager to explore the cutting-edge capabilities of Google's latest AI tools. Participants will gain hands-on experience working with the Gemini APIs, Google AI Studio, Veo 2, and Imagen 3, enabling them to build intelligent applications and generate stunning creative content. We'll also cover how to use Gemini 2.0 in developer tools like Cursor, Sourcegraph Cody, and more.
Ryan Blue
Creator of Apache Iceberg, Member of Technical Staff
Databricks
Data Eng & Infrastructure
Why is Everyone Talking about Apache Iceberg™? (From the Original Creator of Apache Iceberg)
This talk is a primer on Apache Iceberg™ from one of its original creators. Ryan Blue, CEO of Tabular (now part of Databricks) and the original creator of Apache Iceberg, discusses its origin and why it's even more relevant today. Ryan will discuss the early days of Apache Iceberg at Netflix, how the project evolved at Tabular, and how Tabular will continue its mission of creating a universal format. Attendees will gain an understanding of Apache Iceberg and how open table formats like it are changing the analytic database industry.
Ganesh Ramanarayanan
VP Engineering
Hex
Analytics & BI
Multi-Modal Compute for Data Analytics
Following their groundbreaking Data Council 2022 presentation, Hex continues to push notebook technology boundaries with an innovative approach to data analytics. This session delves into their unique, fully parallelized, multi-modal backend, revealing how sophisticated computational techniques are transforming data processing. Attendees will explore cutting-edge methods that redefine performance, flexibility, and computational efficiency in modern data workflows, gaining insights into the next generation of analytical computing.
Raghotham Murthy
Software Engineer, Llama
Meta
Foundation Models
Building LLM Applications with Llama Stack
In this talk, Raghotham describes what it takes to build production-grade LLM applications. Unlike regular applications, LLM applications are non-deterministic and require a unique set of building blocks to support the full software development lifecycle: building, testing, deploying, monitoring, and then improving the application. We will show how Llama Stack can be used to build and improve LLM applications in different environments: local development, cloud, on-prem, and mobile.
Martin Casado
General Partner
a16z
Keynotes
Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Tanya Bragin
VP Product
ClickHouse
Databases
Unbundling of the Cloud Data Warehouse
A decade of proprietary cloud data warehouses has revealed critical challenges: performance bottlenecks, escalating costs, and vendor lock-in. This session examines how open-source technologies and data lake standards are transforming the modern data stack. Explore how platforms like ClickHouse, Iceberg, and other open technologies are providing organizations with flexible, cost-effective alternatives to monolithic cloud data warehouses, enabling more diverse and efficient data workflows.
Ethan Rosenthal
Member of Technical Staff
Runway
Foundation Models
Building a Data Foundation for Multimodal Foundation Models
While it is often easier to simply throw more data at a problem, scale is not all you need when building multimodal foundation models. Data quality continues to be just as important as data quantity, and supporting “data-centric AI” requires lowering the barrier to data curation as much as possible. However, multimodal data curation presents unique requirements compared to conventional machine learning or business intelligence data management systems. The data is heterogeneous, ranging from scalars to embedding arrays to entire compressed videos. While the dataset sizes in terms of number of rows are not quite Big Data™, the number of bytes is massive with high columnar variance. Given the storage size, it’s infeasible to construct and copy new training datasets for each model training job; training jobs must query the core datasets without copying them. Finally, large scale distributed training jobs require fast random access which bumps up against limitations of typical solutions like partitioned parquet files. In this talk, I will discuss how we built a petabyte-scale, multimodal feature lakehouse. This lakehouse supports analytical querying as well as serving features for large scale distributed training jobs, such as those that were used for training Runway’s recent foundation models like Gen-3 Alpha.
Tengyu Ma
Co-Founder & CEO
Voyage AI
AI Engineering
RAG In 2025: State Of The Art And The Road Forward
Enterprise RAG Systems: Building Robust LLM Knowledge Integration | Master advanced techniques in Retrieval-Augmented Generation (RAG) for enterprise-scale language models. Learn strategies to overcome common RAG pipeline challenges including brittle parsers, suboptimal chunking, and manual query tuning. Deep dive into cutting-edge embedding models and reranking systems that enable automated, scalable knowledge retrieval. Discover practical approaches to building production-ready RAG systems that deliver consistent, high-quality results while minimizing maintenance overhead and manual optimization.
Charles Frye
Developer Advocate
Modal Labs
AI Engineering
What Every Data Scientist Needs To Know About GPUs
GPU Optimization for Data Scientists: Essential Knowledge from Silicon to PyTorch | Comprehensive guide to GPU architecture and optimization for modern machine learning workloads. Learn critical GPU concepts from hardware fundamentals to high-level frameworks, with focus on performance tuning for neural networks. Master practical techniques for optimizing system latency and throughput in popular ML frameworks including PyTorch, vLLM, and RAPIDS. Essential knowledge for data scientists and ML engineers working with GPU-accelerated workloads.
Shreya Rajpal
Co-Founder & CEO
Guardrails
AI Engineering
The Future Of Guardrails
AI Safety and Guardrails: Enterprise Framework for Reliable Generative AI | Explore next-generation approaches to implementing guardrails and safety measures in production AI systems, including RAG-enhanced chatbots and autonomous agents. Learn systematic methodologies for risk assessment, reliability monitoring, and failure prevention in enterprise AI deployments. Discover practical frameworks for implementing robust safety controls and guardrails across different AI architectures. Features case studies demonstrating improved system reliability and reduced risks through structured safety protocols and monitoring systems.
Eno Reyes
CTO
Factory
AI Engineering
Building Reliable Agentic AI Systems
Building Reliable Agentic AI Systems: Design Principles for Complex Autonomous Software | Explore cutting-edge approaches to designing reliable AI systems that operate autonomously in unpredictable environments. Learn architectural patterns from robotics, cybernetics, and biological systems for building predictable outcomes from non-deterministic components. Deep dive into practical strategies for implementing reliable agentic systems, with focus on stability, error handling, and performance monitoring. Discover emerging patterns for creating AI systems that achieve reliable results despite underlying stochastic processes.
Sharon Zhou
Founder & CEO
Lamini
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Mike Driscoll
Co-Founder & CTO
Rill Data
Analytics & BI
A SQL-Based Metrics Layer for DuckDB and ClickHouse
The ability to aggregate raw data into summarized metrics and slice them across dimensions is at the core of analytics teams' work. This session reveals how Rill has developed a metrics layer that declares metrics entirely with SQL expressions, overcoming traditional limitations of metrics management. Attendees will discover how to leverage DuckDB and ClickHouse to generate multi-dimensional OLAP cubes, implement real-time data access with sub-second performance, and create uniform dashboards through a BI-as-code philosophy. Learn how to define, manage, and secure metrics using an innovative SQL-based approach that transforms raw data into powerful, actionable insights.
Bryan Bischof
Head of AI
Theory Ventures
Data Sci & Algos
Failure Is A Funnel
LLM Quality Engineering: From Slop to Production | Learn systematic approaches to evaluating and improving LLM performance, with focus on transforming experimental models into production-ready systems. Master practical frameworks for quality assessment, iterative improvement, and building robust deployment pipelines. Features proven strategies for identifying failure patterns and establishing reliable production environments.
Nuno Campos
Founding Engineer
LangChain
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval-Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Michele Catasta
President
Replit
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Jake Brill
Head of Product - Integrity
OpenAI
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.
Rachad Alao
Senior Engineering Director
Meta
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Julien Le Dem
Principal Engineer
Datadog
Keynotes
The Deconstructed Database and the Advent of the Open Data Lake
In 2018, Julien Le Dem described how the components of databases, distributed or not, were being commoditized as individual parts that anyone could recombine into use-case-specific engines. Given one's constraints, they could leverage those components to build a query engine that solves a specific problem much faster than building everything from the ground up. He called this idea "the Deconstructed Database" and spoke about it at a previous edition of Data Council. Fast-forward to today: the big data ecosystem has matured and evolved from a melting pot of competing projects into a more composable ecosystem organized around a few open source standards. It's been incredible to see the vision he outlined in his talk crystallize with the adoption of key components like Parquet, Arrow, Iceberg, Calcite, Substrait and OpenLineage. These tools, and others like them, provide an interoperability layer that enables harnessing data for many purposes without creating silos.

In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and more importantly, the contracts that keep them decoupled and composable.
Joseph Gonzalez
Professor
RunLLM & UC Berkeley
GenAI Applications
AGI Is Already Here (But It's Not What You Think)
The Future of AGI: Building Compound AI Systems | Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.
Keynotes
RAGs to Riches: Engineering the Future of LLM Systems
Krishnaram Kenthapadi
Chief Scientist, Clinical AI
Oracle Health
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
George Mathew
Managing Director
Insight Partners
Keynotes
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
Daniel Olmedilla
Distinguished Engineer, AI & Trust
LinkedIn
Keynotes
Guardrails for the Future: AI Safety and Responsible AI in Practice
Sumti Jairath
Chief Architect
SambaNova Systems
Keynotes
Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.

This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.

Han-chung Lee
Machine Learning Director
Moody's Analytics
Foundation Models
The Model is the Product
In the realm of machine learning, AI, and deep learning, the intelligence embedded within a system—the model—stands as the primary product and key differentiator. This talk explores how the intelligence component has evolved to become the central selling point across technological eras. We will examine the historical progression of how intelligence capabilities have increasingly defined product value, transforming from hardware differentiators like "Intel Inside" during the PC era, to software advantages, and now to model-centric offerings in today's AI landscape. The intelligence layer has become not just a feature but the core product itself. Additionally, we'll analyze how the definition of "model" itself has evolved alongside technological advancement, reshaping what constitutes a system's core value. Companies now face a strategic bifurcation: pursue a model-centric approach or focus on distribution-centered strategies. Each path carries distinct trade-offs, risks, and opportunities in today's competitive AI marketplace. Through case studies of industry leaders and emerging players, we'll demonstrate how the fundamental principle—"the model is the product, the distribution is the moat"—is reshaping competitive dynamics and business strategies across sectors.
Julian Hyde
Senior Staff Engineer
Google
Analytics & BI
More Than Query: Future Directions of Query Languages, from SQL to Morel
“Never bet against SQL,” the saying goes. But what exactly do we want from a query language, and will SQL always be the right tool for the job? What separates a query language from a regular programming language like Python or a framework like Apache Spark? This talk looks at recent efforts to extend SQL with measures and pipe syntax, and then introduces Morel. Morel is an exciting language that combines the strong type system and expressive power of a functional programming language with the efficiency of a declarative query language. Morel can express not just queries but also data-intensive programming, logic programming and mathematical optimization, and has the potential to replace today’s data frameworks. This talk explores the many ways that we use query languages today, from simple lookup queries and transactions to data engineering, data science and analytics, and related areas such as data-intensive programming, mathematical optimization and logic programming.
Pedram Navid
Head of Data Engineering & DevRel
Dagster Labs
Lightning Talks
Write Less More: How Dagster Rebuilt Our Docs from the Ground Up
Documentation can become a critical pain point for technical teams, transforming from a helpful resource into a maintenance nightmare. In this candid session, Dagster reveals their radical approach to documentation reconstruction, demonstrating how a complete ground-up rebuild can revolutionize user experience. Attendees will dive deep into the strategic decision to completely overhaul their documentation, exploring the challenges of incremental improvements and the transformative power of a fresh perspective. Learn how radical rethinking can turn documentation from a source of user frustration into a powerful communication tool that truly serves the community.
Yusuf Ozuysal
Director of Engineering, AI
Snowflake
Workshops
AI Your Way with All-In-One Access
Break bread with us while exploring the latest in LLM inference! Whether you’re a startup or a seasoned developer, building with AI requires quick, easy access to top-tier models, without juggling multiple subscriptions. Snowflake is (now) the only platform where you can access Claude 3.6/3.7 Sonnet, GPT-4, o3-mini, and OpenAI embeddings through a single, cloud-provider-agnostic API. We'll explore how a unified gateway for all your essential models can streamline AI pipelines at scale. Plus, our research team will showcase cutting-edge innovations in OSS model inference, pushing the boundaries of throughput and latency at the Pareto frontier. Join us for a unique Lunch & Learn where you'll experience the latest AI innovations firsthand and provide feedback that shapes our product roadmap.
Paul Dix
Founder & CTO
InfluxData
Databases
Building InfluxDB 3 Core: A Real-Time Columnar DB and Data Processor on Object Storage
InfluxDB 3 Core reimagines time series databases with a ground-up Rust rewrite using Apache Arrow, DataFusion, and Parquet. This session explores an innovative diskless architecture that leverages object storage for persistence, featuring a sophisticated caching system enabling real-time data ingestion and querying. Attendees will discover how an embedded Python VM transforms the database into a comprehensive data collector, monitoring agent, and data transformation platform.
Hamel Husain
Machine Learning Consultant
Parlance Labs
Foundation Models
The Model is Not the Product
This Data Council 2025 talk is in development. Check back soon! 
Chenggang Wu
Co-Founder & CTO
RunLLM
GenAI Applications
AGI Is Already Here (But It's Not What You Think)
The Future of AGI: Building Compound AI Systems | Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.
Alexa Garrison
VP Data & Business Operations
Splice
AI & Data Culture
Building High-Impact Data Teams in an AI-Driven World
This Data Council 2025 talk explores how organizations can build strong data teams and empower them to drive impactful decision-making, regardless of size or resources. More details to be announced... 
Etienne Dilocker
CTO
Weaviate
Databases
The Agentic Database: A New Way to Interact with Your Data
For decades, database interactions have been constrained by traditional Create, Read, Update, and Delete (CRUD) operations, but the emergence of AI agents is poised to revolutionize this paradigm. This session explores a transformative approach to database interaction, where databases become collaborative partners capable of understanding complex, natural language commands. Attendees will discover how future databases might interpret sophisticated requests like "Translate all documents to Spanish and summarize them" or "Extract the 2024 Sales numbers and map out their correlation with events and feature releases." By moving beyond vector search and similarity matching, this talk reveals a groundbreaking vision of databases as intelligent, context-aware systems that can comprehend, process, and execute nuanced human instructions.
Samuel Colvin
Founder
Pydantic
Workshops
Pydantic: An Opinionated Blueprint for the Future of GenAI Applications
AI application development doesn't require reinventing software engineering. This transformative talk presents a practical blueprint for building maintainable AI systems using existing tools like Pydantic as the foundation. Learn how to implement critical components: strict data validation at API levels, self-correction mechanisms for enhanced accuracy, automated schema generation for LLM tool calls, continuous evaluation frameworks, and comprehensive observability solutions. Through concrete examples and code snippets, discover how familiar tools can create robust AI applications without unnecessary complexity. Perfect for developers looking to integrate AI functionality into larger software systems efficiently.
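Pydantic v2 covers two of these components with `BaseModel.model_json_schema()` (schema generation) and `BaseModel.model_validate_json()` (validation). To keep the following sketch dependency-free, it uses only stdlib dataclasses to illustrate the idea: derive a tool-call schema from typed fields, then strictly validate the model's JSON arguments, raising an error message that could be fed back to the LLM for self-correction. The `GetWeather` tool and the flat, scalar-only type mapping are hypothetical; this is neither the talk's code nor Pydantic's implementation.

```python
import dataclasses
import json
import typing

# Map Python annotations to JSON Schema type names (flat scalar fields only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclasses.dataclass
class GetWeather:  # hypothetical tool definition
    """Look up the weather forecast for a city."""
    city: str
    days: int = 1


def tool_schema(cls) -> dict:
    """Build a JSON-Schema-style tool definition from a dataclass."""
    hints = typing.get_type_hints(cls)
    properties, required = {}, []
    for field in dataclasses.fields(cls):
        properties[field.name] = {"type": _JSON_TYPES[hints[field.name]]}
        no_default = (field.default is dataclasses.MISSING
                      and field.default_factory is dataclasses.MISSING)
        if no_default:
            required.append(field.name)
    return {
        "name": cls.__name__,
        "description": (cls.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def validate_call(cls, raw_json: str):
    """Strictly validate model-produced JSON arguments; raise ValueError on any mismatch."""
    data = json.loads(raw_json)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("tool arguments must be a JSON object")
    hints = typing.get_type_hints(cls)
    missing = [name for name in tool_schema(cls)["parameters"]["required"] if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for name, value in data.items():
        if name not in hints:
            raise ValueError(f"unexpected field: {name!r}")
        # Strict check, similar in spirit to Pydantic's strict mode: no coercion,
        # so the string "3" is rejected for an int field.
        if type(value) is not hints[name]:
            raise ValueError(f"{name!r} should be {hints[name].__name__}, "
                             f"got {type(value).__name__}")
    return cls(**data)
```

In a self-correction loop, the text of the `ValueError` would be appended to the conversation so the model can retry with corrected arguments.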
Andy Pavlo
Assistant Professor of Databaseology
Carnegie Mellon University
Databases
What Goes Around Comes Around... and Around...
Doesn't it feel like there is always a new crop of database management systems pushing the idea that the relational model is outdated and SQL is dying? Vector database proponents have recently taken up this mantle, fueled by AI/ML technologies. Before that, NoSQL users claimed RM/SQL was insufficient for "webscale" applications. And in the 1990s, object-oriented database vendors wanted developers to switch to their systems. Database history doesn't repeat, but it rhymes. In this talk, Professor Andy Pavlo presents the 60-year history of data modeling research and demonstrates why RM/SQL is the preferred default choice for database applications of any size. All efforts to completely replace the data model or query language have failed. Instead, SQL absorbed the best ideas from these alternative approaches and remains relevant for modern applications.
Dhruv Singh
Co-founder & CTO
HoneyHive AI
AI Engineering
Eval Agents: How to Solve Error Cascades in Agents
Agents and RAG chatbots are multi-turn AI systems: they interact back and forth with humans over many turns. These systems face a fundamental challenge, because errors compound and cascade with each interaction. In this talk, we'll go through real-world examples of agents failing in spectacular ways when one step goes wrong: overconfidence, manipulation, looping actions, and more. We'll then examine how agent builders use "eval agents" tuned on real-world interactions to evaluate agents, and even use them as verifiers to improve performance in production. By the end of the talk, you'll understand trajectory evaluation, the new kind of evaluation needed to assess agents accurately.
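A back-of-the-envelope way to see why errors compound: under the simplifying assumption (ours, not the talk's) that each turn succeeds independently with the same probability p, the chance that an n-turn trajectory is entirely error-free decays geometrically. Real cascades can be worse, since one wrong step can make later steps more likely to fail.

```latex
P(\text{all } n \text{ turns correct}) = \prod_{i=1}^{n} p_i = p^{n},
\qquad \text{e.g. } p = 0.95,\ n = 10:\quad 0.95^{10} \approx 0.60
```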
George Fraser
Co-Founder & CEO
Fivetran
Workshops
Look Ma, No Data Warehouse!

Modern data lakes promise affordability and scalability, but using them can be a headache. Cloud data warehouses make querying easy, but they come with a hefty price tag and extra complexity. What if you could get the same ease of use without the cost and lock-in?

In this session, we’ll show you how to leverage open-source software to build a fully functional, queryable analytics powerhouse using DuckDB, Fivetran, and Polaris Catalog. We’ll walk through how to:

1. Load data that is automatically converted to Iceberg open table format

2. Run SQL queries using DuckDB’s new Iceberg extension

3. Run transformations directly on data stored in your data lake with a new dbt adapter

4. Get started easily with a practical, hands-on demo

No vendor lock-in, no unnecessary complexity—just an open-source-powered approach to enabling advanced analytics and AI. If your data warehouse is holding you back or eating away at your budget, this session is for you!

Simon Eskildsen
Co-Founder
Turbopuffer
Data Eng & Infrastructure
Billion-Scale Vector Search on Object Storage
Vector Search at Scale: How Notion Built Billion-Vector Search Infrastructure | Explore the architecture behind Notion's enterprise-scale vector search system, powering one of the largest semantic search implementations in production. Learn advanced techniques in embedding pipeline design, distributed vector processing, and optimal storage strategies using Spark and Turbopuffer. This technical deep-dive covers LSM indexing, RAG (Retrieval Augmented Generation) implementation, and practical approaches to query optimization. Discover battle-tested strategies for building and scaling production-ready vector search systems capable of handling billions of vectors with high performance and reliability.
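Every vector index ultimately approximates an exact nearest-neighbor scan. As a baseline sketch only (the abstract does not detail Notion's or Turbopuffer's implementation), exact top-k cosine search looks like this in plain Python. Its O(N * d) cost per query, for N stored vectors of dimension d, is what approximate indexes exist to avoid at billion-vector scale.

```python
import heapq
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors: dict, k: int) -> list:
    """Return the ids of the k stored vectors most similar to `query`, best first.

    Brute force: every vector is scored. heapq.nlargest keeps only k
    candidates in memory while scanning the collection.
    """
    return heapq.nlargest(k, vectors, key=lambda vid: cosine_similarity(query, vectors[vid]))
```

An approximate index trades a small loss in recall against this exact result for far fewer distance computations per query.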
Vishnu Vasanth
Co-Founder & CEO
e6data
Workshops
Everything Everywhere All at Once: Object Store Native
Discover how e6data’s lakehouse compute engine runs complex and high-concurrency SQL analytics and AI workloads 10x faster than all leading engines at 1/3rd the cost—all with zero data movement. Learn how e6data’s atomically scalable lakehouse architecture helps achieve sub-second latencies even under heavy concurrency. This technical deep-dive covers e6data’s atomically scalable K8s native architecture, disaggregated compute design, and open table format integration showing the future of SQL analytics and AI workloads through real-world performance benchmarks and production case studies. Learn how an object-store-native approach unlocks “everything, everywhere, all at once” in modern data ecosystems.
Niko Grupen
Head of Applied Research
Harvey
GenAI Applications
Legal Agency: Building Domain-specific Agents for Enterprise
Building agents for real-world knowledge work requires a delicate balance of AI and Human-Computer Interaction (HCI) — one has to understand frontier model capabilities, translate them into a framework for agent behavior (with the right primitives, guardrails, etc), and then place them in an intuitive product surface that is interactive and transparent. The complexity of attaining this balance is magnified for vertical problem spaces that require significant domain expertise to solve for, like law. This talk will share insights and best-practices from building at the bleeding edge of the application layer. We'll explore how to leverage domain expertise to map model problems to legal problems (and importantly, evaluate them), how to create a framework for vertical agents that mirrors human processes, and why, despite LLMs being the star of the show, traditional engineering and machine learning practices are essential for maximizing quality and reliability in production environments.
Dillon Morrison
Director of Product Management
Sigma Computing
Workshops
Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics
Think BI is dead? Will natural language replace the dashboard? Sigma's Wednesday morning workshop breaks down why generative AI is a powerful supplement - not replacement - for BI practices, and examines how to effectively embed AI into your analytics workflows.
Natacha Crooks
Assistant Professor
UC Berkeley
Databases
From Concurrency Control to Concurrent Scheduling
This Data Council 2025 talk is in development. Please check back soon for updates!
Rachel Lee Nabors
Former React Core
Meta
AI Engineering
AI Cram Session
Machine Learning Fundamentals: From RAG to Deep Learning for Beginners | Comprehensive introduction to essential machine learning concepts, including RAG (Retrieval Augmented Generation), neural networks, and foundational math principles. Learn complex ML concepts through engaging visual explanations and intuitive metaphors from an experienced technical educator. Perfect for developers, analysts, and technical professionals looking to understand modern AI terminology and architecture. Features practical examples and visual guides from the creator of React's educational platform, making advanced concepts accessible for technical audiences.
Chenyu Qiu
Senior Applied Scientist
Uber
Data Sci & Algos
Scalable Continuous Monitoring for Large-scale A/B Experimentation
This talk reveals how Uber has revolutionized experimental analytics at scale with its A/B testing framework and continuous experiment monitoring. We'll demonstrate our solution to the "peeking problem" that plagues traditional experiment monitoring approaches, where repeatedly checking a fixed-horizon test inflates its false-positive rate. This presentation showcases our automated platform that processes thousands of monitoring analyses daily using regression-adjusted estimators with anytime-valid inference. This advanced statistical methodology eliminates 95% of noise without sacrificing true signals, enabling early experiment detection and performance insights. Learn how our Spark-powered computational framework efficiently batches experiments and metrics for scalable processing. We'll share real-world case studies showing how this system has transformed data-driven decision-making at Uber, minimizing undetected regressions and accelerating product innovation across our global platform.
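The abstract names anytime-valid inference as the remedy for peeking. One well-known construction, the mixture sequential probability ratio test (mSPRT), is sketched below under simplifying assumptions: i.i.d. normal observations of the treatment effect with known variance `sigma2`, and a normal mixing distribution with variance `tau2` (both hypothetical inputs here). It omits the regression adjustment the talk mentions and is not Uber's implementation.

```python
import math


def msprt_log_lr(mean_diff: float, n: int, sigma2: float, tau2: float,
                 theta0: float = 0.0) -> float:
    """Log of the mixture SPRT statistic for a normal mean with known variance.

    Assumes n i.i.d. observations with variance sigma2 and a N(theta0, tau2)
    mixing distribution over the alternative effect size.
    """
    denom = sigma2 + n * tau2
    return (0.5 * math.log(sigma2 / denom)
            + (n ** 2) * tau2 * (mean_diff - theta0) ** 2 / (2 * sigma2 * denom))


def always_valid_p_values(observations, sigma2: float, tau2: float,
                          theta0: float = 0.0) -> list:
    """Return the running always-valid p-value after each new observation.

    p_n = min(p_{n-1}, 1 / Lambda_n), so the sequence never increases, and
    stopping the first time p_n < alpha keeps the false-positive rate at most
    alpha regardless of how often results are checked.
    """
    p_value = 1.0
    running_sum = 0.0
    history = []
    for n, x in enumerate(observations, start=1):
        running_sum += x
        log_lr = msprt_log_lr(running_sum / n, n, sigma2, tau2, theta0)
        # Working in log space: exp(-log_lr) underflows harmlessly to 0.0
        # instead of overflowing when the evidence is very strong.
        p_value = min(p_value, math.exp(-log_lr))
        history.append(p_value)
    return history
```

The choice of `tau2` controls which effect sizes the test detects fastest; with no observed effect, the p-value simply stays at 1.0.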
Ori Soen
CEO
Montara
Workshops
Analytics and the dark side of the Analytics Development Lifecycle
In this insightful session, we examine how the Analytics Development Lifecycle (ADLC) introduced essential structure to data workflows but unintentionally created organizational bottlenecks by limiting data warehouse access to engineers. Our speaker shares how innovative data teams are successfully enabling analysts, product managers, and data scientists to migrate their work to data warehouse tables while maintaining strong data governance and quality standards. Discover practical DataOps strategies that balance democratized data access with the structured quality assurance processes that modern enterprises require for effective data management.
Franck Pachot
Developer Advocate
MongoDB
Databases
The Modern Database Debate: PostgreSQL and MongoDB
Which database should you choose? This question has evolved from theoretical debates to practical decisions based on facts. Technology has advanced significantly—SQL databases now support JSON, while NoSQL databases have integrated ACID properties. PostgreSQL and MongoDB represent the most common choices today, both widely adopted as standard APIs for managed database services. We will explore differences between these approaches, comparing interactive SQL transactions versus document-based design, examining internal storage performance implications, and considering how team expertise influences choices. Our goal is to clarify how to utilize each option effectively for modern applications' agility, scalability, and performance requirements, helping you select the database your team will be most comfortable using efficiently.
Parham Parvizi
Founding Data Architect
Prospective
Workshops
A Local-First approach to extremely fast Streaming Visualization

Modern data workloads demand fast, interactive, and scalable visualization—without the cost and complexity of server-side rendering. The local-first approach leverages modern browser capabilities, WebAssembly, and in-browser computation to achieve high-performance analytics while reducing cloud costs.

In this workshop, we’ll explore:

1. Why Local-First? The benefits of running everything client-side for cost-efficient, scalable visualization across thousands of users.

2. WebAssembly (WASM) for Data Apps: How Perspective harnesses WASM to power ultra-fast, browser-native analytics and even replace traditional Docker-based containers for data workloads.

3. Perspective + DuckDB: A full in-browser analytics stack that enables high-speed querying and visualization without a backend.

4. Streaming Data with InfluxDB: How to visualize high-frequency, real-time IoT and log data with sub-second latency.

5. Databricks + Perspective: Enhancing large-scale analytics with interactive dashboards inside Jupyter notebooks.

Through live coding and guided exercises, attendees will build their own browser-native analytics dashboards, connect to real-time data streams, and learn Perspective’s API in Python, Node.js, and Rust.

Difficulty level: Intermediate. Some experience with Python, JavaScript, and data analytics will be helpful, but beginners can follow along with the guided exercises.

To participate, bring a laptop with: 
Git
VS Code
Docker
Python (3.8+)
Node.js (16+)

Wenjing Zheng
Data Science Manager
Roblox
Data Sci & Algos
Causal Inference Methods for Bridging Experiments and Strategic Impact
While experimentation gives us clean effect measures, connecting those results to real-world business decisions is messy. In this talk, I’ll walk through two case studies at Roblox that highlight this challenge and explore some causal inference methods to help bridge the gap. The first focuses on attributing observed year-over-year business growth to product launches. The strategic need here is twofold: to understand how much of our growth is driven by the innovations we shipped, and to reconcile different measurements of business performance (experiment results and long-term growth trends) into a coherent narrative. The core challenge is isolating product impact from organic growth (the growth that would have occurred without these launches) in the topline metrics we observe. The second case study addresses how to generalize A/B test results to a broader population without requiring an explicit evaluation of covariate shift between the experiment and target population, making the approach scalable across experiments and surfaces. This framing is essential for fair comparisons across product areas that vary in reach and in how amenable they are to metric movement, enabling more effective prioritization across teams. Together, these cases reflect a broader goal: building a common measurement language that connects local experimental results to global business impact, so organizations can make more strategic, data-informed decisions.
Doron Porat
Co-Founder & CEO
Lakeway
AI & Data Culture
AI is Going to Break Your Data Platform - Are You Ready?
AI isn't just another workload - it's an unpredictable force disrupting data operations. This isn't evolution - it's collision. Traditional platforms assume stability, but AI workloads introduce volatility everywhere: in queries, users, and purposes. We need a new playbook. The way we optimize, govern, and structure data must evolve before AI forces our hand. The cracks are forming: workloads becoming chaotic, query patterns unpredictable, and latency constraints tightening. Pre-joins and aggregations matter, but existing optimization strategies won't hold at AI scale. This talk breaks down what's coming, what's at risk, and how to build AI-ready data platforms that don't just survive change - they thrive on it.
Oriol Mirosa
Director, Data Solutions
Brooklyn Data Co
AI & Data Culture
Data Governance is NOT the Governance of Data!
This talk challenges the misleading idea that data governance is about controlling information rather than managing relationships between people. It explores why traditional data management frameworks fail when they overlook the human element, and presents instead a relationship-centered governance model that aligns roles and responsibilities across organizations. Drawing on enterprise data governance case studies, attendees will discover practical strategies for embedding effective governance workflows without creating bottlenecks, shifting data management strategy from control-focused to people-empowering while maintaining appropriate data quality standards and compliance requirements.
Willem Pienaar
Co-Founder & CTO
Cleric
Lightning Talks
Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse through inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth. Traditional testing approaches break down completely. This talk breaks down our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail—from infrastructure mimicry to LLM-generated tests—and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.
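The abstract does not describe the simulator itself, so the following is a hypothetical, minimal illustration of the property such an environment relies on: a seeded generator that injects a failure into a toy service graph, so the ground-truth root cause is known and every run can be replayed exactly. The topology, fault names, and baseline agent are all invented for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical toy topology: each service maps to the services it calls (a DAG).
TOPOLOGY = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}
FAULTS = ["oom_kill", "latency_spike", "bad_config", "disk_full"]  # hypothetical fault types


@dataclass(frozen=True)
class Scenario:
    root_cause: str
    fault: str
    symptomatic: tuple  # services expected to surface errors


def callers_of(service: str, topology: dict) -> set:
    """Every service that depends on `service`, directly or transitively."""
    affected, frontier = set(), [service]
    while frontier:
        target = frontier.pop()
        for caller, callees in topology.items():
            if target in callees and caller not in affected:
                affected.add(caller)
                frontier.append(caller)
    return affected


def generate_scenario(seed: int, topology: dict = TOPOLOGY) -> Scenario:
    """Same seed, same injected failure: every agent run is reproducible."""
    rng = random.Random(seed)  # private RNG; global random state is untouched
    root = rng.choice(sorted(topology))
    fault = rng.choice(FAULTS)
    symptomatic = tuple(sorted(callers_of(root, topology) | {root}))
    return Scenario(root, fault, symptomatic)


def score_agent(diagnose, seeds, topology: dict = TOPOLOGY) -> float:
    """Fraction of seeded scenarios where the agent names the injected root cause."""
    seeds = list(seeds)
    hits = 0
    for seed in seeds:
        scenario = generate_scenario(seed, topology)
        # The agent sees only symptoms and topology, never the ground truth.
        if diagnose(scenario.symptomatic, topology) == scenario.root_cause:
            hits += 1
    return hits / len(seeds)


def blame_deepest_symptom(symptomatic, topology) -> str:
    """Naive baseline: blame the failing service that calls no other failing service."""
    failing = set(symptomatic)
    for service in symptomatic:
        if not failing.intersection(topology[service]):
            return service
    return symptomatic[0]
```

On this noise-free toy graph the naive baseline is always right; a realistic harness would layer in inconsistent logs and partial symptoms, still driven by the same seeded RNG so that every failure remains replayable.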

Mitul Tiwari
Co-founder & CTO
Stealth
GenAI Applications
TapeAgents: A Powerful Framework For Building And Optimizing AI Agents
TapeAgents: Advanced Framework for Observable AI Development | Discover ServiceNow's open-source framework for building transparent, debuggable AI agents with comprehensive action recording and replay capabilities. Learn how TapeAgents' innovative recording system enables unprecedented visibility into agent behavior, streamlined debugging, and data-driven optimization. Master practical techniques for building robust AI agents with built-in observability and performance analysis tools. Features implementation strategies for creating production-ready agents with enhanced reliability and maintainability.
Timothy Chan
Head of Data
Statsig
Data Sci & Algos
Unlocking A/B Testing For B2B
B2B Experimentation: Advanced A/B Testing Beyond Consumer Applications | Learn enterprise-grade experimentation strategies from Statsig's work with leading B2B platforms including Notion, Figma, and Atlassian. Master specialized statistical approaches designed for B2B contexts, addressing unique challenges in sample sizes, user behaviors, and impact measurement. Discover practical frameworks for implementing robust experimentation systems that deliver reliable insights for enterprise products. Features real-world case studies demonstrating successful B2B testing methodologies and their impact on product development.
David Wilson
Co-Founder & CEO
Hunch Tools
GenAI Applications
Designing & Engineering a Viral Multi-Model AI Workflow: From Prototype to 300K Users in Two Weeks
When Hunch's viral LinkedIn year-in-review AI generator reached 300,000 users processing 1+ trillion tokens in two weeks, its multi-model architecture faced extreme scaling challenges. This case study reveals how a simple prototype evolved into a production-scale AI system overnight. Discover Hunch's technical blueprint, featuring orchestration of multiple LLMs across OpenAI, Anthropic, and Google models, critical infrastructure scaling solutions, and how the team achieved an 85% cost reduction through optimized model selection and prompt engineering. Learn from their 26 rapid iterations, which improved output quality while decreasing costs. This presentation shares practical patterns for AI workflow orchestration balancing quality, cost, and reliability at scale. Gain actionable engineering strategies for building resilient, scalable AI applications that maintain performance under unpredictable growth, plus vital lessons about system failure points when success arrives faster than expected.
Ofer Mendelevitch
Head of Developer Relations
Vectara
Workshops
Building Enterprise Agentic RAG Applications with Reduced Hallucinations
As AI continues to evolve, agentic frameworks are becoming essential tools for developing intelligent and autonomous systems that can reason, plan, and act dynamically. In this workshop, we will explore how to leverage Vectara’s Agentic RAG framework to build context-aware AI assistants and agents that hallucinate less, enhance productivity, and automate enterprise workflows. We will provide a step-by-step walkthrough on how to build Agentic RAG applications, delving into the technical details with a real-world example, and discuss the challenges developers might face, such as reducing hallucinations. Whether you are an AI developer, researcher, or enthusiast, this workshop will equip you with the practical skills to harness agentic AI for your enterprise.
Lindsay Murphy
Director, Head of Data
Hiive
AI & Data Culture
No More BS: How (and When) to Really Leverage AI
Successful AI implementation hinges on a solid foundation of data quality and governance, and the current hype often overshadows the critical practical considerations needed to achieve that foundation. Moreover, while AI holds immense potential, it's crucial to evaluate whether it's truly the optimal solution for a given business problem, as simpler, more established methods may be equally or more effective. We present a practical framework to assess whether AI is the optimal solution, and encourage some good old-fashioned critical thinking. Join Colleen Tartow and Lindsay Murphy for a data-driven conversation exploring AI's true viability.
Jake Thomas
Manager, Data Foundations
Okta
ML OPs & Platforms
Embedding OLAP, Everywhere: Lessons from Okta
Okta's innovative journey from processing trillions of events with mini serverless databases to embedding OLAP across its systems reveals a transformative approach to data processing. This session explores how embedded database systems are reshaping traditional data warehousing, demonstrating how small databases can create enormous value beyond analytics. Attendees will discover the strategic shift that's bringing databases back into application engineering and driving unprecedented innovation.
Tobias Lunt
Co-Founder & Data Scientist
Development Data Lab
Lightning Talks
Putting Data to Work for Global Urban Development
Imagine transforming the lives of billions by reimagining urban data infrastructure. Development Data Lab is pioneering a revolutionary approach to urban policy and planning, addressing the critical gap in decision-ready data for the world's developing cities. By integrating diverse data sources—including satellite imagery, administrative records, household surveys, and AI-powered text analysis—this innovative project creates a unified geographic framework for understanding urban challenges. The team demonstrates how emerging data technologies can generate near-real-time, actionable insights to tackle complex issues like urban sprawl, air pollution, poverty, education, mobility, and migration. Learn how a mission-driven approach can leverage incremental technological improvements and AI-assisted development to create outsized impact for global urban communities.
Marck Vaisman
Global AI Solutions Architect
Microsoft
AI Engineering
Revolutionize AI Engineering With AutoGen
Microsoft AutoGen: Scale and Automate Enterprise AI Development | Discover Microsoft's open-source framework for building and orchestrating production-ready AI agent systems. Learn practical implementation strategies for automating complex AI workflows, reducing development time, and optimizing resource utilization. Features real-world case studies demonstrating AutoGen's impact on development efficiency, model performance, and cost reduction across various industries. Includes hands-on examples of system integration, agent orchestration, and workflow automation for enterprise AI applications.
Elias DeFaria
Co-Founder & VP of Product
SDF
Data Eng & Infrastructure
Why dbt Acquired SDF: How a Small Team Built True SQL Comprehension
At SDF, we built a multi-dialect SQL compiler that resolves proprietary SQL dialects like Snowflake and BigQuery into a unified logical plan. This breakthrough technology, now part of dbt following the acquisition, unlocks immense value in developer experience, data governance, and cost optimization—enabling seamless cross-engine workflows. In this talk, Elias, co-founder of SDF, will dive into how we built the compiler, the challenges of normalizing complex dialects, and the transformative potential for data practitioners. He'll conclude with an exclusive look at upcoming dbt features powered by this technology, reshaping how teams approach analytics.
Mickey Liu
Software Engineer
Notion
Data Eng & Infrastructure
Billion-Scale Vector Search on Object Storage
Vector Search at Scale: How Notion Built Billion-Vector Search Infrastructure | Explore the architecture behind Notion's enterprise-scale vector search system, powering one of the largest semantic search implementations in production. Learn advanced techniques in embedding pipeline design, distributed vector processing, and optimal storage strategies using Spark and Turbopuffer. This technical deep-dive covers LSM indexing, RAG (Retrieval Augmented Generation) implementation, and practical approaches to query optimization. Discover battle-tested strategies for building and scaling production-ready vector search systems capable of handling billions of vectors with high performance and reliability.
Sumedh Sakdeo
Senior Staff Software Engineer
LinkedIn
Data Eng & Infrastructure
Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach
Optimizing Iceberg Tables: Advanced Data Layout Strategies for Enterprise Data Lakes | Master data layout optimization techniques for managing large-scale Iceberg deployments with 100K+ tables. Learn comprehensive approaches to multi-objective optimization, balancing storage efficiency with query performance through intelligent file management and compaction strategies. This session covers practical implementation of table scoring algorithms, automated optimization workflows, and real-world performance insights from OpenHouse deployment. Includes detailed case studies and benchmarks using LST-bench, demonstrating measurable improvements in query performance and storage efficiency.
Jesus Camacho
Principal Engineering Manager
Microsoft
Data Eng & Infrastructure
Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach
Optimizing Iceberg Tables: Advanced Data Layout Strategies for Enterprise Data Lakes | Master data layout optimization techniques for managing large-scale Iceberg deployments with 100K+ tables. Learn comprehensive approaches to multi-objective optimization, balancing storage efficiency with query performance through intelligent file management and compaction strategies. This session covers practical implementation of table scoring algorithms, automated optimization workflows, and real-world performance insights from OpenHouse deployment. Includes detailed case studies and benchmarks using LST-bench, demonstrating measurable improvements in query performance and storage efficiency.
Ciro Greco
Founder
Bauplan
Data Sci & Algos
Python Over Data Lakes: Declarative Environments, Data Management And Other Things With Feathers
Python Data Lake Reproducibility: Building Deterministic Pipelines at Scale | Learn advanced techniques for creating reproducible data workflows across distributed environments using Python, Iceberg, Arrow, and Docker. Master declarative approaches to managing code versions, data dependencies, and runtime configurations in complex data lake architectures. Discover practical solutions for decoupling compute, storage, and execution environments while maintaining deterministic results. Includes implementation strategies using open-source tools for building efficient, scalable data pipelines with improved developer experience.
Joseph Powers
Principal Data Scientist
Intuit
Data Sci & Algos
Going Bayes: Shifting Our Testing Methods To Reflect Our Priorities
Bayesian AB Testing at Scale: How Intuit Revolutionized Experiment Design | Discover how Intuit transformed their experimentation framework using Bayesian risk-based testing to achieve 60% faster results. Learn practical implementation of risk threshold algorithms that optimize for business outcomes rather than traditional error rates. Master strategies for organizational adoption of advanced statistical methods across Analytics, Product, and Marketing teams. Features detailed case study of successful enterprise-wide statistical transformation, including implementation challenges and measurable outcomes.
Marcel Kornacker
Co-Founder & CTO
Pixeltable
ML OPs & Platforms
Introducing Pixeltable: Open Source Data Infrastructure for Multimodal AI
Traditional AI infrastructure creates complexity by forcing data teams to juggle multiple specialized systems, fragmenting workflows and increasing operational costs. Marcel Kornacker, founder of Apache Impala and Apache Parquet, introduces Pixeltable, an open-source solution that revolutionizes AI data infrastructure through a declarative, incremental approach. This session reveals how a unified platform can solve common AI data challenges by bringing together data, computation, and models in a single, integrated interface. Attendees will discover how Pixeltable provides automatic versioning, enables incremental updates, and streamlines pipeline management for ML engineers, data scientists, and infrastructure teams seeking to overcome traditional data processing limitations.
Saif Ur-Rehman
Data Engineering Lead
Basecamp Research
Lightning Talks
Engineering Earth's Largest Biological Data Pipeline
Basecamp Research is pioneering a groundbreaking mission to map the unknown biological world, addressing the staggering fact that over 99.9% of life on Earth remains undiscovered. This session unveils an unprecedented biological data pipeline that surpasses all publicly available scientific data collected over the past century. By creating a comprehensive digital twin of Earth's life, the team is developing next-generation biological foundation models with applications spanning pharmaceutical research, deep learning, and scientific discovery. Attendees will explore how a global biological data supply chain, spanning five continents, is generating billions of biological labels and producing state-of-the-art AI models that outperform research from Google, DeepMind, and Genentech.
Jonathan Jin
Staff Machine Learning Engineer
Hinge
ML OPs & Platforms
Trimming the Long Tail of Production Model Ownership at Hinge
Beyond model performance lies a critical challenge in machine learning: comprehensive model ownership. This talk examines how focusing on the often-overlooked "long tail" of machine learning infrastructure can dramatically improve operational efficiency and innovation. Staff Engineer Jonathan Jin from Hinge's AI Platform team will reveal how addressing challenges like observability, feature access, and model refinement creates a "golden path" that empowers teams to continuously innovate. Attendees will learn how strategic infrastructure development can transform machine learning from a performance-driven to a holistic, sustainable practice.
Madison Faulkner
Principal & Head of Data Science
NEA
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Hamilton Ulmer
UI Engineer & Designer
MotherDuck
Analytics & BI
Instant Preview Mode: Real-Time Feedback to Make SQL Data Exploration Fly
Imagine writing SQL queries that give you instant visual feedback, transforming your entire data exploration experience. In this talk, you'll see how MotherDuck's Instant Preview Mode breaks through traditional development barriers by providing real-time results as you type. Powered by cutting-edge client-side query parsing and DuckDB-WASM, this technology eliminates the frustrating write-run-debug cycle that's slowed down data professionals for years. You'll see how we've created a system that not only accelerates query iteration but makes working with SQL—especially AI-generated queries—feel more intuitive and responsive than ever before.
Vignesh Chadramohan
Engineering Manager
DoorDash
ML OPs & Platforms
Internals of SlateDB: An Embedded Key-Value Store Built on Object Storage
Object storage platforms like S3 and Azure Blob Storage have transformed data systems, enabling new architectural paradigms. This session explores SlateDB, an embeddable storage engine built in Rust that leverages object storage's unique properties. Attendees will dive into how conditional writes, checkpoints, transactions, and remote compaction can be implemented, discovering insights that extend beyond SlateDB to broader data system design and implementation.
Nikita Vemuri
Software Engineer
Anyscale
ML OPs & Platforms
From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray
As machine learning workloads grow increasingly complex, distributed training across thousands of nodes presents significant challenges. This talk explores how the Ray library ecosystem tackles critical issues in multi-node ML training, focusing on development, orchestration, and comprehensive observability. Attendees will learn about innovative solutions for tracking system data, managing potential failure points, and implementing robust observability workflows that persist critical information.
Ethan Brown
Director, Data & Applied Science
Twitch / AWS
GenAI Applications
Building an LLM-Powered Analytics Slack Bot at Twitch
The best way to beat a wave of automation is to surf it. With this principle in mind, the data team at Amazon IVS / Twitch Video has developed an LLM-powered data analytics bot to augment their data operations. The bot integrates with Slack, allowing employees to interact with data tools through a familiar chat interface. It performs a range of tasks including SQL query generation, chat summarization, and account lookups. This talk provides a practical walkthrough of the implementation, demonstrating how teams can build similar solutions using standard AWS services.
John Bagnall
Senior Data Product Manager
Matillion
Lightning Talks
Humanizing Data Architecture: How Design Thinking Transforms Data Strategy
As organizations embrace increasingly complex data architectures like data mesh and data fabric, a critical challenge emerges: how do we ensure these sophisticated technical solutions genuinely serve human needs? This session introduces design thinking as a transformative framework for developing data strategies that balance technical excellence with profound user-centricity. Through practical examples and deep case studies, explore how empathy, innovative problem-solving, and iterative feedback can revolutionize data architecture. Attendees will learn to apply design thinking's core principles—understanding stakeholder needs, articulating human-centered problems, generating innovative solutions, rapid prototyping, and continuous improvement—to create data products that are not just technically sophisticated, but truly meaningful and accessible to their users.
CL Kao
Founder
Recce
Lightning Talks
Data Engineering Is Not Software Engineering, Until It Is
Modern Data Engineering: Bridging DevOps, MLOps, and Software Development | This technical session examines how modern data engineering is evolving beyond traditional software engineering practices, focusing on data pipeline architecture, testing frameworks, and deployment strategies. Through real-world case studies from dbt (data build tool) implementations and SQLMesh data transformation workflows, the presentation explores how data teams are adopting GitOps methodologies, continuous integration, and version control for data-centric systems. As artificial intelligence and machine learning operations become central to software development, these emerging data engineering practices are reshaping how teams approach data quality, system validation, and production deployment. The session will demonstrate how differences in ETL pipeline feedback loops and data testing environments are driving new best practices for managing enterprise data systems, while offering insights into the future convergence of CI/CD, data governance, and MLOps practices.
Avi Press
CEO
Scarf
Lightning Talks
Open Source Success: Learnings from 1 Billion Downloads
This data-driven analysis examines user behavior patterns across 1 billion open source package downloads, spanning 2000+ GitHub repositories and open source projects tracked through Scarf Analytics. The research reveals critical insights for open source maintainers and OSS business leaders, covering package management trends, download metrics, and documentation strategy. By analyzing global distribution patterns, software packaging formats, and community engagement metrics, the presentation provides actionable strategies for open source project growth, user adoption, and sustainable business development in the open source ecosystem. The findings highlight how successful OSS projects leverage download analytics, developer documentation, and community metrics to drive project adoption and monetization.
Michael Cohen
Global Chief Data & Analytics Officer
Plus Company
Lightning Talks
The Art of Data: Reimagining Creative Processes with Data Culture
This session tackles a persistent challenge in creative industries: why do artists often see data as the enemy of creativity, and how can we change that perception? Drawing from hands-on experience, the presentation explores how organizations can transform data from a creative constraint into an inspiration catalyst. We'll dive into practical strategies for building data literacy among creative teams, showcase compelling examples of data storytelling in artistic contexts, and demonstrate how leading creative professionals are using analytics to amplify rather than stifle their artistic vision. Learn how successful organizations are bridging the gap between data teams and creatives, fostering a culture where intuition and analytics work in harmony to drive more impactful creative outcomes.
Dylan Perez Neider
Sr. Solutions Engineer
Sigma Computing
Workshops
Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics
Think BI is dead? Will natural language replace the dashboard? Sigma's Wednesday morning workshop breaks down why generative AI is a powerful supplement to, not a replacement for, BI practices, and examines how to effectively embed AI into your analytics workflows.
Dadi Atar
VP Product
Montara
Workshops
Analytics and the dark side of the Analytics Development Lifecycle
In this insightful session, we examine how the Analytics Development Lifecycle (ADLC) introduced essential structure to data workflows but unintentionally created organizational bottlenecks by limiting data warehouse access to engineers. Our speaker shares how innovative data teams are successfully enabling analysts, product managers, and data scientists to migrate their work to data warehouse tables while maintaining strong data governance and quality standards. Discover practical DataOps strategies that balance democratized data access with the structured quality assurance processes that modern enterprises require for effective data management.
Sudarsan Lakshmi
Head of Engineering
e6data
Workshops
Everything Everywhere All at Once: Object Store Native
Discover how e6data’s lakehouse compute engine runs complex, high-concurrency SQL analytics and AI workloads 10x faster than all leading engines at one-third the cost, with zero data movement. Learn how e6data’s atomically scalable lakehouse architecture achieves sub-second latencies even under heavy concurrency. This technical deep-dive covers e6data’s Kubernetes-native architecture, disaggregated compute design, and open table format integration, illustrating the future of SQL analytics and AI workloads through real-world performance benchmarks and production case studies. Learn how an object-store-native approach unlocks “everything, everywhere, all at once” in modern data ecosystems.
Beto Ferreira De Almeida
Staff Engineer
Preset
AI & Data Culture
Data Should be Invisible

The modern data landscape is dominated by complexity: tables, schemas, pipelines, warehouses, and more. Yet the most successful data platforms share a common principle—they make data itself invisible to the end user. When data infrastructure functions optimally, it's like good plumbing: you only notice it when something breaks. Organizations often fixate on the mechanics of data while losing sight of what truly matters: metrics, dimensions, and semantics. When users engage with meaningful abstractions rather than technical details, they make better decisions faster. In this talk, you'll learn strategies for making data invisible through real-world abstraction success stories, designing effortless interactions, and implementing governance through abstraction. Walk away with practical ways to assess your data stack, advocate for user-centric approaches, and measure progress—making your data platform not just powerful, but invisible in all the right ways.

Josh Curl
Co-Founder & CTO
Hightouch
AI & Data Culture
Bridging the AI Implementation Gap: Strategies for Embedding Data Professionals with Business Units

At the foundation of AI project failures lies a critical gap between data teams and business reality. On top of this gap, data quality issues, unexpected privacy concerns, and tools that don't align with actual business problems arise to hinder or block implementation. As we've built our own AI product—AI Decisioning—and implemented it with customers, we've learned that successful AI implementations depend on embedding data teams within business units. Embedding doesn't mean breaking apart your data team and dispersing it throughout every other department. It means establishing focused partnerships where data team members are deeply integrated into business teams' daily workflows and decision-making processes while remaining connected to the central data organization. This embedding creates a virtuous cycle: data teams gain deep domain knowledge, business professionals see improved data quality and gain data facility, and together, data and business teams implement AI solutions that solve real problems. In this talk, we'll share concrete examples of how data teams (especially data scientists) and marketing have worked together in successful AI Decisioning implementations. We’ll derive strategies to implement this organizational pattern and enable a company to move from analytics to actions and from data teams as service providers to active collaborators. While our case studies focus primarily on marketing partnerships, the embedded partnership model we present applies equally to other business functions including product development, operations, and customer service teams.

Tomás Kofman
Co-Founder & CEO
Not Diamond
Lightning Talks
How to Build Your Own Model Router

Building Cost-Effective LLM Routers: Boost Accuracy 25% While Cutting Costs 90% | This session reveals how to build intelligent model routers that dynamically direct inputs to the optimal large language model (LLM) for each specific task. Attendees will learn practical implementation strategies for multi-model LLM systems that significantly improve performance metrics—achieving up to 25% higher accuracy while reducing operational costs by as much as 90%. The presentation covers essential routing methodologies, evaluation frameworks, and scalable architectures for production deployments. Developers and ML engineers will gain actionable insights for overcoming technical challenges in multi-model LLM systems, optimizing both performance and cost-efficiency in generative AI applications. Perfect for teams looking to maximize ROI from their AI infrastructure while maintaining high-quality outputs.

Diptanu Gon Choudhury
Founder & CEO
Tensorlake
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Gleb Mezhanskiy
Co-Founder & CEO
Datafold
Data Eng & Infrastructure
The Future Of Data Engineering: From Unstructured To Structured for Agent Systems
Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.
Nathan Sooter
Sr. Manager, RevOps Analytics & Insights
1Password
Analytics & BI
Go-To-Market Data Enrichment: Practical Strategies to Drive Business Value
Let’s be frank: data teams and sales teams don’t always see eye to eye. Data teams see Salesforce as a swamp of messy, user-generated chaos. Sales teams often see data teams as a slow-moving black box. The result? Frustration on both sides and a missed opportunity to drive business value. But what if data teams weren’t just seen as pipeline or dashboard builders, but as strategic partners in revenue growth? In this session, we’ll show how simple, no-fuss data engineering and enrichment can transform the way sales teams trust and use data. Through real-world examples, like cleaning up CRM records with the power of LLMs, we’ll explore how small but intentional changes in ingestion and modeling can change the perception of a data team. We'll explore practical strategies to make your work more visible, valuable, and aligned with the GTM team.
Margaret Quigley
ex-Cohere Head of Data Acquisition
MQ Consulting
Lightning Talks
Ethical Data Acquisition & Sales in the AI Age
Learn strategies for sitting on both sides of a data acquisition negotiation table: not just how to evaluate and price new data for training and evaluating AI/LLMs, but also how to efficiently package and sell it as a data owner. Margaret will cover real-world examples from her 7 years of experience running Data Acquisition & GTM teams with leading AI companies and data vendors, touching on nuances of ethics and due diligence, security and storage, and transparency and accountability.
Jonathan Mortensen
CEO
Confident Security
ML OPs & Platforms
The Unofficial Guide to Apple’s Private Cloud Compute
In October 2024, Apple released a new private AI technology called “Private Cloud Compute” onto millions of devices. It brings the same level of privacy and security that a local device offers to an “untrusted” remote server. This talk discusses how Private Cloud Compute represents a paradigm shift in confidential computing and explores the core advancements that allowed it to go mainstream. We’ll examine its novel architecture, which allows developers to run sensitive, multi-tenant workloads with cryptographically provable privacy guarantees at scale and at reasonable cost. Attendees will leave with an understanding of how to leverage this technology for data and AI applications where privacy and security are paramount.
Skip Everling
Head of Developer Relations
Kolena
Workshops
AI-Powered Automation: Supercharge Data-Intensive Workflows with Intelligent Agents
In today’s fast-paced, data-heavy industries, crucial information is often buried in PDFs, contracts, compliance reports, and other unstructured sources, slowing down decision-making and increasing risk. Join us for an interactive workshop where we’ll show how to use AI agents to automate data-intensive workflows for analysts, compliance officers, underwriters, diligence teams, and knowledge workers. In this hands-on session, you’ll learn how AI can automate repetitive tasks, freeing your team to focus on high-value analysis; enhance accuracy and consistency, reducing errors and ensuring data integrity; accelerate decision-making with faster data extraction and smarter insights; and scale operations, handling complex tasks across massive document sets with ease. We’ll walk through real-world use cases, including compliance assessments, contract analysis, and risk evaluation, demonstrating how AI agents can streamline workflows, reduce bottlenecks, and drive smarter decisions. If you deal with large volumes of documents, this workshop is for you. Walk away with actionable strategies to boost productivity, reduce risk, and gain a competitive edge.
Jacob Matson
Developer Advocate
MotherDuck
Workshops
More Than a Vibe: AI-Driven SQL that Actually Works
In this hands-on workshop, we will demonstrate how AI can empower you to "vibe code": using AI to write accurate SQL, enabled by the magic of MotherDuck & DuckDB. Participants will work with a real-life spatial dataset to tackle real-world challenges and see firsthand how AI-driven DuckDB SQL can transform data handling into a rapid, low-risk, interactive process. By the end of the workshop, participants will have experienced an end-to-end workflow: ingesting and querying spatial data with DuckDB/MotherDuck, refining query results with AI, and finally presenting insights through Python visualizations. This session is designed to help you confidently incorporate AI into your coding processes, transforming how you approach data analysis and decision-making in real-world business scenarios.

Key Components of the Workshop:

1. Dataset Handling: Participants will work with a spatial dataset to evaluate potential locations for opening a new BBQ restaurant. Thanks to MotherDuck, the dataset is easily brought down locally in a highly compressed format, ensuring a quick and safe environment for experimentation.

2. Live AI-Assisted Coding: The workshop will feature a live demonstration where an AI tool iteratively generates SQL queries. Rather than pre-defining metrics, the AI assists in exploring and defining the spatial parameters necessary to identify the optimal restaurant location—a process that mirrors real-world, dynamic decision-making.

3. Real-Time Data Visualization: As queries are refined and executed, Python will be used to chart the results on the fly. Utilizing uv for environment management alongside visualization libraries such as Seaborn and Matplotlib, participants will see how spatial insights are translated into clear, actionable charts.

4. Iterative, Low-Risk Workflow: The session emphasizes a low-risk, experimental approach. If the AI-generated code isn't perfect, no harm is done—files can be quickly deleted or corrected, encouraging a creative, hands-on learning environment where trial and error lead to deeper understanding.
Cole Bowden
Developer Advocate
Firebolt
Workshops
The Power of Low Latency Data for AI Apps
Retrieval-augmented generation (RAG) has transformed AI applications by grounding responses in external data. But it can be better. By pairing RAG with low-latency SQL analytics, you can enrich responses with instant insights, leading to a more interactive and insightful user experience with fresh, data-driven intelligence. In this talk, we’ll demo how low-latency SQL combined with an AI application can deliver speed, accuracy, and trust.
Rui Lopes
Head of AI
DataLinks
Workshops
Powering AI Workflows with Tabular Graphs
DataLinks is the new semantic layer for AI systems. Join us in this workshop to gain a concise overview of our entity-linking technology, backed by two dynamic demonstrations. First we will enable you to experience firsthand how our intuitive user interface simplifies complex data integration, visualization, and exploration, enabling rapid discoverability and seamless dataset linkage. Then we invite you to discover the flexibility of our API and Python SDK, designed for developers to effortlessly integrate automated entity resolution and graph-based insights into their workflows and applications. Finally, we'll show how to leverage our platform for natural language search over your data enabling AutoRAG for your application.
Issac Roth
Co-Founder & CEO
Orama
Lightning Talks
OramaCore: A Search Database with LLMs Built-In
In this fast-paced talk, we’ll dive right into why the world needed another database, this time with multiple LLMs and a JavaScript engine right in the same process. A database that runs on GPUs? Why, why, why? It turns out this is the ultimate platform for agentic AI, like the SaaS copilots and answer engines that developers create with Orama. That’s why! We’ll look at how the database is constructed, the algorithms involved, how we made it fast, and a little of what you can do with it. OramaCore is open source and just released!
Alexy Khraborov
AI/ML Community Architect
Neo4j
Lightning Talks
OAKS: Open Agentic Knowledge Stack

The first two years of the GenAI revolution have bent toward open source: open-source models have reached the state of the art, and most of the ecosystem around AI is open source. The key to AI adoption is properly organizing and using business knowledge. In industry, LLMs are giving way to Small Specialized Models (SSMs) used by Domain Expert Agents (DXAs). Their work should be structured according to domain requirements, which calls for structured output. Organizing and using domain knowledge for AI has long been the province of Knowledge Graphs (KGs). At Neo4j, our KG leadership now powers the rise of GraphRAG, a better form of context traversal that we lead alongside Microsoft, Amazon, Google, and other GenAI partners. We also integrate with many OSS AI startups to build a better AI stack around GraphRAG. Neo4j has joined LFAI to bridge enterprise AI adoption and startup innovation, centered on structured knowledge. In this talk, we describe OAKS, a set of projects, communities, and technologies that make up the Open Agentic AI Knowledge Stack. We show where the most value will be created and how the OSS AI ecosystems come together to build and deliver it.

OAKS consists of structured input, knowledge transformation, and structured output. We show the agentic AI architectures emerging around AI memory and graph-based agentic workflows, along with frameworks for scalable message passing, knowledge encapsulation, and colocated knowledge and computation for web-scale routing. We invite the community to join us!

Anant Agarwal
Staff Software Engineer & Engineering Lead
Instacart
Data Eng & Infrastructure
Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Building High-Throughput Data Orchestration: Instacart's Journey to 20M Daily Workflows
Explore how Instacart built an enterprise-grade orchestration system that handles 20 million workflows a day across diverse technical domains. Learn the implementation details of its cloud-native platform, which combines Apache Airflow and Temporal for robust scheduling and execution, and dive deep into YAML-based workflow definitions, GitOps deployment patterns, and the observability solutions that enable reliable scaling. You'll come away with practical insights from years of production experience, applicable to startups and enterprises alike as they build scalable data infrastructure.

Skyler Thomas
Co-Founder & CTO
Cake AI
GenAI Applications
Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source
RAG systems that work in the real world are not just the trivial extract, vector-search, and rerank pipelines that simplistic "Introduction to RAG" tutorials suggest. After this talk, you will understand how to think about the design and construction of real-world RAG and GraphRAG systems that can scale to hundreds of millions of documents or billions of vectors. You will learn about the complex orchestration of multiple libraries, and how to use tools and frameworks built on open standards like OpenTelemetry or OpenInference to monitor and debug these complex RAG orchestrations. Topics will include scalable RAG/GraphRAG architectures, complex extraction flows, and embedding-model and re-ranking considerations. We will dive deep into integrating libraries like Ray, LangChain, LlamaIndex, DSPy, Phoenix, Weaviate, PgVector, GraphRAG, LangGraph, Airflow, KFP, and vLLM into cohesive solutions that actually scale. We will discuss the patterns and anti-patterns Cake has learned building and deploying these systems for real customers. If time permits, we will address advanced topics like complex table detection and extraction for financial data and complex agentic flows for handling heterogeneous datasets.
Brenna Buuck
Developer Evangelist
MinIO
Lightning Talks
The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse
Data Lakehouse Architecture: Unifying Batch and Real-Time Data Processing
Is your organization stuck choosing between batch and streaming? The reality is, you probably need both. This session explores how modern data lakehouse architectures are breaking down the false dichotomy between batch and real-time processing. We'll examine how innovative organizations are using lakehouse platforms to handle everything from millisecond-latency queries to massive batch analytics jobs on a single, unified platform. Learn how this hybrid approach is transforming data infrastructure, reducing complexity, and enabling teams to build more flexible, future-proof data systems.
Colleen Tartow
Senior Director, Enterprise Data Engineering
Capital One
AI & Data Culture
No More BS: How (and When) to Really Leverage AI
Successful AI implementation hinges on a solid foundation of data quality and governance, yet the current hype often overshadows the practical considerations needed to build that foundation. And while AI holds immense potential, it's crucial to evaluate whether it's truly the optimal solution for a given business problem, since simpler, more established methods may be equally or more effective. We present a practical framework for making that assessment and encourage some good old-fashioned critical thinking. Join Colleen Tartow and Lindsay Murphy for a data-driven conversation exploring AI's true viability.
Marco Slot
Software Imagineer
Crunchy Data
Databases
Converging Database Architectures: DuckDB in PostgreSQL
Traditionally divided between transactional and analytical systems, databases are converging through innovative architectural approaches. This talk explores the fusion of PostgreSQL and DuckDB, demonstrating how embedding an OLAP database into an OLTP system can simplify data platforms. Attendees will learn about the motivations, challenges, and substantial benefits of creating a unified system capable of high-throughput transactions, fast analytical queries, and seamless data processing across different paradigms.
Anil Sadineni
Principal Software Engineer
1upHealth
Lightning Talks
A Modern Data Stack in Healthcare
The US Healthcare industry faces complex data exchange challenges, with legacy standards creating massive processing burdens. This session explores how emerging technologies like FHIR can transform healthcare data management by leveraging modern data stack approaches. Attendees will discover innovative strategies for addressing unique healthcare data challenges, including cross-entity data contracts, identity management, and end-to-end lineage preservation. Learn how technologies from social media, advertising, and finance can revolutionize healthcare data processing, overcoming traditional interoperability and scalability limitations.
Dr. Greg Michaelson
Co-Founder & Chief Product Officer
Zerve AI
Workshops
Scaling GenAI & Agentic Workflows for Practical Solutions with Zerve
Enterprises investing in Generative AI (GenAI) or agentic workflows need more than just cutting-edge models: they need scalable, cost-efficient systems that deliver real business impact. In this session, we’ll show how Zerve unlocks the full potential of GenAI with its distributed computing engine, The Fleet. You’ll learn how organizations as advanced as Canal+ and NASA, as well as cutting-edge startups, are streamlining AI development, reducing infrastructure costs, and turning GenAI into a scalable, high-impact business solution.
CURATING TRACK SPEAKERS. STAY TUNED.

WHY ATTEND?

Go beyond just conference talks and engage directly with our community.

  • Expo Hall & Networking
  • Interactive Workshops
  • Speaker Office Hours
  • Drinks & Demos

Expo Hall & Networking

Discover cutting-edge tools and technologies from innovators at the forefront of AI & data. Explore, connect, and get a firsthand look at what’s next.

Interactive Workshops

Why pay extra to level up your career? Gain practical training on the latest data tools from the architects & builders of the tools themselves. (All workshops are included in the ticket price.)

Speaker Office Hours

Our speakers provide real technical depth and go beyond whitepaper-level details. Office Hours sessions with speakers follow each talk and feature additional in-depth discussion opportunities for attendees.

Drinks & Demos

Rub shoulders with the brightest minds in AI & data. Come to make meaningful connections with startups, customers, peers, investors & more.

AI Launchpad

Join us on Day 1 during our 🥳 Community Party to hear from 6 exceptional AI startups. 

Brought to you by Zero Prime Ventures.

3-Day Conference Passes

 Startup Ticket 
$799

Founder-Friendly Pricing

Our special discounted rate for companies that have raised <$5M

 Regular Ticket 
$1,999

Discounted Group Pricing
 
Buy 5 Tickets = $1,200/each
Buy 10 Tickets = $1,000/each

Investor Ticket 
$4,999

(They can afford it 💸)
 
Investor tickets help subsidize our Startup Tickets. Thank you :)

💥 INCLUDED IN ALL TICKETS 💥
3 Full Days • Free Workshops • Speaker Office Hours • Community Party • Data Council T-shirt & Tote • Breakfast, Lunch & Snacks • Coffee & Drinks • Locally Sourced Food • Talk Recordings • Fun Networking

👮‍♀️ NEED MANAGER APPROVAL?
Check out our Convince Your Boss Email Template

See You in Oakland!

This year, we're excited to call the historic Oakland Scottish Rite Center home to Data Council 2025.

 

Nestled on Lake Merritt with stunning lake views, this architectural gem puts you steps from downtown's best hotels, dining, and nightlife. It's just 15 minutes from BART, or a scenic ferry ride from downtown San Francisco.

 

547 Lakeside Dr, Oakland, CA, 94612

Oakland Scottish Rite Center

The Temple of Data

April 22 - 24, 2025

Lake Merritt, Oakland

Why Attend Data Council?

Learn from Industry Experts

Get architectural insights and best practices straight from the pioneers building the future of data & AI, no marketing fluff here.

Hands-On Experiences

Put theory into practice through interactive workshops and other learning opportunities, such as our unique office hours, where you can meet any speaker in a small-group setting.

Unparalleled Networking

Get exclusive access and connect with engineers and founders who speak your language. No suits and sales pitches, just real pros sharing their work.

Meet the Hosts

Content quality sets Data Council apart. Unlike other conferences that simply accept abstracts as-is, our track hosts go the extra mile to hand-select presentations and collaborate with speakers on their topics, ensuring the highest-value talks take the stage.

Bryan Bischof

Head of AI

Theory Ventures
Carlos Aguilar

Founder

Hashboard
Daniel Francisco

Director of Product

Meta
Maggie Hays

Community Product Manager

Acryl Data
Roger Magoulas

Principal

Almost Data
Sai Srirampur

Principal Engineer

Clickhouse
Scott Breitenother

Founder

Brooklyn Data
Sean Anderson

Head of Product Marketing

Vectara
Sean Taylor

Data Scientist

OpenAI
Swyx (Shawn) Wang

Co-Host

Latent.Space Podcast
Tristan Zajonc

CEO & Co-Founder

Continual

About Our Tracks

Our carefully curated tracks balance proven technical foundations with emerging data & AI trends. Get real frameworks, techniques and actionable knowledge straight from seasoned practitioners.

Data Eng & Infrastructure
AI Engineering
Data Science & Algos
GenAI Applications
Analytics & BI
MLOps & Platforms
Foundation Models
Databases
AI & Data Culture
Lightning Talks

FAQ

Do you offer group ticket rates?

Yes! We <3 teams at Data Council and offer streamlined packages for groups of 5 or 10, with savings of 40% to 50% off the regular ticket price. Best of all, you can purchase them directly, with no invoicing or back-and-forth with a sales rep needed. Simply visit our ticketing site to learn more about group rates.

Do you offer startup or non-profit tickets?

Yes, we offer discounts for startups (must have raised <$5M), non-profits, government agencies, and academic students & faculty. For startup tickets, please see our ticketing site; for non-profit and academic tickets, please contact community@datacouncil.ai for more information.

When and where will Data Council 2025 be held?

We're excited to bring Data Council back to the Bay Area on Apr 22-24, 2025! The event will be held at the historic Oakland Scottish Rite Center, right off the shores of beautiful Lake Merritt in Oakland, CA.

Are there extra costs to be aware of for attending?

Once you purchase your ticket, all talks, workshops, and networking opportunities are available to you as part of the Data Council experience. However, external costs such as travel, lodging, and commuting are your responsibility.

Meet the Team

Pete Soderling

Founder & GP at Zero Prime Ventures
Yang Tran

Partner at Zero Prime Ventures
Tim Wu

Head of Marketing at Data Council
Missy Bass

Events Director at Data Council
Gillian Jarvis

Design Advisor at Data Council