Data Council 2025, Apr 22-24, Oakland, CA

JOIN YOUR TRIBE

Our attendees are AI engineers, founders, CTOs, AI researchers, Heads of Data, and investors who are all building the future of data.

Featured Keynotes

Naveen Rao

VP of AI

Databricks

Denis Yarats

Co-Founder & CTO

Perplexity

Aaron Katz

Co-Founder & CEO

ClickHouse

Martin Casado

General Partner

a16z

Sharon Zhou

Founder & CEO

Lamini

Michele Catasta

President

Replit

Jake Brill

Head of Product - Integrity

OpenAI

Rachad Alao

Senior Engineering Director

Meta

Julien Le Dem

Principal Engineer

Datadog

Joseph Gonzalez

Professor

RunLLM & UC Berkeley

Krishnaram Kenthapadi

Chief Scientist, Clinical AI

Oracle Health

George Mathew

Managing Director

Insight Partners

Daniel Olmedilla

Distinguished Engineer, AI & Trust

LinkedIn

Sumti Jairath

Chief Architect

SambaNova Systems

Featured Keynotes

Naveen Rao

VP of AI

Databricks

Keynotes

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

As organizations navigate the AI revolution, the traditional boundaries between data infrastructure and AI systems are blurring. This session explores the critical convergence point where data management meets machine intelligence, examining how this intersection is reshaping enterprise technology stacks. Our keynote panelists bring complementary perspectives from operating and investing in this rapidly evolving landscape. Drawing from experiences spanning neuroscience, hardware architecture, product development, and venture capital, they'll unpack the technical and strategic considerations for organizations building modern data + AI platforms. Join us for an insightful discussion on how this convergence enables scalable AI adoption, the architectural patterns emerging across industries, and what the future holds as data infrastructure and AI capabilities become increasingly interdependent in driving business transformation.

Denis Yarats

Co-Founder & CTO

Perplexity

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

This keynote panel features Denis Yarats and Joseph Gonzalez -- two pioneers bridging academic theory and practical application. Joseph Gonzalez has transformed his Berkeley research into tangible solutions through LM-Sys and Gorilla projects, now bringing his expertise in machine learning and robotics to RunLLM.com after successfully launching Turi based on his doctoral work. Denis Yarats complements this approach with his reinvention of information discovery at Perplexity, where he's leveraging his NYU PhD and Facebook AI experience to develop Comet, a revolutionary "browser for agentic search." Together, they exemplify how rigorous academic foundations can be transformed into technologies that solve real-world problems and reshape our digital interactions.

Aaron Katz

Co-Founder & CEO

ClickHouse

Keynotes

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Join us for an authentic fireside chat that cuts through industry hype to explore how real-time data infrastructure is transforming analytics and AI. We'll examine technical hurdles in processing data at scale in real-time and the architectural decisions enabling performance breakthroughs. The discussion covers data processing evolution, performance bottlenecks in distributed systems, observability innovations, and approaches for maintaining consistency while increasing throughput. Gain insights into how real-time processing creates competitive advantages for business intelligence and AI applications, with honest assessments of implementation challenges and practical solutions.

Martin Casado

General Partner

a16z

Keynotes

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Sharon Zhou

Founder & CEO

Lamini

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Michele Catasta

President

Replit

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Jake Brill

Head of Product - Integrity

OpenAI

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Join us for a keynote panel that moves beyond theoretical discussions of AI ethics to explore the practical realities of implementing responsible AI safeguards. This conversation will unpack the complex trade-offs and technical challenges faced when deploying AI systems at scale. Panelists will share insights from building fairness and privacy protections into major platforms while maintaining innovation, and discuss how responsible AI has evolved into a business imperative. Topics include creating effective trust and safety protocols for generative AI, developing robust safeguards for user-generated content, implementing fairness frameworks across diverse products, and managing the tension between rapid deployment and thorough safety testing. Expect candid discussion about governance structures that work, persistent technical hurdles, and lessons learned from high-stakes incidents that shaped today's AI safeguards.

Rachad Alao

Senior Engineering Director

Meta

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Julien Le Dem

Principal Engineer

Datadog

Keynotes

The Deconstructed Database and the Advent of the Open Data Lake

In 2018, Julien Le Dem described how the components of databases, distributed or not, were being commoditized as individual parts that anyone could recombine into use-case-specific engines. Given their constraints, engineers could leverage those components to build a query engine that solves a specific problem much faster than building everything from the ground up. He called this idea "the Deconstructed Database" and spoke about it at a previous edition of Data Council. Fast forward to today: the big data ecosystem has matured and evolved from a melting pot of competing projects into a more composable ecosystem organized around a few open source standards. It's been incredible to see the vision he outlined in his talk crystallize with the adoption of key components like Parquet, Arrow, Iceberg, Calcite, Substrait and OpenLineage. These tools, and others like them, provide an interoperability layer that enables harnessing data for many purposes without creating silos.

In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and more importantly, the contracts that keep them decoupled and composable.

Joseph Gonzalez

Professor

RunLLM & UC Berkeley

GenAI Applications

AGI Is Already Here (But It's Not What You Think)

The Future of AGI: Building Compound AI Systems | Explore a paradigm shift in AGI development through the lens of compound AI systems that integrate multiple LLMs with specialized tools. Learn how orchestrating diverse AI components can achieve human-level performance across broad task domains, demonstrated through RunLLM's AI support engineer implementation. Features practical approaches to building general-purpose AI workflows that combine speed, accuracy, and adaptability. Includes real-world case studies showing how compound AI systems are transforming customer support and service automation.

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Krishnaram Kenthapadi

Chief Scientist, Clinical AI

Oracle Health

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Krishnaram Kenthapadi is the Chief Scientist of Clinical AI at Oracle Health, where he leads AI initiatives for Clinical AI Agent and other Oracle Health products, focusing on modernizing clinical applications, reducing administrative burden for clinicians, and driving healthcare transformation through trustworthy AI. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, and privacy initiatives in the Amazon AI platform. Before that, he led similar efforts across different LinkedIn applications as part of the LinkedIn AI team, and served as LinkedIn's representative on Microsoft's AI and Ethics in Engineering and Research (AETHER) Advisory Board. He shaped the technical roadmap and led the privacy/modeling efforts for the LinkedIn Salary product, and prior to that, served as the relevance lead for the LinkedIn Careers and Talent Solutions Relevance team, which powers search/recommendation products at the intersection of members, recruiters, and career opportunities. Earlier, he was a Researcher at Microsoft Research Silicon Valley, where his work resulted in product impact (and Gold Star / Technology Transfer awards), and several publications/patents. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006, and his Bachelor's in Computer Science from IIT Madras.

He serves regularly on the program committees of KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. He received Microsoft's AI/ML conference (MLADS) distinguished contribution award, NAACL best thematic paper award, CIKM best case studies paper award, SODA best student paper award, and WWW best paper award nomination. He has published 40+ papers, with 2500+ citations, and filed 140+ patents (30+ granted). He has presented lectures/tutorials on privacy, fairness, and explainable AI in industry at forums such as KDD '18 '19, WSDM '19, WWW '19, FAccT '20, and AAAI '20, and instructed a course on AI at Stanford.

George Mathew

Managing Director

Insight Partners

Keynotes

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

Daniel Olmedilla

Distinguished Engineer, AI & Trust

LinkedIn

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Sumti Jairath

Chief Architect

SambaNova Systems

Keynotes

Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.

This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.

CURATING TRACK SPEAKERS. STAY TUNED.

100+ Speakers

Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

Talk Schedule

Lloyd Tabb

Founder/Former CTO - Looker & Co-creator of Malloy

Meta

Analytics & BI

Building Blocks: Reusing Queries in Semantic Data Modeling

Data exploration is like a sophisticated Lego set, where strategic piece selection transforms understanding. This session delves into advanced semantic data modeling, revealing how reusing queries creates more powerful, intelligent building blocks that enhance comprehension for both humans and AI. Attendees will learn how to move beyond traditional tables and measures, revolutionizing their approach to data analysis and uncovering deeper insights through innovative modeling techniques.

Hannes Mühleisen

Co-Creator of DuckDB

DuckDB Labs

Data Eng & Infrastructure

Liberate Analytical Data Management with DuckDB

DuckDB Analytics Engine: High-Performance Data Processing Without Limits | Discover how DuckDB's revolutionary in-process analytical engine transforms data warehouse capabilities through a lightweight, versatile architecture. The engine features state-of-the-art vectorized query processing, morsel-driven parallelism, and advanced memory management that scales from embedded devices to powerful servers. This talk dives deep into DuckDB's innovative design principles, implementation strategies, and optimization techniques that enable previously impossible use cases on single nodes. Learn from real-world applications and performance benchmarks demonstrating DuckDB's impact on modern data analytics workflows.

Nikunj Handa

Product Lead

OpenAI

Foundation Models

OpenAI’s Responses API: A New Foundation for Building with Models & Tools

Last month, OpenAI introduced the Responses API: a programmatic agent API businesses can use to perform a wide variety of tasks. With this new primitive, we radically simplified integration, transforming what previously took hundreds of lines of code into just a few. Built from the ground up based on insights from thousands of developers who have used the Chat Completions and Assistants APIs, Responses reimagines simplicity, performance, and flexibility, enabling seamless integration of advanced reasoning, multimedia inputs, and multi-step workflows. In this talk, I'll walk through the design decisions and engineering challenges behind Responses. You'll learn how we anticipated developer needs to create an API engineered for agent-like use cases, capable of handling simultaneous tool calls and seamless multi-turn conversations. We'll explore key features like built-in state management, semantic streaming, intelligent token truncation, and support for hosted tools (e.g., web search, file search, and computer operations) that significantly reduce complexity and enhance real-time interactions. We’ll also talk about how Responses empowers developers to build faster, smarter, and more responsive AI applications than ever before, driving the next wave of intelligent, agentic experiences.

Ravin Kumar

Senior Researcher

Google DeepMind

Foundation Models

Models as Tools: My Perspective On the Matter

You can look at GenAI from many perspectives. For me, the perspective shifts depending on whether I'm building products, training foundation models, or using GenAI day to day. However, most people aren't doing all these things. For the audience here I suggest focusing on one practical angle: LLMs as tools. In this talk I'll share how, from this perspective, LLMs are just like any other tool. Starting here ensures you begin from a grounded, realistic view before moving into the more exciting, hype-laden aspects of this new technology.

Hadley Wickham

Chief Scientist

Posit

Data Sci & Algos

LLMs for Data Science

Obviously everyone is super excited about LLMs right now, and while there's a large element of hype in the popularity, they are also genuinely useful. In this talk I'll give a roundup of data science things that I've found LLMs particularly useful for, broken up into three broad categories: writing code, writing prose, and rectangling fundamentally non-rectangular data (e.g. text, images, videos, audio).

Paige Bailey

AI Developer Experience Engineer

Google

Workshops

Introduction to Google DeepMind's Models: Gemini 2.0, Imagen 3, and Veo

This intensive workshop is designed for developers eager to explore the cutting-edge capabilities of Google's latest AI tools. Participants will gain hands-on experience working with the Gemini APIs, Google AI Studio, Veo 2, and Imagen 3, enabling them to build intelligent applications and generate stunning creative content. We'll also cover how to use Gemini 2.0 in developer tools like Cursor, Sourcegraph Cody, and more.

Ryan Blue

Creator of Apache Iceberg, Member of Technical Staff

Databricks

Data Eng & Infrastructure

Why is Everyone Talking about Apache Iceberg™? (From the Original Creator of Apache Iceberg)

This talk is a primer for Apache Iceberg™ from one of its original creators. In this talk Ryan Blue, CEO of Tabular (now part of Databricks) and the original creator of Apache Iceberg, discusses its origin and why it's even more relevant today. Ryan will discuss the early days of Apache Iceberg at Netflix, how the project evolved at Tabular, and how Tabular (now part of Databricks) will continue its mission of creating a universal format. Attendees will gain an understanding of Apache Iceberg and how open table formats like it are changing the analytic database industry.

Ganesh Ramanarayanan

VP Engineering

Hex

Analytics & BI

Multi-Modal Compute for Data Analytics

Following their groundbreaking Data Council 2022 presentation, Hex continues to push notebook technology boundaries with an innovative approach to data analytics. This session delves into their unique, fully parallelized, multi-modal backend, revealing how sophisticated computational techniques are transforming data processing. Attendees will explore cutting-edge methods that redefine performance, flexibility, and computational efficiency in modern data workflows, gaining insights into the next generation of analytical computing.

Raghotham Murthy

Software Engineer, Llama

Meta

Foundation Models

Building LLM Applications with Llama Stack

In this talk, Raghotham describes what it takes to build production grade LLM applications. Unlike regular applications, LLM applications are non-deterministic, and require a unique set of building blocks to support the full software development lifecycle from building to testing to deploying to monitoring to then improving the application. We will show how Llama Stack can be used to build and improve LLM applications in different environments – local development, cloud, on-prem, and mobile.

Tanya Bragin

VP Product

ClickHouse

Databases

Unbundling of the Cloud Data Warehouse

The era of proprietary cloud data warehouses in the last decade has revealed critical challenges: performance bottlenecks, escalating costs, and vendor lock-in. This session examines how open-source technologies and data lake standards are transforming the modern data stack. Explore how platforms like ClickHouse, Iceberg, and other open technologies are providing organizations with flexible, cost-effective alternatives to monolithic cloud data warehouses, enabling more diverse and efficient data workflows.

Ethan Rosenthal

Member of Technical Staff

Runway

Foundation Models

Building a Data Foundation for Multimodal Foundation Models

While it is often easier to simply throw more data at a problem, scale is not all you need when building multimodal foundation models. Data quality continues to be just as important as data quantity, and supporting “data-centric AI” requires lowering the barrier to data curation as much as possible. However, multimodal data curation presents unique requirements compared to conventional machine learning or business intelligence data management systems. The data is heterogeneous, ranging from scalars to embedding arrays to entire compressed videos. While the dataset sizes in terms of number of rows are not quite Big Data™, the number of bytes is massive, with high variance across columns. Given the storage size, it’s infeasible to construct and copy new training datasets for each model training job; training jobs must query the core datasets without copying them. Finally, large-scale distributed training jobs require fast random access, which bumps up against the limitations of typical solutions like partitioned Parquet files. In this talk, I will discuss how we built a petabyte-scale, multimodal feature lakehouse. This lakehouse supports analytical querying as well as serving features for large-scale distributed training jobs, such as those that were used for training Runway’s recent foundation models like Gen-3 Alpha.

Tengyu Ma

Co-Founder & CEO

Voyage AI

AI Engineering

RAG In 2025: State Of The Art And The Road Forward

Enterprise RAG Systems: Building Robust LLM Knowledge Integration | Master advanced techniques in Retrieval-Augmented Generation (RAG) for enterprise-scale language models. Learn strategies to overcome common RAG pipeline challenges including brittle parsers, suboptimal chunking, and manual query tuning. Deep dive into cutting-edge embedding models and reranking systems that enable automated, scalable knowledge retrieval. Discover practical approaches to building production-ready RAG systems that deliver consistent, high-quality results while minimizing maintenance overhead and manual optimization.

Charles Frye

Developer Advocate

Modal Labs

AI Engineering

What Every Data Scientist Needs To Know About GPUs

GPU Optimization for Data Scientists: Essential Knowledge from Silicon to PyTorch | Comprehensive guide to GPU architecture and optimization for modern machine learning workloads. Learn critical GPU concepts from hardware fundamentals to high-level frameworks, with focus on performance tuning for neural networks. Master practical techniques for optimizing system latency and throughput in popular ML frameworks including PyTorch, vLLM, and RAPIDS. Essential knowledge for data scientists and ML engineers working with GPU-accelerated workloads.

Shreya Rajpal

Co-Founder & CEO

Guardrails

AI Engineering

The Future Of Guardrails

AI Safety and Guardrails: Enterprise Framework for Reliable Generative AI | Explore next-generation approaches to implementing guardrails and safety measures in production AI systems, including RAG-enhanced chatbots and autonomous agents. Learn systematic methodologies for risk assessment, reliability monitoring, and failure prevention in enterprise AI deployments. Discover practical frameworks for implementing robust safety controls and guardrails across different AI architectures. Features case studies demonstrating improved system reliability and reduced risks through structured safety protocols and monitoring systems.

Eno Reyes

CTO

Factory

AI Engineering

Building Reliable Agentic AI Systems

Building Reliable Agentic AI Systems: Design Principles for Complex Autonomous Software | Explore cutting-edge approaches to designing reliable AI systems that operate autonomously in unpredictable environments. Learn architectural patterns from robotics, cybernetics, and biological systems for building predictable outcomes from non-deterministic components. Deep dive into practical strategies for implementing reliable agentic systems, with focus on stability, error handling, and performance monitoring. Discover emerging patterns for creating AI systems that achieve reliable results despite underlying stochastic processes.

Mike Driscoll

Co-Founder & CTO

Rill Data

Analytics & BI

A SQL-Based Metrics Layer for DuckDB and ClickHouse

The ability to aggregate raw data into summarized metrics and slice them across dimensions is at the core of analytics teams' work. This session reveals how Rill has developed a metrics layer that declares metrics entirely with SQL expressions, overcoming traditional limitations of metrics management. Attendees will discover how to use DuckDB and ClickHouse to generate multi-dimensional OLAP cubes, implement real-time data access with sub-second performance, and create uniform dashboards through a BI-as-code philosophy. Learn how to define, manage, and secure metrics using a SQL-based approach that transforms raw data into actionable insights.

Bryan Bischof

Head of AI

Theory Ventures

Data Sci & Algos

Failure Is A Funnel

LLM Quality Engineering: From Slop to Production | Learn systematic approaches to evaluating and improving LLM performance, with focus on transforming experimental models into production-ready systems. Master practical frameworks for quality assessment, iterative improvement, and building robust deployment pipelines. Features proven strategies for identifying failure patterns and establishing reliable production environments.

Nuno Campos

Founding Engineer

LangChain

Data Eng & Infrastructure

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Next-Generation Data Engineering: AI Agents, Knowledge Graphs, Memory, and Real-Time Systems | Discover how AI agents and foundation models are revolutionizing data engineering through advanced real-time data processing. Learn cutting-edge approaches to agent memory and knowledge representation using semantic layers, vector embeddings, and graph RAG (Retrieval Augmented Generation) systems that power modern AI applications. This expert session explores the evolution from traditional data modeling to dynamic knowledge graphs, with implementation strategies for building responsive, context-aware data platforms and agent memory. Industry leaders share practical insights on adopting these technologies and advancing your career in AI-driven data engineering, including real-world case studies and emerging best practices.

Michele Catasta

President

Replit

RAGs to Riches: Engineering the Future of LLM Systems

Michele Catasta

President

Replit

President Replit

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Jake Brill

Head of Product - Integrity

OpenAI

Guardrails for the Future: AI Safety and Responsible AI in Practice

Keynotes

Rachad Alao

Senior Engineering Director

Meta

Guardrails for the Future: AI Safety and Responsible AI in Practice

Keynotes

Julien Le Dem

Principal Engineer

Datadog

The Deconstructed Database and the Advent of the Open Data Lake

Keynotes

Joseph Gonzalez

Professor

RunLLM & UC Berkeley

AGI Is Already Here (But It's Not What You Think)

GenAI Applications

RAGs to Riches: Engineering the Future of LLM Systems

Keynotes

Krishnaram Kenthapadi

Chief Scientist, Clinical AI

Oracle Health

Guardrails for the Future: AI Safety and Responsible AI in Practice

Keynotes

George Mathew

Managing Director

Insight Partners

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

Keynotes

Daniel Olmedilla

Distinguished Engineer, AI & Trust

LinkedIn

Guardrails for the Future: AI Safety and Responsible AI in Practice

Keynotes

Sumti Jairath

Chief Architect

SambaNova Systems

Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

Keynotes

Han-chung Lee

Machine Learning Director

Moody's Analytics

The Model is the Product

Foundation Models

In the realm of machine learning, AI, and deep learning, the intelligence embedded within a system—the model—stands as the primary product and key differentiator. This talk explores how the intelligence component has evolved to become the central selling point across technological eras. We will examine the historical progression of how intelligence capabilities have increasingly defined product value, transforming from hardware differentiators like "Intel Inside" during the PC era, to software advantages, and now to model-centric offerings in today's AI landscape. The intelligence layer has become not just a feature but the core product itself. Additionally, we'll analyze how the definition of "model" itself has evolved alongside technological advancement, reshaping what constitutes a system's core value. Companies now face a strategic bifurcation: pursue a model-centric approach or focus on distribution-centered strategies. Each path carries distinct trade-offs, risks, and opportunities in today's competitive AI marketplace. Through case studies of industry leaders and emerging players, we'll demonstrate how the fundamental principle—"the model is the product, the distribution is the moat"—is reshaping competitive dynamics and business strategies across sectors.

Julian Hyde

Senior Staff Engineer

Google

More Than Query: Future Directions of Query Languages, from SQL to Morel

Analytics & BI

“Never bet against SQL,” the saying goes. But what exactly do we want from a query language, and will SQL always be the right tool for the job? What separates a query language from a regular programming language like Python or a framework like Apache Spark? This talk looks at recent efforts to extend SQL with measures and pipe syntax, and then gives an introduction to Morel. Morel is an exciting language that combines the strong type system and expressive power of a functional programming language with the efficiency of a declarative query language. Morel can express not just queries but also data-intensive programming, logic programming and mathematical optimization, and has the potential to replace today’s data frameworks. This talk explores the many ways that we use query languages today – from simple lookup queries and transactions to data engineering, data science and analytics – and related areas such as data-intensive programming, mathematical optimization and logic programming.

Pedram Navid

Head of Data Engineering & DevRel

Dagster Labs

Write Less More: How Dagster Rebuilt Our Docs from the Ground Up

Lightning Talks

Documentation can become a critical pain point for technical teams, transforming from a helpful resource into a maintenance nightmare. In this candid session, Dagster reveals their radical approach to documentation reconstruction, demonstrating how a complete ground-up rebuild can revolutionize user experience. Attendees will dive deep into the strategic decision to completely overhaul their documentation, exploring the challenges of incremental improvements and the transformative power of a fresh perspective. Learn how radical rethinking can turn documentation from a source of user frustration into a powerful communication tool that truly serves the community.

Yusuf Ozuysal

Director of Engineering, AI

Snowflake

AI Your Way with All-In-One Access

Workshops

Break bread with us while exploring the latest in LLM inference! Whether you’re a startup or seasoned developer, building with AI requires quick, easy access to top-tier models—without juggling multiple subscriptions. Snowflake is (now) the only platform where you can access Claude 3.6/3.7 Sonnet, GPT-4, O3-mini, and OpenAI embeddings through a single, Cloud Service Provider-agnostic API. We'll explore how a unified gateway for all your essential models can streamline AI pipelines at scale. Plus, our research team will showcase cutting-edge innovations in OSS model inference, pushing the boundaries of throughput and latency at the Pareto frontier. Join us for a unique Lunch & Learn where you'll experience the latest AI innovations firsthand and provide feedback that shapes our product roadmap.

Paul Dix

Founder & CTO

InfluxData

Building InfluxDB 3 Core: A Real-Time Columnar DB and Data Processor on Object Storage

Databases

InfluxDB 3 Core reimagines time series databases with a ground-up Rust rewrite using Apache Arrow, DataFusion, and Parquet. This session explores an innovative diskless architecture that leverages object storage for persistence, featuring a sophisticated caching system enabling real-time data ingestion and querying. Attendees will discover how an embedded Python VM transforms the database into a comprehensive data collector, monitoring agent, and data transformation platform.

Hamel Husain

Machine Learning Consultant

Parlance Labs

The Model is Not the Product

Foundation Models

This Data Council 2025 talk is in development. Check back soon!

Chenggang Wu

Co-Founder & CTO

RunLLM

AGI Is Already Here (But It's Not What You Think)

GenAI Applications

Alexa Garrison

VP Data & Business Operations

Splice

Building High-Impact Data Teams in an AI-Driven World

AI & Data Culture

This Data Council 2025 talk explores how organizations can build strong data teams and empower them to drive impactful decision-making, regardless of size or resources. More details to be announced...

Etienne Dilocker

CTO

Weaviate

The Agentic Database: A New Way to Interact with Your Data

Databases

For decades, database interactions have been constrained by traditional Create, Read, Update, and Delete (CRUD) operations, but the emergence of AI agents is poised to revolutionize this paradigm. This session explores a transformative approach to database interaction, where databases become collaborative partners capable of understanding complex, natural language commands. Attendees will discover how future databases might interpret sophisticated requests like "Translate all documents to Spanish and summarize them" or "Extract the 2024 Sales numbers and map out their correlation with events and feature releases." By moving beyond vector search and similarity matching, this talk reveals a groundbreaking vision of databases as intelligent, context-aware systems that can comprehend, process, and execute nuanced human instructions.

Samuel Colvin

Founder

Pydantic

Pydantic: An Opinionated Blueprint for the Future of GenAI Applications

Workshops

AI application development doesn't require reinventing software engineering. This transformative talk presents a practical blueprint for building maintainable AI systems using existing tools like Pydantic as the foundation. Learn how to implement critical components: strict data validation at API levels, self-correction mechanisms for enhanced accuracy, automated schema generation for LLM tool calls, continuous evaluation frameworks, and comprehensive observability solutions. Through concrete examples and code snippets, discover how familiar tools can create robust AI applications without unnecessary complexity. Perfect for developers looking to integrate AI functionality into larger software systems efficiently.

Andy Pavlo

Assistant Professor of Databaseology

Carnegie Mellon University

What Goes Around Comes Around... and Around...

Databases

Doesn't it feel like there is always a new crop of database management systems pushing the idea that the relational model is outdated and SQL is dying? Vector database proponents have recently taken up this mantle, fueled by AI/ML technologies. Before that, NoSQL users claimed RM/SQL was insufficient for "webscale" applications. And in the 1990s, object-oriented database vendors wanted developers to switch to their systems. Database history doesn't repeat, but it rhymes. In this talk, Professor Andy Pavlo presents the 60-year history of data modeling research and demonstrates why RM/SQL is the preferred default choice for database applications of any size. All efforts to completely replace the data model or query language have failed. Instead, SQL absorbed the best ideas from these alternative approaches and remains relevant for modern applications.

Dhruv Singh

Co-founder & CTO

HoneyHive AI

Eval Agents: How to Solve Error Cascades in Agents

AI Engineering

Agents or RAG chatbots are multi-turn AI systems. Multi-turn means interacting back-and-forth with humans. These systems face a fundamental challenge: errors compound and cascade with each interaction. In this talk, we'll go through real-world examples of agents failing in spectacular ways when one step goes wrong - overconfidence, manipulation, looping actions, and more. After doing so, we'll examine how agent builders use "eval agents" tuned on real-world interactions to evaluate agents and even use them as verifiers to improve performance in production! By the end of the talk, you'll have learned about the new world of trajectory evaluation needed to evaluate agents accurately.
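To make the compounding concrete: if each step of a trajectory succeeds independently with probability p, an n-step trajectory succeeds end to end with probability p^n. The sketch below computes that for a few trajectory lengths. The independence assumption and the 95% per-step figure are illustrative choices, not numbers from the talk.

```python
def trajectory_success(step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_success ** steps


# Illustrative per-step success rate of 95% across trajectories of growing length.
for n in (1, 5, 10, 20):
    print(n, round(trajectory_success(0.95, n), 3))
```

Under these assumptions, a step that is right 95% of the time leaves only about a 60% chance that a 10-step trajectory is right throughout, which is why evaluating whole trajectories matters more than scoring single turns.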

George Fraser

Co-Founder & CEO

Fivetran

Look Ma, No Data Warehouse!

Workshops

Modern data lakes promise affordability and scalability, but using them can be a headache. Cloud data warehouses make querying easy, but they come with a hefty price tag and extra complexity. What if you could get the same ease of use without the cost and lock-in?

In this session, we’ll show you how to leverage open-source software to build a fully functional, queryable analytics powerhouse using DuckDB, Fivetran, and Polaris Catalog. We’ll walk through how to:

1. Load data that is automatically converted to Iceberg open table format

2. Run SQL queries using DuckDB’s new Iceberg extension

3. Run transformations directly on data stored in your data lake with a new dbt adapter

4. Get started easily with a practical, hands-on demo

No vendor lock-in, no unnecessary complexity—just an open-source-powered approach to enabling advanced analytics and AI. If your data warehouse is holding you back or eating away at your budget, this session is for you!

Simon Eskildsen

Co-Founder

Turbopuffer

Billion-Scale Vector Search on Object Storage

Data Eng & Infrastructure

Vector Search at Scale: How Notion Built Billion-Vector Search Infrastructure | Explore the architecture behind Notion's enterprise-scale vector search system, powering one of the largest semantic search implementations in production. Learn advanced techniques in embedding pipeline design, distributed vector processing, and optimal storage strategies using Spark and Turbopuffer. This technical deep-dive covers LSM indexing, RAG (Retrieval Augmented Generation) implementation, and practical approaches to query optimization. Discover battle-tested strategies for building and scaling production-ready vector search systems capable of handling billions of vectors with high performance and reliability.

Vishnu Vasanth

Co-Founder & CEO

e6data

Everything Everywhere All at Once: Object Store Native

Workshops

Discover how e6data’s lakehouse compute engine runs complex and high-concurrency SQL analytics and AI workloads 10x faster than all leading engines at 1/3rd the cost—all with zero data movement. Learn how e6data’s atomically scalable lakehouse architecture helps achieve sub-second latencies even under heavy concurrency. This technical deep-dive covers e6data’s atomically scalable K8s native architecture, disaggregated compute design, and open table format integration showing the future of SQL analytics and AI workloads through real-world performance benchmarks and production case studies. Learn how an object-store-native approach unlocks “everything, everywhere, all at once” in modern data ecosystems.

Niko Grupen

Head of Applied Research

Harvey

Legal Agency: Building Domain-specific Agents for Enterprise

GenAI Applications

Building agents for real-world knowledge work requires a delicate balance of AI and Human-Computer Interaction (HCI) — one has to understand frontier model capabilities, translate them into a framework for agent behavior (with the right primitives, guardrails, etc), and then place them in an intuitive product surface that is interactive and transparent. The complexity of attaining this balance is magnified for vertical problem spaces that require significant domain expertise to solve for, like law. This talk will share insights and best-practices from building at the bleeding edge of the application layer. We'll explore how to leverage domain expertise to map model problems to legal problems (and importantly, evaluate them), how to create a framework for vertical agents that mirrors human processes, and why, despite LLMs being the star of the show, traditional engineering and machine learning practices are essential for maximizing quality and reliability in production environments.

Dillon Morrison

Director of Product Management

Sigma Computing

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Workshops

Think BI is dead? Will natural language replace the dashboard? Sigma's Wednesday morning workshop breaks down why generative AI is a powerful supplement - not replacement - for BI practices, and examines how to effectively embed AI into your analytics workflows.

Natacha Crooks

Assistant Professor

UC Berkeley

From Concurrency Control to Concurrent Scheduling

Databases

This Data Council 2025 talk is in development. Please check back soon for updates!

Rachel Lee Nabors

Former React Core

Meta

AI Cram Session

AI Engineering

Machine Learning Fundamentals: From RAG to Deep Learning for Beginners | Comprehensive introduction to essential machine learning concepts, including RAG (Retrieval Augmented Generation), neural networks, and foundational math principles. Learn complex ML concepts through engaging visual explanations and intuitive metaphors from an experienced technical educator. Perfect for developers, analysts, and technical professionals looking to understand modern AI terminology and architecture. Features practical examples and visual guides from the creator of React's educational platform, making advanced concepts accessible for technical audiences.

Chenyu Qiu

Senior Applied Scientist

Uber

Scalable Continuous Monitoring for Large-scale A/B Experimentation

Data Sci & Algos

This talk shows how Uber runs continuous experiment monitoring at scale and how we address the "peeking problem" that plagues traditional experiment monitoring approaches. Our automated platform processes thousands of monitoring analyses daily using regression-adjusted estimators with anytime-valid inference. This statistical methodology eliminates 95% of noise without sacrificing true signals, enabling early detection and performance insights for running experiments. Learn how our Spark-powered computational framework efficiently batches experiments and metrics for scalable processing. We'll share real-world case studies showing how this system has changed data-driven decision making at Uber, minimizing undetected regressions and accelerating product innovation across our global platform.
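The peeking problem itself is easy to reproduce. The simulation below is a minimal sketch, not Uber's method: it runs A/A tests with no true effect and applies an ordinary fixed-threshold z-test at every interim look. The sample sizes, look schedule, and function name are assumptions chosen for illustration; the anytime-valid inference named in the abstract is the remedy and is not implemented here.

```python
import math
import random


def simulate_aa_test(rng, n_per_arm=500, look_every=50, z_crit=1.96):
    """Run one A/A test; return (rejected_at_any_look, rejected_at_final_look)."""
    sum_a = sum_b = 0.0
    rejected_any = False
    rejected_final = False
    for i in range(1, n_per_arm + 1):
        # Both arms draw from the same N(0, 1) distribution: there is no true effect.
        sum_a += rng.gauss(0.0, 1.0)
        sum_b += rng.gauss(0.0, 1.0)
        if i % look_every == 0:
            # z-statistic for a difference in means with known unit variance.
            z = (sum_a / i - sum_b / i) / math.sqrt(2.0 / i)
            significant = abs(z) > z_crit
            rejected_any = rejected_any or significant
            rejected_final = significant  # overwritten until the last look
    return rejected_any, rejected_final


rng = random.Random(0)
n_sims = 1000
results = [simulate_aa_test(rng) for _ in range(n_sims)]
peeking_rate = sum(any_look for any_look, _ in results) / n_sims
fixed_rate = sum(final for _, final in results) / n_sims
print(f"false positive rate with peeking: {peeking_rate:.3f}")
print(f"false positive rate, single final look: {fixed_rate:.3f}")
```

Stopping at the first significant look can only raise the false positive rate relative to a single final analysis, since the final look is one of the looks. That inflation is what continuous monitoring has to control.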

Ori Soen

CEO

Montara

Analytics and the dark side of the Analytics Development Lifecycle

Workshops

In this session, we examine how the Analytics Development Lifecycle (ADLC) introduced essential structure to data workflows but unintentionally created organizational bottlenecks by limiting data warehouse access to engineers. Our speaker shares how innovative data teams are enabling analysts, product managers, and data scientists to migrate their work to data warehouse tables while maintaining strong data governance and quality standards. Discover practical DataOps strategies that balance democratized data access with the structured quality assurance processes that modern enterprises require for effective data management.

Franck Pachot

Developer Advocate

MongoDB

The Modern Database Debate: PostgreSQL and MongoDB

Databases

Which database should you choose? This question has evolved from theoretical debates to practical decisions based on facts. Technology has advanced significantly—SQL databases now support JSON, while NoSQL databases have integrated ACID properties. PostgreSQL and MongoDB represent the most common choices today, both widely adopted as standard APIs for managed database services. We will explore differences between these approaches, comparing interactive SQL transactions versus document-based design, examining internal storage performance implications, and considering how team expertise influences choices. Our goal is to clarify how to utilize each option effectively for modern applications' agility, scalability, and performance requirements, helping you select the database your team will be most comfortable using efficiently.

Parham Parvizi

Founding Data Architect

Prospective

A Local-First approach to extremely fast Streaming Visualization

Workshops

Modern data workloads demand fast, interactive, and scalable visualization—without the cost and complexity of server-side rendering. The local-first approach leverages modern browser capabilities, WebAssembly, and in-browser computation to achieve high-performance analytics while reducing cloud costs.

In this workshop, we’ll explore:

1. Why Local-First? The benefits of running everything client-side for cost-efficient, scalable visualization across thousands of users.

2. WebAssembly (WASM) for Data Apps: How Perspective harnesses WASM to power ultra-fast, browser-native analytics and even replace traditional Docker-based containers for data workloads.

3. Perspective + DuckDB: A full in-browser analytics stack that enables high-speed querying and visualization without a backend.

4. Streaming Data with InfluxDB: How to visualize high-frequency, real-time IoT and log data with sub-second latency.

5. Databricks + Perspective: Enhancing large-scale analytics with interactive dashboards inside Jupyter notebooks.

Through live coding and guided exercises, attendees will build their own browser-native analytics dashboards, connect to real-time data streams, and learn Perspective’s API in Python, Node.js, and Rust.

Difficulty level: Intermediate. Some experience with Python, JavaScript, and data analytics will be helpful, but beginners can follow along with guided exercises.

To participate, bring a laptop with:
Git
VS Code
Docker
Python (3.8+)
Node.js (16+)

Wenjing Zheng

Data Science Manager

Roblox

Causal Inference Methods for Bridging Experiments and Strategic Impact

Data Sci & Algos

While experimentation gives us clean effect measures, connecting those results to real-world business decisions is messy. In this talk, I’ll walk through two case studies at Roblox that highlight this challenge and explore some causal inference methods to help bridge the gap. The first focuses on attributing observed year-over-year business growth to product launches. The strategic need here is twofold: to understand how much of our growth is driven by the innovations we shipped, and to reconcile different measurements of business performance (experiment results and long-term growth trends) into a coherent narrative. The core challenge is isolating product impact from organic growth (in the absence of these launches) in the topline metrics we observe. The second case study addresses how to generalize A/B test results to a broader population, without requiring an explicit evaluation of covariate shift between the experiment and target population, which makes the approach scalable across experiments and surfaces. This framing is essential for fair comparisons across product areas that vary in reach and in how amenable they are to metric movement, enabling more effective prioritization across teams. Together, these cases reflect a broader goal: building a common measurement language that connects local experimental results to global business impact, so organizations can make more strategic, data-informed decisions.

Doron Porat

Co-Founder & CEO

Lakeway

AI is Going to Break Your Data Platform - Are You Ready?

AI & Data Culture

AI isn't just another workload - it's an unpredictable force disrupting data operations. This isn't evolution - it's collision. Traditional platforms assume stability, but AI workloads introduce volatility everywhere: in queries, users, and purposes. We need a new playbook. The way we optimize, govern, and structure data must evolve before AI forces our hand. The cracks are forming: workloads becoming chaotic, query patterns unpredictable, and latency constraints tightening. Pre-joins and aggregations matter, but existing optimization strategies won't hold at AI scale. This talk breaks down what's coming, what's at risk, and how to build AI-ready data platforms that don't just survive change - they thrive on it.

Oriol Mirosa

Director, Data Solutions

Brooklyn Data Co

Data Governance is NOT the Governance of Data!

AI & Data Culture

This talk challenges the misleading concept that data governance is about controlling information rather than managing relationships between people. It explores why traditional data management frameworks fail when they overlook the human element, and presents instead a relationship-centered governance model that aligns roles and responsibilities across organizations. Drawing from enterprise data governance case studies, attendees will discover practical strategies for embedding effective governance workflows without creating bottlenecks, transforming data management strategy from control-focused to people-empowering while maintaining appropriate data quality standards and compliance requirements.

Willem Pienaar

Co-Founder & CTO

Cleric

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

Lightning Talks

Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse through inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth. Traditional testing approaches break down completely. This talk breaks down our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail—from infrastructure mimicry to LLM-generated tests—and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.
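One way to picture a deterministic simulation with a known answer is a seeded scenario generator. The sketch below is an assumption-laden toy, not Cleric's system: the `DEPENDENCIES` graph, `generate_scenario`, and `score` are hypothetical names. A seed picks a faulty service, symptoms propagate to everything that depends on it, and the injected fault serves as ground truth for scoring an agent's diagnosis.

```python
import random

# Hypothetical service dependency graph: service -> services it calls.
DEPENDENCIES = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}


def generate_scenario(seed):
    """Deterministically inject one fault and derive which services show errors."""
    rng = random.Random(seed)
    root_cause = rng.choice(sorted(DEPENDENCIES))
    # A service shows symptoms if it is the root cause or transitively depends on it.
    affected = {root_cause}
    changed = True
    while changed:
        changed = False
        for service, deps in DEPENDENCIES.items():
            if service not in affected and any(d in affected for d in deps):
                affected.add(service)
                changed = True
    return {"seed": seed, "root_cause": root_cause, "symptoms": sorted(affected)}


def score(agent_answer, scenario):
    """The injected fault is the ground truth the agent's diagnosis is checked against."""
    return agent_answer == scenario["root_cause"]


scenario = generate_scenario(seed=7)
print(scenario)
```

Re-running with the same seed reproduces the same scenario, so a failed diagnosis can be replayed exactly while the agent is being debugged.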

Mitul Tiwari

Co-founder & CTO

Stealth

TapeAgents: A Powerful Framework For Building And Optimizing AI Agents

GenAI Applications

TapeAgents: Advanced Framework for Observable AI Development | Discover ServiceNow's open-source framework for building transparent, debuggable AI agents with comprehensive action recording and replay capabilities. Learn how TapeAgents' innovative recording system enables unprecedented visibility into agent behavior, streamlined debugging, and data-driven optimization. Master practical techniques for building robust AI agents with built-in observability and performance analysis tools. Features implementation strategies for creating production-ready agents with enhanced reliability and maintainability.

Timothy Chan

Head of Data

Statsig

Unlocking A/B Testing For B2B

Data Sci & Algos

B2B Experimentation: Advanced A/B Testing Beyond Consumer Applications | Learn enterprise-grade experimentation strategies from Statsig's work with leading B2B platforms including Notion, Figma, and Atlassian. Master specialized statistical approaches designed for B2B contexts, addressing unique challenges in sample sizes, user behaviors, and impact measurement. Discover practical frameworks for implementing robust experimentation systems that deliver reliable insights for enterprise products. Features real-world case studies demonstrating successful B2B testing methodologies and their impact on product development.

David Wilson

Co-Founder & CEO

Hunch Tools

Designing & Engineering a Viral Multi-Model AI Workflow: From Prototype to 300K Users in Two Weeks

GenAI Applications

When Hunch's viral LinkedIn year-in-review AI generator reached 300,000 users processing 1+ trillion tokens in two weeks, their multi-model architecture faced extreme scaling challenges. This case study reveals how a simple prototype evolved into a production-scale AI system overnight. Discover Hunch's technical blueprint featuring multiple LLM orchestration across OpenAI, Anthropic, and Google models, critical infrastructure scaling solutions, and how they achieved 85% cost reduction through optimized model selection and prompt engineering. Learn from their 26 rapid iterations that simultaneously improved output quality while decreasing costs. This presentation shares practical patterns for AI workflow orchestration balancing quality, cost, and reliability at scale. Gain actionable engineering strategies for building resilient, scalable AI applications that maintain performance under unpredictable growth, plus vital lessons about system failure points when success arrives faster than expected.

Ofer Mendelevitch

Head of Developer Relations

Vectara

Building Enterprise Agentic RAG Applications with Reduced Hallucinations

Workshops

As AI continues to evolve, agentic frameworks are becoming essential tools for developing intelligent and autonomous systems that can reason, plan, and act dynamically. In this workshop, we will explore how to leverage Vectara’s Agentic RAG framework to build context-aware AI assistants and agents with reduced hallucinations that enhance productivity and automate enterprise workflows. We will provide a step-by-step walkthrough on how to build Agentic RAG applications, delving into the technical details with a real-world example, and discuss the challenges developers might face, such as reducing hallucinations. Whether you are an AI developer, researcher, or enthusiast, this workshop will equip you with the practical skills to harness agentic AI for your enterprise.

Lindsay Murphy

Director, Head of Data

Hiive

No More BS: How (and When) to Really Leverage AI

AI & Data Culture

Successful AI implementation hinges on a solid foundation of data quality and governance, and the current hype often overshadows the critical practical considerations needed to achieve that foundation. Moreover, while AI holds immense potential, it's crucial to evaluate whether it's truly the optimal solution for a given business problem, as simpler, more established methods may be equally or more effective. We present a practical framework to assess whether AI is the optimal solution, and encourage some good old-fashioned critical thinking. Join Colleen Tartow and Lindsay Murphy for a data-driven conversation exploring AI's true viability.

Jake Thomas

Manager, Data Foundations

Okta

Embedding OLAP, Everywhere: Lessons from Okta

ML OPs & Platforms

Okta's innovative journey from processing trillions of events with mini serverless databases to embedding OLAP across its systems reveals a transformative approach to data processing. This session explores how embedded database systems are reshaping traditional data warehousing, demonstrating how small databases can create enormous value beyond analytics. Attendees will discover the strategic shift that's bringing databases back into application engineering and driving unprecedented innovation.

Tobias Lunt

Co-Founder & Data Scientist

Development Data Lab

Putting Data to Work for Global Urban Development

Lightning Talks

Imagine transforming the lives of billions by reimagining urban data infrastructure. Development Data Lab is pioneering a revolutionary approach to urban policy and planning, addressing the critical gap in decision-ready data for the world's developing cities. By integrating diverse data sources—including satellite imagery, administrative records, household surveys, and AI-powered text analysis—this innovative project creates a unified geographic framework for understanding urban challenges. The team demonstrates how emerging data technologies can generate near-real-time, actionable insights to tackle complex issues like urban sprawl, air pollution, poverty, education, mobility, and migration. Learn how a mission-driven approach can leverage incremental technological improvements and AI-assisted development to create outsized impact for global urban communities.

Marck Vaisman

Global AI Solutions Architect

Microsoft

Revolutionize AI Engineering With AutoGen

AI Engineering

Microsoft AutoGen: Scale and Automate Enterprise AI Development | Discover Microsoft's open-source framework for building and orchestrating production-ready AI agent systems. Learn practical implementation strategies for automating complex AI workflows, reducing development time, and optimizing resource utilization. Features real-world case studies demonstrating AutoGen's impact on development efficiency, model performance, and cost reduction across various industries. Includes hands-on examples of system integration, agent orchestration, and workflow automation for enterprise AI applications.

Elias DeFaria

Co-Founder & VP of Product

SDF

Why dbt Acquired SDF: How A Small Team Built True SQL Comprehension

Data Eng & Infrastructure

At SDF, we built a multi-dialect SQL compiler that resolves proprietary SQL dialects like Snowflake and BigQuery into a unified logical plan. This breakthrough technology, now part of dbt following the acquisition, unlocks immense value in developer experience, data governance, and cost optimization—enabling seamless cross-engine workflows. In this talk, Elias, co-founder of SDF, will dive into how we built the compiler, the challenges of normalizing complex dialects, and the transformative potential for data practitioners. He'll conclude with an exclusive look at upcoming dbt features powered by this technology, reshaping how teams approach analytics.

Mickey Liu

Software Engineer

Notion

Billion-Scale Vector Search on Object Storage

Data Eng & Infrastructure

Sumedh Sakdeo

Senior Staff Software Engineer

LinkedIn

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Data Eng & Infrastructure

Optimizing Iceberg Tables: Advanced Data Layout Strategies for Enterprise Data Lakes | Master data layout optimization techniques for managing large-scale Iceberg deployments with 100K+ tables. Learn comprehensive approaches to multi-objective optimization, balancing storage efficiency with query performance through intelligent file management and compaction strategies. This session covers practical implementation of table scoring algorithms, automated optimization workflows, and real-world performance insights from OpenHouse deployment. Includes detailed case studies and benchmarks using LST-bench, demonstrating measurable improvements in query performance and storage efficiency.
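To make the idea of a table scoring algorithm concrete, here is a minimal sketch of one way to rank tables for compaction by trading off storage pressure against query benefit. It is not the OpenHouse algorithm: the `TableStats` fields, the 512 MB target file size, the weights, and the scoring formula are all illustrative assumptions.

```python
from dataclasses import dataclass

TARGET_FILE_MB = 512.0  # assumed target data-file size; tune per deployment


@dataclass(frozen=True)
class TableStats:
    name: str
    file_count: int      # number of data files in the table
    avg_file_mb: float   # mean data-file size in MB
    daily_scans: int     # how often the table is read per day


def rank_for_compaction(tables, w_storage=0.5, w_query=0.5):
    """Return a non-empty list of tables ordered from most to least worth compacting."""
    max_files = max(t.file_count for t in tables) or 1
    max_scans = max(t.daily_scans for t in tables) or 1

    def score(t):
        # 0.0 when files already meet the target size, close to 1.0 when they are tiny.
        fragmentation = max(0.0, 1.0 - t.avg_file_mb / TARGET_FILE_MB)
        storage_term = t.file_count / max_files  # relative file-count pressure
        query_term = t.daily_scans / max_scans   # relative read traffic that would benefit
        return fragmentation * (w_storage * storage_term + w_query * query_term)

    return sorted(tables, key=score, reverse=True)


# Hypothetical tables, for illustration only.
tables = [
    TableStats("events", file_count=10_000, avg_file_mb=8.0, daily_scans=500),
    TableStats("users", file_count=200, avg_file_mb=480.0, daily_scans=900),
    TableStats("logs", file_count=8_000, avg_file_mb=16.0, daily_scans=10),
]
ranked = [t.name for t in rank_for_compaction(tables)]
query_first = [t.name for t in rank_for_compaction(tables, w_storage=0.0, w_query=1.0)]
```

Changing the weights changes the ranking: with the default weights the fragmented, rarely read `logs` table outranks `users`, while a query-only weighting puts the heavily read `users` table ahead of it.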

Jesus Camacho

Principal Engineering Manager

Microsoft

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Data Eng & Infrastructure

Ciro Greco

Founder

Bauplan

Python Over Data Lakes: Declarative Environments, Data Management And Other Things With Feathers

Data Sci & Algos

Python Data Lake Reproducibility: Building Deterministic Pipelines at Scale | Learn advanced techniques for creating reproducible data workflows across distributed environments using Python, Iceberg, Arrow, and Docker. Master declarative approaches to managing code versions, data dependencies, and runtime configurations in complex data lake architectures. Discover practical solutions for decoupling compute, storage, and execution environments while maintaining deterministic results. Includes implementation strategies using open-source tools for building efficient, scalable data pipelines with improved developer experience.

Joseph Powers

Principal Data Scientist

Intuit

Going Bayes: Shifting Our Testing Methods To Reflect Our Priorities

Data Sci & Algos

Bayesian AB Testing at Scale: How Intuit Revolutionized Experiment Design | Discover how Intuit transformed their experimentation framework using Bayesian risk-based testing to achieve 60% faster results. Learn practical implementation of risk threshold algorithms that optimize for business outcomes rather than traditional error rates. Master strategies for organizational adoption of advanced statistical methods across Analytics, Product, and Marketing teams. Features detailed case study of successful enterprise-wide statistical transformation, including implementation challenges and measurable outcomes.
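As a rough illustration of risk-based Bayesian testing in general, the sketch below estimates the expected loss of shipping a variant and stops once that loss drops below a risk threshold. It is not Intuit's implementation: the Beta(1, 1) priors, the Monte Carlo approach, the threshold value, and the function names are all assumptions chosen for the example.

```python
import random


def expected_loss(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of the expected loss of shipping variant B.

    Each conversion rate gets an independent Beta(1, 1) prior (an assumption).
    The loss is the conversion rate given up when B ships but A is actually better.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        total += max(rate_a - rate_b, 0.0)
    return total / draws


def decide(conv_a, n_a, conv_b, n_b, risk_threshold=0.001):
    """Ship whichever variant has expected loss below the threshold, else wait."""
    if expected_loss(conv_a, n_a, conv_b, n_b) < risk_threshold:
        return "ship B"
    if expected_loss(conv_b, n_b, conv_a, n_a) < risk_threshold:
        return "keep A"
    return "keep collecting data"


# Hypothetical counts: conversions and visitors per variant.
clear_win = decide(100, 1000, 150, 1000)
too_close = decide(100, 1000, 100, 1000)
```

The stopping rule is expressed in business terms (how much conversion rate one is willing to risk) rather than as a fixed false-positive rate, which is the contrast the abstract draws with traditional error rates.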

Marcel Kornacker

Co-Founder & CTO

Pixeltable

Introducing Pixeltable: Open Source Data Infrastructure for Multimodal AI

ML OPs & Platforms

Traditional AI infrastructure creates complexity by forcing data teams to juggle multiple specialized systems, fragmenting workflows and increasing operational costs. Marcel Kornacker, founder of Apache Impala and Apache Parquet, introduces Pixeltable, an open-source solution that revolutionizes AI data infrastructure through a declarative, incremental approach. This session reveals how a unified platform can solve common AI data challenges by bringing together data, computation, and models in a single, integrated interface. Attendees will discover how Pixeltable provides automatic versioning, enables incremental updates, and streamlines pipeline management for ML engineers, data scientists, and infrastructure teams seeking to overcome traditional data processing limitations.

Saif Ur-Rehman

Data Engineering Lead

Basecamp Research

Engineering Earth's Largest Biological Data Pipeline

Lightning Talks

Basecamp Research is pioneering a groundbreaking mission to map the unknown biological world, addressing the staggering fact that over 99.9% of life on Earth remains undiscovered. This session unveils an unprecedented biological data pipeline that surpasses all publicly available scientific data collected over the past century. By creating a comprehensive digital twin of Earth's life, the team is developing next-generation biological foundation models with applications spanning pharmaceutical research, deep learning, and scientific discovery. Attendees will explore how a global biological data supply chain, spanning five continents, is generating billions of biological labels and producing state-of-the-art AI models that outperform research from Google, DeepMind, and Genentech.

Jonathan Jin

Staff Machine Learning Engineer

Hinge

Trimming the Long Tail of Production Model Ownership at Hinge

ML OPs & Platforms

Beyond model performance lies a critical challenge in machine learning: comprehensive model ownership. This talk examines how focusing on the often-overlooked "long tail" of machine learning infrastructure can dramatically improve operational efficiency and innovation. Staff Engineer Jonathan Jin from Hinge's AI Platform team will reveal how addressing challenges like observability, feature access, and model refinement creates a "golden path" that empowers teams to continuously innovate. Attendees will learn how strategic infrastructure development can transform machine learning from a performance-driven to a holistic, sustainable practice.

Madison Faulkner

Principal & Head of Data Science

NEA

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Data Eng & Infrastructure

Hamilton Ulmer

UI Engineer & Designer

MotherDuck

Instant Preview Mode: Real-Time Feedback to Make SQL Data Exploration Fly

Analytics & BI

Imagine writing SQL queries that give you instant visual feedback, transforming your entire data exploration experience. In this talk, you'll see how MotherDuck's Instant Preview Mode breaks through traditional development barriers by providing real-time results as you type. Powered by cutting-edge client-side query parsing and DuckDB-WASM, this technology eliminates the frustrating write-run-debug cycle that's slowed down data professionals for years. You'll see how we've created a system that not only accelerates query iteration but makes working with SQL—especially AI-generated queries—feel more intuitive and responsive than ever before.

Vignesh Chadramohan

Engineering Manager

DoorDash

Internals of SlateDB: An Embedded Key-Value Store Built on Object Storage

ML OPs & Platforms

Object storage platforms like S3 and Azure Blob Storage have transformed data systems, enabling new architectural paradigms. This session explores SlateDB, an embeddable storage engine built in Rust that leverages object storage's unique properties. Attendees will dive into how conditional writes, checkpoints, transactions, and remote compaction can be implemented, discovering insights that extend beyond SlateDB to broader data system design and implementation.
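For readers unfamiliar with conditional writes, the sketch below shows the general pattern of using a put-if-absent operation to let multiple writers commit sequential manifest files without a lock. It is not SlateDB's code or file layout: `InMemoryObjectStore`, the `manifest/` prefix, and `commit_manifest` are hypothetical stand-ins, and a real object store would provide the conditional write itself.

```python
class InMemoryObjectStore:
    """Stand-in for an object store that supports put-if-absent semantics."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key: str, data: bytes) -> bool:
        """Write only if the key does not exist; report whether the write won."""
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def list_keys(self, prefix: str):
        return sorted(k for k in self._objects if k.startswith(prefix))


def commit_manifest(store, payload: bytes, prefix: str = "manifest/") -> str:
    """Claim the next manifest slot, retrying if another writer got there first."""
    while True:
        keys = store.list_keys(prefix)
        next_id = int(keys[-1][len(prefix):]) + 1 if keys else 1
        key = f"{prefix}{next_id:020d}"  # zero-padded so keys sort by id
        if store.put_if_absent(key, payload):
            return key
        # Lost the race: loop, re-list, and try the following id.


store = InMemoryObjectStore()
first = commit_manifest(store, b"v1")
second = commit_manifest(store, b"v2")
overwrite_allowed = store.put_if_absent(first, b"clobber")
```

Because an existing key can never be overwritten, two writers racing for the same slot cannot both succeed; the loser simply retries against the next id.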

Nikita Vemuri

Software Engineer

Anyscale

From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray

ML OPs & Platforms

As machine learning workloads grow increasingly complex, distributed training across thousands of nodes presents significant challenges. This talk explores how the Ray library ecosystem tackles critical issues in multi-node ML training, focusing on development, orchestration, and comprehensive observability. Attendees will learn about innovative solutions for tracking system data, managing potential failure points, and implementing robust observability workflows that persist critical information.

Ethan Brown

Director, Data & Applied Science

Twitch / AWS

Building an LLM-Powered Analytics Slack Bot at Twitch

GenAI Applications

The best way to beat a wave of automation is to surf it. With this principle in mind, the data team at Amazon IVS / Twitch Video has developed an LLM-powered data analytics bot to augment their data operations. The bot integrates with Slack, allowing employees to interact with data tools through a familiar chat interface. It performs a range of tasks including SQL query generation, chat summarization, and account lookups. This talk provides a practical walkthrough of the implementation, demonstrating how teams can build similar solutions using standard AWS services.

John Bagnall

Senior Data Product Manager

Matillion

Humanizing Data Architecture: How Design Thinking Transforms Data Strategy

Lightning Talks

As organizations embrace increasingly complex data architectures like data mesh and data fabric, a critical challenge emerges: how do we ensure these sophisticated technical solutions genuinely serve human needs? This session introduces design thinking as a transformative framework for developing data strategies that balance technical excellence with profound user-centricity. Through practical examples and deep case studies, explore how empathy, innovative problem-solving, and iterative feedback can revolutionize data architecture. Attendees will learn to apply design thinking's core principles—understanding stakeholder needs, articulating human-centered problems, generating innovative solutions, rapid prototyping, and continuous improvement—to create data products that are not just technically sophisticated, but truly meaningful and accessible to their users.

CL Kao

Founder

Recce

Data Engineering Is Not Software Engineering, Until It Is

Lightning Talks

Modern Data Engineering: Bridging DevOps, MLOps, and Software Development | This technical session examines how modern data engineering is evolving beyond traditional software engineering practices, focusing on data pipeline architecture, testing frameworks, and deployment strategies. Through real-world case studies from dbt (data build tool) implementations and SQLMesh data transformation workflows, the presentation explores how data teams are adopting GitOps methodologies, continuous integration, and version control for data-centric systems. As artificial intelligence and machine learning operations become central to software development, these emerging data engineering practices are reshaping how teams approach data quality, system validation, and production deployment. The session will demonstrate how differences in ETL pipeline feedback loops and data testing environments are driving new best practices for managing enterprise data systems, while offering insights into the future convergence of CI/CD, data governance, and MLOps practices.

Avi Press

CEO

Scarf

Open Source Success: Learnings from 1 Billion Downloads

Lightning Talks

This data-driven analysis examines user behavior patterns across 1 billion open source package downloads, spanning 2000+ GitHub repositories and open source projects tracked through Scarf Analytics. The research reveals critical insights for open source maintainers and OSS business leaders, covering package management trends, download metrics, and documentation strategy. By analyzing global distribution patterns, software packaging formats, and community engagement metrics, the presentation provides actionable strategies for open source project growth, user adoption, and sustainable business development in the open source ecosystem. The findings highlight how successful OSS projects leverage download analytics, developer documentation, and community metrics to drive project adoption and monetization.

Michael Cohen

Global Chief Data & Analytics Officer

Plus Company

The Art of Data: Reimagining Creative Processes with Data Culture

Lightning Talks

This session tackles a persistent challenge in creative industries: why do artists often see data as the enemy of creativity, and how can we change that perception? Drawing from hands-on experience, the presentation explores how organizations can transform data from a creative constraint into an inspiration catalyst. We'll dive into practical strategies for building data literacy among creative teams, showcase compelling examples of data storytelling in artistic contexts, and demonstrate how leading creative professionals are using analytics to amplify rather than stifle their artistic vision. Learn how successful organizations are bridging the gap between data teams and creatives, fostering a culture where intuition and analytics work in harmony to drive more impactful creative outcomes.

Dylan Perez Neider

Sr. Solutions Engineer

Sigma Computing

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Workshops

Dadi Atar

VP Product

Montara

Analytics and the dark side of the Analytics Development Lifecycle

Workshops

Sudarsan Lakshmi

Head of Engineering

e6data

Everything Everywhere All at Once: Object Store Native

Workshops

Beto Ferreira De Almeida

Staff Engineer

Preset

Data Should be Invisible

AI & Data Culture

The modern data landscape is dominated by complexity: tables, schemas, pipelines, warehouses, and more. Yet the most successful data platforms share a common principle—they make data itself invisible to the end user. When data infrastructure functions optimally, it's like good plumbing: you only notice it when something breaks. Organizations often fixate on the mechanics of data while losing sight of what truly matters: metrics, dimensions, and semantics. When users engage with meaningful abstractions rather than technical details, they make better decisions faster. In this talk, you'll learn strategies for making data invisible through real-world abstraction success stories, designing effortless interactions, and implementing governance through abstraction. Walk away with practical ways to assess your data stack, advocate for user-centric approaches, and measure progress—making your data platform not just powerful, but invisible in all the right ways.

Josh Curl

Co-Founder & CTO

Hightouch

Bridging the AI Implementation gap: Strategies for Embedding Data Professionals with Business Units

AI & Data Culture

At the foundation of AI project failures lies a critical gap between data teams and business reality. On top of this gap, data quality issues, unexpected privacy concerns, and tools that don't align with actual business problems arise to hinder or block implementation. As we've built our own AI product—AI Decisioning—and implemented it with customers, we've learned that successful AI implementations depend on embedding data teams within business units. Embedding doesn't mean breaking apart your data team and dispersing it throughout every other department. It means establishing focused partnerships where data team members are deeply integrated into business teams' daily workflows and decision-making processes while remaining connected to the central data organization. This embedding creates a virtuous cycle: data teams gain deep domain knowledge, business professionals see improved data quality and gain data facility, and together, data and business teams implement AI solutions that solve real problems. In this talk, we'll share concrete examples of how data teams (especially data scientists) and marketing have worked together in successful AI Decisioning implementations. We’ll derive strategies to implement this organizational pattern and enable a company to move from analytics to actions and from data teams as service providers to active collaborators. While our case studies focus primarily on marketing partnerships, the embedded partnership model we present applies equally to other business functions including product development, operations, and customer service teams.

Tomás Kofman

Co-Founder & CEO

Not Diamond

How to Build Your Own Model Router

Lightning Talks

Building Cost-Effective LLM Routers: Boost Accuracy 25% While Cutting Costs 90% | This session reveals how to build intelligent model routers that dynamically direct inputs to the optimal large language model (LLM) for each specific task. Attendees will learn practical implementation strategies for multi-model LLM systems that significantly improve performance metrics—achieving up to 25% higher accuracy while reducing operational costs by as much as 90%. The presentation covers essential routing methodologies, evaluation frameworks, and scalable architectures for production deployments. Developers and ML engineers will gain actionable insights for overcoming technical challenges in multi-model LLM systems, optimizing both performance and cost-efficiency in generative AI applications. Perfect for teams looking to maximize ROI from their AI infrastructure while maintaining high-quality outputs.
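The core routing idea can be sketched in a few lines: estimate how demanding a prompt is, then send it to the cheapest model expected to handle it. This is a toy illustration, not Not Diamond's method. The model names, prices, skill scores, and the keyword-based `estimate_difficulty` heuristic are all made up, and a production router would typically replace the heuristic with a classifier trained on evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price
    skill: float               # hypothetical 0..1 quality score from offline evals


MODELS = [
    Model("small", 0.10, 0.60),
    Model("medium", 0.50, 0.80),
    Model("large", 3.00, 0.95),
]

HARD_MARKERS = ("prove", "derive", "refactor", "multi-step")


def estimate_difficulty(prompt: str) -> float:
    """Toy difficulty heuristic in [0, 1] based on length and a few keywords."""
    score = min(len(prompt.split()) / 200, 0.5)
    if any(marker in prompt.lower() for marker in HARD_MARKERS):
        score += 0.4
    return min(score, 1.0)


def route(prompt: str, models=MODELS) -> Model:
    """Pick the cheapest model whose skill meets the quality the prompt requires."""
    required_quality = 0.5 + 0.5 * estimate_difficulty(prompt)
    capable = [m for m in models if m.skill >= required_quality]
    if not capable:
        # Nothing clears the bar: fall back to the strongest model available.
        return max(models, key=lambda m: m.skill)
    return min(capable, key=lambda m: m.cost_per_1k_tokens)


easy = route("What is the capital of France?").name
medium = route("Prove that the sum of two even numbers is even.").name
hard = route("derive " + "step " * 59).name
```

Cost savings come from the fact that most traffic clears the bar for the cheap model; accuracy is protected by escalating only the prompts the estimator flags as demanding.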

Diptanu Gon Choudhury

Founder & CEO

Tensorlake

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Data Eng & Infrastructure

Gleb Mezhanskiy

Co-Founder & CEO

Datafold

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Data Eng & Infrastructure

Nathan Sooter

Sr. Manager, RevOps Analytics & Insights

1Password

Go-To-Market Data Enrichment: Practical Strategies to Drive Business Value

Analytics & BI

Let’s be frank, data teams and sales teams don’t always see eye to eye. Data teams see Salesforce as a swamp of messy, user-generated chaos. Sales teams often see data teams as a slow-moving black box. The result? Frustration on both sides and a missed opportunity to drive business value. But what if data teams weren’t just seen as pipeline or dashboard builders, but as strategic partners in revenue growth? In this session, we’ll show how simple, no-fuss data engineering and enrichment can transform the way sales teams trust and use data. Through real-world examples, like cleaning up CRM records with the power of LLMs, we’ll explore how small but intentional changes in ingestion and modeling can change the perception of a data team. We'll explore practical strategies to make your work more visible, valuable, and aligned with the GTM team.

Margaret Quigley

ex-Cohere Head of Data Acquisition

MQ Consulting

Ethical Data Acquisition & Sales in the AI Age

Lightning Talks

Learn strategies for sitting on both sides of a data acquisition negotiation table - not just how to evaluate and price new data for training & evaluating AI/LLMs, but also how to efficiently package and sell it as a data owner. Margaret will cover real-world examples from her 7 years of experience running Data Acquisition & GTM teams with leading AI companies and data vendors that touch on nuances within ethics & due diligence, security & storage, and transparency & accountability.

Jonathan Mortensen

CEO

Confident Security

The Unofficial Guide to Apple’s Private Cloud Compute

ML OPs & Platforms

In October 2024, Apple released a new private AI technology onto millions of devices called “Private Cloud Compute”. It brings the same level of privacy and security a local device offers, but on an “untrusted” remote server. This talk discusses how Private Cloud Compute represents a paradigm shift in confidential computing and explores the core advancements that made it possible to become mainstream. We’ll explore its novel architecture that allows developers to run sensitive, multi-tenant workloads with cryptographically provable privacy guarantees at scale and at reasonable cost. Attendees will leave with an understanding of how to leverage this technology for data and AI applications where privacy and security are paramount.

Skip Everling

Head of Developer Relations

Kolena

AI-Powered Automation: Supercharge Data-Intensive Workflows with Intelligent Agents

Workshops

In today’s fast-paced, data-heavy industries, crucial information is often buried in PDFs, contracts, compliance reports, and other unstructured sources, slowing down decision-making and increasing risk. Join us for an interactive workshop where we’ll showcase how to use AI agents to automate data-intensive workflows for analysts, compliance officers, underwriters, diligence teams, and knowledge workers.

In this hands-on session, you’ll learn how AI can:

Automate repetitive tasks: freeing your team to focus on high-value analysis.

Enhance accuracy & consistency: reducing errors and ensuring data integrity.

Accelerate decision-making: with faster data extraction and smarter insights.

Scale operations: handling complex tasks across massive document sets with ease.

We’ll walk through real-world use cases, including compliance assessments, contract analysis, risk evaluation, and more, demonstrating how AI agents can streamline workflows, reduce bottlenecks, and drive smarter decisions. If you deal with large volumes of documents, this workshop is for you. Walk away with actionable strategies to boost productivity, reduce risk, and gain a competitive edge.

Jacob Matson

Developer Advocate

MotherDuck

More Than a Vibe: AI-Driven SQL that Actually Works

Workshops

In this hands-on workshop, we will demonstrate how AI can empower you to "vibe code," that is, to use AI to write accurate SQL, enabled only by the magic of MotherDuck & DuckDB. Participants will work with a real-life spatial dataset to tackle real-world challenges and see firsthand how AI-driven DuckDB SQL can transform data handling into a rapid, low-risk, interactive process. By the end of the workshop, participants will have experienced an end-to-end workflow: from ingesting and querying spatial data with DuckDB/MotherDuck, to refining query results with AI, and finally presenting insights through Python visualizations. This session is designed to empower you to confidently incorporate AI in your coding processes, transforming how you approach data analysis and decision making in real-world business scenarios.

Key Components of the Workshop:

1. Dataset Handling: Participants will work with a spatial dataset to evaluate potential locations for opening a new BBQ restaurant. Thanks to MotherDuck, the dataset is easily brought down locally in a highly compressed format, ensuring a quick and safe environment for experimentation.

2. Live AI-Assisted Coding: The workshop will feature a live demonstration where an AI tool iteratively generates SQL queries. Rather than pre-defining metrics, the AI assists in exploring and defining the spatial parameters necessary to identify the optimal restaurant location—a process that mirrors real-world, dynamic decision-making.

3. Real-Time Data Visualization: As queries are refined and executed, Python will be used to chart the results on the fly. Utilizing uv for environment management alongside visualization libraries such as Seaborn and Matplotlib, participants will see how spatial insights are translated into clear, actionable charts.

4. Iterative, Low-Risk Workflow: The session emphasizes a low-risk, experimental approach. If the AI-generated code isn't perfect, no harm is done—files can be quickly deleted or corrected, encouraging a creative, hands-on learning environment where trial and error lead to deeper understanding.
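The kind of spatial question described in the components above (how crowded is the area around a candidate site?) can be sketched as a small SQL query driven from Python. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for DuckDB so the example runs without extra dependencies, and the table names, column names, coordinates, 2 km radius, and the haversine_km helper are all hypothetical rather than taken from the workshop.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# In-memory database: nothing touches disk, so mistakes are cheap to discard.
conn = sqlite3.connect(":memory:")
conn.create_function("haversine_km", 4, haversine_km)

# Made-up existing restaurants and candidate sites.
conn.execute("CREATE TABLE bbq_restaurants (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO bbq_restaurants VALUES (?, ?, ?)",
    [
        ("Smoke A", 37.8044, -122.2712),
        ("Smoke B", 37.8050, -122.2730),
        ("Smoke C", 37.7749, -122.4194),
    ],
)
conn.execute("CREATE TABLE candidate_sites (site TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO candidate_sites VALUES (?, ?, ?)",
    [("downtown", 37.8040, -122.2700), ("hills", 37.8500, -122.2000)],
)

# Count competitors within 2 km of each candidate; fewest competitors first.
query = """
SELECT c.site, COUNT(r.name) AS competitors_within_2km
FROM candidate_sites AS c
LEFT JOIN bbq_restaurants AS r
  ON haversine_km(c.lat, c.lon, r.lat, r.lon) <= 2.0
GROUP BY c.site
ORDER BY competitors_within_2km ASC, c.site ASC
"""
rows = conn.execute(query).fetchall()
for site, competitors in rows:
    print(site, competitors)
```

An AI assistant would typically propose and revise the query text; because the data lives in a throwaway in-memory database, a wrong query costs nothing beyond rerunning it.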

Cole Bowden

Developer Advocate

Firebolt

The Power of Low Latency Data for AI Apps

Cole Bowden

Developer Advocate

Firebolt

Developer Advocate Firebolt

Workshops

The Power of Low Latency Data for AI Apps

Retrieval-augmented generation (RAG) has transformed AI applications by grounding responses with external data. It can be better. By pairing RAG with low latency SQL analytics, you can enrich responses with instant insights, leading to a more interactive and insightful user experience with fresh, data-driven intelligence. In this talk, we’ll demo how low latency SQL combined with an AI application can deliver speed, accuracy, and trust.
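As a rough illustration of pairing retrieval with a SQL lookup, the sketch below builds a prompt that contains both a retrieved passage and a freshly computed aggregate. Everything in it is an assumption made for the example: sqlite3 stands in for a low-latency analytics engine, word overlap stands in for vector search, and the passages, the refunds table, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical knowledge-base passages.
PASSAGES = [
    "Refunds are processed within five business days.",
    "Orders can be cancelled before they ship.",
    "Shipping is free for orders over fifty dollars.",
]


def retrieve(question, passages, k=1):
    """Rank passages by word overlap with the question (stand-in for embeddings)."""
    q_words = set(question.lower().split())

    def overlap(passage):
        return len(q_words & set(passage.lower().rstrip(".").split()))

    return sorted(passages, key=overlap, reverse=True)[:k]


def fresh_metrics(conn):
    """Run an aggregate at question time so the answer reflects current data."""
    count, avg_days = conn.execute(
        "SELECT COUNT(*), AVG(days_to_refund) FROM refunds"
    ).fetchone()
    return {"refund_count": count, "avg_days_to_refund": avg_days}


def build_prompt(question, conn):
    """Combine retrieved text and live SQL results into one prompt string."""
    lines = ["Context:"]
    lines += [f"- {passage}" for passage in retrieve(question, PASSAGES)]
    lines.append("Live metrics:")
    lines += [f"- {key}: {value}" for key, value in fresh_metrics(conn).items()]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refunds (order_id INTEGER, days_to_refund REAL)")
conn.executemany("INSERT INTO refunds VALUES (?, ?)", [(1, 3.0), (2, 5.0), (3, 4.0)])

prompt = build_prompt("How long do refunds take?", conn)
print(prompt)
```

The prompt would then be sent to a language model; that call is left out because it depends on the model provider.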

Rui Lopes

Head of AI

DataLinks

Powering AI Workflows with Tabular Graphs

Rui Lopes

Head of AI

DataLinks

Head of AI DataLinks

Workshops

Powering AI Workflows with Tabular Graphs

DataLinks is the new semantic layer for AI systems. Join us in this workshop to gain a concise overview of our entity-linking technology, backed by two dynamic demonstrations. First we will enable you to experience firsthand how our intuitive user interface simplifies complex data integration, visualization, and exploration, enabling rapid discoverability and seamless dataset linkage. Then we invite you to discover the flexibility of our API and Python SDK, designed for developers to effortlessly integrate automated entity resolution and graph-based insights into their workflows and applications. Finally, we'll show how to leverage our platform for natural language search over your data enabling AutoRAG for your application.

Issac Roth

Co-Founder & CEO

Orama

OramaCore: A Search Database with LLMs Built-In

Issac Roth

Co-Founder & CEO

Orama

Co-Founder & CEO Orama

Lightning Talks

OramaCore: A Search Database with LLMs Built-In

In this fast-paced talk, we’ll dive right into why the world needed another database, this time with multiple LLMs and a JavaScript engine right in the same process. A database that runs on GPUs? Why why why? Turns out this is the ultimate platform for agentic AI like the SaaS Copilots and answer engines that developers create with Orama. That’s why! We’ll look at the construction of the database, the algorithms involved, how we made it fast, and a little bit of what you can do with it. OramaCore is open source and just released!

Alexy Khraborov

AI/ML Community Architect

Neo4j

OAKS: Open Agentic Knowledge Stack

Alexy Khraborov

AI/ML Community Architect

Neo4j

AI/ML Community Architect Neo4j

Lightning Talks

OAKS: Open Agentic Knowledge Stack

The first two years of the GenAI revolution are bending the OSS way: Open Source models have reached state of the art, and most of the ecosystem around AI is open-source. The key to AI adoption is properly organizing and using business knowledge. In industry, LLMs give way to Small Specialized Models (SSMs), utilized by Domain Expert Agents (DXAs). Their work should be structured according to the domain requirements, requiring structured output. Organizing and using domain knowledge for AI has long been a domain of Knowledge Graphs (KG). At Neo4j, we are in a moment where our KG leadership powers the rise of GraphRAG, a better context traversal that we lead alongside Microsoft, Amazon, Google, and other GenAI partners. We also integrate with many OSS AI startups to build a better AI stack around GraphRAG. Neo4j has joined LFAI to bridge the enterprise AI adoption with startup innovation, centered around structured knowledge. In this talk we describe OAKS, a set of projects, communities, and technologies that comprise the Open Agentic AI Knowledge Stack. We show where the most value will be created and how the OSS AI ecosystems come together to build and deliver it.

OAKS consists of structured input, knowledge transformation, and structured output. We show the Agentic AI architectures emerging around AI memory, graph-based agentic workflows, and frameworks including scalable message passing, knowledge encapsulation, and colocated knowledge and computation for web-scale routing. We invite the community to join us!

Anant Agarwal

Staff Software Engineer & Engineering Lead

Instacart

Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Anant Agarwal

Staff Software Engineer & Engineering Lead

Instacart

Staff Software Engineer & Engineering Lead Instacart

Data Eng & Infrastructure

Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Building High-Throughput Data Orchestration: Instacart's Journey to 20M Daily Workflows | Explore how Instacart built an enterprise-grade orchestration system handling 20 million daily workflows across diverse technical domains. Learn implementation details of their cloud-native platform combining Apache Airflow and Temporal for robust scheduling and execution. Deep dive into YAML-based workflow definitions, GitOps deployment patterns, and observability solutions that enable reliable scaling. Practical insights from years of production experience, applicable to both startups and enterprises building scalable data infrastructure.
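To make the idea of declarative workflow definitions concrete, here is a minimal sketch of turning such a definition into an execution order. It is not Instacart's format or code: the dict below is a hypothetical stand-in for what a YAML file might parse into, the field names (tasks, depends_on, schedule) are assumptions, and the ordering uses Python's standard graphlib rather than Airflow or Temporal.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow, shown as the dict a YAML definition might parse into.
workflow = {
    "name": "daily_orders_report",
    "schedule": "0 6 * * *",
    "tasks": {
        "extract_orders": {"depends_on": []},
        "extract_stores": {"depends_on": []},
        "join_orders_stores": {"depends_on": ["extract_orders", "extract_stores"]},
        "publish_report": {"depends_on": ["join_orders_stores"]},
    },
}


def validate(definition):
    """Reject definitions that reference tasks which are not declared."""
    tasks = definition["tasks"]
    for name, spec in tasks.items():
        for dep in spec["depends_on"]:
            if dep not in tasks:
                raise ValueError(f"task {name!r} depends on unknown task {dep!r}")


def execution_order(definition):
    """Return task names in an order that respects every dependency."""
    validate(definition)
    graph = {name: set(spec["depends_on"]) for name, spec in definition["tasks"].items()}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

Keeping the definition as plain data is what makes a GitOps-style review possible: a change to a workflow is a diff to a file that can be validated before it is deployed.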

Skyler Thomas

Co-Founder & CTO

Cake AI

Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source

Skyler Thomas

Co-Founder & CTO

Cake AI

Co-Founder & CTO Cake AI

GenAI Applications

Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source

RAG systems that work in the real world are not just the trivial extract, vector search, and rerank systems that the simplistic "Introductions to RAG" suggest. After this talk, you will understand how to think about the design and construction of real-world RAG and GraphRAG systems that can scale to hundreds of millions of documents or billions of vectors. You will learn about the complex orchestration of multiple libraries. You will also learn how to use tools and frameworks built on open standards like OpenTelemetry or OpenInference to help you monitor and debug these complex RAG orchestrations. Topics will include scalable RAG/GraphRAG architectures, complex extraction flows, and embedding model and re-ranking considerations. We will dive deep into integration between various libraries like Ray, LangChain, LlamaIndex, DSPy, Phoenix, Weaviate, PgVector, GraphRAG, LangGraph, Airflow, KFP and vLLM to form cohesive solutions that actually scale. We will discuss the patterns and anti-patterns Cake has learned building and deploying these systems for real customers. If time permits, we will address advanced topics like complex table detection and extraction for financial data and complex agentic flows to handle heterogeneous datasets.
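For reference, the "trivial" vector search and rerank pipeline that the abstract contrasts with production systems looks roughly like the sketch below. It is deliberately a toy: the two-dimensional vectors, the documents, and the word-overlap reranker are made up for illustration, and none of the libraries named in the abstract are used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical corpus: tiny hand-written "embeddings" plus the source text.
DOCS = {
    "doc_a": {"vec": [1.0, 0.0], "text": "quarterly revenue table for 2024"},
    "doc_b": {"vec": [0.9, 0.1], "text": "revenue recognition policy"},
    "doc_c": {"vec": [0.0, 1.0], "text": "office seating chart"},
}


def vector_search(query_vec, docs, k):
    """First stage: keep the k documents closest to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]["vec"]), reverse=True)
    return ranked[:k]


def rerank(query, candidates, docs):
    """Second stage: reorder candidates with a different signal (word overlap here)."""
    q_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda d: len(q_words & set(docs[d]["text"].split())),
        reverse=True,
    )


query = "revenue recognition policy"
candidates = vector_search([1.0, 0.05], DOCS, k=2)
final = rerank(query, candidates, DOCS)
print(candidates, final)
```

Even in this toy the two stages disagree on the best document, which hints at why the extraction, embedding, and reranking choices the talk covers matter at scale.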

Brenna Buuck

Developer Evangelist

MinIO

The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse

Brenna Buuck

Developer Evangelist

MinIO

Developer Evangelist MinIO

Lightning Talks

The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse

Data Lakehouse Architecture: Unifying Batch and Real-Time Data Processing | Is your organization stuck choosing between batch and streaming? The reality is, you probably need both. This session explores how modern data lakehouse architectures are breaking down the false dichotomy between batch and real-time processing. We'll examine how innovative organizations are using lakehouse platforms to handle everything from millisecond-latency queries to massive batch analytics jobs on a single unified platform. Learn how this hybrid approach is transforming data infrastructure, reducing complexity, and enabling teams to build more flexible, future-proof data systems.

Colleen Tartow

Senior Director, Enterprise Data Engineering

Capital One

No More BS: How (and When) to Really Leverage AI

Colleen Tartow

Senior Director, Enterprise Data Engineering

Capital One

Senior Director, Enterprise Data Engineering Capital One

AI & Data Culture

No More BS: How (and When) to Really Leverage AI

Marco Slot

Software Imagineer

Crunchy Data

Converging Database Architectures: DuckDB in PostgreSQL

Marco Slot

Software Imagineer

Crunchy Data

Software Imagineer Crunchy Data

Databases

Converging Database Architectures: DuckDB in PostgreSQL

Traditionally divided between transactional and analytical systems, databases are converging through innovative architectural approaches. This talk explores the fusion of PostgreSQL and DuckDB, demonstrating how embedding an OLAP database into an OLTP system can simplify data platforms. Attendees will learn about the motivations, challenges, and substantial benefits of creating a unified system capable of high-throughput transactions, fast analytical queries, and seamless data processing across different paradigms.

Anil Sadineni

Principal Software Engineer

1upHealth

A Modern Data Stack in Healthcare

Anil Sadineni

Principal Software Engineer

1upHealth

Principal Software Engineer 1upHealth

Lightning Talks

A Modern Data Stack in Healthcare

The US Healthcare industry faces complex data exchange challenges, with legacy standards creating massive processing burdens. This session explores how emerging technologies like FHIR can transform healthcare data management by leveraging modern data stack approaches. Attendees will discover innovative strategies for addressing unique healthcare data challenges, including cross-entity data contracts, identity management, and end-to-end lineage preservation. Learn how technologies from social media, advertising, and finance can revolutionize healthcare data processing, overcoming traditional interoperability and scalability limitations.

Dr. Greg Michaelson

Co-Founder & Chief Product Officer

Zerve AI

Scaling GenAI & Agentic Workflows for practical solutions with Zerve

Dr. Greg Michaelson

Co-Founder & Chief Product Officer

Zerve AI

Co-Founder & Chief Product Officer Zerve AI

Workshops

Scaling GenAI & Agentic Workflows for practical solutions with Zerve

Enterprises investing in Generative AI (GenAI) or Agentic Workflows need more than just cutting-edge models; they need scalable, cost-efficient systems that deliver real business impact. In this session we’ll show how Zerve unlocks the full potential of GenAI using its distributed computing engine, The Fleet. You’ll learn how enterprises as advanced as Canal+ and NASA, as well as cutting-edge startups, are streamlining AI development, reducing infrastructure costs, and transforming GenAI into a scalable, high-impact business solution.

100+ Speakers

Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

Talk Schedule

Lloyd Tabb

Founder/Former CTO - Looker & Co-creator of Malloy

Meta

Building Blocks: Reusing Queries in Semantic Data Modeling

Lloyd Tabb

Founder/Former CTO - Looker & Co-creator of Malloy

Meta

Founder/Former CTO - Looker & Co-creator of Malloy Meta

Analytics & BI

Building Blocks: Reusing Queries in Semantic Data Modeling

Hannes Mühleisen

Co-Creator of DuckDB

DuckDB Labs

Liberate Analytical Data Management with DuckDB

Hannes Mühleisen

Co-Creator of DuckDB

DuckDB Labs

Co-Creator of DuckDB DuckDB Labs

Data Eng & Infrastructure

Liberate Analytical Data Management with DuckDB

Nikunj Handa

Product Lead

OpenAI

OpenAI’s Responses API: A New Foundation for Building with Models & Tools

Nikunj Handa

Product Lead

OpenAI

Product Lead OpenAI

Foundation Models

OpenAI’s Responses API: A New Foundation for Building with Models & Tools

Naveen Rao

VP of AI

Databricks

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

Naveen Rao

VP of AI

Databricks

VP of AI Databricks

Keynotes

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

Ravin Kumar

Senior Researcher

Google DeepMind

Models as Tools: My Perspective On the Matter

Ravin Kumar

Senior Researcher

Google DeepMind

Senior Researcher Google DeepMind

Foundation Models

Models as Tools: My Perspective On the Matter

Denis Yarats

Co-Founder & CTO

Perplexity

RAGs to Riches: Engineering the Future of LLM Systems

Denis Yarats

Co-Founder & CTO

Perplexity

Co-Founder & CTO Perplexity

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Hadley Wickham

Chief Scientist

Posit

LLMs for Data Science

Hadley Wickham

Chief Scientist

Posit

Chief Scientist Posit

Data Sci & Algos

LLMs for Data Science

Aaron Katz

Co-Founder & CEO

ClickHouse

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Aaron Katz

Co-Founder & CEO

ClickHouse

Co-Founder & CEO ClickHouse

Keynotes

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Paige Bailey

AI Developer Experience Engineer

Google

Introduction to Google DeepMind's Models: Gemini 2.0, Imagen 3, and Veo

Paige Bailey

AI Developer Experience Engineer

Google

AI Developer Experience Engineer Google

Workshops

Introduction to Google DeepMind's Models: Gemini 2.0, Imagen 3, and Veo

Ryan Blue

Creator of Apache Iceberg, Member of Technical Staff

Databricks

Why is Everyone Talking about Apache Iceberg™? (From the Original Creator of Apache Iceberg)

Ryan Blue

Creator of Apache Iceberg, Member of Technical Staff

Databricks

Creator of Apache Iceberg, Member of Technical Staff Databricks

Data Eng & Infrastructure

Why is Everyone Talking about Apache Iceberg™? (From the Original Creator of Apache Iceberg)

Ganesh Ramanarayanan

VP Engineering

Hex

Multi-Modal Compute for Data Analytics

Ganesh Ramanarayanan

VP Engineering

Hex

VP Engineering Hex

Analytics & BI

Multi-Modal Compute for Data Analytics

Raghotham Murthy

Software Engineer, Llama

Meta

Building LLM Applications with Llama Stack

Raghotham Murthy

Software Engineer, Llama

Meta

Software Engineer, Llama Meta

Foundation Models

Building LLM Applications with Llama Stack

Martin Casado

General Partner

a16z

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Martin Casado

General Partner

a16z

General Partner a16z

Keynotes

Real-Time Data Infrastructure and AI: Powering the Next Generation of Analytics

Tanya Bragin

VP Product

ClickHouse

Unbundling of the Cloud Data Warehouse

Tanya Bragin

VP Product

ClickHouse

VP Product ClickHouse

Databases

Unbundling of the Cloud Data Warehouse

Ethan Rosenthal

Member of Technical Staff

Runway

Building a Data Foundation for Multimodal Foundation Models

Ethan Rosenthal

Member of Technical Staff

Runway

Member of Technical Staff Runway

Foundation Models

Building a Data Foundation for Multimodal Foundation Models

Tengyu Ma

Co-Founder & CEO

Voyage AI

RAG In 2025: State Of The Art And The Road Forward

Tengyu Ma

Co-Founder & CEO

Voyage AI

Co-Founder & CEO Voyage AI

AI Engineering

RAG In 2025: State Of The Art And The Road Forward

Charles Frye

Developer Advocate

Modal Labs

What Every Data Scientist Needs To Know About GPUs

Charles Frye

Developer Advocate

Modal Labs

Developer Advocate Modal Labs

AI Engineering

What Every Data Scientist Needs To Know About GPUs

Shreya Rajpal

Co-Founder & CEO

Guardrails

The Future Of Guardrails

Shreya Rajpal

Co-Founder & CEO

Guardrails

Co-Founder & CEO Guardrails

AI Engineering

The Future Of Guardrails

Eno Reyes

CTO

Factory

Building Reliable Agentic AI Systems

Eno Reyes

CTO

Factory

CTO Factory

AI Engineering

Building Reliable Agentic AI Systems

Sharon Zhou

Founder & CEO

Lamini

RAGs to Riches: Engineering the Future of LLM Systems

Sharon Zhou

Founder & CEO

Lamini

Founder & CEO Lamini

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Mike Driscoll

Co-Founder & CTO

Rill Data

A SQL-Based Metrics Layer for DuckDB and ClickHouse

Mike Driscoll

Co-Founder & CTO

Rill Data

Co-Founder & CTO Rill Data

Analytics & BI

A SQL-Based Metrics Layer for DuckDB and ClickHouse

Bryan Bischof

Head of AI

Theory Ventures

Failure Is A Funnel

Bryan Bischof

Head of AI

Theory Ventures

Head of AI Theory Ventures

Data Sci & Algos

Failure Is A Funnel

Nuno Campos

Founding Engineer

LangChain

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Nuno Campos

Founding Engineer

LangChain

Founding Engineer LangChain

Data Eng & Infrastructure

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Michele Catasta

President

Replit

RAGs to Riches: Engineering the Future of LLM Systems

Michele Catasta

President

Replit

President Replit

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Jake Brill

Head of Product - Integrity

OpenAI

Guardrails for the Future: AI Safety and Responsible AI in Practice

Jake Brill

Head of Product - Integrity

OpenAI

Head of Product - Integrity OpenAI

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Rachad Alao

Senior Engineering Director

Meta

Guardrails for the Future: AI Safety and Responsible AI in Practice

Rachad Alao

Senior Engineering Director

Meta

Senior Engineering Director Meta

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Julien Le Dem

Principal Engineer

Datadog

The Deconstructed Database and the Advent of the Open Data Lake

Julien Le Dem

Principal Engineer

Datadog

Principal Engineer Datadog

Keynotes

The Deconstructed Database and the Advent of the Open Data Lake

Joseph Gonzalez

Professor

RunLLM & UC Berkeley

AGI Is Already Here (But It's Not What You Think)

RAGs to Riches: Engineering the Future of LLM Systems

Joseph Gonzalez

Professor

RunLLM & UC Berkeley

Professor RunLLM & UC Berkeley

GenAI Applications

AGI Is Already Here (But It's Not What You Think)

Keynotes

RAGs to Riches: Engineering the Future of LLM Systems

Krishnaram Kenthapadi

Chief Scientist, Clinical AI

Oracle Health

Guardrails for the Future: AI Safety and Responsible AI in Practice

Krishnaram Kenthapadi

Chief Scientist, Clinical AI

Oracle Health

Chief Scientist, Clinical AI Oracle Health

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

George Mathew

Managing Director

Insight Partners

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

George Mathew

Managing Director

Insight Partners

Managing Director Insight Partners

Keynotes

Data Meets Intelligence: Where the Data Infra & AI Stack Converge

Daniel Olmedilla

Distinguished Engineer, AI & Trust

Guardrails for the Future: AI Safety and Responsible AI in Practice

Daniel Olmedilla

Distinguished Engineer, AI & Trust

Distinguished Engineer, AI & Trust LinkedIn

Keynotes

Guardrails for the Future: AI Safety and Responsible AI in Practice

Sumti Jairath

Chief Architect

SambaNova Systems

Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

Sumti Jairath

Chief Architect

SambaNova Systems

Chief Architect SambaNova Systems

Keynotes

Bringing Trillions to Reality: How SambaNova’s Memory-Centric Design Powers Agentic AI and GenAI Workflows for Enterprise Data

Han-chung Lee

Machine Learning Director

Moody's Analytics

The Model is the Product

Han-chung Lee

Machine Learning Director

Moody's Analytics

Machine Learning Director Moody's Analytics

Foundation Models

The Model is the Product

Julian Hyde

Senior Staff Engineer

Google

More Than Query: Future Directions of Query Languages, from SQL to Morel

Julian Hyde

Senior Staff Engineer

Google

Senior Staff Engineer Google

Analytics & BI

More Than Query: Future Directions of Query Languages, from SQL to Morel

Pedram Navid

Head of Data Engineering & DevRel

Dagster Labs

Write Less More: How Dagster Rebuilt Our Docs from the Ground Up

Pedram Navid

Head of Data Engineering & DevRel

Dagster Labs

Head of Data Engineering & DevRel Dagster Labs

Lightning Talks

Write Less More: How Dagster Rebuilt Our Docs from the Ground Up

Yusuf Ozuysal

Director of Engineering, AI

Snowflake

AI Your Way with All-In-One Access

Yusuf Ozuysal

Director of Engineering, AI

Snowflake

Director of Engineering, AI Snowflake

Workshops

AI Your Way with All-In-One Access

Paul Dix

Founder & CTO

InfluxData

Building InfluxDB 3 Core: A Real-Time Columnar DB and Data Processor on Object Storage

Paul Dix

Founder & CTO

InfluxData

Founder & CTO InfluxData

Databases

Building InfluxDB 3 Core: A Real-Time Columnar DB and Data Processor on Object Storage

Hamel Husain

Machine Learning Consultant

Parlance Labs

The Model is Not the Product

Hamel Husain

Machine Learning Consultant

Parlance Labs

Machine Learning Consultant Parlance Labs

Foundation Models

The Model is Not the Product

This Data Council 2025 talk is in development. Check back soon!

Chenggang Wu

Co-Founder & CTO

RunLLM

AGI Is Already Here (But It's Not What You Think)

Chenggang Wu

Co-Founder & CTO

RunLLM

Co-Founder & CTO RunLLM

GenAI Applications

AGI Is Already Here (But It's Not What You Think)

Alexa Garrison

VP Data & Business Operations

Splice

Building High-Impact Data Teams in an AI-Driven World

Alexa Garrison

VP Data & Business Operations

Splice

VP Data & Business Operations Splice

AI & Data Culture

Building High-Impact Data Teams in an AI-Driven World

Etienne Dilocker

CTO

Weaviate

The Agentic Database: A New Way to Interact with Your Data

Etienne Dilocker

CTO

Weaviate

CTO Weaviate

Databases

The Agentic Database: A New Way to Interact with Your Data

Samuel Colvin

Founder

Pydantic

Pydantic: An Opinionated Blueprint for the Future of GenAI Applications

Samuel Colvin

Founder

Pydantic

Founder Pydantic

Workshops

Pydantic: An Opinionated Blueprint for the Future of GenAI Applications

Andy Pavlo

Assistant Professor of Databaseology

Carnegie Mellon University

What Goes Around Comes Around... and Around...

Andy Pavlo

Assistant Professor of Databaseology

Carnegie Mellon University

Assistant Professor of Databaseology Carnegie Mellon University

Databases

What Goes Around Comes Around... and Around...

Dhruv Singh

Co-founder & CTO

HoneyHive AI

Eval Agents: How to Solve Error Cascades in Agents

Dhruv Singh

Co-founder & CTO

HoneyHive AI

Co-founder & CTO HoneyHive AI

AI Engineering

Eval Agents: How to Solve Error Cascades in Agents

George Fraser

Co-Founder & CEO

Fivetran

Look Ma, No Data Warehouse!

George Fraser

Co-Founder & CEO

Fivetran

Co-Founder & CEO Fivetran

Workshops

Look Ma, No Data Warehouse!

1. Load data that is automatically converted to Iceberg open table format

2. Run SQL queries using DuckDB’s new Iceberg extension

3. Run transformations directly on data stored in your data lake with a new dbt adapter

4. Get started easily with a practical, hands-on demo

Simon Eskildsen

Co-Founder

Turbopuffer

Billion-Scale Vector Search on Object Storage

Simon Eskildsen

Co-Founder

Turbopuffer

Co-Founder Turbopuffer

Data Eng & Infrastructure

Billion-Scale Vector Search on Object Storage

Vishnu Vasanth

Co-Founder & CEO

e6data

Everything Everywhere All at Once: Object Store Native

Vishnu Vasanth

Co-Founder & CEO

e6data

Co-Founder & CEO e6data

Workshops

Everything Everywhere All at Once: Object Store Native

Niko Grupen

Head of Applied Research

Harvey

Legal Agency: Building Domain-specific Agents for Enterprise

Niko Grupen

Head of Applied Research

Harvey

Head of Applied Research Harvey

GenAI Applications

Legal Agency: Building Domain-specific Agents for Enterprise

Dillon Morrison

Director of Product Management

Sigma Computing

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Dillon Morrison

Director of Product Management

Sigma Computing

Director of Product Management Sigma Computing

Workshops

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Natacha Crooks

Assistant Professor

UC Berkeley

From Concurrency Control to Concurrent Scheduling

Natacha Crooks

Assistant Professor

UC Berkeley

Assistant Professor UC Berkeley

Databases

From Concurrency Control to Concurrent Scheduling

This Data Council 2025 talk is in development. Please check back soon for updates!

Rachel Lee Nabors

Former React Core

Meta

AI Cram Session

Rachel Lee Nabors

Former React Core

Meta

Former React Core Meta

AI Engineering

AI Cram Session

Chenyu Qiu

Senior Applied Scientist

Uber

Scalable Continuous Monitoring for Large-scale A/B Experimentation

Chenyu Qiu

Senior Applied Scientist

Uber

Senior Applied Scientist Uber

Data Sci & Algos

Scalable Continuous Monitoring for Large-scale A/B Experimentation

Ori Soen

CEO

Montara

Analytics and the dark side of the Analytics Development Lifecycle

Ori Soen

CEO

Montara

CEO Montara

Workshops

Analytics and the dark side of the Analytics Development Lifecycle

Franck Pachot

Developer Advocate

MongoDB

The Modern Database Debate: PostgreSQL and MongoDB

Franck Pachot

Developer Advocate

MongoDB

Developer Advocate MongoDB

Databases

The Modern Database Debate: PostgreSQL and MongoDB

Parham Parvizi

Founding Data Architect

Prospective

A Local-First approach to extremely fast Streaming Visualization

Parham Parvizi

Founding Data Architect

Prospective

Founding Data Architect Prospective

Workshops

A Local-First approach to extremely fast Streaming Visualization

In this workshop, we’ll explore:

Wenjing Zheng

Data Science Manager

Roblox

Causal Inference Methods for Bridging Experiments and Strategic Impact

Wenjing Zheng

Data Science Manager

Roblox

Data Science Manager Roblox

Data Sci & Algos

Causal Inference Methods for Bridging Experiments and Strategic Impact

Doron Porat

Co-Founder & CEO

Lakeway

AI is Going to Break Your Data Platform - Are You Ready?

Doron Porat

Co-Founder & CEO

Lakeway

Co-Founder & CEO Lakeway

AI & Data Culture

AI is Going to Break Your Data Platform - Are You Ready?

Oriol Mirosa

Director, Data Solutions

Brooklyn Data Co

Data Governance is NOT the Governance of Data!

Oriol Mirosa

Director, Data Solutions

Brooklyn Data Co

Director, Data Solutions Brooklyn Data Co

AI & Data Culture

Data Governance is NOT the Governance of Data!

Willem Pienaar

Co-Founder & CTO

Cleric

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

Willem Pienaar

Co-Founder & CTO

Cleric

Co-Founder & CTO Cleric

Lightning Talks

Chaos by Design: Solving the Unsolvable AI Agent Testing Problem

Mitul Tiwari

Co-founder & CTO

Stealth

TapeAgents: A Powerful Framework For Building And Optimizing AI Agents

Mitul Tiwari

Co-founder & CTO

Stealth

Co-founder & CTO Stealth

GenAI Applications

TapeAgents: A Powerful Framework For Building And Optimizing AI Agents

Timothy Chan

Head of Data

Statsig

Unlocking A/B Testing For B2B

Timothy Chan

Head of Data

Statsig

Head of Data Statsig

Data Sci & Algos

Unlocking A/B Testing For B2B

David Wilson

Co-Founder & CEO

Hunch Tools

Designing & Engineering a Viral Multi-Model AI Workflow: From Prototype to 300K Users in Two Weeks

David Wilson

Co-Founder & CEO

Hunch Tools

Co-Founder & CEO Hunch Tools

GenAI Applications

Designing & Engineering a Viral Multi-Model AI Workflow: From Prototype to 300K Users in Two Weeks

Ofer Mendelevitch

Head of Developer Relations

Vectara

Building Enterprise Agentic RAG Applications with Reduced Hallucinations

Ofer Mendelevitch

Head of Developer Relations

Vectara

Head of Developer Relations Vectara

Workshops

Building Enterprise Agentic RAG Applications with Reduced Hallucinations

Lindsay Murphy

Director, Head of Data

Hiive

No More BS: How (and When) to Really Leverage AI

Lindsay Murphy

Director, Head of Data

Hiive

Director, Head of Data Hiive

AI & Data Culture

No More BS: How (and When) to Really Leverage AI

Jake Thomas

Manager, Data Foundations

Okta

Embedding OLAP, Everywhere: Lessons from Okta

Jake Thomas

Manager, Data Foundations

Okta

Manager, Data Foundations Okta

ML OPs & Platforms

Embedding OLAP, Everywhere: Lessons from Okta

Tobias Lunt

Co-Founder & Data Scientist

Development Data Lab

Putting Data to Work for Global Urban Development

Tobias Lunt

Co-Founder & Data Scientist

Development Data Lab

Co-Founder & Data Scientist Development Data Lab

Lightning Talks

Putting Data to Work for Global Urban Development

Marck Vaisman

Global AI Solutions Architect

Microsoft

Revolutionize AI Engineering With Autogen

Marck Vaisman

Global AI Solutions Architect

Microsoft

Global AI Solutions Architect Microsoft

AI Engineering

Revolutionize AI Engineering With Autogen

Elias DeFaria

Co-Founder & VP of Product

SDF

Why dbt Acquired SDF: How A Small Team Built True SQL Comprehension

Elias DeFaria

Co-Founder & VP of Product

SDF

Co-Founder & VP of Product SDF

Data Eng & Infrastructure

Why dbt Acquired SDF: How A Small Team Built True SQL Comprehension

Mickey Liu

Software Engineer

Notion

Billion-Scale Vector Search on Object Storage

Mickey Liu

Software Engineer

Notion

Software Engineer Notion

Data Eng & Infrastructure

Billion-Scale Vector Search on Object Storage

Sumedh Sakdeo

Senior Staff Software Engineer

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Sumedh Sakdeo

Senior Staff Software Engineer

Senior Staff Software Engineer LinkedIn

Data Eng & Infrastructure

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Jesus Camacho

Principal Engineering Manager

Microsoft

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Jesus Camacho

Principal Engineering Manager

Microsoft

Principal Engineering Manager Microsoft

Data Eng & Infrastructure

Optimizing Iceberg Table Layouts at Scale: A Multi-Objective Approach

Ciro Greco

Founder

Bauplan

Python Over Data Lakes: Declarative Environments, Data Management And Other Things With Feathers

Ciro Greco

Founder

Bauplan

Founder Bauplan

Data Sci & Algos

Python Over Data Lakes: Declarative Environments, Data Management And Other Things With Feathers

Joseph Powers

Principal Data Scientist

Intuit

Going Bayes: Shifting Our Testing Methods To Reflect Our Priorities

Joseph Powers

Principal Data Scientist

Intuit

Principal Data Scientist Intuit

Data Sci & Algos

Going Bayes: Shifting Our Testing Methods To Reflect Our Priorities

Marcel Kornacker

Co-Founder & CTO

Pixeltable

Introducing Pixeltable: Open Source Data Infrastructure for Multimodal AI

Marcel Kornacker

Co-Founder & CTO

Pixeltable

Co-Founder & CTO Pixeltable

ML OPs & Platforms

Introducing Pixeltable: Open Source Data Infrastructure for Multimodal AI

Saif Ur-Rehman

Data Engineering Lead

Basecamp Research

Engineering Earth's Largest Biological Data Pipeline

Saif Ur-Rehman

Data Engineering Lead

Basecamp Research

Data Engineering Lead Basecamp Research

Lightning Talks

Engineering Earth's Largest Biological Data Pipeline

Jonathan Jin

Staff Machine Learning Engineer

Hinge

Trimming the Long Tail of Production Model Ownership at Hinge

Jonathan Jin

Staff Machine Learning Engineer

Hinge

Staff Machine Learning Engineer Hinge

ML OPs & Platforms

Trimming the Long Tail of Production Model Ownership at Hinge

Madison Faulkner

Principal & Head of Data Science

NEA

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Madison Faulkner

Principal & Head of Data Science

NEA

Principal & Head of Data Science NEA

Data Eng & Infrastructure

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Hamilton Ulmer

UI Engineer & Designer

MotherDuck

Instant Preview Mode: Real-Time Feedback to Make SQL Data Exploration Fly

Hamilton Ulmer

UI Engineer & Designer

MotherDuck

UI Engineer & Designer MotherDuck

Analytics & BI

Instant Preview Mode: Real-Time Feedback to Make SQL Data Exploration Fly

Vignesh Chadramohan

Engineering Manager

Doordash

Internals of SlateDB: An Embedded Key-Value Store Built on Object Storage

Vignesh Chadramohan

Engineering Manager

Doordash

Engineering Manager Doordash

MLOps & Platforms

Internals of SlateDB: An Embedded Key-Value Store Built on Object Storage

Nikita Vemuri

Software Engineer

Anyscale

From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray

Nikita Vemuri

Software Engineer

Anyscale

Software Engineer Anyscale

MLOps & Platforms

From Scaling to Observability: Solving Key Challenges for Distributed ML with Ray

Ethan Brown

Director, Data & Applied Science

Twitch / AWS

Building an LLM-Powered Analytics Slack Bot at Twitch

Ethan Brown

Director, Data & Applied Science

Twitch / AWS

Director, Data & Applied Science Twitch / AWS

GenAI Applications

Building an LLM-Powered Analytics Slack Bot at Twitch

John Bagnall

Senior Data Product Manager

Matillion

Humanizing Data Architecture: How Design Thinking Transforms Data Strategy

John Bagnall

Senior Data Product Manager

Matillion

Senior Data Product Manager Matillion

Lightning Talks

Humanizing Data Architecture: How Design Thinking Transforms Data Strategy

CL Kao

Founder

Recce

Data Engineering Is Not Software Engineering, Until It Is

CL Kao

Founder

Recce

Founder Recce

Lightning Talks

Data Engineering Is Not Software Engineering, Until It Is

Avi Press

CEO

Scarf

Open Source Success: Learnings from 1 Billion Downloads

Avi Press

CEO

Scarf

CEO Scarf

Lightning Talks

Open Source Success: Learnings from 1 Billion Downloads

Michael Cohen

Global Chief Data & Analytics Officer

Plus Company

The Art of Data: Reimagining Creative Processes with Data Culture

Michael Cohen

Global Chief Data & Analytics Officer

Plus Company

Global Chief Data & Analytics Officer Plus Company

Lightning Talks

The Art of Data: Reimagining Creative Processes with Data Culture

Dylan Perez Neider

Sr. Solutions Engineer

Sigma Computing

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Dylan Perez Neider

Sr. Solutions Engineer

Sigma Computing

Sr. Solutions Engineer Sigma Computing

Workshops

Text-to-SQL Is Not the Answer: How to Effectively Use AI For Analytics

Dadi Atar

VP Product

Montara

Analytics and the dark side of the Analytics Development Lifecycle

Dadi Atar

VP Product

Montara

VP Product Montara

Workshops

Analytics and the dark side of the Analytics Development Lifecycle

Sudarsan Lakshmi

Head of Engineering

e6data

Everything Everywhere All at Once: Object Store Native

Sudarsan Lakshmi

Head of Engineering

e6data

Head of Engineering e6data

Workshops

Everything Everywhere All at Once: Object Store Native

Beto Ferreira De Almeida

Staff Engineer

Preset

Data Should be Invisible

Beto Ferreira De Almeida

Staff Engineer

Preset

Staff Engineer Preset

AI & Data Culture

Data Should be Invisible

Josh Curl

Co-Founder & CTO

Hightouch

Bridging the AI Implementation gap: Strategies for Embedding Data Professionals with Business Units

Josh Curl

Co-Founder & CTO

Hightouch

Co-Founder & CTO Hightouch

AI & Data Culture

Bridging the AI Implementation gap: Strategies for Embedding Data Professionals with Business Units

Tomás Kofman

Co-Founder & CEO

Not Diamond

How to Build Your Own Model Router

Tomás Kofman

Co-Founder & CEO

Not Diamond

Co-Founder & CEO Not Diamond

Lightning Talks

How to Build Your Own Model Router

Diptanu Gon Choudhury

Founder & CEO

Tensorlake

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Diptanu Gon Choudhury

Founder & CEO

Tensorlake

Founder & CEO Tensorlake

Data Eng & Infrastructure

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Gleb Mezhanskiy

Co-Founder & CEO

Datafold

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Gleb Mezhanskiy

Co-Founder & CEO

Datafold

Co-Founder & CEO Datafold

Data Eng & Infrastructure

The Future Of Data Engineering: From Unstructured To Structured for Agent Systems

Nathan Sooter

Sr. Manager, RevOps Analytics & Insights

1Password

Go-To-Market Data Enrichment: Practical Strategies to Drive Business Value

Nathan Sooter

Sr. Manager, RevOps Analytics & Insights

1Password

Sr. Manager, RevOps Analytics & Insights 1Password

Analytics & BI

Go-To-Market Data Enrichment: Practical Strategies to Drive Business Value

Margaret Quigley

ex-Cohere Head of Data Acquisition

MQ Consulting

Ethical Data Acquisition & Sales in the AI Age

Margaret Quigley

ex-Cohere Head of Data Acquisition

MQ Consulting

ex-Cohere Head of Data Acquisition MQ Consulting

Lightning Talks

Ethical Data Acquisition & Sales in the AI Age

Jonathan Mortensen

CEO

Confident Security

The Unofficial Guide to Apple’s Private Cloud Compute

Jonathan Mortensen

CEO

Confident Security

CEO Confident Security

MLOps & Platforms

The Unofficial Guide to Apple’s Private Cloud Compute

Skip Everling

Head of Developer Relations

Kolena

AI-Powered Automation: Supercharge Data-Intensive Workflows with Intelligent Agents

Skip Everling

Head of Developer Relations

Kolena

Head of Developer Relations Kolena

Workshops

AI-Powered Automation: Supercharge Data-Intensive Workflows with Intelligent Agents

Jacob Matson

Developer Advocate

MotherDuck

More Than a Vibe: AI-Driven SQL that Actually Works

Jacob Matson

Developer Advocate

MotherDuck

Developer Advocate MotherDuck

Workshops

More Than a Vibe: AI-Driven SQL that Actually Works

Cole Bowden

Developer Advocate

Firebolt

The Power of Low Latency Data for AI Apps

Cole Bowden

Developer Advocate

Firebolt

Developer Advocate Firebolt

Workshops

The Power of Low Latency Data for AI Apps

Rui Lopes

Head of AI

DataLinks

Powering AI Workflows with Tabular Graphs

Rui Lopes

Head of AI

DataLinks

Head of AI DataLinks

Workshops

Powering AI Workflows with Tabular Graphs

Issac Roth

Co-Founder & CEO

Orama

OramaCore: A Search Database with LLMs Built-In

Issac Roth

Co-Founder & CEO

Orama

Co-Founder & CEO Orama

Lightning Talks

OramaCore: A Search Database with LLMs Built-In

Alexy Khraborov

AI/ML Community Architect

Neo4j

OAKS: Open Agentic Knowledge Stack

Alexy Khraborov

AI/ML Community Architect

Neo4j

AI/ML Community Architect Neo4j

Lightning Talks

OAKS: Open Agentic Knowledge Stack

Anant Agarwal

Staff Software Engineer & Engineering Lead

Instacart

Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Anant Agarwal

Staff Software Engineer & Engineering Lead

Instacart

Staff Software Engineer & Engineering Lead Instacart

Data Eng & Infrastructure

Orchestrating at Scale: How Instacart Manages 20M+ Daily Workflows

Skyler Thomas

Co-Founder & CTO

Cake AI

Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source

Skyler Thomas

Co-Founder & CTO

Cake AI

Co-Founder & CTO Cake AI

GenAI Applications

Make Too Much Knowledge Just Enough. Massive Scale RAG and GraphRAG with Open Source

Brenna Buuck

Developer Evangelist

MinIO

The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse

Brenna Buuck

Developer Evangelist

MinIO

Developer Evangelist MinIO

Lightning Talks

The Middle Ground: Balancing Batch and Real-Time Processing in a Data Lakehouse

Colleen Tartow

Senior Director, Enterprise Data Engineering

Capital One

No More BS: How (and When) to Really Leverage AI

Colleen Tartow

Senior Director, Enterprise Data Engineering

Capital One

Senior Director, Enterprise Data Engineering Capital One

AI & Data Culture

No More BS: How (and When) to Really Leverage AI

Marco Slot

Software Imagineer

Crunchy Data

Converging Database Architectures: DuckDB in PostgreSQL

Marco Slot

Software Imagineer

Crunchy Data

Software Imagineer Crunchy Data

Databases

Converging Database Architectures: DuckDB in PostgreSQL

Anil Sadineni

Principal Software Engineer

1upHealth

A Modern Data Stack in Healthcare

Anil Sadineni

Principal Software Engineer

1upHealth

Principal Software Engineer 1upHealth

Lightning Talks

A Modern Data Stack in Healthcare

Dr. Greg Michaelson

Co-Founder & Chief Product Officer

Zerve AI

Scaling GenAI & Agentic Workflows for practical solutions with Zerve

Dr. Greg Michaelson

Co-Founder & Chief Product Officer

Zerve AI

Co-Founder & Chief Product Officer Zerve AI

Workshops

Scaling GenAI & Agentic Workflows for practical solutions with Zerve

CURATING TRACK SPEAKERS. STAY TUNED.

View all speakers

WHY ATTEND?

Go beyond just conference talks and engage directly with our community.

Expo Hall & Networking
Interactive Workshops
Speaker Office Hours
Drinks & Demos

Get Tickets

Expo Hall & Networking

Discover cutting-edge tools and technologies from innovators at the forefront of AI & data. Explore, connect, and get a firsthand look at what’s next.

Interactive Workshops

Why pay extra to level up your career? Gain practical training on the latest data tools from the architects and builders of the tools themselves. (All workshops are included in the ticket price.)

Speaker Office Hours

Our speakers provide real technical depth and go beyond whitepaper-level details. Office Hours sessions with speakers follow each talk and feature additional in-depth discussion opportunities for attendees.

Drinks & Demos

Rub shoulders with the brightest minds in AI & data. Come to make meaningful connections with startups, customers, peers, investors & more.

AI Launchpad

Join us on Day 1 during our 🥳 Community Party to hear from 6 exceptional AI startups.

Brought to you by Zero Prime Ventures.

Meet the winners for 2025

See You in Oakland!

This year, we're excited to call the historic Oakland Scottish Rite Center home to Data Council 2025.

Nestled on Lake Merritt with stunning lake views, this architectural gem puts you steps from downtown's best hotels, dining, and nightlife. Just 15 minutes from BART or a scenic ferry ride from downtown San Francisco.

547 Lakeside Dr, Oakland, CA, 94612

Oakland Scottish Rite Center

The Temple of Data

April 22 - 24, 2025

Lake Merritt, Oakland

PARTY + EVENT GUIDE

Get Tickets

Thanks to Our Sponsors

Gold

Silver

Base

Why Attend Data Council?

Learn from Industry Experts

Get architectural insights and best practices straight from the pioneers building the future of data & AI. No marketing fluff here.

Hands-On Experiences

Put theory into practice through interactive workshops and learning opportunities, such as our unique office hours where you can meet any speaker in a small group setting.

Unparalleled Networking

Get exclusive access and connect with engineers and founders who speak your language. No suits or sales pitches, just real pros sharing their work.

Meet the Hosts

Content quality sets Data Council apart. Unlike other conferences that simply accept abstracts as-is, our track hosts go the extra mile to hand-select presentations and collaborate with speakers on their topics to ensure the highest-value talks take the stage.

Bryan Bischof

Head of AI

Theory Ventures

Carlos Aguilar

Founder

Hashboard

Daniel Francisco

Director of Product

Meta

Maggie Hays

Community Product Manager

Acryl Data

Roger Magoulas

Principal

Almost Data

Sai Srirampur

Principal Engineer

Clickhouse

Scott Breitenother

Founder

Brooklyn Data

Sean Anderson

Head of Product Marketing

Vectara

Sean Taylor

Data Scientist

OpenAI

Swyx (Shawn) Wang

Co-Host

Latent.Space Podcast

Tristan Zajonc

CEO & Co-Founder

Continual

About Our Tracks

Our carefully curated tracks balance proven technical foundations with emerging data & AI trends. Get real frameworks, techniques and actionable knowledge straight from seasoned practitioners.

Data Eng & Infrastructure

AI Engineering

Data Science & Algos

GenAI Applications

Analytics & BI

MLOps & Platforms

Foundation Models

Databases

AI & Data Culture

Lightning Talks

FAQ

Do you offer group ticket rates?

Yes! We <3 teams at Data Council and offer streamlined packages for groups of 5 or 10 with huge savings of up to 40-50% off regular ticket prices. Best of all, you can purchase them directly with no invoicing or back-and-forth needed with a sales rep. Simply visit our ticketing site to learn more about group rates.

Do you offer startup or non-profit tickets?

Yes, we offer discounts for startups (must have raised <$5M), non-profits, government agencies, and academic students & faculty. For startup tickets, please see our ticketing site; for non-profit and academic discounts, please contact community@datacouncil.ai for more information.

When and where will Data Council 2025 be held?

We're excited to bring Data Council back to the Bay Area on Apr 22-24, 2025! The event will be held at the historic Oakland Scottish Rite Center, right off the shores of beautiful Lake Merritt in Oakland, CA.

Are there extra costs to be aware of for attending?

Once you purchase your ticket, all talks, workshops and networking opportunities are available to you as part of the Data Council experience. However, external costs such as travel, lodging, and commuting are your responsibility.

3 DAYS • APRIL 22-24

WHERE DATA MEETS INTELLIGENCE

Featured Keynotes

Naveen Rao

Databricks

Denis Yarats

Perplexity

Aaron Katz

Clickhouse

Martin Casado

a16z

Sharon Zhou

Lamini

Michele Catasta

Replit

Jake Brill

OpenAI

Rachad Alao

Meta

Julien Le Dem

Datadog

Joseph Gonzalez

RunLLM & UC Berkeley

Krishnaram Kenthapadi

Oracle Health

George Mathew

Insight Partners

Daniel Olmedilla

LinkedIn

Sumti Jairath

SambaNova Systems

100+ Speakers

3-Day Conference Passes

Startup Ticket $799

Regular Ticket $1999

Investor Ticket $4999

🎟️ BIG TEAM = BIG DEAL 💸
