3 DAYS • APRIL 22-24
WHERE DATA MEETS INTELLIGENCE
Experience 3 DAYS of no bullsh*t technical talks & awesome networking with the brightest minds in data & AI in Oakland, CA.
Speakers From




Every AI breakthrough starts with data. We’re the premier technical event spotlighting cutting-edge AI and the data stack that powers it.







JOIN YOUR TRIBE
Our attendees are AI engineers, founders, CTOs, AI researchers, Heads of Data, and investors who are all building the future of data.
Featured Keynotes

Naveen Rao
VP of AI
Databricks

Denis Yarats
Co-Founder & CTO
Perplexity

Aaron Katz
Co-Founder & CEO
ClickHouse

Martin Casado
General Partner
a16z

Sharon Zhou
Founder & CEO
Lamini

Michele Catasta
President
Replit

Jake Brill
Head of Product - Integrity
OpenAI
Rachad Alao
Senior Engineering Director
Meta

Julien Le Dem
Principal Engineer
Datadog

Joseph Gonzalez
Professor
RunLLM & UC Berkeley

Krishnaram Kenthapadi
Chief Scientist, Clinical AI
Oracle Health

George Mathew
Managing Director
Insight Partners
Daniel Olmedilla
Distinguished Engineer, AI & Trust

Sumti Jairath
Chief Architect
SambaNova Systems
Featured Keynotes

Join us for an authentic fireside chat that cuts through industry hype to explore how real-time data infrastructure is transforming analytics and AI. We'll examine technical hurdles in processing data at scale in real-time and the architectural decisions enabling performance breakthroughs. The discussion covers data processing evolution, performance bottlenecks in distributed systems, observability innovations, and approaches for maintaining consistency while increasing throughput. Gain insights into how real-time processing creates competitive advantages for business intelligence and AI applications, with honest assessments of implementation challenges and practical solutions.


In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and more importantly, the contracts that keep them decoupled and composable.
Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF, and is a member of the LF AI & Data Technical Advisory Council. He co-created the Parquet, Arrow, and OpenLineage open source projects and is involved in several others. His career leadership began in Data Platforms at Yahoo!, where he received his Hadoop initiation, then continued at Twitter, Dremio, and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve data observability. His French accent makes his talks particularly attractive.

Krishnaram Kenthapadi is the Chief Scientist of Clinical AI at Oracle Health, where he leads AI initiatives for Clinical AI Agent and other Oracle Health products, focusing on modernizing clinical applications, reducing administrative burden for clinicians, and driving healthcare transformation through trustworthy AI. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, and privacy initiatives in Amazon AI platform. Until recently, he led similar efforts across different LinkedIn applications as part of the LinkedIn AI team, and served as LinkedIn's representative in Microsoft's AI and Ethics in Engineering and Research (AETHER) Advisory Board. He shaped the technical roadmap and led the privacy/modeling efforts for LinkedIn Salary product, and prior to that, served as the relevance lead for the LinkedIn Careers and Talent Solutions Relevance team, which powers search/recommendation products at the intersection of members, recruiters, and career opportunities. Previously, he was a Researcher at Microsoft Research Silicon Valley, where his work resulted in product impact (and Gold Star / Technology Transfer awards), and several publications/patents. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006, and his Bachelors in Computer Science from IIT Madras.
He serves regularly on the program committees of KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. He received Microsoft's AI/ML conference (MLADS) distinguished contribution award, the NAACL best thematic paper award, the CIKM best case studies paper award, the SODA best student paper award, and a WWW best paper award nomination. He has published 40+ papers with 2500+ citations and filed 140+ patents (30+ granted). He has presented lectures and tutorials on privacy, fairness, and explainable AI in industry at forums such as KDD '18 and '19, WSDM '19, WWW '19, FAccT '20, and AAAI '20, and instructed a course on AI at Stanford.

As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.
This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.
100+ Speakers
Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

- Analytics & BI
Lloyd Tabb is a tech pioneer who has shaped how the industry works with the internet and data for over 30 years. After serving as Borland's database architect, he was a Principal Engineer at Netscape during the browser wars and helped found Mozilla.org. He later founded Looker, acquired by Google in 2019, which helped define the Modern Data Stack. Now at Meta, he leads Malloy, an experimental language that reimagines SQL with coding libraries, recently transferred to the Linux Foundation. The project has expanded to support Presto, Trino, Snowflake, and other SQL dialects, while adding features like parameterized sources and visual query builders.



- Data Eng & Infrastructure
Ryan Blue is the original creator of Apache Iceberg and currently serves as Vice President of the project. With over a decade in data engineering, he's an established expert in big data formats and infrastructure. Currently a Member of Technical Staff at Databricks since June 2024, Ryan previously co-founded and served as CEO of Tabular until its acquisition by Databricks. His career includes senior positions at Netflix and Cloudera, where he was a technical lead for data formats. An Apache Software Foundation member since 2017, he's a committer in the Apache Parquet, Avro, and Spark communities and previously served as VP of Apache Avro. Ryan holds dual Bachelor's degrees in Mathematics and Computer Science from the University of Idaho and a Master's in Computer Science from the University of Maryland.



- Foundation Models
Ethan Rosenthal is a Member of Technical Staff at Runway, an applied AI research company focused on multimedia content creation, where he builds engineering systems to accelerate the work of research scientists. His career spans diverse roles across AI, machine learning, and data science - from training language models at Square to developing recommendation systems at seed-stage ecommerce startups. Before working in tech, Ethan was an actual scientist and got his PhD in experimental physics from Columbia University.


- Analytics & BI
Mike has spent over two decades as a technologist, entrepreneur, and investor. He's currently co-founder and CEO of Rill, a cloud service for operational intelligence. Previously he founded Metamarkets (acquired by Snap, Inc. in 2017), a real-time analytics platform for digital ad firms, and CustomInk.com, a leader in custom apparel online. Mike was also a founding partner at the venture capital firm DCVC, which has invested more than $2B in deep tech. He began his career as a software engineer for the Human Genome Project and later received a Ph.D. in computational biology.




- Foundation Models
Han-chung Lee is a machine learning expert who builds and operates AI systems with a focus on GenAI, LLM agents, and recommendation engines. Currently Director of Machine Learning at Moody's Analytics and founder of Calabazas Creek, Han-chung excels at untangling complex code and organizations. A Berkeley EECS grad with an MBA from SJSU, he shares insights on ML engineering and tech investing drawn from his diverse experience across sell-side, buy-side, and venture capital. Follow his practical wisdom and occasional industry reflections.

- Analytics & BI
Julian Hyde is the original developer of Apache Calcite, which provides SQL parsers and query optimizers for dozens of products, and Morel, a new functional query language. Previously he led the query processing team at Looker (acquired by Google in 2020), and co-founded SQLstream, an engine for continuous queries. He left Google in early 2025 to create the next language for data.


- Workshops
Modern data lakes promise affordability and scalability, but using them can be a headache. Cloud data warehouses make querying easy, but they come with a hefty price tag and extra complexity. What if you could get the same ease of use without the cost and lock-in?
In this session, we’ll show you how to leverage open-source software to build a fully functional, queryable analytics powerhouse using DuckDB, Fivetran, and Polaris Catalog. We’ll walk through how to:
1. Load data that is automatically converted to Iceberg open table format
2. Run SQL queries using DuckDB’s new Iceberg extension
3. Run transformations directly on data stored in your data lake with a new dbt adapter
4. Get started easily with a practical, hands-on demo
No vendor lock-in, no unnecessary complexity—just an open-source-powered approach to enabling advanced analytics and AI. If your data warehouse is holding you back or eating away at your budget, this session is for you!
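To give a flavor of steps 1 and 2, a minimal DuckDB SQL sketch might look like the following. The `iceberg` extension and `iceberg_scan` function are real DuckDB features; the bucket path and table layout are hypothetical placeholders, not part of the session materials:

```sql
-- Load DuckDB's Iceberg support, then query an Iceberg table in place.
INSTALL iceberg;
LOAD iceberg;

-- iceberg_scan reads the open table format directly from object storage.
SELECT order_date, count(*) AS orders
FROM iceberg_scan('s3://my-lake/warehouse/orders')
GROUP BY order_date
ORDER BY order_date;
```

Because the tables stay in the open Iceberg format, any other Iceberg-aware engine can read the same data, which is what keeps the stack free of lock-in.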
George Fraser is the CEO and co-founder of Fivetran. George founded the company with COO Taylor Brown in 2012 after completing the prestigious Y Combinator accelerator program. Since then he has built Fivetran, a fully managed automated data integration provider, from an idea to a rapidly growing global business valued at $5.6 billion, supported by a global team of 1,000+ employees.


- Workshops
Ori Soen is a serial entrepreneur and current Founder/CEO of Montara Inc., creating the first unified DataOps platform for data development. With a track record of building, scaling, and successfully exiting multiple startups, Ori previously served as EVP at Medallia (NYSE: MDLA), where he helped take the company public. At Medallia, he led the Digital Products Business Unit, mid-market sales, served as CMO, and joined through the acquisition of Kampyle, where as CEO he built the company into a market leader. Earlier, he was CMO and Head of Product at Jajah (acquired by Telefonica).


- Workshops
Modern data workloads demand fast, interactive, and scalable visualization—without the cost and complexity of server-side rendering. The local-first approach leverages modern browser capabilities, WebAssembly, and in-browser computation to achieve high-performance analytics while reducing cloud costs.
In this workshop, we'll explore:
1. Why Local-First? The benefits of running everything client-side for cost-efficient, scalable visualization across thousands of users.
2. WebAssembly (WASM) for Data Apps: How Perspective harnesses WASM to power ultra-fast, browser-native analytics and even replace traditional Docker-based containers for data workloads.
3. Perspective + DuckDB: A full in-browser analytics stack that enables high-speed querying and visualization without a backend.
4. Streaming Data with InfluxDB: How to visualize high-frequency, real-time IoT and log data with sub-second latency.
5. Databricks + Perspective: Enhancing large-scale analytics with interactive dashboards inside Jupyter notebooks.
Through live coding and guided exercises, attendees will build their own browser-native analytics dashboards, connect to real-time data streams, and learn Perspective’s API in Python, Node.js, and Rust.
Difficulty level: Intermediate. Some experience with Python, JavaScript, and data analytics will be helpful, but beginners can follow along with guided exercises.
To participate, bring a laptop with:
Git
VS Code
Docker
Python (3.8+)
Node.js (16+)


- Lightning Talks
Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse through inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth. Traditional testing approaches break down completely. This talk breaks down our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail—from infrastructure mimicry to LLM-generated tests—and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.
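The abstract above centers on deterministic simulation: deriving reproducible failure scenarios with known ground truth so operational agents can be scored precisely. As a purely illustrative sketch (not the speakers' system; the service names, dependency graph, and scoring logic here are all invented), seeded generation is enough to make every scenario reproducible:

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    root_cause: str   # ground-truth failed service
    symptoms: tuple   # observable alerts, sorted for stable comparison

# A toy dependency graph: service -> services it calls directly.
DEPS = {
    "api":   ["auth", "db", "cache"],
    "auth":  ["db"],
    "queue": ["db"],
    "db":    [],
    "cache": [],
}

def generate_scenario(seed: int) -> Scenario:
    """Deterministically derive one failure scenario from a seed.

    The same seed always yields the same scenario, so an agent's
    diagnosis can be replayed and scored against known ground truth.
    """
    rng = random.Random(seed)
    root = rng.choice(sorted(DEPS))
    # The failed service and its direct callers surface error-rate alerts.
    affected = {root} | {s for s, calls in DEPS.items() if root in calls}
    symptoms = tuple(sorted(f"{s}: elevated error rate" for s in affected))
    return Scenario(root_cause=root, symptoms=symptoms)

def evaluate(agent, seeds):
    """Score an agent callable (symptoms -> suspected root cause)
    over a reproducible batch of simulated incidents."""
    scenarios = [generate_scenario(s) for s in seeds]
    hits = sum(agent(sc.symptoms) == sc.root_cause for sc in scenarios)
    return hits / len(scenarios)
```

Because scenarios are pure functions of their seeds, a failing agent run can be replayed exactly, which is what makes regression testing of agents possible where live infrastructure would not be reproducible.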

- GenAI Applications
Mitul Tiwari is CTO and Co-founder of a stealth AI startup. Until recently he was a Director of AI and Machine Learning Engineering at ServiceNow, leading the natural language technologies group. Earlier he was CTO and Co-founder of Passage AI (acquired by ServiceNow). His expertise lies in building data-driven products using AI, machine learning, and big data technologies. Previously he was head of People You May Know and Growth Relevance at LinkedIn, where he led technical innovations in large-scale social recommender systems. Prior to that, he worked at Kosmix (now Walmart Labs) on web-scale text categorization and its applications. He earned his PhD in Computer Science from the University of Texas at Austin and his undergraduate degree from the Indian Institute of Technology, Bombay. He has co-authored more than twenty publications in top conferences such as ACL, AAAI, KDD, WWW, RecSys, VLDB, SIGIR, CIKM, and SPAA.


- Data Sci & Algos
Ciro Greco is the Founder and CEO of bauplan, a zero-copy, data-first FaaS platform launched in 2023. He holds a Ph.D. in Experimental Psychology, Linguistics and Neuroscience from the University of Milano-Bicocca. Previously, he co-founded Tooso, an AI-powered commerce search company acquired by Coveo in 2019, where he later served as VP of Artificial Intelligence. His expertise spans AI, linguistics, and data engineering, with a focus on making cloud data pipelines more efficient.


- ML OPs & Platforms
Marcel Kornacker is currently CTO and Co-Founder at Pixeltable and is notably known as the founder of Apache Impala and co-founder of Apache Parquet. He holds a Ph.D. in Computer Science from UC Berkeley, where he studied databases under Joe Hellerstein. His career includes founding several startups, including Blink Computing, which provided data lake analytics as a service. He has also served as an Entrepreneur in Residence at both Sutter Hill Ventures and Coatue Management. Kornacker's expertise spans databases, data analytics, and open-source technologies.


- Lightning Talks
Michael is a globally recognized leader in AI and data-driven business transformation, known for developing AIOS, an intelligent marketing system at Plus that optimizes customer journey analysis and marketing ROI. A former NYU Stern marketing professor, his research spans predictive computation, marketing strategies, and consumer behavior. He has launched successful AI products focused on user safety, privacy, and marketing optimization. His work on explainable prediction and data synthesis, particularly with incomplete data, has been featured in major publications like LA Times and AdAge, and he frequently speaks at prestigious events including The Nantucket Project and ANA Masters of Marketing.


- AI & Data Culture
The modern data landscape is dominated by complexity: tables, schemas, pipelines, warehouses, and more. Yet the most successful data platforms share a common principle—they make data itself invisible to the end user. When data infrastructure functions optimally, it's like good plumbing: you only notice it when something breaks. Organizations often fixate on the mechanics of data while losing sight of what truly matters: metrics, dimensions, and semantics. When users engage with meaningful abstractions rather than technical details, they make better decisions faster. In this talk, you'll learn strategies for making data invisible through real-world abstraction success stories, designing effortless interactions, and implementing governance through abstraction. Walk away with practical ways to assess your data stack, advocate for user-centric approaches, and measure progress—making your data platform not just powerful, but invisible in all the right ways.
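The abstraction idea the talk describes can be made concrete with a toy semantic layer: users ask for a metric and a dimension by name, and the layer owns the tables, joins, and SQL. This is a minimal sketch, not the speaker's implementation; all table, metric, and dimension names below are invented.

```python
# Hypothetical semantic layer: the user never sees tables or joins,
# only named metrics and dimensions. Names are invented for illustration.

METRICS = {
    "revenue": "SUM(orders.amount)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "region": "customers.region",
    "month": "strftime('%Y-%m', orders.created_at)",
}

def compile_query(metric: str, dimension: str) -> str:
    """Translate a (metric, dimension) request into SQL the user never sees."""
    return (
        f"SELECT {DIMENSIONS[dimension]} AS {dimension}, "
        f"{METRICS[metric]} AS {metric} "
        "FROM orders JOIN customers ON orders.customer_id = customers.id "
        f"GROUP BY {DIMENSIONS[dimension]}"
    )

sql = compile_query("revenue", "region")
```

A request like `compile_query("revenue", "region")` yields a full grouped query, so governance and schema changes live in one place while users keep working with business terms.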

- AI & Data Culture
At the foundation of AI project failures lies a critical gap between data teams and business reality. On top of this gap, data quality issues, unexpected privacy concerns, and tools that don't align with actual business problems arise to hinder or block implementation. As we've built our own AI product—AI Decisioning—and implemented it with customers, we've learned that successful AI implementations depend on embedding data teams within business units. Embedding doesn't mean breaking apart your data team and dispersing it throughout every other department. It means establishing focused partnerships where data team members are deeply integrated into business teams' daily workflows and decision-making processes while remaining connected to the central data organization. This embedding creates a virtuous cycle: data teams gain deep domain knowledge, business professionals see improved data quality and gain data facility, and together, data and business teams implement AI solutions that solve real problems. In this talk, we'll share concrete examples of how data teams (especially data scientists) and marketing have worked together in successful AI Decisioning implementations. We’ll derive strategies to implement this organizational pattern and enable a company to move from analytics to actions and from data teams as service providers to active collaborators. While our case studies focus primarily on marketing partnerships, the embedded partnership model we present applies equally to other business functions including product development, operations, and customer service teams.

- Lightning Talks
Building Cost-Effective LLM Routers: Boost Accuracy 25% While Cutting Costs 90% | This session reveals how to build intelligent model routers that dynamically direct inputs to the optimal large language model (LLM) for each specific task. Attendees will learn practical implementation strategies for multi-model LLM systems that significantly improve performance metrics—achieving up to 25% higher accuracy while reducing operational costs by as much as 90%. The presentation covers essential routing methodologies, evaluation frameworks, and scalable architectures for production deployments. Developers and ML engineers will gain actionable insights for overcoming technical challenges in multi-model LLM systems, optimizing both performance and cost-efficiency in generative AI applications. Perfect for teams looking to maximize ROI from their AI infrastructure while maintaining high-quality outputs.
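As a rough illustration of the routing idea (not the speaker's method), here is a minimal heuristic router. The model names and the difficulty score are placeholders; production routers typically use a trained classifier or embedding-similarity lookup instead of keyword markers.

```python
# Minimal illustrative LLM router: send "easy" prompts to a cheap model
# and "hard" ones to a stronger model. All names are hypothetical.

CHEAP_MODEL = "small-llm"      # hypothetical inexpensive model
STRONG_MODEL = "frontier-llm"  # hypothetical high-accuracy model

HARD_MARKERS = ("prove", "derive", "multi-step", "legal", "diagnose")

def route(prompt: str) -> str:
    """Pick a model from a crude difficulty score: prompt length plus marker hits."""
    score = len(prompt) / 500 + sum(m in prompt.lower() for m in HARD_MARKERS)
    return STRONG_MODEL if score >= 1 else CHEAP_MODEL

easy_choice = route("What is 2 + 2?")                                  # cheap model
hard_choice = route("Prove this multi-step theorem about convergence.")  # strong model
```

The cost savings come from the fact that most traffic takes the cheap path, while the evaluation framework the session covers is what tells you whether the routing threshold is hurting accuracy.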

- Data Eng & Infrastructure

- Data Eng & Infrastructure

- Analytics & BI

- Lightning Talks

- ML OPs & Platforms

- Workshops

- Workshops
Key Components of the Workshop:
1. Dataset Handling: Participants will work with a spatial dataset to evaluate potential locations for opening a new BBQ restaurant. Thanks to MotherDuck, the dataset is easily brought down locally in a highly compressed format, ensuring a quick and safe environment for experimentation.
2. Live AI-Assisted Coding: The workshop will feature a live demonstration where an AI tool iteratively generates SQL queries. Rather than pre-defining metrics, the AI assists in exploring and defining the spatial parameters necessary to identify the optimal restaurant location—a process that mirrors real-world, dynamic decision-making.
3. Real-Time Data Visualization: As queries are refined and executed, Python will be used to chart the results on the fly. Utilizing uv for environment management alongside visualization libraries such as Seaborn and Matplotlib, participants will see how spatial insights are translated into clear, actionable charts.
4. Iterative, Low-Risk Workflow: The session emphasizes a low-risk, experimental approach. If the AI-generated code isn't perfect, no harm is done—files can be quickly deleted or corrected, encouraging a creative, hands-on learning environment where trial and error lead to deeper understanding.
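The kind of spatial scoring the workshop explores can be sketched in plain Python. This is a toy stand-in (the workshop itself drives the exploration with AI-generated SQL against a MotherDuck/DuckDB dataset); the coordinates and the "farthest from competitors" rule below are invented.

```python
from math import radians, sin, cos, asin, sqrt

# Toy site-selection scoring: prefer the candidate location farthest
# from its nearest competitor. All points are hypothetical.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

competitors = [(30.27, -97.74), (30.25, -97.75)]               # hypothetical rivals
candidates = {"site_a": (30.40, -97.70), "site_b": (30.26, -97.74)}

def score(site):
    """Distance to the nearest competitor; higher is better here."""
    lat, lon = candidates[site]
    return min(haversine_km(lat, lon, c_lat, c_lon) for c_lat, c_lon in competitors)

best = max(candidates, key=score)  # the candidate farthest from any competitor
```

In the workshop, the same question is iterated on conversationally: the AI proposes a query, the chart reveals a gap in the logic, and the query is refined.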

- Workshops

- Workshops

- Lightning Talks

- Lightning Talks
The first two years of the GenAI revolution are bending the OSS way: open-source models have reached state of the art, and most of the ecosystem around AI is open-source. The key to AI adoption is properly organizing and using business knowledge. In industry, LLMs give way to Small Specialized Models (SSMs), utilized by Domain Expert Agents (DXAs). Their work should be structured according to the domain requirements, requiring structured output. Organizing and using domain knowledge for AI has long been the domain of Knowledge Graphs (KG). At Neo4j, we are at a moment where our KG leadership powers the rise of GraphRAG, a better approach to context traversal that we lead alongside Microsoft, Amazon, Google, and other GenAI partners. We also integrate with many OSS AI startups to build a better AI stack around GraphRAG. Neo4j has joined LFAI to bridge enterprise AI adoption with startup innovation, centered around structured knowledge. In this talk we describe OAKS, a set of projects, communities, and technologies that comprise the Open Agentic AI Knowledge Stack. We show where the most value will be created and how the OSS AI ecosystems come together to build and deliver it.
OAKS consists of structured input, knowledge transformation, and structured output. We show the Agentic AI architectures emerging around AI memory, graph-based agentic workflows, and frameworks including scalable message passing, knowledge encapsulation, and colocated knowledge and computation for web-scale routing. We invite the community to join us!
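The core GraphRAG retrieval pattern can be illustrated with a tiny in-memory graph: instead of matching isolated text chunks, start from a seed entity and gather its k-hop neighborhood as structured context. The toy graph below is invented; a real deployment would traverse a property graph (e.g., in Neo4j) rather than a Python dict.

```python
from collections import deque

# Illustrative GraphRAG-style retrieval: expand the k-hop neighborhood
# of a seed entity to use as context. Entities are hypothetical.

GRAPH = {
    "Acme Corp": ["Widget X", "Jane Doe"],
    "Widget X": ["Recall 2024"],
    "Jane Doe": ["Acme Corp"],
    "Recall 2024": [],
}

def neighborhood(seed: str, hops: int) -> set:
    """Breadth-first expansion up to `hops` edges from the seed entity."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

context = neighborhood("Acme Corp", hops=2)
```

The payoff is that facts two hops away ("Recall 2024" via "Widget X") land in the context even when they share no keywords with the query, which is the structured-knowledge advantage the talk argues for.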

- Data Eng & Infrastructure
Building High-Throughput Data Orchestration: Instacart's Journey to 20M Daily Workflows | Explore how Instacart built an enterprise-grade orchestration system handling 20 million daily workflows across diverse technical domains. Learn implementation details of their cloud-native platform combining Apache Airflow and Temporal for robust scheduling and execution. Deep dive into YAML-based workflow definitions, GitOps deployment patterns, and observability solutions that enable reliable scaling. Practical insights from years of production experience, applicable to both startups and enterprises building scalable data infrastructure.
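To ground the idea of declarative, YAML-based workflow definitions: a definition maps each task to its dependencies, and the orchestrator derives an execution order from that graph. The sketch below uses Python's standard-library `graphlib`; the task names are invented, and Instacart's actual platform layers Airflow and Temporal on top of definitions like this.

```python
from graphlib import TopologicalSorter

# What a parsed YAML workflow definition might look like:
# each task maps to the tasks it depends on (hypothetical names).
workflow = {
    "extract": [],
    "transform": ["extract"],
    "quality_check": ["transform"],
    "load": ["transform", "quality_check"],
}

# The orchestrator's core job: turn the dependency graph into a valid
# execution order (and, in production, detect cycles before deploy).
order = list(TopologicalSorter(workflow).static_order())
```

Keeping the definition declarative is what enables the GitOps pattern the talk describes: the YAML is reviewed and versioned like code, and deployment is just syncing the repo.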

- GenAI Applications

- Lightning Talks

- AI & Data Culture

- Databases

- Lightning Talks

- Workshops
100+ Speakers
Learn from data & AI heroes at top companies as they explain their architectures, discoveries and solutions in detail.

- Analytics & BI
Lloyd Tabb is a tech pioneer who has shaped how the internet and data are used for over 30 years. After serving as Borland's database architect, he was Principal Engineer at Netscape during the browser wars and helped found Mozilla.org. He later founded Looker, acquired by Google in 2019, which helped define the Modern Data Stack. Now at Meta, he leads Malloy, an experimental language that reimagines SQL with coding libraries, recently transferred to the Linux Foundation. The project has expanded to support Presto, Trino, Snowflake, and other SQL dialects, while adding features like parameterized sources and visual query builders.

- Data Eng & Infrastructure

- Foundation Models

- Keynotes

- Foundation Models

- Keynotes

- Data Sci & Algos

- Keynotes
Join us for an authentic fireside chat that cuts through industry hype to explore how real-time data infrastructure is transforming analytics and AI. We'll examine technical hurdles in processing data at scale in real-time and the architectural decisions enabling performance breakthroughs. The discussion covers data processing evolution, performance bottlenecks in distributed systems, observability innovations, and approaches for maintaining consistency while increasing throughput. Gain insights into how real-time processing creates competitive advantages for business intelligence and AI applications, with honest assessments of implementation challenges and practical solutions.

- Workshops

- Data Eng & Infrastructure
Ryan Blue is the original creator of Apache Iceberg and currently serves as Vice President of the project. With over a decade in data engineering, he's an established expert in big data formats and infrastructure. Currently a Member of Technical Staff at Databricks since June 2024, Ryan previously co-founded and served as CEO of Tabular until its acquisition by Databricks. His career includes senior positions at Netflix and Cloudera, where he was a technical lead for data formats. An Apache Software Foundation member since 2017, he's a committer in the Apache Parquet, Avro, and Spark communities and previously served as VP of Apache Avro. Ryan holds dual Bachelor's degrees in Mathematics and Computer Science from the University of Idaho and a Master's in Computer Science from the University of Maryland.

- Analytics & BI

- Foundation Models

- Keynotes

- Databases

- Foundation Models
Ethan Rosenthal is a Member of Technical Staff at Runway, an applied AI research company focused on multimedia content creation, where he builds engineering systems to accelerate the work of research scientists. His career spans diverse roles across AI, machine learning, and data science - from training language models at Square to developing recommendation systems at seed-stage ecommerce startups. Before working in tech, Ethan was an actual scientist and got his PhD in experimental physics from Columbia University.

- AI Engineering

- AI Engineering

- AI Engineering

- AI Engineering

- Keynotes

- Analytics & BI
Mike has spent over two decades as a technologist, entrepreneur, and investor. He’s currently co-founder and CEO of Rill, a cloud service for operational intelligence. Previously he founded Metamarkets (acquired by Snap, Inc. in 2017), a real-time analytics platform for digital ad firms, and CustomInk.com, a leader in custom apparel online. Mike was also a founding partner at the venture capital firm DCVC, which has invested over $2B in deep tech. He began his career as a software engineer for the Human Genome Project and later received a Ph.D. in computational biology.

- Data Sci & Algos

- Data Eng & Infrastructure

- Keynotes

- Keynotes
- Keynotes

- Keynotes
In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, breaking silos to form the foundation of this ecosystem. As compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools specialized to specific use cases that avoid vendor lock-in. He'll go over the core components, how they work together and, more importantly, the contracts that keep them decoupled and composable.
Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF, and is a member of the LFAI&Data Technical Advisory Council. He co-created the Parquet, Arrow, and OpenLineage open source projects and is involved in several others. His data platform leadership began at Yahoo!, where he received his Hadoop initiation, and continued at Twitter, Dremio, and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve data observability. His French accent makes his talks particularly attractive.

- GenAI Applications
- Keynotes

- Keynotes
Krishnaram Kenthapadi is the Chief Scientist of Clinical AI at Oracle Health, where he leads AI initiatives for the Clinical AI Agent and other Oracle Health products, focusing on modernizing clinical applications, reducing administrative burden for clinicians, and driving healthcare transformation through trustworthy AI. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, and privacy initiatives in the Amazon AI platform. Before that, he led similar efforts across different LinkedIn applications as part of the LinkedIn AI team, and served as LinkedIn's representative on Microsoft's AI and Ethics in Engineering and Research (AETHER) Advisory Board. He shaped the technical roadmap and led the privacy and modeling efforts for the LinkedIn Salary product, and prior to that served as the relevance lead for the LinkedIn Careers and Talent Solutions Relevance team, which powers search and recommendation products at the intersection of members, recruiters, and career opportunities. Earlier, he was a Researcher at Microsoft Research Silicon Valley, where his work resulted in product impact (and Gold Star / Technology Transfer awards) and several publications and patents. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006, and his Bachelor's in Computer Science from IIT Madras.
He serves regularly on the program committees of KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. He received Microsoft's AI/ML conference (MLADS) distinguished contribution award, the NAACL best thematic paper award, the CIKM best case studies paper award, the SODA best student paper award, and a WWW best paper award nomination. He has published 40+ papers with 2500+ citations and filed 140+ patents (30+ granted). He has presented lectures and tutorials on privacy, fairness, and explainable AI at industry forums such as KDD '18 and '19, WSDM '19, WWW '19, FAccT '20, and AAAI '20, and instructed a course on AI at Stanford.

- Keynotes
- Keynotes

- Keynotes
As enterprises increasingly leverage vast public and private datasets, generative AI and agentic systems are transforming the landscape of AI-driven solutions. These systems demand unparalleled scalability, speed, and efficiency to process massive data volumes while autonomously orchestrating complex workflows. SambaNova Systems offers its revolutionary memory-centric design, engineered to power trillion-parameter models and multi-agent systems with record-breaking interactive inference performance.
This talk will delve into SambaNova’s innovative three-tier memory system and reconfigurable dataflow architecture, which overcome the "memory wall" challenge by enabling seamless switching between hundreds of agents in microseconds. Attendees will explore how these technologies optimize data access, minimize latency, and scale across diverse real-world applications—from real-time decision-making to autonomous multi-agent collaboration—delivering transformative solutions for enterprises worldwide.

- Foundation Models
Han-chung Lee is a machine learning expert who builds and operates AI systems with a focus on GenAI, LLM agents, and recommendation engines. Currently Director of Machine Learning at Moody's Analytics and founder of Calabazas Creek, Han-chung excels at untangling complex code and organizations. A Berkeley EECS grad with an MBA from SJSU, he shares insights on ML engineering and tech investing drawn from his diverse experience across sell-side, buy-side, and venture capital. Follow his practical wisdom and occasional industry reflections.

- Analytics & BI
Julian Hyde is the original developer of Apache Calcite, which provides SQL parsers and query optimizers for dozens of products, and Morel, a new functional query language. Previously he led the query processing team at Looker (acquired by Google in 2020), and co-founded SQLstream, an engine for continuous queries. He left Google in early 2025 to create the next language for data.

- Lightning Talks

- Workshops

- Databases

- Foundation Models

- GenAI Applications

- AI & Data Culture

- Databases

- Workshops

- Databases

- AI Engineering

- Workshops
Modern data lakes promise affordability and scalability, but using them can be a headache. Cloud data warehouses make querying easy, but they come with a hefty price tag and extra complexity. What if you could get the same ease of use without the cost and lock-in?
In this session, we’ll show you how to leverage open-source software to build a fully functional, queryable analytics powerhouse using DuckDB, Fivetran, and Polaris Catalog. We’ll walk through how to:
1. Load data that is automatically converted to Iceberg open table format
2. Run SQL queries using DuckDB’s new Iceberg extension
3. Run transformations directly on data stored in your data lake with a new dbt adapter
4. Get started easily with a practical, hands-on demo
No vendor lock-in, no unnecessary complexity—just an open-source-powered approach to enabling advanced analytics and AI. If your data warehouse is holding you back or eating away at your budget, this session is for you!
George Fraser is the CEO and co-founder of Fivetran. George founded the company with COO Taylor Brown in 2012 after completing the Y Combinator accelerator program. Since then he has built Fivetran, a fully managed automated data integration provider, from an idea to a rapidly growing global business valued at $5.6 billion, supported by a global team of 1000+ employees.

- Data Eng & Infrastructure

- Workshops

- GenAI Applications

- Workshops

- Databases

- AI Engineering

- Data Sci & Algos

- Workshops
Ori Soen is a serial entrepreneur and current Founder/CEO of Montara Inc., creating the first unified DataOps platform for data development. With a track record of building, scaling, and successfully exiting multiple startups, Ori previously served as EVP at Medallia (NYSE: MDLA), where he helped take the company public. At Medallia, he led the Digital Products Business Unit, mid-market sales, served as CMO, and joined through the acquisition of Kampyle, where as CEO he built the company into a market leader. Earlier, he was CMO and Head of Product at Jajah (acquired by Telefonica).

- Databases

- Workshops
Modern data workloads demand fast, interactive, and scalable visualization—without the cost and complexity of server-side rendering. The local-first approach leverages modern browser capabilities, WebAssembly, and in-browser computation to achieve high-performance analytics while reducing cloud costs.
In this workshop, we’ll explore:
1. Why Local-First? The benefits of running everything client-side for cost-efficient, scalable visualization across thousands of users.
2. WebAssembly (WASM) for Data Apps: How Perspective harnesses WASM to power ultra-fast, browser-native analytics and even replace traditional Docker-based containers for data workloads.
3. Perspective + DuckDB: A full in-browser analytics stack that enables high-speed querying and visualization without a backend.
4. Streaming Data with InfluxDB: How to visualize high-frequency, real-time IoT and log data with sub-second latency.
5. Databricks + Perspective: Enhancing large-scale analytics with interactive dashboards inside Jupyter notebooks.
Through live coding and guided exercises, attendees will build their own browser-native analytics dashboards, connect to real-time data streams, and learn Perspective’s API in Python, Node.js, and Rust.
Difficulty level: Intermediate. Some experience with Python, JavaScript, and data analytics will be helpful, but beginners can follow along with guided exercises.
To participate, bring a laptop with:
Git
VS Code
Docker
Python (3.8+)
Node.js (16+)

- Data Sci & Algos

- AI & Data Culture

- AI & Data Culture

- Lightning Talks
Not all AI agent use cases are created equal. While code generation agents can be tested against clear benchmarks, operational agents tackling real-world problems face a fundamentally different challenge: how do you evaluate an agent that must navigate complex, dynamic systems without a predefined playbook? Take root cause analysis in distributed systems: an agent must understand intricate service dependencies, parse through inconsistent logs, and reason about potential failure modes. Unlike coding tasks with definitive right answers, these scenarios have no ground truth. Traditional testing approaches break down completely. This talk breaks down our approach to building a deterministic simulation environment that generates and tests realistic failure scenarios at scale. We'll expose why existing evaluation methods fail—from infrastructure mimicry to LLM-generated tests—and demonstrate a lightweight simulation technique that enables precise, reproducible agent testing.
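The evaluation loop the talk describes can be sketched in a few lines: a seeded generator produces reproducible failure scenarios over a toy service-dependency graph, and agents are scored only on observable symptoms, never the ground truth. The topology, failure modes, and baseline heuristic below are illustrative assumptions, not the speakers' actual system.

```python
import random

# Hypothetical service topology and failure modes — illustrative only.
SERVICES = {
    "api": ["auth", "orders"],  # api depends on auth and orders
    "orders": ["db", "cache"],
    "auth": ["db"],
    "db": [],
    "cache": [],
}
FAILURE_MODES = ["latency_spike", "connection_refused", "disk_full"]

def generate_scenario(seed):
    """Deterministically pick a root cause, then propagate degradation
    upstream to every service that transitively depends on it."""
    rng = random.Random(seed)  # fixed seed => reproducible scenario
    root = rng.choice(sorted(SERVICES))
    mode = rng.choice(FAILURE_MODES)
    affected, changed = {root}, True
    while changed:
        changed = False
        for svc, deps in SERVICES.items():
            if svc not in affected and any(d in affected for d in deps):
                affected.add(svc)
                changed = True
    return {"root_cause": root, "mode": mode, "degraded": sorted(affected)}

def evaluate_agent(agent, seeds):
    """Accuracy of an agent that sees only symptoms, never the ground truth."""
    hits = 0
    for s in seeds:
        scenario = generate_scenario(s)
        observed = {"mode": scenario["mode"], "degraded": scenario["degraded"]}
        hits += agent(observed) == scenario["root_cause"]
    return hits / len(seeds)

def baseline_agent(observed):
    """Heuristic: the degraded service none of whose dependencies are degraded."""
    deg = set(observed["degraded"])
    return next(s for s in deg if not any(d in deg for d in SERVICES[s]))
```

Because scenarios are pure functions of the seed, any regression in an agent can be replayed exactly — the reproducibility property the talk argues existing evaluation methods lack.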

- GenAI Applications
Mitul Tiwari is CTO and Co-founder of a stealth AI startup. Until recently he was a Director of AI and Machine Learning Engineering at ServiceNow, leading the natural language technologies group. Earlier he was CTO and Co-founder of Passage AI (acquired by ServiceNow). His expertise lies in building data-driven products using AI, machine learning, and big data technologies. Previously he was head of People You May Know and Growth Relevance at LinkedIn, where he led technical innovations in large-scale social recommender systems. Prior to that, he worked at Kosmix (now Walmart Labs) on web-scale text categorization and its applications. He earned his PhD in Computer Science from the University of Texas at Austin and his undergraduate degree from the Indian Institute of Technology, Bombay. He has co-authored more than twenty publications in top conferences such as ACL, AAAI, KDD, WWW, RecSys, VLDB, SIGIR, CIKM, and SPAA.

- Data Sci & Algos
Ciro Greco is the Founder and CEO of bauplan, a zero-copy, data-first FaaS platform launched in 2023. He holds a Ph.D. in Experimental Psychology, Linguistics and Neuroscience from the University of Milano-Bicocca. Previously, he co-founded Tooso, an AI-powered commerce search company acquired by Coveo in 2019, where he later served as VP of Artificial Intelligence. His expertise spans AI, linguistics, and data engineering, with a focus on making cloud data pipelines more efficient.

- MLOps & Platforms
Marcel Kornacker is currently CTO and Co-Founder at Pixeltable and is notably known as the founder of Apache Impala and co-founder of Apache Parquet. He holds a Ph.D. in Computer Science from UC Berkeley, where he studied databases under Joe Hellerstein. His career includes founding several startups, including Blink Computing, which provided data lake analytics as a service. He has also served as an Entrepreneur in Residence at both Sutter Hill Ventures and Coatue Management. Kornacker's expertise spans databases, data analytics, and open-source technologies.

- Lightning Talks
Michael is a globally recognized leader in AI and data-driven business transformation, known for developing AIOS, an intelligent marketing system at Plus that optimizes customer journey analysis and marketing ROI. A former NYU Stern marketing professor, his research spans predictive computation, marketing strategies, and consumer behavior. He has launched successful AI products focused on user safety, privacy, and marketing optimization. His work on explainable prediction and data synthesis, particularly with incomplete data, has been featured in major publications like LA Times and AdAge, and he frequently speaks at prestigious events including The Nantucket Project and ANA Masters of Marketing.

- AI & Data Culture
The modern data landscape is dominated by complexity: tables, schemas, pipelines, warehouses, and more. Yet the most successful data platforms share a common principle—they make data itself invisible to the end user. When data infrastructure functions optimally, it's like good plumbing: you only notice it when something breaks. Organizations often fixate on the mechanics of data while losing sight of what truly matters: metrics, dimensions, and semantics. When users engage with meaningful abstractions rather than technical details, they make better decisions faster. In this talk, you'll learn strategies for making data invisible through real-world abstraction success stories, designing effortless interactions, and implementing governance through abstraction. Walk away with practical ways to assess your data stack, advocate for user-centric approaches, and measure progress—making your data platform not just powerful, but invisible in all the right ways.
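As a rough illustration of the abstraction the talk advocates, here is a minimal semantic-layer sketch: users ask for metrics by name, and the tables, filters, and SQL underneath stay invisible. Metric names, tables, and SQL below are hypothetical, not from the talk.

```python
# Hypothetical metric registry — names and SQL fragments are illustrative.
METRICS = {
    "weekly_active_users": {
        "expr": "COUNT(DISTINCT user_id)",
        "table": "events",
        "filter": "event_time >= DATE('now', '-7 days')",
    },
    "revenue": {"expr": "SUM(amount)", "table": "orders", "filter": "status = 'paid'"},
}

def query_metric(name, dimensions=None):
    """Resolve a business metric (plus optional dimensions) into SQL.
    The caller works with semantics; the plumbing stays invisible."""
    if name not in METRICS:
        raise KeyError(f"unknown metric {name!r}; known: {sorted(METRICS)}")
    m = METRICS[name]
    dims = list(dimensions or [])
    select = ", ".join(dims + [f"{m['expr']} AS {name}"])
    sql = f"SELECT {select} FROM {m['table']} WHERE {m['filter']}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql
```

Governance through abstraction falls out naturally: because every query passes through the registry, definitions, filters, and access rules live in one place rather than in every dashboard.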

- AI & Data Culture
At the foundation of AI project failures lies a critical gap between data teams and business reality. On top of this gap, data quality issues, unexpected privacy concerns, and tools that don't align with actual business problems arise to hinder or block implementation. As we've built our own AI product—AI Decisioning—and implemented it with customers, we've learned that successful AI implementations depend on embedding data teams within business units. Embedding doesn't mean breaking apart your data team and dispersing it throughout every other department. It means establishing focused partnerships where data team members are deeply integrated into business teams' daily workflows and decision-making processes while remaining connected to the central data organization. This embedding creates a virtuous cycle: data teams gain deep domain knowledge, business professionals see improved data quality and gain data facility, and together, data and business teams implement AI solutions that solve real problems. In this talk, we'll share concrete examples of how data teams (especially data scientists) and marketing have worked together in successful AI Decisioning implementations. We’ll derive strategies to implement this organizational pattern and enable a company to move from analytics to actions and from data teams as service providers to active collaborators. While our case studies focus primarily on marketing partnerships, the embedded partnership model we present applies equally to other business functions including product development, operations, and customer service teams.

- Lightning Talks
Building Cost-Effective LLM Routers: Boost Accuracy 25% While Cutting Costs 90% | This session reveals how to build intelligent model routers that dynamically direct inputs to the optimal large language model (LLM) for each specific task. Attendees will learn practical implementation strategies for multi-model LLM systems that significantly improve performance metrics—achieving up to 25% higher accuracy while reducing operational costs by as much as 90%. The presentation covers essential routing methodologies, evaluation frameworks, and scalable architectures for production deployments. Developers and ML engineers will gain actionable insights for overcoming technical challenges in multi-model LLM systems, optimizing both performance and cost-efficiency in generative AI applications. Perfect for teams looking to maximize ROI from their AI infrastructure while maintaining high-quality outputs.
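A minimal sketch of the routing idea, under assumed model names, prices, and a toy keyword classifier — a production router would use a trained classifier and measured quality scores rather than string matching:

```python
# Illustrative router: model names, prices, and capabilities are assumptions
# for this sketch, not the speaker's actual system.
MODELS = {
    "small":  {"cost_per_1k": 0.0002, "handles": {"chitchat", "extraction"}},
    "medium": {"cost_per_1k": 0.003,  "handles": {"chitchat", "extraction", "summarization"}},
    "large":  {"cost_per_1k": 0.03,   "handles": {"chitchat", "extraction", "summarization", "reasoning"}},
}

def classify(prompt):
    """Toy intent classifier; real routers train a model for this step."""
    p = prompt.lower()
    if any(w in p for w in ("prove", "why", "step by step")):
        return "reasoning"
    if "summarize" in p:
        return "summarization"
    if "extract" in p:
        return "extraction"
    return "chitchat"

def route(prompt):
    """Send the prompt to the cheapest model able to handle its task type."""
    task = classify(prompt)
    capable = [m for m, spec in MODELS.items() if task in spec["handles"]]
    return min(capable, key=lambda m: MODELS[m]["cost_per_1k"])
```

The cost savings come from the `min` over capable models: easy traffic never touches the expensive model, while hard prompts are escalated, which is also where the accuracy gains come from.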

- Workshops
1. Key Components of the Workshop: Dataset Handling: Participants will work with a spatial dataset to evaluate potential locations for opening a new BBQ restaurant. Thanks to MotherDuck, the dataset is easily brought down locally in a highly compressed format, ensuring a quick and safe environment for experimentation.
2. Live AI-Assisted Coding: The workshop will feature a live demonstration where an AI tool iteratively generates SQL queries. Rather than pre-defining metrics, the AI assists in exploring and defining the spatial parameters necessary to identify the optimal restaurant location—a process that mirrors real-world, dynamic decision-making.
3. Real-Time Data Visualization: As queries are refined and executed, Python will be used to chart the results on the fly. Utilizing uv for environment management alongside visualization libraries such as Seaborn and Matplotlib, participants will see how spatial insights are translated into clear, actionable charts.
4. Iterative, Low-Risk Workflow: The session emphasizes a low-risk, experimental approach. If the AI-generated code isn't perfect, no harm is done—files can be quickly deleted or corrected, encouraging a creative, hands-on learning environment where trial and error lead to deeper understanding.
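The spatial scoring at the heart of the exercise can be sketched with the standard library alone: a haversine distance plus a simple "farthest from the nearest competitor" rule. The coordinates and site names below are invented for illustration; the workshop itself works against the MotherDuck/DuckDB stack.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical candidate sites and existing BBQ spots around Oakland.
candidates = {"Uptown": (37.808, -122.268), "Jack London": (37.795, -122.277)}
competitors = [(37.806, -122.270), (37.800, -122.274)]

def nearest_competitor_km(site):
    """Distance from a candidate site to its closest competitor."""
    lat, lon = site
    return min(haversine_km(lat, lon, clat, clon) for clat, clon in competitors)

# Pick the candidate farthest from its nearest competitor.
best = max(candidates, key=lambda name: nearest_competitor_km(candidates[name]))
```

In the workshop the same question is posed as spatial SQL, with the AI assistant iterating on the query; this sketch just shows the metric being optimized.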

- Lightning Talks
The first two years of the GenAI revolution are bending the OSS way: open-source models have reached the state of the art, and most of the ecosystem around AI is open source. The key to AI adoption is properly organizing and using business knowledge. In industry, LLMs are giving way to Small Specialized Models (SSMs) utilized by Domain Expert Agents (DXAs), whose work must be structured according to domain requirements, which in turn requires structured output. Organizing and using domain knowledge for AI has long been the domain of Knowledge Graphs (KGs). At Neo4j, our KG leadership powers the rise of GraphRAG, a better approach to context traversal that we lead alongside Microsoft, Amazon, Google, and other GenAI partners. We also integrate with many OSS AI startups to build a better AI stack around GraphRAG. Neo4j has joined LFAI to bridge enterprise AI adoption with startup innovation, centered around structured knowledge. In this talk we describe OAKS, the Open Agentic AI Knowledge Stack: a set of projects, communities, and technologies. We show where the most value will be created and how the OSS AI ecosystems come together to build and deliver it.
OAKS consists of structured input, knowledge transformation, and structured output. We show the Agentic AI architectures emerging around AI memory, graph-based agentic workflows, and frameworks including scalable message passing, knowledge encapsulation, and colocated knowledge and computation for web-scale routing. We invite the community to join us!
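To make the "context traversal" idea concrete, here is a toy GraphRAG-style retrieval: a bounded breadth-first walk that collects facts near an entity for an LLM prompt. The graph contents are invented for this sketch and the code is not Neo4j's API — a real system would run Cypher against a graph database.

```python
from collections import deque

# Toy knowledge graph: subject -> [(relation, object), ...]. Illustrative only.
GRAPH = {
    "Acme Corp": [("acquired", "DataCo"), ("headquartered_in", "Oakland")],
    "DataCo": [("builds", "StreamDB")],
    "StreamDB": [("written_in", "Rust")],
}

def graph_context(entity, max_hops=2):
    """Collect facts within max_hops of an entity — the context a
    GraphRAG system would hand to an LLM alongside the user's question."""
    facts, seen = [], {entity}
    frontier = deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand beyond the hop budget
        for rel, obj in GRAPH.get(node, []):
            facts.append(f"{node} {rel} {obj}")
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return facts
```

The hop budget is the structured-knowledge analogue of a retrieval cutoff: it bounds how much context reaches the model while keeping every retrieved fact explainable as a path in the graph.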

- Data Eng & Infrastructure
Building High-Throughput Data Orchestration: Instacart's Journey to 20M Daily Workflows | Explore how Instacart built an enterprise-grade orchestration system handling 20 million daily workflows across diverse technical domains. Learn implementation details of their cloud-native platform combining Apache Airflow and Temporal for robust scheduling and execution. Deep dive into YAML-based workflow definitions, GitOps deployment patterns, and observability solutions that enable reliable scaling. Practical insights from years of production experience, applicable to both startups and enterprises building scalable data infrastructure.
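The scheduling core of such a system can be sketched with Python's standard library: a workflow spec mapping each task to its dependencies, topologically sorted into a valid execution order. The workflow below is hypothetical; Airflow and Temporal layer retries, state tracking, and distributed execution on top of this idea.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow definition in the spirit of a YAML spec:
# each task maps to the set of tasks it depends on.
workflow = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "train_model": {"join_tables"},
    "publish_metrics": {"join_tables"},
}

def run_order(spec):
    """Return one valid execution order; a scheduler does this continuously,
    dispatching tasks whose dependencies have completed."""
    return list(TopologicalSorter(spec).static_order())

order = run_order(workflow)
```

At 20M workflows a day the hard problems are elsewhere (state storage, backpressure, observability), but every run still reduces to this ordering guarantee.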

WHY ATTEND?
Go beyond just conference talks and engage directly with our community.
- Expo Hall & Networking
- Interactive Workshops
- Speaker Office Hours
- Drinks & Demos
Expo Hall & Networking

Discover cutting-edge tools and technologies from innovators at the forefront of AI & data. Explore, connect, and get a firsthand look at what’s next.
Interactive Workshops

Why pay extra to level up your career? Gain practical training on the latest data tools from the architects and builders of the tools themselves. (All workshops included in the ticket price.)
Speaker Office Hours

Our speakers provide real technical depth, going beyond whitepaper-level details. Office Hours follow each talk, offering attendees additional opportunities for in-depth discussion.
Drinks & Demos

Rub shoulders with the brightest minds in AI & data. Come to make meaningful connections with startups, customers, peers, investors & more.
AI Launchpad
Join us on Day 1 during our 🥳 Community Party to hear from 6 exceptional AI startups.
Brought to you by Zero Prime Ventures.



3-Day Conference Passes
Startup Ticket
$799
Founder-Friendly Pricing
Our special discounted rate for
companies that have raised <$5M
Regular Ticket
$1999
Discounted Group Pricing
Buy 5 Tickets = $1,200/each
Buy 10 Tickets = $1,000/each
Investor Ticket
$4999
(They can afford it 💸)
Investor tickets help subsidize our
Startup Tickets. Thank you :)
💥 INCLUDED IN ALL TICKETS 💥
3 Full Days • Free Workshops • Speaker Office Hours • Community Party • Data Council T-shirt & Tote
• Breakfast, Lunch & Snacks • Coffee & Drinks • Locally Sourced Food • Talk Recordings • Fun Networking
👮♀️ NEED MANAGER APPROVAL?
Check out our Convince Your Boss Email Template





See You in Oakland!
This year, we're excited to call the historic Oakland Scottish Rite Center home to Data Council 2025.
Nestled on Lake Merritt with stunning lake views, this architectural gem puts you steps from downtown's best hotels, dining, and nightlife. Just 15 minutes from BART or a scenic ferry ride from downtown San Francisco.
547 Lakeside Dr, Oakland, CA, 94612
🎟️ BIG TEAM = BIG DEAL 💸
Data Council is more fun with friends. Save 40% when you purchase a 5-pack, and 50% with a 10-pack.





Oakland Scottish Rite Center
The Temple of Data
April 22 - 24, 2025
Lake Merritt, Oakland
Why Attend Data Council?
Learn from Industry Experts
Get architectural insights and best practices straight from the pioneers building the future of data & AI. No marketing fluff here.
Hands-On Experiences
Put theory into practice through interactive workshops and learning opportunities, such as our unique office hours where you can meet any speaker in a small group setting.
Unparalleled Networking
Get exclusive access and connect with engineers and founders who speak your language. No suits and sales pitches, just real pros sharing their work.
Meet the Hosts
Content quality sets Data Council apart. Unlike other conferences that simply accept abstracts as-is, our track hosts go the extra mile to hand-select presentations and collaborate with speakers on their topics, ensuring the highest-value talks take the stage.

Bryan Bischof
Head of AI
Theory Ventures

Carlos Aguilar
Founder
Hashboard

Daniel Francisco
Director of Product
Meta

Maggie Hays
Community Product Manager
Acryl Data
Roger Magoulas
Principal
Almost Data

Sai Srirampur
Principal Engineer
Clickhouse

Scott Breitenother
Founder
Brooklyn Data

Sean Anderson
Head of Product Marketing
Vectara

Sean Taylor
Data Scientist
OpenAI

Swyx (Shawn) Wang
Co-Host
Latent.Space Podcast

Tristan Zajonc
CEO & Co-Founder
Continual
About Our Tracks
Our carefully curated tracks balance proven technical foundations with emerging data & AI trends. Get real frameworks, techniques and actionable knowledge straight from seasoned practitioners.
Data Eng & Infrastructure
AI Engineering
Data Science & Algos
GenAI Applications
Analytics & BI
MLOps & Platforms
Foundation Models
Databases
AI & Data Culture
Lightning Talks
FAQ
Yes! We <3 teams at Data Council and offer streamlined packages for groups of 5 or 10 with huge savings of up to 40-50% off regular ticket prices. Best of all, you can purchase them directly with no invoicing or back-and-forth needed with a sales rep. Simply visit our ticketing site to learn more about group rates.
Yes, we offer discounts for startups (must have raised <$5M), non-profits, government agencies and academic students & faculty. For startups, please see our ticketing site and for non-profit & academic, please contact community@datacouncil.ai for more information.
We're excited to bring Data Council back to the Bay Area on Apr 22-24, 2025! The event will be held at the historic Oakland Scottish Rite Center, right off the shores of beautiful Lake Merritt in Oakland, CA.
Once you purchase your ticket, all talks, workshops and networking opportunities are available to you as part of the Data Council experience. However, external costs such as travel, lodgings and commuting are your responsibility.
Meet the Team

Pete Soderling
Founder & GP at Zero Prime Ventures

Yang Tran
Partner at Zero Prime Ventures

Tim Wu
Head of Marketing at Data Council

Missy Bass
Events Director at Data Council

Gillian Jarvis
Design Advisor at Data Council
Discover the data foundations powering today's AI breakthroughs. Join leading minds as we explore both cutting-edge AI and the infrastructure behind it.