Data lakes and data warehouses have coexisted for over a decade, but their paths appear to be converging in the cloud. But do they really? In this talk, you will learn how to build a future-proof data architecture that leverages both. We discuss how the modern data architecture has evolved over the last decade, from the front-row seat we have had as creators of Apache Hudi. We walk through the forces behind the move from on-prem data warehouses to Hadoop data lakes and their incarnations in the cloud, the rise of cloud data warehouses and, most recently, the emergence of lakehouse technologies. We then present the major lakehouse and cloud warehouse technology stacks and discuss their pros and cons, features and cost/performance tradeoffs. Finally, we present a practical approach to using an interoperable lakehouse as the bedrock of your data architecture, showing how it unlocks cost-effectiveness, the AI/ML ecosystem and massive scale, while still retaining your cloud warehouse for traditional BI/analytics workloads.
Vinoth Chandar is the original creator and VP of the Apache Hudi project, which pioneered transactional data lakes as we know them today, during his time as Uber's data architect. Vinoth has unique perspectives and deep experience with databases, distributed systems and data systems at planet scale, through his work at Oracle, LinkedIn, Uber and Confluent on systems such as Oracle Streams, Voldemort, Apache Kafka, Kafka Streams and ksqlDB.