In 2018, Julien Le Dem described how the components of databases, distributed or not, were being commoditized into individual parts that anyone could recombine into use-case-specific engines. Given their own constraints, teams could leverage those components to build a query engine that solves a specific problem much faster than by building everything from the ground up. He called this idea "the Deconstructed Database" and spoke about it at a previous edition of Data Council. Fast forward to today: the big data ecosystem has matured, evolving from a melting pot of competing projects into a more composable ecosystem organized around a few open source standards. It's been incredible to see the vision he outlined in his talk crystallize with the adoption of key components like Parquet, Arrow, Iceberg, Calcite, Substrait and OpenLineage. These tools, and others like them, provide an interoperability layer that lets data be harnessed for many purposes without creating silos.
In this talk, Julien will discuss the impact of the cloud and the advent of the open data lake, which break down silos and form the foundation of this ecosystem. Because compute and storage can be efficiently decoupled, a common storage layer enables a vibrant ecosystem of on-demand tools, each specialized for a specific use case, while avoiding vendor lock-in. He'll go over the core components, how they work together and, more importantly, the contracts that keep them decoupled and composable.
Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF and is a member of the LF AI & Data Technical Advisory Council. He co-created the Parquet, Arrow and OpenLineage open source projects and is involved in several others. He began leading data platform teams at Yahoo!, where he got his introduction to Hadoop, and went on to do the same at Twitter, Dremio and WeWork. He then co-founded Datakin (later acquired by Astronomer) to tackle data observability. His French accent makes his talks particularly attractive.