This talk discusses the Apache Arrow project and its uses for high-performance analytics and system interoperability. Data processing systems have historically been full-stack systems featuring memory management, IO, file format adapters, a runtime memory format, an in-memory query engine, and front-end user interfaces. Many of these components are fully "bespoke" or "custom", in part due to a lack of open standards for many of the pieces.
Apache Arrow was created by a diverse group of open source data system developers to define open standards and community-maintained libraries for high-performance in-memory data processing. Since the beginning of 2016, we have been building a cross-language development platform for data processing to help create systems that are faster, more scalable, and more interoperable.
I will discuss current development initiatives and the project's future roadmap as they relate to the data science and data engineering worlds.
Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow. He is a Member of the Apache Software Foundation and a PMC member for Apache Parquet. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data.