Technical Talks


Data Lineage with Apache Airflow using OpenLineage

Julien Le Dem | Principal Engineer | Datadog
Willy Lulciuc | Founding Engineer | Datakin

As workflows grow in complexity, companies have come to depend on Airflow to manage inter-DAG dependencies. Airflow has quickly become an important component of the Modern Data Stack, powering analytical reports, business metrics, and dashboards.

But what effects (if any) would upstream DAGs have on downstream DAGs if dataset consumption were delayed? What alerting rules should be in place to notify downstream DAGs of possible upstream processing issues or failures? How can we use data lineage to achieve the data observability we need to answer these questions?

This talk introduces OpenLineage, an open standard for collecting lineage metadata from jobs as they execute, and shows how it works with Airflow. The presentation walks through a practical example using Marquez, the reference implementation of OpenLineage, and explains how OpenLineage can help data teams maintain inter-DAG dependencies within their Airflow instance, capture metadata on historical DAG runs, and minimize data quality issues.
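To make the idea concrete, here is a minimal sketch, not taken from the talk, of what OpenLineage-style run events look like and how lineage collected from them can answer the "which downstream DAGs are affected?" question. It builds events as plain dicts with the core RunEvent fields (`eventType`, `eventTime`, `run.runId`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) and then walks dataset-level edges to find every downstream job. Assumptions: the producer URI, the job namespace `my-airflow`, the dataset namespace `warehouse`, the DAG/task names, and the spec version in `SCHEMA_URL` are all illustrative placeholders; the helper functions `run_event` and `downstream_jobs` are hypothetical, not part of the OpenLineage client. In a real deployment the Airflow integration emits these events and Marquez stores them; this sketch skips transport entirely.

```python
import json
import uuid
from collections import defaultdict, deque
from datetime import datetime, timezone

# Illustrative placeholders (assumptions): use your own producer URI and the
# spec version that matches your OpenLineage client.
PRODUCER = "https://example.com/my-airflow-integration"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def run_event(event_type, job_name, inputs, outputs, run_id, namespace="my-airflow"):
    """Hypothetical helper: build a minimal OpenLineage-style RunEvent dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": "warehouse", "name": n} for n in inputs],
        "outputs": [{"namespace": "warehouse", "name": n} for n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def downstream_jobs(events, start_job):
    """Hypothetical helper: jobs that transitively read data written by start_job."""
    writes = defaultdict(set)   # job name -> datasets it writes
    readers = defaultdict(set)  # dataset -> job names that read it
    for e in events:
        job = e["job"]["name"]
        for d in e["outputs"]:
            writes[job].add((d["namespace"], d["name"]))
        for d in e["inputs"]:
            readers[(d["namespace"], d["name"])].add(job)
    # Breadth-first search over job -> dataset -> job edges.
    seen, queue = set(), deque([start_job])
    while queue:
        job = queue.popleft()
        for dataset in writes[job]:
            for reader in readers[dataset]:
                if reader != start_job and reader not in seen:
                    seen.add(reader)
                    queue.append(reader)
    return seen


# Three hypothetical DAG tasks chained through datasets they write and read.
events = [
    run_event("COMPLETE", "ingest_dag.load_orders", [], ["raw.orders"], str(uuid.uuid4())),
    run_event("COMPLETE", "transform_dag.clean_orders", ["raw.orders"], ["clean.orders"], str(uuid.uuid4())),
    run_event("COMPLETE", "reporting_dag.daily_revenue", ["clean.orders"], ["reports.revenue"], str(uuid.uuid4())),
]

print(sorted(downstream_jobs(events, "ingest_dag.load_orders")))
# ['reporting_dag.daily_revenue', 'transform_dag.clean_orders']
print(json.dumps(events[1], indent=2))
```

If `ingest_dag.load_orders` is delayed or fails, the traversal reports both downstream tasks, even though they live in different DAGs; this is the kind of cross-DAG dependency that Airflow's own scheduler does not see but collected lineage does.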

Julien Le Dem
Principal Engineer | Datadog

Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF, and is a member of the LF AI & Data Technical Advisory Council. He co-created the Parquet, Arrow, and OpenLineage open source projects and is involved in several others. His career in data platforms began at Yahoo!, where he received his Hadoop initiation, and continued at Twitter, Dremio, and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve data observability. His French accent makes his talks particularly attractive.

Willy Lulciuc
Founding Engineer | Datakin

Willy Lulciuc is the Founding Engineer of Datakin. He makes datasets discoverable and meaningful with metadata. He co-created Marquez and is now involved in the OpenLineage initiative. Previously, he worked on the Project Marquez team at WeWork. When he’s not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.
