OpenLineage is an API for collecting data lineage and metadata at runtime. Although it was initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin’s CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage while making sure that efforts in that direction aren’t fragmented or duplicated.
Only time will tell if it achieves this goal, but its circle of supporters already includes contributors from other major open-source data projects: Airflow, Amundsen, DataHub, dbt, Egeria, Great Expectations, Iceberg, Pandas, Parquet, Prefect, Spark, and Superset. As for Marquez, which is now an LF AI & Data project, it is the reference implementation of the OpenLineage API.