Company
Date Published
Author
Howard Yoo
Word count
2118
Language
English
Hacker News points
None

Summary

Airflow can extract data lineage events from pipelines using OpenLineage, an open-source standard for collecting and analyzing lineage metadata. There are three ways to do this: (1) use pre-built operators that already emit OpenLineage events, such as BigQueryOperator and PostgresOperator; (2) develop custom Airflow operators paired with custom OpenLineage extractors; or (3) use inlets and outlets to set data lineage manually on operators. Each method has its own advantages and requires different setup and configuration. Astro, a fully managed cloud orchestration platform powered by Apache Airflow, supports OpenLineage out of the box, making it easy to extract lineage metadata and visualize pipeline workflows.
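
To make "lineage event" concrete, here is a minimal sketch of the kind of run event that any of the three approaches ultimately produces. It uses only the Python standard library and follows the general shape of an OpenLineage run event (event type, run, job, input and output datasets). The helper names `dataset` and `build_run_event`, the `example_airflow` namespace, the dataset names, and the producer URI are all hypothetical, chosen for illustration. A real event also carries a schema URL and facets, which this sketch leaves out.

```python
import json
import uuid
from datetime import datetime, timezone


def dataset(namespace: str, name: str) -> dict:
    """Describe one dataset by the namespace/name pair used to identify it."""
    return {"namespace": namespace, "name": name, "facets": {}}


def build_run_event(event_type, job_name, inputs, outputs, run_id=None) -> dict:
    """Assemble a dict shaped like an OpenLineage run event (illustrative only)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        # Hypothetical namespace; a real deployment chooses its own.
        "job": {"namespace": "example_airflow", "name": job_name},
        "inputs": inputs,    # datasets the task reads
        "outputs": outputs,  # datasets the task writes
        # Placeholder producer URI, not a real integration.
        "producer": "https://example.com/hypothetical-producer",
    }


event = build_run_event(
    "COMPLETE",
    "etl.load_orders",
    inputs=[dataset("postgres://db", "public.orders")],
    outputs=[dataset("bigquery", "analytics.orders")],
)
print(json.dumps(event, indent=2))
```

The input and output lists here play the same role as an operator's inlets and outlets: they declare which datasets a task reads and writes, which is what lets a lineage backend connect tasks into a graph.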