Airflow in Action: How American Express Orchestrates Metadata Across 3,000 Databases
Blog post from Astronomer
At the Airflow Summit, Kunal Jain of American Express presented a session on managing metadata at scale with Apache Airflow, focusing on a pipeline that collects metadata from thousands of databases and other data sources. He stressed the role of metadata in data management and governance and identified four types: technical, declared, operational, and monitoring metadata. The core challenge for American Express was coordinating metadata across a large and varied landscape of sources, including relational databases (RDBMS), NoSQL stores, and data warehouses, and Airflow serves as the backbone of its solution. The company built custom Airflow operators to streamline metadata collection and keep metadata across the enterprise updated continuously and reliably. Looking ahead, the team plans to use Airflow 3's event-driven scheduling and remote execution capabilities to improve its metadata management processes further.