Airflow in Action: How DoorDash Scaled for Data and ML Engineering
Blog post from Astronomer
DoorDash's engineering team scaled its Apache Airflow deployment by building the Orchestration Federator, a centralized layer that unifies multiple Airflow instances and enables horizontal scaling across them. The team needed it because a single monolithic instance had become difficult to operate at high scale: it suffered from memory pressure, DAG parsing delays, and API server responsiveness issues. DoorDash categorized pipelines by business importance and deployed a tiered set of instances, which improved scalability, reliability, and isolation but introduced new complexities, such as managing cross-instance DAG dependencies and dynamically shifting workloads. The Federator's centralized database and unified interface streamline operations by directing users to the correct Airflow instance and managing dependencies, while a dual-hosting migration strategy keeps transitions smooth. DoorDash's approach shows how much engineering it takes to scale Airflow. Astro, by contrast, is a managed service that offers similar benefits out of the box, including elastic auto-scaling, high availability, disaster recovery, and multi-cloud deployment, so teams can focus on building pipelines rather than managing infrastructure.
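To make the tiered routing idea concrete, here is a minimal Python sketch of a central registry that maps each DAG to a tier, resolves which Airflow instance hosts it, and flags dependencies that cross instances. It is an illustration only, not DoorDash's implementation: the tier names, the `FederationRegistry` class, its methods, the DAG ids, and the example URLs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Hypothetical importance tiers; the real categories are not given in the post."""
    CRITICAL = "critical"
    STANDARD = "standard"


@dataclass
class FederationRegistry:
    """Toy stand-in for a centralized lookup of DAG -> Airflow instance."""
    instance_urls: dict[Tier, str]
    dag_tiers: dict[str, Tier] = field(default_factory=dict)

    def register(self, dag_id: str, tier: Tier) -> None:
        # Record which tier (and therefore which instance) owns the DAG.
        self.dag_tiers[dag_id] = tier

    def resolve(self, dag_id: str) -> str:
        # Return the base URL of the instance that hosts the DAG.
        return self.instance_urls[self.dag_tiers[dag_id]]

    def is_cross_instance(self, upstream: str, downstream: str) -> bool:
        # A dependency crosses instances when the two DAGs live in different tiers.
        return self.dag_tiers[upstream] != self.dag_tiers[downstream]


# Example URLs and DAG ids are placeholders.
registry = FederationRegistry(
    instance_urls={
        Tier.CRITICAL: "https://airflow-critical.example.internal",
        Tier.STANDARD: "https://airflow-standard.example.internal",
    }
)
registry.register("orders_etl", Tier.CRITICAL)
registry.register("weekly_report", Tier.STANDARD)

print(registry.resolve("orders_etl"))
print(registry.is_cross_instance("orders_etl", "weekly_report"))
```

In a setup like this, a dependency flagged as cross-instance would need to be handled by the central layer, since a single Airflow instance cannot see DAG runs hosted on another.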