Cross-Region Disaster Recovery on Astro Is Now Generally Available: Here's How We Built It
Blog post from Astronomer
Cross-region disaster recovery (DR) on Astro is now generally available for AWS data planes. Customers can fail their Airflow workloads over to a secondary region with a single click. Astronomer built the DR solution to meet business-critical requirements from industries such as financial services and healthcare, giving enterprise-scale Airflow deployments a standby to fall back on. It also spares customers from building and maintaining parallel DR infrastructure themselves, which traditionally required significant engineering effort.

The system provisions a secondary EKS cluster in warm standby mode and keeps it ready by replicating three categories of data: Airflow metadata, task logs, and container images. The architecture relies on AWS Aurora Global Clusters for cross-region replication of the Airflow metadata database and on bi-directional S3 replication for task logs. The secondary database runs headless, so compute instances run in the standby region only when they are needed, which keeps the cost of the standby low.

DR operations can be automated through the Astro API and Terraform. Observability and health monitoring cover both the primary and secondary clusters, and DR awareness is centralized in the manifest system to simplify maintenance.

Next on the roadmap: extending DR support to GCP and Azure, and improving the self-service experience for migrating existing clusters.
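To make the "headless database" idea concrete, here is a minimal sketch of what promoting a headless Aurora secondary could look like. It is not Astronomer's implementation. It assumes the caller passes a boto3 RDS client for the secondary region and that the engine is `aurora-postgresql`. The function name and every identifier are hypothetical. A headless secondary cluster has replicated storage but no DB instances, so the sketch adds an instance before making the cluster the writer.

```python
"""Sketch: promote a headless Aurora secondary during a regional failover.

Assumptions (not from the Astro post): `rds` is a boto3 RDS client for the
secondary region, the engine is aurora-postgresql, and the function name and
all identifiers are hypothetical placeholders.
"""


def fail_over_to_headless_secondary(
    rds,
    global_cluster_id: str,
    secondary_cluster_id: str,
    secondary_cluster_arn: str,
    instance_id: str,
    instance_class: str = "db.r6g.large",
) -> None:
    # A headless secondary cluster has storage but no DB instances, so it
    # accrues no instance-hours while idle. Add compute before promoting it.
    rds.create_db_instance(
        DBInstanceIdentifier=instance_id,
        DBClusterIdentifier=secondary_cluster_id,
        DBInstanceClass=instance_class,
        Engine="aurora-postgresql",
    )
    # Block until the new instance reports "available".
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=instance_id)
    # Ask Aurora to make the secondary cluster (identified by ARN) the writer.
    rds.failover_global_cluster(
        GlobalClusterIdentifier=global_cluster_id,
        TargetDbClusterIdentifier=secondary_cluster_arn,
    )
```

In a real run the client would come from something like `boto3.client("rds", region_name=...)` for the standby region. Passing the client in keeps the ordering of steps (add compute, wait, promote) testable without AWS credentials.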
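Bi-directional S3 replication for task logs can be pictured as two replication rules, one on each regional bucket, each pointing at the other. The sketch below builds those `ReplicationConfiguration` documents as plain dictionaries. It assumes one IAM role per direction and versioning already enabled on both buckets. The bucket names, role ARNs, and helper names are hypothetical, and Astro's actual rules may differ. Replica modification sync is turned on so that metadata changes made to a replica also flow back, a setting commonly used in two-way setups.

```python
"""Sketch: S3 replication rules for bi-directional task-log replication.

Assumptions: bucket names, role ARNs, and these helper names are hypothetical;
versioning must already be enabled on both buckets for replication to work.
"""


def replication_config(role_arn: str, destination_bucket: str) -> dict:
    """Build a ReplicationConfiguration copying every object to destination_bucket."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": f"replicate-to-{destination_bucket}",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Also replicate metadata changes made to replica objects.
                "SourceSelectionCriteria": {
                    "ReplicaModifications": {"Status": "Enabled"}
                },
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def bidirectional_configs(
    bucket_a: str, role_a_arn: str, bucket_b: str, role_b_arn: str
) -> dict:
    """Return one config per source bucket, each targeting the other bucket."""
    return {
        bucket_a: replication_config(role_a_arn, bucket_b),
        bucket_b: replication_config(role_b_arn, bucket_a),
    }
```

Each entry could then be applied with boto3's `s3.put_bucket_replication(Bucket=name, ReplicationConfiguration=config)`. S3 does not re-replicate objects that are themselves replicas, which is why two mirrored rules do not loop.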