Apache Airflow for Orchestration and Monitoring of Apache Druid
Blog post from Rill
This post outlines an approach to observability and data health checks in pipelines that ingest into Apache Druid, orchestrated with Apache Airflow and alerting through systems like Opsgenie and Slack. It stresses maintaining data quality and completeness from the initial stages of raw-data processing through to analysis, combining static rule checks (fixed thresholds that never change between runs) with dynamic, data-driven tests that derive expected behavior from the data itself.

The article weighs the trade-offs between cost, timeliness, and validation coverage, advocating an iterative approach to testing: start with a small set of checks and expand as production pipelines reveal new failure modes. When ingestion fails, the priority is to identify the root cause quickly and to automate the response where possible, keeping data lag to a minimum.

Monitoring end-user performance is also crucial, particularly query latency on massive datasets; the authors use Rill Explore dashboards to diagnose such issues. The post concludes by recommending that business stakeholders be included in the alerting process and that post-mortems be conducted to refine workflows and reduce future incidents, sharing insights gained from the team's experience running an always-on observability system.
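The post does not include code, but the two kinds of checks it describes can be sketched in plain Python. Below, a minimal illustration under my own assumptions: `check_static_rules` applies fixed thresholds (a hypothetical non-empty-ingest rule and a null-ratio cap), while `check_dynamic_volume` is a data-driven test that flags today's row count if it falls outside a standard-deviation band around recent history. Function names, thresholds, and the 3-sigma band are illustrative, not from the article; in practice these would run as tasks in an Airflow DAG after each Druid ingestion.

```python
from statistics import mean, stdev

def check_static_rules(row_count, null_ratio, max_null_ratio=0.01):
    """Static rule checks: fixed thresholds that never change between runs.

    Returns a list of failure messages (empty list means the check passed).
    """
    failures = []
    if row_count == 0:
        failures.append("ingested zero rows")
    if null_ratio > max_null_ratio:
        failures.append(
            f"null ratio {null_ratio:.2%} exceeds limit {max_null_ratio:.2%}"
        )
    return failures

def check_dynamic_volume(todays_count, historical_counts, n_sigmas=3.0):
    """Dynamic, data-driven check: compare today's row count against a
    band of n_sigmas sample standard deviations around the trailing mean.

    Returns a list of failure messages (empty list means the check passed).
    """
    mu = mean(historical_counts)
    sigma = stdev(historical_counts)
    if abs(todays_count - mu) > n_sigmas * sigma:
        return [
            f"row count {todays_count} outside {n_sigmas}-sigma band "
            f"around trailing mean {mu:.0f}"
        ]
    return []

# Example: a stable history of ~10k rows/day; a 5k-row day is flagged.
history = [10_000, 10_200, 9_900, 10_100, 10_050]
print(check_static_rules(row_count=0, null_ratio=0.0))      # zero-row failure
print(check_dynamic_volume(10_000, history))                # within band: []
print(check_dynamic_volume(5_000, history))                 # flagged as anomalous
```

Each function returns failure messages rather than raising, so an orchestrator task can aggregate the results and decide whether to page via Opsgenie, post to Slack, or trigger an automated re-ingest, in line with the article's advice to automate responses where possible.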