How we measure data completeness at scale

Blog post from Datadog

Post Details
Company: Datadog
Date Published
Author: Valentin Touffet, Alexandre Olivier
Word Count: 3,664
Company Posts That Month: 4
Language: English
Hacker News Points: -
Summary

Datadog's Data Completeness team built a system to verify the integrity and completeness of data across its distributed ingestion pipelines, which handle billions of payloads per second. Completeness matters because automated decisions and customer-facing dashboards that rely on incomplete data produce flawed outcomes. The team tracks completeness by dividing pipelines into segments and monitoring payloads as they traverse each segment, using create and acknowledgment events to gauge how much data made it through. A time-bucket model makes the tracking idempotent and minimizes external dependencies, so the system keeps working even when other systems are degraded. A load-shedding mechanism dynamically adjusts sampling to preserve accuracy without incurring prohibitive costs. For resilience, the system deploys independently across multiple availability zones and uses custom in-memory storage to handle the data volume efficiently. It also integrates metadata to provide real-time insight into pipeline topology and to support incident response, which helps the team detect and mitigate pipeline issues quickly and supports ongoing automation and scalability efforts.
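
The time-bucket idea can be illustrated with a minimal sketch. This is not Datadog's implementation; the class name `CompletenessTracker`, the 10-second bucket width, the segment name, and the choice to bucket acknowledgments by the payload's creation time are all assumptions made for illustration. Storing payload IDs in sets means a replayed create or acknowledgment event leaves the result unchanged, which is one simple way to get idempotency.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Sets make both operations idempotent: a replayed event is a no-op.
    created: set = field(default_factory=set)
    acked: set = field(default_factory=set)


class CompletenessTracker:
    """Hypothetical tracker: counts create/ack events per segment and time bucket."""

    def __init__(self, bucket_seconds=10):
        # Assumed bucket width; the source does not state one.
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(Bucket)

    def _key(self, segment, created_at):
        # Both event types are keyed by the payload's creation timestamp,
        # so a create and its ack always land in the same bucket.
        return (segment, int(created_at // self.bucket_seconds))

    def record_create(self, segment, payload_id, created_at):
        self.buckets[self._key(segment, created_at)].created.add(payload_id)

    def record_ack(self, segment, payload_id, created_at):
        self.buckets[self._key(segment, created_at)].acked.add(payload_id)

    def completeness(self, segment, created_at):
        """Fraction of created payloads acknowledged in the bucket, or None if empty."""
        bucket = self.buckets.get(self._key(segment, created_at))
        if bucket is None or not bucket.created:
            return None
        return len(bucket.created & bucket.acked) / len(bucket.created)


tracker = CompletenessTracker(bucket_seconds=10)
tracker.record_create("intake", "p1", 100.0)
tracker.record_create("intake", "p2", 103.0)
tracker.record_create("intake", "p1", 100.0)  # duplicate delivery, ignored by the set
tracker.record_ack("intake", "p1", 100.0)
ratio = tracker.completeness("intake", 100.0)
```

Here two payloads are created in the same bucket and one is acknowledged, so `ratio` is the share of that bucket's payloads that completed the segment. A production system at the scale described would need sampling and bounded memory rather than full ID sets.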