
Building highly reliable data pipelines at Datadog

Blog post from Datadog

Post Details
Company
Datadog
Date Published
Author
Quentin Francois
Word Count
2,061
Language
English
Hacker News Points
16
Summary

Quentin Francois, a data engineer at Datadog, shares the company's best practices for building reliable data pipelines. According to Francois, reliability is not about never failing, but about ensuring that pipelines deliver good data on time. Achieving this means designing pipelines with fault tolerance and monitoring in mind. Datadog's pipelines run on object stores, cloud Hadoop/Spark services, and spot instances; spot instances in particular can be reclaimed at any time as supply and demand fluctuate. With the right architecture, cluster configuration, and monitoring, however, such failures can be anticipated and recovered from quickly. Key strategies include breaking jobs down into smaller pieces; monitoring cluster metrics, job metrics, and data latencies; and thinking ahead about potential failure modes. By following these best practices, data engineers can build pipelines that reliably meet the needs of downstream users.
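The "break jobs into smaller pieces and recover quickly" idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not Datadog's actual implementation: it assumes each partition of work is independent and idempotent, uses a simple in-memory dict as a checkpoint store (a real pipeline would checkpoint to an object store), and retries with exponential backoff so a reclaimed spot instance only costs re-running one small piece rather than the whole job. The names `process_partition` and `run_pipeline` are invented for the example.

```python
import time


def process_partition(partition):
    """Hypothetical per-partition work; stands in for a real Hadoop/Spark task."""
    return f"output for {partition}"


def run_pipeline(partitions, completed, max_retries=3):
    """Run each small piece independently so a failure only re-runs that piece.

    `completed` acts as a checkpoint store: partitions already finished in a
    previous run are skipped, making recovery after a failure cheap.
    """
    results = {}
    for partition in partitions:
        if partition in completed:  # already done in an earlier run: skip
            results[partition] = completed[partition]
            continue
        for attempt in range(1, max_retries + 1):
            try:
                results[partition] = process_partition(partition)
                completed[partition] = results[partition]  # checkpoint progress
                break
            except Exception:
                if attempt == max_retries:
                    raise  # surface the failure after exhausting retries
                time.sleep(2 ** attempt)  # back off before retrying
    return results


checkpoints = {}
out = run_pipeline(["2024-01-01", "2024-01-02"], checkpoints)
```

On a re-run after a crash, any partition already present in the checkpoint store is skipped, which is what keeps recovery time short when cheap-but-unreliable spot capacity disappears mid-job.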