
Husky: Efficient compaction at Datadog scale

Blog post from Datadog

Post Details
Company: Datadog
Author: Damien Profeta, George Talbot
Word Count: 3,996
Language: English
Hacker News Points: 29
Summary

Husky is a distributed storage system designed for observability data: it ingests and stores large volumes of new data, with few updates to existing records. To keep that data efficient to query, Husky combines size-tiered compaction with locality compaction. Size-tiered compaction merges small fragments into larger ones through exponentially increasing size classes, which reduces the number of small fragments and therefore the number of object storage GET requests. Locality compaction organizes fragments into levels of exponentially increasing size and produces fragments that are each as lexically narrow as possible. Together with a custom fragment file storage format, a sorting schema, and pruning mechanisms, this lets a query scan only the relevant fragments, reducing latency and cost. Overall, Husky's compaction system is designed to process trillions of events per day while minimizing the impact on queries and object storage costs.
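
The two ideas in the summary, merging by exponential size class and pruning by key range, can be illustrated with a minimal Python sketch. It is not Husky's implementation: the `Fragment` type, the constants `BASE_SIZE`, `GROWTH`, and `MERGE_THRESHOLD`, and the functions `size_class`, `compact`, and `prune` are all hypothetical names and values chosen for illustration, and fragments are reduced to a row count plus a minimum and maximum sort key.

```python
from collections import defaultdict
from dataclasses import dataclass

# All three constants are illustrative assumptions, not Husky's real settings.
BASE_SIZE = 1_000      # nominal row count of the smallest fragments
GROWTH = 4             # each size class is GROWTH times larger than the last
MERGE_THRESHOLD = 4    # merge once a class holds this many fragments


@dataclass(frozen=True)
class Fragment:
    rows: int
    min_key: str   # smallest sort key stored in the fragment
    max_key: str   # largest sort key stored in the fragment


def size_class(rows: int) -> int:
    """Class 0 covers rows < BASE_SIZE * GROWTH; each next class is GROWTH times wider."""
    cls, limit = 0, BASE_SIZE * GROWTH
    while rows >= limit:
        limit *= GROWTH
        cls += 1
    return cls


def merge(frags: list[Fragment]) -> Fragment:
    """Combine fragments; the result's key range is the union of the inputs' ranges."""
    return Fragment(
        rows=sum(f.rows for f in frags),
        min_key=min(f.min_key for f in frags),
        max_key=max(f.max_key for f in frags),
    )


def compact(fragments: list[Fragment]) -> list[Fragment]:
    """Size-tiered pass: keep merging any class that has reached MERGE_THRESHOLD."""
    frags = list(fragments)
    while True:
        by_class = defaultdict(list)
        for f in frags:
            by_class[size_class(f.rows)].append(f)
        full = [c for c, fs in by_class.items() if len(fs) >= MERGE_THRESHOLD]
        if not full:
            return frags
        cls = min(full)  # compact the smallest full class first
        group = by_class[cls][:MERGE_THRESHOLD]
        rest = by_class[cls][MERGE_THRESHOLD:]
        others = [f for c, fs in by_class.items() if c != cls for f in fs]
        frags = others + rest + [merge(group)]


def prune(fragments: list[Fragment], lo: str, hi: str) -> list[Fragment]:
    """Keep only fragments whose [min_key, max_key] overlaps the query range [lo, hi]."""
    return [f for f in fragments if f.min_key <= hi and f.max_key >= lo]


small = [Fragment(1_000, f"svc-{i:02d}", f"svc-{i:02d}") for i in range(16)]
compacted = compact(small)
candidates = prune(small, "svc-03", "svc-05")
```

In this sketch, merging fragments with unrelated key ranges widens the merged fragment's range, so `prune` can skip less after a purely size-tiered pass. That trade-off is consistent with the summary's description of locality compaction as a separate step that keeps fragments lexically narrow.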