/plushcap/analysis/datadog/engineering-husky-deep-dive

Husky: Exactly-Once Ingestion and Multi-Tenancy at Scale

What's this blog post about?

Datadog's third-generation event store, Husky, is a distributed, time-series oriented, columnar store optimized for streaming ingestion and hybrid analytical and search queries. To ensure exactly once ingestion of every event into Husky’s storage engine, the company developed auto-scaling, multi-tenant data ingestion pipelines. They introduced locality by deterministically mapping events to groups of partitions called shards by their ID and timestamp. This allowed for efficient deduplication within a shard and reduced storage costs and improved performance. The Sharding Allocator ensures all Shard Router nodes have a consistent view of allocated Shard Placements, while the Autosharder periodically adjusts configured shard counts on a tenant-by-tenant basis to better fit observed traffic volume. Load balancing is achieved by shifting tenant Shard Placements around using a salting technique and a balancing algorithm that shifts placements around until all shards are roughly balanced. The Writers, responsible for exactly-once ingestion to Husky, persist event IDs in separate Husky tables from the raw event data to ensure consistency between the event data itself and the event IDs once they’ve been committed to the Metadata store.

Company
Datadog

Date published
Feb. 22, 2023

Author(s)
Daniel Intskirveli, Cecilia Watt

Word count
4354

Hacker News points
22

Language
English