
Metrictank Data Distribution: The Quest for the Best Hashing Method

Blog post from Grafana Labs

Post Details
Author: Florian Boucault
Word Count: 1,228
Language: English
Summary

This blog post describes the search for an efficient hashing method for distributing data in Metrictank, a clustered time-series database. Metrictank assigns each metric a partition ID and uses it to spread data across cluster instances, which keeps load even and lets the cluster tolerate instance failures. The post compares hash functions such as FNV-1a, SipHash, xxHash, and MetroHash, used together with the jump hash function. Jump hash takes a uint64 key, so each metric's key must first be hashed down to a uint64. In the experiments, xxHash combined with jump hash gave the best balance of computational cost and evenness of distribution, and Metrictank adopted it as the default for setups with tag support. The post reports that the new setup can process up to 10 million metrics per second on a single core, and that it also reduces bandwidth and disk usage because keys no longer need to be published in Kafka messages.
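
The two-step scheme summarized above (hash the metric key to a uint64, then map that value to a partition with jump hash) can be sketched as follows. This is a minimal illustration, not Metrictank's code: it uses a hand-written 64-bit FNV-1a as the pre-hash so the example needs no third-party package, whereas the post reports xxHash as the adopted choice. The names `fnv1a_64`, `jump_hash`, and `partition_for`, and the sample metric name, are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1  # emulate uint64 wraparound in Python


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a; a stand-in for xxHash in this sketch."""
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & MASK64  # FNV 64-bit prime
    return h


def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a uint64 key to a bucket in [0, num_buckets)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Step a 64-bit linear congruential generator seeded by the key.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump forward; the divisor is derived from the generator's high bits.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def partition_for(metric_key: str, num_partitions: int) -> int:
    """Hypothetical helper: reduce the key to a uint64, then pick a partition."""
    return jump_hash(fnv1a_64(metric_key.encode("utf-8")), num_partitions)


if __name__ == "__main__":
    # Hypothetical metric name and partition count, for illustration only.
    print(partition_for("some.metric.name;host=a", 32))
```

A useful property of jump hash is that growing the partition count from n to n + 1 only ever moves a key to the new partition n, never between existing ones. Because the partition can be recomputed from the metric key alone, a consumer does not need the key shipped alongside the message to know where the data belongs.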