Company
Date Published
Author
Adrien Grand
Word count
675
Language
-
Hacker News points
None

Summary

Elasticsearch 1.1.0 introduced a long-requested feature: the cardinality aggregation, which counts the unique values of a field. It can compute metrics such as the number of unique website visitors, and it can be nested under other aggregations, such as a date histogram, to show how those counts change over time. Counting unique values exactly across a distributed dataset is expensive, because every shard would have to ship its full set of values to be merged, so the aggregation relies on approximate algorithms such as linear counting and HyperLogLog, which keep memory usage bounded at the cost of some error. Elasticsearch uses HyperLogLog++, which balances precision and resource usage by applying linear counting to low-cardinality data and HyperLogLog to larger sets. The precision_threshold parameter configures the trade-off between precision and memory: counts below the threshold are expected to be close to exact, and above it the relative error typically stays under 5%, even for large datasets.
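
To make the linear counting / HyperLogLog split more concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch's implementation: it uses the classic HyperLogLog small-range correction (fall back to linear counting while the raw estimate is small and some registers are still empty) rather than the bias correction and sparse encoding of HyperLogLog++. The class name ToyHyperLogLog, the helper hash64, the SHA-1 based hash, and the register count (p = 12) are all assumptions chosen for illustration.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Deterministic 64-bit hash (first 8 bytes of SHA-1); an illustrative choice."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class ToyHyperLogLog:
    """Hypothetical toy sketch: 2**p registers, each storing a small rank."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m  # memory is fixed, whatever the input size

    def add(self, value: str) -> None:
        h = hash64(value)
        index = h >> (64 - self.p)             # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "ToyHyperLogLog") -> None:
        # Sketches from different shards combine with an element-wise max.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # standard constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Low cardinality: linear counting on the fraction of empty registers.
            return m * math.log(m / zeros)
        return raw  # High cardinality: raw HyperLogLog estimate.


# Two "shards" with overlapping visitors, merged on a coordinating node.
shard_a, shard_b, combined = ToyHyperLogLog(), ToyHyperLogLog(), ToyHyperLogLog()
for i in range(30000):
    shard_a.add(f"visitor-{i}")
    combined.add(f"visitor-{i}")
for i in range(20000, 50000):
    shard_b.add(f"visitor-{i}")
    combined.add(f"visitor-{i}")
shard_a.merge(shard_b)

small = ToyHyperLogLog()
for i in range(100):
    small.add(f"visitor-{i}")
    small.add(f"visitor-{i}")  # duplicates do not change the sketch

print(round(small.estimate()), round(shard_a.estimate()))
```

Because merging is just an element-wise maximum, each shard only needs to send its fixed-size register array instead of its full set of values, which is what makes this family of algorithms a good fit for distributed counting. In this toy version the accuracy is governed by p; the sketch does not attempt to model how precision_threshold is translated into internal settings.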