Word count: 1025

Summary

With the release of Elasticsearch 1.1.0, a new percentiles metric aggregation offers a valuable tool for understanding the distribution of values in a dataset. Basic statistics such as the mean or median often obscure outliers; percentiles give a more nuanced picture, revealing, for example, that while average latency may look acceptable, a meaningful fraction of users could be experiencing much higher latencies. The aggregation calculates a set of default percentiles and can be nested inside any bucket aggregation to surface insights such as geographical disparities in page-load times. Because computing exact percentiles over large datasets is resource-intensive, Elasticsearch uses the T-Digest algorithm to approximate them, trading some accuracy for memory savings. The approximation tends to stay most accurate at the extreme percentiles, which are often the ones that matter, and a compression parameter gives finer control over the memory-accuracy trade-off.
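The point about averages hiding outliers can be illustrated with a small sketch. The latency values and the `load_time` field name below are made-up examples, and the request body follows the general shape of an Elasticsearch percentiles aggregation; exact percentiles are computed locally with Python's standard library to contrast the mean with the tail.

```python
import json
import statistics

# Hypothetical latency sample in milliseconds: most requests are fast,
# but one slow outlier drags the mean far above the typical experience.
latencies = [40, 42, 45, 47, 50, 52, 55, 60, 65, 800]

mean = statistics.mean(latencies)      # inflated by the single outlier
median = statistics.median(latencies)  # close to what most users see

# Sketch of a percentiles aggregation request body; "load_time" is an
# assumed field name for illustration.
agg = {
    "aggs": {
        "load_time_stats": {
            "percentiles": {"field": "load_time"}
        }
    }
}
print(json.dumps(agg, indent=2))
print(f"mean={mean} ms, median={median} ms")
```

Here the mean (125.6 ms) is more than double the median (51 ms), which is exactly the kind of gap the high percentiles reported by the aggregation would expose.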