Bucket Spans for Elasticsearch and Machine Learning

Post Details

Company

Elastic

Date Published

July 27, 2017

Author

Sophie Chang

Word Count

1,853

Language

-

Hacker News Points

-

Source URL

www.elastic.co/blog/explaining-the-bucket-span-in-machine-learning-for-elasticsearch

Summary

In machine learning for Elasticsearch, the bucket span is the interval used to divide a continuous stream of data into batches for processing, which makes it a central setting for anomaly detection. Elastic Stack 5.5 introduced a bucket span estimator as an experimental feature; it examines a subset of the data to suggest the minimum viable bucket span. The bucket span determines how often data is analyzed and how often alerts can be raised, so it governs the trade-off between detecting anomalies quickly and avoiding excessive noise. Choosing a span means weighing the characteristics of the data, the expected duration of anomalies, and processing performance: shorter spans detect anomalies sooner but can introduce noise, while longer spans smooth the data but can miss significant short-lived anomalies. Once set, the bucket span cannot be changed during an analysis, because doing so would require recalibrating the entire model, so it is important to choose an appropriate span at the outset.
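
The trade-off between short and long bucket spans can be illustrated with a small, self-contained sketch. It is not Elastic's anomaly detection model: the synthetic event stream, the helper names `bucket_counts` and `peak_to_median`, and the peak-to-median ratio used as a stand-in for "how much a burst stands out" are all assumptions made for illustration.

```python
from collections import defaultdict


def bucket_counts(timestamps, bucket_span_s):
    """Count events per bucket, keyed by the bucket's start time in seconds."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[ts - ts % bucket_span_s] += 1
    return dict(sorted(counts.items()))


def peak_to_median(counts):
    """Ratio of the busiest bucket to a typical one (upper median for even sizes)."""
    values = sorted(counts.values())
    median = values[len(values) // 2]
    return max(values) / median


# Hypothetical data: one event every 10 s for an hour,
# plus a one-minute burst of 60 extra events starting at t = 1800 s.
baseline = list(range(0, 3600, 10))
burst = list(range(1800, 1860))
events = baseline + burst

for span in (60, 300, 900):
    counts = bucket_counts(events, span)
    print(f"span={span:>3}s buckets={len(counts):>2} "
          f"peak/median={peak_to_median(counts):.2f}")
# span= 60s buckets=60 peak/median=11.00
# span=300s buckets=12 peak/median=3.00
# span=900s buckets= 4 peak/median=1.67
```

In this toy data, the one-minute burst is eleven times the typical bucket at a 60-second span but less than twice the typical bucket at a 15-minute span, which mirrors the point that longer spans smooth the data and can hide short-lived anomalies. The shorter span also produces fifteen times as many buckets to analyze.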