
Categorize your logs with Elasticsearch categorize_text aggregation

Blog post from Elastic

Post Details
Word Count: 1,130
Summary

Elasticsearch's categorize_text aggregation supports log exploration by identifying the most common log patterns at query time, which reduces the time needed to extract information from large volumes of data. It is aimed mainly at system administrators and Site Reliability Engineers (SREs). The aggregation reads text from the document source and tokenizes it with ml_standard, a custom tokenizer tailored to machine-generated text. It then clusters the tokens with a modified version of the DRAIN algorithm: tokens that stay consistent across messages form the category definition, while highly variable tokens are dropped. Because the feature is part of Elasticsearch's aggregation framework, its results can be visualized in Kibana, where users can identify and compare error categories over time, chart category trends, and explore which terms are prevalent within a category. The aggregation was released as a technical preview in version 7.16, and Elastic invites feedback through its community forums and Slack channels.
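
As a rough illustration of how the aggregation fits into the aggregation framework, the sketch below builds a search request body that categorizes a text field and nests a date histogram under each category to follow it over time. It is a minimal sketch, not taken from the original post: the field names `message` and `@timestamp`, the aggregation names `categories` and `over_time`, and the helper `build_categorize_request` are all assumptions chosen for the example.

```python
import json


def build_categorize_request(text_field="message", time_field="@timestamp",
                             top_categories=5, interval="1h"):
    """Build a search body that categorizes log text and buckets each category over time.

    All names here (fields and aggregation labels) are hypothetical examples.
    """
    return {
        "size": 0,  # ask for aggregation results only, no individual hits
        "aggs": {
            "categories": {
                # Group log messages into categories at query time.
                "categorize_text": {
                    "field": text_field,
                    "size": top_categories,
                },
                # Sub-aggregation: count each category's documents per time bucket.
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": time_field,
                            "fixed_interval": interval,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_categorize_request(), indent=2))
```

The resulting JSON could be sent as the body of a `_search` request against a log index. The nested date histogram is one way to get the per-category trend over time that the summary describes.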