How to monitor Elasticsearch performance

Company

Datadog

Date Published

Sept. 26, 2016

Author

Emily Chang

Word count

6327

Language

English

Hacker News points

URL

www.datadoghq.com/blog/monitor-elasticsearch-performance-metrics

Summary

Elasticsearch is an open source distributed document store and search engine that stores and retrieves data structures in near real time. It relies heavily on Apache Lucene, a full-text search engine written in Java. Elasticsearch represents data as structured JSON documents and makes full-text search accessible through a RESTful API and web clients for languages like PHP, Python, and Ruby. A cluster is made up of one or more nodes. Each node is a single running instance of Elasticsearch whose configuration designates which cluster it belongs to and what type of node it can be. There are three common types of nodes: primary-eligible nodes, data nodes, and client nodes. The primary node coordinates cluster tasks, and any primary-eligible node can take over as primary if the current primary node fails; a primary-eligible node is also able to function as a data node. Data nodes store data in the form of shards and perform actions related to indexing, searching, and aggregating data. Client nodes act as load balancers that help route indexing and search requests. Elasticsearch organizes data by storing related documents in the same index, which can be thought of as a logical wrapper of configuration. Each index contains a set of related documents in JSON format and is divided into shards, each of which is a complete instance of Lucene, like a mini search engine. Elasticsearch exposes metrics that help detect signs of trouble and guide action on problems like unreliable nodes, out-of-memory errors, and long garbage collection times. The key areas to monitor are search and indexing performance, memory and garbage collection, host-level system and network metrics, cluster health and node availability, and resource saturation and errors, including pending tasks. Monitoring these metrics can surface potential issues before they become major problems and help keep an Elasticsearch cluster running smoothly.
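
As a rough illustration of how some of these metrics might be collected, the Python sketch below reads Elasticsearch's cluster health and node stats REST endpoints (`/_cluster/health` and `/_nodes/stats`) and reduces them to a few warning signs. It is a minimal sketch under stated assumptions, not the article's own tooling: the function names (`fetch_json`, `health_warnings`, `node_summary`) are hypothetical, the base URL is assumed to be an unauthenticated cluster reachable over HTTP, and the specific checks are illustrative rather than recommended thresholds.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def fetch_json(base_url: str, path: str) -> dict:
    """GET a JSON document from the cluster's REST API.

    Assumption: base_url (e.g. "http://localhost:9200") needs no auth or TLS setup.
    """
    with urlopen(f"{base_url}{path}", timeout=5) as resp:
        return json.load(resp)


def health_warnings(health: dict) -> list[str]:
    """Turn a /_cluster/health response into human-readable warnings."""
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append(f"{unassigned} unassigned shards")
    pending = health.get("number_of_pending_tasks", 0)
    if pending > 0:
        warnings.append(f"{pending} pending tasks")
    return warnings


def node_summary(node_stats: dict) -> dict:
    """Summarize one node's entry from the "nodes" map of /_nodes/stats."""
    search = node_stats["indices"]["search"]
    queries = search["query_total"]
    # Both counters are cumulative since node start, so this is a lifetime
    # average; a real monitor would diff successive samples instead.
    avg_query_ms = search["query_time_in_millis"] / queries if queries else 0.0
    return {
        "heap_used_percent": node_stats["jvm"]["mem"]["heap_used_percent"],
        "avg_query_ms": avg_query_ms,
    }
```

Against a live cluster, one might call `health_warnings(fetch_json(base_url, "/_cluster/health"))` and then apply `node_summary` to each value in `fetch_json(base_url, "/_nodes/stats")["nodes"]`. The parsing functions are kept separate from the HTTP call so they can be exercised on saved responses without a running cluster.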