Company
Date Published
Author
-
Word count
2389
Language
-
Hacker News points
None

Summary

Elasticsearch's failover capabilities rely on keeping multiple copies of the data within a cluster, primarily through a primary-backup model in which the primary shard handles all indexing operations and replicates each change to its replica shards. This design keeps data safe during network disruptions or node failures: when a primary is lost, only a shard copy that is still in sync is promoted to primary. Shard allocation is managed by the master node, which records its allocation decisions in the cluster state; this enables smart routing and ensures that only copies holding the latest data are used as primaries. If a replica misses updates because of a network issue or node failure, it is marked as stale and removed from the in-sync set, so that it cannot later be promoted and cause data loss. Elasticsearch maintains write availability by updating the in-sync set through its consensus layer, which ensures that only fully updated copies can become primary. In the extreme case where all in-sync copies are lost, Elasticsearch provides manual commands to allocate a stale or an empty shard copy as primary; these commands involve data loss and are recommended only as a last resort.
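The in-sync bookkeeping described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Elasticsearch code: the `ShardCopy` and `ShardGroup` classes, the `reachable` flag, and the node names are all hypothetical, and the single `in_sync` set stands in for the in-sync set that the master records in the cluster state.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy of a shard on a node (hypothetical toy type)."""
    node: str
    docs: list = field(default_factory=list)
    reachable: bool = True


class ShardGroup:
    """Toy model of one shard's copies and its in-sync set."""

    def __init__(self, primary_node, replica_nodes):
        self.copies = {n: ShardCopy(n) for n in [primary_node, *replica_nodes]}
        self.primary = primary_node
        # Stand-in for the in-sync set kept in the cluster state.
        self.in_sync = set(self.copies)

    def index(self, doc):
        """Apply a write on the primary, then replicate to in-sync replicas."""
        self.copies[self.primary].docs.append(doc)
        for node in sorted(self.in_sync - {self.primary}):
            copy = self.copies[node]
            if copy.reachable:
                copy.docs.append(doc)
            else:
                # The replica missed the write: mark it stale by removing it
                # from the in-sync set, so it can never be promoted.
                self.in_sync.discard(node)

    def promote(self):
        """Pick a new primary from the reachable in-sync copies only."""
        candidates = sorted(
            n for n in self.in_sync
            if n != self.primary and self.copies[n].reachable
        )
        if not candidates:
            raise RuntimeError("no in-sync copy left; manual allocation needed")
        self.primary = candidates[0]
        return self.primary


group = ShardGroup("node-a", ["node-b", "node-c"])
group.index("doc-1")
group.copies["node-c"].reachable = False  # node-c is partitioned away
group.index("doc-2")                      # node-c misses this write
group.copies["node-c"].reachable = True   # node-c returns, but is stale
group.copies["node-a"].reachable = False  # the primary's node fails
new_primary = group.promote()
print(new_primary, sorted(group.in_sync))
```

In this run `node-c` rejoins before the primary fails, yet it is skipped during promotion because it was removed from the in-sync set when it missed `doc-2`. If no in-sync copy were reachable, the model raises an error instead of promoting a stale copy, which mirrors the situation where the manual allocation commands become the only option.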