Company
Date Published
Author
Marko Rajcevic
Word count
510
Language
English
Hacker News points
None

Summary

When a node in a running cluster is lost and later brought back online, the system prioritizes consistency over availability in CAP-theorem terms, yet with a replication factor of 3 it remains highly available through the failure. A 3-second re-election process elects a new leader, which temporarily adds latency to certain operations. Once the node is repaired, the remaining nodes catch it up and leaders are redistributed evenly across all nodes. If a node stays down for longer than 15 minutes, it is removed from the system unless it is replaced with a new node, and data is re-replicated behind the scenes without manual intervention.
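
To make the arithmetic behind these claims concrete, here is a minimal Python sketch. It assumes a majority-quorum consensus scheme, which the summary implies through its mention of leader re-election but does not name. The function names and the returned labels are hypothetical and chosen only for illustration; the replication factor of 3 and the 15-minute threshold come from the summary.

```python
# Illustrative sketch only: assumes majority-quorum replication, which the
# summary implies but does not state. All names here are hypothetical.

REPLICATION_FACTOR = 3             # from the summary
REMOVAL_THRESHOLD_SECONDS = 15 * 60  # "longer than 15 minutes", from the summary


def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return replication_factor - quorum_size(replication_factor)


def can_elect_leader(replication_factor: int, live_replicas: int) -> bool:
    """A new leader can be elected only while a majority is still alive."""
    return live_replicas >= quorum_size(replication_factor)


def action_for_downtime(downtime_seconds: float) -> str:
    """Pick the recovery path described in the summary for a lost node."""
    if downtime_seconds > REMOVAL_THRESHOLD_SECONDS:
        # Node is treated as gone; its data is re-replicated elsewhere.
        return "remove_and_re_replicate"
    # Node came back in time; surviving replicas catch it up.
    return "catch_up"


if __name__ == "__main__":
    print(quorum_size(REPLICATION_FACTOR))          # majority needed
    print(tolerated_failures(REPLICATION_FACTOR))   # nodes that may fail
    print(can_elect_leader(REPLICATION_FACTOR, 2))  # one node lost
    print(action_for_downtime(5 * 60))              # short outage
```

Under these assumptions, a replication factor of 3 needs 2 live replicas to elect a leader, so losing a single node leaves the cluster able to serve requests after the brief re-election, while losing two would not.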