What Happens to Tablets (Shards) When Node Is Lost and Then Brought Back Into Cluster?

Company

Yugabyte

Date Published

Sept. 22, 2023

Author

Marko Rajcevic

Word count

510

Language

English

Hacker News points

None

URL

www.yugabyte.com/blog/node-loss-recovery-shards

Summary

When a node in a running cluster is lost and later brought back online, the system prioritizes consistency over availability in CAP-theorem terms, yet it remains highly available with a replication factor of 3. A 3-second leader re-election elects new leaders, which temporarily increases latency for certain operations. Once the node is repaired and rejoins, the remaining nodes bring it up to date, and leaders are redistributed evenly across all nodes. If a node stays down for longer than 15 minutes, it is removed from the cluster unless it is replaced with a new node, in which case data is replicated to the replacement behind the scenes without manual intervention.
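
The numbers in the summary can be put into a small Python sketch. It is an illustrative model only, not YugabyteDB code: the constant and function names are hypothetical, and the majority-quorum rule is an assumption based on standard Raft-style replication rather than something the summary states. It shows why a replication factor of 3 tolerates the loss of one node, and how the 15-minute threshold separates a node that is caught up from one that is removed.

```python
# Illustrative model only; all names below are hypothetical, not YugabyteDB APIs.

REPLICATION_FACTOR = 3           # from the summary
LEADER_ELECTION_SECONDS = 3      # re-election time quoted in the summary
NODE_REMOVAL_SECONDS = 15 * 60   # 15-minute threshold quoted in the summary


def quorum_size(rf: int) -> int:
    """Smallest majority of rf replicas (assumes majority-based consensus)."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return rf - quorum_size(rf)


def outage_outcome(down_seconds: float) -> str:
    """Classify an outage by how long the node was unavailable."""
    if down_seconds <= NODE_REMOVAL_SECONDS:
        # Node returns in time: remaining nodes catch it up, leaders rebalance.
        return "catch_up"
    # Node was down too long: it is removed from the cluster.
    return "removed"


if __name__ == "__main__":
    print("quorum:", quorum_size(REPLICATION_FACTOR))
    print("tolerated failures:", tolerated_failures(REPLICATION_FACTOR))
    print("5-minute outage:", outage_outcome(5 * 60))
    print("20-minute outage:", outage_outcome(20 * 60))
```

Under these assumptions, one lost node out of three still leaves a majority of two, so the cluster keeps serving requests after the brief re-election window.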