Company
Date Published
Author
Kulpreet Singh
Word count
4931
Language
English
Hacker News points
None

Summary

HashiCorp ran a scale test to observe gossip stability risk in Consul deployments with more than 50,000 clients. The test compared three configurations: baseline (66K clients in a single gossip pool), 20-segment (22K clients on the <default> segment and 55K across 20 segments), and 64-segment (22K clients on the <default> segment and approximately 55K across 64 segments). Consul servers remained healthy under all three configurations, and splitting a large LAN gossip pool into smaller pools with network segments reduced gossip stability risk by letting gossip converge faster. The primary goal was to observe a reduction in the `consul.serf.queue.Intent` metric, which dropped by more than 90% after the migration to 20 segments. The test also highlighted the importance of configuring the ARP cache appropriately for large-scale Consul cluster deployments and introduced new labeled Serf and memberlist metrics for monitoring gossip traffic between segments.