Company
Date Published
Author
Chuck Lauer Vose, Principal Software Engineer
Word count
3480
Language
English
Hacker News points
None

Summary

This article explains how to configure bulkheads and circuit breakers to prevent cascading failures in systems that depend on multiple external services. Bulkheads are essentially connection pools that keep a single failing service from taking down the entire system, while circuit breakers detect when a service is failing and automatically stop sending requests to it until it recovers. The article provides a framework for finding the right bulkhead configuration using Semian, a unified, low-effort tool that adds circuit breakers and bulkheads to network requests automatically. It covers data collection, filtering, finding a good baseline time range, playing a guessing game to determine the optimal bulkhead value, and adjusting the configuration based on historical data. The article emphasizes using historical data to find viable numbers before deploying to production, and encourages readers to keep monitoring those numbers after deployment.
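
To make the two patterns concrete, here is a minimal sketch of how a bulkhead and a circuit breaker can wrap calls to one external service. It is not Semian's API or implementation (Semian is a Ruby library); the class `ProtectedResource` and the parameter names `tickets`, `error_threshold`, and `error_timeout` are hypothetical names chosen for illustration, and the sketch simplifies by rejecting a call immediately when no ticket is free rather than waiting for one.

```python
import threading
import time


class BulkheadFull(Exception):
    """Raised when every ticket for the resource is already in use."""


class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class ProtectedResource:
    """Hypothetical wrapper combining a bulkhead with a circuit breaker."""

    def __init__(self, tickets, error_threshold, error_timeout,
                 clock=time.monotonic):
        # Bulkhead: at most `tickets` calls may be in flight at once.
        self._tickets = threading.BoundedSemaphore(tickets)
        # Breaker: open after `error_threshold` consecutive failures and
        # stay open for `error_timeout` seconds.
        self._error_threshold = error_threshold
        self._error_timeout = error_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self._lock = threading.Lock()

    def call(self, fn):
        with self._lock:
            still_open = (
                self._opened_at is not None
                and self._clock() - self._opened_at < self._error_timeout
            )
        if still_open:
            raise CircuitOpen()
        # Once the timeout has elapsed, calls are let through as trials.
        if not self._tickets.acquire(blocking=False):
            raise BulkheadFull()
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._error_threshold:
                    self._opened_at = self._clock()  # (re)open the circuit
            raise
        else:
            with self._lock:
                self._failures = 0
                self._opened_at = None  # a success closes the circuit
            return result
        finally:
            self._tickets.release()
```

A caller would wrap each request, for example `resource.call(lambda: fetch_from_service())`, where `fetch_from_service` stands in for any network request. The `tickets` value is the number the article's guessing game is trying to find: too low and healthy traffic is rejected, too high and a slow service can still tie up most of the workers.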