Company
Date Published
Author
Sean Lynch
Word count
1942
Language
English
Hacker News points
None

Summary

A death spiral occurs when rising concurrency overloads a system, slowing it down or leaving it unresponsive. It can happen in single-node systems, where the load balancer cannot absorb a sudden surge of requests, and in distributed systems, where requests spawned by one node can overwhelm other nodes. Several measures help prevent death spirals: limiting concurrency close to the client, using job queues, avoiding loops in the call graph, and marking servers dead or limiting outbound concurrency per destination server. By understanding these causes and following these design guidelines, developers can build distributed services that remain reliable under real-world conditions.
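
One of the measures named above, limiting outbound concurrency per destination server, might look roughly like the following. This is a minimal Python sketch for illustration only and is not taken from the original article: the class `PerDestinationLimiter`, its `max_in_flight` parameter, the `call` method, the `slow_request` helper, and the destination name `server-a` are all hypothetical. It also assumes that excess requests are rejected immediately rather than queued, which is one possible policy among several.

```python
import asyncio
from collections import defaultdict


class PerDestinationLimiter:
    """Caps in-flight calls per destination and rejects the excess (hypothetical helper)."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        # destination name -> number of requests currently in flight
        self.in_flight = defaultdict(int)

    async def call(self, destination, request_fn):
        # Assumption: fail fast instead of waiting, so that blocked callers
        # do not pile up and add to the concurrency feeding the spiral.
        if self.in_flight[destination] >= self.max_in_flight:
            raise RuntimeError(f"too many in-flight requests to {destination}")
        self.in_flight[destination] += 1
        try:
            return await request_fn()
        finally:
            # Always release the slot, even if the request raised.
            self.in_flight[destination] -= 1


async def demo():
    limiter = PerDestinationLimiter(max_in_flight=2)

    async def slow_request():
        # Stand-in for a slow downstream server.
        await asyncio.sleep(0.05)
        return "ok"

    # Fire five concurrent requests at the same destination.
    results = await asyncio.gather(
        *(limiter.call("server-a", slow_request) for _ in range(5)),
        return_exceptions=True,
    )
    accepted = sum(1 for r in results if r == "ok")
    rejected = sum(1 for r in results if isinstance(r, RuntimeError))
    return accepted, rejected


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch a slow destination can hold at most `max_in_flight` of the caller's requests at once, so one struggling server cannot absorb all of the caller's capacity. A real client would also need a policy for the rejected requests, such as returning an error upstream or retrying with backoff.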