
Lazarus: Twilio’s Cloud Scale Automated Microservice Remediation System

Blog post from Twilio

Post Details
Company: Twilio
Date Published:
Author: Sushil Prasad
Word Count: 3,740
Company Posts That Month: 17
Language: English
Hacker News Points: -
Post Removed: No
Summary

Lazarus, a command-and-control cluster automation system built by Twilio, automates frequent operational tasks to improve scalability and efficiency. It addresses the challenge of running thousands of microservices in a large-scale distributed system by automatically remediating failed hosts, failed services, and other issues. Lazarus analyzes incoming events and triggers remediation workflows, minimizing both false positives and false negatives so that only complex failures are escalated to on-call engineers. The system works with existing monitoring tools such as Nagios and Datadog and provides a flexible configuration management framework, a notification engine, and auditing and reporting features. Lazarus has been deployed for over two years, and 90% of instances in Twilio's cloud infrastructure run with Lazarus remediations enabled, improving resilience and reducing engineering workload.
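The summary describes a loop of analyzing monitoring events, triggering a remediation workflow, and escalating only what automation cannot fix. The Python sketch below illustrates that general pattern under stated assumptions; it is not Twilio's implementation. The `Event` and `Dispatcher` names, the check names, and the three-consecutive-failures debounce used to filter out transient alerts are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    host: str
    check: str      # hypothetical check name, e.g. one reported by Nagios or Datadog
    failing: bool


# A remediation workflow takes a host name and reports whether the host recovered.
Remediation = Callable[[str], bool]


@dataclass
class Dispatcher:
    remediations: Dict[str, Remediation]
    threshold: int = 3  # hypothetical: consecutive failures required before acting
    failures: Dict[Tuple[str, str], int] = field(default_factory=dict)
    audit: List[str] = field(default_factory=list)
    escalations: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> str:
        key = (event.host, event.check)
        if not event.failing:
            # A passing check resets the counter, so a one-off blip never triggers action.
            self.failures.pop(key, None)
            return "ok"
        count = self.failures.get(key, 0) + 1
        self.failures[key] = count
        if count < self.threshold:
            return "waiting"
        self.failures.pop(key, None)
        workflow = self.remediations.get(event.check)
        if workflow is not None and workflow(event.host):
            self.audit.append(f"remediated {event.check} on {event.host}")
            return "remediated"
        # No workflow, or the workflow failed: hand the problem to a human.
        self.escalations.append(f"{event.check} on {event.host}")
        self.audit.append(f"escalated {event.check} on {event.host}")
        return "escalated"


def restart_service(host: str) -> bool:
    # Hypothetical workflow; a real one would call into orchestration tooling.
    return True


dispatcher = Dispatcher(remediations={"service_down": restart_service})
for _ in range(3):
    status = dispatcher.handle(Event("host-1", "service_down", True))
print(status)  # remediated
```

Keeping the escalation path as the fallback for both "no workflow registered" and "workflow did not fix it" mirrors the stated goal that only failures automation cannot resolve reach on-call engineers, while the audit list stands in for the auditing and reporting features.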

