Introducing Runner Replicas: Scalable, Reliable Automation for Modern Ops
Blog post from PagerDuty
Runner Replicas, a new feature in Runbook Automation, lets multiple instances of the same Runner operate together as a distributed, fault-tolerant service. This turns the automation engine from a single point of execution into a resilient, horizontally scalable system, eliminating manual intervention and freeing engineering teams to focus on strategic work. Replicas can be deployed across different hosts, so jobs keep running and performing well even if a host fails or demand spikes. They can also be distributed across regions, providing geographic affinity and execution resilience without complex scheduling. The feature supports both controlled environments and ephemeral infrastructure: replicas can be provisioned manually or left to scale automatically. By reducing operational overhead and the risk of failure, Runner Replicas deliver immediate business value and confidence in large-scale automation, so that the automation layer itself does not become a point of fragility. Runner Replicas are available in Runbook Automation 5.15, in both the SaaS and self-hosted editions; setup instructions are provided in the technical documentation.
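
To make the failover and regional-affinity idea concrete, here is a minimal Python sketch of a replica pool. It is a toy model only: the post does not describe how Runbook Automation routes jobs between replicas, and the names `Replica`, `ReplicaPool`, and `pick`, the round-robin policy, and the region labels are all hypothetical and not part of any PagerDuty API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    """One instance of a Runner (hypothetical model, not a product API)."""
    name: str
    region: str
    healthy: bool = True


@dataclass
class ReplicaPool:
    """A group of replicas that together act as a single logical Runner."""
    replicas: List[Replica] = field(default_factory=list)
    _next: int = 0  # round-robin counter (assumed policy)

    def pick(self, preferred_region: Optional[str] = None) -> Replica:
        # Only healthy replicas are eligible to run a job.
        healthy = [r for r in self.replicas if r.healthy]
        if not healthy:
            raise RuntimeError("no healthy replica available")
        # Prefer replicas in the requested region; otherwise fall back
        # to any healthy replica so the job still runs.
        local = [r for r in healthy if r.region == preferred_region]
        candidates = local or healthy
        chosen = candidates[self._next % len(candidates)]
        self._next += 1
        return chosen
```

In this model, a pool holding `runner-a` and `runner-b` in `us-east` and `runner-c` in `eu-west` alternates between the two `us-east` replicas for jobs that prefer that region. If both `us-east` replicas are marked unhealthy, the next `pick("us-east")` returns `runner-c`, so the job runs elsewhere without anyone rescheduling it by hand.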