Defining and Distributing Security Protocols for Fault Tolerance
Blog post from PagerDuty
PagerDuty's security management strategy prioritizes high availability and reliability by pairing centralized policy management with distributed enforcement, which reduces single points of failure and improves fault tolerance. Their approach includes dynamic local firewalls on each host and point-to-point encryption based on IPsec, allowing efficient, scalable, and secure communication between nodes without relying on vulnerable VPN gateways. By migrating to a Service Oriented Architecture, PagerDuty isolates services from one another to prevent lateral movement and keeps operations secure even if individual servers fail. In addition, their role-based access control system, implemented with Chef and Linux user groups, follows a least-privilege permissions model, and access is managed through version-controlled JSON configurations.
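
To make "centralized policy, distributed enforcement" concrete, here is a minimal Python sketch of how a node might derive its own local firewall rules from one shared policy file. It is not PagerDuty's implementation: the policy format, the role names, the IP addresses, and the `local_rules` function are all assumptions made for illustration. The rules are generated as iptables-style strings and are only printed, never applied.

```python
import json

# Hypothetical central policy (would live in version control):
# which service role may connect to which other role, and on what TCP port.
POLICY_JSON = """
{
  "allow": [
    {"from": "web", "to": "db", "port": 3306},
    {"from": "web", "to": "queue", "port": 5672},
    {"from": "worker", "to": "db", "port": 3306}
  ]
}
"""

# Hypothetical inventory mapping each node address to its service role.
INVENTORY = {
    "10.0.0.1": "web",
    "10.0.0.2": "web",
    "10.0.1.1": "db",
    "10.0.2.1": "queue",
    "10.0.3.1": "worker",
}


def local_rules(node_ip, policy, inventory):
    """Build the inbound rule list that a single node enforces locally."""
    role = inventory[node_ip]
    rules = []
    for entry in policy["allow"]:
        if entry["to"] != role:
            continue  # this policy entry targets a different role
        # Allow every peer whose role matches the permitted source role.
        for peer_ip, peer_role in sorted(inventory.items()):
            if peer_role == entry["from"]:
                rules.append(
                    f"-A INPUT -s {peer_ip} -p tcp --dport {entry['port']} -j ACCEPT"
                )
    # Simplified default deny; a real ruleset would also handle loopback
    # and established connections.
    rules.append("-A INPUT -j DROP")
    return rules


policy = json.loads(POLICY_JSON)
db_rules = local_rules("10.0.1.1", policy, INVENTORY)
web_rules = local_rules("10.0.0.1", policy, INVENTORY)

for rule in db_rules:
    print(rule)
```

Because every node computes its rules from the same policy and inventory, no central gateway sits in the traffic path, and a node that nothing is allowed to reach (here, the `web` role) ends up with only the default deny rule.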