PagerDuty’s Engineering Management Handbook for Healthier Teams and Services
Blog post from PagerDuty
In a recent discussion led by Julian Dunn, PagerDuty engineering managers Leeor Engel and Dileshni Jayasinghe explored strategies for managing real-time, unplanned work and building effective on-call teams, drawing on findings from "The State of Digital Operations." They stressed the importance of fine-tuning alerts and monitoring tools to manage the increased noise generated by incidents, especially as work shifts to remote settings. They also highlighted the need for an organizational culture that supports on-call engineers through ownership, psychological safety, and continuous learning.

Engel and Jayasinghe shared how they maintain team and service health. Their approach addresses excessive work hours and interruptions, sets documentation and policies that let engineers recover after disruptive on-call shifts, and uses dashboards to manage workloads proactively. They also discussed the value of operational reviews and of sharing knowledge through postmortems. To sustain a culture of learning, they suggested having new teams and managers shadow existing ones.

The conversation underscored that a robust on-call muscle is built through empathy, practice, and learning from both past experience and industry resources.