
2023-03-08 Incident: A Deep Dive into Our Incident Response

What's this blog post about?

On March 8, 2023, Datadog experienced its first global outage. Several hundred engineers, working in shifts and coordinating across multiple communication channels, took part in resolving it. This post describes Datadog's incident response process, covering its monitoring systems, how it manages high-severity incidents, how it trains responders, and its blameless culture. The outage also yielded lessons on improving internal response, customer communications, and overall preparedness for future incidents.

Company
Datadog

Date published
June 1, 2023

Author(s)
Laura de Vesine

Word count
3798

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.