The Datadog platform suffered a global outage that affected all services across multiple regions and cost it 60 percent of its compute capacity. The teams' first priority was to restore the platform, which meant working through decisions whose outcomes and downstream effects were not immediately obvious. They recovered the EU1 region by rebooting impacted nodes and progressively recovering clusters until compute capacity was back at 100 percent. Other regions were harder to recover. In US1, for example, instances remained in an "Unregistered" state because of Vault instability, and scaling up to process a backlog of data caused problems with node registration and IP allocation. The teams drew several lessons from the incident, including the importance of strong relationships with partners, of testing for global failure scenarios, and of preparing for sudden capacity increases. With platform-level capabilities restored, they are now working on restoring the Datadog application.