Outage on octopus.com - report and learnings
Blog post from Octopus Deploy
On January 25-26, 2023, Octopus Deploy experienced a significant service disruption affecting customers primarily in the US and EU, caused by issues during a planned DNS and routing provider migration involving Azure Front Door (AFD). The incident, lasting over 37 hours, was triggered by a misbehaving AFD profile, compounded by a network outage in Azure, internal routing problems, and complications from a new Web Application Firewall (WAF). Initial mitigation efforts included creating a CNAME record to redirect traffic through a www subdomain, but the incorrect IP address resolution persisted until Azure made internal changes to the AFD profile. Octopus Deploy has since engaged with Azure to prevent future occurrences, refined its incident response processes, and emphasized the importance of rollback-safe migration steps to minimize downtime and service disruption.