Company:
Date Published:
Author: DevOps Team
Word count: 756
Language: English
Hacker News points: None

Summary

The Redis Cloud service running on Google Cloud Platform suffered a major outage on February 8th, 2016. The outage was caused by a software bug introduced during a weekly upgrade of the service's software. The bug caused multiple nodes in one cluster to fail simultaneously, which led to downtime and required manual recovery. The company apologized for the impact on its customers and is taking immediate steps to prevent similar incidents: suspending automatic weekly upgrades, revisiting DevOps code deployments, and improving tooling to handle DNS request throttling.