On February 18, 2025, Grafana Cloud experienced a 150-minute outage affecting 25% of its services due to a configuration change in their TLS policies, which led to the loss of load balancers. This incident, caused by the company's failure to adhere to its standard testing and deployment practices, did not result in any security breaches or data leaks, but some customers were unable to access services and potentially lost data. Grafana Labs emphasizes its commitment to transparency and learning from mistakes, detailing how the outage was resolved by rolling back the change and recreating affected services. Moving forward, the company plans to enforce stricter deployment controls, including mandatory use of deployment waves and enhanced deletion protections, to prevent similar incidents. They have implemented a CI check to restrict deployments to one wave at a time and are improving change validation through their Crossplane Managed Resources exporter, which is now in beta testing.