Why optimizing for MTTR over MTBF is better for business

Post Details

Company

Grafana Labs

Date Published

July 1, 2020

Author

Tom Wilkie

Word Count

1,237

Company Posts That Month

19

Language

English

Hacker News Points

-

Post removed?

Yes

Source URL

grafana.com/blog/2020/07/01/why-optimizing-for-mttr-over-mtbf-is-better-for-business

Summary

The blog post argues that a Software as a Service (SaaS) business should optimize for Mean Time to Recovery (MTTR) rather than Mean Time Between Failures (MTBF). It contends that frequent releases and embracing instability lead to a more reliable and more responsive product. By continuously deploying minimum viable products and testing in production, teams become practiced at handling failures. This builds a resilient on-call team that is better prepared for outages. The approach also lets teams adapt more quickly to customer needs: frequent updates keep each change small, which reduces the risk of a major disruption. The post suggests that regular deploys keep teams familiar with the codebase, so that even the traditionally risky holiday season can be a stable period for MTTR-focused companies. With tools like Kubernetes and a solid observability strategy, teams can manage incidents effectively and balance their release cadence against on-call load, resulting in a more agile and responsive product development cycle.

Trends Found in this Post

Trend	Mentions in Post	Total Mentions That Month	Posts	Companies	Month-over-Month Change
Kubernetes	1	1,229	132	45	+59%
Observability	1	505	103	31	+6%
