Customers over control: how we measure On-call reliability

Post Details

Company

incident.io

Date Published

May 28, 2026

Author

Mike Fisher

Word Count

2,164

Company Posts That Month

20

Language

English

Hacker News Points

-

Post removed?

No

Source URL

incident.io/blog/customers-over-control

Summary

In this post, Mike Fisher explains how incident.io measures on-call reliability by what customers experience rather than by what the company directly controls. Two functions of the On-call product are treated as critical: alert ingestion and notification delivery. incident.io tracks a Service Level Indicator (SLI) for each, alert ingestion availability and notification delivery latency, against a monthly Service Level Objective (SLO) of 99.99%. Fisher describes how the systems are designed to account for third-party dependencies and user-configured delays so that notifications remain timely and reliable in complex scenarios. The post rejects excusing failures because their cause lies outside the company's direct control, and argues instead that customer outcomes should come first. By embracing that complexity and designing redundancy into both its own systems and those of its customers, incident.io aims to deliver a more reliable customer experience.
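
To make the 99.99% monthly target concrete, here is a minimal Python sketch of how such SLIs and their error budget could be computed. It is an illustration only, not incident.io's implementation: the function names, the 30-day window, the event-ratio definition of each SLI, and the latency threshold parameter are all assumptions, since the summary does not state how the post defines them.

```python
# Hypothetical sketch: none of these names or definitions come from the post.

def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of total failure a time-based SLO tolerates in the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def availability_sli(accepted: int, received: int) -> float:
    """Share of incoming alerts that were successfully ingested."""
    # Assumption: a window with no traffic counts as fully available.
    return 1.0 if received == 0 else accepted / received


def latency_sli(latencies_s: list[float], threshold_s: float) -> float:
    """Share of notifications delivered within threshold_s seconds.

    threshold_s is a placeholder; the summary gives no actual threshold.
    """
    if not latencies_s:
        return 1.0
    on_time = sum(1 for s in latencies_s if s <= threshold_s)
    return on_time / len(latencies_s)


def meets_slo(sli: float, slo: float = 0.9999) -> bool:
    """True when the measured SLI is at or above the objective."""
    return sli >= slo


if __name__ == "__main__":
    print(f"{error_budget_minutes(0.9999):.2f} minutes of budget per 30 days")
    print(meets_slo(availability_sli(999_950, 1_000_000)))
```

Under these assumptions, a 30-day month at 99.99% leaves roughly 4.3 minutes of complete unavailability, or equivalently about 100 failed events per million, before the objective is missed.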

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Kubernetes	1	1,965	371	106	-15%
MCP	1	7,098	726	186	+16%
Observability	1	3,421	707	180	-24%
