On the Brittleness of Dashboards
Blog post from Honeycomb
Dashboards are popular and useful tools for software engineers, but teams often over-rely on them and overlook approaches that fit the task better, especially during incidents. A dashboard is a window into the system: engineers build mental models that let them infer system state from a limited set of metrics. Those models are brittle, because the patterns they depend on break as the system changes. During a high-pressure incident, the sheer volume of data on a dashboard can also overwhelm responders, and the signals they must interpret do not always give clear causal clues, which makes dashboards a poor basis for quick decisions.

Instead, dashboards should be streamlined to highlight high-level signals, much like vital signs in a hospital, which direct attention to a potential problem without diagnosing it. The emphasis should shift to alerts that carry relevant context and information, so engineers can skip extensive data sifting and focus on the immediate response. Dashboards should complement, not replace, deeper contextual understanding and the other investigative tools used in incident response.
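
To make the "vital signs plus contextual alerts" idea concrete, here is a minimal sketch in Python. It is not taken from the post or from any Honeycomb API; the class names, metric names, thresholds, and context fields are all hypothetical. It checks a small set of high-level signals against acceptable ranges and, for each one out of range, emits an alert that already carries the context a responder would otherwise go looking for. The alert says where to look; it does not attempt a diagnosis.

```python
from dataclasses import dataclass, field


@dataclass
class VitalSign:
    """A high-level signal with an acceptable range (all values hypothetical)."""
    name: str
    value: float
    low: float
    high: float

    def out_of_range(self) -> bool:
        return not (self.low <= self.value <= self.high)


@dataclass
class Alert:
    """An alert that carries context so the responder can skip data sifting."""
    signal: str
    summary: str
    context: dict = field(default_factory=dict)


def check_vitals(vitals, context):
    """Return one alert per out-of-range vital sign, with context attached."""
    alerts = []
    for v in vitals:
        if v.out_of_range():
            summary = f"{v.name}={v.value} outside [{v.low}, {v.high}]"
            # Copy the context so each alert owns its own dictionary.
            alerts.append(Alert(v.name, summary, dict(context)))
    return alerts


# Hypothetical readings: the error rate is abnormal, latency is not.
vitals = [
    VitalSign("error_rate_pct", 7.5, 0.0, 1.0),
    VitalSign("p99_latency_ms", 180.0, 0.0, 500.0),
]
# Hypothetical context a responder would want attached to the alert.
context = {
    "last_deploy": "hypothetical-build-123",
    "runbook": "hypothetical runbook name",
}

alerts = check_vitals(vitals, context)
for alert in alerts:
    print(alert.summary, alert.context)
```

In this sketch only the error-rate signal produces an alert, and the responder receives the recent deploy and runbook pointer alongside it rather than having to search a wall of charts for them.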