How we reduced flaky tests using Grafana, Prometheus, Grafana Loki, and Drone CI

Blog post from Grafana Labs

Author: Dimitris Sotirakis
Word count: 1,639
Language: English
Summary

Flaky tests, which pass or fail unpredictably without any change to the code, are a persistent problem in software development, especially in continuous integration (CI). At Grafana Labs, the team tackled the problem by combining Drone CI, a container-native CI tool, with Grafana, Prometheus, and Grafana Loki to monitor these tests. They exported environment variables from Drone CI and turned them into custom metrics through a Prometheus exporter. This let them visualize build data in Grafana and alert on CI failures. They also used Grafana Loki to query Drone logs, and they identified flaky tests by running builds repeatedly and analyzing the success ratios. With that data they could route troubleshooting to the teams that owned the failing tests and raise the overall build success rate, backed by an alert that fires when the rate drops below a set threshold. Eliminating flaky tests entirely is unlikely in a large project. Even so, the added observability and faster response made their CI pipelines more robust and efficient and helped them ship releases on time.
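The summary does not include the team's code. The following is a minimal sketch that assumes a final step in each Drone pipeline reads the environment variables Drone exposes to it (`DRONE_REPO`, `DRONE_BRANCH`, `DRONE_COMMIT_SHA`, `DRONE_BUILD_NUMBER`, `DRONE_BUILD_STATUS`). From those results it computes a build success ratio, renders it as a gauge in the Prometheus text exposition format, checks it against an alert threshold, and lists commits whose repeated builds both passed and failed (the signature of a flaky test). The metric name `ci_build_success_ratio`, the 0.9 threshold, and every helper name are hypothetical illustrations, not the exporter the post describes.

```python
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BuildResult:
    """One finished CI build, as described by Drone's environment variables."""
    repo: str
    branch: str
    commit: str
    number: int
    status: str  # Drone sets DRONE_BUILD_STATUS to "success" or "failure"


def build_from_env(env=None) -> BuildResult:
    """Read the current build from Drone's environment (os.environ by default)."""
    env = os.environ if env is None else env
    return BuildResult(
        repo=env.get("DRONE_REPO", "unknown"),
        branch=env.get("DRONE_BRANCH", "unknown"),
        commit=env.get("DRONE_COMMIT_SHA", "unknown"),
        number=int(env.get("DRONE_BUILD_NUMBER", "0")),
        status=env.get("DRONE_BUILD_STATUS", "unknown"),
    )


def success_ratio(builds) -> float:
    """Fraction of builds that succeeded; 0.0 for an empty list."""
    builds = list(builds)
    if not builds:
        return 0.0
    return sum(b.status == "success" for b in builds) / len(builds)


def flaky_commits(builds) -> list:
    """Commits whose repeated builds both passed and failed: flaky-test candidates."""
    outcomes = defaultdict(set)
    for b in builds:
        outcomes[b.commit].add(b.status)
    return sorted(c for c, seen in outcomes.items() if {"success", "failure"} <= seen)


def below_threshold(builds, threshold: float = 0.9) -> bool:
    """Alert condition: the success ratio fell under a (hypothetical) threshold."""
    return success_ratio(builds) < threshold


def to_exposition(builds) -> str:
    """Render per-repo/branch success ratios in Prometheus text exposition format.

    The metric name is hypothetical. Label values are not escaped, so this assumes
    repo and branch names contain no quotes, backslashes, or newlines.
    """
    groups = defaultdict(list)
    for b in builds:
        groups[(b.repo, b.branch)].append(b)
    lines = [
        "# HELP ci_build_success_ratio Fraction of recent CI builds that succeeded.",
        "# TYPE ci_build_success_ratio gauge",
    ]
    for repo, branch in sorted(groups):
        ratio = success_ratio(groups[(repo, branch)])
        lines.append(f'ci_build_success_ratio{{repo="{repo}",branch="{branch}"}} {ratio:.3f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Five hypothetical reruns of the same commit: four passes and one failure.
    runs = [BuildResult("grafana/grafana", "main", "abc123", n, "success") for n in range(1, 5)]
    runs.append(BuildResult("grafana/grafana", "main", "abc123", 5, "failure"))
    print(to_exposition(runs), end="")
    print("flaky commits:", flaky_commits(runs))
    print("alert:", below_threshold(runs))
```

Because the same commit both passed and failed, that commit is flagged as flaky, and the 0.8 ratio trips the 0.9 threshold. A production setup would more likely serve these metrics over HTTP with an official Prometheus client library instead of hand-formatted strings, and it would express the threshold as a Prometheus alerting rule rather than in Python.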