Ask Miss O11y: Load Testing With Fidelity
Blog post from Honeycomb
Running load tests in production can be valuable, provided you plan carefully and limit the risk. Start with a clear hypothesis, keep an emergency stop within reach, and confirm that enough error budget remains to absorb any problems the test causes. Knowing your capacity at both peak and off-peak traffic helps prevent service failures during high-demand periods. Service-level objectives (SLOs) should reflect what real users experience, not whether load-test requests succeed, so real user traffic must take priority. That makes it crucial to distinguish artificial traffic from real user traffic in observability data, for example with an HTTP header or a telemetry attribute. Tools like Honeycomb can exclude load-test data from SLO calculations, which gives a more accurate picture of service health. Managed well, load tests in production can improve both system performance and the accuracy of operational dashboards without incurring excessive observability costs.
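To make the traffic-tagging idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Honeycomb's API: the header name `X-Load-Test`, the `Event` shape, the 500 ms latency threshold, and all function names are hypothetical. It marks each request as artificial or real based on a header, records that flag as a telemetry attribute, computes the SLI over real user events only, and reports how much error budget is left before a test starts.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional

# Hypothetical header name; any header the load generator sets consistently works.
LOAD_TEST_HEADER = "x-load-test"


@dataclass(frozen=True)
class Event:
    """One request's telemetry, reduced to the fields this sketch needs."""
    status_code: int
    duration_ms: float
    is_load_test: bool  # attribute that lets queries and SLOs filter artificial traffic


def is_load_test(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the load-test marker header."""
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get(LOAD_TEST_HEADER, "").lower() in ("1", "true")


def make_event(headers: Mapping[str, str], status_code: int, duration_ms: float) -> Event:
    """Build an event and stamp it with the load-test attribute."""
    return Event(status_code, duration_ms, is_load_test(headers))


def sli_good_ratio(events: Iterable[Event], max_duration_ms: float = 500.0) -> Optional[float]:
    """Fraction of real-user events that were good; load-test events are excluded.

    Returns None when there are no real-user events to judge.
    """
    real = [e for e in events if not e.is_load_test]
    if not real:
        return None
    good = sum(
        1 for e in real
        if e.status_code < 500 and e.duration_ms <= max_duration_ms
    )
    return good / len(real)


def error_budget_remaining(good_ratio: float, slo_target: float) -> float:
    """Share of the error budget still unspent: 1.0 is untouched, <= 0 is exhausted."""
    allowed_bad = 1.0 - slo_target
    observed_bad = 1.0 - good_ratio
    return 1.0 - observed_bad / allowed_bad


if __name__ == "__main__":
    events = [
        make_event({}, 200, 120.0),
        make_event({}, 200, 80.0),
        make_event({}, 200, 95.0),
        make_event({}, 500, 40.0),                      # real user saw an error
        make_event({"X-Load-Test": "true"}, 503, 2000.0),  # artificial, ignored by the SLI
        make_event({"X-Load-Test": "1"}, 200, 30.0),       # artificial, ignored by the SLI
    ]
    ratio = sli_good_ratio(events)
    print("real-user good ratio:", ratio)
    print("error budget remaining:", error_budget_remaining(ratio, slo_target=0.5))
```

The failing load-test request does not lower the SLI, and the fast one does not inflate it, so the number tracks real user experience either way. A pre-test gate could refuse to start a run when `error_budget_remaining` is near zero.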