
Olly for SREs: 3 ways I actually use it in production

Blog post from Coralogix

Post Details
Company
Date Published
Author
Coralogix Team
Word Count
1,255
Language
English
Summary

In this practical walkthrough of an autonomous AI agent, the author describes how Olly helps investigate production incidents: it evaluates logs, metrics, traces, and alert context, produces a structured summary of the issue, and guides the user to the root cause within minutes. The first step is deciding whether an alert reflects a genuine issue or a transient anomaly; Olly helps by establishing when behavior deviated from its baseline and by correlating error messages with metric spikes. Once the change is understood, Olly assesses whether the service in question is the origin of the degradation or is merely absorbing impact from elsewhere, which informs the escalation decision. Olly also supports structured hypothesis testing: it weighs the evidence for each hypothesis, moves from metrics to logs to code, and identifies the root cause along with suggested fixes. According to the author, this compresses the investigation steps and saves significant time during incident management in production.
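
To make the first triage step concrete, here is a minimal Python sketch of separating a transient blip from a sustained deviation. It is not Olly's implementation, which the post does not describe. The function name `classify_alert`, the z-score rule, and the `z_threshold` and `min_sustained` parameters are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def classify_alert(baseline, recent, z_threshold=3.0, min_sustained=3):
    """Label a metric window as 'sustained', 'transient', or 'normal'.

    Hypothetical helper: compares recent samples against a baseline window
    using a simple z-score. Thresholds are illustrative, not recommendations.
    """
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline

    # Flag each recent sample that sits far above the baseline mean.
    flags = [(x - mu) / sigma > z_threshold for x in recent]

    # Find the longest run of consecutive flagged samples.
    longest = run = 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        longest = max(longest, run)

    if longest >= min_sustained:
        return "sustained"
    if longest > 0:
        return "transient"
    return "normal"


# Example: p95 latency in ms, one sample per minute (made-up numbers).
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(classify_alert(baseline, [101, 250, 99, 100]))   # transient
print(classify_alert(baseline, [240, 260, 255, 250]))  # sustained
```

A single spike that recovers on the next sample is labeled transient, while several consecutive anomalous samples count as a sustained deviation worth investigating. A real system would likely also account for seasonality and for drops as well as spikes.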