A 4-Month Bug Fixed in <10 Minutes with Olly
Blog post from Coralogix
Olly, an autonomous observability agent, resolved a persistent latency issue in a NotificationService in under ten minutes. The issue had persisted for four months in a complex, interconnected system because its cause was indirect: it was rooted in a transitive dependency on a Postgres database, so it was not evident from traditional telemetry analysis. Olly used the Coralogix data layer to map service dependencies and found a correlation between CPU spikes in an RDS database and a Lambda function running inefficient queries against unindexed tables. By following these telemetry "breadcrumbs," Olly pinpointed the root cause and recommended optimizations. The case shows how the agent can navigate and diagnose complex distributed systems, an advantage in environments where knowledge is fragmented across teams.
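To make the idea concrete, here is a minimal Python sketch of the two steps the summary describes: walking a dependency graph to find what a service transitively depends on, and checking whether a database's CPU series moves together with another caller's activity. It is an illustration under stated assumptions, not Olly's or Coralogix's actual implementation. The graph shape, the service names other than NotificationService, the metric values, and the helper functions are all hypothetical.

```python
from collections import deque
from math import sqrt

# Hypothetical dependency edges (service -> direct dependencies).
# Only "NotificationService" comes from the post; the rest is invented.
DEPENDENCIES = {
    "NotificationService": ["UserPreferencesService"],
    "UserPreferencesService": ["postgres-rds"],
    "ReportLambda": ["postgres-rds"],
    "postgres-rds": [],
}


def transitive_dependencies(graph, start):
    """Return everything `start` depends on, directly or indirectly (BFS order)."""
    seen = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


def dependents_of(graph, target):
    """Return the services that call `target` directly, sorted by name."""
    return sorted(svc for svc, deps in graph.items() if target in deps)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up samples on a shared time grid: database CPU % and Lambda invocations.
rds_cpu_percent = [20, 22, 85, 21, 88, 23]
lambda_invocations = [0, 0, 3, 0, 4, 0]

if __name__ == "__main__":
    print(transitive_dependencies(DEPENDENCIES, "NotificationService"))
    print(dependents_of(DEPENDENCIES, "postgres-rds"))
    print(round(pearson(rds_cpu_percent, lambda_invocations), 2))
```

In this toy data the database is reachable from NotificationService only through an intermediate service, which is why looking at the service's own telemetry would not point at it. The second caller of the same database is the one whose activity lines up with the CPU spikes. A strong correlation is only a lead for further investigation, not proof of cause.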