How we cut our NLQ agent debugging time from hours to minutes with LLM Observability
Blog post from Datadog
Datadog's Cloud Cost Management (CCM) team built a natural language query (NLQ) agent that translates plain-English questions into valid Datadog metrics queries, so FinOps and engineering users can explore costs without writing query syntax by hand. Because the agent is non-conversational, correctness was the priority: the team ran user tests and built a reference dataset from real user prompts. To cope with the nondeterminism of large language models, the team instrumented the agent with LLM Observability and added component-level evaluators for parsing, metric selection, roll-up, group-bys, and filters, so a failure can be traced to the specific part of the query that went wrong. Automated evaluations and trace-level inspection cut the time spent on testing and debugging by a factor of 20. Datadog's distributed tracing also made it straightforward to integrate with existing systems, compare models objectively, and keep improving the NLQ agent.
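To make the idea of component-level evaluators concrete, here is a minimal sketch in plain Python. It is not Datadog's implementation and does not use the LLM Observability SDK: the `ParsedQuery` structure, the `evaluate_components` and `pass_rates` helpers, and the metric name `example.cost.total` are all hypothetical, chosen only to show how a generated query can be scored against a reference one component at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical structured form of a metrics query (not Datadog's schema)."""
    metric: str
    rollup: str
    group_by: frozenset = frozenset()
    filters: frozenset = frozenset()


def evaluate_components(predicted, expected):
    """Score each query component separately.

    `predicted` is None when the agent's output could not be parsed at all.
    """
    if predicted is None:
        return {"parsing": False, "metric": False, "rollup": False,
                "group_by": False, "filters": False}
    return {
        "parsing": True,
        "metric": predicted.metric == expected.metric,
        "rollup": predicted.rollup == expected.rollup,
        "group_by": predicted.group_by == expected.group_by,
        "filters": predicted.filters == expected.filters,
    }


def pass_rates(results):
    """Aggregate per-component pass rates over a list of evaluation dicts."""
    if not results:
        return {}
    return {name: sum(r[name] for r in results) / len(results)
            for name in results[0]}


# One reference-dataset entry (assumed values) and one agent answer
# that picks the wrong roll-up but gets everything else right.
expected = ParsedQuery("example.cost.total", "sum",
                       frozenset({"service"}), frozenset({"env:prod"}))
predicted = ParsedQuery("example.cost.total", "avg",
                        frozenset({"service"}), frozenset({"env:prod"}))

scores = evaluate_components(predicted, expected)
print(scores)
```

Scoring each component on its own, rather than comparing whole query strings, is what lets a failing prompt point directly at the roll-up, the filters, or the metric choice. In a setup like the one described, these per-component results would presumably be attached to the corresponding trace so they can be inspected alongside the model call.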
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Observability | 17 | 2,104 | 424 | 141 | -21% |
| LLM | 13 | 3,836 | 662 | 193 | +2% |
| Harness engineering | 1 | 80 | 60 | 39 | +29% |