
How we cut our NLQ agent debugging time from hours to minutes with LLM Observability

Blog post from Datadog

Post Details

Company: Datadog
Date Published: -
Author: Florent Le Gall, Alex Guo, Will Potts
Word Count: 1,379
Language: English
Hacker News Points: -
Summary

Datadog's Cloud Cost Management (CCM) team built a natural language query (NLQ) agent that translates plain-English questions into valid Datadog metrics queries, letting FinOps and engineering users explore costs without learning the query syntax. Because the agent is non-conversational, each answer has to be right the first time, so the team focused on correctness: they ran user testing and assembled a reference dataset from real user prompts. To cope with the nondeterministic behavior of large language models, they instrumented the agent with Datadog LLM Observability and added component-level evaluators for parsing, metric selection, roll-up, group-bys, and filters, which let them pinpoint exactly which stage of the agent failed rather than debugging a single pass/fail result. Automated evaluations and trace-level inspection cut time spent on testing and debugging by a factor of 20. Building on Datadog's distributed tracing also made integration with existing systems straightforward and enabled objective model comparisons and continuous improvement of the NLQ agent.
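The component-level evaluation idea can be sketched as follows. This is a minimal illustration, not Datadog's actual implementation: the `MetricsQuery` schema, field names, and metric strings are all hypothetical. The point is that each generated query is decomposed into its parts and each part is scored against the reference answer separately, so a failure identifies the stage (metric selection, roll-up, group-bys, or filters) that went wrong instead of producing one opaque pass/fail:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricsQuery:
    """Structured form of a metrics query (hypothetical schema)."""
    metric: str
    rollup: str                           # e.g. "sum", "avg"
    group_bys: frozenset = frozenset()    # e.g. {"service", "team"}
    filters: frozenset = frozenset()      # e.g. {"env:prod"}

def evaluate_components(predicted: MetricsQuery, reference: MetricsQuery) -> dict:
    """Score each query component separately against the reference answer."""
    return {
        "metric": predicted.metric == reference.metric,
        "rollup": predicted.rollup == reference.rollup,
        "group_bys": predicted.group_bys == reference.group_bys,
        "filters": predicted.filters == reference.filters,
    }

def aggregate(results: list) -> dict:
    """Per-component accuracy over a reference dataset of real user prompts."""
    total = len(results)
    return {key: sum(r[key] for r in results) / total for key in results[0]}
```

For example, if the agent picks the right metric, roll-up, and group-bys for a prompt but drops an `env:prod` filter, only the `filters` component scores false, immediately localizing the bug to filter extraction.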