How we cut our NLQ agent debugging time from hours to minutes with LLM Observability
Blog post from Datadog
Datadog's Cloud Cost Management (CCM) team built a natural language query (NLQ) agent that translates plain-English questions into valid Datadog metrics queries, letting FinOps and engineering users explore costs without learning the query syntax.

Because the agent is non-conversational, a user cannot iteratively steer it toward a better answer: each query has to be right the first time, so correctness was the central design goal. To measure it, the team ran user testing and assembled a reference dataset from real user prompts.

To tame the nondeterminism of large language models, the team adopted LLM Observability with component-level evaluators for parsing, metric selection, roll-up, group-bys, and filters. Scoring each component separately pinpoints which stage of the pipeline failed, making debugging and iteration far more precise than judging a query pass/fail as a whole.

Automated evaluations and trace-level inspection cut the time spent on testing and debugging by 20x, from hours to minutes. And because the agent emits standard Datadog distributed traces, the workflow integrated cleanly with existing systems and supports objective, side-by-side model comparisons for continuous improvement of the NLQ agent.
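The post doesn't show the evaluators themselves, but the component-level idea can be sketched in plain Python: parse a generated query into its parts (metric, filters, group-bys, roll-up) and score each part against a reference query from the dataset. The regex grammar and function names below are illustrative assumptions, not Datadog's actual implementation; the real metrics query language is richer than this pattern covers.

```python
import re

# Assumed, simplified grammar for a Datadog-style metrics query, e.g.
#   sum:aws.cost.amortized{service:ec2} by {account}.rollup(sum, 86400)
# Real query syntax supports more (arithmetic, functions, wildcards, etc.).
QUERY_RE = re.compile(
    r"(?P<space_agg>\w+):(?P<metric>[\w.]+)"      # space aggregator + metric name
    r"\{(?P<filters>[^}]*)\}"                      # tag filters
    r"(?:\s+by\s+\{(?P<group_bys>[^}]*)\})?"       # optional group-bys
    r"(?:\.rollup\((?P<rollup>[^)]*)\))?"          # optional roll-up
)

def parse_query(query: str) -> dict:
    """Split a query string into comparable components."""
    m = QUERY_RE.match(query.strip())
    if not m:
        return {}
    parts = m.groupdict()
    return {
        "metric": parts["metric"],
        # Order-insensitive comparison for comma-separated lists:
        "filters": {f.strip() for f in parts["filters"].split(",") if f.strip()},
        "group_bys": {g.strip() for g in (parts["group_bys"] or "").split(",") if g.strip()},
        "rollup": (parts["rollup"] or "").replace(" ", ""),
    }

def evaluate_components(generated: str, reference: str) -> dict:
    """Score each component of a generated query against the reference.

    A per-component pass/fail map shows *which* stage of the agent's
    pipeline (metric selection, filters, group-bys, roll-up) went wrong,
    rather than a single opaque pass/fail for the whole query.
    """
    gen, ref = parse_query(generated), parse_query(reference)
    return {k: gen.get(k) == ref.get(k)
            for k in ("metric", "filters", "group_bys", "rollup")}
```

In the workflow the post describes, scores like these would be attached to each trace (for instance via an LLM Observability custom evaluation) so that a regression in, say, filter selection is visible across an entire test run, not just one example.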