How we cut our NLQ agent debugging time from hours to minutes with LLM Observability
Blog post from Datadog
Datadog's Cloud Cost Management (CCM) team built a natural language query (NLQ) agent that translates plain-English questions into valid Datadog metrics queries, letting FinOps and engineering users explore costs without learning the query syntax.

Because the agent is non-conversational, a user cannot iteratively steer it toward a better answer: each query has to be right the first time, so correctness was the central design goal. To measure it, the team ran user testing and assembled a reference dataset from real user prompts.

To tame the nondeterminism of large language models, the team adopted LLM Observability with component-level evaluators for parsing, metric selection, roll-up, group-bys, and filters. Scoring each component separately pinpoints which stage of the pipeline failed, making debugging and iteration far more precise than judging a query pass/fail as a whole.

Automated evaluations and trace-level inspection cut the time spent on testing and debugging by 20x, from hours to minutes. And because the agent emits standard Datadog distributed traces, the workflow integrated cleanly with existing systems and supports objective, side-by-side model comparisons for continuous improvement of the NLQ agent.
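The post doesn't show the evaluators themselves, but the component-level idea can be sketched in plain Python: parse a generated query into its parts (metric, filters, group-bys, roll-up) and score each part against a reference query from the dataset. The regex grammar and function names below are illustrative assumptions, not Datadog's actual implementation; the real metrics query language is richer than this pattern covers.

```python
import re

# Assumed, simplified grammar for a Datadog-style metrics query, e.g.
#   sum:aws.cost.amortized{service:ec2} by {account}.rollup(sum, 86400)
# Real query syntax supports more (arithmetic, functions, wildcards, etc.).
QUERY_RE = re.compile(
    r"(?P<space_agg>\w+):(?P<metric>[\w.]+)"      # space aggregator + metric name
    r"\{(?P<filters>[^}]*)\}"                      # tag filters
    r"(?:\s+by\s+\{(?P<group_bys>[^}]*)\})?"       # optional group-bys
    r"(?:\.rollup\((?P<rollup>[^)]*)\))?"          # optional roll-up
)

def parse_query(query: str) -> dict:
    """Split a query string into comparable components."""
    m = QUERY_RE.match(query.strip())
    if not m:
        return {}
    parts = m.groupdict()
    return {
        "metric": parts["metric"],
        # Order-insensitive comparison for comma-separated lists:
        "filters": {f.strip() for f in parts["filters"].split(",") if f.strip()},
        "group_bys": {g.strip() for g in (parts["group_bys"] or "").split(",") if g.strip()},
        "rollup": (parts["rollup"] or "").replace(" ", ""),
    }

def evaluate_components(generated: str, reference: str) -> dict:
    """Score each component of a generated query against the reference.

    A per-component pass/fail map shows *which* stage of the agent's
    pipeline (metric selection, filters, group-bys, roll-up) went wrong,
    rather than a single opaque pass/fail for the whole query.
    """
    gen, ref = parse_query(generated), parse_query(reference)
    return {k: gen.get(k) == ref.get(k)
            for k in ("metric", "filters", "group_bys", "rollup")}
```

In the workflow the post describes, scores like these would be attached to each trace (for instance via an LLM Observability custom evaluation) so that a regression in, say, filter selection is visible across an entire test run, not just one example.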