Evaluate LLMs and LLM applications for accuracy with NVIDIA NeMo Evaluator and Datadog LLM Observability

Post Details

Company

Datadog

Date Published

April 23, 2025

Author

Shri Subramanian, Barry Eom

Word Count

582

Company Posts That Month

34

Language

English

Hacker News Points

-

Source URL

www.datadoghq.com/blog/nvidia-nemo-evaluator

Summary

NVIDIA NeMo Evaluator is a microservice with an easy-to-use API that simplifies the end-to-end evaluation of generative AI applications, including retrieval-augmented generation (RAG) and agentic AI. It supports evaluation across a wide range of custom tasks and domains, including reasoning, coding, retrieval, and instruction following. Developers can automatically evaluate their models against academic benchmarks or custom datasets and score them with standard metrics such as accuracy, ROUGE, and BLEU, or with an LLM-as-a-judge. Datadog LLM Observability can be integrated with NeMo Evaluator to provide end-to-end visibility into the health and performance of LLM applications. The integration traces requests across RAG components, model inference, and evaluation steps; collects and visualizes key model metrics and metadata; and links model quality metrics directly to the corresponding LLM request traces for unified analysis.
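
To make the metrics named in the summary concrete, here is a minimal Python sketch of exact-match accuracy and a simplified ROUGE-1 F1 score, with per-response scores keyed by a trace identifier so they could later be joined to request traces. It is an illustration only: it does not use the NeMo Evaluator API or the Datadog SDK, the function names and the `trace_ids` values are hypothetical, and the ROUGE variant assumes plain whitespace tokenization without stemming.

```python
from collections import Counter


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference (trimmed, case-insensitive)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(predictions)


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1: a simplified ROUGE-1 (whitespace tokens, no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def attach_scores(trace_ids, predictions, references):
    """Key per-response scores by a (hypothetical) trace ID for later joining."""
    return {
        tid: {
            "exact_match": float(p.strip().lower() == r.strip().lower()),
            "rouge1_f1": rouge1_f1(p, r),
        }
        for tid, p, r in zip(trace_ids, predictions, references)
    }
```

Keying scores by trace ID mirrors the idea of linking quality metrics to request traces; in a real deployment the evaluation service and the observability platform would handle that association.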

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	18	4,226	639	179	-13%
Observability	8	2,122	444	131	+14%
RAG	3	1,623	226	80	+8%
AI Agents	1	2,161	387	128	0%
AI Guardrails	1	220	86	29	-28%
Real-time	1	6,887	1,132	212	+49%