
Evaluating generative AI performance: When your data is “anything”

Blog post from New Relic

Post Details
Company: New Relic
Date Published: -
Author: Tal Reisfeld, Senior Data Scientist
Word Count: 1,832
Language: English
Hacker News Points: -
Summary

New Relic AI was developed to enhance observability by letting users interact with telemetry data in natural language, reducing dependence on proficiency in New Relic Query Language (NRQL). The AI aims to provide accurate insights, identify system anomalies, and streamline tech-stack analysis while addressing challenges such as performance validation and AI hallucinations. The team implemented strategies including a modular design for systematic evaluation, syntax validation for generated NRQL queries, and retrieval-augmented generation (RAG) to mitigate hallucinations and improve response accuracy. The system promotes transparency by showing users the context and intermediate steps of the Q&A process, enabling them to evaluate the AI's answers. User feedback is integral to improving New Relic AI, with input collected on an ongoing basis to refine and enhance its performance.
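To illustrate the syntax-validation idea mentioned above, here is a minimal sketch of gating an AI-generated query behind a structural check before it is ever executed. The check below is an illustrative assumption for this post, not New Relic's actual validator, which would parse the full NRQL grammar.

```python
import re

# Hypothetical, simplified check: a plausible NRQL query should at least
# have the shape "SELECT ... FROM <event type>". Anything else (e.g. a
# hallucinated SQL statement) is rejected before execution.
NRQL_PATTERN = re.compile(
    r"^\s*SELECT\s+.+?\s+FROM\s+\w+",
    re.IGNORECASE | re.DOTALL,
)

def looks_like_valid_nrql(query: str) -> bool:
    """Return True if the query has the basic SELECT ... FROM shape."""
    return bool(NRQL_PATTERN.match(query))

# A well-formed query passes; malformed or hallucinated output does not.
print(looks_like_valid_nrql("SELECT count(*) FROM Transaction SINCE 1 hour ago"))  # True
print(looks_like_valid_nrql("DROP TABLE Transaction"))  # False
```

In a real pipeline, a failed check would typically trigger a retry or a regeneration step rather than surfacing a broken query to the user.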