
LLMs are facing a QA crisis: Here’s how we could solve it

Blog post from LogRocket

Post Details
Company: LogRocket
Date Published: -
Author: Rosario De Chiara
Word Count: 2,095
Language: -
Hacker News Points: -
Summary

The rise of large language models (LLMs) has fundamentally challenged traditional quality assurance (QA) practices because LLMs are non-deterministic, driven by probabilistic AI rather than deterministic code. Conventional testing frameworks assume predictable inputs and outputs, so they struggle with the dynamic and varied responses LLMs generate. This paradigm shift calls for new approaches and tools to ensure software reliability, since LLMs can mislead users and amplify biases in ways that traditional bugs do not.

A case study involving a kiosk that used LLMs to interact with users surfaced issues such as hallucinations and incorrect information, underscoring the need for new QA strategies. These include golden test sets, manual evaluations, A/B testing, and new tooling to manage LLM behavior while balancing creativity against accuracy.

The QA process for LLMs requires a blend of quantitative and qualitative evaluation, and it means treating prompts like code: versioning, reviewing, and testing them rigorously. As LLMs become integrated into critical systems, a robust QA framework that includes logging and monitoring will be essential to maintain user trust and ensure system reliability.
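The golden-test-set idea mentioned in the summary can be sketched as follows. This is a minimal illustration, not the post's actual implementation: `fake_llm` is a hypothetical stand-in for a real model call, and keyword containment is one simple scoring choice among many (the post itself does not prescribe a specific check).

```python
# Sketch of a "golden test set" check for LLM output.
# Because LLM responses vary in phrasing, the check below looks for
# required keywords rather than exact string equality.

GOLDEN_SET = [
    # (prompt, keywords the answer must contain) -- illustrative entries
    ("What are the kiosk's opening hours?", ["9", "17"]),
    ("Where is the information desk?", ["ground floor"]),
]

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a non-deterministic model call.
    canned = {
        "What are the kiosk's opening hours?": "We are open from 9 to 17 daily.",
        "Where is the information desk?": "The desk is on the ground floor.",
    }
    return canned.get(prompt, "I'm not sure.")

def passes(answer: str, keywords: list[str]) -> bool:
    # Loose containment check: tolerates varied phrasing.
    return all(k.lower() in answer.lower() for k in keywords)

def run_golden_set(model) -> float:
    # Run every golden prompt through the model; return the pass rate.
    results = [passes(model(prompt), kws) for prompt, kws in GOLDEN_SET]
    return sum(results) / len(results)

print(run_golden_set(fake_llm))
```

A real pipeline would track this pass rate across prompt versions, which is where the post's "treat prompts like code" advice (versioning, review, regression testing) comes in.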