Company:
Date Published:
Author: Hui Wen Goh, Jay Zhang, Ulyana Tkachenko, Jonas Mueller
Word count: 1890
Language: English
Hacker News points: None

Summary

The Trustworthy Language Model (TLM) improves the accuracy of responses from base large language models (LLMs) such as GPT-4, GPT-3.5, and Claude 3 by scoring the trustworthiness of candidate responses, reducing errors without altering the prompts or relying on any additional models. Across benchmark datasets including TriviaQA, ARC, SVAMP, and GSM8k, TLM significantly lowers error rates relative to the base models. It works by sampling multiple candidate responses, scoring each one's trustworthiness, and returning the most reliable candidate. Because this extra sampling and scoring adds runtime overhead, TLM is better suited to data-processing workloads than to latency-sensitive applications.
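
The selection procedure described above amounts to a best-of-k loop: generate several candidates, score each, keep the highest-scoring one. Below is a minimal sketch of that loop, assuming a hypothetical `generate_response` function that wraps the base LLM and a hypothetical `score_trustworthiness` function standing in for TLM's trustworthiness scoring; neither name is the actual Cleanlab API, and the real system may differ in detail.

```python
from typing import Callable, Tuple


def select_most_trustworthy(
    prompt: str,
    generate_response: Callable[[str], str],             # hypothetical: samples one response from the base LLM
    score_trustworthiness: Callable[[str, str], float],  # hypothetical: returns a trustworthiness score in [0, 1]
    num_candidates: int = 4,
) -> Tuple[str, float]:
    """Sample several candidate responses, score each, and return the most trustworthy one."""
    best_response, best_score = "", float("-inf")
    for _ in range(num_candidates):
        candidate = generate_response(prompt)             # sample one candidate response
        score = score_trustworthiness(prompt, candidate)  # score it without modifying the prompt
        if score > best_score:
            best_response, best_score = candidate, score
    return best_response, best_score
```

The returned score can also serve as a flag: responses whose best trustworthiness score falls below a chosen threshold can be routed to human review rather than used directly.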