Improving data quality with confidence
Blog post from Refuel
Leveraging large language models (LLMs) for data labeling requires accurately estimating the model's confidence in its own outputs, so that low-confidence labels can be rejected and ensemble strategies optimized. Comparing several confidence-estimation techniques, the study found that token-level generation probabilities ("logprobs") are the most accurate method, while prompting the LLM to produce its own confidence score is notably unreliable. The experiments, run with the open-source Autolabel library across a range of NLP labeling tasks, showed that token probabilities achieved the highest AUROC scores on every dataset tested. The study underscores the importance of confidence estimation for improving labeling accuracy and points to future gains from fine-tuning verifier LLMs. For models that do not natively expose logprobs, the library can compute confidence scores by integrating with Refuel's Verifier LLM.
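To make the idea concrete, here is a minimal sketch (not Autolabel's actual implementation) of how per-token logprobs can be collapsed into a single label-level confidence score and how AUROC can measure whether that score separates correct from incorrect labels. The example data, the geometric-mean aggregation, and the 0.8 rejection threshold are all illustrative assumptions.

```python
# Sketch: label-level confidence from token logprobs, evaluated with AUROC.
# Names, data, and the aggregation choice are illustrative, not Autolabel's API.
import math
from sklearn.metrics import roc_auc_score

def label_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean of logprobs)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical labeled examples: logprobs of the generated label's tokens,
# plus whether the label matched ground truth.
examples = [
    {"token_logprobs": [-0.02, -0.05], "correct": True},
    {"token_logprobs": [-0.90, -1.40, -0.30], "correct": False},
    {"token_logprobs": [-0.10, -0.04, -0.07], "correct": True},
    {"token_logprobs": [-2.10, -0.80], "correct": False},
]

confidences = [label_confidence(e["token_logprobs"]) for e in examples]
is_correct = [int(e["correct"]) for e in examples]

# AUROC measures how well confidence ranks correct labels above incorrect ones
# (1.0 = perfect separation, 0.5 = no better than chance).
print("AUROC:", roc_auc_score(is_correct, confidences))

# Rejecting labels below a confidence threshold trades coverage for accuracy.
threshold = 0.8  # illustrative cutoff
kept = [e for e, c in zip(examples, confidences) if c >= threshold]
print(f"Kept {len(kept)}/{len(examples)} labels above threshold {threshold}")
```

The geometric mean is one common way to aggregate token probabilities; other choices (minimum token probability, average probability) would plug into the same evaluation loop.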