Use Cleanlab to Improve LLMs: Find Errors in Human Feedback in the Anthropic RLHF Dataset

Post Details

Company

Cleanlab

Date Published

April 11, 2023

Author

Chris Mauck, Jonas Mueller

Word Count

351

Language

English

Hacker News Points

-

Source URL

cleanlab.ai/blog/csa/csa-1

Summary

Cleanlab Studio is an AI platform for detecting and fixing issues in data, including the human feedback used to train Large Language Models (LLMs) such as Anthropic's Claude. Running Anthropic's `hh-rlhf` dataset from Hugging Face Datasets through Cleanlab Studio revealed a variety of problems. In some examples, the rejected output is better than the chosen one because of human error. In others, the chosen output merely describes a subject without answering the query. Issues like these can undermine the reliability of LLMs trained via Reinforcement Learning from Human Feedback (RLHF). By running their datasets through Cleanlab Studio, organizations can identify and fix such problems and obtain more reliable LLMs.
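The summary does not describe how Cleanlab Studio detects these errors, so the following is only a minimal sketch of the general idea, not Cleanlab's actual method. It assumes you already have, for each preference pair, a model's out-of-sample estimate of the probability that the "chosen" response really is better than the "rejected" one (for example from a cross-validated reward model). Since every pair's given label is "chosen wins", that probability acts as a self-confidence score, and pairs where it is low are flagged for review. The function name `flag_suspect_pairs`, the `threshold` parameter, and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def flag_suspect_pairs(p_chosen_better: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of preference pairs whose human label looks doubtful.

    Hypothetical helper. p_chosen_better[i] is an assumed model estimate
    (ideally out-of-sample) of the probability that pair i's "chosen"
    response really is better than its "rejected" response. Because the
    annotator's label is always "chosen wins", this probability serves as
    the pair's self-confidence score: low values suggest an annotation error.
    """
    for p in p_chosen_better:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Keep pairs where the model disagrees with the annotator's choice.
    suspects = [i for i, p in enumerate(p_chosen_better) if p < threshold]
    # Put the most suspicious (lowest self-confidence) pairs first for manual review.
    return sorted(suspects, key=lambda i: p_chosen_better[i])


if __name__ == "__main__":
    # Hypothetical probabilities for four pairs; not real hh-rlhf results.
    probs = [0.92, 0.15, 0.61, 0.30]
    print(flag_suspect_pairs(probs))  # → [1, 3]
```

Ranking flagged pairs by score, rather than just filtering them, lets reviewers start with the likeliest mistakes. The open-source `cleanlab` package offers a comparable per-example score through `cleanlab.rank.get_label_quality_scores`, whose default method is self-confidence, though a real pipeline would also need well-calibrated, out-of-sample probabilities for the scores to be meaningful.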