Company
Date Published
Author
The Quill
Word count
222
Language
English
Hacker News points
None

Summary

SLiC-HF (Sequence Likelihood Calibration with Human Feedback) is introduced as a way to improve language models from human preference data, and it proves effective on the TL;DR summarization task. The method is a simpler, more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF), and it can reuse human feedback collected for other models, much like off-policy, offline RL data. The paper positions SLiC-HF as a competitive alternative to PPO-based RLHF implementations, being easier to implement and tune and cheaper to run. It highlights the approach's advantages, notably the combination of a calibration loss with a cross-entropy regularizer for fine-tuning models such as T5, and reports performance through metrics such as ROUGE, perplexity, and win rate, demonstrating its effectiveness in automatic evaluations.
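
As a rough illustration of the calibration-plus-regularization idea mentioned above, the sketch below shows one plausible form of a sequence-likelihood calibration loss: a margin term that ranks the human-preferred summary above the dispreferred one, plus a cross-entropy regularizer toward reference targets. The function name, the margin `delta`, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def slic_hf_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    """Hedged sketch of a SLiC-style calibration objective.

    logp_pos: sequence log-likelihood of the human-preferred summary
    logp_neg: sequence log-likelihood of the dispreferred summary
    logp_ref: sequence log-likelihood of the reference (SFT) target,
              used to regularize the model toward its fine-tuned behaviour
    delta, lam: margin and regularization weight (illustrative values)
    """
    # Rank-calibration term: require the preferred sequence to be at least
    # `delta` more likely (in log space) than the dispreferred one.
    calibration = torch.clamp(delta - logp_pos + logp_neg, min=0.0)
    # Cross-entropy regularizer: the negative log-likelihood of the reference.
    regularizer = -logp_ref
    return (calibration + lam * regularizer).mean()


# Toy usage with a batch of three examples (values are illustrative).
logp_pos = torch.tensor([-12.0, -15.0, -9.5])
logp_neg = torch.tensor([-11.0, -20.0, -14.0])
logp_ref = torch.tensor([-10.0, -13.0, -8.0])
loss = slic_hf_loss(logp_pos, logp_neg, logp_ref)
```

Because the loss only needs sequence log-likelihoods, it avoids the sampling and reward-model rollouts of PPO-style RLHF, which is where the computational savings come from.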