Content Deep Dive

SLiC-HF: Sequence Likelihood Calibration with Human Feedback - Summary

Blog post from Portkey

Post Details
Company
Date Published
Author
The Quill
Word Count
222
Company Posts That Month
5
Language
English
Hacker News Points
-
Summary

SLiC-HF (Sequence Likelihood Calibration with Human Feedback) is a method for improving language models with human feedback, shown to be effective on the TL;DR summarization task. It serves as a simpler, more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF), and it can use human feedback data collected for a different model, much like off-policy, offline RL data. The paper positions SLiC-HF as a competitive alternative to the PPO-based RLHF implementation that is easier to implement, easier to tune, and cheaper to compute. It emphasizes the role of the calibration loss and the cross-entropy loss in improving models such as T5, and it reports performance with metrics such as ROUGE, perplexity, and win rate under automatic evaluation.
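To make the "calibration and cross-entropy loss" mentioned above more concrete, here is a minimal sketch of one way such an objective can be written for a single example: a hinge-style rank calibration term that pushes the log-likelihood of the human-preferred sequence above that of the rejected one by a margin, plus a cross-entropy regularization term on a reference target. The function name `slic_hf_loss`, the argument names, and the default values of `delta` and `lam` are illustrative assumptions, not taken from the post or from any library.

```python
def slic_hf_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    """Per-example loss sketch: rank calibration plus cross-entropy regularization.

    logp_pos: sequence log-likelihood of the human-preferred output
    logp_neg: sequence log-likelihood of the rejected output
    logp_ref: sequence log-likelihood of a reference (e.g. SFT) target
    delta:    margin for the calibration term (placeholder value, an assumption)
    lam:      weight of the regularization term (placeholder value, an assumption)
    """
    # Hinge term: zero once the preferred output leads by at least `delta`.
    calibration = max(0.0, delta - logp_pos + logp_neg)
    # Cross-entropy term: negative log-likelihood of the reference target.
    regularization = -lam * logp_ref
    return calibration + regularization


if __name__ == "__main__":
    # Preferred output already leads by more than the margin: only the
    # regularization term contributes.
    print(slic_hf_loss(-2.0, -5.0, -3.0))
    # Rejected output is currently more likely: the hinge term is active.
    print(slic_hf_loss(-5.0, -2.0, -3.0))
```

In practice the three log-likelihoods would come from the model being fine-tuned, summed over the tokens of each sequence, and the loss would be averaged over a batch. No reward model rollouts or policy-gradient updates appear in this objective, which is consistent with the post's point that the approach is simpler to implement and tune than PPO-based RLHF.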

Trends Found in this Post
Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM
Reinforcement learning | 6 | No monthly metrics for this publish month.
AI Model Fine-tuning | 1 | 169 | 75 | 54 | -