SLiC-HF: Sequence Likelihood Calibration with Human Feedback - Summary
Blog post from Portkey
SLiC-HF (Sequence Likelihood Calibration with Human Feedback) is a method for aligning language models with human preferences, and it proves effective on the TL;DR summarization task. It is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF), and it can learn from human feedback data collected on other models, much like off-policy, offline RL data. The paper positions SLiC-HF as a competitive alternative to the PPO-based RLHF implementation, one that is easier to implement, easier to tune, and cheaper to run. The approach fine-tunes T5 models with a calibration loss on the sequence likelihoods of preferred versus dispreferred outputs, combined with a cross-entropy regularization term. Its performance is reported with metrics such as ROUGE, perplexity, and win rate, demonstrating its efficacy under automatic evaluation.
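To make the "calibration plus cross-entropy" idea concrete, here is a minimal sketch of a per-example SLiC-HF-style loss. It assumes the rank-calibration form: a hinge loss that pushes the log-likelihood of the human-preferred output above that of the dispreferred output by a margin, plus a cross-entropy (negative log-likelihood) term on a reference target such as the supervised fine-tuning label. The function names `sequence_log_likelihood` and `slic_hf_loss` and the parameters `delta` (margin) and `reg_weight` (regularization weight) are hypothetical, and their default values are illustrative rather than taken from the paper. A real training loop would compute the token log-probabilities with the model (for example, T5) and backpropagate through them.

```python
from typing import Sequence


def sequence_log_likelihood(token_log_probs: Sequence[float]) -> float:
    """log P(y | x) for one sequence, as the sum of its per-token log-probabilities."""
    return float(sum(token_log_probs))


def slic_hf_loss(
    logp_pos: float,
    logp_neg: float,
    logp_ref: float,
    delta: float = 1.0,       # hypothetical margin value, for illustration only
    reg_weight: float = 0.5,  # hypothetical regularization weight
) -> float:
    """Sketch of a SLiC-HF-style loss for a single (x, y+, y-, y_ref) example."""
    # Rank calibration: hinge loss that is zero once the preferred sequence y+
    # outscores the dispreferred sequence y- by at least `delta` in log-likelihood.
    calibration = max(0.0, delta - logp_pos + logp_neg)
    # Regularization: cross-entropy on the reference target y_ref, which keeps
    # the model close to its supervised fine-tuned behaviour.
    regularization = -logp_ref
    return calibration + reg_weight * regularization


if __name__ == "__main__":
    pos = sequence_log_likelihood([-0.5, -0.75, -0.25])  # preferred summary
    neg = sequence_log_likelihood([-1.0, -1.25, -0.75])  # dispreferred summary
    ref = sequence_log_likelihood([-0.5, -0.5])          # reference target
    print(slic_hf_loss(pos, neg, ref))
```

In this example the preferred summary already beats the dispreferred one by more than the margin, so only the regularization term contributes to the loss. Because the loss needs only log-likelihoods of fixed sequences, it can be computed on preference data collected from other models, which is what lets the method avoid PPO-style online sampling.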