Training language models to follow instructions with human feedback - Summary
Blog post from Portkey
The paper describes a method for aligning language models with user intent: supervised fine-tuning on labeler-written demonstrations, followed by reinforcement learning from human feedback (RLHF) against a reward model trained on labeler preference rankings. The resulting models, known as InstructGPT, show gains in truthfulness and a reduction in toxic output generation, with only minimal performance regressions on public NLP datasets. The study highlights the mismatch between the standard language modeling objective (predicting the next token) and the goal of following user instructions, and emphasizes that public NLP datasets do not accurately represent real-world usage of language models. Notably, labelers prefer outputs from the 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, despite it having over 100x fewer parameters, showcasing the potential of human feedback for improving language model alignment.
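To make the reward-model stage concrete, here is a minimal sketch of the pairwise preference objective typically used in RLHF: the reward model is trained so that a labeler-preferred completion scores higher than a rejected one for the same prompt. This is not code from the paper; `reward_model`, `chosen_ids`, and `rejected_ids` are hypothetical names for a scalar-reward module and tokenized (prompt + completion) inputs.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise preference loss: -log sigmoid(r(x, y_w) - r(x, y_l)).

    reward_model maps a batch of token-id sequences to one scalar reward each;
    chosen_ids holds labeler-preferred completions, rejected_ids the others.
    """
    r_chosen = reward_model(chosen_ids)      # reward for preferred completions
    r_rejected = reward_model(rejected_ids)  # reward for rejected completions
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

A reward model trained this way then supplies the scalar signal that the policy model is optimized against (with PPO in the paper) in the final stage.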