How ChatGPT actually works

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Date Published:
Author: Marco Ramponi
Word Count: 3,262
Language: English
Hacker News Points: 4
Summary

ChatGPT is trained with Reinforcement Learning from Human Feedback (RLHF), a methodology with three main steps: supervised fine-tuning (SFT), training a reward model, and reinforcement learning against that reward model. In the SFT step, a pre-trained language model is fine-tuned on high-quality instruction data. In the reward-model step, human labelers rank alternative model outputs, and a separate model is trained to predict those preferences. In the reinforcement learning step, the SFT model is optimized to produce outputs that the reward model scores highly, which steers it toward human preferences. Finally, human labelers evaluate the resulting model on several criteria, including helpfulness, truthfulness, and harmlessness.
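
The reward model mentioned in the summary can be illustrated with a small sketch. The summary does not say how the reward model is trained; the example below assumes the pairwise ranking loss commonly used in RLHF work, where the reward model scores two responses to the same prompt and is penalized when the response that humans preferred does not receive the higher score. The function name and the example scores are hypothetical and chosen only for illustration.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    Assumed loss, not confirmed by the summary: it is small when the
    human-preferred response scores well above the rejected one, and
    large when the ordering is reversed.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for two responses to one prompt.
agrees_with_humans = pairwise_preference_loss(2.0, 0.0)
disagrees_with_humans = pairwise_preference_loss(0.0, 2.0)
undecided = pairwise_preference_loss(1.0, 1.0)
```

Minimizing this loss over many labeled comparisons pushes the reward model to rank responses the way the labelers did. Under the same assumption, the reinforcement learning step then uses the trained reward model's scalar score as its optimization signal.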