Content Deep Dive

Introducing Direct Preference Optimization (DPO) Support on OpenPipe

Blog post from OpenPipe

Post Details
Company: OpenPipe
Date Published:
Author: Kyle Corbitt
Word Count: 740
Language: English
Hacker News Points: 1
Summary

OpenPipe has introduced Direct Preference Optimization (DPO) support, allowing users to align models more closely with their specific requirements. DPO is an advanced fine-tuning method that lets models learn directly from preference data, making it useful whenever users have a source of preference data they can exploit. The technique is particularly effective when combined with user-defined criteria and has shown promising results in initial tests, such as a 77% reduction in responses exceeding word limits and a 76% drop in hallucinated information. To get started with DPO on OpenPipe, users prepare their preference data, upload it to the platform, select the DPO option when configuring a fine-tuning job, and launch the training run; a sketch of the data-preparation step follows below. The company is also working on integrating DPO into an online learning workflow to enable continual learning.
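
The summary does not spell out OpenPipe's exact preference-data schema, so the following is only a minimal sketch of the data-preparation step. It assumes an OpenAI-style chat prompt paired with a preferred ("chosen") and a dispreferred ("rejected") completion, written out as JSONL for upload; the actual field names and file format OpenPipe expects may differ.

```python
import json

# Hypothetical preference records: each pairs a prompt with a preferred
# ("chosen") and a dispreferred ("rejected") completion. The field names
# here are illustrative, not OpenPipe's documented schema.
preference_examples = [
    {
        "messages": [
            {"role": "system", "content": "Summarize the article in under 50 words."},
            {"role": "user", "content": "<article text>"},
        ],
        "chosen": {"role": "assistant", "content": "A 42-word summary that respects the limit."},
        "rejected": {"role": "assistant", "content": "A rambling 120-word summary that blows past the limit."},
    },
]

# Write one JSON object per line (JSONL), a common upload format for
# fine-tuning datasets, before uploading it to the platform.
with open("preference_data.jsonl", "w") as f:
    for example in preference_examples:
        f.write(json.dumps(example) + "\n")
```

With a file like this uploaded, the remaining steps described above are configuration choices in the platform: selecting the DPO option on the fine-tuning job and launching the training run.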