Introducing Direct Preference Optimization (DPO) Support on OpenPipe
Blog post from OpenPipe
OpenPipe has introduced support for Direct Preference Optimization (DPO), a fine-tuning method that lets users align models more closely with their specific requirements. Rather than learning only from example completions, DPO trains the model directly on preference data (pairs of preferred and non-preferred responses), which makes it a good fit whenever users already have, or can collect, a source of such preferences. The technique is particularly effective when combined with user-defined criteria, and initial tests have shown promising results, such as a 77% reduction in responses exceeding word limits and a 76% reduction in hallucinated information.

To get started with DPO on OpenPipe, users prepare their preference data, upload it to the platform, select the DPO option when configuring a fine-tuning job, and launch the training run (a rough sketch of preparing such a dataset is shown below). The company is also working on integrating DPO into an online learning workflow to enable continual learning.
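As a minimal illustration of the data-preparation step, the snippet below sketches one way a preference dataset could be assembled and written out as JSONL before upload. The field names used here (`messages`, `preferred_output`, `non_preferred_output`) and the overall layout are assumptions for the sake of the example, not OpenPipe's documented schema; check the platform's dataset documentation for the exact format it expects.

```python
import json

# Sketch only: the field names below are illustrative assumptions,
# not OpenPipe's documented upload schema.
preference_rows = [
    {
        # The shared prompt/conversation both completions respond to.
        "messages": [
            {"role": "system", "content": "Summarize the article in under 50 words."},
            {"role": "user", "content": "<article text>"},
        ],
        # The completion the model should be steered toward
        # (e.g. a summary that respects the word limit).
        "preferred_output": {"role": "assistant", "content": "<concise 45-word summary>"},
        # The completion the model should be steered away from
        # (e.g. an overlong or hallucination-prone summary).
        "non_preferred_output": {"role": "assistant", "content": "<rambling 120-word summary>"},
    },
]

# Write one JSON object per line (JSONL), ready to upload as a dataset.
with open("dpo_preferences.jsonl", "w") as f:
    for row in preference_rows:
        f.write(json.dumps(row) + "\n")
```

Once a dataset like this is uploaded, it can be selected when configuring a fine-tuning job with the DPO option enabled, and the training run launched from there.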