Company
Date Published
Author: -
Word count: 958
Language: English
Hacker News points: None

Summary

Reinforcement Fine-Tuning (RFT), newly announced in beta, trains expert models for complex tasks such as agentic reasoning, function calling, and coding using Reinforcement Learning with Verifiable Reward (RLVR). RFT improves model quality with only a handful of examples and can outperform closed frontier models in both quality and speed, as demonstrated in customer-service AI agents and in code generation with partners like Vercel. It removes the traditionally complex setup of reinforcement learning by automating infrastructure and training management: users supply only a Python evaluator function that grades model outputs. The approach also extends to subjective tasks such as creative writing, where large language models serve as judges. The Fireworks platform handles training without custom infrastructure and is currently offering two weeks of free training on open models like Llama and DeepSeek, inviting users to explore applications and contribute their ideas.
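To make the "Python evaluator function" idea concrete, here is a minimal sketch of what such a grader might look like for a function-calling task. The function name, signature, and the `get_weather` tool are hypothetical illustrations, not the Fireworks API; the actual platform interface may differ. The only requirement the summary describes is a Python function that maps a model output to a verifiable reward:

```python
import json

# Hypothetical evaluator sketch: returns a reward in [0, 1] for a
# function-calling task where the model should emit a JSON tool call
# like {"name": "get_weather", "arguments": {"city": "Paris"}}.
def evaluate(prompt: str, completion: str) -> float:
    try:
        call = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # unparseable output earns no reward

    score = 0.0
    # Partial credit: correct tool name chosen.
    if isinstance(call, dict) and call.get("name") == "get_weather":
        score += 0.5
    # Partial credit: required argument present and well-formed.
    args = call.get("arguments") if isinstance(call, dict) else None
    if isinstance(args, dict) and "city" in args:
        score += 0.5
    return score
```

Because the reward is computed programmatically, it is verifiable in the RLVR sense; for subjective tasks like creative writing, this grading function would instead call a large language model as a judge and return its score.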