Company
Date Published
Author: -
Word count: 958
Language: English
Hacker News points: None

Summary

Reinforcement Fine-Tuning (RFT), newly announced in beta, trains expert models for complex tasks such as agentic reasoning, function calling, and coding using Reinforcement Learning with Verifiable Reward (RLVR). RFT improves model quality with only a handful of examples and can outperform closed frontier models in both quality and speed, as demonstrated in customer-service AI agents and in code generation with partners like Vercel. It removes the traditionally complex setup of reinforcement learning by automating infrastructure and training management: users supply only a Python evaluator function that grades model outputs. The approach also extends to subjective tasks such as creative writing, where large language models serve as judges. The Fireworks platform handles training without custom infrastructure and is currently offering two weeks of free training on open models like Llama and DeepSeek, inviting users to explore applications and contribute their ideas.
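To make the "Python evaluator function" idea concrete, here is a minimal sketch of what such a grader might look like for a function-calling task. The function name, signature, and the `get_weather` tool are hypothetical illustrations, not the Fireworks API; the actual platform interface may differ. The only requirement the summary describes is a Python function that maps a model output to a verifiable reward:

```python
import json

# Hypothetical evaluator sketch: returns a reward in [0, 1] for a
# function-calling task where the model should emit a JSON tool call
# like {"name": "get_weather", "arguments": {"city": "Paris"}}.
def evaluate(prompt: str, completion: str) -> float:
    try:
        call = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # unparseable output earns no reward

    score = 0.0
    # Partial credit: correct tool name chosen.
    if isinstance(call, dict) and call.get("name") == "get_weather":
        score += 0.5
    # Partial credit: required argument present and well-formed.
    args = call.get("arguments") if isinstance(call, dict) else None
    if isinstance(args, dict) and "city" in args:
        score += 0.5
    return score
```

Because the reward is computed programmatically, it is verifiable in the RLVR sense; for subjective tasks like creative writing, this grading function would instead call a large language model as a judge and return its score.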