OpenAI has released two new open-weight models, gpt-oss-120b and gpt-oss-20b, its first since GPT-2 in 2019. The larger model rivals OpenAI’s proprietary o4-mini on reasoning benchmarks, while the smaller performs on par with o3-mini. Both can be fine-tuned for specific use cases, but fine-tuning the larger model is complex because of its compute and model-parallelism demands.

To address these challenges, Baseten and Axolotl have partnered on a streamlined recipe for fine-tuning gpt-oss-120b. Axolotl provides an open-source fine-tuning runtime that supports a range of techniques and is optimized for distributed parallel training, while Baseten simplifies large-scale training by providing access to powerful GPUs, seamless scaling, and features like dataset caching and checkpointing.

The fine-tuning process involves defining two configurations: a Python configuration file for Baseten and a YAML file for Axolotl. Together they ensure smooth execution and performance monitoring, with tools like Truss for launching and managing training jobs.
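
As a rough illustration, the Axolotl side of the recipe is a YAML file along the lines of the sketch below. This is a minimal sketch assuming a LoRA-style fine-tune; the model ID follows the Hugging Face release, but the dataset path, hyperparameters, and DeepSpeed config file are placeholders, and the exact values for gpt-oss-120b would come from Axolotl's own examples rather than this post.

```yaml
# Hypothetical Axolotl config sketch for a LoRA fine-tune of gpt-oss-120b.
# Keys follow Axolotl's standard config format; values are illustrative placeholders.
base_model: openai/gpt-oss-120b

# LoRA keeps the trainable-parameter and memory footprint manageable at 120B scale.
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: ./data/train.jsonl   # placeholder dataset path
    type: alpaca               # placeholder dataset format

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 2e-4
optimizer: adamw_torch
lr_scheduler: cosine

bf16: true
flash_attention: true

# Model parallelism across GPUs; a DeepSpeed (or FSDP) config is needed at this scale.
deepspeed: deepspeed_configs/zero3_bf16.json

output_dir: ./outputs/gpt-oss-120b-lora
```

The Baseten-side Python configuration then points a training job at this YAML, selects the GPU type and count, and handles launching and monitoring through Truss.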