We Got Claude to Fine-Tune an Open Source LLM
Blog post from Hugging Face
A new tool called Hugging Face Skills enables Claude, a coding agent, to fine-tune language models by submitting jobs to cloud GPUs, monitoring progress, and pushing the finished models to the Hugging Face Hub. This tutorial shows how to use the tool to train models ranging from 0.5B to 70B parameters with several methods, including supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning. The workflow covers dataset validation, hardware selection, script generation, and job submission, with real-time monitoring through Trackio. The tutorial recommends a short test run to confirm the setup is correct before committing to full-scale training, which keeps costs down, and it highlights the option to convert trained models to GGUF format for local deployment. Hugging Face Skills also integrates with other coding agents, such as OpenAI Codex and Google's Gemini CLI, making fine-tuning accessible through conversational instructions and opening up a process previously reserved for specialists.
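The dataset-validation step mentioned above can be sketched as a small standalone check. This is a hypothetical helper, not part of Hugging Face Skills: it assumes the conversational chat format that TRL's trainers accept, where each record has a `messages` list of `{"role", "content"}` dicts.

```python
def validate_sft_records(records):
    """Hypothetical sketch of a dataset-validation pass before submitting a
    fine-tuning job: checks each record has a non-empty "messages" list of
    {"role", "content"} dicts with recognized roles and string content.
    """
    errors = []
    for i, rec in enumerate(records):
        msgs = rec.get("messages")
        if not isinstance(msgs, list) or not msgs:
            errors.append(f"record {i}: missing or empty 'messages'")
            continue
        for j, m in enumerate(msgs):
            # Roles assumed from the common chat template convention.
            if m.get("role") not in {"system", "user", "assistant"}:
                errors.append(f"record {i}, message {j}: bad role {m.get('role')!r}")
            if not isinstance(m.get("content"), str) or not m.get("content"):
                errors.append(f"record {i}, message {j}: missing content")
    return errors
```

Running a check like this locally (or having the agent run it) before selecting hardware means a malformed dataset fails in seconds rather than after a paid GPU job has started.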