Company:
Date Published:
Author: Gaurav Vij
Word count: 789
Language: English
Hacker News points: None

Summary

Fine-tuning the Llama 3.1 base model with MonsterTuner, MonsterAPI's no-code LLM fine-tuner, produced exceptional results on benchmarks covering multistep soft reasoning, general problem solving, and question answering, outperforming larger models while remaining efficient and cost-effective. The fine-tuning process used Odds Ratio Preference Optimization (ORPO), a preference-alignment algorithm that folds alignment into supervised fine-tuning without requiring a separate reference model. The fine-tuned model posted strong scores on MuSR, which tests multistep reasoning over complex narrative-based tasks, and on GPQA, a graduate-level question-answering benchmark, surpassing many larger models on both.