ML Intern Takes Our Post-Training Internship Test

Blog post from HuggingFace

Post Details
Company
Date Published
Author
Carlos Miguel Patiño, Aksel Joonas Reedi, and Lewis Tunstall
Word Count
924
Summary

In this post, ML Intern replicates HuggingFace's post-training internship test, which explores Weighted Best-of-N selection on MATH-500 problems. The exercise samples multiple solutions per problem from a large language model (LLM), scores each one with a Process Reward Model (PRM), and selects the final answer whose solutions have the highest total score. Weighted Best-of-N was more accurate than both greedy decoding and standard Best-of-N, and its accuracy improved as more solutions were sampled. A key finding is that weighted selection is not misled by a single high-scoring incorrect solution, because it aggregates evidence across multiple solutions that agree on the correct answer; the report illustrates this with specific number theory problems. The report also found the PRM effective at distinguishing correct from incorrect solutions and suggested that accounting for formatting differences between answers could further improve accuracy. The accompanying code was co-authored, with contributions to the pipeline structure, model loading, and voting implementation, and the report includes full results and analysis.
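
The difference between standard and weighted selection can be sketched in a few lines of Python. This is a minimal illustration, not the code from the post: the function names `standard_best_of_n` and `weighted_best_of_n` and the toy answers and scores are hypothetical, and it assumes each sampled solution has already been reduced to a final answer string and a single PRM score.

```python
from collections import defaultdict


def standard_best_of_n(answers, scores):
    """Return the final answer of the single highest-scoring solution."""
    if not answers or len(answers) != len(scores):
        raise ValueError("answers and scores must be non-empty and equal in length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_best_of_n(answers, scores):
    """Sum the scores of solutions that share a final answer; return the top answer."""
    if not answers or len(answers) != len(scores):
        raise ValueError("answers and scores must be non-empty and equal in length")
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Hypothetical toy data: one confident but isolated answer ("12")
# and three moderately scored solutions that agree on "18".
answers = ["12", "18", "18", "18"]
scores = [0.90, 0.55, 0.60, 0.50]

print(standard_best_of_n(answers, scores))  # picks the single top-scoring solution
print(weighted_best_of_n(answers, scores))  # picks the answer with the largest summed score
```

In this toy case the standard rule follows the lone 0.90 solution, while the weighted rule prefers "18" because its three solutions sum to about 1.65. Answers are compared as exact strings here, so two differently formatted but equivalent answers would be counted separately, which is the kind of formatting difference the summary says could be accounted for.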