Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Post Details

Company

Together AI

Date Published

Dec. 3, 2025

Author

Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov, Ruslan Svirschevski, Vage Egiazarian, Max Ryabinin

Word Count

1,077

Language

English

Source URL

www.together.ai/blog/introducing-autojudge-streamlined-inference-acceleration-via-automated-dataset-curation

Summary

Speculative decoding accelerates token generation by pairing a small draft model with a larger target model: the draft proposes the next few tokens, and the target verifies them in a single forward pass, accepting the longest run of proposals it agrees with. AutoJudge extends this method by automatically identifying and accepting "unimportant" mismatches, that is, draft tokens that differ from the target's choice but do not affect the correctness of the final output. Instead of relying on human labels, it trains a small classifier on existing model embeddings to predict whether a given mismatch matters.

Because more tokens are accepted per verification cycle, AutoJudge delivers notable inference speedups with minimal accuracy loss on tasks such as mathematical reasoning and programming. It integrates with existing speculative decoding frameworks and shows substantial throughput gains, particularly in bandwidth-limited settings. However, the speedup depends on the task and on how often unimportant mismatches occur, so the classifier's decision threshold may need per-task tuning for best results.
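The acceptance rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the AutoJudge implementation: it assumes greedy verification, uses a logistic-regression scorer as a stand-in for the "small classifier", and all names (`verify_draft`, `make_logistic_judge`, `features`, `threshold`) are hypothetical. Standard speculative decoding keeps draft tokens only while they match the target; here a mismatch is also kept when the judge scores it below `threshold`.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: returns the judge's estimated probability that
# keeping draft_tok instead of target_tok here would change the final answer
# (an "important" mismatch).
ImportanceFn = Callable[[Sequence[int], int, int], float]


def make_logistic_judge(
    weights: Sequence[float],
    bias: float,
    features: Callable[[Sequence[int], int, int], Sequence[float]],
) -> ImportanceFn:
    """Stand-in 'small classifier': logistic regression over a feature vector.

    AutoJudge derives its inputs from model embeddings; in this sketch
    `features` is any user-supplied function (an assumption).
    """
    def importance(context: Sequence[int], draft_tok: int, target_tok: int) -> float:
        x = features(context, draft_tok, target_tok)
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        # Numerically stable sigmoid (avoids overflow for large |z|).
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    return importance


def verify_draft(
    prefix: Sequence[int],
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    importance: ImportanceFn,
    threshold: float,
) -> List[int]:
    """One greedy verification step with judge-based lenient acceptance.

    target_tokens[i] is the target model's greedy token after
    prefix + draft_tokens[:i], all computed in one parallel forward pass;
    it has one extra entry (the "bonus" token after the last draft token).
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have len(draft_tokens) + 1 entries")
    accepted: List[int] = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)  # exact match: standard acceptance
        elif importance(list(prefix) + accepted, d, t) < threshold:
            accepted.append(d)  # judged unimportant: keep the draft token
        else:
            accepted.append(t)  # important mismatch: take target token and stop
            return accepted
    # Every draft token was kept, so the target's later predictions (which were
    # conditioned on those draft tokens) remain valid, including the bonus token.
    accepted.append(target_tokens[len(draft_tokens)])
    return accepted


if __name__ == "__main__":
    # Toy feature: distance between token ids (purely illustrative).
    judge = make_logistic_judge([2.0], 0.0, lambda c, d, t: [float(abs(d - t))])
    draft = [5, 7, 9, 11]
    target = [5, 8, 9, 12, 13]  # target disagrees at positions 1 and 3
    print(verify_draft([1, 2], draft, target, judge, threshold=0.5))  # [5, 8]
    print(verify_draft([1, 2], draft, target, judge, threshold=0.9))  # [5, 7, 9, 11, 13]
```

In this toy run each mismatch scores about 0.88, so a threshold of 0.5 rejects the first mismatch while 0.9 accepts both. Raising `threshold` lets more mismatches through, which lengthens the accepted run per verification step at the risk of keeping a mismatch that does matter; this is the task-dependent trade-off that motivates threshold tuning.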