Together Evaluations now supports comparing top commercial APIs vs. open-source models
Blog post from Together AI
Together Evaluations provides a framework for assessing the quality of large language models (LLMs), letting teams compare open-source, fine-tuned, and proprietary models side by side using standardized metrics and methodologies, so that model selection and optimization decisions are driven by data. The latest update adds support for closed-source frontier models from major providers such as OpenAI, Anthropic, and Google, enabling cross-provider benchmarking.

The platform can also evaluate fine-tuned models deployed through Together AI's deployment options, and an accompanying deep dive and cookbook walk through optimizing and evaluating models, demonstrating how open-source models can outperform closed-source ones with significant cost and speed advantages. The evaluation tooling further includes automated prompt optimization with frameworks like GEPA, which improves prompts through iterative, LLM-guided reflection.

The platform is accessible via the UI, the API, or the Python client, with documentation and tutorials covering the new features.
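To make the side-by-side idea concrete, below is a minimal sketch that compares two models on the same prompt using the Together Python client's chat completions endpoint, followed by a simple LLM-as-judge check. The model slugs, prompt, and judging step are illustrative assumptions and do not reproduce the Evaluations API itself; the hosted Evaluations workflow (via UI, API, or Python client) automates this with standardized metrics, so refer to the documentation and cookbook for the supported interface.

```python
# Minimal sketch: compare two models on one prompt with the Together Python client.
# Model slugs and the judge step are assumptions for illustration only; the hosted
# Together Evaluations product handles this with standardized metrics and methodologies.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

PROMPT = "Summarize the trade-offs between fine-tuning and prompt engineering."
MODELS = [
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",  # assumed model slug
    "Qwen/Qwen2.5-72B-Instruct-Turbo",               # assumed model slug
]

# Collect one completion per candidate model.
responses = {}
for model in MODELS:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
    )
    responses[model] = completion.choices[0].message.content

# Naive LLM-as-judge comparison between the two answers.
judge_prompt = (
    "Which answer is more accurate and complete? Reply with A or B only.\n\n"
    f"A:\n{responses[MODELS[0]]}\n\nB:\n{responses[MODELS[1]]}"
)
verdict = client.chat.completions.create(
    model=MODELS[0],  # assumed judge model; any strong model could be used here
    messages=[{"role": "user", "content": judge_prompt}],
    max_tokens=8,
)
print(verdict.choices[0].message.content)
```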