Content Deep Dive

Together Evaluations now supports comparing top commercial APIs vs. open source models

Blog post from Together AI

Post Details
Company: Together AI
Date Published: -
Author: Ivan Provilkov, Conner Manuel, Kirah Sapong, Ruslan Khaidurov, Jasmine Li, Zain Hasan, Jennifer Wu, Max Ryabinin
Word Count: 634
Language: English
Hacker News Points: -
Summary

Together Evaluations introduces a framework for assessing the quality of large language models (LLMs), facilitating comparisons between open-source, fine-tuned, and proprietary models. By evaluating models side-by-side with standardized metrics and methodologies, teams can make data-driven decisions about model selection and optimization. The latest update adds support for closed-source frontier models from major providers such as OpenAI, Anthropic, and Google, enabling cross-model benchmarking. The platform can also evaluate fine-tuned models deployed through Together AI's deployment options, and an accompanying deep dive and cookbook walk users through optimizing and evaluating models, demonstrating how open-source models can outperform closed-source ones with significant cost and speed advantages. The evaluation service also includes automated prompt optimization using frameworks like GEPA, which improves prompts through iterative LLM-guided reflection. Users can access the platform via the UI, API, or Python client, with documentation and tutorials covering the new features.
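
The summary notes that comparisons can be run via the UI, API, or Python client. As a rough sketch only (not the hosted Evaluations API itself, whose endpoint names are not reproduced here), the snippet below performs a manual side-by-side comparison with the `together` Python client: two candidate models answer the same prompts and a judge model picks a winner, which is the kind of pairwise, LLM-as-judge workflow the Evaluations service automates. The model IDs, prompts, and judge rubric are illustrative assumptions.

```python
# Hedged sketch: manual side-by-side comparison of two models on the same
# prompts, scored by an LLM judge. The hosted Together Evaluations service
# automates and standardizes this workflow; its endpoint names are not shown.
# Model IDs, prompts, and the judge rubric below are illustrative assumptions.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

CANDIDATES = [
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # assumed-available model ID
    "Qwen/Qwen2.5-72B-Instruct-Turbo",          # assumed-available model ID
]
PROMPTS = [
    "Summarize the tradeoffs between fine-tuning and prompt engineering.",
    "Explain retrieval-augmented generation to a product manager.",
]
JUDGE_MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo"  # assumed judge model


def answer(model: str, prompt: str) -> str:
    """Get one completion from a single candidate model."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def judge(prompt: str, answers: dict[str, str]) -> str:
    """Ask the judge model which candidate answered better (simple rubric)."""
    rubric = (
        f"Question:\n{prompt}\n\n"
        + "\n\n".join(f"Answer from {m}:\n{a}" for m, a in answers.items())
        + "\n\nWhich answer is more helpful and accurate? Reply with the model name only."
    )
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": rubric}],
    )
    return resp.choices[0].message.content


for prompt in PROMPTS:
    answers = {m: answer(m, prompt) for m in CANDIDATES}
    print(prompt, "->", judge(prompt, answers))
```

In the hosted service, the same pattern extends to commercial endpoints (OpenAI, Anthropic, Google) on one side of the comparison, so an open-source or fine-tuned model can be benchmarked directly against a closed-source one without wiring up multiple providers by hand.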