Together Evaluations now supports comparing top commercial APIs vs. open-source models
Blog post from Together AI
Together Evaluations provides a framework for assessing the quality of large language models (LLMs), letting teams compare open-source, fine-tuned, and proprietary models side by side using standardized metrics and methodologies, so that model selection and optimization decisions are driven by data. The latest update adds support for closed-source frontier models from major providers such as OpenAI, Anthropic, and Google, enabling cross-provider benchmarking.

The platform can also evaluate fine-tuned models deployed through Together AI's deployment options, and an accompanying deep dive and cookbook walk through optimizing and evaluating models, demonstrating how open-source models can outperform closed-source ones with significant cost and speed advantages. The evaluation tooling further includes automated prompt optimization with frameworks like GEPA, which improves prompts through iterative, LLM-guided reflection.

The platform is accessible via the UI, the API, or the Python client, with documentation and tutorials covering the new features.
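To make the side-by-side idea concrete, below is a minimal sketch that compares two models on the same prompt using the Together Python client's chat completions endpoint, followed by a simple LLM-as-judge check. The model slugs, prompt, and judging step are illustrative assumptions and do not reproduce the Evaluations API itself; the hosted Evaluations workflow (via UI, API, or Python client) automates this with standardized metrics, so refer to the documentation and cookbook for the supported interface.

```python
# Minimal sketch: compare two models on one prompt with the Together Python client.
# Model slugs and the judge step are assumptions for illustration only; the hosted
# Together Evaluations product handles this with standardized metrics and methodologies.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

PROMPT = "Summarize the trade-offs between fine-tuning and prompt engineering."
MODELS = [
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",  # assumed model slug
    "Qwen/Qwen2.5-72B-Instruct-Turbo",               # assumed model slug
]

# Collect one completion per candidate model.
responses = {}
for model in MODELS:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
    )
    responses[model] = completion.choices[0].message.content

# Naive LLM-as-judge comparison between the two answers.
judge_prompt = (
    "Which answer is more accurate and complete? Reply with A or B only.\n\n"
    f"A:\n{responses[MODELS[0]]}\n\nB:\n{responses[MODELS[1]]}"
)
verdict = client.chat.completions.create(
    model=MODELS[0],  # assumed judge model; any strong model could be used here
    messages=[{"role": "user", "content": judge_prompt}],
    max_tokens=8,
)
print(verdict.choices[0].message.content)
```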