Benchmarking inference at scale: coding agents
Blog post from Together AI
The Together Inference Engine demonstrates significant performance advantages over TensorRT-LLM and SGLang on high-concurrency coding-agent workloads, delivering over 50% more tokens per second (TPS) and roughly half the time to first token (TTFT) at saturation on comparable hardware. The benchmark is designed to simulate real production conditions, with long inputs and little tolerance for latency, and it highlights how each engine behaves under load: Together's engine keeps serving at traffic levels where the competing engines degrade.

The Kimi K2.6 model, available on the Together platform, matches Claude Opus 4.6 on coding benchmarks at a substantially lower cost, roughly 76% cheaper per request, making it a cost-effective option for large-scale agent deployments.

The study emphasizes the importance of realistic benchmarks and of low-level optimizations such as the ThunderMLA kernel, which improves performance by cutting launch overhead and tightening execution efficiency, making Together's engine a robust choice for high-demand environments.
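To make the measurement concrete, here is a minimal async load generator that streams completions from an OpenAI-compatible endpoint and records TTFT and aggregate TPS. This is a sketch only, not Together's benchmark harness: the endpoint URL, model id, concurrency level, and prompt are placeholder assumptions, the auth header is a stub, and counting one token per SSE chunk is an approximation.

```python
# Hypothetical load-test sketch: measures TTFT and aggregate TPS for N
# concurrent streaming requests against an OpenAI-compatible endpoint.
import asyncio
import time

import httpx

BASE_URL = "https://api.together.xyz/v1/chat/completions"  # assumed endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}         # auth placeholder
MODEL = "moonshotai/Kimi-K2.6"                             # hypothetical model id
CONCURRENCY = 64                                           # simulated agent sessions
# Long input, mimicking a coding agent sending a large context.
PROMPT = "Review this diff and suggest fixes:\n" + "print('x')\n" * 2000


async def one_request(client: httpx.AsyncClient, results: list) -> None:
    payload = {
        "model": MODEL,
        "stream": True,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": PROMPT}],
    }
    start = time.perf_counter()
    ttft, chunks = None, 0
    async with client.stream("POST", BASE_URL, headers=HEADERS, json=payload) as resp:
        async for line in resp.aiter_lines():
            # SSE data lines look like "data: {...}"; skip keepalives and [DONE].
            if not line.startswith("data: ") or line.endswith("[DONE]"):
                continue
            if ttft is None:
                ttft = time.perf_counter() - start  # time to first token
            chunks += 1  # ~1 SSE chunk per output token (approximation)
    results.append((ttft, chunks, time.perf_counter() - start))


async def main() -> None:
    results: list = []
    async with httpx.AsyncClient(timeout=120) as client:
        await asyncio.gather(*(one_request(client, results) for _ in range(CONCURRENCY)))
    ttfts = sorted(r[0] for r in results if r[0] is not None)
    total_tokens = sum(r[1] for r in results)
    wall = max(r[2] for r in results)  # batch finishes when the slowest request does
    print(f"p50 TTFT: {ttfts[len(ttfts) // 2]:.2f}s")
    print(f"aggregate TPS: {total_tokens / wall:.1f}")


asyncio.run(main())
```

Sweeping `CONCURRENCY` upward until TPS plateaus and TTFT blows up is what "saturation" means in this kind of benchmark.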
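The per-request cost comparison is simple to reproduce. The sketch below shows the shape of the calculation; the token counts and per-million-token prices are illustrative assumptions chosen so the output lands near the post's figure, not the actual pricing behind it.

```python
# Back-of-envelope per-request cost comparison. All prices and token counts
# are illustrative assumptions, not the actual figures behind the 76% claim.
def cost_per_request(in_tokens: int, out_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Prices are USD per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


req = dict(in_tokens=40_000, out_tokens=2_000)  # long coding-agent context
kimi = cost_per_request(**req, in_price=1.20, out_price=6.00)   # assumed prices
opus = cost_per_request(**req, in_price=5.00, out_price=25.00)  # assumed prices
print(f"Kimi: ${kimi:.4f}  Opus: ${opus:.4f}  savings: {1 - kimi / opus:.0%}")
# -> Kimi: $0.0600  Opus: $0.2500  savings: 76%
```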
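One way to see why a fused megakernel approach like ThunderMLA pays off is a back-of-envelope latency model: launching many small kernels means paying a fixed launch overhead on every step, while a single persistent kernel pays it once. The overhead and work figures below are assumed round numbers for illustration, not measured ThunderMLA data.

```python
# Toy latency model: per-launch overhead vs. one persistent "megakernel".
# All microsecond figures are assumptions, not measurements.
LAUNCH_OVERHEAD_US = 5.0   # assumed cost to launch + tear down one kernel
WORK_US = 12.0             # assumed useful compute per attention step
STEPS = 1_000              # decode steps in one generation

separate = STEPS * (LAUNCH_OVERHEAD_US + WORK_US)   # overhead paid every step
fused = LAUNCH_OVERHEAD_US + STEPS * WORK_US        # overhead paid once
print(f"separate launches: {separate / 1e3:.1f} ms")
print(f"fused megakernel:  {fused / 1e3:.1f} ms "
      f"({1 - fused / separate:.0%} less time)")
```

Under these assumptions the fused path saves roughly 30% of wall-clock time on overhead alone, which is the kind of win the post attributes to this class of optimization at high concurrency.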