GLM-5.1 API Benchmarks: Latency, Throughput & Cost

Blog post from DeepInfra

Post Details
Company
Date Published
Author
Deep
Word Count: 2,142
Language: English
Summary

GLM-5.1 is an advanced reasoning model optimized for long-horizon agentic engineering. Released in April 2026, it uses a 754-billion-parameter Mixture-of-Experts architecture. The post benchmarks it across ten API providers and finds wide variation in pricing and performance: blended pricing ranges from $0.74 to $1.70 per million tokens, and output speed differs by up to 5.2 times between providers. DeepInfra is highlighted as the best option, with the lowest cost on every pricing metric and a tie for the fastest time to first token. Fireworks leads in raw output speed, while Wafer provides a balanced alternative. The model is designed to keep improving over long runs, as demonstrated by its ability to autonomously build a Linux desktop environment. DeepInfra's FP8 deployment stands out for its cost-efficiency and practical cached input pricing, making it well suited to complex, agentic workloads.
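
For readers unfamiliar with the two comparison figures, here is a minimal sketch of how a blended price and a provider speed ratio could be computed. The summary does not say how the post weights input and output tokens, so the 3:1 input-to-output weighting is an assumption. The function names are hypothetical, and the prices and speeds in the usage lines are placeholders, not figures from the benchmark.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price in USD per million tokens.

    The default 3:1 input-to-output weighting is an assumption; the
    summary does not state which ratio the benchmark uses.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return (input_price * input_weight + output_price * output_weight) / total_weight


def speed_ratio(fastest_tps: float, slowest_tps: float) -> float:
    """How many times faster the fastest provider is than the slowest."""
    if slowest_tps <= 0:
        raise ValueError("slowest_tps must be positive")
    return fastest_tps / slowest_tps


# Placeholder values for illustration only, not taken from the benchmark.
example_blended = blended_price(input_price=1.00, output_price=3.00)
example_ratio = speed_ratio(fastest_tps=100.0, slowest_tps=25.0)
print(f"blended: ${example_blended:.2f} per million tokens")
print(f"speed ratio: {example_ratio:.1f}x")
```

A blended figure is only as meaningful as its weighting: agentic workloads that re-send long contexts are input-heavy, so a higher input weight (or a separate cached-input price) may describe them better than the default used here.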