Qwen3 Coder 480B A35B API Benchmarks: Latency & Cost

Post Details

Company: Deepinfra
Date Published: April 3, 2026
Author: Deep
Word Count: 1,498
Company Posts That Month: 34
Language: English
Hacker News Points: -
Post removed?: No
Source URL: deepinfra.com/blog/qwen3-coder-480b-a35b-api-benchmarks

Summary

Qwen3 Coder 480B A35B Instruct is a large language model developed by Alibaba Cloud's Qwen team for agentic coding and code generation. It uses a Mixture-of-Experts architecture with 480 billion total parameters, of which 35 billion are active per token, so it delivers high performance at a lower compute cost than a dense model of similar size. The native context length is 256K tokens, extendable to 1 million tokens via YaRN interpolation. The model performs well on agentic coding and browser-use tasks, where its results are reported as on par with Claude Sonnet 4. It was trained on 7.5 trillion tokens with a 70% code ratio across 358 programming languages, and its post-training uses long-horizon reinforcement learning to improve multi-step planning and tool interaction.

Among the API providers compared, DeepInfra (Turbo, FP4) is recommended for its low cost ($0.41 per 1M tokens), low latency (0.60s TTFT), and Function Calling support, which suits interactive and cost-sensitive applications. DeepInfra (FP8) offers higher throughput at a moderate price, while Google Vertex is a balanced option with full support for JSON mode and Function Calling. Eigen AI leads in throughput for bulk workloads but lacks Function Calling, and Amazon Bedrock fits AWS compliance needs despite higher latency and no JSON mode support.
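The arithmetic behind two of the figures in this summary can be sketched in a few lines of Python. This is a rough illustration only: the function names are hypothetical, the 2,000,000-token workload is an invented example, and the code assumes the quoted $0.41 per 1M is a flat per-token rate, since the summary does not say whether it is an input, output, or blended price.

```python
# Figures quoted in the summary; everything else here is illustrative.
TOTAL_PARAMS_B = 480           # total parameters, in billions
ACTIVE_PARAMS_B = 35           # active parameters, in billions
PRICE_PER_MILLION_USD = 0.41   # DeepInfra (Turbo, FP4); ASSUMED flat per-token rate


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active (hypothetical helper)."""
    return active_b / total_b


def estimate_cost_usd(tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Estimated spend for a token count at a flat per-million rate (assumption)."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active parameter share: {share:.1%}")

    # Hypothetical workload of 2,000,000 tokens.
    print(f"Estimated cost: ${estimate_cost_usd(2_000_000):.2f}")
```

Under these assumptions, roughly 7% of the parameters are active at a time, which is the basis for the lower compute cost compared with a dense model of similar size. Actual provider bills depend on how input and output tokens are priced.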

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	3	5,932	1,046	223	-2%
Vector Search	2	1,739	413	146	-27%
AI Agents	1	4,430	1,100	236	-3%
Real-time	1	6,296	1,346	246	-2%
Reinforcement learning	1	104	49	23	-14%
