MiniMax-M2.5 API Benchmarks: Latency, Throughput & Cost

Post Details

Company: Deepinfra
Date Published: April 3, 2026
Author: Deep
Word Count: 1,853
Company Posts That Month: 34
Language: English
Hacker News Points: -
Post Removed?: No
Source URL: deepinfra.com/blog/minimax-m2-5-api-benchmarks

Summary

MiniMax-M2.5 is a large language model released in February 2026. It uses a 230B-parameter Mixture of Experts (MoE) architecture with Lightning Attention and supports a context window of up to 205,000 tokens. Trained with reinforcement learning across more than 200,000 real-world environments, it performs strongly on programming tasks in more than 10 programming languages and is particularly adept at decomposing problems and planning software architecture. The model achieves top scores on industry benchmarks and is 37% faster than its predecessor, M2.1. MiniMax-M2.5 is available through several API providers, and the post presents DeepInfra as the standout choice for its balance of low latency, competitive pricing, and comprehensive feature support. DeepInfra charges $0.44 per million tokens with a latency of 0.56 s, which suits latency-sensitive workloads such as RAG applications and agentic workflows. The other providers target specific needs, each with its own trade-offs across performance metrics: SambaNova for maximum throughput, Together.ai for the lowest latency, SiliconFlow for cost efficiency, and Fireworks for high speed.

Trends Found in this Post

Trend	Mentions in This Post	Total Mentions This Month	Posts	Companies	MoM Change
RAG	5	941	216	85	-48%
Real-time	3	6,296	1,346	246	-2%
LLM	2	5,932	1,046	223	-2%
AI Agents	1	4,430	1,100	236	-3%
Reinforcement learning	1	104	49	23	-14%
Vector Search	1	1,739	413	146	-27%
Voice AI	1	2,379	221	38	-3%
