Mixtral 8x7B, a new model from Mistral AI, is now available on the Fireworks platform, offering unprecedented speed and cost-effectiveness for tasks like summarization and multi-turn chat. The model uses a sparse mixture-of-experts (MoE) architecture, which increases model capacity without a proportional increase in cost or latency, and it outperforms models like Llama 2 70B and GPT-3.5 on benchmarks.

Fireworks has optimized its custom-built inference engine to reach 175 tokens per second on a latency-optimized setup, and it offers flexible pricing tiers based on usage patterns. The platform supports multiple deployment configurations, including latency-optimized and throughput-optimized setups, and users can try Mixtral with free credits through an OpenAI-compatible API.

The model was made available shortly after release, including a variant fine-tuned for instruction following, and enhancements are underway to support custom fine-tunes at no additional cost. Fireworks' competitive pricing is enabled by efficient GPU utilization, and the release reflects a collaborative effort by the Mistral AI team, who shared the model openly with the community.
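To make the sparse MoE idea concrete, here is a minimal, self-contained sketch of a top-k expert layer in PyTorch. The dimensions, expert networks, and routing details below are illustrative only and do not reproduce Mixtral's actual implementation; the toy simply mirrors the routing pattern Mixtral is known to use, namely sending each token to 2 of 8 experts, so only a fraction of the parameters are active per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy top-2 mixture-of-experts layer (illustrative, not Mixtral's code).

    Each token is routed to the 2 highest-scoring experts, so per-token
    compute stays low even though total parameter count grows with the
    number of experts.
    """
    def __init__(self, dim=64, hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)  # produces routing logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, dim)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # keep only the top-2 experts
        weights = F.softmax(weights, dim=-1)           # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

The key point the sketch shows: per token, only 2 of the 8 expert MLPs run, which is why an MoE model can match the quality of a much larger dense model at a fraction of the inference cost.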
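Because the endpoint is OpenAI-compatible, the official `openai` Python client can be pointed at Fireworks directly. Below is a minimal sketch; the base URL and model identifier reflect Fireworks' published conventions but should be verified against the current docs, and the API key is a placeholder.

```python
# pip install openai
from openai import OpenAI

# Point the OpenAI client at Fireworks' OpenAI-compatible endpoint.
# Base URL and model id are assumptions; check the Fireworks docs.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {"role": "user", "content": "Summarize the benefits of sparse mixture-of-experts models."}
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```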