Mixtral 8x7B, a new model from Mistral AI, is now available on the Fireworks platform, offering unprecedented speed and cost-effectiveness for tasks like summarization and multi-turn chat. The model uses a sparse mixture-of-experts (MoE) architecture, which increases model capacity without a proportional increase in cost or latency, and it outperforms models like Llama 2 70B and GPT-3.5 on benchmarks.

Fireworks has optimized its custom-built inference engine to reach 175 tokens per second on a latency-optimized setup, and it offers flexible pricing tiers based on usage patterns. The platform supports multiple deployment configurations, including latency-optimized and throughput-optimized setups, and users can try Mixtral with free credits through an OpenAI-compatible API.

The model was made available shortly after release, including a variant fine-tuned for instruction following, and enhancements are underway to support custom fine-tunes at no additional cost. Fireworks' competitive pricing is enabled by efficient GPU utilization, and the release reflects a collaborative effort by the Mistral AI team, who shared the model openly with the community.
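To make the sparse MoE idea concrete, here is a minimal, self-contained sketch of a top-k expert layer in PyTorch. The dimensions, expert networks, and routing details below are illustrative only and do not reproduce Mixtral's actual implementation; the toy simply mirrors the routing pattern Mixtral is known to use, namely sending each token to 2 of 8 experts, so only a fraction of the parameters are active per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy top-2 mixture-of-experts layer (illustrative, not Mixtral's code).

    Each token is routed to the 2 highest-scoring experts, so per-token
    compute stays low even though total parameter count grows with the
    number of experts.
    """
    def __init__(self, dim=64, hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)  # produces routing logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, dim)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # keep only the top-2 experts
        weights = F.softmax(weights, dim=-1)           # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

The key point the sketch shows: per token, only 2 of the 8 expert MLPs run, which is why an MoE model can match the quality of a much larger dense model at a fraction of the inference cost.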
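Because the endpoint is OpenAI-compatible, the official `openai` Python client can be pointed at Fireworks directly. Below is a minimal sketch; the base URL and model identifier reflect Fireworks' published conventions but should be verified against the current docs, and the API key is a placeholder.

```python
# pip install openai
from openai import OpenAI

# Point the OpenAI client at Fireworks' OpenAI-compatible endpoint.
# Base URL and model id are assumptions; check the Fireworks docs.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {"role": "user", "content": "Summarize the benefits of sparse mixture-of-experts models."}
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```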