Company
Date Published
Author
-
Word count
1205
Language
English
Hacker News points
None

Summary

Meta's Llama 4 Maverick, an innovative Mixture-of-Experts model, is designed to process both text and images with an expanded context window of 1 million tokens, enhancing its capability to manage extensive data like code repositories and detailed product specifications. Fireworks AI quickly incorporated this model, offering superior performance through optimizations such as FP8 quantization, tensor and expert parallelism, and a custom attention mechanism, allowing it to achieve a streaming throughput of 145 tokens per second, outperforming competitors. The platform supports an OpenAI-compatible function-calling interface, enabling users to execute deterministic function calls effectively. Fireworks AI’s infrastructure ensures efficient asynchronous GPU execution and is the first public API for Llama 4, providing the fastest and most expansive context model with native function calling for developers.