Home / Companies / Deepinfra / Blog / Post Details
Content Deep Dive

Gemma 4 on DeepInfra: Fast & Scalable Open AI Models

Blog post from Deepinfra

Post Details
Company
Date Published
Author
Deep
Word Count
1,488
Language
English
Hacker News Points
-
Summary

Gemma 4, developed by Google DeepMind and available on DeepInfra, is a family of AI models designed to offer significant improvements over its predecessor, Gemma 3, particularly in areas like mathematics, coding, and agentic tasks. The models, ranging from sub-5B edge-optimized variants to a 31B dense model, leverage a Mixture-of-Experts (MoE) architecture, which activates only a fraction of the total parameters during inference, making them efficient and scalable. Notably, the 26B A4B variant outperforms the previous version with nearly tripled scores on benchmarks like AIME 2026 and LiveCodeBench v6. These models support a wide range of capabilities, including native function calling, extensive multimodal processing, and a 256K token context window, all under an Apache 2.0 license that allows for unrestricted commercial use. DeepInfra provides a straightforward pricing model and an OpenAI-compatible API for seamless integration, appealing to developers who require powerful AI solutions without complex infrastructure setups.