Gemma 3 on mobile and web with Google AI Edge
Blog post from Google Cloud
Gemma 3 1B is a compact member of the Gemma family, designed for seamless deployment of small language models (SLMs) across mobile and web platforms with fast performance and broad device compatibility. At just 529MB, the model processes content swiftly and can run fully offline, reducing latency and enhancing privacy by keeping data on the device. Typical applications include data captioning, in-game dialog, smart replies, and document Q&A.

The model is optimized for both CPU and GPU execution, using quantization-aware training and more efficient KV cache operations to improve performance by up to 25% on CPU and 20% on GPU. Developers can also customize and fine-tune the model for specific domains or use cases. Planned enhancements include support for more third-party models and further memory optimizations, extending compatibility to an even wider range of devices.
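On the web, this kind of deployment goes through the MediaPipe LLM Inference API in Google AI Edge. The sketch below shows roughly how Gemma 3 1B could be loaded and prompted in a browser page; the model path, file name, and sampling parameters are illustrative assumptions, not values from the post.

```javascript
// Minimal browser sketch using the MediaPipe LLM Inference API
// (@mediapipe/tasks-genai, part of Google AI Edge).
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the WASM runtime that backs the GenAI tasks.
const genaiFileset = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Create the inference task from a locally hosted model bundle.
// The path below is an assumption -- point it at your own copy of
// the quantized Gemma 3 1B model.
const llm = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: { modelAssetPath: '/models/gemma3-1b-it-int4.task' }, // assumed path
  maxTokens: 512,  // illustrative sampling settings
  topK: 40,
  temperature: 0.8,
});

// Generate a smart-reply style completion, fully on-device.
const reply = await llm.generateResponse(
  'Suggest a short reply to: "Are we still on for lunch?"'
);
console.log(reply);
```

Because inference runs entirely in the browser, no prompt text or generated output leaves the device, which is what enables the offline and privacy benefits described above.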