Gemma 3 on mobile and web with Google AI Edge

Blog post from Google Cloud

Post Details
Company:
Date Published:
Author: Marissa Ikonomidis, T.J. Alumbaugh, Mark Sherwood, and Cormac Brick
Word Count: 1,450
Language: English
Hacker News Points: -
Summary

Gemma 3 1B is a compact model in the Gemma family, sized so that it can be deployed as a small language model (SLM) on mobile and web platforms with fast performance and broad device compatibility. At 529MB, it processes content quickly and runs offline, which reduces latency and improves privacy because data stays on the device. Key applications include data captioning, in-game dialog, smart replies, and document Q&A. The model is optimized for both CPU and GPU: quantization-aware training and more efficient KV cache operations improve performance by up to 25% on CPU and 20% on GPU. Users can also fine-tune the model for a specific domain or use case. Planned work includes support for more third-party models and further memory optimizations, so that the model can run on a wider range of devices.
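
To put the 529MB figure in context, here is a rough back-of-the-envelope sketch of how weight storage scales with the number of bits per parameter. It assumes exactly one billion parameters (the nominal "1B" in the model name) and decimal megabytes. Neither assumption comes from the post, and the function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope estimate of weight storage at different bit widths.
# ASSUMPTION: exactly 1e9 parameters (the nominal "1B"); the real parameter
# count and any file-format overhead are not stated in the summary.
NOMINAL_PARAMS = 1_000_000_000

# Figure quoted in the summary, treated here as decimal megabytes (assumption).
REPORTED_SIZE_MB = 529


def weight_size_mb(num_params: int, bits_per_param: float) -> float:
    """Decimal megabytes needed to store num_params weights at bits_per_param each."""
    return num_params * bits_per_param / 8 / 1_000_000


def implied_bits_per_param(size_mb: float, num_params: int) -> float:
    """Average bits per parameter if size_mb held nothing but num_params weights."""
    return size_mb * 1_000_000 * 8 / num_params


if __name__ == "__main__":
    for bits in (32, 16, 8, 4):
        size = weight_size_mb(NOMINAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {size:,.0f} MB")
    implied = implied_bits_per_param(REPORTED_SIZE_MB, NOMINAL_PARAMS)
    print(f"Implied average at {REPORTED_SIZE_MB} MB: {implied:.2f} bits per parameter")
```

Under these assumptions, the reported size corresponds to a little over 4 bits per parameter on average, which is why low-bit quantization matters for fitting a model of this scale on a phone. The estimate does not establish the actual quantization scheme, which the summary does not specify.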