Gemma 3 on mobile and web with Google AI Edge
Blog post from Google Cloud
Gemma 3 1B is a compact member of the Gemma family, designed for seamless deployment of small language models (SLMs) across mobile and web platforms with fast performance and broad device compatibility. At just 529MB, the model processes content swiftly and can run fully offline, reducing latency and enhancing privacy by keeping data on the device. Typical applications include data captioning, in-game dialog, smart replies, and document Q&A.

The model is optimized for both CPU and GPU execution, using quantization-aware training and more efficient KV cache operations to improve performance by up to 25% on CPU and 20% on GPU. Developers can also customize and fine-tune the model for specific domains or use cases. Planned enhancements include support for more third-party models and further memory optimizations, extending compatibility to an even wider range of devices.
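On the web, this kind of deployment goes through the MediaPipe LLM Inference API in Google AI Edge. The sketch below shows roughly how Gemma 3 1B could be loaded and prompted in a browser page; the model path, file name, and sampling parameters are illustrative assumptions, not values from the post.

```javascript
// Minimal browser sketch using the MediaPipe LLM Inference API
// (@mediapipe/tasks-genai, part of Google AI Edge).
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the WASM runtime that backs the GenAI tasks.
const genaiFileset = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Create the inference task from a locally hosted model bundle.
// The path below is an assumption -- point it at your own copy of
// the quantized Gemma 3 1B model.
const llm = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: { modelAssetPath: '/models/gemma3-1b-it-int4.task' }, // assumed path
  maxTokens: 512,  // illustrative sampling settings
  topK: 40,
  temperature: 0.8,
});

// Generate a smart-reply style completion, fully on-device.
const reply = await llm.generateResponse(
  'Suggest a short reply to: "Are we still on for lunch?"'
);
console.log(reply);
```

Because inference runs entirely in the browser, no prompt text or generated output leaves the device, which is what enables the offline and privacy benefits described above.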