
On-device small language models with multimodality, RAG, and Function Calling

Blog post from Google Cloud

Post Details

Company: Google Cloud
Date Published:
Author: Mark Sherwood, Matthew Chan, Marissa Ikonomidis, and Milen Ferev
Word Count: 1,078
Language: English
Hacker News Points: -
Summary

Google AI Edge has expanded its support for on-device small language models (SLMs), adding more than a dozen new models, including Gemma 3 and Gemma 3n, to the LiteRT Hugging Face community. Gemma 3n is the first multimodal on-device model in the lineup, accepting text, image, video, and audio inputs, and it targets enterprise use cases where mobile devices can accommodate larger models.

The release is complemented by two new libraries that extend what on-device AI can do. The Retrieval Augmented Generation (RAG) library augments prompts with application-specific data, so responses stay grounded in user-relevant information even with limited connectivity; it is currently available on Android, with support for other platforms planned. The Function Calling library lets on-device language models invoke application functions interactively.

In addition, the latest quantization tools offer improved int4 post-training quantization, reducing model size and latency. Development is ongoing to support the latest modalities and expand functionality across platforms, with updates published through the LiteRT Hugging Face community.
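The retrieval-augmented flow the summary describes can be sketched in plain Python. This is a minimal illustration, not the AI Edge RAG library's API (which is exposed to Android apps in Kotlin): the toy character-count embedder and the function names `embed`, `cosine`, `retrieve`, and `build_prompt` are all hypothetical stand-ins for the real embedding model and retrieval components.

```python
import math

def embed(text):
    # Toy bag-of-letters embedding; a real RAG pipeline would use a
    # neural embedding model running on-device.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank application-data chunks by similarity to the query, keep top-k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Prepend the retrieved chunks so the model's answer is grounded
    # in application-specific data.
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Because every step runs locally, the augmented prompt can be assembled and answered without connectivity, which is the property the summary highlights.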
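The function-calling pattern can likewise be sketched in Python under stated assumptions: the model emits a structured call (a function name plus arguments), and the application matches it against functions it has declared and executes the match. The registry, the `get_battery_level` function, and the JSON call format below are hypothetical illustrations, not the AI Edge Function Calling library's actual Kotlin API.

```python
import json

# Hypothetical registry of app functions the model is allowed to call.
REGISTRY = {}

def register(name):
    # Decorator that declares a function to the model by name.
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("get_battery_level")
def get_battery_level():
    # Stub value for illustration; a real app would query the OS.
    return {"level": 87}

def dispatch(model_output):
    # Assume the model emits a JSON call such as
    # {"name": "get_battery_level", "args": {}}.
    call = json.loads(model_output)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown function: {call['name']}")
    return fn(**call.get("args", {}))
```

The function's return value would then be fed back into the model's context so it can produce a grounded, interactive response.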