
On-device small language models with multimodality, RAG, and Function Calling

Blog post from Google Cloud

Post Details

Company: Google Cloud
Date Published:
Author: Mark Sherwood, Matthew Chan, Marissa Ikonomidis, and Milen Ferev
Word Count: 1,078
Language: English
Hacker News Points: -
Summary

Google AI Edge has expanded its support for on-device small language models (SLMs), adding more than a dozen new models, including Gemma 3 and Gemma 3n, to the LiteRT Hugging Face community. Gemma 3n is the first multimodal on-device model in the lineup, accepting text, image, video, and audio inputs, and it targets enterprise use cases where mobile devices can accommodate larger models.

The release is complemented by two new libraries that extend what on-device AI can do. The Retrieval Augmented Generation (RAG) library augments prompts with application-specific data, so responses stay grounded in user-relevant information even with limited connectivity; it is currently available on Android, with support for other platforms planned. The Function Calling library lets on-device language models invoke application functions interactively.

In addition, the latest quantization tools offer improved int4 post-training quantization, reducing model size and latency. Development is ongoing to support the latest modalities and expand functionality across platforms, with updates published through the LiteRT Hugging Face community.
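The retrieval-augmented flow the summary describes can be sketched in plain Python. This is a minimal illustration, not the AI Edge RAG library's API (which is exposed to Android apps in Kotlin): the toy character-count embedder and the function names `embed`, `cosine`, `retrieve`, and `build_prompt` are all hypothetical stand-ins for the real embedding model and retrieval components.

```python
import math

def embed(text):
    # Toy bag-of-letters embedding; a real RAG pipeline would use a
    # neural embedding model running on-device.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank application-data chunks by similarity to the query, keep top-k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Prepend the retrieved chunks so the model's answer is grounded
    # in application-specific data.
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Because every step runs locally, the augmented prompt can be assembled and answered without connectivity, which is the property the summary highlights.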
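The function-calling pattern can likewise be sketched in Python under stated assumptions: the model emits a structured call (a function name plus arguments), and the application matches it against functions it has declared and executes the match. The registry, the `get_battery_level` function, and the JSON call format below are hypothetical illustrations, not the AI Edge Function Calling library's actual Kotlin API.

```python
import json

# Hypothetical registry of app functions the model is allowed to call.
REGISTRY = {}

def register(name):
    # Decorator that declares a function to the model by name.
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("get_battery_level")
def get_battery_level():
    # Stub value for illustration; a real app would query the OS.
    return {"level": 87}

def dispatch(model_output):
    # Assume the model emits a JSON call such as
    # {"name": "get_battery_level", "args": {}}.
    call = json.loads(model_output)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown function: {call['name']}")
    return fn(**call.get("args", {}))
```

The function's return value would then be fed back into the model's context so it can produce a grounded, interactive response.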