Google Boosts LiteRT and Gemini Nano for On-Device AI Efficiency
Blog post from SSOJet
The latest release of LiteRT, formerly TensorFlow Lite, brings several enhancements for on-device machine learning: a simplified API, improved GPU acceleration, and support for Qualcomm NPUs. The goal is to run AI models faster on mobile devices while reducing power consumption. The release features MLDrift, a new GPU acceleration implementation that improves performance for models such as CNNs and Transformers, and the TensorBuffer API, which minimizes unnecessary data transfers between GPU and CPU memory. Asynchronous execution is also supported, so work can run concurrently across different processors.

LiteRT now also supports small language models (SLMs) with multimodality, including the Gemma 3 models, optimized for mobile and web applications. It brings Retrieval Augmented Generation (RAG) to on-device use, so that SLMs can be enhanced with application-specific data. Google is also preparing to announce new ML Kit APIs that will let developers use Gemini Nano for on-device AI features, offering a more consistent mobile AI experience with less reliance on cloud resources.

The post closes by pointing to SSOJet's own API-first platform, which facilitates secure SSO and user management for enterprises.
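To make the RAG idea mentioned in the summary concrete, here is a minimal sketch in plain Python. It is not LiteRT's or Google's API: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the bag-of-words similarity, and the prompt layout are all assumptions chosen for illustration. A real on-device pipeline would typically use a learned embedding model and pass the resulting prompt to an SLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query.

    Bag-of-words counts stand in for embeddings here (an assumption
    made to keep the sketch dependency-free).
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved application-specific snippets to the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical application data; in practice this would come from the app.
    docs = [
        "The return window is 30 days from delivery.",
        "Shipping takes 3 to 5 business days.",
        "Our office is closed on public holidays.",
    ]
    print(build_prompt("How many days is the return window?", docs, k=1))
```

The design point is that the model itself is unchanged: application-specific knowledge is injected at query time through the prompt, which is why this approach suits small on-device models.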
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Local AI | 3 | 41 | 16 | 9 | +32% |
| RAG | 3 | 899 | 167 | 74 | -45% |
| AI Model Fine-tuning | 1 | 671 | 147 | 64 | -4% |
| Edge Computing | 1 | 23 | 14 | 13 | -65% |
| Real-time | 1 | 3,344 | 937 | 222 | -51% |