Building with Gemini Embedding 2: Agentic multimodal RAG and beyond

Blog post from Google Cloud

Post Details
Company: Google Cloud
Author: Patrick Löber, Lucia Loher, Roberto Santana, and Mojtaba Seyedhosseini
Word Count: 1,031
Language: English
Summary

Gemini Embedding 2, now generally available through the Gemini API and Gemini Enterprise Agent Platform, is an embedding model that maps text, images, video, audio, and documents into a single shared embedding space, with support for over 100 languages. It accepts mixed inputs in a single call, so developers can build applications that "see" and "hear" complex, real-world data, and it delivers improved accuracy in tasks such as agentic multimodal retrieval-augmented generation (RAG) and visual search.

Because the model can embed interleaved inputs, where different data types are combined in one request, it produces a richer representation of the data. That representation can be used for multimodal search, search reranking, and anomaly detection. Early adopters report gains after integrating Gemini Embedding 2: Harvey saw higher accuracy in legal research, and Supermemory in conceptual search. The model also lets users reduce the dimensionality of its vectors, which lowers storage costs without, according to the post, compromising accuracy. Taken together, the post presents Gemini Embedding 2 as a foundation for applications that need to understand and process complex, mixed-format data.
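The retrieval step behind multimodal RAG and visual search can be sketched as follows, assuming every item has already been embedded into the shared space. This is a minimal illustration, not the model's API: the vectors, file names, and the `cosine` and `search` helpers are hypothetical, and real embeddings from Gemini Embedding 2 would be far longer than four values. Because all modalities share one space, a text query can be scored directly against documents, images, and audio with the same similarity function.

```python
import math

# Hypothetical stand-in vectors. In a real system each vector would come from
# the embedding model; these 4-dimensional toys only illustrate ranking.
CORPUS = [
    {"id": "contract.pdf", "modality": "document", "vec": [0.9, 0.1, 0.0, 0.1]},
    {"id": "site_photo.jpg", "modality": "image", "vec": [0.1, 0.8, 0.2, 0.0]},
    {"id": "call.mp3", "modality": "audio", "vec": [0.7, 0.2, 0.1, 0.3]},
    {"id": "notes.txt", "modality": "text", "vec": [0.0, 0.1, 0.9, 0.2]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, corpus, k=2):
    """Return the k items most similar to the query, regardless of modality."""
    scored = [
        (cosine(query_vec, item["vec"]), item["id"], item["modality"])
        for item in corpus
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    query = [0.8, 0.15, 0.05, 0.2]  # hypothetical embedding of a text question
    for score, doc_id, modality in search(query, CORPUS):
        print(f"{score:.3f}  {doc_id} ({modality})")
```

In a RAG pipeline, the top-ranked items would then be handed to a generative model as context; a reranking stage could rescore this shortlist with a more expensive model before generation.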
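The storage savings from dimensionality reduction are easy to estimate. The sketch below assumes Matryoshka-style vectors, where a prefix of the full vector is itself a usable embedding once it is renormalized to unit length; the summary does not say which reduction method Gemini Embedding 2 uses, and the 3,072 and 768 dimension counts are hypothetical examples. The helper names `truncate_and_normalize` and `storage_bytes` are illustrative only.

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale them to unit length.

    Assumption: leading dimensions carry most of the signal (Matryoshka-style).
    This is not a documented detail of Gemini Embedding 2.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for float32 vectors, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    # Hypothetical sizes: one million vectors at 3,072 vs. 768 dimensions.
    full = storage_bytes(1_000_000, 3072)
    reduced = storage_bytes(1_000_000, 768)
    print(f"full: {full:,} bytes, reduced: {reduced:,} bytes, ratio: {full / reduced}")
    print(truncate_and_normalize([3.0, 4.0, 12.0], 2))
```

Cutting each float32 vector from 3,072 to 768 values reduces raw storage fourfold (about 12.3 GB to about 3.1 GB for a million vectors). How much retrieval accuracy survives at a given size depends on the model and is worth measuring on your own data.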