Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more
Blog post from Google Cloud
Google has released updated versions of its Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, offering improved performance and significantly lower costs. The updates include price reductions of more than 50% for input and output tokens on prompts under 128K, higher rate limits, faster output, and reduced latency, making the models more efficient and cost-effective for tasks such as synthesizing information from lengthy PDFs, answering complex questions about code, and creating content from video.

The updated models perform better on math, long-context, and vision tasks, with gains on benchmarks such as MMLU-Pro, MATH, and HiddenMath, as well as improvements in visual understanding and Python code generation. They also respond more concisely, producing shorter default outputs for tasks like summarization and question answering while maintaining content safety standards.

Google has additionally announced a new experimental model, Gemini-1.5-Flash-8B-Exp-0924, which promises enhanced performance across a range of use cases and is accessible via Google AI Studio and the Gemini API. These advances reflect Google's commitment to incorporating developer feedback and optimizing its experimental-to-production release pipeline.
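As a rough illustration of how the updated models are reached through the Gemini API, the sketch below builds a `generateContent` request for Gemini-1.5-Flash-002 against the public REST endpoint using only the Python standard library. The model ID comes from the announcement; the prompt text, the `GOOGLE_API_KEY` environment variable, and the `build_request` helper are illustrative assumptions, not part of the post.

```python
# Sketch: addressing the updated -002 models via the Gemini API REST
# endpoint. Assumes an API key created in Google AI Studio is exported
# as GOOGLE_API_KEY; the prompt is a made-up example.
import json
import os
import urllib.request

MODEL = "gemini-1.5-flash-002"  # updated production model from the announcement
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble a generateContent POST request without sending it."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    api_key = os.environ.get("GOOGLE_API_KEY", "")
    req = build_request(
        "Summarize the key findings of this report in two sentences.", api_key
    )
    if api_key:  # only hit the network when a real key is configured
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
            print(body["candidates"][0]["content"]["parts"][0]["text"])
```

Swapping `MODEL` for `gemini-1.5-pro-002` targets the Pro variant instead; the request shape is identical for both models.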