Google has announced Gemini 2.0, the next generation of its large language models (LLMs), designed for what the company calls "the agentic era." This significant upgrade builds upon Gemini 1.5, delivering substantial improvements in performance and capabilities.
Gemini 2.0 Flash: Key Features
The first model in the Gemini 2.0 family, Gemini 2.0 Flash, is now available in Google AI Studio and Vertex AI. Key enhancements include:
- Enhanced Performance: Runs at twice the speed of Gemini 1.5 Flash while keeping response latency low.
- Multimodal Capabilities: Accepts image, text, video, and audio input, and can produce multimodal output, including images combined with text and multilingual text-to-speech audio.
- Google Search Integration: Natively accesses Google Search for up-to-date information.
- Native Tool Use: Supports code execution as well as calling third-party, user-defined functions.
- Multimodal Live API: A new API for developers to integrate multimodal capabilities into their applications.
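As a rough sketch of how a developer might call Gemini 2.0 Flash, the snippet below builds a text-generation request for Google's Gemini REST API. The endpoint path, model identifier, and request shape here are assumptions based on Google's published API conventions, not taken from this announcement; consult the official Gemini API documentation before relying on them.

```python
import json

# Assumed endpoint and model name; verify against the official Gemini API docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-2.0-flash"

def build_generate_request(prompt: str, api_key: str) -> tuple[str, dict]:
    """Build the URL and JSON body for a generateContent call (text-only)."""
    url = f"{BASE_URL}/models/{MODEL}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body

url, body = build_generate_request(
    "Summarize Gemini 2.0 Flash in one sentence.", "YOUR_API_KEY"
)
print(json.dumps(body, indent=2))
# The request could then be sent with, e.g., requests.post(url, json=body).
```

The same pattern extends to multimodal input by adding image or audio parts alongside the text part in `contents`; the Multimodal Live API mentioned above uses a separate streaming interface.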
A chat-optimized version of Gemini 2.0 Flash will soon be available in desktop and mobile browsers, with support in the Gemini mobile app to follow.
Research Prototypes: Expanding Gemini 2.0's Capabilities
Google has also updated its research prototypes with Gemini 2.0, showcasing its versatility and potential:
- Project Astra: Improved dialogue, reasoning, and native support for Google Search, Lens, and Maps. Offers up to 10 minutes of in-session memory.
- Project Mariner: Completes tasks in the browser through an experimental Chrome extension, understanding complex instructions and reading information directly from the screen.
- Jules (AI Code Assistant): Integrates directly into GitHub workflows, assisting developers with code challenges and providing solutions under supervision.
- AI Game Agents: Can navigate video games, offering real-time suggestions based on on-screen actions.
Gemini 2.0 represents a significant leap in AI capabilities, opening up new possibilities for developers and users alike.