Google Unveils PaliGemma 2 Vision-Language Models
Google introduces PaliGemma 2, its next-generation vision-language model, with enhanced capabilities and a range of model sizes and input resolutions.
Google has announced PaliGemma 2, the successor to its PaliGemma vision-language model. The new models come in three sizes (3 billion, 10 billion, and 28 billion parameters) and three input resolutions (224px, 448px, and 896px), allowing greater flexibility and customization.

Google reports improved performance in several areas, including chemical formula recognition, music score recognition, spatial reasoning, chest X-ray report generation, and detailed, contextually relevant image captioning.

The models are designed as drop-in replacements for the original PaliGemma, so existing code should need few modifications. Pre-trained models are available for free download and experimentation on Hugging Face and Kaggle. PaliGemma 2 supports multiple frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp. According to Google, this flexibility simplifies fine-tuning for specific tasks and datasets, letting users tailor the model to their exact requirements.
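Because PaliGemma 2 is meant as a drop-in replacement, moving a Hugging Face Transformers script from PaliGemma to PaliGemma 2 should mostly come down to changing the checkpoint ID. The sketch below illustrates this under several assumptions. It assumes the checkpoints follow the naming pattern google/paligemma2-{size}-pt-{resolution} (and google/paligemma-3b-pt-{resolution} for the original model). It also assumes that both generations load through the PaliGemmaProcessor and PaliGemmaForConditionalGeneration classes in a recent Transformers release, and that "caption en" is a suitable task prompt. The helper names checkpoint_id and caption are hypothetical. Check the model cards on Hugging Face, where the checkpoints are gated behind a license acceptance, before relying on any of this.

```python
# Sketch only: checkpoint naming, prompt format, and class names are assumptions
# to be checked against the Hugging Face model cards.

SIZES = ("3b", "10b", "28b")      # PaliGemma 2 parameter counts
RESOLUTIONS = (224, 448, 896)     # supported input resolutions in pixels


def checkpoint_id(size: str = "3b", resolution: int = 224, version: int = 2) -> str:
    """Build a pretrained ("pt") checkpoint ID (hypothetical helper, assumed naming)."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if version == 1:
        # The original PaliGemma was released only as a 3B model.
        if size != "3b":
            raise ValueError("PaliGemma 1 only has a 3B checkpoint")
        return f"google/paligemma-3b-pt-{resolution}"
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    return f"google/paligemma2-{size}-pt-{resolution}"


def caption(image_path: str, model_id: str, max_new_tokens: int = 30) -> str:
    """Caption a local image (hypothetical helper); needs transformers, torch, Pillow."""
    # Imported lazily so checkpoint_id works without these packages installed.
    from PIL import Image
    from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

    # The same classes load either generation; only model_id changes.
    processor = PaliGemmaProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    # Assumed prompt: an <image> placeholder followed by the "caption en" task prefix.
    inputs = processor(text="<image>caption en", images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the generated continuation.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For example, caption("example.jpg", checkpoint_id("3b", 224, version=1)) would caption a hypothetical local image with the original PaliGemma, and caption("example.jpg", checkpoint_id("10b", 448)) would do the same with a PaliGemma 2 checkpoint, with no other code changes.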