Vision use cases with Llama 3.2 11B and 90B models from Meta
AWS Machine Learning
SEPTEMBER 25, 2024
The 11B and 90B models are multimodal: they support text in/text out, and text+image in/text out. The 11B and 90B are the first Llama models to support vision tasks, with a new model architecture that integrates image encoder representations into the language model.

Overview of Llama 3.2 11B and 90B Vision models
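Before looking at the models in detail, here is a minimal sketch of what the text+image in/text out interface looks like in practice, using the Amazon Bedrock Converse API from Python. The model ID (us.meta.llama3-2-90b-instruct-v1:0), the Region, and the image file name are assumptions for illustration, not details from this post; adjust them for your own account.

```python
import boto3

# Bedrock Runtime client; the Region is an assumption
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Read a local image to send alongside the text prompt (hypothetical file)
with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = client.converse(
    # Assumed cross-region inference profile ID for the 90B Vision model
    modelId="us.meta.llama3-2-90b-instruct-v1:0",
    messages=[
        {
            "role": "user",
            # A single user turn can mix text and image content blocks
            "content": [
                {"text": "Describe the trend shown in this chart."},
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            ],
        }
    ],
)

# The reply is a list of content blocks; print the text block
print(response["output"]["message"]["content"][0]["text"])
```

Because the Converse API accepts text and image blocks in the same message, the text-only and text+image use cases share one request shape; the text out response is handled identically in both.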