Choosing the right model
LLM Vision is compatible with multiple providers, each of which has different models available. Some providers run in the cloud, while others are self-hosted. To see which model is best for your use case, check the figure below. It visualizes the averaged scores of available cloud-based models. The higher the score, the more accurate the output.
Claude 3.5 Sonnet achieves strong performance, comparable to GPT-4o, on the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU)¹ benchmark, while being 40% less expensive. This makes it the go-to model for most use cases.
Available Models
| Model Name | Model ID | Provider | MMMU¹ Score |
|---|---|---|---|
| GPT-4o | | OpenAI | 69.1 |
| GPT-4o-mini | | OpenAI | 59.4 |
| Claude 3.5 Sonnet | | Anthropic | 68.3 |
| Claude 3 Opus | | Anthropic | 59.4 |
| Claude 3 Sonnet | | Anthropic | 53.1 |
| Claude 3 Haiku | | Anthropic | 50.2 |
| Gemini 1.5 Pro | | Google | 62.2 |
| Gemini 1.5 Flash | | Google | 56.1 |
| LLaVA-1.6 | Ollama: | Self-hosted | 43.8 |
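If you want to shortlist models programmatically, for example to find the best option from a provider you already have an API key for, the scores in the table can be filtered and sorted. The following is a minimal Python sketch, not part of LLM Vision itself: the `Model` class and `rank_models` helper are hypothetical names, the scores are copied from the table, and the "Google" provider label for the Gemini models is filled in by us (the provider column for those rows was missing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record for one row of the 'Available Models' table."""
    name: str
    provider: str
    mmmu_score: float


# MMMU scores copied from the table (higher is better).
# "Google" for the Gemini rows is an assumption added for this sketch.
MODELS = [
    Model("GPT-4o", "OpenAI", 69.1),
    Model("GPT-4o-mini", "OpenAI", 59.4),
    Model("Claude 3.5 Sonnet", "Anthropic", 68.3),
    Model("Claude 3 Opus", "Anthropic", 59.4),
    Model("Claude 3 Sonnet", "Anthropic", 53.1),
    Model("Claude 3 Haiku", "Anthropic", 50.2),
    Model("Gemini 1.5 Pro", "Google", 62.2),
    Model("Gemini 1.5 Flash", "Google", 56.1),
    Model("LLaVA-1.6", "Self-hosted", 43.8),
]


def rank_models(models, providers=None, min_score=0.0):
    """Return models sorted by MMMU score, highest first.

    providers: optional set of provider names to keep (None keeps all).
    min_score: drop models scoring below this value.
    Ties keep their table order because Python's sort is stable.
    """
    candidates = [
        m for m in models
        if (providers is None or m.provider in providers)
        and m.mmmu_score >= min_score
    ]
    return sorted(candidates, key=lambda m: m.mmmu_score, reverse=True)


if __name__ == "__main__":
    # Print every model scoring at least 60 on MMMU.
    for m in rank_models(MODELS, min_score=60.0):
        print(f"{m.name:<20} {m.provider:<10} {m.mmmu_score}")
```

For instance, `rank_models(MODELS, providers={"Anthropic"})[0]` selects Claude 3.5 Sonnet. Keep in mind that a single benchmark score does not capture cost or latency, so treat the ranking as a starting point rather than a final decision.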