Choosing the right model
LLM Vision is compatible with multiple providers, each of which offers different models. Some providers run in the cloud, while others are self-hosted. To see which model is best for your use case, check the table below, which lists the MMMU score of each available model. The higher the score, the more accurate the output.
Claude 3.5 Sonnet achieves strong performance, comparable to GPT-4o, on the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark, while being 40% less expensive. This makes it the go-to model for most use cases.
Model Name | Model ID | Provider | MMMU Score |
---|---|---|---|
GPT-4o | gpt-4o | OpenAI | 69.1 |
GPT-4o-mini | gpt-4o-mini | OpenAI | 59.4 |
Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 | Anthropic | 68.3 |
Claude 3 Opus | claude-3-opus-20240229 | Anthropic | 59.4 |
Claude 3 Sonnet | claude-3-sonnet-20240229 | Anthropic | 53.1 |
Claude 3 Haiku | claude-3-haiku-20240307 | Anthropic | 50.2 |
Gemini 1.5 Pro | gemini-1.5-pro | Google | 62.2 |
Gemini 1.5 Flash | gemini-1.5-flash | Google | 56.1 |
LLaVA-1.6 | Ollama: llava / LocalAI: gpt-4-vision-preview | Self-hosted | 43.8 |
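
If you want to apply the "higher score, lower cost" trade-off programmatically, for example in a small script that decides which model ID to put into your configuration, the table can be expressed as a simple lookup. The snippet below is only an illustrative sketch: the `MODELS` dictionary and the `pick_model` helper are hypothetical and not part of LLM Vision; the model IDs, providers, and scores are copied from the table above.

```python
# Hypothetical helper for choosing a model ID from the comparison table.
# The data is copied from the table above; the selection logic is a sketch.
MODELS = {
    "gpt-4o":                     {"provider": "OpenAI",      "mmmu": 69.1},
    "gpt-4o-mini":                {"provider": "OpenAI",      "mmmu": 59.4},
    "claude-3-5-sonnet-20240620": {"provider": "Anthropic",   "mmmu": 68.3},
    "claude-3-opus-20240229":     {"provider": "Anthropic",   "mmmu": 59.4},
    "claude-3-sonnet-20240229":   {"provider": "Anthropic",   "mmmu": 53.1},
    "claude-3-haiku-20240307":    {"provider": "Anthropic",   "mmmu": 50.2},
    "gemini-1.5-pro":             {"provider": "Google",      "mmmu": 62.2},
    "gemini-1.5-flash":           {"provider": "Google",      "mmmu": 56.1},
    "llava":                      {"provider": "Self-hosted", "mmmu": 43.8},
}

def pick_model(min_score: float = 0.0, providers: set[str] | None = None) -> str:
    """Return the highest-scoring model ID that passes the given filters."""
    candidates = {
        model_id: info["mmmu"]
        for model_id, info in MODELS.items()
        if info["mmmu"] >= min_score
        and (providers is None or info["provider"] in providers)
    }
    if not candidates:
        raise ValueError("No model matches the given filters")
    return max(candidates, key=candidates.get)

# Examples: best self-hosted model, and best Anthropic model scoring above 60
print(pick_model(providers={"Self-hosted"}))              # llava
print(pick_model(min_score=60, providers={"Anthropic"}))  # claude-3-5-sonnet-20240620
```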