Choosing the right model

LLM Vision supports multiple providers, each offering a different set of models. Some providers run in the cloud; others are self-hosted. To find the best model for your use case, see the figure below, which shows the averaged scores of the available cloud-based models. A higher score indicates more accurate output.

Claude 3.5 Sonnet performs strongly on the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark [1], scoring close to GPT-4o while being 40% less expensive. This makes it the go-to model for most use cases.
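To put "close to GPT-4o" in numbers, here is a quick check using the MMMU scores from the Available Models table (GPT-4o: 69.1, Claude 3.5 Sonnet: 68.3). It is plain arithmetic on those two values and says nothing about price or other benchmarks:

```latex
\[
69.1 - 68.3 = 0.8 \text{ points}, \qquad \frac{0.8}{69.1} \approx 0.012 \approx 1.2\%
\]
```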

Available Models

Model Name         Model ID                       Provider     MMMU Score [1]
GPT-4o             gpt-4o                         OpenAI       69.1
GPT-4o-mini        gpt-4o-mini                    OpenAI       59.4
Claude 3.5 Sonnet  claude-3-5-sonnet-20240620     Anthropic    68.3
Claude 3 Opus      claude-3-opus-20240229         Anthropic    59.4
Claude 3 Sonnet    claude-3-sonnet-20240229       Anthropic    53.1
Claude 3 Haiku     claude-3-haiku-20240307        Anthropic    50.2
Gemini 1.5 Pro     gemini-1.5-pro                 Google       62.2
Gemini 1.5 Flash   gemini-1.5-flash               Google       56.1
LLaVA-1.6          Ollama: llava                  Self-hosted  43.8
                   LocalAI: gpt-4-vision-preview
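If you want to apply this table programmatically, for example to pick the highest-scoring model from a provider you already have an API key for, the Python sketch below may help. It is a hypothetical helper, not part of LLM Vision: the names `ModelEntry`, `MODELS`, `best_model` and `ranked` are illustrative only, and the sketch ranks models purely by the MMMU scores listed above.

```python
# Hypothetical helper (not part of LLM Vision): ranks the models from the
# "Available Models" table by MMMU score only. Price is ignored.
from typing import List, NamedTuple, Optional


class ModelEntry(NamedTuple):
    name: str
    model_id: str
    provider: str
    mmmu: float


# Values copied from the "Available Models" table. The table groups LLaVA-1.6
# under "Self-hosted"; it is listed once per backend here because Ollama and
# LocalAI use different model IDs for it.
MODELS: List[ModelEntry] = [
    ModelEntry("GPT-4o", "gpt-4o", "OpenAI", 69.1),
    ModelEntry("GPT-4o-mini", "gpt-4o-mini", "OpenAI", 59.4),
    ModelEntry("Claude 3.5 Sonnet", "claude-3-5-sonnet-20240620", "Anthropic", 68.3),
    ModelEntry("Claude 3 Opus", "claude-3-opus-20240229", "Anthropic", 59.4),
    ModelEntry("Claude 3 Sonnet", "claude-3-sonnet-20240229", "Anthropic", 53.1),
    ModelEntry("Claude 3 Haiku", "claude-3-haiku-20240307", "Anthropic", 50.2),
    ModelEntry("Gemini 1.5 Pro", "gemini-1.5-pro", "Google", 62.2),
    ModelEntry("Gemini 1.5 Flash", "gemini-1.5-flash", "Google", 56.1),
    ModelEntry("LLaVA-1.6", "llava", "Ollama", 43.8),
    ModelEntry("LLaVA-1.6", "gpt-4-vision-preview", "LocalAI", 43.8),
]


def best_model(provider: Optional[str] = None) -> ModelEntry:
    """Return the highest-scoring model, optionally limited to one provider."""
    candidates = [m for m in MODELS if provider is None or m.provider == provider]
    if not candidates:
        raise ValueError(f"no models listed for provider {provider!r}")
    # max() returns the first entry on ties (e.g. the two 59.4 scores).
    return max(candidates, key=lambda m: m.mmmu)


def ranked() -> List[ModelEntry]:
    """All models, highest MMMU score first; equal scores keep table order."""
    return sorted(MODELS, key=lambda m: m.mmmu, reverse=True)


if __name__ == "__main__":
    for m in ranked():
        print(f"{m.mmmu:5.1f}  {m.model_id:<28} {m.provider}")
    print("Best Anthropic model:", best_model("Anthropic").model_id)
```

With these values, `best_model()` returns GPT-4o and `best_model("Anthropic")` returns Claude 3.5 Sonnet. Because the sketch looks at scores alone, it does not reproduce the recommendation above, which also weighs Claude 3.5 Sonnet's lower price.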
