Model Selection Guide
Determine which large language model to use
Choosing a model depends on the following:
Context Window: the number of tokens you can provide to an LLM in a single request (~1 token ≈ ~4 characters).
Task Complexity: more capable models are generally better suited for complex logic.
Web Access: whether the use case you're building requires the model to have web access.
Cost: more capable models are generally more expensive; for example, o1 is more expensive than GPT-4o.
Speed: more capable models are generally slower to execute.
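As a quick sanity check, the ~4 characters per token rule of thumb above can be turned into a rough estimator. This is an approximation only; exact counts depend on each model's tokenizer:

```python
def approx_tokens(text: str) -> int:
    # Rough estimate using the ~4 characters per token rule of thumb.
    # Real counts vary by model and tokenizer.
    return max(1, round(len(text) / 4))

prompt = "Summarize the quarterly report in three bullet points."
print(approx_tokens(prompt))
```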
| Model | Provider | Description | Context Window | Function Calling | Vision | Web Access |
| --- | --- | --- | --- | --- | --- | --- |
| O1 | OpenAI | Advanced multi-step reasoning for complex tasks | 200K | - | - | - |
| O3 Mini | OpenAI | Small reasoning model optimized for complex tasks | 200K | - | - | - |
| GPT-4o | OpenAI | Flagship for complex tasks, vision-capable | 128K | ✓ | ✓ | - |
| GPT-4o Mini | OpenAI | Fast and intelligent model for lightweight tasks | 128K | ✓ | ✓ | - |
| Claude 3.5 Sonnet | Anthropic | Flagship intelligent model for complex tasks | 200K | ✓ | - | - |
| Perplexity Sonar | Perplexity | Intelligent model for online web research | 128K | - | - | ✓ |
| Gemini Pro 2.0 | Google | Flagship for complex tasks, vision-capable | 2M | - | ✓ | ✓ |
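The context windows in the table above can be encoded for a quick fit check before sending a prompt. This is a minimal sketch; the dictionary keys are illustrative labels, not exact API model identifiers:

```python
# Context windows from the table above, in tokens.
CONTEXT_WINDOWS = {
    "o1": 200_000,
    "o3-mini": 200_000,
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "claude-3-5-sonnet": 200_000,
    "perplexity-sonar": 128_000,
    "gemini-2.0-pro": 2_000_000,
}

def fits(model: str, prompt_tokens: int, reply_budget: int = 4_000) -> bool:
    """True if the prompt plus an expected reply fits in the model's window."""
    return prompt_tokens + reply_budget <= CONTEXT_WINDOWS[model]
```

A long document that overflows GPT-4o's 128K window may still fit comfortably in Gemini Pro 2.0's 2M window.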
GPT Models (4o, 4.1): Optimized for general-purpose tasks with excellent instruction following. GPT-4.1 excels with long contexts (1M tokens), while GPT-4o has variants for realtime speech, text-to-speech, and speech-to-text. GPT-4.1 also comes in mini and nano variants, while GPT-4o has a mini variant; these variants are cheaper and faster than their full-size counterparts. Both families are strong at structured output.
O-series Models (o3, o4-mini): Specialized for deep reasoning and step-by-step problem solving. These models excel at complex, multi-stage tasks requiring logical thinking and tool use. Choose these when accuracy and reasoning depth are paramount. They also accept an optional reasoning_effort parameter (low, medium, or high), which lets you control how many tokens are spent on reasoning. o4-mini is also well suited to validating factual accuracy and citation correctness.
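For example, a request that sets reasoning_effort might look like the following. This is a sketch assuming the OpenAI Python SDK's Chat Completions request shape, shown as a plain payload dict so no API key is needed; the prompt text is illustrative:

```python
# Sketch of a request that sets reasoning_effort on an o-series model.
payload = {
    "model": "o3-mini",
    "reasoning_effort": "low",  # one of "low", "medium", "high"
    "messages": [
        {"role": "user", "content": "Plan a three-step database migration."},
    ],
}

# With a configured client this would be sent as:
#   from openai import OpenAI
#   OpenAI().chat.completions.create(**payload)
print(payload["reasoning_effort"])
```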
The cost to run a model depends on the number of input and output tokens.
Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into a tokenizer.
Output tokens: to approximate the total output tokens, copy and paste your output into a tokenizer.
OpenAI: divide the input and output tokens by 1,000, then multiply by their respective per-1K costs.*
Anthropic: divide the input and output tokens by 1,000,000, then multiply by their respective per-1M costs.*
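The arithmetic above can be sketched as follows. The dollar rates passed in are placeholder values, so substitute the current prices from your provider's price list:

```python
def openai_cost(input_tokens, output_tokens, in_per_1k, out_per_1k):
    # OpenAI-style pricing: rates quoted per 1,000 tokens.
    return (input_tokens / 1_000) * in_per_1k + (output_tokens / 1_000) * out_per_1k

def anthropic_cost(input_tokens, output_tokens, in_per_1m, out_per_1m):
    # Anthropic-style pricing: rates quoted per 1,000,000 tokens.
    return (input_tokens / 1_000_000) * in_per_1m + (output_tokens / 1_000_000) * out_per_1m

# 2,000 input + 500 output tokens at placeholder per-1K rates:
print(round(openai_cost(2_000, 500, 0.01, 0.03), 4))
# 1M input + 200K output tokens at placeholder per-1M rates:
print(round(anthropic_cost(1_000_000, 200_000, 3.00, 15.00), 2))
```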
*This is the cost if you . If you choose to use AirOps hosted models, you will be .