Model Selection Guide
Determine which large language model to use
Which Model Should I Use?

What to Consider
Choosing a model depends on the following:
Context Window: the number of tokens you can provide to an LLM in a single request. Roughly, 1 token ≈ 4 characters (see the quick estimate sketched after this list).
Task Complexity: more capable models are generally better suited for complex logic.
Web Access: whether the use case you're building requires the model to have web access.
Cost: more capable models are generally more expensive. For example, o1 is more expensive than GPT-4o.
Speed: more capable models are generally slower to execute.
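As a quick illustration of the ~4 characters per token rule of thumb, the sketch below estimates a token count from character count and checks it against a context window; the prompt text and the 128K limit are placeholders, not values from this guide.

```python
def approximate_tokens(text: str) -> int:
    """Rough estimate based on the ~4 characters per token rule of thumb."""
    return len(text) // 4

# Placeholder prompt; in practice this would be your full system + user prompt.
prompt = "Summarize the key findings of the attached quarterly report in five bullet points."

estimated = approximate_tokens(prompt)
context_window = 128_000  # e.g. a 128K-token model

# Leave headroom for the model's output tokens, not just the input.
print(f"~{estimated} tokens; fits in context window: {estimated < context_window}")
```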
AirOps Popular LLMs
| Model | Provider | Description | Context Window |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-5 | OpenAI | Flagship model for complex tasks | 400K | ✓ | ✓ | ✓ |
| GPT-4.1 | OpenAI | For complex tasks, vision-capable | 1M | ✓ | ✓ | - |
| GPT-4o Search Preview | OpenAI | Flagship model for online web research | 128K | ✓ | ✓ | ✓ |
| o4-mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | - | ✓ | - |
| o3 | OpenAI | Advanced reasoning for complex tasks | 128K | - | ✓ | - |
| o3-mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | - | ✓ | - |
| Claude Opus 4.1 | Anthropic | Powerful model for complex and writing tasks | 200K | ✓ | - | - |
| Claude Sonnet 4 | Anthropic | Hybrid reasoning: fast answers or deep thinking | 200K | ✓ | - | - |
| Gemini 2.5 Pro | Google | Advanced reasoning for complex tasks | 1M | ✓ | ✓ | ✓ |
| Gemini 2.5 Flash | Google | Fast and intelligent model for lightweight tasks | 1M | ✓ | ✓ | ✓ |
| Perplexity Sonar | Perplexity | Balanced model for online web research | 128K | - | ✓ | ✓ |
Differences between “o-series” and “GPT” models
GPT-5 Series: Built-In Reasoning
GPT-5 Models: OpenAI's first model series to combine the reasoning paradigm with traditional LLM capabilities. Features reasoning levels (minimal, low, medium, high) that control how much reasoning the model performs.
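If you are calling GPT-5 directly through the OpenAI API rather than through an AirOps step, a minimal sketch of setting the reasoning level might look like the following; the prompt and effort value are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Use minimal reasoning for a simple rewrite task to reduce latency and cost.
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input="Rewrite this sentence in a friendlier tone: The request was denied.",
)

print(response.output_text)
```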
O-series Models (o3, o4-mini): Pure Reasoning Specialists
Specialized exclusively for deep reasoning and step-by-step problem solving. These models excel at complex, multi-stage tasks requiring logical thinking and tool use. Choose these when maximum accuracy and reasoning depth are paramount. Features reasoning levels (low, medium, high) for controlling reasoning token usage.
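Called directly via the Chat Completions API, the o-series models accept a reasoning-effort hint; a sketch with an illustrative prompt might look like this.

```python
from openai import OpenAI

client = OpenAI()

# Ask o4-mini to spend more reasoning tokens on a multi-step planning task.
completion = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",
    messages=[
        {"role": "user", "content": "Outline a five-step plan to migrate a blog from WordPress to a static site."},
    ],
)

print(completion.choices[0].message.content)
```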
GPT Models (4.1, 4o): Traditional General-Purpose
Optimized for general-purpose tasks with excellent instruction following. GPT-4.1 excels with long contexts (1M tokens) while GPT-4o has variants for realtime speech, text-to-speech, and speech-to-text. GPT-4.1 also comes in mini and nano variants, while GPT-4o has a mini variant. These variants are cheaper and faster than their full-size counterparts. Strong in structured output generation.
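For the structured-output strength mentioned above, a minimal sketch using the OpenAI Python SDK's parse helper with a Pydantic schema might look like this; the schema and prompt are illustrative, and other providers expose similar JSON-schema features.

```python
from openai import OpenAI
from pydantic import BaseModel

class ArticleBrief(BaseModel):
    title: str
    target_keyword: str
    word_count: int

client = OpenAI()

# Ask GPT-4.1 to return output matching the ArticleBrief schema.
completion = client.beta.chat.completions.parse(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Draft a brief for an article about choosing an LLM."}],
    response_format=ArticleBrief,
)

brief = completion.choices[0].message.parsed
print(brief.title, brief.word_count)
```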
How much will it cost to run?
The cost to run a model depends on the number of input and output tokens.
Token Approximation
Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer
Output tokens: to approximate the total output tokens, copy and paste your output into the OpenAI tokenizer
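For a scriptable alternative to the web tokenizer, the tiktoken library gives the same kind of approximation; the encoding name and prompts below are illustrative, and counts for non-OpenAI models will differ slightly.

```python
import tiktoken

# o200k_base is the encoding used by recent GPT-4o-family models; counts for
# other providers' tokenizers will differ, so treat this as an approximation.
enc = tiktoken.get_encoding("o200k_base")

system_prompt = "You are a helpful research assistant."
user_prompt = "Summarize the attached article in five bullet points."

input_tokens = len(enc.encode(system_prompt)) + len(enc.encode(user_prompt))
print(f"Approximate input tokens: {input_tokens}")
```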
Cost Approximation
OpenAI: divide the input and output tokens by 1,000; then multiply by their respective costs based on OpenAI pricing*
Anthropic: divide the input and output tokens by 1,000,000; then multiply by their respective costs based on Anthropic pricing*
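Putting the two approximations together, a sketch of the OpenAI-style calculation might look like the following; the per-1K prices are placeholders, so substitute the current rates from the provider's pricing page (and divide by 1,000,000 instead for providers that quote per-1M prices, as with Anthropic above).

```python
# Placeholder per-1K-token prices; replace with the current rates from the
# provider's pricing page before relying on the result.
INPUT_PRICE_PER_1K = 0.0025
OUTPUT_PRICE_PER_1K = 0.0100

def approximate_cost(input_tokens: int, output_tokens: int) -> float:
    """Divide token counts by 1,000, then multiply by the per-1K prices."""
    return (
        (input_tokens / 1_000) * INPUT_PRICE_PER_1K
        + (output_tokens / 1_000) * OUTPUT_PRICE_PER_1K
    )

# Example: a 3,000-token prompt that produces an 800-token response.
print(f"${approximate_cost(input_tokens=3_000, output_tokens=800):.4f}")  # $0.0155
```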