Model Selection Guide
Determine which large language model to use
Which Model Should I Use?

What to Consider
Choosing a model depends on the following:
Context Window: the context window is the number of tokens you can provide to an LLM in a single request. ~1 token ≈ 4 characters; a quick way to use this rule of thumb is sketched after this list.
Task Complexity: more capable models are generally better suited for complex logic.
Web Access: whether the use case you're building requires the model to have web access.
Cost: more capable models are generally more expensive - for example, o1 is more expensive than GPT-4o.
Speed: more capable models are generally slower to execute.
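If you want a rough sense of whether a prompt will fit in a model's context window before running it, the ~4 characters per token rule of thumb above can be turned into a quick estimate. This is a minimal sketch, not an exact tokenizer; the window sizes are taken from the table below, and the reserved output budget is an arbitrary illustrative value.

```python
# Rough token estimate using the ~4 characters per token rule of thumb.
# For exact counts, use a real tokenizer (e.g. OpenAI's tiktoken).

CONTEXT_WINDOWS = {  # taken from the table below
    "gpt-5.2": 400_000,
    "gpt-4.1": 1_000_000,
    "claude-sonnet-4.5": 200_000,
    "gemini-2.5-pro": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~1 token per 4 characters."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, model: str, output_budget: int = 4_000) -> bool:
    """Check whether the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(prompt) + output_budget <= CONTEXT_WINDOWS[model]

print(fits_in_context("Summarize this article ...", "gpt-5.2"))  # True
```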
AirOps Popular LLMs
| Model | Provider | Description | Context Window | Vision | Structured Output | Web Access |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-5.2 | OpenAI | Latest flagship with enhanced long-context reasoning | 400K | ✓ | ✓ | ✓ |
| GPT-5.1 | OpenAI | Flagship model with adaptive reasoning modes | 400K | ✓ | ✓ | ✓ |
| GPT-5 | OpenAI | Flagship model for complex tasks | 400K | ✓ | ✓ | ✓ |
| GPT-4.1 | OpenAI | For complex tasks, vision-capable | 1M | ✓ | ✓ | - |
| GPT-4o Search Preview | OpenAI | Flagship model for online web research | 128K | ✓ | ✓ | ✓ |
| O4 Mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | - | ✓ | - |
| O3 | OpenAI | Advanced reasoning for complex tasks | 128K | - | ✓ | - |
| O3 Mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | - | ✓ | - |
| Claude Opus 4.5 | Anthropic | Most powerful Claude for complex multi-step tasks | 200K | ✓ | - | - |
| Claude Opus 4.1 | Anthropic | Powerful model for complex and writing tasks | 200K | ✓ | - | - |
| Claude Sonnet 4.5 | Anthropic | Best for agents and coding with web fetch capability | 200K | ✓ | - | ✓ |
| Claude Sonnet 4 | Anthropic | Hybrid reasoning: fast answers or deep thinking | 200K | ✓ | - | - |
| Gemini 3 Pro | Google | Advanced multimodal reasoning for complex tasks | 1M | ✓ | ✓ | ✓ |
| Gemini Flash 3 | Google | Fast and intelligent model optimized for speed | 1M | ✓ | ✓ | ✓ |
| Gemini 2.5 Pro | Google | Advanced reasoning for complex tasks | 1M | ✓ | ✓ | ✓ |
| Gemini 2.5 Flash | Google | Fast and intelligent model for lightweight tasks | 1M | ✓ | ✓ | ✓ |
| Perplexity Sonar | Perplexity | Balanced model for online web research | 128K | - | ✓ | ✓ |
Differences between “o-series” and “GPT” models
GPT-5 Series: Built-In Reasoning
GPT-5 Models (5, 5.1, 5.2): OpenAI's model series that combines the reasoning paradigm with traditional LLM capabilities. GPT-5.2 is the latest flagship, with enhanced long-context reasoning and a knowledge cutoff of August 2025. GPT-5.1 introduced adaptive reasoning with "Instant" and "Thinking" modes. All GPT-5 models support reasoning levels (minimal, low, medium, and high) that control how much reasoning the model performs.
O-series Models (o3, o4-mini): Pure Reasoning Specialists
Specialized exclusively for deep reasoning and step-by-step problem solving. These models excel at complex, multi-stage tasks requiring logical thinking and tool use. Choose these when maximum accuracy and reasoning depth are paramount. They support reasoning levels (low, medium, and high) for controlling reasoning token usage.
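When calling these models directly rather than through an AirOps LLM step, the GPT-5 and o-series reasoning levels described above map to a reasoning-effort setting. The sketch below assumes OpenAI's Python SDK and its `reasoning_effort` parameter on Chat Completions; check the current SDK docs if that parameter name has changed.

```python
# Minimal sketch: controlling reasoning depth via OpenAI's Python SDK.
# Assumes the `reasoning_effort` parameter on Chat Completions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",          # o-series: supports low / medium / high
    reasoning_effort="high",  # more reasoning tokens, higher accuracy, slower
    messages=[
        {"role": "user", "content": "Plan a 5-step migration from MySQL to Postgres."},
    ],
)
print(response.choices[0].message.content)

# GPT-5 models additionally accept "minimal" for near-instant answers.
```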
GPT Models (4.1, 4o): Traditional General-Purpose
Optimized for general-purpose tasks with excellent instruction following. GPT-4.1 excels with long contexts (1M tokens), while GPT-4o has variants for realtime speech, text-to-speech, and speech-to-text. GPT-4.1 also comes in mini and nano variants, and GPT-4o has a mini variant; these variants are cheaper and faster than their full-size counterparts. All are strong at structured output generation, as sketched below.
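As an illustration of that structured output strength, here is a minimal sketch using OpenAI's JSON mode via the `response_format` parameter. The schema and field names are made up for the example; in AirOps, the same capability is exposed through the LLM step's output settings.

```python
# Minimal sketch: asking GPT-4.1 for structured JSON output.
# Uses Chat Completions JSON mode; the field names are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    response_format={"type": "json_object"},  # forces valid JSON in the reply
    messages=[
        {"role": "system", "content": "Return JSON with keys 'title' and 'meta_description'."},
        {"role": "user", "content": "Write SEO metadata for a page about trail running shoes."},
    ],
)

data = json.loads(response.choices[0].message.content)
print(data["title"], data["meta_description"], sep="\n")
```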
Differences between Claude Models
Claude Opus 4.5: Most Powerful
Anthropic's flagship model for complex, multi-step workflows. Excels at long-form content, research tasks, and maintaining context across extended conversations. Choose Opus 4.5 when quality matters most.
Claude Sonnet 4.5: Best Value
Strong reasoning with built-in web fetch that can retrieve content from URLs in your prompts. Great balance of capability and cost for most marketing workflows.
Claude Sonnet 4 & Opus 4.1: Previous Generation
Solid models for straightforward tasks that don't require the latest capabilities.
Web Search Capabilities
Several models support web search, allowing them to access real-time information from the internet during generation:
OpenAI Models with Web Search: gpt-4o-mini, gpt-4o, gpt-4.1-mini, gpt-4.1, o4-mini, o3, GPT-5, GPT-5.1, and GPT-5.2 all support web search when enabled in the LLM step configuration.
Claude Sonnet 4.5 Web Fetch: Claude Sonnet 4.5 includes a unique web fetch capability that can grab and process the contents of URLs included in your prompts, making it ideal for workflows that need to analyze specific web pages.
Google Gemini: All Gemini models (2.5 Pro, 2.5 Flash, 3 Pro, Flash 3) support web access through Google Search grounding.
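Inside AirOps, these capabilities are toggled in the LLM step configuration. If you are calling a provider directly instead, the sketch below shows one way to enable web search with OpenAI's Responses API; it assumes the `web_search_preview` tool type, so check the provider's current docs before relying on it.

```python
# Minimal sketch: enabling web search when calling OpenAI directly.
# Assumes the Responses API `web_search_preview` tool type; in AirOps the
# equivalent is the web search toggle on the LLM step.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "web_search_preview"}],  # lets the model query the web
    input="What changed in Google's most recent core search update?",
)
print(response.output_text)  # final answer, grounded in fetched web results
```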
How much will it cost to run?
The cost to run a model depends on the number of input and output tokens.
Token Approximation
Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer
Output tokens: to approximate the total output tokens, copy and paste your output into the OpenAI tokenizer
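If you'd rather estimate counts in code than paste text into the tokenizer page, OpenAI's tiktoken library gives the same kind of approximation. The sketch below assumes the prompt strings are already assembled; tiktoken approximates OpenAI tokenization, and other providers tokenize somewhat differently.

```python
# Minimal sketch: approximating input/output tokens with tiktoken.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Newer models may not be in tiktoken's registry yet; fall back to a recent encoding.
        encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))

system_prompt = "You are a helpful marketing assistant."
user_prompt = "Write a 200-word product description for a standing desk."
sample_output = "Meet the desk that keeps you moving..."

input_tokens = count_tokens(system_prompt) + count_tokens(user_prompt)
output_tokens = count_tokens(sample_output)
print(input_tokens, output_tokens)
```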
Cost Approximation
OpenAI: divide the input and output token counts by 1,000,000, then multiply by the respective per-million-token rates on OpenAI's pricing page.
Anthropic: divide the input and output token counts by 1,000,000, then multiply by the respective per-million-token rates on Anthropic's pricing page.
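Putting the two steps together, the cost math is just token counts divided by the pricing unit, multiplied by the published rates. The per-million-token prices in the sketch below are placeholders, not real rates; substitute current numbers from the provider's pricing page.

```python
# Minimal sketch of the cost arithmetic. The rates below are PLACEHOLDERS,
# not real prices; look up current per-million-token pricing before using this.
PRICING_PER_MILLION = {
    # model: (input $/1M tokens, output $/1M tokens) -- illustrative values only
    "example-model": (2.00, 8.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICING_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# e.g. 3,500 input tokens and 600 output tokens at the placeholder rates:
print(f"${estimate_cost('example-model', 3_500, 600):.4f}")
```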