# Model Selection Guide

## Which Model Should I Use?

<figure><img src="https://3762890407-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FX2n5yPRPynbnWuO4SH0M%2Fuploads%2Fgit-blob-fb8e717c33fdf19de6f0f5fbc5411cdfc6eaccb1%2FCleanShot%202025-02-24%20at%2004.30.37%402x.png?alt=media" alt=""><figcaption></figcaption></figure>

### What to Consider <a href="#what-to-consider" id="what-to-consider"></a>

Choosing a model depends on the following:

1. **Context Window:** the number of tokens you can provide to an LLM in a single request. \~1 token ≈ 4 characters
2. **Task Complexity:** more capable models are generally better suited for complex logic.
3. **Web Access:** whether the use case you're building requires the model to have web access.
4. **Cost:** more capable models are generally more expensive - for example, o1 is more expensive than GPT-4o.
5. **Speed:** more capable models are generally slower to execute.
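The token rule of thumb above (\~1 token ≈ 4 characters) can be sketched as a quick estimator. This is only an approximation; real tokenizers vary by model and language:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~1 token = ~4 characters rule of thumb."""
    return round(len(text) / chars_per_token)

prompt = "Write a 500-word blog post about model selection."
print(estimate_tokens(prompt))
```

For an exact count, paste the text into a model-specific tokenizer instead; this sketch is only for ballpark sizing against a context window.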

## AirOps Popular LLMs

<table><thead><tr><th width="151.8367919921875">Model</th><th width="128">Provider</th><th width="151.62109375">Description</th><th>Context Window</th><th>Vision</th><th>JSON Mode</th><th>Web Access</th></tr></thead><tbody><tr><td>GPT-5.2</td><td>OpenAI</td><td>Latest flagship with enhanced long-context reasoning</td><td>400K</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>GPT-5.1</td><td>OpenAI</td><td>Flagship model with adaptive reasoning modes</td><td>400K</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>GPT-5</td><td>OpenAI</td><td>Flagship model for complex tasks</td><td>400K</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>GPT-4.1</td><td>OpenAI</td><td>For complex tasks, vision-capable</td><td>1M</td><td>✓</td><td>✓</td><td>-</td></tr><tr><td>GPT-4o Search Preview</td><td>OpenAI</td><td>Flagship model for online web research</td><td>128K</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>O4 Mini</td><td>OpenAI</td><td>Fast multi-step reasoning for complex tasks</td><td>128K</td><td>-</td><td>✓</td><td>-</td></tr><tr><td>O3</td><td>OpenAI</td><td>Advanced reasoning for complex tasks</td><td>128K</td><td>-</td><td>✓</td><td>-</td></tr><tr><td>O3 Mini</td><td>OpenAI</td><td>Fast multi-step reasoning for complex tasks</td><td>128K</td><td>-</td><td>✓</td><td>-</td></tr><tr><td>Claude Opus 4.5</td><td>Anthropic</td><td>Most powerful Claude for complex multi-step tasks</td><td>200K</td><td>✓</td><td>-</td><td>-</td></tr><tr><td>Claude Opus 4.1</td><td>Anthropic</td><td>Powerful model for complex and writing tasks</td><td>200K</td><td>✓</td><td>-</td><td>-</td></tr><tr><td>Claude Sonnet 4.5</td><td>Anthropic</td><td>Best for agents and coding with web fetch capability</td><td>200K</td><td>✓</td><td>-</td><td>✓</td></tr><tr><td>Claude Sonnet 4</td><td>Anthropic</td><td>Hybrid reasoning: fast answers or deep thinking</td><td>200K</td><td>✓</td><td>-</td><td>-</td></tr><tr><td>Gemini 3 Pro</td><td>Google</td><td>Advanced multimodal reasoning for complex tasks</td><td>1M</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Gemini Flash 3</td><td>Google</td><td>Fast and intelligent model optimized for speed</td><td>1M</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Gemini 2.5 Pro</td><td>Google</td><td>Advanced reasoning for complex tasks</td><td>1M</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Gemini 2.5 Flash</td><td>Google</td><td>Fast and intelligent model for lightweight tasks</td><td>1M</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Perplexity Sonar</td><td>Perplexity</td><td>Balanced model for online web research</td><td>128K</td><td>-</td><td>✓</td><td>✓</td></tr></tbody></table>

## Differences between “o-series” and “GPT” models

### **GPT-5 Series**: Built-In Reasoning

**GPT-5 Models (5, 5.1, 5.2)**: OpenAI's model series that combines a reasoning paradigm with traditional LLM capabilities. GPT-5.2 is the latest flagship, with enhanced long-context reasoning and a knowledge cutoff of August 2025. GPT-5.1 introduced adaptive reasoning with "Instant" and "Thinking" modes. All GPT-5 models support reasoning levels (`minimal`, `low`, `medium`, `high`) that control how much reasoning the model performs.
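As a sketch, a request that pins the reasoning level might look like the following. The `reasoning_effort` field name and its accepted values follow OpenAI's Chat Completions API, but treat them as assumptions to verify against the current OpenAI API reference before use:

```python
import json

# Hypothetical request body for a GPT-5-family model. The "reasoning_effort"
# field and its values are assumptions to check against OpenAI's API docs.
payload = {
    "model": "gpt-5",
    "reasoning_effort": "minimal",  # one of: minimal, low, medium, high
    "messages": [
        {"role": "user", "content": "Summarize this article in two sentences."}
    ],
}
print(json.dumps(payload, indent=2))
```

Lower levels trade reasoning depth for speed and cost; the o-series models accept the same style of control (without `minimal`), as described below.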

### **O-series Models (o3, o4-mini)**: Pure Reasoning Specialists

These models are specialized exclusively for deep reasoning and step-by-step problem solving. They excel at complex, multi-stage tasks requiring logical thinking and tool use. Choose them when maximum accuracy and reasoning depth are paramount. They support reasoning levels (`low`, `medium`, `high`) for controlling reasoning token usage.

### **GPT Models (4.1, 4o)**: Traditional General-Purpose

Optimized for general-purpose tasks with excellent instruction following. GPT-4.1 excels with long contexts (1M tokens) while GPT-4o has variants for realtime speech, text-to-speech, and speech-to-text. GPT-4.1 also comes in mini and nano variants, while GPT-4o has a mini variant. These variants are cheaper and faster than their full-size counterparts. Strong in structured output generation.

## Differences between Claude Models

### **Claude Opus 4.5**: Most Powerful

Anthropic's flagship model for complex, multi-step workflows. Excels at long-form content, research tasks, and maintaining context across extended conversations. Choose Opus 4.5 when quality matters most.

### **Claude Sonnet 4.5**: Best Value

Strong reasoning with built-in web fetch that can retrieve content from URLs in your prompts. Great balance of capability and cost for most marketing workflows.

### **Claude Sonnet 4 & Opus 4.1**: Previous Generation

Solid models for straightforward tasks that don't require the latest capabilities.

## Web Search Capabilities

Several models support web search, allowing them to access real-time information from the internet during generation:

**OpenAI Models with Web Search:** GPT-4o mini, GPT-4o, GPT-4.1 mini, GPT-4.1, o4-mini, o3, GPT-5, GPT-5.1, and GPT-5.2 all support web search when enabled in the LLM step configuration.

**Claude Sonnet 4.5 Web Fetch:** Claude Sonnet 4.5 includes a unique web fetch capability that can grab and process the contents of URLs included in your prompts, making it ideal for workflows that need to analyze specific web pages.

**Google Gemini:** All Gemini models (2.5 Pro, 2.5 Flash, 3 Pro, Flash 3) support web access through Google Search grounding.

## How much will it cost to run?

The cost to run a model depends on the number of input and output tokens.

### Token Approximation

**Input tokens:** to approximate the total input tokens, copy and paste your system, user, and assistant prompts into [the OpenAI tokenizer](https://platform.openai.com/tokenizer).

**Output tokens:** to approximate the total output tokens, copy and paste your output into [the OpenAI tokenizer](https://platform.openai.com/tokenizer).

### Cost Approximation

**OpenAI:** divide the input and output token counts by 1,000, then multiply each by its respective per-1K-token price [based on OpenAI pricing](https://openai.com/pricing)\*

**Anthropic:** divide the input and output token counts by 1,000,000, then multiply each by its respective per-1M-token price [based on Anthropic pricing](https://www-cdn.anthropic.com/files/4zrzovbb/website/31021aea87c30ccaecbd2e966e49a03834bfd1d2.pdf)\*

{% hint style="info" %}
\*This is the cost if you [bring your own API Key](https://docs.airops.com/your-workspace/byo-key). If you choose to use AirOps hosted models, you will be [charged tasks according to your usage](https://airopshq.notion.site/AirOps-Task-Pricing-by-Hosted-Service-fd825db1300545cb8c3b5de8bc16529e).
{% endhint %}
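The arithmetic above can be sketched in a few lines. The rates below are placeholders, not real prices; substitute current numbers from each provider's pricing page:

```python
def openai_cost(input_tokens: int, output_tokens: int,
                price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost in USD from per-1K-token prices (OpenAI formula above)."""
    return (input_tokens / 1_000) * price_in_per_1k \
         + (output_tokens / 1_000) * price_out_per_1k

def anthropic_cost(input_tokens: int, output_tokens: int,
                   price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Cost in USD from per-1M-token prices (Anthropic formula above)."""
    return (input_tokens / 1_000_000) * price_in_per_1m \
         + (output_tokens / 1_000_000) * price_out_per_1m

# Placeholder rates -- look up the real prices before relying on the result.
print(openai_cost(2_000, 500, price_in_per_1k=0.01, price_out_per_1k=0.03))
print(anthropic_cost(2_000, 500, price_in_per_1m=3.0, price_out_per_1m=15.0))
```

Multiply the per-run cost by your expected run volume to budget a workflow.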
