Model Selection Guide

Determine which large language model to use

Which Model Should I Use?

What to Consider

Choosing a model depends on the following:

Context Window: the number of tokens you can provide to an LLM. As a rule of thumb, 1 token ≈ 4 characters.
Task Complexity: more capable models are generally better suited for complex logic.
Web Access: whether the use case you're building requires the model to have web access.
Cost: more capable models are generally more expensive; for example, o1 is more expensive than GPT-4o.
Speed: more capable models are generally slower to execute.
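
As a rough illustration of the context window consideration, the sketch below applies the "1 token ≈ 4 characters" rule of thumb to check whether a prompt is likely to fit. The function names and the `reserved_output_tokens` parameter are hypothetical, and the sketch assumes the model's output shares the same window as the input; a real tokenizer will give different counts.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from this guide: ~1 token = ~4 characters


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(prompt: str, context_window: int, reserved_output_tokens: int = 0) -> bool:
    """Check whether the estimated prompt size, plus room reserved for output, fits the window."""
    return estimate_tokens(prompt) + reserved_output_tokens <= context_window


if __name__ == "__main__":
    prompt = "Summarize the following article in three bullet points."
    print(estimate_tokens(prompt))
    print(fits_context(prompt, context_window=128_000, reserved_output_tokens=1_000))
```

Because the estimate is approximate, it is safer to leave some headroom than to fill the window exactly.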

AirOps Popular LLMs

| Model | Provider | Description | Context Window | Vision / JSON Mode / Web Access |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | Flagship for complex tasks, vision-capable | 128K | ✓ |
| GPT-4o Search Preview | OpenAI | Flagship model for online web research | 128K | ✓ |
| O4 Mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | ✓ |
| | OpenAI | Advanced reasoning for complex tasks | 128K | ✓ |
| O3 Mini | OpenAI | Fast multi-step reasoning for complex tasks | 128K | ✓ |
| Claude Opus 4 | Anthropic | Powerful model for complex and writing tasks | 200K | ✓ |
| Claude Sonnet 4 | Anthropic | Hybrid reasoning: fast answers or deep thinking | 200K | ✓ |
| Gemini 2.5 Pro Preview | Google | Advanced reasoning for complex tasks | | ✓ |
| Gemini 2.5 Flash Preview | Google | Fast and intelligent model for lightweight tasks | | ✓ |
| Perplexity Sonar | Perplexity | Balanced model for online web research | 128K | ✓ |

Differences between “o-series” and “GPT” models

GPT Models (4o, 4.1): Optimized for general-purpose tasks, with excellent instruction following and strong structured output. GPT-4.1 excels with long contexts (1M tokens), while GPT-4o has variants for realtime speech, text-to-speech, and speech-to-text. GPT-4.1 also comes in mini and nano variants, and GPT-4o has a mini variant; these are cheaper and faster than their full-size counterparts.

O-series Models (o3, o4-mini): Specialized for deep reasoning and step-by-step problem solving. These models excel at complex, multi-stage tasks requiring logical thinking and tool use. Choose them when accuracy and reasoning depth are paramount. They also accept an optional reasoning_effort parameter (low, medium, or high) that controls how many tokens are spent on reasoning. o4-mini is also suited to validating factual accuracy and citation correctness.

How much will it cost to run?

The cost to run a model depends on the number of input and output tokens.

Token Approximation

Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer.

Output tokens: to approximate the total output tokens, copy and paste a representative output into the OpenAI tokenizer.

Cost Approximation

OpenAI: divide the input and output token counts by 1,000,000 (OpenAI lists prices per 1M tokens), then multiply each by its respective rate from OpenAI pricing*

Anthropic: divide the input and output token counts by 1,000,000, then multiply each by its respective rate from Anthropic pricing*

*This is the cost if you bring your own API key. If you choose to use AirOps-hosted models, you will be charged in tasks according to your usage.
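
The cost approximation can be sketched as a small calculation. The function name and the prices in the example are placeholders, not actual provider rates; `per_tokens` is the unit in which the provider quotes its prices, so check the current pricing page before relying on any figure.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    per_tokens: int = 1_000_000,
) -> float:
    """Estimate the cost of one run when bringing your own API key.

    input_price and output_price are the prices quoted per `per_tokens` tokens.
    """
    input_cost = input_tokens / per_tokens * input_price
    output_cost = output_tokens / per_tokens * output_price
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices for illustration only; substitute real rates.
    cost = estimate_cost(
        input_tokens=2_000,
        output_tokens=500,
        input_price=3.00,
        output_price=15.00,
    )
    print(f"Estimated cost per run: ${cost:.4f}")
```

Multiplying the per-run estimate by the expected number of runs gives a rough budget for a workflow.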

Last updated 2 months ago
