Choosing a Model
Determine which large language model to use
Which Model Should I Use?
What to Consider
Choosing a model depends on the following:
Context Window: the number of tokens you can provide to an LLM (roughly 1 token ≈ 4 characters).
Task Complexity: more capable models are generally better suited for complex logic.
Cost: more capable models are generally more expensive - for example, GPT-4 32K is 60x more expensive than GPT-3.5.
Speed: more capable models are generally slower to execute.
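The ~4-characters-per-token rule of thumb above can be sketched as a quick estimator (a rough heuristic only; actual token counts depend on the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

prompt = "Summarize the quarterly report in three bullet points."
print(estimate_tokens(prompt))  # 54 characters -> ~13 tokens
```

For an exact count, use the provider's own tokenizer rather than this approximation.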
Which Model Should I Use to...
Generate real-time information?
Perplexity Sonar Large or Small
Classify a social post?
Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct
Classifying a social post is a mechanical, low-complexity task requiring minimal context window.
Extract information?
For a low context window: Claude 3 Haiku, GPT-3.5 or GPT-3.5 Instruct
For a higher context window: Claude 3 Haiku or Claude 3 Sonnet
Text extraction is a mechanical, low-complexity task, but the best model depends on the context window needed.
Format a URL?
Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct
Formatting a URL is a mechanical, low-complexity task requiring minimal context window.
Write a summary?
For non-consumer-facing summaries: Claude 3 Haiku.
For consumer-facing summaries: GPT-4 Turbo or Claude 3 Opus.
Summarization typically requires a higher context window. Additionally, GPT-4 Turbo and Claude 3 Opus will deliver higher-quality writing for consumer-facing summaries.
Write a blog or report?
Claude 3 Opus or GPT-4 Turbo.
These more capable models are better suited to the higher-quality, long-form writing that blogs and reports require.
AirOps Recommended LLMs
| Model | Best For | Speed |
| --- | --- | --- |
| Perplexity Sonar Large | Online, live data | Fast |
| Claude 3 Opus | Logic, content generation | Moderate |
| GPT-4o | Logic, content generation | Fast |
| GPT-4 Turbo | Logic, content generation | Moderate |
| GPT-4 | Logic, content generation | Slow |
| Gemini Pro 1.5 | Information retrieval | Moderate |
| Gemini Flash 1.5 | Information retrieval | Fast |
| Gemini Pro | Summarization | Very fast |
| Claude 3 Sonnet | Summarization | Fast |
| Claude 3 Haiku | Classification | Very fast |
| GPT-3.5 Turbo | Classification | Fast |
| GPT-3.5 Instruct | Classification | Very fast |
How much will it cost to run?
The cost to run a model depends on the number of input and output tokens.
Token Approximation
Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer
Output tokens: to approximate the total output tokens, copy and paste your output into the OpenAI tokenizer
Cost Approximation
OpenAI: divide the input and output tokens by 1,000; then multiply by their respective costs based on OpenAI pricing*
Anthropic: divide the input and output tokens by 1,000,000; then multiply by their respective costs based on Anthropic pricing*
*This is the cost if you bring your own API key. If you choose to use AirOps-hosted models, you will be charged in tasks according to your usage.
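The arithmetic above can be sketched as two small helpers. The prices used in the example call are illustrative placeholders, not current rates - always check the providers' pricing pages:

```python
def openai_cost(input_tokens: int, output_tokens: int,
                price_in_per_1k: float, price_out_per_1k: float) -> float:
    # OpenAI pricing is quoted per 1,000 tokens
    return (input_tokens / 1_000) * price_in_per_1k \
         + (output_tokens / 1_000) * price_out_per_1k

def anthropic_cost(input_tokens: int, output_tokens: int,
                   price_in_per_1m: float, price_out_per_1m: float) -> float:
    # Anthropic pricing is quoted per 1,000,000 tokens
    return (input_tokens / 1_000_000) * price_in_per_1m \
         + (output_tokens / 1_000_000) * price_out_per_1m

# Example: 2,000 input tokens and 500 output tokens at hypothetical rates
print(f"${openai_cost(2000, 500, 0.01, 0.03):.4f}")     # (2 * $0.01) + (0.5 * $0.03) = $0.0350
print(f"${anthropic_cost(2000, 500, 15.0, 75.0):.4f}")  # $0.0675
```

Note the only difference is the divisor, matching each provider's quoted unit.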