Choosing a Model
Determine which large language model to use
Which Model Should I Use?
What to Consider
Choosing a model depends on the following:
Context Window: the context window refers to the number of tokens you can provide to an LLM (~1 token ≈ ~4 characters).
Task Complexity: more capable models are generally better suited for complex logic.
Cost: more capable models are generally more expensive - for example, GPT-4 32K is 60x more expensive than GPT-3.5.
Speed: more capable models are generally slower to execute.
Which Model Should I Use to...
Generate real-time information?
Perplexity Sonar Large or Small
Classify a social post?
Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct
Classifying a social post is a mechanical, low-complexity task requiring a minimal context window.
Extract information?
For a low context window: Claude 3 Haiku, GPT-3.5 or GPT-3.5 Instruct
For a higher context window: Claude 3 Haiku or Claude 3 Sonnet
Text extraction is a mechanical, low-complexity task, but the best model depends on the context window needed.
Format a URL?
Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct
Formatting a URL is a mechanical, low-complexity task requiring a minimal context window.
Write a summary?
For non consumer-facing summaries: Claude 3 Haiku.
For consumer-facing summaries: GPT-4 Turbo or Claude 3 Opus.
Summarization typically requires a higher context window. Additionally, GPT-4 or Claude 3 will deliver higher-quality writing for consumer-facing summaries.
Write a blog or report?
Claude 3 Opus or GPT-4 Turbo.
These models are better suited to long-form, higher-quality writing.
AirOps Recommended LLMs
Model | Best For* | Speed | Online (Live Data)
---|---|---|---
Perplexity Sonar Large | Online, Live Data | Fast | ✓
Claude 3 Opus | Logic, Content Generation | Moderate | |
GPT-4o | Logic, Content Generation | Fast | |
GPT-4-Turbo | Logic, Content Generation | Moderate | |
GPT-4 | Logic, Content Generation | Slow | |
Gemini Pro 1.5 | Information Retrieval | Moderate | |
Gemini Flash 1.5 | Information Retrieval | Fast | |
Gemini Pro | Summarization | Very Fast | |
Claude 3 Sonnet | Summarization | Fast | |
Claude 3 Haiku | Classification | Very Fast | |
GPT-3.5 Turbo | Classification | Fast | |
GPT-3.5 Instruct | Classification | Very Fast | |
How much will it cost to run?
The cost to run a model depends on the number of input and output tokens.
Token Approximation
Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer
Output tokens: to approximate the total output tokens, copy and paste your output into the OpenAI tokenizer
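If you'd rather estimate programmatically than paste text into the tokenizer, the ~4 characters per token rule of thumb from above can be applied directly. This is a rough sketch (the exact count varies by model and tokenizer; the function name is our own):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~1 token = ~4 characters rule of thumb."""
    return max(1, len(text) // 4)

# Estimate input tokens for a combined system + user prompt.
system_prompt = "You are a helpful assistant that summarizes articles."
user_prompt = "Summarize the following article in three bullet points."
total_input = estimate_tokens(system_prompt + user_prompt)
print(total_input)
```

For an exact count on OpenAI models, the OpenAI tokenizer (or its `tiktoken` library) is the authoritative source; the heuristic above is only for quick back-of-the-envelope sizing.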
Cost Approximation
OpenAI: divide the input and output tokens by 1000; then multiply by their respective costs based on OpenAI pricing*
Anthropic: divide the input and output tokens by 1,000,000; then multiply by their respective costs based on Anthropic pricing*
*This is the cost if you bring your own API key. If you choose to use AirOps-hosted models, you will be charged in tasks according to your usage.