Choosing a Model

Determine which large language model to use

Which Model Should I Use?

What to Consider

Choosing a model depends on the following:

  1. Context Window: the number of tokens you can provide to an LLM. As a rule of thumb, 1 token is roughly 4 characters.

  2. Task Complexity: more capable models are generally better suited for complex logic.

  3. Cost: more capable models are generally more expensive. For example, GPT-4 32K is 60x more expensive than GPT-3.5.

  4. Speed: more capable models are generally slower to execute.
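The 4-characters-per-token rule of thumb can be turned into a quick pre-check of whether a prompt is likely to fit a model's context window. This is a minimal sketch: the function names and the 16,000-token window in the example are illustrative assumptions, and the estimate is a heuristic, not a real tokenizer.

```python
import math

# Rule of thumb from the text: ~1 token = ~4 characters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context_window(text: str, context_window_tokens: int) -> bool:
    """Return True if the estimated token count fits the given window size."""
    return estimate_tokens(text) <= context_window_tokens


# Hypothetical prompt and window size, for illustration only.
prompt = "Summarize the following article in three bullet points."
print(estimate_tokens(prompt))
print(fits_context_window(prompt, context_window_tokens=16_000))
```

Because the estimate is approximate, it is safer to leave some headroom below the model's stated limit, especially since the output also consumes tokens.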

Which Model Should I Use to...

Generate real-time information?

Perplexity Sonar Large or Small

Classify a social post?

Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct

Classifying a social post is a mechanical, low-complexity task requiring minimal context window.

Extract information?

For a low context window: Claude 3 Haiku, GPT-3.5 or GPT-3.5 Instruct

For a higher context window: Claude 3 Haiku or Claude 3 Sonnet

Text extraction is a mechanical, low-complexity task, but the right model may depend on the context window needed.

Format a URL?

Claude 3 Haiku, GPT-3.5 Turbo or GPT-3.5 Instruct

Formatting a URL is a mechanical, low-complexity task requiring minimal context window.

Write a summary?

For non-consumer-facing summaries: Claude 3 Haiku.

For consumer-facing summaries: GPT-4 Turbo or Claude 3 Opus.

Summarization typically requires a higher context window. Additionally, GPT-4 Turbo or Claude 3 Opus will deliver higher-quality writing for consumer-facing summaries.

Write a blog or report?

Claude 3 Opus or GPT-4 Turbo.

Claude 3 Opus and GPT-4 Turbo are better suited to higher-quality writing.
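If you route tasks to models programmatically, the recommendations above can be captured in a simple lookup. This is only a sketch: the task keys and the `recommend_models` helper are hypothetical names chosen for illustration, and the model names are copied from the guidance above rather than from any API.

```python
# Recommendations copied from the guidance above.
# The task keys are hypothetical labels, not identifiers from any product or API.
RECOMMENDED_MODELS = {
    "real_time_information": ["Perplexity Sonar Large", "Perplexity Sonar Small"],
    "classify_social_post": ["Claude 3 Haiku", "GPT-3.5 Turbo", "GPT-3.5 Instruct"],
    "extract_low_context": ["Claude 3 Haiku", "GPT-3.5", "GPT-3.5 Instruct"],
    "extract_high_context": ["Claude 3 Haiku", "Claude 3 Sonnet"],
    "format_url": ["Claude 3 Haiku", "GPT-3.5 Turbo", "GPT-3.5 Instruct"],
    "summary_non_consumer_facing": ["Claude 3 Haiku"],
    "summary_consumer_facing": ["GPT-4 Turbo", "Claude 3 Opus"],
    "blog_or_report": ["Claude 3 Opus", "GPT-4 Turbo"],
}


def recommend_models(task: str) -> list[str]:
    """Return the suggested models for a task label, in the order listed above."""
    if task not in RECOMMENDED_MODELS:
        known = ", ".join(sorted(RECOMMENDED_MODELS))
        raise ValueError(f"Unknown task {task!r}. Known tasks: {known}")
    # Return a copy so callers cannot mutate the shared table.
    return list(RECOMMENDED_MODELS[task])


print(recommend_models("summary_consumer_facing"))
```

Keeping the mapping in one table makes it easy to revise as models and pricing change.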

| Model | Best For* | Speed | Online (Live Data) |
| --- | --- | --- | --- |
| Perplexity Sonar Large | Online, Live Data | Fast | Yes |
| Claude 3 Opus | Logic, Content Generation | Moderate | |
| GPT-4o | Logic, Content Generation | Fast | |
| GPT-4 Turbo | Logic, Content Generation | Moderate | |
| GPT-4 | Logic, Content Generation | Slow | |
| Gemini Pro 1.5 | Information Retrieval | Moderate | |
| Gemini Flash 1.5 | Information Retrieval | Fast | |
| Gemini Pro | Summarization | Very Fast | |
| Claude 3 Sonnet | Summarization | Fast | |
| Claude 3 Haiku | Classification | Very Fast | |
| GPT-3.5 Turbo | Classification | Fast | |
| GPT-3.5 Instruct | Classification | Very Fast | |

How much will it cost to run?

The cost to run a model depends on the number of input and output tokens.

Token Approximation

Input tokens: to approximate the total input tokens, copy and paste your system, user, and assistant prompts into the OpenAI tokenizer.

Output tokens: to approximate the total output tokens, copy and paste your output into the OpenAI tokenizer.

Cost Approximation

OpenAI: divide the input and output token counts by 1,000, then multiply each by its respective cost based on OpenAI pricing*

Anthropic: divide the input and output token counts by 1,000,000, then multiply each by its respective cost based on Anthropic pricing*

*This is the cost if you bring your own API key. If you choose to use AirOps-hosted models, you will be charged in tasks according to your usage.
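The cost approximation above is simple arithmetic, sketched here as a small helper. The function name and every price in the example are placeholder assumptions for illustration only; substitute the current figures from each provider's pricing page.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    tokens_per_price_unit: int,
) -> float:
    """Estimate the cost of one run.

    Prices are quoted per `tokens_per_price_unit` tokens: 1,000 for the
    OpenAI-style calculation and 1,000,000 for the Anthropic-style one.
    """
    input_cost = (input_tokens / tokens_per_price_unit) * input_price
    output_cost = (output_tokens / tokens_per_price_unit) * output_price
    return input_cost + output_cost


# Placeholder prices, NOT real pricing; look up the current values yourself.
openai_style = estimate_cost(
    input_tokens=2_000, output_tokens=500,
    input_price=0.01, output_price=0.03,   # per 1,000 tokens (assumed)
    tokens_per_price_unit=1_000,
)
anthropic_style = estimate_cost(
    input_tokens=2_000, output_tokens=500,
    input_price=3.00, output_price=15.00,  # per 1,000,000 tokens (assumed)
    tokens_per_price_unit=1_000_000,
)
print(f"OpenAI-style estimate: ${openai_style:.4f}")
print(f"Anthropic-style estimate: ${anthropic_style:.4f}")
```

Multiply the per-run estimate by the expected number of runs to approximate the total cost of a workflow.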
