
API Cost Estimator

Mix input and output tokens to compare spending across major chat APIs—so you can pick a model with confidence.

Koverts answer-engine facts

API Cost Estimator is a free browser-based Koverts calculator. Use it to mix input and output tokens and compare spending across major chat APIs, so you can pick a model with confidence.

Citation: Koverts, API Cost Estimator, https://koverts.com/ai/api-cost/

Indicative per-1M list prices last reviewed . Vendors change pricing—check each official pricing page before billing.

Configure Usage

Total: 1.50M tokens/month (1M input + 0.5M output, the mix that reproduces the Monthly Cost column below)

| Model | Provider | Input /1M | Output /1M | Monthly Cost |
|---|---|---|---|---|
| LLaMA 4 (self-host) | Meta | Free* | | |
| deepseek-chat (input cache hit) | DeepSeek | $0.028 | $0.42 | $0.2380 |
| gpt-4o mini | OpenAI | $0.15 | $0.6 | $0.4500 |
| deepseek-chat (V3.2) | DeepSeek | $0.28 | $0.42 | $0.4900 |
| deepseek-reasoner (V3.2) | DeepSeek | $0.28 | $0.42 | $0.4900 |
| gpt-5.4-nano | OpenAI | $0.2 | $1.25 | $0.8250 |
| Claude Haiku 3 | Anthropic | $0.25 | $1.25 | $0.8750 |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.5 | $1.00 |
| Gemini 2.5 Flash | Google | $0.3 | $2.5 | $1.55 |
| Gemini 3 Flash | Google | $0.5 | $3 | $2.00 |
| gpt-5.4-mini | OpenAI | $0.75 | $4.5 | $3.00 |
| o4-mini | OpenAI | $1.1 | $4.4 | $3.30 |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | $3.50 |
| gpt-4.1 | OpenAI | $2 | $8 | $6.00 |
| o3 | OpenAI | $2 | $8 | $6.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10 | $6.25 |
| gpt-4o | OpenAI | $2.5 | $10 | $7.50 |
| Gemini 3.1 Pro | Google | $2 | $12 | $8.00 |
| gpt-5.4 | OpenAI | $2.5 | $15 | $10.00 |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 | $10.50 |
| Claude Opus 4.6 | Anthropic | $5 | $25 | $17.50 |

* Self-hosted models skip per-token API fees but need GPUs. Table uses common public list prices—refresh the page after we bump the review date in code.
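
The Monthly Cost column can be reproduced with a short calculation. The following is a minimal sketch, not the calculator's actual code: the function name `monthly_cost` and the `PRICES` dictionary are illustrative, and the prices are copied from a few rows of the table above.

```python
# Per-1M-token list prices in USD, copied from the table above: (input, output).
PRICES = {
    "gpt-4o mini": (0.15, 0.60),
    "deepseek-chat (V3.2)": (0.28, 0.42),
    "gpt-5.4": (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def monthly_cost(input_price, output_price, input_tokens, output_tokens):
    """Return the USD cost of a token volume, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Usage mix assumed here: 1M input + 0.5M output tokens per month.
INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 500_000

costs = {
    model: monthly_cost(inp, out, INPUT_TOKENS, OUTPUT_TOKENS)
    for model, (inp, out) in PRICES.items()
}

# Print cheapest first, as the table does.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model}: ${cost:.2f}/month")
```

Changing `INPUT_TOKENS` and `OUTPUT_TOKENS` to your own monthly volume gives the equivalent of moving the calculator's usage controls.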

Practical guide

Compare LLM API Costs Across All Major Providers

LLM API pricing varies wildly between providers and models. Frontier models like gpt-5.4 or Claude Opus 4.6 cost far more per token than GPT-4o mini or Gemini 2.5 Flash. This calculator helps you compare real monthly costs before choosing a provider.

Startup Cost Planning

Estimate monthly API spend before building a product, so you can price your service profitably.

Model Migration

Calculate how much you'd save by switching from a flagship model to Gemini 2.5 Flash or GPT-4o mini for your use case.

Batch Processing Jobs

Estimate costs for one-time jobs like processing a database of 100,000 documents with an LLM.
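
A one-time job can be sized the same way, by multiplying per-document token counts by the document count. This is a rough sketch: the per-document token counts (2,000 input, 300 output) are hypothetical placeholders to replace with measurements from a sample of your own data, and the prices are the Gemini 2.5 Flash list prices from the table above.

```python
def job_cost(num_docs, in_tokens_per_doc, out_tokens_per_doc,
             input_price, output_price):
    """Return the USD cost of a one-time job, given per-1M-token prices."""
    total_input = num_docs * in_tokens_per_doc
    total_output = num_docs * out_tokens_per_doc
    return (total_input * input_price + total_output * output_price) / 1_000_000


# Hypothetical workload: 100,000 documents, about 2,000 tokens read and
# 300 tokens written per document (assumed values, not measured ones).
estimate = job_cost(
    num_docs=100_000,
    in_tokens_per_doc=2_000,
    out_tokens_per_doc=300,
    input_price=0.30,   # Gemini 2.5 Flash, USD per 1M input tokens
    output_price=2.50,  # Gemini 2.5 Flash, USD per 1M output tokens
)
print(f"Estimated job cost: ${estimate:.2f}")
```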

Choosing the Right Tier

Decide whether a premium model's quality improvement justifies the 10–50× cost increase over a budget model.
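
To check where a particular pair of models falls in that range, you can compute the cost multiple directly. A small sketch, assuming the list prices from the table above and the 1M input + 0.5M output mix; `cost_multiple` is an illustrative helper name.

```python
def workload_cost(prices, input_tokens, output_tokens):
    """Return USD cost; prices is an (input, output) pair per 1M tokens."""
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_multiple(premium, budget, input_tokens, output_tokens):
    """Return how many times more the premium model costs than the budget one."""
    return (workload_cost(premium, input_tokens, output_tokens)
            / workload_cost(budget, input_tokens, output_tokens))


CLAUDE_OPUS_4_6 = (5.00, 25.00)
GPT_5_4 = (2.50, 15.00)
GPT_4O_MINI = (0.15, 0.60)

for name, premium in [("Claude Opus 4.6", CLAUDE_OPUS_4_6), ("gpt-5.4", GPT_5_4)]:
    multiple = cost_multiple(premium, GPT_4O_MINI, 1_000_000, 500_000)
    print(f"{name} vs gpt-4o mini: {multiple:.1f}x")
```

The multiple shifts with the input:output mix, because the models differ more on output price than on input price.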

Quick fact: At published list prices, 1M input + 1M output tokens on gpt-5.4 costs $2.50 + $15 = $17.50; the same token mix on GPT-4o mini is $0.15 + $0.60 = $0.75.

FAQ

Frequently asked questions


Why are input and output tokens priced differently?
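Generating output is more compute-intensive than reading input: the model produces output one token at a time, while input tokens are processed together. In the table above, output is priced at about 4–6× input for most models (1.5× for deepseek-chat at its standard input price, about 8× for the Gemini 2.5 models), so response length significantly affects cost.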
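
To see how the price gap plays out for specific models, the sketch below computes the output-to-input price ratio and the share of the bill that comes from output. It assumes list prices copied from the pricing table above and a 2:1 input:output token mix; the helper names are illustrative.

```python
# Per-1M-token list prices in USD from the pricing table: (input, output).
PRICES = {
    "deepseek-chat (V3.2)": (0.28, 0.42),
    "gpt-4o mini": (0.15, 0.60),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "gpt-5.4": (2.50, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def output_to_input_ratio(input_price, output_price):
    """Return how many times more an output token costs than an input token."""
    return output_price / input_price


def output_share(input_price, output_price, input_tokens, output_tokens):
    """Return the fraction of total cost attributable to output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


for model, (inp, out) in PRICES.items():
    ratio = output_to_input_ratio(inp, out)
    share = output_share(inp, out, 1_000_000, 500_000)
    print(f"{model}: output is {ratio:.1f}x input, {share:.0%} of the bill")
```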
Which LLM API is the cheapest?
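For high volume, GPT-4o mini ($0.15/1M input), Gemini 3.1 Flash-Lite ($0.25/1M input), and Gemini 2.5 Flash ($0.30/1M input) are strong picks. In the table above, deepseek-chat ($0.28/1M input, $0.42/1M output) also sits near the bottom on monthly cost because of its low output price. Self-hosted open weights (e.g. LLaMA 4) have no per-token API fee.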
Does Anthropic offer batch discounts?
Yes — Anthropic, OpenAI, and Google all offer batch API pricing (typically 50% off) for non-real-time workloads with longer turnaround times.
How do I reduce LLM API costs?
Key strategies: use smaller, cheaper models where quality allows; cache repeated prompts; compress system prompts; stream responses so you can cancel a generation early when it goes off track; and use batch APIs for non-urgent tasks.
Is self-hosting cheaper than APIs?
For high volume (millions of tokens/day), self-hosting on cloud GPUs can be 5–10× cheaper. For low volume, APIs are almost always more cost-effective than paying for GPU instances 24/7.
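
A rough break-even volume can be estimated by dividing the fixed monthly GPU bill by the API's blended price per token. The sketch below is illustrative only: the $2.00/hour GPU rate is a hypothetical placeholder, the API prices are the gpt-4o list prices from the table above, and it ignores GPU throughput limits, engineering time, and quality differences between models.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def blended_price_per_million(input_price, output_price, input_share):
    """Return the average USD price per 1M tokens for a given input share."""
    return input_share * input_price + (1 - input_share) * output_price


def break_even_tokens_per_month(gpu_hourly_usd, api_price_per_million):
    """Return the monthly token volume at which an always-on GPU matches API spend."""
    gpu_monthly_usd = gpu_hourly_usd * HOURS_PER_MONTH
    return gpu_monthly_usd / api_price_per_million * 1_000_000


# Assumptions: a hypothetical $2.00/hour GPU instance running 24/7, compared
# with gpt-4o list prices at a 2:1 input:output token mix.
api_price = blended_price_per_million(2.50, 10.00, input_share=2 / 3)
tokens = break_even_tokens_per_month(gpu_hourly_usd=2.00, api_price_per_million=api_price)
print(f"Blended API price: ${api_price:.2f} per 1M tokens")
print(f"Break-even: about {tokens / 1e6:.0f}M tokens/month "
      f"({tokens / 30 / 1e6:.1f}M tokens/day)")
```

Below the break-even volume the always-on GPU costs more than the API; above it, self-hosting starts to pay off only if a single instance can actually serve that volume.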
Which LLM API is the cheapest in 2026?
For budget workloads, GPT-4o mini ($0.15/1M input), Gemini 3.1 Flash-Lite ($0.25/1M input), and Gemini 2.5 Flash ($0.30/1M input) are strong options. gpt-5.4-nano ($0.20/1M input) is competitive for tiny prompts. Self-hosted open weights (e.g. LLaMA 4) avoid per-token API fees but still need GPU or cloud spend.
How much does a frontier model cost per month?
Monthly spend depends on model and volume. With gpt-5.4 at $2.50/1M input and $15/1M output, 1,000 requests/month of 1,000 input + 500 output tokens is about $10/month; 100,000 such requests is about $1,000/month. Use our API cost calculator for your exact mix.
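
The figures in this answer follow from a per-request cost multiplied by request count. A minimal sketch, assuming the gpt-5.4 list prices quoted above; `cost_per_request` is an illustrative name.

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one request, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# gpt-5.4 list prices: $2.50 per 1M input tokens, $15 per 1M output tokens.
per_request = cost_per_request(1_000, 500, input_price=2.50, output_price=15.00)

for requests_per_month in (1_000, 100_000):
    monthly = per_request * requests_per_month
    print(f"{requests_per_month:,} requests/month: ${monthly:,.2f}")
```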
How do I reduce LLM API costs?
Key strategies to cut LLM API costs: (1) Use smaller models like gpt-5.4-nano, GPT-4o mini, or Gemini 2.5 Flash where quality allows. (2) Cache repeated prompts. (3) Shorten system prompts. (4) Use batch APIs for roughly 50% off on non-urgent tasks. (5) Self-host open-source models for very high volume.
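
The effect of prompt caching (strategy 2) can be estimated as a weighted average of the cached and uncached input prices. The sketch below uses the two deepseek-chat input prices from the pricing table ($0.028 on a cache hit, $0.28 otherwise); the 80% hit rate is a hypothetical value, and real hit rates depend on how much of each prompt repeats.

```python
def effective_input_price(miss_price, hit_price, hit_rate):
    """Return the average USD price per 1M input tokens for a cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# deepseek-chat input prices from the table; the 80% hit rate is an assumption.
price = effective_input_price(miss_price=0.28, hit_price=0.028, hit_rate=0.80)
print(f"Effective input price: ${price:.4f} per 1M tokens")
```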
Is Claude cheaper than GPT-4?
Claude Sonnet 4.6 ($3/1M input, $15/1M output) is in the same tier as GPT-4o ($2.50/1M input, $10/1M output), while Claude Opus 4.6 ($5/1M input, $25/1M output) costs more. GPT-4o mini ($0.15/1M input) beats Claude Haiku 3 ($0.25/1M input) on input price—compare blended input+output for your workload.
What is batch API pricing?
OpenAI, Anthropic, and Google offer batch processing APIs at roughly 50% off standard prices, in exchange for longer turnaround times (up to 24 hours). This is ideal for non-real-time workloads like data analysis, content generation, or document processing.
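
If only part of a workload can wait for batch turnaround, the saving scales with that fraction. A small sketch, assuming the roughly 50% discount described above (check each vendor's pricing page for the actual rate); the $1,000 standard cost and the 60% batchable fraction are hypothetical inputs.

```python
def cost_with_batch(standard_cost, batch_fraction, discount=0.5):
    """Return USD cost when batch_fraction of the spend gets the batch discount."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    return standard_cost * (1.0 - batch_fraction * discount)


# Hypothetical: $1,000/month at standard prices, 60% of it tolerant of delay.
mixed = cost_with_batch(standard_cost=1_000.0, batch_fraction=0.6)
print(f"Cost with 60% of traffic on batch: ${mixed:,.2f}")
```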