Article Issue #5198

Cost Per Token

What to know

  • Cost Per Token is the pricing unit for language model API usage, typically expressed in US dollars per million tokens and split into separate input (prompt) and output (completion) rates.
  • A provider tokenizes each request, counts input tokens (everything in the prompt, including system message, history, and retrieved context), and counts output tokens (everything the model generates in response).
  • Unit economics analysis is essential before scaling any LLM feature.


Cost Per Token is the pricing unit for language model API usage, typically expressed in US dollars per million tokens and split into separate input (prompt) and output (completion) rates. Because output generation is more compute-intensive than input processing, output tokens are generally priced 3 to 5 times higher than input tokens for the same model.

How it works

A provider tokenizes each request, counts input tokens (everything in the prompt, including system message, history, and retrieved context), and counts output tokens (everything the model generates in response). With prices quoted per million tokens, total cost equals (input tokens / 1,000,000) times input price, plus (output tokens / 1,000,000) times output price. Where prompt caching applies, cached input tokens are billed at the lower cached rate instead of the standard input rate.
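The formula above can be sketched as a small function. This is a minimal illustration, not any provider's billing logic: the function name, the parameter names, and the prices in the usage lines are hypothetical, and it assumes cached tokens are a subset of the input tokens.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price,
                 cached_tokens=0, cached_price=None):
    """Return the USD cost of one request.

    All prices are in USD per million tokens. cached_tokens is assumed
    (for this sketch) to be the part of input_tokens billed at cached_price.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_price is None:
        # No cached rate given: bill cached tokens like ordinary input.
        cached_price = input_price

    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_price
             + cached_tokens * cached_price
             + output_tokens * output_price)
    return total / 1_000_000


# Hypothetical prices: 3 USD input, 15 USD output, 0.30 USD cached input.
print(round(request_cost(2_000, 500, 3.0, 15.0), 6))
print(round(request_cost(2_000, 500, 3.0, 15.0,
                         cached_tokens=1_500, cached_price=0.3), 6))
```

With these made-up numbers, the uncached request costs about 0.0135 USD, and caching 1,500 of the 2,000 input tokens brings it to about 0.00945 USD.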

Key facts

  • Typical ranges (2025): Frontier models cost 1 to 15 USD per million input tokens; open-weight hosted models cost 0.1 to 1 USD per million input tokens.
  • Output premium: Output tokens typically cost 3 to 5 times more than input tokens for the same model.
  • Context window impact: Sending large contexts on every request quickly dominates total cost; prompt caching and RAG reduce this.
  • Volume discounts: Enterprise agreements and committed usage tiers unlock significant discounts below list pricing.
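To make the context window point concrete, the sketch below computes what fraction of a request's cost comes from input tokens. The function name and the example prices (3 USD input, 15 USD output per million tokens, a 5x output premium) are assumptions chosen for illustration, not quoted rates.

```python
def input_cost_share(input_tokens, output_tokens, input_price, output_price):
    """Return the fraction (0.0 to 1.0) of request cost due to input tokens.

    Prices are in USD per million tokens; the per-million divisor cancels
    out of the ratio, so it is omitted here.
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    if total == 0:
        return 0.0  # nothing was billed, so there is no share to report
    return input_cost / total


# A 20,000-token context with a 500-token answer, at hypothetical prices.
share = input_cost_share(20_000, 500, 3.0, 15.0)
print(f"{share:.1%} of the cost comes from input tokens")
```

Even with output priced five times higher per token, a large context can account for most of the bill, which is why trimming or caching the prompt tends to matter more than the raw per-token rates suggest.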

For builders

Unit economics analysis is essential before scaling any LLM feature. The standard exercise is to estimate average input and output token counts per request, multiply by expected call volume, and project monthly cost at list prices. Prompt caching, model tiering (routing simpler tasks to cheaper models), and output length constraints are the primary levers for reducing cost once the baseline is established.
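The projection exercise described above might look like the following sketch. All names, token counts, volumes, and prices are hypothetical placeholders; the tiering function assumes a simple two-model split where a fixed share of requests goes to the cheaper model.

```python
def monthly_cost(avg_input_tokens, avg_output_tokens, requests_per_month,
                 input_price, output_price):
    """Project monthly USD cost at list prices (USD per million tokens)."""
    per_request = (avg_input_tokens * input_price
                   + avg_output_tokens * output_price) / 1_000_000
    return per_request * requests_per_month


def tiered_monthly_cost(avg_input_tokens, avg_output_tokens,
                        requests_per_month, cheap_share,
                        cheap_prices, frontier_prices):
    """Blend two models: cheap_share of requests go to the cheaper one.

    cheap_prices and frontier_prices are (input_price, output_price) tuples.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap = monthly_cost(avg_input_tokens, avg_output_tokens,
                         requests_per_month, *cheap_prices)
    frontier = monthly_cost(avg_input_tokens, avg_output_tokens,
                            requests_per_month, *frontier_prices)
    return cheap_share * cheap + (1.0 - cheap_share) * frontier


# Baseline: 1,500 input and 300 output tokens, one million requests a month.
baseline = monthly_cost(1_500, 300, 1_000_000, 3.0, 15.0)
# Tiering: route 70% of requests to a model priced at 0.30 / 1.50 USD.
tiered = tiered_monthly_cost(1_500, 300, 1_000_000, 0.7,
                             (0.3, 1.5), (3.0, 15.0))
print(round(baseline, 2), round(tiered, 2))
```

Under these assumed figures the baseline is roughly 9,000 USD per month and the tiered setup roughly 3,330 USD. Real traffic rarely splits this cleanly, so the result is best read as a planning estimate rather than a forecast.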

Administrator · 41 published guides · Joined 2016
