Article Issue #5198

Cost Per Token

What to know

Cost Per Token is the pricing unit for language model API usage, typically expressed in US dollars per million tokens and split into separate input (prompt) and output (completion) rates.
A provider tokenizes each request, counts input tokens (everything in the prompt, including system message, history, and retrieved context), and counts output tokens (everything the model generates in response).
Unit economics analysis is essential before scaling any LLM feature.

Wikiwalls Team Administrator

May 15, 2026 2 min read


Cost Per Token is the pricing unit for language model API usage, typically expressed in US dollars per million tokens and split into separate input (prompt) and output (completion) rates. Because output generation is more compute-intensive than input processing, output tokens are generally priced at 3 to 5 times the input rate for the same model.

How it works

A provider tokenizes each request, counts input tokens (everything in the prompt, including system message, history, and retrieved context), and counts output tokens (everything the model generates in response). Total cost equals (input tokens / 1,000,000) times input price plus (output tokens / 1,000,000) times output price, where both prices are quoted per million tokens. Where prompt caching applies, cached input tokens are billed at the lower cached rate instead of the standard input rate.
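The formula above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's billing logic: the function name, its parameters, and the prices in the example (3.00, 15.00, and 0.30 USD per million tokens) are all assumptions chosen for readability.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price,
                 cached_tokens=0, cached_price=None):
    """Return the USD cost of one request.

    Prices are in USD per million tokens. `cached_tokens` is the part of
    `input_tokens` billed at `cached_price` (hypothetical parameter names).
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_price is None:
        # No cached rate given: bill cached tokens at the normal input rate.
        cached_price = input_price

    uncached_tokens = input_tokens - cached_tokens
    return (
        uncached_tokens / 1_000_000 * input_price
        + cached_tokens / 1_000_000 * cached_price
        + output_tokens / 1_000_000 * output_price
    )


# Illustrative prices only, not taken from any real price list.
cost = request_cost(
    input_tokens=12_000, output_tokens=500,
    input_price=3.00, output_price=15.00,
    cached_tokens=8_000, cached_price=0.30,
)
print(f"{cost:.4f} USD")  # 0.0219 USD
```

In this example the 500 output tokens cost 0.0075 USD, more than half as much as the 12,000 input tokens (0.0144 USD), which shows how much weight the output rate carries.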

Key facts

Typical ranges (2025): Frontier models cost 1 to 15 USD per million input tokens; open-weight hosted models cost 0.1 to 1 USD per million input tokens.
Output premium: Output tokens typically cost 3 to 5 times as much as input tokens for the same model.
Context window impact: Resending a large context on every request can quickly dominate total cost; prompt caching lowers the rate on repeated prompt content, and retrieval-augmented generation (RAG) sends only the relevant passages instead of whole documents.
Volume discounts: Enterprise agreements and committed usage tiers unlock significant discounts below list pricing.
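To make the context window point concrete, consider a chat that resends its full history on every turn. The sketch below is a simplified model under assumed numbers (a 500-token system message, 100-token user messages, 300-token replies); the function and parameter names are hypothetical.

```python
def cumulative_input_tokens(turns, system_tokens, user_tokens, reply_tokens):
    """Total input tokens billed over a conversation that resends
    the full history (all earlier messages and replies) on every turn."""
    total = 0
    history = 0  # tokens from earlier turns carried into the next prompt
    for _ in range(turns):
        total += system_tokens + history + user_tokens
        history += user_tokens + reply_tokens
    return total


with_history = cumulative_input_tokens(20, 500, 100, 300)
without_history = 20 * (500 + 100)  # same 20 turns, no history resent

print(with_history)     # 88000
print(without_history)  # 12000
```

Under these assumptions, 20 turns bill roughly seven times as many input tokens as the same turns sent without history, and the gap widens with every additional turn.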

For builders

Unit economics analysis is essential before scaling any LLM feature. The standard exercise is to estimate average input and output token counts per request, multiply by expected call volume, and project monthly cost at list prices. Prompt caching, model tiering (routing simpler tasks to cheaper models), and output length constraints are the primary levers for reducing cost once the baseline is established.
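The projection exercise described above might look like the following sketch. All figures are assumptions for illustration: 2,000 input and 400 output tokens per request, 50,000 requests per day, a 30-day month, and two hypothetical price points (3.00/15.00 and 0.25/1.25 USD per million input/output tokens). The helper names are likewise hypothetical.

```python
def monthly_cost(avg_input_tokens, avg_output_tokens, requests_per_day,
                 input_price, output_price, days=30):
    """Project monthly USD cost at list prices (USD per million tokens)."""
    per_request = (
        avg_input_tokens * input_price + avg_output_tokens * output_price
    ) / 1_000_000
    return per_request * requests_per_day * days


def blended_cost(premium_cost, cheap_cost, cheap_share):
    """Monthly cost when `cheap_share` of traffic goes to the cheaper model."""
    return (1 - cheap_share) * premium_cost + cheap_share * cheap_cost


# Baseline: every request goes to the more expensive model.
premium = monthly_cost(2_000, 400, 50_000, input_price=3.00, output_price=15.00)
# Same traffic on a cheaper model (illustrative prices).
cheap = monthly_cost(2_000, 400, 50_000, input_price=0.25, output_price=1.25)
# Model tiering: route 70% of requests to the cheaper model.
tiered = blended_cost(premium, cheap, cheap_share=0.70)

print(f"baseline: {premium:,.2f} USD/month")  # baseline: 18,000.00 USD/month
print(f"tiered:   {tiered:,.2f} USD/month")   # tiered:   6,450.00 USD/month
```

With the baseline in hand, each lever can be tested by changing one input: a lower `avg_output_tokens` models an output length constraint, and a lower effective `input_price` approximates prompt caching.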


