Article Issue #5166

Context Window

What to know

The context window is the finite token budget that bounds what an LLM can 'see' during a single inference call. During inference, the model computes attention across every token in the context. Managing context is a core engineering concern.

The context window is the finite token budget that bounds what an LLM can ‘see’ during a single inference call. All system instructions, conversation history, retrieved documents, tool outputs, and generated text must fit within this limit. Once the window is full, older content has to be truncated or dropped, and the model cannot reference it unless it is explicitly re-included.
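
The budget described above can be sketched as simple bookkeeping. This is a minimal illustration, not any provider's API: the function name, the part labels, and the token counts are all hypothetical, and real counts would come from the model's own tokenizer.

```python
def remaining_budget(context_limit: int,
                     part_tokens: dict[str, int],
                     reserved_output: int) -> int:
    """Return the tokens left once every input part and the reserved
    output space are counted. A negative result means the request
    would overflow the window."""
    used = sum(part_tokens.values()) + reserved_output
    return context_limit - used


# Hypothetical token counts for one request.
parts = {
    "system": 400,
    "history": 5200,
    "retrieved_docs": 1800,
    "tool_outputs": 300,
}

left = remaining_budget(context_limit=8192, part_tokens=parts, reserved_output=1024)
print(left)  # -532: the request is 532 tokens over budget
```

A negative result is the signal to trim history, drop retrieved documents, or reserve less room for the output before sending the request.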

How it works

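During inference, the model computes attention across every token in the context. With standard self-attention, every token is compared with every other token, so compute grows quadratically with context length and memory grows rapidly as well. This is why extending windows usually requires architectural changes such as sliding-window or sparse attention, together with position-encoding techniques such as rotary position embeddings (RoPE) and their scaling variants, which help a model handle positions beyond those it was trained on. Providers set a hard token cap per model.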
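
To make the quadratic growth concrete, the sketch below counts attention score entries for a single head in a single layer. It assumes a naive implementation that materializes the full score matrix in 16-bit floats; the function names are illustrative, and fused attention kernels avoid storing this matrix even though the pairwise compute remains quadratic.

```python
def attention_score_entries(n_tokens: int) -> int:
    """Full self-attention scores every token against every token."""
    return n_tokens * n_tokens


def score_matrix_bytes(n_tokens: int, bytes_per_entry: int = 2) -> int:
    """Bytes needed to hold one head's score matrix (2 bytes = fp16)."""
    return attention_score_entries(n_tokens) * bytes_per_entry


def windowed_score_entries_upper_bound(n_tokens: int, window: int) -> int:
    """Sliding-window attention: each token attends to at most `window`
    tokens, so the count grows linearly in n_tokens for a fixed window."""
    return n_tokens * min(n_tokens, window)


for n in (4_096, 32_768, 131_072):
    gib = score_matrix_bytes(n) / 2**30
    print(f"{n:>7} tokens -> {gib:.2f} GiB per head per layer")
```

Under these assumptions, growing the context 32x from 4,096 to 131,072 tokens grows the naive score matrix 1,024x, from about 0.03 GiB to 32 GiB, while a fixed 4,096-token sliding window would grow the entry count only 32x.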

Key facts

  • Units: Measured in tokens, not characters. One token averages roughly 4 English characters.
  • Range: The original GPT-3 supported 2,048 tokens; current frontier models support 128K to 1M tokens.
  • Input vs. output: Input and output share the same window, so total context equals input tokens plus the generated output tokens.
  • Cost impact: Larger contexts cost more; providers typically charge per token, so cost grows roughly linearly with token count.
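
The rules of thumb in this list can be turned into a rough estimator. This is a sketch under stated assumptions: the 4-characters-per-token ratio is only an average for English text, and the per-million-token prices below are made-up placeholders, not any provider's actual rates.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average; real tokenizers vary by text and language


def estimate_tokens(text: str) -> int:
    """Approximate a token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int,
                  output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Linear per-token pricing: each side is billed at its own rate."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


print(estimate_tokens("a" * 1000))  # 250

# Hypothetical prices (per million tokens), for illustration only.
cost = estimate_cost(100_000, 2_000,
                     input_price_per_million=3.0,
                     output_price_per_million=15.0)
print(round(cost, 2))  # 0.33
```

For anything billing-sensitive, the estimate should be replaced with counts from the model's actual tokenizer, since the character ratio can be far off for code or non-English text.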

For builders

Managing context is a core engineering concern. Builders must decide what to include, what to truncate, and what to offload to retrieval systems. Prompt caching can reduce the cost of repeating large, stable context blocks across many calls, which is important for long-document or multi-turn workflows.
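
One common truncation strategy is to keep the system prompt and drop the oldest conversation turns until the rest fits. The sketch below assumes that policy; `trim_history` and `rough_token_count` are hypothetical helpers, and the default counter uses the rough 4-characters-per-token estimate rather than a real tokenizer.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Approximate tokens as ceil(chars / 4), with a minimum of 1."""
    return max(1, -(-len(text) // 4))


def trim_history(system_prompt: str,
                 turns: list[str],
                 context_limit: int,
                 reserved_output: int,
                 count_tokens: Callable[[str], int] = rough_token_count) -> list[str]:
    """Keep the most recent turns that fit after the system prompt and
    the reserved output space are subtracted from the window."""
    budget = context_limit - reserved_output - count_tokens(system_prompt)
    kept: list[str] = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


turns = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
kept = trim_history("s" * 20, turns, context_limit=40, reserved_output=10)
print(len(kept))  # 2: the oldest turn is dropped
```

Keeping the system prompt as an unchanged prefix also fits well with prompt caching, which generally depends on the leading part of the request being identical across calls. Dropped turns can be summarized or moved to a retrieval system instead of being discarded outright.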

Administrator · 41 published guides · Joined 2016
