Article Issue #5172

Top-p Sampling

What to know

Top-p sampling, also called nucleus sampling, is a token selection method that, at each generation step, samples only from the smallest set of most-probable tokens whose cumulative probability reaches at least p.
The sampler sorts the vocabulary by descending probability and accumulates probabilities until the running total reaches p.
Most production LLM APIs expose both temperature and top_p parameters.

Wikiwalls Team Administrator

May 15, 2026 2 min read


Top-p sampling (also called nucleus sampling) is a token selection method that, at each generation step, keeps only the smallest set of highest-probability tokens whose cumulative probability reaches at least p, then samples from that set, the nucleus. Unlike top-k sampling, which always considers the same fixed number of tokens, top-p adjusts the size of the candidate pool to how concentrated the model’s distribution is.

How it works

At each step, the sampler sorts all vocabulary tokens by descending model probability, then accumulates probabilities until the running total reaches or exceeds p. Only the tokens in that prefix, the nucleus, are eligible for sampling; every other token’s probability is set to zero, and the nucleus is renormalized to sum to 1 before drawing. When the model is confident, the nucleus is small, sometimes a single token; when it is uncertain, the nucleus widens to include more options.
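The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any particular library's implementation: the function names `top_p_filter` and `sample_top_p` and the two example distributions are made up for this entry, and a real decoder would operate on tensors over the full vocabulary.

```python
import random


def top_p_filter(probs, p):
    """Return {token_index: renormalized_prob} for the smallest nucleus with mass >= p.

    `probs` is assumed to be a list of probabilities that already sums to 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus = []
    running = 0.0
    for i in order:
        nucleus.append(i)
        running += probs[i]
        if running >= p:  # stop as soon as the cumulative mass reaches p
            break
    # Tokens outside the nucleus are dropped; the rest are renormalized to sum to 1.
    return {i: probs[i] / running for i in nucleus}


def sample_top_p(probs, p, rng=random):
    """Draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Illustrative distributions (not from a real model).
confident = [0.92, 0.05, 0.02, 0.01]
uncertain = [0.30, 0.25, 0.20, 0.15, 0.10]

print(len(top_p_filter(confident, 0.9)))   # 1: the top token alone covers p
print(len(top_p_filter(uncertain, 0.85)))  # 4: mass is spread, so the nucleus widens
print(sample_top_p(uncertain, 0.85))       # some index from 0 to 3
```

With p set to 1.0 the loop keeps every token, which matches the idea that 1.0 switches the filter off.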

Key facts

Common defaults: p values of 0.9 to 0.95 are typical; 1.0 disables the filter entirely.
Proposed by: Holtzman et al. in ‘The Curious Case of Neural Text Degeneration’ (2019).
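Interacts with temperature: In most implementations, temperature rescales the logits before top-p is applied, so a higher temperature flattens the distribution and enlarges the nucleus; the two parameters jointly shape sampling behavior.
Top-k comparison: Top-k fixes the candidate count; top-p fixes the probability mass, so the candidate count adapts to the shape of the distribution.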
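To make the temperature interaction concrete, the sketch below applies temperature to a set of logits and then counts how many tokens fall inside the nucleus at p = 0.9. It assumes the common ordering (temperature first, then top-p); the logits are invented for illustration, and `softmax` and `nucleus_size` are helper names introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_size(probs, p):
    """Count how many top tokens are needed for the cumulative mass to reach p."""
    running = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        running += prob
        if running >= p:
            return count
    return len(probs)


logits = [4.0, 3.0, 2.0, 1.0, 0.0]  # made-up logits for five tokens

for temperature in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature)
    print(temperature, nucleus_size(probs, 0.9))
# 0.5 2
# 1.0 3
# 2.0 4
```

The same p admits two, three, or four candidates depending on temperature, whereas a top-k filter with k = 3 would keep exactly three tokens in every case.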

For builders

Most production LLM APIs expose both temperature and top_p parameters. For factual tasks, setting top_p to 1.0 and temperature to 0 is common; at temperature 0 decoding is effectively greedy, so top_p has little or no effect. For open-ended generation, a top_p of 0.9 with a moderate temperature is a reasonable starting point that balances coherence and variety. Avoid tuning both parameters at once without systematic evals, because their interaction is hard to predict: change one, measure, then change the other.
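One way to keep such tuning systematic is a small grid sweep. The sketch below is provider-agnostic and makes several assumptions: `generate` and `score` are hypothetical callables you would supply (a wrapper around your model API and an eval metric, respectively), and `sweep` is a name introduced here, not part of any SDK.

```python
from itertools import product
from statistics import mean


def sweep(generate, score, prompts, temperatures, top_ps):
    """Return {(temperature, top_p): mean score} over every parameter pair.

    `generate(prompt, temperature=..., top_p=...)` and `score(prompt, output)`
    are hypothetical, caller-supplied functions.
    """
    results = {}
    for temperature, top_p in product(temperatures, top_ps):
        scores = [
            score(prompt, generate(prompt, temperature=temperature, top_p=top_p))
            for prompt in prompts
        ]
        results[(temperature, top_p)] = mean(scores)
    return results


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without a model; replace with real ones.
    def fake_generate(prompt, temperature, top_p):
        return f"{prompt}:{temperature}:{top_p}"

    def fake_score(prompt, output):
        return float(len(output))

    grid = sweep(fake_generate, fake_score, ["a", "bb"], [0.0, 0.7], [0.9, 1.0])
    for (temperature, top_p), value in sorted(grid.items()):
        print(temperature, top_p, value)
```

Because sampling is stochastic, a real sweep would usually draw several completions per prompt and setting before averaging.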

