Temperature (LLM)
Temperature (LLM) is a scalar parameter applied to the model’s logits (its raw next-token scores) before the next token is sampled. At temperature 0, the model always picks the highest-probability token, so the output is deterministic in principle. As temperature increases, the probability distribution flattens, making lower-probability tokens more likely to be sampled.
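The flattening described above can be written out with the standard temperature-scaled softmax. In this sketch the notation is ours, not taken from a specific API: z_i is the logit for token i, T > 0 is the temperature, and V is the vocabulary size. T = 1 recovers the ordinary softmax; the first limit assumes a single highest logit.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0

\lim_{T \to 0^{+}} p_i(T) =
\begin{cases}
  1 & \text{if } i = \arg\max_j z_j \\
  0 & \text{otherwise}
\end{cases}
\qquad
\lim_{T \to \infty} p_i(T) = \frac{1}{V}
```

The T to 0 limit is why temperature 0 behaves like greedy decoding, and the T to infinity limit is the fully flattened (uniform) distribution.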
How it works
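Before sampling, each token’s raw logit score is divided by the temperature value, and the scaled scores are passed through a softmax to produce probabilities. A temperature below 1 sharpens the distribution (the top token becomes even more dominant), while a temperature above 1 flattens it. Because dividing by 0 is undefined, temperature 0 is handled as a special case: the model greedily selects the argmax token at every step. APIs restrict temperature to a fixed range, for example 0 to 2 in OpenAI’s API and 0 to 1 in Anthropic’s.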
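The procedure above can be sketched in plain Python. This is a minimal illustration, not any provider’s implementation: the function names `temperature_softmax` and `sample_next_token` are hypothetical, logits are a plain list of floats, and temperature 0 is treated as greedy argmax decoding as described.

```python
import math
import random


def temperature_softmax(logits, temperature):
    """Divide each logit by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; handle 0 as greedy decoding")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Return the index of the next token.

    Temperature 0 is treated as greedy decoding (argmax), since dividing by 0
    is undefined. Otherwise, sample from the temperature-scaled distribution.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Calling `temperature_softmax([2.0, 1.0, 0.0], t)` for t = 0.5, 1.0, and 2.0 shows the top token’s probability falling as t rises, which is the sharpening and flattening behavior described above.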
Key facts
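- Typical starting points: 0 for deterministic tasks (classification, extraction), about 0.7 to 1.0 for general and creative generation, and above 1 (where the API allows it) only for highly exploratory output, where coherence tends to degrade. These are common heuristics, not provider requirements.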
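- Not a determinism guarantee: Setting temperature to 0 does not guarantee identical outputs across runs, because floating-point non-determinism on GPUs (for example, a different order of parallel additions) can change which token scores highest when candidates are nearly tied.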
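- Interacts with top-p: Temperature and top-p (nucleus sampling, which restricts sampling to the smallest set of tokens whose cumulative probability reaches p) are often used together; setting both high at once amplifies randomness substantially. Some providers, such as OpenAI, recommend adjusting one or the other but not both.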
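- Reasoning models: Behavior varies by provider. Some models with built-in chain-of-thought reasoning do not accept a custom temperature at all, so check the API documentation rather than assuming how temperature affects the reasoning and the visible output.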
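To illustrate how temperature and top-p compound, here is a minimal sketch in plain Python. It assumes temperature is applied first and top-p truncation second; actual ordering and details vary by implementation, and the function names are hypothetical.

```python
import math
import random


def temperature_softmax(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize. Returns {token_index: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_with_temperature_and_top_p(logits, temperature, top_p, rng=random):
    # Assumed ordering: temperature scaling first, then top-p truncation.
    probs = temperature_softmax(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    logits = [3.0, 1.5, 1.0, 0.0]
    for t in (0.5, 1.0, 2.0):
        nucleus = top_p_filter(temperature_softmax(logits, t), 0.9)
        print(t, len(nucleus))  # nucleus size: 1, then 3, then 4
```

With these example logits and top_p = 0.9, the nucleus grows from 1 candidate token at temperature 0.5 to 3 at 1.0 and all 4 at 2.0: a higher temperature both flattens the distribution and widens the set that top-p lets through.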
For builders
For tasks requiring factual accuracy or structured data extraction, builders should use temperature 0 or near 0 to maximize consistency; note that low temperature makes output more repeatable, not more factually correct. For creative or generative features such as marketing copy or brainstorming tools, a temperature around 0.8 to 1.0 tends to produce more varied output. Choose temperature based on evaluation metrics measured on representative inputs rather than on intuition alone.
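One way to follow the eval-first advice is a temperature sweep that scores the same evaluation set at several settings. The sketch below is hypothetical throughout: `generate(prompt, temperature)` stands in for whatever model call you use, `score(output, reference)` for a task-specific metric, and the stub model and exact-match metric exist only so the example runs.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def sweep_temperatures(
    generate: Callable[[str, float], str],
    score: Callable[[str, str], float],
    eval_set: Sequence[Tuple[str, str]],  # (prompt, reference) pairs
    temperatures: Sequence[float],
    samples_per_prompt: int = 3,
) -> Dict[float, float]:
    """Return the mean score per temperature. Several samples per prompt are
    drawn because output varies between runs at nonzero temperature."""
    if not eval_set or samples_per_prompt < 1:
        raise ValueError("need a non-empty eval set and at least one sample")
    results = {}
    for t in temperatures:
        scores = []
        for prompt, reference in eval_set:
            for _ in range(samples_per_prompt):
                scores.append(score(generate(prompt, t), reference))
        results[t] = mean(scores)
    return results


# --- Hypothetical stand-ins so the sketch runs without a real model ---
_rng = random.Random(42)


def fake_generate(prompt: str, temperature: float) -> str:
    # Stub model: the chance of a wrong label grows with temperature.
    return "positive" if _rng.random() >= temperature / 2 else "negative"


def exact_match(output: str, reference: str) -> float:
    return 1.0 if output == reference else 0.0


EVAL_SET = [("Great product!", "positive"), ("Love it", "positive")]

if __name__ == "__main__":
    print(sweep_temperatures(fake_generate, exact_match, EVAL_SET, [0.0, 0.5, 1.0]))
```

In practice you would replace the stubs with a real model call and a metric that fits the task (exact match for extraction, rubric or human ratings for creative output) and pick the temperature with the best score-to-variety trade-off.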