Every time the FM picks the next token, it starts with a probability distribution over candidates. Temperature reshapes that distribution: low temperature sharpens it, pushing the top choice toward near-certainty; high temperature flattens the whole thing, giving less-likely tokens a real shot. The ranking itself never changes, only how much probability each candidate gets.
~ same prompt, same model, different temperature = different shape of choice ~
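A minimal sketch of the mechanics, in NumPy with made-up toy logits: temperature divides the logits before the softmax, so T < 1 sharpens the distribution and T > 1 flattens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, rescaled by temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [4.0, 3.0, 2.0, 1.0]       # toy next-token scores

for t in (0.2, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# T=0.2 -> [0.993 0.007 0.    0.   ]  (top choice near-certain)
# T=1.0 -> [0.644 0.237 0.087 0.032]
# T=2.0 -> [0.455 0.276 0.167 0.102] (less-likely tokens get a real shot)
```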
What this unlocks
Temperature is ONE knob in sampling
There's also top-k (only consider the k highest-probability tokens) and top-p / nucleus sampling (only consider the smallest set of top tokens whose cumulative probability reaches p). These restrict the candidate set; temperature reshapes the distribution over whatever survives, as in the sketch below. In practice on Bedrock, you'll mostly tune temperature; top-p is often left at the default.
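A sketch of how the three knobs compose, again in NumPy. `sample_token` is a hypothetical helper, and real decoders differ in the order they apply the filters; this one does temperature, then top-k, then top-p.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the demo repeats

def sample_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Temperature reshapes, top-k/top-p restrict, then sample the survivors."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # candidates, best first
    probs = probs[order]
    keep = len(probs)
    if top_k is not None:                      # top-k: fixed-size cutoff
        keep = min(keep, top_k)
    if top_p is not None:                      # nucleus: smallest set with cum. prob >= p
        keep = min(keep, int(np.searchsorted(np.cumsum(probs), top_p)) + 1)
    probs = probs[:keep] / probs[:keep].sum()  # renormalize whatever survives
    return order[rng.choice(keep, p=probs)]

print(sample_token([4.0, 3.0, 2.0, 1.0, 0.0], temperature=0.7, top_k=3, top_p=0.9))
```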
When to use what
T = 0 or 0.1-0.2 — factual Q&A, RAG answers, code generation, classification, anything where you want the model to be boring and right.
T = 0.5-0.8 — default for most chat. Balanced between "stuck in a rut" and "goes off the rails."
T = 1.0+ — creative writing, brainstorming, generating variety. Higher = more diverse, but also more likely to wander.
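On Bedrock these bands map to `inferenceConfig` in the Converse API. A minimal sketch, assuming boto3 credentials are already set up; the model ID and region are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Low temperature for a factual, RAG-style answer.
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # placeholder model ID
    messages=[{"role": "user",
               "content": [{"text": "What ports does the app expose?"}]}],
    inferenceConfig={
        "temperature": 0.1,   # boring and right
        "topP": 0.9,          # usually left near the default
        "maxTokens": 512,
    },
)
print(response["output"]["message"]["content"][0]["text"])
```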
Exam angle — hallucination control
When a stem says "reduce hallucinations," "more consistent answers," or "model keeps making things up" — one of the cheap fixes is lower temperature. Combined with Bedrock Guardrails' grounding check and explicit "use only the provided context" prompt language, you get a tight factual system. See Tree 4: RAG Troubleshooting.
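A sketch of that combination, under the same boto3 assumptions as above. The guardrail ID is a placeholder for one created separately, and note that Bedrock's contextual grounding check additionally expects the context to be marked as the grounding source, a detail omitted here:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

context = "Acme's refund window is 30 days."    # retrieved passage (toy)
question = "How long is the refund window?"

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # placeholder model ID
    system=[{"text": "Use only the provided context. If the answer is not "
                     "in the context, say you don't know."}],
    messages=[{"role": "user",
               "content": [{"text": f"Context: {context}\n\nQuestion: {question}"}]}],
    inferenceConfig={"temperature": 0.0},   # cheap fix #1: low temperature
    guardrailConfig={                       # cheap fix #2: attach the guardrail
        "guardrailIdentifier": "your-guardrail-id",   # placeholder
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])
```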
The "deterministic ≠ reproducible" trap
T=0 is "greedy": always pick the top candidate. But slight numerical differences between runs (different hardware, floating-point rounding, nondeterministic kernel ordering) can flip close ties. If you need truly identical outputs across runs, T=0 alone isn't a guarantee; some providers offer a seed parameter for reproducibility. On Bedrock, some models support it; others don't. Caveat emptor.
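A toy illustration of the tie-flip. The logits here are hand-fabricated; real run-to-run drift comes from hardware and accumulation-order differences, not hand-edited values:

```python
import numpy as np

# Two runs of the "same" computation, off by one rounding step.
logits_run_a = np.array([2.0000001, 2.0000000, -1.0])
logits_run_b = np.array([2.0000000, 2.0000001, -1.0])

# Greedy (T=0) decoding is just argmax; a 1e-7 wobble flips the winner.
print(np.argmax(logits_run_a))   # -> 0
print(np.argmax(logits_run_b))   # -> 1
```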