Every time the FM picks the next token, it starts with a probability distribution over candidates. Temperature reshapes that distribution: low temperature sharpens it, pushing the top choice toward near-certainty; high temperature flattens the whole thing, giving less-likely tokens a real shot. The ranking itself never changes, only how much probability each candidate gets.
~ same prompt, same model, different temperature = different shape of choice ~
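A minimal sketch of the mechanics, in NumPy with made-up toy logits: temperature divides the logits before the softmax, so T < 1 sharpens the distribution and T > 1 flattens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, rescaled by temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [4.0, 3.0, 2.0, 1.0]       # toy next-token scores

for t in (0.2, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# T=0.2 -> [0.993 0.007 0.    0.   ]  (top choice near-certain)
# T=1.0 -> [0.644 0.237 0.087 0.032]
# T=2.0 -> [0.455 0.276 0.167 0.102] (less-likely tokens get a real shot)
```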
What this unlocks
Temperature is ONE knob in sampling
There's also top-k (only consider the k highest-probability tokens) and top-p / nucleus sampling (only consider the smallest set of top tokens whose cumulative probability reaches p). These restrict the candidate set; temperature reshapes the distribution over whatever survives, as in the sketch below. In practice on Bedrock, you'll mostly tune temperature; top-p is often left at the default.
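A sketch of how the three knobs compose, again in NumPy. `sample_token` is a hypothetical helper, and real decoders differ in the order they apply the filters; this one does temperature, then top-k, then top-p.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the demo repeats

def sample_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Temperature reshapes, top-k/top-p restrict, then sample the survivors."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # candidates, best first
    probs = probs[order]
    keep = len(probs)
    if top_k is not None:                      # top-k: fixed-size cutoff
        keep = min(keep, top_k)
    if top_p is not None:                      # nucleus: smallest set with cum. prob >= p
        keep = min(keep, int(np.searchsorted(np.cumsum(probs), top_p)) + 1)
    probs = probs[:keep] / probs[:keep].sum()  # renormalize whatever survives
    return order[rng.choice(keep, p=probs)]

print(sample_token([4.0, 3.0, 2.0, 1.0, 0.0], temperature=0.7, top_k=3, top_p=0.9))
```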
When to use what
T = 0 or 0.1-0.2 — factual Q&A, RAG answers, code generation, classification, anything where you want the model to be boring and right.
T = 0.5-0.8 — default for most chat. Balanced between "stuck in a rut" and "goes off the rails."
T = 1.0+ — creative writing, brainstorming, generating variety. Higher = more diverse, but also more likely to wander.
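On Bedrock these bands map to `inferenceConfig` in the Converse API. A minimal sketch, assuming boto3 credentials are already set up; the model ID and region are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Low temperature for a factual, RAG-style answer.
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # placeholder model ID
    messages=[{"role": "user",
               "content": [{"text": "What ports does the app expose?"}]}],
    inferenceConfig={
        "temperature": 0.1,   # boring and right
        "topP": 0.9,          # usually left near the default
        "maxTokens": 512,
    },
)
print(response["output"]["message"]["content"][0]["text"])
```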
Exam angle — hallucination control
When a stem says "reduce hallucinations," "more consistent answers," or "model keeps making things up" — one of the cheap fixes is lower temperature. Combined with Bedrock Guardrails' grounding check and explicit "use only the provided context" prompt language, you get a tight factual system. See Tree 4: RAG Troubleshooting.
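A sketch of that combination, under the same boto3 assumptions as above. The guardrail ID is a placeholder for one created separately, and note that Bedrock's contextual grounding check additionally expects the context to be marked as the grounding source, a detail omitted here:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

context = "Acme's refund window is 30 days."    # retrieved passage (toy)
question = "How long is the refund window?"

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # placeholder model ID
    system=[{"text": "Use only the provided context. If the answer is not "
                     "in the context, say you don't know."}],
    messages=[{"role": "user",
               "content": [{"text": f"Context: {context}\n\nQuestion: {question}"}]}],
    inferenceConfig={"temperature": 0.0},   # cheap fix #1: low temperature
    guardrailConfig={                       # cheap fix #2: attach the guardrail
        "guardrailIdentifier": "your-guardrail-id",   # placeholder
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])
```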
The "deterministic ≠ reproducible" trap
T=0 is "greedy": always pick the top candidate. But slight numerical differences between runs (different hardware, floating-point rounding, nondeterministic kernel ordering) can flip close ties. If you need truly identical outputs across runs, T=0 alone isn't a guarantee; some providers offer a seed parameter for reproducibility. On Bedrock, some models support it; others don't. Caveat emptor.
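A toy illustration of the tie-flip. The logits here are hand-fabricated; real run-to-run drift comes from hardware and accumulation-order differences, not hand-edited values:

```python
import numpy as np

# Two runs of the "same" computation, off by one rounding step.
logits_run_a = np.array([2.0000001, 2.0000000, -1.0])
logits_run_b = np.array([2.0000000, 2.0000001, -1.0])

# Greedy (T=0) decoding is just argmax; a 1e-7 wobble flips the winner.
print(np.argmax(logits_run_a))   # -> 0
print(np.argmax(logits_run_b))   # -> 1
```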