The context window holds everything: system prompt, retrieved RAG chunks, conversation history, the user question, and the room for the response. Its capacity is fixed. When the bucket overflows, something gets truncated, whether that is part of the input or the tail of the response. This metaphor makes "why did my response cut off?" instantly obvious.
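The bucket picture can be made concrete with a little budget arithmetic. The sketch below is illustrative only: `rough_token_count` and `room_for_response` are hypothetical helpers, and counting whitespace-separated words is a stand-in assumption for a real tokenizer, whose counts will differ.

```python
def rough_token_count(text: str) -> int:
    # Assumption: one whitespace-separated word is about one token.
    # Real tokenizers split text differently, so treat this as a rough proxy.
    return len(text.split())


def room_for_response(context_limit: int, *parts: str) -> int:
    # Everything already in the bucket: system prompt, RAG chunks,
    # conversation history, and the user question.
    used = sum(rough_token_count(part) for part in parts)
    # Whatever is left is all the space the response can use.
    # Zero means the bucket is already full or overflowing.
    return max(context_limit - used, 0)


system_prompt = "You are a helpful assistant."
rag_chunks = "chunk one text here"
history = "hi there"
question = "why did it cut off?"

# With a (hypothetical, tiny) limit of 20 tokens, 16 are already used.
print(room_for_response(20, system_prompt, rag_chunks, history, question))  # 4
```

If the remaining room is smaller than the answer needs, the response gets cut off, which is the overflow symptom the bucket image is meant to trigger.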
"Context window is a bucket" isn't technically precise — it's a mental handle. You don't need to remember the exact token count or formula; you need to remember what category of problem overflow is. A bucket fills up, spills over, needs to be managed. That mental image does real work on the exam because it lets you pattern-match fast.
Tree 4: RAG Troubleshooting — symptom #3 (response cut off) maps directly to bucket overflow
Mental Model 1: Embeddings · Mental Model 2: Temperature · Mental Model 3: Prompt Injection · Mental Model 4: Attention