When the FM is generating the next token, it doesn't treat all prior words equally. Some get spotlighted: the model "looks at" them hard. Others fade into the background. Attention weights are how the model decides what's relevant right now.
~ darker square = model is paying more attention to that word ~
What this unlocks
"Attention" is literal, not metaphorical
Inside the transformer, for every token being generated, the model computes a weight on every previous token — a number saying "how much do I care about this word for what I'm generating next." These weights are called attention scores. Darker shading in the diagram = higher weight. The weighted average of prior tokens' representations informs the next token.
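To make "a weight on every previous token" concrete, here's a minimal sketch of scaled dot-product attention in numpy. The shapes and toy data are illustrative assumptions, not how any particular FM lays out its tensors; real models add learned projections and many attention heads on top of this.

```python
# Minimal scaled dot-product attention sketch (illustrative, single head, no projections).
import numpy as np

def attention(query, keys, values):
    """query: (d,). keys, values: (n_prior_tokens, d)."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)       # one score per prior token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: weights sum to 1 (the "attention budget")
    context = weights @ values               # weighted average of prior tokens' representations
    return weights, context

# Toy example: 4 prior tokens with 8-dim representations
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
query = keys[2] * 2.0                        # make the query resemble token 2
weights, context = attention(query, keys, values)
print(weights.round(3))                      # token 2 takes most of the budget (darkest square)
```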
Why this explains "lost in the middle"
Long contexts have a known failure mode: information buried in the middle gets ignored more often than content at the beginning or end. Attention budgets aren't evenly distributed across the context. This is why stuffing 50 chunks into the prompt performs worse than carefully curating 3 chunks: the middle ones just don't get looked at much.
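A minimal sketch of the "curate, don't stuff" step, assuming you already have a relevance score for each chunk (vector similarity or a rerank score). The function name and prompt template are hypothetical; the point is the shape of the logic: rank, keep a small k, and don't give the model a long middle to lose things in.

```python
# Hypothetical prompt assembly: keep only the top-k chunks instead of stuffing all of them.
def build_prompt(question: str, scored_chunks: list[tuple[float, str]], k: int = 3) -> str:
    top = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:k]
    context = "\n\n".join(chunk for _, chunk in top)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
```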
Why chunking strategy matters
If a chunk is too big, the FM's attention is diluted across lots of mostly-irrelevant tokens. If a chunk is too small, critical context around an answer is missing. Good chunking (typically 300-800 tokens with some overlap) keeps each chunk focused enough that the FM can attend to it coherently.
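A sketch of a fixed-size chunker with overlap. It splits on whitespace as a rough stand-in for a real tokenizer (a real pipeline would count tokens with the embedding model's tokenizer), and the 500/50 numbers are just one point inside the 300-800 range above.

```python
# Sliding-window chunking sketch: fixed chunk size with overlap, whitespace as a token proxy.
def chunk(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    tokens = text.split()
    step = chunk_size - overlap              # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))  # overlap keeps context around chunk boundaries
    return chunks
```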
Exam angle — why reranking helps
Rerankers (like Bedrock Rerank / Cohere Rerank) run a cross-attention pass over the query and each candidate chunk jointly (heavier than a vector lookup, far cheaper than a full FM call), essentially asking "how much would the query attend to this chunk?" That's different from (and more accurate than) pure vector similarity, which compares embeddings computed independently, before any model has seen the query and chunk together. See Pattern 2: Advanced RAG.
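The managed rerankers are black boxes, so here's the same idea sketched with an open cross-encoder as a stand-in: score each (query, chunk) pair jointly instead of comparing independently computed embeddings. The checkpoint named below is a commonly used public one, not what Bedrock Rerank or Cohere Rerank run under the hood.

```python
# Conceptual stand-in for a managed reranker: a cross-encoder scores (query, chunk) pairs jointly.
from sentence_transformers import CrossEncoder

query = "What is the refund window for annual plans?"
candidates = [
    "Annual plans can be refunded within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Monthly plans renew automatically each billing cycle.",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # public checkpoint, illustrative
scores = reranker.predict([(query, chunk) for chunk in candidates])
ranked = sorted(zip(scores, candidates), reverse=True)
for score, chunk in ranked:
    print(f"{score:.2f}  {chunk}")           # the chunk the query "attends to" most comes first
```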
You won't compute attention on the exam
The exam won't ask you to calculate attention weights. But it tests the consequences: why long contexts degrade, why reranking beats pure vector similarity, why chunking strategy matters, why "just use a 200K context model" often isn't the answer. Those all trace back to attention.