Architecture diagram

Ingestion path (runs once at setup, and again on every content update):
Source (Amazon S3: docs, PDFs, HTML) → Chunk (Bedrock Knowledge Bases splits documents) → Embed (Titan Embeddings: chunks → vectors) → Store (vector DB: OpenSearch / Aurora). Once stored, the content is indexed and ready.

Query path (runs on every user request):
User question ("How do I...?") → API (API Gateway + Lambda, the entry point) → Retrieve (vector search: top-k chunks) → Generate (Bedrock FM: grounded answer) → Response (answer + citations, returned to the user).
How data flows
The architecture runs in two phases. Ingestion happens once at setup and again whenever source content changes — documents are chunked, embedded, and stored. Query happens on every user request — the question is embedded, similar chunks are retrieved from the vector store, and the FM generates an answer grounded in those chunks.
The "managed" version of this pattern is Amazon Bedrock Knowledge Bases, which handles chunking, embedding, and syncing automatically. You just point it at an S3 bucket and choose a vector store.
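With Knowledge Bases, the entire query path collapses into a single RetrieveAndGenerate call. A minimal boto3 sketch, assuming a Knowledge Base already exists (the ID and model ARN below are placeholders):

```python
import boto3

# Placeholder IDs: substitute your own Knowledge Base ID and model ARN.
KB_ID = "KBEXAMPLE01"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"

client = boto3.client("bedrock-agent-runtime")

# One call covers the whole query path: embed the question, fetch the
# top-k chunks, and generate a grounded answer with citations.
response = client.retrieve_and_generate(
    input={"text": "How do I rotate my API keys?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": MODEL_ARN,
        },
    },
)

print(response["output"]["text"])
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref["location"])  # points back to the source document in S3
```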
AWS services used
Amazon S3: source document storage (PDFs, HTML, TXT, Word docs). Trigger a re-sync on new uploads, as sketched below.
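A sync does not start itself, so a common pattern is an S3-triggered Lambda that starts an ingestion job. A minimal sketch, with placeholder Knowledge Base and data source IDs:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholder IDs: substitute your own.
KB_ID = "KBEXAMPLE01"
DATA_SOURCE_ID = "DSEXAMPLE01"

def handler(event, context):
    """Triggered by an S3 ObjectCreated event: start a Knowledge Base sync."""
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KB_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
    return {"ingestionJobId": job["ingestionJob"]["ingestionJobId"]}
```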
Bedrock Knowledge Bases: managed orchestration that handles chunking, embedding, sync, and retrieval. This is the service that makes RAG "easy mode."
Amazon Titan Embeddings: the default embedding model; it converts chunks into vectors. Cohere embeddings are also supported.
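Knowledge Bases invokes the embedding model for you, but calling it directly shows what the chunk-to-vector step produces. A sketch assuming Titan Text Embeddings V2 is enabled in your account:

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Embed one chunk with Titan Text Embeddings V2 (a 1,024-dimension
# vector by default; 256 and 512 are also configurable).
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Rotate API keys every 90 days."}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 1024
```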
Vector store: OpenSearch Serverless (default) · Aurora pgvector · Pinecone · Redis · MongoDB Atlas · Neptune Analytics.
Bedrock FM: Claude, Llama, Titan Text, or Nova; the model that generates the final answer from the retrieved context.
API Gateway + Lambda: the standard entry point for user requests, often combined with Cognito for authentication. A minimal handler is sketched below.
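A minimal proxy-integration handler for this entry point, assuming the request body is JSON with a question field (IDs are placeholders):

```python
import json
import boto3

client = boto3.client("bedrock-agent-runtime")
KB_ID = "KBEXAMPLE01"  # placeholder
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"

def handler(event, context):
    """API Gateway proxy handler: forward the user's question to the KB."""
    question = json.loads(event["body"])["question"]
    result = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    return {"statusCode": 200, "body": json.dumps({"answer": result["output"]["text"]})}
```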
When to use this pattern
✓ Use Basic RAG when…
- The knowledge changes frequently: policy docs updated weekly, product catalogs, support KBs, internal wikis. RAG re-indexes automatically; fine-tuning would require retraining on every update.
- You need citations back to source documents: compliance, legal, medical, enterprise support; anywhere the user needs to verify claims. Knowledge Bases returns citations natively.
- You need to reduce hallucinations for factual questions: grounding in retrieved content dramatically cuts fabrication. This is the primary technique for factual accuracy.
- The documents are too many or too large to fit in the context window: 100 MB of docs will never fit in any FM context. RAG retrieves only the 3-10 most relevant chunks per query.
- You want the fastest time-to-production for a document Q&A app: with Bedrock Knowledge Bases you can have a working RAG app in under an hour, with minimal infrastructure.
- You need per-document access control: use metadata filtering at retrieval time to enforce which documents a user can see based on their identity (see the sketch after this list).
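For the access-control item above, the Retrieve API accepts a metadata filter at query time. A sketch assuming chunks were ingested with a hypothetical department metadata field:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Only return chunks whose metadata marks them as visible to the caller.
# "department" is a hypothetical metadata key attached at ingestion time.
response = client.retrieve(
    knowledgeBaseId="KBEXAMPLE01",  # placeholder
    retrievalQuery={"text": "What is the expense approval limit?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {"equals": {"key": "department", "value": "finance"}},
        }
    },
)
for result in response["retrievalResults"]:
    print(result["score"], result["content"]["text"][:80])
```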
✗ Do NOT use Basic RAG when…
- You need to change the model's tone, style, or writing voice: that's a fine-tuning job, not a retrieval job. RAG provides knowledge, not voice. If the prompt alone can't get the tone right, fine-tune.
- Your data is small enough to fit in the context window: if your total knowledge is under ~50 pages, just put it in the system prompt. You don't need the complexity of RAG.
- Your queries need exact-term matching (product SKUs, function names, IDs): pure semantic search misses exact matches. Use Pattern 2: Advanced RAG with hybrid search (a first step is sketched after this list).
- The task is transactional, not informational: "Look up my order and cancel it" isn't RAG; it's an agent calling tools. Use Pattern 3: Agent.
- The data is structured (tables, relational records): text-to-SQL over your database is more accurate than embedding-based retrieval for structured queries. RAG is for unstructured text.
- Retrieval quality is poor even after tuning: if the top-k chunks are irrelevant despite good chunking and embeddings, upgrade to Advanced RAG with reranking and hybrid search.
- Responses must come from highly creative generation (not grounded): marketing copy, brainstorms, open-ended storytelling. RAG over-constrains creative tasks.
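For the exact-term and retrieval-quality items above: before jumping fully to Pattern 2, note that Knowledge Bases can override the search type where the vector store supports it (e.g., OpenSearch Serverless). A sketch with a placeholder Knowledge Base ID and a made-up SKU:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Override pure semantic search with hybrid (semantic + keyword) retrieval,
# which helps exact terms like SKUs survive the search.
response = client.retrieve(
    knowledgeBaseId="KBEXAMPLE01",  # placeholder
    retrievalQuery={"text": "Install error for SKU B07XJ8C8F5"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "overrideSearchType": "HYBRID",  # the other option is SEMANTIC
        }
    },
)
```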
Exam angle
Pattern-match shortcuts
When a stem mentions "internal documents" + "answer questions" + "grounded" or "citations" or "reduce hallucination" + "data that changes", you're looking at Basic RAG. Pick the option that references Bedrock Knowledge Bases.
The fine-tuning trap
AWS loves to offer fine-tuning as a distractor on RAG questions. Fine-tuning is almost never the right answer when the question says "frequently updated," "dynamic," or "current data." Fine-tuning bakes the data into the model — you'd have to retrain every update, which is neither cost-effective nor practical.
The "just put it in the prompt" trap
A common distractor is "just put the documents in the system prompt." That fails when docs are large (context overflow) or numerous (token cost on every request). Knowledge Bases retrieves only what's needed per query.
Keywords that point here
internal documentation · knowledge base · citations · reduce hallucination · grounded responses · dynamic knowledge · frequently updated · question answering · S3-sourced docs
Related patterns
If retrieval quality isn't good enough, upgrade to Pattern 2: Advanced RAG (Hybrid + Rerank).
If the app needs to take actions (not just answer), see Pattern 3: Agentic Workflow.
For safety around the whole stack, see Pattern 10: Defense-in-Depth.