Architecture diagram

— Ingestion Path (one-time + on content update) —

  Source     Amazon S3              Docs, PDFs, HTML
  Chunk      Bedrock KB             Splits documents
  Embed      Titan Embeddings       Chunks → vectors
  Store      Vector DB              OpenSearch / Aurora

                 ⇣ Indexed & ready ⇣

— Query Path (every user request) —

  User       Question               "How do I...?"
  API        API Gateway + Lambda   Entry point
  Retrieve   Vector Search          Top-k chunks
  Generate   Bedrock FM             Grounded answer
  Response   Answer + Citations     To user

Legend: Data source · Service / compute · Foundation model · Storage / index · Output to user

How data flows

The architecture runs in two phases. Ingestion happens once at setup and again whenever source content changes — documents are chunked, embedded, and stored. Query happens on every user request — the question is embedded, similar chunks are retrieved from the vector store, and the FM generates an answer grounded in those chunks.
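
To make the query path concrete, here is a toy sketch of the retrieve-then-generate loop. The chunk texts and 3-dimensional vectors are invented for illustration; in the real pipeline the embeddings come from Titan Embeddings and the store is OpenSearch or Aurora:

```python
from math import sqrt

# Toy in-memory "vector store". The 3-dimensional vectors here are
# made up; real embeddings are ~1,024 dimensions and come from the
# embedding model during ingestion.
CHUNKS = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Offices are closed on public holidays.":        [0.1, 0.8, 0.1],
    "Passwords must be rotated every 90 days.":      [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Similarity between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_vec, top_k=2):
    """Query path step: rank stored chunks by similarity to the query vector."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, query_vec):
    """The FM sees only the retrieved chunks, which grounds its answer."""
    context = "\n".join(retrieve(query_vec))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Only the top-k chunks reach the model, which is why the corpus can be arbitrarily large while the per-request prompt stays small.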

The "managed" version of this pattern is Amazon Bedrock Knowledge Bases, which handles chunking, embedding, and syncing automatically. You just point it at an S3 bucket and choose a vector store.
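
With Knowledge Bases, the whole query path collapses into one RetrieveAndGenerate call. A minimal boto3 sketch; the Knowledge Base ID and model ARN are placeholders you would replace with your own, and the boto3 import is kept inside the function so the payload builder works without the SDK installed:

```python
def build_request(question,
                  kb_id="KBEXAMPLE01",  # placeholder Knowledge Base ID
                  model_arn=("arn:aws:bedrock:us-east-1::foundation-model/"
                             "anthropic.claude-3-haiku-20240307-v1:0")):
    """Payload for RetrieveAndGenerate: one call does retrieve + generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def ask(question):
    """Send the request; citations come back alongside the grounded answer."""
    import boto3  # AWS SDK; imported here so build_request() needs no dependencies
    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve_and_generate(**build_request(question))
    answer = resp["output"]["text"]
    sources = [
        ref["location"].get("s3Location", {}).get("uri")
        for cite in resp.get("citations", [])
        for ref in cite.get("retrievedReferences", [])
    ]
    return answer, sources
```

The citations list in the response is what lets the app link each claim back to the source S3 object.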

AWS services used

  • Amazon S3: Source document storage (PDFs, HTML, TXT, Word docs). Trigger a re-sync on new uploads.
  • Bedrock Knowledge Bases: Managed orchestration that handles chunking, embedding, sync, and retrieval. This is the service that makes RAG "easy mode."
  • Amazon Titan Embeddings: Default embedding model; converts chunks into vectors. Cohere embeddings are also supported.
  • Vector store: OpenSearch Serverless (default) · Aurora pgvector · Pinecone · Redis · MongoDB Atlas · Neptune Analytics.
  • Bedrock FM: The model that generates the final answer from retrieved context (Claude, Llama, Titan Text, Nova).
  • API Gateway + Lambda: Standard entry point for user requests; often combined with Cognito for authentication.
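
The "re-sync on new uploads" step boils down to a single StartIngestionJob call, typically wired to an S3 event via Lambda. A sketch with placeholder IDs; the boto3 import is local so the request builder stays dependency-free:

```python
def sync_request(kb_id, data_source_id):
    """StartIngestionJob parameters; one job re-chunks and re-embeds new or changed docs."""
    return {"knowledgeBaseId": kb_id, "dataSourceId": data_source_id}

def resync(kb_id, data_source_id):
    """Kick off a sync after uploading new documents to the source bucket."""
    import boto3  # AWS SDK; imported locally so sync_request() has no dependencies
    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(**sync_request(kb_id, data_source_id))
    return job["ingestionJob"]["ingestionJobId"]
```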

When to use this pattern

Use Basic RAG when…

  • The knowledge changes frequently. Policy docs updated weekly, product catalogs, support KBs, internal wikis. RAG re-indexes automatically; fine-tuning would require retraining on every update.
  • You need citations back to source documents. Compliance, legal, medical, enterprise support — anywhere the user needs to verify claims. Knowledge Bases returns citations natively.
  • You need to reduce hallucinations for factual questions. Grounding in retrieved content dramatically cuts fabrication. This is the primary technique for factual accuracy.
  • The documents are too many or too large to fit in the context window. 100 MB of docs will never fit in any FM context; RAG retrieves only the 3-10 most relevant chunks per query.
  • You want the fastest time-to-production for a document-Q&A app. Bedrock Knowledge Bases can have a working RAG app running in under an hour, with minimal infrastructure.
  • You need per-document access control. Use metadata filtering at retrieval time to enforce which docs a user can see based on their identity.
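
The access-control bullet maps onto the Retrieve API's metadata filter. A sketch; the "department" metadata key is a made-up example (your documents' metadata files define the real keys), and the boto3 import is local so the config builder stays dependency-free:

```python
def retrieval_config(department, top_k=5):
    """Filter expression evaluated at retrieval time against each chunk's metadata."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            # "department" is an illustrative metadata key, not a built-in.
            "filter": {"equals": {"key": "department", "value": department}},
        }
    }

def retrieve_for_user(question, kb_id, department):
    """Only chunks whose metadata matches the caller's department come back."""
    import boto3  # AWS SDK; imported locally
    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration=retrieval_config(department),
    )
    return [r["content"]["text"] for r in resp["retrievalResults"]]
```

In practice the department value would come from the user's identity (e.g. a Cognito claim), not from client input.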

Do NOT use Basic RAG when…

  • You need to change the model's tone, style, or writing voice. That's a fine-tuning job, not a retrieval job. RAG provides knowledge, not voice. If the prompt alone can't get the tone right, fine-tune.
  • Your data is small enough to fit in the context window. If your total knowledge is under ~50 pages, just put it in the system prompt. You don't need the complexity of RAG.
  • Your queries need exact-term matching (product SKUs, function names, IDs). Pure semantic search misses exact matches. Use Pattern 2: Advanced RAG with hybrid search.
  • The task is transactional, not informational. "Look up my order and cancel it" isn't RAG — it's an agent calling tools. Use Pattern 3: Agent.
  • The data is structured (tables, relational records). Text-to-SQL over your database is more accurate than embedding-based retrieval for structured queries. RAG is for unstructured text.
  • Retrieval quality is poor even after tuning. If top-k chunks are irrelevant despite good chunking and embedding, upgrade to Advanced RAG with reranking and hybrid search.
  • Responses must come from highly creative generation, not grounding. Marketing copy, brainstorms, open-ended storytelling. RAG over-constrains creative tasks.

Exam angle

Pattern-match shortcuts: When a stem mentions "internal documents" + "answer questions" + "grounded," "citations," or "reduce hallucination" + "data that changes," you're looking at Basic RAG. Pick the option that references Bedrock Knowledge Bases.
The fine-tuning trap: AWS loves to offer fine-tuning as a distractor on RAG questions. Fine-tuning is almost never the right answer when the question says "frequently updated," "dynamic," or "current data." Fine-tuning bakes the data into the model — you'd have to retrain on every update, which is neither cost-effective nor practical.
The "just put it in the prompt" trap: A common distractor is "put the documents in the system prompt." This fails when docs are large (context overflow) or numerous (token cost per request). A Knowledge Base retrieves only what's needed per query.
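
The arithmetic behind that trap, with illustrative assumptions (roughly 4 characters per token, 500-token chunks, top-5 retrieval):

```python
CHARS_PER_TOKEN = 4  # rough average for English text; an assumption, not a spec

def full_corpus_tokens(corpus_chars):
    """Tokens needed to stuff the entire corpus into every single prompt."""
    return corpus_chars // CHARS_PER_TOKEN

def rag_tokens(top_k=5, chunk_tokens=500):
    """Tokens of retrieved context sent per query with RAG."""
    return top_k * chunk_tokens

# A 100 MB corpus (~100,000,000 characters) needs ~25,000,000 tokens --
# far beyond any FM context window -- while RAG sends a few thousand.
```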

Keywords that point here

internal documentation · knowledge base · citations · reduce hallucination · grounded responses · dynamic knowledge · frequently updated · question answering · S3-sourced docs

Related patterns

If retrieval quality isn't good enough, upgrade to Pattern 2: Advanced RAG (Hybrid + Rerank).
If the app needs to take actions (not just answer), see Pattern 3: Agentic Workflow.
For safety around the whole stack, see Pattern 10: Defense-in-Depth.