Architecture diagram

[Diagram: Advanced RAG · parallel retrieval + reranking. User query ("find docs on X") → query processor → in parallel: keyword search (BM25 / OpenSearch, exact term matching) and vector search (embed + k-NN, semantic similarity) → N keyword hits + N vector hits → score fusion → rerank (Bedrock Reranker, top-k) → generate (Bedrock FM, grounded answer + citations).]

How data flows

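The query fans out to two retrieval systems in parallel. The top path runs BM25 keyword matching, which is strong on exact terms such as SKUs, function names, and error codes. The bottom path embeds the query and runs k-NN vector search, which is strong on semantic meaning. The two result sets converge in a score-fusion step, typically reciprocal rank fusion (RRF). RRF merges the lists by rank rather than by raw score, because BM25 scores and vector similarities are not on a comparable scale. A reranker model then re-scores the top fused candidates. Rerankers are usually cross-encoders that read the query and each chunk together, which is slower but more precise than comparing precomputed vectors. Only the top-k reranked chunks go to the FM.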
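The fusion step is short enough to write out. The following is a minimal RRF sketch. It assumes each retriever returns a best-first list of chunk IDs and uses the commonly cited constant k = 60. The function name and chunk IDs are illustrative and are not part of any AWS API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first lists of doc IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Only ranks are used, so BM25 scores and cosine
    similarities never need to be put on a common scale.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hit lists from the keyword path and the vector path.
keyword_hits = ["sku-ab447-refunds", "sku-catalog", "returns-faq"]
vector_hits = ["refund-policy", "returns-faq", "sku-ab447-refunds"]

for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Here `sku-ab447-refunds` ends up first and `returns-faq` second, because both paths returned them. Chunks found by only one path fall below them. The reranker then re-scores this fused list.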

The payoff: you get keyword precision and semantic breadth, and the reranker catches cases where the best chunks weren't ranked first by either method alone.

AWS services used

  • Amazon OpenSearch: Hosts both the BM25 keyword index and the k-NN vector index. Supports true hybrid search natively (lexical + neural in one query).
  • Bedrock Reranker: Reranker models on Bedrock, such as Cohere Rerank, take the top N fused candidates and re-score each one for relevance to the query.
  • Amazon Titan Embeddings: Embeds the query into the same vector space as the indexed chunks, so the same model must be used at index time and at query time.
  • Bedrock FM: Generates the final answer from the top reranked chunks. Fewer, higher-quality chunks give better grounding.
  • Lambda / Bedrock KB: Orchestrates the parallel searches, fusion, and reranking. Bedrock Knowledge Bases can handle this natively when hybrid search and reranking are configured.
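On self-managed OpenSearch, "lexical + neural in one query" is typically set up in two parts. The first is a search pipeline with a `normalization-processor`. The second is a `hybrid` query that holds one lexical and one vector sub-query. The sketch below builds both request bodies as Python dicts. It assumes an index with a `text` field and a `knn_vector` field named `embedding`, plus a query vector already produced by Titan Embeddings. The pipeline, index, and field names are hypothetical, and the weights are placeholders to tune. Note that this processor combines normalized scores rather than ranks. Recent OpenSearch releases also offer RRF-style combination, so check the docs for the version you run.

```python
import json

# Hypothetical names for illustration; replace with your own.
PIPELINE_NAME = "hybrid-norm-pipeline"
TEXT_FIELD = "text"
VECTOR_FIELD = "embedding"


def hybrid_pipeline_body(keyword_weight=0.3, vector_weight=0.7):
    """Body for PUT /_search/pipeline/<PIPELINE_NAME>.

    The normalization processor min-max rescales each sub-query's scores so
    BM25 and k-NN scores become comparable, then combines them with a
    weighted arithmetic mean. Weights follow the order of the sub-queries.
    """
    return {
        "description": "Normalize and combine BM25 + k-NN scores",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [keyword_weight, vector_weight]},
                    },
                }
            }
        ],
    }


def hybrid_query_body(query_text, query_vector, k=20):
    """Body for GET /<index>/_search?search_pipeline=<PIPELINE_NAME>."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical path: BM25 exact-term matching.
                    {"match": {TEXT_FIELD: {"query": query_text}}},
                    # Semantic path: k-NN over a precomputed query embedding.
                    {"knn": {VECTOR_FIELD: {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(hybrid_pipeline_body(), indent=2))
    # A real Titan embedding has hundreds of dimensions; 3 keeps the demo short.
    print(json.dumps(hybrid_query_body("refund policy SKU AB-447", [0.1, 0.2, 0.3]), indent=2))
```

You would create the pipeline once and then reference it on every search through the `search_pipeline` query parameter. A Bedrock Knowledge Base with hybrid search enabled avoids managing this yourself.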

When to use this pattern

Use Advanced RAG when…

  • Basic RAG retrieval quality is not good enough. Users report missing relevant results, or irrelevant results crowding out good ones. Adding hybrid search + reranking typically lifts recall and precision together.
  • Queries mix exact terms and conceptual language. For "Refund policy for SKU AB-447", you need to match both "AB-447" (exact) and "refund policy" (semantic). Pure vector search can miss the SKU.
  • Your corpus contains codes, names, IDs, function signatures, or technical jargon. This covers technical docs, API references, legal documents, and medical records: anywhere exact terminology matters and semantics alone aren't enough.
  • You need higher precision before sending context to the FM. Fewer, higher-quality chunks mean less context to process, cheaper calls, and better-grounded answers.
  • You're okay with slightly higher latency per query. Hybrid search + reranking adds roughly 100 to 300 ms. That is acceptable for most chat apps and prohibitive only under very tight SLAs.
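The SKU example shows why the keyword path matters. BM25 rewards only exact token overlap, weighted by how rare each token is across the corpus. The toy scorer below is a sketch that uses the classic Okapi BM25 formula with a Lucene-style IDF and Lucene's default parameters (k1 = 1.2, b = 0.75). The tokenizer and the three-document corpus are made up for illustration. It shows the chunk containing the rare exact term `ab-447` ranking first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, and keep hyphenated codes like "AB-447" as a single token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every doc for the query, using Lucene-style IDF."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # No credit for near-misses; only exact tokens count.
            # Rare terms (low df) get a high IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long docs need more occurrences to score.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


docs = {
    "refund-general": "Our refund policy allows returns within 30 days of purchase.",
    "sku-ab447": "SKU AB-447 is final sale; the standard refund policy does not apply.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

ranked = sorted(bm25_scores("refund policy for SKU AB-447", docs).items(),
                key=lambda item: -item[1])
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

Both refund documents match "refund policy". The SKU chunk wins because `sku` and `ab-447` each appear in only one document, so they carry a much higher IDF weight. An embedding model has no comparable guarantee that an opaque code like this lands near the query.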

Do NOT use Advanced RAG when…

  • Basic RAG is already producing good results. Don't over-engineer. If Pattern 1 is working, hybrid search + reranking only adds complexity, cost, and latency.
  • Ultra-low latency is a hard requirement. In real-time voice assistants or autocomplete scenarios, the extra hops (fusion plus a reranker call) push you past your latency budget.
  • Your corpus is purely conversational or narrative text. Pure semantic search already covers natural-language FAQs, customer support transcripts, and marketing copy well.
  • You haven't yet tuned chunking or embeddings. If chunks are too large or the embedding model doesn't fit your domain, hybrid search + reranking won't save you. Fix the fundamentals first.
  • You need transactional operations, not retrieval. "Cancel my order" isn't a retrieval problem; use Pattern 3: Agent.

Exam angle

Pattern-match shortcuts: When a stem mentions "retrieval is returning irrelevant results", "missing exact matches on codes/IDs/names", or "improve retrieval precision", the answer is usually hybrid search + reranking. On AWS, this maps to OpenSearch hybrid search and Bedrock rerank models.
The "throw more chunks at it" trap: A common distractor is "increase top-k from 3 to 20." More chunks mean more noise, not better answers. The FM gets distracted by irrelevant content, the context fills up, and grounding degrades. Reranking fixes this by making a small top-k count: quality over quantity.
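With Bedrock Knowledge Bases, the right answer to that distractor is a configuration change. You retrieve a wide candidate set with hybrid search, let the reranker score it, and pass only a handful of chunks to the FM. The sketch below builds the keyword arguments for `boto3.client("bedrock-agent-runtime").retrieve(**kwargs)`. The field names (`overrideSearchType`, `rerankingConfiguration`, `numberOfRerankedResults`) follow our reading of the Retrieve API, so verify them against the current API reference. Hybrid search also depends on the vector store supporting it. The knowledge base ID and reranker model ARN are placeholders.

```python
import json

# Placeholders: substitute your own knowledge base ID and reranker model ARN.
KB_ID = "YOUR_KB_ID"
RERANK_MODEL_ARN = "arn:aws:bedrock:<region>::foundation-model/<reranker-model-id>"


def build_retrieve_kwargs(query, candidates=50, keep=5):
    """Keyword arguments for the bedrock-agent-runtime `retrieve` call.

    Retrieve `candidates` chunks with hybrid (keyword + vector) search,
    rerank them, and keep only the top `keep` for the FM prompt.
    """
    if not 0 < keep <= candidates:
        raise ValueError("keep must be between 1 and candidates")
    return {
        "knowledgeBaseId": KB_ID,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": candidates,   # wide net before reranking
                "overrideSearchType": "HYBRID",  # BM25 + vector, if the store supports it
                "rerankingConfiguration": {
                    "type": "BEDROCK_RERANKING_MODEL",
                    "bedrockRerankingConfiguration": {
                        "modelConfiguration": {"modelArn": RERANK_MODEL_ARN},
                        "numberOfRerankedResults": keep,  # small, high-quality top-k
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_retrieve_kwargs("refund policy for SKU AB-447"), indent=2))
    # With AWS credentials configured, the same dict would be sent as:
    #   boto3.client("bedrock-agent-runtime").retrieve(**build_retrieve_kwargs(...))
```

The design choice is the ratio between the two numbers. A large `candidates` protects recall, and a small `keep` protects precision and grounding.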

Keywords that point here

hybrid search · BM25 + vector · reranker · exact term matching · retrieval precision · irrelevant results · product SKUs · function signatures

Related patterns

Start with Pattern 1: Basic RAG — upgrade to this one only when retrieval quality falls short.
If the app needs to take actions, see Pattern 3: Agent.
For full security posture, see Pattern 10: Defense-in-Depth.