Pattern 2 · Advanced RAG (Hybrid + Rerank)

The query fans out to two parallel retrieval systems. The top path runs BM25 keyword matching, which is great for exact terms like SKUs, function names, and error codes. The bottom path embeds the query and runs k-NN vector search, which is great for semantic meaning. The two result sets converge in a score-fusion step (typically reciprocal rank fusion), then a reranker model re-scores the top candidates using deeper semantic analysis. Only the top-k reranked chunks go to the foundation model (FM).
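The fusion step can be sketched in a few lines of Python. This is a minimal illustration of reciprocal rank fusion, not any particular service's implementation: the document IDs are made up, and the smoothing constant k=60 is a commonly used default that is assumed here rather than taken from this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrieval paths.
bm25_hits = ["sku-ab-447-spec", "refund-policy", "shipping-faq"]
vector_hits = ["refund-policy", "returns-overview", "sku-ab-447-spec"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that appear in both lists accumulate two contributions, so they rise above documents that only one path found. RRF uses ranks only, which sidesteps the fact that BM25 scores and vector similarities are on different scales.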

The payoff: you get keyword precision and semantic breadth, and the reranker catches cases where the best chunks weren't ranked first by either method alone.
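To make the reranking payoff concrete, here is a toy sketch of the "re-score the shortlist, keep the top-k" step. The scoring function is a deliberately crude word-overlap stand-in, assumed purely for illustration; a real reranker model such as Cohere Rerank scores query/chunk pairs with a neural model. The function names, chunk texts, and cut-off values are all hypothetical.

```python
def toy_overlap_score(query, text):
    """Stand-in for a reranker model: fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(text.lower().split())
    return len(query_words & chunk_words) / len(query_words)


def rerank_top_k(query, candidates, score_fn, top_n=20, top_k=3):
    """Re-score the first top_n fused candidates and keep the best top_k."""
    shortlist = candidates[:top_n]
    scored = [(score_fn(query, text), text) for text in shortlist]
    # Python's sort is stable, so ties keep their fused order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


query = "refund policy for sku ab-447"
# Hypothetical fused order: the best chunk sits third, not first.
fused_candidates = [
    "general refund policy overview",
    "sku ab-447 technical spec",
    "refund policy exceptions for sku ab-447",
    "shipping times",
]

top_chunks = rerank_top_k(query, fused_candidates, toy_overlap_score, top_k=2)
print(top_chunks)
```

The chunk that neither retrieval method ranked first moves to the front once each candidate is scored against the full query, and only the trimmed list would be passed to the FM.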

AWS services used

Amazon OpenSearch Service: Hosts both the BM25 keyword index and the k-NN vector index. Supports hybrid search natively (neural + lexical in one query).

Bedrock Reranker: Amazon Bedrock offers reranker models such as Cohere Rerank. The reranker takes the top N fused candidates and re-scores them for relevance to the query.

Amazon Titan Embeddings: Embeds the query into the same vector space as the indexed chunks.

Bedrock FM: Generates the final answer from the top reranked chunks. Fewer, higher-quality chunks mean better grounding.

Lambda / Bedrock KB: Orchestrates the parallel search, fusion, and reranking. Bedrock Knowledge Bases can handle this natively when hybrid search and reranking are configured.
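As a rough sketch of how the services above might be wired, the helpers below build two request bodies: an OpenSearch `hybrid` query, and a `retrievalConfiguration` for the Bedrock Knowledge Bases Retrieve API with hybrid search and reranking switched on. The field names (`text`, `embedding`), the candidate counts, and the example model ARN are assumptions for illustration. Check the current OpenSearch and boto3 documentation for your versions and region before relying on the exact shapes.

```python
def build_hybrid_query(query_text, query_vector, k=10,
                       text_field="text", vector_field="embedding"):
    """Body for an OpenSearch `hybrid` query (lexical match + k-NN).

    Field names are hypothetical. The search is expected to run through a
    search pipeline with a score-combination processor so that the two
    sub-query scores are merged into one ranking.
    """
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {text_field: {"query": query_text}}},
                    {"knn": {vector_field: {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }


def build_kb_retrieval_config(reranker_model_arn, candidates=20, top_k=5):
    """retrievalConfiguration for a Knowledge Bases Retrieve call."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": candidates,      # pool handed to the reranker
            "overrideSearchType": "HYBRID",     # keyword + vector retrieval
            "rerankingConfiguration": {
                "type": "BEDROCK_RERANKING_MODEL",
                "bedrockRerankingConfiguration": {
                    "modelConfiguration": {"modelArn": reranker_model_arn},
                    "numberOfRerankedResults": top_k,
                },
            },
        }
    }


# Assumed example ARN; confirm the model ID and region for your account.
EXAMPLE_RERANKER_ARN = "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"

hybrid_body = build_hybrid_query("refund policy AB-447", [0.1, 0.2, 0.3])
kb_config = build_kb_retrieval_config(EXAMPLE_RERANKER_ARN)

# With credentials and a real knowledge base, the config would be passed as:
#   boto3.client("bedrock-agent-runtime").retrieve(
#       knowledgeBaseId="<your-kb-id>",
#       retrievalQuery={"text": "refund policy AB-447"},
#       retrievalConfiguration=kb_config,
#   )
```

Building the bodies as plain dictionaries keeps the sketch runnable without AWS credentials and makes the two tuning knobs explicit: how many candidates are retrieved, and how many survive reranking.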

When to use this pattern

✓ Use Advanced RAG when…

Basic RAG retrieval quality is not good enough: Users report missing relevant results, or irrelevant results crowding out good ones. Adding hybrid + rerank typically lifts recall and precision together.
Queries mix exact terms and conceptual language: For "Refund policy for SKU AB-447", you need to match both "AB-447" (exact) and "refund policy" (semantic). Pure vector search can miss the SKU.
Your corpus contains codes, names, IDs, function signatures, or technical jargon: Technical docs, API references, legal documents, medical records, or anywhere exact terminology matters and semantics alone aren't enough.
You need higher precision before sending to the FM: Fewer, higher-quality chunks mean less context to process, cheaper calls, and better-grounded answers.
You're okay with slightly higher latency per query: Hybrid + rerank adds roughly 100-300 ms. That is acceptable for most chat apps and prohibitive only for very tight SLAs.

✗ Do NOT use Advanced RAG when…

Basic RAG is already producing good results: Don't over-engineer. If Pattern 1 is working, the added complexity of hybrid + rerank is wasted cost and latency.
Ultra-low latency is a hard requirement: Think real-time voice assistants or autocomplete. The extra hops (fusion + reranker call) push you past your latency budget.
Your corpus is purely conversational / narrative text: Natural-language FAQs, customer support transcripts, and marketing copy are already covered well by pure semantic search.
You haven't yet tuned chunking or embedding: If chunks are too large or the embedding model doesn't fit your domain, hybrid + rerank won't save you. Fix the fundamentals first.
You need transactional operations, not retrieval: "Cancel my order" isn't a retrieval problem. Use Pattern 3: Agent.
