Task 1.1 — Analyze requirements and design GenAI solutions

Architectural design for GenAI

Align business needs with technical constraints. For every design question, weigh which model fits the use case (text / code / multimodal / reasoning), the latency tolerance, the cost ceiling, the context window needed, and the compliance/data residency requirements.

Proof-of-concept (PoC) implementations

Build PoCs in Amazon Bedrock before committing to full deployment — it lets you test multiple FMs without infrastructure setup. Validate performance characteristics, business value, and cost projections early.

Well-Architected Framework — Generative AI Lens

The AWS Well-Architected Tool includes a Generative AI Lens with standardized best practices across the six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, sustainability) specifically for FM-based applications. If a question says "standardized components" or "consistent implementation across deployments," the answer is usually the WA Tool GenAI Lens.

Exam angle: when questions describe a new GenAI project, the right answer almost always starts with a Bedrock PoC — not SageMaker training, not a fine-tune, not a self-hosted model. Bedrock first, specialize later.

Task 1.2 — Select and configure FMs

Model selection factors

Task fit

What the model is for
  • Summarization vs. code gen vs. reasoning vs. chat
  • Multimodal (images, audio)
  • Language support

Context window

How much can fit in one call
  • Long documents → need large context
  • RAG lets smaller context work
  • Larger = more expensive per call

Latency

Response time requirements
  • Interactive chat → streaming + fast model
  • Batch → latency tolerant
  • Latency-optimized Bedrock models

Cost per token

Volume economics
  • Small models for simple tasks
  • Large models only when needed
  • Model cascading pattern
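
A minimal sketch of the cascading pattern from the last bullet: route to a cheap model first and escalate only when it signals low confidence. The "ESCALATE" convention and model IDs are illustrative assumptions, not a Bedrock feature.

```python
# Cascading sketch: try the cheap model first; escalate only when it signals
# low confidence. The "ESCALATE" convention and model IDs are assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime")
SMALL = "anthropic.claude-3-haiku-20240307-v1:0"
LARGE = "anthropic.claude-3-sonnet-20240229-v1:0"

def ask(model_id, prompt):
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

def cascade(prompt):
    guarded = ("Answer the question. If you are not confident, "
               f"reply with only ESCALATE.\n\n{prompt}")
    answer = ask(SMALL, guarded)
    # Pay large-model prices only for the hard cases
    return ask(LARGE, prompt) if answer.strip() == "ESCALATE" else answer
```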

Dynamic model selection architecture

Build flexible architectures that allow model switching without code changes. The canonical pattern:

Step 1: request arrives (client → API Gateway)
Step 2: the Lambda router reads the active config
Step 3: AWS AppConfig feature flags decide which model
Step 4: invoke Bedrock with the selected model
Step 5: the response returns to the client

This enables A/B testing, gradual rollouts, and instant rollback to a previous model without redeploying Lambda.
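
A hedged sketch of steps 2–4, assuming the AppConfig Lambda extension is attached (it serves cached configuration over localhost:2772) and a flag document like {"model_id": "..."}; the application, environment, and profile names are invented.

```python
# Sketch of the Lambda router. Assumes the AppConfig Lambda extension is
# attached (it serves cached config over HTTP on localhost:2772) and a flag
# document like {"model_id": "..."}. App/env/profile names are invented.
import json
import urllib.request

import boto3

bedrock = boto3.client("bedrock-runtime")
APPCONFIG_URL = ("http://localhost:2772/applications/genai-app"
                 "/environments/prod/configurations/model-flags")

def handler(event, context):
    # Steps 2-3: read the currently active model from AppConfig
    with urllib.request.urlopen(APPCONFIG_URL) as resp:
        model_id = json.loads(resp.read())["model_id"]

    # Step 4: invoke whichever model the flag selects; switching models is a
    # config change, not a redeploy
    result = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": event["prompt"]}]}],
    )
    return {"model": model_id,
            "text": result["output"]["message"]["content"][0]["text"]}
```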

Resilient AI systems — surviving disruptions

Circuit breaker: Step Functions circuit breaker pattern — detect repeated failures on a primary model and automatically route to a fallback. Trips open after N failures, stays open for a cooldown, then half-opens to test recovery.
Cross-region: Bedrock Cross-Region Inference — automatically routes requests to models in other Regions when your primary Region is unavailable or at capacity. The preferred answer when a question mentions "limited regional availability" or "capacity constraints."
Degradation: graceful degradation — fall back to simpler models or cached responses when the primary model is down. Users get something rather than nothing.
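
A toy in-process version of the breaker, with assumed thresholds and model IDs; the exam-relevant implementation keeps this state in Step Functions, but the trip/cooldown/half-open mechanics are the same.

```python
# Toy circuit breaker with a fallback model. Thresholds and model IDs are
# assumptions; a production version keeps this state in Step Functions.
import time

import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client("bedrock-runtime")
PRIMARY = "anthropic.claude-3-sonnet-20240229-v1:0"
FALLBACK = "anthropic.claude-3-haiku-20240307-v1:0"
FAILURE_THRESHOLD, COOLDOWN_SECONDS = 3, 60
failures, opened_at = 0, 0.0

def _call(model_id, prompt):
    out = bedrock.converse(modelId=model_id,
                           messages=[{"role": "user", "content": [{"text": prompt}]}])
    return out["output"]["message"]["content"][0]["text"]

def invoke(prompt):
    global failures, opened_at
    breaker_open = (failures >= FAILURE_THRESHOLD
                    and time.time() - opened_at < COOLDOWN_SECONDS)
    if breaker_open:
        return _call(FALLBACK, prompt)  # open: skip the primary entirely
    try:
        text = _call(PRIMARY, prompt)   # closed or half-open: test the primary
        failures = 0                    # success closes the breaker
        return text
    except ClientError:
        failures += 1
        if failures >= FAILURE_THRESHOLD:
            opened_at = time.time()     # trip open and start the cooldown
        return _call(FALLBACK, prompt)  # degrade to the fallback for this call
```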

FM customization & lifecycle management

Fine-tuning: deploy domain-specific fine-tuned models via SageMaker AI. More expensive than prompt engineering; use only when prompting alone won't work.
LoRA: Low-Rank Adaptation — parameter-efficient fine-tuning that trains a small number of additional parameters instead of the full model. Much cheaper and faster; the preferred answer for "efficient fine-tuning."
Adapters: small trainable modules inserted into a pre-trained model. Similar in concept to LoRA; both are parameter-efficient adaptation techniques.
Model Registry: SageMaker Model Registry — version control for models. Track which version is deployed where, approve models for production, wire into CI/CD.
Deployment pipelines: automated CI/CD for model updates, including rollback strategies for failed deployments and lifecycle management policies to retire and replace models.
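
For intuition, a minimal LoRA sketch with the Hugging Face peft library (one common tooling choice, not an exam requirement); the base model and hyperparameters are illustrative.

```python
# Hypothetical LoRA setup with Hugging Face PEFT; r, alpha, and the target
# modules are typical starting values, not prescribed ones.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # attention projection to adapt (GPT-2 naming)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```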
Trap: if a question mentions "efficient fine-tuning" or "limited training data," the answer is LoRA / adapters, not full fine-tuning. If it mentions "adapt without any training," the answer is prompt engineering or RAG — not fine-tuning at all.

Task 1.3 — Data validation & processing pipelines for FM consumption

Data validation workflows

AWS Glue Data Quality: define data quality rules, run checks, track metrics. The preferred answer for declarative quality rules at scale.
SageMaker Data Wrangler: visual data preparation and transformation. Good for exploratory data prep.
Custom Lambda: bespoke validation logic that doesn't fit into Glue rules.
CloudWatch metrics: track data quality KPIs over time; alert on drift.
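
A sketch of a declarative ruleset in Glue Data Quality's DQDL, registered via boto3; the database, table, and rule values are invented for illustration.

```python
# A declarative ruleset in Glue Data Quality's DQDL, registered via boto3.
# Database, table, and rule values are invented for illustration.
import boto3

glue = boto3.client("glue")

ruleset = """
Rules = [
    IsComplete "document_id",
    Uniqueness "document_id" > 0.99,
    ColumnValues "language" in ["en", "es", "fr"],
    ColumnCount = 8
]
"""

glue.create_data_quality_ruleset(
    Name="fm-ingest-quality",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "genai_raw", "TableName": "documents"},
)
```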

Multimodal data processing

Input: raw source data (text · image · audio · tabular)
Extract: service-specific extraction (Textract · Transcribe · Rekognition)
Enhance: Comprehend and Lambda add entities and normalization
Format: JSON / conversation structure, Bedrock-ready
FM: multimodal Bedrock, text + image in one call
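
A hedged sketch of that final step: one Converse call carrying text plus an image. The file name, prompt, and model ID are examples.

```python
# Hypothetical multimodal call: one Converse request carrying both text and
# an image. The file name, prompt, and model ID are examples.
import boto3

bedrock = boto3.client("bedrock-runtime")

with open("invoice.png", "rb") as f:
    image_bytes = f.read()

resp = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"text": "Extract the total amount due from this invoice."},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }],
)
print(resp["output"]["message"]["content"][0]["text"])
```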

Input formatting for FM inference

  • JSON formatting — Bedrock API requests use strict JSON with model-specific keys (messages, system, max_tokens).
  • Conversation formatting — Dialog apps use alternating user/assistant message structure with an optional system message.
  • Structured preparation — SageMaker endpoints need input shaped to the container's expected format (often CSV, JSON Lines, or NumPy).
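
A minimal example of the strict JSON body described above, using the Anthropic message format on Bedrock; the keys shown (messages, system, max_tokens) belong to that model family, and other families differ.

```python
# The strict JSON body for an Anthropic-family model on Bedrock: messages,
# system, and max_tokens are that family's keys; other families differ.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "system": "You are a concise assistant.",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Summarize our Q1 support tickets."},
    ],
}
resp = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
print(json.loads(resp["body"].read())["content"][0]["text"])
```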

Data enhancement

  • Reformat messy text — use Bedrock itself to clean and restructure input before the actual inference call.
  • Amazon Comprehend — extract entities (people, places, orgs) from unstructured text.
  • Lambda normalization — dates, currencies, units to consistent formats.
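
A sketch combining the last two items, Comprehend entity extraction plus a Lambda-style date normalizer; the output shape and the accepted date formats are assumptions.

```python
# Entity extraction plus date normalization before inference. The output
# shape and the accepted date formats are illustrative assumptions.
from datetime import datetime

import boto3

comprehend = boto3.client("comprehend")

def extract_entities(text):
    resp = comprehend.detect_entities(Text=text, LanguageCode="en")
    return [{"type": e["Type"], "text": e["Text"], "score": e["Score"]}
            for e in resp["Entities"]]

def normalize_date(raw):
    # Lambda-style normalization: coerce common formats to ISO 8601
    for fmt in ("%m/%d/%Y", "%d %B %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return raw  # leave unknown formats untouched
```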

Task 1.4 — Design & implement vector store solutions

The core idea

A vector database is optimized for storing and querying high-dimensional vectors (embeddings). Instead of exact keyword matching, it finds items that are semantically similar. An embedding is a numerical representation of data — similar content produces similar vectors.
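
The "similar content produces similar vectors" idea in a few lines: cosine similarity between two toy vectors.

```python
# Cosine similarity between two toy "embeddings": near 1.0 means semantically
# similar, near 0 means unrelated. Real vectors have hundreds of dimensions.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([0.9, 0.1, 0.3], [0.8, 0.2, 0.4]))  # ≈ 0.98, very similar
```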

Vector store options on AWS

Bedrock Knowledge Bases

The default answer for RAG
  • Managed end-to-end RAG service
  • Handles chunking, embedding, storage, retrieval
  • Supports hierarchical organization
  • Pluggable backing store (OpenSearch Serverless, Aurora pgvector, Pinecone)

OpenSearch Service

Control & performance at scale
  • k-NN vector search with Neural plugin
  • Native Bedrock integration
  • Sharding for parallelism
  • Topic-based segmentation
  • Hybrid search (BM25 + vector)

Aurora (pgvector)

You already use Aurora
  • PostgreSQL + pgvector extension
  • SQL-based vector search
  • Good when mixing relational + vector
  • Familiar ops model
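
A hedged pgvector sketch of the SQL-based search in the list above (the <=> operator is pgvector's cosine-distance operator); the connection string, table, and column names are invented.

```python
# Hypothetical k-NN query with pgvector on Aurora PostgreSQL; <=> is the
# extension's cosine-distance operator. Connection, table, columns invented.
import psycopg2

query_embedding = [0.12, -0.53, 0.08]  # in practice, a Titan embedding of the query

conn = psycopg2.connect("dbname=app host=my-aurora-cluster user=app")
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT doc_id, title
        FROM documents
        ORDER BY embedding <=> %s::vector   -- nearest neighbors first
        LIMIT 5
        """,
        (str(query_embedding),),
    )
    rows = cur.fetchall()
```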

DynamoDB

Metadata + embeddings storage
  • Often paired with a vector DB
  • Stores metadata, document IDs
  • Real-time change detection via Streams

RDS + S3

Hybrid storage
  • Document repositories in S3
  • RDS for structured metadata
  • Pointers from RDS to S3 objects

Metadata frameworks — the unsung hero of retrieval precision

Good metadata narrows vector search results before semantic scoring. "Only documents from Q1 2025" is a metadata filter, not a vector filter.

  • S3 object metadata — document timestamps, source system, classification
  • Custom attributes — authorship, department, sensitivity
  • Tagging systems — domain classification for multi-tenant RAG
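
A sketch of metadata-filtered retrieval against a Knowledge Base, where the filter prunes candidates before semantic scoring; the KB ID and attribute keys are placeholders.

```python
# Metadata-filtered Knowledge Base retrieval: the filter narrows candidates
# before semantic scoring. KB ID and attribute keys are placeholders.
import boto3

agent_rt = boto3.client("bedrock-agent-runtime")

resp = agent_rt.retrieve(
    knowledgeBaseId="KBID12345",
    retrievalQuery={"text": "What changed in the refund policy?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {  # "only Q1 2025 policy docs" as metadata, not vectors
                "andAll": [
                    {"equals": {"key": "source_system", "value": "policy-portal"}},
                    {"equals": {"key": "quarter", "value": "2025-Q1"}},
                ]
            },
        }
    },
)
```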

High-performance vector architectures

Sharding: OpenSearch sharding strategies distribute the vector index across shards for parallel search. Larger corpora need more shards.
Multi-index: separate indexes for different domains (legal docs vs. technical docs); route queries to the right index.
Hierarchical: hierarchical indexing — coarse-to-fine search that narrows by category first, then runs semantic search within the narrowed set. Dramatically faster at scale.

Keeping vector stores current — data maintenance systems

Source: document updates in S3 / SharePoint / wiki
Detect: EventBridge + Lambda provide the real-time change trigger
Incremental: re-embed only the changes, not a full reindex
Sync: an automated workflow updates the vector store
Schedule: a periodic full refresh catches missed updates
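
A sketch of the detect-and-re-embed path: an EventBridge S3 "Object Created" event triggers a Lambda that re-embeds just the changed object. The upsert helper is hypothetical, standing in for your OpenSearch or pgvector write.

```python
# EventBridge-triggered Lambda that re-embeds only the changed S3 object.
# upsert_vector is hypothetical, standing in for an OpenSearch/pgvector write.
import json

import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # S3 "Object Created" events via EventBridge carry bucket and key in detail
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text[:8000]}),  # naive truncation for the sketch
    )
    embedding = json.loads(resp["body"].read())["embedding"]
    upsert_vector(doc_id=key, vector=embedding)

def upsert_vector(doc_id, vector):
    ...  # hypothetical: update the document's entry in the vector store
```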
Exam angle "Keep the knowledge base current" + "minimize cost" = incremental updates triggered by EventBridge, not scheduled full reindexes. A full reindex is the wrong answer for most update scenarios.

Task 1.5 — Design retrieval mechanisms for FM augmentation (RAG)

The canonical RAG pipeline

1 · Ingest: documents land in S3, the source of truth
2 · Chunk: split docs (fixed / hierarchical / semantic)
3 · Embed: Titan Embeddings turns chunks into vectors
4 · Store: vector DB (OpenSearch / Aurora / Knowledge Bases)
5 · Retrieve: embed the user query, fetch the top-k similar chunks
6 · Generate: Bedrock FM answers from chunks + query
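
Steps 5 and 6 collapse into one managed call when the store is a Knowledge Base; a sketch with placeholder IDs.

```python
# Retrieve + generate in one managed call against a Knowledge Base.
# The KB ID and model ARN are placeholders.
import boto3

agent_rt = boto3.client("bedrock-agent-runtime")

resp = agent_rt.retrieve_and_generate(
    input={"text": "How do I rotate the API keys?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID12345",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(resp["output"]["text"])  # grounded answer; citations ride along in the response
```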

Chunking strategies — side-by-side

Fixed-size

e.g., 500 tokens per chunk
  • ✅ Simple, predictable
  • ✅ Uniform vector dimensions
  • ❌ Breaks mid-sentence
  • ❌ Splits semantic units

Hierarchical

By headings / sections / paragraphs
  • ✅ Preserves structure
  • ✅ Respects document hierarchy
  • ✅ Good for technical docs
  • ❌ Variable chunk sizes

Semantic

By topic boundary detection
  • ✅ Preserves meaning
  • ✅ Best retrieval quality
  • ❌ More expensive to compute
  • ❌ Harder to tune

Bedrock managed

Default / fixed / hierarchical / semantic
  • ✅ All strategies built-in
  • ✅ No custom code
  • ✅ Default is fixed-size
  • ✅ Best for most cases
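
For concreteness, a minimal fixed-size chunker with overlap; it approximates tokens with words, where a real pipeline would use the embedding model's tokenizer.

```python
# Fixed-size chunking with overlap, approximating tokens with words; a real
# pipeline would count tokens with the embedding model's tokenizer.
def chunk_fixed(text, size=500, overlap=50):
    words = text.split()
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window reached the end of the document
    return chunks
```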

Embedding solutions

Amazon Titan Embeddings: AWS native. Evaluate on dimensionality and domain fit. The default Bedrock embedding choice.
Bedrock embedding models: multiple options available. Compare speed, accuracy, and dimensionality.
Batch embedding: Lambda batches documents for embedding — more efficient than one-at-a-time calls.
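
A sketch of a single Titan Text Embeddings V2 call; dimensions and normalize are optional request fields for this model, and the input text is an example.

```python
# One Titan Text Embeddings V2 call; dimensions and normalize are optional
# request fields for this model. The input text is an example.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "How do I reset my password?",
                     "dimensions": 512, "normalize": True}),
)
vector = json.loads(resp["body"].read())["embedding"]  # list of 512 floats
```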

Advanced search architectures

Hybrid search — the "better than semantic alone" pattern

Query: the raw user question
Parallel: BM25 keyword search (exact term match) and vector search (semantic similarity) run side by side
Fuse: score fusion combines and normalizes both result lists
Rerank: a Bedrock reranker re-scores for relevance
FM: generate the answer from the top reranked chunks + the query
When hybrid beats pure semantic: short queries · exact terminology (SKUs, error codes, function names) · domain-specific jargon · any case where keyword match matters. Pure semantic is best for long, conceptual questions.
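
One common way to implement the fusion step is reciprocal rank fusion (RRF), shown here as a sketch; k = 60 is the conventional constant and the doc IDs are toy values.

```python
# Reciprocal rank fusion over the two ranked result lists; k = 60 is the
# conventional constant. Doc IDs are toy values.
def rrf(keyword_hits, vector_hits, k=60):
    scores = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Agreement between branches boosts a document above either list alone
print(rrf(["d3", "d1", "d7"], ["d1", "d9", "d3"]))  # ['d1', 'd3', 'd9', 'd7']
```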

Query handling systems

Query expansion

Add synonyms / related terms
  • Use Bedrock to enrich the query
  • Catches near-miss matches
  • Good for sparse corpora

Query decomposition

Break complex into sub-queries
  • Lambda splits multi-part questions
  • Retrieve for each sub-query
  • Combine context before final FM call

Query transformation

Multi-step refinement
  • Step Functions orchestrates
  • Rewrite → expand → decompose
  • Advanced RAG pattern
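
A sketch of the decomposition step, using a fast model to split the question; the prompt wording is illustrative and production code would validate the JSON before trusting it.

```python
# Query decomposition with a fast model; the prompt wording is illustrative,
# and production code would validate the JSON before trusting it.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def decompose(question):
    prompt = ("Split this question into independent sub-questions. "
              f"Return only a JSON array of strings.\n\n{question}")
    resp = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return json.loads(resp["output"]["message"]["content"][0]["text"])

# Retrieve for each sub-query, concatenate the chunks, then make the final FM call
subs = decompose("Compare our 2023 and 2024 refund rates and explain the delta.")
```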

Consistent access mechanisms

  • Function calling — FM calls a well-defined function to perform vector search.
  • MCP (Model Context Protocol) clients — standardized protocol for tools and data; agents consume MCP servers exposing vector queries. You already know this from Claude Code.
  • Standardized API patterns — consistent interfaces for retrieval augmentation regardless of the backend (KB, OpenSearch, Aurora).
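
A hedged sketch of function calling via the Converse API: the model is offered a search tool, and your code executes any toolUse block it emits. The tool name and schema are invented.

```python
# Function calling via Converse: the model is offered a search tool; when it
# answers with a toolUse block, your code runs the actual vector search.
# The tool name and schema are invented for illustration.
import boto3

bedrock = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "search_documents",
            "description": "Semantic search over the internal knowledge base.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

resp = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Find our SSO setup guide."}]}],
    toolConfig=tool_config,
)
for block in resp["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["name"], block["toolUse"]["input"])  # run the search here
```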

Task 1.6 — Prompt engineering strategies & governance

Model instruction frameworks

System prompts / role definitions: Bedrock Prompt Management enforces role definitions and behavioral constraints. Template configs format responses consistently.
Bedrock Guardrails: configurable filters on inputs and outputs — topic denial, content filtering, PII redaction, word blocking.
Template configurations: parameterized templates with placeholders — populate at runtime, audit the rendered prompt.
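
A sketch of attaching a guardrail to an inference call; the guardrail identifier and version are placeholders for a guardrail you have already created.

```python
# Attaching a guardrail to a Converse call; identifier and version are
# placeholders for a guardrail created beforehand.
import boto3

bedrock = boto3.client("bedrock-runtime")
resp = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize this customer email."}]}],
    guardrailConfig={"guardrailIdentifier": "gr-abc123", "guardrailVersion": "1"},
)
print(resp["stopReason"])  # "guardrail_intervened" when a filter blocks content
```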

Interactive AI systems

Input: user message arrives via API Gateway
Intent: Comprehend classifies intent
Clarify: Step Functions asks a follow-up if the intent is unclear
History: DynamoDB stores the conversation
FM: Bedrock call with history + context
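
A sketch of the history step, replaying prior turns from DynamoDB into the Bedrock call; the table name and item schema are assumptions.

```python
# Conversation history in DynamoDB replayed into the Bedrock call; the table
# name and item schema are assumptions.
import boto3

table = boto3.resource("dynamodb").Table("conversations")
bedrock = boto3.client("bedrock-runtime")

def chat(session_id, user_text):
    item = table.get_item(Key={"session_id": session_id}).get("Item", {})
    history = item.get("messages", [])
    messages = history + [{"role": "user", "content": [{"text": user_text}]}]
    resp = bedrock.converse(modelId="anthropic.claude-3-haiku-20240307-v1:0",
                            messages=messages)
    reply = resp["output"]["message"]["content"][0]["text"]
    # Persist the new turn so the next call sees the full dialog
    table.put_item(Item={"session_id": session_id,
                         "messages": messages + [
                             {"role": "assistant", "content": [{"text": reply}]}]})
    return reply
```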

Prompt management & governance — the audit story

Templates: Bedrock Prompt Management — parameterized templates with approval workflows for prompt changes. Prompts become code, not strings.
Repository: S3 for template storage with versioning.
Audit: CloudTrail tracks who changed which prompt when. Answers the compliance question "who authorized this prompt change?"
Observability: CloudWatch Logs capture prompt usage and performance. Feeds regression detection.

Prompt QA systems

  • Lambda verification — verify expected output format/content after each inference.
  • Step Functions test orchestration — systematically test edge cases across prompt versions.
  • CloudWatch regression detection — catch performance degradation over time on a golden dataset.
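
A minimal Lambda-style verifier for the first item: confirm the model returned valid JSON with the fields the prompt demanded. The required-field contract is illustrative.

```python
# Post-inference format check, Lambda-style: require valid JSON with the
# fields the prompt demanded. The required-field contract is illustrative.
import json

REQUIRED_FIELDS = {"summary", "sentiment", "ticket_id"}

def verify(model_output):
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"model did not return JSON: {err}")
    missing = REQUIRED_FIELDS - parsed.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return parsed
```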

Advanced prompt engineering techniques

Chain-of-thought

"Think step by step"
  • Instruct reasoning before answering
  • Better for math, logic, multi-step
  • Slower and more tokens

Structured input

XML tags, JSON, delimiters
  • <context>, <question>
  • Clear separation of parts
  • Improves model following

Output specifications

Tell model exactly what to return
  • JSON schema
  • Response shape constraints
  • Reduces parsing errors

Feedback loops

Iterate based on output
  • Grade first output
  • Refine prompt or retry
  • Self-correcting systems
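
An illustrative template combining the techniques above: XML-delimited input sections, a chain-of-thought instruction, and an explicit output specification.

```python
# An illustrative prompt combining the techniques: XML-delimited sections,
# a chain-of-thought instruction, and an explicit output specification.
PROMPT_TEMPLATE = """You are a support analyst.

<context>
{context}
</context>

<question>
{question}
</question>

Think step by step inside <thinking> tags, then return only a JSON object:
{{"answer": "<string>", "confidence": "high|medium|low"}}"""

prompt = PROMPT_TEMPLATE.format(context="...retrieved chunks...",
                                question="Why did latency spike on May 3?")
```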

Complex prompt systems with Bedrock Prompt Flows

Bedrock Prompt Flows is the visual workflow builder for sequential prompt chains. It handles:

  • Sequential chains — output of prompt A feeds prompt B
  • Conditional branching — route to different prompts based on model responses
  • Reusable components — modular prompt pieces composed across flows
  • Integrated pre/post-processing — transform input before prompting, output after
Exam angle: when a question describes a multi-step prompt workflow with branching logic and pre/post processing, the answer is Bedrock Prompt Flows. When it's just "store and version prompts," the answer is Bedrock Prompt Management. Know that the two products are distinct.

Domain 1 summary — what to remember

The service map

  • Core: Amazon Bedrock (everything starts here)
  • RAG: Bedrock Knowledge Bases (managed RAG)
  • Vector: OpenSearch / Aurora pgvector / DynamoDB
  • Embeddings: Amazon Titan Embeddings
  • Prompts: Prompt Management + Prompt Flows
  • Resilience: Cross-Region Inference + Step Functions
  • Customization: SageMaker AI + Model Registry + LoRA
  • Config: AppConfig for dynamic model selection

The mental shortcuts

  • New project? Bedrock PoC first.
  • Efficient adaptation? LoRA / adapters.
  • Short / keyword-heavy queries? Hybrid search.
  • Keep KB current? EventBridge + incremental.
  • Multi-step prompts with branching? Prompt Flows.
  • Version & audit prompts? Prompt Management.
  • Model switch without redeploy? AppConfig.
  • Regional capacity issues? Cross-Region Inference.
Next up: Domain 1 sets the foundation. Domain 2 (26%) builds on it with agents, deployment, and integration patterns. Head to Domain 2 when ready — or jump to Architecture Patterns to see the top 10 exam patterns with full diagrams.