Task 1.1 — Analyze requirements and design GenAI solutions
Architectural design for GenAI
Align business needs with technical constraints. For every design question, weigh: model fit for the use case (text / code / multimodal / reasoning), latency tolerance, cost ceiling, context window needs, and compliance/data residency requirements.
Proof-of-concept (PoC) implementations
Build PoCs in Amazon Bedrock before committing to full deployment — it lets you test multiple FMs without infrastructure setup. Validate performance characteristics, business value, and cost projections early.
Well-Architected Framework — Generative AI Lens
AWS WA Tool includes a Generative AI Lens with standardized best practices across the six WAF pillars (operational excellence, security, reliability, performance efficiency, cost optimization, sustainability) specifically for FM-based applications. If a question says "standardized components" or "consistent implementation across deployments," the answer is usually the WA Tool GenAI Lens.
Task 1.2 — Select and configure FMs
Model selection factors
Task fit
- Summarization vs. code gen vs. reasoning vs. chat
- Multimodal (images, audio)
- Language support
Context window
- Long documents → need large context
- RAG lets smaller context work
- Larger = more expensive per call
Latency
- Interactive chat → streaming + fast model
- Batch → latency tolerant
- Latency-optimized Bedrock models
Cost per token
- Small models for simple tasks
- Large models only when needed
- Model cascading pattern (sketch below)
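A minimal cascading sketch: try the cheap model first and escalate only when it signals low confidence. The `UNSURE` sentinel is a naive placeholder for a real grader model or heuristic; model IDs are real Bedrock IDs.

```python
# Model cascading: small model handles simple queries, large model is the
# fallback. The UNSURE sentinel is an illustrative confidence check only.
import boto3

bedrock = boto3.client("bedrock-runtime")

SMALL = "anthropic.claude-3-haiku-20240307-v1:0"
LARGE = "anthropic.claude-3-sonnet-20240229-v1:0"

def ask(model_id: str, prompt: str) -> str:
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

def cascade(prompt: str) -> str:
    answer = ask(SMALL, prompt + "\nIf you are not sure, reply exactly: UNSURE")
    return ask(LARGE, prompt) if "UNSURE" in answer else answer
```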
Dynamic model selection architecture
Build flexible architectures that allow model switching without code changes. The canonical pattern: store the active model ID in AWS AppConfig and have Lambda fetch it at runtime, so swapping models is a config deployment rather than a code change.
This enables A/B testing, gradual rollouts, and instant rollback to a previous model without redeploying Lambda.
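A minimal Lambda-side sketch, assuming the model ID lives in an AppConfig profile as JSON like `{"modelId": "..."}`; the application/environment/profile names are placeholders.

```python
# Sketch: Lambda reads the active model ID from AppConfig at runtime, so a
# model switch is a config deployment, not a code deployment.
import json
import boto3

appconfig = boto3.client("appconfigdata")
bedrock = boto3.client("bedrock-runtime")

# Start a configuration session once, during Lambda init
token = appconfig.start_configuration_session(
    ApplicationIdentifier="genai-app",          # placeholder names
    EnvironmentIdentifier="prod",
    ConfigurationProfileIdentifier="model-config",
)["InitialConfigurationToken"]

model_id = "anthropic.claude-3-haiku-20240307-v1:0"  # fallback default

def handler(event, context):
    global token, model_id
    resp = appconfig.get_latest_configuration(ConfigurationToken=token)
    token = resp["NextPollConfigurationToken"]
    payload = resp["Configuration"].read()
    if payload:  # empty when the config hasn't changed since the last poll
        model_id = json.loads(payload)["modelId"]

    result = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": event["prompt"]}]}],
    )
    return result["output"]["message"]["content"][0]["text"]
```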
Resilient AI systems — surviving disruptions
Cross-Region Inference absorbs regional capacity constraints by routing requests across Regions; Step Functions wraps inference calls with retries, timeouts, and fallback paths.
FM customization & lifecycle management
SageMaker AI with Model Registry covers versioning and approval of customized models; LoRA and other adapter methods are the efficient adaptation route.
Task 1.3 — Data validation & processing pipelines for FM consumption
Data validation workflows
Multimodal data processing
Input formatting for FM inference
- JSON formatting — Bedrock API requests use strict JSON with model-specific keys (`messages`, `system`, `max_tokens`).
- Conversation formatting — dialog apps use an alternating `user`/`assistant` message structure with an optional `system` message.
- Structured preparation — SageMaker endpoints need input shaped to the container's expected format (often CSV, JSON Lines, or NumPy).
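For example, an Anthropic Messages request on Bedrock (the prompt is illustrative; other providers expect different body keys):

```python
# invoke_model with the Anthropic Messages schema on Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "You are a concise assistant.",
    "messages": [
        {"role": "user", "content": "Summarize the attached report in 3 bullets."},
    ],
}

resp = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
print(json.loads(resp["body"].read())["content"][0]["text"])
```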
Data enhancement
- Reformat messy text — use Bedrock itself to clean and restructure input before the actual inference call.
- Amazon Comprehend — extract entities (people, places, orgs) from unstructured text.
- Lambda normalization — dates, currencies, units to consistent formats.
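A minimal Comprehend call for the entity-extraction step (sample text is illustrative):

```python
# Extract entities from unstructured text, e.g. to tag chunks before embedding.
import boto3

comprehend = boto3.client("comprehend")

resp = comprehend.detect_entities(
    Text="Amazon opened its Arlington HQ2 offices in June 2023.",
    LanguageCode="en",
)
for entity in resp["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```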
Task 1.4 — Design & implement vector store solutions
The core idea
A vector database is optimized for storing and querying high-dimensional vectors (embeddings). Instead of exact keyword matching, it finds items that are semantically similar. An embedding is a numerical representation of data — similar content produces similar vectors.
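A quick sketch of the idea with Titan Text Embeddings V2: embed two paraphrases and compare cosine similarity (plain NumPy math, real model ID):

```python
# "Similar content produces similar vectors" in practice.
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

a = embed("How do I reset my password?")
b = embed("Steps to recover a lost password")
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # close to 1.0
```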
Vector store options on AWS
Bedrock Knowledge Bases
- Managed end-to-end RAG service
- Handles chunking, embedding, storage, retrieval
- Supports hierarchical organization
- Pluggable backing store (OpenSearch Serverless, Aurora pgvector, Pinecone)
OpenSearch Service
- k-NN vector search via the k-NN and Neural Search plugins
- Native Bedrock integration
- Sharding for parallelism
- Topic-based segmentation
- Hybrid search (BM25 + vector)
Aurora (pgvector)
- PostgreSQL + pgvector extension
- SQL-based vector search (sketch after these options)
- Good when mixing relational + vector
- Familiar ops model
DynamoDB
- Often paired with a vector DB
- Stores metadata, document IDs
- Real-time change detection via Streams
RDS + S3
- Document repositories in S3
- RDS for structured metadata
- Pointers from RDS to S3 objects
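A pgvector lookup sketch, assuming a `document_chunks` table with an `embedding vector(1024)` column; all names, credentials, and the zero query vector are placeholders.

```python
# Similarity query on Aurora PostgreSQL with the pgvector extension enabled.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
    dbname="docs", user="app", password="***",
)

query_embedding = str([0.0] * 1024)  # normally comes from a Titan embedding call

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT doc_id, title, embedding <=> %s::vector AS distance  -- <=> is cosine distance
        FROM document_chunks
        ORDER BY distance
        LIMIT 5
        """,
        (query_embedding,),
    )
    for doc_id, title, distance in cur.fetchall():
        print(doc_id, title, round(distance, 4))
```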
Metadata frameworks — the unsung hero of retrieval precision
Good metadata narrows vector search results before semantic scoring. "Only documents from Q1 2025" is a metadata filter, not a vector filter.
- S3 object metadata — document timestamps, source system, classification
- Custom attributes — authorship, department, sensitivity
- Tagging systems — domain classification for multi-tenant RAG
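Sketch of a metadata-filtered Knowledge Base retrieval; the `quarter` attribute and the KB ID are assumed:

```python
# The metadata filter narrows candidates BEFORE semantic scoring.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

resp = agent_runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",  # placeholder
    retrievalQuery={"text": "revenue drivers"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {"equals": {"key": "quarter", "value": "Q1-2025"}},
        }
    },
)
for r in resp["retrievalResults"]:
    print(r.get("metadata", {}), r["content"]["text"][:60])
```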
High-performance vector architectures
Keeping vector stores current — data maintenance systems
Task 1.5 — Design retrieval mechanisms for FM augmentation (RAG)
The canonical RAG pipeline
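Ingest side: chunk → embed → store. Query side: embed the query → retrieve top-k chunks → inject them into the prompt → generate. With Bedrock Knowledge Bases the whole query side is one call; a sketch with placeholder ID and ARN:

```python
# One-call RAG: retrieval, prompt augmentation, and generation handled
# by the service.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

resp = agent_runtime.retrieve_and_generate(
    input={"text": "What changed in the Q1 security policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(resp["output"]["text"])
```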
Chunking strategies — side-by-side
Fixed-size
- ✅ Simple, predictable
- ✅ Uniform vector dimensions
- ❌ Breaks mid-sentence
- ❌ Splits semantic units
Hierarchical
- ✅ Preserves structure
- ✅ Respects document hierarchy
- ✅ Good for technical docs
- ❌ Variable chunk sizes
Semantic
- ✅ Preserves meaning
- ✅ Best retrieval quality
- ❌ More expensive to compute
- ❌ Harder to tune
Bedrock managed
- ✅ All strategies built-in
- ✅ No custom code
- ✅ Default is fixed-size
- ✅ Best for most cases
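For reference, a sketch of the fixed-size chunking config passed when creating a Knowledge Base data source; the IDs and bucket ARN are placeholders.

```python
# vectorIngestionConfiguration controls the managed chunking strategy.
import boto3

bedrock_agent = boto3.client("bedrock-agent")

bedrock_agent.create_data_source(
    knowledgeBaseId="KB123EXAMPLE",
    name="policies-s3",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-docs-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",  # also: HIERARCHICAL, SEMANTIC, NONE
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,          # service default
                "overlapPercentage": 20,
            },
        }
    },
)
```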
Embedding solutions
Advanced search architectures
Hybrid search — the "better than semantic alone" pattern
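Hybrid search combines BM25 keyword scoring with vector similarity, which helps on short or keyword-heavy queries. With a Knowledge Base backed by OpenSearch you can request it per query; a sketch with a placeholder KB ID:

```python
# overrideSearchType asks the backing store to blend lexical and vector scores.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

resp = agent_runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": "error 0x80070005 access denied fix"},  # keyword-heavy
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "overrideSearchType": "HYBRID",  # alternative: "SEMANTIC"
        }
    },
)
for r in resp["retrievalResults"]:
    print(round(r["score"], 3), r["content"]["text"][:80])
```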
Query handling systems
Query expansion
- Use Bedrock to enrich the query (sketch after this section)
- Catches near-miss matches
- Good for sparse corpora
Query decomposition
- Lambda splits multi-part questions
- Retrieve for each sub-query
- Combine context before final FM call
Query transformation
- Step Functions orchestrates
- Rewrite → expand → decompose
- Advanced RAG pattern
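A query-expansion sketch using a small, fast model before retrieval; the rewrite prompt is illustrative:

```python
# Ask a cheap model to rewrite the user query into variants, retrieve for
# each variant, then dedupe/merge chunks before the final FM call.
import boto3

bedrock = boto3.client("bedrock-runtime")

def expand_query(query: str) -> list[str]:
    resp = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": "Rewrite this search query three different "
                                 f"ways, one per line, no numbering:\n{query}"}],
        }],
    )
    text = resp["output"]["message"]["content"][0]["text"]
    return [query] + [line.strip() for line in text.splitlines() if line.strip()]

print(expand_query("reset MFA device"))
```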
Consistent access mechanisms
- Function calling — the FM emits a structured call to a well-defined function (e.g., vector search) that your application executes.
- MCP (Model Context Protocol) clients — standardized protocol for tools and data; agents consume MCP servers exposing vector queries. You already know this from Claude Code.
- Standardized API patterns — consistent interfaces for retrieval augmentation regardless of the backend (KB, OpenSearch, Aurora).
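A function-calling sketch via the Converse API `toolConfig`; the `search_documents` tool name and schema are assumptions for illustration:

```python
# The model decides when to call the tool; your code runs the vector search
# and sends the result back in a follow-up message.
import boto3

bedrock = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "search_documents",  # hypothetical tool
            "description": "Semantic search over the policy vector store.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

resp = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is our travel policy?"}]}],
    toolConfig=tool_config,
)
print(resp["stopReason"])  # "tool_use" means: run the search, return the result
```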
Task 1.6 — Prompt engineering strategies & governance
Model instruction frameworks
Interactive AI systems
Prompt management & governance — the audit story
Prompt QA systems
- Lambda verification — verify expected output format/content after each inference.
- Step Functions test orchestration — systematically test edge cases across prompt versions.
- CloudWatch regression detection — catch performance degradation over time on a golden dataset.
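A Lambda-style verification sketch; the required keys are an assumed output contract:

```python
# Verify the model returned parseable JSON with the expected keys before
# passing it downstream; raise so the caller can retry or re-prompt.
import json

REQUIRED_KEYS = {"answer", "confidence"}  # assumed contract

def verify_output(raw: str) -> dict:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("Model output is not valid JSON; retry or re-prompt")
    missing = REQUIRED_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"Model output missing keys: {missing}")
    return parsed
```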
Advanced prompt engineering techniques
Chain-of-thought
- Instruct reasoning before answering
- Better for math, logic, multi-step
- Slower and more tokens
Structured input
- XML-style tags: `<context>`, `<question>`
- Clear separation of parts
- Improves instruction following (combined sketch after these blocks)
Output specifications
- JSON schema
- Response shape constraints
- Reduces parsing errors
Feedback loops
- Grade first output
- Refine prompt or retry
- Self-correcting systems
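A combined sketch of the techniques above: structured input tags, a chain-of-thought instruction, and a JSON output spec. The tag names and schema are conventions, not an API requirement.

```python
# Structured prompt template with an explicit output contract.
import boto3

bedrock = boto3.client("bedrock-runtime")

prompt = """<context>
{context}
</context>

<question>
{question}
</question>

Think step by step inside <thinking> tags, then answer ONLY with JSON:
{{"answer": "<string>", "confidence": "high|medium|low"}}"""

resp = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt.format(
        context="Refund window is 30 days.",
        question="Can I return an item after 6 weeks?",
    )}]}],
)
print(resp["output"]["message"]["content"][0]["text"])
```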
Complex prompt systems with Bedrock Prompt Flows
Bedrock Prompt Flows is the visual workflow builder for sequential prompt chains. It handles:
- Sequential chains — output of prompt A feeds prompt B
- Conditional branching — route to different prompts based on model responses
- Reusable components — modular prompt pieces composed across flows
- Integrated pre/post-processing — transform input before prompting, output after
Domain 1 summary — what to remember
The service map
- Core: Amazon Bedrock (everything starts here)
- RAG: Bedrock Knowledge Bases (managed RAG)
- Vector: OpenSearch / Aurora pgvector / DynamoDB
- Embeddings: Amazon Titan Embeddings
- Prompts: Prompt Management + Prompt Flows
- Resilience: Cross-Region Inference + Step Functions
- Customization: SageMaker AI + Model Registry + LoRA
- Config: AppConfig for dynamic model selection
The mental shortcuts
- New project? Bedrock PoC first.
- Efficient adaptation? LoRA / adapters.
- Short / keyword-heavy queries? Hybrid search.
- Keep KB current? EventBridge + incremental.
- Multi-step prompts with branching? Prompt Flows.
- Version & audit prompts? Prompt Management.
- Model switch without redeploy? AppConfig.
- Regional capacity issues? Cross-Region Inference.