Task 2.1 — Agentic AI solutions & tool integrations
An agent is an AI system that can autonomously plan, reason, use tools, and take actions. Unlike a simple prompt-response system, an agent runs a loop: observe → think → act → observe.
The ReAct loop
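The observe → think → act cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular SDK's API; `llm` and `tools` are hypothetical stand-ins for a model call and a tool registry:

```python
def react_loop(task, llm, tools, max_steps=5):
    """Minimal ReAct loop: the model alternates reasoning and tool use."""
    observations = [f"Task: {task}"]
    for _ in range(max_steps):
        # THINK: ask the model for its next move given everything seen so far
        decision = llm("\n".join(observations))
        if decision["action"] == "finish":
            return decision["answer"]
        # ACT: run the chosen tool with the model-supplied arguments
        result = tools[decision["action"]](**decision["args"])
        # OBSERVE: feed the tool result back into the next reasoning step
        observations.append(f"Observation: {result}")
    raise RuntimeError("agent exceeded max_steps without finishing")
```

Bedrock Agents runs this loop for you; Strands gives you hooks into it; writing it by hand is what "more control over the reasoning loop" means in practice.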
AWS agent services compared
Bedrock Agents
- Define action groups (OpenAPI + Lambda)
- Knowledge base integration
- Built-in guardrails
- Agent tracing out of the box
- Default choice for the exam
Strands Agents
- Run on your compute
- More control over reasoning loop
- Custom memory / state
- When Bedrock Agents can't flex enough
Bedrock AgentCore
- Compute & networking for agents
- Lifecycle management
- Production-grade hosting
AWS Agent Squad
- Coordinate specialized agents
- Supervisor + worker pattern
- Cross-agent routing
Model Context Protocol (MCP)
MCP is the open standard for agent-to-tool interactions. An MCP server exposes tools; an MCP client (the agent) consumes them. Same protocol whether the tool is a Lambda or a complex ECS service.
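On the wire, MCP is JSON-RPC 2.0: the client lists tools, then calls one by name. A sketch of the message shapes (the `get_ticket` tool and its schema are invented for illustration; `tools/list`, `tools/call`, and `inputSchema` are the protocol's actual names):

```python
# Client asks the MCP server what tools it exposes (JSON-RPC 2.0)
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server's reply: each tool carries a name and a JSON Schema for its inputs
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "get_ticket",  # hypothetical Lambda-backed tool
            "description": "Fetch a support ticket by id",
            "inputSchema": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        }]
    },
}

# Client invokes the tool; the shape is identical whether a Lambda
# function or an ECS service sits behind the server
call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_ticket", "arguments": {"ticket_id": "T-100"}},
}
```

The point to retain: the agent never knows or cares what compute backs the tool.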
Safeguarded agent workflows
Human-in-the-loop pattern
Agent proposes an action, human approves, agent executes. Implemented with Step Functions callback pattern — the workflow pauses at an approval task, sends a notification (SNS, email), and resumes when approval arrives.
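One way to wire this up is a `.waitForTaskToken` task state: Step Functions injects a task token, publishes the approval request, and pauses the execution until something calls `SendTaskSuccess` with that token. A sketch of the state machine fragment (ARNs and state names are illustrative; `.waitForTaskToken` and `$$.Task.Token` are the real mechanism):

```python
# State machine fragment: the approval task injects a token and pauses.
approval_state = {
    "RequestApproval": {
        "Type": "Task",
        # .waitForTaskToken pauses execution until SendTaskSuccess/Failure
        "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
        "Parameters": {
            # hypothetical topic the approver is subscribed to
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:approvals",
            "Message": {
                "proposed_action.$": "$.agent.proposed_action",
                "task_token.$": "$$.Task.Token",  # approver must echo this back
            },
        },
        "Next": "ExecuteAction",
    }
}

# Approver side (e.g. a Lambda behind the approval link) resumes the flow:
# boto3.client("stepfunctions").send_task_success(
#     taskToken=token, output='{"approved": true}')
```

If no approval arrives, a `TimeoutSeconds` on the same state fails the execution instead of leaving it hanging forever.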
Task 2.2 — Model deployment strategies
Deployment options
Lambda → Bedrock
- Pay per request
- Zero idle cost
- Subject to throttling
- Good for low/variable volume
Provisioned Throughput
- Reserved capacity
- Guaranteed tokens/min
- Required for custom models
- Commitment-based pricing
SageMaker endpoints
- Full hosting control
- Instance type choice
- Auto-scaling
- Complex ops
Hybrid
- Bedrock for standard
- SageMaker for custom
- Route by request type
LLM-specific deployment challenges
Traditional ML deployments don't prepare you for LLMs:
- Memory requirements — container and storage patterns must be sized for tens to hundreds of GB of model weights
- GPU utilization — LLMs need specific GPU types (A10G, H100, A100); right-size carefully
- Token processing capacity — throughput is tokens/second, not just requests/second
- Model loading strategies — large weights take time to load; consider warm pools, pre-loading, snapshot restore
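The tokens/second point deserves back-of-envelope math, since it drives GPU sizing. All the workload numbers below are made up for illustration:

```python
# Hypothetical workload: 50 concurrent chats, each generating ~30 tokens/s.
# Capacity must be planned in tokens, not requests.
concurrent_sessions = 50
output_tokens_per_session_per_s = 30
avg_prompt_tokens = 1_000
avg_session_length_s = 60

# Sustained generation (decode) throughput the fleet must deliver:
gen_tps = concurrent_sessions * output_tokens_per_session_per_s  # 1500 tok/s

# Prompt (prefill) load: each new session front-loads its prompt tokens.
new_sessions_per_s = concurrent_sessions / avg_session_length_s  # ~0.83/s
prefill_tps = new_sessions_per_s * avg_prompt_tokens             # ~833 tok/s

total_tps = gen_tps + prefill_tps
print(round(total_tps))  # ≈ 2333 tokens/s: the figure to size GPUs against
```

A fleet sized on requests/second alone would miss the prefill load entirely.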
Optimized deployment approaches
Task 2.3 — Enterprise integration architectures
Enterprise connectivity patterns
GenAI enhancement patterns
Cross-environment AI
AWS Outposts
- AWS infra in your datacenter
- Data compliance (HIPAA, GDPR, sovereign)
- Secure routing cloud ↔ on-prem
AWS Wavelength
- Deploy at 5G edge
- Single-digit ms latency
- Mobile/IoT use cases
VPC endpoints + PrivateLink
- Bedrock via private network
- No internet egress
- Required for many compliance profiles
CI/CD for GenAI — what's different
GenAI gateway architecture
The enterprise answer when multiple teams or apps need FM access with centralized control. See Pattern 6 in the Architecture Patterns reference.
Task 2.4 — FM API integrations
Sync vs. async vs. streaming
Synchronous (InvokeModel)
- Simple request/response
- Good for batch, APIs with short responses
- Higher perceived latency for chat
Asynchronous (via SQS)
- Decouples producer from FM
- Absorbs bursts
- Handles throttling gracefully
Streaming (InvokeModelWithResponseStream)
- Best for chat UX
- WebSocket or SSE transport
- API Gateway chunked transfer
Batch inference
- Submit jobs, poll for completion
- Large throughput at low cost
- No latency guarantee
Streaming delivery mechanisms
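Client-side, a Bedrock response stream arrives as a sequence of chunk events whose bytes are JSON. A minimal accumulation sketch; the event shape mirrors what boto3 yields from `InvokeModelWithResponseStream`, but the sample events here are fabricated and the `completion` key is model-family-specific:

```python
import json

def collect_stream(events):
    """Accumulate text deltas from a Bedrock-style response stream."""
    pieces = []
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # other event types (metadata, errors) handled elsewhere
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        # Field name varies by model family; 'completion' is one common key
        pieces.append(payload.get("completion", ""))
    return "".join(pieces)

# Fabricated events in the shape boto3 yields from the response 'body':
fake_events = [
    {"chunk": {"bytes": json.dumps({"completion": "Hel"}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": "lo"}).encode()}},
]
print(collect_stream(fake_events))  # prints "Hello"
```

In a chat UI, each delta is forwarded to the client over WebSocket or SSE as it arrives rather than joined at the end.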
Resilient FM systems
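The resilience building block worth having in muscle memory: capped exponential backoff with full jitter for throttled FM calls. A sketch under stated assumptions; `ThrottledError` stands in for the SDK's throttling exception, and `sleep` is injectable so the loop is testable:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the SDK's throttling exception (e.g. ThrottlingException)."""

def with_backoff(call, max_attempts=5, base_s=0.5, cap_s=20.0, sleep=time.sleep):
    """Retry `call` on throttling with capped exponential backoff + full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error to the caller
            # Full jitter: random delay up to the capped exponential bound,
            # so retries from many clients don't arrive in synchronized waves
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

Pair this with the SQS pattern above: the queue absorbs bursts, the backoff absorbs the residual throttling.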
Intelligent model routing
Static routing
- Simplest option
- No runtime flexibility
- Change requires redeploy
Content-based
- Step Functions orchestrates
- Classifier → model selection
- Supports cascading
Metrics-based
- Route by current latency/cost
- Automatically avoid slow models
- Requires performance telemetry
Gateway transform
- Rewrite requests per destination model
- Model-agnostic client
- Central routing logic
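Content-based routing in miniature: a cheap classifier picks the model, so simple asks never pay for the large model. The routing table and the toy length-based classifier are illustrative stand-ins for a real classifier step:

```python
# Hypothetical routing table: cheap model for simple asks, big model otherwise
ROUTES = {
    "simple": "amazon.titan-text-lite-v1",
    "complex": "anthropic.claude-3-sonnet-20240229-v1:0",
}

def classify(prompt: str) -> str:
    """Toy classifier: in practice a small model, heuristic, or Lambda step."""
    return "complex" if len(prompt.split()) > 50 else "simple"

def route(prompt: str) -> str:
    """Map a request to a model id; Step Functions would branch on this."""
    return ROUTES[classify(prompt)]
```

Cascading is the same idea applied twice: try the cheap model first, re-route to the large one only if its answer fails a quality check.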
Task 2.5 — Application integration patterns & dev tools
GenAI-specific API patterns
- Streaming response handling in API Gateway — differs from traditional REST; use WebSocket API or chunked transfer
- Token limit management — truncate or summarize inputs that would exceed context window
- Retry strategies for model timeouts — LLM timeouts look different from HTTP timeouts; don't just retry blindly on 504
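Token-limit management in sketch form: estimate tokens and drop the oldest conversation turns until the prompt fits. The 4-characters-per-token rule of thumb is a rough approximation, not the model's real tokenizer, and the budget numbers are invented:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return max(1, len(text) // 4)

def fit_to_context(messages, max_tokens=8_000, reserve_for_output=1_000):
    """Drop the oldest messages until the prompt fits under the budget."""
    budget = max_tokens - reserve_for_output
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # drop the oldest turn first; keep the freshest context
    return kept
```

Summarizing dropped turns into a single synthetic message is the gentler variant of the same idea.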
Developer-facing tools
Business system enhancements
CRM enhancement
- Lambda calls Bedrock on ticket create
- Writes AI summary back to CRM
- Agent uses the summary
Document processing
- Step Functions orchestrates
- Textract → Comprehend → Bedrock
- Handles async, retries, errors
Amazon Q Business
- Connectors for S3, SharePoint, Salesforce
- Q Business Apps = no-code custom apps
- Answer questions over internal knowledge
Bedrock Data Automation
- Document extraction + transformation
- Reduces custom pipeline code
- Integrates with Bedrock ecosystem
Amazon Q Developer
AI coding assistant for accelerating GenAI development:
- Code generation & refactoring for Bedrock integrations
- API assistance — auto-complete for Bedrock SDK calls
- AI component testing helpers
- Performance optimization suggestions
- GenAI-specific error pattern recognition when debugging
Troubleshooting efficiency
Domain 2 summary — what to remember
The service map
- Agents: Bedrock Agents (default) · Strands · Agent Squad
- Agent infra: Bedrock AgentCore
- Tools: MCP (Lambda = simple, ECS = complex)
- Deploy: Lambda · Provisioned Throughput · SageMaker
- Orchestrate: Step Functions (anything) · Prompt Flows (Bedrock-only)
- Events: EventBridge + Lambda
- Integration: API Gateway + Cognito (gateway)
- CI/CD: CodePipeline + prompt regression testing
The mental shortcuts
- Approval needed? Step Functions callback pattern.
- Complex MCP tool? ECS, not Lambda.
- Chat interface? Streaming API + WebSocket.
- Predictable high volume? Provisioned Throughput.
- Data can't leave premises? Outposts.
- Off public internet? VPC endpoints / PrivateLink.
- Multi-team FM access? GenAI gateway.
- Batch / non-real-time? Batch inference (discount).