Task 2.1 — Agentic AI solutions & tool integrations

An agent is an AI system that can autonomously plan, reason, use tools, and take actions. Unlike simple prompt-response, agents loop: observe → think → act → observe.

The ReAct loop

1 · Observe — user request; task received
2 · Reason — agent thinks: what's needed?
3 · Act — call a tool (action group / Lambda)
4 · Observe — tool result fed back to the agent
5 · Loop? — done, or iterate with another action
6 · Respond — final answer, guardrails applied
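
The loop above can be sketched in a few lines. This is a hedged illustration only — `reason` and `lookup_order` are stubs standing in for a real FM call and a real action group / Lambda:

```python
# Minimal ReAct-style loop with a stubbed reasoner and one tool.
# Illustrative only: `reason` and `lookup_order` stand in for an FM
# call and a real action group / Lambda.

def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"           # stubbed tool result

TOOLS = {"lookup_order": lookup_order}

def reason(task: str, observations: list) -> dict:
    # A real agent would call the FM here; we hard-code one decision.
    if not observations:
        return {"action": "lookup_order", "input": "A-42"}
    return {"respond": f"Done. {observations[-1]}"}

def react_loop(task: str, max_steps: int = 5) -> str:
    observations = []                             # 1 · Observe: task received
    for _ in range(max_steps):                    # stop condition: max steps
        step = reason(task, observations)         # 2 · Reason
        if "respond" in step:                     # 6 · Respond
            return step["respond"]
        result = TOOLS[step["action"]](step["input"])  # 3 · Act
        observations.append(result)               # 4 · Observe tool result
    return "Stopped: step budget exhausted."

print(react_loop("Where is my order?"))   # → Done. Order A-42: shipped
```

Note the `max_steps` cap: even a toy loop needs a stop condition, which is the same safeguard the Step Functions section below formalizes.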

AWS agent services compared

Bedrock Agents

Fully managed, AWS-native
  • Define action groups (OpenAPI + Lambda)
  • Knowledge base integration
  • Built-in guardrails
  • Agent tracing out of the box
  • Default choice for the exam
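
Action group tool schemas are plain OpenAPI. A minimal fragment — the path, `operationId`, and parameter names here are invented for illustration:

```yaml
openapi: 3.0.0
info:
  title: order-tools              # hypothetical action group API
  version: "1.0"
paths:
  /orders/{orderId}/status:
    get:
      operationId: getOrderStatus # the agent selects tools by operationId + description
      description: Look up the shipping status of an order
      parameters:
        - name: orderId
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Current status of the order
```

The `description` fields matter: the agent uses them to decide when to call the tool, so write them for the model, not just for humans.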

Strands Agents

Open-source, customizable
  • Run on your compute
  • More control over reasoning loop
  • Custom memory / state
  • When Bedrock Agents can't flex enough

Bedrock AgentCore

Infrastructure for agents at scale
  • Compute & networking for agents
  • Lifecycle management
  • Production-grade hosting

AWS Agent Squad

Multi-agent orchestration
  • Coordinate specialized agents
  • Supervisor + worker pattern
  • Cross-agent routing

Model Context Protocol (MCP)

MCP is the open standard for agent-to-tool interactions. An MCP server exposes tools; an MCP client (the agent) consumes them. Same protocol whether the tool is a Lambda or a complex ECS service.

1 · Agent (MCP client) — needs a tool
2 · Discover — list available tools over the standardized protocol
3 · Invoke — call with params, validated against a JSON schema
4 · MCP server (Lambda or ECS) — executes the tool
5 · Return — structured result goes back into the agent's context
Exam angle — MCP server deployment. Lambda MCP server → lightweight, stateless tool access. ECS MCP server → complex tools requiring persistent connections, significant compute, or long-running state. If the question says "persistent database connections" or "significant compute," the answer is ECS, not Lambda.
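
The discover → invoke shape can be modeled with plain dicts — a sketch of the protocol's flow, not the real MCP SDK; the tool name and schema are invented:

```python
# Shape of the MCP discover → invoke exchange, modeled with plain dicts.
# Illustrative only — a real deployment would use an MCP SDK.

TOOL_REGISTRY = {
    "get_weather": {
        "description": "Current weather for a city",
        "inputSchema": {"required": ["city"],
                        "properties": {"city": {"type": "string"}}},
        "handler": lambda args: {"city": args["city"], "temp_c": 21},
    }
}

def list_tools():
    # Discover: the client asks the server what tools exist.
    return [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOL_REGISTRY.items()]

def call_tool(name: str, arguments: dict):
    # Invoke: arguments are checked against the tool's schema first.
    tool = TOOL_REGISTRY[name]
    missing = [k for k in tool["inputSchema"]["required"] if k not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return tool["handler"](arguments)   # Return: structured result

print([t["name"] for t in list_tools()])
print(call_tool("get_weather", {"city": "Dublin"}))
```

The key property: the client never hard-codes the tool list — it discovers it at runtime, which is what makes the same protocol work for a Lambda or an ECS service.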

Safeguarded agent workflows

Stop conditions Step Functions define when an agent should halt iterating (max steps, quality threshold met, task complete).
Timeouts Lambda function timeouts prevent runaway agents from consuming unbounded compute.
IAM boundaries IAM resource boundaries — agents can only access what they're explicitly permitted. Least privilege on the agent's execution role.
Circuit breakers Circuit breakers detect failure patterns (repeated tool errors, infinite loops) and halt execution before cascading.
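
A circuit breaker for tool calls can be this small — a sketch with illustrative thresholds, not a production implementation:

```python
# Minimal circuit breaker for agent tool calls. Thresholds are illustrative.

class CircuitBreaker:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0
        self.open = False                    # open = stop calling the tool

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: halting before cascade")
        try:
            result = fn(*args)
            self.failures = 0                # a success resets the count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True             # repeated errors trip the breaker
            raise

breaker = CircuitBreaker(max_failures=2)

def flaky_tool():
    raise TimeoutError("tool timed out")

for _ in range(2):
    try:
        breaker.call(flaky_tool)
    except TimeoutError:
        pass
print(breaker.open)   # → True: breaker tripped after repeated failures
```

Once open, every further call fails fast instead of burning compute on a tool that keeps erroring — exactly the "halt before cascading" behavior described above.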

Human-in-the-loop pattern

Agent proposes an action, human approves, agent executes. Implemented with Step Functions callback pattern — the workflow pauses at an approval task, sends a notification (SNS, email), and resumes when approval arrives.

Pattern worth memorizing Any question about "high-value actions require approval" or "sensitive operations need review" maps to Step Functions + human approval task + callback pattern. Not Guardrails (those filter content), not IAM (binary allow/deny), not system prompts (unreliable).
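
The callback pattern hinges on one Amazon States Language feature: a `.waitForTaskToken` task that pauses until something calls back with the token. A minimal fragment — the topic ARN, account ID, and state names are invented:

```json
{
  "ProposeAction": {
    "Type": "Task",
    "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
    "Parameters": {
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:approvals",
      "Message": {
        "action.$": "$.proposedAction",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ExecuteAction"
  }
}
```

The workflow stops at `ProposeAction` until the approver's handler calls `SendTaskSuccess` (or `SendTaskFailure`) with the token; `TimeoutSeconds` bounds how long an unanswered approval can hold the workflow open.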

Task 2.2 — Model deployment strategies

Deployment options

Lambda → Bedrock

On-demand, sporadic
  • Pay per request
  • Zero idle cost
  • Subject to throttling
  • Good for low/variable volume

Provisioned Throughput

Predictable high volume
  • Reserved capacity
  • Guaranteed tokens/min
  • Required for custom models
  • Commitment-based pricing

SageMaker endpoints

Custom / fine-tuned models
  • Full hosting control
  • Instance type choice
  • Auto-scaling
  • Complex ops

Hybrid

Best of both
  • Bedrock for standard
  • SageMaker for custom
  • Route by request type

LLM-specific deployment challenges

Traditional ML deployments don't prepare you for LLMs:

  • Memory requirements — model weights run tens to hundreds of GB; container and memory patterns must be sized for them
  • GPU utilization — LLMs need specific GPU types (A10G, H100, A100); right-size carefully
  • Token processing capacity — throughput is tokens/second, not just requests/second
  • Model loading strategies — large weights take time to load; consider warm pools, pre-loading, snapshot restore

Optimized deployment approaches

Cascading Model cascading — try a smaller/cheaper model first; only escalate to a larger model if the response fails a quality threshold. Pattern 5 in the Architecture Patterns reference.
Small models Smaller pre-trained models handle routine queries (classification, extraction, summarization of short texts); reserve big models for hard work.
Right-sizing Right-sizing — match model capability to task complexity. Don't use a reasoning model for spell-check.
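
Cascading reduces to a short loop: cheapest tier first, escalate only on a failed quality check. A sketch — `invoke` and `good_enough` are stand-ins for real FM calls and scored evals, and the model IDs are hypothetical:

```python
# Model cascading: try the cheap model first, escalate on low quality.
# `invoke` and the quality check are stubs; model IDs are hypothetical.

CASCADE = ["small-model", "large-model"]     # ordered cheap → capable

def invoke(model_id: str, prompt: str) -> str:
    # Stub: the small model "fails" long prompts to force escalation.
    if model_id == "small-model" and len(prompt) > 40:
        return ""                            # empty = low-quality answer
    return f"{model_id} answer"

def good_enough(response: str) -> bool:
    return len(response) > 0                 # real systems use scored evals

def cascade(prompt: str) -> str:
    for model_id in CASCADE:
        response = invoke(model_id, prompt)
        if good_enough(response):
            return response                  # stop at the first passing tier
    return response                          # last tier's answer, regardless

print(cascade("short question"))
print(cascade("a much longer and more complicated question than usual"))
```

The design choice that matters is the quality gate: if `good_enough` is too lenient you ship weak answers, too strict and every request pays for the big model anyway.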

Task 2.3 — Enterprise integration architectures

Enterprise connectivity patterns

API-based integrations
REST/GraphQL endpoints wrapping Bedrock. Legacy systems talk to the API; GenAI stays behind it.
Event-driven
EventBridge for loose coupling. Business event fires → Lambda reads it → Bedrock call → result published.
Data sync
Keep enterprise data stores and vector stores in sync with change-data-capture (DynamoDB Streams, Aurora triggers, S3 events).

GenAI enhancement patterns

1 · Event source — CRM / ticket / upload emits a business event
2 · Route — EventBridge filters & dispatches
3 · Enrich — Lambda builds the prompt
4 · Generate — Bedrock runs FM inference
5 · Integrate — result written back to CRM / DB / user

Cross-environment AI

AWS Outposts

Data must stay on-premises
  • AWS infra in your datacenter
  • Data compliance (HIPAA, GDPR, sovereign)
  • Secure routing cloud ↔ on-prem

AWS Wavelength

Ultra-low latency
  • Deploy at 5G edge
  • Single-digit ms latency
  • Mobile/IoT use cases

VPC endpoints + PrivateLink

Traffic off public internet
  • Bedrock via private network
  • No internet egress
  • Required for many compliance profiles

CI/CD for GenAI — what's different

1 · Commit — CodePipeline source trigger
2 · Build — CodeBuild packages Lambda / prompts
3 · Test — prompt regression against a golden dataset
4 · Scan — injection tests & guardrail validation
5 · Deploy — canary + rollback, gradual rollout
Exam angle A question asking which CI/CD step is unique to GenAI (not in a traditional pipeline) points to prompt regression testing against a golden dataset or guardrail validation. Unit tests, integration tests, IAM scans — all exist in regular pipelines.
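
A prompt regression harness is just the golden dataset replayed on every build. A sketch — the dataset, the substring check, and the stubbed model call are all illustrative (real harnesses use semantic or scored comparisons):

```python
# Prompt regression against a golden dataset: same prompts every build,
# fail the pipeline if expected markers disappear from the responses.
# Dataset, check, and model stub are illustrative.

GOLDEN = [
    {"prompt": "Summarize: refund policy is 30 days.", "must_contain": "30 days"},
    {"prompt": "Classify sentiment: 'great product'",  "must_contain": "positive"},
]

def invoke_model(prompt: str) -> str:
    # Stub standing in for a Bedrock call during CI.
    return "positive — refund window is 30 days"

def run_regression(dataset) -> list:
    failures = []
    for case in dataset:
        response = invoke_model(case["prompt"])
        if case["must_contain"] not in response:
            failures.append(case["prompt"])  # regression: marker missing
    return failures

failures = run_regression(GOLDEN)
print("PASS" if not failures else f"FAIL: {failures}")
```

The point of the golden dataset is stability across prompt edits and model version bumps — the same inputs, checked the same way, on every commit.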

GenAI gateway architecture

The enterprise answer when multiple teams/apps need FM access with centralized control. Pattern 6 in the Architecture Patterns reference.

Entry API Gateway + Cognito — identity federation, auth, request validation.
Filter Bedrock Guardrails — content filtering and PII redaction before the FM sees anything.
Route Lambda / Step Functions — pick the right FM per team, request type, policy.
Govern Rate limit per team (API Gateway throttling), cost attribution (tags), audit (CloudTrail + Invocation Logs).

Task 2.4 — FM API integrations

Sync vs. async vs. streaming

Synchronous (InvokeModel)

Wait for full response
  • Simple request/response
  • Good for batch, APIs with short responses
  • Higher perceived latency for chat

Asynchronous (via SQS)

Queue when volume exceeds capacity
  • Decouples producer from FM
  • Absorbs bursts
  • Handles throttling gracefully

Streaming (InvokeModelWithResponseStream)

Token-by-token delivery
  • Best for chat UX
  • WebSocket or SSE transport
  • API Gateway chunked transfer

Batch inference

~50% discount, non-real-time
  • Submit jobs, poll for completion
  • Large throughput at low cost
  • No latency guarantee

Streaming delivery mechanisms

WebSocket API Gateway WebSocket API — bidirectional, persistent connection. Best choice for chat.
SSE Server-Sent Events — one-way server → client over HTTP. Simpler than WebSocket when you don't need bidirectional.
Chunked HTTP API Gateway chunked transfer encoding — stream over REST. Works when WebSocket isn't available.
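
SSE frames are plain text, which is why it's the simpler option. A sketch of the framing, with a generator simulating what a model response stream would yield:

```python
# Server-Sent Events framing for streamed tokens. The token generator
# simulates a model response stream; "[DONE]" is a common convention,
# not part of the SSE spec.

def token_stream():
    yield from ["Hello", ", ", "world", "."]

def to_sse(tokens):
    for token in tokens:
        yield f"data: {token}\n\n"        # one SSE event per token
    yield "data: [DONE]\n\n"              # end-of-stream marker

frames = list(to_sse(token_stream()))
print(frames[0])      # "data: Hello" followed by a blank line
print(len(frames))    # → 5 frames: 4 tokens + terminator
```

Each `data:` line plus blank line is one event; the browser's `EventSource` API reassembles them on the client with no WebSocket handshake required.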

Resilient FM systems

Exponential backoff
AWS SDK retry strategies. Wait longer between each retry. Add jitter to avoid thundering herd.
Rate limiting
API Gateway throttling to prevent your app from hammering Bedrock.
Graceful degradation
Fall back to cached responses or simpler models when the primary FM is unavailable.
X-Ray tracing
Observability across service boundaries. Essential for diagnosing where latency comes from.
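
Backoff with full jitter is a one-liner worth internalizing: the cap grows exponentially per attempt, and the actual wait is drawn uniformly from zero to that cap. Base and cap values here are illustrative:

```python
# Exponential backoff with full jitter: the ceiling grows per attempt,
# and jitter spreads retries so clients don't stampede in sync.
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    exp = min(cap, base * (2 ** attempt))     # exponential growth, capped
    return random.uniform(0, exp)             # full jitter: anywhere in [0, exp]

random.seed(7)                                # deterministic for the demo
delays = [backoff_delay(a) for a in range(5)]
print([round(d, 2) for d in delays])
```

Without the jitter, every throttled client retries at the same instant and re-creates the spike that triggered the throttling — the thundering herd the section above warns about.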

Intelligent model routing

Static routing

Hard-coded in app
  • Simplest option
  • No runtime flexibility
  • Change requires redeploy

Content-based

Analyze request, pick model
  • Step Functions orchestrates
  • Classifier → model selection
  • Supports cascading

Metrics-based

Live performance data
  • Route by current latency/cost
  • Automatically avoid slow models
  • Requires performance telemetry

Gateway transform

API Gateway request transforms
  • Rewrite requests per destination model
  • Model-agnostic client
  • Central routing logic
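
Content-based routing reduces to classify → look up. A sketch — the keyword classifier and model IDs are stand-ins for a real classifier step (often itself a small FM call inside a Step Functions workflow):

```python
# Content-based routing: classify the request, then pick a model.
# The keyword classifier and model IDs are illustrative stand-ins.

ROUTES = {
    "code":    "code-model",        # hypothetical model IDs
    "summary": "fast-small-model",
    "default": "general-model",
}

def classify(request: str) -> str:
    text = request.lower()
    if "function" in text or "bug" in text:
        return "code"
    if "summarize" in text or "tl;dr" in text:
        return "summary"
    return "default"

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("Fix this bug in my function"))     # → code-model
print(route("Summarize this report"))           # → fast-small-model
print(route("What is the capital of France?"))  # → general-model
```

Swapping the keyword rules for a live-metrics lookup turns this same structure into the metrics-based variant; the routing table stays the central point of control either way.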

Task 2.5 — Application integration patterns & dev tools

GenAI-specific API patterns

  • Streaming response handling in API Gateway — differs from traditional REST; use WebSocket API or chunked transfer
  • Token limit management — truncate or summarize inputs that would exceed context window
  • Retry strategies for model timeouts — LLM timeouts look different from HTTP timeouts; don't just retry blindly on 504
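
Token limit management can be sketched as a budget check before the call. Whitespace splitting is a rough stand-in for the model's real tokenizer, and the limits are illustrative:

```python
# Token-limit management: trim input so prompt + reply fit the context
# window. Whitespace tokenization is a crude stand-in for the model's
# tokenizer; window and reply budget are illustrative numbers.

def fit_to_budget(text: str, context_window: int = 100, reply_budget: int = 20) -> str:
    tokens = text.split()                    # crude token estimate
    input_budget = context_window - reply_budget
    if len(tokens) <= input_budget:
        return text                          # already fits
    kept = tokens[:input_budget]             # truncate (or summarize instead)
    return " ".join(kept)

long_text = " ".join(f"w{i}" for i in range(150))
trimmed = fit_to_budget(long_text)
print(len(trimmed.split()))   # → 80: input capped to leave reply headroom
```

Reserving `reply_budget` is the part people forget: an input that exactly fills the window leaves the model zero tokens to answer with.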

Developer-facing tools

AWS Amplify
Declarative UI components for GenAI frontends. Hosting, auth, API. Accelerates UI build.
OpenAPI specs
API-first development. Your Bedrock Agent action groups use OpenAPI to define tool schemas.
Bedrock Prompt Flows
No-code visual workflow builder for prompt chains. Good for teams that want to iterate on logic without deploying code.

Business system enhancements

CRM enhancement

Auto-generate summaries
  • Lambda calls Bedrock on ticket create
  • Writes AI summary back to CRM
  • Agent uses the summary

Document processing

Extract → classify → summarize
  • Step Functions orchestrates
  • Textract → Comprehend → Bedrock
  • Handles async, retries, errors

Amazon Q Business

Enterprise assistant
  • Connectors for S3, SharePoint, Salesforce
  • Q Business Apps = no-code custom apps
  • Answer questions over internal knowledge

Bedrock Data Automation

Automated processing pipelines
  • Document extraction + transformation
  • Reduces custom pipeline code
  • Integrates with Bedrock ecosystem

Amazon Q Developer

AI coding assistant for accelerating GenAI development:

  • Code generation & refactoring for Bedrock integrations
  • API assistance — auto-complete for Bedrock SDK calls
  • AI component testing helpers
  • Performance optimization suggestions
  • GenAI-specific error pattern recognition when debugging

Troubleshooting efficiency

Logs Insights CloudWatch Logs Insights — query prompts and responses at scale. Find patterns in failures.
Tracing X-Ray — trace FM API calls end-to-end. See which hop in the RAG pipeline is slow.
Error recognition Amazon Q Developer — recognizes GenAI-specific error patterns (context overflow, guardrail blocks, model timeouts) and suggests fixes.

Domain 2 summary — what to remember

The service map

  • Agents Bedrock Agents (default) · Strands · Agent Squad
  • Agent infra Bedrock AgentCore
  • Tools MCP (Lambda = simple, ECS = complex)
  • Deploy Lambda · Provisioned Throughput · SageMaker
  • Orchestrate Step Functions (anything) · Prompt Flows (Bedrock-only)
  • Events EventBridge + Lambda
  • Integration API Gateway + Cognito (gateway)
  • CI/CD CodePipeline + prompt regression testing

The mental shortcuts

  • Approval needed? Step Functions callback pattern.
  • Complex MCP tool? ECS, not Lambda.
  • Chat interface? Streaming API + WebSocket.
  • Predictable high volume? Provisioned Throughput.
  • Data can't leave premises? Outposts.
  • Off public internet? VPC endpoints / PrivateLink.
  • Multi-team FM access? GenAI gateway.
  • Batch / non-real-time? Batch inference (discount).
Next up Continue to Domain 3 — AI Safety, Security & Governance (20% of the exam, and your CISSP strength). Or jump to the 10 Architecture Patterns for diagrammed end-to-end reference designs.