Task 2.1 — Agentic AI solutions & tool integrations
An agent is an AI system that can autonomously plan, reason, use tools, and take actions. Unlike a simple prompt-response system, an agent runs a loop: observe → think → act → observe.
The ReAct loop
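The observe → think → act cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular SDK's API; `llm` and `tools` are hypothetical stand-ins for a model call and a tool registry:

```python
def react_loop(task, llm, tools, max_steps=5):
    """Minimal ReAct loop: the model alternates reasoning and tool use."""
    observations = [f"Task: {task}"]
    for _ in range(max_steps):
        # THINK: ask the model for its next move given everything seen so far
        decision = llm("\n".join(observations))
        if decision["action"] == "finish":
            return decision["answer"]
        # ACT: run the chosen tool with the model-supplied arguments
        result = tools[decision["action"]](**decision["args"])
        # OBSERVE: feed the tool result back into the next reasoning step
        observations.append(f"Observation: {result}")
    raise RuntimeError("agent exceeded max_steps without finishing")
```

Bedrock Agents runs this loop for you; Strands gives you hooks into it; writing it by hand is what "more control over the reasoning loop" means in practice.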
AWS agent services compared
Bedrock Agents
- Define action groups (OpenAPI + Lambda)
- Knowledge base integration
- Built-in guardrails
- Agent tracing out of the box
- Default choice for the exam
Strands Agents
- Run on your compute
- More control over reasoning loop
- Custom memory / state
- When Bedrock Agents can't flex enough
Bedrock AgentCore
- Compute & networking for agents
- Lifecycle management
- Production-grade hosting
AWS Agent Squad
- Coordinate specialized agents
- Supervisor + worker pattern
- Cross-agent routing
Model Context Protocol (MCP)
MCP is the open standard for agent-to-tool interactions. An MCP server exposes tools; an MCP client (the agent) consumes them. Same protocol whether the tool is a Lambda or a complex ECS service.
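On the wire, MCP is JSON-RPC 2.0: the client lists tools, then calls one by name. A sketch of the message shapes (the `get_ticket` tool and its schema are invented for illustration; `tools/list`, `tools/call`, and `inputSchema` are the protocol's actual names):

```python
# Client asks the MCP server what tools it exposes (JSON-RPC 2.0)
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server's reply: each tool carries a name and a JSON Schema for its inputs
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "get_ticket",  # hypothetical Lambda-backed tool
            "description": "Fetch a support ticket by id",
            "inputSchema": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        }]
    },
}

# Client invokes the tool; the shape is identical whether a Lambda
# function or an ECS service sits behind the server
call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_ticket", "arguments": {"ticket_id": "T-100"}},
}
```

The point to retain: the agent never knows or cares what compute backs the tool.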
Safeguarded agent workflows
Human-in-the-loop pattern
Agent proposes an action, human approves, agent executes. Implemented with Step Functions callback pattern — the workflow pauses at an approval task, sends a notification (SNS, email), and resumes when approval arrives.
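One way to wire this up is a `.waitForTaskToken` task state: Step Functions injects a task token, publishes the approval request, and pauses the execution until something calls `SendTaskSuccess` with that token. A sketch of the state machine fragment (ARNs and state names are illustrative; `.waitForTaskToken` and `$$.Task.Token` are the real mechanism):

```python
# State machine fragment: the approval task injects a token and pauses.
approval_state = {
    "RequestApproval": {
        "Type": "Task",
        # .waitForTaskToken pauses execution until SendTaskSuccess/Failure
        "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
        "Parameters": {
            # hypothetical topic the approver is subscribed to
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:approvals",
            "Message": {
                "proposed_action.$": "$.agent.proposed_action",
                "task_token.$": "$$.Task.Token",  # approver must echo this back
            },
        },
        "Next": "ExecuteAction",
    }
}

# Approver side (e.g. a Lambda behind the approval link) resumes the flow:
# boto3.client("stepfunctions").send_task_success(
#     taskToken=token, output='{"approved": true}')
```

If no approval arrives, a `TimeoutSeconds` on the same state fails the execution instead of leaving it hanging forever.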
Task 2.2 — Model deployment strategies
Deployment options
Lambda → Bedrock
- Pay per request
- Zero idle cost
- Subject to throttling
- Good for low/variable volume
Provisioned Throughput
- Reserved capacity
- Guaranteed tokens/min
- Required for custom models
- Commitment-based pricing
SageMaker endpoints
- Full hosting control
- Instance type choice
- Auto-scaling
- Complex ops
Hybrid
- Bedrock for standard
- SageMaker for custom
- Route by request type
LLM-specific deployment challenges
Traditional ML deployments don't prepare you for LLMs:
- Memory requirements — container and storage patterns must be sized for tens to hundreds of GB of model weights
- GPU utilization — LLMs need specific GPU types (A10G, H100, A100); right-size carefully
- Token processing capacity — throughput is tokens/second, not just requests/second
- Model loading strategies — large weights take time to load; consider warm pools, pre-loading, snapshot restore
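The tokens/second point deserves back-of-envelope math, since it drives GPU sizing. All the workload numbers below are made up for illustration:

```python
# Hypothetical workload: 50 concurrent chats, each generating ~30 tokens/s.
# Capacity must be planned in tokens, not requests.
concurrent_sessions = 50
output_tokens_per_session_per_s = 30
avg_prompt_tokens = 1_000
avg_session_length_s = 60

# Sustained generation (decode) throughput the fleet must deliver:
gen_tps = concurrent_sessions * output_tokens_per_session_per_s  # 1500 tok/s

# Prompt (prefill) load: each new session front-loads its prompt tokens.
new_sessions_per_s = concurrent_sessions / avg_session_length_s  # ~0.83/s
prefill_tps = new_sessions_per_s * avg_prompt_tokens             # ~833 tok/s

total_tps = gen_tps + prefill_tps
print(round(total_tps))  # ≈ 2333 tokens/s: the figure to size GPUs against
```

A fleet sized on requests/second alone would miss the prefill load entirely.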
Optimized deployment approaches
Task 2.3 — Enterprise integration architectures
Enterprise connectivity patterns
GenAI enhancement patterns
Cross-environment AI
AWS Outposts
- AWS infra in your datacenter
- Data compliance (HIPAA, GDPR, sovereign)
- Secure routing cloud ↔ on-prem
AWS Wavelength
- Deploy at 5G edge
- Single-digit ms latency
- Mobile/IoT use cases
VPC endpoints + PrivateLink
- Bedrock via private network
- No internet egress
- Required for many compliance profiles
CI/CD for GenAI — what's different
GenAI gateway architecture
The enterprise answer when multiple teams or apps need FM access with centralized control. See Pattern 6 in the Architecture Patterns reference.
Task 2.4 — FM API integrations
Sync vs. async vs. streaming
Synchronous (InvokeModel)
- Simple request/response
- Good for batch, APIs with short responses
- Higher perceived latency for chat
Asynchronous (via SQS)
- Decouples producer from FM
- Absorbs bursts
- Handles throttling gracefully
Streaming (InvokeModelWithResponseStream)
- Best for chat UX
- WebSocket or SSE transport
- API Gateway chunked transfer
Batch inference
- Submit jobs, poll for completion
- Large throughput at low cost
- No latency guarantee
Streaming delivery mechanisms
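Client-side, a Bedrock response stream arrives as a sequence of chunk events whose bytes are JSON. A minimal accumulation sketch; the event shape mirrors what boto3 yields from `InvokeModelWithResponseStream`, but the sample events here are fabricated and the `completion` key is model-family-specific:

```python
import json

def collect_stream(events):
    """Accumulate text deltas from a Bedrock-style response stream."""
    pieces = []
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # other event types (metadata, errors) handled elsewhere
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        # Field name varies by model family; 'completion' is one common key
        pieces.append(payload.get("completion", ""))
    return "".join(pieces)

# Fabricated events in the shape boto3 yields from the response 'body':
fake_events = [
    {"chunk": {"bytes": json.dumps({"completion": "Hel"}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": "lo"}).encode()}},
]
print(collect_stream(fake_events))  # prints "Hello"
```

In a chat UI, each delta is forwarded to the client over WebSocket or SSE as it arrives rather than joined at the end.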
Resilient FM systems
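The resilience building block worth having in muscle memory: capped exponential backoff with full jitter for throttled FM calls. A sketch under stated assumptions; `ThrottledError` stands in for the SDK's throttling exception, and `sleep` is injectable so the loop is testable:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the SDK's throttling exception (e.g. ThrottlingException)."""

def with_backoff(call, max_attempts=5, base_s=0.5, cap_s=20.0, sleep=time.sleep):
    """Retry `call` on throttling with capped exponential backoff + full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error to the caller
            # Full jitter: random delay up to the capped exponential bound,
            # so retries from many clients don't arrive in synchronized waves
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

Pair this with the SQS pattern above: the queue absorbs bursts, the backoff absorbs the residual throttling.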
Intelligent model routing
Static routing
- Simplest option
- No runtime flexibility
- Change requires redeploy
Content-based
- Step Functions orchestrates
- Classifier → model selection
- Supports cascading
Metrics-based
- Route by current latency/cost
- Automatically avoid slow models
- Requires performance telemetry
Gateway transform
- Rewrite requests per destination model
- Model-agnostic client
- Central routing logic
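Content-based routing in miniature: a cheap classifier picks the model, so simple asks never pay for the large model. The routing table and the toy length-based classifier are illustrative stand-ins for a real classifier step:

```python
# Hypothetical routing table: cheap model for simple asks, big model otherwise
ROUTES = {
    "simple": "amazon.titan-text-lite-v1",
    "complex": "anthropic.claude-3-sonnet-20240229-v1:0",
}

def classify(prompt: str) -> str:
    """Toy classifier: in practice a small model, heuristic, or Lambda step."""
    return "complex" if len(prompt.split()) > 50 else "simple"

def route(prompt: str) -> str:
    """Map a request to a model id; Step Functions would branch on this."""
    return ROUTES[classify(prompt)]
```

Cascading is the same idea applied twice: try the cheap model first, re-route to the large one only if its answer fails a quality check.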
Task 2.5 — Application integration patterns & dev tools
GenAI-specific API patterns
- Streaming response handling in API Gateway — differs from traditional REST; use WebSocket API or chunked transfer
- Token limit management — truncate or summarize inputs that would exceed context window
- Retry strategies for model timeouts — LLM timeouts look different from HTTP timeouts; don't just retry blindly on 504
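Token-limit management in sketch form: estimate tokens and drop the oldest conversation turns until the prompt fits. The 4-characters-per-token rule of thumb is a rough approximation, not the model's real tokenizer, and the budget numbers are invented:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return max(1, len(text) // 4)

def fit_to_context(messages, max_tokens=8_000, reserve_for_output=1_000):
    """Drop the oldest messages until the prompt fits under the budget."""
    budget = max_tokens - reserve_for_output
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # drop the oldest turn first; keep the freshest context
    return kept
```

Summarizing dropped turns into a single synthetic message is the gentler variant of the same idea.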
Developer-facing tools
Business system enhancements
CRM enhancement
- Lambda calls Bedrock on ticket create
- Writes AI summary back to CRM
- Agent uses the summary
Document processing
- Step Functions orchestrates
- Textract → Comprehend → Bedrock
- Handles async, retries, errors
Amazon Q Business
- Connectors for S3, SharePoint, Salesforce
- Q Business Apps = no-code custom apps
- Answer questions over internal knowledge
Bedrock Data Automation
- Document extraction + transformation
- Reduces custom pipeline code
- Integrates with Bedrock ecosystem
Amazon Q Developer
AI coding assistant for accelerating GenAI development:
- Code generation & refactoring for Bedrock integrations
- API assistance — auto-complete for Bedrock SDK calls
- AI component testing helpers
- Performance optimization suggestions
- GenAI-specific error pattern recognition when debugging
Troubleshooting efficiency
Domain 2 summary — what to remember
The service map
- Agents: Bedrock Agents (default) · Strands · Agent Squad
- Agent infra: Bedrock AgentCore
- Tools: MCP (Lambda = simple, ECS = complex)
- Deploy: Lambda · Provisioned Throughput · SageMaker
- Orchestrate: Step Functions (anything) · Prompt Flows (Bedrock-only)
- Events: EventBridge + Lambda
- Integration: API Gateway + Cognito (gateway)
- CI/CD: CodePipeline + prompt regression testing
The mental shortcuts
- Approval needed? Step Functions callback pattern.
- Complex MCP tool? ECS, not Lambda.
- Chat interface? Streaming API + WebSocket.
- Predictable high volume? Provisioned Throughput.
- Data can't leave premises? Outposts.
- Off public internet? VPC endpoints / PrivateLink.
- Multi-team FM access? GenAI gateway.
- Batch / non-real-time? Batch inference (discount).