Task 3.1 — Input & output safety controls
The six Bedrock Guardrails filter categories
Memorize these. Every Guardrails question maps to one or more:
1 · Denied topics
Custom categories you forbid
- e.g., "legal advice," "medical diagnosis"
- Natural-language topic definitions
2 · Content filters
Hate, insults, sexual, violence, misconduct
- Threshold per category (none/low/medium/high)
- Applied independently to input & output
3 · Word filters
Profanity / custom blocklists
- Exact-word blocking
- Domain-specific blocklists
4 · Sensitive info
PII detection & redaction
- SSN, credit card, email, phone, etc.
- Block or redact (mask)
5 · Contextual grounding
Catches hallucinations in RAG
- Checks if output is supported by retrieved context
- Flags/blocks ungrounded responses
6 · Prompt attack filter
Jailbreak / injection detection
- Detects override attempts
- Blocks at input layer
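The six categories map directly onto the sections of a `CreateGuardrail` request. A minimal sketch as a Python dict, assuming boto3 (names, thresholds, and the blocked word are illustrative):

```python
# Sketch of a Bedrock CreateGuardrail request body covering the six filter
# categories. In practice you'd pass it to the boto3 "bedrock" client:
#   bedrock.create_guardrail(**guardrail_config)
guardrail_config = {
    "name": "support-bot-guardrail",                    # hypothetical name
    "blockedInputMessaging": "Sorry, I can't help with that.",
    "blockedOutputsMessaging": "Sorry, I can't help with that.",
    # 1 · Denied topics: natural-language topic definitions
    "topicPolicyConfig": {"topicsConfig": [
        {"name": "LegalAdvice",
         "definition": "Requests for legal advice or interpretation of law.",
         "type": "DENY"},
    ]},
    # 2 · Content filters (and 6 · prompt attacks): per-category strength
    "contentPolicyConfig": {"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",
         "outputStrength": "NONE"},   # prompt attacks are input-only
    ]},
    # 3 · Word filters: exact-word blocklist
    "wordPolicyConfig": {"wordsConfig": [{"text": "internal-codename"}]},
    # 4 · Sensitive info: block or mask (ANONYMIZE) PII entity types
    "sensitiveInformationPolicyConfig": {"piiEntitiesConfig": [
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "EMAIL", "action": "ANONYMIZE"},
    ]},
    # 5 · Contextual grounding: flag RAG output below a grounding score
    "contextualGroundingPolicyConfig": {"filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
    ]},
}
```

Note that one guardrail bundles all six policies; you attach it at invocation time rather than invoking each filter separately.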
Defense-in-depth for GenAI — the seven layers
Applied in order from outside to inside. This is Pattern 10 in the Architecture Patterns reference.
1 · Network
VPC endpoints (PrivateLink) for Bedrock — keep FM traffic off the public internet.
2 · Identity
IAM policies scoped to specific models and actions; Cognito for end-user auth; identity federation to enterprise IdP.
3 · Pre-processing
Amazon Comprehend PII detection + Lambda sanitization before the FM sees input.
4 · Model-level
Bedrock Guardrails — topic denial, content filters, PII, grounding, prompt-attack.
5 · Post-processing
Lambda validates output format, accuracy, and safety after the FM responds.
6 · API
API Gateway rate limiting; WAF against abuse.
7 · Audit
CloudTrail + Bedrock Model Invocation Logs for forensic traceability.
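Layer 2 is worth seeing concretely. A sketch of an identity-based IAM policy scoped to a single model and the two invoke actions (model ID and region are placeholders):

```python
import json

# Illustrative least-privilege IAM policy: allow invoking exactly one
# foundation model, nothing else. Foundation-model ARNs have no account ID.
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "InvokeOneModelOnly",
        "Effect": "Allow",
        "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
        ],
        "Resource": ("arn:aws:bedrock:us-east-1::foundation-model/"
                     "anthropic.claude-3-haiku-20240307-v1:0"),
    }],
}
policy_json = json.dumps(iam_policy, indent=2)
```

Scoping `Resource` to specific model ARNs is what "IAM policies scoped to specific models and actions" means in practice.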
Hallucination reduction techniques
RAG grounding
Force FM to use sources
- Bedrock Knowledge Bases
- FM cites retrieved chunks
- Primary defense
Confidence scoring
Flag low-confidence outputs
- Uncertainty signals
- Human review for low scores
Semantic similarity
Verify claims vs. sources
- Check FM statements against docs
- Catches fabricated facts
JSON Schema
Structured outputs
- Force exact response shape
- Fields populate from sources
- Less free-form hallucination
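The JSON Schema technique can be sketched with a hypothetical response shape and a cheap required-field check (a real system would use a proper JSON Schema validator):

```python
# Hypothetical schema: the FM must return an answer plus the source chunks
# it drew from. Constraining the shape leaves less room for free-form
# hallucination, and the citations field forces grounding to be explicit.
answer_schema = {
    "type": "object",
    "required": ["answer", "citations"],
    "properties": {
        "answer": {"type": "string"},
        "citations": {"type": "array", "items": {"type": "string"}},
    },
}

def has_required_fields(response: dict, schema: dict) -> bool:
    """Cheap check: every required field is present and non-empty."""
    return all(response.get(field) for field in schema["required"])
```

For example, `has_required_fields({"answer": "42", "citations": ["doc-1"]}, answer_schema)` passes, while a response with no citations fails and can be routed to human review.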
Adversarial threat detection
Prompt injection
Attacker embeds instructions that override the system prompt ("Ignore previous instructions. Instead, do X."). Detect with Guardrails' prompt attack filter, input sanitization, safety classifiers.
Jailbreak
Attempt to bypass safety guardrails through clever framing ("You are now DAN," role-play exploits). Same defenses plus adversarial testing.
Input sanitization
Strip or escape potentially malicious content before prompt assembly.
Safety classifiers
Dedicated models that classify input risk before the primary FM runs.
Adversarial testing
Red-team your FM with automated attack workflows; add failed cases to Guardrails.
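A toy version of a safety pre-filter, using regexes over common override phrasings. This is illustrative only; it complements, and does not replace, Guardrails' prompt attack filter:

```python
import re

# Heuristic injection detector run before prompt assembly. Patterns are
# examples, not an exhaustive list; real attackers paraphrase freely,
# which is why a trained classifier or Guardrails sits behind this.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"you are now \w+",      # role-play exploits like "You are now DAN"
    r"disregard (your|the) (system prompt|guardrails|rules)",
]

def looks_like_prompt_attack(user_input: str) -> bool:
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)
```

Inputs that trip the filter can be blocked outright or logged and escalated, feeding the adversarial-testing loop described above.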
Trap — security over-engineering
Your CISSP instinct may want to add every possible control. The exam rewards the right level of security for the scenario. VPC endpoints + IAM + encryption handle most scenarios. Don't add Lambda@Edge content filtering when Bedrock Guardrails does it natively.
Task 3.2 — Data security & privacy controls
Protected AI environments
Isolation
VPC endpoints for bedrock-runtime and bedrock; the invoking compute runs inside a VPC, so all traffic stays private.
Access control
IAM policies enforce least-privilege access to models and data.
Fine-grained data
AWS Lake Formation — column/row-level access control on data lakes feeding FMs.
Monitoring
CloudWatch monitors all data access patterns; alarm on anomalies.
Privacy-preserving systems — the PII flow
Discover
Amazon Macie
Scan S3 for sensitive data
Detect
Comprehend PII
Entity recognition in input
Filter
Guardrails sensitive info
Block or redact
Model
Bedrock FM
Processes clean data
Retain
S3 Lifecycle
Auto-delete after retention period
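The Detect → Filter steps hinge on offset-based redaction. A sketch in the shape of Comprehend's `detect_pii_entities` output (the entity list is hard-coded here; a real flow would call `comprehend.detect_pii_entities(Text=text, LanguageCode="en")` and use its `Entities`):

```python
# Replace each detected span with its entity type. Working right-to-left
# keeps earlier BeginOffset values valid as the string shrinks or grows.
def redact(text: str, entities: list[dict]) -> str:
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

text = "Email jane@example.com about order 1234."
# Hard-coded stand-in for a Comprehend response entity
entities = [{"Type": "EMAIL", "BeginOffset": 6, "EndOffset": 22}]
clean = redact(text, entities)   # "Email [EMAIL] about order 1234."
```

Only `clean` reaches the FM in step 4, which is the point of the flow: the model never sees raw PII.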
Bedrock data privacy by default
AWS does not use your Bedrock data to train base models. Your data stays in your account. Encrypted at rest (KMS) and in transit (TLS 1.2+). This is a frequent exam fact.
Anonymization strategies
Data masking
Replace with realistic fake data
- "John Smith" → "Alex Johnson"
- Preserves data shape for testing
Comprehend PII detection
Find & tag entities
- SSN, CC, email, phone
- Pre-built PII entity types
Anonymization
Irreversible removal
- Strip identifiers permanently
- For data that doesn't need re-linking
Pseudonymization
Reversible with key
- Replace IDs with tokens
- Mapping table controls re-identification
- GDPR-friendly
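The difference between anonymization and pseudonymization is the mapping table. A minimal pseudonymizer sketch (token format is arbitrary):

```python
import uuid

# Pseudonymization: identifiers become random tokens, and a mapping table,
# kept under separate access control, allows re-identification. Deleting
# the table converts this to anonymization (irreversible).
class Pseudonymizer:
    def __init__(self):
        self._forward: dict[str, str] = {}   # identifier -> token
        self._reverse: dict[str, str] = {}   # token -> identifier

    def tokenize(self, identifier: str) -> str:
        if identifier not in self._forward:
            token = f"user-{uuid.uuid4().hex[:8]}"
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str) -> str:
        return self._reverse[token]
```

The same identifier always maps to the same token, so pseudonymized records stay joinable, which is what makes this GDPR-friendly for analytics.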
Task 3.3 — Governance & compliance mechanisms
Compliance frameworks
SageMaker Model Cards
Programmatic documentation of model purpose, limitations, performance metrics, intended use. Required for governance and audit trails.
Glue Data Lineage
Automatically track where data came from, how it was transformed, where it went. Essential for provenance questions.
Metadata tagging
Systematic source attribution in FM-generated content. Tag outputs with which source documents informed them.
CloudWatch Logs
Comprehensive decision logs for audit. Query with Logs Insights.
Data source tracking for traceability
Register
Glue Data Catalog
Central metadata
Tag
Source attribution
Which doc informed output
Log
CloudTrail
Who · what · when
Invoke logs
Model Invocation Logs
Full request/response
Investigate
Logs Insights
Query the audit trail
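The Investigate step might look like this Logs Insights query (shown as a Python string; field names such as `modelId` and `identity.arn` reflect the invocation-log JSON but should be verified against your own log group):

```python
# CloudWatch Logs Insights query over Bedrock Model Invocation Logs:
# count invocations of Claude models per caller identity.
INVOCATION_AUDIT_QUERY = """
fields @timestamp, modelId, identity.arn
| filter modelId like /claude/
| stats count(*) as invocations by identity.arn
| sort invocations desc
""".strip()
```

Run against the invocation-log group, this answers the "who · what · when" question in one query.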
Continuous monitoring & advanced governance
Misuse detection
Automated anomaly detection
- Unusual usage patterns
- Policy violations
Drift monitoring
Model behavior changes
- Output distribution shifts
- Quality degradation alerts
Bias drift
Fairness over time
- Track demographic disparities
- Alert on widening gaps
Token redaction
Log-level PII protection
- Redact sensitive fields before logging
- Auditable but privacy-safe
Task 3.4 — Responsible AI principles
Transparency
Reasoning
Reasoning displays — show users how the AI arrived at its answer.
Confidence
CloudWatch confidence metrics — quantify and display uncertainty.
Sources
Evidence presentation — citations linking claims to source documents (built into Knowledge Bases).
Traces
Bedrock Agent Tracing — reasoning traces showing the agent's thought process, tool calls, decision points. Essential for agent debugging and explainability.
Fairness evaluations
CloudWatch fairness metrics
Pre-defined metrics track model performance across demographic groups.
Systematic A/B testing
Bedrock Prompt Management + Prompt Flows for comparing outputs across groups; identify disparate impact.
LLM-as-a-Judge
Use a second FM to evaluate the primary FM's outputs for bias. Bedrock supports this via Model Evaluations automated evaluation jobs.
SageMaker Clarify
Bias detection and model explainability (disparate impact, demographic parity, SHAP).
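One of the disparate-impact style metrics Clarify reports, demographic parity difference, is simple enough to compute by hand (group names and predictions below are illustrative):

```python
# Demographic parity difference: the gap between the highest and lowest
# positive-outcome rates across demographic groups. 0.0 means parity;
# a widening value over time is the "bias drift" alert condition.
def demographic_parity_difference(outcomes: dict[str, list[int]]) -> float:
    """outcomes maps group name -> list of 0/1 model predictions."""
    rates = [sum(preds) / len(preds) for preds in outcomes.values()]
    return max(rates) - min(rates)
```

For example, groups with positive rates of 0.50 and 0.25 yield a difference of 0.25, which a monitoring alarm could flag for review.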
Policy-compliant AI systems
- Bedrock Guardrails configured to policy — denied topics match policy rules, word filters match forbidden language
- Model cards document limitations — what the FM should and shouldn't be used for
- Lambda compliance checks — automated verification against policy rules; flag or block violations
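A sketch of the third item, a Lambda compliance check; the event shape and the substring-based policy rules are illustrative stand-ins for real policy logic:

```python
# Post-processing Lambda: verify FM output against policy rules before it
# reaches the user. Real rules would be richer than substring bans.
BANNED_PHRASES = ["guaranteed returns", "medical diagnosis"]

def lambda_handler(event, context):
    output = event["model_output"]          # assumed event field
    text = output.lower()
    violations = [p for p in BANNED_PHRASES if p in text]
    return {"allowed": not violations, "violations": violations}
```

Violations can be blocked, or merely flagged and logged, depending on how strict the policy is.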
Exam angle
When a question asks about "explainability" or "showing how the agent reached a conclusion," the answer is Bedrock Agent Tracing. When it's "bias detection" or "demographic fairness," the answer includes SageMaker Clarify or LLM-as-a-Judge via Bedrock Model Evaluations.
Domain 3 summary — what to remember
The service map
- Guardrails — Bedrock Guardrails (6 filter types)
- PII — Comprehend detection + Macie discovery
- Network — VPC endpoints / PrivateLink for Bedrock
- Identity — IAM (least privilege) + Cognito
- Data — Lake Formation + KMS encryption
- Audit — CloudTrail + Model Invocation Logs
- Docs — SageMaker Model Cards (programmatic)
- Lineage — Glue Data Lineage + Data Catalog
- Bias — SageMaker Clarify · LLM-as-a-Judge
- Traces — Bedrock Agent Tracing
The mental shortcuts
- PII in user input? Comprehend + Guardrails sensitive info.
- Audit of model invocations? Model Invocation Logs.
- Off public internet? VPC endpoints (PrivateLink).
- Grounded RAG responses? Guardrails contextual grounding.
- Prompt injection defense? Guardrails prompt attack filter.
- Bias detection? SageMaker Clarify.
- Agent explainability? Bedrock Agent Tracing.
- Model documentation? SageMaker Model Cards.
Next up
Continue to Domain 4 — Operational Efficiency & Optimization (12%). Or see the Defense-in-Depth pattern fully diagrammed.