Every term, acronym, and concept the AIF-C01 exam can throw at you.
Foundation Model (FM): A large AI model pre-trained on massive datasets that can be adapted to a wide range of tasks. Examples: Claude, GPT, Llama, Amazon Titan. The exam focuses on using FMs, not building them.
Large Language Model (LLM): A type of FM specialized for text tasks. A subset of FMs.
Token: The basic unit of text that FMs process. Roughly 4 English characters = 1 token. Cost is billed per input/output token.
Context window: The maximum number of tokens a model can process in one request (input + output combined). Exceeding it causes truncation or errors.
Inference: Sending input to an FM and getting a response. Each inference consumes (and is billed for) tokens.
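The four-characters-per-token rule above yields a quick cost estimator; a minimal sketch (real tokenizers vary by model and language, so use the provider's tokenizer for billing-accurate counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~4 English characters per token
    rule of thumb. Not billing-accurate; tokenizers differ by model."""
    return max(1, round(len(text) / 4))

estimate_tokens("The quick brown fox jumps over the lazy dog.")  # 44 chars -> 11
```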
Fine-tuning: Further training a pre-trained FM on domain-specific data. More expensive than prompt engineering; use it when prompting alone can't achieve your goals.
PEFT (parameter-efficient fine-tuning): Trains a small set of additional parameters instead of the full model. Cheaper and faster than full fine-tuning.
Adapters: Small trainable modules inserted into a pre-trained model. Similar concept to LoRA; both are forms of parameter-efficient fine-tuning.
Prompt engineering: Crafting inputs to FMs to get desired outputs without modifying the model. Cheaper and faster than fine-tuning.
System prompt: Instructions given to the FM that define its role, behavior, and constraints. Persists across the conversation and is invisible to the end user.
Temperature: Controls randomness. 0 = deterministic; higher values = more random. Use low values for factual tasks, high for creative ones.
Top-k: Limits the model to choosing among the k most probable tokens. Lower k = more focused output.
Top-p (nucleus sampling): Limits the model to the smallest set of tokens whose cumulative probability reaches p. Alternative to top-k.
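On real FM APIs, temperature, top-k, and top-p are request parameters, not client code. This toy sampler (a sketch over a hand-made candidate distribution, not any provider's implementation) shows how the three knobs interact:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Toy next-token sampler. `logits` maps candidate tokens to raw
    scores; real FMs expose these knobs as API request parameters."""
    if temperature == 0:
        # Greedy decoding: always the single most likely token.
        return max(logits, key=logits.get)
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    m = max(logits.values())
    probs = {t: math.exp((s - m) / temperature) for t, s in logits.items()}
    total = sum(probs.values())
    ranked = sorted(((t, p / total) for t, p in probs.items()),
                    key=lambda kv: -kv[1])
    if top_k is not None:
        ranked = ranked[:top_k]          # keep only the k most probable
    if top_p is not None:
        kept, cum = [], 0.0
        for t, p in ranked:              # smallest prefix covering mass p
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept
    tokens, weights = zip(*ranked)
    return random.choices(tokens, weights=weights)[0]
```

With `temperature=0` the call is deterministic; `top_k=1` and a very small `top_p` likewise collapse the choice to the single most probable token.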
Hallucination: When an FM generates plausible but factually incorrect content. RAG and grounding techniques reduce it.
Grounding: Anchoring FM responses in verifiable source data (via RAG, fact-checking). The primary countermeasure to hallucination.
Multimodal model: A model that can process multiple data types (text + images, text + audio).
Retrieval-Augmented Generation (RAG): Architecture: user question → retrieve relevant docs from a knowledge base → retrieved docs + question → FM → grounded answer.
Embedding: A numerical vector representation of data. Similar content → similar embeddings. Used for semantic search.
Vector database: A database optimized for storing and querying high-dimensional vectors. Enables semantic similarity search.
Semantic search: Finding results based on meaning rather than exact keyword matching.
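Semantic search over embeddings reduces to nearest-neighbor lookup under a similarity metric, usually cosine similarity. A brute-force sketch with hand-made 2-D "embeddings" (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, docs, k=1):
    """docs: list of (text, embedding) pairs. Returns the k most similar."""
    return sorted(docs, key=lambda d: -cosine(query_vec, d[1]))[:k]

docs = [("Cats are small felines", [0.9, 0.1]),
        ("Quarterly revenue grew 8%", [0.1, 0.9])]
semantic_search([0.8, 0.2], docs)  # the cat sentence ranks first
```

Vector databases do the same ranking, but with approximate indexes (e.g. HNSW) instead of a full scan.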
Hybrid search: Combining traditional keyword search (BM25) with semantic vector search. Often outperforms either alone.
Chunking: Breaking large documents into smaller pieces for embedding and retrieval. Strategies: fixed-size, hierarchical, semantic.
Reranking: Second-stage retrieval. After the initial search returns N results, a reranker model reorders them by relevance. Improves precision.
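Fixed-size chunking with overlap is the simplest of the strategies above; the overlap keeps content that straddles a chunk boundary intact in at least one chunk. A sketch (sizes are illustrative, not recommendations):

```python
def chunk_fixed(text: str, size: int = 500, overlap: int = 100):
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters so boundary content isn't lost."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, i = [], 0
    while i < len(text):
        chunks.append(text[i:i + size])
        i += size - overlap
    return chunks

chunk_fixed("abcdefghij", size=4, overlap=2)
# -> ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```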
k-Nearest Neighbors (k-NN): An algorithm that finds the k most similar vectors to a query vector. The core of vector search.
HNSW (Hierarchical Navigable Small World): An efficient approximate-nearest-neighbor algorithm used by OpenSearch. Tuning params: ef_search (query-time accuracy/speed trade-off) and ef_construction (index build quality).
Dimensionality: The number of dimensions in an embedding vector. Higher = more information, but slower search and more storage.
Agent: An AI system that can autonomously plan, reason, use tools, and take actions. Loops: observe → think → act → observe.
ReAct: An agent pattern where the model alternates between reasoning and taking actions (tool calls).
Tool calling (function calling): The FM generates a structured request to call an external function; the tool result is fed back to the FM.
Model Context Protocol (MCP): An open standard for connecting AI agents to tools and data sources. MCP servers expose tools; MCP clients (agents) consume them.
Multi-agent collaboration: Multiple specialized agents working together. One researches, another writes, another reviews.
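The client side of tool calling is a dispatcher: the FM emits a structured request naming a function and its arguments, the client runs it, and the string result goes back to the model as the next message. A sketch with a hypothetical tool registry (the request shape here is illustrative, not any specific provider's schema):

```python
import json

# Hypothetical tool registry; in Bedrock Agents this role is played by an
# action group backed by Lambda.
TOOLS = {
    "get_weather": lambda city: f"22 degrees and sunny in {city}",
}

def handle_tool_call(request_json: str) -> str:
    """Run a model-generated tool request and return the result string
    that would be fed back to the FM as the tool's output."""
    request = json.loads(request_json)
    return TOOLS[request["name"]](**request["arguments"])

handle_tool_call('{"name": "get_weather", "arguments": {"city": "Lisbon"}}')
```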
Action group: In Bedrock Agents, a set of related actions/tools defined with OpenAPI schemas and backed by Lambda.
Human-in-the-loop (HITL): A design pattern where an agent pauses for human review/approval before high-stakes actions. Can be implemented via the Step Functions callback pattern.
Circuit breaker: A pattern that detects repeated failures and temporarily stops calling the failing service. Prevents cascading failures.
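A minimal circuit breaker sketch: after a run of consecutive failures the circuit "opens" and calls are rejected outright until a cool-down elapses, at which point one trial call is allowed through (thresholds here are illustrative):

```python
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive failures and
    reject calls until `reset_after` seconds have passed."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None   # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0           # any success resets the count
        return result
```

Wrapping FM invocations this way stops an agent from hammering a throttled or failing model endpoint.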
Prompt injection: An attack where malicious input tricks the FM into ignoring its system prompt and following the attacker's instructions.
Jailbreak: A technique to bypass an FM's safety guardrails, tricking the model into generating content it was designed to refuse.
Guardrails: Configurable safety controls that filter FM inputs/outputs. Bedrock Guardrails support six categories: topic denial, content filters, word filters, sensitive-information (PII) filters, contextual grounding checks, and prompt-attack detection.
Model card: Documentation describing a model's purpose, capabilities, limitations, training data, evaluation results, and intended use. SageMaker supports programmatic model cards.
Data lineage: The complete history of data: where it came from, how it was transformed, where it went. Important for compliance and debugging.
Source attribution: Linking FM-generated content back to source documents. Critical for trust and compliance.
LLM-as-a-judge: Using one FM to evaluate another FM's outputs. A scalable alternative to human evaluation. Bedrock supports this natively.
Provisioned Throughput: Pre-purchased capacity in Bedrock, measured in model units. Guarantees consistent performance. Required for custom models.
On-Demand: Pay-per-token with no reserved capacity. Cost-effective for variable or low-volume workloads, but subject to throttling.
Cross-region inference: A Bedrock feature that automatically routes requests to models in other Regions when the primary Region is at capacity. Improves availability.
Model cascading: Try a cheaper/faster model first; escalate to a more expensive model only when needed. Reduces average cost.
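Cascading is "cheap first, escalate on failure". A sketch where models are plain callables and `good_enough` is whatever quality check you trust (a heuristic, a schema validator, or an LLM-as-a-judge call); all names here are hypothetical:

```python
def cascade(prompt, models, good_enough):
    """Call models from cheapest to most capable; return the first
    answer that passes the quality check, else the last answer."""
    answer = None
    for model in models:
        answer = model(prompt)
        if good_enough(answer):
            break
    return answer

cheap = lambda p: ""     # stand-in for a small, fast model that fails here
strong = lambda p: "42"  # stand-in for a large, expensive model
cascade("answer?", [cheap, strong], good_enough=bool)  # escalates, returns "42"
```

Average cost drops because most requests never reach the expensive model.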
Prompt caching: Caching processed system prompts and long context prefixes across requests. Big wins when many requests share the same prefix (e.g., chatbots).
Semantic caching: Caching responses for semantically similar queries. The same question in different words → cached answer.
Batch inference: Processing many requests together as a single asynchronous job. Roughly a 50% discount in Bedrock for non-real-time workloads.
Streaming: Delivering FM response tokens incrementally as they are generated. Improves perceived latency.
Model drift: Gradual change in model behavior over time. Detected via golden-dataset comparison.
Golden dataset: A curated set of questions with known correct answers, used to benchmark FM performance over time. Detects regression and drift.
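A golden-dataset check is a scored replay: run the current system over the dataset and fail when accuracy drops below a threshold. The sketch below grades by exact match for simplicity; in practice the grader is often an LLM-as-a-judge:

```python
def regression_check(golden, answer_fn, min_accuracy=0.9):
    """golden: list of (question, expected_answer) pairs.
    Returns (accuracy, passed) so a CI job can fail on drift."""
    correct = sum(1 for q, expected in golden if answer_fn(q) == expected)
    accuracy = correct / len(golden)
    return accuracy, accuracy >= min_accuracy
```

Running this on a schedule (or on every prompt/model change) turns drift from a surprise into a failed check.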