~ where should the model actually run? ~
START HERE Is the model you want available in Bedrock (Claude, Nova, Llama)? YES NO Steady high-volume traffic with tight latency SLAs? Popular open model from HF / Jumpstart catalog? YES NO YES NO Bedrock On-Demand pay per token, 90% of cases Bedrock Provisioned Reserved capacity, predictable cost/latency. Commit 1-6 months. SageMaker JumpStart Pre-packaged open models deployed to SageMaker endpoints Custom model, custom container, or fine-tuned from scratch? YES SageMaker Real-Time Endpoint full control, bring your own container / weights 💡 default answer: Bedrock On-Demand unless a constraint forces otherwise.

Why these four options

Bedrock On-Demand Pay per input/output token. No infrastructure to manage. Elastic — scales from 0 to high concurrency automatically. The default answer for almost every exam question unless a specific requirement forces you elsewhere.
Bedrock Provisioned Throughput Reserved capacity for a committed duration (1-month or 6-month). Guaranteed latency, flat cost. Required for fine-tuned Bedrock models (they can only be invoked via provisioned throughput). Use when traffic is predictable and high, or when you fine-tuned a Bedrock model.
SageMaker JumpStart A catalog of pre-packaged open-source and proprietary models (Llama, Mistral, Falcon, Stable Diffusion, etc.) you can deploy to a SageMaker endpoint in a few clicks. Use when the model you want isn't in Bedrock but exists in JumpStart. You manage the endpoint.
SageMaker Real-Time Endpoint Full control — bring your own container, your own fine-tuned weights, your own inference code. Highest operational burden but maximum flexibility. Required for truly custom models or specialized inference stacks.
Exam angle — three tells (1) "fine-tuned Bedrock model" → Provisioned Throughput is mandatory, not optional. (2) "open-source Llama / Mistral / Stable Diffusion" and stem says "deploy quickly" → JumpStart. (3) "bring your own model / custom container / proprietary weights" → SageMaker Real-Time Endpoint. If none of those apply, the answer is Bedrock On-Demand.
The "cost optimization" distractor Distractors often pitch Provisioned Throughput as a "cost optimization." It's only cheaper at high predictable volume. If traffic is bursty or low, on-demand wins on cost. Match the pattern.

Related trees

Tree 1: Vector Store Selection · Tree 3: Cost Optimization · Tree 4: RAG Troubleshooting