The exam loves this one. Bedrock On-Demand, Bedrock Provisioned Throughput, SageMaker JumpStart, or SageMaker Endpoint? The right answer depends on model choice, traffic pattern, and operational appetite.
~ where should the model actually run? ~
Why these four options
Bedrock On-Demand
Pay per input/output token, with no infrastructure to manage. Elastic: it scales automatically from zero, subject to per-account request and token quotas. The default answer for almost every exam question unless a specific requirement forces you elsewhere.
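A minimal sketch of what on-demand looks like in code, using the Bedrock Runtime Converse API. The model ID is one example from the Bedrock catalog; the actual network call is left commented out since it needs AWS credentials and bills per token.

```python
# Hedged sketch: on-demand Bedrock inference via the Converse API.
# No capacity to reserve, no endpoint to manage: you just call the model.

def build_converse_request(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build the keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    "Summarize our Q3 report in one sentence.",
)

# Uncomment with boto3 installed and credentials/region configured:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```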
Bedrock Provisioned Throughput
Reserved capacity billed hourly, with optional 1-month or 6-month commitment terms for a discount. Guaranteed throughput, consistent latency, flat cost. Required for fine-tuned Bedrock models (they can only be invoked via provisioned throughput). Use when traffic is high and predictable, or when you fine-tuned a Bedrock model.
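The invocation-side difference is small but exam-relevant: with Provisioned Throughput you pass the provisioned model ARN where the base model ID would go. The ARN below is a made-up placeholder; a real one comes back from purchasing capacity (console or `create_provisioned_model_throughput`).

```python
# Hedged sketch: invoking a fine-tuned Bedrock model through Provisioned
# Throughput. Same runtime API as on-demand; only the identifier changes.

def provisioned_invoke_kwargs(provisioned_model_arn: str, prompt: str) -> dict:
    """The provisioned model ARN replaces the base model ID in converse()."""
    return {
        "modelId": provisioned_model_arn,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

kwargs = provisioned_invoke_kwargs(
    "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/example",  # placeholder ARN
    "Classify this support ticket.",
)

# Uncomment with boto3, credentials, and a real provisioned model:
# import boto3
# response = boto3.client("bedrock-runtime").converse(**kwargs)
```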
SageMaker JumpStart
A catalog of pre-packaged open-source and proprietary models (Llama, Mistral, Falcon, Stable Diffusion, etc.) you can deploy to a SageMaker endpoint in a few clicks. Use when the model you want isn't in Bedrock but exists in JumpStart. You manage the endpoint.
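"A few clicks" translates to a few lines with the SageMaker Python SDK. The model ID is one example from the JumpStart catalog and the instance type is an assumption; the deploy call is commented out because it creates a billed endpoint.

```python
# Hedged sketch: deploying a JumpStart catalog model to a SageMaker endpoint.
# You pick the model; SageMaker packages it, but you own the endpoint.

model_id = "meta-textgeneration-llama-3-8b"  # example JumpStart model ID
endpoint_config = {
    "instance_type": "ml.g5.2xlarge",  # example GPU instance; size to the model
    "initial_instance_count": 1,
}

# Uncomment with the sagemaker SDK installed and an execution role configured:
# from sagemaker.jumpstart.model import JumpStartModel
# model = JumpStartModel(model_id=model_id)
# predictor = model.deploy(**endpoint_config)
# print(predictor.predict({"inputs": "Hello"}))
```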
SageMaker Real-Time Endpoint
Full control — bring your own container, your own fine-tuned weights, your own inference code. Highest operational burden but maximum flexibility. Required for truly custom models or specialized inference stacks.
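For contrast, the fully custom path: you supply the container image and the weights. Every URI below is a placeholder, and the deploy call is again left commented out.

```python
# Hedged sketch: a custom SageMaker real-time endpoint with your own
# container and fine-tuned weights. Maximum flexibility, maximum ops burden.

model_spec = {
    "image_uri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest",  # placeholder ECR image
    "model_data": "s3://my-bucket/fine-tuned-weights/model.tar.gz",  # placeholder weights
}

# Uncomment with the sagemaker SDK installed and an execution role:
# import sagemaker
# from sagemaker.model import Model
# model = Model(
#     image_uri=model_spec["image_uri"],
#     model_data=model_spec["model_data"],
#     role=sagemaker.get_execution_role(),
# )
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
```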
Exam angle — three tells
(1) "fine-tuned Bedrock model" → Provisioned Throughput is mandatory, not optional. (2) "open-source Llama / Mistral / Stable Diffusion" and stem says "deploy quickly" → JumpStart. (3) "bring your own model / custom container / proprietary weights" → SageMaker Real-Time Endpoint. If none of those apply, the answer is Bedrock On-Demand.
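The three tells above reduce to a short decision function. The flag names are my own shorthand, not AWS terminology, and the priority order mirrors the tells: the fine-tuned-Bedrock requirement is checked first because it is mandatory, not a preference.

```python
# Toy encoding of the exam decision logic. Check the hard requirement first,
# then the quick-deploy tell, then the custom-stack tell; default to on-demand.

def pick_deployment(fine_tuned_bedrock: bool = False,
                    open_source_quick_deploy: bool = False,
                    custom_container: bool = False) -> str:
    if fine_tuned_bedrock:
        return "Bedrock Provisioned Throughput"  # mandatory for fine-tuned Bedrock models
    if open_source_quick_deploy:
        return "SageMaker JumpStart"             # catalog model, deployed in a few clicks
    if custom_container:
        return "SageMaker Real-Time Endpoint"    # bring your own container/weights
    return "Bedrock On-Demand"                   # the default answer

print(pick_deployment())                         # Bedrock On-Demand
print(pick_deployment(fine_tuned_bedrock=True))  # Bedrock Provisioned Throughput
```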
The "cost optimization" distractor
Distractors often pitch Provisioned Throughput as a "cost optimization." It's only cheaper at high predictable volume. If traffic is bursty or low, on-demand wins on cost. Match the pattern.
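A quick break-even calculation shows why. The prices are hypothetical round numbers, not real Bedrock rates; the shape of the argument is what the exam is testing.

```python
# Toy break-even math for the cost distractor, with made-up prices.

PROVISIONED_MONTHLY = 10_000.0   # hypothetical flat monthly cost for one model unit
ON_DEMAND_PER_1K_TOKENS = 0.01   # hypothetical blended per-1K-token price

def on_demand_cost(tokens_per_month: float) -> float:
    return tokens_per_month / 1000 * ON_DEMAND_PER_1K_TOKENS

# Provisioned only wins past the crossover volume:
break_even_tokens = PROVISIONED_MONTHLY / ON_DEMAND_PER_1K_TOKENS * 1000

print(on_demand_cost(50_000_000))   # low traffic: $500/month vs $10,000 flat
print(f"{break_even_tokens:,.0f}")  # 1,000,000,000 tokens/month to break even
```

With these made-up prices, provisioned capacity is a "cost optimization" only above roughly a billion tokens a month of steady traffic; below that, on-demand wins.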