The exam loves this one. Bedrock On-Demand, Bedrock Provisioned Throughput, SageMaker JumpStart, or SageMaker Endpoint? The right answer depends on model choice, traffic pattern, and operational appetite.
~ where should the model actually run? ~
Why these four options
Bedrock On-Demand
Pay per input/output token, with no infrastructure to manage. Elastic: it scales automatically from zero, subject to per-account request and token quotas. The default answer for almost every exam question unless a specific requirement forces you elsewhere.
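A minimal sketch of what on-demand looks like in code, using the Bedrock Runtime Converse API. The model ID is one example from the Bedrock catalog; the actual network call is left commented out since it needs AWS credentials and bills per token.

```python
# Hedged sketch: on-demand Bedrock inference via the Converse API.
# No capacity to reserve, no endpoint to manage: you just call the model.

def build_converse_request(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build the keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    "Summarize our Q3 report in one sentence.",
)

# Uncomment with boto3 installed and credentials/region configured:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```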
Bedrock Provisioned Throughput
Reserved capacity billed hourly, with optional 1-month or 6-month commitment terms for a discount. Guaranteed throughput, consistent latency, flat cost. Required for fine-tuned Bedrock models (they can only be invoked via provisioned throughput). Use when traffic is high and predictable, or when you fine-tuned a Bedrock model.
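The invocation-side difference is small but exam-relevant: with Provisioned Throughput you pass the provisioned model ARN where the base model ID would go. The ARN below is a made-up placeholder; a real one comes back from purchasing capacity (console or `create_provisioned_model_throughput`).

```python
# Hedged sketch: invoking a fine-tuned Bedrock model through Provisioned
# Throughput. Same runtime API as on-demand; only the identifier changes.

def provisioned_invoke_kwargs(provisioned_model_arn: str, prompt: str) -> dict:
    """The provisioned model ARN replaces the base model ID in converse()."""
    return {
        "modelId": provisioned_model_arn,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

kwargs = provisioned_invoke_kwargs(
    "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/example",  # placeholder ARN
    "Classify this support ticket.",
)

# Uncomment with boto3, credentials, and a real provisioned model:
# import boto3
# response = boto3.client("bedrock-runtime").converse(**kwargs)
```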
SageMaker JumpStart
A catalog of pre-packaged open-source and proprietary models (Llama, Mistral, Falcon, Stable Diffusion, etc.) you can deploy to a SageMaker endpoint in a few clicks. Use when the model you want isn't in Bedrock but exists in JumpStart. You manage the endpoint.
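"A few clicks" translates to a few lines with the SageMaker Python SDK. The model ID is one example from the JumpStart catalog and the instance type is an assumption; the deploy call is commented out because it creates a billed endpoint.

```python
# Hedged sketch: deploying a JumpStart catalog model to a SageMaker endpoint.
# You pick the model; SageMaker packages it, but you own the endpoint.

model_id = "meta-textgeneration-llama-3-8b"  # example JumpStart model ID
endpoint_config = {
    "instance_type": "ml.g5.2xlarge",  # example GPU instance; size to the model
    "initial_instance_count": 1,
}

# Uncomment with the sagemaker SDK installed and an execution role configured:
# from sagemaker.jumpstart.model import JumpStartModel
# model = JumpStartModel(model_id=model_id)
# predictor = model.deploy(**endpoint_config)
# print(predictor.predict({"inputs": "Hello"}))
```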
SageMaker Real-Time Endpoint
Full control — bring your own container, your own fine-tuned weights, your own inference code. Highest operational burden but maximum flexibility. Required for truly custom models or specialized inference stacks.
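For contrast, the fully custom path: you supply the container image and the weights. Every URI below is a placeholder, and the deploy call is again left commented out.

```python
# Hedged sketch: a custom SageMaker real-time endpoint with your own
# container and fine-tuned weights. Maximum flexibility, maximum ops burden.

model_spec = {
    "image_uri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest",  # placeholder ECR image
    "model_data": "s3://my-bucket/fine-tuned-weights/model.tar.gz",  # placeholder weights
}

# Uncomment with the sagemaker SDK installed and an execution role:
# import sagemaker
# from sagemaker.model import Model
# model = Model(
#     image_uri=model_spec["image_uri"],
#     model_data=model_spec["model_data"],
#     role=sagemaker.get_execution_role(),
# )
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
```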
Exam angle — three tells
(1) "fine-tuned Bedrock model" → Provisioned Throughput is mandatory, not optional. (2) "open-source Llama / Mistral / Stable Diffusion" and stem says "deploy quickly" → JumpStart. (3) "bring your own model / custom container / proprietary weights" → SageMaker Real-Time Endpoint. If none of those apply, the answer is Bedrock On-Demand.
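The three tells above reduce to a short decision function. The flag names are my own shorthand, not AWS terminology, and the priority order mirrors the tells: the fine-tuned-Bedrock requirement is checked first because it is mandatory, not a preference.

```python
# Toy encoding of the exam decision logic. Check the hard requirement first,
# then the quick-deploy tell, then the custom-stack tell; default to on-demand.

def pick_deployment(fine_tuned_bedrock: bool = False,
                    open_source_quick_deploy: bool = False,
                    custom_container: bool = False) -> str:
    if fine_tuned_bedrock:
        return "Bedrock Provisioned Throughput"  # mandatory for fine-tuned Bedrock models
    if open_source_quick_deploy:
        return "SageMaker JumpStart"             # catalog model, deployed in a few clicks
    if custom_container:
        return "SageMaker Real-Time Endpoint"    # bring your own container/weights
    return "Bedrock On-Demand"                   # the default answer

print(pick_deployment())                         # Bedrock On-Demand
print(pick_deployment(fine_tuned_bedrock=True))  # Bedrock Provisioned Throughput
```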
The "cost optimization" distractor
Distractors often pitch Provisioned Throughput as a "cost optimization." It's only cheaper at high predictable volume. If traffic is bursty or low, on-demand wins on cost. Match the pattern.
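A quick break-even calculation shows why. The prices are hypothetical round numbers, not real Bedrock rates; the shape of the argument is what the exam is testing.

```python
# Toy break-even math for the cost distractor, with made-up prices.

PROVISIONED_MONTHLY = 10_000.0   # hypothetical flat monthly cost for one model unit
ON_DEMAND_PER_1K_TOKENS = 0.01   # hypothetical blended per-1K-token price

def on_demand_cost(tokens_per_month: float) -> float:
    return tokens_per_month / 1000 * ON_DEMAND_PER_1K_TOKENS

# Provisioned only wins past the crossover volume:
break_even_tokens = PROVISIONED_MONTHLY / ON_DEMAND_PER_1K_TOKENS * 1000

print(on_demand_cost(50_000_000))   # low traffic: $500/month vs $10,000 flat
print(f"{break_even_tokens:,.0f}")  # 1,000,000,000 tokens/month to break even
```

With these made-up prices, provisioned capacity is a "cost optimization" only above roughly a billion tokens a month of steady traffic; below that, on-demand wins.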