Quick answer. Use an API for most cases: it's simpler, faster to ship, and cheap at low or spiky volume. Self-host an open-weight model when you need data control, need to keep data out of a foreign jurisdiction (e.g. running a Chinese model under GDPR), or want predictable cost at high steady volume.
The three trade-offs
| | API (hosted) | Self-hosted (open weights) |
|---|---|---|
| Cost shape | Pay per token; scales with use | Fixed infra cost; cheaper at high volume |
| Setup | Minutes | Engineering project |
| Data control | Provider's jurisdiction | Entirely yours |
| Compliance | Depends on provider/tier | You control residency, logs, versioning |
| Maintenance | None | Ongoing |
| Best for | Most businesses | Regulated, high-volume, sovereignty-critical |
Cost: when self-hosting actually wins
Self-hosting has real fixed costs — GPUs (owned or rented), setup and maintenance. It beats per-token API pricing only once volume is high and steady enough to amortise them. At low or bursty volume, the API almost always wins. Model your own break-even with the token cost calculator before assuming "self-hosted = cheaper".
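As a rough illustration of that break-even, here is a minimal sketch. It assumes self-hosting is a flat monthly cost (GPUs plus maintenance) with no per-token component, and that the API charges a single blended price per million tokens. The function names and all figures are hypothetical placeholders, not real prices.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """API spend for one month at a single blended per-token price."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly token volume at which API spend equals the fixed self-hosting cost."""
    if price_per_million <= 0:
        raise ValueError("price_per_million must be positive")
    return fixed_monthly_cost / price_per_million * 1_000_000


def cheaper_option(tokens_per_month: float,
                   fixed_monthly_cost: float,
                   price_per_million: float) -> str:
    """Compare the two cost shapes for a given steady monthly volume."""
    api = monthly_api_cost(tokens_per_month, price_per_million)
    return "API" if api <= fixed_monthly_cost else "Self-host"


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 per month of infrastructure, 1.00 per million tokens.
    fixed, price = 2000.0, 1.0
    print(f"Break-even volume: {break_even_tokens(fixed, price):,.0f} tokens/month")
    for volume in (500_000_000, 5_000_000_000):
        print(f"{volume:,} tokens/month -> {cheaper_option(volume, fixed, price)}")
```

The sketch only covers steady volume. Bursty traffic pushes the answer further towards the API, because self-hosted capacity has to be sized for the peak while API spend follows actual use.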
Complexity: the honest part
Running a model means open-weight files, sufficient GPU compute, an inference stack, and the engineering to deploy, secure, monitor and update it. This is the catch behind cheap Chinese models: the price is low, but safe deployment is a real project. The cheapest option is the most complex to run well.
Compliance: the reason it's worth it
Self-hosting is the mitigation that makes otherwise-risky models viable. Run DeepSeek, Qwen, GLM, MiniMax or Llama on your own infrastructure and the data never leaves your control: no Chinese-jurisdiction exposure, full audit logs and residency, and, provided the infrastructure itself is not operated by a US-controlled provider, no CLOUD Act question. For HIPAA, GDPR and financial-services work, this is often the only practical way to use these models. See the China risk assessment.
Decision matrix
| If you… | Choose |
|---|---|
| Want to ship fast with little ops | API |
| Have low or unpredictable volume | API |
| Must keep data in your control | Self-host |
| Run very high steady volume | Self-host (better unit cost) |
| Want a cheap Chinese model under GDPR | Self-host |
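The matrix above can be read as a small rule set. The sketch below is one possible encoding, not a definitive policy. The `Workload` fields and the rule ordering are assumptions: data-control and GDPR requirements are checked first because they are constraints rather than preferences, and API is the default because it fits most cases.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical flags mirroring the self-host rows of the decision matrix."""
    must_keep_data_in_control: bool = False
    chinese_model_under_gdpr: bool = False
    very_high_steady_volume: bool = False


def choose_deployment(w: Workload) -> str:
    # Assumption: hard compliance constraints outrank speed-of-shipping preferences.
    if w.must_keep_data_in_control or w.chinese_model_under_gdpr:
        return "Self-host"
    # High, steady volume is where fixed infrastructure cost gets the better unit cost.
    if w.very_high_steady_volume:
        return "Self-host"
    # Everything else (fast shipping, little ops, low or unpredictable volume) -> API.
    return "API"


if __name__ == "__main__":
    print(choose_deployment(Workload()))
    print(choose_deployment(Workload(chinese_model_under_gdpr=True)))
```

When the rows conflict, for example a team that wants to ship fast but must also keep data under its own control, this ordering resolves in favour of self-hosting. A real decision would also weigh the break-even volume and the team's ops capacity.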
Going self-hosted? You'll need GPU-capable infrastructure — our sister site Best VPS Match compares the options. First, confirm where each model's data goes in the sovereignty comparison.