Quick answer. For raw lowest price, Gemini 3.1 Flash-Lite ($0.10/$0.40). For cheap + GDPR/HIPAA, Claude Haiku 4.5 ($0.25/$1.00). For budget coding, DeepSeek V3 ($0.27/$1.10). But the headline price is not your real cost — the output split and the agentic multiplier decide what you actually pay.
Cheapest AI APIs ranked
| Model | Input $/M | Output $/M | Context | Compliance |
|---|---|---|---|---|
| Gemini 3.1 Flash-Lite | $0.10 | $0.40 | 1M | GDPR |
| Llama 4 (via Groq) | $0.18 | $0.29 | 1M | Self-host |
| Claude Haiku 4.5 | $0.25 | $1.00 | 200k | GDPR, HIPAA, SOC2 |
| DeepSeek V3 | $0.27 | $1.10 | 128k | None (China) |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | GDPR, SOC2 |
DeepSeek V3 is the cheapest for coding quality, but it is operated from China with no GDPR/HIPAA — not suitable for regulated or sensitive data.
The 750x price range
Token pricing runs from roughly $0.10 to $75 per million tokens — a 750x range. The lesson is not "always buy the cheapest"; it is "match the model to the task". A budget model handling classification or extraction can be 30–100x cheaper than a frontier model, with no quality loss for that job.
The hidden cost: the agentic multiplier
Cheap can become expensive. Agentic workflows use 5–20x more tokens than a single completion, because the model retrieves, reasons and calls tools in a loop. A cheap model run agentically can cost more than a pricier model used for single completions. Always model the multiplier before you commit.
Model your real high-volume cost → — set your volume, output split and the agentic toggle to see monthly and annual cost across every model.
Decision matrix
| If you need… | Choose | Why |
|---|---|---|
| Absolute lowest price | Gemini 3.1 Flash-Lite | $0.10/$0.40 per M |
| Cheap + compliant | Claude Haiku 4.5 | GDPR, HIPAA, SOC2 at $0.25/$1.00 |
| Budget coding | DeepSeek V3 | Near-frontier code at low cost |
| Self-hosted / private | Llama 4 | Open weights, your own hardware |
| Cheap + long context | Gemini 3 Flash | 1M context at $0.50/$3.00 |
What changed in June 2026
- Gemini 3.1 Flash-Lite at $0.10/$0.40 set a new floor for major-provider API pricing.
- Llama 4 via Groq made open-weight inference cost-competitive with the cheapest proprietary tiers.
- AI token pricing has fallen roughly 80% over the past year — re-checking your model choice monthly is now a real cost lever.
Picking a budget model? The match engine balances cost against the quality your task actually needs.