The core limit: a language model predicts plausible text, not verified truth. Everything below flows from that. None of it means AI is useless — it means you design around the gaps with retrieval, tools and human review.
The seven real limits
1. It can't reliably cite real sources without RAG
Ask a bare model for references and it will often invent plausible-looking ones. Real citations require retrieval-augmented generation — connecting the model to an actual document store. Without that, treat every citation as unverified.
2. It can't do reliable maths
Models approximate arithmetic from patterns, not calculation. For anything beyond trivial sums, they need a calculator or code tool wired in. Never trust an unaided model with figures that matter.
3. It can't make legal or medical decisions safely
Hallucination rates in legal queries run 58–88% and in medical summaries 43–64% without mitigation (see the Truth Score). AI can draft and summarise in these fields, but a qualified human must make and own the decision.
4. It can't replace domain expertise
AI lacks accountability and real-world judgment. It is a powerful assistant to an expert, not a substitute for one. The expert knows when the confident answer is wrong — the model often doesn't.
5. It can't remember between sessions
By default a model only sees the current context window. Persistent memory needs an explicit memory architecture built around it. Without one, every conversation starts from zero.
6. It can't hold a consistent persona without a system prompt
Tone and behaviour drift unless you anchor them with a system prompt. For brand-consistent customer-facing use, the system prompt is mandatory, not optional — see AI personality.
7. It can't know that it's right
A model has no internal sense of certainty. It can state a fabrication with exactly the same confidence as a fact. This is the "confident liar" problem — and the reason a human review step is non-negotiable for anything important.
What this means for deployment
| The limit | The workaround |
|---|---|
| Invents sources | Add RAG against your verified documents |
| Unreliable maths | Wire in a calculator/code tool |
| Unsafe for high-stakes decisions | Human makes and owns the decision |
| No memory | Add a memory layer if continuity matters |
| Persona drift | Anchor with a system prompt |
| Can't self-verify | Human review on anything that matters |
What changed in June 2026
- Reasoning models were shown to hallucinate more on facts, reinforcing that "smarter" does not mean "safe to trust unchecked".
- RAG and tool-use became standard practice precisely because these limits are now well understood.
- The 2025 proof that hallucination is mathematically inevitable settled the debate: design around the limits, don't wait for them to be fixed.
Building responsibly? Pair this with the starter guide, governance basics and the data privacy checklist.