Quick answer. Biggest lever: retrieval-augmented generation (RAG), ~71% reduction. Then self-consistency checking (~65%), ensemble checking (30–50%) and prompt mitigation (~22 points). Layer RAG with a human review step and you turn an unreliable generalist into a dependable tool.
Techniques ranked by impact
| Technique | Reduction | Effort | When to use |
|---|---|---|---|
| RAG (retrieval) | ~71% | Medium | Any factual or document-grounded task |
| Self-consistency checking | ~65% | Medium | High-stakes single answers |
| Ensemble (multi-model) | 30–50% | High | Critical decisions worth the cost |
| Prompt mitigation | ~22pp | Low | Every prompt — cheap baseline |
| Fine-tuning on domain data | Varies | High | Narrow, repeated, specialised tasks |
1. RAG — ground the model in your sources
Retrieval-augmented generation connects the model to a store of your verified documents, so it retrieves real facts instead of inventing them. It is the single most effective technique and the foundation of reliable business AI. If you do one thing, do this.
2. Self-consistency checking
Ask the model the same question several ways (or several times) and compare. Agreement signals reliability; divergence flags a likely hallucination. Effective for individual high-stakes answers.
3. Ensemble checking
Run the query across more than one model and compare outputs. Disagreement surfaces fabrication. It costs more — remember the token cost — so reserve it for genuinely critical decisions.
4. Prompt mitigation — the cheap baseline
Instruct the model to say when it is unsure and to show its reasoning. A simple line like "If you are not certain, say so rather than guessing" cuts hallucination by around 22 percentage points and costs nothing. Apply it everywhere.
5. Fine-tuning
For narrow, repeated tasks, fine-tuning on your own verified data can reduce error meaningfully. Higher effort — worth it only for stable, high-volume use cases.
The non-negotiable: human review
No technique reaches zero. For anything consequential, keep a human in the loop. This is the heart of basic governance and the reason it matters to know what AI can't do.
Picking a reliable base model first? Start from the Truth Score — a higher-scoring model plus RAG is the strongest combination.