What is a large language model (LLM)?

An LLM is a system trained to predict the next token (a word or word-piece) in a sequence of text. By doing this across vast amounts of writing, it learns grammar, facts, styles and reasoning-like patterns. It is not a database and does not look things up; it completes patterns at enormous scale.

How does ChatGPT actually work?

ChatGPT and similar tools predict the most likely next token given the text so far, one token at a time. There is no understanding or memory in the human sense — just extraordinarily sophisticated pattern completion learned from huge volumes of text.

What are parameters in an AI model?

Parameters are the weights a model learned during training: the distilled pattern of everything it read. OpenAI has not published GPT-4's size; unconfirmed reports put it at roughly 1.8 trillion parameters. More parameters mean more capacity to store patterns, not necessarily more intelligence, and usually higher cost and latency.

What Is an LLM? Plain-English Guide

In one line: a large language model is trained to predict the next token (word-piece) in a sequence. Repeat that across a large share of the text publicly available and you get a system that can write, summarise, translate and code, not by understanding, but by completing patterns with extraordinary accuracy.

Next-token prediction, explained

The training goal sounds almost too simple: give the model some text, ask it to predict what comes next, and repeat billions of times. Yet at scale this produces systems that write essays, summarise documents, translate languages and explain code. In learning which word tends to follow another, the model also absorbs grammar, facts, writing styles, code patterns and reasoning-like structures.

The analogy: imagine reading every book, article, forum post and website ever written, then learning to predict — with uncanny accuracy — what word comes next in any sentence. That's an LLM. Not intelligence. Not understanding. Pattern completion at scale.
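To make the training goal concrete, here is a minimal sketch of next-token prediction using a toy bigram counter. It is an illustration under heavy simplifying assumptions, not how a real LLM is built: real models use sub-word tokens and neural networks with billions of weights, whereas this one splits on whitespace and simply counts which word follows which. The function names (train_bigram, predict_next, generate) and the tiny corpus are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()  # toy tokenizer: whitespace words, not real sub-word tokens
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, max_tokens=5):
    """Greedy generation: append the most likely next token, one at a time."""
    output = [start]
    for _ in range(max_tokens):
        nxt = predict_next(counts, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)

# Hypothetical training text, chosen only for illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(predict_next(model, "the"))           # "cat" follows "the" most often here
print(generate(model, "on", max_tokens=2))  # extends the prompt one token at a time
```

Even this toy shows the key property: the model never stores the sentences themselves, only statistics about what tends to come next, and it has nothing to say about a word it never saw.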

Why it's brilliant at some things and useless at others

This single fact explains much of how today's AI products behave:

Brilliant at (recombining known patterns)	Poor at (needs grounding or novelty)
Writing and rewriting	Reliable factual recall
Summarising supplied text	Exact maths without a tool
Translating	Genuinely novel reasoning
Explaining and generating code	Real-world grounding

Tasks that recombine existing knowledge play to the model's strength. Tasks that need new reasoning or verified facts hit its structural weakness — which is why what AI can't do matters as much as what it can.

Parameters and model size, simply

Parameters are the "weights" a model learned in training — the distilled pattern of everything it read.
GPT-4 is reported to have roughly 1.8 trillion parameters and frontier Claude models are estimated at several hundred billion; neither OpenAI nor Anthropic has confirmed these figures.
A bigger model isn't automatically smarter; it has more capacity to store patterns. It's also slower and more expensive.
This is why Mixture-of-Experts architecture matters: massive total parameters, but only a small subset of them is active for each token. See how DeepSeek did it.
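The gap between total and active parameters in a Mixture-of-Experts model can be sketched with a little arithmetic. The snippet below is a simplified illustration, assuming a router that scores each expert and keeps the top k. The expert count, parameter sizes and router scores are hypothetical numbers picked for the example, not the configuration of any real model.

```python
import math

def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(router_scores[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

def parameter_counts(n_experts, params_per_expert, shared_params, k):
    """Parameters stored in the model vs parameters used for one token."""
    total = shared_params + n_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return total, active

# Hypothetical values for illustration only.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
total, active = parameter_counts(
    n_experts=64,
    params_per_expert=1_000_000_000,
    shared_params=6_000_000_000,
    k=2,
)

print(sorted(weights))  # indices of the experts that run for this token
print(total, active)    # parameters stored vs parameters used per token
```

With these made-up numbers the model stores 70 billion parameters but uses only 8 billion for any given token, which is the cost and latency advantage the architecture is after.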

What it means for your business

Understanding next-token prediction is the foundation of every sensible AI decision. It tells you to ground the model in your own data (RAG) for facts, to add a human check for anything consequential, and to match model size to task rather than defaulting to the biggest, priciest option. Start with the business starter guide.
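Grounding the model in your own data can be sketched as two steps: retrieve the most relevant passage, then put it in the prompt. The example below is a minimal sketch under stated assumptions: it ranks documents by simple word overlap, where production RAG systems typically use embeddings and a vector index, and it stops at building the prompt rather than calling any model. The helper names (tokenize, retrieve, build_prompt) and the sample policy documents are invented for illustration.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words, dropping basic punctuation."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return set(text.lower().split())

def retrieve(question, documents, top_n=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_n]

def build_prompt(question, documents, top_n=1):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_n))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Hypothetical company documents, for illustration only.
docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Europe takes five working days.",
]
question = "Within how many days of purchase are refunds accepted?"
prompt = build_prompt(question, docs)
print(prompt)
```

The model still only completes patterns, but the pattern it now completes contains your verified fact, which is why retrieval helps with factual recall. A human check remains sensible for anything consequential.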

Going deeper

Where did the patterns come from, and who got paid? See training data and copyright. Can this approach ever become real intelligence? See can AI become intelligent and the honest AGI debate.

Now choose one. With the fundamentals clear, use the match engine to find the right model for your task.
