Lesson 5 — How Language Models Work | Class 8

Story · Divya's Poem and the Wrong Word

The AI Was Confident — And Completely Wrong 📜

Divya, 13, from Coimbatore, was writing a poem about the Kaveri river for her Tamil literature class. She asked an AI tool to suggest the next line after: "The Kaveri flows through ancient stone..."

The AI suggested: "...where emperors of the Maurya Empire once held their throne." It sounded beautiful and confident. But Divya's teacher pointed out immediately that the Maurya Empire was centred on the Ganga plain in the north, not on the Kaveri region in Tamil Nadu. The AI had chosen words that sounded historically accurate but were factually wrong for this geography.

"Why does it say something so confidently if it is wrong?" Divya asked. Her teacher explained: "The AI does not understand history the way you do. It has learned which words tend to follow other words in texts it has read. 'Ancient stone + emperors + throne' was a pattern it found plausible — but plausible is not the same as accurate."

👉 This lesson explains exactly how a language model generates text — and why that process produces fluent, confident language that can still be factually wrong.

Section 1 of 7

🧩 What Is a Token?

Language models do not read text word by word. They read it in pieces called tokens. A token is roughly a short word, part of a word, or a punctuation mark. The model learns patterns across tokens.

The sentence "The Kaveri flows" might be tokenised as:

The | K | aver | i | flows

Notice: common words like "The" and "flows" are single tokens. Less common words like "Kaveri" are split into parts. This is why the model handles common words well but can struggle more with rare names, technical terms, and regional language words.
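
To see how a word can fall apart into pieces, here is a minimal sketch of a toy tokeniser. It assumes a tiny made-up vocabulary (`VOCAB`) and a simple "longest piece first" rule. Real tokenisers learn tens of thousands of pieces from data and use more careful rules, so treat this only as an illustration.

```python
# Hypothetical mini-vocabulary, chosen only to reproduce the example above.
VOCAB = {"The", "flows", "K", "aver", "i"}

def tokenise_word(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing matched: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces

def tokenise(text, vocab=VOCAB):
    """Split text on spaces, then split each word into pieces."""
    tokens = []
    for word in text.split():
        tokens.extend(tokenise_word(word, vocab))
    return tokens

print(tokenise("The Kaveri flows"))  # ['The', 'K', 'aver', 'i', 'flows']
```

Because "Kaveri" is not in the toy vocabulary as a whole word, it is built from three smaller pieces, while "The" and "flows" stay whole.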

Roughly: 1 token ≈ 0.75 words in English, so a 100-word paragraph is about 130–140 tokens. Some versions of GPT-4 (such as GPT-4 Turbo) can process up to about 128,000 tokens in one conversation, which is roughly 96,000 words, about the length of a full novel.
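
The rule of thumb above is simple arithmetic, sketched below. The function names are made up for this example, and the 0.75 figure is only an approximation for English text. Other languages and unusual words often need more tokens.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this lesson, English text only

def estimate_tokens(word_count):
    """Approximate number of tokens for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_words(token_count):
    """Approximate number of words that fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)

print(estimate_tokens(100))      # 133
print(estimate_words(128_000))   # 96000
```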

Section 2 of 7

🔮 Next-Word Prediction: What the Model Actually Does

At its core, a language model does one thing: given the tokens it has seen so far, it calculates the probability of every possible next token — and picks from the most likely ones.

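After seeing "The Kaveri flows through ancient", the model scores every possible next token. A few illustrative scores are shown below. They show relative likelihood only; true probabilities across all possible tokens would add up to 100%.

Next token	Relative likelihood
"stone"	82%
"temples"	44%
"cities"	28%
"rivers"	12%
"elephant"	very low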

The model repeats this for every token in the output — hundreds or thousands of times per response. The result reads like fluent, thoughtful writing — but it is generated one probabilistic step at a time.

Key insight: The model is not "thinking" about whether its output is true. It is asking: "What token is most likely to follow what I have seen so far, based on patterns in text I was trained on?" Fluency and factual accuracy are two separate things.
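
One prediction step can be sketched in a few lines. The probabilities in `NEXT_TOKEN_PROBS` are invented for this example and add up to 1.0. A real model calculates them for tens of thousands of tokens at every step. The sketch shows two ways to choose: always take the top token, or pick at random in proportion to the probabilities.

```python
import random

# Hypothetical probabilities for the token after
# "The Kaveri flows through ancient". They add up to 1.0.
NEXT_TOKEN_PROBS = {
    "stone": 0.50,
    "temples": 0.25,
    "cities": 0.15,
    "rivers": 0.09,
    "elephant": 0.01,
}

def most_likely(probs):
    """Return the single token with the highest probability."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
print(most_likely(NEXT_TOKEN_PROBS))  # stone
print([sample(NEXT_TOKEN_PROBS, rng) for _ in range(5)])
```

Notice that nothing in this code checks whether "stone" is true. It only checks which token is likely.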

Section 3 of 7

📏 The Context Window

A language model can only "see" a limited amount of text at once — called the context window. Anything outside the window is invisible to the model when it is generating its response.

[ Everything in this window is visible to the model right now: your conversation history, your current prompt, and the model's own previous output. ]

↑ Anything before the window start → model cannot see it. Anything after the window end → not yet generated.

Why the context window matters

In a very long conversation, the AI may "forget" what you said at the start because that early text has slid out of the context window.
The prompt and the response share the same window, so a very long prompt leaves less room for the model's response.
Large context windows (like 128K tokens) let the model use entire documents as context, which small windows cannot do.

Analogy: Imagine reading a book but you can only see 10 pages at a time through a sliding window. You can answer questions about what is in the window, but you cannot see what was on page 3 when you are now on page 40.
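
The sliding window can be sketched as keeping only the most recent tokens. This is a simplified assumption: here each word counts as one token, the window is tiny, and old text is simply cut off. Real AI tools handle overflow in different ways, and the function names below are made up for this example.

```python
def visible_context(tokens, window_size):
    """Return only the most recent tokens that fit in the window."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]

def room_for_reply(prompt_tokens, window_size):
    """Tokens left for the model's answer after the prompt is counted."""
    return max(0, window_size - len(prompt_tokens))

conversation = "my name is Divya . please help me write a poem about the Kaveri".split()

window = visible_context(conversation, window_size=8)
print(window)             # ['help', 'me', 'write', 'a', 'poem', 'about', 'the', 'Kaveri']
print("Divya" in window)  # False: the name has slid out of the window
print(room_for_reply(conversation, window_size=20))  # 6
```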

Section 4 of 7

🌡️ The Temperature Parameter

When a model picks the next token, it does not always pick the highest-probability one. A setting called temperature controls how much randomness is used in the selection.

Temperature setting	Behaviour	Good for
Low (0.0–0.3)	Almost always picks the most likely token (always, at exactly 0), so outputs are predictable and consistent	Code generation, fact-based Q&A, translations
Medium (0.5–0.8)	Balanced — mostly likely tokens with some variation	Essay writing, explanations, study guides
High (1.0–2.0)	More surprising token choices — outputs are more creative but less reliable	Poetry, brainstorming, story generation

This is why the same question asked to an AI twice may produce slightly different answers — especially at higher temperature settings.
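
A common way to apply temperature is to divide each token's raw score by the temperature before turning the scores into probabilities. The sketch below assumes that approach. The raw scores in `SCORES` are invented, and the function name is made up for this example.

```python
import math

def apply_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as "always pick the top token".
        raise ValueError("temperature must be above 0")
    scaled = {tok: s / temperature for tok, s in scores.items()}
    biggest = max(scaled.values())
    # Subtracting the biggest value keeps exp() from overflowing.
    exps = {tok: math.exp(s - biggest) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

SCORES = {"stone": 2.0, "temples": 1.0, "cities": 0.5}  # hypothetical raw scores

for t in (0.2, 0.7, 1.5):
    probs = apply_temperature(SCORES, t)
    print(t, {tok: round(p, 2) for tok, p in probs.items()})
```

At a low temperature nearly all the probability goes to "stone". At a high temperature the three tokens become closer to equal, so surprising choices turn up more often.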

Section 5 of 7

👻 Why AI Hallucinates — A Deeper Explanation

You learned in Class 7 that AI "hallucination" means the model produces confident-sounding but false information. Now that you understand next-word prediction, you can understand why it happens.

The model does not "know" facts — it knows patterns. If a combination of tokens was common in its training data, it will seem plausible to the model — even if it is false.
It cannot say "I don't know" reliably. When asked about something it was not trained on, it does not stop and say "I don't know." It generates the most plausible-sounding continuation — which may be entirely fabricated.
It is rewarded for sounding helpful. The training process rewards responses that humans rate as helpful and coherent. A confident-sounding wrong answer may have been rated better than an honest "I'm not sure" during training.

Real consequence: Divya's AI suggested a historically wrong line because "ancient stone + emperors + throne" was a common pattern in historical poetry across many cultures in its training data. The model had no mechanism to check whether the Maurya Empire was actually relevant to the Kaveri region. It was producing the most statistically plausible continuation — not the most historically accurate one.

What to do: Always verify specific facts, names, dates, places, and statistics from AI output against a reliable source. The more specific and unusual a claim, the higher the risk that it is a hallucination.
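
The idea that the model follows patterns and not facts can be shown with a very small counting model. The training snippets below are invented for this example. The model simply counts which word most often follows a phrase, and it never looks at which river the poem is about.

```python
from collections import Counter

# Hypothetical training snippets. Most of them mention the Maurya Empire.
TRAINING_TEXT = [
    "emperors of the Maurya Empire",
    "emperors of the Maurya Empire",
    "emperors of the Maurya Empire",
    "emperors of the Chola Empire",
]

def next_word_counts(snippets, prompt):
    """Count which word follows `prompt` in the training snippets."""
    counts = Counter()
    prompt_words = prompt.split()
    n = len(prompt_words)
    for snippet in snippets:
        words = snippet.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prompt_words:
                counts[words[i + n]] += 1
    return counts

counts = next_word_counts(TRAINING_TEXT, "emperors of the")
print(counts.most_common())  # [('Maurya', 3), ('Chola', 1)]
```

"Maurya" wins because it is the most frequent pattern, even though the Cholas are the dynasty linked to the Kaveri region. A real language model is far more sophisticated than this counter, but the weakness is similar: frequency can beat accuracy.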

Section 6 of 7

🔧 What Is Fine-Tuning?

A base language model is trained on vast amounts of general text — books, websites, Wikipedia, etc. But to make it useful for a specific task or domain, it is often fine-tuned.

Fine-tuning means continuing the training process with a much smaller, more focused dataset — teaching the model to behave differently for a particular context.

Examples of fine-tuning:

A general language model fine-tuned on medical text → becomes better at answering health questions in appropriate clinical language
A general model fine-tuned on customer service conversations → becomes better at handling support queries politely and efficiently
A general model fine-tuned on Indian legal documents → becomes better at understanding Indian law and regulatory language

Fine-tuning is also part of how models are taught to be "helpful assistants": they are trained on examples of good assistant behaviour, and human raters then compare the model's responses so that it can be trained further towards the answers people prefer. This second step is called RLHF (Reinforcement Learning from Human Feedback).
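
As a rough analogy only, fine-tuning can be pictured with a counting model that is first trained on general text and then trained a little more on focused text. Real fine-tuning adjusts the numbers inside a neural network and does not use word counts. The data, the `weight` setting, and the function names below are all assumptions made for this sketch.

```python
from collections import Counter

def train(counts, examples, weight=1):
    """Add (previous word, next word) counts from examples; returns a new Counter."""
    updated = Counter(counts)
    for prev, nxt in examples:
        updated[(prev, nxt)] += weight
    return updated

def predict(counts, prev):
    """Return the word seen most often after `prev`."""
    options = {nxt: c for (p, nxt), c in counts.items() if p == prev}
    return max(options, key=options.get)

# Hypothetical data: general text talks about weather, medical text about symptoms.
GENERAL = [("cold", "weather")] * 5 + [("cold", "symptoms")] * 1
MEDICAL = [("cold", "symptoms")] * 3

base = train(Counter(), GENERAL)
# The weight stands in for how strongly the focused data pulls the model.
tuned = train(base, MEDICAL, weight=2)

print(predict(base, "cold"))   # weather
print(predict(tuned, "cold"))  # symptoms
```

The focused dataset is much smaller than the general one, yet it is enough to change what the model says after "cold".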

Section 7 of 7

🗺️ Key Vocabulary Map

Term	Simple meaning
Token	The basic unit of text a language model reads — roughly a short word or word fragment
Next-word prediction	The core task: given previous tokens, work out how likely each possible next token is and choose one of the likely ones. Repeated many times to produce a response.
Context window	How much text the model can see at once. Anything outside the window is invisible.
Temperature	A setting that controls how random or creative the model's token choices are (low = predictable, high = creative)
Hallucination	When the model generates confident-sounding but false information — because it follows statistical patterns, not facts
Fine-tuning	Continued training on a smaller, focused dataset to specialise the model for a particular task or domain
RLHF	Reinforcement Learning from Human Feedback — the process used to train models to behave as helpful assistants

🧠 Quiz — Lesson 5

8 questions

1. What is a "token" in a language model?

2. What is the core task that a language model performs?

3. The "context window" of a language model refers to:

4. Which temperature setting would you use for generating code that must work correctly every time?

5. Why does AI hallucination happen? Choose the BEST explanation.

6. Divya's AI suggested that Maurya Empire emperors used the Kaveri region as their base. This is an example of:

7. What is "fine-tuning" a language model?

8. You ask an AI the same factual question twice and get slightly different answers. The most likely explanation is:

📝 Worksheet — Spot the Hallucination


In your notebook, complete these exercises:

Ask an AI chatbot a specific question about a place or event in Andhra Pradesh, Telangana, or your own state. Copy the answer.
Verify 3 specific claims from the answer using NCERT textbooks, government websites, or an encyclopaedia. Mark each as: ✅ Verified / ❌ Wrong / ❓ Cannot verify.
If you found an error: explain in 2 sentences WHY the AI might have generated that wrong information based on what you learned in this lesson about next-word prediction and hallucination.
Write one tip you would give a younger student about how to use AI for research safely, based on what you now know.

📋 Note for Parents and Teachers

What this lesson covers: Tokens and tokenisation, next-word prediction as the core mechanism, the context window, temperature settings, a deeper mechanistic explanation of hallucination, and the concept of fine-tuning including RLHF. No mathematics is required — the concepts are explained through analogy and visual description.

Discussion prompts:

"If a language model does not know facts, only patterns — what does that mean for using it as a study reference? What safeguards should you use?"
"Think of a subject where hallucination would be dangerous (medicine, law, history) and one where it might be fine (brainstorming creative ideas). Why the difference?"
