AI for Students · Class 8 · Age 12–13 · Lesson 5 of 12

How Language Models Work 🧠

Why does the AI write so fluently — but sometimes get facts completely wrong? This lesson goes inside a language model to understand tokens, predictions, context, and why hallucination happens.

📘 Class 8 · Lesson 5 🕐 45–55 min 🚫 No coding needed 🆓 Free lesson

Story · Divya's Poem and the Wrong Word

The AI Was Confident — And Completely Wrong 📜

Divya, 13, from Coimbatore, was writing a poem about the Kaveri river for her Tamil literature class. She asked an AI tool to suggest the next line after: "The Kaveri flows through ancient stone..."

The AI suggested: "...where emperors of the Maurya Empire once held their throne." It sounded beautiful and confident. But Divya's teacher pointed out immediately that the Maurya Empire was centred on the Ganga plain in the north, not the Kaveri region of Tamil Nadu. The AI had chosen words that sounded historically accurate but were simply wrong for this geography.

"Why does it say something so confidently if it is wrong?" Divya asked. Her teacher explained: "The AI does not understand history the way you do. It has learned which words tend to follow other words in texts it has read. 'Ancient stone + emperors + throne' was a pattern it found plausible — but plausible is not the same as accurate."

👉 This lesson explains exactly how a language model generates text — and why that process produces fluent, confident language that can still be factually wrong.
Section 1 of 7

🧩 What Is a Token?

Language models do not read text word by word. They read it in pieces called tokens. A token is roughly a short word, part of a word, or a punctuation mark. The model learns patterns across tokens.

The sentence "The Kaveri flows" might be tokenised as:

The K aver i flows

Notice: common words like "The" and "flows" are single tokens. Less common words like "Kaveri" are split into parts. This is why the model handles common words well but can struggle more with rare names, technical terms, and regional language words.
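Optional, for curious readers: the splitting above can be imitated with a toy tokenizer. This is only a sketch. The vocabulary `VOCAB` is made up for this example, and real tokenizers learn a much larger vocabulary from data and may split "Kaveri" differently.

```python
# Toy greedy longest-match tokenizer.
# VOCAB is a made-up vocabulary for illustration only.
VOCAB = {"The", " flows", " K", "aver", "i"}

def tokenize(text, vocab=VOCAB):
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # No known piece starts here: keep the single character as a token.
            tokens.append(text[pos])
            pos += 1
    return tokens

print(tokenize("The Kaveri flows"))
# ['The', ' K', 'aver', 'i', ' flows']
```

The common words are found whole in the vocabulary, while the rarer name has to be built from smaller pieces.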

Roughly: 1 token ≈ 0.75 words in English. A 100-word paragraph is about 130–140 tokens. Some recent models, such as GPT-4 Turbo, can process up to about 128,000 tokens in one conversation. That is roughly 96,000 words, about the length of a full novel.
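The rule of thumb above is simple arithmetic. Here is a small sketch of it; the 0.75 figure is only a rough average for English, so treat the results as estimates.

```python
WORDS_PER_TOKEN = 0.75  # rough English average quoted in this lesson

def words_to_tokens(words):
    """Estimate how many tokens a text of `words` words needs."""
    return round(words / WORDS_PER_TOKEN)

def tokens_to_words(tokens):
    """Estimate how many words fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)

print(words_to_tokens(100))      # 133
print(tokens_to_words(128_000))  # 96000
```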
Section 2 of 7

🔮 Next-Word Prediction: What the Model Actually Does

At its core, a language model does one thing: given the tokens it has seen so far, it calculates the probability of every possible next token — and picks from the most likely ones.

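After seeing "The Kaveri flows through ancient", the model assigns a probability to every possible next token. The numbers below are illustrative. Across all possible tokens, the probabilities add up to 100%:

"stone": 45%
"temples": 25%
"cities": 15%
"rivers": 8%
"elephant": 1%
all other tokens: 6%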

The model repeats this for every token in the output — hundreds or thousands of times per response. The result reads like fluent, thoughtful writing — but it is generated one probabilistic step at a time.

Key insight: The model is not "thinking" about whether its output is true. It is asking: "What token is most likely to follow what I have seen so far, based on patterns in text I was trained on?" Fluency and factual accuracy are two separate things.
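Optional, for curious readers: one prediction step can be sketched in a few lines. The raw scores in `scores` are made up for this example (a real model computes them from its training). The `softmax` function is the standard way to turn scores into probabilities that add up to 1.

```python
import math

# Hypothetical raw scores for a few candidate next tokens (made up).
scores = {"stone": 3.0, "temples": 2.2, "cities": 1.5, "rivers": 0.8, "elephant": -1.0}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    biggest = max(scores.values())  # subtracting the max keeps exp() stable
    exps = {tok: math.exp(s - biggest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(scores)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:10s} {p:.0%}")

best = max(probs, key=probs.get)
print("Most likely next token:", best)
```

Nothing in this step checks whether the chosen token makes a true sentence. It only compares how likely each token is.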
Section 3 of 7

📏 The Context Window

A language model can only "see" a limited amount of text at once — called the context window. Anything outside the window is invisible to the model when it is generating its response.

[ Everything in this window is visible to the model right now: your conversation history, your current prompt, and the model's own previous output. ]

↑ Anything before the window start → model cannot see it. Anything after the window end → not yet generated.

Why the context window matters

Analogy: Imagine reading a book but you can only see 10 pages at a time through a sliding window. You can answer questions about what is in the window perfectly — but you cannot remember what was on page 3 when you are now on page 40.
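The sliding-window analogy can be sketched in code. This is a simplified picture: the function name `visible_part` is made up here, and real AI tools differ in how they handle text that no longer fits (some cut it off, some summarise it).

```python
def visible_part(tokens, window_size):
    """Return only the most recent `window_size` items; older ones are dropped."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]

pages = list(range(1, 41))        # the reader has reached page 40
window = visible_part(pages, 10)  # only 10 pages fit in the window

print(window)       # [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
print(3 in window)  # False: page 3 has slid out of view
```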
Section 4 of 7

🌡️ The Temperature Parameter

When a model picks the next token, it does not always pick the highest-probability one. A setting called temperature controls how much randomness is used in the selection.

Temperature setting | Behaviour | Good for
Low (0.0–0.3) | Almost always picks the most likely token, so outputs are predictable and consistent | Code generation, fact-based Q&A, translations
Medium (0.5–0.8) | Balanced: mostly likely tokens, with some variation | Essay writing, explanations, study guides
High (1.0–2.0) | More surprising token choices, so outputs are more creative but less reliable | Poetry, brainstorming, story generation
This is why the same question asked to an AI twice may produce slightly different answers — especially at higher temperature settings.
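Optional, for curious readers: a common way to apply temperature is to divide each token's raw score by the temperature before turning the scores into probabilities. The sketch below assumes that approach. The scores in `SCORES` are made up, and treating temperature 0 as "always take the top token" is a convention chosen for this example.

```python
import math
import random

# Hypothetical raw scores for candidate next tokens (made up).
SCORES = {"stone": 3.0, "temples": 2.2, "cities": 1.5, "elephant": -1.0}

def probabilities(scores, temperature):
    """Turn scores into probabilities; a lower temperature sharpens them."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    biggest = max(scaled.values())
    exps = {tok: math.exp(s - biggest) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def pick_token(scores, temperature, rng):
    if temperature == 0:
        # Convention in this sketch: temperature 0 means "take the top token".
        return max(scores, key=scores.get)
    probs = probabilities(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()))[0]

rng = random.Random(0)  # fixed seed so the demo is repeatable
for t in (0.2, 1.0, 2.0):
    top = probabilities(SCORES, t)["stone"]
    picks = [pick_token(SCORES, t, rng) for _ in range(5)]
    print(f"temperature {t}: P(stone) = {top:.2f}, picks = {picks}")
```

At a low temperature almost all of the probability goes to "stone". At a high temperature the other tokens get a real chance, which is where the variety comes from.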
Section 5 of 7

👻 Why AI Hallucinates — A Deeper Explanation

You learned in Class 7 that AI "hallucination" means the model produces confident-sounding but false information. Now that you understand next-word prediction, you can understand why it happens.

  1. The model does not "know" facts — it knows patterns. If a combination of tokens was common in its training data, it will seem plausible to the model — even if it is false.
  2. It cannot say "I don't know" reliably. When asked about something it was not trained on, it often does not stop and say "I don't know." Instead, it generates the most plausible-sounding continuation, which may be entirely fabricated.
  3. It is rewarded for sounding helpful. The training process rewards responses that humans rate as helpful and coherent. A confident-sounding wrong answer may have been rated better than an honest "I'm not sure" during training.
Real consequence: Divya's AI suggested a historically wrong line most likely because "ancient stone + emperors + throne" is a common pattern in the historical writing it was trained on. The model had no mechanism to check whether the Maurya Empire was actually relevant to the Kaveri region. It produced the most statistically plausible continuation, not the most historically accurate one.
What to do: Always verify specific facts, names, dates, places, and statistics from AI output against a reliable source. The more specific and unusual a claim, the higher the risk that it is a hallucination.
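Optional, for curious readers: a tiny "next-word" model shows how pattern-following can produce a confident but wrong line. This is a toy that only counts which word follows which. Real language models are far more complex, and the three training lines in `corpus` are made up for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up "training data". None of it mentions the Kaveri.
corpus = [
    "ancient stone where emperors once held their throne",
    "ancient stone where emperors once ruled",
    "ancient stone where kings once held their throne",
]

# Count which word follows each word.
follows = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def continue_text(prompt, max_words=6):
    """Keep adding the most common next word seen in training."""
    words = prompt.split()
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break  # the model has never seen this word
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("The Kaveri flows through ancient stone"))
# The Kaveri flows through ancient stone where emperors once held their throne
```

The toy model knows nothing about the Kaveri. It simply continues with the word pattern it has seen most often, and the result still sounds fluent.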
Section 6 of 7

🔧 What Is Fine-Tuning?

A base language model is trained on vast amounts of general text — books, websites, Wikipedia, etc. But to make it useful for a specific task or domain, it is often fine-tuned.

Fine-tuning means continuing the training process with a much smaller, more focused dataset — teaching the model to behave differently for a particular context.

Examples of fine-tuning:
  • A general language model fine-tuned on medical text → becomes better at answering health questions in appropriate clinical language
  • A general model fine-tuned on customer service conversations → becomes better at handling support queries politely and efficiently
  • A general model fine-tuned on Indian legal documents → becomes better at understanding Indian law and regulatory language

Fine-tuning is also how models are taught to be "helpful assistants". The model is first trained on examples of good assistant behaviour. Then human raters compare its responses, and the kinds of responses they prefer are rewarded. This second step is called RLHF (Reinforcement Learning from Human Feedback).
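Optional, for curious readers: the idea of "continuing the training on a focused dataset" can be imitated with a word-counting toy. This is only an analogy. Real fine-tuning adjusts the numbers inside a neural network rather than updating word counts, and both small datasets below are made up for this example.

```python
from collections import Counter, defaultdict

def train(model, sentences):
    """Update next-word counts in place; calling it again continues training."""
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            model[current][nxt] += 1

def predict(model, word):
    """Return the word most often seen after `word`."""
    return model[word].most_common(1)[0][0]

general_text = [
    "the patient man waited",
    "the patient teacher smiled",
    "a patient man listened",
]
medical_text = [
    "the patient reported chest pain",
    "the patient reported a fever",
    "a patient reported dizziness",
]

model = defaultdict(Counter)
train(model, general_text)           # "base" training on general text
before = predict(model, "patient")

train(model, medical_text)           # "fine-tuning" on focused text
after = predict(model, "patient")

print(before)  # man
print(after)   # reported
```

After the extra training on medical sentences, the same toy model treats "patient" as a person being treated rather than as a description of someone calm.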

Section 7 of 7

🗺️ Key Vocabulary Map

Term | Simple meaning
Token | The basic unit of text a language model reads: roughly a short word or word fragment
Next-word prediction | The core task: given the previous tokens, predict which token is likely to come next. Repeated many times to produce a response.
Context window | How much text the model can see at once. Anything outside the window is invisible.
Temperature | A setting that controls how random or creative the model's token choices are (low = predictable, high = creative)
Hallucination | When the model generates confident-sounding but false information, because it follows statistical patterns, not facts
Fine-tuning | Continued training on a smaller, focused dataset to specialise the model for a particular task or domain
RLHF | Reinforcement Learning from Human Feedback: the process used to train models to behave as helpful assistants

🧠 Quiz — Lesson 5


1. What is a "token" in a language model?
2. What is the core task that a language model performs?
3. The "context window" of a language model refers to:
4. Which temperature setting would you use for generating code that must work correctly every time?
5. Why does AI hallucination happen? Choose the BEST explanation.
6. Divya's AI suggested that Maurya Empire emperors used the Kaveri region as their base. This is an example of:
7. What is "fine-tuning" a language model?
8. You ask an AI the same factual question twice and get slightly different answers. The most likely explanation is:

📝 Worksheet — Spot the Hallucination


In your notebook, complete these exercises:

  1. Ask an AI chatbot a specific question about a place or event in Andhra Pradesh, Telangana, or your own state. Copy the answer.
  2. Verify 3 specific claims from the answer using NCERT textbooks, government websites, or an encyclopaedia. Mark each as: ✅ Verified / ❌ Wrong / ❓ Cannot verify.
  3. If you found an error: explain in 2 sentences WHY the AI might have generated that wrong information based on what you learned in this lesson about next-word prediction and hallucination.
  4. Write one tip you would give a younger student about how to use AI for research safely, based on what you now know.

📋 Note for Parents and Teachers

What this lesson covers: Tokens and tokenisation, next-word prediction as the core mechanism, the context window, temperature settings, a deeper mechanistic explanation of hallucination, and the concept of fine-tuning including RLHF. No mathematics is required — the concepts are explained through analogy and visual description.

Discussion prompts:
