Class 6 Lesson 3 — How AI Learns

Story · Meet Priya and Her Little Brother

Priya's Teaching Problem 🐱🐶

Priya's little brother Arjun is four years old. One afternoon, Priya showed him pictures from a book and tried to teach him the difference between a cat and a dog.

"This is a cat," she said, pointing at a fluffy orange animal. "And this is a dog." She pointed at a golden retriever.

Arjun looked carefully. "Okay," he said.

She showed him twenty more pictures. Cat. Dog. Cat. Dog. Cat. Each time, she told him the answer.

Then she tested him. She showed him a new picture of a Labrador he had never seen. He looked at it for a moment. "Dog!" he said.

"Right!" said Priya. "How did you know?"

Arjun shrugged. "Big, floppy ears. Tongue out. Looks like the others."

Priya stopped. She realised she had never told him the rules. Nobody said "four legs + floppy ears + wagging tail = dog". He just looked at enough examples and found the pattern himself.

👉 This is very close to how machine learning works. You show the AI thousands of examples, tell it the right answer each time, and it figures out the patterns on its own. No one writes the rules. In this lesson we will see how that works, step by step.

Section 1 of 8

🧠 What Does "Learning" Mean for a Computer?

There is an important difference between two things a computer can do:

Following Rules (traditional programs)

A programmer writes every rule
If big ears, tail, fur → label "dog"
New situations break the rules
Can only do exactly what it was told
Can't handle things it hasn't seen

Learning (AI / Machine Learning)

Nobody writes the rules
The AI finds patterns itself from examples
Works on new situations it has never seen
Gets better with more data
Makes mistakes — just like a student does

The type of AI that learns from examples is called Machine Learning (ML). The word "machine" just means computer. The word "learning" means exactly what it sounds like — the computer improves by looking at data.

Think of it this way. A calculator follows fixed rules: you give it a formula and it applies it exactly. Google Photos recognising your family's faces is machine learning: nobody wrote rules saying "this nose shape = Amma". The AI learned it from thousands of photos.

Why is this powerful? There are tasks where it is impossible to write all the rules. No one can write a complete rule for recognising faces, understanding speech, or predicting rain. There are too many variables. Machine learning handles this by learning the pattern from data instead of rules.
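For readers who want to see the difference as code, here is a minimal Python sketch. It assumes a single made-up measurement (ear length in centimetres) and invented numbers; real systems use far richer data. The first function is a rule a programmer typed in. The second finds its own cut-off from labelled examples.

```python
# Hand-written rule: a programmer chose the cut-off of 8 cm.
def rule_based(ear_length_cm):
    return "dog" if ear_length_cm > 8 else "cat"

# Learned rule: the cut-off is chosen by checking the labelled examples.
def learn_cutoff(examples):
    """Return the cut-off that labels the most training examples correctly."""
    best_cut, best_correct = None, -1
    for cut in sorted({length for length, _ in examples}):
        correct = sum(
            ("dog" if length > cut else "cat") == label
            for length, label in examples
        )
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

# Hypothetical labelled examples: (ear length in cm, correct answer)
examples = [(4, "cat"), (5, "cat"), (6, "cat"),
            (10, "dog"), (12, "dog"), (14, "dog")]

cut = learn_cutoff(examples)

def learned(ear_length_cm):
    return "dog" if ear_length_cm > cut else "cat"

print("Learned cut-off:", cut)
print("New animal with 11 cm ears:", learned(11))
```

Nobody typed the cut-off into `learned`. Change the examples and the cut-off changes with them, which is the core idea of learning from data.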

Section 2 of 8

📦 Training Data — The Textbook the AI Studies From

Before an AI can learn anything, it needs data. Lots of it. This is called training data — the collection of examples the AI uses to learn.

Think of training data as the textbook the AI reads. But instead of reading it once, the AI "reads" the same data hundreds or thousands of times, finding slightly better patterns each time.

What does training data look like?

AI System	What the Training Data Contains	How Much Data
Face unlock	Millions of labelled face photos: "Person A, Person B, Person A..."	10+ million images
Autocorrect keyboard	Billions of real sentences written by millions of people	100+ billion words
YouTube recommendation	Watch history, clicks, completion rates of billions of users	Trillions of events per day
Weather AI (IMD)	100+ years of temperature, rainfall, wind, satellite readings across India	Petabytes of sensor data
Plantix (crop disease)	Labelled photos of diseased and healthy crop leaves	1 million+ plant images

Important: The quality of training data matters enormously. An AI can only be as good as the data it learned from. If the data is wrong, incomplete, or biased — the AI will be wrong, incomplete, or biased. We will see this in Section 6.

India data point: For Hindi and other Indian languages, far less training data is available than for English. This is why AI systems sometimes work better in English than in Telugu, Tamil, or Hindi: they had more English examples to learn from. Closing this gap is an active area of work at Indian universities and research groups such as AI4Bharat at IIT Madras.

Section 3 of 8

🏷️ Labelling — How Humans Teach AI the Answers

Training data alone is not enough. The AI needs to know the right answer for each example. This process of telling the AI "this one is a cat" or "this one has a disease" is called labelling.

Human beings do the labelling. Thousands of workers around the world sit and look at images, audio clips, and text — and tag each one with the correct answer. This work is called data annotation.

Example — Plantix: Before Plantix could recognise paddy blast disease, a human expert had to look at thousands of rice leaf photographs and tag each one: "Healthy leaf", "Brown Spot disease", "Bacterial Blight", "Paddy Blast". The AI could only learn after humans did this labelling work first.

Types of labelling work:

Type 1

Image Labelling

Tag what is in a photo: "cat", "dog", "car", "diseased leaf", "face of person A".

Type 2

Text Labelling

Tag the meaning of text: "this review is positive", "this sentence is a question", "this word is a place name".

Type 3

Audio Labelling

Write out what is being said in an audio clip. Used to train voice recognition (like Google Assistant or Alexa).

Type 4

Drawing Boxes

Draw a rectangle around every object in a photo. Used to train AI that needs to locate things — like self-driving cars spotting pedestrians.
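To make the four types concrete, here is a rough Python sketch of what labelled examples might look like once they are stored. The file names, field names, and box coordinates are all invented for illustration; every annotation tool has its own format.

```python
from dataclasses import dataclass

@dataclass
class LabelledExample:
    data: str      # a file name or a piece of text (hypothetical)
    kind: str      # "image", "text", "audio", or "boxes"
    label: object  # the answer supplied by a human annotator

examples = [
    # Type 1: image labelling
    LabelledExample("leaf_001.jpg", "image", "Paddy Blast"),
    # Type 2: text labelling
    LabelledExample("Great phone, love it!", "text", "positive"),
    # Type 3: audio labelling (the label is the written-out speech)
    LabelledExample("clip_17.wav", "audio", "what is the weather today"),
    # Type 4: drawing boxes (the label is a list of rectangles)
    LabelledExample("street_42.jpg", "boxes", [
        {"object": "pedestrian", "x": 120, "y": 80, "width": 40, "height": 110},
    ]),
]

def count_by_kind(items):
    """Count how many labelled examples there are of each kind."""
    counts = {}
    for item in items:
        counts[item.kind] = counts.get(item.kind, 0) + 1
    return counts

print(count_by_kind(examples))
```

In every case the pattern is the same: a piece of data paired with the answer a human gave for it.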

This labelling work is done by a very large workforce worldwide, including many people in India, where it is a growing source of digital jobs. It is not glamorous, but without it, AI systems that learn from labelled examples would not work.

Career note: Data annotation and quality review are growing career fields in India. Companies like iMerit, Appen, and Scale AI hire thousands of people in India for this work. Some students start freelancing in this field while still in college.

Section 4 of 8

🔎 Finding Patterns — What the Model Actually Does

Once the AI has labelled training data, it starts to learn. But what does that actually mean?

The model looks at an example and makes a guess.
If wrong → it adjusts itself slightly.
If right → it reinforces the pattern.
This repeats millions of times until the model gets consistently accurate.

⚖️

Think of it like a weighing scale

Inside the AI, there are millions of tiny "dials" called weights. When the model is wrong, it turns the dials slightly in the direction that would have made its answer less wrong. When it is right, it leaves them almost unchanged. After millions of adjustments, the dials end up in a position that produces correct answers for most examples.

Concrete example — learning to spot a dog:

The model sees a photo. It guesses: "cat".
The label says: "dog". The model was wrong.
The model adjusts its dials slightly: "I should have looked more at the ears. And the body shape. Less at the colour."
Next time it sees floppy ears + large body, it guesses "dog" — and gets it right.
After 50,000 photos, it has a very clear sense of what makes a dog look like a dog.
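The guess-and-adjust loop above can be sketched in a few lines of Python. This is a perceptron-style update, a much simpler relative of what large models use, with only two dials instead of millions. The two features (ear floppiness and body size, each scored from 0 to 1) and all the numbers are assumptions made up for this example.

```python
# Each example: (ear floppiness, body size), label 1 = dog, 0 = cat.
# These values are invented for illustration.
training_data = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
]

def predict(weights, bias, features):
    """Guess 1 (dog) if the weighted sum is positive, else 0 (cat)."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(data, passes=20, step=0.1):
    weights, bias = [0.0, 0.0], 0.0          # all dials start at zero
    for _ in range(passes):                  # study the same data many times
        for features, label in data:
            error = label - predict(weights, bias, features)  # 0 when right
            # Wrong guess: nudge each dial a little. Right guess: no change.
            weights = [w + step * error * x for w, x in zip(weights, features)]
            bias += step * error
    return weights, bias

weights, bias = train(training_data)

new_labrador = (0.85, 0.9)   # a photo the model has never seen
print("dog" if predict(weights, bias, new_labrador) == 1 else "cat")
```

Notice that the code never states what a dog looks like. The final dial positions come entirely from the examples and the corrections.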

The key insight: The model does not learn the rules the way a textbook explains them. It discovers its own internal rules by trial and error. These internal rules are often so complex that even the people who built the AI cannot fully explain why it makes a specific decision. This is called the "black box" problem.

Real example of "unknown rules": When researchers studied a skin disease detection AI, it was making highly accurate diagnoses — but also using the presence of a ruler in the image as a clue. Why? Because dermatologists often photograph serious lesions next to a ruler for scale. The AI had learned "ruler = probably serious" from the data — a pattern no doctor intended to teach it.

Section 5 of 8

📊 Testing and Accuracy — How Do We Know It Learned Correctly?

After training, how do we know the AI actually learned the right thing? We test it.

Before training begins, the team sets aside a portion of the labelled data and keeps it hidden from the AI during training. This is called the test set. After training, they check the AI's answers on this unseen data.

Collect Data

Gather images, text, or sensor readings

Label Data

Humans tag each example with the right answer

Split Data

80% for training, 20% kept hidden for testing

Train Model

AI studies training data and adjusts its weights

Test Model

AI answers questions on the hidden test set

Measure Accuracy

Count correct answers ÷ total questions × 100%
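The split, train, test, and measure steps can be sketched in Python as follows. The data (ear lengths in centimetres), the tiny "model", and the function names are assumptions for illustration only. The 80/20 split and the accuracy formula are the ones described above.

```python
import random

def split_data(examples, test_fraction=0.2, seed=0):
    """Shuffle, then hide a fraction of the examples as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

def train(examples):
    """A toy model: put the cut-off halfway between cats and dogs."""
    biggest_cat = max(x for x, label in examples if label == "cat")
    smallest_dog = min(x for x, label in examples if label == "dog")
    cut = (biggest_cat + smallest_dog) / 2
    return lambda x: "dog" if x > cut else "cat"

def accuracy(model, examples):
    """Correct answers divided by total questions, times 100."""
    correct = sum(model(x) == label for x, label in examples)
    return 100.0 * correct / len(examples)

# Hypothetical labelled data: (ear length in cm, correct answer)
labelled = [(cm, "dog" if cm > 8 else "cat") for cm in range(1, 21)]

train_set, test_set = split_data(labelled)
model = train(train_set)          # the model only ever sees train_set

print("Training examples:", len(train_set))
print("Hidden test examples:", len(test_set))
print("Accuracy on the test set:", accuracy(model, test_set), "%")
```

The important design choice is that `train` is never given `test_set`. If it were, the test would no longer show how the model handles data it has not seen.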

What does accuracy mean in practice?

95% accuracy on face unlock means it identifies you correctly 19 out of 20 times — and wrongly once. For a phone, this is acceptable.
90% accuracy on a cancer detection AI means 1 in 10 of its answers is wrong. Some of those errors may be missed cancer cases. For medicine, this may not be acceptable.
99.9% accuracy on an aircraft control system means it still fails 1 in 1000 times. For safety-critical systems, this is never acceptable on its own.

Accuracy alone is not enough. An AI trained to detect a rare disease (that appears in 1% of patients) can get 99% accuracy by just saying "no disease" for every patient — because 99% of the time, that would be correct! High accuracy numbers can be misleading. Always ask: accurate at what task, on what data, in what conditions?
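The rare-disease example is easy to check with a little arithmetic in Python. The 1,000 patients below are hypothetical, with 1% having the disease as in the text. The second number, the share of sick patients actually found, is often called recall.

```python
# 1,000 hypothetical patients: 10 have the disease (1%), 990 are healthy.
true_labels = ["disease"] * 10 + ["healthy"] * 990

def lazy_model(patient):
    """A useless 'AI' that says 'healthy' without looking at the patient."""
    return "healthy"

predictions = [lazy_model(patient) for patient in true_labels]

correct = sum(p == t for p, t in zip(predictions, true_labels))
accuracy = 100 * correct / len(true_labels)

sick_found = sum(
    p == "disease" and t == "disease"
    for p, t in zip(predictions, true_labels)
)
recall = 100 * sick_found / true_labels.count("disease")

print("Accuracy:", accuracy, "%")            # 99.0 %
print("Sick patients found:", recall, "%")   # 0.0 %
```

The model scores 99% accuracy and still finds none of the sick patients, which is why a single accuracy number needs a second look.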

Section 6 of 8

⚠️ What Can Go Wrong — Biased Data and Overfitting

AI learns from data. This means AI can only be as fair and accurate as the data it learned from. Three common problems are biased data, overfitting, and missing data.

Problem 1: Biased Data

If the training data does not fairly represent everyone, the AI will work well for some groups and badly for others.

Real example (face recognition): In 2018, an MIT study found that a commercial face recognition AI had about 99% accuracy for light-skinned men but only about 65% accuracy for dark-skinned women. Why? Most likely because the training data had far more photos of light-skinned men than dark-skinned women. The AI learned from what it saw most.

This matters in India: AI systems trained mostly on Western data sometimes fail to recognise Indian faces with the same accuracy.
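One way teams look for this kind of problem is to measure accuracy separately for each group instead of reporting one overall number. Here is a minimal Python sketch. The group names and test results are made up to echo the pattern described above; they are not the study's actual data.

```python
from collections import defaultdict

def accuracy_by_group(results):
    """results is a list of (group, was_correct) pairs."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, was_correct in results:
        totals[group] += 1
        correct[group] += was_correct      # True counts as 1, False as 0
    return {group: 100 * correct[group] / totals[group] for group in totals}

# Invented test results. group_2 also has far fewer examples.
results = (
    [("group_1", True)] * 99 + [("group_1", False)] * 1
    + [("group_2", True)] * 13 + [("group_2", False)] * 7
)

overall = 100 * sum(ok for _, ok in results) / len(results)
by_group = accuracy_by_group(results)

print("Overall accuracy:", round(overall, 1), "%")
print("Accuracy by group:", by_group)
```

The overall number looks healthy, while the per-group numbers show that one group is served much worse than the other.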

Problem 2: Overfitting

Overfitting happens when the AI learns the training data too well — including the mistakes and quirks — and then fails on new data it has never seen.

Analogy: Imagine a student who memorised every question and answer from last year's question paper. They would score 100% on that exact paper — but fail if even one question was rephrased. They "overfitted" to the exam paper instead of understanding the subject. AI can make the same mistake.
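The exam analogy can be acted out in Python. In this sketch, the "memoriser" stores every training example exactly, like the student with last year's paper, and falls back on an arbitrary default guess for anything new. The "simple rule" stands in for a model that captured the general pattern. The data and the cut-off of 7 are assumptions for illustration.

```python
# Hypothetical data: (ear length in cm, correct answer)
train_set = [(2, "cat"), (3, "cat"), (5, "cat"),
             (9, "dog"), (11, "dog"), (13, "dog")]
test_set = [(4, "cat"), (10, "dog"), (12, "dog"), (1, "cat")]

memory = dict(train_set)

def memoriser(ear_length_cm):
    """Looks up the exact example. Anything unseen gets a default guess."""
    return memory.get(ear_length_cm, "cat")

def simple_rule(ear_length_cm):
    """A general pattern: long ears usually mean dog."""
    return "dog" if ear_length_cm > 7 else "cat"

def accuracy(model, examples):
    correct = sum(model(x) == label for x, label in examples)
    return 100.0 * correct / len(examples)

for name, model in [("memoriser", memoriser), ("simple rule", simple_rule)]:
    print(name,
          "| training:", accuracy(model, train_set), "%",
          "| test:", accuracy(model, test_set), "%")
```

Both models are perfect on the training data. Only the gap between training accuracy and test accuracy reveals which one actually learned something general.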

Problem 3: Missing Data

If something important is not in the training data, the AI will not know it exists. Early speech recognition AIs in India struggled with regional accents because very few training recordings came from rural areas or smaller cities. The AI was not biased on purpose — it just had not seen enough examples from those places.

The fix: Better, more diverse, more representative data. This is an active challenge in Indian AI. Collecting enough high-quality data in Telugu, Kannada, Odia, Marathi, and other languages is ongoing work by universities, C-DAC, research groups like AI4Bharat, and startups like Karya.

Hands-On Activity · No computer needed

🎮 Be the AI — Sort Shapes Without Rules

This activity lets you experience what it feels like to learn from examples — just like an AI model does. No computer needed. Works with pencil and paper, or draw the shapes on small cards.

Setup: Your teacher or parent creates two piles of shapes. Pile A has 10 shapes labelled "Group A". Pile B has 10 shapes labelled "Group B". You do not know the rule — just like an AI at the start of training.

🔵

Group A examples

Blue circle, blue square, blue triangle

🔴

Group B examples

Red circle, red square, red triangle

❓

Test shapes

New shapes you have not seen before

How to play:

Study the Group A and Group B examples carefully. Notice any patterns — colour, shape, size, whatever you can spot.
Write down your guess for the rule in your notebook. Do not share it yet.
The teacher shows you 5 new "test" shapes. Classify each one as Group A or Group B using your rule.
After everyone finishes, the teacher reveals the actual rule. How did you do?

Variation — make it harder:

Mix colours and shapes unpredictably (the rule might be shape, not colour)
Add a third group (Group C) with overlapping features
Try with only 3 examples instead of 10 — see how accuracy drops with less data

Discuss after the activity:

What happened when you had more examples to study? Did your accuracy improve?
Did anyone pick a rule that fit every example in the piles but failed on the test shapes? (This is overfitting!)
What would happen if all Group A examples were only blue squares — and then you got a blue triangle in the test? (This is biased data!)

🎯 Congratulations — you just simulated supervised machine learning. The examples = training data. The labels (Group A / Group B) = annotations. Your rule-guessing = the model finding patterns. The test shapes = the test set.
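If you are curious how a computer might play the same game, here is a small Python sketch. It uses the example piles from the activity and looks for a feature whose values never overlap between Group A and Group B. The function names and the "star" test shape are invented for illustration, and real models find patterns in a far less tidy way.

```python
group_a = [
    {"colour": "blue", "shape": "circle"},
    {"colour": "blue", "shape": "square"},
    {"colour": "blue", "shape": "triangle"},
]
group_b = [
    {"colour": "red", "shape": "circle"},
    {"colour": "red", "shape": "square"},
    {"colour": "red", "shape": "triangle"},
]

def find_separating_features(a, b):
    """Return the features whose values never overlap between the piles."""
    found = []
    for feature in a[0]:
        values_a = {item[feature] for item in a}
        values_b = {item[feature] for item in b}
        if values_a.isdisjoint(values_b):
            found.append(feature)
    return found

def classify(item, feature, a):
    """Put the item in Group A if its value was seen in pile A."""
    seen_in_a = {example[feature] for example in a}
    return "Group A" if item[feature] in seen_in_a else "Group B"

features = find_separating_features(group_a, group_b)
print("Features that separate the piles:", features)

test_shape = {"colour": "blue", "shape": "star"}   # never seen before
print("Test shape goes to:", classify(test_shape, features[0], group_a))
```

Shape cannot be the rule here because circles, squares, and triangles appear in both piles, so the only pattern left is colour.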

Section 8 of 8

🚀 From Data to App — How a Trained Model Becomes YouTube

You have now seen all the pieces. Let us put them together to understand how a real AI product — like the YouTube recommendation engine — gets built from scratch.

Collect

Trillions of YouTube watch events collected every day

Label

Each video: topic, language, audience, quality score

Train

Model learns: "users who watched X also loved Y"

Test

Test on a held-out group: did they watch recommended videos longer?

Deploy

Model runs live, making 80 billion recommendations per day

Retrain

New data flows in constantly → model updates continuously

The critical thing to notice: this cycle never stops. The model is always learning from new data. The YouTube you see today is subtly different from the YouTube that existed last week — because new watch patterns keep retraining the system.

This is also why AI systems can change suddenly. If millions of users suddenly start watching a new type of content, the model retrains and starts recommending it to everyone. Trends on social media can spread with extraordinary speed because AI systems learn and amplify them almost in real time.
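The "users who watched X also loved Y" idea, and the way retraining changes the results, can be sketched in Python by simply counting which videos are watched together. This is a toy illustration with made-up watch histories. It is not how YouTube's actual system is built.

```python
from collections import Counter, defaultdict
from itertools import permutations

def train(watch_histories):
    """Count how often each pair of topics appears in the same history."""
    also_watched = defaultdict(Counter)
    for history in watch_histories:
        for first, second in permutations(set(history), 2):
            also_watched[first][second] += 1
    return also_watched

def recommend(model, video):
    """Suggest the topic most often watched alongside this one."""
    ranked = model.get(video, Counter()).most_common(1)
    return ranked[0][0] if ranked else None

# Hypothetical watch histories from last week
last_week = [["cricket", "cooking"], ["cricket", "cooking"], ["cricket", "music"]]
model = train(last_week)
before = recommend(model, "cricket")

# New data arrives: many users now watch cricket and music together
this_week = [["cricket", "music"]] * 3
model = train(last_week + this_week)   # retrain on old + new data
after = recommend(model, "cricket")

print("Recommendation before retraining:", before)
print("Recommendation after retraining:", after)
```

The code did not change between the two runs. Only the data did, and that was enough to change what gets recommended.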

Bridge to Lesson 4: Now you know how AI learns. In the next lesson we take a tour of the different types of AI tools — chatbots, image generators, voice assistants, recommendation engines, and more — and understand which kind of learning each one uses.

🎯 Quick Quiz — 10 Questions to Check What You Learned

Q1. Priya's brother Arjun learned to identify a dog just from examples — without anyone writing the rules. What does this represent in AI terms?

Q2. What is training data?

Q3. Thousands of workers looked at paddy leaf photos and tagged each one as "healthy" or "diseased" so Plantix could learn. What is this process called?

Q4. How does an AI model "adjust" itself when it gets a wrong answer during training?

Q5. Before testing an AI, engineers keep some labelled data separate and never show it to the model during training. What is this held-out data called?

Q6. A skin cancer detection AI learned to associate a ruler in a photo with "likely serious case" — because doctors photograph serious cases next to rulers. What type of AI problem is this?

Q7. A face recognition AI worked well for light-skinned men but poorly for dark-skinned women. What was the most likely cause?

Q8. A student memorised last year's exact exam questions and answers. On the new exam with rephrased questions, she failed. Which AI problem does this represent?

Q9. Why do AI voice assistants sometimes work better in English than in Telugu or Hindi?

Q10. After an AI model is deployed in an app like YouTube, does it ever stop learning?


📝 Activity Sheet — Map a Real AI to Its Learning Pipeline


Choose any AI tool you use regularly (YouTube, Maps, autocorrect, face unlock, weather app, Plantix, etc.) and fill in the pipeline below for that tool in your notebook.

Step	Question	Your Answer
1. Data	What type of data does this AI learn from?
2. Label	How would a human label that data? (e.g., "cat/dog", "positive/negative")
3. Amount	How much data do you think it needed? (hundreds / millions / billions)
4. Pattern	What pattern do you think the AI learned? Describe in one sentence.
5. Risk	What biased or incomplete data could make this AI unfair or inaccurate?

Reflection (write in notebook):

In the hands-on activity, what happened when you had fewer examples to study? Did your guesses become less accurate?
Can you think of an AI tool that may have been trained without enough Indian data? How might that affect its usefulness for Indian users?
If you were building an AI to help farmers identify pests in their crops, what data would you collect, and how would you make sure it was fair for farmers across all states of India?

Use this table in your notebook today, or print this page directly if helpful.

👨‍👩‍👧 For Parents and Teachers

What this lesson covers: This is Lesson 3 of 12 in the Class 6 full-year AI curriculum. Students learn how machine learning works (training data, labelling/annotation, pattern finding, testing, accuracy, biased data, and overfitting) through story, analogy, and a hands-on group activity. India-specific examples (Plantix, IMD, AI4Bharat) appear throughout.

Learning time: Around 60–75 minutes. The hands-on "Sort Shapes" activity works best as a 15-minute group exercise and can be run with paper and pencil — no technology needed.

Key concepts introduced for the first time:

Machine learning — learning from data vs. following programmed rules
Training data and annotation — the human labour behind every AI system
Weights / model adjustment — a simple analogy for gradient descent
Test set and accuracy — how AI performance is measured and why it can mislead
Bias and overfitting — two fundamental failure modes students will encounter throughout their lives

For classroom use: The hands-on activity can be turned into a 2-day project. Day 1: students work through the lesson. Day 2: shape-sorting activity in groups, then class discussion on what "more data" does to accuracy. Connecting it to the real Plantix or IMD examples makes the abstract concrete for Indian classrooms.

Safety by design: No personal data collected from students on this page. No login required. All examples use age-appropriate, school-safe contexts.