Priya's Teaching Problem 🐱🐶
Priya's little brother Arjun is four years old. One afternoon, Priya showed him pictures from a book and tried to teach him the difference between a cat and a dog.
"This is a cat," she said, pointing at a fluffy orange animal. "And this is a dog." She pointed at a golden retriever.
Arjun looked carefully. "Okay," he said.
She showed him twenty more pictures. Cat. Dog. Cat. Dog. Cat. Each time, she told him the answer.
Then she tested him. She showed him a new picture of a Labrador he had never seen. He looked at it for a moment. "Dog!" he said.
"Right!" said Priya. "How did you know?"
Arjun shrugged. "Big, floppy ears. Tongue out. Looks like the others."
Priya stopped. She realised she had never told him the rules. Nobody said "four legs + floppy ears + wagging tail = dog". He just looked at enough examples and found the pattern himself.
🧠 What Does "Learning" Mean for a Computer?
There is an important difference between two things a computer can do:
Following Rules (traditional programs)
- A programmer writes every rule
- Example rule: if big ears + tail + fur → label "dog"
- New situations break the rules
- Can only do exactly what it was told
- Can't handle things it hasn't seen
Learning (AI / Machine Learning)
- Nobody writes the rules
- The AI finds patterns itself from examples
- Works on new situations it has never seen
- Gets better with more data
- Makes mistakes — just like a student does
The type of AI that learns from examples is called Machine Learning (ML). The word "machine" just means computer. The word "learning" means exactly what it sounds like — the computer improves by looking at data.
Why is this powerful? There are tasks where it is impossible to write all the rules. No one can write a complete rule for recognising faces, understanding speech, or predicting rain. There are too many variables. Machine learning handles this by learning the pattern from data instead of rules.
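Here is a small Python sketch of the difference between a written rule and a learned one. Everything in it is made up for illustration: the single measurement (ear length in centimetres), the example numbers, and the function names are assumptions, not part of any real AI system. The first function uses a cut-off typed in by a programmer. The second works out its own cut-off from labelled examples.

```python
# Written rule: a programmer picks the cut-off by hand.
def rule_based_label(ear_length_cm):
    return "dog" if ear_length_cm > 8 else "cat"


# Learned rule: the cut-off is worked out from labelled examples.
def learn_threshold(examples):
    cat_values = [value for value, label in examples if label == "cat"]
    dog_values = [value for value, label in examples if label == "dog"]
    cat_average = sum(cat_values) / len(cat_values)
    dog_average = sum(dog_values) / len(dog_values)
    # Place the cut-off halfway between the average cat and the average dog.
    return (cat_average + dog_average) / 2


# Invented training examples: (ear length in cm, label)
examples = [(4, "cat"), (5, "cat"), (6, "cat"),
            (10, "dog"), (12, "dog"), (14, "dog")]

threshold = learn_threshold(examples)


def learned_label(ear_length_cm):
    return "dog" if ear_length_cm > threshold else "cat"


print("Learned cut-off:", threshold)
print("New animal with 11 cm ears ->", learned_label(11))
```

If the examples change, the learned cut-off changes with them, and nobody has to rewrite the code. Real systems learn from far more than one measurement, but the idea is the same.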
📦 Training Data — The Textbook the AI Studies From
Before an AI can learn anything, it needs data. Lots of it. This is called training data — the collection of examples the AI uses to learn.
Think of training data as the textbook the AI reads. But instead of reading it once, the AI "reads" the same data hundreds or thousands of times, finding slightly better patterns each time.
What does training data look like?
| AI System | What the Training Data Contains | How Much Data |
|---|---|---|
| Face unlock | Millions of labelled face photos: "Person A, Person B, Person A..." | 10+ million images |
| Autocorrect keyboard | Billions of real sentences written by millions of people | 100+ billion words |
| YouTube recommendation | Watch history, clicks, completion rates of billions of users | Trillions of events per day |
| Weather AI (IMD) | 100+ years of temperature, rainfall, wind, satellite readings across India | Petabytes of sensor data |
| Plantix (crop disease) | Labelled photos of diseased and healthy crop leaves | 1 million+ plant images |
🏷️ Labelling — How Humans Teach AI the Answers
Training data alone is not enough. The AI needs to know the right answer for each example. This process of telling the AI "this one is a cat" or "this one has a disease" is called labelling.
Human beings do the labelling. Workers around the world look at images, audio clips, and text, and tag each one with the correct answer. This work is called data annotation.
Types of labelling work:
Image Labelling
Tag what is in a photo: "cat", "dog", "car", "diseased leaf", "face of person A".
Text Labelling
Tag the meaning of text: "this review is positive", "this sentence is a question", "this word is a place name".
Audio Labelling
Write out what is being said in an audio clip. Used to train voice recognition (like Google Assistant or Alexa).
Drawing Boxes
Draw a rectangle around every object in a photo. Used to train AI that needs to locate things — like self-driving cars spotting pedestrians.
This labelling work is done by millions of workers worldwide, including many in India. It is one of the largest digital jobs in the country. It is not glamorous — but without it, no AI system would work.
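The four kinds of labels described above can be pictured as small records, one per example. This Python sketch is only an illustration: the file names, field names, and box numbers are invented, and real annotation tools use their own formats.

```python
# One invented record for each type of labelling work.
labelled_examples = [
    # Image labelling: what is in the photo?
    {"kind": "image", "item": "photo_001.jpg", "label": "dog"},
    # Text labelling: what does the text mean?
    {"kind": "text", "item": "The food was great!", "label": "positive"},
    # Audio labelling: what is being said?
    {"kind": "audio", "item": "clip_001.wav", "label": "what is the weather today"},
    # Drawing boxes: where is the object? (position and size in pixels)
    {"kind": "box", "item": "street_001.jpg",
     "label": {"object": "pedestrian", "x": 40, "y": 60, "width": 30, "height": 80}},
]

for example in labelled_examples:
    print(example["kind"], ":", example["item"], "->", example["label"])
```

Each record pairs an example with the answer a human gave for it. That pairing is what the AI studies.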
🔎 Finding Patterns — What the Model Actually Does
Once the AI has labelled training data, it starts to learn. But what does that actually mean?
- The model looks at an example and makes a guess.
- If wrong → it adjusts itself slightly.
- If right → it reinforces the pattern.
- This repeats millions of times until the model gets consistently accurate.
Think of it like tuning dials
Inside the AI, there are millions of tiny "dials" called weights. When the model is wrong, it turns the dials slightly. When it is right, it keeps them the same. After millions of adjustments, the dials end up in a position that produces correct answers for most examples.
Concrete example — learning to spot a dog:
- The model sees a photo. It guesses: "cat".
- The label says: "dog". The model was wrong.
- The model adjusts its dials slightly: "I should have looked more at the ears. And the body shape. Less at the colour."
- Next time it sees floppy ears + large body, it guesses "dog" — and gets it right.
- After 50,000 photos, it has a very clear sense of what makes a dog look like a dog.
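The guess, check, and adjust loop above can be sketched in a few lines of Python. This is a deliberately tiny version with only two dials plus one extra, and it makes several assumptions: each photo is reduced to two invented numbers (ears and body size), and the dials are adjusted with a simple classic rule called the perceptron update. Real image models have millions of weights and use more advanced methods.

```python
# Each photo is reduced to two made-up numbers:
#   ears: 1.0 = floppy, 0.0 = pointy
#   body size: 0.0 = tiny, 1.0 = large
# The label is 1 for "dog" and 0 for "cat".
photos = [
    ((1.0, 0.9), 1),
    ((1.0, 0.7), 1),
    ((0.0, 0.3), 0),
    ((0.0, 0.2), 0),
    ((1.0, 0.6), 1),
    ((0.0, 0.4), 0),
]

weights = [0.0, 0.0]   # the "dials", all starting at zero
bias = 0.0             # one extra dial that shifts every guess
step = 0.1             # how far a dial turns after one mistake


def guess(features):
    """Return 1 ("dog") if the weighted score is positive, else 0 ("cat")."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


mistakes_per_round = []
for round_number in range(10):           # look through the data up to 10 times
    mistakes = 0
    for features, label in photos:
        error = label - guess(features)  # 0 if right, +1 or -1 if wrong
        if error != 0:
            mistakes += 1
            # Wrong guess: turn each dial a little towards the right answer.
            for i, x in enumerate(features):
                weights[i] += step * error * x
            bias += step * error
        # Right guess: the dials stay where they are.
    mistakes_per_round.append(mistakes)
    if mistakes == 0:                    # a full round with no mistakes
        break

print("Mistakes in each round:", mistakes_per_round)
print("New floppy-eared, large animal ->", "dog" if guess((1.0, 0.8)) else "cat")
```

The model starts out guessing "cat" for everything. Each wrong guess nudges the dials, and the loop stops once a whole round passes with no mistakes.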
📊 Testing and Accuracy — How Do We Know It Learned Correctly?
After training, how do we know the AI actually learned the right thing? We test it.
Before training begins, the team sets aside a portion of the labelled data and keeps it hidden from the AI during training. This is called the test set. After training, they check the AI's answers on this unseen data.
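Setting aside a test set can be sketched in Python like this. The file names, the 20 invented photos, and the choice to hold back one fifth of them are assumptions for illustration. Real teams choose their own split sizes.

```python
import random

# 20 invented labelled photos: (file name, label)
labelled_photos = [(f"photo_{i:03d}.jpg", "dog" if i % 2 else "cat")
                   for i in range(20)]

# Shuffle a copy so the test set is a fair mix, not just the first few photos.
shuffled = labelled_photos[:]
random.Random(42).shuffle(shuffled)   # fixed seed so the result is repeatable

test_size = len(shuffled) // 5        # hold back one fifth of the data
test_set = shuffled[:test_size]       # hidden until training is finished
training_set = shuffled[test_size:]   # the only data the AI studies

print("Training examples:", len(training_set))
print("Test examples:", len(test_set))
```

The important part is that no photo appears in both lists. If the AI had already studied a test photo, a correct answer on it would prove nothing.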
What does accuracy mean in practice?
- 95% accuracy on face unlock means it identifies you correctly 19 out of 20 times — and wrongly once. For a phone, this is acceptable.
- 90% accuracy on a cancer detection AI means it gets 1 in 10 scans wrong, and some of those mistakes may be missed cancer cases. For medicine, this may not be acceptable.
- 99.9% accuracy on an aircraft control system means it still fails 1 in 1000 times. For safety-critical systems, this is never acceptable on its own.
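The arithmetic behind these percentages is simple enough to check in Python. The sketch below uses invented answer lists. The second half shows why a single accuracy number can mislead: a useless model that always answers "healthy" still scores 90% when only 10 out of 100 scans show cancer.

```python
def accuracy(guesses, answers):
    """Fraction of guesses that match the correct answers."""
    correct = sum(1 for g, a in zip(guesses, answers) if g == a)
    return correct / len(answers)


# Face unlock: 20 attempts by the owner, 19 recognised correctly.
unlock_answers = ["owner"] * 20
unlock_guesses = ["owner"] * 19 + ["stranger"]
unlock_accuracy = accuracy(unlock_guesses, unlock_answers)   # 19 / 20 = 0.95

# Medical scans: 100 invented scans, 10 of which really show cancer.
scan_answers = ["cancer"] * 10 + ["healthy"] * 90
lazy_guesses = ["healthy"] * 100      # a model that always says "healthy"
lazy_accuracy = accuracy(lazy_guesses, scan_answers)         # 90 / 100 = 0.9

missed_cases = sum(1 for g, a in zip(lazy_guesses, scan_answers)
                   if a == "cancer" and g != "cancer")

print("Face unlock accuracy:", unlock_accuracy)
print("Always-healthy model accuracy:", lazy_accuracy)
print("Cancer cases it missed:", missed_cases, "out of 10")
```

This is why teams look at which mistakes a model makes, not only how many.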
⚠️ What Can Go Wrong — Biased Data and Overfitting
AI learns from data. This means AI can only be as fair and accurate as the data it learned from. Three common problems are biased data, overfitting, and missing data.
Problem 1: Biased Data
If the training data does not fairly represent everyone, the AI will work well for some groups and badly for others.
This matters in India: AI systems trained mostly on Western data sometimes fail to recognise Indian faces with the same accuracy.
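One way to notice this kind of problem is to measure accuracy separately for each group instead of looking at one overall number. The Python sketch below uses invented counts and hypothetical group names purely for illustration. It does not describe any real system.

```python
# Each entry: (group, was the AI's answer correct?). All counts are invented.
results = ([("group_a", True)] * 48 + [("group_a", False)] * 2 +
           [("group_b", True)] * 7 + [("group_b", False)] * 3)


def accuracy_for(group):
    """Accuracy using only the test examples from one group."""
    outcomes = [ok for g, ok in results if g == group]
    return sum(outcomes) / len(outcomes)


overall = sum(ok for _, ok in results) / len(results)

print("Overall accuracy:", round(overall, 2))
print("Group A accuracy:", accuracy_for("group_a"))
print("Group B accuracy:", accuracy_for("group_b"))
```

In this made-up case the overall score looks good, yet Group B gets wrong answers far more often. Group B also has far fewer examples, which is often how the gap starts.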
Problem 2: Overfitting
Overfitting happens when the AI learns the training data too well — including the mistakes and quirks — and then fails on new data it has never seen.
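An extreme form of overfitting is pure memorising. The Python sketch below compares two hypothetical "models" on invented data: one memorises every training photo exactly, and the other uses a simple general rule. The features (ears and body size), the numbers, and the function names are all assumptions for illustration.

```python
# (ears: 1 = floppy, 0 = pointy; body size from 0 to 1), label
training = [((1, 0.9), "dog"), ((0, 0.3), "cat"),
            ((1, 0.7), "dog"), ((0, 0.2), "cat")]
new_photos = [((1, 0.8), "dog"), ((0, 0.25), "cat"),
              ((1, 0.6), "dog"), ((1, 0.95), "dog")]

memory = dict(training)


def memoriser(features):
    """Recalls the exact training photo; guesses "cat" for anything unfamiliar."""
    return memory.get(features, "cat")


def simple_rule(features):
    """A general pattern: floppy ears mean dog."""
    return "dog" if features[0] == 1 else "cat"


def score_on(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(features) == label for features, label in data) / len(data)


print("Memoriser on training data:", score_on(memoriser, training))
print("Memoriser on new photos:", score_on(memoriser, new_photos))
print("Simple rule on training data:", score_on(simple_rule, training))
print("Simple rule on new photos:", score_on(simple_rule, new_photos))
```

Both models are perfect on the training data. Only the general rule stays accurate on photos it has never seen, which is why a high training score alone is not proof of learning.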
Problem 3: Missing Data
If something important is not in the training data, the AI will not know it exists. Early speech recognition AIs in India struggled with regional accents because very few training recordings came from rural areas or smaller cities. The AI was not biased on purpose — it just had not seen enough examples from those places.
🎮 Be the AI — Sort Shapes Without Rules
This activity lets you experience what it feels like to learn from examples — just like an AI model does. No computer needed. Works with pencil and paper, or draw the shapes on small cards.
Setup: Your teacher or parent creates two piles of shapes. Pile A has 10 shapes labelled "Group A". Pile B has 10 shapes labelled "Group B". You do not know the rule — just like an AI at the start of training.
How to play:
- Study the Group A and Group B examples carefully. Notice any patterns — colour, shape, size, whatever you can spot.
- Write down your guess for the rule in your notebook. Do not share it yet.
- The teacher shows you 5 new "test" shapes. Classify each one as Group A or Group B using your rule.
- After everyone finishes, the teacher reveals the actual rule. How did you do?
Variation — make it harder:
- Mix colours and shapes unpredictably (the rule might be shape, not colour)
- Add a third group (Group C) with overlapping features
- Try with only 3 examples instead of 10 — see how accuracy drops with less data
Discuss after the activity:
- What happened when you had more examples to study? Did your accuracy improve?
- Did anyone pick a rule that matched every example in the piles but failed on the new test shapes? (This is overfitting!)
- What would happen if all Group A examples were only blue squares — and then you got a blue triangle in the test? (This is biased data!)
🎯 Congratulations — you just simulated supervised machine learning. The examples = training data. The labels (Group A / Group B) = annotations. Your rule-guessing = the model finding patterns. The test shapes = the test set.
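For teachers or older students who want to see the blue triangle question in code, here is a short Python sketch. The shapes and the two candidate rules are invented for illustration. It shows that when every Group A example is a blue square, a colour rule and a shape rule both fit the examples perfectly, and only a new kind of shape reveals that they disagree.

```python
# Biased examples: every Group A shape is a blue square.
group_a = [("blue", "square")] * 5
group_b = [("red", "circle")] * 5
examples = [(shape, "A") for shape in group_a] + [(shape, "B") for shape in group_b]


def colour_rule(shape):
    """Guess: blue shapes belong to Group A."""
    return "A" if shape[0] == "blue" else "B"


def shape_rule(shape):
    """Guess: squares belong to Group A."""
    return "A" if shape[1] == "square" else "B"


def fits_all_examples(rule):
    return all(rule(shape) == group for shape, group in examples)


print("Colour rule fits every example:", fits_all_examples(colour_rule))
print("Shape rule fits every example:", fits_all_examples(shape_rule))

test_shape = ("blue", "triangle")
print("Colour rule says:", colour_rule(test_shape))
print("Shape rule says:", shape_rule(test_shape))
```

The examples alone cannot tell the learner which rule is right. Adding a few blue circles or red squares to the piles would settle it, which is the same reason real training data needs variety.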
🚀 From Data to App — How a Trained Model Becomes YouTube
You have now seen all the pieces. Let us put them together to understand how a real AI product — like the YouTube recommendation engine — gets built from scratch.
The critical thing to notice: this cycle never stops. The model is always learning from new data. The YouTube you see today is subtly different from the YouTube that existed last week — because new watch patterns keep retraining the system.
📝 Activity Sheet — Map a Real AI to Its Learning Pipeline
Choose any AI tool you use regularly (YouTube, Maps, autocorrect, face unlock, weather app, Plantix, etc.) and fill in the pipeline below for that tool in your notebook.
| Step | Question | Your Answer |
|---|---|---|
| 1. Data | What type of data does this AI learn from? | |
| 2. Label | How would a human label that data? (e.g., "cat/dog", "positive/negative") | |
| 3. Amount | How much data do you think it needed? (hundreds / millions / billions) | |
| 4. Pattern | What pattern do you think the AI learned? Describe in one sentence. | |
| 5. Risk | What biased or incomplete data could make this AI unfair or inaccurate? | |
Reflection (write in notebook):
- In the hands-on activity, what happened when you had fewer examples to study? Did your guesses become less accurate?
- Can you think of an AI tool that may have been trained without enough Indian data? How might that affect its usefulness for Indian users?
- If you were building an AI to help farmers identify pests in their crops, what data would you collect, and how would you make sure it was fair for farmers across all states of India?
Use this table in your notebook today, or print this page directly if helpful.
What this lesson covers: This is Lesson 3 of 12 in the Class 6 full-year AI curriculum. Students learn how machine learning works — training data, labelling/annotation, pattern finding, testing, accuracy, biased data, and overfitting — through story, analogy, and a hands-on group activity. India-specific examples (Plantix, Aadhaar, IMD, AI4Bharat) appear throughout.
Learning time: Around 60–75 minutes. The hands-on "Sort Shapes" activity works best as a 15-minute group exercise and can be run with paper and pencil — no technology needed.
Key concepts introduced for the first time:
- Machine learning — learning from data vs. following programmed rules
- Training data and annotation — the human labour behind every AI system
- Weights / model adjustment — a simple analogy for gradient descent
- Test set and accuracy — how AI performance is measured and why it can mislead
- Bias and overfitting — two fundamental failure modes students will encounter throughout their lives
For classroom use: The hands-on activity can be turned into a 2-day project. Day 1: students work through the lesson. Day 2: shape-sorting activity in groups, then class discussion on what "more data" does to accuracy. Connecting it to the real Plantix or IMD examples makes the abstract concrete for Indian classrooms.
Safety by design: No personal data collected from students on this page. No login required. All examples use age-appropriate, school-safe contexts.