Lesson 3 — How AI Learns: Classification and Prediction | Class 8

Story · Asha's Mango Sorting Machine

How Does a Machine Know Which Mango Is Ripe? 🥭

Asha, 13, from Rajahmundry, visited her aunt's mango orchard. Workers were sorting mangoes — ripe, unripe, or too ripe — by looking at colour, feeling firmness, and smelling the fruit.

A company had recently installed a sorting machine that used a camera and AI to sort mangoes automatically. Asha was curious: "Does the machine smell them too?"

The engineer laughed. "Not smell — but it analyses colour and shape with 98% accuracy. We showed it 50,000 photos of mangoes with labels: ripe, unripe, overripe. After training, it sorts better than a tired worker at 4am."

Asha asked the important question: "What happens if it sees a mango variety it has never been trained on?" The engineer nodded. "Good question. It will probably make mistakes. That is why we keep adding new training data when we expand to new varieties."

👉 Asha has just seen classification in action. This lesson explains how it works — and its sibling task, prediction — in simple terms.

Section 1 of 7

📂 Classification vs Prediction — The Two Big Tasks

Many of the tasks an AI model learns from labelled examples fall into one of two categories:

🏷️ Classification

Assign an input to one of a fixed set of categories. The output is a label.

Examples: spam/not spam · ripe/unripe/overripe · cat/dog/bird · disease/no disease · pass/fail

📈 Prediction (Regression)

Predict a number on a continuous scale. The output is a value.

Examples: predict exam score · predict tomorrow's rainfall · predict house price · predict energy consumption

Rule of thumb: If the output is a category (yes/no, type A/B/C), it is classification. If the output is a number on a scale, it is regression (prediction).

Section 2 of 7

🔢 How Classification Works: A Simple Walk-Through

Let us use a simple example: predicting whether a student will pass or fail, based on two features: hours studied per day and attendance percentage.

Collect labelled data. You gather records of 1,000 past students — each with study hours, attendance, and whether they passed or failed.

The model looks for a boundary. Imagine plotting students on a graph: x-axis = study hours, y-axis = attendance. Passing students cluster in one region; failing students in another. The model tries to find a line (or curve) that separates them.

That line is the decision boundary. For a new student, the model checks which side of the boundary their data falls on — and predicts accordingly.

Confidence scores. Most models do not just say "pass"; they say "72% chance of passing." This probability is called a confidence score. Decisions made with low confidence (e.g. 51% vs 49%) should be treated with more caution.

Test the boundary. The model is tested on students it has never seen, to check whether the boundary generalises to new students or has simply memorised the training data (a problem called overfitting).
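The walk-through above can be sketched in a few lines of Python. This is only an illustration, not a trained model: the three numbers `WEIGHT_HOURS`, `WEIGHT_ATTENDANCE` and `BIAS` are made-up assumptions, whereas a real model would learn them from the 1,000 labelled student records. The names `pass_probability` and `predict` are also invented for this sketch.

```python
import math

# Hypothetical values: a real model would learn these from labelled data.
WEIGHT_HOURS = 1.2        # how strongly each study hour pushes towards "pass"
WEIGHT_ATTENDANCE = 0.08  # how strongly each attendance point pushes towards "pass"
BIAS = -9.0               # shifts where the decision boundary sits


def pass_probability(study_hours, attendance_percent):
    """Return the estimated chance of passing, as a number between 0 and 1."""
    score = WEIGHT_HOURS * study_hours + WEIGHT_ATTENDANCE * attendance_percent + BIAS
    # The logistic function squeezes any score into the range 0 to 1.
    return 1 / (1 + math.exp(-score))


def predict(study_hours, attendance_percent):
    """Return a label together with the confidence in that label."""
    p = pass_probability(study_hours, attendance_percent)
    # The decision boundary is the set of points where p is exactly 0.5.
    if p >= 0.5:
        return "pass", p
    return "fail", 1 - p


label, confidence = predict(study_hours=3, attendance_percent=80)
print(f"{label} ({confidence:.0%} confident)")  # pass (73% confident)
```

With these assumed weights, a student who studies 3 hours a day with 80% attendance lands on the "pass" side of the boundary, but with only moderate confidence.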

Section 3 of 7

📊 Measuring How Good a Classifier Is

Saying a model is "95% accurate" sounds impressive. But accuracy alone can be misleading. Here is why.

Imagine 1,000 emails: 950 are not spam, 50 are spam. A model that simply labels everything as "not spam" would be 95% accurate — but useless at actually catching spam.

The confusion matrix

	Predicted: Spam	Predicted: Not Spam
Actually: Spam	True Positive (TP) ✅	False Negative (FN) ❌ — missed spam
Actually: Not Spam	False Positive (FP) ❌ — wrongly blocked	True Negative (TN) ✅

Key metrics from the confusion matrix

Accuracy: (TP + TN) / total — proportion of all predictions that were correct
Precision: TP / (TP + FP) — of all items predicted as spam, how many actually were?
Recall: TP / (TP + FN) — of all actual spam, how many did the model catch?
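To see these formulas at work, here is a small Python sketch. The function name `classification_metrics` is invented for this example. The first set of counts comes from the 1,000-email example above (a model that labels everything as "not spam"); the second set is made up purely for comparison.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision and recall are undefined when their denominator is zero,
    # so None is returned in that case instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# The "label everything as not spam" model: it never predicts spam,
# so TP = 0 and FP = 0, and all 50 spam emails are missed (FN = 50).
lazy_model = classification_metrics(tp=0, fp=0, fn=50, tn=950)
print(lazy_model)    # accuracy 0.95, precision None, recall 0.0

# A hypothetical better filter (these counts are invented for illustration).
better_model = classification_metrics(tp=40, fp=10, fn=10, tn=940)
print(better_model)  # accuracy 0.98, precision 0.8, recall 0.8
```

The lazy model scores 95% on accuracy but 0% on recall, which is exactly why accuracy alone can hide a useless spam filter.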

Why this matters: In medical diagnosis, high recall is critical, because missing a real disease (false negative) is dangerous. In fraud detection, precision matters too, because wrongly blocking a legitimate transaction (false positive) hurts trust. The right metric depends on the real-world consequences of each type of error.

Section 4 of 7

🌳 Decision Trees: AI You Can Read

One of the most intuitive ML models is the decision tree. It asks a series of yes/no questions about the features of an input and follows a branch for each answer until it reaches a conclusion.

Example decision tree for mango classification:

Is the mango yellow? → If YES → Is it firm? → If YES → RIPE; If NO → OVERRIPE
Is the mango yellow? → If NO → Is it mostly green? → If YES → UNRIPE; If NO → OVERRIPE
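The same tree can be written as a short Python function, which shows why decision trees are so easy to read. This is a hand-written sketch of the example tree only; the function name `classify_mango` and its three yes/no inputs are assumptions made for illustration, and a real system would learn its questions from training data.

```python
def classify_mango(is_yellow, is_firm, is_mostly_green):
    """Follow the example tree's yes/no questions until a label is reached."""
    if is_yellow:
        # Yellow branch: firmness decides between ripe and overripe.
        return "RIPE" if is_firm else "OVERRIPE"
    # Not-yellow branch: a mostly green mango is unripe, anything else is overripe.
    return "UNRIPE" if is_mostly_green else "OVERRIPE"


print(classify_mango(is_yellow=True, is_firm=True, is_mostly_green=False))   # RIPE
print(classify_mango(is_yellow=True, is_firm=False, is_mostly_green=False))  # OVERRIPE
print(classify_mango(is_yellow=False, is_firm=True, is_mostly_green=True))   # UNRIPE
```

Every prediction can be traced back to the exact questions that produced it.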

Decision trees are easy to understand and explain. A farmer or doctor can look at the tree and verify whether the rules make sense. This is called interpretability — and it is an important property for AI systems used in high-stakes decisions.

More complex models (like neural networks with millions of weights) often produce more accurate results but are much harder to explain. They are frequently called "black boxes".

Important trade-off: Better accuracy often comes with less explainability. Choosing the right balance depends on the application — a slightly less accurate model that you can explain may be far more trustworthy and safe to deploy than a highly accurate black box.

Section 5 of 7

📈 Prediction (Regression) — Estimating Values

When the output is a number, the task is called regression. The model tries to find a mathematical relationship between the input features and the output value.

Simple example: Predicting exam score from study hours.
If you studied 3 hours → model predicts 65 marks
If you studied 5 hours → model predicts 78 marks
The model finds the best-fitting line (or curve) through the training data points.
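A minimal sketch of fitting that line, using the standard least-squares method in plain Python. The six training points below are invented for illustration (chosen so the predictions come out close to the 65 and 78 marks mentioned above), and the names `fit_line` and `predict_marks` are assumptions of this sketch.

```python
def fit_line(xs, ys):
    """Least-squares fit of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how the two quantities vary together, divided by the spread of x.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied per day and marks scored.
study_hours = [1, 2, 3, 4, 5, 6]
exam_marks = [50, 60, 64, 73, 77, 84]

slope, intercept = fit_line(study_hours, exam_marks)


def predict_marks(hours):
    """Use the fitted line to estimate marks for a given number of study hours."""
    return slope * hours + intercept


print(round(predict_marks(3)))  # 65
print(round(predict_marks(5)))  # 78
```

Unlike a classifier, the output here is a number on a scale, so the model can also give an estimate for values it never saw in training, such as 4.5 hours.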

Real Indian examples of prediction (regression)

Prediction task	Input features	Predicted output
Crop yield prediction	Rainfall, temperature, soil type, fertiliser	Expected harvest in tonnes/hectare
Electricity demand forecasting	Time of day, day of week, season, temperature	Expected demand in MW
Flood risk estimation	Rainfall over 48 hours, river level, soil moisture	Probability of flooding (0–100%)
Hospital readmission risk	Age, diagnosis, treatment, previous admissions	Probability of readmission within 30 days

Section 6 of 7

⚠️ When AI Gets It Wrong: Error Analysis

Every AI classifier makes mistakes. Understanding the pattern of mistakes is often more useful than knowing the overall accuracy score.

Where does the model fail most? Certain categories, regions, or user groups may have higher error rates — often because they were underrepresented in training data.
What are the consequences of errors? A wrong recommendation on a video platform is annoying. A wrong medical diagnosis is dangerous. Error analysis should always consider real-world consequences.
Does accuracy differ across groups? If a model is 95% accurate for one group and 78% accurate for another — that is unfair, even if the average accuracy is high.
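One simple way to check the last question is to compute accuracy separately for each group. The sketch below does this in Python. The records are invented so that the two groups match the 95% and 78% figures above, and the function name `accuracy_by_group` and the group names "A" and "B" are assumptions of this example.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records is a list of (group, actual_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in records:
        total[group] += 1
        if actual == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical test results: group A has 100 records, group B has only 50.
records = (
    [("A", "pass", "pass")] * 95 + [("A", "pass", "fail")] * 5
    + [("B", "pass", "pass")] * 39 + [("B", "pass", "fail")] * 11
)

per_group = accuracy_by_group(records)
overall = sum(1 for _, actual, predicted in records if actual == predicted) / len(records)

print(per_group)          # {'A': 0.95, 'B': 0.78}
print(round(overall, 3))  # 0.893
```

The overall figure of about 89% looks acceptable on its own, yet it hides the much weaker result for group B.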

Asha's question, answered: When the mango sorting machine sees a new variety it was not trained on, it may confidently classify it incorrectly — because the patterns it learned do not apply to the new variety. This is why engineers keep expanding the training data and regularly test models on new situations.

Section 7 of 7

🗺️ Key Vocabulary Summary

Term	Simple meaning
Classification	Sorting input into a fixed set of categories (pass/fail, ripe/unripe)
Regression (prediction)	Predicting a number on a continuous scale (exam score, rainfall)
Decision boundary	The line (or surface) that separates different categories when the data is plotted by its features
Confidence score	The probability the model assigns to its prediction (e.g. 73% confident this is spam)
Confusion matrix	A table showing correct and incorrect predictions broken down by category
False positive	Predicted positive but actually negative (wrongly blocked email)
False negative	Predicted negative but actually positive (missed real spam)
Decision tree	A model that makes predictions using a series of yes/no questions — interpretable and explainable
Interpretability	How well humans can understand why a model made a particular prediction

🎯 Quiz — Lesson 3

1. Which of these is a classification task?

2. What is a "decision boundary" in a classification model?

3. A spam filter that simply labels every email as "not spam" achieves 95% accuracy on a dataset where 95% of emails are legitimate. This shows:

4. In medical diagnosis AI, which metric is MOST important to maximise?

5. Why are decision trees valued for high-stakes AI applications (medical, legal, financial)?

6. A model predicts the probability of rain tomorrow as 58%. This 58% is called:

7. A model is 96% accurate for urban users but only 72% accurate for rural users. This difference is:

8. Which of these is a regression (prediction) task?

📝 Worksheet — Classify Your World

In your notebook, answer these questions:

Think of 3 AI systems you have used or read about. For each one, say whether it is mainly a classification task, a regression task, or both.
Describe a real-world situation in India where a false negative from an AI classifier would be dangerous. What would you recommend the engineers prioritise — recall or precision?
Design a simple decision tree (3–5 yes/no questions) to classify whether a given day is good for flying a kite (consider wind, rain, and time of year).

📋 Note for Parents and Teachers

What this lesson covers: Classification vs regression, the training process for classifiers, the confusion matrix and evaluation metrics, decision trees and interpretability, and real Indian examples of both tasks. No mathematics beyond percentages is required.

Discussion prompts:

"If an AI was deciding whether you were allowed into an exam hall and made an error — would you prefer a false positive (letting cheaters in) or a false negative (blocking honest students)? Why?"
"Why is it important that a doctor can understand WHY an AI made a diagnosis — not just that it made a correct one?"
