How Does a Machine Know Which Mango Is Ripe? 🥭
Asha, 13, from Rajahmundry, visited her aunt's mango orchard. Workers were sorting mangoes — ripe, unripe, or too ripe — by looking at colour, feeling firmness, and smelling the fruit.
A company had recently installed an automated sorting machine that used a camera and AI to sort mangoes automatically. Asha was curious: "Does the machine smell them too?"
The engineer laughed. "Not smell — but it analyses colour and shape with 98% accuracy. We showed it 50,000 photos of mangoes with labels: ripe, unripe, overripe. After training, it sorts better than a tired worker at 4am."
Asha asked the important question: "What happens if it sees a mango variety it has never been trained on?" The engineer nodded. "Good question. It will probably make mistakes. That is why we keep adding new training data when we expand to new varieties."
📂 Classification vs Prediction — The Two Big Tasks
Many of the tasks an AI model is trained to do fall into one of two categories:
🏷️ Classification
Assign an input to one of a fixed set of categories. The output is a label.
Examples: spam/not spam · ripe/unripe/overripe · cat/dog/bird · disease/no disease · pass/fail
📈 Prediction (Regression)
Predict a number on a continuous scale. The output is a value.
Examples: predict exam score · predict tomorrow's rainfall · predict house price · predict energy consumption
🔢 How Classification Works: A Simple Walk-Through
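Let us use a simple example: a model that decides whether a student will pass or fail, based on two features (hours studied per day and attendance percentage). The model learns from past students whose results are already known, then uses what it learned to label a new student.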
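One possible way to picture this is a "nearest neighbour" classifier: to label a new student, find the most similar student in the training data and copy that student's label. The sketch below is only an illustration. The training rows, the `MAX_HOURS` limit and the function names are made up for this example, and real systems use far more data and other methods.

```python
# Hypothetical training data: (hours studied per day, attendance %, known result).
TRAINING_DATA = [
    (1.0, 55.0, "fail"),
    (1.5, 70.0, "fail"),
    (2.0, 60.0, "fail"),
    (3.0, 85.0, "pass"),
    (4.0, 90.0, "pass"),
    (5.0, 75.0, "pass"),
]

MAX_HOURS = 8.0          # assumed upper limit, used only to rescale hours to 0..1
MAX_ATTENDANCE = 100.0   # attendance is a percentage


def distance(hours_a, att_a, hours_b, att_b):
    """Squared distance between two students after rescaling both features to 0..1."""
    dh = (hours_a - hours_b) / MAX_HOURS
    da = (att_a - att_b) / MAX_ATTENDANCE
    return dh * dh + da * da


def classify(hours, attendance):
    """Return the label of the most similar student in the training data."""
    nearest = min(
        TRAINING_DATA,
        key=lambda row: distance(hours, attendance, row[0], row[1]),
    )
    return nearest[2]


print(classify(4.5, 88.0))  # closest to the student with (4.0, 90.0)
print(classify(1.0, 50.0))  # closest to the student with (1.0, 55.0)
```

Both features are rescaled before comparing them. Otherwise attendance, which runs up to 100, would drown out hours studied, which rarely goes above 8.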
📊 Measuring How Good a Classifier Is
Saying a model is "95% accurate" sounds impressive. But accuracy alone can be misleading. Here is why.
Imagine 1,000 emails: 950 are not spam, 50 are spam. A model that simply labels everything as "not spam" would be 95% accurate — but useless at actually catching spam.
The confusion matrix
| | Predicted: Spam | Predicted: Not Spam |
|---|---|---|
| Actually: Spam | True Positive (TP) ✅ | False Negative (FN) ❌ — missed spam |
| Actually: Not Spam | False Positive (FP) ❌ — wrongly blocked | True Negative (TN) ✅ |
Key metrics from the confusion matrix
- Accuracy: (TP + TN) / total — proportion of all predictions that were correct
- Precision: TP / (TP + FP) — of all items predicted as spam, how many actually were?
- Recall: TP / (TP + FN) — of all actual spam, how many did the model catch?
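The three formulas above can be tried out on the 1,000-email example. This is a minimal sketch: the function name `metrics` is made up, and the second set of counts is invented purely for comparison.

```python
def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts.

    Precision or recall is returned as None when its denominator is zero,
    because the value is then undefined.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return accuracy, precision, recall


# The "lazy" model from the text: it labels all 1,000 emails as not spam,
# so it misses all 50 spam emails and gets all 950 normal emails right.
lazy_model = metrics(tp=0, fp=0, fn=50, tn=950)
print(lazy_model)  # (0.95, None, 0.0)

# A hypothetical model that catches 40 of the 50 spam emails
# and wrongly blocks 10 normal emails.
better_model = metrics(tp=40, fp=10, fn=10, tn=940)
print(better_model)  # (0.98, 0.8, 0.8)
```

The lazy model scores 95% accuracy but 0% recall, which is exactly why accuracy on its own can mislead.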
🌳 Decision Trees: AI You Can Read
One of the most intuitive ML models is the decision tree. It asks a series of yes/no questions about the features of an input and follows a branch for each answer until it reaches a conclusion.
- Is the mango yellow? → If YES → Is it firm? → If YES → RIPE; If NO → OVERRIPE
- Is the mango yellow? → If NO → Is it mostly green? → If YES → UNRIPE; If NO → OVERRIPE
Decision trees are easy to understand and explain. A farmer or doctor can look at the tree and verify whether the rules make sense. This is called interpretability — and it is an important property for AI systems used in high-stakes decisions.
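Because a decision tree is just a chain of yes/no questions, the mango tree above can be written as a few `if` statements. This is a sketch only: the function name and the three true/false inputs are assumptions for illustration, and a real sorting machine would work these answers out from camera images.

```python
def classify_mango(is_yellow, is_firm, is_mostly_green):
    """Follow the mango decision tree and return RIPE, UNRIPE or OVERRIPE."""
    if is_yellow:
        # Yellow branch: firmness decides between ripe and overripe.
        if is_firm:
            return "RIPE"
        return "OVERRIPE"
    # Not-yellow branch: colour decides between unripe and overripe.
    if is_mostly_green:
        return "UNRIPE"
    return "OVERRIPE"


print(classify_mango(is_yellow=True, is_firm=True, is_mostly_green=False))   # RIPE
print(classify_mango(is_yellow=False, is_firm=True, is_mostly_green=True))   # UNRIPE
```

Anyone can read these rules line by line and check them, which is what interpretability means in practice.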
More complex models (like neural networks with millions of weights) often produce more accurate results on difficult tasks such as recognising images, but they are much harder to explain. They are often called "black boxes".
📈 Prediction (Regression) — Estimating Values
When the output is a number, the task is called regression. The model tries to find a mathematical relationship between the input features and the output value.
If you studied 3 hours → model predicts 65 marks
If you studied 5 hours → model predicts 78 marks
The model finds the best-fitting line (or curve) through the training data points.
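One common way to find a best-fitting straight line is called least squares: it picks the line that makes the squared gaps between the line and the data points as small as possible. The sketch below assumes a tiny, made-up set of six students (`HOURS` and `MARKS`), and the function names are illustrative. It is not the only way to do regression.

```python
# Hypothetical training data: hours studied and marks scored by six students.
HOURS = [1, 2, 3, 4, 5, 6]
MARKS = [52, 58, 65, 72, 78, 85]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(HOURS, MARKS)


def predict_marks(hours):
    """Use the fitted line to estimate marks for a given number of study hours."""
    return slope * hours + intercept


print(round(predict_marks(3)), round(predict_marks(5)))  # 65 78
```

With this made-up data, the line adds a little under 7 marks for each extra hour of study. A different data set would give a different line.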
Real Indian examples of prediction (regression)
| Prediction task | Input features | Predicted output |
|---|---|---|
| Crop yield prediction | Rainfall, temperature, soil type, fertiliser | Expected harvest in tonnes/hectare |
| Electricity demand forecasting | Time of day, day of week, season, temperature | Expected demand in MW |
| Flood risk estimation | Rainfall over 48 hours, river level, soil moisture | Probability of flooding (0–100%) |
| Hospital readmission risk | Age, diagnosis, treatment, previous admissions | Probability of readmission within 30 days |
⚠️ When AI Gets It Wrong: Error Analysis
Every AI classifier makes mistakes. Understanding the pattern of those mistakes is often more useful than knowing the overall accuracy score.
- Where does the model fail most? Certain categories, regions, or user groups may have higher error rates — often because they were underrepresented in training data.
- What are the consequences of errors? A wrong recommendation on a video platform is annoying. A wrong medical diagnosis is dangerous. Error analysis should always consider real-world consequences.
- Does accuracy differ across groups? If a model is 95% accurate for one group and 78% accurate for another, that is unfair, even if the average accuracy is high.
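A simple first step in error analysis is to compute accuracy separately for each group instead of one overall number. The sketch below uses a small, invented list of results for two mango varieties called "A" and "B". The data and the function name are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical test results: (group, actual label, predicted label).
RESULTS = [
    ("A", "ripe", "ripe"),
    ("A", "unripe", "unripe"),
    ("A", "ripe", "ripe"),
    ("A", "overripe", "ripe"),
    ("A", "unripe", "unripe"),
    ("B", "ripe", "unripe"),
    ("B", "unripe", "unripe"),
    ("B", "ripe", "overripe"),
    ("B", "overripe", "overripe"),
    ("B", "ripe", "unripe"),
]


def accuracy_by_group(results):
    """Return a dictionary mapping each group to its share of correct predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in results:
        total[group] += 1
        if actual == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


per_group = accuracy_by_group(RESULTS)
overall = sum(1 for _, actual, predicted in RESULTS if actual == predicted) / len(RESULTS)

print(per_group)  # {'A': 0.8, 'B': 0.4}
print(overall)    # 0.6
```

The overall score of 60% hides the fact that the model does twice as well on variety A as on variety B. This is the kind of gap that adding more training data for the weaker group is meant to close.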
🗺️ Key Vocabulary Summary
| Term | Simple meaning |
|---|---|
| Classification | Sorting input into a fixed set of categories (pass/fail, ripe/unripe) |
| Regression (prediction) | Predicting a number on a continuous scale (exam score, rainfall) |
| Decision boundary | The line (or surface) that separates different categories in the model's learned space |
| Confidence score | The probability the model assigns to its prediction (e.g. 73% confident this is spam) |
| Confusion matrix | A table showing correct and incorrect predictions broken down by category |
| False positive | Predicted positive but actually negative (wrongly blocked email) |
| False negative | Predicted negative but actually positive (missed real spam) |
| Decision tree | A model that makes predictions using a series of yes/no questions — interpretable and explainable |
| Interpretability | How well humans can understand why a model made a particular prediction |
📝 Worksheet — Classify Your World
In your notebook, answer these questions:
- Think of 3 AI systems you have used or read about. For each one, say whether it is mainly a classification task, a regression task, or both.
- Describe a real-world situation in India where a false negative from an AI classifier would be dangerous. What would you recommend the engineers prioritise — recall or precision?
- Design a simple decision tree (3–5 yes/no questions) to classify whether a given day is good for flying a kite (consider wind, rain, and time of year).
📋 Note for Parents and Teachers
What this lesson covers: Classification vs regression, the training process for classifiers, the confusion matrix and evaluation metrics, decision trees and interpretability, and real Indian examples of both tasks. No mathematics beyond percentages is required.
Discussion prompts:
- "If an AI was deciding whether you were allowed into an exam hall and made an error — would you prefer a false positive (letting cheaters in) or a false negative (blocking honest students)? Why?"
- "Why is it important that a doctor can understand WHY an AI made a diagnosis — not just that it made a correct one?"