Convolutional Neural Networks 👁️

Class 10 · Age 14–15 · Lesson 1 of 12 · 🆓 Free

Meet Meera: Class 10, Delhi

Meera's younger brother has a mango tree in the backyard. Every season, some mangoes develop a fungal disease. Her father usually identifies it by eye: dark spots with a yellowish ring. Meera wondered: "Could an AI identify diseased fruit from a photo?" She'd heard of apps like Plantix. But how exactly does a computer see an image?

She opened her Class 9 notes. She knew a regular neural network flattens every pixel into a 1D list. For a 224×224 image with 3 colour channels, that's 224 × 224 × 3 = 150,528 numbers, before even one neuron processes them. "No wonder they struggle," she thought. "They lose all the spatial structure." Then she read about Convolutional Neural Networks, and everything clicked.

The Core Problem
Why Regular Networks Fail on Images

In Class 9, you learned how a dense (fully-connected) neural network works: every input connects to every neuron. That's fine for small data, but disastrous for images:

CNNs solve all three problems with one key idea: local connections + shared weights.
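
To make the effect of shared weights concrete, here is a rough count of learnable parameters for one dense layer versus one convolutional layer on the same 224×224×3 image. The layer sizes (128 dense neurons, 32 filters of 3×3) are illustrative assumptions, not values fixed by this lesson.

```python
# Compare parameter counts: dense layer vs. convolutional layer on one image.
# The 128-neuron dense layer and the 32-filter conv layer are illustrative sizes.

height, width, channels = 224, 224, 3
inputs = height * width * channels          # numbers in the flattened image

# Dense: every input connects to every neuron, plus one bias per neuron.
dense_neurons = 128
dense_params = inputs * dense_neurons + dense_neurons

# Conv2D: each 3x3 filter has 3*3*channels weights plus one bias,
# and the same weights are reused at every position in the image.
filters, k = 32, 3
conv_params = (k * k * channels + 1) * filters

print(f"Flattened inputs:  {inputs:,}")        # 150,528
print(f"Dense(128) params: {dense_params:,}")  # 19,267,712
print(f"Conv2D(32, 3x3):   {conv_params:,}")   # 896
```

The convolutional layer's count does not depend on the image size at all, because the same small filter is reused everywhere.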

Core Concept
How a Convolutional Filter Works

A filter (also called a kernel) is a small grid of weights, typically 3×3. It slides across the image, and at each position it computes a dot product between its weights and the pixels beneath it. The result is a feature map that highlights where the filter's pattern appears.

🔲 Edge Detector (Vertical)
-1   0  +1
-1   0  +1
-1   0  +1
Activates strongly at vertical edges in the image
🔲 Blur Filter
1   1   1
1   1   1
1   1   1
Averages neighbouring pixels (once the sum is divided by 9), which smooths noise
🔲 Sharpen Filter
 0  -1   0
-1  +5  -1
 0  -1   0
Amplifies the centre pixel relative to its neighbours, which sharpens detail
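
The sliding dot product can be sketched in a few lines of NumPy. This is a minimal illustration, assuming no padding and a stride of 1; the helper name `convolve2d` and the toy image are made up for this example. Like most deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return feature_map

# Toy 5x6 greyscale image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 0, 1, 1, 1]] * 5, dtype=float)

vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
# Each row is [0. 3. 3. 0.]: large values only where dark meets bright.
```

The flat regions give 0 and the columns that straddle the dark-to-bright boundary give 3, which is what "activates strongly at vertical edges" means in practice.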

In a trained CNN, the network learns the filter weights automatically from the data. You don't hand-design them; they emerge from backpropagation.
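
As a rough illustration of filter weights emerging from data, the sketch below starts from an all-zero 3×3 filter and uses plain gradient descent to recover a vertical edge filter from input/output examples. It is a simplified stand-in for backpropagation (a single filter, one random toy image, mean squared error); the names, image size, learning rate, and step count are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))             # random toy "image"
true_filter = np.array([[-1, 0, 1]] * 3, dtype=float)

def apply_filter(img, kernel):
    """Sliding dot product with no padding and stride 1."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

target = apply_filter(image, true_filter)     # feature map we want to reproduce
out_h, out_w = target.shape

learned = np.zeros((3, 3))                    # start with no knowledge of the filter
learning_rate = 0.1
for step in range(300):
    error = apply_filter(image, learned) - target
    # Gradient of the mean squared error with respect to each filter weight
    grad = np.zeros_like(learned)
    for a in range(3):
        for b in range(3):
            grad[a, b] = 2 * np.mean(error * image[a:a + out_h, b:b + out_w])
    learned -= learning_rate * grad           # gradient descent update

print(np.round(learned, 2))                   # should be close to true_filter
```

A real CNN does the same kind of update for thousands of filters at once, with the error signal passed back through every layer.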

Architecture
Layers of a CNN
Input (224×224×3)
→ Conv2D (32 filters × 3×3)
→ MaxPool2D (2×2, halves size)
→ Conv2D + Pool (64 filters)
→ Flatten + Dense (128 neurons)
→ Softmax Output (N classes)
Why pooling matters: A 224×224 feature map after two rounds of MaxPool2D becomes 56×56, so it has 16× fewer spatial positions. The dense layer therefore sees 56×56 × (number of filters) values instead of 224×224 × (number of filters), which cuts its weight count by about 16×. Pooling also makes the model more tolerant of small shifts (approximate translation invariance): the cat at the corner and the cat at the centre produce similar feature activations.
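
The size arithmetic can be checked with a small NumPy sketch of 2×2 max pooling. The helper name `max_pool_2x2` is made up for this example, and it assumes a single 2D feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = feature_map.shape
    # Group pixels into non-overlapping 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Shape check: two rounds of pooling on a 224x224 map
feature_map = np.arange(224 * 224, dtype=float).reshape(224, 224)
once = max_pool_2x2(feature_map)
twice = max_pool_2x2(once)
print(once.shape, twice.shape)   # (112, 112) (56, 56)

# Value check on a tiny 4x4 map
small = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 1],
                  [0, 0, 5, 6],
                  [1, 2, 7, 8]], dtype=float)
pooled = max_pool_2x2(small)
print(pooled)                    # [[4. 2.]
                                 #  [2. 8.]]
```

Each output value is the strongest activation in its 2×2 block, so a feature that moves by one pixel inside a block leaves the pooled output unchanged.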
Python Code
Build a CNN in Keras
# CNN for Image Classification - Google Colab
# Task: Classify images into 10 categories (CIFAR-10 dataset)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# ── Step 1: Load and normalise CIFAR-10 ──
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255.0  # scale pixels to 0-1
X_test  = X_test.astype('float32')  / 255.0

class_names = ['airplane','automobile','bird','cat','deer',
               'dog','frog','horse','ship','truck']

print(f"Training set: {X_train.shape}")   # (50000, 32, 32, 3)
print(f"Test set:     {X_test.shape}")    # (10000, 32, 32, 3)

# ── Step 2: Build CNN architecture ──
model = keras.Sequential([
    # Block 1: Detect edges and textures
    layers.Conv2D(32, (3,3), activation='relu', padding='same',
                  input_shape=(32,32,3)),
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),
    layers.MaxPool2D(2, 2),     # 32x32 -> 16x16
    layers.Dropout(0.25),

    # Block 2: Detect shapes from edges
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),
    layers.MaxPool2D(2, 2),     # 16x16 -> 8x8
    layers.Dropout(0.25),

    # Classifier head
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')  # 10 CIFAR-10 classes
])

model.summary()  # see total parameters

# ── Step 3: Compile ──
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ── Step 4: Train ──
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
    verbose=1
)

# ── Step 5: Evaluate ──
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.2%}")

# ── Step 6: Plot training curves ──
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history['accuracy'], label='Train')
ax1.plot(history.history['val_accuracy'], label='Val')
ax1.set_title('Accuracy'); ax1.legend()
ax2.plot(history.history['loss'], label='Train')
ax2.plot(history.history['val_loss'], label='Val')
ax2.set_title('Loss'); ax2.legend()
plt.tight_layout(); plt.show()

# ── Step 7: Predict on one image ──
import numpy as np
idx = 42
pred = model.predict(X_test[idx:idx+1])
print(f"Predicted: {class_names[np.argmax(pred)]}  "
      f"Actual: {class_names[y_test[idx][0]]}")
Expected result: With 20 epochs on CIFAR-10, this architecture typically reaches roughly 75–80% test accuracy; your exact number will vary from run to run. ResNet-50 (from Class 10 Lesson 2) gets 93%+ using transfer learning: same task, far less training, much better accuracy.
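
The total that `model.summary()` reports for the listing above can be worked out by hand. The sketch below assumes the standard counting rules (each Conv2D filter has kernel × kernel × input-channels weights plus one bias; each Dense unit has one weight per input plus one bias); the helper names are made up for this example.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for each unit."""
    return (inputs + 1) * units

# After two 2x2 poolings a 32x32 image is 8x8, with 64 feature maps.
flattened = 8 * 8 * 64

layer_params = {
    "Conv2D 3 -> 32":   conv2d_params(3, 3, 32),
    "Conv2D 32 -> 32":  conv2d_params(3, 32, 32),
    "Conv2D 32 -> 64":  conv2d_params(3, 32, 64),
    "Conv2D 64 -> 64":  conv2d_params(3, 64, 64),
    "Dense 4096 -> 512": dense_params(flattened, 512),
    "Dense 512 -> 10":   dense_params(512, 10),
}
# MaxPool2D, Dropout and Flatten have no learnable parameters.

for name, count in layer_params.items():
    print(f"{name:<18} {count:>10,}")

total = sum(layer_params.values())
print(f"{'Total':<18} {total:>10,}")   # 2,168,362
```

Almost all of the parameters sit in the first dense layer, which is why shrinking the feature maps with pooling before flattening matters so much.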
Key Ideas Summary
What Makes CNNs Powerful
Meera's mango project: To build a mango disease detector, you need roughly 500–1000 labelled photos (healthy / diseased). With CNNs and transfer learning (next lesson), you can often reach 90%+ accuracy in Colab without training a network from scratch, and no expensive GPU is required for inference.

🧪 Check Your Understanding: Lesson 1 Quiz

1. The main reason regular (dense) neural networks struggle with images is:
a) They are too slow to train
b) They require GPU hardware
c) They lose spatial structure when images are flattened, and have too many parameters for image-sized inputs
d) They can only work with greyscale images
2. A convolutional filter (kernel) is typically:
a) The same size as the full input image
b) A small grid of learnable weights (e.g., 3×3) that slides across the image
c) A pre-defined mathematical function that cannot be changed by training
d) A list of pixel brightness values from the image
3. The purpose of MaxPool2D(2,2) in a CNN is to:
a) Add more feature maps to increase model capacity
b) Apply an activation function to remove negative values
c) Reduce the spatial dimensions of feature maps by half, keeping the strongest activations and reducing parameters
d) Connect all neurons from one layer to all neurons in the next
4. "Parameter sharing" in CNNs means:
a) Multiple users share the same trained model
b) One filter's weights are used at every position as it slides across the image, drastically reducing the number of learnable parameters
c) All layers in the network use the same weight values
d) Dense layers share weights with convolutional layers
5. In a deep CNN, what does the network typically learn in its earliest layers?
a) High-level concepts like "cat" or "car"
b) Simple patterns like edges and colour gradients, which later layers combine into shapes and objects
c) The softmax probabilities for each class
d) The exact pixel colours in the training images
6. In Keras, `padding='same'` in Conv2D means:
a) All layers use the same filter size
b) The output feature map has the same spatial dimensions as the input by adding zeros around the border
c) The model uses the same weights as the previous layer
d) No activation function is applied
7. Why is data augmentation useful when training a CNN?
a) It increases the number of test images to evaluate accuracy more precisely
b) It makes images smaller so training is faster
c) It artificially increases training set diversity (flips, rotations, zooms) so the model generalises better and overfits less
d) It normalises pixel values to the range 0–1
8. After two MaxPool2D(2,2) layers, a 224×224 feature map becomes:
a) 224ร—224 (pooling doesn't change size)
b) 112ร—112
c) 56ร—56
d) 28ร—28