Lesson 10 — AI for Real Indian Problems | Class 10

Meet Kavita — Class 10, Bhopal

Kavita's school had a "Student Innovation for India" competition. The theme: use technology to solve a problem you see in your community. While her classmates thought of apps for ordering food faster, Kavita thought bigger. Her uncle is a soybean farmer in Vidisha district who loses 20–30% of his crop every year to diseases he can't identify early. Her grandmother lives 40 km from the nearest hospital.

"AI doesn't just make apps faster," Kavita wrote in her project proposal. "It can give a poor farmer the same advisory service as a large agricultural company. It can give a rural patient access to preliminary diagnosis that was only possible in big cities. That's the kind of AI I want to build." She won first place. This lesson explores the four domains where student-built AI can create the most impact in India.

Four Impact Domains

Where Student AI Projects Can Make a Real Difference

🌾

Agriculture

58% of India's population depends on farming

India has 120M+ smallholder farms. Most farmers lose 15–40% of crop yield to pests, diseases, and poor timing. Expensive inputs are often applied too late or in the wrong quantities.

Project idea 1: Crop disease detector — CNN trained on PlantVillage dataset (54,000 labelled leaf images, 38 disease classes). Output: disease name + treatment + severity. Deploy as Streamlit app.

Project idea 2: Crop price predictor — LSTM or gradient boosting on Agmarknet price data (government open data, district-wise daily prices). Predict next week's mandi price.

Datasets: PlantVillage (Kaggle), Agmarknet API, ICAR open datasets

Beginner: price predictor | Intermediate: disease detector
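Before training an LSTM or gradient boosting model for the price predictor, it helps to know what a very simple forecast achieves, so you have a number to beat. The sketch below is only a moving-average baseline with a walk-forward error check, not the full model. The prices are made-up numbers, not Agmarknet data, and the function names are hypothetical.

```python
from statistics import mean

def moving_average_forecast(prices, window=7):
    """Forecast the next value as the mean of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("need at least `window` prices")
    return mean(prices[-window:])

def backtest_mae(prices, window=7):
    """Walk forward through the series and return the mean absolute error."""
    errors = []
    for t in range(window, len(prices)):
        # Only prices before day t are visible when forecasting day t.
        forecast = moving_average_forecast(prices[:t], window)
        errors.append(abs(prices[t] - forecast))
    return mean(errors)

# Hypothetical daily soybean prices in rupees per quintal (made-up numbers).
prices = [4500, 4520, 4510, 4550, 4580, 4570, 4600,
          4620, 4610, 4650, 4680, 4670, 4700, 4720]

next_price = moving_average_forecast(prices, window=7)
baseline_error = backtest_mae(prices, window=7)
print(f"Baseline forecast: Rs {next_price:.0f} per quintal")
print(f"Baseline MAE over the backtest: Rs {baseline_error:.0f}")
```

If a trained model cannot beat this baseline's error on the same backtest, the extra complexity is not yet paying off.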

🏥

Healthcare

1 doctor per 1,511 patients in India (WHO recommends 1:1,000)

India has a severe shortage of doctors, especially in rural areas. 70% of hospitals are in cities. Telemedicine and AI-assisted screening can bridge the gap for primary diagnosis.

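Project idea 1: Diabetes risk predictor — Logistic regression / Random Forest on the Pima Indians Diabetes dataset. Note that this data was collected from Pima Native American women in Arizona, not from patients in India, so use it to learn the method rather than to judge risk for Indian patients. Input: 8 clinical features. Output: risk score + lifestyle advice.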

Project idea 2: X-ray pneumonia detector — CNN using transfer learning (MobileNetV2) on NIH Chest X-ray dataset. Binary classification: Normal / Pneumonia.

Datasets: Pima Indians Diabetes (Kaggle), NIH Chest X-ray (Kaggle), AIIMS open research data

Beginner: diabetes risk | Advanced: X-ray CNN
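To see what a logistic regression risk score actually computes, here is a minimal from-scratch sketch using two of the eight clinical features (glucose and BMI). The eight rows of data are invented for illustration, the scaling constants are hand-picked assumptions, and a real project would more likely use a library such as scikit-learn on the full dataset.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def scale(glucose, bmi):
    """Roughly centre and scale the inputs (hand-picked constants, an assumption)."""
    return [(glucose - 120) / 30, (bmi - 28) / 5]

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to z
            for j in range(len(w)):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def risk_score(w, b, x):
    """Return the model's estimated probability of the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy rows: (glucose, BMI, has_diabetes). Not real patient data.
rows = [(85, 22, 0), (90, 24, 0), (100, 26, 0), (110, 23, 0),
        (140, 31, 1), (155, 33, 1), (165, 29, 1), (180, 35, 1)]
X = [scale(g, bmi) for g, bmi, _ in rows]
y = [label for _, _, label in rows]

w, b = train_logistic(X, y)
high_risk = risk_score(w, b, scale(170, 34))
low_risk = risk_score(w, b, scale(88, 22))
print(f"Higher glucose and BMI: {high_risk:.2f}. Verify with a doctor.")
print(f"Lower glucose and BMI:  {low_risk:.2f}. Verify with a doctor.")
```

The score is a probability between 0 and 1, which is why it can be shown to the user as a risk percentage together with advice.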

📚

Education

10M+ students drop out of school annually in India

India has 250M school children. Dropout rates are highest in Grades 6–9, especially for girls. Early identification of at-risk students enables targeted intervention by teachers and social workers.

Project idea 1: Student dropout predictor — Gradient boosting on UDISE+ data (school-level enrollment, attendance, pass rates by district). Identify high-risk districts.

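Project idea 2: Telugu/Hindi learning assistant — RAG over NCERT textbook PDFs: retrieve the most relevant passage, then use a small fine-tuned language model (for example, a multilingual DistilBERT for extractive question answering) to answer curriculum questions in Telugu or Hindi.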

Datasets: UDISE+ (data.gov.in), NCERT textbooks (public), ASER rural education survey

Beginner: dropout analysis | Intermediate: RAG assistant
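The "R" in RAG is retrieval: finding the textbook passage most relevant to a question before any language model writes an answer. The sketch below shows only that step, using word-count vectors and cosine similarity. The passages are short placeholders written for this example, not NCERT text, and the simple tokenizer handles English only; a Telugu or Hindi assistant would need a tokenizer or embedding model that supports those scripts.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into English words."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, top_k=1):
    """Return the `top_k` passages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

# Placeholder passages (hypothetical, standing in for textbook chunks).
passages = [
    "Photosynthesis is the process by which green plants make food using sunlight.",
    "An electric circuit is a closed path through which current flows.",
    "Acids turn blue litmus paper red and bases turn red litmus paper blue.",
]

best = retrieve("How do plants make food?", passages)[0]
print(best)
```

In a full assistant, the retrieved passage would be passed to the language model along with the question so the answer stays grounded in the textbook.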

💳

Financial Inclusion

190M+ Indians are unbanked or underbanked

India has 800M+ UPI users but access to formal credit is limited for rural and informal workers. Fair, explainable AI credit scoring can help microfinance institutions lend responsibly.

Project idea 1: Micro-credit scoring — Gradient boosting + SHAP explainability on synthetic PMJDY-inspired dataset. Feature importance shows which factors drive the decision (required for responsible lending).

Project idea 2: UPI fraud detector — Anomaly detection (Isolation Forest) on transaction patterns. Flag transactions that deviate significantly from user's history.

Datasets: RBI PMJDY data, UCI Credit Default dataset, synthetic data generators for privacy

Intermediate: credit scoring | Advanced: fraud detection
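The idea behind the fraud detector, flagging a transaction that deviates significantly from the user's history, can be tried with a much simpler rule before moving to Isolation Forest. The sketch below is a z-score check, a stand-in for the idea and not Isolation Forest itself. The transaction amounts and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard
    deviations away from the mean of the user's past amounts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Every past amount was identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical past UPI amounts in rupees for one user (made-up numbers).
history = [120, 250, 90, 300, 180, 220, 150, 275]

print(is_unusual(history, 200))    # a typical amount for this user
print(is_unusual(history, 25000))  # far outside this user's usual range
```

A single-feature rule like this misses fraud that looks normal in amount but odd in time or recipient, which is one reason a multi-feature method such as Isolation Forest is worth learning next.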

Sample Project

Kavita's Soybean Disease Detector

# Soybean Crop Disease Classifier — Kavita's project
# Uses: Transfer Learning (MobileNetV2) + Streamlit + Streamlit Cloud

# ── Step 1: Data — PlantVillage dataset on Kaggle ──
# https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset
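# Folder structure: data/train/Soybean_Bacterial_Blight/
#                   data/train/Soybean_Frogeye_Leaf_Spot/
#                   data/train/Soybean_Healthy/
# (Illustrative names: check which soybean classes your download actually
#  contains, and add your own labelled leaf photos for any that are missing.)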

import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

CLASSES = ["Bacterial Blight", "Frogeye Leaf Spot", "Healthy"]
IMG_SIZE = 224
BATCH    = 32
EPOCHS   = 10

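# Data generators: augment the training images only, never the validation images
train_gen = ImageDataGenerator(
    rescale=1./255, rotation_range=20, horizontal_flip=True,
    zoom_range=0.15, validation_split=0.2
)
val_gen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Point at data/train/ so each class folder (not "train" itself) becomes a label
train_data = train_gen.flow_from_directory(
    "data/train/", target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH, subset='training', class_mode='categorical'
)
val_data = val_gen.flow_from_directory(
    "data/train/", target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH, subset='validation', class_mode='categorical'
)
# Labels follow alphabetical folder order; confirm it matches CLASSES
print(train_data.class_indices)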

# Transfer learning: MobileNetV2 base + custom head
base = MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights='imagenet')
base.trainable = False
x = base.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.3)(x)
output = Dense(len(CLASSES), activation='softmax')(x)
model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
history = model.fit(train_data, validation_data=val_data, epochs=EPOCHS)
model.save("soybean_model.h5")
print(f"Val accuracy: {max(history.history['val_accuracy']):.3f}")

# ── Step 2: Streamlit app (save as app.py) ──
# streamlit run app.py

APP_CODE = '''
import streamlit as st, numpy as np
from PIL import Image
import tensorflow as tf

CLASSES = ["Bacterial Blight", "Frogeye Leaf Spot", "Healthy"]
TREATMENTS = {
    "Bacterial Blight":   "Copper-based bactericide. Remove infected leaves. Avoid overhead irrigation.",
    "Frogeye Leaf Spot":  "Thiophanate-methyl fungicide. Plant resistant varieties next season.",
    "Healthy":            "Crop is healthy! Monitor weekly for early signs."
}

@st.cache_resource
def load_model():
    return tf.keras.models.load_model("soybean_model.h5")

st.title("🌱 Soybean Disease Detector")
st.markdown("Upload a soybean leaf photo to identify disease.")
model = load_model()
uploaded = st.file_uploader("Upload leaf image", type=["jpg","jpeg","png"])
if uploaded:
    img = Image.open(uploaded).convert("RGB")
    st.image(img, use_container_width=True)
    arr = np.expand_dims(np.array(img.resize((224,224)))/255.0, 0)
    pred = model.predict(arr)[0]
    cls  = CLASSES[np.argmax(pred)]
    conf = pred.max()
    st.success(f"**{cls}** ({conf*100:.1f}% confidence)")
    st.info(TREATMENTS[cls])
'''
with open("app.py", "w") as f:
    f.write(APP_CODE)
print("app.py written — run: streamlit run app.py")

Kavita's competition result: Her app correctly identified Frogeye Leaf Spot in 8 of 10 test images from her uncle's farm. The judges were impressed less by the accuracy than by her problem framing: she had spoken to 15 farmers, learned which disease descriptions they actually understood, and included a treatment cost estimate next to each recommendation. Technical skill + domain understanding = real impact.

Choosing Your Project

The India AI Project Checklist

✅ Is there a real person who has this problem? Can you talk to them? (Farmer, patient, teacher, small business owner)
✅ Is there public data? Check: Kaggle, data.gov.in, AI4Bharat, iNaturalist, UDISE+, RBI open data
✅ Does your output have a clear action? "Disease X — use Y treatment" is better than just "Probability: 0.73"
✅ Can you build a v1 in 2 weeks? Start simpler than you think. A working simple model beats a theoretically perfect one.
✅ How will you handle wrong predictions? Always show confidence. For healthcare/agriculture, always say "Verify with an expert"
✅ Who gives feedback? Real users who test your app give 10× more learning than any evaluation metric.
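Two checklist items, a clear action and honest handling of wrong predictions, can be combined in one small helper that turns a model's output into a message for the user. This is a sketch only: the function name and the 0.7 cut-off are assumptions, and the right threshold depends on how your own model behaves on real test photos.

```python
def format_prediction(label, confidence, min_confidence=0.7):
    """Turn a predicted label and its confidence into a user-facing message.

    Below `min_confidence` the app admits it is unsure instead of
    presenting a weak guess as a diagnosis.
    """
    pct = f"{confidence * 100:.0f}%"
    if confidence < min_confidence:
        return (f"Not sure ({pct} confidence for {label}). "
                "Please retake the photo or consult an expert.")
    return f"{label} ({pct} confidence). Verify with an expert before acting."

print(format_prediction("Frogeye Leaf Spot", 0.92))
print(format_prediction("Frogeye Leaf Spot", 0.55))
```

Showing "Not sure" for low-confidence cases costs a little convenience but protects users from acting on a guess.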

The most important lesson: India's AI opportunity is not in building the next GPT. It is in taking existing, proven AI techniques and applying them to 100-million-scale problems that Silicon Valley ignores, because those problems don't generate enough revenue for large companies but matter enormously to real people.

🧪 Check Your Understanding — Lesson 10 Quiz

1. The PlantVillage dataset is suitable for a beginner crop disease project because:

a) It contains financial data about crop prices in Indian mandis

b) It has 54,000+ labelled leaf images across 38 disease classes — large enough to train a CNN with transfer learning to good accuracy without needing to collect your own images

c) It is the only agriculture dataset approved for school projects in India

d) It was created specifically for the Indian climate and crop varieties

2. Why should an AI health screening app always say "Verify with an expert" even if it has 90% accuracy?

a) To make the app seem less impressive so users don't trust it too much

b) 90% accuracy means about 1 in 10 results is wrong. For medical decisions, a false negative (missing a disease) or false positive (unnecessary treatment) can cause serious harm. AI assists; it does not replace clinical judgment.

c) Regulatory requirements in India mandate disclaimers on all mobile apps

d) Because the model has never seen Indian patients in its training data

3. UDISE+ data is useful for a student dropout predictor because:

a) It contains individual student test scores from all Indian schools

b) It is India's official school-level dataset with enrollment, attendance, pass rates, and infrastructure data by district — allowing analysis of patterns that predict high dropout risk at scale

c) It provides real-time dropout notifications from school attendance systems

d) It only covers urban schools in the top 10 cities

4. SHAP (SHapley Additive exPlanations) is important in a credit scoring AI because:

a) It increases model accuracy by removing redundant features

b) It explains WHY the model made a decision — showing which features (income, employment history, existing loans) contributed most to approving or rejecting a loan. This is required for fair, auditable lending decisions.

c) It compresses the model file for faster loading on mobile

d) It translates model outputs into Hindi for rural users

5. Which aspect of Kavita's project impressed the competition judges most?

a) She achieved the highest possible model accuracy (100%)

b) Her domain understanding — she talked to 15 farmers, included disease descriptions farmers understood, and added treatment cost estimates. Technical skill combined with real user insight.

c) She used the most expensive cloud GPU to train the model

d) She built the app entirely in JavaScript without Python

6. Anomaly detection with Isolation Forest is appropriate for UPI fraud detection because:

a) Isolation Forest can only process payment data in rupee amounts

b) Fraud transactions are rare and labelled data is scarce — Isolation Forest is an unsupervised method that learns normal transaction patterns and flags outliers without needing labelled fraud examples

c) It is the only algorithm approved by the RBI for payment systems

d) It processes transactions in real-time faster than neural networks

7. Why does the project checklist ask "Is there a real person who has this problem?"

a) Because AI models require human-labelled data from the exact target user group

b) Projects that solve imaginary problems (or problems in Silicon Valley, not India) waste time. Talking to the actual user reveals what features matter, what language to use, and what trust looks like — insights that change the whole design.

c) Real users are required to sign legal agreements before ML projects can begin

d) To satisfy school competition rules that require user interviews

8. India's AI opportunity compared to Silicon Valley is described as:

a) Building larger and more powerful foundation models to compete with GPT-4

b) Applying existing, proven AI techniques to 100-million-scale problems that matter to real people but are ignored by large companies because they don't generate enough profit

c) Copying successful American AI startups and translating their products to Hindi

d) Focusing only on AI for English-speaking urban Indians
