Preethi, 16, from Coimbatore, had built a loan approval model for a class project: a gradient-boosted classifier trained on 2022 credit data, with 91% accuracy on the test set. She "deployed" it (as a FastAPI service on a free-tier cloud host) and thought she was done.
Three months later, a friend pointed out the model was approving nearly everyone. Accuracy had quietly dropped to 61%, not far above random chance. The problem? India's credit environment had shifted: new income patterns post-COVID, new UPI transaction behaviours, new borrower demographics. The 2022 data no longer reflected 2024 reality.
"This is data drift," her teacher explained. "Your model was correct when deployed. The world changed. Every production model needs a monitoring system that alerts you before users notice." Preethi spent a day setting up Evidently AI. Her next model has never gone stale.
| Drift Type | What Changes | Loan Model Example | Detection Method |
|---|---|---|---|
| Data Drift | Input feature distribution P(X) changes | Borrower income distribution shifts post-COVID: typical applicant income was ₹30k/month in 2022, now ₹45k/month | PSI, KS test, chi-squared |
| Concept Drift | Relationship P(Y\|X) changes: same features, different correct output | A UPI score of 750 was "excellent" in 2022 but is now "average" because scoring criteria changed | Prediction accuracy monitoring |
| Label Drift | Output label distribution P(Y) changes | Default rate in the population changed from 8% to 14% | Monitor prediction distribution |
| Upstream Drift | A data pipeline changes, altering feature values | Bureau changed how they calculate CIBIL score | Feature statistics monitoring |
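The difference between the first two rows is easy to blur, so here is a minimal synthetic sketch of it. Everything in it is made up for illustration: the income figures, the two approval rules, and the idea that the model learned the old rule perfectly. It shows that a feature-level test such as KS reacts to data drift, while concept drift can leave the inputs untouched and only show up as lost accuracy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 2022 applicants: monthly income centred on 30k
income_2022 = rng.normal(30_000, 5_000, size=2_000)

# Data drift: P(X) moves. Hypothetical 2024 applicants are centred on 45k.
income_2024 = rng.normal(45_000, 5_000, size=2_000)
data_drift_p = stats.ks_2samp(income_2022, income_2024).pvalue

# Concept drift: P(Y|X) moves. Same inputs, but the "correct" answer changed.
def old_rule(income):
    return income > 28_000  # made-up 2022 notion of creditworthy

def new_rule(income):
    return income > 35_000  # made-up stricter 2024 notion

model_pred = old_rule(income_2022)  # a model that learned the old rule perfectly
acc_before = np.mean(model_pred == old_rule(income_2022))
acc_after = np.mean(model_pred == new_rule(income_2022))

# The inputs are literally unchanged, so a feature-level test has nothing to flag
input_p = stats.ks_2samp(income_2022, income_2022).pvalue

print(f"data drift    : KS p-value = {data_drift_p:.3g}")
print(f"concept drift : KS p-value on inputs = {input_p:.3g}, "
      f"accuracy {acc_before:.0%} -> {acc_after:.0%}")
```

This is why the table pairs concept drift with accuracy monitoring rather than with PSI or KS: the feature statistics alone would report nothing unusual.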
# pip install evidently pandas scikit-learn
# Note: the import paths below follow Evidently's older Report/TestSuite API;
# newer releases may organise them differently, so check the docs for your version.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ClassificationPreset

# --- Load reference (training) data and current (production) data ---
reference = pd.read_csv("train_data_2022.csv")       # 10,000 rows
current = pd.read_csv("production_data_2024.csv")    # last 30 days

# --- 1. Data Drift Report ---
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(reference_data=reference, current_data=current)
data_drift_report.save_html("drift_report.html")     # open in browser
# --- 2. Model Performance Report (needs ground truth labels) ---
# For loan approvals: labels available after 30-day default window.
# Assumes both frames also carry a 'target' column with the true labels,
# which is the default column name Evidently looks for.
perf_report = Report(metrics=[ClassificationPreset()])
perf_report.run(
    reference_data=reference.assign(prediction=reference['approved']),
    current_data=current.assign(prediction=current['model_output']),
)
perf_report.save_html("performance_report.html")
# --- 3. Programmatic drift detection for alerting ---
from evidently.test_suite import TestSuite
from evidently.tests import TestNumberOfDriftedColumns

suite = TestSuite(tests=[
    TestNumberOfDriftedColumns(lt=3),  # passes only if fewer than 3 columns drift
])
suite.run(reference_data=reference, current_data=current)
result = suite.as_dict()
if not result['summary']['all_passed']:
    print("ALERT: Significant data drift detected!")
    # Number of tests that did not pass (key name per the older TestSuite API)
    print(result['summary']['failed_tests'])
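Population Stability Index (PSI), for continuous features

$$\mathrm{PSI} = \sum_{i} \left(P_{\text{current},i} - P_{\text{reference},i}\right)\,\ln\frac{P_{\text{current},i}}{P_{\text{reference},i}}$$

where the sum runs over the bins $i$ of the feature.

- PSI < 0.10: No significant drift
- PSI 0.10 to 0.25: Moderate drift, investigate
- PSI > 0.25: Significant drift, retrain required

import numpy as np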
from scipy import stats

def psi(reference, current, n_bins=10):
    """Population Stability Index for continuous features."""
    # Bin edges are reference quantiles, so each bin holds about 1/n_bins of the reference
    bins = np.percentile(reference, np.linspace(0, 100, n_bins + 1))
    # Open the outer bins so production values outside the training range still count
    bins[0] = -np.inf
    bins[-1] = np.inf
    ref_counts, _ = np.histogram(reference, bins=bins)
    cur_counts, _ = np.histogram(current, bins=bins)
    ref_pct = ref_counts / len(reference)
    cur_pct = cur_counts / len(current)
    # Avoid division by zero / log(0)
    ref_pct = np.where(ref_pct == 0, 1e-6, ref_pct)
    cur_pct = np.where(cur_pct == 0, 1e-6, cur_pct)
    return np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct))

# --- Kolmogorov-Smirnov test, for continuous features ---
def ks_drift_test(reference, current, significance=0.05):
    """KS test: p-value < significance -> drift detected."""
    stat, p_value = stats.ks_2samp(reference, current)
    drift = p_value < significance
    return {"ks_statistic": stat, "p_value": p_value, "drift": drift}

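# --- Chi-squared test, for categorical features ---
def chi2_drift_test(reference_counts, current_counts, significance=0.05):
    """Chi-squared test for categorical column drift (counts in the same category order)."""
    reference_counts = np.asarray(reference_counts, dtype=float)
    current_counts = np.asarray(current_counts, dtype=float)
    # Rescale reference counts so expected and observed totals match
    expected = reference_counts * current_counts.sum() / reference_counts.sum()
    stat, p_value = stats.chisquare(current_counts, f_exp=expected)
    return {"chi2_statistic": stat, "p_value": p_value, "drift": p_value < significance}
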
# Example usage:
income_ref = reference['monthly_income'].values
income_cur = current['monthly_income'].values
print(f"PSI (income): {psi(income_ref, income_cur):.4f}")
print(ks_drift_test(income_ref, income_cur))
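
To make the PSI formula concrete, here is a small worked example that applies it by hand to three bins. The income bands and their shares are made-up numbers, chosen only so the arithmetic is easy to follow.

```python
import math

# Share of applicants per income band (low, middle, high): made-up numbers
p_reference = [0.50, 0.30, 0.20]
p_current = [0.30, 0.30, 0.40]

# One term per bin: (P_current - P_reference) * ln(P_current / P_reference)
terms = [(c - r) * math.log(c / r) for r, c in zip(p_reference, p_current)]
psi_value = sum(terms)

for band, term in zip(["low", "middle", "high"], terms):
    print(f"{band:>6}: {term:.4f}")
print(f"PSI = {psi_value:.4f}")
```

The total comes to about 0.24, which sits in the "moderate drift, investigate" band. Each term is non-negative because the difference and the log-ratio always share a sign, so shifts in either direction add to the score rather than cancelling out.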
Two strategies for deciding when to retrain:
- Scheduled retraining: Retrain every N days regardless of drift. Simple but wasteful: you retrain even when the model is fine.
- Metric-based retraining: Retrain when PSI > 0.25 OR accuracy drops below threshold. Efficient but requires monitoring infrastructure.
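
The two strategies can also be combined. The sketch below is one possible way to do that, not a prescribed policy: `retrain_reasons` is a hypothetical helper, the 0.25 PSI cut-off comes from the thresholds above, and the accuracy floor (0.85) and maximum model age (90 days) are assumed values you would tune for your own model.

```python
def retrain_reasons(psi_by_feature, accuracy, days_since_training,
                    psi_threshold=0.25, min_accuracy=0.85, max_age_days=90):
    """Return a list of reasons to retrain; an empty list means keep the model."""
    reasons = []

    # Metric-based trigger 1: any monitored feature has drifted significantly
    drifted = sorted(f for f, v in psi_by_feature.items() if v > psi_threshold)
    if drifted:
        reasons.append(f"PSI above {psi_threshold} for: {', '.join(drifted)}")

    # Metric-based trigger 2: accuracy fell below the floor (None = labels not in yet)
    if accuracy is not None and accuracy < min_accuracy:
        reasons.append(f"accuracy {accuracy:.2f} below {min_accuracy:.2f}")

    # Scheduled trigger: the model is simply too old
    if days_since_training >= max_age_days:
        reasons.append(f"model is {days_since_training} days old")

    return reasons


healthy = retrain_reasons({"monthly_income": 0.04}, accuracy=0.91, days_since_training=20)
stale = retrain_reasons({"monthly_income": 0.31}, accuracy=0.61, days_since_training=20)
print(healthy)
print(stale)
```

Returning the reasons rather than a bare boolean keeps the alert message informative, and the age check acts as a safety net for periods when labels arrive too slowly for the accuracy check to fire.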
# --- A/B test two model versions in production ---
import hashlib

def route_request(user_id: str, request_data: dict) -> dict:
    """Route 10% of traffic to new model v2 for comparison."""
    # Python's built-in hash() is salted per process for strings, so it is not
    # stable across restarts. A digest keeps each user in the same bucket.
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    use_v2 = bucket < 10  # 10% to v2
    if use_v2:
        result = model_v2.predict(request_data)
        result['model_version'] = 'v2'
    else:
        result = model_v1.predict(request_data)
        result['model_version'] = 'v1'
    # Log everything for later analysis
    log_prediction(user_id, request_data, result)
    return result

# After 30 days (once ground truth labels are available):
# Compare v1 vs v2 accuracy, false positive rate, and PSI of their inputs
# If v2 wins on all metrics with statistical significance -> promote to 100%
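
The comments above ask for "statistical significance" without naming a test. One common choice for comparing two accuracy figures is a two-proportion z-test, sketched below. The choice of test is an assumption rather than something the original specifies, `two_proportion_z_test` is a hypothetical helper, and the counts are invented to match a 90/10 traffic split.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference in accuracy between models A and B."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both models are equally good
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both models all-correct or all-wrong: nothing to compare
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: v1 saw 9,000 labelled requests, v2 saw 1,000
z, p_value = two_proportion_z_test(correct_a=8_100, n_a=9_000, correct_b=930, n_b=1_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A positive `z` with a p-value under your chosen significance level suggests v2's accuracy gain is unlikely to be noise. The same check would need repeating for the false positive rate before promoting v2, since the comments require a win on all metrics.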
# --- Complete monitoring pipeline (runs daily via cron/Airflow) ---
def daily_monitoring_check():
    ref_data = load_training_data()  # stored in your data warehouse
    prod_data = load_last_7_days_production()

    # 1. Check data drift
    drift_results = {}
    for col in MONITORED_FEATURES:
        drift_results[col] = ks_drift_test(ref_data[col], prod_data[col])
    drifted_cols = [c for c, r in drift_results.items() if r['drift']]

    # 2. Check prediction distribution
    ref_approval_rate = ref_data['approved'].mean()
    cur_approval_rate = prod_data['model_prediction'].mean()

    # 3. Alert if necessary
    if len(drifted_cols) > 3:
        send_alert(f"DATA DRIFT: {drifted_cols} - consider retraining")
    if abs(cur_approval_rate - ref_approval_rate) > 0.15:
        send_alert(f"PREDICTION SHIFT: approval rate {cur_approval_rate:.1%} vs baseline {ref_approval_rate:.1%}")
    return {"status": "ok", "drifted_columns": drifted_cols}