Lesson 8 — FastAPI: Serving Your ML Model | Class 10

Meet Sneha — Class 10, Vadodara

Sneha had trained a model in Lesson 7 that predicted whether a loan application would default. Accuracy: 87%. She was proud of it. But then her friend Rohan asked: "Can I use your model from my phone app?" She sent him the .pkl file and Python code. He didn't know Python. Dead end.

Her teacher explained: "A trained model living in a Jupyter notebook is like a doctor who only sees patients at their own home. To be useful, they need a clinic — a fixed address where anyone can walk in with their questions and receive answers. For ML models, that clinic is called a REST API. FastAPI lets you build one in under 50 lines of Python."

Concept

What is a REST API?

REST (Representational State Transfer) is a widely used architectural style that lets different programs communicate over HTTP. An API (Application Programming Interface) is the "front desk" of your ML model:

📱 Client (phone app / website / Python script)
→ POST /predict (JSON) → ⚡ FastAPI server → 🧠 Loaded ML pipeline
← JSON response ← 📊 {"prediction": 0, "probability": 0.12}
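
The request/response flow above can be sketched without any web framework. This is only an illustration of the JSON-in, JSON-out idea: `fake_model` and `handle_predict` are hypothetical names, and the scoring rule is made up rather than taken from a trained model.

```python
import json


def fake_model(features: dict) -> float:
    """Hypothetical stand-in for a trained pipeline: returns a made-up probability."""
    # Not a real model: each existing loan simply raises the score by 0.2.
    return min(1.0, 0.1 + 0.2 * features["existing_loans"])


def handle_predict(request_body: str) -> str:
    """Mimic what a server does for POST /predict: JSON text in, JSON text out."""
    features = json.loads(request_body)      # 1. parse the JSON payload
    probability = fake_model(features)       # 2. run the (stand-in) model
    response = {
        "prediction": int(probability >= 0.5),
        "probability": round(probability, 2),
    }
    return json.dumps(response)              # 3. serialise the answer


# Client side: build the JSON text that would travel over HTTP.
request_body = json.dumps({"income": 45.0, "existing_loans": 0})
response_body = handle_predict(request_body)
print(response_body)
```

In the real API, FastAPI and Uvicorn take care of steps 1 and 3 and of moving the text over HTTP, so the endpoint function only has to do step 2.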

FastAPI is one of the most popular Python web frameworks for ML APIs because it is fast, automatically generates interactive documentation, and uses Pydantic for input validation, which catches bad data before it reaches your model.

Pydantic is a widely used Python data validation library. When you define a Pydantic model with income: float, FastAPI will automatically return a 422 validation error with a clear message if a client sends "income": "banana", so your ML model never sees the bad input.
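
A minimal sketch of that validation step on its own, outside FastAPI. It assumes Pydantic is installed; `Applicant` is a hypothetical one-field model used only for illustration.

```python
from pydantic import BaseModel, ValidationError


class Applicant(BaseModel):
    # Hypothetical one-field model, just to show type validation.
    income: float


# A valid value is accepted and stored as a float.
ok = Applicant(income=45.0)
print(ok.income)

# An invalid value raises ValidationError before any model code could run.
try:
    Applicant(income="banana")
    errors = []
except ValidationError as exc:
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["msg"])
```

When the same model is used as an endpoint parameter, FastAPI catches this error and converts it into the 422 response, so you do not write the try/except yourself.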

Full Code

Loan Default Prediction API

# ── FILE 1: train_and_save.py ──
# Run this first to create the model file
import pickle

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

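# Synthetic stand-in data: make_classification produces unitless numeric features,
# so the column names below are labels only, not real incomes or CIBIL scores.
# n_redundant defaults to 2, and informative + redundant must not exceed n_features,
# so it is set to 1 here (4 + 1 = 5).
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4, n_redundant=1, random_state=42
)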
feature_names = ["income", "loan_amount", "credit_score", "employment_years", "existing_loans"]
X_df = pd.DataFrame(X, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler',  StandardScaler()),
    ('model',   GradientBoostingClassifier(n_estimators=100, random_state=42))
])
pipeline.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, pipeline.predict(X_test)):.3f}")

with open("loan_model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
print("Model saved to loan_model.pkl")

# ── FILE 2: main.py ── (the FastAPI application)
# Run with: uvicorn main:app --reload
import pickle

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, field_validator

# ── Load model at startup (not on each request) ──
with open("loan_model.pkl", "rb") as f:
    pipeline = pickle.load(f)

app = FastAPI(
    title="Loan Default Prediction API",
    description="Predict probability of loan default. Built with FastAPI + scikit-learn.",
    version="1.0.0"
)

# ── Input schema: Pydantic validates every incoming request ──
class LoanApplication(BaseModel):
    income: float            = Field(..., gt=0,  description="Monthly income in thousands")
    loan_amount: float       = Field(..., gt=0,  description="Loan amount in thousands")
    credit_score: float      = Field(..., ge=300, le=900, description="CIBIL score 300-900")
    employment_years: float  = Field(..., ge=0,  description="Years of employment")
    existing_loans: float    = Field(..., ge=0,  description="Number of existing loans")

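    # Field(ge=300, le=900) already enforces this range; the validator is kept
    # to show where a custom check with its own error message would go.
    @field_validator('credit_score')
    @classmethod
    def score_must_be_valid(cls, v):
        if v < 300 or v > 900:
            raise ValueError("CIBIL credit score must be between 300 and 900")
        return v

    # Pydantic v2 style config (replaces the deprecated inner `class Config`)
    model_config = {
        "json_schema_extra": {
            "example": {
                "income": 45.0,
                "loan_amount": 200.0,
                "credit_score": 720.0,
                "employment_years": 3.5,
                "existing_loans": 1.0
            }
        }
    }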

# ── Output schema ──
class PredictionResponse(BaseModel):
    prediction: int               # 0 = no default, 1 = default
    prediction_label: str         # human-readable
    default_probability: float    # 0.0 – 1.0
    risk_level: str               # Low / Medium / High

# ── Endpoints ──
@app.get("/")
def root():
    return {"message": "Loan Default Prediction API is running",
            "docs": "/docs", "version": "1.0.0"}

@app.get("/health")
def health():
    return {"status": "ok", "model_loaded": pipeline is not None}

@app.post("/predict", response_model=PredictionResponse)
def predict(application: LoanApplication):
    """
    Predict whether a loan application will default.
    Returns prediction (0/1), probability, and risk level.
    """
    try:
        # Feature order must match the training columns in train_and_save.py
        features = [[
            application.income,
            application.loan_amount,
            application.credit_score,
            application.employment_years,
            application.existing_loans
        ]]
        prediction  = int(pipeline.predict(features)[0])
        probability = float(pipeline.predict_proba(features)[0][1])

        if probability < 0.25:
            risk = "Low"
        elif probability < 0.60:
            risk = "Medium"
        else:
            risk = "High"

        return PredictionResponse(
            prediction=prediction,
            prediction_label="Default" if prediction == 1 else "No Default",
            default_probability=round(probability, 4),
            risk_level=risk
        )
    except Exception as e:
        # Chain the original exception so the server log keeps the full traceback
        raise HTTPException(status_code=500, detail=f"Prediction error: {e}") from e

# ── FILE 3: test_api.py ── (test the running API)
import requests

BASE = "http://127.0.0.1:8000"

# Test 1: Health check
resp = requests.get(f"{BASE}/health")
print("Health:", resp.json())

# Test 2: Low-risk applicant
low_risk = {
    "income": 80.0,
    "loan_amount": 150.0,
    "credit_score": 800.0,
    "employment_years": 10.0,
    "existing_loans": 0.0
}
resp = requests.post(f"{BASE}/predict", json=low_risk)
print("Low-risk:", resp.json())
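# Example response shape (values are illustrative; the model was trained on
# synthetic data, so real-scale inputs will not give meaningful predictions):
# {"prediction": 0, "prediction_label": "No Default",
#  "default_probability": 0.04, "risk_level": "Low"}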

# Test 3: High-risk applicant
high_risk = {
    "income": 15.0,
    "loan_amount": 500.0,
    "credit_score": 380.0,
    "employment_years": 0.5,
    "existing_loans": 4.0
}
resp = requests.post(f"{BASE}/predict", json=high_risk)
print("High-risk:", resp.json())
# Example response shape (values are illustrative, as above):
# {"prediction": 1, "prediction_label": "Default",
#  "default_probability": 0.87, "risk_level": "High"}

# Test 4: Invalid input (Pydantic validation)
bad_input = {
    "income": -100,      # negative — Field gt=0 will reject
    "loan_amount": 200.0,
    "credit_score": 720.0,
    "employment_years": 3.0,
    "existing_loans": 1.0
}
resp = requests.post(f"{BASE}/predict", json=bad_input)
print("Bad input response:", resp.status_code, resp.json()["detail"][0]["msg"])
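
The threshold logic inside /predict can also be pulled out into a small pure function, which makes it testable without starting the server. The sketch below reuses the 0.25 and 0.60 cut-offs from main.py; the function name `risk_level` and the range check are assumptions added for illustration, not part of the lesson's code.

```python
def risk_level(probability: float) -> str:
    """Map a default probability to the Low / Medium / High bands used in /predict."""
    # Extra guard (an assumption, not in main.py): reject values outside [0, 1].
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.25:
        return "Low"
    if probability < 0.60:
        return "Medium"
    return "High"


# Check the band edges: 0.25 falls into Medium and 0.60 into High.
for p in (0.04, 0.25, 0.59, 0.60, 0.87):
    print(p, risk_level(p))
```

With a helper like this, the endpoint body would shrink to a single call such as `risk = risk_level(probability)`.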

Automatic Swagger docs: When your server is running, go to http://127.0.0.1:8000/docs in your browser. FastAPI auto-generates an interactive UI where you can send requests to your endpoints without writing any test code. Your model gets browsable, interactive API documentation with no extra code.

To run in Google Colab: install the packages with !pip install fastapi "uvicorn[standard]" pyngrok requests, start uvicorn as a background process, then use pyngrok to expose port 8000 as a public URL. ngrok generally requires a free account and auth token before it will open a tunnel. Share the ngrok URL with classmates, who can then call your API from their phones!
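
A rough sketch of those Colab steps as helper functions, assuming main.py and loan_model.pkl are already in the working directory and that pyngrok is installed. The function names are hypothetical and the auth token is a placeholder you would supply yourself.

```python
import subprocess
import sys


def uvicorn_command(app_path: str = "main:app", port: int = 8000) -> list:
    """Build the command that starts uvicorn with the current Python interpreter."""
    return [sys.executable, "-m", "uvicorn", app_path,
            "--host", "127.0.0.1", "--port", str(port)]


def start_server(port: int = 8000) -> subprocess.Popen:
    """Launch uvicorn as a background process and return its handle."""
    return subprocess.Popen(uvicorn_command(port=port))


def open_tunnel(port: int = 8000, auth_token: str = "") -> str:
    """Expose the local port through ngrok and return the public URL."""
    # Imported here so the helpers above still work where pyngrok is missing.
    from pyngrok import ngrok
    if auth_token:
        ngrok.set_auth_token(auth_token)
    tunnel = ngrok.connect(str(port))
    return tunnel.public_url


# Typical use in a notebook cell (left commented out so nothing starts on import):
# server = start_server()
# print(open_tunnel(auth_token="YOUR_NGROK_TOKEN"))
# server.terminate()  # stop the API when you are done
```

The server needs a moment to start before the tunnel is useful, so in practice you might wait a second or two between the two calls.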

🧪 Check Your Understanding — Lesson 8 Quiz

1. A REST API for an ML model is needed because:

a) Trained models cannot make predictions without an API

b) It provides a language-agnostic interface — any app (mobile, web, Python, Java) can call your model over HTTP without needing a Python environment

c) REST APIs make models more accurate by optimising inference

d) FastAPI retrains your model automatically on new incoming data

2. Pydantic's `Field(..., gt=0)` on `income` means:

a) Income must be a string greater than zero characters

b) Income must be a number strictly greater than zero — FastAPI returns a 422 validation error automatically if the client sends a negative or zero value

c) The field is optional and defaults to zero

d) Income values greater than zero are rounded to the nearest integer

3. Why is the model loaded ONCE at startup (outside the endpoint function) rather than inside `/predict`?

a) FastAPI requires models to be global variables

b) Loading a model reads a file from disk and can take seconds — doing it on every prediction would make the API very slow. Loading once means predictions use the already-loaded model in memory.

c) pickle files cannot be opened inside Python functions

d) It prevents the model file from being updated while the server is running

4. `@app.post("/predict")` versus `@app.get("/predict")` — why POST for prediction?

a) POST requests are processed faster than GET requests

b) POST requests carry a body (JSON payload with features) — GET requests pass data in the URL which is limited in size and exposes sensitive loan data in browser history and server logs

c) FastAPI requires POST for all machine learning endpoints

d) GET requests do not return JSON responses

5. `HTTPException(status_code=500, detail=...)` is raised in the prediction endpoint to:

a) Crash the server so the user knows something went wrong

b) Return a structured error response with HTTP status 500 (Internal Server Error) when an unexpected exception occurs — the client receives a JSON error message instead of seeing Python tracebacks

c) Log the error to MLflow automatically

d) Retry the prediction automatically with different inputs

6. What does `uvicorn main:app --reload` do?

a) Starts a new Python process that reloads the page every second

b) Starts the FastAPI application using the Uvicorn ASGI server, watching for file changes and auto-restarting — ideal for development so you don't need to manually restart after each code change

c) Retrains the ML model whenever new data is added to main.py

d) Runs all test files in the project folder automatically

7. FastAPI's automatic Swagger UI at `/docs` is generated from:

a) A separate documentation file you must write manually

b) The Pydantic models, endpoint decorators, and docstrings in your Python code — FastAPI introspects these to auto-generate an interactive OpenAPI specification with zero extra configuration

c) Templates downloaded from the FastAPI website on first run

d) A JavaScript file that must be placed in the same folder as main.py

8. `pipeline.predict_proba(features)[0][1]` returns:

a) The predicted class label (0 or 1)

b) The probability that the sample belongs to class 1 (default) — the second column of the probability matrix for the first (and only) sample in our request

c) The feature importance for the most predictive variable

d) The index of the most important feature in the pipeline
