Lesson 5 — MLOps: Dockerising Your ML Model | Class 11

Story

Siddharth's "Works on My Machine" Crisis

Siddharth, 16, from Mumbai had built a sentiment analysis API using FastAPI and a fine-tuned DistilBERT model. It worked perfectly on his laptop — Python 3.10, transformers 4.36, a specific PyTorch version, and a dozen other libraries he'd installed over months.

When he deployed it to a friend's server — a different Ubuntu version, different Python — nothing worked. "ModuleNotFoundError." "CUDA version mismatch." "NumPy incompatible with scikit-learn." Two hours of debugging later, he had learned the most important lesson in software engineering: environment is everything.

"Docker solves this," his mentor said. "You package the code and everything it needs into a container. Then it runs identically everywhere — your laptop, a cloud VM, or a Kubernetes cluster." Siddharth dockerised his API in 20 minutes. It's been running in production ever since.

Section 1

Why Docker: The Reproducibility Problem

A trained ML model is not just a .pkl file. It's a system: a specific Python version, library versions, system libraries, CUDA drivers, and model weights. Docker captures all of this in an image — a layered, immutable snapshot of the entire environment.
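
To see how two machines differ before reaching for Docker, it can help to print an environment "fingerprint" on each and compare them. The following is a minimal sketch using only the standard library; the function name `environment_fingerprint` and the package list are illustrative assumptions, not part of Siddharth's project.

```python
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Return the Python version, OS, and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names are illustrative; list whatever your project imports.
    report = environment_fingerprint(["torch", "transformers", "numpy", "fastapi"])
    for key, value in report.items():
        print(key, value)
```

Running this on the laptop and on the server and diffing the output shows which versions drifted. A Docker image pins all of them at build time.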

Image: Blueprint (like a class in Python). Read-only. Built from a Dockerfile.
Container: Running instance of an image (like an object). Isolated process.
Registry: Storage for images. Docker Hub, GitHub Container Registry, AWS ECR.
Layer caching: Each Dockerfile instruction creates a layer. Unchanged layers are cached — rebuilds are fast.
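
The layer-caching idea can be illustrated with a toy model. This is a simplified sketch, not Docker's actual algorithm: it assumes each step's cache key depends on the previous step's key, the instruction text, and the content of any copied files. The function names and the build steps are made up for the illustration.

```python
import hashlib


def layer_keys(steps):
    """Compute one cache key per build step.

    Each key hashes the previous key, the instruction text, and the content
    the step copies, so a change invalidates that step and every later one.
    """
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256((parent + instruction + content).encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def cached_steps(old, new):
    """Count how many leading steps can be reused between two builds."""
    reused = 0
    for a, b in zip(layer_keys(old), layer_keys(new)):
        if a != b:
            break
        reused += 1
    return reused


build_1 = [
    ("FROM python:3.11-slim", ""),
    ("COPY requirements.txt .", "fastapi\ntransformers\n"),
    ("RUN pip install -r requirements.txt", ""),
    ("COPY app/ ./app/", "print('v1')"),
]
# Same build, but only the application code changed.
build_2 = build_1[:3] + [("COPY app/ ./app/", "print('v2')")]
# Same build, but a dependency was added to requirements.txt.
build_3 = [build_1[0], ("COPY requirements.txt .", "fastapi\ntransformers\nredis\n")] + build_1[2:]

print(cached_steps(build_1, build_2))  # 3: everything up to pip install is reused
print(cached_steps(build_1, build_3))  # 1: only the base image step is reused
```

The second result shows why a change early in the build is expensive: every step after it has to run again.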

Key mental model: A Docker container behaves like a lightweight VM, but it is really an isolated process on the host. It shares the host kernel yet has its own filesystem, network, and process view, so you get isolation without the 1GB+ overhead of booting a full virtual machine with its own guest OS.

Section 2

Dockerfile for FastAPI + ML Model

A Dockerfile is a recipe that builds your image layer by layer. Each instruction creates one layer. Order matters — put things that change rarely (Python install, pip dependencies) before things that change often (your code).

sentiment-api/
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
├── requirements.txt
├── app/
│   ├── main.py        # FastAPI app
│   └── predict.py     # model loading + inference
└── model/
    └── distilbert-sentiment/   # fine-tuned weights

# ── Dockerfile ───────────────────────────────────────────────────
# 1. Base image — use official Python slim (smallest that works)
FROM python:3.11-slim

# 2. Set working directory inside container
WORKDIR /app

# 3. Install system dependencies (kept separate for layer caching)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# 4. Copy requirements FIRST (changes rarely → cached until requirements change)
COPY requirements.txt .

# 5. Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# 6. Copy application code (changes frequently → placed AFTER pip install)
COPY app/ ./app/
COPY model/ ./model/

# 7. Expose the port the app listens on
EXPOSE 8000

# 8. Set environment variables
ENV PYTHONUNBUFFERED=1
ENV MODEL_PATH=/app/model/distilbert-sentiment

# 9. Run the FastAPI app with uvicorn
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

# ── .dockerignore — do NOT copy these into the image ─────────────
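# Patterns are matched from the build-context root; **/ matches at any depth
**/__pycache__/
**/*.py[cod]
.git/
.github/
**/*.ipynb
**/.ipynb_checkpoints/
venv/
.env
.env.*
tests/
**/*.log
# Never bake secrets into the image: pass them at runtime via env vars or Docker secrets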

Command	What it does
docker build -t sentiment-api .	Build image named "sentiment-api" from Dockerfile in current dir
docker run -p 8000:8000 sentiment-api	Run container, map host port 8000 → container port 8000
docker run -d --name my-api -p 8000:8000 sentiment-api	Run detached (background) with a name, still publishing port 8000
docker logs my-api	View container stdout/stderr
docker exec -it my-api bash	Open shell inside running container
docker stop my-api && docker rm my-api	Stop and delete container
docker images	List all local images
docker push yourdockerhub/sentiment-api:v1	Push image to Docker Hub registry

Section 3

docker-compose: Multi-Service Setup

Real ML APIs often need more than one service — the ML API, a Redis cache for storing predictions, maybe a monitoring sidecar. docker-compose orchestrates multiple containers with a single docker compose up command.

# ── docker-compose.yml ───────────────────────────────────────────
version: "3.9"                       # optional: Compose v2 ignores this field

services:
  api:
    build: .                         # build from local Dockerfile
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://cache:6379 # use service name "cache" as hostname
      - MODEL_PATH=/app/model/distilbert-sentiment
    depends_on:
      - cache                        # start Redis first (does not wait until it is ready)
    volumes:
      - ./model:/app/model:ro        # mount weights read-only; remove COPY model/ from the Dockerfile to keep them out of the image
    restart: unless-stopped

  cache:
    image: redis:7-alpine            # official Redis image, no custom Dockerfile needed
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data             # persist Redis data across restarts
    restart: unless-stopped

volumes:
  redis_data:                        # named volume managed by Docker
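
Because `depends_on` only controls start order, the API may come up a moment before Redis accepts connections. One way to handle that window is a small retry loop at application start-up. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper name, and a successful TCP connection is only a coarse readiness signal.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # service not accepting connections yet; retry
    return False
```

Inside the api container this could be called as `wait_for_port("cache", 6379)` before the Redis client is first used, with the application exiting or logging an error if it returns False.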

# ── app/predict.py — Redis caching for predictions ───────────────
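import hashlib
import json
import os

import redis
from transformers import pipeline

# Read settings from the environment set in the Dockerfile / docker-compose.yml;
# the defaults match the values used there.
MODEL_PATH = os.getenv("MODEL_PATH", "/app/model/distilbert-sentiment")
REDIS_URL = os.getenv("REDIS_URL", "redis://cache:6379")

model = pipeline("sentiment-analysis", model=MODEL_PATH)
r = redis.Redis.from_url(REDIS_URL, decode_responses=True)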

def predict_with_cache(text: str) -> dict:
    # Hash the input text as cache key
    key = "pred:" + hashlib.sha256(text.encode()).hexdigest()

    # Return cached result if available
    cached = r.get(key)
    if cached:
        return json.loads(cached)

    # Run inference and cache for 1 hour (3600 seconds)
    result = model(text)[0]
    r.setex(key, 3600, json.dumps(result))
    return result

Caching inference results is often one of the most impactful performance wins in ML APIs. Repeated identical queries (common in production) return from Redis in about a millisecond instead of the 100–500ms the model needs. The saving scales with the cache hit rate: on high-traffic endpoints where most queries repeat, this can reduce GPU costs by as much as 60–80%.
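
How much caching helps depends on how often requests repeat. As a rough sketch of the arithmetic, assume every request pays for one cache lookup and only misses also pay for model inference. The function name and the default timings (1ms lookup, 300ms inference) are illustrative assumptions, not measurements.

```python
def expected_latency_ms(hit_rate, cache_ms=1.0, model_ms=300.0):
    """Average latency when a fraction hit_rate of requests is served from cache.

    Every request pays for the cache lookup; misses additionally run the model.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ms + (1.0 - hit_rate) * model_ms


for rate in (0.0, 0.5, 0.7, 0.9):
    print(f"hit rate {rate:.0%}: {expected_latency_ms(rate):.1f} ms on average")
```

Under these assumptions the fraction of model calls avoided equals the hit rate, so a 70% hit rate removes about 70% of the inference work.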

Section 4

GitHub Actions CI Pipeline

Continuous Integration (CI) automatically runs tests and builds on every push to GitHub. This catches regressions before they reach production. Here's a complete pipeline that tests your API, builds the Docker image, and pushes to Docker Hub.

# ── .github/workflows/ci.yml ─────────────────────────────────────
name: ML API CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: docker.io
  IMAGE_NAME: ${{ github.repository }}   # owner/repo, e.g. yourusername/sentiment-api; must match your Docker Hub namespace

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Install dependencies
        run: pip install -r requirements.txt pytest httpx

      - name: Run tests
        run: pytest tests/ -v --tb=short

  build-and-push:
    needs: test           # only runs if test job passes
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'   # only on main branch

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # stored as GitHub secret

      - name: Extract Docker metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-         # git SHA for traceability
            type=raw,value=latest        # "latest" tag on main

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha            # GitHub Actions build cache
          cache-to: type=gha,mode=max     # fast rebuilds
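
The `pytest tests/` step assumes the repository contains a tests/ directory. A minimal sketch of what one file there might look like follows (for example tests/test_cache_key.py, a hypothetical name). The helper `cache_key` is a stand-in defined inline so the example runs on its own; it mirrors the key logic in app/predict.py. Importing app.predict directly would load the model at import time, which is slow in CI.

```python
import hashlib


def cache_key(text: str) -> str:
    """Stand-in for the key logic in app/predict.py (hypothetical helper name)."""
    return "pred:" + hashlib.sha256(text.encode()).hexdigest()


def test_same_text_gives_same_key():
    assert cache_key("great movie") == cache_key("great movie")


def test_different_text_gives_different_key():
    assert cache_key("great movie") != cache_key("terrible movie")


def test_key_length_is_independent_of_input_length():
    # "pred:" (5 characters) + 64 hex characters from SHA-256
    assert len(cache_key("a")) == len(cache_key("a" * 10_000)) == 69
```

pytest collects any function whose name starts with `test_`, so the job fails, and the image is not published, if one of these assertions breaks.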

Security rule: NEVER put passwords, API keys, or Docker Hub tokens in your code or Dockerfile. Use GitHub Secrets (Settings → Secrets and variables → Actions). The CI pipeline references them as ${{ secrets.DOCKERHUB_TOKEN }}, and GitHub masks their values in the workflow logs.
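
On the application side, the same rule means reading secrets from the environment at runtime rather than hard-coding them. A minimal sketch, assuming a hypothetical helper `require_env` and a made-up variable name in the usage note:

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail fast if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

For example, `api_key = require_env("SENTIMENT_API_KEY")` would pick up a value passed with `docker run -e SENTIMENT_API_KEY=...` and stop the container early with a clear error if it was forgotten.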

🐳 Lesson 5 Quiz — MLOps: Docker and CI

1. In a Dockerfile, requirements.txt is copied and pip-installed BEFORE the application code is copied. This ordering:

a) Is required by Docker — pip install cannot run if application code is present first

b) Exploits Docker's layer caching: the pip install layer is only invalidated when requirements.txt changes. Application code changes frequently but dependencies change rarely — so most rebuilds skip the expensive pip install and reuse the cached layer, making builds much faster.

c) Prevents the application code from interfering with the Python package namespace

d) Makes the final image smaller by allowing Docker to compress dependency layers separately

2. .dockerignore lists files like __pycache__/, .git/, and .env that should NOT be copied into the image. The primary security reason for excluding .env is:

a) .env files cause syntax errors in Dockerfile COPY instructions

b) .env files contain secrets (API keys, database passwords). If they are copied into the image and the image is pushed to a public registry, those secrets become permanently public. Anyone who pulls your image can read them by running the container or unpacking its layers, and deleting the file in a later layer does not remove it from earlier ones. Always inject secrets at runtime via environment variables.

c) Docker cannot parse the KEY=VALUE format used in .env files

d) .env files cause the container to fail to start on Linux systems

3. docker run -p 8000:8000 maps host port 8000 to container port 8000. If you use -p 9000:8000 instead:

a) The application crashes because it's configured to listen on port 8000

b) The container still listens internally on port 8000 (unchanged — this is what EXPOSE declares and what uvicorn is configured for). The host machine maps its port 9000 to the container's 8000. You access the API at localhost:9000 from your host, but inside the container it's still port 8000.

c) Port 9000 on the host is now blocked by Docker and cannot be used by other services

d) Both port 8000 and 9000 on the host will route to the container simultaneously

4. In docker-compose.yml, the api service has depends_on: cache. This means:

a) The api service will wait indefinitely until Redis is fully accepting connections

b) Docker will start the cache (Redis) container before starting the api container — but only waits for the process to start, not for Redis to be ready to accept connections. For production, you also need a health check and retry logic in the application code to handle the brief startup window.

c) If the cache container stops, Docker automatically stops the api container too

d) The api service shares the same network namespace as the cache service

5. Redis caching in the prediction endpoint uses SHA-256 of the input text as the cache key. Using the full input text directly as the key would be problematic because:

a) Redis does not support string keys — only integer keys are allowed

b) Long texts (e.g., 10,000 character documents) would create very large cache keys, wasting Redis memory and increasing network overhead. SHA-256 hashes are always exactly 64 characters regardless of input length, are fast to compute, and have negligible collision probability for practical input sizes.

c) The text might contain special characters that break Redis string encoding

d) Direct text keys would expose user data in Redis, violating GDPR regulations

6. The GitHub Actions CI pipeline uses needs: test for the build-and-push job. The purpose is:

a) The build job downloads the test artifacts as input for the Docker build

b) It creates a dependency: build-and-push only runs if the test job completes successfully. This prevents a broken (failing-tests) image from ever being pushed to Docker Hub and potentially deployed to production — tests act as a quality gate before every image publish.

c) The test and build jobs must run on the same virtual machine for consistency

d) GitHub Actions requires needs: declarations for all jobs running in parallel

7. The CMD instruction in a Dockerfile runs uvicorn with --host 0.0.0.0. Why not --host 127.0.0.1?

a) 0.0.0.0 is required by Docker — 127.0.0.1 causes the container to fail

b) 127.0.0.1 (localhost) inside the container is the container's own loopback — it's unreachable from outside the container. 0.0.0.0 means "listen on all interfaces" — the Docker bridge network can forward host traffic to it. Using 127.0.0.1 would make your API unreachable even with port mapping.

c) 127.0.0.1 is a security vulnerability — 0.0.0.0 restricts access to container-internal traffic only

d) 0.0.0.0 enables IPv6 support which 127.0.0.1 does not provide

8. Model weights are mounted as a volume (./model:/app/model:ro) rather than copied into the image with COPY. The primary advantage for an ML team is:

a) Volumes allow the model to train itself during container runtime

b) ML model weights are often several GB. Copying them into the image makes every image that size — slow to push, pull, and rebuild. Mounting as a volume separates model lifecycle from code lifecycle: you can update model weights without rebuilding the Docker image, and the same image can serve different model versions by changing the mount path.

c) Docker images cannot exceed 2GB, so large models must be mounted externally

d) The :ro (read-only) flag is only available for volumes, not for COPY instructions
