Priya interned at a Hyderabad startup building a Telugu-first OTT app. Their original "trending" feed showed everyone the same 20 films. Engagement was 4 minutes/day. After Priya built a personalised recommender, engagement grew to 19 minutes/day in 8 weeks.
She started with simple collaborative filtering, hit cold-start problems, then built a two-tower neural model that handled new users (cold-start) and scaled to 200K users + 8K films.
Popularity
Show what is trending, with no personalisation. This is the baseline that any personalised method has to beat.
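As a rough illustration of this baseline, the sketch below counts plays inside a recent time window and returns the most-played films. The event format `(timestamp_seconds, film_id)`, the 24-hour window, and the function name `trending` are assumptions made for the example, not details of the startup's system.

```python
from collections import Counter


def trending(events, now, window_hours=24, k=20):
    """Return the k most-played film IDs within the last `window_hours`."""
    cutoff = now - window_hours * 3600
    counts = Counter(film for ts, film in events if ts >= cutoff)
    # most_common sorts by count, highest first
    return [film for film, _ in counts.most_common(k)]


# Hypothetical play log: (timestamp in seconds, film ID)
events = [
    (0, "old"), (0, "old"), (0, "old"),   # outside the 24h window
    (90_000, "a"), (95_000, "a"), (96_000, "b"),
]
top = trending(events, now=100_000)
```

Every user receives the same list, which is exactly why this serves only as a baseline.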
Collaborative Filtering
"Users who liked X also liked Y." Works when you have ratings. Fails on cold start: new users and new items have no interactions to compare.
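One way to picture "users who liked X also liked Y" is item-item similarity over shared audiences. The sketch below is a minimal version under stated assumptions: feedback is implicit (a set of liked films per user), similarity is cosine over those sets, and the names `item_similarities`, `recommend`, and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(user_items):
    """Cosine similarity between items, based on which users liked both."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a == b:
                continue
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


def recommend(user, user_items, sims, k=3):
    """Score unseen items by their summed similarity to the user's liked items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


likes = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
    "new_user": set(),  # cold start: nothing to compare against
}
sims = item_similarities(likes)
recs_u1 = recommend("u1", likes, sims)
recs_new = recommend("new_user", likes, sims)
```

The empty result for `new_user` is the cold-start failure in miniature: with no interactions there is nothing to score.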
Matrix Factorisation
Decompose user-item matrix into latent factors. SVD/ALS/NMF. The classic.
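To make "latent factors" concrete, here is a small sketch that fits user and item vectors by stochastic gradient descent on observed ratings. This is SGD-based factorisation, not the SVD, ALS, or NMF solvers named above, and the function names, hyperparameters, and toy ratings are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


def factorise(ratings, num_users, num_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit k-dimensional factors on (user, item, rating) triples with L2 regularisation."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data: users 0-1 like items 0-1, users 2-3 like items 2-3; some cells unobserved.
ratings = [
    (0, 0, 5), (0, 2, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 3, 5),
]
P0, Q0 = factorise(ratings, 4, 4, epochs=0)  # untrained factors
P, Q = factorise(ratings, 4, 4)
```

After training, the unobserved cell `predict(P, Q, 0, 1)` should score well above `predict(P, Q, 0, 3)`, because user 0's observed ratings place them near user 1 in the latent space.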
Content-Based
Recommend items similar in features (genre, cast, language). Works for new items.
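A minimal sketch of the idea, assuming each film is described by a set of feature tokens and similarity is Jaccard overlap (one of several reasonable choices). The catalogue entries and the names `jaccard` and `similar_items` are hypothetical.

```python
def jaccard(a, b):
    """Share of features two items have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(target, catalogue, k=2):
    """Rank every other item by feature overlap with the target."""
    scores = {
        name: jaccard(catalogue[target], feats)
        for name, feats in catalogue.items()
        if name != target
    }
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


catalogue = {
    "film_a": {"genre:action", "lang:telugu", "cast:x"},
    "film_b": {"genre:action", "lang:telugu", "cast:y"},
    "film_c": {"genre:romance", "lang:tamil", "cast:z"},
    # A brand-new release with zero views still has features to match on.
    "new_release": {"genre:action", "lang:telugu", "cast:x"},
}
neighbours = similar_items("new_release", catalogue)
```

No interaction data is consulted, which is why this approach keeps working for items released today.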
Two-Tower Neural
One tower encodes users, one encodes items. Both share an embedding space. Scales to billions.
Transformer-based
Sequence of past interactions → next-item prediction. State of the art (SASRec, BERT4Rec).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoTowerRecommender(nn.Module):
    """User tower + Item tower → cosine similarity in shared 64-dim space."""

    def __init__(self, num_users, num_items, num_genres, num_languages, dim=64):
        super().__init__()
        # User features: ID + age_bucket + city + watch_history (last 10 items)
        self.user_id_emb = nn.Embedding(num_users, 32)
        self.age_emb = nn.Embedding(8, 8)  # 8 age buckets
        self.city_emb = nn.Embedding(50, 16)  # 50 cities
        self.history_emb = nn.Embedding(num_items + 1, 32, padding_idx=0)
        self.user_mlp = nn.Sequential(
            nn.Linear(32 + 8 + 16 + 32, 128), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(128, dim),
        )
        # Item features: ID + genre + language + release_year_bucket
        self.item_id_emb = nn.Embedding(num_items, 32)
        self.genre_emb = nn.Embedding(num_genres, 16)
        self.lang_emb = nn.Embedding(num_languages, 8)
        self.year_emb = nn.Embedding(10, 8)
        self.item_mlp = nn.Sequential(
            nn.Linear(32 + 16 + 8 + 8, 128), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(128, dim),
        )

    def encode_user(self, user_id, age, city, history):
        history_avg = self.history_emb(history).mean(dim=1)  # avg of last 10 items
        x = torch.cat([self.user_id_emb(user_id), self.age_emb(age),
                       self.city_emb(city), history_avg], dim=-1)
        return F.normalize(self.user_mlp(x), dim=-1)

    def encode_item(self, item_id, genre, lang, year):
        x = torch.cat([self.item_id_emb(item_id), self.genre_emb(genre),
                       self.lang_emb(lang), self.year_emb(year)], dim=-1)
        return F.normalize(self.item_mlp(x), dim=-1)

    def forward(self, user_inputs, item_inputs):
        u = self.encode_user(*user_inputs)
        v = self.encode_item(*item_inputs)
        return (u * v).sum(dim=-1)  # cosine similarity (since both normalised)
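
def train_step(model, batch, num_items, item_genre_table, item_lang_table,
               item_year_table, num_negatives=4):
    """For each positive (user, item) pair, sample N random items as negatives.

    item_genre_table, item_lang_table and item_year_table are 1-D lookup
    tensors of per-item features, indexed by item ID.
    """
    user_inputs = batch["user"]  # (user_id, age, city, history)
    pos_item = batch["pos_item"]  # (item_id, genre, lang, year)
    batch_size = pos_item[0].size(0)
    device = pos_item[0].device

    # Sample negatives uniformly (or proportional to popularity^0.75).
    # Uniform sampling can occasionally pick the positive itself as a negative.
    neg_ids = torch.randint(0, num_items, (batch_size, num_negatives), device=device)
    neg_items = (neg_ids,
                 item_genre_table[neg_ids],
                 item_lang_table[neg_ids],
                 item_year_table[neg_ids])

    # Encode each user once so positive and negative scores share one dropout mask
    u = model.encode_user(*user_inputs)  # (B, dim)
    v_pos = model.encode_item(*pos_item)  # (B, dim)
    v_neg = model.encode_item(*neg_items)  # (B, N, dim), all negatives in one go

    pos_score = (u * v_pos).sum(dim=-1)  # (B,)
    neg_scores = (u.unsqueeze(1) * v_neg).sum(dim=-1)  # (B, N)

    # The positive sits at column 0 of every row, so the target class is 0
    logits = torch.cat([pos_score.unsqueeze(1), neg_scores], dim=1)  # (B, 1+N)
    targets = torch.zeros(batch_size, dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)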
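
The training code mentions sampling negatives in proportion to popularity^0.75 as an alternative to uniform sampling. The sketch below shows one way that could look in plain Python. The add-one smoothing (so never-watched films can still be drawn), the rejection of the positive item, and the names `popularity_weights` and `sample_negatives` are assumptions for the example, not part of the original training loop.

```python
import random
from collections import Counter


def popularity_weights(interaction_item_ids, num_items, power=0.75):
    """Weight each item by (view_count + 1) ** power; power < 1 flattens the head."""
    counts = Counter(interaction_item_ids)
    return [(counts.get(i, 0) + 1) ** power for i in range(num_items)]


def sample_negatives(weights, positive_id, n, rng):
    """Draw n item IDs by weight, rejecting the positive item itself.

    Assumes the catalogue has at least two items, otherwise this never returns.
    """
    negatives = []
    while len(negatives) < n:
        candidate = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if candidate != positive_id:
            negatives.append(candidate)
    return negatives


# Hypothetical log: item 0 watched 100 times, item 1 ten times, item 2 once, item 3 never.
log = [0] * 100 + [1] * 10 + [2]
weights = popularity_weights(log, num_items=4)
negatives = sample_negatives(weights, positive_id=0, n=4, rng=random.Random(0))
```

Raising counts to the power 0.75 keeps popular films as frequent negatives while giving the long tail more weight than raw counts would.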
At inference, encode all 8K films into a FAISS index once, then for each user encode the user vector and look up top-K nearest films in <5ms.
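
The retrieval step can be pictured as a nearest-neighbour lookup over precomputed item vectors. The sketch below does it with an exact linear scan in plain Python, purely to show the logic; a FAISS index would stand in for the scan in production. The 2-dimensional vectors, the `exclude` argument for already-watched films, and the function names are illustrative assumptions.

```python
import heapq
from math import sqrt


def normalise(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def top_k(user_vec, item_vecs, k=2, exclude=()):
    """Return the k item IDs whose (pre-normalised) vectors best match the user."""
    u = normalise(user_vec)
    scored = (
        (sum(a * b for a, b in zip(u, vec)), item_id)
        for item_id, vec in item_vecs.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Built once, offline: item ID -> unit-length item embedding (toy 2-D values).
item_index = {
    1: normalise([1.0, 0.0]),
    2: normalise([0.9, 0.1]),
    3: normalise([0.0, 1.0]),
}
user_vec = [1.0, 0.05]  # computed per request by the user tower
recommended = top_k(user_vec, item_index)
unwatched = top_k(user_vec, item_index, exclude={1})
```

Only the user vector is computed at request time; the item side is fixed until the next re-index.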
- Cold-start users: A new user has no history. Use age + city + onboarding genre selection. The two-tower model handles this naturally: the new user's user_id_emb is still at its random initialisation, so the strong age/city/genre signals carry the prediction.
- Cold-start items: A new film has no views. Use content features (genre, language, cast, plot embedding). Re-train item embeddings nightly.
- Diversity: Top-K by score gives 10 nearly identical action films. Use Maximal Marginal Relevance (MMR) or determinantal point processes to enforce variety.
- Filter bubbles: Always inject 1-2 random "exploration" items in every list of 10. Track engagement to learn from new patterns.
- Negative feedback: Track skips and "Not Interested"; these are stronger signals than weak positives like watch-1-minute.
- Allow "show me everything chronologically" so users can opt out of personalisation.
- Don't recommend extreme/harmful content even if engagement is high.
- Make the "Why am I seeing this?" explanation accessible (e.g., "Because you watched X").
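
The Diversity and Filter bubbles points above can be sketched as a re-ranking pass over the retrieved candidates. Below is a greedy Maximal Marginal Relevance selection followed by a step that swaps the tail of the list for random exploration items. The trade-off weight `lam=0.7`, the genre-prefix similarity function, and all names and scores are assumptions made for illustration.

```python
import random


def mmr(candidates, relevance, similarity, k=3, lam=0.7):
    """Greedily pick items, trading relevance against similarity to earlier picks."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def inject_exploration(ranked, catalogue, n_explore=2, seed=None):
    """Replace the last n_explore slots with random items not already in the list."""
    rng = random.Random(seed)
    pool = [item for item in catalogue if item not in ranked]
    explore = rng.sample(pool, min(n_explore, len(pool)))
    return ranked[:len(ranked) - len(explore)] + explore


def same_genre(a, b):
    """Toy similarity: 1.0 if the ID prefixes (genres) match, else 0.1."""
    return 1.0 if a.split("_")[0] == b.split("_")[0] else 0.1


relevance = {"action_1": 0.95, "action_2": 0.94, "action_3": 0.93,
             "comedy_1": 0.80, "drama_1": 0.75}
diverse = mmr(relevance, relevance, same_genre)
catalogue = list(relevance) + ["docu_1", "kids_1", "horror_1"]
final = inject_exploration(diverse, catalogue, n_explore=1, seed=0)
```

Plain top-3 by score would return three action films; here the second and third action titles lose out to the comedy and the drama because they are redundant with the first pick.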