A complete look at my project work across machine learning, NLP, data infrastructure, and analytics.
Customers weren't being treated differently based on how they actually behaved, so I built a segmentation system that changed that. Using RFM clustering on large-scale transaction logs, I identified a previously undetected 76% drop-off in a high-value segment and built predictive valuation models that sized the addressable opportunity at $5.5M.
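A minimal sketch of the RFM step, for readers who want to see the shape of it. The sample transactions, the cutoffs, and the segment names are all illustrative assumptions, and simple rule-based thresholds stand in here for the clustering used in the actual project.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, order_date, amount) rows into per-customer RFM values."""
    stats = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer_id, order_date, amount in transactions:
        s = stats[customer_id]
        s["last"] = order_date if s["last"] is None else max(s["last"], order_date)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        cid: {
            "recency_days": (as_of - s["last"]).days,  # days since last purchase
            "frequency": s["frequency"],               # number of orders
            "monetary": round(s["monetary"], 2),       # total spend
        }
        for cid, s in stats.items()
    }


def segment(rfm, recency_cutoff=90, monetary_cutoff=500.0):
    """Assign a coarse segment label per customer (cutoffs are hypothetical)."""
    labels = {}
    for cid, r in rfm.items():
        high_value = r["monetary"] >= monetary_cutoff
        recent = r["recency_days"] <= recency_cutoff
        if high_value and recent:
            labels[cid] = "high_value_active"
        elif high_value:
            labels[cid] = "high_value_lapsing"
        elif recent:
            labels[cid] = "low_value_active"
        else:
            labels[cid] = "low_value_lapsing"
    return labels


# Made-up sample data, only to exercise the functions.
transactions = [
    ("c1", date(2024, 1, 5), 300.0),
    ("c1", date(2024, 3, 20), 400.0),
    ("c2", date(2023, 9, 1), 900.0),
    ("c3", date(2024, 3, 1), 40.0),
]
rfm = rfm_table(transactions, as_of=date(2024, 4, 1))
labels = segment(rfm)
```

A "high_value_lapsing" label is the kind of signal that points to a drop-off: customers who spent heavily but have stopped coming back.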
Built a post-campaign analysis framework to understand why some promotions worked and others didn't. Used Apriori-based association rule mining to surface cross-department purchase patterns, confirming an 11–32% performance range across campaigns and a 26% feature attachment rate that informed the next round of targeting strategy.
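To make the association-rule idea concrete, here is a small sketch limited to item pairs. It applies the Apriori pruning step (only items that are frequent on their own can form frequent pairs) and reports support, confidence, and lift. The department names, baskets, and `min_support` value are invented for illustration, and a full Apriori run would continue to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return pairwise association rules with support, confidence, and lift."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))

    # Apriori pruning: drop items that are not frequent on their own.
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = {
                "support": support,
                "confidence": confidence,
                "lift": lift,
            }
    return rules


# Made-up baskets, one list of departments per transaction.
baskets = [
    ["bakery", "dairy"],
    ["bakery", "dairy", "produce"],
    ["produce", "household"],
    ["bakery", "dairy"],
    ["dairy", "produce"],
]
rules = pair_rules(baskets)
```

A lift above 1 (here, bakery and dairy) suggests the two departments are bought together more often than chance would predict, which is the kind of pattern that can feed a targeting decision.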
Built a real-time content monitoring pipeline to flag unsafe YouTube content at scale. The system ingests network traffic in under 250 ms, parses content signals with custom Python ETL scripts, and routes flagged records into an Elasticsearch index for fast downstream querying. Think of it as a safety filter that never sleeps.
A retail chain running 14 branches had no single view of what was happening across the business. I designed an executive analytics layer in AWS QuickSight, built on reusable SQL KPI models, that surfaced live AOV, cancellation rates, and fulfillment ratios in one place. Leadership could finally see the whole picture.
When the same person shows up as five different records, your analytics lie to you. I built an automated entity linkage system using custom blocking schemas and string distance metrics to collapse ~9M records into clean, verified identity clusters, achieving 98% accuracy while keeping compute costs manageable.
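The blocking-plus-string-similarity idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the record fields, the blocking key (surname initial plus postcode), the 0.9 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure. The production system's schemas and metrics are not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking schema: surname initial plus postcode."""
    return (record["surname"][:1].lower(), record["postcode"])


def similarity(a, b):
    """Similarity in [0, 1] between two records' full names."""
    name_a = f'{a["first_name"]} {a["surname"]}'.lower()
    name_b = f'{b["first_name"]} {b["surname"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def link_records(records, threshold=0.9):
    """Group record indices into identity clusters using union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Blocking: only records sharing a key are ever compared,
    # which avoids the full pairwise comparison across all records.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Made-up records, only to exercise the functions.
records = [
    {"first_name": "Jon", "surname": "Smith", "postcode": "10001"},
    {"first_name": "John", "surname": "Smith", "postcode": "10001"},
    {"first_name": "Jane", "surname": "Smith", "postcode": "10001"},
    {"first_name": "John", "surname": "Smith", "postcode": "94105"},
    {"first_name": "Maria", "surname": "Garcia", "postcode": "10001"},
]
clusters = link_records(records)
```

Blocking is what keeps compute manageable at scale: comparisons happen only within each block, so the cost grows with block sizes instead of with the square of the total record count. The trade-off is that true matches falling into different blocks are never compared.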
Seasonal spikes in retail are predictable — until they're not. I engineered an end-to-end forecasting pipeline using LightGBM and structural time-series models to predict daily product demand across regional distribution hubs. The model integrated historical transaction data with live promotional signals to stay accurate even during demand outliers.
Designed a real-time personalization framework that scores a customer's likelihood to buy — and what to recommend next — at the moment of checkout. The system uses collaborative filtering on historical basket data and live clickstream signals to surface relevant cross-sells. Deployed at checkout where it matters most.
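As a rough illustration of the collaborative-filtering piece, the sketch below builds item-to-item cosine similarities from historical baskets and ranks cross-sell candidates for the current cart. The product names and baskets are made up, and the live clickstream signals and purchase-likelihood scoring from the real system are left out.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(baskets):
    """Cosine similarity between items, treating each basket as a binary vector."""
    item_counts = Counter()
    co_counts = defaultdict(Counter)
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        a: {
            b: count / sqrt(item_counts[a] * item_counts[b])
            for b, count in neighbours.items()
        }
        for a, neighbours in co_counts.items()
    }


def recommend(cart, sims, top_n=2):
    """Rank items not already in the cart by summed similarity to cart items."""
    scores = Counter()
    for item in cart:
        for other, score in sims.get(item, {}).items():
            if other not in cart:
                scores[other] += score
    # Sort by score descending, then by name for a stable order on ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up historical baskets.
baskets = [
    ["pasta", "sauce"],
    ["pasta", "sauce", "parmesan"],
    ["pasta", "sauce"],
    ["bread", "butter"],
    ["pasta", "bread"],
]
sims = item_similarity(baskets)
suggestions = recommend(["pasta"], sims)
```

Precomputing the similarity table offline and doing only the lookup and ranking at request time is one common way to keep this kind of scoring fast enough for checkout.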
An interactive browser tool that reviews code snippets using the Claude API. It scores readability, structure, and maintainability, then surfaces three specific improvements, one positive note, and a critical flag if a real bug is detected. Built as part of the Careem Forward Deployment Engineer AI challenge.