Dual-AI system automating product categorization for sellers and personalizing recommendations for shoppers
A full-stack fashion e-commerce platform that uses machine learning to solve two marketplace problems: eliminating manual product categorization for sellers and helping buyers discover relevant products. Built with Django and powered by two ML systems, the platform automates the tedious work of product classification and recommends items based on the attributes of the product a shopper is viewing.
The Challenge: Traditional e-commerce platforms force sellers to manually categorize products through complex dropdown menus, leading to inconsistent classifications and wasted time. Meanwhile, buyers struggle to discover relevant products in vast catalogs without personalized guidance.
The Solution: A dual-AI system that automatically categorizes products from simple text descriptions and recommends items using content-based similarity, creating efficiency for sellers and engagement for buyers.
Pain Point
Fashion sellers waste 5-10 minutes per product navigating multi-level category hierarchies (Gender → Subcategory → Article Type). With hundreds of products to list, this becomes a significant operational burden. Worse, manual categorization leads to inconsistent classifications—the same "Men's Casual Shirt" might be tagged as "Topwear," "Casual," or "Shirts" by different sellers, breaking search functionality and user experience.
Existing Solutions
Most platforms offer dropdown menus or require sellers to manually select from predefined categories. This approach doesn't scale, creates data quality issues, and increases seller onboarding friction.
A three-tier Naive Bayes classification system that automatically categorizes products from natural language descriptions. Sellers simply enter a product name like "Blue Denim Jeans for Men," and the system instantly classifies it across three levels: Gender (Men) → Subcategory (Bottomwear) → Article Type (Jeans).
Key Innovation: Hierarchical cascade model with specialized classifiers for each category branch. Instead of one monolithic classifier, the system uses 8 specialized models that activate based on previous tier predictions—achieving higher accuracy by learning gender-specific and category-specific patterns.
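The cascade described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the project's code: `TinyNaiveBayes` is a hand-rolled stand-in for scikit-learn's Naive Bayes classifiers, and the training rows, the `train_cascade` and `categorize` helpers, and the category names are all hypothetical. The toy taxonomy yields seven models rather than the eight used in the platform.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.label_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / total_docs)
            return prior + sum(math.log((counts[w] + 1) / denom) for w in words)

        return max(self.label_counts, key=log_score)


# Hypothetical training rows: (description, gender, subcategory, article type).
ROWS = [
    ("blue denim jeans for men", "Men", "Bottomwear", "Jeans"),
    ("black formal trousers for men", "Men", "Bottomwear", "Trousers"),
    ("white cotton shirt for men", "Men", "Topwear", "Shirts"),
    ("grey round neck tshirt for men", "Men", "Topwear", "Tshirts"),
    ("skinny fit jeans for women", "Women", "Bottomwear", "Jeans"),
    ("black ankle leggings for women", "Women", "Bottomwear", "Leggings"),
    ("embroidered cotton kurti for women", "Women", "Topwear", "Kurtis"),
    ("floral print top for women", "Women", "Topwear", "Tops"),
]


def train_cascade(rows):
    """Train a gender model, then one model per gender and per (gender, subcategory)."""
    texts = [r[0] for r in rows]
    cascade = {"gender": TinyNaiveBayes().fit(texts, [r[1] for r in rows])}
    for gender in {r[1] for r in rows}:
        branch = [r for r in rows if r[1] == gender]
        cascade[gender] = TinyNaiveBayes().fit(
            [r[0] for r in branch], [r[2] for r in branch]
        )
        for sub in {r[2] for r in branch}:
            leaf = [r for r in branch if r[2] == sub]
            cascade[(gender, sub)] = TinyNaiveBayes().fit(
                [r[0] for r in leaf], [r[3] for r in leaf]
            )
    return cascade


def categorize(cascade, text):
    """Each tier's prediction selects which specialized model runs next."""
    gender = cascade["gender"].predict(text)
    sub = cascade[gender].predict(text)
    article = cascade[(gender, sub)].predict(text)
    return gender, sub, article


cascade = train_cascade(ROWS)
print(categorize(cascade, "Blue Denim Jeans for Men"))
# ('Men', 'Bottomwear', 'Jeans')
```

Routing through the predicted branch means each leaf model only has to separate the few article types under its parent. The trade-off is that a wrong gender or subcategory prediction cannot be corrected further down the cascade.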
Impact
Pain Point
Online fashion shoppers face decision paralysis when browsing catalogs with thousands of items. Without personalized recommendations, users waste time searching and often miss products they'd love. Generic "popular items" suggestions don't account for individual style preferences, leading to lower engagement and abandoned carts.
Existing Solutions
Basic e-commerce platforms show "trending" or "recently viewed" items without understanding user preferences. Advanced platforms use collaborative filtering but require massive user bases to be effective—impractical for growing marketplaces.
A cosine similarity algorithm that analyzes product attributes (title, description, gender category) to recommend textually and contextually similar items. When a user views a "Women's Floral Summer Dress," the system instantly surfaces similar dresses, complementary accessories, and style-matched items, without needing historical purchase data from thousands of users.
Key Innovation: Feature engineering that combines textual and categorical data into a unified similarity space. By vectorizing product text and computing cosine similarity scores, the system ranks items by how much descriptive vocabulary they share (e.g., a "casual cotton shirt" scores close to a "relaxed fit cotton tee"), surfacing relationships that exact keyword matching on titles would miss.
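For reference, cosine similarity between two product vectors is the standard formula below. The symbols $\mathbf{a}$ and $\mathbf{b}$ (term-count vectors for two products) are notation introduced here for illustration, not taken from the project.

```latex
\operatorname{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}} \, \sqrt{\sum_{i} b_i^{2}}}
```

As a worked example under a plain word-count assumption, "casual cotton shirt men" and "casual cotton tee men" over the vocabulary (casual, cotton, shirt, tee, men) give $\mathbf{a} = (1, 1, 1, 0, 1)$ and $\mathbf{b} = (1, 1, 0, 1, 1)$. The dot product is 3 and each norm is $\sqrt{4} = 2$, so the similarity is $3 / (2 \cdot 2) = 0.75$.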
Impact
Architecture Decisions:
Hierarchical ML Pipeline: Chose cascading classifiers over a single multi-class model to leverage specialized training data for each category branch, improving accuracy by 15-20% for niche article types.
Content-Based Filtering: Selected over collaborative filtering to avoid cold-start problems and enable immediate recommendations for new products without requiring user interaction history.
Django Monolith: Opted for an integrated Django application over microservices to reduce deployment complexity while maintaining clear module separation (accounts, cart, seller_accounts, main).
Joblib Model Persistence: Serialized trained models as .sav files for fast loading (50-100ms) versus retraining on each request, enabling real-time inference.
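The load-once pattern behind that last decision might look like the sketch below. It uses the standard library's `pickle` as a stand-in so the example runs without extra packages; with Joblib, `joblib.load(path)` would replace the `open`/`pickle.load` pair. The `load_model` helper and the `gender_classifier.sav` filename are hypothetical.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_model(path):
    """Deserialize a model once per process; later calls reuse the cached object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demo: persist a stand-in "model" (a plain dict) and load it twice.
model_path = str(Path(tempfile.mkdtemp()) / "gender_classifier.sav")
with open(model_path, "wb") as handle:
    pickle.dump({"labels": ["Men", "Women"]}, handle)

first = load_model(model_path)
second = load_model(model_path)
print(first is second)  # True: the second call is served from the cache
```

Caching by path keeps the disk read and deserialization cost out of the request path after the first call, which is the point of persisting trained models instead of retraining.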
🤖 Automated Product Categorization - Three-tier ML classification (Gender → Subcategory → Article Type) from natural language descriptions
🎯 Personalized Recommendations - Content-based filtering suggests 5-10 similar products on every product page
🛒 Dual User Roles - Separate interfaces for buyers (shopping, cart, orders) and sellers (product management, order fulfillment)
📊 Vendor Dashboard - Sellers manage inventory, view orders, and update payment statuses in a centralized portal
⚡ Real-Time Processing - ML inference executes in <100ms for instant categorization and recommendations
🐳 Docker Deployment - Containerized application with nginx reverse proxy for production-ready deployment
1. Hierarchical Classification Accuracy - Balancing model complexity across three classification tiers
Problem: The initial single-model approach achieved only 65% accuracy for article type classification because it tried to learn 60+ categories simultaneously. Fashion items have nuanced differences (e.g., "Kurta" vs "Kurti" vs "Tunic") that require specialized context.
Solution: Designed a hierarchical cascade with 8 specialized models:
Each specialized model learns patterns specific to its domain (e.g., men's bottom classifier focuses on distinguishing jeans, trousers, shorts, track pants).
Impact: Article-type classification accuracy improved from 65% to 85-90%. Misclassification of similar items fell by 40% compared to the single-model baseline.
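Accuracy figures for a cascade are easiest to interpret when reported per tier alongside an exact-match score. The helper below is a hypothetical sketch of such a report; the label tuples are toy data made up for the example and are not the project's results.

```python
def tier_accuracy(true_labels, predicted_labels):
    """Return per-tier accuracy plus exact-match accuracy across all tiers."""
    total = len(true_labels)
    tiers = ("gender", "subcategory", "article_type")
    pairs = list(zip(true_labels, predicted_labels))
    report = {}
    for index, tier in enumerate(tiers):
        hits = sum(t[index] == p[index] for t, p in pairs)
        report[tier] = hits / total
    # A product counts as an exact match only if all three tiers are right.
    report["exact_match"] = sum(t == p for t, p in pairs) / total
    return report


# Toy labels: (gender, subcategory, article type) for four products.
TRUE = [
    ("Men", "Bottomwear", "Jeans"),
    ("Men", "Topwear", "Shirts"),
    ("Women", "Topwear", "Kurtis"),
    ("Women", "Topwear", "Tops"),
]
PREDICTED = [
    ("Men", "Bottomwear", "Jeans"),
    ("Men", "Topwear", "Tshirts"),          # wrong article type only
    ("Women", "Topwear", "Kurtis"),
    ("Women", "Bottomwear", "Leggings"),    # wrong subcategory, so wrong leaf too
]

print(tier_accuracy(TRUE, PREDICTED))
# {'gender': 1.0, 'subcategory': 0.75, 'article_type': 0.5, 'exact_match': 0.5}
```

The last toy row shows how an upstream error propagates: once the subcategory is wrong, the article type is chosen by the wrong specialized model.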
2. Cold Start Recommendation Problem - Generating relevant suggestions without user history
Problem: Traditional collaborative filtering requires extensive user interaction data (views, purchases, ratings) to generate recommendations. New products and new users create "cold start" scenarios where no historical data exists, resulting in generic or no recommendations.
Solution: Implemented content-based filtering using cosine similarity on product features:
This approach works immediately for any product because it analyzes inherent product attributes rather than user behavior patterns.
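A minimal sketch of this idea, using only the standard library: each product's title, description, and gender are merged into one bag-of-words vector, and candidates are ranked by cosine similarity to the viewed product. The `CATALOG` entries and the `to_vector`, `cosine`, and `recommend` helpers are hypothetical; the project is described as using scikit-learn, and raw word counts here stand in for whatever vectorizer it applies.

```python
import math
from collections import Counter

# Hypothetical catalog; field names mirror the attributes named above.
CATALOG = [
    {"id": 1, "title": "Floral Summer Dress",
     "description": "light cotton floral print dress", "gender": "Women"},
    {"id": 2, "title": "Floral Maxi Dress",
     "description": "long floral print dress for summer", "gender": "Women"},
    {"id": 3, "title": "Denim Jacket",
     "description": "classic blue denim jacket", "gender": "Women"},
    {"id": 4, "title": "Slim Fit Jeans",
     "description": "blue denim jeans", "gender": "Men"},
]


def to_vector(product):
    """Merge title, description and gender into one bag-of-words count vector."""
    text = f"{product['title']} {product['description']} {product['gender']}"
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(product_id, catalog, top_n=5):
    """Return ids of the top_n products most similar to product_id."""
    vectors = {p["id"]: to_vector(p) for p in catalog}
    target = vectors[product_id]
    scored = [
        (cosine(target, vec), pid)
        for pid, vec in vectors.items()
        if pid != product_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:top_n]]


print(recommend(1, CATALOG, top_n=2))
# [2, 3]
```

Because the ranking depends only on product text, a newly listed item can be scored against the catalog as soon as it is added, with no interaction history required.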
Impact:
3. Real-Time ML Inference Performance - Serving predictions without latency
Problem: Running ML inference on every product view and categorization request could introduce 500ms+ latency if models were loaded from disk or retrained on each request. For e-commerce, page load times >200ms significantly impact conversion rates.
Solution:
Optimization techniques:
Impact:
4. Multi-Database Support & Deployment Flexibility - Supporting SQLite, PostgreSQL, and MSSQL
Problem: Development teams need lightweight databases (SQLite) for local testing, while production deployments require enterprise databases (PostgreSQL, MSSQL) for scalability and reliability. Hardcoding database configuration creates friction between environments.
Solution:
Configuration strategy:
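One way to keep the database choice out of the code is sketched below, under stated assumptions: a single hypothetical `DB_BACKEND` value selects the engine, and the remaining `DB_*` names are likewise illustrative. The `mssql` engine path assumes the third-party mssql-django backend is installed. In a real settings module the values could come from python-decouple's `config()` rather than a plain mapping.

```python
# Engine paths: the first two ship with Django; "mssql" assumes the
# third-party mssql-django backend (an assumption, not confirmed by the project).
ENGINES = {
    "sqlite": "django.db.backends.sqlite3",
    "postgres": "django.db.backends.postgresql",
    "mssql": "mssql",
}


def build_databases(env):
    """Build Django's DATABASES setting from a mapping of environment values."""
    backend = env.get("DB_BACKEND", "sqlite")
    if backend not in ENGINES:
        raise ValueError(f"Unsupported DB_BACKEND: {backend!r}")
    if backend == "sqlite":
        # SQLite needs only a file name, which suits local development.
        return {
            "default": {
                "ENGINE": ENGINES[backend],
                "NAME": env.get("DB_NAME", "db.sqlite3"),
            }
        }
    return {
        "default": {
            "ENGINE": ENGINES[backend],
            "NAME": env["DB_NAME"],
            "USER": env["DB_USER"],
            "PASSWORD": env["DB_PASSWORD"],
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", ""),
        }
    }


# In settings.py this might read: DATABASES = build_databases(os.environ)
print(build_databases({})["default"]["ENGINE"])
# django.db.backends.sqlite3
```

Defaulting to SQLite when nothing is configured lets a fresh checkout run locally, while production sets the `DB_*` values in its environment.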
Impact:
| Category | Technologies |
|---|---|
| Backend | Django 3.2, Python 3.9, Gunicorn |
| Frontend | Django Templates, JavaScript, Cartzilla Theme (Bootstrap-based) |
| Database | PostgreSQL, MSSQL (Azure SQL), SQLite (development) |
| ML / AI | Scikit-learn, Pandas, NumPy, Joblib (Naive Bayes, Cosine Similarity) |
| DevOps / Tools | Docker, Docker Compose, Nginx, WhiteNoise (static files), python-decouple |
For Sellers:
For Buyers:
Technical Achievement:
ML in Production: Successfully deployed dual ML systems (classification + recommendation) in a real-time web application, balancing accuracy with inference speed through model optimization and caching strategies.
Hierarchical Classification: Specialized models for category branches outperform monolithic classifiers for complex taxonomies. Training separate models for men's and women's subcategories raised article-type accuracy from 65% to 85-90%, a gain of roughly 20 percentage points.
Content-Based Filtering: Effective recommendation strategy for marketplaces with limited user data—cosine similarity on product features provides immediate value without requiring collaborative filtering's data volume.
Full-Stack Integration: Bridged ML engineering and web development by integrating Scikit-learn models into a Django application with Joblib serialization, demonstrating end-to-end product development capability.