Fang Sun, Paul Zhang, Pranav Subbaraman and Yizhou Sun
UCLA Computer Science Dept.
| Summary | RevMax is a comprehensive recommendation system assignment in which students compete to design algorithms that maximize revenue over multiple rounds of user-item interaction. Using the Sim4Rec simulation framework, students implement content-based, sequence-based, and graph-based recommenders in a simulated, production-like environment where models continuously learn from user feedback. |
| Topics | Machine Learning, Recommender Systems, Content-Based Filtering, Collaborative Filtering, Sequential Pattern Mining, Graph Neural Networks, K-Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, RNNs/LSTMs, Transformers, Graph Convolutional Networks, Link Prediction, Feature Engineering, Online Learning |
| Audience | Advanced undergraduate or graduate students in Data Mining, Machine Learning, or AI courses. Prerequisites include programming proficiency, basic machine learning concepts, and familiarity with Python/PySpark. |
| Difficulty | Medium to high. Students need 2-3 weeks per checkpoint (6-9 weeks for the full assignment). The checkpoints call for increasingly sophisticated algorithms, progressing from basic content-based methods to advanced graph neural networks. |
| Strengths |
• Real-world relevance: Models the revenue optimization performed by industry recommendation systems
• Comprehensive coverage: Integrates multiple data mining techniques in one cohesive project
• Interactive learning: The multi-iteration environment provides immediate feedback
• Competitive element: A leaderboard motivates students to improve their algorithms
• Scalable difficulty: Three checkpoints allow progressive skill building
• Hands-on experience with production ML concepts (train-test splits, online learning, hyperparameter tuning) |
| Weaknesses |
• Computational requirements: Needs Java 17 and sufficient memory for Spark processing
• Setup complexity: Multiple dependencies and frameworks to install
• Time intensive: The full assignment requires a significant time investment
• Limited to synthetic data: Real-world recommendation challenges may differ
• Focus on revenue may oversimplify real recommendation system objectives |
| Dependencies |
Prerequisites: Python programming, basic machine learning, linear algebra, probability/statistics
Software: Python 3.8+, Java 17 (OpenJDK), Apache Spark, uv package manager
Hardware: 8 GB+ RAM recommended, multi-core processor for Spark
Libraries: PySpark, NumPy, Pandas, Scikit-learn, Matplotlib, Sim4Rec framework |
| Variants | Instructors can customize the assignment by: (1) Adjusting the number of users/items and feature dimensions to control complexity, (2) Modifying evaluation metrics to emphasize different objectives (e.g., diversity, fairness), (3) Adding constraints like computational budgets or cold-start scenarios, (4) Incorporating real datasets from MovieLens or Amazon reviews, (5) Extending to multi-stakeholder scenarios with advertiser budgets, (6) Adding explainability requirements for recommendations. The modular design allows focusing on specific techniques or expanding to semester-long projects. |
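To make the revenue objective concrete, here is a minimal sketch of one way a recommender could rank items by expected revenue rather than by purchase likelihood alone. It is not the Sim4Rec API or the assignment's reference solution. Everything in it is a hypothetical illustration: the function names, the logistic model over element-wise user-item feature products, the weights, and the toy prices. It also omits the multi-iteration retraining loop the assignment uses.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def purchase_probability(
    user: Sequence[float],
    item: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical logistic-regression model whose input features are the
    element-wise product of the user and item feature vectors."""
    z = bias + sum(w * u * i for w, u, i in zip(weights, user, item))
    return sigmoid(z)


def recommend_by_expected_revenue(
    user: Sequence[float],
    items: Dict[str, List[float]],
    prices: Dict[str, float],
    weights: Sequence[float],
    bias: float,
    k: int,
) -> List[str]:
    """Return the k item ids with the highest expected revenue,
    defined here as purchase probability times price."""
    expected_revenue = {
        item_id: purchase_probability(user, features, weights, bias) * prices[item_id]
        for item_id, features in items.items()
    }
    # Sort item ids by expected revenue, highest first, and keep the top k.
    ranked = sorted(expected_revenue, key=expected_revenue.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data (hypothetical): two binary features per user and item.
    user = [1.0, 0.0]
    items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    prices = {"a": 10.0, "b": 50.0, "c": 5.0}
    print(recommend_by_expected_revenue(user, items, prices, weights=[2.0, 2.0], bias=-1.0, k=2))
    # ['b', 'a']
```

In this toy example, item `b` has the lowest purchase probability (about 0.27 versus 0.73 for `a` and `c`). Because its price is much higher, it still ranks first, whereas ranking by probability alone would place it last. This trade-off between likelihood and value is what distinguishes a revenue objective from a pure click or purchase objective.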
Students should begin by: