Implementing a Recommender System using MapReduce
Summary |
This assignment combines
two prominent machine learning / big data technologies: MapReduce and recommender systems. Nearly 20 years ago, as Google scaled, it
created several new technologies.
Foundational amongst those technologies are the resilient Google File
System (GFS) and a computing paradigm known as MapReduce. Google published their work in a few highly
influential papers. Researchers,
inspired by these descriptions, created open source versions, which
eventually became the big data platform now known as Hadoop. Yelp created mrjob, a Python-based library for writing
MapReduce programs. Based
on user preferences, recommender systems produce an ordered set of
recommendations. In this series of
exercises, students gain hands-on experience with how user-based, item-based,
and content-based recommender systems work.
Modeling the core computation in a spreadsheet helps convey the
essence of these algorithms. With this
background, students then express the recommender algorithm in the MapReduce
paradigm using mrjob. Experiments are
done using the MovieLens data set (https://grouplens.org/datasets/movielens/).
The assignment also forms the context for discussing the Netflix Prize. |
Topics |
MapReduce, Recommender
Systems, Big Data machine learning |
Audience |
Advanced students of AI;
could also be used in a CS2 class as an extended assignment |
Difficulty |
Medium |
Strengths |
Explores a prominent AI
application (recommendation) |
Weaknesses |
Requires sufficient
time in the course schedule to discuss the MapReduce paradigm |
Dependencies |
|
Variants |
Recommendation can be done
without using MapReduce |
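
To make the idea in the Summary concrete, here is a minimal sketch of how an item-based recommendation might be expressed as map and reduce steps. It is not the assignment's reference solution. It simulates the map, shuffle, and reduce phases in plain Python so it runs without Hadoop or mrjob; the toy ratings, the helper `run_mapreduce`, and the co-occurrence scoring rule are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy ratings (user, movie, rating); not taken from MovieLens.
RATINGS = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 3),
    ("u3", "B", 2), ("u3", "C", 5),
]

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce step in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Step 1: collect each user's (movie, rating) pairs under the user key.
def map_by_user(record):
    user, movie, rating = record
    yield user, (movie, rating)

def reduce_user(user, movie_ratings):
    yield user, sorted(movie_ratings)

# Step 2: emit every pair of movies rated by the same user, then count pairs.
def map_pairs(record):
    _user, movie_ratings = record
    for (m1, _), (m2, _) in combinations(movie_ratings, 2):
        yield (m1, m2), 1

def reduce_count(pair, ones):
    yield pair, sum(ones)

def cooccurrence(ratings):
    """Chain the two steps and return {(movie1, movie2): shared_user_count}."""
    per_user = run_mapreduce(ratings, map_by_user, reduce_user)
    return dict(run_mapreduce(per_user, map_pairs, reduce_count))

def recommend(ratings, user):
    """Rank unseen movies by how often they co-occur with the user's movies."""
    seen = {m for u, m, _ in ratings if u == user}
    scores = defaultdict(int)
    for (m1, m2), n in cooccurrence(ratings).items():
        if m1 in seen and m2 not in seen:
            scores[m2] += n
        elif m2 in seen and m1 not in seen:
            scores[m1] += n
    return sorted(scores, key=lambda m: (-scores[m], m))

if __name__ == "__main__":
    print(cooccurrence(RATINGS))
    print(recommend(RATINGS, "u1"))
```

In an mrjob version, each mapper/reducer pair would presumably become one step of an `MRJob` subclass, with the shuffle handled by the framework rather than by `run_mapreduce`. Counting co-occurrences ignores the rating values; a similarity measure such as cosine or Pearson correlation could be substituted in the second reduce step.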