Implementing a Recommender System Using MapReduce

Summary

This assignment combines two prominent machine learning and big data technologies: MapReduce and recommendation. Nearly 20 years ago, as Google scaled, it created several new technologies. Foundational among them are the resilient Google File System (GFS) and a computing paradigm known as MapReduce. Google described this work in a few highly influential papers. Inspired by these descriptions, researchers created open-source implementations, which eventually became the big data platform now known as Hadoop. Yelp later created mrjob, a Python library for writing MapReduce programs.

Based on user preferences, recommender systems produce an ordered set of recommendations. In this series of exercises, students gain hands-on experience with how user-based, item-based, and content-based recommender systems work. Modeling the core computation in a spreadsheet helps convey the essence of these algorithms. With this background, students then express the recommender algorithm in the MapReduce paradigm using mrjob. Experiments use the MovieLens dataset (https://grouplens.org/datasets/movielens/). The assignment also provides the context for discussing the Netflix Prize.
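
To make the MapReduce formulation concrete, here is a minimal sketch of an item-based similarity computation written as two map/reduce passes. It runs in plain Python rather than mrjob or Hadoop: the helper `run_mapreduce` is hypothetical and only simulates the map, shuffle, and reduce phases in memory. The comma-separated `userId,movieId,rating` input layout is an assumption modeled on the MovieLens CSV files, the ratings are made-up toy values, and the similarity measure (cosine over co-rating users) is one common choice, not necessarily the one the assignment prescribes.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce pass: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values that share a key
    output = []
    for key in sorted(groups):  # mimic the sort phase: keys reach the reducer in order
        output.extend(reducer(key, groups[key]))
    return output


# Pass 1: regroup raw rating lines by user.
def map_rating(line):
    # Assumed input: "userId,movieId,rating[,timestamp]" with any header line removed.
    user, item, rating = line.split(",")[:3]
    yield user, (item, float(rating))


def reduce_user(user, item_ratings):
    yield user, item_ratings


# Pass 2: emit every pair of items one user rated, then score each pair.
def map_item_pairs(user_record):
    _user, item_ratings = user_record
    for (i, r_i), (j, r_j) in combinations(sorted(item_ratings), 2):
        yield (i, j), (r_i, r_j)


def reduce_similarity(pair, rating_pairs):
    # Cosine similarity computed only over the users who rated both items.
    dot = sum(r_i * r_j for r_i, r_j in rating_pairs)
    norm_i = sqrt(sum(r_i * r_i for r_i, _ in rating_pairs))
    norm_j = sqrt(sum(r_j * r_j for _, r_j in rating_pairs))
    sim = dot / (norm_i * norm_j) if norm_i and norm_j else 0.0
    yield pair, (sim, len(rating_pairs))  # keep the support count alongside the score


ratings = [
    "1,10,4.0", "1,20,5.0",
    "2,10,3.0", "2,20,3.0", "2,30,4.0",
    "3,20,4.0", "3,30,2.0",
]
by_user = run_mapreduce(ratings, map_rating, reduce_user)
similarities = run_mapreduce(by_user, map_item_pairs, reduce_similarity)
for (i, j), (sim, support) in similarities:
    print(f"items {i} and {j}: cosine {sim:.3f} from {support} co-rating user(s)")
```

Items 10 and 30 receive a perfect score from a single co-rating user, which is why each pair also carries its support count; a practical job would typically drop pairs below some minimum support. In mrjob, each mapper/reducer pair would roughly correspond to one `MRStep(mapper=..., reducer=...)` returned from an `MRJob` subclass's `steps()` method, with signatures adapted to mrjob's `(key, value)` convention and the framework performing the shuffle that `run_mapreduce` simulates here.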

Topics

MapReduce, recommender systems, big data machine learning

Audience

Advanced students of AI; could also be used in a CS2 class as an extended assignment

Difficulty

Medium

Strengths

Explores a prominent AI application (recommendation)

Weaknesses

Requires sufficient time in the course schedule to discuss the MapReduce paradigm

Dependencies

Variants

Recommendation can be done without using MapReduce
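
As a point of comparison for this variant, a user-based recommender over a small dataset fits comfortably in memory without MapReduce. The sketch below is one possible version, assuming cosine similarity between users (unrated items treated as 0) and a similarity-weighted average of the `k` most similar users' ratings; the function names, the parameters `k` and `n`, and the toy ratings are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two users' rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, target, k=2, n=3):
    """Rank items the target has not rated by a similarity-weighted average
    of the ratings from the k users most similar to the target."""
    others = [(cosine(ratings[target], ratings[u]), u) for u in ratings if u != target]
    neighbors = sorted(others, reverse=True)[:k]
    weighted_sum, weight_total = {}, {}
    for sim, user in neighbors:
        if sim <= 0:
            continue  # skip neighbors with no positive similarity
        for item, rating in ratings[user].items():
            if item in ratings[target]:
                continue  # only recommend items the target has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


ratings = {
    "alice": {"Heat": 5.0, "Alien": 4.0, "Up": 1.0},
    "bob": {"Heat": 4.0, "Alien": 5.0, "Ronin": 4.0, "Up": 2.0},
    "carol": {"Heat": 1.0, "Up": 5.0, "Cars": 5.0},
    "dave": {"Alien": 4.0, "Ronin": 5.0, "Cars": 1.0},
}
top = recommend(ratings, "alice", k=2)
for item, score in top:
    print(f"{item}: predicted rating {score:.2f}")
```

Here alice's two nearest neighbors are bob and dave, so Ronin and Cars are the only unseen candidates. Mean-centering each user's ratings (a Pearson-style similarity) is a common refinement that corrects for users who rate systematically high or low. Computing similarities for all pairs of users grows quadratically with the number of users, which is what makes that step a natural candidate for MapReduce.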

assignment-sketch.docx

mapreduce

recosys