Lisa Zhang and Pouria Fewzee
In this assignment, students combine the techniques they learned throughout a deep learning course to build a denoising autoencoder for news headlines. Students then use this denoising autoencoder to retrieve similar headlines and to interpolate between headlines. The assignment combines students' understanding of autoencoders, language modelling via recurrent neural networks (RNNs), data augmentation, and working with embeddings.
The cumulative nature of this assignment makes it a good choice for a final assignment in an introductory Artificial Intelligence, Machine Learning, or Deep Learning course that covers the pre-requisite topics.
For a traditional AI course, it is possible to modify the assignment to use various other Information Retrieval techniques on the provided dataset.
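As a rough illustration of the denoising setup described above, the sketch below builds a (noisy input, clean target) training pair by randomly dropping tokens from a headline. The `add_noise` helper, the token-dropping scheme, and the example headline are assumptions made for illustration; the handout's own noise function may differ.

```python
import random


def add_noise(tokens, drop_prob=0.2, rng=None):
    """Return a copy of `tokens` with each token dropped with probability `drop_prob`.

    Hypothetical helper: the noise scheme used in the handout may differ.
    """
    if not tokens:
        return []
    if rng is None:
        rng = random.Random()
    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    # Keep at least one token so the encoder never sees an empty sequence.
    return kept if kept else [rng.choice(tokens)]


# Made-up headline, split into word tokens.
headline = "stocks rally as central bank holds rates".split()
noisy = add_noise(headline, drop_prob=0.3, rng=random.Random(0))

# A denoising autoencoder is trained to reconstruct the clean headline
# from the corrupted one.
training_pair = (noisy, headline)
```

Because the target is always the clean headline, the model cannot succeed by copying its input and has to learn something about which tokens belong together.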
Students combine their understanding of autoencoders, language modelling, data augmentation, and embeddings to build a denoising recurrent neural network autoencoder for news headlines. Students use this model to retrieve similar headlines and to interpolate between headlines.
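The retrieval and interpolation steps mentioned above both operate on headline embeddings. The sketch below shows only the vector arithmetic, on made-up 2-D embeddings; in the assignment the vectors would come from the trained encoder, and interpolated vectors would be passed to the decoder to produce text. Cosine similarity as the ranking measure and the function names are assumptions, not taken from the handout.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=1):
    """Indices of the k embeddings most similar to `query`, best first."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_similarity(query, embeddings[i]),
        reverse=True,
    )
    return ranked[:k]


def interpolate(u, v, steps):
    """`steps` evenly spaced points from u to v, endpoints included (steps >= 2)."""
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        points.append([(1 - t) * a + t * b for a, b in zip(u, v)])
    return points


# Toy embeddings standing in for encoder outputs.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

nearest = most_similar([0.9, 0.1], embeddings, k=2)
path = interpolate(embeddings[0], embeddings[1], steps=3)
```

Linear interpolation is the simplest choice; whether the intermediate points decode to sensible headlines depends on how smooth the learned embedding space is.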
Third- and fourth-year students in an introductory artificial intelligence, machine learning, or deep learning course.
The assignment is of moderate difficulty, which depends on students' comfort with programming, understanding of the pre-requisite materials, and ability to debug neural network code. Some of the common debugging issues are listed in the final section of this page.
Still, the assignment is heavily scaffolded so that a student can tackle some of the questions without completing all of them.
Software: We use Google Colab with Python, PyTorch, and torchtext for this assignment. Although the use of Google Colab is not necessary, the scaffolding in the assignment is specific to Google Colab, PyTorch and torchtext. In particular, the assignment contains instructions for how to use a GPU in Google Colab.
Prior Material: The handout assumes that students have seen autoencoders, recurrent neural networks, data augmentation, embeddings, and programming using Python, Google Colab, and PyTorch.
The dataset can be used for a similar information retrieval task using traditional AI techniques.
A pre-trained model is available as part of the assignment. Instructors can instead use this pre-trained model for a lecture demonstration or a smaller exercise.
Assignment Handout:
Pre-requisite Materials:
Makefile for building the Jupyter notebooks from the markdown source.
Using a cloud environment like Google Colab helps to avoid common installation issues and to ensure equitable access to computational resources for students.
Debugging is difficult for students. Overfitting on a single headline helps students understand that there is an issue, but it is difficult for students to identify the issue. Here are some of the most common observations and the underlying issue:
The decode function is replicating the input token, rather than predicting the next token. Since it is very easy to replicate the input token, the loss decreases very quickly.
The input is treated as N different sequences, each of length 1, rather than 1 sequence of length N. Then, when making predictions for these N sequences, since PyTorch is not conditioning the first token on any information, the prediction for that first token is the same for all N sequences.
The AutoEncoder.__init__ method is incorrect, and asks Colab to initialize a large number of weights, requiring too much memory.
Some of these issues may not relate directly to the kind of machine learning concepts commonly taught in courses, but these issues are extremely common when developing one’s own machine learning model. This assignment is appropriate if debugging neural networks is a course learning objective, but some students will need help identifying issues with their code.
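The first two issues listed above can be illustrated without a neural network. The sketch below uses plain Python lists: it builds decoder inputs and targets that are offset by one position, so the target at each step is the next token rather than the current one, and it contrasts the shape of one sequence of length N with that of N sequences of length 1. The helper names, token ids, and special-token ids are hypothetical. In PyTorch, which axis is the batch axis depends on the `batch_first` argument of the recurrent module, so the shapes here assume a batch-first layout.

```python
def make_decoder_pairs(token_ids, bos_id, eos_id):
    """Build (decoder_input, target) for next-token prediction.

    The decoder input is the sequence shifted right by one position,
    so the target at step t is a token the decoder has not yet seen.
    """
    decoder_input = [bos_id] + token_ids
    target = token_ids + [eos_id]
    return decoder_input, target


def shape(nested):
    """Shape of a non-empty rectangular nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


token_ids = [11, 42, 7]   # hypothetical ids for a 3-token headline
BOS, EOS = 0, 1           # hypothetical special-token ids

decoder_input, target = make_decoder_pairs(token_ids, BOS, EOS)
# If target were equal to decoder_input, the model could simply copy its
# input, and the loss would fall quickly without learning to predict.

one_sequence = [token_ids]               # shape (1, N): 1 sequence of length N
n_sequences = [[t] for t in token_ids]   # shape (N, 1): N sequences of length 1
```

Printing or asserting tensor shapes at the point where data enters the recurrent layer is usually the quickest way to tell these two layouts apart.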