Understanding How Recurrent Neural Networks Model Text

Michael Guerzhoy (University of Toronto and LKS-CHART, St. Michael's Hospital, guerzhoy@cs.toronto.edu),
Renjie Liao (University of Toronto and Uber ATG, rjliao@cs.toronto.edu)

Overview

In this assignment, students work with a Recurrent Neural Network (RNN) and explore the mechanism that allows RNNs to model English text character-by-character. Students learn to think of an RNN as a state machine, gain facility with an implementation of an RNN, and add functionality to it. The assignment was originally used in a third-year Introduction to Neural Networks and Machine Learning course, and could be adapted for an Intro AI course where neural networks are a special topic: it can be modified to require students to understand how neural networks produce outputs, but not how they are trained. The assignment only requires an intuitive understanding of state machines and does not require calculus (although students should be encouraged to understand backpropagation as applied to RNNs in detail).
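To make the state-machine view concrete, here is a minimal sketch of the recurrence at the heart of min-char-rnn.py. The weight names (Wxh, Whh, Why, bh, by) follow min-char-rnn.py, but the sizes and random initial values below are illustrative placeholders, not the trained model. The hidden state h plays the role of the machine's state; each input character triggers a state transition.

```python
import numpy as np

# Illustrative placeholder weights (not the trained model).
vocab_size, hidden_size = 65, 100
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01   # input -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01   # hidden -> output
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

def step(h, x):
    """One state transition: given the state h and a one-hot input
    character x, return the new state and a distribution over the
    next character."""
    h = np.tanh(Wxh @ x + Whh @ h + bh)   # the "state machine" transition
    y = Why @ h + by                      # unnormalized log-probabilities
    p = np.exp(y) / np.sum(np.exp(y))     # softmax over the vocabulary
    return h, p
```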

Recurrent Neural Networks have recently been shown to be remarkably effective at modelling English text character-by-character. Realistic-seeming "fake" English text, and even C code, can be generated using RNN models. Students should be pointed to Andrej Karpathy's excellent essay "The Unreasonable Effectiveness of Recurrent Neural Networks," which inspired this assignment.

Many students and practitioners tend to view RNNs as black boxes. This assignment forces students to understand how RNNs model text by having them explain properties of the outputs generated by the model (e.g., that a newline always follows a colon, as in the training text) with reference to specific weights of the model.
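As an illustration of the kind of mechanism students are asked to identify, a single hidden unit with a large input weight from ':' and a large output weight to '\n' suffices to implement the colon-newline rule. The toy sketch below uses a made-up three-character vocabulary and hand-set weights; it is not the trained model.

```python
import numpy as np

# Toy, hand-constructed example: one hidden unit enforces "newline
# follows colon". Vocabulary and weight values are made up.
chars = [":", "\n", "a"]
vocab_size, hidden_size = len(chars), 4

Wxh = np.zeros((hidden_size, vocab_size))
Whh = np.zeros((hidden_size, hidden_size))
Why = np.zeros((vocab_size, hidden_size))
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

Wxh[0, 0] = 10.0   # hidden unit 0 saturates when the input is ':'
Why[1, 0] = 10.0   # ...and unit 0 votes strongly for '\n' as the next char

def next_char_probs(c, h=None):
    """Distribution over the next character after seeing character c."""
    h = np.zeros((hidden_size, 1)) if h is None else h
    x = np.zeros((vocab_size, 1))
    x[chars.index(c)] = 1
    h = np.tanh(Wxh @ x + Whh @ h + bh)
    y = Why @ h + by
    return np.exp(y) / np.exp(y).sum()

print(next_char_probs(":").ravel())   # '\n' receives almost all the mass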

Students are provided with a simple RNN model that was trained on a corpus of Shakespeare plays. To make sure students understand an actual RNN implementation and to give them practice working with ML code, students are asked to modify and extend min-char-rnn.py to generate text from an RNN model at different temperatures and with different starting strings. Students are then asked to explain several properties of the generated text by viewing the RNN as a state machine. For bonus marks, students are encouraged to find new interesting properties of the "fake" text generated by the model and to explain how the model generates text with those properties.
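A rough sketch of what the two extensions amount to is given below, following the structure of the sample() function in min-char-rnn.py. The random weights are placeholders standing in for the trained model; warm_start and the temperature parameter T are our illustrative names, not part of the original code.

```python
import numpy as np

# Placeholder weights standing in for the trained model.
vocab_size, hidden_size = 65, 100
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def warm_start(h, prefix_ixes):
    """Feed a starting string (as character indices) through the network
    so that sampling continues from the resulting state."""
    for ix in prefix_ixes:
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        h = np.tanh(Wxh @ x + Whh @ h + bh)
    return h

def sample(h, seed_ix, n, T=1.0):
    """Sample n character indices; T > 1 flattens the distribution
    (more surprising text), T < 1 sharpens it (more conservative text)."""
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1
    ixes = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        y = Why @ h + by
        p = np.exp(y / T) / np.sum(np.exp(y / T))   # temperature-scaled softmax
        ix = np.random.choice(vocab_size, p=p.ravel())
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        ixes.append(ix)
    return ixes

# e.g., continue from a prompt (char_to_ix is the handout's mapping;
# the call is illustrative):
# h = warm_start(np.zeros((hidden_size, 1)), [char_to_ix[c] for c in "HAMLET:"])
# ixes = sample(h, char_to_ix["\n"], 200, T=0.8)
```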

In courses that spend a fair amount of time on RNNs, a tutorial can be run to explain the RNN implementation used, min-char-rnn.py. The tutorial handout (LaTeX source) is provided.

Meta Information

Summary

Students extend and modify existing code to generate "fake English" text from an RNN. Students explore how the RNN model is able to generate text that resembles the training text by analyzing the weights and architecture of the RNN. Optionally, students train the RNN themselves using a corpus of Shakespeare plays as the training set.

Topics

Recurrent Neural Networks, text models, generation from probabilistic models

Audience

Third- and fourth-year students in Intro ML classes. Students in Intro AI classes in which neural networks are covered. Accessible to any student with a good CS background in discrete math, and to excellent high school students.

Difficulty

Students find this assignment easier than the average third-year assignment. Third-year students take about five hours to complete it.

Strengths
  • The assignment forces students to understand how RNNs can model text. Introductory machine learning assignments often fail to give students an intuition for why the models work.
  • The fun factor: students reported having fun playing around with RNNs.
  • The assignment is open-ended: students can explore complex mechanisms for generating text that looks like a play written in English.
  • Students gain experience working with and modifying relatively complex machine learning code, while the assignment still avoids a "fill-in-the-blank" feel. A tutorial explaining the training code is provided with this assignment.
  • The assignment is fairly short. We found this to be an advantage, since RNNs tend to be covered towards the end of the course.
Weaknesses
  • The text generated by the simple "vanilla RNN" model the students work with isn't as impressive as what can be generated by LSTM models.
  • Although students can train the model themselves, it is likely impractical to require them to do so, since training takes a while on a CPU. Many students do gain experience training an RNN, but some prefer to use the weights that we provide.
Dependencies

Students must have a good understanding of Recurrent Neural Networks (though not necessarily of the details of training RNNs). Students should have enough CS background to be comfortable reasoning about Finite State Machine-like mechanisms. Students should have had some practice programming with NumPy.

Variants
  • The assignment is open-ended by design: there are a lot of properties that students can find for bonus marks.
  • The assignment could easily be modified to use other texts. Choices include poetry, literature in languages other than English, and computer code.
  • Retraining the model included with the handout will produce new weights that will again need to be explained.
  • Interested students can train LSTM networks to get more impressive results.

Handout

Handout: HTML (markdown source).

Supporting the assignment

This assignment should be supported by 2-3 hours of lecture on Recurrent Neural Networks. Slides relevant to the assignment are available on request. When we used the assignment, we also ran a tutorial explaining the RNN code used, Andrej Karpathy's min-char-rnn.py, line by line. The tutorial handout (LaTeX source) is provided.

Lessons learned

Prodding students to be creative when looking for interesting properties of the generated text proved to be a challenge: many were stuck in the "character A follows character B" paradigm. In future offerings, we would consider briefly introducing various properties of English text when teaching about language models.