In this assignment, students work with a Recurrent Neural Network (RNN) and explore the mechanism that allows RNNs to model and generate English text character-by-character. Students learn to think of an RNN as a state machine, and gain facility with working with an RNN implementation and adding functionality to it. The assignment was originally used in a third-year Introduction to Neural Networks and Machine Learning course, and could be adapted for an Intro AI course where neural networks are a special topic: it can be modified to require students to understand how neural networks produce outputs, but not how they are trained. The assignment requires only an intuitive understanding of state machines and no Calculus (although students should be encouraged to understand backpropagation as applied to RNNs in detail).
Recurrent Neural Networks have recently been shown to be remarkably effective at modelling English text character-by-character. Realistic-seeming "fake" English text, and even C code, can be generated using RNN models. Students should be pointed to Andrej Karpathy's excellent essay "The Unreasonable Effectiveness of Recurrent Neural Networks," which inspired this assignment.
Many students and practitioners tend to view RNNs as black boxes. This assignment forces students to understand how RNNs model text by having them explain properties of the outputs generated by the model (e.g., a newline's always following a colon, as in the training text) with reference to specific weights of the model.
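To make the "specific weights" idea concrete, the mechanism can be illustrated with a toy one-unit RNN (this is an illustrative sketch, not the trained Shakespeare model: the weight values, three-character vocabulary, and `step` helper are all invented for the example). A large input-to-hidden weight on the colon lets a hidden unit detect a colon, and a large hidden-to-output weight on the newline row then drives the newline logit high on the next step:

```python
import numpy as np

# Toy vocabulary and a single hidden unit (hypothetical values, for illustration only).
vocab = {':': 0, '\n': 1, 'a': 2}
V, H = len(vocab), 1

Wxh = np.array([[5.0, 0.0, 0.0]])      # a colon input strongly excites the hidden unit
Whh = np.zeros((H, H))                 # toy choice: no carry-over beyond one step
Why = np.array([[0.0], [5.0], [0.0]])  # an active unit boosts the newline output logit
bh = np.zeros(H)
by = np.zeros(V)

def step(ch, h):
    """One RNN step: update the hidden state, return it with the output logits."""
    x = np.zeros(V)
    x[vocab[ch]] = 1.0                 # one-hot encoding of the input character
    h = np.tanh(Wxh @ x + Whh @ h + bh)
    logits = Why @ h + by
    return h, logits

h = np.zeros(H)
h, logits = step(':', h)
# After seeing ':', the newline logit dominates, so '\n' is the most likely next character.
```

Tracing weights like these through the real model (whose hidden units are less tidy) is the kind of explanation the assignment asks for.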
Students are provided with a simple RNN model that was trained on a corpus of Shakespeare plays. To make sure students gain an understanding of an implementation of an RNN and to give students practice with working with ML code, students are asked to modify and extend min-char-rnn.py to generate text from an RNN model at different temperatures and with different starting strings. Students are then asked to explain several properties of the text by viewing the RNN as a state machine. Students are encouraged to find new interesting properties of the "fake" text generated by the model and to explain how the model generates text with those properties, for bonus marks.
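The temperature extension students add can be sketched as follows (a minimal sketch: min-char-rnn.py samples with `np.random.choice` at temperature 1, and the `sample_char` helper here is our name, not part of the provided code). Dividing the logits by a temperature before the softmax sharpens the distribution when the temperature is below 1 and flattens it when above 1:

```python
import numpy as np

def sample_char(logits, temperature=1.0, rng=None):
    """Sample a character index from unnormalized logits at a given temperature.

    Low temperatures make sampling conservative (close to argmax);
    high temperatures make the generated text more surprising.
    """
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    scaled -= scaled.max()                         # subtract max for numerical stability
    probs = np.exp(scaled) / np.sum(np.exp(scaled))
    return rng.choice(len(probs), p=probs)
```

Seeding the RNN's hidden state by feeding in a chosen starting string before sampling gives the "different starting strings" part of the exercise.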
In courses that spend a fair amount of time on RNNs, a tutorial can be run to explain the RNN implementation used, min-char-rnn.py. The tutorial handout (LaTeX source) is provided.
Students extend and modify existing code to generate "fake English" text from an RNN. Students explore how the RNN model is able to generate text that resembles the training text by analyzing the weights and architecture of the RNN. Optionally, students train the RNN themselves using a corpus of Shakespeare plays as the training set.
Recurrent Neural Networks, text models, generation from probabilistic models
Third and fourth year students in Intro ML classes. Students in Intro AI classes in which neural networks are covered. Accessible to any student with a good background in Discrete Math from CS and to excellent high school students.
Students find this assignment easier than the average third-year assignment. Third-year students take about five hours to complete it.
Dependencies: Students must have a good understanding of Recurrent Neural Networks (though not necessarily of the details of learning RNNs). Students should have enough CS background to be comfortable with reasoning about Finite State Machine-like mechanisms. Students should have had some practice programming with NumPy.
Handout: HTML (markdown source).
This assignment should be supported by 2-3 hours of lecture on Recurrent Neural Networks. Slides relevant to the assignment are available on request. When we used the assignment, we also ran a tutorial explaining the RNN code used, Andrej Karpathy's min-char-rnn.py, line-by-line. The tutorial handout (LaTeX source) is provided.
Prodding students to be creative when looking for interesting properties of the generated text proved to be a challenge: many were stuck in the "character A follows character B" paradigm. We would consider quickly introducing various properties of English text when teaching about language models.