Understanding How Recurrent Neural Networks Model Text

In this assignment you will explore how Recurrent Neural Networks (RNNs) model and generate text. You will extend Andrej Karpathy’s min-char-rnn.py implementation of a vanilla RNN, train the RNN on a corpus of Shakespeare’s texts (download here), figure out how the RNN manages to generate “Shakespeare-like” text, and write up your findings in a report.

We are providing you with a model trained on the Shakespeare corpus. The model is available here (alternatively, you can read in the model from a Pickle file), and code to read in the weights is here.

Part 1 (20%)

Recall from lecture that at each time-step \(t\), the RNN computes an output layer

$$y^{(t)} = (y^{(t)}_1, y^{(t)}_2, ..., y^{(t)}_k).$$

The estimate of the probability that the character at time-step \((t+1)\) is character \(i\) is then proportional to \(\exp(y^{(t)}_i)\):

$$P(x^{(t+1)}=i) = \frac{\exp(y^{(t)}_i)}{\sum_{i'=1}^k \exp(y^{(t)}_{i'})}.$$

As discussed in lecture, when sampling from the RNN, we can sample using different “temperatures.” We can sample the character at time-step \((t+1)\) by setting the probability of sampling character \(i\) to be proportional to \(\exp(\alpha y^{(t)}_i)\):

$$P(x^{(t+1)}=i) = \frac{\exp(\alpha y^{(t)}_i)}{\sum_{i'=1}^k \exp(\alpha y^{(t)}_{i'})}.$$

The quantity \(1/\alpha\) is called the “temperature.”
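As a minimal sketch of the formula above (not a full solution), the following shows how the output activations could be turned into temperature-scaled probabilities with NumPy. The function names `temperature_probs` and `sample_index`, and the example activations, are made up for illustration; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the resulting probabilities.

```python
import numpy as np

def temperature_probs(y, temperature=1.0):
    """Turn output-layer activations y into sampling probabilities."""
    alpha = 1.0 / temperature                      # the handout's alpha is 1 / temperature
    z = alpha * np.asarray(y, dtype=float).ravel() # accept (k,) or (k, 1) activations
    z -= z.max()                                   # stability shift; cancels in the ratio
    p = np.exp(z)
    return p / p.sum()

def sample_index(y, temperature=1.0, rng=None):
    """Draw one character index from the temperature-scaled distribution."""
    rng = np.random.default_rng(0) if rng is None else rng
    p = temperature_probs(y, temperature)
    return int(rng.choice(len(p), p=p))

# Made-up activations for k = 3 characters (illustrative only).
y = np.array([2.0, 1.0, 0.0])
for T in (0.5, 1.0, 2.0):
    print(T, np.round(temperature_probs(y, T), 3))
```

Low temperatures (large \(\alpha\)) concentrate probability on the largest activation, while high temperatures flatten the distribution toward uniform.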

Write a function to sample text from the model using different temperatures. Try different temperatures, and, in your report, include examples of texts generated using different temperatures. Briefly discuss what difference the temperature makes.

You should either train the RNN yourself (this can take a couple of hours), or use the weights we provided – up to you.

Part 2 (50%)

Write a function that uses an RNN to complete a string. That is, the RNN should generate text that is a plausible continuation of a given starter string. In order to do that, you will need to compute the hidden activity \(h^{(t)}\) at the end of the starter string of length \(t\), and then start generating new text.
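One possible shape for such a function is sketched below, under stated assumptions: the parameter names `Wxh`, `Whh`, `Why`, `bh`, `by` follow the naming used in min-char-rnn.py, the hidden state starts at zero, and the weights in the demo are random toy values rather than the provided model. The function name `complete` and the lookup dictionaries are hypothetical. The first loop only updates the hidden state on the starter string; the second loop samples new characters.

```python
import numpy as np

def one_hot(ix, size):
    x = np.zeros((size, 1))
    x[ix] = 1.0
    return x

def complete(starter, n, Wxh, Whh, Why, bh, by, char_to_ix, ix_to_char, rng):
    """Feed `starter` through the RNN, then sample n further characters."""
    vocab_size = Wxh.shape[1]
    h = np.zeros((Wxh.shape[0], 1))
    # Warm-up: compute the hidden state at the end of the starter string.
    for ch in starter:
        h = np.tanh(Wxh @ one_hot(char_to_ix[ch], vocab_size) + Whh @ h + bh)
    out = []
    for _ in range(n):
        y = Why @ h + by                       # output activations from current state
        p = np.exp(y - y.max())
        p /= p.sum()                           # softmax at temperature 1
        ix = int(rng.choice(vocab_size, p=p.ravel()))
        out.append(ix_to_char[ix])
        # Feed the sampled character back in to advance the hidden state.
        h = np.tanh(Wxh @ one_hot(ix, vocab_size) + Whh @ h + bh)
    return starter + "".join(out)

# Toy demo with random weights (assumption: NOT the provided Shakespeare model).
chars = list("abc: \n")
char_to_ix = {c: i for i, c in enumerate(chars)}
ix_to_char = {i: c for i, c in enumerate(chars)}
rng = np.random.default_rng(0)
H, V = 8, len(chars)
Wxh, Whh, Why = (rng.normal(0, 0.1, s) for s in [(H, V), (H, H), (V, H)])
bh, by = np.zeros((H, 1)), np.zeros((V, 1))
text = complete("ab:", 10, Wxh, Whh, Why, bh, by, char_to_ix, ix_to_char, rng)
print(repr(text))
```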

In your report, include five interesting examples of outputs that your network generated using a starter string that you chose.

You should either train the RNN yourself (this can take a couple of hours), or use the weights we provided – up to you.

Part 3 (30%)

Some examples of texts generated from the model provided to you (at temperature = 1) are here.

In the samples that the RNN generated, it seems that a newline or a space usually follows the colon (i.e., “:”) character. In the weight data provided to you, identify the specific weights that are responsible for this behaviour by the RNN. In your report, specify the coordinates and values of the weights you identified, and explain how those weights make the RNN generate newlines and spaces after colons. Explain how you figured out which weights are responsible for the behaviour. You are encouraged to write code to get the answer, and to include the scripts you wrote in your report.
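One heuristic starting point for this kind of investigation, sketched here under simplifying assumptions, is to look for hidden units that are strongly driven by the colon input and that in turn push up the activation of the target character. The sketch ignores the recurrent term (it treats the previous hidden state as zero) and the output bias, uses the min-char-rnn.py names `Wxh`, `Why`, `bh`, and runs on toy weights with one planted pathway rather than on the provided model. The function name `colon_pathways` is hypothetical, and this is not necessarily the intended method.

```python
import numpy as np

def colon_pathways(Wxh, Why, bh, colon_ix, target_ix, top=3):
    """Rank hidden units by their contribution to the target activation after ':'.

    Simplification: previous hidden state assumed zero, so Whh is ignored.
    Returns (unit, Wxh[unit, colon], Why[target, unit], contribution) tuples.
    """
    h = np.tanh(Wxh[:, colon_ix] + bh.ravel())  # hidden state right after ':'
    contrib = Why[target_ix, :] * h             # additive share of each unit in y[target]
    order = np.argsort(-contrib)[:top]          # largest positive contributions first
    return [(int(j), float(Wxh[j, colon_ix]), float(Why[target_ix, j]), float(contrib[j]))
            for j in order]

# Toy weights with a single planted pathway ':' -> unit 2 -> newline.
chars = [":", "\n", " ", "a"]
colon_ix, newline_ix = 0, 1
H, V = 5, len(chars)
Wxh = np.zeros((H, V))
Why = np.zeros((V, H))
bh = np.zeros((H, 1))
Wxh[2, colon_ix] = 3.0    # colon strongly activates hidden unit 2
Why[newline_ix, 2] = 4.0  # hidden unit 2 strongly raises the newline activation
ranking = colon_pathways(Wxh, Why, bh, colon_ix, newline_ix)
print(ranking[0])
```

On real weights, candidates found this way would still need checking against the full model, for example by perturbing the identified weights and observing whether newlines and spaces stop following colons.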

Part 4 (10% bonus)

Identify another interesting behaviour of the RNN, and identify the weights that are responsible for it. Specify the coordinates and the values of the weights, and explain how those weights lead to the behaviour that you identified. To obtain more than 2/10 for the bonus part, the behaviour has to be more interesting than the behaviour in Part 3 (i.e., character A following character B).

What to submit

The project should be implemented using Python. Your report should be in PDF format. You should use LaTeX to generate the report, and submit the .tex file as well.

Reproducibility counts! We should be able to obtain all the graphs and figures in your report by running your code. Set all the seeds to 0 to enable us to reproduce your outputs.
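A minimal sketch of seeding, assuming your code draws randomness only from Python's `random` module and NumPy's global generator (as `np.random.randn` and `np.random.choice` do); the helper name `set_seeds` is hypothetical. Any other source of randomness you use would need its own seed.

```python
import random
import numpy as np

def set_seeds(seed=0):
    """Seed the random number generators used by the scripts."""
    random.seed(seed)      # Python's built-in generator
    np.random.seed(seed)   # NumPy's global generator

# Re-seeding reproduces the same draws.
set_seeds(0)
first = np.random.rand(3)
set_seeds(0)
second = np.random.rand(3)
print(np.array_equal(first, second))
```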