Implementing, Tuning, and Curating Data for Feedforward Networks

Overview

The purpose of this assignment is to learn the underpinnings of feedforward networks. You will implement a perceptron network with a step activation function and a single-hidden-layer feedforward network. You will experiment with key hyperparameters as well as key activation functions so as to understand how they interact and how they influence the performance of the networks. You will additionally learn about the task of curating data so that it can be used to successfully train neural networks. You will be working with code that implements perceptrons, feedforward networks and the backpropagation algorithm.

Basics

  1. Review the materials from Sections 18.7.2 and 18.7.4 of Russell and Norvig, in particular the algorithm presented in Figure 18.24. For additional resources, consider the Wikipedia articles on Perceptrons, Feedforward neural networks and Backpropagation.
  2. Download and install Perceptron.java. It is a perceptron network with a single output node. It uses a step activation function. Currently the training data is set up to learn the Boolean and function. Implement the trainNetwork() method with a fixed number of ten training episodes; a minimal sketch of the perceptron update rule appears after this list.
  3. Given the learning rate of 0.1, how many training episodes does the network take to learn Boolean and?
  4. Modify the learning rate. Can you learn in one training episode? If so, what is the learning rate?
  5. Set the learning rate back to 0.1. Now modify the threshold value. Can you learn in one training episode? If so, what is the threshold value?
  6. What is the relationship between the learning rate and the threshold value in the context of questions (4) and (5)?
  7. Modify the training data so as to attempt to learn the xor Boolean function. Experiment with different values for the learning rate and the threshold value. What appears to be the problem with the perceptron network when attempting to learn xor?
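
For reference, here is a minimal, self-contained sketch of the perceptron learning rule with a step activation, shown learning Boolean and. The class, field, and method names below (PerceptronSketch, trainingInputs, threshold, and so on), the initial weights, and the fixed threshold are placeholders and will not match Perceptron.java exactly; adapt the loop in trainNetwork() to the fields the provided file actually uses.

  // PerceptronSketch.java -- a minimal sketch of the perceptron learning rule
  // with a step activation, shown here learning Boolean and.
  // All names and constants are placeholders, not those used in Perceptron.java.
  public class PerceptronSketch {
      static double[][] trainingInputs  = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
      static double[]   trainingOutputs = { 0, 0, 0, 1 };       // Boolean and
      static double[]   weights         = { 0.0, 0.0 };
      static double     learningRate    = 0.1;
      static double     threshold       = 0.5;

      // Step activation: fire only if the weighted sum reaches the threshold.
      static double activate(double[] input) {
          double sum = 0.0;
          for (int i = 0; i < weights.length; i++) sum += weights[i] * input[i];
          return (sum >= threshold) ? 1.0 : 0.0;
      }

      static void trainNetwork() {
          final int episodes = 10;                               // fixed number of training episodes
          for (int e = 0; e < episodes; e++) {
              for (int p = 0; p < trainingInputs.length; p++) {
                  // Perceptron rule: w_i <- w_i + alpha * (target - output) * x_i
                  double error = trainingOutputs[p] - activate(trainingInputs[p]);
                  for (int i = 0; i < weights.length; i++) {
                      weights[i] += learningRate * error * trainingInputs[p][i];
                  }
              }
          }
      }

      public static void main(String[] args) {
          trainNetwork();
          for (double[] input : trainingInputs) {
              System.out.println(java.util.Arrays.toString(input) + " -> " + activate(input));
          }
      }
  }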

XOR Experiments

Download and install XOR.java. It is a feedforward network with one output node. It uses a sigmoid activation function. The weights are initialized to random values in the range [0, 1). Study the code and conduct the following experiments:
  1. Experiment with the following tuning parameters: number of training episodes, learning rate and hidden layer size. Currently, they are set to 1,000, 0.1 and 3, respectively. This is not sufficient to learn xor reliably. Change those parameters to values so as to efficiently and reliably train the network. Add to your report several sets of values with which you were able to train the network efficiently and reliably. Briefly state what you consider to be reliable. Additionally, address any relationships you may notice among the parameters. Notice that the testNetwork() method prints the actual output and the desired output.
  2. Now modify the XOR.java file so that the network uses a step activation function rather than a sigmoid activation function. Hint: Recall that the derivative of the step function is not defined for all values. Experiment with values of the number of training episodes, learning rate, threshold of the activation function, and hidden layer size. Add to your report several sets of values with which you were able to train the network efficiently and reliably. Briefly state what you consider to be reliable. Additionally, address any relationships you may notice among the parameters. A minimal sketch of this activation change appears after this list.
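
For item (2), here is a minimal sketch of the activation change, assuming the activation and its derivative are factored into separate methods (the names sigmoid, step, and stepSurrogateDerivative are placeholders, not the ones in XOR.java). The point of the hint is that the step function's true derivative is zero everywhere except at the threshold, where it is undefined; substituting a constant surrogate derivative in the backpropagation weight update is one common workaround.

  // ActivationSketch.java -- replacing the sigmoid activation with a step
  // activation. Method names are placeholders; XOR.java may organize the
  // activation and its derivative differently.
  public class ActivationSketch {
      static double threshold = 0.5;                    // threshold of the step activation

      // Original sigmoid activation and its exact derivative.
      static double sigmoid(double x)           { return 1.0 / (1.0 + Math.exp(-x)); }
      static double sigmoidDerivative(double y) { return y * (1.0 - y); }   // y = sigmoid(x)

      // Step activation. Its true derivative is 0 almost everywhere and is
      // undefined at the threshold, which would stall backpropagation, so a
      // constant surrogate "derivative" is used in the weight update instead.
      static double step(double x)            { return (x >= threshold) ? 1.0 : 0.0; }
      static double stepSurrogateDerivative() { return 1.0; }               // one common choice

      public static void main(String[] args) {
          for (double x : new double[] { -1.0, 0.0, 0.5, 1.0 }) {
              System.out.printf("x=%5.2f  sigmoid=%.3f  step=%.1f%n", x, sigmoid(x), step(x));
          }
      }
  }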

Digit Recognition

Download and install FeedForwardNetwork.java and Training.java. Study the code. Notice that Training.java is set up to train the feedforward network on XOR; it serves to test your installation. Please note that there is some testing code in the FeedForwardNetwork.java file; feel free to modify it to your needs. Modify the training file so that it successfully recognizes handwritten digits. The training data and an explanation of it can be found near the bottom of The MNIST Database document.

There are two parts to this assignment.

  1. First, you need to curate the data from the MNIST data set so that it can be used for training purposes. We recommend that you:
    1. Learn about the byte-level layout of an IDX file.
    2. Study which Java classes and methods you will use to read bytes from an IDX file.
    3. Format and arrange the bytes so that they can be used to train the given neural network. A minimal sketch of reading the IDX files appears after this list.
  2. For the second part of this assignment, we ask that you experiment with the various parameters so as to efficiently and reliably train the network. In your report, document the number of training episodes required to obtain the sort of precision you deem sufficient, and justify your choice of precision. Additionally, include the values of the following parameters: the range of the initial weights, the learning rate, and the number and size of the hidden layers. Address how long it takes to train your network, given your parameters.
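
As a starting point for the curation step, here is a minimal sketch of reading the MNIST IDX image and label files with DataInputStream, which reads the 32-bit header fields in the big-endian order the format uses. The file names, the scaling of pixel values to [0, 1], and the array layout are assumptions; arrange the results in whatever form FeedForwardNetwork.java and Training.java expect.

  // MnistReaderSketch.java -- a sketch of loading the MNIST IDX files.
  // File names and the [0, 1] pixel scaling are assumptions.
  import java.io.BufferedInputStream;
  import java.io.DataInputStream;
  import java.io.FileInputStream;
  import java.io.IOException;

  public class MnistReaderSketch {
      public static void main(String[] args) throws IOException {
          double[][] images = readImages("train-images-idx3-ubyte");
          int[] labels      = readLabels("train-labels-idx1-ubyte");
          System.out.println(images.length + " images, " + labels.length + " labels");
      }

      // IDX image file: 32-bit big-endian magic number, image count, row count,
      // and column count, followed by one unsigned byte per pixel.
      static double[][] readImages(String path) throws IOException {
          try (DataInputStream in = new DataInputStream(
                  new BufferedInputStream(new FileInputStream(path)))) {
              if (in.readInt() != 2051) throw new IOException("Not an IDX image file");
              int count = in.readInt();
              int rows  = in.readInt();
              int cols  = in.readInt();
              double[][] images = new double[count][rows * cols];
              for (int i = 0; i < count; i++) {
                  for (int p = 0; p < rows * cols; p++) {
                      images[i][p] = in.readUnsignedByte() / 255.0;   // scale to [0, 1]
                  }
              }
              return images;
          }
      }

      // IDX label file: 32-bit big-endian magic number and item count,
      // followed by one unsigned byte per label.
      static int[] readLabels(String path) throws IOException {
          try (DataInputStream in = new DataInputStream(
                  new BufferedInputStream(new FileInputStream(path)))) {
              if (in.readInt() != 2049) throw new IOException("Not an IDX label file");
              int count = in.readInt();
              int[] labels = new int[count];
              for (int i = 0; i < count; i++) labels[i] = in.readUnsignedByte();
              return labels;
          }
      }
  }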

Submission

Please submit the following items:
  1. Your report for the experiments you conducted.
  2. Your modified XOR.java and Training.java files.