Implementation, Tuning, and Data Curation of Feedforward Networks
Overview
The purpose of this assignment is to learn the underpinnings of feedforward networks.
You will implement a perceptron network and a single hidden layer feedforward network
with a step activation function. You will experiment with key hyperparameters as
well as key activation functions so as to understand how they interact and how
they influence the performance of the networks. You will additionally learn about the
task of curating data so that it may be used to successfully train neural networks.
You will be working with code that implements perceptrons, feedforward networks, and
the backpropagation algorithm.
Basics
- Review the materials from sections 18.7.2 and 18.7.4 of Russell
and Norvig, in particular the algorithm presented in figure 18.24. For
additional resources, consider the Wikipedia articles on
Perceptrons, Feedforward neural networks, and Backpropagation.
- Download and install Perceptron.java. It is a
perceptron network with a single output node and uses a step activation function.
Currently the training data is set up to learn the Boolean and function.
Implement the trainNetwork() method with a fixed number of ten training episodes;
a minimal sketch of such a training loop appears after this list.
- Given the learning rate of 0.1, how many training episodes does
the network take to learn Boolean and?
- Modify the learning rate. Can you learn in one training episode?
If so, what is the learning rate?
- Set the learning rate back to 0.1. Now modify the
threshold value. Can you learn in one training episode?
If so, what is the threshold value?
- What is the relationship between the learning rate and the
threshold value in the context of the previous two questions?
- Modify the training data so as to attempt to learn the xor Boolean function.
Experiment with different values for the learning rate and the threshold value.
What appears to be the problem with the perceptron network when attempting to learn xor?
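Below is a minimal sketch of what the trainNetwork() loop might look like, assuming field
names such as weights, threshold, learningRate, inputs and targets; the identifiers in the
provided Perceptron.java may differ, so adapt the sketch to the actual code.

    // Hypothetical sketch of a perceptron training loop with a step activation.
    // Field and array names are assumptions; adapt them to Perceptron.java.
    public class PerceptronSketch {
        double[] weights = new double[2];                 // one weight per input
        double threshold = 0.5;                           // step-activation threshold
        double learningRate = 0.1;
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 0, 0, 1};                  // Boolean and

        void trainNetwork() {
            final int episodes = 10;                      // fixed number of training episodes
            for (int e = 0; e < episodes; e++) {
                for (int p = 0; p < inputs.length; p++) {
                    double sum = 0.0;                     // weighted input sum
                    for (int i = 0; i < weights.length; i++) {
                        sum += weights[i] * inputs[p][i];
                    }
                    double output = (sum >= threshold) ? 1.0 : 0.0;  // step activation
                    double error = targets[p] - output;
                    for (int i = 0; i < weights.length; i++) {
                        weights[i] += learningRate * error * inputs[p][i];  // perceptron rule
                    }
                }
            }
        }
    }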
XOR Experiments
Download and install XOR.java. It is a
feedforward network with one output node. It uses a sigmoid
activation function. The weights are initialized to random
values in the range [0, 1). Study the code and conduct the following
experiments:
- Experiment with the following tuning parameters:
number of training episodes, learning rate and hidden
layer size. Currently, they are set to 1,000, 0.1 and 3,
respectively. This is not sufficient to learn xor
reliably. Change these parameters so that the network trains efficiently and
reliably. Add to your report several sets of values
with which you were able to train the network efficiently and reliably. Briefly state
what you consider to be reliable. Additionally, address any relationships
you may notice among the parameters. Notice that the testNetwork() method
prints the actual output and the desired output.
- Now modify the XOR.java file so that the network uses a
step activation function rather than a sigmoid activation
function. Hint: recall that the derivative of the step function is
not defined for all values; a sketch of one possible workaround follows this list.
Experiment with the number of training episodes, the learning rate, the threshold
of the activation function and the hidden layer size.
Add to your report several sets of values
with which you were able to train the network efficiently and reliably. Briefly state
what you consider to be reliable. Additionally, address any relationships
you may notice among the parameters.
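Because the step function's derivative is zero away from the threshold and undefined at it,
a straightforward backpropagation pass has nothing to propagate. The sketch below shows one
common workaround, a surrogate derivative used only in the backward pass; the class, the
method names and the constant are illustrative assumptions, not the interface of the
provided XOR.java.

    // Hypothetical step activation plus a surrogate derivative for backpropagation.
    // The surrogate (a constant here; the sigmoid derivative is another option) is
    // an assumption to experiment with, since the true derivative is unusable.
    public class StepActivation {
        static double threshold = 0.5;            // activation threshold (a tuning parameter)

        // Forward pass: hard threshold on the weighted input sum.
        static double activate(double net) {
            return (net >= threshold) ? 1.0 : 0.0;
        }

        // Backward pass: stand-in for the step function's derivative.
        static double derivative(double output) {
            return 1.0;                           // assumed constant; tune or replace as needed
        }
    }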
Digit Recognition
Download and install FeedForwardNetwork.java and Training.java. Study the code. Notice that Training.java is
set up to train the feedforward network on xor; it serves to test your
installation. Please notice that there
is some testing code in the FeedForwardNetwork.java file. Feel
free to modify it to your needs. Modify the
training file so that the network successfully recognizes handwritten
digits. The training data and an explanation of it can be found near
the bottom of The MNIST
Database document.
There are two parts to this task.
- First, you need to curate the data from the MNIST data set so
that it can be used for training purposes. We recommend that you:
- Learn about the byte-level layout of an idx-file.
- Study which Java classes and functions you will be using to read
bytes from an idx-file.
- Format and arrange the bytes so that they can be used to train the
given neural network; sketches of one possible approach appear at the end of this section.
- For the second part, we ask that you experiment
with the various parameters so that the network trains efficiently and
reliably. In your report, document the number of training
episodes required to obtain the sort of precision you deem
sufficient. Please justify your choice of
precision. Additionally, include the values of the following
parameters: the range of the initial weights, the learning rate and the
number and size of the hidden layers. Address how long it takes
to train your network, given your parameters.
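As a starting point for the curation step, here is a hedged sketch of reading the MNIST image
file with java.io.DataInputStream, which reads the 32-bit header integers in the big-endian
byte order the idx format uses. The file path and the decision to scale pixel values to
[0, 1] are assumptions; adapt both to whatever your network setup expects.

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical reader for an MNIST image idx-file (e.g. train-images-idx3-ubyte).
    public class MnistImages {
        public static double[][] readImages(String path) throws IOException {
            try (DataInputStream in =
                     new DataInputStream(new BufferedInputStream(new FileInputStream(path)))) {
                int magic = in.readInt();      // 2051 for an image file
                if (magic != 2051) {
                    throw new IOException("unexpected magic number: " + magic);
                }
                int count = in.readInt();      // number of images
                int rows  = in.readInt();      // 28
                int cols  = in.readInt();      // 28
                // Note: the full training set held as doubles needs a few hundred MB of
                // heap; you may prefer to read only a subset while experimenting.
                double[][] images = new double[count][rows * cols];
                for (int i = 0; i < count; i++) {
                    for (int p = 0; p < rows * cols; p++) {
                        // Pixels are unsigned bytes 0..255; scale to [0, 1] for training.
                        images[i][p] = in.readUnsignedByte() / 255.0;
                    }
                }
                return images;
            }
        }
    }

The label file can be handled the same way. The sketch below turns each label byte into a
one-hot target vector for a network with ten output nodes; whether your network expects
targets in exactly this form depends on how you configure FeedForwardNetwork.java, so treat
it as one possible arrangement rather than the required one.

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical reader for an MNIST label idx-file (e.g. train-labels-idx1-ubyte).
    public class MnistLabels {
        public static double[][] readOneHotLabels(String path) throws IOException {
            try (DataInputStream in =
                     new DataInputStream(new BufferedInputStream(new FileInputStream(path)))) {
                int magic = in.readInt();      // 2049 for a label file
                if (magic != 2049) {
                    throw new IOException("unexpected magic number: " + magic);
                }
                int count = in.readInt();      // number of labels
                double[][] targets = new double[count][10];
                for (int i = 0; i < count; i++) {
                    int digit = in.readUnsignedByte();   // label value 0..9
                    targets[i][digit] = 1.0;             // one-hot encoding
                }
                return targets;
            }
        }
    }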
Submission
Please submit the following items:
- Your report for the experiments you conducted.
- Your modified XOR.java and Training.java files.