After completing this assignment, students should be able to:
Complete the following stubbed-out ScalarFlow library so that all public methods and attributes correspond to the provided docstring comments:
You can read a nicely-formatted version of the scalarflow documentation here.
The scalarflow.py module depends on the networkx library for maintaining the graph structure. You'll need to install networkx:
pip install networkx
Note that the starter code already includes functionality for building and visualizing computation graphs. For example, the following snippet should work without making any modifications to scalarflow.py:
import scalarflow as sf

graph = sf.Graph()

with graph:
    x = sf.Variable(2.0, name='x')
    y = sf.Variable(4.0, name='y')

    x_squared = sf.Pow(x, 2)
    y_squared = sf.Pow(y, 2)
    xy_sum = sf.Add(x_squared, y_squared)
    func = sf.Pow(xy_sum, .5)  # (Square root)

graph.gen_dot("sample.dot")
This code creates a computation graph corresponding to the formula \(\sqrt{x^2 + y^2}\). The resulting dot file can be used to generate a nicely-formatted image representing the structure of the graph.
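If you have Graphviz installed (not required for the assignment, but a common way to render dot files), a command like the following should produce a PNG image of the graph:

dot -Tpng sample.dot -o sample.png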
The provided code does not include any functionality for actually performing the calculations represented by the graph. You are free to add any additional helper methods or private instance variables that you find helpful for providing the required functionality.
Use good OO programming style. For example, each operator class should be responsible for performing its own forward and backward calculations. If you find yourself writing long if/elif blocks checking node types, you are probably on the wrong track. There are also opportunities to avoid code repetition by putting common functionality into the appropriate superclasses.
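As a rough illustration of this style, here is a self-contained sketch (not the actual scalarflow.py API; the class names, attributes, and method names below are made up) in which each operator class owns its forward and backward logic:

class Node:
    """Minimal stand-in for a ScalarFlow node (illustrative only)."""
    def __init__(self, value=0.0):
        self.value = value
        self.derivative = 0.0

class Multiply(Node):
    """Hypothetical operator node representing the product of two operands."""
    def __init__(self, a, b):
        super().__init__()
        self.a = a
        self.b = b

    def forward(self):
        # Each operator computes its own value from its operands' values.
        self.value = self.a.value * self.b.value

    def backward(self):
        # Each operator propagates its derivative to its operands via the
        # chain rule: d(ab)/da = b and d(ab)/db = a.
        self.a.derivative += self.derivative * self.b.value
        self.b.derivative += self.derivative * self.a.value

Because every node exposes the same forward/backward interface, the graph-level passes never need to check node types.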
Avoid the temptation to perform the forward and backward passes recursively. This can be made to work, but it is inefficient and error prone. The provided code includes functionality for obtaining a topologically sorted list of ancestors for a particular node. Both the forward and backward passes should involve iterating over this list.
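For instance, a forward pass might look roughly like the following (a sketch only; sorted_ancestors is a placeholder name for whatever topological-sort helper the starter code actually provides):

def run(self, node):
    # Visit nodes in topological order so every operand is computed
    # before the node that consumes it.
    for n in self.sorted_ancestors(node) + [node]:
        n.forward()
    return node.value

The backward pass is the mirror image: initialize the output node's derivative to 1, then iterate over the same list in reverse, calling each node's backward method.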
The following files provide a simple machine learning library built on top of the ScalarFlow toolkit:
sf_util.py - Useful utility functions for developing ML algorithms using ScalarFlow.
sf_classifiers.py - MLP and Logistic Regression classifiers.
sf_classifier_examples.py - Demos of the classifiers on small synthetic datasets.
If scalarflow.py is completed correctly, the demo methods in sf_classifier_examples.py should reliably learn the two classification tasks.
Even without completing the classes in scalarflow.py, it is possible to initialize the models in sf_classifiers.py and visualize the structure of the corresponding computation graphs.
We have discussed the fact that sigmoid nonlinearities can lead to vanishing gradients when used with deep neural networks (more than three layers). Rectified linear units, or ReLUs, are widely used because they help to avoid the problem of vanishing gradients: while sigmoids have near-zero derivatives across much of their domain, ReLUs have non-zero derivatives for all positive inputs.
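Concretely, \(\operatorname{ReLU}(x) = \max(0, x)\), so its derivative is \(1\) for \(x > 0\) and \(0\) for \(x < 0\) (at \(x = 0\) the derivative is conventionally taken to be \(0\)).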
For Part 2 of this assignment, complete the following steps:
Add a Relu node type to scalarflow.py.
Update the MLP class in sf_classifiers.py to accept 'relu' as an option for the activation argument of the constructor. Make sure to change the weight initialization code to He initialization for relu units (see the sketch after this list).
Test your modified MLP implementation to make sure that it can reliably learn the xor task with ten hidden units using both sigmoid and relu activation functions. I found that it took some experimentation with learning rates to get both versions of the network to work well. It seems that the sigmoid activation function requires a significantly higher learning rate than relu.
Create two figures illustrating the learning curves for each classifier across 10 training runs. Each figure should show the epoch number on the x-axis and training loss on the y-axis. The first figure should include ten lines, each representing a single training run with sigmoid nonlinearities. The second figure should show the same information for the relu network. The captions should explain the data in the figures and should include the learning rate that was used. (A minimal plotting sketch is provided after this list.)
The point of these figures is to illustrate that either activation function is effective for a three-layer network.
Create two additional figures by replicating the experiment above using networks with five hidden layers, each with 10 hidden units. The question here is whether the relu network does a better job of learning with a much deeper network. Again, the captions for the figures should explain the data and include the learning rates. (You should use the same learning rates here as in the previous experiment.)
Combine your figures into a single PDF document for submission.
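For reference, He initialization draws each weight from a zero-mean Gaussian with standard deviation \(\sqrt{2 / n_{\mathrm{in}}}\), where \(n_{\mathrm{in}}\) is the number of inputs to the layer. A minimal sketch using NumPy (the function name and signature are illustrative, not part of sf_classifiers.py):

import numpy as np

def he_init(fan_in, fan_out):
    # He initialization: zero-mean Gaussian with std = sqrt(2 / fan_in).
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

Similarly, the learning-curve figures can be produced with matplotlib along these lines (train_mlp_once is a hypothetical helper that trains a fresh classifier and returns the per-epoch training losses; the learning rate shown is only a placeholder):

import matplotlib.pyplot as plt

plt.figure()
for run in range(10):
    # Hypothetical helper: train a fresh MLP and return per-epoch training loss.
    losses = train_mlp_once(activation='sigmoid', learning_rate=0.1)
    plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.savefig("sigmoid_curves.png")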
Grades will be calculated according to the following distribution.
Readability/Style 10%
Your code should follow PEP8 conventions. It should be well documented and well organized.
Passes Reference Tests 70%
Note that I will also test your code using sf_classifiers.py. For the provided examples in sf_classifier_examples.py, each epoch should require less than a second to complete.
Part 2 Submission 20%