ScalarFlow: Implementing Automatic Differentiation

Submission Information

Summary

Many of the most significant breakthroughs in artificial intelligence over the past decade have been based on progress in deep neural networks. That progress has been facilitated by high-quality deep-learning libraries like Theano, TensorFlow and PyTorch. The key algorithm at the heart of all of these libraries is reverse-mode automatic differentiation. The purpose of this assignment is to gain a better understanding of deep learning algorithms and frameworks by programming an automatic differentiation engine and then modifying machine learning code built on top of that engine.

Topics Neural Networks, Reverse-Mode Automatic Differentiation, Vanishing Gradients, Rectified Linear Units
Audience Upper-level undergraduate students in an AI or Machine Learning course.
Difficulty

The assignment doesn't require very many lines of code, but it does require students to develop a good understanding of the algorithm. I've given the assignment over a two week period.

Strengths

Reverse-mode automatic differentiation is a simple and elegant algorithm. Restricting the assignment to scalar-valued computations makes it possible to illustrate the central ideas while avoiding many of the complexities of existing frameworks like TensorFlow or PyTorch.

The assignment involves both completing the low-level automatic differentiation engine and working with machine learning code built on top of that engine.

Weaknesses

The starter code for the assignment includes a fairly complicated Python class hierarchy that makes use of features of the Python language that some students may be unfamiliar with. Students will need to spend some effort to understand the provided structure before they can make progress on implementing the algorithm.

Since all of the computation is performed directly in Python, without making use of vectorized numpy operations or GPU acceleration, the finished library is too slow to apply to large-scale machine learning problems.

The assignment is likely to be a struggle for students without a good foundation in calculus. Students with only one or two semesters of calculus will need to be introduced to the idea of partial derivatives.

Debugging the code can be challenging because it isn't easy to visualize the computation as it proceeds. The dot files provide a mechanism for visualizing the computation graph, but they are a bit clunky to work with.

To my knowledge this topic is not covered in any standard textbooks, and the materials available online are not targeted to an undergraduate audience. The main mechanism that I use to teach the algorithm is a YouTube video where I work through an extended paper/pencil example:

https://youtu.be/EEbnprb_YTU

The video description includes links to some additional tutorials and resources.

Dependencies

Students need some background in calculus. This assignment is designed to take place after students have gone through a more traditional introduction to the basic ideas of machine learning and neural networks. The current assignment is implemented in Python3. It does not have any operating system dependencies.

Variants

There are many ways that this assignment could be extended. In its current incarnation the assignment involves three stages: First, the students complete the automatic differentiation engine. Second, they add an additional operator to the engine (Relu) and update the machine learning library to make use of it. Third, they conduct some experiments using the machine learning library and report on their results. Each of those stages could be expanded in many ways: Students could add additional options or features to the ScalarFlow library. Students could build additional machine learning algorithms on top of the library. Students could run additional learning experiments.

Project Details

Follow this link for the project description, starter code etc: scalarflow.shtml.