Gesture Recognition using Convolutional Neural Networks

Lisa Zhang (University of Toronto, `lczhang@cs.toronto.edu`)
Bibin Sebastian (University of Toronto, `bibin.sebastian@mail.utoronto.ca`)

Overview

In this assignment, students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. While doing so, students experience the entire machine learning workflow, and learn best practices for debugging neural networks.

Students begin by collecting and cleaning their own photos demonstrating ASL gestures. The teaching team then pools together data collected by the entire class and provides it to students. In the meantime, students build a CNN and check that it is implemented correctly by having a simple model "overfit" (memorize) a small dataset. Once the entire class's dataset is available, students train their CNN, tune hyperparameters, and report results. There is also a module where students apply transfer learning by using pre-trained AlexNet weights to obtain better performance.
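
The "overfit a small dataset" check can be illustrated without any framework. The following is a minimal sketch, not the assignment's code: it uses a toy logistic-regression model in plain Python instead of a CNN, and the function names (`train_logistic`, `training_accuracy`, `can_overfit`) are hypothetical. In the assignment, the same idea would wrap the student's PyTorch model and training loop.

```python
import math


def train_logistic(data, steps=500, lr=1.0):
    """Fit logistic regression to `data` with full-batch gradient descent.

    `data` is a list of (features, label) pairs, with label 0 or 1.
    Returns the learned (weights, bias).
    """
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def training_accuracy(weights, bias, data):
    """Fraction of training examples the model classifies correctly."""
    correct = 0
    for x, y in data:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(data)


def can_overfit(data, steps=500, lr=1.0):
    """Sanity check: can the model memorize this tiny training set?"""
    weights, bias = train_logistic(data, steps, lr)
    return training_accuracy(weights, bias, data) == 1.0
```

If a model cannot reach perfect training accuracy on a handful of examples, the likely cause is a bug in the model, the training loop, or the labels, rather than a poor choice of hyperparameters.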

This assignment leads naturally to a discussion about fairness in machine learning, since the training data excludes demographics not represented by students in the class.

It is also possible to run a competition on an unseen test set. Interested instructors can contact the author(s) for a secret test set not previously shared with students.

We used this assignment in a third-year introductory machine learning class. However, a neural networks course that covers convolutional neural networks can adopt this assignment as well.

Meta Information

Summary	Students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. While doing so, students learn how to collect and clean data, split data into training/validation/test sets, debug neural networks, tune hyperparameters, and use pre-trained weights.
Topics	Convolutional Neural Networks, Data Collection, Debugging, Hyperparameter Tuning, Transfer Learning, Machine Learning Fairness
Audience	Third- and fourth-year students in introductory machine learning or neural networks courses.
Difficulty	The assignment is of moderate difficulty, and depends on the student's comfort with programming. A good student reported spending 12 hours on model building, hyperparameter tuning, and transfer learning.
Strengths	This assignment gives students a sense of what it is like to work on a machine learning problem in practice. Students encounter many of the issues that they would face in a real-life scenario. For example: Students work with real, messy data that they collect, and face issues like mistakes made by another student that the teaching team did not catch. Students see that splitting data into training, validation, and test sets can be non-trivial. As in a realistic machine learning use case, random data splitting is not appropriate for this task. Students learn debugging techniques not often explicitly taught in machine learning courses. The assignment handout guides them to first make sure that their model can "overfit" to a small set of data before moving on to actual training. Students are encouraged to explore the different hyperparameters of their CNN. Students see the extent to which transfer learning can drastically improve model performance. The assignment leads naturally to a discussion about fairness in machine learning, and why their model could perform poorly for certain demographics.
Weaknesses	There is work involved in combining student photos, and in doing a first pass to filter out obviously malformed data. It took us around 6 hours to inspect and grade the photos collected by roughly 70 students. We inspected the images both visually and with the help of a Python script, to check for image resolution and general correctness. Since students collect their own data, this assignment will not work well for small classes. The assignment worked well for a class of ~70 students, and another class of ~40 students. For smaller class sizes (e.g. ~20), the instructor can place more emphasis on the transfer learning portion of the assignment, increase the number of photos collected per student (and maybe decrease the number of ASL letters used), or include a discussion on data augmentation.
Dependencies	Software: We used Google Colab with Python and PyTorch for this assignment. It is possible to modify the transfer learning portion of the assignment to use TensorFlow or a different library. Prior Material: No starter code is given in this assignment. Students should have other exposure to building neural network models, either through previous assignments, lecture material, or other resources. For sample Jupyter notebooks and resources, see the course website for a course that used this assignment.
Variants	The transfer learning portion can be included or omitted. Instructors can add a portion to the assignment that explores the fairness of the trained models and assesses model accuracy for various demographics. The assignment effort can be adjusted by increasing or reducing the number of ASL letters that need to be captured and recognized. It is possible to run a class competition to see whose model can achieve the best accuracy on an unseen test set. Interested instructors can contact the author(s) for a secret test set not previously shared with students.
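
For the fairness variant, one simple way to assess model accuracy for various demographics is to report accuracy separately for each group. The sketch below assumes that a group label is available for every test example; how such labels would be obtained is not specified here, and the function name `accuracy_by_group` is hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(predictions, labels, groups):
    """Return a dict mapping each group to the model's accuracy on that group.

    `predictions`, `labels`, and `groups` are parallel sequences: one predicted
    letter, one true letter, and one (assumed) group label per test example.
    """
    if not (len(predictions) == len(labels) == len(groups)):
        raise ValueError("predictions, labels, and groups must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {group: correct[group] / total[group] for group in total}
```

A large gap between groups is a starting point for discussing which demographics are missing from the class's training data.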

Materials

Assignment Handout (Markdown source, PDF).
Tutorial Slides to help students take good photos.

Lessons learned

The data collection instructions need to be worded very carefully. In one iteration of the course, some students opted to use photo-editing software to erase the background of their photos.
Some ASL alphabet posters show gestures at different angles than others. It is important to choose a good gesture poster and ask students to stick with the one given.
The correct way to split data into training, validation, and test sets is not obvious to students. It is worth having a discussion with students about different ways of splitting data, and why we need to ensure that the test set is as representative as possible of an unseen dataset.
Using a cloud environment like Google Colab helps avoid common installation issues and ensures equitable access to computational resources for students.
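
As a starting point for the discussion on data splitting, one alternative to a purely random split is to keep all photos from a given student in the same set, so that the test set only contains hands the model has never seen. This is an assumption about what a suitable split looks like, not the assignment's prescribed solution. The function name `split_by_student` and the default fractions are hypothetical.

```python
import random


def split_by_student(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (student_id, item) pairs so that no student spans two sets.

    Students, not individual photos, are shuffled and assigned to the
    train, validation, and test sets.
    """
    students = sorted({student_id for student_id, _ in samples})
    random.Random(seed).shuffle(students)

    # Reserve at least one student each for the test and validation sets.
    n_test = max(1, round(len(students) * test_frac))
    n_val = max(1, round(len(students) * val_frac))
    test_ids = set(students[:n_test])
    val_ids = set(students[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for student_id, item in samples:
        if student_id in test_ids:
            splits["test"].append((student_id, item))
        elif student_id in val_ids:
            splits["val"].append((student_id, item))
        else:
            splits["train"].append((student_id, item))
    return splits
```

With a random per-photo split, near-identical photos of the same hand can land in both the training and test sets, which tends to inflate test accuracy relative to performance on a new person.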