Gesture Recognition using Convolutional Neural Networks

Lisa Zhang (University of Toronto, lczhang@cs.toronto.edu)
Bibin Sebastian (University of Toronto, bibin.sebastian@mail.utoronto.ca)

Overview

In this assignment, students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. While doing so, students experience the entire machine learning workflow, and learn best practices for debugging neural networks.

Students begin by collecting and cleaning their own photos demonstrating ASL gestures. The teaching team then pools together data collected by the entire class and provides it to students. In the meantime, students build a CNN and check that it is implemented correctly by having a simple model "overfit", or memorize, a small dataset. Once the entire class's dataset is available, students train their CNN, tune hyperparameters, and report results. There is also a module where students apply transfer learning by using pre-trained AlexNet weights to obtain better performance.
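
The "overfit a small dataset" check can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the assignment's solution: `TinyCNN`, `can_memorize`, the 32x32 input size, the four classes, and all hyperparameters are hypothetical, and random tensors stand in for real photos.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical small CNN; layer sizes are illustrative only."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(8 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))


def can_memorize(model, images, labels, steps=200, lr=1e-3):
    """Train repeatedly on one tiny batch; return the final training accuracy."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        predictions = model(images).argmax(dim=1)
    return (predictions == labels).float().mean().item()


torch.manual_seed(0)
images = torch.randn(8, 3, 32, 32)  # stand-in for 8 small photos
labels = torch.randint(0, 4, (8,))  # stand-in for 4 gesture classes
accuracy = can_memorize(TinyCNN(), images, labels)
print(f"training accuracy on the tiny set: {accuracy:.2f}")
```

A training accuracy near 1.0 suggests that the forward pass, loss, and optimizer are wired together correctly. A model that cannot memorize a handful of examples more likely has a bug than a tuning problem.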

This assignment leads naturally to a discussion about fairness in machine learning, since the training data excludes demographics not represented by students in the class.

It is also possible to run a competition on an unseen test set. Interested instructors can contact the author(s) for a secret test set not previously shared with students.

We used this assignment in a third-year introductory machine learning class. However, a neural networks course that covers convolutional neural networks can adopt this assignment as well.

Meta Information

Summary

Students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. While doing so, students learn how to collect and clean data, split data into training/validation/test sets, debug neural networks, tune hyperparameters, and use pre-trained weights.

Topics

Convolutional Neural Networks, Data Collection, Debugging, Hyperparameter Tuning, Transfer Learning, Machine Learning Fairness

Audience

Third- and fourth-year students in introductory machine learning or neural networks courses.

Difficulty

The assignment is of moderate difficulty, though this depends on each student's comfort with programming. A good student reported spending 12 hours on model building, hyperparameter tuning, and transfer learning.

Strengths

This assignment gives students a sense of what it is like to work on a machine learning problem in practice. Students encounter many of the issues that they would face in a real-life scenario. For example:
  • Students work with real, messy data that they collect, and face issues like mistakes made by another student that the teaching team did not catch.
  • Students see that splitting data into training, validation, and test sets can be non-trivial. As in a realistic machine learning use case, random data splitting is not appropriate for this task.
  • Students learn debugging techniques not often explicitly taught in machine learning courses. The assignment handout guides them to first make sure that their model can "overfit" to a small set of data before moving on to actual training.
  • Students are encouraged to explore the different hyperparameters of their CNN.
  • Students see the extent to which transfer learning can drastically improve model performance.
  • The assignment leads naturally to a discussion about fairness in machine learning, and why their model could perform poorly for certain demographics.
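
One way to avoid a purely random split is to split by student, so that every photo from a given student lands in exactly one of the three sets. The sketch below assumes that photos from the same student are correlated (same hands, same backgrounds), which is our reading of why random splitting is inappropriate here. The function name `split_by_student`, the `(student_id, item)` record layout, the file names, and the split fractions are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_student(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (student_id, item) pairs so each student lands in exactly one set."""
    by_student = defaultdict(list)
    for student_id, item in samples:
        by_student[student_id].append(item)

    students = sorted(by_student)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(students)

    n_test = max(1, round(len(students) * test_frac))
    n_val = max(1, round(len(students) * val_frac))
    groups = {
        "test": students[:n_test],
        "val": students[n_test:n_test + n_val],
        "train": students[n_test + n_val:],
    }
    return {
        name: [(s, item) for s in ids for item in by_student[s]]
        for name, ids in groups.items()
    }


# Hypothetical data: 10 students, 3 photos each.
samples = [(f"student{i:02d}", f"student{i:02d}_{letter}.jpg")
           for i in range(10) for letter in "ABC"]
splits = split_by_student(samples)
for name, rows in splits.items():
    print(name, len(rows), sorted({s for s, _ in rows}))
```

Because whole students are assigned to a set, validation and test accuracy measure how well the model handles hands it has never seen, rather than near-duplicates of training photos.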

Weaknesses

There is work involved in combining student photos, and in doing a first pass to filter out obviously malformed data. It took us around 6 hours to inspect and grade the photos collected by roughly 70 students. We inspected the images both visually and with the help of a Python script that checked image resolution and general correctness.
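
A resolution check of this kind might look like the sketch below. It is not the script we used: it assumes the photos are PNG files and that 224x224 is the required size, both of which are hypothetical, and the names `png_size`, `check_image`, and `EXPECTED_SIZE` are made up for illustration. It reads the dimensions directly from the PNG header using only the standard library; an image library would be the more usual choice for other formats.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (224, 224)  # hypothetical required (width, height)


def png_size(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG, or None."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # Width and height are big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", header[16:24])


def check_image(path, expected=EXPECTED_SIZE):
    """Return a short problem description, or None if the file looks fine."""
    with open(path, "rb") as f:
        size = png_size(f.read(24))
    if size is None:
        return "not a readable PNG"
    if size != expected:
        return f"unexpected resolution {size[0]}x{size[1]}"
    return None
```

Running `check_image` over every submitted file and listing the non-`None` results gives a quick shortlist of photos to inspect by hand.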

Since students collect their own data, this assignment will not work well for small classes. The assignment worked well for a class of ~70 students, and another class of ~40 students. For smaller class sizes (e.g. ~20), the instructor can place more emphasis on the transfer learning portion of the assignment, increase the number of photos collected per student (and perhaps decrease the number of ASL letters used), or include a discussion on data augmentation.

Dependencies

Software: We used Google Colab with Python and PyTorch for this assignment. It is possible to modify the transfer learning portion of the assignment to use TensorFlow or a different library.

Prior Material: No starter code is given in this assignment. Students should have prior exposure to building neural network models, either through previous assignments, lecture material, or other resources. For sample Jupyter notebooks and resources, see the course website for a course that used this assignment.

Variants
  • The transfer learning portion can be included or omitted.
  • Instructors can add a portion to the assignment that explores the fairness of the trained models and assesses model accuracy for various demographics.
  • The assignment effort can be adjusted by increasing or reducing the number of ASL letters that need to be captured and recognized.
  • It is possible to run a class competition to see whose model can achieve the best accuracy on an unseen test set. Interested instructors can contact the author(s) for a secret test set not previously shared with students.
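
For the fairness variant, per-group accuracy can be computed with a few lines of Python. The sketch below is only an illustration: the function `accuracy_by_group`, the `(group, true_label, predicted_label)` record layout, and the group names are hypothetical, and the predictions are made up rather than taken from any real model.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}


# Made-up predictions for two hypothetical groups, for illustration only.
records = [
    ("group_a", "A", "A"), ("group_a", "B", "B"),
    ("group_a", "C", "C"), ("group_a", "D", "A"),
    ("group_b", "A", "B"), ("group_b", "B", "B"),
]
print(accuracy_by_group(records))  # {'group_a': 0.75, 'group_b': 0.5}
```

A large gap between groups is a useful starting point for discussing which demographics are missing from the class's training data.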

Materials

Lessons learned