In this assignment, students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. Along the way, they work through the entire machine learning workflow and learn best practices for debugging neural networks.
Students begin by collecting and cleaning their own photos demonstrating ASL gestures. The teaching team then pools the data collected by the entire class and provides it to students. In the meantime, students build their CNN and first have a simple model "overfit" (memorize) a small dataset, a sanity check that the network and training code are correct. Once the entire class's dataset is available, students train their CNN, tune hyperparameters, and report results. A further module has students apply transfer learning, using pre-trained AlexNet weights to obtain better performance.
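The "overfit a small dataset" check described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the assignment's reference solution: the `SmallCNN` architecture, the 32x32 input size, the 9 gesture classes, the helper name `overfit_small_batch`, and the random tensors standing in for photos are all assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_CLASSES = 9   # assumption: number of ASL letters in the dataset
IMAGE_SIZE = 32   # assumption: small inputs so the check runs quickly on CPU


class SmallCNN(nn.Module):
    """Hypothetical two-layer CNN used only for this sanity check."""

    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two 2x2 poolings shrink each spatial dimension by a factor of 4.
        self.classifier = nn.Linear(16 * (IMAGE_SIZE // 4) ** 2, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def overfit_small_batch(model, images, labels, steps=300, lr=1e-2):
    """Train on one tiny batch; return the loss history and final accuracy."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    model.train()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    model.eval()
    with torch.no_grad():
        predictions = model(images).argmax(dim=1)
    accuracy = (predictions == labels).float().mean().item()
    return losses, accuracy


# Random tensors stand in for a handful of real photos (two per class).
images = torch.randn(18, 3, IMAGE_SIZE, IMAGE_SIZE)
labels = torch.arange(18) % NUM_CLASSES

losses, accuracy = overfit_small_batch(SmallCNN(), images, labels)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, "
      f"training accuracy {accuracy:.2f}")
```

If the training loss on such a tiny set does not fall and the training accuracy stays near chance, the likely cause is a bug in the model or training loop rather than a lack of data, which is why the check is worth running before the full dataset is available.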
This assignment leads naturally to a discussion about fairness in machine learning, since the training data excludes demographics not represented by students in the class.
It is also possible to run a competition on an unseen test set. Interested instructors can contact the author(s) for a secret test set not previously shared with students.
We used this assignment in a third-year introductory machine learning class. However, a neural networks course that covers convolutional neural networks can adopt this assignment as well.
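One workflow step students handle once the pooled dataset arrives is splitting it into training, validation, and test sets. The sketch below shows one possible way to do this, under the assumption that each photo is tagged with the student who took it and that all of a student's photos should land in the same split, so the model is evaluated on hands it has not seen. The function name `split_by_student`, the tuple layout, and the 15% fractions are illustrative choices, not part of the assignment.

```python
import random


def split_by_student(photos, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (student_id, path, label) tuples so no student spans two splits."""
    students = sorted({student for student, _, _ in photos})
    random.Random(seed).shuffle(students)

    # Reserve at least one student for each held-out split.
    n_test = max(1, round(len(students) * test_frac))
    n_val = max(1, round(len(students) * val_frac))
    test_ids = set(students[:n_test])
    val_ids = set(students[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for photo in photos:
        student = photo[0]
        if student in test_ids:
            splits["test"].append(photo)
        elif student in val_ids:
            splits["val"].append(photo)
        else:
            splits["train"].append(photo)
    return splits


# Hypothetical pooled dataset: 20 students, three letters each.
photos = [
    (f"student{s:02d}", f"student{s:02d}_{letter}.jpg", letter)
    for s in range(20)
    for letter in "ABC"
]
splits = split_by_student(photos)
print({name: len(part) for name, part in splits.items()})
```

Splitting by student rather than by photo is a design choice: a random per-photo split would place near-identical images of the same hand in both training and test data, which tends to inflate reported accuracy.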
Summary | Students build a Convolutional Neural Network (CNN) to recognize American Sign Language (ASL) hand gestures. While doing so, students learn how to collect and clean data, split data into training/validation/test sets, debug neural networks, tune hyperparameters, and use pre-trained weights. |
Topics | Convolutional Neural Networks, Data Collection, Debugging, Hyperparameter Tuning, Transfer Learning, Machine Learning Fairness |
Audience |
Third- and fourth-year students in introductory machine learning or neural networks courses. |
Difficulty | The assignment is of moderate difficulty, depending on students' comfort with programming. A good student reported spending 12 hours on model building, hyperparameter tuning, and transfer learning. |
Strengths |
This assignment gives students a sense of what it is like to work on a machine learning
problem in practice. Students encounter many of the issues that they would face in a real-life
scenario. For example:
|
Weaknesses |
Combining student photos and doing a first pass to filter out obviously malformed data takes work. It took us around 6 hours to inspect and grade the photos collected by roughly 70 students. We inspected the images both visually and with the help of a Python script that checked image resolution and general correctness. Since students collect their own data, this assignment will not work well for small classes. The assignment worked well for a class of ~70 students and for another class of ~40 students. For smaller class sizes (e.g. ~20), the instructor can place more emphasis on the transfer learning portion of the assignment, increase the number of photos collected per student (and perhaps decrease the number of ASL letters used), or include a discussion of data augmentation. |
Dependencies |
Software: We used Google Colab with Python and PyTorch for this assignment. It is possible to modify the transfer learning portion of the assignment to use TensorFlow or a different library.
Prior Material: No starter code is given in this assignment. Students should have prior exposure to building neural network models, whether through previous assignments, lecture material, or other resources. For sample Jupyter notebooks and resources, see the course website for a course that used this assignment. |
Variants |
|