Summary |
In this assignment, students get first-hand experience with gender bias in word embeddings. They are asked to evaluate and quantify the presence of gender bias in a pre-trained word embedding, debias the embedding in post-processing, and determine the extent to which the bias is still present after debiasing. Students are also asked to experiment with existing large language models (LLMs) to evaluate whether their answers exhibit bias. The assignment is modelled after the 2019 paper "Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them" by Hila Gonen and Yoav Goldberg. Our thanks go to Dr. Vered Shwartz for sharing the initial idea that led to the creation of this assignment. |
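To make the measure / debias / re-check loop concrete, here is a minimal sketch. It assumes hypothetical 3-d toy vectors in place of a real pre-trained embedding, and it approximates the gender direction by the single difference he - she. The "neutralize" step follows the idea of the hard-debias method of Bolukbasi et al. (2016), which actually builds the direction with PCA over several definitional pairs. The helper names `unit`, `bias`, `neutralize` and `nearest` are illustrative and not taken from the assignment's code. The sketch shows that debiased words have zero projection on the direction, yet their nearest neighbours are still grouped by the original bias, which is the effect the paper's title refers to.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

# Hypothetical toy vectors (not from any real embedding). Axis 0 acts as the
# explicit gender direction; axis 1 is a feature correlated with it.
emb = {
    "he":           np.array([ 1.0,  0.0, 0.0]),
    "she":          np.array([-1.0,  0.0, 0.0]),
    "nurse":        np.array([-0.5, -0.8, 0.3]),
    "receptionist": np.array([-0.4, -0.7, 0.5]),
    "engineer":     np.array([ 0.5,  0.8, 0.3]),
    "carpenter":    np.array([ 0.4,  0.7, 0.4]),
}

def bias(word_vec, g):
    """Cosine with the unit gender direction g (sign shows which side the word leans to)."""
    return float(np.dot(unit(word_vec), g))

def neutralize(word_vec, g):
    """Remove the component along g, then renormalize (hard-debias style neutralize)."""
    return unit(word_vec - np.dot(word_vec, g) * g)

def nearest(word, vectors):
    """Most cosine-similar other word; assumes all vectors are unit length."""
    return max((o for o in vectors if o != word),
               key=lambda o: float(np.dot(vectors[word], vectors[o])))

# Simplification: one definitional pair instead of PCA over many pairs.
g = unit(emb["he"] - emb["she"])
occupations = ["nurse", "receptionist", "engineer", "carpenter"]
debiased = {w: neutralize(emb[w], g) for w in occupations}

for w in occupations:
    print(f"{w:13s} before={bias(emb[w], g):+.2f} "
          f"after={bias(debiased[w], g):+.2f} "
          f"nearest_after_debiasing={nearest(w, debiased)}")
```

With real embeddings the same check is usually done at scale (clustering or classifying many previously biased words) rather than on a handful of nearest neighbours.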
Topics |
Natural language processing (NLP), word embeddings, AI bias. |
Audience |
Undergraduate course in machine learning or NLP. |
Difficulty |
Medium: while prior knowledge of word embeddings is not necessary, students may need some guidance to understand how the bias is measured and corrected, as well as the role of visualization in evaluating bias. |
Strengths |
The assignment is accessible to all students with an understanding of basic machine learning concepts (training, classifying, and clustering), while existing knowledge of NLP and word embeddings is not required. The amount of coding required is minimal (the assignment focuses more on the evaluation of results), making the assignment accessible to a wider audience. The assignment also starts with a review of basic vector arithmetic, to prepare the students for the following steps. |
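The vector-arithmetic review itself is not reproduced here. As a hedged illustration of the kind of operations involved, the sketch below computes cosine similarity and solves an analogy "a is to b as c is to ?" with b - a + c. The 2-d vectors are hypothetical, chosen so that the arithmetic is exact, and the function names are illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 2-d toy vectors chosen so that the analogy works exactly.
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([0.0, 2.0]),
}

def analogy(a, b, c, emb):
    """Solve 'a is to b as c is to ?' via b - a + c, excluding the query words."""
    target = emb[b] - emb[a] + emb[c]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman", emb))  # prints "queen"
```

Here king - man + woman equals [3, 1], which is exactly the toy vector for "queen". Real embeddings rarely give an exact match, which is why the closest remaining word by cosine is returned.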
Weaknesses |
The assignment includes some open-ended questions, some of which can reasonably be answered in more than one way. We provide a rubric to help with grading, and encourage instructors to focus on students' reasoning over the correctness of their answers. |
Dependencies |
Fundamentals of Python are required to understand and complete the provided code. Students need access to an environment that can run Jupyter Notebooks, with the numpy and matplotlib packages. Other packages must also be installed; instructions for doing so are included in the assignment. Students also need an understanding of basic machine learning concepts (training, classifying, and clustering). |
Variants |
For more advanced students, instructors could remove larger portions of the code and let students complete it themselves (e.g., calculating the words' projections). |
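For instructors preparing such a variant, a projection step might look roughly like the sketch below, which they could then blank out for students. It assumes the embedding is available as a numpy matrix with one row per word and uses he - she as the direction. The toy vectors and the names `projections` and `most_biased` are hypothetical, and the assignment's own code may be organized differently.

```python
import numpy as np

def projections(words, matrix, direction):
    """Scalar projection of each unit-normalized row onto the unit direction (i.e. their cosine)."""
    d = direction / np.linalg.norm(direction)
    rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return dict(zip(words, (rows @ d).tolist()))

def most_biased(scores, k):
    """Return the k most negative (she-leaning) and k most positive (he-leaning) words."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy embedding: he/she define the direction, other words are scored.
he, she = np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])
words = ["nurse", "receptionist", "table", "carpenter", "engineer"]
matrix = np.array([
    [-0.5, -0.8, 0.3],
    [-0.4, -0.7, 0.5],
    [ 0.0,  0.1, 1.0],
    [ 0.4,  0.7, 0.4],
    [ 0.5,  0.8, 0.3],
])

scores = projections(words, matrix, he - she)
she_side, he_side = most_biased(scores, k=2)
print("she-leaning:", she_side)
print("he-leaning:", he_side)
```

Computing all projections with a single matrix-vector product, rather than a loop over words, keeps the step fast on a full vocabulary. The definitional words themselves (he, she) are left out of the scored list so that they do not dominate the ranking.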