Summary Recreating Tesauro's TD-Gammon: implementing TD(lambda) to train a neural-network backgammon player. A minimal sketch of the core update appears after these notes.
Topics Game tree search, representing state evaluation functions using feedforward neural networks, backpropagation, reinforcement learning, temporal difference learning
Audience Junior/senior computer science undergraduates
Difficulty I usually allocate two weeks for this assignment; it is the last assignment in my junior/senior AI class. It is difficult but doable within the time frame. Starting early is crucial because training the learner takes a long time -- students may need to run the code for several hours between debugging runs.
Strengths Students come away with a genuine understanding of TD-lambda and neural networks. They also get a real sense of accomplishment when their reinforcement learner beats them at backgammon.
Weaknesses Weaker students can get overwhelmed because debugging the code is difficult and there are many moving parts and parameters to tune. With help from the instructor and teaching assistants, however, most complete the assignment satisfactorily.
Dependencies Topic knowledge: game trees, neural networks, reinforcement learning, temporal difference learning, rules of backgammon. Language: Java. Computing requirements: any modern laptop.
Variants Applying the approach to a different game would be an interesting extension; a simpler variant is to implement a more recent temporal difference learning algorithm than the one Tesauro used.
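
The heart of the assignment is the TD(lambda) weight update applied to a feedforward evaluation network. The Java sketch below illustrates that idea under stated assumptions: a single hidden layer of sigmoid units, a sigmoid output interpreted as a win probability, hand-picked alpha/lambda/gamma values, and hypothetical names (TdLambdaNet, evaluate, startEpisode, update). It is not the handout's code or Tesauro's exact architecture, only a compact illustration of the technique.

import java.util.Arrays;
import java.util.Random;

/**
 * Minimal TD(lambda) sketch: a one-hidden-layer sigmoid network that
 * evaluates game states, with eligibility traces for every weight.
 * Names and constants are illustrative, not from the assignment handout.
 */
public class TdLambdaNet {
    private final int nIn, nHid;
    private final double[][] wIn;   // input -> hidden weights
    private final double[] wOut;    // hidden -> output weights
    private final double[][] eIn;   // eligibility traces for wIn
    private final double[] eOut;    // eligibility traces for wOut
    private final double alpha = 0.1, lambda = 0.7, gamma = 1.0;
    private double[] input;         // features of the last evaluated state
    private double[] hidden;        // cached hidden activations
    private double value;           // cached network output V(s)

    public TdLambdaNet(int nIn, int nHid, long seed) {
        this.nIn = nIn;
        this.nHid = nHid;
        Random rng = new Random(seed);
        wIn = new double[nHid][nIn];
        wOut = new double[nHid];
        eIn = new double[nHid][nIn];
        eOut = new double[nHid];
        for (int h = 0; h < nHid; h++) {
            wOut[h] = rng.nextDouble() * 0.2 - 0.1;
            for (int i = 0; i < nIn; i++) wIn[h][i] = rng.nextDouble() * 0.2 - 0.1;
        }
    }

    private static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** Forward pass: returns V(s) and caches activations for the gradient. */
    public double evaluate(double[] features) {
        input = features.clone();
        hidden = new double[nHid];
        double out = 0.0;
        for (int h = 0; h < nHid; h++) {
            double sum = 0.0;
            for (int i = 0; i < nIn; i++) sum += wIn[h][i] * input[i];
            hidden[h] = sigmoid(sum);
            out += wOut[h] * hidden[h];
        }
        value = sigmoid(out);
        return value;
    }

    /** Reset all eligibility traces at the start of each self-play game. */
    public void startEpisode() {
        for (double[] row : eIn) Arrays.fill(row, 0.0);
        Arrays.fill(eOut, 0.0);
    }

    /**
     * One TD(lambda) step, called after evaluate(s): decay the traces, add
     * the gradient of the cached prediction V(s), then move every weight by
     * alpha * delta * trace. The target is V(s') during play, or the final
     * reward (e.g. 1 for a win, 0 for a loss) at the end of the game.
     */
    public void update(double target) {
        double delta = gamma * target - value;      // TD error
        double dOut = value * (1 - value);          // sigmoid derivative at the output
        for (int h = 0; h < nHid; h++) {
            eOut[h] = lambda * eOut[h] + dOut * hidden[h];
            double dHid = dOut * wOut[h] * hidden[h] * (1 - hidden[h]);
            for (int i = 0; i < nIn; i++) {
                eIn[h][i] = lambda * eIn[h][i] + dHid * input[i];
                wIn[h][i] += alpha * delta * eIn[h][i];
            }
            wOut[h] += alpha * delta * eOut[h];
        }
    }
}

In a self-play training loop, a student's program would evaluate the afterstate of each legal move, play the move with the highest value, call update with the value of the resulting position after every move, and call update once more with the actual game outcome when the game ends.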

Downloads:

Javadoc documentation

Devika Subramanian