Summary | Recreating Tesauro's TD-Gammon: implementing TD(lambda) in a neural-network backgammon player. |
Topics | Game tree search, representing state evaluation functions using feedforward neural networks, backpropagation, reinforcement learning, temporal difference learning |
Audience | Junior/senior computer science undergraduates |
Difficulty | I usually allocate two weeks for this assignment; it is the last assignment in my junior/senior AI class. It is difficult but doable within the time frame. Starting early is crucial because the learner's training time is high: one may have to run the code for several hours between debugging runs. |
Strengths | Students come away with a true understanding of TD(lambda) and neural networks. Students also get a real sense of accomplishment when their reinforcement learner beats them at backgammon. |
Weaknesses | Weaker students can get overwhelmed because debugging the code is difficult and there are many moving parts and parameters. With help from the teacher and teaching assistants, however, most complete the assignment satisfactorily. |
Dependencies | Topic knowledge: game trees, neural networks, reinforcement learning, temporal difference learning, rules of backgammon. Language: Java. Computing requirements: any modern laptop. |
Variants | A different game would be an interesting extension; a simpler variant would be to implement a more recent temporal difference learning algorithm than the one used by Tesauro. |
Downloads:
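To give a flavor of the core update students must implement, here is a minimal sketch of TD(lambda) with eligibility traces in Java. It is not the assignment's solution code: it uses a single sigmoid unit over a hand-made feature vector rather than the full feedforward network with backpropagation, and all class, method, and parameter names (`TdLambdaSketch`, `alpha`, `lambda`, `gamma`) are illustrative choices, not part of the assignment materials.

```java
import java.util.Arrays;

/**
 * Minimal TD(lambda) sketch (hypothetical example, not the assignment solution):
 * a single sigmoid unit over a feature vector, updated with eligibility traces.
 * In the full assignment this unit would be replaced by a feedforward network
 * whose gradients come from backpropagation.
 */
public class TdLambdaSketch {
    final double[] w;   // weights of the evaluation function
    final double[] e;   // eligibility traces, one per weight
    final double alpha, lambda, gamma;

    TdLambdaSketch(int nFeatures, double alpha, double lambda, double gamma) {
        this.w = new double[nFeatures];
        this.e = new double[nFeatures];
        this.alpha = alpha;
        this.lambda = lambda;
        this.gamma = gamma;
    }

    /** Estimated value of a state, squashed into (0, 1) by a sigmoid. */
    double value(double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-s));
    }

    /** One TD(lambda) step for the transition x -> xNext with reward r. */
    void update(double[] x, double[] xNext, double r, boolean terminal) {
        double v = value(x);
        double vNext = terminal ? 0.0 : value(xNext);
        double delta = r + gamma * vNext - v;   // TD error
        double grad = v * (1.0 - v);            // sigmoid derivative
        for (int i = 0; i < w.length; i++) {
            e[i] = gamma * lambda * e[i] + grad * x[i]; // decay and accumulate traces
            w[i] += alpha * delta * e[i];               // move weights along traces
        }
    }

    /** Clear traces between self-play episodes. */
    void resetTraces() { Arrays.fill(e, 0.0); }

    public static void main(String[] args) {
        // Toy two-state episode: s0 -> s1 -> terminal reward of 1.
        TdLambdaSketch td = new TdLambdaSketch(2, 0.5, 0.7, 1.0);
        double[] s0 = {1, 0}, s1 = {0, 1};
        double before = td.value(s0);
        for (int episode = 0; episode < 200; episode++) {
            td.resetTraces();
            td.update(s0, s1, 0.0, false);
            td.update(s1, null, 1.0, true); // next state unused at terminal step
        }
        // The start-state value should have risen toward the terminal reward.
        System.out.println(td.value(s0) > before);
    }
}
```

With lambda = 0 this reduces to one-step TD(0); with lambda = 1 it approaches a Monte Carlo update, which is one of the parameter trade-offs students explore during training.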