In 2000, Hart and Mas-Colell introduced the important game-theoretic
algorithm of regret matching. Players reach equilibrium play by tracking
regrets for past plays, making future plays proportional to positive cumulative
regrets (i.e. how much they wished they had made the moves on average). The
technique is not only simple and intuitive; it inspired the counterfactual
regret minimization algorithm that has since formed the basis for bots
dominating annual computer

poker competitions.

Since these algorithms are relatively recent, there are few curricular materials available to introduce students, researchers, and practitioners to regret-based algorithms. The tutorial PDF, suggested exercises, and sample code offered below represent a modest first step towards making such recent innovations more accessible.

Summary | An Introduction to Counterfactual Regret Minimization - Learn the game-theoretic regret-based techniques that spurred a revolution in competitive computer poker. |

Topics | Game theory, regret, regret matching, counterfactual regret minimization (CFR), fixed strategy iteration counterfactual regret minimization (FSICFR), strategy purification, epsilon errors. |

Audience | Advanced Artificial Intellegence students |

Difficulty | Suitable for an upper-level undergraduate or graduate course with a focus on game theory, game AI, or other related topics. A few weeks would be needed to complete all exercises. |

Strengths | Students acquire a deep experiential knowledge of game-theoretic regret-based algorithms that leaves one close to the frontier of research in 2013. For undergraduates, this would open doors to interesting undergraduate research projects suggested within. For graduate students and researchers, these materials provide a fast means of grasping an important, emerging area of game theory. |

Weaknesses | The presentation follows Knuth's literate programming style, and thus requires knowledge of the Java programming language. |

Dependencies | Students must be mathematically literate and able to comprehend complex code and pseudocode. Students must also have advanced programming skills and be familiar with Java syntax and semantics. |

Variants | Given that there are many simple and complex poker variants and bluffing dice games that have yet to receive research attention (see "Further Challenges" section), there are many possible follow-on projects and parallel examples that might be presented by the instructor. |

Using Java code examples in Donald Knuth's literate programming style, we will present example applications of the regret matching algorithm for normal form "one-shot" games, counterfactual regret minimization (CFR) for extensive form games, and fixed-strategy iteration counterfactual regret minimization (FSICFR). We also briefly touch on strategy cleaning and how one might take maximal advantage of opponent error. Throughout, the reader first sees an example application and is then invited to deepen understanding through application to additional challenge problems.

The object of this project is
to give the student a deep, experiential understanding of **
game-theoretic regret-based algorithms** through explanation,
implementation examples, and implementation exercises

- Regret-Matching
__:__ - Explain basic definitions of game theory and the concept of regret.
- Present an outline of the regret-matching algorithm, along with a worked Java application example for the game of Rock-Paper-Scissors.
- Provide programming exercises for knowledge transfer.
- Counterfactual Regret Minimization (CFR):
- Explain the concept of counterfactual regret and the counterfactual regret minimization algorithm.

- Demonstrate its application to the simple game of Kuhn Poker.
- Specify a CFR application to the bluffing dice game of Dudo (restricted to 1-die-versus-1-die).

- Fixed-Strategy Iteration Counterfactual Regret Minimization (FSICFR):
- Explain the applicability of FSICFR to game graphs along with the reduction of CFR computational complexity from exponential to linear.
- Present FSICFR pseudocode and demonstrate its application in the context of our simple bluffing dice game, Liar Die.
- Present a challenge problem where one applies FSICFR to 1-die-versus-1-die Dudo with a claim memory of 3.
- Additional Concepts:
- Describe means of "cleaning" approximately optimal play policies in order to simplify and, in many cases, improve play.
- Describe an open research problem concerning how one might take maximal advantage of opponent error.

Students must be mathematically literate and able to comprehend complex code and pseudocode. Students must also have advanced programming skills and be familiar with Java syntax and semantics. Students who program in C++ should be able to follow most of the Java examples.

While not required, students with no prior exposure to game theory may wish to pair these materials with a brief primer on game theory online or in print (e.g. Leyton-Brown and Shoham's "Essentials of Game Theory", Morgan & Claypool, 2008).

Students are encouraged to play the games described in the materials to have a better appreciation of the decisions and a greater likelihood of discerning erroneous policy output.

The projects are described in the
file ** cfr.pdf**. You will need the
free Adobe
Acrobat Reader to view this file.

Example code described in each project is available here:

This code was generated from the same source file that generated cfr.pdf using the literate programming tool noweb.

Students seeking more extended challenges will also find descriptions of related advanced projects in the last section of the document.

Here is a sample syllabus used in (course omitted for review)

Readings:

4/3: Essentials of Game Theory, Ch. 1, 2, Definition 3.2.1, Section 3.5;
Handout: "Introduction to Counterfactual Regret Minimization", Sections 1-2;
[Optional: S. Hart, A. Mas-Colell. A
Simple Adaptive Procedure Leading to Correlated Equilibrium. Sections 2-3;
H. Kuhn. Simplified Two-Person Poker.]

4/5: Handout: "Introduction to
Counterfactual Regret Minimization", Section 3; T. Neller, S. Hnath. Approximating
Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret
Minimization, through Section 2 [Optional: M. Zinkevich et al. Regret
Minimization in Games with Incomplete Information.]

4/10: Handout:
"Introduction to Counterfactual Regret Minimization", Section 4; T. Neller, S.
Hnath. Approximating
Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret
Minimization, remaining sections.

4/12: Handout "Introduction to
Counterfactual Regret Minimization", Section 5

4/17: Handout "Introduction to
Counterfactual Regret Minimization", Section 6

Assignments:

1. Regret - Rock-Paper-Scissors (Thursday 4/5): Read Section 1 of CFR
handout in order to learn about game theoretic definitions, regret, regret
matching and minimization.
Using the Rock-Paper-Scissors worked example of one-sided regret-matching, complete
the two-sided regret-matching RPS Equilibrium, exercise 2.5.

2. Counter Factual Regret Minimization (CFR) - 1-die-versus-1-die Dudo
(Thursday 4/12): Read Section 2 of CFR handout in order to learn about CFR.
Using the Kuhn Poker worked example as a model, complete the 1-die-versus-1-die
Dudo exercise of subsection 2.5

3. Error modeling (Thursday 4/19): - Read
sections 3-5 of CFR handout in order to learn about cleaning results, FSICFR,
and epsilon error. Modify the Liar Die FSICFR worked example of subsection
4.3 so as to have clean results that maximally exploit potential opponent
errors.

- Game Theory Wikipedia Page
- GameTheory.net
- References:

- James Ernest, Phil Foglio, and Mike Selinker. Dealer's Choice: the complete handbook of Saturday night poker. Overlook Duckworth, Peter Mayer Publishers, Inc., New York, 2005.
- Sam Ganzfried, Tuomas Sandholm, and Kevin Waugh. Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
- Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127-1150, September 2000.
- Gil Jacobs. The World's Best Dice Games, new edition. John N. Hansen Co., Inc., Milbrae, California, 1993.
- Reiner Knizia. Dice Games Properly Explained. Elliot Right-Way Books, Brighton Road, Lower Kingswood, Tadworth, Surrey, KT20 6TD U.K., 1999.
- Harold W. Kuhn. Simplified two-person poker. In Harold W. Kuhn and Albert W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97-103. Princeton University Press, 1950.
- Marc Lanctot. Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games. PhD thesis, University of Alberta, University of Alberta, Computing Science, 116 St. and 85 Ave., Edmonton, Alberta T6G 2R3, January 2013.
- Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte carlo sampling for regret minimization in extensive games. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1078-1086. MIT Press, 2009.
- Kevin Layton-Brown and Yoav Shoham. Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Morgan and Claypool Publishers, 2008.
- Peter Bro Miltersen and Troels Bjerre Sørensen. Computing a quasi-perfect equilibrium of a two-player game. Economic Theory, 42(1):175-192, 2010.
- Merilyn Simonds Mohr. The New Games Treasury. Houghton Mifflin, Boston, Massachusetts, 1993.
- Todd W. Neller and Steven Hnath. Approximating Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret Minimization, in van den Herik, H. Jaap and Plaat, Aske, eds., LNCS 7168: Advances in Computer Games, 13th International Conference, ACG 2011, Tilburg, The Netherlands, November 20-22, 2011, Revised Selected Papers, Springer, 2012, pp. 169-182.
- Qi Zeng, Bruce R. Davis, and Derek Abbott. Reverse auction: The lowest unique positive integer game. Fluctuation and Noise Letters, 7(4):L439-L447, 2007.
- Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1729{1736. MIT Press, Cambridge, MA, 2008.

This work is partially funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project Go4Nature, grant number 612.000.938.

Todd Neller, Marc Lanctot