Training Artificial Neural Networks to Beat StarCraft II

Summary Students create an artificial neural network and use reinforcement learning to implement an automated agent in the StarCraft II game. This project uses the pysc2 Application Programming Interface (API) by DeepMind and the Keras Deep Learning API to train an agent to win at StarCraft II. StarCraft II is freely available for download from the game maker Blizzard, and the pysc2 and Keras APIs are open-source and freely available.
The assignment is intended to be a final team project for an undergraduate AI course. We have divided the assignment into two checkpoints and a final turn-in. The first checkpoint familiarizes students with StarCraft II and the API by implementing scripted actions in the pysc2 API without using AI techniques. The second checkpoint requires students to implement the artificial neural network and apply its predicted actions to the game. The final turn-in requires students to tune their artificial neural network and report on their results. Instructors can use these checkpoints to identify struggling students and, at the end of each checkpoint, can provide solution code so that those students can move on to the next part of the project.

Topics Applied artificial neural networks and reinforcement learning

Audience This project is targeted to undergraduate students in an Artificial Intelligence (AI) survey course, for which the only prerequisites are our CS0- and CS1-level courses. At our institution, approximately 80% of the students are seniors and 20% are juniors, majoring in Computer Science, Cyber Science, Data Science, Operations Research, or Applied Math. Python is introduced in the CS0 course, and the Data Science majors have taken additional courses that use Python.

Difficulty Suitable for an upper-level undergraduate or graduate course with a focus on AI programming. We require all students to take a CS0-level Python programming course and a CS1-level course in C or MATLAB. The first milestones of the assignment walk students with basic Python skills through tutorials on the pysc2 API.

Strengths Using StarCraft II turns the process of configuring and training an artificial neural network into an engaging, game-like task. Visualizing the results (in the form of an agent playing StarCraft II) prevents students from developing tunnel vision, that is, focusing only on improving the accuracy of the artificial neural network without looking at its predictions. Data from the API is not a standard dataset that can be loaded directly into a Keras model. This forces students to format a unique dataset before feeding the data to the Keras artificial neural network.
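To make the dataset-formatting step concrete, here is a minimal sketch of how per-step game data might be turned into fixed-length inputs and one-hot labels. The feature names, scaling constants, and action list are assumptions made for illustration; they are not the actual pysc2 observation keys or the assignment's solution code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature names and scaling constants, chosen for illustration;
# they are not the actual pysc2 observation keys.
FEATURE_SCALE: Dict[str, float] = {
    "minerals": 1000.0,
    "supply_used": 200.0,
    "supply_cap": 200.0,
    "army_count": 100.0,
}

# Hypothetical discrete action set for a Terran agent.
ACTIONS: Tuple[str, ...] = ("no_op", "build_supply_depot", "train_marine", "attack")


def encode_state(snapshot: Dict[str, float]) -> List[float]:
    """Map one game snapshot to a fixed-length, roughly unit-scaled vector."""
    # Missing features default to 0.0 so every vector has the same length.
    return [snapshot.get(name, 0.0) / scale for name, scale in FEATURE_SCALE.items()]


def one_hot(action: str) -> List[float]:
    """Encode the chosen action as a one-hot training label."""
    return [1.0 if candidate == action else 0.0 for candidate in ACTIONS]


def build_dataset(
    episode: Sequence[Tuple[Dict[str, float], str]]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split (snapshot, action) pairs into parallel input and label lists."""
    inputs = [encode_state(snapshot) for snapshot, _ in episode]
    labels = [one_hot(action) for _, action in episode]
    return inputs, labels
```

The two returned lists could then be converted to arrays and passed to a Keras model's `fit` method; which features and actions to include is a design decision left to the students.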

Weaknesses Code is currently only available in Python, since it relies on pysc2. There is a learning curve to pysc2, although we provide some scaffolding via an introductory, exploratory assignment. Our install instructions in Checkpoint 1 are currently written only for Visual Studio on Windows.

Dependencies The prerequisites for this assignment include programming (in Python), data structures, and probability and statistics. This assignment is designed for students who have taken at least a CS0 and CS1 programming class. StarCraft II requires approximately 30 GB of disk space and a video card on student computers (official dependencies).

Variants Several additional assignments could be built to expand on this initial assignment. Romo and Jain applied Q-learning using pysc2; however, they used a custom StarCraft II map that does not align with the game. Brown also built a pysc2 agent that uses Q-learning to beat an adversary. Assigning students to build a Q-learning agent may not work, since the complete solution is freely available, but building a Deep Q-learning agent (where a neural network learns the Q-values of each state) could be an interesting variation. Building a competitive framework, where each student team submits its agent model to the instructor and the instructor then stages a competition between the teams, could motivate some students. Finally, DeepMind's champion model used a Convolutional Neural Network (CNN) to decide which points on the map to send attacking units to, instead of statically defining these points (see Vinyals et al., joint work by the PySC2 and Blizzard teams). Adding the CNN would increase the students' knowledge of a deep learning implementation. While this assignment isn't an implementation of Q-learning, it could be extended to use a more effective reward function. pysc2 exposes a game score that is updated after every step, and feedback that an action added 50 points is more immediate and more powerful than eventual feedback about the final game state (win, loss, or tie). The assignment could also be extended to allow more general game agents. For simplicity, the StarCraft II player agents are currently all of the Terran race. If students abstracted the race away, they could test the generalizability of the learned policies: does a Terran policy work for a Zerg agent?
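The score-based reward and Q-learning variants above can be sketched together in a few lines. This is only an illustration of the idea, not part of the assignment's solution code: the function names, state labels, action list, and the learning-rate and discount values are all assumptions, and the table-based update shown is the standard tabular Q-learning rule rather than the Deep Q-learning variant.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

# Q-table keyed by (state, action); unseen pairs start at 0.0.
QTable = DefaultDict[Tuple[Hashable, str], float]


def score_delta_reward(prev_score: float, curr_score: float) -> float:
    """Reward an action by how much the game score changed during the step."""
    return curr_score - prev_score


def q_update(
    q: QTable,
    state: Hashable,
    action: str,
    reward: float,
    next_state: Hashable,
    actions: Sequence[str],
    alpha: float = 0.1,   # learning rate (illustrative value)
    gamma: float = 0.9,   # discount factor (illustrative value)
) -> float:
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: an action raised the score from 100 to 150.
ACTIONS = ("no_op", "train_marine", "attack")
q_table: QTable = defaultdict(float)
step_reward = score_delta_reward(100.0, 150.0)
new_value = q_update(q_table, "s0", "train_marine", step_reward, "s1", ACTIONS)
```

With a per-step reward like this, the agent receives a learning signal after every action instead of only at the end of the game, which is the advantage the paragraph above describes.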

Assignment description: We assign this programming project as the final course project for our undergraduate Artificial Intelligence course. The project is divided into three turn-ins so that we can identify students struggling with the early stages of the project and help them complete the rest of it. The following are the assignment descriptions for each turn-in:

Checkpoint 1 - Starcraft II Warm-up (.pdf)

Checkpoint 1 - Starting code (.zip)

Checkpoint 2 - Building and Training a Neural Network (.pdf)

Checkpoint 3 - Tuning Hyperparameters and Reporting Performance (.pdf)

Solution Code

StarCraft II logo and screenshots utilized under Fair Use Exception granted by Blizzard Entertainment. StarCraft® II: Wings of Liberty® ©2010 Blizzard Entertainment, Inc. All rights reserved. Wings of Liberty is a trademark, and StarCraft and Blizzard Entertainment are trademarks or registered trademarks of Blizzard Entertainment, Inc. in the U.S. and/or other countries.
