A Genetic Algorithm for Robby the Robot

In this assignment, you will implement a genetic algorithm to evolve control strategies for Robby the Robot, as described in Chapter 9 of Complexity: A Guided Tour by Melanie Mitchell, which was handed out during class. A control strategy will be represented as a string of characters that code for the following robot actions:

Your GA will maintain a population of genomes representing control strategies. Each genome will be a 243-character string specifying the action that Robby should take for every possible situation he might find himself in. Robby's "situation" will be encoded as a 5-character percept string specifying what Robby currently sees in his immediate vicinity. For example, the string 'WECWC' means there is a wall to the north, an empty grid cell to the south, a soda can to the east, another wall to the west, and a soda can in Robby's own grid cell. Each percept string corresponds to a unique code number in the range 0-242, which indicates the (0-based) index number of the situation, as illustrated in Table 9-1 on page 132 of the handout. The job of your GA is to evolve a good control strategy that maximizes Robby's average cumulative reward as he collects empty soda cans for 200 time steps.

Robby's World

To use the simulator, download the file robby.zip and unzip it. This will create a folder named robby. Put this folder in the same location as your Python program for this assignment. IMPORTANT: do not put your Python file or any other files inside the folder; just make sure the folder is in the same location as your program file. You should not modify any of the code in this folder. Then put the following lines at the top of your program file, which will create a 10 × 10 grid world named rw when your file is loaded into Python:

import robby
rw = robby.World(10, 10)

To interact with the world, you can use the following commands:

For example, you can watch strategy M in action by typing the command rw.demo(rw.strategyM) at the Python prompt.

GA Implementation Details

I would recommend using the following parameters for your GA:

The fitness value of a control strategy should be the average total reward accumulated during a cleaning session when following the strategy, averaged over 100 sessions. For this assignment, you should use rank selection when selecting genomes for crossover and mutation. Many control strategies can have negative fitness values, which would cause problems with fitness-proportionate selection. To implement rank selection, first calculate the fitness of each genome in the population. Then sort the genomes by their fitness values (whether positive or negative), in increasing order from lowest to highest fitness. The selection weight of a genome will be its rank number from 1 (lowest) to 200 (highest), instead of the fitness value itself. Here is one way in Python to sort a list of genomes based on fitness. Assuming that a function called fitness is defined for genomes, the function sortByFitness shown below takes a list of genomes (strategy strings) and returns (1) a new list of the genomes sorted into ascending order by fitness value, and (2) a list of the corresponding fitness values:

def sortByFitness(genomes):
    tuples = [(fitness(g), g) for g in genomes]
    tuples.sort()
    sortedFitnessValues = [f for (f, g) in tuples]
    sortedGenomes = [g for (f, g) in tuples]
    return sortedGenomes, sortedFitnessValues

Experiments

Run your GA for a total of 500 generations. Every 10 generations, your GA should record the best strategy from the current population, along with the strategy's fitness value, in an output file called GAoutput.txt. More specifically, each line of the file should contain four items separated by whitespace, in the following order: (1) the generation number, (2) the average fitness of the whole population for this generation, (3) the fitness value of the best strategy of this generation, and (4) the best strategy itself. You may also want to have your GA periodically demo the best strategy found so far (every 10 or 20 generations, for example). That way, as the evolution proceeds, you can watch Robby's progress as he learns to clean up his environment.

You should also experiment with different GA settings (population size, number of generations, crossover and mutation rates, etc.) to see how quickly your GA can discover a really good strategy. Save the best strategy your GA managed to find in a text file called bestStrategy.txt, and write a 1-2 page summary of your experiments, including the GA parameter settings that produced the best strategy.

What to Turn In

Submit your Python program file, your GAoutput.txt data file, your bestStrategy.txt file, and your written summary (in PDF or Word format). Also please turn in a printout of your written summary during class. If you have questions about anything, don't hesitate to ask!