Introduction to Genetic Algorithms
In this assignment, you will study the performance of a genetic
algorithm at solving three problems of increasing difficulty: the
pattern problem, the traveling salesman problem, and the nurse
scheduling problem. I have
provided the bulk of the code for you, but there are a few methods
remaining for you to implement.
About the code
This code demonstrates a Pythonic technique known as
"mixins". This should be familiar to you if you've used Lisp or
Scheme; Java's interface system is somewhat similar.
Mixins take advantage of Python's support for multiple inheritance
to "mix in" behavior from several unrelated classes. In our case,
we'll use it to deal with one of the common frustrations involving
GAs: all the different choices for implementing the algorithm.
We will mix in behavior from four different classes:
- GA. This is the class that performs the actual
algorithm. It has an __init__ method, and a run() method which
runs the GA.
- Selector. This is the class that governs selection. It has a
selectChromosomes method that takes a population as input and
returns two individuals to act as parents. It is subclassed by
TournamentSelector and RouletteSelector.
- Elitism. This is the class that governs elitism (retention of
individuals from one generation to the next.) It implements one
method: ApplyElitism, which take a population and an elitism rate
and returns a list of individuals to be kept for the next
generation. It is subclassed by DeterministicElitism,
TournamentElitism, and RouletteElitism.
- Crossover. This is the class that governs the generation of
new individuals. It implements one method: crossover, which takes
as input two parents and returns two children. It is subclassed by
OnePointStringCrossover, TwoPointStringCrossover,
PermutationCrossover, and GreedyCrossover.
We also need a class to contain the problem knowledge, which
includes a fitness function, a method to generate an initial
population, and a method to mutate chromosomes. I have provided
three of these, along with an abstract base class.
- BitStringGAProblem contains all of the information needed
to solve a "bitstring" style GA problem. Its constructor takes as
input a function to be used for fitness, and the length of each
string in the population.
- TSPGAProblem contains all of the information needed to
construct and solve TSP-style problems. Its constructor takes as
input a fitness function (I've provided one you might like called
computeTourCost) and the number of cities to work with.
- NurseSchedulingProblem contains the information needed to construct
and evaluate the nurse scheduling problem (see below). You will
need to implement a fitness function and the associated constraints.
This architecture may seem complicated, but it allows us to easily try
out different combinations of elitism, crossover and selection
mechanisms without editing, recompiling, or generating config
files. Instead, we create a mixin that inherits the specific classes
whose behavior we want to incorporate. For example, let's suppose we
want to solve the "all-ones" problem: find a bitstring with all ones
in it. This is easy, but will help us get going.
We start by making a fitness function:
>>> def allOnes(chr) :
return len([x for x in chr.bitstring if x == '1'])
This is not the only possible fitness function; it's just as easy one
to get us started. The key is that it should return larger values for
solutions that are closer to the goal. Try writing fitness functions
to match particular patterns, or compute parity.
Now, we create a BitStringProblem. Let's suppose we want each
individual to be of length 20.
import BitStringGAProblem
b = BitStringGAProblem.BitStringGAProblem(allOnes,20)
Lastly, we build a solver for this problem. Let's suppose we want to
use deterministic elitism, single-point crossover, and roulette
selection. We would do:
>>> class mySolver(GA.GA, Elitism.DeterministicElites, Crossover.OnePointStringCrossover, Selector.RouletteSelector):
... pass
This creates a class called mySolver that "mixes in" the behavior we
want. We then create an instance of this class, pass it the problem we
want to solve, and run it.
>>> m=mySolver(b)
>>> m.run()
(output omitted)
Run contains a number of optional arguments:
- popsize. How many individuals are in the population.
- elitismRate. The fraction of individuals carried over to the
next generation without crossover.
- mutationRate. The probability that a child will undergo
mutation.
- itersToRun. How long the GA should run.
I've left a few methods for you to implement:
TournamentSelection and GreedyCrossover. TournamentSelection is pretty
straightforward; GreedyCrossover is more challenging. (There are some
hints in Crossover.py). You might want to get comfortable running the
code first and start working on the next question, and then return to
this at the end.
The bitstring problem is a good place to get started, because it's
easy to see what the solution should be, but it's pretty boring. We
don't really need all this infrastructure just to write a program that
generates a string of all ones.
This assignment also contains two "real-world" problem: the nurse
scheduling problem and the traveling salesman problem. This will let
you experiment with using a genetic algorithm to solve a more complex problem.
The nurse scheduling problem can be stated as this: given a set of nurses n1 - nk, a
set of shifts s1-sj, and a set of constraints c1-cm, find an
assignment of nurses to shifts such that all constraints are
satisfied. (Or, if this is impossible, such that the constraints are
minimally violated.)
We can represent the nurse scheduling problem as a matrix, with nurses
on the rows and shifts on the columns. A one in a cell in the matrix
indicates that the nurse is scheduled for that shift, and a 0
indicates he/she is not. for example:
s1 s2 s3
n1 0 1 1
n2 1 0 0
n3 0 1 0
We can encode this as the bitstring 011100010.
NurseSchedulingProblem.py creates a NurseSchedulingProblem by
providing a set of contraints to a NurseFactory, which is
repsonsible for creating these objects. By creating different
constraints, you can specify different problems.
To begin, run the GA with the nursing problem on a simple example to
get the feel of it. I've provided you with two sample constraints:
oneNursePerShift, which says that each shift should have exactly one
nurse working it, and oneShiftEach, which says that each nurse should
work exactly one shift. (for there to be a solution, there must be the
same number of nurses as shifts).
We can build a nurse scheduling problem like so
constraintList = [oneShiftEach, oneNursePerShift]
nurseProblem = nurseFactory(constraintList)
b = BitStringGAProblem.BitStringGAProblem(nurseProblem,25)
This will create a nurseProblem with 5 nurses and 5 shifts, and attach
the two constraints. We can then run this as any other bitstring GA
problem.
You should add the following constraints:
- There must be at least one nurse, and at most three nurses, on
each shift.
- Each nurse should be scheduled for five shifts per week.
- No nurse can work more than three days in a row without a day
off.
- Nurses prefer consistency - they would like to always work the
same shift (days, evenings, or nights).
You should write constraint functions for each of these
constraints. (In other words, write four separate functions.) These
functions should return (0-number of constraints violated). For
example,
- a schedule with one nurse working 6 shifts, and one working 4,
and all others working 5, would return -2 when evaluated by the
fiveShiftsPerNurse constraint.
- A schedule with a nurse working 3 day shifts and 2 night shifts
would return -2 when evaluated by the consistency constraint.
You will want to test each of these constraints separately.
The traveling salesman problem requires a different approach. The
problem can be stated as follows: Given a set of cities c1-cn, and a
set of distances dij between each pair of cities, find the shortest
route that visits each city exactly once and returns to the starting
city.
This problem is typically represented as a graph, where cities are
vertices and distances are edges. In using a GA to find a solution,
we must decide how individuals are represented and altered.
For this assignment, a TSP solution (implemented as TSPChromosome)
is a list of vertices indicating the order they are visited, and the
fitness is the corresponding tour. (an aside: this is a minimization
problem, so you will want to design your fitness function to return
MAXLENGTH - tour length). The number of possible solutions to this
problem is the number of permutations of our list of vertices.
The biggest challenge with using a GA to solve a graph problem is
developing appropriate crossover and mutation operators. We can't
just use single-point crossover, as the resulting individuals will
not be proper tours. Instead, we use two graph-specific crossover
operators: PermutationCrossover and
GreedyCrossover. PermutationCrossover is implemented for you;
GreedyCrossover is left for you to implement (pseudocode is
provided.)
A challenge with GAs is knowing what parameters to
use and what methodologies to choose. To address this, you should
perform a set of experiments and prepare a report that summarizes your
results.
There are lots of potential experiments to perform; rather than
explicitly list them all, I have provided some broad
guidelines. Within these guidelines, you may choose how to explore the
effect of different choices. You may want to focus on the time needed
to discover a solution, the quality of the solution found, or
both. You are free to modify or extend the code however you like,
including adding functionality to capture or log partial results,
print out answers, or add additional features. Please include a
README that describes any changes you have made.
Suggested Experiments:
- Compare the effect of different selection operators on each
problem.
- Compare the effect of different elitism mechanisms on each
problem.
- Compare the effect of different crossover operators on each
problem.
- Study the effect of different mutation rates or elitism rates.
- Study the effect of different population sizes.
- Compare the GAs ability to solve different varieties of the
pattern problem. Are some easier than others?
You may choose to do some or all of these experiments, or others
that you find interesting. Quality is more important than quantity;
a smaller number of carefully-crafted experiments with clear
conclusions is more interesting than a large number of shallow
experiments.
What to turn in: You should turn in a report (in Word or PDF)
that describes your experiments. Make sure you include enough
detail that your experiments can be replicated (e.g. all relevant
paramter settings). I would strongly recommend including a graph for
each experiment. You may use whatever graphing software you like,
but please be sure that your graphs are clear and legible. You
should also make sure that you explain the results of your
experiments. Don't just say "graph 3 shows compares roulette
selection and tournament selection" - explain what your conclusions
are.
You will be graded on the thoroughness, quality, and presentation of
your experiments. Typical ways students are marked down are unclear
or incomplete graphs, little or no explanation of results, and a
lack of depth in experiments. Perfect English is not expected, but
we do need to be able to understand what you are trying to say.
Files
Below are links to the files needed for this assignment:
- GA.py The class implementing the genetic
algorithm.
- GAProblem.py An abstract base class describing
a problem to be solved. Specific problems will subclass this.
- Crossover.py Classes implementing
different types of crossover operators.
- Elitism.py Classes implementing
different types of elitism.
- Selector.py Classes implementing
different types of selection.
- BitStringGAProblem.py A
class implementing the problem-specific knowledge for a bitstring
problem such as all ones or parity.
- NurseSchedulingProblem.py A
class implementing the problem-specific knowledge for a nurse
scheduling problem, along with some sample constraints.
- TSPGAProblem.py A
class implementing the problem-specific knowledge for a traveling
salesman problem and correxponding chromosome.
- NurseFactory.py A
helper class for instantiating NurseFactory problems.