So recently I’ve been taking a course on *Bio-inspired computer science*,
taught by a great teacher/researcher: [Guillaume
Beslon](http://liris.cnrs.fr/guillaume.beslon/). Needless to say, it was
way too interesting for me not to get my hands dirty and try out the theory.

You probably already know this problem: a traveling salesman needs to visit a series of cities using the shortest possible route. It is an NP-hard problem with many possible approaches to solving it.

I used the open dataset available here to test the algorithm.

To be honest, in this case a genetic algorithm isn’t well suited: in a 2D world,
where all distances are Euclidean and no boundaries/obstacles are involved, it
would probably be faster and even more robust to use a basic heuristic
approach. Still though, it was an **interesting exercise**.

There are many things to say concerning genetic algorithms, so I’ll try to keep it short and to the point.

Genetic algorithms (GAs) apply an *evolutionary process* to a *population
of solutions* (also called genomes, or individuals: there’s a fair bit of
biological terminology). GAs produce somewhat “suboptimal” solutions to the problem.

You can use a GA in multiple cases, the two big use cases in my opinion being:

- the solution set is too big for brute-force
- you have no idea how to find a good solution

There are a few things you need to be able to do:

- generate a random set of solutions
- estimate how good a solution is, or at least compare two solutions
- slightly and randomly modify a solution, changing how good it is
- combine two solutions somehow, making a “child” solution

Here’s a bit of pseudocode/python describing what a genetic algorithm usually looks like:
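Something along these lines, where `random_population`, `selection` and `reproduction` are the problem-specific pieces you plug in (the names here are placeholders, not a fixed API):

```python
def genetic_algorithm(random_population, selection, reproduction,
                      iterations, population_size):
    # Start from a random set of solutions...
    population = random_population(population_size)
    for _ in range(iterations):
        # ...then repeatedly keep the fitter ones and breed replacements.
        parents = selection(population)
        population = reproduction(parents, population_size)
    return population
```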

And that’s it! You mainly need to implement `selection` and `reproduction`,
the two methods that require real thought (`random_population` is fairly
straightforward in most cases). You’ll also notice the variety of parameters, which
is probably the weirdest thing about GAs: **they need a bunch of parameters**.

In the case of the TSP, any permutation of cities is in fact a solution to the problem. Therefore we’ll define the genome of a solution as an ordered list of cities. Note that we will never try to optimize the path itself; the idea here (and behind any genetic algorithm) is to evolve a population of solutions, somehow selecting the “fitter” ones and producing new solutions that resemble them.
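To compare solutions, we need a fitness measure, which for the TSP is simply the total length of the tour. A quick sketch, assuming each city is an `(x, y)` tuple:

```python
import math

def tour_length(cities):
    # Sum of Euclidean distances along the tour, closing the loop
    # back to the starting city.
    return sum(math.dist(cities[i], cities[(i + 1) % len(cities)])
               for i in range(len(cities)))
```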

A genome being defined as a permutation of cities, we can initialize a random population as follows:
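Something like this, assuming `cities` is a plain Python list:

```python
import random

def random_population(cities, size):
    population = []
    for _ in range(size):
        genome = list(cities)   # copy, so each genome is independent
        random.shuffle(genome)  # shuffles in place, returns None
        population.append(genome)
    return population
```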

Personally, I like to write one-liners; here I’m a bit disappointed since I
can’t: `random.shuffle` modifies the given list in place and doesn’t
return anything.

*Ah, the interesting part!* There are a lot of ways to do selection in genetic
algorithms, such as elitism, roulette-based, tournament and a few others I
haven’t worked with…

Each method has its benefits and drawbacks, but I personally prefer the roulette-based method, which seems more “just” than others (who said nature was just?).
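A roulette-based selection can be sketched as follows, assuming a `fitness` function where higher means better (for the TSP, something like `1 / tour_length` works):

```python
import random

def roulette_selection(population, fitness, n):
    # Each genome's chance of being picked is proportional to its fitness,
    # so weaker solutions still get a (small) shot at reproducing.
    weights = [fitness(genome) for genome in population]
    return random.choices(population, weights=weights, k=n)
```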

The reproduction phase of a genetic algorithm is what I believe to be the most
important phase. Now some people say the beauty of the evolutionary process is
the balance between selection and mutation: **that is very true**. I simply
think that the selection process is further removed from the problem than the
reproduction phase.

This phase can be described as follows: given a certain population of parents, combine them following a certain strategy to produce the new population for the next iteration of the algorithm.

Again, there are many ways to do this, a few notable ones are:

- monogamous and mortal: parents are “married” and have a number of children (drawn from a random distribution), and have a certain probability of dying. This mirrors our current Western model, and the probability distributions are usually taken from real-world statistics.
- selective, immortal and polygamous: parents have children with others until the population limit is reached.

In the case of this problem, I chose to go with the second version, given the simplicity of the problem.
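A sketch of that second strategy: the selected parents all survive, and random pairs produce children until the population is full again. Here `crossover` and `mutate` stand for the operators described next, and the default probabilities are only illustrative:

```python
import random

def reproduction(parents, population_size, crossover, mutate,
                 p_crossover=0.3, p_mutation=0.05):
    # Parents are immortal: they carry over unchanged.
    population = list(parents)
    while len(population) < population_size:
        mother, father = random.sample(parents, 2)
        # Cross-over and mutation each happen with their own probability.
        child = (crossover(mother, father)
                 if random.random() < p_crossover else list(mother))
        if random.random() < p_mutation:
            mutate(child)
        population.append(child)
    return population
```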

The mutation operator is a *slight* modification of a new child’s
genome upon creation. This is similar to a random walk and is one of the core
concepts of an evolutionary algorithm. Imagine if our genomes were only
combinations of our ancestors’ genomes: conceptually, would that make us any
different from, say, our common ancestor?

Like all genetic operators, the mutation has a (generally small) probability of happening.

It must affect the genome of the child only *slightly* (this is important): if the
modification is too great, the GA degenerates into a random search. In my
case, I chose to swap the order of two cities, keeping the solution
similar while still changing it.
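The swap itself is only a couple of lines, mutating the genome in place:

```python
import random

def mutate(genome):
    # Swap two random cities: the tour changes a little
    # but remains a valid permutation.
    i, j = random.sample(range(len(genome)), 2)
    genome[i], genome[j] = genome[j], genome[i]
```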

The cross-over operator defines how two parent solutions are combined to generate a new child. Think of it as taking the two genomes and merging them by keeping parts of each. Once again, there are many ways of doing this; I decided to swap a fragment of the genes of both parents and choose one of the two resulting genomes (one being the “opposite” of the other).

Like all genetic operators, the cross-over has a probability of happening.
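One caveat with swapping a raw fragment between two permutations is that cities can end up duplicated. A common permutation-safe variant (a form of order crossover, sketched here rather than my exact original code) copies a slice from one parent and fills the remaining positions in the other parent’s order:

```python
import random

def crossover(mother, father):
    # Copy a random slice from the mother...
    i, j = sorted(random.sample(range(len(mother)), 2))
    fragment = mother[i:j]
    # ...then fill the remaining positions with the father's cities,
    # in his order, skipping those already in the fragment.
    rest = [city for city in father if city not in fragment]
    return rest[:i] + fragment + rest[i:]
```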

Below are the results of running the GA on the TSP with the following parameters:

- number of iterations = 200
- population size = 80
- selected proportion = 20%
- cross-over probability = 30%
- mutation probability = 5%