What’s the connection between evolutionary algorithms and mother nature, and how can it help solve complicated computing problems?
Wikipedia defines evolution as “a change in the heritable characteristics of biological populations over successive generations”. While it often relates to mother nature, animals or humans, it’s also a part of the computing world.
In the following post we’ll explore evolution through genetic algorithms, and see how it can help us solve problems faster.
— OverOps (@overopshq) April 6, 2017
Natural Selection in Code
Genetic algorithms imitate the evolution process in nature by evolving superior solutions to problems. Natural selection plays a major part in this process – the differential survival and reproduction of individuals due to differences in phenotype.
According to Charles Darwin, also known as the “father” of evolutionary thought, natural selection is the key force that helps preserve good changes, and eradicate bad ones. This helps populations improve over time.
Genetic algorithms mimic the power of evolution with code, along with natural selection, in order to solve problems better and faster. In computing, our population consists of a collection of solutions to a specific problem. Once we have the population, we can move on to the evolution process, which consists of the following steps:
1. Fitness – Giving a score to each solution, that represent how good it is
2. Selection – Selecting pairs of parent solutions according to their Fitness value
3. Crossover – Breeding the selected parents to produce an offspring
4. Mutation – Mutating the offspring solution with a very low probability
The evolution process here leads to finding a “superior” solution to the problem, or at least so we hope.
The Evolution of the Traveling Salesman Problem
When choosing to use genetic algorithms (that’s part of evolutionary algorithms), the first thing we need to understand is how to represent an individual solution in our population. In order to understand how to do that, we’ll use the Traveling Salesman Problem (TSP):
Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once?
Or in other words, the Uber line or Lyft pool problem 😉
A solution to the problem can be represented as an ordered list of cities, when each one describes the desired route. And it’s important to point out that every city should be listed exactly once.
In this example, our solution population consists of a collection of lists of cities. To form our first generation, we will randomly generate a collection of solutions, and we need to make sure that each generated solution is valid (and is a possible solution to the problem). We call these solutions chromosomes.
The next step will be taking these chromosomes into a process of evolution, and use the power of natural selection to obtain a better population.
In order to achieve this, we first need to evaluate each chromosome and give the better ones higher probabilities to produce children. This is done by a function called the Fitness Function that receives a chromosome, and returns a score that represents how good the chromosome is in terms of the problem. Going back to our example, the function could be the sum of the distances in the route.
At this point we take the scores, and each chromosome will get a relative probability to be selected as parent for the next generation. And as we mentioned earlier, better chromosomes will get higher probabilities. Or in other words, this is where we use natural selection and now it’s time to move to the process of evolution.
Creating a Better Solution, Each Time
Our main goal is to select “parents” based on their probabilities (calculated by their fitness score), breed them, and move the produced child into the next generation. The breeding process is derivative of the problem, when we need to find a way to combine the parents into a valid solution.
When trying to breed parents, the first idea that comes up is to take 50% from each one. However, in the Travelling Salesman Problem (TSP) it might lead to an invalid solution – in which each city will appear more than once.
So how can we solve this?
By taking the first part from the first parent, and then taking the rest of the cities according to their order of appearance on the second parent solution. We can be as creative as we want, as long as we make sure that we’re generating a valid solution that combines both parents.
Before we can take the child we created into the “next generation”, we need to apply mutation with a very low probability. How can we do this? Well, it depends on the problem we’re facing and just like before, we need to make sure that the mutated chromosome is still a valid solution.
In our TSP example, this means that we can draw two cities and switch them – the mutated route is valid since each city appears once.
Circle of Life
Unlike mother nature, we are limited by our memory size and this means we want to work with a fixed size population. We start with a stable pool of N parents, each time we raffle 2 parents, create a child and return the parents back to the pool. We do this process N times in order to get N children, which take us to the next generation. N can be in the range of 10’s to 100’s, according to our memory limit.
Then, just like the circle of life, we will calculate a fitness function for each child, give it selection probabilities, run the evolution, get a new generation and so on. This process can go on forever.
When do we stop? That’s pretty much up to us. For example we can stop:
– After X generations were made
– If we apply a time limit
– Once we reach our fitness threshold (which we set in advance)
In the TSP, we could set a threshold of 50 Miles which means that once a route shorter than that was found, we got a satisfied answer for the problem at hand.
When and Why Should We Use Genetic Algorithms?
Now that we know how genetic algorithms work, it’s time to wonder when we should actually use them. The top use case is for optimization problems, when we’re facing a big search space, where other search algorithms have failed. But, why would we think that genetic algorithms would perform better?
When thinking about it, we just start with a random collection of solutions hoping to eventually find a good one. It almost feels like we’re counting on sheer luck for our process to work.
How do we make it work? By using:
1. Mutation – that encourages diversity which spreads the chromosomes over the search space, helping us discover as many hills as possible
2. Selection and Crossover – that encourages homogeneity by mating the top performing chromosomes with higher probabilities.
Selection and Crossover will help us “climb up the hills”, and all three elements combined will handle the high number of solutions we’ll get.
We know how and when to use genetic algorithms, the only thing left is to understand if we can use it for the problem we’re facing. There are 2 prerequisites that makes our problem eligible for using genetic algorithms:
1. We need to understand how to represent a solution to the problem
2. We should have the ability to calculate continuous fitness
If our problem meets these criteria, we’re good to go. There are some nice examples of problems genetic algorithms helped solve, but our favorite one is the evolving Mona Lisa, in which the algorithm creates an approximation of the Mona Lisa using 250 semi transparent circles.
One of the coolest thing about genetic algorithms is that it’s a general framework, so we don’t have to be experts in the specific field of the problem we’re facing. As long as we have the prerequisite in hand (represent a solution to the problem and calculating continuous fitness) – we can use an existing library and get the best solutions.
This post is based on my Genetic Algorithm lecture from BeeScala 2016. Get the slides here.