Showing posts with label genetics. Show all posts
Showing posts with label genetics. Show all posts

Tuesday, December 29, 2020

Evolution and Evolutionary Algorithms: Selection, Mutation, and Drift

A guide to the three pressures that shape innovation in living and non-living systems.

(a version of this article has also been posted to Medium)

Diagram bringing together genotypic diversity, phenotypic diversity, drift, mutation, and natural selection

I teach a course on Bio-Inspired AI and Optimization that is meant to be a graduate-level survey of nature-inspired algorithms that also provides a more serious background in the natural-science (primarily biological) fundamentals underlying the inspiration. The first half of the course covers nature-inspired optimization metaheuristics, with a heavy focus on evolutionary algorithms. An evolutionary algorithm is a scheme for automating the process of goal-directed discovery to allow computers to find innovative solutions to complex problems. There is a wide range of evolutionary algorithms, but a common feature of each is that the computer generates a population of random candidate solutions, evaluates the performance of each of these candidates, and then uses the best of these candidates as “parents” to guide the generation of new candidates in the next generation.

Most of my students come from computer science or engineering backgrounds and, as such, have very little formal education in biology let alone something as specific as population genetics (“popgen”). However, to really understand the complex process of evolutionary innovation inherent to evolutionary algorithms (and evolutionary computing in general), it requires at least some fundamental background in popgen. I think when most people reflect back on their high-school biology courses, they might remember something about natural selection and mutation being important in thinking about the evolution of adaptations in natural populations. However, there is a third evolutionary force that is extremely important — especially when considering small populations, like the ones that are artificially generated in an evolutionary algorithm. That force is (genetic) drift. So let’s review all three:

  • Natural selection reflects that some individuals in a population will be at a fundamental disadvantage with respect to other individuals. Those individuals (who are, in the computational creativity context, are relatively poor solutions to a problem) will be very likely to be “selected out” in large populations because there will be so many other individuals who are relatively “fitter.” “Fitness” is a measure of how many offspring an individual can put into the next generation given the current context. If some individuals can put more individuals into the next generation than others, they are “more fit.” If all individuals have the same fitness, then every parent has the same chance of getting her offspring into the next generation. If some individuals have less fitness than others, then they have less chance of getting their offspring into the next generation.
     
    Some people are taught that natural selection only matters when resources are scarce and thus population sizes are limited (thus making individuals compete for opportunities). This is not the whole story and is why we must discuss (genetic) drift below. Before getting into that, note that even in populations that are not limited, differences in the rates of growth of different strategies will gradually change the relative share a strategy has of a population. So even without resource limitation, differences in “relative fitness” will naturally select for the most fit individuals to have the strongest share of the population.
     
    By itself, selection can only tune the relative proportions that different strategies have in a population. However, many evolutionary processes have a way of blending from different parents to create offspring that somehow interpolate from those parents. In biology, we view “sex” as the primary way in which we see “recombination” of strategies. There are sex-like mechanisms in evolutionary algorithms that do the same. So when natural selection is combined with recombination (“sex”), we get optimization combined with a little bit of goal-directed novelty generation. However, recombining strategies across different parents can deleterious because breaking up two functional strategies and putting them together does not guarantee that the result will itself be functional. Those strategies that result that are functional might improve upon both parents, but the novelty may be limited because it simply borrows from strategies of the parents.
  • Mutation is one way to introduce novelty that can be tuned to be less disruptive as recombination while also producing more novel solutions than recombining solutions from the parent generation. In mutation, random changes in a parent strategy are introduced. In a population of clones of a single strategy, mutation introduces novel variations that generates differences in offspring that hopefully lead to differences in relative fitness. These fitness differences will cause some mutations to grow in representation and others to shrink in representation. So one of the functions of drift is exploration to find new candidate solutions that might be better than anything in the current population. However, another important function of mutation is to balance the stagnating force of genetic drift.
  • (Genetic) Drift is a subtle but extremely important evolutionary pressure that represents what occurs when population sizes eventually meet their limits. As mentioned above, in a world of plentiful resources, natural selection will allow every strategy to survive and produce offspring, but strategies that produce more offspring will grow in their share of the total population. Eventually, if the population is very large and becomes limited in how much it can grow, those strategies that have a lot of representation will have a much higher probability of being represented after the limitation kicks in. In other words, when population sizes are high, resource limitation is a culling effect—strategies that are more fit tend to be selected to continue and strategies that are less fit are “selected out” and removed. However, this culling effect eventually leads to its own demise as the removal of low-fitness individuals also results in the removal of diversity which is required for natural selection to work. As mentioned above, the action of natural selection only optimizes among the diversity of solutions in the parent generation. If the parent generation has no diversity, then there are no improvements that natural selection can make. When a population finds itself full of identical individuals and thus stuck and unable to generate any new novelty, we refer to that population as being “fixed” or having reached “fixation.” Genetic drift represents this gradual march toward fixation. Natural selection, when combined with population limitation, is always being pulled toward fixation where natural selection will fail to be able to act.
     
    Fortunately, mutation (mentioned above) can rescue us from drift. Mutation introduces new variation in a population, and natural selection can choose strategies out of that new variation. So if we want to combat drift, we can just crank up the mutation rate. The downside of that is that the mutation rate also quickly corrupts well-performing strategies. So populations that have a high mutation rate will tend to have a diverse set of strategies within them and maintain a diverse set of fitnesses. Some individuals will have very high fitness, but they will co-exist with individuals with very low fitness (due to mutation) that are just a side effect of the stabilizing force of mutation. Reducing the mutation rate helps to ensure all solutions have similar fitness, but there is never any way to know if a population of individuals with similar fitness is because their shared strategy is good or they simply reached fixation too soon.
     
    The problem of reaching fixation “too soon” is particularly strong for small population sizes. In a small population size, small differences in fitness may fail to generate sufficient selective pressure to dominate the force of genetic drift. For example, in a population that is limited to a size of 10, an individual with a fitness 1/100 of some other individual may still by “good luck” produce a single offspring in the next generation. That offspring, although 1/100'th as fit of a strategy as some other in the population, nevertheless takes up 10% of the next generation. So for small population sizes, mutation and drift are essentially the only drivers of evolution.

So when building an evolutionary algorithm, it is important to start with a diverse population and then build mutation and selection operations that maintain diversity as long as possible (staving off genetic drift). So long as the population is diverse, natural selection will continue to explore large regions of the strategy space. However, if mutation is too strong, then it will limit exploitation and tuning of strategies because adaptations that make small changes in fitness will quickly be lost to mutation. Consequently, if you have the computational budget, it is best to build very large population sizes with very low mutation rates and choose selection operators that moderate selection pressure — giving low-fitness strategies a chance to stay in the large population pool.

Similarly, when thinking about evolution in natural systems, it is important to remember how large the ancestral populations were. Those that evolved in large-population contexts may tend to show more signs of natural selection (and will likely have evolved mechanisms to reduce the mutation rate). Those that evolved in small-population contexts may tend to have high mutation rates and show diversity patterns more closely related to randomness. This latter case relates to neutral theories of evolution, which are important to consider when trying to understand the source of observed variation in systems we see today.

This story is summarized in the graphic I’ve prepared above, which shows mutation and natural selection as forces re-shaping populations within a drift field that, in the absence of those forces, will eventually homogenize the population on an arbitrary strategy. 

So how do we come up with interesting new ideas for mutation and selection operators for evolutionary algorithms? We should continue to look at population genetics. In fact, some theories in population genetics (like Sewall Wright’s shifting-balance theory) are much better descriptors of evolutionary algorithm behavior than the more complex evolutionary trajectories of living systems. For example, distributed genetic algorithms, which create islands of evolutionary algorithms that only exchange population members across island boundaries infrequently, tend to out-perform conventional genetic algorithms on the same computational budgets for reasons that make sense in light of population genetics. This is a more advanced topic, and you’re welcome to read/listen more about this in my CSE/IEE 598 lectures. For now, I hope you look at living and non-living populations around you through the lenses of mutation, drift, and natural selection.

Wednesday, September 28, 2011

Why symbiosis is not a good example of group selection

[ In a Google+ post, someone asked whether symbiosis was a good example of group selection. I responded in a comment, and another comment asked me to expand my response a little bit in my own post. So here is a copy of that post (with a few more hyperlinks). ]

Part 1: What is group selection?

Typically "group selection" doesn't cross species boundaries. That is, group selection refers to the proliferation of a particular form of a gene, otherwise known as an "allele", due to its benefits to groups of individuals which share that allele despite the individual costs of having that allele. It may help to consider the basic group-selection argument for the evolution of altruism (i.e., the evolution of behaviors that are costly to an individual and yet beneficial to a different unrelated individual). Before that, consider why we wouldn't expect altruistic alleles to have strong representation in a population.

For every gene or group of genes, there can be many different variations (alleles). Some of those variations will be deleterious to an individual, and so you would expect the relative representation of those deleterious variations to decrease over generations. So imagine if one of those alleles encoded an altruistic trait that caused an individual to do something costly for the benefit of another (e.g., helping a stranger understand group selection with no expectation of future payoff). Individuals with that allele are suckers. Those without that allele instead focus on tasks that return direct benefit to themselves, and that direct benefit would payoff with greater productivity of offspring that share that non-altruistic allele. When an altruist met a non-altruist, the benefit from the altruist would increase the non-altruist's alleles representation in the next population while decreasing its own alleles' representations. So we would expect that altruistic alleles would fade away into obscurity. Moreover, the benefit from all of the altruists would diffuse across the variety of alleles rather than being concentrated on just the altruistic ones.

However, what if that altruistic allele also encoded a behavior that would seek out others with that same allele. This non-random association means that each individual who helps another does actually help to increase the productivity in that allele. That is, even though there is a cost to the individual doing the altruistic task, the benefit going to the other individual is felt by the other copy of the same allele in the different (and unrelated) person. So when these altruists group together, altruistic benefits do not diffuse. They are captured within the group. Moreover, the group's synergy can cause it to be more productive than the remaining groups of non-altruists. Consequently, the altruistic allele not only persists in the population, but its representation can grow because there is a differential benefit between altruistic and non-altruistic groups. It is this differential benefit between groups that is group selection.

Part 2: Symbiosis and Mutualism

A symbiotic relationship between members of different species is not group selection (in general) because it does not posit that there is a mutual allele that may be deleterious in an individual but beneficial in a group. That is, there is no group synergy that is mitigating individual costs by generating benefits elsewhere that help to support alleles that would otherwise naturally decay. When species are mixed within a population of interest, the analysis is a bit different because alleles cannot flow across the species barrier (except for special cases).

For example, consider an allele that existed across species (e.g., an allele for a gene shared between humans and bonobos), the speciation in general would prevent the sort of group selection gains because there would be no way for increased numbers of alleles in one species to transfer to the other species. Imagine that altruists in one species seek out altruists in the other species. The result could lead to more increases in the altruist representation in one species than another, and so there would be an altruist surplus. Those surplus altruists would have no choice but to associate with non-altruists in the other species. However, if the group was all of one species, then there would not be surplus altruists. Altruistic benefit need not diffuse across non-altruists too.

However, most examples of symbiosis are not altruistic. Instead, they are mutualistic. That is, the behavior does benefit another, but that is a possibly unavoidable side effect of an action that benefits the individual doing the behavior. For example, if I'm driving through a parking lot looking for an empty space to park, I am revealing information to my competitors (other drivers) about where empty spots are not. I don't want to help the competing drivers, but it is unavoidable because they can see me go down an isle of the parking lot and not find a spot. Consequently, they do not go down that same isle. Of course, I use their searching behavior to inform my choices of the next isle. So we are doing "cooperative search" only because the behaviors have mutual benefits. The same goes for many symbiotic relationships among individuals of different species.

Consider a remora ("sharksucker"). It's a small fish that essentially attaches to another host (fish, whale, turtle, etc.). It can receive nutrients from on or around the host. It can also be protected from predators that avoid the host. In some cases, the host could eat the remora, but the remora is so small that it may not be worth the effort. Some hosts actually receive a small benefit (cleaning, for example) from the remora. Regardless, the remora experiences very little cost and plenty of benefit. Moreover, the host experiences very little cost and possibly some benefit. So there's no surprise that this behavior evolved. You don't need any fancy mathematical model to show how this is possible – when the benefits align like this, it's natural to assume that it is going to be favored by natural selection.

Part 2.5: Symbiosis and Co-evolution

Having said all of that, symbiosis can lead to elegant examples (or at least suggestions) of co-evolution, which describes how a change in one species can lead to a change in other species. In particular, natural selection on different species creates a feedback across species. One species is the ecological background for another species, and so as each species changes it creates new niches (and destroys old ones) for other species. So the evolution of one species can guide the evolution in another. But I think this post is long enough. :)

More information

Wikipedia does a pretty good job on these particular subjects. Check 'em out there.

( I have also mirrored this content on a post on my website. )