The evolutionary consequences of redundancy in natural and artificial genetic codes

Guillaume Barreau

Submitted for the degree of D. Phil.
University of Sussex
May, 2002

Declaration

I hereby declare that this thesis has not been submitted, either in the same or different form, to this or any other university for a degree.

Signature:

Acknowledgements

I would like to thank Jean-Arcady Meyer and Phil Husbands for their guidance in the early stages of this project and for helping me to obtain financial support from the European Commission. I would like to express my gratitude to Inman Harvey and Phillip Jones for stimulating discussions and critical comments on this work, to Jason Noble and Oliver Sharpe for reading and improving this manuscript and to Stephen Eglen for his help with too many things to list here. Finally, thanks to Arantza Etxeberria, Cecile Fairhead, Jason Noble, Lisbeth Barreau, Margarita Sanchez, Olivier Colin, Paulo Costa, Rafael Perez y Perez, Rodric Hemming, Sarah Bourlat, Stephen Eglen, Teresa del Soldato and Valeria Judice for their encouragement and unwavering support along this sometimes difficult path.

Abstract

Whilst the existence of redundancy within the genetic code has been recognised for some time, the consequences of this redundancy for natural selection have not been granted any attention by theoretical biologists. We postulate an adaptive value to the pattern of redundancy found in the modern genetic code and argue that redundancy might also be beneficial to the performance of genetic algorithms when introduced at a similar level in their encodings. We define a formal framework in which some comparable patterns of redundancy can be modelled and studied. We show that these patterns of redundancy vary significantly in their effects and that the number of neutral mutations they induce is a relevant parameter in understanding this variation. We then quantify the impact of this form of redundancy on a genetic algorithm. Several optimisation problems are tried in which redundancy brings a substantial decrease in the number of generations needed to find a solution of a given quality. A problem is also presented where redundancy does not speed up the discovery of good solutions. A more detailed analysis is carried out of the factors responsible for this limitation. The consequences of these findings for genetic algorithms and for the evolution of the genetic code are discussed.

Contents

1 Introduction
  1.1 The evolution of the genetic code
  1.2 Codes and genetic algorithms
  1.3 Aims of the thesis
  1.4 Structure of this thesis

2 Codes and neutrality in biological and simulated evolution
  2.1 The genetic code
    2.1.1 From DNA to mRNA
    2.1.2 From mRNA to protein
    2.1.3 Redundancy and neutrality in the genetic code
    2.1.4 The wobble rules
    2.1.5 Which codes are possible given the wobble rules?
    2.1.6 The underlying causes of the wobble rules
  2.2 Evolution of the genetic code
    2.2.1 Frozen accident versus stereochemical theory
    2.2.2 The genetic code is not universal
    2.2.3 Adaptive forces shaping the genetic code
    2.2.4 An adaptive hypothesis for neutrality in the genetic code
  2.3 Codes in Genetic Algorithms
    2.3.1 Importance of the genotype to phenotype mapping for GAs
    2.3.2 Existing work
    2.3.3 Relevance of redundancy for GAs
  2.4 Neutrality in RNA evolution
    2.4.1 RNA folding
    2.4.2 Shape space
    2.4.3 Sequences folding into s and sequences compatible with s
    2.4.4 Connectivity of C(s)
    2.4.5 Modelling neutral networks with random graphs
    2.4.6 Random graphs compared to simulated neutral networks
    2.4.7 Population dynamics on neutral networks
    2.4.8 Perpetual innovation along a neutral network
    2.4.9 Critique of the random graph approach
  2.5 Conclusion

3 A formal framework for a comparative study of redundancy
  3.1 Requirements for a definition of redundancy
  3.2 Redundancy in a minimal form
    3.2.1 A possible definition
    3.2.2 Some examples
    3.2.3 The identity permutation
    3.2.4 Redundancy and neutrality
    3.2.5 Permutations as the expression of redundancy
    3.2.6 Redundancy in a graphical form
  3.3 The framework in a more general form
    3.3.1 The natural generalisation
    3.3.2 Other ways of generalising
  3.4 The criteria for assessing redundancy
    3.4.1 Assigning fitness to symbols
    3.4.2 How meaningful is the fitness of a 3 bit long sequence?
    3.4.3 Counting numbers of optima
    3.4.4 Dealing with neutral paths
    3.4.5 Comparing numbers of optima in a meaningful way
  3.5 Conclusion

4 A statistical analysis of redundancy patterns
  4.1 Aims of the chapter
  4.2 Some quantitative features of a permutation
    4.2.1 Number of invariant elements
    4.2.2 Number of orbits
    4.2.3 Sum of distances between a sequence and its image
    4.2.4 Connectivity between pairs of signification
  4.3 Best and worst permutations when n is equal to 3
    4.3.1 Some considerations of size
    4.3.2 Description of the data
    4.3.3 Any redundancy is better than none
    4.3.4 Differences between Dσ and Rσ
    4.3.5 The proportions of adverse cases
    4.3.6 Trends in the other variables
  4.4 The incidence of Inv on Rσ
    4.4.1 Codes defined on sequences of length 3
    4.4.2 Codes defined on sequences longer than 3
  4.5 The incidence of other variables on Rσ
    4.5.1 The relation with Conn0
    4.5.2 The relation with Conn1, Conn2 and Conn3
    4.5.3 The relation with SumDist
  4.6 Parallels with a quaternary alphabet
  4.7 Conclusion

5 Redundancy on trial in evolution
  5.1 The Genetic Algorithm
  5.2 First test problem: a case of no epistasis
    5.2.1 The problem
    5.2.2 Introducing redundancy
    5.2.3 Experimental procedure
    5.2.4 Results
  5.3 Second test problem: selection for a periodical chromosome
    5.3.1 The problem
    5.3.2 Adding redundancy
    5.3.3 Results
  5.4 Third test problem: finding a compact non-overlapping path on a grid
    5.4.1 The problem
    5.4.2 Adding redundancy
    5.4.3 Results
  5.5 Conclusion

6 Some limitations to the benefits of redundancy
  6.1 Application of redundancy to the design of a wing box
    6.1.1 The problem and the original encoding
    6.1.2 Modifying the encoding
    6.1.3 Introducing redundancy
    6.1.4 Results
  6.2 Comparing redundancy on three non-redundant codes
    6.2.1 Definition
    6.2.2 Results
  6.3 Why does T1 perform better than T3?
    6.3.1 Non-redundant codes and partial fitness functions
    6.3.2 Counting numbers of optima
    6.3.3 Results and Discussion
  6.4 Conclusion

7 Conclusion
  7.1 Summary of the approach and main contributions
  7.2 Conclusions for GAs
    7.2.1 Further lines of research
  7.3 Conclusions for the genetic code
    7.3.1 How optimal is the redundancy of the code?
    7.3.2 Limitations of the model
    7.3.3 Further lines of research

List of Figures

1.1 A DNA molecule and its replication.
2.1 The base sequence and secondary structure of a tRNA.
2.2 A schematic view of the translation of a mRNA into proteins.
2.3 The evolution of the genetic code in non-mitochondrial genomes.
2.4 The evolution of the genetic code in mitochondrial genomes.
2.5 Mutation and the travelling salesman problem.
2.6 Two possible redefinitions of mutation compatible with set partitioning problems.
2.7 Two RNA molecules with the same secondary structure.
3.1 A spatial representation of 3 bit long sequences.
3.2 A spatial representation of 4 bit long sequences.
3.3 A spatial representation of permutation [13465207].
3.4 A spatial representation of permutation [13025746].
3.5 A spatial representation of 6 bit long sequences.
3.6 Defining the meanings of Cd1d2d3 with seven permutations.
3.7 Defining the meanings of Cd1d2d3 with three permutations.
3.8 Defining the meanings of Cd1d2d3 with three commutative permutations.
3.9 Counting local optima with and without redundancy.
3.10 Counting local optima in the presence of neutral paths.
4.1 A connection of type 0 between a pair of symbols.
4.2 A connection of type 1 between a pair of symbols.
4.3 Possible connections of type 2 between pairs of symbols.
4.4 Possible connections of type 3 between pairs of symbols.
4.5 Permutations which never increase the number of optima.
4.6 Rσ as a function of Inv when n equals 3.
4.7 R<Inv> as a function of Inv when n equals 3.
4.8 Rσ as a function of Inv when n is greater than 3.
4.9 R<Inv> as a function of Inv when n is greater than 3.
4.10 Rσ as a function of other variables when n equals 3.
4.11 Rσ as a function of Inv when Conn0 is fixed.
4.12 Rσ as a function of the best linear combination of Conn1, Conn2 and Conn3.
4.13 Rσ as a function of SumDist when Inv is fixed.
4.14 A representation of sequence distances in the case of a quaternary alphabet.
5.1 The probability of a slot being picked in a selection tournament.
5.2 A redefinition of function f in NK fitness landscape terms.
5.3 First problem: the proportion of optimal blocks after 100 generations as a function of the mutation rate.
5.4 First problem: the proportion of optimal blocks after 400 generations as a function of the mutation rate.
5.5 First problem: the proportion of optimal blocks as a function of the number of generations.
5.6 First problem: comparing the speed of evolution with and without redundancy.
5.7 The value of f(i, j) as a function of j.
5.8 Second problem: transforming the encoding through permutation σ.
5.9 Second problem: transforming the encoding through permutation [07143562].
5.10 Second problem: the proportion of optimal blocks as a function of the mutation rate.
5.11 Second problem: comparing the speed of evolution with and without redundancy.
5.12 Possible moves from a cell of the grid.
5.13 Third problem: transforming the encoding through permutation [07143562].
5.14 Third problem: the proportion of optimal blocks as a function of the mutation rate.
5.15 Third problem: comparing the speed of evolution with and without redundancy.
6.1 The relevant elements of a wing.
6.2 The representation of the wing parameters on the chromosome.
6.3 Fitness after 200 generations as a function of the mutation rate.
6.4 Comparing the speed of evolution with and without redundancy.
6.5 The non-redundant code T2.
6.6 The non-redundant code T1.
6.7 The non-redundant code T3.
6.8 Fitness after 200 generations as a function of the mutation rate with code T1.
6.9 Fitness after 200 generations as a function of the mutation rate with code T3.
6.10 Comparing the speed of evolution with and without redundancy when T3 is used.
6.11 Comparing non-redundant codes T1, T2 and T3 without redundancy. Error bars indicate the standard error.
6.12 Comparing non-redundant codes T1, T2 and T3 with redundancy.
6.13 Possible variations of fitness when changing the thickness of a single panel.
6.14 The average number of optima as a function of the number of generations.
6.15 The proportion of blocks found at a local optimum as a function of the number of generations.
6.16 Optimality along the wing with non-redundant code T1.
6.17 Optimality along the wing with non-redundant code T2.
6.18 Optimality along the wing with non-redundant code T3.

List of Tables

2.1 The universal genetic code.
2.2 The wobble rules for the universal genetic code.
4.1 The 10 best and worst permutations when n equals 3.
4.2 Summary of the correlations between all variables.
4.3 A summary of the differences between a binary and a quaternary alphabet.
5.1 The relationship between the number of blocks optimised after 400 generations and the value of Rσ.

Chapter 1

Introduction

1.1 The evolution of the genetic code

One hundred and forty years after it was proposed by Darwin, the theory of evolution through random variation and natural selection is more firmly established than ever. The discovery, in the second half of this century, of the molecular basis of heredity was an important consolidation of the theory. In 1953, Watson and Crick elucidated the chemical structure of deoxyribonucleic acid (DNA), revealing a formidable potential for information storage within every living cell and a simple mechanism by which this information could be replicated quickly and very accurately at every cell division.

Briefly, Watson and Crick found that DNA is a molecule made of two strands, each of which consists of a backbone to which chemical bases are attached at regular intervals. These bases come in four varieties: adenine (A), cytosine (C), guanine (G) and thymine (T). The two strands are held together by bonds between opposite bases as shown in Figure 1.1. However, only two types of bonds are possible: A-T and C-G. Since A always pairs with T, T with A, C with G and G with C, the base sequence on one strand is entirely constrained by the sequence found on the other. This complementarity provides a simple mechanism for replicating the molecule: when the two strands are separated, each can be used as a template for the manufacture of a new complementary strand as shown in Figure 1.1. The end result is two molecules identical to the original.

Thirteen additional years were needed to discover how the message stored in the DNA is interpreted by the cell to control the synthesis of all its proteins. At the heart of this process, molecular biologists found what is known as the genetic code, a dictionary specifying which of 20 possible amino acids corresponds to each of the possible combinations of three bases. The genetic code (shown in Table 2.1) specifies, for instance, that triplet TTG stands for the amino acid leucine. Using this dictionary, some sequences of bases along the DNA strand can ultimately be decoded as a sequence of amino acids otherwise known as a protein.
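The complementarity and decoding steps just described are easy to state programmatically. The following Python sketch is illustrative only: the function names and the example sequence are ours, and the dictionary holds just a few of the codon assignments from Table 2.1 (written with T rather than U since we start from DNA).

    # A toy model of complementarity and decoding. Only a handful of
    # codon assignments are included; the full dictionary has 64 entries.
    COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    CODON_TABLE = {
        'TTG': 'leu',                                   # the example above
        'GCT': 'ala', 'GCC': 'ala', 'GCA': 'ala', 'GCG': 'ala',
    }

    def complement(strand: str) -> str:
        """Return the complementary strand (A-T and C-G pairing)."""
        return ''.join(COMPLEMENT[base] for base in strand)

    def translate(strand: str) -> list:
        """Decode a base sequence three bases at a time."""
        codons = [strand[i:i + 3] for i in range(0, len(strand) - 2, 3)]
        return [CODON_TABLE.get(codon, '?') for codon in codons]

    strand = 'TTGGCAGCC'
    print(complement(strand))   # AACCGTCGG
    print(translate(strand))    # ['leu', 'ala', 'ala']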
It was at first thought that this code was universal (Crick, 1968), i.e. that all forms of life used the same dictionary when interpreting the genetic message in their DNA.

[Figure 1.1: A DNA molecule and its replication. The two strands that make up the molecule are progressively separated and, as this happens, the unpaired bases are matched by new complementary bases.]

But more diversified genetic studies showed that this is not exactly true, even though all variations documented so far are minor (see Osawa et al. (1992) for an exhaustive review of the known variants). This undeniable evidence of variation gave strength to the idea that the code itself was, at some point in the past, the object of natural selection and that its quasi-universal version is in some sense superior to possible alternatives.

But what makes one code better than another? We can answer this question first in a more familiar context. The Morse code uses different sequences of ‘–’ and ‘.’ to represent each letter of the alphabet. There is a substantial degree of arbitrariness to the way this assignment is made but at least one design consideration can be identified. Not all letters are coded with the same number of dashes and dots; common letters are assigned short sequences (such as ‘.’ for ‘E’) while rare letters are assigned long ones (‘– –..’ for ‘Z’). This results in shorter messages which are more economical to transmit.

We can think of another consideration which could, under certain circumstances, benefit the Morse code. Supposing that the likelihood of a ‘.’ being mistaken for a ‘–’ is high, we might want to reduce the probability of such errors going undetected, for they could lead to a misinterpretation of the message. In many English words, the transformation of an ‘i’ into an ‘a’ will lead to a different but valid word. The alteration of ‘fit’ into ‘fat’ is one instance where such a change could be the cause of some misunderstanding. On the other hand, a change of ‘i’ into a ‘z’ is unlikely to go unnoticed. It would therefore be sensible to choose for ‘i’ a sequence which is more likely to be corrupted into the sequence for ‘z’ or ‘h’ than into the sequence for ‘a’.

Interestingly, both considerations are relevant to the genetic code. The first consideration is similar to the hypothesis that amino acids which are frequently used in proteins should have more triplets of bases representing them in the code than amino acids which are rare. By this token methionine, which has a single triplet representing it, should be found in proteins in a much lower proportion than leucine which has six. This idea has been around for a while but it seems to have been disproved (Maynard Smith and Szathmary, 1995). The second consideration finds its counterpart in the suggestion that triplets which can easily be mutated into one another, such as GCA and GCC, should code for amino acids with similar chemical properties. The benefit is that a mutation that causes a change of amino acid will be less damaging for the protein because, if the new amino acid resembles the old one, the protein is likely to still be able to perform its function. Sonneborn (1965) first proposed this idea and, although controversial for a long time, it seems to have been established more recently by two publications (Haig and Hurst, 1991; Di Giulio, 1989).
Notice that the property is analogous to, but in a sense the opposite of, the one suggested for the Morse code. In one case corruption of the message should lead to unnoticeable change while in the other it should lead to an easily detectable change.

This thesis will focus on a different feature. There is a large amount of redundancy in the genetic code: many triplets stand for the same amino acid since there are 64 possible triplets but only 20 amino acids to be encoded. This thesis aims to determine whether the way this redundancy is distributed could have had an impact on the ability of evolution to discover better proteins and, more generally, better-adapted life forms.

It is important to understand that selection for a better genetic code is quite different from, say, selection for a better eye design, to take a classic example of complex adaptation achieved by evolution. Enhanced vision will grant an individual some reproductive advantage; a distinct genetic code, in the sense which interests us here, will not. If the genetic code of an organism is suddenly changed, the result is bound to be catastrophic for the organism since newly synthesised proteins will be altered wherever the protein was encoded by a triplet whose meaning has changed. The comparison of interest from our point of view is between organisms which produce exactly the same proteins but use a different code to represent these proteins in their DNA. Under these conditions, both organisms should have the same fitness. However, they will differ in the offspring they produce since the changes caused by mutation will be determined by the code they use. The consequences of such differences can only be appreciated after the passing of some time.

The evolution of the genetic code could be described as a meta-evolutionary problem, similar in this respect to the evolution of sex (Maynard Smith, 1978; Michod and Levin, 1987). In both cases, the feature that is evolving does not directly affect an individual’s fitness but impinges on the relationship that exists between an individual and its offspring. As such, over the course of many generations, it can have significant cumulative effects. A mutation that causes an individual to revert to the asexual mode of reproduction, for instance, does not necessarily affect the ability of this individual to survive and reproduce. But it can be shown that hundreds of generations later, the average fitness of its descendants is much lower than if that mutation had not taken place (Peck et al., 1997).

1.2 Codes and genetic algorithms

So well established is Darwin’s idea that it has been turned into a general purpose problem solving strategy. Algorithms implementing the principles of random variation and selection have been used to design artificial neural networks (Beer and Gallagher, 1991; Boers and Kuiper, 1992), optimise schedules and timetables (Levine, 1996; Monfroglio, 1996; Sridhar and Rajendran, 1996) and predict the tertiary structure of proteins (Schulze-Kremer, 1992). These algorithms are commonly known as genetic algorithms (GAs).

When such algorithms are used to solve problems, two distinct descriptions are needed of candidate solution objects. The first and most natural one describes the object in functional terms and is used to determine how well it solves the problem at hand. The second is used by the genetic operators of mutation and crossover to produce variants of those individuals in the population which are chosen for reproduction.
This duality mirrors the phenotype/genotype distinction so essential to our understanding of life, with the important difference that in GAs this relationship is imposed from outside by the designer of the algorithm. Increasingly, it has been realised that this freedom of choice should be exercised with care since for hard problems the choice of a suitable mapping between genotype and phenotype can critically influence the performance of the algorithm. According to Mitchell (1996), the way in which candidate solutions are encoded is a central, if not the central, factor in the success of a genetic algorithm.

Ideally, a good encoding should allow genotypes representing satisfactory solutions to be reached with high probability from any randomly generated population. But for hard problems, any realistic encoding will result in some genotypes of less than optimal fitness acting as attractors for the population and delaying further evolution for very long periods of time. A good encoding is one for which the fitness of such genotypes is as high as possible.

The choice of an encoding encompasses many issues: the shape of chromosomes (linear or tree-shaped as in Genetic Programming (Koza, 1992)), their length, how to distribute information on them, and many others which can only be discussed in the context of a specific problem. In the case of neural networks, for instance, one must explicitly specify how the connectivity of the network is going to be expressed in a linear form on the chromosome. All these questions deserve thorough investigation. In this thesis, however, we will focus on the aspect of the genotype to phenotype mapping which most resembles the genetic code.

In many GA applications, the first level of interpretation of the genotype is done by parsing the binary string into blocks of predefined size; every such block defines one parameter or variable of the solution and all blocks taken together should give enough information to allow a solution to be evaluated. The size of a block is determined by the number of possible values we want the variable to take; a block of size n allows the encoding of up to 2^n values. The mapping between those 2^n binary values and the possible values of the variable is equivalent to the genetic code in the sense that it is the lowest layer of interpretation of the genotype. Redundancy as we find it in the genetic code can easily be introduced in such mappings.
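The parsing-and-lookup layer just described is simple to sketch. The Python fragment below is an arbitrary illustration of a redundant block mapping, not one of the redundancy patterns studied in later chapters; the block size, meanings and genotype are invented for the example.

    # Decode a binary genotype by parsing it into blocks of size n and
    # mapping each block through a many-to-one table. Eight block values
    # but only four distinct meanings, so some blocks are synonyms
    # (compare the 64 codons and 21 meanings of the genetic code).
    BLOCK_SIZE = 3                      # n = 3, hence 2**3 = 8 block values
    MEANINGS = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

    def decode(genotype: str) -> list:
        """Map each block of BLOCK_SIZE bits to its meaning."""
        blocks = [genotype[i:i + BLOCK_SIZE]
                  for i in range(0, len(genotype), BLOCK_SIZE)]
        return [MEANINGS[int(block, 2)] for block in blocks]

    print(decode('000100111'))          # ['a', 'a', 'd']: 000 and 100 are synonyms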
Furthermore, if redundancy helps evolution in the genetic code, we expect the same kind of redundancy to also improve the performance of a GA. In fact, the benefits that can be expected for a GA should be at least as large as those that might be brought about by redundancy in the genetic code. In the code, redundancy would have had to impose itself by selection. But as will be explained in the next chapter, the ability of the code to change is limited. Besides, the benefits of redundancy might only be felt in the long term, which makes the task of selection difficult. The situation is quite different from a GA point of view. There, codes are imposed by the designer rather than depending on historical contingencies and evolution; nothing prevents the designer from including in the code a property known to be beneficial. The distinction between redundancy in the genetic code and redundancy in GAs will be blurred anyway since GAs will be used as our model of biological evolution.

In Chapter 5 and Chapter 6, several patterns of redundancy will be introduced in the underlying code of GAs designed to solve different problems. These experiments will measure the practical benefits of redundancy for GAs as well as provide an indication of the likelihood of redundancy having been selected for in the genetic code.

1.3 Aims of the thesis

The aims of this thesis are the following:

• To define a theoretical framework within which the impact of redundancy on natural selection and GAs can be investigated clearly.

• To determine, using this framework, whether some forms of redundancy can speed up adaptation when incorporated in the genetic code. The underlying suggestion is that life forms using such a code would have been selected for their greater ability to adapt.

• To establish the conditions under which such beneficial forms of redundancy can be successfully incorporated into a GA.

The first two objectives address issues which are relevant both to the genetic code and to GAs. The third one is purely a GA issue.

1.4 Structure of this thesis

Chapter 2 deals with some rudiments of molecular biology necessary to understand the genetic code and the problems associated with its evolution. Some of the most important theories about its evolution are then discussed prior to an intuitive formulation of why some redundancy in the genetic code could improve the ability of a lineage to adapt. The issue of redundancy is then discussed in the context of codes for GAs, where it has received no attention at all. Finally, we review some work in the field of RNA evolution where neutrality, a possible consequence of redundancy, has been discussed in a way that is relevant to this thesis.

Chapter 3 defines the formal framework within which our study of redundancy takes place. A set containing all possible patterns of redundancy is defined as well as a scalar measure which equates good redundancy with an ability to statistically suppress local, non-global, optima.

Chapter 4 is an exhaustive study of all patterns of redundancy in the case of codes of low dimension. The aim is to identify the features of a pattern which are responsible for suppressing local optima.

In Chapter 5, a few selected patterns of redundancy are included in a GA and tested for their ability to improve its performance. It is shown, on three distinct problems, that a pattern of redundancy which scores high on the measure of Chapter 3 diminishes significantly the time needed by the GA to find solutions of a given fitness.

In Chapter 6, redundancy is added to a GA whose task is to optimise the design of an aeroplane’s wing. It is shown that redundancy is not necessarily beneficial in this case, which leads to important observations about the conditions under which redundancy will or will not speed up evolution.

Chapter 7 summarises the approach and the main contributions of this thesis. It draws conclusions on the validity of the approach for genetic algorithms. It reexamines the question of the origins of redundancy in the genetic code in the light of our findings. Further lines of research are outlined.

Chapter 2

Codes and neutrality in biological and simulated evolution

Section 2.1 is a brief overview of the molecular biology of the gene. This subject is associated with a vast literature and a thriving research effort; considerations of space preclude an exhaustive summary. For further information, the interested reader is referred to such comprehensive accounts as Watson et al. (1987).
Fortunately, the facts relevant to the current argument concerning the genetic code have been established for some time and are no longer the object of controversy.

Section 2.2 deals with more controversial and speculative ideas concerning the origins and evolution of the genetic code. The theme has fascinated theoreticians from the early days of the discovery of the code but a comprehensive picture of the process is still to come. A discussion of the main theories will set the stage for the suggestion that selection for redundancy has been responsible for some of the changes that have taken place in the code.

Section 2.3 explains why this research is relevant to the field of genetic algorithms. We show its place as one aspect of the much larger question of the genetic representation of candidate solutions. Finally, the issue of neutrality in RNA evolution is discussed in Section 2.4. Researchers in that field have expressed an interest in neutrality for some time and we will examine in detail their theoretical approach to the problem.

2.1 The genetic code

2.1.1 From DNA to mRNA

Proteins are not obtained directly from DNA. Instead, in a process known as transcription, DNA is used as a template for the synthesis of a very similar single stranded molecule called ribonucleic acid or RNA. RNA is, like DNA, made of a backbone to which chemical bases attach sequentially. When a section of DNA is transcribed, the RNA molecule that is produced has exactly the same sequence of bases as the template except that the base uracil (U) replaces every occurrence of the base thymine (T) on the DNA.

Transcription happens in the following way. At one or more stages in the cell cycle, an enzyme called RNA polymerase attaches to the DNA molecule and separates both strands over a section of 17 bases. One of the strands is then used as a template onto which complementary nucleotides are attracted by DNA-like base pairing except that A pairs with U instead of T. The opened section of DNA moves along as the RNA molecule is assembled. The beginning and end of this process are controlled in a very precise manner, ensuring that the same mRNA molecules are produced every time.

Because transcription is the main point of control of gene expression, its activation and its rate are regulated by many products collectively known as transcription factors. These factors interfere with transcription in many different ways, either activating it or repressing it. This is a vast and fascinating topic but it is not central to our argument. The interested reader is referred to Gilbert (1994).

Several types of RNA exist which are all obtained by transcription. The majority are intermediary products in the synthesis of proteins and are called messenger RNA (mRNA). Other types exist which will be described in the next section. In eukaryotes, RNA molecules must migrate out of the nucleus into the cytoplasm for protein synthesis to take place.

2.1.2 From mRNA to protein

When the mRNA molecule has reached the cytoplasm, it can be used as a template to build a protein; this process is known as translation. A protein is a chain of amino acids of arbitrary length. The precise nature and order of the amino acids in the chain is what will be dictated by the mRNA. Three main elements are involved in translation: ribosomes, transfer RNAs and the mRNA itself. Ribosomes are roughly spherical particles on which the bond between adjacent amino acids forms.
These particles are very complex assemblies consisting of about one-third protein and two-thirds ribosomal RNA, a form of RNA dedicated to this function and not translated into protein.

Transfer RNAs, or tRNAs, are the main protagonists in the implementation of the genetic code since they mediate the inclusion of a specific amino acid in the protein conditionally on the identity of three bases found on the mRNA. A tRNA performs its function by associating on one end with a specific amino acid and by having three of its bases, known as the anticodon, capable of selectively associating with certain triplets on the mRNA (Figure 2.1). This selective association between codon and anticodon proceeds according to the standard rules of pairing (A with U and G with C) for two of the bases; for the third base of the codon, some special rules apply which are described in detail in Section 2.1.4.

[Figure 2.1: The base sequence and secondary structure of a tRNA. This tRNA will bind with an alanine amino acid on its 3’ end. The anticodon will bind to GCC whenever such a codon is ready to be read on the mRNA. This causes the alanine amino acid to be added to the protein that is being synthesised. The bars between bases show internal pairings of the molecule. The grey circles represent unusual bases obtained from A, C, G or U by chemical modification. One such base, inosine (I), is found at the first position of the anticodon. As explained in Section 2.1.4, it causes this tRNA to recognise codons GCA or GCU as well as GCC.]

Transfer RNAs, loaded with their amino acid, diffuse to the ribosome. The position of the ribosome on the mRNA defines a particular triplet of the mRNA as the one that is currently readable. When a tRNA capable of binding that triplet does so, it releases its amino acid onto the growing protein chain; the ribosomal unit and the protein are then shifted three bases further along the mRNA and the next codon is ready to be read. The ribosome unit has been compared to the head of a tape reader and the mRNA to a tape. In fact, the tape is read simultaneously by several tape units which are some distance apart from each other as shown in Figure 2.2.

[Figure 2.2: A schematic view of the translation of a mRNA into proteins. Ribosomes are moving from left to right. Each of them is at a different stage of the manufacture of the same protein. Adapted from Watson et al. (1987).]

The release of the growing chain of amino acids from the reading mechanism does not wait until the end of the mRNA is reached; it is triggered by the occurrence of some special codons (UAA, UAG and UGA) which are not bound by any tRNA but are recognised by proteins known as release factors. These proteins, as their name indicates, release the ribosome from the mRNA and liberate the completed protein into the cytoplasm.

It is conventional in molecular biology to refer to a base sequence from the 5’ end of the molecule to the 3’ end (these ends can be distinguished on RNA and DNA molecules because the backbone onto which the bases attach is not symmetrical). Codon and anticodon run in opposite directions when pairing together as shown in Figure 2.1. Hence, if both codon and anticodon are described using this convention, the first base of the codon pairs with the third base of the anticodon, the second with the second and the third with the first. Codon 5’– CGA – 3’ will pair with anticodon 5’– UCG – 3’ for instance. If we choose to always represent anticodons with the opposite convention (from 3’ to 5’), checking the compatibility of a codon and an anticodon is easier because we do not need to mentally invert the anticodon. This convention will be used in all that follows.
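The convenience of this convention can be seen in a small sketch. The following Python fragment is ours and purely illustrative; it implements only the standard pairing rules, leaving the third-position wobble rules to Section 2.1.4.

    # With the anticodon written 3' to 5', compatibility can be checked
    # position by position, without mentally reversing either sequence.
    PAIRS = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}

    def pairs_with(codon: str, anticodon_3_to_5: str) -> bool:
        """True if every codon base pairs with the aligned anticodon base."""
        return all(PAIRS[c] == a for c, a in zip(codon, anticodon_3_to_5))

    # Codon 5'-CGA-3' pairs with anticodon 5'-UCG-3', which is written
    # GCU in the 3' to 5' convention adopted here.
    print(pairs_with('CGA', 'GCU'))     # True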
For translation to be reliable, it is essential that a tRNA is always loaded with the same amino acid. If, for instance, the tRNA represented in Figure 2.1 is loaded with an amino acid other than alanine, some GCC codons will be misread, resulting in some proteins having an altered structure. The high reliability of the loading is made possible by the existence of enzymes, called aminoacyl-tRNA synthetases, whose function it is to recognise a tRNA and attach the correct amino acid to it.

The consistent translation of a codon into the correct amino acid is therefore the joint responsibility of the tRNAs and the aminoacyl-tRNA synthetases. Both these molecules have their structure specified somewhere in the DNA of the organism: transfer RNAs are obtained by transcription only while aminoacyl-tRNA synthetases are proteins and therefore obtained by transcription and translation. The potential therefore exists for these molecules to change by mutation: a tRNA could have its anticodon changed while still carrying the same amino acid, or an aminoacyl-tRNA synthetase could be altered and recognise a different type of tRNA. Either type of alteration would cause a change of meaning for some codon.

2.1.3 Redundancy and neutrality in the genetic code

The universal genetic code is the look-up table which summarises the meaning of all 64 possible triplets of bases as they are interpreted in translation by most living beings. It is pictured in Table 2.1. The first and third base define a line of the table while the second one defines a column. At the intersection of the two, the desired amino acid is found.

Table 2.1: The universal genetic code. The amino acids (and their abbreviations) are: phenylalanine (phe), serine (ser), tyrosine (tyr), cysteine (cys), leucine (leu), tryptophan (trp), proline (pro), histidine (his), arginine (arg), glutamine (gln), isoleucine (Ile), threonine (thr), asparagine (asn), lysine (lys), methionine (met), valine (val), alanine (ala), aspartic acid (asp), glycine (gly) and glutamic acid (glu).

    1st |   U     C     A     G    | 3rd
    ----+--------------------------+----
     U  |  phe   ser   tyr   cys   |  U
     U  |  phe   ser   tyr   cys   |  C
     U  |  leu   ser   stop  stop  |  A
     U  |  leu   ser   stop  trp   |  G
     C  |  leu   pro   his   arg   |  U
     C  |  leu   pro   his   arg   |  C
     C  |  leu   pro   gln   arg   |  A
     C  |  leu   pro   gln   arg   |  G
     A  |  Ile   thr   asn   ser   |  U
     A  |  Ile   thr   asn   ser   |  C
     A  |  Ile   thr   lys   arg   |  A
     A  |  met   thr   lys   arg   |  G
     G  |  val   ala   asp   gly   |  U
     G  |  val   ala   asp   gly   |  C
     G  |  val   ala   glu   gly   |  A
     G  |  val   ala   glu   gly   |  G

As there are sixty-four different codons but only twenty-one different meanings to be expressed (twenty amino acids and the stop signal), some codons must share the same meaning. The code is said to be redundant. Some amino acids, such as serine, leucine, and arginine, have up to six different triplets coding for them.
The smallest change that can take place in the genetic message is the alteration of one base into another, also called a point mutation. Such events are rare enough to render negligible the probability that more than one base is altered at a time in any given triplet. Furthermore, mutations are, in first approximation, equally likely to change any base into any other. It is therefore natural to think of two triplets that differ in only one of their three bases as neighbours. Every triplet has nine neighbours since three alterations are possible for each of the three bases. Consider for instance triplet ACG. Mutation of the first base can lead to CCG, GCG and UCG, which are found occupying the same position as ACG, at the bottom of the three other boxes of the same column. Mutation of the second base leads to AAG, AGG and AUG, which are found in the three other columns on the same line of the table. Mutations of the third base lead to ACA, ACC and ACU, which are found in the same box as ACG.

Mutations in the first base can very occasionally be neutral, as is the case with UUA ↔ CUA which both code for leucine. Mutations in the second base are never neutral with the exception of UAA ↔ UGA which are both stop codons. Mutations in the third base are very often neutral. Indeed, triplets with synonymous meanings are not randomly scattered in the table but tend to appear within the vertical boxes that demarcate a fixed choice of the first two bases and an arbitrary value for the third. For instance, codons in the bottom left corner all start with GU and code for valine. As a result, whenever a codon starts with GU, a mutation at the third base will be neutral. Family boxes, as these groups of synonymous codons are called, are found for codons starting with UC, CU, CC, CG, AC, GU, GC and GG. They make up half the total number of triplets.
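The neighbourhood structure just described is easy to enumerate. The sketch below is illustrative: its partial code dictionary contains only the valine family box and the two leucine codons mentioned above; with all 64 entries of Table 2.1 the same functions would report the full pattern of neutral point mutations.

    BASES = 'UCAG'
    # A deliberately partial excerpt of the genetic code.
    CODE = {'GUU': 'val', 'GUC': 'val', 'GUA': 'val', 'GUG': 'val',
            'UUA': 'leu', 'CUA': 'leu'}

    def neighbours(codon: str) -> list:
        """The nine triplets differing from `codon` at exactly one base."""
        return [codon[:i] + b + codon[i + 1:]
                for i in range(3) for b in BASES if b != codon[i]]

    def neutral_neighbours(codon: str) -> list:
        """Neighbours with the same meaning under the (partial) code."""
        meaning = CODE.get(codon)
        return [n for n in neighbours(codon)
                if meaning is not None and CODE.get(n) == meaning]

    print(len(neighbours('ACG')))       # 9
    print(neutral_neighbours('GUU'))    # ['GUC', 'GUA', 'GUG']
    print(neutral_neighbours('UUA'))    # ['CUA']: the rare first-base case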
2.1.4 The wobble rules

As the deciphering of the code was coming to an end, it became clear that there wasn’t a different tRNA molecule for every codon in the table; some tRNAs could pair with more than one type of codon. For instance, a tRNA for alanine was shown to respond well to GCU, GCC, GCA but little if at all to GCG (Nirenberg et al., 1966). This observation and others of the same kind prompted Crick (1966) to propose the wobble hypothesis. Crick suggested that codon-anticodon pairing at the first two positions of the codon obeys the traditional base pairing (G with C and U with A) but that pairing at the third position is less discriminative and follows a special set of rules displayed in Table 2.2 and known as the wobble rules. The reasons for this more fragile bond have since been explained in molecular terms (Pluhar, 1994).

Table 2.2: The wobble rules for the universal genetic code.

    First anticodon base   Compatible third codon base
            G                        U, C
            C                        G
            A                        U
            U                        A, G
            I                        A, U, C

This table shows that only A and C obey the traditional pairing. G, U and I (a base called inosine which is only found in tRNA) are not as discriminative and will cause a tRNA to accept several options as the third letter of a codon. In particular, since inosine at the first position of an anticodon pairs with A, U or C at the third position of the codon, this could account in part for the common pattern where the third letter of the codon is irrelevant. It was indeed shown that tRNAs with inosine at the first position of the anticodon exist for each of the eight family boxes. But this accounts only for three out of the four synonymous codons in a box. For a codon ending in G to have the same meaning as the three others, there must be another tRNA with a C at the first anticodon position, the same two letters at the other two positions of the anticodon, and carrying the same amino acid. If that tRNA carries a different amino acid, then a different meaning is possible for the codon ending in G, as is the case when AUG codes for methionine while AUU, AUC and AUA code for isoleucine.

The wobble rules also account for the common case where, within a box defined by a choice of the first two letters, the triplets ending in U and C have one meaning while those ending in A and G have a different one. Codons UUU and UUC, for instance, code for phenylalanine while UUA and UUG code for leucine. The wobble rules show that this pattern can be the result of a tRNA with anticodon AAG binding either UUU or UUC and another tRNA with anticodon AAU binding either UUA or UUG.

2.1.5 Which codes are possible given the wobble rules?

The wobble rules set some limitations to the power of discrimination of tRNAs at the third position of the codon. No base at the first anticodon position will pair exclusively with C, for instance. Bases I and G will pair with C but they also pair with U. Hence whenever an amino acid is associated with a triplet ending in C, the same amino acid is also associated with the triplet starting with the same two bases and ending in U. Similarly, no base is capable of pairing only with A at the third position. Either A is recognised together with G (base U in the anticodon) or it is recognised together with U and C (base I in the anticodon). A triplet XYA is thus either synonymous with XYU and XYC, or it is synonymous with XYG.

Triplet UGA appears to be an exception to this rule since it has a different meaning than all other triplets starting in UG. This is only possible because UGA is a stop codon and does not have an associated tRNA. There is a tRNA with anticodon ACG which recognises UGU and UGC and another with anticodon ACC which recognises UGG only. No tRNA will bind to UGA, which is why it acts as a stop codon. The differentiation of a triplet XYA from other triplets XYZ is therefore only possible if it results from the absence of a tRNA recognising XYA, i.e. if it is a stop codon. A triplet ending in C could also differentiate itself from the one ending in U if it was a stop codon but that case is not observed in the genetic code.

To summarise, there are six possible configurations for a box of 4 codons which share the first two letters. If a and b denote some arbitrary amino acids, these are

    XYU   a   a   a    a      a      a
    XYC   a   a   a    a     stop   stop
    XYA   a   a   b   stop    b     stop
    XYG   a   b   b    b      b      b

Only the first four of these configurations are observed in the genetic code. In any case, no more than two distinct amino acids can coexist in a box, which means that the 64 triplets could not possibly code for more than 32 amino acids. A large share of the redundancy in the code is therefore simply a consequence of the wobble rules. The amount of redundancy that could be accounted for by any other explanation is therefore much smaller than a superficial examination would let us believe.
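Table 2.2 can be turned into a small generator of the codons a given tRNA reads. The sketch below is ours and assumes the 3’ to 5’ writing convention adopted in Section 2.1.2, under which the “first anticodon base” of Table 2.2 is the third written base.

    PAIRS = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}    # standard pairing
    # First anticodon base -> compatible third codon bases (Table 2.2).
    WOBBLE = {'G': 'UC', 'C': 'G', 'A': 'U', 'U': 'AG', 'I': 'AUC'}

    def codons_read(anticodon_3_to_5: str) -> list:
        """All codons bound by a tRNA with this anticodon."""
        first, second, wobble_base = anticodon_3_to_5
        return [PAIRS[first] + PAIRS[second] + third
                for third in WOBBLE[wobble_base]]

    print(codons_read('CGI'))   # ['GCA', 'GCU', 'GCC']: the alanine tRNA of Figure 2.1
    print(codons_read('AAU'))   # ['UUA', 'UUG']: the leucine pattern above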
2.1.6 The underlying causes of the wobble rules

We have been unable to find in the literature any discussion of the underlying causes of the wobble rules. We can therefore only speculate as to what these causes might be. We can think of three different types of explanation. The first one is that these rules are a constraint arising from the underlying laws of RNA chemistry; in other words, no RNA molecules exist that are capable of doing all the things a tRNA does and in addition obeying the normal rules of pairing at the third position of the codon. This explanation is unlikely given how versatile RNA molecules are in their function.

The second possible explanation is that the wobble rules are the result of selection. That is, organisms endowed with tRNAs capable of discriminating as well at the third position as at the other two would have been at some disadvantage against those organisms whose tRNAs respected the wobble rules. We cannot think of any good reason why such selection would happen. Furthermore, if that explanation were right, it should be possible to mutate the existing tRNAs in such a way that they violate the wobble rules. As far as we are aware, no such tRNA has ever been found.

The third explanation, which we favour, is closer to the first one than to the second. It suggests that there exist RNA molecules which could perform as tRNAs without the limitation of the wobble rules. However, these improved tRNAs might be far away from the current cloverleaf structure common to all tRNAs. Evolution would have taken the molecular machinery of the code down a path where tRNAs are stuck in a local optimum with respect to their ability to discriminate at the third position. Moving away from the wobble rules would at this stage require changes that evolution cannot perform. Furthermore, it is unclear anyway whether there would be some benefit from breaking free of the wobble rules.

2.2 Evolution of the genetic code

2.2.1 Frozen accident versus stereochemical theory

Once the meaning of all triplets was identified, reading in the code something about its origins became a natural concern. Crick (1968) made one of the earliest contributions to
the debate with his suggestion of a frozen accident. Remarkably, many points raised in this article are still relevant today. Crick uses the term accident to contrast his theory with the so-called stereochemical hypothesis (Woese, 1965). The latter claimed that the relationship between the anticodon of a tRNA and the amino acid it carries is not arbitrary but extends some kind of natural affinity that exists between a codon and its associated amino acid. This affinity would have been at the origins of the code at a time when tRNAs were not yet available to perform their translating function. The stereochemical hypothesis has two attractive features. First, it proposes an explanation for how the code could have originated in the absence of tRNAs. Secondly, it makes the universality of the code a necessity since chemistry would have shaped it.

To this hypothesis, Crick opposes evidence, confirmed since, that the anticodon of a tRNA can be changed without changing the type of amino acid it accepts. There is therefore nothing absolute in the association between codon and amino acid, at least not in the current form of the code. The variations of the genetic code which will be discussed later are another tangible proof of this fact (Osawa et al., 1992). It could still be argued that an affinity existed between some codons and some amino acids in some very early version of the code which disappeared as tRNAs became more sophisticated. Yet, no evidence supporting this hypothesis has been found.

Crick suggests that the coding was arbitrary right from the start and that it slowly changed to allow the introduction of new amino acids. But such changes, he argues, could only take place very early in the history of life. Only very crude proteins could withstand the consequences of a change in the meaning of a codon. As soon as proteins became too precise in their function, they lost the ability to cope with such widespread disruption and the code became frozen.

Crick argues that the primitive genetic code would have encoded a smaller number of amino acids since many of them would have been unavailable to start with. This smaller number would have made it possible for one or two bases to be enough to code them instead of the present three. But if the translation process moved along by steps of one or two bases on the mRNA instead of the current three, transition to the later system would have been impossible without completely scrambling the existing message. On the other hand, it is possible that the early code proceeded by steps of three bases but interpreted only the first two bases, ignoring completely the third one. This scenario is supported by the irrelevance of the third base in half of the cases in the present genetic code. Furthermore, although Table 2.2 states that U at the first position of the anticodon pairs with A or G, it has been shown that the base U has to be chemically modified to behave in this way. Left unmodified, U at the first position will pair with any of the four bases including itself. Such totally indiscriminate pairing occurs in the code of mitochondria (Heckman et al., 1980) with the effect that a single tRNA is enough to decode the four codons in a family box. Jukes (1981) also argued in favour of such “two out of three” pairing as the norm in the early genetic code.

Crick points out that the sophistication of an early, simple code by the introduction of new amino acids is likely to lead to the situation where chemically similar amino acids are encoded by similar codons. The reason is that when a new amino acid becomes the new meaning of a codon, the change is much more likely to be tolerated by all the affected proteins if the new amino acid is not too different from the old one. Sonneborn (1965) was the first to suggest that the code might indeed be such that chemically similar amino acids are nearby in the table. His explanation however was different; he suggested that this property had been positively selected because it creates a situation where the effects of mutation are minimised. Crick’s explanation is also in a sense about selection, although a more immediate form of it: all the codes which do not have the property are immediately eliminated.
In the scenario imagined by Sonneborn, codes which do not have that property came into existence but were eliminated in the long run because organisms using them suffered more damaging deleterious mutations.

2.2.2 The genetic code is not universal

Contrary to what was thought in the early days, the genetic code is not universal. However, none of the variants is very different from the "universal" code and all encode exactly the same amino acids. The code used by the mitochondria of yeast is the most different, with six codons having a different meaning. Generally speaking, larger deviations from the universal code are found in mitochondrial codes than in nuclear ones. Figure 2.3 represents the changes known in the codes of nuclear genomes while Figure 2.4 represents changes in the codes of mitochondria. Most people still regard these variants as exotic exceptions. But Osawa et al. (1992) argue that even more variants might be discovered as a larger proportion of the 10 million species come under scrutiny. But even the limited amount of variation that is already known contradicts Crick's idea that the code cannot change in a sophisticated life form. So how can we explain that such changes occurred without killing the organisms that fostered them?

According to Osawa et al. (1992), mutation pressure is the answer. In the nuclear genome of most species, the fraction of bases which are either G or C deviates significantly from the expected value of 50%. This does not seem to be the result of selection for amino acids coded by triplets containing such bases. In eubacteria, species can be found with a G+C content anywhere between 25% and 75%. The favoured explanation for a high G+C content is that mutations from A or T to G or C happen more frequently than in the other direction, the reverse being true of a high A+T content. This bias in the mutation process is the result of factors internal to living systems, most probably the type of errors made at the time of DNA replication. Evidence for this can be found in the comparison of the G+C content of the genome of Escherichia coli with that of the genomes of bacteriophages that infect that bacterium. A correlation exists between the G+C content of the two when the virus uses the bacterium's replicating machinery for the replication of its own genome; but this correlation is not observed for viruses which have their own replicating machinery.

[Figure 2.4: The evolution of the genetic code in non-plant mitochondria, shown as a tree relating vertebrates, insects, molluscs, echinoderms, nematodes, platyhelminths, coelenterates, Paramecium, protosymbionts, moulds, Torulopsis, Saccharomyces and green plants to the universal code. R stands for A or G; N stands for any base. The changes are (1) UGA: stop→trp; (2) ACR: thr→ser; (3) AUA: ile→met; (4) AAA: lys→asn; (5) UAA: stop→tyr; (6) CUN: leu→thr; (7) AGR: arg→stop; (8) CGN: arg→noncoding; (9) AUA: met→ile. The point of change (3) is not definite. From Osawa et al. (1992).]

This mutation pressure leaves its mark with more or less intensity on functionally different parts of the genome. In non-coding parts of the genome, the pressure is practically unresisted; in genes coding for proteins, it is resisted only marginally more; in genes coding for tRNA and ribosomal RNA it is more strongly resisted but is nonetheless observable.
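As a toy illustration (ours, with invented rates) of how such directional pressure fixes the equilibrium composition regardless of the starting point, consider a two-state model in which an A/T site becomes G/C with probability $u$ per unit time and a G/C site becomes A/T with probability $v$:

    # Toy model (invented rates, for illustration only): directional mutation
    # pressure on a two-state site.  An A/T site becomes G/C with probability
    # u per unit time; a G/C site becomes A/T with probability v.
    u, v = 0.03, 0.01      # bias towards G and C
    gc = 0.25              # initial G+C content
    for _ in range(500):
        gc = gc + (1 - gc) * u - gc * v
    print(round(gc, 3), u / (u + v))   # both 0.75: the bias alone sets the composition

At equilibrium the gain $(1 - \mathrm{gc})\,u$ balances the loss $\mathrm{gc}\,v$, giving a G+C content of $u/(u+v)$ whatever the initial value; the observed 25%-75% range would then correspond to different bias ratios.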
Among genes coding for proteins, the third base of the codons is the one whose G+C content has the highest correlation with the G+C content of the genome as a whole. Given that many mutations at this position are neutral, as pointed out earlier, a high proportion of biased mutations at this position are passed on to future generations. The first position of the codon shows a smaller correlation and the second position an even smaller one. This is consistent with the idea that mutation bias is felt more strongly where the mutations are neutral.

Strong mutation bias combined with the neutrality of the code will lead to some codons becoming completely or almost completely unused. In the mitochondria of the yeast Torulopsis glabrata, for instance, codons of type CGN (N being any base) have disappeared altogether, as has any tRNA capable of reading them (Figure 2.4). Arginine is now always encoded as AGA or AGG. This, as Osawa et al. (1992) explain, sets the stage for a nearly harmless codon reassignment. Indeed, if a new tRNA (probably obtained by gene duplication) appears which can read that codon, there is no alteration of existing proteins and the new codon can come back into usage in the course of further evolution. Note that for this to be possible the codon must not share its tRNA with another codon which is still in use. We now have a mechanism by which changes in the genetic code can take place even in complex life forms, contrary to what Crick thought. It is therefore easier to imagine that at some point in the past variation was found in codes on which selection could have acted.

2.2.3 Adaptive forces shaping the genetic code

Can adaptive forces be invoked for the present shape of the genetic code? There are reasons to be cautious here. We have just seen that mutation pressure can eliminate the disastrous consequences of changing the meaning of a codon. However, reliance on mutation pressure for such changes implies that very long time scales will be necessary for variation to be generated on which selection can act. In addition, the benefits to be gained from changes in the code can themselves only be felt in the long term. A change in the code is not beneficial to the organism (we have just seen that we are lucky if it is neutral), but only to the organism's lineage, if it confers on it some kind of improved evolvability. If a new amino acid is introduced into the code, for example, an enormous number of new possibilities opens up for evolution. But the benefits only become tangible once improved proteins have evolved which use the new amino acid. Selection is therefore possible but it has to be a very slow and inefficient process.

After this necessary warning, we now look at a feature of the code for which adaptive explanations have long been put forward. Sonneborn (1965) suggested very early on that neighbouring codons tend to code for chemically similar amino acids. He claimed that this property was not accidental but that the code had adapted to minimise the effect of mutations. Crick (1968) and Woese (1967) rejected the idea, not on the grounds that an adaptive hypothesis was untenable but because they thought that the selective advantage of minimising mutation would be too small. As we saw in Section 2.2.1, Crick (1968) proposed a different explanation for this property.
Wong (1980) complicated the debate by arguing that the code was not in fact optimal with respect to the property put forward by Sonneborn and that it could be improved by a series of minor rearrangements. Swanson (1984) pointed out that Sonneborn's property would also result in limited damage when a tRNA pairs by accident with a similar but incorrect codon. If similar amino acids are encoded by similar codons, it is also the case that the occasional misreading of a codon by an unsuitable tRNA will result in the substitution of an amino acid by a similar one. This additional benefit strengthened the case for an adaptive origin. More recently, Haig and Hurst (1991) re-examined how well mutation effects are minimised in the current code. They compared the average chemical distance between amino acids encoded by neighbouring codons for the universal genetic code and for 10,000 randomly generated codes, all of them with the same amount of neutrality. Out of these, only 2 had a lower average distance than the genetic code. This indicates that Sonneborn's property is highly optimised in the universal code. Yet the part played by selection in this state of affairs remains difficult to assess. Maynard Smith and Szathmary (1995) consider selection the most likely explanation. This example illustrates well the difficulty of reaching a definite answer on such issues. When a beneficial feature is postulated for the code, its benefits can be difficult to quantify and balance against the small potential for change. The part played by historical accidents is also very difficult to assess.

2.2.4 An adaptive hypothesis for neutrality in the genetic code

The hypothesis that this thesis puts forward is that some forms of redundancy cause the evolutionary search for optimal sequences to go faster than others. Furthermore, we postulate that one of the relevant features of redundancy in this respect is the number of neutral mutations it defines. In what follows we give an intuitive argument as to why this could be so.

Consider a codon CCC which becomes CCG after being the object of a mutation. This mutation is neutral since both CCC and CCG code for proline. Consider now the effect of another mutation that changes the middle base of both CCC and CCG to A. At the amino acid level, the transition CCC → CAC leads from proline to histidine, while CCG → CAG leads from proline to glutamine. We conclude that the neutral mutation has created the conditions for a diversity of outcomes in the face of subsequent mutations.

Consider now a sequence $S$ that is $M$ triplets long; most of those triplets can be the object of neutral mutations such as the ones discussed above. We call $H_S$ the set containing all sequences accessible from $S$ by neutral mutations only. By definition, all sequences in $H_S$ have the same phenotype. Until a sequence of higher fitness is found, a population made of identical copies of $S$ will freely drift across $H_S$ since nothing opposes change within $H_S$. As with CCC and CCG in the previous paragraph, nothing seems to happen while the population moves around $H_S$ since all sequences have the same phenotype $P_S$; but the conditions are being created for a greater diversity of outcomes in the face of subsequent non-neutral mutations. This diversity can be quantified in $I_S$, the set of sequences which are exactly one non-neutral mutation away from elements of $H_S$. $I_S$ forms an enclosing envelope around $H_S$ which constitutes a forced passage out of it.
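For short sequences, $H_S$ and $I_S$ can be enumerated exhaustively. The following sketch (ours, not from the thesis) does so for a single codon, using a hypothetical four-codon fragment of the standard genetic code; it reproduces the proline example above, with $I_S$ containing both histidine and glutamine codons:

    # Minimal sketch (ours): enumerate H_S and I_S for a short sequence under
    # an illustrative fragment of the standard genetic code.
    CODE = {"CCC": "Pro", "CCG": "Pro", "CCA": "Pro", "CCU": "Pro",
            "CAC": "His", "CAU": "His", "CAG": "Gln", "CAA": "Gln"}
    BASES = "ACGU"

    def translate(seq):
        # None marks a codon with no meaning in this toy fragment
        return tuple(CODE.get(seq[i:i+3]) for i in range(0, len(seq), 3))

    def point_mutants(seq):
        for i, old in enumerate(seq):
            for b in BASES:
                if b != old:
                    yield seq[:i] + b + seq[i+1:]

    def H_and_I(start):
        # Flood fill over neutral mutations gives H_S; the sequences one
        # non-neutral mutation away from H_S (restricted here to those that
        # still have a defined meaning) form the envelope I_S.
        target = translate(start)
        H, I, frontier = {start}, set(), [start]
        while frontier:
            seq = frontier.pop()
            for m in point_mutants(seq):
                if translate(m) == target:
                    if m not in H:
                        H.add(m)
                        frontier.append(m)
                elif None not in translate(m):
                    I.add(m)
        return H, I

    H, I = H_and_I("CCC")
    print(sorted(H))                   # ['CCA', 'CCC', 'CCG', 'CCU']
    print({translate(s) for s in I})   # {('His',), ('Gln',)}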
The pattern of neutrality in the code will have an effect on $H_S$ and indirectly on $I_S$. If a pattern of neutrality leads to a high number of different phenotypes in $I_S$, more phenotypes are reachable from $S$ and the likelihood of getting trapped in a local fitness maximum is reduced. However, the relationship between the fraction of possible mutations which are neutral and the variety of phenotypes in $I_S$ is not straightforward. As the fraction of neutral neighbours increases, the size of $H_S$ increases and hence the size of $I_S$ as well. But this increase in neutrality results in more of the sequences of $I_S$ having the same phenotype; and since we are interested in the variety of phenotypes found in $I_S$, the overall effect might be negative. An extreme example would be that both $H_S$ and $I_S$ are very large but redundancy is so ubiquitous that all sequences in $I_S$ have the same phenotype. Only one phenotypic transition would be possible from $H_S$ in this case. We conclude that neutrality should not simply be maximised. In any case, Chapter 3 will show that the fraction of neutral neighbours is not the only parameter relevant to the problem.

Before molecular data became widely available, evolutionary theory did not pay any attention to the possibility of neutral change. It took two articles by Kimura (1968) and King and Jukes (1969) to open the eyes of the Darwinian establishment to the fact that most of the changes at the molecular level are neutral. This revelation inspired the following comment from Sewall Wright:

    Changes in wholly nonfunctional parts of the molecule would be the most frequent ones but would be unimportant unless they occasionally give a basis for later changes which improve function in the species in question (Provine, 1986, p. 474).

This is essentially the same point as the one we have made. In order to be complete, an argument for neutrality as an adaptation of the genetic code would have to:

• take into account the wobble rules,
• show that varying the pattern of neutrality of the code affects the evolutionary process,
• show that the current pattern of neutrality in the genetic code is optimal or at least locally optimal,
• indicate how an upwards path to that optimal pattern could have taken place from an early version of the code.

Only the first point will be fully addressed in this thesis. We will then examine the consequences of this fact for the practice of GAs. As explained in the introduction, the relevance of the first point for GAs is not dependent on the outcome of the other points.

2.3 Codes in Genetic Algorithms

Codes or encodings, as they are sometimes called in the GA literature, can be broadly defined as the symbolic manipulations through which genetic information is translated into a more convenient description of the candidate solution. Whereas the genetic code only addresses the very first layer of interpretation of the genome, GA encodings refer to the symbolic manipulations going all the way up to the full definition of a solution. These encodings are therefore sometimes called genotype to phenotype mappings. Because they cover the whole transformation from genotype to phenotype, encodings have a strong impact on the performance of GAs.
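To fix ideas, here is the simplest kind of genotype to phenotype mapping (a minimal sketch of our own, using the standard base-2 decoding described later in Section 2.3.3): the bit-string genotype is cut into fixed-size segments and each segment is read as one real-valued parameter of the phenotype.

    # Minimal sketch (ours): decode a bit-string genotype into real-valued
    # phenotype parameters.  Each p-bit segment is read as an integer and
    # scaled into the interval [x, y].
    def decode(genotype, p, x, y):
        assert len(genotype) % p == 0
        return [x + int(genotype[i:i+p], 2) * (y - x) / (2**p - 1)
                for i in range(0, len(genotype), p)]

    print(decode("0000111111110000", 8, -1.0, 1.0))
    # [-0.882..., 0.882...]: two parameters decoded from sixteen bits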
2.3.1 Importance of the genotype to phenotype mapping for GAs

Search space

Solving a problem with a genetic algorithm is, unlike biological evolution, a goal-oriented process where fitness measures an ability to solve a predefined problem. Hence, the evolutionary process has to be constrained to operate within boundaries compatible with some preconceived idea about the structure of a solution. Before we decide on the encoding, we should thus have a set of objects A among which we expect to find some good solutions to our problem and a fitness function which, applied to any element of A, returns a real-valued number measuring its quality as a solution. A is therefore the set of phenotypes to which we want to restrict the evolutionary process. This leads to two considerations in the choice of a genotype to phenotype mapping:

• genotypes should mostly represent elements of A,
• almost all elements of A should be represented by some genotype.

In the case where we have some additional information about which objects in A are likely to yield good solutions, we have the possibility of biasing the encoding by allowing more genotypes to represent those objects which we think are likely to perform well.

Mutation

In standard genetic algorithms, genotypes are bit-strings, and mutation alters a genotype by changing each bit with a low probability. Therefore, from any given genotype, G, mutation produces genotypes which differ from G at a few positions. The expectation is that this type of transition will keep producing improved solutions until an acceptable solution is found. Whether this is the case or not depends on the phenotypic consequences of mutation, which will be determined by the genotype to phenotype mapping. Or, as Wagner and Altenberg (1996) put it:

    What turns out to be crucial to the success of the evolutionary algorithm is how the candidate solutions are represented as data structures ... The process of adaptation can proceed only to the extent that favorable mutations occur, and this depends on how genetic variation maps onto phenotypic variation.

However, the relationship between a phenotype and those that can be reached from it by point mutation offers only a limited understanding of the evolutionary process. Mutation will rarely be used on its own in the generation of new individuals. It is usually used in conjunction with crossover, which complicates the issue significantly. Also, as said above, a parent will differ from its offspring by a number of mutations that is Poisson distributed. However, if instead of entire chromosomes we look at small sections of them as they travel down the generations, it is indeed the case that point mutation is the main source of change: a small section of chromosome is unlikely to be broken down by recombination or altered by more than one mutation at the same time.

Crossing-over

The rationale behind the use of the crossover operator is that it can combine in a single individual good schemas which only exist in different individuals of the population. For this to favour the discovery of even better solutions, good schemas must combine gracefully. High epistasis describes the situation where schemas behave very differently depending on the genetic information found elsewhere on the chromosome. Low epistasis, on the other hand, makes it possible to define good schemas as those which increase the fitness of a chromosome regardless of the genetic information contained elsewhere.
To give an example, epistasis would be at its minimum in a situation where the fitness of a genotype was proportional to the number of 1s it contains. Epistasis is a joint property of the problem and its encoding. An encoding can potentially rearrange the candidate solutions in any arbitrary way at the genetic level. That is, any problem could in theory be encoded in such a way that changing a 0 into a 1 anywhere on the chromosome would produce an increase in fitness. In practice, however, such an argument is useless because producing such an encoding implies that we order all possible phenotypes by fitness and assign them to conveniently chosen genotypes. If we can do that, there is no point in running a GA to solve the problem any more. More realistic encodings can have an effect on the epistasis, but only a limited one. The more information one has about the structure of the problem, the more able one is to produce an encoding that reduces epistasis.

2.3.2 Existing work

Despite its recognised importance, the issue of representation is poorly investigated, as pointed out in the latest survey of the field (Mitchell, 1996). The reason is probably that it is a very hard issue that cannot be examined in complete abstraction from the problem to which one is applying the GA. Indeed, an examination of the literature shows that a constant preoccupation for practitioners of GAs is to tailor the encoding to the peculiarities of their problem.

Encoding neural networks

Neural network design is the application of GAs where the issue of encoding has received most attention. Early attempts focused on nets of fixed architecture and only allowed the weights of the connections to evolve (Montana and Davis, 1989). Some researchers then extended the evolutionary search to the topology of the network. Miller et al. (1989), for instance, encoded the connectivity matrix of networks whose number of nodes was fixed. Harvey et al. (1992) proposed another encoding which allowed the number of nodes as well as the number of connections to vary. Every connection is encoded using 7 bits which describe the nature of the connection (excitatory or inhibitory) and its destination. The origin of the connection is implicit from the position of the segment on the chromosome; all connections coming out of a node are grouped together on the chromosome. Kitano (1990) criticised such encodings on two grounds. Firstly, the chromosome would have to get larger and larger in order to cope with larger and larger networks. Secondly, these encodings do not favour the generation of repeated structures, which he thought would be helpful in most problems. He proposed, as an alternative, to use the chromosome to encode a set of production rules that could produce topologies whose size is not correlated to the size of the rules. This approach also happened to be more in tune with the way neural connections are specified by genetic information in biology. Many other proposals followed which all relied on encoding a recipe to produce the network rather than a blueprint of it (Gruau, 1992; Boers and Kuiper, 1992; Husbands et al., 1994; Dellaert and Beer, 1994).

Despite the large number of proposals made, no clear picture has emerged of the best way of genetically representing neural networks for fruitful manipulation by the GA. The main problem is a lack of comparative work between those approaches. But comparative work would not be easy, for two reasons.
Firstly, although the end product is in all cases a network, the encoding is often tailored with a particular task in mind. Hence it would be difficult to decide on which task the comparison should take place. Secondly, those encodings differ a lot from each other, so that even if some were found to be better than the others it would be very difficult to identify the exact causes of the difference. It is almost certain, however, that the design of neural networks by GAs could be improved by a better understanding of encoding issues. It might be, though, that the problem is too complex to provide the right context in which to carry out informative comparisons of coding strategies.

Comparing coding strategies

In one of the few studies of its kind, Caruana and Schaffer (1988) compared standard binary encoding to Gray encoding on a set of test functions. They showed that Gray encoding outperforms standard binary encoding on most functions. In binary encoding, some consecutive integers have genetic representations which are very far apart. For instance, 7 and 8 are encoded by 0111 and 1000 respectively. Gray codes, on the other hand, have been designed to avoid such cliffs: consecutive integers will be represented by binary strings whose Hamming distance is 1. Because the functions on which the comparison was carried out were quite regular when examined at the numerical level, Gray encoding preserved that smoothness better than standard binary encoding. For functions with a rugged structure at the numerical level, Gray encoding would not perform particularly well. Whichever way we choose to represent integers, some transitions will be facilitated at the expense of others. But Caruana and Schaffer point out that there will be as many functions for which standard binary encoding outperforms Gray encoding as there are functions for which the reverse is true. The superiority of Gray encoding only stems from the fact that functions of interest are more likely to fall in the second category.

Neutrality in GAs

Neutrality and neutral drift are not typically regarded as relevant issues in the field of genetic algorithms. Harvey (1993) has argued against this on the basis that in most real-world GA applications, neutral or nearly neutral paths will exist which the population will not fail to explore. The author illustrates this with a detailed genetic study of a population of neural networks under selection for their ability to guide a robot to the centre of a room. It is shown that despite its high degree of convergence, the centre of mass of the population moves around at a rate suggesting a large amount of neutral drift. The issue of neutrality has, however, not been discussed by Harvey in connection with codes.

[Figure 2.5: Mutation and the travelling salesman problem. A point mutation in the binary representation turns the order of visit 75431260 into 75531260, an invalid phenotype in which city 4 is not visited. The standard mutation operator does not interact well with a natural representation of the order of visit of the cities.]

2.3.3 Relevance of redundancy for GAs

Redundancy in the genetic code will only be relevant to GA applications where similar coding mechanisms exist. We count as similar to the genetic code any mapping between binary segments of a given size and symbols taken from a larger alphabet.
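Concretely, such a mapping can be held in a simple lookup table. The sketch below (our illustration; the table is invented) decodes a chromosome by reading it three bits at a time, with two bit-strings assigned to each symbol so that some point mutations are neutral, just as several codons share one amino acid:

    # Minimal sketch (invented table): a redundant mapping from 3-bit
    # segments to the four symbols A-D.  Each symbol has two representations,
    # so one point mutation per segment can be neutral.
    TABLE = {"000": "A", "001": "A",
             "010": "B", "110": "B",
             "011": "C", "111": "C",
             "100": "D", "101": "D"}

    def decode(chromosome, k=3):
        return [TABLE[chromosome[i:i+k]] for i in range(0, len(chromosome), k)]

    print(decode("000110111"))   # ['A', 'B', 'C']
    # Flipping the first bit of the middle segment (110 -> 010) is neutral:
    print(decode("000010111"))   # ['A', 'B', 'C'] again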
The point of such mechanisms in GAs is to make a bridge between bits and more expressive symbols which provide a higher-level language in which a solution can be expressed. A good illustration of such a mapping can be found in Boers and Kuiper (1992). They evolve neural nets and their encoding relies on grammatical rules to produce the topologies. These rules are expressed using 17 different types of symbols. Although only 5 bits would have been sufficient to encode those 17 symbols in a binary form, the authors chose to use the genetic code as the inspiration for the correspondence between the symbols and their binary representation and therefore included some redundancy. Six bits per symbol are used, so that the table has 64 entries, and the redundancy is allocated in a way as close to the genetic code as possible. No justification, however, is provided for this choice.

Very often, the symbols which are being encoded are numerical values, either integer or real. Whenever this is the case, the standard representation of integers in base 2 can be used as the basis for the mapping. If, for instance, real values between $X$ and $Y$ are to be represented using $p$ bits, the segment which reads $a_1, a_2, \ldots, a_p$ can be associated with $X + i(a_1, a_2, \ldots, a_p)(Y - X)/(2^p - 1)$, where $i(a_1, a_2, \ldots, a_p)$ is the integer whose representation in base 2 is $a_1 a_2 \ldots a_p$. In all these cases, redundancy can easily be introduced regardless of whether the symbols are integers, real numbers or part of a grammatical rule. All we have to do is increase the number of bits used to represent them without increasing the number of values they can take.

Representation is a particularly difficult issue for many combinatorial problems whose solution requires the ordering or the partitioning of some set. The travelling salesman problem and job-shop scheduling are instances of such problems. In the travelling salesman problem, the set A of phenotypes on which evolution should operate contains all the possible orders in which the cities can be visited. A simple genetic representation of that ordering is to list the indices of the cities in their order of visit on the chromosome. The problem with such a representation is that it does not interact well with the mutation operator, as shown in Figure 2.5. Some alternatives to the standard GA have been proposed to address such limitations. Many of them call for a redefinition of both mutation and crossover. Falkenauer (1995), for instance, suggests that, for partitioning problems, the unit of information on the chromosome should be the subsets that define the partition of the set rather than individual elements of it. Accordingly, he redefines mutation so that it is adapted to such building blocks. Two possible redefinitions are illustrated in Figure 2.6.

[Figure 2.6: Two possible redefinitions of mutation compatible with set partitioning problems: mutation by creation of a new subset from elements of other subsets, and mutation by redistribution of one subset into the others. Adapted from Falkenauer (1995).]

The traditional mutation operator inspired by biological systems acts at a very low level, the binary digit, and can alter any bit independently of any other.
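For reference, that operator can be sketched in a couple of lines (a standard textbook form, not code from the thesis):

    import random

    # Standard per-bit mutation: every bit is flipped independently with a
    # small probability, blind to what the bits mean at the phenotypic level.
    def mutate(chromosome, rate=0.01):
        flip = {"0": "1", "1": "0"}
        return "".join(flip[b] if random.random() < rate else b
                       for b in chromosome)

    print(mutate("0000000000", rate=0.1))
    # prints a mostly unchanged string with the occasional flipped bit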
The alternatives suggested by Falkenauer, on the other hand, operate at a much higher level and have to respect some constraints which are global to the chromosome, such as the non-repetition of an element in different subsets. Defining genetic operators which can handle such high-level representations suppresses the need for binary representations and consequently for a mechanism similar to the genetic code. Redundancy as it is studied in this thesis is therefore not applicable to problems where such representations are used.

2.4 Neutrality in RNA evolution

In the past three years, a group of theoreticians working around Peter Schuster has been investigating neutrality in RNA folding and its consequences for the evolutionary process. We will now introduce their approach and discuss their findings.

[Figure 2.7: Two RNA molecules with the same secondary structure. Both are twelve bases long, with base 1 paired to base 11, base 2 to base 10 and base 3 to base 9, counted from the 5' end.]

2.4.1 RNA folding

RNA molecules fulfill a wide variety of functions. As messenger RNA, they convey information from the nucleus to the ribosomes in the cytoplasm. But RNA molecules are also capable of more active roles. We saw that transfer RNAs are in charge of implementing the genetic code. More recently, Kruger et al. (1982) showed that they can perform enzymatic activity, which was thought to be the exclusive domain of proteins. Like proteins, RNAs are capable of specific action by folding into a precise three-dimensional pattern called the tertiary structure of the molecule. This structure somehow results from the base sequence, also called the primary structure, but for the time being no model is capable of predicting how one results from the other. The secondary structure is an intermediary description of the folding process which is easier to predict from the primary structure. It describes the pairing that takes place between some of the bases of the molecule. This creates some important constraints on the folding of the molecule but it is by no means a full description of its three-dimensional structure.

2.4.2 Shape space

If we consider RNAs made of the nucleotides A, C, G and U, three pairings are possible: the Watson-Crick base pairs C-G and A-U, as well as the weaker G-U pairs. Two RNA sequences have the same secondary structure if they have the same number of bases and if the relative positions of the pairing bases are the same, disregarding which of the possible pairs of nucleotides (A-U, U-A, C-G, ...) are actually involved. An example of two molecules folding into the same shape (the term shape will be used in what follows as a synonym of secondary structure) is given in Figure 2.7. Remember that because of the asymmetry of the backbone, the ends of an RNA can be distinguished. One is called 3', the other 5'.

A natural and unambiguous way of defining a shape is to list the positions of the bases (counted from the 5' end) which are attached to each other. A position can appear only once in this list, and one that does not appear in it indicates an unpaired base. The condition that no knots or pseudo-knots exist implies that pairs $(x_i, x_j)$ and $(x_k, x_l)$ such that $x_i < x_k < x_j < x_l$ are ruled out. In this representation, the shape in Figure 2.7 would be [(1, 11), (2, 10), (3, 9)].
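This representation, together with the no-pseudo-knot condition, is easy to check mechanically (a minimal sketch of ours):

    # Minimal sketch (ours): a shape as a list of paired positions, 1-based
    # from the 5' end, with a check of the no-pseudo-knot condition.
    def is_valid_shape(pairs, length):
        positions = [p for pair in pairs for p in pair]
        if len(positions) != len(set(positions)):
            return False           # a position may appear only once
        if any(not 1 <= p <= length for p in positions):
            return False
        for i, j in pairs:
            for k, l in pairs:
                if i < k < j < l:  # interleaved pairs would form a pseudo-knot
                    return False
        return True

    print(is_valid_shape([(1, 11), (2, 10), (3, 9)], 12))   # True (Figure 2.7)
    print(is_valid_shape([(1, 6), (4, 9)], 12))             # False: pseudo-knot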
An alternative way of describing shapes is to denote an unpaired position on the molecule by '.' and a paired one by either '(' or ')': the character '(' is used if the partner position is located further towards the 3' end (right side) of the sequence and ')' if it is located towards the 5' end (left side). With these conventions, the shape in Figure 2.7 would be denoted '(((.....))).'.

2.4.3 Sequences folding into s and sequences compatible with s

From a formal point of view, we can describe the folding process as a function $f : Q^n_\alpha \to Y^n$, where $Q^n_\alpha$ is the set of possible sequences of $n$ letters taken from an alphabet of size $\alpha$ and $Y^n$ is the space containing all secondary structures of length $n$ (excluding knots and pseudo-knots, which are not considered here). In the case where we consider all standard nucleotides, we have $\alpha = 4$. The set $Q^n_\alpha$ can be seen as a graph if we decide that sequences which differ at a single base are connected by an edge.

To understand neutrality in the context of RNA folding is to propose some description of $f^{-1}(s)$, the set of all sequences that fold into a shape $s$. Unfortunately, no inverse algorithm exists that generates all the sequences that fold into a given shape. The only way to proceed is to try all sequences one by one, checking whether they fold into $s$. However, when doing so, we do not need to examine all the sequences of $Q^n_\alpha$. We can define a subset in which we are sure to find all the sequences that fold into $s$. This subset, which we call the set of compatible sequences of $s$ and denote $C(s)$, is made of all the sequences which could fold into $s$ without contradicting the logic of base pairing. Consider again the shape $s$ in Figure 2.7. Any sequence where base 1 can pair with base 11, base 2 with base 10, and base 3 with base 9 will belong to $C(s)$, since it could fold into the shape of that figure without contradicting base pairing. A sequence where those conditions are not met clearly cannot fold into that shape. Belonging to $C(s)$ is a necessary condition for folding into $s$. But it is not sufficient, since a sequence is typically compatible with many different secondary structures. The considerations which make it possible to decide in which of those possible configurations a sequence will end up will not be described here. Various algorithms make such predictions based on the free energy of the molecule. There is, however, no shorter way to determine $f^{-1}(s)$ than to apply such algorithms to every element of $C(s)$.

At the unpaired positions of shape $s$, elements of $C(s)$ are free to take any value. It follows that, given a shape with $n_u$ unpaired positions and $2 n_p$ paired ones, the number of compatible sequences is $\alpha^{n_u} \beta^{n_p}$, where $\alpha$ is the number of possible bases and $\beta$ the number of choices for two bases paired together. In the case of the alphabet {A,C,G,U}, $\alpha$ is equal to 4 and $\beta$ to 6 (A-U, U-A, G-C, C-G, G-U and U-G). In the simpler case that will be considered later where the alphabet is restricted to {G,C}, $\alpha$ and $\beta$ are both equal to 2.

2.4.4 Connectivity of C(s)

Let us now examine the connectivity of $C(s)$ considered as a subgraph of $Q^n_\alpha$. Consider two elements of $C(s)$ which differ in the way one of the base pairs that characterise $s$ is implemented. An instance of this could be AGGCACCGCCUG and ACGCACCGCGUG if $s$ is the shape in Figure 2.7. Such sequences are two point mutations away from each other.
However, if we perform one of those mutations but not the other, the pairing becomes impossible; there is therefore no way of going from one of these sequences to the other by point mutation without stepping out of $C(s)$ for one step. We conclude that $C(s)$ is not a connected subgraph of $Q^n_\alpha$. Note that the six possible base pairs are split into two groups: transitions C-G ↔ U-G ↔ U-A are possible on one side, and G-C ↔ G-U ↔ A-U on the other. It is therefore possible to go, for instance, from AGGCACCGCCUG to AAGCACCGCUUG by point mutations without stepping out of $C(s)$. Hence, $C(s)$ is fragmented into $2^{n_p}$ connected subgraphs corresponding to the possible assignments of base pairs from either of these two groups to each of the paired positions. Each of these subgraphs contains exactly $3^{n_p} \times 4^{n_u}$ points, corresponding to the choice of one base pair among the three in each group and the unconstrained choice of bases for the unpaired positions.

In order to cement this constellation of components into a single connected graph, Reidys and colleagues decided to add edges to $Q^n_\alpha$ by connecting sequences which differ at two points when these points correspond to paired positions of shape $s$. The resulting graph is richer in edges than $Q^n_\alpha$, but the number and nature of these added edges depend on $s$, which is reflected in its name, $C(s)$. To make clear the different parts played by paired and unpaired positions in this new representation, Reidys and colleagues define this graph as the product of two graphs:

    $C(s) = Q^{n_u}_\alpha \times Q^{n_p}_\beta$

In $Q^{n_p}_\beta$, which corresponds to the $n_p$ paired positions of the shape, the $\beta$ possible base pairs are treated as being at distance 1 from each other. Sequences which differ in a single base at an unpaired position are neighbours in $Q^{n_u}_\alpha \times Q^{n_p}_\beta$, as are those differing at two positions paired together, as long as the two new bases can also pair together. The sequences AGGCACCGCCUG and ACGCACCGCGUG shown at the beginning are now connected. But AGGCACCGCCUG is still not connected to AGGCACCGCGUG because G cannot pair with G.

The preimage of $s$, $f^{-1}(s)$, seen as a subgraph of $Q^n_\alpha$, is likely to be as fragmented as $C(s)$, given that it is a subset of it. To avoid this, Reidys and colleagues consider $f^{-1}(s)$ as embedded in the connected graph $C(s)$. They call the resulting subgraph $N(s)$ the neutral network of $s$. Clearly, $N(s)$ is likely to be more connected than $f^{-1}(s)$ seen as a subgraph of $Q^n_\alpha$.

2.4.5 Modelling neutral networks with random graphs

In order to build a simple statistical model of $N(s)$, Reidys and colleagues have resorted to random graph theory (Palmer, 1985; Bollobás, 1985). The only result of random graph theory that is used here is the following. Consider $\Gamma_\lambda$, a subgraph of $Q^n_\alpha$ constructed by including in it every vertex of $Q^n_\alpha$ with probability $\lambda$. Edges of $Q^n_\alpha$ are in $\Gamma_\lambda$ only when they connect two vertices which are in $\Gamma_\lambda$. It has been shown that there exists a critical value $\lambda^* = 1 - \sqrt[\alpha-1]{\alpha^{-1}}$ such that, whenever $\lambda > \lambda^*$, $\Gamma_\lambda$ is almost certainly connected.

Given that $N(s)$ is embedded in $Q^{n_u}_\alpha \times Q^{n_p}_\beta$ as we saw above, Reidys and colleagues break down $N(s)$ into the product of two random graphs

    $N(s) = \Gamma_u \times \Gamma_p$

with $\Gamma_u$ a random subgraph of $Q^{n_u}_\alpha$ with probability $\lambda_u$ and $\Gamma_p$ a random subgraph of $Q^{n_p}_\beta$ with probability $\lambda_p$.
Random graph theory then states that if both $\lambda_u$ and $\lambda_p$ are greater than their respective critical values $\lambda^*_u = 1 - \sqrt[\alpha-1]{\alpha^{-1}}$ and $\lambda^*_p = 1 - \sqrt[\beta-1]{\beta^{-1}}$, then $N(s)$ is a connected subgraph of $C(s)$.

2.4.6 Random graphs compared to simulated neutral networks

The only available data against which random graphs can be evaluated as models of neutral networks is described in Grüner et al. (1996a). The authors applied a folding algorithm to every 30-base-long RNA made of G and C. All the possible shapes $s$ were recorded, together with all the sequences that fold into each of them. This computation took 130 days on an IBM Risc 6000 workstation! The results are not exactly in agreement with random graph theory. Since the alphabet is only {G,C}, $\alpha$ and $\beta$ are equal to 2 and both $\lambda^*_u$ and $\lambda^*_p$ are equal to 0.5. However, many observed secondary structures for which the calculated values of $\lambda_u$ and $\lambda_p$ are well above 0.5 have neutral networks $N(s)$ made of 2 or 4 subgraphs of similar sizes. This contradicts random graph theory, which would predict a single connected component. In order to explain the discrepancy, the authors observe that these different components vary in the ratio of C and G at critical positions, corresponding to unpaired positions that could easily become paired and destroy the shape. But, in the words of the authors, "The deviations from theory were explained by structural features that are inaccessible to the random graph approach." Other shapes were found whose neutral networks should have been connected according to random graph theory but were in fact made of one large component and some smaller components. The authors attribute this type of discrepancy to finite-size effects, since the results of random graph theory are only true in the limit of long sequences.

2.4.7 Population dynamics on neutral networks

Whether or not they are properly described by random graphs, large neutral networks appear to be the norm rather than the exception in the folding of RNA molecules. Some twenty years ago, Eigen and Schuster (1977) analysed the situation where a master sequence, fitter than all neighbouring sequences, replicates with a probability of error $p$ per nucleotide. They discovered the existence of a critical value of $p$, the so-called error threshold, beyond which the master sequence will eventually be lost. Below the error threshold the population is organised in what Eigen and Schuster call a quasi-species: a proportion of the population sits at the master sequence while the rest forms a cloud whose density decreases with Hamming distance. The closer $p$ is to the error threshold, the more spread out this cloud is. This analysis was adapted by Huynen et al. to address the situation where the master sequence is embedded in a neutral network. The concept of an error threshold for the master sequence has to be abandoned since for any mutation rate the population moves around the neutral network, rapidly losing the original sequence. The secondary structure, or phenotype, is, however, preserved as this happens. What they define is a phenotypic error threshold beyond which the secondary structure itself is lost. Below this threshold, the population divides itself into identifiable clusters of sequences spread out on the neutral network. Derrida and Peliti (1991) have analysed neutral drift on a flat landscape and found qualitatively the same behaviour.
The population is less fragmented in the case of a neutral network because the boundaries of the network have a canalising effect on the drift. The main conclusion of this work is that below the phenotypic error threshold, the population is homogeneous in phenotype but is in fact exploring different regions of the neutral network. If an entry point is found into a fitter shape, the population will reassemble on the fitter side of that entry point and start spreading out again from there on the neutral network of the new, fitter shape. A population of evolving RNAs is therefore not a single localised quasi-species in sequence space but rather a collection of constantly moving quasi-species. Huynen et al. claim that the independent diffusion of these quasi-species increases the likelihood that the population as a whole encounters entry points to the neutral networks of better shapes.

2.4.8 Perpetual innovation along a neutral network

For this claim to hold, neutral networks have to offer a variety of phenotypes, i.e. shapes, at their boundaries. It could in principle be the case that, even though the boundaries of a neutral network are large, the number of neutral networks with which it shares this boundary is relatively small, thus limiting the number of possible transitions. This is a consideration related to the variety of phenotypes found in the set $I_S$, as discussed in Section 2.2.4. Huynen (1996) explored the boundaries of a neutral network. Starting from a 76-base-long sequence whose secondary structure is that of the tRNA for phenylalanine, he counted the number of distinct secondary structures (S) that were found among the non-neutral neighbours of that sequence and kept a list (Q) of those structures. Then he chose an arbitrary neutral neighbour of the sequence and examined the non-neutral neighbours of that new sequence, adding to S the number of structures not already in Q and adding the new structures to Q. This procedure was repeated for 1000 steps. When the cumulative number of different shapes is plotted against the number of steps taken on the neutral network, a linear relation emerges whose slope indicates that an average of 18.1 new shapes is found at every step. The same procedure for a totally random walk would produce 39 new shapes at every step. The linear relationship between the cumulative number of shapes found and the number of steps taken leads Huynen to conclude that novelty does not saturate as one moves around the neutral network. Every step along it brings an equal number of not-yet-encountered shapes. Note that Huynen only used point mutations when performing the random walk on the neutral network of the tRNA. Hence, as discussed above, many changes at the paired positions were impossible and only one of $2^{n_p} = 2^{20} \approx 10^6$ components was explored.

2.4.9 Critique of the random graph approach

Distortion of sequence space

As we saw in Section 2.4.4, the underlying sequence space had to be transformed prior to the application of the random graph approach, to accommodate the fact that mutations must sometimes happen in concert at paired positions in order not to disturb the structure. This has two major drawbacks. First, as the authors point out,

    Defining $N(s)$ as an induced subgraph of $C(s)$, rather than as an induced subgraph of the sequence space $Q^n_\alpha$ itself, avoids the peculiarities introduced by the logic of base pairing.
    On the other hand, the neighborhood relation no longer coincides with the action of mutation. Hence we have traded technical tractability for biophysical interpretation.

Secondly, the sequence space $Q^n_\alpha$, which is normally adequate to describe any RNA of length $n$, has had to be replaced by a space $C(s)$, tailored to deal with a specific shape and whose properties are dependent on this shape.

Interactions between paired and unpaired positions

The decomposition of $N(s)$ into $\Gamma_u \times \Gamma_p$ is not necessarily a natural one. Under that definition, a sequence $x_1 \ldots x_{n_u} y_1 \ldots y_{n_p}$ belongs to $N(s)$ if and only if $x_1 \ldots x_{n_u}$ is a vertex in $\Gamma_u$ and $y_1 \ldots y_{n_p}$ is a vertex in $\Gamma_p$. But this product definition also commands that if $x_1 \ldots x_{n_u} y_1 \ldots y_{n_p}$ and $x'_1 \ldots x'_{n_u} y'_1 \ldots y'_{n_p}$ are two sequences in $N(s)$, then $x'_1 \ldots x'_{n_u} y_1 \ldots y_{n_p}$ and $x_1 \ldots x_{n_u} y'_1 \ldots y'_{n_p}$ are also in $N(s)$. Such regularity in the interaction between paired and unpaired positions seems unlikely, and would need to be assessed. If it is true, for a shape $s$ with parameters $n_u$, $n_p$, $\lambda_u$ and $\lambda_p$, we expect the size of $N(s)$ to be

    $|N(s)| = |\Gamma_u|\,|\Gamma_p| = \lambda_u \alpha^{n_u} \lambda_p \beta^{n_p}$

From some of the runs described in Grüner et al. (1996a), we have the necessary numerical values to check how well this relationship holds. The most common structure to appear among 30-base-long sequences made of G and C is '........(((((((((....)))))))))', from which we deduce that $n_u = 12$ and $n_p = 9$. The authors calculated that $\lambda_u = 0.860$ and $\lambda_p = 0.895$ (no explanation is provided of how these numbers were calculated). The expected value of $|N(s)|$ from these figures is therefore

    $|N(s)|_e = 0.860 \times 0.895 \times 2^{12} \times 2^9 = 1614178$

The actual value of $|N(s)|$ is 1568485, which is reasonably close. But the structure '......((..(((((((...))))))).))', found in the same table, has $\lambda_u = 0.562$ and $\lambda_p = 0.576$, which would predict $|N(s)|_e = 678873$ while the real value is only 118307. In fact, most instances for which we have made such calculations are well under the expected value. This, we think, casts serious doubts on the ability of the Cartesian product of the two graphs to capture the nature of the interaction between paired and unpaired positions.

Correlation and neutral networks

Both Schuster (1996) and Reidys (1995) acknowledge that random graph theory relies on the assumption that the sequences forming the same structure are distributed randomly in the space of compatible sequences. But no comment is made about the realism of such an assumption. This is, however, unlikely to be the case, since it states that, given a sequence $X$ with shape $s$, changing one of the unpaired bases of $X$ is just as likely to destroy $s$ as changing all paired and unpaired bases (provided we remain in the set of sequences compatible with $s$). This is counter-intuitive but could probably be tested from the data presented in Grüner et al. (1996a,b). A non-random distribution, if it was observed, could explain why unconnected neutral networks were obtained where random graph theory would have predicted connected ones. If sequences folding into a shape were found not to be distributed randomly, the random graph model would have to be modified to account for this fact. The tendency of sequences folding into $s$ to form clusters within $C(s)$ would have to be measured and some way of generating graphs with the same distribution would have to be found.
No doubt, however, this would complicate the model a great deal and limit the range of theorems from random graph theory that would be applicable.

Is full connectedness of neutral networks relevant to adaptation?

The main concern in applying random graph theory was to find simple conditions under which a neutral network is fully connected. The underlying assumption is that the bigger the neutral network, the more effective the search for better-adapted shapes. But given a number of generations and a population size, there is a limit to how large a neutral network can effectively be explored. A calculation of this size would provide a useful figure against which the actual size of neutral networks ought to be compared. Knowing that two neutral networks must, somewhere in their immensity, come very close to each other is not very useful if we cannot qualify this fact with the expected number of generations it would take a population of size $N$ to find this secret passage. Furthermore, Schuster (1995) showed that all common shapes (shapes which are realised by a large number of different sequences) can be found within a small radius of sequence space. Being able to travel over long distances on a neutral network might therefore not be necessary for the discovery of useful novelty.

2.5 Conclusion

This chapter has shown that the genetic code is not an immutable consequence of chemistry; it can change, and selection could conceivably favour good codes over bad ones. We suggested that the pattern of redundancy found in the genetic code could be the result of such selection. We showed that if redundancy has been beneficial to the genetic code, there are good reasons to believe that it could also be beneficial to similar codes as used in genetic algorithms. Although the issue of representation has been consistently singled out as an important one, the deliberate introduction of redundancy into codes has not been the object of any investigation. Neutrality in the mapping from primary to secondary structure of RNA molecules has been described as enhancing the process of evolution in these molecules. The neutrality in question is not mediated by a code; it is a consequence of the laws of physics and therefore has no potential to change or be selected for. The fact that it could be beneficial in such an immutable form is nonetheless encouraging for our case, where neutrality can be tuned by progressive changes to the code.

Chapter 3

A formal framework for a comparative study of redundancy

This chapter proposes a formal definition of redundancy that will be used in the rest of this thesis. Section 3.1 formulates the desired requirements for that definition. Section 3.2 proposes a definition in the minimal case where only one bit of redundancy is added. Section 3.3 examines possible ways of generalising it to the addition of several bits. Section 3.4 proposes a shortcut for assessing the impact of a pattern of redundancy on the evolutionary process.

3.1 Requirements for a definition of redundancy

The previous chapter suggested that redundancy in the genetic code could be more than a historical accident. Because it can induce neutral paths between sequences, redundancy might be able to reduce the likelihood that a protein gets stuck in a local maximum of catalytic efficiency from which further improvement through natural selection is hindered.
The same principle could be beneficial to a GA when the introduction of similar redundancy is possible. Assessing the impact of redundancy on the evolutionary process is therefore relevant both to the origins of the code and to the practice of genetic algorithms. If we want to address both issues simultaneously, we need a language for talking about redundancy in codes which is insensitive to the function of the code and to the nature of the symbols which are encoded.

The theory of error-correcting codes faces a similar need (Haykin, 1988). Whenever a digital message is transmitted over a noisy channel, some of the symbols that compose it can be corrupted and the message misinterpreted at the receiver's end. Error-correcting codes minimise the likelihood of such misinterpretations by adding some well-designed redundancy to the message. This makes the detection, and sometimes the correction, of errors possible at the receiver's end. For these techniques to have the required generality, they must, however, overlook the nature of the message, so that correction can take place regardless of whether text, voice or data is being transmitted. Such dissociation from the semantics of the layer above is also desirable if we want to conduct our study in enough generality for it to apply to any GA application and to the genetic code.

Another desirable feature of our characterisation of redundancy is that it should capture well those features which are relevant to its interaction with the evolutionary process. As discussed in the previous chapter, we have reasons to believe that the existence of neutral mutations could be one such feature. But it is probably not the only one and it is therefore advisable to keep an open mind about this question. The more precise our language is in its description of redundancy, the more likely we are to be able to identify properties relevant to the interaction of redundancy and evolution.

Genetic algorithms typically handle bit-strings, whereas the genetic code uses a quaternary alphabet. In this thesis, we chose to focus on the case of binary alphabets. Our results are therefore directly applicable to GAs; the drawback is that we move away from the best model for the genetic code. This can be seen as problematic since we would like our conclusions to hold for the genetic code as well. We believe, however, that results obtained with binary alphabets apply equally well to quaternary ones, since all the definitions and arguments in favour of redundancy described in this chapter and the next are easily translated to the case of a quaternary alphabet. The difference between the two cases is therefore of a quantitative nature, not a qualitative one. We will come back to this issue at the end of Chapter 4.

3.2 Redundancy in a minimal form

3.2.1 A possible definition

We can formally define a code as a function $T : \{0,1\}^n \to S$ where $S$ is an arbitrary set of symbols. We are not concerned with the nature of these symbols since we want this definition to be as broad as possible. The function $T$ associates an element of $S$ to each of the possible sequences of $n$ bits. In the case of the genetic code, for instance, $S$ contains the 20 amino acids and the stop signal. In the case of genetic algorithms, elements of $S$ are not the candidate solutions but building blocks which are further combined together to define them. In the rest of this thesis, the term symbol will be used to refer exclusively to elements of $S$.
A non-redundant code is a code for which $T$ is injective:

    $\forall (a_1, a_2, \ldots, a_n) \in \{0,1\}^n,\ \forall (b_1, b_2, \ldots, b_n) \in \{0,1\}^n,$
    $(a_1, a_2, \ldots, a_n) \neq (b_1, b_2, \ldots, b_n) \implies T(a_1, a_2, \ldots, a_n) \neq T(b_1, b_2, \ldots, b_n)$

In other words, a code is non-redundant when no element of $S$ has more than one sequence representing it. This conforms to our intuitions about redundancy. A non-redundant code will allow the representation of as many symbols as there are sequences, $2^n$ in the case of sequences of length $n$. Since we are only interested in elements of $S$ which can be expressed within the code, we exclude from $S$ symbols which have no sequence representing them (i.e. we ensure that $T$ is surjective). As a result, for a code $T$ defined on sequences of length $n$, $S$ has a maximum of $2^n$ elements. It has exactly that number of elements if, and only if, the code is non-redundant, in which case $T$ is a bijective function. In this thesis, all instances of codes that will be considered prior to our controlled introduction of redundancy will be of the above type.

Clearly, non-redundant codes are only a subset of all possible codes. And if the size of $S$ is not a power of 2, no non-redundant code exists that encodes only elements of $S$. We believe, however, that restricting our study to these non-redundant codes is necessary. If we do not take this precaution, we will be adding redundancy to codes which are already very dissimilar in the amount of redundancy they display. Compare, for instance, a code whose set $S$ has $2^n + 1$ elements with one that has $2^{n+1} - 1$. Both require at least $n + 1$ bits. But the first one has a large amount of structural redundancy, since almost half of the binary strings can be used to provide redundant representations for elements of $S$. The second has very little scope for such redundancy, since only one string is unassigned once all elements of $S$ have been assigned a bit-string. Understanding the effect of adding redundancy will be difficult if some amount of redundancy was already there in the first place, and even more so if that amount of redundancy is variable.

Given a non-redundant code, there are only two ways to produce a redundant version of that code. One is to reduce the size of $S$; the other is to increase the length of the sequences on which the code is defined while keeping $S$ unchanged. The first option reduces the power of expression of the code, which will probably make it unsuitable for its original purpose. The genetic code, for instance, would not fulfill its purpose if it could only express eight amino acids. We will therefore only investigate the case where sequences are increased in length. This length can be augmented by any amount, but we will focus in this section on the simple case where a single bit is added.

Let $T : \{0,1\}^n \to S$ be a non-redundant code. We define $T^\sigma : \{0,1\}^{n+1} \to S$ by

    $\forall (a_1, a_2, \ldots, a_n) \in \{0,1\}^n$
    $T^\sigma(0, a_1, a_2, \ldots, a_n) = T(a_1, a_2, \ldots, a_n)$
    $T^\sigma(1, a_1, a_2, \ldots, a_n) = T(\sigma(a_1, a_2, \ldots, a_n))$

where $\sigma : \{0,1\}^n \to \{0,1\}^n$ is any permutation, or shuffling, of the set of $n$-bit-long sequences. That is, $\sigma$ is a function from the set of $n$-bit-long sequences to itself such that every sequence has one and only one preimage sequence through $\sigma$. In mathematical terms:

    $\forall (b_1, b_2, \ldots, b_n) \in \{0,1\}^n,\ \exists!\,(a_1, a_2, \ldots, a_n) \in \{0,1\}^n \ /\ \sigma(a_1, a_2, \ldots, a_n) = (b_1, b_2, \ldots, b_n)$

Let us now examine the properties of a redundant code $T^\sigma$ defined in this way.
Let us now examine the properties of a redundant code T^σ defined in this way.

1. Since T^σ(0, a1, ..., an) = T(a1, ..., an) for any (a1, ..., an), the neighbourhood relationships between symbols that exist in T are preserved in T^σ. That is, if a point mutation can turn symbol S1 into S2 under code T, the same transition between symbols is also possible under T^σ.

2. By constraining σ to be a permutation, we make sure that redundancy is equally distributed among the elements of S. Under T, every symbol is represented by one sequence; under T^σ the number of sequences has doubled due to the addition of one bit, and every element of S is represented by exactly two sequences. If we failed to ensure this, redundancy could introduce a bias in the representation of some symbols which would make comparison difficult.

These two points provide a justification for this choice of definition.

3.2.2 Some examples

Let us illustrate these definitions with some concrete examples.

A valid example

Consider the following non-redundant code, T : {0,1}^3 → S = {A, B, C, D, E, F, G, H}:

a1a2a3   T(a1a2a3)
000      A
001      B
010      C
011      D
100      E
101      F
110      G
111      H

and the following permutation σ, represented here in both its binary and its decimal form:

a1a2a3   σ(a1a2a3)        i   σ(i)
000      001              0   1
001      011              1   3
010      100              2   4
011      110              3   6
100      101              4   5
101      010              5   2
110      000              6   0
111      111              7   7

The resulting redundant code is T^σ : {0,1}^4 → S = {A, B, C, D, E, F, G, H}:

d1a1a2a3   T^σ(d1a1a2a3)      d1a1a2a3   T^σ(d1a1a2a3)
0000       A                  1000       B
0001       B                  1001       D
0010       C                  1010       E
0011       D                  1011       G
0100       E                  1100       F
0101       F                  1101       C
0110       G                  1110       A
0111       H                  1111       H

The condition that ∀(a1, ..., an), T^σ(0, a1, ..., an) = T(a1, ..., an) results in all the elements of S appearing in the second column of the table in the same order as they appeared in the definition of T. The condition that ∀(a1, ..., an), T^σ(1, a1, ..., an) = T(σ(a1, ..., an)) constrains all the elements of S to appear once and only once in the rightmost column. Their order of appearance is arbitrary and determined by the permutation σ.

An invalid example

Consider the following redundant code U : {0,1}^4 → {A, B, C, D, E, F, G, H}:

d1a1a2a3   U(d1a1a2a3)        d1a1a2a3   U(d1a1a2a3)
0000       A                  1000       D
0001       C                  1001       B
0010       G                  1010       E
0011       D                  1011       G
0100       E                  1100       F
0101       B                  1101       C
0110       H                  1110       A
0111       F                  1111       H

This code will not be considered a valid redundant version of T because U(0001) ≠ T(001), which contradicts our definition. Symbols A and B, which are neighbours under T, are not neighbours under U.

Another invalid example

Consider now code V : {0,1}^4 → {A, B, C, D, E, F, G, H}:

d1a1a2a3   V(d1a1a2a3)        d1a1a2a3   V(d1a1a2a3)
0000       A                  1000       A
0001       B                  1001       H
0010       C                  1010       H
0011       D                  1011       H
0100       E                  1100       H
0101       F                  1101       A
0110       G                  1110       A
0111       H                  1111       A

This is not a valid redundant version of T either, because only symbols A and H appear in the right-hand column. We could define a function τ such that V(1, a1, a2, a3) = T(τ(a1, a2, a3)), but τ would not be a permutation, since τ(a1, a2, a3) would always be 000 or 111 and the other values of (a1, a2, a3) would have no preimages through τ. Suppose that A and H confer on average a higher fitness than other symbols when they appear in the genome; code V would then be better than T, not because of its pattern of redundancy but because of the over-representation of these symbols. We want to rule out such possibilities.
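The two failure modes just illustrated can be checked mechanically. The sketch below is our own illustration (U and T refer to mappings of the kind built earlier); it tests a candidate code against both conditions of the definition:

```python
from itertools import product

# Check a candidate code U on n+1 bits against both conditions of the
# definition of a valid redundant version of T.
def is_valid_redundant_version(U, T, n):
    tails = list(product((0, 1), repeat=n))
    # a leading 0 must leave the meaning unchanged (violated by the
    # first invalid example, where U(0001) differs from T(001))
    if any(U[(0,) + t] != T[t] for t in tails):
        return False
    # the leading-1 half must be T composed with a permutation, i.e.
    # every symbol must appear there exactly once (violated by code V)
    return sorted(U[(1,) + t] for t in tails) == sorted(T.values())
```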
3.2.3 The identity permutation

Let us now consider the special case where the identity permutation, Id (∀x, Id(x) = x), is used to define a redundant code. The resulting code T^Id : {0,1}^4 → S = {A, B, C, D, E, F, G, H} is defined as follows:

d1a1a2a3   T^Id(d1a1a2a3)     d1a1a2a3   T^Id(d1a1a2a3)
0000       A                  1000       A
0001       B                  1001       B
0010       C                  1010       C
0011       D                  1011       D
0100       E                  1100       E
0101       F                  1101       F
0110       G                  1110       G
0111       H                  1111       H

The obvious feature of this code is that the elements of S appear in exactly the same order in the second and fourth columns. This indicates that the leftmost added bit d1 is never relevant in decoding a sequence; the three rightmost bits are sufficient, and they can be interpreted exactly as they would be under T. Bit d1 is best seen as junk genetic material in this case. From an evolutionary point of view, code T^Id should behave in exactly the same way as code T, provided the mutation rate per bit is kept constant.

The same conclusion holds for values of n greater than 3. Consider a non-redundant code T : {0,1}^n → S to which the identity permutation is applied. The code T^Id : {0,1}^{n+1} → S is such that

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{Id}(0, a_1, \ldots, a_n) = T(a_1, \ldots, a_n), \quad T^{Id}(1, a_1, \ldots, a_n) = T(Id(a_1, \ldots, a_n)) = T(a_1, \ldots, a_n)$$

The first bit is therefore always irrelevant to the interpretation of a sequence.

3.2.4 Redundancy and neutrality

As discussed in the previous chapter, an important consequence of redundancy is the potential for neutral point mutations between sequences which differ in a single bit and have the same meaning. We are therefore interested in finding a systematic way of detecting these mutations once a redundant code is defined. A mutation will be said to be neutral only when sequence (a1, ..., an) is changed into a sequence whose meaning is also T(a1, ..., an). In most genotype-to-phenotype mappings, there will also be mutations which change the value of T but do not change the phenotype. The analogy in the case of the genetic code would be a change of amino acid which does not change the function of the protein it is part of. This neutrality at the meta-level will be left out of our discussion for the time being.

We have already pointed out that, provided T is a non-redundant code and σ is a permutation, every element of S appears only once in the second and fourth columns of the table that defines T^σ. As a consequence, given a sequence (d1, a1, a2, a3), mutations in bits a1, a2 or a3 can never be neutral, since they correspond to a move to a different line within the same column. The only bit whose mutation can be neutral is thus d1. A mutation of d1 corresponds to a move to the same line in the other column. Consequently, mutation of d1 will be neutral if, and only if, the same symbol appears on the same line in both the second and fourth columns of the table that defines T^σ. This will happen if

$$T^\sigma(0, a_1, a_2, a_3) = T^\sigma(1, a_1, a_2, a_3)$$

which, by definition, is equivalent to

$$T(a_1, a_2, a_3) = T(\sigma(a_1, a_2, a_3))$$

which, because T is non-redundant, amounts to

$$(a_1, a_2, a_3) = \sigma(a_1, a_2, a_3)$$

In other words, d1 is neutral only for sequences (d1, a1, a2, a3) such that (a1, a2, a3) is invariant through σ. This argument generalises trivially to any value of n.
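In computational terms — a sketch of ours, consistent with the argument just given — the neutral mutations induced by σ can therefore be read directly off its fixed points, without building T^σ at all:

```python
# Sigma is given as the list of images of 0 .. 2^n - 1; its fixed
# points are exactly the sequences whose added bit is neutral.
def neutral_sequences(perm):
    return [i for i, image in enumerate(perm) if i == image]

# For [13465207], only 111 is invariant: 0111 and 1111 form the single
# pair of synonymous neighbours.
print(neutral_sequences([1, 3, 4, 6, 5, 2, 0, 7]))   # -> [7]
```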
Figure 3.1: A spatial representation of 3-bit-long sequences. This is in fact a graph with the property that two sequences connected by an edge can be changed into each other by point mutation.

3.2.5 Permutations as the expression of redundancy

We have just shown that the number and the identity of the sequences connected through neutral mutations depend on σ only and not on T. Our formal definition of redundant codes thus appears to disentangle the redundant component of the code, in the form of the permutation σ, from the non-redundant component, T. This was made possible by our decision to restrict the choice of T to codes where the number of symbols is a power of 2. Given that restriction, the function T has no interesting property from our point of view, and the nature of its output set S can be conveniently overlooked. Given the choices made in Section 3.2.1, the study of redundancy can be reduced to the study of the possible permutations. For sequences of length n, the set containing all permutations of the sequences will be called Pn. It contains 2^n! elements, each of which defines a valid form of redundancy. In fact, given the many symmetries of the sequence space, many of these permutations turn out to be equivalent for our purpose, but we will not try to identify these equivalence classes. In Chapters 4 and 5, redundancy will be investigated with no mention of the properties of the underlying non-redundant codes, as advocated above. We will see that a great deal can be understood without paying any attention to these codes. Chapter 6, however, will show that there are cases where the features of T cannot be ignored if we want to understand the consequences of redundancy for the evolutionary process.

3.2.6 Redundancy in a graphical form

In the case where n = 3, it is possible to gain some insight into our definition of redundancy by examining the issue from a graphical perspective. Consider the cube H3 shown in Figure 3.1. Each of its corners represents one of the eight sequences on which T is defined. This cube is in fact a graph with the property that sequences which differ in a single bit are connected by an edge, and reciprocally. Hence, a point mutation is equivalent to moving from one corner of H3 to an adjacent one. If we want a similar representation for sequences one bit longer, we can turn to the four-dimensional hypercube H4 shown in Figure 3.2.

Figure 3.2: A spatial representation of 4-bit-long sequences. This graph can be seen as made of two copies of the one shown in the previous figure. One copy (C0) contains the sequences starting in 0; the other (C1) contains the sequences starting in 1. The two are connected by parallel edges corresponding to a change in the first bit.
H4 can be thought of as consisting of two replicas of H3: one contains the sequences for which the extra bit is equal to 0 (which we call C0) and the other contains those for which that bit is equal to 1 (called C1). Notice that sequences (0, a1, a2, a3) and (1, a1, a2, a3), which differ only in the extra bit, occupy the same relative position in C0 and C1 and are connected by an edge of the graph.

With our definition of redundancy, every sequence in C0 has a synonymous sequence in C1, and H4 provides a convenient way of picturing the relationship between such pairs of sequences. We label the sequences of C0 with their decimal equivalents and use the same labels in C1 in such a way that synonymous sequences display the same label. Hence, 0 labels sequence 0000 and the sequence of C1 which is synonymous with 0000, 1 labels 0001 and the sequence of C1 which is synonymous with 0001, and so on. The general rule is that (a1, a2, a3) labels sequence (0, a1, a2, a3) and sequence (1, σ^{-1}(a1, a2, a3)), which are synonymous.

We can illustrate this with the permutation used in our example on page 37. It was defined by σ(0) = 1, σ(1) = 3, σ(2) = 4, σ(3) = 6, σ(4) = 5, σ(5) = 2, σ(6) = 0 and σ(7) = 7, which we write [13465207] by listing the images in order. Its graphical representation is shown in Figure 3.3. We can gather from it that sequence 0101 (labelled 5 in C0) is synonymous with sequence 1100 (labelled 5 in C1) and that sequence 0111 is synonymous with 1111 (both being labelled 7).

Figure 3.3: A spatial representation of permutation [13465207].

When equivalent corners of C0 and C1 are assigned the same label (such as 0111 and 1111 in the previous example), two sequences differing in the first bit are synonymous. The edge that joins the two is therefore a neutral mutation. As explained before, this can only happen with edges linking C0 to C1. All other edges are non-neutral by construction.

Figures such as 3.3 are a useful way of picturing the relations of synonymy introduced by a permutation, independently of the code to which it is applied. But we can also see the decimal representation of binary sequences as a code in itself:

a1a2a3   T(a1a2a3)
000      0
001      1
010      2
011      3
100      4
101      5
110      6
111      7

in which case each label in Figure 3.3 actually represents the meaning of the sequence at the corresponding corner of the cube. For instance, 3 is associated in C1 with the sequence 1001, which, if T is the code shown above, effectively means that T^σ(1001) = 3.

In addition to letting us visualise the relations of synonymy between sequences, this graphical representation also displays the arrangement of symbols in C1. For instance, in permutation [13025746], represented in Figure 3.4, we see that C1 is obtained from C0 by a rotation of 90 degrees about a vertical axis. This property is not easily detected, and it has important consequences for the redundancy that results from it, as will be discussed on page 64.

Figure 3.4: A spatial representation of permutation [13025746]. We see that C1 is a rotated version of C0.
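The labelling rule lends itself to a short computation (ours; the permutation is the text's [13465207]): each label a marks (0, a) in C0 and (1, σ^{-1}(a)) in C1. Running it reproduces, for instance, the pairing of 0101 with 1100 and of 0111 with 1111 noted above.

```python
# Recover the synonym pairs pictured in Figure 3.3 from sigma alone.
perm = [1, 3, 4, 6, 5, 2, 0, 7]                  # [13465207]
inverse = {image: i for i, image in enumerate(perm)}

for label in range(8):
    c0 = format(label, "04b")                    # 0 followed by the label
    c1 = format(8 + inverse[label], "04b")       # 1 followed by sigma^-1(label)
    print(f"label {label}: {c0} and {c1} are synonymous")
```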
3.3 The framework in a more general form

The previous section has shown how one bit of redundancy can be added to a code T defined on sequences of arbitrary length n. The aim of this section is to show how this can be generalised to the case where p bits of redundancy are added. Most of the section discusses the case where p and n are both equal to 3. The advantage of this case is that it can be illustrated with diagrams, which is not possible in higher dimensions. The generalisation from there to any values of n and p is, however, straightforward once this case is understood.

3.3.1 The natural generalisation

In the case of 1 bit of redundancy, we doubled the original sequence space and defined the meanings of the newly created sequences from the meanings of the existing ones via a permutation. If we add 3 bits of redundancy, we have an eight-fold expansion of our sequence space: for every existing sequence, there are now 7 new ones whose assignment must be defined. In the case where n = 3, the new sequence space is a six-dimensional hypercube, H6, which is represented in Figure 3.5. This graph can be subdivided into 8 replicas of H3 according to the value of the first three redundancy bits d1d2d3. We call Cd1d2d3 the (n-dimensional) cube containing the sequences which start in d1d2d3.

Figure 3.5: A spatial representation of 6-bit-long sequences. Not all the edges are represented on this diagram. The lines connecting the small cubes to each other are a shorthand for 8 edges connecting every pair of equivalent corners of the small cubes.

For p = 1, we ensured that the redundancy bit was irrelevant when it was set to 0, i.e. T^σ(0, a1, ..., an) = T(a1, ..., an). In the same way, we can ensure that, given a non-redundant code T : {0,1}^n → S, the redundant code T^red3 : {0,1}^{n+3} → S obtained from T by the addition of 3 redundant bits will be such that

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{red3}(0, 0, 0, a_1, \ldots, a_n) = T(a_1, \ldots, a_n)$$

In other words, the redundancy bits do not change the meaning of the rest of the sequence when they are all equal to 0. For p = 1, we also enforced that all sequences starting in 1 would have different meanings from each other and that these meanings would be taken from the set S. This ensured that the number of sequences encoding elements of S remained the same. The simplest way to generalise this constraint to p = 3 is to define 7 permutations σd1d2d3, one for each possible value of d1d2d3 other than 000, and have

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{red3}(d_1, d_2, d_3, a_1, \ldots, a_n) = T(\sigma_{d_1 d_2 d_3}(a_1, \ldots, a_n))$$

When n = 3, the result can be visualised as in Figure 3.6. But there are some drawbacks associated with this way of defining a redundant code. As Figure 3.6 shows, not all Cd1d2d3 stand in an equal relationship to each other. The distance between the redundancy bits d1d2d3 induces a distance between the cubes Cd1d2d3, which we have captured by representing them as the corners of a cube. Sequence 010011 in C010 has six neighbours, one for each of its 6 bits. A mutation in one of the three rightmost bits will produce a sequence which is also in C010. These mutations cannot be neutral since, by relating Cd1d2d3 to C000 through a permutation, we ensured that the 8 sequences in Cd1d2d3 map to different symbols of S. A mutation in the first three bits of the sequence will lead to C110, C000 or C011, i.e. one of the three cubes occupying positions neighbouring C010.
We know that any neutral neighbour of 010011 is to be found among those three cubes. Sequence 000011 will, for instance, be a neutral neighbour of 010011 if, and only if, 011 is invariant through σ010. If that is the case, we have

$$T^{red3}(010011) = T(\sigma_{010}(011)) = T(011) = T^{red3}(000011)$$

and the two sequences are indeed synonymous.

Figure 3.6: Defining the meanings of Cd1d2d3 with seven permutations. Cube Cd1d2d3 is related to C000 via permutation σd1d2d3.

But if we want to determine whether 011011 is a neutral neighbour of 010011, complications arise, because the relationship between C010 and C011 is only defined through the intermediary of C000. We have

$$T^{red3}(010011) = T^{red3}(011011) \iff T(\sigma_{010}(011)) = T(\sigma_{011}(011)) \iff \sigma_{010}(011) = \sigma_{011}(011) \iff \sigma_{011}^{-1}(\sigma_{010}(011)) = 011$$

(the middle step holding by injectivity of T). Sequence 011011 is thus a neutral neighbour of 010011 if and only if 011 is invariant through σ011^{-1} ∘ σ010. However, finding the invariant elements of such compositions of permutations is not easy, and we will not have much control over the number of neutral mutations which result from defining redundancy in this way.

3.3.2 Other ways of generalising

If we want ready access to the number and identity of the neutral mutations that result from the definition of a function T^red3, we can relate pairs of Cd1d2d3 cubes which are next to each other, rather than defining each of them in relation to C000. For instance, if we explicitly defined a permutation σ010→011 such that

$$T^{red3}(011\,a_1 a_2 a_3) = T^{red3}(010\,\sigma_{010 \to 011}(a_1 a_2 a_3))$$

finding whether 011011 is a neutral neighbour of 010011 would just require checking whether 011 is invariant through σ010→011. We are now confronted by another problem. There is a total of 12 pairs of Cd1d2d3 which are neighbours, but only 7 permutations to be defined. If we do define a permutation for each of the 12 pairs of Cd1d2d3, we will have conflicting definitions for the meanings of many sequences, which is clearly not acceptable. One way around this is to define only three permutations σ1, σ2 and σ3, one for each of the three bits of redundancy, and compose them to reach any Cd1d2d3 from C000, as shown in Figure 3.7. In order to go from C000 to C011, for instance, we apply permutation σ2 followed by σ3. Notice that we must specify the order in which these permutations are applied, because the result will in general depend on it. We could, for instance, adopt the convention that they should always be combined in order of increasing index; hence C101 would be reached from C000 by σ1 ∘ σ3 rather than σ3 ∘ σ1. Defining T^red3 in this way, 7 out of the 12 pairs of neighbouring Cd1d2d3 will be related through either σ1, σ2 or σ3, for which we know the number of invariants. The other 5 permutations would result from compositions and inversions such as σ3^{-1} ∘ σ2 ∘ σ3, whose numbers of invariants are still difficult to predict.

Figure 3.7: Defining the meanings of Cd1d2d3 with three permutations, with α = σ3^{-1} ∘ σ2^{-1} ∘ σ1 ∘ σ2 ∘ σ3, β = σ3^{-1} ∘ σ2 ∘ σ3, γ = σ3^{-1} ∘ σ1 ∘ σ3 and δ = σ2^{-1} ∘ σ1 ∘ σ2. The number of invariants of permutations α, β, γ and δ is still difficult to calculate.

To simplify things even further, we can choose σ1, σ2 and σ3 so that they all commute with each other. The permutation σ3^{-1} ∘ σ2 ∘ σ3 which relates cube C101 to C111 then reduces to σ2. The other four compositions in Figure 3.7 also reduce to a single permutation. All 12 transitions between neighbouring Cd1d2d3 will be dictated by either σ1, σ2 or σ3, as shown in Figure 3.8.

Figure 3.8: Defining the meanings of Cd1d2d3 with three commuting permutations. Pairs of neighbouring Cd1d2d3 are now all related through either σ1, σ2 or σ3.

Mutation of a redundant bit now always corresponds to the same permutation, regardless of the values of the other two redundant bits. As a result, the neutral mutations between, say, C000 and C100 will be determined by the invariant elements of σ1. So will the neutral mutations between C010 and C110, between C001 and C101, and between C011 and C111, since all these pairs are related through σ1. We can in this case calculate the total number of neutral mutations, Ntot, that will result from T^red3. It is the sum of the neutral mutations for each of the 12 pairs of neighbouring Cd1d2d3. Of these 12, 4 are related through σ1, 4 through σ2 and 4 through σ3. Hence:

$$N_{tot} = 4\,(N_{\sigma_1} + N_{\sigma_2} + N_{\sigma_3})$$

where Nσi is the number of invariants of σi.

We can generalise to the case where p bits of redundancy d1, d2, ..., dp are added. We define p permutations σ1, σ2, ..., σp, one for each of the bits of redundancy, choosing them so that they all commute with each other. The permutation that relates the meanings of sequences in Cd1d2...0...dp to those in Cd1d2...1...dp is σi, where i is the position of the bit in which they differ. Because all the permutations commute with each other, this is true regardless of the values of the other redundant bits. There are p·2^{p−1} pairs of Cd1d2...dp which differ in a single bit. Of these, 2^{p−1} are related through σ1, 2^{p−1} through σ2, and so on. The total number of neutral mutations is thus

$$N_{tot} = 2^{p-1} \sum_{i=1}^{p} N_{\sigma_i}$$

where Nσi is the number of elements which are invariant through permutation σi. Notice that, no matter which way we generalise, when all the permutations we define are equal to the identity, we are in the situation where only the last three bits (n in the general case) define the meaning of a sequence. The first three bits (p in the general case) are irrelevant in all situations and can be regarded as genetic junk, just as in the case where p = 1.
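This counting argument is easy to check by brute force. The sketch below is our own illustration, under assumed illustrative choices: disjoint transpositions are used because they commute, which makes them a valid family σ1, ..., σp.

```python
from itertools import product

# Brute-force check of the formula above.
n, p = 3, 3
size = 2 ** n

def transposition(i, j):
    t = list(range(size))
    t[i], t[j] = t[j], t[i]
    return t

# disjoint transpositions commute with one another
sigmas = [transposition(0, 1), transposition(2, 3), transposition(4, 5)]

def meaning(d, a):
    # apply sigma_i once for each redundancy bit d_i equal to 1; the
    # order is irrelevant since the permutations commute.  Composing
    # with an injective T would not change which meanings coincide.
    for bit, s in zip(d, sigmas):
        if bit:
            a = s[a]
    return a

# count every edge that flips a single redundancy bit without changing
# the meaning (edges inside one cube C_d are never neutral)
n_tot = sum(
    meaning(d, a) == meaning(d[:i] + (1,) + d[i + 1:], a)
    for d in product((0, 1), repeat=p)
    for a in range(size)
    for i in range(p)
    if d[i] == 0
)

invariants = [sum(s[x] == x for x in range(size)) for s in sigmas]
print(n_tot, 2 ** (p - 1) * sum(invariants))   # both are 72 here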
3.4 The criteria for assessing redundancy

When fitness stops improving in an evolutionary algorithm, it is usually because the individuals that compose the population are trapped in one or several local fitness optima. By an optimum, we mean a sequence whose fitness is greater than the fitness of all sequences that can be reached from it by a point mutation. The number of local optima of a fitness function is therefore an indication of how effectively evolution can optimise that function. Likewise, if we can reduce the number of local optima of a function without changing the proportion of sequences with a given fitness, we will ease the evolutionary process by making high-fitness points easier to reach. We can therefore assess redundancy on the basis of its impact on the number of optima of the code to which it is applied.
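As a concrete reference point, here is a minimal sketch (ours) of that optimum count, with an arbitrary random fitness assignment:

```python
import random

# A local optimum is a sequence strictly fitter than all of its
# point-mutation neighbours.
def count_optima(fitness, m):
    # fitness: 2^m values indexed by the integer form of the sequence
    return sum(
        all(fitness[x] > fitness[x ^ (1 << b)] for b in range(m))
        for x in range(2 ** m)
    )

random.seed(0)
f = [random.random() for _ in range(2 ** 3)]   # an arbitrary fitness
print(count_optima(f, 3))
```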
3.4.1 Assigning fitness to symbols

Section 3.2 has shown how every permutation defined on a set of 2^n elements can be regarded as a different way of adding one bit of redundancy to a code defined on sequences of length n. We now want to assess the impact of these permutations on the number of optima of the code to which they are applied. But codes are not fitness functions; they have been defined as mappings between binary sequences and some symbols which then interact together at a higher conceptual level. If numbers of optima are to be compared on a code, we need to have some criterion for establishing that a symbol is better than all the other symbols that can be reached from it by point mutation. Hence, given a code T defined as before,

T : {0,1}^3 → S = {A, B, C, D, E, F, G, H}

we need a function z which assigns a positive real number to each element of S:

z : S = {A, B, C, D, E, F, G, H} → ℝ

The composition of the two will yield a function f,

f = z ∘ T : {0,1}^3 → ℝ

which assigns fitness z(T(a1, a2, a3)) to sequence (a1, a2, a3), bypassing the meaning of the sequences in terms of elements of S. Suppose we now apply the same function z to T^σ, the redundant code defined by permutation σ, instead of T. We have z ∘ T^σ : {0,1}^4 → ℝ with

$$z(T^\sigma(0, a_1, a_2, a_3)) = z(T(a_1, a_2, a_3)) = f(a_1, a_2, a_3)$$
$$z(T^\sigma(1, a_1, a_2, a_3)) = z(T(\sigma(a_1, a_2, a_3))) = f(\sigma(a_1, a_2, a_3))$$

We can therefore see z ∘ T^σ as the application of redundancy directly to the function f, and ignore its underlying definition as z ∘ T:

$$f^\sigma : \{0,1\}^4 \to \mathbb{R}, \quad f^\sigma(0, a_1, a_2, a_3) = f(a_1, a_2, a_3), \quad f^\sigma(1, a_1, a_2, a_3) = f(\sigma(a_1, a_2, a_3))$$

Until the end of this chapter and in the next one, we will be describing the application of redundancy to functions of the type of f rather than to codes T. The significance of the function z which underlies the transformation of one into the other will be discussed in the next section. The decomposition of f into z and T will also be discussed again in Chapter 6.

The point of defining functions such as f is to make the notion of a local optimum meaningful. But all we really need is some rule to decide which of two neighbouring sequences is the fitter. The actual fitness values of the sequences are not necessary. Hence, two fitness functions f1 and f2 which assign different fitnesses to sequences, but are such that the sequences end up in the same order when arranged by increasing fitness, are identical for our purpose. All that matters when defining such a function is the order it induces on the sequences.
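In code, the passage from f to f^σ is the direct analogue of the passage from T to T^σ. The sketch below is ours; the fitness values and permutation are arbitrary illustrations, and this simple strict count ignores the neutral-path subtlety treated in Section 3.4.4:

```python
# Redundancy applied directly to a fitness function f on n bits,
# giving f^sigma on n+1 bits.
def count_optima(fitness, m):
    return sum(
        all(fitness[x] > fitness[x ^ (1 << b)] for b in range(m))
        for x in range(2 ** m)
    )

def make_f_sigma(f, perm):
    # leading bit 0: f unchanged; leading bit 1: tail read through sigma
    return list(f) + [f[p] for p in perm]

n = 3
f = [0.3, 0.1, 0.4, 0.15, 0.9, 0.2, 0.6, 0.5]   # illustrative values
perm = [1, 3, 4, 6, 5, 2, 0, 7]
print(count_optima(f, n), count_optima(make_f_sigma(f, perm), n + 1))
```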
3.4.2 How meaningful is the fitness of a 3-bit-long sequence?

What exactly does it mean to study fitness functions defined on sequences as short as the ones we have been talking about so far? Surely, any evolutionary process worthy of the name will have to handle sequences orders of magnitude longer. The inspiration behind our definition of a code was both the genetic code and the low-level codes used in evolutionary algorithms. We will answer the question in both contexts.

Analogy with the genetic code

In the analogy with the genetic code, assigning fitnesses to elements of S, as function z does, is analogous to assigning fitnesses to amino acids. This is, at first sight, a meaningless thing to do, since amino acids are not intrinsically good or bad but can only be judged in the context of the protein or genome of which they are part. However, if one kept all the bases in the genome constant except for the three which define one amino acid, it would be possible to define a fitness for each of the 20 choices of amino acid at that position. These fitnesses will depend heavily on the context in which this amino acid is allowed to vary. But a function such as z could nonetheless be defined, provided that a genetic background was kept constant. The nature of this function would change radically from one background to another, but that need not worry us here.

Parallels with population genetics

When geneticists talk of beneficial or deleterious mutations, they perform an abstraction similar to the one we are proposing. They imagine two organisms whose genomes differ by a single mutation, one of the two being fitter than the other. They are not interested in measuring the fitness of these organisms, only in distinguishing whether that mutation has a positive or a negative effect, everything else being equal. In fact, individuals differing only in this mutation will probably be very unlikely to exist, especially if the organisms in question reproduce sexually, but the fact that it is in principle possible makes this definition useful.

Much population genetics work uses single-locus models, where evolutionary change is only considered at one locus, the rest of the genome being assumed constant. Such models provide a welcome simplification without which more realistic scenarios cannot be understood. Natural selection acts on many genetic differences at the same time, but to a first approximation something similar to what is predicted by the one-locus model will be happening in parallel at all the loci under selection. Our approach is similar to the one-locus model in that it is concerned with changes at a very localised point of the genotype. The possible alleles for our locus are the elements of S. The novelty lies in the fact that our alleles are modelled down to the nucleotide level and that we can examine the consequences of these alleles having more than one representation at that level.

Similarity with schema analysis

Theoretical analysis of GAs has also resorted to the attribution of fitness to short sections of the chromosome. Holland's schema theorem (Holland, 1992) talks of the fitness of schemas, where schemas are sections of the chromosome of sufficiently short length not to be too disrupted by recombination. The fitness of a schema is the average fitness of all the individuals in the population that possess the schema at a well-defined place in their genome. This is very similar to our function z; the difference is that, instead of evaluating the schema in a fixed background, an average is taken over a variety of backgrounds provided by the evolving population.

3.4.3 Counting numbers of optima

Let us now illustrate with some examples how redundancy affects the number of optima of a function. In Figure 3.9 we have represented the permutation σ = [32541706], two fitness functions f1 and f2 defined on sequences of 3 bits, as well as f1^σ and f2^σ, the functions obtained by application of σ to f1 and f2. Function f1 has 2 local maxima: 001, of fitness 0.4, and 111, of fitness 0.7, which is the global maximum. Its redundant version, f1^σ, also has 2 local maxima. One is 0111: since this sequence is still associated with the highest fitness, its new neighbour 1111 cannot possibly be fitter.
The other local maximum is 1101, which is synonymous with 0111 and also has the maximum fitness 0.7. Local optimum 001 of f1 has disappeared with redundancy, since its new neighbour 1001 has a greater fitness (0.5). Function f2 also has two local maxima: 001, with fitness 0.7, and 110, with fitness 0.6. But function f2^σ has 5 maxima. Both 001 and 110 remain maxima after a 0 has been added to their left (0001 and 0110). Three new ones are created in C1: 1010 with fitness 0.5, 1100 with fitness 0.7 and 1111 with fitness 0.6. These two cases show that the same permutation σ can have a very different impact on functions which have the same number of optima. We should not expect to be able to assess redundancy on an arbitrary fitness function. The proper way to handle this difficulty is discussed in Section 3.4.5.

Figure 3.9: Counting local optima with and without redundancy. (a) A representation of permutation σ = [32541706]. (b) Function f1 and its redundant version f1^σ. The number of local optima (circled corners) is 2 in both cases. (c) Function f2 and its redundant version f2^σ. The number of local optima (circled corners) increases from 2 to 5. (d) A reminder of the correspondence between corners of the cubes and sequences.

3.4.4 Dealing with neutral paths

We will make explicit here the procedure used to count numbers of optima in the presence of neutral mutations. Consider the function f^σ resulting from the introduction of permutation [23017564] and pictured in Figure 3.10. The non-redundant function f from which f^σ is derived can be obtained by looking at the left half of the figure. In f, sequences 101 and 110 are both local optima. Since both sequences are invariant through permutation σ, the paths that connect these sequences to their new neighbours in C1 are both neutral. Should 0101 and 0110 be counted as optima? The answer depends on the situation of their neutral neighbours 1101 and 1110. Sequence 1110 has no neighbour of higher fitness within C1. In that case, both 0110 and 1110 will be counted as local optima. Sequence 1101, on the other hand, has a neighbour (1001) whose fitness is greater. In that case, neither 0101 nor 1101 will be counted as an optimal sequence, because transition from 0101 to a sequence of higher fitness is now possible via a neutral mutation in the leftmost bit.

Figure 3.10: Counting local optima in the presence of neutral paths. (a) A redundant function with two neutral paths. Sequences in a solid circle are both counted as local optima. The sequence in the dashed circle loses its optimality as a result of the neutral path. (b) A reminder of the correspondence between corners of the cubes and sequences.
Stated in general terms: suppose a neutral mutation exists between sequences A and B, both of fitness fAB; both A and B will be counted as optima if none of the neighbours of A or B has a fitness strictly greater than fAB, but neither of them will be counted as an optimum if either A or B has a neighbour of fitness greater than fAB. A is thus not counted as an optimum if a path to higher fitness exists via B, and vice versa.

Using the procedures described above, for any function f and any permutation σ, we can calculate Nf, the number of optima of f, and Nf^σ, the number of optima of f^σ. The comparison of these two numbers will be the basis for assessing the quality of permutation σ.

3.4.5 Comparing numbers of optima in a meaningful way

Correcting for space expansion

We have seen in Section 3.2.3 that, when σ is the identity permutation, the resulting redundancy makes absolutely no difference in evolutionary terms. Given a function f with Nf optima, Nf^Id, the number of optima of f^Id, will be equal to twice Nf, simply because every sequence that is optimal under f is duplicated with the duplication of the sequence space. Before we can compare the numbers Nf and Nf^σ, we must therefore divide the latter by two, to compensate for this doubling of the number of sequences. If Nf^σ/2 is equal to Nf, redundancy has made no difference; if it is smaller, redundancy has reduced the number of local optima, and the greater the difference between Nf and Nf^σ/2, the more significant the improvement. For any function f and permutation σ, we will collect the values of both Nf − Nf^σ/2, the net decrease in the number of optima, and (Nf − Nf^σ/2)/Nf, the proportion of the original number of optima that has been eliminated by the redundancy. If these numbers are negative, then redundancy has effectively increased the number of optima. This can happen, as illustrated in Figure 3.9.c.

Averaging out

Figure 3.9 has shown that the difference between Nf and Nf^σ depends as much on f as on σ. If f has no local optima other than the global one (Nf = 1), no permutation can reduce its number of optima. Function f^σ will have at least two local optima and most probably more. But the same permutation stands a much better chance of reducing the number of optima of a function f if that function is rich in local optima to start with. The effect of σ will thus vary a lot from one fitness function to another. To avoid biasing our assessment of a permutation by an unrepresentative choice of functions f, we must average the values of Nf − Nf^σ/2 and (Nf − Nf^σ/2)/Nf over as many different functions as possible. Only these averages, which we will call Dσ and Rσ respectively, can reliably be used to compare the benefits of different permutations. The larger Dσ and Rσ, the more we can expect σ to reduce, on average, the number of local optima and bring evolutionary benefits. If Dσ and Rσ are negative, redundancy is detrimental, since it does, on average, increase the number of local optima.
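Putting the pieces of this chapter together, the following sketch (ours; the sample size and seed are arbitrary choices) estimates Rσ and Dσ by averaging over random fitness rankings, using the neutral-path-aware count of Section 3.4.4. The permutation tried at the end, [07143562], is the one that will head the best-of list in the next chapter, so Rσ should come out near 0.32 for a large enough sample.

```python
import random

def count_optima_neutral(fit, m):
    total = 0
    for x in range(2 ** m):
        # flood the plateau reachable from x through neutral mutations
        # (neighbours of equal fitness are synonyms under our codes)
        plateau, stack = {x}, [x]
        while stack:
            u = stack.pop()
            for b in range(m):
                v = u ^ (1 << b)
                if fit[v] == fit[u] and v not in plateau:
                    plateau.add(v)
                    stack.append(v)
        # x counts as an optimum unless some member of the plateau has
        # a strictly fitter neighbour
        if all(fit[u ^ (1 << b)] <= fit[x]
               for u in plateau for b in range(m)):
            total += 1
    return total

def assess(perm, n, samples=200, seed=1):
    rng = random.Random(seed)
    size = 2 ** n
    r_sum = d_sum = 0.0
    for _ in range(samples):
        f = list(range(size))
        rng.shuffle(f)                       # a random fitness ranking
        f_sigma = f + [f[p] for p in perm]
        nf = count_optima_neutral(f, n)
        nfs = count_optima_neutral(f_sigma, n + 1)
        d_sum += nf - nfs / 2
        r_sum += (nf - nfs / 2) / nf
    return r_sum / samples, d_sum / samples   # estimates of R and D

print(assess([0, 7, 1, 4, 3, 5, 6, 2], 3))    # [07143562]
```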
3.5 Conclusion

We have seen how a shuffling, or permutation, of the elements of a set containing 2^n elements can be regarded as defining a pattern of redundancy for a code defined on n-bit-long sequences. This is only one of the possible ways of characterising redundancy. Its advantage, as we have shown, is to allow an easy distinction between the redundant and the non-redundant components of a code. This form of redundancy is applicable to any code, regardless of the nature of what is being represented by the binary sequences. In this thesis, we will only consider the case where such forms of redundancy are applied to non-redundant codes. Nothing prevents us, however, from using such permutations to add redundancy to codes which are already redundant. In that case, however, the distinction between the redundant and non-redundant aspects of the code becomes blurred. Conveniently, in the case where we start from a non-redundant code, the number of invariant elements of the permutation is equal to the number of neutral mutations that will exist in the redundant code. A procedure was defined for assessing the impact of these permutations on the number of local optima of the code. This is done by assuming that all rankings of the symbols of the code are equally likely when considered over the large number of genetic contexts in which they can be found.

Chapter 4

A statistical analysis of redundancy patterns

4.1 Aims of the chapter

In the previous chapter, we saw how a code defined on sequences of length n can be made redundant by the addition of one bit to the length of the existing sequences. In this chapter, we will compare the many ways in which that particular type of redundancy can be added. The addition of a single bit of redundancy is defined by a permutation of 2^n elements. Our investigation of redundancy will therefore compare the elements of P_{2^n}, the set of all permutations of 2^n elements. The criterion by which these permutations will be judged is their ability to reduce the number of local optima of a randomly chosen function associating fitnesses with the symbols of the code. Two scalars, Rσ and Dσ, measure the expected reduction in the number of optima that results from using σ to define redundancy. One of the aims of the chapter is to find permutations with the highest possible values of Rσ. These permutations will then be used in subsequent chapters to test the validity of "good" redundancy under a GA. Until we have some understanding of the features of a permutation which are responsible for large positive values of Rσ and Dσ, the only possible strategy for finding such permutations is exhaustive search.

Chapter 2 raised the possibility that the number of neutral mutations resulting from redundancy could be one of the factors responsible for large positive values of Rσ and Dσ. This chapter will assess that claim by examining the relationship between Rσ and the number of neutral mutations over very large samples of permutations. At the same time, we will keep track of many other features of the permutations and examine whether their incidence on Rσ tells us anything new about the causes of evolutionarily beneficial redundancy.

4.2 Some quantitative features of a permutation

This section defines a series of variables which will be recorded for each permutation. If some of these variables correlate with Rσ and Dσ, we will learn something about the way redundancy affects the numbers of optima.

4.2.1 Number of invariant elements

This parameter, called Inv(σ), is the number of elements such that σ(i) = i.
We showed in the previous chapter that, if (a1, ..., an) is a sequence such that σ(a1, ..., an) = (a1, ..., an), then (0, a1, ..., an) and (1, a1, ..., an) are synonymous and mutation of the first of their bits is neutral. Inv(σ) is therefore the number of neutral mutations that will result from applying σ to a code.

4.2.2 Number of orbits

Group theory defines the orbit of an element under a permutation as follows. Take an arbitrary element a on which the permutation is defined and apply the permutation to it, then to its image, then to the image of its image, and so on. Since the set of elements on which the permutation is defined is finite, these successive applications of σ will eventually take us back to a. That is,

$$\exists q \ \text{such that} \ \sigma^q(a) = a$$

The subset containing a, together with all the elements that are encountered before returning to a, is called the orbit of a. The number of orbits defined by permutation σ is the variable that will be collected; it is called Orb(σ). This number lies between 1 and 2^n. The identity has 2^n orbits, since every element is on an orbit of its own. At the other extreme, consider the permutation that associates to each element the next one along in the set, looping back to the first element when the last one is reached. By construction, one will not return to the same element before having been through the entire set. Hence this permutation has a single orbit which includes all the elements. Notice that, for any permutation, every invariant element is on an orbit of its own. The number of orbits of a permutation is therefore at least as big as its number of invariant elements.

4.2.3 Sum of distances between a sequence and its image

We define the following variable:

$$SumDist(\sigma) = \sum_{i=0}^{2^n - 1} H(i, \sigma(i))$$

where H(x, y) is the Hamming distance between the binary representations of x and y. From Chapter 3 we know that sequence (1, a1, a2, ..., an) is synonymous with a unique sequence starting with 0: (0, σ(a1, a2, ..., an)). The Hamming distance between i and σ(i) is therefore a measure of how far apart these synonymous sequences are (assuming i is the decimal equivalent of (a1, a2, ..., an)). If H(i, σ(i)) = 0, the two synonymous sequences differ only in the leftmost bit and a neutral mutation exists between the two. At the other extreme, if H(i, σ(i)) = n, the synonymous sequences are complementary and all bits must mutate to go from one synonym to the other. The variable SumDist is the sum of these distances over all sequences in C0. Since Id(i) = i for all i, we have SumDist(Id) = 0. The identity permutation therefore minimises the variable SumDist. The highest possible score is obtained by the permutation τ which associates to every sequence its complementary sequence. We have SumDist(τ) = n·2^n. There is a negative correlation between SumDist and Inv.
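The three statistics are straightforward to compute. The sketch below is ours; it evaluates them for the permutation [07143562], which will head Table 4.1 below, and the printed values match that row's Inv, Orb and SumDist entries:

```python
# Sigma is given as the list of images of 0 .. 2^n - 1.
def inv(perm):
    return sum(i == p for i, p in enumerate(perm))

def orb(perm):
    seen, orbits = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            orbits += 1
            x = start
            while x not in seen:     # walk the orbit of start
                seen.add(x)
                x = perm[x]
    return orbits

def sum_dist(perm):
    # Hamming distance between i and perm[i], summed over all i
    return sum(bin(i ^ p).count("1") for i, p in enumerate(perm))

sigma = [0, 7, 1, 4, 3, 5, 6, 2]                 # [07143562]
print(inv(sigma), orb(sigma), sum_dist(sigma))   # -> 3 5 12
```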
4.2.4 Connectivity between pairs of symbols

In Chapter 2, we suggested that redundancy would result in a code where transitions between symbols would be easier by point mutation. (We use symbol to describe that which is represented by the code.) It is therefore desirable to assess the impact of a permutation on such transitions. In the case of a non-redundant code defined on sequences of length n, the number of different symbols is 2^n. Adding redundancy does not alter this number. The number of pairs of symbols whose connectivity can be examined is thus equal to

$$\binom{2^n}{2} = 2^{n-1}(2^n - 1)$$

In the case where only one bit of redundancy is added, the patterns of connectivity between pairs of symbols can be classified into four mutually exclusive classes. We call Conn0(σ), Conn1(σ), Conn2(σ) and Conn3(σ) the numbers of pairs of symbols which fall into each of these four classes. Because a given pair of symbols falls into one and only one of them, we have

$$Conn0 + Conn1 + Conn2 + Conn3 = 2^{n-1}(2^n - 1)$$

We will now describe each of these classes in turn.

Class 0

Consider a redundant code T^σ and two distinct sequences (0, a1, ..., an) and (0, b1, ..., bn). The meanings of these sequences under T^σ are T(a1, ..., an) and T(b1, ..., bn). They are synonymous with sequences (1, σ(a1, ..., an)) and (1, σ(b1, ..., bn)) respectively. We say that the pair of symbols (T(a1, ..., an), T(b1, ..., bn)) is in class 0 if neither of the sequences whose meaning is T(a1, ..., an) is a neighbour of either of the sequences whose meaning is T(b1, ..., bn). This is the most restrictive case from the point of view of evolution. It means that transition from T(a1, ..., an) to T(b1, ..., bn) will always have to proceed via some sequence standing for a third symbol. If T(a1, ..., an) and T(b1, ..., bn) are the best and second-best symbols at a certain locus, then both sequences associated with the second-best symbol will be local optima, because transition to the best symbol is impossible without a deleterious mutation taking place first. An example of a pair of symbols in this class is shown in Figure 4.1.

Figure 4.1: A connection of type 0 between a pair of symbols.

Notice that a pair of symbols (T(a1, ..., an), T(b1, ..., bn)) cannot be in class 0 if the Hamming distance between (a1, ..., an) and (b1, ..., bn) is 1. The reason is that such symbols are by definition neighbours in C0. Given that there are n·2^{n−1} pairs of symbols for which this is true (the number of edges of a cube of dimension n), the maximum value of Conn0 is

$$2^{n-1}(2^n - 1) - n \cdot 2^{n-1} = 2^{n-1}(2^n - n - 1)$$

Class 1

A pair of symbols (T(a1, ..., an), T(b1, ..., bn)) falls into this class when one and only one of the two sequences meaning T(a1, ..., an) is a neighbour of one and only one of the two sequences meaning T(b1, ..., bn). The other two synonymous sequences must be disjoint from the connected pair and disjoint from each other. An example of two symbols in class 1 is shown in Figure 4.2.

Figure 4.2: A connection of type 1 between a pair of symbols.

In this case, substitution of one symbol by the other through point mutation may or may not be possible, depending on which of the synonymous representations is used. If it is the unconnected one, then substitution is impossible. In Figure 4.2 for instance, if T(110) is encoded as 0110, transition to T(000) is possible by mutation to sequence 1110. If T(110) is encoded by 1011, transition to T(000) is impossible because none of the neighbours stands for T(000). When a pair of symbols is in class 1, we expect transition from one to the other to be possible in half of the cases.
Class 2

A pair of symbols (T(a1, ..., an), T(b1, ..., bn)) will be in class 2 if any three of the four sequences associated with these two symbols are connected together, the fourth one being disjoint from them. Figure 4.3 illustrates several possible configurations for this case.

Figure 4.3: Possible connections of type 2 between pairs of symbols. (a) and (b) Two possible configurations. (c) A reminder of the correspondence between corners of the cubes and sequences.

Pairs of symbols in this class are not in a symmetrical relationship to each other. From one of the symbols, transition to the other will always be possible, while the reverse transition will depend on the genetic representation. In Figure 4.3.a for instance, it is always possible to go from T(100) to T(110), because both representations of T(100) are neighbours of 0110. The reverse transition from T(110) to T(100) is only possible in half of the cases, when T(110) is encoded as 0110. On average, transition from one symbol to the other is possible in three out of four cases. Notice the difference between Figure 4.3.a and Figure 4.3.b. In the second case, transition from T(110) to T(100) requires a neutral mutation to happen first when the starting point is sequence 1110.

Class 3

A connection of type 3 exists between symbols T(a1, ..., an) and T(b1, ..., bn) if it is always possible to go from one of the symbols to the other without having to go through a third symbol. The path that goes from one symbol to the other might include some neutral mutations. The possible configurations for this case are represented in Figure 4.4.

Figure 4.4: Possible connections of type 3 between pairs of symbols. (a), (b) and (c) Three possible configurations. (d) A reminder of the correspondence between corners of the cubes and sequences.

In practice, all configurations implementing this case are such that a single mutation is enough to go from one symbol to the other. Neutral mutations can take place first, but they are never necessary. If all pairs of symbols were in this class, there would never be any local optimum, given that transition to the global optimum would be possible from anywhere. However, as we will see, no permutation comes close to achieving this.
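Our reading of these four definitions can be turned into a mechanical classification. The sketch below is an illustration of ours, not the thesis's code: for each pair of symbols it builds the graph on their four sequences and sorts it into a class by the number of cross-edges and the size of the largest connected group. The identity output is easy to check by hand; counts for other permutations should line up with the Conn columns of the table in the next section if this reading matches the definitions.

```python
from itertools import combinations

def conn_counts(perm, n):
    size = 2 ** n
    inverse = {v: k for k, v in enumerate(perm)}
    counts = [0, 0, 0, 0]
    for s, t in combinations(range(size), 2):
        reps_s = (s, size + inverse[s])          # the two sequences for s
        four = list(reps_s) + [t, size + inverse[t]]
        edges = [(u, v) for u, v in combinations(four, 2)
                 if bin(u ^ v).count("1") == 1]
        cross = sum((u in reps_s) != (v in reps_s) for u, v in edges)
        comp = {u: {u} for u in four}            # connected components
        for u, v in edges:
            merged = comp[u] | comp[v]
            for w in merged:
                comp[w] = merged
        biggest = max(len(c) for c in comp.values())
        if cross == 0:
            counts[0] += 1                       # class 0: no connection
        elif biggest == 3:
            counts[2] += 1                       # class 2: three linked
        elif biggest == 2 and cross == 1:
            counts[1] += 1                       # class 1: one lone edge
        else:
            counts[3] += 1                       # class 3: always reachable
    return counts

print(conn_counts(list(range(8)), 3))            # identity: [16, 0, 0, 12]
```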
4.3 Best and worst permutations when n is equal to 3

4.3.1 Some considerations of size

In the case where n is equal to 3, patterns of redundancy are defined by permutations over 8 elements. The set P8 containing these permutations has 40320 elements. Because of the many symmetries of the cube, many of them will lead to identical patterns of redundancy. Unfortunately, these equivalence relations are not easy to detect, and we had to explore the entire set. Each permutation should be evaluated on as large a number of fitness functions as possible. As explained in the previous chapter, when a function is defined, only the induced ranking of the symbols matters for the count of local optima. There are therefore 40320 fitness rankings on which each permutation can be evaluated. Testing every permutation on every fitness ranking requires that we count numbers of optima in approximately 1.6 billion different configurations.

4.3.2 Description of the data

Our first task was to find among P8 the permutations with the 10 highest and the 10 lowest values of Rσ. For those 20 permutations we recorded the values of all the variables defined in the previous section. The result is displayed in Table 4.1. Each line of these tables corresponds to a different permutation. The first column identifies the permutation σ by listing the values of σ(0) ... σ(7); the invariant elements of a permutation can be read off directly from this column, as the positions i at which the listed digit equals i. The second column gives the value of Rσ and, in brackets, the ranking of the permutation with respect to this value. The third column gives the same information for Dσ. In Table 4.1.b the ranking is reversed, indicating how far a permutation is from the bottom of the list. The values of Rσ and Dσ are the averages of (Nf − Nf^σ/2)/Nf and of Nf − Nf^σ/2 respectively, over the 40320 fitness functions. For some of these functions, Nf − Nf^σ/2 and (Nf − Nf^σ/2)/Nf will be positive, indicating that the number of optima has diminished as a result of the redundancy. For others, they will be negative, indicating that redundancy has increased the number of optima of that function. These adverse cases contribute negatively to the values of Rσ and Dσ. For each permutation, we calculated the proportion of fitness functions which result in Nf − Nf^σ/2 being negative. This number is indicated in the fourth column of the tables under the label BadCases. The remaining columns give the values of the variables defined in the previous section.

Table 4.1.a: The 10 best permutations when n equals 3.

σ(0)...σ(7)   Rσ (rank)     Dσ (rank)      BadCases   Inv   Conn0   Conn1   Conn2   Conn3   SumDist   Orb
07143562      0.3225 (1)    0.7679 (2)     0.140      3     3       9       12      4       12        5
01763254      0.3212 (2)    0.7595 (3)     0.112      2     4       10      8       6       14        3
01567234      0.3168 (3)    0.7595 (4)     0.165      2     4       10      8       6       14        5
06743512      0.3120 (4)    0.7690 (1)     0.157      2     3       8       12      5       16        5
07153264      0.3061 (5)    0.7155 (6)     0.104      2     4       10      8       6       14        3
06137254      0.3030 (6)    0.7024 (>10)   0.108      2     5       8       8       7       14        4
01762354      0.2981 (7)    0.6952 (>10)   0.099      2     6       10      4       8       12        4
07543612      0.2970 (8)    0.7190 (5)     0.139      1     3       13      6       6       18        3
56401237      0.2962 (9)    0.7060 (9)     0.120      1     3       15      4       6       16        2
06751234      0.2958 (10)   0.7060 (10)    0.125      1     3       15      4       6       16        2

Table 4.1.b: The 10 worst permutations when n equals 3.

σ(0)...σ(7)   Rσ (rank)     Dσ (rank)      BadCases   Inv   Conn0   Conn1   Conn2   Conn3   SumDist   Orb
40576123      0.0852 (10)   0.1810 (7)     0.122      0     13      4       2       9       10        2
46570123      0.0797 (9)    0.1667 (6)     0.049      0     14      0       4       10      12        3
01452367      0.0790 (8)    0.2000 (9)     0.000      4     14      0       0       14      8         6
40125673      0.0692 (7)    0.2476 (>10)   0.161      0     10      8       4       6       10        1
40615723      0.0680 (6)    0.1464 (5)     0.146      0     13      6       0       9       8         2
75016423      0.0625 (5)    0.1321 (4)     0.073      0     14      2       2       10     10        1
45607123      0.0603 (4)    0.2298 (>10)   0.161      0     10      8       4       6       10        3
40516273      0.0480 (3)    0.1000 (3)     0.000      0     15      0       0       13      12        2
40675123      0.0453 (2)    0.0976 (2)     0.098      0     14      4       0       10      8         3
01234567      0.000 (1)     0.000 (1)      0.000      8     16      0       0       12      0         8

4.3.3 Any redundancy is better than none

The most remarkable observation that can be made from Table 4.1 is that the identity permutation is the worst possible permutation. It is rated as such by both Dσ and Rσ, scoring exactly 0 with both measures. This score was expected, since Rσ and Dσ were in effect calibrated to produce 0 for the identity permutation (see Section 3.4.5).
A statistical analysis of redundancy patterns 64 somehow calibrated to produce 0 for the identity permutation (see Section 3.4.5). The surprise comes from the fact that all other permutations have a positive score. We can therefore state that, on average, every permutation will reduce the numbers of local optima. Since the identity permutation is tantamount to a non-redundant version of the code, we can conclude that any of the kinds of redundancy that has been included in this study brings at least a small improvement. It is important to bear in mind that this is true only on average over a large number of fitness functions. For any permutation, there will be plenty of fitness functions such that Nf − Nf σ /2 and (Nf − Nf σ /2)/Nf are negative; their proportion is given in the BadCases column. Table 4.3.1.a indicates that, at the other end of the spectrum, the very best permutations have Rσ values greater than 0.3. These permutations will therefore suppress nearly a third of the local optima. This figure is very encouraging bearing in mind that it is an average over all possible fitness rankings of the symbols, a large number of which cannot possibly be made smoother (Nf = 1). We must also remember that we are adding here the minimum amount of redundancy possible. 4.3.4 Differences between Dσ and Rσ Comparison of the figures found in columns 2 and 3 shows that Dσ and Rσ yield slightly different rankings. We can understand the origin of this difference by imagining two permutations σ1 and σ2 which both reduce by, say, 30% the number of optima of half the possible fitness functions but leave the number of optima unchanged in the other half of the cases. If σ1 achieves a 30% reduction on functions with high numbers of optima while σ2 is more effective on functions with low numbers of optima, we will have Rσ1 = Rσ2 = 0.3/2 = 0.15, but Dσ1 will be greater than Dσ2 because in absolute terms σ1 will eliminate many more optima than σ2 . The fact that rankings according to Rσ and Dσ do not differ significantly indicates that permutations are effective on average over the same type of fitness functions. This conclusion is confirmed by the next observation. 4.3.5 The proportions of adverse cases The values of BadCases are very similar for the best and the worst permutations. This indicates that good permutations do better, not by being effective on a bigger proportion of the fitness functions on which they are tested, but rather, by achieving more significant reductions on those functions which are rich in optima. Interestingly, we find at the bottom of the list a few permutations for which the value of BadCases is 0. Those forms of redundancy have the property that whatever the fitness function to which they are applied, they never increase its number of optima. One such function is the identity permutation which is not surprising since it never changes anything either for better or for worse. Permutations such as [40516273] and [01452367] are more interesting because they have a small overall positive effect while never making things worse. The reasons behind this property can be understood by examination of Figure 4.5. In the case of the identity permutation, C1 is an exact copy of C0 . In the case Chapter 4. A statistical analysis of redundancy patterns 6 6 5 4 C0 7 3 5 4 0 3 2 0 2 7 1 C1 1 (a) 4 6 C0 5 1 7 6 2 3 2 0 0 7 5 4 65 1 C1 3 (b) Figure 4.5: Permutations which never increase the number of optima. 
In the case of [40516273] and [01452367], C1 can be obtained from C0 by a rotation and a symmetry respectively. All three transformations have in common the fact that they preserve in C1 the relationships between symbols that existed in C0. Therefore, no maximum can be created in C1 that did not already exist in C0. In the cases of [40516273] and [01452367], however, some maxima might disappear because the interconnection of C0 and C1 creates some new paths between symbols. In the case of the identity permutation, this does not happen since the interconnection of C1 and C0 only connects sequences which code for the same symbol.
4.3.6 Trends in the other variables
Examination of the values of the other variables in Table 4.1 leads to the following observations. The values of Inv for the best permutations are all 1, 2 or 3. For the worst permutations, this value is in most cases 0, with the exception of an 8 (the identity permutation) and a 4. Although this does not indicate a linear relation between Rσ and Inv, the values found at the top are distinct from those found at the bottom.
Conn0 points to a simple trend with low values at the top (although not the lowest ones, since 0 is a possible value) and high values at the bottom. Conn3 shows a similar trend, but the values for the best and worst permutations are not as differentiated. The value 6, for instance, is found for one of the worst and one of the best permutations. Conn1 and Conn2 show roughly the reverse trend: high values for the good permutations and low values for the bad ones. Here again some values appear in both Table 4.1(a) and Table 4.1(b). The values of SumDist in the top table are all at least 12 while they are all at most 12 in the bottom table; the value 12 itself is obtained for several of the best and the worst permutations. The values of Orb are between 2 and 5 in the top table and between 1 and 8 in the bottom one. Values 4 and 5 only appear in the top table. Values 2 and 3 are possible for both the best and the worst permutations. These observations indicate some trends, but only the examination of more samples can give us a clearer picture. This will be done in Section 4.5.
4.4 The incidence of Inv on Rσ
The variable Inv is the one whose relation to Rσ we are most interested in investigating. There are two reasons for that. The first is that our account, in Section 2.2.4, of how redundancy could help the genetic code implied that the amount of neutrality could be an important factor. The second is that it is simple to generate a permutation with a given number of invariants. Hence, if some number of invariants leads to high values of Rσ, we would have a convenient method for generating beneficial forms of redundancy.
4.4.1 Codes defined on sequences of length 3
In Figure 4.6, the value of Rσ has been plotted against Inv for every possible permutation. The points appear on 8 equidistant vertical lines corresponding to the possible values of Inv. No point exists for Inv = 7, and a single one exists for Inv = 8, corresponding to the identity permutation, whose value is 0.
Figure 4.6: Rσ as a function of Inv when n equals 3.
The density of points for values of Inv equal to 0, 1 and 2 is such that the superposition of points gives the impression of continuous lines. The graph shows that more permutations exist with low numbers of invariants than with high ones. The highest point on the graph is in the line corresponding to Inv = 3. It is, however, quite isolated from the other points in the same line and nearly matched by the best element for which Inv = 2. Many other points follow closely in the same line. This mirrors the data in the Inv column of Table 4.1. In any given line, points extend over a large range of Rσ values; the number of invariant elements of a permutation does not in itself constrain the value of Rσ very much. Despite this fact, we cannot fail to notice an overall trend in the graph. The first three lines of points display an upward trend. The line corresponding to Inv = 3 is similar to the one for Inv = 2, being only less dense and not extending quite as far down. As Inv increases beyond 3, the clouds of points are shifted downwards.
Another way of visualising this trend is to average Rσ over all the points on a vertical line. We call R<i> the average value of Rσ across all permutations with i invariant elements. Figure 4.7 shows the variation of R<i> with i. The trend described above is clearer in this figure. The value of R<i> increases with i up to a value of 3. For larger values of i the trend is reversed, and the decline is quite sharp for i greater than 5. The next section examines how this pattern changes for greater values of n.
Figure 4.7: R<Inv> as a function of Inv when n equals 3.
4.4.2 Codes defined on sequences longer than 3
When n is equal to 4, both the number of possible permutations and the number of possible fitness functions are of the order of 21 × 10^12. It is therefore clear that for values of n greater than 3, the test of all permutations on all fitness functions is out of the question. Instead we have to rely on sampling both these sets. For that purpose, we created an algorithm which, given a value of n and a value of i smaller than 2^n, generates random permutations with exactly i invariant elements (sketched below). For each value of i between 0 and 2^n (except 2^n − 1, for which no permutation exists), we generated 5000 such permutations and calculated Rσ for each of them by averaging (Nf − Nf^σ/2)/Nf over 100 randomly chosen fitness rankings f (taken among the 2^n! possible ones). This experiment was performed for n equal to 4, 5, 6 and 7.
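A sketch of that generator (the name is illustrative; it fixes i randomly chosen elements and then rejection-samples a derangement of the remaining ones so that no extra invariant slips in):

import random

def permutation_with_invariants(n_bits, i):
    """A random permutation of 2**n_bits elements with exactly i fixed points."""
    size = 2 ** n_bits
    assert i != size - 1, "no permutation has exactly size - 1 fixed points"
    fixed = set(random.sample(range(size), i))
    movers = [x for x in range(size) if x not in fixed]
    while True:                       # rejection-sample a derangement of the movers
        image = movers[:]
        random.shuffle(image)
        if all(x != y for x, y in zip(movers, image)):
            break
    sigma = list(range(size))
    for x, y in zip(movers, image):
        sigma[x] = y
    return sigma

The rejection loop succeeds with probability tending to 1/e, so on average fewer than three shuffles are needed per permutation.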
Figure 4.8 shows four plots corresponding to the four values of n. In each of them, the value of Rσ is plotted against Inv for the 5000 × 2^n randomly generated permutations. In each of these graphs, we recognise the pattern that was found for n equal to 3. Moreover, as n increases, the pattern seems to come more into focus; the points corresponding to a single value of i are more clustered, suggesting an improved correlation between Inv and Rσ. In the four cases, the curves increase on the left half of their x range and decrease towards 0 in the other half. All permutations have positive Rσ values, confirming that any form of redundancy is a positive factor in reducing the number of local optima. For all values of n, a wide band of points stands out, alongside which we find a few scattered points. The best permutations are usually found among these atypical points. The very best permutations are found in the middle of the x axis, in the same region where the band reaches its maximum. The best permutations have Rσ values which decrease with n: from 0.3 when n equals 4, they drop to 0.24 when n equals 7. This could be a genuine trend. However, because the proportion of sampled points decreases exponentially with n, the very best permutations become increasingly unlikely to be included in our sample. It is therefore conceivable that permutations with values of Rσ over 0.3 exist for all values of n.
As in the previous case, we also plotted R<i>, the average value of Rσ for all permutations with i invariant elements, against i. The four plots are shown in Figure 4.9. They reproduce, as expected, a pattern similar to the one described by the dense band of points in Figure 4.8. They also confirm the decrease of the values of Rσ with n.
Figure 4.8: Rσ as a function of Inv when n is greater than 3.
Figure 4.9: R<Inv> as a function of Inv when n is greater than 3.
4.5 The incidence of other variables on Rσ
This section examines the relation between Rσ and the other variables that have been defined. In Figure 4.10, Rσ is plotted against all these variables, and Table 4.2 summarises the correlation coefficients between every pair of variables.
Figure 4.10: Rσ as a function of other variables when n equals 3.
Table 4.2: Summary of the correlations between all variables
          Rσ      Inv     Conn0   Conn1   Conn2   Conn3   SumDist  Orb
Rσ        1.00    0.33   -0.66    0.16    0.27   -0.12    0.32     0.25
Inv       0.33    1.00    0.19   -0.64    0.64    0.03   -0.66     0.88
Conn0    -0.66    0.19    1.00   -0.75   -0.09    0.55   -0.53     0.21
Conn1     0.16   -0.64   -0.75    1.00   -0.59   -0.30    0.64    -0.55
Conn2     0.27    0.64   -0.09   -0.59    1.00   -0.52   -0.51     0.41
Conn3    -0.12    0.03    0.55   -0.30   -0.52    1.00    0.11     0.23
SumDist   0.32   -0.66   -0.53    0.64   -0.51    0.11    1.00    -0.51
Orb       0.25    0.88    0.21   -0.55    0.41    0.23   -0.51     1.00
4.5.1 The relation with Conn0
Table 4.2 shows that Conn0 is the variable with the highest correlation with Rσ. The coefficient is negative, indicating that small values of Conn0 lead to high values of Rσ. This is consistent with what we observed for the best and worst permutations in Table 4.1. Variable Conn0 is the number of pairs of symbols such that the substitution of one by the other is never possible without going first through a third symbol. It is also the number of pairs of symbols which are not connected by a connection of type 1, 2 or 3.
We can describe the correlation by saying that the more pairs of symbols are connected in some way (by a connection of type 1, 2 or 3), the more likely local optima are to disappear through the introduction of redundancy. The relationship between the two variables is not, however, as simple as we might expect. One permutation exists whose value of Conn0 is equal to 0, and five have a value of Conn0 equal to 1. However, these are not among the permutations with the largest values of Rσ. Some permutations with values of Conn0 as large as 8 have greater values of Rσ, as can be seen in Figure 4.10. The value of Conn0 which leads to the largest value of Rσ is 3. The relationship between Rσ and Conn0 is therefore not linear.
In Figure 4.11, Rσ has been plotted against Inv for permutations satisfying Conn0 = 3 on one plot and Conn0 = 4 on the other. As this figure shows, there is in both cases a high residual correlation between Rσ and Inv. The correlation coefficients are 0.71 and 0.75 respectively. We conclude that the impact of Inv on Rσ is not mediated or explained away by the variable Conn0.
Figure 4.11: Rσ as a function of Inv when Conn0 is fixed. (a) Conn0 = 3. (b) Conn0 = 4.
4.5.2 The relation with Conn1, Conn2 and Conn3
None of these variables taken individually has a high correlation with Rσ (Table 4.2). However, because of the equality Conn1 + Conn2 + Conn3 = 28 − Conn0, we know that Conn1 + Conn2 + Conn3 has the same correlation coefficient with Rσ as Conn0, but with the opposite sign. It is then tempting to alter the weighting of this sum to see if the correlation can be improved. A linear regression will find the weights a2 and a3 such that the correlation between Conn1 + a2 Conn2 + a3 Conn3 and Rσ is maximised. The result is a new variable BestConn = Conn1 + 1.38 Conn2 + 1.84 Conn3 whose correlation coefficient with Rσ is 0.92. A scatter plot of Rσ against BestConn is shown in Figure 4.12. The fit is much improved compared to any of the variables we have seen so far.
This weighting suggests that Conn1, Conn2 and Conn3 all contribute to increasing the value of Rσ, but not in equal amounts. It could be said that a connection of type 3 is 1.84 times as effective as a connection of type 1, while a connection of type 2 is somewhere in the middle. The relative magnitude of the weights is consistent with the definition of the classes: a connection of type 3 guarantees an unrestricted possibility of transition between two symbols, while a connection of type 1 only makes substitution possible in half of the cases. This variable shows that our division into 4 classes is sound and that connectivity between symbols, as defined by these classes, is instrumental in the definition of good redundancy.
The variable BestConn could also help discover permutations with high values of Rσ. It cannot do so directly because it would be very difficult to construct a permutation with a large value of BestConn. However, since BestConn is much more economical to calculate than Rσ, we can use it to assess the quality of randomly generated permutations.
Figure 4.12: Rσ as a function of the best linear combination of Conn1, Conn2 and Conn3.
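Since the Pearson correlation is invariant under positive rescaling and shifts, the correlation-maximising weights can be read off an ordinary least-squares fit; a sketch with numpy, where conn1, conn2, conn3 and r_sigma are assumed to be arrays indexed by permutation:

import numpy as np

def best_conn_weights(conn1, conn2, conn3, r_sigma):
    """Fit r_sigma ~ b0 + b1*Conn1 + b2*Conn2 + b3*Conn3, then rescale so
    that Conn1 receives weight 1 (this assumes b1 > 0, as is the case here)."""
    X = np.column_stack([np.ones(len(conn1)), conn1, conn2, conn3])
    (b0, b1, b2, b3), *_ = np.linalg.lstsq(X, r_sigma, rcond=None)
    return b2 / b1, b3 / b1   # the text reports roughly 1.38 and 1.84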
4.5.3 The relation with SumDist
The correlation between SumDist and Rσ is quite small (0.32) and no interesting pattern emerges from the graph shown in Figure 4.10. However, when the correlation between SumDist and Rσ is calculated within subsets defined by a constant value of Inv, the correlation coefficients increase enormously, as shown in the following table.
Inv                 0     1     2     3     4     5     6
Corr(Rσ, SumDist)  0.82  0.89  0.80  0.81  0.57  0.98  0.72
This observation is interesting because it suggests a simple procedure for finding good permutations. Given that the best permutations have about half of their elements invariant, we could further constrain them to have the largest possible value of SumDist. All we have to do to build such permutations is assign to those elements which are not invariant an image that is as different from them as possible in binary terms.
To assess the validity of this procedure, we tested whether this correlation was also found for greater values of n. We examined the cases where n equals 4 and 5. The optimal numbers of invariant elements are 7 and 15 respectively for these cases. In one case we generated permutations of 16 elements, of which 7 were invariant, and in the other, permutations of 32 elements, 15 of which were invariant. Figure 4.13 shows that, in both cases, the value of SumDist does not help identify the best permutations among those with the optimal number of invariant elements. The proposed procedure is therefore not applicable.
Figure 4.13: Rσ as a function of SumDist when Inv is fixed. (a) n = 4 and Inv = 7; the correlation coefficient is 0.54. (b) n = 5 and Inv = 15; the correlation coefficient is 0.37.
4.6 Parallels with a quaternary alphabet
In this section, we briefly come back to the difference between codes using binary and quaternary alphabets. We argued at the beginning of Chapter 3 that conclusions reached with a binary alphabet could be extrapolated to a quaternary one, at least in qualitative terms. We can now reconsider this statement in the context of what we know from this chapter and the previous one.
To make the discussion more concrete and directly relevant to the study of the genetic code, let us compare the case where 16 distinct symbols are encoded with a binary alphabet with the same 16 symbols encoded using a quaternary one. With a quaternary alphabet, only two letters are needed to describe each symbol. If a binary alphabet is used, 4 bits are needed instead. A graphical representation of the neighbouring relationships in the case of a quaternary alphabet is shown in Figure 4.14. Sequences of two letters, taken from the alphabet {A,B,C,D}, are connected by a line whenever they differ at only one position. This can be compared with Figure 3.2, which represents the same relationship in the binary case. The following features distinguish these two cases. In the quaternary case, sequences are a maximum of two mutations away from each other, while they can be as far as four mutations away in the binary case. Every sequence has 6 neighbours in the quaternary case and 4 in the binary case. This means that the numbers of sequences that are not accessible by a single mutation are similar: 10 and 12 respectively. These numbers are shown in Table 4.3.
Given this fact, it looks as if redundancy has as much potential to increase the number of possible transitions in the quaternary case as it has in the binary case. Furthermore, as Table 4.3 shows, the number of pairs of sequences which are connected differs between the two cases. However, adding two bits or one quaternary letter of redundancy multiplies this number by four in both cases. Thus even though the numbers are different, the ratio is the same. These facts taken together indicate that although the underlying graphs are different, the potential for redundancy to add new paths between previously unconnected symbols is very similar in both cases.
Figure 4.14: A representation of sequence distances in the case of a quaternary alphabet.
Table 4.3: A summary of the differences between a binary alphabet and a quaternary one in the case where 16 symbols are encoded. Redundancy is assumed to result from the addition of two binary digits or one quaternary one.
                                                       Binary alphabet   Quaternary alphabet
Length of strings                                             4                  2
Maximum distance between strings                              4                  2
Number of neighbours of a sequence                            4                  6
Number of sequences more than one mutation away              12                 10
Number of pairs of sequences which are neighbours            32                 48
Number of pairs of sequences which are neighbours
after redundancy is added                                   192                288
4.7 Conclusion
This chapter has shown that the effect of permutations ranges from none at all to a 30% reduction in the number of optima. None of the forms of redundancy investigated here was found to have an overall negative effect. The number of neutral mutations induced by a permutation, which is its number of invariant elements, provides some indication of the value of Rσ. For the values of n tried here, the best patterns of redundancy displayed around 2^(n−1) neutral mutations. A better prediction of Rσ can be made by a careful analysis of the possibilities of transition between the pairs of symbols represented by the code.
The variable Rσ was defined as an indicator of a potentially beneficial interaction between the pattern of redundancy defined by σ and evolution by mutation and selection. We need, however, some confirmation that this variable is indeed fulfilling that purpose. In the next two chapters, some of the patterns of redundancy studied here will be included in simulations of evolution in the form of a GA. This will provide the ultimate measure by which we want to decide whether a pattern of redundancy is beneficial or not. It will also be the occasion to evaluate the usefulness of Rσ.
Chapter 5
Redundancy on trial in evolution
5.1 The Genetic Algorithm
We describe in this section the genetic algorithm that has been used in this chapter and the next one. We used a spatially distributed genetic algorithm where every individual in the population occupies a slot in a two-dimensional grid. The edges of the grid wrap around so that the grid is torus-shaped. Every slot in the grid contains exactly one individual. The grid was 20 cells wide and 20 cells high, giving a total population of 400 individuals. The GA performs the following sequence of steps (a sketch of this loop is given after the list):
• For each slot s of the grid in turn:
  – Pick t individuals from the neighbourhood of s. The probability of an individual being picked in any of these t choices is represented in Figure 5.1.
  – Identify the best two genotypes, P1 and P2, among those t individuals.
  – Perform a one-point crossover between P1 and P2 to produce a new genotype C.
  – Mutate each bit of C with probability pmut.
  – Replace the worst of the t genotypes by C.
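A minimal sketch of one such cycle (names are illustrative; the population is assumed to be a dict mapping (x, y) slots to bit-list genotypes, fitness a callable on genotypes, and the tournament size t, which the text leaves unspecified, is passed as a parameter):

import random

WIDTH = HEIGHT = 20   # toroidal 20 x 20 grid: a population of 400 individuals

def pick_slot(x, y, sd=3):
    """Pick a slot near (x, y) with a two-dimensional Gaussian (sd = 3 slots),
    wrapping around the edges of the torus."""
    return ((x + round(random.gauss(0, sd))) % WIDTH,
            (y + round(random.gauss(0, sd))) % HEIGHT)

def steady_state_step(pop, fitness, x, y, t, p_mut):
    """One replacement event of the steady-state GA at slot (x, y)."""
    slots = [pick_slot(x, y) for _ in range(t)]
    slots.sort(key=lambda s: fitness(pop[s]), reverse=True)
    p1, p2 = pop[slots[0]], pop[slots[1]]        # the best two of the t picks
    cut = random.randrange(1, len(p1))           # one-point crossover
    child = [b ^ (random.random() < p_mut) for b in p1[:cut] + p2[cut:]]
    pop[slots[-1]] = child                       # replace the worst of the t picks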
This defines a steady-state GA where one individual is replaced at a time. Unlike GAs where the entire population is replaced at the same time, there is no natural point in time marking the beginning of a new generation. We can nonetheless refer to 400 of the cycles above as a generation, since this is the time it takes to replace a number of individuals equal to the population size. This GA was used since there is mounting evidence that it gives good, reliable results over a range of problems (McIlhagga et al., 1996; Collins and Jefferson, 1991). In fact, many of the simulations presented here were also tried with a random-mating genetic algorithm with no appreciable difference in the results.
In some of the experiments described in this chapter, recombination is turned off and we replace the worst of the t individuals by a mutated version of the best one. Notice that because we always remove the worst individual among the t, there is no way in which the best individual in the population can be lost. Hence, the fitness of the best individual in the population as a function of time has to be an increasing or flat function; it cannot decrease. As will be explained later, various mutation rates were compared in all our experiments.
Figure 5.1: The probability of a slot being picked in a selection tournament. At the center, in grey, is the slot s handled by the GA. Numbers have to be divided by 10000. In less than 1% of the cases an individual outside this part of the grid will be chosen. This probability distribution is obtained from a two-dimensional Gaussian probability distribution with a standard deviation of 3 slots.
5.2 First test problem: a case of no epistasis
5.2.1 The problem
Consider a fitness function f defined on sequences of size rn in the following way:
f : {0,1}^(rn) → [0,1]
f(x11, ..., x1n, x21, ..., x2n, ..., xr1, ..., xrn) = (1/r) (f1(x11, ..., x1n) + f2(x21, ..., x2n) + ... + fr(xr1, ..., xrn))
with fi : {0,1}^n → [0,1], 1 ≤ i ≤ r.
The r functions fi operate independently, assigning a fitness to each block of n bits; the value of f is the average of these r values. A function fi is defined by explicitly assigning a number between 0 and 1 to each of the 2^n values that its input can take. The definition of f therefore requires that we generate in total r·2^n values between 0 and 1.
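A minimal sketch of this construction (names are illustrative; here the per-block tables are simply filled with uniform random values, and the particular values used in the experiments are described next):

import random

def make_tables(n, r):
    """One look-up table per block: a value in [0, 1] for each of the 2**n inputs."""
    return [[random.random() for _ in range(2 ** n)] for _ in range(r)]

def f_additive(bits, tables, n):
    """Average the per-block look-ups; bits is a flat list of r*n 0/1 values."""
    total = 0.0
    for i, table in enumerate(tables):
        block = bits[i * n:(i + 1) * n]
        index = int("".join(map(str, block)), 2)   # read the block as a binary number
        total += table[index]
    return total / len(tables)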
In the experiments that will be described here, we have set n to 3 and r to 100: one hundred look-up tables of 8 entries each are thus required to define f. When defining the look-up table for an fi function, we make sure that the worst value is 0, the second worst value is 1/7, ..., the second best value is 6/7, and the best value is 1. Hence, the eight values {0, 1/7, 2/7, 3/7, 4/7, 5/7, 6/7, 1} each appear once and only once in the table, but in any order.
The interaction between the blocks is purely additive. Hence, each fi can be optimised independently of the others and the global optimum of f is the concatenation of the optima of f1, f2, ..., fr. This is therefore a case of total absence of epistasis between the r blocks. The function will nonetheless have many optima; in the case where n = 3, each fi has on average 2 maxima and f therefore has an average of 2^r local maxima. In the field of theoretical population genetics, such additive models (or multiplicative ones, which have the same property) are commonly used (Peck et al., 1997; Nagylaki, 1994; Turelli and Barton, 1994; Goodnight, 1995). There are two reasons for that. One is that they are accurate models of the biological reality of the interaction of genes that contribute to a common trait (Ehdaie and Waines, 1994; Larsen, 1994). The other is that they are more easily analysed than any other model.
On the other hand, for the optimisation of a function to require the use of a genetic algorithm, there has to be some epistasis in the sense described above. This could therefore raise doubts about the relevance of a function with no epistasis as a test problem. Fitness functions associated with real-world problems will typically have unknown properties. Testing novel features of a GA on such functions is therefore risky, since one does not understand the underlying topology of the fitness landscape. Ultimately, of course, a GA has to prove itself on such functions, but experimenting with a GA is probably best done, in the first instance, in a carefully controlled environment.
In the case of the function just defined, the selection pressure at any one locus will be the same at all times. Once the optimum value has been found at a locus, selection does not have to do any more work, except for the occasional mutation which might displace it. In the presence of epistasis this is no longer the case. A locus that has reached an optimum in a given context might lose its optimality simply from changes at other loci. This means that selection might need to optimise the same loci many times over. Hence, if redundancy helps selection in this process, we could expect it to be more beneficial in an epistatic case, everything else being equal, since selection has more work to perform.
Functions of the type of f can be considered a special case of a wider class of functions known as NK fitness landscapes. These functions have been proposed as a tool to investigate the dynamics of evolution by mutation and selection (Kauffman and Levin, 1987; Kauffman, 1993). In an NK fitness landscape, every bit contributes additively to the fitness of the genotype as a whole. The contribution of a bit, however, does not depend only on its value but also on the values of K other bits. These bits can be anywhere on the chromosome and their location has to be specified for each individual bit.
The relationship does not have to be reciprocal: if the value of bit i is needed to calculate the contribution of bit j, it does not have to be the case that the value of bit j is needed to calculate the contribution of bit i. The parameter N is the total number of bits in the genotype. Functions of the type of f are therefore instances of NK fitness landscapes with N equal to nr and K equal to n − 1. They have an additional constraint over general NK fitness landscapes: the epistatic interactions are confined within each of the r blocks, and within these blocks each of the n bits is in epistatic interaction with every other one. It could be objected that our functions f are a sum of contributions from groups of n bits, whereas in NK fitness landscapes contributions come from individual bits. This is only an apparent difference. We can split the contribution of a group of n bits into n equal contributions which are then assigned to individual bits. In the case where n = 3, this transformation is shown in Figure 5.2.
Figure 5.2: A redefinition of function f in NK fitness landscape terms. The table fi, which assigns the value ak to the triplet with binary value k (a0 to 000, a1 to 001, ..., a7 to 111), is split into three identical tables fi1, fi2 and fi3, each assigning ak/3 to the same triplet.
5.2.2 Introducing redundancy
Chapter 3 showed how a permutation σ of 2^n elements can be used to characterise the addition of one bit of redundancy to a function such as fi. Denoting fi^σ the resulting function, we have:
fi^σ : {0,1}^(n+1) → [0,1]
such that
fi^σ(0, x1, x2, ..., xn) = fi(x1, x2, ..., xn)
fi^σ(1, x1, x2, ..., xn) = fi(σ(x1, x2, ..., xn))
Transforming the r functions fi through the same permutation σ, we can transform function f into
f^σ : {0,1}^(r(n+1)) → [0,1]
such that
f^σ(x10, x11, ..., x1n, x20, x21, ..., x2n, ..., xr0, xr1, ..., xrn) = (1/r) (f1^σ(x10, x11, ..., x1n) + f2^σ(x20, x21, ..., x2n) + ... + fr^σ(xr0, xr1, ..., xrn))
The bits xi0 are redundancy bits which did not exist under f. Transforming f into f^σ therefore amounts to inserting a redundancy bit to the left of each of the r blocks and using the value of that new bit to decide whether, for block i, fi^σ(xi0, xi1, ..., xin) is equal to fi(xi1, ..., xin) or to fi(σ(xi1, ..., xin)). The transformation of f into f^σ is therefore the result of applying the same permutation-induced redundancy to each of the r blocks which jointly define a genotype for f.
Note that here again f^Id, the function obtained by applying the permutation Id, is almost identical to f. It is defined on sequences which are r bits longer, but the values of those bits are irrelevant to its value. These bits are therefore junk genetic material and, provided that the per-bit mutation rate is kept the same, f^Id will take exactly the same time to be optimised as f. Since n is set to 3 and r to 100, the length of the chromosome on which f^σ is defined will be 400. To calculate the value of f^σ, we parse the genotype into 100 blocks of 4 bits and, for each block, the value of the first bit (xi0) decides whether the contribution of the following three bits (xi1, xi2, xi3) is fi(xi1, xi2, xi3) or fi(σ(xi1, xi2, xi3)). The values resulting from the 100 blocks are then added up to give the value of f^σ.
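Continuing the sketch of Section 5.2.1, the decoding of f^σ only changes the way each block is read (sigma is assumed to be a list of 2**n integers, and tables and n are as before):

def f_sigma(bits, tables, n, sigma):
    """Each block now has n + 1 bits; the leading redundancy bit selects between
    the direct look-up and the sigma-permuted look-up."""
    total = 0.0
    for i, table in enumerate(tables):
        block = bits[i * (n + 1):(i + 1) * (n + 1)]
        index = int("".join(map(str, block[1:])), 2)
        if block[0] == 1:          # redundancy bit set: read through sigma
            index = sigma[index]
        total += table[index]
    return total / len(tables)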
In the experiments described here, f^Id was always used to represent the situation where no redundancy is added.
5.2.3 Experimental procedure
On this problem we compared the effect of six different permutations chosen to cover the range of possible Rσ values. The following table defines these permutations and indicates their Rσ values.
σ            Rσ
[07143562]   0.3225
[50241367]   0.2633
[10234567]   0.1464
[40576123]   0.0852
[40675123]   0.0453
Id           0.0000
As usual, a permutation is defined by listing the values of [σ(0)σ(1)...σ(7)], except for the identity permutation, which is referred to as Id. The permutation at the top is the one with the highest Rσ value in P3. For each permutation σ in the table, we performed the following steps.
For each of 50 trials:
• Define f by generating 100 random fi functions of 8 entries each,
• Generate an initial random population,
• For 400 generations, run the GA on f^σ and calculate, after each generation, the number of blocks which are optimal in the best individual.
The average of this value over all 50 trials was then calculated. At the end of this procedure, we have the average number of blocks optimised in the best individual after g generations, for all values of g between 1 and 400. The reason for averaging these results over 50 trials is that the functions f are of variable difficulty for evolution, depending on the number of local optima in the underlying fi functions, which are generated randomly. If all 100 fi functions have a single optimum, f will be extremely easy to optimise; if, on the other hand, all 100 fi functions have 4 optima, f will be much more difficult to optimise. In each of the 50 trials, a new instance of f was generated, ensuring that our comparison between the performances of different permutations is not biased by some permutations getting easier functions to optimise than others. This averaging is also justified by the stochastic nature of the GA.
Figure 5.3 and Figure 5.4 display the number of blocks optimised in the best individual for each of the 6 permutations as a function of the genomic mutation rate. In Figure 5.4 the comparison is made after 400 generations, while in Figure 5.3 it is made after only 100 generations. Mutation rates were varied between 0.5 and 5 in steps of 0.5. The probability of a bit being mutated can be obtained by dividing this number by 400, the total number of bits in a chromosome.
These graphs make it possible to determine an optimal mutation rate for each of the permutations, which we believe is crucial for a meaningful comparison between the different permutations. Suppose that we compared permutations σ1 and σ2 for an arbitrary value of the mutation rate and concluded that σ1 was better than σ2 because it led to more blocks being optimised in a given number of generations. Unless we actually explore a range of mutation rates, we cannot be confident that this conclusion is not the result of a good match between the mutation rate used and permutation σ1. Since ultimately the mutation rate can be fine-tuned to maximise performance, the only meaningful way of comparing redundant codes is to use the best mutation rate possible for each permutation.
5.2.4 Results
Examination of Figure 5.4 indicates that, for all permutations, the mutation rate has a noticeable impact on the performance of the GA. The pattern that emerges, both with and without recombination, is an improved performance with increased levels of mutation until a maximum is reached.
Beyond that point, increasing the mutation rate degrades the performance of the GA.
The optimal mutation rate depends on the permutation used and on whether recombination is used or not. For any given permutation, the optimal amount of mutation is higher when recombination is off than when it is on. This is understandable since, in the absence of recombination, mutation becomes the only source of novelty in the system. A higher level of mutation is therefore needed to compensate for the absence of the other source of variation.
Figure 5.3: First problem: the proportion of optimal blocks in the best individual after 100 generations as a function of the mutation rate, (a) with recombination off and (b) with recombination on. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.4: First problem: the proportion of optimal blocks in the best individual after 400 generations as a function of the mutation rate, (a) with recombination off and (b) with recombination on. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.5: First problem: the proportion of optimal blocks in the best individual as a function of the number of generations, (a) with recombination off and (b) with recombination on. For each permutation, the optimal mutation rate has been used and is indicated in the legend. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.6: First problem: comparing the speed of evolution with permutations [07143562] and Id, (a) with recombination off and (b) with recombination on.
For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
Table 5.1: The relationship between the number of blocks optimised after 400 generations and the value of Rσ.
(a) Recombination OFF                     (b) Recombination ON
σ            Rσ       Opt                 σ            Rσ       Opt
[07143562]   0.3225   0.8145              [07143562]   0.3225   0.8405
[50241367]   0.2633   0.7923              [50241367]   0.2633   0.8208
[10234567]   0.1464   0.7587              [10234567]   0.1464   0.7842
[40576123]   0.0852   0.7477              [40576123]   0.0852   0.7753
[40675123]   0.0453   0.7239              [40675123]   0.0453   0.7685
Id           0.0000   0.7327              Id           0.0000   0.7639
Table 5.1 shows, for all 6 permutations, the proportion of blocks which are optimal after 400 generations when the best mutation rate is used. If we compare the Rσ value of a permutation with the performance it achieves, we see that a higher value of Rσ always leads to a greater number of optimised blocks. The only exception is Id, which optimises more blocks than [40675123] when recombination is turned off, but these two permutations are very similar in their Rσ values.
When Rσ was defined in Chapter 3, it was intended as a fast way of estimating the potential of a pattern of redundancy to facilitate adaptation. However, in order to keep the calculation of Rσ tractable, we had to make the following assumptions about the impact of redundancy on evolution:
• that the neutral paths created by redundancy would be effectively used as a way of escaping what would otherwise be local optima,
• that the action of recombination could be ignored in a first approximation.
Although these assumptions sound intuitively acceptable, an experimental validation of the adequacy of Rσ was needed. The results just described indicate that Rσ fulfils its purpose in a most satisfactory way.
Furthermore, Chapter 3 showed that patterns of redundancy lead in some cases to a large reduction in the number of optima. However, even assuming a correlation between such a reduction and some beneficial effect for adaptation, it could have been that this effect was too small to be interesting. The values found in Table 5.1 show that this is not the case; permutations with large values of Rσ have an impact large enough to be of practical relevance.
We can put this impact into perspective by comparing it with that of recombination, the GA operator par excellence. Comparison of Tables 5.1(a) and (b) shows that, for any permutation, recombination substantially increases the number of blocks optimised. In the absence of redundancy (σ = Id) and recombination, 73.27% of the blocks are optimised after 400 generations. Adding recombination but no redundancy increases this amount to 76.39%. Adding redundancy but no recombination (σ = [07143562]) increases it to 81.45%. Hence, the improvement brought about by redundancy is more than twice as large as the one brought about by recombination. This comparison is made in terms favourable to recombination, given that epistasis between blocks is null. Notice also that the improvement brought about by redundancy and recombination together is roughly equal to the sum of the improvements brought about by each of them individually.
It thus looks as if the two operate completely independently of each other.
In Figure 5.5, we have plotted, for the six permutations, the number of blocks optimised by the best individual in the population (averaged over 50 trials) against the number of generations. For every permutation, the optimal mutation rate has been used and is indicated in the legend. All the curves are very stable and almost flat. Comparison of Figure 5.3 and Figure 5.4 shows that the optimal mutation rate depends on the number of generations we want the GA to run for. The general trend is that the mutation rate giving the best result after 100 generations is lower than the one that optimises performance after 400 generations. Because we start from a truly random population, the genetic variation that exists in the population is very large at the beginning but decreases with time to reach an equilibrium value that depends on both the mutation rate and whether or not recombination is used. Hence, during the first 100 generations selection operates in an environment exceptionally rich in genetic diversity. Mutation is therefore not needed as much at that stage as it is in later stages of the evolutionary process.
We have seen that permutations with high values of Rσ lead to a higher number of blocks being optimised in a given amount of time. But this number is not a satisfactory currency for comparison, since many more blocks are optimised per unit of time at the beginning of a GA run than at the end. Instead of comparing the number of blocks optimised after a given number of generations, we can take the reverse approach of comparing the time taken by two permutations to optimise a given number of blocks. This is more revealing because the number of generations is proportional to the number of function evaluations, which is the yardstick normally used to compare optimisation methods.
In Figure 5.6, permutation [07143562] and the identity permutation are plotted on the same graph over the entire length of the experiment. We compare these two permutations because one should yield the best results possible with redundancy and the other shows what would happen in the absence of redundancy. The level of optimisation achieved by the identity permutation after 400 generations was chosen as the basis for comparison. Figure 5.6 shows that with permutation [07143562] roughly 100 generations are needed to achieve that same level. This is true whether recombination is on or off, confirming that the impact of redundancy is not affected by it. Redundancy at its best thus achieves a remarkable fourfold increase in the speed of optimisation on this problem. As a comparison, recombination achieves only slightly less than a twofold increase in speed using the same criteria.
5.3 Second test problem: selection for a periodical chromosome
The previous problem has shown that the impact of redundancy is very significant on a special class of NK fitness landscapes with no epistasis between triplets of bits. This section examines the impact of redundancy on a completely different problem.
5.3.1 The problem
In this problem, the chromosome is read by triplets of bits and each of the eight possible values is translated into a different symbol from the set {A, B, C, D, E, F, G, H}. Fitness depends on the number of positions that separate successive occurrences of the same symbol.
Figure 5.7: The value of f(i, j) as a function of j.
If i and j are the positions of two successive occurrences of the same symbol, their contribution to fitness is
f(i, j) = | |i − j|[8] − 4 |
where |i − j|[8] is the remainder of |i − j| in the division by 8. The action of this function is easily understood by examination of Figure 5.7, where f(i, j) has been represented for values of j varying around an arbitrary value of i. The function is maximal and equal to 4 whenever consecutive occurrences of the same symbol on the chromosome are separated by a number of positions that is a multiple of eight. The case i = j never comes into consideration since we are only interested in different occurrences of the same symbol. The function is minimal and equal to 0 whenever the two symbols are separated by a number of positions which is half-way between multiples of 8, such as 4, 12, 20, ... In between these periodic maxima and minima, the function varies linearly with the difference between i and j.
This function describes the contribution to fitness of two consecutive occurrences of the same symbol on the chromosome. The total fitness of the chromosome is the sum of all such contributions. A possible algorithm to calculate that fitness is therefore:
fitness := 0
for each symbol X in {A, B, C, D, E, F, G, H}:
    for each pair of consecutive appearances of X on the chromosome, at positions i and j:
        add f(i, j) to fitness
Supposing X appears at positions i and j on the chromosome, f(i, j) will only be added to the fitness of the chromosome if X appears nowhere between i and j. If symbol X appears NX times on the chromosome, there are therefore NX − 1 contributions to fitness from that symbol.
The identity of optimal sequences depends marginally on the size of the chromosome. Consider for instance a chromosome coding for q symbols (i.e. of 3q bits) that contains only the symbol A. Since every occurrence of A next to itself will contribute 3 to fitness (f(i + 1, i) = 3), the fitness of the chromosome will be 3(q − 1). Consider now a chromosome where the eight possible symbols appear at the first eight positions and are subsequently repeated in the same order along the chromosome. ACDEFGHBACDEFGHBACDEFGHB... would be an instance of such a chromosome. If the chromosome is q symbols in length, its fitness will be 4(q − 8), since every symbol except the first 8 will contribute 4 to fitness. Such a chromosome will be better than the one made of a single symbol if
4(q − 8) ≥ 3(q − 1) ⇔ q ≥ 29
Hence, for a chromosome of length greater than 29, the optimal chromosome will be of the second type. In all the results described hereafter, the value of q is set to 100 and chromosomes are therefore 300 bits in length.
Because the contribution to fitness of a symbol depends on other occurrences of the same symbol on the chromosome, the optimal value for the three bits that define a symbol depends entirely on the information content of other parts of the genome. In other words, if we are given no information about the rest of the chromosome, all eight symbols are equally likely to be optimal at a given locus. Contrast this with the situation in the problem of Section 5.2. There, one of the triplets was optimal at a locus regardless of the alleles found at other loci. The identity of the optimal triplet changed from one trial to the next, but it remained constant for the duration of a trial.
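A sketch of this fitness computation on the decoded sequence of symbols (illustrative names; the chromosome is assumed to have already been translated into its list of symbols):

def pair_contribution(i, j):
    """f(i, j) = | (|i - j| mod 8) - 4 |: equal to 4 when the spacing is a
    multiple of 8, and to 0 when it is 4, 12, 20, ... positions."""
    return abs(abs(i - j) % 8 - 4)

def periodic_fitness(symbols):
    """Sum pair_contribution over consecutive occurrences of each symbol."""
    last_seen = {}
    total = 0
    for j, s in enumerate(symbols):
        if s in last_seen:                      # s last occurred at last_seen[s]
            total += pair_contribution(last_seen[s], j)
        last_seen[s] = j
    return total

On a 100-symbol chromosome made only of A this returns 297 = 3(q − 1), and on the periodic chromosome above it returns 368 = 4(q − 8), as computed in the text.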
In the present problem, the optimal value for a symbol at a specified position on the chromosome changes as the rest of the chromosome changes. If we examine in more detail the nature of these epistatic interactions, we find that, given a symbol X at position i on the chromosome, its contribution to fitness will be determined by the leftmost occurrence of X to the right of i, and by the rightmost occurrence of X to the left of i. Hence, we cannot point at pairs of loci which interact together all the time. The nature of the epistatic interaction itself depends on the assignment of other loci. One property always holds: there will be a maximum of 16 other loci with which a locus has epistatic interactions. These are the nearest occurrences of the eight possible symbols to the left of the locus and the eight to the right of the locus.
This problem is meant as a crude analogy to amino acids evolving to optimise the shape and function of a protein. The basis for this analogy is the following. Consider three consecutive base positions i, i + 1, i + 2 on a chromosome which together code for an amino acid A in a protein P. What determines the optimal bases for positions i, i + 1 and i + 2 is the context defined by the other amino acids that are part of P. The value of i in itself is irrelevant; if the coding sequence is shifted on the chromosome, the optimality of A at the new position remains the same. In the problem defined here too, the optimal values of positions i, i + 1, i + 2 on the chromosome are completely dependent on information held at other points on the chromosome and not at all on the value of i. As we pointed out, this is exactly the reverse of the previously defined problem, where the optimal value of positions i, i + 1, i + 2 depended exclusively on the value of i. These two problems are therefore at the two extremes of a spectrum. Other aspects of the interaction between amino acids in proteins are extremely complicated to model, and the function f(i, j) makes no pretence of being faithful to them.
5.3.2 Adding redundancy
Adding redundancy to this problem is straightforward. The mapping T between triplets of bits and symbols from the set {A, B, C, D, E, F, G, H} corresponds exactly to what we defined in Chapter 3 as a non-redundant code. Hence the procedure described in that chapter can be applied literally here. That is, from the arbitrary mapping between triplets of bits and symbols defined on the left side of Figure 5.8, the redundancy defined by a permutation σ will result in the mapping shown on the right side of that same figure.
a1a2a3   T(a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)
000      A             0000       A                 1000       T(σ(000))
001      B             0001       B                 1001       T(σ(001))
010      C             0010       C                 1010       T(σ(010))
011      D             0011       D                 1011       T(σ(011))
100      E             0100       E                 1100       T(σ(100))
101      F             0101       F                 1101       T(σ(101))
110      G             0110       G                 1110       T(σ(110))
111      H             0111       H                 1111       T(σ(111))
Figure 5.8: Second problem: transforming the encoding through permutation σ.
Without redundancy, the expression of 100 symbols would require 300 bits. Since each symbol is now encoded by 4 bits, we need 400 bits to encode the 100 symbols. On this problem we only compared the performance of the identity permutation with that of permutation [07143562], which is the permutation with the highest Rσ value. In the case of the identity permutation, the leftmost bit of every block is irrelevant to the decoding of the other 3, and the results will be the same as if we had not used any redundancy.
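The transformed code T^σ amounts to a small change in the decoding loop; a sketch, where SYMBOLS encodes the arbitrary mapping T of Figure 5.8 and sigma is a list of eight integers:

SYMBOLS = "ABCDEFGH"   # T: triplet value 0..7 -> symbol, as in Figure 5.8

def decode_redundant(bits, sigma):
    """Read blocks of four bits; the leading bit of each block routes the
    remaining triplet through sigma before the table T is applied."""
    symbols = []
    for k in range(0, len(bits), 4):
        index = 4 * bits[k + 1] + 2 * bits[k + 2] + bits[k + 3]
        if bits[k] == 1:
            index = sigma[index]
        symbols.append(SYMBOLS[index])
    return symbols

With sigma = [0, 7, 1, 4, 3, 5, 6, 2] this reproduces the right-hand column of Figure 5.9, and with the identity it simply ignores the leading bit of each block.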
Figure 5.9 shows the redundant code that results from the application of permutation [07143562].
a1a2a3   T(a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)
000      A             0000       A                 1000       A
001      B             0001       B                 1001       H
010      C             0010       C                 1010       B
011      D             0011       D                 1011       E
100      E             0100       E                 1100       D
101      F             0101       F                 1101       F
110      G             0110       G                 1110       G
111      H             0111       H                 1111       C
Figure 5.9: Second problem: transforming the encoding through permutation [07143562].
5.3.3 Results
The results described here were obtained using the same procedure as in the previous problem. Given that on that problem redundancy and recombination did not interfere with each other, we did all our runs with recombination, since these are the standard conditions for a GA. As before, all the points on the graphs are the average of 50 trials.
Figure 5.10 plots the fitness of the best individual as a function of the mutation rate for both [07143562] and the identity permutation. Panel (a) displays the fitnesses reached after 100 generations, while panel (b) displays them after 400 generations. This figure clearly shows that the GA operating with redundancy finds better solutions than the one operating without it. This is true after 100 and after 400 generations, and presumably for any number of generations in between.
Both panels in Figure 5.10 are very similar. In both cases, the curve for permutation [07143562] is made of an increasing segment followed by a decreasing one. The maximum fitness is achieved in both cases for a rate of 2 mutations per genome. This maximum is quite well marked, with the values around it significantly lower. This contrasts with the other curve, where a range of mutation rates achieves maximal or nearly maximal fitness. After 100 generations, the performance gap that exists between the runs with and without redundancy closes up for large values of the mutation rate. For mutation rates of 4.5 or 5, the two curves are on top of each other and decreasing fast. After 400 generations, the performance gap still exists for these mutation rates and performance is not declining as strongly. In the case of no redundancy, the performance for a mutation rate of 5 is still almost maximal. It therefore seems that high mutation rates are bad in the short term but manage to offset this disadvantage in the long run. This probably is a consequence of the random-population effect which was mentioned earlier. A higher mutation rate becomes more appropriate as the high genetic variation found in the initial random population is eroded by selection.
Figure 5.11 shows fitness against the number of generations for both permutations. The mutation rate that has been used for the identity permutation is the one that achieves maximum fitness after 400 generations. The mutation rate used for permutation [07143562] is the one that reaches the same fitness in the shortest time. As in the previous problem, we used this figure to compare the time taken with and without redundancy to reach the same level of fitness. Taking the fitness reached after 400 generations without redundancy as the basis for comparison, we see that the GA operating with a redundant code reaches
that same level in slightly more than 160 generations. This is not quite as large a gain as in the previous case, but it is still very significant from a practical point of view.
Figure 5.10: Second problem: the fitness of the best individual as a function of the expected number of mutations per genome, (a) after 100 generations and (b) after 400 generations. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.11: Second problem: comparing the speed of evolution with permutations [07143562] and Id (m = 2 for [07143562], m = 2.5 for Id). For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
Both curves have a very strong upward slope at the start, where they are almost indistinguishable. From generation 40, both curves experience a decrease in their rate of improvement, and by generation 150 they have stabilised at a much slower slope. At that point, however, the curve for redundancy is a great deal higher than the other. The superiority of redundancy therefore appears to be rooted to a large extent in that intermediary phase. Redundancy seems to be able to extend the period of high growth a little longer.
5.4 Third test problem: finding a compact non-overlapping path on a grid
5.4.1 The problem
In this problem, chromosomes represent paths through a two-dimensional grid made of square cells. A path is defined as a succession of steps from one cell to a neighbouring one. From any cell on the grid, there are eight such neighbouring cells and hence eight possible moves, as shown in Figure 5.12. The grid is large enough to ensure that no path ever reaches its edges: since the paths considered here are 100 moves in length, the grid extends more than 100 cells in all directions from the initial cell.
Figure 5.12: From the cell at the center, eight moves are possible, as indicated by the eight arrows.
In the non-redundant version of the problem, a chromosome is decoded into a path in the following way. The chromosome, which is 300 bits in length as in the two previous problems, is read from left to right by groups of three bits. The eight possible values that those three bits can take map to the eight moves possible from the current cell. Performing these moves on the grid in the order in which they appear on the chromosome leads to a complete path.
The fitness function that was used in these experiments was set up so as to select for two somewhat antagonistic criteria. The first one is that the path goes through as many different cells as possible. A path always going in the same direction would be optimal in this respect, since it would never cross the same cell twice and would therefore include 101 distinct cells. On the other hand, a path that went alternately up and down would only go through two cells and would have the lowest possible score.
The component of fitness corresponding to this criterion is simply defined as the number of cells, other than the initial one, which are traversed by the path. Hence the first of the examples would score 100 while the second would score 1.

The second criterion is that the rectangle enclosing the entire path has the smallest possible perimeter. Suppose we define a system of coordinates on the grid such that the cell from which the path starts has coordinates (0, 0) and a cell with coordinates (X, Y) is found by moving X times to the right and Y times up on the grid. Negative values of X and Y correspond to cells which are respectively left of and below the initial cell. Call Xmin the X coordinate of the leftmost cell reached by the path, Xmax the X coordinate of the rightmost one, Ymin the Y coordinate of the bottom-most one and Ymax the Y coordinate of the top-most one. The variable (Xmax − Xmin) + (Ymax − Ymin) measures half the perimeter of the smallest rectangle enclosing the entire path (up to an additive constant, which is the same for all paths). Since Xmax ≥ Xmin and Ymax ≥ Ymin, this variable is never negative. Fitness is obtained by subtracting its value from the value obtained from the first criterion.

In the case of a path made of 100 consecutive moves to the right, Xmax would be equal to 100 while Xmin, Ymax and Ymin would all be equal to 0. Hence the 100 scored on the first criterion would be exactly cancelled by the 100 subtracted for the second, resulting in a fitness of 0.

There are many different optimal paths for this fitness function. The maximum score that can be obtained from the first criterion is 100; it requires that 101 distinct cells are crossed by the path. The rectangle with the smallest perimeter that can encompass that number of cells is 11 cells by 10, for which (Xmax − Xmin) + (Ymax − Ymin) = 10 + 9 = 19. The total fitness would then be 100 − 19 = 81. Alternatively, the entire path could be fitted in a 10 by 10 square by having exactly one cell traversed twice; the fitness would then be 99 − 9 − 9, which is the same.

5.4.2 Adding redundancy

Here again, redundancy can be added along the precise lines of Chapter 3. The mapping between groups of 3 bits and directions of movement used in the non-redundant version is shown in the table on the left of Figure 5.13. Introducing redundancy with permutation σ = [07143562] leads to the code shown on the right of that same figure.

Figure 5.12: From the cell at the centre, eight moves are possible, as indicated by the eight arrows (N, NE, E, SE, S, SW, W, NW).

    a1a2a3  T(a1a2a3)     d1a1a2a3  Tσ(d1a1a2a3)     d1a1a2a3  Tσ(d1a1a2a3)
    000     N             0000      N                1000      N
    001     NE            0001      NE               1001      NW
    010     E             0010      E                1010      NE
    011     SE            0011      SE               1011      S
    100     S             0100      S                1100      SE
    101     SW            0101      SW               1101      SW
    110     W             0110      W                1110      W
    111     NW            0111      NW               1111      E

Figure 5.13: Third problem: transforming the encoding through permutation [07143562]. Moves are abbreviated by their geographical equivalent: N for north, NE for northeast, and so on.

Redundancy takes the length of the chromosome up to 400 bits. The case of no redundancy is here again studied via the identity permutation, Id.
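To make the decoding and the fitness function of this problem concrete, here is a short Python sketch; the function names and data layout are ours, and the move table follows the left column of Figure 5.13. It reads a 300-bit chromosome three bits at a time, walks the corresponding path on the grid, and scores it as described in Section 5.4.1.

    # A sketch of the path decoding and fitness of this problem (names are ours).
    # Each 3-bit group indexes a move, following the left table of Figure 5.13.
    MOVES = [(0, 1), (1, 1), (1, 0), (1, -1),      # N, NE, E, SE
             (0, -1), (-1, -1), (-1, 0), (-1, 1)]  # S, SW, W, NW

    def path_fitness(chromosome):
        """chromosome: a string of 300 bits; returns the number of distinct
        cells visited (excluding the start) minus (Xmax-Xmin) + (Ymax-Ymin)."""
        x, y = 0, 0
        visited = {(0, 0)}
        for k in range(0, len(chromosome), 3):
            dx, dy = MOVES[int(chromosome[k:k + 3], 2)]
            x, y = x + dx, y + dy
            visited.add((x, y))
        xs = [cx for cx, _ in visited]
        ys = [cy for _, cy in visited]
        return (len(visited) - 1) - ((max(xs) - min(xs)) + (max(ys) - min(ys)))

    # 100 moves east: 101 distinct cells, fitness 100 - 100 = 0, as in the text.
    assert path_fitness('010' * 100) == 0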
5.4.3 Results

Figure 5.14 shows the fitness of the best individual in the population as a function of the genomic mutation rate. Every point on those graphs is the average of 50 trials, as in previous experiments. Panel (a) shows the fitness achieved in 100 generations while panel (b) shows fitness after 400 generations. The mutation rate is increased up to a value of 10, compared with only 5 in the two previous problems: because performance is nearly optimal at a value of 5, the range had to be extended in order to display the decline in performance at high mutation rates.

The two panels shown in Figure 5.14 are very similar to what was obtained in the two previous problems. Redundancy here again leads to a better performance of the GA when the comparison is made at the optimal mutation rates. In both panels, the following trend can be observed. Both curves follow a straight downward line past a certain mutation rate; however, the curve for redundancy starts its descent at a slightly lower mutation rate. This results in the curves crossing over, the curve for redundancy lying below and parallel to the other one from that point onwards.

Figure 5.14: Third problem: the fitness of the best individual as a function of the expected number of mutations per genome. Panel (a): after 100 generations; panel (b): after 400 generations. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.

The offset between the two curves at high mutation rates can be explained as follows. All mutations of the first bit in a block of four are neutral when permutation Id is used, while only 3/8 of these are neutral when permutation [07143562] is used. Hence, at a given genomic mutation rate, more mutations are neutral with Id than with [07143562]. Given that in that part of the graph (m > 7) the excess of mutation is hindering performance, we expect Id to be favoured because its high proportion of neutral mutations shields it in part from the excess. Both curves would be on top of each other if the rate of non-neutral mutations were plotted.

The mutation rate at which performance starts to degrade (both with and without redundancy) is higher after 400 generations than it is after 100. This was also observed in the previous examples, and an explanation was suggested on page 92.

Figure 5.15 shows fitness plotted against time for both the redundant code and the non-redundant one. This graph allows a comparison in time as performed in the previous sections. The fitness reached without redundancy in 400 generations can be reached in just under 200 with a redundant code. This is a twofold increase in speed, comparable to what was obtained in the previous problem.

Figure 5.15: Third problem: comparing the speed of evolution with permutations [07143562] (m=3) and Id (m=5). For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
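The 3/8 figure quoted above can be checked directly: flipping the redundancy bit d1 changes the decoded symbol from T(a) to T(σ(a)), so the mutation is neutral precisely when σ leaves the triplet a fixed, and the neutral fraction is the number of fixed points of σ divided by 8. A short Python check, with the permutation encoded as in the earlier sketch:

    # Neutral d1-mutations correspond to fixed points of the permutation:
    # flipping d1 turns T(a) into T(sigma(a)), which is neutral iff sigma(a) = a.
    sigma = [0, 7, 1, 4, 3, 5, 6, 2]      # [07143562]: fixed points 0, 5 and 6
    identity = list(range(8))
    neutral_fraction = lambda p: sum(p[a] == a for a in range(8)) / 8
    print(neutral_fraction(sigma), neutral_fraction(identity))  # 0.375 and 1.0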
5.5 Conclusion

The previous chapter showed indirectly that certain forms of redundancy could facilitate evolution's search for fitter individuals. This chapter has shown that this effect is not just a theoretical possibility: it can be demonstrated, for some mutation rates, when evolution is simulated using a GA. We also showed that the previously defined variable Rσ correlates well with the magnitude of the improvement observed and can therefore be used reliably as a way of detecting useful redundancy.

On three very different problems, a comparison was made between a GA using a non-redundant code and the same GA using a redundant version of that code in the sense defined in Chapter 3. The pattern used has the highest Rσ value possible. Performance was always optimised with respect to the mutation rate prior to any comparison. Redundant codes were found to perform much more effectively on all three problems. This was illustrated by a reduction by a factor of between 2 and 4 in the number of evaluations needed to reach an arbitrary level of fitness.

Chapter 6
Some limitations to the benefits of redundancy

6.1 Application of redundancy to the design of a wing box

6.1.1 The problem and the original encoding

The problem described here was defined as part of the Genetic Algorithms in Manufacturing Engineering (GAME) project. British Aerospace, one of the industrial partners in this project, provided data inspired by the design of the Airbus wing box for its definition. It is common in aircraft structure design to be faced with the problem of defining mechanical structures of minimal weight that can withstand a given load. The formulation of a reliable and easy-to-use optimisation procedure which can rapidly discover good solutions to these high-dimensional problems is still an open challenge. One of the aims of the GAME project was to assess the performance of GAs on that challenge.

Figure 6.1 displays a simple sketch of the elements of wing structure which are relevant to this problem. The wing is supported at regular intervals by solid ribs which run parallel to the aircraft's fuselage. On the upper part of the wing, thin metal panels cover the gap separating adjacent ribs. The number of these panels is equal to the number of ribs minus one. The objective is to optimise the number of ribs (or panels) and their thickness in such a way that the weight of the wing is minimal and that it does not buckle under the compressive stresses produced by the bending moments of a 2.5g manoeuvre. The ribs are assumed to be strong enough to sustain their corresponding load, and their buckling is not considered; only the panels can buckle. All dimensions of the wing are fixed.

Figure 6.1: The relevant elements of a wing (fuselage, ribs, top panels, rib pitch, cavity). The wing dimensions are fixed. The variable elements are the number of ribs and the thickness of the top panels.

Mass has to be minimised; it is therefore sensible to take as the fitness of a candidate wing its mass preceded by a minus sign. The mass of a panel depends on its thickness and its position on the wing: because the wing is tapered, the panels near the tip have smaller dimensions and thus a lower mass for a given thickness. The total mass of the ribs depends only on the rib pitch (i.e. on the number of ribs), not on the thickness of the panels they have to support.

For every panel i, the stress it incurs, σi, as well as a threshold stress σit, are calculated. The equations can be found in (McIlhagga et al., 1996).
If the stress on the panel is smaller than the threshold (σi < σit), the panel will not buckle and the mass of the panel is added to the mass of the other panels without correction. If on the other hand the stress exceeds the threshold (σi > σit), the panel is too thin and will buckle. The mass of the panel is then multiplied by 1 + (σi/σit) before being added. This penalty function compensates for the weakness of the panel by increasing its thickness to a value which should allow it to withstand the stress. By doing this, the constraint of withstanding stress is converted into mass, the currency of fitness.

Let us now examine the encoding that was used in the GAME project. The parameters that need to be specified for a full definition of a solution to this problem are the number of ribs, N, and the thicknesses of the N − 1 panels. There is a constraint to be respected on the thickness of these panels: adjacent panels must not differ in thickness by more than 0.25 millimetres. The simplest way to ensure that only wings respecting this constraint are handled by the GA is to encode the differences in thickness between adjacent panels rather than the absolute thicknesses of the panels. If we know the difference in thickness ∆th(i) = th(i+1) − th(i) between panels i and i + 1 for i between 1 and N − 2, the absolute thickness of the first panel is enough to define everything else. All these parameters are mapped onto the chromosome in the order described in Figure 6.2. Notice that a change in ∆th(i) leads to a change in the thickness of panel i + 1 and of all subsequent panels up to the tip of the wing; all these panels have their thickness changed by the same amount.

Figure 6.2: The representation of the wing parameters on the chromosome: N, th(1), ∆th(1) = th(2) − th(1), ..., ∆th(i) = th(i+1) − th(i), ..., ∆th(N−2) = th(N−1) − th(N−2), where N is the number of ribs and th(i) the thickness of the i-th panel.

The number of ribs, N, is represented using 4 bits. This allows 16 different values, which have been chosen to be anything between 42 and 57. The thickness of the first panel was allowed to vary between 5 and 15 mm in steps of 10^-3 mm. This requires a minimum of 14 bits to represent all these values. But 14 bits allow over 16,000 possible values to be encoded; some thicknesses were therefore represented by more than one binary sequence. We are not concerned here with the redundancy of that mapping.

For all subsequent N − 2 panels, the difference in thickness with the previous panel is represented on the chromosome. In the GAME project, only five values were allowed for this difference, the result of considerations on manufacturing tolerances: -0.25 mm, -0.125 mm, 0 mm, 0.125 mm and 0.25 mm. Three bits were used to encode these five values with the following mapping:

    a1a2a3   T0(a1a2a3)
    000      -0.25 mm
    001      -0.125 mm
    010      0.0 mm
    011      0.125 mm
    100      0.25 mm
    101      0.0 mm
    110      0.0 mm
    111      0.0 mm

Chromosomes of constant length were used. Their length was such as to allow the encoding of the maximum possible number of panels (56). When fewer panels are needed, the end of the chromosome codes for non-existent panels, which are simply ignored. The total number of bits needed for the chromosome is 4 + 14 + 3 × 55 = 183.
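The fitness calculation just described can be summarised in a few lines. The sketch below is a schematic Python rendering and not the GAME project's code: the panel masses, the stresses σi and the thresholds σit are assumed to be supplied by the structural equations of McIlhagga et al. (1996), which we do not reproduce.

    # A schematic sketch of the wing fitness (not the GAME project's code).
    # panel_masses, stresses and thresholds are assumed to come from the
    # structural equations of McIlhagga et al. (1996), not reproduced here.

    def wing_fitness(panel_masses, stresses, thresholds, rib_mass):
        """Fitness is minus the total mass, with buckling panels penalised."""
        total = rib_mass
        for mass, s, s_t in zip(panel_masses, stresses, thresholds):
            if s > s_t:                # the panel is too thin and buckles:
                mass *= 1.0 + s / s_t  # the violation is converted into mass
            total += mass
        return -total                  # mass is the currency of fitness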
6.1.2 Modifying the encoding

In order to apply redundancy in a way that is consistent with the definitions used in this thesis, we need to define a non-redundant encoding to which various forms of redundancy can then be applied. But the mapping used by the GAME project is already redundant, since the value 0 is represented by four different triplets. The only acceptable way to transform this mapping into a non-redundant one is to increase the number of possible differences in thickness from five to eight; this ensures that a different value can be assigned to each triplet. We maintained -0.25 and 0.25 as the lower and upper bounds of the range so that the same space of solutions is explored, and we kept 0 as one of the possible values. The five remaining values were chosen as shown in the following table:

    a1a2a3   T2(a1a2a3)
    000      -0.25 mm
    001      -0.1875 mm
    010      -0.125 mm
    011      -0.0625 mm
    100      0.0 mm
    101      0.0833 mm
    110      0.166 mm
    111      0.25 mm

The positive half of the range is split equally between three values while the negative half is split between four.

6.1.3 Introducing redundancy

As in previous experiments, we focus here on the comparison of two permutations only: the permutation with the highest Rσ value, [07143562], and the identity permutation, which is equivalent to using no redundancy at all. In all the experiments described here, redundancy was only applied to the triplets defining differences in thickness between panels; the encodings of the value of N and of the thickness of the first panel were left unchanged. Every block defining the difference in thickness of a panel was encoded using four bits instead of three. In the case of the identity permutation, the value of the extra bit is irrelevant to fitness. In the case of permutation [07143562], the following mapping is obtained:

    d1a1a2a3   T2σ(d1a1a2a3)     d1a1a2a3   T2σ(d1a1a2a3)
    0000       -0.25             1000       -0.25
    0001       -0.1875           1001       0.25
    0010       -0.125            1010       -0.1875
    0011       -0.0625           1011       0.0
    0100       0.0               1100       -0.0625
    0101       0.0833            1101       0.0833
    0110       0.166             1110       0.166
    0111       0.25              1111       -0.125

The maximum chromosome length now needed is 4 + 14 + 4 × 55 = 238.

6.1.4 Results

In all the results of this chapter, the GA described in Section 5.1 was used. The only difference with the previous chapter is that the population grid was changed from 20 × 20 to 40 × 40 in order to be in line with the values used in the GAME project. The population size is therefore 1600. Recombination was always used. Whenever fitness is represented, it is the fitness of the best individual in the population averaged over 50 trials. Some graphs plot fitness as a function of the number of generations elapsed. When fitness is plotted against the mutation rate, fitness is taken after 200 generations.

Figure 6.3: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome ([07143562] T2 and Id T2). Error bars indicate the standard error.

Figure 6.4: Comparing the speed of evolution with and without redundancy ([07143562] T2, m=7; Id T2, m=9). The best mutation rates are used in both cases. Error bars indicate the standard error.
Figure 6.3 shows that the improvement brought about by redundancy in this case is very marginal. Only at a mutation rate of 7 is there any improvement at all. Figure 6.4 shows that it takes 165 generations to reach the level of fitness that would be achieved in 200 generations without redundancy. This is disappointing compared with the kind of improvements observed on the previous problems. There are, however, good reasons for this, and the rest of this section and the next one will be devoted to uncovering them.

6.2 Comparing redundancy on three non-redundant codes

6.2.1 Definition

The non-redundant code, T2, on which redundancy was added in the previous experiments is shown in Figure 6.5. It was chosen because the possible values of ∆th appear in a natural increasing order in the right column, as was the case in the original code used in the GAME project. But this arbitrary choice has important consequences: it implicitly determines which transitions between ∆th values are possible by point mutation and which are not. A convenient way of picturing the situation is shown in Figure 6.5. The eight possible values of ∆th are placed at the corners of a cube according to the triplet that represents them. The association between corners of the cube and binary sequences is the same as was used in Chapter 3 and represented in Figure 3.1. From any corner of the cube, it is possible to go by point mutation to any of the three corners connected to it by an edge.

Figure 6.5: The non-redundant code T2 (table as in Section 6.1.2), with the eight ∆th values placed at the corners of a cube according to the triplet that represents them.

As Figure 6.5 shows, with T2 it is not always possible to go from one value of ∆th to the nearest one by point mutation. From 0, for instance, it is possible to go to 0.0833 (100 → 101) but not to -0.0625 (100 → 011), which is three mutations away.

To understand the role played by T2 in the results of the previous section, we ran the same experiments with T2 changed to other non-redundant codes. These non-redundant codes are obtained by using the same set of possible ∆th values but assigning them to binary sequences in a different order. One such code is T1, represented in Figure 6.6. It is built on the same principle as a Gray code: two ∆th values which are neighbours have binary representations which are also neighbours. This way, smooth transitions in thickness are always possible by point mutation, as can be seen from the cube in Figure 6.6.

    a1a2a3   T1(a1a2a3)
    000      -0.25 mm
    001      -0.1875 mm
    010      -0.0625 mm
    011      -0.125 mm
    100      0.25 mm
    101      0.166 mm
    110      0.0 mm
    111      0.0833 mm

Figure 6.6: The non-redundant code T1.

The other non-redundant code that was tried is T3, defined in Figure 6.7. It is built on the reverse principle: ∆th values which are neighbours are given binary representations which are at least two point mutations apart.

    a1a2a3   T3(a1a2a3)
    000      -0.25 mm
    001      0.0 mm
    010      0.0833 mm
    011      -0.1875 mm
    100      0.166 mm
    101      -0.125 mm
    110      -0.0625 mm
    111      0.25 mm

Figure 6.7: The non-redundant code T3.

6.2.2 Results

Figure 6.8 compares the fitness achieved with and without redundancy when code T1 is used. In this case, even the marginal advantage of redundancy obtained with T2 has disappeared.
Fitness at the optimal mutation rate is slightly lower with redundancy than without. Figure 6.9 makes the comparison with and without redundancy when code T3 is used. Here, in contrast, redundancy at a mutation rate of 5 or 6 results in a noticeable increase in fitness over what is possible without redundancy at any mutation rate. The shapes and relative positions of the two lines are similar to what was obtained in the previous chapter in similar comparisons (Figures 5.10 and 5.14): a noticeable gap exists between the two curves which disappears at high mutation rates. Figure 6.10 translates this superiority of redundancy onto the generation axis: the level of fitness reached in 200 generations without redundancy can be reached in only 120 when redundancy is used. This is the sort of improvement that was observed on the problems of the previous chapter.

Figure 6.8: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome when non-redundant code T1 is used ([07143562] T1 and Id T1). Error bars indicate the standard error.

Figure 6.9: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome when non-redundant code T3 is used ([07143562] T3 and Id T3). Error bars indicate the standard error.

Figure 6.10: Comparing the speed of evolution with and without redundancy when T3 is used ([07143562] T3, m=5; Id T3, m=7). The best mutation rates are used in both cases. Error bars indicate the standard error.

In Figures 6.11 and 6.12 we present the same data in a different manner. Figure 6.11 compares the performances of T1, T2 and T3 when used in their raw form, i.e. prior to the addition of any redundancy. Figure 6.12 makes the same comparison after redundancy has been added to them. Prior to the addition of redundancy, the three non-redundant codes do not perform equally well: at almost all mutation rates examined here, the relation T1 > T2 > T3 emerges. However, once redundancy has been added to them, the three codes perform equally well. The level of performance achieved by the three codes in this case is the same as that achieved by T1 without any redundancy. Redundancy hence compensates for the disadvantage of the other two.

We conclude that, on this problem, the impact of redundancy on the GA depends on the choice of the non-redundant code as well. Redundancy does badly on T1, which preserves, at the binary level, the natural closeness that exists between similar values of ∆th; but it does much better on T3, which does not preserve that distance. The other code, T2, is somewhere in between and performs accordingly. The most likely explanation for these observations is that the kind of local optima which redundancy removes do not exist on this problem when a code such as T1 is used. Only an unnatural code such as T3 introduces some of these optima, which redundancy can then remove.
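The structural difference between the three codes can be made mechanical. The sketch below (tables transcribed from Figures 6.5-6.7; helper names are ours) prints, for each pair of ∆th values that are adjacent in magnitude, the Hamming distance between their binary representations: T1 yields a distance of 1 everywhere, T3 never does.

    # Hamming distances between the representations of value-adjacent delta-th
    # values under the three codes (tables transcribed from Figures 6.5-6.7).
    VALUES = [-0.25, -0.1875, -0.125, -0.0625, 0.0, 0.0833, 0.166, 0.25]

    T1 = {-0.25: '000', -0.1875: '001', -0.0625: '010', -0.125: '011',
          0.25: '100', 0.166: '101', 0.0: '110', 0.0833: '111'}
    T2 = {v: format(k, '03b') for k, v in enumerate(VALUES)}
    T3 = {-0.25: '000', 0.0: '001', 0.0833: '010', -0.1875: '011',
          0.166: '100', -0.125: '101', -0.0625: '110', 0.25: '111'}

    def adjacent_distances(code):
        hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
        return [hamming(code[u], code[v]) for u, v in zip(VALUES, VALUES[1:])]

    print(adjacent_distances(T1))  # [1, 1, 1, 1, 1, 1, 1]: a Gray-code assignment
    print(adjacent_distances(T2))  # [1, 2, 1, 3, 1, 2, 1]: mixed
    print(adjacent_distances(T3))  # [2, 2, 2, 3, 2, 2, 2]: never one mutation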
Figure 6.11: Comparing non-redundant codes T1, T2 and T3 without redundancy (Id T1, Id T2, Id T3). Error bars indicate the standard error.

Figure 6.12: Comparing non-redundant codes T1, T2 and T3 with redundancy in the form of permutation [07143562]. Error bars indicate the standard error.

6.3 Why does T1 perform better than T3?

6.3.1 Non-redundant codes and partial fitness functions

For the purposes of this problem, a wing W is defined by the specification of the number of panels, N, the thickness of the first panel, th(1), and an array of N − 1 numbers, [∆th(1), ∆th(2), ..., ∆th(N − 1)], all taken from the set S = {−0.25, −0.1875, −0.125, −0.0625, 0, 0.0833, 0.166, 0.25}. If we set the value of ∆th(i) to each of the eight possible values while keeping all other values ∆th(j) as they are in W, we obtain eight different wings. We call Wx the wing obtained by setting ∆th(i) = x. The function

    z : S → ]−∞, 0],   z(x) = F(Wx),

where F(Wx) is the fitness of wing Wx, describes what happens when ∆th(i) is varied in W while everything else is kept constant. If the value of ∆th(i) found in the original definition of W is not the one that maximises z, there is some scope for improving the wing by changing the bits that define ∆th(i). Notice that the definition of z depends completely on the choice of W and i: changing the value of any panel other than i will change the definition of z.

Functions such as z were in fact encountered previously, in the context of Section 3.4.1. They were the justification for assigning fitnesses to elements of S so that we could calculate numbers of optima before and after the introduction of redundancy (Nf and Nfσ). Indeed, when a function such as z is combined with a code T (of which T1, T2 and T3 are examples), it produces a function

    z ◦ T : {0, 1}^3 → ]−∞, 0]

which allows us to talk of beneficial and deleterious mutations over these triplets.

To calculate the average effect of a form of redundancy, it was assumed in Chapter 3 that all possible functions z would be encountered with equal probability. The translation of this assumption into the present context would be that, for a random wing W and a random value of i, any change of ∆th(i) is as likely to improve fitness as any other. A consequence of this assumption would be that any non-redundant code is as good as any other. The previous section has shown that this is not the case: some non-redundant codes perform better than others on this problem. It must therefore be the case that the functions z are not uniformly distributed in this problem; they must have some statistical properties which cause them to combine gracefully with code T1 and ungracefully with code T3.

Consider in Figure 6.13 a few of the shapes a function z might take. When z is strictly monotonic, as pictured at the top left, mutations which increase the value of ∆th increase the fitness of the wing. Since under T1 such mutations are always possible, the function z ◦ T1 will have a single optimum (0.25 mm).
The function z ◦ T2 will also have a single optimum in this case, since a transition to some larger value of ∆th (not necessarily the one immediately larger) is always possible from any binary triplet. However, the same function z combines badly with code T3, creating 4 local optima: 0.0, 0.0833, 0.166 and 0.25. Indeed, since these four values are two mutations apart from each other (Figure 6.7), point mutations from any of them lead to values of ∆th which are all smaller.

Figure 6.13: Possible variations of fitness when changing the thickness of a single panel (three example shapes of the function z, plotted as fitness against ∆th).

A function such as the one at the top right of Figure 6.13 will also result in z ◦ T1 having a single optimum, since moves to the immediately larger or smaller value of ∆th are always possible by point mutation under T1. However, z ◦ T2 would have two optima, since from 0.0 (the second-best value) one cannot make the transition to -0.0625, their respective binary representations, 100 and 011, being three mutations apart. The function at the bottom of Figure 6.13 is an instance of z that combines better with T3 than with T1: z ◦ T1 has 4 optima while z ◦ T3 has only one.

These three examples illustrate that for any instance of z, some non-redundant codes are well matched and some are not. Only on average, over all possible types of function z, are all non-redundant codes equivalent. If in the present problem z functions are often like those at the top of Figure 6.13, many fitness improvements will be possible by point mutation when code T1 or T2 is used which will be impossible when code T3 is used. This is bound to have an impact on the speed of the GA and would explain the differences in performance shown in Figure 6.11.

We showed in Chapter 3 that a permutation such as σ = [07143562] causes the function z ◦ Tσ to have fewer optima than the function z ◦ T, provided that the functions z are randomly generated. When this is true, the function z ◦ T has on average two optima. But in the case where code T1 combines only with functions such as the ones at the top of Figure 6.13, this permutation cannot achieve anything, because the number of optima of z ◦ T1 is always one. In this case, no permutation could do better: there is simply no room for redundancy to act. The same z functions combined with codes like T3 offer a very rich terrain for redundancy to make a difference. And as Figure 6.12 shows, the performance of the GA with T3σ reaches the same level as T1. Redundancy therefore compensates for the poor match between T3 and z.

We have reasons to expect functions z to be more like the ones at the top of Figure 6.13 in this problem. Among the eight possible values of ∆th for a given panel of the wing, there might be a threshold value below which the panel buckles and above which it does not. When that is the case, the optimal value for ∆th will be the lowest value for which the panel does not buckle: any lower value will be associated with very low fitness (because of the buckling), and increasing the thickness beyond that threshold value will add unnecessary mass to the wing.
Hence functions z will always be decreasing over those values for which the panel does not buckle. We cannot predict the shape z is likely to have over those values which cause the panel to buckle, but any local optima that exist there are probably not relevant: the low fitness of these points means that they are not the ones where the GA is likely to get stuck. Hence, for the parts of the function which matter, z functions are likely to be monotonic.

6.3.2 Counting numbers of optima

The previous section proposed that the difference in performance between the three codes T1, T2 and T3 is due to the functions z being biased towards a situation where z ◦ T1 has few optima while z ◦ T3 has many (with z ◦ T2 somewhere in between). This section tests this claim experimentally.

Although a low number of optima indicates an easy task for the GA, we should not assume that many optima necessarily lead to a difficult one. We can imagine situations where many optima exist which are irrelevant to the evolutionary process because the individuals handled by the GA — or at least those which breed — are never found to be held up by them. Genotypes handled by the GA are a tiny subset of all possible genotypes, whose atypical features are (1) to be of higher than average fitness, and (2) to be reachable by the GA through mutation and recombination. To make sure that the numbers of optima we find for the function z ◦ Tk (k = 1, 2 or 3) are representative of a real difficulty for the GA, we also calculate the probability of a genotype from an evolving population being found in such optima.

As we have emphasised, a function z is defined by the choice of a wing W (or equivalently a genotype G coding for W) and an integer i indicating which value ∆th(i) is being varied while the others are kept constant. As we are only interested in the functions z ◦ Tk encountered by the GA, it is sensible to take G from a population of evolving genotypes rather than completely at random; in all that follows, G is the best individual in the population after g generations. We choose the best individual because it is the one from which improvement is most likely to come, and any suboptimal value of ∆th(i) in that individual will be taken as a good indication that the GA is hindered. The number of generations g is a parameter which is varied to monitor how the situation changes as the population becomes fitter.

We want to compare the situation when each of the three codes T1, T2 and T3 is used; hence the following steps were performed:

    For 1 ≤ k ≤ 3:
        Repeat 200 times:
            Run a GA with code Tk for g generations.
            Pick G, the best individual in the population.
            For 1 ≤ i ≤ 40:
                (a) Count the number of optima in the function z ◦ Tk defined by G and i.
                (b) Check whether the value of ∆th(i) in G is a local (non-global) optimum.

The answer to (b) is necessarily negative if the answer to (a) is 1, since there are no local optima in that case. If the number found in (a) is large but never results in a positive answer to (b), we cannot invoke (a) as a cause of delay for the GA. In one set of graphs, we averaged the number found in (a) over the 40 values of i and the 200 trials, and similarly calculated the proportion of cases where the answer to (b) was positive out of these 200 × 40 cases. In another set of graphs, we kept separate averages (over 200 trials) for each value of i and plotted these results as a function of i.
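Step (a) of this procedure reduces to a small enumeration. The sketch below is our own Python rendering of it: the code table maps each triplet value to a ∆th value, z maps each ∆th value to its partial fitness (obtained, in the experiments, by re-evaluating the wing with each of the eight values in turn), and a triplet is an optimum of z ◦ T if none of its three one-bit neighbours decodes to a strictly fitter value.

    # A sketch of step (a): counting the optima of z o T over {0,1}^3 (ours).
    # `code` maps each triplet value 0..7 to a delta-th value; `z` maps each
    # delta-th value to its partial fitness.

    def count_optima(code, z):
        """A triplet is an optimum of z o T if no one-bit flip improves it."""
        count = 0
        for t in range(8):
            neighbours = (t ^ 1, t ^ 2, t ^ 4)   # the three one-bit flips
            if all(z[code[t]] >= z[code[n]] for n in neighbours):
                count += 1
        return count

    # With a strictly increasing z, the Gray-code assignment T1 yields a single
    # optimum while the anti-Gray assignment T3 yields four, as in the
    # monotonic example of Figure 6.13.
    T1 = [-0.25, -0.1875, -0.0625, -0.125, 0.25, 0.166, 0.0, 0.0833]
    T3 = [-0.25, 0.0, 0.0833, -0.1875, 0.166, -0.125, -0.0625, 0.25]
    z = {v: v for v in T1}                       # a monotonic partial fitness
    print(count_optima(T1, z), count_optima(T3, z))  # 1 and 4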
By keeping i lower than 40 we make sure that the values of ∆th(i) are always relevant to the fitness of the wing. Larger values would lie in the variable-length part of the chromosome and would therefore not always be under selection.

6.3.3 Results and Discussion

Figure 6.14 shows how the number of optima (averaged over all values of i) changes with the number of generations g. Three lines are plotted, corresponding to T1, T2 and T3. The line for T3 is well above the other two, with an initial value of 3.75 that increases over 50 generations to an equilibrium value of 3.8. Code T1 is at the other extreme, with values very near the minimum value of 1: it starts at 1.03 and increases steadily to 1.08 over the 200 generations. Code T2 is in between the two, but much closer to T1 than to T3; whereas the other two increase with g, T2 decreases from 1.25 to 1.2.

This supports our explanation for the differences in performance. The number of optima of z ◦ T1 is only marginally greater than 1, indicating that there are very few local optima in which a GA using that code could be caught. The number of optima of z ◦ T3 is very high, with an average of 3.85 local optima. This tells us there are many optima in the vicinity of the best individual in the population; it does not tell us whether the best individual tends to avoid them or not. Figure 6.15 answers this question. It plots the probability that the best genotype G is found with ∆th(i) at a local optimum, as a function of g. As in the previous figures, all values of i are averaged together.

Figure 6.14: The average number of optima in blocks defining a panel thickness as a function of the number of generations. The data are averages over the first 40 panels of the best individual. Each line corresponds to a different non-redundant code.

Figure 6.15: The proportion of blocks found at a local optimum as a function of the number of generations. The data are averages over the first 40 panels of the best individual. Each line corresponds to a different non-redundant code.

With T3, over 40% of the blocks are at a local optimum at generation 0. This value diminishes over the entire length of the trial, but after 200 generations it is still greater than the starting value for code T2. The decrease of this number with time shows that these optima are in the way of the GA and that part of the evolutionary effort is directed at eliminating them. In contrast, the line for T1 shows that these optima are not an issue when this code is used: their number is initially less than 0.5% and converges towards 0 in less than 20 generations. The line for T2 is here again between the two, but closer to T1 than to T3; convergence to 0 is achieved in 70 generations.

Taken together, the results of Figure 6.14 and Figure 6.15 give strong support to the idea put forward at the end of Section 6.3.1. The statistical properties of the functions z encountered in this problem are such that the function z ◦ T1 cannot be improved by redundancy, whereas the average function z ◦ T3 has much scope for it.
The results presented now show the same data analysed differently. Instead of averaging the points obtained for different values of i, we keep them separate and treat i as a variable. This allows us to determine whether the way z interacts with T depends on the part of the wing considered. The variable i replaces g on the X axis, and different values of g are represented as different lines on the same graph. The data for T1, T2 and T3 appear in Figure 6.16, Figure 6.17 and Figure 6.18 respectively.

Figure 6.16(a) shows how the number of optima of the function z ◦ T1 changes with i. For any value of i greater than 18 and for any value of g, the number of optima is exactly 1 (which means that it is consistently 1 for each of the 200 trials). For values of i between 13 and 17, the number of optima is around 1.1 and does not change significantly with g. For values between 6 and 12, the numbers are around 1.1 to start with but increase significantly with g; at generation 200, they culminate at 1.5 for i = 10, 11 and 12. For i between 1 and 5, the numbers are low and stable with g.

Figure 6.16(b) shows that for values of i greater than 18, the best individual in the population is never caught in local optima. We could guess that from Figure 6.16(a), since for such values of i there is only one optimum, which must be global. After only 20 generations, the best individual is free of local optima anywhere on the chromosome. In the limited range of space and time where local optima are found, no clear trend emerges. In any case, the high value observed for i = 10 in Figure 6.16(a) does not cause a high probability of being caught in a local optimum for that value of i.

Figure 6.16: Optimality along the wing with T1. (a) The number of optima of z ◦ T1 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Figure 6.17(a) shows the number of optima of z ◦ T2 as a function of i. For i = 19 this number increases with time, while it decreases or remains stable for all other values of i. The decrease with time is most marked for i = 10, where the numbers go down to 1. This contrasts with z ◦ T1 where, for the same value of i, the number of optima went up with g to 1.5. This is therefore a value of i for which z seems to interact more beneficially with T2 than with T1. A closer look at the function z for that value of i reveals the following trend. The best genotype in the population is such that a value of ∆th(10) lower than 0.166 results in a wing that buckles, and hence of low fitness. However, among these low-fitness values, z(−0.125) and z(−0.0625) are slightly higher than the others. As can be seen from Figure 6.6, this creates a local maximum under T1, because -0.125 and -0.0625 are two point mutations away from the better values 0.166 and 0.25. Under T2, on the other hand, mutations from
-0.125 to 0.166 and from -0.0625 to 0.25 are possible.

Figure 6.17: Optimality along the wing with T2. (a) The number of optima of z ◦ T2 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Figure 6.17(b) shows that initially the probability of finding the best individual at a local optimum is around 10% for values of i between 1 and 19. As generations pass, this number decreases for all values of i, but the decrease is slower for i = 18 or 19, the values for which the number of optima is largest. These hot spots of optima on the chromosome therefore do have an impact on the GA in this case. We can imagine that changing the coding from T2 to T1 for ∆th(18) and ∆th(19) would eliminate the most significant handicap of code T2 for this problem, and that a GA using this hybrid coding would perform nearly as well as one using only T1.

Figure 6.18(a) shows that, when T3 is used, the numbers of optima are very high for all values of i, most of them being very near the largest possible value of 4. The value i = 19, which was a maximum when T2 was used, is now a minimum. In Figure 6.18(b) we see that the probability of being caught in a local optimum is over 30% for all values of i. After only 10 generations this number has been reduced drastically, to less than 10% for values of i < 15, but the other values are still high. Optimisation seems to happen first for low values of i, progressing with time towards larger values; after 200 generations, values of i near 40 are still above the 20% mark. We can understand why low values of i are optimised before large ones. Given the nature of the encoding, a change in ∆th(1) changes the thickness of all subsequent panels by the same amount, while a change in ∆th(40) changes only the panels between the 41st and the tip of the wing. For low values of i, changes in ∆th(i) will have a much larger impact on fitness; this results in a much higher pressure of selection on these values until they are optimal.

We have seen that there is some diversity, across values of i, in the way z interacts with codes T1, T2 and T3. However, for all values of i, z interacts better with T1 than with T2, and better with T2 than with T3. Even in the rare case where the number of optima of z ◦ T2 is smaller than that of z ◦ T1 (i = 10), it turns out that the GA is unaffected by the optima at that point. In the case of codes T2 and T3, we have seen that the higher number of optima of the function z ◦ T has an impact on the performance of the GA.

6.4 Conclusion

We can draw several conclusions from these results. The first is that the nature of the non-redundant code is important in this problem, and code T1 is a very good choice for it. A code such as T2 is only marginally worse, and we have been able to identify the few points on the chromosome which are responsible for most of the discrepancy. Code T3, on the other hand, is highly inadequate for this problem.
When, as is the case here, the regularity of the z functions can be exploited by a non-redundant code T to yield very few optima on z ◦ T, no form of redundancy will be able to contribute anything to the problem. The best rule for matching a code T to a function z is that elements of S which tend to have similar fitness should be encoded by neighbouring binary sequences. However, in the case where the non-redundant code is not well matched to the problem, redundancy can be applied to great effect. Code T3, probably the worst possible choice for that problem, was raised to a performance level equal to T1 by the introduction of redundancy.

Figure 6.18: Optimality along the wing with T3. (a) The number of optima of z ◦ T3 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Redundancy can therefore help when the properties of z are unknown and one has no indication of whether the non-redundant code is well matched to the problem or not. Also, some fitness functions will be such that the properties of z are highly variable from one part of the chromosome to another. It will then be impossible for a non-redundant code to cope advantageously with all of them, leaving scope for redundancy to improve the situation at those points of the chromosome where the mismatch results in z ◦ T having large numbers of optima.

Chapter 7
Conclusion

7.1 Summary of the approach and main contributions

Although a lot is known about the biochemical details of the implementation of the genetic code, its origins and evolution remain poorly understood. In particular, very little is known about whether the assignment of amino acids to triplets is arbitrary or whether it was selected because of beneficial properties for the evolutionary process. Part of this ignorance is due to the persisting image of the frozen code suggested by Crick in 1968. Not enough attention has been paid to the fact that the universal genetic code is not in fact universal and that some of its variants are relatively recent (Osawa et al., 1992). Such findings suggest that some changes might outlive others because of their positive consequences for evolution. This thesis argues that changing the pattern of redundancy, while keeping the set of possible amino acids constant, can have such positive consequences.

Questions concerning the impact of codes on evolution can also be asked in the context of genetic algorithms. Indeed, in many GA applications, associations similar to the genetic code are defined between binary sequences and the possible values of a variable involved in the definition of a candidate solution. The expectation was that, for these codes as for the genetic code, carefully chosen patterns of redundancy could improve the performance of the GA. The two questions can be addressed simultaneously, since a GA is a valid tool for simulating the natural phenomenon of mutation and selection.
Chapter 3 defined a possible formal language in which these questions could be phrased unambiguously. The desired features of the language were that

• it would dissociate redundancy from the power of expression of the code, so that the definition of redundancy would not be tied down to the semantics of the code;

• it could characterise with precision a large number of possible forms of redundancy.

The result was a description of patterns of redundancy by means of permutations defined over a set of 2^n elements, where 2^n is the number of sequences used by the code. As a result of the first feature, a pattern of redundancy can be introduced into any problem where 2^n symbols are coded in binary form. (Following the terminology of Chapter 3, the word symbol is used here to designate those things which are represented by the code, be they amino acids or any building block used by the GA.) It is therefore possible to compare the effect of a pattern of redundancy across different problems.

Ideally, each of these patterns of redundancy would be assessed on its ability to improve the performance of a GA on some test problems. This, however, is too long a process given the large number of possible patterns. Instead we proposed a shortcut for discovering promising redundancy patterns based on their ability to suppress local optima. We talk of a local optimum when a symbol of the code (in the sense defined above) has a higher fitness than all other symbols that can be reached from it by point mutation. It is therefore only possible to count the number of local optima when we have a ranking of the symbols by fitness. Such a ranking can be seen as resulting from a partial fitness function obtained by trying out all possible symbols at an arbitrary position in an arbitrary genotype. At a given position in a protein, for instance, each amino acid can be ranked according to how well the protein performs its function with that amino acid at that position. We assumed that, averaged over all genetic contexts, all rankings of the symbols are equally likely to arise. We therefore average the number of optima obtained over a large number of such rankings. For a permutation σ, we obtain a number Rσ which represents the average proportion of local optima suppressed by the introduction of σ, compared with the case where no redundancy exists.

In Chapter 4, we performed a large-scale study of the patterns of redundancy so defined. The aim was to assess the variation in Rσ values across the whole set of permutations and to look for the features of a permutation leading to a high value of Rσ. All permutations were found to have an overall positive effect on the number of optima, the best ones reducing the numbers by more than 30%. Two main features were identified as leading to large values of Rσ. The first is the number of neutral mutations induced by the pattern of redundancy; this number is also the number of invariant elements of the permutation that defines the pattern. It was found that a number of neutral mutations around half the maximum possible number leads to the highest expected value of Rσ, and the best permutations were indeed found to have this property. The second feature has more explanatory power but is more difficult to identify in a permutation. It relies on classifying each of the possible pairs of symbols into four classes according to the relative ease with which mutation can transform one into the other.
A linear combination of the numbers found in each class correlates highly with Rσ. We conclude that good patterns of redundancy do indeed work by creating more routes which can be used by mutation to change one symbol into another.

In Chapter 5, we checked how well Rσ predicted the outcome of a genuine evolutionary trial. Some permutations were chosen covering the entire range of Rσ values, and the resulting redundancy patterns were included in the code used by a GA running on an NK fitness landscape. We found that Rσ predicted the performance of the GA: the higher the value of Rσ, the better the GA performed when σ was used to define redundancy. The best permutation speeded up evolution by a factor of four, twice as much as the increase in speed obtained by using recombination. The best pattern of redundancy was tried on two other problems, where it was found to speed up optimisation by a factor of two.

In Chapter 6, the best redundancy pattern was included in the code of a GA optimising the design of an aeroplane wing. The gain in speed on this problem was very marginal. However, we were able to show that the gain depends on the definition of the non-redundant part of the code. In this problem, the symbols of the code are real numbers; substitution of one real number by a close one is therefore likely to result in a smaller change in fitness than substitution by a very different one. This violates the assumption, made when determining the best pattern of redundancy, that substitutions of one symbol by another are all likely to cause the same change in fitness. It could therefore be the case that better patterns of redundancy were missed because the procedure used in Chapter 4 does not reflect the conditions of this problem. In fact, we were able to show that this is not the case. Rather, the problem is such that when a non-redundant code is built according to the principles of a Gray code, there are no local optima of the kind that redundancy can suppress. No form of redundancy could therefore improve the GA in that case. If, on the other hand, a sub-optimal non-redundant code is used, the best pattern of redundancy can compensate for that choice and improve the performance of the GA to the level obtained with a Gray code.

7.2 Conclusions for GAs

There is no easy way to tell whether a code can be improved by introducing redundancy. When the symbols are real numbers, a non-redundant code built on the principles of Gray coding will probably introduce few local optima of the kind that redundancy can eliminate. If, however, the partial fitness functions obtained by varying one such real number at a time are not perfectly smooth, a Gray code will still have some scope to be improved by redundancy. Whether this is the case or not can be checked experimentally by picking random individuals in the evolving population and trying all possible binary combinations at the positions that define one such real number. In problems where the encoded symbols stand in no obvious distance relation to each other, redundancy is very likely to be a useful addition to the code, as in the last two problems of Chapter 5.

The assessment of redundancy was always done on the basis of an optimal mutation rate. That is, we compared the performance of the GA operating with and without redundancy at the optimal mutation rate in both cases. This is, we think, the fairest form of comparison.
But it means that the gains will only be obtained if one takes the time to try several mutation rates. If one operates at an arbitrary mutation rate, redundancy might, in many cases, bring no benefit. But since it never degrades the performance of the GA provided that the best mutation rate is used, it can be considered a safe bet to include it.

If redundancy does improve evolvability by increasing the number of transitions possible from any given symbol, we should expect another modification of the GA to have roughly the same effect. Consider for instance the problem described in Section 5.3. The non-redundant code T used in that problem meant that mutation could only change A into B, C or E. Transition from A to any other symbol required that several mutations take place simultaneously in the triplet of bits that defines A, an unlikely event at the kind of mutation rates at which a GA operates. Instead of adding redundancy, we can redefine the mutation operator in the following way. Abandoning the binary representation of the symbols A, B, C, D, E, F, G, H, we define mutation in such a way that any of these symbols can mutate into any other with equal probability. In that case, none of the symbols can ever be a local (non-global) optimum, since a transition to the best symbol at any locus is always possible with probability pmut/8.

The elimination of all local optima in this way is not, however, without cost. We can characterise this cost by imagining the consequences of using the same procedure on entire chromosomes instead of on blocks defined by a small number of bits. Redefining mutation in such a way means that the offspring of a chromosome is equally likely to be any other chromosome in the search space; the GA is then effectively performing a random search. The reason for constraining an offspring to look genetically like its parent is that we assume that choosing a new solution in the neighbourhood of a good one is more likely to be successful than picking one totally at random: we expect the quality of the parent to be somehow heritable by the offspring. Similarly, at the symbol level, we might want to enlarge the set of possible transitions without completely destroying the underlying distance that pre-existed between those symbols. Redundancy, in the way defined in this thesis, allows us to do just that.

Both the redefinition of mutation just outlined and redundancy can probably be seen as instances of a more general definition of mutation through the following matrix:

    ( P(A→A)  P(A→B)  P(A→C)  P(A→D)  P(A→E)  P(A→F)  P(A→G)  P(A→H) )
    ( P(B→A)  P(B→B)  P(B→C)  P(B→D)  P(B→E)  P(B→F)  P(B→G)  P(B→H) )
    (   ...      ...     ...     ...     ...     ...     ...     ... )
    ( P(H→A)  P(H→B)  P(H→C)  P(H→D)  P(H→E)  P(H→F)  P(H→G)  P(H→H) )

where, for instance, 1 − P(A→A) is the probability of mutation of the A symbol and P(A→E) is the probability that it mutates into an E. The numbers on each row must add up to 1. This matrix can express any situation where all transitions between symbols are possible, but not necessarily with the same probabilities.
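In code, this generalised mutation operator amounts to sampling from one row of a stochastic matrix. The Python sketch below is our illustration, not an implementation used in the thesis; it also shows how the uniform redefinition discussed above appears as one particular choice of row.

    # A sketch of the generalised mutation operator (ours, illustrative only):
    # each row of P gives one symbol's transition probabilities and sums to 1.
    import random

    SYMBOLS = list('ABCDEFGH')

    def mutate(symbol, P):
        """Draw the successor of `symbol` from its row of the matrix P."""
        row = P[symbol]                  # e.g. {'A': 0.9125, 'B': 0.0125, ...}
        assert abs(sum(row.values()) - 1.0) < 1e-9
        r, acc = random.random(), 0.0
        for s in SYMBOLS:
            acc += row[s]
            if r < acc:
                return s
        return SYMBOLS[-1]               # guard against rounding error

    # The uniform redefinition discussed above, for symbol A: every symbol is
    # reached with probability p_mut / 8, the remaining mass staying on A.
    p_mut = 0.1
    row_A = {s: p_mut / 8 for s in SYMBOLS}
    row_A['A'] = 1.0 - p_mut + p_mut / 8
    print(mutate('A', {'A': row_A}))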
For any given problem, one could try to find the coefficients of this matrix that would optimise the performance of the GA. However, the extra time spent in doing this would have to be weighed against the gains in performance.

7.2.1 Further lines of research

The previous discussion suggests important lines of research. It would be interesting to see whether the patterns of redundancy defined in this thesis can indeed be summarised in a matrix as outlined above. That is, whether for any permutation a matrix can be defined which causes the GA to behave as if redundancy had been added. If that were the case, then the study of redundancy could be incorporated into this more general framework.

As for redundancy itself, we need a better understanding of the features which cause a pattern of permutation to improve codes. When the value of n is greater than 3, exhaustive search for the best patterns is not possible. We therefore need practical ways of finding the patterns with high Rσ values. The number of invariant elements of a permutation gives some indication of its Rσ value, but additional criteria are needed in order to find the very best patterns. Some work could also be done exploring the consequences of adding more than one bit of redundancy along the lines defined in Section 3.3. From a practical point of view, this is probably only interesting for codes defined on more than 4 bits.

7.3 Conclusions for the genetic code

7.3.1 How optimal is the redundancy of the code?

The experiments of Chapter 5 and Chapter 6 lead us to believe that changes in the redundancy of the genetic code have significant consequences for the evolutionary process. But has selection been able to take advantage of this fact? One indication that it has would be to find that the redundancy of the code matches those patterns which have been found to be optimal.

We saw in Chapter 2 that some of the redundancy found in the genetic code is a necessary consequence of the way tRNAs bind with mRNAs; the so-called wobble rules make it impossible for more than two amino acids to be specified by four codons which differ only at the third position. When two amino acids are indeed specified by such codons, it is almost always the case that XYU and XYC will code for one amino acid and XYA and XYG for the other. As was argued in Section 2.1.6, the redundancy resulting from these rules has not been the object of any selection. It is therefore best left out of our discussion here. One way of leaving it out of further considerations is to assimilate XYU and XYC to a single point of our conceptual sequence space, which we denote XY^U_C, and to do the same for XYA and XYG, which we call XY^A_G. Having done that, we are left with a sequence space containing 32 points. Either XY^A_G and XY^U_C have the same meaning, such as CC^A_G and CC^U_C which both code for proline, or their meanings differ, such as UU^A_G which codes for leucine and UU^U_C which codes for phenylalanine. This is comparable to the situation in Chapter 3, where the transition from a sequence in C0 to one in C1 by mutation of the redundancy bit could either be neutral or not. It therefore makes sense to think of sequences XY^A_G as being in C0 and sequences XY^U_C as being in C1, or vice versa. We can now see how the language used in this thesis to define redundancy can, to some extent, be applied to the genetic code.

We saw that in an early version of the code, the third base was probably never relevant, i.e. XY^U_C would always have been synonymous with XY^A_G. In the analogy defined above, this matches perfectly the pattern of redundancy associated with the identity permutation, since all transitions from C0 to C1 are neutral. This would have been an easy starting point for selection since, as we showed, this pattern of redundancy is the worst possible one. Any mutant version of the code would therefore have been favoured. But can we decide whether the present version is optimal?

It was shown in Chapter 4 that in the best patterns of redundancy about half of the total transitions between C0 and C1 are neutral. This feature is robust with respect to the size of the code, as was shown in Figure 4.8. To assess the situation in the code, we count the number of values of XY for which XY^U_C is synonymous with XY^A_G. Out of a possible sixteen, exactly eight fall into that category. Redundancy in the code thus fulfils this criterion for optimality perfectly.
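This count can be reproduced directly from the standard codon table. In the sketch below (Python), a block XY is counted as synonymous when its four codons all encode the same amino acid, which, once U/C and A/G are merged under the wobble rules, is equivalent to XY^U_C and XY^A_G having the same meaning:

```python
# Standard genetic code, one letter per amino acid, '*' for stop codons.
RAW = """
UUU F UUC F UUA L UUG L   UCU S UCC S UCA S UCG S
UAU Y UAC Y UAA * UAG *   UGU C UGC C UGA * UGG W
CUU L CUC L CUA L CUG L   CCU P CCC P CCA P CCG P
CAU H CAC H CAA Q CAG Q   CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M   ACU T ACC T ACA T ACG T
AAU N AAC N AAA K AAG K   AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V   GCU A GCC A GCA A GCG A
GAU D GAC D GAA E GAG E   GGU G GGC G GGA G GGG G
"""
fields = RAW.split()
CODE = dict(zip(fields[::2], fields[1::2]))  # codon -> amino acid

BASES = "UCAG"
blocks = [x + y for x in BASES for y in BASES]
# XY^U_C and XY^A_G are synonymous exactly when all four third positions
# agree (blocks like AU, where AUA and AUG already disagree, fail both tests).
synonymous = [xy for xy in blocks
              if len({CODE[xy + b] for b in BASES}) == 1]
print(len(synonymous), synonymous)   # 8 out of 16
```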
7.3.2 Limitations of the model

This fit is very encouraging, but it must be treated with some caution given that some features of the code are not really mirrored in our model.

First, one of the assumptions of our model was that the symbols found in C0 are all different. For this assumption to hold in the code, it would have to be the case that, at the point where all codons starting in XY coded for the same amino acid, no redundancy existed in the use of the first two letters. In other words, the sixteen different instantiations of XY would have to lead to sixteen different amino acids. This was not the case, since AG(X) would have coded for either serine or arginine, which were also represented by UC(X) and CG(X) respectively.

Secondly, we know that if there ever was a proto-code capable of encoding a maximum of sixteen amino acids, and in which the third base did not matter, the changes leading to the quasi-universal genetic code must have included the introduction of new amino acids. This situation is not captured by our model. It would not make much sense to include it, because this is a different effect altogether whose consequences would have to be analysed independently. Unfortunately, the two effects are bound to be difficult to disentangle in the history of the code.

Thirdly, the permutations associated with good redundancy were identified on the assumption that any fitness ranking of the amino acids is as likely as any other. This is unlikely to be true, since amino acids have some measurable degree of similarity to each other. We consequently expect similar amino acids to appear nearby in any ranking.

7.3.3 Further lines of research

A different approach could address some of the limitations discussed above. We can examine whether the code is at a local optimum with respect to its pattern of redundancy. That is, we can compare the code with variants which differ by a minimal change in codon assignments. If the code turns out to be better than all, or almost all, of these minimal rearrangements, it is almost certain that selection is responsible for the code being at that local optimum. Assessing whether the code is better than a near variant could be done by comparing numbers of local optima, as was done in Chapter 3. In order to address the third of the limitations above, we could count these numbers by averaging not over all possible fitness rankings of the amino acids, but rather over rankings which are compatible with the chemical similarity of the amino acids.
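One ingredient of this proposal, drawing fitness rankings compatible with chemical similarity, can be sketched as follows (Python; the Kyte-Doolittle hydropathy scale and the Gaussian noise model are our own choices, offered only as one plausible instantiation):

```python
import random

# Kyte-Doolittle hydropathy, used here as one possible similarity scale;
# the choice of chemical similarity measure is left open in the text.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

def similarity_compatible_ranking(noise=1.0, rng=random):
    """Draw a fitness ranking in which chemically similar amino acids tend
    to appear nearby: sort on hydropathy perturbed by Gaussian noise.
    noise=0 gives the strict hydropathy order; a large noise approaches a
    uniform random ranking."""
    return sorted(HYDROPATHY, key=lambda aa: HYDROPATHY[aa] + rng.gauss(0, noise))

for _ in range(3):
    print("".join(similarity_compatible_ranking()))
```

Counts of local optima for the code and for each of its minimal rearrangements would then be averaged over many such rankings rather than over uniform ones.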
What would be gained in plausibility for the study of the genetic code by this approach would be lost in applicability to GAs.