The evolutionary consequences of redundancy in natural and artificial genetic codes

Guillaume Barreau

Submitted for the degree of D. Phil.
University of Sussex
May, 2002

Declaration

I hereby declare that this thesis has not been submitted, either in the same or different form, to this or any other university for a degree.

Signature:

Acknowledgements

I would like to thank Jean-Arcady Meyer and Phil Husbands for their guidance in the early stages of this project and for helping me to obtain financial support from the European Commission. I would like to express my gratitude to Inman Harvey and Phillip Jones for stimulating discussions and critical comments on this work, to Jason Noble and Oliver Sharpe for reading and improving this manuscript and to Stephen Eglen for his help with too many things to list here. Finally, thanks to Arantza Etxeberria, Cecile Fairhead, Jason Noble, Lisbeth Barreau, Margarita Sanchez, Olivier Colin, Paulo Costa, Rafael Perez y Perez, Rodric Hemming, Sarah Bourlat, Stephen Eglen, Teresa del Soldato and Valeria Judice for their encouragement and unwavering support along this sometimes difficult path.

Abstract

Whilst the existence of redundancy within the genetic code has been recognised for some time, the consequences of this redundancy for natural selection have not been granted any attention by theoretical biologists. We postulate an adaptive value to the pattern of redundancy found in the modern genetic code and argue that redundancy might also be beneficial to the performance of genetic algorithms when introduced at a similar level in their encodings. We define a formal framework in which some comparable patterns of redundancy can be modelled and studied. We show that these patterns of redundancy vary significantly in their effects and that the number of neutral mutations they induce is a relevant parameter in understanding this variation. We then quantify the impact of this form of redundancy on a genetic algorithm. Several optimisation problems are tried in which redundancy brings a substantial decrease in the number of generations needed to find a solution of a given quality. A problem is also presented where redundancy does not speed up the discovery of good solutions. A more detailed analysis is carried out of the factors responsible for this limitation. The consequences of these findings for genetic algorithms and for the evolution of the genetic code are discussed.

Contents

1 Introduction
  1.1 The evolution of the genetic code
  1.2 Codes and genetic algorithms
  1.3 Aims of the thesis
  1.4 Structure of this thesis

2 Codes and neutrality in biological and simulated evolution
  2.1 The genetic code
    2.1.1 From DNA to mRNA
    2.1.2 From mRNA to protein
    2.1.3 Redundancy and neutrality in the genetic code
    2.1.4 The wobble rules
    2.1.5 Which codes are possible given the wobble rules?
    2.1.6 The underlying causes of the wobble rules
  2.2 Evolution of the genetic code
    2.2.1 Frozen accident versus stereochemical theory
    2.2.2 The genetic code is not universal
    2.2.3 Adaptive forces shaping the genetic code
    2.2.4 An adaptive hypothesis for neutrality in the genetic code
  2.3 Codes in Genetic Algorithms
    2.3.1 Importance of the genotype to phenotype mapping for GAs
    2.3.2 Existing work
    2.3.3 Relevance of redundancy for GAs
  2.4 Neutrality in RNA evolution
    2.4.1 RNA folding
    2.4.2 Shape space
    2.4.3 Sequences folding into s and sequences compatible with s
    2.4.4 Connectivity of C(s)
    2.4.5 Modelling neutral networks with random graphs
    2.4.6 Random graphs compared to simulated neutral networks
    2.4.7 Population dynamics on neutral networks
    2.4.8 Perpetual innovation along a neutral network
    2.4.9 Critique of the random graph approach
  2.5 Conclusion

3 A formal framework for a comparative study of redundancy
  3.1 Requirements for a definition of redundancy
  3.2 Redundancy in a minimal form
    3.2.1 A possible definition
    3.2.2 Some examples
    3.2.3 The identity permutation
    3.2.4 Redundancy and neutrality
    3.2.5 Permutations as the expression of redundancy
    3.2.6 Redundancy in a graphical form
  3.3 The framework in a more general form
    3.3.1 The natural generalisation
    3.3.2 Other ways of generalising
  3.4 The criteria for assessing redundancy
    3.4.1 Assigning fitness to symbols
    3.4.2 How meaningful is the fitness of a 3 bit long sequence?
    3.4.3 Counting numbers of optima
    3.4.4 Dealing with neutral paths
    3.4.5 Comparing numbers of optima in a meaningful way
  3.5 Conclusion

4 A statistical analysis of redundancy patterns
  4.1 Aims of the chapter
  4.2 Some quantitative features of a permutation
    4.2.1 Number of invariant elements
    4.2.2 Number of orbits
    4.2.3 Sum of distances between a sequence and its image
    4.2.4 Connectivity between pairs of signification
  4.3 Best and worst permutations when n is equal to 3
    4.3.1 Some considerations of size
    4.3.2 Description of the data
    4.3.3 Any redundancy is better than none
    4.3.4 Differences between Dσ and Rσ
    4.3.5 The proportions of adverse cases
    4.3.6 Trends in the other variables
  4.4 The incidence of Inv on Rσ
    4.4.1 Codes defined on sequences of length 3
    4.4.2 Codes defined on sequences longer than 3
  4.5 The incidence of other variables on Rσ
    4.5.1 The relation with Conn0
    4.5.2 The relation with Conn1, Conn2 and Conn3
    4.5.3 The relation with SumDist
  4.6 Parallels with a quaternary alphabet
  4.7 Conclusion

5 Redundancy on trial in evolution
  5.1 The Genetic Algorithm
  5.2 First test problem: a case of no epistasis
    5.2.1 The problem
    5.2.2 Introducing redundancy
    5.2.3 Experimental procedure
    5.2.4 Results
  5.3 Second test problem: selection for a periodical chromosome
    5.3.1 The problem
    5.3.2 Adding redundancy
    5.3.3 Results
  5.4 Third test problem: finding a compact non-overlapping path on a grid
    5.4.1 The problem
    5.4.2 Adding redundancy
    5.4.3 Results
  5.5 Conclusion

6 Some limitations to the benefits of redundancy
  6.1 Application of redundancy to the design of a wing box
    6.1.1 The problem and the original encoding
    6.1.2 Modifying the encoding
    6.1.3 Introducing redundancy
    6.1.4 Results
  6.2 Comparing redundancy on three non-redundant codes
    6.2.1 Definition
    6.2.2 Results
  6.3 Why does T1 perform better than T3?
    6.3.1 Non-redundant codes and partial fitness functions
    6.3.2 Counting numbers of optima
    6.3.3 Results and Discussion
  6.4 Conclusion

7 Conclusion
  7.1 Summary of the approach and main contributions
  7.2 Conclusions for GAs
    7.2.1 Further lines of research
  7.3 Conclusions for the genetic code
    7.3.1 How optimal is the redundancy of the code?
    7.3.2 Limitations of the model
    7.3.3 Further lines of research

List of Figures

1.1 A DNA molecule and its replication.
2.1 The base sequence and secondary structure of a tRNA.
2.2 A schematic view of the translation of a mRNA into proteins.
2.3 The evolution of the genetic code in non-mitochondrial genomes.
2.4 The evolution of the genetic code in mitochondrial genomes.
2.5 Mutation and the travelling salesman problem.
2.6 Two possible redefinitions of mutation compatible with set partitioning problems.
2.7 Two RNA molecules with the same secondary structure.
3.1 A spatial representation of 3 bit long sequences.
3.2 A spatial representation of 4 bit long sequences.
3.3 A spatial representation of permutation [13465207].
3.4 A spatial representation of permutation [13025746].
3.5 A spatial representation of 6 bit long sequences.
3.6 Defining the meanings of Cd1d2d3 with seven permutations.
3.7 Defining the meanings of Cd1d2d3 with three permutations.
3.8 Defining the meanings of Cd1d2d3 with three commutative permutations.
3.9 Counting local optima with and without redundancy.
3.10 Counting local optima in the presence of neutral paths.
4.1 A connection of type 0 between a pair of symbols.
4.2 A connection of type 1 between a pair of symbols.
4.3 Possible connections of type 2 between pairs of symbols.
4.4 Possible connections of type 3 between pairs of symbols.
4.5 Permutations which never increase the number of optima.
4.6 Rσ as a function of Inv when n equals 3.
4.7 R<Inv> as a function of Inv when n equals 3.
4.8 Rσ as a function of Inv when n is greater than 3.
4.9 R<Inv> as a function of Inv when n is greater than 3.
4.10 Rσ as a function of other variables when n equals 3.
4.11 Rσ as a function of Inv when Conn0 is fixed.
4.12 Rσ as a function of the best linear combination of Conn1, Conn2 and Conn3.
4.13 Rσ as a function of SumDist when Inv is fixed.
4.14 A representation of sequence distances in the case of a quaternary alphabet.
5.1 The probability of a slot being picked in a selection tournament.
5.2 A redefinition of function f in NK fitness landscape terms.
5.3 First problem: the proportion of optimal blocks after 100 generations as a function of the mutation rate.
5.4 First problem: the proportion of optimal blocks after 400 generations as a function of the mutation rate.
5.5 First problem: the proportion of optimal blocks as a function of the number of generations.
5.6 First problem: comparing the speed of evolution with and without redundancy.
5.7 The value of f(i, j) as a function of j.
5.8 Second problem: transforming the encoding through permutation σ.
5.9 Second problem: transforming the encoding through permutation [07143562].
5.10 Second problem: the proportion of optimal blocks as a function of the mutation rate.
5.11 Second problem: comparing the speed of evolution with and without redundancy.
5.12 Possible moves from a cell of the grid.
5.13 Third problem: transforming the encoding through permutation [07143562].
5.14 Third problem: the proportion of optimal blocks as a function of the mutation rate.
5.15 Third problem: comparing the speed of evolution with and without redundancy.
6.1 The relevant elements of a wing.
6.2 The representation of the wing parameters on the chromosome.
6.3 Fitness after 200 generations as a function of the mutation rate.
6.4 Comparing the speed of evolution with and without redundancy.
6.5 The non-redundant code T2.
6.6 The non-redundant code T1.
6.7 The non-redundant code T3.
6.8 Fitness after 200 generations as a function of the mutation rate with code T1.
6.9 Fitness after 200 generations as a function of the mutation rate with code T3.
6.10 Comparing the speed of evolution with and without redundancy when T3 is used.
6.11 Comparing non-redundant codes T1, T2 and T3 without redundancy. Error bars indicate the standard error.
6.12 Comparing non-redundant codes T1, T2 and T3 with redundancy.
6.13 Possible variations of fitness when changing the thickness of a single panel.
6.14 The average number of optima as a function of the number of generations.
6.15 The proportion of blocks found at a local optimum as a function of the number of generations.
6.16 Optimality along the wing with non-redundant code T1.
6.17 Optimality along the wing with non-redundant code T2.
6.18 Optimality along the wing with non-redundant code T3.

List of Tables

2.1 The universal genetic code.
2.2 The wobble rules for the universal genetic code.
4.1 The 10 best and worst permutations when n equals 3.
4.2 Summary of the correlations between all variables.
4.3 A summary of the differences between a binary and a quaternary alphabet.
5.1 The relationship between the number of blocks optimised after 400 generations and the value of Rσ.

Chapter 1

Introduction

1.1 The evolution of the genetic code

One hundred and forty years after it was proposed by Darwin, the theory of evolution through random variation and natural selection is more firmly established than ever. The discovery, in the second half of this century, of the molecular basis of heredity was an important consolidation of the theory. In 1953, Watson and Crick elucidated the chemical structure of deoxyribonucleic acid (DNA), revealing a formidable potential for information storage within every living cell and a simple mechanism by which this information could be replicated quickly and very accurately at every cell division.

Briefly, Watson and Crick found that DNA is a molecule made of two strands, each of which consists of a backbone to which chemical bases are attached at regular intervals. These bases come in four varieties: adenine (A), cytosine (C), guanine (G) and thymine (T). The two strands are held together by bonds between opposite bases as shown in Figure 1.1. However, only two types of bonds are possible: A-T and C-G. Since A always pairs with T, T with A, C with G and G with C, the base sequence on one strand is entirely constrained by the sequence found on the other. This complementarity provides a simple mechanism for replicating the molecule: when the two strands are separated, each can be used as a template for the manufacture of a new complementary strand as shown in Figure 1.1. The end result is two molecules identical to the original.

Thirteen additional years were needed to discover how the message stored in the DNA is interpreted by the cell to control the synthesis of all its proteins. At the heart of this process, molecular biologists found what is known as the genetic code, a dictionary specifying which of 20 possible amino acids corresponds to each of the possible combinations of three bases. The genetic code (shown in Table 2.1) specifies, for instance, that triplet TTG stands for the amino acid leucine. Using this dictionary, some sequences of bases along the DNA strand can ultimately be decoded as a sequence of amino acids otherwise known as a protein.
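The complementarity and decoding steps just described are easy to state programmatically. The following Python sketch is illustrative only: the function names and the example sequence are ours, and the dictionary holds just a few of the codon assignments from Table 2.1 (written with T rather than U since we start from DNA).

    # A toy model of complementarity and decoding. Only a handful of
    # codon assignments are included; the full dictionary has 64 entries.
    COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    CODON_TABLE = {
        'TTG': 'leu',                                   # the example above
        'GCT': 'ala', 'GCC': 'ala', 'GCA': 'ala', 'GCG': 'ala',
    }

    def complement(strand: str) -> str:
        """Return the complementary strand (A-T and C-G pairing)."""
        return ''.join(COMPLEMENT[base] for base in strand)

    def translate(strand: str) -> list:
        """Decode a base sequence three bases at a time."""
        codons = [strand[i:i + 3] for i in range(0, len(strand) - 2, 3)]
        return [CODON_TABLE.get(codon, '?') for codon in codons]

    strand = 'TTGGCAGCC'
    print(complement(strand))   # AACCGTCGG
    print(translate(strand))    # ['leu', 'ala', 'ala']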
It was at first thought that this code was universal (Crick, 1968), i.e. that all forms of life used the same dictionary when interpreting the genetic message in their DNA.

[Figure 1.1: A DNA molecule and its replication. The two strands that make up the molecule are progressively separated and, as this happens, the unpaired bases are matched by new complementary bases.]

But more diversified genetic studies showed that this is not exactly true, even though all variations documented so far are minor (see Osawa et al. (1992) for an exhaustive review of the known variants). This undeniable evidence of variation gave strength to the idea that the code itself was, at some point in the past, the object of natural selection and that its quasi-universal version is in some sense superior to possible alternatives.

But what makes one code better than another? We can answer this question first in a more familiar context. The Morse code uses different sequences of ‘–’ and ‘.’ to represent each letter of the alphabet. There is a substantial degree of arbitrariness to the way this assignment is made but at least one design consideration can be identified. Not all letters are coded with the same number of dashes and dots; common letters are assigned short sequences (such as ‘.’ for ‘E’) while rare letters are assigned long ones (‘– –..’ for ‘Z’). This results in shorter messages which are more economical to transmit.

We can think of another consideration which could, under certain circumstances, benefit the Morse code. Supposing that the likelihood of a ‘.’ being mistaken for a ‘–’ is high, we might want to reduce the probability of such errors going undetected, for they could lead to a misinterpretation of the message. In many English words, the transformation of an ‘i’ into an ‘a’ will lead to a different but valid word. The alteration of ‘fit’ into ‘fat’ is one instance where such a change could be the cause of some misunderstanding. On the other hand, a change of ‘i’ into a ‘z’ is unlikely to go unnoticed. It would therefore be sensible to choose for ‘i’ a sequence which is more likely to be corrupted into the sequence for ‘z’ or ‘h’ than into the sequence for ‘a’.

Interestingly, both considerations are relevant to the genetic code. The first consideration is similar to the hypothesis that amino acids which are frequently used in proteins should have more triplets of bases representing them in the code than amino acids which are rare. By this token methionine, which has a single triplet representing it, should be found in proteins in a much lower proportion than leucine which has six. This idea has been around for a while but it seems to have been disproved (Maynard Smith and Szathmary, 1995). The second consideration finds its counterpart in the suggestion that triplets which can easily be mutated into one another, such as GCA and GCC, should code for amino acids with similar chemical properties. The benefit is that a mutation that causes a change of amino acid will be less damaging for the protein because, if the new amino acid resembles the old one, the protein is likely to still be able to perform its function. Sonneborn (1965) first proposed this idea and, although controversial for a long time, it seems to have been established more recently by two publications (Haig and Hurst, 1991; Di Giulio, 1989).
Notice that the property is analogous to, but in a sense the opposite of, the one suggested for the Morse code. In one case corruption of the message should lead to unnoticeable change while in the other it should lead to an easily detectable change.

This thesis will focus on a different feature. There is a large amount of redundancy in the genetic code: many triplets stand for the same amino acid since there are 64 possible triplets but only 20 amino acids to be encoded. This thesis aims to determine whether the way this redundancy is distributed could have had an impact on the ability of evolution to discover better proteins and, more generally, better-adapted life forms.

It is important to understand that selection for a better genetic code is quite different from, say, selection for a better eye design, to take a classic example of complex adaptation achieved by evolution. Enhanced vision will grant an individual some reproductive advantage; a distinct genetic code, in the sense which interests us here, will not. If the genetic code of an organism is suddenly changed, the result is bound to be catastrophic for the organism since newly synthesised proteins will be altered wherever the protein was encoded by a triplet whose meaning has changed. The comparison of interest from our point of view is between organisms which produce exactly the same proteins but use a different code to represent these proteins in their DNA. Under these conditions, both organisms should have the same fitness. However, they will differ in the offspring they produce since the changes caused by mutation will be determined by the code they use. The consequences of such differences can only be appreciated after the passing of some time.

The evolution of the genetic code could be described as a meta-evolutionary problem, similar in this respect to the evolution of sex (Maynard Smith, 1978; Michod and Levin, 1987). In both cases, the feature that is evolving does not directly affect an individual’s fitness but impinges on the relationship that exists between an individual and its offspring. As such, over the course of many generations, it can have significant cumulative effects. A mutation that causes an individual to revert to the asexual mode of reproduction, for instance, does not necessarily affect the ability of this individual to survive and reproduce. But it can be shown that hundreds of generations later, the average fitness of its descendants is much lower than if that mutation had not taken place (Peck et al., 1997).

1.2 Codes and genetic algorithms

So well established is Darwin’s idea that it has been turned into a general purpose problem solving strategy. Algorithms implementing the principles of random variation and selection have been used to design artificial neural networks (Beer and Gallagher, 1991; Boers and Kuiper, 1992), optimise schedules and timetables (Levine, 1996; Monfroglio, 1996; Sridhar and Rajendran, 1996) and predict the tertiary structure of proteins (Schulze-Kremer, 1992). These algorithms are commonly known as genetic algorithms (GAs).

When such algorithms are used to solve problems, two distinct descriptions are needed of candidate solution objects. The first and most natural one describes the object in functional terms and is used to determine how well it solves the problem at hand. The second is used by the genetic operators of mutation and crossover to produce variants of those individuals in the population which are chosen for reproduction.
This duality mirrors the phenotype/genotype distinction so essential to our understanding of life, with the important difference that in GAs this relationship is imposed from outside by the designer of the algorithm. Increasingly, it has been realised that this freedom of choice should be exercised with care since for hard problems the choice of a suitable mapping between genotype and phenotype can critically influence the performance of the algorithm. According to Mitchell (1996), the way in which candidate solutions are encoded is a central, if not the central, factor in the success of a genetic algorithm.

Ideally, a good encoding should allow genotypes representing satisfactory solutions to be reached with high probability from any randomly generated population. But for hard problems, any realistic encoding will result in some genotypes of less than optimal fitness acting as attractors for the population and delaying further evolution for very long periods of time. A good encoding is one for which the fitness of such genotypes is as high as possible.

The choice of an encoding encompasses many issues: the shape of chromosomes (linear or tree-shaped as in Genetic Programming (Koza, 1992)), their length, how to distribute information on them, and many others which can only be discussed in the context of a specific problem. In the case of neural networks, for instance, one must explicitly specify how the connectivity of the network is going to be expressed in a linear form on the chromosome. All these questions deserve thorough investigation. In this thesis, however, we will focus on the aspect of the genotype to phenotype mapping which most resembles the genetic code.

In many GA applications, the first level of interpretation of the genotype is done by parsing the binary string into blocks of predefined size; every such block defines one parameter or variable of the solution and all blocks taken together should give enough information to allow a solution to be evaluated. The size of a block is determined by the number of possible values we want the variable to take; a block of size n allows the encoding of up to 2^n values. The mapping between those 2^n binary values and the possible values of the variable is equivalent to the genetic code in the sense that it is the lowest layer of interpretation of the genotype. Redundancy as we find it in the genetic code can easily be introduced in such mappings.
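The parsing-and-lookup layer just described is simple to sketch. The Python fragment below is an arbitrary illustration of a redundant block mapping, not one of the redundancy patterns studied in later chapters; the block size, meanings and genotype are invented for the example.

    # Decode a binary genotype by parsing it into blocks of size n and
    # mapping each block through a many-to-one table. Eight block values
    # but only four distinct meanings, so some blocks are synonyms
    # (compare the 64 codons and 21 meanings of the genetic code).
    BLOCK_SIZE = 3                      # n = 3, hence 2**3 = 8 block values
    MEANINGS = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

    def decode(genotype: str) -> list:
        """Map each block of BLOCK_SIZE bits to its meaning."""
        blocks = [genotype[i:i + BLOCK_SIZE]
                  for i in range(0, len(genotype), BLOCK_SIZE)]
        return [MEANINGS[int(block, 2)] for block in blocks]

    print(decode('000100111'))          # ['a', 'a', 'd']: 000 and 100 are synonyms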
Furthermore, if redundancy helps evolution in the genetic code, we expect the same kind of redundancy to also improve the performance of a GA. In fact, the benefits that can be expected for a GA should be at least as large as those that might be brought about by redundancy in the genetic code. In the code, redundancy would have had to impose itself by selection. But as will be explained in the next chapter, the ability of the code to change is limited. Besides, the benefits of redundancy might only be felt in the long term, which makes the task of selection difficult. The situation is quite different from a GA point of view. There, codes are imposed by the designer rather than depending on historical contingencies and evolution; nothing prevents the designer from including in the code a property known to be beneficial. The distinction between redundancy in the genetic code and redundancy in GAs will be blurred anyway since GAs will be used as our model of biological evolution.

In Chapter 5 and Chapter 6, several patterns of redundancy will be introduced in the underlying code of GAs designed to solve different problems. These experiments will measure the practical benefits of redundancy for GAs as well as provide an indication of the likelihood of redundancy having been selected for in the genetic code.

1.3 Aims of the thesis

The aims of this thesis are the following:

• To define a theoretical framework within which the impact of redundancy on natural selection and GAs can be investigated clearly.

• To determine, using this framework, whether some forms of redundancy can speed up adaptation when incorporated in the genetic code. The underlying suggestion is that life forms using such a code would have been selected for their greater ability to adapt.

• To establish the conditions under which such beneficial forms of redundancy can be successfully incorporated into a GA.

The first two objectives address issues which are relevant both to the genetic code and to GAs. The third one is purely a GA issue.

1.4 Structure of this thesis

Chapter 2 deals with some rudiments of molecular biology necessary to understand the genetic code and the problems associated with its evolution. Some of the most important theories about its evolution are then discussed prior to an intuitive formulation of why some redundancy in the genetic code could improve the ability of a lineage to adapt. The issue of redundancy is then discussed in the context of codes for GAs, where it has received no attention at all. Finally, we review some work in the field of RNA evolution where neutrality, a possible consequence of redundancy, has been discussed in a way that is relevant to this thesis.

Chapter 3 defines the formal framework within which our study of redundancy takes place. A set containing all possible patterns of redundancy is defined as well as a scalar measure which equates good redundancy with an ability to statistically suppress local, non-global, optima.

Chapter 4 is an exhaustive study of all patterns of redundancy in the case of codes of low dimension. The aim is to identify the features of a pattern which are responsible for suppressing local optima.

In Chapter 5, a few selected patterns of redundancy are included in a GA and tested for their ability to improve its performance. It is shown, on three distinct problems, that a pattern of redundancy which scores high on the measure of Chapter 3 diminishes significantly the time needed by the GA to find solutions of a given fitness.

In Chapter 6, redundancy is added to a GA whose task is to optimise the design of an aeroplane’s wing. It is shown that redundancy is not necessarily beneficial in this case, which leads to important observations about the conditions under which redundancy will or will not speed up evolution.

Chapter 7 summarises the approach and the main contributions of this thesis. It draws conclusions on the validity of the approach for genetic algorithms. It reexamines the question of the origins of redundancy in the genetic code in the light of our findings. Further lines of research are outlined.

Chapter 2

Codes and neutrality in biological and simulated evolution

Section 2.1 is a brief overview of the molecular biology of the gene. This subject is associated with a vast literature and a thriving research effort; considerations of space preclude an exhaustive summary. For further information, the interested reader is referred to such comprehensive accounts as Watson et al. (1987).
Fortunately, the facts relevant to the current argument concerning the genetic code have been established for some time and are no longer the object of controversy.

Section 2.2 deals with more controversial and speculative ideas concerning the origins and evolution of the genetic code. The theme has fascinated theoreticians from the early days of the discovery of the code but a comprehensive picture of the process is still to come. A discussion of the main theories will set the stage for the suggestion that selection for redundancy has been responsible for some of the changes that have taken place in the code.

Section 2.3 explains why this research is relevant to the field of genetic algorithms. We show its place as one aspect of the much larger question of the genetic representation of candidate solutions. Finally, the issue of neutrality in RNA evolution is discussed in Section 2.4. Researchers in that field have expressed an interest in neutrality for some time and we will examine in detail their theoretical approach to the problem.

2.1 The genetic code

2.1.1 From DNA to mRNA

Proteins are not obtained directly from DNA. Instead, in a process known as transcription, DNA is used as a template for the synthesis of a very similar single stranded molecule called ribonucleic acid or RNA. RNA is, like DNA, made of a backbone to which chemical bases attach sequentially. When a section of DNA is transcribed, the RNA molecule that is produced has exactly the same sequence of bases as the template except that the base uracil (U) replaces every occurrence of the base thymine (T) on the DNA.

Transcription happens in the following way. At one or more stages in the cell cycle, an enzyme called RNA polymerase attaches to the DNA molecule and separates both strands over a section of 17 bases. One of the strands is then used as a template onto which complementary nucleotides are attracted by DNA-like base pairing except that A pairs with U instead of T. The opened section of DNA moves along as the RNA molecule is assembled. The beginning and end of this process are controlled in a very precise manner, ensuring that the same mRNA molecules are produced every time.

Because transcription is the main point of control of gene expression, its activation and its rate are regulated by many products collectively known as transcription factors. These factors interfere with transcription in many different ways, either activating it or repressing it. This is a vast and fascinating topic but it is not central to our argument. The interested reader is referred to Gilbert (1994).

Several types of RNA exist which are all obtained by transcription. The majority are intermediary products in the synthesis of proteins and are called messenger RNA (mRNA). Other types exist which will be described in the next section. In eukaryotes, RNA molecules must migrate out of the nucleus into the cytoplasm for protein synthesis to take place.

2.1.2 From mRNA to protein

When the mRNA molecule has reached the cytoplasm, it can be used as a template to build a protein; this process is known as translation. A protein is a chain of amino acids of arbitrary length. The precise nature and order of the amino acids in the chain is what will be dictated by the mRNA. Three main elements are involved in translation: ribosomes, transfer RNAs and the mRNA itself. Ribosomes are roughly spherical particles on which the bond between adjacent amino acids forms.
These particles are very complex assemblies consisting of about one-third protein and two-thirds ribosomal RNA, a form of RNA dedicated to this function and not translated into protein.

Transfer RNAs, or tRNAs, are the main protagonists in the implementation of the genetic code since they mediate the inclusion of a specific amino acid in the protein conditionally on the identity of three bases found on the mRNA. A tRNA performs its function by associating on one end with a specific amino acid and by having three of its bases, known as the anticodon, capable of selectively associating with certain triplets on the mRNA (Figure 2.1). This selective association between codon and anticodon proceeds according to the standard rules of pairing (A with U and G with C) for two of the bases; for the third base of the codon, some special rules apply which are described in detail in Section 2.1.4.

[Figure 2.1: The base sequence and secondary structure of a tRNA. This tRNA will bind with an alanine amino acid on its 3’ end. The anticodon will bind to GCC whenever such a codon is ready to be read on the mRNA. This causes the alanine amino acid to be added to the protein that is being synthesised. The bars between bases show internal pairings of the molecule. The grey circles represent unusual bases obtained from A, C, G or U by chemical modification. One such base, inosine (I), is found at the first position of the anticodon. As explained in Section 2.1.4, it causes this tRNA to recognise codons GCA or GCU as well as GCC.]

Transfer RNAs, loaded with their amino acid, diffuse to the ribosome. The position of the ribosome on the mRNA defines a particular triplet of the mRNA as the one that is currently readable. When a tRNA capable of binding that triplet does so, it releases its amino acid onto the growing protein chain; the ribosomal unit and the protein are then shifted three bases further along the mRNA and the next codon is ready to be read. The ribosome unit has been compared to the head of a tape reader and the mRNA to a tape. In fact, the tape is read simultaneously by several tape units which are some distance apart from each other as shown in Figure 2.2.

[Figure 2.2: A schematic view of the translation of a mRNA into proteins. Ribosomes are moving from left to right. Each of them is at a different stage of the manufacture of the same protein. Adapted from Watson et al. (1987).]

The release of the growing chain of amino acids from the reading mechanism does not wait until the end of the mRNA is reached; it is triggered by the occurrence of some special codons (UAA, UAG and UGA) which are not bound by any tRNA but are recognised by proteins known as release factors. These proteins, as their name indicates, release the ribosome from the mRNA and liberate the completed protein into the cytoplasm.

It is conventional in molecular biology to refer to a base sequence from the 5’ end of the molecule to the 3’ end (these ends can be distinguished on RNA and DNA molecules because the backbone onto which the bases attach is not symmetrical). Codon and anticodon run in opposite directions when pairing together as shown in Figure 2.1. Hence, if both codon and anticodon are described using this convention, the first base of the codon pairs with the third base of the anticodon, the second with the second and the third with the first. Codon 5’– CGA – 3’ will pair with anticodon 5’– UCG – 3’ for instance. If we choose to always represent anticodons with the opposite convention (from 3’ to 5’), checking the compatibility of a codon and an anticodon is easier because we do not need to mentally invert the anticodon. This convention will be used in all that follows.
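The convenience of this convention can be seen in a small sketch. The following Python fragment is ours and purely illustrative; it implements only the standard pairing rules, leaving the third-position wobble rules to Section 2.1.4.

    # With the anticodon written 3' to 5', compatibility can be checked
    # position by position, without mentally reversing either sequence.
    PAIRS = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}

    def pairs_with(codon: str, anticodon_3_to_5: str) -> bool:
        """True if every codon base pairs with the aligned anticodon base."""
        return all(PAIRS[c] == a for c, a in zip(codon, anticodon_3_to_5))

    # Codon 5'-CGA-3' pairs with anticodon 5'-UCG-3', which is written
    # GCU in the 3' to 5' convention adopted here.
    print(pairs_with('CGA', 'GCU'))     # True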
For translation to be reliable, it is essential that a tRNA is always loaded with the same amino acid. If, for instance, the tRNA represented in Figure 2.1 is loaded with an amino acid other than alanine, some GCC codons will be misread, resulting in some proteins having an altered structure. The high reliability of the loading is made possible by the existence of enzymes, called aminoacyl-tRNA synthetases, whose function it is to recognise a tRNA and attach the correct amino acid to it.

The consistent translation of a codon into the correct amino acid is therefore the joint responsibility of the tRNAs and the aminoacyl-tRNA synthetases. Both these molecules have their structure specified somewhere in the DNA of the organism: transfer RNAs are obtained by transcription only while aminoacyl-tRNA synthetases are proteins and therefore obtained by transcription and translation. The potential therefore exists for these molecules to change by mutation: a tRNA could have its anticodon changed while still carrying the same amino acid, or an aminoacyl-tRNA synthetase could be altered and recognise a different type of tRNA. Either type of alteration would cause a change of meaning for some codon.

2.1.3 Redundancy and neutrality in the genetic code

The universal genetic code is the look-up table which summarises the meaning of all 64 possible triplets of bases as they are interpreted in translation by most living beings. It is pictured in Table 2.1. The first and third base define a line of the table while the second one defines a column. At the intersection of the two, the desired amino acid is found.

Table 2.1: The universal genetic code. The amino acids (and their abbreviations) are: phenylalanine (phe), serine (ser), tyrosine (tyr), cysteine (cys), leucine (leu), tryptophan (trp), proline (pro), histidine (his), arginine (arg), glutamine (gln), isoleucine (Ile), threonine (thr), asparagine (asn), lysine (lys), methionine (met), valine (val), alanine (ala), aspartic acid (asp), glycine (gly) and glutamic acid (glu).

    1st |   U     C     A     G    | 3rd
    ----+--------------------------+----
     U  |  phe   ser   tyr   cys   |  U
     U  |  phe   ser   tyr   cys   |  C
     U  |  leu   ser   stop  stop  |  A
     U  |  leu   ser   stop  trp   |  G
     C  |  leu   pro   his   arg   |  U
     C  |  leu   pro   his   arg   |  C
     C  |  leu   pro   gln   arg   |  A
     C  |  leu   pro   gln   arg   |  G
     A  |  Ile   thr   asn   ser   |  U
     A  |  Ile   thr   asn   ser   |  C
     A  |  Ile   thr   lys   arg   |  A
     A  |  met   thr   lys   arg   |  G
     G  |  val   ala   asp   gly   |  U
     G  |  val   ala   asp   gly   |  C
     G  |  val   ala   glu   gly   |  A
     G  |  val   ala   glu   gly   |  G

As there are sixty-four different codons but only twenty-one different meanings to be expressed (twenty amino acids and the stop signal), some codons must share the same meaning. The code is said to be redundant. Some amino acids, such as serine, leucine, and arginine, have up to six different triplets coding for them.
The smallest change that can take place in the genetic message is the alteration of one base into another, also called a point mutation. Such events are rare enough to render negligible the probability that more than one base is altered at a time in any given triplet. Furthermore, mutations are, in first approximation, equally likely to change any base into any other. It is therefore natural to think of two triplets that differ in only one of their three bases as neighbours. Every triplet has nine neighbours since three alterations are possible for each of the three bases. Consider for instance triplet ACG. Mutation of the first base can lead to CCG, GCG and UCG, which are found occupying the same position as ACG, at the bottom of the three other boxes of the same column. Mutation of the second base leads to AAG, AGG and AUG, which are found in the three other columns on the same line of the table. Mutations of the third base lead to ACA, ACC and ACU, which are found in the same box as ACG.

Mutations in the first base can very occasionally be neutral, as is the case with UUA ↔ CUA which both code for leucine. Mutations in the second base are never neutral with the exception of UAA ↔ UGA which are both stop codons. Mutations in the third base are very often neutral. Indeed, triplets with synonymous meanings are not randomly scattered in the table but tend to appear within the vertical boxes that demarcate a fixed choice of the first two bases and an arbitrary value for the third. For instance, codons in the bottom left corner all start with GU and code for valine. As a result, whenever a codon starts with GU, a mutation at the third base will be neutral. Family boxes, as these groups of synonymous codons are called, are found for codons starting with UC, CU, CC, CG, AC, GU, GC and GG. They make up half the total number of triplets.
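The neighbourhood structure just described is easy to enumerate. The sketch below is illustrative: its partial code dictionary contains only the valine family box and the two leucine codons mentioned above; with all 64 entries of Table 2.1 the same functions would report the full pattern of neutral point mutations.

    BASES = 'UCAG'
    # A deliberately partial excerpt of the genetic code.
    CODE = {'GUU': 'val', 'GUC': 'val', 'GUA': 'val', 'GUG': 'val',
            'UUA': 'leu', 'CUA': 'leu'}

    def neighbours(codon: str) -> list:
        """The nine triplets differing from `codon` at exactly one base."""
        return [codon[:i] + b + codon[i + 1:]
                for i in range(3) for b in BASES if b != codon[i]]

    def neutral_neighbours(codon: str) -> list:
        """Neighbours with the same meaning under the (partial) code."""
        meaning = CODE.get(codon)
        return [n for n in neighbours(codon)
                if meaning is not None and CODE.get(n) == meaning]

    print(len(neighbours('ACG')))       # 9
    print(neutral_neighbours('GUU'))    # ['GUC', 'GUA', 'GUG']
    print(neutral_neighbours('UUA'))    # ['CUA']: the rare first-base case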
2.1.4 The wobble rules

As the deciphering of the code was coming to an end, it became clear that there wasn’t a different tRNA molecule for every codon in the table; some tRNAs could pair with more than one type of codon. For instance, a tRNA for alanine was shown to respond well to GCU, GCC, GCA but little if at all to GCG (Nirenberg et al., 1966). This observation and others of the same kind prompted Crick (1966) to propose the wobble hypothesis. Crick suggested that codon-anticodon pairing at the first two positions of the codon obeys the traditional base pairing (G with C and U with A) but that pairing at the third position is less discriminative and follows a special set of rules displayed in Table 2.2 and known as the wobble rules. The reasons for this more fragile bond have since been explained in molecular terms (Pluhar, 1994).

Table 2.2: The wobble rules for the universal genetic code.

    First anticodon base   Compatible third codon base
            G                        U, C
            C                        G
            A                        U
            U                        A, G
            I                        A, U, C

This table shows that only A and C obey the traditional pairing. G, U and I (a base called inosine which is only found in tRNA) are not as discriminative and will cause a tRNA to accept several options as the third letter of a codon. In particular, since inosine at the first position of an anticodon pairs with A, U or C at the third position of the codon, this could account in part for the common pattern where the third letter of the codon is irrelevant. It was indeed shown that tRNAs with inosine at the first position of the anticodon exist for each of the eight family boxes. But this accounts only for three out of the four synonymous codons in a box. For a codon ending in G to have the same meaning as the three others, there must be another tRNA with a C at the first anticodon position, the same two letters at the other two positions of the anticodon, and carrying the same amino acid. If that tRNA carries a different amino acid, then a different meaning is possible for the codon ending in G, as is the case when AUG codes for methionine while AUU, AUC and AUA code for isoleucine.

The wobble rules also account for the common case where, within a box defined by a choice of the first two letters, the triplets ending in U and C have one meaning while those ending in A and G have a different one. Codons UUU and UUC, for instance, code for phenylalanine while UUA and UUG code for leucine. The wobble rules show that this pattern can be the result of a tRNA with anticodon AAG binding either UUU or UUC and another tRNA with anticodon AAU binding either UUA or UUG.

2.1.5 Which codes are possible given the wobble rules?

The wobble rules set some limitations to the power of discrimination of tRNAs at the third position of the codon. No base at the first anticodon position will pair exclusively with C, for instance. Bases I and G will pair with C but they also pair with U. Hence whenever an amino acid is associated with a triplet ending in C, the same amino acid is also associated with the triplet starting with the same two bases and ending in U. Similarly, no base is capable of pairing only with A at the third position. Either A is recognised together with G (base U in the anticodon) or it is recognised together with U and C (base I in the anticodon). A triplet XYA is thus either synonymous with XYU and XYC, or it is synonymous with XYG.

Triplet UGA appears to be an exception to this rule since it has a different meaning than all other triplets starting in UG. This is only possible because UGA is a stop codon and does not have an associated tRNA. There is a tRNA with anticodon ACG which recognises UGU and UGC and another with anticodon ACC which recognises UGG only. No tRNA will bind to UGA, which is why it acts as a stop codon. The differentiation of a triplet XYA from other triplets XYZ is therefore only possible if it results from the absence of a tRNA recognising XYA, i.e. if it is a stop codon. A triplet ending in C could also differentiate itself from the one ending in U if it was a stop codon but that case is not observed in the genetic code.

To summarise, there are six possible configurations for a box of 4 codons which share the first two letters. If a and b denote some arbitrary amino acids, these are

    XYU   a   a   a    a      a      a
    XYC   a   a   a    a     stop   stop
    XYA   a   a   b   stop    b     stop
    XYG   a   b   b    b      b      b

Only the first four of these configurations are observed in the genetic code. In any case, no more than two distinct amino acids can coexist in a box, which means that the 64 triplets could not possibly code for more than 32 amino acids. A large share of the redundancy in the code is therefore simply a consequence of the wobble rules. The amount of redundancy that could be accounted for by any other explanation is therefore much smaller than a superficial examination would let us believe.
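Table 2.2 can be turned into a small generator of the codons a given tRNA reads. The sketch below is ours and assumes the 3’ to 5’ writing convention adopted in Section 2.1.2, under which the “first anticodon base” of Table 2.2 is the third written base.

    PAIRS = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}    # standard pairing
    # First anticodon base -> compatible third codon bases (Table 2.2).
    WOBBLE = {'G': 'UC', 'C': 'G', 'A': 'U', 'U': 'AG', 'I': 'AUC'}

    def codons_read(anticodon_3_to_5: str) -> list:
        """All codons bound by a tRNA with this anticodon."""
        first, second, wobble_base = anticodon_3_to_5
        return [PAIRS[first] + PAIRS[second] + third
                for third in WOBBLE[wobble_base]]

    print(codons_read('CGI'))   # ['GCA', 'GCU', 'GCC']: the alanine tRNA of Figure 2.1
    print(codons_read('AAU'))   # ['UUA', 'UUG']: the leucine pattern above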
2.1.6 The underlying causes of the wobble rules

We have been unable to find in the literature any discussion of the underlying causes of the wobble rules. We can therefore only speculate as to what these causes might be. We can think of three different types of explanation. The first one is that these rules are a constraint arising from the underlying laws of RNA chemistry; in other words, no RNA molecules exist that are capable of doing all the things a tRNA does and in addition obeying the normal rules of pairing at the third position of the codon. This explanation is unlikely given how versatile RNA molecules are in their function.

The second possible explanation is that the wobble rules are the result of selection. That is, organisms endowed with tRNAs capable of discriminating as well at the third position as at the other two would have been at some disadvantage against those organisms whose tRNAs respected the wobble rules. We cannot think of any good reason why such selection would happen. Furthermore, if that explanation were right, it should be possible to mutate the existing tRNAs in such a way that they violate the wobble rules. As far as we are aware, no such tRNA has ever been found.

The third explanation, which we favour, is closer to the first one than to the second. It suggests that there exist RNA molecules which could perform as tRNAs without the limitation of the wobble rules. However, these improved tRNAs might be far away from the current cloverleaf structure common to all tRNAs. Evolution would have taken the molecular machinery of the code down a path where tRNAs are stuck in a local optimum with respect to their ability to discriminate at the third position. Moving away from the wobble rules would at this stage require changes that evolution cannot perform. Furthermore, it is unclear anyway whether there would be some benefit from breaking free of the wobble rules.

2.2 Evolution of the genetic code

2.2.1 Frozen accident versus stereochemical theory

Once the meaning of all triplets was identified, reading in the code something about its origins became a natural concern. Crick (1968) made one of the earliest contributions to
the debate with his suggestion of a frozen accident. Remarkably, many points raised in this article are still relevant today. Crick uses the term accident to contrast his theory with the so-called stereochemical hypothesis (Woese, 1965). The latter claimed that the relationship between the anticodon of a tRNA and the amino acid it carries is not arbitrary but extends some kind of natural affinity that exists between a codon and its associated amino acid. This affinity would have been at the origins of the code at a time when tRNAs were not yet available to perform their translating function. The stereochemical hypothesis has two attractive features. First, it proposes an explanation for how the code could have originated in the absence of tRNAs. Secondly, it makes the universality of the code a necessity since chemistry would have shaped it.

To this hypothesis, Crick opposes evidence, confirmed since, that the anticodon of a tRNA can be changed without changing the type of amino acid it accepts. There is therefore nothing absolute in the association between codon and amino acid, at least not in the current form of the code. The variations of the genetic code which will be discussed later are another tangible proof of this fact (Osawa et al., 1992). It could still be argued that an affinity existed between some codons and some amino acids in some very early version of the code which disappeared as tRNAs became more sophisticated. Yet, no evidence supporting this hypothesis has been found.

Crick suggests that the coding was arbitrary right from the start and that it slowly changed to allow the introduction of new amino acids. But such changes, he argues, could only take place very early in the history of life. Only very crude proteins could withstand the consequences of a change in the meaning of a codon. As soon as proteins became too precise in their function, they lost the ability to cope with such widespread disruption and the code became frozen.

Crick argues that the primitive genetic code would have encoded a smaller number of amino acids since many of them would have been unavailable to start with. This smaller number would have made it possible for one or two bases to be enough to code them instead of the present three. But if the translation process moved along by steps of one or two bases on the mRNA instead of the current three, transition to the later system would have been impossible without completely scrambling the existing message. On the other hand, it is possible that the early code proceeded by steps of three bases but interpreted only the first two bases, ignoring completely the third one. This scenario is supported by the irrelevance of the third base in half of the cases in the present genetic code. Furthermore, although Table 2.2 states that U at the first position of the anticodon pairs with A or G, it has been shown that the base U has to be chemically modified to behave in this way. Left unmodified, U at the first position will pair with any of the four bases including itself. Such totally indiscriminate pairing occurs in the code of mitochondria (Heckman et al., 1980) with the effect that a single tRNA is enough to decode the four codons in a family box. Jukes (1981) also argued in favour of such “two out of three” pairing as the norm in the early genetic code.

Crick points out that the sophistication of an early, simple code by the introduction of new amino acids is likely to lead to the situation where chemically similar amino acids are encoded by similar codons. The reason is that when a new amino acid becomes the new meaning of a codon, the change is much more likely to be tolerated by all the affected proteins if the new amino acid is not too different from the old one. Sonneborn (1965) was the first to suggest that the code might indeed be such that chemically similar amino acids are nearby in the table. His explanation however was different; he suggested that this property had been positively selected because it creates a situation where the effects of mutation are minimised. Crick’s explanation is also in a sense about selection, although a more immediate form of it: all the codes which do not have the property are immediately eliminated.
In the scenario imagined by Sonneborn, codes which do not have that property came into existence but were eliminated in the long run because organisms using them suffered more damaging deleterious mutations.

2.2.2 The genetic code is not universal

Contrary to what was thought in the early days, the genetic code is not universal. However, none of the variants is very different from the "universal" code and all encode exactly the same amino acids. The code used by the mitochondria of yeast is the most different, with six codons having a different meaning. Generally speaking, larger deviations from the universal code are found in mitochondrial codes than in nuclear ones. Figure 2.3 represents the changes known in the codes of nuclear genomes while Figure 2.4 represents changes in the codes of mitochondria. Most people still regard these variants as exotic exceptions. But Osawa et al. (1992) argue that even more variants might be discovered as a larger proportion of the 10 million species come under scrutiny. But even the limited amount of variation that is already known contradicts Crick's idea that the code cannot change in a sophisticated life form. So how can we explain that such changes occurred without killing the organisms that fostered them?

According to Osawa et al. (1992), mutation pressure is the answer. In the nuclear genome of most species, the fraction of bases which are either G or C deviates significantly from the expected value of 50%. This does not seem to be the result of selection for amino acids coded by triplets containing such bases. In eubacteria, species can be found with a G+C content anywhere between 25% and 75%. The favoured explanation for a high G+C content is that mutations from A or T to G or C happen more frequently than in the other direction, the reverse being true of a high A+T content. This bias in the mutation process is the result of factors internal to living systems, most probably the type of errors made at the time of DNA replication. Evidence for this can be found in the comparison of the G+C content of the genome of Escherichia coli with that of the genomes of bacteriophages that infect that bacterium. A correlation exists between the G+C content of the two when the virus uses the bacterium's replicating machinery for the replication of its own genome; but this correlation is not observed for viruses which have their own replicating machinery.

[Figure 2.4: The evolution of the genetic code in non-plant mitochondria, shown as a tree relating vertebrates, insects, molluscs, echinoderms, nematodes, platyhelminths, coelenterates, Paramecium, protosymbionts, moulds, Torulopsis, Saccharomyces and green plants to the universal code. R stands for A or G; N stands for any base. The changes are (1) UGA: stop→trp; (2) ACR: thr→ser; (3) AUA: ile→met; (4) AAA: lys→asn; (5) UAA: stop→tyr; (6) CUN: leu→thr; (7) AGR: arg→stop; (8) CGN: arg→noncoding; (9) AUA: met→ile. The point of change (3) is not definite. From Osawa et al. (1992).]

This mutation pressure leaves its mark with more or less intensity on functionally different parts of the genome. In non-coding parts of the genome, the pressure is practically unresisted; in genes coding for proteins, it is resisted only marginally more; in genes coding for tRNA and ribosomal RNA it is more strongly resisted but is nonetheless observable.
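As a toy illustration (ours, with invented rates) of how such directional pressure fixes the equilibrium composition regardless of the starting point, consider a two-state model in which an A/T site becomes G/C with probability $u$ per unit time and a G/C site becomes A/T with probability $v$:

    # Toy model (invented rates, for illustration only): directional mutation
    # pressure on a two-state site.  An A/T site becomes G/C with probability
    # u per unit time; a G/C site becomes A/T with probability v.
    u, v = 0.03, 0.01      # bias towards G and C
    gc = 0.25              # initial G+C content
    for _ in range(500):
        gc = gc + (1 - gc) * u - gc * v
    print(round(gc, 3), u / (u + v))   # both 0.75: the bias alone sets the composition

At equilibrium the gain $(1 - \mathrm{gc})\,u$ balances the loss $\mathrm{gc}\,v$, giving a G+C content of $u/(u+v)$ whatever the initial value; the observed 25%-75% range would then correspond to different bias ratios.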
Among genes coding for proteins, the third base of the codons is the one whose G+C content has the highest correlation with the G+C content of the genome as a whole. Given that many mutations at this position are neutral, as pointed out earlier, a high proportion of biased mutations at this position are passed on to future generations. The first position of the codon shows a smaller correlation and the second position an even smaller one. This is consistent with the idea that mutation bias is felt more strongly where the mutations are neutral.

Strong mutation bias combined with the neutrality of the code will lead to some codons becoming completely or almost completely unused. In the mitochondria of the yeast Torulopsis glabrata, for instance, codons of type CGN (N being any base) have disappeared altogether, as has any tRNA capable of reading them (Figure 2.4). Arginine is now always encoded as AGA or AGG. This, as Osawa et al. (1992) explain, sets the stage for a nearly harmless codon reassignment. Indeed, if a new tRNA (probably obtained by gene duplication) appears which can read that codon, there is no alteration of existing proteins and the new codon can come back into usage in the course of further evolution. Note that for this to be possible the codon must not share its tRNA with another codon which is still in use. We now have a mechanism by which changes in the genetic code can take place even in complex life forms, contrary to what Crick thought. It is therefore easier to imagine that at some point in the past variation was found in codes on which selection could have acted.

2.2.3 Adaptive forces shaping the genetic code

Can adaptive forces be invoked for the present shape of the genetic code? There are reasons to be cautious here. We have just seen that mutation pressure can eliminate the disastrous consequences of changing the meaning of a codon. However, reliance on mutation pressure for such changes implies that very long time scales will be necessary for variation to be generated on which selection can act. In addition, the benefits to be gained from changes in the code can themselves only be felt in the long term. A change in the code is not beneficial to the organism (we have just seen that we are lucky if it is neutral), but only to the organism's lineage, if it confers on it some kind of improved evolvability. If a new amino acid is introduced into the code, for example, an enormous number of new possibilities opens up for evolution. But the benefits only become tangible once improved proteins have evolved which use the new amino acid. Selection is therefore possible but it has to be a very slow and inefficient process.

After this necessary warning, we now look at a feature of the code for which adaptive explanations have long been put forward. Sonneborn (1965) suggested very early on that neighbouring codons tend to code for chemically similar amino acids. He claimed that this property was not accidental but that the code had adapted to minimise the effect of mutations. Crick (1968) and Woese (1967) rejected the idea, not on the grounds that an adaptive hypothesis was untenable but because they thought that the selective advantage of minimising mutation would be too small. As we saw in Section 2.2.1, Crick (1968) proposed a different explanation for this property.
Wong (1980) complicated the debate by arguing that the code was not in fact optimal with respect to the property put forward by Sonneborn and that it could be improved by a series of minor rearrangements. Swanson (1984) pointed out that Sonneborn's property would also result in limited damage when a tRNA pairs by accident with a similar but incorrect codon. If similar amino acids are encoded by similar codons, it is also the case that the occasional misreading of a codon by an unsuitable tRNA will result in the substitution of an amino acid by a similar one. This additional benefit strengthened the case for an adaptive origin. More recently, Haig and Hurst (1991) re-examined how well mutation effects are minimised in the current code. They compared the average chemical distance between amino acids encoded by neighbouring codons for the universal genetic code and for 10,000 randomly generated codes, all of them with the same amount of neutrality. Out of these, only 2 had a lower average distance than the genetic code. This indicates that Sonneborn's property is highly optimised in the universal code. Yet the part played by selection in this state of affairs remains difficult to assess. Maynard Smith and Szathmary (1995) consider selection the most likely explanation. This example illustrates well the difficulty of reaching a definite answer on such issues. When a beneficial feature is postulated for the code, its benefits can be difficult to quantify and balance against the small potential for change. The part played by historical accidents is also very difficult to assess.

2.2.4 An adaptive hypothesis for neutrality in the genetic code

The hypothesis that this thesis puts forward is that some forms of redundancy cause the evolutionary search for optimal sequences to go faster than others. Furthermore, we postulate that one of the relevant features of redundancy in this respect is the number of neutral mutations it defines. In what follows we give an intuitive argument as to why this could be so.

Consider a codon CCC which becomes CCG after being the object of a mutation. This mutation is neutral since both CCC and CCG code for proline. Consider now the effect of another mutation that changes the middle base of both CCC and CCG to A. At the amino acid level, the transition CCC → CAC leads from proline to histidine, while CCG → CAG leads from proline to glutamine. We conclude that the neutral mutation has created the conditions for a diversity of outcomes in the face of subsequent mutations.

Consider now a sequence $S$ that is $M$ triplets long; most of those triplets can be the object of neutral mutations such as the ones discussed above. We call $H_S$ the set containing all sequences accessible from $S$ by neutral mutations only. By definition, all sequences in $H_S$ have the same phenotype. Until a sequence of higher fitness is found, a population made of identical copies of $S$ will freely drift across $H_S$ since nothing opposes change within $H_S$. As with CCC and CCG in the previous paragraph, nothing seems to happen while the population moves around $H_S$ since all sequences have the same phenotype $P_S$; but the conditions are being created for a greater diversity of outcomes in the face of subsequent non-neutral mutations. This diversity can be quantified in $I_S$, the set of sequences which are exactly one non-neutral mutation away from elements of $H_S$. $I_S$ forms an enclosing envelope around $H_S$ which constitutes a forced passage out of it.
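For short sequences, $H_S$ and $I_S$ can be enumerated exhaustively. The following sketch (ours, not from the thesis) does so for a single codon, using a hypothetical four-codon fragment of the standard genetic code; it reproduces the proline example above, with $I_S$ containing both histidine and glutamine codons:

    # Minimal sketch (ours): enumerate H_S and I_S for a short sequence under
    # an illustrative fragment of the standard genetic code.
    CODE = {"CCC": "Pro", "CCG": "Pro", "CCA": "Pro", "CCU": "Pro",
            "CAC": "His", "CAU": "His", "CAG": "Gln", "CAA": "Gln"}
    BASES = "ACGU"

    def translate(seq):
        # None marks a codon with no meaning in this toy fragment
        return tuple(CODE.get(seq[i:i+3]) for i in range(0, len(seq), 3))

    def point_mutants(seq):
        for i, old in enumerate(seq):
            for b in BASES:
                if b != old:
                    yield seq[:i] + b + seq[i+1:]

    def H_and_I(start):
        # Flood fill over neutral mutations gives H_S; the sequences one
        # non-neutral mutation away from H_S (restricted here to those that
        # still have a defined meaning) form the envelope I_S.
        target = translate(start)
        H, I, frontier = {start}, set(), [start]
        while frontier:
            seq = frontier.pop()
            for m in point_mutants(seq):
                if translate(m) == target:
                    if m not in H:
                        H.add(m)
                        frontier.append(m)
                elif None not in translate(m):
                    I.add(m)
        return H, I

    H, I = H_and_I("CCC")
    print(sorted(H))                   # ['CCA', 'CCC', 'CCG', 'CCU']
    print({translate(s) for s in I})   # {('His',), ('Gln',)}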
The pattern of neutrality in the code will have an effect on $H_S$ and indirectly on $I_S$. If a pattern of neutrality leads to a high number of different phenotypes in $I_S$, more phenotypes are reachable from $S$ and the likelihood of getting trapped in a local fitness maximum is reduced. However, the relationship between the fraction of possible mutations which are neutral and the variety of phenotypes in $I_S$ is not straightforward. As the fraction of neutral neighbours increases, the size of $H_S$ increases and hence the size of $I_S$ as well. But this increase in neutrality results in more of the sequences of $I_S$ having the same phenotype; and since we are interested in the variety of phenotypes found in $I_S$, the overall effect might be negative. An extreme example would be that both $H_S$ and $I_S$ are very large but redundancy is so ubiquitous that all sequences in $I_S$ have the same phenotype. Only one phenotypic transition would be possible from $H_S$ in this case. We conclude that neutrality should not simply be maximised. In any case, Chapter 3 will show that the fraction of neutral neighbours is not the only parameter relevant to the problem.

Before molecular data became widely available, evolutionary theory did not pay any attention to the possibility of neutral change. It took two articles by Kimura (1968) and King and Jukes (1969) to open the eyes of the Darwinian establishment to the fact that most of the changes at the molecular level are neutral. This revelation inspired the following comment from Sewall Wright:

    Changes in wholly nonfunctional parts of the molecule would be the most frequent ones but would be unimportant unless they occasionally give a basis for later changes which improve function in the species in question (Provine, 1986, p. 474).

This is essentially the same point as the one we have made. In order to be complete, an argument for neutrality as an adaptation of the genetic code would have to:

• take into account the wobble rules,
• show that varying the pattern of neutrality of the code affects the evolutionary process,
• show that the current pattern of neutrality in the genetic code is optimal or at least locally optimal,
• indicate how an upwards path to that optimal pattern could have taken place from an early version of the code.

Only the first point will be fully addressed in this thesis. We will then examine the consequences of this fact for the practice of GAs. As explained in the introduction, the relevance of the first point for GAs is not dependent on the outcome of the other points.

2.3 Codes in Genetic Algorithms

Codes or encodings, as they are sometimes called in the GA literature, can be broadly defined as the symbolic manipulations through which genetic information is translated into a more convenient description of the candidate solution. Whereas the genetic code only addresses the very first layer of interpretation of the genome, GA encodings refer to the symbolic manipulations going all the way up to the full definition of a solution. These encodings are therefore sometimes called genotype to phenotype mappings. Because they cover the whole transformation from genotype to phenotype, encodings have a strong impact on the performance of GAs.
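To fix ideas, here is the simplest kind of genotype to phenotype mapping (a minimal sketch of our own, using the standard base-2 decoding described later in Section 2.3.3): the bit-string genotype is cut into fixed-size segments and each segment is read as one real-valued parameter of the phenotype.

    # Minimal sketch (ours): decode a bit-string genotype into real-valued
    # phenotype parameters.  Each p-bit segment is read as an integer and
    # scaled into the interval [x, y].
    def decode(genotype, p, x, y):
        assert len(genotype) % p == 0
        return [x + int(genotype[i:i+p], 2) * (y - x) / (2**p - 1)
                for i in range(0, len(genotype), p)]

    print(decode("0000111111110000", 8, -1.0, 1.0))
    # [-0.882..., 0.882...]: two parameters decoded from sixteen bits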
2.3.1 Importance of the genotype to phenotype mapping for GAs

Search space

Solving a problem with a genetic algorithm is, unlike biological evolution, a goal-oriented process where fitness measures an ability to solve a predefined problem. Hence, the evolutionary process has to be constrained to operate within boundaries compatible with some preconceived idea about the structure of a solution. Before we decide on the encoding, we should thus have a set of objects A among which we expect to find some good solutions to our problem and a fitness function which, applied to any element of A, returns a real-valued number measuring its quality as a solution. A is therefore the set of phenotypes to which we want to restrict the evolutionary process. This leads to two considerations in the choice of a genotype to phenotype mapping:

• genotypes should mostly represent elements of A,
• almost all elements of A should be represented by some genotype.

In the case where we have some additional information about which objects in A are likely to yield good solutions, we have the possibility of biasing the encoding by allowing more genotypes to represent those objects which we think are likely to perform well.

Mutation

In standard genetic algorithms, genotypes are bit-strings, and mutation alters a genotype by changing each bit with a low probability. Therefore, from any given genotype, G, mutation produces genotypes which differ from G at a few positions. The expectation is that this type of transition will keep producing improved solutions until an acceptable solution is found. Whether this is the case or not depends on the phenotypic consequences of mutation, which will be determined by the genotype to phenotype mapping. Or, as Wagner and Altenberg (1996) put it:

    What turns out to be crucial to the success of the evolutionary algorithm is how the candidate solutions are represented as data structures ... The process of adaptation can proceed only to the extent that favorable mutations occur, and this depends on how genetic variation maps onto phenotypic variation.

However, the relationship between a phenotype and those that can be reached from it by point mutation offers only a limited understanding of the evolutionary process. Mutation will rarely be used on its own in the generation of new individuals. It is usually used in conjunction with crossover, which complicates the issue significantly. Also, as said above, a parent will differ from its offspring by a number of mutations that is Poisson distributed. However, if instead of entire chromosomes we look at small sections of them as they travel down the generations, it is indeed the case that point mutation is the main source of change: a small section of chromosome is unlikely to be broken down by recombination or altered by more than one mutation at the same time.

Crossing-over

The rationale behind the use of the crossover operator is that it can combine in a single individual good schemas which only exist in different individuals of the population. For this to favour the discovery of even better solutions, good schemas must combine gracefully. High epistasis describes the situation where schemas behave very differently depending on the genetic information found elsewhere on the chromosome. Low epistasis, on the other hand, makes it possible to define good schemas as those which increase the fitness of a chromosome regardless of the genetic information contained elsewhere.
To give an example, epistasis would be at its minimum in a situation where the fitness of a genotype was proportional to the number of 1s it contains. Epistasis is a joint property of the problem and its encoding. An encoding can potentially rearrange the candidate solutions in any arbitrary way at the genetic level. That is, any problem could in theory be encoded in such a way that changing a 0 into a 1 anywhere on the chromosome would produce an increase in fitness. In practice, however, such an argument is useless because producing such an encoding implies that we order all possible phenotypes by fitness and assign them to conveniently chosen genotypes. If we can do that, there is no point in running a GA to solve the problem any more. More realistic encodings can have an effect on the epistasis, but only a limited one. The more information one has about the structure of the problem, the more able one is to produce an encoding that reduces epistasis.

2.3.2 Existing work

Despite its recognised importance, the issue of representation is poorly investigated, as pointed out in the latest survey of the field (Mitchell, 1996). The reason is probably that it is a very hard issue that cannot be examined in complete abstraction from the problem to which one is applying the GA. Indeed, an examination of the literature shows that a constant preoccupation for practitioners of GAs is to tailor the encoding to the peculiarities of their problem.

Encoding neural networks

Neural network design is the application of GAs where the issue of encoding has received most attention. Early attempts focused on nets of fixed architecture and only allowed the weights of the connections to evolve (Montana and Davis, 1989). Some researchers then extended the evolutionary search to the topology of the network. Miller et al. (1989), for instance, encoded the connectivity matrix of networks whose number of nodes was fixed. Harvey et al. (1992) proposed another encoding which allowed the number of nodes as well as the number of connections to vary. Every connection is encoded using 7 bits which describe the nature of the connection (excitatory or inhibitory) and its destination. The origin of the connection is implicit from the position of the segment on the chromosome; all connections coming out of a node are grouped together on the chromosome. Kitano (1990) criticised such encodings on two grounds. Firstly, the chromosome would have to get larger and larger in order to cope with larger and larger networks. Secondly, these encodings do not favour the generation of repeated structures, which he thought would be helpful in most problems. He proposed, as an alternative, to use the chromosome to encode a set of production rules that could produce topologies whose size is not correlated to the size of the rules. This approach also happened to be more in tune with the way neural connections are specified by genetic information in biology. Many other proposals followed which all relied on encoding a recipe to produce the network rather than a blueprint of it (Gruau, 1992; Boers and Kuiper, 1992; Husbands et al., 1994; Dellaert and Beer, 1994).

Despite the large number of proposals made, no clear picture has emerged of the best way of genetically representing neural networks for fruitful manipulation by the GA. The main problem is a lack of comparative work between those approaches. But comparative work would not be easy, for two reasons.
Firstly, although the end product is in all cases a network, the encoding is often tailored with a particular task in mind. Hence it would be difficult to decide on which task the comparison should take place. Secondly, those encodings differ a lot from each other, so that even if some were found to be better than the others it would be very difficult to identify the exact causes of the difference. It is almost certain, however, that the design of neural networks by GAs could be improved by a better understanding of encoding issues. It might be, though, that the problem is too complex to provide the right context in which to carry out informative comparisons of coding strategies.

Comparing coding strategies

In one of the few studies of its kind, Caruana and Schaffer (1988) compared standard binary encoding to Gray encoding on a set of test functions. They showed that Gray encoding outperforms standard binary encoding on most functions. In binary encoding, some consecutive integers have genetic representations which are very far apart. For instance, 7 and 8 are encoded by 0111 and 1000 respectively. Gray codes, on the other hand, have been designed to avoid such cliffs: consecutive integers will be represented by binary strings whose Hamming distance is 1. Because the functions on which the comparison was carried out were quite regular when examined at the numerical level, Gray encoding preserved that smoothness better than standard binary encoding. For functions with a rugged structure at the numerical level, Gray encoding would not perform particularly well. Whichever way we choose to represent integers, some transitions will be facilitated at the expense of others. But Caruana and Schaffer point out that there will be as many functions for which standard binary encoding outperforms Gray encoding as there are functions for which the reverse is true. The superiority of Gray encoding only stems from the fact that functions of interest are more likely to fall in the second category.

Neutrality in GAs

Neutrality and neutral drift are not typically regarded as relevant issues in the field of genetic algorithms. Harvey (1993) has argued against this on the basis that in most real-world GA applications, neutral or nearly neutral paths will exist which the population will not fail to explore. The author illustrates this with a detailed genetic study of a population of neural networks under selection for their ability to guide a robot to the centre of a room. It is shown that despite its high degree of convergence, the centre of mass of the population moves around at a rate suggesting a large amount of neutral drift. The issue of neutrality has, however, not been discussed by Harvey in connection with codes.

[Figure 2.5: Mutation and the travelling salesman problem. A point mutation in the binary representation turns the order of visit 75431260 into 75531260, an invalid phenotype in which city 4 is not visited. The standard mutation operator does not interact well with a natural representation of the order of visit of the cities.]

2.3.3 Relevance of redundancy for GAs

Redundancy in the genetic code will only be relevant to GA applications where similar coding mechanisms exist. We count as similar to the genetic code any mapping between binary segments of a given size and symbols taken from a larger alphabet.
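Concretely, such a mapping can be held in a simple lookup table. The sketch below (our illustration; the table is invented) decodes a chromosome by reading it three bits at a time, with two bit-strings assigned to each symbol so that some point mutations are neutral, just as several codons share one amino acid:

    # Minimal sketch (invented table): a redundant mapping from 3-bit
    # segments to the four symbols A-D.  Each symbol has two representations,
    # so one point mutation per segment can be neutral.
    TABLE = {"000": "A", "001": "A",
             "010": "B", "110": "B",
             "011": "C", "111": "C",
             "100": "D", "101": "D"}

    def decode(chromosome, k=3):
        return [TABLE[chromosome[i:i+k]] for i in range(0, len(chromosome), k)]

    print(decode("000110111"))   # ['A', 'B', 'C']
    # Flipping the first bit of the middle segment (110 -> 010) is neutral:
    print(decode("000010111"))   # ['A', 'B', 'C'] again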
The point of such mechanisms in GAs is to make a bridge between bits and more expressive symbols which provide a higher-level language in which a solution can be expressed. A good illustration of such a mapping can be found in Boers and Kuiper (1992). They evolve neural nets and their encoding relies on grammatical rules to produce the topologies. These rules are expressed using 17 different types of symbols. Although only 5 bits would have been sufficient to encode those 17 symbols in a binary form, the authors chose to use the genetic code as the inspiration for the correspondence between the symbols and their binary representation and therefore included some redundancy. Six bits per symbol are used, so that the table has 64 entries, and the redundancy is allocated in a way as close to the genetic code as possible. No justification, however, is provided for this choice.

Very often, the symbols which are being encoded are numerical values, either integer or real. Whenever this is the case, the standard representation of integers in base 2 can be used as the basis for the mapping. If, for instance, real values between $X$ and $Y$ are to be represented using $p$ bits, the segment which reads $a_1, a_2, \ldots, a_p$ can be associated with $X + i(a_1, a_2, \ldots, a_p)(Y - X)/(2^p - 1)$, where $i(a_1, a_2, \ldots, a_p)$ is the integer whose representation in base 2 is $a_1 a_2 \ldots a_p$. In all these cases, redundancy can easily be introduced regardless of whether the symbols are integers, real numbers or part of a grammatical rule. All we have to do is increase the number of bits used to represent them without increasing the number of values they can take.

Representation is a particularly difficult issue for many combinatorial problems whose solution requires the ordering or the partitioning of some set. The travelling salesman problem and job-shop scheduling are instances of such problems. In the travelling salesman problem, the set A of phenotypes on which evolution should operate contains all the possible orders in which the cities can be visited. A simple genetic representation of that ordering is to list the indices of the cities in their order of visit on the chromosome. The problem with such a representation is that it does not interact well with the mutation operator, as shown in Figure 2.5. Some alternatives to the standard GA have been proposed to address such limitations. Many of them call for a redefinition of both mutation and crossover. Falkenauer (1995), for instance, suggests that, for partitioning problems, the unit of information on the chromosome should be the subsets that define the partition of the set rather than individual elements of it. Accordingly, he redefines mutation so that it is adapted to such building blocks. Two possible redefinitions are illustrated in Figure 2.6.

[Figure 2.6: Two possible redefinitions of mutation compatible with set partitioning problems: mutation by creation of a new subset from elements of other subsets, and mutation by redistribution of one subset into the others. Adapted from Falkenauer (1995).]

The traditional mutation operator inspired by biological systems acts at a very low level, the binary digit, and can alter any bit independently of any other.
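For reference, that operator can be sketched in a couple of lines (a standard textbook form, not code from the thesis):

    import random

    # Standard per-bit mutation: every bit is flipped independently with a
    # small probability, blind to what the bits mean at the phenotypic level.
    def mutate(chromosome, rate=0.01):
        flip = {"0": "1", "1": "0"}
        return "".join(flip[b] if random.random() < rate else b
                       for b in chromosome)

    print(mutate("0000000000", rate=0.1))
    # prints a mostly unchanged string with the occasional flipped bit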
The alternatives suggested by Falkenauer, on the other hand, operate at a much higher level and have to respect some constraints which are global to the chromosome, such as the non-repetition of an element in different subsets. Defining genetic operators which can handle such high-level representations suppresses the need for binary representations and consequently for a mechanism similar to the genetic code. Redundancy as it is studied in this thesis is therefore not applicable to problems where such representations are used.

2.4 Neutrality in RNA evolution

In the past three years, a group of theoreticians working around Peter Schuster has been investigating neutrality in RNA folding and its consequences for the evolutionary process. We will now introduce their approach and discuss their findings.

[Figure 2.7: Two RNA molecules with the same secondary structure. Both are twelve bases long, with base 1 paired to base 11, base 2 to base 10 and base 3 to base 9, counted from the 5' end.]

2.4.1 RNA folding

RNA molecules fulfill a wide variety of functions. As messenger RNA, they convey information from the nucleus to the ribosomes in the cytoplasm. But RNA molecules are also capable of more active roles. We saw that transfer RNAs are in charge of implementing the genetic code. More recently, Kruger et al. (1982) showed that they can perform enzymatic activity, which was thought to be the exclusive domain of proteins. Like proteins, RNAs are capable of specific action by folding into a precise three-dimensional pattern called the tertiary structure of the molecule. This structure somehow results from the base sequence, also called the primary structure, but for the time being no model is capable of predicting how one results from the other. The secondary structure is an intermediary description of the folding process which is easier to predict from the primary structure. It describes the pairing that takes place between some of the bases of the molecule. This creates some important constraints on the folding of the molecule but it is by no means a full description of its three-dimensional structure.

2.4.2 Shape space

If we consider RNAs made of the nucleotides A, C, G and U, three pairings are possible: the Watson-Crick base pairs C-G and A-U, as well as the weaker G-U pairs. Two RNA sequences have the same secondary structure if they have the same number of bases and if the relative positions of the pairing bases are the same, disregarding which of the possible pairs of nucleotides (A-U, U-A, C-G, ...) are actually involved. An example of two molecules folding into the same shape (the term shape will be used in what follows as a synonym of secondary structure) is given in Figure 2.7. Remember that because of the asymmetry of the backbone, the ends of an RNA can be distinguished. One is called 3', the other 5'.

A natural and unambiguous way of defining a shape is to list the positions of the bases (counted from the 5' end) which are attached to each other. A position can appear only once in this list, and one that does not appear in it indicates an unpaired base. The condition that no knots or pseudo-knots exist implies that pairs $(x_i, x_j)$ and $(x_k, x_l)$ such that $x_i < x_k < x_j < x_l$ are ruled out. In this representation, the shape in Figure 2.7 would be [(1, 11), (2, 10), (3, 9)].
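This representation, together with the no-pseudo-knot condition, is easy to check mechanically (a minimal sketch of ours):

    # Minimal sketch (ours): a shape as a list of paired positions, 1-based
    # from the 5' end, with a check of the no-pseudo-knot condition.
    def is_valid_shape(pairs, length):
        positions = [p for pair in pairs for p in pair]
        if len(positions) != len(set(positions)):
            return False           # a position may appear only once
        if any(not 1 <= p <= length for p in positions):
            return False
        for i, j in pairs:
            for k, l in pairs:
                if i < k < j < l:  # interleaved pairs would form a pseudo-knot
                    return False
        return True

    print(is_valid_shape([(1, 11), (2, 10), (3, 9)], 12))   # True (Figure 2.7)
    print(is_valid_shape([(1, 6), (4, 9)], 12))             # False: pseudo-knot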
An alternative way of describing shapes is to denote an unpaired position on the molecule by '.' and a paired one by either '(' or ')': the character '(' is used if the partner position is located further towards the 3' end (right side) of the sequence and ')' if it is located towards the 5' end (left side). With these conventions, the shape in Figure 2.7 would be denoted '(((.....))).'.

2.4.3 Sequences folding into s and sequences compatible with s

From a formal point of view, we can describe the folding process as a function $f : Q^n_\alpha \to Y^n$, where $Q^n_\alpha$ is the set of possible sequences of $n$ letters taken from an alphabet of size $\alpha$ and $Y^n$ is the space containing all secondary structures of length $n$ (excluding knots and pseudo-knots, which are not considered here). In the case where we consider all standard nucleotides, we have $\alpha = 4$. The set $Q^n_\alpha$ can be seen as a graph if we decide that sequences which differ at a single base are connected by an edge.

To understand neutrality in the context of RNA folding is to propose some description of $f^{-1}(s)$, the set of all sequences that fold into a shape $s$. Unfortunately, no inverse algorithm exists that generates all the sequences that fold into a given shape. The only way to proceed is to try all sequences one by one, checking whether they fold into $s$. However, when doing so, we do not need to examine all the sequences of $Q^n_\alpha$. We can define a subset in which we are sure to find all the sequences that fold into $s$. This subset, which we call the set of compatible sequences of $s$ and denote $C(s)$, is made of all the sequences which could fold into $s$ without contradicting the logic of base pairing. Consider again the shape $s$ in Figure 2.7. Any sequence where base 1 can pair with base 11, base 2 with base 10, and base 3 with base 9 will belong to $C(s)$, since it could fold into the shape of that figure without contradicting base pairing. A sequence where those conditions are not met clearly cannot fold into that shape. Belonging to $C(s)$ is a necessary condition for folding into $s$. But it is not sufficient, since a sequence is typically compatible with many different secondary structures. The considerations which make it possible to decide in which of those possible configurations a sequence will end up will not be described here. Various algorithms make such predictions based on the free energy of the molecule. There is, however, no shorter way to determine $f^{-1}(s)$ than to apply such algorithms to every element of $C(s)$.

At the unpaired positions of shape $s$, elements of $C(s)$ are free to take any value. It follows that, given a shape with $n_u$ unpaired positions and $2 n_p$ paired ones, the number of compatible sequences is $\alpha^{n_u} \beta^{n_p}$, where $\alpha$ is the number of possible bases and $\beta$ the number of choices for two bases paired together. In the case of the alphabet {A,C,G,U}, $\alpha$ is equal to 4 and $\beta$ to 6 (A-U, U-A, G-C, C-G, G-U and U-G). In the simpler case that will be considered later where the alphabet is restricted to {G,C}, $\alpha$ and $\beta$ are both equal to 2.

2.4.4 Connectivity of C(s)

Let us now examine the connectivity of $C(s)$ considered as a subgraph of $Q^n_\alpha$. Consider two elements of $C(s)$ which differ in the way one of the base pairs that characterise $s$ is implemented. An instance of this could be AGGCACCGCCUG and ACGCACCGCGUG if $s$ is the shape in Figure 2.7. Such sequences are two point mutations away from each other.
However, if we perform one of those mutations but not the other, the pairing becomes impossible; there is therefore no way of going from one of these sequences to the other by point mutation without stepping out of $C(s)$ for one step. We conclude that $C(s)$ is not a connected subgraph of $Q^n_\alpha$. Note that the six possible base pairs are split into two groups: transitions C-G ↔ U-G ↔ U-A are possible on one side, and G-C ↔ G-U ↔ A-U on the other. It is therefore possible to go, for instance, from AGGCACCGCCUG to AAGCACCGCUUG by point mutations without stepping out of $C(s)$. Hence, $C(s)$ is fragmented into $2^{n_p}$ connected subgraphs corresponding to the possible assignments of base pairs from either of these two groups to each of the paired positions. Each of these subgraphs contains exactly $3^{n_p} \times 4^{n_u}$ points, corresponding to the choice of one base pair among the three in each group and the unconstrained choice of bases for the unpaired positions.

In order to cement this constellation of components into a single connected graph, Reidys and colleagues decided to add edges to $Q^n_\alpha$ by connecting sequences which differ at two points when these points correspond to paired positions of shape $s$. The resulting graph is richer in edges than $Q^n_\alpha$, but the number and nature of these added edges depend on $s$, which is reflected in its name, $C(s)$. To make clear the different parts played by paired and unpaired positions in this new representation, Reidys and colleagues define this graph as the product of two graphs:

    $C(s) = Q^{n_u}_\alpha \times Q^{n_p}_\beta$

In $Q^{n_p}_\beta$, which corresponds to the $n_p$ paired positions of the shape, the $\beta$ possible base pairs are treated as being at distance 1 from each other. Sequences which differ in a single base at an unpaired position are neighbours in $Q^{n_u}_\alpha \times Q^{n_p}_\beta$, as are those differing at two positions paired together, as long as the two new bases can also pair together. The sequences AGGCACCGCCUG and ACGCACCGCGUG shown at the beginning are now connected. But AGGCACCGCCUG is still not connected to AGGCACCGCGUG because G cannot pair with G.

The preimage of $s$, $f^{-1}(s)$, seen as a subgraph of $Q^n_\alpha$, is likely to be as fragmented as $C(s)$, given that it is a subset of it. To avoid this, Reidys and colleagues consider $f^{-1}(s)$ as embedded in the connected graph $C(s)$. They call the resulting subgraph $N(s)$ the neutral network of $s$. Clearly, $N(s)$ is likely to be more connected than $f^{-1}(s)$ seen as a subgraph of $Q^n_\alpha$.

2.4.5 Modelling neutral networks with random graphs

In order to build a simple statistical model of $N(s)$, Reidys and colleagues have resorted to random graph theory (Palmer, 1985; Bollobás, 1985). The only result of random graph theory that is used here is the following. Consider $\Gamma_\lambda$, a subgraph of $Q^n_\alpha$ constructed by including in it every vertex of $Q^n_\alpha$ with probability $\lambda$. Edges of $Q^n_\alpha$ are in $\Gamma_\lambda$ only when they connect two vertices which are in $\Gamma_\lambda$. It has been shown that there exists a critical value $\lambda^* = 1 - \sqrt[\alpha-1]{\alpha^{-1}}$ such that, whenever $\lambda > \lambda^*$, $\Gamma_\lambda$ is almost certainly connected.

Given that $N(s)$ is embedded in $Q^{n_u}_\alpha \times Q^{n_p}_\beta$ as we saw above, Reidys and colleagues break down $N(s)$ into the product of two random graphs

    $N(s) = \Gamma_u \times \Gamma_p$

with $\Gamma_u$ a random subgraph of $Q^{n_u}_\alpha$ with probability $\lambda_u$ and $\Gamma_p$ a random subgraph of $Q^{n_p}_\beta$ with probability $\lambda_p$.
Random graph theory then states that if both $\lambda_u$ and $\lambda_p$ are greater than their respective critical values $\lambda^*_u = 1 - \sqrt[\alpha-1]{\alpha^{-1}}$ and $\lambda^*_p = 1 - \sqrt[\beta-1]{\beta^{-1}}$, then $N(s)$ is a connected subgraph of $C(s)$.

2.4.6 Random graphs compared to simulated neutral networks

The only available data against which random graphs can be evaluated as models of neutral networks is described in Grüner et al. (1996a). The authors applied a folding algorithm to every 30-base-long RNA made of G and C. All the possible shapes $s$ were recorded, together with all the sequences that fold into each of them. This computation took 130 days on an IBM Risc 6000 workstation! The results are not exactly in agreement with random graph theory. Since the alphabet is only {G,C}, $\alpha$ and $\beta$ are equal to 2 and both $\lambda^*_u$ and $\lambda^*_p$ are equal to 0.5. However, many observed secondary structures for which the calculated values of $\lambda_u$ and $\lambda_p$ are well above 0.5 have neutral networks $N(s)$ made of 2 or 4 subgraphs of similar sizes. This contradicts random graph theory, which would predict a single connected component. In order to explain the discrepancy, the authors observe that these different components vary in the ratio of C and G at critical positions, corresponding to unpaired positions that could easily become paired and destroy the shape. But, in the words of the authors, "The deviations from theory were explained by structural features that are inaccessible to the random graph approach." Other shapes were found whose neutral networks should have been connected according to random graph theory but were in fact made of one large component and some smaller components. The authors attribute this type of discrepancy to finite-size effects, since the results of random graph theory are only true in the limit of long sequences.

2.4.7 Population dynamics on neutral networks

Whether or not they are properly described by random graphs, large neutral networks appear to be the norm rather than the exception in the folding of RNA molecules. Some twenty years ago, Eigen and Schuster (1977) analysed the situation where a master sequence, fitter than all neighbouring sequences, replicates with a probability of error $p$ per nucleotide. They discovered the existence of a critical value of $p$, the so-called error threshold, beyond which the master sequence will eventually be lost. Below the error threshold the population is organised in what Eigen and Schuster call a quasi-species: a proportion of the population sits at the master sequence while the rest forms a cloud whose density decreases with Hamming distance. The closer $p$ is to the error threshold, the more spread out this cloud is. This analysis was adapted by Huynen et al. to address the situation where the master sequence is embedded in a neutral network. The concept of an error threshold for the master sequence has to be abandoned since for any mutation rate the population moves around the neutral network, rapidly losing the original sequence. The secondary structure, or phenotype, is, however, preserved as this happens. What they define is a phenotypic error threshold beyond which the secondary structure itself is lost. Below this threshold, the population divides itself into identifiable clusters of sequences spread out on the neutral network. Derrida and Peliti (1991) have analysed neutral drift on a flat landscape and found qualitatively the same behaviour.
The population is less fragmented in the case of a neutral network because the boundaries of the network have a canalising effect on the drift. The main conclusion of this work is that below the phenotypic error threshold, the population is homogeneous in phenotype but is in fact exploring different regions of the neutral network. If an entry point is found into a fitter shape, the population will reassemble on the fitter side of that entry point and start spreading out again from there on the neutral network of the new, fitter shape. A population of evolving RNAs is therefore not a single localised quasi-species in sequence space but rather a collection of constantly moving quasi-species. Huynen et al. claim that the independent diffusion of these quasi-species increases the likelihood that the population as a whole encounters entry points to the neutral networks of better shapes.

2.4.8 Perpetual innovation along a neutral network

For this claim to hold, neutral networks have to offer a variety of phenotypes, i.e. shapes, at their boundaries. It could in principle be the case that, even though the boundaries of a neutral network are large, the number of neutral networks with which it shares this boundary is relatively small, thus limiting the number of possible transitions. This is a consideration related to the variety of phenotypes found in the set $I_S$, as discussed in Section 2.2.4. Huynen (1996) explored the boundaries of a neutral network. Starting from a 76-base-long sequence whose secondary structure is that of the tRNA for phenylalanine, he counted the number of distinct secondary structures (S) that were found among the non-neutral neighbours of that sequence and kept a list (Q) of those structures. Then he chose an arbitrary neutral neighbour of the sequence and examined the non-neutral neighbours of that new sequence, adding to S the number of structures not already in Q and adding the new structures to Q. This procedure was repeated for 1000 steps. When the cumulative number of different shapes is plotted against the number of steps taken on the neutral network, a linear relation emerges whose slope indicates that an average of 18.1 new shapes is found at every step. The same procedure for a totally random walk would produce 39 new shapes at every step. The linear relationship between the cumulative number of shapes found and the number of steps taken leads Huynen to conclude that novelty does not saturate as one moves around the neutral network. Every step along it brings an equal number of not-yet-encountered shapes. Note that Huynen only used point mutations when performing the random walk on the neutral network of the tRNA. Hence, as discussed above, many changes at the paired positions were impossible and only one of $2^{n_p} = 2^{20} \approx 10^6$ components was explored.

2.4.9 Critique of the random graph approach

Distortion of sequence space

As we saw in Section 2.4.4, the underlying sequence space had to be transformed prior to the application of the random graph approach, to accommodate the fact that mutations must sometimes happen in concert at paired positions in order not to disturb the structure. This has two major drawbacks. First, as the authors point out,

    Defining $N(s)$ as an induced subgraph of $C(s)$, rather than as an induced subgraph of the sequence space $Q^n_\alpha$ itself, avoids the peculiarities introduced by the logic of base pairing.
    On the other hand, the neighborhood relation no longer coincides with the action of mutation. Hence we have traded technical tractability for biophysical interpretation.

Secondly, the sequence space $Q^n_\alpha$, which is normally adequate to describe any RNA of length $n$, has had to be replaced by a space $C(s)$, tailored to deal with a specific shape and whose properties are dependent on this shape.

Interactions between paired and unpaired positions

The decomposition of $N(s)$ into $\Gamma_u \times \Gamma_p$ is not necessarily a natural one. Under that definition, a sequence $x_1 \ldots x_{n_u} y_1 \ldots y_{n_p}$ belongs to $N(s)$ if and only if $x_1 \ldots x_{n_u}$ is a vertex in $\Gamma_u$ and $y_1 \ldots y_{n_p}$ is a vertex in $\Gamma_p$. But this product definition also commands that if $x_1 \ldots x_{n_u} y_1 \ldots y_{n_p}$ and $x'_1 \ldots x'_{n_u} y'_1 \ldots y'_{n_p}$ are two sequences in $N(s)$, then $x'_1 \ldots x'_{n_u} y_1 \ldots y_{n_p}$ and $x_1 \ldots x_{n_u} y'_1 \ldots y'_{n_p}$ are also in $N(s)$. Such regularity in the interaction between paired and unpaired positions seems unlikely, and would need to be assessed. If it is true, for a shape $s$ with parameters $n_u$, $n_p$, $\lambda_u$ and $\lambda_p$, we expect the size of $N(s)$ to be

    $|N(s)| = |\Gamma_u|\,|\Gamma_p| = \lambda_u \alpha^{n_u} \lambda_p \beta^{n_p}$

From some of the runs described in Grüner et al. (1996a), we have the necessary numerical values to check how well this relationship holds. The most common structure to appear among 30-base-long sequences made of G and C is '........(((((((((....)))))))))', from which we deduce that $n_u = 12$ and $n_p = 9$. The authors calculated that $\lambda_u = 0.860$ and $\lambda_p = 0.895$ (no explanation is provided of how these numbers were calculated). The expected value of $|N(s)|$ from these figures is therefore

    $|N(s)|_e = 0.860 \times 0.895 \times 2^{12} \times 2^9 = 1614178$

The actual value of $|N(s)|$ is 1568485, which is reasonably close. But the structure '......((..(((((((...))))))).))', found in the same table, has $\lambda_u = 0.562$ and $\lambda_p = 0.576$, which would predict $|N(s)|_e = 678873$ while the real value is only 118307. In fact, most instances for which we have made such calculations are well under the expected value. This, we think, casts serious doubts on the ability of the Cartesian product of the two graphs to capture the nature of the interaction between paired and unpaired positions.

Correlation and neutral networks

Both Schuster (1996) and Reidys (1995) acknowledge that random graph theory relies on the assumption that the sequences forming the same structure are distributed randomly in the space of compatible sequences. But no comment is made about the realism of such an assumption. This is, however, unlikely to be the case, since it states that, given a sequence $X$ with shape $s$, changing one of the unpaired bases of $X$ is just as likely to destroy $s$ as changing all paired and unpaired bases (provided we remain in the set of sequences compatible with $s$). This is counter-intuitive but could probably be tested from the data presented in Grüner et al. (1996a,b). A non-random distribution, if it was observed, could explain why unconnected neutral networks were obtained where random graph theory would have predicted connected ones. If sequences folding into a shape were found not to be distributed randomly, the random graph model would have to be modified to account for this fact. The tendency of sequences folding into $s$ to form clusters within $C(s)$ would have to be measured and some way of generating graphs with the same distribution would have to be found.
No doubt, however, this would complicate the model a great deal and limit the range of theorems from random graph theory that would be applicable.

Is full connectedness of neutral networks relevant to adaptation?

The main concern in applying random graph theory was to find simple conditions under which a neutral network is fully connected. The underlying assumption is that the bigger the neutral network, the more effective the search for better-adapted shapes. But given a number of generations and a population size, there is a limit to how large a neutral network can effectively be explored. A calculation of this size would provide a useful figure against which the actual size of neutral networks ought to be compared. Knowing that two neutral networks must, somewhere in their immensity, come very close to each other is not very useful if we cannot qualify this fact with the expected number of generations it would take a population of size $N$ to find this secret passage. Furthermore, Schuster (1995) showed that all common shapes (shapes which are realised by a large number of different sequences) can be found within a small radius of sequence space. Being able to travel over long distances on a neutral network might therefore not be necessary for the discovery of useful novelty.

2.5 Conclusion

This chapter has shown that the genetic code is not an immutable consequence of chemistry; it can change, and selection could conceivably favour good codes over bad ones. We suggested that the pattern of redundancy found in the genetic code could be the result of such selection. We showed that if redundancy has been beneficial to the genetic code, there are good reasons to believe that it could also be beneficial to similar codes as used in genetic algorithms. Although the issue of representation has been consistently singled out as an important one, the deliberate introduction of redundancy into codes has not been the object of any investigation. Neutrality in the mapping from primary to secondary structure of RNA molecules has been described as enhancing the process of evolution in these molecules. The neutrality in question is not mediated by a code; it is a consequence of the laws of physics and therefore has no potential to change or be selected for. The fact that it could be beneficial in such an immutable form is nonetheless encouraging for our case, where neutrality can be tuned by progressive changes to the code.

Chapter 3

A formal framework for a comparative study of redundancy

This chapter proposes a formal definition of redundancy that will be used in the rest of this thesis. Section 3.1 formulates the desired requirements for that definition. Section 3.2 proposes a definition in the minimal case where only one bit of redundancy is added. Section 3.3 examines possible ways of generalising it to the addition of several bits. Section 3.4 proposes a shortcut for assessing the impact of a pattern of redundancy on the evolutionary process.

3.1 Requirements for a definition of redundancy

The previous chapter suggested that redundancy in the genetic code could be more than a historical accident. Because it can induce neutral paths between sequences, redundancy might be able to reduce the likelihood that a protein gets stuck in a local maximum of catalytic efficiency from which further improvement through natural selection is hindered.
The same principle could be beneficial to a GA when the introduction of similar redundancy is possible. Assessing the impact of redundancy on the evolutionary process is therefore relevant both to the origins of the code and to the practice of genetic algorithms. If we want to address both issues simultaneously, we need a language for talking about redundancy in codes which is insensitive to the function of the code and to the nature of the symbols which are encoded.

The theory of error-correcting codes faces a similar need (Haykin, 1988). Whenever a digital message is transmitted over a noisy channel, some of the symbols that compose it can be corrupted and the message misinterpreted at the receiver's end. Error-correcting codes minimise the likelihood of such misinterpretations by adding some well-designed redundancy to the message. This makes the detection, and sometimes the correction, of errors possible at the receiver's end. For these techniques to have the required generality, they must, however, overlook the nature of the message, so that correction can take place regardless of whether text, voice or data is being transmitted. Such dissociation from the semantics of the layer above is also desirable if we want to conduct our study in enough generality for it to apply to any GA application and to the genetic code.

Another desirable feature of our characterisation of redundancy is that it should capture well those features which are relevant to its interaction with the evolutionary process. As discussed in the previous chapter, we have reasons to believe that the existence of neutral mutations could be one such feature. But it is probably not the only one and it is therefore advisable to keep an open mind about this question. The more precise our language is in its description of redundancy, the more likely we are to be able to identify properties relevant to the interaction of redundancy and evolution.

Genetic algorithms typically handle bit-strings, whereas the genetic code uses a quaternary alphabet. In this thesis, we chose to focus on the case of binary alphabets. Our results are therefore directly applicable to GAs; the drawback is that we move away from the best model for the genetic code. This can be seen as problematic since we would like our conclusions to hold for the genetic code as well. We believe, however, that results obtained with binary alphabets apply equally well to quaternary ones, since all the definitions and arguments in favour of redundancy described in this chapter and the next are easily translated to the case of a quaternary alphabet. The difference between the two cases is therefore of a quantitative nature, not a qualitative one. We will come back to this issue at the end of Chapter 4.

3.2 Redundancy in a minimal form

3.2.1 A possible definition

We can formally define a code as a function $T : \{0,1\}^n \to S$ where $S$ is an arbitrary set of symbols. We are not concerned with the nature of these symbols since we want this definition to be as broad as possible. The function $T$ associates an element of $S$ to each of the possible sequences of $n$ bits. In the case of the genetic code, for instance, $S$ contains the 20 amino acids and the stop signal. In the case of genetic algorithms, elements of $S$ are not the candidate solutions but building blocks which are further combined together to define them. In the rest of this thesis, the term symbol will be used to refer exclusively to elements of $S$.
A non-redundant code is a code for which $T$ is injective:

    $\forall (a_1, a_2, \ldots, a_n) \in \{0,1\}^n,\ \forall (b_1, b_2, \ldots, b_n) \in \{0,1\}^n,$
    $(a_1, a_2, \ldots, a_n) \neq (b_1, b_2, \ldots, b_n) \implies T(a_1, a_2, \ldots, a_n) \neq T(b_1, b_2, \ldots, b_n)$

In other words, a code is non-redundant when no element of $S$ has more than one sequence representing it. This conforms to our intuitions about redundancy. A non-redundant code will allow the representation of as many symbols as there are sequences, $2^n$ in the case of sequences of length $n$. Since we are only interested in elements of $S$ which can be expressed within the code, we exclude from $S$ symbols which have no sequence representing them (i.e. we ensure that $T$ is surjective). As a result, for a code $T$ defined on sequences of length $n$, $S$ has a maximum of $2^n$ elements. It has exactly that number of elements if, and only if, the code is non-redundant, in which case $T$ is a bijective function. In this thesis, all instances of codes that will be considered prior to our controlled introduction of redundancy will be of the above type.

Clearly, non-redundant codes are only a subset of all possible codes. And if the size of $S$ is not a power of 2, no non-redundant code exists that encodes only elements of $S$. We believe, however, that restricting our study to these non-redundant codes is necessary. If we do not take this precaution, we will be adding redundancy to codes which are already very dissimilar in the amount of redundancy they display. Compare, for instance, a code whose set $S$ has $2^n + 1$ elements with one that has $2^{n+1} - 1$. Both require at least $n + 1$ bits. But the first one has a large amount of structural redundancy, since almost half of the binary strings can be used to provide redundant representations for elements of $S$. The second has very little scope for such redundancy, since only one string is unassigned once all elements of $S$ have been assigned a bit-string. Understanding the effect of adding redundancy will be difficult if some amount of redundancy was already there in the first place, and even more so if that amount of redundancy is variable.

Given a non-redundant code, there are only two ways to produce a redundant version of that code. One is to reduce the size of $S$; the other is to increase the length of the sequences on which the code is defined while keeping $S$ unchanged. The first option reduces the power of expression of the code, which will probably make it unsuitable for its original purpose. The genetic code, for instance, would not fulfill its purpose if it could only express eight amino acids. We will therefore only investigate the case where sequences are increased in length. This length can be augmented by any amount, but we will focus in this section on the simple case where a single bit is added.

Let $T : \{0,1\}^n \to S$ be a non-redundant code. We define $T^\sigma : \{0,1\}^{n+1} \to S$ by

    $\forall (a_1, a_2, \ldots, a_n) \in \{0,1\}^n$
    $T^\sigma(0, a_1, a_2, \ldots, a_n) = T(a_1, a_2, \ldots, a_n)$
    $T^\sigma(1, a_1, a_2, \ldots, a_n) = T(\sigma(a_1, a_2, \ldots, a_n))$

where $\sigma : \{0,1\}^n \to \{0,1\}^n$ is any permutation, or shuffling, of the set of $n$-bit-long sequences. That is, $\sigma$ is a function from the set of $n$-bit-long sequences to itself such that every sequence has one and only one preimage sequence through $\sigma$. In mathematical terms:

    $\forall (b_1, b_2, \ldots, b_n) \in \{0,1\}^n,\ \exists!\,(a_1, a_2, \ldots, a_n) \in \{0,1\}^n \ /\ \sigma(a_1, a_2, \ldots, a_n) = (b_1, b_2, \ldots, b_n)$

Let us now examine the properties of a redundant code $T^\sigma$ defined in this way.
Let us now examine the properties of a redundant code T^σ defined in this way.

1. Since T^σ(0, a1, ..., an) = T(a1, ..., an) for any (a1, ..., an), the neighbourhood relationships between symbols that exist in T are preserved in T^σ. That is, if a point mutation can turn symbol S1 into S2 under code T, the same transition between symbols is also possible under T^σ.

2. By constraining σ to be a permutation, we make sure that redundancy is equally distributed among the elements of S. Under T, every symbol is represented by one sequence; under T^σ the number of sequences has doubled due to the addition of one bit, and every element of S is represented by exactly two sequences. If we failed to ensure this, redundancy could introduce a bias in the representation of some symbols which would make comparison difficult.

These two points provide a justification for this choice of definition.

3.2.2 Some examples

Let us illustrate these definitions with some concrete examples.

A valid example

Consider the following non-redundant code, T : {0,1}^3 → S = {A, B, C, D, E, F, G, H}:

a1a2a3   T(a1a2a3)
000      A
001      B
010      C
011      D
100      E
101      F
110      G
111      H

and the following permutation σ, represented here in both its binary and its decimal form:

a1a2a3   σ(a1a2a3)        i   σ(i)
000      001              0   1
001      011              1   3
010      100              2   4
011      110              3   6
100      101              4   5
101      010              5   2
110      000              6   0
111      111              7   7

The resulting redundant code is T^σ : {0,1}^4 → S = {A, B, C, D, E, F, G, H}:

d1a1a2a3   T^σ(d1a1a2a3)      d1a1a2a3   T^σ(d1a1a2a3)
0000       A                  1000       B
0001       B                  1001       D
0010       C                  1010       E
0011       D                  1011       G
0100       E                  1100       F
0101       F                  1101       C
0110       G                  1110       A
0111       H                  1111       H

The condition that ∀(a1, ..., an), T^σ(0, a1, ..., an) = T(a1, ..., an) results in all the elements of S appearing in the second column of the table in the same order as they appeared in the definition of T. The condition that ∀(a1, ..., an), T^σ(1, a1, ..., an) = T(σ(a1, ..., an)) constrains all the elements of S to appear once and only once in the rightmost column. Their order of appearance is arbitrary and determined by the permutation σ.

An invalid example

Consider the following redundant code U : {0,1}^4 → {A, B, C, D, E, F, G, H}:

d1a1a2a3   U(d1a1a2a3)        d1a1a2a3   U(d1a1a2a3)
0000       A                  1000       D
0001       C                  1001       B
0010       G                  1010       E
0011       D                  1011       G
0100       E                  1100       F
0101       B                  1101       C
0110       H                  1110       A
0111       F                  1111       H

This code will not be considered a valid redundant version of T because U(0001) ≠ T(001), which contradicts our definition. Symbols A and B, which are neighbours under T, are not neighbours under U.

Another invalid example

Consider now code V : {0,1}^4 → {A, B, C, D, E, F, G, H}:

d1a1a2a3   V(d1a1a2a3)        d1a1a2a3   V(d1a1a2a3)
0000       A                  1000       A
0001       B                  1001       H
0010       C                  1010       H
0011       D                  1011       H
0100       E                  1100       H
0101       F                  1101       A
0110       G                  1110       A
0111       H                  1111       A

This is not a valid redundant version of T either, because only symbols A and H appear in the right-hand column. We could define a function τ such that V(1, a1, a2, a3) = T(τ(a1, a2, a3)), but τ would not be a permutation, since τ(a1, a2, a3) would always be 000 or 111 and the other values of (a1, a2, a3) would have no preimages through τ. Suppose that A and H confer on average a higher fitness than other symbols when they appear in the genome; code V would then be better than T, not because of its pattern of redundancy but because of the over-representation of these symbols. We want to rule out such possibilities.
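The two failure modes just illustrated can be checked mechanically. The sketch below is our own illustration (U and T refer to mappings of the kind built earlier); it tests a candidate code against both conditions of the definition:

```python
from itertools import product

# Check a candidate code U on n+1 bits against both conditions of the
# definition of a valid redundant version of T.
def is_valid_redundant_version(U, T, n):
    tails = list(product((0, 1), repeat=n))
    # a leading 0 must leave the meaning unchanged (violated by the
    # first invalid example, where U(0001) differs from T(001))
    if any(U[(0,) + t] != T[t] for t in tails):
        return False
    # the leading-1 half must be T composed with a permutation, i.e.
    # every symbol must appear there exactly once (violated by code V)
    return sorted(U[(1,) + t] for t in tails) == sorted(T.values())
```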
3.2.3 The identity permutation

Let us now consider the special case where the identity permutation, Id (∀x, Id(x) = x), is used to define a redundant code. The resulting code T^Id : {0,1}^4 → S = {A, B, C, D, E, F, G, H} is defined as follows:

d1a1a2a3   T^Id(d1a1a2a3)     d1a1a2a3   T^Id(d1a1a2a3)
0000       A                  1000       A
0001       B                  1001       B
0010       C                  1010       C
0011       D                  1011       D
0100       E                  1100       E
0101       F                  1101       F
0110       G                  1110       G
0111       H                  1111       H

The obvious feature of this code is that the elements of S appear in exactly the same order in the second and fourth columns. This indicates that the leftmost added bit d1 is never relevant in decoding a sequence; the three rightmost bits are sufficient, and they can be interpreted exactly as they would be under T. Bit d1 is best seen as junk genetic material in this case. From an evolutionary point of view, code T^Id should behave in exactly the same way as code T, provided the mutation rate per bit is kept constant.

The same conclusion holds for values of n greater than 3. Consider a non-redundant code T : {0,1}^n → S to which the identity permutation is applied. The code T^Id : {0,1}^{n+1} → S is such that

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{Id}(0, a_1, \ldots, a_n) = T(a_1, \ldots, a_n), \quad T^{Id}(1, a_1, \ldots, a_n) = T(Id(a_1, \ldots, a_n)) = T(a_1, \ldots, a_n)$$

The first bit is therefore always irrelevant to the interpretation of a sequence.

3.2.4 Redundancy and neutrality

As discussed in the previous chapter, an important consequence of redundancy is the potential for neutral point mutations between sequences which differ in a single bit and have the same meaning. We are therefore interested in finding a systematic way of detecting these mutations once a redundant code is defined. A mutation will be said to be neutral only when sequence (a1, ..., an) is changed into a sequence whose meaning is also T(a1, ..., an). In most genotype-to-phenotype mappings, there will also be mutations which change the value of T but do not change the phenotype. The analogy in the case of the genetic code would be a change of amino acid which does not change the function of the protein it is part of. This neutrality at the meta-level will be left out of our discussion for the time being.

We have already pointed out that, provided T is a non-redundant code and σ is a permutation, every element of S appears only once in the second and fourth columns of the table that defines T^σ. As a consequence, given a sequence (d1, a1, a2, a3), mutations in bits a1, a2 or a3 can never be neutral, since they correspond to a move to a different line within the same column. The only bit whose mutation can be neutral is thus d1. A mutation of d1 corresponds to a move to the same line in the other column. Consequently, mutation of d1 will be neutral if, and only if, the same symbol appears on the same line in both the second and fourth columns of the table that defines T^σ. This will happen if

$$T^\sigma(0, a_1, a_2, a_3) = T^\sigma(1, a_1, a_2, a_3)$$

which, by definition, is equivalent to

$$T(a_1, a_2, a_3) = T(\sigma(a_1, a_2, a_3))$$

which, because T is non-redundant, amounts to

$$(a_1, a_2, a_3) = \sigma(a_1, a_2, a_3)$$

In other words, d1 is neutral only for sequences (d1, a1, a2, a3) such that (a1, a2, a3) is invariant through σ. This argument generalises trivially to any value of n.
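In computational terms — a sketch of ours, consistent with the argument just given — the neutral mutations induced by σ can therefore be read directly off its fixed points, without building T^σ at all:

```python
# Sigma is given as the list of images of 0 .. 2^n - 1; its fixed
# points are exactly the sequences whose added bit is neutral.
def neutral_sequences(perm):
    return [i for i, image in enumerate(perm) if i == image]

# For [13465207], only 111 is invariant: 0111 and 1111 form the single
# pair of synonymous neighbours.
print(neutral_sequences([1, 3, 4, 6, 5, 2, 0, 7]))   # -> [7]
```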
Figure 3.1: A spatial representation of 3-bit-long sequences. This is in fact a graph with the property that two sequences connected by an edge can be changed into each other by point mutation.

3.2.5 Permutations as the expression of redundancy

We have just shown that the number and the identity of the sequences connected through neutral mutations depend on σ only and not on T. Our formal definition of redundant codes thus appears to disentangle the redundant component of the code, in the form of the permutation σ, from the non-redundant component, T. This was made possible by our decision to restrict the choice of T to codes where the number of symbols is a power of 2. Given that restriction, the function T has no interesting property from our point of view, and the nature of its output set S can be conveniently overlooked. Given the choices made in Section 3.2.1, the study of redundancy can be reduced to the study of the possible permutations. For sequences of length n, the set containing all permutations of the sequences will be called Pn. It contains 2^n! elements, each of which defines a valid form of redundancy. In fact, given the many symmetries of the sequence space, many of these permutations turn out to be equivalent for our purpose, but we will not try to identify these equivalence classes. In Chapters 4 and 5, redundancy will be investigated with no mention of the properties of the underlying non-redundant codes, as advocated above. We will see that a great deal can be understood without paying any attention to these codes. Chapter 6, however, will show that there are cases where the features of T cannot be ignored if we want to understand the consequences of redundancy for the evolutionary process.

3.2.6 Redundancy in a graphical form

In the case where n = 3, it is possible to gain some insight into our definition of redundancy by examining the issue from a graphical perspective. Consider the cube H3 shown in Figure 3.1. Each of its corners represents one of the eight sequences on which T is defined. This cube is in fact a graph with the property that sequences which differ in a single bit are connected by an edge, and reciprocally. Hence, a point mutation is equivalent to moving from one corner of H3 to an adjacent one. If we want a similar representation for sequences one bit longer, we can turn to the four-dimensional hypercube H4 shown in Figure 3.2.

Figure 3.2: A spatial representation of 4-bit-long sequences. This graph can be seen as made of two copies of the one shown in the previous figure. One copy (C0) contains the sequences starting in 0; the other (C1) contains the sequences starting in 1. The two are connected by parallel edges corresponding to a change in the first bit.
H4 can be thought of as consisting of two replicas of H3: one contains the sequences for which the extra bit is equal to 0 (which we call C0) and the other contains those for which that bit is equal to 1 (called C1). Notice that sequences (0, a1, a2, a3) and (1, a1, a2, a3), which differ only in the extra bit, occupy the same relative position in C0 and C1 and are connected by an edge of the graph.

With our definition of redundancy, every sequence in C0 has a synonymous sequence in C1, and H4 provides a convenient way of picturing the relationship between such pairs of sequences. We label the sequences of C0 with their decimal equivalents and use the same labels in C1 in such a way that synonymous sequences display the same label. Hence, 0 labels sequence 0000 and the sequence of C1 which is synonymous with 0000, 1 labels 0001 and the sequence of C1 which is synonymous with 0001, and so on. The general rule is that (a1, a2, a3) labels sequence (0, a1, a2, a3) and sequence (1, σ^{-1}(a1, a2, a3)), which are synonymous.

We can illustrate this with the permutation used in our example on page 37. It was defined by σ(0) = 1, σ(1) = 3, σ(2) = 4, σ(3) = 6, σ(4) = 5, σ(5) = 2, σ(6) = 0 and σ(7) = 7, which we write [13465207] by listing the images in order. Its graphical representation is shown in Figure 3.3. We can gather from it that sequence 0101 (labelled 5 in C0) is synonymous with sequence 1100 (labelled 5 in C1) and that sequence 0111 is synonymous with 1111 (both being labelled 7).

Figure 3.3: A spatial representation of permutation [13465207].

When equivalent corners of C0 and C1 are assigned the same label (such as 0111 and 1111 in the previous example), two sequences differing in the first bit are synonymous. The edge that joins the two is therefore a neutral mutation. As explained before, this can only happen with edges linking C0 to C1. All other edges are non-neutral by construction.

Figures such as 3.3 are a useful way of picturing the relations of synonymy introduced by a permutation, independently of the code to which it is applied. But we can also see the decimal representation of binary sequences as a code in itself:

a1a2a3   T(a1a2a3)
000      0
001      1
010      2
011      3
100      4
101      5
110      6
111      7

in which case each label in Figure 3.3 actually represents the meaning of the sequence at the corresponding corner of the cube. For instance, 3 is associated in C1 with the sequence 1001, which, if T is the code shown above, effectively means that T^σ(1001) = 3.

In addition to letting us visualise the relations of synonymy between sequences, this graphical representation also displays the arrangement of symbols in C1. For instance, in permutation [13025746], represented in Figure 3.4, we see that C1 is obtained from C0 by a rotation of 90 degrees about a vertical axis. This property is not easily detected, and it has important consequences for the redundancy that results from it, as will be discussed on page 64.

Figure 3.4: A spatial representation of permutation [13025746]. We see that C1 is a rotated version of C0.
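The labelling rule lends itself to a short computation (ours; the permutation is the text's [13465207]): each label a marks (0, a) in C0 and (1, σ^{-1}(a)) in C1. Running it reproduces, for instance, the pairing of 0101 with 1100 and of 0111 with 1111 noted above.

```python
# Recover the synonym pairs pictured in Figure 3.3 from sigma alone.
perm = [1, 3, 4, 6, 5, 2, 0, 7]                  # [13465207]
inverse = {image: i for i, image in enumerate(perm)}

for label in range(8):
    c0 = format(label, "04b")                    # 0 followed by the label
    c1 = format(8 + inverse[label], "04b")       # 1 followed by sigma^-1(label)
    print(f"label {label}: {c0} and {c1} are synonymous")
```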
3.3 The framework in a more general form

The previous section has shown how one bit of redundancy can be added to a code T defined on sequences of arbitrary length n. The aim of this section is to show how this can be generalised to the case where p bits of redundancy are added. Most of the section discusses the case where p and n are both equal to 3. The advantage of this case is that it can be illustrated with diagrams, which is not possible in higher dimensions. The generalisation from there to any values of n and p is, however, straightforward once this case is understood.

3.3.1 The natural generalisation

In the case of 1 bit of redundancy, we doubled the original sequence space and defined the meanings of the newly created sequences from the meanings of the existing ones via a permutation. If we add 3 bits of redundancy, we have an eight-fold expansion of our sequence space: for every existing sequence, there are now 7 new ones whose assignment must be defined. In the case where n = 3, the new sequence space is a six-dimensional hypercube, H6, which is represented in Figure 3.5. This graph can be subdivided into 8 replicas of H3 according to the value of the first three redundancy bits d1d2d3. We call Cd1d2d3 the (n-dimensional) cube containing the sequences which start in d1d2d3.

Figure 3.5: A spatial representation of 6-bit-long sequences. Not all the edges are represented on this diagram. The lines connecting the small cubes to each other are a shorthand for 8 edges connecting every pair of equivalent corners of the small cubes.

For p = 1, we ensured that the redundancy bit was irrelevant when it was set to 0, i.e. T^σ(0, a1, ..., an) = T(a1, ..., an). In the same way, we can ensure that, given a non-redundant code T : {0,1}^n → S, the redundant code T^red3 : {0,1}^{n+3} → S obtained from T by the addition of 3 redundant bits will be such that

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{red3}(0, 0, 0, a_1, \ldots, a_n) = T(a_1, \ldots, a_n)$$

In other words, the redundancy bits do not change the meaning of the rest of the sequence when they are all equal to 0. For p = 1, we also enforced that all sequences starting in 1 would have different meanings from each other and that these meanings would be taken from the set S. This ensured that the number of sequences encoding elements of S remained the same. The simplest way to generalise this constraint to p = 3 is to define 7 permutations σd1d2d3, one for each possible value of d1d2d3 other than 000, and have

$$\forall (a_1, \ldots, a_n) \in \{0,1\}^n : \quad T^{red3}(d_1, d_2, d_3, a_1, \ldots, a_n) = T(\sigma_{d_1 d_2 d_3}(a_1, \ldots, a_n))$$

When n = 3, the result can be visualised as in Figure 3.6. But there are some drawbacks associated with this way of defining a redundant code. As Figure 3.6 shows, not all Cd1d2d3 stand in an equal relationship to each other. The distance between the redundancy bits d1d2d3 induces a distance between the cubes Cd1d2d3, which we have captured by representing them as the corners of a cube. Sequence 010011 in C010 has six neighbours, one for each of its 6 bits. A mutation in one of the three rightmost bits will produce a sequence which is also in C010. These mutations cannot be neutral since, by relating Cd1d2d3 to C000 through a permutation, we ensured that the 8 sequences in Cd1d2d3 map to different symbols of S. A mutation in the first three bits of the sequence will lead to C110, C000 or C011, i.e. one of the three cubes occupying positions neighbouring C010.
We know that any neutral neighbour of 010011 is to be found among those three cubes. Sequence 000011 will, for instance, be a neutral neighbour of 010011 if, and only if, 011 is invariant through σ010. If that is the case, we have

$$T^{red3}(010011) = T(\sigma_{010}(011)) = T(011) = T^{red3}(000011)$$

and the two sequences are indeed synonymous.

Figure 3.6: Defining the meanings of Cd1d2d3 with seven permutations. Cube Cd1d2d3 is related to C000 via permutation σd1d2d3.

But if we want to determine whether 011011 is a neutral neighbour of 010011, complications arise, because the relationship between C010 and C011 is only defined through the intermediary of C000. We have

$$T^{red3}(010011) = T^{red3}(011011) \iff T(\sigma_{010}(011)) = T(\sigma_{011}(011)) \iff \sigma_{010}(011) = \sigma_{011}(011) \iff \sigma_{011}^{-1}(\sigma_{010}(011)) = 011$$

(the middle step holding by injectivity of T). Sequence 011011 is thus a neutral neighbour of 010011 if and only if 011 is invariant through σ011^{-1} ∘ σ010. However, finding the invariant elements of such compositions of permutations is not easy, and we will not have much control over the number of neutral mutations which result from defining redundancy in this way.

3.3.2 Other ways of generalising

If we want ready access to the number and identity of the neutral mutations that result from the definition of a function T^red3, we can relate pairs of Cd1d2d3 cubes which are next to each other, rather than defining each of them in relation to C000. For instance, if we explicitly defined a permutation σ010→011 such that

$$T^{red3}(011\,a_1 a_2 a_3) = T^{red3}(010\,\sigma_{010 \to 011}(a_1 a_2 a_3))$$

finding whether 011011 is a neutral neighbour of 010011 would just require checking whether 011 is invariant through σ010→011. We are now confronted by another problem. There is a total of 12 pairs of Cd1d2d3 which are neighbours, but only 7 permutations to be defined. If we do define a permutation for each of the 12 pairs of Cd1d2d3, we will have conflicting definitions for the meanings of many sequences, which is clearly not acceptable. One way around this is to define only three permutations σ1, σ2 and σ3, one for each of the three bits of redundancy, and compose them to reach any Cd1d2d3 from C000, as shown in Figure 3.7. In order to go from C000 to C011, for instance, we apply permutation σ2 followed by σ3. Notice that we must specify the order in which these permutations are applied, because the result will in general depend on it. We could, for instance, adopt the convention that they should always be combined in order of increasing index; hence C101 would be reached from C000 by σ1 ∘ σ3 rather than σ3 ∘ σ1. Defining T^red3 in this way, 7 out of the 12 pairs of neighbouring Cd1d2d3 will be related through either σ1, σ2 or σ3, for which we know the number of invariants. The other 5 permutations would result from compositions and inversions such as σ3^{-1} ∘ σ2 ∘ σ3, whose numbers of invariants are still difficult to predict.

Figure 3.7: Defining the meanings of Cd1d2d3 with three permutations, with α = σ3^{-1} ∘ σ2^{-1} ∘ σ1 ∘ σ2 ∘ σ3, β = σ3^{-1} ∘ σ2 ∘ σ3, γ = σ3^{-1} ∘ σ1 ∘ σ3 and δ = σ2^{-1} ∘ σ1 ∘ σ2. The number of invariants of permutations α, β, γ and δ is still difficult to calculate.

To simplify things even further, we can choose σ1, σ2 and σ3 so that they all commute with each other. The permutation σ3^{-1} ∘ σ2 ∘ σ3 which relates cube C101 to C111 then reduces to σ2. The other four compositions in Figure 3.7 also reduce to a single permutation. All 12 transitions between neighbouring Cd1d2d3 will be dictated by either σ1, σ2 or σ3, as shown in Figure 3.8.

Figure 3.8: Defining the meanings of Cd1d2d3 with three commuting permutations. Pairs of neighbouring Cd1d2d3 are now all related through either σ1, σ2 or σ3.

Mutation of a redundant bit now always corresponds to the same permutation, regardless of the values of the other two redundant bits. As a result, the neutral mutations between, say, C000 and C100 will be determined by the invariant elements of σ1. So will the neutral mutations between C010 and C110, between C001 and C101, and between C011 and C111, since all these pairs are related through σ1. We can in this case calculate the total number of neutral mutations, Ntot, that will result from T^red3. It is the sum of the neutral mutations for each of the 12 pairs of neighbouring Cd1d2d3. Of these 12, 4 are related through σ1, 4 through σ2 and 4 through σ3. Hence:

$$N_{tot} = 4\,(N_{\sigma_1} + N_{\sigma_2} + N_{\sigma_3})$$

where Nσi is the number of invariants of σi.

We can generalise to the case where p bits of redundancy d1, d2, ..., dp are added. We define p permutations σ1, σ2, ..., σp, one for each of the bits of redundancy, choosing them so that they all commute with each other. The permutation that relates the meanings of sequences in Cd1d2...0...dp to those in Cd1d2...1...dp is σi, where i is the position of the bit in which they differ. Because all the permutations commute with each other, this is true regardless of the values of the other redundant bits. There are p·2^{p−1} pairs of Cd1d2...dp which differ in a single bit. Of these, 2^{p−1} are related through σ1, 2^{p−1} through σ2, and so on. The total number of neutral mutations is thus

$$N_{tot} = 2^{p-1} \sum_{i=1}^{p} N_{\sigma_i}$$

where Nσi is the number of elements which are invariant through permutation σi. Notice that, no matter which way we generalise, when all the permutations we define are equal to the identity, we are in the situation where only the last three bits (n in the general case) define the meaning of a sequence. The first three bits (p in the general case) are irrelevant in all situations and can be regarded as genetic junk, just as in the case where p = 1.
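This counting argument is easy to check by brute force. The sketch below is our own illustration, under assumed illustrative choices: disjoint transpositions are used because they commute, which makes them a valid family σ1, ..., σp.

```python
from itertools import product

# Brute-force check of the formula above.
n, p = 3, 3
size = 2 ** n

def transposition(i, j):
    t = list(range(size))
    t[i], t[j] = t[j], t[i]
    return t

# disjoint transpositions commute with one another
sigmas = [transposition(0, 1), transposition(2, 3), transposition(4, 5)]

def meaning(d, a):
    # apply sigma_i once for each redundancy bit d_i equal to 1; the
    # order is irrelevant since the permutations commute.  Composing
    # with an injective T would not change which meanings coincide.
    for bit, s in zip(d, sigmas):
        if bit:
            a = s[a]
    return a

# count every edge that flips a single redundancy bit without changing
# the meaning (edges inside one cube C_d are never neutral)
n_tot = sum(
    meaning(d, a) == meaning(d[:i] + (1,) + d[i + 1:], a)
    for d in product((0, 1), repeat=p)
    for a in range(size)
    for i in range(p)
    if d[i] == 0
)

invariants = [sum(s[x] == x for x in range(size)) for s in sigmas]
print(n_tot, 2 ** (p - 1) * sum(invariants))   # both are 72 here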
3.4 The criteria for assessing redundancy

When fitness stops improving in an evolutionary algorithm, it is usually because the individuals that compose the population are trapped in one or several local fitness optima. By an optimum, we mean a sequence whose fitness is greater than the fitness of all sequences that can be reached from it by a point mutation. The number of local optima of a fitness function is therefore an indication of how effectively evolution can optimise that function. Likewise, if we can reduce the number of local optima of a function without changing the proportion of sequences with a given fitness, we will ease the evolutionary process by making high-fitness points easier to reach. We can therefore assess redundancy on the basis of its impact on the number of optima of the code to which it is applied.
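As a concrete reference point, here is a minimal sketch (ours) of that optimum count, with an arbitrary random fitness assignment:

```python
import random

# A local optimum is a sequence strictly fitter than all of its
# point-mutation neighbours.
def count_optima(fitness, m):
    # fitness: 2^m values indexed by the integer form of the sequence
    return sum(
        all(fitness[x] > fitness[x ^ (1 << b)] for b in range(m))
        for x in range(2 ** m)
    )

random.seed(0)
f = [random.random() for _ in range(2 ** 3)]   # an arbitrary fitness
print(count_optima(f, 3))
```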
3.4.1 Assigning fitness to symbols

Section 3.2 has shown how every permutation defined on a set of 2^n elements can be regarded as a different way of adding one bit of redundancy to a code defined on sequences of length n. We now want to assess the impact of these permutations on the number of optima of the code to which they are applied. But codes are not fitness functions; they have been defined as mappings between binary sequences and some symbols which then interact together at a higher conceptual level. If numbers of optima are to be compared on a code, we need to have some criterion for establishing that a symbol is better than all the other symbols that can be reached from it by point mutation. Hence, given a code T defined as before,

T : {0,1}^3 → S = {A, B, C, D, E, F, G, H}

we need a function z which assigns a positive real number to each element of S:

z : S = {A, B, C, D, E, F, G, H} → ℝ

The composition of the two will yield a function f,

f = z ∘ T : {0,1}^3 → ℝ

which assigns fitness z(T(a1, a2, a3)) to sequence (a1, a2, a3), bypassing the meaning of the sequences in terms of elements of S. Suppose we now apply the same function z to T^σ, the redundant code defined by permutation σ, instead of T. We have z ∘ T^σ : {0,1}^4 → ℝ with

$$z(T^\sigma(0, a_1, a_2, a_3)) = z(T(a_1, a_2, a_3)) = f(a_1, a_2, a_3)$$
$$z(T^\sigma(1, a_1, a_2, a_3)) = z(T(\sigma(a_1, a_2, a_3))) = f(\sigma(a_1, a_2, a_3))$$

We can therefore see z ∘ T^σ as the application of redundancy directly to the function f, and ignore its underlying definition as z ∘ T:

$$f^\sigma : \{0,1\}^4 \to \mathbb{R}, \quad f^\sigma(0, a_1, a_2, a_3) = f(a_1, a_2, a_3), \quad f^\sigma(1, a_1, a_2, a_3) = f(\sigma(a_1, a_2, a_3))$$

Until the end of this chapter and in the next one, we will be describing the application of redundancy to functions of the type of f rather than to codes T. The significance of the function z which underlies the transformation of one into the other will be discussed in the next section. The decomposition of f into z and T will also be discussed again in Chapter 6.

The point of defining functions such as f is to make the notion of a local optimum meaningful. But all we really need is some rule to decide which of two neighbouring sequences is the fitter. The actual fitness values of the sequences are not necessary. Hence, two fitness functions f1 and f2 which assign different fitnesses to sequences, but are such that the sequences end up in the same order when arranged by increasing fitness, are identical for our purpose. All that matters when defining such a function is the order it induces on the sequences.
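In code, the passage from f to f^σ is the direct analogue of the passage from T to T^σ. The sketch below is ours; the fitness values and permutation are arbitrary illustrations, and this simple strict count ignores the neutral-path subtlety treated in Section 3.4.4:

```python
# Redundancy applied directly to a fitness function f on n bits,
# giving f^sigma on n+1 bits.
def count_optima(fitness, m):
    return sum(
        all(fitness[x] > fitness[x ^ (1 << b)] for b in range(m))
        for x in range(2 ** m)
    )

def make_f_sigma(f, perm):
    # leading bit 0: f unchanged; leading bit 1: tail read through sigma
    return list(f) + [f[p] for p in perm]

n = 3
f = [0.3, 0.1, 0.4, 0.15, 0.9, 0.2, 0.6, 0.5]   # illustrative values
perm = [1, 3, 4, 6, 5, 2, 0, 7]
print(count_optima(f, n), count_optima(make_f_sigma(f, perm), n + 1))
```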
3.4.2 How meaningful is the fitness of a 3-bit-long sequence?

What exactly does it mean to study fitness functions defined on sequences as short as the ones we have been talking about so far? Surely, any evolutionary process worthy of the name will have to handle sequences orders of magnitude longer. The inspiration behind our definition of a code was both the genetic code and the low-level codes used in evolutionary algorithms. We will answer the question in both contexts.

Analogy with the genetic code

In the analogy with the genetic code, assigning fitnesses to elements of S, as function z does, is analogous to assigning fitnesses to amino acids. This is, at first sight, a meaningless thing to do, since amino acids are not intrinsically good or bad but can only be judged in the context of the protein or genome of which they are part. However, if one kept all the bases in the genome constant except for the three which define one amino acid, it would be possible to define a fitness for each of the 20 choices of amino acid at that position. These fitnesses will depend heavily on the context in which this amino acid is allowed to vary. But a function such as z could nonetheless be defined, provided that a genetic background was kept constant. The nature of this function would change radically from one background to another, but that need not worry us here.

Parallels with population genetics

When geneticists talk of beneficial or deleterious mutations, they perform an abstraction similar to the one we are proposing. They imagine two organisms whose genomes differ by a single mutation, one of the two being fitter than the other. They are not interested in measuring the fitness of these organisms, only in distinguishing whether that mutation has a positive or a negative effect, everything else being equal. In fact, individuals differing only in this mutation will probably be very unlikely to exist, especially if the organisms in question reproduce sexually, but the fact that it is in principle possible makes this definition useful.

Much population genetics work uses single-locus models, where evolutionary change is only considered at one locus, the rest of the genome being assumed constant. Such models provide a welcome simplification without which more realistic scenarios cannot be understood. Natural selection acts on many genetic differences at the same time, but to a first approximation something similar to what is predicted by the one-locus model will be happening in parallel at all the loci under selection. Our approach is similar to the one-locus model in that it is concerned with changes at a very localised point of the genotype. The possible alleles for our locus are the elements of S. The novelty lies in the fact that our alleles are modelled down to the nucleotide level and that we can examine the consequences of these alleles having more than one representation at that level.

Similarity with schema analysis

Theoretical analysis of GAs has also resorted to the attribution of fitness to short sections of the chromosome. Holland's schema theorem (Holland, 1992) talks of the fitness of schemas, where schemas are sections of the chromosome of sufficiently short length not to be too disrupted by recombination. The fitness of a schema is the average fitness of all the individuals in the population that possess the schema at a well-defined place in their genome. This is very similar to our function z; the difference is that, instead of evaluating the schema in a fixed background, an average is taken over a variety of backgrounds provided by the evolving population.

3.4.3 Counting numbers of optima

Let us now illustrate with some examples how redundancy affects the number of optima of a function. In Figure 3.9 we have represented the permutation σ = [32541706], two fitness functions f1 and f2 defined on sequences of 3 bits, as well as f1^σ and f2^σ, the functions obtained by application of σ to f1 and f2. Function f1 has 2 local maxima: 001, of fitness 0.4, and 111, of fitness 0.7, which is the global maximum. Its redundant version, f1^σ, also has 2 local maxima. One is 0111: since this sequence is still associated with the highest fitness, its new neighbour 1111 cannot possibly be fitter.
The other local maximum is 1101, which is synonymous with 0111 and also has the maximum fitness 0.7. Local optimum 001 of f1 has disappeared with redundancy, since its new neighbour 1001 has a greater fitness (0.5). Function f2 also has two local maxima: 001, with fitness 0.7, and 110, with fitness 0.6. But function f2^σ has 5 maxima. Both 001 and 110 remain maxima after a 0 has been added to their left (0001 and 0110). Three new ones are created in C1: 1010 with fitness 0.5, 1100 with fitness 0.7 and 1111 with fitness 0.6. These two cases show that the same permutation σ can have a very different impact on functions which have the same number of optima. We should not expect to be able to assess redundancy on an arbitrary fitness function. The proper way to handle this difficulty is discussed in Section 3.4.5.

Figure 3.9: Counting local optima with and without redundancy. (a) A representation of permutation σ = [32541706]. (b) Function f1 and its redundant version f1^σ. The number of local optima (circled corners) is 2 in both cases. (c) Function f2 and its redundant version f2^σ. The number of local optima (circled corners) increases from 2 to 5. (d) A reminder of the correspondence between corners of the cubes and sequences.

3.4.4 Dealing with neutral paths

We will make explicit here the procedure used to count numbers of optima in the presence of neutral mutations. Consider the function f^σ resulting from the introduction of permutation [23017564] and pictured in Figure 3.10. The non-redundant function f from which f^σ is derived can be obtained by looking at the left half of the figure. In f, sequences 101 and 110 are both local optima. Since both sequences are invariant through permutation σ, the paths that connect these sequences to their new neighbours in C1 are both neutral. Should 0101 and 0110 be counted as optima? The answer depends on the situation of their neutral neighbours 1101 and 1110. Sequence 1110 has no neighbour of higher fitness within C1. In that case, both 0110 and 1110 will be counted as local optima. Sequence 1101, on the other hand, has a neighbour (1001) whose fitness is greater. In that case, neither 0101 nor 1101 will be counted as an optimal sequence, because transition from 0101 to a sequence of higher fitness is now possible via a neutral mutation in the leftmost bit.

Figure 3.10: Counting local optima in the presence of neutral paths. (a) A redundant function with two neutral paths. Sequences in a solid circle are both counted as local optima. The sequence in the dashed circle loses its optimality as a result of the neutral path. (b) A reminder of the correspondence between corners of the cubes and sequences.
Stated in general terms: suppose a neutral mutation exists between sequences A and B, both of fitness fAB; both A and B will be counted as optima if none of the neighbours of A or B has a fitness strictly greater than fAB, but neither of them will be counted as an optimum if either A or B has a neighbour of fitness greater than fAB. A is thus not counted as an optimum if a path to higher fitness exists via B, and vice versa.

Using the procedures described above, for any function f and any permutation σ, we can calculate Nf, the number of optima of f, and Nf^σ, the number of optima of f^σ. The comparison of these two numbers will be the basis for assessing the quality of permutation σ.

3.4.5 Comparing numbers of optima in a meaningful way

Correcting for space expansion

We have seen in Section 3.2.3 that, when σ is the identity permutation, the resulting redundancy makes absolutely no difference in evolutionary terms. Given a function f with Nf optima, Nf^Id, the number of optima of f^Id, will be equal to twice Nf, simply because every sequence that is optimal under f is duplicated with the duplication of the sequence space. Before we can compare the numbers Nf and Nf^σ, we must therefore divide the latter by two, to compensate for this doubling of the number of sequences. If Nf^σ/2 is equal to Nf, redundancy has made no difference; if it is smaller, redundancy has reduced the number of local optima, and the greater the difference between Nf and Nf^σ/2, the more significant the improvement. For any function f and permutation σ, we will collect the values of both Nf − Nf^σ/2, the net decrease in the number of optima, and (Nf − Nf^σ/2)/Nf, the proportion of the original number of optima that has been eliminated by the redundancy. If these numbers are negative, then redundancy has effectively increased the number of optima. This can happen, as illustrated in Figure 3.9.c.

Averaging out

Figure 3.9 has shown that the difference between Nf and Nf^σ depends as much on f as on σ. If f has no local optima other than the global one (Nf = 1), no permutation can reduce its number of optima. Function f^σ will have at least two local optima and most probably more. But the same permutation stands a much better chance of reducing the number of optima of a function f if that function is rich in local optima to start with. The effect of σ will thus vary a lot from one fitness function to another. To avoid biasing our assessment of a permutation by an unrepresentative choice of functions f, we must average the values of Nf − Nf^σ/2 and (Nf − Nf^σ/2)/Nf over as many different functions as possible. Only these averages, which we will call Dσ and Rσ respectively, can reliably be used to compare the benefits of different permutations. The larger Dσ and Rσ, the more we can expect σ to reduce, on average, the number of local optima and bring evolutionary benefits. If Dσ and Rσ are negative, redundancy is detrimental, since it does, on average, increase the number of local optima.
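Putting the pieces of this chapter together, the following sketch (ours; the sample size and seed are arbitrary choices) estimates Rσ and Dσ by averaging over random fitness rankings, using the neutral-path-aware count of Section 3.4.4. The permutation tried at the end, [07143562], is the one that will head the best-of list in the next chapter, so Rσ should come out near 0.32 for a large enough sample.

```python
import random

def count_optima_neutral(fit, m):
    total = 0
    for x in range(2 ** m):
        # flood the plateau reachable from x through neutral mutations
        # (neighbours of equal fitness are synonyms under our codes)
        plateau, stack = {x}, [x]
        while stack:
            u = stack.pop()
            for b in range(m):
                v = u ^ (1 << b)
                if fit[v] == fit[u] and v not in plateau:
                    plateau.add(v)
                    stack.append(v)
        # x counts as an optimum unless some member of the plateau has
        # a strictly fitter neighbour
        if all(fit[u ^ (1 << b)] <= fit[x]
               for u in plateau for b in range(m)):
            total += 1
    return total

def assess(perm, n, samples=200, seed=1):
    rng = random.Random(seed)
    size = 2 ** n
    r_sum = d_sum = 0.0
    for _ in range(samples):
        f = list(range(size))
        rng.shuffle(f)                       # a random fitness ranking
        f_sigma = f + [f[p] for p in perm]
        nf = count_optima_neutral(f, n)
        nfs = count_optima_neutral(f_sigma, n + 1)
        d_sum += nf - nfs / 2
        r_sum += (nf - nfs / 2) / nf
    return r_sum / samples, d_sum / samples   # estimates of R and D

print(assess([0, 7, 1, 4, 3, 5, 6, 2], 3))    # [07143562]
```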
3.5 Conclusion

We have seen how a shuffling, or permutation, of the elements of a set containing 2^n elements can be regarded as defining a pattern of redundancy for a code defined on n-bit-long sequences. This is only one of the possible ways of characterising redundancy. Its advantage, as we have shown, is to allow an easy distinction between the redundant and the non-redundant components of a code. This form of redundancy is applicable to any code, regardless of the nature of what is being represented by the binary sequences. In this thesis, we will only consider the case where such forms of redundancy are applied to non-redundant codes. Nothing prevents us, however, from using such permutations to add redundancy to codes which are already redundant. In that case, however, the distinction between the redundant and non-redundant aspects of the code becomes blurred. Conveniently, in the case where we start from a non-redundant code, the number of invariant elements of the permutation is equal to the number of neutral mutations that will exist in the redundant code. A procedure was defined for assessing the impact of these permutations on the number of local optima of the code. This is done by assuming that all rankings of the symbols of the code are equally likely when considered over the large number of genetic contexts in which they can be found.

Chapter 4

A statistical analysis of redundancy patterns

4.1 Aims of the chapter

In the previous chapter, we saw how a code defined on sequences of length n can be made redundant by the addition of one bit to the length of the existing sequences. In this chapter, we will compare the many ways in which that particular type of redundancy can be added. The addition of a single bit of redundancy is defined by a permutation of 2^n elements. Our investigation of redundancy will therefore compare the elements of P_{2^n}, the set of all permutations of 2^n elements. The criterion by which these permutations will be judged is their ability to reduce the number of local optima of a randomly chosen function associating fitnesses with the symbols of the code. Two scalars, Rσ and Dσ, measure the expected reduction in the number of optima that results from using σ to define redundancy. One of the aims of the chapter is to find permutations with the highest possible values of Rσ. These permutations will then be used in subsequent chapters to test the validity of "good" redundancy under a GA. Until we have some understanding of the features of a permutation which are responsible for large positive values of Rσ and Dσ, the only possible strategy for finding such permutations is exhaustive search.

Chapter 2 raised the possibility that the number of neutral mutations resulting from redundancy could be one of the factors responsible for large positive values of Rσ and Dσ. This chapter will assess that claim by examining the relationship between Rσ and the number of neutral mutations over very large samples of permutations. At the same time, we will keep track of many other features of the permutations and examine whether their incidence on Rσ tells us anything new about the causes of evolutionarily beneficial redundancy.

4.2 Some quantitative features of a permutation

This section defines a series of variables which will be recorded for each permutation. If some of these variables correlate with Rσ and Dσ, we will learn something about the way redundancy affects the numbers of optima.

4.2.1 Number of invariant elements

This parameter, called Inv(σ), is the number of elements such that σ(i) = i.
We showed in the previous chapter that, if (a1, ..., an) is a sequence such that σ(a1, ..., an) = (a1, ..., an), then (0, a1, ..., an) and (1, a1, ..., an) are synonymous and mutation of the first of their bits is neutral. Inv(σ) is therefore the number of neutral mutations that will result from applying σ to a code.

4.2.2 Number of orbits

Group theory defines the orbit of an element under a permutation as follows. Take an arbitrary element a on which the permutation is defined and apply the permutation to it, then to its image, then to the image of its image, and so on. Since the set of elements on which the permutation is defined is finite, these successive applications of σ will eventually take us back to a. That is,

$$\exists q \ \text{such that} \ \sigma^q(a) = a$$

The subset containing a, together with all the elements that are encountered before returning to a, is called the orbit of a. The number of orbits defined by permutation σ is the variable that will be collected; it is called Orb(σ). This number lies between 1 and 2^n. The identity has 2^n orbits, since every element is on an orbit of its own. At the other extreme, consider the permutation that associates to each element the next one along in the set, looping back to the first element when the last one is reached. By construction, one will not return to the same element before having been through the entire set. Hence this permutation has a single orbit which includes all the elements. Notice that, for any permutation, every invariant element is on an orbit of its own. The number of orbits of a permutation is therefore at least as big as its number of invariant elements.

4.2.3 Sum of distances between a sequence and its image

We define the following variable:

$$SumDist(\sigma) = \sum_{i=0}^{2^n - 1} H(i, \sigma(i))$$

where H(x, y) is the Hamming distance between the binary representations of x and y. From Chapter 3 we know that sequence (1, a1, a2, ..., an) is synonymous with a unique sequence starting with 0: (0, σ(a1, a2, ..., an)). The Hamming distance between i and σ(i) is therefore a measure of how far apart these synonymous sequences are (assuming i is the decimal equivalent of (a1, a2, ..., an)). If H(i, σ(i)) = 0, the two synonymous sequences differ only in the leftmost bit and a neutral mutation exists between the two. At the other extreme, if H(i, σ(i)) = n, the synonymous sequences are complementary and all bits must mutate to go from one synonym to the other. The variable SumDist is the sum of these distances over all sequences in C0. Since Id(i) = i for all i, we have SumDist(Id) = 0. The identity permutation therefore minimises the variable SumDist. The highest possible score is obtained by the permutation τ which associates to every sequence its complementary sequence. We have SumDist(τ) = n·2^n. There is a negative correlation between SumDist and Inv.
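The three statistics are straightforward to compute. The sketch below is ours; it evaluates them for the permutation [07143562], which will head Table 4.1 below, and the printed values match that row's Inv, Orb and SumDist entries:

```python
# Sigma is given as the list of images of 0 .. 2^n - 1.
def inv(perm):
    return sum(i == p for i, p in enumerate(perm))

def orb(perm):
    seen, orbits = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            orbits += 1
            x = start
            while x not in seen:     # walk the orbit of start
                seen.add(x)
                x = perm[x]
    return orbits

def sum_dist(perm):
    # Hamming distance between i and perm[i], summed over all i
    return sum(bin(i ^ p).count("1") for i, p in enumerate(perm))

sigma = [0, 7, 1, 4, 3, 5, 6, 2]                 # [07143562]
print(inv(sigma), orb(sigma), sum_dist(sigma))   # -> 3 5 12
```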
4.2.4 Connectivity between pairs of symbols

In Chapter 2, we suggested that redundancy would result in a code where transitions between symbols would be easier by point mutation. (We use symbol to describe that which is represented by the code.) It is therefore desirable to assess the impact of a permutation on such transitions. In the case of a non-redundant code defined on sequences of length n, the number of different symbols is 2^n. Adding redundancy does not alter this number. The number of pairs of symbols whose connectivity can be examined is thus equal to

$$\binom{2^n}{2} = 2^{n-1}(2^n - 1)$$

In the case where only one bit of redundancy is added, the patterns of connectivity between pairs of symbols can be classified into four mutually exclusive classes. We call Conn0(σ), Conn1(σ), Conn2(σ) and Conn3(σ) the numbers of pairs of symbols which fall into each of these four classes. Because a given pair of symbols falls into one and only one of them, we have

$$Conn0 + Conn1 + Conn2 + Conn3 = 2^{n-1}(2^n - 1)$$

We will now describe each of these classes in turn.

Class 0

Consider a redundant code T^σ and two distinct sequences (0, a1, ..., an) and (0, b1, ..., bn). The meanings of these sequences under T^σ are T(a1, ..., an) and T(b1, ..., bn). They are synonymous with sequences (1, σ(a1, ..., an)) and (1, σ(b1, ..., bn)) respectively. We say that the pair of symbols (T(a1, ..., an), T(b1, ..., bn)) is in class 0 if neither of the sequences whose meaning is T(a1, ..., an) is a neighbour of either of the sequences whose meaning is T(b1, ..., bn). This is the most restrictive case from the point of view of evolution. It means that transition from T(a1, ..., an) to T(b1, ..., bn) will always have to proceed via some sequence standing for a third symbol. If T(a1, ..., an) and T(b1, ..., bn) are the best and second-best symbols at a certain locus, then both sequences associated with the second-best symbol will be local optima, because transition to the best symbol is impossible without a deleterious mutation taking place first. An example of a pair of symbols in this class is shown in Figure 4.1.

Figure 4.1: A connection of type 0 between a pair of symbols.

Notice that a pair of symbols (T(a1, ..., an), T(b1, ..., bn)) cannot be in class 0 if the Hamming distance between (a1, ..., an) and (b1, ..., bn) is 1. The reason is that such symbols are by definition neighbours in C0. Given that there are n·2^{n−1} pairs of symbols for which this is true (the number of edges of a cube of dimension n), the maximum value of Conn0 is

$$2^{n-1}(2^n - 1) - n \cdot 2^{n-1} = 2^{n-1}(2^n - n - 1)$$

Class 1

A pair of symbols (T(a1, ..., an), T(b1, ..., bn)) falls into this class when one and only one of the two sequences meaning T(a1, ..., an) is a neighbour of one and only one of the two sequences meaning T(b1, ..., bn). The other two synonymous sequences must be disjoint from the connected pair and disjoint from each other. An example of two symbols in class 1 is shown in Figure 4.2.

Figure 4.2: A connection of type 1 between a pair of symbols.

In this case, substitution of one symbol by the other through point mutation may or may not be possible, depending on which of the synonymous representations is used. If it is the unconnected one, then substitution is impossible. In Figure 4.2 for instance, if T(110) is encoded as 0110, transition to T(000) is possible by mutation to sequence 1110. If T(110) is encoded by 1011, transition to T(000) is impossible because none of the neighbours stands for T(000). When a pair of symbols is in class 1, we expect transition from one to the other to be possible in half of the cases.
Class 2

A pair of symbols (T(a1, ..., an), T(b1, ..., bn)) will be in class 2 if any three of the four sequences associated with these two symbols are connected together, the fourth one being disjoint from them. Figure 4.3 illustrates several possible configurations for this case.

Figure 4.3: Possible connections of type 2 between pairs of symbols. (a) and (b) Two possible configurations. (c) A reminder of the correspondence between corners of the cubes and sequences.

Pairs of symbols in this class are not in a symmetrical relationship to each other. From one of the symbols, transition to the other will always be possible, while the reverse transition will depend on the genetic representation. In Figure 4.3.a for instance, it is always possible to go from T(100) to T(110), because both representations of T(100) are neighbours of 0110. The reverse transition from T(110) to T(100) is only possible in half of the cases, when T(110) is encoded as 0110. On average, transition from one symbol to the other is possible in three out of four cases. Notice the difference between Figure 4.3.a and Figure 4.3.b. In the second case, transition from T(110) to T(100) requires a neutral mutation to happen first when the starting point is sequence 1110.

Class 3

A connection of type 3 exists between symbols T(a1, ..., an) and T(b1, ..., bn) if it is always possible to go from one of the symbols to the other without having to go through a third symbol. The path that goes from one symbol to the other might include some neutral mutations. The possible configurations for this case are represented in Figure 4.4.

Figure 4.4: Possible connections of type 3 between pairs of symbols. (a), (b) and (c) Three possible configurations. (d) A reminder of the correspondence between corners of the cubes and sequences.

In practice, all configurations implementing this case are such that a single mutation is enough to go from one symbol to the other. Neutral mutations can take place first, but they are never necessary. If all pairs of symbols were in this class, there would never be any local optimum, given that transition to the global optimum would be possible from anywhere. However, as we will see, no permutation comes close to achieving this.
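Our reading of these four definitions can be turned into a mechanical classification. The sketch below is an illustration of ours, not the thesis's code: for each pair of symbols it builds the graph on their four sequences and sorts it into a class by the number of cross-edges and the size of the largest connected group. The identity output is easy to check by hand; counts for other permutations should line up with the Conn columns of the table in the next section if this reading matches the definitions.

```python
from itertools import combinations

def conn_counts(perm, n):
    size = 2 ** n
    inverse = {v: k for k, v in enumerate(perm)}
    counts = [0, 0, 0, 0]
    for s, t in combinations(range(size), 2):
        reps_s = (s, size + inverse[s])          # the two sequences for s
        four = list(reps_s) + [t, size + inverse[t]]
        edges = [(u, v) for u, v in combinations(four, 2)
                 if bin(u ^ v).count("1") == 1]
        cross = sum((u in reps_s) != (v in reps_s) for u, v in edges)
        comp = {u: {u} for u in four}            # connected components
        for u, v in edges:
            merged = comp[u] | comp[v]
            for w in merged:
                comp[w] = merged
        biggest = max(len(c) for c in comp.values())
        if cross == 0:
            counts[0] += 1                       # class 0: no connection
        elif biggest == 3:
            counts[2] += 1                       # class 2: three linked
        elif biggest == 2 and cross == 1:
            counts[1] += 1                       # class 1: one lone edge
        else:
            counts[3] += 1                       # class 3: always reachable
    return counts

print(conn_counts(list(range(8)), 3))            # identity: [16, 0, 0, 12]
```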
4.3 Best and worst permutations when n is equal to 3

4.3.1 Some considerations of size

In the case where n is equal to 3, patterns of redundancy are defined by permutations over 8 elements. The set P8 containing these permutations has 40320 elements. Because of the many symmetries of the cube, many of them will lead to identical patterns of redundancy. Unfortunately, these equivalence relations are not easy to detect, and we had to explore the entire set. Each permutation should be evaluated on as large a number of fitness functions as possible. As explained in the previous chapter, when a function is defined, only the induced ranking of the symbols matters for the count of local optima. There are therefore 40320 fitness rankings on which each permutation can be evaluated. Testing every permutation on every fitness ranking requires that we count numbers of optima in approximately 1.6 billion different configurations.

4.3.2 Description of the data

Our first task was to find among P8 the permutations with the 10 highest and the 10 lowest values of Rσ. For those 20 permutations we recorded the values of all the variables defined in the previous section. The result is displayed in Table 4.1. Each line of these tables corresponds to a different permutation. The first column identifies the permutation σ by listing the values of σ(0) ... σ(7); the invariant elements of a permutation can be read off directly from this column, as the positions i at which the listed digit equals i. The second column gives the value of Rσ and, in brackets, the ranking of the permutation with respect to this value. The third column gives the same information for Dσ. In Table 4.1.b the ranking is reversed, indicating how far a permutation is from the bottom of the list. The values of Rσ and Dσ are the averages of (Nf − Nf^σ/2)/Nf and of Nf − Nf^σ/2 respectively, over the 40320 fitness functions. For some of these functions, Nf − Nf^σ/2 and (Nf − Nf^σ/2)/Nf will be positive, indicating that the number of optima has diminished as a result of the redundancy. For others, they will be negative, indicating that redundancy has increased the number of optima of that function. These adverse cases contribute negatively to the values of Rσ and Dσ. For each permutation, we calculated the proportion of fitness functions which result in Nf − Nf^σ/2 being negative. This number is indicated in the fourth column of the tables under the label BadCases. The remaining columns give the values of the variables defined in the previous section.

Table 4.1.a: The 10 best permutations when n equals 3.

σ(0)...σ(7)   Rσ (rank)     Dσ (rank)      BadCases   Inv   Conn0   Conn1   Conn2   Conn3   SumDist   Orb
07143562      0.3225 (1)    0.7679 (2)     0.140      3     3       9       12      4       12        5
01763254      0.3212 (2)    0.7595 (3)     0.112      2     4       10      8       6       14        3
01567234      0.3168 (3)    0.7595 (4)     0.165      2     4       10      8       6       14        5
06743512      0.3120 (4)    0.7690 (1)     0.157      2     3       8       12      5       16        5
07153264      0.3061 (5)    0.7155 (6)     0.104      2     4       10      8       6       14        3
06137254      0.3030 (6)    0.7024 (>10)   0.108      2     5       8       8       7       14        4
01762354      0.2981 (7)    0.6952 (>10)   0.099      2     6       10      4       8       12        4
07543612      0.2970 (8)    0.7190 (5)     0.139      1     3       13      6       6       18        3
56401237      0.2962 (9)    0.7060 (9)     0.120      1     3       15      4       6       16        2
06751234      0.2958 (10)   0.7060 (10)    0.125      1     3       15      4       6       16        2

Table 4.1.b: The 10 worst permutations when n equals 3.

σ(0)...σ(7)   Rσ (rank)     Dσ (rank)      BadCases   Inv   Conn0   Conn1   Conn2   Conn3   SumDist   Orb
40576123      0.0852 (10)   0.1810 (7)     0.122      0     13      4       2       9       10        2
46570123      0.0797 (9)    0.1667 (6)     0.049      0     14      0       4       10      12        3
01452367      0.0790 (8)    0.2000 (9)     0.000      4     14      0       0       14      8         6
40125673      0.0692 (7)    0.2476 (>10)   0.161      0     10      8       4       6       10        1
40615723      0.0680 (6)    0.1464 (5)     0.146      0     13      6       0       9       8         2
75016423      0.0625 (5)    0.1321 (4)     0.073      0     14      2       2       10     10        1
45607123      0.0603 (4)    0.2298 (>10)   0.161      0     10      8       4       6       10        3
40516273      0.0480 (3)    0.1000 (3)     0.000      0     15      0       0       13      12        2
40675123      0.0453 (2)    0.0976 (2)     0.098      0     14      4       0       10      8         3
01234567      0.000 (1)     0.000 (1)      0.000      8     16      0       0       12      0         8

4.3.3 Any redundancy is better than none

The most remarkable observation that can be made from Table 4.1 is that the identity permutation is the worst possible permutation. It is rated as such by both Dσ and Rσ, scoring exactly 0 with both measures. This score was expected, since Rσ and Dσ were in effect calibrated to produce 0 for the identity permutation (see Section 3.4.5).
A statistical analysis of redundancy patterns 64 somehow calibrated to produce 0 for the identity permutation (see Section 3.4.5). The surprise comes from the fact that all other permutations have a positive score. We can therefore state that, on average, every permutation will reduce the numbers of local optima. Since the identity permutation is tantamount to a non-redundant version of the code, we can conclude that any of the kinds of redundancy that has been included in this study brings at least a small improvement. It is important to bear in mind that this is true only on average over a large number of fitness functions. For any permutation, there will be plenty of fitness functions such that Nf − Nf σ /2 and (Nf − Nf σ /2)/Nf are negative; their proportion is given in the BadCases column. Table 4.3.1.a indicates that, at the other end of the spectrum, the very best permutations have Rσ values greater than 0.3. These permutations will therefore suppress nearly a third of the local optima. This figure is very encouraging bearing in mind that it is an average over all possible fitness rankings of the symbols, a large number of which cannot possibly be made smoother (Nf = 1). We must also remember that we are adding here the minimum amount of redundancy possible. 4.3.4 Differences between Dσ and Rσ Comparison of the figures found in columns 2 and 3 shows that Dσ and Rσ yield slightly different rankings. We can understand the origin of this difference by imagining two permutations σ1 and σ2 which both reduce by, say, 30% the number of optima of half the possible fitness functions but leave the number of optima unchanged in the other half of the cases. If σ1 achieves a 30% reduction on functions with high numbers of optima while σ2 is more effective on functions with low numbers of optima, we will have Rσ1 = Rσ2 = 0.3/2 = 0.15, but Dσ1 will be greater than Dσ2 because in absolute terms σ1 will eliminate many more optima than σ2 . The fact that rankings according to Rσ and Dσ do not differ significantly indicates that permutations are effective on average over the same type of fitness functions. This conclusion is confirmed by the next observation. 4.3.5 The proportions of adverse cases The values of BadCases are very similar for the best and the worst permutations. This indicates that good permutations do better, not by being effective on a bigger proportion of the fitness functions on which they are tested, but rather, by achieving more significant reductions on those functions which are rich in optima. Interestingly, we find at the bottom of the list a few permutations for which the value of BadCases is 0. Those forms of redundancy have the property that whatever the fitness function to which they are applied, they never increase its number of optima. One such function is the identity permutation which is not surprising since it never changes anything either for better or for worse. Permutations such as [40516273] and [01452367] are more interesting because they have a small overall positive effect while never making things worse. The reasons behind this property can be understood by examination of Figure 4.5. In the case of the identity permutation, C1 is an exact copy of C0 . In the case Chapter 4. A statistical analysis of redundancy patterns 6 6 5 4 C0 7 3 5 4 0 3 2 0 2 7 1 C1 1 (a) 4 6 C0 5 1 7 6 2 3 2 0 0 7 5 4 65 1 C1 3 (b) Figure 4.5: Permutations which never increase the number of optima. 
In the case of [40516273] and [01452367], C1 can be obtained from C0 by a rotation and a symmetry respectively. All three transformations have in common the fact that they preserve in C1 the relationships between symbols that existed in C0. Therefore, no maximum can be created in C1 that did not already exist in C0. In the cases of [40516273] and [01452367], however, some maxima might disappear because the interconnection of C0 and C1 creates some new paths between symbols. In the case of the identity permutation, this does not happen since the interconnection of C1 and C0 only connects sequences which code for the same symbol.
4.3.6 Trends in the other variables
Examination of the values of the other variables in Table 4.1 leads to the following observations. The values of Inv for the best permutations are all 1, 2 or 3. For the worst permutations, this value is in most cases 0, with the exception of an 8 (the identity permutation) and a 4. Although this does not indicate a linear relation between Rσ and Inv, the values found at the top are distinct from those found at the bottom.
Conn0 points to a simple trend with low values at the top (although not the lowest ones, since 0 is a possible value) and high values at the bottom. Conn3 shows a similar trend, but the values for the best and worst permutations are not as differentiated. The value 6, for instance, is found for one of the worst and one of the best permutations. Conn1 and Conn2 show roughly the reverse trend: high values for the good permutations and low values for the bad ones. Here again some values appear in both Table 4.1(a) and Table 4.1(b). The values of SumDist in the top table are all at least 12 while they are all at most 12 in the bottom table; the value 12 itself is obtained for several of the best and the worst permutations. The values of Orb are between 2 and 5 in the top table and between 1 and 8 in the bottom one. Values 4 and 5 only appear in the top table. Values 2 and 3 are possible for both the best and the worst permutations. These observations indicate some trends, but only the examination of more samples can give us a clearer picture. This will be done in Section 4.5.
4.4 The incidence of Inv on Rσ
The variable Inv is the one whose relation to Rσ we are most interested in investigating. There are two reasons for that. The first is that our account, in Section 2.2.4, of how redundancy could help the genetic code implied that the amount of neutrality could be an important factor. The second is that it is simple to generate a permutation with a given number of invariants. Hence, if some number of invariants leads to high values of Rσ, we would have a convenient method for generating beneficial forms of redundancy.
4.4.1 Codes defined on sequences of length 3
In Figure 4.6, the value of Rσ has been plotted against Inv for every possible permutation. The points appear on 8 equidistant vertical lines corresponding to the possible values of Inv. No point exists for Inv = 7, and a single one exists for Inv = 8, corresponding to the identity permutation, whose value is 0.
Figure 4.6: Rσ as a function of Inv when n equals 3.
The density of points for values of Inv equal to 0, 1 and 2 is such that the superposition of points gives the impression of continuous lines. The graph shows that more permutations exist with low numbers of invariants than with high ones. The highest point on the graph is in the line corresponding to Inv = 3. It is, however, quite isolated from the other points in the same line and nearly matched by the best element for which Inv = 2. Many other points follow closely in the same line. This mirrors the data in the Inv column of Table 4.1. In any given line, points extend over a large range of Rσ values; the number of invariant elements of a permutation does not in itself constrain the value of Rσ very much. Despite this fact, we cannot fail to notice an overall trend in the graph. The first three lines of points display an upward trend. The line corresponding to Inv = 3 is similar to the one for Inv = 2, being only less dense and not extending quite as far down. As Inv increases beyond 3, the clouds of points are shifted downwards.
Another way of visualising this trend is to average Rσ over all the points on a vertical line. We call R<i> the average value of Rσ across all permutations with i invariant elements. Figure 4.7 shows the variation of R<i> with i. The trend described above is clearer in this figure. The value of R<i> increases with i up to a value of 3. For larger values of i the trend is reversed, and the decline is quite sharp for i greater than 5. The next section examines how this pattern changes for greater values of n.
Figure 4.7: R<Inv> as a function of Inv when n equals 3.
4.4.2 Codes defined on sequences longer than 3
When n is equal to 4, both the number of possible permutations and the number of possible fitness functions are of the order of 21 × 10^12. It is therefore clear that for values of n greater than 3, the test of all permutations on all fitness functions is out of the question. Instead we have to rely on sampling both these sets. For that purpose, we created an algorithm which, given a value of n and a value of i smaller than 2^n, generates random permutations with exactly i invariant elements (sketched below). For each value of i between 0 and 2^n (except 2^n − 1, for which no permutation exists), we generated 5000 such permutations and calculated Rσ for each of them by averaging (Nf − Nf^σ/2)/Nf over 100 randomly chosen fitness rankings f (taken among the 2^n! possible ones). This experiment was performed for n equal to 4, 5, 6 and 7.
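A sketch of that generator (the name is illustrative; it fixes i randomly chosen elements and then rejection-samples a derangement of the remaining ones so that no extra invariant slips in):

import random

def permutation_with_invariants(n_bits, i):
    """A random permutation of 2**n_bits elements with exactly i fixed points."""
    size = 2 ** n_bits
    assert i != size - 1, "no permutation has exactly size - 1 fixed points"
    fixed = set(random.sample(range(size), i))
    movers = [x for x in range(size) if x not in fixed]
    while True:                       # rejection-sample a derangement of the movers
        image = movers[:]
        random.shuffle(image)
        if all(x != y for x, y in zip(movers, image)):
            break
    sigma = list(range(size))
    for x, y in zip(movers, image):
        sigma[x] = y
    return sigma

The rejection loop succeeds with probability tending to 1/e, so on average fewer than three shuffles are needed per permutation.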
Figure 4.8 shows four plots corresponding to the four values of n. In each of them, the value of Rσ is plotted against Inv for the 5000 × 2^n randomly generated permutations. In each of these graphs, we recognise the pattern that was found for n equal to 3. Moreover, as n increases, the pattern seems to come more into focus; the points corresponding to a single value of i are more clustered, suggesting an improved correlation between Inv and Rσ. In the four cases, the curves increase on the left half of their x range and decrease towards 0 in the other half. All permutations have positive Rσ values, confirming that any form of redundancy is a positive factor in reducing the number of local optima. For all values of n, a wide band of points stands out, alongside which we find a few scattered points. The best permutations are usually found among these atypical points. The very best permutations are found in the middle of the x axis, in the same region where the band reaches its maximum. The best permutations have Rσ values which decrease with n: from 0.3 when n equals 4, they drop to 0.24 when n equals 7. This could be a genuine trend. However, because the proportion of sampled points decreases exponentially with n, the very best permutations become increasingly unlikely to be included in our sample. It is therefore conceivable that permutations with values of Rσ over 0.3 exist for all values of n.
As in the previous case, we also plotted R<i>, the average value of Rσ for all permutations with i invariant elements, against i. The four plots are shown in Figure 4.9. They reproduce, as expected, a pattern similar to the one described by the dense band of points in Figure 4.8. They also confirm the decrease of the values of Rσ with n.
Figure 4.8: Rσ as a function of Inv when n is greater than 3.
Figure 4.9: R<Inv> as a function of Inv when n is greater than 3.
4.5 The incidence of other variables on Rσ
This section examines the relation between Rσ and the other variables that have been defined. In Figure 4.10, Rσ is plotted against all these variables, and Table 4.2 summarises the correlation coefficients between every pair of variables.
Figure 4.10: Rσ as a function of other variables when n equals 3.
Table 4.2: Summary of the correlations between all variables
          Rσ      Inv     Conn0   Conn1   Conn2   Conn3   SumDist  Orb
Rσ        1.00    0.33   -0.66    0.16    0.27   -0.12    0.32     0.25
Inv       0.33    1.00    0.19   -0.64    0.64    0.03   -0.66     0.88
Conn0    -0.66    0.19    1.00   -0.75   -0.09    0.55   -0.53     0.21
Conn1     0.16   -0.64   -0.75    1.00   -0.59   -0.30    0.64    -0.55
Conn2     0.27    0.64   -0.09   -0.59    1.00   -0.52   -0.51     0.41
Conn3    -0.12    0.03    0.55   -0.30   -0.52    1.00    0.11     0.23
SumDist   0.32   -0.66   -0.53    0.64   -0.51    0.11    1.00    -0.51
Orb       0.25    0.88    0.21   -0.55    0.41    0.23   -0.51     1.00
4.5.1 The relation with Conn0
Table 4.2 shows that Conn0 is the variable with the highest correlation with Rσ. The coefficient is negative, indicating that small values of Conn0 lead to high values of Rσ. This is consistent with what we observed for the best and worst permutations in Table 4.1. Variable Conn0 is the number of pairs of symbols such that the substitution of one by the other is never possible without going first through a third symbol. It is also the number of pairs of symbols which are not connected by a connection of type 1, 2 or 3.
We can describe the correlation by saying that the more pairs of symbols are connected in some way (by a connection of type 1, 2 or 3), the more likely local optima are to disappear through the introduction of redundancy. The relationship between the two variables is not, however, as simple as we might expect. One permutation exists whose value of Conn0 is equal to 0, and five have a value of Conn0 equal to 1. However, these are not among the permutations with the largest values of Rσ. Some permutations with values of Conn0 as large as 8 have greater values of Rσ, as can be seen in Figure 4.10. The value of Conn0 which leads to the largest value of Rσ is 3. The relationship between Rσ and Conn0 is therefore not linear.
In Figure 4.11, Rσ has been plotted against Inv for permutations satisfying Conn0 = 3 on one plot and Conn0 = 4 on the other. As this figure shows, there is in both cases a high residual correlation between Rσ and Inv. The correlation coefficients are 0.71 and 0.75 respectively. We conclude that the impact of Inv on Rσ is not mediated or explained away by the variable Conn0.
Figure 4.11: Rσ as a function of Inv when Conn0 is fixed. (a) Conn0 = 3. (b) Conn0 = 4.
4.5.2 The relation with Conn1, Conn2 and Conn3
None of these variables taken individually has a high correlation with Rσ (Table 4.2). However, because of the equality Conn1 + Conn2 + Conn3 = 28 − Conn0, we know that Conn1 + Conn2 + Conn3 has the same correlation coefficient with Rσ as Conn0, but with the opposite sign. It is then tempting to alter the weighting of this sum to see if the correlation can be improved. A linear regression will find the weights a2 and a3 such that the correlation between Conn1 + a2 Conn2 + a3 Conn3 and Rσ is maximised. The result is a new variable BestConn = Conn1 + 1.38 Conn2 + 1.84 Conn3 whose correlation coefficient with Rσ is 0.92. A scatter plot of Rσ against BestConn is shown in Figure 4.12. The fit is much improved compared to any of the variables we have seen so far.
This weighting suggests that Conn1, Conn2 and Conn3 all contribute to increasing the value of Rσ, but not in equal amounts. It could be said that a connection of type 3 is 1.84 times as effective as a connection of type 1, while a connection of type 2 is somewhere in the middle. The relative magnitude of the weights is consistent with the definition of the classes: a connection of type 3 guarantees an unrestricted possibility of transition between two symbols, while a connection of type 1 only makes substitution possible in half of the cases. This variable shows that our division into 4 classes is sound and that connectivity between symbols, as defined by these classes, is instrumental in the definition of good redundancy.
The variable BestConn could also help discover permutations with high values of Rσ. It cannot do so directly because it would be very difficult to construct a permutation with a large value of BestConn. However, since BestConn is much more economical to calculate than Rσ, we can use it to assess the quality of randomly generated permutations.
Figure 4.12: Rσ as a function of the best linear combination of Conn1, Conn2 and Conn3.
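Since the Pearson correlation is invariant under positive rescaling and shifts, the correlation-maximising weights can be read off an ordinary least-squares fit; a sketch with numpy, where conn1, conn2, conn3 and r_sigma are assumed to be arrays indexed by permutation:

import numpy as np

def best_conn_weights(conn1, conn2, conn3, r_sigma):
    """Fit r_sigma ~ b0 + b1*Conn1 + b2*Conn2 + b3*Conn3, then rescale so
    that Conn1 receives weight 1 (this assumes b1 > 0, as is the case here)."""
    X = np.column_stack([np.ones(len(conn1)), conn1, conn2, conn3])
    (b0, b1, b2, b3), *_ = np.linalg.lstsq(X, r_sigma, rcond=None)
    return b2 / b1, b3 / b1   # the text reports roughly 1.38 and 1.84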
4.5.3 The relation with SumDist
The correlation between SumDist and Rσ is quite small (0.32) and no interesting pattern emerges from the graph shown in Figure 4.10. However, when the correlation between SumDist and Rσ is calculated within subsets defined by a constant value of Inv, the correlation coefficients increase enormously, as shown in the following table.
Inv                 0     1     2     3     4     5     6
Corr(Rσ, SumDist)  0.82  0.89  0.80  0.81  0.57  0.98  0.72
This observation is interesting because it suggests a simple procedure for finding good permutations. Given that the best permutations have about half of their elements invariant, we could further constrain them to have the largest possible value of SumDist. All we have to do to build such permutations is assign to those elements which are not invariant an image that is as different from them as possible in binary terms.
To assess the validity of this procedure, we tested whether this correlation was also found for greater values of n. We examined the cases where n equals 4 and 5. The optimal numbers of invariant elements are 7 and 15 respectively for these cases. In one case we generated permutations of 16 elements, of which 7 were invariant, and in the other, permutations of 32 elements, 15 of which were invariant. Figure 4.13 shows that, in both cases, the value of SumDist does not help identify the best permutations among those with the optimal number of invariant elements. The proposed procedure is therefore not applicable.
Figure 4.13: Rσ as a function of SumDist when Inv is fixed. (a) n = 4 and Inv = 7; the correlation coefficient is 0.54. (b) n = 5 and Inv = 15; the correlation coefficient is 0.37.
4.6 Parallels with a quaternary alphabet
In this section, we briefly come back to the difference between codes using binary and quaternary alphabets. We argued at the beginning of Chapter 3 that conclusions reached with a binary alphabet could be extrapolated to a quaternary one, at least in qualitative terms. We can now reconsider this statement in the context of what we know from this chapter and the previous one.
To make the discussion more concrete and directly relevant to the study of the genetic code, let us compare the case where 16 distinct symbols are encoded with a binary alphabet with the same 16 symbols encoded using a quaternary one. With a quaternary alphabet, only two letters are needed to describe each symbol. If a binary alphabet is used, 4 bits are needed instead. A graphical representation of the neighbouring relationships in the case of a quaternary alphabet is shown in Figure 4.14. Sequences of two letters, taken from the alphabet {A,B,C,D}, are connected by a line whenever they differ at only one position. This can be compared with Figure 3.2, which represents the same relationship in the binary case. The following features distinguish these two cases. In the quaternary case, sequences are a maximum of two mutations away from each other, while they can be as far as four mutations away in the binary case. Every sequence has 6 neighbours in the quaternary case and 4 in the binary case. This means that the numbers of sequences that are not accessible by a single mutation are similar: 10 and 12 respectively. These numbers are shown in Table 4.3.
Given this fact, it looks as if redundancy has as much potential to increase the number of possible transitions in the quaternary case as it has in the binary case. Furthermore, as Table 4.3 shows, the number of pairs of sequences which are connected differs between the two cases. However, adding two bits or one quaternary letter of redundancy multiplies this number by four in both cases. Thus even though the numbers are different, the ratio is the same. These facts taken together indicate that although the underlying graphs are different, the potential for redundancy to add new paths between previously unconnected symbols is very similar in both cases.
Figure 4.14: A representation of sequence distances in the case of a quaternary alphabet.
Table 4.3: A summary of the differences between a binary alphabet and a quaternary one in the case where 16 symbols are encoded. Redundancy is assumed to result from the addition of two binary digits or one quaternary one.
                                                       Binary alphabet   Quaternary alphabet
Length of strings                                             4                  2
Maximum distance between strings                              4                  2
Number of neighbours of a sequence                            4                  6
Number of sequences more than one mutation away              12                 10
Number of pairs of sequences which are neighbours            32                 48
Number of pairs of sequences which are neighbours
after redundancy is added                                   192                288
4.7 Conclusion
This chapter has shown that the effect of permutations ranges from none at all to a 30% reduction in the number of optima. None of the forms of redundancy investigated here was found to have an overall negative effect. The number of neutral mutations induced by a permutation, which is its number of invariant elements, provides some indication of the value of Rσ. For the values of n tried here, the best patterns of redundancy displayed around 2^(n−1) neutral mutations. A better prediction of Rσ can be made by a careful analysis of the possibilities of transition between the pairs of symbols represented by the code.
The variable Rσ was defined as an indicator of a potentially beneficial interaction between the pattern of redundancy defined by σ and evolution by mutation and selection. We need, however, some confirmation that this variable is indeed fulfilling that purpose. In the next two chapters, some of the patterns of redundancy studied here will be included in simulations of evolution in the form of a GA. This will provide the ultimate measure by which we want to decide whether a pattern of redundancy is beneficial or not. It will also be the occasion to evaluate the usefulness of Rσ.
Chapter 5
Redundancy on trial in evolution
5.1 The Genetic Algorithm
We describe in this section the genetic algorithm that has been used in this chapter and the next one. We used a spatially distributed genetic algorithm where every individual in the population occupies a slot in a two-dimensional grid. The edges of the grid wrap around so that the grid is torus-shaped. Every slot in the grid contains exactly one individual. The grid was 20 cells wide and 20 cells high, giving a total population of 400 individuals. The GA performs the following sequence of steps (a sketch of this loop is given after the list):
• For each slot s of the grid in turn:
  – Pick t individuals from the neighbourhood of s. The probability of an individual being picked in any of these t choices is represented in Figure 5.1.
  – Identify the best two genotypes, P1 and P2, among those t individuals.
  – Perform a one-point crossover between P1 and P2 to produce a new genotype C.
  – Mutate each bit of C with probability pmut.
  – Replace the worst of the t genotypes by C.
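A minimal sketch of one such cycle (names are illustrative; the population is assumed to be a dict mapping (x, y) slots to bit-list genotypes, fitness a callable on genotypes, and the tournament size t, which the text leaves unspecified, is passed as a parameter):

import random

WIDTH = HEIGHT = 20   # toroidal 20 x 20 grid: a population of 400 individuals

def pick_slot(x, y, sd=3):
    """Pick a slot near (x, y) with a two-dimensional Gaussian (sd = 3 slots),
    wrapping around the edges of the torus."""
    return ((x + round(random.gauss(0, sd))) % WIDTH,
            (y + round(random.gauss(0, sd))) % HEIGHT)

def steady_state_step(pop, fitness, x, y, t, p_mut):
    """One replacement event of the steady-state GA at slot (x, y)."""
    slots = [pick_slot(x, y) for _ in range(t)]
    slots.sort(key=lambda s: fitness(pop[s]), reverse=True)
    p1, p2 = pop[slots[0]], pop[slots[1]]        # the best two of the t picks
    cut = random.randrange(1, len(p1))           # one-point crossover
    child = [b ^ (random.random() < p_mut) for b in p1[:cut] + p2[cut:]]
    pop[slots[-1]] = child                       # replace the worst of the t picks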
This defines a steady-state GA where one individual is replaced at a time. Unlike GAs where the entire population is replaced at the same time, there is no natural point in time marking the beginning of a new generation. We can nonetheless refer to 400 of the cycles above as a generation, since this is the time it takes to replace a number of individuals equal to the population size. This GA was used since there is mounting evidence that it gives good, reliable results over a range of problems (McIlhagga et al., 1996; Collins and Jefferson, 1991). In fact, many of the simulations presented here were also tried with a random-mating genetic algorithm with no appreciable difference in the results.
In some of the experiments described in this chapter, recombination is turned off and we replace the worst of the t individuals by a mutated version of the best one. Notice that because we always remove the worst individual among the t, there is no way in which the best individual in the population can be lost. Hence, the fitness of the best individual in the population as a function of time has to be an increasing or flat function; it cannot decrease. As will be explained later, various mutation rates were compared in all our experiments.
Figure 5.1: The probability of a slot being picked in a selection tournament. At the center, in grey, is the slot s handled by the GA. Numbers have to be divided by 10000. In less than 1% of the cases an individual outside this part of the grid will be chosen. This probability distribution is obtained from a two-dimensional Gaussian probability distribution with a standard deviation of 3 slots.
5.2 First test problem: a case of no epistasis
5.2.1 The problem
Consider a fitness function f defined on sequences of size rn in the following way:
f : {0,1}^(rn) → [0,1]
f(x11, ..., x1n, x21, ..., x2n, ..., xr1, ..., xrn) = (1/r) (f1(x11, ..., x1n) + f2(x21, ..., x2n) + ... + fr(xr1, ..., xrn))
with fi : {0,1}^n → [0,1], 1 ≤ i ≤ r.
The r functions fi operate independently, assigning a fitness to each block of n bits; the value of f is the average of these r values. A function fi is defined by explicitly assigning a number between 0 and 1 to each of the 2^n values that its input can take. The definition of f therefore requires that we generate in total r·2^n values between 0 and 1.
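A minimal sketch of this construction (names are illustrative; here the per-block tables are simply filled with uniform random values, and the particular values used in the experiments are described next):

import random

def make_tables(n, r):
    """One look-up table per block: a value in [0, 1] for each of the 2**n inputs."""
    return [[random.random() for _ in range(2 ** n)] for _ in range(r)]

def f_additive(bits, tables, n):
    """Average the per-block look-ups; bits is a flat list of r*n 0/1 values."""
    total = 0.0
    for i, table in enumerate(tables):
        block = bits[i * n:(i + 1) * n]
        index = int("".join(map(str, block)), 2)   # read the block as a binary number
        total += table[index]
    return total / len(tables)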
In the experiments that will be described here, we have set n to 3 and r to 100: one hundred look-up tables of 8 entries each are thus required to define f. When defining the look-up table for an fi function, we make sure that the worst value is 0, the second worst value is 1/7, ..., the second best value is 6/7, and the best value is 1. Hence, the eight values {0, 1/7, 2/7, 3/7, 4/7, 5/7, 6/7, 1} each appear once and only once in the table, but in any order.
The interaction between the blocks is purely additive. Hence, each fi can be optimised independently of the others and the global optimum of f is the concatenation of the optima of f1, f2, ..., fr. This is therefore a case of total absence of epistasis between the r blocks. The function will nonetheless have many optima; in the case where n = 3, each fi has on average 2 maxima and f therefore has an average of 2^r local maxima. In the field of theoretical population genetics, such additive models (or multiplicative ones, which have the same property) are commonly used (Peck et al., 1997; Nagylaki, 1994; Turelli and Barton, 1994; Goodnight, 1995). There are two reasons for that. One is that they are accurate models of the biological reality of the interaction of genes that contribute to a common trait (Ehdaie and Waines, 1994; Larsen, 1994). The other is that they are more easily analysed than any other model.
On the other hand, for the optimisation of a function to require the use of a genetic algorithm, there has to be some epistasis in the sense described above. This could therefore raise doubts about the relevance of a function with no epistasis as a test problem. Fitness functions associated with real-world problems will typically have unknown properties. Testing novel features of a GA on such functions is therefore risky, since one does not understand the underlying topology of the fitness landscape. Ultimately, of course, a GA has to prove itself on such functions, but experimenting with a GA is probably best done, in the first instance, in a carefully controlled environment.
In the case of the function just defined, the selection pressure at any one locus will be the same at all times. Once the optimum value has been found at a locus, selection does not have to do any more work, except for the occasional mutation which might displace it. In the presence of epistasis this is no longer the case. A locus that has reached an optimum in a given context might lose its optimality simply from changes at other loci. This means that selection might need to optimise the same loci many times over. Hence, if redundancy helps selection in this process, we could expect it to be more beneficial in an epistatic case, everything else being equal, since selection has more work to perform.
Functions of the type of f can be considered a special case of a wider class of functions known as NK fitness landscapes. These functions have been proposed as a tool to investigate the dynamics of evolution by mutation and selection (Kauffman and Levin, 1987; Kauffman, 1993). In an NK fitness landscape, every bit contributes additively to the fitness of the genotype as a whole. The contribution of a bit, however, does not depend only on its value but also on the values of K other bits. These bits can be anywhere on the chromosome and their location has to be specified for each individual bit.
The relationship does not have to be reciprocal: if the value of bit i is needed to calculate the contribution of bit j, it does not have to be the case that the value of bit j is needed to calculate the contribution of bit i. The parameter N is the total number of bits in the genotype. Functions of the type of f are therefore instances of NK fitness landscapes with N equal to nr and K equal to n − 1. They have an additional constraint over general NK fitness landscapes: the epistatic interactions are confined within each of the r blocks, and within these blocks each of the n bits is in epistatic interaction with every other one. It could be objected that our functions f are a sum of contributions from groups of n bits, whereas in NK fitness landscapes contributions come from individual bits. This is only an apparent difference. We can split the contribution of a group of n bits into n equal contributions which are then assigned to individual bits. In the case where n = 3, this transformation is shown in Figure 5.2.
Figure 5.2: A redefinition of function f in NK fitness landscape terms. The table fi, which assigns the value ak to the triplet with binary value k (a0 to 000, a1 to 001, ..., a7 to 111), is split into three identical tables fi1, fi2 and fi3, each assigning ak/3 to the same triplet.
5.2.2 Introducing redundancy
Chapter 3 showed how a permutation σ of 2^n elements can be used to characterise the addition of one bit of redundancy to a function such as fi. Denoting fi^σ the resulting function, we have:
fi^σ : {0,1}^(n+1) → [0,1]
such that
fi^σ(0, x1, x2, ..., xn) = fi(x1, x2, ..., xn)
fi^σ(1, x1, x2, ..., xn) = fi(σ(x1, x2, ..., xn))
Transforming the r functions fi through the same permutation σ, we can transform function f into
f^σ : {0,1}^(r(n+1)) → [0,1]
such that
f^σ(x10, x11, ..., x1n, x20, x21, ..., x2n, ..., xr0, xr1, ..., xrn) = (1/r) (f1^σ(x10, x11, ..., x1n) + f2^σ(x20, x21, ..., x2n) + ... + fr^σ(xr0, xr1, ..., xrn))
The bits xi0 are redundancy bits which did not exist under f. Transforming f into f^σ therefore amounts to inserting a redundancy bit to the left of each of the r blocks and using the value of that new bit to decide whether, for block i, fi^σ(xi0, xi1, ..., xin) is equal to fi(xi1, ..., xin) or to fi(σ(xi1, ..., xin)). The transformation of f into f^σ is therefore the result of applying the same permutation-induced redundancy to each of the r blocks which jointly define a genotype for f.
Note that here again f^Id, the function obtained by applying the permutation Id, is almost identical to f. It is defined on sequences which are r bits longer, but the values of those bits are irrelevant to its value. These bits are therefore junk genetic material and, provided that the per-bit mutation rate is kept the same, f^Id will take exactly the same time to be optimised as f. Since n is set to 3 and r to 100, the length of the chromosome on which f^σ is defined will be 400. To calculate the value of f^σ, we parse the genotype into 100 blocks of 4 bits and, for each block, the value of the first bit (xi0) decides whether the contribution of the following three bits (xi1, xi2, xi3) is fi(xi1, xi2, xi3) or fi(σ(xi1, xi2, xi3)). The values resulting from the 100 blocks are then added up to give the value of f^σ.
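Continuing the sketch of Section 5.2.1, the decoding of f^σ only changes the way each block is read (sigma is assumed to be a list of 2**n integers, and tables and n are as before):

def f_sigma(bits, tables, n, sigma):
    """Each block now has n + 1 bits; the leading redundancy bit selects between
    the direct look-up and the sigma-permuted look-up."""
    total = 0.0
    for i, table in enumerate(tables):
        block = bits[i * (n + 1):(i + 1) * (n + 1)]
        index = int("".join(map(str, block[1:])), 2)
        if block[0] == 1:          # redundancy bit set: read through sigma
            index = sigma[index]
        total += table[index]
    return total / len(tables)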
In the experiments described here, f^Id was always used to represent the situation where no redundancy is added.
5.2.3 Experimental procedure
On this problem we compared the effect of six different permutations chosen to cover the range of possible Rσ values. The following table defines these permutations and indicates their Rσ values.
σ            Rσ
[07143562]   0.3225
[50241367]   0.2633
[10234567]   0.1464
[40576123]   0.0852
[40675123]   0.0453
Id           0.0000
As usual, a permutation is defined by listing the values of [σ(0)σ(1)...σ(7)], except for the identity permutation, which is referred to as Id. The permutation at the top is the one with the highest Rσ value in P3. For each permutation σ in the table, we performed the following steps.
For each of 50 trials:
• Define f by generating 100 random fi functions of 8 entries each,
• Generate an initial random population,
• For 400 generations, run the GA on f^σ and calculate, after each generation, the number of blocks which are optimal in the best individual.
The average of this value over all 50 trials was then calculated. At the end of this procedure, we have the average number of blocks optimised in the best individual after g generations, for all values of g between 1 and 400. The reason for averaging these results over 50 trials is that the functions f are of variable difficulty for evolution, depending on the number of local optima in the underlying fi functions, which are generated randomly. If all 100 fi functions have a single optimum, f will be extremely easy to optimise; if, on the other hand, all 100 fi functions have 4 optima, f will be much more difficult to optimise. In each of the 50 trials, a new instance of f was generated, ensuring that our comparison between the performances of different permutations is not biased by some permutations getting easier functions to optimise than others. This averaging is also justified by the stochastic nature of the GA.
Figure 5.3 and Figure 5.4 display the number of blocks optimised in the best individual for each of the 6 permutations as a function of the genomic mutation rate. In Figure 5.4 the comparison is made after 400 generations, while in Figure 5.3 it is made after only 100 generations. Mutation rates were varied between 0.5 and 5 in steps of 0.5. The probability of a bit being mutated can be obtained by dividing this number by 400, the total number of bits in a chromosome.
These graphs make it possible to determine an optimal mutation rate for each of the permutations, which we believe is crucial for a meaningful comparison between the different permutations. Suppose that we compared permutations σ1 and σ2 for an arbitrary value of the mutation rate and concluded that σ1 was better than σ2 because it led to more blocks being optimised in a given number of generations. Unless we actually explore a range of mutation rates, we cannot be confident that this conclusion is not the result of a good match between the mutation rate used and permutation σ1. Since ultimately the mutation rate can be fine-tuned to maximise performance, the only meaningful way of comparing redundant codes is to use the best mutation rate possible for each permutation.
5.2.4 Results
Examination of Figure 5.4 indicates that, for all permutations, the mutation rate has a noticeable impact on the performance of the GA. The pattern that emerges, both with and without recombination, is an improved performance with increased levels of mutation until a maximum is reached.
Beyond that point, increasing the mutation rate degrades the performance of the GA.
The optimal mutation rate depends on the permutation used and on whether recombination is used or not. For any given permutation, the optimal amount of mutation is higher when recombination is off than when it is on. This is understandable since, in the absence of recombination, mutation becomes the only source of novelty in the system. A higher level of mutation is therefore needed to compensate for the absence of the other source of variation.
Figure 5.3: First problem: the proportion of optimal blocks in the best individual after 100 generations as a function of the mutation rate, (a) with recombination off and (b) with recombination on. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.4: First problem: the proportion of optimal blocks in the best individual after 400 generations as a function of the mutation rate, (a) with recombination off and (b) with recombination on. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.5: First problem: the proportion of optimal blocks in the best individual as a function of the number of generations, (a) with recombination off and (b) with recombination on. For each permutation, the optimal mutation rate has been used and is indicated in the legend. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.6: First problem: comparing the speed of evolution with permutations [07143562] and Id, (a) with recombination off and (b) with recombination on.
For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
Table 5.1: The relationship between the number of blocks optimised after 400 generations and the value of Rσ.
(a) Recombination OFF                     (b) Recombination ON
σ            Rσ       Opt                 σ            Rσ       Opt
[07143562]   0.3225   0.8145              [07143562]   0.3225   0.8405
[50241367]   0.2633   0.7923              [50241367]   0.2633   0.8208
[10234567]   0.1464   0.7587              [10234567]   0.1464   0.7842
[40576123]   0.0852   0.7477              [40576123]   0.0852   0.7753
[40675123]   0.0453   0.7239              [40675123]   0.0453   0.7685
Id           0.0000   0.7327              Id           0.0000   0.7639
Table 5.1 shows, for all 6 permutations, the proportion of blocks which are optimal after 400 generations when the best mutation rate is used. If we compare the Rσ value of a permutation with the performance it achieves, we see that a higher value of Rσ always leads to a greater number of optimised blocks. The only exception is Id, which optimises more blocks than [40675123] when recombination is turned off, but these two permutations are very similar in their Rσ values.
When Rσ was defined in Chapter 3, it was intended as a fast way of estimating the potential of a pattern of redundancy to facilitate adaptation. However, in order to keep the calculation of Rσ tractable, we had to make the following assumptions about the impact of redundancy on evolution:
• that the neutral paths created by redundancy would be effectively used as a way of escaping what would otherwise be local optima,
• that the action of recombination could be ignored in a first approximation.
Although these assumptions sound intuitively acceptable, an experimental validation of the adequacy of Rσ was needed. The results just described indicate that Rσ fulfils its purpose in a most satisfactory way.
Furthermore, Chapter 3 showed that patterns of redundancy lead in some cases to a large reduction in the number of optima. However, even assuming a correlation between such a reduction and some beneficial effect for adaptation, it could have been that this effect was too small to be interesting. The values found in Table 5.1 show that this is not the case; permutations with large values of Rσ have an impact large enough to be of practical relevance.
We can put this impact into perspective by comparing it with that of recombination, the GA operator par excellence. Comparison of Tables 5.1(a) and (b) shows that, for any permutation, recombination substantially increases the number of blocks optimised. In the absence of redundancy (σ = Id) and recombination, 73.27% of the blocks are optimised after 400 generations. Adding recombination but no redundancy increases this amount to 76.39%. Adding redundancy but no recombination (σ = [07143562]) increases it to 81.45%. Hence, the improvement brought about by redundancy is more than twice as large as the one brought about by recombination. This comparison is made in terms favourable to recombination, given that epistasis between blocks is null. Notice also that the improvement brought about by redundancy and recombination together is roughly equal to the sum of the improvements brought about by each of them individually.
It thus looks as if the two operate completely independently of each other.
In Figure 5.5, we have plotted, for the six permutations, the number of blocks optimised by the best individual in the population (averaged over 50 trials) against the number of generations. For every permutation, the optimal mutation rate has been used and is indicated in the legend. All the curves are very stable and almost flat. Comparison of Figure 5.3 and Figure 5.4 shows that the optimal mutation rate depends on the number of generations we want the GA to run for. The general trend is that the mutation rate giving the best result after 100 generations is lower than the one that optimises performance after 400 generations. Because we start from a truly random population, the genetic variation that exists in the population is very large at the beginning but decreases with time to reach an equilibrium value that depends on both the mutation rate and whether or not recombination is used. Hence, during the first 100 generations selection operates in an environment exceptionally rich in genetic diversity. Mutation is therefore not needed as much at that stage as it is in later stages of the evolutionary process.
We have seen that permutations with high values of Rσ lead to a higher number of blocks being optimised in a given amount of time. But this number is not a satisfactory currency for comparison, since many more blocks are optimised per unit of time at the beginning of a GA run than at the end. Instead of comparing the number of blocks optimised after a given number of generations, we can take the reverse approach of comparing the time taken by two permutations to optimise a given number of blocks. This is more revealing because the number of generations is proportional to the number of function evaluations, which is the yardstick normally used to compare optimisation methods.
In Figure 5.6, permutation [07143562] and the identity permutation are plotted on the same graph over the entire length of the experiment. We compare these two permutations because one should yield the best results possible with redundancy and the other shows what would happen in the absence of redundancy. The level of optimisation achieved by the identity permutation after 400 generations was chosen as the basis for comparison. Figure 5.6 shows that with permutation [07143562] roughly 100 generations are needed to achieve that same level. This is true whether recombination is on or off, confirming that the impact of redundancy is not affected by it. Redundancy at its best thus achieves a remarkable fourfold increase in the speed of optimisation on this problem. As a comparison, recombination achieves only slightly less than a twofold increase in speed using the same criteria.
5.3 Second test problem: selection for a periodical chromosome
The previous problem has shown that the impact of redundancy is very significant on a special class of NK fitness landscapes with no epistasis between triplets of bits. This section examines the impact of redundancy on a completely different problem.
5.3.1 The problem
In this problem, the chromosome is read by triplets of bits and each of the eight possible values is translated into a different symbol from the set {A, B, C, D, E, F, G, H}. Fitness depends on the number of positions that separate successive occurrences of the same symbol.
Figure 5.7: The value of f(i, j) as a function of j.
If i and j are the positions of two successive occurrences of the same symbol, their contribution to fitness is
f(i, j) = | |i − j|[8] − 4 |
where |i − j|[8] is the remainder of |i − j| in the division by 8. The action of this function is easily understood by examination of Figure 5.7, where f(i, j) has been represented for values of j varying around an arbitrary value of i. The function is maximal and equal to 4 whenever consecutive occurrences of the same symbol on the chromosome are separated by a number of positions that is a multiple of eight. The case i = j never comes into consideration since we are only interested in different occurrences of the same symbol. The function is minimal and equal to 0 whenever the two symbols are separated by a number of positions which is half-way between multiples of 8, such as 4, 12, 20, ... In between these periodic maxima and minima, the function varies linearly with the difference between i and j.
This function describes the contribution to fitness of two consecutive occurrences of the same symbol on the chromosome. The total fitness of the chromosome is the sum of all such contributions. A possible algorithm to calculate that fitness is therefore:
fitness := 0
for each symbol X in {A, B, C, D, E, F, G, H}:
    for each pair of consecutive appearances of X on the chromosome, at positions i and j:
        add f(i, j) to fitness
Supposing X appears at positions i and j on the chromosome, f(i, j) will only be added to the fitness of the chromosome if X appears nowhere between i and j. If symbol X appears NX times on the chromosome, there are therefore NX − 1 contributions to fitness from that symbol.
The identity of optimal sequences depends marginally on the size of the chromosome. Consider for instance a chromosome coding for q symbols (i.e. of 3q bits) that contains only the symbol A. Since every occurrence of A next to itself will contribute 3 to fitness (f(i + 1, i) = 3), the fitness of the chromosome will be 3(q − 1). Consider now a chromosome where the eight possible symbols appear at the first eight positions and are subsequently repeated in the same order along the chromosome. ACDEFGHBACDEFGHBACDEFGHB... would be an instance of such a chromosome. If the chromosome is q symbols in length, its fitness will be 4(q − 8), since every symbol except the first 8 will contribute 4 to fitness. Such a chromosome will be better than the one made of a single symbol if
4(q − 8) ≥ 3(q − 1) ⇔ q ≥ 29
Hence, for a chromosome of length greater than 29, the optimal chromosome will be of the second type. In all the results described hereafter, the value of q is set to 100 and chromosomes are therefore 300 bits in length.
Because the contribution to fitness of a symbol depends on other occurrences of the same symbol on the chromosome, the optimal value for the three bits that define a symbol depends entirely on the information content of other parts of the genome. In other words, if we are given no information about the rest of the chromosome, all eight symbols are equally likely to be optimal at a given locus. Contrast this with the situation in the problem of Section 5.2. There, one of the triplets was optimal at a locus regardless of the alleles found at other loci. The identity of the optimal triplet changed from one trial to the next, but it remained constant for the duration of a trial.
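A sketch of this fitness computation on the decoded sequence of symbols (illustrative names; the chromosome is assumed to have already been translated into its list of symbols):

def pair_contribution(i, j):
    """f(i, j) = | (|i - j| mod 8) - 4 |: equal to 4 when the spacing is a
    multiple of 8, and to 0 when it is 4, 12, 20, ... positions."""
    return abs(abs(i - j) % 8 - 4)

def periodic_fitness(symbols):
    """Sum pair_contribution over consecutive occurrences of each symbol."""
    last_seen = {}
    total = 0
    for j, s in enumerate(symbols):
        if s in last_seen:                      # s last occurred at last_seen[s]
            total += pair_contribution(last_seen[s], j)
        last_seen[s] = j
    return total

On a 100-symbol chromosome made only of A this returns 297 = 3(q − 1), and on the periodic chromosome above it returns 368 = 4(q − 8), as computed in the text.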
In the present problem, the optimal value for a symbol at a specified position on the chromosome changes as the rest of the chromosome changes. If we examine in more detail the nature of these epistatic interactions, we find that, given a symbol X at position i on the chromosome, its contribution to fitness will be determined by the leftmost occurrence of X to the right of i, and by the rightmost occurrence of X to the left of i. Hence, we cannot point at pairs of loci which interact together all the time. The nature of the epistatic interaction itself depends on the assignment of other loci. One property always holds: there will be a maximum of 16 other loci with which a locus has epistatic interactions. These are the nearest occurrences of the eight possible symbols to the left of the locus and the eight to the right of the locus.
This problem is meant as a crude analogy to amino acids evolving to optimise the shape and function of a protein. The basis for this analogy is the following. Consider three consecutive base positions i, i + 1, i + 2 on a chromosome which together code for an amino acid A in a protein P. What determines the optimal bases for positions i, i + 1 and i + 2 is the context defined by the other amino acids that are part of P. The value of i in itself is irrelevant; if the coding sequence is shifted on the chromosome, the optimality of A at the new position remains the same. In the problem defined here too, the optimal values of positions i, i + 1, i + 2 on the chromosome are completely dependent on information held at other points on the chromosome and not at all on the value of i. As we pointed out, this is exactly the reverse of the previously defined problem, where the optimal value of positions i, i + 1, i + 2 depended exclusively on the value of i. These two problems are therefore at the two extremes of a spectrum. Other aspects of the interaction between amino acids in proteins are extremely complicated to model, and the function f(i, j) makes no pretence of being faithful to them.
5.3.2 Adding redundancy
Adding redundancy to this problem is straightforward. The mapping T between triplets of bits and symbols from the set {A, B, C, D, E, F, G, H} corresponds exactly to what we defined in Chapter 3 as a non-redundant code. Hence the procedure described in that chapter can be applied literally here. That is, from the arbitrary mapping between triplets of bits and symbols defined on the left side of Figure 5.8, the redundancy defined by a permutation σ will result in the mapping shown on the right side of that same figure.
a1a2a3   T(a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)
000      A             0000       A                 1000       T(σ(000))
001      B             0001       B                 1001       T(σ(001))
010      C             0010       C                 1010       T(σ(010))
011      D             0011       D                 1011       T(σ(011))
100      E             0100       E                 1100       T(σ(100))
101      F             0101       F                 1101       T(σ(101))
110      G             0110       G                 1110       T(σ(110))
111      H             0111       H                 1111       T(σ(111))
Figure 5.8: Second problem: transforming the encoding through permutation σ.
Without redundancy, the expression of 100 symbols would require 300 bits. Since each symbol is now encoded by 4 bits, we need 400 bits to encode the 100 symbols. On this problem we only compared the performance of the identity permutation with that of permutation [07143562], which is the permutation with the highest Rσ value. In the case of the identity permutation, the leftmost bit of every block is irrelevant to the decoding of the other 3, and the results will be the same as if we had not used any redundancy.
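The transformed code T^σ amounts to a small change in the decoding loop; a sketch, where SYMBOLS encodes the arbitrary mapping T of Figure 5.8 and sigma is a list of eight integers:

SYMBOLS = "ABCDEFGH"   # T: triplet value 0..7 -> symbol, as in Figure 5.8

def decode_redundant(bits, sigma):
    """Read blocks of four bits; the leading bit of each block routes the
    remaining triplet through sigma before the table T is applied."""
    symbols = []
    for k in range(0, len(bits), 4):
        index = 4 * bits[k + 1] + 2 * bits[k + 2] + bits[k + 3]
        if bits[k] == 1:
            index = sigma[index]
        symbols.append(SYMBOLS[index])
    return symbols

With sigma = [0, 7, 1, 4, 3, 5, 6, 2] this reproduces the right-hand column of Figure 5.9, and with the identity it simply ignores the leading bit of each block.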
Figure 5.9 shows the redundant code that results from the application of permutation [07143562].
a1a2a3   T(a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)     d1a1a2a3   T^σ(d1a1a2a3)
000      A             0000       A                 1000       A
001      B             0001       B                 1001       H
010      C             0010       C                 1010       B
011      D             0011       D                 1011       E
100      E             0100       E                 1100       D
101      F             0101       F                 1101       F
110      G             0110       G                 1110       G
111      H             0111       H                 1111       C
Figure 5.9: Second problem: transforming the encoding through permutation [07143562].
5.3.3 Results
The results described here were obtained using the same procedure as in the previous problem. Given that on that problem redundancy and recombination did not interfere with each other, we did all our runs with recombination, since these are the standard conditions for a GA. As before, all the points on the graphs are the average of 50 trials.
Figure 5.10 plots the fitness of the best individual as a function of the mutation rate for both [07143562] and the identity permutation. Panel (a) displays the fitnesses reached after 100 generations, while panel (b) displays them after 400 generations. This figure clearly shows that the GA operating with redundancy finds better solutions than the one operating without it. This is true after 100 and after 400 generations, and presumably for any number of generations in between.
Both panels in Figure 5.10 are very similar. In both cases, the curve for permutation [07143562] is made of an increasing segment followed by a decreasing one. The maximum fitness is achieved in both cases for a rate of 2 mutations per genome. This maximum is quite well marked, with the values around it significantly lower. This contrasts with the other curve, where a range of mutation rates achieves maximal or nearly maximal fitness. After 100 generations, the performance gap that exists between the runs with and without redundancy closes up for large values of the mutation rate. For mutation rates of 4.5 or 5, the two curves are on top of each other and decreasing fast. After 400 generations, the performance gap still exists for these mutation rates and performance is not declining as strongly. In the case of no redundancy, the performance for a mutation rate of 5 is still almost maximal. It therefore seems that high mutation rates are bad in the short term but manage to offset this disadvantage in the long run. This probably is a consequence of the random-population effect which was mentioned earlier. A higher mutation rate becomes more appropriate as the high genetic variation found in the initial random population is eroded by selection.
Figure 5.11 shows fitness against the number of generations for both permutations. The mutation rate that has been used for the identity permutation is the one that achieves maximum fitness after 400 generations. The mutation rate used for permutation [07143562] is the one that reaches the same fitness in the shortest time. As in the previous problem, we used this figure to compare the time taken with and without redundancy to reach the same level of fitness. Taking the fitness reached after 400 generations without redundancy as the basis for comparison, we see that the GA operating with a redundant code reaches
that same level in slightly more than 160 generations. This is not quite as large a gain as in the previous case, but it is still very significant from a practical point of view.
Figure 5.10: Second problem: the fitness of the best individual as a function of the expected number of mutations per genome, (a) after 100 generations and (b) after 400 generations. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.
Figure 5.11: Second problem: comparing the speed of evolution with permutations [07143562] and Id (m = 2 for [07143562], m = 2.5 for Id). For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
Both curves have a very strong upward slope at the start, where they are almost indistinguishable. From generation 40, both curves experience a decrease in their rate of improvement, and by generation 150 they have stabilised at a much slower slope. At that point, however, the curve for redundancy is a great deal higher than the other. The superiority of redundancy therefore appears to be rooted to a large extent in that intermediary phase. Redundancy seems to be able to extend the period of high growth a little longer.
5.4 Third test problem: finding a compact non-overlapping path on a grid
5.4.1 The problem
In this problem, chromosomes represent paths through a two-dimensional grid made of square cells. A path is defined as a succession of steps from one cell to a neighbouring one. From any cell on the grid, there are eight such neighbouring cells and hence eight possible moves, as shown in Figure 5.12. The grid is large enough to ensure that no path ever reaches its edges: since the paths considered here are 100 moves in length, the grid extends more than 100 cells in all directions from the initial cell.
Figure 5.12: From the cell at the center, eight moves are possible, as indicated by the eight arrows.
In the non-redundant version of the problem, a chromosome is decoded into a path in the following way. The chromosome, which is 300 bits in length as in the two previous problems, is read from left to right by groups of three bits. The eight possible values that those three bits can take map to the eight moves possible from the current cell. Performing these moves on the grid in the order in which they appear on the chromosome leads to a complete path.
The fitness function that was used in these experiments was set up so as to select for two somewhat antagonistic criteria. The first one is that the path goes through as many different cells as possible. A path always going in the same direction would be optimal in this respect, since it would never cross the same cell twice and would therefore include 101 distinct cells. On the other hand, a path that went alternately up and down would only go through two cells and would have the lowest possible score.
The component of fitness corresponding to this criterion is simply defined as the number of cells, other than the initial one, which are traversed by the path. Hence the first of the examples would score 100 while the second would score 1.

The second criterion is that the rectangle enclosing the entire path has the smallest possible perimeter. Suppose we define a system of coordinates on the grid such that the cell from which the path starts has coordinates (0, 0) and a cell with coordinates (X, Y) is found by moving X times to the right and Y times up on the grid. Negative values of X and Y correspond to cells which are respectively left of and below the initial cell. Call Xmin the X coordinate of the leftmost cell reached by the path, Xmax the X coordinate of the rightmost one, Ymin the Y coordinate of the bottom-most one and Ymax the Y coordinate of the top-most one. The variable (Xmax − Xmin) + (Ymax − Ymin) measures half the perimeter of the smallest rectangle enclosing the entire path (up to an additive constant, which is the same for all paths). Since Xmax ≥ Xmin and Ymax ≥ Ymin, this variable is never negative. Fitness is obtained by subtracting its value from the value obtained from the first criterion.

In the case of a path made of 100 consecutive moves to the right, Xmax would be equal to 100 while Xmin, Ymax and Ymin would all be equal to 0. Hence the 100 scored on the first criterion would be exactly cancelled by the 100 subtracted for the second, resulting in a fitness of 0.

There are many different optimal paths for this fitness function. The maximum score that can be obtained from the first criterion is 100; it requires that 101 distinct cells are crossed by the path. The rectangle with the smallest perimeter that can encompass that number of cells is 11 cells by 10, for which (Xmax − Xmin) + (Ymax − Ymin) = 10 + 9 = 19. The total fitness would then be 100 − 19 = 81. Alternatively, the entire path could be fitted in a 10 by 10 square by having exactly one cell traversed twice; the fitness would then be 99 − 9 − 9, which is the same.

5.4.2 Adding redundancy

Here again, redundancy can be added along the precise lines of Chapter 3. The mapping between groups of 3 bits and directions of movement used in the non-redundant version is shown in the table on the left of Figure 5.13. Introducing redundancy with permutation σ = [07143562] leads to the code shown on the right of that same figure.

Figure 5.12: From the cell at the centre, eight moves are possible, as indicated by the eight arrows (N, NE, E, SE, S, SW, W, NW).

    a1a2a3  T(a1a2a3)     d1a1a2a3  Tσ(d1a1a2a3)     d1a1a2a3  Tσ(d1a1a2a3)
    000     N             0000      N                1000      N
    001     NE            0001      NE               1001      NW
    010     E             0010      E                1010      NE
    011     SE            0011      SE               1011      S
    100     S             0100      S                1100      SE
    101     SW            0101      SW               1101      SW
    110     W             0110      W                1110      W
    111     NW            0111      NW               1111      E

Figure 5.13: Third problem: transforming the encoding through permutation [07143562]. Moves are abbreviated by their geographical equivalent: N for north, NE for northeast, and so on.

Redundancy takes the length of the chromosome up to 400 bits. The case of no redundancy is here again studied via the identity permutation, Id.
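To make the decoding and the fitness function of this problem concrete, here is a short Python sketch; the function names and data layout are ours, and the move table follows the left column of Figure 5.13. It reads a 300-bit chromosome three bits at a time, walks the corresponding path on the grid, and scores it as described in Section 5.4.1.

    # A sketch of the path decoding and fitness of this problem (names are ours).
    # Each 3-bit group indexes a move, following the left table of Figure 5.13.
    MOVES = [(0, 1), (1, 1), (1, 0), (1, -1),      # N, NE, E, SE
             (0, -1), (-1, -1), (-1, 0), (-1, 1)]  # S, SW, W, NW

    def path_fitness(chromosome):
        """chromosome: a string of 300 bits; returns the number of distinct
        cells visited (excluding the start) minus (Xmax-Xmin) + (Ymax-Ymin)."""
        x, y = 0, 0
        visited = {(0, 0)}
        for k in range(0, len(chromosome), 3):
            dx, dy = MOVES[int(chromosome[k:k + 3], 2)]
            x, y = x + dx, y + dy
            visited.add((x, y))
        xs = [cx for cx, _ in visited]
        ys = [cy for _, cy in visited]
        return (len(visited) - 1) - ((max(xs) - min(xs)) + (max(ys) - min(ys)))

    # 100 moves east: 101 distinct cells, fitness 100 - 100 = 0, as in the text.
    assert path_fitness('010' * 100) == 0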
5.4.3 Results

Figure 5.14 shows the fitness of the best individual in the population as a function of the genomic mutation rate. Every point on those graphs is the average of 50 trials, as in previous experiments. Panel (a) shows the fitness achieved in 100 generations while panel (b) shows fitness after 400 generations. The mutation rate is increased up to a value of 10, compared with only 5 in the two previous problems: because performance is nearly optimal at a value of 5, the range had to be extended in order to display the decline in performance at high mutation rates.

The two panels shown in Figure 5.14 are very similar to what was obtained in the two previous problems. Redundancy here again leads to a better performance of the GA when the comparison is made at the optimal mutation rates. In both panels, the following trend can be observed. Both curves follow a straight downward line past a certain mutation rate; however, the curve for redundancy starts its descent at a slightly lower mutation rate. This results in the curves crossing over, the curve for redundancy lying below and parallel to the other one from that point onwards.

Figure 5.14: Third problem: the fitness of the best individual as a function of the expected number of mutations per genome. Panel (a): after 100 generations; panel (b): after 400 generations. Each line corresponds to the use of a different permutation. Every point is the average of 50 trials. Error bars indicate the standard error.

The offset between the two curves at high mutation rates can be explained as follows. All mutations of the first bit in a block of four are neutral when permutation Id is used, while only 3/8 of these are neutral when permutation [07143562] is used. Hence, at a given genomic mutation rate, more mutations are neutral with Id than with [07143562]. Given that in that part of the graph (m > 7) the excess of mutation is hindering performance, we expect Id to be favoured because its high proportion of neutral mutations shields it in part from the excess. Both curves would be on top of each other if the rate of non-neutral mutations were plotted.

The mutation rate at which performance starts to degrade (both with and without redundancy) is higher after 400 generations than it is after 100. This was also observed in the previous examples, and an explanation was suggested on page 92.

Figure 5.15 shows fitness plotted against time for both the redundant code and the non-redundant one. This graph allows a comparison in time as performed in the previous sections. The fitness reached without redundancy in 400 generations can be reached in just under 200 with a redundant code. This is a twofold increase in speed, comparable to what was obtained in the previous problem.

Figure 5.15: Third problem: comparing the speed of evolution with permutations [07143562] (m=3) and Id (m=5). For each permutation, the optimal mutation rate has been used and is indicated in the legend. The dashed line shows the number of generations it takes permutation [07143562] to reach the level of optimality obtained after 400 generations with permutation Id. Every point is the average of 50 trials. Error bars indicate the standard error.
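The 3/8 figure quoted above can be checked directly: flipping the redundancy bit d1 changes the decoded symbol from T(a) to T(σ(a)), so the mutation is neutral precisely when σ leaves the triplet a fixed, and the neutral fraction is the number of fixed points of σ divided by 8. A short Python check, with the permutation encoded as in the earlier sketch:

    # Neutral d1-mutations correspond to fixed points of the permutation:
    # flipping d1 turns T(a) into T(sigma(a)), which is neutral iff sigma(a) = a.
    sigma = [0, 7, 1, 4, 3, 5, 6, 2]      # [07143562]: fixed points 0, 5 and 6
    identity = list(range(8))
    neutral_fraction = lambda p: sum(p[a] == a for a in range(8)) / 8
    print(neutral_fraction(sigma), neutral_fraction(identity))  # 0.375 and 1.0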
5.5 Conclusion

The previous chapter showed indirectly that certain forms of redundancy could facilitate evolution's search for fitter individuals. This chapter has shown that this effect is not just a theoretical possibility: it can be demonstrated, for some mutation rates, when evolution is simulated using a GA. We also showed that the previously defined variable Rσ correlates well with the magnitude of the improvement observed and can therefore be used reliably as a way of detecting useful redundancy.

On three very different problems, a comparison was made between a GA using a non-redundant code and the same GA using a redundant version of that code in the sense defined in Chapter 3. The pattern used has the highest Rσ value possible. Performance was always optimised with respect to the mutation rate prior to any comparison. Redundant codes were found to perform much more effectively on all three problems. This was illustrated by a reduction by a factor of between 2 and 4 in the number of evaluations needed to reach an arbitrary level of fitness.

Chapter 6
Some limitations to the benefits of redundancy

6.1 Application of redundancy to the design of a wing box

6.1.1 The problem and the original encoding

The problem described here was defined as part of the Genetic Algorithms in Manufacturing Engineering (GAME) project. British Aerospace, one of the industrial partners in this project, provided data inspired by the design of the Airbus wing box for its definition. It is common in aircraft structure design to be faced with the problem of defining mechanical structures of minimal weight that can withstand a given load. The formulation of a reliable and easy-to-use optimisation procedure which can rapidly discover good solutions to these high-dimensional problems is still an open challenge. One of the aims of the GAME project was to assess the performance of GAs on that challenge.

Figure 6.1 displays a simple sketch of the elements of wing structure which are relevant to this problem. The wing is supported at regular intervals by solid ribs which run parallel to the aircraft's fuselage. On the upper part of the wing, thin metal panels cover the gap separating adjacent ribs. The number of these panels is equal to the number of ribs minus one. The objective is to optimise the number of ribs (or panels) and their thickness in such a way that the weight of the wing is minimal and that it does not buckle under the compressive stresses produced by the bending moments of a 2.5g manoeuvre. The ribs are assumed to be strong enough to sustain their corresponding load, and their buckling is not considered; only the panels can buckle. All dimensions of the wing are fixed.

Figure 6.1: The relevant elements of a wing (fuselage, ribs, top panels, rib pitch, cavity). The wing dimensions are fixed. The variable elements are the number of ribs and the thickness of the top panels.

Mass has to be minimised; it is therefore sensible to take as the fitness of a candidate wing its mass preceded by a minus sign. The mass of a panel depends on its thickness and its position on the wing: because the wing is tapered, the panels near the tip have smaller dimensions and thus a lower mass for a given thickness. The total mass of the ribs depends only on the rib pitch (i.e. on the number of ribs), not on the thickness of the panels they have to support.

For every panel i, the stress it incurs, σi, as well as a threshold stress σit, are calculated. The equations can be found in (McIlhagga et al., 1996).
If the stress on the panel is smaller than the threshold (σi < σit), the panel will not buckle and the mass of the panel is added to the mass of the other panels without correction. If on the other hand the stress exceeds the threshold (σi > σit), the panel is too thin and will buckle. The mass of the panel is then multiplied by 1 + (σi/σit) before being added. This penalty function compensates for the weakness of the panel by increasing its thickness to a value which should allow it to withstand the stress. By doing this, the constraint of withstanding stress is converted into mass, the currency of fitness.

Let us now examine the encoding that was used in the GAME project. The parameters that need to be specified for a full definition of a solution to this problem are the number of ribs, N, and the thicknesses of the N − 1 panels. There is a constraint to be respected on the thickness of these panels: adjacent panels must not differ in thickness by more than 0.25 millimetres. The simplest way to ensure that only wings respecting this constraint are handled by the GA is to encode the differences in thickness between adjacent panels rather than the absolute thicknesses of the panels. If we know the difference in thickness ∆th(i) = th(i+1) − th(i) between panels i and i + 1 for i between 1 and N − 2, the absolute thickness of the first panel is enough to define everything else. All these parameters are mapped onto the chromosome in the order described in Figure 6.2. Notice that a change in ∆th(i) leads to a change in the thickness of panel i + 1 and of all subsequent panels up to the tip of the wing; all these panels have their thickness changed by the same amount.

Figure 6.2: The representation of the wing parameters on the chromosome: N, th(1), ∆th(1) = th(2) − th(1), ..., ∆th(i) = th(i+1) − th(i), ..., ∆th(N−2) = th(N−1) − th(N−2), where N is the number of ribs and th(i) the thickness of the i-th panel.

The number of ribs, N, is represented using 4 bits. This allows 16 different values, which have been chosen to be anything between 42 and 57. The thickness of the first panel was allowed to vary between 5 and 15 mm in steps of 10^-3 mm. This requires a minimum of 14 bits to represent all these values. But 14 bits allow over 16,000 possible values to be encoded; some thicknesses were therefore represented by more than one binary sequence. We are not concerned here with the redundancy of that mapping.

For all subsequent N − 2 panels, the difference in thickness with the previous panel is represented on the chromosome. In the GAME project, only five values were allowed for this difference, the result of considerations on manufacturing tolerances: -0.25 mm, -0.125 mm, 0 mm, 0.125 mm and 0.25 mm. Three bits were used to encode these five values with the following mapping:

    a1a2a3   T0(a1a2a3)
    000      -0.25 mm
    001      -0.125 mm
    010      0.0 mm
    011      0.125 mm
    100      0.25 mm
    101      0.0 mm
    110      0.0 mm
    111      0.0 mm

Chromosomes of constant length were used. Their length was such as to allow the encoding of the maximum possible number of panels (56). When fewer panels are needed, the end of the chromosome codes for non-existent panels, which are simply ignored. The total number of bits needed for the chromosome is 4 + 14 + 3 × 55 = 183.
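The fitness calculation just described can be summarised in a few lines. The sketch below is a schematic Python rendering and not the GAME project's code: the panel masses, the stresses σi and the thresholds σit are assumed to be supplied by the structural equations of McIlhagga et al. (1996), which we do not reproduce.

    # A schematic sketch of the wing fitness (not the GAME project's code).
    # panel_masses, stresses and thresholds are assumed to come from the
    # structural equations of McIlhagga et al. (1996), not reproduced here.

    def wing_fitness(panel_masses, stresses, thresholds, rib_mass):
        """Fitness is minus the total mass, with buckling panels penalised."""
        total = rib_mass
        for mass, s, s_t in zip(panel_masses, stresses, thresholds):
            if s > s_t:                # the panel is too thin and buckles:
                mass *= 1.0 + s / s_t  # the violation is converted into mass
            total += mass
        return -total                  # mass is the currency of fitness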
6.1.2 Modifying the encoding

In order to apply redundancy in a way that is consistent with the definitions used in this thesis, we need to define a non-redundant encoding to which various forms of redundancy can then be applied. But the mapping used by the GAME project is already redundant, since the value 0 is represented by four different triplets. The only acceptable way to transform this mapping into a non-redundant one is to increase the number of possible differences in thickness from five to eight; this ensures that a different value can be assigned to each triplet. We maintained -0.25 and 0.25 as the lower and upper bounds of the range so that the same space of solutions is explored, and we kept 0 as one of the possible values. The five remaining values were chosen as shown in the following table:

    a1a2a3   T2(a1a2a3)
    000      -0.25 mm
    001      -0.1875 mm
    010      -0.125 mm
    011      -0.0625 mm
    100      0.0 mm
    101      0.0833 mm
    110      0.166 mm
    111      0.25 mm

The positive half of the range is split equally between three values while the negative half is split between four.

6.1.3 Introducing redundancy

As in previous experiments, we focus here on the comparison of two permutations only: the permutation with the highest Rσ value, [07143562], and the identity permutation, which is equivalent to using no redundancy at all. In all the experiments described here, redundancy was only applied to the triplets defining differences in thickness between panels; the encodings of the value of N and of the thickness of the first panel were left unchanged. Every block defining the difference in thickness of a panel was encoded using four bits instead of three. In the case of the identity permutation, the value of the extra bit is irrelevant to fitness. In the case of permutation [07143562], the following mapping is obtained:

    d1a1a2a3   T2σ(d1a1a2a3)     d1a1a2a3   T2σ(d1a1a2a3)
    0000       -0.25             1000       -0.25
    0001       -0.1875           1001       0.25
    0010       -0.125            1010       -0.1875
    0011       -0.0625           1011       0.0
    0100       0.0               1100       -0.0625
    0101       0.0833            1101       0.0833
    0110       0.166             1110       0.166
    0111       0.25              1111       -0.125

The maximum chromosome length now needed is 4 + 14 + 4 × 55 = 238.

6.1.4 Results

In all the results of this chapter, the GA described in Section 5.1 was used. The only difference with the previous chapter is that the population grid was changed from 20 × 20 to 40 × 40 in order to be in line with the values used in the GAME project. The population size is therefore 1600. Recombination was always used. Whenever fitness is represented, it is the fitness of the best individual in the population averaged over 50 trials. Some graphs plot fitness as a function of the number of generations elapsed. When fitness is plotted against the mutation rate, fitness is taken after 200 generations.

Figure 6.3: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome ([07143562] T2 and Id T2). Error bars indicate the standard error.

Figure 6.4: Comparing the speed of evolution with and without redundancy ([07143562] T2, m=7; Id T2, m=9). The best mutation rates are used in both cases. Error bars indicate the standard error.
Figure 6.3 shows that the improvement brought about by redundancy in this case is very marginal. Only at a mutation rate of 7 is there any improvement at all. Figure 6.4 shows that it takes 165 generations to reach the level of fitness that would be achieved in 200 generations without redundancy. This is disappointing compared with the kind of improvements observed on the previous problems. There are, however, good reasons for this, and the rest of this section and the next one will be devoted to uncovering them.

6.2 Comparing redundancy on three non-redundant codes

6.2.1 Definition

The non-redundant code, T2, on which redundancy was added in the previous experiments is shown in Figure 6.5. It was chosen because the possible values of ∆th appear in a natural increasing order in the right column, as was the case in the original code used in the GAME project. But this arbitrary choice has important consequences: it implicitly determines which transitions between ∆th values are possible by point mutation and which are not. A convenient way of picturing the situation is shown in Figure 6.5. The eight possible values of ∆th are placed at the corners of a cube according to the triplet that represents them. The association between corners of the cube and binary sequences is the same as was used in Chapter 3 and represented in Figure 3.1. From any corner of the cube, it is possible to go by point mutation to any of the three corners connected to it by an edge.

Figure 6.5: The non-redundant code T2 (table as in Section 6.1.2), with the eight ∆th values placed at the corners of a cube according to the triplet that represents them.

As Figure 6.5 shows, with T2 it is not always possible to go from one value of ∆th to the nearest one by point mutation. From 0, for instance, it is possible to go to 0.0833 (100 → 101) but not to -0.0625 (100 → 011), which is three mutations away.

To understand the role played by T2 in the results of the previous section, we ran the same experiments with T2 changed to other non-redundant codes. These non-redundant codes are obtained by using the same set of possible ∆th values but assigning them to binary sequences in a different order. One such code is T1, represented in Figure 6.6. It is built on the same principle as a Gray code: two ∆th values which are neighbours have binary representations which are also neighbours. This way, smooth transitions in thickness are always possible by point mutation, as can be seen from the cube in Figure 6.6.

    a1a2a3   T1(a1a2a3)
    000      -0.25 mm
    001      -0.1875 mm
    010      -0.0625 mm
    011      -0.125 mm
    100      0.25 mm
    101      0.166 mm
    110      0.0 mm
    111      0.0833 mm

Figure 6.6: The non-redundant code T1.

The other non-redundant code that was tried is T3, defined in Figure 6.7. It is built on the reverse principle: ∆th values which are neighbours are given binary representations which are at least two point mutations apart.

    a1a2a3   T3(a1a2a3)
    000      -0.25 mm
    001      0.0 mm
    010      0.0833 mm
    011      -0.1875 mm
    100      0.166 mm
    101      -0.125 mm
    110      -0.0625 mm
    111      0.25 mm

Figure 6.7: The non-redundant code T3.

6.2.2 Results

Figure 6.8 compares the fitness achieved with and without redundancy when code T1 is used. In this case, even the marginal advantage of redundancy obtained with T2 has disappeared.
Fitness at the optimal mutation rate is slightly lower with redundancy than without. Figure 6.9 makes the comparison with and without redundancy when code T3 is used. Here, in contrast, redundancy at a mutation rate of 5 or 6 results in a noticeable increase in fitness over what is possible without redundancy at any mutation rate. The shapes and relative positions of the two lines are similar to what was obtained in the previous chapter in similar comparisons (Figures 5.10 and 5.14): a noticeable gap exists between the two curves which disappears at high mutation rates. Figure 6.10 translates this superiority of redundancy onto the generation axis: the level of fitness reached in 200 generations without redundancy can be reached in only 120 when redundancy is used. This is the sort of improvement that was observed on the problems of the previous chapter.

Figure 6.8: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome when non-redundant code T1 is used ([07143562] T1 and Id T1). Error bars indicate the standard error.

Figure 6.9: The fitness of the best individual in the population after 200 generations as a function of the mutation rate per genome when non-redundant code T3 is used ([07143562] T3 and Id T3). Error bars indicate the standard error.

Figure 6.10: Comparing the speed of evolution with and without redundancy when T3 is used ([07143562] T3, m=5; Id T3, m=7). The best mutation rates are used in both cases. Error bars indicate the standard error.

In Figures 6.11 and 6.12 we present the same data in a different manner. Figure 6.11 compares the performances of T1, T2 and T3 when used in their raw form, i.e. prior to the addition of any redundancy. Figure 6.12 makes the same comparison after redundancy has been added to them. Prior to the addition of redundancy, the three non-redundant codes do not perform equally well: at almost all mutation rates examined here, the relation T1 > T2 > T3 emerges. However, once redundancy has been added to them, the three codes perform equally well. The level of performance achieved by the three codes in this case is the same as that achieved by T1 without any redundancy. Redundancy hence compensates for the disadvantage of the other two.

We conclude that, on this problem, the impact of redundancy on the GA depends on the choice of the non-redundant code as well. Redundancy does badly on T1, which preserves, at the binary level, the natural closeness that exists between similar values of ∆th; but it does much better on T3, which does not preserve that distance. The other code, T2, is somewhere in between and performs accordingly. The most likely explanation for these observations is that the kind of local optima which redundancy removes do not exist on this problem when a code such as T1 is used. Only an unnatural code such as T3 introduces some of these optima, which redundancy can then remove.
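The structural difference between the three codes can be made mechanical. The sketch below (tables transcribed from Figures 6.5-6.7; helper names are ours) prints, for each pair of ∆th values that are adjacent in magnitude, the Hamming distance between their binary representations: T1 yields a distance of 1 everywhere, T3 never does.

    # Hamming distances between the representations of value-adjacent delta-th
    # values under the three codes (tables transcribed from Figures 6.5-6.7).
    VALUES = [-0.25, -0.1875, -0.125, -0.0625, 0.0, 0.0833, 0.166, 0.25]

    T1 = {-0.25: '000', -0.1875: '001', -0.0625: '010', -0.125: '011',
          0.25: '100', 0.166: '101', 0.0: '110', 0.0833: '111'}
    T2 = {v: format(k, '03b') for k, v in enumerate(VALUES)}
    T3 = {-0.25: '000', 0.0: '001', 0.0833: '010', -0.1875: '011',
          0.166: '100', -0.125: '101', -0.0625: '110', 0.25: '111'}

    def adjacent_distances(code):
        hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
        return [hamming(code[u], code[v]) for u, v in zip(VALUES, VALUES[1:])]

    print(adjacent_distances(T1))  # [1, 1, 1, 1, 1, 1, 1]: a Gray-code assignment
    print(adjacent_distances(T2))  # [1, 2, 1, 3, 1, 2, 1]: mixed
    print(adjacent_distances(T3))  # [2, 2, 2, 3, 2, 2, 2]: never one mutation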
Figure 6.11: Comparing non-redundant codes T1, T2 and T3 without redundancy (Id T1, Id T2, Id T3). Error bars indicate the standard error.

Figure 6.12: Comparing non-redundant codes T1, T2 and T3 with redundancy in the form of permutation [07143562]. Error bars indicate the standard error.

6.3 Why does T1 perform better than T3?

6.3.1 Non-redundant codes and partial fitness functions

For the purposes of this problem, a wing W is defined by the specification of the number of panels, N, the thickness of the first panel, th(1), and an array of N − 1 numbers, [∆th(1), ∆th(2), ..., ∆th(N − 1)], all taken from the set S = {−0.25, −0.1875, −0.125, −0.0625, 0, 0.0833, 0.166, 0.25}. If we set the value of ∆th(i) to each of the eight possible values while keeping all other values ∆th(j) as they are in W, we obtain eight different wings. We call Wx the wing obtained by setting ∆th(i) = x. The function

    z : S → ]−∞, 0],   z(x) = F(Wx),

where F(Wx) is the fitness of wing Wx, describes what happens when ∆th(i) is varied in W while everything else is kept constant. If the value of ∆th(i) found in the original definition of W is not the one that maximises z, there is some scope for improving the wing by changing the bits that define ∆th(i). Notice that the definition of z depends completely on the choice of W and i: changing the value of any panel other than i will change the definition of z.

Functions such as z were in fact encountered previously, in the context of Section 3.4.1. They were the justification for assigning fitnesses to elements of S so that we could calculate numbers of optima before and after the introduction of redundancy (Nf and Nfσ). Indeed, when a function such as z is combined with a code T (of which T1, T2 and T3 are examples), it produces a function

    z ◦ T : {0, 1}^3 → ]−∞, 0]

which allows us to talk of beneficial and deleterious mutations over these triplets.

To calculate the average effect of a form of redundancy, it was assumed in Chapter 3 that all possible functions z would be encountered with equal probability. The translation of this assumption into the present context would be that, for a random wing W and a random value of i, any change of ∆th(i) is as likely to improve fitness as any other. A consequence of this assumption would be that any non-redundant code is as good as any other. The previous section has shown that this is not the case: some non-redundant codes perform better than others on this problem. It must therefore be the case that the functions z are not uniformly distributed in this problem; they must have some statistical properties which cause them to combine gracefully with code T1 and ungracefully with code T3.

Consider in Figure 6.13 a few of the shapes a function z might take. When z is strictly monotonic, as pictured at the top left, mutations which increase the value of ∆th increase the fitness of the wing. Since under T1 such mutations are always possible, the function z ◦ T1 will have a single optimum (0.25 mm).
The function z ◦ T2 will also have a single optimum in this case, since a transition to some larger value of ∆th (not necessarily the one immediately larger) is always possible from any binary triplet. However, the same function z combines badly with code T3, creating 4 local optima: 0.0, 0.0833, 0.166 and 0.25. Indeed, since these four values are two mutations apart from each other (Figure 6.7), point mutations from any of them lead to values of ∆th which are all smaller.

Figure 6.13: Possible variations of fitness when changing the thickness of a single panel (three example shapes of the function z, plotted as fitness against ∆th).

A function such as the one at the top right of Figure 6.13 will also result in z ◦ T1 having a single optimum, since moves to the immediately larger or smaller value of ∆th are always possible by point mutation under T1. However, z ◦ T2 would have two optima, since from 0.0 (the second-best value) one cannot make the transition to -0.0625, their respective binary representations, 100 and 011, being three mutations apart. The function at the bottom of Figure 6.13 is an instance of z that combines better with T3 than with T1: z ◦ T1 has 4 optima while z ◦ T3 has only one.

These three examples illustrate that for any instance of z, some non-redundant codes are well matched and some are not. Only on average, over all possible types of function z, are all non-redundant codes equivalent. If in the present problem z functions are often like those at the top of Figure 6.13, many fitness improvements will be possible by point mutation when code T1 or T2 is used which will be impossible when code T3 is used. This is bound to have an impact on the speed of the GA and would explain the differences in performance shown in Figure 6.11.

We showed in Chapter 3 that a permutation such as σ = [07143562] causes the function z ◦ Tσ to have fewer optima than the function z ◦ T, provided that the functions z are randomly generated. When this is true, the function z ◦ T has on average two optima. But in the case where code T1 combines only with functions such as the ones at the top of Figure 6.13, this permutation cannot achieve anything, because the number of optima of z ◦ T1 is always one. In this case, no permutation could do better: there is simply no room for redundancy to act. The same z functions combined with codes like T3 offer a very rich terrain for redundancy to make a difference. And as Figure 6.12 shows, the performance of the GA with T3σ reaches the same level as T1. Redundancy therefore compensates for the poor match between T3 and z.

We have reasons to expect functions z to be more like the ones at the top of Figure 6.13 in this problem. Among the eight possible values of ∆th for a given panel of the wing, there might be a threshold value below which the panel buckles and above which it does not. When that is the case, the optimal value for ∆th will be the lowest value for which the panel does not buckle: any lower value will be associated with very low fitness (because of the buckling), and increasing the thickness beyond that threshold value will add unnecessary mass to the wing.
Hence functions z will always be decreasing over those values for which the panel does not buckle. We cannot predict the shape z is likely to have over those values which cause the panel to buckle, but any local optima that exist there are probably not relevant: the low fitness of these points means that they are not the ones where the GA is likely to get stuck. Hence, for the parts of the function which matter, z functions are likely to be monotonic.

6.3.2 Counting numbers of optima

The previous section proposed that the difference in performance between the three codes T1, T2 and T3 is due to the functions z being biased towards a situation where z ◦ T1 has few optima while z ◦ T3 has many (with z ◦ T2 somewhere in between). This section tests this claim experimentally.

Although a low number of optima indicates an easy task for the GA, we should not assume that many optima necessarily lead to a difficult one. We can imagine situations where many optima exist which are irrelevant to the evolutionary process because the individuals handled by the GA — or at least those which breed — are never found to be held up by them. Genotypes handled by the GA are a tiny subset of all possible genotypes, whose atypical features are (1) to be of higher than average fitness, and (2) to be reachable by the GA through mutation and recombination. To make sure that the numbers of optima we find for the function z ◦ Tk (k = 1, 2 or 3) are representative of a real difficulty for the GA, we also calculate the probability of a genotype from an evolving population being found in such optima.

As we have emphasised, a function z is defined by the choice of a wing W (or equivalently a genotype G coding for W) and an integer i indicating which value ∆th(i) is being varied while the others are kept constant. As we are only interested in the functions z ◦ Tk encountered by the GA, it is sensible to take G from a population of evolving genotypes rather than completely at random; in all that follows, G is the best individual in the population after g generations. We choose the best individual because it is the one from which improvement is most likely to come, and any suboptimal value of ∆th(i) in that individual will be taken as a good indication that the GA is hindered. The number of generations g is a parameter which is varied to monitor how the situation changes as the population becomes fitter.

We want to compare the situation when each of the three codes T1, T2 and T3 is used; hence the following steps were performed:

    For 1 ≤ k ≤ 3:
        Repeat 200 times:
            Run a GA with code Tk for g generations.
            Pick G, the best individual in the population.
            For 1 ≤ i ≤ 40:
                (a) Count the number of optima in the function z ◦ Tk defined by G and i.
                (b) Check whether the value of ∆th(i) in G is a local (non-global) optimum.

The answer to (b) is necessarily negative if the answer to (a) is 1, since there are no local optima in that case. If the number found in (a) is large but never results in a positive answer to (b), we cannot invoke (a) as a cause of delay for the GA. In one set of graphs, we averaged the number found in (a) over the 40 values of i and the 200 trials, and similarly calculated the proportion of cases where the answer to (b) was positive out of these 200 × 40 cases. In another set of graphs, we kept separate averages (over 200 trials) for each value of i and plotted these results as a function of i.
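Step (a) of this procedure reduces to a small enumeration. The sketch below is our own Python rendering of it: the code table maps each triplet value to a ∆th value, z maps each ∆th value to its partial fitness (obtained, in the experiments, by re-evaluating the wing with each of the eight values in turn), and a triplet is an optimum of z ◦ T if none of its three one-bit neighbours decodes to a strictly fitter value.

    # A sketch of step (a): counting the optima of z o T over {0,1}^3 (ours).
    # `code` maps each triplet value 0..7 to a delta-th value; `z` maps each
    # delta-th value to its partial fitness.

    def count_optima(code, z):
        """A triplet is an optimum of z o T if no one-bit flip improves it."""
        count = 0
        for t in range(8):
            neighbours = (t ^ 1, t ^ 2, t ^ 4)   # the three one-bit flips
            if all(z[code[t]] >= z[code[n]] for n in neighbours):
                count += 1
        return count

    # With a strictly increasing z, the Gray-code assignment T1 yields a single
    # optimum while the anti-Gray assignment T3 yields four, as in the
    # monotonic example of Figure 6.13.
    T1 = [-0.25, -0.1875, -0.0625, -0.125, 0.25, 0.166, 0.0, 0.0833]
    T3 = [-0.25, 0.0, 0.0833, -0.1875, 0.166, -0.125, -0.0625, 0.25]
    z = {v: v for v in T1}                       # a monotonic partial fitness
    print(count_optima(T1, z), count_optima(T3, z))  # 1 and 4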
By keeping i lower than 40 we make sure that the values of ∆th(i) are always relevant to the fitness of the wing. Larger values would lie in the variable-length part of the chromosome and would therefore not always be under selection.

6.3.3 Results and Discussion

Figure 6.14 shows how the number of optima (averaged over all values of i) changes with the number of generations g. Three lines are plotted, corresponding to T1, T2 and T3. The line for T3 is well above the other two, with an initial value of 3.75 that increases over 50 generations to an equilibrium value of 3.8. Code T1 is at the other extreme, with values very near the minimum value of 1: it starts at 1.03 and increases steadily to 1.08 over the 200 generations. Code T2 is in between the two, but much closer to T1 than to T3; whereas the other two increase with g, T2 decreases from 1.25 to 1.2.

This supports our explanation for the differences in performance. The number of optima of z ◦ T1 is only marginally greater than 1, indicating that there are very few local optima in which a GA using that code could be caught. The number of optima of z ◦ T3 is very high, with an average of 3.85 local optima. This tells us there are many optima in the vicinity of the best individual in the population; it does not tell us whether the best individual tends to avoid them or not. Figure 6.15 answers this question. It plots the probability that the best genotype G is found with ∆th(i) at a local optimum, as a function of g. As in the previous figures, all values of i are averaged together.

Figure 6.14: The average number of optima in blocks defining a panel thickness as a function of the number of generations. The data are averages over the first 40 panels of the best individual. Each line corresponds to a different non-redundant code.

Figure 6.15: The proportion of blocks found at a local optimum as a function of the number of generations. The data are averages over the first 40 panels of the best individual. Each line corresponds to a different non-redundant code.

With T3, over 40% of the blocks are at a local optimum at generation 0. This value diminishes over the entire length of the trial, but after 200 generations it is still greater than the starting value for code T2. The decrease of this number with time shows that these optima are in the way of the GA and that part of the evolutionary effort is directed at eliminating them. In contrast, the line for T1 shows that these optima are not an issue when this code is used: their number is initially less than 0.5% and converges towards 0 in less than 20 generations. The line for T2 is here again between the two, but closer to T1 than to T3; convergence to 0 is achieved in 70 generations.

Taken together, the results of Figure 6.14 and Figure 6.15 give strong support to the idea put forward at the end of Section 6.3.1. The statistical properties of the functions z encountered in this problem are such that the function z ◦ T1 cannot be improved by redundancy, whereas the average function z ◦ T3 has much scope for it.
The results presented now show the same data analysed differently. Instead of averaging the points obtained for different values of i, we keep them separate and treat i as a variable. This allows us to determine whether the way z interacts with T depends on the part of the wing considered. The variable i replaces g on the X axis, and different values of g are represented as different lines on the same graph. The data for T1, T2 and T3 appear in Figure 6.16, Figure 6.17 and Figure 6.18 respectively.

Figure 6.16(a) shows how the number of optima of the function z ◦ T1 changes with i. For any value of i greater than 18 and for any value of g, the number of optima is exactly 1 (which means that it is consistently 1 for each of the 200 trials). For values of i between 13 and 17, the number of optima is around 1.1 and does not change significantly with g. For values between 6 and 12, the numbers are around 1.1 to start with but increase significantly with g; at generation 200, they culminate at 1.5 for i = 10, 11 and 12. For i between 1 and 5, the numbers are low and stable with g.

Figure 6.16(b) shows that for values of i greater than 18, the best individual in the population is never caught in local optima. We could guess that from Figure 6.16(a), since for such values of i there is only one optimum, which must be global. After only 20 generations, the best individual is free of local optima anywhere on the chromosome. In the limited range of space and time where local optima are found, no clear trend emerges. In any case, the high value observed for i = 10 in Figure 6.16(a) does not cause a high probability of being caught in a local optimum for that value of i.

Figure 6.16: Optimality along the wing with T1. (a) The number of optima of z ◦ T1 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Figure 6.17(a) shows the number of optima of z ◦ T2 as a function of i. For i = 19 this number increases with time, while it decreases or remains stable for all other values of i. The decrease with time is most marked for i = 10, where the numbers go down to 1. This contrasts with z ◦ T1 where, for the same value of i, the number of optima went up with g to 1.5. This is therefore a value of i for which z seems to interact more beneficially with T2 than with T1. A closer look at the function z for that value of i reveals the following trend. The best genotype in the population is such that a value of ∆th(10) lower than 0.166 results in a wing that buckles, and hence of low fitness. However, among these low-fitness values, z(−0.125) and z(−0.0625) are slightly higher than the others. As can be seen from Figure 6.6, this creates a local maximum under T1, because -0.125 and -0.0625 are two point mutations away from the better values 0.166 and 0.25. Under T2, on the other hand, mutations from
-0.125 to 0.166 and from -0.0625 to 0.25 are possible.

Figure 6.17: Optimality along the wing with T2. (a) The number of optima of z ◦ T2 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Figure 6.17(b) shows that initially the probability of finding the best individual at a local optimum is around 10% for values of i between 1 and 19. As generations pass, this number decreases for all values of i, but the decrease is slower for i = 18 or 19, the values for which the number of optima is largest. These hot spots of optima on the chromosome therefore do have an impact on the GA in this case. We can imagine that changing the coding from T2 to T1 for ∆th(18) and ∆th(19) would eliminate the most significant handicap of code T2 for this problem, and that a GA using this hybrid coding would perform nearly as well as one using only T1.

Figure 6.18(a) shows that, when T3 is used, the numbers of optima are very high for all values of i, most of them being very near the largest possible value of 4. The value i = 19, which was a maximum when T2 was used, is now a minimum. In Figure 6.18(b) we see that the probability of being caught in a local optimum is over 30% for all values of i. After only 10 generations this number has been reduced drastically, to less than 10% for values of i < 15, but the other values are still high. Optimisation seems to happen first for low values of i, progressing with time towards larger values; after 200 generations, values of i near 40 are still above the 20% mark. We can understand why low values of i are optimised before large ones. Given the nature of the encoding, a change in ∆th(1) changes the thickness of all subsequent panels by the same amount, while a change in ∆th(40) changes only the panels between the 41st and the tip of the wing. For low values of i, changes in ∆th(i) will have a much larger impact on fitness; this results in a much higher pressure of selection on these values until they are optimal.

We have seen that there is some diversity, across values of i, in the way z interacts with codes T1, T2 and T3. However, for all values of i, z interacts better with T1 than with T2, and better with T2 than with T3. Even in the rare case where the number of optima of z ◦ T2 is smaller than that of z ◦ T1 (i = 10), it turns out that the GA is unaffected by the optima at that point. In the case of codes T2 and T3, we have seen that the higher number of optima of the function z ◦ T has an impact on the performance of the GA.

6.4 Conclusion

We can draw several conclusions from these results. The first is that the nature of the non-redundant code is important in this problem, and code T1 is a very good choice for it. A code such as T2 is only marginally worse, and we have been able to identify the few points on the chromosome which are responsible for most of the discrepancy. Code T3, on the other hand, is highly inadequate for this problem.
When, as is the case here, the regularity of the z functions can be exploited by a non-redundant code T to yield very few optima on z ◦ T, no form of redundancy will be able to contribute anything to the problem. The best rule for matching a code T to a function z is that elements of S which tend to have similar fitness should be encoded by neighbouring binary sequences. However, in the case where the non-redundant code is not well matched to the problem, redundancy can be applied to great effect. Code T3, probably the worst possible choice for that problem, was raised to a performance level equal to T1 by the introduction of redundancy.

Figure 6.18: Optimality along the wing with T3. (a) The number of optima of z ◦ T3 as a function of i. The genotype G used is the best in the population after g generations. (b) The probability of G being at a local (non-global) optimum for ∆th(i) as a function of i. In both cases, several values of g are plotted. Each point on both graphs is the average of 200 trials.

Redundancy can therefore help when the properties of z are unknown and one has no indication of whether the non-redundant code is well matched to the problem or not. Also, some fitness functions will be such that the properties of z are highly variable from one part of the chromosome to another. It will then be impossible for a non-redundant code to cope advantageously with all of them, leaving scope for redundancy to improve the situation at those points of the chromosome where the mismatch results in z ◦ T having large numbers of optima.

Chapter 7
Conclusion

7.1 Summary of the approach and main contributions

Although a lot is known about the biochemical details of the implementation of the genetic code, its origins and evolution remain poorly understood. In particular, very little is known about whether the assignment of amino acids to triplets is arbitrary or whether it was selected because of beneficial properties for the evolutionary process. Part of this ignorance is due to the persisting image of the frozen code suggested by Crick in 1968. Not enough attention has been paid to the fact that the universal genetic code is not in fact universal and that some of its variants are relatively recent (Osawa et al., 1992). Such findings suggest that some changes might outlive others because of their positive consequences for evolution. This thesis argues that changing the pattern of redundancy, while keeping the set of possible amino acids constant, can have such positive consequences.

Questions concerning the impact of codes on evolution can also be asked in the context of genetic algorithms. Indeed, in many GA applications, associations similar to the genetic code are defined between binary sequences and the possible values of a variable involved in the definition of a candidate solution. The expectation was that, for these codes as for the genetic code, carefully chosen patterns of redundancy could improve the performance of the GA. The two questions can be addressed simultaneously, since a GA is a valid tool for simulating the natural phenomenon of mutation and selection.
Chapter 3 defined a possible formal language in which these questions could be phrased unambiguously. The desired features of the language were that

• it would dissociate redundancy from the power of expression of the code, so that the definition of redundancy would not be tied down to the semantics of the code;

• it could characterise with precision a large number of possible forms of redundancy.

The result was a description of patterns of redundancy by means of permutations defined over a set of 2^n elements, where 2^n is the number of sequences used by the code. As a result of the first feature, a pattern of redundancy can be introduced into any problem where 2^n symbols are coded in binary form. (Following the terminology of Chapter 3, the word symbol is used here to designate those things which are represented by the code, be they amino acids or any building block used by the GA.) It is therefore possible to compare the effect of a pattern of redundancy across different problems.

Ideally, each of these patterns of redundancy would be assessed on its ability to improve the performance of a GA on some test problems. This, however, is too long a process given the large number of possible patterns. Instead we proposed a shortcut for discovering promising redundancy patterns based on their ability to suppress local optima. We talk of a local optimum when a symbol of the code (in the sense defined above) has a higher fitness than all other symbols that can be reached from it by point mutation. It is therefore only possible to count the number of local optima when we have a ranking of the symbols by fitness. Such a ranking can be seen as resulting from a partial fitness function obtained by trying out all possible symbols at an arbitrary position in an arbitrary genotype. At a given position in a protein, for instance, each amino acid can be ranked according to how well the protein performs its function with that amino acid at that position. We assumed that, averaged over all genetic contexts, all rankings of the symbols are equally likely to arise. We therefore average the number of optima obtained over a large number of such rankings. For a permutation σ, we obtain a number Rσ which represents the average proportion of local optima suppressed by the introduction of σ, compared with the case where no redundancy exists.

In Chapter 4, we performed a large-scale study of the patterns of redundancy so defined. The aim was to assess the variation in Rσ values across the whole set of permutations and to look for the features of a permutation leading to a high value of Rσ. All permutations were found to have an overall positive effect on the number of optima, the best ones reducing the numbers by more than 30%. Two main features were identified as leading to large values of Rσ. The first is the number of neutral mutations induced by the pattern of redundancy; this number is also the number of invariant elements of the permutation that defines the pattern. It was found that a number of neutral mutations around half the maximum possible number leads to the highest expected value of Rσ, and the best permutations were indeed found to have this property. The second feature has more explanatory power but is more difficult to identify in a permutation. It relies on classifying each of the possible pairs of symbols into four classes according to the relative ease with which mutation can transform one into the other.
A linear combination of the numbers found in each class correlates highly with Rσ. We conclude that good patterns of redundancy do indeed work by creating more routes which can be used by mutation to change one symbol into another.

In Chapter 5, we checked how well Rσ predicted the outcome of a genuine evolutionary trial. Some permutations were chosen covering the entire range of Rσ values, and the resulting redundancy patterns were included in the code used by a GA running on an NK fitness landscape. We found that Rσ predicted the performance of the GA: the higher the value of Rσ, the better the GA performed when σ was used to define redundancy. The best permutation speeded up evolution by a factor of four, twice as much as the increase in speed obtained by using recombination. The best pattern of redundancy was tried on two other problems, where it was found to speed up optimisation by a factor of two.

In Chapter 6, the best redundancy pattern was included in the code of a GA optimising the design of an aeroplane wing. The gain in speed on this problem was very marginal. However, we were able to show that the gain depends on the definition of the non-redundant part of the code. In this problem, the symbols of the code are real numbers; substitution of one real number by a close one is therefore likely to result in a smaller change in fitness than substitution by a very different one. This violates the assumption, made when determining the best pattern of redundancy, that substitutions of one symbol by another are all likely to cause the same change in fitness. It could therefore be the case that better patterns of redundancy were missed because the procedure used in Chapter 4 does not reflect the conditions of this problem. In fact, we were able to show that this is not the case. Rather, the problem is such that when a non-redundant code is built according to the principles of a Gray code, there are no local optima of the kind that redundancy can suppress. No form of redundancy could therefore improve the GA in that case. If, on the other hand, a sub-optimal non-redundant code is used, the best pattern of redundancy can compensate for that choice and improve the performance of the GA to the level obtained with a Gray code.

7.2 Conclusions for GAs

There is no easy way to tell whether a code can be improved by introducing redundancy. When the symbols are real numbers, a non-redundant code built on the principles of Gray coding will probably introduce few local optima of the kind that redundancy can eliminate. If, however, the partial fitness functions obtained by varying one such real number at a time are not perfectly smooth, a Gray code will still have some scope to be improved by redundancy. Whether this is the case or not can be checked experimentally by picking random individuals in the evolving population and trying all possible binary combinations at the positions that define one such real number. In problems where the encoded symbols stand in no obvious distance relation to each other, redundancy is very likely to be a useful addition to the code, as in the last two problems of Chapter 5.

The assessment of redundancy was always done on the basis of an optimal mutation rate. That is, we compared the performance of the GA operating with and without redundancy at the optimal mutation rate in both cases. This is, we think, the fairest form of comparison.
But it means that the gains will only be obtained if one takes the time to try several mutation rates. If one operates at an arbitrary mutation rate, redundancy might, in many cases, bring no benefit. But since it never degrades the performance of the GA provided that the best mutation rate is used, it can be considered a safe bet to include it.

If redundancy does improve evolvability by increasing the number of transitions possible from any given symbol, we should expect another modification of the GA to have roughly the same effect. Consider for instance the problem described in Section 5.3. The non-redundant code T used in that problem meant that mutation could only change A into B, C or E. Transition from A to any other symbol required that several mutations take place simultaneously in the triplet of bits that defines A, an unlikely event at the kind of mutation rates at which a GA operates. Instead of adding redundancy, we can redefine the mutation operator in the following way. Abandoning the binary representation of the symbols A, B, C, D, E, F, G, H, we define mutation in such a way that any of these symbols can mutate into any other with equal probability. In that case, none of the symbols can ever be a local (non-global) optimum, since a transition to the best symbol at any locus is always possible with probability pmut/8.

The elimination of all local optima in this way is not, however, without cost. We can characterise this cost by imagining the consequences of using the same procedure on entire chromosomes instead of on blocks defined by a small number of bits. Redefining mutation in such a way means that the offspring of a chromosome is equally likely to be any other chromosome in the search space; the GA is then effectively performing a random search. The reason for constraining an offspring to look genetically like its parent is that we assume that choosing a new solution in the neighbourhood of a good one is more likely to be successful than picking one totally at random: we expect the quality of the parent to be somehow heritable by the offspring. Similarly, at the symbol level, we might want to enlarge the set of possible transitions without completely destroying the underlying distance that pre-existed between those symbols. Redundancy, in the way defined in this thesis, allows us to do just that.

Both the redefinition of mutation just outlined and redundancy can probably be seen as instances of a more general definition of mutation through the following matrix:

    ( P(A→A)  P(A→B)  P(A→C)  P(A→D)  P(A→E)  P(A→F)  P(A→G)  P(A→H) )
    ( P(B→A)  P(B→B)  P(B→C)  P(B→D)  P(B→E)  P(B→F)  P(B→G)  P(B→H) )
    (   ...      ...     ...     ...     ...     ...     ...     ... )
    ( P(H→A)  P(H→B)  P(H→C)  P(H→D)  P(H→E)  P(H→F)  P(H→G)  P(H→H) )

where, for instance, 1 − P(A→A) is the probability of mutation of the A symbol and P(A→E) is the probability that it mutates into an E. The numbers on each row must add up to 1. This matrix can express any situation where all transitions between symbols are possible, but not necessarily with the same probabilities.
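In code, this generalised mutation operator amounts to sampling from one row of a stochastic matrix. The Python sketch below is our illustration, not an implementation used in the thesis; it also shows how the uniform redefinition discussed above appears as one particular choice of row.

    # A sketch of the generalised mutation operator (ours, illustrative only):
    # each row of P gives one symbol's transition probabilities and sums to 1.
    import random

    SYMBOLS = list('ABCDEFGH')

    def mutate(symbol, P):
        """Draw the successor of `symbol` from its row of the matrix P."""
        row = P[symbol]                  # e.g. {'A': 0.9125, 'B': 0.0125, ...}
        assert abs(sum(row.values()) - 1.0) < 1e-9
        r, acc = random.random(), 0.0
        for s in SYMBOLS:
            acc += row[s]
            if r < acc:
                return s
        return SYMBOLS[-1]               # guard against rounding error

    # The uniform redefinition discussed above, for symbol A: every symbol is
    # reached with probability p_mut / 8, the remaining mass staying on A.
    p_mut = 0.1
    row_A = {s: p_mut / 8 for s in SYMBOLS}
    row_A['A'] = 1.0 - p_mut + p_mut / 8
    print(mutate('A', {'A': row_A}))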
For any given problem, one could try to find the coefficients of this matrix that would optimise the performance of the GA. However, the extra time spent in doing this would have to be weighed against the gains in performance.

7.2.1 Further lines of research

The previous discussion suggests important lines of research. It would be interesting to see whether the patterns of redundancy defined in this thesis can indeed be summarised in a matrix as outlined above. That is, whether for any permutation a matrix can be defined which causes the GA to behave as if redundancy had been added. If that were the case, then the study of redundancy could be incorporated into this more general framework.

As for redundancy itself, we need a better understanding of the features which cause a pattern of permutation to improve codes. When the value of n is greater than 3, exhaustive search for the best patterns is not possible. We therefore need practical ways of finding the patterns with high Rσ values. The number of invariant elements of a permutation gives some indication of its Rσ value, but additional criteria are needed in order to find the very best patterns. Some work could also be done exploring the consequences of adding more than one bit of redundancy along the lines defined in Section 3.3. From a practical point of view, this is probably only interesting for codes defined on more than 4 bits.

7.3 Conclusions for the genetic code

7.3.1 How optimal is the redundancy of the code?

The experiments of Chapter 5 and Chapter 6 lead us to believe that changes in the redundancy of the genetic code have significant consequences for the evolutionary process. But has selection been able to take advantage of this fact? One indication that it has would be to find that the redundancy of the code matches those patterns which have been found to be optimal.

We saw in Chapter 2 that some of the redundancy found in the genetic code is a necessary consequence of the way tRNAs bind with mRNAs; the so-called wobble rules make it impossible for more than two amino acids to be specified by four codons which differ only at the third position. When two amino acids are indeed specified by such codons, it is almost always the case that XYU and XYC will code for one amino acid and XYA and XYG for the other. As was argued in Section 2.1.6, the redundancy resulting from these rules has not been the object of any selection. It is therefore best left out of our discussion here. One way of leaving it out of further considerations is to assimilate XYU and XYC to a single point of our conceptual sequence space, which we denote XY^U_C, and to do the same for XYA and XYG, which we call XY^A_G. Having done that, we are left with a sequence space containing 32 points. Either XY^A_G and XY^U_C have the same meaning, such as CC^A_G and CC^U_C which both code for proline, or their meanings differ, such as UU^A_G which codes for leucine and UU^U_C which codes for phenylalanine. This is comparable to the situation in Chapter 3, where the transition from a sequence in C0 to one in C1 by mutation of the redundancy bit could either be neutral or not. It therefore makes sense to think of sequences XY^A_G as being in C0 and sequences XY^U_C as being in C1, or vice versa. We can now see how the language used in this thesis to define redundancy can, to some extent, be applied to the genetic code.

We saw that in an early version of the code, the third base was probably never relevant, i.e. XY^U_C would always have been synonymous with XY^A_G. In the analogy defined above, this matches perfectly the pattern of redundancy associated with the identity permutation, since all transitions from C0 to C1 are neutral. This would have been an easy starting point for selection since, as we showed, this pattern of redundancy is the worst possible one. Any mutant version of the code would therefore have been favoured. But can we decide whether the present version is optimal?

It was shown in Chapter 4 that in the best patterns of redundancy about half of the total transitions between C0 and C1 are neutral. This feature is robust with respect to the size of the code, as was shown in Figure 4.8. To assess the situation in the code, we count the number of values of XY for which XY^U_C is synonymous with XY^A_G. Out of a possible sixteen, exactly eight fall into that category. Redundancy in the code thus fulfils this criterion for optimality perfectly.
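This count can be reproduced directly from the standard codon table. In the sketch below (Python), a block XY is counted as synonymous when its four codons all encode the same amino acid, which, once U/C and A/G are merged under the wobble rules, is equivalent to XY^U_C and XY^A_G having the same meaning:

```python
# Standard genetic code, one letter per amino acid, '*' for stop codons.
RAW = """
UUU F UUC F UUA L UUG L   UCU S UCC S UCA S UCG S
UAU Y UAC Y UAA * UAG *   UGU C UGC C UGA * UGG W
CUU L CUC L CUA L CUG L   CCU P CCC P CCA P CCG P
CAU H CAC H CAA Q CAG Q   CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M   ACU T ACC T ACA T ACG T
AAU N AAC N AAA K AAG K   AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V   GCU A GCC A GCA A GCG A
GAU D GAC D GAA E GAG E   GGU G GGC G GGA G GGG G
"""
fields = RAW.split()
CODE = dict(zip(fields[::2], fields[1::2]))  # codon -> amino acid

BASES = "UCAG"
blocks = [x + y for x in BASES for y in BASES]
# XY^U_C and XY^A_G are synonymous exactly when all four third positions
# agree (blocks like AU, where AUA and AUG already disagree, fail both tests).
synonymous = [xy for xy in blocks
              if len({CODE[xy + b] for b in BASES}) == 1]
print(len(synonymous), synonymous)   # 8 out of 16
```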
7.3.2 Limitations of the model

This fit is very encouraging, but it must be treated with some caution given that some features of the code are not really mirrored in our model.

First, one of the assumptions of our model was that the symbols found in C0 are all different. For this assumption to hold in the code, it would have to be the case that, at the point where all codons starting in XY coded for the same amino acid, no redundancy existed in the use of the first two letters. In other words, the sixteen different instantiations of XY would have to lead to sixteen different amino acids. This was not the case, since AG(X) would have coded for either serine or arginine, which were also represented by UC(X) and CG(X) respectively.

Secondly, we know that if there ever was a proto-code capable of encoding a maximum of sixteen amino acids, and in which the third base did not matter, the changes leading to the quasi-universal genetic code must have included the introduction of new amino acids. This situation is not captured by our model. It would not make much sense to include it, because this is a different effect altogether whose consequences would have to be analysed independently. Unfortunately, the two effects are bound to be difficult to disentangle in the history of the code.

Thirdly, the permutations associated with good redundancy were identified on the assumption that any fitness ranking of the amino acids is as likely as any other. This is unlikely to be true, since amino acids have some measurable degree of similarity to each other. We consequently expect similar amino acids to appear nearby in any ranking.

7.3.3 Further lines of research

A different approach could address some of the limitations discussed above. We can examine whether the code is at a local optimum with respect to its pattern of redundancy. That is, we can compare the code with variants which differ by a minimal change in codon assignments. If the code turns out to be better than all, or almost all, of these minimal rearrangements, it is almost certain that selection is responsible for the code being at that local optimum. Assessing whether the code is better than a near variant could be done by comparing numbers of local optima, as was done in Chapter 3. In order to address the third of the limitations above, we could count these numbers by averaging not over all possible fitness rankings of the amino acids, but rather over rankings which are compatible with the chemical similarity of the amino acids.
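One ingredient of this proposal, drawing fitness rankings compatible with chemical similarity, can be sketched as follows (Python; the Kyte-Doolittle hydropathy scale and the Gaussian noise model are our own choices, offered only as one plausible instantiation):

```python
import random

# Kyte-Doolittle hydropathy, used here as one possible similarity scale;
# the choice of chemical similarity measure is left open in the text.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

def similarity_compatible_ranking(noise=1.0, rng=random):
    """Draw a fitness ranking in which chemically similar amino acids tend
    to appear nearby: sort on hydropathy perturbed by Gaussian noise.
    noise=0 gives the strict hydropathy order; a large noise approaches a
    uniform random ranking."""
    return sorted(HYDROPATHY, key=lambda aa: HYDROPATHY[aa] + rng.gauss(0, noise))

for _ in range(3):
    print("".join(similarity_compatible_ranking()))
```

Counts of local optima for the code and for each of its minimal rearrangements would then be averaged over many such rankings rather than over uniform ones.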
What would be gained in plausibility for the study of the genetic code by this approach would be lost in applicability to GAs.