Emergent Conventions and the Structure of Multi-Agent Systems
James E. Kittock
Robotics Laboratory, Stanford University
[email protected]

Abstract

This paper examines the emergence of conventions through "co-learning" in a model multi-agent system. Agents interact through a two-player game, receiving feedback according to the game's payoff matrix. The agent model specifies how agents use this feedback to choose a strategy from the possible strategies for the game. A global structure, represented as a graph, restricts which agents may interact with one another. Results are presented from experiments with two different games and a range of global structures. We find that for a given game, the choice of global structure has a profound effect on the evolution of the system. We give some preliminary analytical results and intuitive arguments to explain why the systems behave as they do and suggest directions for further study. Finally, we briefly discuss the relationship of these systems to work in computer science, economics, and other fields.

To appear in: Lynn Nadel and Daniel Stein, editors, 1993 Lectures in Complex Systems: the proceedings of the 1993 Complex Systems Summer School, Santa Fe Institute Studies in the Sciences of Complexity Lecture Volume VI. Santa Fe Institute, Addison-Wesley Publishing Co., 1994. Available as: http://robotics.stanford.edu/people/jek/Papers/sfi93.ps

This research was supported in part by grants from the Advanced Research Projects Agency and the Air Force Office of Scientific Research.

1 Introduction

Conventions are common in human society, covering such disparate things as standing in line and trading currency for goods. Driving an automobile is a commonplace task which requires many conventions: one can imagine the chaos that would result if each driver used a completely different set of strategies. This example is easy to extend into an artificial society, as autonomous mobile robots would also need to obey traffic laws. Indeed, it appears that conventions are generally necessary in multi-agent systems: conventions reduce the potential for conflict and help ensure that agents can achieve their goals in an orderly, efficient manner.

In [5], Shoham and Tennenholtz introduced the notion of emergent conventions. In contrast with conventions which might be designed into agents' behavior or legislated by a central authority, emergent conventions are the result of the behavioral decisions of individual agents based on feedback from local interactions. Shoham and Tennenholtz extended this idea into a more general framework, dubbed co-learning [6]. In the co-learning paradigm, agents acquire experience through interactions with the world and use that experience to guide their future course of action. A distinguishing characteristic of co-learning is that each agent's environment consists (at least in part) of the other agents in the system. Thus, in order for agents to adapt to their environment, they must adapt to one another's behavior. Here, we describe a modification of the co-learning framework as presented in [6] and examine its effects on the emergence of conventions in a model multi-agent system.

Simulation Model

We assume that most tasks that an agent might undertake can only be performed in a limited number of ways; actions are thus chosen from a finite selection of strategies. A convention exists when most or all agents in a system are using one particular strategy for a given task.
We will consider a simplified system in which the agents have only one abstract task to perform, and we will examine how a convention for this task can arise spontaneously through co-learning, using a very basic learning rule. In our model system, each agent's environment consists solely of the other agents; that is, the only feedback that agents receive comes from their interactions with other agents. We model agent interactions using payoff matrices analogous to those used for two-person games in game theory. Each time two agents interact, they receive feedback as specified by the payoff matrix. It is this feedback that the agents will use to select their action the next time they interact. Systems similar to this have been referred to as iterated games [3, 6, 7]. Each task has a corresponding two-person iterated game. In this paper, we will consider two different games, which represent the distinct goals of "coordination" and "cooperation."

Our modification to the co-learning setting is the addition of an interaction graph which limits agent interactions. In the original study of emergent conventions, any pair of agents could interact [5]; we restrict this by only allowing interactions between agents which are adjacent on the interaction graph. Our primary objective is to explore the effects of this global structure on the behavior of the system. In particular, we examine how the time to reach a convention scales with the number of agents in the system for different types of interaction graph.

The basic structure of our simulations is as follows. We select a game to use and specify an interaction graph. We create a number of agents, and each agent is given an initial strategy. The simulation is run for a finite number of "time steps," and during each time step a pair of agents is chosen to interact. The agents receive feedback based on the strategies they used, and they incorporate this feedback into their memories. Using a learning algorithm we call the strategy update rule, each agent then selects the strategy it will use the next time it is chosen to interact. The system can either be run for a predetermined number of time steps or be run until a convention has been reached.

Overview

In the following section, the structure of the simulation model is explained in more detail. In Section 3, we describe some results of experiments with these systems. Section 4 puts forth our preliminary analytic and intuitive understanding of these systems. In Section 5 we discuss possibilities for further research and the relationship of these experiments to work in a number of other fields.

2 Simulating Agent Societies

In order to conduct experiments on artificial agent societies, we must choose a way to model them. Since the simulations detailed in this paper are intended only to explore some basic issues, the model used is deliberately simple. We envision agents existing in an environment where they choose actions from a finite repertoire of behaviors. When an agent performs an action, it affects the environment, which in turn affects the agent; that is, an agent receives feedback as a result of its behavior. When investigating emergent conventions, we are primarily concerned with how the agents are affected by each other's behavior. Thus, in the present implementation of our system, all feedback that agents receive is due to their mutual interactions: the agents are each other's environment.
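To make the simulation structure described above concrete, the sketch below shows one way a single run might be organized in Python. It is only an illustration of the loop described in the Simulation Model subsection, not the original implementation: the helper names (payoff, update_strategy) and the agent representation are ours. Here edges is a list of vertex pairs allowed by the interaction graph, and payoff maps an ordered pair of strategies to the corresponding pair of feedback values; the strategy update rule and the payoff matrices themselves are specified in the subsections that follow.

```python
import random

def run_simulation(edges, initial_strategies, payoff, update_strategy, max_steps):
    """One run of the co-learning simulation.  At each time step a pair of
    agents adjacent on the interaction graph is chosen uniformly at random,
    both receive feedback from the payoff matrix, and each applies the
    strategy update rule to its own memory of feedback events."""
    strategies = list(initial_strategies)      # current strategy of each agent
    memories = [[] for _ in strategies]        # feedback events <t, s, f> per agent
    for t in range(max_steps):
        i, j = random.choice(edges)            # pair allowed by the interaction graph
        f_i, f_j = payoff[(strategies[i], strategies[j])]
        memories[i].append((t, strategies[i], f_i))
        memories[j].append((t, strategies[j], f_j))
        strategies[i] = update_strategy(strategies[i], memories[i])
        strategies[j] = update_strategy(strategies[j], memories[j])
    return strategies
```

Bounding each memory to its maximum size is left to update_strategy; a sketch of the HCR rule used for that purpose appears in Section 2.2.3.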
2.1 The Agent Model

In each simulation there is a fixed, finite number, N, of agents. Each agent has two defining characteristics: its strategy and its memory. For a given task, s_k, the strategy of agent k, is chosen from a set Σ = {σ_1, ..., σ_S} of S distinct abstract strategies. It is important to note that Σ does not represent any particular suite of possible actions; rather, it serves to model the general situation where multiple strategies are available to agents. We will only consider the two-strategy case; similar systems with more than two strategy choices are discussed in [5].

The memory of agent k, M_k, is of maximum size μ, where μ, the memory size, is a parameter of the agent model. An agent's memory is conveniently thought of as a set, each element of which is a feedback event. An event m ∈ M_k is written as a triple, m = ⟨t(m), s(m), f(m)⟩, where f(m) is the feedback the agent received when using strategy s(m) at time t(m). An agent uses the contents of its memory to select the strategy it will use the next time it interacts with another agent.

2.2 Modelling Interactions

Once the structure of individual agents is specified, we must decide how the agents interact with their environment. We introduce the concept of an interaction graph to specify which agents can interact with one another, and we use the payoff matrix of a two-player game to determine agents' feedback.

2.2.1 Interaction Graph

In other, similar models, it was possible for any pair of agents to interact [5, 6]. To explore the effects of incomplete mixing of agents, we specify an interaction graph, I, which has N vertices, each representing one of the agents in the system. An edge connecting vertices i and j in I indicates that the pair of agents (i, j) may be chosen to interact by playing the specified game. Interacting pairs are chosen randomly and uniformly from all pairs allowed by I. Note that when I is K_N, the complete N-vertex graph, the present model is equivalent to those which allowed complete mixing of the agents.

To facilitate investigation of the effects of the structure of I, we define a class of graphs representing agents arranged on a circular lattice with a fixed interaction radius.

Definition 1 (C_{N,r}). C_{N,r} is the graph on N vertices such that vertex i is adjacent to vertices (i + j) mod N and (i − j) mod N for 1 ≤ j ≤ r. We call r the interaction radius of C_{N,r} (not to be confused with the graph-theoretic radius, which is something else altogether). C_{N,1} is the cycle C_N on N vertices; note that for r ≥ ⌊N/2⌋, C_{N,r} = K_N.

[Figure 1: Relationship between C_N, C_{N,r}, and K_N, illustrated for N = 6: C_{6,1} = C_6, C_{6,2}, and C_{6,3} = K_6.]

See Figure 1 for an illustration of the definition. We note that while this is a somewhat arbitrary choice of structure (why not a two-dimensional grid or a tree structure?), it does yield interesting and illuminating results while avoiding the added complexity we might expect from a more elaborate structure.

The interaction graph is a general way to model restrictions on interactions. Such restrictions may be due to any number of factors, including hierarchies, physical separations, communication links, security barriers, etc. Whatever its origin, the structure of I will be seen to have a substantial effect on the behavior of the systems we examine.
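As an illustration of Definition 1, the following sketch (ours, not part of the paper) constructs the edge set of C_{N,r} and checks the two facts noted above for the N = 6 examples of Figure 1.

```python
def circulant_edges(n, r):
    """Edge set of C_{N,r}: vertex i is adjacent to (i + j) mod N and
    (i - j) mod N for 1 <= j <= r.  Edges are returned as sorted pairs."""
    edges = set()
    for i in range(n):
        for j in range(1, r + 1):
            a, b = i, (i + j) % n
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)

# C_{6,1} is the 6-cycle; C_{6,3} = K_6, the complete graph on 6 vertices.
assert len(circulant_edges(6, 1)) == 6
assert len(circulant_edges(6, 3)) == 6 * 5 // 2
```

Edge lists in this form are also what the run_simulation sketch in Section 2 expects.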
2.2.2 Two-Player Games

Once we have determined which pairs of agents are allowed to interact, we must specify what game the agents will play. We examine two games, the iterated coordination game (ICG) and the iterated prisoner's dilemma (IPD).

  ICG      A        B             IPD      C        D
   A    +1, +1   -1, -1            C    +2, +2   -6, +6
   B    -1, -1   +1, +1            D    +6, -6   -5, -5

Table 1: Payoff matrices for the coordination game (left) and the prisoner's dilemma (right). Each entry lists the payoffs to the row player and the column player, respectively.

ICG is a "pure coordination" game [3], with two possible strategies, labelled A and B. When agents with identical strategies meet, they get positive feedback, and when agents with different strategies meet, they get negative feedback. The payoff matrix is specified in Table 1. This game is intended to model situations where (1) from the point of view of interaction with the world, the two available strategies are equivalent and there is no a priori way for agents to choose between them, and (2) the two strategies are mutually incompatible. A simple example of such a situation is driving on a divided road: one can either drive on the left or on the right, but it is sub-optimal if some people do one and some people do the other. In this case, our goal is for the agents to reach any convention, either with strategy A or with strategy B.

IPD has two available strategies, labelled C and D. The payoff matrix is detailed in Table 1. This game is designed to model situations where two agents benefit from cooperating (strategy C), but there is also the potential to get a large payoff by defecting (strategy D) if the other player cooperates. However, if both agents defect, they are both punished. Our goal here is for the agents to reach a convention with strategy C, which indicates that the agents are all cooperating.

The prisoner's dilemma has been examined extensively, in particular by Axelrod in his classic book [1]. The relationship between previous work with the prisoner's dilemma and the present experiments will be briefly discussed in Section 5.

2.2.3 Strategy Selection

Once the agent model and game are specified, our final step in defining the system is to determine how agents choose the strategy that they will use. In these experiments we use a version of the Highest Current Reward (HCR) strategy update rule [6]. (Note that the present definition of memory is slightly different from that found in [6]: there, memory was assumed to record a fixed amount of "time" during which an agent might interact many, few, or no times; here, memory refers explicitly to the number of previous interactions in which an agent has participated that it remembers.) The current reward for a strategy σ is the total remembered feedback for using that strategy, i.e., the sum of f(m) over all feedback events m in the agent's memory such that s(m) = σ. We can now define the HCR rule (in the two-strategy case) as: "If the other strategy has a current reward greater than that of the current strategy, change strategies."

Note that HCR is performed after the feedback event from the interaction which has just occurred is added to the agent's memory; that is, HCR is performed on the set M'_k^t = M_k^t ∪ {m_k^t}, where M_k^t is the memory of agent k at time t and m_k^t is the feedback event agent k records at time t. Once an agent's next strategy has been chosen, the agent's memory is updated by incorporating the event which the agent just received and, if the memory would otherwise exceed its maximum size μ, discarding the oldest event:

    M_k^{t+1} = M'_k^t − {arg min_{m ∈ M'_k^t} t(m)}.

Agents apply the HCR rule immediately after receiving feedback, and their strategies are considered to be updated instantaneously. Agents which were not chosen to interact at time t do nothing, so for those agents we simply have M_k^{t+1} = M_k^t.
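The HCR rule and the memory update can be written down directly from this description. The sketch below is ours; the representation of memory as a list of (t, s, f) tuples is an assumption, matching the event triples defined in Section 2.1, and only the two-strategy case is handled.

```python
def hcr_update(current, memory, strategies, mu):
    """Highest Current Reward update for the two-strategy case.

    `memory` is the set M' of feedback events (t, s, f), already including
    the event from the interaction that has just occurred.  HCR is applied
    to this set; afterwards the oldest event is discarded if the memory now
    exceeds its maximum size `mu`."""
    reward = {s: sum(f for (_, strat, f) in memory if strat == s) for s in strategies}
    other = strategies[1] if current == strategies[0] else strategies[0]
    next_strategy = other if reward[other] > reward[current] else current
    if len(memory) > mu:
        memory.remove(min(memory, key=lambda event: event[0]))   # drop oldest event
    return next_strategy
```

With mu = 0 this reduces to switching strategies whenever the most recent payoff was negative, the setting used for IPD in Section 3. Wrapping it, for example, as lambda s, m: hcr_update(s, m, ("C", "D"), 0) gives the update_strategy argument expected by the run_simulation sketch in Section 2.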
3 Experimental Results

Before we proceed with a look at some results from our simulations, a word is in order about how we compare the behavior of the various possible systems.

3.1 Performance Measures

In the present situation, the most obvious performance criterion is "how well" the system reaches a convention. For an ICG system, the goal is to have all of the agents using the same strategy, but we do not care which particular strategy the agents are using. On the other hand, for an IPD system, we want all agents to be cooperating. Thus, we have different notions of convergence for the two systems. We define C_t, the convergence of a system at time t, as follows. For ICG, the convergence is the fraction of agents using the majority strategy (either A or B); for IPD, the convergence is the fraction of agents using the cooperate strategy (C). Note that convergence ranges from 0.5 to 1 for ICG and from 0 to 1 for IPD.

Given this definition of convergence, we can also define T_c, the convergence time for a simulation: the convergence time for a given level of convergence c is the earliest time at which C_t ≥ c. In this paper, we will use "time" and "number of interactions" interchangeably. Thus, when we speak about "time t," we are referring to the point in the evolution of the system when t interactions have occurred.

We use two different measures of performance which arise from these definitions. The first measure is the average time to a fixed convergence: we run simulations until a fixed convergence level is reached and note how long it took. When our concern is that the system reach a critical degree of convergence as rapidly as possible, this is a useful measure. However, some systems will never converge on practical timescales, and yet may have interesting behavior which will not be evident from timing data. The second measure is the average convergence after a fixed time: we simply run simulations for a specified amount of time and note the convergence level at the end of the run. We find that this is often one of the most revealing measures; however, for systems where the convergence over time is not generally monotonic, it is effectively meaningless.

There are, of course, other possible measures of performance, such as the probability of achieving a fixed convergence after a fixed time (used in [5, 6]) and the maximum convergence achieved in a fixed amount of time. We have chosen the measures which we found most revealing for the issues at hand.
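Both notions of convergence follow directly from these definitions; a minimal sketch (ours) is given below.

```python
def convergence_icg(strategies):
    """C_t for ICG: fraction of agents using the majority strategy (0.5 to 1)."""
    n_a = sum(1 for s in strategies if s == "A")
    return max(n_a, len(strategies) - n_a) / len(strategies)

def convergence_ipd(strategies):
    """C_t for IPD: fraction of agents using the cooperate strategy C (0 to 1)."""
    return sum(1 for s in strategies if s == "C") / len(strategies)
```

The convergence time T_c is then simply the first time step at which the appropriate function reaches the threshold c.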
3.2 Effects of Interaction Graph Topology

Unless otherwise specified, the simulations were run with one hundred agents, each agent was equally likely to receive either of the two possible strategies as its initial strategy, and the data were averaged over one thousand trials with different random seeds. We used memory sizes of μ = 1 and μ = 0 for ICG and IPD, respectively (note that with μ = 0, agents change strategies immediately upon receiving negative feedback); experimentation showed that these memory sizes were among the most efficient for the HCR rule.

We will first consider the extreme cases, where all agents are allowed to interact with one another (I = K_N) and where agents are only allowed to interact with their nearest neighbors on the one-dimensional lattice (I = C_N). One of our most interesting discoveries was the radically differing behavior of HCR with ICG and IPD as a function of the structure of I. The experimental data are presented in Figure 2, which shows the time to achieve 90% convergence as a function of the number of agents for both games on K_N and C_N.

[Figure 2: T_90% vs. N for ICG and IPD with different configurations of I (log-log plot; curves for ICG and IPD on I = K_N and I = C_N).]

The performance of the HCR rule with ICG is reasonable for both cases of I. The linear form of the data on the log-log plot indicates that ICG systems can be expected to converge in polynomial time on both K_N and C_N. For intermediate interaction radii, the performance of the ICG systems lies somewhere between that for C_N and K_N.

For IPD, the story is different. Using the HCR rule and working with a system equivalent to our IPD system on K_N, Shoham and Tennenholtz write, "[HCR] is hopeless in the cooperation setting" [6]. They had discovered what we see in Figure 2: convergence time for IPD on K_N appears to be at least exponential in the number of agents. HCR is redeemed somewhat by its performance with IPD on C_N, which appears to be polynomial, and is possibly linear. On C_{N,2} (not shown), the IPD system still manages to converge in reasonable time, but for all interaction radii greater than two it once again becomes "hopeless." In general, it appears that the particular choice of I has a drastic effect on the way system performance scales with system size.

Possible functional relations between the expected values of T_90% and N are summarized in Table 2; they were derived from fitting curves to the simulation results and are merely descriptive at this stage.

             I = K_N             I = C_N
  ICG    T_90% ∝ N log N     T_90% ∝ N^3
  IPD    T_90% ∝ c^N         T_90% ∝ N

Table 2: Possible relationships between T_90% and the number of agents for different interaction graph structures.

4 Analysis

In this section, we aim to give a flavor for some of the ways we can pursue an understanding of the behavior of both the coordination game and the prisoner's dilemma.

4.1 Iterated Coordination Game

[Figure 3: ICG: C_3000 as a function of the interaction radius r, for I = C_{100,r}.]

To begin our investigation of the relationship between ICG performance and the structure of I, we look to Figure 3, which shows how performance (measured as convergence after a fixed time) varies with the interaction radius for agents on C_{100,r}. Empirically, we find that performance increases with increasing interaction radius. Thus, we are led to ask: what properties of I vary with the interaction radius? Two important ones are the vertex degree and the graph diameter. The degree of a vertex is the number of edges containing that vertex. In the present case, all vertices of I have the same degree, so we can speak of the vertex degree, d, of I. For C_{N,r}, d = 2r (restricted to 2r ≤ N − 1). Thus, as r increases, each agent can interact with more agents. The diameter of a graph is the length of the longest shortest path between any two vertices, and it provides a lower limit on the time for information to propagate throughout the graph. As r increases, the diameter of C_{N,r} decreases, and we expect that the time for information to travel among the agents will decrease as well. We speculate that either or both of these properties of I affect the observed performance. However, for C_{N,r}, the diameter is ⌈N/(2r)⌉ = ⌈N/d⌉, so it is closely related to the vertex degree. To test the relative importance of graph diameter and vertex degree, it would be useful to construct a set of graphs for which one property (diameter or vertex degree) is constant, while the other property varies.
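Both properties are easy to measure for any candidate interaction graph. The sketch below is ours, not part of the original paper: it builds the adjacency lists of C_{N,r} and computes the diameter by breadth-first search (sufficient here because the graphs are unweighted), illustrating how the degree 2r and the diameter ⌈N/(2r)⌉ move together; the same diameter routine can be reused for the fixed-degree graphs defined next.

```python
from collections import deque

def cnr_neighbors(n, r):
    """Adjacency lists of C_{N,r}: vertex i is adjacent to (i +/- j) mod N, 1 <= j <= r."""
    offsets = [j for j in range(-r, r + 1) if j != 0]
    return [sorted({(i + j) % n for j in offsets}) for i in range(n)]

def diameter(neighbors):
    """Length of the longest shortest path between any two vertices (BFS from each)."""
    worst = 0
    for source in range(len(neighbors)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst

# For N = 100: vertex degree 2r, diameter roughly ceil(N / (2r)).
for r in (1, 2, 5, 10, 25):
    graph = cnr_neighbors(100, r)
    print(r, len(graph[0]), diameter(graph))
```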
Initially, we would also like to keep our graphs symmetric with respect to each vertex, to avoid introducing effects due to inhomogeneity. It turns out to be straightforward to construct a symmetric graph of fixed vertex degree. As a test case, we define the class of graphs D_{N,δ} such that an edge connects vertex i with vertices i + 1, i − 1, i + δ, and i − δ (all mod N). For 2 ≤ δ < ⌊N/2⌋, the vertex degree is fixed at four, and a variety of diameters result (we can measure the diameter of each graph using, e.g., Dijkstra's algorithm [8]). Once we have measured the performance of ICG on each graph D_{N,δ}, we can plot performance against diameter, as seen in Figure 4.

[Figure 4: ICG: C_3000 plotted against the diameter of I, for I = D_{100,δ}, 2 ≤ δ ≤ 49.]

We see that there is a correlation between the diameter of an interaction graph and the performance of an ICG system on that graph. We have hinted that this may be a function of the speed with which information can flow among the agents. However, more work is necessary to determine precisely how and why the diameter, vertex degree, and other graph properties of I affect the performance of an ICG system. It will also take further study to prove (or disprove) the relationships between expected convergence time for ICG and the number of agents proposed in Table 2.

4.2 Iterated Prisoner's Dilemma

It was seen in Section 3 that IPD behaves quite differently from ICG with respect to the structure of I. For r = 1, IPD on C_{N,r} converges quite rapidly, but for large r, it does not converge on any reasonable time scale. We can get some intuition for why this is if we think in terms of "stable" cooperative agents. A cooperative agent is stable at a given time if it is guaranteed to interact with a cooperative agent. On K_N, we have an all-or-none situation: an agent can only be stable if all of the agents are cooperative. In contrast, on C_N an agent need only have its two neighbors be cooperative to be stable. As yet, we have not extended this notion to a formal analysis. However, for IPD on K_N, we can give an analytical argument for the relatively poor performance of HCR in our experiments (recall that I = K_N shows a dramatic increase in convergence time as the number of agents is increased, as seen in Figure 2).

We begin by computing the expected change in the number of cooperative agents as a function of the convergence level. Since we are considering agents with memory size μ = 0, whenever two agents using strategy D meet, they will both switch to using strategy C (because they will both receive negative feedback, as seen in Table 1). When an agent using strategy C encounters an agent using strategy D, the agent with strategy C will switch to strategy D. When two agents with strategy C meet, nothing will happen. Now we can compute the expected change in the number of cooperative agents, ⟨ΔN_C⟩, as a function of the probabilities of each of these meetings taking place:

    ⟨ΔN_C⟩ = 2 · p(DD) − 1 · [p(CD) + p(DC)].

These probabilities are functions of the number of agents using strategy C and hence of the convergence level of the system.
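These meeting probabilities, and hence the expected change itself, can be evaluated directly; the short sketch below (ours, assuming uniformly random pairing on K_N and memory size 0, as above) computes ⟨ΔN_C⟩ for a given number of cooperators.

```python
def expected_delta_nc(n_coop, n):
    """<Delta N_C> for IPD on K_N with memory size 0, given n_coop cooperators.
    A D-D meeting creates two cooperators; a C-D meeting destroys one."""
    n_def = n - n_coop
    pairs = n * (n - 1) / 2                   # number of possible pairs on K_N
    p_dd = n_def * (n_def - 1) / 2 / pairs    # both chosen agents defect
    p_cd = n_coop * n_def / pairs             # one cooperates, one defects
    return 2 * p_dd - 1 * p_cd                # 2 p(DD) - [p(CD) + p(DC)]

n = 100
for n_coop in (0, 25, 50, 75, 99):
    print(n_coop / n, round(expected_delta_nc(n_coop, n), 3))
```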
Thus, we can compute the expected change in the number of cooperative agents as a function of the convergence level; this function is plotted in Figure 5. Note that for C_t > 0.5 we can actually expect the number of cooperative agents to decrease. This partially explains the reluctance of the IPD system to converge: as the system gets closer to total convergence, it tends to move away, back towards 50% convergence.

[Figure 5: Expected change in the number of cooperative agents, ⟨ΔN_C⟩, as a function of the convergence level for IPD, I = K_N.]

Note that before converging, the IPD system must pass through a state in which just two agents have strategy D. We can calculate that in this state, the probability that the system will move away from convergence is a factor of O(N) greater than the probability that the system will move towards convergence. Thus, we see that there is essentially a probabilistic barrier between the fully converged state and less converged states. We can approximate this situation by assuming that the system repeatedly attempts to cross the barrier all at once. The expected time for the system to achieve the fully convergent state is then inversely proportional to the probability of the system succeeding in an attempt: ⟨T_conv⟩ ∝ 1/p_conv. A straightforward analysis shows that p_conv ∈ O(c^{-N}) for some c > 1, so ⟨T_conv⟩ ∈ Ω(c^N), which correlates with what we saw experimentally.

On C_N, analysis is complicated by the fact that not all states with the same convergence level are isomorphic (for example, consider the case where agents have alternating strategies around the lattice versus the case where agents 1, ..., N/2 have one strategy and agents N/2 + 1, ..., N have the other). Thus, our analysis would require a different methodology than that used for K_N. Experimentally, we see that the order imposed by I allows the formation of stable groups of cooperating agents. These groups tend to grow and merge, until eventually all agents are cooperative. It is hoped that continued work with the idea of "stable" agents will lead to a more complete understanding of the relationship between performance and the topology of I.

As a final note, Shoham and Tennenholtz have proven a general lower bound of N log N on the convergence time of systems such as these [6], which appears to contradict our assertion that T_90% is proportional to N. However, in the general case T_100% need not be proportional to T_90%, because the final stages of convergence might be much less likely to take place. Experimental data for IPD on C_N indicate that while T_90% ∝ N, T_100% ∝ N log N.

The arguments presented in this section are an attempt to explain the results of the simulations; they are more theory than theorem. Deriving tight bounds on the performance of any of these systems is an open problem which will most likely require an appeal to both rigorous algorithmic analysis and dynamical systems theory.

5 Discussion

We have seen that a wide variety of interesting and often surprising behaviors can result from a system which is quite simple in concept. Further analytic investigation is necessary to gain a clear theoretical understanding of the origins and ramifications of the complexity inherent in this multi-agent framework. To move from the current system to practical applications will also require adding features to the model that reflect real-world situations, such as random and systemic noise and feedback due to other environmental factors.
A number of possible applications to test the viability of this framework are presently under consideration, including distributed memory management and automated load balancing in multiprocessor systems.

The systems discussed here have ties to work in a number of other areas both within and outside of computer science. Co-learning has fundamental ties to the machine learning subfield of artificial intelligence. The Highest Current Reward strategy update rule provides another link to machine learning, as it is essentially a basic form of reinforcement learning. This leads to another possibly fruitful avenue of investigation: systems of substantially more sophisticated agents. Schaerf et al. have used co-learning with a more sophisticated learning rule to investigate load balancing without central control in a model multi-agent system [4]. Our present framework also has ties to theoretical computer science, especially when we view either individual agents or the entire system as finite state machines.

Readers familiar with the prisoner's dilemma and its treatment in game theory and economics have probably noticed that our approach is markedly different. Our emphasis on feedback-based learning techniques violates some of the basic assumptions of economic cooperation theory [1, 6]. In particular, we do not allow for any meta-reasoning by agents; that is, our agents do not have access to the payoff matrix and thus can only make decisions based on their experience. Furthermore, agents do not know the specific source of their feedback. They do not see the actions which other agents take and, indeed, have no means of distinguishing feedback due to interactions with other agents from feedback due to other environmental factors. In our framework, agents must learn solely based on the outcome of their actions. In some respects this may limit our systems, but it also allows for a more general approach to learning in a dynamic environment. The current interest within economics in "bounded rationality" has led to some work which is closer in spirit to our model [3].

The systems discussed in this paper (and multi-agent systems in general) are also related to various other dynamical systems. Ties to population genetics are suggested both by the resemblance of the spread of convention information through an agent society to the spread of genetic information through a population, and by the possible similarity of the selection of behavioral conventions to the selection of phenotypic traits. There are also links to statistical mechanics, which are exploited more thoroughly in other models of multi-agent systems which have been called "computational ecologies" [2]. For a more thorough discussion of the relationship of the present framework to other complex dynamic systems, see [6].

6 Conclusion

We have seen that the proper global structure is required if conventions are to arise successfully in our model multi-agent system, and that this optimal structure depends upon the nature of the interactions in the agent society. Social structures which readily allow conventions of one sort to arise may be completely inadequate with regard to other conventions. Designing multi-agent systems with the capacity to automatically and locally develop behavioral conventions has its own unique difficulties and challenges; emergent conventions are not simply a panacea for the problems of off-line and centralized social legislation methods.
However, the study of emergent conventions is in its earliest stages and still has potential for improving the functionality of multi-agent systems. Furthermore, the framework presented here invites creative design and investigative efforts which may ultimately borrow ideas from, and share ideas with, the broad range of subjects loosely grouped under the heading "complex systems."

Acknowledgements

I would like to thank Yoav Shoham for introducing me to this interesting topic, and Marko Balabanović and Tomás Uribe for reading and commenting on a draft of this paper.

References

[1] R. Axelrod. The Evolution of Cooperation. New York: Basic Books, 1984.

[2] Bernardo Huberman and Tad Hogg. The behavior of computational ecologies. In Bernardo Huberman, editor, The Ecology of Computation. Elsevier Science Publishers B.V., 1988.

[3] M. Kandori, G. Mailath, and R. Rob. Learning, mutation, and long run equilibria in games. Econometrica, 61:29-56, 1993.

[4] Andrea Schaerf, Moshe Tennenholtz, and Yoav Shoham. Adaptive load balancing: a study in co-learning. Draft manuscript, 1993.

[5] Yoav Shoham and Moshe Tennenholtz. Emergent conventions in multi-agent systems: initial experimental results and observations. In KR-92, 1992.

[6] Yoav Shoham and Moshe Tennenholtz. Co-learning and the evolution of social activity. Submitted for publication, 1993.

[7] Karl Sigmund. Games of Life: Explorations in Ecology, Evolution, and Behaviour. Oxford University Press, 1993.

[8] Steven Skiena. Implementing Discrete Mathematics. Addison-Wesley Publishing Co., 1990.