MULTI AGENT SYSTEMS ASSIGNMENT 2: THE COMPUTING GAME

Yannick Soldati, Kalle Fischer, Thomas Debray

TABLE OF CONTENTS
Introduction
The assignment
    Brief overview
    Parsing
    Nash equilibrium
    Negotiation
    Final product
Conclusion

INTRODUCTION

The goal of this project is to create an agent that cooperates with other agents in a static environment in such a way that its earnings are maximized.
The different agents involved in the computing game are described in the “Brief overview” section. In the computing game, agents receive assignments that must be solved using either solutions obtained from other agents or solutions bought from the calculator agent. The goal can therefore be divided into three parts:
- Earn as much money as possible.
- Buy as few solutions as possible from the calculator agent, and cooperate as well as possible.
- Refrain from cheating.

With this in mind, the problem is at its core a negotiation game. To maximize its earnings, the agent must negotiate with the other agents so that as few solutions as possible are bought from the calculator agent.

THE ASSIGNMENT

BRIEF OVERVIEW

[Diagram: the calculator, referee and bank agents and the competing agents]
The diagram above shows the agents that populate the environment:
- Competing agents: these agents are programmed by the students and compete with each other to maximize their earnings.
- Referee agent: this agent hands out the assignments with a deadline, and pays a reward if an agent finds the correct solution.
- Calculator agent: this agent performs simple calculations for a fixed price.
- Bank agent: this agent manages the accounts of the agents.

PARSING

A first step in the agent's behavior consists of parsing the assignments in the correct order, so that communication can be started with either the other agents or the calculator. The assignments are written in RPN (Reverse Polish Notation), also called postfix notation, while the communication protocol uses prefix notation. A module is therefore responsible for translating the assignments from postfix to prefix notation. Moreover, a transaction can only involve a small assignment such as 5 9 *. This means that the elementary subtasks must be detected and solved recursively, with each subtask replaced by its solution in the next step. Note that once the whole assignment has been converted, the agent in principle has enough knowledge to solve it by itself, although the game rules do not allow this.

In Reverse Polish Notation, the operators follow their operands. For instance, to add three and four, one writes "3 4 +" rather than "3 + 4". If there are multiple operations, each operator is given immediately after its second operand. The expression written "3 − 4 + 5" in conventional infix notation is therefore written "3 4 − 5 +" in RPN: first subtract 4 from 3, then add 5 to the result.

An example of a non-parsed set of assignments:

(assignments
)(assignment
/ 44 1 + 5 *
)(assignment
4 * 42 06 06
(assignment 0 40 4 * 09 - 9 * 00 + 04 9 * 07 6 + - 1 04 07 + 9 2 + 03 + /)(assignment 2 49 7 * 8 4 + * 48 1 09 *
06 01 04 + * * / +)(assignment 3 05 0 - 49 + 1 8 07 * * 01 - 4 2 3 - 9 - 44 * 00 04 9 6 * 3 + / * 2 + +)(assignment 5 06 07
+ + / 04 44 - - + 42 49…

NASH EQUILIBRIUM

The Nash equilibrium is defined as follows: if there is a set of strategies with the property that no player can benefit by changing her strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium. This idea can be implemented in the agent in two different ways. The game can also be compared to the prisoner's dilemma, in which different strategies can emerge when the game is repeated multiple times.

Considering that the game will only be played once, one way to apply the Nash equilibrium is to give the agent a tit-for-tat strategy. Its strategy then becomes not only fixed but also very predictable, and it gives more weight to common group behavior than to individualistic behavior. Tit-for-tat has moreover been shown experimentally to be a winning strategy, because it benefits from playing against other programs that are inclined to cooperate.

If instead the same game were repeated multiple times, the best group strategy would be to cooperate so that every sub-assignment is calculated only once by the calculator for the whole group, and the solutions are then shared within the group. Keeping the system fair would require calculator requests to be distributed uniformly among the agents. As a result, all agents would end up with the maximum possible amount of money, with very small fluctuations in bank balances. Implementing such an algorithm would, however, require the game to be repeated enough times for each agent to understand the importance of long-run cooperation. Otherwise, some individuals might trigger an abrupt change in the group's behavior towards individualism, resulting in a worse overall reward level.
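The prisoner's-dilemma comparison can be made concrete with a small sketch. It assumes the textbook payoffs T=5, R=3, P=1, S=0, which are not the computing game's actual rewards. It first finds the pure Nash equilibria of the one-shot game by checking that no player gains from a unilateral deviation. It then compares tit-for-tat with an always-defect strategy over repeated rounds. The names `PAYOFF`, `pure_nash_equilibria`, `tit_for_tat`, `always_defect` and `play` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

# Assumed textbook prisoner's-dilemma payoffs, NOT the computing game's rewards.
# PAYOFF[(my_move, other_move)] is my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")

History = List[Tuple[str, str]]  # (own_move, other_move) per past round
Strategy = Callable[[History], str]


def pure_nash_equilibria() -> List[Tuple[str, str]]:
    """Move pairs where neither player gains by deviating on its own."""
    result = []
    for a, b in product(MOVES, MOVES):
        a_best = all(PAYOFF[(a, b)] >= PAYOFF[(alt, b)] for alt in MOVES)
        b_best = all(PAYOFF[(b, a)] >= PAYOFF[(alt, a)] for alt in MOVES)
        if a_best and b_best:
            result.append((a, b))
    return result


def tit_for_tat(history: History) -> str:
    """Cooperate first, then copy the other player's previous move."""
    return "C" if not history else history[-1][1]


def always_defect(history: History) -> str:
    """Defect in every round, whatever the other player did."""
    return "D"


def play(first: Strategy, second: Strategy, rounds: int) -> Tuple[int, int]:
    """Play the repeated game and return both total payoffs."""
    hist_1: History = []
    hist_2: History = []
    score_1 = score_2 = 0
    for _ in range(rounds):
        m1, m2 = first(hist_1), second(hist_2)
        score_1 += PAYOFF[(m1, m2)]
        score_2 += PAYOFF[(m2, m1)]
        hist_1.append((m1, m2))
        hist_2.append((m2, m1))
    return score_1, score_2


print(pure_nash_equilibria())                # [('D', 'D')]
print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
print(play(tit_for_tat, always_defect, 10))  # (9, 14)
```

With these assumed payoffs, mutual defection is the only pure Nash equilibrium of the one-shot game. Yet over 10 rounds two tit-for-tat players earn 30 points each, against 10 each for two defectors. Against a defector, tit-for-tat is exploited only in the first round.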
Such abrupt changes can be explained by the Ising spin model, which identifies density and temperature (noise) as their main factors. Here, the density corresponds to the number of agents that are connected with each other. The temperature corresponds to the expectations about the long run: will the game be repeated many times or not? If it will, cooperating always pays off, since lower-energy states can then be reached, yielding better performance for all agents.

NEGOTIATION

Negotiation is one of the most important parts of the computing game. It allows solutions to be obtained at minimal cost, since it avoids making payments to the bank, which is the most expensive option. For lack of space, this section does not explain every phase of the negotiation. It focuses on the case where a solution cannot be traded for another solution. When we ask other agents for solutions, they reply with a list of their solutions. If a trade is impossible, we offer money for the solution instead. The diagram below illustrates this trading phase, in which an amount of money is proposed.

[Diagram: trading phase between our agent and the other agents: solution request, solution lookup answer, money proposal, accept/reject, new proposal based on behavior copy, accept/reject]
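The money-proposal loop of this trading phase can be sketched as the buyer side of a monotonic concession over the integer prices 0 to 10. In this sketch the buyer mirrors the seller's last concession (the behavior copy). It is a simplified sketch under the following assumptions:
- The seller is modeled as a fixed, hypothetical list of asking prices rather than as a reactive agent.
- The buyer opens with a one-unit concession.
- A deal is struck at the midpoint, rounded down, once the offer reaches the ask.

The function name `negotiate` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

CALCULATOR_PRICE = 10  # maximal price: the calculator's fixed price
MIN_PRICE = 0          # minimal price of a solution


def negotiate(seller_asks: Sequence[int]) -> Tuple[Optional[int], int]:
    """Buyer side of a monotonic concession with behavior copy (tit-for-tat).

    Returns (agreed_price, rounds_used), or (None, rounds_used) on conflict,
    in which case the buyer falls back to buying from the calculator.
    """
    offer, prev_offer, prev_ask = MIN_PRICE, None, None
    for rnd, ask in enumerate(seller_asks, start=1):
        # Monotonicity: the seller may never raise its ask.
        if prev_ask is not None and ask > prev_ask:
            raise ValueError("seller raised its ask; protocol violated")
        # Agreement: our offer is at least the seller's ask.
        if offer >= ask:
            return (offer + ask) // 2, rnd  # assumed split: midpoint, rounded down
        # Conflict: neither side conceded in this round.
        if prev_ask is not None and ask == prev_ask and offer == prev_offer:
            return None, rnd
        # Behavior copy: concede as much as the seller just did (open with 1).
        step = 1 if prev_ask is None else prev_ask - ask
        prev_offer, prev_ask = offer, ask
        offer = min(offer + step, CALCULATOR_PRICE)  # never pay more than the calculator
    return None, len(seller_asks)
```

For example, `negotiate([10, 8, 6, 4, 2])` agrees on a price of 4 in the fourth round. A seller that stands still (`[10, 10, 10]`) makes the buyer stand still too, and the negotiation ends in conflict in the third round. Because only 11 prices exist, mirroring keeps the number of rounds small.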
To achieve a fair utility trade-off (the buyer wants to minimize its costs, the seller wants to maximize its earnings), the agent implements the monotonic concession protocol. This protocol is known to converge slowly in many situations, but we can assume that this problem will not occur in the computing game. The maximal price of a solution (10) is set by the calculator, while the minimal price is always 0. Moreover, prices are discrete, which leaves only 11 possible prices for a solution (0 to 10). Combining this knowledge with the tit-for-tat strategy, we can limit the number of proposals by simply copying the other trader's behavior, and therefore move faster towards a convergence point.

FINAL PRODUCT

A thorough analysis of the resulting agent was not possible. There were stability problems in the computing-game environment, and there were no opponents to test against, since the other agents were in development until the last minute. We nevertheless expect the combined strategies to perform well and to allow the agent to cooperate successfully with its environment. Moreover, thanks to its tit-for-tat strategy, it will always try to move towards the group equilibrium.

CONCLUSION

As a final spotlight on the agent's behavior, a priority list of the agent's preferences is shown in the picture. The main priority is to spend as little money as possible, and the second is to earn the maximum amount of money. The final priority is to find the final solution. This preference comes only third because it is not worth finding the final solution if the costs are too high and the earnings too low. An advanced cost-management system should allow the agent to deal with balance problems and to maximize its profits, keeping in mind that the final solution should be found as soon as possible.
In conclusion, developing an agent for a multi-agent environment is far from simple. It requires a lot of knowledge not only about the goals but also about the other participants. The interaction of different agents leads to different self-organizing systems with different Nash equilibria. The summed benefits are always optimal when calls to the calculator are minimized as a group. However, we suspect that this situation is very unlikely to occur, given the competitive nature of the challenge.
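The claim that the summed benefits are optimal when calculator calls are minimized as a group can be illustrated with a small count. The sketch below compares two calculator bills:
- every agent buys its own sub-assignments;
- the group buys each distinct sub-assignment once and spreads those purchases evenly in round-robin order, as suggested in the Nash equilibrium section.

The agent names, the sub-assignments and the helper functions are hypothetical. Only the price of 10 per calculation comes from the text.

```python
from typing import Dict, List

CALCULATOR_PRICE = 10  # price of one calculation, as stated in the text


def cost_alone(subtasks: Dict[str, List[str]]) -> int:
    """Each agent buys every distinct sub-assignment it needs by itself."""
    return sum(len(set(tasks)) for tasks in subtasks.values()) * CALCULATOR_PRICE


def cost_shared(subtasks: Dict[str, List[str]]) -> int:
    """The group buys each distinct sub-assignment once and shares the solution."""
    return len(set().union(*subtasks.values())) * CALCULATOR_PRICE


def round_robin_purchases(subtasks: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Spread the group's calculator requests evenly over the agents."""
    distinct = sorted(set().union(*subtasks.values()))
    agents = sorted(subtasks)
    plan: Dict[str, List[str]] = {agent: [] for agent in agents}
    for i, task in enumerate(distinct):
        plan[agents[i % len(agents)]].append(task)
    return plan


# Hypothetical sub-assignments (prefix notation) needed by three agents.
needs = {
    "A": ["* 5 9", "+ 44 1"],
    "B": ["* 5 9", "- 9 4"],
    "C": ["+ 44 1", "* 5 9"],
}
print(cost_alone(needs), cost_shared(needs))  # 60 30
print(round_robin_purchases(needs))
```

With these made-up needs, the group pays 30 instead of 60, and each agent buys exactly one calculation. This saving only materializes if every agent actually shares its purchases, which is what the competitive setting makes unlikely.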