General Game Playing Systems
Andreas Holt
Kongens Lyngby 2008
IMM-M.Sc.-2008-117

Summary

General Game Playing (GGP) is a field in artificial intelligence (AI) that deals with systems that, provided with only the rules of an arbitrary game, can play the game in an intelligent way. This problem is much harder than making a computer play a specific game, since you cannot rely on predefined evaluation functions or any other domain specific knowledge. It is also more interesting from an AI point of view, since the computer needs to show some intelligent behaviour in order to come up with a good move instead of just following a predefined formula.

In this report we will investigate different methods of making such a system. We will look at what others have done, and an actual implementation of our own general game player is presented. This implementation can either use the minimax algorithm with a simulation based evaluation mechanism or the UCT algorithm based on Monte Carlo simulations. At the end of this report these two techniques are compared by playing against each other in different games. These comparisons show that the minimax algorithm with an evaluation function is a good choice in GGP, but the UCT algorithm can be very strong when given enough time or computational resources to make a suitable number of simulations.

Resumé

General game playing is an area within artificial intelligence concerned with systems that can play arbitrary games in an intelligent way, given only the rules of the game. The problem is much harder to solve than getting a computer to play one particular game, since one cannot use predetermined evaluation functions or other prior knowledge about the game. It is also a more interesting problem with respect to artificial intelligence, since the computer has to exhibit intelligent reasoning in order to come up with a good move, as opposed to merely following a predetermined formula.
In this report we will investigate different methods for building such a system. We will look at what others have done, and we will present our own implementation of a system for playing general games. This implementation can use either the minimax algorithm with a simulation based evaluation function or the UCT algorithm based on Monte Carlo simulations. At the end of the report these two techniques are compared by playing against each other in different games. These comparisons show that the minimax algorithm with an evaluation function is a good choice for general games, but the UCT algorithm can be very strong if it is given enough time or computing resources to carry out a suitable number of simulations.

Preface

This thesis was prepared at Informatics and Mathematical Modelling (IMM), the Technical University of Denmark (DTU), in partial fulfilment of the requirements for acquiring the Master of Science degree in engineering.

A list of abbreviations used in this report can be found in appendix A, page 49.

Kongens Lyngby, November 2008
Andreas Holt

Acknowledgements

I thank:

• My supervisor Jørgen Villadsen for his help and input in the making of this project.
• Miriam Ortwed for proofreading and great support during the project period.
• Henrik Alsing Pedersen for providing valuable feedback on the report.
• Thomas Bolander for telling me about his game Kolibrat and allowing me to use it for testing my game player.
• The Logic Group at Stanford University for providing the general game playing framework including the Game Description Language, the communication protocol and a variety of game descriptions.
• Stephan Schiffel at Technische Universität Dresden for making a game server implementation available.

Contents

Summary
Resumé
Preface
Acknowledgements
1 Introduction
2 Background
    2.1 The AAAI General Game Playing Competition
    2.2 Other game players
3 Algorithms
    3.1 Types of games
    3.2 Minimax
    3.3 A simulation based evaluation function
    3.4 Monte Carlo methods and UCT
4 Architecture
    4.1 Layered architecture
5 Implementation
    5.1 Java or Prolog
    5.2 Reasoner
    5.3 Transposition tables
    5.4 The Players
    5.5 Game Analyser
    5.6 Session
    5.7 Game Manager
    5.8 HTTP Server
6 Results
    6.1 Minimax versus UCT
    6.2 Stress tests
7 Future work
    7.1 Speed improvement
    7.2 History heuristics
    7.3 Parallelization
    7.4 Game analyser methods
    7.5 The UCT bias constant
8 Conclusion
A Abbreviations
B Game rules
    B.1 Tic-tac-toe
C Analyser tests
    C.1 Analyser tests
    C.2 Evaluator tests
D Source Code
    D.1 gameplayer
    D.2 kifParser
    D.3 network

Chapter 1 Introduction

Ever since research in artificial intelligence began with the introduction of the first computers, people have been writing programs that play games. Most of these programs have been highly specialised and have not really contributed to the research in artificial intelligence. Instead, clever search algorithms and evaluation functions have been developed, devised by the programmer rather than by the system.

A general game playing system takes a different approach. Such a system is only given a description of the game rules and must figure out by itself how to play the game. This means that a general game player cannot rely on clever evaluation functions or large databases of previously played games, since there is no way to tell in advance what game it will have to play. The only way to play the game successfully is for the system to reason about the rules of the game and come up with a suitable strategy. This property makes the general game player interesting for research in artificial intelligence outside the area of games.

The challenge of making a good general game player is that it has to be good at a broad variety of different games, and a good strategy in one game might be a bad strategy in another. A general game player will typically combine different artificial intelligence disciplines such as reasoning, planning, heuristic search, knowledge representation and learning.

In this project we will look at how a general game playing system can be constructed. We will have a look at some of the best game players participating in the annual general game playing competition held by the Association for the Advancement of Artificial Intelligence (AAAI).
Then we will implement some of these techniques in our own general game player and get it to work in the same setting as the AAAI competition. We will investigate the minimax algorithm and the UCT algorithm and look into enhancements and optimisations such as alpha-beta pruning and transposition tables.

Even though the game player must be able to play all types of games in a reasonable way, the main focus of this project will be on two-player competitive games, since this is the core discipline of the AAAI competition. Finally we will evaluate the game player by comparing its performance in single player puzzles to that of other game players, and by comparing the performance of the different algorithms used.

Chapter 2 Background

In this chapter we will have a look at what others have done on the subject of general game playing, including a description of the AAAI competition.

The earliest research in GGP was done in 1992 by Barney Pell, who presented the idea of the Metagame[9]. He argues why general game playing is an interesting AI subject, and he describes how a GGP match should be set up with communication protocols, descriptions of game rules etc. Many of his ideas are used in GGP today.

2.1 The AAAI General Game Playing Competition

Each year a competition in general game playing is held by the Association for the Advancement of Artificial Intelligence (AAAI) at their annual conference. The competition was held for the first time in 2005 and has since then served as an unofficial world championship in general game playing. The rules and setting for this competition are developed by the Logic Group at Stanford University[5]. This includes the communication protocols, the game description language and the specific game rules.

2.1.1 The Game Playing Protocol

At the competition each game player communicates with the game server through a TCP/IP connection using HTTP.
It is assumed that the players are listening for incoming communication on a particular port. A message from the server can be of three different types:

• START - The START command is used to initialize a new game. The command contains five arguments: a match ID, which is a unique identifier of the match, the role to be played by the game player, the game rules, the time limit for pre-game analysis and the time limit between each move. When the game player is done pre-analysing the game, it must reply with READY.
• PLAY - The PLAY command is used to start each step of the game. It has two arguments: the match ID and a list of the moves made by all the players in the previous step. At the end of each step, the game players must reply with the move they want to make.
• STOP - When the game is over a STOP command is sent. The arguments are the same as those of the PLAY command and tell which moves were the final moves. It is considered polite of the players to respond with DONE, but it is not mandatory.

If the game players wish to learn from the outcome of the game, they need to figure out the result of the match by themselves. The server does not send the scores of the match.

A message from the server and the reply from the game player could look like this.

The game server sends:

    POST / HTTP/1.0
    Accept: text/delim
    Sender: GAMESERVER
    Receiver: GAMEPLAYER
    Content-type: text/acl
    Content-length: 41

    (PLAY MATCH.3316980891 (NOOP (MARK 3 3)))

The game player replies:

    HTTP/1.0 200 OK
    Content-type: text/acl
    Content-length: 10

    (MARK 2 1)

If a game player for some reason fails to reply or replies with an illegal move, the game server chooses a random legal move on behalf of the player.

2.1.2 Knowledge Interchange Format (KIF)

All communication between the server and the game players is formatted in prefix KIF[4], as we just saw in the above example.
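As a rough illustration of the prefix format, a message body like the one above can be read into nested lists with a few lines of Python. This is only a minimal sketch: the function names `tokenize`, `parse` and `parse_kif` are hypothetical and are not taken from the kifParser package of this project, and the match ID in the usage line is made up.

```python
def tokenize(text):
    """Split a KIF string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of the token list (consumed in place)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # a plain atom such as PLAY or ?x
    expr = []
    while True:
        if not tokens:
            raise ValueError("missing ')'")
        if tokens[0] == ")":
            tokens.pop(0)  # drop the closing parenthesis
            return expr
        expr.append(parse(tokens))


def parse_kif(text):
    """Parse a string holding exactly one KIF expression."""
    tokens = tokenize(text)
    expr = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return expr


# Hypothetical usage with a made-up match ID.
message = parse_kif("(PLAY MATCH.1 (NOOP (MARK 3 3)))")
command, match_id, moves = message
```

Representing every expression as a nested list keeps the sketch short; a real player would probably want distinct types for variables and constants.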
KIF stands for Knowledge Interchange Format and is a format originally developed to interchange knowledge between different programs or platforms. It can express arbitrary sentences of first-order logic, which is necessary for describing game rules. Although KIF is not intended for interaction with humans, it is still readable. As an example, the Datalog sentence A ⇐ B ∧ C translates into (<= A B C). The most important features to remember when using KIF are that every expression must be surrounded by parentheses, all operators are written in prefix form, and all variable names start with a question mark.

2.1.3 The Game Description Language (GDL)

The Game Description Language[5] (GDL) was developed specifically to describe the games played in the competition. GDL is a variant of Datalog that makes it possible to describe the rules of a game using logic. It can only describe deterministic, complete information games. GDL uses the following set of relations to describe the mechanics of the game: role, init, true, does, next, legal, goal, terminal and distinct.

To explain how GDL and KIF work and how they are used, we will construct the game rules of tic-tac-toe as an example. A more formal and complete description can be found in [5] and [4]. The complete game description can be found in appendix B.1, page 51.

First of all we need to specify the roles in the game. In tic-tac-toe we have the roles of cross and nought, but for simplicity we will call them x and o. We use the role relation to define them:

    (role x)
    (role o)

Then we need to define the initial state. We need to express that the board is empty (b for blank) and that cross is first to move.
To do this we use the init relation:

    (init (cell 1 1 b))
    (init (cell 1 2 b))
    (init (cell 1 3 b))
    (init (cell 2 1 b))
    (init (cell 2 2 b))
    (init (cell 2 3 b))
    (init (cell 3 1 b))
    (init (cell 3 2 b))
    (init (cell 3 3 b))
    (init (control x))

The cell and control relations are names we make up and are not a part of GDL.

Next we want to express that when a player marks a cell, that cell will be marked in the next state. We also want to express that if a player does not mark a specific cell, that cell remains the same. We use the next and does relations to express this:

    (<= (next (cell ?x ?y ?player))
        (does ?player (mark ?x ?y)))

    (<= (next (cell ?x ?y ?mark))
        (true (cell ?x ?y ?mark))
        (does ?player (mark ?m ?n))
        (distinctCell ?x ?y ?m ?n))

We used a new relation, distinctCell, that we need to define. The relation means that the two positions (x, y) and (m, n) are distinct. For two positions to be distinct it is enough that at least one of the coordinates differs, i.e. if either x ≠ m or y ≠ n the two positions are distinct. We use the predefined relation distinct to express the inequality:

    (<= (distinctCell ?x ?y ?m ?n)
        (distinct ?x ?m))

    (<= (distinctCell ?x ?y ?m ?n)
        (distinct ?y ?n))

Now we want to express that the control alternates between cross and nought. To do this we use the true relation, which expresses that some statement is true in the current state of the game:

    (<= (next (control x))
        (true (control o)))

    (<= (next (control o))
        (true (control x)))

To specify the legal moves in any state, we use the legal relation. In tic-tac-toe it is legal to mark a cell if it is not already marked and it is your turn. If it is not your turn, you are not allowed to do anything.
Since every player must have at least one legal move in every state of the game (except terminal states), we express this by using a noop operation that does not change the state of the game:

    (<= (legal ?player (mark ?x ?y))
        (true (cell ?x ?y b))
        (true (control ?player)))

    (<= (legal x noop)
        (true (control o)))

    (<= (legal o noop)
        (true (control x)))

The terminal conditions are expressed using the terminal relation. The game is over if one of the players has three pieces in a line or when there are no empty cells on the board:

    (<= terminal (line x))
    (<= terminal (line o))
    (<= terminal (not open))

Here we used the not relation, which simply negates an expression, and we used the helper relations line and open. The open relation is true if there is at least one empty cell:

    (<= open
        (true (cell ?x ?y b)))

The line relation states that a player has either a row, a column or a diagonal of three pieces:

    (<= (line ?player) (row ?x ?player))
    (<= (line ?player) (column ?y ?player))
    (<= (line ?player) (diagonal ?player))

The row relation looks like this. The column and the diagonal are defined in the same way:

    (<= (row ?x ?player)
        (true (cell ?x 1 ?player))
        (true (cell ?x 2 ?player))
        (true (cell ?x 3 ?player)))

Finally we need to use the goal relation to define the rewards of the terminal states. The reward must be an integer between 0 and 100 (both included), where larger numbers are better. A goal value must be defined for every player in every terminal state. For non-terminal states the goal value is optional. This means that you could actually choose to build an evaluation function for non-terminal states into the game rules.
However, we do not want that, so we only specify the goal values for terminal states:

    (<= (goal ?player 100)
        (line ?player))

    (<= (goal ?player 50)
        (not (line x))
        (not (line o))
        (not open))

    (<= (goal ?player1 0)
        (line ?player2)
        (distinct ?player1 ?player2))

The three implications describe that a player receives 100 if it has a line, 50 if no player has a line and no cell is empty, and 0 if the other player has a line.

2.2 Other game players

Let us take a look at some of the competitors in the GGP competition. Here is a short description of three of the most notable game players that have participated in the annual GGP competition during the last four years.

2.2.1 Fluxplayer

The Fluxplayer[11] is a player developed by Stephan Schiffel and Michael Thielscher from the Department of Computer Science at Dresden University of Technology. The player won the AAAI GGP Competition in 2006. It uses the minimax search algorithm with a heuristic evaluation function based on goal distance to evaluate non-terminal states. The idea is to calculate the degree of truth of the goal and terminal conditions using fuzzy logic. The evaluation function will then seek to avoid terminal states where the goal value is low, and go for terminal states where the goal value is high.

2.2.2 Cluneplayer

The Cluneplayer[6] won the AAAI GGP Competition in 2005 and performed very well in the following three annual competitions as well. It is developed by Jim Clune, a Ph.D. student at the University of California. The Cluneplayer uses a heuristic evaluation function together with several search algorithms such as minimax. Unlike Fluxplayer, the Cluneplayer deduces features of the game from the game description and uses simulation to determine how these features should contribute to the evaluation function. A feature could for instance be piece count or mobility. These two features would be good to use in classic board games like chess and checkers.
The approach is very strong in games where the player is able to deduce many relevant features of the game, but it is weak when this is not the case.

2.2.3 CADIA-Player

In 2007 and again in 2008 a player developed by Hilmar Finnsson and Yngvi Björnsson, the CADIA-Player[3], won the AAAI GGP Competition. Unlike most other players, this player does not use a heuristic evaluation function. Instead it uses simulations of the game to determine what move to make next. The player uses the UCT algorithm to handle the exploration/exploitation trade-off. The first simulations are totally random, but as the player learns more about the outcome of the moves, it explores the best moves more and more often. Furthermore the player uses history heuristics to prioritise exploring the moves that have earlier proven to be rewarding.

The approach of the CADIA-Player will be good for almost any game, but it will be even better for games where random simulations to a terminal state serve as a good evaluation function. The greatest weakness of the UCT algorithm is single player puzzles, and therefore the CADIA-Player also uses enhanced IDA* for this kind of game. If the enhanced IDA* fails to find a solution within the start clock, the player switches back to the UCT algorithm.

Chapter 3 Algorithms

When implementing a general game player one can take different approaches. Most existing game players use minimax and some kind of heuristic function. Others have used other algorithms. In this chapter we will look into the two search approaches used by the game player of this project: the UCT algorithm, which is based on Monte Carlo methods, and the minimax search algorithm. The obvious choice of search algorithm would normally be minimax, but as we will discover, there are several problems when using that algorithm in the context of general game playing.
3.1 Types of games

When designing a general game player, and especially when choosing what algorithm to use, one must keep in mind that the game player must be able to play every game that can be expressed in the game description language. As mentioned earlier, the game description language used in this project supports description of all deterministic games with perfect information. A game is deterministic if the next state of the game is uniquely defined when the actions of the players are known. This means that games of chance, e.g. games that include dice, are not supported. A perfect information game is a game where all information is known to all players, as opposed to many card games where the hand of one player is not known by the opponents. This narrows down the classes of games we need to take into account, but there are still many different game types that we need to support.

• Competitive games versus cooperation games: Traditional two-player games like chess and checkers are zero-sum games, meaning that the sum of the two players' scores is always zero. If one player wins, the opponent loses. In general game playing this need not be the case. Whether the players need to work together or against each other can make a huge difference in the optimal strategy of the individual player. Furthermore, it can be a difficult task to determine whether the best strategy is to cooperate or not. This is very well illustrated by the prisoner's dilemma: Two criminals are caught by the police. The police do not have enough evidence to get them both convicted, so they separate the criminals and offer each of them a deal. If one testifies against the other, he is set free, and the other criminal receives a 10-year sentence. If they both testify, each will receive a 5-year sentence, but if neither of them accepts the deal, both will get away with a 6-month sentence for a smaller crime. How should the criminals act?
The most rational choice would be to betray the other criminal, because no matter what the other criminal does, betraying will result in a shorter sentence. However, if the other criminal thinks in the same way, each criminal receives a longer sentence than they would have received by cooperating. Even though the prisoner's dilemma and other cooperative dilemmas might not have a clear optimal solution, the possibility of cooperation must at least be considered when designing a general game player, because different algorithms might react to these situations in different ways.
• Single player puzzles versus multi player games: Any general game player must be able to play games with one, two or even more players. Furthermore, playing against opponents and solving a single player puzzle are two very distinct tasks. Both can be solved by searching, but whereas a multi player game often has many won terminal states far away from the initial position, puzzles often have only one or a few won terminal states relatively close to the initial state. There are of course exceptions, but this is the general case, and the general game player needs to adjust accordingly.
• Games with alternating moves versus games with simultaneous moves: Any player must be able to handle simultaneous moves. This is not a big problem, but one has to keep it in mind when designing the algorithms. Some algorithms might also be better at handling simultaneous moves than others.

3.2 Minimax

The minimax algorithm is widely used in game players. It got its name because it minimises the maximum possible loss. With the addition of alpha-beta pruning, initial sorting of the nodes and the use of transposition tables, the minimax algorithm can become very fast. Some implementations search deeper in branches that contain particularly interesting moves than in other branches of the search tree, but the main idea is the same.
Games with a large state space, such as chess, will however not be fully searchable no matter how fast or clever an algorithm you use. Therefore the minimax algorithm depends on a good heuristic evaluation function to be efficient. In the context of general game playing a good heuristic evaluation function can be very difficult to find, and that is the greatest drawback of the minimax algorithm.

There are however several other problems in using minimax in a general game player. The algorithm only works on two-player, zero-sum games with alternating moves. Any general game can be made to meet these conditions by making some assumptions, but the assumptions come at a cost.

First of all, any game can be seen as a two-player game if you make the paranoid assumption that all the opponents work together to beat you. This way all opponents can be treated as one. The assumption can however lead to suboptimal play. You could for instance imagine a game with several players where the best strategy would be to cooperate with some of the opponents to beat the rest of them. Overall, the paranoid assumption leads to overly defensive play.

The algorithm only works on zero-sum games, but in a general game we cannot be sure that this is the case. To solve this we focus only on our own score and ignore the opponent's score. We assume that the opponent tries to minimise our score instead of maximising their own. This way we can treat all games as zero-sum games. In most competitive games this approach will work just fine, but in cooperative games or other non-constant-sum games it can lead to suboptimal play. An alternative strategy would be to look at the difference between our own score and the opponent's score, and try to maximise the difference in favour of our score. This might be better in some non-zero-sum competitive games, but the strategy would fail completely in cooperative games.
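The zero-sum assumption described above, where the opponent is modelled as minimising our goal value, can be sketched as a depth-limited minimax search with alpha-beta pruning. The code below is a minimal Python sketch and not the implementation of this project: the `Nim` class, its method names and the constant evaluation function in the usage lines are hypothetical stand-ins for a real reasoner and a real heuristic.

```python
import math


class Nim:
    """Hypothetical toy game: take 1 or 2 stones, whoever takes the last stone wins.

    A state is a pair (stones_left, our_turn).
    """

    def is_terminal(self, state):
        return state[0] == 0

    def our_turn(self, state):
        return state[1]

    def successors(self, state):
        stones, ours = state
        return [(stones - take, not ours) for take in (1, 2) if take <= stones]

    def goal(self, state):
        # Only our own goal value is used. If it is our turn in a terminal
        # state, the opponent took the last stone, so we score 0.
        return 0 if state[1] else 100


def alphabeta(game, state, depth, evaluate, alpha=-math.inf, beta=math.inf):
    """Minimax value of `state` for us; the opponent minimises our score."""
    if game.is_terminal(state):
        return game.goal(state)
    if depth == 0:
        return evaluate(state)  # heuristic estimate at the search horizon
    maximising = game.our_turn(state)
    value = -math.inf if maximising else math.inf
    for child in game.successors(state):
        score = alphabeta(game, child, depth - 1, evaluate, alpha, beta)
        if maximising:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if alpha >= beta:
            break  # the remaining siblings cannot change the result
    return value


# Hypothetical usage: a constant evaluation function standing in for a heuristic.
losing = alphabeta(Nim(), (3, True), 10, lambda s: 50)
winning = alphabeta(Nim(), (4, True), 10, lambda s: 50)
```

With three stones left and us to move every line of play loses, while with four stones we can leave the opponent in that losing position, so the search values the two positions very differently.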
To make all games turn-taking games, we serialise the moves whenever simultaneous moves occur, so that we move first and our opponent second. This way all games will be turn taking. We do however assume that our opponent knows our move, which is not the case in the real game. The assumption fits the spirit of the minimax algorithm very well by minimising the maximal loss, and the minimax algorithm would not have chosen any differently even if modelling of simultaneous moves were possible.

Despite the problems and compromises one has to make in order to use minimax in general game playing, the algorithm is widely used by programmers of general game players, and with good results. This is partly because most of the games played have met all the requirements of the original minimax algorithm, and partly because it is simply a very good algorithm. The minimax algorithm is simple to implement, is very well documented and tested, and comes with some very nice optimisations such as alpha-beta pruning. The real challenge when using this algorithm is to find a good evaluation function.

3.3 A simulation based evaluation function

In order to use the minimax algorithm to its full potential it is necessary to use a heuristic evaluation function. It is however not an easy task to develop such an evaluation function for a general game. Most general game players have had their own individual approach to this problem, since there are no solid research results on how to make a good general evaluation function. Of course you could come up with features like piece count and mobility that will be good contributions to an evaluation function in many games, but there will always be games where they do not work or maybe even harm the game player.
When dealing with general game playing it is therefore most interesting to investigate general evaluation functions that do not rely on specific features that might not exist or make sense in all games. For instance, a piece-count evaluation function would be good in chess but would not make much sense in tic-tac-toe. Making an evaluation function that builds partly on such features and partly on other elements could be a feasible strategy. However, our goal is to make an evaluation function that finds the important features of any game in a general way, without relying on specific predefined features.

The overall idea of the evaluation function is to explore a lot of states and find out by simulation how good or bad they are. Then each state is broken into atoms, and the value of each atom is calculated from the evaluations of the states that contain it. An atom of a state could for instance be, in chess, that the white queen is on square C5, or in tic-tac-toe that there is a cross in the upper right corner. Later on, any other state can be evaluated depending on what atoms it contains.

Since this is not something anyone has done and written about before, I have taken a rather experimental approach towards finding the best way to create the evaluation function. A possible weakness of the approach is that it depends on how the game rules are written. One could imagine a game described in such an odd way that this strategy would make no sense, but that seems to be more of a theoretical weakness than a practical issue.

The evaluation algorithm falls into two parts. The first part is to analyse the game and collect usable data. The second part is to use these data to evaluate any given state.

3.3.1 The analysing part

When analysing the game, simulations are used to generate states from a given starting state.
One could use totally random simulations, but to tune the algorithm towards exploring the most interesting moves, the UCT algorithm is used for controlling the simulations. For each simulation every state on the path from the starting state to the terminal state is stored, and the goal value of the terminal state is found. Then some samples from the states are selected and broken into atoms, and the found goal value is added to each atom. Step by step, the algorithm works as follows:

1. Make a UCT simulation from the start state to a terminal state, and save all the states found on the path.
2. Get and save the goal value of the terminal state.
3. Select some of the saved states.
4. For each of the selected states, break the state into atoms and add the saved goal value to each atom.
5. Repeat until the time is up.

We will investigate three different ideas for sampling states. The first idea is to select one random state from each simulation. This works fine, but throws a lot of information away. The second idea is to use all the states on the path. This seems to work a little better, even though more computation needs to be done for each simulation. The final idea is to use only the terminal state. The hope is that this will help the game player to win by going more directly after sub-goals, but it is not guaranteed to work for all games.

To find out which strategy to use, we conduct a mini tournament with the three different implementations of the algorithm. The result of the tournament shows that the idea of using only the terminal states is not very good. The two other strategies perform roughly the same, with the “sample all states” strategy winning a few more games than the “sample one random state per simulation” strategy. A detailed listing of the results can be found in appendix C, page 55.
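The analysis steps above can be sketched in a few lines of Python. This is a simplified illustration rather than the analyser of this project, and it makes several assumptions that should be stated loudly: plain random playouts replace the UCT-controlled simulations, a fixed number of simulations replaces the time limit, only the mean goal value per atom is kept, and the `CoinGame` class with its method names is a hypothetical toy game in which a state is simply a frozenset of atoms. The sketch uses the “sample all states” strategy.

```python
import random
from collections import defaultdict


class CoinGame:
    """Hypothetical one-move puzzle; a state is a frozenset of atoms."""

    def is_terminal(self, state):
        return "start" not in state

    def successors(self, state):
        return [frozenset({"left"}), frozenset({"right"})]

    def goal(self, state):
        return 100 if "left" in state else 0


def analyse(game, start_state, simulations, rng):
    """Return the mean goal value observed for every atom."""
    totals = defaultdict(float)  # atom -> sum of goal values
    counts = defaultdict(int)    # atom -> number of samples
    for _ in range(simulations):
        # Step 1: simulate to a terminal state, saving the path.
        # (Random playout here; the text uses UCT to steer the simulation.)
        state, path = start_state, [start_state]
        while not game.is_terminal(state):
            state = rng.choice(game.successors(state))
            path.append(state)
        # Step 2: goal value of the terminal state.
        reward = game.goal(state)
        # Steps 3 and 4: sample all states on the path and credit their atoms.
        for visited in path:
            for atom in visited:
                totals[atom] += reward
                counts[atom] += 1
    return {atom: totals[atom] / counts[atom] for atom in totals}


# Hypothetical usage with a seeded generator for repeatability.
values = analyse(CoinGame(), frozenset({"start"}), 200, random.Random(0))
```

Atoms that only occur in winning states end up with a high mean, atoms that only occur in losing states with a low one, and atoms shared by both (such as the start atom here) land somewhere in between.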
3.3.2 The evaluating part

When the game has been analysed, any state can be evaluated by looking at the found values of the atoms of the state. There are several different ways to compose the evaluation function. One could use the mean of all the state atom values. A better way would perhaps be to weight the values somehow. Values with a low variance could be weighted higher than values with a high variance, since it is plausible that values with low variance are more important to the game outcome.

As with the analysing part, a mini tournament is conducted to determine which method is the best. The following candidates for how to weight the contributions from each state atom are considered.

• The mean value.
• A weighted sum where all values are weighted by one divided by the variance.
• A weighted sum where all values are weighted by one divided by the standard deviation.

The weightings using the variance or standard deviation give the highest weight to the values with the smallest variation. In order to avoid division by zero, the values are actually weighted with one divided by the maximum of one and the variance or standard deviation.

The result of the tests shows that the strategy using the variance performs better than the two alternatives, winning 75% of all matches. The final evaluation function therefore uses the weighted sum using the variance. A detailed listing of the results can be found in appendix C, page 55.

3.4 Monte Carlo methods and UCT

Because of the problems one discovers when trying to use the minimax algorithm in a general game player, people have been looking elsewhere for algorithms better suited for the general game. Monte Carlo methods have proven to be very powerful, and the game player of this project also implements a variant of this approach. Monte Carlo methods rely on repeated random simulations to compute the results.
The simplest strategy is just to make repeated random simulations of the game until the time is up. The move that yields the best result is picked. This strategy will however spend the same time exploring the bad moves as it spends exploring the good moves. If we instead use the information we have already gathered to weight exploration of good moves higher, it would lead to better play, since spending time on exploring how bad a bad move really is wastes precious time. The time is better used exploring the more interesting and rewarding moves.

The problem is a variant of the multi-armed bandit problem. In the normal multi-armed bandit problem you have a slot machine with multiple levers. Each lever produces a random reward from an unknown distribution, and the reward distribution for each lever may be different from the other levers. The task is to maximise your collected reward from iterative pulls. Pulling different levers may teach you more about each lever, but while you do this, you might lose potential rewards by pulling a suboptimal lever. This task is also known as the exploration/exploitation dilemma.

There are several different approaches to this problem. One of the strategies is the UCB1[1] algorithm. UCB stands for Upper Confidence Bounds, and as the name implies, the algorithm ensures an upper bound on the regret incurred from not pulling the optimal lever. The idea of the algorithm is that each lever has a record of the average reward recorded so far from pulling that lever, and a bias. Whenever the algorithm has to choose which lever to explore or pull, it chooses the lever that maximises the sum of the average reward and the bias. The key feature of the strategy is how the bias is calculated. In the UCB1 algorithm the bias is calculated as $\sqrt{\frac{2 \ln n}{n_j}}$, where $n_j$ is the number of times lever $j$ has been pulled so far, and $n$ is the total number of pulls done so far.
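As a worked illustration of the UCB1 rule, take the bias to be the square root of $2 \ln n / n_j$ and consider two levers after $n = 12$ pulls in total. The numbers are invented for this example: lever A has been pulled 10 times with an average reward of 0.6, and lever B has been pulled twice with an average reward of 0.5.

```latex
\[
\begin{aligned}
\text{lever } A:\quad & \bar{x}_A + \sqrt{\frac{2 \ln n}{n_A}} = 0.6 + \sqrt{\frac{2 \ln 12}{10}} \approx 0.6 + 0.705 = 1.305 \\
\text{lever } B:\quad & \bar{x}_B + \sqrt{\frac{2 \ln n}{n_B}} = 0.5 + \sqrt{\frac{2 \ln 12}{2}} \approx 0.5 + 1.576 = 2.076
\end{aligned}
\]
```

Lever B is pulled next even though its average reward is lower, because it has been tried so few times that its bias dominates. As B accumulates pulls its bias shrinks, and the choice is increasingly driven by the averages.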
When using this formula it is assumed that the rewards will be between 0 and 1.

When applying the UCB strategy to games, the scenario needs to be changed a little. Instead of having a single bandit with independent levers, each lever on the first bandit will either spawn a new bandit with new independent levers or yield a reward. This corresponds to making a move in a game and either getting a new game state or a reward from a terminal state. We can still use the idea of the UCB algorithm to solve this problem. The new algorithm is called UCT[7] and was proposed by Levente Kocsis and Csaba Szepesvári. UCT simply stands for UCB applied to trees. This algorithm is used by the best computer players of the very advanced game Go. It has also proven to be a very viable strategy in general game players, since the winner of the AAAI GGP Competition in both 2007 and 2008, CADIA-Player, uses this approach[3].

The UCT algorithm works like the simple Monte Carlo simulation strategy, but instead of choosing random actions it uses the UCB algorithm at each state in the game to explore the rewarding actions more thoroughly. The greatest advantage of the UCT algorithm is that it does not require any evaluation function to give a good result, since it uses the real rewards to estimate the value of the moves. It has also been proven mathematically that the probability of choosing the optimal action converges to 1 when the number of simulations grows. The algorithm is also an any-time algorithm¹, which makes it very suitable for a general game player, where the result must be returned within a given time frame.

Unfortunately there are also some drawbacks. If the game tree is very deep or each state update is heavy to compute, the algorithm might hit a terminal state only a few times or never. This means that it will have a very thin foundation for making any good decisions.
Also, a move that initially looks good but really is bad may mislead the algorithm if it does not get to make enough simulations to realise its mistake.

The algorithm will work for single player puzzles, but more conventional search methods like iterative deepening depth first search have been shown to give better results. The reason is, as mentioned before, that these games often have only one or a few paths to a winning state relatively close to the initial state, which the UCT algorithm might overlook when searching deep down the tree for a terminal state.

¹ Any-time algorithm means that the algorithm can be stopped at any time and still return a usable answer.

Chapter 4 Architecture

Writing a program like a general game player is a complex task and we need some kind of strategy for how to approach it. It seems reasonable to use a divide and conquer strategy to break down the task into smaller and easier sub-problems. How to do this, and how the solutions to the individual sub-problems merge into a solution to the main problem, is the architecture of the program.

4.1 Layered architecture

A way to divide the task into sub-problems is to use a layered architecture, where the bottom layer is the very basic ability of the program and every other layer builds on the underlying layer, adding new features or functionality. This model proved to work very well with the game player.

At the very bottom layer we need to place the most basic ability of the game player. This ability must be to reason about game rules. Without this ability the game player would not be able to make a legal move, let alone distinguish between good and bad moves. Since the game rules are provided in a Datalog-like language, it is a doable task to reason about game rules using a logic language such as Prolog. What kinds of reasoning the bottom layer must be able to do is determined by what the layer above needs.

When the reasoning layer is in place, we can use it to build upon.
Now we can actually implement the algorithms that determine which moves are good and which are bad. This layer is where the actual game playing will take place, and therefore we call it the player layer. To implement the player algorithms, the reasoning layer must provide information on terminal states, goal values, legal moves and state updates, all of which is doable.

To speed up computation and ultimately improve overall performance, we do however put in an additional layer between the reasoner and the player. In this layer we implement a transposition table that acts like a cache storage for the reasoner. If a request to the reasoner has already been calculated, the value is returned from the cache rather than being recalculated.

When implementing the minimax player, we need an analyser that can analyse the game and thereafter evaluate game states. This part of the program can be seen as a part of the player layer, but it can also be seen as a separate layer beneath the player.

Now we have a program that at any given state in any given game can calculate a good move, but we still need a lot more on top of that. The next layer will remember what game is being played, what role in the game the player is playing and what state the game is in. Furthermore the layer needs to be able to update the game state. This layer will be called the session layer. Now the game player will in theory be able to play a single game from start to end. However, the layered architecture starts to get a bit blurry here, because the session layer needs to skip several layers to utilise the reasoning layer when updating the state.

The next layer will take care of the various messages from the server, which uses the special game protocol. It will check whether the match ID fits the current ongoing game before communicating with the underlying layers. It will also start and stop games according to the instructions from the game server. This layer is called the game manager.
Since all communication to and from the server is in KIF, the game manager needs to be able to translate the KIF strings to and from whatever communication method is used between it and the session layer. This communication will actually be in Prolog strings, since this proves convenient when the information eventually reaches the reasoner. For simplicity the parts translating between KIF and Prolog can be separated from the game manager and implemented individually.

[Figure 4.1: Structure of the game player. Boxes in the figure: HTTP Server, KIF Parser, Game Manager, Prolog Parser, Session, Player, Game Analyzer, Transposition Table, Reasoner.]

Finally we need a layer for sending and receiving messages over an HTTP connection. This will simply be an HTTP server, or at least a server able to handle a subset of HTTP, since not all of it will be used.

All these layers will together form a game player that can function in an environment similar to the AAAI GGP competition. The layered architecture of the game player is illustrated in figure 4.1.

Chapter 5 Implementation

As described in chapter 4, the game player is built from different parts with their own specific tasks, where each part is a layer that builds on the underlying layers. In this chapter the most interesting implementation details of each part of the program are described. The source code of the game player can be found in appendix D.

All of the game player is written in Java 1.6. Furthermore the player uses a Prolog engine to calculate everything related to the game rules. For this purpose the SWI-Prolog environment is used because it comes with the JPL package, which makes it possible to make Prolog queries from Java.

5.1 Java or Prolog

Everything in the game player can be implemented entirely in Java or entirely in Prolog.
The reason for mixing those two languages is that they each have their strengths and weaknesses. The idea of combining them is that they can compensate for each other's weaknesses. Prolog is a logic programming language, making reasoning about logic very easy. On the other hand it also makes many things more difficult. Java is an object oriented and imperative language, making a lot of things easier to implement for a person who is not an expert in logic programming.

When using both programming languages, we must decide where to use which language. It is obvious that it will be smart to implement the reasoning layer in Prolog, since reasoning about logic is where Prolog is really strong. An early implementation of the game player showed that the player and every layer below can be implemented in Prolog. However, the transposition table is a challenge, and instead of spending too much time on it, an implementation in Java, using Prolog queries to the reasoner to make the actual calculations in this layer, was also tried. The algorithms run much faster in the Java implementation, so we will stick with this solution.

5.2 Reasoner

The game reasoner is the bottom layer of the game player and is used by the other parts of the player to reason about the game rules. It must be able to calculate four different things:

• All legal moves for all players from a given state.
• The next state given a current state and an action array.
• The goal values of a state.
• Whether a state is a terminal state.

To do this the reasoner uses the logic programming language Prolog. This is smart because the game rules written in GDL can easily be translated to Prolog clauses, and when this is done, all the above tasks can be done by simple Prolog queries.

When the reasoner is instantiated, it receives the game description as a Prolog string ready to be read into the Prolog engine.
The reasoner also extracts information about the initial state and the roles of the game from the game description, so that it will not be necessary to query Prolog for this information later. Whenever a rule is put into the Prolog database, a reference number to the rule is also saved, using the assert/2 predicate in Prolog. This means that the rule can easily be retracted again when a new set of game rules needs to be loaded. Unfortunately there is no other way to reset the Prolog engine in the current implementation of the JPL package that the reasoner uses.

The value of many of the statements in the rules depends on the current state. Instead of loading the current state into Prolog every time a query is made, every Prolog expression that depends on the game state has the state added as an extra argument. For instance, a rule from tic-tac-toe should be translated like this:

    goal(Player, 100) :- line(Player)
        ↓
    goal(Player, 100, State) :- line(Player, State)

This is done because tests have shown that it is faster than loading in and retracting the game state for every query made.

There are three rules in the GDL description that apply to all games. The first rule is the relation distinct, which takes two arguments and is true if and only if the two arguments are not equal. This is implemented in Prolog using the non-equivalence operator:

    distinct(X, Y) :- X \== Y

The second rule is the true relation, meaning that the argument is true in the current state. As discussed earlier, the true expression will have a state argument added because it depends on the current state. An expression is true in a state if the expression is contained in the list of true expressions in the state:

    true(X, State) :- member(X, State)

The third rule is the negation relation not. This is already implemented in SWI-Prolog, but it is not in the Prolog ISO standard.
So to make the reasoner compatible with other implementations of Prolog, we add it with the line:

    not(X) :- \+ X

Finally the following rules are loaded into the Prolog engine.

    or(A, B) :- A ; B
    or(A, B, C) :- A ; B ; C
    or(A, B, C, D) :- A ; B ; C ; D
    or(A, B, C, D, E) :- A ; B ; C ; D ; E
    or(A, B, C, D, E, F) :- A ; B ; C ; D ; E ; F

These rules used to be a part of GDL but were later removed. They are included here because some old game descriptions might use them.

5.3 Transposition tables

The transposition table acts as a cache storage for the reasoner. Every time another part of the game player needs information from the reasoner, it asks the transposition table. If the entry exists in the transposition table, the value is returned immediately. If not, the request is passed on to the reasoner, and the response is stored in the transposition table and sent back to the original requester.

The data is stored in a hash table where each state is saved as a state object with a specific hash code. This makes it possible to fetch a state from the table in constant time, no matter how many states the table holds. In each state object all legal moves for all players and the next state for all action combinations are stored in hash tables. Furthermore the goal value for each player is stored, as well as the information about whether the state is terminal. At initialization all the tables and values are empty, but they are filled as the game player requests the information.

In order to store the states in a hash table, we need a way to make a hash value from the state. A state in the implementation is represented by a collection of strings, and these strings can be used to calculate a hash value rather easily and quickly. In Java the hash value $H$ of a string $S$ with length $n$ is calculated as

    $H(S) = S[0] \cdot 31^{n-1} + S[1] \cdot 31^{n-2} + \cdots + S[n-1]$

The same idea is used when calculating the hash value for the entire state.
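The string formula above can be checked against Java's own `String.hashCode()`, and the same polynomial can be applied to a whole state. The following is a small sketch with hypothetical class and method names. It makes two choices that are assumptions of this sketch rather than descriptions of the thesis code: the state hash is accumulated in a 64-bit `long`, and the strings are sorted with a `TreeSet` so that the result cannot depend on the order in which they were added.

```java
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical sketch of the polynomial hash; names are illustrative. */
final class StateHashSketch {
    /** The string formula evaluated with Horner's rule in 32-bit arithmetic, as String.hashCode() does. */
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    /** The same polynomial over the hash codes of a state's strings, accumulated in a 64-bit long. */
    static long stateHash(Collection<String> state) {
        long h = 0L;
        // Assumption of this sketch: sort the strings so insertion order cannot change the result.
        for (String s : new TreeSet<>(state)) {
            h = 31L * h + s.hashCode();
        }
        return h;
    }
}
```

Horner's rule avoids computing any power of 31 explicitly: each step multiplies the running value by 31 and adds the next term.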
The hash value for a state made up of $n$ strings $S_1, S_2, \ldots, S_n$ is calculated in the following way.

    $H_{state} = H(S_1) \cdot 31^{n-1} + H(S_2) \cdot 31^{n-2} + \cdots + H(S_n)$

Early experiments showed that using 32-bit modular arithmetic in Java, in order to store the hash values in a 32-bit integer (int), led to occasional hash conflicts. Therefore the calculations use 64-bit modular arithmetic in order to store the value in a 64-bit integer (long). This means that there can be $2^{64}$ (over $1.8 \cdot 10^{19}$) different hash values. When using the formula, the values will be very well distributed even though the strings in the state contain strong patterns. That means that we can be pretty sure that two different states will have different hash values.

Reordering the strings will change the hash code, but the reordered strings will still represent the same state. Therefore we need to make sure that the strings are always ordered in the same way. This is done by storing the strings in a HashSet. When iterating over the hash set, the strings come out in an order determined by their hash codes, and therefore in the same order no matter what order they were added in. (Note that the Java documentation does not formally guarantee the iteration order of a HashSet; the implementation relies on the observed behaviour.)

When actually calculating the hash values we want to avoid calculating 31 to the power of something, because this is a fairly heavy computation. Instead the following algorithm is used. It gives the same result, but is computed faster. Here state refers to the HashSet of strings forming the state.

    // Horner's rule: same value as the polynomial above, without explicit powers of 31
    long hashValue = 0;
    for (String s : state) {
        hashValue = 31 * hashValue + s.hashCode();
    }

To further speed up the game player, the hash value of the state is saved in the state object and used as a cache.

When making a lookup in the hash table, only the hash code is used. There is no actual check of whether the state in the table matches the requested state. The reason for this is that the lookup needs to be fast, and the extra check would slow the game player down.
Instead the transposition table relies on the hash codes being well distributed and not colliding. If two states should get the same hash code anyway, the transposition table just ignores this and returns the results from the wrong state. With the hash code described above, this will happen very rarely, and it is accepted as a cost of the speed improvement. Even if a collision should happen, the game player will continue to run without errors, but some parts of the search tree will not be searched, and this can of course lead to suboptimal play.

Storing all computations made by the reasoner requires a lot of memory to be available. The default memory allocation for the Java Virtual Machine (JVM) is 128 megabytes, which the table can use up quickly. Instead of using the default value, the JVM is started with the argument -Xmx512m, which gives 512 megabytes of memory. This helps the transposition table to store a lot more information, but it really just postpones the problem. The transposition table will run out of memory sooner or later if we do not do something about it.

In order to solve this, a limit on the number of states in the transposition table is implemented. We use a limit of 18000 states, which seems to fit well with the 512 megabytes of memory. When the limit is reached, the table stops storing any new states and just passes on the calculations from the reasoner. Most of the frequently visited states have already been visited at this point, so it is only the less frequently visited states that do not get stored.

As the game develops, it will most likely be other states that get visited the most. Therefore we need to make room for new states as the game progresses. We do this by simply clearing the entire table at the beginning of a new move if more than half of the capacity of the transposition table is used. This may sound a bit brutal, but figuring out which states currently get the fewest visits and removing them would take too much time.
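The storage policy described above (stop storing at a fixed limit, and clear the whole table at the start of a move when more than half the capacity is in use) can be sketched as a small generic cache. All names here are hypothetical. Unlike the table described in the text, this sketch uses an ordinary `HashMap`, which does compare keys for equality on lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a bounded cache in front of an expensive reasoner call. */
final class BoundedCacheSketch<K, V> {
    private final int limit;
    private final Function<K, V> reasoner;
    private final Map<K, V> table = new HashMap<>();

    BoundedCacheSketch(int limit, Function<K, V> reasoner) {
        this.limit = limit;
        this.reasoner = reasoner;
    }

    /** Returns the cached value if present; otherwise asks the reasoner and stores the answer if there is room. */
    V lookup(K key) {
        V cached = table.get(key);
        if (cached != null) {
            return cached;
        }
        V value = reasoner.apply(key);
        if (table.size() < limit) {
            table.put(key, value);
        }
        // When the table is full the value is simply passed through without being stored.
        return value;
    }

    /** Called at the beginning of a new move: clear everything if more than half the capacity is used. */
    void onNewMove() {
        if (table.size() > limit / 2) {
            table.clear();
        }
    }

    int size() {
        return table.size();
    }
}
```

With the numbers from the text, such a cache would be created with a limit of 18000, and `onNewMove` would be called once per move.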
Clearing the entire table is fast and efficient.

5.4 The Players

The players are the main aspect of the game player. The player comes in two variants, the minimax player and the UCT player. It is the task of the players to decide what move to make next.

A player must implement the player interface, which contains two methods. The first method is called makePreGameAnalysis and does not return a value. When this method is called, the player can choose to make some kind of analysis of the game before it starts. The second method is makeMove. This method receives a state of the game and must return a legal move. The player must of course try to return the best possible move. Both methods are called with a time constraint and must return when the time is up.

5.4.1 Minimax player

The minimax player is an implementation of the minimax algorithm. The algorithm is however modified to a more general form, but the idea is the same. The more general form of the algorithm works for any number of players by treating the opponents as one. The implementation even works with no opponents by skipping the minimising part of the algorithm. In this case the algorithm is just performing a depth first search. As mentioned in chapter 3, minimax will only work on alternating moves, and to overcome this problem, the assumption is made that the player moves first, then the opponents.

When asked to make a pre-game analysis, the minimax player simply starts the game analyser. The result of the analyser's work can be used to make an evaluation of each game state.

Since the algorithm runs on a time constraint, it has to be able to stop computation and return a good result at any time. In order to meet this requirement the algorithm is implemented as an iterative deepening minimax. For each iteration the depth limit of the search is increased by one, and the result of the latest completed search is returned when the time is up.
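The iterative deepening loop can be sketched as a small driver around any depth-limited search. Everything named here is hypothetical: the `TimeUp` exception, the `DepthLimitedSearch` interface and the `maxDepth` safety bound are assumptions of this sketch, and the actual player may signal the deadline differently.

```java
/** Hypothetical sketch of an iterative deepening driver; the search itself is passed in. */
final class IterativeDeepeningSketch {
    /** Thrown by a search that notices the deadline has passed. */
    static final class TimeUp extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** A depth-limited search, for example minimax that evaluates states at the depth limit. */
    interface DepthLimitedSearch<M> {
        M search(int depthLimit) throws TimeUp;
    }

    /**
     * Runs the search with depth limits 1, 2, 3, ... and returns the result of the
     * latest search that completed. The fallback is returned if not even depth 1 finishes.
     */
    static <M> M bestMove(DepthLimitedSearch<M> search, M fallback, int maxDepth) {
        M best = fallback;
        for (int depth = 1; depth <= maxDepth; depth++) {
            try {
                best = search.search(depth);
            } catch (TimeUp e) {
                // The interrupted iteration is discarded; keep the last completed result.
                break;
            }
        }
        return best;
    }
}
```

Discarding the interrupted iteration is the design choice that matters: a half-finished deeper search may not have looked at all moves yet, so its partial result is not trusted.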
This sounds like a lot of extra work for the reasoner, but since the implementation uses transposition tables, where all the calculated state-action pairs and state updates are cached, it costs almost nothing to start the search over.

When a search is started, the minimax player first checks how many moves are available. If only one move is possible, there is no reason to find out how good that move is, since there is no alternative. Instead the game analyser is started once again to collect more data in order to make more accurate state evaluations. This is especially useful in games with alternating moves, where the game player otherwise would be idle half of the time.

If more than one move is possible, then the actual search is started. The search runs like a normal minimax with the exceptions mentioned. Furthermore the alpha-beta pruning technique is used.

5.4.2 UCT player

The other player implemented is the UCT player, which implements the UCT algorithm. This algorithm uses no evaluation function and therefore it does not start the game analyser. Instead it uses the time before the first move to start making simulations from the initial state immediately. A simple description of the algorithm looks like this:

1. Get all legal moves in the current state and increase the state visit counter by one.
2. If a move has not been explored, explore it by updating the current state. Otherwise explore the move that maximises the sum of the expected reward and the bias of the move. If more than one move satisfies the conditions mentioned, choose randomly among them.
3. Repeat steps 1 and 2 until a terminal state is reached. Then get the goal value of the terminal state and update all the expected rewards on the path leading to the terminal state.
4. While there is time left, reset the current state and repeat steps 1 through 3.

The expected reward of a move is the mean of all rewards seen so far involving that move from the specific state.
The bias is calculated as

    $\sqrt{\frac{40 \cdot \log \mathit{visits}_{tot}}{\mathit{visits}_m}}$

where $\mathit{visits}_m$ is the number of times the move $m$ has been explored so far, and $\mathit{visits}_{tot}$ is the total number of visits of the state.

Unfortunately there is no way to calculate or mathematically deduce the optimal value of the constant factor. It needs to be found empirically for each application of the algorithm. In a general game playing context this is a bit complicated, since the optimal constant may vary from game to game. The good news, however, is that no matter what constant is used, the probability of choosing the optimal move will converge to 1 when the number of simulations grows. Testing has shown that the value 40 performs well in most games. Choosing a value just above 0 or just below 100 clearly decreases the performance of the game player, so the optimal value must be somewhere between these values.

Since no moves have been explored at the beginning, the first simulation of the algorithm will be totally random. In the next few runs, the different moves from the initial position will be explored and the following moves will be random. As the algorithm explores more moves and learns more about the game, it will become less random and more deterministic, following the UCB1 algorithm for choosing actions in each state.

In step 2 of the algorithm, where the description mentions how to choose a move to explore, it should of course be done in the same way for all players. When every player's move has been chosen, the state can be updated. In the simulation each player simply tries to maximise their own reward without looking at the other players' rewards. This feature makes the algorithm perform significantly differently from, and possibly better than, the minimax algorithm in cooperative games and other games with non-zero-sum goal functions.

The UCT algorithm uses information about rewards and number of visits for every state and move visited. These values need to be saved in a data structure somewhere.
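A per-state record of these values, together with the selection rule from step 2, might look like the sketch below. The class and field names are hypothetical. The sketch assumes that the bias is the square root of 40 · log(total visits) / (visits of the move), that log is the natural logarithm, and that rewards are goal values between 0 and 100; it tracks a single role only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical per-state record of UCT statistics for one role; names are illustrative. */
final class UctNodeSketch {
    /** Constant used in the bias term; 40 is the value reported to work well in the text. */
    static final double C = 40.0;

    final int[] visits;   // number of times each move has been explored
    final double[] mean;  // mean reward seen so far for each move
    int totalVisits;      // number of completed simulations through this state

    UctNodeSketch(int moveCount) {
        visits = new int[moveCount];
        mean = new double[moveCount];
    }

    /** Step 2: unexplored moves come first; otherwise maximise mean plus bias, breaking ties at random. */
    int select(Random rng) {
        List<Integer> best = new ArrayList<>();
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int m = 0; m < visits.length; m++) {
            double score = visits[m] == 0
                    ? Double.POSITIVE_INFINITY
                    : mean[m] + Math.sqrt(C * Math.log(totalVisits) / visits[m]);
            if (score > bestScore) {
                best.clear();
                bestScore = score;
            }
            if (score == bestScore) {
                best.add(m);
            }
        }
        return best.get(rng.nextInt(best.size()));
    }

    /** Step 3: fold the reward of a finished simulation into the running mean of the chosen move. */
    void update(int move, double reward) {
        totalVisits++;
        visits[move]++;
        mean[move] += (reward - mean[move]) / visits[move];
    }
}
```

The incremental mean update avoids storing individual rewards: only a visit count and the current mean are kept per move.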
Since the implementation uses transposition tables like the minimax player, the data structure of this table can be used to store the UCT specific values as well.

In the actual implementation a recursive method is used to implement steps 1 through 3 of the algorithm. This is the easiest and most elegant way, but in Java it can cause problems in extreme cases, since the call stack is limited. If a stack overflow occurs, the algorithm returns a draw, i.e. all goal values are set to 50. However, an overflow does not occur before a couple of thousand recursive calls, and that many calls will never be necessary in the context of the general game playing competition, since the games used there are designed so that a single game finishes relatively quickly.

5.5 Game Analyser

The game analyser analyses the game by using simulations. Once the game is analysed, the analyser can evaluate any game state. It is very important that the evaluation function can be calculated very fast, as it is called many times by the minimax algorithm. To accomplish this the atoms are saved in a hash table with the hash value of the string representation as the key. This way every atom can be found in constant time.

5.6 Session

The task of the session layer is to save and update the current state of the game and to query the player for the next move. For updating the state of the game, the session object uses the reasoner layer.

In the session layer, a time buffer of one second is subtracted from the time available for the players. This is done to make sure the game player will be able to answer in time despite scheduling delays, network lag and the like.

5.7 Game Manager

When an HTTP request from the server has been parsed, the actual message is handed over to the game manager. The task of the game manager is to understand the message and initiate the appropriate actions.
It is also the game manager's responsibility to keep track of the match ID of the current game and to reject any messages with the wrong match ID.

A request from the server can be one of the three types START, PLAY or STOP. If a START command is received and the player is not already playing a game, the player role, game description, match ID and the start and play clocks are stored. A session layer containing the description of the game, the role to be played and a player object is instantiated, and the session is requested to start analysing the game.

When a PLAY command is received, the match ID of the command is checked, and if it matches the current ID, the moves of the players are sent to the session layer and a move is requested. The same happens when a STOP command is received, with the exception that a new move is not requested from the session layer.

5.7.1 KIF Parser

The communication between the game server and the game player is written as strings in the Knowledge Interchange Format (KIF). However, the rest of the game player must be given the information in a more accessible format. Therefore the game manager uses a KIF parser to translate the KIF string into a Prolog string. Since GDL is a kind of Datalog, just written in KIF, it can be translated to Prolog without too much effort. For instance the rule from tic-tac-toe:

    (<= (goal ?player 100) (line ?player))

becomes

    goal(Player, 100) :- line(Player)

The KIF parser uses a two-pass approach where the KIF is first translated into an internal data structure consisting of lists, variables, numbers and atoms. Thereafter this data structure is traversed in order to make the Prolog string.

To make sure that no input in the game description will overwrite protected keywords in Prolog or use illegal characters, all variables and atoms are renamed to a random string with a special prefix, such that there is no danger of this happening.
Later we will have to translate the random names back to the original names, so we keep track of all the renaming in a symbol table. The reason for giving each symbol a new random string representation and not just some integer representation is that later, when we want to reason about the game rules, the strings can be used directly in the reasoner module.

As mentioned earlier, the state dependent expressions must have a state argument added. But how does one find out whether an expression depends on the state or not? It is obvious that the predefined expressions true, legal, terminal, goal and next depend on the current state. It becomes more difficult for user defined expressions.

There exist two types of user defined expressions. The first one is like line, row and column in tic-tac-toe. These expressions can be state dependent and they need to be explicitly defined in the game rules with an implication like:

    line(Player) :- row(X, Player)

This type of expression will have the extra state argument attached. The second type is like cell and mark in tic-tac-toe. These expressions are just names and are not defined in any explicit way in the game rules. Therefore they are not given the extra state argument.

To recognise the first type of user defined expression, the parser runs through all the game rules looking for definitions of user defined expressions before it translates the rules. Every expression found is put in a collection of expressions together with the five predefined state dependent expressions. The KIF parser is then able to tell whether to add the state argument or not.

One could imagine a rule of the first type that is not state dependent, but such rules will often not make much sense, and will occur rarely if at all. It will however not do any harm if the state argument is added to this kind of expression.
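The translation step, including the added state argument, can be illustrated with a deliberately small sketch. It is not the parser described above: it works in a single pass, does no renaming to random prefixed strings, keeps no symbol table, and takes the set of user defined state dependent expressions as a parameter instead of collecting it from the rule heads. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: translate one KIF expression into a Prolog string with state arguments. */
final class KifToPrologSketch {
    /** The five predefined state dependent expressions named in the text. */
    static final Set<String> PREDEFINED =
            new HashSet<>(Arrays.asList("true", "legal", "terminal", "goal", "next"));

    private final List<String> tokens = new ArrayList<>();
    private final Set<String> stateDependent = new HashSet<>(PREDEFINED);
    private int pos = 0;

    private KifToPrologSketch(String kif, Set<String> userDefined) {
        stateDependent.addAll(userDefined);
        // Pad parentheses with spaces so that splitting on whitespace tokenises the input.
        String padded = kif.replace("(", " ( ").replace(")", " ) ").trim();
        tokens.addAll(Arrays.asList(padded.split("\\s+")));
    }

    static String translate(String kif, Set<String> userDefined) {
        return new KifToPrologSketch(kif, userDefined).term();
    }

    private String term() {
        String t = tokens.get(pos++);
        if (!t.equals("(")) {
            return atom(t);
        }
        String functor = tokens.get(pos++);
        List<String> args = new ArrayList<>();
        while (!tokens.get(pos).equals(")")) {
            args.add(term());
        }
        pos++; // consume the closing parenthesis
        if (functor.equals("<=")) {
            // The first argument is the rule head, the remaining ones form the body.
            String head = args.get(0);
            return args.size() == 1
                    ? head
                    : head + " :- " + String.join(", ", args.subList(1, args.size()));
        }
        if (stateDependent.contains(functor)) {
            args.add("State");
        }
        return functor + "(" + String.join(", ", args) + ")";
    }

    private String atom(String t) {
        if (t.startsWith("?")) {
            // KIF variable ?player becomes the Prolog variable Player.
            return Character.toUpperCase(t.charAt(1)) + t.substring(2);
        }
        // A bare state dependent atom such as terminal still needs the state argument.
        return stateDependent.contains(t) ? t + "(State)" : t;
    }
}
```

For example, calling `translate` on the tic-tac-toe rule `(<= (goal ?player 100) (line ?player))` with `line` in the user defined set gives `goal(Player, 100, State) :- line(Player, State)`.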
5.7.2 Prolog Parser

When the game player, after some calculations, returns the next move to the game manager, the response comes as a Prolog string and has to be translated to KIF. The Prolog parser takes care of just this task. It uses a reverse of the symbol table made by the KIF parser to translate the random strings back to their original form. The Prolog parser uses the same two-pass strategy with the intermediate internal data structure as the KIF parser.

5.8 HTTP Server

The final layer is the HTTP server layer. This is the layer that allows the game player to communicate with the game server. As the name suggests, the layer is actually an implementation of an HTTP server. However, only the POST request is supported, since this is the only type of request the game server will ever send. When a request is received from the game server, the HTTP header is parsed and the content of the message is handed over to the game manager. After some time, the game manager returns a string, which is then wrapped in an HTTP response and sent back to the game server.

Chapter 6 Results

In the previous chapters we have seen what algorithms the game player uses and how it is implemented. In this chapter we will make an empirical evaluation of how well the game player performs.

A game server called GameController[2] is made available by Technische Universität Dresden. This game server implementation simulates the behaviour of the game server used in the AAAI competition. All the tests of the game player were made with this game server implementation. The tests were done by running one or two instances of the game player on the same computer with a dual core processor. This way each player should have the same computational resources available.

6.1 Minimax versus UCT

The game player has been implemented with the minimax algorithm and the UCT algorithm. We will investigate how these two approaches compare to each other when they compete.
We wish to investigate if and how different time constraints and different games influence the balance between the algorithms. In the experiments we play four different games. The games are chosen to reflect different features of possible games, such as small or large branching factor, deep or shallow game trees etc. Each game is played with three different time constraints, and each game is played 40 times per time constraint with the players changing roles after half of the games. The results of the matches are shown in table 6.1.

Game               Start (s)   Play (s)   UCT      Minimax
Connect four       30          10         17.5%    82.5%
Connect four       60          30         25.0%    75.0%
Connect four       120         60         20.0%    80.0%
Kolibrat           30          10         17.5%    82.5%
Kolibrat           60          30         17.5%    82.5%
Kolibrat           120         60         17.5%    82.5%
Checkers           30          10          5.0%    95.0%
Checkers           60          30          7.5%    92.5%
Checkers           120         60          7.5%    92.5%
Tic-tac-toe par.   30          10         60.0%    40.0%
Tic-tac-toe par.   60          30         55.0%    45.0%
Tic-tac-toe par.   120         60         55.0%    45.0%

Table 6.1: Results from matches between the minimax and the UCT player.

The following games are played.

• Connect four has an initial branching factor of 8, which is reduced near the end of a game, when the columns start to fill up, and the game has a maximum depth of 48 plies. It is difficult to make an automated heuristic function for this game, since the value of a state depends on the pieces' relations to each other and not on the piece count or absolute positions. In this test the minimax player is by far superior to the UCT player. This is because the UCT player uses its time to search all the way to a terminal state and thus overlooks a lost state very near the current state of the game. The minimax player, on the other hand, quickly finds the flaws of the UCT player and forces a quick win near the initial position.

• Kolibrat is a game developed by associate professor Thomas Bolander. It is very suitable for general game playing since it is neither too difficult nor too easy to play.
It has a small branching factor, about 3.12 on average[8], and a typical game is about 60 to 80 plies long. In this game the minimax player is the best player. It quickly takes control of the game and plays almost flawlessly, leaving the UCT player with no chance to make a comeback. The small branching factor enables the minimax player to fully search many plies ahead, which helps it keep control of the game.

• Tic-tac-toe (parallel) is two ordinary tic-tac-toe games played simultaneously. The two games do not affect each other. This game has a very large initial branching factor of 81 (nine possibilities on each tic-tac-toe board), which decreases during the game. The game depth is however very shallow: after nine plies or sooner, the game is over. The shallow depth of the game proves to be an advantage for the UCT player. It is able to make good estimates of the possible moves in a short amount of time, as opposed to the minimax player, which often has no clue about what to do. The reason why the minimax algorithm wins a fair number of games anyway is that, even though the starting position is a draw, the player initially in control has an advantage over the other player. Given more time, both players seem to have figured out this relatively simple game, resulting in most games ending in a draw.

• Checkers has long been an interesting game for artificial intelligence researchers, since it is a difficult game to master, yet not as complex as chess. It was recently weakly solved[10], i.e. it has been determined that the initial position is a draw, and an explicit strategy to always achieve (at least) a draw has been found for both players. The fact that it is only weakly solved, in contrast to strongly solved, means that not all states of the game have been solved. This is however not necessary for perfect play, as long as the game starts with the usual initial state.
In this game the minimax player is superior. The evaluation function works well and learns more and more as the games progress. The UCT player, on the other hand, has no chance, as the simulations take too much time to compute, leaving the player with only a very fragile foundation of a few simulations for making its decisions.

In figure 6.1 the results from the matches are drawn as several graphs showing the connection between the chosen time constraints and the relative performance of the game players. Overall the minimax player performed much better than the UCT player. The success of the minimax player must be put down to the evaluation function, as all of the games (except perhaps tic-tac-toe) require far-seeing strategies far beyond the search horizon of the minimax algorithm. The UCT player seems to do a little better as the time limit increases, but this tendency is not statistically significant. More tests would be required in order to find out whether the tendency is just caused by statistical variation, or whether it is a real result of increasing the computation time.

Figure 6.1: Graph of win ratio of the UCT player against the minimax player for various games and time constraints. (Series: Connect four, Kolibrat, Checkers, Tic-tac-toe par.; time constraints: 10 s, 30 s, 60 s.)

6.2 Stress tests

To test the robustness of the game player, a series of stress tests is performed. The tests are designed as single player puzzles with different properties.

• State space is a game based on tree search. The three variants have approximately 1,000, 1,000,000 and 1,000,000,000 states respectively, and the reward is based on the path taken through the tree.

• Duplicate state is much like the state space test. The three variants have the same number of states as the three state space tests, but only 5, 10 or 15 of the states are unique.

• Rule depth is a test where it is possible for the game player to either give up or continue.
However, the amount of effort needed to prove that it is legal to continue grows linearly, quadratically or exponentially.

The results of the tests are shown in table 6.2. The average and the best scores of the eight game players participating in the qualifying rounds of the AAAI competition in 2007 are shown in the last two columns for comparison.

Game                Start (s)   Play (s)   UCT    Minimax   avg    best
Duplicate state S   30          10         75     100       59.4   100
Duplicate state M   240         10         44     88        34.6   100
Duplicate state L   600         10         28     0         45.4   100
Rule depth lin.     10          10         100    100       51     100
Rule depth quad.    10          10         100    100       39.5   100
Rule depth exp.     10          10         (80)   (80)      17.5   60
State space S       60          10         75     100       84.4   100
State space M       240         10         44     11        34.4   77
State space L       600         10         7      28        9.6    35

Table 6.2: Results from stress tests. For highlighting purposes the numbers above average are coloured green while the numbers below average are red.

All the tests ran smoothly with no errors from the game player. However, there is a problem with the rule depth exponential test, where the implementation of the game server cannot finish the test. The game player works better and is able to continue until it reaches the goal value of 80, at which point both the UCT and the minimax implementation have to give up. It was expected that UCT and minimax would perform the same in this test, as it is really the reasoner and the stability that are being tested.

It is worth noticing that the game player, whether UCT or minimax was used, overall scores above the average of the AAAI competition scores. The game player did particularly well in the rule depth test, where both the UCT and the minimax player (would have) scored better than any of the competitors in 2007.

Chapter 7 Future work

There are still several things that can be done in order to improve the game player. Here are some improvements that should be looked into.
7.1 Speed improvement

It came to my attention during testing that the game player makes fewer calculations within the same amount of time than some of the competitors. This issue was investigated during the project, and it turned out that the reasoner was the only significant time consuming part of the program. A lot of effort has been made to make as few calls to the reasoner as possible by using transposition tables, but it should be investigated whether the reasoner itself could be made faster. The reasoner builds on a SWI-Prolog engine and the JPL interface. SWI-Prolog is known to be a slow implementation, whereas the Prolog implementation YAP is known to be several times faster. Unfortunately the JPL interface only works with SWI-Prolog, and there is no other easy-to-use alternative. The YAP implementation comes with a C interface, and it should be possible either to interface from Java to YAP via C or to make a new YAP interface completely in Java. This is however too time consuming to be included in this project.

7.2 History heuristics

History heuristics is an interesting technique which, unfortunately, there was no time to implement. The idea of the technique is that a good move in one state of the game might also be a good move in other states of the game. Experience shows that this is actually often the case. To use this to our advantage, we would need to store evaluations of moves independent of the game state. Whenever a new state is to be explored, the most promising moves would be examined first. With the minimax player this would mean that the alpha-beta pruning would cut off the search earlier and thus reach a larger search depth within the same amount of time. For the UCT player it would mean that potentially better moves would be explored first, and therefore the simulations would be more relevant than with purely random selections.

7.3 Parallelization

Another subject for improvement is support for parallel computations.
Today new computers have multi core CPUs, and if you want to make heavy calculations, you really need to consider distributing the calculation among several CPUs. Since GGP is a topic where computational power is of great importance, it would be a great improvement to support multi core calculations. There already exist several proposals for how to run both minimax and UCT in parallel.

7.4 Game analyser methods

The choice of method for both the analysing and the evaluating part of the game analyser builds on relatively few matches, and only one game is played. To increase confidence that the methods chosen are indeed the best, more tests could be run.

7.5 The UCT bias constant

When calculating the bias in the UCT algorithm, a constant factor of 40 is used to weight the exploration/exploitation balance. It is unlikely that this is the optimal constant for all games. More research should be done towards tuning this constant to the game at hand. Branching factor and state space size might be parameters that impact the optimal size of the constant.

Chapter 8 Conclusion

In this thesis we have worked with general game playing and seen how a general game player can be constructed. First we had a look at the annual GGP competition, where we saw how a match is executed and how a game is described in the game description language. We also had a look at existing players that had performed well in the competition and learned that there are many different approaches to making a game player.

Then we presented an implementation of a game player of our own, which could use either the simulation based UCT algorithm or the minimax search algorithm. We used a layered architecture to simplify implementation and make it easier to change the implementation of a single part of the program. This came in handy when we implemented both a UCT and a minimax player using the same layers beneath and above the player-algorithm layer.
Finally we tested the game player by letting the two different algorithms compete against each other in different games with different time constraints. This showed us that the minimax algorithm performed much better than the UCT algorithm. This is a surprising result, as the winner of the AAAI GGP competition in both 2007 and 2008, the CADIA-Player, uses the UCT algorithm. There are several factors that could have caused this result. First of all, the CADIA-Player is written in C and uses YAP Prolog in its computations, which probably makes it able to run much faster and produce more simulations. Secondly, it uses history heuristics, which also improve the performance. It is possible that the UCT algorithm will perform significantly better when given many more simulations. Our tests also showed a slight but non-significant indication of this, along with the fact that many of the games were lost because the UCT player simply overlooked important states near the root of the search. Given more simulations, those mistakes would be eliminated or at least moved further away from the root state, making them less critical.

The tests also showed that the proposed evaluation function for the minimax algorithm works very well with the computational resources available in the tests. The evaluation function gives the minimax algorithm the possibility to look far beyond the search horizon and make good decisions, even though the terminal states are far away. Furthermore, the actual minimax search makes sure the algorithm does not overlook any important states near the root of the search.

Different stress tests were also run, and both variants of the game player did well in these tests, even when compared to the best existing game players.

We have learned that making a general game player is a large project, where you have to address many different issues.
It takes a lot of engineering effort to compose an implementation where all the different parts of the program have to work together. Since computational time is a very valuable resource in GGP, each part of the implementation must be tuned toward this, and every millisecond must be squeezed out of the algorithms.

We learned that even though the UCT algorithm looked very promising, it is difficult to make the underlying computations run fast enough to get good results. The implementation in this project did not do the UCT algorithm full justice. On the other hand, the minimax algorithm with the proposed evaluation function worked very well. The evaluation function was inspired by the UCT algorithm and also uses it when choosing which states to explore.

General game playing is still a new area of research, and much has to be done before general game players can beat humans in traditional games like the ones used as tests in this project. A topic that none of today's game players have implemented is transferring knowledge from one match of a game to another. This could potentially increase the performance of the player hugely. For human players practice makes perfect, and that might also be a good strategy for computer players. The task is however challenging, since the rules of the same game could be written in many different ways, and in the current GGP framework the players are not told the name of the game. Identifying the game is however not enough on its own. When one is able to transfer knowledge between game instances, the next step would be to transfer knowledge between different games. For instance, one could easily imagine that knowledge learned by playing checkers on an 8 times 8 board could be used in checkers on a 10 times 10 board and vice versa.

Appendix A Abbreviations

Here is a list of the abbreviations used in this report.
• AAAI - Association for the Advancement of Artificial Intelligence
• AI - Artificial Intelligence
• IP - Internet Protocol
• KIF - Knowledge Interchange Format
• GDL - Game Description Language
• GGP - General Game Playing
• HTTP - Hypertext Transfer Protocol
• TCP - Transmission Control Protocol
• UCB - Upper Confidence Bounds
• UCT - UCB applied to Trees

Appendix B Game rules

B.1 Tic-tac-toe

(role x)
(role o)

(init (cell 1 1 b))
(init (cell 1 2 b))
(init (cell 1 3 b))
(init (cell 2 1 b))
(init (cell 2 2 b))
(init (cell 2 3 b))
(init (cell 3 1 b))
(init (cell 3 2 b))
(init (cell 3 3 b))
(init (control x))

(<= (next (cell ?x ?y ?player))
    (does ?player (mark ?x ?y)))
(<= (next (cell ?x ?y ?mark))
    (true (cell ?x ?y ?mark))
    (does ?player (mark ?m ?n))
    (distinctCell ?x ?y ?m ?n))
(<= (next (control x))
    (true (control o)))
(<= (next (control o))
    (true (control x)))

(<= (row ?x ?player)
    (true (cell ?x 1 ?player))
    (true (cell ?x 2 ?player))
    (true (cell ?x 3 ?player)))
(<= (column ?y ?player)
    (true (cell 1 ?y ?player))
    (true (cell 2 ?y ?player))
    (true (cell 3 ?y ?player)))
(<= (diagonal ?player)
    (true (cell 1 1 ?player))
    (true (cell 2 2 ?player))
    (true (cell 3 3 ?player)))
(<= (diagonal ?player)
    (true (cell 1 3 ?player))
    (true (cell 2 2 ?player))
    (true (cell 3 1 ?player)))

(<= (line ?player) (row ?x ?player))
(<= (line ?player) (column ?y ?player))
(<= (line ?player) (diagonal ?player))

(<= open
    (true (cell ?x ?y b)))

(<= (distinctCell ?x ?y ?m ?n) (distinct ?x ?m))
(<= (distinctCell ?x ?y ?m ?n) (distinct ?y ?n))

(<= (legal ?player (mark ?x ?y))
    (true (cell ?x ?y b))
    (true (control ?player)))
(<= (legal x noop)
    (true (control o)))
(<= (legal o noop)
    (true (control x)))

(<= (goal ?player 100)
    (line ?player))
(<= (goal ?player 50)
    (not (line x))
    (not (line o))
    (not open))
(<= (goal ?player1 0)
    (line ?player2)
    (distinct ?player1 ?player2))

(<= terminal (line x))
(<= terminal (line o))
(<= terminal (not open))
Appendix C Analyser tests

The methods are compared by playing the game Kolibrat with a start clock of 120 seconds and a play clock of 10 seconds. The first score is the score of the method in the row, and the second score is the score of the method in the column.

C.1 Analyser tests

Method           All    Rand.   Term.   Winning ratio
All states       -      4-1     5-0     80%
Random state     3-2    -       5-0     70%
Terminal state   0-5    0-5     -       0%

C.2 Evaluator tests

Method               Mean   SD     Var.   Winning ratio
Mean                 -      4-1    2-3    45%
Standard deviation   2-3    -      2-3    30%
Variance             5-0    4-1    -      75%

Appendix D Source Code

D.1 gameplayer

D.1.1 Atom.java

package gameplayer;

public class Atom {
    private String atom;
    private int sum = 0;
    private int squareSum = 0;
    private int visits = 0;

    public Atom(String atom) {
        this.atom = atom;
    }

    public float getMean() {
        if (visits > 0) {
            return (float) sum / (float) visits;
        } else
            return 50;
    }

    public float getVariance() {
        if (visits > 0) {
            float mean = getMean();
            return ((float) squareSum / (float) visits) - (mean * mean);
        } else
            return 0;
    }

    public void addValue(int value) {
        sum += value;
        squareSum += value * value;
        visits++;
    }

    public int hashCode() {
        return atom.hashCode();
    }

    public String getAtom() {
        return atom;
    }

    public boolean equals(Object obj) {
        return (obj instanceof Atom && ((Atom) obj).getAtom().equals(atom));
    }

    public String toString() {
        return atom;
    }
}

D.1.2 GameAnalyzer.java

package gameplayer;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.logging.Logger;

public class GameAnalyzer {
    private static final Logger logger = Logger.
            getLogger("gameplayer.GameAnalyzer");
    private Reasoner reasoner;
    private TranspositionTable transTable;
    private Hashtable<Integer, Atom> atoms = new Hashtable<Integer, Atom>();
    private boolean abort;
    private UCTSearcher searcher;

    public GameAnalyzer(Reasoner reasoner) {
        this.reasoner = reasoner;
        transTable = new TranspositionTable(reasoner, false);
        searcher = new UCTSearcher(transTable);
    }

    public void analyze(State startState, long endTime, int playerIndex) {
        if (transTable.size() > TranspositionTable.MAXSIZE / 2) {
            // Clear transposition table
            transTable = new TranspositionTable(reasoner, false);
            searcher = new UCTSearcher(transTable);
        }
        while (System.currentTimeMillis() < endTime) {
            ArrayList<State> states = new ArrayList<State>();
            State tempState = startState;
            states.add(tempState); // Add initial state to make sure there is at least
                                   // one state in the list when the time is up.
            abort = false;
            int value = searcher.searchSaveStates(states, startState, endTime)[playerIndex];
            if (!abort) {
                for (State state : states) {
                    HashSet<String> newAtoms = state.getAtoms();
                    for (String s : newAtoms) {
                        Atom f = new Atom(s);
                        if (atoms.containsKey(f.hashCode())) {
                            atoms.get(f.hashCode()).addValue(value);
                        } else {
                            f.addValue(value);
                            atoms.put(f.hashCode(), f);
                        }
                    }
                }
            }
        }
        logger.fine("Analysing done");
        for (Atom f : atoms.values()) {
            logger.finest(" " + f.toString() + " - " + f.getVariance());
        }
    }

    public float evaluate(State state) {
        float dividend = 0;
        float divisor = 0;
        for (String stateAtom : state.getAtoms()) {
            if (atoms.containsKey(stateAtom.
                    hashCode())) {
                float variance = (float) Math.max(1, atoms.get(stateAtom.hashCode()).getVariance());
                dividend += atoms.get(stateAtom.hashCode()).getMean() / variance;
                divisor += 1 / variance;
            }
        }
        if (divisor > 0)
            return dividend / divisor;
        else
            return 50f; // Unknown, return draw
    }
}

D.1.3 GameManager.java

package gameplayer;

import java.util.Hashtable;
import java.util.logging.Logger;

import kifParser.Command;
import kifParser.ParseException;
import kifParser.KIFParser;
import kifParser.PlayCommand;
import kifParser.PrologLexer;
import kifParser.PrologParser;
import kifParser.StartCommand;
import kifParser.StopCommand;

public class GameManager {
    private static final Logger logger = Logger.getLogger("gameplayer.GameManager");
    private Session session;
    private String role;
    private String matchID = null;
    private int playClock;
    private int startClock;
    private int moves = 0;
    private Hashtable<String, String> symbolTable = new Hashtable<String, String>();
    private Hashtable<String, String> reverseSymbolTable = new Hashtable<String, String>();
    private KIFParser kifParser = new KIFParser(symbolTable);
    private final Gameplayer gameplayer;

    public Gameplayer getGameplayer() {
        return gameplayer;
    }

    public GameManager(Gameplayer gameplayer) {
        this.gameplayer = gameplayer;
        symbolTable.put("LEGAL", "LEGAL");
        symbolTable.put("TERMINAL", "TERMINAL");
        symbolTable.put("INIT", "INIT");
        symbolTable.put("ROLE", "ROLE");
        symbolTable.put("DISTINCT", "DISTINCT");
        symbolTable.put("OR", "OR");
        symbolTable.put("TRUE", "TRUE");
        symbolTable.
                put("NEXT", "NEXT");
        symbolTable.put("DOES", "DOES");
        symbolTable.put("GOAL", "GOAL");
        symbolTable.put("NOT", "NOT");
        symbolTable.put("<=", "<=");
    }

    public String handleGameServerRequest(String request) throws ParseException {
        Command command = kifParser.parseKIF(request);
        if (command instanceof StartCommand) {
            if (matchID == null) {
                moves = 0;
                StartCommand c = (StartCommand) command;
                role = c.getRole();
                matchID = c.getMatchID();
                playClock = c.getPlayClock();
                startClock = c.getStartClock();
                reverseSymbolTable = new Hashtable<String, String>();
                for (String s : symbolTable.keySet()) {
                    reverseSymbolTable.put(symbolTable.get(s).toLowerCase(), s);
                }
                logger.info("=============================================");
                logger.info("New match!");
                logger.info("");
                logger.info("My role: " + role);
                logger.info("Match ID: " + matchID);
                logger.info("Start clock: " + startClock);
                logger.info("Play clock: " + playClock);
                logger.info("");
                session = new Session(gameplayer, c.getDescription(), role);
                session.makePreGameAnalysis(startClock * 1000);
                return "READY";
            } else {
                return "GAME_ALREADY_PLAYING";
            }
        } else if (command instanceof StopCommand) {
            StopCommand c = (StopCommand) command;
            if (c.getMatchID().equals(matchID)) {
                role = null;
                matchID = null;
                playClock = 0;
                startClock = 0;
                logger.info("Match stopped");
                return "DONE";
            } else {
                logger.
                        warning("Got wrong match ID");
                return "WRONG_MATCH_ID";
            }
        } else if (command instanceof PlayCommand) {
            PlayCommand c = (PlayCommand) command;
            logger.info("Got play command");
            if (c.getMatchID().equals(matchID)) {
                if (moves > 0)
                    session.updateState(c.getActions());
                moves++;
                Move move = session.makeMove(playClock * 1000);
                if (move != null) {
                    return PrologParser.parseFact(
                            new PrologLexer(reverseSymbolTable, move.toPrologString())).toKIFString();
                } else {
                    logger.warning("Could not find a move");
                    return null;
                }
            } else {
                logger.warning("Got wrong match ID");
                return "WRONG_MATCH_ID";
            }
        } else {
            logger.severe("Some error occured");
            return "SOME_ERROR_OCCURED";
        }
    }
}

D.1.4 Gameplayer.java

package gameplayer;

import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

import network.HTTPServer;

public class Gameplayer {
    private int port = 40000;
    private String algorithm = "UCT";
    private boolean drawGraph = false;
    private static final Logger logger = Logger.getLogger("gameplayer.Gameplayer");

    public static void main(String[] args) {
        new Gameplayer(args);
    }

    public Gameplayer(String[] args) {
        handleArgs(args);
        setupLoggers();
        logger.info("Gameplayer running on port: " + port);
        logger.info("Algorithm: " + algorithm);
        if (drawGraph)
            logger.info("Graph: on");
        logger.info("Memory available: " + (Runtime.getRuntime().
                maxMemory() / 1048576) + " MB");
        GameManager gameManager = new GameManager(this);
        new HTTPServer(gameManager).startServer();
    }

    private void handleArgs(String[] args) {
        for (int index = 0; index < args.length; index++) {
            if (args[index].equalsIgnoreCase("-port")) {
                index++;
                if (index < args.length)
                    port = Integer.parseInt(args[index]);
            } else if (args[index].equalsIgnoreCase("-algorithm")) {
                index++;
                if (index < args.length)
                    algorithm = args[index];
            } else if (args[index].equalsIgnoreCase("-graph")) {
                index++;
                if (index < args.length && args[index].equalsIgnoreCase("on"))
                    drawGraph = true;
            } else {
                logger.warning("Illegal argument: " + args[index]);
            }
        }
    }

    private void setupLoggers() {
        ConsoleHandler ch = new ConsoleHandler();
        ch.setFormatter(new Formatter() {
            @Override
            public String format(LogRecord lr) {
                return lr.getLoggerName() + " " + lr.getLevel().getName() + ": " + lr.getMessage() + "\n";
            }
        });
        Logger.getLogger("gameplayer").addHandler(ch);
        Logger.getLogger("gameplayer").setUseParentHandlers(false);
        Logger.getLogger("network").addHandler(ch);
        Logger.getLogger("network").setUseParentHandlers(false);
        Logger.getLogger("kifParser").addHandler(ch);
        Logger.getLogger("kifParser").setUseParentHandlers(false);
        ch.setLevel(Level.ALL);

        ConsoleHandler prologConsoleHandler = new ConsoleHandler();
        prologConsoleHandler.setFormatter(new Formatter() {
            @Override
            public String format(LogRecord lr) {
                return lr.getMessage() + "\n";
            }
        });
        Logger.getLogger("gameplayer.SWIPrologInterface").
                setUseParentHandlers(false);
        Logger.getLogger("gameplayer.SWIPrologInterface").addHandler(prologConsoleHandler);
        prologConsoleHandler.setLevel(Level.ALL);

        Logger.getLogger("gameplayer").setLevel(Level.WARNING);
        Logger.getLogger("gameplayer.SWIPrologInterface").setLevel(Level.WARNING);
        Logger.getLogger("gameplayer.MiniMaxPlayer").setLevel(Level.FINEST);
        Logger.getLogger("gameplayer.UCTPlayer").setLevel(Level.FINEST);
        Logger.getLogger("gameplayer.GameAnalyzer").setLevel(Level.WARNING);
        Logger.getLogger("gameplayer.GameManager").setLevel(Level.WARNING);
        Logger.getLogger("gameplayer.TranspositionTable").setLevel(Level.WARNING);
        Logger.getLogger("network").setLevel(Level.WARNING);
        Logger.getLogger("kifParser").setLevel(Level.WARNING);
    }

    public String getAlgorithm() {
        return algorithm;
    }

    public int getPort() {
        return port;
    }

    public boolean isDrawGraph() {
        return drawGraph;
    }
}

D.1.5 Session.java

package gameplayer;

import java.util.logging.Logger;

public class Session {
    private static final Logger logger = Logger.getLogger("gameplayer.Session");
    // One second seems to be sufficient buffer.
    long timeBuffer = 1000;
    private Reasoner reasoner;
    private State currentState;
    private Player player;

    public Session(Gameplayer gameplayer, String description, String role) {
        reasoner = new Reasoner(description);
        currentState = reasoner.getInitState();
        if (gameplayer.getAlgorithm().equalsIgnoreCase("minimax"))
            player = new MiniMaxPlayer(reasoner, role, gameplayer.
                    isDrawGraph());
        else if (gameplayer.getAlgorithm().equalsIgnoreCase("uct"))
            player = new UCTPlayer(reasoner, role, gameplayer.isDrawGraph());
        else
            player = new MiniMaxPlayer(reasoner, role, gameplayer.isDrawGraph());
    }

    public void updateState(String[] actions) {
        currentState = reasoner.getNextState(reasoner.getRoles(), actions, currentState);
        logger.fine("Current state updated: " + currentState.toPrologString());
    }

    public void makePreGameAnalysis(long time) {
        long endTime = System.currentTimeMillis() + time;
        player.makePreGameAnalysis(endTime - timeBuffer);
    }

    public Move makeMove(long time) {
        long endTime = System.currentTimeMillis() + time;
        return player.makeMove(currentState, endTime - timeBuffer);
    }
}

D.1.6 IProlog.java

package gameplayer;

import java.util.Hashtable;

public interface IProlog {
    public void ruleClause(String clause);

    public Hashtable[] allSolutions(String goalClause);

    public Hashtable oneSolution(String goalClause);

    public boolean hasSolution(String goalClause);
}

D.1.7 MiniMaxPlayer.java

package gameplayer;

import java.util.logging.Logger;

import jpl.PrologException;

public class MiniMaxPlayer implements Player {
    private static final Logger logger = Logger.getLogger("gameplayer.MiniMaxPlayer");

    private final String role;
    private final Reasoner reasoner;
    private final boolean drawGraph;
    private final GameAnalyzer gameAnalyzer;
    private TranspositionTable transTable;
    private String[] opponents;
    private String[] players;
    private int playerIndex;

    private Move bestMove;
    private boolean abort = false;
    private boolean completeSearch = false;

    public MiniMaxPlayer(Reasoner reasoner, String role, boolean drawGraph) {
        this.reasoner = reasoner;
        this.role = role;
        this.gameAnalyzer = new GameAnalyzer(reasoner);
        this.transTable = new TranspositionTable(reasoner, drawGraph);
        this.drawGraph = drawGraph;
        this.opponents = new String[reasoner.getRoles().length - 1];
        this.players = new String[reasoner.getRoles().length];
        int i = 0;
        for (String r : reasoner.getRoles()) {
            if (r.equals(role)) {
                playerIndex = i;
            } else {
                opponents[i] = r;
                players[i] = r;
                i++;
            }
        }
        players[players.length - 1] = role;
    }

    public void makePreGameAnalysis(long endTime) {
        gameAnalyzer.analyze(reasoner.getInitState(), endTime, playerIndex);
    }

    public Move makeMove(State state, long endTime) {
        if (transTable.size() > TranspositionTable.MAXSIZE / 2) {
            transTable = new TranspositionTable(reasoner, drawGraph);
        }
        abort = false;
        completeSearch = false;
        int depth = 1;
        bestMove = new Move("NIL");
        while (!completeSearch && System.currentTimeMillis() < endTime) {
            Move m = minimax(state, depth, endTime);
            if (!abort)
                bestMove = m;
            if (bestMove.
                    getValue() == 100)
                break; // No better move can be found
            depth++;
        }
        // If there is time left, make some additional analysis.
        if (System.currentTimeMillis() < endTime)
            gameAnalyzer.analyze(state, endTime, playerIndex);
        return bestMove;
    }

    private Move minimax(gameplayer.State state, int depthlimit, long endTime) {
        long currentTime = System.currentTimeMillis();
        completeSearch = true;
        abort = false;
        Move bestMove = new Move("NIL");
        Move[] moves = transTable.getMoves(role, state);
        if (moves.length > 1)
            for (Move move : moves) {
                if (System.currentTimeMillis() >= endTime) {
                    abort = true;
                    break;
                }
                Move[][] opponentsMoves = null;
                float value;
                if (opponents.length == 0) {
                    gameplayer.State nextState =
                            transTable.getNextState(players, new Move[]{move}, state);
                    value = evalMaxNode(nextState, 0, Integer.MAX_VALUE, 0, depthlimit, endTime);
                } else {
                    for (int i = 0; i < opponents.length; i++)
                        opponentsMoves = cartesian(opponentsMoves,
                                transTable.getMoves(opponents[i], state));
                    value = evalMinNode(move, opponentsMoves, state, bestMove.getValue(),
                            Integer.MAX_VALUE, 0, depthlimit, endTime);
                }
                if (!abort)
                    move.setValue(value);
                if (move.getValue() > bestMove.getValue())
                    bestMove = move;
                if (bestMove.getValue() == 100)
                    break; // no better move can be found
            }
        else if (moves.length == 1) {
            // Only one possible move
            bestMove = moves[0];
            // Spend the time doing some additional analyzing of the game
            gameAnalyzer.analyze(state, endTime, playerIndex);
        }
        logger.finest("*** Depth: " + depthlimit + " ***");
        logger.finest("Time: " + (System.
                currentTimeMillis() - currentTime) + " ms");
        logger.finest(bestMove.getValue() + ": " + bestMove.toPrologString());
        return bestMove;
    }

    private float evalMaxNode(gameplayer.State state, float alpha, float beta,
            int depth, int depthLimit, long endTime) {
        try {
            if (transTable.isTerminal(state)) {
                // Return the goal value of the player
                return transTable.goalValues(state)[playerIndex];
            }
            if (depth >= depthLimit) {
                if (completeSearch)
                    completeSearch = false;
                // Return a heuristic evaluation of the state
                return gameAnalyzer.evaluate(state);
            }
            Move[] playerMoves = transTable.getMoves(role, state);
            Move[][] opponentsMoves = null;
            if (opponents.length == 0) {
                // No opponent. Skip the minimising part.
                for (Move playerMove : playerMoves) {
                    if (System.currentTimeMillis() >= endTime) {
                        abort = true;
                        return alpha;
                    }
                    gameplayer.State nextState =
                            transTable.getNextState(players, new Move[]{playerMove}, state);
                    alpha = Math.max(alpha,
                            evalMaxNode(nextState, 0, beta, depth + 1, depthLimit, endTime));
                    // If an optimal solution is found, return it.
                    if (alpha == 100)
                        return alpha;
                }
            } else {
                // Make a two-dimensional array of all moves of all opponents
                for (int i = 0; i < opponents.length; i++) {
                    opponentsMoves = cartesian(opponentsMoves,
                            transTable.getMoves(opponents[i], state));
                }
                for (Move playerMove : playerMoves) {
                    if (System.currentTimeMillis() >= endTime) {
                        abort = true;
                        return alpha;
                    }
                    alpha = Math.
                            max(alpha, evalMinNode(playerMove, opponentsMoves,
                                    state, alpha, beta, depth, depthLimit, endTime));
                    if (beta <= alpha)
                        return alpha;
                }
            }
        } catch (StackOverflowError e) {
            // Overflow occurred before a terminal state was reached.
            // The overflow was probably caused by a very deep search.
            logger.warning(e.getMessage());
        } catch (PrologException e) {
            // Something nasty happened in Prolog
            logger.warning(e.getMessage());
        }
        return alpha;
    }

    private float evalMinNode(Move playerMove, Move[][] opponentsMoves,
            gameplayer.State state, float alpha, float beta,
            int depth, int depthLimit, long endTime) {
        for (Move[] opponentsMove : opponentsMoves) {
            if (System.currentTimeMillis() >= endTime) {
                // Time is up
                // Return value is not used
                abort = true;
                return beta;
            }
            String[] moveArray = new String[reasoner.getRoles().length];
            for (int i = 0; i < opponentsMove.length; i++) {
                moveArray[i] = opponentsMove[i].toPrologString();
            }
            moveArray[opponentsMove.length] = playerMove.toPrologString();
            gameplayer.State nextState = transTable.getNextState(players, moveArray, state);
            beta = Math.min(beta,
                    evalMaxNode(nextState, alpha, beta, depth + 1, depthLimit, endTime));
            if (beta <= alpha)
                return beta;
        }
        return beta;
    }

    /**
     * Adds another array to an existing cartesian product
     */
    private static Move[][] cartesian(Move[][] cart, Move[] array) {
        Move[][] r;
        if (cart == null) {
            r = new Move[array.length][1];
            for (int i = 0; i < array.length; i++)
                r[i][0] = array[i];
        } else {
            r = new Move[cart.length * array.length][cart[0].
                    length + 1];
            for (int i = 0; i < cart.length * array.length; i = i + cart.length) {
                for (int j = 0; j < cart.length; j++) {
                    for (int k = 0; k < cart[0].length; k++) {
                        r[j + i][k] = cart[j][k];
                    }
                    r[j + i][cart[0].length] = array[i / cart.length];
                }
            }
        }
        return r;
    }
}

D.1.8 Move.java

package gameplayer;

public class Move {
    private static final int EXPLORE_EXPLOIT_FACTOR = 40;

    private String move;
    /** calculated value of move */
    private float value = -1f;
    private long UCTValue = 0;
    private long UCTVisits = 0;

    public Move(String move) {
        this.move = move;
    }

    public static Move createFromProlog(String prologString) {
        return new Move(prologString);
    }

    public void addUCTValue(long i) {
        UCTValue += i;
        UCTVisits++;
    }

    public float getTotalValue(State state) {
        return getUCTValue() + getUCTBonusValue(state.getVisits());
    }

    public float getUCTValue() {
        if (UCTVisits == 0) {
            // We have no clue about this move, so we only prefer it over
            // moves where we know we have lost by returning 1.
            return 1;
        } else
            return ((float) UCTValue) / ((float) UCTVisits);
    }

    public float getUCTBonusValue(long stateVisits) {
        if (UCTVisits == 0)
            return Float.MAX_VALUE;
        else
            return EXPLORE_EXPLOIT_FACTOR
                    * (float) Math.sqrt(Math.log(stateVisits) / (float) UCTVisits);
    }

    public String toPrologString() {
        return move;
    }

    public String toString() {
        return move;
    }

    public float getValue() {
        return value;
    }

    public boolean equals(Object obj) {
        return (obj instanceof Move
                && ((Move) obj).toPrologString().equals(this.
                        toPrologString()));
    }

    public void setValue(float value) {
        this.value = value;
    }

    public long getUCTVisits() {
        return UCTVisits;
    }
}

D.1.9 Player.java

package gameplayer;

public interface Player {
    public abstract void makePreGameAnalysis(long endTime);

    public abstract Move makeMove(State state, long endTime);
}

D.1.10 Reasoner.java

package gameplayer;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.logging.Logger;

public class Reasoner {
    private static final Logger logger = Logger.getLogger("gameplayer.Reasoner");
    private IProlog prolog = new SWIPrologInterface();
    private String[] roles;
    private State initState;

    public Reasoner(String description) {
        ArrayList<String> roles = new ArrayList<String>();
        initState = new State();
        for (String line : description.split("\n")) {
            if (line.startsWith("init(")) {
                initState.add(line.substring(5, line.length() - 1));
            } else if (line.startsWith("role(")) {
                roles.add(line.substring(5, line.length() - 1));
            }
            // Some descriptions might refer to init and role in other rules,
            // so we leave them in.
            prolog.ruleClause(line);
        }
        this.roles = roles.toArray(new String[]{});
    }

    public State getInitState() {
        return initState;
    }

    public Move[] getMoves(String role, State state) {
        Hashtable[] answer = prolog.allSolutions(
                "legal(" + role + ", Move, " + state.toPrologString() + ")");
        HashSet<String> moveSet = new HashSet<String>();
        for (int i = 0; i < answer.length; i++) {
            moveSet.
                    add(answer[i].get("Move").toString());
        }
        Move[] moves = new Move[moveSet.size()];
        int i = 0;
        for (String move : moveSet) {
            moves[i] = Move.createFromProlog(move);
            i++;
        }
        return moves;
    }

    public Move getOneMove(String role, State state) {
        Hashtable answer = prolog.oneSolution(
                "legal(" + role + ", Move, " + state.toPrologString() + ")");
        return Move.createFromProlog(answer.get("Move").toString());
    }

    public int[] goalValues(State state) {
        int[] values = new int[roles.length];
        int index = 0;
        for (String role : roles) {
            Hashtable answer = prolog.oneSolution(
                    "goal(" + role + ", Value, " + state.toPrologString() + ")");
            if (answer == null || answer.get("Value") == null) {
                logger.severe("Could not find goal value for player: " + role);
                values[index++] = 0;
            } else {
                values[index++] = Integer.parseInt(answer.get("Value").toString());
            }
        }
        return values;
    }

    public State getNextState(String[] roles, String[] actions, State currentState) {
        if (actions.length == 1 && actions[0].equalsIgnoreCase("NIL")) {
            // Do nothing
            return currentState;
        } else if (roles.length == actions.length) {
            State nextState = new State();
            for (int i = 0; i < roles.length; i++) {
                prolog.hasSolution("assert(does(" + roles[i] + "," + actions[i] + "))");
            }
            Hashtable[] answer = prolog.allSolutions(
                    "next(Next, " + currentState.toPrologString() + ")");
            for (Hashtable ht : answer) {
                nextState.add(ht.get("Next").toString());
            }
            prolog.allSolutions("retractall(does(_, _))");
            return nextState;
        } else {
            // This should never happen.
            logger.severe("Roles and action arrays are not of equal size.");
            return currentState;
        }
    }

    public String[] getRoles() {
        return roles;
    }

    public boolean isTerminal(State state) {
        return prolog.hasSolution("terminal(" + state.toPrologString() + ")");
    }
}

D.1.11 State.java

package gameplayer;

import java.util.HashSet;
import java.util.Hashtable;

public class State {
    private HashSet<String> state = new HashSet<String>();
    private Hashtable<String, Move[]> moves = new Hashtable<String, Move[]>();
    private Hashtable<Long, State> nextStates = new Hashtable<Long, State>();
    private Boolean isTerminal = null;
    private int[] values = null;
    private String stringCache = null;
    private Long hashCache = null;
    private long visits = 0;

    public long getVisits() {
        return visits;
    }

    public void incVisits() {
        visits++;
    }

    public void add(String fluent) {
        state.add(fluent);
    }

    public void setMoves(String role, Move[] moves) {
        this.moves.put(role, moves);
    }

    public Move[] getMoves(String role) {
        return moves.get(role);
    }

    public boolean hasMoves(String role) {
        return moves.containsKey(role);
    }

    public void setTerminal(boolean value) {
        isTerminal = value;
    }

    public boolean isTerminal() {
        return isTerminal;
    }

    public boolean hasTerminal() {
        return (isTerminal != null);
    }

    public void setValues(int[] values) {
        this.
                values = values;
    }

    public int[] getValues() {
        return values;
    }

    public boolean hasValues() {
        return (values != null);
    }

    public void setNextState(String[] roles, String[] actions, State nextState) {
        long hash = 0;
        for (String s : roles) {
            hash = 31 * hash + s.hashCode();
        }
        for (String s : actions) {
            hash = 31 * hash + s.hashCode();
        }
        nextStates.put(hash, nextState);
    }

    public State getNextState(String[] roles, String[] actions) {
        return nextStates.get(hashArrays(roles, actions));
    }

    public boolean hasNextState(String[] roles, String[] actions) {
        return nextStates.containsKey(hashArrays(roles, actions));
    }

    private long hashArrays(String[] arr1, String[] arr2) {
        long hash = 0;
        for (String s : arr1) {
            hash = 31 * hash + s.hashCode();
        }
        for (String s : arr2) {
            hash = 31 * hash + s.hashCode();
        }
        return hash;
    }

    public boolean equals(Object obj) {
        return (obj instanceof State && ((State) obj).getAtoms().equals(state));
    }

    public HashSet<String> getAtoms() {
        return state;
    }

    public long getHashCode() {
        if (hashCache == null) {
            hashCache = new Long(0);
            for (String s : state) {
                hashCache = 31 * hashCache + s.hashCode();
            }
        }
        return hashCache;
    }

    public String toPrologString() {
        if (stringCache == null) {
            StringBuilder sb = new StringBuilder();
            sb.append("[");
            int i = 0;
            for (String fluent : state) {
                if (i > 0)
                    sb.append(",");
                i++;
                sb.append(fluent);
            }
            sb.append("]");
            stringCache = sb.toString();
        }
        return stringCache;
    }
}

D.1.12 SWIPrologInterface.java

package gameplayer;

import java.util.Hashtable;
import java.util.logging.Logger;

import jpl.
        Query;

public class SWIPrologInterface implements IProlog {
    private static final Logger logger = Logger.getLogger("gameplayer.SWIPrologInterface");

    public SWIPrologInterface() {
        // Define the procedure system_rule to avoid errors at first reset.
        hasSolution("dynamic system_rule/1");

        // Reset the prolog engine
        resetRules();

        // Global rules true for all games
        ruleClause("true(X, State) :- member(X, State)");
        ruleClause("distinct(X, Y) :- X \\== Y");
        ruleClause("not(X) :- \\+ X"); // already in SWI-prolog

        // Deprecated rules, but needed for old game descriptions
        ruleClause("or(A, B) :- A ; B");
        ruleClause("or(A, B, C) :- A ; B ; C");
        ruleClause("or(A, B, C, D) :- A ; B ; C ; D");
        ruleClause("or(A, B, C, D, E) :- A ; B ; C ; D ; E");
        ruleClause("or(A, B, C, D, E, F) :- A ; B ; C ; D ; E ; F");
    }

    public Hashtable[] allSolutions(String goalClause) {
        logger.fine(goalClause + ".");
        return Query.allSolutions(goalClause);
    }

    public Hashtable oneSolution(String goalClause) {
        logger.fine(goalClause + ".");
        return Query.oneSolution(goalClause);
    }

    public boolean hasSolution(String goalClause) {
        logger.fine(goalClause + ".");
        return Query.hasSolution(goalClause);
    }

    private void resetRules() {
        logger.fine("system_rule(_X), erase(_X), retract(system_rule(_X)).");
        Query.allSolutions("system_rule(_X), erase(_X), retract(system_rule(_X))");
    }

    public void ruleClause(String clause) {
        logger.fine("assert((" + clause + "), _R), assert(system_rule(_R)).");
        Query.
                allSolutions("assert((" + clause + "), _R), assert(system_rule(_R))");
    }
}

D.1.13 TranspositionTable.java

package gameplayer;

import java.util.Hashtable;
import java.util.logging.Logger;

import org.ubiety.ubigraph.UbigraphClient;

public class TranspositionTable {
    public static final int MAXSIZE = 18000;
    private static final String graphHost = "http://192.168.0.3:20738/RPC2";

    private static final Logger logger = Logger.getLogger("gameplayer.TranspositionTable");
    private final Reasoner reasoner;
    private Hashtable<Long, State> table = new Hashtable<Long, State>();

    private boolean drawGraph = false;
    private UbigraphClient graph;

    public TranspositionTable(Reasoner reasoner, boolean drawGraph) {
        this.drawGraph = drawGraph;
        this.reasoner = reasoner;
        if (drawGraph) {
            graph = new UbigraphClient(graphHost);
            graph.clear();
            graph.newEdgeStyle(1, 0);
            graph.setEdgeStyleAttribute(1, "oriented", "true");
        }
    }

    public int size() {
        return table.size();
    }

    public Reasoner getReasoner() {
        return reasoner;
    }

    public State getStateFromTable(State currentState) {
        State transState = table.get(currentState.getHashCode());
        if (transState == null) {
            transState = currentState;
            putStateInTable(transState, null);
        }
        if (!currentState.equals(transState)) {
            logger.warning("Hash collision detected.");
            return currentState;
        }
        return transState;
    }

    private void putStateInTable(State transState, State parentState) {
        if (table.size() < MAXSIZE) {
            table.put(transState.getHashCode(), transState);
            if (drawGraph) {
                graph.newVertex(transState.hashCode());
                if (parentState != null) {
                    int edge = graph.newEdge(parentState.hashCode(), transState.hashCode());
                    graph.changeEdgeStyle(edge, 1);
                }
            }
        }
    }

    public State getNextState(String[] roles, Move[] actions, State currentState) {
        String[] stringMoves = new String[actions.length];
        for (int i = 0; i < actions.length; i++) {
            stringMoves[i] = actions[i].toPrologString();
        }
        return getNextState(roles, stringMoves, currentState);
    }

    public State getNextState(String[] roles, String[] actions, State currentState) {
        State transState = getStateFromTable(currentState);
        if (transState.hasNextState(roles, actions)) {
            State nextState = transState.getNextState(roles, actions);
            return nextState;
        } else {
            State nextState = reasoner.getNextState(roles, actions, currentState);
            transState.setNextState(roles, actions, nextState);
            putStateInTable(nextState, transState);
            return nextState;
        }
    }

    public Move[] getMoves(String role, State state) {
        State transState = getStateFromTable(state);
        if (transState.hasMoves(role)) {
            return transState.getMoves(role);
        } else if (transState.getVisits() == 1 && table.size() < MAXSIZE) {
            Move move = reasoner.getOneMove(role, state);
            return new Move[]{move};
        } else {
            Move[] moves = reasoner.getMoves(role, state);
            transState.
                    setMoves(role, moves);
            return moves;
        }
    }

    public int[] goalValues(State state) {
        State transState = getStateFromTable(state);
        if (transState.hasValues()) {
            return transState.getValues();
        } else {
            int[] values = reasoner.goalValues(state);
            transState.setValues(values);
            return values;
        }
    }

    public boolean isTerminal(State state) {
        State transState = getStateFromTable(state);
        if (transState.hasTerminal()) {
            return transState.isTerminal();
        } else {
            boolean value = reasoner.isTerminal(state);
            transState.setTerminal(value);
            if (value && drawGraph)
                graph.setVertexAttribute(state.hashCode(), "color", "#00ff00");
            return value;
        }
    }
}

D.1.14 UCTPlayer.java

package gameplayer;

import java.util.Random;
import java.util.logging.Logger;

public class UCTPlayer implements Player {
    private static final Logger logger = Logger.getLogger("gameplayer.UCTPlayer");
    private static final Random rnd = new Random();

    private final String role;
    private final Reasoner reasoner;
    private final boolean drawGraph;
    private TranspositionTable transTable;
    private UCTSearcher searcher;

    public UCTPlayer(Reasoner reasoner, String role, boolean drawGraph) {
        this.reasoner = reasoner;
        this.transTable = new TranspositionTable(reasoner, drawGraph);
        this.drawGraph = drawGraph;
        this.role = role;
        this.searcher = new UCTSearcher(transTable);
    }

    public void makePreGameAnalysis(long endTime) {
        while (System.currentTimeMillis() < endTime)
            searcher.search(reasoner.
                    getInitState(), endTime);
    }

    public Move makeMove(State state, long endTime) {
        state = transTable.getStateFromTable(state);
        Move[] moves = transTable.getMoves(role, state);
        if (moves.length == 0) {
            // If no moves are possible
            // something must be wrong
            return new Move("NIL");
        }
        if (moves.length == 1) {
            // Only one possible move
            // Spend the time doing some analyzing of the game
            while (System.currentTimeMillis() < endTime)
                searcher.search(state, endTime);
            // Note that the transposition table is not cleared
            return moves[0];
        }
        while (System.currentTimeMillis() < endTime) {
            searcher.search(state, endTime);
        }

        // Randomly choose a move among the best moves
        // This makes the player make a random choice if it has no clue
        float bestValue = 0;
        Move[] returnMove = new Move[moves.length];
        int found = 0;
        for (Move move : moves) {
            if (move.getUCTValue() == bestValue) {
                returnMove[found++] = move;
            } else if (move.getUCTValue() > bestValue) {
                bestValue = move.getUCTValue();
                found = 0;
                returnMove[found++] = move;
            }
        }

        logger.finest("--- All moves --------------------------------");
        logger.finest("Value\t: Visit\t: Move\t\t: Bonus");
        for (Move move : moves) {
            logger.finest(move.getUCTValue() + "\t: " + move.getUCTVisits() + "\t: "
                    + move.toString() + "\t: " + move.getUCTBonusValue(state.getVisits()));
        }
        logger.finest("----------------------------------------------");
        logger.finer("Table size: " + transTable.size());

        if (transTable.size() > TranspositionTable.
                MAXSIZE / 2) {
            // Clear transposition table
            transTable = new TranspositionTable(reasoner, drawGraph);
            searcher = new UCTSearcher(transTable);
        }
        return returnMove[rnd.nextInt(found)];
    }
}

D.1.15 UCTSearcher.java

package gameplayer;

import java.util.ArrayList;
import java.util.Random;
import java.util.logging.Logger;

import jpl.PrologException;

public class UCTSearcher {
    private static final Logger logger = Logger.getLogger("gameplayer.UCTSearcher");

    /** An array of goal values representing a draw */
    private final int[] drawArray;

    private TranspositionTable transTable;
    private boolean abort;

    public UCTSearcher(TranspositionTable transTable) {
        this.transTable = transTable;
        drawArray = new int[transTable.getReasoner().getRoles().length];
        for (int i = 0; i < drawArray.length; i++) {
            drawArray[i] = 50;
        }
    }

    public int[] search(State state, long endTime) {
        abort = false;
        return searchRec(state, endTime);
    }

    private int[] searchRec(State state, long endTime) {
        state.incVisits();
        try {
            if (transTable.isTerminal(state)) {
                return transTable.goalValues(state);
            } else if (abort || System.currentTimeMillis() >= endTime) {
                // The time is up. Return-values are not used.
                abort = true;
                return drawArray;
            } else {
                Move[] moves = new Move[transTable.getReasoner().getRoles().length];
                for (int i = 0; i < moves.length; i++) {
                    moves[i] = selectMove(state, transTable.getMoves(
                            transTable.getReasoner().getRoles()[i], state));
                }
                State nextState = transTable.getNextState(transTable.getReasoner().
                        getRoles(), moves, state);
                int[] moveValues = searchRec(nextState, endTime);
                if (!abort) {
                    for (int i = 0; i < moves.length; i++) {
                        moves[i].addUCTValue(moveValues[i]);
                    }
                }
                return moveValues;
            }
        } catch (StackOverflowError e) {
            // Overflow occurred before a terminal state was reached.
            // The overflow was probably caused by a very deep search.
            logger.warning(e.getMessage());
            // The evaluation value is uncertain so we return a draw.
            return drawArray;
        } catch (PrologException e) {
            // Something nasty happened in Prolog
            logger.warning(e.getMessage());
            // Return array of zeros to avoid this state again.
            return new int[transTable.getReasoner().getRoles().length];
        }
    }

    public int[] searchSaveStates(ArrayList<State> states, State state, long endTime) {
        abort = false;
        return searchRecSaveStates(states, state, endTime);
    }

    private int[] searchRecSaveStates(ArrayList<State> states, State state, long endTime) {
        states.add(state);
        state.incVisits();
        try {
            if (transTable.isTerminal(state)) {
                return transTable.goalValues(state);
            } else if (abort || System.currentTimeMillis() >= endTime) {
                // The time is up. Return-values are not used.
                abort = true;
                return drawArray;
            } else {
                Move[] moves = new Move[transTable.getReasoner().getRoles().length];
                for (int i = 0; i < moves.length; i++) {
                    moves[i] = selectMove(state, transTable.getMoves(
                            transTable.getReasoner().getRoles()[i], state));
                }
                State nextState = transTable.getNextState(
                        transTable.getReasoner().getRoles(), moves, state);
                int[] moveValues = searchRecSaveStates(states, nextState, endTime);
                if (!abort) {
                    for (int i = 0; i < moves.length; i++) {
                        moves[i].
                                addUCTValue(moveValues[i]);
                    }
                }
                return moveValues;
            }
        } catch (StackOverflowError e) {
            // Overflow occurred before a terminal state was reached.
            // The overflow was probably caused by a very deep search.
            logger.warning(e.getMessage());
            // The evaluation value is uncertain so we return a draw.
            return drawArray;
        } catch (PrologException e) {
            // Something nasty happened in Prolog
            logger.warning(e.getMessage());
            // Return array of zeros to avoid this state again.
            return new int[transTable.getReasoner().getRoles().length];
        }
    }

    public static Move selectMove(State state, Move[] moves) {
        Random rnd = new Random();
        float bestValue = 0;
        Move[] returnMove = new Move[moves.length];
        int found = 0;
        float value = 0;
        for (Move move : moves) {
            value = move.getTotalValue(state);
            if (value == bestValue) {
                returnMove[found++] = move;
            } else if (value > bestValue) {
                found = 0;
                returnMove[found++] = move;
                bestValue = value;
            }
        }
        return returnMove[rnd.nextInt(found)];
    }
}

D.2 kifParser

D.2.1 Command.java

package kifParser;

public class Command {
    private String matchID = "";

    public String getMatchID() {
        return matchID;
    }

    public void setMatchID(String matchID) {
        if (matchID != null)
            this.matchID = matchID;
        else
            this.matchID = "";
    }
}

D.2.2 GDLAtom.java

package kifParser;

import java.util.HashSet;

public class GDLAtom implements GDLExpression {
    /** The identifier in lower case */
    private String identifier;

    public GDLAtom(String identifier) {
        this.identifier = identifier.
                toLowerCase();
    }

    public String getIdentifier() {
        return identifier;
    }

    public String toString() {
        return identifier;
    }

    public String toPrologString() {
        if (identifier.equals("<="))
            return ":-";
        return identifier;
    }

    public String toKIFString() {
        return identifier.toUpperCase();
    }

    public String toPrologString(HashSet<String> ruleAtoms) {
        return toPrologString();
    }
}

D.2.3 GDLDescription.java

package kifParser;

import java.util.ArrayList;
import java.util.HashSet;

public class GDLDescription extends ArrayList<GDLExpression> implements
        GDLExpression {
    private static final long serialVersionUID = 81510310669806742L;

    public String toPrologString() {
        HashSet<String> ruleAtoms = new HashSet<String>();
        ruleAtoms.add("true");
        ruleAtoms.add("legal");
        ruleAtoms.add("terminal");
        ruleAtoms.add("goal");
        ruleAtoms.add("next");
        for (GDLExpression expression : this) {
            if (expression instanceof GDLList) {
                if (((GDLList) expression).get(0).toPrologString().equals(":-")) {
                    ruleAtoms.add(((GDLList) expression).get(1).toPrologString()
                            .split("\\(")[0]);
                }
            }
        }
        return toPrologString(ruleAtoms);
    }

    public String toKIFString() {
        String s = "";
        int i = 0;
        for (GDLExpression expression : this) {
            if (i > 0)
                s += "\n";
            s += expression.toKIFString();
            i++;
        }
        return s;
    }

    public String toPrologString(HashSet<String> ruleAtoms) {
        // ArrayList<String> descArray = new ArrayList<String>();
        // for (GDLExpression expression : this) {
        //     descArray.add(expression.toPrologString(ruleAtoms));
        //     System.out.println(expression.toPrologString(ruleAtoms));
        // }
        String s = "";
        int i = 0;
        for (GDLExpression expression : this) {
            if (i > 0)
                s += "\n";
            s += expression.toPrologString(ruleAtoms);
            i++;
        }
        return s;
    }
}

D.2.4 GDLExpression.java

package kifParser;

import java.util.HashSet;

public interface GDLExpression {
    public String toPrologString();

    public String toPrologString(HashSet<String> ruleAtoms);

    public String toKIFString();
}

D.2.5 GDLList.java

package kifParser;

import java.util.ArrayList;
import java.util.HashSet;

public class GDLList extends ArrayList<GDLExpression> implements GDLExpression {
    private static final long serialVersionUID = -5171710015998671320L;

    public String toString() {
        String s = "[";
        for (GDLExpression expression : this) {
            s += expression.toString();
        }
        s += "]";
        return s;
    }

    public String toPrologString() {
        return toPrologString(new HashSet<String>());
    }

    public String toPrologString(HashSet<String> ruleAtoms) {
        String s = "";
        if (size() > 0) {
            boolean imp = this.get(0).toPrologString(ruleAtoms).equals(":-");
            s += this.get(0).toPrologString(ruleAtoms);
            if (size() > 1)
                s += "(";
            int i = 0;
            for (GDLExpression expression : this) {
                if (i > 1)
                    s += ",";
                if (i == 2 && imp)
                    s += "(";
                if (i != 0) {
                    s += expression.toPrologString(ruleAtoms);
                    if (expression instanceof GDLAtom
                            && ruleAtoms.contains(expression.toPrologString())) {
                        s += "(State)";
                    }
                }
                i++;
            }
            if (imp && size() > 2)
                s += ")";
            else if (imp && size() <= 2)
                s += ",true";
            else if (ruleAtoms.contains(this.get(0).
                    toPrologString()))
                s += ",State";
            if (size() > 1)
                s += ")";
        }
        return s;
    }

    public String toKIFString() {
        String s = "";
        if (size() > 0)
            s = "(" + this.get(0).toKIFString();
        int i = 0;
        for (GDLExpression expression : this) {
            if (i > 0)
                s += " ";
            if (i != 0)
                s += expression.toKIFString();
            i++;
        }
        if (size() > 0)
            s += ")";
        return s;
    }
}

D.2.6 GDLNumber.java

package kifParser;

import java.util.HashSet;

public class GDLNumber implements GDLExpression {
    private int number;

    public GDLNumber(int number) {
        this.number = number;
    }

    public String toKIFString() {
        return number + "";
    }

    public String toPrologString() {
        return number + "";
    }

    public String toPrologString(HashSet<String> ruleAtoms) {
        return number + "";
    }
}

D.2.7 GDLVariable.java

package kifParser;

import java.util.HashSet;

public class GDLVariable implements GDLExpression {
    private String identifier;

    public GDLVariable(String identifier) {
        this.identifier = identifier;
    }

    public String toPrologString() {
        return identifier.toUpperCase();
    }

    public String toKIFString() {
        return "?" + identifier.toUpperCase();
    }

    public String toPrologString(HashSet<String> ruleAtoms) {
        return toPrologString();
    }
}

D.2.8 KIFLexer.java

package kifParser;

import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Hashtable;

public class KIFLexer extends Lexer {
    private final String[] tokens = new String[] {
        "START",
        "PLAY",
        "STOP",
        "(",
        ")",
        "?"
    };

    public final int START = 0;
    public final int PLAY = 1;
    public final int STOP = 2;
    public final int LPAR = 3;
    public final int RPAR = 4;
    public final int QUESTION = 5;

    protected String[] getTokens() {
        return tokens;
    }

    public KIFLexer(Hashtable<String, String> symbolTable) {
        super(symbolTable);
        for (String s : getTokens()) {
            symbolTable.put(s, s);
        }
    }

    public void setText(String text) throws ParseException {
        st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.parseNumbers();
        st.wordChars('!', '~');
        st.ordinaryChar('(');
        st.ordinaryChar(')');
        st.ordinaryChar('?');
        st.whitespaceChars(' ', ' ');
        st.whitespaceChars('\n', '\n');
        st.whitespaceChars('\t', '\t');
        st.whitespaceChars('\r', '\r');
        getNextToken();
    }
}

D.2.9 Lexer.java

package kifParser;

import java.io.IOException;
import java.io.StreamTokenizer;
import java.util.Hashtable;
import java.util.Random;

public abstract class Lexer {
    public final int IDENT = 100;
    public final int EOF = 101;
    public final int NUMBER = 102;

    protected StreamTokenizer st;
    protected int currentToken;
    protected String currentTokenValue;
    protected abstract String[] getTokens();
    protected Hashtable<String, String> symbolTable;

    public Lexer(Hashtable<String, String> symbolTable) {
        this.
                symbolTable = symbolTable;
    }

    public int getCurrentToken() {
        return currentToken;
    }

    public String getCurrentTokenValue() {
        return currentTokenValue;
    }

    private String getRandomString() {
        Random r = new Random();
        StringBuilder sb = new StringBuilder();
        // Put a prefix on random strings to avoid hitting a reserved keyword
        sb.append("r_");
        for (int i = 0; i < 6; i++) {
            // Generates a random char between 'A' and 'Z'
            sb.append((char) ((int) (r.nextInt((int) 'Z' - 'A')) + 'A'));
        }
        String token = sb.toString();
        if (symbolTable.containsValue(token))
            // If the random string is already in use, then generate another.
            return getRandomString();
        else
            return token;
    }

    public void getNextToken() throws ParseException {
        try {
            if (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    currentToken = NUMBER;
                    currentTokenValue = st.nval + "";
                } else {
                    if (st.ttype == StreamTokenizer.TT_WORD) {
                        if (!symbolTable.containsKey(st.sval)) {
                            symbolTable.put(st.sval, getRandomString());
                        }
                        currentTokenValue = symbolTable.get(st.sval);
                        // For debugging.
                        // Uncomment for undoing the scrambled strings.
                        // if (!st.sval.equalsIgnoreCase("succ")
                        //         && !st.sval.equalsIgnoreCase("index")
                        //         && !st.sval.equalsIgnoreCase("plus"))
                        //     currentTokenValue = st.sval;
                    } else {
                        currentTokenValue = ((char) st.ttype) + "";
                    }
                    boolean found = false;
                    for (int i = 0; i < getTokens().length; i++) {
                        if (getTokens()[i].equals(currentTokenValue)) {
                            currentToken = i;
                            found = true;
                            break;
                        }
                    }
                    if (!found) {
                        currentToken = IDENT;
                    }
                }
            } else {
                currentToken = EOF;
                currentTokenValue = "EOF";
            }
        } catch (IOException e) {
            throw new ParseException(ParseException.INTERNAL_ERROR,
                    "An unknown error occurred while parsing");
        }
    }

    public void accept(int token) throws ParseException {
        if (currentToken == token) {
            getNextToken();
        } else {
            String tokenValue = "";
            if (token < getTokens().length)
                tokenValue = getTokens()[token];
            else if (token == EOF)
                tokenValue = "EOF";
            else if (token == IDENT)
                tokenValue = "identifier";
            throw new ParseException(ParseException.SYNTAX_ERROR,
                    "Syntax error: Expected '" + tokenValue + "' but got '"
                    + currentTokenValue + "'");
        }
    }

    public void acceptIt() throws ParseException {
        getNextToken();
    }
}

D.2.10 ParseException.java

package kifParser;

public class ParseException extends Exception {
    private static final long serialVersionUID = -1159681338343544219L;
    public static final int SYNTAX_ERROR = 1;
    public static final int INTERNAL_ERROR = 2;

    private int type;

    public ParseException(int type, String msg) {
        super(msg);
        // Store the error type; without this getType() always returns 0.
        this.type = type;
    }

    public int getType() {
        return type;
    }
}

D.2.11 PlayCommand.java

package kifParser;

import java.util.ArrayList;

public class PlayCommand extends Command {
    protected ArrayList<String> actions = new ArrayList<String>();

    public void addAction(String action) {
        actions.add(action);
    }

    public String[] getActions() {
        return actions.toArray(new String[] {});
    }

    public String toString() {
        String s = "PLAY command:\n";
        s += "MatchID: " + getMatchID() + "\n";
        for (String action : actions) {
            s += "Action: " + action + "\n";
        }
        return s;
    }
}

D.2.12 PrologLexer.java

package kifParser;

import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Hashtable;

public class PrologLexer extends Lexer {
    public final String[] tokens = new String[] {
        "(",
        ")",
        ","
    };

    public final int LPAR = 0;
    public final int RPAR = 1;
    public final int COMMA = 2;

    protected String[] getTokens() {
        return tokens;
    }

    public PrologLexer(Hashtable<String, String> symbolTable, String text)
            throws ParseException {
        super(symbolTable);
        st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.parseNumbers();
        st.wordChars('!', '~');
        st.ordinaryChar('(');
        st.ordinaryChar(')');
        st.ordinaryChar(',');
        st.whitespaceChars(' ', ' ');
        st.whitespaceChars('\n', '\n');
        st.whitespaceChars('\t', '\t');
        st.whitespaceChars('\r', '\r');
        getNextToken();
    }
}

D.2.13 PrologParser.java

package kifParser;

public class PrologParser {
    public static GDLExpression parseFact(PrologLexer lex)
            throws ParseException {
        GDLExpression expression;
        expression = parseExpression(lex);
        lex.accept(lex.EOF);
        return expression;
    }

    private static GDLExpression parseExpression(PrologLexer lex)
            throws ParseException {
        String identifier = parseIdentifier(lex);
        if (lex.getCurrentToken() == lex.
                LPAR) {
            return parseList(lex, identifier);
        } else {
            return new GDLAtom(identifier);
        }
    }

    private static String parseIdentifier(PrologLexer lex)
            throws ParseException {
        if (lex.getCurrentToken() == lex.IDENT) {
            String identifier = lex.getCurrentTokenValue();
            lex.acceptIt();
            return identifier;
        } else if (lex.getCurrentToken() == lex.NUMBER) {
            String identifier = lex.getCurrentTokenValue();
            lex.acceptIt();
            return (int) Double.parseDouble(identifier) + "";
        } else {
            throw new ParseException(ParseException.SYNTAX_ERROR,
                    "Syntax error. Identifier expected but got '"
                    + lex.getCurrentTokenValue() + "'");
        }
    }

    private static GDLList parseList(PrologLexer lex, String firstIdentifier)
            throws ParseException {
        GDLList list = new GDLList();
        lex.accept(lex.LPAR);
        list.add(new GDLAtom(firstIdentifier));
        list.add(parseExpression(lex));
        while (lex.currentToken == lex.COMMA) {
            lex.acceptIt();
            list.add(parseExpression(lex));
        }
        lex.accept(lex.RPAR);
        return list;
    }
}

D.2.14 StartCommand.java

package kifParser;

public class StartCommand extends Command {
    private String role;
    /** Description in Prolog. Each rule is separated by newline. */
    private String description;
    private int startClock;
    private int playClock;

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public int getPlayClock() {
        return playClock;
    }

    public void setPlayClock(int playClock) {
        this.playClock = playClock;
    }

    public String getRole() {
        return role.
                toLowerCase();
    }

    public void setRole(String role) {
        this.role = role;
    }

    public int getStartClock() {
        return startClock;
    }

    public void setStartClock(int startClock) {
        this.startClock = startClock;
    }

    public String toString() {
        String s = "START command:\n";
        s += "MatchID: " + getMatchID() + "\n";
        s += "Role: " + role + "\n";
        s += "Description: " + description + "\n";
        s += "startClock: " + startClock + "\n";
        s += "playClock: " + playClock + "\n";
        return s;
    }
}

D.2.15 StopCommand.java

package kifParser;

public class StopCommand extends PlayCommand {
    public String toString() {
        String s = "STOP command:\n";
        s += "MatchID: " + getMatchID() + "\n";
        for (String action : actions) {
            s += "Action: " + action + "\n";
        }
        return s;
    }
}

D.3 network

D.3.1 HTTPConnectionException.java

package network;

// Used when a HTTP connection is lost
public class HTTPConnectionException extends Exception {
    private static final long serialVersionUID = 8156157855370479718L;
}

D.3.2 HTTPDummyRequest.java

package network;

import java.io.BufferedOutputStream;

public class HTTPDummyRequest extends HTTPRequest {
    public HTTPDummyRequest(BufferedOutputStream output, HTTPServerConfig config) {
        super(output, config);
    }

    public void execute() {}
}

D.3.3 HTTPException.java

package network;

public class HTTPException extends Exception {
    private static final long serialVersionUID = -7340386472281554265L;
    private String responseCode;
    private String body;

    public HTTPException() {
        super();
        responseCode = HTTPRequest.RESPONSE_CODE_500;
    }

    public HTTPException(String responseCode) {
        super(responseCode);
        this.responseCode = responseCode;
    }

    public HTTPException(String responseCode, String body) {
        super(responseCode + " " + body);
        this.responseCode = responseCode;
        this.body = body;
    }

    public String getResponseCode() {
        return responseCode;
    }

    public String getBody() {
        return body;
    }
}

D.3.4 HTTPParser.java

package network;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.BufferedReader;
import java.util.logging.Logger;

public class HTTPParser {
    private static final Logger logger = Logger.getLogger("network");
    private HTTPServerConfig config;

    private StreamTokenizer st;
    private BufferedReader reader;
    private String currentToken;
    private BufferedOutputStream out; // Output to client
    private HTTPRequest request; // HTTP Request

    // =======================================================
    // Constructor
    // =======================================================
    public HTTPParser(HTTPServerConfig config, BufferedReader reader,
            BufferedOutputStream out) {
        this.config = config;
        this.reader = reader;
        this.out = out;
        st = new StreamTokenizer(reader);
        st.resetSyntax();
        st.wordChars('\u0000', '\u00FF');
        st.ordinaryChar(' ');
        st.ordinaryChar('\n');
        st.
                ordinaryChar('\r');
    }

    // =======================================================
    // Parse HTTP Request
    // =======================================================
    public HTTPRequest parseRequest() throws IOException,
            HTTPConnectionException, HTTPException {
        getNextToken();
        parseRequest_line();
        // Treat space as a wordchar when parsing the header fields to parse
        // one line at a time
        st.wordChars(' ', ' ');
        accept(HTTPServerConfig.NL);
        parseRequestHead();
        if (request.getContentLength() > 0)
            parseRequestBody();
        return request;
    }

    // =======================================================
    // Parse next token
    // =======================================================
    private void getNextToken() throws IOException, HTTPConnectionException {
        if (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                currentToken = st.sval;
            } else {
                currentToken = ((char) st.ttype) + "";
            }
            if (currentToken.equals((char) 13 + "")) {
                // Take special care of the character '\r'
                getNextToken();
                return;
            }
            logger.
                    fine(currentToken);
        } else {
            // Connection was lost
            throw new HTTPConnectionException();
        }
    }

    // =======================================================
    // Accept current token if it matches expected token
    // =======================================================
    private void accept(String token) throws IOException,
            HTTPConnectionException, HTTPException {
        if (currentToken.equals(token)) {
            getNextToken();
        } else {
            // Syntax error
            throw new HTTPException(HTTPRequest.RESPONSE_CODE_400);
        }
    }

    // =======================================================
    // Accept current token
    // =======================================================
    private void acceptIt() throws IOException, HTTPConnectionException {
        getNextToken();
    }

    // =======================================================
    // Parse request line
    // =======================================================
    private void parseRequest_line() throws HTTPException,
            HTTPConnectionException, IOException {
        parseMethod();
        accept(HTTPServerConfig.SP);
        parseRequest_URI();
        accept(HTTPServerConfig.SP);
        if (currentToken.equals("HTTP/1.0")) {
            acceptIt();
            request.setVersion("HTTP/1.0");
        } else if (currentToken.equals("HTTP/1.1")) {
            acceptIt();
            request.
                    setVersion("HTTP/1.1");
        } else {
            throw new HTTPException(HTTPRequest.RESPONSE_CODE_505);
        }
    }

    // =======================================================
    // Parse method
    // =======================================================
    private void parseMethod() throws IOException, HTTPConnectionException,
            HTTPException {
        if (currentToken.equals("POST")) {
            request = new HTTPPostRequest(out, config);
            acceptIt();
        } else {
            throw new HTTPException(HTTPRequest.RESPONSE_CODE_501);
        }
    }

    private void parseRequest_URI() throws IOException,
            HTTPConnectionException, HTTPException {
        accept("/");
    }

    // =======================================================
    // Parse request header
    // =======================================================
    private void parseRequestHead() throws HTTPException, IOException,
            HTTPConnectionException {
        String[] header;
        while (!currentToken.equals(HTTPServerConfig.NL)) {
            header = currentToken.split(":", 2);
            if (header.length != 2) {
                throw new HTTPException(HTTPRequest.RESPONSE_CODE_400,
                        "Header field is missing ':' separator");
            }
            // Remove leading and trailing whitespaces
            header[0] = header[0].trim();
            header[1] = header[1].trim();
            // Request header fields
            if (header[0].equalsIgnoreCase("Accept")) {}
            else if (header[0].
                    equalsIgnoreCase("Accept-Charset")) {}
            else if (header[0].equalsIgnoreCase("Accept-Encoding")) {}
            else if (header[0].equalsIgnoreCase("Accept-Language")) {}
            else if (header[0].equalsIgnoreCase("Authorization")) {}
            else if (header[0].equalsIgnoreCase("Expect")) {}
            else if (header[0].equalsIgnoreCase("From")) {}
            else if (header[0].equalsIgnoreCase("Host")) {}
            else if (header[0].equalsIgnoreCase("If-Match")) {}
            else if (header[0].equalsIgnoreCase("If-Modified-Since")) {}
            else if (header[0].equalsIgnoreCase("If-None-Match")) {}
            else if (header[0].equalsIgnoreCase("If-Range")) {}
            else if (header[0].equalsIgnoreCase("If-Unmodified-Since")) {}
            else if (header[0].equalsIgnoreCase("Max-Forwards")) {}
            else if (header[0].equalsIgnoreCase("Proxy-Authorization")) {}
            else if (header[0].equalsIgnoreCase("Range")) {}
            else if (header[0].equalsIgnoreCase("Referer")) {}
            else if (header[0].equalsIgnoreCase("TE")) {}
            else if (header[0].equalsIgnoreCase("User-Agent")) {}
            // Entity header fields
            else if (header[0].equalsIgnoreCase("Allow")) {}
            else if (header[0].equalsIgnoreCase("Content-Encoding")) {}
            else if (header[0].equalsIgnoreCase("Content-Language")) {}
            else if (header[0].equalsIgnoreCase("Content-Length")) {
                try {
                    int contentLength = Integer.parseInt(header[1]);
                    request.setContentLength(contentLength);
                } catch (NumberFormatException e) {
                    throw new HTTPException(HTTPRequest.
                            RESPONSE_CODE_400,
                            "Content Length must be a positive integer");
                }
            }
            else if (header[0].equalsIgnoreCase("Content-Location")) {}
            else if (header[0].equalsIgnoreCase("Content-MD5")) {}
            else if (header[0].equalsIgnoreCase("Content-Range")) {}
            else if (header[0].equalsIgnoreCase("Content-Type")) {}
            else if (header[0].equalsIgnoreCase("Expires")) {}
            else if (header[0].equalsIgnoreCase("Last-Modified")) {}
            else {
                // Ignore other header fields
            }
            acceptIt();
            accept(HTTPServerConfig.NL);
        }
    }

    private void parseRequestBody() throws IOException {
        char[] charArr = new char[request.getContentLength()];
        reader.read(charArr, 0, request.getContentLength());
        request.setRequestBody(new String(charArr));
        logger.fine(charArr.toString());
    }
}

D.3.5 HTTPPostRequest.java

package network;

import java.io.*;
import kifParser.ParseException;

public class HTTPPostRequest extends HTTPRequest {
    public HTTPPostRequest(BufferedOutputStream output, HTTPServerConfig config) {
        super(output, config);
    }

    public void execute() {
        try {
            setResponseBody(config.getGameManager().handleGameServerRequest(
                    getRequestBody()));
            setResponseCode(HTTPRequest.RESPONSE_CODE_200);
        } catch (ParseException e) {
            setResponseBody(e.getMessage());
            switch (e.getType()) {
            case ParseException.SYNTAX_ERROR:
                setResponseCode(HTTPRequest.RESPONSE_CODE_400);
                break;
            case ParseException.INTERNAL_ERROR:
                setResponseCode(HTTPRequest.RESPONSE_CODE_500);
                break;
            default:
                setResponseCode(HTTPRequest.RESPONSE_CODE_400);
            }
        } finally {
            send();
            // This is a good time to garbage collect, because there will be
            // a short idle period before the next request is received.
            System.gc();
        }
    }
}

D.3.6 HTTPRequest.java

package network;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.util.logging.Logger;

public abstract class HTTPRequest implements HTTPRequestInterface {
    private BufferedOutputStream output;
    protected HTTPServerConfig config;
    private static final Logger logger = Logger.getLogger("network");

    // ===== Response codes =====
    public static final String RESPONSE_CODE_100 = "100 Continue";
    public static final String RESPONSE_CODE_103 = "101 Switching Protocols";
    public static final String RESPONSE_CODE_200 = "200 OK";
    public static final String RESPONSE_CODE_201 = "201 Created";
    public static final String RESPONSE_CODE_202 = "202 Accepted";
    public static final String RESPONSE_CODE_203 = "203 Non-Authoritative Information";
    public static final String RESPONSE_CODE_204 = "204 No Content";
    public static final String RESPONSE_CODE_205 = "205 Reset Content";
    public static final String RESPONSE_CODE_206 = "206 Partial Content";
    public static final String RESPONSE_CODE_300 = "300 Multiple Choices";
    public static final String RESPONSE_CODE_301 = "301 Moved Permanently";
    public static final String RESPONSE_CODE_302 = "302 Found";
    public static final String RESPONSE_CODE_303 = "303 See Other";
    public static final String RESPONSE_CODE_304 = "304 Not Modified";
    public static final String RESPONSE_CODE_305 = "305 Use Proxy";
    public static final String RESPONSE_CODE_307 = "307 Temporary Redirect";
    public static final String RESPONSE_CODE_400 = "400 Bad Request";
    public static final String RESPONSE_CODE_401 = "401 Unauthorized";
    public static final String RESPONSE_CODE_402 = "402 Payment Required";
    public static final String RESPONSE_CODE_403 = "403 Forbidden";
    public static final String RESPONSE_CODE_404 = "404 Not Found";
    public static final String RESPONSE_CODE_405 = "405 Method Not Allowed";
    public static final String RESPONSE_CODE_406 = "406 Not Acceptable";
    public static final String RESPONSE_CODE_407 = "407 Proxy Authentication Required";
    public static final String RESPONSE_CODE_408 = "408 Request Time-out";
    public static final String RESPONSE_CODE_409 = "409 Conflict";
    public static final String RESPONSE_CODE_410 = "410 Gone";
    public static final String RESPONSE_CODE_411 = "411 Length Required";
    public static final String RESPONSE_CODE_412 = "412 Precondition Failed";
    public static final String RESPONSE_CODE_413 = "413 Request Entity Too Large";
    public static final String RESPONSE_CODE_414 = "414 Request-URI Too Large";
    public static final String RESPONSE_CODE_415 = "415 Unsupported Media Type";
    public static final String RESPONSE_CODE_416 = "416 Requested range not satisfiable";
    public static final String RESPONSE_CODE_417 = "417 Expectation Failed";
    public static final String RESPONSE_CODE_500 = "500 Internal Server Error";
    public static final String RESPONSE_CODE_501 = "501 Not Implemented";
    public static final String RESPONSE_CODE_502 = "502 Bad Gateway";
    public static final String RESPONSE_CODE_503 = "503 Service Unavailable";
    public static final String RESPONSE_CODE_504 = "504 Gateway Time-out";
    public static final String RESPONSE_CODE_505 = "505 HTTP Version not supported";

    // ===== Request line =====
    private String version;

    // ===== Request head =====
    private int contentLength;

    // ===== Request body =====
    private String requestBody;

    // ===== Response =====
    // Response code. Default is 200.
    private String responseCode = RESPONSE_CODE_200;
    protected boolean sendAllow = false;
    protected boolean sendContentLength = true;
    protected boolean sendContentType = true;
    private String responseContentType = "text/acl";
    private String responseBody = "";

    public HTTPRequest(BufferedOutputStream output, HTTPServerConfig config) {
        this.output = output;
        this.config = config;
    }

    public void setVersion(String version) {
        this.version = version;
    }

    public void setRequestBody(String body) {
        this.
requestBody = body ; } public String ge tReques tBody () { return requestBody ; } // Sets content - length of the request public void s e t C o n t en t L e n g t h ( int contentLength ) throws NumberFormatException { if ( contentLength < 0) throw new N u m b e r F o r m a t E x c e p t i o n ( " Content - Lenght must be a positive integer " ) ; this . contentLength = contentLength ; } // Returns content - length of the request public int g e t C o n t e n t L e n g t h () { return contentLength ; } public void s et R es po n se Co de ( String responseCode ) { this . responseCode = responseCode ; } public void s et R es po n se Bo dy ( String responseBody ) { if ( responseBody != null ) this . responseBody = responseBody ; } public abstract void execute () ; // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = // Send response to client // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = public void send () { sendResp o n s e C o d e ( responseCode ) ; sendResponseNL () ; sendResp o n s e H e a d () ; sendResponseNL () ; sendResponse ( responseBody ) ; } private void sendResponse ( String str ) { logger . fine ( str ) ; try { byte response [] = str . getBytes () ; output . write ( response , 0 , response . length ) ; output . flush () ; } catch ( IOException e ) { logger . severe ( " Could not write answer to game master . The connection 98 Source Code might be lost " ) ; 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 } } private void se n d R e s p o n s e C od e ( String code ) { sendResponse ( version + H T T P S e r v e r C o n f i g . SP + code ) ; } private void send Respons eNL () { sendResponse ( H T T P S e r v e r C o n f i g . 
NL ) ; } private void se n d R e s p o n s e H ea d () { if ( sendAllow == true ) { sendResponse ( " Allow : POST " ) ; sendResponseNL () ; } if ( se ndC ont e n t L e n g t h == true ) { sendResponse ( " Content - Length : " + responseBody . length () ) ; sendResponseNL () ; } if ( sendConten t Ty pe == true ) { if ( r es p o n s e C o n t e n t T y p e != null ) { sendResponse ( " Content - Type : " + r e s p o n s e C o n t e n t T y p e ) ; sendResponseNL () ; } } } } D.3.7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 package network ; public interface H T T P R e q u e s t I n t e r f a c e { // Request line public abstract void setVersion ( String version ) ; // Request head public abstract void s e t C o n t e n t L e n g t h ( int contentLength ) ; // Execute request public abstract void execute () throws HTTPConnectionException , HTTPException ; } D.3.8 1 2 3 4 5 6 7 8 9 10 HTTPRequestInterface.java HTTPServer.java package network ; import gameplayer . GameManager ; public class HTTPServer { private HTTPSe rv e r C o n f i g config ; private HTTPSe rv e r T h r e a d thread ; public HTTPServer ( GameManager gameManager , int port ) { D.3 network 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 config = new H T T P S e r v e r C o n f i g ( gameManager ) ; config . setPort ( port ) ; thread = new H T T P S e r v e r T h r e a d ( config ) ; } public HTTPServer ( GameManager gameManager ) { config = new H T T P S e r v e r C o n f i g ( gameManager ) ; thread = new H T T P S e r v e r T h r e a d ( config ) ; } public void startServer () { thread . start () ; } public void stopServer () { thread . interrupt () ; } public void setPort ( int port ) { config . setPort ( port ) ; } } D.3.9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 HTTPServerConfig.java package network ; import gameplayer . 
GameManager ; public class H T T P S e r v e r C o n f i g { public static final String NL = " \ n " ; public static final String SP = " " ; public static final String HTTP_VERSION = " HTTP /1.1 " ; private GameManager gameManager ; // Default values : private int port ; private int timeout = 60000; public HTT P S e r v e r C o n f i g ( GameManager gameManager ) { this . gameManager = gameManager ; port = gameManager . getGameplayer () . getPort () ; } public int getPort () { return port ; } public void setPort ( int port ) { if ( port > 0 && port <= 65535) this . port = port ; } public int getTimeout () { return timeout ; } public GameManager g etGameMa nager () { return gameManager ; } } 99 100 D.3.10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Source Code HTTPServerThread.java package network ; import java . net .*; import java . util . logging . Logger ; import java . io .*; public class HTT P S e r v e r T h r e a d extends Thread { private static final Logger logger = Logger . getLogger ( " network . HTTPServer T h r e ad " ) ; private HTTPSe r v e r C o n f i g config ; private ServerSocket socket ; private BufferedReader inReader ; private B u f f e r e d O u t p u t S t r e a m outStream ; private Socket client ; private HTTPParser parser ; private HTTPRequest request ; private boolean connected = false ; // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = // Constructor // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = public HTTPSer v e r T h r e a d ( H T T P S e r v e r C o n f i g config ) { this . config = config ; try { socket = new ServerSocket ( config . getPort () ) ; logger . info ( " Game player server started " ) ; } catch ( IOException e ) { logger . severe ( e . getMessage () ) ; logger . 
severe ( " Close all other instances of this program or try using another port . " ) ; System . exit (0) ; } } // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = // Initialize client connection // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = private void init Connect ion () throws IOException { try { client = socket . accept () ; client . setSoTimeout ( config . getTimeout () ) ; inReader = new Buffe redRead er ( new I n p u t S t r e a m R e a d e r ( new B u f f e r e d I n p u t S t r e a m ( client . get InputStr eam () ) ) ) ; outStream = new B u f f e r e d O u t p u t S t r e a m ( client . ge tO u tp ut S tr ea m () ) ; connected = true ; } catch ( S o c k e t T i m e o u t E x c e p t i o n e ) { client . close () ; connected = false ; } catch ( SocketE x ce pt io n e ) { logger . severe ( e . getMessage () ) ; System . exit (0) ; } } // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = // Run thread // = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = public void run () { while (! isInterrupted () ) { D.3 network 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 101 try { initCo nnectio n () ; if ( connected ) { try { parser = new HTTPParser ( config , inReader , outStream ) ; request = parser . parseRequest () ; request . execute () ; } catch ( HTTPException e ) { if ( request == null ) { request = new H T T P D um m y R e q u e s t ( outStream , config ) ; request . setVersion ( H T T P S e r v e r C o nf i g . HTTP_VERSION ) ; } request . s et Re s po ns eC o de ( e . g et Re sp o ns eC od e () ) ; if ( e . ge t Re sp o ns eC od e () . equals ( HTTPRequest . R E S P O N S E _ C O D E _ 4 0 5 ) ) { request . sendAllow = true ; } request . 
s et Re s po ns eB o dy ( e . getBody () ) ; request . send () ; } catch ( H T T P C o n n e c t i o n E x c e p t i o n e ) { logger . warning ( e . getMessage () ) ; // Connection was lost } finally { request = null ; client . close () ; } } } catch ( IOException e ) { logger . severe ( e . getMessage () ) ; } } } } 102 Source Code Bibliography [1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002. 17 [2] Technische Universität Dresden. GameController. http://www.generalgame-playing.de/downloads.html. 35 [3] Hilmar Finnson. CADIA-Player: A General Game Playing Agent. Master’s thesis, Reykjavı́k University, December 2007. 9, 18 [4] Michael Genesereth and Richard Fikes. Knowledge Interchange Format. Technical report, Stanford University, 1992. 5 [5] Michael Genesereth, Nathaniel Love, Timothy Hinrich, David Haley, and Eric Schkufza. General Game Playing: Game Description Language Specification. Technical report, Stanford University, 2008. 3, 5 [6] James Edmond Clune III. Heuristic Evaluation Functions for General Game Playing. PhD thesis, University of California, 2008. 9 [7] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo Planning. In ECML-06, 2006. 18 [8] Aron Lindberg. A.I. in board games. Bachelor thesis, 2007. 36 [9] Barney Pell. Metagame: A new challenge for games and learning. Heuristic Programming in Artificial Intelligence 3 - The Third Computer Olympiad, 1992. 3 [10] Jonathan Schaeffer, Neil Burch, Yngvi Björnsson, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, and Steve Sutphen. Checkers is solved. Science, September 2007. 37 104 BIBLIOGRAPHY [11] Stehpan Schiffel and Michael Thielscher. Fluxplayer: A Successful General Game Player. Technical report, Dresden University of Technology, 2007. 8 The numbers at the end of each bibliographical item above refer to the pages where the item is cited.