AI Agents to play Settlers of Catan
Sierra Kaplan-Nelson (sierrakn)
Sherman Leung (skleung)
Nick Troccoli (troccoli)
1. Introduction
Games are considered important benchmark opportunities for artificial intelligence research. Settlers of Catan is a relatively new game with elements of both luck and strategy inherent to its design. As a non-deterministic, non-zero-sum game, Settlers of Catan provided an interesting setting in which to apply our understanding of AI strategies. In this project, we implemented several player agents to play our simplified version of Settlers of Catan.
2. Background
At a high level, Settlers of Catan is, at its core, a resource-gathering and building board game. Players have a variety of settlements on the board that harvest resources every turn, and those resources can be used to purchase more settlements or other game objects, which ultimately give the player the victory points needed to win the game. A picture of a basic game board is shown at right: each hexagonal tile represents a resource, and every tile is assigned a number between 2 and 12. On a player’s turn, the game dice are rolled, and any player with a settlement bordering a hexagonal tile marked with the rolled number receives one resource of that type (cities provide the player with 2 resources of that type). One exception is that if the robber pawn is located on top of a hexagon, that hexagon does not produce resources. The robber must be moved to a new tile by a player whenever that player rolls a 7.
Each player keeps all their resources in their hand. After resources are distributed according to the dice roll, the current player can take any number of actions: they may buy a new settlement, city, or road, trade with another player, or buy an action card. For building, a new settlement must be built on a vertex of the board, must be at least 2 edges away from any other settlement, and must be connected to one of the player’s other settlements by roads. A new road must be built on an edge of the board and must be connected to another one of the player’s settlements or roads. A city (an upgraded settlement, which gathers twice the resources per turn that a settlement does) may be built in place of an existing settlement.
The player may also choose to trade resources with any of the other players. Note that only the player whose turn it is may initiate trades. Finally, the player may choose to buy action cards, which can be played on future turns. Action cards are drawn randomly from the top of the deck. Actions enabled by action cards include stealing other people’s resources, gaining extra victory points, and more.
Each new settlement gives the player 1 victory point, and each city gives an additional victory point on top of the 1 for the now-replaced settlement. The first player to reach a certain number of points (10 for 3-player games) wins.
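For concreteness, the sketch below illustrates the resource-distribution step described above, ignoring the robber; the class and method names (tiles, vertices, building, hand, and so on) are hypothetical and do not reflect our actual code.

    # Illustrative sketch only: distribute resources for one dice roll,
    # ignoring the robber. All names here are hypothetical.
    def distribute_resources(board, players, dice_roll):
        for tile in board.tiles:
            if tile.number != dice_roll:
                continue                     # tile does not produce this turn
            for vertex in tile.vertices:     # the six corners of the hexagon
                building = vertex.building   # None, a settlement, or a city
                if building is None:
                    continue
                amount = 2 if building.is_city else 1   # cities yield double
                players[building.owner].hand[tile.resource] += amount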
3. Task Definition
For this project, we built several modules to define, visualize, and assess our agents as they play Settlers of Catan. As further described in our approach (Section 5), these modules worked together to assess the success of different players against a baseline random agent and against each other. Overall, we were interested in how different AI agents, including Expectimax, Expectiminimax, and Monte Carlo, performed in Settlers, as well as how different heuristics (game strategies) performed.
4. Related Work
Given that this game is relatively new, few studies have been performed to characterize it, and even fewer have attempted to apply any sort of AI approach to it. We were inspired by our class’s implementation of Pacman and structured our implementation of Catan in a similar manner to reflect the organization and segmentation of the different modules.
Around 10 AIs for Settlers already exist. According to one study, two of the best are Castle Hill Studios’ version and Robert S. Thomas’ JSettlers. Castle Hill Studios’ version of the game, which is currently part of Microsoft’s MSN Games, features AI players that use trading extensively. JSettlers is the basis for many game versions on the internet; one uses hand-coded high-level heuristics with low-level model trees constructed by reinforcement learning [1].
In our research into the literature on AI engines for games and for Settlers of Catan, we found one study from 2010 in Advances in Computer Games that argues that MCTS can be applied effectively to Settlers of Catan [1]. In our study, we extended their findings by investigating the effectiveness of Expectimax agents and comparing an agent using MCTS, as well as Expectimax and Expectiminimax agents, against the same baseline.
5. Approach
As Settlers is a fairly complex game, we removed or simplified some rules to help with the modeling described below. In addition to limiting the game to 2 players, there is no robber, each player can take only one action per turn, players cannot purchase development cards from the deck, and players cannot trade with each other (trading is limited to giving the “bank” 4 of one resource type for 1 of another; we realized late that a bug only deducted one of the four resources as payment, but because this bug was consistent across all simulations with trading, we do not believe it impacted our results significantly).
5.1 Modeling the Game
Our Board class defined the hexagonal board grid and the game constraints related to gathering resources and placing settlements. The Game and GameState classes defined the constraints within which each player agent operated to interact with the board and other players. Lastly, each agent was implemented as a subclass of a generic Agent class, which we assessed during gameplay.
5.2 Visualizing the Game
A substantial part of the project also involved visualizing the game itself to reflect the data structure we used to implement our board. Some investigation online allowed us to build off of a two-player implementation of Settlers of Catan that only allowed for human play. We implemented the board’s data structure to allow for O(1) lookup of hexagons, edges, and vertices, as this is essential for getting legal actions and evaluating actions in Settlers of Catan. We used three matrices, one for each element type, and wrote methods that define the relationships between the locations in each matrix. This implementation was based on previous studies of optimal representations of hexagonal grids [2].
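As a rough illustration, the sketch below shows the kind of three-matrix layout we mean, with constant-time lookup of hexagons, edges, and vertices; the matrix dimensions and neighbor offsets are illustrative assumptions rather than our exact formulas.

    class Board:
        def __init__(self, num_rows, num_cols):
            # One matrix per element type. A hexagon at (r, c) is surrounded by
            # entries of the vertex and edge matrices at fixed relative offsets,
            # so every lookup is a constant-time index into one of the matrices.
            self.hexagons = [[None] * num_cols for _ in range(num_rows)]
            self.vertices = [[None] * (2 * num_cols + 2) for _ in range(num_rows + 1)]
            self.edges = [[None] * (2 * num_cols + 2) for _ in range(2 * num_rows + 1)]

        def get_vertices_of_hex(self, r, c):
            # Illustrative offsets for the six corners of hexagon (r, c).
            offsets = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
            return [(r + dr, 2 * c + dc) for dr, dc in offsets]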
We then defined a mapping between vertex numbers and relative offsets on the board in order to be able to draw settlements and roads by getting the coordinates of the vertex positions. Downloaded images were used to represent each tile, number chit, settlement and city.
Figure 2: A screenshot in the middle of gameplay from our graphics module
5.3 Expectimax and Expectiminimax
For Expectiminimax and Expectimax (our non-Monte-Carlo approaches), we modeled the player agents in a manner similar to the course’s Pacman assignment [3]. We defined a generic Agent class that we subclassed for each specific type of agent we wanted to use. The generic Agent class implemented all required methods except getAction, which each Expectimax and Expectiminimax subclass overrode. To handle different heuristics, every Agent accepts an optional heuristic as an evaluation function in its constructor, falling back to a default heuristic if none is provided.
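A skeleton of this hierarchy looks roughly like the following; the exact method and heuristic names in our code may differ slightly.

    def defaultHeuristic(state, playerIndex):
        # A simple default evaluation: the player's current victory points.
        # (The method name on the state is illustrative.)
        return state.getVictoryPoints(playerIndex)

    class Agent:
        def __init__(self, playerIndex, heuristic=defaultHeuristic):
            self.playerIndex = playerIndex
            self.evaluationFunction = heuristic   # optional heuristic, defaulted

        def getAction(self, gameState):
            raise NotImplementedError("each subclass decides how to pick an action")

    class ExpectimaxAgent(Agent):
        def getAction(self, gameState):
            ...  # depth-limited search treating the opponent and dice as chance nodes

    class ExpectiminimaxAgent(Agent):
        def getAction(self, gameState):
            ...  # like Expectimax, but with a minimizing layer for the opponent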
5.4 Monte-Carlo Agent
The agent and game-node classes for the Monte-Carlo Tree Search were abstracted out into their own classes, separate from the existing Expectimax and Expectiminimax agents. Because our game state model returned a list of all possible actions given a player index, adapting our model to a generic Monte Carlo algorithm was fairly straightforward.
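Concretely, the generic MCTS code only needs a node that can report legal moves, apply one, and detect the end of the game. A hedged sketch of that wrapper, with illustrative GameState method names, is below.

    import random

    class CatanNode:
        """Thin wrapper exposing our GameState to a generic MCTS framework."""
        def __init__(self, gameState, playerIndex):
            self.state = gameState
            self.player = playerIndex

        def legal_moves(self):
            return self.state.getLegalActions(self.player)

        def apply(self, action):
            nextState = self.state.generateSuccessor(self.player, action)
            return CatanNode(nextState, (self.player + 1) % 2)   # two-player game

        def is_terminal(self):
            return self.state.gameOver()

        def rollout_winner(self):
            # Play uniformly random moves until the game ends; used in simulations.
            node = self
            while not node.is_terminal():
                node = node.apply(random.choice(node.legal_moves()))
            return node.state.getWinner()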
6. Methods
For the methods below, we treated a Random agent as the baseline for our AI and presumed as our oracle an agent capable of winning 100% of the time (by knowing what the other player will do). We felt that a random agent was a good indicator of “average” play, serving as a baseline against which to compare our various strategies.
6.1 Expectimax and Expectiminimax
For our first attempt, we applied the Expectimax and Expectiminimax algorithms, which account for the randomness associated with the dice roll on each turn. More specifically, Expectimax assumes the opponent is a random agent, while Expectiminimax assumes the opponent is actively trying to minimize the agent’s success. Each agent examines all possible actions it or its opponent could take (until the maximum depth is reached) and uses its evaluation function to score states and decide the best action to take at a given point.
One of the issues with the Expectimax/Expectiminimax approach was the sheer number of possible actions (across trading, building roads, etc.). To combat the exponential search space, we implemented alpha-beta pruning, which ignores branches of the game tree that cannot improve upon a result already seen. Even with this improvement, a search depth of 2 was only attainable by pruning actions that would not lead to a settlement or city whenever that type of action was possible. This was implemented using a priority queue so that only a few actions are examined when propagating down the tree.
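The sketch below shows the shape of this search: our agent maximizes, the opponent is modeled as random (the Expectimax case), dice rolls form a chance node, and a priority queue keeps only a few high-priority actions per node. It omits alpha-beta pruning, and the helper and state-method names are illustrative assumptions, not our exact code.

    import heapq

    # Probability of each two-dice total, 2..12.
    DICE_PROBS = {total: count / 36.0 for total, count in
                  zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}

    MAX_ACTIONS = 5   # explore at most this many actions per node

    def action_priority(action):
        # Lower value = explored first; settlements and cities beat roads.
        order = {'BUILD_CITY': 0, 'BUILD_SETTLEMENT': 1, 'BUILD_ROAD': 2}
        return order.get(getattr(action, 'type', None), 3)

    def pruned_actions(state, player):
        actions = state.getLegalActions(player)   # assumed to always include a pass
        return heapq.nsmallest(MAX_ACTIONS, actions, key=action_priority)

    def expectimax(state, agent, player, depth, evalFn):
        if depth == 0 or state.gameOver():
            return evalFn(state, agent)
        values = []
        for action in pruned_actions(state, player):
            # Chance node: average the value of this action over all dice rolls.
            v = sum(prob * expectimax(state.generateSuccessor(player, action, roll),
                                      agent, 1 - player, depth - 1, evalFn)
                    for roll, prob in DICE_PROBS.items())
            values.append(v)
        if player == agent:
            return max(values)                 # our move: maximize
        return sum(values) / len(values)       # opponent modeled as random: expectation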
6.2 Monte-Carlo Tree Search with UCT
Taking suggestions for improvement from our project poster session, we also adapted a Monte Carlo-based algorithm to our existing implementation. In recent years, Monte-Carlo-based techniques have proven effective in computer games. Monte-Carlo Tree Search is a simulation-based technique that was first introduced in 2006 and implemented in the top-rated Go AI agent [4].
At a high level, Monte-Carlo Tree Search is an adaptation of the basic Monte-Carlo algorithm that searches for moves with high win rates. As a Monte-Carlo-based algorithm, it seeks out the most promising moves by playing simulated games and using the outcomes it observes to update its existing knowledge. Monte-Carlo Tree Search extends this by conducting a tree search over game states driven by Monte-Carlo-like stochastic simulations. This stochastic nature is what makes it appropriate for games with random elements (e.g., the dice rolls in Settlers of Catan). The algorithm explores moves by random sampling that weighs the most promising game nodes more heavily. After each simulated game, the final result is backpropagated through the ancestor nodes so that better nodes are more likely to be chosen in future simulations. This process can be broken down into the following steps:
1. Selection: Starting from the root node R, select successive child nodes down to a leaf node L.
2. Expansion: From leaf node L, choose one of its children, C, to start the simulation (or none if the game ends there).
3. Simulation: From node C, simulate a random playout until the game’s end.
4. Backpropagation: Use the result of the finished simulation to propagate the game outcome back from node C to R.
Figure 3: A diagram of the steps of the MCTS algorithm
Upper Confidence bounds applied to Trees (UCT) is an exploration strategy often used in combination with MCTS that balances deep search of successful nodes against exploration of new game nodes [4]. It selects moves by searching a tree that is grown in memory. As soon as a leaf node is reached, a new move/child is added to the tree and the rest of the game is played out randomly. The evaluation of the finished random game is then used to update the statistics of all moves in the tree that were part of that game.
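For reference, the selection rule UCT typically uses is sketched below; the exploration constant and the exact bookkeeping may differ from the implementation in [5].

    import math

    def uct_score(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
        # Exploitation (observed win rate) plus an exploration bonus that shrinks
        # as the child is visited more often. c = sqrt(2) is a common default.
        if child_visits == 0:
            return float('inf')   # always try unvisited children first
        win_rate = child_wins / child_visits
        exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
        return win_rate + exploration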
In our implementation, we adapted an open source implementation of the MCTS algorithm using the UCT exploration strategy [5]. The original implementation was intended to solve Connect Four, but it provided a general framework that we could interface with our Settlers of Catan game state object. It should also be noted that this implementation featured a new set of actions that allowed trading. Because MCTS sped up the game considerably, trading became feasible as a new set of actions, whereas it had caused our earlier Expectimax implementation to time out.
7. Data
We ran a variety of simulations between our Expectimax, Expectiminimax, and Random agents, as seen below, testing win rates and different heuristics:
Table 1: Comparing Expectimax vs. Expectiminimax
100 trials were run at a search depth of 1 and 50 trials were run at a search depth of 2.
Table 2: Comparing Heuristics on Expectiminimax
100 trials were run at a search depth of 1 and 50 trials were run at a search depth of 2.
We also ran simulations testing our Monte-Carlo agent against the Random agent:
                           Win Rate %   Average Turns   Average Point Advantage
Monte-Carlo Tree Search    84.4%        28.3            6.466
Random Agent               15.6%        37.0            2.666
Table 3: Monte-Carlo Tree Search vs. the Random agent
Each game was run with 50 simulations, for a total of 60 trials.
8. Analysis
8.1 Effect of Starting Settlements
A player’s starting settlements significantly impact their likelihood of winning, especially in an environment without trading. A player receives resources every turn based on which resources adjoin their settlements. Without trading (which was not implemented for Expectimax/Expectiminimax), a player must have starting settlements touching every resource in order to ever build another settlement or city. To increase the likelihood of this happening, we gave each player 4 starting settlements instead of 2 and tried to position at least 2 settlements next to critical resources, but there were still games where a player could never build another settlement due to resource constraints. When we ran our initial tests using an Expectimax agent with seeded starting positions, no matter which positions we picked, one pair would always be slightly better, and the player with that pair won 100% of the time, whether Random or Expectimax. This shows that without trading, starting settlement positions have an enormous impact on the outcome.
Previous findings mention that “the seating order of players has a significant effect on their final rankings” [1]. Since players in our version are not allowed to choose starting settlements, seating order is not relevant, but the same study also shows that better starting positions significantly increase a player’s likelihood of winning. In our implementation of Settlers, we provide each player with two starting settlements that are guaranteed to be on a brick or lumber resource tile. Two additional settlements are then chosen randomly from the board. This implementation allows one player to receive slightly better starting positions 50% of the time, and therefore to have a significant advantage 50% of the time. We believe this accounts for how often a Random agent was able to win against our smarter agents, particularly against Expectimax and Expectiminimax.
8.2 Effect of Trading
To address the above inconsistencies in starting settlements, we attempted to add trading as a legal action to offset the significance of the starting settlements; however, trading introduced an exponential increase in the search space that could not realistically be resolved in time to gather data points for Expectiminimax. In our Monte Carlo-based implementation, trading was integrated effectively and was found to significantly offset the randomness of the game. Though games took longer to play out with the additional option to trade, our MCTS agent proved to be better off even when given a disadvantageous starting position because of its newfound ability to trade for other resources.
8.3 Expectiminimax vs. random agents
Although we made modifications to Settlers so that it could mimic a zero-sum game, such as limiting the number of players to 2, its design is not optimal for the Expectiminimax algorithm. For one thing, Settlers in its true form actually contains some collaboration, through trading. For another, it is not always optimal for one player to try to minimize the other’s score rather than maximize their own (blocking another player’s settlement position may not be better than settling somewhere else with optimal resources for oneself). Moreover, Expectiminimax assumes the other player is minimizing, when in fact our opposing agent was random. These reasons explain the sub-60% win rates of our Expectiminimax agents.
8.4 Expectimax vs. random agents
Expectimax, on the other hand, is closer to optimal here because it assumes a random opponent. This explains why Expectimax was more successful than Expectiminimax at depth 1. However, at depth 2, given such a large search space, we believe Expectimax was not able to perform at its full capacity, because the maximum depth at which we succeeded in running the algorithm only saw two turns ahead, with significant pruning. We believe that the high variability introduced by the dice rolls increased the search complexity to such a degree that the Expectimax algorithm was no longer effective at a depth of 2. One interesting point is that Expectiminimax did worse than Expectimax at depth 1, but better at depth 2. While this was a confusing data point for us, we theorized that while Expectimax is more optimal overall, and thus had better depth-1 results, Expectiminimax assumes the worst case for an opponent, which may have helped it achieve a better depth-2 result while Expectimax collapsed. We opted to understand this tension better with a search agent that could explore more of the game tree, which led to our implementation of the MCTS agent.
8.5 Heuristic analysis for Expectiminimax
We tried two heuristics for the Expectiminimax algorithm: one that took into account the number of resources each player had access to via their settlements (heuristic 1), and another that took into account only the difference in settlement counts between the players (heuristic 2). At depth 1, with no action pruning, an Expectiminimax agent playing with heuristic 1 showed stronger results than one playing with heuristic 2, as expected. This can be explained by the importance of prioritizing settlements that give access to more resources; it also explains the prioritization of roads that lead to locations with more resources. However, in order to run simulations with Expectimax and Expectiminimax agents at depth 2, we reduced the search space by pruning the number of actions explored at each depth. To do this, we added an algorithm that prioritized some actions over others; actions that lead to settlements and cities were always valued over building roads. This gave contrasting results to those at depth 1. We suggest that this is because the foresight of building a road that leads to a settlement is never exercised at a depth of 1. This again emphasizes the disadvantages of using Expectimax/Expectiminimax methods for a game with as much randomness as Settlers: because the search space becomes so enormous so quickly, it becomes impossible to explore it adequately.
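Roughly, the two heuristics have the following shape; the game-state method names here are illustrative stand-ins for our actual interface.

    def heuristic1(state, player):
        # Reward access to many distinct resource types via the player's
        # settlements (and cities), in addition to current victory points.
        resources = set()
        for settlement in state.getSettlements(player):
            resources |= set(settlement.adjacentResourceTypes())
        return state.getVictoryPoints(player) + 0.5 * len(resources)

    def heuristic2(state, player):
        # Only the difference in settlement counts between the two players.
        opponent = 1 - player
        return len(state.getSettlements(player)) - len(state.getSettlements(opponent))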
8.6 Why our agents win faster and by more
In the games our agents win, they win on average in fewer turns and with a larger point differential. At depth 1, our Expectimax agent won in 31 fewer turns on average than the random agent (75 vs. 106), and at depth 2 our Expectimax agent won in 29 fewer turns on average (73 vs. 102). Our Monte Carlo agent won in 9 fewer turns on average than the random agent (28 vs. 37). This shows that our agents make smarter choices than the random agents, although they still lose games where random agents have a significant advantage. The fact that our agents also win by more points on average reinforces that they are able to make smarter choices and think further ahead than the random agents.
8.7 Comparing Monte Carlo with Expectimax/Expectiminimax
Although we hypothesized that the same strategies used in Pacman would generalize to Settlers, our data shows that the Expectimax/Expectiminimax algorithms were not ideal for Settlers due to the large amount of randomness, the importance of trading, and the fact that Settlers is a non-zero-sum game. We underestimated how quickly the search space would grow, and did not fully realize this until after implementing Expectimax and Expectiminimax. With Monte Carlo, our agent was able to explore much more of the tree at every iteration and therefore make better decisions at every decision point, as evidenced by the higher win rate of the Monte Carlo agent compared to our previous agents. Because the game tree in MCTS grows asymmetrically, the algorithm is inherently biased toward concentrating on more promising choices. Thanks to this bias, it is able to narrow the search space significantly and achieve much better results than classical algorithms like Expectimax or Expectiminimax, which expand symmetrically, in games like Settlers with a high branching factor.
9. Conclusion
Our choice to build an AI for Settlers of Catan encouraged us to explore and compare many of the algorithms we learned in class. Our biggest challenges turned out to be modeling the game and tweaking our initial algorithms to run within a reasonable amount of time. Though our Expectimax/Expectiminimax agents did not show the success we expected, it was interesting to see that they won more quickly and by larger margins in the games where they had the initial advantage, an indication that the Expectimax and Expectiminimax agents were making smarter decisions at every turn than the Random agents. However, since they were not able to explore the game space at a substantial depth, these agents could not navigate the more complex decision trees that might have helped overcome an initial disadvantage. This initial setback led us to try our variant of the Monte Carlo Tree Search algorithm on such a complex and random game. Our results show that Monte Carlo Tree Search is effective against random agents and that trading helps offset the impact of randomness in the game. This suggests MCTS as a suitable tool for implementing a strong Settlers of Catan player and lays the groundwork for further investigation of MCTS against existing AI agents and human players.
References
[1] István Szita, Guillaume Chaslot, and Pieter Spronck. Monte-Carlo Tree Search in Settlers of Catan. Advances in Computer Games (2010).
[2] Amit Patel. Hexagonal Grids. Red Blob Games (2013).
[3] Stanford CS221. Multi-Agent Pac-Man Assignment (2014).
[4] G. Chaslot, J. Saito, B. Bouzy, J. Uiterwijk, and H. J. van den Herik. Monte-Carlo Strategies for Computer Go. In Proceedings of the 18th BeNeLux Conference on Artificial Intelligence, pp. 83–90 (2006).
[5] George Lesica. Implementation of the MCTS algorithm along with a simple game AI that plays Connect Four (2012).