Bootstrapping the Learning Process for the Semi-automated Design of a Challenging Game AI

Charles Madeira, Vincent Corruble, Geber Ramalho and Bohdana Ratitch
University of Paris 6 Computer Science Laboratory
AAAI-04 Workshop on Challenges in Game AI
16 August, 2004

Complex strategy games
- Large physical space
- Hundreds or thousands of units must coordinate their actions in order to achieve a common goal
[Screenshots: Age of Empires© (Microsoft) and Battleground™ (Talonsoft)]

Motivation
- A challenging game AI is one of the keys to retaining a high replay value for a game [Nareyek 2004]: good opponents must be designed in order to keep the player's interest in the game
- Industrial development is dominated by rule-based systems (fixed reasoning) [Pottinger 2003][Rabin 2002]: any fixed reasoning is easily caught by the opponent, and its loopholes can soon be exploited [Corruble 2000]
- It is therefore legitimate to consider other alternatives for the design of a game AI

Motivation (continued)
- Few detailed attempts at developing methodologies for the design of a game AI are found in the literature
- It is a very interesting field of research [Buro 2003]: many difficulties of the real world are combined [Schaeffer and Herik 2002][Pannérec 2002], including partial observability, non-determinism, cooperation, and coordination
- Game environments are excellent simulators for experimenting with new AI techniques: a new field of research for the academic AI community

The main goal
- Develop an adaptive game AI for complex strategy games, in order to provide a challenging opponent for expert human players
- How? With a worthy alternative: a decision-making system developed with Reinforcement Learning techniques

Why Reinforcement Learning?
- It is useful in situations where efficient strategies are unknown or not easily automated, and where basic AI techniques have much difficulty
- It is particularly suited to strategy games, being specifically designed for learning sequential decision-making strategies with long-term goals
- It has produced great results: TD-Gammon became one of the best backgammon players in the world [Tesauro 1992, 1994, 1995]

Reinforcement Learning [Sutton and Barto 1998]
- The agent interacts with a stochastic, unknown environment with a finite state space and a finite action space: at each discrete time step it observes a situation, chooses an action, and receives a reward, yielding a trajectory s0, a0, r0, s1, a1, r1, s2, a2, r2, …
- Objective: approach an optimal strategy based on long-term performance criteria
- Learning takes place from interaction, as sequential decision-making over discrete time steps (a minimal sketch of such a learner follows below)
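The slides describe the interaction loop but do not name a specific algorithm at this point. As a hedged illustration, the following minimal tabular Q-learning sketch in Python shows one standard way to learn from such trajectories; the `env` interface (`reset`, `step`, `actions`) and all parameter values are assumptions made for the example, not part of the presentation.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning over trajectories s0, a0, r0, s1, ...
    `env` is a hypothetical interface: reset() -> state,
    step(action) -> (next_state, reward, done), plus a list env.actions."""
    q = defaultdict(float)  # (state, action) -> estimated long-term return
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise exploit
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: q[(s, act)])
            s2, r, done = env.step(a)
            # move Q(s, a) toward the one-step bootstrapped target
            best_next = 0 if done else max(q[(s2, act)] for act in env.actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

For perspective, TD-Gammon combined temporal-difference learning with a neural network rather than a table, which is the direction the future-work slide points to with function approximation.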
Difficulties for using Reinforcement Learning
- Large state space: the set of possible situations
- Large action space: the set of possible actions in each situation
- The parallel nature of the decision-making: complexity increases exponentially with the number of units

Plan of battle
- Control the complexity of the problem
- Determine how to represent game situations
- Find a good way to acquire experience
We propose a methodology that one can follow to avoid these typical difficulties.

Methodology
- Decomposition of the problem, using multi-agent distributed reasoning
- Representation of game situations, finding proper granularities (by terrain analysis, …)
- Acquiring experience by playing against good opponents
- Bootstrapping the learning process (learning in stages)
- Multi-agent coordination
We apply this methodology to the Battleground™ series of wargames.

John Tiller's Battleground™ series (Talonsoft)
[Screenshot of a Battleground™ scenario]

Methodology applied to Battleground
- Decomposition of the problem: the military hierarchy of command and control
- Representation of game situations: abstracting state and action spaces
- Acquiring experience by playing against the existing Battleground AI (the bootstrap AI)
- Bootstrapping the learning process: learning by levels of the military hierarchy
- Multi-agent coordination

The military hierarchy of command and control
[Diagram: the Army Commander, pursuing the long-term objective (strategy), sends orders to the Corps Commanders and receives situation reports from them; orders and reports flow the same way between Corps and Division Commanders, Division and Brigade Commanders, and down to the lower-level units, which carry out specific actions (tactics) and perceive the environment.]

Abstracting state and action spaces
- Finding abstract search spaces with proper granularities
- Commanders must process only the relevant information: the situation of their subordinate units, the environment situation (friendly units and enemy units), and the possible actions (a sketch of one possible encoding follows below)
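The presentation does not spell out the encoding at this point (a concrete bit layout appears in the first experiments later on), so the following sketch only illustrates the idea of reducing a commander's view to a few coarse fields; every name and threshold in it is invented for the example.

```python
from dataclasses import dataclass

def bucket(value, thresholds):
    """Map a raw quantity to a small discrete level in 0..len(thresholds)."""
    return sum(value >= t for t in thresholds)

@dataclass(frozen=True)
class AbstractCorpsState:
    """Hypothetical coarse view for one corps commander: only the
    information the methodology deems relevant, at low granularity."""
    own_strength: int      # bucketed strength of subordinate units
    friendly_nearby: int   # bucketed strength of friendly units in nearby zones
    enemy_nearby: int      # bucketed strength of enemy units in nearby zones
    fatigue: int           # coarse fatigue level

def abstract_corps_view(corps, zones):
    # Thresholds are illustrative, not taken from the presentation.
    s = bucket(corps.strength, (500, 1500, 3000))
    f = bucket(sum(z.friendly_strength for z in zones), (1000, 3000, 6000))
    e = bucket(sum(z.enemy_strength for z in zones), (1000, 3000, 6000))
    return AbstractCorpsState(s, f, e, bucket(corps.fatigue, (30, 60)))
```

Because the dataclass is frozen, each abstract state is hashable and can serve directly as a key in a tabular learner like the one sketched earlier.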
Bootstrapping the learning process
[Diagram: on our side, the Army Commander (AC) is controlled by the learning AI, while the subordinate Corps Commanders (CC), Division Commanders (DC), Brigade Commanders (BC), and lower-level units (U) remain controlled by the bootstrap AI; on the opponent side, the entire hierarchy is controlled by the bootstrap AI.]

Experiments with Battleground
- Experimental platform: developing an easily configurable environment
- Evaluation criteria: comparing the results of our learning agent model

Experimental platform
[Diagram of the Napolectronic platform: the AI Engine (decision making and situation representation) exchanges situation data and actions with the Game Engine (the wargame system) through an Interaction Manager, running on top of the operating system.]

Evaluation criteria
- Compare the results of our learning agent model with those of other agent models: a random agent and the bootstrap agent
- Every model plays against the bootstrap AI

First experiments
- Decision-making scheme chosen: the Army Commander is the learning agent; it sends orders to, and receives situation reports from, the three Corps Commanders, which are controlled by the existing AI, as are all subordinate units (the bootstrap AI)
[Table: the abstract state combines a friendly corps description (12 bits covering artillery strength, cavalry strength, infantry strength, fatigue level, and quality rating) with an environment description giving the strengths of the friendly unit groups (12 bits) and of the enemy unit groups (12 bits) across zones 1-6; the action encodes an order type and a target location (6 bits).]
- Tabular representation of state/action pairs; maximum table size = 2^41 entries (a sketch of such a packed table follows below)
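Given a fixed-width bit layout like the one above, one natural tabular implementation keys Q-values on the packed bits and stores the table sparsely. This is a sketch under those assumptions; the helper names and field widths are illustrative, not the exact layout used in the experiments.

```python
from collections import defaultdict

def pack_bits(fields):
    """Pack (value, width) pairs into a single integer key, e.g. a 36-bit
    state plus a 6-bit action. Widths are illustrative only."""
    key = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "value must fit in its bit field"
        key = (key << width) | value
    return key

q_table = defaultdict(float)  # sparse: only visited pairs are stored

def q_value(state_fields, action_fields):
    return q_table[pack_bits(state_fields + action_fields)]
```

Although the key space allows up to 2^41 possible entries, a hash map of this kind only materializes the state/action pairs actually visited during learning.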
First results
[Plot: evaluation results of the Learning AI, showing the average score (100:1) against the number of learning episodes (×1000, from 1 to 49) for the random agent, the bootstrap AI, and the learning AI, with a logarithmic trend line fitted to the learning AI's curve.]
Scoring bands: Major Victory: score ≥ 1000; Minor Victory: 600 ≤ score < 1000; Draw: 200 < score < 600; Minor Defeat: 0 < score ≤ 200; Major Defeat: score ≤ 0.

Qualitative results
- Our system has learned some basic strategy
- During the first 10,000 episodes, friendly units took more precautions; afterwards, they take much more risk in order to capture the most important objectives
- They achieve higher scores, but fail dramatically on some occasions
- The long-term benefits of risky actions have been learned and are reflected in the resulting strategy

Conclusions
- We proposed a new approach to the semi-automated generation of a game AI using Reinforcement Learning techniques
- Our approach bootstraps the learning process, following a specific methodology applied to complex strategy games
- The first results are quite encouraging, despite the poverty of the representation used
- We believe that the lessons learned have wide applicability for AI design in strategy games

Ongoing / Future work
- Control other levels of the hierarchy
- Use function approximation for generalization [Sutton and Barto 1998]: a neural network (back-propagation) or a Cerebellar Model Articulation Controller (CMAC) [Albus 1975][Santamaria 1996] (a tile-coding sketch follows below)
- Design detailed representations of game situations
- Coordinate agent actions [Guestrin 2002]
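The CMAC mentioned above is commonly realized as tile coding: several overlapping, offset tilings over the state features, with the value of a state given by the sum of one weight per tiling. The following sketch is a minimal illustration under that reading; the sizes, offsets, and feature scaling are assumptions, not details from the presentation.

```python
import numpy as np

class TileCoder:
    """Minimal CMAC-style tile coding: a state activates one tile per
    tiling, and its value is the sum of the activated weights."""
    def __init__(self, n_tilings=8, tiles_per_dim=10, n_dims=4, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        # each tiling is shifted by a fixed random fraction of one tile
        self.offsets = rng.uniform(0, 1 / tiles_per_dim, (n_tilings, n_dims))
        self.weights = np.zeros((n_tilings, tiles_per_dim ** n_dims))

    def _indices(self, x):
        # x: feature vector scaled to [0, 1); one active tile per tiling
        coords = np.floor((x + self.offsets) * self.tiles_per_dim).astype(int)
        coords = np.clip(coords, 0, self.tiles_per_dim - 1)
        radix = self.tiles_per_dim ** np.arange(coords.shape[1])
        return coords @ radix

    def value(self, x):
        return self.weights[np.arange(self.n_tilings), self._indices(x)].sum()

    def update(self, x, target, alpha=0.1):
        idx = self._indices(x)
        error = target - self.value(x)
        # spread the correction evenly over the active tiles
        self.weights[np.arange(self.n_tilings), idx] += alpha * error / self.n_tilings

# Illustrative use: regress toward an observed return for one state.
coder = TileCoder(n_dims=4)
coder.update(np.array([0.2, 0.5, 0.1, 0.9]), target=1.0)
```

Because nearby states share tiles, an update to one state generalizes to its neighbours, which is exactly the property the future-work slide seeks beyond the purely tabular representation.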