slides - SimBionic

Transcription

University of Paris 6
Computer Science Laboratory
Bootstrapping the Learning Process for the Semi-automated Design of a Challenging Game AI
Charles Madeira, Vincent Corruble, Geber Ramalho and Bohdana Ratitch
AAAI-04 Workshop on Challenges in Game AI
16 August, 2004
Complex strategy games
Large physical space
Hundreds or thousands of units must coordinate their actions in order to achieve a common goal
[Screenshots: Age of Empires© (Microsoft), Battleground™ (Talonsoft)]
Motivation
A challenging game AI is one of the keys to retaining a high replay value for a game [Nareyek 2004]
It is necessary to design good opponents in order to keep players interested in the game
Industrial development is dominated by rule-based systems (fixed reasoning) [Pottinger 2003][Rabin 2002]
Any fixed reasoning is easily figured out by the opponent, and its loopholes can soon be exploited [Corruble 2000]
Therefore, it is legitimate to consider other alternatives for the design of a game AI
Motivation
Few detailed attempts at developing methodologies for the design of a game AI are found in the literature
A very interesting field of research [Buro 2003]
Many real-world difficulties are combined [Schaeffer and van den Herik 2002][Pannérec 2002]:
Partial observability
Non-determinism
Cooperation
Coordination
Game environments are excellent simulators for experimenting with new AI techniques
A new field of research for the academic AI community
The main goal
Develop an adaptive game AI for complex strategy games
In order to
Provide a challenging opponent for expert human players
How?
Using a worthy alternative:
A decision-making system developed with Reinforcement Learning techniques
Why Reinforcement Learning?
Useful in situations where efficient strategies are unknown or not easily automated
Basic AI techniques have great difficulty there
Particularly well suited to strategy games
Specifically designed for learning sequential decision-making strategies with long-term payoffs
It has produced great results
TD-Gammon became one of the best backgammon players in the world [Tesauro 1992, 1994, 1995]
Reinforcement Learning [Sutton and Barto 1998]
[Diagram: agent-environment interaction loop. The environment (finite state space) sends the agent a situation and a reward; the agent (finite action space) responds with an action, yielding the sequence s0, a0, r0, s1, a1, r1, s2, a2, r2, …]
Objective: approach an optimal strategy based on long-term performance criteria
Learning from interaction with a stochastic and unknown environment
Sequential decision-making at discrete time steps
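As a concrete reference for this setting, here is a minimal tabular Q-learning sketch in the spirit of [Sutton and Barto 1998]. It is not taken from the slides, and the environment interface (reset, step, actions) is an assumption:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """env is assumed to expose reset() -> state, step(action) -> (state, reward, done),
    and a finite list env.actions; this interface is an assumption."""
    q = defaultdict(float)  # Q(s, a) estimates, implicitly 0 for unseen pairs
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise exploit current estimates
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: q[(s, act)])
            s2, r, done = env.step(a)
            # move Q(s, a) toward the reward plus the discounted value of the best next action
            target = r if done else r + gamma * max(q[(s2, act)] for act in env.actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q
```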
Difficulties for using Reinforcement Learning
Large state space
A set of possible situations
Large action space
A set of possible actions for each situation
Parallel nature of the decision-making
The complexity increases exponentially with the number of units (with k possible actions per unit, n units yield k^n joint actions)
Plan of battle
Control the complexity of the problem
Determine how to represent game situations
Find a good way to acquire experience
We propose a methodology that one can follow to avoid these typical difficulties
Methodology
Decomposition of the problem
Using multi-agent distributed reasoning
Representation of game situations
Finding proper granularities (by terrain analysis, …)
Acquiring experience by playing against good opponents
Bootstrapping the learning process (learning by steps)
Multi-agent coordination
We apply our methodology to the Battleground™ series of wargames
John Tiller’s Battleground™ series (Talonsoft)
Methodology applied to Battleground
Decomposition of the problem
The military hierarchy of command and control
Representation of game situations
Abstracting state and action spaces
Acquiring experience by playing against the existing Battleground AI (the bootstrap AI)
Bootstrapping the learning process
Learning by levels of the military hierarchy
Multi-agent coordination
The military hierarchy of command and control
[Diagram: the command hierarchy. The Army Commander pursues a long-term objective (strategy) and issues orders to Corps Commanders; orders flow down through Division Commanders and Brigade Commanders to the lower-level units, while situation reports flow back up at every level. The lower-level units execute specific actions (tactics) and send back their perceptions.]
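To illustrate this decomposition (the class structure is an assumption, not the authors' code), each commander can be modeled as a node that maps the order from above, plus its situation report, to orders for its subordinates:

```python
class Commander:
    """One node in the hierarchy; decide() is where a learned policy would plug in."""
    def __init__(self, name, subordinates=()):
        self.name = name
        self.subordinates = list(subordinates)

    def decide(self, order, report):
        # Placeholder policy: pass the incoming order down unchanged.
        return {sub: order for sub in self.subordinates}

    def execute(self, order, report):
        if not self.subordinates:              # lower-level unit: act directly
            return [f"{self.name} executes {order}"]
        actions = []
        for sub, sub_order in self.decide(order, report).items():
            actions.extend(sub.execute(sub_order, report))
        return actions

# Usage: an army with two corps, each commanding one division
army = Commander("Army", [
    Commander("Corps 1", [Commander("Division 1.1")]),
    Commander("Corps 2", [Commander("Division 2.1")]),
])
print(army.execute("advance on zone 3", report=None))
```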
Abstracting state and action spaces
Finding abstract search spaces with proper granularities
Commanders must process only relevant information:
The situation of their subordinate units
The environment situation (friendly units, enemy units)
The possible actions
Bootstrapping the learning process
[Diagram: two mirrored command hierarchies (Army Commander → Corps Commanders → Division Commanders → Brigade Commanders → lower-level units). Our side is controlled by the learning AI together with the bootstrap AI; the opponent side is controlled entirely by the bootstrap AI.]
Experiments with Battleground
Experimental platform
Developing an easily configurable environment
Evaluation criteria
Comparing the results of our learning agent model
Experimental platform
Napolectronic Platform
[Diagram: layered architecture. The AI Engine (Decision Making, Situation Representation) exchanges data, situations, and actions with the Game Engine (Interaction Manager, Wargame System), which runs on top of the Operating System.]
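A minimal sketch of the division of labor this architecture implies (the Game Engine API names are hypothetical): the AI Engine abstracts raw game data into the situation representation and returns an action to the game:

```python
class AIEngine:
    """Separates decision making from the game: represent() abstracts raw game
    data into the situation representation, decide() picks an action."""
    def __init__(self, represent, decide):
        self.represent = represent
        self.decide = decide

    def step(self, game_engine):
        raw = game_engine.current_situation()   # hypothetical Game Engine call
        action = self.decide(self.represent(raw))
        game_engine.apply(action)               # hypothetical Game Engine call
        return action
```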
Evaluation criteria
Compare the results of our learning agent model with those of other agent models:
A random agent
The bootstrap agent
Each of them plays against the bootstrap AI (see the sketch below)
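A sketch of this protocol (function and parameter names are assumptions): each candidate agent plays a batch of games against the bootstrap AI and is scored by its average result:

```python
def evaluate(agents, play_vs_bootstrap, n_games=100):
    """agents: dict name -> agent (e.g. random, bootstrap, learning).
    play_vs_bootstrap(agent) plays one game against the bootstrap AI and
    returns the final score; both are hypothetical."""
    return {name: sum(play_vs_bootstrap(agent) for _ in range(n_games)) / n_games
            for name, agent in agents.items()}
```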
First experiments
Decision-making scheme chosen
[Diagram: the Army Commander (our learning agent) exchanges orders and situation reports with three Corps Commanders, each controlled by an existing AI; all subordinate units are likewise run by the existing Battleground AI (the bootstrap AI).]
First experiments
State/action representation:
Friendly Corps description (12 bits): Zone (6 values), Artillery Strength (4 levels), Cavalry Strength (4 levels), Infantry Strength (4 levels), Fatigue Level (4 levels), Quality Rating (2 levels)
Friendly Unit Groups (12 bits): Strength (4 levels) in each of Zones 1-6
Enemy Unit Groups (12 bits): Strength (4 levels) in each of Zones 1-6
Action (6 bits): Order Type (5 values), Target Location (8 values)
[Diagram: the map is divided into six zones, numbered 1-6]
Tabular representation of state/action pairs
Maximum table size = 2^41 lines
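A sketch of how such a representation can index a table: pack the discretized fields into a single integer key. The field names and bit widths below are reconstructed from the slide and should be treated as illustrative:

```python
# Field widths chosen to mirror the slide's representation; illustrative only.
FIELDS = [
    ("corps_zone", 3), ("artillery", 2), ("cavalry", 2), ("infantry", 2),
    ("fatigue", 2), ("quality", 1),                     # friendly corps: 12 bits
    *[(f"friendly_zone{z}", 2) for z in range(1, 7)],   # friendly groups: 12 bits
    *[(f"enemy_zone{z}", 2) for z in range(1, 7)],      # enemy groups: 12 bits
    ("order_type", 3), ("target_location", 3),          # action: 6 bits
]

def pack(values):
    """Pack a dict of field name -> small int into one integer table key."""
    key = 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        key = (key << width) | v
    return key
```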
First results
Score bands: Major Victory (Score ≥ 1000), Minor Victory (600 ≤ Score < 1000), Draw (200 < Score < 600), Minor Defeat (0 < Score ≤ 200), Major Defeat (Score ≤ 0)
[Chart: evaluation results of the Learning AI. Average score (100:1), ranging from -50 to 400, versus number of learning episodes (×1000, from 1 to 49), for the Random agent, the Bootstrap AI, and the Learning AI, with a logarithmic trendline fitted to the Learning AI.]
Qualitative results
Our system has learned some basic strategy
During the first 10,000 episodes
Friendly units took more precautions
Afterwards
Friendly units take much more risk in order to capture the most important objectives
They achieve higher scores
But they fail dramatically on some occasions
The long-term benefits of risky actions have been learned and are reflected in the strategy obtained
Conclusions
We proposed a new approach to the semi-automatic generation of a game AI using Reinforcement Learning techniques
Our approach bootstraps the learning process, following a specific methodology for complex strategy games
The first results are quite encouraging, despite the crudeness of the representation used
We believe that the lessons learned apply widely to AI design for strategy games
Ongoing / Future work
Control other levels of the hierarchy
Use function approximation for generalization [Sutton and Barto 1998] (a generic sketch follows below)
Neural networks (back-propagation)
Cerebellar Model Articulation Controller (CMAC) [Albus 1975][Santamaria 1996]
Design more detailed representations of game situations
Coordinate agent actions [Guestrin 2002]
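A generic sketch of the function-approximation direction (not the authors' implementation): replace the Q-table with a linear approximator over binary features, of the kind CMAC/tile coding produces, updated by a semi-gradient step:

```python
import numpy as np

class LinearQ:
    """Q(s, a) ≈ w · phi(s, a) for a binary feature vector phi,
    updated as in [Sutton and Barto 1998]."""
    def __init__(self, n_features, alpha=0.01):
        self.w = np.zeros(n_features)
        self.alpha = alpha

    def value(self, phi):
        return float(self.w @ phi)

    def update(self, phi, target):
        # target is e.g. r + gamma * max_a' Q(s', a') from the Q-learning rule
        self.w += self.alpha * (target - self.value(phi)) * phi
```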