MAL Seminar BW-API

RL Code
● Starter code is provided in F:/BWAPI.3.7.4/BWAPI.3.7.4/SeminarAIModule
● Implements SARSA(lambda) with an epsilon-greedy policy
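For orientation, a minimal sketch of tabular SARSA(lambda) with epsilon-greedy action selection follows. It is not the starter code: the struct and member names are hypothetical, the discount factor gamma and the value of epsilon are assumptions (the slides do not state them), and accumulating traces are one common choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical tabular learner; names are illustrative, not the starter code's.
struct SarsaLambda {
    std::size_t nStates, nActions;
    double alpha, gamma, lambda, epsilon;  // gamma and epsilon are assumed parameters
    std::vector<double> q;                 // action values, row-major [state][action]
    std::vector<double> e;                 // eligibility traces, same layout
    std::mt19937 rng{42};

    SarsaLambda(std::size_t s, std::size_t a, double alpha_, double gamma_,
                double lambda_, double epsilon_)
        : nStates(s), nActions(a), alpha(alpha_), gamma(gamma_),
          lambda(lambda_), epsilon(epsilon_), q(s * a, 0.0), e(s * a, 0.0) {}

    // Index of the highest-valued action in state s (first one wins ties).
    std::size_t greedy(std::size_t s) const {
        std::size_t best = 0;
        for (std::size_t a = 1; a < nActions; ++a)
            if (q[s * nActions + a] > q[s * nActions + best]) best = a;
        return best;
    }

    // With probability epsilon pick a uniformly random action, else the greedy one.
    std::size_t selectAction(std::size_t s) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        if (u(rng) < epsilon) {
            std::uniform_int_distribution<std::size_t> pick(0, nActions - 1);
            return pick(rng);
        }
        return greedy(s);
    }

    // One SARSA(lambda) backup for (s, a, r, s2, a2) with accumulating traces.
    void update(std::size_t s, std::size_t a, double r,
                std::size_t s2, std::size_t a2, bool terminal) {
        assert(s < nStates && a < nActions);
        const double next = terminal ? 0.0 : gamma * q[s2 * nActions + a2];
        const double delta = r + next - q[s * nActions + a];
        e[s * nActions + a] += 1.0;
        for (std::size_t i = 0; i < q.size(); ++i) {
            q[i] += alpha * delta * e[i];  // credit all recently visited pairs
            e[i] *= gamma * lambda;        // decay every trace
        }
    }

    // Traces are reset at the start of each episode.
    void startEpisode() { e.assign(e.size(), 0.0); }
};
```

With lambda = 0 this reduces to one-step SARSA; larger lambda spreads each reward further back along the visited state-action pairs.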
Scenario
1 slide BWAPI manual
1. Implement AI agent
2. Rebuild SeminarAIModule in Visual Studio
3. Copy SeminarAIModule.dll from F:/BWAPI.3.7.4/BWAPI.3.7.4/SeminarAIModule/Release to F:/StarCraft/BWAPIdata/AI
4. Start F:/BWAPI.3.7.4/BWAPI.3.7.4/ChaosLauncher and click Start
5. The AI module and map can be specified by selecting Config in the launcher
Configuring trials
● Set trial parameters in F:/StarCraft/trials.cfg
○ First line is the number of trials
○ Each further line specifies the parameters for one trial
○ Format: alpha, lambda, #episodes, doGreedyEvalRuns (0/1)
○ Output is written to the StarCraft folder in trial<nr>_out.txt files
○ Output format: episode, total reward, number of steps
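A hedged sketch of how a file in this format could be read is shown below. The names TrialParams and parseTrials are hypothetical, and because the slides do not say whether fields are separated by commas or spaces, the sketch accepts both; the starter code's actual parser may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one trial line: alpha, lambda, #episodes, doGreedyEvalRuns.
struct TrialParams {
    double alpha;
    double lambda;
    int episodes;
    bool greedyEval;
};

// Reads the trial count from the first line, then up to that many trial lines.
// Assumption: fields may be separated by commas and/or whitespace.
std::vector<TrialParams> parseTrials(std::istream& in) {
    std::vector<TrialParams> trials;
    std::string line;
    if (!std::getline(in, line)) return trials;

    int count = 0;
    std::istringstream header(line);
    header >> count;

    while (static_cast<int>(trials.size()) < count && std::getline(in, line)) {
        std::replace(line.begin(), line.end(), ',', ' ');  // treat commas as spaces
        std::istringstream fields(line);
        TrialParams t{};
        int flag = 0;
        if (fields >> t.alpha >> t.lambda >> t.episodes >> flag) {
            t.greedyEval = (flag != 0);
            trials.push_back(t);
        }
        // Lines that do not contain four parsable fields are skipped.
    }
    return trials;
}
```

Under these assumptions, a file whose first line is 2 followed by the lines "0.1, 0.9, 500, 1" and "0.05, 0.5, 200, 0" would describe two trials; these values are illustrative only.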
State space
● Current implementation extracts 6 variables from the game info
○ X-coordinate in [0,1000]
○ Y-coordinate in [0,1000]
○ enemy distance in [0,1000]
○ hitpoint difference (unit minus enemy) in [-50,50]
○ enemy attacking/moving (boolean)
○ enemy angle in [-pi,pi]
● Implemented in SeminarAIModule::getState()
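The six variables could be gathered roughly as in the sketch below. This is an illustration only: AgentState and makeState are hypothetical names, the inputs are plain numbers rather than BWAPI unit objects, and computing the angle with atan2 and clamping to the stated ranges are assumptions about what getState() does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical plain-data view of the six state variables listed above.
struct AgentState {
    double x;              // in [0, 1000]
    double y;              // in [0, 1000]
    double enemyDistance;  // in [0, 1000]
    double hpDiff;         // unit HP minus enemy HP, in [-50, 50]
    bool enemyActive;      // enemy attacking/moving
    double enemyAngle;     // in [-pi, pi]
};

static double clampTo(double v, double lo, double hi) {
    return std::max(lo, std::min(v, hi));
}

// Builds the state from raw positions and hitpoints (assumed inputs).
AgentState makeState(double ux, double uy, int unitHp,
                     double ex, double ey, int enemyHp, bool enemyActive) {
    AgentState s;
    s.x = clampTo(ux, 0.0, 1000.0);
    s.y = clampTo(uy, 0.0, 1000.0);
    s.enemyDistance = clampTo(std::hypot(ex - ux, ey - uy), 0.0, 1000.0);
    s.hpDiff = clampTo(static_cast<double>(unitHp - enemyHp), -50.0, 50.0);
    s.enemyActive = enemyActive;
    s.enemyAngle = std::atan2(ey - uy, ex - ux);  // atan2 returns a value in [-pi, pi]
    return s;
}
```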
Discretization
● Each state variable (except the boolean isAttacking) is discretized into a finite set of values
● Resolution per variable is set in SeminarAIModule::initQ()
● Finer discretization allows more accurate, but slower learning
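To make the trade-off concrete, here is a minimal sketch of uniform binning and of how the Q-table size grows with resolution. The function names and the example resolutions are hypothetical; the actual binning in initQ() may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map a continuous value in [lo, hi] to a bin index in [0, bins - 1].
// Values outside the range are clamped to the first or last bin.
int discretize(double value, double lo, double hi, int bins) {
    assert(bins > 0 && hi > lo);
    if (value <= lo) return 0;
    if (value >= hi) return bins - 1;
    return static_cast<int>((value - lo) / (hi - lo) * bins);
}

// Number of Q-table entries: product of all resolutions, times 2 for the
// boolean variable, times the number of actions.
std::size_t tableSize(const std::vector<int>& resolutions, int nActions) {
    std::size_t n = 2 * static_cast<std::size_t>(nActions);
    for (int r : resolutions) n *= static_cast<std::size_t>(r);
    return n;
}
```

For example, with assumed resolutions of 10, 10, 10, 5 and 8 bins for the five continuous variables and 7 actions, the table has 2 * 7 * 10 * 10 * 10 * 5 * 8 = 560000 entries. Doubling every resolution multiplies that by 32, which is why finer grids need many more episodes to fill in.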
Action Space
● Agent has 7 discrete actions
○ 0: stop
○ 1: attack enemy (only if visible)
○ 2: move towards enemy (only if visible)
○ 3-6: move N, E, S, W by 30 units
● Implemented in SeminarAIModule::executeAction
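The action numbering above can be captured in an enum, as in the following sketch. The identifiers are hypothetical and no BWAPI calls are shown; the sign convention (north is negative y) assumes the usual screen-style coordinates with y growing downward, which should be checked against executeAction.

```cpp
#include <cassert>

// Hypothetical names for the 7 actions, numbered as on the slide.
enum Action {
    STOP = 0,
    ATTACK_ENEMY = 1,
    MOVE_TO_ENEMY = 2,
    MOVE_NORTH = 3,
    MOVE_EAST = 4,
    MOVE_SOUTH = 5,
    MOVE_WEST = 6,
    NUM_ACTIONS = 7
};

struct Offset {
    int dx;
    int dy;
};

// Actions 1 and 2 require a visible enemy; all others are always available.
bool isAvailable(int action, bool enemyVisible) {
    assert(action >= 0 && action < NUM_ACTIONS);
    if (action == ATTACK_ENEMY || action == MOVE_TO_ENEMY) return enemyVisible;
    return true;
}

// Displacement for the four compass moves (30 units by default); zero otherwise.
// Assumption: y grows downward, so north is negative y.
Offset moveOffset(int action, int step = 30) {
    switch (action) {
        case MOVE_NORTH: return Offset{0, -step};
        case MOVE_EAST:  return Offset{step, 0};
        case MOVE_SOUTH: return Offset{0, step};
        case MOVE_WEST:  return Offset{-step, 0};
        default:         return Offset{0, 0};
    }
}
```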
Rewards
● step_reward (-0.03) for every non-final step
● Final reward is the hitpoint difference when either unit is killed
○ negative for loss
○ positive for win
● So the goal is to kill the enemy as quickly as possible, with maximum hitpoints left
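The reward rules above can be written as a small function, sketched here under assumptions: the function name and signature are hypothetical, and the hitpoint difference is taken as unit HP minus enemy HP so that a win is positive and a loss negative.

```cpp
#include <cassert>

// Reward for one step. On a final step (either unit killed) the reward is the
// hitpoint difference; otherwise it is the constant step_reward of -0.03.
double stepReward(bool unitKilled, bool enemyKilled, int unitHp, int enemyHp) {
    const double step_reward = -0.03;
    if (unitKilled || enemyKilled) {
        return static_cast<double>(unitHp - enemyHp);
    }
    return step_reward;
}
```

As an illustration with made-up numbers: winning after 100 non-final steps with 35 hitpoints left gives an undiscounted total of 100 * (-0.03) + 35 = 32, while winning after 200 steps with the same hitpoints gives 29, so faster wins score higher.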