Game 2048 AI Controller Competition @ GECCO 2015

Transcription

Game 2048 AI Controller Competition @ GECCO 2015
Game 2048 AI Controller Competition @ GECCO 2015
Wojciech Jaśkowski
Marcin Szubert
Institute of Computing Science
July 13, 2015
Game of 2048
Rules
single-player, nondeterministic
4 × 4 board
actions: left, right, up, down
merging: score the sum
every move: 2 or 4 in random
position
goal: construct tile 2048
http://gabrielecirulli.github.io/2048
2048 AI Controller Competition @ GECCO 2015
2 / 10
Jaśkowski & Szubert
Search Results of ‘2048’ in Android Store
2048 AI Controller Competition @ GECCO 2015
3 / 10
Jaśkowski & Szubert
Motivation & Goal
Motivation
Popularity:
4mln visitors in the first weekend
3000 man-years in 3 weeks
Easy to learn but not trivial to master → ideal test bed for CI/AI
Langerman, S., & Uno, Y. (2015). Threes!, Fives, 1024!, and 2048 are
Hard.
1021 states (upper bound), Backgammon: 1020
Few previous studies (RL only!):
Szubert, M., & Jaśkowski, W. (2014, August). Temporal difference
learning of N-tuple networks for the game 2048.
Wu, I. C., Yeh, K. H., Liang, C. C., Chang, C. C., & Chiang, H. (2014).
Multi-Stage Temporal Difference Learning for 2048.
2048 AI Controller Competition @ GECCO 2015
4 / 10
Jaśkowski & Szubert
Motivation & Goal
Motivation
Popularity:
4mln visitors in the first weekend
3000 man-years in 3 weeks
Easy to learn but not trivial to master → ideal test bed for CI/AI
Langerman, S., & Uno, Y. (2015). Threes!, Fives, 1024!, and 2048 are
Hard.
1021 states (upper bound), Backgammon: 1020
Few previous studies (RL only!):
Szubert, M., & Jaśkowski, W. (2014, August). Temporal difference
learning of N-tuple networks for the game 2048.
Wu, I. C., Yeh, K. H., Liang, C. C., Chang, C. C., & Chiang, H. (2014).
Multi-Stage Temporal Difference Learning for 2048.
Goals
1
2
Learning/evolving the strategy.
Learning/devising effective function approximators
2048 AI Controller Competition @ GECCO 2015
4 / 10
Jaśkowski & Szubert
Competition Rules
Rules
Java API provided, sumitting a jar (any JVM language).
1 ms (enough!) for a given action (on average)
Objective: expected score (average over 1000 games)
2048 AI Controller Competition @ GECCO 2015
5 / 10
Jaśkowski & Szubert
Submissions 1: Treecko
Authors: T. Chabin, M. Elouafi, P. Carvalho, A. Tonda
Affiliation: INRA (France),
Linear Genetic Programming: (MicroGP)
19h with 16 cores
(Code)
(Demo)
2048 AI Controller Competition @ GECCO 2015
6 / 10
Jaśkowski & Szubert
Submissions 2: IeorIITB2048
Authors: Devanand and Shivaram Kalyanakrishnan
Affiliation: Indian Institute of Technology Bombay, India
Cross Entrophy Method
A linear weighed function approximator with 9 hand-designed features
2-ply Expectimax
(Demo)
2048 AI Controller Competition @ GECCO 2015
7 / 10
Jaśkowski & Szubert
Competition Results
Ranking :
1
IeorIITB2048
Avg. Score: 18245.6 ± 660.3
2048: 0.341
4096: 0.013
2
Treecko
Avg. Score: 5334.3 ± 171.0
1024: 0.048
2048: 0.0
2048 AI Controller Competition @ GECCO 2015
8 / 10
Jaśkowski & Szubert
State-of-the-art
1
2
Small number of hand-designed features + CMA-ES + ExpectiMax
Large number of binary features + TD + ExpectiMax
Systematic n-Tuple Networks1
0
64
128
2
2
8
1
2
3
128
f (s) =
m
X
fi (s) =
i=1
1
m
X
8
4
2
2
0123
weight
0000
3.04
0001 −3.90
0002 −2.14
..
..
.
.
0010
5.89
..
..
.
.
0130 -2.01
..
..
.
.
h
i
LUTi index sloci1 , . . . , slocini
i=1
See also: EML1, Tuesday 17:05
2048 AI Controller Competition @ GECCO 2015
9 / 10
Jaśkowski & Szubert
Conclusions
Summary
2048: new interesting challenge for AI/CI with simple rules and
highly popular, quick to play (even 20ms for one game)
Still early research, plenty of opportunities, also as a benchmark
2048 AI Controller Competition @ GECCO 2015
10 / 10
Jaśkowski & Szubert
Conclusions
Summary
2048: new interesting challenge for AI/CI with simple rules and
highly popular, quick to play (even 20ms for one game)
Still early research, plenty of opportunities, also as a benchmark
Open questions
Effective learning (currently millions of games)?
Automatically extracting useful features / function approximator.
What is the expected score of the optimal policy? Currently
[437 000, 3 932 100]
Highest possible winning rate for 2048, 4096, 8192, 16384,
32768,65536. . . ?
2048 AI Controller Competition @ GECCO 2015
10 / 10
Jaśkowski & Szubert