Game 2048 AI Controller Competition @ GECCO 2015
Transcription
Game 2048 AI Controller Competition @ GECCO 2015
Wojciech Jaśkowski, Marcin Szubert
Institute of Computing Science
July 13, 2015

Game of 2048: Rules
- single-player, nondeterministic
- 4 × 4 board
- actions: left, right, up, down
- merging two equal tiles scores their sum
- after every move, a 2 or a 4 appears in a random empty position
- goal: construct the 2048 tile
- http://gabrielecirulli.github.io/2048

Search Results of '2048' in the Android Store
[Slide: screenshot of search results for "2048" in the Android app store]

Motivation & Goal
Motivation
- Popularity: 4 million visitors in the first weekend; 3000 man-years of play in 3 weeks
- Easy to learn but not trivial to master → an ideal test bed for CI/AI
  - Langerman, S., & Uno, Y. (2015). Threes!, Fives, 1024!, and 2048 are Hard.
- ~10^21 states (upper bound); Backgammon: ~10^20
- Few previous studies (RL only!):
  - Szubert, M., & Jaśkowski, W. (2014). Temporal Difference Learning of N-tuple Networks for the Game 2048.
  - Wu, I. C., Yeh, K. H., Liang, C. C., Chang, C. C., & Chiang, H. (2014). Multi-Stage Temporal Difference Learning for 2048.
Goals
1. Learning/evolving the strategy.
2. Learning/devising effective function approximators.

Competition Rules
- Java API provided; submission as a jar (any JVM language)
- 1 ms per action on average (enough!)
- Objective: expected score (average over 1000 games)

Submissions 1: Treecko
- Authors: T. Chabin, M. Elouafi, P. Carvalho, A. Tonda
- Affiliation: INRA (France)
- Linear Genetic Programming (MicroGP)
- 19 h of training on 16 cores
- (Code) (Demo)

Submissions 2: IeorIITB2048
- Authors: Devanand and Shivaram Kalyanakrishnan
- Affiliation: Indian Institute of Technology Bombay, India
- Cross-Entropy Method
- A linear weighted function approximator with 9 hand-designed features
- 2-ply Expectimax
- (Demo)

Competition Results
Ranking:
1. IeorIITB2048
   Avg. score: 18245.6 ± 660.3
   Reached 2048: 0.341, 4096: 0.013
2. Treecko
   Avg. score: 5334.3 ± 171.0
   Reached 1024: 0.048, 2048: 0.0

State-of-the-art
1. Small number of hand-designed features + CMA-ES + Expectimax
2. Large number of binary features + TD + Expectimax
(a hedged sketch of such an expectimax search is given at the end of this transcript)
Systematic n-tuple networks:
[Figure: example 4 × 4 board with one 4-tuple highlighted and an excerpt of its weight lookup table: index 0000 → 3.04, 0001 → −3.90, 0002 → −2.14, ..., 0010 → 5.89, ..., 0130 → −2.01, ...]
f(s) = \sum_{i=1}^{m} f_i(s) = \sum_{i=1}^{m} LUT_i[ index(s_{loc_{i,1}}, \ldots, s_{loc_{i,n_i}}) ]
See also: EML1, Tuesday 17:05.
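To make the lookup-table evaluation above concrete, here is a minimal Java sketch of an n-tuple network evaluator. The NTupleNetwork class, the board encoding as tile exponents, and the tuple layout are illustrative assumptions, not part of the competition API or of any submission.

```java
/**
 * Minimal sketch of an n-tuple network evaluator for 2048 (illustrative only;
 * the class name and board encoding are assumptions, not the competition API).
 * Each tile is encoded by its exponent (0 = empty, 1 = tile 2, 2 = tile 4, ...),
 * so a tuple of k board cells indexes a lookup table with 16^k entries.
 */
public class NTupleNetwork {
    private final int[][] locations; // locations[i] = board cells covered by the i-th tuple
    private final double[][] luts;   // luts[i] = weight lookup table of the i-th tuple

    public NTupleNetwork(int[][] locations) {
        this.locations = locations;
        this.luts = new double[locations.length][];
        for (int i = 0; i < locations.length; i++) {
            luts[i] = new double[1 << (4 * locations[i].length)]; // 16^k weights
        }
    }

    /** f(s) = sum over tuples i of LUT_i[index(s_loc_i1, ..., s_loc_ik)]. */
    public double evaluate(int[] board) { // board[0..15] holds tile exponents
        double value = 0.0;
        for (int i = 0; i < locations.length; i++) {
            int index = 0;
            for (int cell : locations[i]) {
                index = (index << 4) | board[cell]; // base-16 index into the LUT
            }
            value += luts[i][index];
        }
        return value;
    }
}
```

In a systematic network the tuples are all straight lines (and possibly rectangles) of a fixed length on the board; the table entries are the weights adjusted by learning, e.g. temporal difference learning as in the cited 2014 paper.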
Conclusions
Summary
- 2048 is a new, interesting challenge for AI/CI: simple rules, highly popular, quick to play (even 20 ms per game).
- Still early research, plenty of opportunities, also as a benchmark.
Open questions
- Effective learning (currently millions of games are needed)?
- Automatically extracting useful features / function approximators.
- What is the expected score of the optimal policy? Currently bounded within [437 000, 3 932 100].
- Highest possible rate of reaching 2048, 4096, 8192, 16384, 32768, 65536, ...?
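As a closing illustration of the search component mentioned for IeorIITB2048 and in the state-of-the-art slide, below is a hedged sketch of depth-limited expectimax. The State interface and all of its methods are hypothetical placeholders, not the competition's Java API; the evaluation it calls could be the n-tuple network sketched earlier or a linear feature-based approximator.

```java
import java.util.List;

/**
 * Hedged sketch of depth-limited expectimax for 2048 (illustrative only;
 * the State interface is a hypothetical placeholder, not the competition API).
 */
public final class Expectimax {

    /** Minimal game-state abstraction assumed by this sketch. */
    public interface State {
        List<Integer> legalMoves();        // moves (left/right/up/down) that change the board
        State afterMove(int move);         // deterministic slide-and-merge result (afterstate)
        double reward(int move);           // points scored by the move (sum of merged tiles)
        List<State> spawnStates();         // states reachable by placing a random 2 or 4
        List<Double> spawnProbabilities(); // matching probabilities (e.g. 0.9 for 2, 0.1 for 4)
        double evaluate();                 // heuristic value, e.g. an n-tuple network
    }

    /** Returns the move maximizing immediate reward plus expected future value. */
    public static int bestMove(State s, int depth) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int move : s.legalMoves()) {
            double value = s.reward(move) + chanceValue(s.afterMove(move), depth - 1);
            if (value > bestValue) {
                bestValue = value;
                best = move;
            }
        }
        return best;
    }

    /** Chance node: expectation over the random tile spawns. */
    private static double chanceValue(State afterstate, int depth) {
        if (depth <= 0) {
            return afterstate.evaluate();
        }
        List<State> outcomes = afterstate.spawnStates();
        List<Double> probs = afterstate.spawnProbabilities();
        double expected = 0.0;
        for (int i = 0; i < outcomes.size(); i++) {
            expected += probs.get(i) * maxValue(outcomes.get(i), depth);
        }
        return expected;
    }

    /** Max node: best achievable value from a player-to-move state. */
    private static double maxValue(State s, int depth) {
        List<Integer> moves = s.legalMoves();
        if (moves.isEmpty()) {
            return s.evaluate(); // terminal position: no move possible
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int move : moves) {
            best = Math.max(best, s.reward(move) + chanceValue(s.afterMove(move), depth - 1));
        }
        return best;
    }
}
```

Called as Expectimax.bestMove(state, 2), this corresponds to the 2-ply search described for the winning entry; deeper searches have to fit into the competition's 1 ms per-move budget.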