Level Generation System for Platform Games
Based on a Reinforcement Learning Approach
Atanas Laskov
MSc in Artificial Intelligence
School of Informatics
The University of Edinburgh
2009
Abstract
Automated level generation is a controversial topic in the video games industry, dividing opinion strongly for and against it. At present, generation techniques are in common use only in a few specific genres of video games, for reasons that range from matters of principle to practical concerns and, in some cases, entirely subjective preferences. At the same time there is a widespread tendency for game worlds to become larger and more replayable. Manual level design is becoming an expensive task and the need for automated productivity tools is growing.
In this project I focus my efforts on the creation of a level generation system for a
genre of video games that is very conservative in its use of these techniques.
Automated generation for platform video games also presents a technological
challenge because there are hard restrictions on what constitutes a valid level.
The intuition for choosing reinforcement learning as a generative approach is based
on the structure of platform game levels, which lends itself to a natural
representation as a sequential decision making process.
Acknowledgements
I would like to thank my parents for their care, as well as my supervisor Taku
Komura for being so friendly and supportive. I also owe many thanks to the
creators of my favourite game - Pandemonium, for their great work and the
inspiration it gives me.
Declaration
I declare that this thesis was composed by myself, that the work contained herein
is my own except where explicitly stated otherwise in the text, and that this work
has not been submitted for any other degree or professional qualification except as
specified.
(Atanas Laskov)
CONTENTS

I. INTRODUCTION
I.1. Introduction to Automated Level Generation
I.2. Introduction to Platform Games
I.3. Goals of the Project
I.3.1. Target Level Structure
I.3.2. Target Mode of Distribution
I.3.3. Learning Approach
II. EXISTING RESEARCH, LIBRARIES AND TOOLS
II.1. Research in Level Generation
II.1.1. Using Context Free Grammars and Graphs
II.1.2. Landscape Generation, Fractal and Related Methods
II.1.3. Level Generation for Platform Games
II.2. Reinforcement Learning Overview
II.2.1. The Reinforcement Learning Paradigm
II.2.2. Methods in Reinforcement Learning
II.2.2.1. Bellman Equations and Dynamic Programming
II.2.2.2. Temporal-difference Learning
II.2.2.3. Actor-critic Methods
II.2.2.4. Model-building Methods
II.2.2.5. Policy Gradient Methods
II.2.3. Reinforcement Learning Libraries
II.2.3.1. RL Toolbox
II.2.3.2. LibPG
II.2.3.3. Other Libraries
II.2.3.4. Choice of a Reinforcement Learning Library
II.3. Library for the Visualization and Testing of Levels
III. ARCHITECTURE OF THE SYSTEM
III.1. Conventions and Class Diagrams
III.1.1. Class Names
III.1.2. Member Variables and Methods
III.2. Architecture of the Level Generation System
III.3. Stages of Level Generation
IV. LEVEL GENERATION AS A REINFORCEMENT LEARNING TASK
IV.1. Task Specification
IV.1.1. State Space Representation
IV.1.3. Designing the Reward Function
IV.1.4. Actions of the Building Agent
IV.1.5. Choice of a Learning Algorithm
IV.2. Task Implementation
IV.2.1. The “Generation Algorithm” Sub-system
IV.2.2. Class Implementations
IV.2.1. Abstract learning algorithm (genAlgorithm)
IV.2.2. Reinforcement Learning algorithm (genAlgorithm_RL)
IV.2.3. State of the building agent (genState_RL)
IV.2.4. Actions of the building agent (genAction_RL)
IV.2.5. Transition and reward functions (genWorldModel)
IV.2.6. Traceability Markers
IV.2.6.1. Updating the Traceability Markers
IV.2.6.2. Jump Trajectory
IV.2.6.3. Representation in Memory
V. POST-PROCESSING
V.1. The “Blueprint” and “Post-processing” Sub-systems
V.2. Implementation of Post-processing
V.2.1. Wildcards
V.2.2. Context Matchers
V.2.2.1. Removing Redundancies, Smoothing and Bordering
V.3. Prepared Graphical Elements
V.4. Output File Format
V.4.1. Extending the system to other file formats
VI. GRAPHICAL USER INTERFACE
VI.1. The Level Parameters Object
VI.2. Implemented User Interface Controls
VI.3. Level Generator Dialogs
VI.3.1. Parameters Dialog (uiParametersDialog)
VI.3.1.1. Generation thread
VI.3.2. Progress Dialog (uiProgressDialog)
VI.3.3. Statistics Dialog (uiStatisitcsDialog)
VI.3.4. Completion Dialog (uiCompletedDialog)
VII. OUTPUT AND EVALUATION
VII.1. Methodology
VII.1.1. Experimental Setup
VII.1.2. Performance Measurement
VII.1.3. Error Measurement
VII.2. Optimisation of Parameter Settings
VII.2.1. Global Search in the Parameter Space
VII.2.2. Local Search in the Parameter Space
VII.2.3. Results of Parameter Optimisation
VII.3. Generator Benchmark
VII.3.1. Intrinsic Evaluation
VII.3.1.1. Benchmark: training time
VII.3.1.2. Benchmark: variant generation
VII.3.1.3. Benchmark: post-processing time
VII.3.1.4. Total level generation time
VII.3.2. User Suggestions for Future Development
CONCLUSION
BIBLIOGRAPHY
LIST OF FIGURES

Fig. I-1 Classification of platform games according to level structure
Fig. II-1 Evaluative Feedback
Fig. III-1 Notation used in class diagrams
Fig. III-2 Architecture of the level generation system
Fig. III-3 Five stages of level generation
Fig. IV-1 Level generation as a sequential process
Fig. IV-2 Global and local state of the building agent
Fig. IV-3 Is the state a fail state?
Fig. IV-4 Rewards and penalties
Fig. IV-5 Actions of the building agent
Fig. IV-6 Class diagram of the “Generation Algorithm” subsystem
Fig. IV-7 Pseudo-code of the transition function
Fig. IV-8 Sample level blueprint showing the traceability markers
Fig. IV-9 Updating the traceability markers
Fig. IV-10 Simplification of jump trajectory
Fig. V-1 The “Blueprint” and “Post-processing” subsystems
Fig. V-2 Table of common wildcards used during post-processing
Fig. V-3 Context matchers for removing redundancies
Fig. V-4 Terrain smoothing context matchers
Fig. V-5 Bordering context matchers
Fig. V-6 Contour tiles joined together
Fig. V-7 Lava trap
Fig. V-8 Shooting adversary
Fig. V-9 Sample output to a .lvl file
Fig. VI-1 Layout of the user interface
Fig. VI-2 Registering a new parameter
Fig. VI-3 Base classes of the user interface
Fig. VI-4 Dialogs implemented for the level generation system
Fig. VI-5 Parameters Dialog
Fig. VI-6 Completion Dialog
Fig. VII-1 Examples for generated levels with different branching factors
Fig. VII-2 Global search in the parameter space
Fig. VII-3 Best parameter configurations after global search
Fig. VII-4 Influence of the gamma parameter on the generation of variants
Fig. VII-5 Correlation between PC, epsilon and its attenuation
Fig. VII-6 Correlation between PC, alpha and its attenuation
Fig. VII-7 Results of parameter optimisation
Fig. VII-8 Test configuration
Fig. VII-9 Training time benchmark
Fig. VII-10 Variant generation benchmark
Fig. VII-11 Post-processing benchmark
Fig. VII-12 Scalability with regard to level length and branching
Fig. VII-13 Scalability with regard to the “chaos” parameter
Fig. VII-14 User suggestions for further improvement
EQUATIONS

Eq. II-1 Discounted reward
Eq. II-2 Bellman equation for an optimal value function V*(s)
Eq. II-3 The value function V^π(s) of a policy π
Eq. II-4 The update rule of temporal difference learning
Eq. IV-1 Penalty for reaching the fail state
Eq. IV-2 Reward for creating a branch in the level
Eq. IV-3 Reward for placing dangers and treasure
Eq. VII-1 Convergence metric PC
Eq. VII-2 Variant generation metric PG
Eq. VII-3 Standard error
CHAPTER I
INTRODUCTION
Throughout the history of the video games industry there have been many level generation tools of varied sophistication, but very little work, and even less practical success, when it comes to level generation for the genre of platform games. This deficiency can be explained by the technical obstacles that arise from the structure of platform games and also by the tradition of using handcrafted levels.
In spite of this challenge, it is my belief that the development of an automated tool for platform games is a worthwhile task. With the constant increase in graphical complexity and player expectations, level design is becoming a costly effort. Large game studios invest millions in level design alone, whereas small and independent developers struggle to deliver a game that is up to standard and still within a reasonable budget. This makes an automated tool to aid the design process for platform games a valuable and timely contribution.
I.1. Introduction to Automated Level Generation
Automated level generation has been successfully applied to some genres of video games, but to date there are no commercial platform games using generation techniques. The indie game “Spelunky” [Yu08] is a significant success in this respect and, to the best of my knowledge, the only game to combine random level generation with elements of platform gameplay. However, it can be classified as a platform game only in the broadest sense, because its level structure more closely resembles that of a Rogue-like game. “Rogue” [TWA80] is the first game in a long succession of dungeon exploration games, a genre that is most often associated with automated level generation. The widespread use of generated levels in Rogue derivatives is based on the practical reason that there are very few hard restrictions on the shape that a dungeon level can take. By contrast, platform games are a much more demanding genre, as even small changes in a level can easily invalidate it.
Although the technological challenges can differ for each genre of game, the benefits of using a level generation system apply universally. There are two important advantages of automated level generation over manually created levels:
· Increased replayability of games. In recent years replayability has become a very important design objective for many game development studios [Bou06]. Designers use different approaches to encourage replayability, such as making the game non-linear, introducing hidden and hard to reach (HTR) areas, and unlocking additional playable characters. In spite of these efforts, the level is very much the same the second time it is played. Integrating an automated generation system in the game can improve replayability because the player will be provided with a completely different level for every replay.
· Reduced production cost. With the increase in graphical complexity and the size of game worlds, the amount of effort that level designers must put into a single level is also increasing. It is often the case that a whole team of designers must work on a single level, making handcrafted levels very costly [Ada01]. Automated level generation can reduce the amount of manual work by providing an initial approximation of the level structure that is further refined by human designers. Critical parts of the level design, such as boss encounters or scripted events, would then receive more attention than less important parts of a level.
A common criticism of automated level generation is that it produces levels of inferior quality to hand-crafted ones. For example, generating a level containing logical puzzles, or one that contributes to the storyline of the game, is very difficult to achieve. It is also not possible to give an automated system the sense of aesthetics that a human designer can apply.
These arguments are valid but they are not a reason for abandoning research in this field. Even if the technology is not fully mature, it is still possible to integrate automated and human design with good results. Many dungeon exploration games, such as “Diablo” I and II [Bli00], demonstrate the validity of this statement. With the advancement of technology it will become practical to entrust the generator with more sophisticated design tasks.
I.2. Introduction to Platform Games
This section gives a brief introduction to platform games and describes the level structure typical of this genre. In doing so I will try to highlight the specific requirements that apply to level generation for platform games and also to familiarise the reader with the area of application of this project.
Platform games have evolved significantly since the early years of the game industry, with considerable sophistication of game mechanics and a transition from 2D to 3D graphics. Nonetheless, the core gameplay is based on the same principles that apply to early titles such as “Super Mario Bros” [Nin87] and “Donkey Kong” [Nin81]. Initially, the main character in the Mario games was called “Jumpman”, a name that reveals one of the distinguishing features of this genre – jumping in order to avoid obstacles. Platform games encourage the player to perform jumps and other acrobatic activities within a level that contains a sequence of platforms, pits, monsters and traps [Bou06, SCW08]. It is the goal of the player to overcome all obstacles on the way to the finishing point.
Because of limitations in development time and resources, it is common to have a small closed set of obstacle elements that are used repeatedly throughout the levels. For each level the human designer chooses an appropriate subset of level elements, commonly referred to as the “tileset”, and decides how to arrange them in order to produce the illusion of variety. The use of a tileset greatly facilitates the functioning of the automated generation system: it allows the actions of the building agent to be specified as indexes into this relatively small set of elements.
Traditionally, platform games have a mostly linear level structure, allowing for only a few alternative paths for the player [Bou06]. It is typical for a level to start with one pathway, then branch into several alternative pathways, and by the end of the level for all of them to merge again into a single pathway. There is only one start point and only one finishing point, but still some freedom for the player to choose how to approach a level. This is what will be referred to as the classical “branching structure” of levels.
The transition to 3D graphics brought not only a cosmetic change but also the opportunity for level designers to experiment with an alternative level structure. Adding a third dimension makes it possible to engage in more exploration if this is supported by a non-linear structure of levels. Some games, for example “Super Mario 64” [Nin96], have adopted this very different level design approach, but there are also many 3D games that continue to be faithful to the traditional design paradigm. It may seem that a 3D graphics engine lends itself naturally to a non-linear level concept, but this is not necessarily the best choice. Non-linear levels can be problematic if not implemented with care, creating burdens for the player such as unintuitive camera control and increased difficulty in judging distances.
It should also be noted that even the most “non-linear” world concept is based on implicit linear episodes. It is by definition true that in platform games the player performs a sequence of jumps in order to avoid a sequence of obstacles. In the classical branching structure of levels this order is strictly compulsory, whereas in non-linear levels it can only be suggested by the level designer.
I.3. Goals of the project
It is the goal of this project to implement a level generation system capable of adequately modelling the level structure of platform video games. There are many automated generation tools in existence, but none of them is capable of adequately performing this task. In the following sections I present this goal more precisely by describing the target level type, mode of distribution and learning approach used by the level generation system.
I.3.1 Target Level Structure
There is a significant variety of platform games, so in designing a level generation system a necessary first step is to specify the type of levels that it will be able to produce. Figure I-1 presents one possible classification of platform games by level structure. This project has the ambition of implementing a level generation system for platform games of Class (1). This class includes most of the 2D games and also some games with 3D graphics but a traditional 2D level structure, such as “Pandemonium” [Cry97] and “Duke Nukem: Manhattan Project” [3DR02]. The system is designed with the idea of extensibility to Class (2), but implementing level generation for this class is beyond the scope of this project. Games with the level structure described by Class (3) differ from the other two classes to an extent that makes their classification as “platform games” open to debate. Generating levels of this class would require a significantly different approach, so it is excluded from the goals of this project.
Figure I-1. Classification of Platform Games by Level Structure

Class (1): Branching Levels in 2D and 2.5D
The player walks along the X axis and jumps along the Y axis. Typically, the player starts at the leftmost end of the level and gradually progresses to the right. The view of the level is scrolled as the player moves. There can be some amount of non-linearity, implemented as several “branches” running in parallel to one another. Although the level structure is two-dimensional, modern games of this type are usually implemented with a 3D graphical engine. The resulting genre is sometimes referred to as 2.5D.
Example: Pandemonium, DN: Manhattan Project [Cry97, 3DR02]

Class (2): Branching Levels in 3D
The player walks on the (XY) plane and jumps along the Z axis. The freedom of movement on the (XY) plane is restricted to a narrow pathway and a small area around it. There can be both horizontal and vertical “branches” of the level, providing alternative choices for the player.
Example: Crash Bandicoot [Son96]

Class (3): Non-linear Levels
The player walks on the (XY) plane and jumps along the Z axis. Unlike levels of Class (2), the movement on the (XY) plane is not restricted and the player can choose in what order to interact with terrain elements.
Example: Super Mario 64 [Nin96]
I.3.2 Target Mode of Distribution
Another important consideration for the goals of this project relates to the way in which the level generation system will deliver its output to the user. It is possible to use the generation system either as an internal development tool or to distribute it to the player, together with a video game. In the first case the user of the system is not the player but the game developer. Under this mode of distribution it is acceptable for the level output to contain occasional errors, as long as the correction of these errors by a human designer takes significantly less time than building a level from scratch. In this case the replayability of levels is not increased, but the system can be a useful productivity tool.
The second alternative is to distribute the level generation system integrated with a game project. Under this mode of distribution the advantages of having a level generation system can be exploited more fully, but it is also more demanding with regard to the validity of the output. It is necessary to guarantee that the system generates an error-free level within a reasonable time limit. The quality of the output must also meet higher standards in terms of visual appeal, because in this case there is no human intervention to improve it.
This project was originally intended as an internal development tool, but the evaluation of the system showed that, with some additional work, it can be extended to the more demanding mode of distribution. The implementation outlined in this document makes it possible to decouple the learning and generation parts of the system, resulting in more predictable and error-free output. The game could then integrate only a library of learned policies and a lightweight variant generation algorithm.
I.3.3 Learning Approach
In this project I choose to make use of an unsupervised learning approach for solving the level generation task. Unsupervised learning operates without the need for training data, which is a significant advantage in light of the fact that most of the existing level data is proprietary. It was also my reasoning that the task lends itself naturally to a reinforcement learning interpretation, because platform game levels are discrete, sequential, and not unlike many of the Gridworld examples in the literature [SB98]. The difference is that a level generation system would have a “building agent” that changes the environment, rather than a “walking agent” responding to existing obstacles.
CHAPTER II
EXISTING RESEARCH,
LIBRARIES AND TOOLS
In this chapter I present existing research in level generation in light of its relevance to the task of level generation for platform games. Methods such as context-free grammars and graph grammars are examined, as well as research focusing specifically on level generation for platform games. Research in this specific area is sparse, but it gives valuable insight, especially with regard to the method used for evaluating the validity of levels.

The chapter continues with an overview of the reinforcement learning paradigm, some principal methods and learning algorithms. I also examine several libraries implementing these algorithms and discuss how the use of an existing library relates to the goals of this project.
II.1. Research in Level Generation
II.1.1 Using Context-free Grammars and Graphs
In the context-free grammar (CFG) approach to level generation [Inc99], which is typically used for generating dungeon levels, the permissible level structures are encoded in the productions of a grammar. Each level is represented as a sequence of grammar productions starting with the global non-terminal LEVEL. Given this level representation, automatic generation amounts to building random valid sentences. This is a simple procedure that can be implemented as follows: starting with the LEVEL non-terminal, the generation algorithm recursively applies a random production out of the set of productions that have the required non-terminal in their antecedents (i.e. on the left side of the production). This is repeated until all the leaves of the expansion tree become terminal symbols.
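To make this procedure concrete, the following is a minimal sketch of such a random expansion. The Grammar representation and the symbol names are assumptions made for illustration only and are not taken from [Inc99].

#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// A production rewrites one non-terminal into a sequence of symbols.
// Symbols that do not appear as keys in the table are treated as terminals.
typedef std::map<std::string, std::vector<std::vector<std::string> > > Grammar;

// Recursively expand a symbol by applying a random production until only
// terminal symbols remain in the output sentence.
void expand(const Grammar& g, const std::string& symbol,
            std::vector<std::string>& sentence)
{
    Grammar::const_iterator it = g.find(symbol);
    if (it == g.end()) {                  // terminal symbol: emit it
        sentence.push_back(symbol);
        return;
    }
    const std::vector<std::vector<std::string> >& productions = it->second;
    const std::vector<std::string>& chosen =
        productions[std::rand() % productions.size()];
    for (std::size_t i = 0; i < chosen.size(); ++i)
        expand(g, chosen[i], sentence);
}

A generator would seed the random number generator, call expand(grammar, "LEVEL", sentence), and then attempt to convert the resulting sentence of terminals into a level layout, which, as noted below, is not always possible.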
The author of [Inc99] mentions several drawbacks of this approach, including an irregular distribution of enemies, the existence of “pointless sections”, and dead-ends in the level output. There is also the necessity to convert the tree representation into a level layout, which is not necessarily possible for every generated tree. Although these problems could be solved by modifications of the tree-generation algorithm, in my view the CFG approach has a more significant disadvantage. The necessity to specify an explicit grammatical representation of all possible level configurations could require very time consuming and complicated work. Even if a corpus of LEVEL strings existed, which to the best of my knowledge is not the case, it would be necessary to implement some form of automated inference in order to learn the grammar [JM08]. In light of this, I see no viable alternative to creating the grammar by hand. It is clear that any method based on CFG or tree expansion would involve a considerable amount of manual work in order to adapt it for a different game, or in case the level specification changes.
In [Ada02] the author presents a variation of the grammar approach that uses Graph Grammars (GG) for the purpose of generating dungeon levels. Under this approach the grammar encodes a series of transformations applied to a graph representation of the level. This appears to be a reasonable choice for modelling the web-like structure of dungeon levels, but it is not clear what the advantage would be in the case of platform games. The current project is concerned with levels that have no enclosed “rooms” to be represented as nodes of a graph, and no linking corridors corresponding to the arcs. The connectivity of different locations in a platform game level is the result of the simulated laws of physics and not an explicit design choice.
II.1.2 Landscape Generation, Fractal and Related Methods
The task of generating natural-looking landscapes has attracted a lot of attention and as a result there are several mature approaches capable of producing satisfactory results. Fractal generation methods [Mar97, Mil89] and bitmap-based methods [Mar00] are often used in game development, as they can produce very flexible output at low computational cost.
There are also some experimental landscape generation systems, for example the one presented in [OSK*05], where the authors develop a two-stage Genetic Algorithm for “evolving” terrains. In the first stage of generation the user specifies a 2D sketch of the environment that is refined by a genetic algorithm. In the second stage another GA uses the refined sketch as a template for blending together several terrain types and introducing variations; this second GA makes use of a database of terrain samples. The authors point out that, unlike Geographical Information Systems (GIS) based on a physical simulation of tectonic activity and erosion processes, this method can produce not only realistic terrains but also comical or exaggerated landscapes.
In spite of their popularity and generative power, these methods do not address the specific challenges of level generation for platform games. Even if there is outdoor terrain in a given platform game, it usually serves as nothing more than a graphical backdrop.
II.1.3. Level Generation for Platform Games
There is very little research dedicated to the specific needs of level generation for platform games. In [CM06] the authors describe a level generation system for platform games that is related to the context-free grammar methods discussed in section II.1.1. This paper proposes a hierarchical model of levels, which consists of three layers: level elements, patterns and cells. Each of the components in this hierarchy is created in a different way and captures a different aspect of platform level design. Level elements are the manually specified terminal symbols of the grammar (or “bracketing system”, as it is referred to by the authors), representing different types of platforms. Patterns capture the sequence of jumps and obstacles in the level and are generated by a hill-climbing algorithm based on random modifications of the bracketing rules. Cells capture the non-linearity (i.e. “branching”) of levels. This research shares the shortcoming of all CFG methods in that it needs a manual specification of all pattern and cell configurations. This is referred to as a “bracketing system”, but as far as can be judged from the provided description, there is no principal difference from designing a grammar. Specifying all possible combinations of level cells is likely to be a time consuming task that must be repeated for every new game taking advantage of the generator.
Another problem with the proposed level generation method is the use of a hill-climbing algorithm, which converges on a local maximum in the solution space. For each possible subset of cells the generation algorithm uses steepest-ascent hill-climbing in order to find the optimal level pattern. Out of all these patterns of different lengths the system selects the one closest to the desired difficulty value. It is the authors’ claim that this is an acceptable solution because of the absence of local maxima in the search space of possible level patterns. However, this statement is not substantiated in the paper. It could be the case that the particular grammar is designed to ensure this, but it cannot be assumed in the general case of a bracketing system developed for any game.
The same work describes an interesting method for evaluating the difficulty and traceability of levels by running a ballistic simulation of the player. This simulation is executed on the whole level pattern, followed by a step of the hill-climbing algorithm. In the evaluation step the simulated player moves along the platforms in the level and performs jumps at every possible location. Each jump defines a “spatial window” that marks the portion of the level below the ballistic curve as accessible. In the tradition of platform games this physical simulation is not entirely realistic, as it includes the ability to change direction in “mid-air”. Because of this ability, not only the ballistic curve but the whole area below it becomes accessible in a single jump. The implemented model also allows for other popular exaggerations, such as double jumps.
For the goals of this project I develop a system of traceability markers based on the same idea of level evaluation. The method described in [CM06] evaluates and re-evaluates the whole level at every step of the hill-climbing algorithm. Unlike this procedure, traceability markers need to be updated only once for each position in the level.
II.2. Reinforcement Learning Overview
As discussed in the previous section, approaches based on context-free grammars have been applied to platform games, but they require the hand-coding of generation rules. This requirement for manual specification of rules, and the resulting lack of generality, are flaws of the CFG method.
Supervised learning techniques could be a solution to this problem. Unfortunately,
training data is difficult to come by because every game has a unique level format,
which is often proprietary. One additional disadvantage of using a supervised
approach would be that it tries to mimic existing data whereas game designers
want originality in their levels.
It is my reasoning that a more natural solution to the generation problem would be
based on unsupervised methods, such as Reinforcement Learning. This would allow
for a specification of the task in terms of desirable and undesirable states of affairs,
rather than by mimicking existing data.
II.2.1 The Reinforcement Learning Paradigm
Reinforcement learning is concerned with the task of finding an optimal policy π* that maximises the long-term, online performance of an agent acting in a given environment [KLM96]. The immediate performance is measured by the reward signal r_t ∈ R that the environment generates after each action of the agent. There are several possible definitions of long-term performance. In the most commonly used definition the goal is to optimise the infinite-horizon discounted reward [SB98]:

R = \sum_{t=0}^{\infty} \gamma^t r_t    (Eq. II-1)
where γ is the reward discount rate. One important distinguishing feature of reinforcement learning, which sets it apart from supervised learning techniques, is the use of a reward signal. In a discrete interpretation, at each time step t the agent performs an action a_t ∈ A and, depending on the current state of the environment s_t ∈ S, it receives a reward signal r_t ∈ R. The next state s_{t+1} is at least partially determined by the performed action (Figure II-1).

[Figure II-1. Evaluative feedback: the agent sends an action a_t to the environment and receives back a reward signal and new state r_{t+1}, s_{t+1}.]
The use of a reward signal is termed “evaluative feedback”, as opposed to the
“instructive feedback” used in supervised learning. In most cases the reward signal
is delayed (e.g. until a success or failure state is reached) and the environment
could also contain deceptive reward signals that lead to a very undesirable part of
the state space and a subsequent penalty. Therefore, the probability of a future
event occurring must have an effect on the actions that the agent takes in the
present time step. Maximizing the long term performance makes it necessary to
devise some method of evaluating these probabilities and propagating the future
rewards back to the current state.
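As an illustration of this interaction loop, the sketch below accumulates the discounted reward of Eq. II-1 over one episode. The Environment and Agent interfaces are hypothetical and serve only to show the flow of actions, rewards and states; they do not correspond to any particular library.

// Sketch of the evaluative-feedback loop from Figure II-1.
struct Environment {
    virtual int  reset() = 0;                           // returns the initial state
    virtual void step(int action, int& nextState,
                      double& reward, bool& done) = 0;  // one environment transition
    virtual ~Environment() {}
};

struct Agent {
    virtual int  selectAction(int state) = 0;                    // current policy
    virtual void observe(int s, int a, double r, int s2) = 0;    // learning update
    virtual ~Agent() {}
};

// Run one episode and return the discounted reward R = sum_t gamma^t * r_t.
double runEpisode(Environment& env, Agent& agent, double gamma)
{
    double R = 0.0, discount = 1.0;
    bool done = false;
    int state = env.reset();
    while (!done) {
        int action = agent.selectAction(state);
        int nextState; double reward;
        env.step(action, nextState, reward, done);
        agent.observe(state, action, reward, nextState);
        R += discount * reward;          // discounted accumulation (Eq. II-1)
        discount *= gamma;
        state = nextState;
    }
    return R;
}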
In order to specify a given task as a reinforcement learning problem, it is necessary to satisfy several conditions [SB98]:
(1) Discrete state representation
Most reinforcement learning techniques are designed to work in a discrete
state space and at discrete time steps. If the natural representation of the
state space is continuous then discretisation may increase the number of
states dramatically, which in turn will affect the performance and memory
requirements of the learning algorithm. As outlined in [KLM96], supervised
learning methods, such as the approximation of the value functions with a
neural network, can help to alleviate this problem.
Fortunately for the goals of this project, platform games often represent the level as a discrete grid, each cell containing one out of a finite number of values (i.e. indexes into the tileset); a minimal sketch of such a grid representation is given after this list. In some games with a more free-form appearance of the terrain, converting to a discrete grid representation may require additional processing.
(2) Markov Property
The Markov property requires all decisions of the agent to be made only on the
basis of the current state and not the previous history of states. In the case of
the level generation problem this requirement is not immediately satisfied but
with the use of traceability markers, introduced in Chapter III, the previous
history of states can be efficiently compressed into the current state.
(3) Reward signal
Evaluative feedback defines the goals of the optimisation task. Specifying the
reward signal can be a challenging task and if not implemented with care the
learning algorithm could converge on a solution that is optimal with respect to
the reward signal but hardly useful for the goals and purposes of the designer.
It is good practice to specify incremental rewards leading to a desirable part of
the state space, rather than only in the final state of achievement. Seeding the
value function with an approximate solution could be another way to guide the
learning process.
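The following is the minimal sketch of a discrete grid referenced in condition (1) above; the tile enumeration is hypothetical, and the actual blueprint classes of the system are described in Chapters III to V.

#include <vector>

// Hypothetical tile indexes; a real game would use its own tileset.
enum Tile { TILE_EMPTY = 0, TILE_PLATFORM, TILE_PIT, TILE_MONSTER, TILE_TREASURE };

// A level as a discrete grid: each cell holds one index into the tileset.
class LevelGrid {
public:
    LevelGrid(int width, int height)
        : _nWidth(width), _nHeight(height),
          _vCells(width * height, TILE_EMPTY) {}

    Tile get(int x, int y) const   { return _vCells[y * _nWidth + x]; }
    void set(int x, int y, Tile t) { _vCells[y * _nWidth + x] = t; }

    int width() const  { return _nWidth; }
    int height() const { return _nHeight; }

private:
    int _nWidth, _nHeight;
    std::vector<Tile> _vCells;     // row-major storage of tileset indexes
};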
The environment of the agent can be non-deterministic, allowing for a transition function that draws the next state from some underlying and unchanging probability distribution. If the underlying probability distribution changes, the environment is not only non-deterministic but also non-static. In [KLM96] the authors point out that most learning algorithms are proved to converge only under the assumption of a static environment and lack such guarantees in the case of a non-static environment. Nonetheless, for a slowly changing non-static environment reinforcement learning methods still show good results.
II.2.2 Methods for Reinforcement Learning
Because of the variety of available methods, reinforcement learning is described more precisely by the learning paradigm presented in the previous section than by any specific learning algorithm. The wide variety of reinforcement learning methods is often divided into two distinct groups: methods that learn a value function and policy gradient methods.

The discussion of learning algorithms starts with value-based methods, which take advantage of the representation of the learning task as a Markov Decision Process (MDP). By contrast with this specific value-learning approach, policy gradient methods can use any general optimisation algorithm combined with evaluative feedback.
II.2.2.1 Bellman Equations and Dynamic Programming
Dynamic programming, as applied to solving the Bellman optimality equations, is amongst the first approaches to reinforcement learning. This method requires a model of the environment, which means that the transition probabilities and the reward function must be specified in advance [SB98]. The Bellman equation for V*(s) is a recursive definition of what constitutes an optimal value function:
V^*(s) = \max_a \sum_{s'} P^a_{ss'} \left( R^a_{ss'} + \gamma V^*(s') \right)    (Eq. II-2)

where P^a_{ss'} is the transition probability, R^a_{ss'} is the reward signal and γ is the reward discount rate. More generally, for any given policy π the value function V^π(s) is given as follows:

V^\pi(s) = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'} \left( R^a_{ss'} + \gamma V^\pi(s') \right)    (Eq. II-3)

where π(s,a) is the probability of choosing action a ∈ A in state s ∈ S under the current policy.
The dynamic programming approach can be implemented as a two-phase iterative algorithm. In each iteration it brings the current policy π one step closer to the optimal policy π* by alternately performing “policy evaluation” and “policy improvement”. Policy evaluation computes an estimate of V(s) for the current policy π, followed by policy improvement, which is a greedy change of π with regard to the new estimate of V(s). There is a theoretical proof of convergence for the algorithm and it is also guaranteed that at each step the policy will only improve. The most significant disadvantages of the dynamic programming approach are the requirement for a model of the environment and the computational cost of re-estimating the value function.
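The following is a minimal tabular sketch of one evaluation/improvement cycle, assuming the model is given as dense transition and reward tables; it illustrates the method and is not part of the implemented system. In practice the evaluation sweep is repeated until the estimates stabilise; a single synchronous sweep is shown for brevity.

#include <vector>

// P[s][a][s2]: transition probability, R[s][a][s2]: reward.
// V: value estimates, policy[s]: currently selected action for each state.
typedef std::vector<std::vector<std::vector<double> > > Model;

// One cycle of policy evaluation followed by greedy policy improvement.
void policyIterationStep(const Model& P, const Model& R, double gamma,
                         std::vector<double>& V, std::vector<int>& policy)
{
    const std::size_t nStates  = V.size();
    const std::size_t nActions = P[0].size();

    // Policy evaluation: one synchronous sweep of Eq. II-3 for the current policy.
    std::vector<double> Vnew(nStates, 0.0);
    for (std::size_t s = 0; s < nStates; ++s) {
        int a = policy[s];
        for (std::size_t s2 = 0; s2 < nStates; ++s2)
            Vnew[s] += P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]);
    }
    V = Vnew;

    // Policy improvement: greedy choice with respect to the new estimate (Eq. II-2).
    for (std::size_t s = 0; s < nStates; ++s) {
        double best = -1e30; int bestA = 0;
        for (std::size_t a = 0; a < nActions; ++a) {
            double q = 0.0;
            for (std::size_t s2 = 0; s2 < nStates; ++s2)
                q += P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]);
            if (q > best) { best = q; bestA = (int)a; }
        }
        policy[s] = bestA;
    }
}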
II.2.2.2 Temporal-Difference Learning
In [SB98], temporal-difference learning is described as an approach to estimating the value of V(s), or Q(s,a), on the basis of previously obtained estimates. It does not require a model of the environment, and learning occurs throughout the training run rather than only at the end of it. The updates of the value function can be performed either in sweeps, or each state can be updated asynchronously. The update rule of the TD algorithm is as follows:

V_{\text{new}}(s) = V(s) + \alpha \left( r + \gamma V(s') - V(s) \right)    (Eq. II-4)
where α is a learning rate parameter (gradually decreased during the run), γ is the reward discount rate and V(s') is the current estimate of the value of the next state. This update rule makes no reference to the transition or reward function of the environment, making TD a model-free approach to estimating the value function. The update rule can also be extended to look ahead over several subsequent states, resulting in the TD(λ) algorithm. This technique is referred to as an “eligibility trace”, controlled by the λ parameter. Eligibility traces can improve the accuracy of value estimates at the cost of more computation per iteration [KLM96].
There are two important learning algorithms that are based on the principles of temporal-difference learning. The SARSA algorithm is an on-policy learning algorithm that learns the value function of state-action pairs, Q(s,a), instead of V(s). SARSA can also make use of eligibility traces for producing more accurate estimates. Q-Learning is an off-policy temporal-difference learning algorithm. Like SARSA, it learns a value function for the state-action pairs Q(s,a). Unlike SARSA, it always uses the best action in its update rule, regardless of the choice that the agent actually makes.
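The difference between the two update rules can be shown in a few lines of tabular code. This is a sketch over a dense Q table, not the implementation used in this project.

#include <algorithm>
#include <vector>

typedef std::vector<std::vector<double> > QTable;   // Q[state][action]

// SARSA (on-policy): uses the action a2 that the agent actually takes in s2.
void sarsaUpdate(QTable& Q, int s, int a, double r, int s2, int a2,
                 double alpha, double gamma)
{
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a]);
}

// Q-learning (off-policy): uses the best action available in s2, regardless
// of the action that the behaviour policy actually selects.
void qLearningUpdate(QTable& Q, int s, int a, double r, int s2,
                     double alpha, double gamma)
{
    double best = *std::max_element(Q[s2].begin(), Q[s2].end());
    Q[s][a] += alpha * (r + gamma * best - Q[s][a]);
}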
There are both advantages and disadvantages to off-policy learning. It is more likely for an on-policy learning algorithm, such as SARSA, to converge on a suboptimal solution if the initial seed of the value function happens to be unlucky. Q-learning is more resilient with regard to bad policies [SB98]. However, it is possible that an exploratory move will push the agent into an undesirable state. The value function learned by an off-policy algorithm does not account for this probability, and therefore the agent will not learn how to avoid such states under the current policy. As a consequence, under some circumstances the “on-line performance” of the agent can become worse [SB98, Num05].
II.2.2.3. Actor-Critic Methods
As outlined in [KLM96], the distinguishing feature of actor-critic methods is that they store the learned value function and the policy separately. The critic part of the system receives a reward signal from the environment and uses it to compute an error for its value estimates. This is followed by an update of the estimate and a propagation of the error value to the actor part of the system. It is the responsibility of the actor to update the action selection policy based on the error signal supplied by the critic, and not directly on the environment reward signal. The Adaptive Heuristic Critic algorithm is a modified version of policy iteration that stores the policy and the optimal value function separately. Unlike the dynamic programming version of policy iteration, it does not rely on the Bellman equations for computing the V(s) function but performs the TD update rule presented in Eq. II-4. Natural Actor-Critic [PS05] is another development in the group of actor-critic methods. It is an advantage of methods in this group that the computational cost of the update rules can be very small.
II.2.2.4. Model-building Methods
Model-building methods, as presented in [KLM96, SB98], try to maximise the use of information that can be obtained from training experience by trying to estimate the dynamics of the environment. For tasks such as robotic control, where training experience can be difficult to obtain, this can be a well-motivated approach. The original model-building algorithm, referred to as Certainty Equivalence, performs a full re-evaluation of every state of the environment at each time step. This is a procedure that can easily become intractable, especially when real-time performance is required. The Dyna algorithm suggests a more computationally tractable alternative that chooses a random sample of k states at each time step. Prioritized Sweeping [MA93] is an elaboration of the Dyna sampling method that uses ΔV(s), the amount of “surprise”, as an indicator for good parts of the state space to be sampled. Model-building methods trade the speed of the update rules for a better use of training experience, which in the case of a simulated environment (i.e. where experience is easy to obtain) is probably not a good choice.
II.2.2.5. Policy Gradient Methods
Policy Gradient Reinforcement Learning (PGRL) arises as a combination of evaluative feedback and traditional supervised learning approaches. Neural networks have been successfully used in combination with a reinforcement signal that readjusts the weights of the network, as in [WDR*93] for example, and other gradient methods such as simulated annealing can also be adapted. There are no limitations on the choice of optimisation algorithm, as long as it can accommodate the feedback generated by the environment.

One particularly interesting line of development is Evolutionary Algorithms for Reinforcement Learning, or “EARL” [MSG99], and reinforcement Genetic Algorithms. The genetic algorithm uses a fitness function in order to rank a population of competing “chromosomes”, each encoding a candidate solution to a problem. Traditionally, the fitness function is calculated by testing the chromosome on a database of problem/answer pairs, where fitness is awarded in inverse proportion to the distance from the correct answer. Alternatively, the GA can be adapted to the reinforcement learning paradigm by creating an environment model that responds to the behaviour of a candidate solution. The cumulative reward can then be used as the fitness measure and the GA will perform selection according to its value. It should be noted that in this case the goal of the optimisation would not be the future discounted reward of Eq. II-1, but the cumulative reward for the run.
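A sketch of this idea is given below: the fitness of each chromosome is simply the cumulative reward it earns during a simulated run. Both the Chromosome encoding and the runInEnvironment function are hypothetical placeholders standing in for a full simulation of the candidate's behaviour.

#include <vector>

// A candidate solution; its encoding is left abstract for this illustration.
struct Chromosome { std::vector<double> genes; };

// Placeholder: a real implementation would run the behaviour encoded by the
// chromosome in a simulated environment and return the total reward earned.
double runInEnvironment(const Chromosome& c)
{
    (void)c;
    return 0.0;
}

// Assign fitness by cumulative reward instead of distance to known answers.
void evaluatePopulation(const std::vector<Chromosome>& population,
                        std::vector<double>& fitness)
{
    fitness.resize(population.size());
    for (std::size_t i = 0; i < population.size(); ++i)
        fitness[i] = runInEnvironment(population[i]);   // reward as fitness
}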
II.2.3. Reinforcement Learning Libraries
The majority of the algorithms described in the previous sections have already
been implemented and studied in the work of many researchers. Therefore, it
would be a repetition of previous work to focus on a new implementation when the
goal of this project is not to improve any particular learning algorithm but to
implement the task of level generation for platform games. In this section I
present several reinforcement learning libraries and discuss my choice of an
implementation that addresses the needs of this project.
II.2.3.1 RL Toolbox
RL-Toolbox [Num05] is a C++ library for reinforcement learning that implements many of the value-learning algorithms. This includes support for TD(λ)-learning (SARSA and Q-Learning), actor-critic methods and dynamic programming. The library also includes implementations of commonly used action selection policies, such as ε-greedy and Softmax action selection, as well as a useful hierarchy of adaptive parameter classes.
Learning algorithms in this library are represented by an abstract interface that is inherited by the specific implementations. This makes changing the learning algorithm relatively easy. In order to initialise the reinforcement learning environment, an agent object is created, all available actions are registered in it, and a “controller” object (i.e. the action-selection policy) is specified. Specifying the environment transition and reward functions requires the creation of a derived class that implements two virtual methods corresponding to these functions.
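The pattern can be illustrated with the following library-neutral sketch. The class and method names are deliberately illustrative and do not reproduce the actual RL-Toolbox API; in this project the corresponding role is played by genWorldModel, described in Chapter IV.

// Illustrative only: the transition and reward functions are supplied by
// overriding two virtual methods of an abstract environment-model class.
class AbstractWorldModel {
public:
    virtual int    transition(int state, int action) = 0;            // next state
    virtual double reward(int state, int action, int nextState) = 0; // feedback
    virtual ~AbstractWorldModel() {}
};

class LevelWorldModel : public AbstractWorldModel {
public:
    virtual int transition(int state, int action)
    {
        // ... apply the chosen building action to the level blueprint ...
        return state;                                   // placeholder
    }
    virtual double reward(int state, int action, int nextState)
    {
        // ... evaluate the resulting level configuration ...
        return 0.0;                                     // placeholder
    }
};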
The author of RL-Toolbox draws attention to the fact that the primary design objective of the framework is to serve as a general library for reinforcement learning, and that the performance of the implementation takes second place. It is my reasoning that using a generalised library, such as RL-Toolbox, is a good choice when prototyping a reinforcement learning solution. Only after sufficient experiments have been performed would it be a sensible decision to invest effort in developing a customised and streamlined learning algorithm for the task at hand.
Another advantage of this library is the availability of both Linux and Windows distributions. The ease of prototyping and experimentation, combined with the compatibility of the target platform, are two strong points that motivate the choice of RL-Toolbox as the library used in the current project. RL-Toolbox is also well documented, initially in [Num05] but also in terms of source code documentation and the availability of examples.
II.2.3.2 LibPG
LibPG is an open-source library developed as part of the Factored Policy-Gradient Planner [BA06]. The acronym “PG” stands for Policy Gradient, but the library also implements value-based learning algorithms. LibPG implements the SARSA, Q-Learning, Policy-Gradient Reinforcement Learning, Natural Actor-Critic and Least Squares Policy Iteration (LSPI) algorithms.

This project is distributed together with example programs and has good documentation, although not as extensive as that of RL-Toolbox. Unfortunately, the LibPG library is only available for Linux. In light of the fact that the current project is developed in a Microsoft Windows programming environment, it appears that LibPG would require some time consuming readjustments if it were to be used.
II.2.3.3 Other Libraries
There are two Java implementations that merit at least a brief description. PIQLE [Com05] stands for “Platform for Implementing Q-Learning Experiments” and, as the name suggests, is an implementation of Q-Learning. It is an open-source project that also supports multi-agent reinforcement learning. The “Free Connectionist Q-learning Java Framework” [Kuz02] is another open-source Java implementation. This framework is an example of combining the reinforcement learning paradigm with supervised learning techniques, in this case a neural network.

Another interesting project, which does not contain any algorithmic implementations but specifies a standard for the components of a reinforcement learning system, is RL-Glue [WLB*07]. The goal of this library is to create a standard for the interaction between the different components of a reinforcement learning experiment. Experiments, agents and environments can be implemented in a number of different programming languages, including C/C++, Java and Matlab scripts. These objects can work together by means of interfacing with the standardising RL-Glue code.
II.2.3.4. Choice of a Reinforcement Learning Library
When comparing different reinforcement learning libraries the following factors were considered:

· Implemented reinforcement learning algorithms;
· Compatibility of the programming language and development environment;
· Availability of documentation.

The programming language used by this project is C++ and the development platform is the free Express Edition of Microsoft Visual Studio. Out of the five different libraries discussed in this chapter, only RL-Toolbox and RL-Glue are compatible with this development environment. Furthermore, RL-Glue does not contain any algorithmic implementations but only serves as a standard for performing experiments. LibPG would be a very strong alternative to RL-Toolbox were it platform-compatible, but the lack of a Windows library distribution makes it a less desirable choice.

Migrating to a Linux development environment is not a realistic option, as the graphical engine used for visualisation and testing of levels [Las07] is not available under that operating system. It would be necessary either to adapt LibPG to the Windows environment, or to port the graphical engine to Linux, neither of which is trivial and neither of which is an essential goal of this project. With this in consideration, the most practical course of action is to use the RL-Toolbox library.
II.3. Library for the Visualization and Testing of Levels
Evaluating the output of a level generation system can be a challenging task,
because a generated level becomes meaningful output only when integrated in a
video game. The performance of the level generation system can be measured
independently in terms of generation time, convergence of the learning algorithm
and validity of the output. However, it is difficult to be sure that the generated
level is “playable” without actually playing it and even more difficult to debug the
system by looking at the raw output. It is therefore essential to have some means
of visualising and testing generated levels.
For these purposes I use a game prototype [Las07] that I have developed in a
previous project. The engine is mostly OpenGL-based but it also uses DirectInput
(a part of Microsoft DirectX), and some Windows API calls during initialization. It
implements the loading and visualisation of levels, collision detection, and interaction with treasure and dangers. The user interface of the level generation system also
uses the rendering, texture and font management facilities provided by this library.
It should be noted that the level generation system itself is dependent neither on the Windows platform nor on the OpenGL library. It is the graphical engine that would need some reworking if a Linux port were to be implemented. In the case of switching to a different graphical engine, only the user interface component of the level generation system would need to be reworked.
CHAPTER III
ARCHITECTURE OF THE SYSTEM
This chapter presents the object-oriented design of the level generation system
and the principal goals that it is intended to achieve. The most fundamental design
consideration is that the level generation system must not make references to any
specific indexes in the tileset, because any realistic game project would use several
different sets of elements and changing them should be performed with ease.
Rather than working with indexes in the tileset, the level generator first produces
an abstract “blueprint” of the level. In a later stage of the level generation
procedure, referred to as post-processing, the level blueprint is transformed into a
renderable level.
As an extension of this requirement for generality, it was also a design objective to
make the level generator as independent of the used game engine library as
possible. This would ensure that the system can be integrated in many different
game projects without the requirement of using one specific engine. To that end,
the functionality for producing file output is encapsulated in a class derived from
the abstract blueprint class. The user interface and all other visualisation tasks are
isolated in a separate sub-system of the level generator so that they can be
separated out if necessary.
Although the focus of this project is on implementing a reinforcement learning
generation algorithm, the system design allows for a multitude of generation
approaches. The level generation algorithm is represented by an abstract façade
class and other parts of the system are not aware that a reinforcement learning
implementation, or any other algorithm, is currently in use. This design choice
would make further extension of the system much easier.
In short, the system design is centred around the following goals:
· Independence of the game engine library;
· Independence of the learning technique and library;
· Allowing for a change of the tileset and modifications in its content.
III.1. Conventions and Class Diagrams
Figure III-1 summarizes the different types of relations that can occur between classes: aggregation (1 x 1), aggregation (1 x N), reference and inheritance. The class diagrams in this chapter and in Chapter IV use this standard notation based on [GHJ*95]. This project also follows some source code conventions, briefly outlined here.

Figure III-1. Class diagram notation
III.1.1. Class Names
Class names start with the ‘gen’ prefix, indicating they are part of the level
generation system. For classes implementing the user interface this prefix is ‘ui’.
Classes corresponding to an implementation for any specific library or learning
technique include an abbreviation to indicate this, after an underscore at the end of
the class name. For example genAlgorithm is the abstract interface for a level
generation algorithm, whereas genAlgorithm_RL is the reinforcement learning
implementation of the same interface.
III.1.2. Member Variables and Methods
All member variables include one or two lowercase symbols to indicate their type, and private member variables also start with an underscore. For instance _nLength is a private variable of type integer, whereas _pLevel is a private pointer. The only
requirement for method names is to start with a lowercase character and to reflect
the function that they serve.
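To make these conventions concrete, the fragment below is an illustrative sketch only; apart from the prefixes and the _nLength and _pLevel examples mentioned above, the members shown are hypothetical and do not appear in the project.

class genBlueprint { };                       // stub base class, for illustration only

class genBlueprint_CB : public genBlueprint   // 'gen' prefix, '_CB' engine-specific suffix
{
private:
    int    _nLength;                          // leading '_' = private, 'n' = integer
    void  *_pLevel;                           // 'p' = pointer
public:
    void save();                              // method names start with a lowercase character
};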
III.2. Architecture of the Level Generation System
Figure III-2 on the next page shows the main subsystems of the level generator.
The reinforcement learning implementation, consisting of a state transition
function, reward function, state representation and an action set, is hidden behind
an abstract facade. Interaction with the reinforcement learning library is also
limited to this subsystem. This makes it easy to switch to a different reinforcement
learning implementation or even a different learning approach altogether.
The ‘Blueprint and Post-processing’ subsystem also has an abstract facade, which
is used by classes external to it. This part of the level generator is responsible for
representing the level blueprint in memory, performing post-processing, and
implementing output to the specific file format of the game engine. The post-processing algorithm achieves independence from the engine and tileset in use by storing this specific information in files with the extension .cm. Files of this type contain context-matching and replacement rules that map the abstract blueprint to appropriate tileset indexes.
Figure III-2. Architecture of the Level Generation System
(Elements shown in the diagram: External Libraries, 'Cubic' 3D Engine, Level File, GUI Base Classes, RL-Toolbox, .CM Files, 'Cubic' Blueprint, RL Algorithm, RL State, RL Actions, Generator Dialogs, Post-processing, GUI, Abstract Blueprint, RL Environment, Abstract Algorithm, Parameters, Blueprint and Post-processing, Generation Algorithm.)
Last but not least important is the user interface subsystem. In order to use the
level generator as a development tool, it is essential to provide a user interface for
setting parameter values, training the system and visualising the results in a quick
and convenient way. Having a robust user interface also facilitates the debugging of the system and allows the output to be pre-visualised, rather than waiting for the game engine to load all of the texture files and 3D models. This sub-system depends on the graphical engine and no attempt is made to make it platform independent. In the case of the level generation system being used as a development tool, the output can still be targeted at any game engine. If the level generator is to be distributed together with a game, the user interface can easily be separated from the rest of the system.
III.3. Stages of Level Generation
The generation of levels can be regarded as a sequence of several simpler tasks, namely a training phase, generation phase, post-processing and file output. Clearly, tasks such as determining which 3D model in the tileset would fit in a particular position are of a smaller level of granularity than determining the overall shape of the terrain (i.e. the level blueprint). The sequence of steps that leads to successful level generation is illustrated in Figure III-3.

The first stage in the process is to specify the parameters of the level and this goal is achieved by developing the user interface of the generation system. In the second stage the system is trained to generate levels with the specified parameters. Once the system is trained it would be possible to generate multiple variants with similar properties. Implementing this functionality is the goal of the generation phase.

All of the tasks related to the visual representation of levels are grouped together in the post-processing phase. This phase transforms the level blueprint by using the context-matching and replacement rules specified in external files. In the case of changing the tileset only these files would have to be modified and no changes to the source code of the level generator would be required. Context-matchers implement several different graphical tasks, such as the removal of redundant level elements, terrain smoothing and bordering.

Figure III-3. Stages of Level Generation
(Level Parameters: input parameters specified by the user. Trained System: generation policy is implicitly specified by a learned value-function. Level Blueprint: level with no associated graphical information. Post-Processed Blueprint: abstract terrain, danger and reward elements are replaced with indexes referring to the tileset. Level File: the level is stored in a file format supported by the game engine. Visualisation and Testing: the level file can be loaded and tested.)
The last stage in the level generation process is output to a specific file format. In
the current implementation of the system there is only one supported file format
but this can be changed easily by implementing a new derived class of the abstract
level blueprint.
CHAPTER IV
LEVEL GENERATION
AS A REINFORCEMENT LEARNING TASK
Up to this point the architectural design of the system was presented and level generation was subdivided into five smaller and easier tasks. In this chapter the focus is on two of them:
· Learning a generation policy that satisfies the specified parameters;
· Using the policy to generate a level blueprint.
It is assumed that the user has the means of supplying the necessary parameters
and the post-processing subsystem is capable of developing the blueprint into a
level that can be loaded, rendered and played.
IV.1. Task Specification
IV.1.1. State Space Representation
Platform games have a level structure that can easily be represented with a
discrete two-dimensional grid in which the cells represent terrain elements,
treasure and dangers. It is also typical for platform game levels to have a much
greater length than height, which is the source of the term “side-scrolling games”. At
any given time only a small “window” of the level is visible and this view scrolls as
the player moves. The actual length to height ratio of the level by far exceeds that
of any monitor. While the height is usually in the small range [20;100] cells, the
length can be thousands of cells.
These observations are relevant to the goal of representing level generation as a
sequential action-taking process. Because of the much greater length of levels it is
convenient to divide them into vertical slices and perform incremental generation
slice by slice. For now we will assume that a slice could be a single action of the
building agent, as illustrated in Figure IV-1, although it will soon become clear why
this idea is an oversimplification.
Figure IV-1. Platform Game Level Generated as a Sequence of “Slices”
(Level length ∈ [100; 1 000] cells; slices 1-5 form the history of previous actions and slice 6 is the current action.)
In order to specify level generation as a Markov Decision Process (MDP), it is also
necessary to decide what will constitute a “state” of the environment that the
building agent can perceive. One possible approach would be to use the values in
the last slice as a state representation for the next one to be generated.
Unfortunately, this is not a valid representation. The terrain elements, dangers and
treasure located in any given slice may or may not be accessible depending on the
terrain in all preceding slices. For example, a very high platform would be
accessible to the player only if there is a slightly lower platform in a preceding
position. The player would be able to jump onto the first platform and then onto the second one, but not directly onto the second one.
In this project I develop a way of compactly representing these dependencies in a relatively small state representation. Unlike the method of using the last slice as a state indicator, this method accurately evaluates the traceability of cells. Traceability markers are updated at each step of the level generation procedure by running an incremental simulation of the player that is advanced one slice forward at every time step. Each marker is internally stored as an integer variable
but it is visible to the agent as a Boolean value.
Level generation can now be presented as an MDP and, at least from a theoretical point of view, this should be sufficient to solve it with any of the value-based methods outlined in Chapter II. In practice, the size of the state space would be too large for any of these algorithms to achieve reasonable performance. Even though traceability markers are visible to the agent as Boolean variables, rather than by their full range of values, for an average level of height 20 there would be 2^20 possible marker configurations. The number of possible actions would also be too large, because the
building agent must discriminate between several types of objects (i.e. terrain,
empty space, treasure or danger) and place them in different combinations.
Figure IV-2 illustrates this problem and one possible solution that is based on the
level structure of platform games. It was discussed in the introduction to this thesis
that platform games often have several pathways, or branches, running in parallel
to one another. Most of the meaningful interaction between the player and the
level occurs within the boundaries of the current branch with only the occasional
opportunities for switching to a different one. It is therefore reasonable to assume
that the building agent only needs the traceability markers of the current branch in
order to generate it in a meaningful way. Traceability markers are still updated on
a global scale (i.e. for the whole height of the level) which provides integration
between the pathways. However, the decision making of the level building agent is
based only on the much smaller local state, corresponding to a single branch of the
level.
Figure IV-2. Global and local state
(The traceability simulation runs in the global state but only the local state is visible to the building agent at any given time step. Each branch has a separate local state, consisting of 6 Boolean traceability markers plus a Boolean “pathway” flag, so there are only 2^7 = 128 local states. The global state covers a maximal number of branches of 3 with a branch height of 6, i.e. 3 x 6 = 18 traceability markers. Each marker can take 5 internal values, indicating jump height; even when converted to Boolean this still gives 2^18 states in the global state, so it is not practical to use the global state directly for reinforcement learning.)
All level branches are updated in a sequence, followed by a transition to the next
position along the length of the level. In an early version of the level generation
system the agent was also allowed to perceive the index of the current branch.
Interestingly, this augmentation does not result in increased quality of levels but
rather in some disappointing solutions exploiting the index. For instance, the
lowest branch would often be completely blocked with obstacle elements so that
the player cannot fall off the bottom of the level, whereas upper branches would be
a very minimalistic sequence of empty and solid blocks. By contrast, if the index of
the branch cannot be perceived the building agent is forced to discover a more
general solution, aiming to simultaneously keep the current branch traceable and
promote access to upper and lower branches.
IV.1.2. Designing the Reward Function
Level generation can be described as a maintenance task in which the goal is to generate a level that satisfies the user specified parameters at all times. If the level diverges significantly from the specified parameters, a failure state is reached and the agent receives a penalty. Figure IV-3 shows the conditions that can trigger the fail-state. The goal of these checks is to ensure that there is always a way forward for the player and to keep the desired number of branches in the level accessible at all times. By checking the flags in the global state it is easy to verify that the level as a whole is traceable. The local state is augmented with a “pathway” flag that serves as memory for the building agent and indicates if at any time in the past the current branch has been accessed. This mechanism prevents the creation of dead-end branches.

Figure IV-3. Is the state a fail state?
(Decision checks shown in the flowchart: no set traceability markers in the global state?; pathway flag set in the local state?; no set traceability markers in the local state?; X > Length / 8.0 and fewer pathways than required?; the outcome is either true, meaning a fail state, or false.)
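One plausible reading of these checks, written as a compact C++ sketch (the helper names and the exact combination of the conditions are illustrative assumptions; in the system this logic belongs to the reward function described later in this chapter):

#include <vector>

struct BranchState {
    std::vector<bool> markers;   // Boolean traceability markers of the current branch
    bool pathwayFlag;            // has the branch ever been accessible?
};

static bool anySet( const std::vector<bool>& m )
{
    for ( bool b : m ) if ( b ) return true;
    return false;
}

bool isFailState( const std::vector<bool>& globalMarkers, const BranchState& local,
                  int x, int length, int pathways, int requiredPathways )
{
    if ( !anySet( globalMarkers ) )                          // the level as a whole is no longer traceable
        return true;
    if ( local.pathwayFlag && !anySet( local.markers ) )     // a previously open branch became a dead-end
        return true;
    if ( x > length / 8.0 && pathways < requiredPathways )   // branching target clearly missed
        return true;
    return false;
}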
The size of the fail-state penalty is proportional to the distance between the current position and the length of the level. Decisions that push the level into a fail-state at the very beginning receive a large penalty, whereas close to the end of the level the penalty is smaller. This proportionality can be expressed with the following equation:
expressed with the following equation:
RFail - State =
X current
- 1,
Parameters :: Lenght
(Eq. IV-1)
In the current implementation of the level generation system the user can specify
the branching factor (i.e. the number of parallel pathways), the desired length of
the level and a “chaos” factor that controls the amount of noise introduced in the
level. Apart from the penalty for reaching a fail-state, achieving the desired level of
branching is also encouraged by rewarding every increase in the number of branches. This is a smaller
task that is intended to improve the convergence of the learning algorithm. The
penalties and rewards, as implemented in the current version of the system, are
summarised in the table below.
Figure IV-4. Rewards and Penalties

· The failure state is reached. Return a penalty proportional to the part of the level that was not generated. For example, for a level that is almost completed only a small penalty would be imposed, whereas a level that reaches the fail-state quickly is likely to be the result of worse choices and receives a larger penalty:
  $R_{\text{Fail-State}} = \frac{X_{\text{current}}}{\text{Parameters::Length}} - 1$   (Eq. IV-1)

· Reward branching on every time step. This promotes branching and also discourages the construction of dead-ends:
  $R_{\text{Branching}} = \frac{N_{\text{branches}} - 1}{\text{Parameters::Length}}$   (Eq. IV-2)

· Add a small reward proportional to the amount of treasure and dangers placed in the level:
  $R_{\text{Treasure-Traps}} = 0.1 \cdot \frac{N_{\text{Reward}} + N_{\text{Treasure}}}{\text{Parameters::Length}}$   (Eq. IV-3)
Placing danger and treasure elements also results in a small reward for the agent
but currently there is no explicit parametric control over this aspect of level
generation. This part of the system could be extended by keeping track of the
cumulative reward generated in the level and normalising it in order to calculate a
danger-per-length coefficient CD and treasure-per-length coefficient CT. Both
coefficients can be compared to a corresponding user specified parameter and if
they differ significantly this should result in a penalty for the agent.
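One possible way of formalising this extension (the definitions below are only a sketch and are not part of the implemented system) is

$C_D = \frac{N_{\text{Danger}}}{\text{Parameters::Length}}, \qquad C_T = \frac{N_{\text{Treasure}}}{\text{Parameters::Length}},$

$R_{\text{Deviation}} = -\lambda \left( \left| C_D - C_D^{\text{user}} \right| + \left| C_T - C_T^{\text{user}} \right| \right),$

where $C_D^{\text{user}}$ and $C_T^{\text{user}}$ are the user specified targets and $\lambda$ is a weighting constant.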
IV.1.3. Actions of the Building Agent
The building agent chooses an action out of a small subset of all the possible
combinations of cell values. Most of these combinations are meaningless and not
likely to occur in any level. For instance, actions containing a lot of scattered bits of
terrain or dangers and traps are very unlikely and also undesirable in levels. Figure
IV-5 presents the action set currently used by the building agent. It consists of 9 basic actions that are extended to 21 with the addition of dangers and treasure.
Figure IV-5. Actions available to the building agent
The 9 basic actions can be combined with danger and treasure elements for a total of 21 different actions.
The width and height of actions are parameters of the level generation system that
have a significant effect on the appearance of the level output and the speed of
generation. In the initial design of the system these actions were intended to be
only one cell wide. It was my reasoning that this would reduce the number of
possible action types and thus facilitate the learning procedure. This implementation works but it results in a very cluttered level, so it appears to be too fine-grained. After some experimentation with different sizes it was determined that a size of 6 x 3 is the best compromise between the number of different actions and their meaningfulness. This size also tends to produce less cluttered levels than a size of 6
x 1. The currently active action set is specified in an external file located in the
”output\gen\actions\” sub-directory of the project. Using a different action set can
produce a distinctly different output because it forces the building agent to
discover alternative ways for solving the same problem.
IV.1.4. Choice of a Learning Algorithm
With regard to the choice of a learning algorithm, the principal alternatives are
value-based methods, actor-critic learning and policy gradient search. There would
be no benefit to using model-building approaches, such as Prioritized Sweeping,
because the task is entirely simulated and scarcity of training experience is not an
issue [KLM96]. Furthermore, using a value-based method would take advantage
of the discrete and sequential representation of the task whereas a policy gradient
method would not, which motivates my decision in favour of algorithms that learn a
value function.
It is also my reasoning that for the task of level generation using an on-policy
learning algorithm, such as SARSA, would be a better motivated choice than off-policy learning. This is the case because there is a certain amount of desirable “chaos”, or randomness, in levels and this makes the action-selection
policy a very prominent factor in the dynamics of the environment. Level chaos is
implemented by making the action selection policy less deterministic than in other
reinforcement learning tasks. If the system is to use an off-policy learning
algorithm the random action choices would be ignored, resulting in different
dynamics of the environment. Q-learning would optimise for a chaos-free
environment and this could hinder the performance of the system because it is not
a realistic assumption.
Another contributing factor to the suitability of on-policy learning is that level
generation is a maintenance task containing states resulting in a very large penalty
if not avoided. In combination with level chaos this can easily “push” the building
agent into a fail-state from nearby neutral states. Because random action selection
is ignored by the Q-learning algorithm, these dangerous states would remain
neutral and will not be avoided by the building agent.
By contrast, SARSA would make a more conservative choice and stay away from
the fail-states. In [SB98] this effect is illustrated with a “Cliff Walking” example,
where the goal is to learn how to make a trip between two points in a Gridworld
without falling into the chasm between them. In this case Q-learning finds a
solution that walks dangerously close to the edge of the cliff, whereas SARSA
points to a longer but much safer route.
The author of [Num05] also makes a
point of this advantage of the SARSA learning algorithm for tasks that have the
aforementioned properties.
IV.2. Task Implementation
IV.2.1 The “Generation Algorithm” Sub-system
The class hierarchy implementing the level generation reinforcement learning task
is presented in Figure IV-6. These classes are grouped together in the “Generation
Algorithm” subsystem which is accessed by the rest of the system through the
abstract interface genAlgorithm. The abstract interface provides methods for
training the system, generating level variants and also gives access to the
genParameters object.
Figure IV-6. The “Generation Algorithm” Subsystem
(Elements shown: RL-Toolbox; genAlgorithm with setParameters(param), train(statistics), generate(blueprint); genAlgorithm_RL; genWorldModel with transitionFunction(), getReward(); genState_RL with build(markers); genAction_RL with apply(blueprint); genTrainingStatistics; genTraceabilityMarkers; genParameters; genBlueprint.)
The reinforcement learning implementation of the training and generation methods is contained in the derived class genAlgorithm_RL. As already discussed, this implementation is based on the SARSA learning algorithm and the RL-Toolbox library. It also involves several other classes. Local states visible to the agent are stored in the class genState_RL. Objects of this type can be created from the column of traceability markers represented by the genTraceabilityMarkers class, which is in practice the “global state” of the level. The level blueprint maintains a pointer to the genTraceabilityMarkers object but functionally this class is more closely related to the ‘Generation Algorithm’ sub-system, as it plays an important role in the reinforcement learning process. Traceability markers are updated incrementally at each generation step. At the end of each training run both the traceability markers and the blueprint are cleared and another level generation attempt can commence.

For classes external to the ‘Generation Algorithm’ sub-system this functionality is a black box, visible only through the genAlgorithm interface. The following sections provide some insight into the functionality of the classes in Figure IV-6, collectively implementing level generation as a reinforcement learning task.
IV.2.2. Class Implementations
IV. 2.2.1. Abstract learning algorithm ( genAlgorithm )
This is an abstract base class that maintains a pointer to the level blueprint and the
parameters object. It also defines the virtual methods genAlgorithm::train() and
genAlgorithm::generate() respectively for the training of the system and the
generation of level variants.
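As an illustration, a rough C++ sketch of this interface based on the method names in Figure IV-6 (the exact signatures in the project may differ):

class genParameters;            // forward declarations of the collaborating classes
class genBlueprint;
class genTrainingStatistics;

class genAlgorithm
{
public:
    virtual ~genAlgorithm() {}
    void setParameters( genParameters *param )                { _pParameters = param; }
    virtual void train( genTrainingStatistics &statistics ) = 0;   // learn a generation policy
    virtual void generate( genBlueprint &blueprint )         = 0;   // emit one level variant
protected:
    genParameters *_pParameters;   // parameters object
    genBlueprint  *_pBlueprint;    // level blueprint under construction
};

class genAlgorithm_RL : public genAlgorithm    // the implementation discussed next
{
public:
    virtual void train( genTrainingStatistics &statistics );
    virtual void generate( genBlueprint &blueprint );
};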
IV. 2.2.2. Reinforcement Learning algorithm ( genAlgorithm_RL )
This class is derived from genAlgorithm and provides the reinforcement learning implementation of the genAlgorithm::train() and genAlgorithm::generate() methods. Training starts with a setup of the reinforcement learning framework, consisting of the following steps:
· Create an instance of class genWorldModel, implementing the transition and reward functions;
· Create an agent object and register the action list specified by the genAction_RL class;
· Create an instance of the learning algorithm, implemented in the reinforcement learning library;
· Create an ε-greedy action selection policy and register it as the active controller of the agent.
After the initialization phase is completed the genAlgorithm_RL::train() method starts a sequence of training runs. For the duration of each run the ε parameter of the action selection strategy is adapted as a linear function of the current step. The initial value of epsilon and its attenuation rate are specified by the parameters “RL Epsilon” and “RL Epsilon-reduction” respectively. Additionally, the parameter “RL Max Runs” is used to limit the number of training runs.
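A minimal sketch of such a linear attenuation, driven by the two parameters named above (the exact formula used by the system may differ):

double epsilonForStep( double rlEpsilon, double rlEpsilonReduction, int step )
{
    double eps = rlEpsilon - rlEpsilonReduction * step;   // linear decrease over the run
    return eps > 0.0 ? eps : 0.0;                         // clamp at zero
}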
During the generation phase, implemented in genAlgorithm_RL::generate(), the
learned value function is frozen by deactivating the learning algorithm object. The
generation phase also uses a different action selection strategy. The ε-greedy
policy is replaced with a Softmax, which results in agent behaviour that chooses
actions with probability proportional to their value. Optimal and near-optimal
actions will be selected very often, whereas the probability of choosing a drastically
undesirable action would be small.
It is important to point out that following a greedy policy during the generation
phase is not a viable alternative because some amount of randomness is desirable
in the level output. Once the building agent manages to establish a certain rhythm,
optimal with regard to the value function, the greedy policy does not provide an
impulse for changing it. On the other hand, the player would feel bored after
playing to the same rhythm for a certain amount of time. Instead of trying to
estimate player “boredom” and adapting to it, a much simpler solution would be to
allow for some randomness in the generation phase. Research in platform games
often quotes the existence of a rhythm as a positive thing but it is assumed that
this rhythm will be varied periodically, in order to avoid boredom. In [SCW08] the authors make an analogy with music, where not only the rhythm but also the change of rhythm contributes to the enjoyment of the listener.
The use of a Softmax action selection strategy provides the necessary impulse for
changing the rhythm of the level without being as disruptive as an ε-greedy policy.
In the RL-Toolbox library Softmax is implemented as a Gibbs probability
distribution [Num05]. This distribution has a “greediness” parameter β that is
inversely proportional to the “RL Level Chaos” parameter.
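In its standard form the Gibbs (Boltzmann) distribution selects action $a$ in state $s$ with probability

$P(a \mid s) = \frac{e^{\beta\, Q(s,a)}}{\sum_{a'} e^{\beta\, Q(s,a')}},$

so a large β approaches greedy selection, whereas a small β (i.e. a large “RL Level Chaos” value) flattens the distribution towards uniform random choice.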
As a positive side effect of using a non-deterministic generation phase, once the system is trained it is capable of generating many “variants” of the same level. If the level generation system is to be used as an internal development tool, the human designer can generate several variants and select the one that best meets his subjective aesthetic criteria. In the case that the system is to be distributed together with a game, the generation of variants makes it possible to decouple the training and generation components. The player would be provided only with a library of learned value functions and the generation of variants would be executed directly when the game needs a particular level. This would provide much greater certainty for the game developer as to the appearance of the level output.
IV. 2.2.3. State of the building agent ( genState_RL )
This class represents the state of the level under construction that can be
perceived by the building agent. It stores 6 traceability markers, converted to
Boolean values, and a “pathway” flag. The flag is used as a simple memory register
allowing the agent to remember if the current branch has ever been accessible to
the player. If this is the case and the branch suddenly ceases to be traceable that
can result in the creation of a dead-end. The reward function accounts for dead-ends but in order for the agent to learn how to avoid them it is necessary to have
this perception of the past.
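The information carried by this local state can be pictured as a small plain structure (shown only for illustration; the actual genState_RL also interfaces with the RL-Toolbox state representation):

struct LocalState
{
    bool traceable[6];   // Boolean view of the branch's 6 traceability markers
    bool pathwayFlag;    // remembers whether the branch has ever been accessible
};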
IV. 2.2.4. Actions of the building agent ( genAction_RL )
The class genAction_RL stores a template of an action that can be applied at any
position in the level blueprint. Templates can be loaded from an external file
located in the project directory under ‘output\gen\actions\’ or they can be created
manually. As part of the setup of the reinforcement learning framework,
implemented in genAlgorithm_RL::train(), the list of actions that will be used is
loaded from a file in this directory.
Actions of the building agent are identified by their index in this list. When the
agent invokes a particular action the method genAction_RL::apply(X,Y, blueprint)
is activated with parameters referring to the current position in the blueprint. This
method copies the content of the action object onto the blueprint. There are two
special actions, applied at the start and end of levels. The start of level template is
applied at the specified start position of the player, whereas the end position is
determined automatically and merged with the terrain at the end of the level.
IV. 2.2.5. Transition and reward functions ( genWorldModel )
The class genWorldModel implements both the transition and reward function that
define the dynamics of the environment. In order to achieve this it inherits the RL-Toolbox classes CTransitionFunction and CRewardFunction and implements the virtual methods genWorldModel::transitionFunction(state, action, new-state) and genWorldModel::getReward(state).
Figure IV-7 presents a pseudo-code of the environment transition function. This
method advances each branch of the level in parallel and keeps track of the
current position in the blueprint along the X axis. The exact positioning in the level
is not visible to the agent, as that would result in a huge state space and lack of
generality in the learned policy. Instead, the local state is built on the basis of the
traceability markers and the agent must discover for itself the dynamics of these
changes.
Extract currentState
currentAction.apply( blueprint, currentX, currentBranch )
currentBranch++
If currentBranch > maxBranches then
{
    currentBranch = 0
    currentX++
    blueprint.traceabilityUpdate()
}
If currentX > maxX then
    endAction.apply( currentX, detectEndY() )
Else
{
    currentState.build( blueprint.traceabilityGet(), currentX, currentBranch )
    Commit currentState
}

Figure IV-7. Pseudo-code of the Transition Function
The reward function is a straightforward implementation of equations Eq. IV-1, Eq. IV-2 and Eq. IV-3 discussed earlier in this chapter. The class also implements a genWorldModel::reset() method that returns the building agent to the beginning of the level and clears the blueprint and the traceability markers.
IV.2.3. Traceability Markers
Traceability markers, implemented in class genTraceabilityMarkers, are a discrete and incremental version of the level validation method outlined in [CM06]. It can be judged whether or not certain parts of the level are accessible by executing a simulation of the player that moves along platforms and performs all possible jumps. The physics of platform games are not entirely true to the physics of the real world because they allow the player to change direction in mid-air. This results in an “accessibility window” corresponding to the part of the level below the ballistic curve, rather than only the curve itself. Most platform games use this
model of altered physics in order to increase the amount of control that can be
exerted on the player character.
By contrast with the player simulation described in [CM06], the method I propose
does not re-evaluate the traceability of the whole level at each step in the
generation procedure. Instead, as level generation progresses a column of
traceability markers moves along with the building agent and gradually “scans” the
whole length of the level. Figure IV-8 presents the history of traceability markers
for a sample level blueprint.
Figure IV-8. Sample level blueprint showing the traceability markers
For every cell of the blueprint an integer value is stored, indicating the distance
that the player can move upward from the current position. When this distance
becomes 0 no further movement upward is possible and if the value becomes
negative the corresponding cell is marked as inaccessible. The algorithm can work
equally well with real-valued traceability markers, which improves the accuracy of
the prediction but reduces its speed. For the purposes of the current project it was
determined that integer markers provide sufficient accuracy and using real values
would only increase the training time.
IV. 2.3.1. Updating the traceability markers
Given a column of traceability markers, the next column can be determined
unambiguously by performing transformations on the marker values illustrated in Figure IV-9. This figure shows a pseudo-code of the marker update algorithm, as implemented in the constructor of class genTraceabilityMarkers. With the exception
of the first column of markers, objects of this type are constructed only by
transforming the previously existing markers.
Set all traceability markers to UNACCESSIBLE
Set X = Xcurrent
For every Y < Height
    If blueprint[X, Y] is not SOLID and old_markers[Y] is ACCESSIBLE then
    {
        If blueprint[X, Y+1] is SOLID then updateMarker( Y, DYmax )
        Else
        {
            updateMarker( Y, doTrajectoryStep( old_markers[Y] ) )
            propagate( Y )
        }
    }

updateMarker( Y, value )
    marker[Y] = MAX( marker[Y], value )

doTrajectoryStep( value )
    if value == 0 return UNACCESSIBLE else return value - 1

propagate( Ystart )
{
    Y = Ystart
    While blueprint[X, Y] is not SOLID
        updateMarker( Y--, doTrajectoryStep( markers[Y] ) )

    Y = Ystart
    While blueprint[X, Y] is not SOLID
        updateMarker( Y++, 0 )
}
Figure IV-9. Updating the traceability markers
For every step along the X axis, the traceability markers that make up the Global State are updated. The
presented pseudo-code for doTrajectoryStep(value) corresponds to a linear approximation of the jump
trajectory. This function can be replaced with a more sophisticated ballistic simulation of the player.
The transformation starts by blocking all markers by default. Next, for every
marker not blocked in the blueprint the status of the surrounding markers is
evaluated. If there is solid terrain below the currently evaluated marker and the old marker is accessible, this is a possible starting point of a jump. Markers of this type are set to the maximal jump height DYmax, which is a parameter of the level generator.

In the case of a marker that does not initiate a new jump the old value of the marker is transformed by the function doTrajectoryStep() and it is propagated upwards and downwards in the column of markers. In order to avoid conflicts a value is updated only if it is greater than the current value. To illustrate this, consider the following example. Any position in the level could be accessible both as the starting point of a jump (M = DYmax) and the landing site of a falling player (M = 0). Clearly, the greater value should take precedence because the possibility of landing on a platform does not exclude the possibility of starting a new jump from the same platform.
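To make the update step more concrete, the following compact C++ sketch gives one possible reading of Figure IV-9. The container types, the SOLID encoding, the value of DYmax and the direction of propagation are assumptions made for illustration; in the project this logic lives in the constructor of genTraceabilityMarkers.

#include <algorithm>
#include <vector>

const int UNACCESSIBLE = -1;   // negative marker = cell not reachable
const int SOLID        = 1;    // assumed blueprint value for solid terrain
const int DY_MAX       = 4;    // maximal jump height (illustrative value)

typedef std::vector<std::vector<int>> Grid;    // blueprint[x][y]

static bool isSolid( const Grid& bp, int x, int y ) { return bp[x][y] == SOLID; }

// Builds the column of markers at position x from the previous column.
std::vector<int> updateColumn( const Grid& bp, int x, const std::vector<int>& oldMarkers )
{
    const int height = static_cast<int>( oldMarkers.size() );
    std::vector<int> markers( height, UNACCESSIBLE );

    auto updateMarker = [&]( int y, int value ) {
        if ( y >= 0 && y < height ) markers[y] = std::max( markers[y], value );
    };
    auto trajectoryStep = []( int value ) {                  // linear jump approximation
        return value <= 0 ? UNACCESSIBLE : value - 1;
    };

    for ( int y = 0; y < height; ++y ) {
        if ( isSolid( bp, x, y ) || oldMarkers[y] < 0 ) continue;   // blocked or unreachable cell

        if ( y + 1 < height && isSolid( bp, x, y + 1 ) ) {
            updateMarker( y, DY_MAX );                       // standing on terrain: a new jump can start
        } else {
            updateMarker( y, trajectoryStep( oldMarkers[y] ) );
            // Propagate upwards, spending the remaining jump budget...
            for ( int yy = y - 1; yy >= 0 && !isSolid( bp, x, yy ); --yy )
                updateMarker( yy, trajectoryStep( markers[yy + 1] ) );
            // ...and downwards, since falling is always possible.
            for ( int yy = y + 1; yy < height && !isSolid( bp, x, yy ); ++yy )
                updateMarker( yy, 0 );
        }
    }
    return markers;
}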
IV. 2.3.2. Jump trajectory
The trajectory that the simulated player follows during a jump is encoded in the function doTrajectoryStep(value), which under the current implementation of the algorithm only decreases the value of the marker by one. As illustrated in Figure IV-10 this corresponds to approximating the ballistic curve with a line. The cells included under the linear trajectory are only a subset of the cells that a ballistic curve would include. Because decisions are made on the basis of a pessimistic scenario, and not an overly optimistic one, the level could be “more traceable” than predicted but not less so. This ensures that if a level is valid with respect to the traceability markers it will also be valid in the game engine.

Figure IV-10. Simplification of Jump Trajectory
(The ballistic curve, spanning DXmax by DYmax, is approximated with a simpler linear trajectory. The shaded area beneath the curve represents cells of the blueprint that are flagged as traceable. Because the linear trajectory covers a smaller area, it can give only a pessimistic scenario for the traceability of the level.)

IV. 2.3.3. Representation in memory
Traceability markers are represented as a list structure and a pointer to the rightmost column of markers is stored in the level blueprint. The last genTraceabilityMarkers object is all that is necessary for performing reinforcement learning but in order to visualise and debug the markers it is convenient to preserve all of them. The post-processing phase, discussed in the next chapter, also uses the full history of markers in order to remove redundancies in the level.
Each time a new column of traceability markers needs to be created, a genTraceabilityMarkers object is constructed from the markers currently stored in the level blueprint. The constructor transforms the old markers and stores an internal pointer to them. The level blueprint no longer stores this pointer as it is replaced by the newly created genTraceabilityMarkers object. Figure IV-12 illustrates this interaction between the genBlueprint and genTraceabilityMarkers classes.

Figure IV-12. Traceability Markers and the Blueprint
(genBlueprint: _pTM, traceabilityUpdate(), traceabilityGet(); genTraceabilityMarkers: _pPrevious, constructor(old-markers, blueprint).)
The blueprint object stores a pointer only to the rightmost column of traceability markers. Every time the traceabilityUpdate() method is invoked, it creates a new genTraceabilityMarkers object from the previous one.
CHAPTER V
POST-PROCESSING
The post-processing phase of level generation takes the level blueprint as input
and transforms it in order to implement the following tasks:
· Removal of redundancies. The output of the generation phase is a valid blueprint, traceable from the start to the end. However, there is no way to avoid the occasional placement of isolated terrain elements that are not accessible in any way. These redundancies should not be present in the final output.
· Terrain smoothing. Scattered terrain elements should be merged where appropriate and any gaps in the terrain that serve no functional purpose should be filled in. The task of terrain smoothing must be implemented without changing the traceability of the level, otherwise it could be invalidated.
· Terrain bordering. The contour of the terrain must be defined by choosing appropriate tileset indexes, taking into consideration the surrounding tiles. The terrain around lava traps should be modified in order to accommodate the 3D model of this type of obstacle. Other decorative elements are also placed in this step.
After training the system and generating a level variant, the blueprint contains only an abstract specification of the level. The areas that contain terrain elements, dangers and treasure are marked but no specific indexes in the tileset are assigned. Furthermore, the aesthetic quality of the level can be improved by making some transformations of the terrain. In this project I develop a system of context-matching and transformation rules that implement all of the aforementioned tasks. The level generation system loads the context-matching rules from external files with the extension ‘.cm’. All of the tileset-specific information is contained within these files.
V.1. The “Blueprint” and “Post-processing” Sub-systems
The architecture of the ‘Level Blueprint’ subsystem is presented in Figure V-1. The
purpose of class genBlueprint, which takes a prominent position in the sub-system,
is to store a representation of the blueprint in memory and to provide access to it.
This class also maintains a pointer to the parameters object and a list of
traceability markers updated during level generation.
Figure V-1. The “Blueprint” and “Post-processing” Subsystems
(Elements shown: genBlueprint with get(X,Y), set(X,Y,value), pp(); genBlueprint_CB with save(file), draw(); Post-processing classes genContextMatcher with load(file), apply(blueprint), genContext with match(X,Y,blueprint), genWildcard with match(value), symbol(); genParameters; genTraceabilityMarkers; external: ‘Cubic’ 3D Engine.)
The class genBlueprint_CB is an implementation of the blueprint class capable of
producing file output to the specific file format of the game engine. If the level
generation system is to be extended to work with a different game engine it will be necessary to implement another class, inheriting the genBlueprint functionality and implementing the virtual method genBlueprint::save(). Except for the genBlueprint_CB class, no other part of the level generation system makes a reference to the file format used to store levels.
The rest of the ‘Blueprint’ subsystem is a group of classes implementing the tasks
of level post-processing. This part of the class hierarchy is activated by a call to the
genBlueprint::pp() method.
V.2. Implementation of Post-processing
The initialization of the post-processing subsystem is performed in the constructor
of the genBlueprint class by loading a list of ‘.cm’ files. The constructor creates two
vectors of objects relevant to the task of post processing. The first vector, stored
as the member variable _vWildcards, is a list of wildcard symbols used in the
context matching rules. The use of wildcards greatly reduces the number of
context matching rules and simplifies their design. The second vector represent the
context matcher and is stored in the _vContextMathcers member variable.
Invoking the genBlueprint::pp() method applies all of the registered context
matchers to the level blueprint.
V.2.1. Wildcards
Wildcards, as implemented by the class genWildcard, are a convenient tool that
allows for a more abstract specification of context-matching rules. It is often the
case that some values of the context are irrelevant or they matter only for as long
as they belong to a certain subset of the tileset. Each wildcard is specified as a
symbol followed by the tileset ranges that match this symbol. Figure V-2 lists the
most important wildcards used in context-matching rules. The list of wildcards is
user specified and for a different tileset it may contain a completely different set of
wildcards.
Figure V-2. Wildcards used in the post-processing phase

· '*' : Match any element. Matching ranges: [0;9999]
· '@' : Match only elements representing ‘empty space’. This category includes all elements that do not stop the player character from passing through, such as decorative elements, challenge and reward elements. Matching ranges: 0, [24;38], [55;57], [2000;9999]
· '#' : Match only elements representing an obstacle to the movement of the player character. Matching ranges: [1;23], [39;54], [58;9999]
· 'C' : Match only challenge (danger) elements. Matching ranges: [2000;2999]
· 'R' : Match only reward (treasure) elements. Matching ranges: [3000;3999]
· '$' : Match any challenge or reward element. Matching ranges: [2000;3999]
· 'T' : Match any '@' element that has a set traceability flag in the traceability markers. Matching ranges: like '@'
· 'N' : The opposite of the 'T' wildcard. Matching ranges: like '@'

The list of wildcards is loaded from the external file ‘gen\post-processing\post-processing.pp’, allowing for an easy transition to a different tileset. Only the T and N wildcards have built-in functionality.
In order to match a wildcard both the symbol and the range must match. In addition to this requirement the two built-in wildcards 'T' and 'N' also make a reference to the traceability markers stored by the blueprint. These built-in wildcards are necessary for implementing the removal of terrain redundancies.
V.2.2. Context Matchers
Class genContext implements the functionality of a single context matching-rule.
Contexts are represented as a 3 x 3 matrix, consisting of the cell that is subjected
to transformation and the 8 cells that surround it. Objects of this type can be
constructed in two ways:
· Constructed from a transformation rule. In this case the context matrix can contain a mixture of tileset indexes and wildcard symbols, as specified in the ‘.cm’ file. Rules of this type also include a target value to be substituted in the case of a successful match.
· Constructed from a location in the level blueprint. In this case the context does not contain any wildcards and the target value is ignored. During the post-processing phase a genContext object is constructed for each cell of the blueprint.
Each of the blueprint contexts is matched against all of the transformation rules.
This comparison is implemented in the genContext::match(context) method,
returning a Boolean value.
In order to separate the post-processing functionality from the rest of the level blueprint code, context-matchers are encapsulated in the class genContextMatchers and not directly in the level blueprint. The functionality of this
class is concentrated mainly in the method genContextMatchers::apply(blueprint).
It implements a loop that builds the context of each cell in the blueprint and tries
to match it against all of the registered genContext rules. In the case of a match
the target value in the rule is transferred to the blueprint.
The same class also has a genContextMatchers::load() method invoked from the
constructor of the level blueprint.
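A simplified sketch of this matching loop is given below. The data layout, the handling of only the '*' wildcard and the first-match-wins policy are assumptions for illustration; the project encapsulates the full wildcard resolution and the recursion flags in genContext and genContextMatchers.

#include <vector>

const int WILDCARD_ANY = -1;                   // stands in for the '*' wildcard

struct Rule {
    int pattern[3][3];                         // context: tileset indexes or WILDCARD_ANY
    int target;                                // value substituted on a successful match
};

typedef std::vector<std::vector<int>> Grid;    // blueprint as a 2D grid of tileset indexes

static bool matches( const Rule& r, const Grid& bp, int x, int y )
{
    for ( int dx = -1; dx <= 1; ++dx )
        for ( int dy = -1; dy <= 1; ++dy ) {
            int want = r.pattern[dy + 1][dx + 1];
            if ( want != WILDCARD_ANY && bp[x + dx][y + dy] != want )
                return false;
        }
    return true;
}

void applyContextMatchers( Grid& bp, const std::vector<Rule>& rules )
{
    // Border cells are skipped here for simplicity.
    for ( int x = 1; x + 1 < (int)bp.size(); ++x )
        for ( int y = 1; y + 1 < (int)bp[x].size(); ++y )
            for ( const Rule& rule : rules )
                if ( matches( rule, bp, x, y ) ) {
                    bp[x][y] = rule.target;    // transfer the target value to the blueprint
                    break;
                }
}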
V. 2.2.1. Removing redundancies, smoothing and bordering
It is perhaps easier to illustrate the functionality of context-matchers with a specific example. Figure V-3 shows the context-matchers that implement removal of redundant terrain elements. This task requires only two rules, applied recursively to the blueprint.
The first rule matches any terrain element (i.e. the '#' wildcard) that has empty non-traceable space above it. The rest of the context is ignored, as signified by the '*' wildcard. The zero to the right of the context indicates a replacement with the nil tileset index, which corresponds to empty space. This rule “eats away” the surface of any isolated terrain elements not accessible for the player. The second rule functions in a similar way, this time matching any danger or treasure element that is not accessible to the player. These elements serve no functional purpose and look rather odd in the final output, so it is preferable that they are removed.

Figure V-3. Removing Redundancies

[FLAGS]
recursive: true
recursion-depth: -1

[CONTEXT-MATCHERS]
*N*
*#*   :   0
***

*N*
*$*   :   0
*N*
[END]

These two context-matching rules are used for removing redundancies in the generated level. The 'N' wildcard matches any non-traceable cell in the blueprint. The '$' wildcard matches a treasure or trap element. The '*' wildcard matches any element type.
Not all of the post-processing tasks can be implemented with such a small number of transformation rules. There are currently 18 rules that implement terrain smoothing and 34 rules that implement bordering (extracts are presented in Figures V-4 and V-5). The large number of bordering context matchers is due to the fact that each one of them must handle a configuration of terrain elements that corresponds to a different 3D model and a different index in the tileset. This number would be unmanageably larger if it were not for the use of wildcards. Smoothing context matchers work with only two tileset indexes and add or remove bits of the terrain that would create an unpleasantly rough appearance or the sense of a cluttered level. These rules are applied recursively, so with careful design it is possible to affect larger areas than the explicitly specified 3x3 context.
Figure V-4. Terrain smoothing (an extract)
Occasional gaps in the terrain and unnaturally looking configurations are transformed by these “smoothing” context matchers.

[FLAGS]
recursive: true
recursion-depth: -1

[CONTEXT-MATCHERS]
* 1 *
1 @ 1   :   1
* 1 *

* 1 *
1 @ @   :   1
* 1 *

* 1 *
@ @ 1   :   1
* 1 *

* 1 *
@ @ @   :   1
* 1 *

* 1 *
1 @ 1   :   1
* @ *

1 @ 1
* @ *   :   1
* 1 *

* 1 *
* @ *   :   1
1 @ 1

@ @ 1
@ 1 *   :   0
1 * *

1 @ @
* 1 @   :   0
* * 1

@ @ 1
@ 1 *   :   0
@ 1 *

1 @ @
* 1 @   :   0
* 1 @

* 1 *
@ 1 @   :   0
@ @ @

... ...
[END]

Figure V-5. Bordering context matchers (an extract)
These transformation rules introduce the tileset indexes that build up the contour of the terrain. Each index in the tileset corresponds to a 3D model designed for the specific context.

[FLAGS]
recursive: false

[CONTEXT-MATCHERS]
@ @ @
@ # @   :   51
@ @ #

@ @ @
@ # @   :   52
# @ @

@ # @
@ # @   :   53
@ @ #

@ # @
@ # @   :   54
# @ @

@ # @
@ # @   :   22
@ # #

@ # @
@ # @   :   23
# # @

@ @ @
# # #   :   2
* * *

@ @ @
# # @   :   5
* # @

@ @ @
@ # #   :   6
@ # *

@ # @
@ # @   :   7
@ # @

@ @ @
@ # @   :   8
@ # @

@ @ @
@ # #   :   9
# * *

... ...
[END]
V.3. Prepared Graphical Elements
The graphical engine used by the level generation system had a very small tileset which was not capable of visualising all possible outputs of the level generation system. To that end it was necessary to create a new tileset implementing terrain border 3D models, as well as the models for lava traps and a single monster type. Developing this tileset was the only way to ensure that the post-processing system performs any meaningful work and also a necessity for debugging the whole level generation system.

Most of the elements of the tileset correspond to 3D models that are used to build a smooth contour of the terrain when placed at appropriate locations. Figure V-6 illustrates this with a fragment of terrain that appears to be a continuous curve but internally is represented as a regular grid. The grid is populated with 3D models matched to the context of surrounding cells. There are about 24 contour elements in the tileset but due to symmetry only half that number of 3D models were created. The tileset also contains some decorative elements and variants of the contour tiles.

Figure V-6. Contour tiles joined together

Because of the time constraints for the development of this project there is only one monster type (Figure V-8) and one type of trap (Figure V-7). The type of danger element to be inserted in the level is determined automatically depending on its position. Lava traps are placed when a danger element specified in the blueprint comes in contact with the level terrain. Dangers surrounded by empty space are converted to the shooting monster type. Lava traps can also be stretched horizontally and this requires some additional post-processing in order to detect the position of the trap edges.

Figure V-7. Lava trap
Figure V-8. Shooting adversary
V.4. Output File Format
File output is implemented in the method genBlueprint_CB::save(), invoked after post-processing of the level blueprint. Figure V-9 shows a sample level file generated by this method. The file format starts with a ‘general’ section that specifies the name of the level and the tileset file to be used. This is followed by a list of level ‘segments’, each one representing a 24 x 10 matrix of tileset indexes. For performance reasons it is convenient for the graphical engine to represent levels as a list of equally sized segments and the output method must conform to this format. Following the segments is a list of objects, each one specified by its position and a textual identifier of type. Some object types may require additional parameters.

This textual representation is not a very efficient one, considering the occurrence rate of the nil element, but the file format was designed primarily with readability in mind. Having a separate list of objects, instead of specifying them directly as a part of the segment list, is also due to the requirement for readability and simple means of editing.

Figure V-9. Sample output to a .lvl file (an extract)

[GENERAL]
name: 'Generated Level'
tileset: 'world-1.ts'
segment-count: 10

[SEGMENT]
use-litter-pools: 0
(24 x 10 matrix of tileset indexes, most of them the nil element)

[SEGMENT]
... ...

[Object]
instance-of: 'player'
position: 0 [0, 11, 0]

[Object]
instance-of: 'sentinel'
position: 0 [5, 12, 0]
... ...
[END]
V.4.1. Extending the system to other file formats
In order to implement file output to another format a new class must be derived
from genBlueprint overriding the virtual method genBlueprint::save(). It would
also be necessary to provide a new implementation for the bordering context
matchers, as they depend completely on the contents of the tileset. It is unlikely
that any changes would be required in the smoothing and redundancy removal
context matchers.
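A rough sketch of such an extension (only the names genBlueprint, genBlueprint_CB and save() come from this project; the XML exporter and its details are hypothetical):

#include <cstdio>

class genBlueprint
{
public:
    virtual ~genBlueprint() {}
    virtual void save( const char *fileName ) = 0;   // overridden once per supported file format
    // ... blueprint storage, pp(), get/set, etc.
};

class genBlueprint_XML : public genBlueprint          // hypothetical exporter for another engine
{
public:
    virtual void save( const char *fileName )
    {
        FILE *f = std::fopen( fileName, "w" );
        if ( !f ) return;
        std::fprintf( f, "<level name='Generated Level'/>\n" );   // placeholder output only
        std::fclose( f );
    }
};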
CHAPTER VI
GRAPHICAL USER INTERFACE
In order to be a complete and useful tool the level generation system must be
provided with the user interface necessary for specifying level parameters and
inspecting the output of the system. Figure VI-1 presents the layout of the graphical
user interface developed as part of this project.
Figure VI-1. Layout of the User Interface
At the top of the screen is located the “Training History” window, showing the cumulative reward for
each run and each training session. The user is also presented with two buttons for scrolling the blueprint
view at the left and right edge of the screen, as well as a progress indicator at the bottom.
The user interface is rendered by the same graphical engine that is used for testing
generated levels. An alternative solution would be to use Windows API directly or
through a library such as MFC. This option was ruled out because of the resulting
dependence on the Windows operating system and the unnecessary complication
of the user interface subsystem. The small set of user interface controls that was
implemented for the project consists of buttons, sliders, an edit box control and a
“container” class that implements the functionality of a window and a loadable
dialog box. This class hierarchy provides sufficient functionality for the purposes of
the level generation system.
VI.1. The Level Parameters Object
Level generation parameters are stored in the genParameters class which is
derived from the STL class for an associative array std::map. This class is the link
between the different sub-systems of the level generator. The use of std::map as a
base class allows the handling of parameters to be implemented in a very intuitive
and flexible way. Parameters can be addressed by a string index corresponding to
their name. For each registered parameter the minimal, maximal and default value
can be specified. Once a pointer to the genParameters object is available reading
and writing parameters can be performed as demonstrated in Figure VI-2 below.
(*params)["Level Chaos"].dMin = 0.2;
(*params)["Level Chaos"].dMax = 0.8;
(*params)["Level Chaos"].dValue = 0.5;
Figure VI-2. Registering a new parameter
The minimum, maximum and default values are specified.
Any object can add a new parameter by referring to it for the first time and it automatically becomes available to all interested readers of the parameter value. The classes genAlgorithm and genBlueprint, as well as their derived classes, can specify the parameters they need during the initialisation of the level generator. This is implemented as a call to the methods genAlgorithm::specify(params) and genBlueprint::specify(params).
It is the task of the user interface subsystem to identify these parameters and
provide the necessary controls for their modification. The user interface subsystem
adds a slider control to the parameters dialog for each one of the specified
parameters.
VI.2. Implemented User Interface Controls
The class hierarchy of user interface controls starts with the classes uiSkin and
uiSkinObject (Figure VI-3). Creating an instance of class uiSkin results in the
loading of all texture files in order to draw the user interface. This class assigns a
unique text identifier to every texture and colour variable presented in the skin file,
so that these objects can be accessed easily by user interface controls. The rest of
the user interface hierarchy is based on the class uiSkinObject that stores a pointer
to the uiSkin object.
Figure VI-3. Base Classes of the User Interface
[Class diagram: uiSkin and uiSkinObject form the base of the GUI hierarchy, with uiWindow, uiPointer, uiButton, uiSlider, uiLabel, uiEdit, uiProgress, uiContainer and uiDesktop built on top of the ‘Cubic’ 3D engine and its texture files.]
The graphical engine used does not provide its own hierarchy of user interface classes, so these classes were implemented for the level generation system.
The class uiPointer represents the mouse pointer. This class accesses the texture
with identifier “TX_POINTER” and draws it at the current position of the cursor.
The more important of the two direct children of uiSkinObject is the class uiWindow.
It implements the interactive behaviour of a window and provides several virtual
methods that can be overridden by child classes. The method uiWindow::draw()
draws the window at its current position and uiWindow::reply() is called when a
mouse event occurs within the area of the window. In addition to these virtual
functions, the class also implements several mouse event methods
(e.g. onMouseMove(), onClick()) that can also be overridden.
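As a rough illustration of this mechanism, the sketch below derives a custom control from a uiWindow-like base class. The method names draw(), reply() and onClick() follow the description above, but the signatures and the base class shown here are simplified assumptions rather than the project's actual declarations.

// Simplified stand-in for uiWindow; only the virtual hooks discussed
// above are shown, with assumed signatures.
class uiWindow {
public:
    virtual ~uiWindow() {}
    virtual void draw() {}                         // render at the current position
    virtual void reply(int mouseX, int mouseY) {}  // any mouse event inside the window
    virtual void onClick(int mouseX, int mouseY) {}
};

// Hypothetical control that toggles its state when clicked.
class uiToggleButton : public uiWindow {
public:
    uiToggleButton() : pressed(false) {}
    virtual void draw() { /* draw the button, highlighted if pressed */ }
    virtual void onClick(int, int) { pressed = !pressed; }
private:
    bool pressed;
};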
The functionality of a dialog box is implemented by the class uiContainer, which is
based on uiWindow and is capable of maintaining a list of child windows. The list is
represented as std::vector, and the uiContainer::draw() and uiContainer::reply()
methods ensure all children are drawn at their appropriate positions and respond
to mouse input. This class also introduces a new method, uiContainer::load(file),
that automatically creates the child controls specified in a dialog template file.
Dialog templates reside in the ‘output\gui\dialogs\’ directory of the project. The
level generation system implements several dialogs that use this mechanism.
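Purely as an illustration of this template mechanism, the sketch below reads control definitions from a text file. The actual template format used by the project is not documented here, so the one-line-per-control layout ("type x y width height") is an assumption.

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct ControlSpec { std::string type; int x, y, w, h; };

// Reads one hypothetical control definition per line of the template file.
std::vector<ControlSpec> loadDialogTemplate(const std::string& path) {
    std::vector<ControlSpec> controls;
    std::ifstream file(path.c_str());
    std::string line;
    while (std::getline(file, line)) {
        std::istringstream in(line);
        ControlSpec c;
        if (in >> c.type >> c.x >> c.y >> c.w >> c.h)
            controls.push_back(c);   // one child control per template line
    }
    return controls;
}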
The classes uiLabel, uiEdit, uiButton, uiProgress and uiSlider implement the user
interface controls that their respective names suggest. All of these classes inherit
the uiWindow functionality and implement their specific function by overriding the
methods uiWindow::draw(), uiWindow::reply() and mouse event methods.
The class uiDesktop is a special top-level container. Creating an instance of
uiDesktop will automatically create the uiSkin and uiPointer objects. All other
windows and containers should be inserted into a uiDesktop object, ensuring the
existence of a single skin and mouse pointer per desktop. There is no restriction on
the number of separate desktops that can be used but the level generation system
needs only one instance of this class.
VI.3. Level Generator Dialogs
The level generation system implements four main dialogs, all of them implemented
as classes derived from uiContainer (Figure VI-4). Most of the user interface
controls in the dialogs are created automatically from the corresponding template
files, but in order to respond to user commands it is also necessary to derive a class
and to override the uiContainer::onCommand() method. In the following sections I
present the functionality implemented in each of these classes.
Figure VI-4. Dialogs Implemented for the Level Generation System
[Class diagram: uiStatisticsWindow, uiParametersDialog, uiProgressDialog and uiCompletedDialog, all derived from uiContainer.]
VI.3.1. Parameters Dialog ( uiParametersDialog )
This is the first dialog that is created during the initialization of the system.
Although the parameter dialog is not visible when level generation starts the object
remains active in memory. This class creates all important objects of the level
generation system, namely an instance of genParameters, genAlgorithm_RL and
genBlueprint_CB.
Figure VI-5. Parameters Dialog
During initialization a slider control is added for each parameter contained in the genParameters object.
The constructor of the parameters dialog creates the level generation system
objects and calls the methods genAlgorithm_RL::specify(parameters) and
genBlueprint_CB::specify(parameters). After this step all necessary parameters of
the level generation system are registered in the genParameters object. The next
step is to enter a loop that enumerates the parameters and inserts a slider control
for each one of them. The end result of this process is presented in Figure VI-5.
Originally, the dialog template contains only the edit box controls and the push
buttons. After the specify methods are called the dialog also includes sliders
attached to the blueprint and generation algorithm parameters.
VI. 3.1.1 Generation thread
When the user presses the “Generate” button level generation starts in a separate
thread. This is implemented in the class genGenerationThread and the role of the
parameter dialog is only to create an object of this type and supply it with the
necessary pointers. Having two threads makes it possible to generate levels and
respond to user commands simultaneously. It also helps to separate the graphics
code from the core functionality of the level generation system. The two threads
communicate by means of sharing the common genBlueprint, genAlgorithm and
genParameters objects.
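The structure can be summarised by the following sketch. The project predates C++11, so the real implementation most likely uses the Win32 threading API; std::thread and the simplified shared-state object are used here only to illustrate the two-thread arrangement, and none of the names are taken from the source.

#include <atomic>
#include <thread>

// Stand-in for the shared genBlueprint, genAlgorithm and genParameters objects.
struct SharedState {
    std::atomic<bool> finished;
    SharedState() : finished(false) {}
};

// Worker thread: training and variant generation would run here.
void generateLevel(SharedState* state) {
    // ... train() and generate() ...
    state->finished = true;   // signal completion to the user interface thread
}

int main() {
    SharedState state;
    std::thread worker(generateLevel, &state);   // start the generation thread
    while (!state.finished) {
        // UI thread: draw dialogs, process input, update the progress bar
    }
    worker.join();
    return 0;
}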
VI.3.2. Progress Dialog ( uiProgressDialog )
This class implements a progress tracking dialog and it is also created when the
level generation procedure starts. This dialog maintains a pointer to the level
blueprint and an internal flag indicating the current phase of level generation ( i.e.
training, variant generation, post-processing, or completed). Depending on the
current phase the progress bar is updated accordingly.
VI.3.3 Training History Dialog ( uiStatisticsDialog )
This class is a statistics dialog that draws a graph of the cumulative reward for
each training run. It also prints an indication of the number of successful runs and
measures the training time. All previous sessions are recorded by the class so it is
easy to see the effect of different parameter settings on the performance of the
reinforcement learning algorithm.
VI.3.4. Completion Dialog ( uiCompletedDialog )
This dialog is created at the end of the level generation procedure. As illustrated in
Figure VI-6 it prompts the user to make a choice about the next action of the level
generation system. If the generated level is satisfactory, it can be tested by loading
it into the game engine. This option is available through the “Test” button.
Figure VI-6. The User is Prompted to Test the Level
Once the level generation procedure is completed, the user is allowed to make a choice as to his next action. He can either test the level with the game engine, generate another variant (without re-training) or return to the parameters dialog and specify different settings.
There are two alternative options. In case the training phase was completed in a
satisfactory way but the user wants to try a different variant of the level, it is
possible to do this by pressing the button labelled “Another variant”. This clears the
level blueprint and invokes the genAlgorithm::generate() method without a
preceding call to genAlgorithm::train(). Generation of variants is usually a quick
procedure, provided that the learned policy is good.
The last option available to the user is to return to the parameters dialog. If the
user chooses this option all dialogs of the generation phase are hidden and the
parameters dialog is presented again. This allows for re-training of the system and
further experimentation with different parameter values.
CHAPTER VII
OUTPUT AND EVALUATION
This chapter presents the experiments that were performed on the level generation
system in order to find an optimal parameter setup and also a set of benchmarking
tests designed with the goal of evaluating performance and scalability.
Finding an optimal parameter configuration is an important prerequisite for the
evaluation of the system. The optimisation starts with a “global search” in the
parameter space that helps to identify areas of high performance. Due to
constraints in computational power, it is not possible to perform the global search
with a fine-grained step so a more accurate “local search” focuses on the better
parts of the joint parameter space. With respect to a small area of the parameter
space it is also possible to draw meaningful conclusions on the basis of individual
parameters, rather than the multidimensional parameter setup. At the end of this
stage an optimal parameter configuration is identified and its convergence and
generative performance metrics are reported.
The evaluation of the system continues with a set of performance and scalability
benchmarks measuring Training Time, Variant Generation Performance, Post-processing Time, Scalability with Regard to Level Length and Branching, as well as
the effects of the “chaos” parameter.
Figure VII-1 on the next page shows a sample output of the system. For each of
the displayed levels, the upper image shows the level blueprint and the lower
image is the post-processed level, as rendered by the graphical engine. The
presented levels are of the same length l=100 but the system is capable of
generating much longer levels with sizes in the range [50; 400] cells. Because the
player normally sees only a “window” of the level that scrolls as he moves,
presenting long levels in a printed document can be difficult. Levels of greater size
can be explored directly in the game engine integrated with the level generation
system.
Figure VII-1. Examples of generated levels with different branching factors (b=1, b=2, b=3)
VII.1. Methodology
VII.1.1. Experimental Setup
All of the experiments presented in this chapter are implemented as methods of
the class genGenerationThread. This includes the following methods:
The method genGenerationThread::autodetect() can reproduce the parameter
optimisation tests. In the parameters object genParameters, the step sizes and
ranges for all parameters of the reinforcement learning algorithm are specified. For
a value-based learning algorithm this includes the following:
· Learning rate, a;
· Attenuation of the learning rate (specified in percent of a);
· Reward discount rate, g;
· Parameter of the eligibility traces, l;
· Random action selection probability, e;
· Attenuation of the random action probability (specified in percent of e);
· Maximal number of training runs, NRmax.
Automatic parameter detection is implemented as a loop that re-trains the system
and explores all parameter combinations within the specified ranges and step sizes.
Each trial is repeated N times, where the sample size N is a parameter varying for
different tests. Sample sizes of 25, 35 and 45 were used, as specified in the
particular test. It was the effort of the author to obtain as large a sample as
possible but in some of the tests larger values of N become computationally
intractable.
The method genGenerationThread::autodetect() prints its results in the
training history window and also displays an additional window showing the best
parameter configuration discovered so far. For each sample the parameter vector
<a, amin, e, emin, g, l> and the pair <P, Perror> are recorded in the file “gs.txt” in the
output directory of the project, where P is the performance measure of interest.
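The sketch below illustrates the general shape of such a grid search over two parameters only (learning rate and epsilon), repeated N times per combination. The ranges match those listed in section VII.2.1, but trainAndMeasure() and all other names are hypothetical stand-ins, not the project's actual code.

#include <cstdio>

// Hypothetical stand-in: re-trains the system once with the given
// parameters and returns the resulting convergence measure PC.
double trainAndMeasure(double alpha, double epsilon) {
    // ... training would happen here ...
    return 0.0;
}

void gridSearch(int N) {
    double bestPc = -1.0, bestAlpha = 0.0, bestEpsilon = 0.0;
    for (int ia = 0; ia < 3; ++ia) {              // alpha = 0.1, 0.3, 0.5
        double alpha = 0.1 + 0.2 * ia;
        for (int ie = 0; ie < 4; ++ie) {          // epsilon = 0.05, 0.20, 0.35, 0.50
            double epsilon = 0.05 + 0.15 * ie;
            double sum = 0.0;
            for (int i = 0; i < N; ++i)
                sum += trainAndMeasure(alpha, epsilon);   // one sample of PC
            double pc = sum / N;                          // sample mean
            if (pc > bestPc) { bestPc = pc; bestAlpha = alpha; bestEpsilon = epsilon; }
        }
    }
    std::printf("best PC=%.2f at a=%.2f, e=%.2f\n", bestPc, bestAlpha, bestEpsilon);
}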
Another method, genGenerationThread::benchmark_ttrain(), implements timing
benchmarks. This type of test differs from the detection of optimal parameters in
that it implements a loop over level parameters, rather than the reinforcement
67
learning parameters. Level parameters varied during benchmarks of this type may
include any of the following:
· Level length, l ∈ [50; 500];
· Level branching factor, b ∈ {1; 2; 3};
· Level chaos, c = 1 - β;
where β ∈ (0; 1) is the “greediness” parameter of the Softmax action selection
policy. In the implementation used, this corresponds to a Gibbs distribution with
parameter β [Num05].
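For clarity, the following sketch shows one common way to implement Softmax (Gibbs) action selection with a greediness parameter β; the value table and random number source are illustrative stand-ins, and the code is not taken from the project or from [Num05].

#include <cmath>
#include <cstdlib>
#include <vector>

// Selects an action index with probability proportional to exp(beta * Q(a)).
// Larger beta is greedier; beta close to 0 approaches uniform random selection.
int softmaxSelect(const std::vector<double>& actionValues, double beta) {
    if (actionValues.empty()) return -1;
    std::vector<double> weights(actionValues.size());
    double total = 0.0;
    for (size_t i = 0; i < actionValues.size(); ++i) {
        weights[i] = std::exp(beta * actionValues[i]);   // Gibbs weighting
        total += weights[i];
    }
    double r = (std::rand() / (double)RAND_MAX) * total; // weighted random draw
    for (size_t i = 0; i < weights.size(); ++i) {
        r -= weights[i];
        if (r <= 0.0) return (int)i;
    }
    return (int)weights.size() - 1;                      // numerical safety net
}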
During the benchmarks reinforcement learning parameters are set to the optimal
values discovered in the optimisation phase. The benchmarking method also needs
a way to measure time and time variation. I use the Windows API function
GetTickCount(), which returns the number of milliseconds since the last reboot of
the system. The resolution of this timer is the same as the resolution of the
hardware system timer, which is adequate for the measurement of level generation
times.
It was already discussed in Chapter IV that the learning algorithm used by the
level generation system is SARSA. As a result of the time constraints for the
development of this project it was not possible to rigorously evaluate the
performance of multiple algorithms, although some informal comparison with Q-Learning showed that SARSA generally performs better. Another interesting future
development would be to compare performance of value-based learning and a
Genetic Algorithm implementation of unsupervised learning. Genetic algorithms
have a naturally built-in randomness in the solution, which could be beneficial for
this particular task.
VII.1.2. Performance Measurement
The tests presented here use two main performance measures. The first one is a
measure of convergence and is defined as follows:
PC = NSuccessful / NLimit ,    (Eq. VII-1)
where NSuccessful is the number of training runs resulting in a successful outcome
(i.e. not a fail state) and NLimit is the maximal allowed number of successful runs.
Although each training session has an upper limit of NRmax, in the case of achieving
NLimit successful attempts it is assumed the learning outcome is successful and
further training is not necessary. If the training session ends and 0 < NSuccessful <
NLimit this shows some partial success but no convergence or late convergence. The
value of the parameter is always in the range Pc ∈ [0; 1] and given enough samples
it can be interpreted as convergence probability.
The second parameter corresponds to the probability of generating a correct level
variant. It can be specified with the following equations:
If Pc > 0.2,    PG = 0.2 * PC + 0.8 * (1 - NAttempts / NGMax)    (Eq. VII-2)
Otherwise,     PG = 0.2 * PC
The generative performance PG is a compound measure of the convergence
probability and the probability of generating a successful level variant on the first
attempt. The probability of generating a successful variant is estimated from the
number of attempts it takes to generate the level (NAttempts), normalised by the
upper limit for the number of attempts (NGMax).
In the case of poor convergence (Pc < 0.2) the additional assumption is made that
variant generation will not be successful for this policy. In this case the PG
parameter is proportional only to Pc. This approximation could introduce a minor
error in the results (e.g. if a very bad policy succeeds in generating a valid level
shortly before the value of NGMax is exceeded) but it speeds up testing
considerably. Generating a level with a bad policy results in many unsuccessful
attempts before the upper limit NGmax is exceeded and by adding the filtering
condition this performance bottleneck is resolved.
If a greedy policy is used for level generation, the value for the generative
performance would be PG=1. In practice PG ∈ (0; 1) because levels are generated by
following a Softmax action selection policy. Therefore, when optimising the
parameters of the system the effect on PG should also be measured. For example
parameter setups having a very low value for e could result in acceptable
convergence but still be rejected as incompatible with the generation phase.
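Restated in code, the two measures defined by Eq. VII-1 and Eq. VII-2 can be computed as follows; the function names are illustrative and do not appear in the project source.

// Convergence measure (Eq. VII-1): fraction of training runs that
// reached a successful outcome.
double convergenceMeasure(int nSuccessful, int nLimit) {
    return (double)nSuccessful / (double)nLimit;
}

// Generative measure (Eq. VII-2): combines convergence with the number
// of attempts needed to generate a valid variant.
double generativeMeasure(double pc, int nAttempts, int nGMax) {
    if (pc > 0.2)
        return 0.2 * pc + 0.8 * (1.0 - (double)nAttempts / (double)nGMax);
    return 0.2 * pc;   // poor convergence: assume variant generation fails
}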
VII.1.3. Error measurement
The values of PC, PG and the different timing parameters are measured as the
average value in a sample of size N ∈ {25; 35; 45}. The following definition of the
standard error is applied:
SE = s / √N ,    (Eq. VII-3)
where s is the sample standard deviation and N is the size of the sample. The error margins
are indicated with error bars, or in some cases with a dashed line, in a range of
two standard errors around the measured value.
VII.2. Optimisation of Parameter Settings
VII.2.1. Global Search in the Parameter Space
Performing a fine-grained search in the parameter space becomes computationally
tractable only for a small range of the parameter values so in order to find a global
optimal solution it is necessary to implement a preliminary “global search” with
large step sizes. It is assumed that the performance metrics outlined in the
previous section change smoothly within a small range of the parameter values. In
light of this, the global search should help to identify good areas in the joint space
of parameters and in a subsequent step these areas can be explored further.
The results of the global search are presented in Figure VII-2 on the next page. In
this test the sample size was set to N=45 and the maximal number of training runs
to NRmax=100. At this stage only the convergence metric PC was recorded because
calculating the generation metric PG would only incur additional computational
costs in a preliminary search. The test was performed for the following
combinations of parameters:
· Learning rate, a ∈ [0.1; 0.5], step size 0.2;
· Attenuation of the learning rate [0; 80]%, step size 40%;
· Reward discount rate, g ∈ [0.1; 1], step size 0.2;
· Parameter of the eligibility traces, l ∈ [0.1; 1], step size 0.2;
· Random action selection probability, e ∈ [0.05; 0.5], step size 0.15;
· Attenuation of the random action selection [0; 80]%, step size 40%.
Level length was set to l=100 and the maximal branching factor of b=3 was used.
The small length is intended as a way of avoiding superfluous computation: longer
levels do not present a greater challenge, but training takes longer as it is
necessary to maintain the discovered behaviour over a longer period. The maximal
branching factor presents the greatest difficulty for the building agent, so a value of
b=3 is used in all optimisation tests.
Figure VII-2. Global Search in the Parameter Space
[Plot of the convergence measure PC (peaking at about 0.59) over the explored combinations of a, its attenuation, e, its attenuation, g and l.]
In the graph for PC the dots correspond to samples of the performance measure
and the lines surrounding them reflect the standard error confidence interval. It is
immediately obvious that values e > 0.2 inhibit performance regardless of other
parameter settings. Another fact revealed by the graph is that higher values of
the learning rate a make the performance peak higher and more concentrated towards
lower values of e. As is evident from the confidence intervals, the small-scale variation is
not noise but it is caused by the g and l parameters changing with the highest
frequency.
Figure VII-3 shows the six highest performing parameter combinations discovered
during the global search and their corresponding g parameters. There is a
performance peak at approximately g=0.9, suggesting this is a good value for the
parameter. This difference becomes even more clearly expressed when comparing
the generative performance PG measured for the top six solutions (Fig. VII-4).
Higher values of gamma clearly outperform lower values during the generation phase.
Figure VII-3. Best parameter configurations after global search
[Plot of PC against gamma for the six best configurations, with gamma between 0.25 and 0.90.]
Figure VII-4 also shows the difference in performance when using a Softmax and an
e-greedy policy to introduce level chaos. The Softmax test was performed with β=0.5
and for the e-greedy policy the parameter was also set to e=0.5. Although it is
difficult to judge the equivalence of these parameter settings, they are both the
minimal values that result in apparent level variety and acceptable output. For
values g > 0.5 the Softmax policy is less disruptive and results in a better value of
the PG metric.
Figure VII-4. Influence of the gamma parameter on the generation of level variants
[Two plots of PG against gamma, one for the e-Greedy policy and one for Softmax.]
VII.2.2. Local Search in the Parameter Space
During the local search the same experimental setup was used, except for smaller
step sizes concentrated within a smaller range of the parameter values. This
resulted in an improved parameter set with convergence measure Pc=0.64. The
corresponding parameter settings are <a=0.28, amin=60%, e=0.17, emin=75%,
g=0.9, l=1.0>.
Within a small range it is also reasonable to assume the independence of the
learning parameters and try to analyse them as separate variables. Figures VII-5
and VII-6 show the correlation between performance, the e and a parameters, and
their attenuation rates.
Figure VII-5. Correlation between PC, epsilon and its attenuation
[Three plots of PC against epsilon, for epsilon attenuation rates of 45%, 75% and 100%.]
Figure VII-6. Correlation between PC, alpha and its attenuation
[Three plots of PC against alpha, for alpha attenuation rates of 40%, 60% and 80%.]
In the first row of graphs it is clearly visible that performance peaks at an
attenuation rate of 75% and a value of the epsilon parameter around e=0.15. For
higher and lower attenuation rates the graph lacks this distinct maximum and is
more evenly distributed.
The correlation between performance and the alpha parameter has a different
character. Low attenuation rates appear to shift the peak towards lower values of a
(Pc=0.62 at <a=0.25; amin=40%>) and higher attenuation rates towards higher
values of a (Pc=0.6 at <a=0.4; amin=80%>). These two alternatives appear to
be equally good, but attenuation rates around amin=60% yield a worse maximal
performance.
VII.2.3. Results of Parameter Optimisation
Results of the parameter optimisation phase are summarised in Figure VII-7.
As is evident from the table, the global search value for Pc (1) is improved in the phase
of local search (2). The generation metric PG remains the same in both cases.
Implementing a phase of local search that optimises PG instead of Pc results in a
slight improvement of the generation metric but at the same time a decrease in
convergence (3).
Figure VII-7. Results of parameter optimisation

(1) a = 0.3, a reduction = 60%, e = 0.1, e reduction = 70%, g = 0.9, l = 1.0:
    PC = 0.5, PG (Softmax, β = 0.5) = 0.4, PG (e-Greedy, e = 0.5) = 0.25
(2) a = 0.28, a reduction = 60%, e = 0.17, e reduction = 75%, g = 0.90, l = 1.0:
    PC = 0.64, PG (Softmax, β = 0.5) = 0.4, PG (e-Greedy, e = 0.5) = 0.3
(3) a = 0.29, a reduction = 60%, e = 0.17, e reduction = 75%, g = 0.94, l = 0.94:
    PC = 0.4, PG (Softmax, β = 0.5) = 0.55, PG (e-Greedy, e = 0.5) = 0.25
Because of the high amount of chaos introduced in levels, it is difficult to achieve a
value close to one for both of these parameters. Nonetheless, as the next section
demonstrates, these results are sufficiently good to ensure the quick generation of
levels within some range of the parameters. It should also be noted that the PC and
PG parameters were tested for the maximal value of the branching factor b=3
which corresponds to the most challenging level generation task. In the case of
b={1;2} both parameters are in the range [0.8;1].
VII.3. Generator Benchmark
VII.3.1. Intrinsic evaluation
In this section I present several performance and scalability benchmarks based on
the optimal parameter configuration (Fig. VII-7, (2)). In most of the tests
performance is measured as the time required for completing the training of the
system, the generation of a level variant or some other time measure. It is
therefore essential to give an exact account of the hardware and operating system
configuration used for performing the benchmarks. This information is available in
Figure VII-8.

Figure VII-8. Test configuration
Processor:         Intel Core2 Duo 2.2GHz
System memory:     2GB
Operating system:  Windows XP, 64-bit edition
All of the presented benchmarks use a sample size of N=35, except the Variant
Generation Benchmark, where a smaller sample size of N=25 was used. This
particular benchmark requires the generation of a large number of levels with
increasing length and a re-training of the system for each generated variant.
In this case using a larger sample was not possible.
VII. 3.1.1 Benchmark: Training time
The goal of this benchmark is to evaluate the time required for generating a level
with given length l ∈ [50; 500] and branching factors b ∈ {1, 2, 3} and to determine
how this performance scales. Figure VII-9 shows the results of this benchmark.
It can be argued that the increase in length does not increase the complexity of the
reinforcement learning task but only requires the agent to maintain the discovered
behaviour for a longer period of time. This is why the performance scales linearly
with regard to level length. The more interesting effect is that of increasing the
branching factor. For the maximal level length of 500 and a branching factor of
b=3 the average training time is t=4.1 seconds, as opposed to only 0.6 seconds in
the case of b=1.
This can be explained with the fact that higher branching requires the building agent
to solve a more complex problem. It is necessary to learn how to create a longer
“ladder” out of terrain elements in order to make the upper branches accessible to
the player. This difficulty is concentrated only in the beginning of the level and
accounts for the initial offsets of 0.05, 0.25 and 0.8 seconds respectively for b=1, 2
and 3. It is also necessary to simultaneously prevent all of the branches from
becoming dead ends, which explains why the slope of the graphs increases
proportionally to branching.
Figure VII-9. Training Time Benchmark
[Plot of training time in seconds (0-5) against level length (50-500), for branching factors b=1, b=2 and b=3.]
VII. 3.1.2. Benchmark: Variant generation
The goal of this benchmark is to evaluate the number of attempts required to
generate a valid level in the presence of chaos, as introduced by the Softmax
action selection policy. Figure VII-10 shows the experimental results of this benchmark.
Figure VII-10. Variant generation benchmark
[Plot of the number of variant generation attempts (up to 1200) against level length (150-400), for branching factors b=1, b=2 and b=3.]
Unlike training time, variant generation does not scale well with the increase of
level length. In the range l ∈ [50; 250] the number of attempts is less than 100 for
all branching factors. However,
as the length increases further a huge difference between the branching factors
appears. Levels with b=1 scale very well, whereas the number of attempts grows
up to 1200 in the case of b ∈ {2; 3}. This effect can be explained with the fact that
preventing all of the branches from becoming dead ends becomes an increasingly
difficult task in the presence of level chaos. An analogy can be made with a house
of cards, where someone continuously shuffles random cards at the base of the
construction. As the house gets taller it becomes more likely that a random shuffle
will result in the collapse of the structure. In the case of level generation, each step
brings a small probability of triggering the fail state, and as the building agent
moves along the length of the level this probability gradually accumulates. It is
sufficient to have one impossibly long jump and a branch of the level will be
invalidated.
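For illustration only (these numbers are not measured in the project): if every cell independently carried a probability of, say, 1% of producing an impossible jump in some branch, a 100-cell level would remain valid with probability 0.99^100 ≈ 0.37, while a 400-cell level would survive with probability only 0.99^400 ≈ 0.02, which is exactly the kind of accumulation described above.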
Figure VII-11. Relation between PG and the Level Chaos parameter
[Plot of PG against the level chaos parameter (0.2-0.8) for branching factors b=2 and b=3.]
Figure VII-11 presents an evaluation of the negative effects of level chaos on the
generative performance PG. This test was performed for a level length of l=200. As
is evident from the figure, the probability of generating a valid level with three
branches is PG=0.6 in the range c ∈ [0.2; 0.5]. As the chaos parameter increases
this probability drops to 0.3, at which point it would take many attempts to
generate a valid level. Lower branching factors are not affected by this negative
effect to such an extent. The generative performance decreases slightly for a
branching factor of two, with a minimum of 0.74. The graph for b=1 is not
displayed on the figure because it is the horizontal line PG = 1.
In light of these results it is clear that the chaos factor should be set to the
minimal value that is sufficient to create the appearance of variety in the level. It is
the subjective judgement of the author that a value of the parameter c=0.5 is
sufficient for this purpose and higher values should be avoided.
After considering the graphs in figure VII-11 it becomes clear that the large number
of attempts it takes to generate levels with many branches is mainly caused by the
introduction of level chaos. Therefore, a possible way to improve the performance
of the system is to find a way of creating level variety that is not entirely dependent
on the Softmax action selection policy. One approach to that end would be to modify
the reward function so that the agent is proactive in the creation of varied levels
instead of just responding to a non-deterministic environment.
VII. 3.1.3. Benchmark: Post-processing time
The goal of this benchmark is to evaluate the time required for the post-processing
of levels and how well it scales with level length. Figure VII-12 shows the results of
this benchmark.
Figure VII-12. Post-processing benchmark
[Plot of post-processing time in seconds (up to about 0.07) against level length (50-400).]
The figure reveals a linear dependence between length and post-processing time.
This is not a surprising result as the number of context-matches performed by the
post-processing algorithm also increases linearly with every new column added to
the length of the level. Level branching also increases this measure linearly but the
effect is negligibly small.
VII. 3.1.4 Total level generation time
For a level blueprint with parameters {l=400; b=3} post-processing adds only
tpp=0.06s to the total generation time. In comparison, the average training time
for the same length is tt=4.1s and the time for generating a single variant (i.e.
following the Softmax policy) is measured to be 0.05s. Both training and variant generation
can take multiple attempts in order to generate a valid level. In view of these
performance measures and the PC and PG probabilities it is easy to estimate the
total generation time for different parameter setups. This estimation is presented
in the following table (Figure VII-13).
Figure VII-13. Estimated Generation Times

Parameters     Training       Generation      Post-processing   Total
l=100, b=1     0.06s x 1      0.03s x 1.5     0.015s            0.2s
l=300, b=1     0.2s x 1       0.04s x 10      0.04s             0.7s
l=400, b=1     0.4s x 1       0.05s x 25      0.06s             1.7s
l=100, b=2     0.45s x 1      0.03s x 1.5     0.015s            0.5s
l=300, b=2     1.3s x 1       0.04s x 420     0.04s             18s
l=400, b=2     1.6s x 1       0.05s x 650     0.06s             34s
l=100, b=3     1.04s x 2.2    0.03s x 30      0.015s            3s
l=300, b=3     2.9s x 1.8     0.04s x 1030    0.04s             46s
l=400, b=3     4.1s x 2.4     0.05s x 1100    0.06s             1m 5s
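To illustrate how these totals are composed, the l=400, b=3 row combines as approximately 4.1s x 2.4 + 0.05s x 1100 + 0.06s ≈ 9.8s + 55s + 0.1s ≈ 65s, which corresponds to the listed total of about 1m 5s; the other rows follow the same pattern of (training time x expected training attempts) + (variant generation time x expected attempts) + post-processing time.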
VII.3.2. User Suggestions for Future Development
User evaluation was performed only by a small group of people who have a relevant
interest in video games and in game development. This evaluation, as well as test
plays performed by the author, confirms that generated levels are traversable in the
game engine and the output is valid. Although the sample size was too small to
draw any statistical conclusions about the quality of levels, it proved a valuable
source of ideas for the future improvement of the level generation system. Figure
VII-14 summarises these ideas and proposes possible solutions.
Figure VII-14. User suggestions for further improvement

Problem 1: The small number of enemy types makes the levels less interesting.
Possible solution: Creating more monster and trap types. This is not part of the level generation system, but if the placement of enemies is to be an informed choice, and not completely random, the level generation system would need to differentiate between several enemy types.

Problem 2: No parametric control over the placement of dangers and treasure.
Possible solution: Introducing a difficulty parameter, controlled by the user, and a “difficulty coefficient” CD calculated automatically as the level is generated. CD could be calculated as the number of danger elements placed per level length. If CD diverges significantly from the user specified parameter the fail state would be triggered. Parametric control over the placement of treasure can be implemented in a similar way.

Problem 3: Difficulty does not escalate near the end of levels, which is normally expected to happen.
Possible solution: Solving this problem would require an implementation of (2) and the dynamic adaptation of the CD parameter as a function of the current position in the level. It must be pointed out that by adapting the CD parameter the environment of the building agent will become non-stationary. It must be ensured that these non-stationary changes occur gradually in order for the learning algorithm to perform well.

Problem 4: Enemies are too sparsely located.
Possible solution: This issue can be resolved by implementing (1) and (2).

Problem 5: Lower branches of the level are sometimes obstructed by upper branches, making some of the jumps difficult.
Possible solution: Some investigation revealed this problem is caused by the post-processing rules. It was already discussed that post-processing should not change the accessibility of the levels but only provide a visual improvement. The rules that introduce changes in accessibility should be modified.
CONCLUSION
This document presented the architecture and implementation of a reinforcement
learning level generation system. In the evaluation of the system it was
demonstrated that it can produce valid platform game levels within reasonable
time constraints. It was determined that the length of levels can be varied safely in
the range of 50 to 300 cells for all of the tested branching factors and up to 400 in
the case of a smaller branching factor. The system successfully implements the
tasks of placing enemies and treasure in the level, as evident by inspection of the
output and test plays within the game engine.
Improving scalability with regard to level length and introducing parametric control
over the placement of treasure and rewards would be two beneficial developments
for the current project. Recording the cumulative amount of treasure and danger
placed since the start of the level, or in a sliding window of fixed size, can be the
basis for better parametric control. This value would then be normalized in the
range 0-100% and compared against a user specified parameter. With regard to
the improvement of scalability, it was already discussed that creating a more
proactive building agent could be the solution to this problem. The agent would
actively introduce variety, allowing a smaller value for the level chaos parameter to
be used. During the analysis and evaluation of the system it was discovered that
the chaos parameter is the main obstacle that harms scalability. In order to avoid
this and teach the agent how to be a proactive “chaos maker” it is necessary to
devise some measure of level variety. This could be realised in a way similar to the
cumulative danger and treasure measures.
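A minimal sketch of the sliding-window measure proposed here might look as follows; the window size, the per-cell flag and the function name are all hypothetical choices, not part of the implemented system.

#include <cstddef>
#include <deque>

// Records whether the newest cell contains treasure (or danger) and returns
// the density over the last windowSize cells, normalised to 0-100%.
double windowDensity(std::deque<bool>& window, bool cellHasItem, std::size_t windowSize) {
    window.push_back(cellHasItem);
    if (window.size() > windowSize)
        window.pop_front();                 // keep only the most recent cells
    std::size_t count = 0;
    for (std::size_t i = 0; i < window.size(); ++i)
        if (window[i]) ++count;
    return 100.0 * count / window.size();   // percentage, comparable to a user parameter
}

The value returned could then be compared against a user-specified percentage, triggering the fail state when the divergence becomes too large, in the same way as suggested above for the difficulty coefficient.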
Another feature of the implemented system is level post-processing. This is an
important final step in level generation because it allows the produced output to be
rendered in a game engine. Without post-processing the level is only an abstract
specification that does not have any graphical information associated with it. As
evidence for the correct functioning of this step screenshots of levels rendered in
the 3D engine were presented (Fig. VII-1). Additional level output, including that of
much longer levels, is available and ready for visualisation with the executable files
of this project.
BIBLIOGRAPHY
[3DR02] 3D REALMS, Duke Nukem: Manhattan Project. Video game,
http://www.3drealms.com/dukemp/index.html, (2002).
[Ada01] ADAMS, E., Replayability Part 2: Game Mechanics. in Gamasutra, Article
available at http://www.gamasutra.com/features/20010703/adams_01.htm,
(2001).
[Ada02] ADAMS, D., Automatic Generation of Dungeons for Computer Games.
BSc Thesis, University of Sheffield, pp. 9-13 (2002).
[BA06] BUFFET, O., ABERDEEN, D. The Factored Policy-Gradient Planner. in
Proceedings of the Fifth International Planning Competition, pp. 69–71, (2006).
[Bli00] BLIZZARD ENTERTAINMENT, Diablo II. video game,
http://www.blizzard.com/us/diablo2/, (2000).
[Bou06] BOUTROS, D., A Detailed Cross-Examination of Yesterday and Today’s
Best-Selling Platform Games. in Gamasutra, Article available at
http://www.gamasutra.com/features/20060804/boutros_24.shtml, (2006).
[CM06] COMPTON, K., MATEAS, M., Procedural Level Design for Platform Games. in
Proceedings of the 2nd Artificial Intelligence and Interactive Digital Entertainment
Conference - AIIDE, (2006).
[Com05] DE COMITÉ F., A Java Platform for Reinforcement Learning Experiments.
Laboratoire d'Informatique Fondamentale de Lille, (2005).
[Cry97] CRYSTAL DYNAMICS, Pandemonium II. video game,
http://www.crystald.com, (1997).
[GHJ*95] GAMMA, E., HELM, R., JOHNSON, R., VLISSIDES, J. Design Patterns:
Elements of Reusable Object-Oriented Software. Pearson Education, pp. 35-53,
(1995).
[Inc99] INCE, S., Automatic Dynamic Content Generation for Computer Games.
Thesis, University of Sheffield, (1999).
[JM08] JURAFSKY, D., MARTIN, J., Speech and language processing: An
Introduction. Pearson Education press, pp. 438-442 (2008).
[KLM96] KAELBLING, L., LITTMAN, M., MOORE, A. Reinforcement Learning: A
Survey. in Journal of Artificial Intelligence Research, volume 4, pp. 237-285 (1996).
[Kuz02] KUZMIN V., Connectionist Q-Learning in Robot Control Task. in Scientific
Proceedings of Riga Technical University, (2002).
[Las07] LASKOV, A., Three-dimensional platform game “Jumping Ron”. Bachelor's
Thesis, Technical University of Sofia, (2007).
[MA93] MOORE, A., ATKESON, C. Prioritized Sweeping: Reinforcement Learning
with Less Data and Less Real Time. in Machine Learning, volume 13, pp. 103-130,
(1993).
[Mar00] MARTIN, K., Using Bitmaps for Automatic Generation of Large-Scale
Terrain Models. in Gamasutra, Article available at
http://www.gamasutra.com/features/20000427/martin_pfv.htm, (2000).
[Mar97] MARTZ, P., Generating Random Fractal Terrain. in Game Programmer,
Article available at http://gameprogrammer.com/fractal.html, (1997).
[Mil89] MILLER, G., The Definition and Rendering of Terrain Maps. in SIGGRAPH ’89
Conference Proceedings, pp. 39-48, (1989).
[MSG99] MORIARTY, D., SCHULTZ, A., GREFENSTETTE, J. Evolutionary Algorithms
for Reinforcement Learning. in Journal of Artificial Intelligence Research, volume 11,
pp. 241-276, (1999).
[Nin81] NINTENDO, Donkey Kong. video game, http://nintendo.com (1981).
[Nin87] NINTENDO, Super Mario Bros. video game, http://nintendo.com (1987).
[Nin96] NINTENDO, Super Mario 64. video game, http://nintendo.com (1996).
[Num05] NEUMANN, G., The Reinforcement Learning Toolbox, Reinforcement
Learning for Optimal Control Tasks. Master’s Thesis, Technical University of Graz,
(2005).
[OSK*05] ONG, T.J., SAUNDERS, R., KEYSER, J., LEGGETT, J. Terrain Generation
Using Genetic Algorithms. in Proceedings of the 2005 Conference on Genetic and
Evolutionary Computation, pp. 1463-1470, (2005).
[PS05] PETERS J., SCHAAL S. Natural Actor-critic. in Proceedings of the Sixteenth
European Conference on Machine Learning, pp. 280-291, (2005).
[SB98] SUTTON, R., BARTO, A., Reinforcement Learning: An Introduction. The
MIT Press, pp. 52-82, (1998).
[SCW08] SMITH, G., CHA, M., WHITEHEAD, J., A Framework for Analysis of 2D
Platformer Levels. in Proceedings of the 2008 ACM SIGGRAPH symposium on Video
games, pp. 75-80 (2008).
[Son96] SONY ENTERTAINMENT. Crash Bandicoot. video game, (1996).
[TWA80] TOY, M., WICHMAN, G., ARNOLD, K. Rogue. video game, (1980).
[WDR*93] WHITLEY, D., DOMINIC, S., DAS, R., ANDERSON, C. Genetic
Reinforcement Learning for Neurocontrol Problems. in Machine Learning, volume
13, pp.259-285, (1993).
[WLB*07] WHITE, A., LEE, M., BUTCHER, A., TANNER, B., HACKMAN, L.,
SUTTON, R., RL-Glue. Available at http://rlai.cs.ualberta.ca/RLBB/top.html, (2007).
[Yu08] YU, D., Spelunky. Video game, http://www.derekyu.com, (2008).