Attempting to discover infinite combos in fighting games using hidden Markov models
Gianlucca L. Zuin
Yuri P. A. Macedo
UFMG, Departamento de Ciência da Computação, Brazil
Figure 1: A background from Street Fighter 2, the game that inspired the model that most fighting games later followed.
Abstract
Designing for balance is a core concern in competitive games. Ensuring fairness in player versus player games is a design goal that any game featuring this sort of interaction should, at least to some extent, strive for. Unfortunately, the full set of possibilities given to a player often exceeds the designer's expectations, creating combinations and exploits that can threaten the game's reliability as a balanced and competitive title.
Focusing on an automated solution to one of the main flaws of fighting games, namely infinite or unfair combos, this work discusses the use of Hidden Markov Models to predict whether a sequence of player commands results in a combo. To this end we study two different approaches: predicting the most likely sequence of per-frame player inputs that would result in a combo, and the most likely sequence of player actions, regardless of frame information, that could also result in a combo. Experiments were performed on a fighting game of our own design.
Both supervised and unsupervised learning algorithms were applied; however, due to the excess of noise and particularities of the implemented model, the first approach was unable to successfully predict combos. We then changed our minimal discrete time interval from a game frame to a player action. In this last scenario the HMM is capable of identifying small combos but, when asked to find larger ones, it can only concatenate smaller combos into sequences that cannot be performed in the actual game. Our discussion of the matter and our findings are presented in this paper and should be relevant to this overall topic.
Keywords: Fighting games, Hidden Markov Models, Game Balance, Baum-Welch, supervised HMM learning
Author’s Contact:
[email protected], [email protected]
1 Introduction
Fighting games struggle to achieve balance among their characters. Designing and programming a character's move set is no trivial task and, as such, it is vulnerable to many design flaws, some of which may be a product of the possible interactions within a character's kit rather than an isolated design mistake. While recent games may benefit from patches that fix bugs and flaws, older generations of games had to get it right on the first try. Of course, as a game aged, more of its bugs and intricacies would be found by the player community, which in turn found ways to exploit these discoveries and carefully break the designers' barriers, effectively hampering their efforts toward balance.
A design flaw particular to fighting games and action games that allow player versus player interaction is the existence of infinite combos. A combo is a sequence of attacks that damages an opposing player while preventing that player from taking any actions. Combos are an integral part of many action games, rewarding precise execution of commands and encouraging players to keep practicing. While these strings of attacks should eventually end, allowing the receiving player to retaliate, infinite combos, as the name suggests, do not end as long as the executing player does not miss his inputs. These combos are usually the result of unexpected interactions between a character's actions, where a string of attacks loops with itself indefinitely. Although it is the designers' and programmers' job to avoid such interactions, the exponential number of possible action strings may let some slip into the final version of the game.
Balancing against infinite combos requires that the designers and programmers be fully aware of the intended possibilities created by an action within the game's physics and rules. It is an exhausting task that requires a lot of testing. Even when the combos are not infinite, some attack strings may be stronger in game than what the designers had planned for the character. Either case shows that balancing is a time-consuming task that is also essential to the game's overall design. Yet we believe this task could be automated using pattern and feature recognition machine learning algorithms.
In this work, we attempt to use Hidden Markov Models to predict
whether a sequence of inputs results in combos in a fighting game
of our own design.
The remainder of this paper is organized as follows: Section 2 cites works regarding Artificial Intelligence techniques applied to fighting games and HMM approaches in games. Section 3 introduces basic concepts, such as what an HMM is, the supervised learning algorithm, the Baum-Welch learning algorithm and, once the HMM is trained, how to apply the Viterbi algorithm. Section 4 explains the game developed in order to perform our experiments. Sections 5 and 6 address our HMM modeling and the results obtained, as well as a discussion of them. The last section presents the conclusion, directions for future work and the insights the authors arrived at by the end of this work.
2 Related Work
[Matsumoto and Thawonmas 2004] discusses the use of Hidden Markov Models in video games, using logs of player actions to classify players in Massively Multiplayer Online Games. It compares HMM performance with Memory-Based Reasoning (MBR), a method that predicts values based on the similarity between cases. Although MBR could be another approach to predicting combos, as [Matsumoto and Thawonmas 2004] states in its own discussion, such methods do not exploit the time structures hidden within the action sequences of the players.
Most academic research around fighting games discusses improvements to Artificial Intelligence using various methods. [Yamamoto et al. 2014] proposes the use of the k-nearest neighbor algorithm to develop a bot capable of predicting its opponent's next set of actions and deploying a countermeasure in order to win the match. [Graepel et al. 2004] use reinforcement learning and different reward functions to design an AI with good policies in a commercial game. As in our work, their agent tries to learn relevant information by inference on a Markov decision process. [Saini et al. 2011] attempts to create a bot capable of playing like humans do, highlighting that fighting games nowadays depend heavily on human-versus-human interaction and, as such, an agent capable of mimicking a player is of great value. It classifies each possible set of actions using Naive Bayes and finite state machines to select which set would better mimic a player.
[Andrade et al. 2005] also addresses the difficulty of balancing games and validates its approach in a fighting game. Through reinforcement learning, the authors attempt to create intelligent agents capable of adjusting the difficulty of the game according to the player's skill. The agent receives a negative reward if the game is either too hard or too easy and a positive one otherwise. Although both works tackle a similar problem, our focuses diverge: they wish to balance the game by changing the way the machine controls each character, while we wish to identify flaws in each character.
We could not find any papers or discussions that mention the use of machine learning algorithms to improve or automate other areas of game development, which suggests that this work might be a novelty in the field.
3 Hidden Markov models
Let's look at a simple example. Assume that you are a miner working deep underground and you wish to know whether it is raining or sunny outside. You cannot go out and see, as you are busy working, but you can see new miners arriving and what they are carrying with them. You also know how often it rains in your city.

During the many years you lived in this city, you came to the conclusion that most days are sunny; however, if it rained one day, then most likely the next day would rain as well. After reminiscing about the last few weeks and counting how many days were sunny in a row, and also how many days were rainy in a row before the rain would stop or start, you finally come up with Matrix 1. S denotes Sunny and R denotes Rainy. If it is raining today, there is a 70% chance that tomorrow it will rain as well and a 30% chance that the rain will stop.
        S     R
  S    0.8   0.2
  R    0.3   0.7

Matrix 1: Transition matrix. Expresses the odds of all possible transitions between the states Sunny and Rainy (rows: today, columns: tomorrow).
Some of your co-workers are well prepared while others aren't. Therefore, even if it is sunny outside, some of them will carry an umbrella and, even if it is raining, some of them will not. You know which of them would bring an umbrella and which would not; however, from where you are, you cannot see the faces of the new miners arriving. This implies that you cannot tell whether it is raining outside by only looking at what people are bringing. Generalizing, one can infer Matrix 2, where U denotes that you saw someone with an umbrella and N that he or she had nothing. S and R are the same as before.

        U      N
  S    0.4    0.6
  R    0.82   0.18

Matrix 2: Emission matrix. Expresses the odds of seeing someone carrying or not carrying an umbrella on sunny and rainy days.

Given that you have been in the mine for 5 days, you wish to know whether it is raining today. You remember that it was not raining on the day you came to the mine. In the first two days you did not see a miner with an umbrella; in the last two, however, you did notice people bringing one. So, is it raining today or not?

This type of system is known as a Hidden Markov Model. The states, which we want to know, are hidden, and we only have access to a set of observations which are tied to the states by probabilities. Our goal is to make effective and efficient use of the observable information to gain insight into various aspects of the Markov process [Stamp 2004].

The Markov assumption states that the future is independent of the past: all the information needed to predict it is already encoded in the present [Bryson 1975]. Generalizing, a given state n can be inferred directly from the state n − 1. Therefore, in our miner example, to know whether it is raining on the fifth day we only need to look at the fourth day. By calculating the most likely state of each day, starting with day 1 given day 0, then day 2 given day 1, and so on until day 5 given day 4, we can estimate whether it is raining today, as seen in Figure 2.
Figure 2: Markov chain of the miner problem
Figure 3 models the HMM of the problem using Matrices 1 and 2. The first one, representing the probabilities of the next state given the current one, is called the transition matrix. The second, showing the likelihood of each observation in each state, is called the emission matrix. Besides these two, HMMs use a third matrix, the π matrix, which expresses the initial probabilities and, in this particular example, is shown in Matrix 3.
        S     R
  π    1.0   0.0

Matrix 3: π matrix. Since we know that it was sunny on the first day, the initial probabilities are 1 for Sunny and 0 for Rainy.
Therefore, an HMM is a doubly stochastic process: it has an underlying stochastic process that is not observable (a hidden layer) but can be inferred through another set of stochastic processes that produce a sequence of observable symbols [Rabiner and Juang 1986].
The method to answer the question "is it raining on day 5?" will be explained in the section on the Viterbi algorithm.
Figure 3: Modeling of the HMM matrices for the miner problem.
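To make the example concrete, the three matrices can be written down directly. The short sketch below (plain Python with numpy; the variable names are ours, not part of the original paper) encodes Matrices 1, 2 and 3 so they can be reused in the algorithm sketches later in this section.

```python
import numpy as np

# Hidden states and observation symbols of the miner example.
states = ["S", "R"]          # Sunny, Rainy
symbols = ["U", "N"]         # Umbrella seen, No umbrella

# Matrix 1: transition probabilities a(x_i, x_j); rows = today, columns = tomorrow.
A = np.array([[0.8, 0.2],    # S -> S, S -> R
              [0.3, 0.7]])   # R -> S, R -> R

# Matrix 2: emission probabilities e(x_i, y_s); rows = state, columns = symbol.
E = np.array([[0.4, 0.6],    # P(U|S), P(N|S)
              [0.82, 0.18]]) # P(U|R), P(N|R)

# Matrix 3: initial probabilities (the first day was known to be sunny).
pi = np.array([1.0, 0.0])
```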
3.1 Supervised learning
In the miner example, you were able to create the transition and emission matrices by reminiscing about the past. Supervised training uses statistics of known, labeled samples and estimates transition and emission probabilities from those samples.

Using a sample database to count the number of transitions and the number of emissions, we obtain the frequencies of each state and observation. With this information we can estimate φ = {a(x_i, x_j), e(x_i, y_s)} using the respective frequencies, where a(x_i, x_j) denotes the probability of a transition from state x_i to x_j and e(x_i, y_s) denotes the probability of the observation y_s being seen in state x_i. The steps of the supervised learning algorithm are:
• For each pair of states x_i and x_j, calculate A(x_i, x_j) = the number of x_i → x_j transitions in all samples.
• For each state x_i and symbol y_s, calculate E(x_i, y_s) = the number of emissions of the symbol y_s in the state x_i in all samples.
• a(x_i, x_j) = A(x_i, x_j) / Σ_{k=1}^{n} A(x_i, x_k)
• e(x_i, y_s) = E(x_i, y_s) / Σ_{k=1}^{n} E(x_i, y_k)
The first two steps calculate the frequency of each transition and each emission. The last two use the relative frequency probability [Griffin and Buehler 1999] to estimate a(x_i, x_j) and e(x_i, y_s). When an experiment is observed and repeated multiple times, the relative frequency of occurrence of a given event is a measure of the probability of that event. That is, if n_t is the size of the sample space (the total number of events) and n_v is the number of occurrences of the event v, the probability P(v) of this event occurring can be approximated by the relative frequency P(v) ≈ n_v / n_t.
Let's assume the following labeled sequences of hidden states (first line) and observations (second line):

SSSRRR SRRSSS RRSSSS SSSSSS SSRRRRS
NNUUUN UUUNNN NUUNNN UNNNNU UUUUUUU

For the transitions from Sunny, we have:

• A(S, R) = 3 (transitions occur at 1₄, 2₂ and 5₃, where i_j denotes position j of sequence i)
• A(S, S) = 12
• a(S, R) = 3/(3 + 12) = 0.2
• a(S, S) = 12/(3 + 12) = 0.8

For the transitions from Rain, we have:

• A(R, S) = 3 (transitions occur at 2₄, 3₃ and 5₇)
• A(R, R) = 7
• a(R, S) = 3/(3 + 7) = 0.3
• a(R, R) = 7/(3 + 7) = 0.7

For the emissions in Sunny, we have:

• E(S, U) = 8 (emissions occur at 1₃, 2₁, 3₃, 4₁, 4₆, 5₁, 5₂ and 5₇)
• E(S, N) = 12
• e(S, U) = 8/(8 + 12) = 0.4
• e(S, N) = 12/(8 + 12) = 0.6

For the emissions in Rain, we have:

• E(R, N) = 2 (emissions occur at 1₆ and 3₁)
• E(R, U) = 9
• e(R, N) = 2/(2 + 9) = 0.18
• e(R, U) = 9/(2 + 9) = 0.82
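The counting procedure just described fits in a few lines of code. The sketch below is our own illustration (function and variable names are assumptions, not part of the original system); it estimates the transition and emission probabilities by relative frequency from the five labeled sequences above.

```python
from collections import Counter

def supervised_hmm(state_seqs, obs_seqs, states, symbols):
    """Estimate transition and emission probabilities by relative frequency."""
    A_count = Counter()  # (x_i, x_j) -> number of x_i -> x_j transitions
    E_count = Counter()  # (x_i, y_s) -> number of emissions of y_s in state x_i
    for xs, ys in zip(state_seqs, obs_seqs):
        for x_prev, x_next in zip(xs, xs[1:]):
            A_count[(x_prev, x_next)] += 1
        for x, y in zip(xs, ys):
            E_count[(x, y)] += 1
    # Normalize the counts row by row (relative frequency estimate).
    a = {x: {x2: A_count[(x, x2)] / max(1, sum(A_count[(x, k)] for k in states))
             for x2 in states} for x in states}
    e = {x: {y: E_count[(x, y)] / max(1, sum(E_count[(x, k)] for k in symbols))
             for y in symbols} for x in states}
    return a, e

# The five labeled sequences from the miner example:
a, e = supervised_hmm(["SSSRRR", "SRRSSS", "RRSSSS", "SSSSSS", "SSRRRRS"],
                      ["NNUUUN", "UUUNNN", "NUUNNN", "UNNNNU", "UUUUUUU"],
                      states=["S", "R"], symbols=["U", "N"])
# e["S"]["U"] is 0.4 and e["R"]["U"] is approximately 0.82, as computed above.
```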
3.2 Baum-Welch learning algorithm
Some domains have an abundance of unlabeled data, but labeled data, on the other hand, can be hard to find. In some cases the labeling has to be done manually, which can take a huge amount of time for sufficiently large data sets. An algorithm that can train on this unlabeled data is therefore valuable.
Defining an HMM as λ = (A, E, π), where A represents the transition matrix, E the emission matrix and π the starting probabilities, the Baum-Welch algorithm [Baum 1972] seeks λ* = argmax_λ P(Y | λ), that is, the HMM that maximizes the probability of the observations Y = {y_0, y_1, ..., y_n}. Note that Baum-Welch does not need information about the X sequence, the hidden states associated with each observation, and uses the principles of expectation-maximization.
In order to do this, the algorithm has four steps. The first one is initialization: the HMM λ = (A, E, π) can be initialized with random values, equal distributions or prior knowledge. We should always try to fill the starting HMM with some sort of prior knowledge, since Baum-Welch always converges toward a local maximum [Bilmes et al. 1998]. This not only makes it converge faster but also helps it find a local maximum that is closer to the global maximum.
The next step is the forward procedure. What we seek in this step is the probability of seeing each partial sequence y_1, ..., y_t and ending up in state x_k at each time t.

The base case: α_{x_i}(1) = π_{x_i} · E_{x_i, y_1}

The recursion: α_{x_j}(t+1) = E_{x_j, y_{t+1}} · Σ_{i=1}^{N} α_{x_i}(t) · A_{x_i, x_j}

The third step is the backward procedure. Here we compute the probability of observing the ending partial sequence y_{t+1}, ..., y_T given that the state at time t was x_i.

The base case: β_{x_i}(T) = 1

The recursion: β_{x_i}(t) = Σ_{j=1}^{N} β_{x_j}(t+1) · A_{x_i, x_j} · E_{x_j, y_{t+1}}
From α and β we calculate γ_i(t) = p(X_t = x_i | Y, λ) and ξ_{ij}(t) = p(X_t = x_i, X_{t+1} = x_j | Y, λ), which are used in the update rules that make up the last step. We repeat the forward and backward steps, updating the matrix values, until convergence (e.g., when the probabilities found in the next iteration are no better than the ones found in the previous one).
Pseudo-code for each of these steps can be found in [Sarkar 2000].
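The pseudo-code itself is not reproduced here; as an illustration of the steps described above, the following Python sketch (our own, not taken from the original paper) performs one forward-backward, expectation-maximization iteration over a single observation sequence. The array layout (A[i, j] for transitions, E[i, k] for emissions, observations given as integer indices) is our own assumption.

```python
import numpy as np

def baum_welch_step(A, E, pi, ys):
    """One EM iteration of Baum-Welch over a single sequence of observation indices ys."""
    N, T = A.shape[0], len(ys)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    # Forward procedure.
    alpha[0] = pi * E[:, ys[0]]
    for t in range(1, T):
        alpha[t] = E[:, ys[t]] * (alpha[t - 1] @ A)
    # Backward procedure.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[:, ys[t + 1]] * beta[t + 1])
    # gamma_i(t) = p(X_t = x_i | Y) and xi_ij(t) = p(X_t = x_i, X_{t+1} = x_j | Y).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * E[:, ys[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()
    # Update rules (maximization step).
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_E = np.zeros_like(E)
    for k in range(E.shape[1]):
        new_E[:, k] = gamma[np.array(ys) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_E, new_pi
```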
3.3 Viterbi algorithm
We could find the most likely sequence of weather states by listing all possible combinations of them and computing the probability of the observed sequence for each one. The most likely sequence would be the one that maximizes P(observations | hidden states). In the miner problem, the most likely sequence of weather states would be the one that maximizes:
P(N, N, U, U | S, S, S, S), P(N, N, U, U | S, S, S, R), P(N, N, U, U | S, S, R, S), ..., P(N, N, U, U | R, R, R, R)
This approach is viable, but finding the most likely sequence through brute force is computationally expensive. This is where the Viterbi algorithm shines [Viterbi 1967].
We have already modeled a Hidden Markov Model with state space S, initial probabilities π_k of being in state k and transition probabilities a_{x,k} of transitioning from state x to state k. We also have the observations y_1, ..., y_n. The most likely state sequence x_1, ..., x_n that produces the observations can therefore be found by the recurrence:

V_{1,k} = P(y_1 | k) · π_k
V_{n,k} = max_{x ∈ S} P(y_n | k) · a_{x,k} · V_{n−1,x}
Here V_{n,k} is the probability of the most likely sequence responsible for the first n observations that ends in state k. The Viterbi path can be retrieved by walking backwards from the final state to the starting state following the highest-probability path, or by saving a reference to the previous state at each step.
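A direct implementation of this recurrence with back-pointers is short. The sketch below is our own illustration (reusing the miner matrices defined earlier); it returns the most likely state sequence and its probability.

```python
import numpy as np

def viterbi(A, E, pi, ys):
    """Most likely hidden state sequence for a list of observation indices ys."""
    N, T = len(pi), len(ys)
    V = np.zeros((T, N))          # V[n, k]: best probability ending in state k
    back = np.zeros((T, N), int)  # back-pointers used to recover the Viterbi path
    V[0] = pi * E[:, ys[0]]
    for t in range(1, T):
        for k in range(N):
            scores = V[t - 1] * A[:, k]
            back[t, k] = scores.argmax()
            V[t, k] = E[k, ys[t]] * scores.max()
    # Walk backwards from the most likely final state.
    path = [int(V[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path)), V[-1].max()

# Miner problem, reusing the matrices from the earlier sketch:
A = np.array([[0.8, 0.2], [0.3, 0.7]])
E = np.array([[0.4, 0.6], [0.82, 0.18]])
pi = np.array([1.0, 0.0])
# Observations on days 1-4 were N, N, U, U; with symbols ["U", "N"] that is [1, 1, 0, 0].
path, prob = viterbi(A, E, pi, [1, 1, 0, 0])
```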
Applying this to our miner problem, we obtain the Viterbi path shown in Figure 4.

Figure 4: Steps of the Viterbi algorithm applied to the miner problem. The last 5 days have most likely been sunny.

4 Developed game

To even attempt to approach the problem, we needed a fighting game where we could simply input a player's commands and get back whether the opposing player is stuck in a combo state or not. The first part is relatively easy; the second, however, requires access to information stored within the game's variables, and not many games allow this sort of intrusion. There are very few fighting games that are open source or moddable to the extent we need. For that reason, we decided to continue our research on a fighting game of our own making.

Figure 5: Street Fighter 2 is a classic game that inspired many titles to follow its two-dimensional fighting style. The developed game was designed to replicate this fighting game model. It was built in Gamemaker: Studio, a tool that allows for quick functional prototypes and can be used both by entry-level novices and seasoned game development professionals.
The game was developed for academic research purposes only, using graphical resources from Street Fighter 3 and sound files from Street Fighter 4. The game runs at 30 fps and was programmed using the Gamemaker engine. It replicates the game style of the Street Fighter series while making adaptations that compensate for the lack of some game features and make the fighting system more robust. The game allows for player versus player matches without a time limit.
Our game encodes inputs as symbols that combine an attack input (l = light, m = medium, h = heavy, or - for none) with a number corresponding to the current directional input in the common numpad notation for fighting games (1 = down left, 2 = down, 3 = down forward, 4 = left, 5 = neutral, 6 = right, 7 = up left, 8 = up, 9 = up right). Only one directional input may be read in any frame. Multiple buttons can be read in one frame; however, due to a priority system implemented within the game and the lack of commands that require multiple buttons, if multiple attack buttons are pressed within the same frame the game only registers one, following the priority list ("h" > "m" > "l").
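As an illustration of this notation only (this is not the game's actual code), the snippet below turns one frame of raw controller state into a single symbol, applying the described priority rule when several attack buttons are pressed in the same frame.

```python
def frame_symbol(direction, buttons):
    """Encode one frame as '<direction><attack>' in numpad notation.

    direction: int 1-9 (5 = neutral); buttons: set of pressed attack buttons.
    Only one attack is registered per frame, following the priority h > m > l.
    """
    attack = "-"  # '-' stands for no attack button this frame
    for candidate in ("h", "m", "l"):
        if candidate in buttons:
            attack = candidate
            break
    return f"{direction}{attack}"

# A frame with down-forward held and both light and heavy pressed registers as "3h",
# matching symbols such as "5-" and "6m" used in the tables below.
assert frame_symbol(3, {"l", "h"}) == "3h"
```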
The game was adapted to this project by replacing the player controller input with an input file containing the commands at every frame for player one, and by outputting, at every frame, to a file that records whether player two is in a combo state or not. To optimize training and execution times, the game was stripped of most of its graphical and sound resources and its speed was increased up to 333 times the original game speed, resulting in the maximum fps allowed by the Gamemaker engine.
5 The proposed HMM approach
A frame is the shortest discrete time interval in a game. This is
where physics and graphics simulations occur, as well as any event
checks and any script calls. Fighting games, especially those that
appeal to more competitive audiences, try to remain smooth at 60
frames per second to allow more room for inputs and more diverse
frame data for character actions. At each frame, the game also collects the controller inputs for both players. Most fighting games
store inputs in order to allow certain special attacks that require a
precise sequence of commands.
5.1 Modeling after inputs
Any game requires player inputs, as they are necessary for the most basic level of interaction. As this is the most common information the
algorithm may ask from the player, regardless of the type of game
being played, we believe it is also the most generic set of information upon which a model can be built. One advantage of taking
such a broad approach is that, if successful, it could be applied to
any other fighting game as long as every action is deterministic.
What that means is that for the same set of actions in any given instance of the game, the outcome will always be the same. This is
not the case for games in which characters may have attacks that
generate random results, or have random events that unpredictably
change the current state of the game.
When a player makes an attack, he is unable to take another action for a number of frames. This happens because every attack animation has a cooldown period associated with it. During this time, most of that player's actions are restricted, with very few exceptions. As such, whenever an attack is made, the player can expect to be unable to act for some time. Likewise, when a player is hit by an attack, he is also denied actions for a short number of frames during his damage animation. The amount of time it takes for a character that was hit to become able to act again depends on the attack that hit him. If a player that hits an opponent with an attack manages to attack again before that opponent can recover from the last hit, we call that sequence of attacks a Combo. However, if the opponent is able to recover during that interval, we say that he was Free.
This, however, does not imply that players will not give any inputs to the game while they are unable to act; it is quite the opposite, actually. In order to make a special move, an attack that is usually more powerful than the standard ones, the player needs to give a sequence of inputs in a row, and this is most efficiently done during the time in which the player cannot take a new action. Because of this, between two significant inputs (e.g., two inputs that combo with each other) there are usually many insignificant inputs that do not result in any actions but have a Free or Combo state associated with them because of the opponent's latent state and, therefore, are categorized as noise.
During the execution of the game there are two kinds of information that can be evaluated: the input of the player and its effect, which can be seen as the opponent's state. Since we are able to extract both sequences from the execution of the game, we have two possible models. In one of them we observe the opponent's state and in the other we observe the player's input. The Markov chain for each of them can be seen in Figure 6.
In the first scenario, modeled in Figure 7, each input is a hidden state. By training this HMM we are able to learn the likelihood of each input resulting in a combo and which inputs often follow one another. In this representation we are able to answer questions such as "which is the most likely sequence of inputs that results in a combo of n frames?".
In the second one, the hidden states are Combo and Free. Training this HMM yields different conclusions: given that an opponent is in a combo, we learn the likelihood of it being caused by each input. In this sense, we can answer the question "the sequence x_1, x_2, ..., x_n was most likely a combo of how many frames?".
Figure 6: Markov chains for both possible HMM models in the combo problem. In the upper one we want to infer the sequence of inputs, while in the lower one we want to infer the sequence of Combo/Free states.
The focus of this work is on the first scenario: finding large combos. However, we also explore the second one in order to evaluate how well the HMM behaves in all cases for this particular modeling and how robust it can be.
Figure 7: Modeling of the HMM representing the inputs and the 'Free' (0) and 'Combo' (1) observations. Inputs range over all combinations of the possible attacks (- for none, l for light, m for medium, h for heavy) and all possible directional inputs (1 to 9). This leads to 36 possible hidden states, each having a transition to every other state. Every arrow is associated with a probability.
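For reference, the hidden state space of this first model can be enumerated mechanically, as in the short sketch below (our own illustration).

```python
# 4 attack options (including "no attack") x 9 directions = 36 hidden states.
attacks = ["-", "l", "m", "h"]
directions = range(1, 10)           # numpad notation, 5 = neutral
hidden_states = [f"{d}{a}" for d in directions for a in attacks]
assert len(hidden_states) == 36     # e.g. "5-", "6m", "8h", ...
```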
5.2 Modeling after moves
A move by a player is a consequence of a combination of his inputs
within a time interval. Moves are specific to each character and
each game. In contrast to the previously described input modeling, which is as generic as possible, this modeling approach is very specific to
the attacking character, as most characters in the same game may
share the same set of basic actions while also having a set of special actions unique to them. This also means that for a different
fighting game, the input data will be completely different, due to
different games having different sets of actions. This does not however change the structure of the HMM, as it only affects the number
of possible observations and hidden states.
One of the advantages of this approach is the reduction of noise, as the data becomes much more objective, describing the exact action performed by the player instead of requiring the inputs to be run through the game in order to be interpreted. This, however, comes at the cost of some information loss, but the approach is otherwise similar to the input approach.
Again, as with the input modeling, we attempted two different ways to represent our data. This time, however, we kept the player information (the actions) as our observations in both approaches, and instead changed our definition of a time step from a frame-by-frame to an action-by-action time interval. The difference between the two will be further explained in section 6.4.
Figure 8: Markov chain for the HMM that takes player actions as observations. The only difference between the two action approaches is when a new state is registered.
6 Experiments and discussion
We manually wrote 47 different possible combos, including what we believe to be the longest combo within the game, lasting roughly 240 frames (or 8 seconds). Next, we created ten variations of each combo, containing noise for each non-relevant input in the combo (a non-relevant input is one that is not considered because the character is busy while that input is being read). Along with these 470 combo variations we also created 1500 randomly generated input sequences. All 1970 sequences were then executed through the testing game in order to obtain the sequence of states for each of these input sequences.
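The random part of this data set can be approximated with a few lines of code. The sketch below is our own rough reconstruction; the 300-frame window and the exclusion of the 7 and 9 directionals are taken from the discussion later in this section, while everything else is an assumption.

```python
import random

ATTACKS = ["-", "l", "m", "h"]
DIRECTIONS = [1, 2, 3, 4, 5, 6, 8]   # 7 and 9 were left out of the random set
FRAMES = 300                         # the frame window used in the experiments

def random_input_sequence(frames=FRAMES):
    """One randomly generated per-frame input sequence in the game's notation."""
    return [f"{random.choice(DIRECTIONS)}{random.choice(ATTACKS)}"
            for _ in range(frames)]

random_set = [random_input_sequence() for _ in range(1500)]
```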
6.1 Supervised training and input modeling
Since we have both the [Combo, Free] sequences and the input sequences, we chose to train our HMM through the supervised method. Given its simplicity and the lack of any prior knowledge and representation, it far outshines the unsupervised approach, requiring only a few seconds to execute. The problem with this kind of algorithm is the need for labeled data, which can be difficult to acquire, but that is not an issue in our problem.
We proceeded to train two instances of our HMM, each with a different set. The first one was trained only with the 470 manually generated combos and the second one received all 1970 sequences. The objective of this particular experiment was to observe the HMM's behavior when faced with the nearly complete chaos of the full training set. The HMM with the smaller and better-behaved training set functions as our control group.
The resulting HMMs were fed to the Viterbi algorithm and we queried for the most likely input sequence for the observations:
Combo Combo Combo
Free Combo Combo
Both of them returned the same response, although with different probabilities:

5- 5- 5-

One of the reasons this occurred is that, when a game is generated, both players are unable to take any actions for the first 5 frames
of the game. Because of that, the first five inputs of the game are
always 5- (stay still), which causes the starting probabilities to be 1.0 for 5- and 0.0 for all other states. Not only that, but the transition matrix heavily favors the transition 5- → 5-. However, since the Viterbi algorithm takes the emission probabilities into account alongside the transition ones, this does not explain why 5- was selected, as staying still should be associated with the 'Free' state and its emission probabilities should be close to (0.0, 1.0). Taking a closer look at a segment of the emission probabilities in Table 1, we start to see the problem:
Input    Combo (1) probability    Free (0) probability
5-       0.165115                 0.834885
8-       0.205040                 0.794960
6h       0.250362                 0.749638
4-       0.264810                 0.73519
9-       0.357834                 0.642166
6m       0.254803                 0.745197
2-       0.270237                 0.729763
3m       0.253883                 0.746117
7-       0.352612                 0.647388
3l       0.252662                 0.747338

Table 1: Some emission probabilities in the inputs-by-frame modeling. Combo refers to the observation 1 and Free to the observation 0.
Although the probability of 5- being Free is higher than that of it being Combo, the same is true of everything else. Because of the inserted noise, the probabilities of each input seem to converge to (0.25, 0.75). There are a couple of odd things, though. When counting the 0 to 1 ratio in the files, we conclude that roughly a third of the states are 1's, which should lead to the probabilities being (0.33, 0.66). The other question is why the inputs 8-, 9- and 7- behave differently from everything else.
During the generation of the random sequences, no moves include the 7 or 9 directionals. This was done in order to avoid the player moving away from the enemy (7 represents up left while 9 represents up right). They were, however, inserted as noise in the 470 combos, and because of this they follow the 0.33-0.66 trend of the file.
As for the input 8-, the answer is quite simple. In the 470 combos created, a large portion involves jumping to start the combo. Because of that, there is a shift in probabilities towards the 0 state, since the combo has not started yet and the player needs to spend a few frames going up to arrive at the desired height before attacking. The cases in which 8- is associated with state 1 occur when 8- appears as noise, however. We then remove the first 5 frames of each input sequence and each state sequence and train our HMM again. As expected, because of its abundance in the 470 combos, the most likely sequence becomes:
8- 8- 8-

The last question is why movements like 2- have the same probabilities as attacks such as 6m. As explained before, when a significant input is given, the character the player is controlling starts
an animation. However, its effects are not atomic: it takes a couple of frames for an attack to actually land. This makes an attack such as 6m be treated as a 0 even though it results in a combo later on, because in the frame the action was made the opponent's state was still 0. The other attacks in a combo are treated normally, though. This explains why the probabilities of attack inputs are lower than they should be: the first move of a combo is always labeled 0, and this in turn pushes insignificant sequences to be marked with 1's.
Also, we only look at a limited frame window (300 frames, as this corresponds to ten seconds of in-game time, which is a long time for a single combo). Therefore, if a combo happens during the final frames of a window (which is the case), a couple of 1's are cut off, which also lowers the general probabilities a bit.
We then began a third experiment with a couple of changes. First, we only train over the 1500 random inputs to avoid unwanted tendencies. Second, we remove the first five frames of each sequence to avoid the initial 5- in all inputs. Third, we hard-code the starting probabilities to be equal. The third step should not be needed given the law of large numbers, but we wanted to be sure.
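In code, these three changes amount to a small preprocessing step before training; a minimal sketch, under our own naming assumptions:

```python
def preprocess(input_seqs, state_seqs, n_hidden_states):
    """Third experiment: random inputs only, drop the first 5 frames, uniform pi."""
    trimmed_inputs = [seq[5:] for seq in input_seqs]   # remove the forced '5-' opening
    trimmed_states = [seq[5:] for seq in state_seqs]
    pi = [1.0 / n_hidden_states] * n_hidden_states     # hard-coded equal start probabilities
    return trimmed_inputs, trimmed_states, pi
```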
For a sequence of 20 Frees and a sequence of 20 Combos, we obtained the most likely input sequences:
6l 3l 6m 1m 8h 6- 6l 3l 6m 1m 8h 6- 6l 3l 6m
1m 8h 6- 6l 3l
6h 3m 1m 8h 4- 8l 1m 8h 4- 8l 1m 8h 4- 8l 1m
8h 4- 8l 1m 8h
Although the sequence for the Combos seems promising, looking at the sequence for the Frees shows that these predictions are nothing more than wild guesses. Indeed, this can easily be concluded by looking at the emission matrix (represented in Table 2) and the transition matrix.
Input    Combo (1) probability    Free (0) probability
2-       0.251431                 0.748569
2h       0.246933                 0.753067
6m       0.247652                 0.752348
4m       0.247322                 0.752678
3l       0.248667                 0.751333
2m       0.251471                 0.748529
4-       0.251075                 0.748925
4h       0.247857                 0.752143

Table 2: Some emission probabilities in the inputs-by-frame modeling using only the 1500 random inputs.
The transition matrix is too large to be shown here. All probabilities are close to 0.035; since in these examples there are no moves with the 7 or 9 directionals, this is expected, as 1.0/28 ≈ 0.035. In the HMMs trained with all the examples and with only the 470 examples, the probability of a movement input transitioning to another movement input (e.g., 2- to 4-) is slightly higher. This occurs because of the tendencies present in the manual combos.
When running the HMM modeled after the second scenario, using the inputs as observations, the HMM returned Free for all inputs, in accordance with what we expected given the other HMM's performance.
6.2 Unsupervised training and input modeling
In order to try to improve the HMM obtained, we ran the Baum-Welch algorithm over the HMM trained with the 470 examples, using only the states of the 470 combos, and also using all 1970 examples. The shorter run converged rather fast, requiring only two iterations and about half an hour. The longer one, however, did not converge after over a hundred iterations and over 40 hours of training. Because of that, we decided to omit its results, as they are most likely poor.
The resulting predictions were the same as with supervised learning, which was expected given the extremely fast convergence and the fact that it trained on the same examples. All probabilities did, however, change slightly, as shown in Table 3.
Input    Combo (1) probability    Free (0) probability
5-       0.088287                 0.911713
8-       0.198316                 0.801684
6h       0.263652                 0.736348
4-       0.281583                 0.718417
9-       0.398486                 0.601514
6m       0.268071                 0.731929
2-       0.288073                 0.711927
3m       0.253883                 0.732035
7-       0.396075                 0.603925
3l       0.266460                 0.733540

Table 3: Some emission probabilities in the inputs-by-frame modeling using unsupervised training over the previous emission matrix, shown in Table 1.
The probability of 5- emitting Free increased drastically towards 1, and 8- followed the same pattern. All other inputs, however, showed an increase in the probability of a Combo. Although this seems to be a good improvement, the most likely sequence continues to not be a combo. Also, inputs like 2- and 4- showed an increase in their Combo probabilities, which shows that the model as a whole did not improve much.

6.3 Supervised learning and moves-by-frame modeling

One of the biggest problems in the last two experiments was the noise. In a sequence of multiple inputs, only a small portion of them were relevant and represented what was happening inside the game. This made most inputs be almost randomly labeled. In order to remove, or at least reduce, this behavior, we stop looking at the inputs of the player and instead model our HMM after the actual actions of the player. Figure 9 shows the observations produced for the same set of actions in the two different models.
Figure 9: Difference in observations between the two proposed
models in each frame of the game.
Because we removed the noise between two moves, the 470 controlled combos become only 47 combos repeated 10 times each. Even so, we ran the supervised training on the two sets (the 470 and the 1500 sequences). Table 4 contains some of the emissions obtained for each of them, with approximated probabilities.
Move             Obs.     1500 Set Probability    470 Set Probability
ryu air h        Free     0.97                    0.91
ryu air h        Combo    0.03                    0.09
ryu std idle     Free     0.77                    0.81
ryu std idle     Combo    0.23                    0.19
ryu dash fwrd    Free     0.92                    0.96
ryu dash fwrd    Combo    0.08                    0.04
ryu shoryuken    Free     0.35                    0.33
ryu shoryuken    Combo    0.65                    0.67
ryu crouch m     Free     0.64                    0.62
ryu crouch m     Combo    0.36                    0.38
ryu std l        Free     0.72                    0.76
ryu std l        Combo    0.28                    0.24
ryu hadoken      Free     0.34                    0.31
ryu hadoken      Combo    0.66                    0.69

Table 4: Some emission probabilities in the moves-by-frame modeling for both training sets.
Both tests show relatively close probabilities. Compared with the previous emission probabilities, these seem far better. Although some moves like ryu std l and ryu crouch m are more likely to be Free than Combo, moves that were expected to be free, such as air moves or the dash, have a much higher probability of being Free. We also see that the moves hadoken and shoryuken have a higher probability of being a Combo; in the previous emission tables, no input displayed this behavior.
When we ran the Viterbi algorithm, it returned the combo
ryu_shoryuken, ryu_shoryuken, ...
In theory, this combo would be interesting but, in practice, it’s not.
In the game, the shoryuken move is a jumping uppercut. After the
move is made, the player's character becomes unable to act for a couple of frames: he is locked in the ryu shoryuken landing animation, shown in
Figure 10. However, as seen in Figure 9, a move is followed by itself in a large portion of the observation sequence. This heavily impacts the transition probabilities, making a move extremely unlikely to change. This causes the move ryu shoryuken landing to be ignored and suggests that the sequence shoryuken, shoryuken, ... is a combo, which in reality is not even a valid sequence.

Figure 10: Example of the Shoryuken attack followed by the landing action after a Shoryuken.

Another interesting factor is the duration of each action. Special attacks like hadoken and shoryuken last longer than simple attacks like standing l. This behavior gives special attacks a much larger probability of being a combo, simply because they last longer and therefore have more frames labeled as a combo.
6.4 Supervised learning and moves modeling
Attempting to further avoid the noise generated by multiple frames of the same action, we abstracted away from our previous discrete time interval, the single frame. Although it is the shortest time step in a game, it creates other problems: the number of consecutive instances of the same action depends on the cooldown of the action itself rather than on any informative variable we could derive or use to improve our modeling.
For this attempt, we changed our HMM time step from a single frame to the first frame of each new action by the attacking player. This approach generates a considerably smaller and more condensed set of information that does not record the timestamp of an action: the actions are sequential, but the number of frames between them is unknown. This change greatly reduces the amount of noise generated by multiple instances of the same input, but it also takes away our capacity to run the obtained sequence of actions within the game automatically, as we now have to write the inputs for a set of actions by hand and without knowledge of each action's timestamp. For this, we simply assume that each action within a sequence selected by the HMM occurs as soon as possible. That is, we assume that between the beginning of an action and the beginning of the next one there is the minimum possible number of idle frames, which in most cases is zero.
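The conversion from the frame-by-frame sequence to the action-by-action sequence is essentially a run-length collapse of consecutive frames of the same move. A minimal sketch (our own illustration; a real implementation would also need the game to flag when the same move is re-triggered on consecutive frames):

```python
def collapse_to_actions(frame_moves, frame_states):
    """Keep only the first frame of each new action, with its Combo/Free label."""
    actions, labels = [], []
    for move, state in zip(frame_moves, frame_states):
        if not actions or move != actions[-1]:
            actions.append(move)
            labels.append(state)
    return actions, labels

# Repeated frames of the same move collapse into a single time step.
moves, obs = collapse_to_actions(
    ["ryu_standing_l", "ryu_standing_l", "ryu_tatsu_spin", "ryu_tatsu_spin"],
    ["Free", "Combo", "Combo", "Combo"])
# moves == ["ryu_standing_l", "ryu_tatsu_spin"]; obs == ["Free", "Combo"]
```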
Next state              Probability
ryu standing idle       0.0746869
ryu standing m          0.00447227
ryu attack overhead     0.0228086
ryu jump neutral        0.13059
ryu attack roundhouse   0.0330948
ryu tatsu start         0.0442755
ryu standing h          0.0778175
ryu standing l          0.0594812
ryu crouching l         0.110018
ryu crouching h         0.0997317
ryu shoryuken           0.0210197
ryu walk backwards      0.0545617
ryu hadoken             0.0277281
ryu crouching m         0.106887
ryu walk forward        0.0178891
ryu crouching idle      0.0514311
ryu throw forward       0.000894454
ryu dash backwards      0.000447227
ryu attack lowkick      0.0290698

Table 6: All transition probabilities from the state ryu standing m in the moves modeling.
In this last experiment, we ran the supervised training over all 1970 samples of attacks we had. This time the results were really promising. Figure 11 shows a few examples of combos found after the Viterbi algorithm was run, and Tables 5 and 6 show a segment of the emission and transition matrices. These combos, however, are all small, lasting only a couple of actions. The emission and transition probabilities also appear to be far better than in the previous experiments, as expected from the results.
Move                  Combo (1) probability    Free (0) probability
ryu standing idle     0.209363                 0.790637
ryu standing m        0.445886                 0.554114
ryu attack overhead   0.437616                 0.562384
ryu jump neutral      0.123120                 0.876880
ryu air m             0.009825                 0.990174
ryu jump fall         0.0                      1.0
ryu air l             0.023041                 0.976959
ryu tatsu spin        0.628571                 0.371429
ryu standing h        0.372345                 0.627655
ryu shoryuken         0.595197                 0.404803

Table 5: Some emission probabilities in the moves modeling.
The combos found were standing l + standing l + standing l; standing h + tatsu spin; crouching m + tatsu spin; and overhead + shoryuken.

When prompted to find larger combos, the HMM returned a repetition of these small combos. One of the reasons behind this is that most moves are followed by the idle animation, especially tatsu spin, with a probability of nearly 95%. This is the same animation played at the beginning of a match. This simulates a 'reset' after a small combo, and the HMM tends to simply repeat these combos. The other reason is that the HMM does not have enough information to know when a move can be repeated, which, for most of the suggested sequences, would take a couple of seconds.
Figure 11: Four different combos found when running the Viterbi
algorithm for many combo sizes. Any combo bigger than 9 moves
is a repetition of the first example. For the ones between 1 and 8
moves, Viterbi returns different combinations of all of them.
7 Conclusion
Through experiments with a supervised HMM, and also an unsupervised HMM, we realized that the effects of noise and delay were severe enough to overshadow the true patterns that lead to combos. These results convince us that raw inputs do not serve as a modeling criterion, even when described by a sequential learning algorithm. The Markov assumption states that the future is independent of the past. In our case this does not hold, as the observation of a following frame depends mostly on a hidden state somewhere further in the past. We could specialize the algorithm to avoid these flaws, or even try to predict the far future from a past frame instead of trying to predict the following frame, but since this diverges from the input approach, it leads us to believe that it is not a reliable modeling basis in the first place.
On the other hand, when analyzing the actions of a player rather than his inputs, the results are more conclusive and precise. The downside of this approach is that the HMM becomes overspecialized to this specific problem.
Future work intends to explore different and even more specialized modeling approaches. One of these could be a modification of the standard Viterbi algorithm in which the repetition of a subset of sequences is penalized. This new view of the problem is sure to bring different challenges and difficulties, even more so when trying to adapt it to other fighting games, but we hope that these approaches will bring fewer complications than the more generic ones.
References
Andrade, G., Ramalho, G., Santana, H., and Corruble, V. 2005. Challenge-sensitive action selection: an application to game balancing. In Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, IEEE, 194–200.

Baum, L. E. 1972. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3, 1–8.

Bilmes, J. A., et al. 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute 4, 510, 126.

Bryson, A. E. 1975. Applied optimal control: optimization, estimation and control. CRC Press.

Graepel, T., Herbrich, R., and Gold, J. 2004. Learning to fight. In Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, 193–200.

Griffin, D., and Buehler, R. 1999. Frequency, probability, and prediction: easy solutions to cognitive illusions? Cognitive Psychology 38, 1, 48–78.

Matsumoto, Y., and Thawonmas, R. 2004. MMOG player classification using hidden Markov models. In Entertainment Computing–ICEC 2004. Springer, 429–434.

Rabiner, L., and Juang, B.-H. 1986. An introduction to hidden Markov models. ASSP Magazine, IEEE 3, 1, 4–16.

Saini, S. S., Dawson, C. W., and Chung, P. W. 2011. Mimicking player strategies in fighting games. In Games Innovation Conference (IGIC), 2011 IEEE International, IEEE, 44–47.

Sarkar, A. 2000. Yet another introduction to hidden Markov models. Canada: School of Computing Science, Simon Fraser University.

Stamp, M. 2004. A revealing introduction to hidden Markov models. Department of Computer Science, San Jose State University.

Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Information Theory, IEEE Transactions on 13, 2, 260–269.

Yamamoto, K., Mizuno, S., Chu, C. Y., and Thawonmas, R. 2014. Deduction of fighting-game countermeasures using the k-nearest neighbor algorithm and a game simulator. In Computational Intelligence and Games (CIG), 2014 IEEE Conference on, IEEE, 1–5.