Attempting to discover infinite combos in fighting games using hidden markov models

Gianlucca L. Zuin, Yuri P. A. Macedo
UFMG, Departamento de Ciência da Computação, Brazil

Figure 1: A background from Street Fighter 2, the game that inspired the model which most fighting games followed.

Abstract

Designing for balance is core in competitive games. Ensuring fairness in player vs player games is a design goal that any game featuring this sort of interaction should, at least to some extent, strive for. Unfortunately, it often happens that the whole of the possibilities given to a player exceeds the designer's expectations, creating combinations and exploits that sometimes threaten the game's reliability as a balanced and competitive title. Focusing on the search for an automated solution to one of the main flaws of fighting games, namely infinite or unfair combos, this work discusses the use of Hidden Markov Models to predict whether a subset of player commands results in a combo. To this end we study two different approaches: predicting the most likely sequence of player inputs at each frame that would result in a combo, and predicting the most likely sequence of player actions, regardless of frame information, that could also result in a combo. Experiments were performed on a fighting game of our own design. Both supervised and unsupervised learning algorithms were applied; however, due to the excess of noise and particularities of the implemented model, the first approach was unable to successfully predict combos. We then changed our minimal discrete time interval from a game frame to a player action. In this last scenario the HMM is capable of identifying small combos but, when asked to find larger ones, it can only append smaller combos in ways that cannot be performed in the actual game. Our discussion of the problem and our findings are presented in this paper and should be relevant to this overall discussion.

Keywords: Fighting games, Hidden Markov Models, Game Balance, Baum-Welch, supervised HMM learning

Author's Contact: [email protected], [email protected]

1 Introduction

Fighting games struggle to achieve balance among their characters. To design and program a character's move set is no trivial task and, as such, it is vulnerable to many design flaws, some of which may be a product of the possible interactions between characters' kits rather than an isolated design mistake. And while recent games may benefit from patches that fix bugs and flaws, older generations of games had to get it right on the first try. Of course, as a game aged, more of its bugs and intricacies would be found by the player community, which in turn found ways to exploit these discoveries and carefully break through the designers' barriers, effectively hampering their efforts for balance.

A design flaw particular to fighting games and action games that allow player vs player interaction is the existence of infinite combos. A combo is a sequence of attacks by a player that damages an opposing player while not allowing the receiving player to take any action. Combos are an integral part of many action games, rewarding precise execution of commands and encouraging players to keep on practicing. While these strings of attacks should eventually end, allowing the receiving player to retaliate, infinite combos, as the name suggests, do not end as long as the executing player does not miss his inputs.
These types of combos are usually the result of unexpected interactions between a character's actions, where a string of attacks loops with itself indefinitely. Although it is the designers' and programmers' job to avoid such interactions, the exponential number of possible action strings may let some slip into the final version of the game. Balancing against infinite combos requires that the designers and programmers be fully aware of the intended possibilities created by an action within the game's physics and rules. It is an exhausting task that requires a lot of testing. Even when combos are not infinite, some attack strings may be stronger in game than what the designers had planned for the character. Either case shows that balancing is a time consuming task that is also essential to the game's overall design. Yet, we believe this task could be automated by using pattern and feature recognition machine learning algorithms. In this work, we attempt to use Hidden Markov Models to predict whether a sequence of inputs results in a combo in a fighting game of our own design.

The remainder of this paper is organized as follows: Section 2 cites works regarding Artificial Intelligence techniques applied to fighting games and HMM approaches in games. Section 3 introduces basic concepts, such as what an HMM is, the supervised learning algorithm, the Baum-Welch learning algorithm and, once the HMM is trained, how to apply the Viterbi algorithm. Section 4 explains the game developed in order to perform our experiments. Sections 5 and 6 address our HMM modeling and the results obtained, as well as a discussion of them. The last section brings the conclusion, directing future work and the insights the authors came to at the end of this work.

2 Related Work

The paper [Matsumoto and Thawonmas 2004] discusses the use of Hidden Markov Models on video games, using logs of player actions to classify players in Massive Multi-player Online Games. It compares HMM performance with Memory-Based Reasoning, a method which predicts values based on the similarity of other known cases. Although MBR could be another approach to predicting combos, as [Matsumoto and Thawonmas 2004] states in its own discussion, such methods do not exploit the time structure hidden within the action sequences of the players.

Most academic research around fighting games discusses improvements to Artificial Intelligence using various methods. [Yamamoto et al. 2014] proposes the use of the K-Nearest Neighbor algorithm to develop a bot capable of predicting its opponent's next set of actions and deploying a countermeasure in order to win the match. [Graepel et al. 2004] use reinforcement learning and different reward functions to design an AI with good policies in a commercial game. As in our work, their agent tries to learn relevant data by inference in a Markov decision process. [Saini et al. 2011] attempts to create a bot capable of playing like humans do, highlighting that fighting games nowadays are highly dependent on human-human interaction and, as such, an agent capable of mimicking a player is of extreme value. It classifies each possible set of actions using Naive Bayes and finite state machines to select which set would better mimic a player.

[Andrade et al. 2005] also addresses the difficulty of balancing games and uses a fighting game as validation.
Through the use of reinforcement learning, they attempt to create intelligent agents capable of adjusting the difficulty of the game according to the player's skill. The agent receives a negative reward if the game is either too hard or too easy and a positive one otherwise. Although we both tackle a similar problem, our focuses diverge: they wish to balance the game by changing the way the machine controls each character, while we, on the other hand, wish to identify flaws in each character.

We could not find any papers or discussions mentioning the use of machine learning algorithms to improve or automate other areas of game development, which suggests that this work might be a novelty in the field.

3 Hidden markov models

Let us look at a simple example. Assume that you are a miner working deep underground and you wish to know whether it is raining or sunny outside. You cannot go out and see, as you are busy working, but you can see new miners arriving and what they are carrying with them. You also know how often it rains in your city. During the many years you have lived in this city, you came to the conclusion that most days are sunny; however, if it rained one day, then most likely the next day would rain as well. After reminiscing about the last few weeks and counting how many days were sunny in a row, and also how many days were rainy in a row before the rain would stop or start, you finally come up with Matrix 1. S denotes Sunny and R denotes Rain. If it is raining today, there is a 70% chance that tomorrow will rain as well and a 30% chance that the rain will stop.

          S     R
    S    0.8   0.2
    R    0.3   0.7

Matrix 1: Transition matrix. Expresses the odds of all possible transitions between the states Rainy and Sunny (rows denote the current day, columns the next day).

Some of your co-workers are well prepared while others are not. Therefore, even if it is sunny outside, some of them will carry an umbrella and, even if it is raining, some of them will not. You know the ones that would bring an umbrella and those that would not; however, from where you are, you cannot see the faces of the new miners arriving. This implies that you cannot infer whether it is raining outside only by looking at what people are bringing.
Generalizing, one can infer Matrix 2, where U denotes that you saw someone with an umbrella and N that he or she had nothing with him or her. S and R are the same as before.

          S     R
    U    0.4   0.82
    N    0.6   0.18

Matrix 2: Emission matrix. Expresses the odds of seeing someone carrying or not carrying an umbrella on sunny and rainy days.

Given that you have been at the mine for 5 days, you wish to know whether or not it is raining today. You remember that it was not raining on the day you came to the mine. In the first two days you did not see a miner with an umbrella; however, in the last two you did notice people bringing one. So, is it raining or not today?

This type of system is known as a Hidden Markov Model. The states, which we want to know, are hidden and we only have access to a set of observations which are tied to the states by probabilities. Our goal is to make effective and efficient use of the observable information to gain insight into various aspects of the Markov process [Stamp 2004].

The Markov assumption states that the future is independent of the past: all the information needed to predict it is already encoded in the present [Bryson 1975]. Generalizing, a given state n can be inferred directly from the state n - 1. Therefore, in our miner example, to know whether or not it is raining on the fifth day we only need to look at the fourth day. By calculating the most likely state of each day, starting with day 1 given day 0, day 2 given day 1, and so on until day 5 given day 4, we can estimate whether it is raining today, as seen in Figure 2.

Figure 2: Markov chain of the miner problem.

Figure 3 models the HMM of the problem using Matrices 1 and 2. The first one, representing the probabilities of the next state given the current one, is called the Transition matrix. The second matrix, showing the likelihood of each observation in each state, is called the Emission matrix. Besides these two, HMMs use a third matrix, called the π matrix, which expresses the initial probabilities and, in this particular example, can be viewed in Matrix 3.

    π =   S   1.0
          R   0.0

Matrix 3: π matrix. Since we know that it was sunny on the first day, the initial probabilities for each state are 0 for Rainy and 1 for Sunny.

Figure 3: Modeling of the HMM matrices for the miner problem.

Therefore, an HMM is a doubly stochastic process. It has an underlying stochastic process that is not observable (a hidden layer) but can be inferred through another set of stochastic processes that produce a sequence of observable symbols [Rabiner and Juang 1986]. The method to answer the question "is it raining on day 5?" will be explained in the section on the Viterbi algorithm.

3.1 Supervised learning

In the miner example, you were able to create the Transition and Emission matrices by reminiscing about the past. Supervised training uses statistics of known labeled samples and estimates transition and emission probabilities from those samples. Using a sample database to count the number of transitions and the number of emissions, we obtain the frequencies of each state and observation. With this information we can estimate φ = (a(x_i, x_j), e(x_i, y_s)) using the respective frequencies, where a(x_i, x_j) denotes the probability of a transition from state x_i to state x_j and e(x_i, y_s) denotes the probability of the observation y_s being seen in state x_i. The steps of the supervised learning algorithm are:

• For each pair x_i and x_j, calculate A(x_i, x_j) = number of x_i → x_j transitions in all samples.
• For each pair x_i and y_s, calculate E(x_i, y_s) = number of emissions of the symbol y_s in the state x_i in all samples.
• a(x_i, x_j) = A(x_i, x_j) / Σ_{k=1}^{n} A(x_i, x_k)
• e(x_i, y_s) = E(x_i, y_s) / Σ_{k=1}^{n} E(x_i, y_k)

The first two steps calculate the frequency of each transition and each emission. The last two use the relative frequency probability [Griffin and Buehler 1999] to estimate a(x_i, x_j) and e(x_i, y_s). When observing and repeating an experiment multiple times, the relative frequency of occurrence of a given event is a measure of the probability of this event. That is, if n_t is the size of the sample space (the total number of events) and n_v is the number of occurrences of the event v, the probability P(v) of this event occurring can be approximated by the relative frequency P(v) ≈ n_v / n_t.

Let us assume the state sequences

    SSSRRR   SRRSSS   RRSSSS   SSSSSS   SSRRRRS

with the corresponding observation sequences

    NNUUUN   UUUNNN   NUUNNN   UNNNNU   UUUUUUU

In what follows, s:p denotes position p of sequence s. For the transitions from Sunny, we have:

• A(S, R) = 3. Transitions occur at 1:4, 2:2 and 5:3.
• A(S, S) = 12
• a(S, R) = 3/(3 + 12) = 0.2
• a(S, S) = 12/(3 + 12) = 0.8

For the transitions from Rain, we have:

• A(R, S) = 3. Transitions occur at 2:4, 3:3 and 5:7.
• A(R, R) = 7
• a(R, S) = 3/(3 + 7) = 0.3
• a(R, R) = 7/(3 + 7) = 0.7

For the emissions in Sunny, we have:

• E(S, U) = 8. Emissions occur at 1:3, 2:1, 3:3, 4:1, 4:6, 5:1, 5:2 and 5:7.
• E(S, N) = 12
• e(S, U) = 8/(8 + 12) = 0.4
• e(S, N) = 12/(8 + 12) = 0.6

For the emissions in Rain, we have:

• E(R, N) = 2. Emissions occur at 1:6 and 3:1.
• E(R, U) = 9
• e(R, N) = 2/(2 + 9) = 0.18
• e(R, U) = 9/(2 + 9) = 0.82
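To make the counting procedure concrete, the following minimal Python sketch estimates the transition and emission probabilities from the five labeled miner sequences above. It is illustrative code only; the variable names and the printed summary are of our choosing.

```python
# Minimal sketch of the supervised (counting) estimation above,
# applied to the five labeled miner sequences. Names are illustrative.
from collections import Counter

states = ["S", "R"]          # Sunny, Rainy
symbols = ["U", "N"]         # Umbrella, Nothing

state_seqs = ["SSSRRR", "SRRSSS", "RRSSSS", "SSSSSS", "SSRRRRS"]
obs_seqs   = ["NNUUUN", "UUUNNN", "NUUNNN", "UNNNNU", "UUUUUUU"]

A = Counter()  # A[(xi, xj)]: number of xi -> xj transitions
E = Counter()  # E[(xi, ys)]: number of emissions of ys while in state xi

for xs, ys in zip(state_seqs, obs_seqs):
    for xi, xj in zip(xs, xs[1:]):
        A[(xi, xj)] += 1
    for xi, y in zip(xs, ys):
        E[(xi, y)] += 1

# Relative frequencies give the estimated probabilities.
a = {(xi, xj): A[(xi, xj)] / sum(A[(xi, xk)] for xk in states)
     for xi in states for xj in states}
e = {(xi, y): E[(xi, y)] / sum(E[(xi, yk)] for yk in symbols)
     for xi in states for y in symbols}

print(a)  # a[('S','S')] is roughly 0.8, a[('R','R')] = 0.7
print(e)  # e[('S','U')] = 0.4, e[('R','U')] is roughly 0.82
```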
3.2 Baum-Welch learning algorithm

Some domains have an abundance of unlabeled data, but labeled data, on the other hand, can be hard to find. In some cases the labeling has to be done manually, which can take a huge amount of time for large data sets. An algorithm that can use this unlabeled data for training is therefore valuable.

Defining an HMM as λ = (A, E, π), where A represents the transition matrix, E the emission matrix and π the starting probabilities, the Baum-Welch algorithm [Baum 1972] seeks λ* = argmax_λ P(Y | λ), that is, the HMM that maximizes the probability of the observations Y = {y_0, y_1, ..., y_n}. Note that Baum-Welch does not need information from the X sequence, the hidden states associated with each observation, and uses the principles of expectation-maximization.

In order to do this, the algorithm has four steps. The first one is the initialization. The HMM λ = (A, E, π) can be initialized with random values, equal distributions or previous knowledge. We should always try to fill the starting HMM with some sort of prior knowledge, since Baum-Welch always converges toward a local maximum [Bilmes et al. 1998]. This not only makes it converge faster but also helps it find a local maximum that is closer to the global maximum.

The next step is the forward procedure. What we seek in this step is the probability of seeing each partial sequence y_1, ..., y_t ending up in each state x_k at each time t.

The base case: α_{x_i}(1) = π_{x_i} · E_{x_i, y_1}

The recursion: α_{x_j}(t + 1) = E_{x_j, y_{t+1}} · Σ_{i=1}^{N} α_{x_i}(t) · A_{x_i, x_j}

The third step is the backward procedure. This is the probability of ending a partial sequence y_{t+1}, ..., y_T given that the state at time t was x_i.

The base case: β_{x_i}(T) = 1

The recursion: β_{x_i}(t) = Σ_{j=1}^{N} β_{x_j}(t + 1) · A_{x_i, x_j} · E_{x_j, y_{t+1}}

From α and β we calculate γ_i(t) = p(X_t = x_i | Y, λ) and ξ_{ij}(t) = p(X_t = x_i, X_{t+1} = x_j | Y, λ), which are used in the update rules and constitute the last step. We repeat the forward and backward steps, updating the matrix values, until convergence (e.g., until the probabilities found in a new iteration no longer improve on those found in the previous one). The pseudo-code of each step can be found in [Sarkar 2000].

3.3 Viterbi algorithm

We could find the most likely sequence of weathers by listing all possible combinations of them and computing the probability of the observed sequence for each one. The most likely sequence would be the one that maximizes P(observation | hidden state). In the miner problem, the most likely sequence of weather would be the one that maximizes:

P(N, N, U, U | S, S, S, S), P(N, N, U, U | S, S, S, R), P(N, N, U, U | S, S, R, S), ..., P(N, N, U, U | R, R, R, R)

This approach is viable, but finding the most likely sequence through brute force is computationally expensive. This is where the Viterbi algorithm shines [Viterbi 1967]. We have already modeled a Hidden Markov Model with state space S, initial probabilities π_k of being in state k and transition probabilities a_{x,k} of transitioning from state x to state k. We also have the observations y_1, ..., y_n. Therefore, the most likely state sequence x_1, ..., x_n that produces the observations can be found by the recurrence:

    V_{1,k} = P(y_1 | k) · π_k
    V_{n,k} = max_{x∈S} P(y_n | k) · a_{x,k} · V_{n-1,x}

where V_{n,k} is the probability of the most likely sequence responsible for the first n observations that ends in k. The Viterbi path can be retrieved by looking backwards from the goal state until the starting state, following the highest probability path, or by saving a reference to the previous state. Applying this to our miner problem, we obtain the Viterbi path shown in Figure 4.

Figure 4: Steps of the Viterbi algorithm applied to the miner problem. The last 5 days have most likely been sunny.
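A minimal Python sketch of this recurrence, applied to the miner HMM defined by Matrices 1-3, is shown below. The dictionary-based representation is purely illustrative and favors readability over efficiency.

```python
# Minimal sketch of the Viterbi recurrence above, on the miner HMM
# (Matrices 1-3). Dictionary-based for readability, not efficiency.
states = ["S", "R"]
pi = {"S": 1.0, "R": 0.0}                                     # Matrix 3
a = {"S": {"S": 0.8, "R": 0.2}, "R": {"S": 0.3, "R": 0.7}}    # Matrix 1
e = {"S": {"U": 0.4, "N": 0.6}, "R": {"U": 0.82, "N": 0.18}}  # Matrix 2

def viterbi(obs):
    # V[k] holds the probability of the best path ending in state k;
    # back[t][k] stores the predecessor chosen for k at step t.
    V = {k: pi[k] * e[k][obs[0]] for k in states}
    back = []
    for y in obs[1:]:
        prev, V, ptr = V, {}, {}
        for k in states:
            x_best = max(states, key=lambda x: prev[x] * a[x][k])
            V[k] = prev[x_best] * a[x_best][k] * e[k][y]
            ptr[k] = x_best
        back.append(ptr)
    # Walk the back-pointers from the most likely final state.
    last = max(states, key=lambda k: V[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Day 0 was sunny (encoded by pi); then N, N, U, U were observed.
print(viterbi(["N", "N", "U", "U"]))   # -> ['S', 'S', 'S', 'S']
```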
4 Developed game

To even attempt to approach the problem, we needed a fighting game into which we could simply feed a player's commands and read back whether the opposing player is stuck in a combo state or not. The first part is relatively easy; the second, however, requires access to information stored within the game's variables, and not many games allow this sort of intrusion. There are very few fighting games that are open source or moddable to the extent we need. For that reason, we decided to continue our research on a fighting game of our own making.

Figure 5: Street Fighter 2 is a classic game that inspired many others to follow the two dimensional fighting style. The developed game was designed to replicate this fighting game model.

The game was designed in GameMaker: Studio, a tool that allows for quick functional prototypes and can be used both by entry-level novices and seasoned game development professionals. The game was developed for academic research purposes only, using graphical resources from Street Fighter 3 and sound files from Street Fighter 4. The game flows at 30 fps and was programmed using the GameMaker engine. It replicates the game style of the Street Fighter series while making adaptations that compensate for the lack of some game features, as well as to make the fighting system more robust. The game allows for player versus player matches without a time limit.

Our game reads inputs in the form of symbols that represent attack inputs (l = light, m = medium, h = heavy) together with a number which corresponds to the current directional input and follows the common numpad notation for fighting games (1 = down left, 2 = down, 3 = down forward, 4 = left, 5 = neutral, 6 = right, 7 = up left, 8 = up, 9 = up right). Only one directional input may be read at any frame. Multiple buttons can be read in one frame; however, due to a priority system implemented within the game, and the lack of commands that require multiple buttons, if multiple attack buttons are pressed within the same frame the game only registers one, following the priority list "h" > "m" > "l".

The game was adapted to this project by changing the player controller input to an input file containing the commands at every frame for player one, and by writing to an output file, at every frame, whether player two is in a combo state or not. To optimize training and execution times, the game was stripped of most of its graphical and sound resources and its speed was increased up to 333 times the original game speed, resulting in the maximum fps allowed by the GameMaker engine.
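The sketch below illustrates how one frame of player-one input could be resolved under this priority rule and paired with the per-frame Combo/Free output. The token layout (direction digit followed by an attack letter or '-') and the helper names are illustrative assumptions; the actual file format used by the adapted game is not reproduced here.

```python
# Hypothetical sketch of resolving one frame of input and pairing it with
# the game's per-frame output. Token layout and names are our assumptions.
PRIORITY = ["h", "m", "l"]   # the game registers at most one attack per frame

def resolve_attack(pressed):
    """Return the single attack the game would register this frame."""
    for button in PRIORITY:
        if button in pressed:
            return button
    return "-"               # no attack button pressed

def frame_token(direction, pressed):
    """Build a token such as '6m' or '5-' for one frame."""
    return f"{direction}{resolve_attack(pressed)}"

# One training pair: player-one tokens and player-two state per frame.
inputs = [frame_token(5, set()), frame_token(6, {"l", "m"}), frame_token(2, {"h"})]
states = ["Free", "Free", "Combo"]        # as written by the adapted game
print(list(zip(inputs, states)))          # [('5-', 'Free'), ('6m', 'Free'), ('2h', 'Combo')]
```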
5 The proposed HMM approach

A frame is the shortest discrete time interval in a game. This is where physics and graphics simulations occur, as well as any event checks and script calls. Fighting games, especially those that appeal to more competitive audiences, try to remain smooth at 60 frames per second to allow more room for inputs and more diverse frame data for character actions. At each frame, the game also collects the controller inputs for both players. Most fighting games store inputs in order to allow certain special attacks that require a precise sequence of commands.

5.1 Modeling after inputs

Any game requires player inputs, as they are necessary for the most basic level of interaction. As this is the most common information the algorithm may ask of the player, regardless of the type of game being played, we believe it is also the most generic set of information upon which a model can be built. One advantage of taking such a broad approach is that, if successful, it could be applied to any other fighting game as long as every action is deterministic. That means that for the same set of actions in any given instance of the game, the outcome will always be the same. This is not the case for games in which characters may have attacks that generate random results, or which have random events that unpredictably change the current state of the game.

When a player makes an attack, he is unable to make another action for a number of frames. This happens because every attack animation has a cooldown period associated with it. During this time, most of that player's actions are restricted, with very few exceptions. As such, whenever an attack is made, the player can expect to be unable to act for some time. Likewise, when a player is hit by an attack, he is also denied actions for a short number of frames during his damage animation. The amount of time it takes for a character that was hit to become able to act again depends on the attack that hit him. If a player who hits an opponent with an attack manages to attack again before that opponent can recover from the last hit, we call that sequence of attacks a Combo. However, if the opponent is able to recover during that interval, we say that he is Free.

This does not imply that players will not give any inputs to the game while they are unable to act; it is quite the opposite, actually. In order to make a special move, an attack that is usually more powerful than the standard ones, the player needs to give a sequence of inputs in a row, and this is more efficiently done during the time in which the player cannot make a new action. Because of this, between two significant inputs (e.g., two inputs that combo with each other) there are usually many insignificant inputs that do not result in any action but still have a Free or Combo state associated with them because of the opponent's latent state, and are therefore categorized as noise.

During the execution of the game there are two kinds of information that can be evaluated: the input of the player and its effect, which can be seen as the opponent's state. Since we are able to extract both sequences from the execution of the game, we have two possible models: in one of them we observe the opponent's state and in the other we observe the player's input. The Markov chain for both of them can be seen in Figure 6.

In the first scenario, modeled in Figure 7, each input is a hidden state. By training this HMM we are able to learn the likelihood of each input resulting in a combo and which inputs often follow one another. In this representation we are able to answer questions such as "which is the most likely sequence of inputs that results in a combo of n frames?". In the second scenario, the hidden states are Combo and Free. Training this HMM leads to different conclusions: given that an opponent is in a combo, we learn the likelihood of that being caused by each input. In this sense, we can answer the question "the sequence x_1, x_2, ..., x_n was most likely a combo of how many frames?".

Figure 6: Markov chain for both possible HMM models in the combo problem. In the upper one we want to infer the sequence of inputs, while in the lower one we want to infer the sequence of Combo/Free states.
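The sketch below illustrates the shape of these two models: 36 input symbols (9 directions times 4 attack options) and the two opponent states, as described above. The uniform values are only placeholders for the probabilities to be learned, and the encoding helper is our own.

```python
# Sketch of the two model shapes described above. Sizes follow the text:
# 9 directions x 4 attack options = 36 input symbols, two opponent states.
# Uniform values are placeholders for the probabilities to be learned.
import numpy as np

input_symbols = [d + a for d in "123456789" for a in "-lmh"]   # 36 tokens, e.g. '5-', '6m'
sym_index = {s: i for i, s in enumerate(input_symbols)}
opp_states = ["Free", "Combo"]                                 # written by the game

# Scenario 1: hidden states are the 36 inputs, observations are Free/Combo.
A1, E1, pi1 = np.full((36, 36), 1/36), np.full((36, 2), 1/2), np.full(36, 1/36)

# Scenario 2: hidden states are Free/Combo, observations are the 36 inputs.
A2, E2, pi2 = np.full((2, 2), 1/2), np.full((2, 36), 1/36), np.full(2, 1/2)

# A per-frame training pair is then encoded as two index sequences:
inputs = ["5-", "5-", "6m", "2h"]
states = ["Free", "Free", "Free", "Combo"]
obs_idx = [sym_index[s] for s in inputs]
hid_idx = [opp_states.index(s) for s in states]
```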
The focus of this work is on the first scenario: to find large combos. However, we also explore the second one in order to evaluate how well the HMM behaves in all cases for this particular modeling and how robust it can be.

Figure 7: Modeling of the HMM representing the inputs and the 'Free' (0) and 'Combo' (1) observations. Inputs range over all combinations of possible attacks, - (none), l (light), m (medium), h (heavy), and all possible directional inputs (1 to 9). This leads to 36 possible hidden states, each having a transition to every other state. Every arrow is associated with a probability.

5.2 Modeling after moves

A move by a player is a consequence of a combination of his inputs within a time interval. Moves are specific to each character and each game. In contrast to the input modeling described first, which is as generic as possible, this modeling approach is very specific to the attacking character, as most characters in the same game may share the same set of basic actions while also having a set of special actions unique to them. This also means that for a different fighting game the input data will be completely different, as different games have different sets of actions. This does not, however, change the structure of the HMM, as it only affects the number of possible observations and hidden states.

One of the advantages of this approach is the reduction of noise, as the data becomes much more objective, describing the exact action performed by the player instead of having to be determined by running the inputs through the game. This, however, comes at the cost of information loss, but it is otherwise similar to the input approach. Again, as with the input modeling, we attempted two different ways to represent our data. This time, however, we kept the player information (the actions) as our observations in both approaches, and instead changed our definition of a time step from a frame-by-frame to an action-by-action time interval. The difference between the two will be further explained in Section 6.4.

Figure 8: Markov chain for the HMM that takes player actions as observations. The only difference between the two action approaches is when a new state is registered.

6 Experiments and discussion

We manually wrote 47 different possible combos, including what we believe to be the longest combo within the game, lasting roughly 240 frames (or 8 seconds). Next, we created ten variations of each combo, containing noise for each non-relevant input in the combo (a non-relevant input is one that is not considered because the character is busy while that input is being read). Along with these 470 possible combos we also created 1500 randomly generated input sequences. All these 1970 sequences were then executed through the testing game in order to obtain the sequence of states for each one of these sequences of inputs.
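As a rough illustration of this data-generation step, the sketch below builds random per-frame sequences and noisy variations of a hand-written combo. The filler probability, the sequence length and the example combo are assumptions made for illustration; the exact procedure used to generate the 470 and 1500 samples is not detailed here.

```python
# Illustrative sketch of how the training data could be produced. The exact
# noise-injection and random-generation procedures are not specified, so the
# details below (probabilities, lengths, example combo) are our assumptions.
import random

attacks = "-lmh"

def random_sequence(frames=300):
    """A random per-frame input sequence, as in the 1500 random samples.
    Directions 7 and 9 are left out so the character does not jump away."""
    return [random.choice("1234568") + random.choice(attacks) for _ in range(frames)]

def noisy_variation(combo, filler_prob=0.3):
    """One variation of a hand-written combo: irrelevant inputs (any direction,
    7 and 9 included) are sprinkled in where the character would be busy."""
    noisy = []
    for token in combo:
        noisy.append(token)
        while random.random() < filler_prob:
            noisy.append(random.choice("123456789") + random.choice(attacks))
    return noisy

combo = ["8-", "8h", "2m", "6h"]   # a hypothetical jump-in string, for illustration
dataset = [noisy_variation(combo) for _ in range(10)] + [random_sequence() for _ in range(1500)]
```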
6.1 Supervised training and input modeling

Since we have both the [Combo, Free] sequences and the input sequences, we chose to train our HMM through the supervised method. Given its simplicity, and given our lack of any prior knowledge or representation, it far outshines the unsupervised approach, requiring only a few seconds to execute. The drawback of this kind of algorithm is the need for labeled data, which can be difficult to acquire, though that is not an issue in our problem.

We proceeded to execute two instances of our HMM and trained each one with a different set. The first one was trained only with the 470 manually generated combos and the second one received all 1970 sequences. The objective of this particular experiment was to observe the HMM behavior when faced with the nearly complete chaos of the full training set; the HMM with the smaller and better behaved training set functions as our control group.

The resulting HMMs were fed to the Viterbi algorithm and we queried for the most likely input sequence given the observations:

    Combo Combo Combo Free Combo Combo

Both of them returned the same response, although with different probabilities:

    5- 5- 5-

One of the reasons this occurred is that, when the game starts, both players are unable to perform any action for the first 5 frames. Because of that, the first five inputs of the game are always 5- (stay still), which causes the starting probabilities to be 1.0 for 5- and 0.0 for all other states. Not only that, but the transition matrix heavily favors the transition 5- → 5-. However, since the Viterbi algorithm takes the emission probabilities into account alongside the transition ones, this does not explain why 5- was selected, as staying still should be associated with the 'Free' state and its emission probabilities should be close to (0.0, 1.0). Taking a closer look at a segment of the emission probabilities in Table 1, we start to see the problem:

    Input   Combo (1) probability   Free (0) probability
    5-      0.165115                0.834885
    8-      0.205040                0.794960
    6h      0.250362                0.749638
    4-      0.264810                0.735190
    9-      0.357834                0.642166
    6m      0.254803                0.745197
    2-      0.270237                0.729763
    3m      0.253883                0.746117
    7-      0.352612                0.647388
    3l      0.252662                0.747338

Table 1: Some emission probabilities in the inputs-by-frame modeling. Combo refers to the observation 1 and Free to the observation 0.

Although the probability of 5- being Free is higher than that of it being Combo, so is the probability for everything else. Because of the inserted noise, the probabilities of each input seem to converge to (0.25, 0.75). There are a couple of odd things, though. When counting the 0 to 1 ratio in the files, we conclude that roughly a third of the states are 1s, which should lead to probabilities of (0.33, 0.66). The other question is why the inputs 8-, 9- and 7- behave differently from everyone else. During the generation of the random combos, there are no moves including the 7 or 9 directionals. This was done in order to avoid the player moving away from the enemy (7 represents up left while 9 represents up right). They were, however, inserted as noise in the 470 combos, and because of this they follow the 0.33-0.66 trend of the files. As for the input 8-, the answer is quite simple: in the 470 combos created, a large number of them involve jumping to start the combo. Because of that, there is a shift in probabilities towards the 0 state, since the combo has not started yet and the player needs to spend a few frames going up to reach the desired height before attacking. The probabilities related to 8- being in state 1 are caused when 8- is noise, however.

We then removed the first 5 frames of each input sequence and each state sequence and trained our HMM again. As expected, because of its abundance in the 470 combos, the most likely sequence becomes:

    8- 8- 8-

The last question is why movements like 2- have nearly the same probabilities as attacks such as 6m.
As explained before, when a significant input is given, the character the player is controlling starts an animation. However, its effects are not atomic: it takes a couple of frames for an attack to actually land. This makes an attack such as 6m be treated as a 0 even though it resulted in a combo later on, because in the frame the action was made the opponent's state was still 0. The other attacks in a combo are treated normally, though. This explains why the probabilities of attack inputs are lower than they should be: the first move of a combo is always 0, and this also pushes insignificant sequences to be marked as 1s. Additionally, we only look at a limited frame window (300 frames, which corresponds to the long in-game time of ten seconds, a long time for a single combo). Therefore, if a combo is made during the final frames of a window (which is the case), a couple of 1s are removed, which also slightly lowers the general probabilities.

We then began a third experiment with a couple of changes. First, we trained only over the 1500 random inputs to avoid unwanted tendencies. Second, we removed the first five frames of each sequence to avoid the initial 5- in all inputs. Third, we hard-coded the starting probabilities to be equal. The third step should not be needed given the law of large numbers, but we wanted to be sure. For the sequences of 20 Frees and 20 Combos we obtained the most likely sequences:

    6l 3l 6m 1m 8h 6- 6l 3l 6m 1m 8h 6- 6l 3l 6m 1m 8h 6- 6l 3l

    6h 3m 1m 8h 4- 8l 1m 8h 4- 8l 1m 8h 4- 8l 1m 8h 4- 8l 1m 8h

Although the sequence for the Combos seems promising, looking at the sequence for the Frees shows that these predictions are nothing more than wild guesses. Indeed, when looking at the emission (represented in Table 2) and transition matrices, we can easily conclude that.

    Input   Combo (1) probability   Free (0) probability
    2-      0.251431                0.748569
    2h      0.246933                0.753067
    6m      0.247652                0.752348
    4m      0.247322                0.752678
    3l      0.248667                0.751333
    2m      0.251471                0.748529
    4-      0.251075                0.748925
    4h      0.247857                0.752143

Table 2: Some emission probabilities in the inputs-by-frame modeling using only the 1500 random inputs.

The transition matrix is too large to be shown here. All its probabilities are close to 0.035; since in these examples there are no moves with 7 or 9, this is expected, as 1.0/28 = 0.035. In the HMMs trained with all the examples and with only the 470 examples, the probability of a movement input transitioning to another movement input (e.g., 2- to 4-) is slightly higher. This occurs because of the tendencies in the manual combos.

When running the HMM modeled after the second scenario, using the inputs as observations, the HMM returned Free for all inputs, in accordance with what we expected after seeing the other HMM's performance.
6.2 Unsupervised training and input modeling

In order to try to improve the HMM obtained, we ran the Baum-Welch algorithm over the HMM trained with the 470 examples, once using only the states of the 470 combos and once using all 1970 examples. The shorter run converged rather fast, requiring only two iterations and about half an hour. The longer one, however, did not converge after over a hundred iterations and over 40 hours of training. Because of that, we decided to omit its results, as they are most likely poor. The results were exactly the same as those of the supervised learning, which was expected given the extremely fast convergence and the fact that it trained on the same examples. All probabilities did, however, change slightly, as shown in Table 3.

    Input   Combo (1) probability   Free (0) probability
    5-      0.088287                0.911713
    8-      0.198316                0.801684
    6h      0.263652                0.736348
    4-      0.281583                0.718417
    9-      0.398486                0.601514
    6m      0.268071                0.731929
    2-      0.288073                0.711927
    3m      0.253883                0.732035
    7-      0.396075                0.603925
    3l      0.266460                0.733540

Table 3: Some emission probabilities in the inputs-by-frame modeling using unsupervised training over the previous emission matrix, represented in Table 1.

The probability of 5- being Free increased drastically towards 1 and 8- followed the same pattern, while the other inputs showed an increase in the probability of a Combo. Although this seems to be an improvement, the most likely sequence continues not being a combo. Also, inputs like 2- and 4- showed an increase in their combo probabilities, which shows that the model as a whole did not improve much.

6.3 Supervised learning and moves by frame modeling

One of the biggest problems in the last two experiments was the noise. In a sequence of multiple inputs, only a small number of them were relevant and represented what was happening inside the game. This made most inputs be labeled nearly at random. In order to remove, or at least reduce, this behavior, we stop looking at the inputs of the player and instead model our HMM after the actual actions of the player. Figure 9 shows the observations produced by the same set of actions in the two different models.

Figure 9: Difference in observations between the two proposed models in each frame of the game.

Because we removed the noise between two moves, the 470 controlled combos become only 47 combos repeated 10 times. Even so, we ran the supervised training on the two sets of combos (the 470 and the 1500 ones). Table 4 contains some of the emission probabilities obtained in each of them (values are approximate).

    Move            Obs.    1500 Set Probability   470 Set Probability
    ryu air h       Free    0.97                   0.91
    ryu air h       Combo   0.03                   0.09
    ryu std idle    Free    0.77                   0.81
    ryu std idle    Combo   0.23                   0.19
    ryu dash fwrd   Free    0.92                   0.96
    ryu dash fwrd   Combo   0.08                   0.04
    ryu shoryuken   Free    0.35                   0.33
    ryu shoryuken   Combo   0.65                   0.67
    ryu crouch m    Free    0.64                   0.62
    ryu crouch m    Combo   0.36                   0.38
    ryu std l       Free    0.72                   0.76
    ryu std l       Combo   0.28                   0.24
    ryu hadoken     Free    0.34                   0.31
    ryu hadoken     Combo   0.66                   0.69

Table 4: Some emission probabilities in the moves-by-frame modeling in both training sets. Both tests show relatively close probabilities.

When comparing with the previous emission probabilities, these seem far better. Although some moves like ryu std l and ryu crouch m are more likely to be Free rather than Combo, moves that were expected to be free, like air moves or the dash, have a much higher probability of being free. Also, we see that the moves hadoken and shoryuken have a higher probability of being a combo, and in the previous emission tables we had no input with this behavior.

When we ran the Viterbi algorithm, it returned the combo ryu_shoryuken, ryu_shoryuken, ... In theory, this combo would be interesting but, in practice, it is not. In the game, the shoryuken move is a jumping uppercut. After the move is made, the player's character becomes unable to act for a couple of frames: he is locked in the ryu shoryuken landing action, shown in Figure 10. However, as seen in Figure 9, a move is followed by itself in a large portion of the observation sequence. This impacts the transition probabilities heavily, making a move extremely unlikely to change. This makes the move ryu shoryuken landing be ignored and suggests the sequence shoryuken, shoryuken, etc. as a combo, which in reality is not even a valid sequence.

Figure 10: Example of the Shoryuken attack followed by the landing action after a Shoryuken.

Another interesting factor is the duration of each action. Special attacks like hadoken and shoryuken last longer than simple attacks like standing l. This behavior makes special attacks have a much larger probability of being a combo, simply because they last longer and therefore have more frames labeled as a combo.
6.4 Supervised learning and moves modeling

Attempting to further avoid the noise generated by multiple frames of the same action, we abstracted away from our previous discrete time interval, the single frame. Although it is the shortest time step in a game, it creates other problems: the number of consecutive instances of the same action depends on the cooldown of the action itself, rather than on any informative variable we could derive or use to improve our modeling.

For this attempt, we changed our HMM time step from a single frame to the first frame of each new action by the attacking player. This approach generates a considerably smaller and more condensed set of information that does not carry the timestamp of an action: the actions are sequential, but the number of frames between them is unknown. This change greatly reduces the amount of noise generated by multiple instances of the same input, but it does take away our capacity to run the obtained sequence of actions within the game automatically, as we now have to write the inputs for a set of actions by hand and without knowledge of each action's timestamp. For this, we simply assume each action within a sequence selected by the HMM occurs as soon as possible. That is, we assume that between the beginning of an action and the beginning of the next one there is the minimum possible number of idle frames, which in most cases is zero frames.
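A minimal sketch of this change of time step, assuming illustrative move names, collapses each run of identical per-frame moves into a single action, keeping only the first frame of each run:

```python
# Minimal sketch of the change of time step described above: collapse a
# per-frame sequence of (move, opponent-state) pairs so that only the first
# frame of each new action by the attacker is kept. Move names are illustrative.
from itertools import groupby

frames = [
    ("ryu_standing_idle", "Free"), ("ryu_standing_idle", "Free"),
    ("ryu_standing_l", "Free"), ("ryu_standing_l", "Combo"),
    ("ryu_standing_l", "Combo"), ("ryu_tatsu_spin", "Combo"),
    ("ryu_tatsu_spin", "Combo"),
]

# Frame-by-frame modeling (Section 6.3) keeps every pair; action-by-action
# modeling (this section) keeps one pair per run of identical moves.
actions = [next(group) for _, group in groupby(frames, key=lambda pair: pair[0])]
print(actions)
# [('ryu_standing_idle', 'Free'), ('ryu_standing_l', 'Free'), ('ryu_tatsu_spin', 'Combo')]
```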
In this last experiment, we ran the supervised training over all 1970 samples of attacks we had. This time the results were quite promising. Figure 11 shows a few examples of combos found after the Viterbi algorithm was run, and Tables 5 and 6 show a segment of the emission and transition matrices. These combos, however, are all small, lasting only a couple of actions. The emission and transition probabilities also appear far better than in the previous experiments, as expected from the results.

    Move                  Combo (1) probability   Free (0) probability
    ryu standing idle     0.209363                0.790637
    ryu standing m        0.445886                0.554114
    ryu attack overhead   0.437616                0.562384
    ryu jump neutral      0.123120                0.876880
    ryu air m             0.009825                0.990174
    ryu jump fall         0.0                     1.0
    ryu air l             0.023041                0.976959
    ryu tatsu spin        0.628571                0.371429
    ryu standing h        0.372345                0.627655
    ryu shoryuken         0.595197                0.404803

Table 5: Some emission probabilities in the moves modeling.

    Next state              Probability
    ryu standing idle       0.0746869
    ryu standing m          0.00447227
    ryu attack overhead     0.0228086
    ryu jump neutral        0.13059
    ryu attack roundhouse   0.0330948
    ryu tatsu start         0.0442755
    ryu standing h          0.0778175
    ryu standing l          0.0594812
    ryu crouching l         0.110018
    ryu crouching h         0.0997317
    ryu shoryuken           0.0210197
    ryu walk backwards      0.0545617
    ryu hadoken             0.0277281
    ryu crouching m         0.106887
    ryu walk forward        0.0178891
    ryu crouching idle      0.0514311
    ryu throw forward       0.000894454
    ryu dash backwards      0.000447227
    ryu attack lowkick      0.0290698

Table 6: All transition probabilities from the state ryu standing m in the moves modeling.

The combos found are standing l + standing l + standing l; standing h + tatsu spin; crouching m + tatsu spin; and overhead + shoryuken. When prompted to find larger combos, the HMM returned a repetition of these small combos. One of the reasons behind this is that most moves are followed by the idle animation, especially tatsu spin, with a probability of nearly 95%. This is the same animation performed at the beginning of a match. This simulates a 'reset' after a small combo, and the HMM tends to simply repeat these combos. The other reason is that the HMM does not have enough information to know when a move can be repeated, which, in most of the suggested sequences, would take a couple of seconds.

Figure 11: Four different combos found when running the Viterbi algorithm for many combo sizes. Any combo bigger than 9 moves is a repetition of the first example. For the ones between 1 and 8 moves, Viterbi returns different combinations of all of them.

7 Conclusion

Through the application of a supervised HMM, and also of an unsupervised HMM, we realized that the effects of noise and delay were severe enough to overshadow the true patterns that lead to combos. These implications assure us that raw inputs do not serve as a modeling criterion even when described under a sequential learning algorithm. The Markov assumption states that the future is independent of the past. In our case, this does not apply, as the observation of a following frame depends mostly on a hidden state somewhere in the past. We could specialize the algorithm to avoid these flaws, or even try to predict the far future from a past frame instead of trying to predict the following frame, but since this diverges from the input approach, it leads us to believe that it is not a reliable modeling basis in the first place.

On the other hand, when analyzing the actions of a player rather than his inputs, the results are more conclusive and precise. The downside of this approach is that the HMM becomes overspecialized to this specific problem.

Future work intends to explore different and even more specialized modeling approaches. One of these could be a modification of the standard Viterbi algorithm in which the repetition of a subset of sequences is penalized. This new view of the problem is sure to bring different challenges and difficulties, even more so when trying to adapt it to other fighting games, but we can hope that these approaches bring fewer complications than the more generic ones.

References

Andrade, G., Ramalho, G., Santana, H., and Corruble, V. 2005. Challenge-sensitive action selection: an application to game balancing. In Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, IEEE, 194-200.

Baum, L. E. 1972. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3, 1-8.

Bilmes, J. A., et al. 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute 4, 510, 126.

Bryson, A. E. 1975. Applied optimal control: optimization, estimation and control. CRC Press.

Graepel, T., Herbrich, R., and Gold, J. 2004. Learning to fight. In Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, 193-200.

Griffin, D., and Buehler, R. 1999. Frequency, probability, and prediction: easy solutions to cognitive illusions? Cognitive Psychology 38, 1, 48-78.
Matsumoto, Y., and Thawonmas, R. 2004. MMOG player classification using hidden Markov models. In Entertainment Computing - ICEC 2004. Springer, 429-434.

Rabiner, L., and Juang, B.-H. 1986. An introduction to hidden Markov models. ASSP Magazine, IEEE 3, 1, 4-16.

Saini, S. S., Dawson, C. W., and Chung, P. W. 2011. Mimicking player strategies in fighting games. In Games Innovation Conference (IGIC), 2011 IEEE International, IEEE, 44-47.

Sarkar, A. 2000. Yet another introduction to hidden Markov models. Canada: School of Computing Science, Simon Fraser University.

Stamp, M. 2004. A revealing introduction to hidden Markov models. Department of Computer Science, San Jose State University.

Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Information Theory, IEEE Transactions on 13, 2, 260-269.

Yamamoto, K., Mizuno, S., Chu, C. Y., and Thawonmas, R. 2014. Deduction of fighting-game countermeasures using the k-nearest neighbor algorithm and a game simulator. In Computational Intelligence and Games (CIG), 2014 IEEE Conference on, IEEE, 1-5.