Behavioral-Based Cheating Detection in Online First Person Shooters using Machine Learning Techniques

Hashem Alayed, University of Southern California, [email protected]
Fotos Frangoudes, University of Southern California, [email protected]
Clifford Neuman, Information Sciences Institute, [email protected]

Abstract—Cheating in online games has consequences for both players and companies, so cheating detection and prevention is an important part of developing a commercial online game. Several anti-cheating solutions have been developed by gaming companies, but most rely on detection measures that may breach users' privacy. In this paper, we provide a server-side anti-cheating solution that uses only game logs. Our method first defines the behavior of honest players and of cheaters, then uses machine learning classifiers to train cheating-detection models and detect cheaters. We present our results under several data organizations to give developers different options, and our methods achieve very high accuracy in most cases. Finally, we provide a detailed analysis of our results along with practical suggestions for online game developers.

Keywords—Cheating Detection; Online Games; Machine Learning

I. INTRODUCTION

Security in online games is essential for any game to earn revenue, so cheating detection and prevention is an important part of developing a commercial online game. Cheating in online games has consequences for both players and companies. It gives cheaters an unfair advantage over honest players and reduces the overall enjoyment of playing the game. Developers are also affected monetarily in several ways. First, honest players leave the game if cheaters are not banned, and developers lose their subscription money. Next, the game gains a bad reputation, which reduces revenue. Finally, companies must spend additional resources each time a new cheat is discovered in order to develop and release patches that counter it.

Anti-cheating solutions have been developed by gaming companies to counter these problems. However, most of these solutions use cheating detection measures that may involve breaches of users' privacy, such as the Warden of "World of WarCraft" [1] and PunkBuster [2]. In this paper, we provide a method of detecting cheats by monitoring only game logs, using player-behavior analysis techniques. We chose to apply our method to a First Person Shooter (FPS) game we developed using the Unity3D game engine [3]. The game is fully online using a client-server model, and game logs are collected on the server side. After the game logs are collected, they are pre-processed, and several supervised machine learning techniques, such as Support Vector Machines and Logistic Regression, are applied to create detection models. When creating the models, we evaluate different experimental organizations of our data: multi-class classification, in which each cheat is represented as its own class; two-class classification, in which all cheats are labeled 'yes'; similar cheats grouped together as one class; and finally, a separate model for each cheat. The resulting detection models can then be used on new, unlabeled data to classify cheats. The classification of cheats is based on a number of distinct features.
We defined novel features in addition to previously used ones (in [4] and [5]). Finally, we rank the features and analyze the results, then suggest to developers how to use the resulting models. The remainder of the paper is organized as follows: Section II covers related work. Section III describes the design and implementation of the game, the feature extractor, and the data analyzer. Section IV explains and analyzes our experimental results. Finally, Section V concludes and discusses future work.

II. RELATED WORK

Several papers have showcased techniques for analyzing human players' behavior using machine learning. These techniques can be used either to differentiate a human player from a bot or to detect cheating. Some of the earliest work on machine learning in online games was done by Thurau et al., who used several approaches to develop human-like AI agents based on the analysis of human player behavior: a neural-network-based bot that learns human behavior [6], and a division of player actions into strategies and tactics using the Neural Gas Waypoint Learning algorithm to represent the virtual world [7]. They later improved [7] by applying Manifold Learning to address the curse of dimensionality [8], and again by using Bayesian imitation learning in place of the Neural Gas Waypoint Learning algorithm [9].

Kuan-Ta Chen et al. [10] provided a bot detection method using Manifold Learning. Although their method was very accurate, it was based only on the character's movement position and thus only worked for detecting a moving bot. Yeung et al. [4] provided a scalable method that uses a Dynamic Bayesian Network (DBN) to detect AimBots in FPS games. Their method produced good results, although better results could likely have been obtained with more features. Galli et al. [5] used different classifiers to detect cheating in Unreal Tournament III. Their detection accuracy was very high; however, their results were based on play-tests with at most two human players versus one AI player. Moreover, some of their features relied on there being only two players in the game, which is not the case in commercial online games; for example, they always calculate the distance between a player and his target, even when the target is not visible. In this paper, we present different ways of detecting cheats using more (and more varied) features and different classifiers than [4] and [5].

III. DESIGN AND IMPLEMENTATION

Our system consists of four components: a game client with access to an AimBot system; a game server that handles communication between all the clients and logging; a feature extractor that pre-processes log data; and finally, a data analyzer that is responsible for training classifiers and generating detection models for cheats.

A. The Game: Trojan Battles

Fig. 1: Trojan Battles in-game screenshot

Trojan Battles (Figure 1) is the game on which all tests were run. It is an online multi-player FPS game built at the GamePipe Lab at the University of Southern California [11]. Both the server and the client were developed from scratch, giving us full control over changes and modifications. This also makes it possible to create game logs containing data similar to that of many commercial games available today.

1) Game Server: The game server was developed using C# and is a non-authoritative server; it is mainly responsible for communication between the clients and for logging game data. The server also manages game time, ensures all clients are synchronized, and notifies all players when each match ends. Communication is event-based and achieved by sending specific messages on in-game actions, such as when a player spawns, dies, or fires. The server also receives a heartbeat message at intervals set before the start of each match, which includes position and state data.
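To make the logged data concrete, the following sketch shows what a single heartbeat row might contain. It is an illustrative Python model rather than the authors' C# implementation, and the field names are assumptions inferred from the message contents and the features described in Section III-B.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HeartbeatRecord:
    """One heartbeat log row for one player (hypothetical layout)."""
    timestamp: float                  # client timestamp, in seconds
    player_id: str
    position: Tuple[float, float, float]
    view_direction: Tuple[float, float, float]
    movement_animation: str           # e.g. "idle", "run", "strafe"
    aiming_target_id: int             # 0 when not aiming at any target
    visible_targets: List[int] = field(default_factory=list)
    is_firing: bool = False
    is_zooming: bool = False
    weapon_type: str = "machine_gun"
    instant_weapon: bool = False      # True for hit-scan weapons
```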
2) Game Client: The game client was developed using Unity3D and is used by players to log in to the server with their unique names. Players can then create a new game or find an existing one on the lobby screen (Figure 2), join, and finally play a time-limited death-match. Each game can be customized by choosing a different map (currently a medium-sized and a larger map), changing the game duration (3, 5, 10, or 15 minutes), altering the network protocol used (TCP or UDP), and setting the number of heartbeat messages sent each second (from 5 to 30).

Fig. 2: Lobby screen with customization options

The game provides five weapons each player can choose from: machine gun, rifle, sniper, shotgun, and rocket launcher. Each weapon provides different aiming capabilities. Each game client also gives the player an AimBot utility, which lets them activate and deactivate any cheats they want during a match.

3) Types of AimBots: Like commercial AimBots [12], each cheat is implemented as a feature that can be toggled in-game (shown in Figure 3). These features can be combined to apply a certain AimBot, or to create more complicated and harder-to-detect cheats. The types of AimBots implemented in this game are:

Fig. 3: In-game AimBot toggle screen

• Lock (L): A typical AimBot. When enabled, it aims at a visible target continuously and instantly.
• Auto-Switch (AS): Must be combined with Lock to be activated. It switches the lock on and off to confuse the cheat detector. This AimBot was used in [4].
• Auto-Miss (AM): Must be combined with Lock to be activated. It creates intentional misses, likewise to confuse the cheat detector. This AimBot was also used in [4].
• Slow-Aim (SA): Must be combined with Lock to be activated, and can also be combined with Auto-Switch or Auto-Miss. This well-known cheat works like Lock, but instead of snapping to the target instantly, it eases in from the current aim location.
• Auto-Fire (AF): Also called a TriggerBot; when the player's crosshair is on a target, it fires automatically. It can be combined with Lock to create an ultimate cheat.

B. Feature Extractor

The feature extractor is a very important component of a behavior-based cheating detection technique. We need to extract the features that best separate the behavior of a cheating player from the normal behavior of an honest player. These features depend on the data sent from the client to the server. In online games, the data exchanged between clients and the server is limited to conserve bandwidth and keep gameplay smooth. Therefore, in our game, we tried to reduce the size of the messages exchanged while still logging informative data that would yield meaningful results.
The features are extracted per time frame, and some are similar to those used in [5] and [4]; however, we used different time-frame sizes to observe their effect on detection. Most of the features are extracted from the HeartBeat messages logged into our system. If a feature depends on a target, it is calculated only when a target is visible; when multiple targets are visible, the closest one is taken as the current target. For example, 'Mean Distance' calculates the distance between the player and his closest visible target. The features and the log data they require are listed below (a short sketch computing a few of them follows the list); the most important features are identified in Section IV-B. Features specific to FPS games are marked (FPS), and generic features usable in other types of games are marked (G):

1) Number of Records (G): The total number of log rows during the current time frame. Uses data from the HeartBeat messages.
2) Number of Visible-Target Rows (G): The number of rows in which a target was visible. Also uses data from the HeartBeat messages.
3) Visible-To-Total Ratio (G): The ratio of rows in which a target was visible to the total number of rows in the current frame.
4) Aiming Accuracy (FPS): Tracks aiming accuracy based on the visible targets at each frame. While a target is visible and the player is aiming at it, the accuracy increases exponentially each frame; when the player loses the target, it decreases linearly. Uses the aiming target ID and the visible-targets list from the logs.
5) Mean Aiming Accuracy (FPS): While a target is visible, the number of times the player aimed at a target divided by the number of times a target was visible. Uses the aiming target ID and the visible-targets list from the logs.
6) Player's Movement (G): While a target is visible, captures the effect of the player's movement on aiming accuracy. Uses the player's movement animation from the logs.
7) Target's Movement (G): While a target is visible, captures the effect of the target's movement on the player's aiming accuracy. Looks up the target's movement animation from the logs at the current timestamp.
8) Hit Accuracy (FPS): The number of hits divided by the total number of shots within the time frame; head shots are weighted higher than other body shots. Uses the target ID from the Actions table in the logs (the target is 0 on a miss).
9) Weapon Influence (FPS): A composite feature defined by weapon type, distance, and zooming. While a target is visible, it calculates the distance between the player and the closest target, then the influence of the weapon used at that distance and whether the player is zooming. Uses the weapon type, the player's position, the target's position (looked up at the current timestamp), and the zooming value from the logs.
10) Mean View Directions Change (G): The change of the view-direction vectors during the time frame, computed as the mean of the Euler angles between vectors. Players change view direction drastically when their target dies, so re-spawn directions are taken into account.
11) Mean Position Change (G): The mean of the distances between each position and the next during the time frame. As with view directions, re-spawns are taken into account.
12) Mean Target Position Change (G): The target's position changes, measured only while a target is visible.
13) Mean Distance (G): While a target is visible, the distance between the player and the target, averaged over the frame. Uses the player's position and the closest target's position from the logs.
14) Fire On Aim Ratio (FPS): The ratio of firing while aiming at a target. Uses the firing flag and the aiming target from the logs.
15) Fire On Visible (FPS): The ratio of firing while a target is visible. Uses the firing flag from the logs.
16) Instant On Visible (FPS): The ratio of using instant (hit-scan) weapons while a target is visible. Uses the Instant-Weapon flag from the logs.
17) Time-To-Hit Rate (FPS): The time from when a target becomes visible until the target is hit. Uses client timestamps from the HeartBeat table in addition to the hit target from the Action Result table.
18) Cheat: The labeling feature; it specifies the type of cheat used. The cheat used for more than 50% of the time frame becomes the label; if no cheat was used for more than 50% of the frame, the label is "normal".
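Continuing the hypothetical HeartbeatRecord above, the sketch below shows how a few of the simpler features (1, 2, 3, 5, and 14) could be computed for one player over one time frame. The paper does not give exact formulas, so the denominators here are assumptions.

```python
from typing import Dict, List

def extract_frame_features(rows: List[HeartbeatRecord]) -> Dict[str, float]:
    """Compute a few per-frame features from one player's heartbeat
    rows that fall inside a single time frame (10/30/60/90 s)."""
    total = len(rows)
    visible = [r for r in rows if r.visible_targets]
    aimed = [r for r in visible if r.aiming_target_id in r.visible_targets]
    firing = [r for r in rows if r.is_firing]
    fire_on_aim = [r for r in firing if r.aiming_target_id != 0]
    return {
        "NumRecords": total,                                             # feature 1
        "VisibleRows": len(visible),                                     # feature 2
        "VisibleToTotal": len(visible) / total if total else 0.0,        # feature 3
        "MeanAimAcc": len(aimed) / len(visible) if visible else 0.0,     # feature 5
        "FireOnAim": len(fire_on_aim) / len(firing) if firing else 0.0,  # feature 14
    }
```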
C. Data Analyzer

The Data Analyzer takes the generated feature file and applies machine learning classifiers to it to detect the cheats used in our game. First, we train the classifiers on a labeled data set using 10-fold cross-validation. Then, we generate detection models and apply them in a three-player test run. Finally, we report the accuracy of each classifier for each AimBot type in the Results section below. We used Weka [13], an established data mining tool, to apply the machine learning classifiers and create the models.

IV. EXPERIMENTS

To produce the training data, we played eighteen different death-matches: eight matches 10 minutes long and ten matches 15 minutes long. Each match was played by two players over the network, so we collected around 460 minutes of data. In each match, one player used one of the cheats from Section III-A3 and the other played normally, except for two matches, which were played without cheats. Messages between the clients and the server were exchanged at a rate of 15 messages/second, a rate chosen to accommodate delays and avoid excessive bandwidth usage.

Since the feature extractor is flexible, we generated data at different time-frame sizes, namely 10, 30, 60, and 90 seconds/frame. We then classified the data using two widely used classifiers: Logistic Regression, which is simple, and Support Vector Machines (SVM), which are more complex but more powerful. To apply SVM in Weka, we used the SMO algorithm [14]. For SVM, we used two different kernels: a Linear (SVM-L) kernel and a Radial Basis Function (SVM-RBF) kernel. SVM has several parameters that can be tuned to obtain different results; one of them is the Complexity Parameter (C), which controls the softness of the margin, i.e., a larger C leads to a harder margin [15]. For the Linear kernel, we set C to 10; for the RBF kernel, C was set to 1000. We trained the classifiers and created the models using 10-fold cross-validation [16].
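As a concrete illustration of this setup, here is a scikit-learn analogue of the workflow (the authors used Weka's SMO, so this is a sketch of an equivalent pipeline, not their code). The classifier choices, C values, and fold count follow the text; the feature-matrix layout is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_classifiers(X: np.ndarray, y: np.ndarray) -> dict:
    """10-fold cross-validated accuracy for the three classifiers used
    in the paper; X holds one feature row per time frame, y the labels."""
    models = {
        "SVM-L": SVC(kernel="linear", C=10),
        "SVM-RBF": SVC(kernel="rbf", C=1000),
        "Logistic": LogisticRegression(max_iter=1000),
    }
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in models.items()}
```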
A. Results

To present the results clearly, we split them into five parts. The first four parts are used to create and train the models; each represents a different data organization. The last part is the testing phase, using 30 minutes of data collected from three different players. Table I shows how many instances were used for each part; Table I(c) lists the average number of cheat instances used in Part 4. Note that the total number of instances sometimes corresponds to less than 460 minutes, since we omit any feature row whose Visible-To-Total ratio is below 5%.

TABLE I: Number of instances used for each training part

(a) Parts 1 and 2
Frame Size | Cheats | Normal | Total
10         |  990   |  1082  | 2072
30         |  421   |   445  |  866
60         |  217   |   233  |  450
90         |  149   |   159  |  308

(b) Part 3
Frame Size |  LB | AF  | Normal | Total
10         | 708 | 282 |  1082  | 2072
30         | 300 | 121 |   445  |  866
60         | 157 |  60 |   233  |  450
90         | 107 |  42 |   159  |  308

(c) Part 4
Frame Size | Cheats (avg) | Normal | Total
10         |     190      |  1082  | 1272
30         |     100      |   445  |  545
60         |      50      |   233  |  283
90         |      30      |   159  |  189

1) All cheats together, using multi-class classification: First, we analyzed the data using multi-class classification, with all instances mixed together and a specific label for each cheat class. The resulting accuracy is shown in Table II(a); the best accuracy was obtained using Logistic Regression with frame size 60. The confusion matrix for this best run is shown in Table II(b). Note the many misclassifications among the four "Lock-Based" cheats: L, AS, AM, and SA. This is due to the similarity among these four cheats (all are based on Locking, with some confusion added in AS, AM, and SA). Therefore, in Section IV-A.3 below, we combine these four cheats into one class to observe the difference in the resulting accuracy.

TABLE II: Multi-class classification

(a) Accuracy values for each classifier using different time frames
Frame Size | SVM-L | SVM-RBF | Logistic
10         | 72.9  |  73.7   |  72.3
30         | 77.9  |  78.4   |  76.7
60         | 80.7  |  80.7   |  80.9
90         | 78.2  |  79.2   |  77.5

(b) Confusion matrix for Logistic and frame size 60 (rows: actual; columns: predicted)
    |  L | AS | AM | SA | AF |  No
L   | 25 |  6 |  3 | 15 |  1 |   1
AS  |  3 | 20 | 11 |  2 |  0 |   1
AM  |  3 | 10 | 14 |  2 |  1 |   3
SA  | 13 |  2 |  2 | 19 |  0 |   0
AF  |  0 |  0 |  0 |  0 | 60 |   0
No  |  1 |  1 |  5 |  0 |  0 | 226

2) All cheats combined, using two classes (yes or no): We then analyzed the data with all cheats combined into a single class, i.e., labeling an instance "yes" if any cheat occurred. In Table III, instead of posting a confusion matrix for each classifier, we report several accuracy measurements: Overall Accuracy (ACC), True Positive Rate (TPR), and False Positive Rate (FPR). The best values are distributed across the classifiers, so the choice depends on what the game developers care about most.

TABLE III: Two-class classification
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 87.7 | 85.5 | 10.2 | 89.3 | 87.1 | 8.6  | 87.3 | 85.3 | 10.8
30         | 93.5 | 92.4 | 5.4  | 93.7 | 92.6 | 5.2  | 93.4 | 92.4 | 5.6
60         | 97.3 | 96.8 | 2.1  | 97.3 | 97.2 | 2.6  | 97.1 | 97.2 | 3
90         | 97.1 | 96.6 | 2.5  | 97.1 | 96   | 1.9  | 95.1 | 96   | 5.7
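For reference, these three measurements can be derived from the two-class prediction counts as follows; this is standard bookkeeping rather than code from the paper.

```python
def two_class_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """ACC, TPR, and FPR from two-class counts, where 'positive'
    means a frame was labeled as cheating."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
        "TPR": tp / (tp + fn),                   # cheating frames caught
        "FPR": fp / (fp + tn),                   # honest frames flagged
    }
```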
3) Lock-Based cheats classified together versus Auto-Fire (multi-class): Once we combined the Lock-Based cheats into one class, we achieved an 18% jump in overall accuracy compared to Part 1 above. As Table IV(a) shows, the best value was obtained using SVM-RBF with frame size 60; the confusion matrix for this run is shown in Table IV(b).

TABLE IV: Lock-Based vs. AF classification

(a) Accuracy values for each classifier using different time frames
Frame Size | SVM-L | SVM-RBF | Logistic
10         | 88.6  |  89.6   |  88.4
30         | 94.9  |  95.2   |  94.2
60         | 98    |  98.2   |  94.2
90         | 98.1  |  97.1   |  95.1

(b) Confusion matrix for SVM-RBF and frame size 60 (rows: actual; columns: predicted)
    |  LB | AF |  No
LB  | 153 |  1 |   3
AF  |   0 | 60 |   0
No  |   1 |  3 | 229

4) Each cheat classified separately: By separating the cheats, we achieved the highest accuracy. However, each cheat has its own best-performing classifier, as shown in Table V. Again, the choice of classifier depends on which accuracy measurement the game developer cares about most. Note that the larger the frame size, the smaller the number of instances; this is because we are reusing the same data.

TABLE V: Separated cheats classification

(a) Lock
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 97.5 | 87.9 | 1    | 97.5 | 89   | 1.1  | 97.1 | 87.9 | 1.5
30         | 98.7 | 95.2 | 0.7  | 98.4 | 95.2 | 0.9  | 96   | 90.5 | 2.9
60         | 98.6 | 94.1 | 0.4  | 98.9 | 94.1 | 0    | 96.5 | 88.2 | 1.7
90         | 98.5 | 94.7 | 0.6  | 98.5 | 94.7 | 0.6  | 98.9 | 94.7 | 0

(b) Auto-Switch
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 93.9 | 73.9 | 2.7  | 94.2 | 75.5 | 2.6  | 93.4 | 71.7 | 2.9
30         | 96.2 | 86.8 | 2.2  | 96.4 | 86.8 | 2    | 96.4 | 86.8 | 2
60         | 98.9 | 97.3 | 0.9  | 98.9 | 97.3 | 0.9  | 97.4 | 97.3 | 2.6
90         | 99.5 | 95.8 | 0    | 99.5 | 95.8 | 0    | 99.5 | 95.8 | 0

(c) Auto-Miss
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 93.5 | 65.8 | 2.4  | 93.2 | 66.5 | 2.8  | 93.3 | 67.7 | 2.9
30         | 96.9 | 87.9 | 1.8  | 96.7 | 87.9 | 2    | 95.9 | 84.8 | 2.5
60         | 98.1 | 90.9 | 0.9  | 97.8 | 87.9 | 0.9  | 95.1 | 78.8 | 2.6
90         | 97.8 | 86.4 | 0.6  | 97.8 | 86.4 | 0.6  | 96.1 | 81.8 | 1.9

(d) Slow-Aim
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 97.7 | 90.5 | 1    | 97.8 | 91.1 | 1    | 97.5 | 91.6 | 1.4
30         | 99.4 | 98.6 | 0.4  | 98.8 | 95.9 | 0.7  | 97.3 | 91.9 | 1.8
60         | 100  | 100  | 0    | 100  | 100  | 0    | 99.6 | 100  | 0.4
90         | 100  | 100  | 0    | 100  | 100  | 0    | 100  | 100  | 0

(e) Auto-Fire
Frame Size |       SVM-L        |      SVM-RBF       |      Logistic
           | ACC  | TPR  | FPR  | ACC  | TPR  | FPR  | ACC  | TPR  | FPR
10         | 92.9 | 75.9 | 2.7  | 94.4 | 81.6 | 2.2  | 92.2 | 75.5 | 3.5
30         | 96.6 | 92.6 | 2.2  | 97.2 | 94.2 | 2    | 96.1 | 90.9 | 2.5
60         | 98.3 | 96.7 | 1.3  | 98.6 | 98.3 | 1.3  | 98.6 | 98.3 | 1.3
90         | 100  | 100  | 0    | 100  | 100  | 0    | 99   | 100  | 1.3

5) Three Players Test Set: After generating the models in the previous four parts, we played a 30-minute death-match with three players: one honest player and two cheaters (Lock and Auto-Fire). Based on the results of Parts 1, 2, 3, and 4 above, we chose the SVM-L and SVM-RBF classifiers with frame sizes 30 and 60, and applied the previously trained models to this new, unseen data (the 30-minute, three-player gameplay). Table VI presents the results: Table VI(a) shows the number of instances collected for each frame size; Table VI(b) shows the best results obtained using the models from Parts 1, 2, and 3; and Table VI(c) shows the best results obtained using the models from Part 4. In Table VI(c) we report the detection accuracy as TPR for each cheat type and for normal gameplay; we chose TPR to capture each model's detection accuracy when it is presented with an unknown cheat type.

TABLE VI: Test set results

(a) Number of instances
Frame Size |  L | AF | Normal | Total
30         | 59 | 41 |   58   |  158
60         | 29 | 24 |   32   |   85

(b) Best selected results for Parts 1, 2, and 3
Part | Accuracy | Frame Size | Classifier
1    |  78.8%   |     60     | SVM-L
2    |  94.1%   |     60     | SVM-RBF
3    |  98.8%   |     60     | SVM-L

(c) Best selected results for Part 4
Cheat Model | TPR for L | TPR for AF | TPR for Normal | Frame Size | Classifier
L           |   96.6%   |    2.4%    |      100%      |     60     | SVM-RBF
AS          |   100%    |    0%      |      100%      |     60     | Both
AM          |   89.7%   |    0%      |      100%      |     60     | SVM-RBF
SA          |   89.7%   |    0%      |      100%      |     60     | Both
AF          |   55.2%   |    100%    |      96.9%     |     60     | SVM-L

B. Features Ranking

Before analyzing the results, we present the features that proved most important and useful for prediction. Here we provide the ranking obtained using only the SVM-L models on all the training parts above. To rank the features, we calculated the squares of the weights assigned by the SVM classifier [17]; the feature with the highest squared weight is the most informative.
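A minimal sketch of this ranking criterion, using scikit-learn's linear SVM in place of Weka's SMO; the feature names passed in would be those listed in Section III-B.

```python
import numpy as np
from sklearn.svm import SVC

def rank_features(X: np.ndarray, y: np.ndarray, names: list) -> list:
    """Rank features by the squared weights of a trained linear SVM,
    as in [17]; assumes y is binary (use one-vs-rest otherwise)."""
    clf = SVC(kernel="linear", C=10).fit(X, y)
    scores = np.square(clf.coef_.ravel())   # squared weight per feature
    order = np.argsort(scores)[::-1]        # most informative first
    return [(names[i], float(scores[i])) for i in order]
```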
Table VII shows the feature rankings for each part in Section IV-A above (training Parts 1 to 4 only). We show only the top five features, since there is a large gap in weight between the first three to five features and the rest. We also noticed some differences in the rankings across frame sizes; however, the top feature is always the same for every frame size. The most common top feature is 'Mean Aim Accuracy', which is the most informative feature for predicting AimBots. For the Auto-Fire cheat, 'Fire-on-Visible' and 'Fire-on-Aim' (both with very high weights) were the most informative features, since we are looking for instant firing when a target is under the crosshair.

TABLE VII: Top five features in each training part

(a) Part 1
Rank | Frame 30      | Frame 60
1    | MeanAimAcc    | MeanAimAcc
2    | FireOnAim     | FireOnAim
3    | FireOnVisible | AimAcc
4    | AimAcc        | FireOnVisible
5    | TrgtPosChange | TrgtPosChange

(b) Part 2
Rank | Frame 30      | Frame 60
1    | MeanAimAcc    | MeanAimAcc
2    | FireOnVisible | FireOnVisible
3    | FireOnAim     | HitAcc
4    | TrgtPosChange | FireOnAim
5    | HitAcc        | MeanDistance

(c) Part 3
Rank | Frame 30      | Frame 60
1    | MeanAimAcc    | MeanAimAcc
2    | FireOnVisible | FireOnAim
3    | FireOnAim     | FireOnVisible
4    | HitAcc        | HitAcc
5    | AimAcc        | PlayerMov

(d) Part 4: Lock
Rank | Frame 30   | Frame 60
1    | MeanAimAcc | MeanAimAcc
2    | HitAcc     | FireOnVisible
3    | PlayerMov  | PlayerMov
4    | WpnInf     | HitAcc
5    | FireOnAim  | TrgtMov

(e) Part 4: Auto-Switch
Rank | Frame 30       | Frame 60
1    | MeanAimAcc     | MeanAimAcc
2    | TrgtPosChange  | TrgtPosChange
3    | PlayerMov      | PlayerMov
4    | FireOnAim      | MeanDistance
5    | PositionChange | WpnInf

(f) Part 4: Auto-Miss
Rank | Frame 30   | Frame 60
1    | MeanAimAcc | MeanAimAcc
2    | AimAcc     | HitAcc
3    | FireOnAim  | MeanDistance
4    | HitAcc     | TrgtPosChange
5    | PlayerMov  | PlayerMov

(g) Part 4: Slow-Aim
Rank | Frame 30     | Frame 60
1    | MeanAimAcc   | MeanAimAcc
2    | HitAcc       | HitAcc
3    | PlayerMov    | MeanDistance
4    | FireOnAim    | TimeToHit
5    | MeanDistance | AvgInstntWpnOnVis

(h) Part 4: Auto-Fire
Rank | Frame 30      | Frame 60
1    | FireOnVisible | FireOnVisible
2    | FireOnAim     | FireOnAim
3    | HitAcc        | HitAcc
4    | TrgtMov       | MeanAimAcc
5    | WpnInf        | MeanDistance

C. Analysis

As the results above show, applying supervised machine learning methods requires full knowledge of the data. In our context, FPS AimBots, we noticed that awareness of the type of each cheat can help increase classifier accuracy. The multi-class classifier did not prove as accurate, because of the confusion among the lock-based cheats. On the other hand, by separating the cheats and training a classifier for each cheat type, we achieved higher accuracy in the first four parts (the training sets). This shows that separated cheat detection can help developers by applying all the models simultaneously to each instance (explained below). In Part IV-A.5 of the experiments, the models obtained from Parts IV-A.3 and IV-A.4 gave the highest accuracy.
However, when these models are applied to a live online game, there is no labeled set (the data is unseen), so there needs to be a way to confirm whether a player is a cheater. A good approach is to specify a threshold: when the number of flagged records reaches the threshold, the player is declared a cheater. The threshold value depends on the detection accuracy of the model and on the developers' cheat-detection policy. It also depends on when detection takes place: online while the cheater is playing (real-time detection), or offline after a game has finished.

The following example shows how this detection technique can be used by adjusting the threshold, based on the test data above (Part IV-A.5). For this example, the model from Part IV-A.2 is used with frame size 60, and the threshold is set to 50%, which provides loose detection to avoid some false positives. The number of cheating records that flags a player as a cheater is then determined by:

NumberOfRecords (NR) = MatchLengthInSeconds / FrameSize
NumberOfCheatingRecords (NCR) = NR * Threshold

In this example: NR = 1800 / 60 = 30, and therefore NCR = 30 * 0.5 = 15.

On the other hand, looking at the test-set results for the models from Part IV-A.4, we notice that all the Lock-Based models detected the Lock cheat accurately, while only the Auto-Fire model detected the Auto-Fire cheat accurately. Therefore, applying both kinds of models simultaneously to the gameplay data can provide the required detection mechanism. The threshold is again set to 50% (i.e., NCR = 15 by the previous formula), but the following combined count is used to detect a cheater:

LockBasedCheatingRecords (LBCR) = Max(NCR(L), NCR(AS), NCR(AM), NCR(SA))
OverallNCR = LBCR + NCR(AF)

The reason for using an overall threshold is to detect players who use different cheats during the same match.
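A minimal sketch of this overall-threshold check, assuming the per-model flagged-frame counts have already been produced by running the Part IV-A.4 models over the match; the function and parameter names are illustrative.

```python
def is_cheater(flags_per_model: dict, match_seconds: int,
               frame_size: int, threshold: float = 0.5) -> bool:
    """Flag a player when the combined count of frames flagged by the
    lock-based models and the Auto-Fire model reaches the threshold count.
    flags_per_model maps "L"/"AS"/"AM"/"SA"/"AF" to flagged-frame counts."""
    nr = match_seconds // frame_size                  # NR, e.g. 1800 // 60 = 30
    ncr = nr * threshold                              # NCR, e.g. 30 * 0.5 = 15
    lbcr = max(flags_per_model[m] for m in ("L", "AS", "AM", "SA"))
    overall = lbcr + flags_per_model["AF"]            # OverallNCR
    return overall >= ncr
```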
After analyzing the data, we noticed that some results were occasionally not accurate enough. Several factors can cause this inaccuracy. First, in online games, delays (lag) happen all the time, which can cause the message exchange rate to drop (below 5 messages/sec in some cases). Such delays reduce the quality of the collected data and cause some features to take odd values; hence lower accuracy is to be expected. Another source of inaccuracy is the amount of cheating data collected relative to normal-behavior data: in some cases, especially with separated cheats, the ratio of cheat to normal data is more skewed than 40:60 (which, in our opinion, is a reasonable balance between normal and abnormal data for classification). Finally, improving the feature set, either by adding new features or by modifying existing ones, can also help increase accuracy.

Overall, the accuracy achieved is very high, especially given the added AimBot confusions (Auto-Switch and Auto-Miss). Frame size affects accuracy: frame size 60 gave the highest accuracy in most of the experiments. However, as mentioned before, larger frames result in fewer instances. We also consider any frame larger than 60 seconds too long, since smart cheaters activate cheats only when they need them, so a larger frame may not capture cheating behavior accurately. We therefore suggest using a frame size between 30 and 60 seconds.

V. CONCLUSION AND FUTURE WORK

In this paper, we presented a behavior-based method to detect cheaters in online First Person Shooter games. The game was developed using the Unity3D game engine with a client-server architecture. The game logs were stored on the server side, to be parsed by the feature extractor and then analyzed by the data analyzer. Data was analyzed using machine learning classifiers implemented in Weka. After conducting several experiments with different types of detection models, we obtained very high accuracy. We then suggested using thresholds when applying the generated models. The threshold value depends on the developers' cheat-detection policy: it can be low (harsh) or high (loose), but it should not be so harsh that it flags many experienced honest players as cheaters.

In general, game data analysis depends on how well you know the game. Developers can therefore adjust the data in the messages exchanged between client and server to accommodate feature generation. Moreover, behavior-based cheat detection methods protect players' privacy.

More work could be done in the future to improve the cheat detection models. We could collect more data using a mixture of cheats for each player instead of one cheat for a whole game. More features could be generated to better identify cheats. We could also model the behavior of each player separately, building cheating detection models per player instead of an overall model per cheat. Other improvements include increasing the number of maps and weapons.

ACKNOWLEDGMENT

We would like to thank the members of the GamePipe Lab at the University of Southern California for their continuous support and advice, especially Professor Michael Zyda, Balakrishnan (Balki) Ranganathan, Marc Spraragen, Powen Yao, Chun (Chris) Zhang, and Mohammad Alzaid.

REFERENCES

[1] G. Hoglund and G. McGraw, Exploiting Online Games: Cheating Massively Distributed Systems, 1st ed. Addison-Wesley Professional, 2007.
[2] S. Webb and S. Soh, "A survey on network game cheats and P2P solutions," Australian Journal of Intelligent Information Processing Systems, vol. 9, no. 4, pp. 34–43, 2008.
[3] Unity3D - game engine. [Online]. Available: http://www.unity3d.com/
[4] S. Yeung and J. C. Lui, "Dynamic Bayesian approach for detecting cheats in multi-player online games," Multimedia Systems, vol. 14, no. 4, pp. 221–236, 2008. [Online]. Available: http://dx.doi.org/10.1007/s00530-008-0113-5
[5] L. Galli, D. Loiacono, L. Cardamone, and P. Lanzi, "A cheating detection framework for Unreal Tournament III: A machine learning approach," in CIG, 2011, pp. 266–272.
[6] C. Thurau, C. Bauckhage, and G. Sagerer, "Combining Self Organizing Maps and Multilayer Perceptrons to Learn Bot-Behavior for a Commercial Computer Game," in Proc. GAME-ON, 2003, pp. 119–123.
[7] C. Thurau, C. Bauckhage, and G. Sagerer, "Learning Human-Like Movement Behavior for Computer Games," in Proc. Int. Conf. on the Simulation of Adaptive Behavior. MIT Press, 2004, pp. 315–323.
[8] C. Thurau and C. Bauckhage, "Towards manifold learning for gamebot behavior modeling," in Proceedings of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, ser. ACE '05. New York, NY, USA: ACM, 2005, pp. 446–449. [Online]. Available: http://doi.acm.org/10.1145/1178477.1178577
[9] C. Thurau, T. Paczian, and C. Bauckhage, "Is Bayesian Imitation Learning the Route to Believable Gamebots?" in Proc. GAME-ON North America, 2005, pp. 3–9.
[10] K. Chen, H. Pao, and H. Chang, "Game Bot Identification based on Manifold Learning," in Proceedings of ACM NetGames 2008, 2008.
[11] USC GamePipe Laboratory. [Online]. Available: http://gamepipe.usc.edu/
[12] Artificial Aiming. [Online]. Available: http://www.artificialaiming.net/
[13] Weka 3: Data Mining Software in Java. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/
[14] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA, USA: MIT Press, 1999, pp. 185–208. [Online]. Available: http://dl.acm.org/citation.cfm?id=299094.299105
[15] M. Rychetsky, Algorithms and Architectures for Machine Learning Based on Regularized Neural Networks and Support Vector Approaches. Germany: Shaker Verlag GmbH, Dec. 2001.
[16] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, ser. IJCAI'95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1995, pp. 1137–1143. [Online]. Available: http://dl.acm.org/citation.cfm?id=1643031.1643047
[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389–422, Mar. 2002. [Online]. Available: http://dx.doi.org/10.1023/A:1012487302797