Behavioral-Based Cheating Detection in
Online First Person Shooters using
Machine Learning Techniques
Hashem Alayed, University of Southern California, [email protected]
Fotos Frangoudes, University of Southern California, [email protected]
Clifford Neuman, Information Sciences Institute, [email protected]
Abstract—Cheating in online games comes with many consequences for both players and companies; therefore, cheating detection and prevention is an important part of developing a commercial online game. Several anti-cheating solutions have been developed by gaming companies, but most of them use detection measures that may breach users' privacy. In this paper, we provide a server-side anti-cheating solution that uses only game logs. Our method first defines the behavior of an honest player and the behavior of cheaters, and then uses machine learning classifiers to train cheating-detection models and detect cheaters. We present our results under several data organizations to show different options for developers, and our methods achieved very high accuracy in most cases. Finally, we provide a detailed analysis of our results, with useful suggestions for online game developers.
Keywords—Cheating Detection; Online Games; Machine
Learning
I. INTRODUCTION
Security in online games is an important issue for any game that aims to generate revenue. Therefore, cheating detection and prevention is an important part of developing a commercial online game. Cheating in online games comes with many consequences for both players and companies. It gives cheaters an unfair advantage over honest players and reduces the overall enjoyment of playing the game. Developers are affected monetarily in several ways as well. First, honest players will leave the game if cheaters are not banned, and developers lose their subscription money. Next, the game gains a bad reputation, which leads to reduced revenue. Finally, companies have to spend more resources each time a new cheat is discovered, in order to develop and release patches that counter it.
Anti-cheating solutions have been developed by gaming companies to counter these problems. However, most of these solutions use cheating detection measures that may breach users' privacy, such as the Warden in "World of Warcraft" [1] and PunkBuster [2].
In this paper, we provide a method of detecting cheats that relies solely on monitoring game logs, and thus on techniques for studying player behavior. We chose to apply our method to a First Person Shooter (FPS) game we developed using the Unity3D game engine [3]. The game is fully online, using a client-server model, and game logs are collected on the server side. After the game logs are collected, they are pre-processed, and then several supervised machine learning techniques, such as Support Vector Machines and Logistic Regression, are applied to create detection models. When creating the models, we use different experimental organizations for our data: first, multi-class classification, in which each cheat is represented as its own class; then, two-class classification, in which all cheats are labeled 'yes'; after that, similar cheats grouped together as one class; and finally, a separate model for each cheat. The resulting detection models can then be used on new, unlabeled data to classify cheats. The classification of cheats is based on a set of distinctive features; we defined novel features in addition to previously used ones (from [4] and [5]). Finally, we rank the features, analyze the results, and suggest to developers appropriate ways to use the resulting models.
The remainder of the paper is organized as follows: Section II
will contain related work. In Section III, we will describe the
design and implementation of the game, the feature extractor,
and the data analyzer. We will then explain our experimental
results and analyze them in Section IV. Finally, in Section V
we will readdress the problem to conclude and discuss future
work.
II. RELATED WORK
Several papers showcased techniques for analyzing human
players’ behavior using machine learning techniques. These
techniques can be used to either differentiate a human player
from a bot or to detect cheating. One of the earliest works in
the field of using machine learning in online games was done
by Thurau et al. They used several approaches to develop human-like AI agents based on the analysis of human player behavior: introducing a neural-network-based bot that learns human behavior [6], and dividing players' actions into strategies and tactics using the Neural Gas Waypoint Learning algorithm to represent the virtual world [7]. They later improved [7] by applying Manifold Learning to address the curse of dimensionality [8], and again by using Bayesian imitation learning in place of the Neural Gas Waypoint Learning algorithm [9].
Fig. 1: Trojan Battles in-game screenshot
Kuan-Ta Chen et al. [10] provided a bot detection method
using Manifold Learning. Although their method was very
accurate, it was based only on the character’s movement
position and thus only worked for detecting a moving bot.
Yeung et al. [4] provided a scalable method that uses a Dynamic Bayesian Network (DBN) to detect AimBots in FPS games. Their method provided good results, although better results could have been obtained with more features.
Galli et al. [5] used different classifiers to detect cheating in
Unreal Tournament III. The detection accuracy was very high;
however, their results were based on play-tests using up to 2
human players versus 1 AI player. Also, some features relied
on the fact that there are only two players in the game, which
is not the case in commercial online games. For example, they calculate the distance between a player and his target at all times (even when the target is not visible).
In this paper, we present different approaches to cheating detection, using more (and more varied) features and different classifiers than [4] and [5].
III. DESIGN AND IMPLEMENTATION
Our system consists of four components: a game client with access to an AimBot system; a game server that handles communication between all the clients as well as logging; a feature extractor that pre-processes the log data; and finally, a data analyzer that is responsible for training classifiers and generating models for cheats.
A. The Game: Trojan Battles
Trojan Battles (Figure 1) is the game on which all our tests were run. It is an online multi-player FPS game, built at the GamePipe Lab at the University of Southern California [11]. Both the server and the client were developed from scratch, giving us full control to make changes and modifications with ease. This also allows us to create game logs containing data similar to that of many commercial games available today.
1) Game Server: The game server was developed using C# and is non-authoritative; it is thus mainly responsible for communication between the clients and for logging game data. The server also manages game time, ensures all clients are synchronized, and notifies all players when each match ends. Communication is event-based and is achieved by sending specific messages on in-game actions, such as when a player spawns, dies, or fires. The server also receives a heartbeat message at intervals set before the start of each match, which includes position and state data; a rough sketch of such a payload follows.
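The paper does not enumerate the heartbeat fields beyond position and state data, so the following is only a plausible shape for such a message; every field name here is hypothetical, not the actual Trojan Battles message format.

```java
// Hypothetical sketch of a per-interval heartbeat payload. Fields are guessed
// from the log data the features below consume; none are the real schema.
public class HeartbeatMessage {
    long clientTimestampMs;       // client time, used by Time-To-Hit style features
    int playerId;
    float posX, posY, posZ;       // player position
    float viewX, viewY, viewZ;    // view direction vector
    int aimingTargetId;           // 0 if not aiming at anyone
    int[] visibleTargetIds;       // targets currently visible to this player
    String movementAnimation;     // e.g. "idle", "run" (state data)
    boolean firing;
    boolean zooming;
    int weaponType;
}
```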
2) Game Client: The game client was developed using Unity3D and is used by players to log in to the server using their unique names. Players can then create a new game or find an existing game on the lobby screen (Figure 2), join it, and finally play a timed death-match. Each game can be customized by choosing a different map (currently there is a medium-sized map and a larger one), changing the game duration (players can choose between 3, 5, 10, and 15 minutes), altering the network protocol to be used (TCP or UDP), and setting the number of heartbeat messages sent each second (from 5 to 30); a small sketch of these options follows.
Fig. 2: Lobby screen with customization options
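As a rough illustration of the lobby options just listed, a configuration holder might enforce the documented ranges as below; the class and its checks are assumptions, since the paper does not show the game's configuration code.

```java
// Hypothetical sketch of the lobby options described above.
public class MatchConfig {
    enum Protocol { TCP, UDP }
    enum GameMap { MEDIUM, LARGE }

    final GameMap map;
    final int durationMinutes;      // 3, 5, 10 or 15
    final Protocol protocol;
    final int heartbeatsPerSecond;  // 5..30

    MatchConfig(GameMap map, int durationMinutes, Protocol protocol, int heartbeatsPerSecond) {
        if (durationMinutes != 3 && durationMinutes != 5
                && durationMinutes != 10 && durationMinutes != 15)
            throw new IllegalArgumentException("duration must be 3, 5, 10 or 15 minutes");
        if (heartbeatsPerSecond < 5 || heartbeatsPerSecond > 30)
            throw new IllegalArgumentException("heartbeat rate must be 5..30 per second");
        this.map = map;
        this.durationMinutes = durationMinutes;
        this.protocol = protocol;
        this.heartbeatsPerSecond = heartbeatsPerSecond;
    }
}
```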
The game provides five different weapons each player can choose from: a machine gun, a rifle, a sniper, a shotgun, and a rocket launcher. Each of these weapons provides different aiming capabilities as well. Each game client also includes an AimBot utility, which enables players to activate and deactivate any cheats they want during a match.
3) Types of AimBots: Like commercial AimBots [12], each cheat is implemented as a feature that can be triggered in-game (shown in Figure 3). These features can be combined to apply a certain AimBot, or to create more complicated and harder-to-detect cheats.
Fig. 3: In-game AimBot toggle screen
The types of AimBots implemented in this game are:
• Lock (L): A typical AimBot that, when enabled, aims at a visible target continuously and instantly.
• Auto-Switch (AS): Must be combined with Lock to be activated. It switches the lock on and off to confuse the cheat detector. This AimBot was used in [4].
• Auto-Miss (AM): Must be combined with Lock to be activated. It creates intentional misses, also to confuse the cheat detector. This AimBot was also used in [4].
• Slow-Aim (SA): Must be combined with Lock to be activated, and it can also be combined with Auto-Switch or Auto-Miss. This is a well-known cheat that works like Lock, but instead of snapping to the target instantly, it eases in from the current aim location (see the sketch after this list).
• Auto-Fire (AF): Also called a TriggerBot; when the player's crosshair is on a target, it fires automatically. It can be combined with Lock to create an ultimate cheat.
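The paper does not give the Slow-Aim implementation; the sketch below only illustrates the easing idea described above, with the per-frame interpolation factor chosen arbitrarily.

```java
// Illustrative sketch of the Slow-Aim idea: instead of snapping to the target,
// ease the current aim direction toward it each frame. Not the paper's code.
public final class SlowAim {
    // Move a fraction of the remaining offset per frame; 0 < easing <= 1.
    // easing = 1 degenerates into a plain Lock (instant snap).
    static double[] step(double[] aimDir, double[] targetDir, double easing) {
        double[] next = new double[3];
        for (int i = 0; i < 3; i++)
            next[i] = aimDir[i] + easing * (targetDir[i] - aimDir[i]);
        return normalize(next);
    }

    static double[] normalize(double[] v) {
        double len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        return new double[] { v[0] / len, v[1] / len, v[2] / len };
    }
}
```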
B. Feature Extractor
The feature extractor is a very important component of a behavior-based cheating detection technique. We need to extract the features that best distinguish the cheating behavior of an unfair player from the normal behavior of an honest player. These features depend on the data sent from the client to the server. In online games, the data exchanged between clients and the server is limited by bandwidth constraints and by the need for smooth gameplay. Therefore, in our game, we tried to reduce the size of the messages being exchanged to lower bandwidth consumption, while still logging informative data that would yield meaningful results.
The features are extracted in terms of time frames, and some of them are similar to those used in [5] and [4]. However, we used different time frames to observe their effect on detection. Most of the features are extracted from the HeartBeat messages logged into our system. If a feature depends on a target, it is calculated only when a target is visible; in the case of multiple visible targets, the closest one is considered the current target. For example, 'Mean Distance' calculates the distance between the player and his closest visible target. We explain the extracted features below, along with the log data each one needs (a short sketch computing a few of them follows the list). We also show the most important features in Section IV-B below. Note that features specific to FPS games are marked (FPS), and generic features that can be used in other types of games are marked (G):
1) Number of Records (G): Contains the total number of
log rows during the current time frame. It uses the data
from the HeartBeat messages.
2) Number of Visible-Target Rows (G): Contains the number of rows in which a target was visible. It also uses the data from the HeartBeat messages.
3) Visible-To-Total Ratio (G): Calculates the ratio of the number of rows in which a target was visible to the total number of rows in the current frame.
4) Aiming Accuracy (FPS): Tracks the aiming accuracy based on the visible targets at each frame. If a target is visible and the player is aiming at it, the aiming accuracy increases exponentially with each frame; when the player loses the target, the accuracy starts decreasing linearly. This feature will use the aiming target ID and the visible targets list from the logs.
5) Mean Aiming Accuracy (FPS): While a target is visible,
simply divide the number of times a player aimed at a target by the number of times a target was visible. This
feature will use aiming target ID, and visible targets list
from the logs.
6) Player’s Movement (G): While a target is visible, this
feature will define the effect of a player’s movement
on aiming accuracy. It will use player’s movement
animation from the logs.
7) Target’s Movement (G): While a target is visible, this
feature will define the effect of a target’s movement
on a player’s aiming accuracy. It will look up the
target’s movement animation from the logs at the current
timestamp.
8) Hit Accuracy (FPS): Simply divides the number of hits by the total number of shots within the specified time frame. Head shots are given a higher weight than other body shots. It will use the target ID from the Actions Table in the logs (the target is 0 on a miss).
9) Weapon Influence (FPS): This is a complex feature that
uses Weapon type, Distance, and Zooming to define it.
While a target is visible, it will calculate the distance
between the player and the closest target. Then, it will
calculate the influence of the weapon used from this
distance, and whether the player is zooming or not. This
feature will use weapon type, player’s position, target’s
position, and zooming value from the logs. It will look
up the target’s position from the logs at the current
timestamp.
10) Mean View Directions Change (G): This feature defines the change of the view direction vectors during the specified time frame. It will be calculated as the mean of the Euler angles between consecutive vectors during that time frame. Note that players change view directions drastically when their target dies; therefore, we take re-spawn direction changes into account.
11) Mean Position Change (G): This feature will define the mean of the distances between each position and the next during the specified time frame. Like View Directions above, this feature takes re-spawns into account.
12) Mean Target Position Change (G): This feature will
define the target’s position changes only when a target
is visible.
13) Mean Distance (G): While a target is visible, calculate
the distance between the player and the target. Then,
calculate the mean of these distances. It will use a
player’s position and its closest target’s position from
the logs.
14) Fire On Aim Ratio (FPS): This Feature will define the
ratio of firing while aiming on a target. It will use the
firing flag as well as the aiming target from the logs.
15) Fire On Visible (FPS): This Feature will define the ratio
of firing while a target is visible. It will use the firing
flag from the logs.
16) Instant On Visible (FPS): This feature will define the
ratio of using instant weapons while a target is visible.
It will use the Instant-Weapon flag from the logs.
17) Time-To-Hit Rate (FPS): This feature will define the
time it takes a player from when a target is visible until
the target gets a hit. It will use clients’ timestamps from
the HeartBeat Table in addition to Hit Target from the
Action Result Table.
18) Cheat: This is the labeling feature; it specifies the type of cheat used. The cheat used for more than 50% of the time frame becomes the label; if no cheat was used for more than 50% of the time frame, the instance is labeled as "normal".
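To make the definitions above concrete, the sketch below computes a few of the simpler features (Number of Records, Visible-To-Total Ratio, Mean Aiming Accuracy, Fire On Aim Ratio) over the heartbeat rows of one time frame. The row class and its fields are assumptions patterned on the log data described above, not the paper's actual schema; the Fire On Aim denominator in particular is a guess.

```java
import java.util.List;

// Sketch of per-time-frame feature computation over heartbeat log rows.
class Row {
    boolean targetVisible;   // any target in the visible-targets list
    boolean aimingAtTarget;  // aiming target ID matches a visible target
    boolean firing;
}

class FrameFeatures {
    static double[] compute(List<Row> rows) {
        int total = rows.size();                       // 1) Number of Records
        int visible = 0, aimed = 0, fireOnAim = 0;
        for (Row r : rows) {
            if (r.targetVisible) visible++;            // 2) Visible-Target rows
            if (r.targetVisible && r.aimingAtTarget) aimed++;
            if (r.aimingAtTarget && r.firing) fireOnAim++;
        }
        double visibleToTotal = total == 0 ? 0 : (double) visible / total;   // 3)
        double meanAimAcc = visible == 0 ? 0 : (double) aimed / visible;     // 5)
        double fireOnAimRatio = aimed == 0 ? 0 : (double) fireOnAim / aimed; // 14)
        return new double[] { total, visible, visibleToTotal, meanAimAcc, fireOnAimRatio };
    }
}
```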
C. Data Analyzer
The Data Analyzer takes the generated feature file and applies machine learning classifiers to it in order to detect the cheats used in our game. First, we train the classifiers on a labeled data set using cross-validation with 10 folds. Then, we generate detection models and use them on a 3-player test run. Finally, we determine the accuracy of each classifier for each AimBot type and present the results in the Results section below. We used the data mining tool Weka [13] to apply the machine learning classifiers and create the models; a sketch of this training step follows.
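As a concrete illustration of this pipeline, the minimal sketch below loads a feature file, runs 10-fold cross-validation with Weka's SMO, and saves the resulting model. The calls are standard Weka 3 API; the file names are placeholders, not the paper's.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainDetector {
    public static void main(String[] args) throws Exception {
        // Load the generated feature file (ARFF); path is a placeholder.
        Instances data = new DataSource("features.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // 'Cheat' label is last

        SMO svm = new SMO();                            // Weka's SVM (SMO algorithm)
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));  // 10-fold CV
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());      // confusion matrix

        svm.buildClassifier(data);                      // final model on all data
        SerializationHelper.write("cheat-detector.model", svm);
    }
}
```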
IV. EXPERIMENTS
To produce the training data, we played eighteen different death-matches. Eight matches were 10 minutes long and ten matches were 15 minutes long. Each match was played by two players over the network; thus, we collected around 460 minutes of data. In each match, one player used one of the cheats in Section III-A3 and the other player played normally, except for two matches, which were played without cheats. The messages between the clients and the server were exchanged at a rate of 15 messages/second. This rate was selected to accommodate delays and to avoid overusing bandwidth.
Since the feature extractor is flexible, we generated data for different time frames, namely 10, 30, 60, and 90 seconds/frame. We then classified the data using two of the most commonly used classifiers: Logistic Regression, which is simple, and Support Vector Machines (SVM), which are more complex but more powerful. To apply SVM in Weka, we used the SMO algorithm [14]. For SVM, we used two different kernels: a Linear kernel (SVM-L) and a Radial Basis Function kernel (SVM-RBF). SVM has several parameters that can be adjusted to obtain different results. One of them is the Complexity Parameter (C), which controls the softness of the margin in SVM, i.e., a larger C leads to a harder margin [15]. For the Linear kernel, we set C to 10; for the RBF kernel, C was set to 1000. We trained the classifiers and created the models using cross-validation with 10 folds [16]. (A sketch of these classifier configurations in Weka follows.)
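The three classifier setups just described map onto Weka 3's API roughly as below; the calls are standard Weka, though the authors' exact option strings are not given in the paper.

```java
import weka.classifiers.Classifier;
import weka.classifiers.functions.Logistic;
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.PolyKernel;
import weka.classifiers.functions.supportVector.RBFKernel;

public class DetectorConfigs {
    // The three classifier configurations described above.
    static Classifier[] build() {
        SMO svmLinear = new SMO();
        svmLinear.setKernel(new PolyKernel()); // default exponent 1.0 => linear kernel
        svmLinear.setC(10);                    // complexity parameter for SVM-L

        SMO svmRbf = new SMO();
        svmRbf.setKernel(new RBFKernel());
        svmRbf.setC(1000);                     // complexity parameter for SVM-RBF

        Logistic logistic = new Logistic();    // plain logistic regression
        return new Classifier[] { svmLinear, svmRbf, logistic };
    }
}
```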
TABLE I: Number of instances used for each part
(a) Parts 1 and 2

Frame Size   Cheats   Normal   Total
10           990      1082     2072
30           421      445      866
60           217      233      450
90           149      159      308

(b) Part 3

Frame Size   LB    AF    Normal   Total
10           708   282   1082     2072
30           300   121   445      866
60           157   60    233      450
90           107   42    159      308

(c) Part 4

Frame Size   Cheats   Normal   Total
10           190      1082     1272
30           100      445      545
60           50       233      283
90           30       159      189
A. Results
To display the results more clearly, we will illustrate them
in five parts. The first four parts are used to create and train
the models. Each part of the first four represents different data
organization methods. The last part is the testing phase using
30 minutes of collected data from three different players.
Before we show those representations (parts), Table I shows how many instances were used for each part. Note that Table I(c) contains the average number of cheat instances used in Part 4. Also, note that the total number of instances sometimes accounts for less than the full 460 minutes, since we omit any feature row with a Visible-To-Total ratio below 5%.
1) All cheats together, using multi-class classification:
First, we analyzed the data using multi-class classification, in which all instances were mixed together and each cheat class was given its own label. The accuracy obtained using this method is shown in Table II(a); the best accuracy was obtained using Logistic Regression with frame size = 60. The confusion matrix of the best classification test is shown in Table II(b). In this table, notice the many misclassifications among the four Lock-Based cheats: L, AS, AM, and SA. This is due to the similarity between those four cheats (they are all based on Locking, with some confusion added in AS, AM, and SA). Therefore, in Part 3 below, we combined those four cheats into one class to observe the difference in the resulting accuracy.

TABLE II: Multi-class Classification
(a) Accuracy values for each classifier using different time frames

Frame Size   SVM-L Accuracy   SVM-RBF Accuracy   Logistic Accuracy
10           72.9             73.7               72.3
30           77.9             78.4               76.7
60           80.7             80.7               80.9
90           78.2             79.2               77.5

(b) Confusion matrix for Logistic and frame size 60 (rows: actual, columns: predicted)

        L    AS   AM   SA   AF   No
L       25   6    3    15   1    1
AS      3    20   11   2    0    1
AM      3    10   14   2    1    3
SA      13   2    2    19   0    0
AF      0    0    0    0    60   0
No      1    1    5    0    0    226

2) All cheats combined, using two classes (yes or no):
We then analyzed the data by combining all the cheats into a single class, i.e., labeling an instance as "yes" if any cheat occurred. As Table III shows, instead of posting the confusion matrix for each classifier, we report different accuracy measurements: Overall Accuracy (ACC), True Positive Rate (TPR), and False Positive Rate (FPR). The best values were distributed among the classifiers; the choice therefore depends on what the game developers care about most.

TABLE III: Two-class Classification

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           87.7   85.5   10.2   89.3   87.1   8.6    87.3   85.3   10.8
30           93.5   92.4   5.4    93.7   92.6   5.2    93.4   92.4   5.6
60           97.3   96.8   2.1    97.3   97.2   2.6    97.1   97.2   3
90           97.1   96.6   2.5    97.1   96     1.9    95.1   96     5.7

3) Lock-Based cheats classified together versus Auto-Fire (multi-class):
Once we combined the Lock-Based cheats together, we achieved an 18% jump in overall accuracy compared to Part 1 above. As Table IV(a) shows, the best value was obtained using SVM-RBF with frame size = 60. The confusion matrix of this test is shown in Table IV(b).

TABLE IV: Lock-Based vs AF Classification
(a) Accuracy values for each classifier using different time frames

Frame Size   SVM-L Accuracy   SVM-RBF Accuracy   Logistic Accuracy
10           88.6             89.6               88.4
30           94.9             95.2               94.2
60           98               98.2               94.2
90           98.1             97.1               95.1

(b) Confusion matrix for SVM-RBF and frame size 60 (rows: actual, columns: predicted)

        LB    AF   No
LB      153   1    3
AF      0     60   0
No      1     3    229

4) Each cheat classified separately:
By separating each cheat, we achieved the highest accuracy. However, each cheat has its own best-performing classifier, as shown in Table V. Again, the choice of classifier depends on which accuracy measurement the game developer cares about most. We should clarify that the larger the frame size, the smaller the number of instances; this happens because we are using the same data.

TABLE V: Separated Cheats Classification
(a) Lock

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           97.5   87.9   1      97.5   89     1.1    97.1   87.9   1.5
30           98.7   95.2   0.7    98.4   95.2   0.9    96     90.5   2.9
60           98.6   94.1   0.4    98.9   94.1   0      96.5   88.2   1.7
90           98.5   94.7   0.6    98.5   94.7   0.6    98.9   94.7   0

(b) Auto-Switch

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           93.9   73.9   2.7    94.2   75.5   2.6    93.4   71.7   2.9
30           96.2   86.8   2.2    96.4   86.8   2      96.4   86.8   2
60           98.9   97.3   0.9    98.9   97.3   0.9    97.4   97.3   2.6
90           99.5   95.8   0      99.5   95.8   0      99.5   95.8   0

(c) Auto-Miss

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           93.5   65.8   2.4    93.2   66.5   2.8    93.3   67.7   2.9
30           96.9   87.9   1.8    96.7   87.9   2      95.9   84.8   2.5
60           98.1   90.9   0.9    97.8   87.9   0.9    95.1   78.8   2.6
90           97.8   86.4   0.6    97.8   86.4   0.6    96.1   81.8   1.9

(d) Slow-Aim

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           97.7   90.5   1      97.8   91.1   1      97.5   91.6   1.4
30           99.4   98.6   0.4    98.8   95.9   0.7    97.3   91.9   1.8
60           100    100    0      100    100    0      99.6   100    0.4
90           100    100    0      100    100    0      100    100    0

(e) Auto-Fire

             SVM-L                SVM-RBF              Logistic
Frame Size   ACC    TPR    FPR    ACC    TPR    FPR    ACC    TPR    FPR
10           92.9   75.9   2.7    94.4   81.6   2.2    92.2   75.5   3.5
30           96.6   92.6   2.2    97.2   94.2   2      96.1   90.9   2.5
60           98.3   96.7   1.3    98.6   98.3   1.3    98.6   98.3   1.3
90           100    100    0      100    100    0      99     100    1.3

5) Three Players Test Set:
After generating the models in the previous four parts, we played a 30-minute death-match with three players: one honest player and two cheaters (Lock and Auto-Fire). Based on the results in Parts 1, 2, 3, and 4 above, we chose the SVM-L and SVM-RBF classifiers with frame sizes 30 and 60, and applied the models trained on the earlier data to the new, unseen data (the 30-minute, three-player gameplay); a sketch of this model-application step follows Table VI.
We present the results in Table VI as follows: Table VI(a) shows the number of instances collected for each frame size. Table VI(b) shows the best results obtained using the models from Parts 1, 2, and 3 above. Finally, Table VI(c) shows the best results obtained using the models from Part 4. In Table VI(c), we report the detection accuracy as the TPR for each cheat type and for normal gameplay; we chose TPR to capture each model's detection accuracy when presented with an unknown cheat type.

TABLE VI: Test Set Results
(a) Number of instances

Frame Size   L    AF   Normal   Total
30           59   41   58       158
60           29   24   32       85

(b) Accuracy results for Parts 1, 2 and 3 (best selected results)

Part   Accuracy   Frame Size   Classifier
1      78.8%      60           SVM-L
2      94.1%      60           SVM-RBF
3      98.8%      60           SVM-L

(c) Accuracy results for Part 4 (best selected results)

Cheat Type   TPR for L   TPR for AF   TPR for Normal   Frame Size   Classifier
L            96.6%       2.4%         100%             60           SVM-RBF
AS           100%        0%           100%             60           Both
AM           89.7%       0%           100%             60           SVM-RBF
SA           89.7%       0%           100%             60           Both
AF           55.2%       100%         96.9%            60           SVM-L
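Applying a saved model to unseen frames, as in Part 5, is a short step in Weka; the sketch below reuses the placeholder model and file names from the training sketch earlier, which are assumptions rather than the authors' artifacts.

```java
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ApplyDetector {
    public static void main(String[] args) throws Exception {
        // Placeholders: the model saved during training and the unseen test frames.
        Classifier model = (Classifier) SerializationHelper.read("cheat-detector.model");
        Instances test = new DataSource("test-frames.arff").getDataSet();
        test.setClassIndex(test.numAttributes() - 1);

        for (int i = 0; i < test.numInstances(); i++) {
            double label = model.classifyInstance(test.instance(i));
            System.out.println(i + " -> " + test.classAttribute().value((int) label));
        }
    }
}
```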
B. Features Ranking
Before we analyze the results, we present the features that were most important and useful for the prediction process. In this paper, we provide the ranking achieved using only the SVM-L models on all the training parts above. To rank the features, we calculated the squares of the weights assigned by the SVM classifier [17]; the feature with the highest squared weight is therefore the most informative one (a small sketch of this ranking follows).
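A sketch of the squared-weight ranking is below. Note that Weka's SMO does not expose the linear weight vector through a simple accessor, so this assumes the weights have already been recovered (for example, parsed from the model's textual output); names and weights are placeholders.

```java
import java.util.Arrays;
import java.util.Comparator;

// Rank features by the squared weight the linear SVM assigned to each [17].
public class FeatureRanking {
    static String[] rank(String[] names, double[] weights) {
        Integer[] idx = new Integer[names.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Larger squared weight = more informative feature, so sort descending.
        Arrays.sort(idx, Comparator.comparingDouble(i -> -weights[i] * weights[i]));
        String[] ranked = new String[names.length];
        for (int i = 0; i < idx.length; i++) ranked[i] = names[idx[i]];
        return ranked;
    }
}
```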
Table VII shows the ranking of the features for each part in Section IV-A above (training Parts 1 to 4 only). We show only the top five features, since there is a large gap in weight between the first three to five features and the rest. We also noticed some differences in the rankings between different frame sizes; however, the top feature is always the same across frame sizes.
The most common top feature is 'Mean Aim Accuracy', which is the most informative feature for predicting AimBots. For the 'Auto-Fire' cheat, 'Fire-On-Visible' and 'Fire-On-Aim' (both with very high weights) were the most informative features, since we are looking to detect instant firing the moment a target is over the crosshair.
TABLE VII: Top Five Features in Each Training Part
(a) Part 1

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      FireOnAim        FireOnAim
3      FireOnVisible    AimAcc
4      AimAcc           FireOnVisible
5      TrgtPosChange    TrgtPosChange

(b) Part 2

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      FireOnVisible    FireOnVisible
3      FireOnAim        HitAcc
4      TrgtPosChange    FireOnAim
5      HitAcc           MeanDistance

(c) Part 3

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      FireOnVisible    FireOnAim
3      FireOnAim        FireOnVisible
4      HitAcc           HitAcc
5      AimAcc           PlayerMov

(d) Part 4: Lock

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      HitAcc           FireOnVisible
3      PlayerMov        PlayerMov
4      WpnInf           HitAcc
5      FireOnAim        TrgtMov

(e) Part 4: Auto-Switch

Rank   Frame Size 30     Frame Size 60
1      MeanAimAcc        MeanAimAcc
2      TrgtPosChange     TrgtPosChange
3      PlayerMov         PlayerMov
4      FireOnAim         MeanDistance
5      PositionChange    WpnInf

(f) Part 4: Auto-Miss

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      AimAcc           HitAcc
3      FireOnAim        MeanDistance
4      HitAcc           TrgtPosChange
5      PlayerMov        PlayerMov

(g) Part 4: Slow-Aim

Rank   Frame Size 30    Frame Size 60
1      MeanAimAcc       MeanAimAcc
2      HitAcc           HitAcc
3      PlayerMov        MeanDistance
4      FireOnAim        TimeToHit
5      MeanDistance     AvgInstntWpnOnVis

(h) Part 4: Auto-Fire

Rank   Frame Size 30    Frame Size 60
1      FireOnVisible    FireOnVisible
2      FireOnAim        FireOnAim
3      HitAcc           HitAcc
4      TrgtMov          MeanAimAcc
5      WpnInf           MeanDistance

C. Analysis
As we can observe from the results above, applying supervised machine learning methods requires full knowledge of the data. In our context, FPS AimBots, we noticed that awareness of the type of each cheat can help increase classifier accuracy. However, the multi-class classifier did not prove to be as accurate, since there was confusion among the Lock-Based cheats. On the other hand, by separating the cheats and training a classifier for each type of cheat, we achieved higher accuracy in the first four parts (the training sets). This shows that separated cheat detection can help developers by applying all the models simultaneously on each instance (explained in the next paragraph). In Part IV-A.5 of the experiments, we noticed that the models obtained from Parts IV-A.3 and IV-A.4 gave the highest accuracy.
However, when these models are applied to an online game, there is no labeled set (i.e., the data is unseen). Therefore, there needs to be a method of confirming whether a player is a cheater. A good way is to specify a threshold for identifying a cheater: the threshold is based on the number of flagged records, and when it is reached, the player is determined to be a cheater. The threshold value depends on the detection accuracy of the model and on the developers' policy for cheat detection. It also depends on when the detection takes place: online, while the cheater is playing (real-time detection), or offline, after a game has finished. The following example shows how this detection technique can be used by varying the threshold, based on the test data above (Part IV-A.5). For the purposes of the example, the model from Part IV-A.2 is used with a frame size of 60, and the threshold is set to 50%, which provides loose detection to avoid some false positives. The number of cheating records that flags a player as a cheater can then be determined using the following formulas:

NumberOfRecords (NR) = MatchLengthInSeconds / FrameSize
Then,
NumberOfCheatingRecords (NCR) = NR x Threshold

In this example:
NR = 1800 / 60 = 30
Therefore,
NCR = 30 x 0.5 = 15

On the other hand, if we look at the test-set results for the models from Part IV-A.4, we notice that all the Lock-Based models detected the Lock cheat accurately, while only the Auto-Fire model detected the Auto-Fire cheat accurately. Therefore, applying both kinds of models simultaneously on the gameplay data can provide the required detection mechanism. The threshold is again set at 50%, i.e., NCR = 15 using the previous formula; however, the following formulas are needed to detect a cheater:

LockBasedCheatingRecords (LBCR) = Max(NCR(L), NCR(AS), NCR(AM), NCR(SA))
Then,
OverallNCR = LBCR + NCR(AF)

The reason for using an overall threshold is to detect players who use different cheats during the same match. (A sketch implementing this rule appears at the end of this section.)
After analyzing the data, we noticed that some results were not accurate enough. Several factors can cause this inaccuracy. First, in online games, delays (lag) happen all the time, which can cause the message exchange rate to drop (below 5 messages/sec in some cases). Such delays reduce the quality of the collected data and cause some features to take odd values; hence, lower accuracy is to be expected. Another reason for inaccuracy is the amount of cheating data collected compared to normal-behavior data: in some cases, especially with separated cheats, the cheat-to-normal ratio is more skewed than 40:60 (which, in our opinion, is a reasonable ratio between abnormal and normal data for classification). Finally, improving the feature set, either by adding new features or by modifying some of the existing ones, could also help increase accuracy.
Overall, the accuracy achieved is very high, especially given the added AimBot confusions (Auto-Switch and Auto-Miss). Frame size affects accuracy: a frame size of 60 gave the highest accuracy in most of the experiments. However, as mentioned before, larger frames result in fewer instances. We also consider any frame larger than 60 seconds too large, since smart cheaters activate cheats only when they need them, so a larger frame size might not accurately capture cheating behavior. We suggest using a frame size between 30 and 60 seconds.
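The threshold rule above is straightforward to implement. The sketch below reproduces the NR/NCR arithmetic and the Lock-Based/Auto-Fire combination rule; the per-model flagged-record counts are left as inputs, and the example values are invented for illustration.

```java
// Sketch of the flagging rule from the analysis above. Inputs are the number
// of frames each per-cheat model flagged for one player during one match.
public class CheatThreshold {
    static boolean isCheater(int matchLengthSeconds, int frameSize, double threshold,
                             int flaggedL, int flaggedAS, int flaggedAM, int flaggedSA,
                             int flaggedAF) {
        int nr = matchLengthSeconds / frameSize;        // NR = match length / frame size
        double ncr = nr * threshold;                    // NCR = NR x threshold
        // LBCR = max over the four Lock-Based models' flagged counts.
        int lbcr = Math.max(Math.max(flaggedL, flaggedAS), Math.max(flaggedAM, flaggedSA));
        int overall = lbcr + flaggedAF;                 // OverallNCR = LBCR + NCR(AF)
        return overall >= ncr;                          // flag when the threshold is reached
    }

    public static void main(String[] args) {
        // Example from the text: 30-minute match, 60 s frames, 50% threshold
        // => NR = 30, so 15 flagged records are needed. Counts here are made up.
        System.out.println(isCheater(1800, 60, 0.5, 12, 2, 0, 1, 4));
    }
}
```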
V. CONCLUSION AND FUTURE WORK
In this paper, we presented a behavior-based method to detect cheaters in online First Person Shooter games. The game was developed using the Unity3D game engine and has a client-server architecture. The game logs are stored on the server side, parsed by the feature extractor, and then analyzed by the data analyzer. The data was analyzed using machine learning classifiers implemented in Weka.
After conducting several experiments using different types of detection models, we obtained very high accuracy results. We then suggested using thresholds when applying the generated models. The value of the threshold depends on the developers' policy for cheat detection: it can be low (harsh) or high (loose); however, it should not be too harsh, or it will flag many experienced honest players as cheaters.
In general, game data analysis depends on how well you know the game. Developers can therefore adjust the data in the messages exchanged between client and server to accommodate the generation of the features. Moreover, using behavior-based cheat detection methods protects players' privacy.
More work could be done in the future to improve the cheat detection models. We could collect more data, using a mixture of cheats for each player instead of one cheat for the whole game. More features could also be generated to improve cheat identification. Moreover, we could model the behavior of each player separately, i.e., build cheating detection models per player instead of an overall model per cheat. Other improvements and additions could be made as well, including increasing the number of maps and weapons.
ACKNOWLEDGMENT
We would like to thank the members of the GamePipe Lab at the University of Southern California for their continuous support and advice, especially Professor Michael Zyda, Balakrishnan
(Balki) Ranganathan, Marc Spraragen, Powen Yao, Chun
(Chris) Zhang, and Mohammad Alzaid.
REFERENCES
[1] G. Hoglund and G. McGraw, Exploiting online games: cheating massively distributed systems, 1st ed. Addison-Wesley Professional, 2007.
[2] S. Webb and S. Soh, “A survey on network game cheats and P2P
solutions,” Australian Journal of Intelligent Information Processing
Systems, vol. 9, no. 4, pp. 34–43, 2008.
[3] Unity3d - game engine. [Online]. Available: http://www.unity3d.com/
[4] S. Yeung and J. C. Lui, “Dynamic Bayesian approach for detecting
cheats in multi-player online games,” Multimedia Systems, vol. 14,
no. 4, pp. 221–236, 2008. [Online]. Available: http://dx.doi.org/10.
1007/s00530-008-0113-5
[5] L. Galli, D. Loiacono, L. Cardamone, and P. Lanzi, “A cheating
detection framework for Unreal Tournament III: A machine learning
approach,” in CIG, 2011, pp. 266–272.
[6] C. Thurau, C. Bauckhage, and G. Sagerer, “Combining Self Organizing
Maps and Multilayer Perceptrons to Learn Bot-Behavior for a Commercial Computer Game,” in Proc. GAME-ON, 2003, pp. 119–123.
[7] C. Thurau, C. Bauckhage, and G. Sagerer, “Learning Human-Like
Movement Behavior for Computer Games,” in Proc. Int. Conf. on the
Simulation of Adaptive Behavior. MIT Press, 2004, pp. 315–323.
[8] C. Thurau and C. Bauckhage, “Towards manifold learning for gamebot
behavior modeling,” in Proceedings of the 2005 ACM SIGCHI
International Conference on Advances in computer entertainment
technology, ser. ACE ’05. New York, NY, USA: ACM, 2005, pp. 446–
449. [Online]. Available: http://doi.acm.org/10.1145/1178477.1178577
[9] C. Thurau, T. Paczian, and C. Bauckhage, “Is Bayesian Imitation
Learning the Route to Believable Gamebots?” in Proc. GAME-ON North
America, 2005, pp. 3–9.
[10] K. Chen, H. Pao, and H. Chang, “Game Bot Identification based on
Manifold Learning,” in Proceedings of ACM NetGames 2008, 2008.
[11] USC GamePipe Laboratory. [Online]. Available: http://gamepipe.usc.
edu/
[12] Artificial Aiming. [Online]. Available: http://www.artificialaiming.net/
[13] Weka 3: Data Mining Software in Java. [Online]. Available:
http://www.cs.waikato.ac.nz/ml/weka/
[14] J. Platt, “Advances in kernel methods,” B. Schölkopf, C. J. C.
Burges, and A. J. Smola, Eds. Cambridge, MA, USA: MIT
Press, 1999, ch. Fast training of support vector machines using
sequential minimal optimization, pp. 185–208. [Online]. Available:
http://dl.acm.org/citation.cfm?id=299094.299105
[15] M. Rychetsky, Algorithms and Architectures for Machine Learning
Based on Regularized Neural Networks and Support Vector Approaches.
Germany: Shaker Verlag GmbH, Dec. 2001.
[16] R. Kohavi, “A study of cross-validation and bootstrap for accuracy
estimation and model selection,” in Proceedings of the 14th
international joint conference on Artificial intelligence - Volume
2, ser. IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann
Publishers Inc., 1995, pp. 1137–1143. [Online]. Available: http:
//dl.acm.org/citation.cfm?id=1643031.1643047
[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for
cancer classification using support vector machines,” Mach. Learn.,
vol. 46, no. 1-3, pp. 389–422, mar 2002. [Online]. Available:
http://dx.doi.org/10.1023/A:1012487302797