Medialogy Project 6th Semester Group 617, Aalborg
Transcription
Medialogy Project 6th Semester, Group 617, Aalborg University Copenhagen
Christian S. Andersen, Aalborg University - CPH
Jesper T. Hansen, Aalborg University - CPH
Simon P. Norstedt, Aalborg University - CPH
Andreas V. Pedersen, Aalborg University - CPH

Contents

1 Introduction
2 Analysis
2.1 Artificial Intelligence in FPS games
2.1.1 Finite State Machine
2.1.2 Machine Learning
2.1.3 Paradigms of ML
2.1.4 ML Paradigms Concerning AI FPS games
2.1.5 Basic Dynamic Scripting
2.1.6 Interim Summary for Artificial Intelligence in FPS games
2.2 Choice of Game
2.2.1 Unreal Tournament 2004
2.2.2 Counter-Strike 1.6
2.2.3 Quake 3
2.2.4 Interim Summary
2.3 Collaboration in Counter-Strike
2.3.1 Communication
2.3.2 Radio Communication
2.3.3 Tactics
2.3.4 Team tactics
2.3.5 Defensive Elements
2.3.6 Offensive Elements
2.3.7 Interim Summary
2.4 Bots in Counter-Strike
2.4.1 Static vs Dynamic Bots
2.4.2 Gathering Data
2.4.3 RealBot
2.4.4 TeamBot
2.4.5 PODbot
2.4.6 Interim Summary
2.5 Testing "human-like" Behavior
2.5.1 Standard Turing Test
2.5.2 Turing test in Virtual Game Environments
2.5.3 The BotPrize Competition
2.5.4 Interim Summary
2.6 Requirements
2.6.1 The Bot Should Have a Higher Purpose/Goal to Evaluate Actions and Experiences From
2.6.2 The Bot Should Be Responsive to and Communicate Tactical Decisions
2.6.3 The Bot Should Be Able to Navigate in Any Map
2.6.4 The Bot Should Be Able to Aim and Shoot
2.6.5 The Bot Should Choose Weapons in Relation to its Main Objective
2.6.6 The Bot Should Navigate in Relation to its Main Objective
2.6.7 The Bot Should Use Rotation in Relation to its Main Objective
2.6.8 The Bot Should Have Both a Defensive and Offensive Strategy at Any Given Point
2.6.9 The Bot Should Learn from its Experiences
2.7 Summary
3 Design
3.1 Objectives Tree
3.2 Concept Design
3.2.1 Higher Purpose / Goal
3.2.2 Learn Through Experience
3.2.3 Tactical Decisions
3.2.4 Offensive / Defensive Strategies
3.2.5 Tactical Communication
3.2.6 Navigation
3.2.7 Aim and Shoot
3.3 Interim Summary
3.4 Choice of Bot
3.5 Development Choices for the PODbot
4 Implementation
4.1 Counter-Strike Structure
4.1.1 Half-Life Software Development Kit
4.1.2 Add-ons
4.1.3 Metamod
4.2 Compiling the Dynamic-Link Libraries
4.3 PODbot
4.3.1 PODbot Properties
4.3.2 Understanding of the PODbot's Code
4.4 CIAbot
4.4.1 Changes to the PODbot
4.5 HLTV & Screen Capturing
4.6 Editing
4.7 Interim Summary
5 Testing
5.1 Different Approach Than The BotPrize Competition
5.2 Testing Through Observation Instead of Interaction
5.3 Test Setup
5.3.1 Hypothesis
5.3.2 Playing against PODbot and CIAbot
5.3.3 Video footage
5.3.4 Survey
5.3.5 Finding Test Subjects
6 Test Results
6.1 Experience >= 2
6.2 Experience >= 4
6.3 Interim Summary
7 Discussion
7.1 Choice of Game
7.2 Choice of Bot
7.3 Test setup
7.4 Test Results
8 Conclusion
9 Future Perspectives
10 Appendix

Reader's Guide

This page explains the syntax used throughout this report, lists commonly used terms, and elaborates on their definitions where necessary.

Frequent Terms
• FPS - First-Person Shooter, game genre.
• NPC - Non-Playable Character, a computer-controlled unit in a game.
• bot - (ro)bot, an NPC fulfilling a role otherwise designed for a human player.
• ML - Machine Learning
• FSM - Finite State Machine
• CS - Counter-Strike
• DS - Dynamic Scripting
• RL - Reinforcement Learning
• UL - Unsupervised Learning
• SL - Supervised Learning

Source Reference
According to this research (Patel and Hexmoor [2009]), ...

Cross References
This is an example of a cross reference to the Turing Test section (2.5).

1 Introduction

From our own experience, bots in First-Person Shooter games such as Battlefield (Arts) and Counter-Strike (Corporation) seem fairly unintelligent when it comes to teamwork and cooperative tactics. The bots in such games can often be set to a certain difficulty, such as hard, medium or easy. The difficulty, however, typically only affects the skills of the given bot - how quickly it engages in combat with an opponent or how well it aims - not the way it plays, where it runs to, or the tactics it uses, as a human player would. In order to create a bot as similar to a human as possible, it is important to give each bot its own mindset so it can analyze what happens and react accordingly; it should be controlled by itself, with its own knowledge developing over time.

Bots in games are a type of non-playable character designed to act as if a human player were controlling it. Such bots vary a great deal depending on the game they are designed for. Rather well-known bots (Randar) from some of the first multiplayer First-Person Shooter games1 can be found in games such as Quake (Software [a]). Quake was designed as a single-player mission game in which it was also possible to play against other players over a Local Area Network on a single map.
The continuation desire of these games was therefore restricted by the need for other players in order to use the multiplayer features. This led to the development of programs that created an agent capable of simulating another player inside the game. This kind of bot is known as a C/S or Client-Side bot.

Simply put, there are two different types of bots: static bots and dynamic bots. Static bots use waypoints or path nodes that are determined by the developers for each single map, while dynamic bots learn a map through trial and error, which makes it possible to use them in any map (Patel and Hexmoor [2009]). The first bots were developed to be static, but when custom maps began to appear, the demand for dynamic bots increased. One of the benefits of dynamic bots is their ability to behave more like a human would in any given game. This includes being less predictable than static bots. Static bots will often be easy for an experienced player to recognize if he/she finds them in a situation not accounted for by their developers, where they behave oddly or illogically (Patel and Hexmoor [2009]). Not only is the dynamic bot less predictable, it will also often prove more efficient and less time-consuming to develop, since static bots need to be instructed in what to do in every possible scenario of the game.

According to Doherty and ORiordan [2009] and Chishyan Liaw and Hao [2013], creating dynamic bots can be achieved by using Machine Learning algorithms, which determine the success of the bots by evaluating how close they get to a certain predetermined goal. The bots then run through a large number of generations, where they store the outcomes of their decisions and find out which methods are most beneficial for achieving their goal. The more efficient the bots are in reaching their goal, while behaving similarly to a human player, the closer they come to the essence of a bot that is not only a worthy adversary, but also leaves the player unaware that he is playing against a program and not a human being.

1 A game based on shooting mechanics, where the player controls the character from a First-Person perspective.

Zanetti and Rhalibi [2004] state that the first-person shooter genre itself holds many variations of game mods, and the main objectives in each mod can differ greatly from each other. This means that the bots need to be instructed in such goals if they are to simulate human players properly. Apart from the goals of the game mod, the bots also need to behave the way a human would play the game. It will be too easy to spot a bot if, for instance, it constantly points its rifle at an opponent, even when the opponent is not visible to the bot. Many new methods of developing such bots have emerged during the past decade. In the current state of development, finite state machines (FSM) have come closer to this point of making an AI capable of out-thinking a human. However, the finite part of this kind of system restrains it to reacting properly only to scenarios predicted by the developer of the FSM. Overcoming the challenge of creating bots capable of out-thinking human players is a hard one that can be attempted through numerous methods (Remco Straatman and Beij).
Having a programmer write scripts for each and every possible scenario in a game is not necessarily a good way of ensuring that all possibilities are covered. This method will often be insufficient, not only because it is almost impossible to predict every single possible scenario in a game, but also because the resulting behavior will again be too predictable. Many bots use the positions of opponents to figure out whether they need to take cover or flee. This analysis is based on information about the opponents in the game as well as the objects in their proximity. Some bots also look at their team-mates in order to determine whether they are outnumbered, and thereby decide to stay and fight in their current position instead of fleeing. Beyond this evaluation, most common bots do not use the information about their fellow team-mates for anything else (Doherty and ORiordan [2009]). But what if the bots were able to give each other requests and commands?

We are interested in giving the bots in a First-Person Shooter game the capability of interacting with each other the same way human players do while playing. This approach has not been researched in detail yet; assumptions and hypotheses state that it is ineffective, since it requires a lot of code and seems fairly complicated, but it will increase the unpredictability - in other words the human-like qualities - of a given FPS game bot. According to Zanetti and Rhalibi [2004], Patel and Hexmoor [2009], and Nicholas Cole and Miles [2009], substantial development effort has been put into 'evolving' the traditional bots into learning, human-like entities, with prominent results showing great promise. However, most of these developments have been made exclusively with the individual bot in mind, and the few which have considered the interaction between bots only put a limited focus on the matter; due to the nature of Machine Learning, their tests gave only vague and inconclusive answers as to which aspects of the bots were responsible for which outcomes.

In 2001, Alistair Stewart developed a new bot named "TeamBot". The TeamBot focused on giving each bot a personality, allowing them to perform various tasks as part of team tactics. The collaborative team tactics were never fully implemented, however. Later in the development of bots in Counter-Strike, another bot was implemented: the RealBot, made by Stefan Hendricks, whose goal was to make it as human-like as possible (Strahan [2004], Hendricks [2007]). This resulted in a highly developed bot, though it still needs further development to be as human-like as he intended.

All in all, the progress of learning, tactical and human-like bots has come a long way, but it has yet to be seen that individual bots look to each other, recognize their own value to their fellow bots and vice versa, and utilize each other to execute cooperative tactics the same way human players are seen to think and play. From this stems our final problem statement:

Will interactive communication between FPS game bots, in order to establish dynamic combat-tactics, increase the development towards a completely human-like bot?

2 Analysis

Through this analysis, different aspects of the development of a bot will be addressed.
The first focus will be different approaches to artificial intelligence with regard to developing a human-like agent. Following this, three different games will be analyzed as to how they would function as a platform for further development of interactively communicating bots, and one will be chosen to serve as the main platform for this project. When a platform has been chosen, state of the art bots for that platform will be analyzed with a focus on human-like features and qualities. Next, there will be a section regarding the origin and further developed variations of the Turing Test, which for decades has been a renowned test for measuring human-like qualities in technology. Lastly, the requirements gathered through the analysis will be listed.

The research within the area of FPS game bots with human-like learning capability shows that the development has come a long way with respect to the individual, independent bot. In order to further the development towards interactive cooperation between bots, research will be necessary into the following aspects, in respective order:

• Which methodology will constitute the optimal premise for the development?
• Which FPS game will provide a good premise to build and test our final problem statement on?
• How does tactical (human) collaboration exist in this game?
• Which SOTA bots are available to develop on for this game, and what methodology do they use?

2.1 Artificial Intelligence in FPS games

With the aim towards human-like bots, the field of artificial intelligence in FPS games will be in focus, raising the question: how are bots able to simulate the behavior of a human player in FPS games? This chapter will investigate different approaches that have been applied to simulate human-player behaviors. The first section (2.1.1) will cover the Finite State Machine (FSM), which for many years has been the kind of program used to simulate logical behavior. However, since this machine does not have a dynamic feature, which is an essential aspect of this project, the main emphasis of the chapter will be to investigate how a program can learn. The term learning entails Machine Learning, which will be the content of the subsequent section (2.1.2). That section covers the general concept and history behind ML, followed by a section that introduces the most important categories of ML in general terms. Section 2.1.4 will further elaborate on the ML methods introduced in section 2.1.2 by putting a sharper focus on FPS game bots. The purpose of that section is to investigate which ML paradigms enable FPS bots to learn how to perform and adjust combat tactics in a game environment. Computational considerations such as optimization will also be introduced; this is important for saving computational power and prioritizing which behaviors are essential to "learn" most efficiently. The last section (2.1.5) will cover the ML method Dynamic Scripting. Although this method is only one example of an ML method incorporated in FPS game bots, the example will attempt to give a more tangible understanding of which parameters are important in enabling a bot to continuously assess its own behavior in an FPS game environment.
2.1.1 Finite State Machine

Throughout the first many years of FPS games, bots were driven by a method called the Finite State Machine (FSM), which enabled them to perform tasks that appeared fairly similar to a human's (Nareyek). Even today, most FPS games employ this method. Despite being a widely acknowledged and accepted method for constructing non-playing characters, it has its limitation in that it only consists of hard-coded tasks. This requires the developer to determine all the "human-like" tasks the bot has to perform, in a pre-defined order. Comparing these static bots with human players, there are some general traits in the human player (and in the human mind in general) that make his/her patterns of behavior a lot more complex. People are not inclined to stick to a specific procedure, but perform based on aspects such as curiosity, patience, mood, experience, etc. Replicating the extreme complexity of human behavior has obviously not been accomplished yet, but parts of the game industry have, however, succeeded in generating dynamic and adaptive behaviors in FPS bots. This leads to the main emphasis of this section: Machine Learning (which will mainly be denoted ML from here on out).
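Before moving on to ML, the following minimal sketch makes the hard-coded nature of an FSM concrete. The states, sensing fields and thresholds are our own illustrative assumptions, not code from any of the games or bots discussed in this report.

    #include <cstdio>

    // Minimal illustrative FSM for a bot; states and thresholds are hypothetical.
    enum class BotState { Patrol, Attack, Flee };

    struct BotSenses {        // what the bot "knows" on this update
        bool enemyVisible;
        int  health;          // 0-100
    };

    // Every transition is hard-coded by the developer in advance; the bot can
    // only react to the situations enumerated here.
    BotState NextState(BotState current, const BotSenses& s) {
        switch (current) {
            case BotState::Patrol:
                return s.enemyVisible ? BotState::Attack : BotState::Patrol;
            case BotState::Attack:
                if (s.health < 30)   return BotState::Flee;
                if (!s.enemyVisible) return BotState::Patrol;
                return BotState::Attack;
            case BotState::Flee:
                return (s.health >= 30 && !s.enemyVisible) ? BotState::Patrol
                                                           : BotState::Flee;
        }
        return current;   // not reached; keeps the compiler happy
    }

    int main() {
        BotState state = BotState::Patrol;
        state = NextState(state, BotSenses{true, 80});   // enemy spotted -> Attack
        state = NextState(state, BotSenses{true, 20});   // badly hurt    -> Flee
        std::printf("final state: %d\n", static_cast<int>(state));
    }

Any situation the developer did not enumerate falls through to the same fixed response every time, which is exactly the predictability problem described above.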
2.1.2 Machine Learning

When trying to identify what lies at the core of intelligent behavior, the ability to learn and adapt is unavoidable, which rules out the FSM's static behavior (Buchanan [2006]). This is why ML is essential for the development towards more realistic bots, as it has the potential to create diverse behaviors without the need to code them explicitly.

2.1.2.1 History and Application of ML

Over the past 50 years, the study of ML has utilized the fundamentals of probability and statistics. These mathematical measurements are an unavoidable part of ML algorithms and have turned out to be beneficial in a large variety of ML fields. The first program to successfully incorporate computational ML was the Game of Checkers (which will be further elaborated in the next section) (Arthur [1959]). Since then, more advanced ML algorithms have applied pattern recognition, which is utilized in areas such as facial, speech and signature recognition, filtering out spam emails, computer-aided diagnosis in medicine, fingerprint analysis, navigation and guidance systems, etc. The beginning of the 21st century brought an explosion in ML, where adaptive programming became the norm for intelligent programs (Geisler [2002]). Programs were now capable of recognizing patterns, learning from experience, abstracting new information from data, and optimizing their own efficiency. Fields that apply these intelligent computations include search engines, credit card fraud detection, adaptive web applications such as YouTube (and many others) that continuously adjust their recommendations to the individual's prior searches, financial applications, the general field of intelligent robotics, and many more (Mitchell [2006]).

2.1.2.2 Definition/Concept of ML

Attempting to write down a single adequate definition of ML is hard, as it is such a broad field and a vast number of approaches have been developed (Buchanan [2006]). However, there are some general traits that are common to all ML methods: the computer's ability to investigate data and make generalizations from that data based on experience. Arthur Samuel, a pioneer of artificial intelligence who created the first learning program (the Checkers program in 1952), defined ML as a "field of study that gives computers the ability to learn without being explicitly programmed" (McCarthy [1992]). Samuel's program became a better player after many games against itself and a variety of human players, using supervised learning (which will be elaborated in section 2.1.3.1). So, how do computers learn without being explicitly programmed? The idea is to infer algorithms from data structures with the purpose of exposing indistinct patterns, and then to use these algorithms to predict new data (McCarthy [1992]). In Samuel's Game of Checkers, the program inferred algorithms by observing which strategic moves contributed to winning the game, and adapted its programming to utilize these moves (Arthur [1959]).

2.1.3 Paradigms of ML

ML algorithms are often categorised with respect to the training data that is available for the computation of learning. There are roughly three categories:

• Supervised Learning (SL)
• Unsupervised Learning (UL)
• Reinforcement Learning (RL)

There is a comprehensive number of learning algorithms that utilize the basic concept of one of the three learning paradigms for training and modeling of data. Multiple learning algorithms are often used to form a hypothesis that takes multiple "opinions" into account - each opinion being an ML algorithm's inferred model of the relation between data input and output, action and reward (Policarpo [2011]). The next sections will look into the different ML paradigms in order to get a better understanding of how they have been applied in FPS games, what their limitations are, and which role they play in reaching human-like behavior in FPS games. This aims at providing insight into which solutions are most appropriate for an optimal development.

2.1.3.1 Supervised Learning

Supervised learning deals with data that is already known (Has [2009]). In other words, the agent learns from examples, where data instances (variables such as car speed and direction) and responses are known. The outcome of the different values of a data instance is either rewarded or marked as an error based on the initial purpose. In a more simplistic notion, this procedure corresponds to having a supervisor observe an agent go through a range of computations and explain which behaviors were good and which were bad. After a sufficient amount of time, the learning agent will have gathered enough knowledge to form algorithms that address implicit relationships in the data, and thus have a general policy as to which behaviors are good and which are less good. These algorithms can then be used to infer predictions on unknown data.

Fig. 1: This illustration depicts how Supervised Learning works. The algorithm that deduces relations in the raw data is trained by humans. The trained algorithm is then tested to see if it can be verified as a "good solution" for classifying the data.
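As a toy illustration of learning from known examples, the sketch below predicts an action for an unseen situation by answering with the action of the most similar labeled example. The features, labels and nearest-neighbour rule are our own illustrative assumptions; real systems use far richer models.

    #include <cstdio>
    #include <vector>

    // A labeled example observed from a human player: the situation (features)
    // and the action the human chose (the known response).
    struct Example {
        double health;       // 0-100
        double enemiesNear;  // count
        int    action;       // 0 = engage, 1 = retreat
    };

    // 1-nearest-neighbour prediction: the labeled data acts as the "supervisor".
    // Assumes the labeled set is non-empty.
    int PredictAction(const std::vector<Example>& labeled,
                      double health, double enemiesNear) {
        int best = 0;
        double bestDist = 1e300;
        for (size_t i = 0; i < labeled.size(); ++i) {
            double dh = (labeled[i].health - health) / 100.0;  // rough normalization
            double de = labeled[i].enemiesNear - enemiesNear;
            double d  = dh * dh + de * de;
            if (d < bestDist) { bestDist = d; best = labeled[i].action; }
        }
        return best;
    }

    int main() {
        std::vector<Example> seen = {
            {90, 1, 0}, {80, 2, 0}, {25, 3, 1}, {15, 1, 1}  // made-up "human" data
        };
        std::printf("predicted action: %d\n", PredictAction(seen, 30, 2));
    }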
2.1.3.2 Unsupervised Learning

Unsupervised learning tries to indicate hidden structures in unlabeled data. By not classifying the different possible data outcomes beforehand, the learning algorithm attempts to cluster data samples that share common properties. Unsupervised Learning differs from Supervised Learning in that it does not know what is good and what is bad (Has [2009]). It cannot label the clusters of data, as it has no knowledge of why data are grouped together or what the grouping represents (Jones [2009]).

Fig. 2: This illustration depicts how Unsupervised Learning works, where an algorithm is applied to infer statistical similarities in the data by clustering patterns into groups. The "manual review" attempts to label/identify what the clusters of data represent.

2.1.3.3 Reinforcement Learning

Reinforcement learning (which can be referred to as a type of Unsupervised Learning, Sha [1990]) has its similarities with UL in that it is not presented with known data from the beginning. It works in such a manner that an agent can learn behavior through trial-and-error interactions with a dynamic environment. The agent performs a random action, which results in either a reward or a punishment. Based on the feedback, the agent tries to find the optimal policy of behavior (Kae [1996]). By performing a variety of actions while progressively favoring those that are most likely to produce reward, the agent will eventually learn how to interact with the environment in the most optimal way. When the computer learns from an experience, this can quite intuitively be related to the logic behind human actions in terms of purpose and evaluation: when we make a decision, it depends on what we intend to do and an assessment of what will happen if we do it (Geisler [2002]). Three conditions are relevant for acquiring learning in ML:

• That an action is performed.
• That there is a purpose behind the action performed.
• That there is an assessment of the extent to which the action performed in that particular situation was a success or not. This assessment is based on the initial purpose.

Fig. 3: This illustration depicts how Reinforcement Learning works. The second step in the procedure illustrated above shows how a human tries to model a relation between sample data and feedback. The algorithm that represents the data relations gets modified throughout the entire body of data.

Depending on the number of actions performed, the Reinforcement Learning model will have an increasing understanding of which actions benefit the success criteria in arbitrary situations and which actions do not accommodate the purpose (Pieter Spronck [2003]). In order to properly estimate which actions in certain situations are most likely to be successful, Reinforcement Learning uses probability measurements. These probability values determine how probable it is that an action in a given situation will be successful. If the action performed in the given situation turned out to be unsuccessful - it did not accommodate the purpose - then the probability value will decrease, and vice versa.
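As a rough illustration of this action/purpose/assessment loop, the sketch below keeps one preference value per action in a given situation and nudges it up or down after feedback. The actions, update rule and learning rate are illustrative assumptions, not a specific algorithm from the cited literature.

    #include <array>
    #include <random>

    // Hypothetical actions a bot could take when it spots an enemy.
    enum Action { ATTACK = 0, TAKE_COVER = 1, FLEE = 2, NUM_ACTIONS = 3 };

    struct ActionPreferences {
        // One preference value per action; higher means "more likely to pick".
        std::array<double, NUM_ACTIONS> value{0.5, 0.5, 0.5};

        // Pick an action with probability proportional to its value
        // (progressively favoring actions that produced reward before).
        Action Choose(std::mt19937& rng) const {
            std::discrete_distribution<int> dist(value.begin(), value.end());
            return static_cast<Action>(dist(rng));
        }

        // Assessment step: reward > 0 if the action served the purpose,
        // reward < 0 if it did not (e.g. the bot took damage or died).
        void Update(Action a, double reward, double learningRate = 0.1) {
            value[a] += learningRate * reward;
            if (value[a] < 0.01) value[a] = 0.01;  // keep some chance of re-trying
        }
    };

    int main() {
        std::mt19937 rng(42);
        ActionPreferences prefs;
        Action a = prefs.Choose(rng);
        prefs.Update(a, -1.0);   // the bot got shot: punish that choice
    }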
2.1.4 ML Paradigms Concerning AI FPS games

The preferred ML technique is fully dependent on the problem to be solved. So, what is the machine trying to learn? When dealing with FPS games, it is essential to assess which representation of data the agent must withdraw summarizations/predictions from.

2.1.4.1 What Should The FPS Game Bots Learn?

In first-person shooter games, the players' objective is to win the game and fulfill team-based tasks such as planting a bomb or rescuing hostages (Policarpo [2011]). Having addressed this goal, what the agent learns must in all cases refer to these main objectives. This is important because, if the agent must be capable of assessing its own performance (and thereby learn), it must be able to evaluate to which extent it reached its objectives. In most AI FPS games, agents are learning how to navigate tactically, where to aim, which weapons are good in which situations, etc. In order to do this, the agent needs to constantly evaluate an environment of alternating human-player behaviors, and account for its own health level, position and weapon, the number of enemies nearby, their positions and their weapons, etc. If the agent must adapt its behavior to accommodate its purpose, it needs to consistently update information about the consequences of its own actions.

2.1.4.2 Comparing the Learning Paradigms

Supervised learning has the advantage of drawing assumptions based on fixed data (Has [2009]) - meaning it can assess comprehensive amounts of data inputs and outputs from available examples of human players, enabling the agent to form a set of predictions such as "it is good to run away if the enemy is shooting at you and you do not have a weapon". This background knowledge, learned from observing a comprehensive number of expert players, is basically unavoidable in FPS game AI today, as it gives the agent a behavioral basis. However, determining specific actions for specific situations is not likely to be successful against a trained human player, as the agent would not be able to adjust its behavioral patterns once the human player starts to "figure out" how it behaves.

This is what makes Reinforcement Learning appropriate. The ML bot performs an action in a game environment, uses the feedback to assess its behavior, and applies the learning experience to make predictions regarding how to act the next time it is in the same state. So, as the ML bot continuously receives feedback (rewards if the actions gave good results, punishments if it receives damage from an opponent, loses the game, etc.), it will have an increasing knowledge of which behaviors are likely to be successful. In that way, Reinforcement Learning is in essence a combination of Supervised and Unsupervised Learning: it can both explore unknown data that it has yet to learn from (Unsupervised Learning) and learn from these actions by connecting relations between situations, actions and results (data instances, input and output, which is what Supervised Learning uses).

If the developer has very limited knowledge of how an observed player's behavior varies, Unsupervised Learning can be used to classify it. For instance, in the game Tomb Raider: Underworld, UL is used to "determine representations of the input data in order to classify the players into distinct groups based on behavior" (Magy Seif El-Nasr [2013]). Here, one class of player behavior was characterized by completing the game really fast, another by completing the game really slow, and a third by not completing the game at all.
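To make this kind of label-free grouping concrete, the crude sketch below clusters players by a single feature (e.g. hours to complete the game) with a plain one-dimensional k-means pass. The feature, the number of clusters and the data are illustrative assumptions, not the method used in the cited study; the resulting groups still require manual review to be named "fast", "slow", etc.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Group samples into k clusters by one feature; assumes x.size() >= k.
    std::vector<int> Cluster1D(const std::vector<double>& x, int k, int iters = 50) {
        std::vector<double> centers(k);
        for (int j = 0; j < k; ++j) centers[j] = x[j * x.size() / k];  // rough seeds
        std::vector<int> label(x.size(), 0);
        for (int it = 0; it < iters; ++it) {
            // assign each sample to the nearest center
            for (size_t i = 0; i < x.size(); ++i)
                for (int j = 0; j < k; ++j)
                    if (std::fabs(x[i] - centers[j]) < std::fabs(x[i] - centers[label[i]]))
                        label[i] = j;
            // move each center to the mean of its samples
            for (int j = 0; j < k; ++j) {
                double sum = 0; int n = 0;
                for (size_t i = 0; i < x.size(); ++i)
                    if (label[i] == j) { sum += x[i]; ++n; }
                if (n > 0) centers[j] = sum / n;
            }
        }
        return label;
    }

    int main() {
        std::vector<double> hours = {2, 3, 2.5, 12, 14, 40, 38};  // made-up data
        std::vector<int> groups = Cluster1D(hours, 3);
        for (size_t i = 0; i < hours.size(); ++i)
            std::printf("%.1f h -> group %d\n", hours[i], groups[i]);
    }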
Unsupervised Learning techniques are, however, rarely applied in FPS games alone, and they require a comprehensive amount of time to train on the data if no previous model of the data can instruct them in how to implement the ML structure (MORIARTY [2005]).

2.1.4.3 Optimizing Learning

When deliberating which ML paradigms are relevant for NPC bots in FPS games, the question mainly concerns how to enable bots to behave in a way that is assumed to be human-like. However, there are more aspects to consider in order to reach a solid computational solution. For this, optimization is important to stress. Reinforcement learning is in general computationally demanding, which is why it is often optimized by applying a genetic algorithm (see section 2.1.5). Such an algorithm uses the logic of biological evolution, which shows in the way it sorts out behaviors that it does not find relevant to learn from. This could for instance be to avoid making small changes in behavior to find locally optimal behavior, and instead search for experience on a larger scale. It tries to:

• Minimize the amount of dependencies: it connects relations between different variables, such as location and weapon, and tries to prioritize the ones with a strong connection (those that have given good results).
• Avoid over-fitting: it avoids adapting its behavior to a restricted set of states in the game and thereby performing poorly in other states.
• Explore and exploit: it alternates between exploring new policies (actions in given situations) and exploiting what it is already familiar with.

ML is basically always accompanied by an optimization algorithm to some extent, which makes the program computationally smarter. Dynamic scripting, which is explained in the following section (2.1.5), is also a method of optimization, as it consistently sorts out behaviors that are not relevant and focuses on the behavior that it estimates to be most successful.

2.1.5 Basic Dynamic Scripting

With the desire to replicate human players' ability to behave dynamically and adapt tactics based on continuous experience, the Reinforcement Learning model seems to accommodate these prerequisites. In order to better understand what enables this ML paradigm to behave dynamically, this section will cover the basics behind the online computational procedure Dynamic Scripting (DS), and will attempt to explain the most essential terms that revolve around this dynamic programming method. The reason for explaining this is to provide a more tangible understanding of which parameters the program must employ to acquire dynamic behavior for agents in a game environment. Despite the large variety of ways to compute dynamic behavior, the concept of DS will help in understanding the technical and conceptual part of general ML. This whole section is primarily based on the article "Online Adaptation of Computer Game Opponent AI" (Pieter Spronck [2003]), written by Pieter Spronck, Ida Sprinkhuizen-Kuyper and Eric Postma from the University of Maastricht, Netherlands.

2.1.5.1 Rulebase

The program has a finite number of manually designed rules that are stored in the rulebase. Each rule represents an action of an agent, composed of a condition clause and an effect clause. In other words, the action will be executed if the agent is in a certain condition, and if the agent estimates that there is a sufficient probability of reaching the effect desired from that action. To understand this in a less abstract way, one rule could determine that a player would be fit to shoot at an opponent (action) if it has a health level of over 50 percent (internal condition) and the opponent is within view range and is outnumbered (external condition).
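A rule of that kind could be represented roughly as below. The struct layout, field names and conditions are our own illustrative assumptions, not code from the Spronck et al. article; the weight field anticipates the rule weight introduced next.

    #include <functional>
    #include <string>

    // Snapshot of what the agent can observe when a rule is evaluated.
    struct GameState {
        int  health;          // 0-100
        bool enemyVisible;
        bool enemyOutnumbered;
    };

    // One rule in the rulebase: a condition clause, an effect/action clause,
    // and a weight that later steers how likely the rule is to be selected.
    struct Rule {
        std::string name;
        std::function<bool(const GameState&)> condition;  // condition clause
        std::function<void()> action;                     // effect clause
        double weight = 1.0;
    };

    // Example rule matching the text above: shoot if healthy, enemy in view
    // and the enemy is outnumbered.
    Rule MakeShootRule() {
        Rule r;
        r.name = "shoot_when_strong";
        r.condition = [](const GameState& s) {
            return s.health > 50 && s.enemyVisible && s.enemyOutnumbered;
        };
        r.action = [] { /* aim and fire; the game-specific call would go here */ };
        return r;
    }

    int main() {
        Rule shoot = MakeShootRule();
        GameState s{60, true, true};
        if (shoot.condition(s)) shoot.action();
    }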
2.1.5.2 Rule weight

Each rule in the rulebase has a rule weight. The rule weight, whose value varies between 0 and 1, is associated with the probability of the rule being selected. This parameter is crucial for understanding dynamic behavior and reinforcement learning, as it is what enables the agent to consistently value its own actions.

2.1.5.3 Script

Each agent is controlled by a script, which is composed of a subset of the available rules. The agent classes are scripted differently, making them react differently to the environment they are presented with. A script can indicate the bot's personality: for instance, Counter-Terrorists and Terrorists will have different policies as to how to approach the game.

2.1.5.4 Learning Episode

When an agent encounters an enemy, it gets a chance of evaluating its own tactical decisions; this episode is denoted a learning episode.

2.1.5.5 Rule Policy

When the agent interacts with the environment, it applies its script to select actions in given situations. In order for a rule to be executed, the agent must know which of the available rules in the script should have higher priority than the others. The order of the rules in the script is important, as the rule policy component processes it in order and performs the first rule that is applicable to the agent's current state condition. As the agent approaches the environment and gains experience, it can change its rule policy based on the feedback it gets from the environment (reinforcement learning). So where the agent's script can be seen as defining a type of bot (the available rules of the agent), the rule policy is the individual prioritization of rules based on experience.

2.1.5.6 Assigning Weight Values

To achieve dynamic difficulty using defined rules, each rule can be rated using RL through each learning episode (encounter with an opponent). DS is a technique presented to reach this goal (Peyman Massoudi [2013]). After each learning episode, the agent is reinforced with new knowledge as to whether the rules applied were successful in achieving its purpose or not. This is done by a reward function/fitness function. The assigned weight values for all rules expose their chance of being selected in future encounters. The rules with the highest weight are located highest on the list of rules in the rulebase (Peyman Massoudi [2013]).

2.1.5.7 Fitness Function

The performance of an agent is measured by a fitness function providing weight values to the active rules in the agent's script (Pieter Spronck [2003]). The function produces a number between 0 and 1 to indicate how well the script performed during a game. A low fitness indicates a bad performance (values between 0 and 0.5), whereas a high fitness indicates a successful performance (values between 0.5 and 1). The fitness function contributes to an evaluation of the overall performance of an agent over a larger number of iterations. Since the outcome of a single match does not have any real significance - winning by chance is possible even with a weak tactical plan - an assessment of the average results over a large number of matches is important. The fitness function therefore uses a fitness average to estimate the development of the bot's performance.
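The sketch below ties these terms together: a fitness value in [0, 1] is turned into a weight adjustment for the rules that were activated in a learning episode, and the total weight is then rescaled so that non-activated rules compensate proportionally (in the spirit of the compensation mechanism described next). The break-even point of 0.5 and the scaling factor are illustrative assumptions, not the exact update rule from Spronck et al.

    #include <vector>

    struct ScriptRule {
        double weight    = 0.5;   // in [0, 1]; also drives selection probability
        bool   activated = false; // was the rule executed this learning episode?
    };

    // Adjust weights after one learning episode, given fitness in [0, 1].
    // Fitness above 0.5 rewards the activated rules, below 0.5 punishes them.
    void UpdateWeights(std::vector<ScriptRule>& rules, double fitness,
                       double maxAdjust = 0.2) {
        double delta = maxAdjust * (fitness - 0.5) * 2.0;  // in [-maxAdjust, +maxAdjust]
        double totalBefore = 0.0, totalAfter = 0.0;
        for (const auto& r : rules) totalBefore += r.weight;

        for (auto& r : rules) {
            if (r.activated) r.weight += delta;
            if (r.weight < 0.01) r.weight = 0.01;  // keep a small minimum so the rule can be re-tried
            if (r.weight > 1.0)  r.weight = 1.0;
            totalAfter += r.weight;
        }
        // Compensation: rescale so the accumulated weight stays constant, letting
        // non-activated rules gain (or lose) share proportionally.
        if (totalAfter > 0.0)
            for (auto& r : rules) r.weight *= totalBefore / totalAfter;
    }

    int main() {
        std::vector<ScriptRule> rules(4);
        rules[0].activated = rules[1].activated = true;  // these rules fired this episode
        UpdateWeights(rules, 0.8);                       // a fairly successful round
    }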
2.1.5.8 Compensation Mechanism

The terms described above are important to understand in order to get a grasp of how DS works. Since the accumulated weight value of all rules is kept constant, the weight of the rules is distributed dynamically in relation to how well the agent performs. So when the agent performs a number of successful actions, the activated rules increase in weight, whereas the other rules decrease in weight. And the other way around: when the rules applied in the script receive punishments over a sufficient amount of time, the non-selected rules will proportionally come to possess a higher weight, and will eventually achieve a high enough value to be selected onto the agent's script.

2.1.5.9 Goal

The goal of DS is to adjust the weights of rules to test which rules are most successful, which in the long run will increase the agent's fitness over a number of iterations. When the agent's fitness has increased over a period of time, it means that it has, through experience, learned which tactical behaviors have a high probability of giving success and which behaviors have a low probability of being successful.

2.1.5.10 Interim Summary of Dynamic Scripting

To sum up, every agent class in an FPS game consists of a number of selected rules (a script) that is generated from a rulebase, which contains all rules in the game. This script determines the different policies that characterize each agent class in the game. Each rule has a weight value that influences its probability of being selected for the script and its probability of being executed. After each game, the fitness function assesses to which extent the agent's actions accommodated its goals. It thereby calculates a weight adjustment for all activated rules that were selected, meaning all the rules that were applied in the round. The compensation mechanism makes the weight values of the non-selected and non-activated rules adjust in co-ordinance with the adjustment made to the weights of the selected rules (Pieter Spronck [2003]). This is what enables the agent to learn to adjust its behavior in favor of rules that tend to give positive feedback, and empowers the agent to try new ways when it realizes that its actions haven't paid off. This is the most essential aspect of DS and basic dynamic programming: to have a number of rules (or variables or integers) that share a total weight. The shared weight is what enables a proportional relation between them all.

2.1.6 Interim Summary for Artificial Intelligence in FPS games

In the section Artificial Intelligence in FPS games, a variety of approaches to human-like behavior have been analyzed. The widely applied FSM turned out to replicate human actions well, but did not have the capability of replicating human behavior very well. The big drawback of this machine is that it is not able to adjust its computational pattern, and it thus becomes predictable at some point once the player has played the game for a sufficient amount of time. The subsequent section investigated different ML styles and analyzed how each of the different approaches could assist in generating more human-like behavior than shown by the FSM. Supervised Learning turned out to be a good solution for equipping an agent with basic behaviors that are copied directly from human players.
But no matter how well the offline learning algorithm2 can calculate optimal combat tactics, it was deemed inadequate in relation to the Final Problem Statement because of its inability to adjust behavior on-line3. However, this learning paradigm might still be suitable for the project, as it can generate a good starting point as to what the agent should focus on and provide some general attributes, such as being able to aim and shoot at an opponent, being able to run, jump, hide, etc. The study of ML, however, revealed that the most essential learning paradigm to consider for this project is the Reinforcement Learning algorithm. Its on-line adaptive behavior enables it to change tactics dynamically (meaning while playing), which is a feature that can certainly be argued to be more "human-like".

2 A system that does not change behavior once the initial training phase has been completed.
3 Online learning is the process of adjusting the assessment of data as input arrives piece by piece, in a serial fashion.

2.2 Choice of Game

The number of FPS games developed by now is staggering. In this section the best candidates will be analyzed and discussed with the objective of later being able to put the Final Problem Statement to a test, and a final choice of platform will be determined, in which the project will be developed in order to do so.

2.2.1 Unreal Tournament 2004

Unreal Tournament 2004 (Creativecommons.org) is a First-Person Shooter game released in 2004 (Figure 4). The game is played either in a single-player campaign mode, where missions have to be accomplished in continuous order, or in a multiplayer mode with 8 different game modes, such as Capture the Flag, Deathmatch, Team Deathmatch, Double Domination, Bombing Run, Last Man Standing, Invasion, and Mutant, plus further developed game features like vehicle Capture the Flag, which makes it possible to use vehicles.

Fig. 4: Unreal Tournament 2004

Unreal Tournament is interesting because it has some highly developed bots; the most human-like bot created to date was even developed for Unreal Tournament (Software [b]). However, the development of AI for Unreal Tournament is mainly focused on bots for the Deathmatch mode, i.e. individual bots that are primarily based on the skills and movements of the individual player. The bots for the Team Deathmatch mode are the same bots as in the Deathmatch mode, which means that the focus is on individual performance, and the premise of the game mode is relatively poor for collaboration, seeing as the players are arbitrarily spread out over the entire map and will respawn equally randomly upon their deaths. Bots for Unreal Tournament 2004 are made server-side and are usually customized for the different levels they play (Randar).

2.2.2 Counter-Strike 1.6

Counter-Strike (CS), 1999 (Corporation), is a team-based FPS game (Figure 5) and a MOD for Half-Life (Valve [1996-2010]), both of which are developed and owned by Valve. CS is usually played exclusively with human players, over the internet. Although several custom-made mods and game modes have been developed, traditionally there are two game modes: Bomb and Hostage. In each game mode the player can choose to be part of the Terrorist or the Counter-Terrorist team.
In the Bomb game mode the Terrorist team has to plant a bomb at one of two possible sites on the map, whereas the Counter-Terrorist team has to prevent the bomb from being planted and exploding. In the Hostage game mode the Terrorists spawn in a base where several hostages are being kept, and it is the Counter-Terrorists' objective to free and rescue all of the hostages. In both game modes a round can also be won by eliminating all of the opposing team's players, since once a player has died he will remain dead until the beginning of the next round. Each round usually has a fixed time limit, normally around 3 minutes.

Fig. 5: Counter-Strike 1.6

Counter-Strike bots are made client-side4 and are usually written in C or C++ (Randar). They work as independent programs, required to process and interpret the data given to them more or less as a human player would have to.

4 "Client-side" programs are executed on the user's own machine (the client) rather than on a remote server.

2.2.3 Quake 3

Quake 3 (Q3), 1999 (id), is a multiplayer action FPS game (Figure 6), much like Unreal Tournament (2.2.1), apart from having no single-player game mode. Quake 3 has four game modes: Deathmatch, Team Deathmatch, Tourney (1 vs 1) and Capture the Flag.

Fig. 6: Quake 3

The only one of the four game modes in Quake 3 with an objective other than "kill every opponent" is the Capture the Flag game mode. Here, two opposing teams are to defend their own flag whilst trying to obtain the opposing team's flag. Once a player is killed, he will respawn at a random spot in his team's area of the map. Quake 3 bots are made server-side5 (Randar).

5 A computer program that runs on a remote server.

2.2.4 Interim Summary

All the aforementioned games offer easy accessibility for development, both in development platforms and in pre-existing material from which to extend this project. This puts a sharper focus on the premise of the different games' specific game modes and objectives. In this project the focus is on the collaboration and interaction between team bots, which means that some of the game modes are of more or less interest. The game modes of interest in Quake 3 and Unreal Tournament 2004 are Capture the Flag, Team Deathmatch, Bombing Run and Invasion. The reasoning behind choosing these game modes is that they all, in different ways, set a fundamental premise for collaboration as teams, either team against team or team against NPCs.

All three games offer highly developed bots open to further development; however, Counter-Strike 1.6 is an attractive platform to develop the project on, primarily due to its origins: since CS is a MOD of another game, the access to the game's data has led to countless bots having been developed throughout the years, most of which have the same or a similar goal to this project - i.e. creating more human-like bots. More importantly, the Counter-Strike game modes set a distinct premise for collaboration, seeing as both teams spawn together with their team-mates, and winning the game is not limited to killing the opponents but can also be achieved through other objectives, whilst the consequences of being killed in a round make collaboration with team-mates more valuable than if immediate respawn were an option - such as in Quake 3 and Unreal Tournament 2004.
All in all, Counter-Strike 1.6 offers a good foundation for putting a team of collaborating, tactic-crafting bots to the test. Having chosen the game which will act as the platform for the project's further research, the next step will be to investigate what successful collaboration looks like and what the elements of a tactic in Counter-Strike consist of. This leads the research into the following section, which will investigate what professional Counter-Strike gamers consider to be imperative subjects in Counter-Strike collaboration.

2.3 Collaboration in Counter-Strike

This section will take a closer look at Counter-Strike and see how this game incorporates interactive communication. The essential aspect of this collaboration is how it is used for establishing dynamic combat-tactics. The first part of the section will outline what constitutes a good team performance, with reference to actual human players. This is followed by a description of what constitutes a player in general and what kind of communication he uses to inform team-mates. The tactical (player- and team-based) element will also be covered, giving an insight into which objectives the two opposing teams (the Terrorist and Counter-Terrorist team) have. For this, the offensive and defensive elements of the team will be covered in the last part of this section.

According to Denby and Psycrotic, Counter-Strike is a team player game. The team benefits from every individual player being aware of what is going on in the game. Even though individual performance contributes to the team's winning chances, the focus should be on teamwork and collaboration. There are some central parameters that have to be fulfilled before a team performance is good:

• Complementing each other (one could be good at long-distance combat, where others are good at close combat)
• Same goal (everybody in the group has to know the goal, and know the path to fulfill it)
• Communication and information (a player who is not well informed is not a team player anymore)

In an article written by the journalist Denby from the gaming magazine PC GAMER, an interview with a professional gamer is conducted. He provides some tips regarding what constitutes a good gamer. In the following quote he puts emphasis on the need for communication in Counter-Strike, and how fast it must occur between the players.

"As with all team-based games, but perhaps even more so with Counter-Strike, it's important to be in good contact with your team-mates throughout a match. A lack of communication can be the difference between a decisive victory and an embarrassing, crushing defeat, so talking to each other is tremendously important. But simply maintaining contact is not enough: it's imperative to be efficient with your communications. "It's best to keep your calls about what's happening short and quick, and explain everything you know, such as how many enemies you see, if you see the bomb carrier, and what weapons they have," says Elliot, the professional gamer.
And be sure to get a hold of a voice chat program such as Ventrilo (Ventrilo) or Mumble (Mumble) to utilize during practice: they allow you to speak to your team-mates whether you're dead or alive, an advantage not afforded by Counter-Strike's in-game communication system." (Denby)

In the following sections, communication between players is described: how tactics are developed and how the players perform the task of communicating tactics.

2.3.1 Communication

In Counter-Strike there are some limitations on how and when it is possible to communicate. These limitations are made to prevent cheating, such as Ghosting, which is the act of communicating with a team-mate while the character is dead. These limitations are bypassed by using an external communication application such as TeamSpeak (TeamSpeak) or Ventrilo (Ventrilo), which makes it possible for a group to talk to each other asynchronously with the game. This kind of communication is not allowed in official clan wars, since it is considered cheating, but in practice clan wars it is allowed, since it can help tweak tactics throughout the game.

2.3.2 Radio Communication

According to Bosskey, the radio commands implemented in Counter-Strike are of very limited use. Most players ignore them because they are typically too imprecise, but used perfectly they can be almost as good as voice communication. Gamers therefore generally prefer voice communication.

2.3.3 Tactics

The term "tactics" is very broad in relation to Counter-Strike, since tactics are split into two levels, individual tactics and team tactics, which ideally should complement each other. The two levels can be split into many elements, but since this project focuses on communication between bots to achieve better combat-tactics, the only elements analyzed in this chapter are those regarding communication; the remaining elements can be found in Appendix 10 - 1.

2.3.4 Team tactics

As described in 2.2.2, there are two kinds of teams: a Terrorist team and a Counter-Terrorist team (Wolfz0r). The two teams have different approaches in relation to weapon preference and goal, as well as whether they start defensively or offensively in the chosen game mode. The Terrorists' offensive game mode goal is to plant a bomb at one of two bomb sites. Once the bomb is planted, the Terrorists' objective is to protect it from being defused by the Counter-Terrorists. In the opposing game mode, their defensive goal is to prevent the Counter-Terrorists from rescuing the hostages. The Counter-Terrorists' offensive game mode goal is to rescue the hostages and bring them back to the safe zone, usually positioned at the same place as their spawn point. Their defensive game mode goal is to protect the bomb sites so that the Terrorists do not succeed in planting the bomb. If the bomb has been planted, the Counter-Terrorists' game mode goal becomes an offensive one, where they are to defuse the bomb. Despite the fact that the Counter-Terrorists and Terrorists have opposing goals in the different game modes, the tactical elements (Appendix 10 - 1) are generally the same. If none of the goals described above are met, a round is also won by eliminating all opponents.

2.3.5 Defensive Elements

According to Wolfz0r, when playing as a defensive team you are waiting for the enemy to show themselves, and try to kill them before they plant the bomb or rescue the hostages.
The defensive team can benefit from a major advantage by the use of strategic positions. Positioning is crucial both to the individual player and to the team. If the positioning of all players on a team is known by everyone involved, it is possible to know where a team-mate died or engaged the enemy. Positioning can also be used to ambush the enemy from an unexpected direction, or to gain a large overview of the map, which makes it possible to share the enemies' positions with team-mates. Part of the positioning element is an element called rotation(1). Rotation is a tactical element that helps determine how a team reacts to specific situations. If something happens in the game, such as a player spotting an enemy, getting shot or shooting an enemy, the team must change tactics immediately and reposition to adjust to the new information. According to Wolfz0r and Psycrotic, the weapons of the players on each team are also important in relation to positioning and to the opposing team's choice of weapons. A main rule used in almost any tactic is to use blinding or obscuring grenades such as flashbangs(2) or smoke grenades(3), which give the team a few seconds to rotate or get into position. The choice of weapon for each player on a team is based on the position, the player's skills, and the team-mates' and enemies' weapons. If a player's role is to surprise an enemy, it is preferable to use a close-combat weapon such as a machine gun or shotgun. If, on the other hand, a player's task in a given round is to cover a great distance, a sniper rifle might be preferred. However, as mentioned earlier, team-mates need to complement each other, so having only snipers or only machine gunners on a team is not preferable, and some members must therefore use a weapon that might not be their first choice in relation to their position and their experience with that weapon.

1. Rotation is a tactical defensive element that determines the response a player has to make when an action has happened; e.g., if the bomb is planted at bombsite B, some should cover and others should rush to bombsite B.
2. A flashbang is a grenade that creates a sudden, short, very sharp flash, which blinds any player looking at it when it explodes for a number of seconds.
3. A smoke grenade fills a certain area with smoke, covering it from view.

2.3.6 Offensive Elements
As stated by TheFeniX, when playing as an offensive team it is important to move forward as fast as possible in order to reach the goal before the defenders have gotten into position. The tactics are therefore usually based on a rush mentality. There are, however, many methods of reaching the goal, and many parameters have to be considered before rushing. The first thing to take into consideration is where the team-mates spawn, because it determines the players' route; the team is always interested in getting to the goal before the enemy. This means that the offensive team has to use the buy time to discuss tactics. The different methods are performed to add an element of unpredictability to the tactics. One of these is the flanking element, a method that makes use of the enemy's expectations. The method usually involves sacrificing a team-mate who makes a lot of noise and draws the attention of the enemy to a certain position, while the rest of the team sneaks in another direction.
Another very well-known rush method is the team rush, where the entire team takes the same path and rushes as fast as possible towards the goal. In general there are many different ways to rush, and each tactic has its (dis)advantages (TheFeniX). Because the offensive team has to reach the goal and stay in movement, it can be difficult to keep track of the team's positioning, and if a player dies, he cannot tell his team-mates what happened, and the team-mates' knowledge of the dead player's position prior to his demise is most likely limited at best. It is therefore beneficial to make use of the buddy system, where a player always has at least one buddy following, so that no one operates alone (TheFeniX). This makes it possible to have the back covered at all times, which can be considered a crucial element of survival. No matter how large a group the player is in, it is important to make sure someone is covering the back of the group (TheFeniX). As stated by TheFeniX and Kaizen [b], the weapons of the offensive team are mainly close-combat weapons, since they offer fast movement speed as well as a quick reload time. One tactic that is important for the offensive team in relation to weapons is that all players should buy flashbangs, which can be used to gain a couple of seconds either before or after the enemy team has gotten into its intended positions.

2.3.7 Interim Summary
Through this analysis of collaboration in Counter-Strike, the elements underlined by professional Counter-Strike players as significant tactical considerations have been listed and described. Before any requirements can be established for the project's approach to tactical collaboration between AI agents, research must be done into how state-of-the-art Counter-Strike bots handle the tactical elements described above.

2.4 Bots in Counter-Strike
This project is interested in the further development of a bot system: taking a valid bot that is as close to the state of the art as possible and implementing additional features that help it become more "human-like". This section goes through key features of state-of-the-art bots in Counter-Strike: first a brief part regarding the more common features in bots, followed by a look at two different types of bots and what makes them relevant for this project. As already mentioned briefly in the introduction, bots are non-player characters (NPCs) that can be used for a variety of reasons. They were originally developed to allow people without a network connection to play certain games that would otherwise only be possible in multi-player. The purpose of bots is to simulate human players and incorporate most human behavioral patterns.

2.4.1 Static vs Dynamic Bots
The method by which bots simulate human-like behaviour can be categorized as either static or dynamic (Randar). Static bots are usually made by defining all possible scenarios that might occur in a game and hardcoding the behaviour of the bot for each specific situation. This method requires the programmers to write a heavy amount of code and also proves rather flawed: if the programmers, for instance, have failed to predict a given situation, the bots might not be able to progress from this state, or at the very least their behaviour is affected in a way that makes it quite obvious to the player that these are in fact bots and not real people playing.
Dynamic bots, on the other hand, are often developed using "Machine Learning" (2.1.2), a method for creating an artificial intelligence that is able to learn from the data presented to it. Programmers developing dynamic bots can achieve their goal with much less code than is needed for static bots, because they only define which parameters the bots need to take in, in order to succeed. When this is done, the bots need to learn, which is typically achieved by trial and error. The dynamic bots therefore need to run a large number of generations, process the data gathered from each generation, and evaluate whether or not the last generation was beneficial, in order to figure out which routes to walk or which methods to employ during the next generation. The learning process of dynamic bots can also be very time-consuming, depending on how many generations they are supposed to learn from, but if done correctly they are able to run any number of sample generations by themselves, so the programmer still does not need to spend as much time actively developing them compared with static bots. In most cases, the static/dynamic distinction shows itself in how the bots are configured to learn a given map for navigation: static bots have hardcoded waypoints that they follow, whereas common dynamic bots can be placed in an arbitrary map, analyse it and create their own waypoints.

2.4.2 Gathering Data
When trying to simulate human-like behaviour, the developers of a bot need to figure out which features are necessary to replicate in order to create a bot that fulfills the requirement of behaving like human players would in the game. These features obviously depend on the target game as well as the goals within it. Bots in real-time strategy games typically consist of very large decision trees, because there is so much data to take in: information regarding resources and the cost of buildings/units, path-finding algorithms, the order of actions performed, and much more. Bots in FPS games may also deal with a large amount of information, but it is usually more limited due to how they are expected to interact within the game. Features for FPS bots usually consist of information regarding movement, direction and jump state. These three features (Geisler [2002]) are not enough in themselves, however, and vary between FPS bots. Typically, FPS bots also require some information about the environment, which is often provided by something called a "Navigation Mesh" (or simply "Nav Mesh"). A navigation mesh is used to aid the bots' pathfinding algorithms and thereby enable them to find their way around a given map. Navigation meshes also hold information about obstacles and walkable planes in order to avoid scenarios where bots keep running into a wall, for instance, or other impassable ways. In Counter-Strike, this data is gathered in ".NAV" files. Most bots will typically analyse the environment close to them and create waypoints for the walkable terrain.
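The waypoint/Nav Mesh data described above can be thought of as a graph of walkable positions. The sketch below is a minimal illustration of such a graph and a simple breadth-first path search; the `Waypoint` structure and field names are illustrative assumptions made for this report, not the format of the actual .NAV files or of any of the bots analyzed below.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Waypoint:
    """One walkable node in a simplified navigation graph (illustrative only)."""
    wp_id: int
    position: tuple                                  # (x, y, z) world coordinates
    neighbours: list = field(default_factory=list)   # ids of directly reachable waypoints
    flags: set = field(default_factory=set)          # e.g. {"camp", "ladder", "goal"}

def find_path(waypoints: dict, start_id: int, goal_id: int):
    """Breadth-first search over the waypoint graph; returns a list of waypoint ids."""
    frontier = deque([[start_id]])
    visited = {start_id}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal_id:
            return path
        for nxt in waypoints[path[-1]].neighbours:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # goal not reachable from the start waypoint

# Tiny example map: 0 -> 1 -> 2, with a camping flag on waypoint 1
wps = {
    0: Waypoint(0, (0, 0, 0), [1]),
    1: Waypoint(1, (128, 0, 0), [0, 2], {"camp"}),
    2: Waypoint(2, (256, 0, 0), [1], {"goal"}),
}
print(find_path(wps, 0, 2))   # [0, 1, 2]
```

More elaborate bots attach flags (camping spots, ladders, goals) and costs to such connections, as the following sections show.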
The following sections contain a more in-depth analysis of features from three state-of-the-art bots for Counter-Strike, namely: RealBot (Hendricks [2007]), TeamBot (Stewart) and PODbot (Markus).

2.4.3 RealBot
As stated by Hendricks [2007], the RealBot has been developed by Stefan Hendricks over many years, through many different versions, mostly for the better. The goal is a bot that is easy to implement and has a very human-like nature. Instead of a single bot that always reacts the same way, Hendricks has implemented personalities into a large number of different bots, making them vary in behavior. A large number of parameters define the behavior of each bot.

2.4.3.1 Choice of weapon
The bot mainly works with two kinds of weapons, a primary and a secondary weapon. The primary choice can be any weapon in the game, while the secondary only consists of handguns. This means that if the bot has enough funds to buy its primary weapon, it will always buy that weapon; if it does not have the funds for it, it will buy a handgun.

2.4.3.2 Choice of miscellaneous
According to the readme file for the RealBot (Strahan [2004]), the bot uses probability to determine the use of miscellaneous items such as HE grenades, flashbangs, smoke grenades and armor, or whether it should save funds for the next round. This is done with static numbers implemented by Hendricks in each of the bot personalities. This feature can make the bot appear more human-like, since it seems to make different decisions each round, but at bottom it is just a matter of a percentage.

2.4.3.3 Skill of Bot
One parameter used to determine the difficulty of the bot is the skill setting: the higher the number, the worse the bot. Furthermore, it is also possible to change the offset of the bot's aim, which determines how precise the bot is; the smaller the offset, the more precise the aim, and vice versa. A last parameter that determines the bot's difficulty is the reaction time. If the bot has a reaction time of 0, it will be less human-like (Strahan [2004]), since it will react as soon as it sees an enemy, while a human player has a reaction time of about 0.15 to 0.3 seconds.

2.4.3.4 Goal decision
The bot has a set of parameters that decide whether its goal is to hunt down enemies or to try to accomplish the objectives. These are split into the goals described earlier in the tactics chapter, and each goal is also determined by probabilities. Depending on the probability of each goal, the bot will decide to plant the bomb, pick up a dropped bomb, rescue the hostages, or simply do something random.

2.4.3.5 Communication
Another set of parameters added to the personality concerns communication, which yet again is determined by probability. The bot is set to react to radio messages, but the probability determines whether or not it replies to a radio message sent to it. It can also create radio messages of its own, for instance communicating where in the map actions are taking place, or conveying a tactic to the humans cooperating with it.

2.4.3.6 Sympathy
A tactical parameter for the bot is sympathy, which determines whether it should help another team-mate or just stick to its own tactic. This is again determined by probability and only applies to combat situations.

2.4.3.7 Physique and strategy behavior
The last parameters the bot uses are the physique and strategy parameters, which make it human-like in its reaction patterns. The parameters under this set of functions are the bot's turn speed, how often it runs with the knife out, its hearing range, the probability of escaping instead of going into battle, the probability of chatting, and the probability of camping.
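Since most of the RealBot personality traits above come down to fixed percentages and offsets, they lend themselves to a very small sketch. The parameter names and values below are illustrative assumptions made for this report, not the actual fields of the RealBot configuration files.

```python
import random
from dataclasses import dataclass

@dataclass
class Personality:
    """Illustrative per-bot personality; the values are made-up examples."""
    buy_flashbang_prob: float = 0.6     # chance of buying a flashbang each round
    save_money_prob: float = 0.2        # chance of saving funds for the next round
    reply_radio_prob: float = 0.7       # chance of answering a radio message
    reaction_time: tuple = (0.15, 0.3)  # seconds, mimicking human reaction delay
    aim_offset: float = 2.0             # degrees of aim error; lower = more precise

def decide_purchases(p: Personality, funds: int, primary_cost: int) -> list:
    """Probability-driven buy decisions, in the spirit of RealBot's static percentages."""
    items = []
    if random.random() < p.save_money_prob:
        return items                      # save funds for the next round
    if funds >= primary_cost:
        items.append("primary_weapon")    # always buy the primary if affordable
    else:
        items.append("handgun")           # otherwise fall back to a secondary
    if random.random() < p.buy_flashbang_prob:
        items.append("flashbang")
    return items

def reaction_delay(p: Personality) -> float:
    """Random human-like delay before reacting to a spotted enemy."""
    return random.uniform(*p.reaction_time)

# Example: a bot with 2500 in funds and a 3100-cost rifle buys a handgun instead
print(decide_purchases(Personality(), funds=2500, primary_cost=3100))
```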
Fig. 7: Editor of the TeamBot

2.4.4 TeamBot
The TeamBot (Stewart) is another type of bot, developed by Alistair Stewart with a focus on setting up more advanced strategies for the bots. What makes the TeamBot interesting is mainly an editor (Figure 7) that allows the host to specify what type of bots should join the game. The editor contains a number of premade bots. They all share most of their behaviours, but the editor allows custom settings for each individual bot. The purpose of this feature is an attempt to add personality to each bot. The personality was to be developed further so each bot could be given a specific task within the game. These tasks would allow the host to create a team of bots that each have their own task yet still collaborate, whereas the host settings for most other bots only allow changing the settings for the entire team at a time. The tactical features, however, have not yet been implemented, and the developer has stopped working on the project. This means that the TeamBot provides a good framework for further developing tactical features, depending on where Alistair Stewart left off.

2.4.4.1 Choice of Weapon
The editor also provides the option of changing the probability of each bot's weapon choice. The choice of weapons affects tactics a great deal. For instance, in a scenario where the bots are taking a path where opponents can be met in close combat, a sniper rifle would be a poor choice of weapon. However, if the bot is supposed to guard a bombsite, a sniper rifle might be a fitting choice.

2.4.4.2 Communication
TeamBot uses the standard radio commands from Counter-Strike, as described in section 2.3.2. For a player on a team with bots, the radio commands can be a good and easy way of instructing team-mates to perform certain tasks. The TeamBots also use the radio commands themselves, so if they for instance are about to plant the bomb, they will most likely call out the radio command Cover Me, so that the team-mates will be covering while the bomb is being planted. The TeamBot uses something defined as an obedience level. This can be set for the specific personality of each bot, and if it is not set to 100, there is a chance that the bot will not follow the order given.

2.4.4.3 Wayzones
TeamBot also differs from other bots in the way it moves around, due to something called "Wayzones", which are created as an extension to the already existing concept of waypoints. Wayzones give the bots the opportunity to navigate to a point near the waypoint, which is said to give a smoother and more unpredictable result, and thereby movement more like that of human players.

2.4.5 PODbot
Another bot that has been acknowledged for its human-like behavior is the PODbot, an open-source Metamod (4.1.3) bot add-on for Counter-Strike 1.6 (and to some extent CS 1.5) and Condition Zero (Markus). The bot was coded and developed by Markus Klingen under the alias Count Floyd, who was hired by the game company Gearbox Software to work on single-player agents. His main focus was to improve the bots' ability to communicate and perform traditional teamplay. Like most developed bots, the PODbot uses waypoints for its navigation and exchanges commands with its team-mates. In the game, the bot is pre-programmed to target the human players first, before focusing on other bots.
2.4.5.1 Three Modes
The PODbot has three different modes it can alternate between: Normal, Aggressive and Defensive. These modes determine the preferred choice of weapon, as normal and aggressive bots are usually programmed to carry assault rifles, whereas defensive bots carry sniper rifles.

2.4.5.2 Skill Level
The skill level of the PODbot can be set to easy, normal, hard or expert. This makes it adjustable to the human player and his/her preference.

2.4.5.3 Communication Between Bots
The bots can communicate a range of different radio commands to each other, and can also communicate via text messages. These commands are communicated with the purpose of changing the behavior of team-mate bots when navigating in groups.

2.4.5.4 Obedience
Since the bots base their policies on multiple factors, including individual factors such as health, carried weapon (or lack thereof) and condition (e.g. currently fighting an enemy), they will not always respond to commands given by a team-mate. The PODbot is not always willing to obey commands from team-mates, as it has its own policies, and can thus choose to ignore the information. Before submitting to a command, the bot can make a probabilistic assessment of the likelihood that it will actually be a good idea to change behavior, given the situation it is in. If it is in a state of combat, for instance, the combat state will have a higher priority than any other behavior, because abandoning it would oppose the bot's own purpose of trying to remain alive.

2.4.5.5 Goal Decision
The bots automatically know their overall goals in the game. The Terrorists have a common goal of planting the bomb, whereas the Counter-Terrorists are programmed to defuse the bomb and rescue hostages. VIP bots will attempt to reach the rescue point. The objectives of the individual bots are employed dynamically and depend mainly on factors that differ between the bots, such as personality, health, team-mates nearby and items the bot is carrying. The bots take these factors into account to assess the dynamically changing circumstances; if a team-mate needs support in a combat situation, the bot will evaluate its own position, health, weapon, ammo, etc. to estimate whether there is a probable chance of success if it interferes.

2.4.5.6 Waypoints
The PODbot uses waypoints to mark where the bots can go. There are several types of waypoints, indicating map goals, rescue zones, good camping spots, ladders, etc. The connections between the waypoints are what indicate paths, and these are calculated by applying a pathfinding algorithm. Depending on its state (normal, defensive or aggressive), the PODbot will apply different pathfinding algorithms that either strive towards finding the fastest path or the path that seems most secure, depending on the bot modes (2.4.5.1). The connections between waypoints can be one-way or two-way connections, or jump connections, which enable the bots to jump from one waypoint to another. The radius of a waypoint indicates how strictly the bots should navigate according to the point: a small radius indicates that the bot should follow the path firmly, whereas a wider radius indicates that the bot is less restricted to a direct path. This makes it well suited for mimicking human behavior in an open space versus navigating through a narrow passage.
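The waypoint-radius behaviour described above can be illustrated with a minimal sketch, assuming a simple 2D world representation (the function and values are this report's own illustration, not PODbot code): the bot steers towards a random point within the waypoint's radius, so a small radius gives strict corridor-following while a large radius gives looser, more human-looking movement in open areas.

```python
import math
import random

def steer_target(waypoint_pos, radius):
    """Pick a navigation target within `radius` of a waypoint (2D, illustrative).

    A small radius keeps the bot close to the exact path (narrow passages),
    while a large radius lets it wander more naturally in open areas.
    """
    angle = random.uniform(0, 2 * math.pi)
    dist = random.uniform(0, radius)
    x, y = waypoint_pos
    return (x + dist * math.cos(angle), y + dist * math.sin(angle))

# Example: strict navigation through a doorway vs. loose movement in a courtyard
door_target = steer_target((120.0, 40.0), radius=5.0)
open_target = steer_target((300.0, 200.0), radius=60.0)
print(door_target, open_target)
```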
2.4.5.7 Waypoint Flags
Waypoint flags are used to assign behavior to different waypoints. This could for instance be a waypoint flag indicating that at that particular waypoint the bot should climb (for instance waypoints in front of a ladder). Other waypoint flags could indicate that the bot should duck, or should point its weapon in a specific direction where it knows the threat is high, etc.

2.4.5.8 Limitations
Despite being generally accepted as a close-to-human-like bot, the PODbot has received critique in a range of different aspects of the game. The bots are said to tend to stick to other bots, to be unaffected by lack of illumination, and to be easy to snipe. Lastly, the PODbot is also claimed to lack advanced teamplay, as it only applies a restricted set of commands and only shares experience about where damage was received.

2.4.6 Interim Summary
The RealBot and the TeamBot described above share many features. One of them is that neither was ever completely finished, yet both function quite well. The designers and developers of both bots have published to-do lists, which can be seen in Appendix 10-2. The PODbot cannot be said to be finished either; it is the product of numerous developers adding features and functionality to it over many years. It is, however, very well functioning, and it performs quite well compared to the others when focusing on the human-like aspect. The bots in this chapter are not designed to collaborate, but according to both designers, more collaboration and communication between the bots would prove very beneficial.

2.5 Testing "human-like" Behavior
This section introduces different approaches to estimating how "human-like" a computer program can be considered. With an initial emphasis on the Turing Test (2.5.1), whose procedure is applied to testing a wide variety of computer systems, the section then dives further into testing "human-like" bot behavior in FPS game environments. This is followed by a description of a contest called the BotPrize Competition, which measures "human-like" behavior in the FPS game Unreal Tournament 2004. To be clear, the testing of humanness does not concern whether the program replicates the way humans think, but rather how humans behave and act.

2.5.1 Standard Turing Test
When a computer system strives towards replicating human behavior, its similarity in behavior is normally measured by a Turing test, a test proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence" (Tur [1950]). The test's basic concept is to have a panel of test subjects each interact with both a human and subsequently a computer. In the original Turing test, called the "Original Imitation Game" (Tur [1950]), the interrogator interacts with a human and a computer through a textual conversation. With no prior knowledge as to which of the two entities is human, the judge makes that decision based on the conduct of the test. The harder it is to distinguish between the two entities, the closer the computer's behavior is to that of a human.

2.5.2 Turing test in Virtual Game Environments
In game-like settings, alternative versions of the Turing test have been applied: one called the "player believability test" (Tog [2012]) and another called the "Gestural Turing test" (Ventrella).
These tests differ from the standard Turing test in that the interrogator interacts with the computer/human player by moving around in an environment and assessing the opponent's physical behavior and tactical tendencies, instead of merely interacting through textual questions and answers. The Gestural Turing test emphasizes the believability of body gestures and motion (which is not the focus of this project), whereas the player believability test concerns the believability of tactical behavior. The fundamental concept of trying to fool an expert into believing that a computer-controlled program is controlled by a human is consistent throughout both the player believability test and the Turing test.

2.5.3 The BotPrize Competition
In recent years of development of human-like FPS game bots, a contest called the BotPrize competition has evolved into a well-respected competition amongst AI game bot developers from around the world. The game used for the competition is based on a modified version of the FPS game Unreal Tournament 2004, in the Deathmatch game mode. The game allows three players in the map: a judge, a human player, and a computer-controlled bot. In a Deathmatch game, all three players play against each other on a map and respawn randomly around the map every time they are killed. The AI bots are tested by a group of experts who judge which of the two other players is played by the human and which is a bot. The human quality of a bot is measured by dividing the number of times a player (whether bot or human) is judged to be human by the total number of times the player is judged. If the results reveal that the judges believed a bot was human 50 percent or more of the time, the bot is granted the top reward. The chat function, which is available in most online FPS games and widely used by human players, is disabled in the test game to avoid the bias of linguistic interference.

2.5.4 Interim Summary
To deduce that a program possesses "human-like" behavior, this section has given insight into the relatively simple approach used to test it: the Turing Test (2.5.1) uses human judges to interact with first a human and subsequently a computer. Afterwards, the judge chooses which of the two he/she thought was human. If the human judges answer that the computer was human around 50 percent of the time or more, it is said to be as human-like as possible.
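As a concrete illustration of the humanness measure used in the BotPrize competition, the ratio can be computed as follows (a minimal sketch; only the ratio itself and the 50 percent threshold are taken from the description above).

```python
def humanness_ratio(times_judged_human: int, total_judgements: int) -> float:
    """Fraction of judgements in which a player (bot or human) was deemed human."""
    if total_judgements == 0:
        raise ValueError("player has not been judged yet")
    return times_judged_human / total_judgements

# A bot judged human in 7 of 12 judgements scores ~0.58,
# which would clear the 50 percent threshold for the top reward.
print(humanness_ratio(7, 12) >= 0.5)   # True
```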
2.6 Requirements
Until now, research and analysis have addressed the following questions: What is machine learning and how can it be used in FPS games? Which game is accessible and suitable for developing a human-like bot? How do human players collaborate and use tactics? Which kinds of bots does CS use? And how is the degree of "human-like" behavior tested? This section lists the requirements for which features the bot should possess to be "human-like", including requirements regarding individual and team-based tactics.

2.6.1 The Bot Should Have a higher purpose/goal to evaluate actions and experiences from (2.3.4)
When the bots analyze their gathered data, they will need a purpose in order to evaluate their findings and experiences.

2.6.2 The Bot Should be Responsive to and Communicate Tactical Decisions
This requirement is the cornerstone of the solution to the project's final problem statement. The bots will need to be able to communicate their analysis of their data to their team-mates, and be receptive to their team-mates' information, which they will then analyze and compare with their own in order to make the best possible decision.

2.6.3 The Bot Should Be able to Navigate in any Map
One of the imperative points of a human-like play style is the ability to always be able to adapt. Although many bots are made to understand and navigate a map by waypoints inserted at the development stage, some are made to dynamically analyze any given map and assess fast routes, positions with cover, and several other factors. The ability to dynamically navigate through any given map is important, especially since the way a player looks at certain areas of a map may change according to the opponents' strategy or other events that play out during the rounds of a game.

2.6.4 The Bot Should Be able to Aim and Shoot
The ability to aim and shoot, not only properly but also in a human-like fashion, is one of the most obvious requirements for a bot such as the one this project has set out to create, seeing as this is often a point where it is easy to determine that a bot is a non-human entity, for instance when a bot can do a 180-degree turn and shoot a player in the head within a fraction of a second. This will naturally have to be implemented in such a way that the aiming and shooting have characteristics similar to those of a human player performing the same actions.

2.6.5 The Bot Should Choose Weapons in Relation to its Main Objective
The choice of weapon has to be decided based on the individual bot's objective. If the bot's objective is to camp at a certain point, where precision matters more than the speed of the weapon, it would for example be preferable to use a sniper rifle, and vice versa.

2.6.6 The Bot Should Navigate in Relation to its Main Objective
The position of the bot has to be decided based on the main objective, such as a defensive objective (2.3.5) at one of the bomb sites, or a position in a certain offensive tactic where the main objective is to draw the enemies' focus.

2.6.7 The Bot Should use Rotation in relation to its Main Objective
The rotation(6) of the individual bots depends on the experience of the team-mates. The objective is mainly to cover as much of the map as possible, and rotation will therefore be decided on the basis of the player objective and the team objective.

6. Rotation is a tactical defensive element that determines the response a player has to make when an action has happened.

2.6.8 The Bot Should Have Both a Defensive and Offensive Strategy at any given Point
Throughout each round, each bot has to be ready to use both an offensive (2.3.6) and a defensive (2.3.5) tactic, since the situation can change within a second. On a bomb map, the tactical stance changes when the bomb is planted: the Terrorists are in an offensive stance until the bomb is planted, after which they have to defend the bomb area to make sure the Counter-Terrorists do not defuse the bomb. The Counter-Terrorists start in a defensive stance until the bomb is planted, after which their main objective becomes defusing it.

2.6.9 The Bot Should Learn from its Experiences
The bot should have the ability to adjust its behavior at run-time by receiving feedback from its actions (2.1.4).
By using this constant feedback to learn which behaviors are likely to be successful and which are not, the bot will behave much more similarly to a human.

2.7 Summary
With an emphasis on the Final Problem Statement, "Will interactive communication between FPS game bots, in order to establish dynamic combat-tactics, increase the development towards a completely human-like bot?", the analysis has investigated the essential areas regarding this question. To sum up, this has entailed:

• deliberating which game platform the human-like bot should be developed on (2.2)
• investigating which aspects of interactive communication and tactics are applied by professional human players (2.3)
• examining Machine Learning and analyzing which methods are recognized for enabling bots to learn dynamically (2.1)
• investigating which valid SOTA "human-like" bot systems are accessible for further development (2.4)
• exploring how human behavior in an FPS bot can be tested (2.5)

The research has helped narrow down a list of requirements (2.6) that will be incorporated in the conceptual design, which is the content of the next chapters.

3 Design
Derived from the analysis is the set of requirements (2.6), listing the necessities an absolute solution to the final problem statement must contain. The different methods capable of fulfilling these requirements will be introduced and explained, after which different constellations of the methods will be presented as conceptual design proposals, and a final design will be chosen to build the project from.

3.1 Objectives Tree
Seeing as every requirement has a countless number of methods that could potentially fulfill it, and seeing as a finished design is complex, with numerous methods affecting and/or observing each other, a hierarchical structure will be created before all of these methods are listed and evaluated as whole designs, in order to maintain an overview of the relationships between the different considerations of the bot. In figure 8 such a hierarchical structure has been created, dividing the bot's methods into four different types of objectives: the Main, Sub, Task and Action Objectives. This structure will help sort the different methods into a hierarchical order. Although it is not known at this point which methods will relate to which, it can safely be presumed that the structure will take this form, seeing as any analysis the bot makes of its actions and experiences will need a point of reference, i.e. a goal (objective) of higher importance, to check whether success criteria are met or not.

Fig. 8: The objectives tree is a depiction of the hierarchical structure from which the conceptually designed bot would base its decisions.

3.2 Concept Design
With this structure as the foundation, the next step is to describe and explain the different methods capable of fulfilling the set of requirements (2.6).

3.2.1 Higher Purpose / Goal
To make reasoned deliberations about which performances are beneficial for reaching the ultimate goal of winning the game, it is not sufficient to just keep track of kills and won games. An agent might register that it died within a short amount of time without killing any opponents, but that does not necessarily mean the agent should receive purely negative feedback for its actions, as it might have contributed to the overall team effort, for instance by distracting the opponents.
So, inferring relations between behavior and reward requires the agent to take a larger number of parameters into account. These parameters are what eventually constitute what is considered a successful round. The parameters can be denoted success criteria and are established to enable an assessment of each round. As drawn from the analysis, the higher purpose of the individual agent could be to keep track of the number of kills it makes, as well as the outcome of the game, but there are more criteria to consider. To achieve a more human-like behavior, the agent must learn a large variation of movement patterns and general behaviors that meet these success criteria. The success criteria that work as motivation factors for the FPS agents could be divided into two categories, individual and team-based, respectively:

Individual assessment criteria
• Number of kills
• Health after round
• Ammo left
• Survived/Died
• Amount of time it was defensive/offensive

Team-based assessment criteria
• Outcome of the game (win/lose)
• Number of surviving team-mates
• Number of kills
• Amount of time the team was defensive/offensive
• Number of different team tactics performed in each round
• Number of different team tactics performed in total

3.2.2 Learn Through Experience
The FPS agent should have a range of different behaviors it could choose to perform: attacking, standing still, hiding, exploring, patrolling, and so on. When the agent learns from experience, it learns to make more informed estimations of which of these behaviors fit certain situations.

3.2.2.1 Elements of combat behavior
With different machine learning approaches, FPS bots have been developed that move dynamically in an environment, have obtained the ability to make decisions about where to move, and have the ability to think tactically, both in terms of individual and team-based aspects.

3.2.2.2 Connection between Actions and Rewards
But how does the agent actually learn from its experiences? In order to do so, it needs to make estimates (measure probabilities) of a connection between different actions or decisions and the reward given. In other words, how does the agent identify what caused the round to go the way it did? The goal is to use some method that can estimate what actually caused these rewards after the game. Since there are a lot of different actions that could, hypothetically, be part of the overall reward, it is not plausible to infer a fully reliable correlation between action and reward, since it is a large number of actions and decisions that constitute the reward in the long run. A function that applies this policy of probability distribution is the Fitness Function described in 2.1.5.7. Probability estimations are beneficial, as they predict a probability for an action to give good results, enabling a new experience to change this estimation in case the actions performed gave negative feedback.
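To make the success criteria from 3.2.1 operational in such probability/fitness estimations, they could be folded into a single round score in the spirit of the Fitness Function mentioned above. The sketch below is a minimal illustration; the field names and weights are assumptions chosen for readability, not a prescribed formula.

```python
from dataclasses import dataclass

@dataclass
class RoundStats:
    """Per-round observations for one agent (illustrative fields)."""
    kills: int
    survived: bool
    health_left: int          # 0-100 at round end
    round_won: bool
    teammates_alive: int
    tactic_switches: int      # team tactics performed during the round

def round_fitness(s: RoundStats) -> float:
    """Weighted sum of individual and team-based success criteria.

    The weights are illustrative assumptions; in practice they would be tuned
    (or learned) so that winning the round dominates individual statistics.
    """
    individual = 1.0 * s.kills + 0.01 * s.health_left + (0.5 if s.survived else 0.0)
    team = (3.0 if s.round_won else -1.0) + 0.5 * s.teammates_alive
    # Many tactic switches in a won round suggest the team had to re-plan often.
    adaptability_penalty = 0.2 * s.tactic_switches if s.round_won else 0.0
    return individual + team - adaptability_penalty

score = round_fitness(RoundStats(kills=2, survived=True, health_left=40,
                                 round_won=True, teammates_alive=3,
                                 tactic_switches=1))
print(score)
```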
3.2.2.3 Curiosity Element
Human players tend to vary their tactics instead of deploying the same strategy throughout the game merely because it was successful. With the aim of a bot that is as human-like as possible, a curiosity element will be set as a requirement for the agents in this project. For the bot to acquire thorough experience with different tactical decisions, it must keep trying new routes, weapons, movements, communicated messages, etc. This will continuously help the agent find better solutions, and will prevent the opposing team from adapting to an otherwise limited behavior.

3.2.2.4 Making priorities in what to learn
Since it does not make sense to keep repeating patterns that are inclined to result in negative feedback, the agent should, after adequate experience, also learn to prioritize which areas are estimated to be more relevant to learn from. For instance, it would not make sense to make local adjustments in behavior (such as navigating a little differently after every round) if the bot still keeps running into a room through the same door while an enemy is waiting there to kill it every time. In situations like this, "making priorities in what to learn" would mean stopping the small navigation adjustments and starting to prioritize a larger adjustment of behavior.

3.2.2.5 Applying Heatmaps, Waypoints and Waypoint Flags
In order to deduce better estimations of which elements of the bot's behavior cause certain outcomes, some approaches try to isolate actions and their rewards in terms of time and space. The environment could be equipped with a heatmap(7), which is a map that keeps track of where the bots are moving over a period of time. This could help identify whether there is a connection between where the bot is navigating and what its task objective is. Applying waypoints for this would also make the estimations more probable in terms of locating where a behavior (the waypoint flag(8) at that waypoint) has good or bad consequences.

7. A map of danger zones, typically shown in colour codes; the more dangerous a zone is, the more red it is shown.
8. A waypoint with additional determinators specifying what kind of waypoint it is, e.g. a camping spot, rushing spot or jumping spot.

3.2.2.6 Communication between Team-mate Bots
The agent should also experiment with different kinds of radio commands, to reveal whether there is a connection between what is conveyed among the team-mates and the outcome of the game.

3.2.2.7 Learning From Own Objectives (links to higher purpose)
Whether the individual bot died or survived a game, the number of kills it made, as well as its health and the amount of ammo left after the game, are important criteria for how well it did. The higher purposes that should be the motivation factors for the individual bots are introduced in 3.2.1. The agent should keep track of how long it is in a defensive versus offensive stance to reveal its degree of efficiency. If the agent spent a predominant amount of time attacking the opponent, it could indicate that it had chosen a bad position, had selected the wrong weapon for the situation, possessed poor aiming abilities, etc. The point is that the objective of the offensive stance is to kill the enemy, and if it takes too long, it must indicate that the tactical elements of the attack did not go as planned. On the other side, if a team is in a defensive stance, defending a bomb site over a longer period of time, the feedback the bots receive should be positive (depending on the overall purpose of winning the game, 3.2.1). This is because the objective of the defensive stance is not to get killed: the longer the player can remain alive in a defensive stance, the greater the reward.

3.2.2.8 Learning from Team-based Objectives
Despite the obvious goal of striving towards winning the game/completing the group task, the bot should also, for each round, keep track of the number of surviving team-mates, the total number of kills and the amount of time it managed to stay in a defensive mode. This is also described in 3.2.1. All these parameters will give good indications of how efficient the team was on a general basis.
After each round, the bots should also take into consideration the number of team tactics/strategies performed. If the team only deploys one strategy during the round, it could imply that the round was successful, unless the round was lost. So, given that the round is won, the number of team strategies performed gives a good indication of how successful the round was. The more strategies performed, the more frequently the team had to change a condition they had predicted to be successful. Thus, a large number of transitions between team strategies is inclined to be caused by poor predictions of the opponent's tactical decisions.

3.2.2.9 Reinforcement Learning
In the analysis, both Supervised Learning and Unsupervised Learning (2.1.3) were recognized as efficient at revealing hidden patterns in data. For an agent to learn behavior, Supervised Learning proved to be very good for replicating behaviors from human players, which provided the bot with fundamental behavioral patterns that "seem" human-like. However, with an emphasis on a human player who learns through experience, this would likely also entail learning how to adjust behavior based on what happens right now, in the round. To do this, the agent must employ Reinforcement Learning (2.1.3.3), which allows it to adapt its behavior at run-time based on trial-and-error interactions with the environment.
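A minimal sketch of what such run-time trial-and-error learning could look like is a tabular Q-learning agent with epsilon-greedy exploration, which also realizes the curiosity element from 3.2.2.3. The states, actions and reward below are placeholders, not the project's actual state representation.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.actions = actions          # e.g. ["attack", "hide", "patrol", "explore"]
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor for future reward
        self.epsilon = epsilon          # exploration rate ("curiosity")
        self.q = defaultdict(float)     # (state, action) -> estimated value

    def choose(self, state):
        if random.random() < self.epsilon:            # curiosity: try something new
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Example: the reward could be the round fitness sketched under 3.2.2.2
agent = QLearningAgent(["attack", "hide", "patrol", "explore"])
action = agent.choose(state="defending_bombsite_B")
agent.learn("defending_bombsite_B", action, reward=1.5, next_state="round_over")
```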
3.2.3 Tactical Decisions
In order to make tactically smart decisions and fulfil the aforementioned established purpose/goal (i.e. the Main Objective), a set of parameters needs to be chosen to determine which factors the bot should analyze when evaluating the best course of action. The following lists factors of interest in this regard and attempts to narrow them down to the most essential types of data to make tactical decisions from. All tactical factors of consideration will to a higher or lower degree influence the decisions of other tactical factors, and it is at this point considered redundant to tie the relations between the factors together until a conceptual design has been made in its entirety.

3.2.3.1 Enemies' (Current) Positions
The enemies' positions, whether observed by the bot itself or by its team-mates, can be used to determine the optimal route towards the bot's destination, whether it will try to avoid or engage the enemy.

3.2.3.2 Enemies' (Presumed) Positions
The enemies' presumed positions can be used by the bot to assess the likelihood of the enemy's presence in a specific area at a specific point in time. This can be set up by having the bot save waypoints at the position where an enemy was spotted for the first time in every new round. In the beginning, these 'first-time encounter' points may appear at relatively random places, but after a large number of rounds played, the bot will be able to draw borders of where to expect the enemy at the earliest in a round. Such a way of mapping out the enemy's presence can also be used to create heatmaps for the entire duration of a round, logging every position of an observed enemy at any given point in time during a round. After a big enough sample pool of data has been gathered, the bot will be able to guess, with statistical probability, the immediate position of the enemy at any point in time during a round.

3.2.3.3 Team-mates' Positions
The position of team-mates can be of grave importance to the bot's survival, the elimination of the enemy players, or the fulfilment of the main objective. The bot can be made aware of its team-mates' positions in numerous ways: the bots on a team could at a fixed interval communicate coordinates or zone/area-specific data about their locations to each other, they could make use of the radar feedback in the UI, giving a distance and angle estimation of team-mates' positions, or they could simply be limited to the apparent visual feedback (whether they have a clear line of sight to a team-mate or not).

3.2.3.4 Closing in on the Main Objective
Seeing as the Main Objective will in most instances be the utmost goal/purpose of the bot, it can be made so that the bot at all times has an estimation of whether it is working towards fulfilling the main objective, or whether its actions digress from its progress towards it. Such an estimation can be made by observing whether the distance towards the main objective is growing or shrinking with the bot's chosen route, or it can be assumed from whether the bot's survival probability is growing or shrinking as a result of its actions (i.e. the probability that the bot will be able to succeed in fulfilling the main objective).

3.2.3.5 Kill/Death Ratio
The efficiency of killing (also known as the Kill/Death ratio) can be used by the bot to evaluate the success rate of its actions; i.e. the greater the K/D ratio, the better its course of actions. On its own, such a factor can lead to very skewed results, seeing as many factors other than the bot's own choice of actions may have affected its K/D ratio, but if paired with other considerations, for instance enemy positions, team-mates' positions, etc., it can serve as a supplementary factor of evaluation.

3.2.3.6 Chance of Survival
As briefly mentioned, the chance of survival can be used by the bot to estimate a success rate, by choosing actions which it has learned might increase its chance of survival. Even if surviving is not the main objective of the bot, such a consideration can help in the sense that a dead bot won't be able to fulfil its main objective, or aid its team-mates in doing the same.

3.2.3.7 Enemies' Stance
The bot can take the enemies' stance into consideration by making estimates, based on their positions, in order to assess a "velocity": where the enemy is moving towards, and at what speed. This can give the bot an idea of how offensive/defensive the enemy is and let it act accordingly, in order to learn which course of actions suits best against a very offensive/defensive enemy.

3.2.3.8 Team-mates' Stance
Just as with the aforementioned assessment of enemies' positions, the bot can, via observations and/or communication with its team-mates, bring their stance(s) into consideration, so as to evaluate which course of actions is preferable for the bot if its team-mates are taking an offensive/defensive approach at a certain time and position.
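Several of the factors above (3.2.3.2, 3.2.3.3 and 3.2.3.7) rely on accumulating observed positions over many rounds. A minimal sketch of such a positional heat map, under the assumption that the map is discretised into grid cells and round time into buckets (an illustration for this report, not a concrete engine structure):

```python
from collections import defaultdict

class EnemyHeatmap:
    """Accumulates observed enemy positions per map grid cell (illustrative sketch)."""

    def __init__(self, cell_size: float = 128.0):
        self.cell_size = cell_size
        self.counts = defaultdict(int)   # (cell_x, cell_y, time_bucket) -> observations

    def _cell(self, x: float, y: float) -> tuple:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def log_observation(self, x: float, y: float, round_time: float):
        bucket = int(round_time // 10)   # group observations into 10-second buckets
        self.counts[(*self._cell(x, y), bucket)] += 1

    def likelihood(self, x: float, y: float, round_time: float) -> float:
        """Relative likelihood of meeting an enemy here at this time of the round."""
        bucket = int(round_time // 10)
        total = sum(c for (cx, cy, b), c in self.counts.items() if b == bucket)
        if total == 0:
            return 0.0
        return self.counts[(*self._cell(x, y), bucket)] / total

heat = EnemyHeatmap()
heat.log_observation(520.0, 1030.0, round_time=23.0)    # enemy spotted mid-round
print(heat.likelihood(530.0, 1040.0, round_time=25.0))  # 1.0 with this single sample
```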
3.2.3.9 Team-mates' Success Rate
When a sub- and task objective has been chosen, the bot can be made to pay careful attention to those of its team-mates who have a particularly high success rate in fulfilling the objectives, i.e. to mimic traits of those team-mates' course of actions, seeing as that particular bot seems to be doing something right. The same approach can naturally be applied vice versa; that is, to make the bot aware of not mimicking another bot with a particularly low success rate.

3.2.3.10 Choice of Weapon
There is a large array of weapons in Counter-Strike, all with their (dis)advantages. The bot could be made to take this into consideration when choosing a weapon, in relation to its sub objective in the upcoming round (whether it is going to be fast and offensive, static and defensive, using a close- or long-range weapon, etc.). Another approach to weapon considerations could be to make the bot evaluate based on its experience with the different weapons, i.e. how good the accuracy of a weapon is when the bot uses it, what the bot's K/D ratio with a weapon is, survivability, main objective success rate, etc. Human players tend to choose the more popular weapons, such as the AK47 and M4A1 rifles. The reasons for some weapons being more popular than others can differ a lot, but are typically related to weapon cost, accuracy, recoil and rate of fire. If the bot is to act human-like, it should opt to use the weapons deemed popular more often. Ideally, the bot could be set to figure out which weapons are popular by evaluating the parameters above for each weapon itself.

3.2.3.11 Team-mates' Advice
The bot should be made receptive towards advice from its team-mates. This can concern positioning, rotation, which offensive/defensive stance to assume, or basically any course of action about which a team-mate assumes to know better than the bot receiving the advice. How such communication would be conveyed, and what level of detail the information would contain, is discussed in 3.2.5.

3.2.4 Offensive / Defensive Strategies
As mentioned in 2.3, players in Counter-Strike have tactics based on whether they are attackers or defenders, but they also use tactics to cover each other and themselves and to keep each other better informed. This section will mainly focus on what the most optimal bot should consist of in relation to offensive and defensive strategies.

3.2.4.1 Tactics for the bomb
Since the defensive stance is the best stance to be in, the Terrorists will always strive to plant the bomb, since it changes their stance from offensive to defensive. But since only one Terrorist carries the bomb, that specific Terrorist should be escorted to the bomb site by his team-mates. If the Terrorist with the bomb dies, he drops the bomb at the position where he was killed. This makes the situation more favorable for the Counter-Terrorists, because they then only have to protect the dropped bomb instead of two bomb sites. It is therefore important that the Terrorists focus on not letting their bomb carrier die, since it can drastically worsen their chances of winning.

3.2.4.2 Cover your Six
Cover your six is a term referring to a clock in military lingo, where 12 is forward and 6 is backwards. Covering your six therefore means that the group should always be covered from behind. This tactic is used to prevent an ambush from the enemy. Though this tactic is very important, it has to take two different parameters into consideration: the time and the area.
When a human player is playing, he will always take into consideration how far away the enemy can possibly be before starting to cover areas; there is typically no need to cover a place the enemy cannot have reached yet. Furthermore, there will be scenarios where an area does not contain any hiding places, or where not enough time has passed for an opponent to reach a certain spot; in these cases the bot does not need to 'cover its six'.

3.2.4.3 Clutch
(Kaizen [a]) Sometimes one team has outnumbered the other team, and the other team has only one player left. In these situations, where a single player is left against several players on the other team, that one player has to change tactics, since he will most frequently die if he sticks to the team tactic. The kind of tactic the single player has to use is known as a clutch tactic. The elements of the clutch tactic are explained in the following:

• The last remaining player of the team has to be stealthy by walking (not running) and using silencers, so that the enemy has a hard time finding him. This also means that the player has to stay on the move, since the enemy can make a coordinated attack if they know the player's position.
• When the single player has taken out one of the opposing players, he can use the enemy team's communication to anticipate their movement and then kill their rotators. When facing multiple enemies alone, it is best to take them on one by one, because they are stronger as a team than alone (2.3.4).
• A very good tactic for the single player is to do the unexpected and be unpredictable. This makes it possible to surprise the enemy and make them more vulnerable.

A very important element to take into consideration when clutching is the bomb, which can change how tactics are performed.

• If the single player is a Terrorist, he should get the bomb planted and wait for a Counter-Terrorist to defuse it. When the Counter-Terrorist is defusing, the single Terrorist should jump out from hiding and kill him.
• As a single Counter-Terrorist, it is possible to make use of an element called the "fake defuse": the Counter-Terrorist starts defusing and stops again, just to provoke the Terrorist out of his hiding spot when he hears the sound of defusing.

3.2.4.4 Flanking
As mentioned in 3.2.4.3 above, it can sometimes be an advantage to behave in an unexpected manner. A tactic that can be used to mislead the enemy into falsely thinking they have figured out the team's strategy is the flanking tactic, where the team, as described in 2.3, sacrifices one team-mate who has to draw the enemies' focus away by making as much noise as possible, while the rest of the team takes another path through the map. This will typically result in an ambush or an undiscovered bomb plant.

3.2.5 Tactical Communication
Establishing the part of the conceptual design concerning how the bots are to communicate with their team-mates can be split into two sub-categories: "what" to communicate and "how" to communicate it. Several subjects of communication may be significantly affected by the form in which they are chosen to be relayed; e.g. if a communication form is chosen that is to mimic the communication possibilities of two human players, the bots cannot be allowed to communicate specific coordinates.
The number of possibilities that may lead to a chosen form of communication is however so vast that such a refinement will not be possible until the conceptual design has had both the "what" and the "how" of communication chosen.

3.2.5.1 What to communicate
The subjects to be communicated to the bot's team-mates for tactical purposes will naturally all derive from the tactical considerations written above, which the bot makes in order to determine its own set of actions. Generally speaking, any factor important enough to determine a bot's own course of action is important enough to relay to its team-mates. However, the importance of different types of information differs, and everything has a time and a place. At the top of the list is likely to be the positions of the enemies, closely followed by the position of the bot in relation to its team-mates. The choice of weapon could also be communicated, so as to give the bots the option of creating a good composition on their team, so that their choices of weapons complement each other's (dis)advantages. The enemies' presumed positions are possibly one of the only things that may be best for the bot to keep to itself, seeing as all the bots will have their own estimations of this and have their actions influenced by it. If the entire team were to share their statistical estimations of the enemies' positions (ignoring the processing power it would necessitate), it is imaginable that the heat maps of the enemies' presumed positions would be smoothed out, effectively leaving the bots clueless.

3.2.5.2 How to communicate
As written in 2.3, human players have some limitations when they have to communicate with each other; it is typically done by the use of radio commands, text messages or voice communication. In 2.4.4 the bots' communication method is described, which according to 2.3.2 is known to be a slow and imprecise method. The best solution would be to use the built-in voice communicator, but this would require a voice-recognition feature. This project is not interested in making the bots able to do so, since the bots are only going to work together with each other and do not need to react to voice communication.

3.2.5.3 Communication Between Rounds versus Within Rounds
The bots can communicate either between rounds or during a round. Both kinds of communication can be implemented, but will quite possibly result in two different kinds of outcomes.

• Communication between rounds is the kind of communication where all the experience gathered through the round is shared with the other team-mates after the round is done. This will most likely result in tactics based on each bot's earlier experience with the success criteria. This kind of communication would probably result in a bot that creates a tactic before each round starts.
• Communication within each round is hypothesized to be more dynamic, due to the fact that the bots will be able to evaluate the success criteria as things happen and not just in the following rounds. This type of communication will probably require more computational power, but should ideally shed the very static behaviour and thus behave more like a human would; a sketch of such an in-round message is given below.
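A minimal sketch of what such an in-round team message could look like, assuming the bots exchange coarse zone names rather than exact coordinates (as argued in 3.2.5); the message fields and command names are illustrative assumptions, not the game's actual radio protocol.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TeamMessage:
    """An in-round tactical message exchanged between team-mate bots (illustrative)."""
    sender_id: int
    kind: str                 # e.g. "enemy_spotted", "need_backup", "cover_me"
    zone: str                 # coarse area name instead of exact coordinates
    enemy_count: int = 0
    timestamp: float = field(default_factory=time.time)

class RadioChannel:
    """Very small publish/subscribe channel shared by the bots on one team."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def broadcast(self, message: TeamMessage):
        for callback in self.subscribers:
            callback(message)

# Example: one bot reports two enemies at bombsite B, and team-mates receive it
channel = RadioChannel()
channel.subscribe(lambda m: print(f"bot received: {m.kind} in {m.zone} x{m.enemy_count}"))
channel.broadcast(TeamMessage(sender_id=3, kind="enemy_spotted",
                              zone="bombsite_B", enemy_count=2))
```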
3.2.6 Navigation

Navigation through a computer-controlled environment can happen in many different ways, since the computer can send all coordinates directly to the bot and let it make use of them. This means that it is possible to use waypoints, wayzones or raytracing. The navigation method is not the only parameter that matters, though; the overall navigation path is also important, because it reflects the tasks of the bot.

3.2.6.1 Waypoints

Waypoints are locations in the game that each bot uses as part of its navigational system. Waypoints can be programmed statically into each map, be set manually by a human player walking through the map, or be placed automatically using algorithms.

3.2.6.2 Wayzones

Wayzones are very similar to waypoints. The main difference is that, with wayzones, the bot uses a randomized point within a given radius of the waypoint and navigates through this point instead of the exact location of the waypoint. This makes the bot navigate more dynamically and look more natural, since it does not depend on one specific point but on a larger area.

3.2.6.3 Geometry Raytracing

Raytracing is a way of navigating that makes the bot highly dynamic. The bot shoots out rays from itself and uses them to navigate: it looks at all the possible ways it can go, and based on the information returned by the rays it decides whether a direction is good or bad to take.

3.2.6.4 Fastest Path

No matter which navigation method is used, the bot has to employ some kind of pathfinding algorithm to navigate with. One option is an algorithm that makes the bot move through the map as fast as possible to reach its objective. Although this kind of navigation obviously sounds like the fastest, it is not always the best path.

3.2.6.5 Most Covered Path

Another option is the most covered path, which keeps the bot better protected but is slower. The most covered path could be derived from the experience of former rounds or games.

3.2.6.6 Fewest Enemies Path

The last kind of path that could be a solution for the bot also makes use of former experience. It should result in longer survival, since the fewer enemies the bot meets, the smaller the chance of being killed. (A sketch at the end of this section illustrates how these path preferences could be weighted against each other.)

3.2.6.7 Walk To Be Stealthy

To make the bot as human-like as possible, it must also use some of the other movement methods. One of these is walking, which makes the bot move slower but silently; this can be a great advantage, since it adds unpredictability.

3.2.6.8 Run Using Knife

Another method often used by human players is the knife method: holding the knife gives slightly more movement speed and makes the bot/player run faster.

3.2.6.9 Creative Decisions

Creativity must also be possible for the bot. A human player sometimes takes unpredictable paths to fool the enemy, set up an ambush, or simply try new routes to learn from. The bot should have this ability as well, since it makes it more human-like and lets it learn new paths along the way.
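As a rough illustration of the wayzone idea (3.2.6.2) and of how the three path preferences (3.2.6.4 to 3.2.6.6) could be expressed in a single pathfinding cost, consider the following sketch. All names, fields and weights are hypothetical assumptions for illustration and are not taken from any existing bot's source code.

#include <cmath>
#include <random>

struct Vec2 { float x, y; };

// Wayzone sampling: instead of steering for the exact waypoint position,
// pick a random point within a radius around it (2D for simplicity).
Vec2 WayzoneTarget(Vec2 waypoint, float radius, std::mt19937 &rng) {
    std::uniform_real_distribution<float> angle(0.0f, 6.2831853f);
    std::uniform_real_distribution<float> dist(0.0f, radius);
    float a = angle(rng), d = dist(rng);
    return { waypoint.x + d * std::cos(a), waypoint.y + d * std::sin(a) };
}

// One possible cost for a link between two connected waypoints. With both
// weights at zero this reduces to the fastest path; raising dangerWeight
// favours the most covered path, raising enemyWeight the fewest-enemies path.
struct WaypointLink {
    float distance;        // travel distance along the link
    float storedDanger;    // damage experienced near the destination (former rounds)
    float enemyFrequency;  // how often enemies were met near the destination
};

float LinkCost(const WaypointLink &link, float dangerWeight, float enemyWeight) {
    return link.distance * (1.0f + dangerWeight * link.storedDanger
                                  + enemyWeight * link.enemyFrequency);
}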
3.2.7 Aim and Shoot

The ability to aim and shoot is naturally a core mechanic in an FPS game. It is easy to give a bot this ability, but it can be hard to make it appear human-like while doing so. The following paragraphs cover the most important factors to take into account in order to make a bot appear human-like when aiming and shooting.

Other than a headshot (hitting the opponent directly in the head), the fastest way of killing an enemy is to land as many shots as quickly as possible. With the automatic weapons this is achieved by holding down the fire key (known as "spraying"); however, the weapon's recoil will make the shots cover a larger area than the crosshair when spraying. Different weapons have different recoil, and while recoil does not matter much at close range, at long range it makes it nearly impossible to hit the target. The approach most human players take to overcome this dilemma is known as "burst fire": only a small salvo of shots is fired, followed by a short pause to let the recoil settle, before firing the next salvo. Making the bot take these factors into account should contribute not only to its human-like traits but also to its shooting accuracy.

3.2.7.1 Reaction Time

Another problem stemming from the computer's inhuman capabilities is reaction time. Human players, even the best, will always have a delay from thought to in-game action, whereas a bot can react almost instantaneously. Therefore a delay should be implemented to make the reaction time similar to that of a human player. To make it even more human-like, the delay could be chosen at random from a set interval, so it is not exactly the same in every single instance of aiming and shooting.

3.2.7.2 Wall Shooting

A clever use of game mechanics in Counter-Strike is the fact that many types of objects and walls can be shot through. This is exploited by many human players, but rarely by bots, since they are unlikely to have a visual point of reference on their opponent to aim at. How the bots should be made aware of an enemy's presence without a visual is at this point undecided, but it could be done in several ways: through communication with team-mates, by the use of sound input, by 'guessing' that the enemy is there (e.g. based on the aforementioned heat maps of the enemies' presumed positions), or in countless other ways.

Next comes the point the bot is to aim at. If the bot is simply set to aim at the enemy's head, the crosshair will lock onto the opponent's head and follow it without fail, which naturally does not come across as very human-like. A way to overcome this (im)perfection could be to select a fixed point on the opponent's body and draw a circle around it. The bot then picks a randomly selected point within the circle to aim and fire at; the radius of the circle directly controls the accuracy of the bot's aim, as the chance of missing the opponent grows with the size of the circle (see the sketch at the end of this section).

3.2.7.3 Preemptive Grenades

Another clever use of game mechanics by human players is the fact that a thrown grenade will ricochet off the wall it is thrown at. Most bots do not throw a grenade until they have visual confirmation of their target, whereas human players tend to use grenades preemptively to create a safe passage into a room or around a corner, and most often do so by throwing the grenade from a point where there is no line of sight between themselves and the enemy. Making proper use of the grenades' mechanics can create significant tactical advantages, and should therefore be part of the bots' competences.
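A minimal sketch of the two "humanizing" measures above, a randomized reaction delay (3.2.7.1) and an aim point scattered inside a circle around the chosen body point (3.2.7.2), could look as follows. The names and the delay interval are illustrative assumptions, not values taken from any existing bot.

#include <cmath>
#include <random>

struct Vec2 { float x, y; };

std::mt19937 g_rng{std::random_device{}()};

// Reaction time drawn from an interval instead of reacting instantly.
float ReactionDelaySeconds(float minDelay, float maxDelay) {
    std::uniform_real_distribution<float> delay(minDelay, maxDelay);
    return delay(g_rng);
}

// Aim at a random point inside a circle around the chosen body point; a
// larger radius directly lowers the bot's effective accuracy.
Vec2 ScatteredAimPoint(Vec2 target, float radius) {
    std::uniform_real_distribution<float> angle(0.0f, 6.2831853f);
    std::uniform_real_distribution<float> dist(0.0f, radius);
    float a = angle(g_rng), d = dist(g_rng);
    return { target.x + d * std::cos(a), target.y + d * std::sin(a) };
}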
3.3 Interim Summary

Throughout the Design chapter, numerous considerations have been listed concerning what the bot should assess in order to have a better chance of appearing human-like in its behavior. However, as previously mentioned, it was always apparent that creating a bot from scratch containing the entirety of the necessary competences would not be possible within the time frame of the project. Therefore, it was decided to develop further on the framework of an existing state of the art bot, researched in the Analysis chapter.

3.4 Choice of Bot

Previously, in the analysis (2.4), three different bots for Counter-Strike were introduced. The implementation of this project will use one of the three as a starting point to develop further upon. When comparing the RealBot (2.4.3), TeamBot (2.4.4) and PODbot (2.4.5), it became clear early in the process that the PODbot was the most applicable, for pragmatic reasons: it offered easily accessible source code that was well structured and accompanied by comments describing the different classes, which made it fairly easy to implement new code in the PODbot. Besides the practical reasons, the PODbot was deemed to exhibit many of the features we hypothesize to be crucial for human-like qualities. That said, the PODbot lacks certain desired features/competences and has some features in need of improvement, such as its very rigid method of aiming. Another limited feature is how it conveys information to other bots: currently, it only communicates, and applies dynamic behavior (2.1.2.2), based on the accumulated damage received at a waypoint.

3.5 Development Choices for the PODbot

The feature that will primarily be addressed in the implementation, in order to fulfil the final problem statement's goal of interactive communication establishing dynamic combat tactics, is navigation. To make the PODbot capable of navigating with the elaborate considerations established above, several other features will have to be implemented as well. The new features which the implementation of the bot will include, not present in the current version of the PODbot and necessary for a refined navigation, are as follows:

• Learn Through Experience
• Reinforcement Learning
• Navigation
• Applying Heatmaps, Waypoints and Waypoint Flags
• Tactical Communication
• Offensive / Defensive Strategies
• Enemies' (Current) Positions
• Enemies' (Presumed) Positions
• Team-mates' Positions

4 Implementation

This chapter gives an overview of how the modifications to the original PODbot have been implemented. The main focus is on the process of how the implementation was executed, including descriptions of the necessary tools, followed by a section that aims to go thoroughly through the details of the modifications and the further development of the bot.

4.1 Counter-Strike Structure

The game Counter-Strike was originally built as a mod for the game Half-Life. In the early beta stages of Counter-Strike, Valve (the team behind Half-Life) recognized the potential of the game and chose to team up with the developers of the mod. Since Counter-Strike is a mod added to Half-Life, the two share much of the same physics and programming. The engine used is called "GoldSource" (Valve [a]).

4.1.1 Half-Life Software Development Kit

The Half-Life Software Development Kit (Valve [b]) contains various tools for adding content to the GoldSource engine, e.g. for incorporating 3D models from other software such as 3ds Max (Autodesk).

4.1.2 Add-ons

Add-ons are used to add mods and similar content to the game.
These can be directed at a specific mod, such as Counter-Strike, or at all mods. A lot of community-driven add-ons can be found for Half-Life, and most of them are game-client specific. The PODbot used in this project is an example of such an add-on, and so is "Metamod".

4.1.3 Metamod

Metamod (AlliedModders) is used as a link between the GoldSource engine and mod DLLs (a file type explained in section 4.2). It can handle multiple new plugins in the form of DLLs, which will be elaborated upon in the coming section, "Compiling the Dynamic-Link Libraries".

4.2 Compiling the Dynamic-Link Libraries

Dynamic-Link Libraries (DLLs) are files used on Microsoft Windows. As the name indicates, they are library files, typically used by the operating system. They share similarities with executables (.EXE files) but are not directly executable; roughly speaking, a DLL file is a container for Microsoft's shared libraries. The PODbot source files are compiled into a DLL file that is loaded by the previously mentioned add-on, Metamod. Many tools have been developed over the past decades for compiling binary files such as DLLs. However, we found that the simplest method was to use the Command Prompt (cmd.exe) together with the tool "make" (GNUWIN), which enables the generation of executables and other non-source files. It requires a makefile, which is essentially the recipe for what is being built: it holds information about which source files are used, the target directory, the operating system(s), and a number of other settings related to building the program/files. In this case the makefile refers to the Metamod add-on as well as the Half-Life Software Development Kit.

4.3 PODbot

The bot chosen in chapter 3 was the PODbot (3.4). The PODbot AI consists of a complex navigation and experience system and is referred to as a renowned state of the art bot (Markus). Before changing the navigation system, it was necessary to investigate what the PODbot uses as inputs, and how it interprets these inputs for navigation. The following paragraphs describe and explain how the PODbot functions, in order to later provide an understanding of how to develop further upon its origin.

4.3.1 PODbot Properties

The focus will remain on navigation and the properties closely related to it. Orientation properties, the aiming properties (3.2.7), and other properties not directly related to the navigation of the PODbot will not be discussed. The PODbot has gone through thorough, iterative development since 2004, including several refined properties specifically designed with the intention of making it more human-like in its behavior. The navigation property covers the waypoint system, the navigation algorithm (the A* algorithm, Patel), and the experience system. The communication system in the PODbot covers message handling, such as radio commands (2.3.2) and text message communication (2.3.1). The communication used in the navigation property is the sharing of a bot's experience with the entire team, which will be elaborated upon later (4.3.2.6).

4.3.2 Understanding the PODbot's Code

Even after limiting the focus of the following chapter to the navigational properties, the sheer size of the PODbot's open source code still entails thousands of lines of code.
Therefore most of the code will be described and explained in a concise manner, with more elaborate content and depictions where imperative. All code developed for this project, as well as the entire source code for the PODbot, can be found in Appendix 10-4.

Fig. 9: The header of the method BotCollectExperienceData()

The bot uses the method BotCollectExperienceData (Figure 9) to gather the damage taken at the closest waypoint and add it to the damage already stored at that waypoint. The method takes pVictimEdict and pAttackerEdict, two parameters that point to the data of the victim player and of the attacking player, respectively. The last parameter is iDamage, an integer that simply states the damage done to the victim. The method is called each time damage is done to a bot. In summary, BotCollectExperienceData() (Figure 9):

• Adds experience (the damage value) to the victim bot's waypoint.
• Adds experience to the victim bot's waypoint about the most dangerous waypoint, as seen from the victim bot's waypoint.
• Checks if a waypoint exceeds the maximum limit of experience points.
• Finds and stores the most dangerous waypoint in the game.
• Stores the overall experience in all the waypoints.
• Shares the knowledge with the team-mates in the round.

4.3.2.1 The Victim Index

Adds experience to the victim bot's waypoint about the amount of damage from the attacker bot's waypoint.

The bot's experience values are an amalgam of several different sources, but the most dominant source is damage inflicted on a bot standing at (or near) a specific waypoint. The waypoints in the bots' experience are split into Victim indices and Attacker indices; in this paragraph only the Victim index is explained. The Victim index refers to a waypoint at which the (victim) bot received damage and/or died. The experience value at a Victim index equals the total amount of damage received at that specific waypoint. The values are stored in an array covering all waypoints in the map, holding the damage values as integers. Once the damage values have been added to the index, they are stored in a pointer called uTeamDamage, which is referred to every time the experience at a waypoint is needed. The entire code of the CIAbot can be found in Appendix 10-5.

4.3.2.2 Adds Experience to the Victim Bot's Waypoint About the Most Dangerous Waypoint, From the Victim

The bot stores not only experience about where it received damage, but also the position of the source of that damage. This is done using the Attacker index, stored in an array attached to the Victim index at which the damage was received, together with the amount of damage. The bot uses this experience to know from where the enemy is most likely a threat (the Attacker index) at a given position (the Victim index).

4.3.2.3 Checks if a Waypoint Exceeds the Maximum Limit of Experience Points

Each time damage is done, the bot checks whether the experience of either the Victim index or the Attacker index exceeds 2040 experience points. If an index value exceeds the limit, it is clamped to the maximum of 2040.
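The accumulation and clamping described in 4.3.2.1 to 4.3.2.3 can be summarized with a small sketch. The data layout and the function name are illustrative stand-ins rather than the actual PODbot structures; only the behaviour (adding damage to the victim and attacker indices and clamping at 2040) follows the description above.

#include <algorithm>
#include <vector>

const int kMaxExperience = 2040;

// One entry per waypoint: damage received here, plus damage received here
// broken down by the attacker's waypoint.
struct WaypointExperience {
    int victimDamage = 0;
    std::vector<int> damageFromAttackerWaypoint;
};

void AddExperience(std::vector<WaypointExperience> &experience,
                   int victimWaypoint, int attackerWaypoint, int damage) {
    WaypointExperience &wp = experience[victimWaypoint];
    if (wp.damageFromAttackerWaypoint.size() < experience.size())
        wp.damageFromAttackerWaypoint.resize(experience.size(), 0);

    // Add the damage and clamp both indices at the 2040 limit.
    wp.victimDamage = std::min(kMaxExperience, wp.victimDamage + damage);
    wp.damageFromAttackerWaypoint[attackerWaypoint] =
        std::min(kMaxExperience, wp.damageFromAttackerWaypoint[attackerWaypoint] + damage);
}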
Fig. 10: Illustration of how the weights of the waypoints are distributed dynamically at run-time

4.3.2.4 Finds and Stores the Most Dangerous Waypoint in the Game

Every time damage points have been added to an index, all waypoints are compared to each other, and the one with the most damage is stored in a global variable as the most dangerous waypoint in the game. The waypoint holding this title is the one with the largest cost value in the A* navigation system.

4.3.2.5 Stores the Gathered Experience in All the Waypoints

As described earlier, the damage is stored in a pointer called uTeamDamage, which is the variable the bot takes into consideration when building a path. After each round the bot stores additional data on the waypoints, depending on what it has experienced itself and what experience it has received from its team-mates. These continuously developing values in the waypoint system are what "reinforces" the bot with new knowledge, which it uses to dynamically adjust its pathfinding preferences. This on-line adaptive behavior can be referred to as Reinforcement Learning, described in 2.1.3.3. The PODbot does not store the experience to a file automatically; it has to be triggered to do so through a console command in Counter-Strike.

4.3.2.6 Shares the Experience With the Team-mates

The uTeamDamage variable does not only hold the bot's own experience; the experience is shared with all bots on the same team (Terrorists or Counter-Terrorists). This is the only functional communication the bot uses. The bot is equipped with radio communication, but it is only usable for human-to-bot communication.

4.4 CIAbot

The Communicative Interactive Agent (CIAbot) is the name given to our further development of the PODbot, in which only parts of the PODbot have been changed. This was done because of the time constraints of building a new bot, and because the project's focus is on further developing state of the art bots, not on reinventing them.

4.4.1 Changes to the PODbot

Fig. 11: The header of the method UpdateVisibleWPThreat()

The method implemented in the CIAbot (Figure 11) is called UpdateVisibleWPThreat and takes a bot pointer, pBot, referring to the current bot. UpdateVisibleWPThreat is the main change that turns the PODbot into the CIAbot, and it is the method in which all the new code is placed. In the implementation it was chosen to focus on making the bot capable of communicating dynamically, meaning that it should be able to tell what it sees and where it has seen it, and share that experience at run-time. This was done with a new method that takes the current bot as a parameter and checks, every frame, whether the bot sees an enemy. If the bot sees an enemy, it adds 1 experience point to the enemy's waypoint. If the bot cannot see any enemies, it clears all experience, for the given round, in the waypoints within its vision. This process runs every frame, with the intention of making the bots able to communicate the threats they observe (or do not observe), in order to estimate the presence of all enemies on the map as precisely as possible. Compared to the PODbot's experience system, the CIAbot has the ability to relay real-time intelligence to its team, whereas the PODbot's experience values only change once a bot receives damage.
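A minimal sketch of this per-frame update, working purely on waypoint indices, could look as follows. How visibility is determined is left out, and the names are illustrative rather than taken from the CIAbot source; only the add-1/clear behaviour mirrors the description above.

#include <vector>

// visibleEnemyWaypoints: waypoints at which the bot currently sees an enemy.
// visibleWaypoints:      all waypoints inside the bot's field of view.
// teamThreat:            experience values shared with the whole team.
void UpdateVisibleWaypointThreat(const std::vector<int> &visibleEnemyWaypoints,
                                 const std::vector<int> &visibleWaypoints,
                                 std::vector<int> &teamThreat)
{
    if (!visibleEnemyWaypoints.empty()) {
        for (int wp : visibleEnemyWaypoints)
            ++teamThreat[wp];          // +1 experience per frame for each seen enemy
    } else {
        for (int wp : visibleWaypoints)
            teamThreat[wp] = 1;        // mark as 'clear' for the rest of the round
    }
}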
The method UpdateVisibleWPThreat() (Figure 11) differs from the way the PODbot works in that it:

• Changes the limit on how much experience a waypoint can contain, and how the values are processed when the limit is exceeded.
• Adds experience based on where the bot sees an enemy.
• Deletes experience (temporarily) based on the bot's vision, clearing waypoints when no enemy is detected.
• Shares the experience constantly with its team-mate bots (communication).

4.4.1.1 Changes the Limit of Experience a Waypoint Can Contain, and How the Values Are Processed When the Limit Is Exceeded

Instead of simply capping the experience points and assigning them to the maximum, an algorithm has been used to control the limit: a for-loop runs through all the waypoints and divides a waypoint's experience by 2 when it has surpassed the maximum value allowed (a short sketch at the end of this section illustrates the idea). The reason for changing the limit system is to increase the range of experience. The limitation of the PODbot's waypoint damage values (4.3.2.1) is that once a waypoint has reached the maximum value, it no longer changes. This eventually leads to several waypoints sharing the same value (the maximum), despite having received different amounts of damage at the given sites.

4.4.1.2 Adds Experience Based on the Bot's Vision, When the Bot Sees an Enemy

One of the core add-ons is the vision add-on, which makes the bot capable of gaining experience based on the detection of enemies, and not only when damage is taken. A number of different arrays are used to hold the data from uTeamExperience, and to process, save and delete that data. It works as follows: when the bot spots an enemy, it finds the enemy's position and adds 1 point to it. This is done every frame, which in the long run results in many points being added. The data is then added to an array holding the uTeamDamage value and checked against the limit, after which it is sent to uTeamDamage and saved.

4.4.1.3 Deletes Experience Based on the Bot's Vision, Clearing Waypoints When No Enemy Is Detected

Another add-on that has been implemented uses the bot's vision to clear areas. When a bot sees an area with no enemy present, it checks each waypoint for any damage stored in its experience. The waypoint is subsequently set to clear, which means it is set to a value of 1. This makes it possible for the bot to communicate that the area is 'clear'. The special thing about this feature is that it only functions in-round and does not save the data for the next round. It is, however, transferred directly to the uTeamDamage pointer while the round is running, which affects the bots' navigational decisions. When the round is done, uTeamDamage is reset to what it was at the start of the previous round, plus the new damage and vision experience of enemy positions.

4.4.1.4 Shares the Experience Constantly With Its Team-mate Bots (Communication)

The method (Figure 11), as described in section 4.4, runs every frame. The information (data) that the bots assign to the waypoints constantly updates the experience system, which applies to the entire team.
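As a sketch of the altered limit handling in 4.4.1.1 (the names are illustrative, and the exact placement of the loop in the CIAbot source may differ), halving instead of clamping keeps large values distinguishable from each other:

#include <vector>

// Run over all waypoints; any value that has passed the limit is halved
// instead of being clamped, so differences between heavily and moderately
// dangerous waypoints are preserved.
void NormalizeExperience(std::vector<int> &teamExperience, int maxExperience) {
    for (int &value : teamExperience)
        if (value > maxExperience)
            value /= 2;
}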
4.5 HLTV & Screen Capturing

The recordings made for the test scenarios were created with a plugin called "HLTV" (an abbreviation of "Half-Life Television"), which is connected to the server during playtime. It takes up a player spot and stays in the game as a spectator, recording all actions in the game. The output format is a ".dem" file, an abbreviation of demonstration. These files can only be opened with the exact Half-Life game and mod they were recorded in. Reviewing a .dem file enables the viewer to lock on to whichever player they would like to watch, from any available angle and/or overview.

In order to turn these recordings into movie files for later editing and upload to YouTube, they had to be converted. There is, however, no simple solution for doing so. It came down to the following two choices:

• Have a script take a screenshot of each frame, assemble these in video editing software, and export to a recognized video format.
• Use screen-recording software to capture the desired playback of the .dem file and edit it in video editing software.

The latter was chosen. Neither of these options was ideal, since they had to be recorded in real-time on a computer with enough computational power to run the demonstration alongside the script or screen-recording software.

4.6 Editing

When all the screen recordings were finished, the editing process began. The aim was to have as many videos as possible, so the subjects would be able to answer as many questions as they could endure.

4.7 Interim Summary

Throughout the Implementation chapter, the procedure of compiling the source files into a DLL file, and how Half-Life is capable of using the DLL file through Metamod, has been described. Furthermore, the transformation from PODbot to CIAbot has been presented. This transformation resulted in an enhancement of the navigation system that gives the CIAbot the tools to communicate what it sees to the other bots. Once the implementation of the CIAbot was done, the program HLTV was used to film the game while it was being played, both from the bots' and from the humans' perspective. These videos are used in the coming chapter, chapter 5.

5 Testing

With reference back to the Final Problem Statement, "Will interactive communication between FPS game bots, in order to establish dynamic combat tactics, increase the development towards a completely human-like bot?", the purpose of the testing is to investigate whether interactive communication will enhance the appearance of human-like combat tactics.

5.1 A Different Approach Than the BotPrize Competition

In the Analysis (2.5), the Turing test, along with extensions such as the Gestural Turing test (2.5.2) and the Player Believability test (2.5.2), was introduced as a way to test a computer's human-like appearance. In relation to this project, the most suitable type of Turing test is the Player Believability test (Tog [2012]). This test, which estimates the human-like behavior of computer-game bots, is applied in the widely acknowledged BotPrize competition mentioned in 2.5.3. Despite it being the primary test for evaluating human-like behavior in bots, some complications were encountered when trying to employ it in this project:

1. The BotPrize competition uses a Team Deathmatch game to test bots. Since the PODbot is only applicable in the Counter-Strike game environment, it is not possible to test the bot in the same game engine as in the BotPrize competition.
2. Attempting a test in Counter-Strike similar to the test in the BotPrize contest turned out to be problematic. In the Team Deathmatch game, the judge plays against one bot and one human player (three players against each other in all). Since Counter-Strike is based on team battles and does not allow more than two teams to play against each other, this procedure is not adequate for the PODbot.

3. An alternative could be to place the test subject on one of the two teams in Counter-Strike. The judge could play on a team of human players against bots, and subsequently on a team of bots against humans. This way the interactive part of the Turing test would not be neglected. However, the test procedure would be misleading, since the PODbot is not programmed to account for a human player's behavior and experience, and is only receptive to voice commands, which is not of interest to this project.

5.2 Testing Through Observation Instead of Interaction

Based on the deliberations above, it was apparent that the test of the PODbot would not include an interactive test subject. The question then becomes to what extent a recorded video of the two players (bot and human) is sufficient for estimating human-like behavior. One could argue that an observer is not engaged in the true experience of game-play believability; on the other hand, he/she is able to center more attention on the target, whereas a player would not have much time to assess the opponent in an in-game situation. This issue is addressed in the article "Assessing Believability" by Julian Togelius, a professor at the Center for Computer Games Research, IT University of Copenhagen (Tog [2012]). The article underlines that an assessment based on mere observation might be more adequate:

"In some cases a human player might not actually be a good judge. Human players can be too involved in playing the game, and thus unable to devote much time to observing other players closely. Further, they may be limited to seeing only a fraction of the behavior of the other players; in many games the amount of time that other players are actually in view might be quite small. With limited ability to gather evidence, a human player may be unable to judge the test."

5.3 Test Setup

The following sections explain the test setup, focusing on the different aspects of the test, such as playing against the bots, the recorded video footage and the survey.

5.3.1 Hypothesis

"Real-time communication of the positioning of threats will increase the human-like qualities of the CIAbot, in a player believability test, compared to the PODbot."

5.3.2 Playing Against PODbot and CIAbot

Since it was not beneficial to conduct a test where the test subjects could personally play against the bot themselves (5.1), the test for this project has the test subjects as observers and not as participants in the ongoing game. Thus, the first step towards enabling the test was to record four human players playing against the CIAbot and the PODbot, respectively. It was ensured that these players had no ties to this project, and they were not made aware that they were being recorded until after the session had ended, in order to omit any bias in their in-game behavior. Recordings were made of players playing against the PODbot as well as the CIAbot in order to enable an assessment of the original bot's (the PODbot's) human-like quality,
and to compare it to the results for the CIAbot, in order to determine whether the further development had improved the human-like qualities. Since the bots' behavioral patterns change in some respects depending on whether they act as Terrorists or Counter-Terrorists (mainly due to the in-game objectives), the players played 10 rounds against the CIAbot and the PODbot, both as Terrorists and as Counter-Terrorists. This produced a total of 40 recorded game rounds: 20 rounds vs. the PODbot (10 as Terrorists and 10 as Counter-Terrorists) and 20 rounds vs. the CIAbot (10 as Terrorists and 10 as Counter-Terrorists).

5.3.3 Video Footage

After the 40 rounds had been played and recorded, the drafting of the 40 test videos commenced. With 40 rounds and 4 players on 2 teams, a total of 160 video combinations were possible. However, wanting as many answers on as many different videos as possible, it was decided to draft only 40 videos: 20 videos from the CIAbot matches and 20 from the PODbot matches. The videos were assembled at random, in order to diminish any bias that might follow from selecting the recordings ourselves. First the round was chosen randomly, then which Counter-Terrorist player's perspective was to be shown, and finally which Terrorist player's perspective was to be shown. This was done within the constraint that 10 videos would show the CIAbot as the Terrorist, 10 the CIAbot as the Counter-Terrorist, 10 the PODbot as the Terrorist and 10 the PODbot as the Counter-Terrorist; in every video, the player from the opposite team would be the perspective of a randomly picked human player.

5.3.4 Survey

Due to the extensive amount of time it would require a single subject to answer the full survey, the possibility of a reward was announced in the description of the questionnaire (Appendix 10-8):

"You are free to stop the survey at any given time and submit your response. The more questions you answer correctly, the bigger your chance will be to win 50$, paid via PayPal. The winner will receive notice latest the 30th of June. If you want to stop the survey prematurely, simply answer 'No, I want to stop' when asked if you wish to continue."

This was given as an incentive for the subjects to answer questions for more videos than we could assume they would do out of selflessness. Only counting correct answers towards the chance of winning the cash prize was meant to eliminate any notion of mindlessly clicking through the survey and corrupting the data. Additionally, a timestamp was saved with each subject's responses, to sort out any submissions that were started/finished within an unreasonably short amount of time. The survey, which was created and conducted in Google Docs, was constructed such that every test participant initially had to enter certain demographic information: name, age, gender and occupation. Before the actual test, the subject was also asked how much experience he/she had with FPS games in general, on a scale from 1 (no experience) to 6 (very experienced). The same scale was used to indicate how much experience the participant had with Counter-Strike in particular. Subsequently the test subject was forwarded to a page consisting of the first of the 40 test videos and a brief set of questions about that specific video.
To diminish the effect of a learning bias on the results of the video questions, the survey was constructed and distributed in four different formats, each with a randomly generated order of the 40 videos. Another reason for randomizing the order of the test videos was that the subjects were given no information about the number of different artificial intelligences in the videos; in the event that the PODbot and the CIAbot exhibited obviously different behavioral patterns, we did not want this to create any bias as to whether they appeared human-like in the eyes of the subject. The randomized order of the test videos also helped against a potential learning bias in this respect.

The first question concerning each test video asked the test subject to judge which of the two players was a computer-controlled bot. On a scale from 1 to 6 (1 being very uncertain and 6 being convinced), the subject then rated the certainty of his/her choice. If the test subject could not distinguish one player from the other, or could not point to any telling characteristics in the players' behavior, he or she also had the option of selecting "does not know". This option was added because it was deemed a sign of success for the hypothesis if the subject could not determine which of the two shown players was more or less human-like in their behavior. To gather some qualitative feedback, the test subjects were given the opportunity to elaborate on the reasoning behind their answer. Lastly, the test subject was asked whether he/she wanted to continue the survey and get a chance of winning the 50 dollars. Approving the request by answering "yes, I would like to continue" forwarded the test subject to a new page with a new video and identical questions: which player is the bot, the level of certainty, a possible elaboration of the response, and whether or not the subject would like to continue.

5.3.5 Finding Test Subjects

This section covers which sources the project used for finding test participants, and what impact these sources had on the validity of the test subjects.

5.3.5.1 Requirements for the Test Subjects' Experience

In the Player Believability test (2.5.2) used in the BotPrize competition, the human-like qualities of a bot are judged exclusively by certified expert judges. The approach for this test is not restricted to expert judges and also incorporates less experienced test participants. At the time the survey was created, it was not certain what level of experience could be deemed sufficient to validate a subject's responses, and therefore all subjects were allowed to submit their responses to the survey.

5.3.5.2 Subjects With Little Experience of CS and FPS Games

Subjects with too little experience of CS or FPS games in general could be removed in the post-processing of the survey results. This decision was based on the assumption that people with barely any prior experience could not be considered adequately familiar with the characteristic behavioral patterns of either human players or bots, and thus would not be valid test subjects.

5.3.5.3 Counter-Strike Forums

Links to the surveys were posted on various Counter-Strike forums, in order to get as many experienced players as possible to answer.

5.3.5.4 Social Network

A secondary source used for finding subjects was Facebook.
This source was mainly used for its quantitative potential, as it is the platform that currently reaches the largest number of people likely to test the bots. Since the surveys were set up so that insufficiently experienced subjects could be removed after testing, the concern of who might follow the survey links was minor, since any invalid subjects (due to inexperience) would simply have their results removed later on.

6 Test Results

Due to a time restraint, the surveys were taken down after 5 days, with 69 respondents and a total of 755 views and answers spread out over the 40 test videos. After processing the survey results, the data showed some suspiciously large deviations. Investigating them revealed that 5 of the 40 test videos had been assembled from the wrong video snippets, which rendered them useless for the test; they were therefore discarded completely, including any answers they had received (specifically test videos 21, 22, 33, 35 and 40). Although the test subjects were not informed of the presence of two different bots (5.3), and the identities of the two bot types were further masked by randomizing the order of presentation of the test videos, the results have been processed separately for videos containing footage of the CIAbot and of the PODbot, so as to compare the two bots' results.

As mentioned in 5.3, one of the initial questions of the survey addressed the level of self-perceived experience with FPS games in general as well as with Counter-Strike specifically, asked on a Likert scale from 1 to 6, 1 being no experience at all and 6 being very experienced. For reasons which will be elaborated later in this chapter, the test results were processed for two segments: subjects with an experience rating of 2 or higher in both FPS games and Counter-Strike, and a smaller segment consisting only of answers from subjects rated 4 or higher in both. The human-like quality was assessed in two ways: for each individual test video, and for the collective results of all videos for the CIAbot and the PODbot. As the following paragraphs show in detail, this was calculated by dividing the number of wrong answers by the total number of answers for the video/bot. The first results presented are those of the subjects with an experience of 2 or higher.

6.1 Experience >= 2

The following results can be found in Appendix 10-9. Despite 5 of the test videos being disqualified from the results, the test results from subjects with >= 2 experience in FPS games and Counter-Strike consist of:

• 65 respondents
• 638 answers
• a total of 35 test videos, with an average of 18.23 answers per test video.

The mean proportion of subjects answering incorrectly for the CIAbot was 0.2311, meaning that on average 23.11% of the subjects judged the CIAbot to be a human. The standard deviation was 0.0677, i.e. the subjects' answers for the CIAbot deviated by 6.77% on average. The mean proportion of subjects answering incorrectly for the PODbot was 0.2691, meaning that on average 26.91% of the subjects judged the PODbot to be a human; the standard deviation was 0.0884, i.e. a deviation of 8.84% on average.
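Written out, the per-video human-likeness score, its mean, and the paired t-statistic (assuming the samples are paired per video, which is how a paired-sample test is normally set up) take the standard forms

\[
H_v = \frac{\text{incorrect answers for video } v}{\text{total answers for video } v},
\qquad
\bar{H} = \frac{1}{N}\sum_{v=1}^{N} H_v ,
\]
\[
t = \frac{\bar{d}}{s_d/\sqrt{N}}, \qquad d_v = H_v^{\mathrm{CIA}} - H_v^{\mathrm{POD}},
\]

where N is the number of (paired) test videos, \(\bar{d}\) is the mean of the per-video differences and \(s_d\) is their sample standard deviation.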
Two t-tests were performed in order to determine whether any noticeable change from the PODbot to the CIAbot had been made. The first was a test against a hypothesized mean, set to the mean of the PODbot. If the null hypothesis is rejected, it means that a change has been made, for better or worse; if the null hypothesis holds, it means that no noticeable change has been made. The test against the hypothesized mean resulted in a rejected null hypothesis, with a p-value of 2.6331e-23. The second test performed was a paired-sample t-test, which compares two samples to see whether they deviate from each other. This also resulted in a rejected null hypothesis, with a p-value of 1.4257e-09.

Fig. 12: Graph showing the number of answers each video (1-20) received, for the CIAbot and the PODbot. The x-axis shows the index number of the videos, and the y-axis the number of answers.

Figure 12 shows the number of subjects answering each video. Since several of the PODbot videos were not usable due to biases in the videos, these have been left out, along with the answers to them. The graph shows that most of the videos are fairly equally represented in terms of answers, though some videos are over-represented in answers for either the PODbot or the CIAbot.

Fig. 13: Number of true and false answers (CIAbot). The x-axis shows the index number of the videos, and the y-axis the number of answers.

Fig. 14: Number of true and false answers (PODbot). The x-axis shows the index number of the videos, and the y-axis the number of answers.

Figures 13 and 14 show how many subjects answered incorrectly and how many answered correctly, for both the CIAbot and the PODbot. As in figure 12, answers are missing for some of the PODbot videos. These graphs make it clear that in some videos it was easier to spot the bot than in others: for example, many subjects could tell the bot from the human player in video 2 for the CIAbot, whereas video 10 for the PODbot has almost identical numbers of false and true answers.

Fig. 15: Percentage human-likeness per video. The x-axis shows the index number of the videos, and the y-axis the percentage of human-likeness.

Figure 15 shows how human-like the bots were, in percent. There is a large difference in human-likeness between the videos, and also between the two bots within the videos. The largest difference is between videos 10 and 11, where the PODbot is about 42% human-like in video 10 but only about 14% human-like in video 11.

Fig. 16: CIA2total normal distribution. The graph shows the normal distribution of the CIAbot and how well the answers are represented in relation to the mean of the CIAbot response sample. The x-axis shows the percentage of human-likeness, and the y-axis the number of answers.

Fig. 17: POD2total normal distribution. The graph shows the normal distribution of the PODbot and how well the answers are represented in relation to the mean of the PODbot response sample. The x-axis shows the percentage of human-likeness, and the y-axis the number of answers.

The graphs in figures 16 and 17 show how much the human-likeness deviated, compared to a normal distribution, for both the CIAbot and the PODbot.
This makes it clear that the percentage values deviate a great deal and do not follow the normal distribution curve, which is supported by figure 15, where the percentage of human-likeness clearly varies a lot from video to video for both the CIAbot and the PODbot.

Fig. 18: CIA2total boxplot. The blue box shows how much of the total sample lies within the 50 percent closest to the mean of the normal distribution; the black whiskers are the parts outside this range, and the red line is the median. The y-axis shows the distribution of the answers, based on the percentage of human-likeness.

Fig. 19: POD2total boxplot. The blue box shows how much of the total sample lies within the 50 percent closest to the mean of the normal distribution; the black whiskers are the parts outside this range, and the red line is the median. The y-axis shows the distribution of the answers, based on the percentage of human-likeness.

To get a better visual representation of the variation in the answers to the videos, boxplots of both the CIAbot and the PODbot have been made (figures 18 and 19). The boxplots show that the CIAbot varies over a span between 10% and 40%, with a concentration between 17% and 27%. The PODbot varies more and has a wider concentration of answers, which means there was a larger difference between the questions.

Fig. 20: CIA2totalTrueCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the CIAbot correctly. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Fig. 21: POD2totalTrueCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the PODbot correctly. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Figures 20 and 21 show the normal distribution of how certain the subjects were in their correct answers. The CIAbot's distribution shows that a large number of answers lie outside the normal distribution. The answers for the PODbot, on the other hand, fit the normal distribution closely, although the leftmost bar deviates considerably compared to the rest of the answers.

Fig. 22: CIA2totalFalseCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the CIAbot incorrectly. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Fig. 23: POD2totalFalseCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the PODbot incorrectly. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Figures 22 and 23 show how certain the subjects were when their answers were incorrect. The answers for the CIAbot again deviate a lot from the normal distribution, whereas those for the PODbot follow it closely, which suggests that these answers are reasonably representative given the sample size and possible biases.
The mean certainty of the subjects who answered incorrectly when judging whether the observed CIAbot was a bot was 3.6905, with a standard deviation of 0.7015. The mean certainty of the subjects who answered incorrectly when judging whether the observed PODbot was a bot was 3.9324, with a standard deviation of 0.4323.

As previously stated, the subjects' self-proclaimed experience levels with FPS games and Counter-Strike were of interest, as a significant difference was expected between the accuracy of highly experienced players' answers and that of moderately experienced players. The following test results have been processed accordingly, and consist exclusively of answers from subjects with an experience of 4 or greater in both FPS games in general and Counter-Strike specifically.

6.2 Experience >= 4

The following results can be found in Appendix 10-10. The test results considering submissions from subjects with >= 4 experience in FPS games and Counter-Strike consist of 36 respondents and a total of 450 views and answers, spread out over all 35 test videos:

• 36 respondents
• 393 answers
• a total of 35 test videos, with an average of 11.23 answers per test video.

The mean proportion of subjects answering incorrectly for the CIAbot, after this exclusion, was 0.1818, meaning that on average 18.18% of the subjects judged the CIAbot to be a human; the standard deviation was 0.0808, i.e. a deviation of 8.08% on average. The mean proportion of subjects answering incorrectly for the PODbot, after the exclusion, was 0.1883, meaning that on average 18.83% of the subjects judged the PODbot to be a human; the standard deviation was 0.0788, i.e. a deviation of 7.88% on average.

Two t-tests were performed in order to determine whether any noticeable change from the PODbot to the CIAbot had been made. The first was a test against a hypothesized mean, set to the mean of the PODbot. If the null hypothesis is rejected, it means that a change has been made, for better or worse; if the null hypothesis holds, it means that no noticeable change has been made. The test against the hypothesized mean resulted in the null hypothesis not being rejected, with a p-value of 0.2595, meaning that the bot has not changed noticeably. The second test performed was a paired-sample t-test, which compares two samples to see whether they deviate from each other. This also resulted in the null hypothesis not being rejected, with a p-value of 0.4501, which backs up the test against the hypothesized mean.

Fig. 24: Graph showing the number of answers each video (1-20) received, for the CIAbot and the PODbot. The x-axis shows the index number of the videos, and the y-axis the number of answers.

Compared to figure 12, the numbers of answers in figure 24 are much more equal. This is because a lot of data was omitted due to the low experience of the excluded subjects. The number of answers is also more equal between the CIAbot and the PODbot.

Fig. 25: Number of the CIAbot's true and false answers. The x-axis shows the index number of the videos, and the y-axis the number of answers.
Fig. 26: Number of the PODbot's true and false answers. The x-axis shows the index number of the videos, and the y-axis the number of answers.

Figures 25 and 26 show that, after excluding the most inexperienced subjects, the variation in false answers follows each other much more closely than in figures 13 and 14.

Fig. 27: Percentage human-likeness per video. The x-axis shows the index number of the videos, and the y-axis the percentage.

Figure 27 shows that, even after the exclusion, there is still a large deviation between the videos in how human-like the bots are perceived to be.

Fig. 28: CIA4total normal distribution. The graph shows the normal distribution of the CIAbot and how well the answers are represented in relation to the mean of the CIAbot response sample. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the human-likeness as a probability, and the y-axis the number of answers.

Fig. 29: POD4total normal distribution. The graph shows the normal distribution of the PODbot and how well the answers are represented in relation to the mean of the PODbot response sample. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the human-likeness as a probability, and the y-axis the number of answers.

Figures 28 and 29 show how much the answers deviate from the normal distribution, which has not changed compared to figures 16 and 17 after the exclusion. This means the videos are most likely too different, with some videos revealing more of the bot than others; this is the case for both bots.

Fig. 30: CIA4total boxplot. The blue box shows how much of the total sample lies within the 50 percent closest to the mean of the normal distribution; the black whiskers are the parts outside this range, and the red line is the median. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games or Counter-Strike. The y-axis shows the distribution of the answers, based on the percentage of human-likeness.

Fig. 31: POD4total boxplot. The blue box shows how much of the total sample lies within the 50 percent closest to the mean of the normal distribution; the black whiskers are the parts outside this range, and the red line is the median. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games or Counter-Strike. The y-axis shows the distribution of the answers, based on the percentage of human-likeness.

Figures 30 and 31 give a precise visual presentation of how the numbers are concentrated. The CIAbot's boxplot shows that the concentration of answers is close to the middle of the whole range, and that it is well centered with respect to the median. The PODbot's boxplot, however, shows that the concentration lies in the higher part of the range in which subjects have answered.
It also shows a median that is very high compared to the range.

Fig. 32: CIA4totalTrueCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the CIAbot correctly. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Fig. 33: POD4totalTrueCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the PODbot correctly. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Figures 32 and 33 show that there is still a large deviation in the subjects' certainty about their correct answers, relative to the normal distribution and the mean. The CIAbot's histogram, though, fits the normal distribution much better than the PODbot's distribution in figure 21.

Fig. 34: CIA4totalFalseCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the CIAbot incorrectly. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Fig. 35: POD4totalFalseCertainty normal distribution. The graph shows the normal distribution of the certainty level the subjects had when they judged the PODbot incorrectly. This sample consists only of subjects with an experience of 4 or more in either First-Person Shooter games in general or Counter-Strike specifically. The x-axis shows the certainty value on a Likert scale, and the y-axis the number of answers.

Figures 34 and 35 show that the answers of the subjects judging the CIAbot almost follow the normal distribution, deviating only slightly from it, which is a much better result than in figure 22. The subjects judging the PODbot differed a great deal in their certainty, which makes those answers deviate a lot. The mean certainty of the subjects who answered incorrectly when judging whether the observed CIAbot was a bot was 2.6667, with a standard deviation of 1.1155. The mean certainty of the subjects who answered incorrectly when judging whether the observed PODbot was a bot was 2.8276, with a standard deviation of 1.2626.

6.3 Interim Summary

Throughout this chapter the test results have been presented and analyzed. There is clearly a lack of experienced participants, but the results can and will still be used to elaborate on and answer the Final Problem Statement (1). The deeper elaboration and answer to the Final Problem Statement will be given in the Discussion chapter.
7 Discussion
Before diving into the finer topics of discussion, the first part of this chapter goes through the potential biases associated with the choice of game and open-source bot material.
7.1 Choice of Game
The choice of game, as well as of version, may be questionable, considering that Counter-Strike 1.6 is a relatively old game by now, and newer games may facilitate a higher standard of state-of-the-art artificial intelligence. Either a newer version of Counter-Strike or simply another game altogether might have been preferable. As stated in the selection of the game (2.2), Counter-Strike was chosen primarily for its potential for depicting observable cooperation in ways that many other FPS games do not offer (e.g. working together to plant/defuse the bomb). Alternative games with cooperative missions of this kind do exist, and these alternatives may have been excluded prematurely, but at this point there is little to no reason to doubt the choice of Counter-Strike 1.6 and its potential to function as a platform for answering the project's Final Problem Statement.
7.2 Choice of Bot
In extension of the choice of game, it was evident from the beginning that developing an artificially intelligent bot from scratch was not a realistic goal within the time limits of this project. The choice of which bot to develop further, for the sake of fulfilling the Final Problem Statement, could have been approached more meticulously than it was. As stated in the Design chapter (3.4), the candidates we looked into are all renowned in the bot and Counter-Strike communities. The primary reason for choosing the PODbot was pragmatic: the PODbot offers freely available source code, as well as a code structure that made it relatively easy to implement new functions. While the PODbot is a renowned state-of-the-art bot, having been continuously reiterated for over a decade, another AI might have proved a better foundation for the further development this project has focused on; however, there is no apparent evidence at this point to suggest that such a change of path ought to have been made.
7.3 Test Setup
One of the larger areas of concern is the core of the test setup, in regard to its dissimilarity to the conventional setup for testing the human-like qualities of FPS game bots (5.3). As mentioned in the Test Setup chapter, the commonly used version of the Turing Test for judging human-like qualities in bots includes the judging subject as a part of the game, in a three-player free-for-all deathmatch, making him/her capable of provoking reactions in the human/bot player. This was, however, problematic to duplicate in Counter-Strike, since the game cannot have more than two teams, and including the judging subject in the game would therefore not be possible without restructuring the entire test setup. A possible interactive test setup suitable for Counter-Strike could be to have one or more subjects play on one team, with the opposing team consisting exclusively of either bots or humans. The subject(s) would then play against the human team and the bot team several times each and, based on his/her experiences, judge which team consists of human players and which consists of bots.
While it may have been possible to include the judging subject in the game with such an interactive test setup, the choice of making the subjects passive observers of the game's actions was based on the research from DTU investigating player believability (5.3). Both conceptual test setups carry potentially significant biases. A subject who is an interactive part of the game has numerous stress elements affecting his/her observation skills - e.g. trying to stay alive while in combat with an opponent. Additionally, the amount of time actually spent observing the human/bot adversary is relatively low, since observation is only possible when the player is in direct contact with either of the players. The passive-observer setup lets the subject study the behavior of the bot/human player without stress elements interfering with their judgment. However, it is questionable whether the results from such a test are as usable as those from the interactive one, since the primary purpose of the bot is to appear human-like to the human player playing with or against it, and not necessarily to an observer. It is possible that an observer may catch inhumane behavioral traits of the bot which a person playing with or against it simply would not notice. It is difficult to tell how the results may have differed if an interactive solution (such as the test setup of the BotPrize competition, 5.3) had been implemented instead of an observational test. It is, however, highly probable that the premises of the observational and interactive test setups are so different that two tests made from the two setups are not directly comparable. This opens up the possibility of testing the bot's human-like quality with a combination of the two tests, in order to establish whether any major difference in its human-like quality occurs - i.e. whether certain inhumane traits of the bot are visible in one test but not in the other. However, since there is little to no reason to distrust the observational setup used for testing the bot's human-like quality, the discussion will continue under the assumption of an acceptable test setup.
7.4 Test Results
A major point of concern in the test results is the insufficient amount of answers to most of the test videos (6). Apart from simply keeping the surveys open for a longer period of time, a simple solution could have been to reduce the number of videos. Most test subjects chose to end their survey prematurely, as was their right, which resulted in the answers being divided across the majority of test videos; fewer videos would have resulted in a higher average number of answers per video. Doing so would naturally come at a cost to validity, since a balance is necessary between the number of answers per video and the number of different videos: if too few videos are included, it becomes increasingly difficult to determine anything in broad terms, and the test turns into an investigation of a small set of cases. However, with 20 videos for each of the PODbot and the CIAbot (including the 5 videos of the PODbot that had to be removed due to editing errors), it seems plausible that around 5 videos of each bot could have been removed - roughly equating to a total of 40 min. of time cut from the survey.
The results from subjects with an experience rating of 2 or higher (6.1) showed that the human-likeness of the CIAbot was 23.11% with a standard deviation of 6.77%, while the PODbot had a human-likeness of 26.91% with a standard deviation of 8.84%. Comparing the two means, the human-likeness of the PODbot was clearly better than that of the CIAbot. To check how much the CIAbot deviated from the PODbot, two t-tests were conducted. The first was a test against a hypothesized mean, in which the CIAbot samples were tested against the mean of the PODbot. The null hypothesis was rejected, which gave a greater incentive to assume that the bot had become worse after the implementation. A second t-test was therefore conducted: a paired-sample t-test, in which the CIAbot results were compared with the PODbot results. The null hypothesis of this t-test was also rejected. According to the subjects with an experience rating of 2 or higher, the CIAbot was thus worse than the PODbot. A further filtering of the subjects was therefore made, to see whether it was simply a pattern among the least experienced subjects that skewed the results. Only the subjects with an experience rating of 4 or higher (6.2) were kept in the data. New mean values were computed for the answers to both the PODbot and the CIAbot, which produced strikingly different results. The more experienced subjects were better at distinguishing between humans and bots, which led to sharply decreased human-likeness percentages. The CIAbot's human-likeness decreased to only 18.18%, while its standard deviation increased to 8.08%. The PODbot's human-likeness was 18.83%, and its standard deviation decreased to 7.88%. The new means and standard deviations formed the basis of new t-tests, again using the filtered samples and the new means. These yielded the striking result that the null hypothesis could not be rejected in either of the two t-tests. A new question was therefore raised: was it at all possible to distinguish between the two bots with the chosen test setup?
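For reference, the two tests described above presumably take the standard forms of a one-sample t-test against a hypothesized mean and a paired-sample t-test; the following is a sketch in our own notation, not an excerpt of the report's calculations:
\[
t_{\mathrm{one\text{-}sample}} = \frac{\bar{x}_{\mathrm{CIA}} - \mu_{\mathrm{POD}}}{s_{\mathrm{CIA}}/\sqrt{n}},
\qquad
t_{\mathrm{paired}} = \frac{\bar{d}}{s_d/\sqrt{n}}, \quad d_i = x_{\mathrm{CIA},i} - x_{\mathrm{POD},i},
\]
where \(\mu_{\mathrm{POD}}\) is the PODbot mean used as the hypothesized mean, \(\bar{d}\) and \(s_d\) are the mean and standard deviation of the per-video differences, and \(n\) is the sample size. The null hypothesis (no difference in human-likeness) is rejected when \(|t|\) exceeds the critical value of the t-distribution with \(n-1\) degrees of freedom at the chosen significance level.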
The issue of insufficient answers per video carries over into the validity of the human-like quality measured for each video (15). Whereas a few of the videos in the Experience >= 2 test results (12) had a statistically tolerable number of answers, the majority - virtually every video - in the Experience >= 4 test results (27) had too few answers to produce anything conclusive for either of the bots. This can also be seen in the attempted normal distributions of the videos' human-like quality (16, 17, 28 and 29), which clearly show that neither fits neatly into a bell curve. While this is probably due to an insufficient number of answers per video, it may also be due to occurrences in certain videos; after all, the normal distribution is fitted over the calculated human-like quality of the PODbot and CIAbot videos, and not over the results of the respective bots per se. Such occurrences can stem either from a poorly chosen case on the editing side, or from faulty assumptions made by the survey subjects. An example of such a faulty assumption can be seen in the reasoning behind one subject's answer to a test video, wherein the subject rules out the Counter-Terrorist player being a bot simply because the player missed his grenade throw: "CT missed the grenade throw - don't think a bot would." - Appendix 10 - 13, Form Responses, EQ8. Other examples include:
• "The terrorist is a player. A bot wouldn't try to wallbang."(11) - Appendix 10 - 13, Form Responses, W3
• "the tells are: Counter-Terrorist walk and duck bots never do that, Counter-Terrorist moves into position according the the info he gets from sound.. bots never do that" - Appendix 10 - 13, Form Responses, AI27
• "The BOT doesn't react on sounds." - Appendix 10 - 14, Form Responses, S12
(11) Wallbang - to shoot through walls after hearing/seeing an enemy.
All of the above quoted assumptions are categorically wrong, and all of the corresponding answers to the videos they refer to were wrong. Not one of the cited subjects has a self-proclaimed FPS game experience below 5 on the 1-6 scale, or a Counter-Strike experience below 4. Such faulty assumptions are likely to have skewed the human-like quality results of the test videos. Apart from the false assumptions about what constitutes bot-like behavior, quite a few subjects were extraordinarily good at catching actual bot-like and human-like traits in both the PODbot and the CIAbot, with examples such as:
• "Botlike movement. Instant aimchange after getting hit. Backpeddle into corner camping spot." - Appendix 10 - 13, Form Responses, K20
• "The Terrorist uses sniper strategies (at the Terrorist spawn) that a bot wouldn't. Furthermore the Terrorist know exactly where to position himself to look around corners and position himself at high grounds when in fire combat." - Appendix 10 - 13, Form Responses, W22
• "Looks at the wall during firefight whilst behind create. Terrorist player much more rapid movement/peeking/walking" - Appendix 10 - 13, Form Responses, AE2
Although the majority of subjects managed to pick out the bot player in the videos, navigation traits seem to have been the last thing on the subjects' minds when observing the behavioral patterns of the bots. Other traits, such as orientation, reaction time and movement style, dominated the elaborate reasoning behind the subjects' answers in the surveys, while not a single comment was submitted regarding the navigation of the bot/human player. It is possible that the point of view ought to have been an aerial perspective instead of a locked third-person view of the individual bot/human player. Showing the recording of a game round in this way would remove many of the factors that the subjects were shown to focus on when looking for bot-like or human-like traits. Having the subjects judge the human-like quality of the bots in such a manner, however, seems questionable for several reasons. One reason is that while the subjects may be experienced in playing FPS games, this is unlikely to equate to experience in observing the course of an FPS game from an aerial perspective. Furthermore, even if a bot appears very human-like from such a meta-game perspective, the perspective from which the bot is ultimately supposed to appear human-like is that of the human players playing with or against it, so testing from that perspective would still be a necessity. Taking the aforementioned points into consideration, there is a need to fix the multiple behavioral traits that give the bot away as an inhumane entity before a behavioral pattern such as navigation can be put to the test. Alternatively, an aerial perspective on the navigation patterns could be used instead of regular in-game footage.
What constitutes human-like behavior can in many respects be considered a subjective matter, and as the comments in the test results go to show, both right and wrong assumptions were made as to which behaviors fall under the umbrella of human-like behavioral patterns and which fall under that of a bot's.
8 Conclusion
Will interactive communication between FPS game bots, in order to establish dynamic combat tactics, advance the development towards a completely human-like bot?
With regard to the human-like quality of an entity, the answer can be split in two: abilities and appearance. The ability to communicate interactively and cooperate in a dynamic manner is considered an important aspect of human-like behavior; however, the purpose of developing a bot is not necessarily for it to possess human-like abilities, but rather to appear as if it does. In order to test the Final Problem Statement, the source code of a renowned state-of-the-art bot for Counter-Strike 1.6 was developed further, giving the bots on a team the ability to communicate any visual confirmation of an enemy (or the lack thereof), in order to deduce estimates of the presence of all of the opponents. This knowledge was used to pursue or avoid enemies depending on a higher goal of the bot, all for the sake of creating a closer human-like simulation than the state of the art. Despite gathering hundreds of evaluations of the human-like appearance of the developed FPS game bot, the CIAbot, the answer to the problem statement remains tentative at best. Although the responses collected through the surveys were high in quantity, they were spread too thinly over too many cases, making most of the individual evaluations questionable on their own. Only a few videos of the CIAbot and of its source-code origin (the PODbot) received a valid sample size, and judging the bots' human-like qualities from so few videos would hardly yield a conclusive answer to the problem statement. However, through the elaborate reasoning behind the subjects' choices, two important points became abundantly clear. First, a large number of experienced players had misconceptions as to what identifies a human player and what identifies a bot. This led the test subjects to base their choice (human or bot?) on traits that were not affected by the project's modifications. Even more striking was the fact that not a single subject who voiced the reasoning behind his/her choice mentioned tactical navigation as a determining factor. This raises the question of whether such a feature is noticeable to the human player at all. There is, however, no way of reaching a final answer to that question before the numerous flaws and shortcomings in the bot's behavioral patterns have been improved - and its navigation is more or less all that is left to doubt with regard to human-like behavior. A different approach that could have been taken, in order to find a more conclusive answer to the problem statement, would be to make the bots communicate factors other than enemy positioning (or the lack thereof) and use that information to affect behavioral traits other than tactical navigation. However, the research in this project did not get far enough to produce an answer to such a hypothesis.
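To make the communication mechanism summarized above concrete, the following is a minimal, self-contained sketch of the idea - shared sighting reports feeding a pursue/avoid decision driven by a higher goal. All names, structures and thresholds here are hypothetical illustrations; the actual implementation lives in the CIAbot source (Appendix 7) and is built on the PODbot code base rather than on this standalone example.

// Hypothetical sketch: team-shared enemy sightings driving a pursue/avoid choice.
// Not the actual CIAbot/PODbot code - names and thresholds are illustrative only.
#include <cstdio>
#include <map>

struct SightingReport {
    int   reporterId;      // which teammate sent the report
    int   enemyId;         // which opponent the report concerns
    bool  confirmedVisual; // true = enemy seen, false = area checked and clear
    float gameTime;        // timestamp of the observation
};

class TeamKnowledge {
public:
    // Every bot on the team calls this when it gains (or fails to gain)
    // visual confirmation, so all teammates share one picture of the round.
    void Share(const SightingReport& r) { latest_[r.enemyId] = r; }

    // Rough estimate of how many opponents are still believed to be a threat,
    // based on the most recent report per enemy within a short memory window.
    int EstimatedThreats(float now, float memorySeconds) const {
        int threats = 0;
        for (const auto& kv : latest_) {
            const SightingReport& r = kv.second;
            if (r.confirmedVisual && now - r.gameTime < memorySeconds)
                ++threats;
        }
        return threats;
    }

private:
    std::map<int, SightingReport> latest_; // enemyId -> most recent report
};

enum class HigherGoal { CompleteObjective, EliminateEnemies };
enum class Decision   { Pursue, Avoid };

// The bot weighs the shared picture of the round against its higher goal:
// if the objective matters more and the confirmed threats outnumber the
// remaining teammates, it avoids contact; otherwise it pursues.
Decision ChooseAction(const TeamKnowledge& team, HigherGoal goal,
                      int teammatesAlive, float now) {
    const int threats = team.EstimatedThreats(now, /*memorySeconds=*/10.0f);
    if (goal == HigherGoal::CompleteObjective && threats >= teammatesAlive)
        return Decision::Avoid;
    return Decision::Pursue;
}

int main() {
    TeamKnowledge team;
    team.Share({/*reporter*/1, /*enemy*/7, /*seen*/true,  /*time*/42.0f});
    team.Share({/*reporter*/2, /*enemy*/8, /*seen*/false, /*time*/43.5f});

    Decision d = ChooseAction(team, HigherGoal::CompleteObjective,
                              /*teammatesAlive=*/1, /*now=*/44.0f);
    std::printf("decision: %s\n", d == Decision::Pursue ? "pursue" : "avoid");
    return 0;
}

In the real bot the equivalent of Share() would be tied to the radio/communication layer and the pursue/avoid choice to the waypoint weighting, but the division of responsibilities shown here reflects the mechanism described in the conclusion.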
9 Future Perspectives
Having only touched upon the surface of artificial intelligence in FPS games, there are numerous fields to incorporate into the development of a human-like bot. Looking at the comments from the subjects, there are more apparent give-aways in the behaviour seen in the videos. These correspond to features (or the lack thereof) that should be improved, redesigned or implemented. Once the development of the human-like bot reaches a stage where every feature that might act as an obvious bot-like tell in the test has been addressed and improved, it might be possible to create a truly human-like bot. As for the specific features, it would be very beneficial to design a method by which traits can be identified, classified and tested for effectiveness. Implementing new features, or making changes to existing ones, is problematic if the outcome cannot be isolated for evaluation.
List of Figures
1 How supervised learning works
2 How unsupervised learning works
3 How reinforcement learning works
4 Unreal Tournament 2004
5 Counter-Strike 1.6
6 Quake 3
7 Editor of the TeamBot
8 Objectives tree of the conceptually designed bot
9 Header of the method BotCollectExperienceData()
10 Dynamic distribution of waypoint weights at run-time
11 Header of the method UpdateVisibleWPThreat()
12 Amount of answers per video (1-20) for CIAbot and PODbot
13 Amount of true and false answers
14 Amount of true and false answers
15 Percentage human-likeness per video
16 CIA2total normal distribution
17 POD2total normal distribution
18 CIA2total boxplot
19 POD2total boxplot
20 CIA2totalTrueCertainty normal distribution
21 POD2totalTrueCertainty normal distribution
22 CIA2totalFalseCertainty normal distribution
23 POD2totalFalseCertainty normal distribution
24 Amount of answers per video (1-20) for CIAbot and PODbot
25 Amount of the CIAbot's true and false answers
26 Amount of the PODbot's true and false answers
27 Percentage human-likeness per video
28 CIA4total normal distribution
29 POD4total normal distribution
30 CIA4total boxplot
31 POD4total boxplot
32 CIA4totalTrueCertainty normal distribution
33 POD4totalTrueCertainty normal distribution
34 CIA4totalFalseCertainty normal distribution
35 POD4totalFalseCertainty normal distribution
10 Appendix
1. Tactical Elements: CD/Appendix/Tactical Elements.docx
2. Further Development Of PODbot: CD/Appendix/Tactical Elements.docx
3. Teambot Readme: CD/Appendix/TEAMBot Readme.doc
4. Realbot Readme: CD/Appendix/RBPM104readme.txt
5. Decision Tree: CD/Appendix/Decision Tree.jpg
6. PODbot Code: CD/Appendix/PODbotSource/
7. CIAbot Code: CD/Appendix/CIAbotSource/
8. CIAbot dll: CD/Appendix/CIAbotSource/dlls/
9. Survey: CD/Appendix/AI survey - Google Analyse.pdf
10. Results2: CD/Appendix/Results/Survey Results Experience greaterorequalto 4.xlsx
11. Results4: CD/Appendix/Results/Survey Results Experience greaterorequalto 2.xlsx
12. Survey Response 1: CD/Appendix/Results/BOT-HUMAN responses (1).xlsx
13. Survey Response 2: CD/Appendix/Results/BOT-HUMAN responses (2).xlsx
14. Survey Response 3: CD/Appendix/Results/BOT-HUMAN responses (3).xlsx
15. Survey Response 4: CD/Appendix/Results/BOT-HUMAN responses (4).xlsx
16. Realbot To Do List: CD/Appendix/RealBotToDo.docx
17. Teambot To Do List: CD/Appendix/TeamBotToDo.txt