Audio Mostly 2006 PROCEEDINGS - A CONFERENCE ON SOUND IN GAMES - OCTOBER 11-12

Proceedings of the Audio Mostly Conference - a Conference on Sound in Games
October 11-12, 2006, Piteå, Sweden
In collaboration with:

Contents

Music Videogames: the inception, progression and future of the music videogame - Lyall Williams
Computer Game Audio: The Unappreciated Scholar of the Half-Life Generation - Stuart Cunningham
Authoring of 3D virtual auditory Environments - Niklas Roeber, Eva C. Deutschmann and Maic Masuch
From Heartland Values to Killing Prostitutes: An Overview of Sound in the Video Game Grand Theft Auto Liberty City Stories - Juan M. Garcia
Physically based sonic interaction synthesis for computer games - Stefania Serafin and Rolf Nordahl
The Composition-Instrument: musical emergence and interaction - Norbert Herber
Investigating the effects of music on emotions in games - Katarina Kiegler and David C Moffat
REMUPP – a tool for investigating musical narrative functions - Johnny Wingstedt
On the Functional Aspects of Computer Game Audio - Kristine Joergensen
Composition and Arrangement Techniques for Music in Interactive Immersive Environments - Axel Berndt, Knut Hartmann, Niklas Roeber and Maic Masuch
The drum pants - Soeren Holme Hansen and Alexander Refsum Jensenius
Backseat Playground - John Bichard, Liselott Brunnberg, Marco Combetto, Anton Gustafsson and Oskar Juhlin

Audio Mostly 2006

Music videogames: the inception, progression and future of the music videogame
Lyall Williams, Keele University, UK
[email protected]

Abstract. Over the last 10 years, the genre of the music videogame (or rhythm game) has become a staple of home videogame consoles, and has developed a strong presence in the arcade. In these games, the player is actively involved in the creation or playback of music or rhythm. The purpose of this poster is to firstly describe the genre of the music videogame, contrasting it with that of the audio game (in which the player needs little or no visual feedback to play, originally developed for blind or visually impaired players). I shall consider the origins and early titles, and then outline some important contemporary works, paying specific attention to the sonic and visual aesthetics developed within each title, and how they contribute to the genre as a whole. Among the titles I will consider are "Parappa the Rapper", widely considered to be the first true music videogame; the phenomenon of the "Bemani" series, which brought music games to the arcade and introduced the first custom-designed controller inputs; and "Donkey Konga", which came with a pair of bongos for Nintendo's home console, the Gamecube. I will also consider the potential for high levels of interactivity between the player and the music/rhythm in these titles, whilst noting how most current titles do not develop this potential as fully as they might; despite this overall shortcoming, I shall examine one game, Rez, that challenges the music videogame formula and serves as an example of how multiple levels of audiovisual and physical interactivity can work together to create an immersive and original game experience.
1 What is a music videogame?

Music videogames are audiovisual games1 in which the player is actively involved in the creation or playback of music or rhythm (music videogames are also sometimes known as rhythm games). This usually occurs in the form of keypress instructions scrolling across the screen, which the player must respond to with appropriate timing. Most music videogames relay instructions in one of two manners: either as a constant stream of instructions that must be pressed in order, or in turn-based volleys of instructions. It could be deduced from this that all that constitutes a music videogame is following rhythmic instructions;2 not all music videogames follow this strict type of gameplay, however, as I shall discuss later.

2 Origins and early titles

Music videogames should not be confused with audio games, a genre which started out as games for the blind.3 In contrast to music videogames, audio games have little or no visual information, and rely entirely on gameplay that connects sound with control input. They are often given away as freeware and are usually quite simplistic, although some of the more ambitious projects have included a game modification to let the player play id software's4 fast-paced first-person shooter Quake using audio alone (through an artificial-voice instruction system and various 3D audio positioning effects),5 and an audio version of the block-puzzle game Tetris with a complex system of notes signifying moving blocks.6 Some audio games overlap with music videogames in their gameplay - Sonic Match (for PC),7 for instance, plays distinct sounds and matches them with arrows on the keyboard, then tests the player by playing back one of these sounds and awaiting the correct keypress. If the player presses the right button, another sound is played which must also be matched with a correct keypress, and as the game progresses the length of time the player has to respond to each new sound gradually decreases. This turn-based sound gameplay bears similarities to turn-based music videogames, although it is much more simplistic than any recent commercial music videogame title.

The origin of the music videogame is difficult to pinpoint. Perhaps the earliest game to be oriented entirely around music was Grover's Music Maker8 from 1983 (for the Atari 2600), which features Grover from the Muppet Show dancing to primitive renditions of popular songs like "Old Macdonald Had a Farm". There was no interaction on the part of the player, however, so it cannot be considered a music game in the sense I have described. While not technically a "video" game, the late-70s electronic toy "Simon" had a large round plastic base quartered into 4 brightly coloured buttons which the game would light up in sequences of gradually increasing length; the player needed to match the sequence each time to progress. The "music" in this case was varying beeps, which were very basic owing to the early sound technology used. This style of game transferred over to some computer games, in particular Break Dance for the Commodore 64, in which the player had to follow the increasingly lengthy dance moves of an on-screen character by pressing the appropriate keys, in a turn-based fashion. A similar gameplay feature can be found in some parts of Pinocchio for the Megadrive and SNES (1995/6 respectively). So, musically influenced gameplay is nothing new.

1 Here I mean games which require both audio and video to be playable, or games which largely lose meaning when audio or video are not present. By this definition many games are audiovisual, and almost all recent games are audiovisual.
2 Wolf tightly defines music videogames as "games in which gameplay requires players to keep time with a musical rhythm", p130. I dislike this description as it implies no creative input on the part of the player, which I feel certainly should be a part of music games, even if it isn't yet in most cases.
3 http://www.audiogames.net/
4 "id" is intentionally left lower case.
5 http://www.agrip.org.uk/FrontPage
6 http://inspiredcode.net/Metris.htm
7 http://www.bscgames.com/sonicmatch.asp
8 The title was never released, and only exists in prototype form.

The game that is usually credited with being the first true music videogame, however, is Parappa the Rapper. In the next section I shall consider the sonic and visual aesthetics of this game and several other important titles in the music videogame genre.

3 Sonic and visual aesthetics in music videogames

Parappa the Rapper, released for the Playstation in 1996, was designed by Masaya Matsuura (who had previously been in a band known for their progressive electronic music). The titular character is a talking, paper-thin cartoon dog who is taught to rap by 8 bizarre sensei, including a kung-fu master onion, in order to win the love of a sunflower. The gameplay followed a fairly simple theme – each sensei "raps" a line, represented by various controller buttons (square, triangle, circle, cross, left and right) indicated at the top of the screen, and you must repeat their commands by pressing the appropriate buttons; as play progresses the button combinations get faster and more complicated. Failure to respond correctly to the instructions results not only in a poor score, but in the background scenery falling apart and distinct changes in the music to indicate the player's failure. If the score drops too low, the player must repeat the level from the start. It was an important early title for the Playstation, setting it apart from the previous generation of consoles in graphics, sound, gameplay, and storyline.10

A sequel to Parappa the Rapper, "UmJammer Lammy" (sic), was released, starring one of the cast of the previous game (the titular Lammy); in this game the focus was shifted from rapping to guitar riffing, although most gameplay aspects were identical. By this point several other music videogames were becoming popular, primarily in the arcades: beatmania11 and its various spin-offs by developer Konami (the series and many related games are generally referred to as "Bemani", after the division within Konami that creates them).12 All beatmania arcade games feature custom controllers, with a spinning turntable for "scratching" and several black and white piano-like keys; commands descend from the top of the screen in a constant stream and must be pressed in the correct order at the correct time. The keys pressed by the player are displayed at the bottom of the display. Correctly timing the controls so that the relevant keypress coincides with the relevant note hitting the bottom bar is not just an important aspect of gameplay (greater accuracy results in a higher score), but also a crucial aspect of musical delivery – when the right key is pressed at the right moment, extra instruments, samples and effects are added to the background music. Among the first music games to feature a custom controller, beatmania was also responsible for bringing music games to the arcades, where they remain hugely popular to this day in Japan, inspiring people to develop breathtaking levels of skill.
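The timing mechanic described above lends itself to a simple illustration. The sketch below is not Konami's actual scoring code; it is a minimal, hypothetical example of judging a keypress against fixed timing windows, with window sizes and point values invented purely for illustration.

```python
# Illustrative sketch only (not the beatmania algorithm): score one keypress by
# how far it lands from the moment the note reaches the bottom bar.

def judge_hit(press_time_ms: float, target_time_ms: float):
    """Return a judgement label and a score for a single keypress."""
    error = abs(press_time_ms - target_time_ms)
    # Hypothetical windows in milliseconds; a real game would tune these per difficulty.
    if error <= 30:
        return "GREAT", 100
    elif error <= 80:
        return "GOOD", 50
    elif error <= 150:
        return "BAD", 10
    return "MISS", 0   # too far off: no extra instruments are layered into the music

# Example: a press 45 ms after the note reaches the bottom bar.
print(judge_hit(press_time_ms=10_045, target_time_ms=10_000))   # ('GOOD', 50)
```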
Unlike the happy, bright pop music found in Parappa and UmJammer Lammy, the beatmania series tends to feature serious "real" music, with commercial dance music (trance, rave etc) being the most popular. That said, the series is so lengthy (one title in the series, beatmania IIDX, has up to 24 "mixes" or variations of the musical selection)13 that many types of music have now been included. One reason the games have stayed popular, and in turn been so prolific, is the ease with which the arcade machine can be rejuvenated if takings drop: simply install a new "mix" with a new selection of songs, and gamers have renewed interest in play.14

The visual aesthetic in Parappa was unlike any seen in a videogame previously, and borrowed from Western Nickelodeon-style cartoons as much as Japanese anime (one of the key visual designers, Rodney Greenblat, was an American artist). By comparison, the music is varied, but fairly unchallenging: it certainly isn't what most people would consider "rap" music in the West. Like the visuals, though, the music is bright and comedic. The game was a huge success in Japan, despite (or perhaps as a result of) the lyrics of the songs being in English with Japanese subtitles; English-language releases of the game had the same audio, and though it's unclear who wrote the lyrics, calling them unusual would be charitable – at times they're utterly nonsensical. This may well be intentional, rather than a mere mishap of translation, since it suits the odd visual aesthetic. The game did not sell as well in the West, and I would suggest this is due to it confusing a large portion of the "young adult" audience that Sony had targeted with the Playstation: a rap game with no rappers or even stereotypical rap music in it, which looks like a children's cartoon.9 Parappa the Rapper is an important title in videogames as a whole, as it represents a postmodernist sampling of varied cultural aesthetics (both with the game's multi-cultural visual aesthetic and its varied musical selection), and a refusal to aim for "high-brow" serious entertainment; it also fits into the postmodernist paradigm by clearly showing self-awareness and acceptance of its nature as a videogame, by eschewing any sense of realism in favour of the absurd.

Visually, the beatmania games also clearly aim to be more mature than Parappa or UmJammer Lammy, typically featuring swirling 3D patterns in time with the music. These sections of video are superfluous to gameplay and offer no explicit benefit to the player, but contribute to the overall beatmania "experience" (Bemani games are often loathed by their detractors as much as they are loved by their fans, due to the extremely noisy and dominating nature of their arcade cabinets). In the West, there are far fewer arcades than in Japan, and it is perhaps as a result of this that beatmania has not attained a similar level of popularity. One Bemani offshoot that has eventually achieved awareness, however, is Dance Dance Revolution (or DDR).15
DDR differs from beatmania in a few respects: the player's feet on sensor pads are the control method, rather than hands on keys, and there are also only 4 directions (up, down, left, and right) for the player to press, as opposed to the 7 keys + turntable that some beatmania games include. DDR machines almost always have floor pads set up for two players (taking advantage of this, some single-player DDR routines require the player to dance across from the first player position to the second player's floor pads). DDR is also more forgiving of player errors than beatmania games. Western DDR machines typically feature pop-trance, dance anthems etc – reasonably similar in genre to Japanese DDR machines, but often including popular Western dance music tracks. Visually, DDR and beatmania machines look similar, with large amounts of neon and flashing lights. DDR gained Western media recognition in 2004, when it became clear that people who played the game regularly were losing considerable amounts of weight,16 and the game has since been considered by several American schools for fitness classes.17 Demand for home versions of both beatmania and DDR has obviously been high, and Konami and other hardware manufacturers have been happy to provide (often very costly) custom controllers for the home,18 and to release many home versions of Bemani games.19

9 In contrast, cartoon animation and comic books are popular with all ages in Japan, with Manga (comics) accounting for 40% of all printed books and magazines (http://library.thinkquest.org/C0115441/article1.htm)
10 Having a simplistic love story as a videogame plot is particularly rare, as noted by Kohler, p153-4.
11 The "b" is deliberately left in lower case: http://en.wikipedia.org/wiki/beatmania
12 http://en.wikipedia.org/wiki/Bemani
13 http://en.wikipedia.org/wiki/Beatmania_IIDX
14 Kohler, p155
15 http://www.konami.co.jp/am/ddr/

Konami are not alone in releasing custom home controllers for music videogames. In 2004, Nintendo released Donkey Konga for the Gamecube, a music title that is intended to be controlled with a pair of specially made bongos. The gameplay is somewhat similar to beatmania, although all commands travel down a single line one at a time, and there are only four commands: left bongo (yellow), right bongo (red), both bongos together (purple), and clap (blue spark);20 as a result the game is easier to learn than beatmania. As the player plays the bongos, drum samples are added to the soundtrack and various animations occur on screen. Donkey Konga is a far less serious affair than the Bemani games I have considered above. The game is amusing to watch and play, with colourful graphics and a light-hearted selection of music.

Unlike Parappa the Rapper, the songs included in Donkey Konga are not only different in Japan and the West, but also between the USA and Europe.21 The majority of music in the Japanese game is, unsurprisingly, Japanese music. The differences in musical choice between the USA and Europe games, however, are more interesting: the "European" release contains an English version of Nena's "99 Red Balloons" (originally released in German), and 2 Latin American tracks, but nothing that's clearly from mainland Europe; instead, several British bands are featured (Supergrass, Queen, Jamiroquai, among others) as well as many American tracks. This British/American cultural bias is likely to be Nintendo's way of avoiding the costs involved with producing relevant localisations across Europe. The US selection includes a number of children's songs (Happy Birthday to You, Itsy Bitsy Spider, She'll Be Coming 'Round the Mountain etc), possibly representing a difference in the perceived demographic of US and European Gamecube owners by Nintendo. The European version of Donkey Konga also includes more covers of classic Nintendo theme tunes (Super Mario Bros theme, Legend of Zelda theme, Donkey Kong Country theme etc, all done in a "conga" style). It could be interpreted from this that Nintendo consider European Gamecube owners more likely to be familiar with their back catalogue of games; however, this seems unlikely, given that the majority of the titles in question were released during the 80s and early 90s, when Britain and Europe were Sega strongholds.22 In light of this, I think it is more likely that these tracks were not included in the US release to make way for the children's songs, or other more US-compatible songs.23 Interestingly, all three releases have the same two classical tracks: Brahms' Hungarian Dance no.5 in G Minor, and Mozart's Turkish March.

4 Criticism of the genre

The games I have looked at above constitute a fair cross-section of the music videogame genre, from its inception to some of the more recent titles. I do not have space to look at the many other important titles in detail, nor would such an investigation probably yield much more productive results: I have deliberately chosen as wide a musical cross-section as possible in the titles I have discussed, and the gameplay of most music titles is very similar. This latter point brings me to perhaps my biggest criticism of the genre – the gameplay is often more or less identical.

Many music videogames feature striking visuals, usually paired with a gimmicky controller of some kind, and it could be argued that these are trying to make up for the fact that, behind all the gloss, the essential gameplay isn't very far removed from the Simon electrical toys of the late 70s. The level of interaction between the player and the music, too, is sometimes lacking, with the player often merely keeping up with the music rather than actively being involved in its creation. While the music and sound in Parappa fluctuate depending on player ability, in Donkey Konga the game barely reacts to the player's actions; music videogames would benefit from an increased level of player involvement in the music.

One game that has tried to experiment beyond the usual confines of "strings of instructions" gameplay, and made significant progress in the way the player interacts with the music, is Rez, originally released for the Sega Dreamcast. The gameplay of Rez is radically different from all the music videogames I have described above: it belongs to a group of games called "on-rails shooters", where the player usually cannot directly move their character but can move a crosshair of some kind in order to target oncoming enemies and defend themselves. What makes Rez a music videogame is that every action (locking on, firing, explosions) results in distinct beats which the game places (as best it can) in time with the music. Skilled players can learn to play the game in time with its music, and in this way playing Rez (like many other music videogames) can often be as much a public performance as a game in its own right.
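The way Rez "places" action sounds in time with the music, as described above, amounts to quantising each triggered sound to the next rhythmic subdivision. The sketch below illustrates the idea only; the tempo, subdivision and function names are assumptions for the example, not details taken from the game.

```python
import math

# Minimal sketch of beat quantisation: a player action is scheduled onto the next
# musical subdivision instead of sounding immediately. Values are illustrative.
BPM = 128                           # assumed track tempo
SUBDIVISION = 4                     # sixteenth notes: 4 slots per beat
SLOT = 60.0 / BPM / SUBDIVISION     # length of one rhythmic slot, in seconds

def quantise(action_time: float) -> float:
    """Return the time at which the sound for an action should actually be played."""
    return math.ceil(action_time / SLOT) * SLOT

# A shot fired at t = 3.41 s is delayed to the next sixteenth-note boundary.
print(round(quantise(3.41), 3))     # 3.516
```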
The difference between Rez and many other music games is that it is a pleasure not just to watch, but also to listen to someone who knows how to play Rez really well, and in this respect Rez comes far closer to realising the potential of a music videogame – that of musical creation, rather than repetition.

Rez has a fairly unique visual style. There have been other games which provide vaguely similar graphical effects for game consoles, although these have generally been either user-controllable visualisations (like Baby Universe for the Playstation, which allows players to insert their own audio CDs and play around with visual controls on-screen, and Jeff Minter's recent Neon Virtual Light Machine for the Xbox 360), or have not permitted the player to interact directly with the sound (such as N20 and Internal Section, both for Playstation). The soundtrack itself is unusual for a videogame, if not particularly ground-breaking – the ambient trance tunes are well executed, and are the product of several respectable artists,24 but they do not break much new sonic ground, and at times feel sparse (perhaps necessarily so, to leave room for the player to create their own sounds over the top). Instead, it is the level of interaction between the player, the graphics, and the sound that makes Rez such a remarkable game - pulses of sound are represented with oscilloscope-esque visual pulses in the polygon formations onscreen, and player controls directly result in music and rhythm.

Despite this innovation, Rez falls under my other concern about music videogames, which is less of a criticism of the genre itself, and more a problem for developers: players often have quite specific music preferences, and more problematically, there are specific musical styles which they actively dislike. In designing music videogames, developers are confronted with several options: select a specific genre and stick to it, resulting in a smaller user base with a high level of interest (the fanatics of beatmania and DDR are testament to this); create pop-influenced "comfortable" music, and risk alienating people who dislike such music (Parappa's bouncy, happy music was not universally appreciated); or try and shoehorn every type of genre in for good measure. The latter approach seems like the most obvious, but it will only work with certain types of music videogame: I got quite aggravated at having to play bongos in time with Take That songs in order to make progress in Donkey Konga. I don't think there is an easy answer to this problem; it must be tackled on a per-game basis – clearly, though, there is room in the market for all kinds of music videogames.

16 http://www.getupmove.com/media/cnn.pdf
17 http://news.bbc.co.uk/2/hi/technology/4653434.stm
18 Such as this $479 DDR dance pad: http://www.amazon.com/gp/product/9756097027/qid=1147011582/sr=1-1/ref=sr_1_1/002-77373808122452?s=videogames&v=glance&n=468642
19 There are 12 home releases of DDR alone: http://en.wikipedia.org/wiki/Dance_dance_revolution#Home_releases
20 A small microphone is included in the bongo unit.
21 Lists can be found at http://uk.cube.ign.com/articles/455/455683p1.html for the Japanese version, and http://en.wikipedia.org/wiki/Donkey_Konga for the US and Europe versions.
22 http://www.sega-16.com/Genesis-%20A%20New%20Beginning.htm
23 Such as the American folk song "I've Been Working on the Railroad" or Motown tracks like Diana Ross and The Supremes' "You Can't Hurry Love".
5 Conclusion In this paper, I have examined a few key music videogame titles, all of which exhibit great visual and sonic flair; however, since their true beginnings in the mid 90s the gameplay formula has become rigid and little evolution has taken place. To reach their full potential, music videogames must be developed to include greater interaction on the part of the player, and new modes of play must be designed to accommodate this. References Kohler, C. Power Up: How Japanese Video Games Gave the World an Extra Life. Bradygames, 2004 Wolf, J.P. The Medium of the Videogame. University of Texas Press, 2001 24 Such as Adam Freelander and Coldcut: http://www.sonicteam.com/rez/e/sounds/index.html 8 Computer Game Audio: The Unappreciated Scholar of the Half-Life Generation Stuart Cunningham, Vic Grout & Richard Hebblewhite Centre for Applied Internet Research (CAIR), University of Wales NEWI, Plas Coch Campus, Mold Road, Wrexham, LL11 2AW, North Wales, UK {s.cunningham | v.grout | r.hebblewhite}@newi.ac.uk Abstract. Audio has been present in computer games from the original sinusoidal beeps in Pong to the Grand Theft Auto soundtracks recorded by world-famous musical artists. Far from being an overemphasis, the word “soundtrack” is highly appropriate to the role that audio in games has played up until now. It sits comfortably-and as an equal-alongside Computer Graphics, Artificial Intelligence, online multiplayer gaming and new interactive environments as one of the main driving forces in both technology development and the acceptance of gaming as a core social activity. In this paper we provide a historic synopsis of the integration of audio in games and attempt to establish if the auditory field has advanced the diversity of games and driven the market to the same extent as its visual counterpart - computer graphics. From this perspective, we discuss possible reasons for gaming trends and propose how a new generation of computer games could be driven by enhanced aural stimulation and/or excitement, the potential for which has not yet been realised. In particular, we consider how developments in soundtracks and other audio material, along with innovative interfaces, can make games and gaming more accessible to those with various disabilities, in particular, limited vision. by the dazzling visuals of a product than by spending time interacting with it. Perhaps we are judging the book by its cover. To this end, games are traditionally graphically oriented, and it is often recognised that the audio factors in games tend to act as background fillers [1, 2]. However, we believe that the diversification of audio in games can lead to new and innovative products which can stimulate interest, and moreover, be useful to a variety of users some of whom might not have full access to traditional games due to some impairment. This is generally recognised by other experts in the field [1, 2, 3, 4, 5, 6, 7, 8]. Therefore, investigation into this area is vital. 1 Introduction All but the earliest, most basic, of computer games have contained some element of sound and audio. The complexity of in-game audio and music has grown at roughly the same speed as the field of computer graphics and, as games have developed in these areas, so has the game audio. To this end, soundtracks in games are coveted by international recording artists and games music is now usually written by professional composers and musicians. Games are scored just like a big-budget Hollywood movie. 
As part of our research, we undertook a pilot study of computer and video game players. This allowed us to determine particular gaming preferences and also to begin to assess to what extent audio in games is important to these users, and whether or not it influences them in deciding if they would purchase a game. Furthermore, we also investigate whether or not users would be interested in games which were developed to employ sound and audio as the principal method of interacting with, and controlling, the game environment. This started as the games did in the early 1970’s with games such as Pong and Space Invaders which were supported by the inclusion of simple sounds using primitive synthesis techniques. Games would often have limited voices and a small range of actual sound effects. Early attempts were made at producing music to accompany games, which generally consisted of rather quantised rhythmic sequences being constructed from the available sets of tones. In the 1980’s the music and sound effects in games took steps towards what we now know as a game soundtrack with the development of FM and Wavetable synthesis and the emergence of the MIDI set of standards. Most notable in this decade were the Atari ST, Commodore 64 with the SID chip and the Nintendo Entertainment System (NES). The 1990’s saw the PC become a more dominant player in the games market with the release of the popular SoundBlaster series of sound cards and processors. Sampled audio was no longer a rarity. This trend has proliferated to the present day and sample-based, waveform audio is the standard method by which sound effects and music are achieved in games. Most recently, games have diversified by taking advantage of surround sound systems; the processors for which are now almost a standard option on most new computers. Games like Wing Commander III used well-known actors in-game, and recently the Grand Theft Auto series has seen big name recording artists being employed on the development of the soundtrack. 2 The Importance of Audio in Games As part of our study into the factors which influence gamers when choosing a new game, we attempted to ascertain how important the gamer considers the audio artefacts and the musical soundtrack. This was achieved by asking each subject to designate what the most important factor was when they are choosing a game to purchase. Our aim is to show that users will usually rate other factors such as the playability and visuals of a game much higher than the sound and music, further demonstrating that the focus upon computer and video games tends to be in the areas of the graphical domain. The results of this are depicted in Figure 1. Although the support and inclusion of sound in games has diversified as faster processors, larger storage discs, and CD and DVD technology proliferated, the main focus to grab the a player’s interest has traditionally always been the visuals and graphic effects. This is perhaps second only to the playability of a game. Still, one finds it much easier to be impressed quicker 9 We also found it useful that a relatively high-proportion of users believed that the interface of a game was also of high importance, since we discuss, in this paper, the potential for audio to be used as a way of interfacing to-and-from a game scenario. It may be that users would be more amenable to auditory interfaces driving their interest in a product, rather than the actual content of any music or sounds. 
Not surprisingly, we found that the most important factor to users who intend to buy a game is the playability. The ratings for all of the other possible factors are negligible, although perhaps somewhat surprising is the fact that none of the users rated the sound or musical elements of a game to be in any way important to them when deciding upon a game to buy (QED!). In fact, the ability to play a game online with other users took favour over audio, which is an intriguing insight into the mind of the 21st Century games player. Users who chose the 'Other' category were prompted to provide an explanation of what that particular factor was. Some samples of the responses received here were: "Depth and Creativity", "The whole package", and two users stated that the story or scenario were the most important.

[Figure 1 - Most Important Game Feature: bar chart of Gamer Rating (%) for Playability, Sound, Interface, Graphics, Online Gaming and Other]

To get a deeper insight into what is important to users in a game, and in anticipation that playability would be the top priority, we then asked the same users what the next most important feature was in a game. This took the same form, and had the same categories as, the initial question. The responses received are shown in Figure 2.

[Figure 2 - Second Most Important Feature: bar chart of Gamer Rating (%) for the same categories]

The results in Figure 2 give us a more useful insight into the other factors which users look for in a game. This time we see that, as we expected, the graphics and visual stimulation presented by games was easily the most popular factor (QED!). It can be seen in these results that users do not place any particular emphasis on game audio driving them heavily in deciding to purchase a new game. As was expected, the main aspects users were interested in were the playability and graphics of a game. However, the interface of a game did become clear as another area which is important to gamers, and since we are particularly focussed on diversifying the use of audio to provide intuitive and novel interfaces, this generates scope for further development into the area of deeper audio integration. As expected, the sounds present in a game were cited by a low percentage of those surveyed as being an influencing factor. The users who chose the 'Other' category on this occasion also stated that the factor important to them was the story of the game.

Finally, in order to determine whether users have some interest in the audio or music contained in a game, we specifically queried whether or not the musical soundtrack in a game would influence them, given that this has been a particular growth area in the games industry. The result of this is shown in Figure 3.

[Figure 3 – Interest in Game Soundtrack: responses to "Does the soundtrack of a game make you more interested in playing or buying it?" (Yes / No / Don't Know), Gamer Rating (%)]

The results gained from this very specific question show that there is no distinct defining trend among the sampled users. This reflects the fact that playability and graphics are top priority with most gamers, and that the soundtrack appears to be of some interest, but perhaps would not have a heavy influence on a prospective game buyer. For this reason, it is probable that a lot of developers choose not to risk large sums of money on new ideas for audio technology, only to find that they appeal only to a small audience. The game development industry is already a risky business and most game development companies go bankrupt after releasing their first title.

As a passing note, it was interesting to note the particular genres of games favoured by the users who were studied. Of the users we surveyed the majority favoured Role-Playing Games (RPGs), followed closely by those who preferred Shoot-'em-up style entertainment. We believe a future study of interest could be to investigate whether the favoured game genre affects the particular factors which users specifically look for in games. For example, role-playing games have traditionally been much more limited in terms of their graphic and aural flamboyance, with much more emphasis being placed upon the game story, whilst action and adventure games are often much more visually stimulating.

The biggest challenge to the user may be to learn the actual interface, which at worst may become exhausting before the player even gets deeply involved with playing, and being absorbed into, the game's virtual world. This can be detrimental to the overall experience. Once learned, the interface in an audio game should effectively become pervasive and transparent to the user.

3 Audio Focussed Gaming

The move towards audio gaming has been realised by the development of software and interfaces for users with disabilities, which can be overcome by finding other interactive domains with which they can engage. In order to facilitate useful interaction with the game, some form of multimodal or haptic interface system is often employed. Indeed, there would be many challenges associated with creating a purely audio-only interface environment, especially for a game. This reiterates the argument that pervasive interfaces are required. Beneficially, the most entertaining and novel games will often involve some form of physical interaction. A prime example of this kind of supportive audio application comes in games which have been seen as innovative in their multi-modal interfaces, and a prima facie case would be that of the Dance Dance Revolution (DDR) game, which is not totally audio focussed, but relies heavily on the fusion of physical interfacing, sound, and a less intensive, more supportive visual role for computer graphics.

In the current games market a number of new innovations over the last few years have seen focus directly drawn to the integration of supportive audio. That is to say, games are becoming more reliant on audio and music since it has an important role to play in supporting the user's interaction with the gaming environment. Investigations by Targett and Fernström [6] outline the potential effectiveness of purely audio-based gaming, and attempt to evaluate the usefulness of such a system in the context of the general games market, the effectiveness for users with disabilities, and potential applications in the field of complementary therapies. Crucially, their work attempts to ascertain if these games are actually entertaining - a key factor in the success of any game, regardless of the novelty or innovativeness of its interface.

Early work in the field of integrating a stronger audio presence in software environments was undertaken by Lumberas and Sánchez [3, 4]. Their work involved the creation of interactive worlds and environments which utilised 3D audio to help provide structure and support navigation within the virtual environment [3].
This specific interaction is achieved through the use of haptic interfaces. Additionally, stories which could be accessed by blind children were developed, which involved them in a virtual world, with which they could have a degree of interaction [4]. This proved successful as a game, but was also found to have therapeutic effects, which allowed the children to better deal with everyday challenges outside of the game environment. The work by Lumberas and Sánchez has been further developed and explored by Eriksson and Gärdenfors [9] and their paper provides a very useful insight into the key issues of developing audio interfaces, particularly for blind children. They discuss how to interpret particulars of game interfaces and challenges to that they can be effectively presented sonically. McCrindle and Symons revisited the classic game of Space Invaders and developed stimulating audio interfaces which could be used by both blind and partially sighted users as well as fully sighted gamers [1]. Their main concentration in this work was in the area of providing useful audio feedback and cues to the user and relied on a more traditional keyboard/joypad interface. However, they received strong results which indicate that their methods of providing audio cues are simple and effective. This removes the challenge to an extent of being able to provide more intuitive interfaces to such games. Figure 4 - Playing Konami Dance Dance Revolution Konami’s DDR game, pictured in Figure 4 (note the large speakers), is highly successful and popular, and has become integrated with youth culture [10]. Indeed, there are Worldwide International championships held; testament to the success of this particular multimodal game. Still, a visual element of following on-screen prompts are present, but the audio generated by the game is assistive to the process of interaction. This said, the ability to maintain rhythm and timing is crucial to success in the game scenarios. This kind of intense physical response to audio cues is perhaps an extreme example. One would often expect users to much prefer not having to physically involve themselves so profoundly in the interactive environment. Especially in the case where auditory interfaces make software and games accessible to disabled or impaired users, the physical activity required may not be preferable. Another good example of audio technology of this nature is Rainbow Six 3 and the expansion pack RS3: Black Arrow. They allow users to actually issue voice commands and hold (limited) conversations with computer controlled players via the XBOX Communicator system. This has huge potential and has perhaps been under-used, particularly for users with limited vision. Care must be taken when developing new and original methods of interacting with computers, particularly with games. The 11 However, although this game relied intensely on the supportive sound environment which it provided, without which it would be lost, the user is not necessarily conscious of the importance of the music in this game. Nevertheless, the user does indeed interact with an audio environment in order to be able to dance and keep rhythm whilst playing the game. Are games which use 3D/surround sound more interesting to you? 80 Gamer Rating (%) 4 Analysing Potential Market Appeal Of course, the technological and design challenges of developing audio games are a purely superfluous area of work if there is not sufficient requirement in the market for such a system. 
Although there may be sufficient demand in the areas of making game accessible to impaired users, there is no reason why such innovative and exciting developments should be limited to these users. To attempt to establish if there is interest and demand for games which have a particular focus upon the audio artefacts and presence contained within, we further probed our studied gamers to see what their interest would be. 50 40 30 20 20 8 0 Yes No Don't Know Figure 6 - Interest in 3D Sound The incorporation of surround sound in games is not a new concept, and again the high positive response rate may again be due to the expectations of users, based on their previous experiences. However, it would be argued that the use of spatial audio in games further deepens the experience and level of immersion experienced by the user. Nevertheless, these results indicate that users would be amenable to playing games where some form of 3D or spatial audio is used. Does the quality of the sound effects in a computer game make you more interested in playing or buying it? Finally, to see how users would respond to the notion of a game which uses audio as the main method of feedback and interaction with a game, we asked users if they would be interested in such a product. The results of this are shown in Figure 7. 72 70 60 50 40 24 30 45 20 10 0 Yes No Would you be interested in a game which used sound as the main way of controlling interaction with the game? 40 4 Gamer Rating (%) Gamer Rating (%) 60 10 First, we attempt to find out how much users are influenced by the general sounds that would be expected in a computer game. Given that response was low in our earlier investigation into the importance of sound in games, we attempt to determine whether the quality of the audio in games is therefore significant. The results of this are shown in Figure 5. 80 72 70 Don't Know Figure 5 - Importance of In-Game Sounds It is clear from these results, that although users may not be initially attracted to a game, the quality of the audio is still of importance. There is an element of doubt which remains however, since this was not indicated by an excessively large amount of the sampled population. It is also possible that users may be taking a consumer view, due to the phrasing of the question, and are insisting on the maximum quality in a product they might wish to buy or use. What is clear is that audio contained in games must be of significant quality to be of interest. This may be as a result of the use of high-quality soundtracks in games as mentioned earlier. Next, we attempt to establish whether or not the use of spatial audio within games is seen as a novelty, or a particular point which can be used to sell and drive a game in the market. The results of this part of the investigation are shown in Figure 6. 35 36 32 32 30 25 20 15 10 5 0 Yes No Don't Know Figure 7 - Interest in Audio Interfacing The results from this final query yield some of the most interesting results. Given the responses received from the previous two questions it was expected that users would be intrigued and interested at the concept of using such an innovative audio game. There is a distinct lack of any particular trend in response to this question. We would hypothesise at this stage that suggesting an immersive audio environment where control is also achieved through sound may be too extreme for the majority of users and gamers at this stage, who perhaps do not know enough about this particular area. 
We suggest that further development and exposure of such products is probably required. 12 well as taking into account the game usability which overall provide a Heuristic Evaluation for Playability (HEP). Particularly in the important initial stages of game development, the HEP testing mechanism has proved more useful at highlighting potential issues in a game’s playability than standard user testing mechanisms. Many of the usability heuristics proposed for the HEP system are generic and would apply equally to any game, regardless of whether the primary interface mechanism was visual, haptic, auditory, or a combination of interfaces. However, an interesting area for future research could be to further build-upon the HEP process set out by Desurvire et al., particularly with a focus upon being able to successfully evaluate audio games. From these results we see that there is further strengthening of the case for developing audio interfaces and audio games. The use of audio as a feedback mechanism has certainly had the case strengthened, although presenting a more extreme scenario where audio may be used for control purposes may have been too inventive at this stage. Clearly, deeper research is needed into why users may or may not be interested in the concept of much more involved audio environments for games. 5 The Future of Audio Games The development of new and innovative audio games will be an interesting and challenging field in the years to come. We can already gain insight into the main areas of development by examining some of the more recent research developments to transpire and the issues surrounding these. We can see that integration of audio is best achieved through either an indicative set of audio sounds, such as earcons [11], or by employing more continuous sounds which evolve with the game scenarios they represent [1]. 3D audio environments are a key area to focus on if more involving and realistic audio and control environments are to be realized in the computer games world. Since the human hearing system is used to dealing with 3D sound in everyday life, this is doubtless an area which should be further exploited. Indeed, the reaction of users to a 3D audio environment is often instinctive and there is general consistency in the responsiveness of subjects when working in 3-Dimensional control environments, even across international and cultural constraints [14]. The use of 3D spatial audio within computer games is also an area for rapid expansion. Though it is fairly standard for new games to embrace 3D or surround sound environment, there is still much work which can be done in this area. For example, Virtual Reality systems are now embracing 3D audio, and results show that when interacting with a VR or virtual environment, the responses from users are far better and more accurate when in the 3D sound domain [7, 8]. In their work on audio-only gaming, Röber and Masuch [12] present a good overview of the current range of developments and challenges in the field of audio gaming, both from the technological perspective as well as dealing with important issues relating to the playability and design requirements of audio games. In this work a number of audio-only games are developed which demonstrate how audio interfaces can be applied and combined with a varied array of gestural or movement-tracking control systems. 
Additionally, there is a large step taken in this work since the developers often employ complex sounds in their 3D auditory worlds, such as the sound of the traffic on the road in the AudioFrogger game. These sounds were recorded from the real-world, and are not indicative synthesised sounds as often encountered in audio control environments. However, since the use of the 3D audio space allows for more space in the environment, this may not be an issue, and in their paper the authors do not make any reference to the usage of these sounds being derogatory or of them creating any major problems in usability. This is promising, and is also an interesting area for future work to be carried out. Particularly, because gamers now expect realistic audio samples to be used in games, and critically, the use of such sounds will make the audio world more immersive and thus, effective. 6 Conclusions & Discussion Evidently, there is significant work being undertaken both in the commercial and academic sectors of computer games development. However, one of the key issues which we believe will have to be addressed is how to make these novel methods appealing, and receive sufficient uptake by the general public. It seems from our study that perhaps the main way in which to increase the interest and usage of such new technologies is simply through increasing awareness, and reinforcing the value that such methods of interaction need not be expensive nor that they are another novelty phenomenon which will disappear overnight. We can see that use of complex sounds mixed in a stereo or 3D audio space may not currently be the driving force behind consumer market demand for games, but that the users in the market are certainly open to, and interested in, the use of a diverse range of audio and innovative techniques within their games. This is especially relevant to games, and although the technologies and interfaces should be embraced, developers should not lose sight of the fact that it is a game which is being developed. The game play and addictive factors of audio games will certainly be an area at the forefront of the minds of many developers although it can be regarded as a separate and distinct challenge from the technological aspects of designing audio games. Many technologically innovative games have been shortlived, mainly due to poor game playability and also the cost associated with any extra equipment or components required. To varying different degrees of success we have seen light-guns (can we forget the Nintendo Super-Scope?), Game Boy Camera, Nintendo Power Glove, Sega Activator, the Barcode Battler, and the list goes on. Few of these novelty accessories have really made significant impact on the market, with the exception perhaps of the Sony Playstation I-toy, and the aforementioned Konami DDR dance mat based games. The thrust of this is that audio games must have high levels of playability, which is the single most important factor cited in our research of games players. The exploration into audio games and multimodal games most not become overshadowed by the need to learn the interface before playing the game. Audio games must become as the audio sense is to human users everyday, it must be pervasive, instinctive, and intuitive To ensure that playability of audio games is achieved, there must be in-depth testing carried out upon any software developed. A useful set of heuristic evaluation methods has been proposed and developed by Desurvire et al. [13]. 
Although not specifically focussed on audio games, the methods used specifically concentrate on a number of areas of gameplay, as 13 System Integration in Integrated Europe, Liberec, Czech Republic, (2004). References [1] McCrindle, R. J., Symons, D., Audio space invaders. Proceedings of 3rd International Conference on Disability, Virtual Reality & Associated Technologies, Alghero, Italy, (2000). [2] Yuille, J., Smearing Discontinuity :: In-Game Sound. Proceedings of 5th International Conference on Digital Arts and Culture (DAC), Melbourne, Australia, (2003). [3] Lumberas, M., Sánchez, J., Barcia, M., A 3D sound hypermedial System for the Blind. Proceedings of the 1st European Conference on Disability, Virtual Reality and Associated Technologies, Maidenhead, UK, (1996). [4] Lumberas, M., Sánchez, J., 3D Aural Interactive Hyper Stories for Blind Children. Proceedings of the 2nd European Conference on Disability, Virtual Reality and Associated Technologies, Skövde, Sweden, (1998). [5] Mereu, S., Kazman, R., Audio Enhanced 3D Interfaces for Visually Impaired Users, Proceedings of International Conference on Human Factors in Computing Systems ‘96, Vancouver, Canada, (1996). [6] Targett, S., Fernström, M., Audio Games: Fun for All? All for Fun?, Proceedings of International Conference on Auditory Display, Boston, MA, USA, (2003) [7] Zhou, Z., Cheok, A. D., Yang, X., Qiu, Y., An experimental study on the role of 3D sound in augmented reality environment. Interacting with Computers, 16, 1043-1068, (2004). [8] Zhou, Z., Cheok, A. D., Yang, X., Qiu, Y., An experimental study on the role of software synthesized 3D sound in augmented reality environments. Interacting with Computers, 16, 989-1016, (2004). [9] Eriksson, Y., Gärdenfors, D., Computer games for children with visual impairments. Proceedings of 5th International Conference on Disability, Virtual Reality and Associated Technologies, Oxford, UK, (2004). [10] Welcome To My World - Lord of the Dance Machine, Episode 2, TV. BBC Three, July 27, (2006). Synopsis available at: http://www.bbc.co.uk/bbcthree/tv/my_world/lord_dance.shtml [11] Brewster, S.A., Providing a structured method for integrating non-speech audio into human-computer interfaces. PhD Thesis, University of York, UK, (1994). [12] Röber, N., Masuch, M., Leaving the Screen: New Perspectives in Audio-Only Gaming. Proceedings of 5th International Conference on Auditory Displays (ICAD), Limerick, Ireland, (2005). [13] Desurvire, H., Caplan, M., Toth, J.A., Using Heuristics to Evaluate the Playability of Games. Proceedings of Conference on Human Factors in Computing Systems, Vienna, Austria, (2004). [14] Cunningham, S., Hebblewhite, R., Picking, R., Edwards, W., Multimodal Interaction and Cognition in 3D Music and Spaital Audio Environments: A European Compatible Framework. Proceedings of CSSI International Conference on 14 Authoring of 3D virtual auditory Environments Niklas Röber, Eva C. Deutschmann and Maic Masuch Games Research Group Department of Simulation and Graphics, School of Computing Science, Otto-von-Guericke University Magdeburg, Germany niklas|[email protected] Abstract. Auditory authoring is an essential component in the design of virtual environments and describes the process of assigning sounds and voices to objects within a virtual 3D scene. In a broader sense, auditory authoring also includes the definition of dependencies between objects and different object states, as well as time- and user-dependent interactions in dynamic environments. 
Our system unifies these attributes within so-called auditory textures and allows an intuitive design of 3D auditory scenes for varying applications. Furthermore, it takes care of the different perception through auditory channels and provides interactive and easy to use sonification and interaction techniques. In this paper we present the necessary concepts as well as a system for the authoring of 3D virtual auditory environments as they are used in computer games, augmented audio reality and audio-based training simulations for the visually impaired. As applications we especially focus on augmented audio reality and the applications associated with it. In the paper we provide details about the definition of 3D auditory environments along with techniques for their authoring and design, as well as an overview of the system itself with a discussion of several examples.

1 Introduction

Many of today's computer games feature an impressive and almost photo-realistic depiction of the virtual scenes. Although the importance of sound has moved into the focus of game developers and players, it still does not receive the same level of attention as high-end computer graphics. The reasons for this are manifold, but some of them are already diminishing, so that sound plays a larger role in certain games and game genres.

One niche in which sound is the major carrier of information are so-called audio- or audio-only computer games. These types of games are often developed by and for the visually impaired community and are played and perceived through the auditory channel alone. Many genres have been adapted, including adventure, action and role-playing games as well as simulations and racing games. To bridge the barrier between visual and auditory game play, some of these games are developed as hybrids and can be played by sight and ear [4]. For a more detailed discussion of these games we refer to [8], [9] and the audiogames website [11].

Both audio-visual and audio-only computer games need to be authored and designed with regard to the story and the game play. For this purpose, specially designed authoring environments are often shipped together with the game engines used. An overview and comparison of some commercial and freely available audio authoring environments has been given by Röber et al. [6].

Our authoring system is part of a larger audio framework that can be used for the design of general auditory displays, audio-only computer games and augmented audio reality applications. The system is based on 3D polygonal scenes that form the virtual environment. This description is used for collision detection and to assign sound sources to objects and locations. During the authoring, additional acoustic information is added to the scene. For each object, an auditory texture is defined and set up to specify the object's later auditory appearance. This includes different sounds and sound parameters per object, as well as story- and interaction-specific dependencies. The auditory presentation of such sound objects can be changed by user interaction, time, other objects or an underlying story event. The authoring system is additionally divided into an authoring and a stand-alone player component. This allows a hardware-independent authoring and lets the player be used independently from the main system on mobile devices.

In our research we especially focus on audio-only computer games and augmented audio reality applications in the context of serious gaming. Here we concentrate on techniques for sonification, interaction and storytelling, but also on authoring and audio rendering itself. The methods developed here are not only applicable to entertainment and edutainment, but can also be used in the design of general auditory displays and for training simulations to aid the visually impaired. With the authoring environment and the techniques presented in this paper, we focus on defining a theoretical foundation for 3D virtual auditory environments and the methods necessary to describe and design them. As many auditory environments are currently still programmed using software APIs, this authoring system opens a door for artists and non-programmers to design and create (augmented) auditory applications in a very easy and intuitive way. The challenge in the design of auditory environments, which especially applies to the authoring itself, is to provide enough information to the user without overloading the auditory display and to keep the right balance between aesthetics and functionality.

The paper is organized as follows: After this introduction, we focus in Section 2 on the definition of 3D virtual auditory environments and discuss here especially the concept of auditory textures along with the varying possibilities for sonification and interaction. In this section we also motivate and explain the additional changes necessary to support dynamic environments and augmented auditory applications. Section 3 builds upon the previous sections and discusses in detail the authoring system using several examples. Here we explain the techniques and concepts used, and provide, together with Section 4, additional information regarding the user interface and the soft- and hardware implementation. Section 5 presents and discusses the results achieved using some examples, while Section 6 summarizes the paper and states possibilities for future improvements.

2 Virtual auditory Environments

Vision and hearing are by far the strongest senses and provide us with all information necessary to orientate ourselves within a real-world environment. Although one perceives the majority of information visually through the eyes, different and often invisible information is sensed acoustically. Examples can be easily found in daily life, such as telephone rings or warning beacons. Though the visual and the auditory environment, which are perceived by the eyes and the ears respectively, are partially overlapping, the larger portion is dissimilar, and the two complement each other to provide a comprehensive overview of the local surroundings.

Virtual environments are computer-created worlds which often resemble a real environment. Depending on the realism of the computer-generated graphics and sound, the user might immerse into this virtual reality and accept it as real. Virtual environments have many applications, ranging from simulations and data visualization to computer games and virtual training scenarios. The most successful implementations are computer games, in which players immerse themselves into a different reality as virtual heroes.

3D virtual auditory environments represent a special form of virtual environments that uses only the auditory channel to convey data and information. As discussed in the last paragraph, the auditory and the visual channel sense different information and form a diverse representation of the user's surroundings. This has to be incorporated into the design of virtual auditory environments if the goal is to visualize a (virtual) real-world-resembling environment. An advantage of hearing as opposed to vision is the possibility to hear within a field of 360 degrees and to also perceive information from behind obstacles and occlusions. Difficulties sometimes arise with the amount of data perceivable and the resolution of the spatial localization of 3D sound sources. Furthermore, auditory information can only be perceived over time and only if a sound source is active. For a technical realization, virtual auditory environments are simpler and cheaper to build, as no screens or visual displays are needed. Auditory environments have many applications, including auditory displays and of course audio-only computer games and augmented audio reality.

In order to receive enough information for the user's orientation, navigation and interaction, a 3D auditory environment must exhibit certain qualities. These qualities and functions can be described as:
• A 3D (polygon-based) virtual environment managed by a scenegraph system,
• A 3D audio engine with a non-realistic acoustic design,
• Sonification and interaction techniques,
• Input and interaction devices, and
• User-tracking equipment.
This list extends a little further with the design of dynamic and augmented auditory environments, see also the following paragraphs.

The basis of virtual auditory environments is built by a 3D scenegraph system that manages all the 3D polygonal meshes which describe the scenes. This scenegraph is also responsible for collision detection, level of detail and for handling possible time-, position-, object- or user-based dependencies. Every object within the auditory scene must be audible in some way, otherwise it is not detectable and not part of the environment. The objects can be grouped into non-interactable objects, passage ways and doors, and interactable objects [7]. Combined with this scenegraph is a 3D audio engine that is capable of spatializing sound sources and simulating the scene's acoustics. Due to the differences in perception, the acoustic design must not resemble a real-world acoustic environment; instead, certain effects, such as the Doppler effect, need to be exaggerated in order to be perceived. Also, additional information such as beacons, earcons and auditory icons to describe non-acoustic objects and events needs to be integrated into the auditory description of the scene.

In order to interact with the environment and to derive useful information from the scene, the user needs to be able to input information. This is handled through a variety of sonification and interaction techniques, which have already been discussed in the literature [8], [6]. Difficulties often occur with navigational tasks in which the user needs to navigate from one point to another, more distant location within a large scene. Path guiding techniques, such as Soundpipes, have proven useful here to keep the user from getting lost [10]. Another technique, which has been demonstrated to greatly enhance the perception by imitating natural hearing behaviors, is head-tracking, which measures the orientation of the user's head and directly applies this transformation to the virtual listener. This enables the user to immediately receive feedback from the system by just changing the head's orientation. Head-tracking can also be used for gesture detection, in which nodding and negation transfer directly to the system. Section 4 presents an actual implementation of such a 3D virtual auditory environment, while the next two paragraphs extend the system towards dynamic and augmented auditory applications.

2.1 Dynamic Environments

While static environments can only be explored by the user, more interesting, but also more difficult, is the creation of dynamic environments that change through user interaction. Dynamic here refers not only to animations and loops, but to a reaction of the environment to the user's interaction. This can be expressed through time-, position- and object-dependencies, which are directly bound to certain objects in the scene. A time dependency controls the state of an object with an absolute or a relative time measurement. If the time is up, a certain action is evoked, like the playback of a sound or the setting of other objects or control structures. A position-dependency is triggered by the user if he approaches the corresponding object, while object-dependencies change an object's state and are induced by other related objects. Figure 1 shows an action graph that visualizes these dependencies, while Figures 2(a) and 2(b) display the later authoring and design of these dependencies using auditory textures. A menu system and soundpipes, which use mobile sound sources, can be designed as well by using additional object dependencies.

Figure 1: (Inter)action Graph to model dynamic Environments.

The (inter)action graph that is depicted in Figure 1 is composed of time and user interactions and is bound to a specific object in the scene. The edges describe conditions which, if satisfied, connect to the next possible actions, while the nodes are built by counters and user interactions. All object conditions that are not related to user interaction can be described using time. This also allows the execution of events directly following other events. With some additional control mechanisms, this description can also be used to model a story engine that controls narrative content and parameters, as used in computer games or other forms of interactive narration.

These aforementioned time-, object- and user-dependencies, including the various conditions and sounds for an object, can be modelled using auditory textures. Auditory textures were initially designed to only handle the different states and acoustic representations of an object. These state changes were induced by user interaction, as well as by a story- and physics-system which control the auditory environment [6]. These auditory textures have now been extended to also control the object states through time, position and user interaction dependencies, and additionally handle the references to the various sound files along with the parameters for their playback. Figures 2(a) and 2(b) display the authoring and design of dependencies using the concept of auditory textures. Figure 2(a) shows the different dependencies and their arrangement within the auditory texture by type for faster access, while Figure 2(b) displays the final action graph that is constructed from the previous auditory textures.

(a) Authoring of Dependencies. (b) Construction of an Action Graph. Figure 2: Authoring and Design of Dependencies.
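To make this dependency model concrete, the following minimal sketch shows how an auditory texture with time-, position- and object-dependencies might be represented in code. It is an illustration under assumed names only, not the framework's actual classes; note that the position test uses separate enter and exit distances, anticipating the positioning tolerance and the "parameter flipping" issue discussed in Sections 2.2 and 3.2.

```cpp
// Hypothetical sketch of an auditory texture with time-, position- and
// object-dependencies. Names and structure are illustrative only and do not
// reproduce the actual classes of the authoring framework.
#include <cmath>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct Vec3 { float x, y, z; };

static float distance(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

using Action = std::function<void()>;  // e.g. play a sound, set another object's state

struct TimeDependency {          // fires once when the scene time passes the trigger time
    float triggerTime = 0.0f;
    bool fired = false;
    Action action;
    void update(float sceneTime) {
        if (!fired && sceneTime >= triggerTime) { fired = true; if (action) action(); }
    }
};

struct PositionDependency {      // separate enter/exit radii avoid flipping near the boundary
    Vec3 center{};
    float enterRadius = 1.0f, exitRadius = 3.0f;
    bool inside = false;
    Action onEnter, onExit;
    void update(const Vec3& userPos) {
        float d = distance(userPos, center);
        if (!inside && d < enterRadius) { inside = true;  if (onEnter) onEnter(); }
        else if (inside && d > exitRadius) { inside = false; if (onExit) onExit(); }
    }
};

struct ObjectDependency {        // induced by the state of another, related object
    const int* watchedState = nullptr;
    int requiredState = 0;
    bool fired = false;
    Action action;
    void update() {
        if (!fired && watchedState && *watchedState == requiredState) {
            fired = true;
            if (action) action();
        }
    }
};

struct AuditoryTexture {         // sounds plus the dependencies bound to one scene object
    std::vector<std::string> soundFiles;  // references to sound files and playback parameters
    int state = 0;
    std::vector<TimeDependency> timeDeps;
    std::vector<PositionDependency> posDeps;
    std::vector<ObjectDependency> objDeps;

    void update(float sceneTime, const Vec3& userPos) {
        for (auto& t : timeDeps) t.update(sceneTime);
        for (auto& p : posDeps)  p.update(userPos);
        for (auto& o : objDeps)  o.update();
    }
};

int main() {
    AuditoryTexture cafeteria;
    cafeteria.soundFiles = {"plates_and_cutlery.wav"};   // illustrative file name

    PositionDependency near;
    near.center      = {10.0f, 0.0f, 5.0f};
    near.enterRadius = 2.0f;
    near.exitRadius  = 6.0f;
    near.onEnter = [] { std::puts("start cafeteria ambience"); };
    near.onExit  = [] { std::puts("stop cafeteria ambience");  };
    cafeteria.posDeps.push_back(near);

    // One simulated frame with the user standing next to the cafeteria.
    cafeteria.update(/*sceneTime=*/0.5f, /*userPos=*/{9.0f, 0.0f, 5.5f});
    return 0;
}
```

Evaluating the scene then amounts to calling update() on every auditory texture once per frame; letting actions modify the state fields that object-dependencies watch yields the kind of action graph shown in Figure 1.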
2.2 Augmented Audio Reality

The term Augmented Reality comprises technologies and techniques that extend and enhance a real-world environment with additional (artificial) information. It is, unlike virtual reality, not concerned with a complete replacement of the real environment, but focusses deliberately on the perception of both worlds, which intermingle and blend into each other. Ideally, the user would perceive both environments as one, with artificial objects and sounds positioned within the real environment [5], [1]. Augmented reality has many applications, ranging from entertainment and visualization to edutainment and virtual archeology [2], [3].

Augmented Audio Reality describes the part of augmented reality that focusses exclusively on auditory perception. The qualities of a virtual auditory environment listed before need here to be extended by tracking techniques that position the user within the virtual environment. This positioning, as well as the virtual map, needs to be calibrated in order to deliver the right position. Due to the low resolution of the human hearing system in localizing 3D sound sources, the tolerance can, depending on the application, vary up to 3 m. This positioning accuracy needs to be considered during the authoring, as objects with a position-dependency should be roughly two times that distance apart.

If the virtual environment is perceived through headphones, another problem occurs. The human listening system heavily relies on the outer ears to localize sound sources within 3D space. If the ears are covered, sounds from the real world can no longer be heard properly. A solution to this problem are bone-conducting headphones, which are worn in front of or behind the ears and transmit the sound via the bone. Besides a slightly lower listening quality, these bone-phones allow a fusion of the real and the virtual acoustic environment. Additional care has to be taken with the user tracking and positioning, as the latency resulting from the measurement and its interpretation must not be too large; otherwise, the two environments would appear disjunct under motion. A more detailed discussion of the hardware used to design such a system can be found in Section 4.2.

3 Auditory Authoring

Authoring is the process of designing a document and filling it with content and information from possibly different sources and media. Auditory authoring refers to the design of virtual auditory environments and the specification of dependencies to model the application's behavior. The authoring of audio-only and augmented audio reality applications often takes place directly in programming languages. But this method is neither intuitive, nor can the content later be changed or adjusted easily. Together with the development of applications, this was one of the main motivations for this research, as the need for more professional authoring systems is growing. A previous publication was already concerned with the authoring of virtual auditory environments, on which the current work is based, along with the development of an augmented audio reality system [6].

Figure 3: Auditory Authoring Environment.

Figure 3 shows a screenshot of the authoring environment and explains the menu and the authoring concept. The center window shows a visual representation of the scene, while the right-hand side offers sliders and parameter entries to adjust and fine-tune the sound sources as well as the auditory textures. Objects can be selected either by clicking on them or through the list on the left. Basic functionalities that have to be supported by any such authoring system are:
• Selection, creation and deletion of sound sources,
• Position and orientation of sound sources,
• Specification of playback parameters, such as attenuation, loudness, rolloff etc.,
• Setup of background and environmental sounds,
• Definition and set up of dependencies, and
• The design of an auditory menu system.

3.1 Sound and Environmental Authoring

The first step of the sound and environmental authoring is to load a VRML file that represents the scene geometry. This data can be modelled with any 3D program, such as Maya or 3D Studio MAX, from which the geometry can be exported as VRML. After this, objects are selected and auditory textures as well as sounds are assigned and defined. Several parameters can be adjusted per sound and can also vary over time, see Figures 3 and 6(b). The user interface was designed using Qt4 and allows the parameter entry forms to be detached and floated over the application to customize the layout.

Figure 4: Setting up Sound Sources.

In Figure 4 one can see a screenshot of the authoring environment with a graphical representation of a virtual scene with one sound source along with its parameter visualizations. The cone visualizes the direction of the sound source, while the two wire spheres represent the attenuation and the rolloff space. The sound parameters that are adjustable include position, loudness, direction, inner and outer opening, minimal and maximal loudness, rolloff and many others.

Figure 5: Design of a Ring Topology-based Menu System.

Figure 5 displays the authoring of a ring topology-based menu system using six spheres. The ring menu allows between two and six objects, which are automatically arranged and evenly distributed around the listener. Every object within the menu can be assigned an auditory texture with all the possible modifications. This system can therefore easily be used to control and adjust parameters inside the virtual environment.

3.2 Dynamic Authoring

After the authoring of the basic parameters, the dynamic authoring starts with the definition of dependencies and auditory textures. For each dependency there exists a different input form that assists the user in the authoring of parameters for the animations. Figure 6 displays two examples of dynamic authoring. Figure 6(a) shows the design and animation authoring of a circle-based soundpath. Other geometries, like polygon lines or splines, can be used as well and are employed later within the environment to assist the player with navigation and orientation. For the animation, an object (sphere) is selected and the time for the animation specified. The start of the animation can also be triggered through any event, like time or user interaction, and repeated as often as required. In Figure 6(b) one can see the visualization of a positional dependency. The two transparent boxes mark the entry and the exit event, respectively, to play a sound file if the user approaches the center box object. The two boxes are needed because of the low resolution of the user positioning, in order to avoid a parameter flipping, see also Section 2.2 for more details.

(a) Soundpath Design. (b) Position Dependency. Figure 6: Dynamic Authoring.

The system was furthermore designed as a universal modeler to serve multiple applications, ranging from entertainment and edutainment environments to training simulations for the visually impaired. The authoring and the presentation of the designed application take place in two different components. The entire system is therefore divided into two parts: the authoring and a runtime module. The authoring system is used to design the virtual auditory environment, which can also be tested on the fly using a built-in player component.
The authored application can then be saved and executed on a mobile platform using the runtime system as well. This division allows a hardware independent authoring, in which the additional tracking and input devices are simulated by the mouse and keyboard. Figure 7 shows an overview of the system, with the authoring component on top and the player module at the bottom of the figure. The player component also uses the VRML model to visually inspect the scene and to verify the authoring. The evaluation of the scene events are carried out using the authored auditory textures and the information from the tracking and user interaction equipment. The final acoustic presentation using sound spatialization and room acoustics is rendered by OpenAL. 4 System Design While the last two sections focused on the theoretical foundations of auditory environments and their authoring for audio-only and augmented audio reality applications, this section provides an overview of the systems design along with some implementation details and a discussion on hardware related issues. 4.2 Hardware As the main focus of the paper is on the authoring of virtual auditory environments, we will keep the discussion on hardwarerelated issues very brief. The hardware for our portable augmented audio system consists of a regular Laptop (Dell Inspiron8200), a digital compass that is used as head-tracking device, a gyro mouse for 360 degree interaction, bone-conducting headphones for the acoustic presentation and a W-Lan antenna along several portable W-Lan access points for the user positioning. Although the system is very low cost and cheap, it is still very reliable and achieves good results. The digital compass is a F350-COMPASS reference design from Silicon Laboratories that uses three separate axis of magneto-resistive sensing elements that are tilt compensated. The compass connects to the computer via USB and can be easily polled using a simple API. 4.1 Software The authoring system is based on a previous audio framework that was developed and applied to design audio-only computer games and to evaluate sonification and interaction techniques [7], [8]. This framework was built using OpenAL for sound rendering and OpenSG to manage the 3D content of the scenes. The same framework and libraries were used as basis for this authoring system, and extended by Trolltech’s Qt4 as API for the user interface design. Figure 3 shows a screenshot of the final application. The authoring system was designed to allow an easy authoring of (augmented) 3D virtual auditory environments without the need for special knowledge or programming experiences. Additionally, the system was designed as a universal modeler to serve 19 again, but the story and the story points were slightly adjusted to match the new requirements of the augmented system, especially for the user positioning. The story, the events and the user interaction were encoded within the dependencies of auditory textures. Although the system worked well, difficulties arose with the accuracy of the positioning due to the highly reflective stone walls that interfered with the W-Lan based user tracking. 5.1 Campus Training Simulation The other example, which shall be discussed in a little more detail, is an augmented virtual training scenario for the visually impaired. Figure 3 in Section 3 displays an overview of the map used and also shows the authoring of the dependencies using auditory textures. 
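As an aside, the runtime player applies the measured head orientation directly to the virtual listener before OpenAL renders the scene (Sections 4.1 and 4.2). A minimal sketch of such an update is given below; the compass and positioning read-outs are simple stub functions standing in for the actual device interfaces.

```cpp
// Illustration only: applying a tracked position and head orientation to the
// OpenAL listener. readCompassYaw() and readUserPosition() are stubs standing
// in for the digital compass and the W-Lan positioning of Section 4.2.
#include <AL/al.h>
#include <AL/alc.h>
#include <cmath>

float readCompassYaw() { return 0.5f; }   // heading in radians, 0 = facing -z
void  readUserPosition(float& x, float& y, float& z) { x = 1.0f; y = 1.7f; z = 2.0f; }

void updateListener() {
    float x, y, z;
    readUserPosition(x, y, z);
    alListener3f(AL_POSITION, x, y, z);

    // OpenAL expects an "at" vector followed by an "up" vector.
    float yaw = readCompassYaw();               // rotation about the vertical (y) axis
    ALfloat orientation[6] = {
        -std::sin(yaw), 0.0f, -std::cos(yaw),   // "at": forward vector rotated by yaw
         0.0f,          1.0f,  0.0f             // "up": world up
    };
    alListenerfv(AL_ORIENTATION, orientation);
}

int main() {
    ALCdevice*  device  = alcOpenDevice(nullptr);            // default output device
    ALCcontext* context = alcCreateContext(device, nullptr);
    alcMakeContextCurrent(context);

    updateListener();   // in the runtime player this would run once per frame

    alcMakeContextCurrent(nullptr);
    alcDestroyContext(context);
    alcCloseDevice(device);
    return 0;
}
```

In the authoring component the same update would simply be fed from the mouse and keyboard simulation mentioned above instead of the tracking hardware.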
In this simulation, buildings and important places are characterized through certain sounds that are specific to their function. In our campus simulation, this is for example the rattling of plates and cutlery in the cafeteria, the rustling of pages and books in the library and space-like sounds representing the department of computer science. Using this training simulation, the user becomes familiar with the arrangement of buildings and locations in an off-line simulation using the player component. The orientation and position of the user are herby input using the mouse and keyboard. Later in the augmented version, the user walks through the real environment recognizing the various sounds. The user perceives the same sounds and information, except that the position and orientation are now measured by the digital compass / gyro mouse and the W-Lan positioning engine. The authoring for this training simulation was very straightforward and relatively easy. The 3D model could be designed very fast using 3D Studio MAX as the buildings did not need to be highly realistic. Describing sounds for each building were taken from a sound pool CD-ROM and also created by ourselves by simply recording the auditory atmosphere at these locations. In the final authoring using the system depicted in Figure 3, these sounds were assigned to each building along some object and position dependencies. Figure 8 displays a screenshot from the runtime component (left) and the W-Lan positioning engine (right). It shows the view from the department of computer science towards the library. In the right figure, the users position is marked by a bright red dot in the corner of the middle/right building. Figure 7: System Overview. The gyro mouse uses a similar principle to determine the mouses orientation in 3D space. It is used in the runtime system as alternative interaction device to specify the listeners orientation, but also to interact with the virtual environment and to input user selections. Bone-conducting headphones are employed to improve the blending of the two different auditory environments. Here we use a model from the Vonia Corporation. As the sounds are conveyed over the bones in front of the ear, the ear remains uncovered and is still fully functional to localize sounds from the real-world environment. Although frequencies below around 250 Hz can not be perceived, the quality is good enough to spatialize sounds through general HRTF’s. An evaluation of bone-conducting headphones for several applications including spatialization has been discussed by Walker et.al. [12]. The user positioning system uses an own implementation of W-Lan location determination systems [13], [14]. Our approach is a derivation of the standard Radar system that was extended by pre-sampled and interpolated radio maps and additional statistics to improve the performance. The resolution ranges between 1 m and 2 m and depends on the number of access points used and the rooms size and geometry. A huge advantage of W-Lan positioning over GPS is that it can be used inside and outside of buildings. With the growing number of commercial and private access points, this positioning technique uses a resource that is already in place. 5 Applications and Discussion The focus of the last sections was to form a theoretical foundation for 3D virtual auditory environments with applications in audioonly computer games and augmented audio reality. 
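The W-Lan positioning described in Section 4.2 is, at its core, a fingerprinting scheme: the live signal strengths are compared against a pre-sampled radio map and the closest surveyed location wins. The toy example below illustrates only this nearest-neighbour step; the radio map values are invented, and the real system additionally interpolates the maps and applies statistics to improve the estimate.

```cpp
// Sketch of radio-map fingerprinting: pick the pre-sampled location whose
// recorded signal strengths best match the currently measured ones.
// Illustrative values only; not the pre-sampled campus data.
#include <cfloat>
#include <cstdio>
#include <vector>

struct Fingerprint {
    float x, y;                 // surveyed position on the map
    std::vector<float> rssi;    // signal strength per access point (dBm)
};

// Returns the surveyed position closest in signal space (squared Euclidean
// distance over the per-access-point RSSI values) to the live measurement.
static Fingerprint locate(const std::vector<Fingerprint>& radioMap,
                          const std::vector<float>& measured) {
    const Fingerprint* best = &radioMap.front();
    float bestDist = FLT_MAX;
    for (const auto& fp : radioMap) {
        float d = 0.0f;
        for (size_t i = 0; i < measured.size(); ++i) {
            float diff = measured[i] - fp.rssi[i];
            d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = &fp; }
    }
    return *best;
}

int main() {
    // Tiny hand-made radio map with three access points and three sample points.
    std::vector<Fingerprint> radioMap = {
        {  0.0f,  0.0f, {-40.0f, -70.0f, -80.0f} },   // near the library
        { 25.0f,  5.0f, {-72.0f, -45.0f, -78.0f} },   // near the cafeteria
        { 50.0f, 30.0f, {-85.0f, -75.0f, -42.0f} },   // near the CS department
    };
    std::vector<float> live = {-70.0f, -48.0f, -80.0f};
    Fingerprint pos = locate(radioMap, live);
    std::printf("estimated position: (%.1f, %.1f)\n", pos.x, pos.y);
    return 0;
}
```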
The emphasis in this section lies in the analysis of the results and a discussion on the performance of the authoring environment, the system and the initial definition of auditory environments. As one of the two foci was the design and evaluation of augmented audio reality applications, we have implemented and tested two different scenarios. One is an augmented adaptation of an earlier audio-only adventure game [8], while the other can be considered as a serious game that assists visually impaired people in training their orientational skills. The augmented audio adventure game takes place in an ancient cathedral of Magdeburg, were several sagas and myths have been combined into one story. A tourist visiting the city can unveil several mysteries, while at the same time learning about the history of the city and the cathedral. The 3D model that was used in the original game has been used Figure 8: Player Component (left) and W-Lan Positioning (right). Tests using both applications yielded good results, although some points need to be improved. So far we have tested both applications using sighted users only, but additional tests with visually impaired participants are scheduled for the next month. Although the entire campus is equipped with many overlapping access points, the positioning algorithm performs better indoors, due to the shadowing effects of the rooms geometry and furniture. In wide open spaces, such as the campus scenario, the signal strength is homogenous over long distances. Advantageous in 20 thank Stefanie Barthen for her help in designing the test scenarios, as well as the group of Professor Nett for helpful discussions on W-Lan positioning and for lending us portable W-Lan access points. outdoor applications is the existing ambient sound environment from the real-world, whereas indoors are more silent. Hence, the authoring for outdoor applications is easier as many sound sources are already present. Additionally, in outdoor augmented scenarios the distribution of event locations is scattered over a larger area, which at the same time allows a better positioning as overlapping effects are easy to avoid. One subject reported that the quality of the bone-phones was too poor and disturbed the perception, while all candidates stated that the system, see also Figure 9 is easy to wear and handle. References [1] Ronald T. Azuma. A survey of Augmented Reality. In Presence: Teleoperators and Virtual Environments 6, pages 355– 385, 1997. [2] S. K. Feiner. Augmented Reality: A New Way of Seeing: Computer scientists are developing systems that can enhance and enrich a user’s view of the world. Scientific American, April 2002. [3] Tony Hall, Luigina Ciolfi, Liam Bannon, Mike Fraser, Steve Benford, John Bowers, Chris Greenhalgh, Sten Olof Hellström, Shahram Izadi, Holger Schnädelbach, and Martin Flintham. The Visitor as Virtual Archaeologist: Explorations in Mixed Reality Technologies to enhance Educational and social Interaction in the Museum. In VAST - Conference on Virtual Reality, Archeology and Cultural Heritage, pages 91–96, 365, 2001. [4] Pin Interactive. Terraformers, 2003. PC. [5] Paul Milgram, David Drascic, J. Julius J. Grodski, Anu Restogi, Shumin Zhai, and Chin Zhou. Merging real and virtual Worlds. In IMAGINA’95, pages 218–220, 1995. Figure 9: The Augmented Audio System in Action. Important for all applications is a careful selection of sounds, as some of the sounds used in the campus training simulation were difficult to classify and sometimes even bothersome. 
Longer acoustic representations performed better that shorter ones. The next steps to improve the system and the two applications are a refining of the positioning system for outdoor tracking. Here we need more sampling points and a better interpolation scheme for the radio maps. Additionally, some sounds and event distances need to be checked and probably adjusted as well. But the most important part of future work is a detailed user study with sighted and blind users that features also a comparison between users that performed an off-line training with user that did not. [6] N. Röber and M. Masuch. Auditory Game Authoring: From virtual Worlds to auditory Environments. In Norman Gough Quasim Mehdi and Gavin King, editors, Proceedings of CGAIDE 2004, London, England, 2004. [7] N. Röber and M. Masuch. Interacting with Sound: An interaction Paradigm for virtual auditory Worlds. In 10th Int. Conf. on Auditory Display (ICAD), 2004. [8] N. Röber and M. Masuch. Leaving the Screen: New Perspectives in Audio-Only Gaming. In 11th Int. Conf. on Auditory Display (ICAD), 2005. [9] N. Röber and M. Masuch. Playing Audio-Only Games: A Compendium of Interacting with Virtual, Auditory Worlds. In 2nd Int. Digital Games Research Association Conf. (DIGRA), 2005. 6 Conclusions and Future Work In this work we have discussed virtual auditory environments and their basic qualities that define them. We have motivated this through several applications like audio-only computer games and augmented audio reality, for which the definition of auditory environments was extended. Furthermore, we have presented a system for the multipurpose authoring of various auditory applications together with several user supporting techniques. Finally we have presented and discussed a hardware realization for an augmented audio reality system along two example implementations. Future work includes, as already outlined in the last section, a detailed user study using sighted and blind participants, as well as a refinement of the positioning engine to improve the resolution and accuracy. [10] N. Röber and M. Masuch. Soundpipes: A new way of Path Sonification. Technical Report 5, Fakultät für Informatik, Otto-von-Guericke Universität Magdeburg, 2006. [11] Richard van Tol and Sander Huiberts. Audiogames Website. http://www.audiogames.net, 2006. [12] B. N. Walker and R. Stanley. Thresholds of Audibility for bone-conduction Headsets. In 11th Int. Conf. on Auditory Display (ICAD), 2005. [13] Moustafa Youssef and Ashok K. Agrawala. The Horus WLAN Location Determination System. In 3rd Int. Conf. on Mobile Systems, Applications, and Services (MobiSys 2005), 2005. Acknowledgment [14] Moustafa Youssef, Ashok K. Agrawala, and A. Udaya Shankar. WLAN Location Determination via Clustering and Probability Distributions. In IEEE Int. Conf. on Pervasive Computing and Communications (PerCom), 2003. The authors would like to thank Mathias Otto for his help in developing the augmented audio system and especially for his work on the W-Lan positioning engine. Furthermore, we would like to 21 From Heartland Values to Killing Prostitutes: An Overview of Sound in the Video Game Grand Theft Auto Liberty City Stories Juan M. Garcia E-mail: [email protected] Abstract: The video game, as understood by Jesper Juul, consist of two basic elements; a set of real rules and a fictional world. 
The fictional world requires the player not only to engage in a willing suspension of disbelief, the player must also willingly accept another cultural model as valid, at least during playtime. Those who engage Grand Theft Auto accept murder, extortion, sexism, and racism as valid, even when it may contradict their core set of values. The player has not turned his back on his old framework of understanding, he has instead been allowed to wear a new one, one that he must leave behind once he exits the virtual world. The actions of the player will be meaningful only within context. The virtual world, and the cultural model it promotes, reinforce the very real rules that dominate the game while simultaneously the rules reinforce the cultural model, and virtual world in which it exists. The following paper analyzes the role of sound in the video game Grand Theft Auto Liberty City Stories in creating the cultural model that the player adopts during game time. sets the story on motion. After the opening scene non-diegetic sound, in the form of music, appears only as brief snippets of the musical theme of GTA LCS; such snippets are heard after successfully completing a mission, and after finishing the game. 1 Objective The present study aims to expand the understanding of the use of sound in video games. For such purpose the video game Grand Theft Auto Liberty City Stories will be used as a case study. The use of GTA LCS obeys several reasons. First, the Grand Theft Auto Series has caused a lot of controversy for its violent and sexual content. The latest blunder comes from an exploit called hot coffee, the mentioned exploit allows the player to partake in a scene in which two characters engage in simulated sex. On the other hand the Grand Theft Auto Series and Liberty City Stories in particular have received positive reviews in the specialized press; IGN, GameSpot, and 1up —the sister website of the Playstation Magazine. Any game that generates as much presence on the media as Grand Theft Auto deserves a closer examination. Furthermore its clever use of sound provides as fertile ground for analysis. 3.1 Blue Arrows There are other sounds of a non-diegetic nature; most of them 'beeps'. Such 'beeps' accompany banners, or signs. The signs or banners are instructions, tips, and rules. These instructions are presented not as objects belonging to the fictional world but rather superimposed text, although part of the game. The texts, arrows, lights, pointers and sounds that indicate instructions or tips can be called 'blue arrows'. The term 'blue arrow' is used by Jesper Juul. [8] And lacking a better word to name the previously described instances 'blue arrow' will be used. In GTA LCS the “blue arrows”—which are in fact not blue arrows but yellow lights—appear also in the form of sound cues, 'beeps'. The use of 'blue arrows' does not seem excessively problematic. Alison McMachan points that most scholars and scientists agree that “total photo—and audio—realism is not necessary for a virtual reality to create immersion”. McMachan goes on to describe the requirements for an immersing environment; “the user's expectation of the game or environment must match the environments conventions fairly close”, the user's action must be reflected, have an effect, on the game or environment. And finally “the conventions of the world must be consistent even if they don't match the 'metaspace” . 
The use of non-diegetic sound remains consistent through out the game, marking the beginning, and ending—of both; individual missions and the game—and as aural 'blue arrows'.[11] 2 Welcome to the Worst Place in America Welcome to Liberty City, the “worst place in America”, nowhere to be found but on edge of reality. An intangible place, untouchable, ephemeral, fictional; yet it can be experienced, lived, suffered. Liberty City is the setting for the video game Grand Theft Auto Liberty City Stories and, like many other fictional places with names like Azeroth and Hyrule, it has become part of the the life of millions of people. As a media the video game, and its worlds, has truly become, as Marshall McLuhan would say, an extension of man, an extension of man's world. The following pages are but a little inquiry into the role that sound can play in creating the virtual playgrounds of the video game. 3.2 Mixing diegetic and non-diegetic sound. The simultaneous use of diegetic and non-diegetic sound , points to a complex system of symbols and a complex interaction between the user of the game and the game. According to Troy Innocent the electronic space of the game allows “multiple forms of representation to easily coexist”. [6] The player of GTA LCS has to be able to navigate between different uses of sound . The non-diegetic sounds are separated from what Aarseth calls “Game-world”—One of the three elements of games that Aarseth describes. [1] The player may assign different meaning to diegetic and non-diegetic sound, and use the information derived from such sounds appropriately. 2.1 Half-Real In his latest book, Half-Real, Jesper Juul, researcher at the IT University of Copenhagen, defines games as a “combination of rules and fiction”. If we are to take Juul’s statement as valid then a question arises; does sound in the video game obeys to the need to create a fictional world, or to create and maintain rules? Can sound do both, maintain fiction and rules, at the same time or are they mutually exclusive? These are the questions at the core of the present paper.[7] 3 Non-Diegetic Sound The graphic interface video games use is in more than one case composed of elements that belong to the fictional world of the game and other elements such as the HUD—Heads-Up display—or maps. The same applies to sound. Diegetic sound In the opening scene of Grand Theft Auto Liberty City Stories the main character, Tony Cipriani, arrives to Liberty City. This scene is fashioned as a movie scene; it use of background music—non-diegetic sound—a variety of camera angles, and 22 and Non-diegetic sound coexists on the same game. The examples of GTA III and the latest version of Zelda for the Game Cube are used by Juul as representatives of the use of blue arrows. Some games such as Shadow of The Colossus have tried to shed these blue arrows. One can only suggest that such attempts to relinquish the use of on-screen content that does not belong to the story space are made to create a more immersing world. In the most immersing environments reminders of the structural level of the game are gone and the player can concentrate on the game-world level. understanding the rules” (Juul 2005). The objective of the game is to complete missions that involve violence and crime, simultaneously the fictional world of Liberty City—the setting of the game—invites violence. Rules indicate commit crimes, and the fictional world indicates negotiation, interaction and compassion are not a part of this world. 
One can find examples of the coherence between the fictional world and the rules of the game. The video game rates the criminal ranking of the player; more crime and mayhem higher criminal ranking. The killing of characters is therefore viewed as a positive accomplishment in the game. On the other hand the character of Liberty City are for the most part unable to help the player successfully complete his missions. The characters that roam the streets of Liberty City provide no relevant information, or items that may help the character. The characters are however useful dead since they can drop weapons or cash—both useful tools. It would be the job of researches to investigate if techniques such as the non-use of HUDs and blue arrows affect immersion. In the case of GTA LCS the user can deactivate the the HUD and on-screen radar however he can not deactivate the blips and most blue arrows. The lack of such a feature is important because it denies the player the option of selecting the way he or she prefers to engage, and experience, the game. Making available those options could possibly create a variety of ways to experience the game. That significant possibility is an area open to further research. Since the majority of the sounds on the game GTA LCS are diegetic, and diegetic sound motivated the research, it is only appropriate to proceed to analyze it. The inability to communicate is manifested in several ways one of them aurally. The player soon notices, as both Leonard and Frasca did, that the utterances of the players are irrelevant and insignificant. This information is conveyed only aurally because although there is an option for subtitles this option does work on most characters. The cinematic scenes, the cut screens do get subtitles but the regular citizens that do not partake on the main story line can only be hear, not read. 4.1 A cold World David Leonard published a strong critique of Grand Theft Auto III. His article aimed to demonstrate the racial, social and class undertones behind GTA III. Leonard’s article is a searing criticism of what he identifies as white supremacy values being portrayed by the game. While the present study does not aim to give a moral evaluation of Grand Theft Auto, and while Leonard's criticism is of a different title to the one being used here Leonard’s text touches interesting aspects of the use of sound. He mentions that your enemies “have no voice or face”. He goes on to explain that the only rule that seems to dominate the main character world is “kill or be killed”. [10] The installment of Zelda for the SNES console contrasts with GTA LCS in various ways. First, the main character in Zelda can communicate with many of the characters on the game; they provide information, and items that are helpful. In such case the indiscriminate murder of characters becomes counterproductive—not to mention that it is not allowed. However in Zelda information is conveyed in the form of text. Aural feedback does not defines the world the same way that GTA. Newer games may use the same strategy of Zelda in that they make their characters capable of communicate and become useful. However these newer games can rely on sound to allow their characters to communicate. A different approach to Grand Theft Auto III was published by Gonzalo Frasca. He also referrers to the virtual inhabitants of GTA. Frasca describes the as “nothing short autistic” (Frasca 2003). The characters, inhabitants, of GTA III “remind the gamers that they are dealing with a bot” (Frasca 2003). 
The characters are dehumanized and objectified (Frasca 2003)—the criticism of Leonard. The similarities between Frasca's approach and Leonard's end there. Another example is the infamous prostitute trick. The main character, when riding a nice car, can drive close to a prostitute and slow down, the prostitute character then boards the car, and when the car is driven to an alley the main character will see his health indicator raise at the same time his money counter will lower. The indication that a sexual act may be taking place is also conveyed trough sound in form of phrases and car sounds since the car may be out of sight. In this instance is also impossible to negotiate or communicate, the only phrases are sexual in nature, or the offering of sexual services. The audio reflects the fact that the prostitute's usefulness is limited to raise the health meter of the main character. The fictional world is supports the fact that the main character may profit more of using and then shooting the prostitute—thus recovering the money. Such actions are reprehensible in the real world but useful in Liberty City. It is then again reflected in sound that the prostitute character serves a limited function in the game, and is more of an object, like every other inhabitant of Liberty City. 4 Diegetic Sound Leonard's findings lead him to criticize the game as nothing short of racist. Frasca on the other hand realizes that the dehumanization of the characters on the game and the social isolation of the main character—the playable character—allows the player to concentrate on his own actions. [3] The same analysis that Frasca does of GTA III applies to GTA LCS. The characters have not evolved a whole deal since the days of GTA III. The non-playable characters that populate the streets of Liberty City limit themselves to taunts, insult or screams, that is if they decide to talk at all. Even the elderly that Leonard described on GTA III as few of the innocent ones are mean or cowards, they will physically fight the main character, or run away. The attitudes of the inhabitants of GTA LCS as manifested through their utterances help define the kind of world that Liberty City is. Their short sentences and inability to talk important information to the player condemn them. whether it is the old man, the gangster or the prostitute they become, as was indicated in Leonard criticism and Frasca's essay, objects, bots. One may suppose that communication is possible since on In Half-Real Jesper Juul explains that the fiction aspect of video games plays a very important part in cuing the player “into 23 occasion groups of individuals are see standing together in what may resemble social interaction, but the illusion soon collapses; the groups are mostly gangs and no significant or important conversations take place. The video game has to cued the player and indicated him that that killing innocent people is not bad in the game. The player is now free as Frasca indicated to focus on his actions. The player is free of moral dilemmas since he was presented with a framework—the fictional world—that rewards violence, and this framework is presented partially trough sound it is a convergence of the fictional world, and the rules— including objectives of the game. “minimal use of diegetic sound”. [10] The use of sound in GTA LCS contrast heavily with JSRF. The player in GTA LCS is connected to the fictional world trough the radio. 
The radio, unlike the headphones in JSRF, does not mute the sounds of the city; instead it connects the player to the city, since the consequences of the player's actions become a news broadcast on the radio of Liberty City. The radio also reveals key aspects of the story line of the game; it announces, for example, the escalating war between the gangs. Such a use of sound is ingenious. The announcements on the radio serve multiple purposes: on one hand, they take the place of the infamous cut-scene. The cut-scene is described by Juul as a "non-interactive sequence of a game". Cut-scenes are problematic in several ways. As Juul explains, they can be a "non-game element". Cut-scenes deny the player the possibility of interacting. Cut-scenes also create a different representation of time. The in-game radio announcements do not disrupt the representation of time during the game as cut-scenes do. One may consider this a clever use of in-game objects to inform the player without disrupting game play. [8]

In recounting the history of video games, technology journalist J. C. Herz describes Doom as "deliciously clear-cut" (Herz 1997). The lack of moral ambiguities, the impossibility of humanizing demons, grants the player the freedom to blast through mazes. One could say that the more complex world of Grand Theft Auto has to work overtime in dehumanizing characters so that the player can blast through the streets of Liberty City free of ambiguities.

4.2 On the Radio

Another interesting aspect of the game is the use of radio stations. Each time a player carjacks a vehicle he is allowed to tune in to different 'radio stations'. These stations are prerecorded songs and programs that include, among others, a talk show that is sort of a mix between Fox News pundits, Jerry Springer, and The 700 Club: the show features a character named Nurse Bob and is called Heartland Values. The stations can also be turned off if the player so desires. The content of this show and the other stations helps define the game world in a completely aural way. The DJs are reactionary and often a parody of homosexuals, liberals, conservatives, the media, and pretty much everything else. The radio shows help define the violent, sexist world of Liberty City Stories. They present the player with a larger view of Liberty City. It is not only that the people one finds on the street are worthless; the whole society of Liberty City is presented through its media—the radio—as reactionary, intolerant, corrupt and perverted, not open to dialog. The radio stations further reinforce the message that not only the people but the whole society of Liberty City is despicable. The player is in a way relieved of any moral conflict.

There are examples of in-game objects being used to reveal important information or parts of the story. In Half-Real, Juul mentions Myst as an example where information is found in a book, in the game world. However, in the case of Myst the information is relayed visually, while in GTA LCS it is relayed aurally, with the option of subtitles.

The aural world of Liberty City surrounds the player. Police sirens, automobiles and rain sounds flood the user. The game sacrifices the use of mood-creating music and instead gives the player the noise of the city. In doing so the game creates an environment that surrounds the player; while the player does not have a 360° view of the game world, he can hear all around him.
The player can turn the character around and observe whats behind it, the player however cannot see both back and front simultaneously, he can on the other hand hear what is supposedly happening behind or around the avatar. Through that use of sound the illusion that a world extends beyond the screen is reinforced. Before advancing is important to notice that in the previous observations no moral judgment is given as to Grand Theft Auto as a game. And this is partially because the paper analyzes how information is used to construct a fictional. If the information and communications engaged in the game are then used outside the realm of fantasy is of no concern to this paper. That is not to say that such issues are of no concern to the researcher. However those issues should be approached in future research as well as by other academics. The ability of the player to hear the world around his character is an ability that proves to be useful during playtime. The main character is a criminal with very few friends, he may be even hunted by members of his same gang. The ability to hear police cars when the character is being looked by the police or to hear gunshots behind when visiting a rival gang territory can be a lifesaver. This ability is not always useful since the player may be in a situation in which the game can not be heard or the aural conditions are less than ideal. In this case the experience of the game changes, and such change can be further explored by other researchers. There is also an interesting comparison of GTA LCS with a different game, in this case Jet Set Radio Future. The focus of the comparison is how sound sets the player in relationship to the fictional world. The first installment of Grand Theft Auto provided a bird's eye view where the player could observe both front and back simultaneously, in this case the sound can be used for example to reveal that police cars are in the area although not visible yet. However the information that the character is being attacked from behind is better conveyed visually. The video game Jet Set Radio Future was published for the Dreamcast console. In JSRF players engage in Skateboarding battles, during the game the main characters wear headphones. Trough those headphones the players mute the sounds of the city and of its surroundings. Nicholls believes that “[t]he use of head sets in the game sets the player apart from the city that is transformed in the course of play” and furthermore this creates a situation in which “[t]he player is situated in a space that refuses the sociability of urban capital” all this accomplished trough the 5 Conclusions The previous pages constitute a collection of examples of uses of sound in Grand Theft Auto. The present study however covers the use of sound in a very superficial manner. There is 24 to Ludology, The Video Game Theory Reader. Ed. By Mark J. P. Wolf et al., New York, Routledge, 221-235 (2003) [5] Herz, H.C., Joystick Nation, New York, Little, Brown (1997) [6] Innocent, Troy, Exploring the Nature of Electronic Space Through Semiotic Morphism Melbourne Digital Arts and Culture Conference, RMIT University, (2003) <http://hypertext.rmit.edu.au/dac/papers/> [7] Juul, Jesper, Games Telling Stories-A Brief Note on Games and Narratives, Game Studies: The International Journal of Computer Game Research, v.1 n.1 (2001) <http://gamestudies.org/0101/juul- still a lot of work to be done in order to better analyze both sound and Grand Theft Auto. 
There are several issues that could not be covered by the present study. Future researchers will have to analyze the actual level of impact that particular use of sound has on play and the player. For instance it was proposed that sound helps define the world of GTA as violent and isolated, and such presentation of the world allowed the player to engage more easily on the criminal enterprise. However it was not clearly measured how relevant audio is in the creation of such a fictional world as compared to visuals, manuals, previous knowledge of GTA, advertisement or social interaction. It is possible that sound weights heavily in allowing the player to commit virtual crimes, it is also however possible that players rely on previous knowledge of GTA to be able to easily engage in simulated criminal activity. Further research is needed to understand what affects the understanding of a virtual world and at what degree. gtsHYPERLINK "http://gamestudies.org/0101/ryan" > [8] Juul, Jesper, Half-Real, Cambridge, MIT (2005) [9] Leonard, David, Live in Your World, Play in Ours: Race, Video Games, and Consuming the Other, v.3 (2003) [10] Nicholls, Brett, Ryan, Simon, Game, Space and the Politics of Cyberplay Melbourne DAC , RMIT University (2003). [11] Wolf, Mark J. P., and Bernard Perron, eds. The Video Game Theory Reader. New York, Routledge (2003) The ability of a player to move between the rules of the virtual world, fictional world, and the real world should also be investigated more deeply since such a research may indeed provide both researchers and academics with information as to what clues determine the behavior of a person. Such research may help create more deep and engaging video games while at the same time help understand connections between media and violence. Future research in the field of sound and music, and video games can also focus on financial issues and their repercussion on the production in video games. Both Grand Theft Auto San Andreas and Grand Theft Auto Vice City contained popular music from groups of the time period they depict. The sequel, Grand Theft Auto Liberty City Stories, did not contain a heavy roster of popular music. The price of licensing music makes it prohibitive. How are smaller developers coping with the costs of producing sound and music? Furthermore previous video games on the GTA series allowed the player to place MP3 music files in a folder and play them when riding a car in the game. The newer Liberty City Stories did not allowed such thing. A later software release allowed players to place their own tracks but not in the MP3 format and only if the player had a physical CD, no digital music files like those bought on iTunes. Such attitude makes one wonder if current attitudes toward Digital Rights Management and Copyright from the content industry are influencing what game developers can build. The video game has to be observed as a form that constantly changes and evolves, such changes have to be measured and understood, to discover more about both human communication and the future of the video game. The researchers have only dipped their toes into the vast pool that is the world of the video game. There is a wide open field of study, of which sound is only an part, a minimally studied part. Further research is needed in every single aspect. 
References [1] Aarseth, Espen, Playing Research: Methodological Approches to Game Analysis, Melbourne Digital Arts and Culture Conference, RMIT University (2003) <http://hypertext.rmit.edu.au/dac/papers/> [2] Aarseth, Espen, Cybertext, Baltimore, John Hopkins (1997). [3] Frasca, Gonzalo, Sim Sin City: Some Thoughts About Grand Theft Auto 3, Game Studies: The International Journal of Computer Game Research, v.3 n.2 (2003) [4] Frasca, Gonzalo, Simulation versus Narrative: Introduction 25 Physically based sonic interaction synthesis for computer games Rolf Nordahl, Stefania Serafin, Niels Böttcher and Steven Gelineck Medialogy, Aalborg University Copenhagen Lautrupvang 15 2750 Ballerup, DK rn, sts, nboe05, [email protected] Abstract. In this paper we describe a platform in which sounds synthesized in real-time by using physical models are integrated in a multimodal environment. We focus in particular on sound effects created by actions of the player in the environment such as waking on different surfaces and hitting different objects. The sound effects are implemented as extensions to the real-time sound synthesis engine Max/MSP.1 An 8-channel soundscape is spatialized using the vector based amplitude panning (VBAP) algorithm developed by VIlle Pulkki [17]. The sonic environment is connected through TCP/IP to Virtools.2 1 Introduction In computer games and virtual environments, pre-recorded samples are commonly used to simulate sounds produced by the physical interactions of objects in the environment, as well as sounds produced when a user acts in the scenario by, for example, walking on different surfaces and hitting different materials. This approach has several disadvantages: first of all the sound designer needs to gather a lot of sonic material corresponding to the different actions and events in the environment. This is usually done by using sound effects libraries or recording sound effects, in the same way as it is done by a Foley artist in the movie industry [13]. Moreover, sampled sounds are repetitive, and do not capture the subtle nuances and variations which occur when objects interact with different forces, velocities, at different locations, and so on. This is usually overcome by applying processing to the recorded sounds, so some random variations are present. However, by using sound synthesis by physical models these disadvantages can be overcome. Physical models are widely developed in the computer music community [19], where their main use has been the faithful simulation of existing musical instruments. One of the pioneers in the field of parametric sound effects for interactive applications such as computer games and virtual reality is Perry Cook. In his book [6], Cook describes several algorithms which allow to create synthesized musical instruments and sounding objects, mostly using physical principles. The issue of creating sound effects using synthetic models in order to syncronize soundtracks and animation was first explored in [20, 10] using a structure called Timbre Tree. Recently, synthetic sound models in computer animation have seen an increase of interest. Van den Doel et al. [12] propose modal synthesis [1] as an efficient yet accurate framework for the sonic simulation of interactions between different kinds of objects. The same synthesis technique has also been used by O’Brien et al. [16], as a computationally efficient alternative to the finite element based simulations proposed in [15]. 
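Modal synthesis renders a struck object as a sum of exponentially decaying sinusoidal modes whose amplitudes scale with the excitation. The following self-contained sketch illustrates the idea in its simplest form; the modal data are invented example values, and the code does not reproduce any of the systems cited here.

```cpp
// Minimal modal-synthesis sketch: an impact excites a bank of damped
// sinusoidal modes; frequencies, decays and gains are made-up example values.
#include <cmath>
#include <cstdio>
#include <vector>

struct Mode { float freqHz; float decayPerSec; float gain; };

// Render 'seconds' of a single impact of the given force into a sample buffer.
std::vector<float> renderImpact(const std::vector<Mode>& modes,
                                float force, float seconds, float sampleRate) {
    std::vector<float> out(static_cast<size_t>(seconds * sampleRate), 0.0f);
    for (const Mode& m : modes) {
        for (size_t n = 0; n < out.size(); ++n) {
            float t = n / sampleRate;
            out[n] += force * m.gain * std::exp(-m.decayPerSec * t)
                      * std::sin(2.0f * 3.14159265f * m.freqHz * t);
        }
    }
    return out;
}

int main() {
    // Example modes loosely evoking a struck wooden object (illustrative values).
    std::vector<Mode> wood = { { 280.0f,  6.0f, 0.80f },
                               { 730.0f,  9.0f, 0.40f },
                               {1250.0f, 14.0f, 0.25f } };
    std::vector<float> samples = renderImpact(wood, /*force=*/0.9f,
                                              /*seconds=*/1.0f, /*sampleRate=*/44100.0f);
    std::printf("rendered %zu samples\n", samples.size());
    return 0;
}
```

In a game setting, the impact force and the struck material's modal parameters would be supplied by the physics simulation, which is precisely the coupling argued for in this paper.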
Complex dynamical systems have also been simulated both sonically and visually by decomposing them into a multitude of interacting particles [3], in a system called CORDIS-ANIMA. In it, discrete mass-spring-damper systems interact with nonlinearities representing the input excitations. In this paper, we describe a framework for real-time sound synthesis by physical models of different interactions in a computer game. We focus in particular on impact and friction sounds produced when a player interacts with objects in the environment. While the scenario's soundscape and the ambient sounds are created by using sampled sounds, the focus of this paper is on sounds produced by actions of the player. Examples are the sounds produced when the player hits hard objects or scrapes against surfaces of different materials with different forces and velocities. Such sounds are well suited to be simulated using physical models, especially given the fact that nowadays most game engines have a physically based physics engine in which forces and velocities of impacts and friction are calculated. Such physical parameters can be used as input parameters to the sound synthesis engine. We are particularly interested in creating physically based sound models that are rich enough to convey information about a specific environment yet efficient enough to run in real-time and respond continuously to user or system control signals. In [9, 8], Gaver proposes a map of everyday sound producing events. Examples of basic level events might include hitting a solid, scraping it, explosions, and dripping noises. More complex events, then, can be understood in terms of combinations of basic-level ones, combinations which are structured in ways which add information to their simpler constituents. Different platforms which allow one to obtain sound synthesis by physical models are already available in the computer music community, although they have not yet been exploited in computer games. As an example, the Synthesis Toolkit (STK) by Perry Cook and Gary Scavone [5] is a collection of C++ classes which implement physical models of different musical instruments, mostly using the digital waveguides technique [19]. Another example is JASS (Java Audio Synthesis System) by Kees van den Doel [12], a unit generator synthesis program written in Java, which implements physical models of different sound effects based mostly on modal synthesis [1]. The current development of novel interfaces for games, such as the Nintendo Wii,3 stimulates the implementation of a tighter connection between gestures of the user and corresponding sounds produced [2]. This connection is strongly exploited in the computer music community, where so-called new interfaces for musical expression are developed to control several sound synthesis algorithms,4 but it is not yet fully exploited in computer games and virtual reality applications. We believe that a stronger connection between the player's gestures and the resulting sonic environment can be obtained by using sound synthesis by physical models. The paper is organized as follows.
Section 2 introduces a multimodal architecture where sound synthesis by physical models has been integrated; Section 3 describes our strategies to track positions and actions of the user; Section 4 describes how the interactive sounds and the soundscape have been implemented; Section 5 introduces the visualization technique adopted, while Sections 6 and 7 present an application, and conclusions and future perspectives, respectively. 2 A multimodal architecture Figure 1 shows a multimodal architecture in which sound synthesis by physical models has been integrated. The goal of this platform is to be able to precisely track positions and actions of the user, and map them to meaningful visual and auditory feedback. The position of the user is tracked by using a 3D magnetic tracker produced by Polhemus.5 Moreover, a pair of sandals equipped with force sensitive resistors (FSRs) allows the system to detect when a user performs a step in the environment, together with the force of the impact. Such input parameters are mapped to the footstep sounds, which are synthesized using physical models. The Polhemus tracker is connected to the PC running Virtools, i.e., the visual rendering and game engine, while the footsteps controller is connected to the PC running Max/MSP. The two computers communicate through TCP/IP. Finally, the synthesized interactive sounds, together with the ambient sounds, are spatialized to an 8-channel surround sound system. In the following, each component of the environment is described in more detail. We start by describing the tracking systems used, since they are the input to the interactive sound design. 3 Tracking the user As mentioned above, the position and motion of the user are tracked in real-time using a Polhemus Fastrack tracker and an ad-hoc designed footsteps' controller. Figure 1: Connection of the different hardware and software components in the multimodal architecture. Two computers, providing the visual and auditory rendering respectively, communicate in real-time the tracker's data and the sound synthesis engine status. 3.2 The footsteps' controller The users visiting the environment are asked to wear a pair of sandals embedded with pressure sensitive sensors, placed one in each heel as shown in Figure 2. Such sandals are wirelessly connected to a receiver, which communicates with the Max/MSP platform by using an ad-hoc designed interface [7]. 3.1 The magnetic tracker The Fastrack computes the position and orientation of a small receiver placed on top of a hat worn by the user, as shown in Figure 4.
This device provides six degrees of freedom measurement of position (X, Y, and Z Cartesian coordinates) and orientation (azimuth, elevation, and roll), which are mapped to the sound engine as described later. Given the limited range of the tracker of about 1.5 meters, the receiver was placed in the center of the 8-channel configuration. 3 wii.nintendo.com/ 4 More information on this issue can be found in the proceedings of the New Interfaces for Musical Expression (NIME) conference, www.nime.org 5 www.polhemus.com Although sensing only the pressure of the impact on the floor does not allow one to track all the parameters of a person walking in the environment, and more sophisticated footsteps' controllers have been built (see, for example, [11]), experiments with our configuration show that motion of subjects and sense of presence are significantly enhanced when self-sounds are added and controlled by this device [14]. Figure 4: The Polhemus magnetic sensor is placed on the user's head, so auditory and visual feedback can be rendered according to the position and orientation of the user. Figure 2: The interactive sandals are equipped with pressure sensors which trigger footstep sounds and forward movement in the virtual world. Figure 3: The setup of the 8 speaker system. The magnetic tracker emitter is situated in the center, directly above the user. 4 Sound design Non-speech sounds in computer games can be divided into soundscape or environmental sounds and sound effects. Soundscapes and environmental sounds are the typical sonic landmarks of an environment. They are usually reproduced by recording and manipulation of existing sounds, and do not strongly depend on the actions of the users. On the other hand, sound effects are usually produced by actions of the user in the environment, or by interaction between objects, and they can strongly depend on events in the environment. Such sounds are highly dynamic and vary drastically depending on the interactions and objects, and are therefore difficult to create in a pre-production process. We decided to use sound synthesis by physical models for the creation of sound effects, and pre-recorded samples for the creation of the soundscape. 4.1 Interactive footsteps Footsteps recorded on seven different surfaces were obtained from the Hollywood Edge Sound Effects library.6 The surfaces used were metal, wood, grass, bricks, tiles, gravel and snow. Such surfaces were resynthesized using modal synthesis [1] and physically informed sonic models (PhISM) [6, 4]. The footsteps' synthesizer was implemented as an external object in the Max/MSP platform. The control parameters of the synthetic footsteps were the fundamental frequency of each step and the amplitude and duration of each step. The amplitude and duration of each step were directly controlled by the users thanks to the pressure-sensitive shoes. The sensors controlled the frequency of the steps, as well as their duration and amplitude. To enhance variations among different steps, the fundamental frequency of each step was varied randomly. The different surfaces varied according to the different scenarios of the game in which the user was present. As an example, when the user was navigating around a garden, the grass surface was synthesized, which instantly became a wood sound when the user walked onto a hardwood floor. 6 www.hollywoodedge.com 4.2 3D sound The pre-designed soundscape, which implemented the ambient sounds, was spatialized to an 8-channel system using the vector base amplitude panning technique (VBAP).
VBAP is a method for positioning virtual sources across multiple loudspeakers developed by Ville Pulkki [17]. The number of loudspeakers can vary and they can be placed in arbitrary 2D or 3D positions. In our situation, we chose a 3D configuration with 8 loudspeakers positioned at the vertices of a cube, as shown in Figure 3. This is to preserve the same configuration as in CAVE systems. The goal of VBAP is to produce virtual sources which are positioned at a specific elevation and azimuth specified by the user. The idea behind VBAP is to extend the traditional panning techniques for two loudspeakers to a configuration of multiple speakers. We used the VBAP algorithm to position the ambient sound in a 3D space. Such sounds are prerecorded samples which are positioned in a 3D space in real-time using the Max/MSP implementation of the VBAP algorithm. The algorithm also allows the simulation of realistic moving sound sources, by continuously varying the elevation and azimuth of the different input sounds. 5 Visual feedback The visual feedback was delivered using a 2.5x2.5x2.5 m single screen. 3D visualization was delivered using anaglyph stereo and implemented in the Virtools platform. Virtools is a powerful game engine, which provides both block-based programming, in a similar way to Max/MSP, and the implementation of one's own blocks in C++. The 3D stereo was rendered using two Nvidia GeForce graphics cards.7 A connection between Max/MSP and Virtools was obtained by using the flashserver object in Max/MSP8 and the NSClient BB developed in Virtools.9 6 Application: a hide-and-seek game In order to test the capabilities of the platform, a hide-and-seek game was developed. In this multi-user game the players have to find each other or escape from each other in a virtual environment. In the implemented example, the scenario is a small town. The idea behind the game is the connection between two VR CAVEs, with a user in each of them. The users are equipped with headset microphones, so they can communicate during the game. The sound of the other person is then panned to the exact position of that user in the game. By using auditory cues from the interactive sandals, one user can also derive the location and position of the other person. The users are represented by avatars. They are only able to see the other user's avatar, not their own. Two persons outside the environment are connected to the game via a LAN. They can communicate with the users inside the game, and their goal is to transmit information about the location of the opponent. The external users are also able to upload 3D objects or sounds into the game in real-time. In this way, they are able to disturb the opposing user and enhance the atmosphere by varying the current soundscape. Figure 6: Setup of the game with four users. 7 Conclusion In this paper we have described a multimodal architecture where interactive sounds synthesized by physical models as well as ambient soundscapes have been integrated. As done in [12] and [18], our current focus is on impact and friction sounds produced by actions of the user while interacting in the environment. In particular, we have focused our description on the use of footstep sounds, since they play an important role in game design. We are currently extending this architecture to the use of action sounds produced by interaction of the user with other body parts, such as sounds produced when the user hits, grabs and touches objects in the environment.
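To make the footstep control of Section 4.1 concrete, the sketch below maps a sensed heel pressure to the amplitude and duration of one synthetic step and jitters its fundamental frequency per step, as described above. The numeric ranges and the notion of a per-surface base frequency are illustrative assumptions, not the parameters of the actual Max/MSP external.

```python
import random

def footstep_params(pressure, surface_f0=90.0):
    """Map a normalized heel pressure (0..1) to synthesis parameters for one step.

    Returns (f0_hz, amplitude, duration_s). The fundamental frequency is
    randomized per step so that successive steps are not identical.
    """
    f0 = surface_f0 * random.uniform(0.9, 1.1)   # random per-step variation
    amplitude = 0.2 + 0.8 * pressure             # harder steps are louder
    duration = 0.15 + 0.25 * pressure            # and slightly longer
    return f0, amplitude, duration

# Example: a firm step on a hypothetical "gravel-like" surface preset.
print(footstep_params(pressure=0.7, surface_f0=140.0))
```

In the architecture described here, these three values would be the message sent from the footsteps' controller to the synthesizer each time an FSR registers a heel strike.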
As mentioned in the introduction, computer games currently released on the market use sampled sounds instead of computer generated sounds. The main reason for this choice is, on the one side, the high computational cost of producing high-fidelity sound synthesis by physical models and, on the other side, the lack of sound quality of most synthesized sounds. Even in the field of musical instruments, which have been synthesized by using physical models for more than three decades, the quality of physical models is not yet as high as that of the original instruments they are trying to simulate. Of course much progress has been made in this area, but we are not yet at a point where physical models can be used in commercial applications. We are currently conducting experiments to understand if the use of physically based sounds enhances realism and quality of the interaction in a game. References [1] J.M. Adrien. The missing link: Modal synthesis. In G. De Poli, A. Piccialli, and C. Roads, eds., Representations of Musical Signals, MIT Press, 1991. [2] T. Blaine. The convergence of alternate controllers and musical interfaces in interactive entertainment. In Proc. International Conference on New Interfaces for Musical Expression (NIME05), 2005. [3] C. Cadoz, A. Luciani, and J.-L. Florens. Physical models for music and animated image. The use of CORDIS-ANIMA in Esquisses: a Music film by Acroe. In Proc. Int. Computer Music Conf., Aarhus, Denmark, Sept. 1994. Figure 5: The view of one user playing the hide-and-seek game. 7 www.nvidia.com 8 The flashserver object for Max/MSP was developed by Olaf Matthes. 9 The NSClient BB was developed by Smilen Dimitrov at Aalborg University in Copenhagen. [4] P. Cook. Physically informed sonic models (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3):38–49, 1997. [5] Perry R. Cook. Synthesis Toolkit in C++, version 1.0. In SIGGRAPH Proceedings. Assoc. Comp. Mach., May 1996. [6] P.R. Cook. Real Sound Synthesis for Interactive Applications. AK Peters, Ltd., Natick, MA, USA, 2002. [7] S. Dimitrov and S. Serafin. A simple practical approach to a wireless data acquisition board. In Proc. NIME, 2005. [8] W. Gaver. How do we hear in the world?: Explorations in ecological acoustics. Ecological Psychology, 5(4):285–313, 1993. [9] W. Gaver. What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5(1):1–29, 1993. [10] J. Hahn, J. Geigel, J. Lee, L. Gritz, T. Takala, and S. Mishra. An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation, 6(2):109–129, 1995. [11] J. Paradiso, K. Hsiao, A. Benbasat, and Z. Teegarden. Design and implementation of expressive footwear. IBM Systems Journal, 39(3):511–529, 2000. [12] K. van den Doel, P. Kry, and D. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In Proc. ACM SIGGRAPH, 2001. [13] R.L. Mott. Sound Effects: Radio, TV, and Film. Focal Press, 1990. [14] R. Nordahl. Design and evaluation of a multimodal footsteps controller with VR applications. In Proc. Enactive, 2005. [15] J. O'Brien, P. R. Cook, and G. Essl. Synthesizing Sounds from Physically Based Motion. In Proc. Siggraph, Computer Graphics Proceedings, pages 529–536, 2001. [16] J. O'Brien, C. Shen, and C. Gatchalian. Synthesizing Sounds from Rigid-Body Simulations. In Proc. Siggraph, Computer Graphics Proceedings, pages 175–203, 2002. [17] V. Pulkki. Generic panning tools for Max/MSP. In Proc. ICMC, 2000. [18] D. Rocchesso.
Physically-based sounding objects, as we develop them today. Journal of New Music Research, 33(3):305–313, 2004. [19] J.O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91, 1992. [20] T. Takala and J. Hahn. Sound Rendering. In Proc. Siggraph, pages 211–220, 1993. The Composition-Instrument: musical emergence and interaction Norbert Herber Indiana University Bloomington Department of Telecommunications Radio-TV Center 1229 E. 7th St. Bloomington, IN 47405 USA [email protected] Abstract. As a musician and sound artist, I have always understood the process of composition as the conception and organization of musical ideas, and an instrument as something that provides the necessary apparatus to realize such a work. However, my recent work with computer games and digital media has led me to become increasingly curious to blur the lines between these terms and consider a coalescence of "composition" and "instrument." In digital games and other environments of telematic interaction, a composed musical work can both stand alone and provide a point of individual musical departure. Heard on its own the piece creates an experience of sound. But when altered by one or several users in the course of an interaction, it serves as an agent for further musical expression, exploration, and improvisation. The composition-instrument is a work that can play and be played simultaneously. This paper, building on a research project conducted in the summer of 2006, examines the synergies found in the experimental music of Earle Brown and Terry Riley, Free Improvisation, the game pieces of John Zorn, generative music, the interactive works of Toshio Iwai, contemporary music practice based on file sharing, electronic instrument construction, and computer game design. Across these disparate genres there is a confluence of technical and aesthetic sensibilities—a point at which the idea of a "composition-instrument" can be explored. Examples and previous research by the author are used to focus the discussion, including a work based on swarm intelligence and telematic interaction. 1 Introduction In the conventional practice of music, the process of composition can be understood as the conception and organization of musical ideas, whereas an instrument provides the equipment necessary to realize such a work. In contemporary interactive media such as multimedia web sites, computer games, and other interactive applications involving the personal computer and mobile devices, this distinction remains largely the same. The composition of the music heard in these environments consists of musical statements to be heard and instructions to be executed in the course of an interaction. Often these structures call for a great deal of random sequencing and repetition following a linear structure. [1][2] The instrument can be simulated in software and manipulated using the inputs of an interactive system. It is usually represented as a database of recordings or samples. Composition and instrument are treated as distinct in the structure underlying the media product and function in their traditionally separate roles. A composition-instrument is not a specific piece of music or interactive work in itself but a means of approaching any work where music can be created and transformed.
Composition-instrument is a conceptual framework that helps facilitate the creation of musical systems for interactive media, art, and telematic environments. This paper will discuss the historical context of this compositional approach and show how it is beginning to emerge in the current field of interactive media. The example of an original work aspires to demonstrate how a composition-instrument approach to music exhibits a congruity with the emergent nature of the medium. And finally, discussion of a contemporary computer game project exposes the potential of this musical concept in the world of games, digital art, and telematic media. This separation, while not wholly damaging to the experience of the media, should not be immune from scrutiny. Music that operates in a binary, linear mode does little to recognize the emergence, or becoming, that one experiences in the course of an interactive exchange. A traditional, narrative compositional approach leaves no room for the potential of a becoming of music. There is need for a critique of music in contemporary interactive media. The emergent, non-linear experience of interactivity is incongruous with the overly repetitive, linear music that is often heard in this field. It is time to ask: What kinds of compositional techniques can be used to create a music that recognizes the emergence and the potential of becoming found in a digitally-based or telematic interaction with art and media? 1.1 Composition-instrument Blurring the traditionally distinct roles of composition and instrument provides one possible answer to this question. This approach allows a piece of music to play, or undergo a performance, like a traditional composition. When it plays it allows listeners or users to have a musical experience of sound. But it can also be played like a conventional instrument. This treatment allows the musical output of the work to be modified by users in the course of an interaction. This "instrumentalization" transforms the work into an agent for further musical expression and exploration. Thus, a composition-instrument is a work that can play and be played simultaneously. 2 History Though the idea of a composition-instrument hybrid is situated in the praxis of computer games, telematic media and digital art, the historical precursors to this kind of compositional approach lie in an entirely different field and stem from three different musical traditions: Experimental, Improvisatory, and Generative. Each of these traditions has established aesthetic approaches, creative processes, and musical style. A historical perspective helps to reveal how these attributes can be woven into the fabric of a compositional approach for music that operates in art and media environments with telematic and digitally based interaction. 2.1 Experimental Music The roots of a composition-instrument approach can be found in Experimental music. American composer Earle Brown was looking for ways to open musical form and incorporate elements of improvisation into his music during the 1950s. He found a great deal of inspiration in the mobiles of sculptor Alexander Calder. Brown described them to improvising guitarist and author Derek Bailey as, "…transforming works of art, I mean they have indigenous transformational factors in their construction, and this seemed to me to be just beautiful. As you walk into a museum and you look at a mobile you see a configuration that's moving very subtly. You walk in the same building the next day and it's a different configuration yet it's the same piece, the same work by Calder." [3] Free improvised music depends upon some amount of organization, even if it is minimal.
In musical situations where there is no preparation or discussion of musical intentions, an established rapport or relationship between performers serves as a kind of composition. This provides organization through familiarity and shared sensibilities. Borgo describes an improvising ensemble as an “open system” that emerges from bottom-up processes driven by players’ relationships and interactions, their training, and environmental factors. Listening is also a huge factor because it regulates the dynamics of the performance. Players are constantly aware of their contributions as well as the contributions of others, and make split-second decisions based on the overall musical output of the group. Brown’s thoughts on musical structure are also noted by Michael Nyman in “Experimental Music: Cage and Beyond.” Brown emphasizes that one importance of composition is to be both a means of sonic identification and musical point-of-departure. “There must be a fixed (even if flexible) sound-content, to establish the character of the work, in order to be called ‘open’ or ‘available’ form. We recognize people regardless of what they are doing or saying or how they are dressed if their basic identity has been established as a constant but flexible function of being alive.” [4] Brown was interested in approaching music with an openness that allowed every performance to render a unique musical output that retains the essential character of the work. These compositional ideas, however, were not exclusive to Brown and his music. Composition in this genre can be more formalized as well. Saxophonist Steve Lacy talks very openly about how he uses composition as a means of mobilizing a performance and creating a musically fertile situation that can nurture an improvisational performance. He stated, “I’m attracted to improvisation because of something I value. That is a freshness, a certain quality, which can only be obtained through improvisation, something you cannot possibly get from writing. It is something to do with ‘edge’. Always being on the brink of the unknown and being prepared for the leap. And when you go on out there you have all your years of preparation and all your sensibilities and your prepared means but it is a leap into the unknown. If through that leap you find something then it has a value which I don’t think can be found in any other way. I place a higher value on that than on what you can prepare. But I am also hooked on what you can prepare, especially in the way that it can take you to the edge. What I write is to take you to the edge safely so that you can go on out there and find this other stuff.” [3] Terry Riley’s In C, composed in 1964, is a seminal work in both the Experimental and Minimalist music traditions that shares in the compositional approach discussed by Brown. The piece consists of 53 melodic phrases (or patterns) and can be performed by any number of players. The piece is notated, but was conceived with an improvisatory spirit that demands careful listening by all involved in the performance. Players are asked to perform each of the 53 phrases in order, but may advance at their own pace, repeating a phrase or a resting between phrases as they see fit. Performers are asked to try to stay within two or three phrases of each other and should not fall too far behind or rush ahead of the rest of the group. An eighth note pulse played on the high C’s of a piano or mallet instrument helps regulate the tempo, as it is essential to play each phrase in strict rhythm. 
[5][6] 2.3 Game Pieces A similar aesthetic is evident in John Zorn's compositional approach to his game pieces, which he considered as a latter-day version of Riley's In C, "… something that is fun to play, relatively easy, written on one sheet of paper. Game pieces came about through improvising with other people, seeing that things I wanted to have happen weren't happening." [10] Zorn discusses the compositional direction he followed, "The game pieces worked because I was collaborating with improvisers who had developed very personal languages, and I could harness those languages in ways that made the players feel they were creating and participating. In these pieces, they were not being told what to do. You don't tell a great improviser what to do—they're going to get bored right away." [10] The musical outcome of In C is a seething texture of melodic patterns in which phrases emerge, transform, and dissolve in a continuous organic process. Though the 53 patterns are prescribed, the choices made by individual musicians will inevitably vary, leading to an inimitable version of the piece every time it is performed. Riley's composition reflects the imperative of self-identification expressed by Brown, but it also illustrates some of John Cage's thoughts on Experimental music, when he writes that the "experiment" is essentially a composition "the outcome of which is unknown." [7] In performance, In C has indefinite outcomes and yet is always recognizable as In C due to the "personality" of the composition—the patterns and performance directions that comprise the work. In an interview with Christopher Cox, Zorn explains his rationale behind this position. He emphasizes how the individuality of the players he selected to perform the game pieces was an essential part of the compositional process, "I wanted to find something to harness the personal languages that the improvisers had developed on their own, languages that were so idiosyncratic as to be almost un-notate-able (to write it down would be to ruin it). The answer for me was to deal with form not with content, with relationships not with sound." [11] Zorn understood the musicians in his ensemble and knew what they were and were not interested in playing. He was able to situate their personal musical vocabularies in a larger structure that allowed for freedom and individual expression while also satisfying his own musical objectives. 2.2 Free Improvisation There are links between Experimental music practice and improvisatory music. Free Improvisation is a good example of this. The genre took root in Europe in the early 1960s, with London, England serving as a major hub in its development. [3] This genre, in spite of labels and stereotypes, still involved elements of composition. One instance of this can be found in the coalescence of performing groups. In his essay "Les Instants Composés," Dan Warburton notes that "The majority of professional improvisers are choosy about who they play with…and tend to restrict themselves to their own personal repertoire of techniques." [8] David Borgo, in a recent publication on music improvisation and complex systems [9], acknowledges that this characteristic in free improvisation praxis comprises an important aspect of the musical organization and composition in these performances.
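Before turning to generative music, it is worth noting how close Riley's In C (Section 2.1) already is to an algorithm: every player works through the same 53 patterns at their own pace while staying within a few patterns of the rest of the ensemble. The following is only a minimal simulation of that process; it tracks which pattern each player is repeating (the musical content of the patterns is omitted), and the advance probability is an arbitrary choice, not something specified in the score.

```python
import random

def simulate_in_c(num_players=8, num_patterns=53, max_spread=3, max_steps=2000):
    """Simulate which In C pattern each player is repeating over time.

    At each step a player either repeats its current pattern or advances,
    but never moves more than `max_spread` patterns ahead of the slowest player.
    """
    position = [0] * num_players
    history = []
    while min(position) < num_patterns - 1 and len(history) < max_steps:
        for p in range(num_players):
            ahead = position[p] - min(position)
            if position[p] < num_patterns - 1 and ahead < max_spread and random.random() < 0.3:
                position[p] += 1
        history.append(list(position))
    return history

trace = simulate_in_c()
print(trace[-1])  # final pattern index reached by each simulated player
```

Every run produces a different trajectory through the same fixed material, which is precisely the "indefinite outcomes, recognizable identity" quality discussed above.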
2.4 Generative Music Experimental music composition, and techniques or processes of composition found in various forms of improvised music, are similar to the work involved in modeling an emergent, self-organizing system. Generally, all involve a bottom-up structural approach that generates emergent dynamics through a lack of centralized control. The same can be said of generative music. Musician, composer, and visual artist Brian Eno has been working with a variety of generative structures throughout his career. He looks at works like In C, or anything where the composer makes no top-down directions, as precursors to generative music. In these works detailed directions are not provided. Instead there is "a set of conditions by which something will come into existence." [12] Change occurs by way of interaction but also through various means of generation. All of the works discussed here contain, in some way, generative processes that affect the sound as well as the visuals and overall experience of the piece. These processes occur in a variety of ways including telematic exchange, random ordering and selection, and computer algorithms. Depending upon the nature of the work, several generative processes may be used, each in a different way, leading to a unique experience for the end-user or listener. As discussed earlier, emergence is an important quality heard in Experimental, free-improvised, and generative music. It is also a fundamental aspect of contemporary digital art works, and can arise from a variety of sources, "ordering itself from a multiplicity of chaotic interactions." [14] The pieces discussed here are no exception. Whether through the layering of sonic and visual patterns, navigation of a data space, evolutionary algorithms, or telematic exchange, one cannot ignore the emergent properties that characterize these works. Eno's influential Ambient recording Music for Airports was created using generative techniques [13]. Rather than deal directly with notes and form, generative composers create systems with musical potential. Eno refers to this as "…making seeds rather than forests," and "…letting the forests grow themselves," drawing on useful metaphors from arboriculture. An important aspect of this approach, however, is in setting constraints so that the generative system is able to produce what its creator (and hopefully others) will find to be interesting. In a recent conversation with Will Wright, the designer of The Sims and SimCity, Eno explains the reasoning behind this, "You have to care about your inputs and your systems a lot more since you aren't designing the whole thing (you are not specifying in detail the whole thing) you're making something that by definition is going to generate itself in a different way at different times." [13] 3.1 Electroplankton Electroplankton, created for the Nintendo DS game system by Toshio Iwai, was released in Japan in 2005, and later in Europe and North America in 2006. Iwai writes that the idea draws on his fascination with different objects across the course of his life—a microscope, a tape recorder, a synthesizer, and the Nintendo Entertainment System (NES). [15] Some consider it a game; others a musical toy. Either way, Electroplankton captivates player and audience alike with its engaging use of sound and animation controlled via the touch-sensitive screen of the Nintendo DS device. Using a stylus, players are able to draw, twirl, tap, and sweep an array of animated plankton characters on the screen. There are ten different plankton "species," each with its own sounds and sound-producing characteristics.
Plankton and their behavior are linked to a pitched sound or a short recording made by the player using the device's built-in microphone. Manipulating an individual plankton (or its environment) initiates a change in the sound(s) associated with it—a different pitch, timbre, rhythm, phrase length, and so on. As multiple plankton are manipulated, a shift in the overall sonic output of the system is apparent, causing the music of Electroplankton to produce textural patterns and foreground/background modulations similar to those of In C (as described earlier). These techniques—experimental, improvisatory, and generative—exhibit in their emergence a becoming. With each, the simple rules or relationships that form a composition act together and lead to unexpected, unpredictable, or novel results. Musical gestures show a ripple of promise, take ephemeral form, and then dissipate. Often this process requires a great investment of attention and time on the part of the listener. Time is especially important in Generative music, where the intentions are not to produce an immediate effect or shock of perception, but a gradual transformation as sounds are heard in the ebb and flow of the generative process. This quality of becoming can be similar to the emergence of a telematic environment or an experience with interactive art or media. Interactions with the plankton turn the Nintendo DS into an instrument that can be played purposely through the manipulation of the onscreen animations. Simultaneously, the software programming that links sounds to the plankton and their environment represents a musical ordering, or composition, that is implicit in Electroplankton. The coupling of these attributes perfectly illustrates how the combination or blurring of composition and instrument can lead to an interactive work with profound musical potential. 3 Contemporary related works While a true blurring of composition and instrument has not been fully realized in contemporary practice, there are a number of works that show the potential embedded in this approach. All examples discussed here demonstrate the latent quality of "composition-instrument" in the current art and media landscape. All of these works share three characteristics: asynchrony, emergence, and generative-ness. Asynchrony is a key factor in the processes of interaction. An input will have an effect on the output of the system, but it may not be immediately or fully apparent at the moment of interaction. While at first this approach may seem misleading or unresponsive, it is essential in shaping the music and the listening experience it creates. Whereas an immediate response would cause users to focus on functionality and "what it (the software/music) can do," a delay—however slight—helps keep them focused on listening and allows for a more gradual and introspective process of discovery. Additionally, it retains the potential for musical surprise. Listeners know that the music is changing but they are unlikely to be able to anticipate the nature of its transformation. 3.2 Additional Examples The musical qualities embedded in Electroplankton provide a clear—but not the sole—example of ways in which a composition-instrument approach is latent in contemporary games and digital art works. Following are several short descriptions of additional projects that share a similar musical sensibility. To retain the focus of this paper, lengthy discussions have been avoided.
However, readers are encouraged to pursue further investigation into these projects beginning with the web sites provided here. 3.2.1 Rez Rez, designed by Tetsuya Mizuguchi for Sega Dreamcast and Sony Playstation 2, is described as a musical shooter game. Players enter the cyber world of a sleeping computer network to destroy viruses and awaken the system. [16] Each successful shot leads to the performance of sounds and musical phrases that perform/compose the soundtrack for Rez in real-time as a direct result of the game play. Both the visual and audio experience lead players into an immersive, trance-like state that makes the game incredibly captivating. More information on Rez can be found at www.sonicteam.com/rez. Readers may also be interested to see other musically focused games that require physical or "twitch" skills such as Amplitude, Band Brothers (a.k.a. Jam With the Band or Dai Gassou! Band Brothers), Dance Dance Revolution (a.k.a. Dancing Stage), and Guitar Hero. 4 The Composition-Instrument in Contemporary Projects As stated earlier, a composition-instrument approach is latent in contemporary practice. There are many excellent projects where the seeds of this approach are visible but no single work has yet realized the full potential bound within the idea. Following is a discussion of projects that either seek to—or have great potential to—embody the composition-instrument approach. 4.1 Perturb as a Model of Interaction Perturb is a project developed by the author in tandem with the research that helped inform this paper. It was created with the intent to provide a very basic and clear illustration of the composition-instrument idea. Perturb shows how music can be composed and performed in real-time via generative systems and user interaction. 3.2.2 Eden Eden, by Jon McCormack, is described as an "interactive, self-generating, artificial ecosystem." [17] In more general terms, it is a generative installation artwork of sound, light and animation, driven by Artificial Life systems and environmental sensors. [18] Eden situates visitors in a room, standing outside the virtual ecosystem that is represented by a projected, cellular lattice in the room's center. A visitor's presence in the room can impact the ecosystem favorably. Someone standing in a particular location makes the adjacent space more fertile for the creatures, or "sonic agents," that inhabit Eden. The lives of these creatures involve eating, mating, fighting, moving about the environment, and central to the musical character of the piece—singing. One way or another, all of these activities lead to both the visual and aural aspects that comprise the work. More information about Eden and McCormack's publications can be found at www.csse.monash.edu.au/~jonmc/projects/eden/eden.html.
Similarly, if an unresponsive musical environment obscures interactions and "play" dominates the experience, the work loses its novelty in being tied to the course of a user's interaction. The composition-instrument approach permits an equilibrium between these two and, as a result, acknowledges user interactions as perturbations in the overall musical system. In this context a perturbation is understood as a ripple sent through the musical system due to an interaction. It does not take on the clear cause-effect nature of a musical instrument (press a key to hear a note, for example). Instead it allows interactions to manifest as sound, gradually following the course of the composition's generative process. Perturbations introduce new sounds into the composition's aural palette and can subtly reshape the musical character of the work. 3.2.3 Intelligent Street Intelligent Street was a telematic sound installation where users could compose their sound environment through SMS messages sent via mobile phone. [19] The piece was developed in 2003 by Henrik Lörstad, Mark d'Inverno, and John Eacott, with help from the Ambigence Group. Intelligent Street was situated simultaneously at the University of Westminster, London and the Interactive Institute, Piteå, Sweden via live video connection. Users at either end of the connection were able to see and hear the results of their interactions. Using freely associated, non-musical terms such as "air" or "mellow," participants sent an SMS message to Intelligent Street, and were able to hear how their contribution impacted the overall composition. [19] Simultaneously, all received messages were superimposed over the video feed to create a graphic representation of the audible sounds at any given time. Intelligent Street showed how music could be used to set the mood of a physical space through processes of cooperation and composition across groups of people in distributed environments. [20] Further information about Intelligent Street is available at John Eacott's web site (www.informal.org), Henrik Lörstad's web site (www.lorstad.se/Lorstad/musik.html), and the Interactive Institute of Sweden (www.tii.se/sonic.backup/intelligentstreet). As a basic illustration of the composition-instrument approach, Perturb consists solely of an interface for introducing perturbations into the musical system. It offers nine separate modules that can hold sound samples. Running alongside the nine modules is a generative musical system based on the Particle Swarm Optimization algorithm developed by Kennedy and Eberhart [22][23]. The swarm has nine agents that correspond to each of the nine sound modules of the interface. As the system runs, the dynamics of individual agents within the swarm send cue messages that tell a module to play one of its attached sound samples. Users have the ability to attach an array of preset sounds, or they can attach sounds on an individual basis. Either way, when an agent sends a cue message to its sound module, a randomly selected sound from the module is heard. As all agents act together, the music of Perturb begins. Users can improvise within this structure (or perturb it) in several ways. They can use as many or few of the nine modules as they like, which results in thinning or thickening the musical texture. Users are also able to choose which sound(s) are attached to each module.
They can draw from a preset database of sounds or use sound files they have created themselves. Any of these interactions—adding/removing sounds or modulating the sonic texture—allows the work to be played. Simultaneously, while following the generative structure directed by the swarm, the work is allowed to play of its own accord. The tension between interactive control and generative autonomy defines the nature of an interaction as a perturbation. User choices are recognized within a system, but are subject to the dynamics of that system before they can become manifest. 3.2.4 PANSE PANSE, or Public Access Network Sound Engine, is an open platform for the development of audio-visual netArt created by Palle Thayer. The project exists online as a streaming audio application, and consists of a synthesizer, two step sequencers, and an effects generator. [21] PANSE creates an opportunity for artists and musicians to create interfaces that control, or animations that are controlled by, the PANSE audio stream. Information about PANSE including technical specifics for connecting to the stream and interface authoring is online at http://130.208.220.190/panse. In a game of becoming like Spore, a composition-instrument approach would be very advantageous. Composition-instrument monitors interactions carefully and sees each as a perturbation that will have a gradual consequence within the system where it is sensed. In the way that procedural content generation leads to a natural mode of locomotion for a creature, perturbations to the musical system lead to a natural development of sounds that define that creature and its culture. As creature and culture develop and evolve, the sounds and music that are part of their identity take on new forms and tonalities. The generative nature of Spore can help to sustain this development. The game maintains its own internal sense of progress and evolution as it grows new creatures, new landscapes, generates climates, and pollinates one world with the contents of another. This continuous process of generation provides the exact dynamics that enable a composition-instrument piece to play while a gamer's interactions in the Spore world play music with it. Perturb was created to demonstrate the musical and technical characteristics of a composition-instrument approach. The strength of the piece is in its musical expressiveness and flexibility, but it does not fully address the connection between music conceived in the composition-instrument approach and an interactive system or artwork. There are, however, other contemporary projects where the foundations of a substantial connection between music and interaction seem to be in the process of formation. 4.2 Spore—The Potential of Becoming Spore, the current project of game designer Will Wright, is a project where a composition-instrument approach could be fruitfully employed. Spore is slated for commercial release in the second half of 2007 [24], which means that much of the argument offered here is speculative. Few details concerning Spore's gameplay and features have been officially confirmed. However, there have been enough published articles, screen captures, and interviews with Wright to leave one with a good impression of the overall flavor of Spore.
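Perturb's coupling of a particle swarm to its nine sound modules, as described above, can be illustrated in miniature: each agent follows the standard Kennedy–Eberhart velocity update [22], and a module is cued whenever its agent satisfies some trigger condition. The one-dimensional positions, the toy objective, and the near-origin trigger below are assumptions made for the sketch; they are not Herber's actual implementation.

```python
import random

class Agent:
    def __init__(self):
        self.x = random.uniform(-1, 1)   # one-dimensional position
        self.v = 0.0
        self.best = self.x               # personal best (toy objective: near zero)

def step(agents, global_best, w=0.7, c1=1.5, c2=1.5):
    """One particle-swarm update; returns indices of agents that fire a cue."""
    cues = []
    for i, a in enumerate(agents):
        a.v = (w * a.v
               + c1 * random.random() * (a.best - a.x)
               + c2 * random.random() * (global_best - a.x))
        a.x += a.v
        if abs(a.x) < 0.05:              # assumed cue rule: agent passes near the origin
            cues.append(i)               # -> tell sound module i to play a sample
        if abs(a.x) < abs(a.best):
            a.best = a.x
    return cues

agents = [Agent() for _ in range(9)]     # nine agents, one per sound module
for _ in range(100):
    gbest = min(agents, key=lambda a: abs(a.best)).best
    fired = step(agents, gbest)
    # In Perturb, each index in `fired` would cue a randomly chosen sample
    # attached to that module; attaching or removing samples perturbs the result.
```

The point of the sketch is the indirection: the user never triggers a sound directly, only changes what the swarm has available to cue, which is what makes an interaction a perturbation rather than a keypress.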
5 Conclusion A composition-instrument approach embodies qualities of music formally understood as "composed" and "improvised." Works that use this idea are like generative music compositions in that they have their own internal order or organization. They are also like instruments in that they can be played, or performed upon, and in the course of that performance, make an impact that modifies the character or course of the music outputted by the generative system. This "instrumentalization" allows for perturbations in the generative system and leads to an emergent becoming of music. When coupled with an interactive game system, the composition-instrument piece becomes a soundtrack that is both responsive to the game state and autonomous in its ability to adapt and develop relative to that state. This approach to music for games, or any sort of interactive digital system, hopes to open new opportunities for music in digital art and media, and to break down the linear models that have stifled creative progress in this area. In the game, players have the ability to design their own characters. These creatures can look like lizards, horses, trolls, or cutesy cartoons—whatever a player decides to create. One potential difficulty with this feature then becomes animating such an unpredictable variety of creatures. How can the game accurately simulate the motion of creatures that walk with tentacles, or creatures that have legs like waterfowl, or other exotic means of locomotion? This challenge presents one of the most promising aspects of Spore—the use of "procedurally generated content." [24] [25] GameSpot news describes this as "content that's created on the fly by the game in response to a few key decisions that players make, such as how they make their creatures look, walk, eat, and fight." [24] The technology behind this aspect of Spore has not been revealed, but Wright describes it using an analogy: "think of it as sharing the DNA template of a creature while the game, like a womb, builds the 'phenotypes' of the animal, which represent a few megabytes of texturing, animation, etc." [25] Spore also uses "content pollination" to complete the make-up of one player's world using the assets of another player. [26] The basic sharing of resources is simple enough to grasp, but to be able to distribute these resources realistically and allow them to engage in believable interactions with another environment must involve a complex Artificial Life (or A-Life-like) system. If the world of Spore is to be a fluid ecosystem as promised, there will have to be some sort of self-organizing system or generative, non-linear dynamics that underlie the entire game and allow it to unfold in a natural, organic fashion. 6 References [1] Online reference: www.gamasutra.com/features/20000217/harland_01.htm [2] Alexander Brandon, Building an Adaptive Audio Experience, Game Developer, Oct., pp.28-33, (2002) [3] Derek Bailey, Improvisation: its nature and practice in music, 2nd ed., New York, Da Capo, (1992) [4] Michael Nyman, Experimental music: Cage and beyond, 2nd ed., Cambridge, U.K.; New York, Cambridge University Press, (1999) [5] Online reference: www.otherminds.org/SCORES/InC.pdf [6] Terry Riley, In C, (1964) [7] Cage, J. (1973). Silence: lectures and writings, 1st ed., Middletown, Wesleyan University Press.
[8] Dan Warburton, Les Instants Composés, in Marley & Wastell, et al., Blocks of consciousness and the unbroken continuum, 1st ed., London, Sound 323, (2005) [9] David Borgo, Sync or swarm: improvising music in a complex age, 1st ed., New York, Continuum, (2005) [10] Ann McCutchan and C. Baker, The muse that sings: composers speak about the creative process, 1st ed., New York, Oxford University Press, (1999) [11] Christopher Cox & Daniel Warner, Audio culture: readings in modern music, 1st ed., New York, Continuum, (2004) The generative aspects of Spore (whether documented in an article or speculated here) show that it has, as a central component of its functionality, the ability to become. Wright has commented that at one point the game was titled "Sim Everything." [26] Most likely this is due to the ability of the game to become any kind of world the player/designer intends. This focus on customization of experience, growth, and becoming is what makes Spore such an ideal environment for music. In addition to exploring (to name a few) the physical, dietary, and architectural possibilities of culture in this game environment, it would also be interesting to explore musical possibilities. What sounds resonate with a particular species? What devices do they use to make music, and what is the sound of that music? [12] David Toop, Haunted weather: music, silence, and memory, 1st ed., London, Serpent's Tail, (2004) [13] Brian Eno and Will Wright, Playing With Time, Long Now Foundation Seminar, San Francisco, June 26, (2006) [14] Roy Ascott, Telenoia, in Ascott & Shanken, Telematic embrace: visionary theories of art, technology, and consciousness, 1st ed., Berkeley, University of California Press, (2003) [15] Nintendo of America, Electroplankton instruction booklet, 1st ed., Redmond, Nintendo, (2006) [16] Online reference: www.sonicteam.com/rez/e/story/index.html [17] Online reference: www.csse.monash.edu.au/~jonmc/projects/eden/eden.html [18] J. McCormack, Evolving for the Audience, International Journal of Design Computing, 4 (Special Issue On Designing Virtual Worlds), Sydney (2002) [19] Henrik Lörstad, M. d'Inverno, et al., The intelligent street: responsive sound environments for social interaction, Proceedings of the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology, 74, pp.155-162, (2004) [20] Online reference: www.turbulence.org/blog/archives/000122.html [21] Online reference: http://130.208.220.190/panse/whats.htm [22] James Kennedy and Eberhart, R., Particle Swarm Optimization, Proceedings from the IEEE International Conference on Neural Networks, 4, pp.1942-1948, (1995) [23] Norbert Herber, Emergent Music, Altered States: Transformations of Perception, Place and Performance, 1, DVD-ROM, (2005) [24] Online reference: www.gamespot.com/news/6155498.html [25] Online reference: http://en.wikipedia.org/wiki/Spore_(game) [26] Online reference: http://technology.guardian.co.uk/games/story/0,,1835600,00.html Investigating the effects of music on emotions in games David C Moffat and Katharina Kiegler ([email protected]) eMotion Lab, Division of Computing Glasgow Caledonian University, UK (http://www.gcal.ac.uk/) Abstract The importance of music in creating an emotional experience for game-players is recognised, but the nature of the relation between music and emotion is not well understood. We report a small study (N=15) in which players' skin conductance, heart-rate and pupil-dilation were recorded while they watched brief film clips and listened to pieces of background music.
The main film clip was fearful in mood; and the music pieces expressed different basic emotions: happy, sad, aggressive, and fearful. There were definite effects of the music on the physiological measures, showing different patterns of arousal for different music. The interactions between music and film-clip feelings are complex, and not yet well-understood; but they exist, and are relevant to film and game makers. They can even change the way a player assesses the game, and thus change the play itself. 2.1 Participants Fifteen students at the university, 11 male and 4 female, aged between 18 and 26, volunteered to take part in the experiment, which took about 25 minutes on average. They sat comfortably in the eMotion Lab, which is designed like a typical living room at home, and were told that they would see several short film clips on a large, high-quality plasma TV-set, and fill in some short questionnaires about them. The clips would be trailers of new video-games. The N=15 participants were divided into three random groups of N=5 each: G1, G2 and G3. 1 Introduction Music has long been known to evoke emotions in people. Even if researchers still argue whether music really elicits emotional responses in listeners, or whether it simply expresses or represents emotions, they agree that music provides a sort of emotional experience and affects our moods (Sloboda & Juslin, 2001). The computer's possible understanding of a user's emotional state is becoming important for HCI; and feasible (Picard 1997). There are attempts to use physiological measures, such as heart-rate and skin-conductance, to sense activity of the autonomic nervous system. From that data, one could make an educated guess about the user's general state of arousal, or more (e.g. Mandryk 2005). However, it must be admitted that efforts to find reliable links between emotion and physiological response have not been very successful so far. The mere existence of an emotional state, of some undetermined kind, is typically all one can affirm. We aim to study the user experience of video-games in our eMotion Lab: in particular the user's emotions. Since games are intended to be fun, it should ideally be part of the usability testing for games that their emotional effects on the player be understood. The eMotion Lab is a usability lab where colleagues investigate, among other things, the effect of background music on a player's performance. Game designers need to understand the connection between games and emotions when they use music and sound effects to enhance the experience of players, and so they need the support of a research effort in this area. 2.2 Method (materials and equipment) The lab is a friendly environment for playing video-games, complete with comfortable sofa, several games platforms, large plasma screen, a one-way observation mirror and CCTV video-cameras for observation and recording. The model of the Tobii.com eye-tracker is quite unobtrusive, so that it does not interfere with the player's experience, and we can get more authentic data about the player's emotional state. To measure skin-conductance (SC) and heart-rate (HR) we used a device from a biofeedback game called "The Journey to Wild Divine" (wilddivine.com). The eye-tracker was used to measure dilation of the pupils. It is already well-known that pupils dilate under cognitive load (Beatty 1982), and other forms of arousal including both positive and negative emotions (Partala and Surakka 2003).
The film-clips used were trailers for different video-games, with their original or different pieces of music. Only the clip for the new game Alan Wake (by Remedy) was of direct interest to us. The other clips were there to set an initial neutral mood for all participants, to separate the repeat showings of the Alan Wake clip, and to disguise the purpose of the experiment.

The clip from Alan Wake was chosen to be ambiguous. It is not clear what is happening, and so the participants would be free to choose an interpretation according to the background music played with the clip. The pieces of music that were played with the Alan Wake clip were for the following basic emotions: fear, sadness, anger (or aggression), and happiness. There was also a no-music (silence) condition.

2.3 Experiment design

To investigate the feasibility of detecting emotion or mood in a naturalistic setting, we ran an experiment where participants watch a series of short film-clips, with different pieces of music in the background. The clips were trailers for video-games. The music was to evoke different moods, so that we could observe the effects on experience and physiology, including pupil-dilation. The results were analysed to determine how the emotional influence of music changed the participants' feeling, impression, perception and assessment of the film-clips.

[The self-report results and Figures 2-4 are largely illegible in this transcription. The recoverable group-to-music assignment (Table 1) is: Group 1 – no sound, then fearful; Group 2 – sad, then aggressive; Group 3 – happy, then fearful. The recoverable figure legends show mean self-reported ratings (0-4) of Happy, Sad, Fearful and Angry for the conditions G.1.1 nosound, G.1.2 fear, G.2.1 sad, G.2.2 aggress, G.3.1 happy and G.3.2 fear.]

2.4.3 Musical effects on player's physiology

Until now, we have only used self-report data from participants' answers to questionnaires.
The question remains open whether they are commenting on the feeling of the music, just imagining what it could make somebody feel, or whether their true feelings are genuinely affected. Table 2 shows the physiological data of skin-conductance (SC), heart-rate (HR), and what we call pupil-range (PR). The PR is the difference between the minimum and maximum pupil-dilations over the whole clip. It is a simple summary parameter from the eye-tracker data, which leaves out a lot of complexity, but it is useful for a first analysis.

Table 2. Physiological data – changes over the whole clip. SC is skin conductance; HR is heart-rate; Pupil-range is max-min dilation over the clip (in mm). The rise/fall symbols for SC and HR are not legible in this transcription; the pupil-range values (clip 1 / clip 2) are:
Group 1 (clip 1: no sound, clip 2: fear) – 1.743 / 1.454
Group 2 (clip 1: sad, clip 2: aggressive) – 1.363 / 1.677
Group 3 (clip 1: happy, clip 2: fear) – 1.205 / 1.469

Each symbol in the table represents a rise or a fall of the physiological variable from start to end of the clip. This change in value is consistent for all (five) participants of each group in every case, but for one small exception. One participant in group G.3 starts with a heart-rate (HR) of 83.8 beats per minute (bpm) and ends the clip with 84.5 bpm; but all the other participants in that group experience a larger fall in HR, as shown in the table. Pupil-ranges (PR) in the table are averages for the group, and the symbol shows whether each participant in the group has a bigger (or smaller) PR for clip 2 than for clip 1. In each group, the difference is of the order of about 0.3 mm. Because of problems with missing readings from the eye-tracker, there are only three participants for the PR row in each group. Even so, the within-group consistency is almost as impressive for the PR variable as for the SC and HR variables (with five participants in each group).

It is clear, from Table 2, that the different emotions have different effects on the physiological variables. This is not to say that all emotions, in all circumstances, will have clear, characteristic patterns of response in physiology; indeed, that is very unlikely. However, in the controlled context of our experiment, there is a pattern. This is meaningful, because people do not have conscious control of SC, HR, or pupil-dilation; but emotion does strongly affect such physiological variables. We conclude that our participants were reporting actual emotional responses. Given that the clips were only about 30 seconds long, it might be surprising to some people to realise how quickly a viewer's emotional response can be affected by music.

2.4.4 Musical effects on player's thinking

Some of the questions on the questionnaire asked about how participants assessed the events in the story. Any changes in assessments, caused by background music, would show that even thought processes can be influenced by incidental sounds.

Figure 5. "Does the character have a weapon?" Mean agreement (0-4) for each condition: G1.1 nosound, G1.2 fear, G2.1 sad, G2.2 aggress, G3.1 happy, G3.2 fear.

We limit discussion here to just one of the questions, which asks if the participants agree that the Alan Wake character has a weapon with him (e.g. a gun in his coat pocket). The answers are shown in Fig. 5, averaged for each group. The scores range from 0, meaning "I totally disagree", through 3, which is "neutral", to 4, meaning "I totally agree." The strongest agreement is from the aggressive-music group (G2.2), who all agree, or agree totally, that he has a weapon. Testing for statistical significance, we compare this group with the groups G2.1 and G3.1, and find that the differences are significant (p = .013* and p = .017*, respectively). Aggressive music seems to cause participants to assess the situation quite differently. They appear to attribute aggression to the lead character, and that leads them to assume he must be armed to be so confident. How much of this reasoning is conscious cannot be determined from our results.

3 Discussion

One question concerning studies such as this one is whether emotions are truly induced, or whether they are merely imagined and reported by participants. Because we found distinct patterns in the physiological data, we conclude that in this case emotions really were induced in the experiment.

The mood or emotion associated with each piece of music was confirmed by the participants at the end of the experiment. Although the pieces were well chosen, they were not all equally effective at inducing one precise emotion. This is in the nature of music, which is more art than science, even today.

The music and film-clip both had effects separately, but also interacted in some interesting ways. Music of one mood could induce a quite different mood state in a person watching the clip.

The emotions of happiness and sadness were seen to be opposites throughout. Music or video that caused one to rise would generally cause the other to fall, which is intuitively reasonable. The sad piece of music seemed to be especially effective at inducing sadness, but on this evidence alone, it is not possible to generalise from this case to all other sad music.

Happy music had an interesting "inoculation" effect. It tended to lessen the negative emotions, including fear, which is striking because the film-clip is intended to be fearful in mood. This is one result that should alert game designers to be careful when choosing background music for their games. An inappropriate piece of music can kill the experience for the player.

Fearful music was also quite effective, and brought two different groups to a similar emotional state even after their divergent histories up to that point (in the first clip). This may be an interaction between the music and the clip, however, since the fearful music is the original track for the (fearful) clip.

One result is of special interest to us, and the focus of our current and future work: the demonstration that background music can have the power to change the listener's assessments and other thoughts about the situation or story in the film. We found this effect for aggressive music, but in general other moods might be relevant to other clips.

4 Conclusion

As a result of this study, we suggest that music, via emotion, can influence subjects' perception and assessment of the situation. Physiological measurements, such as heart-rate, skin conductance and pupil dilation, can be valuable in helping to read the emotional state of game players. But it is still difficult to detect a person's emotion reliably, as many factors can influence the emotional experience. Therefore more sophisticated models are required to frame analysis and support interpretation.

5 References

Bartlett, D. (1996) Physiological Responses to Music and Sound Stimuli, in D.A. Hodges (Ed.) Handbook of Music Psychology, 2nd edn. Lawrence, KS: National Association for Music Therapy.

Beatty, J. (1982). Task-Evoked Pupillary Responses, Processing Load, and the Structure of Processing Resources.
Psychological Bulletin, 91(2), 276-292.

Frijda, N.H. (1986) The emotions. Cambridge: CUP.

Frijda, N.H. (1994) Emotions are functional, most of the time. In: Ekman, P. and Davidson, R.J. (Eds) The Nature of Emotion: Fundamental Questions. New York: OUP.

Mandryk, R.L. (2005). Evaluating Affective Computing Environments Using Physiological Measures. In: Proceedings of Workshop 14: Innovative Approaches to Evaluating Affective Interfaces, at CHI 2005. Portland, USA, April 2005.

Partala, T. and Surakka, V. (2003). Pupil size variation as an indication of affective processing. Int. J. Human-Computer Studies, 59, 185-198.

Picard, R.W. (1997) Affective computing. Cambridge, MA: MIT Press.

Sloboda, J.A. and Juslin, P.N. (2001) Music and emotion: theory and research. Oxford: OUP.

Tan, E.S.H. and Frijda, N.H. (1999). Sentiment in film viewing. In: Plantinga, C. and Smith, G.M. (Eds) Passionate Views: Film, Cognition, and Emotion. Baltimore: Johns Hopkins UP, 48-64.

REMUPP – a tool for investigating musical narrative functions

Johnny Wingstedt
School of Music, Luleå University of Technology, PO Box 744, SE-941 28 Piteå, Sweden
Sonic Studio, Interactive Institute, Acusticum 4, SE-941 28 Piteå, Sweden
[email protected], [email protected]

Abstract. The changing conditions for music as it appears in new media were the starting point for the project "NIM – Narrative Interactive Music". The overall aim was to explore interactive potentials and narrative functions of music in combination with technology and other narrative media – such as in film or computer games. The software REMUPP was designed for investigating various aspects of the musical experience and allows for experimental non-verbal examination of selected musical parameters in a musical context. By manipulating controls presented graphically on the computer screen, participants can in real-time change the expression of an ongoing musical piece by adjusting structural and performance-related musical parameters such as tempo, harmony, rhythm, articulation etc. The music can also be combined with other media elements such as text or graphics. The manipulations of the parameter controls are recorded into the software and can be output in the form of numerical data, available for statistical analysis. The resulting music can also be played back in real time, making it possible to study the creative process as well as the aural end result. A study utilized the REMUPP interface to explore young adolescents' knowledge about, and use of, musical narrative functions in multimedia. Twenty-three participants were given the task of interactively adapting musical expression to make it fit different visual scenes shown on a computer screen. The participants also answered a questionnaire asking about their musical backgrounds and media habits. Numerical data from the parameter manipulations were analyzed statistically. After each completed session, the participants were also interviewed in a 'stimulated recall' type of session. The results showed that the participants to a large degree displayed a collective consensus about certain narrative musical functions. The results were also affected by the participants' gender, musical backgrounds and individual habits of music listening and media use.

1 New musical functions
A characteristic feature of modern society is the increased interaction between man and technology. New technology requires new kinds of skills and knowledge – but is also the source of new knowledge. This new knowledge concerns not only technology itself, but also various societal and cultural phenomena related to the technological changes.

The changing conditions for music as it appears in new media were the starting point for the project "NIM – Narrative Interactive Music", performed in collaboration between the Interactive Institute's studio Sonic and the School of Music in Piteå. The overall aim of the project was to explore interactive potentials and narrative functions of music in combination with technology and other narrative media such as image, text or sound – as in film or computer games. This article will describe the use of the interactive analysis tool REMUPP ('Relations Between Musical Parameters and Perceived Properties'), which in the project has been used in several quasi-experiments (Cook & Campbell, 1979) to investigate the participants' knowledge and creative use of music's narrative codes and conventions.

Kress (2003) has described how, in this new 'age of media', the book is being replaced by the screen as the dominant medium for communication – changing the basic conditions for the concept of literacy. The centuries-long dominance of writing is giving way to a new dominance of the image. But this new literacy of course does not only involve visual communication. Rather, in the new media today we are making sense, or trying to make sense, of an intricate assortment and multimodal combination of different media: images, written and spoken text, video, animations, movement, sound, music and so on. What creates meaning is above all the complex interplay of the different modes of expression involved. At the same time new technology involves an inclusive quality, emphasizing elements of interactivity and active communication, as the contemporary information society is gradually abandoning the communicational model of 'from-one-to-many' in favor of 'from-many-to-many'.

1.1 Musical narrative functions

Before describing the use of REMUPP, the concept of musical narrative functions will briefly be discussed. In the process of defining a theoretical foundation for the project, a categorization of musical narrative functions was begun. The purpose was to provide a framework for examining and defining what narrative functions are. This framework was intended to serve as part of a theoretical and referential basis for further exploration of how the narrative functions are experienced, used and achieved. Since film is a medium with an established narrative tradition, having developed sophisticated musical narrative techniques and codes during the past century, the narrative functions of film music were the chosen focus for this categorization.

In the emerging multimodal and multimedial settings, the study of the role of sound and music is so far largely a neglected field. Remarkably so, since music and sound are often important expressive and narrative elements used in contemporary media. In formal music education, narrative music, as it appears in film, television or computer games (henceforth referred to as media music), is typically a blind spot and is rarely discussed at depth (Tagg & Clarida, 2003).
However, considering the high degree of exposure to this kind of music in our everyday life, there are good reasons to assume that media music contributes to shaping knowledge and attitudes concerning communicational, artistic and interactional musical issues.

Around 40 different musical narrative functions were taken as a starting point and divided into six narrative classes: (a) the Emotive class, (b) the Informative class, (c) the Descriptive class, (d) the Guiding class, (e) the Temporal class and (f) the Rhetorical class. These classes were in turn subdivided into altogether 11 (later 12) different categories.

Before the modern technologization of media, experiencing drama or listening to music can be said to always have involved a certain degree of interactivity and variability. A live music concert will always respond to the "unique necessities of the individual time, place and people involved" (Buttram, 2004, p. 504), and never be repeated twice exactly the same way. Cook (1998) observes how music, detached from its original context, assimilates new contexts. New musical contexts continue to evolve as technology and society changes. Viewing the individual listener and the musical sound as active dimensions in the defining of the context also implies that the listener is interactively participating in the act of music. Rather than just talking about 'listening', Small (1998) uses the term musicking to emphasize a participatory view of the composer and performer – as well as the bathroom singer, the Walkman listener or the seller of concert tickets. In computer games, where the dimension of agency is salient, there is a potential for affecting the musical expression of the game music by interacting with the gaming interface.

The emotive class includes the emotive category – which is a general category, present to some degree in most cases where music is used in film (including functions such as describing feelings of a character, foreboding and stating relationships).

The functions of the informative class achieve meaning by communicating information on a cognitive level rather than on an emotional level. The class includes three categories – communication of meaning (such as clarifying ambiguous situations and communicating unspoken thoughts), communication of values (such as evocation of time period, cultural setting or indication of social status) and establishing recognition.

The descriptive class is related to the informative class in certain aspects, but differs in that the music is actively describing something rather than more passively establishing associations and communicating information. It is also different from the emotive class, in that it describes the physical world rather than emotions. In this class there are two main categories – describing setting (such as physical environment or atmosphere) and describing physical activity (such as movement of a character).

The dimension of interactivity in the new media challenges the traditional pattern of 'Creator-Performer-Receiver' (Fig. 1) – as well as the conditions for traditional institutionalized learning. The conventional progression of the musical communication process (as seen in western traditional music) is then challenged. Rather than what has traditionally been seen as a one-way communication model, we get a situation where the distinction between the roles gets more ambiguous and new relations emerge between the actors involved (Fig. 2). This can be thought of as the music process increasingly becoming participatory and inclusive rather than specialized and exclusive.
The guiding class includes musical functions that can be described as 'directing the eye, thought and mind'. It includes two categories, the indicative category (such as pointing out details or establishing direction of attention) and the masking category.

The temporal class deals with the time-based dimension of music. Two categories are included: providing continuity (shorter-term or overall continuity) and defining structure and form.

The rhetorical class includes the commenting as well as contrasting categories. Some functions in this class spring from how music sometimes steps forward and 'comments' on the narrative. Rhetorical functions also come into play when image and music are contrasting, making visible not only the semiotic codes of the music but also the effect of the narrative on how we perceive the meaning of the music.

In a given situation, several narrative functions typically operate simultaneously on several different levels, and the salient functions will quickly and dynamically change. A more detailed discussion of the musical narrative functions is found in Wingstedt (2004, 2005).

1.2 Changing roles

During the larger part of the past century, we have gradually gotten used to the role of being consumers of text, images and music. We have progressively accustomed ourselves to the objectification of media – by the means of books, magazines, recordings, films etc. Making media mobile has led to a recontextualization and personalization of medial expression and experience. This in turn has affected how we establish visual, aural and musical codes, metaphors and conventions. The growing interest in mobile phone ring tones, the use of 'smileys' in SMS and e-mail, the codification of the 'SMS-language' – all are manifestations of evolving technology-related media codes.

Figure 1: Traditional view of the musical communication chain (Creator – Performer – Receiver).

Figure 2: A relational view of participants in the act of music (Creator, Performer and Receiver interrelated).

2 Controlling musical expression

One solution to the challenge of achieving a higher degree of adaptability in music used in narrative interactive situations is to facilitate user influence of the music at a finer level of detail (Buttram, 2004). This can be done by representing the music on a component or parameter level where the parameters are accessible for control via the application, directly or indirectly influenced by user interaction.

The concept of musical parameters is here defined as attributes of the musical sound: structural elements such as tonality, mode (e.g. major or minor mode), intervals, harmonic complexity (consonance – dissonance), rhythmic complexity, register (low or high pitch level) etc. – or performance-related elements such as tempo, timing, phrasing, articulation etc. (Gabrielsson & Lindström, 2001; Juslin, 2001). By altering musical parameters in real-time, the musical expression will, directly or indirectly, be affected by the listener/user in ways that are traditionally thought of as being the domain of the composer or performer, as discussed above. By manipulating controls presented graphically on the computer screen (as knobs or sliders), participants can in real-time change the expression of an ongoing musical piece by adjusting structural and performance-related musical parameters like tonality, mode, tempo, harmonic and rhythmic complexity, register, instrumentation, articulation, etc.
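As an illustration of what such a parameter-level representation might look like in software, the sketch below models a few parameters with ranges and maps a normalized slider position onto each of them. The parameter names follow the paper; the ranges, class layout and method names are assumptions, not a description of REMUPP's actual implementation.

# Sketch of a parameter-level music representation (illustrative, not REMUPP's code).
from dataclasses import dataclass

@dataclass
class MusicalParameter:
    name: str
    low: float       # value at the slider's minimum position
    high: float      # value at the slider's maximum position
    value: float = 0.0

    def set_from_slider(self, position: float) -> float:
        """Map a slider position in [0, 1] onto this parameter's own range."""
        position = min(max(position, 0.0), 1.0)
        self.value = self.low + position * (self.high - self.low)
        return self.value

# One possible parameter set; a playback engine would read these values continuously
# while the music keeps playing, so changes take effect without breaking the flow.
parameters = {
    "tempo": MusicalParameter("tempo (bpm)", 60, 160),
    "register": MusicalParameter("register (octave)", 2, 6),
    "harmonic_complexity": MusicalParameter("consonance-dissonance", 0.0, 1.0),
    "articulation": MusicalParameter("staccato-legato", 0.0, 1.0),
    "reverb": MusicalParameter("reverb amount", 0.0, 1.0),
}

parameters["tempo"].set_from_slider(0.25)    # -> 85.0 bpm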
The basic musical material, as well as the types and number of musical parameters included with REMUPP, can be varied and tailored by the researcher according to the needs and purpose of the study at hand. The music can also be combined with other media elements such as text or graphics.

Having the participants manipulate the music makes REMUPP a non-verbal tool where the participant responds to the musical experience within 'the medium of music' itself, without having to translate the response into other modes of expression such as words or drawings. By responding to the musical experience in this way, the user will directly influence the musical expression – and thereby to a certain degree control his/her own experience. Managing the parameter controls requires no previous musical training. In a typical REMUPP session, the controls will be presented without any verbal labels or descriptions, making for an intuitive use of the parameters with a focus on the actual musical sound.

Modifying musical expression by controlling musical parameters directly accesses communicational and expressional properties of the music on a level that goes beyond the genre concept. Alteration of the musical performance can be accomplished without disturbing the musical flow and continuity, at the same time as it provides variation and dynamic expressive changes. Regardless of style, the same set of parameters can be made available, only their settings will be changed – e.g. the parameter tempo is a component of any kind of music, and by using it to alter the speed of a musical performance it will at the same time alter some aspect(s) of the musical expression.

The possibility to have several variable musical parameters simultaneously available opens up for studying not only the individual parameters themselves, but also for investigating the relationships and interplay between the different parameters. Furthermore, combining the music with other media such as text or video makes visible the relationships between music and other modes of expression – making it possible to study specific meaning-making factors appearing as the result of multimodal interweaving. It should be noted that a basic assumption of the project is the view that the musical sound itself (or a certain musical parameter value) is not typically expressing a specific 'meaning' – but rather represents a meaning potential (Jewitt & Kress, 2003). The more specific musical meaning making depends on contextual factors – like the interplay with the situation (including socio-cultural factors), the dramaturgical context and the interweaving with other narrative modes such as the moving image, sound effects and dialogue.

In REMUPP, the participants' manipulations of the parameter controls are recorded into the software and can be output in the form of numerical data, available for statistical analysis. The resulting music, including all the manipulations on a time-axis, can also be played back in real time, making it possible to study the creative process as well as the aural end result. The various ways to handle data, and the possibility to combine different data types, make the REMUPP tool potentially available for use within several different types of research disciplines. As well as being a source of quantitative statistical data, REMUPP is also suited for use with more qualitatively oriented methods (such as observations or interviews) – or for combinations of different techniques.
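A minimal sketch of the kind of recording and playback this implies: every control movement is stored with a timestamp, so the session can both be exported as numerical data for statistical analysis and be replayed against the music afterwards. The storage format, file name and function names are assumptions for illustration only.

# Sketch: time-stamped logging and replay of parameter manipulations (illustrative only).
import csv
import time

class ManipulationLog:
    def __init__(self):
        self.events = []                      # (seconds_since_start, parameter, value)
        self.start = time.monotonic()

    def record(self, parameter, value):
        self.events.append((time.monotonic() - self.start, parameter, value))

    def to_csv(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["time_s", "parameter", "value"])
            writer.writerows(self.events)

    def replay(self, apply):
        """Re-run the session by calling apply(parameter, value) at the logged times."""
        previous = 0.0
        for t, parameter, value in self.events:
            time.sleep(t - previous)
            apply(parameter, value)
            previous = t

log = ManipulationLog()
log.record("tempo", 96.0)
log.record("reverb", 0.4)
log.to_csv("session_01.csv")                  # numerical data, ready for analysis
log.replay(lambda p, v: print(p, v))          # study the creative process as it unfolded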
To explore the potentials of affecting musical expression by the alteration of musical parameters, the software REMUPP (Relations between Musical Parameters and Perceived Properties) was developed (Wingstedt, Berg, Liljedahl & Lindberg, 2005; Wingstedt, Liljedahl, Lindberg & Berg, 2005).

2.1 REMUPP

REMUPP (fig. 3) is designed for investigating various aspects of the musical experience and allows for experimental non-verbal examination of selected musical parameters in a musical context. The musical control is put into the hands of the experiment participants, introducing elements of creativity and interactivity, and enhancing the sense of immersion in a test situation. REMUPP offers an environment providing control over certain selected musical parameters, not including the finer level of the user selecting each individual note. Limiting the control in this way affects the creative process as well as the final outcome. The participant might be described as being more of a co-composer (or maybe a performer), rather than a composer in a traditional sense.

Figure 3: REMUPP – an example of the user interface.

2.2 Musical implementation

The concept and functionality of the REMUPP interface causes special demands to be put on the structure of the basic musical material involved – and thus on the composer of this musical material. Since the technical and musical designs will be interwoven with and interdependent on each other, the construction and implementation of the musical material becomes as important as the technical design. Unlike music created for more conventional use, the 'basic music' composed for REMUPP must in a satisfactory way accommodate the parameter changes made by a participant. The desired expressional or narrative effects must be distinctly achieved at the same time as the overall music performance should remain convincing. Special consideration also has to be taken of the complex interaction of different parameters working together, since the perceived effect of any selected parameter change will be affected by the prevailing settings of the other parameters available. The musical material can thus be thought of as an algorithm, where each parameter is put in relation to all the other parameters in a complex system interacting on many levels. The composer must therefore carefully define and tailor the basic musical material to fulfill the demands of the expressional situation at hand – as well as take into account the technical framework of REMUPP. These conditions form the basis for the formulation of artistic strategies that allow for a certain freedom of musical expression and development of musical form – leaving room for the decisions and actions of the listener/user. Rather than ascribing the detailed function of each individual note, the composer will define rules and conditions determining a range of possible musical treatments for a given situation.

Each visual scene was presented three times, each time with a different 'basic musical score' as accompaniment (the presentation order of the altogether 9 trials was randomized). The initial values of the seven musical parameters were randomized to avoid systematic errors resulting from the initial musical sound. The instruction to the participants was to 'adjust the musical expression to fit the visual scene as well as possible'. The controlling faders were presented without any written labels, to make the participants focus on their functions only by listening to their effect on the musical expression when moved.
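The trial structure just described can be sketched as follows: three scenes, each paired with three basic scores, giving nine trials presented in random order, each starting from randomized parameter settings. The scene and parameter names are taken from the study; everything else is an assumed illustration, not the study's actual software.

# Sketch: randomized trial order and initial parameter values (illustrative only).
import random

scenes = ["City Night", "In Space", "Picnic by the Lake"]
basic_scores = ["basic score 1", "basic score 2", "basic score 3"]
parameters = ["instrumentation", "tempo", "harmonic_complexity",
              "rhythmic_complexity", "register", "articulation", "reverb"]

trials = [(scene, score) for scene in scenes for score in basic_scores]   # 9 trials
random.shuffle(trials)                         # randomized presentation order

def random_initial_settings():
    # Randomize each starting value to avoid systematic errors
    # caused by the initial musical sound.
    return {p: random.random() for p in parameters}   # normalized fader positions

session = [{"scene": s, "score": m, "initial": random_initial_settings()}
           for s, m in trials]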
3 Experimental studies

To better understand the properties of musical expression resulting from parameterization, several experiments, or quasi-experiments (Cook & Campbell, 1979), have been carried out. Initially, two pilot studies were performed. The first study investigated selected parameters' perceived capability to change the general musical expression; the second study examined how the parameters can contribute to express emotions. These studies are described in several articles (Berg & Wingstedt, 2005; Berg, Wingstedt, Liljedahl & Lindberg, 2005; Wingstedt, Berg et al, 2005).

A larger study, "Young Adolescents' Usage of Narrative Functions of Media Music by Manipulation of Musical Expression" (Wingstedt, Brändström & Berg, 2005), utilizes the REMUPP interface to explore young adolescents' knowledge about, and use of, musical narrative functions in multimedia. Twenty-three participants, 12-13 years old, were given the task of interactively adapting musical expression to make it fit different visual scenes shown as 3D animations on a computer screen (fig. 4). This was accomplished by manipulating seven musical parameters: Instrumentation (3 different instrument sets, 'Rock', 'Electronic' and 'Symphonic', were available), Tempo (beats per minute), Harmonic complexity (degree of dissonance – consonance), Rhythmic complexity (rhythmic activity), Register (octave level), Articulation (staccato – legato) and Reverb (effect amount). The study took as a starting point one of the descriptive musical narrative functions discussed earlier: Describing physical environment.

Figure 4: An example of REMUPP's test interface – a screenshot of a 3D animation depicting a 'physical environment' (Picnic by the Lake) and, below, the faders controlling musical parameters.

The participants also answered a questionnaire asking about their musical backgrounds, and habits of listening to music, watching movies and playing computer games. Numerical data from the parameter manipulations were analyzed statistically to search for tendencies within the group with regard to the preferred values of the musical parameters in relation to the different visual scenes.

After each completed session, the participants were also interviewed in a 'stimulated recall' type of session, where they got to watch and listen to their process as well as the results, and discussed and commented on their creative decisions in relation to the musical and narrative expression experienced and intended. Additionally, they got to rate their favourite version of each of the three movies ('which one are you most satisfied with?'), based on how well they thought the music fitted the visuals. They also discussed the perceived functions of the seven parameters being used, and the experience of interactively controlling the musical expression.

3.1 Results

The results from the statistical analysis of the parameter settings, combined with the questionnaires, showed that the participants to a large degree displayed a collective consensus about certain narrative musical functions. This intrinsic consensus can, in turn, be interpreted as mirroring extrinsic norms – a knowledge about existing conventions that we encounter in film, computer games and other narrative multimedia.
Three different visual scenes were presented, depicting different physical settings: City Night (a dark hostile alley under a highway bridge), In Space (inside a space ship looking out to planets and other space ships through a giant window) and Picnic by the Lake (a sunny day by a small lake with water lilies and butterflies, a picnic basket on a blanket). There were no people in these environments, to keep the focus on the actual settings. The graphics were realized as animations, but with the movements used sparingly, so there was no visible plot or story – they could be thought of as 'moving still images'. The idea was to give an impression of these places being alive, ongoing, in process – representing 'present tense'.

A short interpretation, summing up the results of the participants, goes as follows: The pastoral scene by the lake is expressed by the group of participants by the use of the 'Symphonic' instrumentation consisting primarily of flute, strings and harp – a classic cliché for expressing pastoral settings in Western musical tradition. The darker and more hostile urban City scene, as well as the more high-tech and mysterious Space scene, are portrayed using electronic instruments. In the two latter scenes the register is also generally lower, producing darker and more sombre sonorities than in the brighter Lake scene. The basic tempi of the Space and Lake scenes are kept relatively low, reflecting the tranquillity of these situations – although the rhythmic activity in the Lake scene is higher, maybe expressing the movements of the fluttering butterflies. The tempo of the City scene is slightly higher, although with a low rhythmic activity, which can be seen as reflecting a higher degree of suspense. The more confined locations of the Space and City scenes are portrayed by the use of more reverb than the open-air, and less dramatic, Lake scene. The articulation of the music for the Lake scene is also shorter, although not down to a full staccato, providing an airy quality allowing more 'breathing' into the musical phrasings.

Not only the music, but the interweaving between different modes – in this case especially visuals and music – is what creates meaning in the multimodal ensemble (Kress, Jewitt, Ogborn & Tsatsarelis, 2001:25). REMUPP provides conditions for such kind of interweaving. In experiencing a narrative multimodal situation, there is a tendency for the audience or user to treat media music on a relatively subconscious and unreflecting level, since the visuals tend to achieve salience. Working with the REMUPP interface has made it possible to bring the music to the front, to make visible the implicit knowledge about musical narrative functions. The results strengthen the assumption that high exposure to media and its associated music contributes to the shaping of knowledge and attitudes about media music. We learn, not only through the 'multimodal texts' but also about the modes themselves, from simply using media in informal situations. This gives rise to questions about how learning takes place in pronounced multimodal settings, how we become 'multimodally literate' by using the various modes – and the role of music in such situations.

REMUPP offers a potential for investigating a range of music-related issues from new angles, presenting alternatives when compared to traditional test methods. Firstly, the non-verbal nature of the interface allows for attaining types of data that are difficult or impossible to access using verbal descriptions. Secondly, the tool provides opportunities for exploring various aspects of contextual relations, intra-musical as well as extra-musical. Thirdly, the participants' interaction and control of the musical expression allows for investigation of aspects of creativity and establishes a deepened sense of agency for the participant.
The emphasis on interactivity and the high-quality music engine provide an environment resembling a computer game, which enhances immersion and effectively works against the otherwise potentially negative effects of the laboratory situation.

The results were also affected by the participants' gender, musical backgrounds and individual habits of music listening and media use. A general trend was that participants with a higher level of media use (spending much time playing computer games or watching movies) also exhibited a higher awareness of (and conformity to) musical narrative conventions. A more detailed discussion of these results is found in Wingstedt, Brändström and Berg (2005) and Wingstedt (2005).

The above-mentioned results are drawn from the statistical material. However, at this point the statistical material can mainly indicate answers to the 'what' (what was being done) questions. In upcoming papers, analyses of the interviews will be presented – aiming to also contribute some answers to the 'why' questions and to include matters related to creative issues, including choices made (conscious or intuitive) in order to follow or deviate from narrative and expressional codes and conventions.

In describing the REMUPP interface, emphasis has been put on its use as an interactive non-verbal tool suited for research of various aspects of musical experience. It should be noted, however, that the technical and musical concepts behind the interface also offer a platform for other potential applications. For example, the system provides a promising environment for the creation and concept development of live interactive music performances. Also, the technical as well as artistic concepts developed can be thought of as an embryo for a 'musical engine' to be used for computer games and other interactive situations.

4 Conclusion

By taking charge of the possibilities offered by contemporary interactive and narrative media, a new world of artistic and creative possibilities is emerging – also for the participant in the act of music traditionally thought of as the 'listener'. It is an aim of this project to serve as a platform for further studies towards knowledge and understanding of the potentials and challenges offered for music in the emerging communication media. This interdisciplinary project has resulted in the development of a theoretical groundwork concerning topics such as narrative functions of media music, and artistic and practical strategies for composition of interactive music – and also in development and innovation of a technical nature.

The various results gained in the study indicate the usefulness of the REMUPP interface as a tool for exploring musical narrative functions. In manipulating the musical parameter controls, the participants achieve meaning through 'musical actions', which is different from using language. For example, to just say that a visual setting is 'scary' is not the same as expressing it musically. To determine 'scary' by (for example) assigning a low register, setting a certain degree of harmonic dissonance and rhythmic activity, adding more reverberation and slowing down the tempo, demands a commitment to a higher degree than just saying the word.

Acknowledgements

Thank you to Professor Sture Brändström and Senior Lecturer Jan Berg, at the School of Music in Piteå, for their engagement, help and inspiration throughout this project.
Thanks also to Mats Liljedahl (programming) and Stefan Lindberg (composing the basic musical material) at the Interactive Institute, Studio Sonic in Piteå – and to Jacob Svensson (3D graphics) at LTU Skellefteå – for all the work involved in developing REMUPP.

References

Berg, J. and Wingstedt, J. (2005). 'Relations between Musical Parameters and Expressed Emotions – Extending the Potential of Computer Entertainment'. In Proceedings of ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, ACE 2005, Valencia, Spain, 15-17 June.

Berg, J., Wingstedt, J., Liljedahl, M. and Lindberg, S. (2005). 'Perceived Properties of Parameterised Music for Interactive Applications'. In Proceedings of The 9th World MultiConference on Systemics, Cybernetics and Informatics WMSCI, Orlando, Florida, 10-13 July.

Buttram, T. (2004). "Beyond Games: Bringing DirectMusic into the Living Room", in DirectX 9 Audio Exposed: Interactive Audio Development, ed. T. M. Fay. Plano, Texas: Wordware Publishing Inc.

Cook, N. (1998). Analysing Musical Multimedia. Oxford, UK: Oxford University Press.

Cook, T.D. and Campbell, D.T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin Company.

Gabrielsson, A. and Lindström, E. (2001). "The Influence of Musical Structure on Emotional Expression", in Music and Emotion: Theory and Research, eds. P.N. Juslin and J.A. Sloboda. Oxford, UK: Oxford University Press.

Jewitt, C. and Kress, G. (eds.) (2003). Multimodal Literacy. New York: Peter Lang Publishing.

Juslin, P.N. (2001) "Communicating Emotion in Music Performance: A Review and Theoretical Framework", in Music and Emotion: Theory and Research, eds. P.N. Juslin and J.A. Sloboda. Oxford, UK: Oxford University Press.

Kress, G. (2003). Literacy in the New Media Age. Oxon, UK: Routledge.

Kress, G., Jewitt, C., Ogborn, J. and Tsatsarelis, C. (2001). Multimodal Teaching and Learning: The Rhetorics of the Science Classroom. London: Continuum.

Small, C. (1998). Musicking: The Meanings of Performing and Listening. Middletown, CT: Wesleyan University Press.

Tagg, P. and Clarida, B. (2003). Ten Little Title Tunes. New York, NY: The Mass Media Music Scholars' Press.

Wingstedt, J. (2004). 'Narrative Functions of Film Music in a Relational Perspective'. In Proceedings of ISME – Sound Worlds to Discover, Santa Cruz, Teneriffe, Spain, 14-16 July.

Wingstedt, J. (2005). Narrative Music: Towards an Understanding of Musical Narrative Functions in Multimedia (Licentiate thesis). School of Music, Luleå University of Technology, Sweden.

Wingstedt, J., Berg, J., Liljedahl, M. and Lindberg, S. (2005). 'REMUPP – An Interface for Evaluation of Relations between Musical Parameters and Perceived Properties'. In Proceedings of ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, ACE 2005, Valencia, Spain, 15-17 June.

Wingstedt, J., Brändström, S. and Berg, J. (2005). Young Adolescents' Usage of Narrative Functions of Media Music by Manipulation of Musical Expression. Manuscript submitted for publication.

Wingstedt, J., Liljedahl, M., Lindberg, S. and Berg, J. (2005). 'REMUPP – An Interactive Tool for Investigating Musical Properties and Relations'. In Proceedings of The International Conference on New Interfaces for Musical Expression, NIME, Vancouver, Canada, 26-28 May.
On the Functional Aspects of Computer Game Audio

Kristine Jørgensen, Ph.D. student, Section of Film & Media Studies, Copenhagen University, [email protected]

Abstract: What is the role of computer game audio? What formal functions does game audio have? These are central questions in this paper, which seeks to outline an overview of the functionalities of sound in games. Based on a concluding chapter of my Ph.D. dissertation in which formal functions are identified and discussed, this paper sums up some of the most crucial points in my current Ph.D. research on the functionality of sound and music in computer games. The research shows that game audio has important functions related to actions and events in the game world, and also related to the definition and delimitation of spaces in computer games.

1 Introduction

This paper concerns the role of computer game audio, and seeks to outline an overview of important functionalities that can be identified in modern games. The present discussions are based on findings in my Ph.D. research and demonstrate a range of different, but related, functions of game audio. These are connected to usability, mood and atmosphere, orientation, control and identification. An important prerequisite for understanding the functions that computer game audio has is seeing computer games as dual in the sense that they are game systems as well as fictional worlds [1]. This means that game audio has the overarching role of supporting a user system while also supporting the sense of presence in a fictional world.

The identification of these functions is based on my current Ph.D. research, which studies computer game sound and music with focus on the relationship between audio and player action as well as events in games. The study is based on theories about film sound and music [2, 3], auditory display studies [4, 5, 6, 7], and qualitative studies of game audio designers and computer game players. The theoretical and empirical perspectives have together provided the understanding of game audio functionality presented in this paper.
However, since my project has focussed on two specific games, namely Io Interactive’s stealth-based action game Hitman Contracts (2004), and Blizzard’s real-time strategy game Warcraft III (2002), it is likely that additional functions may be discovered when studying games within other genres. Still the results presented in this paper are diverse, because of the great difference in genre and audio use in the two games in question. However, this paper will also draw on examples from other games. Film theory traditionally separates between diegetic and extradiegetic sound. Diegetic sound is that which has a perceived source in the film universe, and which the fictional characters consequently are able to hear. Extradiegetic sound, on the other hand, are sounds that are part of the film, but which do not seem to have a physical source within the film universe. Thus, extradiegetic sounds cannot be heard by the fictional characters and communicate to the audience by contributing to the mood or drama within the film [2, 10]. However, in computer games, extradiegetic sound often has a different informative role since the player may use information available in extradiegetic sound when evaluating his choice of actions in the game world. In effect, this means that extradiegetic sound has the power to influence what happens in a game, while it does not have this power in a film. An example of this is the use of adaptive music in games: when a certain piece of extradiegetic music starts playing when the avatar is riding in the forest in The Elder Scrolls IV: Oblivion (Bethesda 2006), the player knows that a hostile creature is on its way to attack, and s/he may either try to evade the creature, or stop to kill it. In comparison, when the special shark theme appears when someone is swimming in the thriller film Jaws (Spielberg 1975), the spectator can only watch as the character knows nothing of the approaching danger. 2 Theoretical Background As noted above, understanding the functionality of game audio is connected to understanding the dual origin of computer games as 1) game systems that focus on usability, and 2) fictional worlds that focus on the sense of presence in the game environment. When talking about usability in relation to the game system, I want to emphasise that sound has the role of easing the use of the system by providing specific information to the player about states of the system. This idea is supported by auditory display-related theories. Diegetic sounds in computer games may also have a different role than that diegetic sounds in films. When the avatar produces the line “I cannot attack that” when the player uses the attack command in World of Warcraft (Blizzard 2004), this is of course a system message, but it also seems that the avatar itself is speaking directly to the player. In this sense, the illusion of the fictional universe is broken because a fictional character is When talking about the sense of presence in a fictional world, I want to point out that most modern computer games are set in virtual environments that depict fictional, virtual worlds. In this context, fictional world should be understood as an imaginary, hypothetical world separate from our own which the players are 48 addressing an entity situated outside the game universe. However, when the traditional concepts of diegetic and extradiegetic spaces seem to break down in games, I call the sounds transdiegetic [11, 12]. 
It should be noted that transdiegetic sounds are consciously utilized in computer games, where they have a clear functional, usability-oriented role. This will be demonstrated in the following. how sounds that seem natural to the game universe also have strong informative value, while the concept of earcon explains provides an understanding of why game music and artificial noises make meaning without disturbance in computer games. More importantly, these ideas also help explain why there are transdiegetic sounds in computer games. When a game developer wants to utilize sound for urgency and response purposes, while also maintaining a direct link to the game universe, it becomes necessary to break the border between real world space and virtual space in order to enable communication between the player and the game world. Auditory display studies are concerned with the use of sound as a communication system in physical and virtual interfaces, and the field derives from human-computer interaction studies and ecological psychoacoustics. This field utilizes sound as a semiotic system in which a sound is used to represent a specific message or event. 3 Different Functions Related to the above theoretical assumptions, this part of the paper will discuss five different overarching functions that have been disclosed during my research. As noted above, the identification of these functions are based on analyses, interviews and observations related to two specific games, and it is likely that the study of more games will reveal additional functions. Auditory display studies often separate between two kinds of signals called auditory icons and earcons. Auditory icons are characteristic sounds based on a principle of similarity or direct physical correspondence and which can be recognized as sounds connected to corresponding real world events; while earcons are symbolic and arbitrary sounds such as artificial noises and music which may be seen as abstract in the sense that they cannot immediately be recognized [5, 6, 8, 9]. This separation between two types of signals also applies to computer game audio. When using sound as an information system, computer games utilize both auditory icons and earcons. Broadly speaking, auditory icons are used in connection with all kinds of communicative and source-oriented diegetic sounds, while earcons are used in connection with extradiegetic music and interface-related sounds. In general, these terms are used for non-verbal audio, but in the case of computer games there is an exception to this. When voices are used in order to identify a human source and not for its semantic qualities [3], the voice does not present detailed linguistic information and may be used in a similar manner to other object-related sounds. Examples of auditory icons in games are the sound of a gun shot, the sound of enemies shouting, and the sound of footsteps on the ground, while examples of earcons are the use of music to signal hostile presence, a jingle playing when the avatar reaches a new level in an MMORPG, and the sound playing when Super Mario is jumping. 3.1 Action-Oriented Functions This research has identified uses of game audio which relate to events and player actions in the game world, and which corresponds to auditory display studies’ urgency and response functions. Most modern games utilize sound for these purposes to an extensive degree, although it is not always evident that this is the formal and intended function of the sound. 
It seems to depend on how auditory icons and earcons are used. Hitman Contracts integrates auditory icons as naturally occurring sounds from events in the environment. In this sense, the communicative role of the sounds becomes transparent by giving the impression that sounds are present for a realistic purpose instead of a functional purpose. For instance, when the avatar is in a knife fight, sound will be a good indicator of whether he hits or not. When the avatar hits, the slashing sound of a knife against flesh will be heard, accompanied by screams or moans from the enemy, and when the avatar misses, the sound of a knife whooshing through the air is heard. These are of course examples of a confirmation and a rejection response to player actions, and work as a usability feature although they also seem natural to the setting and the situation. Concerning the purpose of auditory signals, studies of auditory display often speak of two central functions. These may be described as urgency and response functions. Urgency signals are proactive in the sense that they provide information that the user needs to respond to or evaluate shortly. Urgency signals are often alarms and other alerts pointing towards emergency situations, and may be separated into different priority levels based on whether they demand immediate action or evaluation only [7]. Response signals, on the other hand, are reactive, and work to inform the user that a certain action or command has been registered by the system. In order to be experienced as responses, the sound must appear immediately after a the player has executed a command or an action, and it must be clearly connected to a specific event [4, 5]. In a game, an urgency message may be the voiceover message “our forces are under attack” in Warcraft III, while a response message may be the sound of a mouseclick when selecting a certain ability from the interface menu in the same game. However, it is also possible to use auditory icons in a less transparent manner, in which the auditory icons more clearly stand out as auditory signals intended for communicating specific messages. In Warcraft III, objects produce specific sounds when manipulated. For instance, when the player selects the lumber mill, the sound of a saw is heard. Also, when the barracks is selected, the player hears the sound of marching feet. Although these responses have diegetic sources, the sounds do not seem natural to the game world in the same manner as the knife sounds in Hitman Contracts. The reason for this is that they are produced only when the player selects the specific building, and in the case of the barracks, this is not the exact sound one expects to hear at a real-world barracks. We see that the sound is suitable for the specific object, although not in this precise format. According to Keller & Stevens [6], this demonstrates non-iconic use of auditory icons, while the example from Hitman Contracts demonstrates iconic use of auditory icons. This difference also emphasises the fact that sounds with a seemingly naturalistic motivation do have usability functions. Together these concepts form a fruitful framework for understanding why computer game audio is realized the way it is, and it also provides an understanding of different functions that game audio may be said to have. The response and urgency functions explain game audio in terms of the usability of a computer system. 
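To make the response and urgency distinction concrete, a minimal sketch of an event-to-cue mapping follows. The event names, file names and priority handling are illustrative assumptions loosely modelled on the Hitman Contracts and Warcraft III examples above, not a description of how those games are actually implemented.

    from dataclasses import dataclass
    from enum import Enum

    class Signal(Enum):
        RESPONSE = 1   # reactive: confirms that a player command or action was registered
        NOTIFY = 2     # urgency, low priority: needs evaluation, not necessarily action
        WARN = 3       # urgency, high priority: demands immediate evaluation, possibly action

    @dataclass
    class Cue:
        sample: str    # sound file (hypothetical names)
        signal: Signal

    # Hypothetical event-to-cue table: auditory icons (knife hit/miss) and an interface click
    # as responses, voice-over messages as urgency signals of different priority.
    CUES = {
        "knife_hit":      Cue("knife_against_flesh.wav", Signal.RESPONSE),
        "knife_miss":     Cue("knife_whoosh.wav",        Signal.RESPONSE),
        "ability_select": Cue("click.wav",               Signal.RESPONSE),
        "work_complete":  Cue("vo_work_complete.wav",    Signal.NOTIFY),
        "under_attack":   Cue("vo_under_attack.wav",     Signal.WARN),
    }

    def on_game_event(event: str, play=print) -> None:
        # Response cues must follow the triggering action immediately; urgency cues could
        # additionally be repeated or escalated depending on their priority level.
        cue = CUES.get(event)
        if cue:
            play(f"[{cue.signal.name}] {cue.sample}")

    on_game_event("under_attack")   # -> [WARN] vo_under_attack.wav

A real engine would of course also handle mixing and interruption rules between simultaneous cues; the point here is only the separation between reactive confirmations and proactive alerts.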
In addition, the concept of auditory icon explains Concerning the use of earcons for response purposes, Hitman Contracts has music that informs the player whether his/her 49 current activities are going well or badly. The music changes into a combat theme which will play a particular piece of music if the player is doing well, and another if the player is doing badly. However, although this follows the idea of earcons, this use of music is also adopted from the use of dramatic music in films. This makes the use of musical earcons feel familiar and suitable even though it does not feel natural to a specific setting. logue also contribute to the specific mood of a game or a situation. The overall soundscape contributes to a sense of presence or even immersion in a game by creating an illusion of the game world as an actual space. Sound may thus give the impression of a realistic space by presenting virtual offscreen sources. In this context, ambient environmental sound is of interest. Ambience should be understood as environmental background sounds added to the game for the purpose of adding the sense of presence and a specific mood to the game. Thus, these sounds are not present in order to influence player action by giving the player specific information about objects, events or situations, and they are often not connected to specific sources in the game. Instead they may be connected to virtual sources, or be collected into a separate soundtrack. The first technique is found in Lineage II (NC Soft 2004), where for instance insects can be heard in each bush. When looking for the actual sources, however, these cannot be found as visual objects. The second technique is found in Sacred (Ascaron 2004), where the ambient background noise for each setting is stored as a separate mp3-file. Thus, when the player is exploring dungeons, a specific soundtrack consisting of reverberated wind and running water is played, while when the player visits villages, the sounds of children laughing and dogs barking are heard. Both earcons and auditory icons are used for urgency purposes in computer games. Although a range of different priority levels may be identified in games, I will limit myself to the two most common. Games often separate between urgency signals that work as notifications that do not demand immediate player action; and urgency signals that work as warnings that demand some kind of action. Notifications provide information about events in the environment that the player needs to know about, but which s/he does not have to react to. S/he may, however, need to evaluate the situation. An example from Warcraft III is the message “work complete” which is played when a worker has finished its task. Warnings, on the other hand, provide information about immediate threats or dangers to the player. These will always need an immediate evaluation, and possibly action, but dependent on the situation, the player may choose to not take any action if he regards the situation under control. An example is the message “our forces are under attack” which is played in Warcraft III when the player’s units are being attacked by the enemy. Observations and conversations with players reveal that the engagement in the game may decrease when the sound is removed from the game. Players notice that the immersion decreases, and that the fictional world seems to disappear and that the game is reduced to rules and game mechanics when sound is removed. 
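The two ambience techniques just described, point-source emitters as in the Lineage II example and a per-area ambience track as in the Sacred example, could be sketched roughly as follows; the class, area and file names are hypothetical.

    import random

    # Technique 1 (point-source emitters): ambient sounds are attached to approximate
    # positions in the world, even when no visual source exists there.
    class AmbientEmitter:
        def __init__(self, position, samples):
            self.position = position      # (x, y, z) of a bush, a rooftop, etc.
            self.samples = samples        # e.g. ["crickets_a.ogg", "crickets_b.ogg"]

        def pick(self):
            return random.choice(self.samples)

    # Technique 2 (per-area ambience track): each setting simply streams one looping
    # background file; no individual sources are placed.
    AREA_AMBIENCE = {
        "dungeon": "amb_dungeon_wind_water.ogg",
        "village": "amb_village_children_dogs.ogg",
    }

    def ambience_for(area: str) -> str:
        # Hypothetical lookup; a real game would also cross-fade when the area changes.
        return AREA_AMBIENCE.get(area, "amb_default.ogg")

    bush = AmbientEmitter((12.0, 0.5, -3.0), ["crickets_a.ogg", "crickets_b.ogg"])
    print(bush.pick())
    print(ambience_for("dungeon"))   # amb_dungeon_wind_water.ogg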
3.2 Atmospheric Functions Working in a more subtle manner, the atmospheric functions of game audio may still be regarded as one of the most central. The use of music in films for emotionally engaging the audience is well known [12], and games try to adopt a similar way of using music. Most mainstream games utilize music to emphasise certain areas, locations and situations. 3.3 Orienting Functions The orienting functions of game audio are related to actionoriented functions in the sense that both provide information about events and objects in the game environment although in different ways. While the action-oriented functions are reactive and proactive, the orienting functions inform about the presence and relative location of objects and events. The functions described in this section were identified in my qualitative research where player performance was studied in the absence and presence of game audio. An example is a game such as World of Warcraft, where the large cities have distinct music. When entering the orcish capital of Orgrimmar, the player hears that a certain piece of music starts, dominated by wardrums. This music is distinct from the music heard when entering the human capital of Stormwind, which has a more Wagnerian epic style. In both cases, the music is there as a mood enhancer that emphasises classical fantasy conventions of the noble humans and the savage orcs. In this context, it is important to point out that atmospheric function of music is guided by genre conventions. In survival horror games such as the Silent Hill series (Konami 1999-2004), atmospheric sound and music are used to emphasise a very specific mood of anxiety and horror. However, it should be noted that this mood also has the power to influence the player’s behaviour in the game. When the player becomes anxious he may act more carefully in order to avoid any dangerous enemies and unpleasant situations. In this sense, atmospheric sound may thus work indirectly to influence player action. In connection with the orienting functions of game audio, it is important to note that sound seems to extend the player’s visual perception beyond what is possible without sound. In the presence of sound, the player receives information that the visual system cannot process, such as for instance events and objects situated outside the line of sight. It also enables the player to know what is going on in locations not in the immediate vicinity of the player. The perhaps most obvious orienting function of sound is that it provides information about the presence of objects as well as the direction of sound sources. This is especially important in the context of offsceen sources. Sound may thus reveal the appearance and presence of an object before the player has actually seen it, and provides therefore information that the visual system could not provide on its own. A good example is the shouting voices of offscreen guards in Hitman Contracts. Today’s computer games utilize the stereo channels to inform the player about the relative direction of a sound source. However, although a stereo sound system does reveal the relative direction of a certain source, it is not able to provide information on whether the source is located in front of or behind the player. True surround systems demonstrate significant possibilities for providing detailed information about the location of an offscreen Atmospheric sounds may also influence player behaviour in more direct manners. 
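As a concrete illustration of the stereo-direction point above, a small sketch of a plain two-channel pan is given below; the geometry and the constant-power mapping are assumptions of this illustration. It shows why left and right can be conveyed while front and back remain ambiguous without surround output or binaural filtering.

    import math

    def stereo_pan(listener_pos, listener_forward, source_pos):
        # 2D sketch: compute the signed angle between the listener's facing direction
        # and the source, then map it onto a constant-power left/right gain pair.
        dx = source_pos[0] - listener_pos[0]
        dz = source_pos[1] - listener_pos[1]
        fx, fz = listener_forward
        angle = math.atan2(dx * fz - dz * fx, dx * fx + dz * fz)
        pan = math.sin(angle)                      # -1 .. 1, one side to the other
        left = math.cos((pan + 1) * math.pi / 4)   # constant-power pan law
        right = math.sin((pan + 1) * math.pi / 4)
        return left, right

    # A source at 60 degrees and one at 120 degrees produce the same pan value
    # (sin is symmetric about 90 degrees), i.e. "in front" and "behind" sound identical.
    print(stereo_pan((0.0, 0.0), (0.0, 1.0), (5.0, 0.0)))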
When music is used for responsive and urgency purposes, it will also have atmospheric properties. In the example from Hitman Contracts above, we see that different pieces of music provide different kinds of information to the player. The music does not only work as pure information, it also emphasises mood. For instance, when the player is in a combat situation, the music becomes more aggressive by an increased tempo and a more vivacious melody. Although music may be the more persuasive kind of atmospheric sound, environmental and object sounds as well as dia- 50 source, and prove to be interesting for the further development of game audio functionality. These orienting functions are also demonstrated in the research on audio-only games for the blind and visually impaired. This research demonstrates the use of characteristic sounds that identify objects and events, their presence, and their relative location [4, 5, 13]. “ready for action”. However, it is interesting to see that these utterances not only identify the unit; they also signal the relative value of it. This means that the more powerful a unit is, the more distinct its sound of recognition is. Within Warcraft III’s orc team, the workers utter sentences that suggest obedience and humbleness such as “work, work”, “ready to work” and “be happy too”. The named warchief which represent the most powerful units in the game, on the other hand, utter sentences such as “I have an axe to grind”, “for my ancestors”, and “an excellent plan”, which emphasise aggressiveness, honour, and strategic insight. In addition, its voice is deeper than the voices of other units, as well as the fact that the footsteps of the unit sound heavily. Thus, we see that the quality and content of the sound are used in order to ease recognition of certain objects in the game as well as to signal the value of different units. 3.4 Control-Related Functions Tightly connected to the orienting functions are control-related functions. These are related to the idea that sound extends visual perception, and point to what sound directly contributes to concerning the player’s control over the game environment. Since game audio extends visual perception, it enables the player to be in control over unseen areas. Strategy games often provide good examples of this. In the real-time strategy game Warcraft III, the player receives auditory information about events happening on remote areas of the map. When the player is busy leading his/her army to battle, s/he still receives voiceover messages about status of the base, such as “upgrade complete” and “ready to work”. These messages contribute to increased control over all activities in the game. The same game also utilizes sound to provide the player with more detailed information than what visuals can provide. Combat situations in this game tend to be chaotic due to the fact that there is a huge number of military units fighting on each side. It is therefore difficult for the player to see exactly what happens in combat. The sounds of bowstrings and metal against metal inform the player what units are fighting, and screams tell the player that units are dying. In this example we see that sound contributes to ease the player’s management of the game by providing information that is difficult to provide by visuals only. 4 General Discussion The functions identified above are closely related to each other although they seem to stem from different aspects of games. 
Most of the functions seem to be motivated by usability, although the atmospheric function seems to go against this by emphasising presence and immersion into the fictional game world. These two seemingly different purposes of game audio are connected to the fact that computer games are user systems at the same time as they are set in fictional worlds. However, it is important to note that computer games also bridge these two domains, something which also becomes evident through their use and implementation of audio. How, then, does this fusion of user system and fictional world happen? To say it bluntly, it happens through giving many sounds a double function where they belong to in-game sources and are accepted as fiction at the same time as they provide specific information to the player. We can identify three central techniques that ensure that this merge seems transparent and intuitive; namely the use of auditory icons, earcons, and transdiegetic sounds. These examples are also related to the idea that the presence of sound eases or increases the player’s attention and perception. This was suggested by the informants of my study, who emphasised the idea that channel redundancy, or presenting the same information through different perceptual channels, increased the ability to register certain messages [14]. When sound was absent, Warcraft III players had difficulties noticing written messages appearing on the screen. This is probably due to the high tempo of the game, and the fact that the player’s visual perception is focussed on specific tasks in the game. Since auditory icons have an immediately recognizable relation to its source, these are very well suitable for combining the usability function with the fictional world. The sounds seem natural to the game environment, at the same time as they provide the player with information relevant for improved usability of the system. This is what hinders the sound from the buildings in Warcraft III to seem misplaced. 3.5 Identifying Functions Another interesting function connected to sound is its ability to identify objects and to imply an objects value. The fact that sound identifies may not seem surprising, since sound in general indicates its producing source. However, this is utilized in games, not only in the format of auditory icons that automatically are recognized, but also in the format of earcons that needs to be learned before they can be recognized as belonging to a specific source. We have already discussed the example from Hitman Contracts where music is used to identify certain situations. Earcons may be said to work the other way around, since they illustrate an artificially constructed relation between sound and source. The use of artificial noises may contribute to a certain auditory message becoming very noticeable or even disturbing because of its unexpected relation to a certain source, such as is the case with the squeaking negative response produced when the player tries to make an illegal action in Warcraft III. On the other hand, the use of game music does not seem disturbing because it utilizes accepted conventions from film music and adds mood to the game. This is why the player accepts music which changes according to the situation in a game such as Hitman Contracts, and which plays in major when the player is doing well and in minor when the player is doing badly. Warcraft III connects identifying sounds to units and buildings. 
From the player’s top-down view on the environment it may be difficult to distinguish objects from each other. However, as noted above, when the player selects the lumber mill, s/he will hear the sound of a saw, and when s/he selects the barracks, s/he hears the sound of marching feet. This enables the player to easily recognize the building without having a clear view of it. In the case of units, each of them presents an utterance of recognition when produced and when manipulated. This means that a worker says things such as “ready to work”, while a knight says The third technique that makes the fusion between usability and presence in a fictional world transparent is transdiegetic sounds. Transdiegetic sounds break the conventional division between diegetic and extradiegetic sounds by either having diegetic sources that communicate directly to the player, or by being extradiegetic sounds that game characters virtually can hear. 51 [7] Sorkin, Robert D., “Design of Auditory and Tactile Displays”, in Salvendy, Gavriel (ed.): Handbook of Human Factors. New York, Chichester, Brisbane, Toronto, Singapore: John Wiley & Sons, 549-576, (1987). When sound in films breaks this common separation between diegesis and extradiegesis, it is understood as a stylistic, artistic and uncommon way of using sound, but games utilize this functionally to bind together usability and fictional space. This means that it does not feel disturbing when a unit in Warcraft III says “What do you want?” with direct address to the player, although the unit is regarded a fictional character and the player who has no avatar in the game is situated in real world space. Neither does it seem strange that the avatar as a fictional character in The Elder Scrolls IV: Oblivion (Bethesda 2006) reacts by drawing its sword when the musical theme that suggests nearby danger starts playing – although a film character would not react in this way, a game character can due to the link between avatar and player. [8] McKeown, Denis, “Candidates for Within-Vehicle Auditory Displays”, Proceedings of ICAD 05. Available: http://www.idc.ul.ie/icad2005/downloads/f118.pdf [10.04.06], (2005). [9] Suied, Clara, Patrick Susini, Nicolas Misdariis, Sabine Langlois, Bennett K. Smith, & Stephen McAdams (2005): “Toward a Sound Design Methodology: Application to Electronic Automotive Sounds”, Proceedings of ICAD 05. Available: http://www.idc.ul.ie/icad2005/downloads/f93.pdf [10.04.06], (2005). In this sense, computer game audio aims to combine usability with presence and immersion in the fictional game world, and by doing this the realization and functionality of game audio becomes in different ways similar to both film audio and auditory displays and interfaces. This creates a very unique way of utilizing audio which is especially designed to emphasise how modern computer games work. [10] Bordwell, David & Kristin Thompson, Film Art: An Introduction. New York: Mc-Graw Hill, (1997). [11] Jørgensen, Kristine, “On Transdiegetic Sounds in Computer Games”, Northern Lights 2006, Copenhagen: Museum Tusculanums Forlag, (2006). [12] Gorbman, Claudia, Unheard Melodies? Narrative Film Music, Indiana University Press, (1987). 5 Summary As a summary of the concluding chapter of my upcoming Ph.D. thesis on the functionality of game audio in relation to actions and events, this paper has concerned computer game audio functionality. 
The paper identifies and describes the most important functions of computer game audio and provided an explanation of why these functions are central to computer game audio. The main argument is that modern computer games are set in fictional, virtual worlds at the same time as they are user systems, and in order to combine this in the most transparent way, they break the common concept of diegesis by utilizing auditory icons and earcons for informative purposes. [13] Röber, Niklas & Maic Masuch, “Leaving the Screen. New Perspectives in Audio-Only Gaming”, Proceedings of ICAD-05. Available: http://www.idc.ul.ie/icad2005/downloads/f109.pdf [02.08.06], 2005. [14] Heeter, Carrie & Pericles Gomes, “It’s Time for Hypermedia to Move to Talking Pictures”, Journal of Educational Multimedia and Hypermedia, winter, 1992. Available: http://commtechlab.msu.edu/publications/files/talking.html [03.08.06], 1992. 6 References [1] Juul, Jesper, Half-Real. Video Games Between Real Rules and Fictional Worlds. Copenhagen: IT University of Copenhagen, (2003). [2] Branigan, Edward, Narrative Comprehension and Film. London, New York: Routledge, (1992). [3] Chion, Michel, Audio-Vision. Sound on Screen. New York: Columbia University Press, (1994). [4] Drewes, Thomas M. & Elizabeth D. Mynatt, “Sleuth: An Audio Experience”, Proceedings from ICAD 2000. Available: http://www.cc.gatech.edu/~everydaycomputing/publications/sleuth-icad2000.pdf [03.06.2005], (2000). [5] Friberg, Johnny & Dan Gärdenfors, “Audio Games: New Perspectives on Game Audio”, Proceedings from ACE conference 2004. Available: www.cms.livjm.ac.uk/library/AAAGAMES-Conferences/ACM-ACE/ACE2004/FP18friberg.johnny.audiogames.pdf [02.08.06], (2000). [6] Keller, Peter & Catherine Stevens (2004): “Meaning From Environmental Sounds: Types of Signal-Referent Relations and Their Effect on Recognizing Auditory Icons”, in Journal of Experimental Psychology: Applied. Vol. 10, No. 1. American Psychological Association Inc., 3-12, (2004). 52 Composition and Arrangement Techniques for Music in Interactive Immersive Environments Axel Berndt, Knut Hartmann, Niklas Röber, and Maic Masuch Department of Simulation and Graphics Otto-von-Guericke University of Magdeburg P.O. Box 4120, D-39016 Magdeburg, Germany http://games.cs.uni-magdeburg.de/ Abstract. Inspired by the dramatic and emotional effects of film music, we aim at integrating music seamlessly into interactive immersive applications — especially in computer games. In both scenarios it is crucial to synchronize their visual and auditory contents. Hence, the final cut of movies is often adjusted to the score or vice versa. In interactive applications, however, the music engine has to adjust the score automatically according to the player’s interactions. Moreover, the musical effects should be very subtle, i. e., any asynchronous hard cuts have to be avoided and multi-repetitions should be concealed. This paper presents strategies to tackle the challenging problem to synchronize and adapt the game music with nonpredictable player interaction behaviors. In order to incorporate expressive scores from human composers we extend traditional composition and arrangement techniques and introduce new methods to arrange and edit music in the context of interactive applications. Composers can segment a score into rhythmic, melodic, or harmonic variations of basic themes, as known from musical dice games. The individual parts of these basic elements are assigned to characterize elements of the game play. 
Moreover, composers or game designers can specify how player interactions trigger changes between musical elements. To evaluate the musical coherency, consistency, and to gain experience with compositional limitations, advantages and possibilities, we applied this technique within two interactive immersive applications. 1 Introduction order to start the next piece of music. These hard cuts destroy inner musical structures that we are used to hear and thus eventually break the game’s atmosphere. Because of music cultural typification and preparatory training of the listener he perceives music with a certain listening consuetude. He is used to hear musical structure even unconsciously. That is why humans would recognize hard cuts even while hearing some piece of music for the very first time. In addition, the succeeding music requires at least a few seconds to evolve its own atmosphere — time of an atmosphere-less bald spot. All these factors lower the immersion of the user into the virtual world, which is particularly dangerous in all application domains where music is used to intensify the immerse of the user into a virtual environment. To solve this antagonism between static musical elements within dynamic interactive environments one may be tempted to formalize music composition and delegate the creation of background music to automatic real-time generators. Even though the development of automatic composition systems has been one of the first challenges tackled by researchers within the field of artificial intelligence (see for instance Hiller’s & Isaacsons’ automated composed string quartet “Illiac Suite” [8] and the overview articles [16, 4]), the qualitative problem is apparently clear. Despite of the long research tradition, the majority of these systems are specialized to a single musical style (e. g., chorales in the style of Johann Sebastian Bach [5, 6]) or tootles more or less pseudo randomly tunes (due to the lack of high-level musical evaluation criteria for optimization methods [3, 7, 13, 11] or machine learning techniques [21]; due to the application of stochastic methods such as Markov chains [8]). Another conflict with our intention to integrate expressive musical elements in interactive immersive applications results from a major strategy of research in computer music: researchers manually extract rules or constraints from textbooks on music theory or develop algorithms which automatically In movies and theater, directors and composers employ musical elements and sound effects to reinforce the dramatical and emotional effects of pictures: scores can bring a new level of content and coherency into the story-line, can invert the picture’s statement, or can insert elements of doubt or parody. However, these effects have to be very subtle as they are intended to be perceived subconsciously. Only in this way — through by-passing the process of concentrated listening and intellectual understanding — the music can establish it’s emotional power (cf. [10, pg.22ff]). In the post-processing of movies, directors, cutters, and composers cooperate to intensify their emotional impact. One — very critical — aspect of this procedure is the careful synchronization of the visual and auditory contents. Usually, the final cut of movies is done according to the underlying score.1 . Hence, musical structures become a pattern for scene cuts and transitions. Often, pictures and music seem to be of a piece: scene and content transitions (actions and events within a scene) melt in the music. 
In the opera this is even more extreme: nothing happens without a musical trigger. This is only possible due to the static nature of these linear media: all transitions are known and have been adjusted with the score. In interactive immersive applications such as computer games, however, the music engine has to adjust the score automatically according to the player’s interactions. Very often pre-composed musical elements or pre-arranged sound effects are triggered by some elements of the game play. This problem is intensified by the asynchrony between game elements and player interactions: very often the music engines of computer games simply disrupt the currently playing music regardless of its musical context in 1 Schneider describes this from the composer’s point of view [19] and Kungel goes deep into the practical details considering also the cutter’s concerns [10]. 53 berger’s (1721–1783) [9] and Wolfgang Amadeus Mozart’s (1756–1791) [14] dice games base upon a common harmonic structure: all interchangeable group elements are short (just one bar) and are based on the same underlying harmonies. In order to “compose” a new piece, the player selects elements from sixteen melody groups by throwing a dice. Jörg Ratai [18] extends this idea by exploiting chord substitutions and harmonic progressions, so-called jazz changes. The basic compositional principle is to replace elements within a harmonic context by appropriate substitutions (e. g., borrowed chords). While the basic blocks of Ratai’s jazz-dice contains manually composed harmonic variations, Steedman [20] proposed an automatic system employing recursive rewriting rules. Even though Ratai’s system is based on a simple 12-bar blues schema, it achieves an enormous harmonic variance. extract them from a corpora of compositions and use them to generate or evaluate new tunes. But as any composer will notice, a pure adherence to these rules neither guarantees vivid compositions nor do great compositions follow rules in all respects. An agile musical practice constantly defines new musical patterns and breaks accepted rules in order to achieve expressiveness and a novel musical diction. Therefore, we decided not to replace the human composer by an automatism. The music should still be written by an artist, but it has to be composed in a way that it is re-arrangeable and adaptable in order to adapt the game’s music with non-predictable player interaction behaviors. The following sections describe one way to achieve this goal. The compositional roots of our approach are introduced in Sec. 2. Sec. 3 describes a music engine which acts like a kind of real-time arranger. Sec. 4 demonstrates the application of this technique within two interactive immersive applications. Sec. 5 summarizes this paper and motivates directions for future research. Other musical arrangement techniques — abbreviations and jumps — can help to overcome the second problem: the smooth transition between musical elements. Sometimes composers or editors include special marks into the score indicating that the performer can jump to other musical elements while omitting some segments (e. g., da capo or dal segno). A few game music use this method to ensure that the music is not cut at any position but only on these predefined ones (e. g., only on the barline like in Don Bluth’s music for the game “Dragon’s Lair 3D”). A non-compositional technique which is used quite often by disc jockeys for transitions between different pieces of music or sounds is the cross-fade. 
Cross-fading means, while the currently running music fades out the next piece is started and fades in. During this time both pieces of music are hearable. This bares a big problem: both pieces of music might not harmonize. In particular differing tempi, rhythmic overlays, and dissonant tones might be quite confusing to the listener. Hence, one cannot crossfade arbitrary musical pieces ad libitum. 2 Compositional Roots Compositional and arranging techniques as well as the musical practice already offer a number of methods which can be used to make music more flexible and to adjust the length of musical elements to a performance of unpredictable length. There are two basic problems in the synchronization between musical and performance elements: (i) the duration of a given musical element might not suffice or (ii) the performance requires a thematic break within a piece. There are several strategies of composers and musicians which are useful to tackle the first problem: to remain at some musical idea while keeping the musical performance interesting. The simplest solution is to extend the length of a musical element — the whole piece or just a segment of it can be repeated or looped. This practice is used to accompany e. g., folk dances, where short melodic themes are usually repeated very often. The infinite loop is also known from the first video games and can still be found in recent computer games. Examples therefore are Super Mario Bros., the games of the Monkey Island and the Gothic series. By exploiting the different timbres of the instruments in an orchestra the instrumentation or orchestration opens up a second dimension. The composers of building set movements actually surpass the instrumentation technique [12]: their music can be performed by all notated parts at once or just by a few of them. Different combinations are possible and can be used to vary the performance of different verses. Nonetheless, every part-combination sounds self-contained and complete. This composition manner has its roots in the baroque practice of rural composition. Baroque composers like Valentin Rathgeber (1682– 1750) [17] wrote such music with reducible choirs and instrumentations for congregations with minor performance potentials. Today this acquirement is nearly extinct. In the whole polyphonic music there is not always just one leading part with an own melodic identity. Heinrich Schütz (1585–1672) already taught his students to compose in multiple counterpoint, i. e., the parts can be interchanged [15]. Every voice has its individual identity. Ergo the soprano can be played as a tenor, the bass as soprano and so on. Here every voice can be the upper one, can be “melody”. Johann Sebastian Bach (1685–1750) demonstrates this impressively in his multi-counterpoint fugues. Musical dice games show another way to bring more flexibility into the music. The basic principle is that composers create groups of interchangeable musical elements which are randomly selected during a performance. Johann Philipp Kirn- Computer games aim at immersing players into a three dimensional virtual environment. Interestingly, the multi-choir music of some Renaissance composers already integrated the three dimensionality of real world in some way into their compositions. The choirs (here this term also includes instrumental groups) were separated by open ground and placed e. g., to the left, right, front and/or back of the audience. 
One of the greatest and most famous representatives for this practice of composition and performance is the Venetian Giovanni Gabrieli (1556/57–1612). The listener can have quite different musical experiences by changing his own position in the room. He can hear the choirs playing alternately, communicating with and melting into each other bringing the whole room to sound. This can also be considered as surround music. Every choir has its own identity but all together build a bigger musical sound-scape. In their analysis, Adorno and Eisler [1] already pointed out that traditional music structures do not work within the new medium film, where the music (i) once had to become formally more open and unsealed and where (ii) composers had to learn following rapid short-term scene structures. But while new compositional techniques and musical forms have been successfully established for film music, the development of compositional techniques for music in interactive media is still in the beginning. In modern computer games, the development of a non-linear arborescent background story adds another dimension to the three dimensionality of virtual worlds. In contrast, music is only one dimensional with a fixed beginning and end where everything inbetween is predefined or pre-composed. The musical techniques outlined above are able to add further dimensions to game music: 54 (a) Distributed Music (b) Music Change Marks Figure 1: (a) Sound sources and their radii in the distributed music concept.(b) Passing music change marks triggers musical changes. appropriate parts from a given score and changes between musical elements. The following subsections will describe the several aspects of music arrangement and composition that promote the adaptation of musical structure to interaction space structure. instrumentation changes, building sets, and the melodic independence of interchangeable parts within a counterpoint score can vary the sound impression of a single musical element whereas musical dice games can even vary musical content; loops, abbreviations, and jumps can stretch and shorten the timely disposition. All these techniques have to be combined in order to tackle the problematic synchronization between musical and performance elements. Moreover, the multi-choir manner offers a way to integrate spatial dimension within the composition. Now we go a step further and introduce a way to gain actually four dimensions and consider user interactions. 3.1 Overview The basic principle of our music engine is the integration of musical elements into the virtual world and the re-arrangement of pre-composed musical pieces in real-time. We introduce the concept of parallel and sequential music distribution into different musical elements, which can be blended without dissonances (parallel and synchronously running elements), and meaningful self-contained musical entities, which can be re-arranged without interrupting the playback (sequential elements). Inspired by multi-choir music, dedicated parallel parts of a score characterize elements of the virtual world and the game play. Therefore, the game designer can assign these parts to 3D locations. Fig. 1-a contains some game objects (four rooms in floor plan) with musical elements. Their hearability can interfere, when the player navigates from one location to the another. Fig. 2 illustrates the subdivision of a holistic score ((a) in Fig. 2) into parallel tracks ((b) in Fig. 2) which characterize the four locations in Fig. 1-a. 
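A rough sketch may help to fix the idea of this parallel distribution: each location carries one synchronously running track with a hearability radius, and the listener's position yields per-track volumes for cross-fading. The location data, falloff curve and numbers below are invented for illustration and are not taken from the authors' implementation.

    import math

    # Hypothetical setup after Fig. 1-a: one looping track per location (sound source),
    # each with a hearability radius; all tracks are started at the same time and kept in sync.
    LOCATIONS = {
        "hall":    {"pos": (0.0, 0.0),  "radius": 8.0,  "track": "hall_strings.ogg"},
        "library": {"pos": (12.0, 0.0), "radius": 6.0,  "track": "library_harp.ogg"},
        "yard":    {"pos": (0.0, 20.0), "radius": 12.0, "track": "yard_winds.ogg"},
    }

    def track_gains(listener_pos):
        # Per-track volume for the current listener position: loud near a source,
        # silent beyond its hearability radius. Moving between rooms therefore
        # cross-fades the synchronously running tracks.
        gains = {}
        for name, loc in LOCATIONS.items():
            d = math.dist(listener_pos, loc["pos"])
            gains[loc["track"]] = max(0.0, 1.0 - d / loc["radius"])   # simple linear falloff
        return gains

    # In the doorway between hall and library both tracks are faintly audible, the yard is silent.
    print(track_gains((7.0, 0.0)))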
Since the player is free to navigate through all game objects the associated tracks should possess an independent musical power (e. g., with own melodies) and must harmonize with other synchronous tracks to do a proper cross-fading. Therefore, the composers have to apply the techniques of the building set manner and multiple counterpoint. Furthermore, the score is segmented into sequential blocks ((c) in Fig. 2), which can be re-arranged in various ways in order to achieve articulated musical changes. Music change marks (see also Fig. 1-b) can trigger the re-arrangement, i. e., a edit the block sequence even during playback. Here different block classes (as illustrated in Fig. 2-d) are considered which denote special blocks. These are used for transitions and for music variance. The following sections will describe these introduced concepts in more detail and also consider compositional matters and rela- 3 A Music Engine as Real-Time Arranger “Games may owe something to movies, but they are as different from them as movies are different from theater.” This statement of Hal Barwood2 also applies for the music of computer games: the player’s integration in a three dimensional virtual environment, a non-linear arborescent background story, and unpredictable user interactions prevent the direct application of traditional techniques or film music composition techniques and sound elements designed for movies in computer games. The previous section revealed the main problem of a music engine in interactive immersive applications, especially in computer games: the automatic adaptation of the score automatically according to non-predictable player interactions. Moreover, all game elements should be designed in a way that prevents losses of immersion. Hence, sound elements have to be integrated into the virtual world and all musical effects should be very subtle, i. e., any asynchronous hard cuts have to be avoided and multirepetitions should be concealed. This paper presents a new method to compose and arrange scores for interactive immersive applications. Composers can segment a score into rhythmic, melodic, or harmonic variations of basic themes, as known from musical dice games. The individual parts of these basic elements are assigned to characterize elements of the game play (e. g., objects, actors, locations, and actions) or narrative elements. Moreover, composers or game designers can specify how player interactions trigger the selection of 2 Game designer, writer, and project leader for Lucas Arts’s computer game adaptations of the Indiana Jones movies [2]. 55 Parallel Distribution. To locate music on several positions in the three dimensional environment we literally place it there by using punctiform sound sources as known from physically oriented 3D-audio modeling environments (e. g., OpenAL http://www.openal.org/). This can be considered as multi-choir music where the choirs are placed at particular positions in the room. As in the real world every source has a region of hearability. As Fig. 1-a illustrates it can be set according to the measures of its location. Thus, the musical accompaniment can cover the location it belongs to, completely. Depending on the position of the listener he can hear the sources at different volume levels, thus any movement can change the volume gain which corresponds to fading. Sec. 3.4 goes deeper into detail with this. 
This concept means, multiple tracks which are associated to 3D locations run in parallel (and as loops) and are cross-faded when the player moves between the sound sources (i. e., the locations). As we discussed previously 2 one cannot cross-fade any pieces of music ad libitum. But to enable this, the music can be specially prepared: comparable to multi-choir music everything runs synchronously (by starting the playback at the same time 2-b), and the music of each sound source is composed having regard to sound in combination with the others. If multiple parts need to sound together, they cannot sound as individual as the composer or the designer might want them to be. For the first, the music which is dedicated to the location the player currently visits, is the most important, and thus leading one. All other parts are adjusted variations, not the more individual original! If the player now leaves the location these variations can fade-in without any musical problems. Sequential block distribution. Since, the music, that fades in, is only an adjusted variation to the one, that fades out, pure crossfading is only half of the music transition. It can realize only a certain degree of musical change. The arrival at the new location, and the new music is still missing. Therefore, all pieces of music are partitioned into blocks of self-contained musical phrases which should not be interrupted to ensure musical coherency. But to achieve a real music change, other blocks can be put into the sequence. The sequence can be adapted just in time even while the playback is running (cf. Fig. 2c). To perceive the necessity of such a music change, so called music change marks are placed at the connection points between the locations. These are triangle polygons as illustrated in Fig. 1-b. A collision detection perceives when the player moves through one of them. While the new music is loaded, the playback goes on till the end of the current musical block, where the block sequence of the next music is enqueued glueless. The playback goes through without any stops or breaks. After such a music change the new location and its associated music is the most important and thus leading one. The remaining pieces are adjusted variations to this. Up to now everything was triggered by position changes in the three dimensional virtual space. The fourth dimension, that is the story which can change independently from the players 3Dposition, can also be an actuator for music changes. These are executed in the same way as the triggered music transitions, described here. But for this they are not caused by position changes of the player but by (story relevant) interactions. Figure 2: A parallel and sequential segmentation of a holistic score. Different block classes can be considered by the music engine. tionships, which the composer considers to enable the adaptiveness of his music. 3.2 Distributed Music 3.3 Loop Variance The structure of music has to attend four dimensions: the three dimensional virtual world and the story dimension. Therefore, we describe an approach to distribute the music in these four dimensions. One basic principle of human communication is its economy. If a message is being repeated, i. e., if the communication contains a redundancy, it is interpreted purely on the level of pragmatics. 
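Before continuing with the loop-variance discussion, the sequential block handling just described can be sketched as follows. The queue handling, block names and the treatment of one-time blocks are assumptions of this illustration rather than details of the authors' engine; the intent is only to show how a change requested by a music change mark takes effect at the next block border without a hard cut.

    from collections import deque

    class BlockSequencer:
        # Playback always finishes the current self-contained block; a change triggered
        # by a music change mark (or a story event) only swaps the queue, so the next
        # music is enqueued at the block border without an audible break.
        def __init__(self, blocks):
            self.queue = deque(blocks)        # (block_id, is_one_time) pairs
            self.pending = None               # block sequence requested by a change mark

        def request_change(self, new_blocks):
            self.pending = deque(new_blocks)  # takes effect at the next block border

        def next_block(self):
            if self.pending is not None:      # a change mark was crossed earlier
                self.queue, self.pending = self.pending, None
            block_id, one_time = self.queue[0]
            self.queue.rotate(-1)             # loop the sequence by default
            if one_time:                      # one-time blocks are dropped after one pass,
                self.queue.pop()              # so the second loop iteration already varies
            return block_id                   # (a real engine would guard against an empty queue)

    seq = BlockSequencer([("A_intro", True), ("A_loop1", False), ("A_loop2", False)])
    print([seq.next_block() for _ in range(5)])   # A_intro plays once, then A_loop1/2 cycle
    seq.request_change([("B_theme", False)])
    print(seq.next_block())                       # after the current block: B_theme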
Figure 3: The distance of the listener to the source controls the perceived volume gain. For a smooth attenuation behavior a cosine function is used (the curve runs from maxGain at minDistance down to minGain at maxDistance).
Figure 4: Moveable sources (pink, in contrast to static sound sources in blue) are positioned inside of their bounding volume (yellow) as near to the listener as possible.
Frequently, senders who repeat a message in this way intend to convey its importance to their audience. But if these messages are conveyed too often, their inherent redundancy will cause the audience to be bored or bothered. This phenomenon also applies to music: as soon as the player can recognize infinite loops within the background music of computer games, it will sooner or later lose its unobtrusiveness and become flashy. Therefore, composers are urged to avoid readily identifiable motifs, themes, or melodic phrases which could immediately be recognized at the first repetition. Hence, background music aims to be diffuse, nebulous, and less concrete. In practice, game designers furthermore give composers an approximate time the user will presumably spend on each game element. To prevent players from recognizing repetitions, the musical disposition is planned with regard to this length. But this does not work for all (especially the slow) playing behaviors. To delay the moment of recognition, we introduce a way to vary the music in form and content: our system incorporates musical blocks called one-times, i. e., the music engine plays them only one time and removes them from the list of active blocks after the first cycle (cf. Fig. 2-d). Ideally, subsequent loop iterations appear like a continuation of the musical material or a re-arrangement of the first loop iteration, and at the earliest the third iteration can be recognized as a real repetition. This behavior was originally implemented for transitions between two pieces of music, which likewise have to be played only once. But by using one-times also within a currently playing piece, this turned out to be an effective instrument for adjusting the musical length of game elements, as the second repetition acts as a buffer for very slow players. Moreover, parallel musical themes of several game elements are cross-faded when the player moves, which activates the musical concept of timbre changes, instrumentation or harmonic variations. tened or pop up too late after a very flat beginning phase. This behavior is not just unmusical; it catapults the music out of the subconscious and into the focus of the consciously perceived. In contrast, the music attenuation model of our system emulates how sound engineers mix several parts in a musical performance. For fading, a linear function is often used, but at its beginning and end there are points of non-differentiability, which cause abrupt volume changes. As in graphical animation, this jerky behavior appears mechanical and unnatural. In order to obtain the smooth fades that sound engineers achieve manually, we make use of a scaled cosine function in the interval from zero to π. Fig. 3 illustrates that sound sources in our distance model are characterized by two circumferences of minimal and maximal perceivable gains. Therefore, we need only one source to cover a small room or a wide yard.
3.5 Moving Sources
The distance model presented in the previous section is well suited for compact and uniform locations.
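Before continuing with moving sources, the attenuation curve of Sec. 3.4 can be written out explicitly. The parameter names follow Fig. 3; the clamping at minDistance and maxDistance and the exact scaling are assumptions of this sketch, not necessarily the authors' implementation.

    import math

    def cosine_gain(distance, min_distance, max_distance, min_gain=0.0, max_gain=1.0):
        # Full gain up to minDistance, minimal gain beyond maxDistance, and a scaled
        # cosine ramp (interval 0..pi) in between, so the fade has no abrupt kinks
        # at either end (its slope is zero where it meets the plateaus).
        if distance <= min_distance:
            return max_gain
        if distance >= max_distance:
            return min_gain
        t = (distance - min_distance) / (max_distance - min_distance)   # 0..1
        return min_gain + (max_gain - min_gain) * 0.5 * (1.0 + math.cos(math.pi * t))

    # Example: a source audible between 2 and 20 metres (the values are made up).
    for d in (1, 2, 11, 20, 25):
        print(d, round(cosine_gain(d, 2.0, 20.0), 3))   # 1.0, 1.0, 0.5, 0.0, 0.0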
As background music is independent from sound effects and acoustic situations, it would be a fault to incorporate a physically correct sound rendering component which can consider sound barriers such as walls and reflections. This causes a problem with covering locations of a less compact shape (e. g., long, narrow, flat, or high rooms, corridors, or towers). Either several small sound sources have to be placed over the shape of the location or the maximal distance value of one centrally placed source is set very far. In this case one can hear this music through walls deep into neighboring rooms. For a more precise treatment we employ two strategies: Moveable sources within a bounding volume. Fig. 4 presents an example of this strategy, which is specially suited for in-door scenes. Game designers can assign a three-dimensional bounding volume for each source. The bounding volume approximates the spatial extent and shape of the location. By placing the sound source inside of this volume and as near to the listener as possible it can now cover the location more precisely. If the listener is inside the bounding volume the source is on his position. If he leaves it, the sound source stops on the border and follows his movements along the border. This behavior also prevents jumps when the listener enters the volume. Otherwise source jumps would sound like sudden volume changes. By the automatic alignment towards the listener this can be avoided, thus the fading is always smooth and believable. 3.4 A Music Distance Model Our system places musical elements as punctiform sound sources in a virtual 3D environment. In a correct acoustic model the impact of a sound source — its gain — depends on the distance to the listener (the nearer the louder). But physically correct attenuation or distance models are not appropriate to maintain the details which characterize a piece of music as several important musical aspects such as its dynamic (e. g., crescendi : getting louder or fading in and decrescendi : fade-out) would be too quickly flat57 Bonded moveable sources. Music can also be applied to other moveable game elements like characters and objects. They do not have to be enchained to their predefined positions or locations in the virtual world. Non-player-characters in games can move as well as the player. Music is attached to them by replacing sound sources at one go with the movement of the game object. This can be done by the game engine using a dedicated interface command offered by the music engine. is already introduced by going nearer to the person, the object, or the location. These special kind of scene transitions ans inner-musical ideas or processes often overlap or are interweaved in a way that this cannot be represented by a simple sequence of succeeding blocks. Modulation and tempo changes (ritardando, accelerando) are examples for this. Here the parallelism of multiple cross-fadeable parts can help, too. For modulation ambiguous chords can be used. In the same way it is possible to realize the evolution of a motif. Tempo changes can be achieved by changing the metrical conciseness. Here the composer is restricted in his possibilities and forced to use more complicated compositional tools to compensate this. Nevertheless, these solutions, although they can achieve analog results, are not equivalent substitutions for common changes of harmony, tempo, motif, and so on. These usually happen in transitional blocks which lead over to the next music. 
In the following we discuss some of our transition techniques in more detail. The simplest way to change from one musical idea or situation to another while preserving its unity is to finalize a musical block and append the next one; this corresponds to playing the current block to its end and then starting the next. By partitioning the music into multiple short blocks, the latency between an interaction and the musical change can be reduced, but the composer has to be aware that the next music can now follow after every block: the melodic connection, for instance, should never contain illogical, ineligible jumps. Depending on the melodic mode this may limit its development.

In movies the so-called L-cut denotes a consciously asynchronous transition of sound and picture: the sound (including the music) may switch noticeably earlier to the next scene than the pictures. Carried over to interactive environments, this means doing the music transition before the actuating interaction is completed or the trigger is activated. The simple approach above (wait until the end of a block, then do the transition) obviously cannot handle this rather extreme case, but it can be achieved by cross-fading: the music transition is already in full activity when the end of the block is reached, because the new musical material is already introduced as the player approaches the person, the object, or the location. Such scene transitions and inner-musical ideas or processes often overlap or are interwoven in a way that cannot be represented by a simple sequence of succeeding blocks; modulations and tempo changes (ritardando, accelerando) are examples of this. Here, too, the parallelism of multiple cross-fadeable parts can help: ambiguous chords can be used for modulation, the evolution of a motif can be realized in the same way, and tempo changes can be achieved by changing the metrical conciseness. The composer is restricted in his possibilities here and forced to use more complicated compositional tools to compensate. Nevertheless, these solutions, although they can achieve analogous results, are not equivalent substitutes for ordinary changes of harmony, tempo, motif, and so on; those usually happen in transitional blocks which lead over to the next music. Since these blocks are played only once, such processes can run "pre-rendered" inside them, so with the use of One-Times these ostensible limitations can be overcome as well. Note that the hard cut should still be available: it is sometimes used to convey an abrupt change, a surprise, or a shock, and it may also remain useful for some types of measureless music.
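The block handling described above can be illustrated with a small Python sketch; the block structure and names here are invented for illustration and do not reflect the actual engine's data model. One-Time blocks are played in the first cycle and then dropped from the list of active blocks, so the second cycle already sounds re-arranged.

def next_cycle(active_blocks):
    """One loop iteration over the active musical blocks (cf. Fig. 2-d).
    Blocks flagged as one-times are played once and then removed, so the
    following iteration sounds like a re-arrangement rather than a literal
    repeat of the previous one."""
    playlist = [block["name"] for block in active_blocks]
    remaining = [block for block in active_blocks
                 if not block.get("one_time", False)]
    return playlist, remaining

blocks = [{"name": "groundwork"},
          {"name": "transition_fill", "one_time": True},
          {"name": "theme_variation"}]
first, blocks = next_cycle(blocks)    # plays all three blocks
second, blocks = next_cycle(blocks)   # the one-time fill is gone now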
4 Results and Discussion
It is a challenging problem to develop methods that can evaluate the artistic quality of music, or of any other kind of computer-generated art. In our system, the quality of the background music accompanying interactive media depends heavily on the ability of an artist to compose multiple self-contained parts in the building-set manner, so that the parts of the score characterize game elements convincingly. Hence, the main objective of our music engine is to guarantee that the adaptation techniques do not interfere with inherent musical structures (e.g., to prevent abrupt breaks of melodic units, conflicting tempi and rhythms, and dissonances). The main challenges are to (i) integrate the automatic arrangement of musical elements and the transitions between them and (ii) conceal boring repetitions, so that the player gets the impression of consistent, coherent and seemingly pre-composed music that accidentally fits the interactive events. Except for a few standard cases, a classification of music transitions is not possible, because they depend strongly on the specific musical contexts they connect; a music transition is usually as unique as the music it leads into. A pure enumeration of supported music transitions therefore does not reflect the quality of an adaptive music engine. Instead, an evaluation has to consider how usable the compositional methods and techniques applied in the music and the music transitions are for composers and game designers: in which way do they constrict the composer or enrich his possibilities? But as the first author of this paper is also the author of our system, the composer and the game designer, we cannot provide proper results of this kind. The previous discussion reveals that the combination of all functionalities offers a rich and expressive pool of compositional techniques for adaptive music in interactive environments. We are aware, however, that additional constraints can arise from specific musical styles. This raises the question of how different the cross-fadeable parts may be: can we integrate musical pieces that are not based on tonal music in order to extend the musical language used in interactive media? The poly-stylistic manner of composition and the style collages of composers like Bernd Alois Zimmermann (1918-1970) affirm our hope that adaptable music in interactive media imposes no such restrictions. Composers are not forced into a specific musical style, as the concepts of parallelism and sequentiality are generally used in music, and by including the building-set manner already in the process of composition the results will always be faithful in style to the composer's intention. We developed two prototypes to demonstrate the capabilities of adaptive music in interactive applications: a 3D adventure game and a presentation that shows a picture sequence in the manner of an interactive comic strip. We believe that the techniques presented in this paper open up a number of new possibilities. Musical soundscapes, for example, can benefit from the fading concepts, and with moveable sources they gain a powerful new tool for establishing a never-before-heard experience; it is even possible to let the music fly around the listener in all three dimensions. The three-dimensional arrangement of punctiform sound sources can furthermore be used for positional audio effects and surround output, so that the music can act as an auditive compass or orientation guide.

5 Conclusion and Future Work
As already mentioned in Sec. 2, there is still a lack of compositional and arrangement techniques for music in new interactive media. This paper presents both (i) new compositional techniques for adaptive music in interactive media and (ii) an automatic real-time arrangement technique for pre-composed parallel and sequential musical elements. We have shown how they can be used to create a coherent musical accompaniment for interactive applications. By abolishing the hard cut we could ensure an appropriate musical performance and, more importantly, raise the effect of immersion to a higher level. With this solution interactive environments can approach the immersiveness of movies: in spite of non-predictable user interactions, the background music never seems to be taken by surprise by any scene transition or user action. Our approach is able to integrate expressive scores from human artists. In order to support their compositional style, traditional compositional techniques such as building-set composition, multiple counterpoint and multi-choir music, which were until now often marginal, gain new importance. All these aspects lead to a solution that addresses technical as well as musical concerns and actually opens up new musical spaces and possibilities.
The limitations of our work also mark some directions for future research. The integration of a random selection between alternative group members, or of more flexible transitions, can prevent the direct recognition of looping parts that the player would otherwise notice and be bothered by. Furthermore, the seam between musical blocks should not be forced to be synchronous for every track or source; musical blocks could overlap, e.g., by an offbeat. An enhanced distance or attenuation model can improve the fading between parallel blocks by ensuring that it always sounds believable and exhibits no points of undifferentiability. But if the listener stops moving, such points appear again, because the fading stops with the same abruptness as the listener; to avoid this, the listener movement should be handled with some inertia, so that an always continuous and differentiable distance model can be built.
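As a closing illustration of the suggested inertia, a simple exponential smoothing of the listener position would keep the distance-based fading from stopping abruptly when the listener stops; the smoothing factor below is an assumed value, not one given in the paper.

class InertListener:
    """Exponential smoothing of the listener position, one possible way to
    give the movement the inertia suggested above, so that distance-based
    fades decelerate gradually instead of halting with the listener."""
    def __init__(self, position, smoothing=0.15):
        self.position = list(position)
        self.smoothing = smoothing   # assumed value, tune per application

    def update(self, measured_position):
        # Move a fraction of the way towards the measured position each tick;
        # when the listener stops, the smoothed position eases to a halt.
        self.position = [p + self.smoothing * (m - p)
                         for p, m in zip(self.position, measured_position)]
        return tuple(self.position)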
THE DRUM PANTS
Søren Holme Hansen, University of Copenhagen, Department of Musicology, Klerkegade 2, DK-1308 København K, Denmark, [email protected]
Alexander Refsum Jensenius, University of Oslo, Department of Musicology, P.O. 1017 Blindern, N-0315 Oslo, Norway, [email protected]

Abstract. This paper describes the concept and realization of The Drum Pants, a pair of pants with sensors and control switches that allows the performer to play and record a virtual drum set or percussion rack by hitting the thighs and waist with the hands. The main idea is to make a virtual percussion instrument that offers a high level of bodily control and permits new visual performance possibilities.

1 Introduction
Drummers and percussionists have a habit of tapping on themselves, pretending to play their instrument. This can be seen as a way to practice without the instrument at hand, but also as a natural way of getting beats and rhythmic figures into the body. The latter reflects a close interrelation between musical sound, mental imagery and bodily experience, as suggested in [1]. In this project we have been interested in using such a connection in the design of a new instrument.

Most of the commercially available electronic drum and percussion interfaces are designed to simulate the acoustic instruments they replace, like the electronic drum set or various kinds of percussion sensing plates. Why not exploit the natural way of feeling the rhythm in the body and develop a new kind of interface which allows a closer contact between rhythm and body?

Even though drum performances can be visually interesting, drummers are usually locked to their stationary instruments and do not have the same physical freedom to participate and interact in the visual performance on stage as, for example, singers, guitarists or saxophone players. It would be quite natural to see the drummer moving around to the beat he is playing; the drum performance would then become an integrated combination of rhythm and dance and could thereby add a visually interesting dimension to the performance.

Today, when programming beats, grooves and rhythmic soundscapes on the computer with sampling software, there seems to be a need for more human-friendly controllers besides keyboard/mouse and MIDI keyboards. It is of course possible to use the aforementioned commercial drum interfaces, but there will still be a lack of flow in the programming, as one normally has to switch from the controller to the computer or sampler and back again. A solution where a drum interface and a sample controller are combined in an ecologically sound design would therefore be preferable and an interesting way of improving the flow and creative energy in the process of programming.

2 The Design Idea
In developing The Drum Pants we sought to integrate and solve these issues by means of a new wearable drum interface design. The idea of wearable electronic instruments has been explored by a number of artists and musicians over the years, for example Joseph Paradiso [2], Ståle Stenslie [3] and Rolf Wallin [4]. These designs have focused on creating new sounds with new interfaces. We have been interested in exploring the control of traditional sounds with a new interface, and in the following sections we will focus on three main issues concerning the design: physical freedom while playing, sensor types and placements, and the dependence on the computer while playing.
2.1 Physical Freedom
To give the performer as much physical freedom as possible, we chose to develop a pair of pants, since this leaves the upper body and arms free of electronics and wires. Cotton was chosen as the material in order to get a comfortable and light pair of pants. The sensors placed on the legs are flexible, which means that all kinds of physical activity are possible while wearing The Drum Pants: stretching, bending, walking, jumping, dancing, etc.

2.2 Sensor Types and Placements
The Drum Pants are implemented with six force sensors on the thighs, and seven digital switches plus one potentiometer around the waist. In addition to the pants there is a pair of shoes with a force sensor under one of the soles, connected to the pants with a wire (see Figure 1). The reason for using analog force sensors, rather than just digital touch sensors, was to obtain a natural connection between the level of tapping and the dynamics of the music. Furthermore, the possibility of dynamic variation makes the beats produced more lively and authentic, as accentuation, ghost notes and crescendo/decrescendo figures become possible.

Figure 1: The Drum Pants (force sensors on the thighs, switches, potentiometer and diode lights around the waist, the sensor interface with USB cable, and a cabled force sensor in one shoe).

The placements of the force sensors were decided after studying how people "play" drums on their pants, together with a wish to adopt some features of an acoustic drum set, i.e. placing the sensors so that it is possible to use both hands and one foot at the same time, as for example hi-hat, snare drum and bass drum. Furthermore, it has been a priority to place the sensors in a way which makes them easy to tap in an upright standing position, as this gives the performer a great deal of physical freedom and mobility while playing. The natural area of tapping on the pants in an upright standing position is shown in Figure 2.

Figure 2: Natural area of tapping on pants in an upright standing position.

Within this natural area of playing there are three force sensors placed on each thigh, which makes it possible to switch quickly between the three sensors with one hand, while at the same time spacing them far enough apart to avoid unintended sensor activation from imprecise tapping. As can be seen in Figure 1, the sensors are placed asymmetrically on the two sides, since it felt more natural for the right-handed test person to have one sensor slightly more on the outside of the right thigh than on the left. The result is an instrument which feels natural to play and requires very little practice to get used to, since it is based on the simple idea of hitting your thighs and waist.

The digital switches and the potentiometer around the waist serve as control buttons for different sampler functions, for example volume, effects, change of instrument presets and recording. The design of the digital switches makes it possible to activate them by tapping, which is practical for fast and accurate control, for example when starting and stopping a recording to create a loop. In general, the placement around the waist makes it easy to control the sampler functions while playing. In addition, eight diode lights are placed right under the waist as a visual aid when controlling the sampler functions.
2.3 Dependence on the Computer
All the sensors are connected with wires to a USB sensor interface placed in the hip pocket. The design of The Drum Pants, with its various sampler functions, makes the performer independent of the computer while performing on stage or while programming beats and rhythms in the studio. On stage this means that the performer can focus on being in contact with the audience or fellow musicians rather than having to concentrate on the computer. In the studio, the integrated drum interface and sampler design helps the performer stay focused in a more fluent programming process. In addition, the overall mobile and flexible design brings a new dimension of physical activity into the discourse of drum interfaces, allowing the performer to walk, jump, dance, etc. while playing.

3 Implementation
The implementation process involved the hardware design and the development of software to use the pants.

3.1 Sensor Interface
The USB interface used is a Phidgets USB interface kit (http://www.phidgets.com/) with eight analog inputs (the seven force sensors and the potentiometer), eight digital inputs (the switches around the waist) and eight digital outputs (the diode lights). On the analog inputs the interface kit provides a standard 0-5 V range. The interface is connected directly to the computer with a USB cable, and since it draws the necessary power from the USB cable, a separate power supply is not needed. The length of the USB cable is, however, a clear limitation in a performance situation with this prototype, although the range can be extended with powered USB hubs.

3.2 Sensors
The type of force sensor used for the pants is the Flexiforce sensor (http://www.tekscan.com/flexiforce.html), which acts as a force-sensing resistor in an electrical circuit. The Flexiforce sensor is a very thin and flexible strip and is easily incorporated into a circuit (Figure 3). The two outer pins of the connector are active and the center pin is inactive. The sensing area is only 9.53 mm in diameter, so it has been necessary to glue a flat cone of thick cardboard onto the sensing area in order to increase the hitting area (the black circles on the pants in Figure 1). The force range of the sensors placed on the pants is 0-4.4 N, whereas the range of the sensor placed under the shoe is 0-110 N to allow for greater force. Due to the thin (0.208 mm) and flexible design of the Flexiforce sensor, a very close contact between sensor and body is possible.

Figure 3: The Flexiforce sensor. The sensing area, to the right, is 9.53 mm in diameter.

The potentiometer and digital switches placed around the waist of The Drum Pants are standard components, even though the design of the switches is carefully chosen so that they can be activated by tapping rather than switching.

3.3 Pants
The potentiometer and digital switches are positioned on the outside of the pants by sewing, while the force sensors and all wires are fastened with strong tape on the inside of the pants. In a future version it would be desirable to use a more durable solution than tape, but for this prototype it made the implementation process more flexible.

3.4 Programming
The software is developed in Max/MSP, using an external object from Phidgets to read values from the sensor interface. Figure 4 shows an overview of the data process from sensor to sound.

Figure 4: Schematic overview of the data process from sensor to sound: The Drum Pants with sensors, the sensor interface, then filtering, scaling and segmentation, logic/control, mapping and a synthesis module in Max/MSP, producing sound.

Figure 5: Main patch of The Drum Pants software.
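For illustration, the following sketch shows how the 0-5 V readings of the analog inputs could be turned into forces and into the 0-1 dynamic values used later in the patch; a linear relation between voltage and force over each sensor's calibrated range is assumed here and is not a detail given in the paper.

THIGH_RANGE_N = 4.4    # Flexiforce range on the pants (0-4.4 N)
SHOE_RANGE_N = 110.0   # Flexiforce range under the shoe (0-110 N)

def reading_to_force(volts, full_scale_newton, supply_volts=5.0):
    """Assumed linear mapping from a 0-5 V analog reading to force in newtons."""
    return (volts / supply_volts) * full_scale_newton

def force_to_dynamic(force, full_scale_newton):
    """Normalize the force to the 0-1 range used by the rest of the patch."""
    return max(0.0, min(1.0, force / full_scale_newton))

# e.g. a firm thigh tap read as 3.1 V:
# reading_to_force(3.1, THIGH_RANGE_N) -> about 2.7 N
# force_to_dynamic(2.7, THIGH_RANGE_N) -> about 0.62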
The incoming data are run through a simple FIR filter to smooth them, and a threshold function is used to prevent unwanted attacks. In the patch (Figure 5), the filtered, scaled and segmented sensor data are divided into two signals: one for triggering playback of a sample, and one for controlling the volume of the played sample. This is the basis of the real-time playing mode, where different sound-bank presets can be selected with a sound-preset multiswitch. In addition to the real-time playing mode there are three different recording modes: multilayered loop recording (MLR), full drum set loop recording (FDLR) and master recording (MR).

The sound module used for the prototype is a simple sampler made in Max/MSP (Figure 5). The focus has been on creating a system which can easily be controlled from only a few centrally placed buttons. This was achieved with a "multiswitch" patch, which makes it possible to select up to eight different channels with only one digital switch. The diode lights on the pants give visual feedback when controlling the multiswitch, as the number of lights turned on reveals which channel is chosen.

With MLR, a rhythmic figure can be built up by recording one sensor at a time in separate tracks. The sensor to record is chosen with the multiswitch, which also activates the MLR mode. The first recording made is looped automatically and serves as the master synchronization reference for all later recordings. FDLR allows the performer to record several sensors simultaneously into one loop, for example bass drum, snare drum and hi-hat; this recording mode is activated automatically when choosing sound-bank preset three or four. With MR the main audio output is recorded, i.e. both loops and real-time playing; a separate switch is used for this purpose, and the MR can easily be saved in The Drum Pants start window.

3.5 Mapping
Although we have only tested the pants with a sampler module, all the patches have been designed so that the pants can easily be used to control any type of sound module, for example a physical model or other synthesis. Since all values are scaled to a 0-1 range, they can easily be output as Open Sound Control (OSC) messages [5], or be rescaled and output as MIDI messages.

When choosing a channel with the multiswitch, it is possible to add two different types of effects to the sample: delay and manipulation of the sample speed. The delay time and sample speed are then controllable with the potentiometer, as is the main output volume. A switch for default settings, i.e. clearing all recordings and added effects, makes it easy to start all over again. All functions of the software are controllable from the pants. The latency when tapping the sensors in real-time playing mode is practically unnoticeable, which is essential when playing rhythms. We experience some latency when adding many tracks and effects, but this could be improved by using a dedicated sampler program rather than our own sampler.
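The filtering, scaling and segmentation stage can be sketched in a few lines of Python (the actual implementation is a Max/MSP patch, and the filter length and threshold below are assumptions): a moving-average FIR filter smooths the incoming values, and a threshold-based segmenter turns each excursion above the threshold into a trigger whose peak value controls the playback volume.

def smooth(samples, taps=4):
    """Simple moving-average FIR filter over the raw 0-1 sensor values."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def segment(values, threshold=0.08):
    """Turn the smoothed signal into (start_index, peak) triggers.
    A new hit is reported when the signal rises above the threshold;
    the peak of that excursion controls the playback volume."""
    hits, above, peak, start = [], False, 0.0, 0
    for i, v in enumerate(values):
        if v >= threshold:
            if not above:
                above, peak, start = True, v, i
            peak = max(peak, v)
        elif above:
            hits.append((start, peak))
            above = False
    if above:
        hits.append((start, peak))
    return hits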
4 Conclusion
In this paper we have presented The Drum Pants, a wearable interface built around the idea that drummers like to "play" drums on their own body. The interface offers a new drum playing experience and is inspiring when creating beats and grooves. The fact that the performer also feels every tap on his or her own body gives the playing an enhanced intimacy between rhythm and body compared to other electronic interfaces. In addition, the sampler functions, which can be controlled from switches placed at the waistline, make it possible to easily create interesting multilayered rhythmic figures.

Besides its use in performance, we also think this instrument can be interesting in the field of music education. Playing The Drum Pants offers an intimate connection between rhythm and body and could be used to study rhythm and its relation to human motor functions. In the future we would be interested in studying how people develop a sense of rhythm by tapping on themselves: does the combination of sound, rhythm and intimate bodily experience ease the learning of human motor control? In continuation of this, it is interesting to imagine a model of The Drum Pants designed for children as a pedagogical toy to use both at home and in educational institutions.

Future development includes a sensor clip-on system, so that it is easier to adjust the sensor positions to match different performers. This could also be the solution for creating pants that can be washed, which is not possible with the prototype. We will in addition be looking into improving the control of the sampler functions, with some kind of matrix controller, operated for example from a glove. Finally, we will also be looking at possibilities for creating a fully embedded system based on a mini-computer. Imagine being able to bring a full instrument with you inside your clothes: just plug in a pair of headphones and you have the ultimate mobile instrument, allowing you to practice and play anywhere, anytime.

5 References
[1] Godøy, R. I., E. Haga and A. R. Jensenius. Playing "Air Instruments": Mimicry of Sound-Producing Gestures by Novices and Experts. Paper presented at the 6th International Gesture Workshop, Vannes, France, 18-21 May, 2005.
[2] Paradiso, J. and E. Hu. Expressive Footwear for Computer-Augmented Dance Performance. In Proceedings of the First International Symposium on Wearable Computers, Cambridge, MA. IEEE Computer Society Press, 1997, 165-166.
[3] Stenslie, S. EROTOGOD: The Synesthetic Ecstasy. http://www.stenslie.net/stahl/projects/erotogod/index.html
[4] Wallin, R. Yó (1994) for controller suit and computer. http://www.notam02.no/~rolfwa/controlsuit.html
[5] Wright, M. and A. Freed. Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. In Proceedings of the International Computer Music Conference, Thessaloniki, Greece, 1997, 101-104.

Backseat Playground
John Bichard*, Liselott Brunnberg*, Marco Combetto#, Anton Gustafsson* and Oskar Juhlin*
* Interactive Institute, P O Box 24 081, SE 104 50 Stockholm, {john.bichard, liselott, anton.gustafsson, oskarj}@tii.se, http://www.tii.se/mobility
# Microsoft Research Cambridge UK, Roger Needham Building, 7 J J Thomson Ave, Cambridge CB3 0FB, UK, [email protected], http://research.microsoft.com/ero/

Abstract. We have implemented a conceptual software framework and a story-based game that facilitates the generation of rich and vivid narratives in vast geographical areas. An important design challenge in the emergent research area of pervasive gaming is to provide believable environments where game content is matched to the landscape in an evocative and persuasive way. More specifically, our game is designed to generate such an environment tailored to a journey as experienced from the backseat of a car.
Therefore, it continuously references common geographical objects in the vicinity, such as houses, forests and churches, within the story, and it provides a sequential narrative that fits with the drive. Additionally, it is important that the player can combine interaction with the devices with as much visual focus as possible on the surrounding landscape, in order to generate a coherent experience. The implemented user interaction is audio-centric, where most of the game and narrative features are presented as sounds. Additional interaction through movement is integrated with the audio in the form of a directional microphone.
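As a rough illustration of this kind of geo-referencing (the data structure and function below are our own sketch, not part of the Backseat Playground framework, and plain Euclidean distance on coordinates is a simplification), the narration could pick the nearest landmark of a requested type from the car's current position.

import math

# Hypothetical landmark list; in the real game this would come from map data.
LANDMARKS = [
    {"kind": "church", "name": "the old church", "pos": (59.33, 18.06)},
    {"kind": "forest", "name": "a dark forest",  "pos": (59.35, 18.10)},
    {"kind": "house",  "name": "a red house",    "pos": (59.34, 18.02)},
]

def nearest_landmark(car_pos, kind):
    """Return the closest landmark of the requested kind, so the narrator
    can weave a nearby object into the current story event."""
    candidates = [l for l in LANDMARKS if l["kind"] == kind]
    if not candidates:
        return None
    return min(candidates, key=lambda l: math.dist(car_pos, l["pos"]))

# e.g. nearest_landmark((59.34, 18.05), "church")["name"] -> "the old church"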