Audio Mostly 2006
PROCEEDINGS
A CONFERENCE ON SOUND IN GAMES
OCTOBER 11-12
Proceedings of the
Audio Mostly Conference
- a Conference on Sound in Games
October 11 - 12, 2006
Piteå, Sweden
In collaboration with:
Contents
Music Videogames: the inception, progression and future of the music videogame (p. 5)
Lyall Williams

Computer Game Audio: The Unappreciated Scholar of the Half-Life Generation (p. 9)
Stuart Cunningham

Authoring of 3D virtual auditory Environments (p. 15)
Niklas Roeber, Eva C. Deutschmann and Maic Masuch

From Heartland Values to Killing Prostitutes: An Overview of Sound in the Video Game Grand Theft Auto Liberty City Stories (p. 22)
Juan M. Garcia

Physically based sonic interaction synthesis for computer games (p. 26)
Stefania Serafin and Rolf Nordahl

The Composition-Instrument: musical emergence and interaction (p. 31)
Norbert Herber

Investigating the effects of music on emotions in games (p. 37)
Katarina Kiegler and David C Moffat

REMUPP – a tool for investigating musical narrative functions (p. 42)
Johnny Wingstedt

On the Functional Aspects of Computer Game Audio (p. 48)
Kristine Joergensen

Composition and Arrangement Techniques for Music in Interactive Immersive Environments (p. 53)
Axel Berndt, Knut Hartmann, Niklas Roeber and Maic Masuch

The drum pants (p. 60)
Soeren Holme Hansen and Alexander Refsum Jensenius

Backseat Playground (p. 64)
John Bichard, Liselott Brunnberg, Marco Combetto, Anton Gustafsson and Oskar Juhlin
Music videogames: the inception, progression and future of the music
videogame
Lyall Williams
Keele University, UK
[email protected]
Abstract. Over the last 10 years, the genre of the music videogame (or rhythm game) has become a staple of home videogame
consoles, and has developed a strong presence in the arcade. In these games, the player is actively involved in the creation or
playback of music or rhythm. The purpose of this poster is to firstly describe the genre of the music videogame, contrasting it with
that of the audio game (in which the player needs little or no visual feedback to play, originally developed for blind or visually
impaired players). I shall consider the origins and early titles, and then outline some important contemporary works, paying specific
attention to the sonic and visual aesthetics developed within each title, and how they contribute to the genre as a whole. Among the
titles I will consider are “Parappa the Rapper”, widely considered to be the first true music videogame; the phenomenon of the
“Bemani” series, which brought music games to the arcade and introduced the first custom-designed controller inputs; and “Donkey
Konga”, which came with a pair of bongos for Nintendo’s home console, the Gamecube. I will also consider the potential for high
levels of interactivity between the player and the music/rhythm in these titles, whilst noting how most current titles do not develop
this potential as fully as they might; despite this overall shortcoming, I shall examine one game, Rez, that challenges the music
videogame formula and serves as an example of how multiple levels of audiovisual and physical interactivity can work together to
create an immersive and original game experience.
1 What is a music videogame?

Music videogames are audiovisual games 1 in which the player is actively involved in the creation or playback of music or rhythm (music videogames are also sometimes known as rhythm games). This usually occurs in the form of keypress instructions scrolling across the screen which the player must respond to with appropriate timing. Most music videogames relay instructions in one of two manners: either a constant stream of instructions that must be pressed in order, or in turn-based volleys of instructions. It could be deduced from this that all that constitutes a music videogame is following rhythmic instructions; 2 not all music videogames follow this strict type of gameplay, however, as I shall discuss later.

2 Origins and early titles

Music videogames should not be confused with audio games, a genre which started out as games for the blind. 3 In contrast to music videogames, audio games have little or no visual information, and rely entirely on gameplay that connects sound with control input. They are often given away as freeware and are usually quite simplistic, although some of the more ambitious projects have included a game modification to let the player play id software’s 4 fast-paced first-person shooter Quake using audio alone (through an artificial-voice instruction system and various 3D audio positioning effects), 5 and an audio version of block-puzzle game Tetris with a complex system of notes signifying moving blocks. 6 Some audio games overlap with music videogames in their gameplay - Sonic Match (for PC), 7 for instance, plays distinct sounds and matches them with arrows on the keyboard, then tests the player by playing back one of these sounds and awaiting the correct keypress. If the player presses the right button another sound is played which must also be matched with a correct keypress, and gradually as the game progresses, the length of time the player has to respond to each new sound decreases. This turn-based sound gameplay bears similarities to turn-based music videogames, although it is much more simplistic than any recent commercial music videogame title.
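The turn-based matching loop described above can be sketched roughly as follows; this is an illustrative approximation rather than Sonic Match's actual implementation, and the sound bank, play_sound and read_key helpers are hypothetical placeholders:

```python
import random
import time

# Hypothetical sound bank: each label maps to a distinct sound and an arrow key.
SOUNDS = {"left": "tone_a.wav", "right": "tone_b.wav", "up": "tone_c.wav", "down": "tone_d.wav"}

def play_sound(name):
    """Placeholder: a real game would play SOUNDS[name] through an audio API."""
    print(f"[playing {SOUNDS[name]}]")

def read_key(timeout):
    """Placeholder: return the arrow key pressed within `timeout` seconds, or None."""
    answer = input(f"arrow key ({timeout:.1f}s to answer)? ").strip().lower()
    return answer or None

def sonic_match_round(time_limit):
    """Play one cued sound and await the matching keypress within the time limit."""
    target = random.choice(list(SOUNDS))
    play_sound(target)
    start = time.monotonic()
    key = read_key(time_limit)
    return key == target and (time.monotonic() - start) <= time_limit

def game_loop(start_limit=3.0, shrink=0.9):
    """Keep playing rounds, shrinking the response window after each success."""
    time_limit, score = start_limit, 0
    while sonic_match_round(time_limit):
        score += 1
        time_limit *= shrink  # less time to respond as the game progresses
    print(f"Game over, score: {score}")

if __name__ == "__main__":
    game_loop()
```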
The origin of the music videogame is difficult to pinpoint.
Perhaps the earliest game to be oriented entirely around music
was Grover’s Music Maker 8 from 1983 (for Atari 2600), which
features Grover from the Muppet Show dancing to primitive
renditions of popular songs like “Old Macdonald Had a Farm”.
There was no interaction on the part of the player, however, and
thus cannot be considered a music game in the sense I have
described.
1. Here I mean games which require both audio and video to be playable, or games which largely lose meaning when audio or video are not present. By this definition many games are audiovisual, and almost all recent games are audiovisual.
2. Wolf tightly defines music videogames as “games in which gameplay requires players to keep time with a musical rhythm”, p130. I dislike this description as it implies no creative input on the part of the player, which I feel certainly should be a part of music games, even if it isn’t yet in most cases.
3. http://www.audiogames.net/
4. “id” is intentionally left lower case.
5. http://www.agrip.org.uk/FrontPage
6. http://inspiredcode.net/Metris.htm
7. http://www.bscgames.com/sonicmatch.asp
8. The title was never released, and only exists in prototype form.

While not technically a “video” game, the late 70’s electronic toy “Simon” had a large round plastic base quartered into 4 brightly coloured buttons which the game would light up in sequences of gradually increasing length; the player needed to match the sequence each time to progress. The “music” in this case was varying beeps, which were very basic owing to the early sound technology used. This style of game transferred over to some computer games, in particular Break Dance for the Commodore 64, in which the player had to follow the increasingly lengthy dance moves of an on-screen character by
pressing the appropriate keys, in a turn-based fashion. A similar
gameplay feature can be found in some parts of Pinocchio for
the Megadrive and SNES (1995/6 respectively). So, musically
influenced gameplay is nothing new.
The game that is usually credited with being the first true music videogame, however, is Parappa the Rapper. In the next section I shall consider the sonic and visual aesthetics of this game and several other important titles in the music videogame genre.

3 Sonic and visual aesthetics in music videogames

Parappa the Rapper, released for the Playstation in 1996, was designed by Masaya Matsuura (who had previously been in a band known for their progressive electronic music). The titular character is a talking, paper-thin cartoon dog that is taught to rap by 8 bizarre sensei, including a kung-fu master onion, in order to win the love of a sunflower. The gameplay followed a fairly simple theme – each sensei “raps” a line, represented by various controller buttons (square, triangle, circle, cross, left and right) indicated at the top of the screen, and you must repeat their commands by pressing the appropriate buttons; as play progresses the button combinations get faster and more complicated. Failure to correctly respond to the instructions results not only in a poor score, but in the background scenery falling apart, and distinct changes in music to indicate to the player their failure. If the score drops too low, the player must repeat the level from the start.

The visual aesthetic in Parappa was unlike any seen in a videogame previously, and borrowed from Western Nickelodeon-style cartoons as much as Japanese anime (one of the key visual designers, Rodney Greenblat, was an American artist). By comparison, the music is varied, but fairly unchallenging: it certainly isn’t what most people would consider “rap” music in the West. Like the visuals, though, the music is bright and comedic. The game was a huge success in Japan, despite (or perhaps as a result of) the lyrics of the songs being in English with Japanese subtitles; English-language releases of the game had the same audio, and though it’s unclear who wrote the lyrics, calling them unusual would be charitable – at times they’re utterly nonsensical. This may well be intentional, rather than a mere mishap of translation, since it suits the odd visual aesthetic. The game did not sell as well in the West, and I would suggest this is due to it confusing a large portion of the “young adult” audience that Sony had targeted with the Playstation: a rap game with no rappers or even stereotypical rap music in it, which looks like a children’s cartoon. 9

Parappa the Rapper is an important title in videogames as a whole, as it represents a postmodernist sampling of varied cultural aesthetics (both with the game’s multi-cultural visual aesthetic, and its varied musical selection), and a refusal to aim for “high-brow” serious entertainment; it also fits into the postmodernist paradigm by clearly showing self-awareness and acceptance of its nature as a videogame, by eschewing any sense of realism in favour of the absurd. It was an important early title for the Playstation, setting it apart from the previous generation of consoles, in graphics, sound, gameplay, and storyline. 10

9. In contrast, cartoon animation and comic books are popular with all ages in Japan, with Manga (comics) accounting for 40% of all printed books and magazines (http://library.thinkquest.org/C0115441/article1.htm)
10. Having a simplistic love story as a videogame plot is particularly rare, as noted by Kohler, p153-4
11. The “b” is deliberately left in lower case: http://en.wikipedia.org/wiki/beatmania
12. http://en.wikipedia.org/wiki/Bemani
13. http://en.wikipedia.org/wiki/Beatmania_IIDX
14. Kohler, p155
15. http://www.konami.co.jp/am/ddr/

A sequel was released to Parappa the Rapper, “UmJammer Lammy” (sic), starring one of the cast of the previous game (the titular Lammy), and in this game the focus was shifted from rapping to guitar riffing; most gameplay aspects of the game were identical, however, and by this point several other music videogames were becoming popular, primarily in the arcades: beatmania 11 and its various spin-offs by developer Konami (the series and many related games are generally referred to as “Bemani”, after the division within videogame developer Konami that creates them) 12. All beatmania arcade games feature custom controllers, with a spinning turntable for “scratching” and several black and white piano-like keys; commands descend from the top of the screen in a constant stream and must be pressed in the correct order at the correct time. The keys pressed by the player are displayed at the bottom of the display. Correctly timing the controls so that the relevant keypress coincides with the relevant note hitting the bottom bar is not just an important aspect of gameplay (greater accuracy results in a higher score), but also a crucial aspect of musical delivery – when the right key is pressed at the right moment, the background music will be added to with extra instruments, samples, and effects.

Among the first music games to feature a custom controller, beatmania was also responsible for bringing music games to the arcades, where they remain hugely popular to this day in Japan, inspiring people to develop breathtaking levels of skill. Unlike the happy, bright pop music found in Parappa and UmJammer Lammy, the beatmania series tends to feature serious “real” music, with commercial dance music (trance, rave etc) being the most popular. That said, the series is so lengthy (the game in the video above, beatmania IIDX, has up to 24 “mixes” or variations of the musical selection) 13 that many types of music have now been included. One reason the games have stayed popular, and in turn been so prolific, is the ease with which the arcade machine can be rejuvenated if takings drop: simply install a new “mix” with a new selection of songs, and gamers have renewed interest in play. 14

Visually, the beatmania games also clearly aim to be more mature than Parappa or UmJammer Lammy, typically featuring swirling 3D patterns in time with the music. These sections of video are superfluous to gameplay and offer no explicit benefit to the player, but contribute to the overall beatmania “experience” (Bemani games are often loathed by their detractors as much as they are loved by their fans, due to the extremely noisy and dominating nature of their arcade cabinets). In the West, there are far fewer arcades than in Japan, and it is perhaps as a result of this that beatmania has not attained a similar level of popularity. One Bemani offshoot that has eventually achieved awareness, however, is Dance Dance Revolution (or DDR). 15 DDR differs from beatmania in a few respects: the player’s feet on sensor pads are the control method,
rather than hands on keys, and there are also only 4 directions
(up, down, left, and right) for the player to press, as opposed to
the 7 keys + turntable that some beatmania games include. DDR
machines almost always have floor pads set up for two players
(taking advantage of this, some single-player DDR routines
require the player to dance across from the first player position
to the second player’s floor pads). DDR is also more forgiving
of player errors than beatmania games.
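The timing-window judgement that underlies beatmania- and DDR-style scoring, where greater accuracy yields a higher score, can be illustrated with a small sketch; the window sizes and point values below are invented for the example rather than taken from any actual game:

```python
# Illustrative timing-window judgement for a beatmania/DDR-style note chart.
# Window sizes and point values are made up; real games tune these per difficulty.
JUDGEMENTS = [           # (maximum offset in seconds, name, points)
    (0.030, "PERFECT", 300),
    (0.080, "GREAT", 200),
    (0.150, "GOOD", 100),
]
MISS_WINDOW = 0.250      # beyond this the press is simply ignored

def judge_hit(note_time, press_time):
    """Return (judgement, points) for a keypress aimed at a note due at note_time."""
    offset = abs(press_time - note_time)
    for window, name, points in JUDGEMENTS:
        if offset <= window:
            return name, points
    return ("MISS", 0) if offset <= MISS_WINDOW else (None, 0)

# Example: a note due on the beat at 12.500 s, pressed at 12.462 s -> ('GREAT', 200).
print(judge_hit(12.500, 12.462))
```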
Western DDR machines typically feature pop-trance, dance
anthems etc – reasonably similar in genre to Japanese DDR
machines, but often including popular Western dance music
tracks. Visually, DDR and beatmania machines look similar,
with large amounts of neon and flashing lights. DDR gained
Western media recognition in 2004, when it became clear that
people who played the game regularly were losing considerable
amounts of weight, 16 and the game has since been considered by
several American schools for fitness classes. 17 Demand for
home versions of both beatmania and DDR has obviously been
high, and Konami and other hardware manufacturers have been
happy to provide (often very costly) custom controllers for the
home, 18 and release many home versions of Bemani games. 19
Konami are not alone in releasing custom home controllers for music videogames. In 2004, Nintendo released Donkey Konga for Gamecube, a music title that is intended to be controlled with a pair of specially made bongos. The gameplay is somewhat similar to beatmania, although all commands travel down a single line one at a time, and there are only four commands: left bongo (yellow), right bongo (red), both bongos together (purple), and clap (blue spark). 20 As a result the game is easier to learn than beatmania. As the player plays the bongos, drum samples are added to the soundtrack and various animations occur on screen.

Donkey Konga is a far less serious affair than the Bemani games I have considered above. The game is amusing to watch and play, with colourful graphics and a light-hearted selection of music. Unlike Parappa the Rapper, the songs included in Donkey Konga are not only different in Japan and the West, but also between the USA and Europe. 21 The majority of music in the Japanese game is, unsurprisingly, Japanese music. The differences in musical choice between the USA and Europe games, however, are more interesting: the “European” release contains an English version of Nena’s “99 Red Balloons” (originally released in German), and 2 Latin American tracks, but nothing that’s clearly from mainland Europe; instead, several British bands are featured (Supergrass, Queen, Jamiroquai, among others) as well as many American tracks. This British/American cultural bias is likely to be Nintendo’s way of avoiding the costs involved with producing relevant localisations across Europe. The US selection includes a number of children’s songs (Happy Birthday to You, Itsy Bitsy Spider, She’ll Be Coming ‘Round the Mountain etc), possibly representing a difference in the perceived demographic of US and European Gamecube owners by Nintendo.

The European version of Donkey Konga also includes more covers of classic Nintendo theme tunes (Super Mario Bros theme, Legend of Zelda theme, Donkey Kong Country theme etc, all done in a “conga” style). It could be interpreted from this that Nintendo consider European Gamecube owners more likely to be familiar with their back catalogue of games; however, this seems unlikely, given that the majority of the titles in question were released during the 80s and early 90s, when Britain and Europe were Sega strongholds. 22 In light of this, I think it is more likely that these tracks were not included in the US release to make way for the children’s songs, or other more US-compatible songs. 23 Interestingly, all three releases have the same two classical tracks: Brahms’ Hungarian Dance no.5 in G Minor, and Mozart’s Turkish March.

16. http://www.getupmove.com/media/cnn.pdf
17. http://news.bbc.co.uk/2/hi/technology/4653434.stm
18. Such as this $479 DDR dance pad: http://www.amazon.com/gp/product/9756097027/qid=1147011582/sr=1-1/ref=sr_1_1/002-77373808122452?s=videogames&v=glance&n=468642
19. There are 12 home releases of DDR alone: http://en.wikipedia.org/wiki/Dance_dance_revolution#Home_releases
20. A small microphone is included in the bongo unit.
21. Lists can be found at http://uk.cube.ign.com/articles/455/455683p1.html for the Japanese version, and at http://en.wikipedia.org/wiki/Donkey_Konga for the US and Europe versions.
22. http://www.sega-16.com/Genesis-%20A%20New%20Beginning.htm
23. Such as the American folk song “I’ve Been Working on the Railroad” or Motown tracks like Diana Ross and The Supremes’ “You Can’t Hurry Love”.

4 Criticism of the genre

The games I have looked at above constitute a fair cross-section of the music videogame genre, from its inception to some of the more recent titles. I do not have space to look at the many other important titles in detail, nor would such an investigation probably yield much more productive results: I have deliberately chosen as wide a musical cross-section as possible in the titles I have discussed, and the gameplay of most music titles is very similar. This latter point brings me to perhaps my biggest criticism of the genre – the gameplay is often more or less identical. Many music videogames feature striking visuals paired usually with a gimmicky controller of some kind, and it could be argued that these are trying to make up for the fact that, behind all the gloss, the essential gameplay isn’t very far removed from the Simon electronic toys of the late 70s. The level of interaction between the player and the music, too, is sometimes lacking, with the player often merely keeping up with the music rather than actively being involved in its creation. While the music and sound in Parappa fluctuates depending on player ability, in Donkey Konga the game barely reacts to the player’s actions; music videogames would benefit from an increased level of player involvement in the music.

One game that has tried to experiment beyond the usual confines of “strings of instructions” gameplay, and made significant progress in the way the player interacts with the music, is Rez, originally released for the Sega Dreamcast. The gameplay of Rez is radically different to all the music videogames I have described above: it belongs to a group of games called “on-rails shooters”, where the player usually cannot directly move their character but can move a crosshair of some kind in order to target oncoming enemies and defend themselves. What makes Rez a music videogame is that every action (locking on, firing, explosions) results in distinct beats which the game places (as best it can) in time with the music. Skilled players can learn to play the game in time with its music, and in this way playing Rez (like many other music videogames) can often be as much a public performance as a game in its own right. The difference between Rez and many other music games is that it is a pleasure not just to watch, but also to listen to someone who knows how to play Rez really well, and in this respect Rez comes far closer
to realising the potential of a music videogame – that of musical
creation, rather than repetition.
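The quantisation idea described here, where player-triggered sounds are deferred so that they land on the musical grid as best the game can manage, might look roughly like this; the tempo, subdivision and schedule_sound call are assumptions for the sake of illustration:

```python
# Sketch of Rez-style quantisation: player actions trigger sounds, but each sound is
# deferred to the next subdivision of the beat so the result stays in time.
BPM = 128.0
SUBDIVISION = 4                      # sixteenth notes
STEP = 60.0 / BPM / SUBDIVISION      # seconds per grid step

def quantise(event_time, song_start):
    """Snap an action timestamp onto the beat grid of the running track."""
    elapsed = event_time - song_start
    snapped = song_start + max(round(elapsed / STEP), 0) * STEP
    # never schedule into the past; push late events onto the following step
    return snapped if snapped >= event_time else snapped + STEP

def schedule_sound(action, when):
    """Placeholder for an audio-engine call that queues a one-shot sample."""
    print(f"{action} scheduled at t={when:.3f}s")

def on_player_action(action, now, song_start=0.0):
    schedule_sound(action, quantise(now, song_start))

# Example: a lock-on fired between grid steps is pushed onto the next step.
on_player_action("lock_on", now=10.07)
```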
Rez has a fairly unique visual style. There have been other
games which provide vaguely similar graphical effects for game
consoles, although these have generally been either user
controllable visualisations (like Baby Universe for the
Playstation, which allows players to insert their own audio CDs
and play around with visual controls on-screen, and Jeff
Minter’s recent Neon Virtual Light Machine for the Xbox 360),
or have not permitted the player to interact directly with the
sound (such as N20, and Internal Section, both for Playstation).
The soundtrack itself is unusual for a videogame, if not
particularly ground-breaking – the ambient trance tunes are well
executed, and are the product of several respectable artists, 24 but
they do not break much new sonic ground, and at times feel
sparse (perhaps necessarily so, to leave room for the player to
create their own sounds over the top). Instead, it is the level of
interaction between the player, the graphics, and the sound that
makes Rez such a remarkable game - pulses of sound are
represented with oscilloscope-esque visual pulses in the polygon
formations onscreen, and player controls directly result in music
and rhythm.
Despite this innovation, Rez falls under my other concern about
music videogames, which is less of a criticism of the genre
itself, and more a problem for developers: players often have
quite specific music preferences, and more problematically,
there are specific musical styles which they actively dislike. In
designing music videogames, developers are confronted with
several options: select a specific genre and stick to it, resulting
in a smaller user base with a high level of interest (the fanatics
of beatmania and DDR are testament to this); create pop-influenced “comfortable” music, and risk alienating people who
dislike such music (Parappa’s bouncy, happy music was not
universally appreciated); or try and shoehorn every type of genre
in for good measure. The latter approach seems like the most
obvious, but it will only work with certain types of music
videogame: I got quite aggravated at having to play bongos in
time with Take That songs in order to make progress in Donkey
Konga. I don’t think there is an easy answer to this problem; it
must be tackled on a per-game basis – clearly, though, there is
room in the market for all kinds of music videogames.
5 Conclusion
In this paper, I have examined a few key music videogame titles,
all of which exhibit great visual and sonic flair; however, since
their true beginnings in the mid 90s the gameplay formula has
become rigid and little evolution has taken place. To reach their
full potential, music videogames must be developed to include
greater interaction on the part of the player, and new modes of
play must be designed to accommodate this.
References
Kohler, C. Power Up: How Japanese Video Games Gave the
World an Extra Life. Bradygames, 2004
Wolf, J.P. The Medium of the Videogame. University of Texas
Press, 2001
24. Such as Adam Freelander and Coldcut: http://www.sonicteam.com/rez/e/sounds/index.html
Computer Game Audio:
The Unappreciated Scholar of the Half-Life Generation
Stuart Cunningham, Vic Grout & Richard Hebblewhite
Centre for Applied Internet Research (CAIR), University of Wales
NEWI, Plas Coch Campus, Mold Road, Wrexham, LL11 2AW, North Wales, UK
{s.cunningham | v.grout | r.hebblewhite}@newi.ac.uk
Abstract. Audio has been present in computer games from the original sinusoidal beeps in Pong to the Grand Theft Auto
soundtracks recorded by world-famous musical artists. Far from being an overemphasis, the word “soundtrack” is highly appropriate
to the role that audio in games has played up until now. It sits comfortably, and as an equal, alongside Computer Graphics, Artificial
Intelligence, online multiplayer gaming and new interactive environments as one of the main driving forces in both technology
development and the acceptance of gaming as a core social activity.
In this paper we provide a historic synopsis of the integration of audio in games and attempt to establish if the auditory field has
advanced the diversity of games and driven the market to the same extent as its visual counterpart - computer graphics. From this
perspective, we discuss possible reasons for gaming trends and propose how a new generation of computer games could be driven by
enhanced aural stimulation and/or excitement, the potential for which has not yet been realised. In particular, we consider how
developments in soundtracks and other audio material, along with innovative interfaces, can make games and gaming more
accessible to those with various disabilities, in particular, limited vision.
1 Introduction

All but the earliest, most basic, of computer games have contained some element of sound and audio. The complexity of in-game audio and music has grown at roughly the same speed as the field of computer graphics and, as games have developed in these areas, so has the game audio. To this end, soundtracks in games are coveted by international recording artists and games music is now usually written by professional composers and musicians. Games are scored just like a big-budget Hollywood movie.

This started as the games did in the early 1970’s with games such as Pong and Space Invaders, which were supported by the inclusion of simple sounds using primitive synthesis techniques. Games would often have limited voices and a small range of actual sound effects. Early attempts were made at producing music to accompany games, which generally consisted of rather quantised rhythmic sequences being constructed from the available sets of tones. In the 1980’s the music and sound effects in games took steps towards what we now know as a game soundtrack with the development of FM and Wavetable synthesis and the emergence of the MIDI set of standards. Most notable in this decade were the Atari ST, Commodore 64 with the SID chip and the Nintendo Entertainment System (NES). The 1990’s saw the PC become a more dominant player in the games market with the release of the popular SoundBlaster series of sound cards and processors. Sampled audio was no longer a rarity. This trend has proliferated to the present day and sample-based, waveform audio is the standard method by which sound effects and music are achieved in games. Most recently, games have diversified by taking advantage of surround sound systems, the processors for which are now almost a standard option on most new computers. Games like Wing Commander III used well-known actors in-game, and recently the Grand Theft Auto series has seen big name recording artists being employed on the development of the soundtrack.

Although the support and inclusion of sound in games has diversified as faster processors, larger storage discs, and CD and DVD technology proliferated, the main focus to grab a player’s interest has traditionally always been the visuals and graphic effects. This is perhaps second only to the playability of a game. Still, one finds it much easier to be impressed quickly by the dazzling visuals of a product than by spending time interacting with it. Perhaps we are judging the book by its cover. To this end, games are traditionally graphically oriented, and it is often recognised that the audio factors in games tend to act as background fillers [1, 2]. However, we believe that the diversification of audio in games can lead to new and innovative products which can stimulate interest, and moreover, be useful to a variety of users, some of whom might not have full access to traditional games due to some impairment. This is generally recognised by other experts in the field [1, 2, 3, 4, 5, 6, 7, 8]. Therefore, investigation into this area is vital.

As part of our research, we undertook a pilot study of computer and video game players. This allowed us to determine particular gaming preferences and also to begin to assess to what extent audio in games is important to these users, and whether or not it influences them in deciding if they would purchase a game. Furthermore, we also investigate whether or not users would be interested in games which were developed to employ sound and audio as the principal method of interacting with, and controlling, the game environment.

2 The Importance of Audio in Games

As part of our study into the factors which influence gamers when choosing a new game, we attempted to ascertain how important the gamer considers the audio artefacts and the musical soundtrack. This was achieved by asking each subject to designate what the most important factor was when they are choosing a game to purchase.

Our aim is to show that users will usually rate other factors such as the playability and visuals of a game much higher than the sound and music, further demonstrating that the focus upon computer and video games tends to be in the areas of the graphical domain. The results of this are depicted in Figure 1.
Figure 1 - Most Important Game Feature (bar chart: Gamer Rating (%) for Playability, Sound, Interface, Graphics, Online Gaming and Other)

Not surprisingly, we found that the most important factor to users who intend to buy a game is the playability. The ratings for all of the other possible factors are negligible, although perhaps somewhat surprising is the fact that none of the users rated the sound or musical elements of a game to be in any way important to them when deciding upon a game to buy (QED!). In fact, the ability to play a game online with other users took favour over audio, which is an intriguing insight into the mind of the 21st Century games player. Users who chose the ‘Other’ category were prompted to provide an explanation of what that particular factor was. Some samples of the responses received here were: “Depth and Creativity”, “The whole package”, and two users stated that the story or scenario were the most important.

To get a deeper insight into what is important to users in a game, and in anticipation that playability would be the top priority, we then asked the same users what the next most important feature was in a game. This took the same form, and had the same categories as the initial question. The responses received are shown in Figure 2.

Figure 2 - Second Most Important Game Feature (bar chart: Gamer Rating (%) for Playability, Sound, Interface, Graphics, Online Gaming and Other)

The results in Figure 2 give us a more useful insight into the other factors which users look for in a game. This time we see that, as we expected, the graphics and visual stimulation presented by games was easily the most popular factor (QED!).

We also found it useful that a relatively high proportion of users believed that the interface of a game was also of high importance, since we discuss, in this paper, the potential for audio to be used as a way of interfacing to-and-from a game scenario. It may be that users would be more amenable to auditory interfaces driving their interest in a product, rather than the actual content of any music or sounds. As expected, the sounds present in a game were cited by a low percentage of those surveyed as being an influencing factor. The users who chose the ‘Other’ category on this occasion also stated that the factor important to them was the story of the game.

Finally, in order to determine that users have some interest in the audio or music contained in a game, we specifically queried whether or not the musical soundtrack in a game would influence them, given that this has been a particular growth area in the games industry. The result of this is shown in Figure 3.

Figure 3 - Interest in Game Soundtrack (bar chart: Gamer Rating (%) of Yes / No / Don't Know responses to "Does the soundtrack of a game make you more interested in playing or buying it?")

The results gained from this very specific question show that there is no distinct defining trend among the sampled users. This is reflective of the fact that playability and graphics are top priority with most gamers, and that the soundtrack appears to be of some interest, but perhaps would not have a heavy influence on a prospective game buyer. For this reason, it is probable that a lot of developers choose not to risk large sums of money on new ideas for audio technology, only to find that it only appeals to a small audience. The game development industry is already a risky business and most game development companies go bankrupt after releasing their first title.

As a passing note it was interesting to note the particular genre of games favoured by the users who were studied. Of the users we surveyed the majority favoured Role-Playing Games (RPGs), followed closely by those who preferred Shoot-‘em-up style entertainment. We believe a future study of interest could be to investigate whether the favoured game genre affects the particular factors which users specifically look for in games. For example, role-playing games have been traditionally much more limited in terms of their graphic and aural flamboyance, with much more emphasis being placed upon the game story, whilst action and adventure games are often much more visually stimulating.

It can be seen in these results that users do not place any particular emphasis on game audio driving them heavily in deciding to purchase a new game. As was expected, the main aspects users were interested in were the playability and
graphics of a game. However, the interface of a game did
become clear as another area which is important to gamers, and
since we are particularly focussed on diversifying the use of
audio to provide intuitive and novel interfaces, this generates
scope for further development into the area of deeper audio
integration.
3 Audio Focussed Gaming
The move towards audio gaming has been realised by the
development of software and interfaces for users with
disabilities which can be overcome by finding other interactive
domains with which they can engage. In order to facilitate useful
interaction with the game some form of multimodal or haptic
interface system is often employed. Indeed, there would be
many challenges associated with creating a purely audio-only
interface environment, especially for a game. This reiterates the
argument that pervasive interfaces are required. Beneficially, the
most entertaining and novel games will often involve some form
of physical interaction. A prime example of this kind of
supportive audio application comes in games which have been
seen as innovative in their multi-modal interfaces, and a prima
facie case would be that of the Dance Dance Revolution (DDR)
game, which is not totally audio focussed, but relies heavily on
the fusion of physical interfacing, sound, and a less intensive,
more supportive visual role for computer graphics.
In the current games market, a number of new innovations over the last few years have seen focus directly drawn to the integration of supportive audio. That is to say, games are becoming more reliant on audio and music, since they have an important role to play in supporting the user's interaction with the gaming environment.
Investigations by Targett and Fernström [6] outline the potential
effectiveness for purely audio-based gaming, and attempt to
evaluate the usefulness of such a system in the context of the
general games market, the effectiveness for users with
disabilities, and potential applications in the field of
complementary therapies. Crucially their work attempts to
ascertain if these games are actually entertaining - a key factor in
the success of any game, regardless of the novelty or
innovativeness of its interface.
Early work in the field of integrating a stronger audio presence
in software environments was undertaken by Lumberas and
Sánchez [3, 4]. Their work involved the creation of interactive
worlds and environments which utilised 3D audio to help
provide structure and support navigation within the virtual
environment [3]. This specific interaction is achieved through
the use of haptic interfaces. Additionally, stories which could be
accessed by blind children were developed, which involved
them in a virtual world, with which they could have a degree of
interaction [4]. This proved successful as a game, but was also
found to have therapeutic effects, which allowed the children to
better deal with everyday challenges outside of the game
environment.
The work by Lumberas and Sánchez has been further developed
and explored by Eriksson and Gärdenfors [9] and their paper
provides a very useful insight into the key issues of developing
audio interfaces, particularly for blind children. They discuss how to interpret the particulars of game interfaces and challenges so that they can be effectively presented sonically.
McCrindle and Symons revisited the classic game of Space
Invaders and developed stimulating audio interfaces which
could be used by both blind and partially sighted users as well as
fully sighted gamers [1]. Their main concentration in this work
was in the area of providing useful audio feedback and cues to
the user and relied on a more traditional keyboard/joypad
interface. However, they received strong results which indicate
that their methods of providing audio cues are simple and
effective. This removes, to an extent, the challenge of being able to provide more intuitive interfaces to such games.
Figure 4 - Playing Konami Dance Dance Revolution
Konami’s DDR game, pictured in Figure 4 (note the large
speakers), is highly successful and popular, and has become
integrated with youth culture [10]. Indeed, international championships are held worldwide; a testament to the success of
this particular multimodal game.
Still, a visual element of following on-screen prompts is
present, but the audio generated by the game is assistive to the
process of interaction. This said, the ability to maintain rhythm
and timing is crucial to success in the game scenarios. This kind
of intense physical response to audio cues is perhaps an extreme
example. One would often expect users to much prefer not
having to physically involve themselves so profoundly in the
interactive environment. Especially in the case where auditory
interfaces make software and games accessible to disabled or
impaired users, the physical activity required may not be
preferable.
Another good example of audio technology of this nature is
Rainbow Six 3 and the expansion pack RS3: Black Arrow. They
allow users to actually issue voice commands and hold (limited)
conversations with computer controlled players via the XBOX
Communicator system. This has huge potential and has perhaps
been under-used, particularly for users with limited vision.
Care must be taken when developing new and original methods of interacting with computers, particularly with games. The biggest challenge to the user may be to learn the actual interface, which at worst may become exhausting before the player even gets deeply involved with playing, and being absorbed into, the game virtual world. This can be detrimental to the overall experience. Once learned, the interface in an audio game should effectively become pervasive and transparent to the user.
However, although this game relied intensely on the supportive
sound environment which it provided, without which it would
be lost, the user is not necessarily conscious of the importance
of the music in this game. Nevertheless, the user does indeed
interact with an audio environment in order to be able to dance
and keep rhythm whilst playing the game.
4 Analysing Potential Market Appeal

Of course, the technological and design challenges of developing audio games are a purely superfluous area of work if there is not sufficient requirement in the market for such a system. Although there may be sufficient demand in the areas of making games accessible to impaired users, there is no reason why such innovative and exciting developments should be limited to these users. To attempt to establish if there is interest and demand for games which have a particular focus upon the audio artefacts and presence contained within, we further probed our studied gamers to see what their interest would be.

First, we attempt to find out how much users are influenced by the general sounds that would be expected in a computer game. Given that response was low in our earlier investigation into the importance of sound in games, we attempt to determine whether the quality of the audio in games is therefore significant. The results of this are shown in Figure 5.

Figure 5 - Importance of In-Game Sounds (bar chart: Gamer Rating (%) of Yes / No / Don't Know responses to "Does the quality of the sound effects in a computer game make you more interested in playing or buying it?")

It is clear from these results that, although users may not be initially attracted to a game by its sound, the quality of the audio is still of importance. There is an element of doubt which remains, however, since this was not indicated by an excessively large amount of the sampled population. It is also possible that users may be taking a consumer view, due to the phrasing of the question, and are insisting on the maximum quality in a product they might wish to buy or use. What is clear is that audio contained in games must be of significant quality to be of interest. This may be as a result of the use of high-quality soundtracks in games as mentioned earlier.

Next, we attempt to establish whether or not the use of spatial audio within games is seen as a novelty, or a particular point which can be used to sell and drive a game in the market. The results of this part of the investigation are shown in Figure 6.

Figure 6 - Interest in 3D Sound (bar chart: Gamer Rating (%) of Yes / No / Don't Know responses to "Are games which use 3D/surround sound more interesting to you?")

The incorporation of surround sound in games is not a new concept, and again the high positive response rate may be due to the expectations of users, based on their previous experiences. However, it could be argued that the use of spatial audio in games further deepens the experience and level of immersion experienced by the user. Nevertheless, these results indicate that users would be amenable to playing games where some form of 3D or spatial audio is used.

Finally, to see how users would respond to the notion of a game which uses audio as the main method of feedback and interaction with a game, we asked users if they would be interested in such a product. The results of this are shown in Figure 7.

Figure 7 - Interest in Audio Interfacing (bar chart: Gamer Rating (%) of Yes / No / Don't Know responses to "Would you be interested in a game which used sound as the main way of controlling interaction with the game?")

The results from this final query are some of the most interesting. Given the responses received from the previous two questions it was expected that users would be intrigued and interested at the concept of using such an innovative audio game. There is a distinct lack of any particular trend in response to this question. We would hypothesise at this stage that suggesting an immersive audio environment where control is also achieved through sound may be too extreme for the majority of users and gamers, who perhaps do not know enough about this particular area. We suggest that further development and exposure of such products is probably required.
From these results we see that there is further strengthening of the case for developing audio interfaces and audio games. The use of audio as a feedback mechanism has certainly had its case strengthened, although presenting a more extreme scenario where audio may be used for control purposes may have been too inventive at this stage. Clearly, deeper research is needed into why users may or may not be interested in the concept of much more involved audio environments for games.
5 The Future of Audio Games
The development of new and innovative audio games will be an
interesting and challenging field in the years to come. We can
already gain insight into the main areas of development by
examining some of the more recent research developments and the issues surrounding them. We can see that
integration of audio is best achieved through either an indicative
set of audio sounds, such as earcons [11], or by employing
more continuous sounds which evolve with the game scenarios
they represent [1].
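As a minimal illustration of the first option, an indicative set of sounds such as earcons amounts to a fixed mapping from game events to short motifs; the event names, frequencies and synth interface below are invented for the example:

```python
# Minimal earcon table: each game event gets a short, fixed pitch motif
# (frequency in Hz, duration in seconds). All values are invented for illustration.
EARCONS = {
    "item_picked_up": [(660, 0.10), (880, 0.10)],                # rising pair
    "door_locked":    [(330, 0.15), (262, 0.25)],                # falling pair
    "low_health":     [(440, 0.08), (440, 0.08), (440, 0.08)],   # urgent repetition
}

class PrintSynth:
    """Stand-in synth that just logs what would be played."""
    def play_tone(self, freq, dur):
        print(f"tone {freq} Hz for {dur:.2f} s")

def render_earcon(event, synth):
    """Queue the motif registered for `event` on a caller-supplied synth object."""
    for freq, dur in EARCONS.get(event, []):
        synth.play_tone(freq, dur)   # assumed synth interface: play_tone(hz, seconds)

render_earcon("door_locked", PrintSynth())
```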
3D audio environments are a key area to focus on if more
involving and realistic audio and control environments are to be
realized in the computer games world. Since the human hearing
system is used to dealing with 3D sound in everyday life, this is
doubtless an area which should be further exploited. Indeed, the
reaction of users to a 3D audio environment is often instinctive
and there is general consistency in the responsiveness of
subjects when working in 3-Dimensional control environments,
even across international and cultural constraints [14]. The use
of 3D spatial audio within computer games is also an area for
rapid expansion. Though it is fairly standard for new games to
embrace 3D or surround sound environments, there is still much
work which can be done in this area. For example, Virtual
Reality systems are now embracing 3D audio, and results show
that when interacting with a VR or virtual environment, the
responses from users are far better and more accurate when in
the 3D sound domain [7, 8].
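A very rough sketch of the simplest form of such spatialisation, constant-power stereo panning with distance attenuation computed from the listener-relative position, is shown below; a real 3D audio engine would additionally use HRTFs, reverberation, occlusion and Doppler, so this is only an illustration of the principle:

```python
import math

def spatialise(source_xy, listener_xy, listener_facing_rad):
    """Very rough 2D spatialisation: returns (left_gain, right_gain).

    Constant-power panning from the source's azimuth relative to the listener,
    plus inverse-distance attenuation; no HRTF, occlusion or Doppler."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = max(math.hypot(dx, dy), 1.0)            # clamp to avoid blow-up at the ear
    relative = math.atan2(dy, dx) - listener_facing_rad
    pan = max(-1.0, min(1.0, -math.sin(relative)))     # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0                # constant-power pan law
    return math.cos(theta) / distance, math.sin(theta) / distance

# A sound 3 m to the right of a listener facing along +y lands in the right channel.
print(spatialise((3.0, 0.0), (0.0, 0.0), listener_facing_rad=math.pi / 2))
```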
In their work on audio-only gaming, Röber and Masuch [12]
present a good overview of the current range of developments
and challenges in the field of audio gaming, both from the
technological perspective as well as dealing with important
issues relating to the playability and design requirements of
audio games. In this work a number of audio-only games are
developed which demonstrate how audio interfaces can be
applied and combined with a varied array of gestural or
movement-tracking control systems. Additionally, there is a
large step taken in this work since the developers often employ
complex sounds in their 3D auditory worlds, such as the sound
of the traffic on the road in the AudioFrogger game. These
sounds were recorded from the real-world, and are not indicative
synthesised sounds as often encountered in audio control
environments. However, since the use of the 3D audio space
allows for more space in the environment, this may not be an
issue, and in their paper the authors do not make any reference
to the usage of these sounds being derogatory or of them
creating any major problems in usability. This is promising, and
is also an interesting area for future work to be carried out.
This is particularly because gamers now expect realistic audio samples to be used in games and, critically, the use of such sounds will make the audio world more immersive and thus effective.
6 Conclusions & Discussion
Evidently, there is significant work being undertaken both in the
commercial and academic sectors of computer games
development. However, one of the key issues which we believe
will have to be addressed is how to make these novel methods
appealing, and receive sufficient uptake by the general public. It
seems from our study that perhaps the main way in which to
increase the interest and usage of such new technologies is
simply through increasing awareness, and reinforcing the message that such methods of interaction need not be expensive, nor are they another novelty phenomenon which will disappear overnight.
We can see that use of complex sounds mixed in a stereo or 3D
audio space may not currently be the driving force behind
consumer market demand for games, but that the users in the
market are certainly open to, and interested in, the use of a
diverse range of audio and innovative techniques within their
games. This is especially relevant to games, and although the
technologies and interfaces should be embraced, developers
should not lose sight of the fact that it is a game which is being
developed.
The game play and addictive factors of audio games will
certainly be an area at the forefront of the minds of many
developers although it can be regarded as a separate and distinct
challenge from the technological aspects of designing audio
games. Many technologically innovative games have been short-lived, mainly due to poor game playability and also the cost
associated with any extra equipment or components required. To
varying degrees of success we have seen light-guns
(can we forget the Nintendo Super-Scope?), Game Boy Camera,
Nintendo Power Glove, Sega Activator, the Barcode Battler, and
the list goes on. Few of these novelty accessories have really
made significant impact on the market, with the exception
perhaps of the Sony Playstation I-toy, and the aforementioned
Konami DDR dance mat based games.
The thrust of this is that audio games must have high levels of
playability, which is the single most important factor cited in our
research of games players. The exploration into audio games
and multimodal games must not become overshadowed by the need to learn the interface before playing the game. Audio games must become as the audio sense is to human users every day: pervasive, instinctive, and intuitive.
To ensure that playability of audio games is achieved, there must be in-depth testing carried out upon any software developed. A useful set of heuristic evaluation methods has been proposed and developed by Desurvire et al. [13]. Although not specifically focussed on audio games, the methods used specifically concentrate on a number of areas of gameplay, as well as taking into account the game usability, which overall provide a Heuristic Evaluation for Playability (HEP). Particularly in the important initial stages of game development, the HEP testing mechanism has proved more useful at highlighting potential issues in a game’s playability than standard user testing mechanisms. Many of the usability heuristics proposed for the HEP system are generic and would apply equally to any game, regardless of whether the primary interface mechanism was visual, haptic, auditory, or a combination of interfaces. However, an interesting area for future research could be to further build upon the HEP process set out by Desurvire et al., particularly with a focus upon being able to successfully evaluate audio games.
References
[1] McCrindle, R. J., Symons, D., Audio space invaders.
Proceedings of 3rd International Conference on Disability,
Virtual Reality & Associated Technologies, Alghero, Italy,
(2000).
[2] Yuille, J., Smearing Discontinuity :: In-Game Sound.
Proceedings of 5th International Conference on Digital Arts and
Culture (DAC), Melbourne, Australia, (2003).
[3] Lumberas, M., Sánchez, J., Barcia, M., A 3D sound
hypermedial System for the Blind. Proceedings of the 1st
European Conference on Disability, Virtual Reality and
Associated Technologies, Maidenhead, UK, (1996).
[4] Lumberas, M., Sánchez, J., 3D Aural Interactive Hyper
Stories for Blind Children. Proceedings of the 2nd European
Conference on Disability, Virtual Reality and Associated
Technologies, Skövde, Sweden, (1998).
[5] Mereu, S., Kazman, R., Audio Enhanced 3D Interfaces for
Visually Impaired Users, Proceedings of International
Conference on Human Factors in Computing Systems ‘96,
Vancouver, Canada, (1996).
[6] Targett, S., Fernström, M., Audio Games: Fun for All? All
for Fun?, Proceedings of International Conference on Auditory
Display, Boston, MA, USA, (2003)
[7] Zhou, Z., Cheok, A. D., Yang, X., Qiu, Y., An experimental
study on the role of 3D sound in augmented reality environment.
Interacting with Computers, 16, 1043-1068, (2004).
[8] Zhou, Z., Cheok, A. D., Yang, X., Qiu, Y., An experimental
study on the role of software synthesized 3D sound in
augmented reality environments. Interacting with Computers,
16, 989-1016, (2004).
[9] Eriksson, Y., Gärdenfors, D., Computer games for children
with visual impairments. Proceedings of 5th International
Conference on Disability, Virtual Reality and Associated
Technologies, Oxford, UK, (2004).
[10] Welcome To My World - Lord of the Dance Machine,
Episode 2, TV. BBC Three, July 27, (2006). Synopsis available
at:
http://www.bbc.co.uk/bbcthree/tv/my_world/lord_dance.shtml
[11] Brewster, S.A., Providing a structured method for
integrating non-speech audio into human-computer interfaces.
PhD Thesis, University of York, UK, (1994).
[12] Röber, N., Masuch, M., Leaving the Screen: New
Perspectives in Audio-Only Gaming. Proceedings of 5th
International Conference on Auditory Displays (ICAD),
Limerick, Ireland, (2005).
[13] Desurvire, H., Caplan, M., Toth, J.A., Using Heuristics to
Evaluate the Playability of Games. Proceedings of Conference
on Human Factors in Computing Systems, Vienna, Austria,
(2004).
[14] Cunningham, S., Hebblewhite, R., Picking, R., Edwards, W., Multimodal Interaction and Cognition in 3D Music and Spatial Audio Environments: A European Compatible Framework. Proceedings of CSSI International Conference on System Integration in Integrated Europe, Liberec, Czech Republic, (2004).
Authoring of 3D virtual auditory Environments
Niklas Röber, Eva C. Deutschmann and Maic Masuch
Games Research Group
Department of Simulation and Graphics,
School of Computing Science,
Otto-von-Guericke University Magdeburg, Germany
niklas|[email protected]
Abstract. Auditory authoring is an essential component in the design of virtual environments and describes the process of
assigning sounds and voices to objects within a virtual 3D scene. In a broader sense, auditory authoring also includes the
definition of dependencies between objects and different object states, as well as time- and user-dependent interactions in
dynamic environments. Our system unifies these attributes within so called auditory textures and allows an intuitive design of
3D auditory scenes for varying applications. Furthermore, it takes care of the different perception through auditory channels
and provides interactive and easy to use sonification and interaction techniques.
In this paper we present the necessary concepts as well as a system for the authoring of 3D virtual auditory environments
as they are used in computer games, augmented audio reality and audio-based training simulations for the visually impaired.
As applications we especially focus on augmented audio reality and the applications associated with it. In the paper we provide details about the definition of 3D auditory environments along with techniques for their authoring and design, as well as an overview of the system itself with a discussion of several examples.
1 Introduction
Many of today’s computer games feature an impressive and almost photo-realistic depiction of the virtual scenes. Although the importance of sound has moved into the focus of game developers and players, it still does not receive the same level of attention as high-end computer graphics. The reasons for this are manifold, but some of them are already diminishing, in that sound now plays a larger role in certain games and game genres. One niche in which sound is the major carrier for information is that of so called audio- or audio-only computer games. These types of games are often developed by and for the visually impaired community and are played and perceived through auditory channels alone. Many genres have been adopted, including adventures, action and role-playing games as well as simulations and racing games. To bridge the barrier between visual and auditory game play, some of these games are developed as hybrids, and can be played by sight and ear [4]. For a more detailed discussion on these games we refer to [8], [9] and the audiogames website [11].
Both audio-visual and audio-only computer games need to be
authored and designed regarding the story and the game play. For
this purpose, specially designed authoring environments are often shipped together with the game engines used. An overview
and comparison of some commercially and freely available audio
authoring environments has been discussed by Röber et al. [6].
Our authoring system is part of a larger audio framework that
can be used for the design of general auditory displays, audio-only computer games and augmented audio reality applications.
The system is based on 3D polygonal scenes that form the virtual environment. This description is used for collision detection
and to assign sound sources to objects and locations. During the
authoring, additional acoustic information is added to the scene.
For each object, an auditory texture is defined and set
up to specify the object's later auditory appearance. This includes
different sounds and sound parameters per object, as well as story-
and interaction-specific dependencies. The auditory presentation
of such sound objects can be changed by user interaction, time,
other objects or an underlying story event. The authoring system
is additionally divided into an authoring and a stand-alone player
component. This allows a hardware-independent authoring and
lets the player component be used independently from the main system on mobile devices.
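To make the notion of an auditory texture more concrete, the following is a minimal sketch of how such a per-object record might be organized, assuming a C++ implementation; the type and field names (AuditoryTexture, Dependency, SoundEntry) are illustrative and do not appear in the paper.

// Minimal sketch of a per-object "auditory texture" record (illustrative names).
#include <string>
#include <vector>

enum class DependencyType { Time, Position, Object };

struct Dependency {
    DependencyType type;
    double triggerValue;      // e.g. seconds for Time, metres for Position
    int    targetObjectId;    // object whose state is changed when triggered
    int    newState;          // state the target object is switched into
};

struct SoundEntry {
    std::string file;         // reference to the sound file
    float gain;               // playback loudness
    float rolloff;            // distance attenuation factor
    bool  loop;
};

struct AuditoryTexture {
    int objectId;                          // scene object this texture is bound to
    int currentState = 0;                  // states select different acoustic appearances
    std::vector<std::vector<SoundEntry>> soundsPerState;
    std::vector<Dependency> dependencies;  // time-, position- and object-dependencies
};

A player component would then look up the sound entries for an object's current state and evaluate its dependencies on each update.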
In our research we especially focus on audio-only computer
games and augmented audio reality applications in the context of
serious gaming. Here we concentrate on techniques for sonification, interaction and storytelling, but also on authoring and audio
rendering itself. The methods developed here are not only applicable for entertainment and edutainment, but can also be used
in the design of general auditory displays and for training simulations to aid the visually impaired. With the authoring environment and the techniques presented in this paper, we focus on
The paper is organized as follows: After this introduction, we
focus in Section 2 on the definition of 3D virtual auditory environments and discuss here especially the concept of auditory textures
along with the varying possibilities for sonification and interaction. In
this section we also motivate and explain the additional changes
necessary to support dynamic environments and augmented auditory applications. Section 3 is built upon the previous sections
which describe the scenes. This scenegraph is also responsible for collision detection and level of detail, and for handling possible
time-, position-, object- or user-based dependencies. Every object
within the auditory scene must be audible in some way, otherwise
it is not detectable and not part of the environment. The objects
can be grouped into non-interactable objects, passageways and doors,
and interactable objects [7]. Combined with this scenegraph is a
3D audio engine that is capable of spatializing sound sources and
simulating the scene's acoustics. Due to the differences in perception, the acoustic design should not simply resemble a real-world acoustic
environment; instead certain effects, such as the Doppler effect, need to
be exaggerated in order to be perceived. Also, additional information for beacons, earcons and auditory icons to describe non-acoustic objects and events needs to be integrated into the auditory
description of the scene. In order to interact with the environment
and to derive useful information from the scene, the user needs to
be able to input information. This is handled through a variety of sonification and interaction techniques, which have already
been discussed in the literature [8], [6]. Difficulties often occur
with navigational tasks in which the user needs to navigate from
one point to another, more distant location within a large scene.
Path-guiding techniques, such as Soundpipes, have proven
useful here to keep the user from getting lost [10]. Another technique, which has been
demonstrated to greatly enhance perception by imitating natural hearing behaviors, is head-tracking, which measures the orientation of the user's head and directly applies this transformation to
the virtual listener. This enables the user to immediately receive
feedback from the system by simply changing the head's orientation.
Head-tracking can also be used for gesture detection, in which
nodding and negation directly transfer to the system. Section 4
presents an actual implementation of such a 3D virtual auditory
environment, while the next two paragraphs extend the system towards dynamic and augmented auditory applications.
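As a rough illustration of the head-tracking idea, the sketch below converts a tracked yaw and pitch into an OpenAL listener orientation (Section 4.1 states that OpenAL is used for rendering; the function itself, the assumption that the tracker delivers yaw and pitch in radians, and the omission of roll are ours).

// Sketch: applying a head-tracker's yaw/pitch to the OpenAL listener (assumed tracker data).
#include <AL/al.h>
#include <cmath>

void updateListenerOrientation(float yawRad, float pitchRad) {
    // "at" vector from yaw (heading) and pitch; -z is forward at yaw 0 in OpenAL.
    float at[3] = {
        std::sin(yawRad) * std::cos(pitchRad),
        std::sin(pitchRad),
       -std::cos(yawRad) * std::cos(pitchRad)
    };
    float up[3] = { 0.0f, 1.0f, 0.0f };   // simplification: roll is ignored
    float orientation[6] = { at[0], at[1], at[2], up[0], up[1], up[2] };
    alListenerfv(AL_ORIENTATION, orientation);
}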
and discusses in detail the authoring system using several examples. Here we explain the techniques and concepts used, and provide together with Section 4 additional information regarding the
user interface and the soft- and hardware implementation. Section 5 presents and discusses the results achieved using some examples, while Section 6 summarizes the paper and states possibilities for future improvements.
2 Virtual auditory Environments
Vision and hearing are by far the strongest senses and
provide us with all information necessary to orientate ourselves
within a real-world environment. Although one perceives the majority of information visually through the eyes, different and often
invisible information is sensed acoustically. Examples can be easily found in daily life, such as telephone rings or warning beacons.
Though the visual and the auditory environment, which are perceived by the eyes and the ears respectively, partially overlap,
the larger portion is dissimilar, and the two complement each other to provide a comprehensive overview of the local surroundings. Virtual
environments are computer-created worlds, which often resemble
a real environment. Depending on the realism of the computer-generated graphics and sound, the user might immerse into this
virtual reality and accept it as real. Virtual environments have
many applications, ranging from simulations and data visualization to computer games and virtual training scenarios. The most
successful implementations are computer games, in which players
immerse themselves into a different reality as virtual heroes.
3D virtual auditory environments represent a special form of
virtual environments that uses only the auditory channel to convey
data and information. As discussed in the last paragraph, the auditory and the visual channel sense different information and form
a diverse representation of the user's surroundings. This has to be
incorporated into the design of virtual auditory environments, if
the goal is to visualize a (virtual) real-world-resembling environment. An advantage of hearing as opposed to vision is the possibility
to hear within a field of 360 degrees and to also perceive information from behind obstacles and occlusions. Difficulties sometimes
arise with the amount of data perceivable and the resolution of
the spatial localization of 3D sound sources. Furthermore, auditory information can only be perceived over time and only if a
sound source is active. For a technical realization, virtual auditory environments are simpler and cheaper to build, as no screens
or visual displays are needed. Auditory environments have many
applications, including auditory displays and of course audio-only
computer games and augmented audio reality.
In order to receive enough information for the user's orientation, navigation and interaction, a 3D auditory environment must
exhibit certain qualities. These qualities and functions can be described as:
2.1 Dynamic Environments
While static environments can only be explored by the user, more interesting, but also more difficult, is the creation of dynamic
environments that change through user interaction. Dynamic here refers not only to animations and loops, but to a reaction of the
environment to the user's interaction. This can be expressed
through time-, position- and object-dependencies, which are directly bound to certain objects in the scene.
• A 3D (polygon-based) virtual environment managed by a
scenegraph system,
• A 3D audio-engine with a non-realistic acoustic design,
• Sonification and interaction techniques,
• Input and interaction devices, and
• User-tracking equipment.
Figure 1: (Inter)action Graph to model dynamic Environments.
A time dependency controls the state of an object with an absolute or a relative time measurement. If the time is up, a certain action is invoked, such as the playback of a sound or the setting of
other objects or control structures. A position dependency is triggered by the user if he approaches the corresponding object, while
This list extends a little further with the design of dynamic and
augmented auditory environments, see also the following paragraphs. The basis of virtual auditory environments is built by a
3D scenegraph system that manages all the 3D polygonal meshes
the auditory environment [6]. These auditory textures have now
been extended to also control the object states through time, position and user-interaction dependencies, and additionally handle
the references to the various sound files along with the parameters
for their playback.
Figures 2(a) and 2(b) display the authoring and design of dependencies using the concept of auditory textures. Figure 2(a)
shows the different dependencies and their arrangement
within the auditory texture, grouped by type for faster access. Figure 2(b)
displays the final action graph that is constructed from the previous auditory textures.
object dependencies change an object's state and are induced by
other related objects. Figure 1 shows an action graph that visualizes these dependencies, while Figures 2(a) and 2(b) display the
later authoring and design of these dependencies using auditory
textures. A menu system and soundpipes, which use mobile
sound sources, can be designed as well by using additional object
dependencies.
2.2 Augmented Audio Reality
The term Augmented Reality comprises technologies and techniques that extend and enhance a real-world environment with
additional (artificial) information. It is, unlike virtual reality, not
concerned with a complete replacement of the real environment,
but focuses deliberately on the perception of both worlds, which intermingle and blend into one another. Ideally, the user would perceive both
environments as one, and artificial objects and sounds as positioned within the real environment [5], [1]. Augmented reality
has many applications, ranging from entertainment and visualization to edutainment and virtual archeology [2], [3].
Augmented Audio Reality describes the part of augmented reality that focuses exclusively on auditory perception. The previously
listed qualities of a virtual auditory environment need here to be
extended by tracking techniques that position the user within the
virtual environment. This positioning, as well as the virtual map,
needs to be calibrated in order to deliver the right position. Due
to the low resolution of the human hearing system in localizing
3D sound sources, the tolerance can, depending on the application, vary up to 3 m. This positioning accuracy needs to
be considered during the authoring, as objects with a position dependency should be roughly twice that distance apart. If
the virtual environment is perceived through headphones, another
problem occurs. The human listening system heavily relies on the
outer ears to localize sound sources within 3D space. If the ears
are covered, sounds from the real world can no longer be heard
properly. A solution to this problem is bone-conducting headphones, which are worn in front of or behind the ears and transmit the
sound via bone. Besides a slightly lower listening quality, these
bone-phones allow a perfect fusion of the real and virtual acoustic
environments. Additional care has to be taken with the user tracking and positioning, as the latency resulting from the measurement and interpretation must not be too large. Otherwise, the two environments would appear disjoint under motion.
A more detailed discussion on the hardware used to design such a
system can be found in Section 4.2.
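The 3 m tolerance and the rule that position-dependent objects should be roughly twice that distance apart suggest a simple authoring-time validation; the sketch below is one hypothetical way such a check could look and is not part of the described system.

// Sketch: authoring-time check that position-triggered objects respect the
// spacing rule (roughly twice the positioning tolerance apart). Illustrative only.
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static float distance(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

// Returns true if every pair of position-triggered objects is far enough apart.
bool checkSpacing(const std::vector<Vec3>& triggerPositions, float toleranceMetres = 3.0f) {
    const float minSpacing = 2.0f * toleranceMetres;
    for (size_t i = 0; i < triggerPositions.size(); ++i)
        for (size_t j = i + 1; j < triggerPositions.size(); ++j)
            if (distance(triggerPositions[i], triggerPositions[j]) < minSpacing)
                return false;
    return true;
}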
(a) Authoring of Dependencies.
(b) Construction of an Action Graph.
Figure 2: Authoring and Design of Dependencies.
The (inter)action graph that is depicted in Figure 1 is composed
of time and user interactions and bound to a specific object in the
scene. The edges describe conditions, which, if satisfied, connect
to the next possible actions, while the nodes are built from counters
and user interactions. All object conditions that are not related
to user interaction can be described using time. This also allows
the execution of events directly following other events. With
some additional control mechanisms, this description can also be
used to model a story-engine that controls narrative content and
parameters, as used in computer games or other forms of interactive narration.
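A minimal sketch of such an (inter)action graph, with condition-carrying edges and action-carrying nodes, could look as follows; all names are illustrative and the real system's representation may well differ.

// Sketch of an (inter)action graph node/edge structure (illustrative names).
#include <functional>
#include <vector>

struct ActionEdge {
    std::function<bool(double elapsedSeconds)> condition; // time- or interaction-based condition
    int nextNode;                                          // index of the node this edge leads to
};

struct ActionNode {
    std::function<void()> action;   // e.g. play a sound, set another object's state
    std::vector<ActionEdge> edges;
};

// Advance the graph for one object: fire the first edge whose condition holds.
int step(const std::vector<ActionNode>& graph, int current, double elapsedSeconds) {
    for (const ActionEdge& e : graph[current].edges) {
        if (e.condition(elapsedSeconds)) {
            int next = e.nextNode;
            graph[next].action();   // execute the reached node's action
            return next;
        }
    }
    return current;                 // no condition satisfied, stay in place
}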
These aforementioned time-, object- and user-dependencies,
including the various conditions and sounds for an object, can be
modelled using auditory textures. Auditory textures were initially
designed to only handle the different states and acoustic representations of an object. These state changes were induced by user
interaction, as well as a story- and physics-system which control
3 Auditory Authoring
Authoring is the process of designing a document and filling it
with content and information from possibly different sources and
media. Auditory authoring refers to the design of virtual auditory environments and the specification of dependencies to model
the application's behavior. The authoring for audio-only and augmented audio reality applications often takes place directly in
programming languages. But this method is neither intuitive nor
can the content be easily changed or adjusted later. Together with
the development of applications, this was one of the main motivations for this research as the need for more professional authoring systems is growing. A previous publication was already
concerned with the authoring of virtual auditory environments, on
Figure 3: Auditory Authoring Environment.
per sound and can also vary over time, see Figures 3 and 6(b). The
user interface was designed using Qt4 and allows the user to detach the
parameter entry forms and float them over the application to customize the layout.
which the current work is based along with the development of an
augmented audio reality system [6].
Figure 3 shows a screenshot of the authoring environment, which
illustrates the menu and the authoring concept. The center window
shows a visual representation of the scene, while the right-hand
side offers sliders and parameter entries to adjust and fine-tune
the sound sources as well as the auditory textures. Objects can
be selected by either clicking on them or through the list on the
left. Basic functionalities that have to be supported by any such
authoring system are:
• Select, create and delete sound sources,
• Position and orientation of sound sources,
• Specification of playback parameters, such as attenuation,
loudness, rolloff etc.,
• Setup of background and environmental sounds,
• Definition and setup of dependencies, and
• The design of an auditory menu system.
Figure 4: Setting up Sound Sources.
In Figure 4 one can see a screenshot of the authoring environment with a graphical representation of a virtual scene with one
sound source along with its parameter visualization. The cone visualizes the direction of the sound source, while the two wire spheres represent the attenuation and the rolloff space. The sound
parameters that are adjustable include position, loudness, direction, inner and outer opening, minimal and maximal loudness,
rolloff and many others.
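Since the final rendering is done with OpenAL (Section 4.1), these per-source parameters map fairly directly onto OpenAL 1.1 source properties; the sketch below illustrates that mapping with made-up values and is not code from the authoring tool itself.

// Sketch: mapping the listed source parameters onto OpenAL 1.1 properties (illustrative values).
#include <AL/al.h>

void configureSource(ALuint src) {
    alSource3f(src, AL_POSITION, 2.0f, 0.0f, -5.0f);     // position in the scene
    alSource3f(src, AL_DIRECTION, 0.0f, 0.0f, -1.0f);    // direction (the cone in Figure 4)
    alSourcef(src, AL_CONE_INNER_ANGLE, 60.0f);          // inner opening
    alSourcef(src, AL_CONE_OUTER_ANGLE, 180.0f);         // outer opening
    alSourcef(src, AL_GAIN, 0.8f);                       // loudness
    alSourcef(src, AL_MIN_GAIN, 0.1f);                   // minimal loudness
    alSourcef(src, AL_MAX_GAIN, 1.0f);                   // maximal loudness
    alSourcef(src, AL_ROLLOFF_FACTOR, 1.5f);             // rolloff
    alSourcef(src, AL_REFERENCE_DISTANCE, 1.0f);         // attenuation reference distance
    alSourcef(src, AL_MAX_DISTANCE, 30.0f);              // beyond this, no further attenuation
}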
Figure 5 displays the authoring of a ring topology-based menu
system using six spheres. The ring menu allows between two and
3.1 Sound and Environmental Authoring
The first step for the sound and environmental authoring is to
load a VRML file that represents the scene geometry. This data
can be modelled with any 3D program, such as Maya or 3D Studio MAX, from which the geometry can be exported as VRML.
After this, objects are selected and auditory textures as well as
sounds assigned and defined. Several parameters can be adjusted
(a) Soundpath Design.
Figure 5: Design of a Ring Topology-based Menu System.
six objects, which are automatically arranged and evenly distributed around the listener. Every object within the menu can be
assigned an auditory texture with all the possible modifications.
This system can therefore be easily used to control and adjust parameters inside the virtual environment.
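The even distribution of the ring menu items can be sketched as follows, assuming the menu objects are placed on a horizontal circle around the listener; the function is illustrative and not taken from the system.

// Sketch: evenly distributing 2-6 menu objects on a ring around the listener.
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

std::vector<Vec3> ringMenuPositions(int count, float radius, const Vec3& listener) {
    std::vector<Vec3> positions;
    for (int i = 0; i < count; ++i) {
        // equal angular spacing around the listener, starting straight ahead
        float angle = 2.0f * 3.14159265f * static_cast<float>(i) / count;
        positions.push_back({ listener.x + radius * std::sin(angle),
                              listener.y,
                              listener.z - radius * std::cos(angle) });
    }
    return positions;
}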
3.2 Dynamic Authoring
After the authoring of the basic parameters, the dynamic authoring starts with the definition of dependencies and auditory textures. For each dependency there exists a different input form that assists the user in authoring the parameters for the animations.
Figure 6 displays two examples of dynamic authoring. Here
Figure 6(a) shows the design and animation authoring of a circle-based soundpath. Other geometries, like polygon lines or splines,
can be used as well and are employed later within the environment to assist the player with navigation and orientation. For the
animation, an object (sphere) is selected and the time for the animation specified. The start of the animation can also be triggered
through any event, like time or user interaction, and repeated as
often as required. In Figure 6(b) one can see the visualization of a
positional dependency. The two transparent boxes mark the entry
and exit events, respectively, for playing a sound file when the user approaches
the center box object. The two boxes are needed because of the low resolution
of the user positioning, in order to avoid parameter flipping; see
also Section 2.2 for more details.
(b) Position Dependency.
Figure 6: Dynamic Authoring.
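The two-box arrangement of Figure 6(b) is essentially a hysteresis on the trigger; a minimal sketch of that behaviour, using simple radii instead of boxes and with illustrative names, might look like this.

// Sketch: entry/exit hysteresis for a position dependency, mirroring the two
// regions in Figure 6(b) that prevent rapid re-triggering ("parameter flipping").
struct PositionTrigger {
    float entryRadius;   // inner region: crossing it fires the event
    float exitRadius;    // outer region: only after leaving it can the event fire again
    bool  armed = true;

    // Returns true exactly once per approach; distance is user-to-object distance.
    bool update(float distance) {
        if (armed && distance < entryRadius) {
            armed = false;          // fire and disarm until the user has left again
            return true;
        }
        if (!armed && distance > exitRadius)
            armed = true;           // re-arm only outside the larger exit region
        return false;
    }
};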
multiple applications, ranging from entertainment and edutainment environments to training simulations for the visually impaired. The authoring and the presentation of the designed application take place in two different components. The entire system
is therefore divided into two parts: the authoring and a runtime
module. The authoring system is used to design the virtual auditory environment, which can also be tested on the fly using a
built-in player component. The authored application can then be
saved and executed on a mobile platform using the runtime system
as well. This division allows a hardware independent authoring,
in which the additional tracking and input devices are simulated
by the mouse and keyboard.
Figure 7 shows an overview of the system, with the authoring
component on top and the player module at the bottom of the figure. The player component also uses the VRML model to visually
inspect the scene and to verify the authoring. The evaluation of
the scene events is carried out using the authored auditory textures and the information from the tracking and user interaction
equipment. The final acoustic presentation using sound spatialization and room acoustics is rendered by OpenAL.
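Read as pseudocode, one iteration of the runtime (player) module described above could be structured as follows; the component functions are hypothetical stand-ins for parts of the system that the paper does not detail.

// Sketch of one runtime iteration; the stubbed functions are illustrative only.
struct Pose { float x, y, z, yaw; };

Pose readTracker() { return {0.0f, 0.0f, 0.0f, 0.0f}; }  // stub: poll compass and W-Lan position
void evaluateAuditoryTextures(const Pose&) {}            // stub: fire authored dependencies
void updateOpenALListener(const Pose&) {}                // stub: move the virtual listener
void renderRoomAcoustics() {}                            // stub: spatialization and room acoustics

void runtimeStep() {
    Pose pose = readTracker();        // 1. poll tracking and interaction devices
    evaluateAuditoryTextures(pose);   // 2. evaluate authored dependencies against the pose
    updateOpenALListener(pose);       // 3. update the listener position and orientation
    renderRoomAcoustics();            // 4. final acoustic presentation
}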
4 System Design
While the last two sections focused on the theoretical foundations of auditory environments and their authoring for audio-only
and augmented audio reality applications, this section provides an
overview of the system's design along with some implementation
details and a discussion of hardware-related issues.
4.2 Hardware
As the main focus of the paper is on the authoring of virtual auditory environments, we will keep the discussion of hardware-related issues very brief. The hardware for our portable augmented audio system consists of a regular laptop (Dell Inspiron 8200), a digital compass that is used as a head-tracking device, a gyro mouse for 360-degree interaction, bone-conducting
headphones for the acoustic presentation and a W-Lan antenna
along with several portable W-Lan access points for the user positioning. Although the system is very low-cost, it is
still very reliable and achieves good results. The digital compass is an F350-COMPASS reference design from Silicon Laboratories that uses three separate axes of magneto-resistive sensing
elements that are tilt-compensated. The compass connects to the
computer via USB and can be easily polled using a simple API.
4.1 Software
The authoring system is based on a previous audio framework
that was developed and applied to design audio-only computer
games and to evaluate sonification and interaction techniques [7],
[8]. This framework was built using OpenAL for sound rendering
and OpenSG to manage the 3D content of the scenes. The same
framework and libraries were used as basis for this authoring system, and extended by Trolltech’s Qt4 as API for the user interface
design. Figure 3 shows a screenshot of the final application.
The authoring system was designed to allow an easy authoring of (augmented) 3D virtual auditory environments without the
need for special knowledge or programming experience. Additionally, the system was designed as a universal modeler to serve
again, but the story and the story points were slightly adjusted to
match the new requirements of the augmented system, especially
for the user positioning. The story, the events and the user interaction were encoded within the dependencies of auditory textures.
Although the system worked well, difficulties arose with the accuracy of the positioning due to the highly reflective stone walls
that interfered with the W-Lan based user tracking.
5.1 Campus Training Simulation
The other example, which shall be discussed in a little more detail, is an augmented virtual training scenario for the visually impaired. Figure 3 in Section 3 displays an overview of the map
used and also shows the authoring of the dependencies using auditory textures. In this simulation, buildings and important places
are characterized through certain sounds that are specific to their
function. In our campus simulation, this is for example the rattling of plates and cutlery in the cafeteria, the rustling of pages
and books in the library and space-like sounds representing the
department of computer science. Using this training simulation,
the user becomes familiar with the arrangement of buildings and
locations in an off-line simulation using the player component.
The orientation and position of the user are here input using
the mouse and keyboard. Later, in the augmented version, the
user walks through the real environment recognizing the various
sounds. The user perceives the same sounds and information, except that the position and orientation are now measured by the
digital compass / gyro mouse and the W-Lan positioning engine.
The authoring for this training simulation was very straightforward and relatively easy. The 3D model could be designed very
quickly using 3D Studio MAX, as the buildings did not need to be
highly realistic. Descriptive sounds for each building were taken
from a sound-pool CD-ROM or created by ourselves by
simply recording the auditory atmosphere at these locations. In
the final authoring using the system depicted in Figure 3, these
sounds were assigned to each building along with some object and position dependencies. Figure 8 displays a screenshot from the runtime component (left) and the W-Lan positioning engine (right).
It shows the view from the department of computer science towards the library. In the right figure, the user's position is marked
by a bright red dot in the corner of the middle/right building.
Figure 7: System Overview.
The gyro mouse uses a similar principle to determine the mouse's
orientation in 3D space. It is used in the runtime system as an alternative interaction device to specify the listener's orientation, but
also to interact with the virtual environment and to input user selections. Bone-conducting headphones are employed to improve
the blending of the two different auditory environments. Here we
use a model from the Vonia Corporation. As the sounds are conveyed over the bones in front of the ear, the ear remains uncovered
and is still fully functional to localize sounds from the real-world
environment. Although frequencies below around 250 Hz cannot
be perceived, the quality is good enough to spatialize sounds
through general HRTFs. An evaluation of bone-conducting headphones for several applications including spatialization has been
discussed by Walker et al. [12]. The user positioning system uses
our own implementation of a W-Lan location determination system
[13], [14]. Our approach is a derivation of the standard RADAR
system, extended by pre-sampled and interpolated radio
maps and additional statistics to improve the performance. The
resolution ranges between 1 m and 2 m and depends on the number of access points used and the room's size and geometry. A
huge advantage of W-Lan positioning over GPS is that it can be
used inside and outside of buildings. With the growing number of
commercial and private access points, this positioning technique
uses a resource that is already in place.
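The RADAR-style approach with pre-sampled radio maps boils down to matching a live signal-strength scan against stored fingerprints; the sketch below shows the basic nearest-neighbour step under that assumption, omitting the interpolation and statistical refinements the authors mention, with all names being illustrative.

// Sketch: nearest-neighbour lookup in a pre-sampled W-Lan radio map (illustrative only).
#include <cfloat>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Fingerprint {
    float x, y;                                  // sampled map position
    std::map<std::string, float> rssi;           // access-point identifier -> signal strength (dBm)
};

// Distance in signal space between a live scan and a stored fingerprint.
static float signalDistance(const std::map<std::string, float>& scan, const Fingerprint& fp) {
    float d = 0.0f;
    for (const auto& [ap, level] : fp.rssi) {
        auto it = scan.find(ap);
        float live = (it != scan.end()) ? it->second : -100.0f;  // treat missing APs as very weak
        d += (live - level) * (live - level);
    }
    return d;
}

// Returns the map position of the closest fingerprint.
std::pair<float, float> locate(const std::map<std::string, float>& scan,
                               const std::vector<Fingerprint>& radioMap) {
    float best = FLT_MAX;
    std::pair<float, float> pos{0.0f, 0.0f};
    for (const Fingerprint& fp : radioMap) {
        float d = signalDistance(scan, fp);
        if (d < best) { best = d; pos = {fp.x, fp.y}; }
    }
    return pos;
}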
5 Applications and Discussion
The focus of the last sections was to form a theoretical foundation
for 3D virtual auditory environments with applications in audio-only computer games and augmented audio reality. The emphasis
in this section lies in the analysis of the results and a discussion
on the performance of the authoring environment, the system and
the initial definition of auditory environments.
As one of the two foci was the design and evaluation of augmented audio reality applications, we have implemented and
tested two different scenarios. One is an augmented adaptation of
an earlier audio-only adventure game [8], while the other can be
considered as a serious game that assists visually impaired people
in training their orientational skills. The augmented audio adventure game takes place in an ancient cathedral of Magdeburg, where
several sagas and myths have been combined into one story. A
tourist visiting the city can unveil several mysteries, while at the
same time learning about the history of the city and the cathedral.
The 3D model that was used in the original game has been used
Figure 8: Player Component (left) and W-Lan Positioning (right).
Tests using both applications yielded good results, although
some points need to be improved. So far we have tested both
applications using sighted users only, but additional tests with
visually impaired participants are scheduled for the next month.
Although the entire campus is equipped with many overlapping
access points, the positioning algorithm performs better indoors,
due to the shadowing effects of the rooms' geometry and furniture. In wide open spaces, such as the campus scenario, the signal strength is homogeneous over long distances. Advantageous in
thank Stefanie Barthen for her help in designing the test scenarios, as well as the group of Professor Nett for helpful discussions
on W-Lan positioning and for lending us portable W-Lan access
points.
outdoor applications is the existing ambient sound environment
from the real world, whereas indoor environments are more silent. Hence, the
authoring for outdoor applications is easier as many sound sources
are already present. Additionally, in outdoor augmented scenarios
the distribution of event locations is scattered over a larger area,
which at the same time allows a better positioning as overlapping
effects are easy to avoid. One subject reported that the quality of
the bone-phones was too poor and disturbed the perception, while
all participants stated that the system (see also Figure 9) is easy to
wear and handle.
References
[1] Ronald T. Azuma. A survey of Augmented Reality. In Presence: Teleoperators and Virtual Environments 6, pages 355–
385, 1997.
[2] S. K. Feiner. Augmented Reality: A New Way of Seeing: Computer scientists are developing systems that can
enhance and enrich a user’s view of the world. Scientific
American, April 2002.
[3] Tony Hall, Luigina Ciolfi, Liam Bannon, Mike Fraser, Steve
Benford, John Bowers, Chris Greenhalgh, Sten Olof Hellström, Shahram Izadi, Holger Schnädelbach, and Martin
Flintham. The Visitor as Virtual Archaeologist: Explorations in Mixed Reality Technologies to enhance Educational and social Interaction in the Museum. In VAST - Conference on Virtual Reality, Archeology and Cultural Heritage, pages 91–96, 365, 2001.
[4] Pin Interactive. Terraformers, 2003. PC.
[5] Paul Milgram, David Drascic, Julius J. Grodski, Anu
Rastogi, Shumin Zhai, and Chin Zhou. Merging real and
virtual Worlds. In IMAGINA’95, pages 218–220, 1995.
Figure 9: The Augmented Audio System in Action.
Important for all applications is a careful selection of sounds,
as some of the sounds used in the campus training simulation
were difficult to classify and sometimes even bothersome. Longer
acoustic representations performed better than shorter ones.
The next steps to improve the system and the two applications
are a refinement of the positioning system for outdoor tracking. Here
we need more sampling points and a better interpolation scheme
for the radio maps. Additionally, some sounds and event distances
need to be checked and probably adjusted as well. But the most
important part of future work is a detailed user study with sighted
and blind users that also features a comparison between users who
performed an off-line training and users who did not.
[6] N. Röber and M. Masuch. Auditory Game Authoring: From
virtual Worlds to auditory Environments. In Norman Gough,
Quasim Mehdi and Gavin King, editors, Proceedings of
CGAIDE 2004, London, England, 2004.
[7] N. Röber and M. Masuch. Interacting with Sound: An interaction Paradigm for virtual auditory Worlds. In 10th Int.
Conf. on Auditory Display (ICAD), 2004.
[8] N. Röber and M. Masuch. Leaving the Screen: New Perspectives in Audio-Only Gaming. In 11th Int. Conf. on Auditory Display (ICAD), 2005.
[9] N. Röber and M. Masuch. Playing Audio-Only Games: A
Compendium of Interacting with Virtual, Auditory Worlds.
In 2nd Int. Digital Games Research Association Conf. (DIGRA), 2005.
6 Conclusions and Future Work
In this work we have discussed virtual auditory environments and
the basic qualities that define them. We have motivated this
through several applications like audio-only computer games and
augmented audio reality, for which the definition of auditory environments was extended. Furthermore, we have presented a system
for the multipurpose authoring of various auditory applications, together with several user-supporting techniques. Finally, we have
presented and discussed a hardware realization for an augmented
audio reality system along two example implementations.
Future work includes, as already outlined in the last section, a
detailed user study using sighted and blind participants, as well as
a refinement of the positioning engine to improve the resolution
and accuracy.
[10] N. Röber and M. Masuch. Soundpipes: A new way of Path
Sonification. Technical Report 5, Fakultät für Informatik,
Otto-von-Guericke Universität Magdeburg, 2006.
[11] Richard van Tol and Sander Huiberts. Audiogames Website.
http://www.audiogames.net, 2006.
[12] B. N. Walker and R. Stanley. Thresholds of Audibility for
bone-conduction Headsets. In 11th Int. Conf. on Auditory
Display (ICAD), 2005.
[13] Moustafa Youssef and Ashok K. Agrawala. The Horus
WLAN Location Determination System. In 3rd Int. Conf.
on Mobile Systems, Applications, and Services (MobiSys
2005), 2005.
[14] Moustafa Youssef, Ashok K. Agrawala, and A. Udaya
Shankar. WLAN Location Determination via Clustering and
Probability Distributions. In IEEE Int. Conf. on Pervasive
Computing and Communications (PerCom), 2003.
Acknowledgment
The authors would like to thank Mathias Otto for his help in developing the augmented audio system and especially for his work
on the W-Lan positioning engine. Furthermore, we would like to
From Heartland Values to Killing Prostitutes: An Overview of Sound in
the Video Game Grand Theft Auto Liberty City Stories
Juan M. Garcia
E-mail: [email protected]
Abstract: The video game, as understood by Jesper Juul, consists of two basic elements: a set of real rules and a fictional world. The
fictional world requires the player not only to engage in a willing suspension of disbelief; the player must also willingly accept
another cultural model as valid, at least during playtime. Those who engage Grand Theft Auto accept murder, extortion, sexism, and
racism as valid, even when this may contradict their core set of values. The player has not turned his back on his old framework of
understanding; he has instead been allowed to wear a new one, one that he must leave behind once he exits the virtual world. The
actions of the player will be meaningful only within context. The virtual world, and the cultural model it promotes, reinforce the very
real rules that dominate the game, while simultaneously the rules reinforce the cultural model and the virtual world in which it exists.
The following paper analyzes the role of sound in the video game Grand Theft Auto Liberty City Stories in creating the cultural
model that the player adopts during game time.
sets the story in motion. After the opening scene, non-diegetic
sound, in the form of music, appears only as brief snippets of the
musical theme of GTA LCS; such snippets are heard after
successfully completing a mission, and after finishing the game.
1 Objective
The present study aims to expand the understanding of the use
of sound in video games. For such purpose the video game
Grand Theft Auto Liberty City Stories will be used as a case
study. The choice of GTA LCS is based on several reasons. First, the
Grand Theft Auto series has caused a lot of controversy for its
violent and sexual content. The latest blunder comes from an
exploit called Hot Coffee, which allows the
player to partake in a scene in which two characters engage in
simulated sex. On the other hand, the Grand Theft Auto series,
and Liberty City Stories in particular, have received positive
reviews in the specialized press: IGN, GameSpot, and 1up, the
sister website of the Playstation Magazine. Any game that
generates as much presence in the media as Grand Theft Auto
deserves a closer examination. Furthermore, its clever use of
sound provides fertile ground for analysis.
3.1 Blue Arrows
There are other sounds of a non-diegetic nature; most of them
'beeps'. Such 'beeps' accompany banners, or signs. The signs or
banners are instructions, tips, and rules. These instructions are
presented not as objects belonging to the fictional world but
rather superimposed text, although part of the game. The texts,
arrows, lights, pointers and sounds that indicate instructions or
tips can be called 'blue arrows'. The term 'blue arrow' is used by
Jesper Juul. [8] And lacking a better word to name the
previously described instances 'blue arrow' will be used. In GTA
LCS the “blue arrows”—which are in fact not blue arrows but
yellow lights—appear also in the form of sound cues, 'beeps'.
The use of 'blue arrows' does not seem excessively problematic.
Alison McMahan points out that most scholars and scientists agree
that “total photo—and audio—realism is not necessary for a
virtual reality to create immersion”. McMahan goes on to
describe the requirements for an immersing environment: “the
user's expectation of the game or environment must match the
environments conventions fairly close”, the user's actions must be
reflected, that is, have an effect, on the game or environment. And
finally “the conventions of the world must be consistent even if
they don't match the 'metaspace”. The use of non-diegetic
sound remains consistent throughout the game, marking the
beginning and ending of both individual missions and the
game, and serving as aural 'blue arrows' [11].
2 Welcome to the Worst Place in America
Welcome to Liberty City, the “worst place in America”,
nowhere to be found but on the edge of reality. An intangible place,
untouchable, ephemeral, fictional; yet it can be experienced,
lived, suffered. Liberty City is the setting for the video game
Grand Theft Auto Liberty City Stories and, like many other
fictional places with names like Azeroth and Hyrule, it has
become part of the life of millions of people. As a medium the
video game, and its worlds, has truly become, as Marshall
McLuhan would say, an extension of man, an extension of man's
world. The following pages are but a little inquiry into the role
that sound can play in creating the virtual playgrounds of the
video game.
3.2 Mixing diegetic and non-diegetic sound.
The simultaneous use of diegetic and non-diegetic sound points
to a complex system of symbols and a complex interaction
between the user of the game and the game. According to Troy
Innocent, the electronic space of the game allows “multiple
forms of representation to easily coexist” [6]. The player of GTA
LCS has to be able to navigate between different uses of sound.
The non-diegetic sounds are separated from what Aarseth calls
the “Game-world”, one of the three elements of games that
Aarseth describes [1]. The player may assign different meanings
to diegetic and non-diegetic sound, and use the information
derived from such sounds appropriately.
2.1 Half-Real
In his latest book, Half-Real, Jesper Juul, researcher at the IT
University of Copenhagen, defines games as a “combination of
rules and fiction”. If we are to take Juul’s statement as valid, then
a question arises: does sound in the video game obey the
need to create a fictional world, or the need to create and maintain rules?
Can sound do both, maintain fiction and rules, at the same time
or are they mutually exclusive? These are the questions at the
core of the present paper.[7]
3 Non-Diegetic Sound
The graphic interface video games use is in more than one case
composed of elements that belong to the fictional world of the
game and other elements such as the HUD (heads-up
display) or maps. The same applies to sound. Diegetic sound
In the opening scene of Grand Theft Auto Liberty City Stories
the main character, Tony Cipriani, arrives in Liberty City. This
scene is fashioned as a movie scene; its use of background
music (non-diegetic sound), a variety of camera angles, and
and non-diegetic sound coexist in the same game. The
examples of GTA III and the latest version of Zelda for the
GameCube are used by Juul as representatives of the use of
blue arrows. Some games, such as Shadow of the Colossus, have
tried to shed these blue arrows. One can only suggest that such
attempts to relinquish the use of on-screen content that does not
belong to the story space are made to create a more immersing
world. In the most immersing environments reminders of the
structural level of the game are gone and the player can
concentrate on the game-world level.
understanding the rules” (Juul 2005). The objective of the game
is to complete missions that involve violence and crime,
simultaneously the fictional world of Liberty City—the setting
of the game—invites violence. Rules indicate commit crimes,
and the fictional world indicates negotiation, interaction and
compassion are not a part of this world.
One can find examples of the coherence between the fictional
world and the rules of the game. The video game rates the
criminal ranking of the player; more crime and mayhem means a higher
criminal ranking. The killing of characters is therefore viewed as
a positive accomplishment in the game. On the other hand the
characters of Liberty City are for the most part unable to help the
player successfully complete his missions. The characters that
roam the streets of Liberty City provide no relevant information
or items that may help the character. The characters are however
useful when dead, since they can drop weapons or cash—both useful
tools.
It would be the job of researchers to investigate whether techniques
such as the non-use of HUDs and blue arrows affect immersion.
In the case of GTA LCS the user can deactivate the HUD and
on-screen radar; however, he cannot deactivate the blips and
most blue arrows. The lack of such a feature is important
because it denies the player the option of selecting the way he or
she prefers to engage, and experience, the game. Making
available those options could possibly create a variety of ways
to experience the game. That significant possibility is an area
open to further research.
Since the majority of the sounds in the game GTA LCS are
diegetic, and diegetic sound motivated the research, it is only
appropriate to proceed to analyze it.
The inability to communicate is manifested in several ways, one
of them aurally. The player soon notices, as both Leonard and
Frasca did, that the utterances of the players are irrelevant and
insignificant. This information is conveyed only aurally because,
although there is an option for subtitles, this option does not work for
most characters. The cinematic scenes, the cut scenes, do get
subtitles, but the regular citizens that do not partake in the main
story line can only be heard, not read.
4.1 A cold World
David Leonard published a strong critique of Grand Theft Auto
III. His article aimed to demonstrate the racial, social and class
undertones behind GTA III. Leonard’s article is a searing
criticism of what he identifies as white supremacy values being
portrayed by the game. While the present study does not aim to
give a moral evaluation of Grand Theft Auto, and while
Leonard's criticism is of a different title to the one being used
here, Leonard’s text touches on interesting aspects of the use of
sound. He mentions that your enemies “have no voice or face”.
He goes on to explain that the only rule that seems to dominate
the main character's world is “kill or be killed” [10].
The installment of Zelda for the SNES console contrasts with
GTA LCS in various ways. First, the main character in Zelda
can communicate with many of the characters in the game; they
provide information and items that are helpful. In such a case the
indiscriminate murder of characters becomes
counterproductive—not to mention that it is not allowed.
However, in Zelda information is conveyed in the form of text.
Aural feedback does not define the world the same way that it
does in GTA. Newer games may use the same strategy as Zelda in that
they make their characters capable of communicating and becoming
useful. However, these newer games can rely on sound to allow
their characters to communicate.
A different approach to Grand Theft Auto III was published by
Gonzalo Frasca. He also refers to the virtual inhabitants of
GTA. Frasca describes them as “nothing short autistic” (Frasca
2003). The characters, inhabitants, of GTA III “remind the
gamers that they are dealing with a bot” (Frasca 2003). The
characters are dehumanized and objectified (Frasca 2003)—the
criticism of Leonard. The similarities between Frasca's
approach and Leonard's end there.
Another example is the infamous prostitute trick. The main
character, when riding a nice car, can drive close to a prostitute
and slow down, the prostitute character then boards the car, and
when the car is driven to an alley the main character will see his
health indicator rise while at the same time his money counter
lowers. The indication that a sexual act may be taking place is
also conveyed through sound in the form of phrases and car sounds,
since the car may be out of sight. In this instance it is also
impossible to negotiate or communicate; the only phrases are
sexual in nature, or the offering of sexual services. The audio
reflects the fact that the prostitute's usefulness is limited to raising
the health meter of the main character. The fictional world
supports the fact that the main character may profit more from
using and then shooting the prostitute—thus recovering the
money. Such actions are reprehensible in the real world but
useful in Liberty City. It is then again reflected in sound that the
prostitute character serves a limited function in the game, and is
more of an object, like every other inhabitant of Liberty City.
4 Diegetic Sound
Leonard's findings lead him to criticize the game as nothing
short of racist. Frasca, on the other hand, realizes that the
dehumanization of the characters in the game and the social
isolation of the main character—the playable character—allow
the player to concentrate on his own actions [3].
The same analysis that Frasca does of GTA III applies to GTA
LCS. The characters have not evolved a great deal since the
days of GTA III. The non-playable characters that populate the
streets of Liberty City limit themselves to taunts, insults or
screams, that is, if they decide to talk at all. Even the elderly, whom
Leonard described in GTA III as among the few innocent ones, are
mean or cowardly; they will physically fight the main character
or run away.
The attitudes of the inhabitants of GTA LCS as manifested
through their utterances help define the kind of world that
Liberty City is. Their short sentences and inability to convey
important information to the player condemn them. Whether it is
the old man, the gangster or the prostitute, they become, as was
indicated in Leonard's criticism and Frasca's essay, objects, bots.
One may suppose that communication is possible since on
In Half-Real Jesper Juul explains that the fiction aspect of video
games plays a very important part in cuing the player “into
occasion groups of individuals are seen standing together in what
may resemble social interaction, but the illusion soon collapses;
the groups are mostly gangs and no significant or important
conversations take place. The video game has cued the player
and indicated to him that killing innocent people is not bad in
the game. The player is now free, as Frasca indicated, to focus on
his actions. The player is free of moral dilemmas since he was
presented with a framework—the fictional world—that rewards
violence, and this framework is presented partially through sound;
it is a convergence of the fictional world and the rules—
including the objectives of the game.
“minimal use of diegetic sound”. [10]
The use of sound in GTA LCS contrasts heavily with JSRF. The
player in GTA LCS is connected to the fictional world through
the radio. The radio, unlike the headphones in JSRF, does not
mute the sounds of the city; instead it connects the player to the
city, since the consequences of the actions of the player become a
news broadcast on the radio of Liberty City.
The radio also reveals key aspects of the story line of the game.
It for example announces the escalating war between gangs.
Such use of sound is ingenious. The announcements on the
radio serve multiple purposes; on the one hand they take the place of
the infamous cut-scene. The cut-scene is described by Juul as a
“non-interactive sequence of a game”. Cut-scenes are
problematic in several ways. As Juul explains, they can be a
“non-game element”. Cut-scenes deny the player the
possibility of interacting. Cut-scenes also create a different
representation of time. The in-game radio announcements do not
disrupt the representation of time during the game as cut scenes
do. One may consider this a clever use of in-game objects to
inform the player without disrupting game play. [8]
In recounting the history of video games, technology journalist J.
C. Herz describes Doom as “deliciously clear-cut” (Herz 1997).
The lack of moral ambiguities, the impossibility of humanizing
demons, grants the player the freedom to blast through mazes [5].
One could say that the more complex world of Grand Theft Auto
has to work overtime in dehumanizing characters so that the
player can blast through the streets of Liberty City free of
ambiguities.
4.2 On the Radio
Another interesting aspect found in the game is the use of radio
stations. Each time a player carjacks a vehicle he is allowed to
tune in to different ‘radio stations’. These stations are
prerecorded songs and programs that include, among others, a
talk show that is sort of a mix between Fox News pundits, Jerry
Springer, and The 700 Club. The show features a character
named Nurse Bob and his show named Heartland Values. The
stations can also be turned off if the player desires. The content
of this show and the other stations helps define the game world in
a completely aural way. The DJs are reactionary and often are a
parody of homosexuals, liberals, conservatives, media, and
pretty much everything else. The radio shows help define the
violent sexist world of Liberty City Stories. They present the
player with a larger view of Liberty City. It is not only that the
people one finds on the street are worthless, but the whole
society of Liberty City is presented through its media—the
radio—as reactionary, intolerant, corrupt and perverted, not
open to dialog. The radio stations further reinforce the message
that not only the people but the whole society of Liberty City is
despicable. The player is in a way relieved of any moral conflict.
There are examples of in-game objects being used to reveal
important information or parts of the story. In Half-Real Juul
mentions Myst as an example where information is found in
books, in the game world. However, in the case of Myst the
information is relayed visually, while in GTA LCS it is relayed
aurally, with the option of subtitles.
The aural world of Liberty City surrounds the player. Police
sirens, automobiles and rain sounds flood the user. The game
sacrifices the use of mood-creating music and instead it gives
the player the noise of the city. And in doing so the game creates
an environment that surrounds the player; while the player does not
have a 360° view of the game world, he can hear all around him.
The player can turn the character around and observe what is
behind it, the player however cannot see both back and front
simultaneously, he can on the other hand hear what is
supposedly happening behind or around the avatar. Through that
use of sound the illusion that a world extends beyond the screen
is reinforced.
Before advancing, it is important to note that in the previous
observations no moral judgment is given as to Grand Theft Auto
as a game. And this is partially because the paper analyzes how
information is used to construct a fictional world. Whether the information
and communications engaged in the game are then used outside
the realm of fantasy is of no concern to this paper. That is not to
say that such issues are of no concern to the researcher.
However those issues should be approached in future research
as well as by other academics.
The ability of the player to hear the world around his character
is an ability that proves to be useful during playtime. The main
character is a criminal with very few friends, he may be even
hunted by members of his own gang. The ability to hear police
cars when the character is being sought by the police or to hear
gunshots from behind when visiting a rival gang's territory can be a
lifesaver. This ability is not always useful since the player may
be in a situation in which the game cannot be heard or the aural
conditions are less than ideal. In this case the experience of the
game changes, and such change can be further explored by other
researchers.
There is also an interesting comparison of GTA LCS with a
different game, in this case Jet Set Radio Future. The focus of
the comparison is how sound sets the player in relationship to
the fictional world.
The first installment of Grand Theft Auto provided a bird's-eye
view where the player could observe both front and back
simultaneously; in this case sound can be used, for example,
to reveal that police cars are in the area although they are not yet visible.
However the information that the character is being attacked
from behind is better conveyed visually.
The video game Jet Set Radio Future was published for the
Dreamcast console. In JSRF players engage in skateboarding
battles; during the game the main characters wear headphones.
Through those headphones the players mute the sounds of the city
and of its surroundings. Nicholls believes that “[t]he use of head
sets in the game sets the player apart from the city that is
transformed in the course of play” and furthermore this creates a
situation in which “[t]he player is situated in a space that refuses
the sociability of urban capital”, all this accomplished through the
5 Conclusions
The previous pages constitute a collection of examples of uses
of sound in Grand Theft Auto. The present study however
covers the use of sound in a very superficial manner. There is
to Ludology, The Video Game Theory Reader. Ed. By Mark J.
P. Wolf et al., New York, Routledge, 221-235 (2003)
[5] Herz, J.C., Joystick Nation, New York, Little, Brown (1997)
[6] Innocent, Troy, Exploring the Nature of Electronic Space
Through Semiotic Morphism Melbourne Digital Arts and
Culture Conference, RMIT University,
(2003)
<http://hypertext.rmit.edu.au/dac/papers/>
[7] Juul, Jesper, Games Telling Stories - A Brief Note on Games
and Narratives, Game Studies: The International Journal of
Computer Game Research, v.1 n.1 (2001)
<http://gamestudies.org/0101/juul-gts/>
still a lot of work to be done in order to better analyze both
sound and Grand Theft Auto. There are several issues that could
not be covered by the present study. Future researchers will have
to analyze the actual level of impact that particular use of sound
has on play and the player. For instance it was proposed that
sound helps define the world of GTA as violent and isolated,
and such presentation of the world allowed the player to engage
more easily on the criminal enterprise. However it was not
clearly measured how relevant audio is in the creation of such a
fictional world as compared to visuals, manuals, previous
knowledge of GTA, advertisement or social interaction. It is
possible that sound weighs heavily in allowing the player to
commit virtual crimes; it is, however, also possible that players
rely on previous knowledge of GTA to be able to easily engage
in simulated criminal activity. Further research is needed to
understand what affects the understanding of a virtual world and
to what degree.
[8] Juul, Jesper, Half-Real, Cambridge, MIT
(2005)
[9] Leonard, David, Live in Your World, Play in Ours: Race,
Video Games, and Consuming the Other, v.3 (2003)
[10] Nicholls, Brett, Ryan, Simon, Game, Space and the Politics
of Cyberplay Melbourne DAC , RMIT University (2003).
[11] Wolf, Mark J. P., and Bernard Perron, eds. The Video Game
Theory Reader. New York, Routledge (2003)
The ability of a player to move between the rules of the virtual
world, the fictional world, and the real world should also be
investigated more deeply, since such research may indeed
provide both researchers and academics with information as to
what cues determine the behavior of a person. Such research
may help create deeper and more engaging video games while at
the same time helping us understand connections between media and
violence.
Future research in the field of sound, music and video
games can also focus on financial issues and their repercussions
on the production of video games. Both Grand Theft Auto San
Andreas and Grand Theft Auto Vice City contained popular
music from groups of the time period they depict. The sequel,
Grand Theft Auto Liberty City Stories, did not contain a heavy
roster of popular music. The price of licensing music makes it
prohibitive. How are smaller developers coping with the costs of
producing sound and music? Furthermore previous video games
on the GTA series allowed the player to place MP3 music files
in a folder and play them when riding a car in the game. The
newer Liberty City Stories did not allow such a thing. A later
software release allowed players to place their own tracks, but
not in the MP3 format and only if the player had a physical CD,
not digital music files like those bought on iTunes. Such an attitude
makes one wonder if current attitudes toward Digital Rights
Management and Copyright from the content industry are
influencing what game developers can build.
The video game has to be observed as a form that constantly
changes and evolves, such changes have to be measured and
understood, to discover more about both human communication
and the future of the video game.
The researchers have only dipped their toes into the vast pool
that is the world of the video game. There is a wide open field of
study, of which sound is only an part, a minimally studied part.
Further research is needed in every single aspect.
References
[1] Aarseth, Espen, Playing Research: Methodological
Approches to Game Analysis, Melbourne Digital Arts and
Culture
Conference,
RMIT
University
(2003)
<http://hypertext.rmit.edu.au/dac/papers/>
[2] Aarseth, Espen, Cybertext, Baltimore, John Hopkins (1997).
[3] Frasca, Gonzalo, Sim Sin City: Some Thoughts About Grand
Theft Auto 3, Game Studies: The International Journal of
Computer Game Research, v.3 n.2 (2003)
[4] Frasca, Gonzalo, Simulation versus Narrative: Introduction
25
Physically based sonic interaction synthesis for computer games
Rolf Nordahl, Stefania Serafin, Niels Böttcher and Steven Gelineck
Medialogy, Aalborg University Copenhagen
Lautrupvang 15
2750 Ballerup, DK
rn, sts, nboe05, [email protected]
Abstract. In this paper we describe a platform in which sounds synthesized in real-time by using physical models are integrated
in a multimodal environment. We focus in particular on sound effects created by actions of the player in the environment such
as walking on different surfaces and hitting different objects. The sound effects are implemented as extensions to the real-time
sound synthesis engine Max/MSP.1 An 8-channel soundscape is spatialized using the vector based amplitude panning (VBAP)
algorithm developed by Ville Pulkki [17]. The sonic environment is connected through TCP/IP to Virtools.2
1 Introduction
In computer games and virtual environments, pre-recorded
samples are commonly used to simulate sounds produced by
the physical interactions of objects in the environment, as well
as sounds produced when a user acts in the scenario by, for
example, walking on different surfaces and hitting different
materials. This approach has several disadvantages: first of
all the sound designer needs to gather a lot of sonic material
corresponding to the different actions and events in the environment. This is usually done by using sound effects libraries
or recording sound effects, in the same way as it is done by
a Foley artist in the movie industry [13]. Moreover, sampled
sounds are repetitive, and do not capture the subtle nuances
and variations which occur when objects interact with different forces, velocities, at different locations, and so on. This
is usually overcome by applying processing to the recorded
sounds, so some random variations are present.
However, by using sound synthesis by physical models these
disadvantages can be overcome. Physical models are widely
developed in the computer music community [19], where their
main use has been the faithful simulation of existing musical instruments. One of the pioneers in the field of parametric sound effects for interactive applications such as computer
games and virtual reality is Perry Cook. In his book [6], Cook
describes several algorithms which allow the creation of synthesized musical instruments and sounding objects, mostly using physical principles. The issue of creating sound effects using synthetic models in order to synchronize soundtracks and animation was first explored in [20, 10] using a structure called
Timbre Tree. Recently, synthetic sound models in computer
animation have seen an increase of interest. Van den Doel
et al. [12] propose modal synthesis [1] as an efficient yet
accurate framework for the sonic simulation of interactions
between different kinds of objects. The same synthesis technique has also been used by O’Brien et al. [16], as a computationally efficient alternative to the finite element based simulations proposed in [15]. Complex dynamical systems have also
been simulated both sonically and visually by decomposing
them into a multitude of interacting particles [3], in a system
called CORDIS-ANIMA. In it, discrete mass-spring-damper
systems interact with nonlinearities representing the input excitations.
In this paper, we describe a framework for real-time sound
synthesis by physical models of different interactions in a computer game. We focus in particular on impact and friction
sounds produced when a player interacts with objects in the
environment. While the scenario’s soundscape and the ambient sounds are created by using sampled sounds, the focus
of this paper is on sounds produced by actions of the player.
Examples are the sounds produced when the player hits hard
objects or scrapes against surfaces of different materials with
different forces and velocities. Such sounds are well suited
to be simulated using physical models, especially given the
fact that nowadays most game engines have physically based
graphics engine in which forces and velocities of impacts and
friction are calculated. Such physical parameters can be used
as input parameters to the sound synthesis engine.
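To make this mapping concrete, the following minimal Python sketch (our own illustration, not the synthesis engine described in this paper) excites a small bank of damped sinusoids and scales the result by an impact velocity of the kind a physics engine reports; the modal frequencies, decay times and gains are purely illustrative values.

```python
# Minimal modal-impact sketch: an impact excites a bank of exponentially
# decaying sinusoids; the impact velocity reported by the physics engine
# scales the excitation. All modal data here are illustrative only.
import numpy as np

def modal_impact(impact_velocity, freqs_hz, decays_s, gains, sr=44100, dur=1.0):
    t = np.arange(int(sr * dur)) / sr
    out = np.zeros_like(t)
    for f, tau, g in zip(freqs_hz, decays_s, gains):
        out += g * np.exp(-t / tau) * np.sin(2.0 * np.pi * f * t)
    return impact_velocity * out  # harder hits excite the modes more strongly

# Example: a "wood-like" object struck at moderate velocity.
signal = modal_impact(0.7, freqs_hz=[180.0, 410.0, 950.0],
                      decays_s=[0.30, 0.15, 0.07], gains=[1.0, 0.5, 0.25])
```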
We are particularly interested in creating physically based sound
models that are rich enough to convey information about a
specific environment yet efficient to run in real-time and respond continuously to user or system control signals. In [9, 8],
Gaver proposes a map of everyday sound producing events.
Examples of basic level events might include hitting a solid,
scraping it, explosions, and dripping noises. More complex
events, then, can be understood in terms of combinations of
basic-level ones, combinations which are structured in ways
which add information to their simpler constituents.
Different platforms which provide sound synthesis by
physical models are already available in the computer music
community, although they have not yet been exploited in computer games. As an example, the Synthesis Toolkit (STK)
by Perry Cook and Gary Scavone [5] is a collection of C++
classes which implement physical models of different musical
instruments, mostly using the digital waveguides technique
[19].
Another example is JASS (Java Audio Synthesis System) by
Kees van den Doel [12], a unit generator synthesis program
written in JAVA, which implements physical models of different sound effects based mostly on modal synthesis [1].
The current development of novel interfaces for games, such
as the Nintendo Wii,3 stimulates the implementation of a tighter
connection between gestures of the user and corresponding
sounds produced [2]. This connection is strongly exploited
in the computer music community, where so-called new interfaces for musical expression are developed to control several
sound synthesis algorithms,4 but it is not yet fully exploited in computer games and virtual reality applications. We believe that a stronger connection between the player's gestures and the resulting sonic environment can be obtained by using sound
synthesis by physical models.
The paper is organized as follows. Section 2 introduces a multimodal architecture in which sound synthesis by physical models has been integrated; Section 3 describes our strategies to track positions and actions of the user; Section 4 describes how the interactive sounds and the soundscape have been implemented; Section 5 introduces the visualization technique adopted, while Sections 6 and 7 present an application and the conclusions with future perspectives, respectively.
2 A multimodal architecture
Figure 1 shows a multimodal architecture in which sound synthesis by physical models has been integrated. The goal of this
platform is to be able to precisely track positions and actions
of the user, and map them to meaningful visual and auditory
feedback. The position of the user is tracked by using a 3D
magnetic tracker produced by Polhemus.5 Moreover, a pair of
sandals equipped with force sensitive resistors (FSRs) makes it possible to detect when a user performs a step in the environment, together with the force of the impact. Such input parameters are
mapped to the footsteps sounds which are synthesized using
physical models. The Polhemus tracker is connected to the PC
computer running Virtools, i.e., the visual rendering and game
engine, while the footsteps controller is connected to the PC
computer running Max/MSP. The two computers communicate together through TCP/IP. Finally, the synthesized interactive sounds, together with the ambient sounds are spatialised
to an 8-channel surround sound system.
In the following, each component of the environment is described in more detail. We start by describing the tracking systems used, since they provide the input to the interactive sound design.
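The paper does not specify the message format used on this TCP/IP link; purely as an illustration of the kind of exchange involved, the sketch below sends tracker frames as newline-terminated text over a TCP socket (host, port and field layout are assumptions, not the actual protocol).

```python
# Hypothetical illustration only: stream tracker data (position and
# orientation) from the tracking/game machine to the audio machine as
# newline-terminated text over TCP. Address and format are assumptions.
import socket

def send_tracker_frame(sock, x, y, z, azimuth, elevation, roll):
    msg = f"tracker {x:.3f} {y:.3f} {z:.3f} {azimuth:.1f} {elevation:.1f} {roll:.1f}\n"
    sock.sendall(msg.encode("ascii"))

if __name__ == "__main__":
    # Assumed address of the PC running the sound synthesis engine.
    with socket.create_connection(("192.168.0.2", 9000)) as sock:
        send_tracker_frame(sock, 0.1, 1.6, 0.0, 45.0, 0.0, 0.0)
```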
3 Tracking the user
As mentioned above, the position and motion of the user are
tracked in real-time using a Polhemus Fastrack tracker and an
ad-hoc designed footsteps’ controller.
Figure 1: Connection of the different hardware and software components in the multimodal architecture. Two computers providing the
visual and auditory rendering respectively communicate in real-time
the tracker’s data and the sound synthesis engine status.
3.1 The magnetic tracker
The Fastrack computes the position and orientation of a small receiver placed on top of a hat worn by the user, as shown in Figure 4. This device provides six degrees of freedom measurement of position (X, Y, and Z Cartesian coordinates) and orientation (azimuth, elevation, and roll), which are mapped to the sound engine as described later. Given the limited range of the tracker of about 1.5 meters, the receiver was placed in the center of the 8-channel configuration.
3 wii.nintendo.com/
4 More information on this issue can be found in the proceedings of the New Interfaces for Musical Expression (NIME) conference, www.nime.org
5 www.polhemus.com
3.2 The footsteps' controller
The users visiting the environment are asked to wear a pair of sandals embedded with pressure-sensitive sensors, placed one in each heel, as shown in Figure 2. The sandals are wirelessly connected to a receiver, which communicates with the Max/MSP platform by using an ad-hoc designed interface [7].
Although sensing only the pressure of the impact on the floor
does not allow tracking of all the parameters of a person walking in the environment, and more sophisticated footsteps' controllers have been built (see, for example, [11]), experiments
with our configuration show that motion of subjects and sense
of presence are significantly enhanced when self-sounds are
added and controlled by this device [14].
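As an illustration of how such a controller can be read (an assumption on our part, not the interface described in [7]), the sketch below converts a stream of heel-pressure readings into discrete step events, each carrying the peak pressure that could drive the amplitude of a synthesized footstep.

```python
# Assumed FSR handling: a step begins when the heel pressure rises above a
# threshold, and the peak pressure is reported when the heel lifts again.
def detect_steps(pressure_samples, threshold=0.2):
    steps = []                      # list of (onset_index, peak_pressure)
    in_contact, onset, peak = False, 0, 0.0
    for i, p in enumerate(pressure_samples):
        if not in_contact and p > threshold:
            in_contact, onset, peak = True, i, p
        elif in_contact:
            peak = max(peak, p)
            if p < threshold:       # heel lifted: one footstep completed
                steps.append((onset, peak))
                in_contact = False
    return steps

# Example: two steps of different force in a toy pressure trace.
print(detect_steps([0.0, 0.5, 0.8, 0.1, 0.0, 0.3, 0.4, 0.05]))
```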
Figure 4: The Polhemus magnetic sensor is placed on the user’s
head, so auditory and visual feedback can be rendered according to
the position and orientation of the user.
Figure 2: The interactive sandals are equipped with pressure sensors
which trigger footstep sounds and forward movement in the virtual
world.
Figure 3: The setup of the 8-speaker system. The magnetic tracker emitter is situated in the center, directly above the user.
4 Sound design
Non-speech sounds in computer games can be divided into soundscape or environmental sounds and sound effects. Soundscapes and environmental sounds are the typical sonic landmarks of an environment. They are usually reproduced by recording and manipulating existing sounds, and do not strongly depend on the actions of the users. On the other hand, sound effects are usually produced by actions of the user in the environment, or by interactions between objects, and they can strongly depend on events in the environment. Such sounds are highly dynamic and vary drastically depending on the interactions and objects, and are therefore difficult to create in a pre-production process.
We decided to use sound synthesis by physical models for the creation of sound effects, and pre-recorded samples for the creation of the soundscape.
4.1 Interactive footsteps
Footsteps recorded on seven different surfaces were obtained from the Hollywood Edge Sound Effects library.6 The surfaces used were metal, wood, grass, bricks, tiles, gravel and snow. Such surfaces were resynthesized using modal synthesis [1] and physically informed sonic models (PhISM) [6, 4]. The footsteps' synthesizer was implemented as an external object in the Max/MSP platform.
6 www.hollywoodedge.com
The control parameters of the synthetic footsteps were the fundamental frequency of each step and the amplitude and duration of each step. The amplitude and duration of each step were directly controlled by the users thanks to the pressure-sensitive shoes; the sensors controlled the frequency of the steps, as well as their duration and amplitude. To enhance variations among different steps, the fundamental frequency of each step was varied randomly. The different surfaces varied according to the different scenarios of the game in which the user was present. As an example, when the user was navigating around a garden, the grass surface was synthesized, which instantly became a wood sound when the user walked onto a hardwood floor.
4.2 3D sound
The pre-designed soundscape, which implemented the ambient sounds, was spatialized to an 8-channel system using the vector base amplitude panning (VBAP) technique. VBAP is a method for positioning virtual sources to multiple loudspeakers developed by Ville Pulkki [17]. The number of loudspeakers can vary, and they can be placed in an arbitrary 2D or 3D positioning. In our situation, we chose a 3D configuration with 8 loudspeakers positioned at the vertices of a cube, as shown in Figure 3. This preserves the same configuration as in CAVE systems.
The goal of VBAP is to produce virtual sources which are positioned at a specific elevation and azimuth specified by the user. The idea behind VBAP is to extend the traditional panning techniques for two loudspeakers to a configuration of multiple speakers. We used the VBAP algorithm to position the ambient sound in a 3D space. Such sounds are pre-recorded samples which are positioned in a 3D space in real-time using the Max/MSP implementation of the VBAP algorithm. The algorithm also allows the simulation of realistic moving sound sources, by continuously varying the elevation and azimuth of the different input sounds.
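A simplified sketch of the underlying idea (not Pulkki's Max/MSP implementation) computes the gains of one loudspeaker triplet by expressing the virtual source direction as a weighted sum of the three loudspeaker direction vectors and normalizing the weights for constant power:

```python
# Simplified 3D VBAP for a single loudspeaker triplet. Speaker and source
# directions are unit vectors; gains are normalized to constant power.
import numpy as np

def vbap_triplet_gains(source_dir, speaker_dirs):
    # Solve speaker_dirs.T @ g = source_dir, i.e. write the source direction
    # as a weighted sum of the three speaker direction vectors.
    g = np.linalg.solve(speaker_dirs.T, source_dir)
    g = np.clip(g, 0.0, None)        # negative weights: source lies outside the triplet
    return g / np.linalg.norm(g)

# Example: three speakers along the coordinate axes, source between them.
speakers = np.eye(3)
source = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
print(vbap_triplet_gains(source, speakers))  # equal gains for a central source
```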
5 Visual feedback
The visual feedback was delivered using a 2.5x2.5x2.5 m. single screen. 3D visualization was delivered using anaglyph and
implemented in the Virtools platform. Virtools is a powerful
game engine, which provides the possibility of having both
block based programming in a similar way as Max/MSP, as
well as the implementation of one's own blocks in C++. The 3D stereo was rendered using two Nvidia GeForce graphics cards.7 A connection between Max/MSP and Virtools was obtained
by using the flashserver object in Max/MSP8 and the NSClient
BB developed in Virtools.9
6 Application: a hide and seek game
In order to test the capabilities of the platform, a hide and seek game was developed. In this multi-user game the players have to find each other or escape from each other in a virtual environment. In the implemented example, the scenario is a small town. The idea behind the game is the connection between two VR CAVEs, with a user in each of them. The users are equipped with headset microphones, so they can communicate during the game. The sound of the other person is then panned to the exact position of that user in the game. By using auditory cues from the interactive sandals, one user can also derive the location and position of the other person. The users are represented by avatars. They are only able to see the other user's avatar, but not their own. Two persons outside the environment are connected to the game via a LAN network. They can communicate with the users inside the game, and their goal is to transmit information about the location of the opponent. The external users are also able to upload 3D objects or sounds into the game in real-time. In this way, they are able to disturb the opponent user and enhance the atmosphere by varying the current soundscape.
Figure 6: Setup of the game with four users.
7 Conclusion
In this paper we have described a multimodal architecture
where interactive sounds synthesized by physical models as
well as ambient soundscapes have been integrated.
As done in [12] and [18], our current focus is on impact and friction sounds produced by actions of the user while interacting with the environment. In particular, we have focused our description on the use of footstep sounds, since they play an important role in game design. We are currently extending this architecture to action sounds produced by the interaction of other body parts of the user, such as the sounds produced when the user hits, grabs or touches objects in the environment. As mentioned in the introduction, computer games currently released on the market use sampled sounds instead of computer generated sounds. The main reasons for this choice are, on one side, the high computational cost of producing high fidelity sound synthesis by physical models, and on the other side the lack of sound quality of most synthesized sounds. Even in the field of musical instruments, which have been synthesized by using physical models for more than three decades, the quality of physical models is not yet as high as that of the original instruments they are trying to simulate. Of course much progress has been made in this area, but we are not yet at a point where physical models can be used in commercial applications.
We are currently conducting experiments to understand if the
use of physically based sounds enhances realism and quality
of the interaction in a game.
References
[1] J.M. Adrien. The missing link: Modal synthesis. In G. De Poli, A. Piccialli, and C. Roads, eds., Representations of Musical Signals. MIT Press, 1991.
[2] T. Blaine. The convergence of alternate controllers and
musical interfaces in interactive entertainment. In Proc.
International Conference on New Interfaces for Musical
Expression (NIME05), 2005.
[3] C. Cadoz, A. Luciani, and J.-L. Florens. Physical models for music and animated image. The use of CORDISANIMA in Esquisses: a Music film by Acroe. In Proc.
Int. Computer Music Conf., Aarhus, Denmark, Sept.
1994.
Figure 5: The view of one user playing the hide and seek game.
7 www.nvidia.com
8 The flashserver object for Max/MSP was developed by Olaf Matthes.
9 The NSClient BB was developed by Smilen Dimitrov at Aalborg University in Copenhagen.
[4] P. Cook. Physically informed sonic models (phism):
Synthesis of percussive sounds. Computer Music Journal, 21(3):38–49, 1997.
[5] Perry R. Cook. Synthesis Toolkit in C++, version 1.0. In SIGGRAPH Proceedings. Assoc. Comp. Mach., May 1996.
[6] P.R. Cook. Real Sound Synthesis for Interactive Applications. AK Peters, Ltd. Natick, MA, USA, 2002.
[7] S. Dimitrov and S. Serafin. A simple practical approach
to a wireless data acquisition board. In Proc. NIME,
2005.
[8] W. Gaver. How do we hear in the world?: Explorations in ecological acoustics. Ecological psychology,
5(4):285–313, 1993.
[9] W. Gaver. What in the world do we hear?: An ecological approach to auditory event perception. Ecological
psychology, 5(1):1–29, 1993.
[10] J. Hahn, J. Geigel, J. Lee, L. Gritz, T. Takala, and
S. Mishra. An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation,
6(2):109–129, 1995.
[11] J. Paradiso, K. Hsiao, A. Benbasat, and Z. Teegarden. Design and implementation of expressive footwear. IBM Systems Journal, 39(3):511–529, 2000.
[12] K. van den Doel, P. Kry, and D. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In Proc. ACM SIGGRAPH, 2001.
[13] R.L. Mott. Sound Effects: Radio, TV, and Film. Focal
Press, 1990.
[14] R. Nordahl. Design and evaluation of a multimodal footsteps controller with vr applications. In Proc. Enactive,
2005.
[15] J. O’Brien, P. R. Cook, and G. Essl. Synthesizing Sounds
from Physically Based Motion. In Proc. Siggraph, Computer Graphics Proceedings, pages 529–536, 2001.
[16] J. O’Brien, C. Shen, and C. Gatchalian. Synthesizing Sounds from Rigid-Body Simulations. In Proc.
Siggraph, Computer Graphics Proceedings, pages 175–
203, 2002.
[17] V. Pulkki. Generic panning tools for max/msp. In Proc.
ICMC, 2000.
[18] D. Rocchesso. Physically-based sounding objects, as we
develop them today. Journal of New Music Research,
33(3):305–313, 2004.
[19] J.O. Smith. Physical modeling using digital waveguides.
Computer Music Journal, 16(4):74–91, 1992.
[20] T. Takala and J. Hahn. Sound Rendering. In Proc. Siggraph, pages 211–220, 1993.
The Composition-Instrument: musical emergence and interaction
Norbert Herber
Indiana University Bloomington Department of Telecommunications
Radio-TV Center 1229 E. 7th St. Bloomington, IN 47405 USA
[email protected]
Abstract. As a musician and sound artist, I have always understood the process of composition as the conception and organization of
musical ideas, and an instrument as something that provides the necessary apparatus to realize such a work. However, my recent work
with computer games and digital media has led me to become increasingly curious to blur the lines between these terms and consider a
coalescence of “composition” and “instrument.” In digital games and other environments of telematic interaction, a composed musical
work can both stand-alone and provide a point of individual musical departure. Heard on its own the piece creates an experience of
sound. But when altered by one or several users in the course of an interaction, it serves as an agent for further musical expression,
exploration, and improvisation. The composition-instrument is a work that can play and be played simultaneously. This paper, building
on a research project conducted in the summer of 2006, examines the synergies found in the experimental music of Earle Brown and
Terry Riley, Free Improvisation, the game pieces of John Zorn, generative music, the interactive works of Toshio Iwai, contemporary
music practice based on file sharing, electronic instrument construction, and computer game design. Across these disparate genres
there is a confluence of technical and aesthetic sensibilities—a point at which the idea of a “composition-instrument” can be explored.
Examples and previous research by the author are used to focus the discussion, including a work based on swarm intelligence and
telematic interaction.
1 Introduction
In the conventional practice of music, the process of composition can be understood as the conception and organization of musical ideas, whereas an instrument provides the equipment necessary to realize such a work. In contemporary interactive media such as multimedia web sites, computer games, and other interactive applications involving the personal computer and mobile devices, this distinction remains largely the same. The composition of the music heard in these environments consists of musical statements to be heard and instructions to be executed in the course of an interaction. Often these structures call for a great deal of random sequencing and repetition following a linear structure. [1][2] The instrument can be simulated in software and manipulated using the inputs of an interactive system. It is usually represented as a database of recordings or samples. Composition and instrument are treated as distinct in the structure underlying the media product and function in their traditionally separate roles.
This separation, while not wholly damaging to the experience of the media, should not be immune from scrutiny. Music that operates in a binary, linear mode does little to recognize the emergence, or becoming, that one experiences in the course of an interactive exchange. A traditional, narrative compositional approach leaves no room for the potential of a becoming of music. There is a need for a critique of music in contemporary interactive media. The emergent, non-linear experience of interactivity is incongruous with the overly repetitive, linear music that is often heard in this field. It is time to ask: what kinds of compositional techniques can be used to create a music that recognizes the emergence and the potential of becoming found in a digitally-based or telematic interaction with art and media?
1.1 Composition-instrument
Blurring the traditionally distinct roles of composition and instrument provides one possible answer to this question. This approach allows a piece of music to play, or undergo a performance, like a traditional composition. When it plays it allows listeners or users to have a musical experience of sound. But it can also be played like a conventional instrument. This treatment allows the musical output of the work to be modified by users in the course of an interaction. This "instrumentalization" transforms the work into an agent for further musical expression and exploration. Thus, a composition-instrument is a work that can play and be played simultaneously.
A composition-instrument is not a specific piece of music or interactive work in itself but a means of approaching any work where music can be created and transformed. Composition-instrument is a conceptual framework that helps facilitate the creation of musical systems for interactive media, art, and telematic environments. This paper will discuss the historical context of this compositional approach and show how it is beginning to emerge in the current field of interactive media. The example of an original work aspires to demonstrate how a composition-instrument approach to music exhibits a congruity with the emergent nature of the medium. And finally, discussion of a contemporary computer game project exposes the potential of this musical concept in the world of games, digital art, and telematic media.
2 History
Though the idea of a composition-instrument hybrid is situated in the praxis of computer games, telematic media and digital art, the historical precursors to this kind of compositional approach lie in an entirely different field and stem from three different musical traditions: Experimental, Improvisatory, and Generative. Each of these traditions has established aesthetic approaches, creative processes, and musical styles. A historical perspective helps to reveal how these attributes can be woven into the fabric of a compositional approach for music that operates in art and media environments with telematic and digitally based interaction.
2.1 Experimental Music
The roots of a composition-instrument approach can be found in Experimental music. American composer Earle Brown was looking for ways to open musical form and incorporate elements of improvisation into his music during the 1950s. He found a great deal of inspiration in the mobiles of sculptor Alexander Calder. Brown described them to improvising guitarist and author Derek Bailey as, "…transforming works of art, I mean they have indigenous transformational factors in their construction, and this
seemed to me to be just beautiful. As you walk into a museum
and you look at a mobile you see a configuration that’s moving
very subtly. You walk in the same building the next day and it’s a
different configuration yet it’s the same piece, the same work by
Calder.” [3]
Brown's thoughts on musical structure are also noted by Michael Nyman in "Experimental Music: Cage and Beyond." Brown emphasizes that one importance of composition is to be both a means of sonic identification and a musical point-of-departure. "There must be a fixed (even if flexible) sound-content, to establish the character of the work, in order to be called 'open' or 'available' form. We recognize people regardless of what they are doing or saying or how they are dressed if their basic identity has been established as a constant but flexible function of being alive." [4] Brown was interested in approaching music with an openness that allowed every performance to render a unique musical output that retains the essential character of the work. These compositional ideas, however, were not exclusive to Brown and his music.
Terry Riley's In C, composed in 1964, is a seminal work in both the Experimental and Minimalist music traditions that shares in the compositional approach discussed by Brown. The piece consists of 53 melodic phrases (or patterns) and can be performed by any number of players. The piece is notated, but was conceived with an improvisatory spirit that demands careful listening by all involved in the performance. Players are asked to perform each of the 53 phrases in order, but may advance at their own pace, repeating a phrase or resting between phrases as they see fit. Performers are asked to try to stay within two or three phrases of each other and should not fall too far behind or rush ahead of the rest of the group. An eighth note pulse played on the high C's of a piano or mallet instrument helps regulate the tempo, as it is essential to play each phrase in strict rhythm. [5][6]
The musical outcome of In C is a seething texture of melodic patterns in which phrases emerge, transform, and dissolve in a continuous organic process. Though the 53 patterns are prescribed, the choices made by individual musicians will inevitably vary, leading to an inimitable version of the piece every time it is performed. Riley's composition reflects the imperative of self-identification expressed by Brown, but it also illustrates some of John Cage's thoughts on Experimental music, when he writes that the "experiment" is essentially a composition "the outcome of which is unknown." [7] In performance, In C has indefinite outcomes and yet is always recognizable as In C due to the "personality" of the composition—the patterns and performance directions that comprise the work.
2.2 Free Improvisation
There are links between Experimental music practice and improvisatory music. Free Improvisation is a good example of this. The genre took root in Europe in the early 1960s, with London, England serving as a major hub in its development. [3] This genre, in spite of labels and stereotypes, still involved elements of composition. One instance of this can be found in the coalescence of performing groups. In his essay "Les Instants Composés," Dan Warburton notes that "The majority of professional improvisers are choosy about who they play with…and tend to restrict themselves to their own personal repertoire of techniques." [8] David Borgo, in a recent publication on music improvisation and complex systems [9], acknowledges that this characteristic in free improvisation praxis comprises an important aspect of the musical organization and composition in these performances.
Free improvised music depends upon some amount of organization, even if it is minimal. In musical situations where there is no preparation or discussion of musical intentions, an established rapport or relationship between performers serves as a kind of composition. This provides organization through familiarity and shared sensibilities. Borgo describes an improvising ensemble as an "open system" that emerges from bottom-up processes driven by players' relationships and interactions, their training, and environmental factors. Listening is also a huge factor because it regulates the dynamics of the performance. Players are constantly aware of their contributions as well as the contributions of others, and make split-second decisions based on the overall musical output of the group.
Composition in this genre can be more formalized as well. Saxophonist Steve Lacy talks very openly about how he uses composition as a means of mobilizing a performance and creating a musically fertile situation that can nurture an improvisational performance. He stated, "I'm attracted to improvisation because of something I value. That is a freshness, a certain quality, which can only be obtained through improvisation, something you cannot possibly get from writing. It is something to do with 'edge'. Always being on the brink of the unknown and being prepared for the leap. And when you go on out there you have all your years of preparation and all your sensibilities and your prepared means but it is a leap into the unknown. If through that leap you find something then it has a value which I don't think can be found in any other way. I place a higher value on that than on what you can prepare. But I am also hooked on what you can prepare, especially in the way that it can take you to the edge. What I write is to take you to the edge safely so that you can go on out there and find this other stuff." [3]
2.3 Game Pieces
A similar aesthetic is evident in John Zorn's compositional approach to his game pieces, which he considered as a latter-day version of Riley's In C: "… something that is fun to play, relatively easy, written on one sheet of paper. Game pieces came about through improvising with other people, seeing that things I wanted to have happen weren't happening." [10] Zorn discusses the compositional direction he followed: "The game pieces worked because I was collaborating with improvisers who had developed very personal languages, and I could harness those languages in ways that made the players feel they were creating and participating. In these pieces, they were not being told what to do. You don't tell a great improviser what to do—they're going to get bored right away." [10]
In an interview with Christopher Cox, Zorn explains his rationale behind this position. He emphasizes how the individuality of the players he selected to perform the game pieces was an essential part of the compositional process: "I wanted to find something to harness the personal languages that the improvisers had developed on their own, languages that were so idiosyncratic as to be almost un-notate-able (to write it down would be to ruin it). The answer for me was to deal with form not with content, with relationships not with sound." [11] Zorn understood the musicians in his ensemble and knew what they were and were not interested in playing. He was able to situate their personal musical vocabularies in a larger structure that allowed for freedom and individual expression while also satisfying his own musical objectives.
2.4 Generative Music
Experimental music composition, and techniques or processes
of composition found in various forms of improvised music are
similar to the work involved in modeling an emergent, self-organizing system. Generally, all involve a bottom-up structural
approach that generates emergent dynamics through a lack of
centralized control. The same can be said of generative music.
Musician, composer, and visual artist Brian Eno has been working with a variety of generative structures throughout his career.
He looks at works like In C, or anything where the composer
makes no top-down directions, as precursors to generative music. In these works detailed directions are not provided. Instead
there is “a set of conditions by which something will come into
existence.” [12]
Eno's influential Ambient recording Music for Airports was created using generative techniques [13]. Rather than deal directly with notes and form, generative composers create systems with musical potential. Eno refers to this as "…making seeds rather than forests," and "…letting the forests grow themselves," drawing on useful metaphors from arboriculture. An important aspect of this approach, however, is in setting constraints so that the generative system is able to produce what its creator (and hopefully others) will find to be interesting. In a recent conversation with Will Wright, the designer of The Sims and SimCity, Eno explains the reasoning behind this: "You have to care about your inputs and your systems a lot more since you aren't designing the whole thing (you are not specifying in detail the whole thing) you're making something that by definition is going to generate itself in a different way at different times." [13]
These techniques—experimental, improvisatory, and generative—exhibit in their emergence a becoming. With each, the simple rules or relationships that form a composition act together and lead to unexpected, unpredictable, or novel results. Musical gestures show a ripple of promise, take ephemeral form, and then dissipate. Often this process requires a great investment of attention and time on the part of the listener. Time is especially important in Generative music, where the intention is not to produce an immediate effect or shock of perception, but a gradual transformation as sounds are heard in the ebb and flow of the generative process. This quality of becoming can be similar to the emergence of a telematic environment or an experience with interactive art or media.
3 Contemporary related works
While a true blurring of composition and instrument has not been fully realized in contemporary practice, there are a number of works that show the potential embedded in this approach. All examples discussed here demonstrate the latent quality of "composition-instrument" in the current art and media landscape. All of these works share three characteristics: asynchrony, emergence, and generative-ness. Asynchrony is a key factor in the processes of interaction. An input will have an effect on the output of the system, but it may not be immediately or fully apparent at the moment of interaction. While at first this approach may seem misleading or unresponsive, it is essential in shaping the music and the listening experience it creates. Whereas an immediate response would cause users to focus on functionality and "what it (the software/music) can do," a delay—however slight—helps keep them focused on listening and allows for a more gradual and introspective process of discovery. Additionally, it retains the potential for musical surprise. Listeners know that the music is changing but they are unlikely to be able to anticipate the nature of its transformation.
Change occurs by way of interaction but also through various means of generation. All of the works discussed here contain, in some way, generative processes that affect the sound as well as the visuals and overall experience of the piece. These processes occur in a variety of ways including telematic exchange, random ordering and selection, and computer algorithms. Depending upon the nature of the work, several generative processes may be used, each in a different way, leading to a unique experience for the end-user or listener.
As discussed earlier, emergence is an important quality heard in Experimental, free-improvised, and generative music. It is also a fundamental aspect of contemporary digital art works, and can arise from a variety of sources, "ordering itself from a multiplicity of chaotic interactions." [14] The pieces discussed here are no exception. Whether through the layering of sonic and visual patterns, navigation of a data space, evolutionary algorithms, or telematic exchange, one cannot ignore the emergent properties that characterize these works.
3.1 Electroplankton
Electroplankton, created for the Nintendo DS game system by Toshio Iwai, was released in Japan in 2005, and later in Europe and North America in 2006. Iwai writes that the idea draws on his fascination with different objects across the course of his life—a microscope, a tape recorder, a synthesizer, and the Nintendo Entertainment System (NES). [15] Some consider it a game; others a musical toy. Either way, Electroplankton captivates player and audience alike with its engaging use of sound and animation controlled via the touch-sensitive screen of the Nintendo DS device. Using a stylus, players are able to draw, twirl, tap, and sweep an array of animated plankton characters on the screen. There are ten different plankton "species," each with its own sounds and sound-producing characteristics. Plankton and their behavior are linked to a pitched sound or a short recording made by the player using the device's built-in microphone. Manipulating an individual plankton (or its environment) initiates a change in the sound(s) associated with it—a different pitch, timbre, rhythm, phrase length, and so on. As multiple plankton are manipulated, a shift in the overall sonic output of the system is apparent, causing the music of Electroplankton to produce textural patterns and foreground/background modulations similar to those of In C (as described earlier).
Interactions with the plankton turn the Nintendo DS into an instrument that can be played purposely through the manipulation of the onscreen animations. Simultaneously, the software programming that links sounds to the plankton and their environment represents a musical ordering, or composition, that is implicit in Electroplankton. The coupling of these attributes perfectly illustrates how the combination or blurring of composition and instrument can lead to an interactive work with profound musical potential.
3.2 Additional Examples
The musical qualities embedded in Electroplankton provide a clear—but not a sole—example of ways in which a composition-instrument approach is latent in contemporary games and digital art works. Following are several short descriptions of additional projects that share a similar musical sensibility. To retain the focus of this paper, lengthy discussions have been avoided. However, readers are encouraged to pursue further investigation into these projects beginning with the web sites provided here.
3.2.1 Rez
Rez, designed by Tetsuya Mizuguchi for Sega Dreamcast and Sony Playstation 2, is described as a musical shooter game.
Players enter the cyber world of a sleeping computer network
to destroy viruses and awaken the system. [16] Each successful
shot leads to the performance of sounds and musical phrases that
perform/compose the soundtrack for Rez in real-time as a direct
result of the game play. Both the visual and audio experience
leads players to feel an immersive, trance-like state that makes
the game incredibly captivating. More information on Rez can be
found at www.sonicteam.com/rez. Readers may also be interested
to see other musically focused games that require physical or
“twitch” skills such as Amplitude, Band Brothers (a.k.a. Jam With
the Band or Dai Gassou! Band Brothers), Dance Dance Revolution (a.k.a. Dancing Stage), and Guitar Hero.
3.2.2 Eden
Eden, by Jon McCormack, is described as an "interactive, self-generating, artificial ecosystem." [17] In more general terms, it is a generative installation artwork of sound, light and animation, driven by Artificial Life systems and environmental sensors. [18] Eden situates visitors in a room, standing outside the virtual ecosystem that is represented by a projected, cellular lattice in the room's center. A visitor's presence in the room can impact the ecosystem favorably. Someone standing in a particular location makes the adjacent space more fertile for the creatures, or "sonic agents," that inhabit Eden. The lives of these creatures involve eating, mating, fighting, moving about the environment, and—central to the musical character of the piece—singing. One way or another, all of these activities lead to both the visual and aural aspects that comprise the work. More information about Eden and McCormack's publications can be found at www.csse.monash.edu.au/~jonmc/projects/eden/eden.html.
3.2.3 Intelligent Street
Intelligent Street was a telematic sound installation where users could compose their sound environment through SMS messages sent via mobile phone. [19] The piece was developed in 2003 by Henrik Lörstad, Mark d'Inverno, and John Eacott, with help from the Ambigence Group. Intelligent Street was situated simultaneously at the University of Westminster, London and the Interactive Institute, Piteå, Sweden via a live video connection. Users at either end of the connection were able to see and hear the results of their interactions. Using freely associated, non-musical terms such as "air" or "mellow," participants sent an SMS message to Intelligent Street, and were able to hear how their contribution impacted the overall composition. [19] Simultaneously, all received messages were superimposed over the video feed to create a graphic representation of the audible sounds at any given time. Intelligent Street showed how music could be used to set the mood of a physical space through processes of cooperation and composition across groups of people in distributed environments. [20] Further information about Intelligent Street is available at John Eacott's web site (www.informal.org), Henrik Lörstad's web site (www.lorstad.se/Lorstad/musik.html), and the Interactive Institute of Sweden (www.tii.se/sonic.backup/intelligentstreet).
3.2.4 PANSE
PANSE, or Public Access Network Sound Engine, is an open platform for the development of audio-visual netArt created by Palle Thayer. The project exists online as a streaming audio application, and consists of a synthesizer, two step sequencers, and an effects generator. [21] PANSE creates an opportunity for artists and musicians to create interfaces that control, or animations that are controlled by, the PANSE audio stream. Information about PANSE, including technical specifics for connecting to the stream and interface authoring, is online at http://130.208.220.190/panse.
4 The Composition-Instrument in Contemporary Projects
As stated earlier, a composition-instrument approach is latent in contemporary practice. There are many excellent projects where the seeds of this approach are visible, but no single work has yet realized the full potential bound within the idea. Following is a discussion of projects that either seek to—or have great potential to—embody the composition-instrument approach.
4.1 Perturb as a Model of Interaction
Perturb is a project developed by the author in tandem with the research that helped inform this paper. It was created with the intent to provide a very basic and clear illustration of the composition-instrument idea. Perturb shows how music can be composed and performed in real-time via generative systems and user interaction.
The title was conceived by considering the nature of musical interaction in these works. Composition-instrument was initially defined as a work that can "play and be played," and serves as a conceptual framework for music in interactive media and digital art. The concept strives to find a balance; neither the ability to "play" nor "be played" should dominate a user's experience. If interactions are too direct ("be played" is too apparent), the piece becomes too much like an instrument and the significance of other aspects of the artwork can be diminished. Similarly, if an unresponsive musical environment obscures interactions and "play" dominates the experience, the work loses its novelty in being tied to the course of a user's interaction. The composition-instrument approach permits equilibrium between these two and, as a result, acknowledges user interactions as perturbations in the overall musical system. In this context a perturbation is understood as a ripple sent through the musical system due to an interaction. It does not take on the clear cause-effect nature of a musical instrument (press a key to hear a note, for example). Instead it allows interactions to manifest as sound, gradually following the course of the composition's generative process. Perturbations introduce new sounds into the composition's aural palette and can subtly reshape the musical character of the work.
As a basic illustration of the composition-instrument approach, Perturb consists solely of an interface for introducing perturbations into the musical system. It offers nine separate modules that can hold sound samples. Running alongside the nine modules is a generative musical system based on the Particle Swarm Optimization algorithm developed by Kennedy and Eberhart [22][23]. The swarm has nine agents that correspond to each of the nine sound modules of the interface. As the system runs, the dynamics of individual agents within the swarm send cue messages that tell a module to play one of its attached sound samples. Users have the ability to attach an array of preset sounds, or they can attach sounds on an individual basis. Either way, when an agent sends a cue message to its sound module, a randomly selected sound from the module is heard. As all agents act together, the music of Perturb begins. Users can improvise within this structure (or perturb it) in several ways. They can use as many or as few of the nine modules as they like, which results in thinning or thickening the musical texture. Users are also able to choose which sound(s) are attached to each module. They can draw from a preset database of sounds or use sound files they have created themselves. Any of these interactions—adding/removing sounds or modulating the sonic texture—allows the work to be played. Simultaneously, while following the generative structure directed by the swarm, the work is allowed to play of its own accord. The tension between interactive control and generative autonomy defines the nature of an interaction as a perturbation. User choices are recognized within a system, but are subject to the dynamics of that system before they can become manifest.
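A minimal sketch of this kind of swarm-driven cueing is given below; it is our own illustration rather than the actual Perturb implementation, and both the toy fitness landscape and the rule that an agent cues its module whenever it improves its personal best are assumptions.

```python
# Illustrative particle-swarm cueing (not Herber's Perturb code): nine agents
# explore a toy landscape; an agent cues its sound module to play a random
# attached sample whenever it improves on its personal best position.
import random

def fitness(x, y):
    return -(x * x + y * y)              # toy landscape: best at the origin

random.seed(1)
agents = [{"pos": [random.uniform(-1, 1), random.uniform(-1, 1)],
           "vel": [0.0, 0.0], "best": None, "best_fit": float("-inf")}
          for _ in range(9)]
modules = [["sampleA", "sampleB"]] * 9   # hypothetical samples attached by the user
global_best, global_fit = agents[0]["pos"][:], float("-inf")

for step in range(50):
    for i, a in enumerate(agents):
        f = fitness(*a["pos"])
        if f > a["best_fit"]:            # personal best improved: cue this module
            a["best_fit"], a["best"] = f, a["pos"][:]
            print(f"step {step}: module {i} plays {random.choice(modules[i])}")
        if f > global_fit:
            global_fit, global_best = f, a["pos"][:]
    for a in agents:                     # standard PSO velocity/position update
        for d in range(2):
            a["vel"][d] = (0.7 * a["vel"][d]
                           + 1.5 * random.random() * (a["best"][d] - a["pos"][d])
                           + 1.5 * random.random() * (global_best[d] - a["pos"][d]))
            a["pos"][d] += a["vel"][d]
```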
particular species? What devices do they use to make music, and
what is the sound of that music?
In a game of becoming like Spore, a composition-instrument approach would be very advantageous. Composition-instrument
monitors interactions carefully and sees each as perturbation that
will have a gradual consequence within the system where it is
sensed. In the way that procedural content generation leads to a
natural mode of locomotion for a creature, perturbations to the
musical system lead to a natural development of sounds that define that creature and its culture. As creature and culture develop
and evolve, the sounds and music that are part of their identity
take on new forms and tonalities. The generative nature of Spore
can help to sustain this development. The game maintains its own
internal sense of progress and evolution as it grows new creatures, new landscapes, generates climates, and pollinates one
world with the contents of another. This continuous process of
generation provides the exact dynamics that enable a composition-instrument piece to play while a gamer’s interactions in the
Spore world play music with it.
Perturb was created to demonstrate the musical and technical characteristics of a composition-instrument approach. The strength of
the piece is in its musical expressiveness and flexibility, but it
does not fully address the connection between music conceived
in the composition-instrument approach and an interactive system or artwork. There are however other contemporary projects
where the foundations of a substantial connection between music
and interaction seem to be in the process of formation.
4.2 Spore—The Potential of Becoming
Spore, the current project of game designer Will Wright, is a project where a composition-instrument approach could be fruitfully
employed. Spore is slated for commercial release in the secondhalf of 2007 [24], which means that much of the argument offered
here is speculative. Few details concerning Spore’s gameplay and
features have been officially confirmed. However, there have
been enough published articles, screen captures, and interviews
with Wright to leave one with a good impression of the overall
flavor of Spore.
5 Conclusion
A composition-instrument approach embodies qualities of music
formally understood as “composed” and “improvised.” Works that
use this idea are like generative music compositions in that they
have their own internal order or organization. They are also like
instruments in that they can be played, or performed-upon, and
in the course of that performance, make an impact that modifies
the character or course of the music outputted by the generative
system. This “instrumentalization” allows for perturbations in the
generative system and leads to an emergent becoming of music.
When coupled with an interactive game system, the compositioninstrument piece becomes a soundtrack that is both responsive to
the game state and autonomous in its ability to adapt and develop
relative to that state. This approach to music for games, or any
sort of interactive digital system, hopes to open new opportunities
for music in digital art and media, and to break down the linear
models that have stifled creative progress in this area.
In the game, players have the ability to design their own characters. These creatures can look like lizards, horses, trolls, or cutesy
cartoons—whatever a player decides to create. One potential difficulty with this feature then becomes animating such an unpredictable variety of creatures. How can the game accurately simulate the motion of creatures that walk with tentacles, or creatures
that have legs like waterfowl, or other exotic means of locomotion? This challenge presents one of the most promising aspects
of Spore—the use of “procedurally generated content.” [24] [25]
GameSpot news describes this as “content that’s created on the
fly by the game in response to a few key decisions that players
make, such as how they make their creatures look, walk, eat, and
fight.” [24] The technology behind this aspect of Spore has not
been revealed, but Wright describes it using an analogy: “think
of it as sharing the DNA template of a creature while the game,
like a womb, builds the ‘phenotypes’ of the animal, which represent a few megabytes of texturing, animation, etc.” [25] Spore
also uses “content pollination” to complete the make-up of one
player’s world using the assets of another player. [26] The basic
sharing of resources is simple enough to grasp, but to be able to
distribute these resources realistically and allow them to engage
in believable interactions with another environment must involve
a complex Artificial Life (or A-Life-like) system. If the world of
Spore is to be a fluid ecosystem as promised, there will have to
be some sort of self-organizing system or generative, non-linear
dynamics that underlie the entire game and allow it to unfold in a
natural, organic fashion.
6 References
[1] Online reference: www.gamasutra.com/features/20000217/
harland_01.htm
[2] Alexander Brandon, Building an Adaptive Audio Experience,
Game Developer, Oct., pp.28-33, (2002)
[3] Derek Bailey, Improvisation: its nature and practice in music, 2nd ed., New York, Da Capo, (1992)
[4] Michael Nyman, Experimental music: Cage and beyond
beyond, 2nd
ed., Cambridge, U.K.; New York, Cambridge University Press,
(1999)
[5] Online reference: www.otherminds.org/SCORES/InC.pdf
[6] Terry Riley, In C, (1964)
[7] Cage, J. (1973). Silence: lectures and writings, 1st ed., Middletown, Wesleyan University Press.
[8] Dan Warburton, Les Instants Composés
é , in Marley & Wasés
tell, et al, Blocks of consciousness and the unbroken continuum,
1st ed., London, Sound 323, (2005)
[9] David Borgo, Sync or swarm: improvising music in a complex age, 1st ed., New York, Continuum, (2005)
[10] Ann McCutchan and C. Baker, The muse that sings: composers speak about the creative process, 1st ed., New York, Oxford University Press, (1999)
[11] Christopher Cox & Daniel Warner, Audio culture: readings
in modern music, 1st ed., New York, Continuum, (2004)
The generative aspects of Spore (whether documented in an article or speculated here) show that it has, as a central component
of its functionality, the ability to become. Wright has commented
that at one point the game was titled “Sim Everything.” [26] [26]
Most likely this is due to the ability of the game to become any
kind of world the player/designer intends. This focus on customization of experience, growth, and becoming are what make Spore
such an ideal environment for music. In addition to exploring (to
name a few) the physical, dietary, and architectural possibilities
of culture in this game environment, it would also be interesting to explore musical possibilities. What sounds resonate with a
35
[12] David Toop, Haunted weather: music, silence, and memory,
1st ed., London, Serpent’s Tail, (2004)
[13] Brian Eno and Will Wright, Playing With Time, Long Now
Foundation Seminar, San Francisco, June 26, (2006)
[14] Roy Ascott, Telenoia, in Ascott & Shanken, Telematic embrace: visionary theories of art, technology, and consciousness,
1st ed., Berkeley, University of California Press, (2003)
[15] Nintendo of America, Electroplankton instruction booklet,
1st ed., Redmond, Nintendo, (2006)
[16] Online reference: www.sonicteam.com/rez/e/story/index.
html
[17] Online reference: www.csse.monash.edu.au/~jonmc/projects/eden/eden.html
[18] J. McCormack, Evolving for the Audience, International
Journal of Design Computing, 4 (Special Issue On Designing
Virtual Worlds), Sydney (2002)
[19] Henrik Lörstad, M. d’Inverno, et al., The intelligent street:
responsive sound environments for social interaction, Proceedings of the 2004 ACM SIGCHI International Conference on
Advances in computer entertainment technology, 74, pp.155-162, (2004)
[20] Online reference: www.turbulence.org/blog/archives/000122.html
[21] Online reference: http://130.208.220.190/panse/whats.htm
[22] James Kennedy and Eberhart, R., Particle Swarm Optimization, Proceedings from the IEEE International Conference on
Neural Networks, 4, pp.1942-1948, (1995)
[23] Norbert Herber, Emergent Music, Altered States: Transformations of Perception, Place and Performance, 1, DVD-ROM,
(2005)
[24] Online reference: www.gamespot.com/news/6155498.html
[25] Online reference: http://en.wikipedia.org/wiki/Spore_
(game)
[26] Online reference: http://technology.guardian.co.uk/games/
story/0,,1835600,00.html
Investigating the effects of music on emotions in games
David C Moffat and Katharina Kiegler
([email protected])
eMotion Lab,
Division of Computing
Glasgow Caledonian University, UK
(http://www.gcal.ac.uk/)
Abstract The importance of music in creating an emotional experience for game-players is recognised, but the nature of the relation
between music and emotion is not well understood. We report a small study (N=15) in which players' skin conductance, heart-rate and
pupil-dilation were recorded while they watched brief film clips and listened to pieces of background music. The main film clip was
fearful in mood; and the music pieces expressed different basic emotions: happy, sad, aggressive, and fearful. There were definite
effects of the music on the physiological measures, showing different patterns of arousal for different music. The interactions between
music and film-clip feelings are complex, and not yet well-understood; but they exist, and are relevant to film and game makers. They
can even change the way a player assesses the game, and thus change the play itself.
1 Introduction
Music has long been known to evoke emotions in people. Even if
researchers still argue whether music really elicits emotional
responses in listeners, or whether it simply expresses or
represents emotions, they agree that music provides a sort of
emotional experience and affects our moods (Sloboda & Juslin,
2001).
The computer's possible understanding of a user's emotional
state is becoming important for HCI; and feasible (Picard 1997).
There are attempts to use physiological measures, such as heart-rate and skin-conductance, to sense activity of the autonomic
nervous system. From that data, one could make an educated
guess about the user's general state of arousal, or more (e.g.
Mandryk 2005). However, it must be admitted that efforts to find
reliable links between emotion and physiological response have
not been very successful so far. The mere existence of an
emotional state, of some undetermined kind, is typically all one
can affirm.
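As a rough illustration of such an educated guess, one might combine relative changes in skin-conductance and heart-rate into a single crude arousal score; the weighting and threshold below are invented for the example, not drawn from any of the cited work.

# A minimal sketch of the kind of "educated guess" about arousal described
# above: compare skin conductance (SC) and heart rate (HR) during a clip with
# a resting baseline. The weighting and threshold are illustrative only.
def arousal_index(sc_clip, sc_base, hr_clip, hr_base):
    """Return a crude arousal score from relative SC and HR changes."""
    sc_change = (sc_clip - sc_base) / sc_base   # relative change in skin conductance
    hr_change = (hr_clip - hr_base) / hr_base   # relative change in heart rate
    return 0.6 * sc_change + 0.4 * hr_change    # arbitrary, illustrative weighting

score = arousal_index(sc_clip=5.4, sc_base=4.8, hr_clip=88.0, hr_base=82.0)
print("raised arousal" if score > 0.05 else "near baseline", round(score, 3))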
We aim to study the user experience of video-games in our
eMotion Lab: in particular the user's emotions. Since games are
intended to be fun, it should ideally be part of the usability testing
for games, that their emotional effects on the player be
understood. The eMotion Lab is a usability lab where colleagues
investigate, among other things, the effect of background music
on a player's performance.
Game designers need to understand the connection between
games and emotions when they use music and sound effects to
enhance the experience of players, and so they need the support
of a research effort in this area.
2 Experiment design

To investigate the feasibility of detecting emotion or mood in a naturalistic setting, we ran an experiment in which participants watched a series of short film-clips, with different pieces of music in the background. The clips were trailers for video-games. The music was chosen to evoke different moods, so that we could observe the effects on experience and physiology, including pupil-dilation. The results were analysed to determine how the emotional influence of music changed the participants' feelings, impressions, perceptions and assessments of the film-clips.

2.1 Participants

Fifteen students at the university, 11 male and 4 female, aged between 18 and 26, volunteered to take part in the experiment, which took about 25 minutes on average. They sat comfortably in the eMotion Lab, which is designed like a typical living room at home, and were told that they would see several short film clips on a large, high-quality plasma TV-set and fill in some short questionnaires about them. The clips would be trailers for new video-games.

The N=15 participants were divided into three random groups of N=5 each: G1, G2 and G3.

2.2 Method (materials and equipment)
The lab is a friendly environment for playing video-games,
complete with comfortable sofa, several games platforms, large
plasma screen, a one-way observation mirror and CCTV video cameras for observation and recording. The Tobii.com eye-tracker model is quite unobtrusive, so that it does not
interfere with the player's experience, and we can get more
authentic data about the player's emotional state. To measure
skin-conductance (SC) and heart-rate (HR) we used a device from
a biofeedback game called “The Journey to Wild Divine”
(wilddivine.com).
The eye-tracker was used to measure dilation of the pupils. It is
already well-known that pupils dilate under cognitive load
(Beatty 1982), and other forms of arousal including both positive
and negative emotions (Partala and Surakka 2003). The dilations
could be of interest if different emotions appear to have different
pupil dilation patterns.
The film-clips used were trailers for different video-games,
with their original or different pieces of music. Only the clip for
the new game Alan Wake (by Remedy) was of direct interest to
us. The other clips were to set an initial neutral mood for all
participants, to separate the repeat showings of the Alan Wake
clip, and disguise the purpose of the experiment. The clip from
Alan Wake was chosen to be ambiguous. It is not clear what is
happening, and so the participants would be free to choose an
interpretation according to the background music played with the
clip.
The pieces of music that were played with the Alan Wake clip expressed the following basic emotions: fear, sadness, anger (or aggression), and happiness. There was also a no-music (silence) condition.
[The remainder of the method description and the first results subsections, including Table 1 and Figures 1-4, are not legible in this transcription. The legible axis and bar labels indicate that the figures plot participants' self-reported ratings of Happy, Sad, Fearful and Angry feelings, on a 0-4 scale, for the conditions G1.1 no sound, G1.2 fear, G2.1 sad, G2.2 aggressive, G3.1 happy and G3.2 fear.]
2.4.3 Musical effects on player's physiology
Until now, we have only used self-report data from participants'
answers to questionnaires. The question remains open whether
they are commenting on the feeling of the music, just imagining
what it could make somebody feel; or whether their true feelings
are genuinely affected.
Table 2 shows the physiological data of skin-conductance
(SC), heart-rate (HR), and what we call pupil-range (PR). The PR
is the difference between the minimum and maximum pupil-dilations over the whole clip. It is a simple summary parameter
from the eye-tracker data, which leaves out a lot of complexity,
but it is useful for a first analysis.
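As a sketch of that summary parameter (assuming, for illustration only, that the eye-tracker output has been reduced to a list of dilation readings in millimetres, with missing readings marked as None):

# Pupil-range (PR) as defined above: the difference between the maximum and
# minimum pupil dilation recorded over the whole clip. The input format is an
# assumed simplification of the eye-tracker data, not its actual output.
def pupil_range(dilations_mm):
    """Max minus min pupil dilation over the clip, in mm."""
    valid = [d for d in dilations_mm if d is not None]  # skip missing readings
    if not valid:
        return None  # no usable data for this participant
    return max(valid) - min(valid)

print(pupil_range([3.1, 3.4, None, 2.9, 3.6]))  # roughly 0.7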
Table 2. Physiological data - changes over whole clip. ↓ and ↑ mean that the variable falls or rises; SC is skin conductance; HR is heart-rate; Pupil-range is max-min dilation over clip (in mm). The individual rise/fall symbols for SC and HR are not legible in this transcription.

Group 1 (Clip 1: No sound; Clip 2: Fear) - Pupil-range: 1.743 (Clip 1), 1.454 ↓ (Clip 2)
Group 2 (Clip 1: Sad; Clip 2: Aggress) - Pupil-range: 1.363 (Clip 1), 1.677 ↑ (Clip 2)
Group 3 (Clip 1: Happy; Clip 2: Fear) - Pupil-range: 1.205 (Clip 1), 1.469 ↑ (Clip 2)

Each ↑ and ↓ symbol represents a rise or a fall of the physiological variable from start to end of the clip. This change in value is consistent for all (five) participants of each group in every case, but for one small exception. One participant in group G.3 starts with a heart-rate (HR) of 83.8 beats per minute (bpm) and ends the clip with 84.5 bpm; but all the other participants in that group experience a larger fall in HR, as shown in the table.

Pupil-ranges (PR) in the table are averages for the group, and a ↑ (or ↓) symbol shows that each participant in the group has a bigger (or smaller) PR for clip-2 than for clip-1. In each group, the difference is of the order of about 0.3 mm. Because of problems with missing readings from the eye-tracker, there are only three participants for the PR row in each group. Even so, the within-group consistency is almost as impressive for the PR variable as for the SC and HR variables (with five participants in each group).

It is clear, from Table 2, that the different emotions have different effects on the physiological variables. This is not to say that all emotions, in all circumstances, will have clear, characteristic patterns of response in physiology; indeed, that is very unlikely. However, in the controlled context of our experiment, there is a pattern. This is meaningful, because people do not have conscious control of SC, HR, or pupil-dilation; but emotion does strongly affect such physiological variables.

We conclude that our participants were reporting actual emotional responses. Given that the clips were only about 30 seconds long, it might be surprising to some people to realise how quickly a viewer's emotional response can be affected by music.

2.4.4 Musical effects on player's thinking

Some of the questions on the questionnaire asked about how participants assessed the events in the story. Any changes in assessments, caused by background music, would show that even thought processes can be influenced by incidental sounds.
Figure 5. “Does the character have a weapon?”
We limit discussion here to just one of the questions, which
asks if the participants agree that the Alan Wake character has a
weapon with him (e.g. a gun in his coat pocket). The answers are
shown in Fig. 5, averaged for each group. The scores range from
0, meaning “I totally disagree” through 3, which is “neutral”, to 4,
meaning “I totally agree.”
The strongest agreement is from the aggressive-music group
(G2.2), who all agree or agree totally, that he has a weapon.
Testing for statistical significance, we compare the group with the
groups G2.1 and G3.1, and find that the differences are
significant (p = .013* and p = .017*, respectively).
Aggressive music seems to cause participants to assess the
situation quite differently. They appear to attribute aggression to
the lead character, and that leads them to assume he must be
armed to be so confident. How much of this reasoning is
conscious cannot be determined from our results.
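The text does not name the statistical test used for these comparisons. As one illustration of how such a between-group comparison of 0-4 ratings could be run, a Mann-Whitney U test on two small samples might look as follows; the rating lists are invented, not the study's data.

# Illustrative between-group comparison of small ordinal samples; the
# Mann-Whitney U test is one common choice, but the paper does not state
# which test it used, and these ratings are hypothetical.
from scipy.stats import mannwhitneyu

g2_2_aggressive = [4, 4, 3, 4, 4]   # hypothetical 0-4 agreement ratings, N=5
g2_1_sad = [2, 1, 2, 3, 1]          # hypothetical ratings for a comparison group

stat, p = mannwhitneyu(g2_2_aggressive, g2_1_sad, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")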
3 Discussion
One question concerning studies such as this one is whether
emotions are truly induced, or whether they are merely imagined
and reported by participants. Because we found distinct patterns
in the physiological data, we conclude that in this case emotions
really were induced in the experiment.
The mood or emotion associated with each piece of music was
confirmed by the participants at the end of the experiment.
Although the pieces were well-chosen, they were not all equally
effective at inducing one precise emotion. This is in the nature of
music, which is more art than science, even today.
The music and film-clip both had effects separately, but also
interacted in some interesting ways. Music of one mood could
induce a quite different mood state in a person watching the clip.
The emotions of happiness and sadness were seen to be
opposites throughout. Music or video that caused one to rise
would generally cause the other to fall, which is intuitively
reasonable. The sad piece of music seemed to be especially
effective at inducing sadness, but on this evidence alone, it is not
possible to generalise from this case to all other sad music.
Happy music had an interesting “inoculation” effect. It tended
to lessen the negative emotions, including fear, which is striking
because the film-clip is intended to be fearful in mood. This is
one result that should alert game designers to be careful when
choosing background music for their games. An inappropriate
piece of music can kill the experience for the player.
Fearful music was also quite effective, and brought two
different groups to a similar emotional state even after their
divergent histories up to that point (in the first clip). This may be
an interaction between the music and the clip, however, since the
fearful music is the original track for the (fearful) clip.
One result is of special interest to us, and the focus of our
current and future work: the demonstration that background
music can have the power to change the listener's assessments
and other thoughts about the situation or story in the film. We
found this effect for aggressive music, but in general other moods
might be relevant to other clips.
4 Conclusion
As a result of this study, we suggest that music, via emotion, can
influence subjects’ perception and assessment of the situation.
Physiological measurements, such as heart-rate, skin conductance
and pupil dilation can be valuable in helping to read the
emotional state of game players. But it is still difficult to detect a
person’s emotion reliably, as many factors can influence the
emotional experience. Therefore more sophisticated models are
required to frame analysis and support interpretation.
5 References
Bartlett, D. (1996) Physiological Responses to Music and Sound
Stimuli, in D.A. Hodges (Ed.) Handbook of Music Psychology,
1996, 2nd edn. Lawrence, KS: National Association for Music
Therapy.
Beatty, J. (1982). Task-Evoked Pupillary Responses, Processing
Load, and the Structure of Processing Resources.
Psychological Bulletin, 91(2), 276-292.
Frijda, N.H. (1986) The emotions. Cambridge: CUP.
Frijda, N.H. (1994) Emotions are functional, most of the time. In:
Ekman, P., Davidson, R.J. (Ed.s) The Nature of Emotion,
Fundamental Questions New York: OUP.
Mandryk, R.L. (2005). Evaluating Affective Computing
Environments Using Physiological Measures. In: Proceedings
of Workshop 14: Innovative Approaches to Evaluating
Affective Interfaces, at CHI 2005. Portland, USA, April 2005.
Partala, T. and Surakka, V. (2003). Pupil size variation as an indication of affective processing. Int. J. Human-Computer Studies, 59, 185-198.
Picard, R.W. (1997) Affective computing. Cambridge, MA: MIT
Press.
Sloboda, J.A. and Juslin, P.N. (2001) Music and emotion: theory
and research. Oxford: OUP.
Tan, E.S.H. and Frijda, N.H. (1999). Sentiment in film viewing.
In: Plantinga, C. and Smith, G.M., (Ed.s) Passionate Views:
Film, Cognition, and Emotion. Baltimore: Johns Hopkins UP,
48-64.
REMUPP – a tool for investigating musical narrative functions
Johnny Wingstedt
School of Music, Luleå University of Technology, PO Box 744, SE-941 28 Piteå, Sweden
Sonic Studio, Interactive Institute, Acusticum 4, SE-941 28 Piteå, Sweden
[email protected], [email protected]
Abstract. The changing conditions for music as it appears in new media was the starting point for the project “NIM – Narrative
Interactive Music”. The overall aim was to explore interactive potentials and narrative functions of music in combination with
technology and other narrative media – such as in film or computer games. The software REMUPP was designed for investigating
various aspects of the musical experience and allows for experimental non-verbal examination of selected musical parameters in a
musical context. By manipulating controls presented graphically on the computer screen, participants can in real-time change the
expression of an ongoing musical piece by adjusting structural and performance-related musical parameters such as tempo, harmony,
rhythm, articulation etc. The music can also be combined with other media elements such as text or graphics. The manipulations of
the parameter controls are recorded into the software and can be output in the form of numerical data, available for statistical
analysis. The resulting music can also be played back in real time, making it possible to study the creative process as well as the aural
end result. A study utilized the REMUPP interface to explore young adolescents’ knowledge about, and use of, musical narrative
functions in multimedia. Twenty-three participants were given the task of interactively adapting musical expression to make it fit
different visual scenes shown on a computer screen. The participants also answered a questionnaire asking about their musical
backgrounds and media habits. Numerical data from the parameter manipulations were analyzed statistically. After each completed
session, the participants were also interviewed in a ‘stimulated recall’ type of sitting. The results showed that the participants to a
large degree displayed a collective consensus about certain narrative musical functions. The results were also affected by the
participants’ gender, musical backgrounds and individual habits of music listening and media use.
1 New musical functions
good reasons to assume that media music contributes to shaping
knowledge and attitudes concerning communicational, artistic
and interactional musical issues.
A characteristic feature of modern society is the increased
interaction between man and technology. New technology
requires new kinds of skills and knowledge – but is also the
source of new knowledge. This new knowledge concerns not
only technology itself, but also various societal and cultural
phenomena related to the technological changes.
The changing conditions for music as it appears in new media
was the starting point for the project “NIM – Narrative
Interactive Music”, performed in collaboration between the
Interactive Institute’s studio Sonic and the School of Music in
Piteå. The overall aim of the project was to explore interactive
potentials and narrative functions of music in combination with
technology and other narrative media such as image, text or
sound – such as in film or computer games. This article will
describe the use of the interactive analysis tool REMUPP
(‘Relations Between Musical Parameters and Perceived
Properties'), which in the project has been used in several quasi-experiments (Cook & Campbell, 1979) to investigate the
participants’ knowledge and creative use of music’s narrative
codes and conventions.
Kress (2003) has described how, in this new ‘age of media’, the
book is being replaced by the screen as the dominant medium
for communication – changing the basic conditions for the
concept of literacy. The centuries-long dominance of writing is
giving way to a new dominance of the image. But this new
literacy of course does not only involve visual communication.
Rather we are in the new media today making sense, or trying to
make sense, out of an intricate assortment and multimodal
combination of different media: images, written and spoken text,
video, animations, movement, sound, music and so on. What
creates meaning is above all the complex interplay of the
different modes of expression involved. At the same time new
technology involves an inclusive quality, emphasizing elements
of interactivity and active communication as the contemporary
information society is gradually abandoning the
communicational model of ‘from-one-to-many’ in favor of
‘from-many-to-many’.
1.1 Musical narrative functions
Before describing the use of REMUPP, the concept of musical
narrative functions will briefly be discussed. In the process of
defining a theoretical foundation for the project, a categorization
of musical narrative function was commenced. The purpose was
to provide a framework examining and defining what narrative
functions are. This framework was aimed to serve as part of a
theoretical and referential basis for further exploration of how
the narrative functions are experienced, used and achieved.
Since film is a medium with an established narrative tradition,
having developed sophisticated musical narrative techniques and
codes during the past century, the narrative functions of film
music were the chosen focus for this categorization.
In the emerging multimodal and multimedial settings, the study
of the role of sound and music is so far largely a neglected field.
Remarkably so, since music and sound are often important
expressive and narrative elements used in contemporary media.
In formal music education, narrative music, as it appears in film,
television or computer games (henceforth referred to as media
music) is typically a blind spot and is rarely discussed in depth
(Tagg & Clarida, 2003). However, considering the high degree
of exposure to this kind of music in our everyday life, there are
Around 40 different musical narrative functions were taken as a
starting point and divided into six narrative classes: (a) the
Emotive class, (b) the Informative class, (c) the Descriptive
class, (d) the Guiding class, (e) the Temporal class and (f) the
Rhetorical class. These classes were in turn subdivided into
altogether 11 (later 12) different categories.
The emotive class includes the emotive category – which is a
general category, present to some degree in most cases where
music is used in film (including functions such as describing
feelings of a character, foreboding and stating relationships).
The functions of the informative class achieve meaning by
communicating information on a cognitive level rather than on
an emotional level. The class includes three categories –
communication of meaning (such as clarifying ambiguous
situations and communicating unspoken thoughts),
communication of values (such as evocation of time period,
cultural setting or indication of social status) and establishing
recognition.
The descriptive class is related to the informative class in certain
aspects, but differs in that the music is actively describing
something rather than more passively establishing associations
and communicating information. It is also different from the
emotive class, in that it describes the physical world rather than
emotions. In this class there are two main categories –
describing setting (such as physical environment or atmosphere)
and describing physical activity (such as movement of a
character).
The guiding class includes musical functions that can be
described as ‘directing the eye, thought and mind’. It includes
two categories, the indicative category (such as pointing out
details or establishing direction of attention) and the masking
category.
The temporal class deals with the time-based dimension of
music. Two categories are included: providing continuity
(shorter-term or overall continuity) and defining structure and
form.
The rhetorical class includes the commenting as well as
contrasting categories. Some functions in this class spring from
how music sometimes steps forward and ‘comments’ the
narrative. Rhetorical functions also come into play when image
and music are contrasting, making visible not only the semiotic
codes of the music but also the effect of the narrative on how we
perceive the meaning of the music.
In a given situation, several narrative functions typically operate simultaneously on several different levels, and the salient functions will quickly and dynamically change. A more detailed
discussion of the musical narrative functions is found in
Wingstedt (2004, 2005).
1.2 Changing roles
During the larger part of the past century, we have gradually
gotten used to the role of being consumers of text, images and
music. We have progressively accustomed ourselves to the
objectification of media – by the means of books, magazines,
recordings, films etc. Making media mobile has led to a recontextualization and personalization of medial expression and
experience. This in turn has affected how we establish visual,
aural and musical codes, metaphors and conventions. The
growing interest in mobile phone ring tones, the use of ‘smileys’
in SMS and e-mail, the codification of the ‘SMS-language’ – are
manifestations of evolving technology-related media codes.
Before the modern technologization of media, experiencing drama or listening to music can be said to always have involved a certain degree of interactivity and variability. A live music concert will always respond to the "unique necessities of the individual time, place and people involved" (Buttram, 2004, p. 504), and never be repeated twice exactly the same way. Cook (1998) observes how music, detached from its original context, assimilates new contexts. New musical contexts continue to evolve as technology and society change. Viewing the individual listener and the musical sound as active dimensions in the defining of the context also implies that the listener is interactively participating in the act of music. Rather than just talking about 'listening', Small (1998) uses the term musicking to emphasize a participatory view of the composer and performer - as well as the bathroom singer, the Walkman listener or the seller of concert tickets. In computer games, where the dimension of agency is salient, there is a potential for affecting the musical expression of the game music by interacting with the gaming interface.

The dimension of interactivity in the new media challenges the traditional pattern of 'Creator-Performer-Receiver' (Fig. 1) - as well as the conditions for traditional institutionalized learning. The conventional progression of the musical communication process (as seen in western traditional music) is then challenged. Rather than what has traditionally been seen as a one-way communication model we get a situation where the distinction between the roles gets more ambiguous and new relations emerge between the actors involved (Fig. 2). This can be thought of as the music process increasingly getting participatory and inclusive rather than specialized and exclusive.

Figure 1: Traditional view of the musical communication chain (Creator, Performer, Receiver in one-way succession).

Figure 2: A relational view of participants in the act of music (Creator, Performer and Receiver in mutual relation).
2 Controlling musical expression
One solution to the challenge of achieving a higher degree of
adaptability in music used in narrative interactive situations is
to facilitate user influence of the music at a finer level of detail
(Buttram, 2004). This can be done by representing the music on
a component or parameter level where the parameters are
accessible for control via the application, directly or indirectly
influenced by user interaction. The concept of musical
parameters is here defined as attributes of the musical sound:
structural elements such as tonality, mode (e.g. major or minor
mode), intervals, harmonic complexity (consonance –
dissonance), rhythmic complexity, register (low or high pitch
level) etc. - or performance-related elements such as tempo,
timing, phrasing, articulation etc. (Gabrielsson & Lindström,
2001; Juslin, 2001). By altering musical parameters in real-time,
the musical expression will, directly or indirectly, be affected by
the listener/user in ways that are traditionally thought of as being
the domain of the composer or performer, as discussed above.
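A minimal sketch of such a parameter-level representation might look as follows. The parameter names echo those listed above, while the value ranges and the setter interface are illustrative assumptions rather than REMUPP's actual implementation.

# A minimal sketch of music represented on a parameter level, as described
# above. Parameter names follow the text; ranges and the "set" interface are
# illustrative assumptions, not REMUPP's actual design.
from dataclasses import dataclass

@dataclass
class MusicParameters:
    tempo_bpm: float = 110.0          # performance-related
    mode: str = "major"               # structural: "major" or "minor"
    harmonic_complexity: float = 0.2  # 0 = consonant ... 1 = dissonant
    rhythmic_complexity: float = 0.5  # 0 = sparse ... 1 = busy
    register: int = 0                 # octave offset from a reference register
    articulation: float = 0.8         # 0 = staccato ... 1 = legato

    def set(self, name, value):
        """Let the application (or a user control) adjust one parameter."""
        if not hasattr(self, name):
            raise KeyError(f"unknown musical parameter: {name}")
        setattr(self, name, value)

params = MusicParameters()
params.set("tempo_bpm", 72.0)   # e.g. a slider moved by the listener/user
params.set("mode", "minor")
print(params)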
By manipulating controls presented graphically on the computer
screen (as knobs or sliders), participants can in real-time change
the expression of an ongoing musical piece by adjusting
structural and performance-related musical parameters like
tonality, mode, tempo, harmonic and rhythmic complexity,
register, instrumentation, articulation, etc. The basic musical
material, as well as the types and number of musical parameters
included with REMUPP, can be varied and tailored by the
researcher according to the needs and purpose of the study at
hand. The music can also be combined with other media
elements such as text or graphics.
Having the participants manipulate the music makes REMUPP
a non-verbal tool where the participant responds to the musical
experience within ‘the medium of music’ itself, without having
to translate the response into other modes of expression such as
words or drawings. By responding to the musical experience in
this way, the user will directly influence the musical expression
– and thereby to a certain degree control his/her own experience.
Managing the parameter controls requires no previous musical
training. In a typical REMUPP session, the controls will be
presented without any verbal labels or descriptions, making for
an intuitive use of the parameters with a focus on the actual
musical sound.
Modifying musical expression by controlling musical
parameters directly accesses communicational and expressional
properties of the music on a level that goes beyond the genre
concept. Alteration of the musical performance can be
accomplished without disturbing the musical flow and
continuity, at the same time as it provides variation and dynamic
expressive changes. Regardless of style, the same set of
parameters can be made available, only their settings will be
changed – e.g. the parameter tempo is a component of any kind
of music and by using it to alter the speed of a musical
performance it will at the same time alter some aspect(s) of the
musical expression.
The possibility of having several variable musical parameters simultaneously available opens up not only the study of the individual parameters themselves, but also the investigation of the relationships and interplay between the different parameters.
Furthermore, combining the music with other media such as text
or video makes visible the relationships between music and
other modes of expression – making it possible to study specific
meaning making factors appearing as the result of multimodal
interweaving.
It should be noted that a basic assumption of the project is the
view that the musical sound itself (or a certain musical
parameter value) does not typically express a specific 'meaning' - but rather represents a meaning potential (Jewitt & Kress, 2003). The more specific musical meaning making depends
on contextual factors – like the interplay with the situation
(including socio-cultural factors), the dramaturgical context and
the interweaving with other narrative modes such as the moving
image, sound effects and dialogue.
In REMUPP, the participants’ manipulations of the parameter
controls are recorded into the software and can be output in the
form of numerical data, available for statistical analysis. The
resulting music, including all the manipulations on a time-axis,
can also be played back in real time, making it possible to study
the creative process as well as the aural end result. The various
ways to handle data, and the possibility to combine different
data types, makes the REMUPP tool potentially available for
use within several different types of research disciplines. As
well as being a source of quantitative statistical data, REMUPP
is also suited for use with more qualitatively oriented methods
(such as observations or interviews) – or for combinations of
different techniques.
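A minimal sketch of this kind of time-stamped recording, numerical export and playback might look as follows; the storage format is an assumption for illustration, not REMUPP's actual data format.

# A sketch of a time-stamped parameter log: each control movement is recorded,
# can be exported as numerical data for statistical analysis, and can be
# stepped through again in time order for playback. Format is assumed.
import csv
import time

class ParameterLog:
    def __init__(self):
        self.start = time.monotonic()
        self.events = []  # each event: {"t": seconds, "parameter": ..., "value": ...}

    def record(self, parameter, value):
        self.events.append({"t": time.monotonic() - self.start,
                            "parameter": parameter, "value": value})

    def export_csv(self, path):
        """Write the manipulations as numerical data for statistical analysis."""
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["t", "parameter", "value"])
            writer.writeheader()
            writer.writerows(self.events)

    def replay(self, apply):
        """Re-apply the manipulations in time order, e.g. to a music engine."""
        for event in sorted(self.events, key=lambda e: e["t"]):
            apply(event["parameter"], event["value"])

log = ParameterLog()
log.record("tempo_bpm", 96.0)
log.record("harmonic_complexity", 0.7)
log.replay(lambda name, value: print(f"{name} -> {value}"))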
To explore the potentials of affecting musical expression by the
alteration of musical parameters, the software REMUPP
(Relations between Musical Parameters and Perceived
Properties) was developed (Wingstedt, Berg, Liljedahl &
Lindberg, 2005; Wingstedt, Liljedahl, Lindberg & Berg, 2005).
2.1 REMUPP
REMUPP (fig. 3) is designed for investigating various aspects
of the musical experience and allows for experimental non-verbal examination of selected musical parameters in a musical
context. The musical control is put into the hands of the
experiment participants, introducing elements of creativity and
interactivity, and enhancing the sense of immersion in the test situation.
REMUPP offers an environment providing control over certain selected musical parameters, but not the finer level of the user selecting each individual note. Limiting the control in this way affects the creative process as well as the final outcome. The participant might be described as being more of a co-composer (or maybe a performer) than a composer in the traditional sense.
2.2 Musical implementation
The concept and functionality of the REMUPP interface causes
special demands to be put on the structure of the basic musical
material involved – and thus on the composer of this musical
material. Since the technical and musical designs will be
interwoven with and interdependent on each other, the
construction and implementation of the musical material
becomes as important as the technical design. Unlike music
created for more conventional use, the ‘basic music’ composed
Figure 3: REMUPP – an example of the user interface.
for REMUPP must in a satisfactory way accommodate the
parameter changes made by a participant. The desired
expressional or narrative effects must be distinctly achieved at
the same time as the overall music performance should remain
convincing. Special consideration also has to be taken of the
complex interaction of different parameters working together,
since the perceived effect of any selected parameter change will
be affected by the prevailing settings of the other parameters
available. The musical material can thus be thought of as an
algorithm, where each parameter is put in relation to all the
other parameters in a complex system interacting on many
levels. The composer must therefore carefully define and tailor
the basic musical material to fulfill the demands of the expressional situation at hand - as well as take into account the technical framework of REMUPP. These conditions form the basis for the formulation of artistic strategies that allow for a certain
freedom of musical expression and development of musical
form – leaving room for the decisions and actions of the
listener/user. Rather than ascribing the detailed function of each
individual note, the composer will define rules and conditions
determining a range of possible musical treatments for a given
situation.
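As a toy sketch of such rules and conditions (the specific rules below are invented for illustration, not taken from the REMUPP material):

# Toy sketch of "rules and conditions": instead of fixed notes, the composer
# supplies conditions over the current parameter settings that select a range
# of possible musical treatments. The rules themselves are invented.
def choose_treatments(params):
    """Map current parameter settings to a range of possible musical treatments."""
    treatments = []
    # Rules may depend on several parameters at once, reflecting how the
    # perceived effect of one parameter depends on the settings of the others.
    if params["tempo_bpm"] < 80 and params["articulation"] > 0.7:
        treatments.append("sustained pad accompaniment")
    if params["harmonic_complexity"] > 0.6:
        treatments.append("avoid exposed unison lines")
    if params["rhythmic_complexity"] < 0.3 or params["tempo_bpm"] < 70:
        treatments.append("thin out the percussion layer")
    return treatments or ["default treatment"]

print(choose_treatments({"tempo_bpm": 72, "articulation": 0.9,
                         "harmonic_complexity": 0.4, "rhythmic_complexity": 0.2}))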
3 Experimental studies

To better understand the properties of musical expression resulting from parameterization, several experiments, or quasi-experiments (Cook & Campbell, 1979), have been carried out. Initially, two pilot-studies were performed. The first study investigated selected parameters' perceived capability to change the general musical expression; the second study examined how the parameters can contribute to express emotions. These studies are described in several articles (Berg & Wingstedt, 2005; Berg, Wingstedt, Liljedahl & Lindberg, 2005; Wingstedt, Berg et al, 2005).
A larger study, “Young Adolescents’ Usage of Narrative
Functions of Media Music by Manipulation of Musical
Expression” (Wingstedt, Brändström & Berg, 2005), utilizes the
REMUPP interface to explore young adolescents’ knowledge
about, and use of, musical narrative functions in multimedia.
Twenty-three participants, 12-13 years old, were given the task
of interactively adapting musical expression to make it fit
different visual scenes shown as 3D-animations on a computer
screen (fig. 4). This was accomplished by manipulating seven
musical parameters: Instrumentation (3 different instrument sets,
‘Rock’, ‘Electronic’ and ‘Symphonic’, were available), Tempo
(beats per minute), Harmonic complexity (degree of dissonance
– consonance), Rhythmic complexity (rhythmic activity),
Register (octave level), Articulation (staccato – legato) and
Reverb (effect amount). The study took as a starting point one of
the descriptive musical narrative functions discussed earlier:
Describing physical environment.
Three different visual scenes were presented, depicting different physical settings: City Night (a dark hostile alley under a highway bridge), In Space (inside a space ship looking out to planets and other space ships through a giant window) and Picnic by the Lake (a sunny day by a small lake with water lilies and butterflies, a picnic basket on a blanket). There were no people in these environments, to keep the focus on the actual settings. The graphics were realized as animations, but with the movements used sparingly, so there was no visible plot or story - they could be thought of as 'moving still images'. The idea was to give an impression of these places being alive, ongoing, in process - representing 'present tense'.

Each visual scene was presented three times, each time with a different 'basic musical score' as accompaniment (the presenting order of the altogether 9 trials was randomized). The initial values of the seven musical parameters were randomized to avoid systematic errors resulting from the initial musical sound. The instruction to the participants was to 'adjust the musical expression to fit the visual scene as well as possible'. The controlling faders were presented without any written labels, to make the participants focus on their functions only by listening to their effect on the musical expression when moved.

Figure 4: An example of REMUPP's test interface - a screenshot of a 3D animation depicting a 'physical environment' (Picnic by the Lake) and, below, the faders controlling musical parameters.

The participants also answered a questionnaire asking about their musical backgrounds, and habits of listening to music, watching movies and playing computer games. Numerical data from the parameter manipulations were analyzed statistically to search for tendencies within the group with regard to the preferred values of the musical parameters in relation to the different visual scenes.

After each completed session, the participants were also interviewed in a 'stimulated recall' type of sitting, where they got to watch and listen to their process as well as results, and discussed and commented on their creative decisions in relation to the musical and narrative expression experienced and intended. Additionally, they got to rate their favourite version of each of the three movies ('which one are you most satisfied with?'), based on how well they thought the music fitted the visuals.

They also discussed the perceived functions of the seven parameters being used, and the experience of interactively controlling the musical expression.
3.1 Results
The results from the statistical analysis of the parameter settings,
combined with the questionnaires, showed that the participants
to a large degree displayed a collective consensus about certain
narrative musical functions. This intrinsic consensus can, in
turn, be interpreted as mirroring extrinsic norms – a knowledge
about existing conventions that we encounter in film, computer
games and other narrative multimedia.
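One simple way such tendencies could be summarized is to compute per-scene means and spreads of the final parameter settings, where a small spread indicates group consensus; the figures below are invented, not the study's data.

# Crude sketch of a consensus summary over logged parameter settings:
# per-scene mean and standard deviation of one parameter across participants.
# The values are hypothetical, not taken from the study.
from statistics import mean, stdev

final_tempo_bpm = {   # hypothetical final tempo settings, one value per participant
    "City Night": [95, 102, 98, 99, 104],
    "In Space": [68, 72, 70, 66, 71],
    "Picnic by the Lake": [74, 70, 76, 72, 69],
}

for scene, values in final_tempo_bpm.items():
    print(f"{scene}: mean = {mean(values):.1f} bpm, sd = {stdev(values):.1f}")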
A short interpretation, summing up the results of the
participants, goes as follows: The pastoral scene by the lake is
expressed by the group of participants by the use of the
‘Symphonic’ instrumentation consisting primarily of flute,
strings and harp – a classic cliché for expressing pastoral
settings in Western musical tradition. The darker and more
hostile urban City scene, as well as the more high-tech and
mysterious Space scene, are portrayed using electronic
instruments. In the two latter scenes the register is also generally
lower, producing darker and more sombre sonorities than in the
brighter Lake scene. The basic tempi of the Space and Lake
scenes are kept relatively low, reflecting the tranquillity of these
situations – although the rhythmic activity in the Lake scene is
higher, maybe expressing the movements of the fluttering
butterflies. The tempo of the City scene is slightly higher,
although with a low rhythmic activity, which can be seen as
reflecting a higher degree of suspense. The more confined
locations of the Space and City scenes are portrayed by the use
of more reverb than the open air, and less dramatic, Lake scene.
The articulation of the music for the Lake scene is also shorter,
although not down to a full staccato, providing an airy quality
allowing more ‘breathing’ into the musical phrasings.
Not only the music, but the interweaving between different modes - in this case especially visuals and music - is what creates meaning in the multimodal ensemble (Kress, Jewitt, Ogborn & Tsatsarelis, 2001:25). REMUPP provides conditions for this kind of interweaving. In experiencing a narrative multimodal situation, there is a tendency for the audience or user to treat media music on a relatively subconscious and unreflecting level, since the visuals tend to achieve salience. Working with the REMUPP interface has made it possible to bring the music to the front, to make visible the implicit knowledge about musical narrative functions.
The results strengthen the assumption that high exposure to
media and its associated music contributes to the shaping of
knowledge and attitudes of media music. We learn, not only
through the 'multimodal texts' but also about the modes
themselves, from simply using media in informal situations.
This gives rise to questions about how learning takes place in
pronounced multimodal settings, how we become ‘multimodally
literate’ by using the various modes – and the role of music in
such situations.
REMUPP offers a potential for investigating a range of music-related issues from new angles, presenting alternatives when
compared to traditional test methods. Firstly, the non-verbal
nature of the interface allows for attaining types of data that are
difficult or impossible to access using verbal descriptions.
Secondly, the tool provides opportunities for exploring various
aspects of contextual relations, intra-musical as well as extra-musical. Thirdly, the participants' interaction and control of the
musical expression, allows for investigation of aspects of
creativity and establishes a deepened sense of agency for the
participant. The emphasis on interactivity and the high quality
music engine provides an environment resembling a computer
game, which enhances immersion and effectively works against
the otherwise potentially negative effects of the laboratory
situation.
The results were also affected by the participants’ gender,
musical backgrounds and individual habits of music listening
and media use. A general trend was that participants with a
higher level of media use (spending much time playing
computer games or watching movies) also exhibited a higher
awareness of (and conformity to) musical narrative conventions.
A more detailed discussion of these results is found in
Wingstedt, Brändström and Berg (2005) and Wingstedt (2005).
The above mentioned results are drawn from the statistical
material. However, at this point the statistical material can
mainly indicate answers to the ‘what’ (what was being done)
questions. In upcoming papers, analyses of the interviews will
be presented – aiming to also contribute some answers to the
‘why’ questions and to include matters related to creative issues,
including choices made (conscious or intuitive) in order to
follow or deviate from narrative and expressional codes and
conventions.
In describing the REMUPP interface, emphasis has been put on
its use as an interactive non-verbal tool suited for research of
various aspects of musical experience. It should be noted
however, that the technical and musical concepts behind the
interface also offer a platform for other potential applications.
For example, the system provides a promising environment for
the creation and concept development of live interactive music
performances. Also, the technical as well as artistic concepts
developed can be thought of as an embryo for a ‘musical engine’
to be used for computer games and other interactive situations.
4 Conclusion
By taking charge of the possibilities offered by contemporary
interactive and narrative media, a new world of artistic and
creative possibilities is emerging – also for the participant in the
act of music traditionally thought of as the ‘listener’. It is an aim
of this project to serve as a platform for further studies towards
knowledge and understanding of the potentials and challenges
offered for music in the emerging communication media.
This interdisciplinary project has resulted in the development of
a theoretical groundwork concerning topics such as narrative
functions of media music, and artistic and practical strategies for
composition of interactive music – and also in development and
innovation of technical nature.
The various results gained in the study indicate the usefulness of
the REMUPP interface as a tool for exploring musical narrative
functions. In manipulating the musical parameter controls, the
participants achieve meaning through ‘musical actions’, which
is different from using language. For example, to just say that a
visual setting is ‘scary’ is not the same as expressing it
musically. To determine ‘scary’ by (for example) assigning a
low register, setting a certain degree of harmonic dissonance and
rhythmic activity, adding more reverberation and slowing down
the tempo, demands a higher degree of commitment than just saying the word.
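As a loose illustrative sketch (not part of the REMUPP system itself; the parameter names and value ranges below are invented for the example), such a 'musical action' could be thought of as assigning values to a small set of musical parameters:

# Hypothetical sketch of expressing 'scary' through parameter settings
# rather than through words; names and scales are assumptions, not REMUPP's.
scary_setting = {
    "register": "low",         # low pitch register
    "dissonance": 0.8,         # high degree of harmonic dissonance (0..1)
    "rhythmic_activity": 0.7,  # fairly dense rhythmic activity (0..1)
    "reverb": 0.9,             # plenty of reverberation (0..1)
    "tempo_bpm": 60,           # slow tempo
}

def describe(setting):
    # A real-time music engine would map each assignment onto the running score.
    for name, value in setting.items():
        print(f"set {name} to {value}")

describe(scary_setting)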
Acknowledgements
Thank you to Professor Sture Brändström and Senior Lecturer
Jan Berg, at the School of Music in Piteå, for their engagement,
help and inspiration throughout this project. Thanks also to Mats
Liljedahl (programming) and Stefan Lindberg (composing the
basic musical material) at the Interactive Institute, Studio Sonic
in Piteå – and to Jacob Svensson (3D graphics) at LTU
Skellefteå – for all the work involved in developing REMUPP.
Not only the music, but the interweaving between different
modes – in this case especially visuals and music – is what
creates meaning in the multimodal ensemble (Kress, Jewitt,
Ogborn & Tsatsarelis, 2001:25). REMUPP provides conditions
for such kind of interweaving. In experiencing a narrative
multimodal situation, there is a tendency for the audience or user
to treat media music on a relatively subconscious and
References
Berg, J. and Wingstedt, J. (2005). ‘Relations between Musical
Parameters and Expressed Emotions – Extending the Potential
of Computer Entertainment’. In Proceedings of ACM SIGCHI
International Conference on Advances in Computer
Entertainment Technology, ACE 2005, Valencia, Spain, 15-17
June.
Wingstedt, J., Brändström, S. and Berg, J. (2005). Young
Adolescents’ Usage of Narrative Functions of Media Music by
Manipulation of Musical Expression. Manuscript submitted for
publication.
Wingstedt, J., Liljedahl, M., Lindberg, S. and Berg, J. (2005).
‘REMUPP – An Interactive Tool for Investigating Musical
Properties and Relations’. In Proceedings of The International
Conference on New Interfaces for Musical Expression, NIME,
Vancouver, Canada, 26-28 May.
Berg, J., Wingstedt, J., Liljedahl, M. and Lindberg, S. (2005).
‘Perceived Properties of Parameterised Music for Interactive
Applications’. In Proceedings of The 9th World MultiConference on Systemics, Cybernetics and Informatics WMSCI,
Orlando, Florida, 10-13 July.
Buttram, T. (2004). “Beyond Games: Bringing DirectMusic into
the Living Room”, in DirectX 9 Audio Exposed: Interactive
Audio Development, ed. T. M. Fay. Plano, Texas: Wordware
Publishing Inc.
Cook, N. (1998). Analysing Musical Multimedia. Oxford, UK:
Oxford University Press.
Cook, T.D. and Campbell, D.T. (1979). Quasi-Experimentation:
Design & Analysis Issues for Field Settings. Boston, MA:
Houghton Mifflin Company.
Gabrielsson, A. and Lindström, E. (2001). “The Influence of
Musical Structure on Emotional Expression”, in Music and
Emotion: Theory and Research, eds. P.N. Juslin and J.A.
Sloboda. Oxford, UK: Oxford University Press.
Jewitt, C. and Kress, G. (eds.) (2003). Multimodal Literacy.
New York: Peter Lang Publishing.
Juslin, P.N. (2001) “Communicating Emotion in Music
Performance: A Review and Theoretical Framework”, in Music
and Emotion: Theory and Research, eds. P.N. Juslin and J.A.
Sloboda. Oxford, UK: Oxford University Press.
Kress, G. (2003). Literacy in the New Media Age. Oxon, UK:
Routledge.
Kress, G., Jewitt, C., Ogborn, J. and Tsatsarelis, C. (2001).
Multimodal Teaching and Learning: The Rhetorics of the
Science Classroom. London: Continuum.
Small, C. (1998). Musicking: The Meanings of Performing and
Listening. Middletown, CT: Wesleyan University Press.
Tagg, P. and Clarida, B. (2003). Ten Little Title Tunes. New
York, NY: The Mass Media Music Scholars’ Press.
Wingstedt, J. (2004). ‘Narrative Functions of Film Music in a
Relational Perspective’. In Proceedings of ISME – Sound
Worlds to Discover, Santa Cruz, Teneriffe, Spain, 14-16 July.
Wingstedt, J. (2005). Narrative Music: Towards an Understanding of Musical Narrative Functions in Multimedia,
(Licentiate thesis). School of Music, Luleå University of
Technology, Sweden.
Wingstedt, J., Berg, J., Liljedahl, M. and Lindberg, S. (2005).
‘REMUPP – An Interface for Evaluation of Relations between
Musical Parameters and Perceived Properties’. In Proceedings
of ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, ACE 2005, Valencia, Spain, 15-17 June.
On the Functional Aspects of Computer Game Audio
Kristine Jørgensen, Ph.D. student,
Section of Film & Media Studies, Copenhagen University,
[email protected]
Abstract: What is the role of computer game audio? What formal functions does game audio have? These are central questions in
this paper, which seeks to provide an overview of the functionalities of sound in games. Based on a concluding chapter of my Ph.D. dissertation in which formal functions are identified and discussed, this paper sums up some of the most crucial points of my current Ph.D. research on the functionality of sound and music in computer games. The research shows that game audio has
important functions related to actions and events in the game world, and also related to the definition and delimitation of spaces in
computer games.
1 Introduction
This paper concerns the role of computer game audio, and seeks to provide an overview of important functionalities that can be
identified in modern games. The present discussions are based
on findings in my Ph.D. research and demonstrate a range of
different, but related functions of game audio. These are connected to usability, mood and atmosphere, orientation, control
and identification. An important prerequisite for understanding
the functions that computer game audio has is seeing computer
games as dual in the sense that they are game systems as well as
fictional worlds [1]. This means that game audio has the overarching role of supporting a user system while also supporting
the sense of presence in a fictional world.
The identification of these functions is based on my current
Ph.D. research that studies computer game sound and music
with focus on the relationship between audio and player action
as well as events in games. The study is based on theories about
film sound and music [2, 3], auditory display studies [4, 5, 6, 7],
and qualitative studies of game audio designers and computer
game players. The theoretical and empirical perspectives have
together provided the understanding of game audio functionality
presented in this paper. However, since my project has focussed
on two specific games, namely Io Interactive’s stealth-based
action game Hitman Contracts (2004), and Blizzard’s real-time
strategy game Warcraft III (2002), it is likely that additional
functions may be discovered when studying games within other
genres. Still the results presented in this paper are diverse, because of the great difference in genre and audio use in the two
games in question. However, this paper will also draw on examples from other games.
Film theory traditionally distinguishes between diegetic and extradiegetic sound. Diegetic sound is that which has a perceived
source in the film universe, and which the fictional characters
consequently are able to hear. Extradiegetic sounds, on the other hand, are sounds that are part of the film, but which do not seem
to have a physical source within the film universe. Thus, extradiegetic sounds cannot be heard by the fictional characters and
communicate to the audience by contributing to the mood or
drama within the film [2, 10].
However, in computer games, extradiegetic sound often has a
different informative role since the player may use information
available in extradiegetic sound when evaluating his choice of
actions in the game world. In effect, this means that extradiegetic sound has the power to influence what happens in a
game, while it does not have this power in a film. An example of
this is the use of adaptive music in games: when a certain piece
of extradiegetic music starts playing when the avatar is riding in
the forest in The Elder Scrolls IV: Oblivion (Bethesda 2006), the
player knows that a hostile creature is on its way to attack, and
s/he may either try to evade the creature, or stop to kill it. In
comparison, when the special shark theme appears when someone is swimming in the thriller film Jaws (Spielberg 1975), the
spectator can only watch as the character knows nothing of the
approaching danger.
2 Theoretical Background
As noted above, understanding the functionality of game audio
is connected to understanding the dual origin of computer games
as 1) game systems that focus on usability, and 2) fictional
worlds that focus on the sense of presence in the game environment. When talking about usability in relation to the game system, I want to emphasise that sound has the role of easing the
use of the system by providing specific information to the player
about states of the system. This idea is supported by auditory
display-related theories.
When talking about the sense of presence in a fictional world, I want to point out that most modern computer games are set in virtual environments that depict fictional, virtual worlds. In this context, fictional world should be understood as an imaginary, hypothetical world separate from our own which the players are asked to believe in when playing a computer game. The player must also accept that the fictional world is the frame of reference for what happens in the game. A fictional world may depict a setting that has no real world counterpart and in which non-existent features are present, or it may depict a setting which has a real world counterpart but presents hypothetical events and features. An example of the first is Warcraft III's fantasy world Azeroth that features the existence of dragons, orcs and magic, and an example of the latter is Hitman Contracts' world that is very similar to our own by featuring settings called Amsterdam and Belgrade, but in which the main character and his enemies were never existing persons. In both contexts, sound is used to emphasise the fictional world by being connected to sound-producing sources in a similar manner to real world sounds and by contributing to the atmosphere and the dramatic developments in this world. This point is supported by theories of film sound and music.
Diegetic sounds in computer games may also have a different role than diegetic sounds in films. When the avatar produces the line "I cannot attack that" when the player uses the attack command in World of Warcraft (Blizzard 2004), this is of course a system message, but it also seems that the avatar itself is speaking directly to the player. In this sense, the illusion of the fictional universe is broken because a fictional character is
addressing an entity situated outside the game universe. However, when the traditional concepts of diegetic and extradiegetic
spaces seem to break down in games, I call the sounds transdiegetic [11, 12]. It should be noted that transdiegetic sounds are
consciously utilized in computer games, where they have a clear
functional, usability-oriented role. This will be demonstrated in
the following.
Auditory display studies are concerned with the use of sound as
a communication system in physical and virtual interfaces, and
the field derives from human-computer interaction studies and
ecological psychoacoustics. This field utilizes sound as a semiotic system in which a sound is used to represent a specific
message or event.
3 Different Functions
Related to the above theoretical assumptions, this part of the
paper will discuss five different overarching functions that have
been disclosed during my research. As noted above, the identification of these functions is based on analyses, interviews and
observations related to two specific games, and it is likely that
the study of more games will reveal additional functions.
Auditory display studies often distinguish between two kinds of signals called auditory icons and earcons. Auditory icons are
characteristic sounds based on a principle of similarity or direct
physical correspondence and which can be recognized as sounds
connected to corresponding real world events; while earcons are
symbolic and arbitrary sounds such as artificial noises and music
which may be seen as abstract in the sense that they cannot
immediately be recognized [5, 6, 8, 9]. This separation between
two types of signals also applies to computer game audio. When
using sound as an information system, computer games utilize
both auditory icons and earcons. Broadly speaking, auditory
icons are used in connection with all kinds of communicative
and source-oriented diegetic sounds, while earcons are used in
connection with extradiegetic music and interface-related
sounds. In general, these terms are used for non-verbal audio,
but in the case of computer games there is an exception to this.
When voices are used in order to identify a human source and
not for their semantic qualities [3], the voice does not present
detailed linguistic information and may be used in a similar
manner to other object-related sounds. Examples of auditory
icons in games are the sound of a gun shot, the sound of enemies
shouting, and the sound of footsteps on the ground, while examples of earcons are the use of music to signal hostile presence, a
jingle playing when the avatar reaches a new level in an
MMORPG, and the sound playing when Super Mario is jumping.
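As an illustrative aside (not part of the original paper), this distinction can be captured in a small data structure; the sound names and fields below are invented for the example:

# Hypothetical tagging of game sounds as auditory icons or earcons.
# "relation" records whether the sound maps to its referent by physical
# similarity (icon) or by an arbitrary, learned convention (earcon).
GAME_SOUNDS = [
    {"name": "gun_shot",        "type": "auditory_icon", "relation": "physical similarity"},
    {"name": "footsteps",       "type": "auditory_icon", "relation": "physical similarity"},
    {"name": "hostile_music",   "type": "earcon",        "relation": "learned convention"},
    {"name": "level_up_jingle", "type": "earcon",        "relation": "learned convention"},
]

def is_immediately_recognizable(sound):
    # Auditory icons can be recognized without prior learning; earcons cannot.
    return sound["type"] == "auditory_icon"

for s in GAME_SOUNDS:
    print(s["name"], is_immediately_recognizable(s))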
3.1 Action-Oriented Functions
This research has identified uses of game audio which relate to
events and player actions in the game world, and which correspond to auditory display studies' urgency and response functions. Most modern games utilize sound for these purposes to an
extensive degree, although it is not always evident that this is
the formal and intended function of the sound. It seems to depend on how auditory icons and earcons are used.
Hitman Contracts integrates auditory icons as naturally occurring sounds from events in the environment. In this sense, the
communicative role of the sounds becomes transparent by giving the impression that sounds are present for a realistic purpose
instead of a functional purpose. For instance, when the avatar is
in a knife fight, sound will be a good indicator of whether he
hits or not. When the avatar hits, the slashing sound of a knife
against flesh will be heard, accompanied by screams or moans
from the enemy, and when the avatar misses, the sound of a
knife whooshing through the air is heard. These are of course
examples of a confirmation and a rejection response to player
actions, and work as a usability feature although they also seem
natural to the setting and the situation.
Concerning the purpose of auditory signals, studies of auditory
display often speak of two central functions. These may be
described as urgency and response functions. Urgency signals
are proactive in the sense that they provide information that the
user needs to respond to or evaluate shortly. Urgency signals are
often alarms and other alerts pointing towards emergency situations, and may be separated into different priority levels based
on whether they demand immediate action or evaluation only
[7]. Response signals, on the other hand, are reactive, and work
to inform the user that a certain action or command has been
registered by the system. In order to be experienced as responses, the sound must appear immediately after the player
has executed a command or an action, and it must be clearly
connected to a specific event [4, 5]. In a game, an urgency message may be the voiceover message “our forces are under attack” in Warcraft III, while a response message may be the
sound of a mouseclick when selecting a certain ability from the
interface menu in the same game.
However, it is also possible to use auditory icons in a less transparent manner, in which the auditory icons more clearly stand
out as auditory signals intended for communicating specific
messages. In Warcraft III, objects produce specific sounds when
manipulated. For instance, when the player selects the lumber
mill, the sound of a saw is heard. Also, when the barracks is
selected, the player hears the sound of marching feet. Although
these responses have diegetic sources, the sounds do not seem
natural to the game world in the same manner as the knife
sounds in Hitman Contracts. The reason for this is that they are
produced only when the player selects the specific building, and
in the case of the barracks, this is not the exact sound one expects to hear at a real-world barracks. We see that the sound is
suitable for the specific object, although not in this precise
format. According to Keller & Stevens [6], this demonstrates
non-iconic use of auditory icons, while the example from Hitman Contracts demonstrates iconic use of auditory icons. This
difference also emphasises the fact that sounds with a seemingly
naturalistic motivation do have usability functions.
Together these concepts form a fruitful framework for understanding why computer game audio is realized the way it is, and they also provide an understanding of the different functions that game audio may be said to have. The response and urgency functions explain game audio in terms of the usability of a computer system. In addition, the concept of the auditory icon explains how sounds that seem natural to the game universe also have strong informative value, while the concept of the earcon provides an understanding of why game music and artificial noises make meaning without disturbance in computer games. More importantly, these ideas also help explain why there are transdiegetic sounds in computer games. When a game developer wants to utilize sound for urgency and response purposes, while also maintaining a direct link to the game universe, it becomes necessary to break the border between real world space and virtual space in order to enable communication between the player and the game world.
Concerning the use of earcons for response purposes, Hitman
Contracts has music that informs the player whether his/her
current activities are going well or badly. The music changes
into a combat theme which will play a particular piece of music
if the player is doing well, and another if the player is doing
badly. However, although this follows the idea of earcons, this
use of music is also adopted from the use of dramatic music in
films. This makes the use of musical earcons feel familiar and
suitable even though it does not feel natural to a specific setting.
Both earcons and auditory icons are used for urgency purposes
in computer games. Although a range of different priority levels
may be identified in games, I will limit myself to the two most
common. Games often distinguish between urgency signals that
work as notifications that do not demand immediate player
action; and urgency signals that work as warnings that demand
some kind of action. Notifications provide information about
events in the environment that the player needs to know about,
but which s/he does not have to react to. S/he may, however,
need to evaluate the situation. An example from Warcraft III is
the message “work complete” which is played when a worker
has finished its task. Warnings, on the other hand, provide information about immediate threats or dangers to the player.
These will always need an immediate evaluation, and possibly
action, but dependent on the situation, the player may choose to
not take any action if he regards the situation as being under control. An
example is the message “our forces are under attack” which is
played in Warcraft III when the player’s units are being attacked
by the enemy.
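Purely as a sketch of how such priority levels could be modelled in code (the event strings and handler below are hypothetical, not taken from any actual game):

from enum import Enum

class Urgency(Enum):
    NOTIFICATION = 1  # informs the player, no action required ("work complete")
    WARNING = 2       # demands immediate evaluation ("our forces are under attack")

def handle_urgency_signal(event, urgency):
    if urgency is Urgency.WARNING:
        # The player must at least evaluate the threat, and possibly act.
        print(f"WARNING: {event} - evaluate now, act if the situation is not under control")
    else:
        # Notifications can be acted upon later, or simply noted.
        print(f"Notification: {event}")

handle_urgency_signal("our forces are under attack", Urgency.WARNING)
handle_urgency_signal("work complete", Urgency.NOTIFICATION)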
Observations and conversations with players reveal that the
engagement in the game may decrease when the sound is removed from the game. Players notice that the immersion decreases, and that the fictional world seems to disappear and that
the game is reduced to rules and game mechanics when sound is
removed.
3.2 Atmospheric Functions
Working in a more subtle manner, the atmospheric functions of
game audio may still be regarded as one of the most central. The
use of music in films for emotionally engaging the audience is
well known [12], and games try to adopt a similar way of using
music. Most mainstream games utilize music to emphasise
certain areas, locations and situations.
3.3 Orienting Functions
The orienting functions of game audio are related to action-oriented functions in the sense that both provide information
about events and objects in the game environment although in
different ways. While the action-oriented functions are reactive
and proactive, the orienting functions inform about the presence
and relative location of objects and events. The functions described in this section were identified in my qualitative research
where player performance was studied in the absence and presence of game audio.
An example is a game such as World of Warcraft, where the
large cities have distinct music. When entering the orcish capital
of Orgrimmar, the player hears a certain piece of music start, dominated by war drums. This music is distinct from the
music heard when entering the human capital of Stormwind,
which has a more Wagnerian epic style. In both cases, the music
is there as a mood enhancer that emphasises classical fantasy
conventions of the noble humans and the savage orcs. In this
context, it is important to point out that the atmospheric function of music is guided by genre conventions. In survival horror games
such as the Silent Hill series (Konami 1999-2004), atmospheric
sound and music are used to emphasise a very specific mood of
anxiety and horror. However, it should be noted that this mood
also has the power to influence the player’s behaviour in the
game. When the player becomes anxious he may act more carefully in order to avoid any dangerous enemies and unpleasant
situations. In this sense, atmospheric sound may thus work
indirectly to influence player action.
In connection with the orienting functions of game audio, it is
important to note that sound seems to extend the player’s visual
perception beyond what is possible without sound. In the presence of sound, the player receives information that the visual
system cannot process, such as for instance events and objects
situated outside the line of sight. It also enables the player to
know what is going on in locations not in the immediate vicinity
of the player.
The perhaps most obvious orienting function of sound is that it
provides information about the presence of objects as well as the
direction of sound sources. This is especially important in the
context of offscreen sources. Sound may thus reveal the appearance and presence of an object before the player has actually seen it, and therefore provides information that the visual system could not provide on its own. A good example is the shouting voices of offscreen guards in Hitman Contracts. Today's computer games utilize the stereo channels to inform the player about the relative direction of a sound source. However, although a stereo sound system does reveal the relative direction of a certain source, it is not able to provide information on whether the source is located in front of or behind the player. True surround systems demonstrate significant possibilities for providing detailed information about the location of an offscreen source, and prove to be interesting for the further development of game audio functionality. These orienting functions are also demonstrated in the research on audio-only games for the blind and visually impaired. This research demonstrates the use of characteristic sounds that identify objects and events, their presence, and their relative location [4, 5, 13].
Atmospheric sounds may also influence player behaviour in
more direct manners. When music is used for responsive and
urgency purposes, it will also have atmospheric properties. In
the example from Hitman Contracts above, we see that different
pieces of music provide different kinds of information to the
player. The music does not only work as pure information, it
also emphasises mood. For instance, when the player is in a
combat situation, the music becomes more aggressive by an
increased tempo and a more vivacious melody.
Although music may be the more persuasive kind of atmospheric sound, environmental and object sounds as well as dialogue also contribute to the specific mood of a game or a situation. The overall soundscape contributes to a sense of presence or even immersion in a game by creating an illusion of the game world as an actual space. Sound may thus give the impression of a realistic space by presenting virtual offscreen sources. In this context, ambient environmental sound is of interest. Ambience should be understood as environmental background sounds added to the game for the purpose of adding a sense of presence and a specific mood to the game. Thus, these sounds are not present in order to influence player action by giving the player specific information about objects, events or situations, and they are often not connected to specific sources in the game. Instead they may be connected to virtual sources, or be collected into a separate soundtrack. The first technique is found in Lineage II (NC Soft 2004), where for instance insects can be heard in each bush. When looking for the actual sources, however, these cannot be found as visual objects. The second technique is found in Sacred (Ascaron 2004), where the ambient background noise for each setting is stored as a separate mp3-file. Thus, when the player is exploring dungeons, a specific soundtrack consisting of reverberated wind and running water is played, while when the player visits villages, the sounds of children laughing and dogs barking are heard.
3.4 Control-Related Functions
Tightly connected to the orienting functions are control-related
functions. These are related to the idea that sound extends visual
perception, and point to what sound directly contributes to concerning the player’s control over the game environment.
Since game audio extends visual perception, it enables the
player to be in control over unseen areas. Strategy games often
provide good examples of this. In the real-time strategy game
Warcraft III, the player receives auditory information about
events happening on remote areas of the map. When the player
is busy leading his/her army to battle, s/he still receives voiceover messages about status of the base, such as “upgrade complete” and “ready to work”. These messages contribute to increased control over all activities in the game. The same game
also utilizes sound to provide the player with more detailed
information than what visuals can provide. Combat situations in
this game tend to be chaotic due to the fact that there is a huge
number of military units fighting on each side. It is therefore
difficult for the player to see exactly what happens in combat.
The sounds of bowstrings and metal against metal inform the
player what units are fighting, and screams tell the player that
units are dying. In this example we see that sound contributes to
ease the player’s management of the game by providing information that is difficult to provide by visuals only.
4 General Discussion
The functions identified above are closely related to each other
although they seem to stem from different aspects of games.
Most of the functions seem to be motivated by usability, although the atmospheric function seems to go against this by
emphasising presence and immersion into the fictional game
world. These two seemingly different purposes of game audio
are connected to the fact that computer games are user systems
at the same time as they are set in fictional worlds. However, it
is important to note that computer games also bridge these two
domains, something which also becomes evident through their
use and implementation of audio.
How, then, does this fusion of user system and fictional world
happen? To put it bluntly, it happens by giving many
sounds a double function where they belong to in-game sources
and are accepted as fiction at the same time as they provide
specific information to the player. We can identify three central
techniques that ensure that this merge seems transparent and
intuitive; namely the use of auditory icons, earcons, and transdiegetic sounds.
These examples are also related to the idea that the presence of
sound eases or increases the player’s attention and perception.
This was suggested by the informants of my study, who emphasised the idea that channel redundancy, or presenting the same
information through different perceptual channels, increased the
ability to register certain messages [14]. When sound was absent, Warcraft III players had difficulties noticing written messages appearing on the screen. This is probably due to the high
tempo of the game, and the fact that the player’s visual perception is focussed on specific tasks in the game.
Since auditory icons have an immediately recognizable relation to their source, they are very well suited for combining the
usability function with the fictional world. The sounds seem
natural to the game environment, at the same time as they provide the player with information relevant for improved usability
of the system. This is what prevents the sounds from the buildings in Warcraft III from seeming misplaced.
3.5 Identifying Functions
Another interesting function connected to sound is its ability to
identify objects and to imply an object's value. The fact that
sound identifies may not seem surprising, since sound in general
indicates its producing source. However, this is utilized in
games, not only in the format of auditory icons that automatically are recognized, but also in the format of earcons that need
to be learned before they can be recognized as belonging to a
specific source. We have already discussed the example from
Hitman Contracts where music is used to identify certain situations.
Earcons may be said to work the other way around, since they
illustrate an artificially constructed relation between sound and
source. The use of artificial noises may contribute to a certain
auditory message becoming very noticeable or even disturbing
because of its unexpected relation to a certain source, such as is
the case with the squeaking negative response produced when
the player tries to make an illegal action in Warcraft III. On the
other hand, the use of game music does not seem disturbing
because it utilizes accepted conventions from film music and
adds mood to the game. This is why the player accepts music
which changes according to the situation in a game such as
Hitman Contracts, and which plays in a major key when the player is doing well and in a minor key when the player is doing badly.
Warcraft III connects identifying sounds to units and buildings. From the player's top-down view on the environment it may be difficult to distinguish objects from each other. However, as noted above, when the player selects the lumber mill, s/he will hear the sound of a saw, and when s/he selects the barracks, s/he hears the sound of marching feet. This enables the player to easily recognize the building without having a clear view of it. In the case of units, each of them presents an utterance of recognition when produced and when manipulated. This means that a worker says things such as "ready to work", while a knight says "ready for action". However, it is interesting to see that these utterances not only identify the unit; they also signal the relative value of it. This means that the more powerful a unit is, the more distinct its sound of recognition is. Within Warcraft III's orc team, the workers utter sentences that suggest obedience and humbleness such as "work, work", "ready to work" and "be happy too". The named warchiefs, which represent the most powerful units in the game, on the other hand, utter sentences such as "I have an axe to grind", "for my ancestors", and "an excellent plan", which emphasise aggressiveness, honour, and strategic insight. In addition, their voices are deeper than the voices of other units, and their footsteps sound heavier. Thus, we see that the quality and content of the sound are used in order to ease recognition of certain objects in the game as well as to signal the value of different units.
The third technique that makes the fusion between usability and
presence in a fictional world transparent is transdiegetic sounds.
Transdiegetic sounds break the conventional division between
diegetic and extradiegetic sounds by either having diegetic
sources that communicate directly to the player, or by being
extradiegetic sounds that game characters virtually can hear.
When sound in films breaks this common separation between
diegesis and extradiegesis, it is understood as a stylistic, artistic
and uncommon way of using sound, but games utilize this functionally to bind together usability and fictional space. This
means that it does not feel disturbing when a unit in Warcraft III
says “What do you want?” with direct address to the player,
although the unit is regarded as a fictional character and the player
who has no avatar in the game is situated in real world space.
Neither does it seem strange that the avatar as a fictional character in The Elder Scrolls IV: Oblivion (Bethesda 2006) reacts by
drawing its sword when the musical theme that suggests nearby
danger starts playing – although a film character would not react
in this way, a game character can due to the link between avatar
and player.
In this sense, computer game audio aims to combine usability
with presence and immersion in the fictional game world, and
by doing this the realization and functionality of game audio
becomes in different ways similar to both film audio and auditory displays and interfaces. This creates a unique way of
utilizing audio which is especially designed to emphasise how
modern computer games work.
5 Summary
As a summary of the concluding chapter of my upcoming Ph.D.
thesis on the functionality of game audio in relation to actions
and events, this paper has concerned computer game audio
functionality. The paper identifies and describes the most important functions of computer game audio and provides an explanation of why these functions are central to computer game audio.
The main argument is that modern computer games are set in
fictional, virtual worlds at the same time as they are user systems, and in order to combine this in the most transparent way,
they break the common concept of diegesis by utilizing auditory
icons and earcons for informative purposes.
6 References
[1] Juul, Jesper, Half-Real. Video Games Between Real Rules
and Fictional Worlds. Copenhagen: IT University of Copenhagen, (2003).
[2] Branigan, Edward, Narrative Comprehension and Film.
London, New York: Routledge, (1992).
[3] Chion, Michel, Audio-Vision. Sound on Screen. New York:
Columbia University Press, (1994).
[4] Drewes, Thomas M. & Elizabeth D. Mynatt, “Sleuth: An
Audio Experience”, Proceedings from ICAD 2000. Available:
http://www.cc.gatech.edu/~everydaycomputing/publications/sleuth-icad2000.pdf
[03.06.2005],
(2000).
[5] Friberg, Johnny & Dan Gärdenfors, “Audio Games: New
Perspectives on Game Audio", Proceedings from ACE conference 2004. Available: www.cms.livjm.ac.uk/library/AAAGAMES-Conferences/ACM-ACE/ACE2004/FP18friberg.johnny.audiogames.pdf [02.08.06], (2004).
[6] Keller, Peter & Catherine Stevens, "Meaning From
Environmental Sounds: Types of Signal-Referent Relations and
Their Effect on Recognizing Auditory Icons”, in Journal of
Experimental Psychology: Applied. Vol. 10, No. 1. American
Psychological Association Inc., 3-12, (2004).
[7] Sorkin, Robert D., "Design of Auditory and Tactile Displays", in Salvendy, Gavriel (ed.): Handbook of Human Factors. New York, Chichester, Brisbane, Toronto, Singapore: John Wiley & Sons, 549-576, (1987).
[8] McKeown, Denis, "Candidates for Within-Vehicle Auditory Displays", Proceedings of ICAD 05. Available: http://www.idc.ul.ie/icad2005/downloads/f118.pdf [10.04.06], (2005).
[9] Suied, Clara, Patrick Susini, Nicolas Misdariis, Sabine Langlois, Bennett K. Smith, & Stephen McAdams, "Toward a Sound Design Methodology: Application to Electronic Automotive Sounds", Proceedings of ICAD 05. Available: http://www.idc.ul.ie/icad2005/downloads/f93.pdf [10.04.06], (2005).
[10] Bordwell, David & Kristin Thompson, Film Art: An Introduction. New York: McGraw-Hill, (1997).
[11] Jørgensen, Kristine, "On Transdiegetic Sounds in Computer Games", Northern Lights 2006, Copenhagen: Museum Tusculanums Forlag, (2006).
[12] Gorbman, Claudia, Unheard Melodies? Narrative Film Music, Indiana University Press, (1987).
[13] Röber, Niklas & Maic Masuch, "Leaving the Screen. New Perspectives in Audio-Only Gaming", Proceedings of ICAD-05. Available: http://www.idc.ul.ie/icad2005/downloads/f109.pdf [02.08.06], (2005).
[14] Heeter, Carrie & Pericles Gomes, "It's Time for Hypermedia to Move to Talking Pictures", Journal of Educational Multimedia and Hypermedia, winter, 1992. Available: http://commtechlab.msu.edu/publications/files/talking.html [03.08.06], (1992).
Composition and Arrangement Techniques for Music in Interactive Immersive
Environments
Axel Berndt, Knut Hartmann, Niklas Röber, and Maic Masuch
Department of Simulation and Graphics
Otto-von-Guericke University of Magdeburg
P.O. Box 4120, D-39016 Magdeburg, Germany
http://games.cs.uni-magdeburg.de/
Abstract. Inspired by the dramatic and emotional effects of film music, we aim at integrating music seamlessly into interactive immersive applications — especially in computer games. In both scenarios it is crucial to synchronize their visual
and auditory contents. Hence, the final cut of movies is often adjusted to the score or vice versa. In interactive applications,
however, the music engine has to adjust the score automatically according to the player’s interactions. Moreover, the musical
effects should be very subtle, i. e., any asynchronous hard cuts have to be avoided and multi-repetitions should be concealed.
This paper presents strategies to tackle the challenging problem of synchronizing and adapting game music to non-predictable player interaction behaviors. In order to incorporate expressive scores from human composers we extend traditional composition and arrangement techniques and introduce new methods to arrange and edit music in the context of
interactive applications. Composers can segment a score into rhythmic, melodic, or harmonic variations of basic themes, as
known from musical dice games. The individual parts of these basic elements are assigned to characterize elements of the
game play. Moreover, composers or game designers can specify how player interactions trigger changes between musical
elements. To evaluate the musical coherency, consistency, and to gain experience with compositional limitations, advantages
and possibilities, we applied this technique within two interactive immersive applications.
1 Introduction
In movies and theater, directors and composers employ musical elements and sound effects to reinforce the dramatic and emotional effects of pictures: scores can bring a new level of content and coherency into the story-line, can invert the picture's statement, or can insert elements of doubt or parody. However, these effects have to be very subtle as they are intended to be perceived subconsciously. Only in this way — through by-passing the process of concentrated listening and intellectual understanding — can the music establish its emotional power (cf. [10, pg. 22ff]).
In the post-processing of movies, directors, cutters, and composers cooperate to intensify their emotional impact. One — very critical — aspect of this procedure is the careful synchronization of the visual and auditory contents. Usually, the final cut of movies is done according to the underlying score.1 Hence, musical structures become a pattern for scene cuts and transitions. Often, pictures and music seem to be of a piece: scene and content transitions (actions and events within a scene) melt into the music. In the opera this is even more extreme: nothing happens without a musical trigger. This is only possible due to the static nature of these linear media: all transitions are known and have been adjusted to the score.
1 Schneider describes this from the composer's point of view [19] and Kungel goes deep into the practical details, considering also the cutter's concerns [10].
In interactive immersive applications such as computer games, however, the music engine has to adjust the score automatically according to the player's interactions. Very often pre-composed musical elements or pre-arranged sound effects are triggered by some elements of the game play. This problem is intensified by the asynchrony between game elements and player interactions: very often the music engines of computer games simply disrupt the currently playing music regardless of its musical context in order to start the next piece of music. These hard cuts destroy inner musical structures that we are used to hearing and thus eventually break the game's atmosphere. Because of cultural typification and the preparatory training of the listener, he perceives music with certain listening habits and registers musical structure even unconsciously. That is why humans would recognize hard cuts even while hearing a piece of music for the very first time. In addition, the succeeding music requires at least a few seconds to evolve its own atmosphere — leaving an atmosphere-less bald spot. All these factors lower the immersion of the user into the virtual world, which is particularly dangerous in application domains where music is specifically used to intensify that immersion.
To solve this antagonism between static musical elements within dynamic interactive environments one may be tempted to formalize music composition and delegate the creation of background music to automatic real-time generators. Even though the development of automatic composition systems has been one of the first challenges tackled by researchers within the field of artificial intelligence (see for instance Hiller's and Isaacson's algorithmically composed string quartet "Illiac Suite" [8] and the overview articles [16, 4]), the quality problem is apparent. Despite the long research tradition, the majority of these systems are specialized to a single musical style (e. g., chorales in the style of Johann Sebastian Bach [5, 6]) or produce more or less pseudo-random tunes (due to the lack of high-level musical evaluation criteria for optimization methods [3, 7, 13, 11] or machine learning techniques [21]; due to the application of stochastic methods such as Markov chains [8]). Another conflict with our intention to integrate expressive musical elements in interactive immersive applications results from a major strategy of research in computer music: researchers manually extract rules or constraints from textbooks on music theory or develop algorithms which automatically
extract them from corpora of compositions and use them to generate or evaluate new tunes. But as any composer will notice,
a pure adherence to these rules neither guarantees vivid compositions nor do great compositions follow rules in all respects. An
agile musical practice constantly defines new musical patterns and
breaks accepted rules in order to achieve expressiveness and a
novel musical diction. Therefore, we decided not to replace the
human composer by an automatism. The music should still be
written by an artist, but it has to be composed in a way that it is
re-arrangeable and adaptable, so that the game's music can be adapted to non-predictable player interaction behaviors.
The following sections describe one way to achieve this goal.
The compositional roots of our approach are introduced in Sec. 2.
Sec. 3 describes a music engine which acts like a kind of real-time
arranger. Sec. 4 demonstrates the application of this technique
within two interactive immersive applications. Sec. 5 summarizes
this paper and motivates directions for future research.
Other musical arrangement techniques — abbreviations and
jumps — can help to overcome the second problem: the smooth
transition between musical elements. Sometimes composers or
editors include special marks into the score indicating that the performer can jump to other musical elements while omitting some
segments (e. g., da capo or dal segno). A few game scores use this
method to ensure that the music is not cut at any position but only
on these predefined ones (e. g., only on the barline like in Don
Bluth’s music for the game “Dragon’s Lair 3D”).
A non-compositional technique which is used quite often by
disc jockeys for transitions between different pieces of music or
sounds is the cross-fade. Cross-fading means that while the currently running music fades out, the next piece is started and fades in. During this time both pieces of music are audible. This poses a big problem: both pieces of music might not harmonize. In particular, differing tempi, rhythmic overlays, and dissonant tones might be quite confusing to the listener. Hence, one cannot cross-fade arbitrary musical pieces ad libitum.
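For illustration, the gain curves of such a cross-fade might be computed as follows; this is a minimal sketch assuming an equal-power fade, and it only shows the gain math — it cannot, of course, make two incompatible pieces harmonize:

import math

def crossfade_gains(progress):
    # progress runs from 0.0 (only the old piece) to 1.0 (only the new piece).
    # Equal-power curves keep the combined loudness roughly constant.
    gain_out = math.cos(progress * math.pi / 2)  # old piece fades out
    gain_in = math.sin(progress * math.pi / 2)   # new piece fades in
    return gain_out, gain_in

# Example: sample the fade at five points.
for step in range(5):
    print(crossfade_gains(step / 4))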
2 Compositional Roots
Compositional and arranging techniques as well as the musical
practice already offer a number of methods which can be used
to make music more flexible and to adjust the length of musical elements to a performance of unpredictable length. There are
two basic problems in the synchronization between musical and
performance elements: (i) the duration of a given musical element
might not suffice or (ii) the performance requires a thematic break
within a piece.
There are several strategies of composers and musicians which
are useful to tackle the first problem: to remain at some musical
idea while keeping the musical performance interesting.
The simplest solution is to extend the length of a musical element — the whole piece or just a segment of it can be repeated
or looped. This practice is used to accompany e. g., folk dances,
where short melodic themes are usually repeated very often. The
infinite loop is also known from the first video games and can still
be found in recent computer games. Examples are Super Mario Bros. and the games of the Monkey Island and Gothic series.
By exploiting the different timbres of the instruments in an orchestra, the instrumentation or orchestration opens up a second dimension. The composers of building set movements actually surpass the instrumentation technique [12]: their music can be performed by all notated parts at once or just by a few of them. Different combinations are possible and can be used to vary the performance of different verses. Nonetheless, every part-combination sounds self-contained and complete. This manner of composition has its roots in the baroque practice of rural composition. Baroque composers like Valentin Rathgeber (1682–1750) [17] wrote such music with reducible choirs and instrumentations for congregations with limited performance capabilities. Today this skill is nearly extinct.
In polyphonic music there is not always just one leading part with a melodic identity of its own. Heinrich Schütz (1585–1672) already taught his students to compose in multiple counterpoint, i. e., the parts can be interchanged [15]. Every voice has its individual identity. Hence the soprano can be played as a tenor, the bass as soprano and so on. Here every voice can be the upper one, can be "melody". Johann Sebastian Bach (1685–1750) demonstrates this impressively in his multi-counterpoint fugues.
Musical dice games show another way to bring more flexibility into the music. The basic principle is that composers create groups of interchangeable musical elements which are randomly selected during a performance. Johann Philipp Kirnberger's (1721–1783) [9] and Wolfgang Amadeus Mozart's (1756–1791) [14] dice games are based upon a common harmonic structure: all interchangeable group elements are short (just one bar) and are based on the same underlying harmonies. In order to "compose" a new piece, the player selects elements from sixteen melody groups by throwing dice. Jörg Ratai [18] extends this idea by exploiting chord substitutions and harmonic progressions, so-called jazz changes. The basic compositional principle is to replace elements within a harmonic context by appropriate substitutions (e. g., borrowed chords). While the basic blocks of Ratai's jazz-dice contain manually composed harmonic variations, Steedman [20] proposed an automatic system employing recursive rewriting rules. Even though Ratai's system is based on a simple 12-bar blues schema, it achieves an enormous harmonic variance.
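As a rough illustration of the dice-game principle described above (a sketch only: the group contents are placeholders, the two-dice mechanic is a simplifying assumption, and no actual historical table is reproduced):

import random

# Hypothetical dice-game arranger: each of the 16 bars of a piece is drawn
# from its own group of interchangeable one-bar elements that share the same
# underlying harmony, so any selection yields a coherent result.
NUM_BARS = 16
ELEMENTS_PER_GROUP = 11  # assumed here: outcomes 2..12 of two dice

def roll_two_dice():
    return random.randint(1, 6) + random.randint(1, 6)

def compose_piece(groups):
    # groups[i][j] is the j-th candidate bar for position i.
    piece = []
    for group in groups:
        index = roll_two_dice() - 2  # map 2..12 to 0..10
        piece.append(group[index])
    return piece

# Placeholder bar labels stand in for actual notated material.
groups = [[f"bar{i}_{j}" for j in range(ELEMENTS_PER_GROUP)] for i in range(NUM_BARS)]
print(compose_piece(groups))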
Computer games aim at immersing players into a three dimensional virtual environment. Interestingly, the multi-choir music
of some Renaissance composers already integrated the three-dimensionality of the real world in some way into their compositions.
The choirs (here this term also includes instrumental groups) were
separated by open ground and placed e. g., to the left, right, front
and/or back of the audience. One of the greatest and most famous representatives for this practice of composition and performance is the Venetian Giovanni Gabrieli (1556/57–1612). The
listener can have quite different musical experiences by changing
his own position in the room. He can hear the choirs playing alternately, communicating with and melting into each other bringing
the whole room to sound. This can also be considered as surround
music. Every choir has its own identity but all together build a
bigger musical sound-scape.
In their analysis, Adorno and Eisler [1] already pointed out that
traditional music structures do not work within the new medium
film, where the music (i) once had to become formally more open
and unsealed and where (ii) composers had to learn to follow rapid short-term scene structures. But while new compositional
techniques and musical forms have been successfully established
for film music, the development of compositional techniques for
music in interactive media is still in the beginning.
In modern computer games, the development of a non-linear arborescent background story adds another dimension to the three
dimensionality of virtual worlds. In contrast, music is only one-dimensional, with a fixed beginning and end where everything in between is predefined or pre-composed. The musical techniques
outlined above are able to add further dimensions to game music:
Figure 1: (a) Sound sources and their radii in the distributed music concept. (b) Passing music change marks triggers musical changes.
instrumentation changes, building sets, and the melodic independence of interchangeable parts within a counterpoint score can vary the sound impression of a single musical element, whereas musical dice games can even vary musical content; loops, abbreviations, and jumps can stretch and shorten the temporal disposition. All these techniques have to be combined in order to tackle the problematic synchronization between musical and performance elements. Moreover, the multi-choir manner offers a way to integrate a spatial dimension into the composition. Now we go a step further and introduce a way to actually gain four dimensions and consider user interactions.
3 A Music Engine as Real-Time Arranger
"Games may owe something to movies, but they are as different from them as movies are different from theater." This statement of Hal Barwood2 also applies to the music of computer games: the player's integration in a three dimensional virtual environment, a non-linear arborescent background story, and unpredictable user interactions prevent the direct application of traditional techniques or film music composition techniques and sound elements designed for movies in computer games.
The previous section revealed the main problem of a music engine in interactive immersive applications, especially in computer games: the automatic adaptation of the score according to non-predictable player interactions. Moreover, all game elements should be designed in a way that prevents losses of immersion. Hence, sound elements have to be integrated into the virtual world and all musical effects should be very subtle, i. e., any asynchronous hard cuts have to be avoided and multi-repetitions should be concealed.
This paper presents a new method to compose and arrange scores for interactive immersive applications. Composers can segment a score into rhythmic, melodic, or harmonic variations of basic themes, as known from musical dice games. The individual parts of these basic elements are assigned to characterize elements of the game play (e. g., objects, actors, locations, and actions) or narrative elements. Moreover, composers or game designers can specify how player interactions trigger the selection of appropriate parts from a given score and changes between musical elements.
The following subsections will describe the several aspects of music arrangement and composition that promote the adaptation of musical structure to the structure of the interaction space.
2 Game designer, writer, and project leader for Lucas Arts's computer game adaptations of the Indiana Jones movies [2].
3.1 Overview
The basic principle of our music engine is the integration of musical elements into the virtual world and the re-arrangement of
pre-composed musical pieces in real-time.
We introduce the concept of parallel and sequential music distribution into different musical elements, which can be blended
without dissonances (parallel and synchronously running elements), and meaningful self-contained musical entities, which can
be re-arranged without interrupting the playback (sequential elements).
Inspired by multi-choir music, dedicated parallel parts of a
score characterize elements of the virtual world and the game
play. Therefore, the game designer can assign these parts to 3D
locations. Fig. 1-a contains some game objects (four rooms in
floor plan) with musical elements. Their hearability can interfere,
when the player navigates from one location to another.
Fig. 2 illustrates the subdivision of a holistic score ((a) in Fig. 2)
into parallel tracks ((b) in Fig. 2) which characterize the four locations in Fig. 1-a. Since the player is free to navigate through all
game objects the associated tracks should possess an independent
musical power (e. g., with their own melodies) and must harmonize with other synchronous tracks to allow proper cross-fading. Therefore, the composers have to apply the techniques of the building
set manner and multiple counterpoint.
Furthermore, the score is segmented into sequential blocks ((c)
in Fig. 2), which can be re-arranged in various ways in order to
achieve articulated musical changes. Music change marks (see
also Fig. 1-b) can trigger the re-arrangement, i. e., edit the block sequence even during playback. Here different block classes (as
illustrated in Fig. 2-d) are considered which denote special blocks.
These are used for transitions and for music variance.
The following sections will describe these introduced concepts
in more detail and also consider the compositional matters and relationships which the composer must take into account to enable the adaptiveness of his music.
3.2 Distributed Music
The structure of music has to attend to four dimensions: the three dimensional virtual world and the story dimension. Therefore, we describe an approach to distribute the music in these four dimensions.
Parallel Distribution. To locate music at several positions in the three dimensional environment we literally place it
there by using punctiform sound sources as known from physically oriented 3D-audio modeling environments (e. g., OpenAL
http://www.openal.org/). This can be considered as multi-choir
music where the choirs are placed at particular positions in the
room. As in the real world, every source has a region of hearability. As Fig. 1-a illustrates, it can be set according to the dimensions of its location. Thus, the musical accompaniment can completely cover the location it belongs to. Depending on the position of the listener, he can hear the sources at different volume levels; thus any movement changes the volume gain, which corresponds to fading. Sec. 3.4 goes into more detail on this.
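As a sketch of how such distance-based fading could be computed per source (assuming the cosine-shaped attenuation between a minimum and maximum distance that Fig. 3 illustrates; the function and parameter names are ours, not part of any specific engine):

import math

def source_gain(distance, min_distance, max_distance, min_gain=0.0, max_gain=1.0):
    # Full gain inside the source's inner radius, silence beyond its region
    # of hearability, and a smooth half-cosine ramp in between.
    if distance <= min_distance:
        return max_gain
    if distance >= max_distance:
        return min_gain
    t = (distance - min_distance) / (max_distance - min_distance)
    return min_gain + (max_gain - min_gain) * 0.5 * (1.0 + math.cos(math.pi * t))

# Moving the listener from one room towards the next effectively
# cross-fades the two locations' tracks.
print(source_gain(distance=3.0, min_distance=1.0, max_distance=10.0))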
This concept means that multiple tracks which are associated with 3D
locations run in parallel (and as loops) and are cross-faded when
the player moves between the sound sources (i. e., the locations).
As we discussed previously in Sec. 2, one cannot cross-fade any pieces of music ad libitum. But to enable this, the music can be specially prepared: comparable to multi-choir music, everything runs synchronously (by starting the playback at the same time, cf. Fig. 2-b), and the music of each sound source is composed with regard to how it sounds in combination with the others.
If multiple parts need to sound together, they cannot sound as
individual as the composer or the designer might want them to
be. To begin with, the music which is dedicated to the location the player currently visits is the most important, and thus the leading
one. All other parts are adjusted variations, not the more individual original! If the player now leaves the location these variations
can fade-in without any musical problems.
Sequential block distribution. Since, the music, that fades in,
is only an adjusted variation to the one, that fades out, pure crossfading is only half of the music transition. It can realize only a
certain degree of musical change. The arrival at the new location,
and the new music is still missing.
Therefore, all pieces of music are partitioned into blocks of
self-contained musical phrases which should not be interrupted to
ensure musical coherency. But to achieve a real music change,
other blocks can be put into the sequence. The sequence can be
adapted just in time even while the playback is running (cf. Fig. 2c).
To perceive the necessity of such a music change, so called music change marks are placed at the connection points between the
locations. These are triangle polygons as illustrated in Fig. 1-b. A
collision detection perceives when the player moves through one
of them. While the new music is loaded, the playback goes on till
the end of the current musical block, where the block sequence of
the next music is enqueued glueless. The playback goes through
without any stops or breaks.
After such a music change the new location and its associated
music is the most important and thus leading one. The remaining
pieces are adjusted variations to this.
Up to now everything was triggered by position changes in the
three dimensional virtual space. The fourth dimension, that is
the story which can change independently from the players 3Dposition, can also be an actuator for music changes. These are
executed in the same way as the triggered music transitions, described here. But for this they are not caused by position changes
of the player but by (story relevant) interactions.
Figure 2: A parallel and sequential segmentation of a holistic score. Different block
classes can be considered by the music engine.
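To make the block-sequencing idea concrete, the following Python sketch shows one possible scheduler that finishes the current block before switching to the block sequence of the next piece; the names (BlockScheduler, on_change_mark, next_block) are ours and only illustrate the described behavior, not the actual engine.

from collections import deque

class BlockScheduler:
    """Plays self-contained musical blocks in sequence and switches
    pieces only at block boundaries, so no phrase is interrupted."""

    def __init__(self, blocks):
        self.queue = deque(blocks)   # block sequence of the current piece
        self.pending = None          # block sequence of the next piece, if any
        self.current = None

    def on_change_mark(self, next_blocks):
        # Called by the collision detection when the player crosses a
        # music change mark; the new sequence replaces the old one
        # seamlessly at the next block boundary.
        self.pending = deque(next_blocks)

    def next_block(self):
        # Called by the playback engine when the current block ends.
        if self.pending is not None:
            self.queue, self.pending = self.pending, None
        if not self.queue:
            return None
        self.current = self.queue.popleft()
        self.queue.append(self.current)   # loop the sequence by default
        return self.current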
3.3 Loop Variance
One basic principle of human communication is its economy. If a message is repeated, i.e., if the communication contains redundancy, it is interpreted purely on the level of pragmatics.
Figure 3: The distance of the listener to the source controls the perceived volume gain; the gain falls from maxGain at minDistance to minGain at maxDistance. For a smooth attenuation behavior a cosine function is used.
Figure 4: Moveable sources (pink, in contrast to static sound sources in blue) are positioned inside their bounding volume (yellow), as near to the listener as possible.
Frequently, senders using this technique intend to convey the importance of their message to the audience. But if such messages are conveyed too often, their inherent redundancy causes the audience to be bored or bothered. This phenomenon also applies to music: as soon as the player recognizes infinite loops within the background music of a computer game, it will sooner or later lose its unobtrusiveness and become conspicuous. Therefore, composers are urged to avoid readily identifiable motifs, themes, or melodic phrases which could be recognized immediately at the first repetition. Hence, background music aims to be diffuse, nebulous, and less concrete. In practice, game designers furthermore give composers an approximate time the user will presumably spend on each game element. In order to prevent players from recognizing repetitions, the musical disposition is planned with regard to this length. But this does not work for all playing behaviors, especially slow ones.
To postpone the effect of recognition, we introduce a way to vary the music in form and content: our system incorporates musical blocks called One-Times, i.e., the music engine plays them only once and removes them from the list of active blocks after the first cycle (cf. Fig. 2-d). Ideally, subsequent loop iterations then appear like a continuation of the musical material or a re-arrangement of the first iteration, and at the earliest the third iteration can be recognized as a real repetition.
This behavior was originally implemented for transitions between two pieces of music, which likewise have to be played only once. But by using One-Times also within a running piece of music, it turned out to be an effective instrument for adjusting the musical length of game elements, as the second repetition acts as a buffer for very slow players.
Moreover, the parallel musical themes of several game elements are cross-faded when the player moves, which activates the musical concepts of timbre changes, instrumentation, or harmonic variations.
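As a minimal illustration of the One-Time mechanism, the sketch below (Python, with hypothetical names) drops blocks flagged as one-time from the active list after the first loop cycle, so the second iteration already sounds like a re-arrangement.

def play_cycle(active_blocks, play):
    """Play one loop iteration and return the blocks that remain
    active for the next iteration (One-Times are dropped)."""
    remaining = []
    for name, is_one_time in active_blocks:
        play(name)                      # hand the block to the playback engine
        if not is_one_time:
            remaining.append((name, is_one_time))
    return remaining

# Example: the intro block is heard only in the first iteration.
blocks = [("intro", True), ("theme_a", False), ("theme_b", False)]
blocks = play_cycle(blocks, play=print)   # intro, theme_a, theme_b
blocks = play_cycle(blocks, play=print)   # theme_a, theme_b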
3.4 A Music Distance Model
Our system places musical elements as punctiform sound sources in a virtual 3D environment. In a correct acoustic model the impact of a sound source, i.e., its gain, depends on the distance to the listener (the nearer, the louder). But physically correct attenuation or distance models are not appropriate for preserving the details which characterize a piece of music, as several important musical aspects such as its dynamics (e.g., crescendi, getting louder or fading in, and decrescendi, fading out) would be flattened too quickly or pop up too late after a very flat beginning phase. This behavior is not just unmusical; it catapults the music out of the subconscious and into the focus of conscious perception. In contrast, the music attenuation model of our system emulates how sound engineers mix several parts in a musical performance.
A linear attenuation function is often used, but at its beginning and end there are points of non-differentiability, which cause abrupt volume changes. As in graphical animation, this jerky behavior appears mechanical and unnatural. In order to obtain the smooth fades that sound engineers achieve manually, we use a scaled cosine function in the interval from zero to π. Fig. 3 illustrates that sound sources in our distance model are characterized by two circumferences of minimal and maximal perceivable gain. Therefore, we need only one source to cover a small room or a wide yard.
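The attenuation curve can be read from Fig. 3 as a raised cosine between maxGain and minGain; the following Python sketch is our interpretation of that curve, not code from the engine itself.

import math

def cosine_gain(distance, min_dist, max_dist, min_gain=0.0, max_gain=1.0):
    """Scaled cosine attenuation: zero slope at both ends, so the fade
    has no audible kinks at minDistance and maxDistance."""
    if distance <= min_dist:
        return max_gain
    if distance >= max_dist:
        return min_gain
    t = (distance - min_dist) / (max_dist - min_dist)   # 0 .. 1
    return min_gain + (max_gain - min_gain) * 0.5 * (1.0 + math.cos(math.pi * t))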
3.5 Moving Sources
The distance model presented in the previous section is well suited for compact and uniform locations. As background music is independent of sound effects and acoustic situations, it would be a mistake to incorporate a physically correct sound rendering component that considers sound barriers such as walls and reflections. This causes a problem with covering locations of a less compact shape (e.g., long, narrow, flat, or high rooms, corridors, or towers). Either several small sound sources have to be placed over the shape of the location, or the maximal distance value of one centrally placed source has to be set very far; in the latter case the music can be heard through walls deep into neighboring rooms. For a more precise treatment we employ two strategies:
Moveable sources within a bounding volume. Fig. 4 presents an example of this strategy, which is especially suited for indoor scenes. Game designers can assign a three-dimensional bounding volume to each source. The bounding volume approximates the spatial extent and shape of the location. By placing the sound source inside this volume, as near to the listener as possible, it can cover the location more precisely. If the listener is inside the bounding volume, the source is at his position. If he leaves it, the sound source stops at the border and follows his movements along the border. This also prevents jumps when the listener enters the volume; otherwise source jumps would sound like sudden volume changes. The automatic alignment towards the listener avoids this, so the fading is always smooth and believable.
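A possible reading of this strategy, assuming an axis-aligned bounding box for simplicity: the moveable source position is simply the listener position clamped into the box, as the Python sketch below illustrates.

def source_position(listener, box_min, box_max):
    """Clamp the listener position (x, y, z) into the axis-aligned
    bounding volume; the clamped point is the moveable source position."""
    return tuple(
        min(max(p, lo), hi)
        for p, lo, hi in zip(listener, box_min, box_max)
    )

# Listener outside the volume: the source stays on the nearest border point.
print(source_position((12.0, 1.5, -3.0), (0.0, 0.0, 0.0), (10.0, 3.0, 8.0)))
# -> (10.0, 1.5, 0.0)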
Bonded moveable sources. Music can also be attached to other moveable game elements such as characters and objects. They do not have to be chained to predefined positions or locations in the virtual world; non-player characters in games can move just as the player does.
Music is attached to them by repositioning the sound source together with the movement of the game object. This can be done by the game engine using a dedicated interface command offered by the music engine.
In the following, we will discuss some of our transition techniques in more detail. The most simple way to change from one musical idea or situation to another while preserving its unity is to finalize a musical block and append the next one. This corresponds to playing the current music block to its end and then starting the next. By partitioning the music into multiple short blocks, the latency between interaction and music change can be reduced. But the composer has to be aware of the connectivity to the next music, which can now follow after every block. For instance, the melodic connection should never contain illogical, ineligible jumps. Depending on the melodic mode, this might limit its evolvement.
In movies the so-called L-cut denotes a consciously asynchronous transition of sound and pictures: the sound (including music) can switch noticeably earlier to the next scene than the pictures. Carried over to interactive environments, this means performing the music transition before the actuating interaction is done or the trigger is activated. Of course, the simple approach (wait until the end of a block, then do the transition) does not work for this quite extreme example. But it can actually be achieved by cross-fading: the music transition is already in full activity when the end of the block is reached, because the new musical material is already introduced by going nearer to the person, the object, or the location.
These special kinds of scene transitions and inner-musical ideas or processes often overlap or are interwoven in a way that cannot be represented by a simple sequence of succeeding blocks. Modulations and tempo changes (ritardando, accelerando) are examples of this. Here the parallelism of multiple cross-fadeable parts can help, too. For modulation, ambiguous chords can be used; in the same way it is possible to realize the evolution of a motif. Tempo changes can be achieved by changing the metrical conciseness. Here the composer is restricted in his possibilities and forced to use more complicated compositional tools to compensate for this.
Nevertheless, these solutions, although they can achieve analogous results, are not equivalent substitutes for common changes of harmony, tempo, motif, and so on. These usually happen in transitional blocks which lead over to the next music. Since these blocks are played only once, those processes can run "pre-rendered" inside of them. So with the use of One-Times these ostensible limitations can be overcome, too.
Note that the hard cut should still be available. It is sometimes used to mediate an abrupt change, a surprise, or a shock. It may also still be useful for some types of measureless music.
4 Results and Discussion
It is a challenging problem to develop methods which are able to evaluate the artistic quality of music, or of any other kind of computer-generated art. In our system, the quality of the background music accompanying interactive media depends heavily on the ability of an artist to compose multiple self-contained parts in the building-set manner, so that the parts of the score can characterize game elements convincingly. Hence, the main objective of our music engine is to guarantee that the adaptation techniques do not interfere with inherent musical structures (e.g., to prevent abrupt breaks of melodic units, conflicting tempi and rhythms, and dissonances). The main challenges in this scenario are to (i) integrate the automatic arrangement of musical elements and the transitions between them, and (ii) conceal boring repetitions, so that the player gets the impression of consistent, coherent, and seemingly pre-composed music that accidentally fits the interactive events.
Except for a few standard cases, a classification of music transitions is not possible, because they depend heavily on the specific musical contexts to which they attach. A music transition is usually as unique as the music it connects. Therefore, a pure enumeration of supported music transitions does not reflect the quality of an adaptive music engine. Instead, an evaluation has to consider the usability, for composers and game designers, of the compositional methods and techniques applied in the music and the music transitions. In which way do they constrain the composer, or enrich his possibilities? But as the first author of this paper is also the author of our system, the composer, and the game designer, we cannot provide any proper results.
The previous discussion reveals that the combination of all functionalities offers a rich and expressive pool of compositional techniques for adaptive music in interactive environments. But we are aware of the fact that additional constraints can arise from specific musical styles. This raises the question of how different cross-fadeable parts can be. Can we integrate musical pieces which are not based on tonal music, in order to extend the musical language used in interactive media? The poly-stylistic composition manner and the style collages of composers like Bernd Alois Zimmermann (1918–1970) already affirm our hope that there are no stylistic restrictions for adaptable music in interactive media. Hence, composers are not forced into a specific musical style, as the concepts of parallelism and sequentiality are generally used in music. Furthermore, by including the building-set manner already in the process of composition, the results will always be faithful in style to the composer's intention.
We developed two prototypes to demonstrate the capabilities of adaptive music in interactive applications: a 3D adventure game and a presentation showing a picture sequence in the manner of an interactive comic strip. We believe that the techniques presented in this paper open up a number of new possibilities. Musical soundscapes, for example, can benefit from the fading concepts, and with moveable sources they gain a powerful new tool to establish a never-before-heard experience. It is even possible to let the music fly around the listener in all three dimensions. The three-dimensional arrangement of punctiform sound sources can furthermore be used for positional audio effects and surround output. Thereby the music can act as an auditory compass or an orientation guide.
5 Conclusion and Future Work
As already mentioned in Sec. 2, there is still a lack of compositional and arrangement techniques for music in new interactive media. This paper has presented both (i) new compositional techniques for adaptive music in interactive media and (ii) an automatic real-time arrangement technique for pre-composed parallel and sequential musical elements. We have shown how they can be used to create a coherent musical accompaniment for interactive applications.
By abolishing the hard cut we can ensure an appropriate musical performance and, more importantly, raise the effect of immersion to a higher level. With this solution, interactive environments can approach the immersiveness of movies. In spite of unpredictable user interactions, the background music never seems to be taken by surprise by scene transitions or user actions.
Our approach is able to integrate expressive scores from human artists. In order to support their compositional style, traditional compositional techniques such as building-set composition, multiple counterpoint, and multi-choir music, which up to now have often been on the fringes, gain new importance. All these aspects lead to a solution which addresses technical as well as musical concerns. It actually opens up new musical spaces and possibilities.
The limitations of our work also mark some directions for future research: the integration of a random selection between alternative group members, or more flexible transitions, can prevent the direct recognition of looping parts, which the player would otherwise notice and be bothered by. Furthermore, the boundary between musical blocks should not be forced to be synchronous for every track or source; musical blocks can overlap, e.g., by an offbeat. An enhanced distance or attenuation model can improve the fading between parallel blocks. It ensures that the fading always sounds believable and without any points of non-differentiability. But if the listener stops his movement, new such points appear again, because the fading stops with the same abruptness as the listener. To avoid this, the listener movement should be handled with some inertia; thus an always continuous and differentiable distance model can be built.

References
[1] T. Adorno and H. Eisler. Composing for the Films. Oxford University Press, New York, 1947.
[2] H. Barwood. Cutting to the Chase: Cinematic Construction for Gamers. In Game Developer Conference, 2000.
[3] G. D. Birkhoff. Aesthetic Measure. Harvard University Press, Cambridge, 1933.
[4] R. L. de Mantaras and J. L. Arcos. AI and Music from Composition to Expressive Performance. AI Magazine, 23(2):43–57, 2002.
[5] K. Ebcioglu. An Expert System for Harmonizing Four-Part Chorales. Computer Music Journal, 12(3):43–51, 1988.
[6] K. Ebcioglu. An Expert System for Harmonizing Chorales in the Style of J. S. Bach. Journal of Logic Programming, 8(1–2):145–185, 1990.
[7] A. Gartland-Jones and P. Copley. The Suitability of Genetic Algorithms for Musical Composition. Contemporary Music Review, 22(3):43–55, 2003.
[8] L. A. Hiller and L. M. Isaacson. Experimental Music: Composing with an Electronic Computer. McGraw-Hill, New York, 1959.
[9] J. P. Kirnberger. Der allezeit fertige Polonaisen und Menuetten Komponist, 1757. (trans.: The Ever-Ready Composer of Polonaises and Minuets.)
[10] R. Kungel. Filmmusik für Filmemacher — Die richtige Musik zum besseren Film. Mediabook-Verlag, Reil, 2004.
[11] B. Manaris, P. Machado, C. McCauley, J. Romero, and D. Krehbiel. Developing Fitness Functions for Pleasant Music: Zipf's Law and Interactive Evolution Systems. In 3rd European Workshop on Evolutionary Music and Art (EvoMUSART), pages 498–507, 2005.
[12] J. Manz and J. Winter, editors. Baukastensätze zu Weisen des Evangelischen Kirchengesangbuches. Evangelische Verlagsanstalt, Berlin, 1976.
[13] J. McCormack. Open Problems in Evolutionary Music and Art. In 3rd European Workshop on Evolutionary Music and Art (EvoMUSART), pages 428–436, 2005.
[14] W. A. Mozart. Musikalisches Würfelspiel: Anleitung so viel Walzer oder Schleifer mit zwei Würfeln zu componieren ohne musikalisch zu seyn noch von der Composition etwas zu verstehen. Köchel Catalog of Mozart's Work KV1 Appendix 294d or KV6 516f, 1787.
[15] J. Müller-Blattau, editor. Die Kompositionslehre Heinrich Schützens in der Fassung seines Schülers Christoph Bernhard. Bärenreiter, Kassel, 3rd edition, 1999.
[16] G. Papadopoulos and G. Wiggins. AI Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects. In AISB Symposium on Musical Creativity, 1999.
[17] V. Rathgeber. Missa Civilis, Opus 12, Nr. 8. Johann Jakob Lotter Verlag, Augsburg, 1733.
[18] J. Ratia. Der Jazzwürfel — Ein harmonisches Würfelspiel. netzspannung.org; Fraunhofer Institut Medienkommunikation, 2005.
[19] N. J. Schneider. Handbuch Filmmusik I — Musikdramaturgie im neuen Deutschen Film. Verlag Ölschläger, München, 2nd edition, 1990.
[20] M. J. Steedman. A Generative Grammar for Jazz Chord Sequences. Music Perception, 2(1):52–77, 1984.
[21] P. Todd and G. Loy, editors. Music and Connectionism. MIT Press, Cambridge, 1991.
THE DRUM PANTS
Søren Holme Hansen
University of Copenhagen
Department of Musicology
Klerkegade 2
DK-1308 København K, Denmark
[email protected]
Alexander Refsum Jensenius
University of Oslo
Department of Musicology
P.O. 1017 Blindern
N-0315 Oslo, Norway
[email protected]
Abstract. This paper describes the concept and realization of The Drum Pants, a pair of pants with sensors and control switches that allows the performer to play and record a virtual drum set or percussion rack by hitting the thighs and waist with the hands. The main idea is to make a virtual percussion instrument with a high level of bodily control which permits new visual performance possibilities.
1 Introduction
Drummers and percussionists have a habit of tapping
themselves pretending to play their instrument. This can be seen
as a way to practice without their instrument at hand, but also as
a natural way of getting beats or rhythmic figures into the body.
The latter reflects a close interrelation between musical sound,
mental imagery and bodily experience as suggested in [1]. In
this project we have been interested in using such a connection
in the design of a new instrument.
Most of the commercially available electronic drum and
percussion interfaces are designed to simulate the acoustical
instruments they are replacing, like the electronic drum set or
different kinds of percussion sensing plates. Why not exploit the
natural way of feeling the rhythm in the body and develop a
new kind of interface which allows a closer contact between
rhythm and body?
Even though drum performances can be visually interesting,
drummers are usually locked to their stationary instruments and
do not have the same amount of physical freedom to participate
and interact in visual performance on stage as for example
singers, guitarists or saxophone players. It would be quite
natural to see the drummer moving around to the beat he is
playing. In this way the drum performance would become an integrated combination of rhythm and dance, and could thereby add a visually interesting dimension to the performance.
Today, when programming beats, grooves and rhythmic soundscapes on the computer with sampling software, there seems to be a need for more human-friendly controllers besides the keyboard/mouse and MIDI keyboards. Of course, it is possible to use the afore-mentioned commercial drum interfaces, but there will still be a lack of flow in the programming, as you normally have to switch from the controller to the computer or sampler and back again. A solution where a drum interface and a sample controller are combined in an ecologically sound design would therefore be preferable, and an interesting way of improving the flow and creative energy in the process of programming.

2 The Design Idea
In developing The Drum Pants we sought to integrate and address these issues by means of a new wearable drum interface design. The idea of wearable electronic instruments has been explored by a number of artists and musicians over the years, for example Joseph Paradiso [2], Ståle Stenslie [3] and Rolf Wallin [4]. These designs have focused on creating new sounds with new interfaces. We have been interested in exploring the control of traditional sounds with a new interface, and in the following sections we will focus on three main issues concerning the design: physical freedom while playing, sensor types and placements, and dependence on the computer while playing.
2.1 Physical Freedom
In order to give the performer as much physical freedom as possible, we chose to develop a pair of pants, since this leaves the upper body and arms free of electronics and wires. Cotton was chosen as the material in order to get a comfortable and light pair of pants. The sensors placed on the legs are flexible, which means that all kinds of physical activity are possible while wearing The Drum Pants: stretching, bending, walking, jumping, dancing, etc.
2.2 Sensor Types and Placements
The Drum Pants are implemented with six force sensors on the thighs and seven digital switches plus one potentiometer around the waist. In addition to the pants there is a pair of shoes with a force sensor under one of the soles, connected to the pants with a wire (see Figure 1). The reason for using analog force sensors, and not just digital touch sensors, was to obtain a natural connection between the level of tapping and the dynamics of the music. Furthermore, the possibility of dynamic variation makes the beats produced more alive and authentic, as accentuation, ghost notes and crescendo/decrescendo figures become possible.
The placement of the force sensors was decided after studying how people "play" drums on their pants, and from a wish to adopt some features of an acoustic drum set, i.e., placing the sensors so that it is possible to use both hands and a foot at the same time, as with for example hi-hat, snare drum and bass drum. Furthermore, it has been a priority to place the sensors in a way which makes it easy to tap them in an upright standing position, as this gives the performer a great deal of physical freedom and mobility while playing. The natural area of tapping on the pants in an upright standing position is shown in Figure 2.
Within this natural area of playing there are three force sensors placed on each thigh, which makes it possible to switch quickly between the three sensors with one hand, while at the same time being spaced widely enough to avoid unintended sensor activation because of imprecise tapping. As seen in Figure 1, the solution is an asymmetrical placement of the sensors on each side, since it felt more natural for the right-handed test person to have a sensor slightly more on the outside of the right thigh than on the left. The result is an instrument which feels natural to play and requires very little practice to get used to, since it is based on the simple idea of hitting your thighs and waist.
The digital switches and the potentiometer around the waist serve as control buttons for different sampler functions, for example volume, effects, change of instrument presets and recording. The design of the digital switches makes it possible to activate them by tapping, which is practical for fast and accurate control, for example when starting and stopping a recording to create a loop. In general, the placement design around the waist makes it easy to control the sampler functions while playing. In addition, eight diode lights are placed right under the waist as a visual aid in controlling the sampler functions.

Figure 1: The Drum Pants, showing the sensor interface, the switches and potentiometer around the waist, the diode lights, the force sensors on the thighs, the USB cable, and the cable to the force sensor under the shoe.
Figure 2: Natural area of tapping on pants in an upright standing position.

2.3 Dependence on the Computer
All the sensors are connected with wires to a USB sensor interface placed in the hip pocket. The design of The Drum Pants, with its various sampler functions, makes the performer independent of the computer while performing on stage or while programming beats or rhythms in the studio. On stage this means that the performer can focus on being in contact with the audience or fellow musicians rather than having to concentrate on the computer. In the studio, the integrated drum interface and sampler design helps keep the focus on a more fluent programming process. In addition, the overall mobile and flexible design brings a new dimension of physical activity into the discourse of drum interfaces, allowing the performer to walk, jump, dance, etc. while playing.

3 Implementation
The implementation process involved the hardware design and developing software to use the pants.

3.1 Sensor Interface
The USB interface used is a Phidgets USB interface kit (http://www.phidgets.com/) with eight analog inputs (the seven force sensors and the potentiometer), eight digital inputs (the switches around the waist) and eight digital outputs (the diode lights). On the analog inputs the interface kit provides a standard 0–5 V range. The interface is connected directly to the computer with a USB cable, and since it draws the necessary power from the USB cable, no separate power supply is needed. Although it is possible to extend the range with powered USB hubs, the length of the USB cable is a clear limitation in a performance situation with this prototype.

3.2 Sensors
The type of force sensor used for the pants is the Flexiforce sensor (http://www.tekscan.com/flexiforce.html), which acts as a force-sensing resistor in an electrical circuit. The Flexiforce sensor is a very thin and flexible strip and is easily incorporated into a circuit (Figure 3).

Figure 3: The Flexiforce sensor. The sensing area to the right is 9.53 mm in diameter.

The two outer pins of the connector are active and the center pin is inactive. The sensing area is only 9.53 mm in diameter, so it has been necessary to construct a flat cone of thick cardboard which is glued onto the sensing area in order to increase the hitting area (the black circles on the pants in Figure 1). The force range of the sensors placed on the pants is 0–4.4 N, whereas the range of the sensor placed under the shoe is 0–110 N to allow for greater force. Due to the thin (0.208 mm) and flexible design of the Flexiforce sensor, very close contact between sensor and body is possible.
The potentiometer and digital switches placed around the waist of The Drum Pants are standard sensors, although the design of the switches is carefully chosen so that they can be activated by tapping rather than switching.
3.3 Pants
The potentiometer and digital switches are sewn onto the outside of the pants, while the force sensors and all wires are fastened with strong tape on the inside of the pants. In a future version it would be desirable to use a more durable solution than tape, but for this prototype it made the implementation process more flexible.
3.4 Programming
The software is developed in Max/MSP, using an external object from Phidgets to read values from the sensor interface. Figure 4 shows an overview of the data flow from sensor to sound.
Figure 4: Schematic overview of the data flow from sensor to sound: The Drum Pants with sensors → sensor interface → filtering, scaling and segmentation → logic/control → mapping → synthesis module → sound (the processing stages run in Max/MSP).
Figure 5: Main patch of The Drum Pants software.
The incoming data are run through a simple FIR filter to smooth them, and a threshold function is used to prevent unwanted attacks.
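The paper does not specify the filter, but a short moving-average FIR followed by an onset threshold, as sketched below in Python with values chosen only for illustration, captures the described behavior.

from collections import deque

class TapDetector:
    """Moving-average FIR smoothing plus a simple onset threshold."""

    def __init__(self, taps=4, threshold=0.15):
        self.window = deque([0.0] * taps, maxlen=taps)
        self.threshold = threshold
        self.above = False

    def process(self, sample):
        """Feed one normalized sensor sample; return a velocity on onset, else None."""
        self.window.append(sample)
        smoothed = sum(self.window) / len(self.window)
        if smoothed >= self.threshold and not self.above:
            self.above = True
            return smoothed          # use the smoothed level as tap velocity
        if smoothed < self.threshold:
            self.above = False
        return None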
In the patch (Figure 5), the filtered, scaled and segmented sensor data are divided into two signals: one for triggering playback of a sample, and one for controlling the volume of the played sample. This is the basis of the real-time playing mode, in which different sound bank presets can be selected with a sound-preset multiswitch. In addition to the real-time playing mode, there are three recording modes: multilayered loop recording (MLR), full drum set loop recording (FDLR) and master recording (MR).
The sound module used for the prototype is a simple sampler made in Max/MSP (Figure 5). The focus has been on creating a system which can easily be controlled from only a few centrally placed buttons. This was achieved with a "multiswitch" patch, which makes it possible to select up to eight different channels with only one digital switch. The diode lights on the pants give visual feedback when controlling the multiswitch, as the number of lights turned on indicates which channel is chosen.
With MLR, a rhythmic figure can be built up by recording one sensor at a time in separate tracks. The sensor to record is chosen with the multiswitch, which also activates the MLR mode. The first recording is looped automatically and serves as the master synchronization reference for all later recordings. FDLR allows the performer to record several sensors simultaneously into one loop, for example bass drum, snare drum and hi-hat; this recording mode is activated automatically when choosing sound bank preset three or four. With MR the main audio output is recorded, which includes both loops and real-time playing; a separate switch is used for this purpose, and the MR can easily be saved in The Drum Pants start window.
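A simplified sketch of the MLR synchronization idea, in Python with invented names and seconds as the time unit: the first recorded loop fixes the master length, and later overdubs are wrapped onto it.

class MultilayerLooper:
    """First recording defines the master loop length; later overdubs
    are wrapped onto that length so all layers stay in sync."""

    def __init__(self):
        self.master_length = None    # seconds, set by the first recording
        self.tracks = {}             # sensor id -> list of (time, velocity)

    def record(self, sensor_id, events, length):
        if self.master_length is None:
            self.master_length = length          # first loop sets the reference
        wrapped = [(t % self.master_length, v) for t, v in events]
        self.tracks[sensor_id] = sorted(wrapped)

looper = MultilayerLooper()
looper.record("bass_drum", [(0.0, 0.9), (1.0, 0.8)], length=2.0)
looper.record("snare", [(0.5, 0.7), (2.5, 0.7)], length=3.0)   # wrapped to 2.0 s
print(looper.master_length, looper.tracks["snare"])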
When choosing a channel with the multiswitch, it is possible to add two different types of effects to the sample: a delay and manipulation of the sample speed. The delay time and sample speed are then controllable with the potentiometer, as is the main output volume. A switch for default settings, i.e. clearing all recordings and added effects, makes it easy to start over again. All functions of the software are controllable from the pants.
The latency of the sound when tapping the sensors in real-time playing mode is practically unnoticeable, which is essential when playing rhythms. We experience some latency when adding many tracks and effects, but this could be improved by using a dedicated sampler program rather than our own sampler.

3.5 Mapping
Although we have only tested the pants with a sampler module, all the patches have been designed so that the pants can easily be used to control any type of sound module, for example a physical model or another synthesis engine. Since all values are scaled to a 0.–1. range, they can easily be output as Open Sound Control (OSC) messages [5], or be scaled and output as MIDI messages.
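To illustrate the mapping step, the sketch below turns a value in the 0.–1. range into either an OSC-style (address, argument) pair or a raw MIDI note-on message; the address scheme and note numbers are invented for the example, and actually sending the messages would require an OSC/MIDI library or the Max/MSP patch.

def to_osc(sensor_id, value):
    """Package a normalized value as an (address, argument) pair, ready for
    an OSC sender; the address scheme is hypothetical."""
    return ("/drumpants/sensor/%d" % sensor_id, float(value))

def to_midi_note_on(note, value, channel=9):
    """Scale a 0.-1. value to MIDI velocity 1-127 and build a note-on message
    (channel 10, the conventional percussion channel, is index 9)."""
    velocity = max(1, min(127, int(round(value * 127))))
    return bytes([0x90 | channel, note & 0x7F, velocity])

print(to_osc(3, 0.42))               # ('/drumpants/sensor/3', 0.42)
print(to_midi_note_on(38, 0.42))     # note-on, snare (38), velocity 53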
4 Conclusion
In this paper we have presented The Drum Pants, a wearable interface built around the idea that drummers like to "play" drums on their own bodies. The interface offers a new drum playing experience and is inspiring when creating beats and grooves. The fact that the performer also feels every tap on his or her own body gives the playing an enhanced intimacy between rhythm and body compared to other electronic interfaces. In addition, the sampler functions, which can be controlled from switches placed at the waist line, make it possible to easily create interesting multilayered rhythmic figures.
Besides its use in performance, we also think this instrument can be interesting in the field of music education. Playing The Drum Pants offers an intimate connection between rhythm and body, and could be used to study rhythm and its relation to human motor functions. In the future we would be interested in studying how people develop a sense of rhythm by tapping on themselves. Does the combination of sound, rhythm and intimate bodily experience ease the learning of human motor control? In continuation of this, it is interesting to imagine a model of The Drum Pants designed for children as a pedagogical toy, to be used both at home and in educational institutions.
Future development includes a sensor clip-on system, so that it is easier to adjust the sensor positions to match different performers. This could also be the solution for creating pants that can be washed, which is not possible with the prototype.
We will in addition be looking into improving the control of the sampler functions, for example with some kind of matrix controller operated from a glove.
Finally, we will also be looking at possibilities for creating a fully embedded system based on a mini-computer. Imagine being able to bring a full instrument with you inside your clothes: just plug in a pair of headphones and you have the ultimate mobile instrument, allowing you to practice and play anywhere, anytime.

5 References
[1] Godøy, R. I., E. Haga, and A. R. Jensenius. Playing "Air Instruments": Mimicry of Sound-Producing Gestures by Novices and Experts. Paper presented at the 6th International Gesture Workshop, Vannes, France, 18–21 May, 2005.
[2] Paradiso, J. and E. Hu. Expressive footwear for computer-augmented dance performance. In Proceedings of the First International Symposium on Wearable Computers, Cambridge, MA. IEEE Computer Society Press, 1997, 165–166.
[3] Stenslie, Ståle. EROTOGOD: The synesthetic ecstasy. http://www.stenslie.net/stahl/projects/erotogod/index.html
[4] Wallin, Rolf. Yó (1994) for controller suit and computer. http://www.notam02.no/~rolfwa/controlsuit.html
[5] Wright, M. and A. Freed. Open Sound Control: A new protocol for communicating with sound synthesizers. In Proceedings of the International Computer Music Conference, Thessaloniki, Greece, 1997, 101–104.
Backseat Playground
John Bichard*, Liselott Brunnberg*, Marco Combetto#, Anton Gustafsson* and Oskar Juhlin*
* Interactive Institute, P O Box 24 081, SE 104 50 Stockholm
{john.bichard, liselott, anton.gustafsson, oskarj}@tii.se
http://www.tii.se/mobility
# Microsoft Research Cambridge UK, Roger Needham Building , 7 J J Thomson Ave, Cambridge CB3 0FB, UK
[email protected]
http://research.microsoft.com/ero/
Abstract. We have implemented a conceptual software framework and a story-based game that facilitate the generation of rich and vivid narratives in vast geographical areas. An important design challenge in the emergent research area of pervasive gaming is to provide believable environments where game content is matched to the landscape in an evocative and persuasive way. More specifically, our game is designed to generate such an environment tailored to a journey as experienced from the backseat of a car. Therefore, it continuously references common geographical objects in the vicinity, such as houses, forests and churches, within the story, and it provides a sequential narrative that fits the drive. Additionally, it is important that the player can combine interaction with the device with as much visual focus as possible on the surrounding landscape, in order to generate a coherent experience. The implemented user interaction is audio-centric, where most of the game and narrative features are presented as sounds. Additional interaction through movement is integrated with the audio in the form of a directional microphone.