POLITECNICO DI MILANO
Facoltà di Ingegneria dell’Informazione
Master of Science in Computer Engineering
Automatic Audio Compositing System
based on Music Information Retrieval
Author:
Luca Chiarandini
734917
Supervisor:
Prof. Augusto Sarti
Assistant supervisors:
Dott. Massimiliano Zanoni
Prof. Juan Carlos De Martin
Academic year 2009-2010
Abstract
In the past few years, music recommendation and playlist generation systems have become one
of the most promising research areas in the field of audio processing. Due to the wide diffusion
of the Internet, users are able to collect and store a considerable amount of musical data and can
make use of it in everyday life thanks to portable media players.
The challenge of modern recommendation systems is how to process this huge amount of data
in order to extract useful descriptors of the musical content, i.e. how to perform automatic
tagging, cataloguing and indexing of media material. This information may be used for many purposes:
media search, media classification, market suggestions, media similarity measurements, etc.
Until now, the traditional approach to this problem has been audio labelling. This operation
consists in the definition of symbolic descriptors that can be used for generating the playlist.
Examples of this sort are playlists based on music genre or artist name.
This approach has some strong limitations: first of all, since labels are usually considered
descriptors of the whole musical piece, they cannot capture mood or genre changes within the
same song. Moreover, label classification sometimes results in heterogeneous classes (e.g.
pieces belonging to the same genre can be very different from one another).
This thesis fits into this context and consists in the study and development of a music recommendation framework that allows the user to interact by means of more precise descriptors.
The system intelligently recommends items from an audio database on the basis of the preferences
of the user.
Music Information Retrieval techniques are used in order to extract significant features from the
audio signal and allow the user to interact with the system by means of high-level interfaces
such as musical tempo or timbral features.
During the description of the system, we will prove the generality of the approach by describing some of the many applications that can be derived from the framework: an automatic
DJ system, a tabletop interaction system, a playlist generation system based on a runner's step
frequency, and a training-based recommendation system.
The goal of this project is not only the development of a technically valid product but also an
exploration of its artistic applications. The system is addressed to a wide public of performers
(DJs, performers of contemporary music, ...), composers and amateurs.
Sommario
In recent years, music recommendation and dynamic playlist generation systems have become
extremely promising research areas. Thanks to the wide diffusion of the Internet, users can
store a considerable collection of musical data and make use of it in everyday life through
portable music players.
The problem faced by modern music recommendation systems is how to process this large amount
of data and extract significant descriptors of the content; this information can be used for
many purposes: music search, classification, commercial suggestions or audio similarity measures.
Until now, the traditional approach to the problem has been audio labelling. This operation
consists in the definition of symbolic descriptors that can be used for playlist generation.
Examples of this kind are playlists based on music genre or artist name.
This approach, however, has some strong limitations: first of all, labels are generally regarded
as descriptors of the entire musical piece and do not account for genre or mood changes within
the same song. Moreover, label-based classification often results in very heterogeneous classes;
for instance, music belonging to the same genre can have very different characteristics.
This thesis fits into this context and consists in the study and development of a music
recommendation framework that allows the user to interact through a set of more precise
descriptors. The system intelligently recommends musical pieces on the basis of the user's
preferences.
Music Information Retrieval techniques are used to extract significant features directly from
the musical signal and to let the user interact with the system by means of high-level
interfaces such as tempo or timbral features.
During the description of the system, we will prove the generality of the approach by describing
some of the many applications that can be derived from the framework: an automatic DJ system,
a tabletop interaction system, a dynamic playlist generator based on the step frequency of a
running person, and a training-based recommendation system.
The goal of this project is not only the development of a technically valid platform but also
the analysis of the artistic applications that the system may find. It is in fact addressed
to a wide public of performers (DJs, performers of contemporary music, ...), composers and
amateurs.
Acknowledgments
I would like to thank Professor Augusto Sarti for the opportunity of this amazing work and for
all his help. His passion and enthusiasm for the project were truly contagious and kept me going.
Thanks to Doctor Massimiliano Zanoni for the careful supervision, for all the advice and for
all those brainstorming sessions we had together. It was absolutely invaluable; such a friendly
and kind co-supervisor is hard to find!
Thanks to Professor Juan Carlos De Martin for reading this mess, and for being so kind about
it.
Special thanks to all the researchers of the "ISPG Lab" in Milano and of the "Laboratorio di Elaborazione
e Produzione di Segnali Audio e Musicali" in Como for their help, hospitality and understanding.
I would like to express my gratitude to all the friends who filled in the questionnaire and in
particular to Laura, Domenico and Paola, who helped me in this difficult phase. The evaluation
turned out to be one of the most useful parts of this work, since everyone was really
enthusiastic about the software and gave much sincere advice.
Special thanks to the fantastic friends who supported me during the last years, both in Trieste
and in Milano. They are, in alphabetical order, Agnese, Alicia, Andrea B., Andrea M., Antoniela,
Ashanka, Camilo, Cristina, Dalia, Dean, Ebru, Elisa, Fabio, Giorgio, Giulio, Jovan, Maicol,
Marko, Mastro, Maurizio, Michele (the President), Paolo, Riccardo, Sara L..
Above all, many thanks to my family, who believed in me and gave me the opportunity to study
in Milano and complete my Master of Science.
Contents

1 Introduction
2 State of the art
  2.1 Music information retrieval
  2.2 Playlist generation and recommendation systems
3 Theoretical background
  3.1 Feature extraction
    3.1.1 Segmentation
    3.1.2 Harmony
    3.1.3 Tempo
    3.1.4 Brightness
    3.1.5 RMS
    3.1.6 Spectral centroid
    3.1.7 Spectral roll-off
    3.1.8 Spectral flux
    3.1.9 Inharmonicity
    3.1.10 MFCC
  3.2 Feature analysis
    3.2.1 Support Vector Machine (SVM)
    3.2.2 Gaussian Mixture Model (GMM)
  3.3 The short-time Fourier transform
    3.3.1 Analysis
    3.3.2 Synthesis
  3.4 Time-scaling
    3.4.1 Time-scaling STFT algorithm
    3.4.2 Time-scaling time-domain algorithms
4 Methodology
  4.1 The problem of labels
  4.2 Labelling in recommendation systems
  4.3 The recommendation framework structure
  4.4 Applications
    4.4.1 Automatic DJ system
    4.4.2 Tangible interface
    4.4.3 Dynamic playlist generation system based on runner's step frequency
    4.4.4 Training-based system
5 Implementation
  5.1 Preprocessing phase
    5.1.1 Features extraction
    5.1.2 Features similarity functions
  5.2 XML Structure
    5.2.1 Anchors XML Schema
    5.2.2 Generic feature XML schema
  5.3 Performance phase
    5.3.1 Work-flow
    5.3.2 Proposal generator
    5.3.3 Ranking system
    5.3.4 Transitions
    5.3.5 Interface
6 Evaluation
  6.1 Instance of the system
  6.2 Datasets
    6.2.1 Mood training dataset
    6.2.2 Evaluation dataset
  6.3 Test structure
  6.4 Results of the questionnaire
    6.4.1 Auditory questions
    6.4.2 Usage questions
    6.4.3 Comments
  6.5 Overview of the result
7 Perspectives and future developments
  7.1 System-level improvements
  7.2 Implementation improvements
  7.3 From musical compositing to composition
A User manual
  A.1 Prerequisites
  A.2 Running the system
    A.2.1 Feature extraction
    A.2.2 Performance
    A.2.3 Configuration
B The audio database
  B.1 Mood detection training set
    B.1.1 Anxious
    B.1.2 Contentment
    B.1.3 Depression
    B.1.4 Exuberance
  B.2 The performance database
C A short story of computer-aided composition
  C.1 Algorithmic composition
  C.2 Composition environments
  C.3 Interactive composition
  C.4 Collaborative Music Composition
List of Figures

1.1 The work-flow of traditional recommendation systems
1.2 The work-flow of our system
2.1 Genre classification triangular plot
2.2 Mood extraction
2.3 Apple iTunes Genius logo
3.1 Structure of a generic classification system
3.2 Gaussian chequerboard kernel
3.3 Segmentation operation performed
3.4 Harmony: original spectral components
3.5 Harmony: unwrapped chromagram
3.6 Harmony: wrapped chromagram
3.7 Harmony: key detection
3.8 Harmony: key clarity
3.9 Tempo: onset detection
3.10 Brightness
3.11 RMS: comparison between RMS and signal energy
3.12 Spectral roll-off
3.13 Inharmonicity
3.14 MFCC
3.15 Feature space and items
4.1 The work-flow of traditional recommendation systems
4.2 The work-flow of our system
4.3 The functional blocks of the system
4.4 The ReacTable
4.5 The ReacTable framework
5.1 System stages
5.2 Anchor points
5.3 Tempo optimization
5.4 Mood bi-dimensional plane
5.5 The hierarchical framework
5.6 Harmony similarity measure
5.7 The qualitative graph of harmony similarity
5.8 Harmony similarity measure
5.9 Graph of compare_harmony
5.10 Tempo similarity graph
5.11 The XML data model
5.12 The functional blocks of the system
5.13 System startup temporising
5.14 T_closing and T_opening
5.15 Proposal selection closed
5.16 Proposal selection opened
5.17 Proposal play started
5.18 Cancel next "proposal selection closed" event
5.19 Anticipated proposal selection closing
5.20 Anticipated proposal selection opening
5.21 An example of the fixed-length algorithm
5.22 Variable-length similarity algorithm
5.23 GMM training example
5.24 Transition
5.25 Timescale operation in a transition
5.26 Transition timescale linear approximation
5.27 Beat synch
5.28 The program window
5.29 The next song selection entry
5.30 System parameters tabbed panel
5.31 ReacTable system
5.32 Feature weights tangibles
5.33 Feature values tangibles
5.34 The Wii Remote
6.1 The characteristics of the tester set
6.2 The paper version of the questionnaire
6.3 The electronic version of the questionnaire
6.4 Did the system play pleasant music?
6.5 How do you evaluate the transitions between songs?
6.6 How well can the system be applied in the following fields?
6.7 How do you rate the following...?
6.8 How well does the system suit your artistic needs?
6.9 Do you think the system could enhance your artistic performance?
6.10 Does the system respond to the changes in the parameters?
6.11 How do you rate the music proposal made by the system?
6.12 Is the system intuitive (how much time do you need to learn how to use it)?
7.1 FL Studio interface
A.1 MATLAB interface
A.2 MIRtoolbox logo
A.3 Java Runtime Environment logo
A.4 Bluetooth logo
C.1 Relevant music scores
C.2 iMUSE Logo
C.3 OpenMusic 5
C.4 The tabletop collaborative system
C.5 The Stanford Laptop Orchestra
List of Tables

5.1 Harmony similarity measure
5.2 The qualitative measure of harmony similarity
5.3 Mood similarity table
5.4 Anchor XML structure
5.5 Generic feature XML structure
Chapter 1
Introduction
In the past few years, the availability and accessibility of media have increased as never before.
Users are able to store a huge amount of musical data and video content and make use of them
wherever and whenever they like, thanks to portable media players.
The main problem that modern audio applications are now facing is how to process this content
in order to extract useful descriptors, i.e. how to perform automatic meta-data extraction,
cataloguing and labelling of media material. This information can be used for many purposes: media
search, media classification, market suggestions, media similarity measurements, ...
The generation of high-level symbolic descriptors of media is usually done by hand and is therefore
error-prone and time-consuming. The discipline that addresses the automatic generation of
media tagging and high-level description of content is Multimedia Information Retrieval (MIR).
When the content is limited to musical audio, the discipline specialises into Music Information Retrieval,
which aims at deriving descriptors directly from the musical content.
The goal of feature extraction is usually the creation of music recommendation systems. Such
systems attempt to make content choices according to the inferred preferences of the user.
Systems of this sort are particularly suitable for market analysis and/or playlist generation.
Figure 1.1: The work-flow of traditional recommendation systems
Most of the recommendation systems available today follow the work-flow described in Figure
1.1. The signal is analysed and a set of features is extracted to form N-dimensional vectors.
Vectors of the feature space are then analysed through a decision process that returns outcomes
in the form of labels. This symbolic information can thus take on only a finite or countable number
of values, which greatly limits its discrimination power. Examples of labels are: artist,
genre, author, style, etc. Recommendation systems usually start from this symbolic description
to generate playlists based on the values of these labels.
The loss of discriminative power associated with the conversion from feature vectors to labels is
the main reason why recommendation systems tend to perform quite poorly. The inconsistent
behaviour of the resulting playlists can be traced back to:
• Temporal inhomogeneity: labels tend to be global descriptors of a whole musical piece and
do not capture local changes in mood, tempo or genre within the
same piece. This makes it very hard to create playlists that locally adapt to a particular
state (e.g. a storytelling system able to reproduce music that describes a series of events).
• Ensemble inhomogeneity: labels tend to apply to musical pieces that differ a great deal
from each other. Choosing labels as control parameters therefore results in weak control
on the part of the user. Consider, for example, genre classification: pieces belonging to
the same genre can be very different from each other; if the user selects a playlist based on a
particular genre, this has only a very weak connection with the actual mood of the songs.
Figure 1.2: The work-flow of our system
In this thesis, we would like to create a recommendation system that avoids the conversion
from feature vectors to labels (Figure 1.2). We will design a layer between the recommendation
system and the user that allows the user to control the system by acting directly on the features.
Moreover, the system should address the problem of temporal inhomogeneity by considering how
the feature vectors change within the same musical piece.
To prove the generality of this concept, we will present some ways in which the values of the
features can be defined by the user. They are ordered according to the amount of interaction
between the user and the machine.
• low-level feature based: the user directly specifies the values of the features by acting
on a set of controllers. This is a very precise but also low-level interaction mechanism
• high-level feature based: the user interacts by means of a set of high-level features which
incorporate other sub-features (for example tempo, harmony, ...)
• feature detection based: the values of the features are extracted from the actions of the
user or from other types of data. Examples of this kind may be:
– a system that extracts the tempo from the movements of a dancing person
– a system that performs blob detection on a dance floor in order to estimate the
number of people dancing and play music according to this parameter. The system
can, for example, select certain songs to encourage people to dance when the dance
floor is empty
– a system that detects the step frequency and heart rate of a running person and
adapts the tempo of the songs to these values
• training based: the system embodies some sort of learning mechanism that can be trained
on the preferences of the user and suggest good feature candidates by itself
These four classes can coexist in the same system and it is even possible to combine them. In this
thesis, we will develop a general framework for feature-based recommendation systems. We will
then show some specific applications of the system obtained by performing slight adjustments.
Overview of the thesis The thesis is organised as follows.
In chapter 2, a review of related work is presented. This chapter is intended as a broad view of
the context in which the system operates.
In chapter 3 we describe some important concepts of signal processing and musical feature extraction that will be used in the system.
Chapter 4 gives a general description of the recommendation framework and presents some applications of the system.
In chapter 5 the implementation details are presented.
Chapter 6 presents an evaluation of the system.
Finally, chapter 7 presents guidelines for future improvements and evolutions.
To give a more practical reference and a description of the deliverables, Appendix A contains
a user manual, addressed to users of the system.
Appendix B contains the list of audio items used during the evaluation phase.
In case the reader is interested in knowing more about the history of computer-aided composition systems, Appendix C contains an excursus on this topic.
Chapter 2
State of the art
In this chapter we discuss related work from both the research and the commercial fields.
The chapter starts with Music Information Retrieval, a branch of digital signal processing
that aims at extracting relevant descriptors directly from the audio content. It will be shown
how important this notion of content-based analysis is, and we will then cite the latest achievements in the field.
Afterwards, the most recent applications in the field of automatic playlist generation are discussed. This is a growing area of research in Digital Signal Processing due to
the recent diffusion of portable audio players and media managers. From the analysis of this
area, we would like to point out the innovative content of our framework by highlighting the
absence of systems with similar characteristics.
2.1 Music information retrieval
As previously mentioned, the increasing quantity of media data available on the Internet
has created the need to manage it and to find ways of automatically extracting meta-data and
indexing huge databases. A new discipline was therefore born: Music Information Retrieval.
It merges the latest achievements in Digital Signal Processing with notions of musicology and
psychoacoustics to extract parameters that describe a musical piece.
Relevant work has been done on extracting low- and high-level features from audio signals.
Low-level features (e.g. brightness, tempo, ...) are usually extracted directly from the
audio stream by means of statistical and signal processing methods, whereas high-level features
(genre, mood, ...) are often computed on the basis of low-level features and try to capture
abstract or global characteristics of the musical track. Antonacci et al. [2] give a general overview
of feature extraction; low-level features as well as higher-level extraction methods are explained.
Feng et al. [16] define the so-called "average silence ratio" to characterise the articulation in a
musical piece.
Gillet and Richard [18] and Fitzgerald [17] derive the drum score from a polyphonic music signal
exploiting techniques such as autocorrelation, eigenvalue decomposition and principal component
analysis. This work, although experimental and sometimes imprecise, gives a hint of
the capability of this new research field.
An interesting study on high-level features is described by Prandi et al. [41], in which the
authors classify and visualise audio signals on the basis of three features: classicity (a timbral
feature which tells whether the current segment presents a classical sound), darkness (relative power
of low frequencies with respect to high frequencies) and dynamicity. The system is trained with
a set of sample signals and is able to classify the emotional content. The result is visualised
in a triangular plot (Figure 2.1).
Figure 2.1: Genre classification triangular plot
Other applications of feature extraction applied to genre classification
have been developed by Pampalk et al. [36] and Li and Ogihara [29].
Kapur et al. [25] created a query-by-beat-boxing system that is able to detect the tempo of
beatboxing and use it to dynamically browse a music database. Beatboxing is a form of
vocal percussion in which musicians use their lips, cheeks and throat to create different beats.
Generally, the musician imitates the sound of a real drum set or another percussion instrument,
but there are no limits to the actual sounds that can be produced with the mouth. The system
can be used by experienced retrieval users who are eager to try new technologies, namely DJs.
Mood extraction A new research thread in feature extraction is so-called "mood extraction": it consists in using a set of techniques (data mining, neural networks, signal processing,
...) in order to detect the emotional content of a musical piece (i.e. its "mood").
A generic mood extraction process consists of the following phases (a minimal sketch follows this list):
• Training phase: the system analyses a set of user-selected audio items assigned
to some emotional categories (sad, happy, anxious, ...). The aim of this step is to train the
system to understand the peculiarities of each category. The training is usually performed
as follows:
– Low-level feature extraction: the input signal is windowed. Windows are usually very
short (around 25-50 milliseconds) and overlapped. A signal processing algorithm is
then applied to the frames to extract low-level features such as:
∗ Timbral features: MFCC (Mel-Frequency Cepstrum Coefficients), spectral flux, ...
∗ Intensity features: RMS (root mean square), energy, ...
∗ Rhythmic features: tempo, articulation, ...
∗ Tonal features: harmony, pitch, inharmonicity, ...
– The time evolution of each feature is then usually summarised in a finite number of
variables (e.g. mean and variance)
– A classification method (e.g. SVM, GMM, ...) is applied to extract a model of the
data
• Classification phase: the trained model is applied to new audio items.
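As a purely illustrative sketch of the pipeline just described (not the implementation used in this thesis), the following Python code summarises the per-frame low-level features of each track by their mean and variance and trains a classifier on the labelled training set. The use of scikit-learn, the SVM classifier and all function names are assumptions made for the example.

# Illustrative sketch of the generic mood-classification pipeline described above.
# The low-level features are assumed to be already extracted per window; the
# summary statistics and the choice of classifier are assumptions, not the thesis design.
import numpy as np
from sklearn.svm import SVC

def summarize_track(feature_frames):
    """feature_frames: (n_frames, n_features) low-level features of one track.
    The time evolution of each feature is summarised by its mean and variance."""
    return np.concatenate([feature_frames.mean(axis=0), feature_frames.var(axis=0)])

def train_mood_classifier(tracks, labels):
    """tracks: list of (n_frames, n_features) arrays; labels: mood category per track."""
    X = np.vstack([summarize_track(t) for t in tracks])
    clf = SVC(kernel="rbf")          # any classifier (SVM, GMM, ...) could be used here
    clf.fit(X, labels)
    return clf

def classify_mood(clf, track):
    # Classification phase: apply the trained model to a new audio item.
    return clf.predict(summarize_track(track)[None, :])[0]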
The main differences between mood extraction algorithms lie in the choice of the categories,
the low-level features and the classification method; behind each of them lies a vision of
human emotions that leads the authors to derive their models.
Laurier and Herrera [28] define a set of binary overlapping classes to which each musical piece
can belong with a certain degree of confidence; the classes could be, for example, happy/not
happy, sad/not sad, aggressive/not aggressive, ... The system is then trained on them. The
result is displayed in an easy-to-understand interface (see Figure 2.2a). The interesting aspect
of the method is the concept of overlapping classes: a musical segment may, for example,
be evaluated as both happy and relaxed at the same time. Mood is therefore not considered just
as a single variable but as a set of variables that cohabit in the same application.
Similarly, Liu et al. [31] classify the data according to two high-level features: energy (intensity
of the signal, power, ...) and stress (a timbral/tonal feature). According to these two dimensions,
the mood feature is modelled as a point on a bi-dimensional plane. The interesting point of this
method is the possibility of a 2-D grey-scale classification: since the boundaries between classes
are blurred, a more accurate characterisation of the items is possible (see Figure 2.2b).
The same bi-dimensional plane is used by Yang et al. [48]: the emotion plane is composed of
two dimensions, arousal (how exciting/calming) and valence (how positive/negative) (Figure
2.2c).
In addition to this, Meyers [34] and Govaerts et al. [20] use the song lyrics to improve accuracy;
in fact, lyrics are usually related to the emotional content of the song. The system classifies
the mood by detecting keywords in the text. The lyrics training phase is, however, very hard and
time-demanding for widespread use and needs a large up-to-date database and dictionary;
moreover, lyrics are sometimes confusing and ambiguous.
Figure 2.2: Mood extraction: (a) mood cloud interface, (b) energy/stress plane, (c) arousal/valence plane
2.2 Playlist generation and recommendation systems
We will now see how research in Music Information Retrieval may be applied to playlist generation. The diffusion of portable music devices (such as the Apple iPod or the Creative Zen) has raised
the need for algorithms that efficiently combine musical files. A good playlist usually satisfies
some constraints (user-preferred items are played more often, ...) and is designed to find a balance
between coherency (similar items are played one after another) and novelty (the listener should
not get bored).
A common assumption in playlist generation is that transitions between audio items occur
only at their beginning and end; the case of transitions between sections of songs is not considered.
Therefore, the inter-item similarity function used to create the playlist
only considers a small fraction of the signal.
Furthermore, playlist generation systems are not usually designed to adapt to fast changes in
mood or genre; instead, they are used to create sequences of songs that will be played for
their entire length. Therefore, playlist generation systems are usually based on the extraction of
features describing the entire piece (such as genre, artist, album, etc.).
Many approaches have been developed, all sharing the goal of adapting to the user by collecting
his or her preferences. The methods are many: artificial intelligence search (Pauws et al. [38] and
Aucouturier and Pachet [4]), fuzzy logic (Bosteels and Kerre [7]) and audio feature extraction
(Shan et al. [45]).
The features used to generate a playlist are not only musical or related to the audio data;
Reynolds et al. [42] underline the importance of contextual information (time of day, temperature, location, ...) during the training phase and in the applications of playlist generation
systems. Contextualisation is essential when the playlist is generated by portable audio
players; it has been shown that the type of music people listen to strongly depends on the time,
location and activity (driving, doing sports, relaxing, ...).
An interesting playlist generator is described by Masahiro et al. [33]; it is used to
adapt the music to the behaviour of a running person. The system uses an accelerometer to
detect the runner's step frequency and is able to select and play a song with the same number of
beats per minute. We will see in the following chapters how systems such as this one can be seen
as instances of the recommendation framework of this thesis.
Apple iTunes Genius Genius is an automatic playlist generator and recommendation system
integrated in iTunes. The system is based on collaborative filtering of huge amounts of data
derived from the users' iTunes libraries.
The source code is currently a company secret but, based on scientific studies and personal
experience, we can infer some implementation details. The system seems not to take into account
the audio content of the music library but only the meta-data,
improved also by the Gracenote MusicID service. In fact, the
system only works with well-known songs and artists that are
present on the on-line Gracenote database.
Figure 2.3: Apple iTunes Genius logo
The recommendation system, although it is not content-based,
can be compared to a content-based system and the results are surprising: Barrington et al.
[5] describe how Genius can capture audio and artist similarity by exploiting only collaborative
filtering. Somehow the system is able to exploit its users as content-based analysers, since
users tend to listen to songs of similar genres.
Music search engines Another growing research field is music search engines. The challenges
(Nanopoulos et al. [35]) of modern music search engines are:
• Search by meta-data
• Search by lyrics
• Search by audio data
• Query by humming
• Recommendation of similar music
It is clear that feature extraction methods are highly exploited (Pardo [37]). Moreover, since the
databases are very large, optimisation and audio fingerprinting (similar to hashing functions) are
needed. Cai et al. [10] develop an audio fingerprinting method that is resistant to distortion
and noise and allows scalability; similar musical segments tend to have the same fingerprint.
When executing a query, the system only compares fingerprints without analysing the underlying
audio data.
The efficiency of audio fingerprinting methods can be seen in Shazam [32], a music search engine
for portable devices (iPhone, Android, ...); it can recognise the track name and artist from a
short sample of the song. The system is resistant to noise and distortion and works well even in
noisy environments (discos, bars, pubs, ...).
Chapter 3
Theoretical background
This chapter presents the main theoretical tools that will be used to build the system. We start
with an explanation of the concept of feature extraction from musical signals, citing some relevant
audio features.
After that, we describe how features can be analysed, and we present two machine learning
techniques: Support Vector Machines and Gaussian Mixture Models. These techniques will also
be used in the system.
Finally, we introduce the Short-Time Fourier Transform, focusing on time-scaling techniques.
3.1 Feature extraction
This section deals with the most recent advancements in automatic information retrieval
from audio signals. This is a vast field of research and many applications can be devised.
Digital analysis may discriminate whether an audio file contains speech, music or other audio
entities, how many speakers are present in a speech segment, what gender they are and even
which persons are speaking. Music may be classified into categories such as jazz, rock, classics,
etc. It is often possible to identify a piece of music even when performed by different artists, or
an identical audio track even when distorted by coding artifacts. Finally, it may be possible to
identify particular sounds, such as explosions, gunshots, etc.
The feature extraction process can be summarised in Figure 3.1.
In order to clean the audio data and enhance the performance of the feature extraction techniques, some operations are performed:
• conversion from stereo to mono signal
• de-noising (if needed)
• signal down-sampling to improve performance
• windowing
• time scaling of the signal (if needed)
• segmentation, that divides the signal in segments, called frames, in correspondence of
significant points
Figure 3.1: Structure of a generic classification system
Once this step is completed, it is possible to extract the descriptors useful for our purposes. For
each frame l, a set of descriptors (also called features) \vec{d}(l) = d_i(l), i = 1, ..., M is extracted; each
set is a point in a multidimensional space. Our goal is to find the feature set such that:
• instances of descriptors belonging to the same class, \vec{d}(l) and \vec{d}(k) with k ≠ l, are grouped
in the same cluster independently of k and l
• it is always possible to separate descriptors related to different classes c_g and c_h
with g ≠ h
We will now describe some relevant low-level features that will be used in the system.
3.1.1 Segmentation
The segmentation phase divides the audio file into phrases and defines some "interesting points"
in the musical stream. Some examples of good separation points are in correspondence of:
• a harmony change
• a tempo change
• the start of a musical phrase
• a spectrum change
• ...
To be effective, segmentation should be performed in such a way that the music between two anchor
points has almost constant characteristics.
The points may be defined manually or automatically. A composer may for example manually
define anchor points in order to divide the song according to his or her personal interpretation.
If the segmentation is performed automatically, one solution consists in a peak detection in the
spectrum novelty function. The novelty curve indicates the temporal locations of significant
textural changes and is the convolution of the similarity matrix (Figure 3.3a) along the main
diagonal using a Gaussian chequerboard kernel (Figure 3.2). A Gaussian chequerboard kernel
is obtained from a point-to-point multiplication between the bi-dimensional Gaussian function
and the following function:

f(x, y) = \begin{cases} +1 & \text{if } sign(x) = sign(y) \\ -1 & \text{otherwise} \end{cases}
Figure 3.2: Gaussian chequerboard kernel
• From the input signal, the system computes the similarity matrix, which shows the similarity
between all possible pairs of frames from the input data (Figure 3.3a).
• Along the diagonal of the similarity matrix we can see the similarity between a frame
and its neighbours.
• If we perform the convolution along the diagonal with the Gaussian chequerboard kernel, we
obtain a one-dimensional function, the novelty curve (Figure 3.3b).
Figure 3.3: Segmentation operation performed: (a) computation of the similarity matrix, (b) convolution along the diagonal
• By performing a peak detection on the resulting curve, we detect the instants of maximum
local novelty (see the sketch below).
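The following Python sketch illustrates this novelty-based segmentation, assuming that per-frame descriptors (e.g. short-time spectra) are already available. The kernel size, the cosine similarity measure and the simple peak-picking rule are illustrative assumptions, not the exact settings used in the thesis.

# Minimal sketch of novelty-based segmentation from an (n_frames, n_features) matrix.
import numpy as np

def novelty_curve(frames, kernel_size=64):
    # Similarity matrix: cosine similarity between every pair of frames.
    F = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    S = F @ F.T
    # Gaussian chequerboard kernel: 2-D Gaussian times the +1/-1 sign pattern.
    t = np.linspace(-1, 1, kernel_size)
    g = np.exp(-4 * t ** 2)
    kernel = np.outer(g, g) * np.sign(np.outer(t, t))   # +1 if sign(x) = sign(y), -1 otherwise
    # Correlate the kernel along the main diagonal of S.
    n, h = S.shape[0], kernel_size // 2
    S_pad = np.pad(S, h, mode="constant")
    return np.array([np.sum(S_pad[i:i + kernel_size, i:i + kernel_size] * kernel)
                     for i in range(n)])

def pick_peaks(nov, threshold=0.0):
    # Local maxima above a threshold mark the instants of maximum local novelty.
    return [i for i in range(1, len(nov) - 1)
            if nov[i] > nov[i - 1] and nov[i] > nov[i + 1] and nov[i] > threshold]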
3.1.2 Harmony
In music, the harmony is here defined as the combination of a key (e.g. C, C#, ...) and a mode
(major, minor).
It is extracted from the analysis of the chromagram. The chromagram, also called Harmonic
Pitch Class Profile, represents the energy distribution along the pitches or pitch classes. It is
obtained in the following way:
• First, the spectrum is computed on a logarithmic scale, with selection of (by default) the
20 highest dB, restriction to a frequency range that covers an integer number of
octaves, and normalisation of the audio waveform before computation of the FFT (Figure
3.4).
Figure 3.4: Harmony: original spectral components
• The chromagram is a redistribution of the spectrum energy along the different pitches
(i.e., chromas) (Figure 3.5):
Figure 3.5: Harmony: unwrapped chromagram
• If we discard the information about the octaves, we obtain the wrapped chromagram
(Figure 3.6).
• In order to determine the harmony, we compute the key strength (also called key clarity),
i.e., the probability associated with each possible key candidate, through a cross-correlation
(Figure 3.7) of the wrapped and normalised chromagram with similar profiles
representing all the possible tonality candidates (Krumhansl [26]; Gómez [19]).
• The resulting graph indicates the cross-correlation score for each tonality candidate
(Figure 3.8).
• The selected harmony is the one corresponding to the maximum value (a sketch of this key-detection step follows).
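A minimal Python sketch of the key-detection step is given below. It assumes a 12-bin wrapped, normalised chromagram as input; the key profiles are approximate Krumhansl-Kessler values included purely for illustration, and all names are hypothetical.

# Sketch of key detection by cross-correlating a wrapped chromagram with key profiles.
import numpy as np

# Approximate Krumhansl-Kessler major/minor profiles (illustrative values).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def detect_key(chroma):
    """chroma: 12-element wrapped, normalised chromagram."""
    best, best_score = None, -np.inf
    for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
        for shift in range(12):
            # Correlate the chromagram with the profile rotated to each candidate key.
            score = np.corrcoef(chroma, np.roll(profile, shift))[0, 1]
            if score > best_score:
                best, best_score = (KEYS[shift], mode), score
    return best, best_score   # the key strength is the maximum correlation score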
3.1.3 Tempo
The tempo, expressed in BPM (beats per minute), is estimated by detecting periodicities from
the onset detection curve.
Figure 3.6: Harmony: wrapped chromagram
Figure 3.7: Harmony: key detection
Figure 3.8: Harmony: key clarity
One way of determining the tempo starts with the computation of an onset detection curve,
showing the successive bursts of energy corresponding to the successive pulses (Figure 3.9).
A peak picking is automatically performed on the onset detection curve, in order to show the
estimated positions of the notes. The onset detection can be applied to multiple functions (signal
envelope or spectrum) or can be performed in parallel on a filter bank.
Figure 3.9: Tempo: onset detection
The system then computes the autocorrelation function of the onset detection curve, and a peak
picking is applied to the resulting function; the lag of the strongest peak gives the beat period.
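The sketch below illustrates this autocorrelation-based tempo estimate; the frame rate of the onset curve and the admissible BPM range are assumptions of the example.

# Sketch of tempo estimation from an onset detection curve via autocorrelation.
import numpy as np

def estimate_tempo(onset_curve, frame_rate, bpm_range=(40, 200)):
    """onset_curve: 1-D onset curve sampled at `frame_rate` frames per second."""
    x = onset_curve - onset_curve.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    # Convert the admissible BPM range into a lag range (in frames).
    min_lag = int(frame_rate * 60.0 / bpm_range[1])
    max_lag = int(frame_rate * 60.0 / bpm_range[0])
    lag = min_lag + np.argmax(ac[min_lag:max_lag])      # peak picking on the lags
    return 60.0 * frame_rate / lag                      # tempo in BPM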
3.1.4 Brightness
Brightness (Figure 3.10) is a low-level feature that expresses the power of the high-frequency
bands relative to the power of the low-frequency ones. This ratio usually ranges between 0.2 and 0.8
for ordinary musical tracks and can be used as a rough estimator of how dark or bright the
musical piece sounds.

brightness = \frac{\int_{threshold}^{+\infty} X(\omega)\, d\omega}{\int_{0}^{threshold} X(\omega)\, d\omega}    (3.1)
Figure 3.10: Brightness
where threshold is a pre-defined value usually around 1500 Hz.
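A direct translation of Equation 3.1 on a single frame might look as follows; the use of the magnitude spectrum and the default cut-off are assumptions of this sketch.

# Sketch of the brightness of Equation 3.1: energy above the cut-off over energy below it.
import numpy as np

def brightness(frame, sample_rate, threshold_hz=1500.0):
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    high = spectrum[freqs >= threshold_hz].sum()
    low = spectrum[freqs < threshold_hz].sum()
    return high / (low + 1e-12)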
3.1.5 RMS
The global energy of the signal x(t) can be computed simply by taking the root average of the
square of the amplitude, also called root-mean-square (RMS):
x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} = \sqrt{\frac{x_1^2 + x_2^2 + x_3^2 + ... + x_N^2}{N}}    (3.2)
Figure 3.11a shows an example of computation of the RMS feature. We can note that this
energy curve is very close to the envelope (Figure 3.11b).
Figure 3.11: RMS: comparison between RMS (a) and signal energy (b)
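Equation 3.2 translates directly into code; the frame length and hop size below are illustrative values.

# Direct translation of Equation 3.2, computed per frame.
import numpy as np

def rms(frame):
    return np.sqrt(np.mean(frame ** 2))

def rms_curve(signal, frame_len=1024, hop=512):
    # RMS on overlapping frames approximates the signal envelope (cf. Figure 3.11).
    return np.array([rms(signal[i:i + frame_len])
                     for i in range(0, len(signal) - frame_len + 1, hop)])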
3.1.6 Spectral centroid
An important and useful description of the shape of a distribution can be obtained through the
use of its moments. The first moment, called the mean, is the geometric center (centroid) of the
distribution and is a measure of central tendency for the random variable.
centroid = \int x f(x)\, dx    (3.3)
3.1.7 Spectral roll-off
One way to estimate the amount of high frequency in the signal consists in finding the frequency
such that a certain fraction of the total energy is contained below it (Figure 3.12).
This fraction is fixed by default to 0.85 (following Tzanetakis and Cook, 2002); others have proposed
0.95 (Pohle, Pampalk and Widmer, 2005).
\frac{\int_{-\infty}^{spectralRolloff} X^2(\omega)\, d\omega}{\int_{-\infty}^{+\infty} X^2(\omega)\, d\omega} = 0.85    (3.4)
Figure 3.12: Spectral roll-off
3.1.8 Spectral flux
Spectral flux is a measure of how quickly the power spectrum of a signal is changing, calculated
by comparing the power spectrum of one frame against the power spectrum of the previous
frame. Given the spectrum of the signal, we can compute the spectral flux as the distance
between the spectra of successive frames. More precisely, it is usually calculated as the
2-norm (also known as the Euclidean distance) between the two normalised spectra.
flux = \int_{-\infty}^{+\infty} (f_1(\tau) - f_2(\tau))^2\, d\tau    (3.5)
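Discrete counterparts of Equations 3.3-3.5 can be sketched as follows, assuming the magnitude spectrum and the corresponding frequency axis of each frame are available; function names and the roll-off fraction are illustrative.

# Sketch of the spectral descriptors of Sections 3.1.6-3.1.8 on a single frame.
import numpy as np

def spectral_centroid(spectrum, freqs):
    p = spectrum / (spectrum.sum() + 1e-12)          # treat the spectrum as a distribution
    return np.sum(freqs * p)

def spectral_rolloff(spectrum, freqs, fraction=0.85):
    cumulative = np.cumsum(spectrum ** 2)
    k = np.searchsorted(cumulative, fraction * cumulative[-1])
    return freqs[k]                                  # frequency below which 85% of the energy lies

def spectral_flux(spectrum, prev_spectrum):
    a = spectrum / (np.linalg.norm(spectrum) + 1e-12)
    b = prev_spectrum / (np.linalg.norm(prev_spectrum) + 1e-12)
    return np.sum((a - b) ** 2)                      # squared distance between normalised spectra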
3.1.9 Inharmonicity
This feature estimates the inharmonicity, i.e., the amount of partials that are not multiples of
the fundamental frequency f0 , as a value between 0 and 1. For that purpose, we use a simple
function (Figure 3.13) estimating the inharmonicity of each frequency given the fundamental
frequency f0 :
F_{inharmonicity}(\omega) = \sum_{i=1}^{+\infty} T\!\left(\frac{\omega - i \cdot f_0}{f_0}\right)    (3.6)
where T(\omega) is a triangular function defined as follows:

T(\omega) = \begin{cases} 0 & \text{if } \omega < 0 \\ 2\omega & \text{if } 0 \le \omega < \frac{1}{2} \\ 2 - 2\omega & \text{if } \frac{1}{2} \le \omega < 1 \\ 0 & \text{if } \omega \ge 1 \end{cases}
The inharmonicity is then computed as

inharmonicity = \int_{0}^{+\infty} X(\omega) F_{inharmonicity}(\omega)\, d\omega    (3.7)
where X(ω) is the Fourier transform of the signal.
Figure 3.13: Inharmonicity
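A discrete sketch of Equations 3.6 and 3.7 is given below; note that a normalisation by the total spectral energy is added here (an assumption of this sketch) so that the result lies between 0 and 1, as stated above.

# Discrete sketch of Equations 3.6-3.7: spectral energy weighted by a triangular comb
# that is zero at multiples of the fundamental f0 and maximal halfway between them.
import numpy as np

def inharmonicity(frame, sample_rate, f0):
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Summing T((w - i*f0)/f0) over i >= 1 gives a periodic triangle in (w mod f0)/f0.
    x = np.mod(freqs, f0) / f0
    weight = 1.0 - np.abs(2.0 * x - 1.0)
    weight[freqs < f0] = 0.0            # the sum starts at i = 1, so frequencies below f0 get weight 0
    # Normalise by the total spectral energy so the result lies between 0 and 1.
    return np.sum(spectrum * weight) / (np.sum(spectrum) + 1e-12)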
3.1.10 MFCC
Mel-frequency cepstral coefficients (MFCCs) offer a description of the spectral shape of the
sound. They derive from a type of cepstral representation of the audio clip (a nonlinear
”spectrum-of-a-spectrum”). A cepstrum is the result of taking the Fourier transform (FT)
of the decibel spectrum as if it were a signal. The difference between the cepstrum and the
mel-frequency cepstrum is that in the latter the frequency bands are equally spaced on the
mel scale, which approximates the human auditory system's response more closely than the
linearly spaced frequency bands used in the normal cepstrum.
MFCCs are commonly derived as follows (see also Figure 3.14):
• Take the Fourier transform of (a windowed excerpt of) a signal.
• Map the powers of the spectrum obtained above onto the mel scale, using triangular
overlapping windows.
• Take the logs of the powers at each of the mel frequencies.
• Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
• The MFCCs are the amplitudes of the resulting spectrum.
The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier
transform (DFT), but using only real numbers. It has a strong ”energy compaction” property:
most of the signal information tends to be concentrated in a few low-frequency components of
the DCT. That is why by default only the first 13 components are returned. By convention, the
coefficient of rank zero simply indicates the average energy of the signal.
Figure 3.14: MFCC
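As a purely illustrative example (the thesis itself relies on MIRtoolbox), the whole chain described above is available in the librosa Python library; the input file name is hypothetical.

# Illustration only: 13 MFCCs per frame via librosa, which internally performs the
# FFT, mel filterbank, log and DCT steps listed above.
import librosa

y, sr = librosa.load("example.wav", sr=None)          # hypothetical input file
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
print(mfccs.shape)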
3.2 Feature analysis
We will now present some mechanisms that are used to analyse features or to extract higher-level
features. These methods mainly consist in training an expert system to recognise a
particular state.
3.2.1 Support Vector Machine (SVM)
We will now discuss a relevant method used for classification: the Support Vector Machine (SVM).
An SVM is a binary classifier that learns the boundary between items belonging to two different
classes. It works by searching for a suitable separating hyperplane between the different classes
in the feature space. The best separating hyperplane maximises its distance from the closest
training items.
However, in real-world scenarios we are not always able to trace a separating hyperplane between
the different classes: some items fall in the region of the feature space belonging to the opposite
class. It may also happen that it is not possible to find a hyperplane that
separates the different classes; in such a situation we operate a transformation of the
feature space in order to "rectify" the separation surface into a hyperplane. In this subsection
we deal with linear and separable SVMs. Figure 3.15 shows an example of a dataset and a
trained SVM; note that more than one hyperplane can be found that separates the classes.
Figure 3.15: Feature space and items
Let xi with i = 1, ..., N be the feature vectors of the training data X. These belong to either
of two classes, ω1 , ω2 , which are assumed to be linearly separable. The goal is to design a
hyperplane:
g(x) = w^T x + w_0    (3.8)
that correctly classifies all the training vectors. Such a hyperplane is not unique. Let us now
quantify the term ”margin” that a hyperplane leaves from both classes. Every hyperplane is
characterised by its direction (determined by w) and its exact position in space (determined
by w0 ). Since we want to give no preference to either of the classes, then it is reasonable for
each direction to select that hyperplane which has the same distance from the respective nearest
points in ω1 and ω2 . Our goal is to search for the direction that gives the maximum possible
margin. However, each hyperplane is determined within a scaling factor. We will free ourselves
from it, by appropriate scaling of all the candidate hyperplanes. The distance of a point from a
hyperplane is given by
|g(x)|
(3.9)
z=
kwk
We can now scale w, w_0 so that the value of g(x) at the nearest points in ω1 and ω2 (circled in
the figure) is equal to 1 for ω1 and, thus, equal to -1 for ω2. This is equivalent to:
1. Having a margin of \frac{1}{\|w\|} - \frac{-1}{\|w\|} = \frac{2}{\|w\|}
2. Requiring that
w^T x + w_0 > 1, \quad \forall x \in \omega_1    (3.10)

w^T x + w_0 < -1, \quad \forall x \in \omega_2    (3.11)
For each x_i, we denote the corresponding class indicator by y_i (+1 for ω1, -1 for ω2). Our task
can now be summarised as: compute the parameters w, w_0 of the hyperplane so as to
1. minimise J(w) = \frac{1}{2}\|w\|^2
2. subject to y_i \cdot (w^T x_i + w_0) \ge 1, \quad i = 1, 2, ..., N.
Obviously, minimising the norm makes the margin maximum. This is a nonlinear (quadratic)
optimisation task subject to a set of linear inequality constraints.
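As a toy illustration of this formulation (not part of the thesis implementation), a linear SVM can be trained on a small two-dimensional dataset with scikit-learn, which solves the quadratic optimisation internally; the data points and the large C value (approximating the hard-margin, separable case) are assumptions of the example.

# Toy example: training a linear SVM on 2-D feature vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])   # training feature vectors x_i
y = np.array([-1, -1, 1, 1])                                      # class indicators y_i

clf = SVC(kernel="linear", C=1e6)   # large C approximates the hard-margin, separable case
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # w and w_0 of the hyperplane g(x) = w^T x + w_0
print(clf.predict([[0.1, 0.0], [1.2, 0.8]]))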
3.2.2 Gaussian Mixture Model (GMM)
GMMs have been widely used in the field of speech processing, mostly for speech recognition,
speaker identification and voice conversion. Their capability to model arbitrary probability
densities and to represent general spectral features motivates their use. The GMM approach
assumes that the density of an observed process can be modelled as a weighted sum of component
densities bm (x):
p(x|\lambda) = \sum_{m=1}^{M} c_m b_m(x)    (3.12)
where x is a d-dimensional random vector, M is the number of mixture components and bm (x)
is a Gaussian density, parameterised by a mean vector µm and the covariance matrix Σm . The
coefficient cm is a weight that is used to model the fact that the different densities have different
heights in the probability density function. The parameters of the sound model are denoted
as λ = {cm ; µm ; Σm }, m = 1, ..., M. The training of the Gaussian Mixture Models consists in
finding the set of parameters λ that maximises the likelihood of a set of n data vectors.
Different alternatives are available in the literature to perform such estimation. One of them
is the Expectation Maximisation (EM) algorithm. The algorithm works by iteratively updating
the vector λ and the estimation of the probability density function p(m|xi , λ) for each element
in the training set. In the case of diagonal covariance matrices the update equations become:
\mu_m^{new} = \frac{\sum_{i=1}^{n} p(m|x_i, \lambda) \cdot x_i}{\sum_{i=1}^{n} p(m|x_i, \lambda)}    (3.13)

\Sigma_m^{new} = \frac{\sum_{i=1}^{n} p(m|x_i, \lambda)\,(x_i - \mu_m)^T (x_i - \mu_m)}{\sum_{i=1}^{n} p(m|x_i, \lambda)}    (3.14)

c_m^{new} = \frac{1}{n} \sum_{i=1}^{n} p(m|x_i, \lambda)    (3.15)

The value p(m|x_i, \lambda) is updated at each iteration by the following equation:

p(m|x_i, \lambda) = \frac{c_m b_m(x_i)}{\sum_{j=1}^{M} c_j b_j(x_i)}    (3.16)
Let us now consider the decision process: if we have a sequence of L ≥ 1 observations X =
x1 , x2 , ..., xL and we want to emit a verdict, we have to choose the model among λ1 , λ2 , ..., λK
that maximises the a posteriori probability for the observation sequence:
\hat{k} = \arg\max_{1 \le k \le K} P(\lambda_k|X) = \arg\max_{1 \le k \le K} \frac{P(X|\lambda_k)\, P(\lambda_k)}{p(X)}    (3.17)

The computation can be greatly improved: in fact p(X) is the same for k = 1, ..., K. Furthermore, assuming that the P(\lambda_k) are equal for each class of sounds, the classification rule simplifies
to:

\hat{k} = \arg\max_{1 \le k \le K} p(X|\lambda_k)    (3.18)
Using logarithms and the assumption of independence between observations, the sound recognition
system computes:

k̂ = arg max_{1≤k≤K} Σ_{l=1}^{L} log p(x_l|λ_k)    (3.19)
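As an illustrative sketch (not the thesis code), the training and decision rule above map directly onto scikit-learn's GaussianMixture: EM training with diagonal covariances, and the per-model score of equation (3.19) as a sum of per-observation log-likelihoods. Class names and data are assumptions.

# Minimal sketch of GMM-based classification (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Illustrative training data for two sound classes (feature vectors)
train = {
    "class_a": rng.normal(loc=0.0, scale=1.0, size=(200, 3)),
    "class_b": rng.normal(loc=3.0, scale=1.0, size=(200, 3)),
}

# Train one diagonal-covariance GMM (lambda_k) per class with the EM algorithm
models = {name: GaussianMixture(n_components=4, covariance_type="diag",
                                random_state=0).fit(data)
          for name, data in train.items()}

# Decision rule (3.19): sum the per-observation log-likelihoods and pick the max
X = rng.normal(loc=3.0, scale=1.0, size=(10, 3))           # observation sequence
scores = {name: m.score_samples(X).sum() for name, m in models.items()}
print(max(scores, key=scores.get))                          # expected: "class_b"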
3.3 The short-time Fourier transform
A useful technique for the modification of signals is given by the Short Time Fourier Transform
(STFT), which allows a signal to be represented in a joint time-frequency domain. The basic
idea consists of performing a Fourier Transform on a limited portion of the signal, then shifting
to another portion of the signal and repeating the operation. This results in a set of Fourier
Transforms (frequency-domain representation) at different time instants (time-domain).
We have to consider two distinct phases for the modification of a signal based on the STFT. The
first one is the analysis phase: the signal is subdivided into small windowed portions on which
a Fourier transform is calculated; the signal is therefore described by the Fourier coefficients at
different time instants (short-time spectra) that can be modified in order to realise some digital
audio effects (e.g. time or pitch-scaling). The reconstruction of the signal from the STFT
representation takes place in the synthesis phase, by means of an inverse Fourier transform
applied to each modified spectrum.
We now give a formal definition of the analysis and synthesis phases, according to the band-pass
convention of STFT.
3.3.1 Analysis
The short-time Fourier transform of a signal x(t) is defined by:
X(t_a^u, Ω_k) = Σ_{n=−∞}^{+∞} h(n) x(t_a^u + n) e^{−jΩ_k n}    (3.20)

where h(n) is the analysis window, Ω_k = 2πk/N, and t_a^u = uR is the u-th analysis time instant, in which
R represents a fixed integer increment that controls the analysis rate. For time and pitch-scaling
modifications the analysis rate can also be non-uniform.
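A compact illustration (an assumption, not taken from the thesis): the analysis stage of equation (3.20) computed with NumPy, using a Hann window and a fixed hop size R.

# Minimal STFT analysis sketch (NumPy only): frame x with hop R, window h(n),
# and take an N-point FFT per frame, i.e. X(t_a^u, Omega_k).
import numpy as np

def stft_analysis(x, N=1024, R=256):
    h = np.hanning(N)                              # analysis window h(n)
    frames = []
    for start in range(0, len(x) - N + 1, R):      # t_a^u = u * R
        frames.append(np.fft.fft(h * x[start:start + N]))
    return np.array(frames)                        # shape: (num_frames, N)

if __name__ == "__main__":
    fs = 11025                                     # matches the down-sampled rate used later
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)                # 1 s test tone
    print(stft_analysis(x).shape)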
3.3.2 Synthesis
Given an arbitrary sequence of synthesis short-time Fourier transforms Y(t_s^u, Ω_k), there is in
general no time-domain signal y(n) of which Y(t_s^u, Ω_k) is the short-time Fourier transform.
However, many methods exist to obtain an approximation of y(n). The most general reconstruction
formula is:

y(n) = Σ_{u=−∞}^{+∞} w(n − t_s^u) (1/N) Σ_{k=0}^{N−1} Y(t_s^u, Ω_k) e^{jΩ_k (n − t_s^u)}    (3.21)

in which w(n) is the synthesis window and t_s^u is the u-th synthesis time instant.
The perfect reconstruction of the original signal (in the absence of modification between the
analysis and synthesis stages) is achieved when

t_s^u = t_a^u    (3.22)

Y(t_s^u, Ω_k) = X(t_a^u, Ω_k)    (3.23)

provided that, for each n,

Σ_{u=−∞}^{+∞} w(n − t_s^u) h(t_a^u − n) = 1    (3.24)
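As a hedged illustration of condition (3.24), the sketch below checks numerically that a pair of Hann analysis/synthesis windows with 75% overlap gives an (approximately) constant weighted overlap-add sum, which can then be normalised to 1; the window and hop choices are assumptions, not taken from the thesis.

# Numerical check of the weighted overlap-add condition (3.24) for Hann
# analysis and synthesis windows with hop R = N/4 (assumed choice).
import numpy as np

N, R = 1024, 256
h = np.hanning(N)          # analysis window h(n)
w = np.hanning(N)          # synthesis window w(n)

length = 20 * R + N
s = np.zeros(length)
for u in range(0, (length - N) // R + 1):
    s[u * R:u * R + N] += w * h       # sum_u w(n - t_s^u) * h(t_a^u - n), with t_s = t_a

middle = s[N:-N]                      # ignore the ramp-up/ramp-down edges
print(middle.min(), middle.max())     # ~1.5 and nearly constant: divide the output by it
# Rescaling the synthesis output by this constant satisfies condition (3.24).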
3.4 Time-scaling
The goal of time scaling is to slow down or speed up a given signal, in a time varying manner,
without altering the signal’s spectral content (i.e. without altering its pitch). In order to obtain
a time-scale modification, we have to define an arbitrary time-scale function which allows us to
specify a mapping between the time t in the original signal and the time t′ in the modified signal.
This mapping is performed through a time warping function

T : t ↦ t′    (3.25)

The expression for T(t) can be very general; it is often given in integral form as

T(t) = ∫_0^t β(τ) dτ    (3.26)

The term β represents the time modification rate: β > 1 corresponds to a time-scale
expansion (the signal is slowed down), while β < 1 corresponds to a time-scale compression (the
signal is sped up). Notice also that β > 0 must hold, since a negative time modification
rate has no physical meaning.
Given the sinusoidal model of a signal

x(t) = Σ_{i=1}^{I(t)} A_i(t) e^{jφ_i(t)}    (3.27)

with

φ_i(t) = ∫_{−∞}^{t} ω_i(τ) dτ    (3.28)

the time-scaled signal is the following:

x′(t′) = Σ_{i=1}^{I(β⁻¹(t′))} A_i(β⁻¹(t′)) e^{jφ′_i(t′)}    (3.29)
with

φ′_i(t′) = ∫_{−∞}^{t′} ω_i(β⁻¹(τ)) dτ    (3.30)
We can see that the expression of the ideal time-scaled signal is still a linear combination of the
sinusoidal components of the original sinusoidal model. The signal is modified in such a way that
the instantaneous amplitude of the i-th sinusoid at time t′ corresponds to the instantaneous
amplitude in the original signal at time t = β⁻¹(t′). The same holds for the instantaneous
frequency of the i-th sinusoid at time t′, which corresponds to the instantaneous frequency in
the original signal at time t = β⁻¹(t′). Notice that the phase term is obtained by applying the
inverse mapping function on the time axis, and not simply by replacing t with t′ (which would
correspond to the simpler operation of time warping).
This results in a signal whose time-evolution is modified while its frequency content remains
unchanged.
In the following sections, frequency- and time-domain techniques for time scaling are described.
3.4.1 Time-scaling STFT algorithm
The short-time Fourier transform gives access to the implicit sinusoidal model parameters; hence,
the ideal time-scaling operation can be implemented in the same framework. The synthesis time
instants t_s^u are set at a regular interval R = t_s^{u+1} − t_s^u; from the series of synthesis time
instants t_s^u, the analysis time instants t_a^u are calculated according to the desired time warping
function. The short-time Fourier transform of the time-scaled signal is then:

Y(t_s^u, Ω_k) = |Y(T(t_s^u), Ω_k)| e^{jφ_k(t_s^u)}    (3.31)

with

φ_k(t_s^u) = φ_k(t_s^{u−1}) + ω_k(T(t_s^u)) · R    (3.32)

where ω_k(T(t_s^u)) is the instantaneous frequency computed in channel k, which is supposed to
be constant over the duration of (t_a^u − t_a^{u−1}).
The complete time scaling algorithm can be summarised as follows (a minimal sketch follows the list):

1. set the initial instantaneous phases φ_k(t_s^0) = arg(X(0, Ω_k))

2. set the synthesis instant, according to an evolution at a constant frame rate R, through the relation t_s^{u+1} = t_s^u + R

3. calculate the next analysis time instant through the inverse time-warping function. Since t_a^{u+1} could be non-integer, we have to consider the two integer time instants immediately below and above it

4. calculate the short-time Fourier transform at the time instants immediately below and above t_a^{u+1}; for each channel k, compute the corresponding instantaneous frequency using ω_k = Ω_k + (Δφ − Ω_k R − 2mπ)/R, taking care of the phase unwrapping problem

5. for each channel k, estimate the modulus of the STFT at time t_a^{u+1} through linear interpolation

6. reconstruct the time-scaled short-time Fourier transform at time t_s^{u+1}

7. calculate the (u+1)-th frame of the synthesis sequence y(n)

8. return to step 2.
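The phase-vocoder loop above is not reproduced in full here; as a hedged shortcut, the sketch below obtains the same kind of pitch-preserving time-scale modification with librosa's phase-vocoder-based time_stretch (a constant stretch factor, unlike the time-varying warp T(t) discussed above). File names are placeholders.

# Sketch of pitch-preserving time-scaling with a phase vocoder (via librosa).
# Assumes librosa and soundfile are installed; "input.wav" is a placeholder.
import librosa
import soundfile as sf

y, sr = librosa.load("input.wav", sr=11025, mono=True)   # down-sampled as in the thesis

beta = 1.25                                   # time modification rate (beta > 1 slows down)
# librosa's `rate` is the speed-up factor, i.e. the inverse of beta
y_scaled = librosa.effects.time_stretch(y, rate=1.0 / beta)

sf.write("output_timescaled.wav", y_scaled, sr)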
3.4.2 Time-scaling time-domain algorithms
We now briefly describe some of the simplest methods for time-scaling in time domain. A trivial
method for time-scaling a sound recording is to just replay it at a different rate. When using
magnetic tapes, for example, the tape speed may be varied; however this incurs a simultaneous
change in the pitch of the signal. In response to this problem, a number of authors have developed algorithms to independently perform time and pitch-scaling. These methods are based
on time domain splicing overlap-add approaches. The basic idea consists of decomposing the
signal into successive segments of relatively short duration (10 to 40 ms). Time-scale compression/expansion is achieved by discarding/repeating some of the segments while leaving the others
unchanged, and by copying them back into the output signal. As was the case for frequency-domain techniques, pitch-scale modifications can be obtained by combining time-scaling and
re-sampling. For this scheme to work properly, one must make sure that no discontinuity appears at time-instants where segments are joined together; this is the reason why the segments
are usually overlapped and multiplied by weighting windows, while phase discontinuities can be
resolved by a proper time alignment of the blocks.
The splicing methods have the advantage of being computationally cheap, but at the expense of
suffering from echo artifacts, due to the delayed, amplitude-reduced replicas of the signal
being present in the reconstruction. For strictly periodic signals, the splicing method functions
perfectly provided the duration of the repeated or discarded segments is equal to a multiple of
the period. This is still true to a large extent for nearly periodic signals such as voiced speech and
many musical sounds. In order to improve the performances of the splicing technique, a number
of methods have been proposed, in which an actual estimate of the pitch or some measure of
waveform similarity are used to optimise the splicing points and durations.
Synchronous Overlap and Add (SOLA). The example we analyse is the Synchronous Overlap and
Add method (SOLA), based on correlation techniques, which has been chosen for its simplicity
and usefulness. The idea consists of adjusting the length of the repeated/discarded segment so
that the overlapped parts (in particular, the beginning and the end) are maximally similar, thus
avoiding the artifacts which can occur in splicing methods. In particular, the input signal x(n)
is divided into overlapping blocks of a fixed length; these blocks are then temporally shifted
according to the time-scaling factor. Then, the overlap intervals are searched for the discrete-time
lag of maximum similarity. At this point of maximum similarity the overlapping blocks are
weighted by fade-in and fade-out functions and summed sample by sample.
The algorithm can be summarised as follows:
1. segmentation of the input signal into blocks of length N with time shift of Sa samples;
2. repositioning of blocks according to the scaling factor;
3. computation of the normalised cross-correlation between x_L1(n) and x_L2(n), which are the segments of x_1(n) and x_2(n) in the overlap interval:

r_{x_L1, x_L2}(m) = (1/L) Σ_{n=0}^{L−m−1} x_L1(n) · x_L2(n + m)    (3.33)

4. extraction of the discrete-time lag where the normalised cross-correlation has its maximum value;
5. compute fade out/in between the two blocks
The SOLA implementation leads to time-scaling with low complexity, guaranteeing the independence of the parameters Sa, N and L from the pitch period of the input signal.
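A hedged sketch of the correlation-based lag search at the core of SOLA (steps 3-4 above); the block lengths and the test signals are illustrative assumptions.

# Sketch of the SOLA lag search: find the discrete-time lag in the overlap
# region where the normalised cross-correlation (3.33) is maximal.
import numpy as np

def best_lag(x_l1, x_l2, max_lag):
    """Return the lag m in [0, max_lag) maximising r_{xL1,xL2}(m)."""
    L = len(x_l1)
    best_m, best_r = 0, -np.inf
    for m in range(max_lag):
        r = np.dot(x_l1[:L - m], x_l2[m:L]) / L   # (1/L) * sum xL1(n) * xL2(n + m)
        if r > best_r:
            best_m, best_r = m, r
    return best_m

if __name__ == "__main__":
    fs = 11025
    t = np.arange(2048) / fs
    x1 = np.sin(2 * np.pi * 220 * t)              # overlap segment of the first block
    x2 = np.sin(2 * np.pi * 220 * t + 0.7)        # same tone, phase-shifted
    m = best_lag(x1, x2, max_lag=256)
    print("lag of maximum similarity:", m)        # the blocks would be cross-faded here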
Chapter 4
Methodology
In this chapter we will give an overview of the feature-based recommendation framework, starting
from a general description of the problem it addresses. We will then decompose it into
functional blocks and explain the operations they perform.
4.1 The problem of labels
Throughout history, humans have always shown a tendency to classify and order the world around
them. This classification ability usually takes the form of naming, i.e. associating a name with a
class of objects; this is so important that many cultures consider it proof of man's control over
the world. In the Bible, for example, God creates the animals and lets man decide their names:
by giving names to the animals, man is proclaimed lord of all creation.
In the history of mankind, classification and cataloguing have been applied in almost every field
of science and study, from more objective distinctions, such as chemical components, to more
abstract labels, such as historical periods or emotions.
Although labels have proven useful, since they usually create a simplified model of the world,
they suffer from some disadvantages. First of all, they are usually bound to a certain level
of abstraction and tend to hide the underlying details. If we consider the labelling of animals,
we can divide them according to different levels of abstraction; by choosing one of them, the
complexity of the underlying objects is hidden.
In addition to this, labels are sometimes too rigid, in the sense that they do not consider the case
of items belonging to more than one class. If we consider, for example, the genre classification
of musical artists, items nominally belonging to the same genre can be very different from each
other; moreover, the same artist may belong to more than one genre at the same time.
4.2 Labelling in recommendation systems
We will now consider the problem of labelling in the musical field. Historically, music has
been classified according to many aspects: artist, composer, genre, emotional content, etc. The
classification of music according to labels is however very weak: in the classification according
to the composer or the genre, for example, the boundaries between the classes are quite blurred;
musical pieces written by the same composer may be very different.
Recently, the label classification has been applied to the field of music recommendation systems.
Many applications have been developed, all sharing the work-flow displayed in Figure 4.1.
Figure 4.1: The work-flow of traditional recommendation systems
The musical signal is analysed in order to extract a set of descriptors called features. The feature
data forms N-dimensional vectors. Those vectors are then analysed through a decision process
that returns outcomes in the form of labels (genre, artist, style, etc.). This symbolic information
can thus take on only a finite or countable number of values, which results in a greatly limited
discrimination power. Recommendation systems usually start from this symbolic description to
generate playlists.
The loss of discriminative power associated with the conversion from feature vectors to labels is
the main reason why recommendation systems tend to perform quite poorly. The inconsistent
behaviour of the generated playlists can be attributed to:
• Temporal inhomogeneity: labels tend to be global descriptors of a whole musical piece.
Therefore they usually do not consider local changes in the mood and tempo or genre
transitions within the same piece. This makes it very hard to create playlists that locally
adapt to a particular state.
• Ensemble inhomogeneity: labels tend to apply to musical pieces that differ a great deal
from each other. Choosing labels as control parameters results therefore in a weak control
on the content. Consider, for example, genre classification: pieces belonging to the same
genre can be very different from each other; if the user selects a playlist based on a particular
genre, this choice has only a weak connection with the actual mood of the songs.
Figure 4.2: The work-flow of our system
In this thesis, we would like to create a recommendation framework that avoids the conversion
from feature vectors to labels (Figure 4.2). In this framework, the user can directly control the
features, obtaining much better control over the generated content. Moreover, features are no
longer considered global descriptors of the musical piece but time-dependent properties
that may change within the same piece. The system therefore accounts for the locality of
the features, solving the problem of temporal inhomogeneity.
4.3 The recommendation framework structure
The recommendation framework is composed of the following parts (Figure 4.3):
• Proposal generator: this component is responsible for exploring the item database in order
to find items that satisfy the conditions expressed by the user in the feature space. The
output of this component is a list of pairs (a_i, ∆t), where a_i is an audio item and ∆t =
[start_i, end_i] is an interval inside a_i, that satisfies the user constraints.

Figure 4.3: The functional blocks of the system
• Ranking system: this component is responsible for learning the musical taste of the user
by analysing his or her preferences. It then ranks the proposals created by the Proposal
generator according to this information. In this way the recommendation framework not
only considers the explicit requests of the user in terms of feature values, but also implicitly
adapts the list of generated proposals to the inferred preferences.
• Transition generator: this component is responsible for performing the transitions between
audio items and for interpolating the changes in feature values between different
musical pieces
• User interface: this component manages the interaction mechanism between the human
and the machine. Various user interfaces can be devised according to the particular application of the framework
4.4 Applications
This section presents some applications of the system. They have been chosen as instances of a
wide range of possibilities, to show the generality of the recommendation framework approach.
The applications are:
• Automatic DJ system: the recommendation framework may be used as an automatic DJ
system that can automatically create transitions and adapt the emotional aspects of music
to a particular situation.
• Tabletop recommendation system: the recommendation framework may be used in conjunction with a tangible interface to allow the user to specify the values of the features by
placing blocks on a tabletop
• Dynamic playlist generation system based on runner’s step frequency: using an accelerometer, the system is able to detect the step frequency of a runner and adapt the music tempo
to this value
• Training-based system: using the ranking system, it is possible to make the system learn the
musical preferences of the user
Each application has been obtained by enabling/disabling components of the framework or by
changing the parameters used during the generation phase.
4.4.1 Automatic DJ system
The increasing computational power of modern machines suggests a change in the traditional
view of the task of a Disk Jockey. The early DJs used to buy LP records and mix them in a
creative way; the introduction of personal computers and audio compression (MP3 files) has
merely moved the audio data from analog to digital and increased the size of DJ libraries,
without, however, changing the task of the artist: he or she still has to learn the repertoire
and know in advance which pieces fit well together.
Keeping in mind these issues, this application focuses on the development of an automatic music
compositing system that enables the user to control the music selection on the basis of meaningful
parameters. Note that the same approach could be exploited in any other field (video, images,
paintings, even cooking, ...). The system resembles automatic Disk Jockey software, but it
differs from it in many ways. First of all, the system embodies an expert algorithm that analyses
the audio database and is able to detect features directly from the content; in addition, the
interaction with the user is not just in the form of a control but more of a collaboration, in the
sense that the user and the machine are now helping each other to achieve the goal.
The program accepts real-time user inputs in order to modify its behaviour; the user, by means
of an input interface, can drive the performance of the system by specifying the values of predefined parameters.
4.4.2 Tangible interface
This application of the recommendation framework is based on the ReacTable, a tangible
interface originally designed within the Music Technology Group at the Universitat Pompeu
Fabra in Barcelona, Spain, by Sergi Jordà, Marcos Alonso, Martin Kaltenbrunner and Günter
Geiger.
The ReacTable is a multi-user electronic music instrument with a tabletop user interface (Jordà
et al. [24]). The user can interact with the instrument by placing blocks (called tangibles) on
a semi-transparent table. Each tangible represents a predefined function and is recognised by
a camera placed below the table; the visual interface of the system is projected on the table and
allows the user to interface with the system via tangibles or fingertips. In this way a virtual
modular synthesiser is operated, creating music or sound effects.

Figure 4.4: The ReacTable
In the version of the system developed by Jordà et al. [24], there are various types of tangibles representing different modules of an analog synthesiser. Audio frequency VCOs (Voltage-controlled
oscillators), LFOs (Low-frequency oscillators), VCFs (Voltage-controlled filters), and sequencers
are some of the commonly-used tangibles. There are also tangibles that affect other modules:
one called radar is a periodic trigger, and another called tonaliser limits a VCO to the notes of
a musical scale.
In our application, the interaction between the user and the machine is performed by placing
on the table a set of fiducials, each with a different meaning. We may group the fiducials into
three main categories:
• Feature weights: this set of fiducials controls the importance assigned to the features. By
moving them, the user is able to prioritise the features by giving more or less importance
to each of them.
Figure 4.5: The ReacTable framework
• Feature values: this set of fiducials controls the values assigned to the features. The user
is therefore able to control the evolution of the recommendation system using a sort of
tabletop mixer where the fiducials act as sliders.
• Special fiducials: some fiducials have a special meaning; when they are visible in the scene,
they modify the behaviour of the system.
4.4.3 Dynamic playlist generation system based on runner's step frequency
This application exploits a subset of the functionalities of the system. It consists of an adaptive
playlist generator based on the runner's step frequency. A similar application has been developed
by Masahiro et al. [33]. The system adapts the tempo of the performed audio to the behaviour
of a running person, either by time-scaling the audio or by switching to another musical item.
The recommendation framework has been tuned in the following way:
• An upper bound of 5 seconds has been set on the temporal length of the proposals. In
this way, the system is forced to generate a proposal list (and therefore to check the value
of the features) every 5 seconds. This leads to a reasonable reactiveness
• Since in this application the tempo feature is the most important one, its importance has
been set to a high value. The relevance of the other features has been set to a low value.
• The audio database is composed of the same song (The Knack - My Sharona) time-scaled
at different BPMs.
An accelerometer has been fastened to the runner's leg to detect his or her step frequency. When
the tester runs, the system adapts the music tempo to this value.
The interesting aspect of this application is that both the person and the machine tend to adapt
to each other. In fact, when the music beat frequency is close to the runner’s step frequency, he
or she tends to adapt his or her speed to the music.
The system could become even more interesting if it could enhance the runner's performance by
influencing his or her heart rate: when the system detects a low heart rate, it should recommend
faster music, and vice versa.
4.4.4 Training-based system
In this application the system is used without specifying the values of the features and interacting
only through the screen interface. The only part of the recommendation framework that is used
is the mechanism to learn the taste of the user and rank the music according to the history of
the users’ preferences.
The more the system is trained, the more it understands and adapts to the inferred preferences
of the user, improving the quality of the recommended music.
This is the lightest application from the point of view of the human-computer interaction; the
user implicitly interacts with the learning algorithm by selecting items from the proposal list.
Chapter 5
Implementation
This chapter describes the implementation details of the system. The system is mainly divided
into two stages (Figure 5.1):
• Preprocessing phase: the system extracts the features from the audio contained in a
database using Music Information Retrieval (MIR) techniques. The output of this phase
is saved in the form of XML files
• Performance phase: the system reads the XML data and uses it to select the audio items
to play. The system starts playing an audio track for a pre-defined period of time; it then
scans the database for items that best fit the parameters defined by the user and shows the
user a list of proposals. If the user chooses one of them, the system performs the transition;
otherwise, the system automatically selects the best-fitting item. During the implementation
of the performance phase, real-time issues should be taken into account to avoid artifacts such
as playback delays or audio glitches.
The first section of this chapter describes the feature extraction mechanism and the feature
similarity functions used to compare the values of the features; these are the basic building
blocks of the system. The second section describes the intermediate layer between the
preprocessing and the performance stages: the XML data. The third section describes the
performance phase in detail.

Figure 5.1: System stages
5.1 Preprocessing phase
The preprocessing stage analyses the audio data and extracts the features. For each audio item
in the database, the system calculates the evolution of the features in it and stores the result in
an XML file. The audio items are down-sampled to 11025 Hz mono signals in order to speed up
the processing.
The first section defines the concept of feature we will use in the system. It then describes how
the features are calculated.
The second section describes the feature similarity functions that will be used during the performance phase to compare the values of features in different audio files.
5.1.1 Feature extraction
A feature is a function of time:

feature : TIME ↦ D    (5.1)

where D is an arbitrary domain. The domain D should satisfy the comparability property: there
exists a function

compare : D × D ↦ [0, 1]    (5.2)

that is used to compare each pair of values in D. The following properties hold:
• When two elements are similar, the value should be close to 1. Conversely, when two
elements are not similar, the value should be close to 0.
• An element of D is equal to itself:

compare(a, a) = 1  ∀a ∈ D    (5.3)

• Symmetry: compare(a, b) = compare(b, a)
There are some relevant features that are used in the system (for the theoretical details, see
chapter 3):
• segmentation: a segmentation operation splits the audio signal into segments by detecting
"interesting points" in the musical stream. We will call them anchors. An anchor
defines the moment at which an event occurs in the music.
During the performance phase, the system will use the anchor points to build up the
playback tree (Figure 5.2). This means that the system will merge two songs only at
segmentation points.

Figure 5.2: Anchor points
• harmony: this feature is composed of a key (e.g. C, C#, ...) and a mode (major,
minor). In addition, we can specify the confidence of the detected harmony, the so-called
keyClarity.
• tempo: expressed in BPM (beats per minute). It is extracted in 50% overlapping windows
of 16 seconds.
An important aspect we will now concentrate on is the optimisation of the tempo. It
sometimes happens that the detection of the tempo oscillates among the multiples of the
correct value although no tempo change occurs in the musical score; this is because some
beats of the rhythmic pattern are missing or have been added for artistic purposes. When
detecting the tempo, we should therefore avoid considering these oscillations. An algorithm
has been developed to merge the values of two different tempos when one is a multiple of
the other.
The system analyses the list of tempo detections from the beginning to the end (a minimal
sketch of this pass is given at the end of this section):
– Given bpm_n, the n-th tempo detection, the system builds two lists, div and mul, in
the following way:
  ∗ For each bpm_i with i = 0, ..., n − 1:
    · set div_i = round(bpm_i / bpm_n) · bpm_n − bpm_i
    · set mul_i = bpm_n / round(bpm_n / bpm_i) − bpm_i
– Given divMinIndex and mulMinIndex, the indexes of the minimum values of div
and mul respectively:
  ∗ if div_divMinIndex < mul_mulMinIndex, set the n-th value of the tempo to

bpm_n · round(bpm_divMinIndex / bpm_n)    (5.4)

  ∗ otherwise, set the n-th value of the tempo to

bpm_n / round(bpm_n / bpm_mulMinIndex)    (5.5)
After that, a median filter of 5 tempo samples is applied to remove the remaining outliers.
Figure 5.3a shows an example of an un-optimised tempo detection. The tempo in the
original score moves from 180 to 160 and again to 180. We can notice that in the unoptimised detection, the values of the tempo oscillate between three values: 180, 90 and 160.
Whereas the values 180 and 160 are correct, 90 is wrong, since no tempo change takes place
in the score. This problem arises from the fact that during the 90 BPM segments only two
beats out of four are detected by the system. The result of the optimisation is shown in Figure
5.3b.
• brightness: the brightness expresses the power of the high-frequency bands with respect
to the low-frequencies.
The brightness feature is extracted using 1-second-long 50% overlapping windows.
• rms: the rms value is computed by taking the square root of the average of the squared amplitude.
The rms feature is extracted using 1-second-long 50% overlapping windows.
• mood: mood is a high-level feature describing the emotional content of a musical piece.
Since no standard set of emotions seems to have been established, a set has to be selected
that is grounded in psychology and proves useful for this study. In the system, we use the
bi-dimensional representation of mood shown in Figure 5.4.
The vertical axis (Energy) is related to the strength of the signal and could be therefore
detected through the intensity features (such as RMS, loudness, ...). The horizontal axis
(Stress) indicates whether the emotion is positive (happiness, joy, ...) or negative (depression, frustration, ...) and is calculated using timbric features.

Figure 5.3: Tempo optimisation. (a) Original tempo detection (before optimisation); (b) optimised tempo detection.

Figure 5.4: The bi-dimensional mood plane
Using a set of Support Vector Machines, we are able to classify the audio data in the
bi-dimensional plane; a total of three SVMs are trained to determine the classes in a
hierarchical framework (Figure 5.5): first of all, the items are classified according to the
intensity feature into two classes (high and low intensity) and then, for each class, an SVM
performs the vertical classification.
The features are extracted in windows of 32 milliseconds and, for each of them, the
average and variance are computed.
The intensity feature extracted from the audio items is rms.
The timbric features are:
– spectral centroid
– spectral roll-off
– spectral flux
– inharmonicity
– MFCC
Figure 5.5: The hierarchical framework
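Returning to the tempo optimisation described earlier in this section, the sketch below is a hedged transcription of the listed steps (it takes the minima of div and mul in absolute value, which the text leaves implicit), followed by the 5-sample median filter.

# Sketch of the tempo optimisation: merge octave-error detections
# (multiples/submultiples) and then apply a 5-sample median filter.
import numpy as np
from scipy.signal import medfilt

def optimise_tempo(bpms):
    out = list(map(float, bpms))
    for n in range(1, len(out)):
        div = [abs(round(out[i] / out[n]) * out[n] - out[i]) for i in range(n)]
        mul = [abs(out[n] / max(round(out[n] / out[i]), 1) - out[i]) for i in range(n)]
        d, m = int(np.argmin(div)), int(np.argmin(mul))
        if div[d] < mul[m]:
            factor = round(out[d] / out[n])
            if factor > 1:
                out[n] = out[n] * factor      # the detection was a submultiple
        else:
            factor = round(out[n] / out[m])
            if factor > 1:
                out[n] = out[n] / factor      # the detection was a multiple
    return medfilt(out, kernel_size=5)        # remove the remaining outliers

print(optimise_tempo([180, 180, 90, 180, 160, 160, 80, 160, 180]))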
5.1.2 Feature similarity functions
As previously stated (see 5.1.1), each feature (except for segmentation) is assigned a similarity
function that expresses the relationship between its values. The functions are the following (a compact sketch of these functions is given after the list):
• harmony: the similarity relation among the values of harmony is defined according to the
rules of harmony in Western music. These values can be changed according to personal
taste. In Table 5.1 and Figure 5.6 we present the values of the compare function between
(C, maj) and all other harmonies and between (C, min) and all other harmonies. The
other values can be obtained by musically transposing the notes.
Besides this, the similarity measure also considers the keyClarity, i.e. the confidence of the
detected harmony. The key clarity not only expresses the probability of the correctness of
the given harmony, but gives hints about the amount of inharmonic noise present in the
piece. If this value is very high, a strong harmonic component is perceived by the listener.
On the contrary, when this value is low, the segment does not present a well-defined harmony.
When two segments have both a high key clarity, the overall harmony similarity function
should consider the values defined in Table 5.1. However, when two segments have both
a low key clarity, the actual value of the key is not important since the value has a low
confidence.
In particular, given two audio samples with key clarities keyClarity1 and keyClarity2
respectively, and with the key similarity computed as shown before (see Table 5.1), the harmony
similarity measure should show the qualitative behaviour described in Table 5.2.
Table 5.2: The qualitative measure of harmony similarity as a function of keyClarity1 and keyClarity2 (when both key clarities are high, the overall similarity follows the key similarity; when both are low, the overall similarity is high).
The qualitative graph of the similarity function is shown in Figure 5.7; when both key
clarities are zero, the overall similarity is high, when both are one, the value is the real
key similarity.
The overall similarity measure is obtained by the combination of two functions:
f1 (...) = (1 − keyClarity1 ) · (1 − keyClarity2 )
(5.6)
(a) Similarity between (C, maj) and each (key, mode) pair:

Key    maj    min
C      1.00   0.25
C#     0.00   0.00
D      0.20   0.25
D#     0.10   0.00
E      0.10   0.85
F      0.25   0.25
F#     0.00   0.00
G      0.25   0.25
G#     0.10   0.00
A      0.10   0.90
A#     0.20   0.00
B      0.00   0.00

(b) Similarity between (C, min) and each (key, mode) pair:

Key    maj    min
C      0.25   1.00
C#     0.00   0.00
D      0.00   0.00
D#     0.90   0.00
E      0.00   0.00
F      0.25   0.25
F#     0.00   0.00
G      0.25   0.25
G#     0.85   0.00
A      0.00   0.00
A#     0.25   0.00
B      0.00   0.00

Table 5.1: Harmony similarity measure. (a) C Major similarity measure; (b) C Minor similarity measure.
Figure 5.6: Harmony similarity measure. (a) C Major similarity measure; (b) C Minor similarity measure.
Figure 5.7: The qualitative graph of harmony similarity
that is maximum when both keyClarity1 and keyClarity2 are zero, and the following:
f2(...) = keySimilarity · (1 − |keyClarity1 − keyClarity2|)    (5.7)
whose value decreases when keyClarity1 and keyClarity2 are distant. Figure 5.8a shows
the graph of f1 , whereas Figure 5.8b shows the graph of f2 .
Figure 5.8: Harmony similarity measure. (a) Graph of f1; (b) graph of f2.
The harmony similarity is obtained by combining the contributions of the two functions:

compare_harmony(...) = f1(...) + (1 − f1(...)) · f2(...)    (5.8)
In Figure 5.9 an example of similarity function is plotted (key similarity is set to 0.5). We
can see that the qualitative behaviour resembles the one in Figure 5.7.
Figure 5.9: Graph of compare_harmony
• tempo: the similarity function for the tempo can be calculated as a normalised Gaussian
function centred on one of the two BPM values and with an appropriate variance (Figure
5.10). It returns high values when the two tempos are close and smoothly decreasing
values as they move apart.
Figure 5.10: Tempo similarity graph
• brightness: the similarity function is computed as follows:

compare_brightness(...) = 1 − |brightness1 − brightness2|    (5.9)

• rms: the similarity function is computed as follows:

compare_rms(...) = 1 − |rms1 − rms2|    (5.10)
• mood: the distance measure among the classes is summarised in Table 5.3. Keeping in
mind the bi-dimensional plane used to calculate the mood, the similarity is 1.00 when the
two moods are the same, 0.33 when they are in the same column, 0.40 when they are in
the same row and 0.10 otherwise.
              Exuberance  Anxious  Contentment  Depression
Exuberance      1.00       0.40      0.33         0.10
Anxious         0.40       1.00      0.10         0.33
Contentment     0.33       0.10      1.00         0.40
Depression      0.10       0.33      0.40         1.00
Table 5.3: Mood similarity table
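A compact, hedged sketch of the similarity functions listed above (the function names, the Gaussian width for the tempo comparison and the mood encoding are assumptions, not taken from the thesis code):

# Sketch of the feature similarity functions described in this section.
import math

SIGMA_BPM = 10.0   # assumed width of the tempo Gaussian (a tuning parameter)

def compare_harmony(key_similarity, key_clarity_1, key_clarity_2):
    f1 = (1.0 - key_clarity_1) * (1.0 - key_clarity_2)                  # (5.6)
    f2 = key_similarity * (1.0 - abs(key_clarity_1 - key_clarity_2))    # (5.7)
    return f1 + (1.0 - f1) * f2                                         # (5.8)

def compare_tempo(bpm1, bpm2):
    # Normalised Gaussian centred on one of the two BPM values
    return math.exp(-((bpm1 - bpm2) ** 2) / (2.0 * SIGMA_BPM ** 2))

def compare_brightness(b1, b2):
    return 1.0 - abs(b1 - b2)                                           # (5.9)

def compare_rms(r1, r2):
    return 1.0 - abs(r1 - r2)                                           # (5.10)

# Mood similarity lookup taken from Table 5.3 (moods indexed in a fixed order)
MOODS = ["Exuberance", "Anxious", "Contentment", "Depression"]
MOOD_TABLE = [[1.00, 0.40, 0.33, 0.10],
              [0.40, 1.00, 0.10, 0.33],
              [0.33, 0.10, 1.00, 0.40],
              [0.10, 0.33, 0.40, 1.00]]

def compare_mood(mood1, mood2):
    return MOOD_TABLE[MOODS.index(mood1)][MOODS.index(mood2)]

print(compare_harmony(0.85, 1.0, 1.0), compare_tempo(120, 125),
      compare_mood("Exuberance", "Anxious"))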
5.2 XML Structure
For each audio file in the database and for each feature, an XML file is created:
• filename anchors.xml : contains data about anchors (resulting from the segmentation phase)
• filename tempo.xml : contains data about tempo
• filename harmony.xml : contains data about harmony (key, mode, keyClarity)
• filename brightness.xml : contains data about brightness
• filename rms.xml : contains data about rms
• filename mood.xml : contains data about mood
Two XML schemas have been defined: one for the anchors and one for the other features.
5.2.1 Anchors XML Schema
The anchors XML file contains the list of the anchor points. For each of them, a name and a
description may be defined. Table 5.4 describes the XML structure.
An example of XML file is the following:
<anchors>
<anchor position=POSITION>
<name>NAME</name>
<description>DESCRIPTION</description>
</anchor>
...
</anchors>
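As a hedged usage sketch (the real system is not necessarily implemented this way), such a file can be read with a standard XML parser; the file name below is a placeholder following the naming scheme described above.

# Sketch: reading an anchors XML file into a list of (position, name) pairs.
import xml.etree.ElementTree as ET

def load_anchors(path):
    anchors = []
    for anchor in ET.parse(path).getroot().findall("anchor"):
        position = int(anchor.get("position"))        # position in samples
        name_el = anchor.find("name")                  # <name> is optional
        anchors.append((position, name_el.text if name_el is not None else None))
    return anchors

print(load_anchors("song_anchors.xml"))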
5.2.2 Generic feature XML schema
The generic feature XML schema is used to store information about features (harmony, tempo,
...). In this context a feature is seen as a structure that associates the value of some attributes to
time instants and has some parameters that specify global characteristics (Figure 5.11).
Table 5.5 describes the XML structure.
An example of XML file is the following:
<feature id=ID>
<name>NAME</name>
<description>DESCRIPTION</description>
<parameters>
<parameter name=NAME>VALUE</parameter>
</parameters>
<data>
<dataitem position=POSITION>
<name>NAME</name>
<description></description>
<attributes>
<attribute name=NAME>VALUE</attribute>
...
</attributes>
</dataitem>
...
</data>
</feature>
The list of features and attributes used in the system is the following:
• Harmony: The harmony XML file is derived from the generic feature XML schema. It
contains three attributes per data item:
– key: a value among C, C#, D, D#, E, F, F#, G, G#, A, A#, B
– mode: a value among maj, min
<anchors>: contains the anchor list. Children: zero or more <anchor>.
<anchor>: defines an anchor point. Attributes: position (the position of the anchor in number of samples, from 1 to N). Children: <name> [OPTIONAL], <description> [OPTIONAL].
<name>: name of the anchor.
<description>: description of the anchor.

Table 5.4: Anchor XML structure
Figure 5.11: The XML data model
<feature>: root node. Attributes: id (identifier of the feature). Children: <name> [OPTIONAL], <description> [OPTIONAL], <parameters> [OPTIONAL], <data> [REQUIRED].
<parameters>: contains the feature global parameters, if any. Children: one or more <parameter>.
<parameter>: contains the value of a parameter. Attributes: name (identifier of the parameter).
<data>: contains the data of the feature. Children: zero or more <dataitem>.
<dataitem>: represents a value of the feature. Attributes: position (position in the music stream in number of samples, from 1 to N). Children: <name> [OPTIONAL], <description> [OPTIONAL], <attributes> [OPTIONAL].
<attributes>: list of attributes of the data item. Children: one or more <attribute>.
<attribute>: contains the value of an attribute of the data item. Attributes: name (identifier of the attribute).

Table 5.5: Generic feature XML structure
– keyClarity: a real number between 0 and 1
• Tempo: The tempo XML file is derived from the generic feature XML schema. It contains
one attribute per data item:
– bpm: beats per minute
• Brightness: it contains one attribute per data item:
– brightness: the value of the brightness (real number in the interval [0, 1])
• Rms: it contains one attribute per data item:
– rms: the value of the rms (real number in the interval [0, 1])
• Mood: it contains one attribute per data item:
– mood: the value of the mood (an integer in the set {1, 2, 3, 4})
5.3 Performance phase
In the following sections, the performance stage is analysed.
At startup, the system loads the information from the XML files into memory and randomly
selects a section of an audio item and plays it. It then shows the user a list of audio items that
best fit the current one, sorted according to the musical preferences of the user (detected
by an expert system). By default the first item of the list is selected, but the user can change
the selection. When the played section is about to end, the system starts processing the selected
item, performing the transition between the currently played item and the new one. The process is
then iterated.
The performance stage is composed of the functional areas shown in Figure 5.12.
Figure 5.12: The functional blocks of the system
• Work-flow manager: this component manages the timing of the system by sending signals
to the other blocks when events occur
• Proposal generator: this component is responsible for the analysis of the feature space and
the detection of audio items that are similar to the one currently being played
• Ranking system: this component learns the musical taste of the user from his or her actions
and sorts the proposal list generated by the proposal generator according to it
• Transition generator : once the user has chosen the next audio item, the transition generator mixes it with the one that is currently being played.
• System interface: between the system and the user lies the interface; we will see that this
is not a single component but there exist multiple interfaces (screen, tangible, tapping and
Wii Remote)
5.3.1 Work-flow
In this section we discuss the timing of the system, highlighting the order and timing of
the events that occur during the performance phase.
The user interaction with the system takes place inside a so-called Proposal selection window,
in which a list of proposals is shown to the user and he or she can choose an option from the
list. Each proposal list has a default option that is automatically selected in case the user does
not express any preference.
The timing of the system is managed by means of events. An event is used as a synchronisation
mechanism to indicate to waiting processes when a particular condition has become true.
When an event is triggered, a predefined set of actions (a task) is executed. In the current system,
the events are scheduled by the TaskScheduler class, which allows other processes to be notified
of them.
The list of events in the system is the following:
• Proposal selection opened : this event signals the opening of a proposal selection window
meaning that the user can start selecting the next audio item to be played from a list.
The system displays the list until a ”Proposal selection closed” event is raised.
• Proposal selection closed : this event occurs after a ”Proposal selection opened” event and
signals that the proposal selection window has expired. From the moment in which the
event is triggered, the user is excluded from from performing a choice of the audio item
to be played and the system can start elaborating the audio file (creating the transition
and sending it to the audio device). Note that if the user does not express any preference
during the proposal selection window, the system automatically selects the best fitting
option.
• Proposal play started : this event notifies the system that a proposal is being played
• Proposal play ended : this event notifies the system that a proposal has finished playing
and can be discarded from the system.
We will now describe how the events are scheduled by the TaskScheduler class.
System startup
When the system starts, it allows the user to select the first audio item for a pre-defined interval
of time (Tstartup ) and then it closes the selection window.
Let t0 be the system startup time instant. The system:
• schedules a ”Proposal selection opened” event at t0
• schedules a ”Proposal selection closed” event at t0 + Tstartup
Figure 5.13 shows the process.
Figure 5.13: System startup timing
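A hedged sketch of this start-up scheduling (the actual TaskScheduler class is not reproduced here); it only illustrates how the two events could be queued relative to t0, with T_STARTUP and the handlers as placeholders.

# Sketch of the start-up event scheduling using Python's sched module.
import sched
import time

T_STARTUP = 10.0                      # length of the first proposal selection window (s)
scheduler = sched.scheduler(time.time, time.sleep)

def proposal_selection_opened():
    print("Proposal selection opened: the user may pick the first audio item")

def proposal_selection_closed():
    print("Proposal selection closed: the system processes the selected item")

t0 = time.time()                                                  # start-up instant
scheduler.enterabs(t0, 1, proposal_selection_opened)              # event at t0
scheduler.enterabs(t0 + T_STARTUP, 1, proposal_selection_closed)  # event at t0 + T_startup
scheduler.run()                                                   # blocks until both events fire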
System default execution
After the startup phase, the system repeatedly performs a series of operations (tasks) in response
to the events. The events are scheduled in such a way that the computation is distributed in time
as much as possible (meaning that two computationally expensive events should not occur at the
same time), avoiding any interruption in the played music.
Considering as reference point the time instant in which a ”proposal play started” event occurs,
we define a set of constants (Figure 5.14):
• Tclosing : each proposal selection for audio item proposali should close at least Tclosing
seconds before its playback (so that the system has time to process it).
• Topening : similarly, the ”proposal selection opened” event for the next audio item should
occur at least Topening seconds before the beginning of the playback of the current audio
item.
Figure 5.14: Tclosing and Topening
Let Li be the length of the i-th audio item that is selected by the user, tevent the time instant
in which the event occurs.
• Proposal selection closed : when a selection closes, the system schedules the beginning of
the playback of the current audio item (since it is now decided) and the next proposal
selection opening.
The operations performed are the following (Figure 5.15):
– Retrieve the selected proposal proposali with length Li
– Schedule the next ”proposal play started” event at time tevent + Tclosing
– Schedule the next ”proposal selection opened” event at time tevent +Tclosing −Topening
• Proposal selection opened : when a selection opens, the system schedules the proposal
selection closing.
The operations performed are the following (Figure 5.16):
– Retrieve the last selected proposal proposali−1 with length Li−1
Figure 5.15: Proposal selection closed
– Schedule the next ”proposal selection closed” event at time tevent + Li−1 + Topening −
Tclosing
Figure 5.16: Proposal selection opened
• Proposal play started : when an audio item starts its playback, the system schedules the
”Proposal play ended” event.
The operations performed are the following (Figure 5.17):
– Retrieve the proposal to be played proposali with length Li
– Schedule the next "proposal play ended" event at time tevent + Li
Figure 5.17: Proposal play started
• Proposal play ended : no action
Anticipated proposal selection closing
It happens sometimes that the user does not want to wait until the end of the proposal selection
window since he or she has already decided the next proposal to be played. In this case the user
can trigger an anticipated proposal selection closing, forcing the system to display the next
proposal list.
The operations performed by the system are the following:
• Check whether a proposal selection is actually open. If not, the anticipated closing cannot
be performed.
• Cancel the next ”proposal selection closing” event and retrieve the time in which it was
scheduled tcancelled (Figure 5.18)
Figure 5.18: Cancel next ”proposal selection closed” event
• Perform a slightly different version of the proposal selection closing operation (the only
difference is that in this case the next ”Proposal selection opened” event is not scheduled):
– Retrieve the selected proposal proposali with length Li
– Schedule the next ”proposal play started” event at time tcancelled + Tclosing (Figure
5.19)
Figure 5.19: Anticipated proposal selection closing
• Open the new anticipated proposal selection window (Figure 5.20):
– Schedule the next ”proposal selection closed” event at time tcancelled + Li
Figure 5.20: Anticipated proposal selection opening
Forcing the proposal list generation
In order to improve the user experience by reacting quickly to user inputs, it is sometimes useful
to regenerate the proposal list so that it adapts to a new state of the parameters. This operation
should take place only if the user parameters have changed significantly.
The regeneration of the proposal list can be performed at any instant during a "proposal
selection" window.
5.3.2 Proposal generator
In this section we describe the proposal generator algorithms. Given a segment [starti , endi ]
of an audio item ai , two algorithms have been developed to evaluate the similarity of another
audio item aj at position startj .
• A first, simplified algorithm computes the similarity as a numerical value within a fixed-length time interval
• The second algorithm computes the similarity by detecting the length of the time interval
in which the two segments fit well together. In this interval, the average similarity of each
feature should not be less than a pre-defined value.
Before discussing the details of the algorithms, we will describe some concepts and techniques
used by them.
Control parameters
For each feature (except segmentation), a set of variables is created. These variables are used
to compute a similarity measure among audio items. The values of the variables are set by the
user in real time.
• Weight: The weight represents the importance of the feature. To limit the range of the
weights, we set:
wi ∈ [0, 1] ∀i
(5.11)
• Value: the value is an element of the domain of a feature and represents the expected
value of the feature. The user can also avoid setting the value; in this case, the variable is
set to ”null”.
Similarity algorithms
During the performance phase, the audio items are compared within a time interval and a
similarity measure is computed.
Given
• a1 (t) and a2 (t), two audio items
• [start1 , end1 ] and [start2 , end2 ], two time intervals referring to the two audio items respectively. We suppose that the two segments are of the same length L.
• f1,h and f2,h, the values of feature h for the two audio items
• vh , the expected values of the features (also ”null”) expressed by the user in real time
we compute the similarity measure Simh for the feature h as follows:
• If the feature value is different from "null", the system computes the similarity measure
as a cumulative comparison between the second audio item and the expected value:

Sim_h = (1/L) ∫_0^L compare(f_{2,h}(τ + start_2), v_h) dτ    (5.12)
where compare(..., ...) is the compare function defined by the feature. Remember that in
general the similarity measure is symmetric, therefore it is not affected by the order of the
operands.
• If the value is ”null”, the similarity measure is calculated between the two audio items;
the more similar the two items are within the compare interval, the greater will be the
value of the similarity measure
Sim_h = (1/L) ∫_0^L compare(f_{1,h}(τ + start_1), f_{2,h}(τ + start_2)) dτ    (5.13)
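As an illustrative sketch (with the assumed discretisation that the integrals in (5.12)-(5.13) become means over regularly sampled feature values):

# Sketch of the similarity measure Sim_h, with the integrals of (5.12)-(5.13)
# approximated by means over regularly sampled feature values.
import numpy as np

def sim_h(f1_segment, f2_segment, compare, expected_value=None):
    """f1_segment, f2_segment: arrays of feature values over the compared intervals."""
    f2 = np.asarray(f2_segment, dtype=float)
    if expected_value is not None:                       # (5.12): compare with v_h
        values = [compare(v, expected_value) for v in f2]
    else:                                                # (5.13): compare the two items
        f1 = np.asarray(f1_segment, dtype=float)
        values = [compare(a, b) for a, b in zip(f1, f2)]
    return float(np.mean(values))

# Example with the rms compare function of (5.10)
compare_rms = lambda a, b: 1.0 - abs(a - b)
print(sim_h([0.2, 0.3, 0.4], [0.25, 0.3, 0.35], compare_rms))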
Fixed-length algorithm
The first algorithm performs the comparison inside a fixed-length interval (defined a priori) and
associates a value with each segment, which can then be used for sorting.
Given:
• ai , the previous audio item
• aj , the next audio item
• ti , a time instant in ai
• tj , a time instant in aj
• L, a length
• wh , the weights of the features expressed by the user in real time
the algorithm computes the similarity by calculating an overall similarity measure in the interval
of length L immediately after ti and tj . The value of tj is chosen among the anchor points defined
in the audio item aj , since they represent relevant positions inside the musical piece.
The operations performed by the algorithm are the following:
• For each feature h, compute the feature similarity measure Simh between the intervals
[ti, ti + L] and [tj, tj + L] belonging to ai and aj respectively.
• According to the values of the weights, merge the similarity measures of the features into a
single value:

Sim = ( Σ_{h=1}^{H} w_h · Sim_h ) / ( Σ_{h=1}^{H} w_h )    (5.14)

When values of Sim are only compared with each other and the weights w_h remain constant,
we can discard the normalisation factor:

Sim′ = Σ_{h=1}^{H} w_h · Sim_h    (5.15)
An example of the execution of the algorithm is shown in the following figures: Figure 5.21a
represents the expected feature evolution specified by the user and Figure 5.21b shows the value
obtained by selecting the best option according to the previous algorithm.
Figure 5.21: An example of the fixed-length algorithm. (a) Expected value of the feature; (b) actual value of the feature.
Variable-length algorithm
This algorithm, instead of accepting a pre-defined comparison interval length, dynamically
chooses it according to the similarity value. Items with high similarity values will correspond
to longer intervals, since they fit well together.
Consider:
• ai , the previous audio item
• aj , the next audio item
• ti , a time instant in ai
• tj , a time instant in aj
• wh , the weights of the features expressed by the user in real time
The value of tj is chosen among the anchor points defined in the audio item aj , since they
represent relevant positions inside the musical piece.
The weights wh specified by the user are used as feature similarity thresholds: the average
similarity measure of the feature h inside the interval computed by the algorithm should be
greater than or equal to the value of wh (Figure 5.22). If, for example, the value of the weight wh of
feature h is 0.5, the system will create a transition in which the average feature similarity of h
is at least 0.5.
Figure 5.22: Variable-length similarity algorithm
The operations performed by the algorithm are the following:
• For each feature h, compute the length L_h such that:

L_h = max({ l | Sim_h(l′)/l′ ≥ w_h  ∀ l′ ∈ [0, l[ })    (5.16)

where Sim_h(l′) is the similarity measure of feature h between the intervals [t_i, t_i + l′] and
[t_j, t_j + l′] belonging to a_i and a_j respectively
• Compute the overall interval length as the minimum length among the features (see the sketch after this list):

L = min({ L_h  ∀h })    (5.17)
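A hedged sketch of this variable-length search, assuming the feature values are sampled on a regular grid and Sim_h(l′) is the cumulative similarity over [0, l′[:

# Sketch of the variable-length algorithm: grow the comparison interval for
# each feature while its average similarity stays above the user weight w_h,
# then take the minimum length over all features.
import numpy as np

def feature_length(sim_per_sample, w_h):
    """sim_per_sample: compare(...) values on a regular grid along the interval."""
    cumulative = np.cumsum(sim_per_sample)
    lengths = np.arange(1, len(sim_per_sample) + 1)
    average = cumulative / lengths                 # Sim_h(l') / l' on the grid
    below = np.nonzero(average < w_h)[0]
    return len(sim_per_sample) if below.size == 0 else int(below[0])

def overall_length(similarities_by_feature, weights):
    return min(feature_length(s, w) for s, w in zip(similarities_by_feature, weights))

tempo_sim = [0.9, 0.8, 0.7, 0.4, 0.3]
rms_sim   = [0.8, 0.8, 0.8, 0.8, 0.8]
print(overall_length([tempo_sim, rms_sim], weights=[0.75, 0.5]))   # -> 3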
5.3.3 Ranking system
In this section we will describe the algorithm used by the system to understand and satisfy the
musical preferences of the user. The system differs from most audio players since the learning
is content-based : the system understands the preferences directly from the audio data without
using meta-data or putting a likelihood variable on the audio items.
The ranking system is used to sort the proposal list after it has been generated. Audio items
are sorted by descending likelihood so that the items on top of the list (the ones that the user
sees first) are the ones he or she likes most.
GMM training phase
The system periodically trains a Gaussian Mixture Model (GMM) on the basis of the list of
preferences of the user. The preferences are collected when the user forces a proposal selection
closing; in fact, when the user closes a proposal selection in advance, he or she has usually chosen
his or her favourite option among the proposed ones. When the proposal selection window
expires (i.e. the system itself closes the proposal selection), no action is taken by the ranking
system.
The information collected consists of the mean values of the tempo, rms and brightness features (harmony has been discarded since it is not considered related to user preferences) in the selected
section of the audio item.
The GMM is then trained, using the EM algorithm (see Chapter 3), once 10 preferences have been
collected.
Figure 5.23a shows an example of dataset, extracted from a real execution of the system. Figure
5.23b shows the mean of the trained GMM. The red circles represent the centres of the GMM
components.
GMM usage phase
After the GMM has been trained, it is used to order the audio items in the proposal list. For
each of them, the mean values of the tempo, rms and brightness features are extracted and the
likelihood is computed as follows:
likelihood(X) =
K
X
P (λk |X)
k=1
where P (λk |X) is the probability of the k component of the GMM.
(5.18)
Figure 5.23: GMM training example. (a) A preference dataset; (b) the centres of the trained GMM.
5.3.4 Transitions
We now concentrate on the transitions. A transition (Figure 5.24) is a section of the audio
generated by the system in which two or more audio items are merged together.
Figure 5.24: Transition
At this delicate moment, the features may change very abruptly, since the merged items may
have different characteristics. The aim of this section is to describe some methods that solve
this problem by modifying the original waveform in order to smooth the feature change. These
methods mainly concern the tempo feature whose change is more evident during the cross-fading
operation.
Timescale
In order to smooth the tempo feature between two items, we apply a timescale operation to
the audio items. We provide a small example: suppose that the first audio item's tempo is 120
BPM and the second's is 130 BPM. During the transition the system should:
• gradually increase the tempo of item 1 from 120 to 130 BPM
• timescale item 2 to 120 BPM at the beginning of the transition and gradually increase its
tempo to 130 BPM
• consider the case of segments shorter than the length of the transition
Figure 5.25 explains the concept:
Figure 5.25: Timescale operation in a transition
Using a phase vocoder to perform the timescale operation, we have to define a transformation
from the new time scale to the original one (for each instant in the new time scale, the function
returns the position in the original time scale):

f : T_new ↦ T_old    (5.19)
We will now describe a method to compute f given:
• L_old, the audio segment length in the non-time-scaled reference
• β_start, the starting tempo ratio (i.e. the time-scaling factor at the beginning of the audio item)
• β_end, the ending tempo ratio
We will linearly interpolate between the two time scales (Figure 5.26), meaning that:

df/dτ (τ) = a · τ + b    (5.20)

By setting

df/dτ (0) = β_start    (5.21)

and

df/dτ (L_new) = β_end    (5.22)

where L_new is the length of the transition in the new time scale (unknown for now), we get

a = (β_end − β_start) / L_new    (5.23)

and

b = β_start    (5.24)

Figure 5.26: Transition timescale linear approximation

In order to calculate a solution for the differential equation and determine L_new, we set as starting condition

f(0) = 0    (5.25)

and we set the relationship between L_old and L_new:

L_old = f(L_new)    (5.26)
We get

f(τ) = ∫ (df/dτ) dτ = (a/2) · τ² + b · τ + 0 = ((β_end − β_start) / (2 · L_new)) · τ² + β_start · τ    (5.27)

and, imposing the condition L_old = f(L_new) from (5.26),

L_new = 2 · L_old / (β_start + β_end)    (5.28)
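The thesis gives the formulas but no code for this mapping; a minimal Java sketch of Eqs. (5.23), (5.27) and (5.28), with illustrative names, could be:

class TimescaleMap {
    // Maps an instant tauNew of the new time scale to the original time scale, Eq. (5.27).
    static double map(double tauNew, double lOld, double betaStart, double betaEnd) {
        double lNew = 2.0 * lOld / (betaStart + betaEnd);       // Eq. (5.28)
        double a = (betaEnd - betaStart) / lNew;                // Eq. (5.23)
        return 0.5 * a * tauNew * tauNew + betaStart * tauNew;  // Eq. (5.27), with b = betaStart
    }
}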
Until now we are able to process a single audio item. To perform a complete transition, we have to set the values of β_start and β_end for the two items involved in the transition.
Given BPM_out and BPM_in, the BPM (beats per minute) of the two audio items (the fade-out and fade-in item respectively), for the fade-out item we start with

β_start,out = 1    (5.29)

and change the ratio to

β_end,out = BPM_in / BPM_out    (5.30)

For the fade-in item

β_start,in = BPM_out / BPM_in    (5.31)

and

β_end,in = 1    (5.32)
We may notice that, even if the two items have the same length, the lengths of the segments in the new time scale can, in general, be different. The item with the higher BPM will have the longer segment.
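As a small worked example (a sketch only; the lengths and tempos below are illustrative), the ratios of Eqs. (5.29)-(5.32) and the resulting lengths in the new time scale could be computed as:

class TransitionRatiosExample {
    public static void main(String[] args) {
        double bpmOut = 120.0, bpmIn = 130.0;                       // example tempos from the text
        double betaStartOut = 1.0;                                  // Eq. (5.29)
        double betaEndOut = bpmIn / bpmOut;                         // Eq. (5.30)
        double betaStartIn = bpmOut / bpmIn;                        // Eq. (5.31)
        double betaEndIn = 1.0;                                     // Eq. (5.32)
        double lOld = 441000;                                       // 40 s at 11025 Hz, illustrative
        double lNewOut = 2.0 * lOld / (betaStartOut + betaEndOut);  // Eq. (5.28) applied to each item
        double lNewIn = 2.0 * lOld / (betaStartIn + betaEndIn);
        // The fade-in item (130 BPM, the higher tempo) gets the longer segment in the new time scale.
        System.out.println("fade-out: " + lNewOut + " samples, fade-in: " + lNewIn + " samples");
    }
}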
Scale factor reduction  In order to minimise the time-scale factor, we take into account the fact that two BPM values such that

BPM_1 = n · BPM_2    (5.33)

where n is a natural number, can be considered equivalent in the timescale operation, since the beats are synchronised up to an integer multiple.
Before executing the timescale, we try to detect the value of n:
if (bpmOut > bpmIn) {
    n = Math.round(bpmOut / bpmIn);   // integer beat multiple between the two tempos
    bpmOut = bpmOut / n;              // bring the faster tempo down to the synchronised level
} else {
    n = Math.round(bpmIn / bpmOut);
    bpmIn = bpmIn / n;
}
In this way the ratios β_start and β_end will fall inside the interval [2/3, 3/2]. If the result is still unsatisfactory, a threshold interval can be set (e.g. [0.9, 1.1]) and the timescale is performed only if the ratio is contained within that interval.
Transition beat synchronising
In the previous paragraphs, we discussed the problem of tempo change in the transition between
two audio items. We used a dynamic time-scale algorithm to adapt the tempos of the two items.
We will now approach another problem: even if the two items have the same tempo values, the
beats should be synchronised in order to obtain a pleasant transition (Figure 5.27).
This operation is performed through a peak detection in the correlation function between the
two signals; the correlation is high when the two signals show a good synchronisation.
Figure 5.27: Beat synch

Given the two audio items (fade-out and fade-in) x_1(t) and x_2(t):

• Compute the cross-correlation function

C(t) = ∫_{−∞}^{+∞} x_1(τ) · x_2(τ + t) dτ    (5.34)

• Determine the maximum of the function

t* = arg max_t C(t)    (5.35)

• Merge the two audio items with an offset equal to t*
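On sampled signals the integral in Eq. (5.34) becomes a sum over samples; the following is a minimal brute-force sketch of the peak search (not necessarily the implementation used in the system; names are illustrative):

class BeatSync {
    // Returns the lag (in samples) that maximises the cross-correlation, Eqs. (5.34)-(5.35).
    static int bestOffset(double[] fadeOut, double[] fadeIn, int maxLag) {
        int bestLag = 0;
        double bestC = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag <= maxLag; lag++) {
            double c = 0.0;
            for (int t = 0; t < fadeOut.length && t + lag < fadeIn.length; t++) {
                c += fadeOut[t] * fadeIn[t + lag];
            }
            if (c > bestC) { bestC = c; bestLag = lag; }
        }
        return bestLag;  // offset t* used to align the beats before cross-fading
    }
}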
5.3.5 Interface
In the following paragraphs we will describe the interfaces between the user and the system. Due to the strong artistic component of the project, the quality of the system depends not only on its technical capabilities but also on the impression it makes on the end user. Therefore this section covers one of the most important parts of the system.
Several interfaces have been developed; they can be used at the same time in a collaborative environment:
• Screen interface: normally the user interacts with the software through a traditional screen GUI (Graphical User Interface); the GUI is composed of buttons, list boxes and the other traditional window components.
• Tangible interface: similarly to what happens in the ReacTable, the system can be controlled through a set of tangible objects placed on a tabletop, whose position is detected by a video device (video camera or web-cam); the user can control the system by moving the objects on the table or turning them. In addition to that, the interface is equipped with a finger tracking system.
• Tapping interface: a special input device has been developed to allow the user to specify a musical tempo value. The user can send a set of impulses to the system (tapping on a membrane, pressing a button, ...); the system estimates the period between the impulses and calculates a BPM.
• Wii Remote interface: the software can be controlled using the Wii Remote, a Bluetooth wireless controller used with the Nintendo Wii console. The remote is equipped with an accelerometer and a set of buttons; the accelerometer is used to detect the frequency at which the user shakes the controller and to set the tempo feature, while the buttons are used both for selecting the next audio item and for setting the rms and brightness features.
Screen interface
A Java Swing interface has been created in order to control each parameter of the system. The screen interface is used to edit the real-time parameters in a precise way (e.g. to specify a fine-tuned value of the features or of the feature weights). In addition to this, the screen interface allows the user to choose the next song to be played from a list of proposals made by the system.
Figure 5.28 shows the main program window.
Figure 5.28: The program window
It is composed of:
• Next song selection (Figure 5.29), where the user can choose the song to be played from a list of proposals. Each entry shows the song name and the length of the transition, and highlights the portion of the items that will be used in the transition.

Figure 5.29: The next song selection entry

• Now playing panel (a.k.a. ZBar): shows the playlist of the songs that are going to be played.
• System parameters panel: the user can interact with the system using the right part of the screen interface.
System parameters panel  In this panel the user can modify the parameters used to generate the list of proposals. The design of this part of the system aims at finding a compromise between a lightweight and an understandable user interface.
This panel is divided into three tabs:
• Feature values tab: this panel allows the user to specify the values of the features. Three input methods have been designed:
– Tapping panel (used for the tempo), where the user can input the tempo by tapping it (see 5.3.5). The tapping can be performed using either the mouse or the keyboard.
– Rms and brightness panel, where both RMS and Brightness features can be edited
at the same time. The panel allows the user to specify a point on a polar coordinate
system in which the angle defines the value of the brightness and the radius the value
of the rms.
– Mood panel, where the mood can be chosen among the four classes (Exuberant,
Anxious, Depressed and Content).
• Feature weights tab: in this panel the user can specify the weights of the features by using
a set of sliders
• Settings tab: this tab contains the general settings that can be edited by the user. They
are:
– Use Timescaling?, when enabled, the system will try to change the tempo of the
songs in order to improve the transitions.
– Use Tabulist?, when enabled, the system will avoid playing the same song twice.
– One proposal per audio item?, when enabled, each song will appear at most once in
the list of proposals.
– Enable fullscreen?
– Run time mode, the following settings influence the way in which the system generates
the proposals:
∗ Normal, the system will continue playing the song unless the user forces a change.
∗ Skip, the system is forced to change song at each iteration.
∗ Continue, the system is forced to continue playing the same song.
Tangible interface
The tangible interface allows the user to interact with the system by placing objects on a table.
The architecture of the system derives from the ReacTable structure (Figure 5.31a).
The ReacTable's main user interface consists of a translucent table. Underneath the table there is a video camera, aimed at the underside of the table and feeding video to a personal computer. There is also a video projector under the table, connected to the computer, which projects video onto the underside of the table top so that it can be seen from the upper side as well.
A set of objects is placed on the table and a video device captures the scene (from above or below). In order to detect the position of the objects, a set of symbols called "fiducial markers" is attached to them (Figure 5.31b).
The image from the camera is processed by a fiducial-detecting program that identifies the fiducials and calculates their position and angle. The program used in this application is ReacTIVision (http://reactivision.sourceforge.net/), which was created by the ReacTable team and is distributed under the GPL licence.

Figure 5.30: System parameters tabbed panel. (a) Feature values tab; (b) Weights tab; (c) Settings tab

Figure 5.31: ReacTable system. (a) The ReacTable framework; (b) ReacTIVision fiducials

ReacTIVision is a standalone application, which sends TUIO messages
via UDP port 3333 to any TUIO-enabled client application. The TUIO protocol was initially designed within this project for encoding the state of tangible objects and multi-touch events from an interactive table surface. It is an open framework that defines a common protocol and API for tangible multi-touch surfaces.
Based on this structure, we created a tangible interface in which the user can set the values of the features by placing objects on the table.
In our application, we divide the fiducials into three categories:
• Feature weights: this set of fiducials controls the weights assigned to the features. When the fiducial is not visible, the value of the weight is set to zero. If visible, the value is determined by the vertical position (as in a music mixer) and ranges between 0.0 and 1.0. To reduce the space needed for the placement of the fiducials, the maximum value is reached in the middle of the table whereas the minimum value is reached near the edges (Figure 5.32).

Figure 5.32: Feature weights tangibles

• Feature values: this set of fiducials controls the value assigned to the features. When the fiducial is visible on the table, a value is assigned to the corresponding feature; it is determined by the horizontal position and ranges between the feature's minimum and maximum value (Figure 5.33). If the fiducial is not visible, the value of the feature is set to "null".

Figure 5.33: Feature values tangibles

• Special fiducials: some fiducials have a special meaning; when they are visible in the scene, they modify the behaviour of the system. In particular:
– Continue: forces the system to continue with the same musical piece when possible (i.e. it has not reached the end of the file)
– Skip: forces the system to change audio item
In addition to this, the tangible interface is equipped with finger tracking and is able to detect the position of the user's fingers touching the surface of the table. This information is useful when combined with the projection of the screen interface onto the table; in this way the user can simulate a mouse by touching the table:
• When the user places his or her finger on the table, the system sends a "mouse button press" event to the host operating system, simulating a mouse click.
• When the user moves the finger while keeping it in contact with the table, the system simulates a mouse drag operation.
• When the user removes his or her finger from the table, the system sends a "mouse button release" event to the host operating system, simulating a mouse release.
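The thesis does not detail how these events are injected into the host operating system; the following is a minimal sketch of one possible mapping using the standard java.awt.Robot class (the wrapper class and method names are illustrative):

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;

class FingerMouseBridge {
    private final Robot robot;

    FingerMouseBridge() throws AWTException {
        robot = new Robot();
    }

    void fingerDown(int x, int y) {                          // finger placed on the table
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);      // "mouse button press" event
    }

    void fingerMoved(int x, int y) {                         // finger dragged along the surface
        robot.mouseMove(x, y);                               // simulates a mouse drag
    }

    void fingerUp() {                                        // finger lifted from the table
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);    // "mouse button release" event
    }
}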
Tapping interface
The tapping interface is used to specify the value of the tempo feature. The user sends the system a sequence of impulses and the system detects the tempo based on the impulse period. Since this value can change in time, the algorithm should adapt to sudden or gradual changes.
In our application, the user sends the train of impulses either by clicking with the mouse on the tapping panel of the screen interface or by pressing the space bar on the keyboard. By right-clicking on the tapping panel or by pressing the ESC key, the user can clear the detected tempo.
This input system combines precision and simplicity of use, and it has been successfully adopted in commercial products such as Sibelius 6, a musical score editing program, in which the user can "direct" the performance of the program by specifying the tempo using the computer keyboard or a predefined key of a digital piano.
The algorithm to detect the tempo uses the average tapping delay, computed at each impulse; it represents the average distance in time between two consecutive impulses. A set of parameters influences the behaviour of the algorithm:
• d_min, the minimum delay between two impulses
• d_max, the maximum delay between two impulses
• α, the averaging factor; it expresses how much the most recent tapping interval influences the average (usually 0.33)
The local variables used by the algorithm are:
• t_previous, the time of the previous impulse
• N_tap, the number of valid impulses received so far. A valid impulse occurs with a delay between d_min and d_max from the previous one. When an invalid impulse is received, it is discarded; otherwise N_tap is incremented.
Upon tapping at time t_i:
• if N_tap == 0 (it is the first tap), increase N_tap
• else if N_tap == 1
– if d_min ≤ (t_i − t_previous) ≤ d_max (the impulse is valid), we can compute the first estimation

d_avg = t_i − t_previous    (5.36)

and increase N_tap
– else (the impulse is not valid), ignore it
• else
– if d_min ≤ (t_i − t_previous) ≤ d_max (the impulse is valid), we update the estimation

d_avg = α · (t_i − t_previous) + (1 − α) · d_avg    (5.37)

and increase N_tap
– else (the impulse is not valid), ignore it
• In all cases, record the current timestamp

t_previous = t_i    (5.38)
The value of the average tapping delay is exported to the other components only after a predefined number of valid impulses (in our case, when N_tap is greater than 3). In this way we avoid transient oscillations at the beginning of the tapping. A sketch of this estimator is given below.
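A minimal Java sketch of the estimator described above (class and variable names are illustrative, not the actual code; the exported tempo is simply 60 divided by the average delay in seconds):

class TapTempoEstimator {
    private final double dMin, dMax, alpha;   // valid-delay bounds (seconds) and averaging factor
    private double tPrevious = Double.NaN;    // time of the previous impulse
    private double dAvg = 0.0;                // running average tapping delay
    private int nTap = 0;                     // number of valid impulses received so far

    TapTempoEstimator(double dMin, double dMax, double alpha) {
        this.dMin = dMin; this.dMax = dMax; this.alpha = alpha;
    }

    // Call on every impulse (time in seconds); returns the BPM estimate, or -1 while fewer than 4 valid taps.
    double tap(double ti) {
        if (nTap == 0) {
            nTap++;                                                   // first tap: nothing to measure yet
        } else {
            double d = ti - tPrevious;
            if (d >= dMin && d <= dMax) {                             // valid impulse
                dAvg = (nTap == 1) ? d                                // first estimation, Eq. (5.36)
                                   : alpha * d + (1 - alpha) * dAvg;  // update, Eq. (5.37)
                nTap++;
            }                                                         // invalid impulses are simply ignored
        }
        tPrevious = ti;                                               // Eq. (5.38)
        return (nTap > 3) ? 60.0 / dAvg : -1;                         // export only after enough valid taps
    }
}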
Wii remote interface
The Wii Remote is a remote controller used in the Nintendo Wii console. It is equipped with an infra-red camera used to track infra-red point light sources (not used in the system), a set of accelerometers used to detect the force the remote is subject to, and some buttons. The interesting aspect of this device is that it can be connected to the computer through Bluetooth; a Java library called WiiRemoteJ, based on the JSR-82 Java Bluetooth specification, processes the data and provides an easy-to-use Application Programming Interface.

Figure 5.34: The Wii Remote

In the current work, the Wii Remote is used as a remote controller with the additional capability of detecting the tempo by analysing the frequency at which the user shakes the controller. When the system detects a Wii Remote, it starts listening for button and accelerometer events. In order to allow the user to control when the accelerometer information is used, two interaction modes have been defined:
• Default mode: this mode is activated when the "B" button is not pressed; the interaction with the system consists in selecting one of the proposals made by the system.
– UP, DOWN buttons: control the selection of the song from the proposal list
– A button: select the proposal
– ONE button: set the run time mode to "default"
– TWO button: set the run time mode to "skip"
• Feature mode: this mode is activated when the "B" button is pressed; the user interaction consists in setting the value of the features.
If the user shakes the Wii Remote, the system tries to estimate the tempo feature by performing a zero-crossing detection in one direction and using the same algorithm as the tapping interface (a sketch of this idea follows below).
– UP, DOWN buttons: control the rms feature
– LEFT, RIGHT buttons: control the brightness feature
– HOME button: clear the feature values
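The thesis does not show this detection step; the following is a minimal sketch of how the accelerometer zero-crossings could feed the tap-tempo estimator sketched in the tapping-interface section (the class name, the axis choice and the estimator parameters are illustrative assumptions):

class ShakeTempoDetector {
    private final TapTempoEstimator estimator = new TapTempoEstimator(0.2, 2.0, 0.33);
    private double previousAccel = 0.0;

    // Call for every accelerometer sample on one axis; returns the current BPM estimate, or -1.
    double onSample(double accel, double timeSeconds) {
        double bpm = -1;
        if (previousAccel <= 0 && accel > 0) {     // zero-crossing in the positive direction
            bpm = estimator.tap(timeSeconds);      // treat the crossing as a tap
        }
        previousAccel = accel;
        return bpm;
    }
}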
Chapter 6
Evaluation
In this chapter we will perform an evaluation of the recommendation framework. The evaluation has been performed through a questionnaire proposed to a number of subjects.
Since the quality of the program is quite subjective, we chose not to assess it automatically; therefore, the evaluation is qualitative rather than quantitative and is obtained through a questionnaire.
This chapter begins with a description of the instance of the system used for the evaluation. Afterwards, the audio dataset used to evaluate the system is described. Finally, we describe the structure of the questionnaire, followed by the raw results and our final considerations.
6.1 Instance of the system

The recommendation framework described in the previous chapters is very general and can give birth to many different applications. To perform the evaluation, we had to create an instance of the framework that could be accessible and easily understood by a wide range of people.
Therefore we made the following decisions:
• System functionalities: the system is composed of the following components:
– Proposal generator, responsible for generating the list of fitting audio items according to the preferences expressed by the user
– Transition generator, responsible for performing the transition when the system changes musical piece. This component is able to time-scale the two audio items in order to make the transition between the two BPMs as smooth as possible.
The proposal ranking system has not been activated during the evaluation, since it needs time to be trained; the duration of the test would therefore have been longer, discouraging people from undertaking it. It is however possible, in case the tester is interested in experimenting with this feature, to retake the test including the proposal ranking system.
• Interface: the interface of the system consists of the screen interface. The tangible and the Wii Remote interfaces are not considered, since users may reach different levels of confidence with them and the evaluation may be biased.
6.2 Datasets
The set of audio items used by the system can be divided into two categories: the one used for the mood-detection training and the one used for the evaluation.
6.2.1 Mood training dataset

In 5.1.1 we described the use of three SVMs to classify the audio into four mood classes (Contentment, Exuberance, Anxiety and Depression). In this section we explain how the dataset on which the SVMs have been trained has been created.
For each class, a set of 5-second-long audio excerpts has been selected. The selection has been performed keeping in mind the characteristics of each class:
• Contentment: quiet music with a positive emotional content
• Exuberance: loud music with a positive emotional content
• Anxiety: loud music with a negative emotional content
• Depression: quiet music with a negative emotional content
Appendix B presents the list of files that have been used to train the mood extraction system.
6.2.2 Evaluation dataset

The audio database used during the evaluation phase is composed of 1347 items, mainly belonging to the following genres:
• Progressive rock (Alan Parsons Project, Pink Floyd, ...)
• Club/disco music (David Guetta, Gigi D’Agostino, ...)
• Metal (Sonata Arctica, Yngwie Malmsteen, ...)
• Classical (Bach, Concerti brandeburghesi)
These very different genres have been chosen to approximate the tastes of a wide audience, who can thus use the system with music they are familiar with.
6.3 Test structure

The test aims to assess the quality of the proposal and transition generation systems. The experiment platform is a Windows 7 Professional Edition computer with an Intel Core T9400 2.53 GHz processor and 4 GB of memory.
The test has been proposed to a total of 80 people of different ages, origins and study backgrounds. Figure 6.1 shows the composition of the tester set. It is biased toward the classes of prospective users to whom the program is addressed: young people with a scientific background (most of the testers come from Europe, since the questionnaire was administered in Milano).
Figure 6.1: The characteristics of the tester set
A standard test format has been designed to allow an unbiased and appropriate evaluation of the system; it is composed of the following phases:
• Explanation (around 5 minutes): the functionalities of the system and the user interface are explained to the tester. In particular, the user should know:
– the meaning of the controls of the value panel (tempo, rms/brightness, mood) and of the settings panel (in particular the run time mode: Normal, Skip and Continue)
– the meaning of the "Now playing" panel
– how to use the proposal list panel to express preferences. In particular, the user should be aware of the fact that the system automatically selects the first proposal if no change has been made
• Testing (10 minutes): in this phase the user tries to use the system by himself/herself.
• Questionnaire: the user fills in a questionnaire.
A shorter format can be applied to people who only listen to the music generated by the system. In this case they fill in a subset of the questions.
The questionnaire proposed to the user is divided into the following sections:
• Personal information: this section is aimed at understanding the characteristics of the users and at classifying them
• Auditory questions: this section is focused on the audio output of the system and the first impression the user gets from the software
• Usage questions: this section can be filled in only by people who interacted with the software and is focused on the usability of the system
• Comments
6.4 Results of the questionnaire

In this section we will show the results of the questionnaire. A brief comment has been added above each graph. Figure 6.2 shows the paper version of the questionnaire and Figure 6.3 the electronic one.
Figure 6.2: The paper version of the questionnaire. It contains the following items:

Personal information: age (0-30, 30-50, over 50); origin; education (scientific, humanistic); how many hours a day do you listen to music? (0-1, 1-3, 3+); where do you listen to music? (at home, club/disco); do you usually listen to music while... (studying/working, driving, doing sports); do you listen to the radio? (yes, no); how many years did you study music? (0, 1-3, 3+); which musical instruments do you play?; have you already used a Musical Production Software? (no, yes).

Auditory questions: did the system play pleasant music?; how do you evaluate the transitions between songs?; how well can the system be applied in the following fields? (disco/pub performance, personal use (iPod, ...), artistic exhibitions, music analysis, other).

Usage questions (fill this part only if you used the system): how do you rate the following? (screen interface: is the program intuitive?, appealing?, ...; speed: is it fast enough?; complexity: is it too difficult to use/understand?; parameter number: are there too many/few parameters to control?); how well does the system suit your artistic needs?; does the system respond to the changes in the parameters?; how do you rate the music proposals made by the system?; do you think the system could enhance your artistic performance?; is the system intuitive (how much time do you need to learn how to use it)?

Comments: if you have any suggestion or comment, write it here.
Figure 6.3: The electronic version of the questionnaire

The users have been segmented into the following classes, according to their level of knowledge of music:
• Unclassified: not belonging to any of the two following classes
• Listener: this user listens to music at least one hour a day and is able to recognize musical tracks and evaluate the transitions between songs. The main focus of this user will be the output audio generated by the system.
Requirements: listens to music more than one hour a day
• Producer: this user knows how Music Production Software works and has already used one. This user is able to compare the system with professional software.
Requirements: has used a Music Production Software
6.4.1 Auditory questions

The first question (Did the system play pleasant music?) analyses the overall perception of the music generated by the system. The result, displayed in Figure 6.4, shows a good rating, uniform across the three classes.
Figure 6.4: Did the system play pleasant music?
The evaluation of the transition between songs (Figure 6.5) is quite high in all classes.
Figure 6.5: How do you evaluate the transitions between songs?
Concerning the applications of the program (How well can the system be applied in the following fields?) (Figure 6.6), we notice that producers are more reluctant about disco and pub applications, since this is the field in which they excel. However, disco/pub performance and personal usage are in general the preferred ones.
Figure 6.6: How well can the system be applied in the following fields? (a) Disco/pub performance; (b) Personal use (iPod, ...); (c) Artistic exhibitions; (d) Music analysis
6.4.2 Usage questions
The next set of questions (How do you rate the following...?) evaluates the impact of the software on the user and the suitability of the graphical interface (Figure 6.7). The user interface is well rated, whereas the speed and reactivity of the program do not satisfy the producers, who need a fast real-time system. The evaluation of the number of parameters shows a peak near "Normal", since the parameters that can be controlled mainly depend on the application and the market segment the software is directed to.
Figure 6.7: How do you rate the following...? (a) Screen interface (is the program intuitive?, appealing?, ...); (b) Speed (is it fast enough?); (c) Complexity (is it too difficult to use/understand?); (d) Parameter number (are there too many/few parameters to control?)
How well does the system suit your artistic needs?: the suitability to the artistic needs (Figure 6.8) is rated higher by the non-professional classes.
Figure 6.8: How well does the system suit your artistic needs?
Do you think the system could enhance your artistic performance?: the artistic performance improvement (Figure 6.9) is evident in the non-professional classes.
Figure 6.9: Do you think the system could enhance your artistic performance?
Does the system respond to the changes in the parameters?: the changes in the parameters (Figure 6.10) are well perceived by the users.
Figure 6.10: Does the system respond to the changes in the parameters?
How do you rate the music proposals made by the system?: the proposals made by the system (Figure 6.11) were rated quite positively.
Figure 6.11: How do you rate the music proposals made by the system?
Is the system intuitive (how much time do you need to learn how to use it)?: the system is considered very intuitive after the 5-minute explanation (Figure 6.12).
Figure 6.12: Is the system intuitive (how much time do you need to learn how to use it)?
6.4.3 Comments

The last part of the questionnaire can be considered the most interesting one, since the users gave suggestions and advice that can help to improve the software. The critical points of the system and the improvements that could be made are described in Chapter 7.
6.5 Overview of the results

From the questionnaire, we are able to infer that the overall impression of the software is quite positive in all classes, although the producers are sceptical about some features, in particular the transition generation.
The reactiveness of the software has been evaluated positively. Usually, when people use real-time software, they expect that when they change some parameters, the system promptly responds by modifying some part of the screen interface. The actual extent of the changes is not important (the system may show only a part of the computation, continuing to elaborate in the background); the crucial point is giving the user the impression that his or her input is not ignored by the system. Thanks to the efficient proposal generation algorithms and the moderate size of the database, the system performed well from this point of view.
The results of the questions about the artistic suitability of the system depend highly on the tester classes. The system is not seen as essential by the producers, who are more inclined toward traditional DJ systems. The listeners, however, evaluate the system as a useful tool, even more so when integrated in other platforms. In addition, non-professional users think that the system can really improve their artistic performance and capabilities. In our opinion, the system should target this market segment, in which the enhancement is more evident.
Concerning the applications of the system, professional users are sceptical about the disco/pub applications, since the performance and the transition methods are considered too poor. On the one hand, this shows the need to improve those features by adding more transition types; on the other hand, the evaluation of the producers is probably biased by a traditional view of the task of a DJ and by a worry of being somehow replaced by the system. As we previously said, the system still needs the presence of the human artist to give its best and should be seen as a new tool, not a competitor.
The tester set is not enthusiastic about the idea of personal usage of the system, but still considers it promising. We may motivate this answer in the following way: the system, as it is, gives access to too many parameters and does not automate any control; a personal media player application should be more autonomous and avoid interaction with the user as much as possible, since he or she is usually just interested in listening to music.
A very important result is the good evaluation of the ease of use of the system; even non-professional users consider the software very intuitive and user-friendly. This point is crucial for the future evolutions of the system that involve a wide public, since it should be accessible to and usable by any kind of person. This result also reflects on the user interface, indicating that its design has achieved its purpose.
Chapter 7
Perspectives and future developments

In this chapter we will list some improvements and extensions that can be applied to the system. These conclusions have been drawn from the results of the evaluation phase.
7.1 System-level improvements
Video application  In the current work we explored the possibilities of audio feature extraction. A very similar research field in which analogous methods can be applied is video feature extraction. The relationship between these two areas is clear: they both consist in calculating the value of some parameters from time-dependent data. The system can easily be applied to this new field as well.
Audio and video composition share many characteristics, since they both aim at creating something novel by combining existing material. The system can become very interesting when the two approaches are exploited simultaneously, by composing an audio+video stream and defining features that relate the two worlds (e.g. a user may want to generate "calm" music played along with a "calm" video, or may want to find background music that is somehow synchronised with the video scene).
User interface study  The issue of the user interface is far from being solved. A careful design of the visual and tangible interfaces has to be performed. The quality of the system in the eyes of the users will be evaluated on the basis of the interface, which should be as intuitive as possible and provide the fast interaction mechanisms needed by a real-time performance system.
Interaction methods such as drag-and-drop or mouse motion detection could be interesting. A good example of graphical interface is shown in Figure 7.1.
In addition to this, the interface should give feedback to the user about the evolution of the features (e.g. showing the values of the features of the output audio) and show the spectrogram or signal envelope in the proposal list and in the time-line.
Versioning  Until now, the system has been developed without considering the possible categories of users. In the near future, development should consider different applications of the system and perform a careful selection of the functionalities of the software according to the needs of the different users.

Figure 7.1: FL Studio interface
The different editions of the software could be:
• STUDIO version: with the complete set of functionalities, used by professional users
• DJ version: with an appealing user interface and optimised to be played in real time
through the tangible interface
• LITE version: used by non-professional users, consisting of an audio player that allows the user to specify a few relevant features
Next song choice  The user should have the possibility to choose the next song from the database, bypassing the system proposal. When this happens, the system would try to find the best fitting point in the song. Moreover, a fast retrieval mechanism should be developed. Usually this consists in allowing the user to start typing part of the song name and incrementally filtering the database, showing only the songs whose names contain the typed characters. A minimal sketch of such a filter is given below.
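A minimal sketch of the incremental filter just described (hypothetical names, not part of the current system):

import java.util.List;
import java.util.stream.Collectors;

class SongSearch {
    // Returns the songs whose names contain the characters typed so far (case-insensitive).
    static List<String> matches(List<String> songNames, String typed) {
        String query = typed.toLowerCase();
        return songNames.stream()
                        .filter(name -> name.toLowerCase().contains(query))
                        .collect(Collectors.toList());
    }
}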
Time-line editing  The user should be able to edit the songs already inserted in the time-line. In particular:
• Delete a song from the time-line if the playback has not started yet
• Set the length of the transition between two audio items and the length of the playback
of an audio item
• Create loops; by doing so, the user specifies that the system should loop inside an audio
segment
Integration between the preprocessing and performance phase  The preprocessing phase should be performed by the same software that plays the audio items, allowing the user to incrementally enlarge the database by adding audio items in real time. Since the analysis is computationally very expensive, it could be done while the system is paused, still allowing the feature values to be stored in an XML file to speed up the performance.
Multiple proposals per audio item  If there are multiple proposals for the same audio item (many sections of the song are compatible with the currently played song), they should be displayed in the same cell of the proposal list.
Multiple transition types  For professional applications it would be nice to allow the user to select multiple types of transitions, in order to adapt to different music styles and songs. The user could also define a custom transition type by defining the evolution of some parameters (spectrogram, volume, ...) during it.
7.2 Implementation improvements
Feature extraction  The more relevant features are extracted, the more interesting the system becomes. During the evaluation phase, a few users expressed the need to control more features or to improve the range of values of the features (in particular, the "mood" feature should consider more nuances of emotions). A genre classification system could also be interesting.
In addition, this aspect leads to the next paragraph, since the features should not be hard-coded in the software: the user should have the freedom to expand the feature set.
Modularity The system should be customisable and expandable, allowing the user to add
new components and functionality at runtime. Due to the current early stage of the project,
this aspect has not been considered yet.
The modularity can be applied to the following areas:
• features: the user should be able to create his or her own feature extraction function by implementing an exported interface (a sketch of what such an interface could look like is given after this list). In order to define a feature, the user has to specify:
– Feature domain: the set of values that the feature can assume
– Feature extraction function: a function that extracts the evolution of the feature from an audio signal
– Feature similarity function: used to compare two values of the feature
• input devices: the user should be able to control the system in a personalised way (e.g.
using a MIDI keyboard, a virtual instrument or a mixer). The input device can control
the evolution of the value of one or more features, change the feature weights or perform
custom actions such as stopping the system or forcing a predefined behaviour.
• output devices: the result of the system should be accessible by other programs such as
real-time effect generators, audio editing or performance software. We may identify the
following exportable items:
– audio data: the audio generated by the program
– feature evolution parameters: the value of the features of the played audio can be
exported in order to be used by visualisation tools or other systems.
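A minimal sketch of what such an exported feature interface could look like (hypothetical names and signatures, not the system's actual API):

import java.util.List;

public interface Feature<V> {
    // A description of the set of values the feature can assume (e.g. a numeric range or an enumeration).
    String domainDescription();

    // Extracts the evolution of the feature from an audio signal sampled at sampleRate Hz.
    List<V> extract(double[] audioSignal, double sampleRate);

    // Compares two values of the feature; returns a similarity score in [0, 1].
    double similarity(V a, V b);
}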
Silence removal  The system should remove the silence from the beginning and the end of each song, as it should not be considered part of the song.
Manual anchor selection The user should be able to manually select the anchors in a song.
Saving the result The system should allow the user to save the sequences of audio items
that are played. The saved data should still allow the user to edit the sequence and adjust some
settings after the playback. In addition, the system should be able to export the result in an
audio file.
Preview The system should allow the user to preview the transition and listen to the audio
items before they are actually played in the time-line.
Pause  The system should allow the performance to be paused.

Formats  The system should be able to handle more audio formats (.mp3, .aiff, .ape, ...).

7.3 From musical compositing to composition
In this final section we would like to explore a new conception of music, based on the recommendation framework developed in this thesis. At this point, an application of the framework can be
a dynamic music compositing software that can automatically create transitions and adapt the
emotional aspects of music to a particular situation. We would like to think about the evolution
of the system toward a musical composition software that embodies a new idea of music: the
musical graph.
When we listen to music or play it, we usually consider the score as a line in time that somehow
evolves. On the one hand, time is inseparable from music, since musical notes, melodies or
harmonies need a time component to be meaningful. On the other hand, in our mind, the exact
evolution of music in time is not important: what we remember about a musical track is the
emotion it gave us or some relevant sections or melodies.
This is particularly clear in film music. The viewers are usually more likely to remember the
main theme of a soundtrack (e.g. the theme of ”Indiana Jones” by John Williams) rather than
the variations and arrangements made during the scenes of the movie.
A new idea of music is born; a musical piece in the mind of the composer and the listener is a
multidimensional space that is projected on time in the moment it is played. Musical themes,
melodies, harmonies and rhythms are connected in a tight web that the composer can explore
during the creation act.
In the field of video games, the developers of the LucasArts adventure games designed iMUSE (1990), the first system that dynamically arranges music on the basis of the game scene. The composers arrange musical themes in a sort of musical graph where each node is a musical pattern (a main theme) and the arcs are the transitions between those themes. On the basis of the inputs given by the player, the system merges music segments and plays a coherent music stream that follows the gameplay. The weakest point of this system is the need for expert composers and users to manually design transitions and themes.
We would like to go beyond it and design a system that, based on a music database, can automatically create transitions and adapt the emotional aspects of music to a particular situation.
It is here that Music Information Retrieval (MIR) comes to aid: modern MIR techniques allow
the computer to automatically extract relevant audio features. This information can then be
used to create smooth transitions between segments.
To conclude, we would like to spend a few words on the concept of human creativity. In our opinion, the creation process does not refer only to the concept of creatio ex novo but also to the idea of finding new links and combinations among things that already exist ([46]). The very term "composition", referring to musical creativity, suggests this interpretation. It is sometimes surprising how new ideas derive just from the juxtaposition of existing material.
Bibliography
[1] Torsten Anders. Composing Music by Composing Rules: Computer aided composition employing Constraint Logic Programming. PhD thesis, Queen's University Belfast, 2003.
[2] Fabio Antonacci, Antonio Canclini, and Augusto Sarti. Advanced Topics on Audio Processing. 2009.
[3] Eiichiro Aoki, 1982.
[4] Jean-Julien Aucouturier and Francois Pachet. Scaling up music playlist generation. 2001.
[5] Luke Barrington, Reid Oda, and Gert Lanckriet. Smarter than genius? human evaluation
of music recommender systems. 2009.
[6] P. Bellini, P. Nesi, and M. B. Spinu. Cooperative visual manipulation of music notation.
2002.
[7] Klaas Bosteels and Etienne E. Kerre. A fuzzy framework for defining dynamic playlist generation heuristics. Fuzzy Sets and Systems, pages 3342-3358, 2009.
[8] Jean Bresson, Carlos Agon, and Gerard Assayag. OpenMusic 5: A cross-platform release of the computer-assisted composition environment. 2006.
[9] William A. S. Buxton. A composer’s introduction to computer music. 1975.
[10] Rui Cai, Chao Zhang, Lei Zhang, and Wei-Ying Ma. Scalable music recommendation by
search. 2007.
[11] Chris Chafe. Case studies of physical models in music composition. 2003.
[12] Sarit Chantasuban and Sarupa Thiemjarus. Ubiband: A framework for music composition
with bsns. IEEE Xplore, pages 267–272, 2009.
[13] Yap Siong Chua. Composition based on pentatonic scales: a computer-based approach.
1991.
[14] DLKW. Codeorgan. URL http://www.codeorgan.com/.
[15] Todor Fay, 1995.
[16] Yazhong Feng, Yueting Zhuang, and Yunhe Pan. Music information retrieval by detecting
mood via computational media aesthetics. 2003.
[17] Derry Fitzgerald. Automatic Drum Transcription and Source Separation. 2004.
[18] Olivier Gillet and Gael Richard. Extraction and remixing of drum tracks from polyphonic
music signals. pages 315–318, 2005.
[19] E. Gómez. Tonal description of music audio signals. PhD thesis, Universitat Pompeu Fabra, 2006.
[20] Sten Govaerts, Nik Corthaut, and Erik Duval. Mood-ex-machina: towards automation of
moody tunes. 2007.
[21] Evolutionary System Group. Social network di musica generativa. URL http://musigen.unical.it/.
[22] Martin Henz, Stefan Lauer, and Detlev Zimmermann. Compoze - intention-based music composition through constraint programming. 2009.
[23] Sergi Jordà and Otto Wuest. A system for collaborative music composition over the web.
2001.
[24] Sergi Jordà, Martin Kaltenbrunner, Günter Geiger, and Ross Bencina. The reactable. 2004.
[25] Ajay Kapur, Manj Benning, and George Tzanetakis. Query-by-beat-boxing: Music retrieval
for the dj. 2004.
[26] Krumhansl. Cognitive foundations of musical pitch. Oxford UP, 1990.
[27] Michael Z. Land and Peter N. McConnell, 1991.
[28] Cyril Laurier and Perfecto Herrera. Mood cloud: A real-time music mood visualization
tool. 2008.
[29] Tao Li and Mitsunori Ogihara. Toward intelligent music information retrieval. 2006.
[30] Xuelong Li, Dacheng Tao, Stephen J. Maybank, and Yuan Yuan. Visual music and musical
vision, 2008. URL http://www.elsevier.com/locate/neucom.
[31] Dan Liu, Lie Lu, and Hong-Jiang Zhang. Automatic mood detection from acoustic music
data. 2003.
[32] Shazam Entertainment Ltd. Shazam, 2010. URL http://www.shazam.com/.
[33] Niitsuma Masahiro, Hiroshi Takaesu, Hazuki Demachi, Masaki Oono, and Hiroaki Saito.
Development of an automatic music selection system based on runner’s step frequency.
2008.
[34] Owen Craigie Meyers. A mood-based Music Classification and Exploration System. PhD thesis, Massachusetts Institute of Technology, 2007.
[35] Alexandros Nanopoulos, Dimitrios Rafailidis, Maria M. Rixanda, and Yannis Manolopoulos.
Music search engines: Specification and challenges. 2009.
[36] Elias Pampalk, Arthur Flexer, and Gerhard Widmer. Improvements of audio-based music
similarity and genre classification. 2005.
[37] Bryan Pardo. Finding structure in audio for music information retrieval. IEEE Signal
Processing Magazine, 2006.
[38] Steffen Pauws, Win Verhaegh, and Mark Vossen. Music playlist generation by adapted
simulated annealing. Information Sciences, pages 647–662, 2007.
[39] Mauro C. Pichiliani and Celso M. Hirata. A tabletop groupware system for computer-based
music composition. 2009.
[40] Pietro Polotti and Davide Rocchesso. Sound to Sense - Sense to Sound: A state of the art
in Sound and Music Computing. 2008.
[41] Giorgio Prandi, Augusto Sarti, and Stefano Tubaro. Music genre visualization and classification exploiting a small set of high-level semantic features. 2009.
[42] Gordon Reynolds, Dan Barry, Ted Burke, and Coyle Eugene. Towards a personal automatic
music playlist generation algorithm: The need of contextual information. 2007.
[43] Alexander P. Rigopulos and Eran B. Egozy, 1995.
[44] R. Roth. Music and animation tool kit: Modules for computer multimedia composition.
Computers Mathematical Applications, pages 137–144, 2009.
[45] Man-Kwan Shan, Fang-Fei Kuo, and Suh-Yin Lee. Emotion-based music recommendation
by affinity discovery from film music. Expert Systems with Applications, 2009.
[46] Peyman Sheikholharam and Mohamad Teshnehlab. Music composition using combination
of genetic algorithms and kohonen grammar. 2008.
[47] Muneyuki Unehara and Takehisa Onisawa. Interactive music composition system. pages
5736–5741, 2004.
[48] Yi-Hsuan Yang, Lin Yu-Ching, and Homer H. Chen. Music emotion classification: a regression approach. 2007.
Appendix A
User manual
In this chapter we will describe how to use the components of the system. The first section describes how to install the needed software, whereas the second section explains the commands and instructions that have to be executed.
The work-flow is the following: starting from a set of audio files, a MATLAB script is used to extract the features and save them in an XML file. After that, the Java performance program can be executed.
A.1 Prerequisites
In order to execute the software, some programs need to be installed.
Figure A.1: MATLAB interface
MATLAB MATLAB stands for ”MATrix LABoratory” and is a
numerical computing environment. Developed by The MathWorks,
MATLAB allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++,
and Fortran.
MATLAB can be purchased at http://www.matlab.com.
MIRtoolbox  MIRtoolbox offers an integrated set of functions written in MATLAB, dedicated to the extraction from audio files of musical features such as tonality, rhythm, structures, etc. The objective is to offer an overview of computational approaches in the area of Music Information Retrieval. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms. These building blocks form the basic vocabulary of the toolbox, which can then be freely articulated in new original ways. These elementary mechanisms integrate all the different variants proposed by alternative approaches - including new strategies we have developed - that users can select and parametrize. This synthetic digest of feature extraction tools enables a capitalization of the originality offered by all the alternative strategies. In addition to the basic computational processes, the toolbox also includes higher-level musical feature extraction tools, whose alternative strategies, and their multiple combinations, can be selected by the user.

Figure A.2: MIRtoolbox logo
The choice of an object-oriented design allows a large flexibility with respect to the syntax: the tools are combined in order to form a set of methods that correspond to basic processes (spectrum, autocorrelation, frame decomposition, etc.) and musical features. These methods can accept a large variety of objects as input. For instance, the autocorrelation method will behave differently with an audio signal or an envelope, and can adapt to frame decompositions.
The toolbox is conceived in the context of the Brain Tuning project financed by the European
Union (FP6-NEST). One main objective is to investigate the relation between musical features
and music-induced emotion and the associated neural activity.
Java Runtime Environment (JRE)  A Java Virtual Machine (JVM) enables a set of computer software programs and data structures to use a virtual machine model for the execution of other computer programs and scripts. The model used by a JVM accepts a form of computer intermediate language commonly referred to as Java bytecode. This language conceptually represents the instruction set of a stack-oriented, capability architecture.
Programs intended to run on a JVM must be compiled into a standardized portable binary format, which typically comes in the form of .class files. A program may consist of many classes in different files. For easier distribution of large programs, multiple class files may be packaged together in a .jar file (short for Java archive).

Figure A.3: Java Runtime Environment logo

The JVM runtime executes .class or .jar files, emulating the JVM instruction set by interpreting it, or using a just-in-time compiler (JIT) such as Sun's HotSpot. JIT compiling, not interpreting, is used in most JVMs today to achieve greater speed. Ahead-of-time compilers that enable the developer to precompile class files into native code for particular platforms also exist.
Like most virtual machines, the Java Virtual Machine has a stack-based architecture akin to a microcontroller/microprocessor. However, the JVM also has low-level support for Java-like classes and methods, which amounts to a highly idiosyncratic memory model and capability-based architecture.
The JVM, which is the instance of the JRE (Java Runtime Environment), comes into action when a Java program is executed. When execution is complete, this instance is garbage-collected. JIT is the part of the JVM that is used to speed up the execution time. JIT compiles parts of the byte code that have similar functionality at the same time, and hence reduces the amount of time needed for compilation.
BlueCove Java Library  BlueCove is a freeware implementation of the Java JSR-082 Bluetooth specification. It provides a platform-independent layer that can be used by the Java classes.

Figure A.4: Bluetooth logo

A.2 Running the system

A.2.1 Feature extraction
The feature extraction is performed by a MATLAB script. In order to run the script, the audio files should be downsampled to 11025 Hz, converted to .wav and placed in a folder called "audio" in the script's current folder.
After that, it is necessary to run the script "batchMirAnalysis.m", which generates the feature XML files and stores them in the "xml" folder.
For each file, a set of XML documents is created:
• filename anchors.xml: this document contains the position of the anchors
• filename brightness.xml: this document contains the values of the brightness feature
• filename harmony.xml: this document contains the values of the harmony feature
• filename mfcc.xml: this document contains the values of the mfcc feature
• filename mood.xml: this document contains the values of the mood feature
• filename rms.xml: this document contains the values of the rms feature
• filename tempo.xml: this document contains the values of the tempo feature
A.2.2 Performance

The result of the analysis phase can be used to run the Java program and obtain a performance. To do so, the user has to copy the audio files (11025 Hz .wav files) and the XML documents into a folder inside the "audio" and "xml" subfolders respectively. The path of the folder can be set by editing the "polysound.properties" file (see later).
It is now possible to run the software by typing:
java Main
in the program's main directory.
A.2.3 Configuration

In order to allow the user to configure the default settings, a properties file has been created whose values are loaded by the program at startup. The file is located in the root folder of the program (the same folder as the "Main.class" file).
The configuration file is a set of key/value pairs and it is structured as follows:
#Comment
key value
key value
...
The keys are the following:
• AudioDatabase.audioPath (String): it stores the path of the audio folder (the one containing the audio files). There can be multiple audio folders; in this case, the paths are separated by ";"
• AudioDatabase.xmlPath (String): it stores the path of the XML folder (the one containing the XML files). There can be multiple XML folders; in this case, the paths are separated by ";" and they should appear in the same order as the corresponding audio folders
• ProposalGeneratorSettings.minFadeLength (long): the minimum length (in samples) of the segments used for transitions
• ProposalGeneratorSettings.maxFadeLength (long): the maximum length (in samples) of the segments used for transitions
• ProposalGeneratorSettings.proposalListMaxLength (int): the maximum number of proposals made by the system at each iteration (default: 20)
• ProposalGeneratorSettings.harmonyWeight (double): the initial harmony weight (default: 0.33)
• ProposalGeneratorSettings.tempoWeight (double): the initial tempo weight (default: 0.33)
• ProposalGeneratorSettings.brightnessWeight (double): the initial brightness weight (default: 0.33)
• ProposalGeneratorSettings.rmsWeight (double): the initial rms weight (default: 0.33)
• ProposalGeneratorSettings.moodWeight (double): the initial mood weight (default: 0.33)
• ProposalGeneratorSettings.useTabuList (boolean): specifies whether the system should use the tabu list (default: true)
• ProposalGeneratorSettings.oneProposalPerDatabaseAudioItem (boolean): specifies whether the system should display only one proposal per audio item during the proposal generation phase (default: true)
• RunTimeSettings.runTimeModeEnum (NORMAL | SKIP | CONTINUE): defines the initial run time mode (default: NORMAL)
• TransitionGeneratorSettings.useTimescale (boolean): defines whether the system should use the timescale (default: false)
• ProposalRecommender.pointListXMLFilename (String): the path of the file in which the player history will be saved. If the file exists, the system reads it at the beginning of the execution and then updates it
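As an illustration (not taken from the thesis: the folder paths are hypothetical and the other values are the defaults listed above), a minimal polysound.properties could look like this:

#Example configuration
AudioDatabase.audioPath audio/myCollection
AudioDatabase.xmlPath xml/myCollection
ProposalGeneratorSettings.proposalListMaxLength 20
ProposalGeneratorSettings.tempoWeight 0.33
ProposalGeneratorSettings.useTabuList true
RunTimeSettings.runTimeModeEnum NORMAL
TransitionGeneratorSettings.useTimescale false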
Appendix B
The audio database
In this chapter we attach the list of audio items used during the evaluation phase, both for the
mood training and the playback.
B.1 Mood detection training set
This section contains the list of files used during the training phase of the mood detection
algorithm. They are divided according to the mood class.
B.1.1 Anxious
Liberi Fatali
Maybe I’m A Lion
Only A Plank Between One and Perdition
Force Your Way
Other World
Start
Hurry
Attack
The Legendary Beast
B.1.2 Contentment

May it be
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - III Allegro
J.S. Bach - Concerto No.5 in D major BWV 1050 - III Allegro
J.S. Bach - Concerto No.3 in G major BWV 1048 - VI Allegro
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - I Allegro
J.S. Bach - Concerto No.3 in G major BWV 1048 - IV Allegro
B.1.3 Depression

J.S. Bach - Concerto No.1 in F major BWV 1046 - II Adagio
J.S. Bach - Concerto No.3 in G major BWV 1048 - V Adagio (from Trio Sonata in G major BWV 1048)
J.S. Bach - Concerto No.5 in D major BWV 1050 - II Affetuoso
J.S. Bach - Concerto No.2 in F major BWV 1047 - II Andante
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - II Adagio ma non tanto
Drifting
Total Eclipse
Tragedy
Path of Repentance
Ominous
B.1.4
Exuberance
Orinoco flow
Don‘t Hold Back
Little Hans
One More River
Pin floi
B.2
The performance database
This section contains the list of files used during the evaluation phase. The items belong to different genres, in order to test the system’s ability to move from one style to another while providing the user with music they may know.
J.S. Bach - Concerto No.1 in F major BWV
I Am A Mirror
Avalanche
1046 - I Allegro
I‘d Rather Be a Man
Damned If I Do
J.S. Bach - Concerto No.4 in G major BWV
Mammagamma 04
Funny You Should Say That
1049 - I Allegro
Some Other Time
Hawkeye [Instrumental]
A Dream Within A Dream
The Tell-Tale Heart
Hawkeye
Heaven Knows
Wine From The Water
Inside Looking Out
I Robot
J.S. Bach - Concerto No.1 in F major BWV
L’Arc En Ciel
La Sagrada Familia
1046 - IV Minuetto-Trio I-Polonaise-Trio II
The Fall Of The House Of Usher I
Let’s Talk About Me
J.S. Bach - Concerto No.5 in D major BWV
The Voice
Let‘s Talk About Me
1050 - I Allegro
Where‘s The Walrus (Instrumenta
Lucifer
Breakaway
You‘re Gonna Get Your Fingers Bur
Return To Tunguska
Breakdown
J.S. Bach - Concerto No.2 in F major BWV
Sirus (Instrumental)
Gemini
1047 - III Allegro assai
Stereotomy
I Don’t Wanna Go Home
J.S. Bach - Concerto No.6 in B flat major
The Nirvana Principle
Little Hans
BWV 1051 - I Allegro
The Three Of Me
Sooner Or Later (2)
A Recurring Dream Within A Dream
J.S. Bach - Concerto No.1 in F major BWV
Sooner Or Later
Don‘t Hold Back
1046 - II Adagio
Standing On Higher Ground
Nucleus
J.S. Bach - Concerto No.4 in G major BWV
The Cask Of Amontillado
Paseo De Gracia (Instrumental)
1049 - II Andante
Walking On Ice
Psychobabble
Beaujolais
We Play The Game
Somebody Out There (2)
Eye In The Sky
You Won‘t Be There
Somebody Out There
Freudiana
J.S. Bach - Concerto No.2 in F major BWV
The Fall Of The House Of Usher I
I Wouldn’t Want To Be Like You
1047 - I Allegro
Turn Your Heart Around
More Lost Without You
J.S. Bach - Concerto No.5 in D major BWV
You‘re On Your Own
Seperate Lives (2)
1050 - II Affetuoso
J.S. Bach - Concerto No.3 in G major BWV
Seperate Lives
Don’t Let It Show
1048 - IV Allegro
The Raven
Dora
J.S. Bach - Concerto No.6 in B flat major
Too Late
How Can You Walk Away
BWV 1051 - II Adagio ma non tanto
Tragedy
In The Real World
Chinese Whispers (Instrumental)
Turn It Up
Money Talks
Day After Day (The Show Must Go O
You Lie Down With Dogs
Silence And I
Far Away From Home
J.S. Bach - Concerto No.1 in F major BWV
The System Of - Doctor Tarr And P
Give It Up (US Release)
1046 - III Allegro
Tijuaniac
Hollywood Heart
J.S. Bach - Concerto No.4 in G major BWV
Vulture Culture (2)
Mammagamma (Instrumental)
1049 - III Presto
Vulture Culture
Secret Garden
Children Of The Moon
Winding Me Up
The Fall Of The House Of Usher I
Closer Too Heaven
J.S. Bach - Concerto No.2 in F major BWV
The Same Old Sun (2)
Days Are Numbers (The Traveller) (2)
1047 - II Andante
The Same Old Sun
Days Are Numbers (The Traveller)
J.S. Bach - Concerto No.5 in D major BWV
You Can Run
Fight To Win
1050 - III Allegro
J.S. Bach - Concerto No.3 in G major BWV
1048 - V Adagio (from Trio Sonata in G
Call Up
V. The Turn... (Part Two)
major BWV 1048)
Can’t Take It With You
Voyager
J.S. Bach - Concerto No.6 in B flat major
Cloudbreak
What Goes Up ...
BWV 1051 - III Allegro
Damned If I Do
What Goes Up
Ask No Question
Doctor Tarr And Professor Feth
Of Silence
Chomolungma
Dr. Evil Edit
Abandoned Pleased Brainwashed Exploited
If I Could Change Your Mind
Dreamscape
Arnold Layne
Let Yourself Go
Eye In The Sky
Astronomy Domine
No Answers Only Questions (Final
Fall Free
Black Sheep
Step By Step
Far Ago And Long Away
Blank File
Stereotomy Two
Games People Play (2)
Braveheart
The Fall Of The House Of Usher I
Games People Play
Broken (Edit Version)
Total Eclipse
H.G. Force Part One
broken glass [fire and ice bonus]
J.S. Bach - Concerto No.3 in G major BWV
Hyper-Gamma-Spaces
Cirrus Minor
1048 - VI Allegro
I Can’t Look Down
Don’t Say A Word (Edit)
Beyond The Pleasure Principle
I Robot Suite
Full Moon (Edit)
Night Full Of Voices
I’m Talkin’ to You
Go With The Flow
Old And Wise
I. The Turn... (Part One)
Hey You
Separate Lives (Alternative Mix)
Ignorance Is Bliss
In the Flesh
The Fall Of The House Of Usher V
II. Snake Eyes
Intro
Hawkeye (Demo)
III. The Ace Of Swords
Last Drop Falls
The Ring
In The Lap Of The Gods
Let There Be More Light
To One In Paradise
IV. Nothing Left To Lose
Master of Ceremonies
Turn Your Heart Around (single ve
Jigue
Misplaced
Sects Therapy
Light Of The World (2)
Money
The Naked Vulture
Light Of The World
Obscured By Clouds
Oh Life (There Must Be More)
Limelight
One of these days
No Answers Only Questions (The Fi
Lucifer & Mammagamma
Panic
No One Can Love You Better Than M
May Be A Price To Pay
Pigs On The Wing (Part 1)
Don‘t Let The Moment Pass
Mr Time
Runaway
Upper Me
No Future In The Past
San Sebastian (Original Version)
Freudiana (II)
Old And Wise
Shamandalie
Destiny
One Day To Fly
Signs of Life
There But For The Grace Of God
One More River
Speak To Me - Breathe
Ammonia Avenue
Out Of The Blue
Speak To Me
Dancing On A Highwire
Pavane
Sysyphus Part 1 , Richard Wright
Don’t Answer Me
Press Rewind
The little boy that Santa Claus forgot
Genesis Ch. 1 V. 32
Pyromania
The Post War Dream
Let Me Go Home
Re-Jigue
Theme from barveheart
One Good Reason
Rubber Universe
Unopened
Pipeline (Instrumental)
Shadow Of A Lonely Man
Vengeance - Yngwie Malmsteen
Prime Time
Siren Song
What can I do
Since The Last Goodbye
So Far Away
Wolf And Raven
Urbania
Temporalia
A New Day Yesterday
You Don’t Believe
The Call Of The Wild
A Song For Jeffrey
Apollo
The Eagle Will Raise Again
And The Mouse Police Never Sleeps
Back Against the Wall
The Gold Bug
Aqualung
Beginnings
The Time Machine (Part 2)
Beastie
Blown By The Wind
The Very Last Time
Birthday Card At Christmas
Blue Blue Sky I
Time (2)
Crossfire
Blue Blue Sky II
Time
Fat Man
Brother Up In Heaven
Too Close To The Sun
Kissing Willie
Lap Of Luxury
Summer 68
False News Travels Fast
Living In The Past
Sysyphus Part 2 , Richard Wright
Fearless (you’ll never walk alone)
My Sunday Feeling
The Nile Song
Fearless
North Sea Oil
The Rest Of The Sun Belongs
Harry’s game
Quizz Kid
The Thin Ice
Have A Cigar
Roots To Branches
Time
I Want Out (Helloween Cover)
Someday the Sun Won’t Shine for Y
Weballergy
Kingdom For A Heart
Songs From The Wood
When You’re In
Mary-Lou (Acoustic Version)
Spiral
Your Possible Pasts
Matilda Mother
Steel Monkey
Acres Wild
Nobody Home
Stormy Monday Blues [Live]
Aqualung
On The Run
This Is Not Love
Clasp
One Of The Few
War Child
Cold Wind To Valhalla
One Of These Days
With You There To Help Me
Crazed Institution
See Emily play
Icarus dream fanfare
Cross-Eyed Mary
Set The Controls For The Heart Of The Sun
My Resurrection
Dot Com
Still Loving You (Scorpions Cover)
Never Die
Farm On The Freeway
Summer ’68
Victoria’s Secret
Fylingdale Flyer
Sysyphus Part 3 , Richard Wright
Locked and Loaded-lam
Holly Herald
The Cage
A Pillow Of Winds
Jack In The Green
The Dogs of War
Ain’t Your Fairytale
Jeffrey Goes To Leicester Square
The Fletcher Memorial Home
angel in heat [seventh sign bonus]
Life Is A Long Song
The Happiest Days Of Our Lives
Arnold Layne
Living in the Past
The Thin Ice
Astronomy Domine
Love Story [Live]
The Wind Beneath My Wings (Bette Middler
Blinded No More
Love Story
Cover)
Breathe
Nothing To Say
Tomorrow’s Gone - Yngwie Malmsteen
Broken (Album Version)
Occasional Demons
What Do You Want From Me
Ciprea
Orion
World In My Eyes (Depeche Mode Cover)
Die With Your Boots On (Iron Maiden Cover)
Queen And Country
A Christmas Song
Dreams
Rare And Precious Chain
A New Day Yesterday [Live]
Facing the animal
Someday The Sun Won’t Shine For Y
AWOL
Gun
The Rattlesnake Trail
Beggar’s Farm
If
Under Wraps #1
Black Satin Dancer
In the Flesh-
Why I sing the blues
Boure
In the Flesh
Alone In Paradise
Cheap Day Return
Is There Anybody Out There
Cavallino rampante
Christmas Song
Is There Anybody Out There
Champagne Bath
Cup Of Wonder
Learning To Fly
Facing The Animal
Ears Of Tin
Lucifer Sam
I Don’t Know
European Legacy
Mary Lou
Revolution-lam
Fallen On Hard Times
Mary-Lou (Acoustic Version)
th Commandment
Home
Money
Ain’t Your Fairytale
Inside
My Land
Another Brick in the Wall (Part I)
Jump Start
No Lovelost - Yngwie Malmsteen
Anywhere is
Ladies
On The Run
Book of days
Life Is a Long Song
Orinoco flow
Burning Bridges
Out Of The Noise
Peacemaker
cantabile ’vivaldi’ [magnum opus bonus]
Roll Yer Own
Portami via
Come Sei Veramente
Salamander
Remember A Day
Crying Song
Too Old To Rock ’N’ Roll (Too You
Rose of tralee
Downtown
Working John-Working Joe
See Emily Play
Dream Thieves
You Know I Love You
Still Loving You (Scorpions Cover)
Enemy
Fugue
Meant To Be
Living In The Past
Wish You Were Here
Pictures Of Home
Locomotive Breath
Wot’s ... Uh The Deal
Rising Force
Mother Goose
Dogs In The Midwinter
Sing In Silence
Moths
Driving Song
Cracking the Whip-lam
Move On Alone
For Michael Collins, Jeffrey And
Another Brick In The Wall (Part 2)
Nothing At All
From A Dead Beat To An Old Grease
Another Brick in the Wall - Part 1
Requiem
God Rest Ye Merry Gentlemen
Another Brick in the Wall part 1
Rocks On The Road
Journeyman
Comfortably Numb
Said She Was A Dancer
Look Into The Sun
Corporal Clegg
Son
March The Mad Scientist
Die With Your Boots On (Iron Maiden Cover)
Summerday Sands
One White Duck 010 = Nothing At A
Fat Old Sun
Taxi Grab
Protect And Survive
Flaming
This Free Will
Ring Out Solstice Bells
Free Your
Under Wraps #2
Rock Island
i can’t wait [i can’t wait e.p.]
Undressed To Kill
Saboteur
Kiss me
Brain damage - Eclipse
Sealion
Learning To Fly
Bedrooms Eyes
Serenade To A Cuckoo
May it be
Forever One
Skating Away (On The Thin Ice Of
One Slip
Never Die
Slow Marching Band
Prendimi
prelude to april
Sparrow On The Schoolyard Wall
Reckoning Day, Reckoning Night
The End Of This Chapter
Sweet Little Angel
Remember a day
Winds of War (Invasion)-lam
Valey
Replica
aftermath [i can’t wait e.p.]
Warm Sporran
Round and around
Another Brick in the Wall (Part II)
Wicked Windows
Sacrifice
Black Sheep
Wond’ring Aloud
San Sebastian
Bring The Boys Back Home
Fullmoon
San Tropez
Don’t Say A Word
Hairtrigger
Shy
Dream Thieves
Toccata
Silver Tongue
False News Travel Fast
Crown of Thorns-lam
Sing In Silence
Green is the Colour
And she moves through the fair
The Gold It’s In The ...
I’d Die Without You - Yngwie Malmsteen
Another Brick In The Wall (Part 2)
The Great Gig In The Sky
Keep Talking
Another Brick in the Wall - Part 2
The Gun
Kingdom For A Heart
Coming Back To Life
The hands that built America
Like an angel (for April)
Cymbaline
The Happiest Days of our Lives
Money
Evergreen
The Hero’s Return
Northern shy
Follow you
The Only One - Yngwie Malmsteen
On the Turning Away
Fullmoon
Paintbox
Goodbye Blue Sky
Pigs On The Wing (Part 2)
Hey You
Cover)
Pow R Toc H
I Want Out (Helloween Cover)
Up the Khyber
Seamus
Julia Dream
Vera
See Saw
Land Of The Free
Water dance
Show me heaven
Last Drop Falls
Wish You Were Here
Shy
Mother
Another Christmas Song
The Great Gig In The Sky
Mudmen
Back To The Family
The Gunners Dream
My heart will go on
Back-Door Angels
The Happiest Days of Our Lives
My Resurrection
Black Sunday
The Misery
One Of These Days
Caledonia
Ti Scrivo
Overture 1622 - Yngwie Malmsteen
Flying Colours
Viaggio in aereo
Paranoid Eyes
Hunting Girl
When The Tigers Broke Free
power and glory ’takada’s theme’ i can’t wait e.p.
Later That Same Evening
When you say nothing at all
Time
Two Minds, One Soul (Vanishing Point)
Regina Dei Cristalli
Last Drop Falls
A Song For Jeffrey
Replica (Live)
Letter To Dana
Fire At Midnight
See Saw
Marooned
March The Mad Scientist
Set the controls for the heart of the sun
My Land (Live)
Slipstream
Several Species Of Small Furry Animals
My Selene
The Pine Marten’s Jig
Gathered Together In A Cave And Grooving
On The Run
Adagio
With A Pict
One
A Song For Jeffrey
Take Up The Stethoscope And Walk
Ossessione
First Post
The Boy Who Wanted To Be A Real Puppet
Party Sequence
Introduction By Claude Nobs
Victoria’s Secret
Respect The Wilderness
Animele
Bad-Eyed ’N’ Loveless
See Emily play
Beggar’s Farm
Batteries Not Included
spanish castle magic [inspiration bonus]
No Lullaby
Broadsword
The Happiest Days of our Lives
A Christmas Song
Bungle In The Jungle
The Narrow Way Part 1 , David Gilmour
Sweet Dream
Dangerous Veils
The Show Must Go On
Tiger Toon
Dharma For One
Vento d’Europa
A New Day Yesterday
Every Day I Have The Blues
Voodoo - Yngwie Malmsteen
Look At The Animals
Heavy Water
A Time For Everything
Skating Away (On The Thin Ice O
Hunt By Numbers
Fat Man
Boure
Jack Frost And The Hooded Crow
Grace
Jack In The Green
Nothing Is Easy
One Brown Mouse
Law Of The Bungle
Nursie
Singing All Day
Law Of The Bungle (Part 2)
Radio Free Moscow
Too Old To Rock ’N’ Roll (Too You
Nothing Is Easy
Rover
Sarabande
One Brown Mouse
Skating Away (On The Thin Ice Of
Beauty and A Beast-lam
A New Day Yesterday
Something’s On The Move
Another Brick in the Wall part 2
Left Right
Sweet Dream
Any Colour You Like
Living In The Past
Thick As A Brick [Edit No. 1]
Empty Spaces
Flute Solo Improvisation God Re
Thinking Round Corners
Goodbye Blue Sky
Solitaire
To Cry You A Song
Mc; Atmos
To Cry You A Song
Up To Me
Sospeso Nel Tempo
Songs From The Wood
Velvet Green
The Gnome
Teacher
Witch’s Promise
The Narrow Way Part 2 , David Gilmour
Post Last
Andante
Cold Wind To Valhalla (Intro)
Sweet Dream
Brothers
Dun Ringill
Cross-Eyed Mary
Ill See The Light Tonight
Hymn 43
Scenario
Like An Angel
One White Duck
Audition
Replica
Only Solitaire
Mother Goose
The Bogeyman-lam
Allegro
Aqualung
A New Machine (Part 1)
Fuguetta (Instrumental)-lam
No Rehearsal
Another time
A New Machine (Part 2)
Locomotive Breath
Any Colour You Like
Arnold Layne
Life Is A Long Song
Baby can I hold you
Eclipse
Thick As A Brick [Edit No. 1]
Carefull with that axe, Eugene
Empty Spaces
A Passion Play [Edit No. 8]
Champagne Bath
God is God [alchemy bonus]
Skating Away (On The Thin Ice O
Childhood’s End
I dont wanna know boadicea
Bungle In The Jungle
Fade To Black (Metallica Cover)
Ibiza Bar
Bike
Get Your Filthy Hands Off My Desert
Le Tue Mani
Eclipse
Goodbye Blue Sky
Minstrel boy
More Blues
Hey You
Paintbox
Qui Danza
I should have known better
Southampton Dock
Revontulet
Jugband Blues
The Nile song
Scarecrow
The Grand Vizier’s Garden Party Part 1 -
Goodbye Cruel World
Broadsword
Entrance , Nick Mason
Star of a country down
Commons Brawl
What Shall We Do Now-
The celts
No Step
Cheerio
The Last Few Bricks
Under Wraps #2
Just Trying To Be
Life Is A Long Song
Drive On The Young Side Of Life
Lick Your Fingers Clean
Under Wraps #2
Steel Monkey
Mango Surprise
Goodbye Cruel World
Farm On The Freeway
Pan Dance
Is there anybody out there-
I Don’t Want To Be Me
Round
The Nile Song
Broadford Bazaar
Salamander
Mayhem, Maybe
Jump Start
Thick As A Brick [Edit No. 1]
Up The ’Pool
Kissing Willie
Guardian Angel (Instrumental)-lam
Nobody Home
Lights Out
Bike
Peacemaker (Studio Track)
This Is Not Love
Green Is The Colour
Dr. Bogenbroom
Truck Stop Runner
Jugband Blues
Someday The Sun Won’t Shine For Y
Hard Liner
Stop
Wond’ring Aloud [Live]
Nursie
Wrecking The Sphere
Vera
Rupi’s Dance
Young Lust
Dun Ringill [Live]
A Christmas Song
Greensleeved
For Later
Grace
Jack In The Green
Paraphrase (Instrumental)-lam
Run like Hell
Minstrel In The Gallery [Live]
Bring the boys back home
Waiting for the Worms
Under Wraps #2
Life Is A Long Song
A Perfect Circle - Mer De Noms - 01 - The
A Spanish Piece
Nursie
Hollow
Another Brick in the Wall (Part III)
The Water Carrier
A Perfect Circle - Mer De Noms - 03 - Rose
Crying Song
Introduction By Ian Anderson
A Perfect Circle - Mer De Noms - 09 - Ren-
One of My Turns
Minstrel In The Gallery
holdr
Pensieri Nascosti
Paradise Steakhouse
A Perfect Circle - Mer De Noms - 12 - Over
Skye boat song
Hunting Girl
A Perfect Circle - Thirteenth Step - 02 -
Stop
Sealion II
Weak and Powerless
The Grand Vizier’s Garden Party Part 3 -
Too Old To Rock ’N’ Roll (Too Y
A Perfect Circle - Thirteenth Step - 06 - A
Exit , Nick Mason
Piece Of Cake
Stranger
Cold Wind To Valhalla (Intro) [Li
Songs From The Wood
A Perfect Circle - Thirteenth Step - 08 -
Fire At Midnight
Too Old To Rock ’N’ Roll (Too Y
Crimes
King-Brubeck jam (short version)
Conundrum
A Perfect Circle - Thirteenth Step - 11 -
Life Is A Long Song
Jack In The Green
Lullaby
Finale
Quartet
Bob Marley and the Wailers - Legend - The
Sorrow
Minstrel In The Gallery
Best of - 04 - Three Little Birds
Vodka
Silver River Turning
Bob Marley and the Wailers - Legend - The
Air on a theme
The Whistler
Best of - 06 - Get Up Stand Up
Bike
Crew Nights
Bob Marley and the Wailers - Legend - The
Don’t Leave Me Now
Cross-Eyed Mary
Best of - 08 - One Love People Get Ready
Dramatic Theme
Dun Ringill
Celine Dion - Mon ami m’a quittee
Fields of gold
Quatrain
Celine Dion - Tellement J’ai D’amour Pour
Goodbye Cruel World
The Curse
Toi
Outside The Wall
Flyingdale Flyer
Celine Dion - Tout L’or Des Hommes
Watermark
Rosa On The Factory Floor
Debussy - La Fille Aux Cheveux De Lin
Dun Ringill
A Small Cigar
Debussy - Reverie
We Five Kings
Jack-A-Lynn
Debussy - Syrinx
Another Brick in the Wall - Part 3
Locomotive Breath
Debussy - Valse Romantique
Another Brick in the Wall part 3
Man Of Principle
ynvie - zelda
More Blues
Pussy Willow
house october 2009) Sammy Love ft Irene
Jack Frost And The Hooded Crow
The Dambusters March
Arer-Torcida (Lanfranchi & Farina rmx)
[www.worldofhouse.es]
D.A.N.C.E.
david guetta-when love takes over (feat. kelly
Dennis Ferrer - Church Lady (Original Mix)
Supermassive Black Hole
rowland)
House- 3Rd Face - Canto Ddella Libert (Van-
The World Is Mine
david guetta-gettin over (feat. chris willis)
dalism Rmx)
Glimpse Jay Shepheard Alex Jones-Glimpse
david guetta-sexy bitch (feat. akon)
Muse - Showbiz - Sunburn
And Alex Jones - Fellaz (0Daymusic Org)
david guetta-memories (feat. kid cudi)
riva starr - i was drunk feat. noze (original
Motel Connection-Waxwork
david guetta-missing you (feat. novel)
mix) [4clubbers.com.pl]
nari and milani feat. max c-disco nuff (cris-
david guetta-its the way you love me (feat.
tim deluxe ft. shahin badar - mundaya (the
tian marchi perfect remix) (0daymusic.org)
kelly rowland)
boy)
the
Trentemoller-Nightwalker
Firestarter
okereke)-ck
crystal waters gipsy woman (shes homeless)-
Genesis
Everything Counts
hft
I Was Drunk
Golden Skans
david guetta-choose (feat.
Sexx Laws
Predominant
rowland)
Surfin’ U.S.A.
Toop Toop - Cassius
molella and phil jay - its a real world (world
dr kucho-beat for me (original mix) (0day-
Dance Me
mixx).Dr.SoOn
music.org)
New Jack
david
fatboy slim and koen groeneveld-rockafeller
DEPECHE MODE A question of time
will.i.am and apl de ap)
skank original mix (0daymusic.org)
Motel Connection-Three
david guetta-i gotta feeling (fmif edit) (feat.
fatboy slim vs.
Cajesukarije Cocek
black eyed peas)
The Cure - Lullaby
Spiller Feat. Sophie Ellis Bextor - Groovejet
Phantom pt. I
(if This Ain’t Love)
FLAMINGO PROJECT - Take No Shhh-
What’s My Age Again
Royal T (Featuring Roisin Murphy)
hhhhh
Mauro Picotto Komodo
Waters Of Nazareth
funkagenda-what the fuck (original club mix)-
Shake the Disease
Bibi Tanga et le professeur inlaSsable - talk-
scratch
Chemical Brothers-Do It Again (featuring
ing nigga brothaz
kaiser chiefs-ruby
Ali Love)
People Are People
sander van doorn-renegade club edit trance
apdw vs tim deluxe ft sam obernik-just wont
Underworld - Born Sleepy
energy 2010 anthem
it mowgli dub mix 2010 (0daymusic.org)
david guetta-one love (feat. estelle)
sgt slick and rob pix - behind the sun (cris-
Dj Dado - Coming Back
david guetta-sound of letting go (feat.
tian marchi perfect mix) (0daymusic.org)
Wedding Cocek
cadisco and chris willis)
the chemical brothers-galvanize (feat q-tip)-
Times
david
ck
Phantom pt. II
will.i.am)
the fratellis-henrietta
The Smiths - This Charming Man (1984)
massive attack-five man army
the hives - tick tick boom-ysp
Warriors Dance
david guetta-toyfriend (feat. wynter gordon)
Muse - Uprising - PANiC
gabry ponte and paki-its about to rain
david guetta-if we ever (feat. makeba)
Personal Jesus
See You
One Minute To Midnight
giorgio prezioso vs libex-disco robotz voco
Hot Stuff
Provenzano feat.
mix
Nina I’m So Excited
(Suonino Mix)
Motel Connection - Lost(1)
Valentine
American Wedding
Sparkles
Shined On Me
Dreamlend
Let There Be Light
The Party
The Smiths - Ask (1987)
My Sharona
Would You...
Flow
Shoot The Runner
Strangelove
Tonite
Beyonce ft. Sean Paul Baby Boy
Caballeros
I Feel You
DVNO
Andrea Doria Bucci Bag
I will survive
Matia Bazar - Solo tu
burgess)-ck
dennis ferrer feat. karlon brooks sr. - change
Whirlpool Productions From Disco To Disco
Donna Summers - Funky Town
the world
Benny B. Satisfaction
Just Can’t Get Enough
Riva’s Boogaloo
dennis ferrer - dem people go
Mesecina
Stress
Festivalbar 2007 Blu - Neffa - La Notte
Nothing More
chelley - took the night (victor palmez and
Venus
Bulgarian Chicks
id remashmix)-atrium (0daymusic.org)
Master and Servant
fedde le grand-praise you
2009 (f.l.g. remix)
FEDDE
LE
GRANDE
presents
THE
Moderat-Rusty Nails
the
chemical
brothers-the
boxer
(feattim
chemical
brothers-believe
(feat
kele
guetta-on
guetta-i
the
wanna
ne-yo and kelly
dancefloor
go
crazy
(feat.
(
to-
feat.
Max’c - Chains Of Love
New Life
Beats and Styles feat. Justin Taylor - Friend
Crookers
Raffaella Carra - Festa (Italian Version) -
(Cristian Marchi & Paolo Sandrini Extended
D’Amico - Festa Festa
Raffaella Carr
Remix)
Crookers ft.
The Soundlovers Surrender
Bee Gees - You Should Be Dancing
Riva Starr Dub - Defected Miami WMC 2010
Never Let Me Down Ag
Beverly Project Vs Julio Cesar - My People
-ITH33DS-1(320k)
Brothers On The 4th Floor - Dreams (Will
(Cristian Marchi Flow Mix)
Crying at the discoteque
Come Alive)(1)
Beyonc - Single Ladies (Put A Ring On It)
Daft Punk Around The World
Let’s Dance
Billy More - Come On And Do It
Daft Punk Put Your Hands Up In The Air
Noferini & Dj Guy feat. Hilary - Pra Sonhar
Billy More - I keep on burning
DANCE ANNI 90 - Corona - This is the
(Marascia rmx)
Billy More - Up and down
rythm of the night
Alors on danse - Stromae
Bingo Players - Devotion [Original Mix]
Dance Anni 90 - Gala - Freed From Desire
diabulus in musica
Bingo Players - When I Dip (Original Mix)
Dance Anni 90 - Haddaway - What Is Love
Inxs - I Need You Tonight
Bingo Players Vs Chocolate Puma - Disco
Dance Anni 90 - Ultranate - Free
cd6 - 01 - human league - don’t you want me
Electrique (Original Mix)
Daniele Silvestri & Subsonica - Liberi tutti
baby
bla bla bla
Datura - The 7Th Allucination
pm (till I come)
Blondie - Call Me
Datura-yerba Del Diablo I
ACDC - Back in Black
Blondie - Heart of Glass
David Bowie - Rebel Rebel
ACDC Highway To Hell
blue
David Guetta - Everytime We Touch
Aerosmith - Pink
blur - boys and girls
David Guetta - Gettin’ Over You (Feat Chris
Aerosmith - Rag Doll
Bob Marley - Could You Be Loved
Willis, Fergie & Lmfao)
Afrojack - Pacha On Acid (Original mix)
Bob Marley - Legend 06 - Get Up Stand Up
David Guetta - Grrrr (Original Mix)
Alan Sorrenti - Figli Delle Stelle
Bob Sinclar - Gym Tonic (T.Bangalter Mix)
David Guetta - Love Is Gone
Alex Gaudino & Jason Rooney - I Love Rock
Bob Sinclar - New New New (2009 Blaster
David Guetta - Pop Life - 04 - Delirious
N Roll (Exclusive Edit)
Project Exclusive Version) (House Diciembre
David Guetta - Sexy Bitch feat. Akon
ALICE - Per Elisa
08)
David Guetta feat Kelly Rowland - Takes
Alison Goldfrapp - Lovely Head (Pubblicita’
Bodyrox Ft.
BMW)
Ramirez Radio Edit)
remix)
Almamegretta-BlackAthena
Bronski Beat - Smalltown Boy
david guetta feat. chris willis - love is gone
America
Calvin Bosco & Chris Bekker feat. Giorgio
David Guetta Feat.
Analog People In A Digital World - Vega (Ian
Moroder - The chase (D.O.N.S. Remix)
LMFAO - Gettin’ Over You (Extended Mix)
Pooley Mix)
Calvin Harris - Acceptable in the 80s (Radio
David Guetta feat.
Analog People In a Digital World - Walking
Edit)
takes over (Original Mix)
In Harlem [Dj Sneak Mix]
Calvin Harris - Flashback
David Guetta Love don’t let me go
Anni 80 - Amanda Lear - Tomorrow
Calvin Harris - I’m Not Alone
David Morales - Needin’ You
Anni 90 Corona - Baby Baby
Cassius - I’M A Woman
Dead Or Alive - You Spin Me Right Round
Annie Lenox -Eurythmics - There Must Be
Cassius - La Mouche
(Like A Record)
An Angel
Ce Ce Peniston - Finally (Vandalism Remix)
Dee-lite - Groove is in the heart
Luciana - Yeah Yeah (D.
feat.
Fabri
Fibra
&
Dargen
Roisin Murphy - Royal T -
Love Over (Arno Cost & Norman Doray
Chris Willis, Fergie &
Kelly Rowland - Love
[www.worldofhouse.es]
Deep Swing - In The Music 2010 (Cristian
GRINO ROCKING Original Mix
Chase - Obsession
Marchi Perfect remix)
Antonella Ruggero & Matia Bazar - Ti Sento
Chelsea Dagger
Dennis Christopher - Set It Off (Ian Carey
Apres La Classe - Paris
Christina Aguilera - Candyman
Remix)
Are you gonna go my way
Clash - London Calling
Dennis
Armand Van Helden - Hear my name (edit)
Corona - This is the rythm of the night
rer’s
Arno Cost - Cyan (Original Mix)(1)
(Dance Mix ’94)
mafia.blogspot.com)
ATB - Let You Go (Airplay Mix)
Corona - Try Me Out
Destination Unknown (J-Reverse Radio Mix)
ATB - You Are Not Alone
Cristian Marchi - Disco Strobe (Perfect Mix
Different Gear Vs Sia - Drink To Get Drunk
Axwell - I Found You (Axwells Re-Mode)
Radio)(1)
Dirty South - Let It Go [Axwell Remix] (Rip)
Axwell feat.
ANTHONY
LOUIS
&
PAOLO
PELLE-
Ferrer
-
Attention
Hey
Vocal
Hey
Mix)
(Dennis
Fer-
(bacauhouse-
Cristian Marchi - Love, Sex, American Ex-
Dirty South & Mark Knight - Stopover (Orig-
sunrise (Radio Edit)
press (Cristian Marchi Main Vocal Mix)
inal Mix) (bacauhousemafia.blogspot.com)
Bamboo - Bamboogie (12” Vocal Mix)
[www.worldofhouse.es]
Discorama - Giddy Up A Go Go (John
Barbara Ann
Crookers Feat.
Dahlback Remix)
Bassmonkeys & Bianca Lindgren - Get busy
Giorno’N’Nite
Steve Edwards - Watch the
Dargen D’Amico Dan T -
Discoteca Anni 90 - Snap!
- Rythm Is A
Dancer
fect remix)
I predict a riot
Discoteca labirinto
Gala - Come Into My Life
I was made for loving you
Gala - Let A Boy Cry
ida corr - a1 let me think about it (fedde le
Bonkers (Club Mix)
Gem Boy - Orgia Cartoon
grand remix)
DJ Emanuele Inglese - I’m Really Hot
Gianna Nannini - 01 - Bello E Impossibile
Inna - hot
DKS - That’s Jazz (Da Club Mix)
Gianna Nannini - America
Itaka & Manu Blanco - Como Dice El Dj
Do you want to
Gianna Nannini - Fotoromanza
Jacqueline
Dr Kucho - New school tribal (original mix)
Gigi D Agostino - L Amour Toujours
Javi Mula Feat. DJ Disciple - Sexy Lady (Ex-
Dr Kucho! - Patricia Never Leaves The House
Gigi D’Agostino Another Way
tended Mix) (bacauhousemafia.roclub.org)
(Dr Kucho! Remix)
Gigi D’Agostino Bla Bla Bla
Javi Mula - Come On
Dr. Kucho - Belmondo Rulez (Bob Sinclair
Gigi D’Agostino The Riddle
jean
Vocal mix)
Girl
shingaling
Dr. Kucho - Groover’s Delight (2008 original
Giuliano Palma & La Pina - Parla Piano
Jefferson Airplane - 1967 - Surrealistic Pillow
mix)
Giuliano Palma & The Bluebeaters - Black Is
- 02 - Somebody to Love
Dropkick Murphys - I’m shipping up to
Black
John Dahlback - Blink
boston
Giuliano Palma & The Bluebeaters - Won-
John Dahlback-Everywhere
Duck Sauce Ft A-Trak & Armand Van Helden
derful Life
Juan Magan & Marcos Rodriguez - Bora Bora
- Anyway (Original Mix)
Giuliano Palma - Tutta Mia La Citt
[www.worldofhouse.es]
Dude Looks Like A Lady
Giuni Russo - Maracaibo
Juanjo Martin & Albert Neve ft.
Duran Duran - 13 - Notorius
Giuni Russo - Voglio andare ad Alghero
Brown - SuperMartxe (Original Mix) FULL
Duran Duran - Wild Boys
glimpse and alex jones–true friends-dh (0day-
Junior Jack - E Samba
Eddie Thoneick, Erick Morillo - Nothing Bet-
music.org)
Justice - DVNO (192 kbps)
ter Feat Shena (Original Mix)
Global Deejays Feat. Ida Corr - My Friend
justice - stress
Edward Maya&Vika Jigulina - StereoLove
(Club Mix) passion4housemusic.blogspot.com
Kalasnjikov
Estelle Feat. Kanye West - American Boy
Goldsylver - I Know You Better
KID CUDI - Day and night
everyday I love you less and less
Good Vibrations
Kim Carnes - Bette Davis Eyes
Faithless - Insomnia
Gorillaz - Stylo
Falco - Der Kommissar
Gramophonedzie-Why
Falco - Rock Me Amadeus
mix)
La Passion
Fedde Le Grand & Funkerman - 3 Minutes
Green Velvet - La La Land
LaBouche - Sweet Dreams (Disco Techno
To Explain (Original Mix)
Groove Armada-Paper Romance (Album Ver-
Mix)
Fedde Le Grand feat. Mitch Crown - Scared
sion) 2010 (Albummusic Eu)
Lady Gaga - Bad romance (radio edit)
Of Me (Extended Mix)
Gui Boratto - No Turning Back (Original
Laurent Wolf - Calinda (Original mix)
Feel da feeling (side a2 radio edit)
Mix)
laurent wolf - explosion club mix
feel it
Guido Nemola & Loaded - De Bailar
laurent wolf - seventies - club mix
Felix Da Housecat - Silver Screen
Hatiras - Spaced Invader (Hatiras 2010 Vocal
le knight club - Tropicall
flashdance
Remix)
Let Me Think About It - Radio Edit - Ida
fly away
Heart in a cage
Corr Vs Fedde Le Grand
Franco Battiato - Cerco Un Centro Di Gravit
Helmut Fritz - a m’enerve (Radio Edit)
Le Knight Club - Soul Bells
Permanente
hey boy hey girl
lies
Frankie Gada Vs Raf Marchesini - Rockstar
horny 98
Lobo guar - Resta aqui
(Cristian Marchi Perfect Remix)Clubkings
Hot Party Winter 2007 208 Alex Gaudino -
Loco Tribal - O Ritmo Do Samba (Tiko’s
Eu
Magic Destination(Calabria Mix)
Groove Remix)
Frankie Goes To Hollywood - Relax
Hot-Chip-One-Life-Stand
Love And Pride
Franz Ferdinand - No You Girls never know
House - Armand Van Helden - Witch Doctor
Love Don’t Let Me Go (Walking Away)
French Affair - My Heart Goes Boom
(original mix)
Love in an elevator
gabin - it don’t mean a thing
House - Roach Motel - Wild luv (H Connec-
Love Is Gone (Original Mix) - David Guetta
Gabry Ponte - Don’t Move Your Lips
tion Remix)
LSF - kasabian
Gabry Ponte - Time To Rock
House Of Glass - Disco Down
Madness - One Step Beyond (1979)
Huf8 - Sashi & Sushi - Original Mix
Madonna - Give It To Me - 2008 (Hard
I Believe in a Thing called Love
Candy)
i belong to you
Madonna - Hard Candy - Beat Goes On (Fea-
Dizzee Rascal Feat.
Gabry
Ponte,
D’angelo feat.
Armand Van Helden -
Cristian
Marchi,
Sergio
Andrea Love - Don’t Let
Me Be Misunderstood (Cristian Marchi per-
claude
ades
and
vincent
thomas-
Nalaya
King Of My Castle
Dont
You(Original
Kobra
turing Kanye West)
Sandrini Remix)
Sade - Smooth operator
Madonna - Revolver (David Guetta Remix)
NARI & MILANI vs. CRISTIAN MARCHI
Salif Keita - Madan (Martin Solveig)
Malente - I Like it (Riva Starr Snatch Mix)
feat. MAX C - Let It Rain (Club Mix)
san franciscan nights
Marani And Montsaint - Turn (Paolo Bolog-
Ne-Yo - Nobody
Sash - Encore une fois
nesi Remix)
Need For Speed Underground - 08 - Asian
Sash! - Mysterious Time
Marchi’s Flow vs. Love feat. Miss Tia - Feel
Dub Foundation - Fortress Europe
Scatman John - Scat Man
The Love
Nelly Furtado ft Timberland - Promiscuos
Sexual Guarantee
Mark Knight & Funkagenda - Good Times
Girl
Sharam feat.
(Original Mix)
New Order - Blue Monday
(Jean Elan Remix)
Marracash - Badabum Cha Cha(1)
Nick Kamarera & Deepside Deejays - Beauti-
Sidekick - Deep Fear (Phobia Club Remix)
Martin Solveig - 02 - Something Better
ful Days (extended Version)
Sidney Samson - Riverside (Original Mix)
Martin Solveig - Boys & Girls
NICOLA
Martin Solveig - C’est la Vie (Martin Solveig
feat.MR.GEE
vs Fedde le Grand)(1)
SOUTH BEACH MIX(1)
Song 2
MARTIN SOLVEIG - Everybody -
Nuova ossessione (album)
Sono - Keep Control
MARTIN SOLVEIG - I want you
Oliver Twizt - Gangsterdam (Original Mix)
Sophie Ellis-Bextor - Bittersweet (Freema-
Martin Solveig - One 2 3 Four (Original Club
(bacauhousemafia.blogspot.com)
sons Mix Edit)
Mix)
Ordinary Life
splendido - ComoglioCut
Martin Solveig - One 2 3 Four
Out of Space
Starman
Martin Solveig feat Dragonette - Boys & Girls
Pain & Rossini - Hands Up Everybody (Cris-
stay
(Laidback Luke Remix)[NationOfHouse.com]
tian Marchi Rmx)
Stayin Alive (Saturday Night Fever)
Massive Attack - Herculaneum - Colonna
Peter Tosh - Out Of Space’76
Steve Aoki I’m In The House
sonora - GOMORRA
Pharrel Williams & Uffie - Add Suv (Armand
Strawberry fields (rmx)
Mastiksoul - Back To The 80’s feat. George
Van Helden Vocal Remix)
strings of life
Llanes Jr. - Latin Version
Pharrell feat Gwen Stefani - Can I Have It
Stylus
Matia Bazar - Vacanze Romane
Like That
Remix)
Meg - Distante
Pitbull - I Know You Want Me (Calle Ocho)
Subsonica - I chase the devil
Messico e nuvole
Pitbull feat. Akon - Shut It Down (Javi Mula
Susy La Ragazza Truzza (Gabry2o Rmx)
Michel Cleis & Salvatore Freda - Collivo
Remix Extended)
Swared Ruanda (Emanuele Esposito Club
Moby - Disco lies (Spencer & Hill rmx)
Planet Funk - Chase The Sun (Radio Edit)
Mix)
Molella And Phil J - With This Ring Let Me
Planet Funk - Inside All The People
Sweet dreams
Go
Planet Funk - Lemonade
Sylvester - You Make Me Feel (Mighty Real)
Molella-revolution (tantaroba mix)
Prezioso - Tell Me Why
Take me out
Motel Connection - Dreamer
Prok And Fitch Pres Saloma De Bahia
take me up
Motel Connection - H
Remixes - Outro Lugar (Tocadisco’s Nunca
Techno - Gigi d’Agostino - You spin me
Motel Connection - Heroin
Chove Floripa Mix)
around
Motel Connection - Hit and run
Quelli che benpensano
Telephone - Lady Gaga feat. Beyonce
Motel Connection - Reach out
Raf Marchesini & Max B - Farao (Marchesini
Tell her tonight
Motel Connection-The light of the morning
Radio mix)
The Bloody Beetroots - Anacletus
Motel Connection Uppercut
RAP - Old School - Run DMC - It’s Like
The Clash - 01 - Know Your Rights
move your body
That
The Clash - 02 - Car Jamming
MSTRKRFT - Heartbreaker feat. John Leg-
Ricardo
end Laidback Luke Remix
PONTE & PAKI remix
Go
Muse - 03 - Time is Running Out
Robert Miles - Children (Full Version)
The Clash - 04 - Rock The Casbah
Muse - 08 - Hysteria
Robert Miles - Freedom
The Clash - 05 - Red Angel Dragnet
Muse - The Resistance - 03 - Undisclosed
Roberto Molinaro - Hurry Up
The Clash - 07 - Overpowered By Funk
Desires
Robin S vs Steve Angello & Laidback Luke -
The Clash - 08 - Atom Tan
Music Response
Show Me Love vs Be (Hardwell remix)
The Clash - 09 - Sean Flynn
Mylo - Drop The Pressure
rock the house
The Clash - 10 - Ghetto Defendant
N trance - Da ya think I’m sexy
Roisin Murphy - Overpowered (Herve &
The Clash - 11 - Inoculated City
Nari & Milani and Cristian Marchi with Max
Roisin in the Secret Garden) (162k 5m29s)
The Clash - 12 - Death Is A Star
C - Let It Rain (Cristian Marchi & Paolo
Rudenko - Everybody (Club Mix)
The dark of the matinee
FASANO
-
vs.OUTWORK
ELECTRO
Villalobos
-
-
Enfants
FASANO
-
GABRY
Kid Cudi - She came along
Simone Jay - Wanna Be Like A Man
Smack My Bitch Up
Robb-Ininna
Tora(Nick
Corline
The Clash - 03 - Should I Stay Or Should I
The Doors - 01 - Break On Through
do Rio (Vocal Extended) [by zZz]
Where are we runnin’
The Doors - 01 - Hello, I Love You
Tiko’s Groove feat. Mendonca Do Rio - Me
Whigfield - Saturday Night(1)
The Face Vs. Adam Shaw & Mark Brown -
Faz Amar (Vocal Extended Mix)
Whigfield - When I think of you
Needin U (Original Mix Version 2)
Tim Deluxe - It Just Won’t Do
Who Da Funk feat. Jessica Eve - Shiny Disco
The Gossip - Heavy Cross
Tom Novy - Your Body (Radio Edit)
The Gossip - Music For Men - 04 - Love Long
Treasure
Distance
Dancefloor
The Guru Josh Project - Infinity 2008(radio
[www.worldofhouse.es]
Winehouse, Amy Rehab
edit)
Trentemoller - The Trentemoller Chronicles
Wouldn’t It Be Nice
The Hives - Two Timing Touch And Broken
cd 1 - 02 - Klodsmajor
Wuthering Heights
Bones
Trentemoller - The Trentemoller Chronicles
Yolanda Be Cool & DCUP-dj roma the white
The House Keepers - Runaway(DJ Umile Full
cd 1 - 03 - Mcklaren (trentemoller remix)
Tracklist - We No Speak Americano (Original
Vocal Mix)www.mp30.er.pl
Trentemoller - The Trentemoller Chronicles
Mix) [320]
The Housekeepers - Go down
cd 1 - 11 - Rykketid
You only live once
The housekeepers Hangin’ on
Trentemoller - The Trentemoller Chronicles
yuksek - tonight
The Smashing Pumpkins - 02 - Ava Adore
cd 1 - 13 - Moan (Trentemoller Remix Radio
Yves Laroque - Rise up
The Strokes-Reptilia
Edit)
Theophilus London - TNT
Trio - Dadada
Tiko’s Groove - Para Sambar feat Mendoca
Tutti i miei sbagli
Fingers
(Laidback
-
Cross
Luke
Balls
The
Remix)
Will I Am - The Donque Song (Fedde Le
Grand remix) by ALEX INC
Appendix C
A short history of computer-aided composition
The field of computer-aided composition spreads from artificial intelligence to humanistic studies. We may say that musical composition systems derive from the very desire of man to create machines able to emulate his behavior and creative intuition. If we recall the 16th-century legend of the golem, an animated anthropomorphic being created entirely from inanimate matter, we can see how far back in time this desire goes. The golem was created by man in his own image and can be seen as a first idea of a ”computing device”: it could execute any order written as a specific series of letters on parchment and placed in the golem’s mouth. In addition to this, man wants to be surprised by a system that creates the ”unexpected” and the ”novel”, and therefore goes beyond its creator.
Buxton [9] shows how computer-aided composition was already regarded as an interesting research and artistic creation area as early as 1975.
C.1
Algorithmic composition
Figure C.1: Relevant music scores: (a) John Cage, Variations; (b) Karlheinz Stockhausen, Tunnel Spiral
The term ”algorithmic composition” refers to a category of systems that compose music by applying a set of rules. The rule-based approach certainly offers advantages, such as complete control over the result, but, since music is linked to the concept of creativity and not only to the definition of a set of rules, algorithmic composition suffers from rigidity and predictability.
From the end-user’s point of view, the advantage of algorithmic composition systems is the possibility of drastically reducing the amount of user interaction during the generation phase; the developer can define a large set of rules and let the system compose music with no input. This makes rule-based composition systems accessible also to non-expert users. The disadvantage is the complexity of the system compared to the simplicity of the generated music, which is usually biased toward a particular style (Bach-style, techno, piano solo music, ...). Moreover, the relevance of some musical authors or scores lies precisely in their rejection of classical rules.
Among the earliest musical composition systems are the so-called ”giochi armonici” (harmonic games, 18th century, credited to W.A. Mozart): combinatorial tables that allow, by rolling dice, the composition of a virtually infinite number of melodies or minuets (”minuetti”).
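To make the mechanism concrete, the following minimal Java sketch picks each bar of a piece from a table of pre-composed measures according to the roll of two dice. The measure labels are purely illustrative, and a single shared column is used here for brevity, whereas the historical tables provide a separate column of candidate measures for every bar position:

import java.util.Random;

// Minimal sketch of a dice-based "gioco armonico": every bar of the minuet is
// picked from a table of pre-composed measures indexed by the total of two dice.
public class HarmonicDiceGame {

    // Hypothetical table: index = dice total minus 2, value = label of a pre-written measure.
    private static final String[] MEASURE_TABLE = {
        "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10", "m11", "m12"
    };

    public static void main(String[] args) {
        Random dice = new Random();
        int bars = 16; // a 16-bar minuet
        StringBuilder minuet = new StringBuilder();
        for (int bar = 0; bar < bars; bar++) {
            int total = (dice.nextInt(6) + 1) + (dice.nextInt(6) + 1); // roll two dice
            minuet.append(MEASURE_TABLE[total - 2]).append(' ');
        }
        System.out.println("Generated minuet: " + minuet.toString().trim());
    }
}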
More recently, composers such as John Cage and Karlheinz Stockhausen explored new ways of writing and executing a musical score, leaving a certain degree of freedom to the player, often an electronic device (Figure C.1).
From then on, many attempts have been made to exploit the growing computational power of computers, aiming at the creation of a music composition system that can autonomously arrange and compose an appealing musical piece. It is clear that the notion of ”appealing musical piece” depends on many factors (such as the musical training or the culture of the listener) and is therefore an ill-posed problem. In order to circumvent this issue, music composition systems usually restrict their scope to a particular slice of the musical panorama (western music, disco music, ...).
A first category of systems uses some rules of traditional western music together with melodic heuristics to create a melody and a harmony. Aoki [3] uses traditional harmony and counterpoint rules to generate a musical score, Chua [13] uses a random number generator to select notes from the pentatonic scale, whereas Rigopulos and Egozy [43] let the user decide the characteristics of the generated musical piece by means of a joystick.
A more formalized approach makes use of mathematical logic to model a musical piece (Anders [1] and Henz et al. [22]). The user specifies, by means of logic formulas, the expected musical rules and the system finds a realization that satisfies the stated formulas.
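As a toy illustration of this idea (plain backtracking is used here instead of a real logic solver, and the two rules are arbitrary stand-ins for the formulas a user might state), one realization could be searched for as follows:

// Toy rule-based melody search: find a sequence of scale degrees that
// satisfies a set of user-stated rules, using naive backtracking.
public class RuleBasedMelody {

    static final int LENGTH = 8;     // number of notes to generate
    static final int SCALE_SIZE = 8; // degrees 0..7 of a one-octave scale

    // Rule 1: consecutive notes must not leap more than three scale steps.
    static boolean smallLeap(int prev, int next) {
        return Math.abs(next - prev) <= 3;
    }

    // Rule 2: the melody must end on the tonic (degree 0).
    static boolean validEnding(int[] melody, int pos) {
        return pos < LENGTH - 1 || melody[pos] == 0;
    }

    static boolean search(int[] melody, int pos) {
        if (pos == LENGTH) {
            return true; // all notes assigned and every rule satisfied
        }
        for (int degree = 0; degree < SCALE_SIZE; degree++) {
            melody[pos] = degree;
            boolean ok = (pos == 0 || smallLeap(melody[pos - 1], degree))
                    && validEnding(melody, pos);
            if (ok && search(melody, pos + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] melody = new int[LENGTH];
        if (search(melody, 0)) {
            StringBuilder out = new StringBuilder();
            for (int degree : melody) {
                out.append(degree).append(' ');
            }
            System.out.println("Scale degrees: " + out.toString().trim());
        } else {
            System.out.println("No melody satisfies the stated rules.");
        }
    }
}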
C.2
Composition environments
Another type of system explicitly requires human creativity and offers an environment in which the composer creates his or her music; the intended user is therefore a musically trained person. These systems usually consider music as non-linear: a musical score is no longer seen as a line with a beginning and an end, but rather as a graph in which nodes represent musical segments and edges represent possible continuations of such melodies (a minimal sketch of this structure is given below). The executor can move through the graph and produce a particular time evolution of the composition.
The application of these systems is usually very specific: Microsoft Corporation integrated DirectMusic (Fay [15]) in the DirectX framework and LucasArts used iMUSE (Land and McConnel [27]) as the music engine for their games.
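Under very simplified assumptions, such a non-linear score can be sketched as a directed graph of named segments, with a random walk producing one possible time evolution of the piece; all segment names below are invented for illustration:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Minimal sketch of a non-linear score: nodes are musical segments, edges are
// allowed continuations; a walk over the graph is one realization of the piece.
public class NonLinearScore {

    private final Map<String, List<String>> continuations = new HashMap<>();

    void addSegment(String name) {
        continuations.putIfAbsent(name, new ArrayList<>());
    }

    void addContinuation(String from, String to) {
        continuations.get(from).add(to);
    }

    List<String> perform(String start, int length, Random rng) {
        List<String> realization = new ArrayList<>();
        String current = start;
        while (realization.size() < length && current != null) {
            realization.add(current);
            List<String> next = continuations.get(current);
            current = (next == null || next.isEmpty())
                    ? null
                    : next.get(rng.nextInt(next.size()));
        }
        return realization;
    }

    public static void main(String[] args) {
        NonLinearScore score = new NonLinearScore();
        for (String s : new String[] {"intro", "themeA", "themeB", "bridge", "coda"}) {
            score.addSegment(s);
        }
        score.addContinuation("intro", "themeA");
        score.addContinuation("themeA", "themeB");
        score.addContinuation("themeA", "bridge");
        score.addContinuation("themeB", "themeA");
        score.addContinuation("bridge", "coda");
        System.out.println(score.perform("intro", 6, new Random()));
    }
}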
Figure C.2: iMUSE Logo
iMUSE iMUSE (Interactive MUsic Streaming Engine) (Land and McConnel [27]) is an
interactive music system used in a number
of LucasArts video games. The idea behind
iMUSE is to synchronize music with the visual action in a video game so that the audio continuously matches the on-screen events
and transitions from one musical theme to another are done seamlessly. iMUSE was developed in the early 1990s by composers Michael
Land and Peter McConnell while working at
LucasArts. The iMUSE system was added to the fifth version of the SCUMM (Script Creation
Utility for Maniac Mansion) game engine in 1991.
iMUSE was developed out of Michael Land’s frustration with the audio system used by LucasArts
while composing ”The Secret of Monkey Island”. His goal was to create a system which would
enable the composer to set the mood via music according to the events of the game. The first
game to use the iMUSE system was ”Monkey Island 2: LeChuck’s Revenge” and it has been
used in all LucasArts adventure games since. It has also been used for some non-adventure
LucasArts titles, including ”Star Wars: X-Wing”, ”Star Wars: TIE Fighter”, ”Star Wars: Dark
Forces” and ”X-Wing Alliance”.
iMUSE uses standard MIDI files to which some control signals are added; the input data is therefore polyphonic and represented as a sequence of musical notes. The issues related to digital signal processing and audio feature extraction are greatly simplified here, since the system has precise information about the score.
The actions that may be taken are:
• move the execution to a certain point in the file
• adjust a MIDI controller such as volume, pitch, ...
• enable/disable an instrument
Decision points are placed in the performance data by the composer. Upon encountering a decision point, the sound driver evaluates the corresponding condition and determines what action to take based on the events occurring in the game. It is therefore possible to trigger a musical piece in correspondence with a combat scene or to change the character of the music when the player moves from room to room.
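The sketch below gives a rough idea of how such a decision point could be modeled; it is not the actual iMUSE API, and every class, field and value in it is invented for illustration:

import java.util.function.Predicate;

// Hypothetical model of a decision point: a condition over the game state plus
// the action the sound driver takes when the condition holds.
public class DecisionPointDemo {

    // Invented game state: not part of iMUSE.
    static class GameState {
        boolean inCombat;
        String room;
    }

    enum ActionType { JUMP_TO, SET_VOLUME, TOGGLE_INSTRUMENT }

    static class DecisionPoint {
        final Predicate<GameState> condition;
        final ActionType action;
        final int parameter; // e.g. target tick, volume value or instrument index

        DecisionPoint(Predicate<GameState> condition, ActionType action, int parameter) {
            this.condition = condition;
            this.action = action;
            this.parameter = parameter;
        }

        void evaluate(GameState state) {
            if (!condition.test(state)) {
                return; // condition not met: playback simply continues
            }
            switch (action) {
                case JUMP_TO:
                    System.out.println("Jump playback to tick " + parameter);
                    break;
                case SET_VOLUME:
                    System.out.println("Set MIDI volume controller to " + parameter);
                    break;
                case TOGGLE_INSTRUMENT:
                    System.out.println("Toggle instrument track " + parameter);
                    break;
            }
        }
    }

    public static void main(String[] args) {
        GameState state = new GameState();
        state.inCombat = true;
        // When a combat starts, jump to the combat theme (tick 4800 is arbitrary).
        DecisionPoint combatCue = new DecisionPoint(s -> s.inCombat, ActionType.JUMP_TO, 4800);
        combatCue.evaluate(state);
    }
}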
OpenMusic OpenMusic 5 (Bressin et al. [8]), developed by IRCAM, is a more sophisticated environment than iMUSE that allows a composer to use a sort of ”music programming language” (similar to Csound or PureData), assisted by a graphical interface. Visual programs are created by assembling and connecting icons representing functions and data structures. Most programming operations are performed by dragging an icon from one place and dropping it onto another. Built-in visual control structures (e.g. loops) are provided, which interface with Lisp ones.
Figure C.3: OpenMusic 5
OpenMusic may be used as a general-purpose functional/object/visual programming language. At a more specialized level, a set of provided classes and libraries makes it a very convenient environment for music composition. Different representations of a musical process are handled, including common notation, MIDI piano-roll and sound signals.
C.3
Interactive composition
Beside the algorithmic (rule-based) systems and the composition environments, there exists another category of systems, the interactive composition systems, which are designed to assist the person in the creative process without replacing him or her and without being relegated to the pre-production of the musical piece.
A mixture of algorithmic and traditional composition can be found in the article by Unehara and Onisawa [47], where genetic algorithms and machine learning techniques are exploited. The system is iteratively trained by the user, who expresses appreciation for good sections and discards bad ones. The system starts by randomly composing a group of melodies, tones (i.e. a harmonization) and backing patterns (i.e. an accompaniment). It then lets the user select the preferred ones. Based on the preferences of the user, a genetic algorithm creates a new set of melodies, tones and backing patterns by selecting only the best options, and the process is iterated.
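A heavily simplified sketch of this evolutionary loop is shown below; since the user’s ratings cannot be reproduced here, an arbitrary fitness function (rewarding stepwise motion) stands in for them, and the melody encoding as an array of scale degrees is our own simplification rather than the representation used in [47]:

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Simplified genetic loop for melodies: rate, keep the best, recombine and mutate.
public class MelodyEvolver {

    static final int POPULATION = 8;
    static final int LENGTH = 16;      // notes per melody (scale degrees 0..7)
    static final int GENERATIONS = 20;
    static final Random RNG = new Random();

    // Stand-in for the user's judgment: here we simply reward stepwise motion.
    static int fitness(int[] melody) {
        int score = 0;
        for (int i = 1; i < melody.length; i++) {
            if (Math.abs(melody[i] - melody[i - 1]) <= 1) score++;
        }
        return score;
    }

    static int[] randomMelody() {
        int[] m = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) m[i] = RNG.nextInt(8);
        return m;
    }

    static int[] crossover(int[] a, int[] b) {
        int cut = 1 + RNG.nextInt(LENGTH - 1);
        int[] child = Arrays.copyOf(a, LENGTH);
        System.arraycopy(b, cut, child, cut, LENGTH - cut);
        if (RNG.nextDouble() < 0.2) child[RNG.nextInt(LENGTH)] = RNG.nextInt(8); // mutation
        return child;
    }

    public static void main(String[] args) {
        int[][] population = new int[POPULATION][];
        for (int i = 0; i < POPULATION; i++) population[i] = randomMelody();

        for (int gen = 0; gen < GENERATIONS; gen++) {
            // Sort by decreasing fitness (the "user preference" step).
            Arrays.sort(population, Comparator.comparingInt(MelodyEvolver::fitness).reversed());
            // Keep the best half, refill the rest by recombining the survivors.
            for (int i = POPULATION / 2; i < POPULATION; i++) {
                int[] parentA = population[RNG.nextInt(POPULATION / 2)];
                int[] parentB = population[RNG.nextInt(POPULATION / 2)];
                population[i] = crossover(parentA, parentB);
            }
        }
        System.out.println("Best melody: " + Arrays.toString(population[0]));
    }
}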
Another research field concentrates on the generation of music based on movement or visual features. These systems may be considered a mixture of a computer-aided composition system and a musical instrument. Wireless sensor networks are exploited by Chantasuban and Thiemjarus [12] to detect human movements. The system is composed of a number of wireless nodes, attached to the performer’s body, that contain sensors (accelerometers, heat sensors, ...); the nodes are hierarchically organized: each node elaborates the information coming from the lower levels in order to create higher levels of abstraction and finally generate musical signals. In this way the performer can control the music production with the movements of his or her body.
Other systems are based on the extraction of music from images or shapes. The concept of ”synaesthesia” is explored in an attempt to find the hidden link between human senses. Roth [44] generates music based on the characteristics of predefined shapes (spirals, circles, crosses, ...); when a shape appears on the screen, a predefined type of music is played, and the speed, pitch and harmony of the music depend on the properties of the shape (translation or rotation speed, color, ...). The result is far from being appealing or musically relevant, but it can be considered a good starting point. In the article by Li et al. [30], a system is trained to associate music with images, based on some features (color, blur, ...). DLKW [14] transforms HTML code into music by selecting riffs and notes from a predefined database. Group [21] proposes many applications that generate music from chaotic models or concrete music installations.
Polotti and Rocchesso [40] and Chafe [11] offer a general overview of current trends in human-computer musical interaction, from both the technological and the artistic side.
C.4
Collaborative Music Composition
The improvements in telecommunication technology and the increasing availability and speed of Internet connections have allowed the creation of distributed music composition systems that enable users in different locations to contribute to a musical piece. These systems usually take advantage of innovative human-computer interaction interfaces such as tabletops (i.e. horizontal touchscreens), virtual musical instruments or human motion detectors.
In these applications the idea of composition is either traditional musical score editing or a real-time audio performance, but they could also be used to divide the arrangement of a song among multiple musicians or for teaching purposes.
Figure C.4: The tabletop collaborative system
Systems belonging to this category are, for example, MOODS (Music Object Oriented Distributed System) (Bellini et al. [6]), a synchronous real-time cooperative editor for musical scores, or FMOL (F@ust Music On-Line) (Jordà and Wuest [23]), a web-based application to edit a musical tree.
Another relevant collaborative composition environment is presented at the web page http://www.noteflight.com, where users can edit musical scores together.
Pichiliani and Hirata [39] describe a collaborative tabletop system that allows users to work together over a shared horizontal display and can also receive input from virtual musical instruments and MIDI devices. The structure of the system is displayed in Figure C.4. After establishing a connection with the CoMusic Server, each user can open an instance of CoTuxGuitar (a score editor program) to see and edit the notes that his or her instrument is producing, which are stored separately in parallel tracks of the music score. Moreover, the sound of every note played by any instrument is reproduced to all users, and every modification to the notes already played and stored in the track is replicated to all instances of CoTuxGuitar through the server.
Stanford Laptop Orchestra The Stanford Laptop Orchestra (SLOrk) is a large-scale, computer-mediated ensemble that explores cutting-edge technology in combination with conventional musical contexts, while radically transforming both. Founded in 2008 by director Ge Wang and students, faculty, and staff at Stanford University’s Center for Computer Research in Music and Acoustics (CCRMA), this unique ensemble comprises more than 20 laptops, human performers, controllers, and custom multi-channel speaker arrays designed to provide each computer meta-instrument with its own identity and presence.
Figure C.5: The Stanford Laptop Orchestra
The orchestra fuses a powerful sea of sound with the immediacy of human music-making, capturing the irreplaceable energy of a live ensemble performance as well as its sonic intimacy and
grandeur. At the same time, it leverages the computer’s precision, possibilities for new sounds,
and potential for fantastical automation to provide a boundary-less sonic canvas on which to
experiment with, create, and perform music.
Offstage, the ensemble serves as a one-of-a-kind learning environment that explores music, computer science, composition, and live performance in a naturally interdisciplinary way. SLOrk uses
the ChucK programming language as its primary software platform for sound synthesis/analysis,
instrument design, performance, and education. (http://slork.stanford.edu/)
E quindi uscimmo a riveder le stelle
Dante Alighieri
Inferno XXXIV, 139