POLITECNICO DI MILANO
Facoltà di Ingegneria dell'Informazione
Master of Science in Computer Engineering

Automatic Audio Compositing System based on Music Information Retrieval

Author: Luca Chiarandini, 734917
Supervisor: Prof. Augusto Sarti
Assistant supervisors: Dott. Massimiliano Zanoni, Prof. Juan Carlos De Martin
Academic year 2009-2010

Abstract

In the past few years, music recommendation and playlist generation have become some of the most promising research areas in the field of audio processing. Thanks to the wide diffusion of the Internet, users can collect and store substantial amounts of musical data and use them in everyday life through portable media players. The challenge for modern recommendation systems is to process this huge amount of data in order to extract useful descriptors of the musical content, i.e. to automatically tag, catalogue and index media material. This information can serve many purposes: media search, media classification, market suggestions, media similarity measurements, and so on.

Until now, the traditional approach to this problem has been audio labelling: the definition of symbolic descriptors that can then be used to generate playlists, for example playlists based on music genre or artist name. This approach has some strong limitations. First, since labels are usually treated as descriptors of the whole musical piece, they cannot capture mood or genre changes within the same song. Moreover, label-based classification sometimes produces heterogeneous classes (e.g. pieces belonging to the same genre can differ greatly from one another).

This thesis fits into this context and consists in the study and development of a music recommendation framework that lets the user interact through more precise descriptors. The system intelligently recommends items from an audio database on the basis of the user's preferences. Music Information Retrieval techniques are used to extract significant features from the audio signal and to let the user interact with the system through high-level interfaces such as musical tempo or timbral features. While describing the system, we will demonstrate the generality of the approach by presenting some of the many applications that can be derived from the framework: an automatic DJ system, a tabletop interaction system, a playlist generation system driven by a runner's step frequency, and a training-based recommendation system.

The goal of this project is not only the development of a technically valid product but also an exploration of its artistic applications. The system is aimed at a wide audience of performers (DJs, performers of contemporary music, ...), composers and amateurs.
Acknowledgments

I would like to thank Professor Augusto Sarti for the opportunity to work on this amazing project and for all his help. His passion and enthusiasm for the project were truly contagious and kept me going. Thanks to Doctor Massimiliano Zanoni for the careful supervision, for all the advice and for all the brainstorming sessions we had together. It was absolutely invaluable; such a friendly and kind co-supervisor is hard to find! Thanks to Professor Juan Carlos De Martin for reading this mess, and for being so kind about it. Special thanks to all the researchers of the ISPG Lab in Milano and of the Laboratorio di Elaborazione e Produzione di Segnali Audio e Musicali in Como for their help, hospitality and understanding. I would like to express my gratitude to all the friends who filled in the questionnaire, and in particular to Laura, Domenico and Paola, who helped me in this difficult phase. The evaluation turned out to be one of the most useful parts of this work, since everyone was really enthusiastic about the software and gave much sincere advice. Special thanks to the fantastic friends who supported me during the last years, both in Trieste and in Milano. They are, in alphabetical order, Agnese, Alicia, Andrea B., Andrea M., Antoniela, Ashanka, Camilo, Cristina, Dalia, Dean, Ebru, Elisa, Fabio, Giorgio, Giulio, Jovan, Maicol, Marko, Mastro, Maurizio, Michele (the President), Paolo, Riccardo, Sara L.

Above all, many thanks to my family, who believed in me and gave me the opportunity to study in Milano and complete my Master of Science.

Contents

1 Introduction
2 State of the art
  2.1 Music information retrieval
  2.2 Playlist generation and recommendation systems
3 Theoretical background
  3.1 Feature extraction
    3.1.1 Segmentation
    3.1.2 Harmony
    3.1.3 Tempo
    3.1.4 Brightness
    3.1.5 Rms
    3.1.6 Spectral centroid
    3.1.7 Spectral roll-off
    3.1.8 Spectral flux
    3.1.9 Inharmonicity
    3.1.10 MFCC
  3.2 Feature analysis
    3.2.1 Support Vector Machine (SVM)
    3.2.2 Gaussian Mixture Model (GMM)
  3.3 The short-time Fourier transform
    3.3.1 Analysis
    3.3.2 Synthesis
  3.4 Time-scaling
    3.4.1 Time-scaling STFT algorithm
    3.4.2 Time-scaling time-domain algorithms
4 Methodology
  4.1 The problem of labels
  4.2 Labelling in recommendation systems
  4.3 The recommendation framework structure
  4.4 Applications
    4.4.1 Automatic DJ system
    4.4.2 Tangible interface
    4.4.3 Dynamic playlist generation system based on runner's step frequency
    4.4.4 Training-based system
5 Implementation
  5.1 Preprocessing phase
    5.1.1 Features extraction
    5.1.2 Features similarity functions
  5.2 XML Structure
    5.2.1 Anchors XML Schema
    5.2.2 Generic feature XML schema
  5.3 Performance phase
    5.3.1 Work-flow
    5.3.2 Proposal generator
    5.3.3 Ranking system
    5.3.4 Transitions
    5.3.5 Interface
6 Evaluation
  6.1 Instance of the system
  6.2 Datasets
    6.2.1 Mood training dataset
    6.2.2 Evaluation dataset
  6.3 Test structure
  6.4 Results of the questionnaire
    6.4.1 Auditory questions
    6.4.2 Usage questions
    6.4.3 Comments
  6.5 Overview of the result
7 Perspectives and future developments
  7.1 System-level improvements
  7.2 Implementation improvements
  7.3 From musical compositing to composition
A User manual
  A.1 Prerequisites
  A.2 Running the system
    A.2.1 Feature extraction
    A.2.2 Performance
    A.2.3 Configuration
B The audio database
  B.1 Mood detection training set
    B.1.1 Anxious
    B.1.2 Contentment
    B.1.3 Depression
    B.1.4 Exuberance
  B.2 The performance database
C A short story of computer-aided composition
  C.1 Algorithmic composition
  C.2 Composition environments
  C.3 Interactive composition
  C.4 Collaborative Music Composition
Chapter 1

Introduction

In the past few years, the availability and accessibility of media have increased as never before. Users are able to store huge amounts of musical and video content and make use of them wherever and whenever they like, thanks to portable media players. The main problem that modern audio applications now face is how to process this content in order to extract useful descriptors, i.e. how to automatically extract meta-data and to catalogue and label media material. This information can be used for many purposes: media search, media classification, market suggestions, media similarity measurements, and so on.

The generation of high-level symbolic descriptors of media is usually done by hand, and is therefore error prone and time-consuming. The discipline that addresses the automatic generation of media tags and high-level descriptions of content is Multimedia Information Retrieval (MIR). When the content is limited to musical audio, it specialises into Music Information Retrieval, which aims at deriving descriptors directly from the musical content.

The goal of feature extraction is usually the creation of music recommendation systems. Such systems attempt to make content choices according to the inferred preferences of the user. Systems of this sort are particularly suitable for market analysis and/or playlist generation.

Figure 1.1: The work-flow of traditional recommendation systems

Most of the recommendation systems available today follow the work-flow described in Figure 1.1. The signal is analysed and a set of features is extracted to form N-dimensional vectors. The vectors of the feature space are then analysed through a decision process that returns outcomes in the form of labels. This symbolic information can only take on a finite or countable number of values, which greatly limits its discrimination power. Examples of labels are: artist, genre, author, style, etc. Recommendation systems usually start from this symbolic description and generate playlists based on the values of these labels.

The loss of discriminative power associated with the conversion from feature vectors to labels is the main reason why recommendation systems tend to perform quite poorly. The reasons why playlists do not exhibit a consistent behaviour include:

• Temporal inhomogeneity: labels tend to be global descriptors of a whole musical piece and do not capture local changes in mood and tempo, or genre transitions, within the same piece. This makes it very hard to create playlists that locally adapt to a particular state (e.g. a storytelling system able to reproduce music that describes a series of events).

• Ensemble inhomogeneity: a single label tends to apply to musical pieces that differ a great deal from each other. Choosing labels as control parameters therefore gives the user only weak control.

Consider, for example, genre classification: pieces belonging to the same genre can be very different from each other; if the user selects a playlist based on a particular genre, this gives only a very weak connection with the actual mood of the songs (the short example below makes this contrast concrete).
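To make the contrast concrete, the following Python fragment is a toy example, with invented songs and feature values that are not part of the system: it compares a label-based selector with a ranking by weighted distance in feature space. The former cannot distinguish items sharing a label, while the latter retains the full discrimination power of the feature vectors.

```python
import numpy as np

# Hypothetical catalogue: each item carries a genre label and a feature
# vector (here [tempo_bpm, brightness]); all values are invented.
catalogue = {
    "song_a": {"genre": "rock", "features": np.array([120.0, 0.55])},
    "song_b": {"genre": "rock", "features": np.array([180.0, 0.80])},
    "song_c": {"genre": "jazz", "features": np.array([118.0, 0.50])},
}

def label_playlist(genre):
    """Label-based selection: every item with the requested genre is equally acceptable."""
    return [name for name, item in catalogue.items() if item["genre"] == genre]

def feature_playlist(target, weights):
    """Feature-based selection: rank items by weighted Euclidean distance
    from a target feature vector, keeping the features' discrimination power."""
    def distance(item):
        return np.sqrt(np.sum(weights * (item["features"] - target) ** 2))
    return sorted(catalogue, key=lambda name: distance(catalogue[name]))

# label_playlist("rock") cannot tell song_a from song_b, although their tempo
# and brightness differ widely; feature_playlist(np.array([120.0, 0.55]),
# np.array([1.0, 100.0])) instead ranks song_c, a different genre, ahead of
# song_b because it is much closer in feature space.
```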
Figure 1.2: The work-flow of our system

In this thesis, we want to create a recommendation system that avoids the conversion from feature vectors to labels (Figure 1.2). We will design a layer between the recommendation system and the user that allows the user to control the system by acting directly on the features. Moreover, the system should address the problem of temporal inhomogeneity by considering how the feature vectors change within the same musical piece.

To prove the generality of this concept, we will present some of the ways in which the values of the features can be defined by the user. They are ordered according to the amount of interaction between the user and the machine.

• Low-level feature based: the user directly specifies the values of the features by acting on a set of controllers. This is a very precise but also low-level interaction mechanism.

• High-level feature based: the user interacts by means of a set of high-level features which incorporate other sub-features (for example tempo, harmony, ...).

• Feature detection based: the values of the features are extracted from the actions of the user or from other types of data. Examples of this kind may be:

  – a system that extracts the tempo from the movements of a dancing person;

  – a system that performs blob detection on a dance floor in order to estimate the number of people dancing and plays music according to this parameter; the system can, for example, select certain songs to encourage people to dance when the dance floor is empty;

  – a system that detects the step frequency and heart rate of a running person and adapts the tempo of the songs to these values.

• Training based: the system embodies some sort of learning mechanism that can be trained on the preferences of the user and suggest good feature candidates by itself.

These four classes can coexist in the same system and it is even possible to combine them. In this thesis, we will develop a general framework for feature-based recommendation systems. We will then show some specific applications of the system obtained by performing slight adjustments.

Overview of the thesis

The thesis is organised as follows. In Chapter 2, related work is presented; the chapter is intended as a broad view of the environment in which the system operates. In Chapter 3 we describe some important concepts of signal processing and musical feature extraction that will be used in the system. Chapter 4 gives a general description of the recommendation framework and presents some applications of the system. In Chapter 5 the implementation details are presented. Chapter 6 reports an evaluation of the system. Finally, Chapter 7 presents guidelines for future improvements and evolutions. To give a more practical reference and a description of the deliverables, Appendix A contains a user manual addressed to users of the system. Appendix B contains the list of audio items used during the evaluation phase. If the reader is interested in knowing more about the history of computer-aided composition systems, Appendix C contains an excursus on this topic.

Chapter 2

State of the art

In this chapter we discuss related work from both the research and the commercial field.
The chapter starts with Music Information Retrieval, a new branch of digital signal processing that aims at extracting relevant descriptors directly from the audio content. We will show how important this notion of content-based analysis is, and we will then cite the latest achievements in the field. Afterwards, the most recent applications in automatic playlist generation are presented. This is a growing research area within digital signal processing, driven by the recent diffusion of portable audio players and media managers. From the analysis of this area, we would like to point out the innovative content of our framework by highlighting the absence of systems with similar characteristics.

2.1 Music information retrieval

As previously mentioned, the increasing quantity of media data available on the Internet has created the need to manage it, to automatically extract meta-data and to index huge databases. A new discipline was therefore born, Music Information Retrieval: it merges the latest achievements in digital signal processing with notions of musicology and psychoacoustics in order to extract parameters that describe a musical piece.

Significant work has been done on extracting low- and high-level features from audio signals. Low-level features (e.g. brightness, tempo, ...) are usually extracted directly from the audio stream by means of statistical and signal processing methods, whereas high-level features (genre, mood, ...) are often computed on the basis of low-level features and try to describe abstract or global characteristics of the musical track. Antonacci et al. [2] give a general overview of feature extraction; both low-level features and higher-level extraction methods are explained. Feng et al. [16] define the so-called "average silence ratio" to characterise the articulation of a musical piece. Gillet and Richard [18] and Fitzgerald [17] derive the drum score from a polyphonic music signal by exploiting techniques such as autocorrelation, eigenvalue decomposition and principal component analysis. This work, although experimental and sometimes imprecise, gives a hint of the capabilities of this new research field.

An interesting study on high-level features is described by Prandi et al. [41], in which the authors classify and visualise audio signals on the basis of three features: classicity (a timbral feature which tells whether the current segment has a classical sound), darkness (relative power of the low frequencies with respect to the high frequencies) and dynamicity. The system is trained with a set of sample signals and is able to classify the emotional content. The result is visualised in a triangular plot (Figure 2.1).

Figure 2.1: Genre classification triangular plot

Other applications of feature extraction to genre classification have been developed by Pampalk et al. [36] and Li and Ogihara [29]. Kapur et al. [25] created a query-by-beat-boxing system that detects the tempo of beatboxing and uses it to dynamically browse a music database. Beatboxing is a type of vocal percussion in which musicians use their lips, cheeks and throat to create different beats. Generally, the musician imitates the sound of a real drum set or other percussion instruments, but there are no limits to the actual sounds that can be produced with the mouth. The system is aimed at experienced retrieval users who are eager to try new technologies, namely DJs.
Mood extraction

A new research thread in feature extraction is so-called "mood extraction": it consists in using a set of techniques (data mining, neural networks, signal processing, ...) to detect the emotional content of a musical piece (i.e. its "mood"). A generic mood extraction process consists of the following phases:

• Training phase: the system analyses a set of user-selected audio items assigned to some emotional categories (sad, happy, anxious, ...). The aim of this step is to train the system to recognise the peculiarities of each category. The training is usually performed as follows:

  – Low-level feature extraction: the input signal is windowed. Windows are usually very short (around 25-50 milliseconds) and overlapped. A signal processing algorithm is then applied to the frames to extract low-level features such as:

    ∗ Timbral features: MFCC (Mel-frequency cepstrum coefficients), spectral flux, ...
    ∗ Intensity features: RMS (root-mean-square), energy, ...
    ∗ Rhythmic features: tempo, articulation, ...
    ∗ Tonal features: harmony, pitch, inharmonicity, ...

  – The time evolution of each feature is then usually summarised in a finite number of variables (e.g. mean and variance).

  – A classification method (e.g. SVM, GMM, ...) is applied to extract a model of the data.

• Classification phase: the trained model is applied to new audio items.

The main differences between mood extraction algorithms lie in the choice of the categories, of the low-level features and of the classification method; behind each of them lies a vision of human emotions that leads the authors to derive their models.

Laurier and Herrera [28] define a set of binary overlapping classes to which each musical piece can belong with a certain degree of confidence; the classes could be, for example, happy/not happy, sad/not sad, aggressive/not aggressive, ... The system is then trained on them. The result is displayed in an easy-to-understand interface (see Figure 2.2a). The interesting aspect of the method is the concept of overlapping classes: a musical segment may, for example, be evaluated as both happy and relaxed at the same time. Mood is therefore not considered as a single variable but as a set of variables that cohabit in the same application.

Similarly, Liu et al. [31] classify the data according to two high-level features: energy (intensity of the signal, power, ...) and stress (a timbral/tonal feature). Along these two dimensions, the mood feature is modelled as a point on a two-dimensional plane. The interesting point of this method is the possibility of a 2-D grey-scale classification, since the boundaries between classes are blurred, allowing a more accurate characterisation of the items (see Figure 2.2b).

The same two-dimensional plane is used by Yang et al. [48]: the emotion plane is composed of two dimensions, arousal (how exciting/calming) and valence (how positive/negative) (Figure 2.2c). In addition, Meyers [34] and Govaerts et al. [20] use the song lyrics to improve accuracy; lyrics are indeed usually related to the emotional content of the song, and the system classifies the mood by detecting keywords in the text. The lyrics training phase is, however, very hard and time-consuming for widespread use and needs a large, up-to-date database and dictionary; moreover, lyrics are sometimes confusing and ambiguous.

Figure 2.2: Mood extraction. (a) Mood cloud interface; (b) Energy/Stress plane; (c) Arousal/Valence plane
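The pipeline just described can be condensed into a short sketch. The following Python/numpy code is purely illustrative: it summarises two windowed low-level features (RMS and spectral centroid) into a mean/variance vector per excerpt, and uses a nearest-centroid rule as a stand-in for the SVM or GMM classifiers mentioned above. All names and parameter values are invented for the example; the system described in this thesis performs its feature extraction in MATLAB with the MIRtoolbox (see Appendix A).

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a mono signal into overlapping frames (assumes len(x) >= frame_len)."""
    n = (len(x) - frame_len) // hop + 1
    idx = np.arange(frame_len)[:, None] + hop * np.arange(n)[None, :]
    return x[idx]                                   # shape (frame_len, n_frames)

def summarise(x, sr):
    """Per-window RMS and spectral centroid, summarised as means and variances."""
    frames = frame_signal(x) * np.hanning(1024)[:, None]
    rms = np.sqrt(np.mean(frames ** 2, axis=0))
    spec = np.abs(np.fft.rfft(frames, axis=0))
    freqs = np.fft.rfftfreq(1024, 1.0 / sr)
    centroid = (freqs[:, None] * spec).sum(axis=0) / (spec.sum(axis=0) + 1e-12)
    feats = np.vstack([rms, centroid])
    return np.concatenate([feats.mean(axis=1), feats.var(axis=1)])

class NearestCentroidMood:
    """Stand-in for the SVM/GMM classifier of the original pipeline."""
    def fit(self, vectors, labels):
        self.labels = sorted(set(labels))
        self.centroids = {c: np.mean([v for v, l in zip(vectors, labels) if l == c], axis=0)
                          for c in self.labels}
        return self
    def predict(self, vector):
        return min(self.labels, key=lambda c: np.linalg.norm(vector - self.centroids[c]))
```

A mood training set would then simply be a list of (summarise(signal, sr), category) pairs fed to the classifier's fit method.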
2.2 Playlist generation and recommendation systems

We will now see how research in Music Information Retrieval can be applied to playlist generation. The diffusion of portable music devices (such as the Apple iPod or the Creative Zen) has raised the need for algorithms that efficiently combine musical files. A good playlist usually satisfies some constraints (user-preferred items are played more often, ...) and is designed to find a balance between coherence (similar items are played one after another) and novelty (the listener should not get bored).

A common assumption in playlist generation is that transitions between audio items occur only at the beginning and at the end of each item; transitions between sections of songs are not considered. The inter-item similarity function used to create the playlist therefore only considers a small fraction of the signal. Furthermore, playlist generation systems are usually not designed to adapt to fast changes in mood or genre; instead, they create sequences of songs that are played in their entirety. Playlist generation systems are therefore usually based on the extraction of features that describe the entire piece (such as genre, artist, album, etc.).

Many approaches have been developed, all sharing the goal of adapting to the user by collecting his or her preferences. The methods are many: artificial intelligence search (Pauws et al. [38] and Aucouturier and Pachet [4]), fuzzy logic (Bosteels and Kerre [7]) and audio feature extraction (Shan et al. [45]). The features used to generate a playlist are not only musical or related to the audio data: Reynolds et al. [42] underline the importance of contextual information (time of day, temperature, location, ...) during the training phase and for the applications of playlist generation systems. Contextualisation is essential when the playlist is generated by a portable audio player; it has been shown that the type of music people listen to strongly depends on time, location and activity (driving, doing sports, relaxing, ...).

An interesting playlist generation system is described by Masahiro et al. [33]; it adapts the music to the behaviour of a running person. The system uses an accelerometer to detect the runner's step frequency and is able to select and play a song with the same number of beats per minute. We will see in the following chapters how systems such as this one can be seen as instances of the recommendation framework of this thesis.

Apple iTunes Genius

Genius is an automatic playlist generator and recommendation system integrated in iTunes. The system is based on collaborative filtering of a huge amount of data derived from the users' iTunes libraries. The source code is currently a company secret but, based on scientific studies and personal experience, we can infer some implementation details. The system does not seem to take into account the audio content of the music library but only the meta-data, also enriched by the Gracenote MusicID service. In fact, the system only works with well-known songs and artists that are present in the on-line Gracenote database.

Figure 2.3: Apple iTunes Genius logo

Although it is not content-based, the recommendation system can be compared to a content-based one and the results are surprising: Barrington et al. [5] describe how Genius can capture audio and artist similarity by exploiting only collaborative filtering.
Somehow the system manages to exploit its users as content-based analysers, since users tend to listen to songs of similar genres.

Music search engines

Another growing research field is that of music search engines. The challenges of modern music search engines (Nanopoulos et al. [35]) are:

• Search by meta-data
• Search by lyrics
• Search by audio data
• Query by humming
• Recommendation of similar music

It is clear that feature extraction methods are heavily exploited here (Pardo [37]). Moreover, since the databases are very large, optimisation and audio fingerprinting (similar to hashing functions) are needed. Cai et al. [10] develop an audio fingerprinting method that is resistant to distortion and noise and allows scalability; similar musical segments tend to have the same fingerprint. When executing a query, the system only compares fingerprints, without analysing the underlying audio data. The efficiency of audio fingerprinting methods can be seen in Shazam [32], a music search engine for portable devices (iPhone, Android, ...); it can recognise the track name and artist from a short sample of the song. The system is resistant to noise and distortion and works well even in noisy environments (discos, bars, pubs, ...).

Chapter 3

Theoretical background

This chapter presents the main theoretical tools that will be used to build the system. We start by explaining the concept of feature extraction from musical signals, citing some relevant audio features. After that, we describe how features can be analysed and we present two machine learning techniques: Support Vector Machines and Gaussian Mixture Models. These techniques will also be used in the system. Finally, we introduce the short-time Fourier transform, with a focus on time-scaling techniques.

3.1 Feature extraction

This section deals with the most recent advancements in automatic information retrieval from audio signals. This is a very large field of research and many applications can be devised. Digital analysis may discriminate whether an audio file contains speech, music or other audio entities, how many speakers are contained in a speech segment, what gender they are and even which persons are speaking. Music may be classified into categories, such as jazz, rock, classical, etc. It is often possible to identify a piece of music even when performed by different artists, or an identical audio track even when distorted by coding artifacts. Finally, it may be possible to identify particular sounds, such as explosions, gunshots, etc.

The feature extraction process is summarised in Figure 3.1. In order to clean the audio data and enhance the performance of the feature extraction techniques, some preprocessing operations are performed (a minimal sketch of this chain follows the list):

• conversion from stereo to mono signal
• de-noising (if needed)
• signal down-sampling to improve performance
• windowing
• time-scaling of the signal (if needed)
• segmentation, which divides the signal into segments, called frames, in correspondence of significant points

Figure 3.1: Structure of a generic classification system
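As a rough illustration of this preprocessing chain (omitting de-noising and segmentation, which is treated in Section 3.1.1), the following Python sketch converts to mono, down-samples and cuts the signal into overlapping windowed frames. Frame length, hop size and target sample rate are illustrative assumptions, not the values used in the actual system.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(stereo, sr, target_sr=11025, frame_len=1024, hop=512):
    """Illustrative preprocessing: stereo -> mono, down-sampling, windowing."""
    mono = stereo.mean(axis=1) if stereo.ndim == 2 else stereo   # stereo to mono
    mono = resample_poly(mono, target_sr, sr)                    # down-sample
    window = np.hanning(frame_len)
    n_frames = 1 + max(0, (len(mono) - frame_len) // hop)
    frames = np.stack([mono[i * hop:i * hop + frame_len] * window  # overlapping windows
                       for i in range(n_frames)])
    return frames, target_sr
```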
Once this preprocessing step is completed, it is possible to extract the descriptors useful for our purposes. For each frame $l$, a set of descriptors (also called features) $\vec{d}(l) = d_i(l)$, $i = 1, \dots, M$ is extracted; each set is a point in a multidimensional space. Our goal is to find a descriptor set such that:

• instances of descriptors $\vec{d}(l)$ and $\vec{d}(k)$ with $k \neq l$ belonging to the same class are grouped in the same cluster, independently of $k$ and $l$;

• it is always possible to separate descriptors related to different classes $c_g$ and $c_h$ with $g \neq h$.

We will now describe some relevant low-level features that will be used in the system.

3.1.1 Segmentation

The segmentation phase divides the audio file into phrases and defines some "interesting points" in the musical stream. Some examples of good separation points are in correspondence of:

• a harmony change
• a tempo change
• the start of a musical phrase
• a spectrum change
• ...

To be effective, segmentation should be performed in such a way that the music between two anchors has almost constant characteristics. The points may be defined manually or automatically. A composer may, for example, manually define anchor points in order to divide the song according to his or her personal interpretation. If the segmentation is performed automatically, one solution consists in a peak detection on the spectral novelty function (a minimal sketch of this procedure follows the list below). The novelty curve indicates the temporal locations of significant textural changes and is obtained by convolving the similarity matrix (Figure 3.3a) along its main diagonal with a Gaussian chequerboard kernel (Figure 3.2). A Gaussian chequerboard kernel is obtained from the point-by-point multiplication of the two-dimensional Gaussian function with the following function:

$$f(x, y) = \begin{cases} +1 & \text{if } \operatorname{sign}(x) = \operatorname{sign}(y) \\ -1 & \text{otherwise} \end{cases}$$

Figure 3.2: Gaussian chequerboard kernel

• From the input signal, the system computes the similarity matrix that shows the similarity between all possible pairs of frames of the input data (Figure 3.3a).

• Along the main diagonal of the similarity matrix we can read the similarity between each frame and its neighbours.

• If we perform the convolution along the diagonal with the Gaussian chequerboard kernel, we obtain a one-dimensional function, the novelty curve (Figure 3.3b).

Figure 3.3: Segmentation operation performed. (a) Computation of the similarity matrix; (b) convolution along the diagonal

• Performing a peak detection on the resulting curve, we detect the instants of maximum local novelty.
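The automatic procedure just outlined can be sketched as follows in Python/numpy. The kernel size, its width and the minimum peak spacing are illustrative choices; the thesis implementation relies on the MATLAB MIRtoolbox for this step (see Appendix A).

```python
import numpy as np
from scipy.signal import find_peaks

def checkerboard_kernel(size=64, sigma=0.4):
    """Gaussian chequerboard kernel: a 2-D Gaussian times the sign pattern f(x, y)."""
    ax = np.linspace(-1, 1, size)
    xx, yy = np.meshgrid(ax, ax)
    gauss = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return gauss * np.where(np.sign(xx) == np.sign(yy), 1.0, -1.0)

def novelty_curve(features):
    """features: (n_frames, n_dims) frame-wise feature matrix (e.g. short-time spectra)."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T                               # cosine similarity matrix
    kernel = checkerboard_kernel()
    half = kernel.shape[0] // 2
    n = sim.shape[0]
    pad = np.pad(sim, half, mode="constant")
    # correlate the kernel with the similarity matrix along the main diagonal
    return np.array([np.sum(pad[i:i + 2 * half, i:i + 2 * half] * kernel)
                     for i in range(n)])

def segment_boundaries(features, min_gap=8):
    """Frame indices of maximum local novelty (the anchor candidates)."""
    nov = novelty_curve(features)
    peaks, _ = find_peaks(nov, distance=min_gap)
    return peaks
```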
3.1.2 Harmony

In music, the harmony is defined as the combination of a key (e.g. C, C#, ...) and a mode (major, minor). It is extracted from the analysis of the chromagram. The chromagram, also called Harmonic Pitch Class Profile, represents the energy distribution along the pitches or pitch classes. It is obtained in the following way:

• First, the spectrum is computed on a logarithmic scale, with selection of, by default, the 20 highest dB, restriction to a frequency range that covers an integer number of octaves, and normalisation of the audio waveform before computation of the FFT (Figure 3.4).

Figure 3.4: Harmony: original spectral components

• The chromagram is a redistribution of the spectrum energy along the different pitches (i.e. chromas) (Figure 3.5).

Figure 3.5: Harmony: unwrapped chromagram

• If we discard the information about the octaves, we obtain the wrapped chromagram (Figure 3.6).

Figure 3.6: Harmony: wrapped chromagram

• In order to determine the harmony, we compute the key strength (also called key clarity), i.e. the probability associated with each possible key candidate, through a cross-correlation (Figure 3.7) of the wrapped and normalised chromagram with profiles representing all the possible tonality candidates (Krumhansl [26]; Gómez [19]).

Figure 3.7: Harmony: key detection

• The resulting graph indicates the cross-correlation score for each tonality candidate (Figure 3.8).

Figure 3.8: Harmony: key clarity

• The selected harmony is the one corresponding to the maximum value.

3.1.3 Tempo

The tempo, expressed in BPM (beats per minute), is estimated by detecting periodicities in the onset detection curve. One way of determining the tempo is to first compute an onset detection curve, which shows the successive bursts of energy corresponding to the successive pulses (Figure 3.9). A peak picking is automatically performed on the onset detection curve in order to show the estimated positions of the notes. Onset detection can be applied to several functions (signal envelope or spectrum) or can be performed in parallel on a filter bank. The system then computes the autocorrelation function of the onset detection curve and applies a peak picking to the resulting function; the lag of the strongest peak gives the beat period, and hence the tempo.

Figure 3.9: Tempo: onset detection

3.1.4 Brightness

The brightness (Figure 3.10) is a low-level feature that expresses the power of the high-frequency bands relative to the power of the low-frequency ones. This ratio usually ranges between 0.2 and 0.8 for ordinary musical tracks. This feature can be used as a rough (inverse) estimator of the darkness of the musical piece.

$$\text{brightness} = \frac{\int_{\text{threshold}}^{+\infty} X(\omega)\, d\omega}{\int_{0}^{\text{threshold}} X(\omega)\, d\omega} \qquad (3.1)$$

where threshold is a pre-defined value, usually around 1500 Hz.

Figure 3.10: Brightness

3.1.5 Rms

The global energy of the signal x(t) can be computed simply by taking the root of the average of the squared amplitude, also called root-mean-square (RMS):

$$x_{\text{rms}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} = \sqrt{\frac{x_1^2 + x_2^2 + x_3^2 + \dots + x_N^2}{N}} \qquad (3.2)$$

Figure 3.11a shows an example of computation of the RMS feature. We can note that this energy curve is very close to the envelope (Figure 3.11b).

Figure 3.11: RMS: comparison between RMS and signal energy. (a) RMS; (b) Energy

3.1.6 Spectral centroid

An important and useful description of the shape of a distribution can be obtained through its moments. The first moment, called the mean, is the geometric centre (centroid) of the distribution and is a measure of central tendency of the random variable:

$$\text{centroid} = \int x f(x)\, dx \qquad (3.3)$$

For the spectral centroid, f(x) is the normalised magnitude spectrum, so the centroid is the centre of mass of the spectral energy.

3.1.7 Spectral roll-off

One way to estimate the amount of high frequency in the signal consists in finding the frequency such that a certain fraction of the total energy is contained below it (Figure 3.12). This fraction is fixed by default to 0.85 (following Tzanetakis and Cook, 2002); others have proposed 0.95 (Pohle, Pampalk and Widmer, 2005).

$$\frac{\int_{-\infty}^{\text{rolloff}} X^2(\omega)\, d\omega}{\int_{-\infty}^{+\infty} X^2(\omega)\, d\omega} = 0.85 \qquad (3.4)$$

Figure 3.12: Spectral roll-off
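The following numpy sketch computes the frame-wise descriptors of Sections 3.1.4-3.1.7 from a single windowed frame. The 1500 Hz brightness cut-off and the 0.85 roll-off fraction are the defaults quoted above; the function name and the discretisation details are illustrative.

```python
import numpy as np

def spectral_descriptors(frame, sr, cutoff_hz=1500.0, rolloff_frac=0.85):
    """Frame-wise brightness, RMS, spectral centroid and roll-off (illustrative)."""
    spec = np.abs(np.fft.rfft(frame))             # magnitude spectrum X(w)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)

    rms = np.sqrt(np.mean(frame ** 2))            # eq. (3.2)

    low = spec[freqs < cutoff_hz].sum()
    high = spec[freqs >= cutoff_hz].sum()
    brightness = high / (low + 1e-12)             # eq. (3.1): high/low energy ratio

    centroid = (freqs * spec).sum() / (spec.sum() + 1e-12)   # eq. (3.3)

    power = spec ** 2
    cumulative = np.cumsum(power) / (power.sum() + 1e-12)
    idx = min(np.searchsorted(cumulative, rolloff_frac), len(freqs) - 1)
    rolloff = freqs[idx]                          # eq. (3.4)

    return dict(rms=rms, brightness=brightness, centroid=centroid, rolloff=rolloff)
```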
3.1.8 Spectral flux

Spectral flux is a measure of how quickly the power spectrum of a signal changes, calculated by comparing the power spectrum of one frame with the power spectrum of the previous frame. Given the spectra of the signal, we can compute the spectral flux as the distance between the spectra of successive frames. More precisely, it is usually calculated as the 2-norm (also known as the Euclidean distance) between the two normalised spectra:

$$\text{flux} = \int_{-\infty}^{+\infty} \left(f_1(\tau) - f_2(\tau)\right)^2 d\tau \qquad (3.5)$$

3.1.9 Inharmonicity

This feature estimates the inharmonicity, i.e. the amount of partials that are not multiples of the fundamental frequency $f_0$, as a value between 0 and 1. For that purpose, we use a simple function (Figure 3.13) estimating the inharmonicity of each frequency given the fundamental frequency $f_0$:

$$F_{\text{inharmonicity}}(\omega) = \sum_{i=1}^{+\infty} T\!\left(\frac{\omega - i \cdot f_0}{f_0}\right) \qquad (3.6)$$

where $T(\omega)$ is a triangular function defined as follows:

$$T(\omega) = \begin{cases} 0 & \text{if } \omega < 0 \\ 2\omega & \text{if } 0 \le \omega < \tfrac{1}{2} \\ 2 - 2\omega & \text{if } \tfrac{1}{2} \le \omega < 1 \\ 0 & \text{if } \omega \ge 1 \end{cases}$$

The inharmonicity is then computed as

$$\text{inharmonicity} = \int_{0}^{+\infty} X(\omega)\, F_{\text{inharmonicity}}(\omega)\, d\omega \qquad (3.7)$$

where $X(\omega)$ is the Fourier transform of the signal.

Figure 3.13: Inharmonicity

3.1.10 MFCC

Mel-frequency cepstral coefficients (MFCCs) offer a description of the spectral shape of the sound. They derive from a type of cepstral representation of the audio clip (a nonlinear "spectrum of a spectrum"). A cepstrum is the result of taking the Fourier transform (FT) of the decibel spectrum as if it were a signal. The difference between the cepstrum and the mel-frequency cepstrum is that in the latter the frequency bands are equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum. MFCCs are commonly derived as follows (see also Figure 3.14; a small sketch follows):

• Take the Fourier transform of (a windowed excerpt of) the signal.
• Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
• Take the logs of the powers at each of the mel frequencies.
• Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
• The MFCCs are the amplitudes of the resulting spectrum.

The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. It has a strong "energy compaction" property: most of the signal information tends to be concentrated in a few low-frequency components of the DCT. That is why, by default, only the first 13 components are returned. By convention, the coefficient of rank zero simply indicates the average energy of the signal.

Figure 3.14: MFCC
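A compact sketch of these steps, assuming a single windowed frame, is given below. The mel-scale formula and the 26-filter bank are common conventions rather than values taken from this thesis; only the first 13 coefficients are kept, as discussed above.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular overlapping filters equally spaced on the mel scale."""
    fmax = fmax or sr / 2.0
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)   # rising edge
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1) # falling edge
    return fb

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs of a single windowed frame, following the steps listed above."""
    power = np.abs(np.fft.rfft(frame)) ** 2              # 1. Fourier transform (power)
    fb = mel_filterbank(n_filters, len(frame), sr)
    mel_power = fb @ power                                # 2. map onto the mel scale
    log_mel = np.log(mel_power + 1e-12)                   # 3. logs of the mel powers
    return dct(log_mel, type=2, norm='ortho')[:n_coeffs]  # 4-5. DCT, keep first 13
```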
3.2 Feature analysis

We now present some mechanisms that are used to analyse features or to extract higher-level features. These methods mainly consist in training an expert system to recognise a particular state.

3.2.1 Support Vector Machine (SVM)

We will now discuss a relevant method used for classification: the Support Vector Machine (SVM). An SVM is a binary classifier that learns the boundary between items belonging to two different classes. It works by searching for a suitable separating hyperplane between the different classes in the feature space. The best separating hyperplane maximises its distance from the closest training items. In everyday scenarios, however, we are not always able to trace a separating hyperplane: some items may fall in the region of the feature space belonging to the opposite class, or the boundary between the classes may not be a hyperplane at all. In the latter case one operates a transformation of the feature space in order to "rectify" the separation surface into a hyperplane. In this subsection we deal with linear and separable SVMs. Figure 3.15 shows an example of a dataset and a trained SVM; note that more than one hyperplane separating the classes can be found.

Figure 3.15: Feature space and items

Let $x_i$, $i = 1, \dots, N$, be the feature vectors of the training set X. These belong to either of two classes, $\omega_1$ and $\omega_2$, which are assumed to be linearly separable. The goal is to design a hyperplane

$$g(x) = w^T x + w_0 \qquad (3.8)$$

that correctly classifies all the training vectors. Such a hyperplane is not unique. Let us now quantify the "margin" that a hyperplane leaves from both classes. Every hyperplane is characterised by its direction (determined by w) and its exact position in space (determined by $w_0$). Since we want to give no preference to either of the classes, it is reasonable, for each direction, to select the hyperplane which has the same distance from the respective nearest points in $\omega_1$ and $\omega_2$. Our goal is to search for the direction that gives the maximum possible margin. However, each hyperplane is determined only up to a scaling factor; we free ourselves from it by appropriately scaling all the candidate hyperplanes. The distance of a point from a hyperplane is given by

$$z = \frac{|g(x)|}{\|w\|} \qquad (3.9)$$

We can now scale w and $w_0$ so that the value of g(x) at the nearest points in $\omega_1$ and $\omega_2$ (circled in Figure 3.15) is equal to 1 for $\omega_1$ and thus equal to -1 for $\omega_2$. This is equivalent to:

1. having a margin of $\frac{1}{\|w\|} - \left(-\frac{1}{\|w\|}\right) = \frac{2}{\|w\|}$;

2. requiring that

$$w^T x + w_0 \ge 1, \quad \forall x \in \omega_1 \qquad (3.10)$$

$$w^T x + w_0 \le -1, \quad \forall x \in \omega_2 \qquad (3.11)$$

For each $x_i$ we denote the corresponding class indicator by $y_i$ (+1 for $\omega_1$, -1 for $\omega_2$). Our task can now be summarised as: compute the parameters w, $w_0$ of the hyperplane so as to

1. minimise $J(w) = \frac{1}{2}\|w\|^2$
2. subject to $y_i \,(w^T x_i + w_0) \ge 1$, $i = 1, 2, \dots, N$.

Obviously, minimising the norm makes the margin maximum. This is a nonlinear (quadratic) optimisation task subject to a set of linear inequality constraints.
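As an illustration of the decision function $g(x) = w^T x + w_0$ and of the margin constraints above, here is a minimal, self-contained linear SVM trained by sub-gradient descent on the (soft-margin) hinge loss. It is an educational sketch: a real system would solve the quadratic program with a dedicated library.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.

    X: (N, d) feature vectors, y: labels in {+1, -1}.
    This soft-margin variant approximates the hard-margin task
    min 1/2 ||w||^2  s.t.  y_i (w^T x_i + w_0) >= 1  described above.
    """
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ w + w0)
            if margin < 1:                       # point inside the margin: push it out
                w += lr * (y[i] * X[i] - lam * w)
                w0 += lr * y[i]
            else:                                # otherwise only shrink w (maximise margin)
                w -= lr * lam * w
    return w, w0

def svm_predict(X, w, w0):
    """The sign of g(x) = w^T x + w_0 decides the class."""
    return np.sign(X @ w + w0)
```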
3.2.2 Gaussian Mixture Model (GMM)

GMMs have been widely used in the field of speech processing, mostly for speech recognition, speaker identification and voice conversion. Their capability to model arbitrary probability densities and to represent general spectral features motivates their use. The GMM approach assumes that the density of an observed process can be modelled as a weighted sum of component densities $b_m(x)$:

$$p(x \mid \lambda) = \sum_{m=1}^{M} c_m\, b_m(x) \qquad (3.12)$$

where x is a d-dimensional random vector, M is the number of mixture components and $b_m(x)$ is a Gaussian density parameterised by a mean vector $\mu_m$ and a covariance matrix $\Sigma_m$. The coefficient $c_m$ is a weight that models the fact that the different densities have different heights in the probability density function. The parameters of the sound model are denoted by $\lambda = \{c_m, \mu_m, \Sigma_m\}$, $m = 1, \dots, M$.

The training of a Gaussian Mixture Model consists in finding the set of parameters $\lambda$ that maximises the likelihood of a set of n data vectors. Different alternatives are available in the literature to perform such an estimation; one of them is the Expectation-Maximisation (EM) algorithm. The algorithm works by iteratively updating the parameter vector $\lambda$ and the estimate of the posterior probability $p(m \mid x_i, \lambda)$ for each element of the training set. In the case of diagonal covariance matrices the update equations become:

$$\mu_m^{\text{new}} = \frac{\sum_{i=1}^{n} p(m \mid x_i, \lambda)\, x_i}{\sum_{i=1}^{n} p(m \mid x_i, \lambda)} \qquad (3.13)$$

$$\Sigma_m^{\text{new}} = \frac{\sum_{i=1}^{n} p(m \mid x_i, \lambda)\, (x_i - \mu_m)^T (x_i - \mu_m)}{\sum_{i=1}^{n} p(m \mid x_i, \lambda)} \qquad (3.14)$$

$$c_m^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} p(m \mid x_i, \lambda) \qquad (3.15)$$

The value $p(m \mid x_i, \lambda)$ is updated at each iteration by the following equation:

$$p(m \mid x_i, \lambda) = \frac{c_m\, b_m(x_i)}{\sum_{j=1}^{M} c_j\, b_j(x_i)} \qquad (3.16)$$

Let us now consider the decision process: if we have a sequence of $L \ge 1$ observations $X = x_1, x_2, \dots, x_L$ and we want to emit a verdict, we have to choose the model among $\lambda_1, \lambda_2, \dots, \lambda_K$ that maximises the a posteriori probability of the observation sequence:

$$\hat{k} = \arg\max_{1 \le k \le K} P(\lambda_k \mid X) = \arg\max_{1 \le k \le K} \frac{P(X \mid \lambda_k)\, P(\lambda_k)}{p(X)} \qquad (3.17)$$

The computation can be greatly simplified: p(X) is the same for $k = 1, \dots, K$. Furthermore, assuming that the priors $P(\lambda_k)$ are equal for each class of sounds, the classification rule simplifies to:

$$\hat{k} = \arg\max_{1 \le k \le K} p(X \mid \lambda_k) \qquad (3.18)$$

Using logarithms and the independence between observations, the sound recognition system computes:

$$\hat{k} = \arg\max_{1 \le k \le K} \sum_{l=1}^{L} \log p(x_l \mid \lambda_k) \qquad (3.19)$$
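The EM updates (3.13)-(3.16) and the log-likelihood scoring of (3.19) can be sketched as follows for diagonal covariances. Initialisation, the number of components and the small regularisation constants are illustrative; a production system would normally rely on a library implementation.

```python
import numpy as np

def fit_diag_gmm(X, M=4, iters=50, seed=0):
    """EM for a diagonal-covariance GMM, following eqs. (3.13)-(3.16)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, M, replace=False)]          # initial means: random data points
    var = np.tile(X.var(axis=0) + 1e-6, (M, 1))      # initial diagonal covariances
    c = np.full(M, 1.0 / M)                          # initial weights

    for _ in range(iters):
        # E-step: responsibilities p(m | x_i, lambda), eq. (3.16), in log-space
        log_b = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=2))
        log_w = np.log(c) + log_b
        log_norm = np.logaddexp.reduce(log_w, axis=1, keepdims=True)
        resp = np.exp(log_w - log_norm)              # shape (n, M)

        # M-step: eqs. (3.13)-(3.15)
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp.T @ X) / nk[:, None]
        var = (resp.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
        c = nk / n
    return c, mu, var

def gmm_loglik(X, c, mu, var):
    """Sum of log p(x_l | lambda) over the observations, as in eq. (3.19)."""
    log_b = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(axis=2))
    return np.logaddexp.reduce(np.log(c) + log_b, axis=1).sum()
```

Classification then amounts to evaluating gmm_loglik against the model of each class and picking the class with the largest value, as in (3.19).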
The most general reconstruction formula is:

    y(n) = (1/N) Σ_{u=−∞}^{+∞} w(n − t_s^u) Σ_{k=0}^{N−1} Y(t_s^u, Ω_k) e^{j Ω_k (n − t_s^u)}    (3.21)

in which w(n) is the synthesis window and t_s^u is the u-th synthesis time instant. The perfect reconstruction of the original signal (in the absence of modification between the analysis and synthesis stages) is achieved when

    t_s^u = t_a^u    (3.22)

    Y(t_s^u, Ω_k) = X(t_a^u, Ω_k)    (3.23)

if, for each n,

    Σ_{u=−∞}^{+∞} w(n − t_s^u) h(t_a^u − n) = 1    (3.24)

3.4 Time-scaling

The goal of time-scaling is to slow down or speed up a given signal, in a time-varying manner, without altering the signal's spectral content (i.e. without altering its pitch). In order to obtain a time-scale modification, we have to define an arbitrary time-scale function which specifies a mapping between the time t in the original signal and the time t' in the modified signal. This mapping is performed through a time warping function

    T : t ↦ t'    (3.25)

The expression of T(t) can be very general and can be specified by an integral definition such as

    T(t) = ∫_0^t β(τ) dτ    (3.26)

The term β represents the time modification rate; in particular β > 1 corresponds to a time-scale expansion (the signal is slowed down) while β < 1 corresponds to a time-scale compression (the signal is sped up). Notice also that it must be β > 0, since a negative time modification rate has no physical meaning.

Given the sinusoidal model of a signal

    x(t) = Σ_{i=1}^{I(t)} A_i(t) e^{j φ_i(t)}    (3.27)

with

    φ_i(t) = ∫_{−∞}^{t} ω_i(τ) dτ    (3.28)

the time-scaled signal is the following:

    x'(t') = Σ_{i=1}^{I(β^{−1}(t'))} A_i(β^{−1}(t')) e^{j φ'_i(t')}    (3.29)

with

    φ'_i(t') = ∫_{−∞}^{t'} ω_i(β^{−1}(τ)) dτ    (3.30)

We can see that the expression of the ideal time-scaled signal is still a linear combination of the sinusoidal components of the original sinusoidal model. The signal is modified in such a way that the instantaneous amplitude of the i-th sinusoid at time t' corresponds to the instantaneous amplitude in the original signal at time t = β^{−1}(t'). The same holds for the instantaneous frequency of the i-th sinusoid at time t', which corresponds to the instantaneous frequency in the original signal at time t = β^{−1}(t'). Notice that the phase term is obtained by applying the inverse mapping function on the time axis, and not simply by replacing t with t' (which would correspond to the simpler operation of time warping). This results in a signal whose time evolution is modified while its frequency content remains unchanged. In the following sections, frequency- and time-domain techniques for time-scaling are described.

3.4.1 Time-scaling STFT algorithm

The short-time Fourier transform gives access to the implicit sinusoidal model parameters; hence, the ideal time-scaling operation can be implemented in the same framework. Synthesis time instants t_s^u are set at a regular interval R = t_s^{u+1} − t_s^u; from the series of synthesis time instants t_s^u, the analysis time instants t_a^u are calculated according to the desired time warping function. The short-time Fourier transform of the time-scaled signal is then:

    Y(t_s^u, Ω_k) = |X(T(t_s^u), Ω_k)| e^{j φ_k(t_s^u)}    (3.31)

with

    φ_k(t_s^u) = φ_k(t_s^{u−1}) + ω_k(T(t_s^u)) · R    (3.32)

where ω_k(T(t_s^u)) is the instantaneous frequency computed in channel k, assumed to be constant over the duration of (t_a^u − t_a^{u−1}). The complete time-scaling algorithm can be summarised as follows:

1. set the initial instantaneous phases φ_k(t_s^0) = arg(X(0, Ω_k));
2. set the next synthesis instant, according to an evolution at a constant frame rate R, through the relation t_s^{u+1} = t_s^u + R;

3. calculate the next analysis temporal instant t_a^{u+1} through the inverse time-warping function. Since t_a^{u+1} could be non-integer, we have to consider the two integer time instants immediately below and above it;

4. calculate the short-time Fourier transform at the temporal instants immediately below and above t_a^{u+1} and, for each channel k, compute the corresponding instantaneous frequency using ω_k = Ω_k + (Δφ − Ω_k R − 2mπ)/R, taking care of the phase unwrapping problem;

5. for each channel k, estimate the modulus of the STFT at time t_a^{u+1} through linear interpolation;

6. reconstruct the time-scaled short-time Fourier transform at time t_s^{u+1};

7. calculate the (u+1)-th frame of the synthesis sequence y(n);

8. return to step 2.

3.4.2 Time-scaling time-domain algorithms

We now briefly describe some of the simplest methods for time-scaling in the time domain. A trivial method for time-scaling a sound recording is to simply replay it at a different rate. When using magnetic tapes, for example, the tape speed may be varied; however, this incurs a simultaneous change in the pitch of the signal. In response to this problem, a number of authors have developed algorithms that perform time- and pitch-scaling independently. These methods are based on time-domain splicing overlap-add approaches. The basic idea consists of decomposing the signal into successive segments of relatively short duration (10 to 40 ms). Time-scale compression/expansion is achieved by discarding/repeating some of the segments while leaving the others unchanged, and by copying them back into the output signal. As was the case for frequency-domain techniques, pitch-scale modifications can be obtained by combining time-scaling and re-sampling.

For this scheme to work properly, one must make sure that no discontinuity appears at the time instants where segments are joined together; this is the reason why the segments are usually overlapped and multiplied by weighting windows, while phase discontinuities can be resolved by a proper time alignment of the blocks. The splicing methods have the advantage of being computationally cheap, but at the expense of suffering from echo artifacts, due to the delayed and attenuated replicas of the signal present in the reconstruction. For strictly periodic signals, the splicing method works perfectly, provided the duration of the repeated or discarded segments is equal to a multiple of the period. This is still true to a large extent for nearly periodic signals such as voiced speech and many musical sounds. In order to improve the performance of the splicing technique, a number of methods have been proposed in which an actual estimate of the pitch, or some measure of waveform similarity, is used to optimise the splicing points and durations.

Synchronous Overlap and Add (SOLA)

The example we analyse is the Synchronous Overlap and Add method (SOLA), based on correlation techniques, which has been adopted for its simplicity and usefulness. The idea consists of adjusting the length of the repeated/discarded segment so that the overlapped parts (in particular, the beginning and the end) are maximally similar, thus avoiding artifacts which can occur in splicing methods.
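As a rough illustration of this idea, the sketch below searches for the lag at which two overlap segments are most similar, using the normalised cross-correlation that the step-by-step algorithm in the next paragraphs formalises. The block length and the random test signals are arbitrary choices made only for the example.

    # Sketch of the correlation-based splice-point search behind SOLA.
    import numpy as np

    def best_splice_lag(xL1, xL2, max_lag):
        """Return the discrete-time lag at which the two overlap segments are most similar."""
        L = len(xL1)
        best_lag, best_corr = 0, -np.inf
        for m in range(max_lag):
            # normalised cross-correlation of the overlapping parts at lag m
            corr = np.dot(xL1[:L - m], xL2[m:L]) / L
            if corr > best_corr:
                best_lag, best_corr = m, corr
        return best_lag

    # toy usage on two random blocks of 512 samples
    rng = np.random.default_rng(0)
    print(best_splice_lag(rng.standard_normal(512), rng.standard_normal(512), max_lag=128))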
In particular, the input signal x(n) is divided into overlapping blocks of a fixed length; these blocks are then temporally shifted according to the time-scaling factor. The overlap intervals are then searched for the discrete-time lag of maximum similarity. At this point of maximum similarity the overlapping blocks are weighted by fade-in and fade-out functions and summed sample by sample. The algorithm can be summarised as follows:

1. segmentation of the input signal into blocks of length N with a time shift of Sa samples;

2. repositioning of the blocks according to the scaling factor;

3. computation of the normalised cross-correlation between xL1(n) and xL2(n), which are the segments of x1(n) and x2(n) in the overlap interval:

    r_{xL1,xL2}(m) = (1/L) Σ_{n=0}^{L−m−1} xL1(n) · xL2(n + m)    (3.33)

4. extraction of the discrete-time lag at which the normalised cross-correlation has its maximum value;

5. computation of the fade-out/fade-in between the two blocks.

The SOLA implementation leads to time-scaling with small complexity, guaranteeing the independence of the parameters Sa, N and L from the pitch period of the input signal.

Chapter 4

Methodology

In this chapter we give an overview of the feature-based recommendation framework, starting from a general description of the problem it addresses. We then decompose it into functional blocks and explain the operations they perform.

4.1 The problem of labels

Throughout history, humans have always shown a tendency to classify and order the world around them. This classification ability usually takes the form of naming, i.e. associating a name with a class of objects; this is so important that many cultures consider it a proof of man's control over the world. In the Bible, for example, God creates the animals and lets man decide their names: by naming the animals, man is proclaimed the lord of all creation. In the history of mankind, classification and cataloguing have been applied in almost every field of science and study, from more objective distinctions, such as chemical components, to more abstract labelling, such as historical periods or emotions.

Although labels have proven their usefulness, since they usually create a simplified model of the world, they suffer from some disadvantages. First of all, they are usually bound to a certain level of abstraction and tend to hide the underlying details. If we consider the labelling of animals, we can divide them according to different levels of abstraction; by choosing one of them, the complexity of the underlying objects is hidden. In addition, labels are sometimes too strong, in the sense that they do not consider the case of items belonging to more than one class; if we consider, for example, the genre classification of musical artists, items nominally belonging to the same genre can be very different from each other; moreover, the same artist may belong to more than one genre at the same time.

4.2 Labelling in recommendation systems

We will now consider the problem of labelling in the musical field. Historically, music has been classified according to many aspects: artist, composer, genre, emotional content, etc. The classification of music according to labels is however very weak: in the classification according to the composer or the genre, for example, the boundaries between the classes are quite blurred; musical pieces written by the same composer may be very different.
Recently, label classification has been applied to the field of music recommendation systems. Many applications have been developed, all sharing the work-flow displayed in Figure 4.1.

Figure 4.1: The work-flow of traditional recommendation systems

The musical signal is analysed in order to extract a set of descriptors called features. The feature data form N-dimensional vectors. These vectors are then analysed through a decision process that returns outcomes in the form of labels (genre, artist, style, etc.). This symbolic information can only take on a finite or numerable number of values, which greatly limits its discrimination power. Recommendation systems usually start from this symbolic description to generate playlists. The loss of discriminant power associated with the conversion from feature vectors to labels is the main reason why such recommendation systems tend to perform quite poorly. The resulting playlists do not behave consistently for two main reasons:

• Temporal inhomogeneity: labels tend to be global descriptors of a whole musical piece. Therefore they usually do not capture local changes in mood or tempo, or genre transitions within the same piece. This makes it very hard to create playlists that locally adapt to a particular state.

• Ensemble inhomogeneity: labels tend to apply to musical pieces that differ a great deal from each other. Choosing labels as control parameters therefore results in weak control over the content. Consider, for example, genre classification: pieces belonging to the same genre can be very different from each other; if the user selects a playlist based on a particular genre, this has a very weak connection with the actual mood of the songs.

Figure 4.2: The work-flow of our system

In this thesis, we create a recommendation framework that avoids the conversion from feature vectors to labels (Figure 4.2). In this framework, the user can directly control the features, obtaining much better control over the generated content. Moreover, features are no longer considered global descriptors of the musical piece but are time-dependent properties that may change within the same musical piece. The system thus considers the locality of the features, solving the problem of temporal inhomogeneity.

4.3 The recommendation framework structure

The recommendation framework is composed of the following parts (Figure 4.3):

Figure 4.3: The functional blocks of the system

• Proposal generator: this component is responsible for exploring the item database in order to find items that satisfy the conditions expressed by the user in the feature space. The output of this component is a list of pairs (a_i, Δt), where a_i is an audio item and Δt = [start_i, end_i] is an interval inside a_i, that satisfy the user constraints.

• Ranking system: this component is responsible for learning the musical taste of the user by analysing his or her preferences. It then ranks the proposals created by the proposal generator according to this information. In this way the recommendation framework not only considers the explicit requests of the user in terms of feature values, but also implicitly adapts the list of generated proposals to the inferred preferences.
• Transition generator: this component is responsible for performing the transitions between audio items and for interpolating the changes in the feature values between different musical pieces.

• User interface: this component manages the interaction mechanism between the human and the machine. Various user interfaces can be devised according to the particular application of the framework.

4.4 Applications

This section presents some applications of the system. They have been chosen as instances of a wide range of possibilities, to show the generality of the recommendation framework approach. The applications are:

• Automatic DJ system: the recommendation framework may be used as an automatic DJ system that can automatically create transitions and adapt the emotional aspects of music to a particular situation.

• Tabletop recommendation system: the recommendation framework may be used in conjunction with a tangible interface to allow the user to specify the values of the features by placing blocks on a tabletop.

• Dynamic playlist generation system based on the runner's step frequency: using an accelerometer, the system is able to detect the step frequency of a runner and adapt the music tempo to this value.

• Training-based system: using the ranking system, it is possible to make the system learn the musical preferences of the user.

Each application has been obtained by enabling/disabling components of the framework or by changing the parameters used during the generation phase.

4.4.1 Automatic DJ system

The increasing computational power of modern machines suggests a change in the traditional view of the task of a Disk Jockey. Early DJs used to buy LP disks and mix them in a creative way; the introduction of personal computers and audio compression (MP3 files) has merely moved the audio data from analog to digital and increased the size of DJ libraries, without however affecting the way the artist performs: he or she still has to learn the repertory and know in advance which pieces suit each other well. Keeping these issues in mind, this application focuses on the development of an automatic music compositing system that enables the user to control the music selection on the basis of meaningful parameters. Note that the same approach could be exploited in any other field (video, images, paintings, even cooking, ...).

The system resembles automatic Disk Jockey software but differs from it in many ways. First of all, the system embodies an expert algorithm that analyses the audio database and is able to detect features directly from the content; in addition, the interaction with the user is not just a form of control but rather a collaboration, in the sense that the user and the machine help each other to achieve the goal. The program accepts real-time user inputs in order to modify its behaviour; the user, by means of an input interface, can drive the performance of the system by specifying the values of predefined parameters.

4.4.2 Tangible interface

This application of the recommendation framework is based on the ReacTable, a tangible interface originally designed within the Music Technology Group at the Universitat Pompeu Fabra in Barcelona, Spain by Sergi Jordà, Marcos Alonso, Martin Kaltenbrunner and Günter Geiger. The ReacTable is a multi-user electronic music instrument with a tabletop user interface (Jordà et al. [24]). The user can interact with the instrument by placing blocks (called tangibles) on a semi-transparent table.
Each tangible represents a predefined function and is recognised by a camera placed below the table; the visual interface of the system is projected onto the table and allows the user to interact with the system via tangibles or fingertips. In this way a virtual modular synthesiser is operated, creating music or sound effects.

Figure 4.4: The ReacTable

In the version of the system developed by Jordà et al. [24], there are various types of tangibles representing different modules of an analog synthesiser. Audio-frequency VCOs (voltage-controlled oscillators), LFOs (low-frequency oscillators), VCFs (voltage-controlled filters), and sequencers are some of the commonly used tangibles. There are also tangibles that affect other modules: one, called radar, is a periodic trigger, and another, called tonaliser, limits a VCO to the notes of a musical scale.

In our application the interaction between the user and the machine is performed by placing on the table a set of fiducials, each one with a different meaning. We may group the fiducials into three main categories:

Figure 4.5: The ReacTable framework

• Feature weights: this set of fiducials controls the importance assigned to the features. By moving them, the user is able to prioritise the features by giving more or less importance to each of them.

• Feature values: this set of fiducials controls the value assigned to the features. The user is therefore able to control the evolution of the recommendation system using a sort of tabletop mixer in which the fiducials act as sliders.

• Special fiducials: some fiducials have a special meaning; when they are visible in the scene, they modify the behaviour of the system.

4.4.3 Dynamic playlist generation system based on runner's step frequency

This application exploits a subset of the functionalities of the system. It consists of an adaptive playlist generator based on the runner's step frequency. A similar application has been developed by Masahiro et al. [33]. The system adapts the tempo of the performed audio to the behaviour of a running person, either by time-scaling it or by switching to another musical item. The recommendation framework has been tuned in the following way:

• An upper bound of 5 seconds has been set on the temporal length of the proposals. In this way, the system is forced to generate a proposal list (and therefore to check the value of the features) every 5 seconds. This leads to reasonable responsiveness.

• Since in this application the tempo feature is the most important one, its importance has been set to a high value. The relevance of the other features has been set to a low value.

• The audio database is composed of the same song (The Knack - My Sharona) time-scaled at different BPMs.

An accelerometer has been fastened to the runner's leg to detect his or her step frequency. When the tester runs, the system adapts the music tempo to this value. The interesting aspect of this application is that both the person and the machine tend to adapt to each other: when the music beat frequency is close to the runner's step frequency, he or she tends to adapt his or her speed to the music. The system could become even more interesting if it could enhance the runner's performance by influencing his or her heart rate: when the system detects a low heart rate it should recommend faster music, and vice versa.

4.4.4 Training-based system

In this application the system is used without specifying the values of the features, interacting only through the screen interface.
The only part of the recommendation framework that is used is the mechanism that learns the taste of the user and ranks the music according to the history of the user's preferences. The more the system is trained, the more it understands and adapts to the inferred preferences of the user, improving the quality of the recommended music. This is the lightest application from the point of view of human-computer interaction; the user implicitly interacts with the learning algorithm by selecting items from the proposal list.

Chapter 5

Implementation

This chapter describes the implementation details of the system. The system is mainly divided into two stages (Figure 5.1):

• Preprocessing phase: the system extracts the features from the audio contained in a database using Music Information Retrieval (MIR) techniques. The output of this phase is saved in the form of XML files.

• Performance phase: the system reads the XML data and uses it to select the audio items to play. The system starts playing an audio track for a pre-defined period of time; it then scans the database for items that best fit the parameters defined by the user and shows the user a list of proposals. If the user chooses one of them, the system performs the transition; otherwise the system automatically selects the best-fitting item.

During the implementation of the performance stage, real-time issues should be taken into account to avoid artifacts such as playback delays or audio cracks. The first section of this chapter describes the feature extraction mechanism and the feature similarity functions used to compare the values of the features; this part describes the basic bricks the system is built with. The second section describes the intermediate layer between the preprocessing and the performance stages: the XML data. The third section describes the performance phase in detail.

Figure 5.1: System stages

5.1 Preprocessing phase

The preprocessing stage analyses the audio data and extracts the features. For each audio item in the database, the system calculates the evolution of the features within it and stores the result in an XML file. The audio items are down-sampled to 11025 Hz mono signals in order to speed up the processing. The first subsection defines the concept of feature used in the system and describes how the features are calculated. The second subsection describes the feature similarity functions that will be used during the performance phase to compare the values of features in different audio files.

5.1.1 Features extraction

A feature is a function of time:

    feature : TIME ↦ D    (5.1)

where D is an arbitrary domain. The domain D should satisfy the comparability property: there exists a function

    compare : D × D ↦ [0, 1]    (5.2)

that is used to compare each pair of values in D. The following properties hold:

• When two elements are similar, the value should be near 1. Conversely, when two elements are not similar, the value should be near 0.

• An element of D is maximally similar to itself:

    compare(a, a) = 1    ∀a ∈ D    (5.3)

• Symmetry: compare(a, b) = compare(b, a).

The following features are used in the system (for the theoretical details, see chapter 3):

• segmentation: a segmentation operation splits the audio signal into segments by detecting "interesting points" in the musical stream. We will call them anchors. An anchor defines the moment in which an event occurs in the music.
During the performance phase, the system will use the anchor points to build up the playback tree (Figure 5.2). This means that the system will merge two songs only in correspondence of segment points.

Figure 5.2: Anchor points

• harmony: this feature is composed of a key (e.g. C, C#, ...) and a mode (major, minor). In addition we can specify the confidence of the detected harmony, the so-called keyClarity.

• tempo: expressed in BPM (beats per minute). It is extracted in 50% overlapping windows of 16 seconds. An important aspect we will now concentrate on is the optimisation of the tempo. It sometimes happens that the detected tempo oscillates among multiples of the correct value although no tempo change occurs in the musical score; this is because some beats of the rhythmic pattern are missing or have been added for artistic purposes. When detecting the tempo, we should therefore avoid considering these oscillations. An algorithm has been developed to merge the values of two different tempos when one is a multiple of the other. The system analyses the list of tempo detections from the beginning to the end:

  – Given bpm_n, the n-th tempo detection, the system builds two lists, div and mul, in the following way:
    ∗ for each bpm_i with i = 0, ..., n − 1:
      · set div_i = round(bpm_i / bpm_n) · bpm_n − bpm_i
      · set mul_i = bpm_n / round(bpm_n / bpm_i) − bpm_i
  – Given minDivIndex and minMulIndex, the indexes of the minimum values of div and mul respectively:
    ∗ if div_minDivIndex < mul_minMulIndex, set the n-th value of the tempo to

        bpm_n · round(bpm_minDivIndex / bpm_n)    (5.4)

    ∗ otherwise, set the n-th value of the tempo to

        bpm_n / round(bpm_n / bpm_minMulIndex)    (5.5)

After that, a median filter of 5 tempo samples is applied to remove the remaining outliers. Figure 5.3a shows an example of an un-optimised tempo detection. The tempo in the original score moves from 180 to 160 and back to 180. We can notice that in the un-optimised detection the values of the tempo oscillate between three values: 180, 90 and 160. Whereas 180 and 160 are correct, 90 is wrong, since no such tempo change takes place in the score. This problem arises from the fact that during the 90-BPM segments only two beats out of four are detected by the system. The result of the optimisation is shown in Figure 5.3b.

Figure 5.3: Tempo optimisation ((a) original tempo detection, before optimisation; (b) optimised tempo detection)

• brightness: the brightness expresses the power of the high-frequency bands with respect to the low frequencies. The brightness feature is extracted using 1-second-long, 50% overlapping windows.

• rms: the RMS is computed by taking the root of the average of the squared amplitude. The rms feature is extracted using 1-second-long, 50% overlapping windows.

• mood: mood is a high-level feature describing the emotional content of a musical piece. Since no standard set of emotions seems to have been established, one has to be selected; it should be grounded in psychology and prove useful for the study. In the system, we used the bi-dimensional representation of mood shown in Figure 5.4.

Figure 5.4: Mood bi-dimensional plane

The vertical axis (Energy) is related to the strength of the signal and can therefore be detected through intensity features (such as RMS, loudness, ...). The horizontal axis (Stress) indicates whether the emotion is positive (happiness, joy, ...) or negative (depression, frustration, ...) and is calculated using timbric features.
Using a set of Support Vector Machines we are able to classify the audio data in this bi-dimensional plane; a total of three SVMs are trained to determine the classes in a hierarchical framework (Figure 5.5): first, the items are classified according to the intensity feature into two classes (high and low intensity); then, for each class, an SVM performs the classification along the Stress axis. The features are extracted in windows of 32 milliseconds and, for each of them, the average and variance are computed. The intensity feature extracted from the audio items is the rms. The timbric features are:

  – spectral centroid
  – spectral roll-off
  – spectral flux
  – inharmonicity
  – MFCC

Figure 5.5: The hierarchical framework

5.1.2 Features similarity functions

As previously stated (see 5.1.1), each feature (except for segmentation) is assigned a similarity function that expresses the relationship between its values. The functions are the following:

• harmony: the similarity relation among the values of harmony is defined according to the rules of harmony in Western music. These values could be changed according to personal taste. In Table 5.1 and Figure 5.6 we present the values of the compare function between (C, maj) and all other harmonies, and between (C, min) and all other harmonies. The other values can be found by musically transposing the notes.

    Table 5.1: Harmony similarity measure (compare values of each key/mode with respect to (C, maj) and (C, min))

        Key   maj / min vs (C, maj)   maj / min vs (C, min)
        C          1.00 / 0.25             0.25 / 0.00
        C#         0.00 / 0.00             0.00 / 0.00
        D          0.20 / 0.25             0.00 / 0.00
        D#         0.10 / 0.00             0.90 / 0.00
        E          0.10 / 0.85             0.00 / 0.00
        F          0.25 / 0.25             0.25 / 0.25
        F#         0.00 / 0.00             0.00 / 0.00
        G          0.25 / 0.25             0.25 / 0.25
        G#         0.10 / 0.00             0.85 / 0.00
        A          0.10 / 0.90             0.00 / 0.00
        A#         0.20 / 0.00             0.25 / 0.00
        B          0.00 / 0.00             0.00 / 0.00

Besides this, the similarity measure also considers the keyClarity, i.e. the confidence of the detected harmony. The key clarity not only expresses the probability that the detected harmony is correct, but also gives hints about the amount of inharmonic noise present in the piece. If this value is very high, a strong harmonic component is perceived by the listener; on the contrary, when this value is low, the segment does not present a well-defined harmony. When two segments both have a high key clarity, the overall harmony similarity function should consider the values defined in Table 5.1. However, when two segments both have a low key clarity, the actual value of the key is not important, since it has a low confidence. In particular, given two audio samples with key clarities keyClarity1 and keyClarity2 respectively, and with the key similarity computed as shown before (see Table 5.1), the harmony similarity measure should show the qualitative behaviour described in Table 5.2 (when both key clarities are high, the similarity follows the key similarity; when both are low, the similarity is high; mixed cases give intermediate values).

Table 5.2: The qualitative measure of harmony similarity

The qualitative graph of the similarity function is shown in Figure 5.7; when both key clarities are zero, the overall similarity is high, and when both are one, the value is the real key similarity. The overall similarity measure is obtained by the combination of two functions. The first is

    f1(...) = (1 − keyClarity1) · (1 − keyClarity2)    (5.6)
Figure 5.6: Harmony similarity measure ((a) C Major similarity measure, (b) C Minor similarity measure)
Figure 5.7: The qualitative graph of harmony similarity

f1 is maximum when both keyClarity1 and keyClarity2 are zero. The second function is

    f2(...) = keySimilarity · (1 − |keyClarity1 − keyClarity2|)    (5.7)

whose value decreases when keyClarity1 and keyClarity2 are distant. Figure 5.8a shows the graph of f1, whereas Figure 5.8b shows the graph of f2.

Figure 5.8: Harmony similarity measure ((a) graph of f1, (b) graph of f2)

The harmony similarity is obtained by combining the contributions of the two functions:

    compare_harmony(...) = f1(...) + (1 − f1(...)) · f2(...)    (5.8)

In Figure 5.9 an example of the similarity function is plotted (the key similarity is set to 0.5). We can see that the qualitative behaviour resembles the one in Figure 5.7.

Figure 5.9: Graph of compare_harmony

• tempo: the distance function for the tempo is calculated as a normalised Gaussian function centred on one of the two BPM values and with an appropriate variance (Figure 5.10). It returns high values when the two tempos are near and reasonably decreasing values as they move apart.

Figure 5.10: Tempo similarity graph

• brightness: the similarity function is computed as follows:

    compare_brightness(...) = 1 − |brightness1 − brightness2|    (5.9)

• rms: the similarity function is computed as follows:

    compare_rms(...) = 1 − |rms1 − rms2|    (5.10)

• mood: the distance measure among the classes is summarised in Table 5.3. Keeping in mind the bi-dimensional plane used to calculate the mood, the similarity is 1.00 when the two moods are the same, 0.33 when they are in the same column, 0.40 when they are in the same row and 0.10 otherwise.

                     Exuberance   Anxious   Contentment   Depression
    Exuberance          1.00       0.40        0.33          0.10
    Anxious             0.40       1.00        0.10          0.33
    Contentment         0.33       0.10        1.00          0.40
    Depression          0.10       0.33        0.40          1.00

    Table 5.3: Mood similarity table

5.2 XML Structure

For each audio file in the database and for each feature, an XML file is created:

• filename_anchors.xml: contains data about anchors (resulting from the segmentation phase)
• filename_tempo.xml: contains data about tempo
• filename_harmony.xml: contains data about harmony (key, mode, keyClarity)
• filename_brightness.xml: contains data about brightness
• filename_rms.xml: contains data about rms
• filename_mood.xml: contains data about mood

Two XML schemas have been defined, one used for the anchors and the other for the remaining features.

5.2.1 Anchors XML Schema

The anchors XML file contains the list of the anchor points. For each of them, a name and a description may be defined. Table 5.4 describes the XML structure. An example of XML file is the following:

<anchors>
  <anchor position=POSITION>
    <name>NAME</name>
    <description>DESCRIPTION</description>
  </anchor>
  ...
</anchors>

Table 5.4: Anchor XML structure
  • <anchors> — attributes: none; children: zero or more <anchor>; contains the anchor list
  • <anchor> — attributes: position (the position of the anchor in number of samples, from 1 to N); children: <name> [OPTIONAL], <description> [OPTIONAL]; defines an anchor point
  • <name> — the name of the anchor
  • <description> — the description of the anchor

5.2.2 Generic feature XML schema

The generic feature XML schema is used to store information about features (harmony, tempo, ...). In this context a feature is seen as a structure that associates the values of some attributes to time instants and has some parameters that specify global characteristics (Figure 5.11). Table 5.5 describes the XML structure.

Figure 5.11: The XML data model

Table 5.5: Generic feature XML structure
  • <feature> — attributes: id (identifier of the feature); children: <name> [OPTIONAL], <description> [OPTIONAL], <parameters> [OPTIONAL], <data> [REQUIRED]; the root node
  • <parameters> — children: one or more <parameter>; contains the feature global parameters, if any
  • <parameter> — attributes: name (identifier of the parameter); contains the value of a parameter
  • <data> — children: zero or more <dataitem>; contains the data of the feature
  • <dataitem> — attributes: position (position in the music stream in number of samples, from 1 to N); children: <name> [OPTIONAL], <description> [OPTIONAL], <attributes> [OPTIONAL]; represents a value of the feature
  • <attributes> — children: one or more <attribute>; the list of attributes of the data item
  • <attribute> — attributes: name (identifier of the attribute); contains the value of an attribute of the data item

An example of XML file is the following:

<feature id=ID>
  <name>NAME</name>
  <description>DESCRIPTION</description>
  <parameters>
    <parameter name=NAME>VALUE</parameter>
  </parameters>
  <data>
    <dataitem position=POSITION>
      <name>NAME</name>
      <description></description>
      <attributes>
        <attribute name=NAME>VALUE</attribute>
        ...
      </attributes>
    </dataitem>
    ...
  </data>
</feature>

The list of features and attributes used in the system is the following:

• Harmony: the harmony XML file is derived from the generic feature XML schema. It contains three attributes per data item:
  – key: a value among C, C#, D, D#, E, F, F#, G, G#, A, A#, B
  – mode: a value among maj, min
  – keyClarity: a real number between 0 and 1

• Tempo: the tempo XML file is derived from the generic feature XML schema. It contains one attribute per data item:
  – bpm: beats per minute

• Brightness: it contains one attribute per data item:
  – brightness: the value of the brightness (a real number in the interval [0, 1])

• Rms: it contains one attribute per data item:
  – rms: the value of the rms (a real number in the interval [0, 1])

• Mood: it contains one attribute per data item:
  – mood: the value of the mood (an integer among 1, 2, 3, 4)

5.3 Performance phase

In the following sections, the performance stage is analysed. At startup, the system loads the information from the XML files into memory, randomly selects a section of an audio item and plays it. It then shows the user a list of audio items that best fit the current one, sorted according to the musical preferences of the user (detected by an expert system). By default the first item of the list is selected, but the user can change the selection. Once the played section is about to end, the system starts elaborating the selected item, performing the transition between the currently played and the new item. The process is then iterated.
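As an illustration of this loading step, the following sketch reads a generic feature XML file of the kind defined above into a list of (position, attributes) pairs. It assumes well-formed XML with quoted attribute values; the function name and the example file name are illustrative, not the system's actual implementation.

    # Minimal sketch of reading a generic feature XML file (e.g. a tempo file).
    # Tag and attribute names follow the schema of Table 5.5.
    import xml.etree.ElementTree as ET

    def load_feature(path):
        root = ET.parse(path).getroot()            # <feature id="...">
        items = []
        for item in root.find("data").findall("dataitem"):
            position = int(item.get("position"))   # position in samples
            attrs = {a.get("name"): a.text
                     for a in item.find("attributes").findall("attribute")}
            items.append((position, attrs))
        return items

    # hypothetical usage: tempo = load_feature("mysong_tempo.xml")
    # -> [(1, {'bpm': '120'}), (88201, {'bpm': '122'}), ...]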
The performance stage is composed of the functional areas shown in Figure 5.12.

Figure 5.12: The functional blocks of the system

• Work-flow manager: this component manages the timing of the system by sending signals to the other blocks when events occur.

• Proposal generator: this component is responsible for analysing the feature space and detecting audio items that are similar to the one currently being played.

• Ranking system: this component learns the musical taste of the user from his or her actions and sorts the proposal list generated by the proposal generator accordingly.

• Transition generator: once the user has chosen the next audio item, the transition generator mixes it with the one that is currently being played.

• System interface: between the system and the user lies the interface; we will see that this is not a single component, as there are multiple interfaces (screen, tangible, tapping and Wii Remote).

5.3.1 Work-flow

In this section we discuss the timing of the system by highlighting the order and timing of the events that occur during the performance phase. The user interacts with the system inside a so-called proposal selection window, in which a list of proposals is shown and he or she can choose an option from the list. Each list of proposals has a default value that is automatically selected in case the user does not express any preference.

The timing of the system is handled by means of events. An event is used as a synchronisation mechanism to indicate to waiting processes when a particular condition has become true. When an event is triggered, a predefined set of actions (a task) is executed. In the current system, the events are scheduled by the TaskScheduler class, which allows other processes to be notified of them. The list of events in the system is the following:

• Proposal selection opened: this event signals the opening of a proposal selection window, meaning that the user can start selecting the next audio item to be played from a list. The system displays the list until a "Proposal selection closed" event is raised.

• Proposal selection closed: this event occurs after a "Proposal selection opened" event and signals that the proposal selection window has expired. From the moment the event is triggered, the user is excluded from choosing the audio item to be played, and the system can start elaborating the audio file (creating the transition and sending it to the audio device). Note that if the user does not express any preference during the proposal selection window, the system automatically selects the best-fitting option.

• Proposal play started: this event notifies the system that a proposal is being played.

• Proposal play ended: this event notifies the system that a proposal has finished playing and can be discarded from the system.

We will now describe how the events are scheduled by the TaskScheduler class.

System startup

When the system starts, it allows the user to select the first audio item for a pre-defined interval of time (Tstartup) and then closes the selection window. Let t0 be the system startup time instant. The system:

• schedules a "Proposal selection opened" event at t0
• schedules a "Proposal selection closed" event at t0 + Tstartup

Figure 5.13 shows the process.

Figure 5.13: System startup temporising
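A minimal sketch of how this startup scheduling could be expressed is shown below. The TaskScheduler name comes from the system, but the methods shown and the Python rendering are illustrative assumptions, not the actual implementation; the value of Tstartup is also made up.

    # Sketch of event scheduling at startup (assumed TaskScheduler-style API).
    import heapq, itertools, time

    class TaskScheduler:
        def __init__(self):
            self._queue = []
            self._counter = itertools.count()   # tie-breaker so tasks are never compared

        def schedule_at(self, t, task):
            heapq.heappush(self._queue, (t, next(self._counter), task))

        def run(self):
            while self._queue:
                t, _, task = heapq.heappop(self._queue)
                time.sleep(max(0.0, t - time.time()))
                task()

    T_STARTUP = 10.0                             # illustrative length of the first window (s)
    scheduler = TaskScheduler()
    t0 = time.time()
    scheduler.schedule_at(t0, lambda: print("Proposal selection opened"))
    scheduler.schedule_at(t0 + T_STARTUP, lambda: print("Proposal selection closed"))
    scheduler.run()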
System default execution

After the startup phase, the system repeatedly performs a series of operations (tasks) in correspondence of the events. The events are scheduled in such a way that the computation is distributed in time as much as possible (meaning that two computationally expensive events should not occur at the same time), avoiding any interruption of the played music. Taking as reference point the time instant in which a "Proposal play started" event occurs, we define a set of constants (Figure 5.14):

• Tclosing: each proposal selection for audio item proposal_i should close at least Tclosing seconds before its playback (so that the system has the time to elaborate it).

• Topening: similarly, the "Proposal selection opened" event for the next audio item should occur at least Topening seconds before the beginning of the playback of the current audio item.

Figure 5.14: Tclosing and Topening

Let L_i be the length of the i-th audio item selected by the user and t_event the time instant in which the event occurs.

• Proposal selection closed: when a selection closes, the system schedules the beginning of the playback of the current audio item (since it is now decided) and the next proposal selection opening. The operations performed are the following (Figure 5.15):
  – retrieve the selected proposal proposal_i with length L_i;
  – schedule the next "Proposal play started" event at time t_event + Tclosing;
  – schedule the next "Proposal selection opened" event at time t_event + Tclosing − Topening.

Figure 5.15: Proposal selection closed

• Proposal selection opened: when a selection opens, the system schedules the proposal selection closing. The operations performed are the following (Figure 5.16):
  – retrieve the last selected proposal proposal_{i−1} with length L_{i−1};
  – schedule the next "Proposal selection closed" event at time t_event + L_{i−1} + Topening − Tclosing.

Figure 5.16: Proposal selection opened

• Proposal play started: when an audio item starts its playback, the system schedules the "Proposal play ended" event. The operations performed are the following (Figure 5.17):
  – retrieve the proposal to be played proposal_i with length L_i;
  – schedule the next "Proposal play ended" event at time t_event + L_i.

Figure 5.17: Proposal play started

• Proposal play ended: no action.

Anticipated proposal selection closing

It sometimes happens that the user does not want to wait until the end of the proposal selection window, since he or she has already decided the next proposal to be played. In this case the user can trigger an anticipated proposal selection closing, forcing the system to display the next proposal list. The operations performed by the system are the following:

• Check whether the proposal selection is actually open. If not, the anticipated closing cannot be performed.

• Cancel the next "Proposal selection closed" event and retrieve the time t_cancelled at which it was scheduled (Figure 5.18).
Figure 5.18: Cancel next "Proposal selection closed" event

• Perform a slightly different version of the proposal selection closing operation (the only difference is that in this case the next "Proposal selection opened" event is not scheduled):
  – retrieve the selected proposal proposal_i with length L_i;
  – schedule the next "Proposal play started" event at time t_cancelled + Tclosing (Figure 5.19).

Figure 5.19: Anticipated proposal selection closing

• Open the new anticipated proposal selection window (Figure 5.20):
  – schedule the next "Proposal selection closed" event at time t_cancelled + L_i.

Figure 5.20: Anticipated proposal selection opening

Forcing the proposal list generation

In order to improve the user experience by reacting quickly to user inputs, it is sometimes useful to regenerate the proposal list so that it adapts to a new state of the parameters. This operation should take place only if the user parameters have changed significantly. The regeneration of the proposal list can be performed at any instant during a proposal selection window.

5.3.2 Proposal generator

In this section we describe the proposal generator algorithms. Given a segment [start_i, end_i] of an audio item a_i, two algorithms have been developed to evaluate the similarity of another audio item a_j at position start_j:

• a first, simplified algorithm computes the similarity as a numerical value within a fixed-length time interval;

• the second algorithm computes the similarity by detecting the length of the time interval in which the two segments fit well together. In this interval, the average similarity of each feature should not be less than a pre-defined value.

Before discussing the details of the algorithms, we describe some concepts and techniques used by both.

Control parameters

For each feature (except segmentation), a set of variables is created. These variables are used to compute a similarity measure among audio items. The values of the variables are set by the user in real time.

• Weight: the weight represents the importance of the feature. To limit the range of the weights, we set:

    w_i ∈ [0, 1]  ∀i    (5.11)

• Value: the value is an element of the domain of a feature and represents the expected value of the feature. The user can also avoid setting the value; in this case, the variable is set to "null".

Similarity algorithms

During the performance phase, the audio items are compared within a time interval and a similarity measure is computed. Given

• a_1(t) and a_2(t), two audio items,
• [start_1, end_1] and [start_2, end_2], two time intervals referring to the two audio items respectively (we suppose that the two segments have the same length L),
• f_{1,h} and f_{2,h}, the values of feature h for the two audio items,
• v_h, the expected values of the features (possibly "null") expressed by the user in real time,

we compute the similarity measure Sim_h for feature h as follows:

• If the feature value is different from "null", the system computes the similarity measure as a cumulated distance between the second audio item and the expected value:

    Sim_h = (1/L) ∫_0^L compare(f_{2,h}(τ + start_2), v_h) dτ    (5.12)

where compare(..., ...) is the compare function defined by the feature. Remember that in general the similarity measure is symmetric, and is therefore not affected by the order of the operands.
• If the value is "null", the similarity measure is calculated between the two audio items; the more similar the two items are within the comparison interval, the greater the value of the similarity measure:

    Sim_h = (1/L) ∫_0^L compare(f_{1,h}(τ + start_1), f_{2,h}(τ + start_2)) dτ    (5.13)

Fixed-length algorithm

The first algorithm performs the comparison inside a fixed-length interval (defined a priori) and associates to each segment a value that can then be used to sort it. Given:

• a_i, the previous audio item,
• a_j, the next audio item,
• t_i, a time instant in a_i,
• t_j, a time instant in a_j,
• L, a length,
• w_h, the weights of the features expressed by the user in real time,

the algorithm computes the similarity by calculating an overall similarity measure in the interval of length L immediately after t_i and t_j. The value of t_j is chosen among the anchor points defined in the audio item a_j, since they represent relevant positions inside the musical piece. The operations performed by the algorithm are the following:

• For each feature h, compute the feature similarity measure Sim_h between the intervals [t_i, t_i + L] and [t_j, t_j + L] belonging to a_i and a_j respectively.

• According to the value of the weights, merge the similarity measures of the features into a single value:

    Sim = Σ_{h=1}^{H} w_h · Sim_h / Σ_{h=1}^{H} w_h    (5.14)

When the values of Sim are only compared with one another and the weights w_h remain constant, we can discard the normalisation factor:

    Sim' = Σ_{h=1}^{H} w_h · Sim_h    (5.15)

An example of the execution of the algorithm is shown in Figure 5.21: Figure 5.21a represents the expected feature evolution specified by the user and Figure 5.21b shows the value obtained by selecting the best option according to the previous algorithm.

Figure 5.21: An example of the fixed-length algorithm ((a) expected value of the feature, (b) actual value of the feature)

Variable-length algorithm

This algorithm, instead of accepting a pre-defined comparison interval length, dynamically chooses it according to the similarity value. Items with high similarity values will correspond to longer intervals, since they fit well together. Consider:

• a_i, the previous audio item,
• a_j, the next audio item,
• t_i, a time instant in a_i,
• t_j, a time instant in a_j,
• w_h, the weights of the features expressed by the user in real time.

The value of t_j is chosen among the anchor points defined in the audio item a_j, since they represent relevant positions inside the musical piece. The weights w_h specified by the user are used as feature similarity thresholds: the average similarity measure of feature h inside the interval computed by the algorithm should be greater than or equal to the value of w_h (Figure 5.22). If, for example, the weight w_h of feature h is 0.5, the system will create a transition in which the average similarity of feature h is at least 0.5.

Figure 5.22: Variable-length similarity algorithm

The operations performed by the algorithm are the following:

• For each feature h, compute the length L_h such that:

    L_h = max{ l | Sim_h(l') ≥ w_h  ∀ l' ∈ [0, l[ }    (5.16)

where Sim_h(l') is the similarity measure of feature h between the intervals [t_i, t_i + l'] and [t_j, t_j + l'] belonging to a_i and a_j respectively.

• Compute the overall interval length as the minimum length among the features:

    L = min_h { L_h }    (5.17)

5.3.3 Ranking system

In this section we describe the algorithm used by the system to understand and satisfy the musical preferences of the user.
The system differs from most audio players in that the learning is content-based: the system infers the preferences directly from the audio data, without using meta-data or attaching a likelihood variable to the audio items. The ranking system is used to sort the proposal list after it has been generated. Audio items are sorted by descending likelihood, so that the items at the top of the list (the ones the user sees first) are the ones he or she likes most.

GMM training phase

The system periodically trains a Gaussian Mixture Model (GMM) on the basis of the list of preferences of the user. The preferences are collected when the user forces a proposal selection closing; in fact, when the user closes a proposal selection in advance, he or she has usually chosen his or her favourite option among the proposed ones. When the proposal selection window expires (the system decides to close the proposal selection), no action is taken by the ranking system. The information collected consists of the mean values of the tempo, rms and brightness features (harmony has been discarded since it is not considered connected to user preferences) in the selected section of the audio item. The GMM is trained, using the EM algorithm (see chapter 3), once 10 preferences have been collected. Figure 5.23a shows an example dataset, extracted from a real execution of the system; Figure 5.23b shows the means of the trained GMM. The red circles represent the centres of the GMM components.

Figure 5.23: GMM training example ((a) a preference dataset, (b) the centres of the trained GMM)

GMM usage phase

After the GMM has been trained, it is used to order the audio items in the proposal list. For each of them, the mean values of the tempo, rms and brightness features are extracted and the likelihood is computed as follows:

    likelihood(X) = Σ_{k=1}^{K} P(λ_k|X)    (5.18)

where P(λ_k|X) is the probability of the k-th component of the GMM.

5.3.4 Transitions

We now concentrate on the transitions. A transition (Figure 5.24) is a section of the audio generated by the system in which two or more audio items are merged together.

Figure 5.24: Transition

In this delicate moment, the features may change very roughly, since the merged items may have different characteristics. The aim of this section is to describe some methods that solve this problem by modifying the original waveform in order to smooth the feature change. These methods mainly concern the tempo feature, whose change is most evident during the cross-fading operation.

Timescale

In order to smooth the tempo feature between two items, we apply a timescale operation to the audio items. Consider a small example: suppose that the first audio item's tempo is 120 BPM and the second's is 130 BPM. During the transition the system should:

• gradually increase the tempo of item 1 from 120 to 130 BPM;
• timescale item 2 to 120 BPM at the beginning of the transition and gradually increase its tempo to 130 BPM;
• consider the case of segments shorter than the length of the transition.

Figure 5.25 explains the concept.

Figure 5.25: Timescale operation in a transition

Using a phase vocoder to perform the timescale, we have to define a transformation from the new time scale to the original one (for each instant in the new time scale, the function returns the position in the original time scale):

    f : T_new ↦ T_old    (5.19)

We now describe a method to compute f given:

• L_old, the audio segment length in the non-time-scaled reference,
• β_start, the starting tempo ratio (i.e. the time-scaling factor at the beginning of the audio item),
• β_end, the ending tempo ratio.

We linearly interpolate between the two ratios (Figure 5.26), meaning that:

    df(τ)/dτ = a · τ + b    (5.20)

By setting

    df/dτ(0) = β_start    (5.21)

and

    df/dτ(L_new) = β_end    (5.22)

where L_new is the length of the transition in the new time scale (unknown for now), we get

    a = (β_end − β_start) / L_new    (5.23)

and

    b = β_start    (5.24)

Figure 5.26: Transition timescale linear approximation

In order to calculate a solution of the differential equation and determine L_new, we set as starting condition

    f(0) = 0    (5.25)

and we set the relationship between L_old and L_new:

    L_old = f(L_new)    (5.26)

We get

    f(τ) = ∫ (df/dτ) dτ = (a/2) · τ^2 + b · τ + 0 = ((β_end − β_start) / (2 · L_new)) · τ^2 + β_start · τ    (5.27)

    L_new = 2 · L_old / (β_start + β_end)    (5.28)

Up to now we are able to process a single audio item. To perform a complete transition, we have to set the values of β_start and β_end for the two items involved in the transition. Given BPM_out and BPM_in, the BPM (beats per minute) of the two audio items (the fade-out and fade-in items respectively), for the fade-out item we start with

    β_start,out = 1    (5.29)

and change the ratio to

    β_end,out = BPM_in / BPM_out    (5.30)

For the fade-in item,

    β_start,in = BPM_out / BPM_in    (5.31)

and

    β_end,in = 1    (5.32)

We may notice that, even if the two items have the same length, the lengths of the segments in the new time scale can, in general, be different: the item with the higher BPM will have the longer segment.

Scale factor reduction

In order to minimise the time-scale factor, we take into account the fact that two BPM values such that

    BPM_1 = n · BPM_2    (5.33)

where n is a natural number, can be considered equivalent for the timescale operation, since the beats are synchronised up to an integer multiple. Before executing the timescale, we try to detect the value of n:

    if bpm_out > bpm_in:
        n = round(bpm_out / bpm_in)
        bpm_out = bpm_out / n
    else:
        n = round(bpm_in / bpm_out)
        bpm_in = bpm_in / n

In this way the ratios β_start and β_end will fall inside the interval [2/3, 3/2]. If the result is still unsatisfactory, a threshold interval can be set (e.g. [0.9, 1.1]) and the timescale is performed only if the ratio is contained within that interval.

Transition beat synchronising

In the previous paragraphs, we discussed the problem of tempo change in the transition between two audio items, and used a dynamic time-scale algorithm to adapt the tempos of the two items. We now approach another problem: even if the two items have the same tempo, the beats should be synchronised in order to obtain a pleasant transition (Figure 5.27). This operation is performed through a peak detection in the correlation function between the two signals; the correlation is high when the two signals are well synchronised. Given the two audio items (fade-out and fade-in) x1(t) and x2(t):

Figure 5.27: Beat synch

• compute the cross-correlation function

    C(t) = ∫_{−∞}^{+∞} x1(τ) · x2(τ + t) dτ    (5.34)

• determine the maximum of the function

    t* = arg max_t C(t)    (5.35)

• merge the two audio items with an offset equal to t*.

5.3.5 Interface

In the following paragraphs we describe the interfaces between the user and the system. Due to the strong artistic component of the project, the quality of the system depends not only on its technical capabilities but also on the impression it makes on the final user.
5.3.5 Interface
In the following paragraphs we will describe the interfaces between the user and the system. Due to the strong artistic component of the project, the quality of the system depends not only on its technical capabilities but also on the impression it makes on the final user. This section therefore covers one of the most important parts of the system. Several interfaces have been developed; they can be used at the same time in a collaborative environment:
• Screen interface: normally the user interacts with the software through a traditional screen GUI (graphical user interface); the GUI is composed of buttons, list boxes and the other traditional windowing components.
• Tangible interface: similarly to what happens in the ReacTable, the system can be controlled through a set of tangible objects on a tabletop, whose positions are detected by a video device (video camera or webcam); the user controls the system by moving the objects on the table or turning them. In addition, the interface is equipped with a finger-tracking system.
• Tapping interface: a special input device has been developed to allow the user to specify a musical tempo value. The user sends a sequence of impulses to the system (tapping on a membrane, pressing a button, ...), which estimates the period between the impulses and calculates a BPM value.
• Wii Remote interface: the software can be controlled using the Wii Remote, a Bluetooth wireless controller used with the Nintendo Wii console. The remote is equipped with an accelerometer and a set of buttons; the accelerometer is used to detect the frequency at which the user moves the controller and thus set the tempo feature, while the buttons are used both for selecting the next audio item and for setting the rms and brightness features.

Screen interface
A Java Swing interface has been created in order to control each parameter of the system. The screen interface is used to edit the real-time parameters in a precise way (e.g. to specify a fine-tuned value of the features or of the feature weights). In addition, the screen interface allows the user to choose the next song to be played from a list of proposals made by the system. Figure 5.28 shows the main program window.

Figure 5.28: The program window

It is composed of:
• Next song selection (Figure 5.29), where the user can choose the song to be played from a list of proposals. Each entry shows the song name and the length of the transition, and highlights the portion of the items that will be used in the transition.

Figure 5.29: The next song selection entry

• Now playing panel (a.k.a. ZBar): shows the playlist of the songs that are going to be played.
• System parameters panel: the user can interact with the system using the right part of the screen interface.

System parameters panel
In this panel the user can modify the parameters used to generate the list of proposals. The design of this part of the system aims at a compromise between a light and an understandable user interface. The panel is divided into three tabs (Figure 5.30):
• Feature values tab: this panel allows the user to specify the values of the features. Three input methods have been designed:
– Tapping panel (used for the tempo), where the user can input the tempo by tapping it (see 5.3.5). The tapping can be performed either with the mouse or with the keyboard.
– Rms and brightness panel, where both the RMS and the brightness features can be edited at the same time. The panel allows the user to specify a point in a polar coordinate system in which the angle defines the value of the brightness and the radius the value of the rms (a mapping sketch is given at the end of this subsection).
– Mood panel, where the mood can be chosen among the four classes (Exuberant, Anxious, Depressed and Content).
• Feature weights tab: in this panel the user can specify the weights of the features by using a set of sliders.
• Settings tab: this tab contains the general settings that can be edited by the user. They are:
– Use Timescaling?: when enabled, the system will try to change the tempo of the songs in order to improve the transitions.
– Use Tabulist?: when enabled, the system will avoid playing the same song twice.
– One proposal per audio item?: when enabled, each song will appear at most once in the list of proposals.
– Enable fullscreen?
– Run time mode: the following settings influence the way in which the system generates the proposals:
∗ Normal: the system will continue playing the song unless the user forces a change.
∗ Skip: the system is forced to change song at each iteration.
∗ Continue: the system is forced to continue playing the same song.

Figure 5.30: System parameters tabbed panel. (a) Feature values tab; (b) Weights tab; (c) Settings tab
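As a small illustration of the rms/brightness panel described above, the sketch below converts a mouse click on the panel into the two feature values: the angle of the click around the panel centre is mapped to brightness and the (clamped) radius to rms. The class name, the normalisation and the value ranges are assumptions made for the example, not taken from the actual implementation.

// Illustrative mapping from a click on the polar rms/brightness panel to feature values.
public class PolarFeaturePanel {

    // Converts a click at pixel (x, y) in a panel of the given size into { rms, brightness }.
    public static double[] toFeatures(int x, int y, int width, int height,
                                      double rmsMax, double brightnessMax) {
        double dx = x - width / 2.0;
        double dy = (height / 2.0) - y;                        // screen y grows downwards
        double maxRadius = Math.min(width, height) / 2.0;
        double radius = Math.min(Math.hypot(dx, dy), maxRadius) / maxRadius;      // in [0, 1]
        double angle = (Math.atan2(dy, dx) + 2.0 * Math.PI) % (2.0 * Math.PI);    // in [0, 2*pi)
        double rms = radius * rmsMax;                          // radius controls the rms value
        double brightness = (angle / (2.0 * Math.PI)) * brightnessMax;  // angle controls brightness
        return new double[] { rms, brightness };
    }
}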
Tangible interface
The tangible interface allows the user to interact with the system by placing objects on a table. The architecture derives from the ReacTable structure (Figure 5.31a). The ReacTable's main user interface consists of a translucent table. Underneath the table there is a video camera, aimed at the underside of the table top and feeding video to a personal computer. There is also a video projector under the table, connected to the computer, which projects video onto the underside of the table top so that it can be seen from the upper side as well. A set of objects is placed on the table and a video device captures the scene (from above or below). In order to detect the position of the objects, a set of symbols called "fiducial markers" is attached to them (Figure 5.31b). The camera image is processed by a fiducial-detection program that identifies the fiducials and calculates their position and angle. The program used in this application is ReacTIVision (http://reactivision.sourceforge.net/), created by the ReacTable team and distributed under the GPL licence.

Figure 5.31: ReacTable system. (a) The ReacTable framework; (b) ReacTIVision fiducials

ReacTIVision is a standalone application which sends TUIO messages via UDP port 3333 to any TUIO-enabled client application. The TUIO protocol was initially designed within the ReacTable project for encoding the state of tangible objects and multi-touch events from an interactive table surface. It is an open framework that defines a common protocol and API for tangible multi-touch surfaces.

Based on this structure, we created a tangible interface in which the user can set the values of the features by placing objects on the table. In our application, the fiducials are divided into three categories (a mapping sketch follows the list):
• Feature weights: this set of fiducials controls the weights assigned to the features. When a fiducial is not visible, the value of the corresponding weight is set to zero. If it is visible, the value is determined by its vertical position (as in a music mixer) and ranges between 0.0 and 1.0. To reduce the space needed for the placement of the fiducials, the maximum value is reached in the middle of the table whereas the minimum value is reached near the edges (Figure 5.32).

Figure 5.32: Feature weights tangibles

• Feature values: this set of fiducials controls the values assigned to the features. When a fiducial is visible on the table, a value is assigned to the corresponding feature; it is determined by the horizontal position and ranges between the feature's minimum and maximum value (Figure 5.33). If the fiducial is not visible, the value of the feature is set to "null".

Figure 5.33: Feature values tangibles

• Special fiducials: some fiducials have a special meaning; when they are visible in the scene, they modify the behaviour of the system. In particular:
– Continue: forces the system to continue with the same musical piece when possible (i.e. it has not reached the end of the file)
– Skip: forces the system to change audio item
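The sketch below shows one possible mapping from the normalised fiducial positions reported over TUIO (x and y in [0, 1]) to the weight and value semantics just described: weights peak in the middle of the table and fall to zero at the edges, values are linearly interpolated along the horizontal axis, and a missing fiducial yields weight 0 or an unset value. The class and method names are hypothetical.

// Illustrative mapping from normalised fiducial positions (x, y in [0, 1]) to weights and values.
public class FiducialMapper {

    // Weight fiducials: 1.0 in the middle of the table, 0.0 at the edges; absent fiducial -> 0.
    public static double weightFromPosition(Double y) {
        if (y == null) return 0.0;                 // fiducial not visible on the table
        return 1.0 - Math.abs(2.0 * y - 1.0);      // peak at y = 0.5, zero at y = 0 and y = 1
    }

    // Value fiducials: the horizontal position interpolates between the feature's minimum
    // and maximum; an absent fiducial leaves the feature unset (null).
    public static Double valueFromPosition(Double x, double featureMin, double featureMax) {
        if (x == null) return null;                // feature value "null"
        return featureMin + x * (featureMax - featureMin);
    }
}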
In addition to this, the tangible interface is equipped with finger tracking and is able to detect the position of the user's fingers touching the surface of the table. This information is useful when combined with the projection of the screen interface onto the table; in this way the user can simulate a mouse by touching the table:
• When the user places his or her finger on the table, the system sends a "mouse button press" event to the host operating system, simulating a mouse click.
• When the user moves the finger while keeping it in contact with the table, the system simulates a mouse drag operation.
• When the user removes his or her finger from the table, the system sends a "mouse button release" event to the host operating system, simulating a mouse release.

Tapping interface
The tapping interface is used to specify the value of the tempo feature. The user sends a sequence of impulses to the system and the system estimates the tempo from the impulse period. Since this value can change in time, the algorithm should adapt to sudden or gradual changes. In our application, the user sends the train of impulses either by clicking with the mouse on the tapping panel of the screen interface or by pressing the space bar on the keyboard. By right-clicking on the tapping panel or by pressing the ESC key, the user can clear the detected tempo. This input method combines precision and simplicity of use, and it has been successfully employed in commercial products such as Sibelius 6, a musical score editing program, in which the user can "conduct" the playback by specifying the tempo with the computer keyboard or a predefined key of a digital piano.

The algorithm used to detect the tempo maintains the average tapping delay, updated at each impulse; it represents the average distance in time between two consecutive impulses (a code sketch follows the algorithm). A set of parameters influences the behaviour of the algorithm:
• d_min, the minimum delay between two impulses
• d_max, the maximum delay between two impulses
• α, the averaging factor; it expresses how strongly the most recent tapping delay influences the average (usually 0.33)
The local variables used by the algorithm are:
• t_previous, the previous impulse time
• N_tap, the number of valid impulses received so far; a valid impulse occurs with a delay between d_min and d_max from the previous one. When an invalid impulse is received it is discarded, otherwise N_tap is incremented.
• d_avg, the current estimate of the average tapping delay
Upon tapping at time t_i:
• if N_tap == 0 (it is the first tap), increase N_tap
• else if N_tap == 1
– if d_min ≤ (t_i − t_previous) ≤ d_max (the impulse is valid), we can compute the first estimate

d_{avg} = t_i - t_{previous}    (5.36)

and increase N_tap
– else (the impulse is not valid), ignore it
• else
– if d_min ≤ (t_i − t_previous) ≤ d_max (the impulse is valid), we update the estimate

d_{avg} = \alpha \cdot (t_i - t_{previous}) + (1 - \alpha) \cdot d_{avg}    (5.37)

and increase N_tap
– else (the impulse is not valid), ignore it
• In all cases, record the current timestamp

t_{previous} = t_i    (5.38)

The value of the average tapping delay is exported to the other components only after a predefined number of valid impulses has been received (in our case, when N_tap is greater than 3). In this way we avoid transitory oscillations at the beginning of the tapping.
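The estimator below follows the update rules (5.36)-(5.38) directly: valid delays are blended into an exponential moving average and the tempo is exported as BPM only after more than three valid impulses. Class and method names are hypothetical, and impulse times are assumed to be in milliseconds.

// Illustrative tap-tempo estimator following the averaging scheme described above.
public class TapTempoEstimator {

    private final long dMin, dMax;     // accepted delay range between taps (ms)
    private final double alpha;        // weight of the most recent delay (e.g. 0.33)

    private long tPrevious = -1;
    private int nTap = 0;
    private double dAvg = 0.0;

    public TapTempoEstimator(long dMin, long dMax, double alpha) {
        this.dMin = dMin;
        this.dMax = dMax;
        this.alpha = alpha;
    }

    // Registers an impulse at time t (ms) and updates the average tapping delay.
    public void tap(long t) {
        if (nTap == 0) {
            nTap++;                                   // first tap: nothing to average yet
        } else {
            long delay = t - tPrevious;
            if (delay >= dMin && delay <= dMax) {     // valid impulse
                dAvg = (nTap == 1) ? delay : alpha * delay + (1 - alpha) * dAvg;
                nTap++;
            }                                         // invalid impulses are simply discarded
        }
        tPrevious = t;                                // in all cases, record the last impulse time
    }

    // Returns the estimated BPM, or -1 until more than three valid impulses have been collected.
    public double bpm() {
        return (nTap > 3 && dAvg > 0) ? 60000.0 / dAvg : -1;
    }

    // Clears the detected tempo (bound to right-click / ESC in the screen interface).
    public void reset() {
        nTap = 0;
        dAvg = 0.0;
        tPrevious = -1;
    }
}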
Wii remote interface
The Wii Remote is the remote controller of the Nintendo Wii console. It is equipped with an infra-red camera used to track infra-red point light sources (not used in this system), a set of accelerometers used to measure the forces the remote is subject to, and some buttons. The interesting aspect of this device is that it can be connected to the computer through Bluetooth; a Java library called WiiRemoteJ, based on the JSR-82 Java Bluetooth specification, processes the data and provides an easy-to-use Application Programming Interface.

Figure 5.34: The Wii Remote

In the current work, the Wii Remote is used as a remote controller with the added capability of detecting the tempo by analysing the frequency at which the user shakes the controller. When the system detects a Wii Remote, it starts listening for button and accelerometer events. In order to let the user control when the accelerometer information is used, two interaction modes have been defined:
• Default mode: this mode is active when the "B" button is not pressed; the interaction with the system consists in the selection of a proposal made by the system.
– UP, DOWN buttons: control the selection of the song from the proposal list
– A button: select the proposal
– ONE button: set the run time mode to "default"
– TWO button: set the run time mode to "skip"
• Feature mode: this mode is active when the "B" button is pressed; the interaction consists in setting the values of the features. If the user shakes the Wii Remote, the system tries to estimate the tempo feature by performing a zero-crossing detection along one axis and using the same algorithm as the tapping interface (a sketch follows the list).
– UP, DOWN buttons: control the rms feature
– LEFT, RIGHT buttons: control the brightness feature
– HOME button: clear the feature values
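One possible realisation of the shake-based tempo detection is sketched below: rising zero crossings on one accelerometer axis are treated as taps and fed to the same estimator used by the tapping interface (the TapTempoEstimator sketch above). The callback signature is simplified and does not reproduce the WiiRemoteJ API; the axis value is assumed to be centred around zero (e.g. gravity removed or high-pass filtered).

// Illustrative shake-tempo detector reusing the tap-tempo estimator (sketch only).
public class ShakeTempoDetector {

    private final TapTempoEstimator estimator;   // see the tapping-interface sketch
    private double previousSample = 0.0;

    public ShakeTempoDetector(TapTempoEstimator estimator) {
        this.estimator = estimator;
    }

    // Called for every accelerometer sample while the "B" button is held.
    public void onAccelerationSample(double axisValue, long timestampMillis) {
        // Rising zero crossing: the chosen axis goes from negative to non-negative.
        if (previousSample < 0.0 && axisValue >= 0.0) {
            estimator.tap(timestampMillis);      // one crossing per shake cycle acts as a tap
        }
        previousSample = axisValue;
    }

    public double currentBpm() {
        return estimator.bpm();
    }
}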
Chapter 6
Evaluation
In this chapter we present an evaluation of the recommendation framework. Since the quality of the program is largely subjective, we chose not to assess it automatically; the evaluation is therefore qualitative rather than quantitative and has been carried out through a questionnaire proposed to a number of subjects. The chapter begins with a description of the instance of the system used for the evaluation. Afterwards, the audio datasets used to evaluate the system are described. Finally we describe the structure of the questionnaire, followed by the raw results and our final considerations.

6.1 Instance of the system
The recommendation framework described in the previous chapters is very general and can give rise to many different applications. To perform the evaluation we had to create an instance of the framework that could be accessible and easily understood by a wide range of people. Therefore we made the following decisions:
• System functionalities: the evaluated system is composed of the following components:
– Proposal generator, responsible for generating the list of fitting audio items according to the preferences expressed by the user
– Transition generator, responsible for performing the transition when the system changes musical piece. This component is able to time-scale the two audio items in order to make the transition between the two BPMs as smooth as possible.
The proposal ranking system has not been activated during the evaluation since it needs time to be trained; including it would have made the test longer, discouraging people from undertaking it. It is however possible, in case a tester is interested in experimenting with this feature, to retake the test including the proposal ranking system.
• Interface: the interface of the evaluated system consists of the screen interface only. The tangible and Wii Remote interfaces are not considered, since users may reach different levels of confidence with them and the evaluation could be biased.

6.2 Datasets
The set of audio items used by the system can be divided into two categories: the items used for the mood-detection training and the items used for the evaluation.

6.2.1 Mood training dataset
In 5.1.1 we described the use of three SVMs to classify the audio into four mood classes (Contentment, Exuberance, Anxiety and Depression). In this section we explain how the dataset on which the SVMs have been trained was created. For each class, a set of 5-second-long audio excerpts has been selected. The selection has been performed keeping in mind the characteristics of each class:
• Contentment: quiet music with a positive emotional content
• Exuberance: loud music with a positive emotional content
• Anxiety: loud music with a negative emotional content
• Depression: quiet music with a negative emotional content
Appendix B lists the files that have been used to train the mood extraction system.

6.2.2 Evaluation dataset
The audio database used during the evaluation phase is composed of 1347 items, mainly belonging to the following genres:
• Progressive rock (Alan Parsons Project, Pink Floyd, ...)
• Club/disco music (David Guetta, Gigi D'Agostino, ...)
• Metal (Sonata Arctica, Yngwie Malmsteen, ...)
• Classical (Bach, Brandenburg Concertos)
These four very different genres have been chosen to approximate the tastes of a wide audience, so that testers can use the system with music they are familiar with.

6.3 Test structure
The test aims to assess the quality of the proposal and transition generation systems. The experiment platform is a Windows 7 Professional Edition computer with an Intel Core 2 Duo T9400 2.53 GHz processor and 4 GB of memory. The test has been proposed to a total of 80 people of different ages, origins and study backgrounds. Figure 6.1 shows the composition of the tester set. It is biased toward the classes of prospective users to whom the program is addressed: young people with a scientific background (most of the subjects come from Europe, since the questionnaire was administered in Milan).
Figure 6.1: The characteristics of the tester set

A standard test format has been designed to allow an unbiased and appropriate evaluation of the system; it is composed of the following phases:
• Explanation (around 5 minutes): the functionalities of the system and the user interface are explained to the tester. In particular the user should know:
– the meaning of the controls of the value panel (tempo, rms/brightness, mood) and of the settings panel (in particular the run time mode: Normal, Skip and Continue)
– the meaning of the "Now playing" panel
– how to use the proposal list panel to express preferences; in particular, the user should be aware that the system automatically selects the first proposal if no change has been made
• Testing (10 minutes): in this phase the user tries the system by himself or herself.
• Questionnaire: the user fills in the questionnaire.
A shorter format is applied to the people who only listen to the music generated by the system; in this case they fill in a subset of the questions.
The questionnaire proposed to the user is divided into the following sections:
• Personal information: this section is aimed at understanding the characteristics of the users and classifying them
• Auditory questions: this section is focused on the audio output of the system and the first impression the user gets from the software
• Usage questions: this section can be filled in only by a person who interacted with the software and is focused on the usability of the system
• Comments
The full questionnaire, in its paper and electronic versions, is shown in Figures 6.2 and 6.3.

Figure 6.2: The paper version of the questionnaire
Figure 6.3: The electronic version of the questionnaire

6.4 Results of the questionnaire
In this section we show the results of the questionnaire; a brief comment has been added above each graph. The users have been segmented into the following classes, according to their level of knowledge of music:
EVALUATION 67 • Unclassified : not belonging to any of the two following classes • Listener : this user listen to music at least one hour a day and is able to recognize musical tracks and evaluate the transition between songs. The main focus of this user will be the output audio generated by the system. Requirements: listen to music more than one hour a day • Producer : this user knows how Music Production Software works and has already used one. This user is able to compare the system with the professional software. Requirements: used a Music Production Software 6.4.1 Auditory questions The first question (Did the system play pleasant music? ) analyses the overall perception of the music generated by the system. The result, displayed in Figure 6.4, shows a good value, uniform in the three classes. Figure 6.4: Did the system play pleasant music? The evaluation of the transition between songs (Figure 6.5) is quite high in all classes. Figure 6.5: How do you evaluate the transitions between songs? CHAPTER 6. EVALUATION 68 Concerning applications of the program (How well can the system be applied in the following fields? ) (Figure 6.6), we notice that producers are more reluctant to disco and pub applications since this is the field in which they excel. However, disco/pub and personal usage are in general the preferred ones. (a) Disco/pub performance (b) Personal use (iPod, ...) (c) Artistic exhibitions (d) Music analysis Figure 6.6: How well can the system be applied in the following fields? CHAPTER 6. EVALUATION 6.4.2 69 Usage questions The next set of questions (How do you rate the following...? ) evaluate the impact of the software to the user and the suitability of the graphical interface (Figure 6.7). The user interface is well rated whereas the speed and reactivity of the program does not satisfy the producers who need a fast real-time system. The evaluation of the number of parameter shows a peak near ”Normal” since the parameters that can be controlled mainly depends on the application and the market segment the software is directed to. (a) Screen interface (is the program intuitive?, appealing?, ...) (b) Speed (is it fast enough?) (c) Complexity (is it too difficult to use/understand?) (d) Parameter number (are there too many/few parameters to control?) Figure 6.7: How do you rate the following...? CHAPTER 6. EVALUATION 70 How well does the system suit you artistic needs? : the suitability to the artistic needs (Figure 6.8) is higher in the non-professional classes. Figure 6.8: How well does the system suit you artistic needs? Do you think the system could enhance your artistic performance? : the artistic performance improvement (Figure 6.9) is evident in the non-professional classes. Figure 6.9: Do you think the system could enhance your artistic performance? Does the system respond to the changes in the parameters? : the changes in the parameters (Figure 6.10) are well perceived by the users. Figure 6.10: Does the system respond to the changes in the parameters? CHAPTER 6. EVALUATION 71 How do you rate the music proposal made by the system? : the proposals made by the system (Figure 6.11) were considered quite positively. Figure 6.11: How do you rate the music proposal made by the system? Is the system intuitive (how much time do you need to learn how to use it)? : the system is considered very intuitive, after the 5 minutes explanation (Figure 6.12). Figure 6.12: Is the system intuitive (how much time do you need to learn how to use it)? 
6.4.3 Comments The last part of the questionnaire can be considered the most interesting part since the user gave their suggestions and advices that can help to improve the software. The critical points of the system and the improvements that could be made are described in chapter 7. 6.5 Overview of the result From the questionnaire, we are able to infer that the overall impression of the software is quite positive in all classes, although the producers are sceptical about some features, in particular the transition generation. The reactiveness of the software has been well evaluated. Usually, when people are using a realtime software, they expect that when they perform some changes in the parameters, the system promptly responds by modifying some parts of the screen interface. The actual extent of the changes is not important (the system may show only a part of the computation, continuing to elaborate in background); the crucial point is giving the user the impression that his or her need not ignored by the system. The system, thanks to the efficient proposal generation algorithms and the moderate dimension of the database, performed well under this point of view. CHAPTER 6. EVALUATION 72 The result of questions about the artistic suitability of the system highly depends on the testset classes. The system is not seen as essential by the producers, that are more prone to the traditional DJ systems. The listener however evaluate the system as a useful tool, even more when integrated in other platforms. In addition, non-professional users think that the system can really improve their artistic performance and capabilities. In our opinion, the system should consider this market segment in which the enhancement is more evident. About the applications of the system, professional users are sceptical about the disco/pub applications, since the performance and the transition methods is considered too poor. On one hand, this shows the need of improving those features by adding more transition types but, on the other hand, the evaluation of the producers is probably biased toward a traditional view of the task of a DJs and a worry of being somehow replaced by the system. As we previously said, the system still needs the presence of the human artist to give its best and should be seen as a new tool, not a competitor. The tester set is not enthusiastic about the idea of personal usage of the system, but still considers it very promising. We may motivate this answer in the following way: the system, as it is, gives access to too many parameters and does not automate any control; a personal media player application should be more autonomous and avoid the interaction with the user as much as possible, since he or she is usually just interested in listening to music. A very important result is the good evaluation of the ease-of-usage of the system; even nonprofessional user consider the software as being very intuitive and user-friendly. This point is crucial in the future evolutions of the system that involve a wide-public, since it should be accessible and used by any kind of person. This result also evaluates the user interface, stating that the design has reached its purpose. Chapter 7 Perspectives and future developments In this chapter we will list some improvements and extensions that can be applied to the system. These conclusions has been developed according to the result of the evaluation phase. 
7.1 System-level improvements Video application In the current work we explored the possibilities of audio feature extraction. A very similar research field in which analogous methods can be applied is ”video feature extraction”. The relationship between these two areas is clear: they both consist in calculating the value of some parameters from time-dependent data. The system can be easily applied also to this new field. Audio and video composition share many characteristics since they both aim at creating the novel by combining existing material. The system can become very interesting when the two approaches are exploited simultaneously by composing an audio+video stream and defining features that relate the two worlds (e.g. a user may want to generate a ”calm” music played along a ”calm” video or may want to find a background music that is somehow synchronised with the video scene). User interface study The issue of the user interface is far from being solved right now. A careful design of the visual and tangible interface has to be performed. The goodness of the system at the eyes of the users will be evaluated on the basis of the interface that should be as intuitive as possible and provide the fast interaction mechanism needed by a real time performance system. Interaction methods such as drag-and-drop or mouse motion detection could be interesting. A good example of graphical interface shown in Figure 7.1. In addition to this, the interface should give feedback to the user about the evolution of the features (e.g. showing the values of the feature of the output audio) and show the spectrogram or signal envelope in the proposal list and in the time-line. Versioning Until now, the system has been developed without considering the possible categories of users. In the next future, the production should consider different applications of the system and perform a careful selection of the functionalities of the software according to the 73 CHAPTER 7. PERSPECTIVES AND FUTURE DEVELOPMENTS 74 Figure 7.1: FL Studio interface needs of the different users. The different editions of the software could be: • STUDIO version: with the complete set of functionalities, used by professional users • DJ version: with an appealing user interface and optimised to be played in real time through the tangible interface • LITE version: used by non-professional users, consisting in an audio player that allows the user to specify a few relevant features Next song choice The user should have the possibility to choose the next song from the database, bypassing the system proposal. When this happens, the system would try to find the best fitting point in the song. Moreover a fast retrieval way should be developed. Usually this consists in allowing the user to start typing a part of the song name and incrementally constraint the database showing only the songs that contain the typed characters. Time-line editing In particular: The user should be able to edit the songs already inserted in the time-line. • Delete a song from the time-line if the playback has not started yet • Set the length of the transition between two audio items and the length of the playback of an audio item • Create loops; by doing so, the user specifies that the system should loop inside an audio segment CHAPTER 7. 
PERSPECTIVES AND FUTURE DEVELOPMENTS 75 Integration between the preprocessing and performance phase The preprocessing phase should be performed by the same software that plays the audio item, allowing the user to incrementally enlarge the database by adding audio items in real time. Since the analysis is very computationally expensive, it is could be done while the system is paused and still allowing the features values to be stored in an XML file to speed up the performance. Multiple proposal per audio item If there are multiple proposal for the same audio item (many sections of the song are compatible with the currently played song), they should be displayed in the same cell of the proposal list. Multiple transition types For the professional application it would be nice to allow the user select multiple types of transitions in order to adapt to different music styles and songs. The user could also define a custom transition type by defining the evolution of some parameters (spectrogram, volume, ...) during it. 7.2 Implementation improvements Feature extraction The more relevant features are extracted, the more interesting the system becomes. During the evaluation phase, a few users expressed the need of controlling more features or improving the range of values of the features (in particular the ”mood” feature should consider more nuances of emotions). A genre classification system could be also interesting. In addition, this aspect leads to the next paragraph, since the features should not be hard-coded in the software but the user should have the freedom of expanding the feature set. Modularity The system should be customisable and expandable, allowing the user to add new components and functionality at runtime. Due to the current early stage of the project, this aspect has not been considered yet. The modularity can be applied to the following areas: • features: the user should be able to create his or her own feature extraction function by implementing an exported interface. In order to define a feature, the user has to specify: – Feature domain: the set of values that the feature can assume – Feature extraction function: a function that extract the evolution of the feature from an audio signal – Feature similarity function: used to compare two values of the feature • input devices: the user should be able to control the system in a personalised way (e.g. using a MIDI keyboard, a virtual instrument or a mixer). The input device can control the evolution of the value of one or more features, change the feature weights or perform custom actions such as stopping the system or forcing a predefined behaviour. • output devices: the result of the system should be accessible by other programs such as real-time effect generators, audio editing or performance software. We may identify the following exportable items: – audio data: the audio generated by the program – feature evolution parameters: the value of the features of the played audio can be exported in order to be used by visualisation tools or other systems. CHAPTER 7. PERSPECTIVES AND FUTURE DEVELOPMENTS 76 Silence removal The system should remove the silence from the beginning and the end of the song as they should not be considered as part of the song. Manual anchor selection The user should be able to manually select the anchors in a song. Saving the result The system should allow the user to save the sequences of audio items that are played. 
The saved data should still allow the user to edit the sequence and adjust some settings after the playback. In addition, the system should be able to export the result in an audio file. Preview The system should allow the user to preview the transition and listen to the audio items before they are actually played in the time-line. Pause The system allow the performance to be paused. Formats 7.3 The system should be able to handle more audio formats (.mp3, .aiff, .ape, ...). From musical compositing to composition In this final section we would like to explore a new conception of music, based on the recommendation framework developed in this thesis. At this point, an application of the framework can be a dynamic music compositing software that can automatically create transitions and adapt the emotional aspects of music to a particular situation. We would like to think about the evolution of the system toward a musical composition software that embodies a new idea of music: the musical graph. When we listen to music or play it, we usually consider the score as a line in time that somehow evolves. On the one hand, time is inseparable from music, since musical notes, melodies or harmonies need a time component to be meaningful. On the other hand, in our mind, the exact evolution of music in time is not important: what we remember about a musical track is the emotion it gave us or some relevant sections or melodies. This is particularly clear in film music. The viewers are usually more likely to remember the main theme of a soundtrack (e.g. the theme of ”Indiana Jones” by John Williams) rather than the variations and arrangements made during the scenes of the movie. A new idea of music is born; a musical piece in the mind of the composer and the listener is a multidimensional space that is projected on time in the moment it is played. Musical themes, melodies, harmonies and rhythms are connected in a tight web that the composer can explore during the creation act. In the field of video games, the developers of the Lucasart’s adventure games designed iMUSE (1990),the first system that dynamically arranges music on the basis of the game scene. The composers arrange musical themes in a sort of musical graph where each node is a musical pattern (the main themes) and arcs are the transitions between those themes. On the basis of the inputs given by the player, the system merges music segments and plays a coherent music stream that follows the gameplay. The weakest point of this system is the need for expert composers and users to manually design transitions and themes. We would like to go beyond it and design a system that, based on a music database, can automatically create transitions and adapt the emotional aspects of music to a particular situation. It is here that Music Information Retrieval (MIR) comes to aid: modern MIR techniques allow CHAPTER 7. PERSPECTIVES AND FUTURE DEVELOPMENTS 77 the computer to automatically extract relevant audio features. This information can then be used to create smooth transitions between segments. To conclude, we would like to spend a few words about the concept of human creativity. In our opinion, the creation process is not just referred to the concept of creatio ex novo but also to the idea of finding new links and combinations among things that already exist ([46]). The very term ”composition” referring to musical creativity suggests this interpretation. It is sometimes surprising how new ideas derive just from the juxtaposition of existing material. 
Bibliography [1] Torsten Anders. Composing Music by Composing Rules: Computer aided composition employing Contstraint Logic Programming. PhD thesis, Queen University Belfast, 2003. [2] Fabio Antonacci, Antonio Canclini, and Augusto Sarti. Advanced Topics on Audio Processing. 2009. [3] Eiichiro Aoki, 1982. [4] Jean-Julien Aucouturier and Francois Pachet. Scaling up music playlist generation. 2001. [5] Luke Barrington, Reid Oda, and Gert Lanckriet. Smarter than genius? human evaluation of music recommender systems. 2009. [6] P. Bellini, P. Nesi, and M. B. Spinu. Cooperative visual manipulation of music notation. 2002. [7] Klaas Bosteels and Etienne E. Kerre. A fuzzy framework for defining fynamic playlist generation heuristics. Fuzzy sets and systems, pages 3342–3358, 2009. [8] Jean Bressin, Carlos Agon, and Gerard Assayag. Openmusic 5: A cross-platform release of the computer-assisted composition environment. 2006. [9] William A. S. Buxton. A composer’s introduction to computer music. 1975. [10] Rui Cai, Chao Zhang, Lei Zhang, and Wei-Ying Ma. Scalable music recommendation by search. 2007. [11] Chris Chafe. Case studies of physical models in music composition. 2003. [12] Sarit Chantasuban and Sarupa Thiemjarus. Ubiband: A framework for music composition with bsns. IEEE Xplore, pages 267–272, 2009. [13] Yap Siong Chua. Composition based on pentatonic scales: a computer-based approach. 1991. [14] DLKW. Codeorgan. URL http://www.codeorgan.com/. [15] Todor Fay, 1995. [16] Yazhong Feng, Yueting Zhuang, and Yunhe Pan. Music information retrieval by detecting mood via computational media aesthetics. 2003. [17] Derry Fitzgerald. Automatic Drum Trascription and Source Separation. 2004. [18] Olivier Gillet and Gael Richard. Extraction and remixing of drum tracks from polyphonic music signals. pages 315–318, 2005. [19] E. Gmez. Tonal description of music audio signal. PhD thesis, Universitat Pompeu Fabra, 2006. 78 BIBLIOGRAPHY 79 [20] Sten Govaerts, Nik Corthaut, and Erik Duval. Mood-ex-machina: towards automation of moody tunes. 2007. [21] Evolutionary System Group. http://musigen.unical.it/. Social network di musica generativa. URL [22] Martin Henz, Stefan Lauer, and Detlev Zimmermann. Compoze - intention-based music composition through constraint programmimg. 2009. [23] Sergi Jordà and Otto Wuest. A system for collaborative music composition over the web. 2001. [24] Sergi Jordà, Martin Kaltenbrunner, Günter Geiger, and Ross Bencina. The reactable. 2004. [25] Ajay Kapur, Manj Benning, and George Tzanetakis. Query-by-beat-boxing: Music retrieval for the dj. 2004. [26] Krumhansl. Cognitive foundations of musical pitch. Oxford UP, 1990. [27] Michael Z. Land and Peter N. McConnel, 1991. [28] Cyril Laurier and Perfecto Herrera. Mood cloud: A real-time music mood visualization tool. 2008. [29] Tao Li and Mitsunori Ogihara. Toward intelligent music information retrieval. 2006. [30] Xuelong Li, Dacheng Tao, Stephen J. Maybank, and Yuan Yuan. Visual music and musical vision, 2008. URL http://www.elsevier.com/locate/neucom. [31] Dan Liu, Lie Lu, and Hong-Jiang Zhang. Automatic mood detection from acoustic music data. 2003. [32] Shazam Entertainment Ltd. Shazam, 2010. URL http://www.shazam.com/. [33] Niitsuma Masahiro, Hiroshi Takaesu, Hazuki Demachi, Masaki Oono, and Hiroaki Saito. Development of an automatic music selection system based on runner’s step frequency. 2008. [34] Owen Craigie Meyers. A mood-based Music Classification and Exploration System. PhD thesis, Massachussetts Institue of Technology, 2007. 
[35] Alexandros Nanopoulos, Dimitrios Rafailidis, Maria M. Rixanda, and Yannis Manolopoulos. Music search engines: Specification and challenges. 2009. [36] Elias Pampalk, Arthur Flexer, and Gerhard Widmer. Improvements of audio-based music similarity and genre classification. 2005. [37] Bryan Pardo. Finding structure in audio for music information retrieval. IEEE Signal Processing Magazine, 2006. [38] Steffen Pauws, Win Verhaegh, and Mark Vossen. Music playlist generation by adapted simulated annealing. Information Sciences, pages 647–662, 2007. [39] Mauro C. Pichiliani and Celso M. Hirata. A tabletop groupware system for computer-based music composition. 2009. [40] Pietro Polotti and Davide Rocchesso. Sound to Sense - Sense to Sound: A state of the art in Sound and Music Computing. 2008. BIBLIOGRAPHY 80 [41] Giorgio Prandi, Augusto Sarti, and Stefano Tubaro. Music genre visualization and classification exploiting a small set of high-level semantic features. 2009. [42] Gordon Reynolds, Dan Barry, Ted Burke, and Coyle Eugene. Towards a personal automatic music playlist generation algorithm: The need of contextual information. 2007. [43] Alexander P. Rigopulos and Eran B. Egozy, 1995. [44] R. Roth. Music and animation tool kit: Modules for computer multimedia composition. Computers Mathematical Applications, pages 137–144, 2009. [45] Man-Kwan Shan, Fang-Fei Kuo, and Suh-Yin Lee. Emotion-based music recommendation by affinity discovery from film music. Expert Systems with Applications, 2009. [46] Peyman Sheikholharam and Mohamad Teshnehlab. Music composition using combination of genetic algorithms and kohonen grammar. 2008. [47] Muneyuki Unehara and Takehisa Onisawa. Interactive music composition system. pages 5736–5741, 2004. [48] Yi-Hsuan Yang, Lin Yu-Ching, and Homer H. Chen. Music emotion classification: a regression approach. 2007. Appendix A User manual In this chapter we will describe how to use the components of the system. The first section describes how to install the needed software, whereas the second section explains the commands and instruction that have to be executed. The work-flow is the following: starting from a set of audio files, a MATLAB script is used to extract the features and save them in an xml file. After that, the Java performance program can be executed. A.1 Prerequisites In order to execute the software, some programs need to be installed. Figure A.1: MATLAB interface MATLAB MATLAB stands for ”MATrix LABoratory” and is a numerical computing environment. Developed by The MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, and Fortran. MATLAB can be purchased at http://www.matlab.com. MIRtoolbox MIRtoolbox offers an integrated set of functions written in Matlab, dedicated to the extraction from audio files of musical features such as tonality, rhythm, structures, etc. The objective is to offer an overview of computational approaches in the area of Music Information Retrieval. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms. These building blocks form Figure A.2: MIRtoolbox the basic vocabulary of the toolbox, which can then be freely articulogo lated in new original ways. 
These elementary mechanisms integrates all the different variants proposed by alternative approaches - including new strategies we have developed -, that users can select and parametrize. This synthetic digest of feature extraction tools enables a capitalization of the originality offered by all the alternative strategies. Additionally to the basic computational processes, the toolbox also includes higher-level musical feature extraction tools, whose alternative strategies, and their multiple combinations, can be selected by the user. 81 APPENDIX A. USER MANUAL 82 The choice of an object-oriented design allows a large flexibility with respect to the syntax: the tools are combined in order to form a sets of methods that correspond to basic processes (spectrum, autocorrelation, frame decomposition, etc.) and musical features. These methods can adapt to a large area of objects as input. For instance, the autocorrelation method will behave differently with audio signal or envelope, and can adapt to frame decompositions. The toolbox is conceived in the context of the Brain Tuning project financed by the European Union (FP6-NEST). One main objective is to investigate the relation between musical features and music-induced emotion and the associated neural activity. Java Runtime Environment (JRE) A Java Virtual Machine (JVM) enables a set of computer software programs and data structures to use a virtual machine model for the execution of other computer programs and scripts. The model used by a JVM accepts a form of computer intermediate language commonly referred to as Java bytecode. This language conceptually represents the instruction set of a stack-oriented, capability architecture. Programs intended to run on a JVM must be compiled into a standardized portable binary format, which typically comes in the form of .class files. A program may consist of many classes in different files. For easier distribution of large programs, multiple class files may be Figure A.3: Java Runtime packaged together in a .jar file (short for Java archive). Environment logo The JVM runtime executes .class or .jar files, emulating the JVM instruction set by interpreting it, or using a just-in-time compiler (JIT) such as Sun’s HotSpot. JIT compiling, not interpreting, is used in most JVMs today to achieve greater speed. Ahead-of-time compilers that enable the developer to precompile class files into native code for a particular platforms also exist. Like most virtual machines, the Java Virtual Machine has a stack-based architecture akin to a microcontroller/microprocessor. However, the JVM also has low-level support for Java-like classes and methods, which amounts to a highly idiosyncratic memory model and capabilitybased architecture. The JVM, which is the instance of the ’JRE’ (Java Runtime Environment), comes into action when a Java program is executed. When execution is complete, this instance is garbage-collected. JIT is the part of the JVM that is used to speed up the execution time. JIT compiles parts of the byte code that have similar functionality at the same time, and hence reduces the amount of time needed for compilation. BlueCove Java Library BlueCove is a freeware implementation of the Java JSR-082 Bluetooth specification. It provides a platformindependet layer that can be used by the Java classes. Figure A.4: logo A.2 A.2.1 Bluetooth Running the system Feature extraction The feature extraction is performed by a MATLAB script. 
In order to run the script, the audio files should be downsampled to 11025 Hz, converted to .wav and placed in a folder called ”audio” in the script current folder. After that, it is necessary to run the script ”batchMirAnalysis.m” that generates the feature XML files and stores them in the ”xml” folder. APPENDIX A. USER MANUAL 83 For each file, a set of xml documents are created: • filename anchors.xml: this document contains the position of the anchors • filename brightness.xml: this document contains the values of the brightness feature • filename harmony.xml: this document contains the values of the harmony feature • filename mfcc.xml: this document contains the values of the mfcc feature • filename mood.xml: this document contains the values of the mood feature • filename rms.xml: this document contains the values of the rms feature • filename tempo.xml: this document contains the values of the tempo feature A.2.2 Performance The result of the analysis phase can be used to run the Java program and obtain a performance. To do so, the user has to copy the audio (11025 Hz .wav files) and xml documents and place them in a folder inside the ”audio” and ”xml” subfolders respectively. The path of the folder can be set editing the ”polysound.properties” file (see later). It is now possible to run the software by typing: java Main in the program main directory. A.2.3 Configuration In order to allow the user to configure the default settings, a properties file has been created whose values are loaded by the program at startup. The file is located in the root folder of the program (the same as ”Main.class” file). The configuration file is a set of key/value pairs and it is structured as following: #Comment key value key value ... The keys are the following: • AudioDatabase.audioPath (String): it stores the path of the audio folder (the one containing audio files). There could be multiple audio folders; in this case, the paths are separated by ”;” • AudioDatabase.xmlPath (String): it stores the path of the XML folder (the one containing the XML files). There could be multiple XML folders; in this case, the paths are separated by ”;” and they should appear in the same order w.r.t the correspondent audio folder • ProposalGeneratorSettings.minFadeLength (long): the maximum length (in samples) of the segments used for transitions APPENDIX A. 
USER MANUAL 84 • ProposalGeneratorSettings.maxFadeLength (long): the minimum length (in samples) of the segments used for transitions • ProposalGeneratorSettings.proposalListMaxLength (int): the maximum number of proposal made by the system at each iteration (default: 20) • ProposalGeneratorSettings.harmonyWeight (double): the initial harmony weight (default: 0.33) • ProposalGeneratorSettings.tempoWeight (double): the initial tempo weight (default: 0.33) • ProposalGeneratorSettings.brightnessWeight (double): the initial brightness weight (default: 0.33) • ProposalGeneratorSettings.rmsWeight (double): the initial rms weight (default: 0.33) • ProposalGeneratorSettings.moodWeight (double): the initial mood weight (default: 0.33) • ProposalGeneratorSettings.useTabuList (boolean): specifies if the system should use the tabu list (default: true) • ProposalGeneratorSettings.oneProposalPerDatabaseAudioItem (boolean): specifies if the system should display only one proposal per audio item during the proposal generation phase (default: true) • RunTimeSettings.runTimeModeEnum (NORMAL—SKIP—CONTINUE): defines the initial run time mode (default: NORMAL) • TransitionGeneratorSettings.useTimescale (boolean): defines if the system should use timescale (default: false) • ProposalRecommender.pointListXMLFilename (String): the path of the files in which the player history will be saved. If the file exists, the system reads it at the beginning of the execution and then updates it Appendix B The audio database In this chapter we attach the list of audio items used during the evaluation phase, both for the mood training and the playback. B.1 Mood detection training set This section contains the list of files used during the training phase of the mood detection algorithm. They are divided according to the mood class. B.1.1 Anxious Liberi Fatali Maybe I’m A Lion Only A Plank Between One and Perdition Force Your Way Other World Start Hurry Attack The Legendary Beast B.1.2 Contentment May it be J.S. BachConcerto No.6 in B flat major BWV III Allegro J.S. BachConcerto No.5 in D major BWV III Allegro J.S. BachConcerto No.3 in G major BWV VI Allegro J.S. BachConcerto No.6 in B flat major BWV I Allegro J.S. BachConcerto No.3 in G major BWV IV Allegro B.1.3 Depression J.S. BachConcerto No.1 in F major BWV II Adagio J.S. BachConcerto No.3 in G major BWV V Adagio (from Trio Sonata Drifting in G major BWV 1048) J.S. BachConcerto No.5 in D major BWV II Affetuoso Total Eclipse J.S. BachConcerto No.2 in F major BWV II Andante Tragedy J.S. BachConcerto No.6 in B flat major BWV II Adagio ma non tanto Path of Repentance Ominous B.1.4 Exuberance 85 APPENDIX B. THE AUDIO DATABASE 86 Orinoco flow Don‘t Hold Back Little Hans One More River Pin floi B.2 The performance database This section contains the list of files used during the evaluation phase. The items belong to different genres to test the system’s ability to change from a style to another and providing the user with music they may know. J.S. Bach - Concerto No.1 in F major BWV I Am A Mirror Avalanche 1046 - I Allegro I‘d Rather Be a Man Damned If I Do J.S. Bach - Concerto No.4 in G major BWV Mammagamma 04 Funny You Should Say That 1049 - I Allegro Some Other Time Hawkeye [Instrumental] A Dream Within A Dream The Tell-Tale Heart Hawkeye Heaven Knows Wine From The Water Inside Looking Out I Robot J.S. Bach - Concerto No.1 in F major BWV L’Arc En Ciel La Sagrada Familia 1046 - IV Minuetto-Trio I-Polonaise-Trio II The Fall Of The House Of Usher I Let’s Talk About Me J.S. 
Appendix B

The audio database

In this appendix we list the audio items used during the evaluation phase, both for the mood-detection training and for the playback tests.

B.1 Mood detection training set

This section contains the list of files used during the training phase of the mood detection algorithm, divided according to the mood class.

B.1.1 Anxious

Liberi Fatali
Maybe I'm A Lion
Only A Plank Between One and Perdition
Force Your Way
Other World
Start
Hurry
Attack
The Legendary Beast

B.1.2 Contentment

May it be
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - III Allegro
J.S. Bach - Concerto No.5 in D major BWV 1050 - III Allegro
J.S. Bach - Concerto No.3 in G major BWV 1048 - VI Allegro
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - I Allegro
J.S. Bach - Concerto No.3 in G major BWV 1048 - IV Allegro

B.1.3 Depression

J.S. Bach - Concerto No.1 in F major BWV 1046 - II Adagio
J.S. Bach - Concerto No.3 in G major BWV 1048 - V Adagio (from Trio Sonata in G major BWV 1048)
J.S. Bach - Concerto No.5 in D major BWV 1050 - II Affetuoso
J.S. Bach - Concerto No.2 in F major BWV 1047 - II Andante
J.S. Bach - Concerto No.6 in B flat major BWV 1051 - II Adagio ma non tanto
Drifting
Total Eclipse
Tragedy
Path of Repentance
Ominous

B.1.4 Exuberance

Orinoco flow
Don't Hold Back
Little Hans
One More River
Pin floi

B.2 The performance database

This section contains the list of files used during the playback evaluation. The items belong to different genres, in order to test the system's ability to move from one style to another while providing the user with music they may already know.

[Track listing: several hundred items, ranging from movements of J.S. Bach's Brandenburg Concertos to progressive rock, pop and dance/house titles.]
John Leg- Ricardo end Laidback Luke Remix PONTE & PAKI remix Go Muse - 03 - Time is Running Out Robert Miles - Children (Full Version) The Clash - 04 - Rock The Casbah Muse - 08 - Hysteria Robert Miles - Freedom The Clash - 05 - Red Angel Dragnet Muse - The Resistance - 03 - Undisclosed Roberto Molinaro - Hurry Up The Clash - 07 - Overpowered By Funk Desires Robin S vs Steve Angello & Laidback Luke - The Clash - 08 - Atom Tan Music Response Show Me Love vs Be (Hardwell remix) The Clash - 09 - Sean Flynn Mylo - Drop The Pressure rock the house The Clash - 10 - Ghetto Defendant N trance - Da ya think I’m sexy Roisin Murphy - Overpowered (Herve & The Clash - 11 - Inoculated City Nari & Milani and Cristian Marchi with Max Roisin in the Secret Garden) (162k 5m29s) The Clash - 12 - Death Is A Star C - Let It Rain (Cristian Marchi & Paolo Rudenko - Everybody (Club Mix) The dark of the matinee FASANO - vs.OUTWORK ELECTRO Villalobos - - Enfants FASANO - GABRY Kid Cudi - She came along Simone Jay - Wanna Be Like A Man Smack My Bitch Up Robb-Ininna Tora(Nick Corline The Clash - 03 - Should I Stay Or Should I APPENDIX B. THE AUDIO DATABASE 96 The Doors - 01 - Break On Through do Rio (Vocal Extended) [by zZz] Where are we runnin’ The Doors - 01 - Hello, I Love You Tiko’s Groove feat. Mendonca Do Rio - Me Whigfield - Saturday Night(1) The Face Vs. Adam Shaw & Mark Brown - Faz Amar (Vocal Extended Mix) Whigfield - When I think of you Needin U (Original Mix Version 2) Tim Deluxe - It Just Won’t Do Who Da Funk feat. Jessica Eve - Shiny Disco The Gossip - Heavy Cross Tom Novy - Your Body (Radio Edit) The Gossip - Music For Men - 04 - Love Long Treasure Distance Dancefloor The Guru Josh Project - Infinity 2008(radio [www.worldofhouse.es] Winehouse, Amy Rehab edit) Trentemoller - The Trentemoller Chronicles Wouldn’t It Be Nice The Hives - Two Timing Touch And Broken cd 1 - 02 - Klodsmajor Wuthering Heights Bones Trentemoller - The Trentemoller Chronicles Yolanda Be Cool & DCUP-dj roma the white The House Keepers - Runaway(DJ Umile Full cd 1 - 03 - Mcklaren (trentemoller remix) Tracklist - We No Speak Americano (Original Vocal Mix)www.mp30.er.pl Trentemoller - The Trentemoller Chronicles Mix) [320] The Housekeepers - Go down cd 1 - 11 - Rykketid You only live once The housekeepers Hangin’ on Trentemoller - The Trentemoller Chronicles yuksek - tonight The Smashing Pumpkins - 02 - Ava Adore cd 1 - 13 - Moan (Trentemoller Remix Radio Yves Laroque - Rise up The Strokes-Reptilia Edit) Theophilus London - TNT Trio - Dadada Tiko’s Groove - Para Sambar feat Mendoca Tutti i miei sbagli Fingers (Laidback - Cross Luke Balls The Remix) Will I Am - The Donque Song (Fedde Le Grand remix) by ALEX INC Appendix C A short story of computer-aided composition The field of computer-aided composition spreads from artificial intelligence to humanistic studies. We may say that musical composition systems derive from the very desire of men to create machines able to emulate their behaviors and creative intuition. If we recall the 16th Century legend of the golem, an animated anthropomorphic being created entirely from inanimate matter, we can see how far this desires goes in time. The golem was created by men at their own image and could be seen as the first idea of a ”computing device”: it could execute any order by writing a specific series of letters on parchment and placing the paper in a golem’s mouth. 
In addition to this, humans want to be surprised by a system that creates the "unexpected" and the "novel", and that therefore goes beyond its creator. Buxton [9] shows that, already in 1975, computer-aided composition was regarded as an interesting area for research and artistic creation.

C.1 Algorithmic composition

Figure C.1: Relevant music scores. (a) John Cage, Variations; (b) Karlheinz Stockhausen, Tunnel Spiral.

The term "algorithmic composition" refers to a category of systems that compose music by applying a set of rules. The rule-based approach offers many advantages, such as complete control over the result, but, since music is linked to creativity and not only to the definition of a set of rules, algorithmic composition suffers from staticity and predictability. From the end-user side, the advantage of algorithmic composition systems is the possibility of drastically reducing the amount of user interaction during the generation phase: the developer can define a large set of rules and let the system compose music with no input. This makes rule-based composition systems accessible also to non-expert users. The disadvantage is the complexity of the system compared to the simplicity of the generated music, which is usually biased toward a particular style (Bach-style, techno, piano solo music, ...). Moreover, the relevance of some musical authors or scores lies precisely in their rejection of some classic rules.

Among the earliest musical composition systems are the so-called "giochi armonici" (literally, "harmonic games"; 18th century, credited to W.A. Mozart): combinatory tables that allow, by rolling dice, the composition of a virtually infinite number of melodies or "minuetti". More recently, composers such as John Cage and Karlheinz Stockhausen explored new ways of writing and executing a musical score, leaving a certain degree of freedom to the player, usually an electronic device (Figure C.1). From then on, many attempts have been made to exploit the growing computational power of computers, aiming at the creation of a music composition system that can autonomously arrange and compose an appealing musical piece. It is clear that the notion of "appealing musical piece" depends on many factors (such as the musical training or the culture of the listener), and the task is therefore ill-posed. In order to circumvent this fact, music composition systems usually restrict their scope to a particular slice of the musical panorama (western music, disco music, ...).

A first category of systems uses rules of traditional western music and melodic heuristics to create a musical melody and harmony. Aoki [3] uses traditional harmony and counterpoint rules to generate a musical score, Chua [13] uses a random number generator to select notes from the pentatonic scale, whereas Rigopulos and Egozy [43] let the user decide the characteristics of the generated musical piece by means of a joystick. A more formalized approach makes use of mathematical logic to model a musical piece (Anders [1] and Henz et al. [22]): the user specifies the expected musical rules as logic formulas, and the system finds a realization that satisfies them.
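As a toy illustration of this first category (and not a reconstruction of any of the systems cited above), the sketch below selects random notes from the C-major pentatonic scale, in the spirit of Chua [13], and adds a single hard-coded rule; the class and method names are invented for illustration.

import java.util.Random;

// Toy rule-based generator: random notes from the C-major pentatonic scale,
// with one hard-coded rule (the phrase must end on the tonic).
public class PentatonicSketch {
    private static final String[] SCALE = {"C", "D", "E", "G", "A"};

    public static String[] generateMelody(int length, long seed) {
        Random random = new Random(seed);
        String[] melody = new String[length];
        for (int i = 0; i < length; i++) {
            melody[i] = SCALE[random.nextInt(SCALE.length)];
        }
        melody[length - 1] = SCALE[0]; // rule: close the phrase on the tonic
        return melody;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", generateMelody(8, 42L)));
    }
}

Real systems of this kind replace the random choice with harmony, counterpoint or user-controlled constraints, which is where the cited approaches differ.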
C.2 Composition environments

Another type of system explicitly requires human creativity and offers an environment in which the composer creates his or her music; the intended user is therefore a musically trained person. These systems usually consider music as non-linear: a musical score is no longer a line with a beginning and an end, but rather a graph in which nodes represent musical segments and edges represent possible continuations of those segments. The performer can move through the graph and produce a particular time evolution of the composition. The applications of these systems are usually very specific: Microsoft integrated DirectMusic (Fay [15]) into the DirectX framework, and LucasArts used iMUSE (Land and McConnell [27]) as the music engine for their games.

Figure C.2: iMUSE logo

iMUSE

iMUSE (Interactive MUsic Streaming Engine) (Land and McConnell [27]) is an interactive music system used in a number of LucasArts video games. The idea behind iMUSE is to synchronize music with the visual action in a video game, so that the audio continuously matches the on-screen events and transitions from one musical theme to another are done seamlessly. iMUSE was developed in the early 1990s by composers Michael Land and Peter McConnell while working at LucasArts, and it was added to the fifth version of the SCUMM (Script Creation Utility for Maniac Mansion) game engine in 1991. It grew out of Michael Land's frustration with the audio system used by LucasArts while he was composing "The Secret of Monkey Island"; his goal was to create a system that would enable the composer to set the mood via music according to the events of the game. The first game to use iMUSE was "Monkey Island 2: LeChuck's Revenge", and it has been used in all LucasArts adventure games since. It has also been used for some non-adventure LucasArts titles, including "Star Wars: X-Wing", "Star Wars: TIE Fighter", "Star Wars: Dark Forces" and "X-Wing Alliance".

iMUSE uses standard MIDI files to which some control signals are added; the input data is therefore polyphonic and represented as a sequence of musical notes. The issues related to digital signal processing and audio feature extraction are greatly simplified here, since the system has precise information about the score. The actions that may be taken are:

• move the execution to a certain point in the file
• adjust a MIDI controller such as volume, pitch, ...
• enable/disable an instrument

Decision points are placed in the performance data by the composer. Upon encountering a decision point, the sound driver evaluates the corresponding condition and determines what action to take based on the events occurring in the game. It is therefore possible to trigger a musical piece when a combat scene starts, or to change the character of the music when the player moves from room to room.
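As an abstract illustration of the decision-point mechanism (this is not the actual iMUSE driver API; the class, the game-state flag and the cue names are invented), a condition attached to a decision point might be evaluated as follows.

// Abstract sketch of a decision point: when playback reaches a marked position,
// the driver inspects the game state and chooses the next musical action.
public class DecisionPointSketch {

    enum GameState { EXPLORING, COMBAT }

    static String nextCue(GameState state, String currentCue) {
        // Hypothetical condition attached by the composer to this decision point.
        if (state == GameState.COMBAT) {
            return "combat_theme";      // jump to another point in the MIDI data
        }
        return currentCue;              // keep playing the current theme
    }

    public static void main(String[] args) {
        System.out.println(nextCue(GameState.EXPLORING, "room_theme")); // room_theme
        System.out.println(nextCue(GameState.COMBAT, "room_theme"));    // combat_theme
    }
}

In the real engine the same mechanism also drives controller adjustments (volume, pitch) and instrument muting, as in the list above.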
OpenMusic

OpenMusic 5 (Bressin et al. [8]), developed by IRCAM, is a more sophisticated environment than iMUSE: it allows a composer to use a kind of "music programming language" (similar to Csound or PureData), assisted by a graphical interface. Visual programs are created by assembling and connecting icons that represent functions and data structures. Most programming operations are performed by dragging an icon from one place and dropping it in another. Built-in visual control structures (e.g. loops) are provided, which interface with the underlying Lisp ones.

Figure C.3: OpenMusic 5

OpenMusic may be used as a general-purpose functional/object/visual programming language. At a more specialized level, a set of provided classes and libraries makes it a very convenient environment for music composition. Different representations of a musical process are handled, among which are common notation, MIDI piano-roll and the sound signal.

C.3 Interactive composition

Besides the algorithmic (rule-based) systems and the composition environments, there exists another category of systems, the interactive composition systems, which are designed to assist the person in the creative process without replacing him or her and without being relegated to the pre-production of the musical piece.

A mixture of algorithmic and traditional composition can be found in the article by Unehara and Onisawa [47], where genetic algorithms and machine learning techniques are exploited. The user iteratively trains the system by expressing appreciation for good sections and discarding bad ones. The system starts by randomly composing a group of melodies, tones (i.e. harmonizations) and backing patterns (i.e. accompaniments), and then lets the user select the preferred ones. Based on the preferences of the user, a genetic algorithm creates a new set of melodies, tones and backing patterns, keeping only the best options, and the process is iterated.

Another research field concentrates on the generation of music based on movement or visual features. These systems may be considered a mixture of a computer-aided composition system and a musical instrument. Wireless sensor networks are exploited by Chantasuban and Thiemjarus [12] to detect human movements. The system is composed of a number of wireless nodes, attached to the performer's body, that contain sensors (accelerometers, heat sensors, ...); the nodes are hierarchically organized: each node processes the information coming from the nodes below it in order to create a higher level of abstraction and, finally, to generate musical signals. In this way the performer can control the music production with the movements of his or her body.

Other systems are based on the extraction of music from images or shapes. The concept of "synaesthesia" is explored in an attempt to find the hidden link between human senses. Roth [44] generates music based on the characteristics of predefined shapes (spirals, circles, crosses, ...); when a shape appears on the screen, a predefined type of music is played, and the speed, pitch and harmony of the music depend on the properties of the shape (translation or rotation speed, color, ...). The result is far from being appealing or musically relevant, but it can be considered a good starting point. In the article by Li et al. [30], a system is trained to associate music with images, based on features such as color and blur. DLKW [14] transforms HTML code into music by selecting riffs and notes from a predefined database. Group [21] proposes many applications that generate music from chaotic models or concrete music installations. Polotti and Rocchesso [40] and Chafe [11] offer a general overview of current trends in human-computer musical interaction from both the technological and the artistic side.
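To make the preference-driven loop of Unehara and Onisawa [47] more concrete, the following deliberately simplified sketch keeps the melodies the user rated best and refills the population with mutated copies; the integer encoding, the selection scheme and the mutation rule are invented for illustration and are not taken from the cited paper.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Simplified user-in-the-loop evolution step over integer-encoded melodies.
public class InteractiveEvolutionSketch {

    static List<int[]> evolve(List<int[]> population, double[] userRatings, Random rnd) {
        // Sort melody indices by descending user rating and keep the better half.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < population.size(); i++) order.add(i);
        order.sort(Comparator.comparingDouble(i -> -userRatings[i]));

        List<int[]> next = new ArrayList<>();
        int survivors = population.size() / 2;
        for (int k = 0; k < survivors; k++) next.add(population.get(order.get(k)));

        // Refill the population with mutated copies of the survivors.
        while (next.size() < population.size()) {
            int[] child = next.get(rnd.nextInt(survivors)).clone();
            child[rnd.nextInt(child.length)] = rnd.nextInt(12); // change one pitch class
            next.add(child);
        }
        return next;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < 4; i++) pop.add(new int[]{0, 4, 7, 0});
        double[] ratings = {0.2, 0.9, 0.5, 0.1}; // stand-in for user feedback
        System.out.println(evolve(pop, ratings, rnd).size()); // 4
    }
}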
C.4 Collaborative Music Composition

The improvements in telecommunication technology and the increasing availability and speed of Internet connections have allowed the creation of distributed music composition systems that enable users in different locations to contribute to the same musical piece. These systems usually take advantage of innovative human-computer interaction interfaces such as tabletops (i.e. horizontal touchscreens), virtual musical instruments or human motion detectors. In these applications, "composition" means either traditional musical score editing or a real-time audio performance, but such systems could also be used to divide the arrangement of a song among multiple musicians, or for teaching purposes.

Figure C.4: The tabletop collaborative system

Systems belonging to this category are, for example, MOODS (Music Object Oriented Distributed System) (Bellini et al. [6]), a synchronous real-time cooperative editor for musical scores, and FMOL (F@ust Music On-Line) (Jordà and Wuest [23]), a web-based application for editing a musical tree. Another relevant collaborative composition environment is presented at http://www.noteflight.com, where users can edit musical scores together. Pichiliani and Hirata [39] describe a collaborative tabletop system that allows users to work together on a shared horizontal display and that can also receive input from virtual musical instruments and MIDI devices. The structure of the system is shown in Figure C.4. After establishing a connection with the CoMusic server, each user can open an instance of CoTuxGuitar (a score editor) to see and edit the notes that his or her instrument is producing, which are stored separately in parallel tracks of the music score. Moreover, the sound of every note played by any instrument is reproduced for all users, and every modification to the notes already played and stored in a track is replicated to all instances of CoTuxGuitar through the server.

Stanford Laptop Orchestra

The Stanford Laptop Orchestra (SLOrk) is a large-scale, computer-mediated ensemble that explores cutting-edge technology in combination with conventional musical contexts, while radically transforming both. Founded in 2008 by director Ge Wang together with students, faculty and staff at Stanford University's Center for Computer Research in Music and Acoustics (CCRMA), this unique ensemble comprises more than 20 laptops, human performers, controllers, and custom multi-channel speaker arrays designed to provide each computer meta-instrument with its own identity and presence.

Figure C.5: The Stanford Laptop Orchestra

The orchestra fuses a powerful sea of sound with the immediacy of human music-making, capturing the irreplaceable energy of a live ensemble performance as well as its sonic intimacy and grandeur. At the same time, it leverages the computer's precision, its possibilities for new sounds, and its potential for fantastical automation to provide a boundary-less sonic canvas on which to experiment with, create, and perform music. Offstage, the ensemble serves as a one-of-a-kind learning environment that explores music, computer science, composition, and live performance in a naturally interdisciplinary way. SLOrk uses the ChucK programming language as its primary software platform for sound synthesis/analysis, instrument design, performance, and education. (http://slork.stanford.edu/)

E quindi uscimmo a riveder le stelle.
("And thence we came forth to see the stars again.")
Dante Alighieri, Inferno XXXIV, 139