UNIVERSITY OF MIAMI

ON THE VIABILITY OF USING MIXED FEATURE EXTRACTION WITH MULTIPLE STATISTICAL MODELS TO CATEGORIZE MUSIC BY GENRE

By Benjamin Fields

A RESEARCH PROJECT

Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree Master of Science in Music Engineering Technology

Coral Gables, FL
May 2006

UNIVERSITY OF MIAMI

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Music Engineering Technology

ON THE VIABILITY OF USING MIXED FEATURE EXTRACTION WITH MULTIPLE STATISTICAL MODELS TO CATEGORIZE MUSIC BY GENRE

Benjamin Fields

Approved:
Colby N. Leider, Asst. Prof., Music Engineering
Dr. Edward P. Asmus, Assoc. Dean, Graduate Studies
Dr. Miroslav Kubat, Assoc. Prof., Electrical Engineering
Ken C. Pohlmann, Prof., Music Engineering

FIELDS, BENJAMIN (M.S. in Music Engineering Technology) (May 2005)
On the Viability of Using Mixed Feature Extraction with Multiple Statistical Models to Categorize Music by Genre

Abstract of a Master's Research Project at the University of Miami.
Research project supervised by Assistant Professor Colby Leider.
Number of pages in text: {final count}

In recent years, large-capacity portable music players have become widespread in their use and popularity. Coupled with the exponentially increasing processing power of personal computers and embedded devices, the way in which people consume and listen to music is ever-changing. To facilitate the categorization of music libraries, a system has been created using existing MPEG-7 feature vectors and Mel-Frequency Cepstral Coefficients (MFCC) evaluated through a number of trained hidden Markov models (HMM) and other statistical methods. Previously, MPEG-7 spectral feature vectors have been used for audio classification through a number of means, including the use of HMMs. In an effort to improve accuracy and robustness, the system expands beyond MPEG-7 tools to include MFCC-based analysis. The output of these models is then compared, and a genre choice is made based on which model appears most accurate. This project explores the use of MPEG-7-compliant and other feature vectors in conjunction with HMM sound models (one per feature per genre) as a means of categorizing music files by these pre-trained genre models. Results from these tests will be analyzed and ways to improve the performance of a genre sorting system will be discussed.

ACKNOWLEDGEMENTS

The body of work presented was made neither in isolation nor without context. Foremost, my parents: without them I simply wouldn't exist, let alone have made it this far.

WRITE MORE YO!

Table of Contents

1. Introduction and Background
   1.1. Concept of Genre
   1.2. Feature Vector Extraction
      1.2.1. MPEG-7 Audio
      1.2.2. Mel-Frequency Cepstral Coefficients
   1.3. Statistical Learning Methods
2. Preliminary Study via Simple Investigation
   2.1. Overview
   2.2. System Architecture
      2.2.1. Genre Model Creation
      2.2.2. Testing a Song
   2.3. Experiment
   2.4. Results
   2.5. Analysis and Summary
3. The Song Sorting System
   3.1. System Overview
   3.2. The MFCC Chain
   3.3. The Beat Chain
4. The Experiment
   4.1. Proof of Concept
   4.2. The Data Set
   4.3. The Trial Runs
5. Results
   5.1. Ten-Genre Trial
   5.2. Eight-Genre Trial
   5.3. Six-Genre Trial
   5.4. The Two Four-Genre Trials (Original Dataset v. iTunes Dataset)
6. Analysis and Conclusion
   6.1. Overall Performance
   6.2. The Intelligently Chosen Set
   6.3. What Can This Tell Us About Music?
   6.4. Future Work
Bibliography
Appendix A: Full Trial Song Listings with Genre Assignments
Table of Figures

Figure 1   A discrete Markov process with 4 states
Figure 2   High-level overview of the model building and song testing process
Figure 3   A breakdown of the dimensional reduction process
Figure 4   Raw spectrogram of an 'electronic' clip
Figure 5   The AudioSpectrumBasis of Figure 4
Figure 6   Structure of the song testing procedure
Figure 7   Misses of the test bed
Figure 8   Broad overview of the testing process with decision algorithm
Figure 9   Overview of the testing process for a single song
Figure 10  The output of the basis functions used to create the 'classical' genre from the data set used in the run detailed in Chapter 2
Figure 11  The output of the basis functions used to create the 'electronic' genre from the data set used in the run detailed in Chapter 2
Figure 12  The output of the basis functions used to create the 'Jazz' genre from the data set used in the run detailed in Chapter 2
Figure 13  The output of the basis functions used to create the 'Rock' genre from the data set used in the run detailed in Chapter 2
Figure 14  Scatter of the BPM data for the training songs
Figure 15  Scatter of the covariance data for the training songs
Figure 16  Scatter of the reliability data for the training songs
Figure 17  The output of the basis functions for the 'Alternative' genre
Figure 18  The output of the basis functions for the 'Blues' genre
Figure 19  The output of the basis functions for the 'Classical' genre
Figure 20  The output of the basis functions for the 'Electronic' genre
Figure 21  The output of the basis functions for the 'Folk' genre
Figure 22  The output of the basis functions for the 'Hip-Hop/Rap' genre
Figure 23  The output of the basis functions for the 'Jazz' genre
Figure 24  The output of the basis functions for the 'Pop' genre
Figure 25  The output of the basis functions for the 'Rock' genre
Figure 26  The output of the basis functions for the 'R & B/Soul' genre
Figure 27  Scatter of the BPM data
Figure 28  The covariance of each BPM
Figure 29  The reliability of the BPM
Figure 30  Accuracy versus number of genre classes

Table of Tables

Table 1    Overall accuracy rates of the trial with the average confidence
Table 2    Summary of the genre output and error distribution, data from Chapter 2
Table 3    Accuracy for the large dataset of all ten genre classes
Table 4    Accuracy by the spectral envelope for the large dataset
Table 5    Accuracy by the MFCC for the large dataset
Table 6    Accuracy by the tempo-based system for the large dataset
Table 7    Accuracy of the combined system for the eight genre classes
Table 8    Accuracy by the spectral envelope for the eight genre classes
Table 9    Accuracy by the MFCC-based system for the eight genre classes
Table 10   Accuracy by the tempo-based system for the eight genre classes
Table 11   Accuracy of the combined system for the six genre classes
Table 12   Accuracy of the spectral envelope system for the six genre classes
Table 13   Accuracy of the MFCC system for the six genre classes
Table 14   Accuracy of the tempo-based system for the six genre classes
Table 15   A side-by-side comparison of the output accuracy rates of the two data sets

1. Introduction and Background

1.1. Concept of Genre

The idea of genre as a means of classification has been seen in most forms of media since each particular form has been in substantial artistic use. The word is French and in its most literal sense means "kind," "sort," or "type," though it shares its root with the Latin word genus [10, 11]. The current definition according to the Oxford American Dictionary [12] is as follows: "A category of artistic composition, as in music or literature, characterized by similarities in form, style, or subject matter." Genre as a categorical methodology is thus clearly highly qualitative and flexible in nature. It is this lack of definition that makes it both useful across many forms and, at the same time, very difficult to implement computationally [5, 11]. Still, this taxonomy, even with its fuzzy nature, is worthy of investigation. People have been using this process as a means of categorizing various forms of art for a few thousand years now; the earliest discussions of genre (used first to describe literary works) are thought to have been the work of Aristotle [10].

Any categorization taxonomy (genre or otherwise) should ideally have two properties [10, 11]: mutual exclusivity and joint exhaustivity. This is to say, (genre) categories should not overlap in the media they describe and, when all are taken together, should describe every possible and known piece of a given medium. These ideals are just that, as they are not seen in any practical use of genre. For instance, when looking at music, if a piece of music is identified as "rock," is it then impossible to label it as "punk"? This question is further complicated when looking for broad consensus on a given typology across a population [13].

It seems, then, that in order to have an automated system capable of this task of genre categorization, two distinct tasks must be performed. First, meaningful features must be produced to describe the characteristics of a given piece of media. In the case of music audio files, this can be done using a number of techniques, including those outlined in the MPEG-7 audio descriptor standard [1] and various others [2, 4, 5, 8, 18]. Once features have been extracted (hereafter referred to as feature vectors), a decision must be made as to which genre(s) this particular music audio file belongs.
While there would seem to be a number of routes to handle this end of the problem [5, 18, 20], the fuzzy nature of this portion of the problem, combined with the dynamic nature of genre typology, leads to the use of supervised statistical learning models, in particular hidden Markov models (HMM), which allow for training by example and a straightforward framework for comparing test songs to models via the Viterbi algorithm [9].

1.2. Feature Vector Extraction

The first step of this process is to extract feature vectors that yield meaningful information concerning genre categories. What is meant by meaningful information will be explored further in the next section. First, though, comes an examination of the various feature vectors available and how they interact with one another.

1.2.1. MPEG-7 Audio

The MPEG-7 audio framework provides a wide variety of standardized feature vectors, intended for many purposes. These feature vectors (called Descriptors in MPEG-7 language) range from simple and very low level (AudioWaveformD) to more complex spectral and temporal measures (AudioSpectrumBasisD, HarmonicSpectralVariationD). Additionally, the MPEG-7 framework contains higher-level constructs called descriptor schemas (DS), which use and process these lower-level feature vectors in order to discern a more abstract quality of the audio in question.

When attempting to classify types of audio, there is a descriptor schema of particular interest: the SoundModelDS. This DS uses the AudioSpectrum set of descriptors as a means to build an HMM describing the audio in a given class. There is also a descriptor designed to work in conjunction with the SoundModelDS, the SoundModelStatePathD. This descriptor takes a model created by the SoundModelDS and an audio file to test as input. It then runs the test audio file through the same flow process used to create the input for the HMM of the model being used (though now only one audio file's spectrum envelope data is used). Rather than create an HMM with these data, however, they are used to estimate a state transition path that could account for the data within the HMM (see [9] for a good discussion of the algorithm used in this process). Along with this state path, the probability, or log likelihood, that this path is the correct path is also calculated. This log likelihood provides a means to compare multiple HMMs and identify which Markov model (and, by extension, which set of training examples) best describes the test data.

Using this schema and descriptor group, a wide variety of different classification and retrieval tasks can be performed on a broad range of audio file types, all with fair to excellent accuracy rates [3, 5, 14]. However, when examining a taxonomy as detailed as musical genre, this methodology by itself can leave something to be desired in accuracy.

In addition to the SoundModelClassifierDS and the associated descriptors, there exist a number of other descriptors that can be found useful in the task of classifying by genre. These descriptors index parts of the audio signal that are dissimilar to those indexed by the SoundClassifierModelDS. When working on digital music files, the SoundClassifierModelDS can be thought of as classifying based, for the most part, on a sort of average timbre of each song. Further, timbre is explored in a complementary way through the use of cepstral coefficients, as will be discussed later.
In order to achieve the maximum accuracy possible for a genre model, care must be taken to use a diverse selection of features. Within the MPEG-7 construct, two principal parts of music should then be considered: tempo and pitch.

Tempo within MPEG-7 is primarily analyzed through the use of the AudioBpmD descriptor. This descriptor's primary use is to determine the beats per minute (BPM) of a digital music file. Besides this primary scalar feature, two other scalar measures, a correlation and a reliability measure, are returned as well, both of which provide meaningful insight concerning tempo. All three of these measures are calculated at both the beginning and the mid-point of each digital music file that is processed. The algorithm used to calculate the BPM is based upon a common method detailed in [15].

Pitch-detection methods, though present within the MPEG-7 standard, are not robust enough to deal with source material with more than a single isolated melodic line present. The pitch detection methods of MPEG-7 are therefore useful only in single-note-at-a-time environments [1]. This makes them unable to meaningfully process the majority of musical recordings, as most recordings contain multiple notes in parallel, across a number of instruments. These pitch detection tools are encapsulated within the MelodyDS. This descriptor schema has not yet been implemented within [6].

1.2.2. Mel-Frequency Cepstral Coefficients

Mel-Frequency Cepstral Coefficients (MFCC) are a feature vector commonly used in speech recognition and classification processes [16]. It has also been shown that MFCC can be useful for the task of general audio classification and, specifically, genre-based classification [17, 18, 19]. At first glance, there are a number of problems that appear when attempting to use MFCC as a means to classify music [19]. The filterbanks are designed to maximize effectiveness on bands of audio relevant to speech, which could ignore valuable data in other audible frequency ranges. Further, though the mel scale was designed to mimic the response of the cochlea, it has been shown to be fairly inaccurate. Nevertheless, MFCC has been shown to be a good, if not excellent, feature to use in sound classification tasks, though there is certainly ample room for optimization of this process for this purpose.

A discrete cosine transform (DCT) is the final step, after the mel scaling has been performed. This serves to move the signal from the frequency domain into the quefrency domain, yielding the coefficients of the cepstrum [19].
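As a concrete illustration of this chain, the following MATLAB sketch follows the usual MFCC recipe (framing, windowing, magnitude FFT, mel-spaced triangular filterbank, log, DCT). It is an illustrative reconstruction rather than the extraction code actually used in this work; the function name, frame sizes, and filter count are assumptions, not values taken from the thesis.

```matlab
% Illustrative MFCC extraction (not the MPEG-7 XM toolkit or any toolbox routine).
% x is a mono signal as a column vector and fs its sample rate; the frame
% lengths, filter count, and coefficient count are typical, assumed values.
function coeffs = simpleMFCC(x, fs, numCoeffs, numFilters)
    frameLen = round(0.025 * fs);                       % 25 ms analysis frames
    hop      = round(0.010 * fs);                       % 10 ms hop between frames
    nfft     = 2^nextpow2(frameLen);
    win      = 0.54 - 0.46 * cos(2*pi*(0:frameLen-1)'/(frameLen-1));  % Hamming window

    % Triangular filterbank spaced evenly on the mel scale
    hz2mel = @(f) 2595 * log10(1 + f/700);
    mel2hz = @(m) 700 * (10.^(m/2595) - 1);
    edges  = mel2hz(linspace(hz2mel(0), hz2mel(fs/2), numFilters + 2));
    bins   = round(edges / fs * nfft) + 1;              % 1-based FFT bin indices
    fbank  = zeros(numFilters, nfft/2 + 1);
    for m = 1:numFilters
        for k = bins(m):bins(m+1)                       % rising slope
            fbank(m, k) = (k - bins(m)) / max(bins(m+1) - bins(m), 1);
        end
        for k = bins(m+1):bins(m+2)                     % falling slope
            fbank(m, k) = (bins(m+2) - k) / max(bins(m+2) - bins(m+1), 1);
        end
    end

    % DCT-II basis that moves the log filterbank energies into the quefrency domain
    dctBasis = cos(pi/numFilters * (0:numCoeffs-1)' * ((0:numFilters-1) + 0.5));

    numFrames = 1 + floor((length(x) - frameLen) / hop);
    coeffs = zeros(numCoeffs, numFrames);
    for t = 1:numFrames
        idx   = (t-1)*hop + (1:frameLen)';
        frame = x(idx) .* win;
        mag   = abs(fft(frame, nfft));
        mag   = mag(1:nfft/2 + 1);
        coeffs(:, t) = dctBasis * log(fbank * mag + eps);
    end
end
```

Each column of the returned matrix is one frame's cepstral vector, which is the form of matrix the genre models in later chapters consume.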
1.3. Statistical Learning Methods

In examining statistical learning models for use in music and other sound classification procedures, the hidden Markov model (HMM) is a natural choice, due to its historical role in both speech recognition and categorization [7, 16]. In this usage, the HMM is formed from a matrix of feature vectors compiled from examples. This matrix is preprocessed to maximize effectiveness and then used to approximate a finite state machine that, given an optimal state transition path, could generate the original data. In effect, known information is used to generate an unknown, or hidden, model. Once this model is generated, it is trained to increase the likelihood that the known data will correspond with that which is produced by the model. A related process is used to examine test data against the model to determine how likely it is that the test data could have been produced by the model generated from the training data [9].

In order to understand the construction process of a hidden Markov model, an understanding of a discrete Markov process is necessary. A discrete Markov process is a system of N states and can be seen in graphic form in Figure 1 (where N is set to 4). At each regular time period, the system follows a transition path based upon the probabilities assigned to the current state (which could result in remaining in the current state). The probability of any state transition is defined in Equation (1).

Figure 1 A discrete Markov process with 4 states. Each state transition probability is labeled a_{ij}, where i is the current state and j is the previous state.

a_{ij} = P\left[\,Q_t = \text{state}(i) \mid Q_{t-1} = \text{state}(j)\,\right], \qquad \sum_{i=1}^{N} a_{ij} = 1 \quad (1)

where N is the number of states in the discrete Markov process, Q_t is the current state, and Q_{t-1} is the previous state. This process is also commonly referred to as an observable Markov process, due to the fact that the output of the process is simply the current state of the system. With a system where all transition probabilities are known, a number of behavioral probabilities can be calculated. For example, by Equation (2) the probability that the model will stay in the same state for exactly T discrete time periods can be easily calculated.

P\left(O \mid \text{Model}, q_1 = \text{state}(i)\right) = \left(a_{ii}\right)^{T-1}\left(1 - a_{ii}\right) = p_i(T) \quad (2)

This equation yields the probability density function of the event of staying in state i for a duration T. From this it is possible to determine the expected number of time increments one can observe at a given state via Equations (3a) and (3b).

\bar{T}_i = \sum_{T=1}^{\infty} T\, p_i(T) \quad (3a)

\bar{T}_i = \sum_{T=1}^{\infty} T \left(a_{ii}\right)^{T-1}\left(1 - a_{ii}\right) = \frac{1}{1 - a_{ii}} \quad (3b)

The principal difference between a discrete Markov model (as highlighted above) and the hidden Markov models discussed and used throughout this paper is in the contents of the state. In a discrete Markov model, the observation and the current state in the model are the same. If, for example, one were using a discrete Markov model to model the weather, then each state would represent a weather condition (i.e., sunny state, rainy state, cloudy state, etc.). However, in a hidden Markov model, the state itself is a stochastic process that is not observable (hence hidden); the only way to observe this process is via another set of stochastic processes that are observable and can be thought of as a discrete Markov model. A thorough explanation of all the mathematics necessary to construct and use hidden Markov models can be seen in [7].

For recognition or categorization tasks, a number of HMMs are generated based on different classes of training data. To determine which model most closely aligns with a given test set, the Viterbi algorithm is run using the test data as the desired output against each model [9]. This algorithm finds the most probable state path within a given model to explain the corresponding output. It does this by recursively backtracking through the state transition path, selecting the most likely last state for each transition. For each model a measure of the probability is calculated, called the maximum log likelihood (MLL). The most reasonable model to associate with the test data is simply the model with the highest MLL. Through this process, HMMs can be used to provide a measure of closest fit to a given multidimensional data set, regardless of what that data set represents.
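As a concrete illustration of this scoring step, the MATLAB sketch below computes the most probable state path and its log likelihood for a simple discrete-observation HMM. The discrete-observation simplification and all variable names are assumptions made for illustration; the actual trial code (MATLAB plus the MPEG-7 XM toolkit) is not reproduced here.

```matlab
% Minimal Viterbi search in log space for a discrete-observation HMM (sketch).
% A is the N-by-N state transition matrix, B the N-by-M emission matrix,
% p0 the N-by-1 initial state distribution, and obs a vector of symbols in 1..M.
function [logLik, path] = viterbiLogLik(A, B, p0, obs)
    N = size(A, 1);
    T = numel(obs);
    logA = log(A + eps);
    logB = log(B + eps);
    delta = zeros(N, T);                      % best log score ending in each state
    psi   = zeros(N, T);                      % back-pointers along the best path
    delta(:, 1) = log(p0 + eps) + logB(:, obs(1));
    for t = 2:T
        for j = 1:N
            [best, psi(j, t)] = max(delta(:, t-1) + logA(:, j));
            delta(j, t) = best + logB(j, obs(t));
        end
    end
    [logLik, last] = max(delta(:, T));        % the maximum log likelihood (MLL)
    path = zeros(1, T);
    path(T) = last;
    for t = T-1:-1:1                          % recursive backtracking step
        path(t) = psi(path(t+1), t+1);
    end
end
```

Scoring a test song's observation sequence against each genre's trained model in this way and keeping the largest logLik is the closest-fit comparison described above.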
2. Preliminary Study via Simple Investigation

2.1. Overview

The MPEG-7 audio standard has, among its various uses, been widely used as a means of classifying audio signals through feature extraction [1]. Many of these efforts have been very successful, achieving high rates of accuracy across a large range of signal types [2, 3, 4]. In contrast, the problem of genre sorting audio files has been more limited in its success, due to a number of factors [5, 8], including poorly structured genre hierarchies, imperfect information from the digital audio source files, and a lack of statistical independence between genres along measured dimensions. It therefore seems reasonable to apply the techniques that are successful in general audio sound classification to the problem of sorting music by genre and to measure the results.

One of the inherent problems in the use of MPEG-7 feature vectors in the evaluation of music is the high computational burden of feature extraction. As such, and also to facilitate a speedy implementation of this preliminary study, a minimal number of MPEG-7 descriptors have been used. These descriptors are all energy- or spectrum-based by nature; however, the use of HMMs provides a means of modeling the change of these features over time, allowing temporal information in the signal to be examined without explicitly temporal feature vectors. The feature vectors used in this preliminary study have been selected to mirror the MPEG-7 structure for audio classification.

2.2. System Architecture

Two distinct processes take place in the described genre classification system. The first is a process to create and train an HMM for each genre class. After all of the genre HMMs have been created and trained, a second process of extracting features and model matching occurs for each song to be classified by the system. All of these computations and analyses have been done in MATLAB. Feature extraction was done through the use of the MPEG-7 XM toolkit [6].

2.2.1. Genre Model Creation

The process of creating and training an HMM to describe a genre follows a process similar to that used in [3] to build a sound model describing a class of sounds. This can also be seen in Figure 2. First, a group of songs (or song clips) is selected to be the training examples for a given genre.

Figure 2 High-level overview of the model building and song testing process.

The training examples can be as broad or focused as the particular application requires, though it is important that, whatever methodology is used for training-sample selection, the entirety of the given genre is covered so as to minimize false negatives. For each of these songs, the AudioSpectrumEnvelope MPEG-7 descriptor is obtained. This is a logarithmic representation of the frequency spectrum on multiples of the octave and is the basic feature vector used in this entire process. The output of each song's AudioSpectrumEnvelope descriptor is then passed into the AudioSpectrumBasis descriptor. This process can be seen graphically in Figure 3, and an examination of the signal through this process can be seen in Figure 4 and Figure 5. This descriptor is a wrapper for a group of basis functions that are used to project the AudioSpectrumEnvelope descriptors onto a lower-dimensionality space to facilitate classification. This descriptor is generated through a matrix multiplication of the AudioSpectrumEnvelope and the matrix produced by the basis functions. This output contains the most statistically relevant features amongst all of the music's feature space.
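The MATLAB sketch below illustrates the kind of dimensionality reduction this step performs, using a singular value decomposition of a spectrum-envelope matrix. The function and variable names are hypothetical, and the MPEG-7 XM toolkit's actual AudioSpectrumBasis implementation may differ in its normalization details.

```matlab
% Sketch of the basis extraction and projection described above (assumed names).
% envelope is a T-by-F matrix: one row per analysis frame, one column per
% log-frequency band of the AudioSpectrumEnvelope; k is the number of basis
% functions to retain.
function [basis, projection] = spectrumBasis(envelope, k)
    [~, ~, V] = svd(envelope, 'econ');   % right singular vectors span the band space
    basis = V(:, 1:k);                   % keep the k most statistically significant bases
    projection = envelope * basis;       % reduced T-by-k feature matrix passed to the HMM
end
```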
Figure 3 A breakdown of the dimensional reduction process.

The relationship between the raw spectrum (or the AudioSpectrumEnvelope descriptor) and the AudioSpectrumBasis descriptor can be seen across Figures 4 and 5. In Figure 4, the spectrum of a 30-second clip of a song can be seen. Larger amplitudes can be seen in certain frequency ranges across the entire length of the clip. Comparing this to the basis reductions in Figure 5, these bands of higher amplitude correspond to similar high-amplitude peaks in the basis functions.

The AudioSpectrumBasis descriptor contains the feature vectors that are used to create the HMM, which in turn is used to determine which of the available genres the test information best fits. The HMM used in this implementation follows the MPEG-7 specification for the SoundModel descriptor scheme (see [1]) and uses the standard solutions to the three critical HMM problems as can be found in [7]. It uses the Baum-Welch re-estimation algorithm to optimize likelihood.

Figure 4 Raw spectrogram of an 'electronic' clip.

Figure 5 The AudioSpectrumBasis of the same clip seen in Figure 4, with 10 feature vectors in the reduction.

This algorithm iteratively recalculates the state-to-state transitions for an HMM. This is done by calculating the expected duration the model stays in a given state and the expected number of transitions from each state to every other state. These expected values are compared to the durations and transitions seen in the data set used to generate the HMM, the transition probabilities are adjusted, and the process runs again. This process continues until the difference seen between the expected values and the data set is acceptably minimal.

2.2.2. Testing a Song

The feature extraction portion of testing proceeds in much the same way as the feature extraction for the creation of each genre HMM, with the exception that, by design, only one song's (or clip's) AudioSpectrumEnvelope descriptor is used to generate the AudioSpectrumBasis descriptor. After the AudioSpectrumEnvelope is produced, it is used to calculate a maximum logarithmic likelihood (MLL) [2] against each of the previously created HMMs, one for each genre. The MLL is produced as a product of the SoundModelStatePath descriptor. This can be seen in Figure 6.

Figure 6 Structure of the song testing procedure.

This descriptor computes the state path that is most likely to be taken for a given set of feature vectors in an HMM. The MLL is a measure of how likely it is that this particular path in the HMM is the actual path that was taken to produce the observed features. The state path and MLL are both computed using the standard Viterbi algorithm [2, 9]. This process is repeated with each of the various genre-specific HMMs. The final part of the decision-making process is then quite simple: the largest MLL value is taken as indicating the most likely model for the song and therefore the assumed genre of the test song.

G = \max\left(\left[\,\text{Likelihood}_1, \text{Likelihood}_2, \ldots, \text{Likelihood}_n\,\right]\right) \quad (4)

where n = total number of genres. Lastly, after the genre has been determined, a confidence index is calculated based on Equation (5):

C = 100\left(\frac{\text{normlikelihood}_G}{\sum_{i=1}^{n}\text{normlikelihood}_i}\right) \quad (5)

where the normlikelihood is created as follows:

\text{normlikelihood}_i = \text{likelihood}_i - \min\left(\left[\,\text{Likelihood}_1, \text{Likelihood}_2, \ldots, \text{Likelihood}_n\,\right]\right) \quad (6)

The idea behind this confidence metric is to get a better idea of how close the decision was as to which genre class the song is a member of. Generally speaking, it should be higher the larger the delta between the selected genre class and the runner-up genre class.
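The following MATLAB fragment is a direct sketch of Equations (4)–(6); the function and variable names are assumptions, and the actual trial code may differ.

```matlab
% Sketch of the genre decision and confidence index of Equations (4)-(6).
% logLik is a vector of maximum log likelihoods, one per genre HMM, and
% genres is a matching cell array of genre labels (both hypothetical names).
function [genre, confidence] = chooseGenre(logLik, genres)
    [~, idx] = max(logLik);                            % Equation (4): best-scoring model
    genre = genres{idx};
    normLik = logLik - min(logLik);                    % Equation (6): shift so the worst model scores 0
    confidence = 100 * normLik(idx) / sum(normLik);    % Equation (5): confidence index
end
```

For a four-genre trial, a near-tie among the three best-scoring models drives the confidence toward the 33.33% floor given by Equation (7) below, while a clear winner pushes it toward 100%.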
The minimum possible confidence is based upon the number of genres in the trial and is calculated in Equation (7).

C_{\min} = \frac{100}{n - 1} \quad (7)

For a test set containing four genre classes, as used in the following experiment, C_min would be equal to 33.33%.

2.3. Experiment

As mentioned previously, all testing has taken place in MATLAB and, where applicable, has used the standard MPEG-7 XM toolkit. The media used were all evaluated in the Microsoft WAV format, though each file had previously been encoded as an MP3 (with bit rates varying from 128 kbps to 192 kbps). A 30-second clip from the middle of each audio file was extracted prior to the evaluation process, regardless of whether the song was being used for training or for testing. In the sample set, a model of four genre classes was selected, with 36 songs used for training (9 for each genre). In testing, a total of 40 songs were used (10 for each genre class). Additionally, before the test songs were run, each training song was run through the system as though it were a testing song. This was done to ensure that the previously mentioned selection method was one that made sense and would produce meaningful results, prior to evaluating the test data set. The accuracy of this trial run was 100%.

2.4. Results

After running the entire test set through the genre sorter, the results were quite good, given the limited size of the data sets. The overall accuracy of the system was 82.50%; however, there was a wide variation in the accuracy of results among the genre classes, as can be seen in Table 1. The best performance was seen in the "classical" genre class, with an accuracy rate of 100%. Next was the "jazz" genre class, with an accuracy rate of 90%. The other two genre classes, "electronic" and "rock," came in third and fourth respectively, with accuracy rates of 80% and 60%.

Genre Class    Accuracy Rate    Average Confidence
classical      100.00%          48.08%
jazz            90.00%          47.41%
electronic      80.00%          41.00%
rock            60.00%          45.29%
Total           82.50%          -

Table 1 Overall accuracy rates of the trial with the average confidence, by genre.

Figure 7 Misses of the test bed, grouped by the genre in which they should have been classified. The rock genre is clearly the most error-prone in this test, followed by electronic and then jazz. Classical, having no errors, is not in the chart.

2.5. Analysis and Summary

With an average result of 82.5%, this system performed adequately. This is a slightly lower accuracy rate than a test of a nearly identical system seen in [14]. This accuracy rate would seem to indicate a strong (though less than that of the data set used in [14]) statistical independence of the genres, as they are defined in this data set and along the dimensionality seen using the AudioSpectrumEnvelope. However, the high rate of failure in the "rock" genre is a strong justification for additional features being used, in order to add additional dimensions to the genre sorting system. There does seem to be a mild correlation between the confidence index and the accuracy rate of a given genre. This provides cause for the use of the confidence index in the larger system, as will be discussed in further detail in the following chapter.

3. The Song Sorting System
3.1. System Overview

To make the system detailed in Chapter 2 robust enough to maintain or even improve its accuracy rates as more genre classes are examined, more feature vectors must be added. These features will each be used independently of the existing system to determine a genre. After each feature chain has made a decision as to which genre a test song belongs to, a final decision will be made based on which genre is chosen by at least two out of the three feature chains. If this does not provide a clear genre, the confidence measure produced by each feature chain is also used to determine the overall genre of the test song. This overview can be seen in Figure 8.

As can be seen in the figure, there are three independent feature-decision chains in the Song Sorting System. The first is the system outlined in Chapter 2. The second is a feature chain based around an extraction of the MFCC of the music audio file. The third chain is based around the automatic extraction of beat- and tempo-related information as extracted by AudioBpmD. Each of these three chains has as its output two pieces of data: the genre estimate and the confidence measure for that estimate. This confidence measure is generated through the same means for each of the feature-decision chains, in much the same way as the confidence measure was generated in the preliminary single-chain system (Chapter 2, Equations 4–6), though it is modified in the case of the tempo chain: the inverse of Equation 5 is taken in this case, as the selected genre has the lowest (rather than highest, as in the other chains) score.

Figure 8 Broad overview of the testing process with decision algorithm.
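A minimal MATLAB sketch of this two-out-of-three decision rule, with the confidence measure as the tie-breaker, is shown below; the function and variable names are assumptions rather than names used in the trial code.

```matlab
% Sketch of the final decision rule: a genre chosen by at least two of the
% three feature-decision chains wins; otherwise the most confident chain is used.
% chainGenres is a 1-by-3 cell array of genre labels and chainConfidences a
% 1-by-3 vector of confidence scores (hypothetical names).
function genre = combineChains(chainGenres, chainConfidences)
    [labels, ~, idx] = unique(chainGenres);
    votes = accumarray(idx(:), 1);               % count votes per distinct label
    [maxVotes, winner] = max(votes);
    if maxVotes >= 2
        genre = labels{winner};                  % clear 2-of-3 (or 3-of-3) majority
    else
        [~, best] = max(chainConfidences);       % fall back on the confidence measure
        genre = chainGenres{best};
    end
end
```

For example, combineChains({'rock', 'rock', 'jazz'}, [41 55 70]) returns 'rock', while combineChains({'rock', 'pop', 'jazz'}, [41 55 70]) has no majority and falls back to the most confident chain, returning 'jazz'.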
3.2. The MFCC Chain

In many ways, the MFCC chain is very similar to the AudioSpectrumEnvelopeD-based chain used in the preliminary experiment. The only major difference is that rather than extracting a number of signal envelopes, a matrix of MFCC coefficients is extracted, as discussed in Section 1.2.2. After all the training files have had their MFCC coefficient matrices extracted, the stacked matrix is sent to AudioSpectrumProjectionD and from there to AudioSpectrumBasisD. This follows the same signal flow as the SoundModelDS. This modified MFCC chain's block diagram can be seen in Figure 10. As seen in the SoundModelDS, the output of AudioSpectrumBasisD is used to create and train an HMM. This training process can be seen in Figure 9. The HMM is then used by the testing process, along with the song to be tested, to create a maximum-likelihood path and, with it, the log likelihood of that path occurring; thus a genre selection can be made based on the cepstral coefficients.

3.3. The Beat Chain

The third genre decision chain is based around beats per minute and the two statistical byproducts (covariance and reliability) produced by the AudioBpmD descriptor. The creation of this model is a far simpler process than the model creation of the two prior chains. The model is composed of three 2 x N matrices, where N is equal to the number of songs used in each model. These matrices store each of the scalar values produced as output by the AudioBpmD descriptor: BPM, correlation, and reliability. The correlation and reliability are both byproducts of the filterbank-into-comb-filter methodology employed by the tempo-detection algorithm [15]. The first row contains values calculated at the beginning of each audio file; the second row contains values calculated from the midsection of each audio file. These three matrices are then stored as the beat model for a given genre.

The testing phase of the beat chain begins with the extraction of the six scalars using AudioBpmD. Each pair of scalars (BPM, correlation, and reliability) can then be thought of as a vector in a two-dimensional space. Similarly, each pair of scalars in the model matrices can be considered in the same way. A distance measure is then taken between the test vector and each vector in the corresponding training matrix. A cosine distance measure is used here over simple Euclidean distance, as the cosine distance has been found to yield better results in music similarity tasks [22]. Once the distance measures have been taken, they are normalized and averaged together to yield an overall score of the test song against the genre model. This score is taken for each genre model, and the minimum score (smallest average distance) across all the models is taken to be the genre from the perspective of the beat chain. This process is illustrated in Figure 9.

This process puts equal emphasis on the tempo as well as on the reliability and correlation of the tempo. This accounts for the diverse nature of different genres: where some genres may show large differences in tempo, those same genres may show strong grouping in the reliability of that tempo or in its correlation measure.

Figure 9 Overview of the testing process for a single song.
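The scoring just described can be sketched in MATLAB as follows, assuming the beat model for one genre is stored as three 2-by-N matrices and the test song's readings as 2-by-1 vectors; the names and the per-feature normalization used here are assumptions made for illustration.

```matlab
% Sketch of the beat-chain distance scoring (assumed names and normalization).
% modelBPM, modelCorr, modelRel are 2-by-N matrices (beginning/midpoint rows,
% one column per training song); testBPM, testCorr, testRel are 2-by-1 vectors.
function score = beatChainScore(modelBPM, modelCorr, modelRel, testBPM, testCorr, testRel)
    models = {modelBPM, modelCorr, modelRel};
    tests  = {testBPM,  testCorr,  testRel};
    N = size(modelBPM, 2);
    dists = zeros(3, N);
    for f = 1:3
        M = models{f};
        v = tests{f};
        d = zeros(1, N);
        for s = 1:N
            u = M(:, s);
            d(s) = 1 - dot(u, v) / (norm(u) * norm(v) + eps);   % cosine distance
        end
        dists(f, :) = d / (max(d) + eps);        % normalize each feature's distances
    end
    score = mean(dists(:));   % smaller average distance = closer fit to this genre
end
```

The test song is scored this way against every genre's beat model, and the genre whose model yields the smallest score is the beat chain's choice.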
4. The Experiment

4.1. Proof of Concept

As a starting point for this system, the same training and testing song data set was used as in the single-chain system detailed in Chapter 2. The basis functions used to generate the hidden Markov models can be seen in Figures 10–13. These figures are surface plots of data that was represented in Chapter 2 in the form of multiple best-fit lines (Figure 5). These surface plots allow for an easy visual comparison of the basis functions used to generate the HMMs for each genre. The tempo-related models are seen in Figures 14–16 in the form of scatter plots. These scatter plots make apparent the low statistical separation seen along the tempo dimension. The full results of this trial can be seen in Table 2. It can be seen that the full system gave a slight increase (0.5%) in performance.

output\correct    Classical    Electronic    Jazz    Rock
Classical         100%         0             0       0
Electronic        0            90%           0       20%
Jazz              0            0             100%    20%
Rock              0            10%           0       60%

Table 2 Summary of the genre output and error distribution seen when running the data set used in the trial in Chapter 2 through the expanded system.

Figure 10 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'classical' genre from the data set used in the run detailed in Chapter 2.

Figure 11 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'electronic' genre from the data set used in the run detailed in Chapter 2.

Figure 12 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Jazz' genre from the data set used in the run detailed in Chapter 2.

Figure 13 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Rock' genre from the data set used in the run detailed in Chapter 2.

Figure 14 Scatter of the BPM data for the training songs from the initial trial, color-coded by genre. The Y-axis is the first BPM reading; the X-axis is the second.

Figure 15 Scatter of the covariance data for the training songs from the initial trial, color-coded by genre. The Y-axis is the first covariance reading; the X-axis is the second.

Figure 16 Scatter of the reliability data for the training songs from the initial trial, color-coded by genre. The Y-axis is the first reliability reading; the X-axis is the second.

When examining the genre breakdown of this test, it can also be seen that this expanded system led to a more even distribution of the accuracy across each genre as compared to the outcome of the single-chain test detailed in Chapter 2. These results will be discussed further and compared to the larger trial's data set and results in Chapter 5.

4.2. The Data Set

To facilitate full experimental trials, a significantly larger set of digital music test files is required. To that end, a means of selecting songs that are representative of a wide number of genres is needed. Since the nature of this system relies on statistical independence between genres, the specific songs chosen will clearly have a profound effect on the accuracy rates produced by the classifier. Further, it is important to establish an agreed-upon ground-truth genre for each song used, and that that genre assignment be as objective as possible, given the inherent cultural subjectivity of genre and genre classification.

A natural fit for these requirements is the direct digital music retail download service. The music retail industry must make decisions and assign music into genre categories as part of its business, so the assignment of a genre is made as new material is acquired. In a physical store this is done by placing media in different areas of the store (i.e., which box does the record go in?), and the analog of this is seen in the metadata tags describing genre used by digital download music retailers. Now, as a means of choosing a select representation from each genre, the top 40 most popular songs, as determined by sales within each genre, will be used. For these tests, Apple's iTunes Music Store was used as the source of these lists, though there are many other digital download services available at the time of this writing that would have served just as well. For a complete list of all songs used in the trial and their genre assignments, including the breakdown of training songs and testing songs, please consult Appendix A.

A simple random algorithm was used to divide each of these groups of 40 songs into subgroups of 15 songs for training and 25 songs for testing. The largest trial involved the use of 10 genres, with fewer genres being used in subsequent trials, as will be described in detail later.
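A small MATLAB illustration of this random split is shown below; the file names are hypothetical stand-ins for one genre's 40-song list.

```matlab
% Illustrative random 15/25 split of one genre's 40-song list (assumed names).
songs = cellstr(num2str((1:40)', 'song_%02d'));  % stand-in for the 40 song files
order = randperm(numel(songs));                  % simple random shuffle
trainSongs = songs(order(1:15));                 % 15 songs used to build the genre models
testSongs  = songs(order(16:40));                % remaining 25 songs held out for testing
```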
4.3. The Trial Runs

The trial runs are designed to examine the Song Sorting System's structure and usefulness in a number of ways. The broadest way this is done is through the overall accuracy and the accuracy measures of each genre within the test set. In order to gain further insight about the system and how it responds to the data set, the errors in labeling are broken down by genre to look for emerging correlated properties between genres based on the feature dimensionality examined in the genre assignment process.

All ten genres' selected training songs were used to create a model for each of the three feature-decision chains used in the system. For a graphical representation of the spectral envelope and MFCC features, see Figures 17–26. A distribution of the beat-related models can be seen in Figures 27–29.

An examination of the basis function visualizations reveals many insights about the genres' data. One of the most visually prominent features can be seen in the 'Hip-Hop/Rap' model's AudioSpectrumEnvelope-based basis functions (Figure 22, top graph). There is a large peak value in the Z-axis across the entire range of Y at very low values of X. This represents the bass-heavy sound seen in almost all of the songs used as training examples for this genre. A similar, though less intense, bass emphasis can be seen in the alternative genre (Figure 17).

The beat-related models' scatter plots also provide information about the statistical grouping of the training example songs. In these figures, it is apparent that there is substantial overlap across the genres along this feature's dimensions. This overlap is at its highest in Figure 27, as BPM has the least statistical grouping. The other two scalars, seen in Figure 28 and Figure 29, still show a large amount of overlap, though there is a small amount of clustering seen in a few of the genres.

Figure 17 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Alternative' genre.

Figure 18 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Blues' genre.

Figure 19 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Classical' genre.

Figure 20 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Electronic' genre.

Figure 21 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Folk' genre.

Figure 22 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Hip-Hop/Rap' genre.

Figure 23 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Jazz' genre.

Figure 24 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Pop' genre.

Figure 25 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Rock' genre.

Figure 26 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'R & B/Soul' genre.

Figure 27 Scatter of the BPM data for the training songs, color-coded by genre. The Y-axis is the first BPM reading; the X-axis is the second.

Figure 28 The covariance of each BPM, sorted by genre. The first reading makes up the Y-axis and the second the X-axis.

Figure 29 The reliability of the BPM measures taken for the training of each genre model. The Y-axis is from the first reading and the X-axis is the second.

These models were then used for a series of four test runs, each using a different subsection of the test songs.
The purpose of the series of tests is to examine the effect of limiting the number of genre classes on the overall accuracy and the trends within the error spread of a given genre. For the first run, all ten genre models and the associated 250 test songs were used in the trial (25 songs per genre). In the second and third trials, the two worst-performing genres of the previous trial are removed from the data set. So, in the second run there are 200 test songs evenly distributed across 8 genres. Similarly, in the third trial an additional two genres are removed from the data set, leaving 150 songs across the remaining 6 genre classes.

The fourth trial was a bit different. Rather than remove the two worst-performing genres to decide upon the four classes to use in the last trial, the four genres that were used in the proof-of-concept run were again used, but this time as defined by the iTunes Music Store-derived data set. The purpose of this test is to examine the observable difference in genre definition and its corresponding effect on the accuracy of genre assignment.

5. Results

5.1. Ten-Genre Trial

The ten-genre-class trial was the first and largest of the trials using the data from the iTunes Music Store. As noted in Chapter 4, there were 15 songs used to train each genre model, for a total of 150 songs used in the training process. On the testing side, there were 25 songs used per genre for a total of 250 songs used in testing, giving a grand total of 400 songs for the entire trial. As would be expected, these trials took a non-trivial amount of time to process. All software was built and run in MATLAB 7 (R14) on an Apple G4 1 GHz PowerBook with 1 GB of RAM. In that environment the models took roughly 60 minutes each to build, for a total run time of approximately 8 hours. The models were only built once and were used for all the subsequent trials. The testing process was a bit faster, though still lengthy, with each song taking a bit over a minute to process, for a total testing run time of about 5 hours.

Accuracy rates on the full data set were less than stellar, with an overall accuracy rate of 37.75%. This accuracy rate was slightly better than the best performing of the three feature-decision chains, the spectral envelope chain, which had an accuracy rate of 37.35%. The accuracy of the MFCC chain came next, with an accuracy of 32.1%. The tempo-based chain did the poorest, with an accuracy of 13.7%. The overall accuracy of the entire system is broken down by genre class in Table 3. (In Tables 3–14, the column headings abbreviate the expected genres: A = alternative, B = blues, C = classical, E = electronic, F = folk, H = hip-hop/rap, J = jazz, P = pop, RB = R & B/soul, R = rock.)

output\expected    A     B     C     E     F     H     J     P     RB    R
alternative       40%   20%    0%   16%   12%    0%    0%    8%    4%   52%
blues              4%   52%    0%    4%   24%    4%   28%    0%    4%    4%
classical          0%    0%   52%    0%    0%    0%   16%    0%    0%    0%
electronic         0%    0%    0%   12%    0%    0%    0%    0%    4%    0%
folk               0%    8%    4%    4%   48%    4%    8%    0%   12%    8%
hip-hop/rap        8%    0%    0%   16%    4%   48%    0%   12%   16%    4%
jazz               0%    8%   24%    0%    0%    0%   40%    4%    0%    0%
pop               20%    4%   16%   12%   12%    4%    4%   32%   28%    0%
R & B/Soul         8%    4%    4%   24%    0%   36%    4%   36%   32%   12%
rock              20%    4%    0%   12%    0%    4%    0%    8%    0%   20%

Table 3 Actual genre output by the system versus expected genre output for the large dataset of all ten genre classes.

Since each of the three feature-decision chains is an independent genre sorter in and of itself, a breakdown of output versus expected output for each of the three can be seen in Tables 4–6.
output\expected    A     B     C     E     F     H     J     P     RB    R
alternative       16%    8%    0%    4%    4%    0%    0%    4%    0%   16%
blues              4%   64%    0%    0%   20%    0%   28%    0%    8%    4%
classical          0%    0%   52%    0%    0%    0%   16%    0%    0%    0%
electronic         0%    0%    0%   20%    4%    4%    0%    0%    4%    4%
folk               0%    0%    4%    4%   40%    0%    8%    0%    4%    0%
hip-hop/rap       16%    0%    0%   16%    4%   44%    0%    0%   12%    4%
jazz               0%    8%   24%    0%    0%    0%   40%    0%    0%    0%
pop               20%    8%   16%   20%   28%    8%    4%   40%   40%   32%
R & B/Soul         8%    8%    4%   32%    0%   36%    4%   48%   32%   16%
rock              36%    4%    0%    4%    0%    8%    0%    8%    0%   24%

Table 4 Actual genre output by the spectral envelope based system versus expected genre output for the large dataset of all ten genre classes.

output\expected    A     B     C     E     F     H     J     P     RB    R
alternative       56%   28%    0%   16%   12%    0%    0%   24%    8%   60%
blues              0%   24%    0%    4%   20%    4%    4%    0%    4%    0%
classical          0%    4%   44%    0%    8%    0%   16%    0%    0%    0%
electronic         8%    4%    0%    8%    0%    4%    0%    4%    0%    0%
folk               0%   20%    4%    8%   44%    4%   24%    0%    8%   12%
hip-hop/rap        0%    0%    0%    8%    0%   36%    0%   20%   28%    8%
jazz               4%    8%   44%    8%    8%    0%   48%    4%   16%    0%
pop               12%    8%    4%    8%    4%    4%    4%   12%    0%    4%
R & B/Soul         8%    4%    4%   28%    4%   40%    4%   24%   36%    4%
rock              12%    0%    0%   12%    0%    8%    0%   12%    0%   12%

Table 5 Actual genre output by the MFCC based system versus expected genre output for the large dataset of all ten genre classes.

output\expected    A     B     C     E     F     H     J     P     RB    R
alternative       52%   32%   40%   52%   44%   20%   44%   40%   36%   56%
blues              4%   12%   16%    4%   20%    8%   12%   12%   12%   12%
classical          0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
electronic         0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
folk               0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
hip-hop/rap       44%   56%   36%   44%   32%   72%   40%   48%   52%   32%
jazz               0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
pop                0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
R & B/Soul         0%    0%    8%    0%    4%    0%    4%    0%    0%    0%
rock               0%    0%    0%    0%    0%    0%    0%    0%    0%    0%

Table 6 Actual genre output by the tempo based system versus expected genre output for the large dataset of all ten genre classes.

5.2. Eight-Genre Trial

As can be seen in Table 3, the two lowest-accuracy genre classes in the ten-genre-class trial were electronic and rock, with accuracies of 12% and 20%, respectively. So, for the trial with eight genres, these two genre models and their associated test files were removed from the data set and the test was run again. The elimination of these two models helps the overall accuracy considerably, increasing it to 51%. The full breakdown of system output versus expected output appears in Table 7.

output\expected    A     B     C     F     H     J     P     RB
alternative       52%   20%    0%   12%    4%    0%   28%    4%
blues             12%   56%    0%   24%    4%   24%    0%    8%
classical          0%    0%   56%    0%    0%   16%    0%    0%
folk               0%    8%    0%   52%    4%   12%    0%   12%
hip-hop/rap        8%    0%    0%    4%   48%    0%   16%   16%
jazz               0%    8%   24%    0%    0%   40%    0%    4%
pop               20%    4%   16%    8%    8%    4%   28%   32%
R & B/Soul         8%    4%    4%    0%   32%    4%   28%   24%

Table 7 Actual genre output of the combined system versus expected genre output for the second trial containing eight genre classes.

As with the ten-class trial, the overall accuracy was slightly better than that of any single feature-decision chain. The accuracy order of the three chains remained the same, with the spectral envelope chain's accuracy at 50.5%, the MFCC chain's accuracy at 48%, and the tempo-based chain's accuracy at 17%. Even though the system as a whole did show improvement, it is interesting to note that the two genres that have the lowest accuracies in this trial, 'Pop' and 'R & B/Soul', both actually decreased in accuracy from the ten-genre run. As with the first trial, a results table for each feature-decision chain follows (Tables 8–10).
output\expected    A     B     C     F     H     J     P     RB
alternative       28%    8%    0%    4%    0%    0%   12%    4%
blues              4%   68%    0%   20%    0%   28%    0%    8%
classical          0%    0%   52%    0%    0%   16%    0%    0%
folk               0%    0%    4%   44%    0%    8%    0%    4%
hip-hop/rap       20%    0%    0%    4%   44%    0%    0%   12%
jazz               0%    8%   24%    0%    0%   40%    0%    0%
pop               40%    8%   16%   28%   16%    4%   40%   40%
R & B/Soul         8%    8%    4%    0%   40%    4%   48%   32%

Table 8 Actual genre output by the spectral envelope based system versus expected genre output for the second trial containing eight genre classes.

output\expected    A     B     C     F     H     J     P     RB
alternative       56%   28%    0%   12%    8%    0%   28%    4%
blues              8%   24%    0%   16%    8%    4%    0%    8%
classical          0%    4%   56%    8%    0%   20%    0%    4%
folk               0%   24%    0%   52%    8%   24%    4%    8%
hip-hop/rap        4%    0%    0%    0%   40%    0%   24%    8%
jazz               4%   12%   40%    8%    0%   48%    0%   16%
pop               20%    4%    4%    4%    8%    4%   28%   16%
R & B/Soul         8%    4%    0%    0%   28%    0%   16%   36%

Table 9 Actual genre output by the MFCC based system versus expected genre output for the second trial containing eight genre classes.

output\expected    A     B     C     F     H     J     P     RB
alternative       52%   32%   40%   44%   20%   44%   40%   36%
blues              4%   12%   16%   20%    8%   12%   12%   12%
classical          0%    0%    0%    0%    0%    0%    0%    0%
folk               0%    0%    0%    0%    0%    0%    0%    0%
hip-hop/rap       44%   56%   36%   32%   72%   40%   48%   52%
jazz               0%    0%    0%    0%    0%    0%    0%    0%
pop                0%    0%    0%    0%    0%    0%    0%    0%
R & B/Soul         0%    0%    8%    4%    0%    4%    0%    0%

Table 10 Actual genre output by the tempo based system versus expected genre output for the second trial containing eight genre classes.

5.3. Six-Genre Trial

This trial uses the six genre classes that scored the most accurately in the prior tests. As mentioned above, from Table 7 it can be seen that the two least accurate genre classes from that trial are 'pop' and 'R & B/soul.' As such, these two genres are not included in the third trial in the series. Therefore, this trial has 150 songs across the six remaining genre classes. The overall accuracy of this trial increased substantially, with an overall accuracy of 77.3%. The genre-by-genre accuracy and error rates appear in Table 11.

output\expected    A     B     C     F     H     J
alternative       56%    8%    0%    8%    8%    0%
blues              4%   56%    4%   28%    0%   32%
classical          0%    0%   68%    0%    0%   16%
folk               4%   28%    8%   60%   12%   12%
hip-hop/rap       36%    4%    4%    4%   80%    0%
jazz               0%    4%   16%    0%    0%   40%

Table 11 Actual genre output of the combined system versus expected genre output for the third trial containing six genre classes.

Interestingly, in this third trial the spectral envelope chain had a higher overall accuracy by itself, 84%, than the overall system accuracy. This may be due to the smaller improvement seen in the MFCC chain, which correctly assigned a genre to 67.3% of the test data set. The tempo-based chain was again last, scoring correctly only 22.7% of the time. A breakdown of each feature-decision chain's performance for this trial can be seen in Tables 12–14.

output\expected    A     B     C     F     H     J
alternative       52%   12%    0%    8%   12%    0%
blues              4%   72%    4%   28%    4%   32%
classical          0%    0%   68%    0%    0%   16%
folk               0%    8%    8%   60%    0%   12%
hip-hop/rap       44%    4%    4%    4%   84%    0%
jazz               0%    4%   16%    0%    0%   40%

Table 12 Actual genre output of the spectral envelope system versus expected genre output for the third trial containing six genre classes.

output\expected    A     B     C     F     H     J
alternative       52%    4%    0%    4%    8%    0%
blues              0%   24%    0%   12%    0%    4%
classical          0%    8%   44%    4%    0%   16%
folk              24%   48%    8%   72%   36%   32%
hip-hop/rap       20%    0%    0%    0%   56%    0%
jazz               4%   16%   48%    8%    0%   48%

Table 13 Actual genre output of the MFCC system versus expected genre output for the third trial containing six genre classes.
output\expected   A     B     C     F     H     J
alternative      52%   32%   40%   44%   20%   44%
blues             4%   12%   16%   24%    8%   12%
classical         0%    0%    0%    0%    0%    0%
folk              0%    0%    0%    0%    0%    0%
hip-hop/rap      44%   56%   44%   32%   72%   44%
jazz              0%    0%    0%    0%    0%    0%
Table 14 Actual genre output of the tempo-based system versus expected genre output for the third trial containing six genre classes.

5.4. The Two Four-Genre Trials (Original Dataset v. iTunes Dataset)

In an effort to characterize the differences between the original dataset used in Chapter 2 and the dataset gathered from the best-seller lists of the iTunes Music Store, the last trial uses the same four genres from the large data set that were used in the initial trials. Two of these genres, "rock" and "electronic," were eliminated between the first and second trials of the earlier series. The third and fourth genres, "jazz" and "classical," were present in all three of the previous trials and had relatively high accuracies as well. The overall accuracy of the two runs was very different; this difference appeared, to varying degrees, in every feature-decision chain, as shown in Table 15.

feature\data      original   iTunes derived
Spectral Env.     82.93%     61.62%
MFCC              65.85%     55.56%
tempo             31.71%     27.27%
overall           75.61%     59.60%
Table 15 A side-by-side comparison of the output accuracy rates of the two data sets.

6. Analysis and Conclusion

6.1. Overall Performance

On the whole, the hybrid song-sorting system performed well, though with clear limitations. The most prevalent of these limitations (at least on the given test data) is genre overlap: the training examples of each genre are not sufficiently dissimilar, along the dimensions of the feature chains in the system, from the training material of the other genres. This is most visible in the BPM scatter plot (Figure 27), which shows the most pronounced overlap and renders the tempo chain only marginally helpful in improving the accuracy of the overall system. As a direct result of this overlap, the usefulness of the automatic system is significantly higher when there are fewer classes in the data set, as can be seen in Figure 30. When a genre is removed from the trial dataset, any overlap that genre contributed is removed with it, producing a nearly exponential improvement in accuracy as genres are removed from the trial.

Figure 30 Accuracy versus number of genre classes, taken from the iTunes-derived dataset, with points for each of the individual feature-decision chains and a best-fit line formed from the overall results.

Taking this into account, it is also interesting to look beyond the accuracy rates of the various trials and examine where the errors fall. The errors in the large initial trial (Tables 3 – 6) show some telling patterns, and suggest large variation in how well the genre-class topology describes the set of 400 songs used. Looking at the distribution for the "hip-hop/rap" genre, 48% of its songs were labeled correctly, but a full 36% were assigned to the genre "R & B/Soul." Conversely, "R & B/Soul," though a bit less accurate overall, confirms this overlap of definition: its accuracy rate is 32%, yet 16% of its songs were incorrectly categorized as "hip-hop/rap." The relationship is further exposed by the observable leap in accuracy for the "hip-hop/rap" genre when "R & B/Soul" is eliminated from the data set between the eight-class and six-class trials. Similar patterns can be seen, to varying degrees, among many of the other genre classes. There appears to be a triangular overlap of definition between "alternative," "blues," and "folk" that persisted throughout all three trials. All of these overlaps are, of course, dependent on the feature vectors extracted: the overlap is observable from the perspective of the features used in this system, but there may exist features (melodic structure feature vectors, for example) that would eliminate one or more of these boundary-definition problems.
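This sort of two-way confusion can be made explicit by scoring every pair of genres on how often each is mistaken for the other. The sketch below is illustrative Python, not part of the system itself: it sums the two off-diagonal entries for each pair of genres in a confusion table and sorts the pairs from most to least confused. The example values are a two-genre excerpt of Table 5, where the "hip-hop/rap"/"R & B/Soul" overlap is plainly visible.

from itertools import combinations
from typing import Dict, List, Tuple

def confused_pairs(confusion: Dict[str, List[float]],
                   genres: List[str]) -> List[Tuple[str, str, float]]:
    # Score each genre pair by the sum of its two off-diagonal entries:
    # the percentage of A's test songs labeled B plus the percentage of
    # B's test songs labeled A.  Sorted from most to least confused.
    idx = {g: i for i, g in enumerate(genres)}
    pairs = []
    for a, b in combinations(genres, 2):
        score = confusion[a][idx[b]] + confusion[b][idx[a]]
        pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Two-genre excerpt of Table 5 (MFCC chain), values in percent.
genres = ["hip-hop/rap", "R & B/Soul"]
confusion = {
    "hip-hop/rap": [36, 28],   # expected H songs, expected RB songs
    "R & B/Soul":  [40, 36],
}
print(confused_pairs(confusion, genres))
# [('hip-hop/rap', 'R & B/Soul', 68)]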
Another question that emerges from these trials is the effect of training-song selection. One of the more notable differences between the smaller data set used in the earlier trials and the larger, ten-genre data set is the way training songs were selected. In the first data set the training songs were manually chosen to best represent their genre in the training model; for the larger data set the training songs were selected at random from the 40 available songs in each genre group. This was done in an attempt to improve the objectivity of the testing set-up, but it may have had a substantial negative impact on the accuracy of the results.

6.2. The Intelligently Chosen Set

In the last trial, the models built from songs labeled by the iTunes Music Store were compared directly, in a four-genre test, against the initial set of songs used for the early tests. Unlike the iTunes-sourced data, the initial data set was deliberately chosen to create statistically separated genres, and the comparative results are telling (Table 15). Each of the feature chains showed a substantial improvement, regardless of its relative performance in the prior trials. This leads to two related conclusions. First, the data set extracted from the iTunes Music Store is clearly far from the ideal topology described in Chapter 1. Second, this methodology of sorting songs is only as good as the manually chosen genre topology of the initial training example songs; any overlap or holes present at that stage will reduce the effectiveness of the system.

6.3. What Can This Tell Us About Music? (Or at Least About Popular Manual Sorting Methodology)

From the trials run on the data set from the iTunes Music Store, and specifically the trial just discussed, it seems clear that, at least through the eyes of this sorting system, the genre topology used is far from ideal (as defined in Chapter 1). It seems safe to say that this contributed at least somewhat to the high degree of error seen in the trials (especially the full test of all ten genres). That said, it is difficult to tell exactly how well this automatic sorting system could perform on a large number of genre classes if those classes were closer to ideal in their topology. Given the comparative last trial run on the two four-genre datasets, it can be inferred that an improvement on the order of 10% to 15% would be reasonable to expect in a ten-genre test using data that was more intelligently split, with larger gains possible.

6.4. Future Work

There are a number of possible avenues of further study that continue where this research ends. Without changing the system, a worthwhile investigation would be to run it on a variety of song collections arranged in many different genre topologies. Of particular interest would be a dataset in which genres are assigned to songs by a surveyed group of listeners. The automated genre-assignment process could then be evaluated against a group of people who have no commercial interest in the genre assignments (unlike the genre assignments of the iTunes Music Store, where genre decisions have clear commercial effects).
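Building such a listener-labeled collection would also require a rule for collapsing many individual survey responses into a single genre label per song. A simple, hypothetical aggregation step is sketched below in Python (the song names and votes are invented for illustration, and nothing like this exists in the current system): a label is accepted only when a clear majority of listeners agrees, and songs without a majority would be re-surveyed or set aside.

from collections import Counter
from typing import Dict, List, Optional

def majority_genre(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    # Return the genre named by more than `min_agreement` of the listeners,
    # or None when no label reaches that level (the song would then be
    # re-surveyed or set aside).
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None

# Hypothetical survey responses for two songs (invented for illustration).
survey: Dict[str, List[str]] = {
    "song_a": ["blues", "blues", "rock", "blues", "blues"],
    "song_b": ["pop", "R & B/Soul", "pop", "R & B/Soul"],
}
labels = {song: majority_genre(genre_votes) for song, genre_votes in survey.items()}
print(labels)   # {'song_a': 'blues', 'song_b': None}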
There is also potential in improving the structure of the system itself. The most immediate of these possible changes is to supplement the existing three-chain system with additional feature-decision chains based on further feature vectors, preferably ones highly dissimilar to those currently in the system. Of particular interest are "musically aware" features: features that carry a deeper description of musical structure have a greater potential to separate what might otherwise be an overlapping topology. This could improve the accuracy of the system a great deal, though at a clear cost in computation time. Along these lines, there is clearly room for some improvement in the performance of the tempo-based feature-decision chain. Perhaps some form of clustering could be used to increase the independence of each genre before the distance is taken, although, based on the distribution seen in the data set used for the trials in this document, there may not be much to gain through this course of action.

Lastly, there is great potential in using the concept behind the genre-sorting system described above as a means of indexing a song's similarity to other songs along the dimensions of the feature-decision chains used in such a system. A quick way to achieve this would be to treat every song as a "genre" by itself. Each song would then be tested against each of these one-song genres, and the "genre" a song was placed in would in fact be the most similar song (along the dimensions of the features in the system) to the test song. A modified system like this could offer a far more flexible and adaptable classification solution for a landscape of constantly changing music and culture.

Bibliography

[1] J. Martínez, "MPEG-7 Overview (version 10)," http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm, Oct. 2005.

[2] H. Kim, E. Berdahl, T. Sikora, "Study of MPEG-7 Sound Classification and Retrieval," http://ccrma.stanford.edu/~eberdahl/Papers/ITG.pdf.

[3] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, June 2001, pp. 737–747.

[4] Z. Xiong, R. Radhakrishnan, A. Divakaran, T. S. Huang, "Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.

[5] K. Kosina, "Music Genre Recognition," University of Hagenberg, June 2002.

[6] M. Casey, "All-XM.zip," ISO/IEC 15938-4:2001 Audio Reference Software (ISO/IEC 15938-4:AMD1 2003), April 2001, 2003.

[7] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257–286.

[8] G. Tzanetakis, P. Cook, "Musical Genre Classification of Audio Signals," IEEE Trans. on Speech and Audio Processing, Vol. 10, No. 5, July 2002.

[9] G. D. Forney, "The Viterbi Algorithm," Proc. IEEE, Vol. 61, pp. 268–278, Mar. 1973.

[10] C. Beghtol, "The Concept of Genre and Its Characteristics," Bulletin of the American Society for Information Science and Technology, Vol. 27, No. 2, Dec./Jan.
2001 55 [11] Chandler, Daniel (1997): “An Introduction to Genre Theory” [WWW document] URL http://www.aber.ac.uk/media/Documents/intgenre/intgenre.html [12th, January, 2006] [12] Apple Computer, Oxford American Dictionary, from the program “Dictionary,” Version 1.0.1, 2005 [13] R. Altman, “A Semantic/Syntactic Approach to Film Genre,” Cinema Journal, Vol. 23, No. 3 (Spring, 1984), pp.6-18 [14] M. Casey, “Generalized sound classification and similarity in MPEG-7” Organized Sound, Vol.6, No.2, 2002. [15] E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals,” J. Acoust. Soc. Am. Vol 103, No.1, (Jan. 1998), pp. 588 – 601 [16] L. Rabiner and B. Juang, Fundamentals of Speech Recognition, 1993, Prentice-Hall Signal Processing Series, Englewood Cliffs, New Jersey [17] J.C. Brown, “Computer Identification of Musical Instruments Using Pattern Recognition with Cepstral Coefficients ,“ J. Acoust. Soc. Am. Vol. 105, No. 3, (Mar. 1999), pp.1933-1941 [18] M.J. Carey, E. Parris, H. Lloyd-Thomas, “A Comparison of Features for Speech, Music Discrimination,” Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (Phoenix, AZ), Mar. 1999. [19] B. Logan, “Mel-Frequency Cepstral Coefficients for Music Modeling,“ Int. Symposium on Music Information Retrieval, 2000. [20] K. Kosaina, “Music Genre Recognition” Hagenberg, June 2002. [21] U. Zöler, DAFX, Digital Audio Effects, 2002, Wiley, West Sussex, England [22] M. Cooper and J. Foote, “Automatic music summarization via similarity analysis,” in Proc. of ISMIR 2002, pp. 81–85, 2002. [23] S. Davis and P. Mermelstein, “Experiments in syllable-based recognition of continuous speech,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 28, pp. 357–366, Aug. 1980. [24] M. H. DeGroot and M. J. Schervish, Probability and Statistics, 3rd edition, 2002, Addison-Wesley 56 [25] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd edition, 2000, Springer-Verlag New York [26] R. D. Redman, Computer Speech Technology, 1999, Artech House Inc, Norwood, MA 57 Appendix Full Trial Song Listings with Genre Assignments Note: Each of these songs appeared on the Top 40 highest sales in their genre for the day of March 13, 2006. A.1 Alternative Rock --Dirty Little Secret The All-American Rejects --Swing, Swing The All-American Rejects --I Bet You Look Good On the Dancefloor Arctic Monkeys --Superman Five for Fighting --Stacy's Mom Fountains of Wayne --Feel Good Inc. (Album Crossfade) Gorillaz --American Idiot Green Day --Good Riddance (Time of Your Life) Green Day --The Middle Jimmy Eat World --The Only Difference Between Martyrdom and Suicide Is Press Coverage Panic! At The Disco --Lying Is the Most Fun a Girl Can Have Without Taking Her Clothes Off Panic! At The Disco --Somebody Told Me The Killers --Wonderwall Oasis --Remedy Seether --Forever Young Youth Group --Move Along The All-American Rejects --Talk Coldplay --Yellow Coldplay --Soul Meets Body Death Cab For Cutie --Dance, Dance Fall Out Boy --Sugar, We're Goin Down Fall Out Boy --Take Me Out Franz Ferdinand --Dare Gorillaz --Boulevard of Broken Dreams Green Day --Wake Me Up When September Ends Green Day --Holiday (Faded Ending) Green Day --Wings of a Butterfly H.I.M. (His Infernal Majesty) --The Reason Hoobastank --Mr. Brightside The Killers --King Without a Crown Matisyahu --Youth Matisyahu --Time of Your Song Matisyahu --I Write Sins Not Tragedies Panic! 
At The Disco --Who I Am Hates Who I've Been Relient K --Tear You Apart She Wants Revenge 58 --Perfect Situation --Beverly Hills --Seven Nation Army --Gold Lion --Ocean Avenue Weezer Weezer The White Stripes Yeah Yeah Yeahs Yellowcard A.2 Blues --Soul Man --Georgia On My Mind --Sweet Home Chicago --Smoking Gun --I'd Rather Go Blind --Red Light --I'd Love to Change the World --I Can't Make You Love Me --Lets Give Them Something --Bad to the Bone --Pride and Joy --The Sky Is Crying --Tightrope --Texas Flood --One Bourbon, One Scotch, One Beer --Ain't No Sunshine When She's Gone --Hey Bartender (Live) --Rubber Biscuit --Tuff Enuff --Moondance --When You're Walking Away --Damn Right, I've Got the Blues --Ain't No Sunshine --I've Got Dreams to Remember --Boom Boom --At Last (Single) --Born Under a Bad Sign --The Thrill Is Gone (1969 Single Version) --Riding with the King --Lie to Me --Misty Blue --Mannish Boy --Something to Talk About --Special Lady --Honky Tonk Women The Blues Brothers Ray Charles Eric Clapton Robert Cray Etta James Jonny Lang Ten Years After Bonnie Raitt Bonnie Raitt George Thorogood & The Destroyers Stevie Ray Vaughan Stevie Ray Vaughan Stevie Ray Vaughan Stevie Ray Vaughan & Double Trouble George Thorogood & The Destroyers Bobby "Blue" Bland The Blues Brothers The Blues Brothers The Fabulous Thunderbirds Georgie Fame Feat. Van Morrison & Jon Hendricks Jackie Greene Buddy Guy Buddy Guy & Tracy Chapman Buddy Guy & John Mayer John Lee Hooker Etta James Albert King B.B. King B.B. King & Eric Clapton Jonny Lang Dorothy Moore Muddy Waters Bonnie Raitt Ray, Goodman & Brown Taj Mahal 59 --It Hurt So Bad --Cocaine Blues --Who Do You Love? --Crossfire --Couldn't Stand the Weather --Superstition --The House Is Rockin' Susan Tedeschi George Thorogood & The Destroyers George Thorogood & The Destroyers Stevie Ray Vaughan Stevie Ray Vaughan Stevie Ray Vaughan & Double Trouble Stevie Ray Vaughan & Double Trouble A.3 Classical Sayuri's Theme Time to Say Goodbye John Williams & Yo-Yo Ma Andrea Bocelli & Sarah Brightman Danny Boy James Galway & The Chieftains Con te partiro Andrea Bocelli The Prayer Andrea Bocelli & Céline Dion Because We Believe Andrea Bocelli Unaccompanied Cello Suite No. 1 in G Major, BWV 1007: I. Prélude 2:21 Yo-Yo Ma (Bach) Schindler's List: Theme John Williams A Dream Discarded (Live Version) John Williams & Yo-Yo Ma Going To School (Live Version) John Williams & Yo-Yo Ma Somos novios (It's Impossible) Andrea Bocelli & Christina Aquilera Symphony No.5 in C Minor: I. Allegro con brio Orchestre Révolutionnaire et Romantique & John Eliot Gardiner (Beethoven) Canon in D English Chamber Orchestra & Raymond Leppard (Pachelbel) Fanfare for the Common Man Aaron Copland & London Symphony Orchestra Turandot, Act III, Nessun dorma! Luciano Pavarotti Amazing Grace The Canadian Scottish Regiment Pipes and Drums & The Third Marine Aircraft Wing Band Rhapsody in Blue Columbia Symphony Orchestra & Leonard Bernstein (Gershwin) "Eine kleine Nachtmusik", Serenade in G Major, K. 525: I. Allegro Academy of St. Martin in the Fields & Sir Neville Marriner (Mozart) 60 Piano Sonata No. 14 in C Sharp Minor, Op. 27, No. 2, "Moonlight": I. Adagio sostenuto Alfred Brendel Besame mucho Andrea Bocelli Concerto for 2 Violins in D Minor, BWV 1043: I. 
Vivace Hilary Hahn, Jeffrey Kahane, Los Angeles Chamber Orchestra & Margaret Batjer (Bach) "Für Elise" - Bagatelle in A Minor, WoO 59 Alfred Brendel (Beethoven) Theme from Rocky Cincinnati Pops Orchestra & Erich Kunzel Can't Help Falling In Love Andrea Bocelli Symphony No. 5 In C Minor, Op. 67: Allegro Con Brio Zagreb Philharmonic - Csr Symphony Orchestra (Bratislava) - Richard Edlinger – Michael Halasz (Beethoven) Carmina Burana: O Fortuna Boston Symphony Orchestra & Seiji Ozawa Olympic Theme Frederick Fennell & The Cleveland Symphonic Winds Pachebel: Canon In D A Brides Guide To Wedding Music Gabriel's Oboe Yo-Yo Ma, Roma Sinfonietta & Ennio Morricone Ride of the Valkyries from The Ring Richard Wagner Bagatelle In A Minor, Wo0 59 'Fur Elise' Beethoven The Prayer Andrea Bocelli & Céline Dion Turandot: 'Nessun Dorma!" John Alldis Choir, London Philharmonic Orchestra, Luciano Pavarotti, Wandsworth School Boys Choir & Zubin Mehta Adagio for Strings (arranged from the String Quartet, Op. 11) Leonard Bernstein & New York Philharmonic Unaccompanied Cello Suite No. 1 in G Major, BWV 1007: II. Allemande Yo-Yo Ma (Bach) Star Wars (Main Title) John Williams Canon and Gigue in D Major: I. Canon English Concert & Trevor Pinnock (Pachelbel) Air on a G String from Orchestral Suite No. 3 J.S. Bach O Fortuna from Carmina Burana Carl Orff Molto allegro from Symphony No. 40, K. 550 Wolfgang Amadeus Mozart 3.d Electronic: 61 --Starry Eyed Surprise --Hide and Seek --Axel F (Radio Mix) --One More Time --24 --Technologic --The Rockafeller Skank --Galvanize --They --Ready, Steady, Go --Flying High --Teardrop --South Side --Smack My B***h Up --In the Waiting Line --Porcelain --Random --Breathe --Take Me Away (Into the Night) --Pump Up the Jam --Blue (Da Ba Dee) --Goodnight and Go --Number 1 --Dreams (Featuring Stevie Nicks) --Breathe --Just a Ride --Firestarter --Flip Ya Lid --Your Woman --Keep Hope Alive --Return To Innocence --Come On Closer --Block Rockin' Beats --Sour Times --It Feels So Good Paul Oakenfold Imogen Heap Crazy Frog Daft Punk Jem Daft Punk Fatboy Slim The Chemical Brothers Jem Paul Oakenfold Jem Massive Attack Moby & Gwen Stefani Prodigy Zero Moby Lady Sovereign Telepopmusik 4 Strings Technotronic Eiffel 65 Imogen Heap Goldfrapp Deep Dish Prodigy Jem Prodigy Nightmares on Wax White Town The Crystal Method Enigma Jem The Chemical Brothers Portishead Sonique --Love Generation (Featuring Gary Pine) --Extreme Ways --Silence (DJ Tiësto's In Search of Sunrise Edit) Bob Sinclar Moby Delerium & Sarah McLachlan Thievery Corporation Afrika Bambaataa --Lebanese Blonde --Don't Stop...Planet Rock 3.e Folk 62 Training: --The Blower's Daughter --Travelin' Thru --World Spins Madly On --Closer to Fine --Closer Damien Rice Dolly Parton The Weepies Indigo Girls Joshua Radin --Cannonball --Cat's In the Cradle --Finnegan's Wake --Whisky You're the Devil Damien Rice Harry Chapin Clancy Brothers Clancy Brothers --When It Don't Come Easy --Colors --The Trapeze Swinger --California Stars --Bad, Bad Leroy Brown Patty Griffin Amos Lee Iron & Wine Billy Bragg and Wilco Jim Croce --Summer In the City Wagon Wheel The Lovin' Spoonful The Old Crow Medicine Show Jim Croce Gordon Lightfoot Indigo Girls Operator (That's Not the Way It Feels) --If You Could Read My Mind (Album Version) --Galileo --Unplayed Piano (Chris Lord-Alge Mix) Gotta Have You Ramblin' Irishman --Year of the Cat The Humours of Whiskey Damien Rice & Lisa Hannigan The Weepies Andy M. Stewart Al Stewart Andy M. 
Stewart & Mannus Lunny Everything'll Be Alright (Will's Lullaby) When the Stars Go Blue --Heartbeats The Humours of the King of Ballyhooley --Delicate Joshua Radin The Corrs José González Patrick Street Damien Rice --Puff, the Magic Dragon --Sunny Road --Wedding Song (There Is Love) --Keep It Loose, Keep It Tight Don't Think Twice, It's All Right Peter, Paul And Mary Emiliana Torrini Peter, Paul And Mary Amos Lee Bob Dylan 63 --If You Could Read My Mind --Leaving On a Jet Plane --Volcano --Wreck of the Edmund Fitzgerald (LP Version) --Pink Moon Gordon Lightfoot Peter, Paul And Mary Damien Rice Gordon Lightfoot Nick Drake 3.f Hip-Hop/Rap --I'm N Luv (Wit a Stripper) --Shake That --Pump It --Ms. New Booty --My Humps --Lean Wit It, Rock Wit It --Grillz --Ridin' --Fresh AZIMIZ --Gold Digger --Touch the Sky --Laffy Taffy --Poppin' My Collar --Soul Survivor --Tell Me When To Go --Best Friend --Turn It Up --Jesus Walks --Touch It (Remix) --If It's Lovin' That You Want --Let's Get It Started (Spike Mix) --Pon de Replay --Stay Fly --Baby Got Back --My Hood --Rodeo --In da Club T-Pain & Mike Jones Eminem Black Eyed Peas Bubba Sparxxx Featuring Ying Yang Twins Black Eyed Peas Dem Franchize Boyz Featuring Peenut & Charlay Nelly featuring Paul Wall, Ali & Gipp Chamillionaire & Krayzie Bone Bow Wow, J-Kwon & Jermaine Dupri Kanye West Kanye West D4L Three 6 Mafia Young Jeezy & Akon E-40 Olivia & 50 Cent Chamillionaire Kanye West Busta Rhymes Rihanna Black Eyed Peas Rihanna Three 6 Mafia featuring Young Buck & Eightball & M.J.G. Sir Mix-a-Lot Young Jeezy Juvenile 50 Cent 64 --Switch --There It Go (The Whistle Song) --Oh I Think Dey Like Me --I'm Sprung --Soul Survivor --Where Is the Love? --When I'm Gone --Pump It --One Wish (Radio Edit) --Rompe (Remix) --Oh Yes --Girl --Fireman (Main) Will Smith Juelz Santana Dem Franchize Boyz T-Pain Young Jeezy & Akon Black Eyed Peas & Justin Timberlake Eminem Black Eyed Peas Ray J Daddy Yankee Featuring Lloyd Banks and Young Buck Juelz Santana Paul Wall Lil' Wayne 3.g Jazz What a Wonderful World (Single) Take Five Do You Know What It Means to Miss New Orleans So What Sparks What Are You Doing the Rest of Your Life? Moody's Mood for Love (I'm In the Mood for Love) My One and Only Love My Funny Valentine The Look of Love What You Won't Do for Love (Original) In the Mood All at Sea The Secret Garden (Sweet Seduction Suite) Blue in Green Sing, Sing, Sing Louis Armstrong The Dave Brubeck Quartet Take 6 Featuring Aaron Neville Miles Davis Wynton Marsalis Chris Botti featuring Sting Brian McKnight, James Moody, Quincy Jones, Rachelle Ferrell & Take 6 John Coltrane & Johnny Hartman Miles Davis & Miles Davis Quintet Diana Krall Bobby Caldwell Glenn Miller Jamie Cullum Al B. 
Sure!, Barry White, El DeBarge, James Ingram & Quincy Jones Miles Davis Benny Goodman and His Orchestra 65 Good Morning Heartache Get a Clue Feeling Good The Way You Look Tonight Sweet Home Alabama In a Sentimental Mood The Girl from Ipanema Dance Me to the End of Love Careless Love J'Ai Deux Amours My Baby Just Cares for Me I'll Be Seeing You (1944 Single) I Think It's Going to Rain Today Flamenco Sketches Concierto de Aranjuez Get Your Way Give Me the Night High & Dry (US Version) Chris Botti & Jill Scott Simon & Milo Nina Simone Tony Bennett Lynyrd Skynyrd John Coltrane Astrud Gilberto, João Gilberto & Stan Getz Madeleine Peyroux Madeleine Peyroux Madeleine Peyroux Nina Simone Billie Holiday Norah Jones Miles Davis Jim Hall Jamie Cullum George Benson Jamie Cullum 3.h Pop Unwritten Beep Walk Away For You I Will (Confidence) Stupid Girls Rush Jesus, Take the Wheel L.O.V.E. What's Left of Me (Main Version) Stickwitu Don't Cha Crash Since U Been Gone Because of You Breathe (2AM) Hollaback Girl These Words Collide Behind These Hazel Eyes Black Horse and the Cherry Tree (Radio Version) Ms. New Booty (Edited Radio Shorter Version) Natasha Bedingfield The Pussycat Dolls Kelly Clarkson Teddy Geiger P!nk Aly & AJ Carrie Underwood Ashlee Simpson Nick Lachey The Pussycat Dolls The Pussycat Dolls Gwen Stefani Kelly Clarkson Kelly Clarkson Anna Nalick Gwen Stefani Natasha Bedingfield Howie Day Kelly Clarkson KT Tunstall Bubba Sparxxx & Ying Yang Twins 66 Breakaway Hung Up Sorry Boyfriend Rich Girl Barbie Girl (Radio) La Tortura American Pie Just the Girl Miss Independent Dirrty (Featuring Redman) I'll Be We Belong Together 4ever Beautiful Beautiful Soul Don't Forget About Us Cool Don't Bother Get the Party Started Inside Your Heaven Toxic My Happy Ending Kelly Clarkson Madonna Madonna Ashlee Simpson Gwen Stefani & Eve Aqua Shakira & Alejandro Sanz Don McLean The Click Five Kelly Clarkson Christina Aguilera featuring Redman Edwin McCain Mariah Carey The Veronicas Christina Aguilera Jesse McCartney Mariah Carey Gwen Stefani Shakira P!nk Carrie Underwood Britney Spears Avril Lavigne 3.i R & B/Soul So Sick Be Without You (Kendu Mix) Check On It Yo (Excuse Me Miss) Yeah! Temperature UnpredicTable (Main) Run It! (Featuring Juelz Santana) Love When You're Mad Crazy in Love One, Two Step Milkshake If I Ain't Got You Killing Me Softly with His Song Ne-Yo In My Own Words Mary J. Blige Beyoncé & Slim Thug Chris Brown Usher featuring Lil' Jon & Ludacris Sean Paul Jamie Foxx & Ludacris Chris Brown Keyshia Cole Ne-Yo Beyoncé Ciara featuring Missy Elliot Kelis Alicia Keys Fugees 67 Black Sweat Gimme That Ordinary People My Boo (Bonus Track) Naughty Girl 4 Minutes Stay Caught Up Play That Funky Music Lose My Breath Fallin' Goodies Leave (Get Out) Baby Boy Family Affair All My Life One Back at One Oh Dime Piece Conceited (There's Something About Remy) Survivor Wanna Love You Girl September Superstition As By Your Side Dance With My Father Prince Chris Brown John Legend Usher & Alicia Keys Beyoncé Avant Ne-Yo & Peedi Peedi Usher Wild Cherry Destiny's Child Alicia Keys Ciara featuring Petey Pablo JoJo Beyoncé & Sean Paul Mary J. Blige K-Ci & JoJo Mary J. 
Blige & U2 Brian McKnight Ciara featuring Ludacris Nick Cannon featuring Izzy Remy Ma Destiny's Child Robin Thicke & Pharrell Williams Earth, Wind & Fire Stevie Wonder Stevie Wonder Sade Luther Vandross Goodies 3.j Rock --Bad Day --You're Beautiful --Always On Your Side --Upside Down --Goodbye My Lover Daniel Powter James Blunt Sheryl Crow & Sting Jack Johnson James Blunt 68 --Over My Head (Cable Car) The Fray --Photograph Nickelback --Girl Next Door Saving Jane --Lights and Sounds Yellowcard --Savin' Me Nickelback --Ever the Same Rob Thomas --Who Says You Can't Go Home (Featuring Jennifer Nettles) Bon Jovi & Jennifer Nettles --The Real Thing Bo Bice --Better Days Goo Goo Dolls --100 Years Five for Fighting --You and Me Lifehouse --Drops of Jupiter Train --Hemorrhage (In My Hands) Fuel --Crazy B***h Buckcherry --California Phantom Planet --Just Feel Better (Featuring Steven Tyler) Santana & Steven Tyler --Animals Nickelback --Iris Goo Goo Dolls --Right Here Staind --Brown Eyed Girl Van Morrison --She Will Be Loved Maroon 5 --Bat Country Avenged Sevenfold --Someday Nickelback --Wasteland 10 Years --How You Remind Me Nickelback --Sitting, Waiting, Wishing Jack Johnson --Here Without You 3 Doors Down --Bom Bom Bom Living Things --Bohemian Rhapsody Queen --High James Blunt --Have a Nice Day Bon Jovi --Hotel California Eagles --Lonely No More Rob Thomas --This Love Maroon 5 --Far Away Nickelback 69