UNIVERSITY OF MIAMI
ON THE VIABILITY OF USING
MIXED FEATURE EXTRACTION WITH MULTIPLE STATISTICAL
MODELS TO CATEGORIZE MUSIC BY GENRE
By
Benjamin Fields
A RESEARCH PROJECT
Submitted to the Faculty
of the University of Miami
in partial fulfillment of the requirements for
the degree Master of Science in Music Engineering Technology
Coral Gables, FL
May 2006
UNIVERSITY OF MIAMI
A thesis submitted in partial fulfillment of
the requirements for the degree of
Master of Science in Music Engineering Technology
ON THE VIABILITY OF USING
MIXED FEATURE EXTRACTION WITH MULTIPLE STATISTICAL
MODELS TO CATEGORIZE MUSIC BY GENRE
Benjamin Fields
Approved:
_________________________
Colby N. Leider
Asst. Prof., Music Engineering
________________________
Dr. Edward P. Asmus
Assoc. Dean, Graduate Studies
_________________________
Dr. Miroslav Kubat
Assoc. Prof., Electrical Engineering
_______________________
Ken C. Pohlmann
Prof., Music Engineering
FIELDS, BENJAMIN
(M.S. in Music Engineering Technology)
(May 2006)
On the Viability of Using Mixed Feature Extraction with Multiple Statistical
Models to Categorize Music by Genre
Abstract of a Master’s Research Project at the University of Miami
Research project supervised by Assistant Professor Colby Leider
Number of pages in text: {final count}
In recent years, large-capacity portable music players have become
widespread in their use and popularity. Coupled with the exponentially
increasing processing power of personal computers and embedded devices, the
way in which people consume and listen to music is ever-changing. To facilitate
the categorization of music libraries, a system has been created using existing
MPEG-7 feature vectors and Mel-Frequency Cepstral Coefficients (MFCC)
evaluated through a number of trained Hidden Markov Models (HMM) and other
statistical methods. Previously, MPEG-7 spectral feature vectors have been used
for audio classification through a number of means including the use of HMMs.
In an effort to improve accuracy and robustness, the system expands beyond
MPEG-7 tools to include MFCC-based analysis. The output of these models is
then compared, and a genre choice is made based on which model appears most
accurate. This project explores the use of MPEG-7-compliant and other feature
vectors in conjunction with HMM sound models (one per feature per genre)
as a means of categorizing music files by these pre-trained genre models.
Results from these tests will be analyzed and ways to improve the performance of
a genre sorting system will be discussed.
ACKNOWLEDGEMENTS
The body of work presented was made neither in isolation nor without
context. Foremost, I thank my parents; without them I simply wouldn’t exist, let alone
have made it this far.
WRITE MORE YO!
Table of Contents
1. Introduction and Background
   1.1. Concept of Genre
   1.2. Feature Vector Extraction
      1.2.1. MPEG-7 Audio
      1.2.2. Mel-Frequency Cepstral Coefficients
   1.3. Statistical Learning Methods
2. Preliminary Study via Simple Investigation
   2.1. Overview
   2.2. System Architecture
      2.2.1. Genre Model Creation
      2.2.2. Testing A Song
   2.3. Experiment
   2.4. Results
   2.5. Analysis and Summary
3. The Song Sorting System
   3.1. System Overview
   3.2. The MFCC chain
   3.3. The Beat Chain
4. The Experiment
   4.1. Proof of Concept
   4.2. The Data Set
   4.3. The Trial Runs
5. Results
   5.1. Ten Genre Trial
   5.2. Eight Genre Trial
   5.3. Six Genre Trial
   5.4. The Two Four Genre Trials (Original dataset v. iTunes dataset)
6. Analysis and Conclusion
   6.1. Overall Performance
   6.2. What can this tell us about music?
   6.3. Future Work
Bibliography
Appendix  Full Trial Song Listings with Genre Assignments
Table of Figures
Figure 1  A discrete Markov process with 4 states.
Figure 2  High level overview of the model building and song testing process
Figure 3  A breakdown of the dimensional reduction process
Figure 4  Raw spectrogram of an 'electronic' clip
Figure 5  The AudioSpectrumBasis of Figure 4
Figure 6  Structure of the song testing procedure
Figure 7  Misses of the test bed
Figure 8  Broad overview of testing process with decision algorithm
Figure 9  Overview of the testing process for a single song
Figure 10 The output of the basis functions used to create the 'classical' genre from the data set used in the run detailed in Chapter 2.
Figure 11 The output of the basis functions used to create the 'electronic' genre from the data set used in the run detailed in Chapter 2.
Figure 12 The output of the basis functions used to create the 'Jazz' genre from the data set used in the run detailed in Chapter 2.
Figure 13 The output of the basis functions used to create the 'Rock' genre from the data set used in the run detailed in Chapter 2.
Figure 14 Scatter of the BPM data for the training songs
Figure 15 Scatter of the covariance data for the training songs
Figure 16 Scatter of the reliability data for the training songs
Figure 17 The output of the basis functions for the 'Alternative' genre
Figure 18 The output of the basis functions for the 'Blues' genre
Figure 19 The output of the basis functions for the 'Classical' genre
Figure 20 The output of the basis functions for the 'Electronic' genre
Figure 21 The output of the basis functions for the 'Folk' genre
Figure 22 The output of the basis functions for the 'Hip-Hop/Rap' genre
Figure 23 The output of the basis functions for the 'Jazz' genre
Figure 24 The output of the basis functions for the 'Pop' genre
Figure 25 The output of the basis functions for the 'Rock' genre
Figure 26 The output of the basis functions for the 'R & B/Soul' genre
Figure 27 Scatter of the BPM data
Figure 28 The covariance of each BPM
Figure 29 The reliability of the BPM
Figure 30 Accuracy versus number of genre classes
Table of Tables
Table 1  Overall accuracy rates of the trial with the average confidence.
Table 2  Summary of the genre output and error distribution, data from Chapter 2
Table 3  Accuracy for the large dataset of all ten genre classes
Table 4  Accuracy by the spectral envelope for the large dataset
Table 5  Accuracy by the MFCC for the large dataset
Table 6  Accuracy by the tempo based system for the large dataset
Table 7  Accuracy of the combined system for the eight genre classes.
Table 8  Accuracy by the spectral envelope for the eight genre classes.
Table 9  Accuracy by the MFCC based system for the eight genre classes.
Table 10 Accuracy by the tempo based system for the eight genre classes.
Table 11 Accuracy of the combined system for the six genre classes.
Table 12 Accuracy of the spectral envelope system for the six genre classes.
Table 13 Accuracy of the MFCC system for the six genre classes.
Table 14 Accuracy of the tempo-based system for the six genre classes.
Table 15 A side-by-side comparison of the accuracy rates of the two data sets.
1. Introduction and Background
1.1. Concept of Genre
The idea of genre as a means of classification has been seen in most forms
of media since each particular form has been in substantial artistic use. The word
is French and in its most literal sense means “kind,” “sort,” or “type,” though it
shares its root with the Latin word genus [10, 11]. The current definition according
to the Oxford American Dictionary [12] is as follows:
A category of artistic composition, as in music or literature, characterized
by similarities in form, style, or subject matter.
So genre as a categorical methodology is clearly highly qualitative and flexible in
nature. It is this lack of definition that makes it both useful across many forms
and at the same time very difficult to implement computationally [5, 11]. Still,
this taxonomy, even with its fuzzy nature, is worthy of investigation. People have
been using this process as a means of categorizing various forms of art for a few
thousand years now. The earliest discussions of genre (used first to describe
literary works) are thought to have been the work of Aristotle [10].
Any categorization typology (genre or otherwise) should ideally have
two properties [10, 11]: mutual exclusivity and joint exhaustivity. This is to say,
(genre) categories should not overlap in the media they describe and, when all are
taken together, should describe every possible and known piece of a given media.
These ideals are just that: they are not seen in any practical use of genre. For
instance, when looking at music, if a piece of music is identified as “rock,” is it
then impossible to label it as “punk”? This question is further complicated when
looking for broad consensus of a given typology across a population [13].
It seems then, in order to have an automated system that is capable of this
task of genre categorization, two distinct tasks must be performed. First,
meaningful features must be produced to describe the characteristics of a given
piece of media. In the case of music audio files, this can be done using a number
of techniques, including those outlined in the MPEG-7 audio descriptor standard
[1] and various others [2, 4, 5, 8, 18]. Once features have been extracted
(hereafter referred to as feature vectors), a decision must be made as to which
genre(s) this particular music audio file belongs. While there would seem to be a
number of routes for handling this end of the problem [5, 18, 20], the fuzzy
nature of this portion of the problem, combined with the dynamic nature of genre
typology, leads to the use of supervised statistical learning models, in particular
hidden Markov models (HMM), which allow for training by example and a
straightforward framework for comparing test songs to models via the Viterbi
algorithm [9].
1.2. Feature Vector Extraction
The first step of this process is to extract feature vectors that yield
meaningful information concerning genre categories. What is meant by
meaningful information will be explored further in the next section. First, though,
we examine the various feature vectors available and how they interact
with one another.
1.2.1. MPEG-7 Audio
The MPEG-7 audio framework provides a wide variety of standardized
feature vectors, intended for many purposes. These feature vectors (called
Descriptors in MPEG-7 language) range from simple and very low level
(AudioWaveformD) to more complex spectral and temporal measures
(AudioSpectrumBasisD, HarmonicSpectralVariationD). Additionally, the
MPEG-7 framework contains higher-level constructs called Descriptor Schemes
(DS), which use and process these lower-level feature vectors in order to
discern more abstract qualities of the audio in question.
When attempting to classify types of audio, there is a descriptor schema of
particular interest, the SoundModelDS. This DS uses the AudioSpectrum set of
descriptors as a means to build a HMM describing the audio in a given class.
There is a descriptor designed to work in conjunction with the SoundModelDS,
the SoundModelStatePathD. This descriptor takes a model created by the
SoundModelDS and an audio file to test as input. It then runs the test audio file
through the same flow process used to create the input for the HMM for the
model being used (though now there is only one audio file’s Spectrum Envelope
data being used). But rather than create an HMM with this data, it will be used to
estimate a state transition path that could account for this data with the HMM
(see [9] for a good discussion of the algorithm used in this process). Along with
this state path, the probability, or log likelihood, that this path is the correct path
is also calculated. This log likelihood allows for a means to compare multiple
HMMs to identify which Markov model (and, by extension, which set of training
examples) best describes the test data.
Using this schema and descriptor group, a wide variety of different
classification and retrieval tasks can be performed on a broad range of audio file
types, all with fair to excellent accuracy rates [3, 5, 14]. However, when examining
a taxonomy as detailed as musical genre, this methodology by itself can leave
something to be desired in accuracy.
In addition to the SoundModelClassifierDS and the associated descriptors,
there exist a number of other descriptors that can be found useful in the task of
classifying by genre. These descriptors index parts of the audio signal that are
dissimilar to those indexed by the SoundModelClassifierDS. When working on
digital music files, the SoundModelClassifierDS can be thought of as classifying
based for the most part on a sort of average timbre of each song. Further, timbre
is explored in a complementary way through the use of cepstral coefficients, as
will be discussed later. In order to achieve the maximum accuracy possible for a
genre model, care must be taken to use a diverse selection of features. Within the
MPEG-7 construct, two principal parts of music should then be considered:
tempo and pitch.
Tempo within MPEG-7 is primarily analyzed through the use of the
AudioBpmD descriptor. This descriptor’s primary use is to determine the beats
per minute (BPM) of a digital music file. Besides this primary scalar feature, two
other scalar measures, a correlation and an accuracy measure, are returned as well,
both of which provide meaningful insight concerning tempo. All three of these
measures are calculated at both the beginning and mid-point of each digital
music file that is processed. The algorithm used to calculate the BPM is based
upon a common method detailed in [15].
Pitch-detection methods, though present within the MPEG-7 standard, are
not robust enough to deal with source material with more than a single isolated
melodic line present. The pitch detection methods of MPEG-7 are therefore
useful only on single-note-at-a-time material [1]. This makes them unable to
meaningfully process the majority of musical recordings, as most recordings
contain multiple notes in parallel, across a number of instruments. These pitch
detection tools are encapsulated within MelodyDS. This descriptor schema has
not yet been implemented within [6].
1.2.2. Mel-Frequency Cepstral Coefficients
Mel-Frequency Cepstral Coefficients (MFCC) are a feature vector commonly
used in speech recognition and classification [16]. MFCC have also
been shown to be useful for general audio
classification and, specifically, genre-based classification [17, 18, 19].
At first glance, there are a number of problems that appear when
attempting to use MFCC as a means to classify music [19]. The filterbanks are
designed to maximize effectiveness on bands of audio relevant to speech, which
could ignore valuable data in other audible frequency ranges. Further, though
the Mel scale was designed to mimic the response of the cochlea, it has been
shown to be fairly inaccurate. Nevertheless, MFCC have been shown to be a good,
if not excellent, feature for sound classification tasks, though there is
certainly ample room for optimization of this process for this purpose.
A discrete cosine transform (DCT) is the final step after the Mel-Scaling
has been performed. This serves to move the signal from the frequency domain
into the quefrency domain, yielding the coefficients of the cepstrum [19].
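To make this concrete, a minimal MATLAB sketch of the computation just described follows. It is illustrative only: the frame length, hop size, filter count, and number of coefficients are assumed values, and the MFCC extraction used in this work is performed by existing toolkit code rather than by this function.

function coeffs = mfcc_sketch(x, fs)
% Minimal MFCC sketch (illustrative only): frame the signal, take the
% magnitude spectrum, apply a mel-spaced triangular filterbank, take logs,
% and apply a DCT to obtain the cepstral coefficients.
x = x(:);                                % force a column vector
frameLen = 1024;  hopLen = 512;          % assumed analysis parameters
numFilt  = 26;    numCoeff = 13;         % assumed filterbank/coefficient counts
win = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));   % Hamming window

% Mel-spaced triangular filterbank covering 0 .. fs/2
mel  = @(f) 2595*log10(1 + f/700);
imel = @(m) 700*(10.^(m/2595) - 1);
edges   = imel(linspace(mel(0), mel(fs/2), numFilt+2));     % filter edge freqs (Hz)
binFreq = (0:frameLen/2)' * fs/frameLen;                    % FFT bin centre freqs
fbank = zeros(numFilt, frameLen/2+1);
for k = 1:numFilt
    lo = edges(k); cen = edges(k+1); hi = edges(k+2);
    up   = (binFreq - lo)/(cen - lo);
    down = (hi - binFreq)/(hi - cen);
    fbank(k,:) = max(0, min(up, down))';
end

% DCT-II matrix (written out so no toolbox call is needed)
dctMat = cos((pi/numFilt) * (0:numCoeff-1)' * ((0:numFilt-1) + 0.5));

% Frame the signal and compute one column of coefficients per frame
numFrames = floor((length(x) - frameLen)/hopLen) + 1;
coeffs = zeros(numCoeff, numFrames);
for n = 1:numFrames
    frame = x((n-1)*hopLen + (1:frameLen)) .* win;
    mag   = abs(fft(frame));
    mag   = mag(1:frameLen/2+1);
    melEnergy   = fbank * mag;
    coeffs(:,n) = dctMat * log(melEnergy + eps);
end
end

A call such as mfcc_sketch(audioData, 44100) would return one column of coefficients per analysis frame.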
1.3. Statistical Learning Methods
In examining statistical learning models for use in music and other sound
classification procedures, the hidden Markov model (HMM) is a natural
choice, due to its historical role in both speech recognition and categorization
[7, 16]. In this usage, the HMM is formed from a matrix of
feature vectors compiled from examples. This matrix is preprocessed to
maximize effectiveness and then used to approximate a finite state machine
that, given an optimal state transition path, could generate the original data.
In effect, known information is used to generate an unknown, or hidden
model. Once this model is generated, it is trained to increase the likelihood
that the known data will correspond with that which is produced by the
model. A related process is used to examine test data against the model to
determine how possible it is for that test data to have been produced by the
model generated by the training data [9].
In order to understand the construction process of a hidden Markov
model, an understanding of a discrete Markov process is necessary. A discrete
Markov process is a system of N states and can be seen in a graphic form in
figure 1 (where N is set to 4). At a regular time period, the system will follow a
transition path based upon the probabilities assigned to that state (this could
result in remaining in the current state). The probability of any state
transition is defined in equation (1).
Figure 1 A discrete Markov process with 4 states. Each state transition probability is
labeled as Aij, where i is the current state and j is the previous state
a_{ij} = P\left[\, q_t = S_i \mid q_{t-1} = S_j \,\right], \qquad \sum_{i=1}^{N} a_{ij} = 1        (1)

where N is equal to the number of states in the discrete Markov process, q_t is the
current state, and q_{t-1} is the previous state.
This process is also commonly referred to as an observable Markov
process, due to the fact that the output of the process is simply the current state of
the system. With a system where all transition probabilities are known, a number
of behavioral probabilities can be calculated. For example, by equation (2) the
probability that the model will stay in the same state for exactly T discrete time
periods can be easily calculated.
P\left(O \mid \mathrm{Model},\, q_1 = S_i\right) = a_{ii}^{\,T-1}\,(1 - a_{ii}) = p_i(T)        (2)

This equation yields the probability density function of the event of staying in
state i for a duration T. From this it is possible to determine the expected
number of increments one can observe at a given state via equations (3a) and
(3b).

\bar{T}_i = \sum_{T=1}^{\infty} T\, p_i(T)        (3a)

\phantom{\bar{T}_i} = \sum_{T=1}^{\infty} T\, a_{ii}^{\,T-1}\,(1 - a_{ii}) = \frac{1}{1 - a_{ii}}        (3b)
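These duration results are easy to check numerically. The short MATLAB fragment below evaluates equations (2) and (3) for an assumed self-transition probability; the value 0.8 is arbitrary and chosen only for illustration.

% Sketch: numerically check the state-duration results of equations (2) and (3).
aii = 0.8;                       % assumed probability of remaining in state i
T   = 1:200;                     % durations to consider (truncated sum)
p   = aii.^(T-1) .* (1 - aii);   % equation (2): P(stay exactly T steps)
expectedDuration = sum(T .* p);  % equation (3a), truncated
closedForm       = 1/(1 - aii);  % equation (3b)
fprintf('truncated sum: %.4f   closed form: %.4f\n', expectedDuration, closedForm);

Both numbers come out at 5 time increments for this choice of a_ii.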
The principal difference between a discrete Markov model (as highlighted
above) and the hidden Markov models discussed and used throughout this paper
is in the contents of the state. In a discrete Markov model, the observation and
the current state in the model are the same. If, for example, one were using a
discrete Markov model to model the weather, then each state would represent a
weather condition (i.e., sunny state, rainy state, cloudy state, etc.). However, in a
hidden Markov model, the state itself is a stochastic process that is not
observable (hence hidden); the only way to observe this process is via another set
of stochastic processes that are observable and can be thought of as a discrete
Markov model. A thorough explanation of all the mathematics necessary to
construct and use hidden Markov models can be seen in [7].
For recognition or categorization tasks, a number of HMMs are generated
based on different types of test data. To determine which model most closely
aligns with a given test set, the Viterbi algorithm is run using the test data as the
desired output against each model [9]. This algorithm finds the most probable
state path within a given model to explain the corresponding output. It does this
by recursively backtracking through the state transition path, selecting the most
likely last state for each transition. For each model a measure of the probability
is calculated, called the maximum log likelihood (MLL). The most reasonable
model to associate with the test data is simply the model with the highest MLL.
Through this process, HMMs can be used to provide a measure of closest fit for a
given multidimensional data set, regardless of what the data set represents.
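For reference, a compact log-domain Viterbi sketch is given below. It assumes that the initial-state probabilities, transition matrix, and per-frame observation likelihoods of a trained model are already available in log form, and it uses the conventional from-row/to-column transition layout (the reverse of the indexing in Equation (1)); the system described later obtains the equivalent quantities through the MPEG-7 XM tools rather than through this function.

function [mll, path] = viterbi_sketch(logPi, logA, logB)
% Log-domain Viterbi: find the single most likely state path and its
% log likelihood (the "MLL" used to compare genre models).
%   logPi : N x 1 log initial-state probabilities
%   logA  : N x N log transition matrix, logA(i,j) = log P(state j at t | state i at t-1)
%   logB  : N x T log observation likelihoods, logB(i,t) = log P(o_t | state i)
[N, T] = size(logB);
delta = zeros(N, T);            % best log score ending in each state
psi   = zeros(N, T);            % back-pointers
delta(:,1) = logPi + logB(:,1);
for t = 2:T
    for j = 1:N
        [best, psi(j,t)] = max(delta(:,t-1) + logA(:,j));
        delta(j,t) = best + logB(j,t);
    end
end
[mll, last] = max(delta(:,T));  % maximum log likelihood for this model
path = zeros(1, T); path(T) = last;
for t = T-1:-1:1                % backtrack through the stored pointers
    path(t) = psi(path(t+1), t+1);
end
end

Running such a routine once per genre model and keeping the model with the largest returned log likelihood implements the comparison described above.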
2. Preliminary Study via Simple Investigation
2.1. Overview
The MPEG-7 audio standard has, among its various uses, been widely used
as a means of classifying audio signals through feature extraction [1]. Many of
these efforts have been very successful, achieving high rates of accuracy across a
large range of signal types [2, 3, 4]. In contrast, the problem of genre sorting
audio files has been more limited in its success, due to a number of factors [5, 8],
including poorly structured genre hierarchies, imperfect information from the
digital audio source files and a lack of statistical independence between genres
along measured dimensions.
Therefore it seems reasonable to apply the techniques that are successful
in general audio sound classification to the problem of sorting music by genre
and measure the results. One of the inherent problems in the use of MPEG-7
feature vectors in the evaluation of music is the high computational burden of
feature extraction. As such, and also to facilitate a speedy implementation of this
preliminary study, a minimal number of MPEG-7 descriptors have been used.
These descriptors are all energy or spectrum based by nature; however, the use of
an HMM provides a means of modeling the change of these features over time,
giving a means of examining temporal information in the signal without requiring
temporal feature vectors.
been selected to mirror the MPEG-7 structure for audio classification.
2.2. System Architecture
Two distinct processes take place in the described genre classification system.
The first is a process to create and train an HMM for each genre class. After all of
the genre HMMs have been created and trained, a second process of extracting
features and model matching occurs with each song to be classified by the
system. All of these computations and analyses have been done in MATLAB.
Feature extraction was done through the use of the MPEG-7 XM toolkit [6].
2.2.1. Genre Model Creation
The process of creating and training an HMM to describe a genre follows a
process similar to that seen in [3] to build a sound model to describe a class of
sounds. This can also be seen in Figure 2. First, a group of songs (or song clips)
is selected to be the training examples for a given genre.
Figure 2 High level overview of the model building and song testing process
The training examples can be as broad or focused as the particular
application requires, though it is important that, whatever the methodology used
for training sample selection, the entirety of the given genre is covered so as to
minimize false negatives. For each of these songs, the AudioSpectrumEnvelope
MPEG-7 descriptor is obtained. This is a logarithmic representation of the
frequency spectrum on multiples of the octave and is the basic feature vector
used in this entire process. The output of each song’s AudioSpectrumEnvelope
descriptor is then passed into the AudioSpectrumBasis descriptor. This process can
be seen graphically in Figure 3. An examination of the signal through this
process can be seen in Figure 4 and Figure 5. This descriptor is a wrapper for a
group of basis functions that are used to project the AudioSpectrumEnvelope
descriptors onto a lower-dimensional space to facilitate classification. The
projection is generated through a matrix multiplication of the
AudioSpectrumEnvelope and the matrix produced by the basis functions. This
output contains the most statistically relevant features within the music’s
feature space.
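The sketch below illustrates the idea behind this reduction using a plain singular value decomposition; the actual AudioSpectrumBasis and AudioSpectrumProjection extraction is performed by the MPEG-7 XM toolkit [6], and the variable names and the choice of ten retained basis functions here are assumptions made only for illustration.

% Illustrative dimensionality-reduction sketch (not the XM implementation).
% envelopes is assumed to be a T x F matrix holding one spectrum-envelope
% frame per row, stacked over all training clips of one genre.
k = 10;                                % number of basis functions to retain
[U, S, V] = svd(envelopes, 'econ');    % columns of V span the dominant spectral shapes
basis      = V(:, 1:k);                % F x k, analogous to AudioSpectrumBasis
projection = envelopes * basis;        % T x k, analogous to AudioSpectrumProjection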
Figure 3 A breakdown of the dimensional reduction process
The relationship between the raw spectrum (or the AudioSpectrumEnvelope descriptor) and the AudioSpectrumBasis descriptor can be seen across
Figures 4 and 5. In Figure 4, the spectrum of a 30-second clip of a song can be
seen. Larger amplitudes can be seen in certain frequency ranges across the entire
length of the clip. Comparing this to the basis reductions in Figure 5, these bands
of higher amplitude correspond to similar high-amplitude peaks in the basis
functions. The AudioSpectrumBasis descriptor contains the feature vectors that
are used to create the HMM that will be used to make the determination as to
which of the available genres the test information best fits. The HMM used in this
implementation conforms to the MPEG-7 specification for the SoundModel
descriptor scheme (see [1]) and uses the standard solutions to the three critical HMM
problems as can be found in [7]. It uses the Baum-Welch re-estimation algorithm
to optimize likelihood.
Figure 4 Raw spectrogram of an ‘electronic’ clip
Figure 5 The AudioSpectrumBasis of the same clip seen in Figure 4 with 10 feature
vectors in the reduction
This algorithm iteratively recalculates the state-to-state transitions for an
HMM. This is done by calculating the expected duration the model stays in a
given state and the expected number of transitions from each state to every other
state. These expected values are compared to the durations and transitions seen in
the data set used to generate the HMM, the transition probabilities are adjusted,
and the process runs again. This process continues until the difference seen
between the expected values and the data set is acceptably minimal.
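In the standard notation of [7], where a_{ij} denotes the probability of moving from state S_i to state S_j (the reverse of the indexing used in Equation (1)), \xi_t(i, j) is the probability of being in S_i at time t and S_j at time t+1 given the model and the training observations, and \gamma_t(i) = \sum_j \xi_t(i, j), this re-estimation step can be written as:

\bar{a}_{ij} \;=\; \frac{\displaystyle\sum_{t=1}^{T-1} \xi_t(i,j)}{\displaystyle\sum_{t=1}^{T-1} \gamma_t(i)} \;=\; \frac{\text{expected number of transitions from } S_i \text{ to } S_j}{\text{expected number of transitions out of } S_i}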
2.2.2. Testing A Song
The feature extraction portion of testing proceeds in much the same way as the
feature extraction for the creation of each genre HMM, with the exception being
that by design only one song’s (or clip’s) AudioSpectrumEnvelope descriptor will
be used to generate the AudioSpectrumBasis descriptor. After the
AudioSpectrumEnvelope is produced, it is used to calculate a maximum
log likelihood (MLL) [2] against each of the previously created HMMs,
one for each genre. The MLL is produced by the
SoundModelStatePath descriptor. This can be seen in Figure 6.
Figure 6 Structure of the song testing procedure
This descriptor will compute the state path that will most likely be taken for a
given set of feature vectors in an HMM. The MLL is a measure of how likely that
particular path in the HMM is the actual path that was taken to produce the
observed features. The state path and MLL are both computed using the
standard Viterbi algorithm [2, 9]. This process is repeated with each of the
various genre-specific HMMs. The final part of the decision-making process is
then quite simple. The largest MLL is taken to indicate the most likely model for
the song and therefore the assumed genre of the test song.
G = \max\left(\left[\,\mathrm{Likelihood}_1,\ \mathrm{Likelihood}_2,\ \ldots,\ \mathrm{Likelihood}_n\,\right]\right),        (4)

where n = total number of genres.
Lastly, after the genre has been determined a confidence index is
calculated based on equation 5:
"
%
$
'
normG
C = 100 $ n
'
$ normlikelihood '
i'
$# !
&
1
(5)
where the normlikelihood is created as follows:
\mathrm{normlikelihood}_i = \mathrm{Likelihood}_i - \min\left(\left[\,\mathrm{Likelihood}_1,\ \mathrm{Likelihood}_2,\ \ldots,\ \mathrm{Likelihood}_n\,\right]\right)        (6)
The idea behind this confidence metric is to get a better idea of how close the
decision was as to which genre class the song belongs. Generally speaking, it
should be higher the larger the delta between the selected genre class and the
runner-up genre class. The minimum possible confidence is based upon the
number of genres in the trial and is calculated in Equation 7.
" 100 %
Cmin = $
# 1 ! n '&
(7)
For a test set containing four genre classes, as used in the following experiment,
CMIN would be equal to 33.33%.
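The decision and confidence computations of Equations (4)–(7) amount to only a few lines; the MATLAB sketch below assumes the per-genre maximum log likelihoods have already been collected into a single vector (the function and variable names are hypothetical).

function [genreIdx, C, Cmin] = pick_genre(likelihoods)
% Sketch of Equations (4)-(7). likelihoods is a 1 x n vector of maximum
% log likelihoods, one per genre model (n = number of genres).
n = numel(likelihoods);
[mll, genreIdx] = max(likelihoods);                           % equation (4)
normlikelihood  = likelihoods - min(likelihoods);             % equation (6)
C    = 100 * normlikelihood(genreIdx) / sum(normlikelihood);  % equation (5)
Cmin = 100 / (n - 1);                                         % equation (7)
end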
2.3. Experiment
As mentioned previously, all testing has taken place in MATLAB and, where
applicable, has used the standard MPEG-7 XM toolkit.
The media used was all evaluated in the Microsoft WAV format, though each
file had previously been encoded as an MP3 (with bit rates varying from 128 kbps
to 192 kbps). A 30-second clip from the middle of each audio file was extracted
prior to the evaluation process regardless of whether the song was being used for
training or for testing.
In the sample set, a model of four genre classes was selected with 36 songs used
for training (9 for each genre). In the testing a total of 40 songs were used (10 for
each genre class). Additionally, before the test songs were run, each training
song was run through the system as though it were a testing song. This was done
to insure that the previously mentioned selection method was one that made
sense and would produce meaningful results, prior to evaluating the test data set.
The accuracy of this trial run was 100%.
2.4. Results
After running the entire test set through the genre sorter, the results were quite
good, given the limited size of the data sets. The overall accuracy of the system
was 82.50%; however, there was a wide variation in the accuracy of results
among the genre classes, as can be seen in Table 1. The best performance was
seen in the “classical” genre class with an accuracy rate of 100%. Next after that
was the “jazz” genre class with an accuracy rate of 90%. The other two genre
classes, “electronic” and “rock,” came in third and forth respectively, with
accuracy rates of 80% and 60%.
Genre Class    Accuracy Rate    Average Confidence
classical      100.00%          48.08%
jazz           90.00%           47.41%
electronic     80.00%           41.00%
rock           60.00%           45.29%
Total          82.50%           -
Table 1 Overall accuracy rates of the trial with the average confidence, by genre.
Figure 7 Misses of the test bed, grouped by the genre in which they should have been
classified. The rock genre is clearly the most error prone in this test followed by
electronic and then jazz. Classical, having no errors, is not in the chart.
2.5. Analysis and Summary
With an average result of 82.5%, this system performed adequately. This
is a slightly lower accuracy rate than a test of a nearly identical system seen in
[14]. This accuracy rate would seem to indicate a strong (though less than that of
the data set used in [14]) statistical independence of the genres, as they are
defined in this data set and along the dimensionality seen using the
AudioSpectrumEnvelope. However, the high rate of failure in the “rock” genre is
a strong justification for using additional features, in order to add additional
dimensions to the genre sorting system.
There does seem to be a mild correlation between the confidence index
and the accuracy rate of a given genre. This provides cause for the use of the
confidence index in the larger system, as will be discussed in the following
chapter in further detail.
3. The Song Sorting System
3.1. System Overview
To make the system detailed in Chapter 2 robust enough to maintain or
even improve its accuracy rates as we examine more genre classes, more feature
vectors must be added. These features will each be used independently of the
existing system to determine a genre. After each feature chain has made a
decision as to which genre a test song belongs, a final decision will be made based
on which genre is chosen by at least two out of the three feature chains. If this
does not provide a clear genre, the confidence measure produced by each feature
chain is also used to determine the overall genre of the test song. This overview
can be seen in Figure 8.
As can be seen in the figure, there are three independent feature-decision
chains in the Song Sorting System. The first is the system outlined in chapter 2.
The second is a feature chain based around extraction of the MFCC of the music
audio file. The third chain is based around the automatic extraction of beat-
and tempo-related information as extracted by AudioBpmD. Each of these three
chains has as its output two pieces of data, the genre estimate and the confidence
measure for that estimate. This confidence measure is generated through the
same means for each of the feature-decision chains, in much the same way as the
confidence measure was generated in the preliminary single-chain system
(Chapter 2, Equations 5 – 7), though it is modified in the case of the tempo chain.
Figure 8 Broad overview of testing process with decision algorithm
The inverse of Equation 5 is taken in this case, as the selected genre has the
lowest (rather than highest, in the other chains) score.
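A minimal sketch of this combination rule is shown below; the function and variable names are hypothetical, and the tie-break simply takes the chain reporting the highest confidence, as described above.

function finalGenre = combine_chains(genres, confidences)
% genres:      1 x 3 cell array of genre labels, one per feature chain
% confidences: 1 x 3 vector of the corresponding confidence indices
% Majority rule first: any genre picked by at least two chains wins.
for k = 1:3
    if sum(strcmp(genres{k}, genres)) >= 2
        finalGenre = genres{k};
        return;
    end
end
% No agreement: fall back on the most confident chain.
[cmax, idx] = max(confidences);
finalGenre = genres{idx};
end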
3.2. The MFCC chain
In many ways, the MFCC chain is very similar to the AudioSpectrumEnvelopeD-based chain used in the preliminary experiment. The only major
difference is that rather than extracting a number of signal envelopes, a matrix of
MFCC coefficients is extracted, as discussed in Section 1.2.2. After all the
training files have their MFCC coefficient matrices extracted, the stacked matrix
is sent to AudioSpectrumProjectionD and from there to AudioSpectrumBasisD.
This follows the same signal flow as the SoundModelDS. This modified
MFCC chain’s block diagram can be seen in Figure 10. As in the SoundModelDS,
the output of AudioSpectrumBasisD is used to create and train an
HMM. This training process can be seen in Figure 9. The HMM is then used by
the testing process, along with the song to be tested, to create a maximum-likelihood
path and, with it, the log likelihood of that path occurring; thus a genre
selection can be made based on the cepstral coefficients.
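Under the same illustrative assumptions as the earlier sketches, the stacking step might look like the fragment below, where mfcc_sketch is the hypothetical extractor from Section 1.2.2 and trainingFiles is a hypothetical cell array of file paths; in the actual chain this work is handled by the XM toolkit descriptors named above.

% Sketch: extract MFCC matrices for every training song of one genre and
% stack them before the basis-extraction step (names and parameters assumed).
stacked = [];
for k = 1:numel(trainingFiles)
    [x, fs] = wavread(trainingFiles{k});      % era-appropriate WAV reader
    c = mfcc_sketch(x(:,1), fs);              % coefficients, one column per frame
    stacked = [stacked; c'];                  % stack frames as rows
end
% 'stacked' then plays the role of the envelope matrix in the SVD sketch above.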
3.3. The Beat Chain
The third genre decision chain is based around beats per minute and the
two statistical byproducts (covariance and reliability) produced by the
AudioBpmD descriptor. The creation of this model is a far simpler process than
the model creation of the two prior chains. The model is composed of three 2 x N
matrices, where N is equal to the number of songs used in each model. These
matrices store the scalar values that are produced as output
by the AudioBpmD descriptor: BPM, correlation, and reliability. The correlation
and reliability are both byproducts of the filterbank-into-comb-filter methodology
employed by the tempo-detection algorithm [15]. The first row contains values
calculated at the beginning of each audio file; the second row contains values
calculated from the midsection of each audio file. These three matrices are then
stored as the beat model for a given genre.
The testing phase of the beat chain begins with the extraction of the six
scalars using the AudioBpmD. Each pair of scalars (BPM, correlation, and
reliability) can then be thought of as a vector in a two-dimensional space.
Similarly, each pair of scalars in the model matrices can be considered in the
same way. Then a distance measure is taken between the test vector and each
vector in the corresponding training matrix. A cosine distance measure is used
here over simple Euclidean distance as the cosine distance has been found to
yield better results in music similarity tasks [22]. Once the distance measures
have been taken they are normalized and averaged together to yield an overall
score of the test song against the genre model. This score is taken for each genre
model, from which the minimum score (smallest average distance) across all the
models is taken to be the genre from the perspective of the beat chain. This
process is illustrated in Figure 9. This process puts equal emphasis on the tempo
as well as the reliability and correlation of the tempo. This accounts for the
diverse nature of different genres. Where some genres may see large differences
in tempo, these same genres may show a high degree of independence and
correlation in the reliability of that tempo or in the correlation measure of the
tempo.
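The following MATLAB sketch illustrates the scoring just described, under the assumptions that each genre's beat model is stored as three 2 x N matrices and that the per-genre normalization reduces to a simple average of the distances; the true system may normalize differently, and the names here are hypothetical.

function score = beat_chain_score(testVecs, modelMats)
% testVecs : 3 x 2 matrix; rows are the [start, mid] readings of BPM,
%            correlation and reliability for the test song.
% modelMats: 1 x 3 cell array; modelMats{s} is the 2 x N matrix of the
%            same scalar for the N training songs of one genre.
% score    : average cosine distance of the test song to this genre model
%            (the genre with the smallest score is chosen).
cosdist = @(a, b) 1 - (a(:)'*b(:)) / (norm(a)*norm(b) + eps);
total = 0; count = 0;
for s = 1:3
    M = modelMats{s};
    for n = 1:size(M, 2)
        total = total + cosdist(testVecs(s,:), M(:,n));
        count = count + 1;
    end
end
score = total / count;
end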
Figure 9 Overview of the testing process for a single song
4. The Experiment
4.1. Proof of Concept
As a starting point for this system, the same training and testing song data
set was used as was used in the single chain system detailed in Chapter 2. The
basis functions used to generate the hidden Markov models can be seen in
Figures 10 – 13. These figures are surface plots of data that was represented in
Chapter 2 in the form of multiple best-fit lines (Figure 5). These surface plots
allow for an easy visual comparison of the basis functions used to generate the
HMMs for each genre.
The tempo related models are seen in Figures 14 – 16 in the form of scatter
plots. These scatter plots make apparent the low statistical separation seen along
the tempo dimension. The full results of this trial can be seen in Table 2. It can
be seen that the full system gave a slight increase (0.5%) in performance.
Output\Correct    Classical   Electronic   Jazz    Rock
Classical            100%          0%        0%      0%
Electronic             0%         90%        0%     60%
Jazz                   0%          0%      100%     20%
Rock                   0%         10%        0%     20%
Table 2 Summary of the genre output and error distribution seen when running the
data set used in the trial in Chapter 2 through the expanded system.
Figure 10 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'classical' genre from the data set used in the run detailed in Chapter 2.
Figure 11 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'electronic' genre from the data set used in the run detailed in Chapter 2.
Figure 12 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Jazz' genre from the data set used in the run detailed in Chapter 2.
Figure 13 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Rock' genre from the data set used in the run detailed in Chapter 2.
Figure 14 Scatter of the BPM data for the training songs from the initial trial, color-coded by genre. The Y-axis is the first BPM reading; the X-axis is the second.
Figure 15 Scatter of the covariance data for the training songs from the initial trial
color-coded by genre. The Y-axis is the first covariance reading; the X-axis is the
second.
Figure 16 Scatter of the reliability data for the training songs from the initial trial
color-coded by genre. The Y-axis is the first reliability reading; the X-axis is the
second.
When examining the genre breakdown of this test, it can also be seen that
this expanded system led to a more even distribution of the accuracy across each
genre as compared to the outcome of the single-chain test detailed in Chapter 2.
These results will be discussed further and compared to the larger trial’s data set
and results in chapter 5.
4.2. The Data Set
To facilitate full experimental trials, a significantly larger set of digital music
test files is required. To that end, a means of selecting songs that are
representative of a wide range of genres is needed. Since the nature of this
system relies on statistical independence between genres, the specific songs
chosen will clearly have a profound effect on the accuracy rates produced by the
classifier. Further, it is important to establish an agreed upon ground truth genre
for each song used and that that genre assignment be as objective as possible,
given the inherent cultural subjectivity of genre and genre classification.
A natural fit for these requirements is the direct digital music retail download
service. The music retail industry must make decisions and assign music into
genre categories as part of its business, so the assignment of a genre is made as
new material is acquired. In a physical store this is done by placing media in
different areas of the store (i.e., which box does the record go in?), and by analogy
this is seen in the use of metadata tags describing genre at digital
download music retailers. Now, as a means for choosing a select representation
from each genre, the top 40 most popular songs, as determined by sales within
each genre, will be used. For these tests, Apple’s iTunes Music Store was used as
the source of these lists, though there are many other digital download services
available at the time of this writing that would have served just as well. For a
complete list of all songs used in the trial and their genre assignments, including
the breakdown of training songs and testing songs, please consult appendix A. A
simple random algorithm was used to divide each of these groups of 40 songs
into subgroups of 15 songs for training and 25 songs for testing. The largest trial
involved the use of 10 genres, with fewer genres being used in subsequent trials, as
will be described in detail later.
4.3. The Trial Runs
The trial runs are designed to examine the Song Sorting System’s structure
and usefulness in a number of ways. The broadest way this is done is through the
overall accuracy and the accuracy measures of each genre within the test set. In
order to gain further insight about the system and how it responds to the data set,
the errors in labeling are broken down by genre to look for emerging correlated
properties between genres based on the feature dimensionality examined in the
genre assignment process.
All ten genres’ selected training songs were used to create a model for each of
the three feature-decision chains used in the system. For a graphical
representation of the spectral envelope and MFCC features, see Figures 17 – 26.
A distribution of the beat-related models can be seen in Figures 27 - 29.
An examination of the basis function visualizations reveals many insights
about the genres’ data. One of the most visually prominent features can be seen in
the “Hip-Hop/Rap” model’s AudioSpectrumEnvelope-based basis functions
(Figure 22, top graph). There is a large peak value in the Z-axis across the entire
range of Y at very low values of X. This represents the bass-heavy sound seen in
almost all of the songs used as training examples for this genre. A similar, though
less intense, bass emphasis can be seen in the alternative genre (Figure 17).
The beat-related models’ scatter plots also provide information about the
statistical grouping of the training example songs. In these figures, it is apparent
that there is substantial overlap across the genres, along this feature’s
dimensions. This overlap is seen at its highest in Figure 27, as BPM has the least
statistical grouping. The other two scalars, seen in Figure 28 and Figure 29, still
show a large amount of overlap, though there is a small amount of clustering seen
in a few of the genres.
Figure 17 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Alternative' genre
Figure 18 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Blues' genre
Figure 19 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Classical' genre
Figure 20 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Electronic' genre
Figure 21 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Folk' genre
Figure 22 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Hip-Hop/Rap' genre
Figure 23 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Jazz' genre
Figure 24 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Pop' genre
Figure 25 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'Rock' genre
Figure 26 The output of the basis functions used to create the hidden Markov models for the spectral envelope chain (on top) and the MFCC (bottom) for the 'R & B/Soul' genre
Figure 27 Scatter of the BPM data for the training songs color-coded by genre. The Y-axis is the first BPM reading; the X-axis is the second.
Figure 28 The covariance of each BPM sorted by genre. The first reading makes up
the Y-axis and the second is the X-axis.
Figure 29 The reliability of the BPM measures taken for the training of each genre
model. The Y-axis is from the first reading and the X-axis is the second.
These models were then used for a series of four test runs, each using a
different subset of the test songs. The purpose of the series of tests is to
examine the effect of limiting the number of genre classes on the overall accuracy
and the trends within the error spread of a given genre. For the first run, all ten
genre-models and the associated 250 test songs were used in the trial (25 songs
per genre). In the second and third trials, the two worst performing genres of the
previous trial are removed from the data set. So, in the second run there are 200
test songs evenly distributed across 8 genres. Similarly, in the third trial an
additional two genres are removed from the data set, leaving 150 songs across the
remaining 6 genre classes.
The fourth trial was a bit different. Rather than remove the worst two
performing genres to decide upon the four classes to use in the last trial, the four
genres that were used in the proof of concept run were again used, but this time
as defined by the iTunes Music Store derived data set. The purpose of this test is
to examine the observable difference in genre definition and its corresponding
effect on accuracy of genre assignment.
5. Results
5.1. Ten-Genre Trial
The ten genre-class trial was the first and largest of the trials using the data
from the iTunes Music Store. As noted in chapter 4, there were 15 songs used to
train each genre model, for a total of 150 songs used in the training process. On
the testing side, there were 25 songs used per genre for a total of 250 songs used
in testing, giving a grand total of 400 songs for the entire trial. As would be
expected, these trials took a non-trivial amount of time to process. All software
was built and ran in MATLAB 7 (R14) on an Apple G4 1Ghz Powerbook with 1GB
of RAM. In that environment the models took 60 minutes to run, for a total run
time of approximately 8 hours. The models were only built once and were used
for all the subsequent trials. The testing process was a bit faster, though still
lengthy, with each song taking a bit over a minute to process for a total testing
run time of about 5 hours.
Accuracy rates on the full data set were less than stellar, with an overall
accuracy rate of 37.75%. This accuracy rate was slightly better than the best
performing of the three feature-decision chains, the spectral envelope chain,
which had an accuracy rate of 37.35%. The accuracy of the MFCC chain came
next with an accuracy of 32.1%. The tempo-based chain did the poorest with an
accuracy of 13.7%. The overall accuracy of the entire system is broken down by
genre-class in Table 3.
output\expected     A     B     C     E     F     H     J     P     RB    R
alternative        40%   20%    0%   16%   12%    0%    0%    8%    4%   52%
blues               4%   52%    0%    4%   24%    4%   28%    0%    4%    4%
classical           0%    0%   52%    0%    0%    0%   16%    0%    0%    0%
electronic          0%    0%    0%   12%    0%    0%    0%    0%    4%    0%
folk                0%    8%    4%    4%   48%    4%    8%    0%   12%    8%
hip-hop/rap         8%    0%    0%   16%    4%   48%    0%   12%   16%    4%
jazz                0%    8%   24%    0%    0%    0%   40%    4%    0%    0%
pop                20%    4%   16%   12%   12%    4%    4%   32%   28%    0%
R & B/Soul          8%    4%    4%   24%    0%   36%    4%   36%   32%   12%
rock               20%    4%    0%   12%    0%    4%    0%    8%    0%   20%
Table 3 Actual genre output by the system versus expected genre output for the
large dataset of all ten genre classes.
Since each of the three feature-decision chains is an independent genre sorter
in its own right, a breakdown of output versus expected output can be seen for
each of the three in Tables 4 - 6.
output\expected     A     B     C     E     F     H     J     P     RB    R
alternative        16%    8%    0%    4%    4%    0%    0%    4%    0%   16%
blues               4%   64%    0%    0%   20%    0%   28%    0%    8%    4%
classical           0%    0%   52%    0%    0%    0%   16%    0%    0%    0%
electronic          0%    0%    0%   20%    4%    4%    0%    0%    4%    4%
folk                0%    0%    4%    4%   40%    0%    8%    0%    4%    0%
hip-hop/rap        16%    0%    0%   16%    4%   44%    0%    0%   12%    4%
jazz                0%    8%   24%    0%    0%    0%   40%    0%    0%    0%
pop                20%    8%   16%   20%   28%    8%    4%   40%   40%   32%
R & B/Soul          8%    8%    4%   32%    0%   36%    4%   48%   32%   16%
rock               36%    4%    0%    4%    0%    8%    0%    8%    0%   24%
Table 4 Actual genre output by the spectral envelope based system versus expected
genre output for the large dataset of all ten genre classes.
output\expected     A     B     C     E     F     H     J     P     RB    R
alternative        56%   28%    0%   16%   12%    0%    0%   24%    8%   60%
blues               0%   24%    0%    4%   20%    4%    4%    0%    4%    0%
classical           0%    4%   44%    0%    8%    0%   16%    0%    0%    0%
electronic          8%    4%    0%    8%    0%    4%    0%    4%    0%    0%
folk                0%   20%    4%    8%   44%    4%   24%    0%    8%   12%
hip-hop/rap         0%    0%    0%    8%    0%   36%    0%   20%   28%    8%
jazz                4%    8%   44%    8%    8%    0%   48%    4%   16%    0%
pop                12%    8%    4%    8%    4%    4%    4%   12%    0%    4%
R & B/Soul          8%    4%    4%   28%    4%   40%    4%   24%   36%    4%
rock               12%    0%    0%   12%    0%    8%    0%   12%    0%   12%
Table 5 Actual genre output by the MFCC based system versus expected genre
output for the large dataset of all ten genre classes.
output\expected     A     B     C     E     F     H     J     P     RB    R
alternative        52%   32%   40%   52%   44%   20%   44%   40%   36%   56%
blues               4%   12%   16%    4%   20%    8%   12%   12%   12%   12%
classical           0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
electronic          0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
folk                0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
hip-hop/rap        44%   56%   36%   44%   32%   72%   40%   48%   52%   32%
jazz                0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
pop                 0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
R & B/Soul          0%    0%    8%    0%    4%    0%    4%    0%    0%    0%
rock                0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
Table 6 Actual genre output by the tempo based system versus expected genre
output for the large dataset of all ten genre classes.
5.2. Eight-Genre Trial
As can be seen in Table 3, the two lowest accuracy genre classes in the ten
genre class trial were electronic and rock, with accuracies of 12% and 20%
respectively. So, for the trial with eight genres, these two genre models and their
associated test files were removed from the data set and the test was run again.
The elimination of these two models helps the overall accuracy considerably,
increasing it to 51%. The full breakdown of system output versus expected output
appears in Table 7.
output\expected     A     B     C     F     H     J     P     RB
alternative        52%   20%    0%   12%    4%    0%   28%    4%
blues              12%   56%    0%   24%    4%   24%    0%    8%
classical           0%    0%   56%    0%    0%   16%    0%    0%
folk                0%    8%    0%   52%    4%   12%    0%   12%
hip-hop/rap         8%    0%    0%    4%   48%    0%   16%   16%
jazz                0%    8%   24%    0%    0%   40%    0%    4%
pop                20%    4%   16%    8%    8%    4%   28%   32%
R & B/Soul          8%    4%    4%    0%   32%    4%   28%   24%
Table 7 Actual genre output of the combined system versus expected genre output
for the second trial containing eight genre classes.
As with the ten-class trial, the overall system accuracy was slightly better than that of any single
feature-decision chain. The accuracy order of the three chains remained the
same, with the spectral envelope chain’s accuracy at 50.5%, the MFCC chain’s
accuracy at 48%, and the tempo-based chain’s accuracy at 17%. Even though the
system as a whole did show improvement, it is interesting to note that the two genres
that have the lowest accuracies in this trial, ‘Pop’ and ‘R & B/Soul’, both actually
decreased in accuracy from the ten-genre run. As with the first trial, a results table
for each feature-decision chain follows (Tables 8 - 10).
output\expected     A     B     C     F     H     J     P     RB
alternative        28%    8%    0%    4%    0%    0%   12%    4%
blues               4%   68%    0%   20%    0%   28%    0%    8%
classical           0%    0%   52%    0%    0%   16%    0%    0%
folk                0%    0%    4%   44%    0%    8%    0%    4%
hip-hop/rap        20%    0%    0%    4%   44%    0%    0%   12%
jazz                0%    8%   24%    0%    0%   40%    0%    0%
pop                40%    8%   16%   28%   16%    4%   40%   40%
R & B/Soul          8%    8%    4%    0%   40%    4%   48%   32%
Table 8 Actual genre output by the spectral envelope based system versus expected
genre output for the second trial containing eight genre classes.
output\expected     A     B     C     F     H     J     P     RB
alternative        56%   28%    0%   12%    8%    0%   28%    4%
blues               8%   24%    0%   16%    8%    4%    0%    8%
classical           0%    4%   56%    8%    0%   20%    0%    4%
folk                0%   24%    0%   52%    8%   24%    4%    8%
hip-hop/rap         4%    0%    0%    0%   40%    0%   24%    8%
jazz                4%   12%   40%    8%    0%   48%    0%   16%
pop                20%    4%    4%    4%    8%    4%   28%   16%
R & B/Soul          8%    4%    0%    0%   28%    0%   16%   36%
Table 9 Actual genre output by the MFCC based system versus expected genre
output for the second trial containing eight genre classes.
output\expected     A     B     C     F     H     J     P     RB
alternative        52%   32%   40%   44%   20%   44%   40%   36%
blues               4%   12%   16%   20%    8%   12%   12%   12%
classical           0%    0%    0%    0%    0%    0%    0%    0%
folk                0%    0%    0%    0%    0%    0%    0%    0%
hip-hop/rap        44%   56%   36%   32%   72%   40%   48%   52%
jazz                0%    0%    0%    0%    0%    0%    0%    0%
pop                 0%    0%    0%    0%    0%    0%    0%    0%
R & B/Soul          0%    0%    8%    4%    0%    4%    0%    0%
Table 10 Actual genre output by the tempo based system versus expected genre
output for the second trial containing eight genre classes.
5.3. Six-Genre Trial
This trial uses the six genre classes that scored the highest accuracy on the
prior tests. As mentioned above, from Table 7 it can be seen that the two least
accurate genre classes from that trial are ‘pop’ and ‘R & B/soul.’ As such, these
two genres are not included in the third trial in the series. Therefore, this trial
has 150 songs across the six remaining genre classes. The overall accuracy of this
trial increased substantially, with an overall accuracy of 77.3%. The genre-by-genre accuracy and error rates appear in Table 11.
output\expected    A      B      C      F      H      J
alternative        56%    8%     0%     8%     8%     0%
blues              4%     56%    4%     28%    0%     32%
classical          0%     0%     68%    0%     0%     16%
folk               4%     28%    8%     60%    12%    12%
hip-hop/rap        36%    4%     4%     4%     80%    0%
jazz               0%     4%     16%    0%     0%     40%
Table 11 Actual genre output of the combined system versus expected genre output for the third trial containing six genre classes.
Interestingly, in this third trial the spectral envelope chain by itself had a higher overall accuracy, 84%, than the overall system. This may be due to the smaller improvement seen in the MFCC chain, which correctly assigned genre to 67.3% of the test data set. The tempo-based chain was again last, scoring correctly only 22.7% of the time. A breakdown of each feature-decision chain's performance for this trial can be seen in Tables 12 – 14.
output\expected    A      B      C      F      H      J
alternative        52%    12%    0%     8%     12%    0%
blues              4%     72%    4%     28%    4%     32%
classical          0%     0%     68%    0%     0%     16%
folk               0%     8%     8%     60%    0%     12%
hip-hop/rap        44%    4%     4%     4%     84%    0%
jazz               0%     4%     16%    0%     0%     40%
Table 12 Actual genre output of the spectral envelope system versus expected genre output for the third trial containing six genre classes.
output\expected    A      B      C      F      H      J
alternative        52%    4%     0%     4%     8%     0%
blues              0%     24%    0%     12%    0%     4%
classical          0%     8%     44%    4%     0%     16%
folk               24%    48%    8%     72%    36%    32%
hip-hop/rap        20%    0%     0%     0%     56%    0%
jazz               4%     16%    48%    8%     0%     48%
Table 13 Actual genre output of the MFCC system versus expected genre output for the third trial containing six genre classes.
output\expected    A      B      C      F      H      J
alternative        52%    32%    40%    44%    20%    44%
blues              4%     12%    16%    24%    8%     12%
classical          0%     0%     0%     0%     0%     0%
folk               0%     0%     0%     0%     0%     0%
hip-hop/rap        44%    56%    44%    32%    72%    44%
jazz               0%     0%     0%     0%     0%     0%
Table 14 Actual genre output of the tempo-based system versus expected genre output for the third trial containing six genre classes.
5.4. The Two Four-Genre Trials (Original Dataset v. iTunes Dataset)
In an effort to characterize the differences between the original dataset used in Chapter 2 and the dataset gathered from the best-seller lists of the iTunes Music Store, the last trial uses the same four genres from the large data set that were used in the initial trials. Two of these genres, "rock" and "electronic," were eliminated between the first and second trials in the earlier series. The third and fourth genres, "jazz" and "classical," were in all three of the previous trials and had relatively high accuracies as well. The overall accuracy of the two runs was very different. This difference appeared, in varying amounts, across all of the feature-decision chains, as shown in Table 15.
feature\data      original    iTunes derived
Spectral Env.     82.93%      61.62%
MFCC              65.85%      55.56%
tempo             31.71%      27.27%
overall           75.61%      59.60%
Table 15 A side-by-side comparison of the accuracy rates of the two data sets.
6. Analysis and Conclusion
6.1. Overall Performance
On the whole the hybrid song-sorting system performed well, though with clear limitations. The most prevalent of these limitations (at least on the given test data) is genre overlap: the training examples of each genre are not sufficiently dissimilar from other genres' training material along the dimensions of the feature chains in the system. This is easiest to see in the BPM scatter plot (Figure 27), which shows the most pronounced overlap, rendering the tempo chain only marginally helpful in improving the accuracy of the overall system. As a direct result of this overlap effect, the usefulness of this automatic system is significantly higher when there are fewer classes in the data set, as can be seen in Figure 30.
When a genre is removed from the trial dataset, any overlap that genre contributed is also removed. The result is a nearly exponential improvement in accuracy as genres are removed from the trial.
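The "nearly exponential" character of that improvement can be checked numerically by fitting the error rate against the number of genre classes in the log domain. The sketch below is only illustrative; it assumes the overall accuracies of the trials have already been gathered into parallel lists, and it is not the fit used to produce Figure 30.

import numpy as np

def fit_error_trend(class_counts, accuracies):
    """Fit log(error rate) as a linear function of the number of genre classes.

    class_counts, accuracies: parallel sequences, e.g. the number of classes and
    the overall accuracy of each trial (accuracies strictly below 1.0).
    A good linear fit in the log domain is consistent with an exponential
    improvement in accuracy as classes are removed.
    Returns (slope, intercept) of log(1 - accuracy) ~ slope * n + intercept.
    """
    n = np.asarray(class_counts, dtype=float)
    err = 1.0 - np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(n, np.log(err), 1)
    return slope, intercept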
Figure 30 Accuracy versus number of genre classes, taken from the iTunes derived
dataset, with points for each of the individual feature-decision chains and a best fit
line formed from the overall results.
Taking this into account, it is also interesting to look beyond the accuracy rates of the various trials and examine where the errors were. In these errors, especially those in the large initial trial (seen in Tables 3 – 6), some telling patterns emerge. There is large variation in the topology of the genre classes used to describe the set of 400 songs. Looking at the "hip-hop/rap" genre's distribution, 48% of its test songs were classified correctly, but a full 36% were thought to be of the genre "R & B/soul." Conversely, "R & B/soul," though a bit less accurate overall, confirms this overlap of definition: its accuracy rate is 32%, yet 16% of its test songs were incorrectly categorized as "hip-hop/rap." This relationship is further exposed by the observable leap in accuracy for the "hip-hop/rap" genre when the "R & B/soul" genre is eliminated from the data set between the eight-class and six-class trials. Similar patterns can be seen, to varying degrees, amongst many of the other genre classes. There appears to be a triangular overlap of definition between "alternative," "blues" and "folk" that continued throughout all three trials. All of these overlaps are of course dependent on the feature vectors extracted. The overlap is observable from the perspective of the features used in this system, but there may exist features that would eliminate one or more of these boundary-definition problems (e.g., melodic structure feature vectors).
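Overlaps of this kind can be located mechanically by scanning a confusion table for its largest off-diagonal entries. A minimal Python sketch, reusing the column-normalized table structure sketched at the end of Section 5.2, might look as follows; applied to the ten-genre results it would surface pairs such as the "hip-hop/rap" and "R & B/soul" overlap discussed above.

def largest_confusions(table, top_n=5):
    """Return the top_n off-diagonal (output, expected, percent) entries,
    i.e. the genre pairs most often mistaken for one another."""
    errors = [(out_g, exp_g, pct)
              for (out_g, exp_g), pct in table.items()
              if out_g != exp_g and pct > 0]
    return sorted(errors, key=lambda e: e[2], reverse=True)[:top_n]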
Another question that emerges from these trials is the effect of training-song selection. One of the more notable differences between the smaller first data set used in the earlier trials and the larger one of ten genre classes is the means of selecting training songs. In the first data set the training songs were manually chosen to best represent their genre in the training model. In contrast, for the larger data set the training songs were selected at random from each genre's 40 available songs. This was done in an attempt to improve the objectivity of the testing set-up; however, it may have had a substantial negative impact on the accuracy of the results.
6.2. The Intelligently Chosen Set
In the last trial conducted with the models based on songs labeled by the iTunes Music Store, a four-genre test is directly compared to a trial using the initial set of songs from the early tests. Unlike the iTunes-sourced data, the data set initially used was deliberately chosen to create statistically separated genres, and the comparative results are telling (Table 15). There was substantial improvement seen by each of the feature chains, regardless of the relative performance seen in the prior trials. This leads to two related conclusions. First, the data set extracted from the iTunes Music Store is clearly far from the ideal topology described in Chapter 1. Second, this methodology of sorting songs is only as good as the manually chosen genre topology of the initial training example songs. Any overlap or holes present at that stage will reduce the effectiveness of the system.
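Whether a candidate training set actually forms statistically separated genres can be estimated before any models are trained. One crude but serviceable measure is the ratio of between-class to within-class scatter of the extracted feature vectors; the following sketch assumes the features for each genre have already been gathered into arrays and is offered purely as an illustration of the idea, not as part of the system described here.

import numpy as np

def separation_ratio(features_by_genre):
    """Rough between-class / within-class scatter ratio.

    features_by_genre: dict mapping genre name -> (n_songs, n_features) array.
    Larger values suggest a more cleanly separated genre topology.
    """
    all_feats = np.vstack(list(features_by_genre.values()))
    grand_mean = all_feats.mean(axis=0)
    within, between = 0.0, 0.0
    for feats in features_by_genre.values():
        mean = feats.mean(axis=0)
        within += ((feats - mean) ** 2).sum()
        between += len(feats) * ((mean - grand_mean) ** 2).sum()
    return between / within if within else float("inf")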
6.3. What Can This Tell Us About Music? (Or at Least About Popular Manual Sorting Methodology)
From the trials run on the data set from the iTunes Music Store, and specifically the trial just discussed, it seems clear that, at least through the eyes of this sorting system, the genre topology used is far from ideal (as defined in Chapter 1). It seems safe to say that this contributed at least somewhat to the high degree of error seen in the trials (especially the full test of all ten genres). That said, it is difficult to tell exactly how well this automatic sorting system could perform on a large number of genre classes if those classes were closer to ideal in their topology. Given the comparative last trial run on the two four-genre datasets, it can be inferred that an improvement on the order of 10% to 15% would be reasonable to expect in a ten-genre test using data that was more intelligently split, with larger gains possible.
6.4. Future Work
There are a number of possible avenues of further study that could continue where this research ends. Without changing the system, a worthwhile investigation would be to use a variety of song collections arranged in many genre topologies. One of particular interest would be a dataset with genres assigned to songs by a surveyed group of listeners. The automated genre assignment process could then be evaluated against a group of people who have no commercial interest in the genre assignment (unlike the genre assignment of the iTunes Music Store, in which genre decisions have clear commercial effects).
There is also potential in improving the structure of the system itself. The most immediate of these possible changes is to supplement the existing three-chain system with additional feature-decision chains based on further (preferably highly dissimilar to those currently in the system) feature vectors. Of particular interest are "musically aware" features, as features that provide a deeper description of musical structure have a greater potential to separate what might otherwise be an overlapping topology. This could improve the accuracy of the system a great deal, though with a clear cost in computational time. Along these lines, there is clearly room for some improvement in the performance of the tempo-based feature-decision chain. Perhaps some form of clustering could be used to increase the independence of each genre prior to taking the distance, as sketched below, although based on the distribution seen in the data set used for the trials in this document, there may not be much to gain through this course of action.
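A minimal sketch of that clustering idea, assuming the training tempi for each genre have already been extracted into arrays and using a toy one-dimensional k-means rather than any particular library routine, follows. A test song's tempo could then be scored against its nearest cluster center instead of a single per-genre mean.

import numpy as np

def genre_tempo_centers(bpm_by_genre, clusters_per_genre=2, iterations=20):
    """Toy k-means over each genre's beats-per-minute values.

    bpm_by_genre: dict mapping genre -> 1-D array of training-song tempi.
    Returns a dict of cluster centers per genre.
    """
    centers = {}
    for genre, bpm in bpm_by_genre.items():
        bpm = np.asarray(bpm, dtype=float)
        # start the centers on evenly spaced quantiles of the genre's tempi
        c = np.quantile(bpm, np.linspace(0.1, 0.9, clusters_per_genre))
        for _ in range(iterations):
            # assign each song to its nearest center, then recompute the centers
            assignment = np.argmin(np.abs(bpm[:, None] - c[None, :]), axis=1)
            for k in range(clusters_per_genre):
                if np.any(assignment == k):
                    c[k] = bpm[assignment == k].mean()
        centers[genre] = c
    return centers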
Lastly, there is great potential in using the approach of the genre-sorting system described above as a means of indexing a song's similarity to other songs along the dimensions of the feature-decision chains used in such a system. A quick way to achieve this would be to have every song be a "genre" by itself. Each song would then be tested against each of these one-song genres. The genre that a test song was placed in would in fact be the most similar song (along the dimensions of the features in the system) to the test song. A modified system like this could offer a far more flexible and adaptable classification solution for a landscape of constantly changing music and culture.
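In outline, such a similarity index reduces to a nearest-model search. The sketch below assumes a hypothetical score(model, song) function standing in for whichever feature-decision chain is used (for example, an HMM log-likelihood); neither the function nor the per-song models exist in the present system, and they are named here only for illustration.

def most_similar_song(test_song, song_models, score):
    """Treat every catalogue song as its own one-song 'genre'.

    song_models: dict mapping song id -> a model trained on that song alone.
    score(model, song): hypothetical scoring function (higher is better).
    Returns the id of the catalogue song whose model the test song matches
    best, i.e. the most similar song along the chain's feature dimensions.
    """
    return max(song_models, key=lambda sid: score(song_models[sid], test_song))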
Bibliography
[1] J. Martínez, "MPEG-7 Overview (version 10)," http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm, Oct. 2005.
[2] H. Kim, E. Berdahl, and T. Sikora, "Study of MPEG-7 Sound Classification and Retrieval," http://ccrma.stanford.edu/~eberdahl/Papers/ITG.pdf.
[3] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, June 2001, pp. 737-747.
[4] Z. Xiong, R. Radhakrishnan, A. Divakaran, and T. S. Huang, "Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
[5] K. Kosina, "Music Genre Recognition," University of Hagenberg, June 2002.
[6] M. Casey, "All-XM.zip," ISO/IEC 15938-4:2001 Audio Reference Software (ISO/IEC 15938-4:AMD1 2003), April 2001, 2003.
[7] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-286.
[8] G. Tzanetakis and P. Cook, "Musical Genre Classification of Audio Signals," IEEE Trans. on Speech and Audio Processing, Vol. 10, No. 5, July 2002.
[9] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, Vol. 61, pp. 268-278, Mar. 1973.
[10] C. Beghtol, "The Concept of Genre and Its Characteristics," Bulletin of the American Society for Information Science and Technology, Vol. 27, No. 2, Dec./Jan. 2001.
[11] D. Chandler, "An Introduction to Genre Theory," http://www.aber.ac.uk/media/Documents/intgenre/intgenre.html, 1997 [accessed 12 January 2006].
[12] Apple Computer, Oxford American Dictionary, from the program "Dictionary," Version 1.0.1, 2005.
[13] R. Altman, "A Semantic/Syntactic Approach to Film Genre," Cinema Journal, Vol. 23, No. 3, Spring 1984, pp. 6-18.
[14] M. Casey, "Generalized sound classification and similarity in MPEG-7," Organized Sound, Vol. 6, No. 2, 2002.
[15] E. Scheirer, "Tempo and Beat Analysis of Acoustic Musical Signals," J. Acoust. Soc. Am., Vol. 103, No. 1, Jan. 1998, pp. 588-601.
[16] L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall Signal Processing Series, Englewood Cliffs, New Jersey, 1993.
[17] J. C. Brown, "Computer Identification of Musical Instruments Using Pattern Recognition with Cepstral Coefficients," J. Acoust. Soc. Am., Vol. 105, No. 3, Mar. 1999, pp. 1933-1941.
[18] M. J. Carey, E. Parris, and H. Lloyd-Thomas, "A Comparison of Features for Speech, Music Discrimination," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Phoenix, AZ), Mar. 1999.
[19] B. Logan, "Mel-Frequency Cepstral Coefficients for Music Modeling," Int. Symposium on Music Information Retrieval, 2000.
[20] K. Kosina, "Music Genre Recognition," Hagenberg, June 2002.
[21] U. Zölzer (ed.), DAFX: Digital Audio Effects, Wiley, West Sussex, England, 2002.
[22] M. Cooper and J. Foote, "Automatic music summarization via similarity analysis," Proc. of ISMIR 2002, pp. 81-85, 2002.
[23] S. Davis and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 28, pp. 357-366, Aug. 1980.
[24] M. H. DeGroot and M. J. Schervish, Probability and Statistics, 3rd edition, Addison-Wesley, 2002.
[25] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd edition, Springer-Verlag, New York, 2000.
[26] R. D. Redman, Computer Speech Technology, Artech House Inc., Norwood, MA, 1999.
Appendix Full Trial Song Listings with Genre Assignments
Note: Each of these songs appeared in the Top 40 highest-selling songs for its genre on March 13, 2006.
A.1 Alternative Rock
--Dirty Little Secret
The All-American Rejects
--Swing, Swing
The All-American Rejects
--I Bet You Look Good On the Dancefloor
Arctic Monkeys
--Superman
Five for Fighting
--Stacy's Mom
Fountains of Wayne
--Feel Good Inc. (Album Crossfade)
Gorillaz
--American Idiot
Green Day
--Good Riddance (Time of Your Life)
Green Day
--The Middle
Jimmy Eat World
--The Only Difference Between Martyrdom and Suicide Is Press Coverage
Panic! At The Disco
--Lying Is the Most Fun a Girl Can Have Without Taking Her Clothes Off
Panic! At The Disco
--Somebody Told Me
The Killers
--Wonderwall
Oasis
--Remedy
Seether
--Forever Young
Youth Group
--Move Along
The All-American Rejects
--Talk
Coldplay
--Yellow
Coldplay
--Soul Meets Body
Death Cab For Cutie
--Dance, Dance
Fall Out Boy
--Sugar, We're Goin Down
Fall Out Boy
--Take Me Out
Franz Ferdinand
--Dare
Gorillaz
--Boulevard of Broken Dreams
Green Day
--Wake Me Up When September Ends
Green Day
--Holiday (Faded Ending)
Green Day
--Wings of a Butterfly
H.I.M. (His Infernal Majesty)
--The Reason
Hoobastank
--Mr. Brightside
The Killers
--King Without a Crown
Matisyahu
--Youth
Matisyahu
--Time of Your Song
Matisyahu
--I Write Sins Not Tragedies
Panic! At The Disco
--Who I Am Hates Who I've Been
Relient K
--Tear You Apart
She Wants Revenge
--Perfect Situation
--Beverly Hills
--Seven Nation Army
--Gold Lion
--Ocean Avenue
Weezer
Weezer
The White Stripes
Yeah Yeah Yeahs
Yellowcard
A.2 Blues
--Soul Man
--Georgia On My Mind
--Sweet Home Chicago
--Smoking Gun
--I'd Rather Go Blind
--Red Light
--I'd Love to Change the World
--I Can't Make You Love Me
--Lets Give Them Something
--Bad to the Bone
--Pride and Joy
--The Sky Is Crying
--Tightrope
--Texas Flood
--One Bourbon, One Scotch, One Beer
--Ain't No Sunshine When She's Gone
--Hey Bartender (Live)
--Rubber Biscuit
--Tuff Enuff
--Moondance
--When You're Walking Away
--Damn Right, I've Got the Blues
--Ain't No Sunshine
--I've Got Dreams to Remember
--Boom Boom
--At Last (Single)
--Born Under a Bad Sign
--The Thrill Is Gone (1969 Single Version)
--Riding with the King
--Lie to Me
--Misty Blue
--Mannish Boy
--Something to Talk About
--Special Lady
--Honky Tonk Women
The Blues Brothers
Ray Charles
Eric Clapton
Robert Cray
Etta James
Jonny Lang
Ten Years After
Bonnie Raitt
Bonnie Raitt
George Thorogood & The
Destroyers
Stevie Ray Vaughan
Stevie Ray Vaughan
Stevie Ray Vaughan
Stevie Ray Vaughan & Double
Trouble
George Thorogood & The
Destroyers
Bobby "Blue" Bland
The Blues Brothers
The Blues Brothers
The Fabulous Thunderbirds
Georgie Fame Feat. Van Morrison
& Jon Hendricks
Jackie Greene
Buddy Guy
Buddy Guy & Tracy Chapman
Buddy Guy & John Mayer
John Lee Hooker
Etta James
Albert King
B.B. King
B.B. King & Eric Clapton
Jonny Lang
Dorothy Moore
Muddy Waters
Bonnie Raitt
Ray, Goodman & Brown
Taj Mahal
--It Hurt So Bad
--Cocaine Blues
--Who Do You Love?
--Crossfire
--Couldn't Stand the Weather
--Superstition
--The House Is Rockin'
Susan Tedeschi
George Thorogood & The
Destroyers
George Thorogood & The
Destroyers
Stevie Ray Vaughan
Stevie Ray Vaughan
Stevie Ray Vaughan & Double
Trouble
Stevie Ray Vaughan & Double
Trouble
A.3 Classical
Sayuri's Theme
Time to Say Goodbye
John Williams & Yo-Yo Ma
Andrea Bocelli & Sarah
Brightman
Danny Boy
James Galway & The Chieftains
Con te partiro
Andrea Bocelli
The Prayer
Andrea Bocelli & Céline Dion
Because We Believe
Andrea Bocelli
Unaccompanied Cello Suite No. 1 in G Major, BWV 1007: I. Prélude 2:21
Yo-Yo Ma
(Bach)
Schindler's List: Theme
John Williams
A Dream Discarded (Live Version)
John Williams & Yo-Yo Ma
Going To School (Live Version)
John Williams & Yo-Yo Ma
Somos novios (It's Impossible)
Andrea Bocelli & Christina
Aquilera
Symphony No.5 in C Minor: I. Allegro con brio Orchestre Révolutionnaire et
Romantique & John Eliot
Gardiner
(Beethoven)
Canon in D
English Chamber Orchestra &
Raymond Leppard (Pachelbel)
Fanfare for the Common Man
Aaron Copland & London
Symphony Orchestra
Turandot, Act III, Nessun dorma!
Luciano Pavarotti
Amazing Grace
The Canadian Scottish Regiment
Pipes and Drums & The Third
Marine Aircraft Wing Band
Rhapsody in Blue
Columbia Symphony Orchestra &
Leonard Bernstein (Gershwin)
"Eine kleine Nachtmusik", Serenade in G Major, K. 525: I. Allegro
Academy of St. Martin in the
Fields & Sir Neville Marriner
(Mozart)
Piano Sonata No. 14 in C Sharp Minor, Op. 27, No. 2, "Moonlight": I. Adagio
sostenuto
Alfred Brendel
Besame mucho
Andrea Bocelli
Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace
Hilary Hahn, Jeffrey Kahane, Los
Angeles Chamber Orchestra &
Margaret Batjer
(Bach)
"Für Elise" - Bagatelle in A Minor, WoO 59
Alfred Brendel
(Beethoven)
Theme from Rocky
Cincinnati Pops Orchestra &
Erich Kunzel
Can't Help Falling In Love
Andrea Bocelli
Symphony No. 5 In C Minor, Op. 67: Allegro Con Brio
Zagreb Philharmonic - Csr
Symphony Orchestra (Bratislava)
- Richard Edlinger – Michael
Halasz (Beethoven)
Carmina Burana: O Fortuna
Boston Symphony Orchestra &
Seiji Ozawa
Olympic Theme
Frederick Fennell & The
Cleveland Symphonic Winds
Pachebel: Canon In D
A Brides Guide To Wedding Music
Gabriel's Oboe
Yo-Yo Ma, Roma Sinfonietta &
Ennio Morricone
Ride of the Valkyries from The Ring
Richard Wagner
Bagatelle In A Minor, Wo0 59 'Fur Elise'
Beethoven
The Prayer
Andrea Bocelli & Céline Dion
Turandot: 'Nessun Dorma!"
John Alldis Choir, London
Philharmonic Orchestra, Luciano
Pavarotti, Wandsworth School
Boys Choir & Zubin Mehta
Adagio for Strings (arranged from the String Quartet, Op. 11)
Leonard Bernstein & New York
Philharmonic
Unaccompanied Cello Suite No. 1 in G Major, BWV 1007: II. Allemande
Yo-Yo Ma
(Bach)
Star Wars (Main Title)
John Williams
Canon and Gigue in D Major: I. Canon
English Concert & Trevor
Pinnock
(Pachelbel)
Air on a G String from Orchestral Suite No. 3 J.S. Bach
O Fortuna from Carmina Burana
Carl Orff
Molto allegro from Symphony No. 40, K. 550 Wolfgang Amadeus Mozart
A.4 Electronic
--Starry Eyed Surprise
--Hide and Seek
--Axel F (Radio Mix)
--One More Time
--24
--Technologic
--The Rockafeller Skank
--Galvanize
--They
--Ready, Steady, Go
--Flying High
--Teardrop
--South Side
--Smack My B***h Up
--In the Waiting Line
--Porcelain
--Random
--Breathe
--Take Me Away (Into the Night)
--Pump Up the Jam
--Blue (Da Ba Dee)
--Goodnight and Go
--Number 1
--Dreams (Featuring Stevie Nicks)
--Breathe
--Just a Ride
--Firestarter
--Flip Ya Lid
--Your Woman
--Keep Hope Alive
--Return To Innocence
--Come On Closer
--Block Rockin' Beats
--Sour Times
--It Feels So Good
Paul Oakenfold
Imogen Heap
Crazy Frog
Daft Punk
Jem
Daft Punk
Fatboy Slim
The Chemical Brothers
Jem
Paul Oakenfold
Jem
Massive Attack
Moby & Gwen Stefani
Prodigy
Zero
Moby
Lady Sovereign
Telepopmusik
4 Strings
Technotronic
Eiffel 65
Imogen Heap
Goldfrapp
Deep Dish
Prodigy
Jem
Prodigy
Nightmares on Wax
White Town
The Crystal Method
Enigma
Jem
The Chemical Brothers
Portishead
Sonique
--Love Generation (Featuring Gary Pine)
--Extreme Ways
--Silence (DJ Tiësto's In Search of Sunrise Edit)
Bob Sinclar
Moby
Delerium & Sarah
McLachlan
Thievery Corporation
Afrika Bambaataa
--Lebanese Blonde
--Don't Stop...Planet Rock
A.5 Folk
Training:
--The Blower's Daughter
--Travelin' Thru
--World Spins Madly On
--Closer to Fine
--Closer
Damien Rice
Dolly Parton
The Weepies
Indigo Girls
Joshua Radin
--Cannonball
--Cat's In the Cradle
--Finnegan's Wake
--Whisky You're the Devil
Damien Rice
Harry Chapin
Clancy Brothers
Clancy Brothers
--When It Don't Come Easy
--Colors
--The Trapeze Swinger
--California Stars
--Bad, Bad Leroy Brown
Patty Griffin
Amos Lee
Iron & Wine
Billy Bragg and Wilco
Jim Croce
--Summer In the City
Wagon Wheel
The Lovin' Spoonful
The Old Crow Medicine
Show
Jim Croce
Gordon Lightfoot
Indigo Girls
Operator (That's Not the Way It Feels)
--If You Could Read My Mind (Album Version)
--Galileo
--Unplayed Piano (Chris Lord-Alge Mix)
Gotta Have You
Ramblin' Irishman
--Year of the Cat
The Humours of Whiskey
Damien Rice & Lisa
Hannigan
The Weepies
Andy M. Stewart
Al Stewart
Andy M. Stewart &
Mannus Lunny
Everything'll Be Alright (Will's Lullaby)
When the Stars Go Blue
--Heartbeats
The Humours of the King of Ballyhooley
--Delicate
Joshua Radin
The Corrs
José González
Patrick Street
Damien Rice
--Puff, the Magic Dragon
--Sunny Road
--Wedding Song (There Is Love)
--Keep It Loose, Keep It Tight
Don't Think Twice, It's All Right
Peter, Paul And Mary
Emiliana Torrini
Peter, Paul And Mary
Amos Lee
Bob Dylan
--If You Could Read My Mind
--Leaving On a Jet Plane
--Volcano
--Wreck of the Edmund Fitzgerald (LP Version)
--Pink Moon
Gordon Lightfoot
Peter, Paul And Mary
Damien Rice
Gordon Lightfoot
Nick Drake
A.6 Hip-Hop/Rap
--I'm N Luv (Wit a Stripper)
--Shake That
--Pump It
--Ms. New Booty
--My Humps
--Lean Wit It, Rock Wit It
--Grillz
--Ridin'
--Fresh AZIMIZ
--Gold Digger
--Touch the Sky
--Laffy Taffy
--Poppin' My Collar
--Soul Survivor
--Tell Me When To Go
--Best Friend
--Turn It Up
--Jesus Walks
--Touch It (Remix)
--If It's Lovin' That You Want
--Let's Get It Started (Spike Mix)
--Pon de Replay
--Stay Fly
--Baby Got Back
--My Hood
--Rodeo
--In da Club
T-Pain & Mike Jones
Eminem
Black Eyed Peas
Bubba Sparxxx Featuring
Ying Yang Twins
Black Eyed Peas
Dem Franchize Boyz
Featuring Peenut &
Charlay
Nelly featuring Paul Wall,
Ali & Gipp
Chamillionaire & Krayzie
Bone
Bow Wow, J-Kwon &
Jermaine Dupri
Kanye West
Kanye West
D4L
Three 6 Mafia
Young Jeezy & Akon
E-40
Olivia & 50 Cent
Chamillionaire
Kanye West
Busta Rhymes
Rihanna
Black Eyed Peas
Rihanna
Three 6 Mafia featuring
Young Buck & Eightball
& M.J.G.
Sir Mix-a-Lot
Young Jeezy
Juvenile
50 Cent
--Switch
--There It Go (The Whistle Song)
--Oh I Think Dey Like Me
--I'm Sprung
--Soul Survivor
--Where Is the Love?
--When I'm Gone
--Pump It
--One Wish (Radio Edit)
--Rompe (Remix)
--Oh Yes
--Girl
--Fireman (Main)
Will Smith
Juelz Santana
Dem Franchize Boyz
T-Pain
Young Jeezy & Akon
Black Eyed Peas & Justin
Timberlake
Eminem
Black Eyed Peas
Ray J
Daddy Yankee Featuring
Lloyd Banks and Young
Buck
Juelz Santana
Paul Wall
Lil' Wayne
A.7 Jazz
What a Wonderful World (Single)
Take Five
Do You Know What It Means to Miss New Orleans
So What
Sparks
What Are You Doing the Rest of Your Life?
Moody's Mood for Love (I'm In the Mood for Love)
My One and Only Love
My Funny Valentine
The Look of Love
What You Won't Do for Love (Original)
In the Mood
All at Sea
The Secret Garden (Sweet Seduction Suite)
Blue in Green
Sing, Sing, Sing
Louis Armstrong
The Dave Brubeck
Quartet
Take 6 Featuring Aaron
Neville
Miles Davis
Wynton Marsalis
Chris Botti featuring
Sting
Brian McKnight, James
Moody, Quincy Jones,
Rachelle Ferrell & Take 6
John Coltrane & Johnny
Hartman
Miles Davis & Miles Davis
Quintet
Diana Krall
Bobby Caldwell
Glenn Miller
Jamie Cullum
Al B. Sure!, Barry White,
El DeBarge, James
Ingram & Quincy Jones
Miles Davis
Benny Goodman and His
Orchestra
Good Morning Heartache
Get a Clue
Feeling Good
The Way You Look Tonight
Sweet Home Alabama
In a Sentimental Mood
The Girl from Ipanema
Dance Me to the End of Love
Careless Love
J'Ai Deux Amours
My Baby Just Cares for Me
I'll Be Seeing You (1944 Single)
I Think It's Going to Rain Today
Flamenco Sketches
Concierto de Aranjuez
Get Your Way
Give Me the Night
High & Dry (US Version)
Chris Botti & Jill Scott
Simon & Milo
Nina Simone
Tony Bennett
Lynyrd Skynyrd
John Coltrane
Astrud Gilberto, João
Gilberto & Stan Getz
Madeleine Peyroux
Madeleine Peyroux
Madeleine Peyroux
Nina Simone
Billie Holiday
Norah Jones
Miles Davis
Jim Hall
Jamie Cullum
George Benson
Jamie Cullum
A.8 Pop
Unwritten
Beep
Walk Away
For You I Will (Confidence)
Stupid Girls
Rush
Jesus, Take the Wheel
L.O.V.E.
What's Left of Me (Main Version)
Stickwitu
Don't Cha
Crash
Since U Been Gone
Because of You
Breathe (2AM)
Hollaback Girl
These Words
Collide
Behind These Hazel Eyes
Black Horse and the Cherry Tree (Radio Version)
Ms. New Booty (Edited Radio Shorter Version)
Natasha Bedingfield
The Pussycat Dolls
Kelly Clarkson
Teddy Geiger
P!nk
Aly & AJ
Carrie Underwood
Ashlee Simpson
Nick Lachey
The Pussycat Dolls
The Pussycat Dolls
Gwen Stefani
Kelly Clarkson
Kelly Clarkson
Anna Nalick
Gwen Stefani
Natasha Bedingfield
Howie Day
Kelly Clarkson
KT Tunstall
Bubba Sparxxx & Ying
Yang Twins
Breakaway
Hung Up
Sorry
Boyfriend
Rich Girl
Barbie Girl (Radio)
La Tortura
American Pie
Just the Girl
Miss Independent
Dirrty (Featuring Redman)
I'll Be
We Belong Together
4ever
Beautiful
Beautiful Soul
Don't Forget About Us
Cool
Don't Bother
Get the Party Started
Inside Your Heaven
Toxic
My Happy Ending
Kelly Clarkson
Madonna
Madonna
Ashlee Simpson
Gwen Stefani & Eve
Aqua
Shakira & Alejandro Sanz
Don McLean
The Click Five
Kelly Clarkson
Christina Aguilera
featuring Redman
Edwin McCain
Mariah Carey
The Veronicas
Christina Aguilera
Jesse McCartney
Mariah Carey
Gwen Stefani
Shakira
P!nk
Carrie Underwood
Britney Spears
Avril Lavigne
A.9 R & B/Soul
So Sick
Be Without You (Kendu Mix)
Check On It
Yo (Excuse Me Miss)
Yeah!
Temperature
UnpredicTable (Main)
Run It! (Featuring Juelz Santana)
Love
When You're Mad
Crazy in Love
One, Two Step
Milkshake
If I Ain't Got You
Killing Me Softly with His Song
Ne-Yo In My Own Words
Mary J. Blige
Beyoncé & Slim Thug
Chris Brown
Usher featuring Lil' Jon
& Ludacris
Sean Paul
Jamie Foxx & Ludacris
Chris Brown
Keyshia Cole
Ne-Yo
Beyoncé
Ciara featuring Missy
Elliot
Kelis
Alicia Keys
Fugees
Black Sweat
Gimme That
Ordinary People
My Boo (Bonus Track)
Naughty Girl
4 Minutes
Stay
Caught Up
Play That Funky Music
Lose My Breath
Fallin'
Goodies
Leave (Get Out)
Baby Boy
Family Affair
All My Life
One
Back at One
Oh
Dime Piece
Conceited (There's Something About Remy)
Survivor
Wanna Love You Girl
September
Superstition
As
By Your Side
Dance With My Father
Prince
Chris Brown
John Legend
Usher & Alicia Keys
Beyoncé
Avant
Ne-Yo & Peedi Peedi
Usher
Wild Cherry
Destiny's Child
Alicia Keys
Ciara featuring Petey
Pablo
JoJo
Beyoncé & Sean Paul
Mary J. Blige
K-Ci & JoJo
Mary J. Blige & U2
Brian McKnight
Ciara featuring Ludacris
Nick Cannon featuring
Izzy
Remy Ma
Destiny's Child
Robin Thicke & Pharrell
Williams
Earth, Wind & Fire
Stevie Wonder
Stevie Wonder
Sade
Luther Vandross
Goodies
A.10 Rock
--Bad Day
--You're Beautiful
--Always On Your Side
--Upside Down
--Goodbye My Lover
Daniel Powter
James Blunt
Sheryl Crow & Sting
Jack Johnson
James Blunt
--Over My Head (Cable Car)
The Fray
--Photograph
Nickelback
--Girl Next Door
Saving Jane
--Lights and Sounds
Yellowcard
--Savin' Me
Nickelback
--Ever the Same
Rob Thomas
--Who Says You Can't Go Home (Featuring Jennifer Nettles)
Bon Jovi & Jennifer
Nettles
--The Real Thing
Bo Bice
--Better Days
Goo Goo Dolls
--100 Years
Five for Fighting
--You and Me
Lifehouse
--Drops of Jupiter
Train
--Hemorrhage (In My Hands)
Fuel
--Crazy B***h
Buckcherry
--California
Phantom Planet
--Just Feel Better (Featuring Steven Tyler)
Santana & Steven Tyler
--Animals
Nickelback
--Iris
Goo Goo Dolls
--Right Here
Staind
--Brown Eyed Girl
Van Morrison
--She Will Be Loved
Maroon 5
--Bat Country
Avenged Sevenfold
--Someday
Nickelback
--Wasteland
10 Years
--How You Remind Me
Nickelback
--Sitting, Waiting, Wishing
Jack Johnson
--Here Without You
3 Doors Down
--Bom Bom Bom
Living Things
--Bohemian Rhapsody
Queen
--High
James Blunt
--Have a Nice Day
Bon Jovi
--Hotel California
Eagles
--Lonely No More
Rob Thomas
--This Love
Maroon 5
--Far Away
Nickelback