Automatic chord recognition with known algorithm
Esben Paul Bugge
University of Copenhagen, Department of Computer Science
January 22nd 2010

Contents
1 Introduction
2 Related work
3 The COCHONUT algorithm
4 MusicXML
5 System design
6 Implementation
7 Tests
8 Conclusions and future work
A Supported chord-types

Abstract

Chord recognition is the task of finding the chords that are used in a musical score and where they are used. Chord recognition has several applications, including analysis and classification of music. Several algorithms exist for recognizing chords from both symbolic and acoustic music data. In this paper I implement and test a system based on the COCHONUT algorithm, used for chord recognition on symbolic data - in this case the symbolic data is MusicXML. COCHONUT was originally made for chord recognition on jazz music, but in this project I test it on classical music, to examine if the algorithm is suitable for music of this genre. The tests are made on 5 classical scores and the results show that the COCHONUT algorithm is somewhat useful on classical music. Furthermore, I present some ideas on how to improve the algorithm.

Preface

This report is written as a candidate project in Computer Science at the University of Copenhagen. The reader of the report should have a basic understanding of music theory, although some of the terms used in the report will be briefly explained. Furthermore, the reader should have a good understanding of software programming, in particular the Python programming language, which is used to implement the system. It is also recommended that the reader has some basic knowledge of XML documents and how they are structured. The report has the following sections: Section 1 contains a short introduction to the subject of chord recognition as well as an introduction to what is examined in this project.
Section 2 contains a short study of what others have tried in the field of automatic chord recognition. In Section 3 the COCHONUT algorithm is described in detail - the implemented system will be based on this algorithm. Section 4 contains a description of the MusicXML format, as it is music in this format that the system will be able to work on. Section 5 describes the design of the system, while Section 6 contains a description of the system's technical implementation. In Section 7, some tests are set up and carried out, and this is followed by a discussion of the results. Finally, Section 8 contains the conclusions of the work done plus a discussion of what could be done to enhance the system in the future. The project files (source code and test data) have been made available as a Google Code project at http://code.google.com/p/cochonut/. As a final note, I would like to send a special thanks to Louise Birkkjær for reading and commenting on the report.

1 Introduction

A musical chord is a set of musical notes played simultaneously, and chord recognition is the task of determining the chords in a musical score: at any given point in the score we would like to know what chord is being played and when it changes to another chord. Chord recognition can be carried out on either acoustic data (audio recordings) or symbolic data (sheet music or some kind of representation of this). In this project, I deal with automatic chord recognition on symbolic data. The process is made automatic by creating a computer program that will be able to read symbolic music data and output the chords found in these data. Automatic chord recognition has some useful applications such as classification of music: knowing the chords of a musical score, we can classify the score along with other scores that use similar chord sequences. For the purpose of introduction, let us take a look at a simple example of chord recognition.
Figure 1 displays the notes of the first five measures of the popular tune, Happy Birthday To You. In this set of notes, the chords are not labeled; they have not been recognized yet.

Figure 1: The first five measures of Happy Birthday To You.

In Figure 2 the chords of the notes from Figure 1 have been labeled. The notes in the first measure do not represent a chord. The notes of the second measure represent a D-minor7, the notes of the third represent a C7 and so on.

Figure 2: The first five measures of Happy Birthday To You - labeled with the chords Dm7, C7, C9 and F.

The system developed in this project should, in a similar way, be able to output a description of what chords are used in the score and where each chord is used. In the example of Happy Birthday To You, the notes of each chord are not all played simultaneously - we say that the chords are broken: the notes still represent a chord, but they are not all played (or sound) simultaneously. The system in this project should also be able to recognize chords that are broken. As mentioned, the example of recognizing chords in Happy Birthday To You is a very simple one. A more complex example is displayed in Figure 3. These notes are from Bach's Well-Tempered Clavier, Vol. I. This section of the score uses dissonances, that is, tones that are unstable in the harmony of the score. The presence of dissonance means that harmonic rules will be harder to apply in order to find the chords of the score.

Figure 3: A section of Bach's Well-Tempered Clavier, Vol. I.

Figure 4 displays a short section of Chopin's Minute Waltz. This section includes two grace notes: notes that are used to decorate the music and are not directly part of the melody or harmony. The grace notes in Figure 4 are smaller in print size than the rest of the notes, but grace notes can also be included in the score as ordinary notes.
The presence of grace notes is considered to increase the difficulty of chord recognition, as the grace notes are not part of the chords in the score and are therefore a "disturbance" for the chord recognition system.

Figure 4: A short section of Chopin's Minute Waltz that uses grace notes.

The purpose of this project is to build a system that will be able to recognize chords in simple cases as well as in complex cases such as the two latter. In this project I deal with classical music, but the system might as well be used with other music genres.

1.1 Symbolic music data

In the examples from Figures 1-4 the music is displayed symbolically: symbols, and not acoustic sound, are used to define the music. The first of two main advantages of working with symbolic data, when you are creating a system for chord recognition, is that you do not have to record a lot of audio for both training and testing of your system. Instead, one can use scanned-in music notes, converted to data that is easy for the computer to handle and easy to work with for a programmer. The second main advantage is that the data will not contain noise, as it most likely would if the music was recorded. Because of these advantages, I will work on symbolic music data in this project. As for computer-readable symbolic music data, MIDI and MusicXML are two useful formats. The MIDI protocol [7] was created in 1983 for connecting musical devices such as instruments and computers and controlling them in real-time. The MIDI protocol can, for example, be used to connect a synthesizer to a computer and hereby transfer the music played on the synthesizer to the computer for editing, play-back etc. Later on, a storage format was created to save music in so-called MIDI-files. Files of this format hold symbolic music data. The MIDI format has many applications and it is widely used throughout the music industry. MusicXML [13] was created as an XML-specification to represent music notation.
According to the creators of MusicXML, Recordare, the format is better for musical notation than MIDI-files because it holds much more information. That said, the downside of MusicXML is that because of all this information, the MusicXML format is very verbose and can be difficult to handle. Figure 5 shows an example of a simple MusicXML file. It represents a single whole note on middle C, in 4/4 time. Because of MusicXML's ability to thoroughly describe notes, I have decided to work with this format in the project, and the system that I implement will thereby have to cope with the verbosity of the format. MusicXML will be further described in Section 4.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 2.0 Partwise//EN"
  "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="2.0">
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>C</step>
          <octave>4</octave>
        </pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>

Figure 5: The "Hello World" of MusicXML: a whole C note.

2 Related work

In [9] a very general algorithm for chord recognition is described. The algorithm is divided in two parts (although the work of these parts is done simultaneously): one for segmentation of the score and one for chord labeling.
Segmentation is the task of finding the points of the music where the chord may change. Chord labeling is the task of labeling each segment with a chord. In the given algorithm, the score is segmented on every note hit (or note-attack). A score is calculated for each segment, representing how well the notes of the segment represent a chord. Afterwards, the segments are mapped to the vertices of a directed graph while the scores are mapped to the values of the edges between vertices. The highest-scoring path is then found through the graph, and the result represents the chord labeling of the entire score. To make this algorithm as general as possible, it was decided not to include any analysis of the musical context; only local information is analyzed. The COCHONUT algorithm is described in [15] as being an extension to the algorithm found in [9]. The test results described in this article show that by using harmonic contextual information and a few other techniques, the algorithm from [9] can be improved substantially. As I will be working with the COCHONUT algorithm throughout this project, it will be described in more detail in Section 3. The Melisma Music Analyzer (MMA) described in [16] also works on symbolic data. MMA is divided into several programs, but the so-called Harmony Program is especially relevant in this context, as it is able to output the root note of each segment in the music by analyzing the possible roots according to how well they fit into the circle of fifths. In [1] chord recognition is used as a tool for classification of music. Automatic genre classification is done by using chord sequences extracted from both symbolic and acoustic data. Patterns are created for each musical genre by looking at chord sequences in scores from that genre. These patterns are then used to classify other scores, simply by comparing them to patterns in the test-score.
Along with chord recognition tools for symbolic data, a lot of work has been done on chord recognition on acoustic data. [5], [17], [4] and [6] use chromagrams for this task, which is the most common approach. A chromagram is a feature vector that holds intensities for the 12 pitch classes found in music. To perform chord recognition, chromagrams for different chords are compared with the musical score.

3 The COCHONUT algorithm

The system I am creating is based on the COCHONUT algorithm developed by Scholz and Ramalho in [15]. The algorithm has three steps:

1. Segmentation of the score.
2. Chord-identification for each segment.
3. Contextual analysis to determine the best chord-candidate for each segment.

Segmentation is accomplished by splitting up the score every time there are three or more note-attacks within a small time-frame. The COCHONUT algorithm is created for chord recognition on jazz music, and in this kind of music chord changes often occur when there are three or more simultaneous note-attacks. This idea is an extension of what was done in [9], where the score was segmented every time a note-attack occurred. Chord-identification is based on pattern matching. The pitches of each segment found in the previous step are compared to a set of chord templates, and from this comparison a list of chord-candidates for the given segment is formed. Each chord-candidate is scored using a scoring-function from [9], which gives a score that tells how well the chord-candidate represents the pitches of the segment. Contextual analysis is performed by creating a directed graph of the chord-candidates. In this graph, all chord-candidates (represented by vertices) in a given segment are connected (by directed edges) to all chord-candidates in the next segment. Finding the best way through the graph is done using chord-sequence patterns.
These patterns are not explained in the paper, but it is assumed that the sequences are created from chord-sequences found in a training-set of musical scores.

4 MusicXML

Prior to reading this section, the reader should be familiar with the concepts of measures and parts in a musical score. Measures are used to divide the music on a time-based scale, while parts are used to divide the music into sections that represent different instruments or voices. In a regular music sheet, measures are divided by vertical bars while each part of the score has its own horizontal line. These relations are illustrated in Figure 6. Furthermore, the reader should note that all XML-elements mentioned in this section and in the following sections are written in a bold font.

Figure 6: An example of a musical score using measures and parts. The score contains three measures divided by the two middle vertical lines. The score contains two parts: one for piano, which contains both a treble and a bass, and one for violin. Parts are not necessarily named for a specific instrument.

MusicXML is based on XML. As done in a music sheet, MusicXML divides the music into measures and parts. MusicXML has two ways of describing music: part-wise or time-wise. Using a time-wise description, the hierarchical structure of the MusicXML will be as in Figure 7: the score is divided into measures, and each measure is divided into parts. Using a part-wise description, the score would be divided into parts and each part into measures. The MusicXML example in Figure 5 has a part-wise structure. This can be seen from the name of the root tag, which is score-partwise, and from the fact that the only part of the score (specified by the part-element) contains a measure (specified by the measure-element): the measure is a descendant of the part.
Figure 7: The hierarchical structure of a MusicXML-document when the music is described time-wise. The score is divided in measures, which in turn are divided in parts. Each part contains information about the music located in a given part within a given measure.

The creators of MusicXML, Recordare, provide XSLT stylesheets that can be used to convert MusicXML documents from one of these two formats to the other. Regardless of which one of the two formats is chosen, the structure of the music data in the children of parts/measures is the same. This music data will be described below.

4.1 Details

This section describes the XML-elements that are used to represent the actual music data. The elements mentioned in the following are all needed for the work done in the project.

4.1.1 Key

The attributes-element sets the attributes of the score. Within this element, the key of the score can be set. This is done in the key-element, which contains two elements: fifths and mode. The value of the fifths-element sets how many sharps/flats there are in the key of the score. The number of sharps is set with a positive integer while the number of flats is set with a negative integer. For example, to specify four flats in the key, the value of the fifths-element is set to -4. The mode-element is optional and is used to specify whether the key is in major or minor by setting the value of the element to either "major" or "minor".

4.1.2 Notes

The note-element describes a pitch or a rest. A note-element can contain the following elements (among others):

pitch: If present, the note represents a pitch. The pitch element contains three elements: step, octave and alter (optional). The step-element represents the pitch class (the possible values of this element are the letters A to G) and octave represents the octave in which the pitch is set.
alter represents the alteration of the pitch, that is, whether the pitch is a sharp or a flat. Regardless of the key specified, the alteration of a pitch must be specified with this element - the value of the element is 1 for a sharp or -1 for a flat.

duration: This is the duration (or length) of the note. As seen in Section 4.1.3, this duration is set in terms of quarter-note parts.

chord: Only used when the note is a pitch. If present, this element states that the pitch should be played as part of a chord (explained in Section 4.1.4).

rest: If present, the note represents a rest where no pitch sounds.

grace: If present, the note is a grace note.

4.1.3 Note-length

The attributes-element holds a divisions-element which is important, as it sets the shortest possible note-length, s. s is defined as follows:

s = 1/4 · 1/d = 1/(4d)

where d is the value of the divisions-element. The attributes are commonly specified within the first measure element of the score, but it is also possible to change the attributes, and hereby change s, within the score. Each note holds an element, duration, which sets the length, l, of the note using the following:

l = s · r

where r is the value of the duration-element. An example: if d = 12, the shortest possible note has length s = 1/(4 · 12) = 1/48. Each note must then specify its length in terms of s. A note of length 1/8 should therefore set r = 6, because 1/48 · 6 = 1/8. Grace notes do not have a duration element, which means that they have no relevant length.

4.1.4 Simultaneous notes

MusicXML maintains a musical counter which can be moved forwards and backwards to set the order of the notes in the score. When a note is specified, the counter is moved forward by the value of the duration-element. Simultaneous notes can be created by placing a chord-element in the next note. This element specifies that the counter should not move forward, but instead start the next note in the same place as the previous one.
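To make the counter semantics concrete, here is a small sketch of my own (not from the report; the event format is a hypothetical simplification of the XML) that walks a list of note events and computes each note's onset as a fraction of a whole note:

```python
from fractions import Fraction

def note_onsets(events, divisions):
    """Compute (onset, step) pairs for a list of simplified note events.

    Each event is a dict with 'step' and 'duration' keys and an optional
    'chord' flag. A note normally starts at the current counter position,
    which then advances by duration/(4*divisions); a note carrying the
    chord flag instead reuses the previous note's onset, so the two
    sound simultaneously.
    """
    counter = Fraction(0)
    onsets = []
    for ev in events:
        if ev.get("chord") and onsets:
            onset = onsets[-1][0]            # chord member: same onset as previous note
        else:
            onset = counter
            counter += Fraction(ev["duration"], 4 * divisions)
        onsets.append((onset, ev["step"]))
    return onsets

# The C-E-G chord from Figure 8 (divisions = 1): all three notes start at 0.
events = [{"step": "C", "duration": 4},
          {"step": "E", "duration": 4, "chord": True},
          {"step": "G", "duration": 4, "chord": True}]
print(note_onsets(events, 1))
```

The backup- and forward-elements described next would simply decrement or increment the counter in the same way.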
Figure 8 contains an example of using the chord-element.

<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>
<note>
  <chord/>
  <pitch>
    <step>E</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>
<note>
  <chord/>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>

Figure 8: An example of displaying a chord in MusicXML using the chord-element. The chord contains three pitches: C, E and G.

The counter can also be moved using the backup- and forward-elements. These elements are specified on the same level as the note-elements of the score; one could say that they are siblings to note. Like note, backup and forward both contain a duration-element that specifies how much the counter should be moved. Figure 9 shows an example of moving the musical counter backwards and forwards using these elements.

4.1.5 Parts of the score

The parts of the score are specified in the part-list element. Within this tag, each part is represented by a score-part-element. A score-part-element contains an attribute, id, which is the identifier of the part, and an element, part-name, holding the name of the part.
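As an illustration of how these elements can be read programmatically, the sketch below lists the parts of a score using Python's standard-library ElementTree (the XML fragment is made up for the example, shaped as described above):

```python
import xml.etree.ElementTree as ET

# A made-up part-list fragment with two parts.
PART_LIST = """
<part-list>
  <score-part id="P1"><part-name>Piano</part-name></score-part>
  <score-part id="P2"><part-name>Violin</part-name></score-part>
</part-list>
"""

def list_parts(xml_text):
    """Return (id, part-name) for every score-part in a part-list element."""
    root = ET.fromstring(xml_text)
    return [(sp.get("id"), sp.findtext("part-name"))
            for sp in root.findall("score-part")]

print(list_parts(PART_LIST))   # [('P1', 'Piano'), ('P2', 'Violin')]
```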
<note>
  <pitch>
    <step>F</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>
<note>
  <pitch>
    <step>A</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>
<backup>
  <duration>8</duration>
</backup>
<note>
  <pitch>
    <step>D</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>
<forward>
  <duration>4</duration>
</forward>
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
</note>

Figure 9: An example of moving the musical counter using backup and forward in MusicXML to construct a chord. In this case, an F is followed by an A. The counter is then moved back, and a D is specified simultaneously with the F. Finally, the counter is moved forward to specify a G after the A.

5 System design

The design of the system and its execution flow is illustrated in Figure 10. The design is based on the design of the COCHONUT algorithm. Each element of the system is described below.

Figure 10: The design and flow of the system, implementing the COCHONUT algorithm: MusicXML is read by the Parser, which produces intervals; the Partitioner turns these into segments; the ChordIdentifier produces candidate chords; and the ContextAnalyzer outputs the labeled score. Data is represented by ellipses and the elements of the system are represented by squares.

Parser: The Parser parses a MusicXML-document and returns the musical data from this document, split up into so-called intervals that each have the length of the shortest possible note in the score.

Partitioner: The Partitioner is given the intervals as input and produces a list of segments as output. A segment in the system corresponds to a segment defined in [15]. The work done by the Parser and the Partitioner corresponds to the work done in the first step of the COCHONUT algorithm.
ChordIdentifier: The ChordIdentifier identifies the chord-candidates of each segment and calculates a score for each candidate. The higher the score, the more likely it is that the notes of the segment represent that chord. This is the second step of the COCHONUT algorithm.

ContextAnalyzer: The ContextAnalyzer examines the chord-candidates and selects the best chord for each segment. In the last step of the COCHONUT algorithm, the contextual analysis is done using chord-sequence patterns, but as I have not been able to get hold of these patterns, the ContextAnalyzer in my system has been designed differently: [10] specifies a comprehensive set of chord-transition rules. These rules specify legal transitions from and to different chord-types such as tonic, subdominant and dominant, and these rules are used for contextual analysis in my system.

5.1 Data format

The system design has been based on the idea that we will need a data format that can hold the music data in a way that makes it easy to analyze. The grammar below describes this data format.

X → T
T → S
T → TS
S → CN
N → M
N → NM
M → LJ
J → I
J → JI
I → AR
R → P
R → RP
P → DO

X: a score, T: a list of segments, S: a segment, C: a (recognized) chord, N: a list of mini-segments, M: a mini-segment, L: length of the mini-segment in terms of number of intervals, J: list of intervals, I: interval, A: number of note-attacks at the start of the interval, R: list of pitches sounding in an interval, P: a pitch, D: pitch-class, O: octave.

The format is created from the idea that we want to end up with a list of segments, each containing a chord. Basically, a score is divided in segments, which in turn are divided in mini-segments, which in turn are divided in intervals. The difference between a segment and a mini-segment is that a mini-segment should be created every time the notes change (that is, upon every note-attack) while a segment should be created every time there are three or more note-attacks.
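The grammar translates almost one-to-one into Python classes. The sketch below shows one possible representation; the class and field names are my own, as the report only specifies the grammar:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Pitch:                 # P -> DO
    pitch_class: int         # D: 0-11, with 0 = C
    octave: int              # O

@dataclass
class Interval:              # I -> AR
    note_attacks: int        # A: note-attacks at the start of the interval
    pitches: List[Pitch] = field(default_factory=list)        # R

@dataclass
class MiniSegment:           # M -> LJ
    length: int              # L: length in number of intervals
    intervals: List[Interval] = field(default_factory=list)   # J

@dataclass
class Segment:               # S -> CN
    chord: Optional[str] = None    # C: filled in once a chord is recognized
    mini_segments: List[MiniSegment] = field(default_factory=list)  # N

@dataclass
class Score:                 # X -> T
    segments: List[Segment] = field(default_factory=list)     # T
```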
The idea of a mini-segment was found in [9].

6 Implementation

The source code that implements the system described in the previous section is found at the Google Code project at the following URL: http://code.google.com/p/cochonut/source/browse/#svn/trunk/src. The system is implemented using Python 2.6, and Table 1 displays the files that hold the implementation of the elements of the system: Parser, Partitioner, ChordIdentifier and ContextAnalyzer. The remainder of this section contains a description of the implementation of these elements. The file cochonut.py, which is also available at the Google Code project, is used to run the program by calling the relevant functions from all elements, but the implementation of this file will not be described.

Element            File
Parser             parser.py
Partitioner        partitioner.py
ChordIdentifier    chord_identifier.py
ContextAnalyzer    contextanalyzer.py

Table 1: Implementation files.

6.1 Parser

The Parser depends on the MusicXML in the input following the specification given at [12]. Furthermore, it assumes that the XML holds information about the key of the score, provided in a key-element. This information is returned by the Parser and used to find the tonic of the score, as we will see in Section 6.4. As explained earlier, MusicXML can be used to describe music either part-wise or time-wise. I have made the assumption that a musical score always has one chord at one time in the music. Because of this, it would be convenient to have the music data structured in a sequential way, and this is exactly what the time-wise structure in MusicXML provides. Therefore the Parser works on time-wise MusicXML. For the system to support both part-wise and time-wise MusicXML documents, the Parser converts part-wise scores into time-wise scores using the lxml library [2]. The XSL stylesheet used for the conversion from part-wise to time-wise is provided at http://www.recordare.com/dtds/parttime.xsl.
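As a taste of what the Parser has to do, the sketch below reads the key and the notes of a single measure. It uses the standard-library ElementTree rather than lxml (the querying style is similar), and the XML fragment is assumed for illustration:

```python
import xml.etree.ElementTree as ET

# An assumed one-measure fragment: four flats, divisions = 2, one whole note.
MEASURE = """
<measure number="1">
  <attributes>
    <divisions>2</divisions>
    <key><fifths>-4</fifths><mode>major</mode></key>
  </attributes>
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <duration>8</duration>
  </note>
</measure>
"""

def read_measure(xml_text):
    """Extract the fifths value, the divisions value and all (step, duration)
    pairs from one measure element."""
    m = ET.fromstring(xml_text)
    fifths = int(m.findtext("attributes/key/fifths"))
    divisions = int(m.findtext("attributes/divisions"))
    notes = [(n.findtext("pitch/step"), int(n.findtext("duration")))
             for n in m.findall("note")]
    return fifths, divisions, notes

print(read_measure(MEASURE))   # (-4, 2, [('C', 8)])
```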
I could have opted to work with a part-wise score in the system; in this case, the data of all measures for each part should be read before merging the information measure by measure. I have, however, found it more obvious and easier to work with a time-wise score. The reading of the XML-elements in MusicXML is done by reading the data into a tree-like structure. This is done using the lxml module as well: the module is able to parse an XML-document from a text file and return a tree-structured object representing the XML. This object can be queried for specific elements, such as all measure-elements or all child-elements of a given element. This makes it very easy to retrieve all the elements needed from the XML. From the Parser, the music of the score is output as a list of intervals. Each interval has the length of the shortest possible note in the score. An interval holds the count of how many pitches of a given pitch class are sounding in that interval, plus a count of how many note-attacks are made at the start of the interval. The length of a note (pitch or a rest), that is, how many intervals the note lasts, is found by dividing the length of the note by the length of an interval. If, for example, the largest divisions value in the score is 8, the shortest possible note is 1/(4 · 8) = 1/32 - this is also the length of an interval. The length of a 1/2-note in terms of number of intervals will then be (1/2) / (1/32) = 16. Table 2 shows an example of how the notes from Figure 11 are mapped to the list of intervals.

Figure 11: A set of notes that can be mapped to intervals.

The Parser maintains a divisions-variable to hold the value of the divisions-element (described in Section 4) for each part in the music. This is needed because the divisions-value can be different for each part in the score.
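The interval arithmetic above can be expressed with exact fractions (a small helper of my own, not from the project's code):

```python
from fractions import Fraction

def interval_count(note_length, divisions):
    """Number of intervals a note of the given length (as a fraction of a
    whole note) spans, given the score's divisions value."""
    s = Fraction(1, 4 * divisions)     # interval length = shortest possible note
    return note_length / s

# The example from the text: divisions = 8, so a 1/2-note spans 16 intervals.
print(interval_count(Fraction(1, 2), 8))   # 16
```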
In addition, the Parser maintains a variable that keeps track of the next interval (in which we will place the next note) in each part, as we jump from part to part when using a time-wise structure. This variable corresponds to the musical counter. It is incremented/decremented when note-, backup- and forward-elements are encountered. A pitch is mapped to a pitch-class number and an octave number (although only the pitch-class is used throughout the system). As grace notes are explicitly defined in MusicXML using the grace-element, it is not hard for the system to cope with these "decoration"-notes: they are merely skipped, as they are not part of the current chord's notes. Grace notes that are given as regular notes are stored in the intervals, however. Whether or not these notes are part of the current chord is up to the ChordIdentifier to decide.

Interval  Note-attacks  Pitches
1         1             [C]
2         0             [C]
3         2             [D, F]
4         1             [D, F]
5         3             [C, E, G]
6         0             []
7         0             []
8         2             [A, A]

Table 2: How the notes from Figure 11 are mapped to a list of eight intervals, each interval having the length of a 1/4-note. A row in the table corresponds to an interval. At the start of the first interval, a C-note is attacked; this note lasts two intervals, as it is a half-note. Therefore one note-attack is registered in the first interval, but not in the second. The pitch C is registered in both intervals, though, as the pitch sounds in both of them. Notice how the last interval holds two A's, as two pitches of pitch-class A sound in this interval.

6.2 Partitioner

The Partitioner's work is done in two steps: first it creates mini-segments from the intervals and then it creates segments from the mini-segments. As mentioned, a mini-segment should be created every time the sounding pitches change. The Partitioner maintains a list of mini-segments which is initially empty.
It then iterates through the intervals, and every time an interval is encountered which has a different set of pitches than the previous one, a new mini-segment is created and appended to the list of mini-segments. Every mini-segment holds a variable with the count of how many note-attacks were made at the start of the mini-segment. A segment should be created every time there is a possible chord-change. How many note-attacks are needed for a possible chord-change is specified in the required_attacks parameter (as in the COCHONUT algorithm, this parameter defaults to 3), which is set when calling the Partitioner. The required amount of note-attacks for a new segment to begin may occur within a given time-frame, t. The Partitioner is given the length of this time-frame as a parameter, which specifies the time-frame in terms of number of intervals. In [15] the time-frame is based on the beat of the music, but it is not explained in the article how the time-frame is calculated. Therefore the length of the time-frame is set as a parameter, T, in the program - the value of this parameter should be set to 1/8 or 1/16. Recall from Section 4.1.3 that the length, s, of the shortest possible note is

s = 1/(4d)

where d represents the value of the divisions-element. t is calculated (before it is passed to the Partitioner) as

t = T / s = T · 4d

When creating segments, the Partitioner follows the pseudo-code in Algorithm 1. It basically loops through the mini-segments and creates a new segment every time required_attacks or more note-attacks occur. In the pseudo-code, the intervals()-function returns the length of a mini-segment in terms of how many intervals it spans.
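The first of the two steps just described, grouping intervals into mini-segments whenever the pitch set changes, can be sketched as follows (the representation of intervals and mini-segments is my own simplification):

```python
def build_mini_segments(intervals):
    """Group consecutive intervals with identical pitch sets into mini-segments.

    Each interval is a (note_attacks, pitches) pair. A mini-segment records
    the note-attacks at its start and its length in intervals, as in the
    data format of Section 5.1.
    """
    minis = []
    for attacks, pitches in intervals:
        if minis and sorted(pitches) == sorted(minis[-1]["pitches"]):
            minis[-1]["length"] += 1           # same pitches: extend the current run
        else:
            minis.append({"note_attacks": attacks,
                          "pitches": list(pitches),
                          "length": 1})        # pitch set changed: start a new mini-segment
    return minis

# The eight intervals of Table 2 collapse into five mini-segments.
table2 = [(1, ["C"]), (0, ["C"]), (2, ["D", "F"]), (1, ["D", "F"]),
          (3, ["C", "E", "G"]), (0, []), (0, []), (2, ["A", "A"])]
print(len(build_mini_segments(table2)))   # 5
```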
Algorithm 1 Pseudo-code for creating segments (comments are in brackets)

i = 0
while i < length(mini_segments) do
  j = i + 1
  m = mini_segments[i]  {m is the current mini-segment}
  s = list(m)  {create a list, s, with m as the single, initial member}
  total_attacks = m.note_attacks
  total_length = intervals(m)  {total_length is the length of s in terms of intervals}
  while j < length(mini_segments) and total_length + intervals(mini_segments[j]) ≤ time_frame do
    n = mini_segments[j]  {n is the next mini-segment}
    total_length += intervals(n)
    total_attacks += n.note_attacks
    s.append(n)
    j += 1
  end while
  last = length(segments) - 1
  if total_attacks ≥ required_attacks then
    segments.append(s)  {mini-segments in s represent a new segment}
  else
    segments[last].add(s)  {mini-segments in s are part of an existing segment}
  end if
  i = j
end while

6.3 ChordIdentifier

The ChordIdentifier uses the following procedure to identify the chord-candidates of each segment:

1. A weight-vector is calculated for each segment. This vector contains twelve elements, one for each pitch-class. Each element in the vector is a count of how many times a pitch occurs in the mini-segments within the segment. Element 0 represents the count of C's, element 1 represents the count of C#/Db's, element 2 the count of D's and so on. If, for example, a G-pitch is present in three mini-segments within the segment, the value of element 7 will be 3.

2. The weight-vector of each segment is compared to all chord-templates, using every possible pitch-class as the root of each chord. For each combination, a score is calculated using a scoring-function.

3. The root/template-combinations that have a score of at least a fraction min of the highest score are chosen as chord-candidates.

In the second step, the combinations are scored using the scoring-function found in [9], which measures the distance from the weight-vector of a segment to each template in a set of chord templates.
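Step 1 of the procedure above can be sketched in Python; the names and the data representation (a segment as a list of mini-segments, each holding the set of sounding pitch-classes) are illustrative, not the system's actual code:

```python
def weight_vector(segment):
    """Build a 12-element weight-vector for a segment.

    segment: a list of mini-segments, each given as the set of
    pitch-classes (0-11) sounding in it.
    """
    w = [0] * 12  # one count per pitch-class; element 0 = C
    for mini_segment in segment:
        for pitch_class in mini_segment:
            w[pitch_class] += 1
    return w

# A G-pitch (pitch-class 7) present in three mini-segments counts 3:
print(weight_vector([{7, 0}, {7, 4}, {7}]))
# → [1, 0, 0, 0, 1, 0, 0, 3, 0, 0, 0, 0]
```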
A chord template is represented by an array which holds the pitch-class indexes of the pitches that are in the chord, relative to the root. For example, the array [0, 4, 7] represents a major triad chord, meaning that in a major triad, the root and the pitch-classes four and seven semitones above it are sounding. The scoring-function is implemented as a Python function, which means that it can easily be replaced by another function (as discussed in Section 8.1). Furthermore, the list of templates is given as a parameter to the ChordIdentifier, which makes it possible for the user to provide her own chord-templates. In the current function, the score, S, for a given segment with a given weight-vector is calculated in the following way:

S = P − (N + M)

P is the positive factor, which is the sum of pitch-counts for the pitch-classes in the template; N is the negative factor, which is the sum of pitch-counts for the pitch-classes not present in the template; and M represents the misses, which is the count of pitches from the template not being played. As an example of scoring a chord, we will consider the following weight-vector, w:

w = [1, 0, 0, 0, 3, 0, 1, 0, 0, 3, 0, 0]

w represents a segment where one C, three E's, one F#/Gb and three A's are sounding. As mentioned, the template of a major triad chord is given by the array [0, 4, 7]. This template is adjusted to a given root when used for comparison. Adjusting the major triad template so it represents an A-major triad (A having pitch-class 9) gives

([0 + 9, 4 + 9, 7 + 9]) mod 12 = [9, 1, 4]

which corresponds to the pitch-classes A, C# and E.
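The template adjustment and the S = P − (N + M) scoring described above can be sketched in Python. This is a minimal reading of the function from [9], not the system's actual implementation; the names are illustrative:

```python
def transpose(template, root):
    """Shift a chord template so its root is the given pitch-class (mod 12)."""
    return [(pc + root) % 12 for pc in template]

def score(w, template):
    """Score a 12-element weight-vector w against an absolute template."""
    P = sum(w[pc] for pc in template)                         # positive factor
    N = sum(w[pc] for pc in range(12) if pc not in template)  # negative factor
    M = sum(1 for pc in template if w[pc] == 0)               # misses
    return P - (N + M)

MAJOR_TRIAD = [0, 4, 7]
w = [1, 0, 0, 0, 3, 0, 1, 0, 0, 3, 0, 0]  # one C, three E's, one F#, three A's

t = transpose(MAJOR_TRIAD, 9)  # A-major triad (A has pitch-class 9)
print(t)            # → [9, 1, 4]
print(score(w, t))  # → 3  (P = 6, N = 2, M = 1)
```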
Using the weight-vector, w, above and the template, t, for an A-major triad, the scoring-function gives the following calculations:

P = Σ_{i∈t} w(i) = 6, as w(9) = 3, w(1) = 0 and w(4) = 3
N = Σ_{i∉t} w(i) = 2, as w(0) = 1 and w(6) = 1
M = 1, as w(1) = 0
S = 6 − (2 + 1) = 3

In [15], a directed graph is built from the chord-candidates, where all the chord-candidates of each segment point to the candidates of the next. The ChordIdentifier in this system does not, however, need to build such a graph: the segments are already sorted in the order in which they occur in the music, so the candidates of one segment already "point" to the candidates of the next segment, because each segment points to the next.

6.4 ContextAnalyzer

As mentioned in Section 5, the ContextAnalyzer uses a set of rules from [10] specifying legal chord-transitions from and to different chord-types. Not all these rules have been implemented, as this paper does not define the chord-types directly. Therefore I have implemented rules for all chord-types found in [3]. The system is able to recognize a reasonable number of chord-types (14 in total). These are given in Appendix A. The function that determines the chord-types is hard-coded in the ContextAnalyzer, and the user of the system will therefore not be able to simply give new chord-types as parameters - at least not without changing the function. To provide an easier way for the user to use other chord-types than the ones already defined, a grammar could have been created to represent the function. The user would then be able to provide a small function (which obeys the rules of the grammar) when providing a new chord-type. Before the ContextAnalyzer is able to find legal transitions, the tonic of the score must be determined, because all chord-types depend on the tonic-chord. For this purpose, the ContextAnalyzer is given the key of the score. Using this key, it finds the tonic of the score using a method from [3]¹.
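A minimal sketch of such a circle-of-fifths lookup, assuming the key is given as a MusicXML fifths value (sharps positive, flats negative) and a major mode. This is an illustration of the idea, not the actual method from [3]:

```python
def tonic_pitch_class(fifths):
    """Locate the tonic of a major key on the circle of fifths.

    Each step clockwise on the circle (one more sharp) moves the
    tonic seven semitones up, modulo the octave.
    """
    return (7 * fifths) % 12

print(tonic_pitch_class(0))   # → 0  (C major, no accidentals)
print(tonic_pitch_class(2))   # → 2  (D major, two sharps)
print(tonic_pitch_class(-1))  # → 5  (F major, one flat)
```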
The tonic could also have been determined by looking at the last pitch of the score. This approach has not been taken, though, since the last "pitch" can be several pitches and we do not know which one of them is the root. When doing the actual analysis, the ContextAnalyzer iterates through the segments and selects the most appropriate chords from the list of candidate-chords. Algorithm 2 shows the pseudo-code used to do this. In this code, find_legal_transitions(p, c) finds the chords from c that are legal chord-transitions from the previous chord, p, while find_best_score(l) finds the chord with the highest score from a list, l, of chords, simply by comparing the scores of all the chords and selecting the one with the highest score. If two or more candidates share the highest score, we simply pick the first one. This approach may seem naive, and it is only taken because the scoring-function returns imprecise scores. In my opinion the scoring-function should be optimized to return more precise scores - this is discussed in Section 8.

Algorithm 2 Context-analyzing the segments (comments are in brackets)

p = null  {p is the previous chord, which is nothing at first}
for all s in segments do
  c = s.chord_candidates  {c is now the list of chord-candidates}
  l = find_legal_transitions(p, c)  {l holds the chords from c that are legal transitions from p}
  if length(l) > 0 then
    s.chord = find_best_score(l)  {legal transition(s) found: set the chord of s to the chord from l with the best score}
    p = s.chord
  else
    if length(c) > 0 then
      s.chord = find_best_score(c)  {no legal transition(s) found: set the chord of s to the chord from c with the best score}
      p = s.chord
    end if
  end if
end for

¹ This method is quite trivial. It finds the root of the key, and hereby the tonic, by locating the key in the circle of fifths.

7 Tests

Two types of tests have been made: First, a functional test has been created to test if the system works as expected on a set of small test data.
Second, a score-test has been created to test the system on a set of musical scores. In this section, the two tests are described and their results are evaluated. All MusicXML-files used for tests are found at the Google Code project, at the following URL: http://code.google.com/p/cochonut/source/browse/#svn/trunk/test. All tests were made with a set of 48 chord-templates collected from [3] and [8]. In [15], only chord-patterns appropriate to jazz harmony are used. I have, however, chosen to go with a single comprehensive set of chord-patterns. This should make the chord identification more precise, as there are more templates to choose from. In the end, this should provide better results. During the tests, chord-candidates are discarded if their score is less than 85% of the highest score. This is the same approach as the one taken in [15]. In addition, the time-frame in which a required number of note-attacks should occur is set to the length of a 1/16-note. The number of required note-attacks is set to 3. Ideally, tests should have been carried out trying different parameter-settings, to see if better results could be obtained with other settings than the ones explained, but because of the small scope of this project, I have opted to use just the settings described above.

7.1 Functional tests

When testing whether the system works, the following factors need to be tested:

A. When there are no chord changes, no chord change should be registered.

B. The system should label the music with at most one chord at any given point in the music.

C. The system should be able to recognize a chord when all its notes are played simultaneously.

D. The system should be able to recognize a chord even though not all its notes are played simultaneously.

E. A chord should be recognized even though it contains notes that are not found in the chord regularly (grace notes).

F. The system should be able to recognize the most simple chords, such as triads.

G.
The system should be able to recognize more complex chords with five, six or seven notes.

H. Chord sequences from the possible chord-transitions should be recognized.

I. When no regular chord sequences from the possible chord-transitions are recognized, the system should give a good guess on which chords are played.

J. If the system encounters a segment of notes that does not represent a chord, it should still be able to give a guess on what chord is being played.

Furthermore, the chord recognition should not be influenced by the length of the notes, except that a note which lasts longer may have a higher weight in the weight-vector, and hereby chord templates that use this note will have a higher score. To test the factors above, I have created seven tests, numbered 1-7. Table 3 illustrates the factors they test. Their notes (along with the expected results and the actual results) are found in Table 4. For each test, the corresponding XML-file at the URL above is named testX.xml, where X is the test number.

Table 3: The factors tested by the seven functional tests - a grid of the tests 1-7 against the factors A-J, where an 'x' marks that the test tests the given factor. For example: Test 2 tests factors B, C, F and H.

Test 1 contains no chords. Notice that the notes in this test may sound like two D-major triads, but the system will not be able to recognize these chords, as it expects three or more simultaneous notes to detect a chord change. Test 2 contains some simple triad chords: C-major, F-major and G-major, where all pitches of each chord are played simultaneously. The order of the chords is made from the pattern: tonic → subdominant → dominant → tonic. Test 3 contains two triad chords, D-major and G-major, where not all the notes of the chords are played simultaneously. The chord-sequence used is: tonic → subdominant. Test 4 contains two triad chords, C-minor and F-minor.
The F-minor is broken up and not all its notes are played simultaneously. In addition, a grace note (a 1/8-note: Ges), which does not fit into the template of the chord, is played during the F-minor. Test 5 contains a set of complex chords. The sequence used is: tonic → tonic parallel → subdominant sixth → dominant seventh. Test 6 contains a transition from a subdominant to the tonic, which is not a legal transition according to the system. Test 7 contains two sets of simultaneously played notes, neither of which forms a specific chord.

7.2 Score tests

The score tests are used to test the system on "real" music. Among the selected scores should be:

• scores with few (one or two) parts
• scores with multiple (more than two) parts
• scores with many short notes
• scores with many simultaneous notes

Table 4: Functional tests - the input notes, the expected output and the actual output of each of the seven tests.

Score  Composer         Music                                            Sheets  File
1      Schumann, R.     Im wunderschönen Monat Mai, Dichterliebe         2       dichter.xml
2      Mozart, W.A.     Das Veilchen                                     1       veilchen.xml
3      Beethoven, L.v.  An die ferne Geliebte                            1       geliebte.xml
4      Actor, L.        Prelude to a Tragedy                             4       tragedy.xml
5      Bach, J.S.       Brandenburg Concerto No. 2 in F Major, BWV 1047  5       branden.xml

Table 5: Musical scores used for score tests. Scores 1-4 were retrieved from [14] while Score 5 was retrieved from [11].

Score 1 was chosen because it contains a lot of short notes and few places with three or more note-attacks. This will result in segments with more complex weight-vectors. Score 2 also contains a lot of short notes, but it also contains a lot of simultaneous note-attacks. The same is the case for Score 3. Scores 4 and 5 have been chosen because of their multi-part music (Score 4 has 22 parts, Score 5 has four parts).
I have not been able to retrieve scores that were already labeled with chords, nor have I had access to an expert who could label the scores for me. Therefore, I have not been able to calculate precise results saying how many chords were actually labeled correctly. The evaluation of the results is merely based on my own basic knowledge about chords.

7.3 Results

As seen from Table 4, all functional tests gave results as expected. The score tests gave reasonable results: The system was able to guess a chord in every segment, and often the chord was a reasonable guess according to the pitches being played. Figure 12 shows an example of some chords that were recognized in Score 3.

Figure 12: Chord-labeling of the first four measures of Score 3.

During the score tests, it turned out that some chords may be given a high score even though the root of the chord is not played within the segment. Take for example the following weight-vector, which represents the fifth segment of Score 3:

[1, 0, 1, 8, 0, 0, 0, 4, 0, 0, 1, 0]

Chord-identification with this vector yields a lot of candidates with scores ranging from 13 to 15. Among these candidates are Fm13 and F13, although no F-pitches sound in the segment. This is because the scoring-function does not take the root of the chord into account. It merely scores the weight-vector based on the pitch-classes that should be in the chord, not taking into account that the root is more important than the other classes, because it is rarely left out of the chord. In Figure 12, the second and third segments are labeled with chords although the roots of these chords are not sounding in the segments. The score tests also revealed that the system has some difficulty in dealing with multi-part scores. This is mainly because the system requires a number of note-attacks for a new chord to start.
In multi-part music, this number can be quite hard to determine, as the different parts often use different note-patterns and because there can be a lot of parts. Score 4, for example, is written for an orchestra and contains 22 parts, where some of the parts play a lot of short pitches while other parts play only long pitches. The number of note-attacks at the points at which the chords change in this score ranges from 3 to 28, and therefore it is almost impossible to set a number of required note-attacks. Because of this, the results from Score 4 were not very good. The results of running Score 5 were somewhat better, because this score has fewer parts and because the note-patterns in the parts vary a lot. The time that the system uses to process a MusicXML-file depends, naturally, on the size of the file. Score 2 contains approximately 4800 lines (2 parts in music) and is processed in 400 ms, while Score 4, which contains 42900 lines (22 parts in music), is processed in 5500 ms.

8 Conclusions and future work

A system has been created to parse music from MusicXML-files, partition this music into segments where the notes of each segment represent a chord, and at last identify and analyze chord-candidates of each segment, to label each segment with a chord. The system is able to handle music with grace notes and/or dissonance, and according to the functional tests, the system is working. According to the score tests, the system is useful on some kinds of real classical music. However, as seen from the test results, there are some issues that one could deal with in the future to improve the system. Apart from these issues, I am generally satisfied with the results that I was able to achieve with the system. It is difficult to compare my results to the results obtained in [15], as I was not able to test pre-labeled scores, and therefore have no precise indication of how many of the chords were correctly recognized.
In [15], the results show that the COCHONUT algorithm is able to recognize 65-75% of the chords correctly. My guess is that the system developed in this project has a lower recognition-rate. The recognition-rate does, however, depend on the number of parts in the music: more parts make it more difficult to determine the points at which the chord changes, and therefore lower the recognition-rate. The tests also revealed that the current scoring-function is not optimal. The main problem is that the function does not take the chord-root into account: a set of pitches can achieve a relatively high score for a given chord-template although the root of the chord is not among the pitches. The system has been built to work on MusicXML, because this format has been developed to represent sheet music in an accurate way. Working with MusicXML presents some problems, though: the format allows the creator of the XML to leave out some parts of the music-specification, like for example the key of the music. This made it more difficult to parse MusicXML-files, and because of this, it was necessary to put some further restrictions on the XML. The remainder of this section presents some possible enhancements that could be made to improve the system. The enhancements are presented in a prioritized order, meaning that the suggestions given first should enhance the system more effectively than the ones given last.

8.1 New scoring-function

As mentioned, the current scoring-function is not optimal, and a replacement of this function may enhance the performance of the system substantially. The new scoring-function should score the chord-candidates in a similar way to what the current function does, but also take the root of each chord into account: if the root is present, the score should be higher than if it is not.
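One way such a root-aware function could look is sketched below. The bonus weight ROOT_WEIGHT, the template convention and the names are assumptions for illustration, not part of the system:

```python
ROOT_WEIGHT = 3  # hypothetical extra weight given to the chord root

def score_with_root(w, template):
    """Score a 12-element weight-vector against an absolute template,
    rewarding a sounding root and punishing a silent one.

    By convention here, element 0 of the template is the root.
    """
    root = template[0]
    P = sum(w[pc] for pc in template)                         # positive factor
    N = sum(w[pc] for pc in range(12) if pc not in template)  # negative factor
    M = sum(1 for pc in template if w[pc] == 0)               # misses
    base = P - (N + M)
    return base + ROOT_WEIGHT if w[root] > 0 else base - ROOT_WEIGHT

# Fifth segment of Score 3: no F sounds, so an F-rooted chord is penalized.
w = [1, 0, 1, 8, 0, 0, 0, 4, 0, 0, 1, 0]
f_major = [5, 9, 0]  # hypothetical F-major triad template (root F = 5)
print(score_with_root(w, f_major))  # → -18
```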
The new function could also give more precise scores, so that we are able to select the chord with a distinctly highest score (with the current function, different chord-candidates often have the same scores). The information about which pitches are lowest/highest at a given point in the music, in terms of octaves, is not currently used. This information could be used to score the chord-candidates more precisely, or to determine the chord-types during the contextual analysis (some chord-types are defined by which pitch is the lowest). The data format used in the system is already able to handle information about octaves, so the information is easily accessible. At last, a new scoring-function could be one which is genre-specific and able to calculate more precise scores for the genre in question. Currently, a parameter is set to filter out candidates that have a score less than a given threshold. If a new scoring-function is developed, this function should calculate the threshold automatically: if, for example, we have a lot of high scores, the threshold should be set high, as we would only want to consider the top-scoring candidates.

8.2 Analyzing specific part(s) of the music

When analyzing multi-part music, the approach of segmenting the score whenever a given number of note-attacks occur is not optimal, because it is not easy to determine how many note-attacks make a chord change in multi-part music. Instead of analyzing all parts of the score, another approach could be taken, in which one would analyze a single part of a score. In this case, the algorithm would have to locate the most "important" part(s) and analyze this/these. This could be set as a requirement on the input: when giving the input, the most important part of the music should be marked, and the analysis should then be carried out only on this part.
One could also try to find the most important part(s) automatically within the system, for example by removing the parts that contain a bass clef: these parts almost certainly do not hold the treble, and hereby do not hold the melody of the music. This makes the bass-parts unimportant when recognizing chords.

8.3 More precise weight-vectors

The concept of a weight-vector has been found in [9]: a single state of pitches is maintained for each mini-segment. The weight-vector of the segment is then based on the pitches of the mini-segments within the segment. Consider the example in Figure 13. The weight-vector of the first segment for the notes in this figure will be

[1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

which means that the short F-pitch has the same importance as the longer G, B and D pitches (played simultaneously), even though they last four times longer. This may result in a different chord than the one intended by the composer.

Figure 13: Illustration of how a weight-vector is constructed. The first segment in these notes would have the weight-vector [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1], which does not weigh the tones of the long G, B and D pitches higher than the F-pitch.

A possible solution to this issue could be to maintain interval-weight-vectors, so that in the example we would get the weight-vector [0, 0, 8, 0, 0, 1, 0, 8, 0, 0, 0, 8] for the first segment, if each interval is a 1/8-note long. Having such vectors would mean that a new scoring-function should be created, or that the scoring-function should be applied to the weight-vector of every interval in the segment. If not, the misses factor would be too unimportant.

8.4 Use of chord-sequence rules or additional transitions

As I was not able to get hold of the chord-sequence rules that are used in [15], implementing these rules into the current system should of course be tried in the future.
If this enhancement is not tried, and the user of the system chooses to stick to the model with chord-transitions that I have implemented, the list of possible transitions should be extended. As described in Section 6.4, the system could be extended so the user would be able to pass other chord-types and -transitions to the system as parameters.

8.5 Thorough testing

As mentioned during the tests, the system has not been tested with other parameters than the ones specified in [15]. A more thorough test of the system may give different results and provide an indication of where the system should be improved.

8.6 Use of an object-oriented model

Although it may not enhance the performance of the system in terms of chord-recognition, it may be easier to develop the system further if it were built on an object-oriented model.

References

[1] Anglade, A., Ramirez, R. and Dixon, S.: Genre Classification Using Harmony Rules Induced from Automatic Chord Transcriptions, Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009), Kobe, Japan, October 2009.

[2] Behnel, S. et al.: lxml. http://codespeak.net/lxml/. Retrieved December 23rd 2009, 12:25 GMT.

[3] Grønager, J.: Nøgle til musikken - grundlæggende musikteori (eng.: Key to the music - basic music theory). Systime, 2004.

[4] Khadekevich, M. and Omologo, M.: Use of Hidden Markov Models and Factored Language Models for Automatic Chord Recognition, Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009), Kobe, Japan, October 2009.

[5] Lee, K.: Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile. Proceedings of the International Computer Music Conference, 2006.

[6] Lee, K. and Slaney, M.: Automatic Chord Recognition from Audio Using an HMM with Supervised Learning, Proceedings of the 7th International Conference on Music Information Retrieval, Victoria, Canada, 2006.
[7] MIDI Manufacturers Association Incorporated: Tutorial: History of MIDI, 1995-2009. http://www.midi.org/aboutmidi/tut_history.php. Retrieved November 18th 2009, 10:50 GMT.

[8] Neut, E.v.d.: Chord House, Piano Room. http://www.looknohands.com/chordhouse/piano/. Retrieved January 8th 2010, 10:00 GMT.

[9] Pardo, B. and Birmingham, W.: The Chordal Analysis of Tonal Music, Technical Report CSE-TR-439-01. The University of Michigan, Electrical Engineering and Computer Science Department, 2001.

[10] Pedersen, E.: Verificering og generering af koralharmoniseringer (eng.: Verification and generation of choral harmonizations). 2006.

[11] Project Gutenberg: Online Book Catalog. http://www.gutenberg.org/catalog/. Retrieved January 18th 2010, 12:00 GMT.

[12] Recordare: MusicXML 2.0 DTD Index. http://www.recordare.com/dtds/index.html. Retrieved January 29th 2010, 13:45 GMT.

[13] Recordare: MusicXML FAQ. http://www.recordare.com/xml/faq.html. Retrieved November 18th 2009, 12:00 GMT.

[14] Recordare: MusicXML Samples. http://www.recordare.com/xml/samples.html. Retrieved January 18th 2010, 12:00 GMT.

[15] Scholz, R. and Ramalho, G.: COCHONUT: Recognizing Complex Chords From MIDI Guitar Sequences, Proceedings of the 9th International Conference on Music Information Retrieval, 2008.

[16] Sleator, D. and Temperley, D.: The Melisma Music Analyzer. http://www.link.cs.cmu.edu/music-analysis/. Retrieved December 26th 2009, 10:40 GMT.

[17] Yoshioka, T. et al.: Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries. Proceedings of the 5th International Conference on Music Information Retrieval, 2004.

A Supported chord-types

The ContextAnalyzer of the system is able to recognize the chord-types given below. When referring to a "scale", it is the scale used in the score.
Type                          Description
Tonic                         Chord built on the first step of the scale
Dominant                      Built on the fifth step of the scale
Dominant seventh              Dominant with an added minor seventh
Subdominant                   Built on the fourth step of the scale
Tonic parallel                Parallel chord to the tonic
Subdominant parallel          Parallel chord to the subdominant
Dominant parallel             Parallel chord to the dominant
In-complete dominant          Dominant seventh with no pitch at the root
Dominant none                 Dominant with an added minor ninth
Dominant quarter-sixth        Dominant where the third is replaced with a fourth and the fifth is replaced with a sixth
Subdominant sixth             Subdominant with an added minor sixth
In-complete subdominant       Subdominant sixth with no fifth
Minor subdominant             Subdominant with an added major sixth
Subdominant parallel seventh  Subdominant parallel with an added seventh
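To illustrate how the scale-step chord-types above relate to the tonic, a small sketch (the function names are illustrative, not taken from the system):

```python
def dominant_root(tonic):
    """The dominant is built on the fifth step: seven semitones above the tonic."""
    return (tonic + 7) % 12

def subdominant_root(tonic):
    """The subdominant is built on the fourth step: five semitones above the tonic."""
    return (tonic + 5) % 12

print(dominant_root(0))     # → 7  (G is the dominant of C)
print(subdominant_root(0))  # → 5  (F is the subdominant of C)
```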