Automatic Transcription of Music (WAV-to-MIDI)
Annual presentation
Nathan Fellman, Yuval Peled
Supervisor: Hadas Ofir
In cooperation with Mobixell

Project's Goal
• Create a system that automatically transcribes the notes from a recording of polyphonic music and uses them to create a new file in MIDI format:
– Correctly identify each note's pitch (base frequency).
– Correctly identify the onset time and duration of each note.

WAV-2-MIDI

Introduction to Musical notes
• A note is built of a set of frequencies: the base frequency and all its harmonics.

Introduction to Musical notes
• The chromatic scale is made up of 12 notes that repeat across octaves.
• The ratio between the base frequencies of the same note in two consecutive octaves is 2.
[Figure: note index plotted against frequency and against log(frequency)]
⇒ The notes are geometrically spaced, with a ratio of 2^(1/12) between every two consecutive notes.

Introduction to Musical notes
• Note characterizations:
– Pitch: the base frequency
– Onset time
– Duration
– Timbre: instrument-specific characteristics
– Volume

Introduction to MIDI
• MIDI is a de-facto standard for storing, transmitting and reproducing music.
• MIDI files are relatively small, require low bandwidth and are very easy to edit.
• The MIDI representation has a finite set of characteristics; it cannot express the feeling that a real musician can create.
• MIDI files are built of events:
– Onset of a new note with its characterizations (pitch, time, timbre, volume, etc.)
– Offset of the note

[Figure: time-frequency plot of a simple polyphonic piece]

Problems identifying notes
• Pitch
– Separating the base frequency from its harmonics
– Overlapping frequencies
– Same note in different octaves?
• Onset time
– Time resolution
– Soft attack
– Overlapping notes
• Duration
– When does a note end?
– Is it one long note or several short ones?

[Figure: time-frequency plot of a complex polyphonic piece]

Solution design
Every solution should have the following stages:
1. Time-frequency analysis
2. Filtering of sufficient statistics
3. Decision rule
4. Creation of the MIDI file

Our proposed solution
1. Four low frequency-resolution transforms.
2. For each resolution and at each interval in time, find frequencies that may be the base frequency of a note.
3. Find the energy of each potential base-note frequency and its harmonics.
4. Analyze, compare and merge the data from all resolutions; the decision of which are the real notes is made here.
5. Concatenate short notes as needed and filter out short overlapping notes.
6. Create a list of notes and generate the MIDI file.

Theory - Filter Banks
• The DFT places its coefficients on a linear frequency grid, so it cannot be used directly when the frequencies of interest are spaced logarithmically. Using filter banks it is possible to calculate the spectral coefficients on different bands.
• The analysis at a frequency θ passes the input x(n) through the band-pass filter h(n)e^{jθn}, where h(n) is the window function, and demodulates the output by e^{-jθn}:
  X_n(e^{jθ}) = Σ_m x(m) h(n-m) e^{-jθm}
• The different bandwidths are achieved by choosing different window lengths for the band-pass filters (a small sketch of such a multi-resolution analysis appears after the next slide).

Energy calculation
• We can tell whether a frequency is a base frequency or a harmonic by looking at the energy at its multiples: a base frequency has significant energy at its multiples, while a harmonic has less energy at its multiples.
• The program builds a filter designed to sum all the energy at a candidate base frequency's harmonics and uses this sum to separate real notes from false ones (see the second sketch below).
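The following is a minimal sketch of the multi-resolution band-pass analysis referenced in the Theory - Filter Banks slide, assuming NumPy; the sample rate, note grid, window lengths and hop size are illustrative assumptions, not the project's actual settings.

```python
import numpy as np

def band_energy_map(x, fs, f_min=110.0, n_notes=48, hop=512):
    """Filter-bank view of the short-time analysis: each note frequency on a
    logarithmic (chromatic) grid is analyzed with a Hann-windowed complex
    exponential, using longer windows for lower frequencies so every band
    keeps roughly the same relative bandwidth.
    Returns (freqs, energy[note, time])."""
    freqs = f_min * 2.0 ** (np.arange(n_notes) / 12.0)     # 2^(1/12) spacing
    n_hops = 1 + (len(x) - 1) // hop
    energy = np.zeros((n_notes, n_hops))
    for k, f in enumerate(freqs):
        win_len = min(int(round(8 * fs / f)), len(x))      # ~8 periods per window
        m = np.arange(win_len)
        kernel = np.hanning(win_len) * np.exp(-2j * np.pi * f * m / fs)
        for t in range(n_hops):
            seg = x[t * hop : t * hop + win_len]
            coeff = np.dot(seg, kernel[:len(seg)])          # windowed coefficient at f
            energy[k, t] = np.abs(coeff) ** 2
    return freqs, energy
```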
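And a sketch of the harmonic-energy test from the Energy calculation slide: each candidate base frequency is scored by summing the energy found at its integer multiples, so a true fundamental (whose harmonics also carry energy) scores higher than a frequency that is itself only a harmonic of a lower note. The harmonic count, weighting and tolerance are assumptions, not the project's actual filter.

```python
import numpy as np

def harmonic_score(freqs, energy_frame, n_harmonics=5, tol_cents=50):
    """Score each candidate base frequency on the logarithmic grid `freqs`
    by the energy of one time slice `energy_frame` at its multiples."""
    log_f = np.log2(freqs)
    scores = np.zeros(len(freqs))
    for i, f0 in enumerate(freqs):
        for h in range(1, n_harmonics + 1):
            target = np.log2(h * f0)
            j = int(np.argmin(np.abs(log_f - target)))        # nearest grid bin
            if abs(log_f[j] - target) * 1200.0 <= tol_cents:  # within tolerance
                scores[i] += energy_frame[j] / h               # weight high harmonics less
    return scores
```

A frequency is then kept as a potential note only if its score is high relative to the score it would receive as a mere harmonic of a lower candidate.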
The Decision rule
• There are now four estimations, calculated at different resolutions; the next step is to combine them.
• Every estimation is a map of frequencies at every interval in time.
• The maps are enlarged along the time axis to the least common multiple of their lengths.
• After checking a few rules, the conclusion was that the most effective decision rule is the majority decision.

Post Processing
• The result of the decision rule needs some more filtering.
• "Holes" may appear in long notes when there are large temporal changes in the amplitude caused by nearby notes; a small median filter is applied in order to fix these holes.
• A sharp onset of a note can cause short overlapping notes to appear at its beginning; these are erased.
• A small sketch of the majority decision and of the median-filter hole filling appears after the Future Directions slide.

Software Explanation
• The program is able to analyze a musical piece with no prior knowledge.
• Supplying prior knowledge about the piece greatly improves the performance.
• Prior knowledge includes:
– Frequency range
– Instrument-specific input
• It is also possible to tune the sensitivity of the analysis on the command line.

Example 1 – A tune (monophonic music)
Source file (WAV) → Result file (MIDI)
• Total notes played: 49
• Incorrect notes: 4 (8.1%)
• Detection rate: 91.9%

Example 2 – Chords (polyphonic music)
Source file (WAV) → Result file (MIDI)
• Total notes played: 9
• Incorrect notes: 0 (0%)
• Detection rate: 100%

Example 3 – A tune (polyphonic music)
Source file (WAV) → Result file (MIDI)
• Total notes played: 23
• Incorrect notes: 5 (21%)
• Detection rate: 79%

Conclusions
• The system that was built is capable of recognizing a simple polyphonic audio recording and transforming it into MIDI.
• It is flexible enough to work with different kinds of samples, and it is easily modified to fit the needs of the recording.
• Because it is built in a modular way, every section (transformation, filtering, decision rule) can be replaced or changed separately.

Future Directions
• Improve the system so it can handle more complex polyphonic recordings.
• Add estimation of repeated onsets (breaking up long notes into short ones).
• Reduce complexity: the system currently needs a strong computer to perform the transformation and filtering and to compare the results.
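As referenced in the Post Processing slide, here is a minimal sketch of the majority decision over the four resolution maps and of the median filter that closes short holes inside long notes. The map layout, vote threshold and kernel length are illustrative assumptions (NumPy and SciPy assumed available).

```python
import numpy as np
from scipy.signal import medfilt

def majority_decision(maps):
    """Combine binary note maps (notes x time) from several resolutions.
    Each map is repeated along time up to the least common multiple of the
    lengths, and a cell is accepted when active in more than half the maps."""
    lcm = np.lcm.reduce([m.shape[1] for m in maps])
    stretched = [np.repeat(m, lcm // m.shape[1], axis=1) for m in maps]
    return np.sum(stretched, axis=0) > len(maps) / 2

def fill_holes(note_map, kernel=5):
    """A short median filter along the time axis closes small gaps ("holes")
    inside long notes without inventing new ones."""
    rows = [medfilt(row.astype(float), kernel) for row in note_map]
    return np.stack(rows) > 0.5
```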