Annual presentation
Automatic Transcription of Music (WAV-to-MIDI)
Nathan Fellman
Yuval Peled
Supervisor: Hadas Ofir
In cooperation with Mobixell
Project’s Goal
• Create a system that automatically transcribes the notes in a recording of polyphonic music and uses them to create a new file in MIDI format:
– Correctly identify each note’s pitch (base frequency).
– Correctly identify the onset time and duration of each note.
Introduction to Musical notes
• A note is built of a set of frequencies: the base frequency and all of its harmonics.
Introduction to Musical notes
• The chromatic scale is made up of 12 notes that are repeated across octaves.
• The ratio between the base frequencies of the same note in two consecutive octaves is 2.
⇒ The notes are geometrically spaced, with a ratio of 2^(1/12) between every two consecutive notes.
(Figure: note base frequencies plotted against note index, on a linear frequency axis and on a logarithmic frequency axis.)
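As a quick worked example of this geometric spacing, the following minimal Python sketch (assuming the common A4 = 440 Hz reference, which is not stated in the slides) computes the base frequency of a note a given number of semitones away from the reference:

# Equal-temperament spacing: ratio of 2^(1/12) between consecutive notes.
A4 = 440.0                 # assumed reference base frequency in Hz
RATIO = 2 ** (1 / 12)      # ratio between two consecutive notes

def note_frequency(semitones_from_a4: int) -> float:
    """Base frequency of the note this many semitones above (or below) A4."""
    return A4 * RATIO ** semitones_from_a4

print(note_frequency(12))  # one octave up  -> 880.0 Hz
print(note_frequency(3))   # C5             -> ~523.25 Hz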
Introduction to Musical notes
• Note characterizations:
– Pitch: the base frequency
– Onset time
– Duration
– Timbre: instrument-specific characteristics
– Volume
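These characterizations map naturally onto a small data structure. The sketch below is purely illustrative (the field names are ours, not taken from the project’s code):

from dataclasses import dataclass

@dataclass
class Note:
    """One transcribed note and its characterizations (illustrative field names)."""
    pitch_hz: float     # base frequency
    onset_s: float      # onset time, in seconds
    duration_s: float   # how long the note is held, in seconds
    volume: float       # relative loudness, e.g. 0.0-1.0
    timbre: str = ""    # instrument-specific description, if known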
Introduction to MIDI
• MIDI is a de-facto standard for storing,
transmitting and reproducing music.
• MIDI files are relatively small, require
low bandwidth and are very easy to edit.
• The MIDI representation has a finite set
of characteristics. It can’t express the
feeling that a real musician can create.
• MIDI files are built of events:
– Onset of a new note with its characterizations
(pitch, time, timbre, volume, etc.)
– Offset of the note
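As an illustration of this event structure, the following sketch writes a single note using the third-party mido library (our choice for the example; the project itself may generate MIDI events differently):

import mido

mid = mido.MidiFile()                 # default resolution: 480 ticks per beat
track = mido.MidiTrack()
mid.tracks.append(track)

# Note-on event: pitch as a MIDI note number, volume as velocity, delta time in ticks.
track.append(mido.Message('note_on', note=60, velocity=80, time=0))
# Matching note-off event 480 ticks (one quarter note) later.
track.append(mido.Message('note_off', note=60, velocity=0, time=480))

mid.save('example.mid')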
Time-frequency plot of a
simple polyphonic piece
Problems identifying notes
• Pitch
– Separating the base frequency from its harmonics
– Overlapping frequencies
– Same note in different octaves?
• Onset time
– Time resolution
– Soft attack
– Overlapping notes
• Duration
– When does a note end?
– Is it one long note or several short ones?
Time-frequency plot of a
complex polyphonic piece
Solution design
Every solution should have the
following stages:
1. Time-Frequency analysis
2. Extraction and filtering of sufficient statistics
3. Decision rule
4. Creation of MIDI file
Our proposed solution
1. Four low frequency-resolution transforms.
2. For each resolution and at each interval in time, find frequencies that may be the base frequency of a note.
3. Find the energy of each potential base-note frequency and its harmonics.
4. Analyze, compare and merge the data from all resolutions; the decision of which notes are real is made here.
5. Concatenate short notes as needed and filter out short overlapping notes.
6. Create a list of notes and generate the MIDI file.
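A sketch of this pipeline is shown below. Because the slides do not give the real module or function names, each stage is passed in as a function, and the four window lengths are only an assumed example of the four frequency resolutions:

from typing import Callable, Sequence

def wav_to_midi_pipeline(samples,
                         sample_rate: int,
                         transform: Callable,        # stage 1: time-frequency analysis
                         find_candidates: Callable,  # stage 2: candidate base frequencies
                         harmonic_energy: Callable,  # stage 3: energy of base + harmonics
                         decide: Callable,           # stage 4: merge resolutions (majority)
                         post_process: Callable,     # stage 5: concatenate/filter notes
                         window_lengths: Sequence[int] = (1024, 2048, 4096, 8192)):
    """Hypothetical outline of the proposed pipeline; stage implementations are supplied by the caller."""
    note_maps = []
    for win_len in window_lengths:                   # one pass per frequency resolution
        spectrum = transform(samples, sample_rate, win_len)
        candidates = find_candidates(spectrum)
        note_maps.append(harmonic_energy(spectrum, candidates))
    merged = decide(note_maps)                       # decision across all resolutions
    return post_process(merged)                      # list of notes, ready for the MIDI writer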
Theory - Filter Banks
• The DFT can’t be used directly when the frequencies of interest lie on a logarithmic scale. Using filter banks it is possible to calculate the spectral coefficients on arbitrary bands.
• Block diagram of the process: x(n) is passed through a band-pass filter h(n)·e^(jnθ), where h(n) is a window function, and the output is modulated by e^(−jnθ) to give X(e^(jθ)). Equivalently,
X_n(e^(jθ)) = Σ_m x(m) · h(n − m) · e^(−jmθ)
The different bandwidths are achieved by
choosing different window lengths for the
band pass filters.
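A minimal numpy sketch of one band of such a filter bank (the window type and lengths below are our own assumptions): the coefficient at normalized frequency θ is obtained by windowing the signal and modulating it by e^(−jnθ), with longer windows giving narrower bands.

import numpy as np

def band_coefficient(x: np.ndarray, theta: float, win_len: int) -> complex:
    """Spectral coefficient of one band at normalized frequency theta (rad/sample),
    computed over the first win_len samples of x (filter-bank view of the STFT)."""
    n = np.arange(win_len)
    h = np.hanning(win_len)      # window function h(n); the choice of window is an assumption
    return np.sum(x[:win_len] * h * np.exp(-1j * theta * n))

# Logarithmically spaced bands, one per semitone, each with its own window length:
# longer windows for low notes (narrow bands), shorter windows for high notes.
fs = 44100
base_freqs = 440.0 * 2 ** (np.arange(-24, 24) / 12)    # four octaves around A4
thetas = 2 * np.pi * base_freqs / fs
win_lens = np.clip((8 * fs / base_freqs).astype(int), 256, 8192)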
Energy calculation
• We can tell whether a frequency is a base frequency or a harmonic by looking at the energy of its integer multiples: a base frequency has significant energy at its multiples, while a harmonic has much less.
• The program builds a filter designed to sum all the energy at a candidate base frequency’s harmonics, and uses this value to separate real notes from false ones.
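A simplified stand-in for this filter (the number of harmonics used is our assumption): sum the spectral energy at the candidate frequency and its integer multiples.

import numpy as np

def harmonic_energy(spectrum: np.ndarray, freqs: np.ndarray,
                    f0: float, n_harmonics: int = 6) -> float:
    """Sum of energy at f0 and its first integer multiples.
    `spectrum` is a magnitude spectrum and `freqs` is its frequency axis in Hz."""
    energy = 0.0
    for k in range(1, n_harmonics + 1):
        f = k * f0
        if f > freqs[-1]:
            break
        idx = np.argmin(np.abs(freqs - f))      # nearest spectral bin
        energy += spectrum[idx] ** 2
    return energy

# A true base frequency scores high, because its multiples carry significant energy;
# a harmonic of a lower note scores lower, because its own multiples carry much less.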
The Decision rule
• There are now four estimates, calculated with different resolutions. The next step is to combine them.
• Each estimate is a map of the detected frequencies in every interval of time.
• The maps are expanded along the time axis to the least common multiple of their time resolutions.
• After testing several rules, we concluded that the most effective decision rule is a majority vote.
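A sketch of the majority vote, assuming the four maps have already been brought to a common time resolution and stored as boolean arrays of shape (notes x time frames):

import numpy as np

def majority_decision(maps):
    """Majority vote over several boolean note maps of identical shape:
    a note is accepted wherever more than half of the resolutions agree."""
    stacked = np.stack(maps)                  # shape: (n_maps, notes, frames)
    votes = stacked.sum(axis=0)
    return votes > stacked.shape[0] / 2       # strict majority

# Toy example: one note, four time frames, four resolutions.
maps = [np.array([[1, 1, 0, 0]], dtype=bool),
        np.array([[1, 0, 0, 1]], dtype=bool),
        np.array([[1, 1, 0, 0]], dtype=bool),
        np.array([[0, 1, 0, 0]], dtype=bool)]
print(majority_decision(maps))                # [[ True  True False False]]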
Post Processing
• The result of the decision rule
needs some more filtering.
• “Holes” may appear in long notes
when there are large temporal
changes in the amplitude due to
close notes. A small median filter is
applied in order to fix these holes.
• A sharp note onset can cause short, spurious overlapping notes to appear at the beginning of the note. These are erased.
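A sketch of the hole-filling step, using a short median filter along the time axis of the note map (the filter length is our assumption):

import numpy as np
from scipy.ndimage import median_filter

def fill_holes(note_map: np.ndarray, length: int = 3) -> np.ndarray:
    """Apply a short median filter along the time axis of a boolean note map
    (notes x frames) to close single-frame holes inside long notes."""
    filtered = median_filter(note_map.astype(np.uint8), size=(1, length))
    return filtered.astype(bool)

# A one-frame gap inside a held note is closed:
note = np.array([[1, 1, 0, 1, 1, 1]], dtype=bool)
print(fill_holes(note))    # [[ True  True  True  True  True  True]]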
Software Explanation
• The program is able to analyze a musical
piece with no prior knowledge.
• Supplying prior knowledge about the
piece greatly improves the performance.
• This prior knowledge includes:
– Frequency range
– Instrument-specific input
• It is also possible to tune the sensitivity
of the analysis on the command line.
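A hypothetical sketch of such a command-line interface (the flag names below are illustrative only, not the project’s actual options):

import argparse

parser = argparse.ArgumentParser(description='WAV-to-MIDI transcription (illustrative CLI sketch)')
parser.add_argument('input_wav')
parser.add_argument('output_mid')
parser.add_argument('--min-freq', type=float, default=55.0,
                    help='lower bound of the expected frequency range (Hz)')
parser.add_argument('--max-freq', type=float, default=2000.0,
                    help='upper bound of the expected frequency range (Hz)')
parser.add_argument('--sensitivity', type=float, default=0.5,
                    help='detection threshold: lower values report more notes')
args = parser.parse_args()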
Example 1 – A tune
Monophonic music
Source File (WAV)
Result File (MIDI)
Total notes played: 49
Incorrect notes: 4 (8.1%)
Detection Rate: 91.9%
Example 2 - Chords
Polyphonic music
Source File (WAV)
Result File (MIDI)
Total notes played: 9
Incorrect notes: 0 (0%)
Detection Rate: 100%
Example 3 – A tune
Polyphonic music
Source File (WAV)
Result File (MIDI)
Total notes played: 23
Incorrect notes: 5 (21%)
Detection Rate: 79%
Conclusions
• The system that was built is capable of recognizing the notes in a simple polyphonic audio recording and transforming it into a MIDI file.
• It is flexible enough to work with different kinds of samples, and it can easily be adjusted to the characteristics of the recording.
• Because it was built in a modular way, every stage (transformation, filtering, decision rule) can be replaced or changed separately.
Future Directions
• Improve the system so that it can handle more complex polyphonic recordings.
• Add estimation of repeated onsets (breaking up long notes into short ones).
• Reduce the computational complexity: the system currently needs a powerful computer to perform the transforms and the filtering, and to compare the results.