Journal of Signal Processing, Vol.17, No.2, pp.29-38, March 2013
PAPER
Sparseness Criteria of F0-Frequencies Selection for Specmurt-Based
Multi-Pitch Analysis without Modeling Harmonic Structure
Daiki Nishimura, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki
Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan
E-mail: {nishimura, nakashika}@me.cs.scitec.kobe-u.ac.jp, {takigu, ariki}@kobe-u.ac.jp
Abstract
This paper introduces a multi-pitch analysis method using specmurt analysis without modeling
the common harmonic structure pattern. Specmurt analysis is based on the idea that the fundamental
frequency distribution is expressed as a deconvolution of the observed spectrum by the common harmonic
structure pattern. To analyze the fundamental frequency distribution, the common harmonic structure
needs to be modeled accurately because it is often unknown while the observed spectrum is known. It
is considered impossible, however, to obtain a highly accurate model of the structure since it can vary
slightly depending on the pitch. Therefore we propose a method to analyze the fundamental frequency
distribution without modeling the harmonic structure. We note that each peak of the observed spectrum
indicates the fundamental frequency or the harmonic tone. Hence, the fundamental frequency distribution
can be regarded as the set that has only the peaks corresponding to fundamental frequencies. To find this set, we prepare many candidate sets of spectral peaks and obtain a large number of common harmonic
structures. We evaluate the sparseness of these structures using L1 or L2 norm, and then select the set
that has derived the sparsest structure as the solution. The experimental results show the effectiveness of
the proposed method.
Keywords: multi-pitch analysis, sparseness criteria, specmurt analysis
1. Introduction
In recent years, music information processing
technology has improved dramatically, giving us many more opportunities to create music. For example, in the past only those who had specific musical skills could compose or arrange music, but now anyone can enjoy these activities by using various music-related software. However, there still remain some fields that rely on people with specific skills, such as perfect pitch.
This ability is necessary when attempting to reproduce or score music by simply hearing it, and considerable experience and effort are needed in order to
acquire this skill. In particular, it is difficult to analyze a signal that has tones of different pitches at
the same time. Therefore, a technology for analyzing
multi-pitch signals is required.
Monophonic music can be analyzed with relatively high accuracy [1]-[4]. However, multi-pitch music
is more difficult to analyze than a single tone. An
acoustic signal has information of fundamental frequencies and harmonic frequencies, but in the case of
multi-pitch sounds, it is unknown which peak corresponds to the fundamental frequency or the harmonic
frequency. Moreover, the number of fundamental frequencies is not always known. This is one reason for
the difficulty of multi-pitch analysis.
Many techniques have been tried in multi-pitch
analysis in the past, such as a comb filter [5], statistical information of chords and their progression
[6, 7], iterative estimation and separation [8], linear
models for the overtone series [9], parameter estimation of superimposed spectrum models [10, 11], acoustic object modeling using GMM and estimation with
an EM algorithm [12]-[14]. Specmurt analysis [15]-[21]
is another method of multi-pitch analysis. The method defines the observed spectrum as a convolution
of the fundamental frequency distribution and instrumental information, and it differs from those listed
above in terms of the introduction of the specmurt
domain while [5] is processed in the time domain and
[6]-[14] are processed in the spectrum domain.
The conventional specmurt analysis is based on the
approach that first obtains instrumental information
Fig. 1 Positional relationship between fundamental and harmonic frequencies in (a) the linear-frequency domain and (b) the log-frequency domain
Fig. 2 Generation of a multi-pitch spectrum by convolution of a common harmonic structure and a fundamental frequency distribution [18]
by iteratively generating a model called the “common
harmonic structure” and then gives a fundamental frequency distribution based on the model. This method
builds a common harmonic structure, and the approach is based on the premise that the relative powers
of the harmonic components are common and do not
depend on the fundamental frequency. However, it is
considered impossible to obtain a highly accurate model of the structure since it can vary slightly depending
on the pitch. Because the harmonic structure depends on the pitch, a data-driven approach that selects the harmonic structure without assuming a common one is needed. Therefore, we
propose a new method based on the sparseness criteria
to analyze the fundamental frequency distribution.
2. Specmurt Analysis
2.1 Multi-pitch spectrum in log frequency
In our study, the acoustic signals having harmonics
are analyzed and percussive signals such as drums are
not targeted. The n-th harmonic frequency is equal to n times the fundamental frequency in the linear-frequency scale. Therefore, when the fundamental frequency shifts by ∆ω, the n-th harmonic frequency also shifts by n × ∆ω (Fig. 1(a)). Meanwhile, in the log-frequency scale, the n-th harmonic frequency is located log n away from the fundamental frequency.
This means that all harmonic frequencies shift by ∆x
when the fundamental frequency shifts by ∆x in the
log-frequency scale (Fig. 1(b)).
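This shift property can be checked numerically. The sketch below uses two arbitrary example fundamentals (220 Hz and 440 Hz) to show that the harmonic offsets in log frequency are log n, independent of the fundamental, while the linear-frequency offsets are not:

```python
import numpy as np

# In linear frequency the n-th harmonic sits at n*f0, so its distance
# from f0 grows with f0; in log frequency it sits at log(n) + log(f0),
# a constant offset log(n) regardless of f0.
def harmonic_offsets(f0, n_harmonics=4):
    harmonics = f0 * np.arange(1, n_harmonics + 1)
    linear_offsets = harmonics - f0                 # depends on f0
    log_offsets = np.log(harmonics) - np.log(f0)    # = log(n), f0-independent
    return linear_offsets, log_offsets

lin_a, log_a = harmonic_offsets(220.0)   # A3
lin_b, log_b = harmonic_offsets(440.0)   # A4
print(np.allclose(log_a, log_b))         # True: log offsets are shared
print(np.allclose(lin_a, lin_b))         # False: linear offsets differ
```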
In specmurt analysis, it is assumed that the relative
powers of the harmonic components are common and
do not depend on the fundamental frequency. This
is called common harmonic structure h(x), where x
represents log-frequency. The fundamental frequency
is located at the origin, and the power is normalized
to be 1. All pitch spectra can be expressed by a shift
of h(x) along the x-axis in the log-frequency domain
when a fundamental frequency in the log-axis is given.
It is considered that a multi-pitch spectrum can be
generated by adding shifted copies of the common harmonic structure h(x), each multiplied by the power corresponding to
the fundamental frequency. If the distribution of the
power of fundamental frequencies is defined as a fundamental frequency distribution u(x), a multi-pitch
spectrum v(x) is a convolution of h(x) and u(x), as
shown in Fig. 2.
v(x) = h(x) ∗ u(x)    (1)
2.2 Analysis of fundamental frequency distribution
If a common harmonic structure h(x) is known, a
fundamental frequency distribution u(x) can be estimated by the deconvolution of an observed multi-pitch
spectrum v(x) by h(x):

u(x) = h(x)⁻¹ ∗ v(x)    (2)
According to the convolution theorem, Eq. (2) can be
expressed as
U(y) = V(y) / H(y)    (3)
where U (y), H(y) and V (y) are the inverse Fourier
transform of u(x), h(x) and v(x), respectively. We
can obtain u(x) using the Fourier transform of U (y)
in the y domain as follows:
u(x) = F[U(y)]    (4)
As described above, the method to estimate the fundamental frequency distribution by deconvolution in
the log-frequency domain is called specmurt analysis
[15]-[21], and the y domain (defined as the inverse Fourier transform of the log-frequency spectrum) is called
the specmurt domain. In practical calculation, the transform to the y domain may be computed with the Fourier transform.
In specmurt analysis, a wavelet transform that can
perform an analysis in the log-frequency is used to extract spectra instead of the short-term Fourier transform since the observed spectrum v(x) is dealt with
in the log-frequency domain.
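As a rough illustration of Eqs. (1)-(4), the sketch below builds a toy v(x) by convolution and recovers u(x) by division in the transform domain. The grid size, the peak positions, and the small eps guard are our assumptions, and the FFT direction is used as a stand-in for the paper's transform convention:

```python
import numpy as np

# Discretized log-frequency spectra on a common grid (toy example).
X = 256
h = np.zeros(X); h[0] = 1.0; h[12] = 0.5; h[19] = 0.33   # toy common harmonic structure
u = np.zeros(X); u[40] = 1.0; u[47] = 0.8                # toy F0 distribution

# Eq. (1): v = h * u, realized here as circular convolution via FFT.
v = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(u)))

# Eq. (3): U(y) = V(y) / H(y); eps (an assumption, not from the paper)
# guards against division by near-zero bins.
eps = 1e-12
U = np.fft.fft(v) / (np.fft.fft(h) + eps)

# Eq. (4): transform back to recover the F0 distribution.
u_rec = np.real(np.fft.ifft(U))
print(np.allclose(u_rec, u, atol=1e-6))   # True
```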
One characteristic of specmurt analysis is that it
can analyze music signals where pitch changes occur
in a short time. Therefore, the analysis result of a piano roll, for example, can be obtained as visual information, where the horizontal axis represents the time index and the vertical axis represents the pitch.

Fig. 3 Example of observed spectrum (piano triad)
2.3 Conventional approach with specmurt
The fundamental frequency distribution u(x) can
be obtained using Eq. (2) if the observed spectrum
v(x) is given, and the common harmonic structure
h(x) is known. However, h(x) is generally unknown.
For this reason, h(x) has so far been modeled in several ways. In [15, 16],
the common harmonic structure whose power ratio of
the n-th harmonic frequency component is 1/n of the
fundamental frequency component is defined. This
is based on the previous knowledge that a natural
sound spectrum commonly has such a shape. However, the optimal fundamental frequency distribution
u(x) is not always obtained by such an approach since
the common harmonic structure varies depending on
the tone. In [17, 18, 21], a quasi-optimization with an
iterative algorithm is used for estimating h(x) but a
more accurate modeling method is required.
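The 1/n model of [15, 16] can be sketched as follows, assuming a log-frequency grid of 12 bins per octave (an illustrative discretization, not a parameter from the paper):

```python
import numpy as np

# Common harmonic structure whose n-th harmonic has 1/n the power of
# the fundamental, placed on a log-frequency grid.
def common_harmonic_structure(n_harmonics=6, bins_per_octave=12, size=64):
    h = np.zeros(size)
    for n in range(1, n_harmonics + 1):
        # in log frequency the n-th harmonic sits log2(n) octaves above f0
        bin_pos = int(round(np.log2(n) * bins_per_octave))
        h[bin_pos] += 1.0 / n
    return h

h = common_harmonic_structure()
print(h[0])    # 1.0: fundamental, normalized power
print(h[12])   # 0.5: second harmonic, one octave up, power 1/2
```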
3. Sparseness-Based F0 Selection

3.1 Problem with modeling of the common harmonic structure

As mentioned in the previous chapter, the conventional multi-pitch analysis with specmurt focuses on how to model the common harmonic structure h(x). However, it is considered difficult to obtain a strictly correct model of the structure, since it is known that the harmonic structure varies slightly depending on the pitch. Therefore, we propose a method to analyze the fundamental frequency distribution without modeling the harmonic structure.

3.2 Outline of proposed method

If there is no noise in the observed spectrum v(x), it is believed that each peak corresponds to either a fundamental frequency or a harmonic frequency. Fig. 3 shows an example of the spectrum of a piano triad. It is considered possible to obtain the fundamental frequency distribution u(x) by correctly selecting the set of peaks in the observed spectrum. Hence, our method focuses on finding the fundamental frequency distribution ũ(x) that has all the peaks corresponding to the fundamental frequencies of multiple tones and does not have any peaks corresponding to the harmonic frequency components. Fig. 4 shows the flowchart of our method. First, some candidates for the fundamental frequency distribution are generated from the observed spectrum. Second, using specmurt, the harmonic structures corresponding to the candidates are calculated. Among the obtained structures, non-harmonic structures are rejected, and the optimal harmonic structure is found based on the sparseness of the remaining structures. Finally, the candidate corresponding to the optimal harmonic structure is selected as the correct fundamental frequency distribution.

Fig. 4 Flowchart of sparseness criteria of F0-frequencies selection for specmurt-based multi-pitch analysis

Fig. 5 Example of generating ûi(x) from the observed spectrum

Fig. 6 Examples of ûi(x) generated from Fig. 3 (upper row) and ĥi(x) corresponding to each ûi(x) (lower row)

Fig. 7 Example of harmonic structure (single tone A3 of piano)
3.3 Generation of candidates for fundamental frequency distribution
It is difficult to extract the peaks of fundamental frequencies exclusively from the observed spectrum
since it cannot be said which peak corresponds to
the fundamental frequency or the harmonic frequency
components. Therefore, we will first discuss the candidates for ũ(x).
It is known that the peaks corresponding to the
fundamental frequencies have a certain level of power. We consider the M major peaks of the observed spectrum and obtain the sets û(x) by selecting combinations from these M peaks. If the observed signal consists of L tones, the number of candidates is the number of combinations C(M, L), because the number of peaks of ũ(x) should be equal to the number of tones. However, the number of tones is often unknown. Thus, the number of candidates λ is expressed as

λ = Σ_{l=1}^{L} C(M, l)

so that the method can handle anything from a single tone up to L tones.
Fig. 5 shows an example of generating ûi(x) from the observed spectrum. The candidates of ũ(x), ûi(x)
(i = 1, 2, . . . , λ) have a combination of peaks obtained
from the observed spectrum. Each peak is processed
as an impulse that has the power equal to the corresponding peak.
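The candidate enumeration of this section can be sketched as follows; the peak list is hypothetical toy data:

```python
from itertools import combinations
from math import comb

# From the M largest peaks of the observed spectrum, every combination
# of 1..L peaks is a candidate u_hat_i(x); each peak is treated as an
# impulse at its log-frequency position with its observed power.
def generate_candidates(peaks, L):
    """peaks: list of (log_freq_bin, power) for the M major peaks."""
    candidates = []
    for l in range(1, L + 1):
        candidates.extend(combinations(peaks, l))
    return candidates

M, L = 7, 3
peaks = [(10 * k, 1.0 / (k + 1)) for k in range(M)]  # toy peak list
cands = generate_candidates(peaks, L)

# lambda = sum_{l=1}^{L} C(M, l)
assert len(cands) == sum(comb(M, l) for l in range(1, L + 1))
print(len(cands))   # 7 + 21 + 35 = 63
```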
3.4 Selection of optimal harmonic structure
3.4.1 Calculation of harmonic structure using specmurt
A harmonic structure is obtained according to
Eq. (1) as follows:
h(x) = u(x)⁻¹ ∗ v(x)    (5)
One solution for the harmonic structure, ĥi (x), is
obtained by substituting u(x) in Eq. (5) with the candidate ûi (x).
In this section, we discuss how to select an optimal
harmonic structure h̃(x) among ĥi (x) (i = 1, 2, . . . ,
λ). The figures in the upper row of Fig. 6 illustrate examples of ûi (x) generated from a piano triad (Fig. 3),
and those in the lower row show ĥi (x) corresponding
to each ûi (x). Fig. 6(a-1) shows a candidate ûi (x) that
has all the fundamental frequencies and does not have
any harmonic frequency; i.e., the correct combination
of spectral peaks ũ(x). Fig. 6(b-1) and Fig. 6(c-1) are
examples of incorrect combinations of spectral peaks,
where they lack some fundamental frequencies or have
some harmonic frequencies. Fig. 6(a-2) shows ĥi (x)
corresponding to the correct combination of spectral
peaks ũ(x); i.e., the optimal harmonic structure h̃(x).
As shown in these figures, h̃(x) is the most similar to
the harmonic structure (Fig. 7) among those on the
lower row of Fig. 6.
On the other hand, the harmonic structure in Fig. 6(b-2) has numerous peaks at positions where Fig. 7 has none. Also, Fig. 6(c-2) does not have large peaks at the harmonic frequencies. A structure like Fig. 6(c-2) is called a non-harmonic structure in this paper.
3.4.2 Rejection of non-harmonic structures

In order to reduce the computation cost of finding the optimal harmonic structure, which is described in Section 3.4.3, non-harmonic structures are rejected in advance using the technique described in this section.

If the instrument or the pitch varies, the relative power ratio of each harmonic frequency varies, but the positions at which the harmonic frequencies appear do not. Therefore, the appearance positions of the harmonic frequencies in the harmonic structure (Ω2, Ω3, . . . , ΩN) are regarded as information that is independent of the pitch, where Ωn represents the position of the n-th harmonic component and Ω1 represents the origin position for the fundamental frequency. Based on this information, it is important to check whether there are values at (Ω2, Ω3, . . . , ΩN). For example, any structure that does not have any large peaks at (Ω2, Ω3, . . . , ΩN), like Fig. 6(c-2), is treated as a non-harmonic structure, and such structures are rejected using an experimentally set threshold.

3.4.3 Finding the optimal harmonic structure based on the sparseness

An ideal harmonic structure has peaks only at the fundamental frequency and the harmonic frequencies, as in Fig. 7. In our method, in order to select the optimal harmonic structure, we calculate the sparseness of each ĥi(x) that is not rejected as described in Section 3.4.2. According to Fig. 7 and Fig. 6, the optimal harmonic structure h̃(x) is considered to be sparser and to have larger peaks at the harmonics than the other ĥi(x). Thus, the sparseness S is defined as

S(i) = −{αLa(i) − (1 − α)Lb(i)}    (6)

where α represents the weight. If the L1 norm is used in the first and second terms,

La(i) = Σ_{x=1}^{X} {1 − Σ_{j=1}^{N} δ(Ωj − x)} |hi(x)|    (7)

Lb(i) = Σ_{x=1}^{X} Σ_{j=1}^{N} δ(Ωj − x) |hi(x)|    (8)

where δ is the Kronecker delta. The first term, La(i), represents the sparseness (excluding the harmonic components), and the second term, Lb(i), represents the summation of the values at the harmonics. If the L2 norm is used in Eq. (6),

La(i) = Σ_{x=1}^{X} {1 − Σ_{j=1}^{N} δ(Ωj − x)} hi(x)²    (9)

Lb(i) = Σ_{x=1}^{X} Σ_{j=1}^{N} δ(Ωj − x) hi(x)²    (10)

Assuming that h̃(x) = ĥĩ(x), ĩ can be determined by

ĩ = argmax_i S(i)    (11)
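A minimal sketch of Eqs. (6)-(11), assuming toy harmonic positions and two hand-made candidate structures:

```python
import numpy as np

# Each candidate harmonic structure h_i(x) is a length-X array; omega
# holds the harmonic positions as bin indices (Omega_1 = 0 at the
# fundamental). The mask realizes delta(Omega_j - x).
def sparseness(h, omega, alpha, norm="l1"):
    mask = np.zeros(len(h)); mask[list(omega)] = 1.0
    mag = np.abs(h) if norm == "l1" else h ** 2
    La = np.sum((1.0 - mask) * mag)   # energy off the harmonic positions
    Lb = np.sum(mask * mag)           # energy on the harmonic positions
    return -(alpha * La - (1.0 - alpha) * Lb)   # Eq. (6)

omega = [0, 12, 19, 24]                          # toy harmonic positions
clean = np.zeros(32); clean[omega] = [1.0, 0.5, 0.3, 0.2]
noisy = clean.copy(); noisy[[3, 7, 15]] = 0.4    # off-harmonic noise

scores = [sparseness(h, omega, alpha=0.5) for h in (clean, noisy)]
best = int(np.argmax(scores))    # Eq. (11): pick the sparsest candidate
print(best)                      # 0: the clean structure wins
```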
3.5 Correct fundamental frequency distribution
As described above, the optimal harmonic structure h̃(x) is obtained based on sparseness criteria. Finally, ũ(x) corresponding to h̃(x) is selected uniquely
among ûi (x).
Summing up, our method processes the following steps for each frame.

1. Based on the observed spectrum v(x), the candidates for the optimal fundamental frequency distribution ûi(x) are prepared.
2. The candidates for the optimal harmonic structure ĥi(x) are obtained by substituting ûi(x) into Eq. (5).
3. Non-harmonic structures are rejected among ĥi(x), and the sparsest remaining ĥi(x) is determined as h̃(x).
4. ũ(x) corresponding to h̃(x) is selected among ûi(x).

This method does not need to learn pitch or instrumental information, since each step is independent of both.
Fig. 8 (a) Piano-roll of test MIDI (data A) and (b) piano-roll of test MIDI (data B)

Fig. 9 An example of analysis result (data A): red circles indicate some mistaken notes.
4. Experiments
4.1 Conditions
To evaluate our method, we use two songs from the
RWC Music Database1 as the test data (Table 1), and
Fig. 8 shows the piano-roll of data A and data B. The
test signal is recorded at a 16 kHz sampling rate using
MIDI instruments: piano, violin or acoustic guitar.
Wavelet transform with Gabor function is applied to
the test data to obtain the spectrum. The parameter
M described in Section 3.3 is set at 7. This means
that we can analyze the observed signal having up to
7 tones at the same time. Next, the parameter N is set at 6, since the value at ΩN tends to be unobservable as N increases, and setting N too large might cause all ĥi(x) to be rejected.
Table 1 List of experimental data

  Symbol   Title           Catalog number
  data A   Sicilienne      RWC-MDB-C-2001 No.43
  data B   Gavotte E-Dur   RWC-MDB-C-2001 No.36
4.2 Results
1 http://staff.aist.go.jp/m.goto/RWC-MDB/

Fig. 9 depicts the analysis results of data A, where (L2, L1) and a weight parameter of 0.9 are used for a piano roll. Almost all the notes are estimated correctly,
but some notes are mistaken as octave-different notes.
Fig. 10 and 11 show the accuracies of data A and
data B for piano, violin, and guitar using our proposed method (without modeling harmonic structure),
respectively. For example, (L1 , L2 ) in the figures indicates that the L1 norm is used in the first term of
Eq. (6), and L2 norm is used in the second term of
Eq. (6). The weight parameter α in Eq. (6) is changed from 0.0 to 1.0. The accuracy is calculated as
follows:
Accuracy (%) = {Nall − (Nins + Ndel)} / Nall × 100    (12)
where Nall , Nins and Ndel represent the total number
of notes, insertion errors and deletion errors, respectively. In our experiments, the note duration is not
evaluated, and we permit the onset time to shift τ seconds (in experiments, τ = 0.3) since the onset time
and the duration of each tone are not exactly equal to
the score.
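Eq. (12) with the onset tolerance τ can be sketched as a simple greedy note matcher; the matching strategy and the (onset, pitch) note representation are our assumptions for illustration, not a specification from the paper:

```python
# Notes are (onset_sec, pitch) pairs; tau is the onset tolerance in
# seconds (tau = 0.3 in the experiments).
def accuracy(reference, estimated, tau=0.3):
    ref = list(reference)
    n_ins = 0
    for onset, pitch in estimated:
        match = next((r for r in ref
                      if r[1] == pitch and abs(r[0] - onset) <= tau), None)
        if match is not None:
            ref.remove(match)    # matched: consume the reference note
        else:
            n_ins += 1           # estimated note with no reference: insertion
    n_del = len(ref)             # unmatched reference notes: deletions
    n_all = len(reference)
    return (n_all - (n_ins + n_del)) / n_all * 100.0

ref = [(0.0, 60), (0.5, 64), (1.0, 67)]
est = [(0.1, 60), (0.5, 64), (1.0, 70)]   # last note is wrong
print(accuracy(ref, est))                 # (3 - (1+1))/3 * 100 ≈ 33.3
```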
As shown in Fig. 10 and Fig. 11, the optimal parameter varied depending on the instrument. For piano,
the results with large weight indicate higher accuracy
(Fig. 10(a) and 11(a)). This means La may work effectively for instruments with frequency structures similar to that of a piano, where the largest peak is observed at the origin (the fundamental frequency) and the peak values become smaller at higher frequencies. For violin, the use of a small weight resulted
in the higher accuracy (Fig. 10(b) and 11(b)). This
means that Lb, which calculates the summation of values at the harmonics, may work well for instruments
with frequency structures similar to that of a violin,
where the structure is different from that of a piano
in terms of having a larger peak at the second harmonic than the fundamental frequency. We will need
to investigate further the effectiveness of La and Lb
in future work. For guitar, Fig. 10(c) shows that the
use of middle weight resulted in higher accuracy, and
Fig. 11(c) shows the use of small weight resulted in higher accuracy. In the case of guitar, the largest peak
is observed at the fundamental frequency, similar to
Fig. 10 Accuracy results (data A): (a) piano, (b) violin, (c) guitar

Fig. 11 Accuracy results (data B): (a) piano, (b) violin, (c) guitar
piano, but a guitar sometimes produces attack-sound peaks at a lower frequency than the fundamental frequency. Therefore, occasionally, the attack
sound is regarded as the fundamental frequency, and
the correct fundamental frequency is regarded as the
second harmonic. For that reason, some notes may be
regarded as a violin. As a consequence, the optimal
weight for a guitar varies depending on the number of
the notes regarded as a violin.
In all results for data A (except violin), the combination of (L2 , L1 ) resulted in the best accuracy, where
L2 norm is used in the first term, La in Eq. (6). In
order to increase the value of Eq. (6), La , which is the
summation of the noises in the harmonic structure,
has to be small, and Lb , which is the summation of
the harmonic, has to be large. L2 norm reduces the
value of La (the first term in Eq. (6)) better than the
use of L1 norm because L2 norm makes the value that
is less than 1 more smaller (all noises are smaller than
1). On the other hand, in order to increase the value of Lb (the second term in Eq. (6)), the use of L1
norm is better than L2 because most harmonics are
also smaller than 1.
Table 2 shows the comparison between the
specmurt-based method with modeling of the common
harmonic structure [18] and the proposed method in
data A, where the optimal parameters are selected in
each method. The proposed method (without modeling the common harmonic structure) obtained higher accuracies than the method with the common harmonic structure for each instrument.

Fig. 12 Observed spectrum (multi-pitch D4 and B♭4 of piano)
Fig. 12 shows an observed spectrum of multi-pitch
(D4 and B♭4). Fig. 13(a-1) and Fig. 13(a-2) show the
fundamental frequencies and the harmonic structure
obtained from the observed spectrum by modeling the
harmonic structure, and Fig. 13(b-1) and Fig. 13(b-2) show the results obtained by the proposed method.
The modeled harmonic structure (Fig. 13(a-2)) has
no noise, but the fundamental frequency distribution
corresponding to it (Fig. 13(a-1)) is incorrect. Some
mistaken peaks of the distribution may be eliminated
using threshold processing; however, the larger peak
Fig. 13 The fundamental frequency distribution and the harmonic structure obtained from Fig. 12 by conventional specmurt (left) and the proposed method (right)
Table 2 Comparison of a specmurt-based method with modeling the harmonic structure to the proposed method

            with modeling   w/o modeling
            harmonics       harmonics
  Piano     89.2%           92.7%
  Guitar    74.3%           79.7%
  Violin    65.0%           71.7%
Fig. 14 The harmonic structure of piano (D4)
circled in black in Fig. 13(a-1) may not be excluded.
On the other hand, the harmonic structure produced by the proposed method (Fig. 13(b-2)) has some small noise, but the fundamental frequency distribution (Fig. 13(b-1)) is correct. The noise in the harmonic structure comes from the difference between the harmonic structures of D4 and B♭4. Since the noise absorbs this difference, the optimal fundamental frequency distribution can be obtained.
Fig. 14 shows the harmonic structure of a piano
(D4). There are some differences between this structure and Fig. 13(a-2) although Fig. 13(a-2) is the
modeled harmonic structure. This may be because
it is difficult to model the optimal common harmonic structure in multi-pitch music since the harmonic
structure can vary slightly depending on the pitch. In
our future work, we will study how to best obtain the
optimal common harmonic structure in multi-pitch situations.
5. Conclusion
In this paper, we proposed a specmurt-based,
multi-pitch analysis method without modeling the
common harmonic structure. Instead of modeling
the structure, the optimal harmonic structure is selected among the candidates based on sparseness criteria. The experiments show our method is effective
for multi-pitch analysis. The results from Fig. 10 and
Fig. 11 indicate that the optimal parameter α varies
depending on the instruments or music. Since multi-pitch analysis in a real environment deals with various instruments and pitches without instrument information, in
our future work, we will study how to determine the
optimal parameter. In the future, we will improve the
method by adding other criteria to avoid octave difference errors and to make it possible to apply our
method to vocal singing harmony.
References
[1] L. R. Rabiner: On the use of autocorrelation analysis for
pitch detection, IEEE Trans. ASSP, Vol. ASSP-25, No. 1,
pp. 24-33, 1977.
[2] D.J. Hermes: Measurement of pitch by subharmonic summation, Journal of ASA, Vol. 83, No. 1, pp. 257-264, 1988.
[3] Y. Takasawa: Transcription with Computer, IPSJ, Vol. 29,
No. 6, pp. 593-598, 1988.
[4] P. Cuadra, A. Master and C. Sapp: Efficient pitch detection techniques for interactive music, International Computer Music Conference, 2001.
[5] T. Miwa, Y. Tadokoro and T. Saito: The pitch estimation
of different musical instruments sounds using comb filters for
transcription, IEICE Trans. D-II, Vol. J81-D-II, No. 9, pp.
1965-1974, 1998.
[6] K. Kashino, K. Nakadai, T. Kinoshita and H. Tanaka: Organization of hierarchical perceptual sounds: Music scene
analysis with autonomous processing modules and a quantitative information integration mechanism, Proc. IJCAI, Vol.
1, pp. 158-164, 1995.
[7] K. Kashino, T. Kinoshita, K. Nakadai and H. Tanaka:
Chord recognition mechanisms in the OPTIMA processing
architecture for music scene analysis, IEICE Trans. D-II, Vol.
J79-D-II, No. 11, pp. 1762-1770, 1996.
[8] A. Klapuri, T. Virtanen and J. Holm: Robust multipitch
estimation for the analysis and manipulation of polyphonic
musical signals, Proc. COST-G6 Conference on Digital Audio Effects, pp. 233-236, 2000.
[9] T. Virtanen and A. Klapuri: Separation of harmonic
sounds using linear models for the overtone series, Proc.
ICASSP2002, Vol. 2, pp. 1757-1760, 2002.
[10] M. Goto: F0 estimation of melody and bass lines in musical
audio signals, IEICE Trans. D-II, Vol. J84-D-II, No. 1, pp.
12-22, 2001.
[11] M. Goto: A real-time music scene description system:
Predominant-F0 estimation for detecting melody and bass
lines in real-world audio signals, ISCA Journal, Vol. 43, No.
4, pp. 311-329, 2004.
[12] K. Miyamoto, H. Kameoka, T. Nishino, N. Ono and S.
Sagayama: Harmonic, temporal and timbral unified clustering for multi-instrumental music signal analysis, IPSJ SIG
Technical Report, 2005-MUS, Vol. 82, pp. 71-78, 2005.
[13] H. Kameoka, J. Le Roux, N. Ono and S. Sagayama: Harmonic temporal structured clustering: A new approach to
CASA, ASJ, Vol. 36, No. 7, pp. 575-580, 2006.
[14] K. Miyamoto, H. Kameoka, T. Nishimoto, N. Ono and S.
Sagayama: Harmonic-temporal-timbral clustering (HTTC)
for the analysis of multi-instrument polyphonic music signals,
Proc. ICASSP2008, pp. 113-116, 2008.
[15] K. Takahashi, T. Nishimoto and S. Sagayama: Multi-pitch
analysis using deconvolution of log-frequency spectrum, IPSJ
SIG Technical Report, 2003-MUS, Vol. 127, pp. 113-116,
2008.
[16] S. Sagayama, K. Takahashi, H. Kameoka and T. Nishino:
Specmurt analysis: A piano-roll-visualization of polyphonic music signal by deconvolution of log-frequency spectrum,
Proc. ISCA Tutorial and Research Workshop on Statistical
and Perceptual Audio Processing (SAPA2004), to appear,
2004.
[17] H. Kameoka, S. Saito, T. Nishino and S. Sagayama: Recursive estimation of quasi-optimal common harmonic structure pattern for specmurt analysis: Piano-roll visualization
and MIDI conversion of polyphonic music signal, IPSJ SIG
Technical Report, 2004-MUS, Vol. 84, pp.41-48, 2004.
[18] S. Saito, H. Kameoka, T. Nishimoto and S. Sagayama:
Specmurt analysis of multi-pitch music signals with adaptive
estimation of common harmonic structure, Proc. International Conference on Music Information Retrieval (ISMIR2005),
pp. 84-91, 2005.
[19] S. Saito, H. Kameoka, N. Ono and S. Sagayama: POCSbased common harmonic structure estimation for specmurt
analysis, IPSJ SIG Technical Report, 2006-MUS, Vol. 45, pp.
13-18, 2006.
[20] S. Saito, H. Kameoka, N. Ono and S. Sagayama: Iterative
multipitch estimation algorithm for MAP specmurt analysis,
IPSJ SIG Technical Report, 2006-MUS, Vol. 90, pp. 85-92,
2006.
[21] S. Saito, H. Kameoka, K. Takahashi, T. Nishimoto and S.
Sagayama: Specmurt analysis of polyphonic music signals,
IEEE Trans. ASLP, Vol. 16, No. 3, pp. 639-650, 2008.
Daiki Nishimura
received his
B.E. degree in computer science
from Kobe University in 2011. His
current research interest includes
acoustic signal processing. He is a
member of ASJ.
Toru Nakashika
received his
B.E. and M.E. degrees in computer science from Kobe University in
2009 and 2011, respectively. In the
same year, he continued his research
as a doctoral student. From September 2011 to August 2012 he studied at INSA de Lyon in France. He
is currently a 2nd-year doctoral student at Kobe University. His research interests are speech and image recognition and statistical signal processing. He is a member of IEEE
and ASJ.
Tetsuya Takiguchi
received
his B.S. degree in applied mathematics from Okayama University of
Science, Okayama, Japan, in 1994,
and his M.E. and Dr. Eng. degrees
in information science from Nara Institute of Science and Technology,
Nara, Japan, in 1996 and 1999, respectively. From 1999 to 2004, he
was a researcher at IBM Research,
Tokyo Research Laboratory, Kanagawa, Japan. He is currently an Associate Professor at Kobe University.
His research interests include statistical signal processing and pattern recognition. He received the Awaya Award from the Acoustical Society of Japan in 2002. He is a member of IEEE, IPSJ
and ASJ.
Yasuo Ariki
received his B.E.,
M.E. and Ph.D. degrees in information science from Kyoto University
in 1974, 1976 and 1979, respectively. He was an Assistant Professor at Kyoto University from 1980
to 1990, and stayed at Edinburgh
University as visiting academic from
1987 to 1990. From 1990 to 1992 he
was an Associate Professor and from
1992 to 2003 a Professor at Ryukoku
University. Since 2003 he has been a
Professor at Kobe University. He is
mainly engaged in speech and image recognition and interested
in information retrieval and databases. He is a member of IEEE,
IPSJ, JSAI, ITE and IIEEJ.
(Received July 17, 2012; revised January 7, 2013)