A Complete Syllable Dictionary for Serinus Canarius

For submission to Ecological Informatics
Article revision 2012/12/01
Glenda Angle and Hasan Coskun
Abstract This paper is devoted to a detailed quantitative analysis of the vocal
repertoire of a domestic male canary (Serinus Canarius) by multivariate
statistical methods, and temporal and spectral feature extraction. In the
first part of the paper, a combination of various methods is employed
to extract characteristic features to classify all syllables, and construct
a complete seasonal syllable dictionary for the canary. In the second
part, synthetic syllables corresponding to all entries in the dictionary
are generated providing a standardized framework for behavioral and
neural studies of canaries.
Keywords canary vocalization, feature extraction, syllable classification, synthetic
syllables, song coding
1 Introduction
The structure of a canary (Serinus Canarius) song is diverse; it may contain
components which are sinusoidal, harmonic, nonharmonic, and noisy in structure [15]. The study of acoustic features in canary syllables, the building blocks of
songs, will facilitate methods in producing synthetic syllables and provide a standardized framework for future behavioral and neural studies. The focus of this
paper is to construct a seasonal syllable dictionary and generate its corresponding synthetic version. The first part is carried out by computing temporal and
spectral features of canary songs and identifying all distinct syllables. The
second phase is completed with the help of certain multivariate statistical techniques. To the best of our knowledge, a complete listing of seasonal syllables has
not been generated before. Thorpe [21], along with Mulligan and Olsen,
developed the first documented list of syllables. However, their classification was
based on visual inspection of sonograms which, as they admit, was not a reliable
quantitative method.
Thorpe’s focus was the study of canary communication and its significance
in their social behavior. Peter Marler expanded on this work using a numerical representation of sounds, demonstrated on a few selected syllables [3]. However, a complete seasonal syllable dictionary generated by precise
quantitative methods, as employed in this paper, was still missing. Some
of the methods we utilized here were also used by Nottebohm [12] to compare
changes that occur in a group of canaries’ syllable repertoire over the years, and
by Somervuo and Härmä [7] who studied classification of bird sounds based on
their harmonic structures. Other researchers, such as Konishi, Nottebohm, and
Gardner, have studied the neural structure of canaries, which has some connections to our work [3, 5, 17, 7, 6, 12].
In this study, we identified 25 distinct syllables in a library of approximately 100 songs recorded over a period of one year that included the pre- through
post-copulation periods. Raven Pro 1.3 and Mathematica were used to
carry out the numerical computations and statistical analysis. Raven Pro, a
software application for the acquisition and analysis of acoustic signals developed
by the Cornell Bioacoustics Research Program, provided a variety of recording and
sound measurement tools for the syllable recognition phase. Mathematica
was used to analyze temporal and spectral features of songs and to compare them to identify distinct syllables. It was also used to produce their
synthetic analogues, as Raven Pro offered limited capabilities in this regard.
The paper is organized as follows. Section 2 is about the basic stages of canary
song development, and the song control system; Section 3 is on the song structure
in a hierarchical organization; Section 4 is about the practices of song recording;
Section 5 is on the classification of syllables based on evaluation parameters for
certain features; Section 6 is about the construction of synthetic syllables; Section
7 gives examples of song coding for pattern recognition; and Section 8 contains a
discussion of results.
This work has been supported by the Advanced Research Program (ARP)
grant 003656-0046-2007 awarded by the Texas Higher Education Coordinating
Board.
2 Song Development
2.1 Development Stages
Canaries are vocal learners that imitate their tutor to produce normal songs [14].
Recent studies [6] have indicated that young canaries can also learn to imitate
synthetic songs generated by a computer that violate phrase patterns, but as the
bird matures he edits his repertoire such that it is typical of an adult canary
melody. A young male builds a large repertoire of song syllables through the
developmental stages that begin with what is called subsong (about 40 days after
hatching) to plastic song, and then to full or stable song (sometimes called the
crystallized form) [4]. Learning song depends on auditory experiences and motor
practice: hearing others and oneself is essential to the normal vocal development
of songbirds [16].
Early development begins with learning calls and trilling sounds at the preliminary subsong stage. In this phase the syllables are practiced to achieve pitch
and timbre; only a small percentage of these syllables are retained through
the stable song stage. In the plastic stage of vocal development, the frequency
and time resolution no longer have fuzzy boundaries, and the repeated syllables have
structure and consistency. Songs that develop into stable form last for the duration of a season, that is, until the end of each breeding season, at which time the male canary
returns to the subsong stage to rebuild his syllable repertoire and then once again
continues through the plastic and stable song stages. During the rebuilding
of syllables, some of the original syllables are believed to disappear or possibly
be forgotten, while others are transformed or reconstructed with maturity [13]. However, it is unclear whether the ability to develop new songs continues throughout the full
life of a canary.
2.2 Canary Vocal Organs
A brief description of the canary vocal system is helpful here. The
syrinx, a vocal organ located at the junction of the bronchi and the trachea, consists
of movable tissue membranes called the medial labia (ML) and lateral labia (LL). It
has a function similar to that of the glottis and vocal folds in humans. These
connective tissues are set into oscillatory motion by air from the lungs forced
through the bronchi into the syrinx; the resulting sound is then modulated by the vocal tract, which acts
as a resonator tube consisting of the trachea, larynx, mouth, and beak [2].
Syrinx muscles control the movement of the syrinx which allows the bird
to control the frequency and intensity of sounds by changing the air pressure
passing from the lungs to the syrinx and varying the tension on the membranes.
The syrinx converts the energy of the expiratory airflow into sound in a manner similar to
the larynx in humans.
3 Song Structure
A canary song may be segmented into components of increasing complexity in a
hierarchical structure. The smallest component, referred to as an element, is usually
regarded as the smallest acoustic unit of vocalization. Syllables may be composed
of one or more elements that occur together in a regular pattern. A sequence
of the same syllable generates a phrase or trill based on the modulation rate.
Songs are long complex vocalizations which include some or all of the described
components. An example of the hierarchical canary vocal structure is shown in
Figure 1 below.
Figure 1: Example of a canary song phrase spectrogram. The intensity of the sound is denoted
by the gray-scale. The building blocks of a song are (1) the elements of a syllable, (2) the syllable
composed of one or more elements, and (3) a train of syllables forming a phrase, which in turn
makes up a part of the song.
An arrangement of three or more syllables in a sequential pattern reoccurring
in a large number of different songs is called a motif. A syllable may occur
in different motifs, repeating or non-repeating. We found small motifs of 3–4
syllables more frequent in the breeding season than those consisting of more
than 6 syllables [11].
4 Set Up and Methods
4.1 The Birds (Sun and Shine)
The sterilized domestic male canary used in this study was purchased at a nationally recognized pet store at the age of six months. The female was purchased
at the same time through a private pet store. The cage that housed both
birds measured 30″ × 18″ × 18″, and the two canaries had unrestricted physical interactions. Each clutch produced 4 eggs in a nest that was constructed from grass and
paper by the female. The canary was housed in a closed room with parakeets,
finches, and other canaries for at least one month in early April and May prior
to purchase. The breeding season of the male lasted from the beginning of June
until late November, producing several clutches, followed by moult beginning in
December, at which time the canary did not sing.
4.2 Song Recording
Recordings were made using a Dynex microphone (DX-54) with a sensitivity of -53
dBV/µbar at 1 kHz and DC power of 1.5 V to 10 V. The microphone was placed inside
the cage at a distance of no less than 4 inches from the canary. Testing the microphone with tonal sounds showed two harmonics on their spectrogram at the first
and second multiple of the dominant frequency.
All songs were recorded using Raven Pro 1.3. Each signal was digitized at
44.1 kHz with 16-bit pulse code modulation (PCM). To ensure consistency across
all recordings, the following parameters were chosen as the best fit. Spectrograms were created by DFT with a Hann window, a frame length of 256 points,
and a temporal resolution of 5.8 ms. Additionally, a 50% frame overlap with a hop
size of 2.9 ms and a frequency grid spacing of 172 Hz were used. The bandwidth
was set at 3 dB. Approximately 100 songs were recorded under similar conditions and
analyzed, providing a sufficient sample to obtain a complete seasonal syllable
dictionary for the specific canary. From this rich library of songs, ten representative songs that contain all distinct syllables were selected and listed in Table 5 for
song coding and further analysis.
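The spectrogram settings quoted above are tied together by simple arithmetic, which can be cross-checked with a short calculation. The sketch below is only an illustration: the function name `stft_parameters` is hypothetical, and the only inputs are the sampling rate (44.1 kHz), the frame length (256 points), and the 50% overlap stated in the text.

```python
def stft_parameters(sample_rate_hz, frame_length, overlap):
    """Return (frame duration in ms, hop size in ms, frequency grid spacing in Hz)."""
    frame_ms = 1000.0 * frame_length / sample_rate_hz
    # With fractional overlap, consecutive frames start this many samples apart.
    hop_samples = int(round(frame_length * (1.0 - overlap)))
    hop_ms = 1000.0 * hop_samples / sample_rate_hz
    # A DFT of `frame_length` points samples the spectrum every fs / frame_length Hz.
    bin_hz = sample_rate_hz / frame_length
    return frame_ms, hop_ms, bin_hz


frame_ms, hop_ms, bin_hz = stft_parameters(44100, 256, 0.5)
print(f"frame {frame_ms:.1f} ms, hop {hop_ms:.1f} ms, bin spacing {bin_hz:.0f} Hz")
# prints: frame 5.8 ms, hop 2.9 ms, bin spacing 172 Hz
```

A 256-point frame at 44.1 kHz therefore spans roughly 5.8 ms, and the 2.9 ms hop and 172 Hz grid spacing follow directly from the same two numbers.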
4.3 Song Analysis
The songs were recorded over a twelve-month span, throughout the day, during
mating season which took place from late June through November. Recognition of syllables was first performed on song spectrograms by visual inspection
with special focus on measuring frequencies, duration of intervals between the
repeated parts of trill phrases, full syllable duration, slope sweep, and overall
form. Within a song, some syllables will show slight variation in duration and
amplitude. Some syllables recur frequently in the songs, while others are much
more rare in occurrence as noted in Table 2, based on a sampling from the library
of recorded songs. Note that syllables N, R, and S are not represented in any
song, but appear as voiced sounds sporadically throughout the day. We will refer
to these as call syllables.
A song is defined to be at least 1.5 seconds in length and may last approximately one minute. A new song begins if a time lapse or pause longer than 0.4 s
occurs in the recording. Songs used in the mating context produce more complex (so-called sexy) syllables. These same syllables also occur outside
the breeding season, but at lower rates [19]. The structural components of each
syllable form the initial step in the analysis. Most syllables consist of two or three
components; it is known that they may contain up to 6 components depending
on the breed of canary. One-component syllables, typical of domesticated canaries [11], are present throughout most of the recorded songs. For the
purpose of this paper, we consider tonal sounds such as syllable A repeated at different frequencies (known as frequency shift) to be the same syllable, as suggested by Clark and
Marler [3]. That is, all syllables are held to the same approach in determining
distinct structures without considering possible frequency shifts as a factor.
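The song definition used in this section (a pause longer than 0.4 s starts a new song, and a song lasts at least 1.5 s) can be sketched as a small segmentation routine. This is a minimal sketch and not the procedure used in Raven Pro: it assumes the voiced intervals have already been detected and are given as sorted (onset, offset) pairs in seconds, and the function name and example data are hypothetical.

```python
def split_into_songs(intervals, max_pause_s=0.4, min_song_s=1.5):
    """Group sorted (onset, offset) voiced intervals into songs.

    A pause longer than `max_pause_s` closes the current song; groups
    shorter than `min_song_s` from first onset to last offset are dropped.
    """
    songs, current = [], []
    for onset, offset in intervals:
        if current and onset - current[-1][1] > max_pause_s:
            songs.append(current)
            current = []
        current.append((onset, offset))
    if current:
        songs.append(current)
    return [s for s in songs if s[-1][1] - s[0][0] >= min_song_s]


# Hypothetical voiced intervals (seconds): two songs and one isolated call.
voiced = [(0.0, 0.3), (0.35, 0.9), (1.0, 1.8),
          (2.5, 2.8), (2.9, 4.6),
          (6.0, 6.5)]
songs = split_into_songs(voiced)
print(len(songs))  # prints: 2
```

The isolated 0.5 s interval at the end is discarded because it is shorter than the 1.5 s minimum, which is one way call syllables could be kept apart from songs.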
A young male builds a large repertoire of song syllables, varying between 12 and 40
distinct syllables per season [12, 1, 10]. That our canaries produced twenty-five
distinct syllables in the first year may be related to leaving out the frequency
shift as a defining factor, and their age as the number of syllables is expected to
increase each additional season [12].
5 Classification of Syllables
The classification of syllables was carried out in five steps: (1) visual inspection of
song spectrograms, and selection of syllable candidates based on physical structures; (2) segmentation of spectrograms and determining acoustic parameters for
classification; (3) extracting a set of features for each syllable; (4) computing the
correlation between syllable candidates and generating a list of ten candidates
for each syllable; (5) construction of a matrix representation for each syllable, and
calculating the distance between candidates using a suitable norm. The details of
this classification process are discussed in the following subsections. Even though
some of the syllables can be recognized visually, this approach gives a precise
quantitative measure for how syllables and phrases differ. Table 1 and Table 4
show a complete list of individual syllables and their corresponding features.
5.1 Syllable Sets
In the selection phase, ten representations of each syllable (called a set) were identified from a control group of ten songs. The songs were bandpass filtered to
attenuate ambient noise [8] or low-frequency background noise; typically a lower
cutoff of 100 Hz is used [2]. The elements in syllables of short duration were
difficult to extract with reliable signal boundaries; thus additional freedom was
used when manually isolating a single syllable. Twenty five syllable sets were
generated based on this procedure where each set was labeled by a letter from A
through Y; out of which {N, R, S} are call syllables that do not occur in songs.
To make it more precise, a quantitative comparison was performed by sliding
two spectrograms or waveforms past each other in time while computing the
correlation. Typically, a bandpass filter may be applied to reduce the amount of
noise contained in the signals prior to the correlation process if the energy outside
the frequency band is significant as we stated earlier. Once a syllable candidate is
identified and set as the control syllable, the position of the peak correlation was
identified against all other syllables in the songs. Syllables were considered to be
candidates in the same syllable set if their correlation was higher than 0.230, even
though there may be slight variations in their durations, formats, and frequency
modulations. The correlation process was repeated for each syllable to capture
ten occurrences of the same syllable candidate in all recorded songs.
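The sliding comparison described above can be illustrated on waveforms with a normalized correlation evaluated at every lag. This is a hedged sketch rather than the computation performed in Raven Pro: it assumes a Pearson-style normalization over each window, uses plain Python lists, and the function name and chirp test signal are hypothetical. Only the 0.230 threshold is taken from the text.

```python
import math


def peak_correlation(control, candidate):
    """Slide `control` along `candidate`; return (lag, value) of the peak
    normalized correlation. Silent (constant) windows are skipped."""
    n = len(control)
    c_mean = sum(control) / n
    c_dev = [v - c_mean for v in control]
    c_norm = math.sqrt(sum(d * d for d in c_dev))
    best_lag, best_value = 0, -1.0
    for lag in range(len(candidate) - n + 1):
        window = candidate[lag:lag + n]
        w_mean = sum(window) / n
        w_dev = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if c_norm == 0.0 or w_norm == 0.0:
            continue
        value = sum(a * b for a, b in zip(c_dev, w_dev)) / (c_norm * w_norm)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value


# Hypothetical control syllable (a short chirp) embedded in silence at sample 20.
control = [math.sin(0.05 * k * k) for k in range(32)]
candidate = [0.0] * 20 + control + [0.0] * 20
lag, value = peak_correlation(control, candidate)
same_set = value > 0.230  # membership threshold quoted in the text
print(lag, same_set)
```

The lag of the peak gives the alignment position, and the peak value is what would be compared against the 0.230 threshold to decide set membership.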
5.2 Feature Extraction
The form of a canary syllable may range from a simple single frequency sweep
(tonal syllable) to a complex syllable with multiple elements. The frequency modulation may vary sharply from low to very high values or vice versa. Syllables with
fast frequency modulation of about 13 s⁻¹ and high repetition rates that contain
more than one element (called sexy syllables [20, 19]) are important in female mating choice. In addition to the modulation, our classification scheme utilizes the
duration (s), dominant frequency (Hz), and peak power (dB) of each syllable.
The objective in feature extraction is to identify distinct categories in which
syllables are similar or different. Measures such as duration, dominant frequency,
average energy, and signal power are presented in Table 1. The entries in the table
correspond to µ_f ± 2σ_f for each feature f, where µ_f is the average and σ_f is the
standard deviation over each syllable set.
Additional features are listed in Table 2 to represent physical or perceptual aspects of distinct syllables. A few of the physical features
calculated from the sound wave are the spectral centroid, signal bandwidth, and zero
crossing rate. Loudness, pitch, and brightness, on the other hand, are perceptual features. We kept the number of classification features moderate
to prevent irrelevant features from impairing the classification.
A brief description of the features used is given below:
1. Spectral Centroid (SC): The brightness of the sound is referred to as the
spectral centroid or center point of the spectrum; thus the higher the centroid the brighter the related sound. Spectral centroid is calculated by [5]:
\[ \mathrm{SC} = \frac{\sum_{n=0}^{M} n\,|X(n)|^{2}}{\sum_{n=0}^{M} |X(n)|^{2}} \qquad (1) \]
where X is the DFT of the sound and M is half of the size of DFT.
2. Signal Bandwidth (BW): The width of the frequency band of the signal
around the SC is called the signal bandwidth, which is calculated by [5]:
\[ \mathrm{BW} = \sqrt{\frac{\sum_{n=0}^{M} (n - \mathrm{SC})^{2}\,|X(n)|^{2}}{\sum_{n=0}^{M} |X(n)|^{2}}} \qquad (2) \]
where SC, X, and M are as previously stated. The bandwidth of a syllable
is the range of lower and upper cutoff frequencies where the DFT is −3 dB.
Table 1: A summary of detailed syllable analysis. Columns 2 and 3 give physical features based
on slope, number of elements, and frequency. Columns 4–8 provide measurements from each
syllable set (of ten of its own type) in seconds, Hz, and dB respectively. The acronym FS stands for
frequency sweep, R for the repetition, D for duration, S for spacing, DF for dominant frequency,
PP for peak power, and E for energy.

Syllable  FS         R       D (s)        S (s)          DF (Hz)         PP (dB)     E (dB)
A         Linear     Single  0.30 ± .020  −              3824.2 ± 773    88.5 ± 2.4  79.6 ± 2.9
B         Complex    8-11    0.08 ± .002  0.001 ± .000   3005.1 ± 091    75.3 ± 7.1  65.2 ± 1.6
C         Upsweep    8-12    0.05 ± .004  0.010 ± .010   4842.6 ± 104    90.1 ± 4.2  74.1 ± 4.0
D         Complex    9-13    0.03 ± .004  0.010 ± .010   3359.2 ± 181    78.4 ± 6.8  61.3 ± 5.1
E         Complex    6-8     0.10 ± .004  0.020 ± .015   2879.3 ± 840    84.4 ± 1.0  71.4 ± .80
F         Complex    5-9     0.03 ± .003  0.010 ± .001   3808.9 ± 664    82.0 ± 3.8  66.7 ± 8.2
G         Downsweep  4-6     0.07 ± .010  0.080 ± .010   4162.1 ± 298    88.6 ± .62  75.1 ± .75
H         Complex    12-13   0.05 ± .005  0.015 ± .015   2202.7 ± 892    85.5 ± 4.7  67.3 ± 3.7
I         Complex    14-19   0.03 ± .002  0.007 ± .006   3315.9 ± 153    82.4 ± 2.7  63.4 ± 1.6
J         Complex    3-6     0.12 ± .008  0.001 ± .001   4324.6 ± 265    90.5 ± 4.2  76.3 ± 3.7
K         Complex    4-11    0.05 ± .003  0.030 ± .002   3617.4 ± 414    84.4 ± 5.1  68.1 ± 3.8
L         Upsweep    18-21   0.03 ± .005  0.001 ± .002   3762.8 ± 224    79.6 ± 2.3  62.8 ± 1.7
M         Linear     Single  0.30 ± .020  0.200 ± .020   3617.6 ± 227    92.2 ± 3.7  81.8 ± 3.8
N         Linear     Single  0.30 ± .030  −              4134.4 ± 608    79.2 ± 4.4  71.1 ± 4.1
O         Complex    5-8     0.08 ± .008  0.005 ± .004   3904.6 ± 1120   79.3 ± 3.4  63.8 ± 3.3
P         Complex    3-14    0.08 ± .030  0.030 ± .009   2628.5 ± 876    82.9 ± 4.3  67.0 ± 3.1
Q         Linear     8-16    0.03 ± .005  0.030 ± .009   2476.9 ± 95.1   84.9 ± 2.6  66.2 ± 2.5
R         Linear     Single  0.27 ± .001  −              3617.6 ± 200    74.2 ± 2.0  65.7 ± 1.0
S         Complex    5       0.30 ± .020  −              3617.6 ± 182    89.1 ± 2.8  77.1 ± 2.7
T         Upsweep    17-31   0.03 ± .004  -.003 ± .003   2618.3 ± 109    79.0 ± 2.8  63.2 ± 2.0
U         Upsweep    15-29   0.04 ± .006  0.000 ± .003   2618.4 ± 730    75.6 ± 2.4  60.9 ± 2.6
V         Linear     2-8     0.06 ± .010  0.003 ± .002   3678.6 ± 160    91.5 ± 3.9  75.5 ± 3.5
W         Downsweep  5-12    0.04 ± .005  0.002 ± .001   5052.7 ± 272    91.2 ± 2.1  74.3 ± 2.4
X         Upsweep    8-11    0.06 ± .010  0.010 ± .002   3962.1 ± 333    85.9 ± 3.1  71.5 ± 3.2
Y         Complex    3-6     0.07 ± .006  0.008 ± .009   3812.1 ± 310    82.8 ± 6.6  67.1 ± 6.9
3. Spectral Rolloff Frequency (SRF): SRF is correlated with the skewness of the spectral shape; it is the value below which a certain fraction of the power distribution lies. This measure can distinguish sounds with different frequency
ranges [5]. For a threshold e between 0 and 1, it is defined by
\[ \mathrm{SRF} = \max\Bigl\{ K : \sum_{n=0}^{K} |X(n)|^{2} < e \sum_{n=0}^{M} |X(n)|^{2} \Bigr\}. \qquad (3) \]
Table 2: A list of syllable temporal and spectral features: SC stands for spectral centroid, SB for
signal bandwidth, SRF for spectral roll-off frequency, SF for spectral flatness, SE for short time
signal energy, ZCR for zero crossing rate, and FQ for frequency.

Syllable  SC       SB        SRF   SF        SE        ZCR   FQ
A         1055.05  421179    1064  -8.45417  0.284780  2059  0.95%
B         44.57    3458.48   243   -6.38091  0.002079  164   0.86%
C         213.14   1405.10   232   -8.13140  0.010897  436   2.85%
D         81.64    1285.36   93    -7.47388  0.005879  171   3.04%
E         300.76   5241.04   311   -6.23062  0.007454  574   3.80%
F         112.88   782.71    129   -8.00855  0.020725  238   2.19%
G         267.05   1943.51   304   -8.21106  0.017742  497   5.23%
H         164.04   6125.11   223   -4.51229  0.003083  378   15.78%
I         124.44   1500.36   132   -6.24195  0.008000  248   6.65%
J         492.64   2489.19   517   -8.08265  0.128935  1025  4.18%
K         238.53   1968.51   265   -7.39733  0.020423  452   5.04%
L         107.06   1569.26   116   -6.59664  0.001861  214   2.28%
M         1037.37  7830.64   1183  -8.03890  0.190536  2201  4.56%
N         1149.14  15575.40  1470  -6.42954  0.028518  2238  0%
O         288.37   4890.42   382   -5.97549  0.003465  598   0.67%
P         238.91   3477.71   361   -6.94139  0.008836  506   2.66%
Q         103.50   2224.92   115   -6.89264  0.002899  222   4.94%
R         1247.02  6577.77   1265  -5.34178  1.395683  2486  0.0%
S         1021.58  8893.46   1118  -6.55131  0.131217  2020  0.0%
T         78.57    1022.54   80    -7.16622  0.006511  154   12.64%
U         131.70   2087.87   214   -6.18159  0.004082  267   8.75%
V         268.71   1287.99   314   -8.70540  0.078290  572   5.23%
W         215.78   803.30    227   -7.96179  0.036380  414   2.00%
X         293.14   1042.58   306   -7.97262  0.115003  570   7.70%
Y         235.73   1798.20   275   -8.17444  0.002933  466   2.57%
4. Spectral Flatness (SF): Spectral flatness can differentiate the voiced from the
unvoiced parts of the signal (which can occupy the same frequency range),
giving a value near 0 dB for noise and a strongly negative value for the voiced (tonal) signal. Let
\[ G_m = \Bigl( \prod_{n=0}^{M} |X(n)| \Bigr)^{1/M} \quad \text{and} \quad A_m = \frac{1}{M} \sum_{n=0}^{M} |X(n)| \]
be the geometric and arithmetic means of the
magnitude values of the spectral points X(n). Spectral flatness measures
the ratio of G_m to A_m of the signal's power spectrum, as defined by
\[ \mathrm{SF} = 10 \log_{10} \frac{G_m}{A_m}. \qquad (4) \]
This is also known to measure the tonality (harmony of the focal point) of
the syllable signal. As G_m ≤ A_m, the ratio G_m/A_m is at most 1, so SF is at most 0 dB.
A value of SF = −60 dB indicates that the signal is very tone-like, whereas SF = 0 dB indicates
that the signal is noisy [18].
5. Signal Energy (SE): The signal energy of a syllable is the sum of the squared magnitudes of the samples. That is,
\[ \mathrm{SE} = \sum_{n=0}^{N-1} X(n)^{2} \qquad (5) \]
where X denotes the DFT as before.
6. Zero Crossing Rate (ZCR): This is the number of times that adjacent time-domain
data samples have different signs. Similar to the spectral
centroid, ZCR measures the spectral shape of the syllable.
7. Frequency (FQ): The frequency of each syllable was tallied against the control group of ten songs, including partial syllables that may occur at the
beginning or ending of a phrase.
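The spectral and temporal features listed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the Mathematica code used in the study: the DFT is computed naively, SC, BW, and SRF are returned in units of DFT bin index rather than Hz, the means in the flatness measure are taken over all M + 1 bins, a small floor is applied before taking logarithms, and the energy is computed from the time-domain samples. All function names and the test tone are hypothetical.

```python
import cmath
import math


def half_spectrum_power(samples):
    """Return |X(n)|^2 for n = 0..M, where M = len(samples) // 2 (naive DFT)."""
    size = len(samples)
    power = []
    for n in range(size // 2 + 1):
        x_n = sum(s * cmath.exp(-2j * math.pi * n * k / size)
                  for k, s in enumerate(samples))
        power.append(abs(x_n) ** 2)
    return power


def spectral_centroid(power):
    """Power-weighted mean bin index (spectral centroid)."""
    return sum(n * p for n, p in enumerate(power)) / sum(power)


def signal_bandwidth(power):
    """Power-weighted spread of bin indices around the centroid."""
    sc = spectral_centroid(power)
    spread = sum((n - sc) ** 2 * p for n, p in enumerate(power))
    return math.sqrt(spread / sum(power))


def spectral_rolloff(power, threshold=0.95):
    """Largest K whose cumulative power stays below threshold * total power."""
    total, running, rolloff = sum(power), 0.0, 0
    for k, p in enumerate(power):
        running += p
        if running < threshold * total:
            rolloff = k
    return rolloff


def spectral_flatness_db(power, floor=1e-12):
    """10 log10 of the geometric over the arithmetic mean of |X(n)|."""
    mags = [max(math.sqrt(p), floor) for p in power]  # floor avoids log(0)
    geometric = math.exp(sum(math.log(m) for m in mags) / len(mags))
    arithmetic = sum(mags) / len(mags)
    return 10.0 * math.log10(geometric / arithmetic)


def signal_energy(samples):
    """Sum of squared sample values (time-domain form of the energy)."""
    return sum(s * s for s in samples)


def zero_crossings(samples):
    """Count adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))


# Hypothetical test tone: 8 cycles in a 64-sample frame, so the power sits in bin 8.
tone = [math.sin(2 * math.pi * 8 * k / 64 + 0.3) for k in range(64)]
power = half_spectrum_power(tone)
print(spectral_centroid(power), signal_bandwidth(power), spectral_rolloff(power))
print(spectral_flatness_db(power), signal_energy(tone), zero_crossings(tone))
```

For a pure tone the centroid falls on the tone's bin, the bandwidth is close to zero, and the flatness is strongly negative, which matches the reading of SF as a tonality measure. Multiplying a bin index by the frequency grid spacing would convert SC, BW, and SRF to Hz.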
5.3 Distance Matrix
We constructed a matrix representation for each syllable using the spectrogram
data, and defined a norm to calculate distances between candidate syllables. The
spectrogram matrix is partitioned into N × N blocks denoted by b_{i,j} for i, j =
1, 2, ..., N, and the average \bar{b}_{i,j} is calculated for each block as shown below.
\[
\begin{pmatrix}
\bar{b}_{1,1} & \bar{b}_{1,2} & \cdots & \bar{b}_{1,N-1} & \bar{b}_{1,N} \\
\bar{b}_{2,1} & \bar{b}_{2,2} & \cdots & \bar{b}_{2,N-1} & \bar{b}_{2,N} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\bar{b}_{N-1,1} & \bar{b}_{N-1,2} & \cdots & \bar{b}_{N-1,N-1} & \bar{b}_{N-1,N} \\
\bar{b}_{N,1} & \bar{b}_{N,2} & \cdots & \bar{b}_{N,N-1} & \bar{b}_{N,N}
\end{pmatrix}
\]
The distance between the matrices of the aligned syllables A and B of the same dimensions is then calculated by the distance formula
\[ \mathrm{dist}(A, B) := \Bigl( \sum_{i=1}^{N} \sum_{j=1}^{N} \bigl( \bar{b}^{A}_{i,j} - \bar{b}^{B}_{i,j} \bigr)^{2} \Bigr)^{1/2} \qquad (6) \]
A distance value close to zero between two syllables is expected to show their
strong resemblance. However, caution should be taken as the averaging of matrix entries may result in false indications. In this paper, we used the distance
matrix in addition to other features given above to identify distinct syllables. The
distance matrix for the case when N = 20 is given in Table 3 for the synthetic
syllables created to represent each set as explained in the next section.
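The block averaging and the distance formula of this section can be sketched as follows. This is a minimal illustration, assuming the spectrogram is given as a list of rows with at least N rows and N columns and that blocks are formed by splitting the rows and columns as evenly as integer division allows; the function names and the small 4 × 4 examples are hypothetical.

```python
import math


def block_means(spectrogram, n_blocks):
    """Partition a 2-D spectrogram into n_blocks x n_blocks tiles and average each.

    Assumes at least n_blocks rows and n_blocks columns, so no tile is empty.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    means = []
    for i in range(n_blocks):
        r0, r1 = i * rows // n_blocks, (i + 1) * rows // n_blocks
        row_means = []
        for j in range(n_blocks):
            c0, c1 = j * cols // n_blocks, (j + 1) * cols // n_blocks
            cells = [spectrogram[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_means.append(sum(cells) / len(cells))
        means.append(row_means)
    return means


def block_distance(spec_a, spec_b, n_blocks):
    """Euclidean distance between the block-mean matrices of two spectrograms."""
    a, b = block_means(spec_a, n_blocks), block_means(spec_b, n_blocks)
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(n_blocks) for j in range(n_blocks)))


# Hypothetical 4 x 4 spectrograms compared with N = 2.
silent = [[0, 0, 0, 0] for _ in range(4)]
solid = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
checker = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(block_distance(silent, solid, 2))   # prints: 1.0
print(block_distance(solid, checker, 2))  # prints: 0.0
```

The second result illustrates the caution stated above: `solid` and `checker` differ cell by cell, yet their block averages coincide, so the distance alone reports them as identical.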
6 Synthetic Syllables
In order to provide a standard framework for further studies in canary vocalization, we constructed a synthetic model for each syllable in two steps. In the
first step, each syllable in the set was compared to the control syllable to identify the exact position of the maximum correlation value, thus aligning each
syllable in the set to be convolved with the control syllable. We then applied an
impulse response, or convolution kernel, generating an array of 10 output signals
of the same size as the shortest signal, which was chosen to be our control
syllable. The process produced filtered syllables of equal length without losing any essential features. The aligned and convolved syllables were then averaged
together, and the result was normalized with respect to amplitude, forming a
unique synthetic syllable representative of the original set. This process was repeated for each of the 25 distinct syllable sets, generating a synthetic dictionary of
25 syllables. The resulting synthetic syllable spectrograms are shown in Table 4
for each of the 25 syllables.
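The align, average, and normalize idea behind the synthetic syllables can be sketched on waveforms as below. This is a simplified illustration and not the Mathematica procedure of the study: the convolution-kernel step is replaced by plain cropping to the length of the control syllable, alignment uses an unnormalized sliding dot product, and normalization divides by the peak absolute amplitude. The function names and the tiny example set are hypothetical.

```python
def best_lag(control, candidate):
    """Lag at which the sliding dot product of control and candidate is largest."""
    n = len(control)
    scores = [sum(c * candidate[lag + k] for k, c in enumerate(control))
              for lag in range(len(candidate) - n + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


def synthesize(syllable_set):
    """Align every syllable to the shortest one, average, and peak-normalize."""
    control = min(syllable_set, key=len)  # shortest signal acts as the control
    n = len(control)
    aligned = []
    for syllable in syllable_set:
        lag = best_lag(control, syllable)
        aligned.append(syllable[lag:lag + n])  # crop to the control length
    mean = [sum(column) / len(aligned) for column in zip(*aligned)]
    peak = max(abs(v) for v in mean)
    return [v / peak for v in mean] if peak > 0 else mean


# Hypothetical set: the same shape at different delays and amplitudes.
syllable_set = [
    [0, 1, 0, -1],
    [0, 0, 0, 2, 0, -2],
    [0, 1, 0, -1, 0],
]
synthetic = synthesize(syllable_set)
print(synthetic)  # prints: [0.0, 1.0, 0.0, -1.0]
```

Because every member of the set is cropped to the control length after alignment, the average is taken over signals of equal size, which mirrors the equal-length requirement described above.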
7 Song Coding
A canary song ranges in duration from a minimum of 5 seconds to in excess
of 2 minutes. From the library of songs generated for this project, we selected
10 songs that represented various song sizes and all syllables in the dictionary.
Each of the ten songs has been coded based on the syllable types and the duration
of time between distinct syllables, or the unvoiced space. For simplicity, we do
not give time lengths between syllables in phrases or trills consisting of at least
4 syllables. Unvoiced spaces other than those within trill phrases are marked by
the notation .x., which indicates 0.x seconds. All syllables are counted, including
any partial syllables, such that L10 implies the syllable L repeated continuously
10 times. Figure 2 below demonstrates the coding process.
Figure 2: Coding of a phrase. The first syllable A is repeated twice with an unvoiced time space
followed by the trill phrases of syllables H and W. The coding is: A1 .13. A1 .14. H11 .05. W4
A complete coding for the ten songs is given in Table 5. A percentage of
occurrence for each syllable is listed in the last column of Table 2. The coding
process brings to light the significance of pattern sequences that appear in canary
songs referred to as motifs; namely, a sequence of at least 3 different syllables that
are repeated sequentially. We expect that the coding scheme developed in this
paper will be of use in behavioral and neural studies of canaries. It is observed,
for example, that during copulation the canary remains perched in one location
until the M syllable is reached at which time he takes flight chasing the female.
A detailed analysis will be given in an upcoming publication.
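The coding notation of Figure 2 is simple enough to parse mechanically, which may be convenient for motif searches or for tallying syllable occurrences. The sketch below is an assumption-laden illustration: it treats a token such as H11 as syllable H repeated 11 times and a token such as .13. as an unvoiced space of 0.13 s, and it computes occurrence percentages as each syllable's share of all counted syllables, which may differ from how the FQ column of Table 2 was tallied. The function names are hypothetical.

```python
import re


def parse_song_code(code):
    """Turn a coded song into a list of (syllable, count) and ("pause", seconds)."""
    events = []
    for token in code.split():
        if token.startswith(".") and token.endswith("."):
            # ".x." denotes an unvoiced space of 0.x seconds.
            events.append(("pause", float("0." + token.strip("."))))
        else:
            match = re.fullmatch(r"([A-Y])(\d+)", token)
            if match is None:
                raise ValueError(f"unrecognized token: {token!r}")
            events.append((match.group(1), int(match.group(2))))
    return events


def syllable_percentages(events):
    """Share of each syllable among all counted syllables, in percent."""
    counts = {}
    for label, value in events:
        if label != "pause":
            counts[label] = counts.get(label, 0) + value
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


events = parse_song_code("A1 .13. A1 .14. H11 .05. W4")
shares = syllable_percentages(events)
print(events)
print({label: round(pct, 1) for label, pct in shares.items()})
```

For the phrase of Figure 2 this yields two occurrences of A, eleven of H, and four of W, separated by pauses of 0.13 s, 0.14 s, and 0.05 s.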
8 Discussion
Canary syllables vary significantly in duration ranging from 0.3s (syllables M
and N, for example) to 0.01s (syllable I). Most syllables, however, have short time
durations and are contained in a successive repeated phrase as shown in Table 1.
The few syllables that have long duration are the long whistle, the tonal syllable
A, and the call syllables N, R, and S. The one exception is syllable M, which
occurs in most songs and tends to be near the end of song and/or preflight.
We identified five 'sexy' syllables, B, H, O, P, and Y, with multiple elements,
high repetition, and a high frequency modulation rate; the most and least frequent of
which appear to be H and O, respectively. Lehongre and Del Negro [9] found
individual-specific syllables in about 16% of a canary's repertoire, and in the sequential arrangement of motifs (sequences of three or more distinct syllables).
Such specifications give a unique identity to a song, suggesting that it contains individual information whereby a female can identify a male. Our findings also
verified syllables arranged in successive repeated patterns of varied length occurring in most songs. Certain short syllable sequences such as D-F and T-U
occurred repeatedly during song, with T-U having the longest duration of any syllable sequence (2 s). In the coding of the 10 representative songs, we identified
the motifs D-F-E-H and J-G-K-V in Table 5.
The vocal performances of our sterilized domesticated canary displayed various patterns. We have classified the physical structure of syllables as linear, upsweep arch, downsweep arch, or complex in Table 1. Most syllables are complex
in shape with a combination of negative, positive, and/or constant slope. The
syllables range from near tonal to complex, with a dominant frequency range of
2400 Hz to 4600 Hz. The size of the unique seasonal dictionary appears to vary by age,
ranging between 12 and 40 distinct syllables in other studies [20, 11, 10]. The size of
25 distinct syllables is consistent with this range, as our canary was approximately
one year old.
This work seems to be the first to give a detailed quantitative analysis and
classification of the vocal repertoire in canaries. Further studies are necessary
to determine in quantitative measures seasonal changes in their repertoire. The
results verify and demonstrate the use of diagnostic techniques in songbird vocalization, and set a framework and illustrate possible applications in the areas
of song production and animal behavior.
List of Figures
1 Example of a canary song phrase spectrogram. The intensity of the sound is
denoted by the gray-scale. The building blocks of a song are (1) the elements of a
syllable, (2) the syllable composed of one or more elements, and (3) a train of
syllables forming a phrase, which in turn makes up a part of the song.
2 Coding of a phrase. The first syllable A is repeated twice with an unvoiced time
space followed by the trill phrases of syllables H and W. The coding is: A1 .13.
A1 .14. H11 .05. W4
List of Tables
1 A summary of detailed syllable analysis. Columns 2 and 3 give physical features
based on slope, number of elements, and frequency. Columns 4–8 provide
measurements from each syllable set (of ten of its own type) in seconds, Hz, and
dB respectively. The acronym FS stands for frequency sweep, R for the repetition,
D for duration, S for spacing, DF for dominant frequency, PP for peak power, and
E for energy.
2 A list of syllable temporal and spectral features: SC stands for spectral centroid,
SB for signal bandwidth, SRF for spectral roll-off frequency, SF for spectral
flatness, SE for short time signal energy, ZCR for zero crossing rate, and FQ for
frequency.
3 The matrix lists distances (×10³) between distinct syllables.
4 Each syllable is classified by an identification card consisting of the averaged
normalized syllable's clipped spectrogram generated by Mathematica.
5 Song coding of 10 songs from the catalog of seasonal songs that best represent each
syllable. The syllables are stated first, followed by frequency and unvoiced space
between syllables measured in seconds.
References
[1] C.K. Catchpole and P.J. Slater, Bird song: biological themes and variations, Cambridge Univ. Press, Cambridge (2005), 163–199.
[2] Zhixin Chen and Robert C. Maher, Semi-automatic classification of bird vocalizations using spectral tracks, Acoustical Society of America 120 (2006), 2974–
2984.
[3] Christopher W. Clark, Peter Marler, and Kim Beeman, Quantitative analysis
of animal vocal phonolgy: an application to swamp sparrow song, Ethology 76.
[4] Paul R. Ehrlich, D. Dobkin, and D. Wheye, Vocal development.
[5] Sappo Fagerlund, Acoustics and physical models of bird sounds, (2004).
[6] Timothy J. Gardner, F. Naef, and F. Nottebohm, Freedom and rules: the acquisition and reprogramming of a bird’s learned song, Science 308 (2005), 1046–1049.
[7] A. Harma, Classification of the harmonic structure in bird vocalization.
[8] O.N. Larsen and F. Goller, Role of syringeal vibrations in bird vocalizations, Proc.
R. Soc. London 266.
[9] Katia Lehongre, Thierry Aubin, Stephane Robin, and Catherine Del Negro,
Individual signature in canary songs: contribution of multiple levels of song structure, Ethology 114 (2008), 425–435.
[10] Stefan Leitner and Clive K. Catchpole, Syllable repertoire and the size of the song
control system in captive canaries (Serinus canaria), J. Neurobiol. 60 (2004), 21–27.
[11] Stefan Leitner, Cornelia Voigt, and Manfred Gahr, Seasonal changes in the song
pattern of the non-domesticated island canary, Behaviour 138 (2001), 885–904.
[12] F. Nottebohm and M. E. Nottebohm, Relationship between song repertoire and
age in the canary, Z. Tierpsychol. 46 (1978), 298–305.
[13] F. Nottebohm, M.E. Nottebohm, and L. Crane, Developmental and seasonal
changes in canary song and their relation to changes in the anatomy of song-control
nuclei, Behav Neural Biol 46 (1986), 445–471.
[14] Fernando Nottebohm, The neural basis of birdsong, PLoS Biol. 3 (2005).
[15] S. Nowicki, Bird acoustics, John Wiley & Sons.
[16] Henri Ouellet, D.E. Kroodsma, and E.H. Miller, Acoustic communication in
birds, Academic Press.
[17] Arja Selin and J. T., Bird sound classification and recognition using wavelets,
EURASIP Journal on Applied Signal Processing 2007 (2007), 141.
[18] Panu Somervuo, Aki Harma, and Seppo Fagerlund, Parametric representations
of bird sounds for automatic species recognition, IEEE 14 (2006), 2252–2263.
[19] E. Vallet and M. Kreutzer, Female canaries are sexually responsive to special song
phrases, Animal Behaviour 49 (1995), 1603–1610.
[20] E. Vallet, M. Kreutzer, and Irina Beme, Two-note syllables in canary songs elicit
high levels of sexual display, Animal Behaviour 55 (1998), 291–297.
[21] W.H. Thorpe, J. M., and K. O., Communication in courtship calls, Prentice-Hall,
New Jersey, 1954.
Table 3: The matrix lists distances (×10^3) between distinct syllables.

     A     B     C     D     E     F     G     H     I     J     K     L     M
A    0
B    63.3  0
C    51.5  44.1  0
D    58.6  43.2  45.6  0
E    57.5  26.4  38.2  32.7  0
F    37.4  77.4  58.1  67.4  70.2  0
G    44.6  5.00  33.6  49.1  41.2  41.9  0
H    57.3  29.5  39.3  37.2  21.1  73.6  44.8  0
I    26.0  50.7  43.8  47.6  42.6  38.6  33.2  45.9  0
J    65.5  80.1  57.1  77.8  74.7  51.7  46.0  76.2  58.4  0
K    31.2  54.0  37.9  47.4  46.8  39.2  36.6  50.5  28.3  65.8  0
L    43.3  31.4  34.5  34.6  19.1  58.8  33.6  22.4  29.2  65.4  38.7  0
M    44.0  53.7  50.2  52.3  43.0  44.9  26.8  50.3  30.4  55.3  42.2  36.1  0
N    50.3  31.3  27.5  39.2  20.3  64.4  37.3  23.7  37.6  67.8  39.3  19.4  43.6
O    52.0  29.2  32.3  40.0  20.9  70.4  44.0  22.6  42.0  76.4  42.7  20.6  49.3
P    59.9  30.7  35.6  35.4  17.3  68.4  39.1  27.3  47.0  71.8  46.2  25.7  46.4
Q    60.0  31.3  41.0  19.3  17.7  71.5  46.1  27.3  46.5  77.8  48.2  25.7  49.1
R    80.4  128   114   117   125   77.9  104   124   91.5  108   90.3  112   100
S    46.3  55.9  51.8  53.3  49.0  55.9  44.4  48.7  38.9  67.8  46.7  40.2  43.9
T    63.8  35.7  45.2  19.3  22.4  73.1  49.8  34.3  50.7  80.4  51.4  31.9  52.2
U    54.2  29.9  34.8  25.0  13.2  64.5  37.4  25.9  39.6  70.4  42.4  20.4  41.1
V    73.9  90.5  72.8  91.2  85.8  80.6  73.1  81.0  78.2  78.1  82.7  78.4  73.4
W    94.7  88.5  76.7  93.4  85.6  100   87.5  88.6  90.2  91.3  91.9  86.1  89.5
X    10.3  108   74.0  105   106   89.6  83.4  106   102   73.7  94.5  103   100
Y    47.1  35.3  39.1  39.4  31.5  64.0  43.4  30.8  40.8  72.3  45.2  26.7  46.0

     N     O     P     Q     R     S     T     U     V     W     X     Y
N    0
O    17.3  0
P    23.9  25.7  0
Q    28.5  28.3  22.7  0
R    117   121   126   124   0
S    46.8  49.4  52.4  51.7  104   0
T    34.4  35.0  26.1  13.6  126   54.7  0
U    21.5  24.7  19.2  14.3  120   45.7  17.2  0
V    77.9  83.4  85.8  89.9  114   65.6  91.6  83.8  0
W    75.8  82.9  84.5  92.6  133   89.2  90.6  85.9  97.8  0
X    95.8  103   98.5  107   135   105   108   101   105   101   0
Y    30.2  29.9  36.5  30.0  113   37.3  38.3  32.1  70.6  90.0  104   0
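To make the distance matrix easier to query, the following minimal Python sketch expands row-wise lower-triangle entries into a symmetric lookup table and then finds the closest pair and a nearest neighbor. It uses only the entries for syllables A to E as printed in Table 3 (so results hold for this excerpt only, not the full table). The function names `expand_lower_triangle`, `closest_pair`, and `nearest_neighbor` are hypothetical helpers introduced for illustration and are not part of the paper's method.

```python
def expand_lower_triangle(labels, rows):
    """Build a symmetric dict-of-dicts from row-wise lower-triangle rows.

    rows[i] must hold i + 1 values: the distances from labels[i] to
    labels[0..i], ending with the zero on the diagonal.
    """
    dist = {label: {} for label in labels}
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {labels[i]} should have {i + 1} values")
        for j, value in enumerate(row):
            a, b = labels[i], labels[j]
            dist[a][b] = value
            dist[b][a] = value  # mirror into the upper triangle
    return dist


def closest_pair(dist):
    """Return (a, b, distance) for the two distinct syllables that are closest."""
    pairs = [(d, a, b) for a in dist for b, d in dist[a].items() if a < b]
    d, a, b = min(pairs)
    return a, b, d


def nearest_neighbor(dist, label):
    """Return the syllable with the smallest distance to `label`."""
    return min((d, other) for other, d in dist[label].items() if other != label)[1]


# Excerpt of Table 3: syllables A to E, values as printed (scaled as in the caption).
labels = ["A", "B", "C", "D", "E"]
rows = [
    [0],
    [63.3, 0],
    [51.5, 44.1, 0],
    [58.6, 43.2, 45.6, 0],
    [57.5, 26.4, 38.2, 32.7, 0],
]
dist = expand_lower_triangle(labels, rows)
print(closest_pair(dist))
print(nearest_neighbor(dist, "A"))
```

Storing the mirrored entry alongside each printed value keeps lookups order-independent, so `dist["B"]["E"]` and `dist["E"]["B"]` return the same distance.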
Table 4: Each syllable is classified by an identification card consisting of the clipped spectrogram of the averaged, normalized syllable, generated with Mathematica.
Table 5: Song coding of 10 songs from the catalog of seasonal songs that best represent each syllable. Each syllable is stated first, followed by its frequency and then the unvoiced space between syllables, measured in seconds.

Song  Syllable Coding
1     O7 .1. H14 .4. X11 .25. M5 .1. H13 .05. Q16 .3. W8 .15. X14 .1. J4G4 .1. M1 .1. M1 .1. M1 .05. M1 .15. I19 .05. Q8 .1. E4
2     T26U15 .3. D10F5 .2. E6 .05. H11 .2. J3G9 .2. M1 .1. M1 .1. M1 .05. M1 .2. I8C8 .2. J4G5K10V5 .2. C8 .2. J6K3 .05. M1
3     H12 .15. V6 .025. A2 .05. H17 .275. T21U17 .2.7. C14 .1. J6G3K4V4 .1. M1 .1. M1 .1. M1 .05. M1 .1. I16V1 .15. V7 .1. M1 .1. M1 .1. M1 .1. M1 .05. M1 .05. M1 .05. M1
4     T24U32 .15. A1 .1. H12V1 .1. V8 .1. M1 .1. M1 .1. M1 .1. M1 .05. G5V2
5     H11V3J4P7K11 .2. A1 .025. A1 .1. I12X9 .05. J6G5 .1. M1 .1. M1 .1. M1 .05. M1 .1. H14
6     D11F9 .3. T19 .55. Q14 .1. E9 .1. H12 .15. V6W1 .1. M1 .1. M1 .1. M1 .05. M1 .25. X10 .2. K4V2K3V2 .65. W12K6P3 .1. M1 .05. M1 .05. M1 .05. M1 .1. H12 .3. V8 .1. M1 .1. M1 .05. M1G3 .1. A1 .1. H13 .8. I5E6Q14 .1. A1 .05. A1 .1. E6 .1. H3X8J6G4P7K3
7     X15E4 .02. H10 .32. J4G12 .15. M1 .15. M1 .18. M1 .01. M1 .03. I10X8
8     L24 .28. T16E3 .13. H10 .06. J1 .24. D11F9 .27. T17 .8. Y7E2 .23. P6G5
9     T12U10 .49. K9 .55. Y12 .22. H2P5
10    A1 .07. A1 .25. T17U18X14 .63. B9
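The coding in Table 5 can be read mechanically. The following Python sketch parses one coding string into a sequence of syllable and gap events, using song 9 as input. It rests on two assumptions that the caption suggests but does not spell out: the number after each syllable letter is taken as its frequency (number of occurrences), and a token of the form `.49.` is taken as an unvoiced gap of 0.49 s. The function name `parse_song` and the event tuples are hypothetical and introduced only for illustration.

```python
import re

# One syllable letter (A to Y) followed by its count, e.g. "T12".
SYLLABLE = re.compile(r"([A-Y])(\d+)")
# A run of syllables with no gap between them, e.g. "T12U10".
SYLLABLE_RUN = re.compile(r"(?:[A-Y]\d+)+")
# A gap written between dots, e.g. ".49." (assumed to mean 0.49 s).
GAP = re.compile(r"\.(\d+)\.")


def parse_song(coding):
    """Turn a Table 5 coding string into a list of events.

    Events are ("syllable", letter, count) or ("gap", seconds).
    Tokens that fit neither pattern raise ValueError rather than being guessed.
    """
    events = []
    for token in coding.split():
        gap = GAP.fullmatch(token)
        if gap:
            events.append(("gap", float("0." + gap.group(1))))
        elif SYLLABLE_RUN.fullmatch(token):
            for letter, count in SYLLABLE.findall(token):
                events.append(("syllable", letter, int(count)))
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return events


song9 = "T12U10 .49. K9 .55. Y12 .22. H2P5"
events = parse_song(song9)

total_syllables = sum(e[2] for e in events if e[0] == "syllable")
total_gap = sum(e[1] for e in events if e[0] == "gap")
print(events)
print(total_syllables, round(total_gap, 2))
```

Raising an error on unrecognized tokens is deliberate: a few entries in the table appear irregular, and it seems safer to flag them for manual inspection than to infer a value.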