A Complete Syllable Dictionary for Serinus Canarius
For submission to Ecological Informatics
Article revision 2012/12/01

Glenda Angle and Hasan Coskun

Abstract

This paper is devoted to a detailed quantitative analysis of the vocal repertoire of a domestic male canary (Serinus Canarius) by multivariate statistical methods and by temporal and spectral feature extraction. In the first part of the paper, a combination of methods is employed to extract characteristic features, classify all syllables, and construct a complete seasonal syllable dictionary for the canary. In the second part, synthetic syllables corresponding to all entries in the dictionary are generated, providing a standardized framework for behavioral and neural studies of canaries.

Keywords: canary vocalization, feature extraction, syllable classification, synthetic syllables, song coding

1 Introduction

The structure of a canary (Serinus Canarius) song is diverse; it may contain components that are sinusoidal, harmonic, non-harmonic, and noisy in structure [15]. The study of acoustic features in canary syllables, the building blocks of songs, will facilitate methods for producing synthetic syllables and provide a standardized framework for future behavioral and neural studies.

The focus of this paper is to construct a seasonal syllable dictionary and to generate its corresponding synthetic version. The first part is carried out by computing temporal and spectral features of canary songs and identifying all distinct syllables. The second phase is completed with the help of certain multivariate statistical techniques.

To the best of our knowledge, a complete listing of seasonal syllables has not been generated before. Thorpe [21], along with Mulligan and Olsen, developed the first documented list of syllables. However, their classification was based on visual inspection of sonograms which, as they admit, was not a reliable quantitative method.
Thorpe’s focus was the study of canary communication and its significance in their social behavior. Peter Marler expanded on this work by utilizing a numerical representation of sounds, which he demonstrated on a few selected syllables [3]. However, a complete seasonal syllable dictionary generated by precise quantitative methods, as employed in this paper, was still missing. Some of the methods we use here were also used by Nottebohm [12] to compare changes that occur in the syllable repertoire of a group of canaries over the years, and by Somervuo and Härmä [7], who studied the classification of bird sounds based on their harmonic structures. Other researchers such as Konishi, Nottebohm, and Gardner have studied the neural structure of canaries, which has some connections to our work [3, 5, 6, 7, 12, 17].

In this study, we identified 25 distinct syllables in a library of approximately 100 songs recorded over a period of one year that spanned the pre- through post-copulation periods. Raven Pro 1.3 and Mathematica were used to carry out the numerical computations and statistical analysis. Raven Pro, a software application for the acquisition and analysis of acoustic signals developed by the Cornell Bioacoustics Research Program, provided a variety of recording and sound measurement tools for the syllable recognition phase. Mathematica was used to analyze the temporal and spectral features of songs and to compare them in order to identify distinct syllables. It was also used to produce their synthetic analogues, as Raven Pro offered limited capabilities in this regard.

The paper is organized as follows.
Section 2 covers the basic stages of canary song development and the song control system; Section 3 describes the song structure as a hierarchical organization; Section 4 describes the song recording practices; Section 5 covers the classification of syllables based on evaluation parameters for certain features; Section 6 describes the construction of synthetic syllables; Section 7 gives examples of song coding for pattern recognition; and Section 8 contains a discussion of results.

This work has been supported by the Advanced Research Program (ARP) grant 003656-0046-2007 awarded by the Texas Higher Education Coordinating Board.

2 Song Development

2.1 Development Stages

Canaries are vocal learners that imitate their tutor to produce normal songs [14]. Recent studies [6] have indicated that young canaries can also learn to imitate computer-generated synthetic songs that violate phrase patterns, but as the bird matures he edits his repertoire so that it is typical of an adult canary melody. A young male builds a large repertoire of song syllables through developmental stages that run from what is called subsong (about 40 days after hatching) to plastic song, and then to full or stable song (sometimes called the crystallized form) [4]. Learning song depends on auditory experience and motor practice: hearing others and oneself is essential to the normal vocal development of songbirds [16].

Early development begins with learning calls and trilling sounds at the preliminary subsong stage. In this phase the syllables are practiced to achieve pitch and timbre; only a small percentage of these syllables is retained through the stable song stage. In the plastic stage of vocal development, the frequency and time resolution no longer have fuzzy boundaries; the repeated syllables have structure and consistency.
Songs that develop into stable form last for the duration of a season, that is, until the end of each breeding season, at which time the male canary returns to the subsong stage to rebuild his syllable repertoire and then once again continues through the plastic and stable song stages. During the rebuilding of syllables, some of the original syllables are believed to disappear or possibly be forgotten, while others are transformed or reconstructed with maturity [13]. However, it is unclear whether the ability to develop new songs continues throughout the full life of a canary.

2.2 Canary Vocal Organs

A brief description of the canary vocal system is also helpful. The syrinx, a vocal organ located at the junction of the bronchi and the trachea, consists of movable tissue membranes called the medial labia (ML) and lateral labia (LL). It has a function similar to that of the glottis and vocal folds in humans. These connective tissues are set into oscillatory motion by air from the lungs forced through the bronchi into the syrinx; the sound is then modulated by the vocal tract, which acts as a resonator tube consisting of the trachea, larynx, mouth, and beak [2]. Syrinx muscles control the movement of the syrinx, which allows the bird to control the frequency and intensity of sounds by changing the air pressure passing from the lungs to the syrinx and by varying the tension on the membranes. The syrinx converts the energy of the expiratory airflow into sound in a manner similar to the larynx.

3 Song Structure

A canary song may be segmented into components of increasing complexity in a hierarchical structure. The smallest component, referred to as an element, is usually regarded as the smallest acoustic unit of vocalization. Syllables may be composed of one or more elements that occur together in a regular pattern. A sequence of the same syllable generates a phrase or trill, depending on the modulation rate. Songs are long, complex vocalizations that include some or all of the described components.
An example of the hierarchical canary vocal structure is shown in Figure 1.

Figure 1: Example of a canary song phrase spectrogram. The intensity of the sound is denoted by the gray-scale. The building blocks of a song are (1) the elements of a syllable, (2) the syllable composed of one or more elements, and (3) a train of syllables forming a phrase, which in turn makes up a part of the song.

An arrangement of three or more syllables in a sequential pattern that recurs in a large number of different songs is called a motif. A syllable may occur in different motifs, repeating or non-repeating. We found small motifs of 3–4 syllables to be more frequent in the breeding season than those consisting of more than 6 syllables [11].

4 Set Up and Methods

4.1 The Birds (Sun and Shine)

The sterilized domestic male canary used in this study was purchased at a nationally recognized pet store at the age of six months. The female was purchased at the same time through a private pet store. The cage that housed both birds measured 30″ × 18″ × 18″, and the two canaries had unrestricted physical interactions. Each clutch produced 4 eggs in a nest that was constructed from grass and paper by the female. The canary was housed in a closed room with parakeets, finches, and other canaries for at least one month in early April and May prior to purchase. The breeding season of the male lasted from the beginning of June until late November and produced several clutches; it was followed by moult beginning in December, at which time the canary did not sing.

4.2 Song Recording

Recordings were made using a Dynex microphone (DX-54) with a sensitivity of −53 dBV/µbar at 1 kHz and a DC power supply of 1.5 V to 10 V. The microphone was placed inside the cage at a position no less than 4 inches from the canary. Testing the microphone with tonal sounds showed two harmonics on the spectrogram, at the first and second multiples of the dominant frequency. All songs were recorded using Raven Pro 1.3.
Each signal was digitized at 44.1 kHz with 16-bit pulse code modulation (PCM). To ensure consistency across all recordings, the following parameters were chosen as the best fit. Spectrograms were created by DFT with a Hann window, a frame length of 256 points, and a temporal resolution of 5.8 ms (256 samples at 44.1 kHz). Additionally, a 50% frame overlap with a hop size of 2.9 ms and a frequency grid spacing of 172 Hz were used. The bandwidth was set at 3 dB. Approximately 100 songs were recorded under similar conditions and analyzed, providing a sufficient sample to obtain a complete seasonal syllable dictionary for the specific canary. From this rich library of songs, ten representative songs that contain all distinct syllables were selected for song coding and further analysis; they are listed in Table 5.

4.3 Song Analysis

The songs were recorded over a twelve-month span, throughout the day, with the mating season taking place from late June through November. Recognition of syllables was first performed on song spectrograms by visual inspection, with special focus on measuring frequencies, the duration of intervals between the repeated parts of trill phrases, full syllable duration, slope sweep, and overall form. Within a song, some syllables show slight variation in duration and amplitude. Some syllables recur frequently in the songs, while others are much rarer, as noted in Table 2, based on a sampling from the library of recorded songs. Note that syllables N, R, and S are not represented in any song, but appear as voiced sounds sporadically throughout the day. We will refer to these as call syllables.

A song is defined to be at least 1.5 seconds in length and may last approximately one minute. A new song begins if a time lapse or pause longer than 0.4 s occurs in the recording. Songs used in the mating context produce more complex (also called sexy) syllables. These same syllables also occur outside the breeding season, but at lower rates [19].
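The song definition above (a pause longer than 0.4 s starts a new song, and a song lasts at least 1.5 s) can be sketched as follows. This is only an illustration, not the procedure carried out in Raven Pro: it assumes that voiced intervals have already been detected and are given as (onset, offset) pairs in seconds, and the function name split_into_songs and the sample intervals are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (onset, offset) of one voiced segment, in seconds


def split_into_songs(voiced: List[Interval],
                     max_pause: float = 0.4,
                     min_length: float = 1.5) -> List[Interval]:
    """Group voiced intervals into songs.

    A gap longer than `max_pause` closes the current group; groups shorter
    than `min_length` are discarded because they do not qualify as songs.
    """
    songs: List[Interval] = []
    start: Optional[float] = None
    end: Optional[float] = None
    for onset, offset in sorted(voiced):
        if start is None:
            start, end = onset, offset
        elif onset - end > max_pause:
            # Pause is long enough: close the current group.
            if end - start >= min_length:
                songs.append((start, end))
            start, end = onset, offset
        else:
            end = max(end, offset)
    if start is not None and end - start >= min_length:
        songs.append((start, end))
    return songs


# Hypothetical voiced intervals (seconds), for illustration only.
example = [(0.0, 0.3), (0.45, 1.2), (1.3, 2.0),
           (3.0, 3.2), (5.0, 6.0), (6.2, 7.1)]
songs = split_into_songs(example)
```

In this example the isolated interval starting at 3.0 s is dropped because it is shorter than 1.5 s; under the definitions in the text it would be a candidate call rather than a song.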
The structural components of each syllable form the initial step in the analysis. Most syllables consist of two or three components; it is known that they may contain up to 6 components depending on the breed of canary. One-component syllables are present throughout most of the recorded songs, which is typical of domesticated canaries [11]. For the purpose of this paper, we consider tonal sounds such as syllable A repeated at different frequencies (known as frequency shift) to be the same syllable, as suggested by Clark and Marler [3]. That is, all syllables are held to the same approach in determining distinct structures, without considering possible frequency shifts as a factor.

A young male builds a large repertoire of song syllables varying between 12–40 distinct syllables per season [12, 1, 10]. That our canary produced twenty-five distinct syllables in the first year may be related to leaving out the frequency shift as a defining factor, and to his age, as the number of syllables is expected to increase with each additional season [12].

5 Classification of Syllables

The classification of syllables was carried out in five steps: (1) visual inspection of song spectrograms and selection of syllable candidates based on physical structure; (2) segmentation of spectrograms and determination of acoustic parameters for classification; (3) extraction of a set of features for each syllable; (4) computation of the correlation between syllable candidates and generation of a list of ten candidates for each syllable; (5) construction of a matrix representation for each syllable and calculation of the distance between candidates using a suitable norm. The details of this classification process are discussed in the following subsections. Even though some of the syllables can be recognized visually, this approach gives a precise quantitative measure of how syllables and phrases differ. Table 1 and Table 4 show a complete list of individual syllables and their corresponding features.
5.1 Syllable Sets

In the selection phase, ten representations of each syllable (called a set) were identified from a control group of ten songs. The songs were bandpass filtered to attenuate ambient noise [8] or low-frequency background noise; typically a lower cutoff of 100 Hz is used [2]. The elements in syllables of short duration were difficult to extract with reliable signal boundaries; thus additional freedom was used when manually isolating a single syllable. Twenty-five syllable sets were generated with this procedure, each labeled by a letter from A through Y; of these, {N, R, S} are call syllables that do not occur in songs.

To make the selection more precise, a quantitative comparison was performed by sliding two spectrograms or waveforms past each other in time while computing the correlation. As stated earlier, a bandpass filter may be applied prior to the correlation process to reduce the amount of noise in the signals if the energy outside the frequency band is significant. Once a syllable candidate was identified and set as the control syllable, the position of the peak correlation was identified against all other syllables in the songs. Syllables were considered to be candidates in the same syllable set if their correlation was higher than 0.230, even though there may be slight variations in their durations, formats, and frequency modulations. The correlation process was repeated for each syllable to capture ten occurrences of the same syllable candidate in all recorded songs.

5.2 Feature Extraction

The form of a canary syllable may range from a simple single frequency sweep (tonal syllable) to a complex syllable with multiple elements. The frequency modulation may vary sharply from low to very high values or vice versa. Syllables with a fast frequency modulation of about 13 s⁻¹ and high repetition rates that contain more than one element (called sexy syllables [20, 19]) are important in female mating choice.
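The sliding correlation of Section 5.1 can be sketched as a normalized cross-correlation between a control syllable and a candidate. The text does not state which normalization Raven Pro applies, so the mean removal and scaling below are assumptions, as are the function name peak_correlation and the synthetic test signals; only the threshold 0.230 comes from the text.

```python
import numpy as np


def peak_correlation(control, candidate):
    """Slide `candidate` past `control`; return (peak value, lag in samples).

    Both signals are mean-removed and the result is divided by the product
    of their Euclidean norms, so the peak lies in [-1, 1].
    """
    a = np.asarray(control, dtype=float)
    b = np.asarray(candidate, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0, 0
    corr = np.correlate(a, b, mode="full") / denom
    k = int(np.argmax(corr))
    lag = k - (len(b) - 1)  # lag at which candidate[0] aligns with control[lag]
    return float(corr[k]), lag


# Hypothetical signals: a 3 kHz Hann-windowed tone burst sampled at 44.1 kHz,
# and the same burst placed 50 samples into an otherwise silent recording.
fs = 44100
t = np.arange(200) / fs
burst = np.hanning(200) * np.sin(2 * np.pi * 3000 * t)
recording = np.concatenate([np.zeros(50), burst, np.zeros(30)])

peak, lag = peak_correlation(recording, burst)
same_set = peak > 0.230  # membership threshold quoted in Section 5.1
```

The returned lag is what an alignment step would use to position each candidate against the control syllable before any further comparison.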
In addition to the modulation, our classification scheme utilizes the duration (s), dominant frequency (Hz), and peak power (dB) of each syllable. The objective of feature extraction is to identify distinct categories in which syllables are similar or different. Measures such as duration, dominant frequency, average energy, and signal power are presented in Table 1. The entries in the table correspond to µ_f ± 2σ_f for each feature f, where µ_f is the average and σ_f is the standard deviation over each syllable set. Additional features are listed in Table 2 to represent physical or perceptual aspects of distinct syllables. Physical features calculated from the sound wave include the spectral centroid, signal bandwidth, and zero crossing rate. Loudness, pitch, and brightness, on the other hand, are perceptual features. We kept the number of classification features moderate so that irrelevant features would not impair the classification. A brief description of the features used is given below:

1. Spectral Centroid (SC): The brightness of the sound is referred to as the spectral centroid, or center point of the spectrum; the higher the centroid, the brighter the related sound. The spectral centroid is calculated by [5]:

\[ SC = \frac{\sum_{n=0}^{M} n\,|X(n)|^{2}}{\sum_{n=0}^{M} |X(n)|^{2}} \tag{1} \]

where X is the DFT of the sound and M is half the size of the DFT.

2. Signal Bandwidth (BW): The width of the frequency band of the signal around the SC is called the signal bandwidth, which is calculated by [5]:

\[ BW = \sqrt{\frac{\sum_{n=0}^{M} (n - SC)^{2}\,|X(n)|^{2}}{\sum_{n=0}^{M} |X(n)|^{2}}} \tag{2} \]

where SC, X, and M are as previously stated. The bandwidth of a syllable is the range between the lower and upper cutoff frequencies at which the DFT magnitude is −3 dB.

3. Spectral Rolloff Frequency (SRF): SRF is related to the skewness of the spectral shape; it is the value below which a given fraction of the power distribution lies. This measure can distinguish sounds with different frequency ranges [5]. For a threshold e between 0 and 1, it is defined by

\[ SRF = \max\Big\{ K : \sum_{n=0}^{K} |X(n)|^{2} < e \sum_{n=0}^{M} |X(n)|^{2} \Big\} \tag{3} \]

4. Spectral Flatness (SF): Spectral flatness can differentiate the voiced from the unvoiced parts of the signal (which can occupy the same frequency range), giving a value near 0 dB for noise and a strongly negative value for the voiced (tonal) signal. Let

\[ G_m = \Big( \prod_{n=0}^{M} |X(n)| \Big)^{1/M}, \qquad A_m = \frac{1}{M} \sum_{n=0}^{M} |X(n)| \]

be the geometric and arithmetic means of the magnitude values of the spectral points X(n). Spectral flatness is the ratio of G_m to A_m of the signal's power spectrum, expressed in decibels:

\[ SF = 10 \log_{10} \frac{G_m}{A_m} \tag{4} \]

This is also known to measure the tonality (harmony of the focal point) of the syllable signal. Since G_m ≤ A_m, the ratio G_m/A_m is at most 1, so SF is at most 0 dB. A value near −60 dB indicates that the signal is very tone-like, whereas a value near 0 dB indicates that the signal is noisier [18].

5. Signal Energy (SE): The energy of a syllable signal is the sum of the squared magnitudes of the samples. That is,

\[ SE = \sum_{n=0}^{N-1} |X(n)|^{2} \tag{5} \]

where X denotes the DFT as before.

6. Zero Crossing Rate (ZCR): This is the number of times adjacent time-domain samples have different signs. Similar to the spectral centroid, ZCR measures the spectral shape of the syllable.

7. Frequency (FQ): The frequency of occurrence of each syllable was tallied against the control group of ten songs, including partial syllables that may occur at the beginning or end of a phrase.

Table 1: A summary of detailed syllable analysis. Columns 2 and 3 give physical features based on slope, number of elements, and frequency. Columns 4 - 8 provide measurements from each syllable set (of ten of its own type) in seconds, Hz, and dB respectively. The acronym FS stands for frequency sweep, R for the repetition, D for duration, S for spacing, DF for dominant frequency, PP for peak power, and E for energy.

     FS         R        D (s)         S (s)          DF (Hz)         PP (dB)      E (dB)
A    Linear     Single   0.30 ± .020   −              3824.2 ± 773    88.5 ± 2.4   79.6 ± 2.9
B    Complex    8-11     0.08 ± .002   0.001 ± .000   3005.1 ± 091    75.3 ± 7.1   65.2 ± 1.6
C    Upsweep    8-12     0.05 ± .004   0.010 ± .010   4842.6 ± 104    90.1 ± 4.2   74.1 ± 4.0
D    Complex    9-13     0.03 ± .004   0.010 ± .010   3359.2 ± 181    78.4 ± 6.8   61.3 ± 5.1
E    Complex    6-8      0.10 ± .004   0.020 ± .015   2879.3 ± 840    84.4 ± 1.0   71.4 ± .80
F    Complex    5-9      0.03 ± .003   0.010 ± .001   3808.9 ± 664    82.0 ± 3.8   66.7 ± 8.2
G    Downsweep  4-6      0.07 ± .010   0.080 ± .010   4162.1 ± 298    88.6 ± .62   75.1 ± .75
H    Complex    12-13    0.05 ± .005   0.015 ± .015   2202.7 ± 892    85.5 ± 4.7   67.3 ± 3.7
I    Complex    14-19    0.03 ± .002   0.007 ± .006   3315.9 ± 153    82.4 ± 2.7   63.4 ± 1.6
J    Complex    3-6      0.12 ± .008   0.001 ± .001   4324.6 ± 265    90.5 ± 4.2   76.3 ± 3.7
K    Complex    4-11     0.05 ± .003   0.030 ± .002   3617.4 ± 414    84.4 ± 5.1   68.1 ± 3.8
L    Upsweep    18-21    0.03 ± .005   0.001 ± .002   3762.8 ± 224    79.6 ± 2.3   62.8 ± 1.7
M    Linear     Single   0.30 ± .020   0.200 ± .020   3617.6 ± 227    92.2 ± 3.7   81.8 ± 3.8
N    Linear     Single   0.30 ± .030   −              4134.4 ± 608    79.2 ± 4.4   71.1 ± 4.1
O    Complex    5-8      0.08 ± .008   0.005 ± .004   3904.6 ± 1120   79.3 ± 3.4   63.8 ± 3.3
P    Complex    3-14     0.08 ± .030   0.030 ± .009   2628.5 ± 876    82.9 ± 4.3   67.0 ± 3.1
Q    Linear     8-16     0.03 ± .005   0.030 ± .009   2476.9 ± 95.1   84.9 ± 2.6   66.2 ± 2.5
R    Linear     Single   0.27 ± .001   −              3617.6 ± 200    74.2 ± 2.0   65.7 ± 1.0
S    Complex    5        0.30 ± .020   −              3617.6 ± 182    89.1 ± 2.8   77.1 ± 2.7
T    Upsweep    17-31    0.03 ± .004   -.003 ± .003   2618.3 ± 109    79.0 ± 2.8   63.2 ± 2.0
U    Upsweep    15-29    0.04 ± .006   0.000 ± .003   2618.4 ± 730    75.6 ± 2.4   60.9 ± 2.6
V    Linear     2-8      0.06 ± .010   0.003 ± .002   3678.6 ± 160    91.5 ± 3.9   75.5 ± 3.5
W    Downsweep  5-12     0.04 ± .005   0.002 ± .001   5052.7 ± 272    91.2 ± 2.1   74.3 ± 2.4
X    Upsweep    8-11     0.06 ± .010   0.010 ± .002   3962.1 ± 333    85.9 ± 3.1   71.5 ± 3.2
Y    Complex    3-6      0.07 ± .006   0.008 ± .009   3812.1 ± 310    82.8 ± 6.6   67.1 ± 6.9

Table 2: A list of syllable temporal and spectral features: SC stands for spectral centroid, SB for signal bandwidth, SRF for spectral roll-off frequency, SF for spectral flatness, SE for short time signal energy, ZCR for zero crossing rate, and FQ for frequency.

     SC        SB         SRF    SF         SE         ZCR    FQ
A    1055.05   421179     1064   -8.45417   0.284780   2059   0.95%
B    44.57     3458.48    243    -6.38091   0.002079   164    0.86%
C    213.14    1405.10    232    -8.13140   0.010897   436    2.85%
D    81.64     1285.36    93     -7.47388   0.005879   171    3.04%
E    300.76    5241.04    311    -6.23062   0.007454   574    3.80%
F    112.88    782.71     129    -8.00855   0.020725   238    2.19%
G    267.05    1943.51    304    -8.21106   0.017742   497    5.23%
H    164.04    6125.11    223    -4.51229   0.003083   378    15.78%
I    124.44    1500.36    132    -6.24195   0.008000   248    6.65%
J    492.64    2489.19    517    -8.08265   0.128935   1025   4.18%
K    238.53    1968.51    265    -7.39733   0.020423   452    5.04%
L    107.06    1569.26    116    -6.59664   0.001861   214    2.28%
M    1037.37   7830.64    1183   -8.03890   0.190536   2201   4.56%
N    1149.14   15575.40   1470   -6.42954   0.028518   2238   0%
O    288.37    4890.42    382    -5.97549   0.003465   598    0.67%
P    238.91    3477.71    361    -6.94139   0.008836   506    2.66%
Q    103.50    2224.92    115    -6.89264   0.002899   222    4.94%
R    1247.02   6577.77    1265   -5.34178   1.395683   2486   0.0%
S    1021.58   8893.46    1118   -6.55131   0.131217   2020   0.0%
T    78.57     1022.54    80     -7.16622   0.006511   154    12.64%
U    131.70    2087.87    214    -6.18159   0.004082   267    8.75%
V    268.71    1287.99    314    -8.70540   0.078290   572    5.23%
W    215.78    803.30     227    -7.96179   0.036380   414    2.00%
X    293.14    1042.58    306    -7.97262   0.115003   570    7.70%
Y    235.73    1798.20    275    -8.17444   0.002933   466    2.57%

5.3 Distance Matrix

We constructed a matrix representation for each syllable using the spectrogram data, and defined a norm to calculate distances between candidate syllables. The spectrogram matrix is partitioned into N × N blocks denoted by b_{i,j} for i, j = 1, 2, ..., N, and the average \bar{b}_{i,j} is calculated for each block as shown below.

\[ \begin{pmatrix}
\bar{b}_{1,1} & \bar{b}_{1,2} & \cdots & \bar{b}_{1,N-1} & \bar{b}_{1,N} \\
\bar{b}_{2,1} & \bar{b}_{2,2} & \cdots & \bar{b}_{2,N-1} & \bar{b}_{2,N} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\bar{b}_{N-1,1} & \bar{b}_{N-1,2} & \cdots & \bar{b}_{N-1,N-1} & \bar{b}_{N-1,N} \\
\bar{b}_{N,1} & \bar{b}_{N,2} & \cdots & \bar{b}_{N,N-1} & \bar{b}_{N,N}
\end{pmatrix} \]

The distance between the matrices of the aligned syllables A and B of the same dimensions is then calculated by the distance formula

\[ \operatorname{dist}(A, B) := \Big( \sum_{i=1}^{N} \sum_{j=1}^{N} \big( \bar{b}^{A}_{i,j} - \bar{b}^{B}_{i,j} \big)^{2} \Big)^{1/2} \tag{6} \]

A distance value close to zero between two syllables is expected to indicate a strong resemblance. However, caution should be taken, as the averaging of matrix entries may result in false indications. In this paper, we used the distance matrix in addition to the other features given above to identify distinct syllables. The distance matrix for the case N = 20 is given in Table 3 for the synthetic syllables created to represent each set, as explained in the next section.

6 Synthetic Syllables

In order to provide a standard framework for further studies in canary vocalization, we have constructed a synthetic model for each syllable in two steps. In the first step, each syllable in the set was compared to the control syllable to identify the exact position of the maximum correlation value, thus aligning each syllable in the set to be convolved with the control syllable. We then applied an impulse response (convolution kernel), generating an array of 10 output signals of the same size as the shortest signal, which is chosen to be our control syllable. The process produced filtered syllables of equal length without losing any essential features. The aligned and convolved syllables were then averaged together, and the result was normalized with respect to the amplitude, forming a unique synthetic syllable representative of the original set. This process was repeated for each of the 25 distinct syllable sets, generating a synthetic dictionary of 25 syllables. The resulting synthetic syllable spectrograms are shown in Table 4 for each of the 25 syllables.

7 Song Coding

A canary song ranges in duration from a minimum of 5 seconds to in excess of 2 minutes.
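Looking back at the block-averaged distance of Section 5.3 (Eq. 6), the computation can be sketched in a few lines. This is an illustration under stated assumptions rather than the Mathematica code used for Table 3: the spectrograms are taken to be aligned 2-D NumPy arrays of equal shape, uneven tile sizes are handled with np.array_split (the text does not say how this case is treated), and the names block_means and block_distance are hypothetical.

```python
import numpy as np


def block_means(spec: np.ndarray, n_blocks: int) -> np.ndarray:
    """Partition a 2-D spectrogram into n_blocks x n_blocks tiles and average each."""
    if spec.ndim != 2 or min(spec.shape) < n_blocks:
        raise ValueError("spectrogram must be 2-D with at least n_blocks rows and columns")
    rows = np.array_split(np.arange(spec.shape[0]), n_blocks)
    cols = np.array_split(np.arange(spec.shape[1]), n_blocks)
    return np.array([[spec[np.ix_(r, c)].mean() for c in cols] for r in rows])


def block_distance(spec_a: np.ndarray, spec_b: np.ndarray, n_blocks: int = 20) -> float:
    """Eq. (6): Euclidean norm of the difference of the block-mean matrices."""
    if spec_a.shape != spec_b.shape:
        raise ValueError("spectrograms must have the same dimensions")
    diff = block_means(spec_a, n_blocks) - block_means(spec_b, n_blocks)
    return float(np.sqrt(np.sum(diff ** 2)))


# Hypothetical spectrograms: constant arrays make the result easy to check by hand.
a = np.ones((40, 60))
b = np.zeros((40, 60))
d_ab = block_distance(a, b)   # 400 block differences of 1 -> sqrt(400)
d_aa = block_distance(a, a)
```

Because each tile is reduced to its mean, two spectrograms with different fine structure but equal tile averages would also give a distance of zero, which is the caution raised in Section 5.3.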
From the library of songs generated for this project, we selected 10 songs that represented various song sizes and all syllables in the dictionary. Each of the ten songs has been coded based on the syllable types and the duration of time between distinct syllables, that is, the unvoiced space. For simplicity, we do not give time lengths between syllables in phrases or trills consisting of at least 4 syllables. Unvoiced space outside trill phrases is marked by the notation .x., which indicates 0.x seconds. All syllables are counted, including any partial syllables; for example, L10 denotes the syllable L repeated continuously 10 times. Figure 2 demonstrates the coding process.

Figure 2: Coding of a phrase. The first syllable A is repeated twice with an unvoiced time space, followed by the trill phrases of syllables H and W. The coding is: A1 .13. A1 .14. H11 .05. W4

A complete coding for the ten songs is given in Table 5. The percentage of occurrence for each syllable is listed in the last column of Table 2. The coding process brings to light the significance of pattern sequences that appear in canary songs, referred to as motifs; namely, a sequence of at least 3 different syllables that are repeated sequentially. We expect that the coding scheme developed in this paper will be of use in behavioral and neural studies of canaries. It is observed, for example, that during copulation the canary remains perched in one location until the M syllable is reached, at which time he takes flight chasing the female. A detailed analysis will be given in an upcoming publication.

8 Discussion

Canary syllables vary significantly in duration, ranging from 0.3 s (syllables M and N, for example) to 0.01 s (syllable I). Most syllables, however, have short durations and are contained in a successive repeated phrase, as shown in Table 1. The few syllables that have long duration are the long whistle, the tonal syllable A, and the call syllables N, R, and S.
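The coding notation of Section 7 is regular enough to be read by a small parser, sketched below under stated assumptions: tokens are separated by whitespace, a syllable token is one letter A through Y followed by its repetition count, and a pause token .x. stands for 0.x seconds. The function name parse_song_code is hypothetical; the example string is the coding quoted in the caption of Figure 2.

```python
import re
from typing import List, Tuple, Union

# A token is either (syllable letter, repetition count) or a pause in seconds.
Token = Union[Tuple[str, int], float]

_SYLLABLE = re.compile(r"^([A-Y])(\d+)$")
_PAUSE = re.compile(r"^\.(\d+)\.$")


def parse_song_code(code: str) -> List[Token]:
    """Parse a coded phrase such as 'A1 .13. A1 .14. H11 .05. W4'."""
    tokens: List[Token] = []
    for item in code.split():
        syllable = _SYLLABLE.match(item)
        if syllable:
            tokens.append((syllable.group(1), int(syllable.group(2))))
            continue
        pause = _PAUSE.match(item)
        if pause:
            tokens.append(float("0." + pause.group(1)))  # .13. -> 0.13 s
            continue
        raise ValueError(f"unrecognized token: {item!r}")
    return tokens


phrase = parse_song_code("A1 .13. A1 .14. H11 .05. W4")
syllable_count = sum(t[1] for t in phrase if isinstance(t, tuple))
unvoiced_time = sum(t for t in phrase if isinstance(t, float))
```

Once songs are in this form, tallying syllable occurrences or searching for recurring sequences of letters reduces to ordinary list processing.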
The one exception is syllable M, which occurs in most songs and tends to be near the end of the song and/or preflight. We identified five 'sexy' syllables, B, H, O, P, and Y, with multiple elements, high repetition, and a high frequency modulation rate; the most and least frequent of these appear to be H and O, respectively.

Lehongre et al. [9] found individual-specific syllables in about 16% of a canary's repertoire, as well as individual-specific sequential arrangements of motifs (sequences of three or more distinct syllables). Such specifications give a unique identity to a song, suggesting that it contains individual information whereby a female can identify a male. Our findings also verified syllables arranged in successive repeated patterns of varied length occurring in most songs. Certain short syllable sequences such as D-F and T-U occurred repeatedly during song, with T-U having the longest duration of any syllable sequence (2 s). In the coding of the 10 representative songs, we identified the motifs D-F-E-H and J-G-K-V in Table 5.

The vocal performances of our sterilized domesticated canary displayed various patterns. We have classified the physical structure of syllables as linear, upsweep arch, downsweep arch, or complex in Table 1. Most syllables are complex in shape, with a combination of negative, positive, and/or constant slope. The syllables range from near tonal to complex, with a dominant frequency range of 2400 Hz to 4600 Hz. The size of the unique seasonal dictionary appears to vary with age, ranging between 12–40 distinct syllables in other studies [20, 11, 10]. The size of 25 distinct syllables is consistent with this range, as our canary was approximately one year old.

This work seems to be the first to give a detailed quantitative analysis and classification of the vocal repertoire in canaries. Further studies are necessary to determine, in quantitative measures, seasonal changes in their repertoire.
The results verify and demonstrate the use of diagnostic techniques in songbird vocalization, and set a framework and illustrate possible applications in the areas of song production and animal behavior. 14 List of Figures 1 Example of a canary song phrase spectrogram. The intensity of the sound is denoted by the gray-scale. The building blocks of a song are (1) the elements of syllable, (2) the syllable composed of one or more elements (3), and a train of syllables forming a phrase, which in turn makes a part of the song. 2 . . . . . . 4 Coding of a phrase. The first syllable A is repeated twice with an unvoiced time space followed by the trill phrases of syllables H and W. The coding is: A1 .13. A1 .14. H11 .05. W4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 List of Tables 1 A summary of detailed syllable analysis. Columns 2 and 3 give physical features based on slope, number of elements, and frequency. Columns 4 - 8 provide measurements from each syllable set (of ten of its own type) in seconds, Hz, and dB respectively. The acronym FS stands for frequency sweep, R for the repetition, D for duration, S for spacing, DF for dominant frequency, PP for peak power, and E for energy. 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 A list of syllable temporal and spectral features: SC stands for spectral centroid, SW for signal bandwidth, SRF for spectral roll–off frequency, SF for spectral flatness, SE for short time signal energy, ZCR for zero crossing rate, and FQ for frequency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 The matrix lists distances (×103 ) between distinct syllables. 4 Each syllable is classified by an identification card consisting of the averaged . . . . . . . . . . normalized syllables clipped spectrogram generated by a Mathematica. 5 . . . . 10 18 19 Song coding of 10 songs from catalog of seasonal songs that best represent each syllable. 
The syllables are stated first followed by frequency and unvoiced space between syllables measured in seconds. 15 . . . . . . . . . . . . . . . . . . . 22 References [1] C.K. Catchpole and P.J. Slater, Bird song: biological themes and variations, Cambridge Univ. Press, Cambridge (2005), 163–199. [2] Zhixin Chen and Robert C. Maher, Semi-automatic classification of bird vocalizations using spectral tracks, Acoustical Society of America 120 (2006), 2974– 2984. [3] Christopher W. Clark, Peter Marler, and Kim Beeman, Quantitative analysis of animal vocal phonolgy: an application to swamp sparrow song, Ethology 76. [4] Paul R. Ehrlich, D. Dobkin, and D. Wheye, Vocal development. [5] Sappo Fagerlund, Acoustics and physical models of bird sounds, (2004). [6] Timothy J. Gardner, F. Naef, and F. Nottebohm, Freedom and rules: the acquisition and reprogramming of a bird’s learned song, Science 308 (2005), 1046–1049. [7] A. Harma, Classification of the harmonic structure in bird vocalization. [8] O.N. Larsen and F. Goller, Role of syringeal vibrations in bird vocalizations’, Proc. R Soc. London 266. [9] Katia Lehongre, Thierry Aubin, Stephane Robin, and Catherine Del Negro, Individual singature in canary songs: contribution of multiple levels of song structure, Ethology 114 (2008), 425–435. [10] Stefan Leitner and Clive K. Catchpole, Syllable repertoire and the size of the song control system in captive canaries (serinus canaria), Neurobiol 60 (2004), 21–27. [11] Stefan Leitner, Cornelia Voigt, and Manfred Gahr, Seasonal changes in the song pattern of the non-domesticated island canary, Behavoir 138 (2001), 885–904. [12] F. Nottebohm and M. E. Nottebohm, Relationship between song repertoire and age in the canary, Z. Tierpsychol. 46 (1978), 298–305. [13] F. Nottebohm, M.E. Nottebohm, and L. Crane, Developmental and seasonal changes in canary song and their relation to changes in the anatomy of song-control nuclei, Behav Neural Biol 46 (1986), 445–471. 
[14] Fernando Nottebohm, The neural basis of birdsong, PLoS Biol. 3 (2005).
[15] S. Nowicki, Bird acoustics, John Wiley & Sons.
[16] Henri Ouellet, D.E. Kroodsma, and E.H. Miller, Acoustic communication in birds, Academic Press.
[17] Arja Selin and J. T., Bird sound classification and recognition using wavelets, EURASIP Journal on Applied Signal Processing 2007 (2007), 141.
[18] Panu Somervuo, Aki Harma, and Seppo Fagerlund, Parametric representations of bird sounds for automatic species recognition, IEEE 14 (2006), 2252–2263.
[19] E. Vallet and M. Kreutzer, Female canaries are sexually responsive to special song phrases, Animal Behavior 49 (1995), 1603–1610.
[20] E. Vallet, M. Kreutzer, and Irina Beme, Two-note syllables in canary songs elicit high levels of sexual display, Animal Behavior 55 (1998), 291–297.
[21] W.H. Thorpe, J. M., and K. O., Communication in courtship calls, Prentice-Hall, New Jersey, 1954.

Table 3: The matrix lists distances (×10^3) between distinct syllables.
Lower triangle of the symmetric distance matrix, rows A–Y, shown in two column blocks.

Columns A–M:

       A     B     C     D     E     F     G     H     I     J     K     L     M
A      0
B   63.3     0
C   51.5  44.1     0
D   58.6  43.2  45.6     0
E   57.5  26.4  38.2  32.7     0
F   37.4  77.4  58.1  67.4  70.2     0
G   44.6  5.00  33.6  49.1  41.2  41.9     0
H   57.3  29.5  39.3  37.2  21.1  73.6  44.8     0
I   26.0  50.7  43.8  47.6  42.6  38.6  33.2  45.9     0
J   65.5  80.1  57.1  77.8  74.7  51.7  46.0  76.2  58.4     0
K   31.2  54.0  37.9  47.4  46.8  39.2  36.6  50.5  28.3  65.8     0
L   43.3  31.4  34.5  34.6  19.1  58.8  33.6  22.4  29.2  65.4  38.7     0
M   44.0  53.7  50.2  52.3  43.0  44.9  26.8  50.3  30.4  55.3  42.2  36.1     0
N   50.3  31.3  27.5  39.2  20.3  64.4  37.3  23.7  37.6  67.8  39.3  19.4  43.6
O   52.0  29.2  32.3  40.0  20.9  70.4  44.0  22.6  42.0  76.4  42.7  20.6  49.3
P   59.9  30.7  35.6  35.4  17.3  68.4  39.1  27.3  47.0  71.8  46.2  25.7  46.4
Q   60.0  31.3  41.0  19.3  17.7  71.5  46.1  27.3  46.5  77.8  48.2  25.7  49.1
R   80.4   128   114   117   125  77.9   104   124  91.5   108  90.3   112   100
S   46.3  55.9  51.8  53.3  49.0  55.9  44.4  48.7  38.9  67.8  46.7  40.2  43.9
T   63.8  35.7  45.2  19.3  22.4  73.1  49.8  34.3  50.7  80.4  51.4  31.9  52.2
U   54.2  29.9  34.8  25.0  13.2  64.5  37.4  25.9  39.6  70.4  42.4  20.4  41.1
V   73.9  90.5  72.8  91.2  85.8  80.6  73.1  81.0  78.2  78.1  82.7  78.4  73.4
W   94.7  88.5  76.7  93.4  85.6   100  87.5  88.6  90.2  91.3  91.9  86.1  89.5
X   10.3   108  74.0   105   106  89.6  83.4   106   102  73.7  94.5   103   100
Y   47.1  35.3  39.1  39.4  31.5  64.0  43.4  30.8  40.8  72.3  45.2  26.7  46.0

Columns N–Y:

       N     O     P     Q     R     S     T     U     V     W     X     Y
N      0
O   17.3     0
P   23.9  25.7     0
Q   28.5  28.3  22.7     0
R    117   121   126   124     0
S   46.8  49.4  52.4  51.7   104     0
T   34.4  35.0  26.1  13.6   126  54.7     0
U   21.5  24.7  19.2  14.3   120  45.7  17.2     0
V   77.9  83.4  85.8  89.9   114  65.6  91.6  83.8     0
W   75.8  82.9  84.5  92.6   133  89.2  90.6  85.9  97.8     0
X   95.8   103  98.5   107   135   105   108   101   105   101     0
Y   30.2  29.9  36.5  30.0   113  37.3  38.3  32.1  70.6  90.0   104     0

Table 4: Each syllable is classified by an identification card consisting of the averaged, normalized, clipped spectrogram of the syllable, generated in Mathematica.

Table 5: Song coding of 10 songs from the catalog of seasonal songs that best represent each syllable. The syllables are stated first, followed by frequency and the unvoiced space between syllables measured in seconds.

Song  Syllable Coding
1     O7 .1. H14 .4. X11 .25. M5 .1. H13 .05. Q16 .3. W8 .15. X14 .1. J4G4 .1. M1 .1. M1 .1. M1 .05. M1 .15. I19 .05. Q8 .1. E4
2     T26U15 .3. D10F5 .2. E6 .05. H11 .2. J3G9 .2. M1 .1. M1 .1. M1 .05. M1 .2. I8C8 .2. J4G5K10V5 .2. C8 .2. J6K3 .05. M1
3     H12 .15. V6 .025. A2 .05. H17 .275. T21U17 .2.7. C14 .1. J6G3K4V4 .1. M1 .1. M1 .1. M1 .05. M1 .1. I16V1 .15. V7 .1. M1 .1. M1 .1. M1 .1. M1 .05. M1 .05. M1 .05. M1
4     T24U32 .15. A1 .1. H12V1 .1. V8 .1. M1 .1. M1 .1. M1 .1. M1 .05. G5V2
5     H11V3J4P7K11 .2. A1 .025. A1 .1. I12X9 .05. J6G5 .1. M1 .1. M1 .1. M1 .05. M1 .1. H14
6     D11F9 .3. T19 .55. Q14 .1. E9 .1. H12 .15. V6W1 .1. M1 .1. M1 .1. M1 .05. M1 .25. X10 .2. K4V2K3V2 .65. W12K6P3 .1. M1 .05. M1 .05. M1 .05. M1 .1. H12 .3. V8 .1. M1 .1. M1 .05. M1G3 .1. A1 .1. H13 .8. I5E6Q14 .1. A1 .05. A1 .1. E6 .1. H3X8J6G4P7K3
7     X15E4 .02. H10 .32. J4G12 .15. M1 .15. M1 .18. M1 .01. M1 .03. I10X8
8     L24 .28. T16E3 .13. H10 .06. J1 .24. D11F9 .27. T17 .8. Y7E2 .23. P6G5
9     T12U10 .49. K9 .55. Y12 .22. H2P5
10    A1 .07. A1 .25. T17U18X14 .63. B9
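The coding notation used in Figure 2 and Table 5 lends itself to mechanical parsing. The following is a minimal Python sketch, not part of the original analysis. It assumes our reading of the notation: a letter A–Y followed by an integer is a syllable and its repetition count, adjacent letter-number pairs with no gap between them form one token, and a token of the form .13. is an unvoiced space of 0.13 seconds. The function names parse_song_code and repetitions are hypothetical.

```python
import re
from collections import Counter

# Assumed token shapes (our reading of the notation, not the authors' code):
GAP = re.compile(r"\.\d+\.")          # ".13." -> 0.13 s of unvoiced space
GROUP = re.compile(r"(?:[A-Y]\d+)+")  # "J4G5" -> J x4 then G x5, no gap between
UNIT = re.compile(r"([A-Y])(\d+)")    # one syllable letter and its count


def parse_song_code(code):
    """Turn a coded phrase into ('syllable', letter, count) / ('gap', seconds) events."""
    events = []
    for token in code.split():
        if GAP.fullmatch(token):
            # Drop the trailing dot: ".13." -> ".13" -> 0.13
            events.append(("gap", float(token[:-1])))
        elif GROUP.fullmatch(token):
            for letter, count in UNIT.findall(token):
                events.append(("syllable", letter, int(count)))
        else:
            # Malformed tokens are rejected rather than guessed at.
            raise ValueError(f"unrecognized token: {token!r}")
    return events


def repetitions(events):
    """Total repetition count per syllable letter."""
    totals = Counter()
    for event in events:
        if event[0] == "syllable":
            totals[event[1]] += event[2]
    return dict(totals)


if __name__ == "__main__":
    phrase = "A1 .13. A1 .14. H11 .05. W4"  # the phrase coded in Figure 2
    for event in parse_song_code(phrase):
        print(event)
```

Rejecting unrecognized tokens is a deliberate choice in this sketch: an entry such as .2.7. in Song 3 does not fit either assumed pattern, and raising an error flags it for manual inspection instead of silently assigning it a duration.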