Stops in CV-syllables
Fant, G. (1969). Stops in CV-syllables. Dept. for Speech, Music and Hearing, Quarterly Progress and Status Report (STL-QPSR), vol. 10, no. 4, pp. 001-025. http://www.speech.kth.se/qpsr

Fig. I-A-1. Spectrograms of [p][t][k][b][d][g] before the vowel [a:]. Subject B.L.
Fig. I-A-2. Spectrograms of [p][t][k][b][d][g] before the vowel [i:]. Subject B.L.
Fig. I-A-3. Spectrograms of [p][t][k][b][d][g] before the vowel [ø:]. Subject B.L.
Fig. I-A-4. Spectrograms of [p][t][k][b][d][g] before the vowel [u:]. Subject B.L.

(3) Fricative segment. This is characterized by a noise produced at the consonantal constriction as in a homorganic fricative. Zeros interact so as to cancel "back cavity" formants, while "front cavity" formants prevail.

(4) Aspirative segment.
This is characterized by an "h-like" noise originating from a random source at the glottis or from a supraglottal source at a relatively wide constriction exciting all formants. F2, F3, and F4 are the most typical constituents. The aspirative segment can in part co-occur with the fricative segment but takes over as the degree of articulatory opening proceeds.

(5) The initial part of a following voiced sound to the extent that it is influenced by coarticulation with the stop.

This complete sequence is typical of aspirated [t] and [k] whereas the frictional phase is rather weak or absent in [p] because of the low noise generating efficiency of the bilabial structure and the rapid delabialization. In the terminology of the classical Haskins' synthesis the transient plus friction plus aspiration segment is treated as a single segment called the "burst".

Voiced Swedish stops lack the aspiration phase. A short frictional segment can be seen in [d] and [g]. The [g] has the most apparent transient segment which can be ascribed to the high Q of the front cavity resonance and lacking dispersion effects of the main formant. Accordingly, the duration of the [g] and [k] transients is longer than in any other stops.

Uninterrupted voicing can be superimposed during all phases of [b][d] and [g] in which case the transient appears as an extra spike in the background of voicing. It is more typical, however, that voicing is absent or very weak, some 50 ms before and 10-30 ms after the transient.
The period of weak voicing after the release thus corresponds to the aspirative segment (4) in unvoiced stops and may coincide with a fricative segment in [d] and [g]. We shall return to the production theory in the later feature discussion.

A first inspection of spectrograms, see Figs. I-A-1 - I-A-4, reveals some apparent aspects of the voiced-voiceless distinction. The unvoiced stops [k][p][t] have a "burst", i.e. an unvoiced segment defined by the distance from release transient to the full voicing onset in the following vowel, which is of the order of 125 ms compared with 10-25 ms for the corresponding segment in [b][d][g]. However, the duration of the voiced part of the vowel is approximately the same after all stops, voiced as well as unvoiced. Thus the temporal organization is not simply a matter of delay in voicing in [k][p][t] compared with [b][d][g] at the expense of the vowel length.

One factor that seems to add to the observed durations of close Swedish vowels is that they tend to be diphthongized with a homorganic fricative. Since my criterion for the termination of the vowel segment was the disappearance of F1 in the spectrogram the vowel accordingly incorporates any frictional termination which is voiced. As shown in Table I-A-1 B the duration of the burst is somewhat shorter before [u] and [i] than before other vowels, thus tending to reduce differences in the overall duration of syllables comprising various vowels after unvoiced stops.

TABLE I-A-1 B. Voicing onset delay in ms after release in CV syllables. Rows: p, t, k, b, d, g and means; columns: vowel contexts and mean. (The individual entries are not legible in this copy; the visible values lie between 95 and 150 ms for unvoiced stops and between 0 and 40 ms for voiced stops.)

The main finding above that the duration of the voiced part of a vowel is not substantially influenced by the nature of the preceding consonant conforms with the observations of Peterson and Lehiste (1960), as illustrated by their Fig. 4 exemplifying spectrograms of "tug", "duck", and "tuck". However, their average findings indicate that the vowel after a voiced stop exceeds the length of the voiced part of a vowel after an unvoiced stop by approximately one-half of the burst duration or 30 ms. Their vowel lengths are about 20 per cent shorter than those reported here whilst the absolute value of their burst durations was of the order of 60 per cent of ours, indicating a heavier aspiration of the Swedish stops. These differences are probably related more to the complexity of the test words, CVC versus CV, than to language specific pronunciation as judged by other speech material at our disposal. Tempo and level of stress should also be considered. These conditional factors as well as the effect of location within a complex string of syllables need to be investigated further. As seen in Fig.
I-A-5 aspiration is not lost in sentence initial unstressed syllables. In these CVCV:CV words spoken by the same subject, B.L., the duration of the burst is of the order of 50-90 ms in stressed positions and 50-70 ms in unstressed sentence initial position.

Interesting material for comparison is offered by Öhman (1965). He used test words of the type CVCen (with C=g and k, V=long [a:] and short [a] with accent 2 word intonation) inserted in a carrier sentence (säga ... igen). The durations of his [k]-bursts were more or less constant at 80 ms. When two utterances differing by the voiced/voiceless distinction of the first C were compared and synchronized with respect to overall intonation pattern he found that the instant of stop release had to come 40 ms earlier in [k] than in [g]. Öhman also claims that the same relative timing pattern occurs if the articulation of the following vowel and not the intonation is taken as a basis of comparison. This rule also appears to hold in the present CV material as shown by Fig. I-A-6 exemplifying the overlaying of traced formant patterns of [ta:] and [da:], [ka:] and [ga:]. Here the release of the [t] is located 30 ms ahead of the release of [d] and the same holds for [k] compared with [g].

This means that the articulatory gesture after release is different in the voiced and unvoiced plosives. This difference can have two dimensions. One is that the articulatory pattern is different at the instant of release and eventually reaches the same dynamical pattern although at different times for the two stops. The other is that the initial articulatory pattern is more or less the same, except for the larger glottal opening at the release of the unvoiced stop, whilst the offset gesture proceeds at a slower rate in the first 40 ms after release of the unvoiced stop. The latter appears to be the case with palatal stops and possibly also for most dental stops.* The terminal F-patterns are not so different comparing [g] and [k] or [d] and [t] as comparing [p] and [b] in a position before a back vowel where unvoiced stops have a much higher terminal F2 than voiced stops. This holds for Swedish as well as for English as will be shown in a later part of this article.

* Articulatory data on English stop-vowel dynamics published by Houde (1967) are of some interest in this connection.

Fig. I-A-6. F-patterns of voiced and unvoiced stops matched for articulatory synchrony.

Returning to matters of segment durations it appears first of all that available data on the differences in voiced vowel length with respect to the influence of the voiced/voiceless distinction of the preceding stop are less variant on an absolute than on a relative time scale.
Thus the Peterson-Lehiste data can be expressed as an average of 30 ms longer vowel after voiced than after unvoiced stop and the Öhman data are close to the 40 ms difference which holds in short as well as in long vowels. The latter observation is remarkable in view of the fact that the long vowels are about 60 % longer than the short ones. If the present material of CV syllables is to be analyzed in exactly the same way as that of the other two studies mentioned above we must add to the length of the vowel after voiced stop the duration of the voiceless or weakly voiced interval between release transient and visible onset of the following vowel. In all we would then have a 20 ms vowel length difference in the g-k comparison, 25 ms in the d-t contexts, and 25 ms in the b-p contexts.

A simple numerical rule for relating these facts would be that the vowels after voiced stops are prolonged by the same amount as the latency of the instant of voiced stop release compared with the unvoiced stop release. In Öhman's material this leads to absolute synchrony of the instant of vowel termination before the stop gap of the following consonants. Approximately the same could be true of the Peterson-Lehiste data since the difference in vowel lengths is of the order of one-half of the burst length. In our CV-material, however, the excessive length of the burst, average 125 ms, accounts for a relative prolongation of the instant of voice offset of the vowel preceded by an unvoiced stop. This prolongation, assuming maximum vowel synchrony, is apparently equal to the burst length minus the voiced stop release lag minus the difference in voiced vowel length.

This discussion is perhaps carried further than is motivated by our meager data. However, the purpose is to stimulate further work on the formulation of rules for segmental programming. It could be that in the specific mode of reading isolated CV-syllables the segmental programming is governed mainly by a rhythmical demand of producing equally spaced, equally loud vowel nuclei. Tests on the timing of syllable production in synchrony with a periodically repeated auditory signal performed by Lindblom and Sundberg* indicate that the instant of major intensity increase in the syllable, a special case of which is the instant of switching from voiceless to voiced segment, governs the timing. These data support the syllabic timing rules proposed by Kozhevnikov and Chistovich (1965).

One typical example of the role of voicing boundary as a determinant of segmental organization can be studied in the [CaCa:Ca] (C=k, p, t, g, b, d) spectrograms of Fig. I-A-5. The time interval between onset of voicing in the first and the second vowel and between the second and final vowel is shown below together with data on the duration of the three vowels.
TABLE I-A-2. CaCa:Ca segmental analysis, time in ms

C =                      k    g    p    b    t    d
Onset V2 - onset V1    260  250  270  250  270  240
Onset V3 - onset V2    370  360  365  360  370  350
Duration V1             85  125   75  110   85  120
Duration V2            180  240  180  240  190  250
Duration V3            170  180  170  160  170  170

* Unpublished data (Lindblom and Sundberg).

The stability of these temporal reference points of vowel onsets holds for variations in place of articulation within 10 ms and within 30 ms for the voicing distinction. The increase in consonant length with unvoicing is somewhat larger than the reduction of the vowel length. Thus the V+C intervals of Table I-A-2 are about 15 ms longer when C is unvoiced than when C is voiced. The initial vowel is close to 40 ms longer when the consonant is voiced, in agreement with previous findings. The second and fully stressed vowel is 60 ms longer in a voiced context whereas the final vowel, which is unstressed, does not vary much in length depending on the voicing of the consonant. The latter observation conforms with the far going reduction of the acoustical distinction between voiced and unvoiced stops in non-initial unstressed position. The relatively large effects on the second vowel could be ascribed to the added influence from both previous and following consonants. A further discussion of the k/g, p/b, and t/d distinctions follows in a later part of this article.

Before leaving the topic of segmental structure some words should be said about the terminal boundary of a vowel followed by a stop. If voicing is continued straight through the occlusion the boundary is set by the articulatory closure as seen by the termination of the F1 transition towards base-line position. Vowels followed by unvoiced stops are terminated by an active devoicing gesture of the vocal cords which is synchronized to turn off voicing at or just before the articulatory closure. The articulatory closing gesture may well contribute to the final interruption of the voice source but this is not a necessary requirement. In heavily stressed positions the voicing has died out well before the articulatory closure. Vowel duration is influenced more by the following consonant than by a preceding consonant, see Peterson and Lehiste (1960), Elert (1964), and a forthcoming report*.

* A thesis study by Inger Karlsson and L. Nord supports this view.

Transitional patterns

The purpose of the following section is to discuss the material on formant patterns and transitions in the CV-material in relation to earlier studies, notably those of Lehiste and Peterson (1961), Öhman (1966), and Fant (1959).

By formant transitions is understood the dynamic variation of the F-pattern, i.e. F1 F2 F3 F4 as a function of time. The extent to which the F-pattern dynamics signals the place of articulation is one problem of general interest. Another is the possibility of inferring coarticulation features from F-pattern analysis. We shall attempt to compare voiceless and voiced stops in Swedish and English accordingly. As a control on some of the measurements we shall use a vocal tract model to simulate transitions that are difficult to follow in spectrograms. Finally we shall discuss data, vocal tract theory, and proposed models of perception in relation to feature theory of stop sounds.

First a few words about transitions and sampling techniques. The main object of our measurements has been to sample the F-pattern extrapolated to the instant of the beginning of the transient release of the stop closure. This is not an unambiguous process. The first part of the transition after release may be very rapid and difficult to follow. A fact which often is overlooked is that a CV-transition is often complex, comprising a first rapidly progressing part related to the release of the consonantal obstruction plus an overlayed transition of longer time constant related to the main tongue body movement. This is typically the case with labials but also with alveolars and dentals. It may be difficult to follow a formant transition in unvoiced segments but the reverse can also be true. An intense aspiration may provide more favorable conditions for F-pattern tracking than a very low pitched voiced segment. It was considered of interest to sample the F-pattern of unvoiced stops not only at release but also at the initiation of voicing after aspiration.

The collected F-pattern data on F2, F3, and F4 are documented in Tables I-A-3 and I-A-4. No F1 data are included. The limiting value of F1 in the occlusion is of the order of magnitude of 200 Hz for all voiced stops. On the other hand F1 at onset of voicing after unvoiced stop is generally close to the target value of F1 except in occasional instances of unvoiced stop plus [a:].
Other aspects of articulatory movements such as tongue body place shifts, or a labial or palatal closing gesture, may continue during the vowel. Obviously, a simple time constant, one for each formant independent of the consonant and its vocalic context, is not sufficient for CV-synthesis.

TABLE I-A-3. F2, F3, F4 at instant of release, with range min-max. (Table entries are not legible in this copy.)
TABLE I-A-4. F2, F3, F4 at instant of voice onset after unvoiced stops. (Table entries are not legible in this copy.)

The first object of the analysis was to explore how much the initial F2 and F3 values of a stop vary with respect to the associated vowel. It can be seen from Table I-A-3 that the extreme low F2i = 1400 Hz of [p] occurs with the vowel [o:] and the maximally high F2i = 1800 Hz with the vowels [i:], [e:], and [y:]. The voiced cognate [b] has the same maximum F2i value and a minimum F2i = 900 Hz. Such data on extreme ranges of second and third formant terminal frequencies are summarized in Fig. I-A-7. Fig. I-A-8 shows a set of corresponding data extracted from an article by Lehiste and Peterson (1961). A first glance at the two figures reveals basic similarities: the small range of variation for dentals, and the large range for velars and palatals as a single group with the overlap of F2i and F3i ranges. The greater range for voiced than for unvoiced labials, already mentioned above, is found in the Swedish as well as in the English data. A detailed analysis reveals that the extended initial F-pattern range of voiced labial stops can be ascribed to a closer coarticulation with back vowels [u:][o:] and [a:] whereas unvoiced labial stops start from a more neutral tongue position at the instant of release.
A similar trend appears with Swedish dentals. The lower bound of the F2i domain for Swedish [g] is also somewhat lower than that of [k]. Following the coarticulation model developed by Öhman (1967) these effects could at least in part be ascribable to the relative timing of articulatory programming. As discussed previously in connection with Fig. I-A-6 the voiced stop tongue movement is equal to that of the unvoiced one released 30 ms earlier. If we hypothesize the same semi-neutral tongue body target of voiced as well as unvoiced stops the mere translation of the vowel influence curve to the "right" in time for the unvoiced stop would reduce the effect of vowel coarticulation on the terminal values of formant frequencies.

Fig. I-A-7. Range of initial F2 and F3 of Swedish stops in combinations with all possible long vowels.
Fig. I-A-8. Lehiste-Peterson data on range of initial F2 and F3 of stops.

The range of terminal F-pattern variations would be even greater if we inserted different vowels before the consonants, i.e. if both the following and the previous vowels were varied independently as in the study of Öhman (1966). Our study above can be regarded as a special case where the consonant is preceded by a neutral vowel. Öhman has shown that in V1CV2 syllables with C = voiced stop [g][b] or [d] and V1 and V2 equally stressed vowels [u:][a:][ø:] or [i:] varied independently the transitional pattern in any part of the test words is influenced by both vowels and the consonant. Thus the initial F-pattern after release as well as the following transition of the CV2 part depend on the particular V1 and conversely the V1C offglide transition is influenced by V2.

One pattern aspect studied by Öhman was the consonant "locus" in the specific Haskins Laboratories' sense. Their "locus" is defined as a common point on the frequency scale about 50 ms ahead of the release which is regarded as the virtual starting point of F2 transitions from one and the same consonant to all possible vowels that can follow. Delattre, Liberman, and Cooper (1955) claimed from synthesis experiments that [d] has a locus of 1800 Hz, [b] a locus of 720 Hz, and [g] if produced with non-back vowels 3000 Hz. The articulatory significance of the loci is claimed to be invariant vocal tract configurations. This is an over-simplification and the significance of the "locus" is primarily limited to two-formant synthesis rules. Öhman states that given a specific V1 and C the four possible V2 of this test provide transitions that can be extrapolated back to a common "locus" providing C is either [d] or [b] and with the locus being a function of the F2 of V1. However, a closer view of Öhman's data shows that the invariance of [b] loci with respect to V2 is not very good. A brief study of the spectrograms of our CV material supports the notion that [b] does not have a unique locus. That [g] has a variable locus was evident already in the early Haskins Laboratories' work although they chose to speak of two [g] loci, one for front vowels and one for back vowels.
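As a hedged illustration of the "locus" idea discussed above, the following Python sketch extrapolates an F2 transition back to a virtual starting point 50 ms ahead of the release and reports how far the extrapolated values from different vowel contexts spread. The straight-line extrapolation, the function names, and the sample values are assumptions made for illustration only; they are not measurements from this report, from the Haskins Laboratories work, or from Öhman's study.

```python
# Minimal sketch: test whether F2 transitions point back to a common "locus".
# Assumption: each transition is approximated by a straight line through two
# samples (time in ms relative to release, F2 in Hz).

LOCUS_LEAD_MS = 50.0  # virtual starting point about 50 ms ahead of release


def extrapolate_f2(t_a_ms, f2_a_hz, t_b_ms, f2_b_hz, t_target_ms=-LOCUS_LEAD_MS):
    """Linearly extrapolate F2 from two samples to t_target_ms."""
    slope_hz_per_ms = (f2_b_hz - f2_a_hz) / (t_b_ms - t_a_ms)
    return f2_a_hz + slope_hz_per_ms * (t_target_ms - t_a_ms)


def locus_spread(transitions, t_target_ms=-LOCUS_LEAD_MS):
    """Return (lowest, highest) extrapolated F2 over several vowel contexts."""
    values = [extrapolate_f2(*tr, t_target_ms=t_target_ms) for tr in transitions]
    return min(values), max(values)


# Hypothetical transitions (not data from the report): (t_a, F2_a, t_b, F2_b).
hypothetical_transitions = [
    (0.0, 1600.0, 50.0, 1400.0),  # falling transition toward a back vowel
    (0.0, 1900.0, 50.0, 2000.0),  # rising transition toward a front vowel
]

low, high = locus_spread(hypothetical_transitions)
print(f"extrapolated F2 at -50 ms: {low:.0f} to {high:.0f} Hz")
```

In this two-point approximation a narrow spread suggests a common virtual starting point, while a wide spread, as argued in the text for [b] and [g], suggests that no unique locus exists.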
Transitions studied by analog simulation

Before entering a discussion on the relative importance of various acoustic cues for stop consonant identifications it is worth-while to consult production theory in the support of some of the more uncertain measurements and to provide some general basis for feature analysis.

The transitions of labial stops to a following vowel are not always easy to follow in the spectrogram. The major part of the labial opening phase is often completed in less than 20 msec. Production theory, Fant (1960), states that an increase in lipsection area, everything else being equal, cannot result in a downward shift of any formant located at a frequency lower than c/4l0, where l0 is the length of the lip passage, which in practice applies to all observable formants of the F-pattern. However, the extent of the upward shift of formant frequencies varies with the particular formant and the vocal tract configuration. As earlier pointed out a superimposed tongue body movement may produce a transition of opposite sign to that induced by the lip passage opening and it generally extends over a longer period of time. A relatively prominent falling transition may result, see [pa:] in Fig. I-A-1 and [pu:] in Fig. I-A-4. The same feature is found in Danish [pʰo], Fischer-Jørgensen (1954).

An obscure detail in the [ba:] spectrum is the vertical spectral line from 1000-2000 Hz in the release transient. It was observed already in my spectrographic work at the Ericsson Telephone Co. in 1946-1949, see Fant (1959), Fig. 42. One object of the analog calculations would be to find out if it had anything to do with F2 and F3 transitions. Another object was the study of formant transitions from [b] to a front vowel [i]. For this purpose I adopted for a simulation study with our line analog LEA the lip-opening cross-sectional area as a function of time, see Fig. I-A-9, experimentally determined by Fujimura (1961). The area function of the rest of the vocal tract was kept constant. One set of measurements* was made with a vocal tract area function appropriate for the Russian vowel [a:], one for [i:], and one pertaining to the palatalized [b,], see Fant (1960).

Fig. I-A-9. Fujimura (1961) data on lip opening as a function of time for the test word "pope".

At the interval of complete lip closure F1 should not drop to zero but to a limiting value of about 150 Hz determined by the enclosed air volume and the mass distribution at the vocal walls. Accordingly, see Fant (1960), all F1 values were corrected by a root square summation. The results of the calculations are shown in Fig. I-A-10 and Table I-A-5.

According to Fig. I-A-9 the lipopening has reached 50 % of the final value at 10 ms and then proceeds at a slower rate. A major part of the F1 and F2 transitions is also completed at 10 ms after release. All transitions are positive as expected. The F2 and F3 transitions of [ba:] are small and it can accordingly be concluded that the release transient above F2 should be disregarded in transition studies. In [bi:] F2 jumps up 500 Hz in the first 5 ms. The terminal value F2i = 1200 Hz is lower than the F2i = 1700 Hz measured from spectrograms.
The difference between the calculated and the measured F2i of [bi:] could be explained by limited means of following such a rapid transition in the spectrogram. Another source of deviation of the model from the spoken data could be that the tongue body configuration at the instant of release in [bi:] is not that of a pure [i:] but is perturbed in the direction of a neutral position as in the palatalized [b,] of Fig. I-A-10 where the terminal value of F2i is closer to 1500 Hz and the extent of the F2 transition is smaller. In view of the wide range of coarticulation induced by a previous vowel in VCV contexts and possible fluctuations in initial tongue configuration in production of CV-syllables it is, anyhow, apparent that variations in F2i of [bi:] can be expected.

On the whole, however, disregarding the lack of information on the first 5 ms, the calculated dynamical F-pattern of [bi:] in Fig. I-A-10 agrees well with measured data. One pattern aspect well known from spectrograms is that the F3 transition goes on for a longer time than the F2 transition and, with disregard to the first 5 ms, covers a greater frequency span than F2.

* I am indebted to Doc. J. Sundberg for carrying out this work.

Fig. I-A-10. Calculated dynamic F-patterns of voiced labial stops (closed glottis).

As seen in Fig. I-A-2 the transitional pattern of the [pi:] aspiration is not less apparent than that of [bi:]. The main part of the F3 transition is completed in 40 ms according to the simulation in Fig. I-A-10. In this time F3 has moved from the 2200 Hz terminal value to 2750 Hz. This compares very well with measurements from the spectrogram in Fig. I-A-2. The F3 transition in the following and later part of the spectrogram reaches a higher target value than in the simulated syllable which can be ascribed to the tongue body movement up to a higher degree of closure typical for the diphthongization of Swedish [i:]. However, apart from this added F3 movement the longer duration of the F3 transition compared with the F2 transition is related to a higher differential influence of the lip parameter on F3 than on F2 at relatively large degrees of lipopening. In terms of resonator theory this is explained by the fact that F2 is a standing wave resonance of the pharynx and once the lipopening has reached a value high enough so as to not compete with the palatal stricture the F2 influence will be minimal. Also, since F3 of [i:] is a mouth cavity resonance it will be highly susceptible to variation in the lip area.

Experimental check of occlusion F-pattern

Vocal tract simulation is an indirect means of studying the F-pattern in articulatory closed parts of the utterance. It would be handy if a continuous tracking of the F-pattern were possible in all parts of real speech. If we limit our object to voiced stops there exist some limited possibilities of studying F1, F2, and F3 during occlusion providing a high frequency emphasis and extra gain are utilized in the spectrographic analysis. A small pilot study* has provided us with data that support the findings above concerning [ba:] and [bi:]. It was thus found that F2i of [ba:] was 1000 Hz and of [bi:] 1700 Hz as measured from a separate recording of the same subject. During [bi:] and [ga:] there were prominent transitions within the occlusion. One technical difficulty in the analysis was the need for high input levels to the spectrograph and thereby the risk of overloading with intermodulation formants appearing. Another difficulty is the low level of the voice source immediately before release.
* This pilot study was carried out by S. Pauli utilizing both the Voiceprint Spectrograph and the 51-channel analyzer. A separate report on these studies is planned.

Identification of spectral components

Ambiguity often arises as to what is the true release transient of palatal and velar stops. As pointed out already by Fischer-Jørgensen (1954) there often occur double or triple spikes indicating a sequence of interrupted air injections through the articulatory stricture, see Fig. I-A-1*. These multiple spikes could reflect a suction reaction at the articulatory stricture by the Bernoulli pressure just as in the normal voice source. In voiced velar stops they may occur superimposed on the regular voice source operating in a breathy mode so as to damp out F1. This reduction occurs both before and after the release and is thus not in itself indicative of the instant of release. The double spikes of the [k] burst could also originate from a reaction on the glottis at the release resulting in a momentary flow reduction. Further investigations are needed to reach a better understanding of these phenomena.

Another problem of interest is the F1 locus of unvoiced stops. The subglottal impedance shunting the supraglottal impedance in a circuit theory model would account for a substantial increase in F1 and could also introduce traces of subglottal resonances. Because of the low energy level of F1 in the aspiration it is hard to get reliable measures of an initial F1 just before release.
Just after release one observes values of the order of 300-600 Hz depending on the particular vowel, see Figs. I-A-1 to I-A-4. However, F1 of the aspiration is not very important either for perception or for synthesis and recognition work.

Acoustic characteristics and synthesis rules

When discussing the stops as a specific ensemble we need not worry about distinctive features in a general sense. We can proceed to discuss the relation of the subset [k][p][t] to that of [g][b][d] and further on investigate the triangular place relations within each subset, e.g. what pattern aspects or cues are typical for each of the members within the subset in relation to each of the other members. We do, of course, find the expected similarities k/g = p/b = t/d etc. underlying the four natural categories which are traditionally referred to as 1) unvoiced/voiced, 2) velars and palatals, 3) labials, and 4) dentals. In this limited material of stressed and isolated CV-syllables the distinction between voiced and unvoiced stops is very clear, as has been discussed in the previous sections.

* See also illustrations of several speakers' [ka] and [ga] in Fant (1957/68).

A synthesis of CV-stop plus long vowel syllables of the type studied here could proceed as follows:

(1) Determine first, if needed with respect to the phrase prosody, a point on the time scale where the vowel shall start. If preceded by a voiced stop this is the instant of the stop release transient. If preceded by an unvoiced stop this is the instant of voicing onset after aspiration.

(2) Choose the vowel length after more or less detailed rules starting from a mean value of 250-350 ms for long vowels according to tempo and degree of emphasis required. Add 30 ms to the vowel if preceded by a voiced stop. The instant of release transient of an unvoiced stop is placed 80-120 ms ahead of the voicing onset.

(3) An appropriate F-pattern for the whole voiced stop plus vowel sequence is generated. This can be used as an approximation also for the corresponding unvoiced stop if synchronized to have its release transient coincide with a point 30 ms after the release of the unvoiced stop. The F-pattern for the initial 30 ms of the burst is traced by rules for linear extrapolation back in time. Labials before back vowels require separate F-patterns for voiced and unvoiced stops. These can, however, probably be derived from coarticulation rules. A minor correction for the effect of glottal opening on the F-pattern should be added. An open glottis increases F2 and F3 by about 50-100 Hz.

(4) Make the F0 contour synchronous with respect to the F-pattern. For unvoiced stops add an F0 increment in the first 50 ms after voicing onset.

(5) Choose an appropriate dynamic pattern of intensity and spectral distribution of the voice source. Our speaker consistently shifted his voice source spectral balance to a more high-frequency de-emphasized shape in the later half or third of the vowel. An aspirative final termination of voicing is frequent in the vowel [a:].
Although some of these characteristics vary with speaker, the trend of decreasing vocal effort with time is typical of the sentence final position.

(6) Apply rules for spectrum and time shaping of release transients and fricative segments. These rules have yet to be worked out on the basis of production theory, Fant (1960), and more quantitatively aimed pattern matchings, as will be discussed later. In general, see Fant and Mårtony (1962), the release transient should be synthesized with a DC-step source and a friction segment with an appropriately shaped noise source. The release transient and the friction are both synthesized with the "K-filter", whereas the following aspiration is shaped with the "F-filter".

The initial F-pattern is a place correlate

We shall now return to a study of the data on F-patterns and transitions in order to evaluate how distinctive they are in identifying "place" of articulation of the consonant and what additional cues should be taken into consideration.

It is well known and rather obvious that the transitional patterns in the voiced part of a vowel after a heavily aspirated stop pertain to instances in time where the articulators have moved so far away from the consonant that their movements do not retain much distinctiveness. In Table I-A-6 [p][t] and [k] are compared in terms of F2 and F3 at the voicing boundary. The reduction is especially apparent comparing [t] and [p] before the vowel [a:] and unrounded front vowels. The loss of transitional information within the stop burst is specified by Table I-A-7.
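The timing rules in steps (1) and (2) of the synthesis procedure above, together with the glottal correction and the backward extrapolation mentioned in step (3), can be sketched in Python. This is a minimal illustration under stated assumptions, not the synthesis program used in the study: all function and field names are hypothetical, and the defaults (300 ms vowel, 100 ms aspiration, 75 Hz shift) are arbitrary mid-range picks from the intervals quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class CVTiming:
    release_ms: float        # instant of the stop release transient
    voicing_onset_ms: float  # instant where the vowel (voicing) starts
    vowel_end_ms: float      # end of the vowel


def cv_timing(vowel_start_ms, voiced, base_vowel_ms=300.0, aspiration_ms=100.0):
    """Steps (1)-(2): place release and vowel end relative to the vowel start."""
    if not 250.0 <= base_vowel_ms <= 350.0:
        raise ValueError("text quotes a mean long-vowel length of 250-350 ms")
    if voiced:
        # Voiced stop: the vowel starts at the release; add 30 ms to the vowel.
        release = vowel_start_ms
        vowel_len = base_vowel_ms + 30.0
    else:
        # Unvoiced stop: release lies 80-120 ms ahead of voicing onset.
        if not 80.0 <= aspiration_ms <= 120.0:
            raise ValueError("text quotes 80-120 ms from release to voicing")
        release = vowel_start_ms - aspiration_ms
        vowel_len = base_vowel_ms
    return CVTiming(release, vowel_start_ms, vowel_start_ms + vowel_len)


def open_glottis_correction(f2_hz, f3_hz, shift_hz=75.0):
    """Step (3): an open glottis raises F2 and F3 by about 50-100 Hz."""
    return f2_hz + shift_hz, f3_hz + shift_hz


def extrapolate_back(t0_ms, f0_hz, t1_ms, f1_hz, t_ms):
    """Step (3): straight-line extrapolation of a formant track to time t_ms."""
    slope = (f1_hz - f0_hz) / (t1_ms - t0_ms)
    return f0_hz + slope * (t_ms - t0_ms)


voiced = cv_timing(500.0, voiced=True)
unvoiced = cv_timing(500.0, voiced=False)
print(voiced)
print(unvoiced)
```

With the vowel start placed at 500 ms, the voiced case puts the release at 500 ms and the vowel end at 830 ms, whereas the unvoiced case puts the release at 400 ms and the vowel end at 800 ms.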
The amount by which voiced and unvoiced stops differ in F2 and F3 at the instant of the release transient is shown in Table I-A-7. The earlier discussed differences in articulation of voiced and unvoiced labials before back vowels are apparent. In other combinations the differences are not larger than 300 Hz and generally smaller than 200 Hz.

TABLE I-A-6. F2 and F3 differences at instant of voice onset as place correlates within unvoiced stops.

TABLE I-A-7. Extent of F2 and F3 transitions within unvoiced segments (from release to voice onset).

The discriminative power of the second and third formant frequencies F2i and F3i is illustrated in Fig. I-A-11 and Fig. I-A-12. The following general conclusions can be drawn. The main characteristic of dentals compared with labials is the 350-500 Hz higher F3i. Dentals may have higher F2i than labials if compared in context with the same vowel. The palatal [k] and [g] before the unrounded front vowels [i:], [e:] and [ɛ:] comprise a peripherally located subset of higher F3i and also somewhat higher F2i than any dental. The [k] and [g] before rounded front vowels [y:], [ʉ:] and [ø:] differ from labials and dentals by a somewhat higher F2i only. The velar [k] and [g] before the back vowel [a:] have a lower F3i than any labial plus vowel.

It is interesting to note that the initial F2-F3 pattern differentiates unvoiced stops somewhat better than voiced stops, which is fully in line with the previously inferred finding that at the instant of release the unvoiced stops appear to be less coarticulated with the following vowel than are the corresponding voiced stops.
This is also apparent from the smaller spread of the unvoiced data with respect to vowel context as already pointed out in connection with Fig. I-A-7. The detailed data on the unvoiced-voiced differences in F2i and F3i are given in Table I-A-8. The negative values of F3p-F3b are ascribable to the difference in coarticulation, as is typical of F2p-F2b for [u:], [o:] and [a:]. It should be kept in mind that the glottal shunt contributes to the trend of positive signs of the data with an average amount of the order of +100 Hz.

Thus with the exception of the infiltration of [gʉ:] and [gu:] in the labial area in Fig. I-A-11 all dentals are confined to one area of the plane, all labials are confined to a separate area, and the velar-palatals to a large range of peripheral locations outside these areas. For the corresponding unvoiced stops, Fig. I-A-12, there is no overlapping. The vowel targets are included in Fig. I-A-11 so as to allow a derivation of the direction of CV-transitions.

Spectral energy cues. General feature discussion

An effective approach for testing the relevance of these transitional cues is to look up pairs of consonants in the same vowel context where the F-pattern data are the same or almost the same and then see what other cues there are to note. This technique was used by Öhman (1966) in his V1CV2 studies. He found that the CV2 part of [ybo] was the same

Fig. I-A-12. Initial F2 and F3 of unvoiced Swedish stops (F3 and F2 measured at plosion), subject B.L.
The vowel targets are indicated in the figure.

An extension of the range of analysis to higher frequencies than 4000 Hz adds to the distinctiveness of these visually defined cues, mainly by displaying the high frequency components of the [t] and [d] bursts.

The statements above concerning "spectral energy" refer to the first 10-30 ms after the release, which appears to carry the main information on the place of articulation. Transient, burst and the first part of a vowel when appearing within this segment should be regarded as a single stimulus rather than as a set of independent cues, Fant (1960, p. 217), Stevens (1967). When relating data from real speech to experiments with synthetic speech one should keep this in mind. As stated already by E. Fischer-Jørgensen (1954): "The listener does not compare explosion with explosion and transition with transition but compares artificial syllables comprising either explosion or transition with natural syllables that always contain both".

When discussing transitions it seems wise to distinguish two categories: 1) those related to the overall tongue body movement within the whole of a previous or a following vowel and 2) those related to the break of a consonantal obstruction or the movement towards closure. Those belonging to category 1) mainly reflect vowel coarticulation and are less distinctive than those of category 2). A typical example is the falling transition from labial stop to back vowel, see Fig.
I-A-4, which reflects the tongue body movements whereas the labiality cues may be confined to the first 10 ms only and may not be visible in the spectrogram.

Production theory, Fant (1960), provides a basis for explanation of the origin of the general characteristics discussed above and is the starting point for derivation of synthesis strategies. Thus the main formant of the [k][g] sounds derives from the cavity in front of the tongue constriction and is represented by a free pole. The diffuse spectrum of [p] and [b] release originates from the lack of any front cavity. At release the dispersion effect is pronounced, pole frequencies rapidly moving in positive direction away from associated zeros which neutralize the poles before release. The [k][g], on the other hand, have a free pole before release. In the critical segment after release this pole cannot display very rapid movements. The [t] and [d] have a small and narrow front channel behind the source which is associated with a high-pass sound filtering.

TABLE I-A-9. Burst formant areas of [k] and [g].

TABLE I-A-10. Target values of the subject's formant frequencies towards the end of the vowel.

The mean frequency of the [k] and [g] bursts and their F-pattern associations in different vowel contexts have been measured and the data are presented in Table I-A-9. The data vary over a 2500 Hz range from 1000 Hz to 3500 Hz. The observed differences with respect to voicing are not very significant in view of the limited data.
A secondary correlate to the place of articulation for [k] and [g] is the approximately 30 ms delay from release transient to the appearance of the formant structure in the following vowel. The F1 transitions after [b][d] and [g] are not much different except that the F1 rise tends to be somewhat slower after [g]. The differences in vowel targets conditioned by the particular place of articulation of the consonant could be measured but appear to be too small to be of any appreciable perceptual significance. The F0 cues also contribute.

Approximate vowel targets for the subject B.L. are shown in Table I-A-10. They pertain to the final part of the vowel, in case of close vowels (lowest level F1) to the diphthongal termination. In [ʉ:] and [u:] this is a lip closure, which accounts for the falling F2 and F3. For [i:] and [y:] the diphthongal element is made with the tongue pressing harder against the palate. This accounts for the rise in F3 at constant lip opening in [y:] and [i:]. A more detailed discussion of Swedish vowels was given by Fant (1969).

Intensity-frequency sections of the transient and burst spectra of Swedish stops have earlier been published by Fant (1959) and corresponding data on Russian stops by Fant (1960). These data support the conclusions above and support the feature frame of Jakobson, Fant, and Halle (1952/67) as [k][g] being compact, [p][b] diffuse and grave, [t][d] diffuse and acute (nongrave).
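The feature assignments just stated can be written out as a small lookup table, which makes it easy to see which feature separates any two stops. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, the voicing values follow the voiced/unvoiced grouping used throughout the text, and no grave/acute value is assigned to [k][g] because the passage does not give one.

```python
# Place features as stated in the text; tonality is None for the compact
# stops because the passage assigns grave/acute only to the diffuse ones.
PLACE_FEATURES = {
    "velar_palatal": {"compactness": "compact", "tonality": None},
    "labial": {"compactness": "diffuse", "tonality": "grave"},
    "dental": {"compactness": "diffuse", "tonality": "acute"},
}

# Each stop mapped to (place, voiced).
STOPS = {
    "p": ("labial", False), "b": ("labial", True),
    "t": ("dental", False), "d": ("dental", True),
    "k": ("velar_palatal", False), "g": ("velar_palatal", True),
}


def features(stop):
    """Return the feature bundle of one stop as a dictionary."""
    place, voiced = STOPS[stop]
    bundle = dict(PLACE_FEATURES[place])
    bundle["voiced"] = voiced
    return bundle


def differing_features(a, b):
    """Return the names of the features on which two stops disagree."""
    fa, fb = features(a), features(b)
    return {name for name in fa if fa[name] != fb[name]}


print(differing_features("p", "t"))
print(differing_features("k", "g"))
```

In this encoding [p] and [t] differ only in tonality (grave versus acute), and [k] and [g] differ only in voicing.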
Although Chomsky and Halle (1968) improved the feature system by introducing tongue body features separate from the place of articulation features, they have not been equally successful in defining "place" features that, irrespective of cooccurrence with other features, retain some perceptual invariance or at least similarity. Furthermore, they are highly disputable even on the level of production control, Fant (1969). Although the feature "anterior" takes over the function of "diffuse" and thus could inherit the same correlates, there is a real trouble with the "coronal" feature, which loses its physiological basis when separating dentals from labials. The class of labial consonants is accordingly selected by reference to the negative of a feature referring to activities in muscles which have nothing to do with the lips. From the perceptual point of view the feature [+coronal] separating dentals from labials when combined with the feature [+anterior] implies a high versus low frequency emphasis. When the coronal feature is used to differentiate [-anterior] fricatives, e.g. Swedish [ʂ] and [ɕ], with respect to the tip of the tongue being up [+coronal] or down [-coronal], the acoustic effect appears to be the opposite, the [+coronal] (retroflexion) accounting for a lowering of the mean frequency of the spectrum.
I cannot find any other spectral characteristics of the "coronal" feature that would be retained in combination with both + and - anterior. The "coronal" feature would not display this acoustical ambiguity if restricted to the class of [-anterior] consonants.

Stevens' (1967) theory of perceptual invariance conforms with the general statement on stop features above and has elements in common with that of Fant (1960, p. 217) and Jakobson, Fant, and Halle (1952/67). Thus, his treatment of velar sounds is almost the same as my earlier one. His floating reference of spectral energy with respect to the following vowel being low in labials is valid for the short (≈10 ms) delabialization segment only and requires that the aspiration is identified with the vowel. I have a feeling that the reference to the vowel is not needed for discriminating [p] and [t]. Stevens' treatment of [g] as acoustically of lower pitch than a retroflex [ɖ] is valid for velar [g] only. In my view it is more natural to oppose velar [g] to palatal [g] pitch wise whereas the relation of [ɖ] to [g] is basically a matter of spread versus concentrated energy. The [ɖ] should rightly be opposed to [d], the [ɖ] being more "flat" and also less spread than [d]. The role of the feature "distributed" in this connection is not clear.

References

CHOMSKY, N. and HALLE, M. (1968): Sound Pattern of English (New York).
DELATTRE, P., LIBERMAN, A.M., and COOPER, F.S. (1955): "Acoustic Loci and Transitional Cues for Consonants", J.Acoust.Soc.Am. 27, pp. 769-773.
ELERT, C.-C. (1964): Phonological Studies of Quantity in Swedish (thesis, Uppsala).
FANT, G.
(1957/68): "Den akustiska fonetikens grunder", Report No. 7, KTH, Speech Transmission Laboratory (Stockholm), new edition.
FANT, G. (1959): "Acoustic Analysis and Synthesis of Speech with Applications to Swedish", Ericsson Technics No. 1, pp. 3-108.
FANT, G. (1960): Acoustic Theory of Speech Production ('s-Gravenhage).
FANT, G. (1968): "Analysis and Synthesis of Speech Processes" in Manual of Phonetics, ed. by B. Malmberg, pp. 173-277 (Amsterdam).
FANT, G. (1969): "Distinctive Features and Phonetic Dimensions", pp. 1-18, STL-QPSR 2-3/1969.
FANT, G. and MÅRTONY, J. (1962): "Speech Synthesis", pp. 18-24, STL-QPSR 2/1962.
FANT, G., LINDBLOM, B., and MÅRTONY, J. (1963): "Spectrograms of Swedish Stops", p. 1, STL-QPSR 3/1963.
FISCHER-JØRGENSEN, E. (1954): "Acoustic Analysis of Stop Consonants", Miscel. Phonetica 2, pp. 42-59.
FUJIMURA, O. (1961): "Bilabial Stop and Nasal Consonants: A Motion Picture Study and its Acoustical Implications", J. of Speech and Hearing Research 4, pp. 233-247.
HOUDE, R.A. (1967): "A Study of Tongue Body Motion During Selected Speech Sounds" (thesis, Univ. of Michigan, Ann Arbor).
JAKOBSON, R., FANT, G., and HALLE, M. (1952/67): "Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates", MIT, Acoust. Lab., Techn. Rep. No. 13 (1952); 7th edition publ. by MIT Press (Cambridge, Mass.).