Stops in CV-syllables

Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

Fant, G.

journal: STL-QPSR
volume: 10
number: 4
year: 1969
pages: 001-025

http://www.speech.kth.se/qpsr
Fig. I-A-1. Spectrograms of [p][t][k][b][d][g] before the vowel [a:]. Subject B.L.

Fig. I-A-2. Spectrograms of [p][t][k][b][d][g] before the vowel [i:]. Subject B.L.

Fig. I-A-3. Spectrograms of [p][t][k][b][d][g] before the vowel [ø:]. Subject B.L.

Fig. I-A-4. Spectrograms of [p][t][k][b][d][g] before the vowel [u:]. Subject B.L.
STL-QPSR 4/1969
(3) Fricative segment. This is characterized by a noise produced at the consonantal constriction, as in a homorganic fricative. Zeros interact so as to cancel "back cavity" formants, while "front cavity" formants prevail.

(4) Aspirative segment. This is characterized by an "h-like" noise originating from a random source at the glottis or from a supraglottal source at a relatively wide constriction, exciting all formants. F2, F3, and F4 are the most typical constituents. The aspirative segment can in part co-occur with the fricative segment but takes over as the degree of articulatory opening proceeds.

(5) The initial part of a following voiced sound, to the extent that it is influenced by coarticulation with the stop.
This complete sequence is typical of aspirated [t] and [k], whereas the frictional phase is either weak or absent in [p] because of the low noise-generating efficiency of the bilabial structure and the rapid delabialization. In the terminology of the classical Haskins synthesis, the transient plus friction plus aspiration is treated as a single segment called the "burst". A short frictional segment can be seen in [d] and [g]. The [g] has the most apparent transient segment, which can be ascribed to the high Q of the front cavity resonance and the lacking dispersion effects of the main formant. Accordingly, the duration of the [g] and [k] transients is longer than in any other stops. Voiced Swedish stops lack the aspiration phase.
Uninterrupted voicing can be superimposed during all phases of [b], [d], and [g], in which case the transient appears as an extra spike against the background of voicing. It is more typical, however, that voicing is absent or very weak some 50 ms before and 10-30 ms after the transient. The period of weak voicing after the release thus corresponds to the aspirative segment (4) in unvoiced stops and may coincide with a fricative segment in [d] and [g]. We shall return to the production theory in the later feature discussion.
A first inspection of spectrograms, see Figs. I-A-1 to I-A-4, reveals some apparent aspects of the voiced-voiceless distinction. The unvoiced stops [k][p][t] have a "burst", i.e. an unvoiced segment defined by the distance from the release transient to the full voicing onset in the following vowel, which is of the order of 125 ms compared with 10-25 ms for the corresponding segment in [b][d][g]. However, the duration of the voiced part of the vowel is approximately the same after all stops, voiced as well as unvoiced. Thus the temporal organization is not simply a matter of delaying voicing in [k][p][t] relative to [b][d][g] at the expense of the vowel length.
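A minimal arithmetic sketch of this point: if the voiced part of the vowel is the same after voiced and unvoiced stops, the whole interval from release to voice offset in an unvoiced-stop syllable is longer by roughly the difference in burst durations. The 125 ms and 10-25 ms figures come from the text. The 250 ms vowel duration and the choice of 15 ms for the voiced stops are placeholders assumed only for illustration.

```python
# Assumed (hypothetical) voiced-vowel duration in ms, taken as equal after
# all stops, in line with the spectrographic observation above.
VOICED_VOWEL_MS = 250

# Approximate release-to-voicing-onset intervals in ms from the text;
# 15 ms is an assumed value inside the quoted 10-25 ms range.
BURST_MS = {"unvoiced": 125, "voiced": 15}

def syllable_duration(kind: str, vowel_ms: int = VOICED_VOWEL_MS) -> int:
    """Release-to-voice-offset duration: burst plus an unchanged voiced vowel."""
    return BURST_MS[kind] + vowel_ms

diff = syllable_duration("unvoiced") - syllable_duration("voiced")
print(f"unvoiced CV syllable is longer by {diff} ms")  # 110 ms
```

Under this assumption the vowel is not shortened after the unvoiced stop, so the burst simply adds to the syllable.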
One factor that seems to add to the observed durations of close Swedish vowels is that they tend to be diphthongized with a homorganic fricative. Since my criterion for the termination of the vowel segment was the disappearance of F1 in the spectrogram, the vowel accordingly incorporates any frictional termination which is voiced. As shown in Table I-A-1.B, the duration of the burst is somewhat shorter before [u:] and [i:] than before other vowels, thus tending to reduce differences in the overall duration of syllables comprising various vowels after unvoiced stops.
The main finding above, that the duration of the voiced part of a vowel is not substantially influenced by the nature of the preceding consonant, conforms with the observations of Peterson and Lehiste (1960), as illustrated by their Fig. 4 exemplifying spectrograms of "tuck" and "duck". However, their average findings indicate that the vowel after a voiced stop exceeds the length of the voiced part of a vowel after an unvoiced stop by approximately one-half of the burst duration, or 30 ms. Their vowel lengths are about 20 per cent shorter than those reported here, whilst the absolute values of their burst durations were of the order of 60 per cent of ours, indicating a heavier aspiration of the Swedish stops.
TABLE I-A-1.B
Voicing onset delay in ms after release in CV syllables
b
u
o
a
mean
130
100
130
125
115
120
1 0 5 , 100
125
120
120
125
130
150
115
130
140
130
130
115
115
130
105
130
130
-
<10
<10
<10
10
15
0
0
15
<10
10
15
30
40
25
40
10
10
15
20
25
25
20
15
35
15
15
20
35
25
Y
i
e
E
u
p
t
100
130
120
100
95
120
125
125
125
k
120
130
150
mean
115
130
db
<10
gd
g
120
h
These differences are probably related more to the complexity of the test words, CVC versus CV, than to language-specific pronunciation, as judged by other speech material at our disposal. Tempo and level of stress should also be considered. These conditional factors, as well as the effect of location within a complex string of syllables, need to be investigated further.
As seen in Fig. I-A-5, aspiration is not lost in sentence-initial unstressed syllables. In these CVCV:CV words spoken by the same subject, B.L., the duration of the burst is of the order of 50-90 ms in stressed positions and 50-70 ms in unstressed sentence-initial position.
Interesting material for comparison is offered by Öhman (1965). He used test words of the type CVCen (with C = g and k, V = long [a:] and short [a], with accent 2 word intonation) inserted in a carrier sentence (säga ... igen).
The durations of his [k] bursts were more or less constant at 80 ms. When two utterances differing by the voiced/voiceless distinction of the first C were compared and synchronized with respect to the overall intonation pattern, he found that the instant of stop release had to come 40 ms earlier in [k] than in [g].
Öhman also claims that the same relative timing pattern occurs if the articulation of the following vowel, and not the intonation, is taken as a basis of comparison. This rule also appears to hold in the present CV material, as shown by Fig. I-A-6, which overlays traced formant patterns of [ta:] and [da:], and of [ka:] and [ga:]. Here the release of the [t] is located 30 ms ahead of the release of [d], and the same holds for [k] compared with [g].
This means that the articulatory gesture after release is different in the voiced and unvoiced plosives. This difference can be interpreted in two ways. One is that the articulatory pattern is different at the instant of release and eventually reaches the same dynamical pattern, although at different times for the two stops. The other is that the initial articulatory pattern is more or less the same, except for the larger glottal opening at the release of the unvoiced stop, whilst the offset gesture proceeds at a slower rate in the first 40 ms after release of the unvoiced stop. The latter appears to be the case with palatal stops and possibly also with most dental stops.
The terminal F-patterns are not so different when comparing [g] and [k] or [d] and [t] as when comparing [p] and [b] in a position before a back vowel, where unvoiced stops have a much higher terminal F2 than voiced stops. This holds for Swedish as well as for English, as will be shown in a later part of this article. Articulatory data on English stop-vowel dynamics published by Houde (1967) are of some interest in this connection.

Fig. I-A-6. F-patterns of voiced and unvoiced stops matched for articulatory synchrony ([d] vs. [t] and [g] vs. [k]; release and voice onset marked; time in sec.).
Returning to matters of segment durations, it appears first of all that the available data on differences in voiced vowel length with respect to the voiced/voiceless distinction of the preceding stop are less variant on an absolute than on a relative time scale. Thus the Peterson-Lehiste data can be expressed as an average of 30 ms longer vowel after a voiced than after an unvoiced stop, and the Öhman data are close to a 40 ms difference, which holds in short as well as in long vowels. The latter observation is remarkable in view of the fact that the long vowels are about 60% longer than the short ones.
If the present material of CV syllables is to be analyzed in exactly the same way as that of the other two studies mentioned above, we must add to the length of the vowel after a voiced stop the duration of the voiceless or weakly voiced interval between the release transient and the visible onset of the following vowel. In all we would then have a 20 ms vowel length difference in the g-k comparison, 25 ms in the d-t contexts, and 25 ms in the b-p contexts.
A simple numerical rule for relating these facts would be that the vowels after voiced stops are prolonged by the same amount as the latency of the instant of voiced stop release compared with the unvoiced stop release. In Öhman's material this leads to absolute synchrony of the instant of vowel termination before the stop gap of the following consonant. Approximately the same could be true of the Peterson-Lehiste data, since the difference in vowel lengths is of the order of one-half of the burst length. In our CV material, however, the excessive length of the burst, on average 125 ms, accounts for a relative prolongation of the instant of voice offset of the vowel preceded by an unvoiced stop. This prolongation, assuming maximum vowel synchrony, is apparently equal to the burst length minus the voiced stop release lag minus the difference in voiced vowel length.
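The bookkeeping in the last sentence can be sketched as follows. The 125 ms burst average, the 30 ms release lag read from Fig. I-A-6, and the 20-25 ms vowel-length differences are taken from the text. Applying the overall 125 ms average and the 30 ms lag to every place of articulation (the text states the lag only for [t]/[d] and [k]/[g]) is a simplifying assumption of this sketch.

```python
# Figures quoted in the text, in ms. Using the single 125 ms burst average
# and the 30 ms lag for all three pairs is an assumption of this sketch.
BURST_MS = 125
RELEASE_LAG_MS = 30  # voiced release lags the unvoiced release (Fig. I-A-6)
VOWEL_DIFF_MS = {"g-k": 20, "d-t": 25, "b-p": 25}

def voice_offset_prolongation(burst_ms, release_lag_ms, vowel_diff_ms):
    """Relative delay of voice offset after the unvoiced stop, assuming
    maximum vowel synchrony: burst - release lag - vowel-length difference."""
    return burst_ms - release_lag_ms - vowel_diff_ms

for pair, diff in VOWEL_DIFF_MS.items():
    delay = voice_offset_prolongation(BURST_MS, RELEASE_LAG_MS, diff)
    print(f"{pair}: voice offset delayed by {delay} ms")
```

With these inputs the rule gives 75 ms for g-k and 70 ms for d-t and b-p.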
This discussion is perhaps carried further than motivated by our meager data. However, the purpose is to stimulate further work on the formulation of rules for segmental programming. It could be that, in the specific mode of reading isolated CV-syllables, the segmental programming is governed mainly by a rhythmical demand of producing equally spaced, equally loud vowel nuclei.
Tests on the timing of syllable production in synchrony with a periodically repeated auditory signal, performed by Lindblom and Sundberg*, indicate that the instant of major intensity increase in the syllable, a special case of which is the instant of switching from a voiceless to a voiced segment, governs the timing. These data support the syllabic timing rules proposed by Kozhevnikov and Chistovich (1965).
One typical example of the role of the voicing boundary as a determinant of segmental organization can be studied in the [CaCa:Ca] (C = k, p, t, g, b, d) spectrograms of Fig. I-A-5. The time intervals between the onsets of voicing in the first and second vowels and between the second and final vowels are shown below, together with data on the durations of the three vowels.
TABLE I-A-2
CaCa:Ca segmental analysis, time in ms

C =                      k     g     p     b     t     d
Onset V2 - onset V1    260   250   270   250   270   240
Onset V3 - onset V2    370   360   365   360   370   350
Duration V1             85   125    75   110    85   120
Duration V2            180   240   180   240   190   250
Duration V3            170   180   170   160   170   170
The stability of these temporal reference points of vowel onsets holds within 10 ms for variations in place of articulation and within 30 ms for the voicing distinction. The increase in consonant length with unvoicing is somewhat larger than the corresponding reduction of the vowel length. Thus the V+C intervals of Table I-A-2 are about 15 ms longer when C is unvoiced than when C is voiced. The initial vowel is close to 40 ms longer when the consonant is voiced, in agreement with previous findings. The second, fully stressed vowel is 60 ms longer in a voiced context, whereas the final vowel, which is unstressed, does not vary much in length depending on the voicing of the consonant. The latter observation conforms with the far-reaching reduction of the acoustical distinction between voiced and unvoiced stops in non-initial unstressed position. The relatively large effects on the second vowel could be ascribed to the added influence from both the previous and the following consonants.

* Unpublished data.
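The voicing effects quoted in the preceding paragraph can be recomputed from Table I-A-2. The sketch below assumes that the table's columns are ordered k, g, p, b, t, d. It averages the voiced-minus-unvoiced differences over the three places of articulation and serves only as a check of the arithmetic, not as part of the original analysis.

```python
# Table I-A-2 values in ms; columns assumed to be ordered k, g, p, b, t, d.
TABLE = {
    "Onset V2 - onset V1": [260, 250, 270, 250, 270, 240],
    "Onset V3 - onset V2": [370, 360, 365, 360, 370, 350],
    "Duration V1": [85, 125, 75, 110, 85, 120],
    "Duration V2": [180, 240, 180, 240, 190, 250],
    "Duration V3": [170, 180, 170, 160, 170, 170],
}
PAIRS = [(0, 1), (2, 3), (4, 5)]  # (unvoiced, voiced) column indices

def mean_voiced_minus_unvoiced(row):
    """Average voiced-minus-unvoiced difference over the three places."""
    vals = TABLE[row]
    diffs = [vals[v] - vals[u] for u, v in PAIRS]
    return sum(diffs) / len(diffs)

for row in ("Duration V1", "Duration V2", "Duration V3"):
    print(f"{row}: voiced context longer by {mean_voiced_minus_unvoiced(row):.1f} ms")

# V+C (onset-to-onset) intervals: positive means longer with an unvoiced C.
onset_rows = ("Onset V2 - onset V1", "Onset V3 - onset V2")
vc_excess = -sum(mean_voiced_minus_unvoiced(r) for r in onset_rows) / len(onset_rows)
print(f"V+C interval: unvoiced longer by {vc_excess:.1f} ms")
```

The averages come out at about 37 ms for V1, 60 ms for V2, 0 ms for V3, and about 16 ms for the V+C intervals, matching the rounded figures in the text.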
A further discussion of the k/g, p/b, and t/d distinctions follows in a later part of this article.
Before leaving the topic of segmental structure, some words should be said about the terminal boundary of a vowel followed by a stop. If voicing is continued straight through the occlusion, the boundary is set by the articulatory closure, as seen by the termination of the F1 transition towards its base-line position. Vowels followed by unvoiced stops are terminated by an active devoicing gesture of the vocal cords, which is synchronized to turn off voicing at or just before the articulatory closure. The articulatory closing gesture may well contribute to the final interruption of the voice source, but this is not a necessary requirement. In heavily stressed positions the voicing has died out well before the articulatory closure. Vowel duration is influenced more by the following consonant than by a preceding consonant, see Peterson and Lehiste (1960), Elert (1964), and a forthcoming report*.
Transitional patterns
The purpose of the following section is to discuss the material on formant patterns and transitions in the CV-material in relation to earlier studies, notably those of Lehiste and Peterson (1961), Öhman (1966), and Fant (1959). By formant transitions is understood the dynamic variation of the F-pattern, i.e. of F1, F2, F3, and F4 as functions of time. The extent to which the F-pattern dynamics signals the place of articulation is one problem of general interest. Another is the possibility of inferring coarticulation features from F-pattern analysis. We shall attempt to compare voiceless and voiced stops in Swedish and English accordingly. As a control on some of the measurements, we shall use a vocal tract model to simulate transitions that are difficult to follow in spectrograms. Finally we shall discuss data, vocal tract theory, and proposed models of perception in relation to the feature theory of stop sounds.
* A thesis study by Inger Karlsson and L. Nord supports this view.
First a few words about transitions and sampling techniques. The main object of our measurements has been to sample the F-pattern extrapolated to the instant of the beginning of the transient release of the stop closure. This is not an unambiguous process. The first part of the transition after release may be very rapid and difficult to follow. A fact which is often overlooked is that a CV-transition is often complex, comprising a first, rapidly progressing part related to the release of the consonantal obstruction plus an overlaid transition of longer time constant related to the main tongue body movement. This is typically the case with labials but also with alveolars and dentals.
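A minimal sketch of such a complex transition follows. It assumes, hypothetically, that each component decays exponentially toward the vowel target: a fast component for the release of the obstruction and a slow one for the tongue body. All parameter values are invented and loosely labial-like. The two components have opposite signs, so the track first rises and then falls back toward the target.

```python
import math

def cv_transition(t_ms, f_target, release_offset, release_tau_ms,
                  body_offset, body_tau_ms):
    """Formant frequency t_ms after release (t_ms >= 0): the vowel target
    plus two exponentially decaying deviations, a fast one for the release
    of the obstruction and a slow one for the tongue body movement."""
    fast = release_offset * math.exp(-t_ms / release_tau_ms)
    slow = body_offset * math.exp(-t_ms / body_tau_ms)
    return f_target + fast + slow

# Hypothetical parameters (Hz, ms): the release raises the formant quickly,
# while the slower tongue-body component brings it back down to the target.
PARAMS = dict(f_target=1200.0, release_offset=-400.0, release_tau_ms=5.0,
              body_offset=200.0, body_tau_ms=40.0)

for t in (0, 5, 10, 20, 40, 80):
    print(f"{t:3d} ms: {cv_transition(t, **PARAMS):7.1f} Hz")
```

No single time constant can reproduce this rise-then-fall shape, which is one way of seeing why one time constant per formant is not enough for CV-synthesis.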
It may be difficult to follow a formant transition in unvoiced segments, but the reverse can also be true. An intense aspiration may provide more favorable conditions for F-pattern tracking than a very low-pitched voiced segment. It was considered of interest to sample the F-pattern of unvoiced stops not only at release but also at the initiation of voicing after aspiration. The collected F-pattern data on F2, F3, and F4 are documented in Tables I-A-3 and I-A-4. No F1 data are included. The limiting value of F1 in the occlusion is of the order of 200 Hz for all voiced stops. On the other hand, F1 at the onset of voicing after an unvoiced stop is generally close to the target value of F1, except in occasional instances of unvoiced stop plus [a:]. Other aspects of articulatory movements, such as tongue body place shifts or a labial or palatal closing gesture, may continue during the vowel. Obviously, a simple time constant, one for each formant and independent of the consonant and its vocalic context, is not sufficient for CV-synthesis.
The first object of the analysis was to explore how much the initial F2 and F3 values of a stop vary with respect to the associated vowel. It can be seen from Table I-A-3 that the extremely low F2i = 1400 Hz of [p] occurs with the vowel [o:], and the maximally high F2i = 1800 Hz with the vowels [i:], [e:], and [y:]. The voiced cognate [b] has the same maximum F2i value and a minimum F2i = 900 Hz. Such data on the extreme ranges of second and third formant terminal frequencies are summarized in Fig. I-A-7. Fig. I-A-8 shows a set of corresponding data extracted from an article by Lehiste and Peterson (1961).
A first glance at the two figures reveals basic similarities: the small range of variation for dentals, the large range for velars, and palatals as a single group with overlap of the F2i and F3i ranges.

TABLE I-A-3
F2, F3, F4 at instant of release (range, min-max)

TABLE I-A-4
F2, F3, F4 at instant of voice onset after unvoiced stops
The greater range for voiced than for unvoiced labials, already mentioned above, is found in the Swedish as well as in the English data. A detailed analysis reveals that the extended initial F-pattern range of voiced labial stops can be ascribed to a closer coarticulation with the back vowels [u:], [o:], and [a:], whereas unvoiced labial stops start from a more neutral tongue position at the instant of release. A similar trend appears with Swedish dentals. The lower bound of the F2i domain for Swedish [g] is also somewhat lower than that of [k]. Following the coarticulation model developed by Öhman (1967), these effects could at least in part be ascribed to the relative timing of articulatory programming. As discussed previously in connection with Fig. I-A-6, the voiced stop tongue movement is equal to that of the unvoiced one released 30 ms earlier.
If we hypothesize the same semi-neutral tongue body target for voiced as well as unvoiced stops, the mere translation of the vowel influence curve to the "right" in time for the unvoiced stop would reduce the effect of vowel coarticulation on the terminal values of the formant frequencies.
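The time-shift argument can be illustrated numerically. The sketch below is a hypothetical model, not the author's. The vowel influence on the tongue body is taken to rise exponentially, with an assumed 50 ms time constant, from a common gesture onset. The stop is given an assumed semi-neutral F2 target of 1200 Hz, and the vowel F2 targets are invented. The unvoiced stop is released 30 ms earlier on the same gesture, as suggested by Fig. I-A-6. Under these assumptions the earlier release samples less of the vowel influence, so the spread of F2 at release across vowels shrinks.

```python
import math

def vowel_influence(t_ms, tau_ms=50.0):
    """Hypothetical vowel-influence curve: 0 at gesture onset (t = 0 ms),
    rising exponentially toward 1."""
    return 0.0 if t_ms <= 0 else 1.0 - math.exp(-t_ms / tau_ms)

def f2_at_release(f2_vowel, release_ms, f2_neutral=1200.0):
    """F2 at release: the semi-neutral stop target pulled toward the vowel
    in proportion to the influence reached at the release instant."""
    w = vowel_influence(release_ms)
    return f2_neutral + w * (f2_vowel - f2_neutral)

VOWEL_F2 = [700.0, 1100.0, 2200.0]  # hypothetical vowel F2 targets (Hz)
VOICED_RELEASE_MS = 60.0            # assumed time of voiced release after onset
UNVOICED_RELEASE_MS = VOICED_RELEASE_MS - 30.0  # released 30 ms earlier

def f2i_range(release_ms):
    """Spread of F2 at release across the hypothetical vowel set."""
    values = [f2_at_release(v, release_ms) for v in VOWEL_F2]
    return max(values) - min(values)

print(f"voiced F2i range:   {f2i_range(VOICED_RELEASE_MS):.0f} Hz")
print(f"unvoiced F2i range: {f2i_range(UNVOICED_RELEASE_MS):.0f} Hz")
```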
The range of terminal F-pattern variations would be even greater if we inserted different vowels before the consonants, i.e. if both the following and the previous vowels were varied independently, as in the study of Öhman (1966). Our study above can be regarded as a special case where the consonant is preceded by a neutral vowel.
Öhman has shown that in V1CV2 syllables, with C = the voiced stop [g], [b], or [d] and with V1 and V2 equally stressed vowels [u:], [a:], [ø:], or [i:] varied independently, the transitional pattern in any part of the test words is influenced by both vowels and the consonant. Thus the initial F-pattern after release, as well as the following transition of the CV2 part, depends on the particular V1, and conversely the V1C offglide transition is influenced by V2.
One pattern aspect studied by Öhman was the consonant "locus" in the specific Haskins Laboratories' sense. Their "locus" is defined as a common point on the frequency scale about 50 ms ahead of the release, which is regarded as the virtual starting point of F2 transitions from one and the same consonant to all possible vowels that can follow. Delattre, Liberman, and Cooper (1955) claimed from synthesis experiments that [d] has a locus of 1800 Hz, [b] a locus of 720 Hz, and [g], if produced with non-back vowels, 3000 Hz. The loci are claimed to have an articulatory significance as invariant vocal tract configurations. This is an oversimplification, and the significance of the "locus" is primarily limited to two-formant synthesis rules.
Fig. I-A-7. Range of initial F2 and F3 of Swedish stops in combinations with all possible long vowels.

Fig. I-A-8. Lehiste-Peterson data on range of initial F2 and F3 of stops.

Öhman states that, given a specific V1 and C, the four possible V2 of this test provide transitions that can be extrapolated back to a common "locus" provided C is either [d] or [b], the locus being a function of the F2 of V1. However, a closer view of Öhman's data shows that the invariance of the [b] loci with respect to V2 is not very good. A brief study of the spectrograms of our CV material supports the notion that [b] does not have a unique locus. That [g] has a variable locus was evident already in the early Haskins Laboratories' work, although they chose to speak of two [g] loci, one for front vowels and one for back vowels.
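The locus idea can be made concrete with a small sketch. Each F2 transition is idealized as a straight line through its value at release and its vowel value at a later time, and the line is extrapolated back to 50 ms before release. A unique locus means that the extrapolated values coincide. The transition data below are invented and constructed to converge on the 1800 Hz [d] locus quoted above. The linear shape and the 40 ms transition time are assumptions of the sketch; real transitions, as noted, are neither linear nor guaranteed to converge.

```python
def extrapolate_to_locus(f2_release, f2_vowel, transition_ms=40.0, locus_ms=-50.0):
    """Value at locus_ms of the straight line through (0, f2_release) and
    (transition_ms, f2_vowel); time 0 is the stop release."""
    slope = (f2_vowel - f2_release) / transition_ms
    return f2_release + slope * locus_ms

def locus_spread(transitions, **kwargs):
    """Max minus min of the extrapolated points: 0 Hz means a unique locus."""
    points = [extrapolate_to_locus(fr, fv, **kwargs) for fr, fv in transitions]
    return max(points) - min(points), points

# Invented (F2 at release, F2 of vowel) pairs in Hz, chosen to converge.
converging = [(2050.0, 2250.0), (1300.0, 900.0), (1400.0, 1080.0)]
spread, points = locus_spread(converging)
print(f"extrapolated points: {points}, spread {spread:.0f} Hz")
```

A variable locus, as argued above for [b] and [g], would show up as a spread well above zero.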
Transitions studied by analog simulation
Before entering a discussion on the relative importance of various acoustic cues for stop consonant identification, it is worthwhile to consult production theory in support of some of the more uncertain measurements and to provide some general basis for feature analysis.
The transitions of labial stops to a following vowel are not always easy to follow in the spectrogram. The major part of the labial opening phase is often completed in less than 20 ms. Production theory, Fant (1960), states that an increase in lip-section area, everything else being equal, cannot result in a downward shift of any formant located at a frequency lower than c/(4l₀), where c is the velocity of sound and l₀ is the length of the lip passage. In practice this applies to all observable formants of the F-pattern.
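To see why the rule covers all observable formants, the bound c/(4l₀) can be evaluated for a few lip-passage lengths. The lengths below are assumed, illustrative values, and c = 35 000 cm/s is a conventional round figure for the velocity of sound in the vocal tract. Even a 2 cm lip passage puts the bound near 4.4 kHz, above the F2 and F3 values discussed here.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed round value for the vocal tract (cm/s)

def lip_rule_upper_bound(lip_length_cm, c=SPEED_OF_SOUND_CM_S):
    """Frequency c/(4*l0) below which, per the rule quoted above, widening
    the lip section cannot lower a formant."""
    return c / (4.0 * lip_length_cm)

for l0 in (0.5, 1.0, 2.0):  # hypothetical lip-passage lengths in cm
    print(f"l0 = {l0} cm -> c/4l0 = {lip_rule_upper_bound(l0):.0f} Hz")
```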
However, the extent of the upward shift of formant frequencies varies with the particular formant and the vocal tract configuration. As pointed out earlier, a superimposed tongue body movement may produce a transition of opposite sign to that induced by the lip passage opening, and it generally extends over a longer period of time. A relatively prominent falling transition may result, see [ba:] in Fig. I-A-1 and [pu:] in Fig. I-A-4. The same feature is found in Danish [pø], Fischer-Jørgensen (1954).
An obscure detail in the [ba:] spectrum is the vertical spectral line from 1000 to 2000 Hz in the release transient. It was observed already in my spectrographic work at the Ericsson Telephone Co. in 1946-1949, see Fant (1959), Fig. 42. One object of the analog calculations was to find out whether it had anything to do with the F2 and F3 transitions. Another object was the study of formant transitions from [b] to a front vowel [i].
For this purpose I adopted, for a simulation study with our line analog LEA, the lip-opening cross-sectional area as a function of time experimentally determined by Fujimura (1961), see Fig. I-A-9. The area function of the rest of the vocal tract was kept constant.

Fig. I-A-9. Fujimura (1961) data on lip opening as a function of time for the test word "pope" (lip area vs. time).
One set of measurements* was made with a vocal tract area function appropriate for the Russian vowel [a:], one for [i:], and one pertaining to the palatalized [bʲ], see Fant (1960).
During the interval of complete lip closure, F1 should not drop to zero but to a limiting value of about 150 Hz, determined by the enclosed air volume and the mass distribution of the vocal tract walls. Accordingly, see Fant (1960), all F1 values were corrected by a root-square summation.
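If the root-square summation is read as combining each calculated F1 with the 150 Hz closure limit, i.e. F1,corr = sqrt(F1² + 150²), the correction can be sketched as below. This reading of the formula is an assumption. Its effect is that F1 tends to 150 Hz at complete closure and is barely changed once F1 is well above 150 Hz.

```python
import math

F1_CLOSED_HZ = 150.0  # limiting F1 at complete closure, as stated in the text

def corrected_f1(f1_hard_walls_hz, f1_closed_hz=F1_CLOSED_HZ):
    """Root-square summation of the calculated (hard-walled) F1 and the
    wall-determined closed-tract limit, so F1 never falls below ~150 Hz."""
    return math.hypot(f1_hard_walls_hz, f1_closed_hz)

for f1 in (0.0, 100.0, 300.0, 700.0):
    print(f"calculated {f1:5.0f} Hz -> corrected {corrected_f1(f1):6.1f} Hz")
```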
The results of the calculations are shown in Fig. I-A-10 and Table I-A-5. According to Fig. I-A-9, the lip opening has reached 50% of its final value at 10 ms and then proceeds at a slower rate. A major part of the F1 and F2 transitions is also completed 10 ms after release. All transitions are positive, as expected. The F2 and F3 transitions of [ba:] are small, and it can accordingly be concluded that the release transient above F2 should be disregarded in transition studies. In [bi:], F2 jumps up 500 Hz in the first 5 ms. The terminal value F2i = 1200 Hz is lower than the F2i = 1700 Hz measured from spectrograms.
This difference could be explained by the limited means of following such a rapid transition in the spectrogram. Another source of deviation of the model from the spoken data could be that the tongue body configuration at the instant of release in [bi:] is not that of a pure [i:] but is perturbed in the direction of a neutral position, as in the palatalized [bʲ] of Fig. I-A-10, where the terminal value of F2 is closer to 1500 Hz and the extent of the F2 transition is smaller. In view of the wide range of coarticulation induced by a previous vowel in VCV contexts, and possible fluctuations in the initial tongue configuration in the production of CV-syllables, it is, anyhow, apparent that variations in F2i of [bi:] can be expected.
On the whole, however, disregarding the lack of information on the first 5 ms, the calculated dynamical F-pattern of [bi:] in Fig. I-A-10 agrees well with measured data. One pattern aspect well known from spectrograms is that the F3 transition goes on for a longer time than the F2 transition and, disregarding the first 5 ms, covers a greater frequency span than F2.
* I am indebted to Doc. J. Sundberg for carrying out this work.
Fig. I-A-10. Calculated dynamic F-patterns of voiced labial stops (closed glottis).
As seen in Fig. I-A-2, the transitional pattern of the [pi:] aspiration is no less apparent than that of [bi:]. The main part of the F3 transition is completed in 40 ms according to the simulation in Fig. I-A-10. In this time F3 has moved from the 2200 Hz terminal value to 2750 Hz. This compares very well with measurements from the spectrogram in Fig. I-A-2. The F3 transition in the following, later part of the spectrogram reaches a higher target value than in the simulated syllable, which can be ascribed to the tongue body moving up to the higher degree of closure typical of the diphthongization of Swedish [i:].
However, apart from this added F3 movement, the longer duration of the F3 transition compared with the F2 transition is related to a higher differential influence of the lip parameter on F3 than on F2 at relatively large degrees of lip opening. In terms of resonator theory, this is explained by the fact that F2 is a standing wave resonance of the pharynx, and once the lip opening has reached a value high enough not to compete with the palatal stricture, the lip influence on F2 will be minimal. Also, since F3 of [i:] is a mouth cavity resonance, it will be highly susceptible to variation in the lip area.
Experimental check of occlusion F-pattern
Vocal tract simulation is an indirect means of studying the F-pattern in articulatorily closed parts of the utterance. It would be handy if a continuous tracking of the F-pattern were possible in all parts of real speech. If we limit our object to voiced stops, there exist some limited possibilities of studying F1, F2 and F3 during occlusion, provided a high-frequency emphasis and extra gain are utilized in the spectrographic analysis. A small pilot study* has provided us with data that support the findings above concerning [ba:] and [bi:]. It was thus found that F2i of [ba:] was 1000 Hz and that of [bi:] 1700 Hz, as measured from a separate recording of the same subject. During [bi:] and [ga:] there were prominent transitions within the occlusion.
One technical difficulty in the analysis was the need for high input levels to the spectrograph and thereby the risk of overloading, with intermodulation formants appearing. Another difficulty is the low level of the voice source immediately before release.
* This pilot study was carried out by S. Pauli utilizing both the Voiceprint Spectrograph and the 51-channel analyzer. A separate report on these studies is planned.
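The "high-frequency emphasis and extra gain" above were applied in the analog spectrograph. For digitized speech a common counterpart, assumed here and not taken from the pilot study, is a first-order pre-emphasis filter y[n] = x[n] - a·x[n-1] followed by a gain factor. The coefficient 0.95 and the function names are illustrative choices. The second function gives the filter's magnitude response, which shows the spectral tilt it applies.

```python
import cmath
import math


def pre_emphasize(samples, coeff=0.95, gain=1.0):
    """First-order high-frequency emphasis: y[n] = gain * (x[n] - coeff * x[n-1]).

    The sample before the first one is taken to be zero.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(gain * (x - coeff * previous))
        previous = x
    return out


def emphasis_db(freq_hz, fs_hz, coeff=0.95):
    """Magnitude response of the pre-emphasis filter, in dB, at freq_hz."""
    z = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    return 20.0 * math.log10(abs(1.0 - coeff * z))


if __name__ == "__main__":
    fs = 10000.0
    for f in (100.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:6.0f} Hz: {emphasis_db(f, fs):+6.1f} dB")
```

Because the filter attenuates low frequencies and boosts high ones, the extra gain needed to bring up weak occlusion formants also raises the risk of overload, the digital analogue of the overloading problem described above.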
Identification of spectral components
Ambiguity often arises as to what is the true release transient of palatal and velar stops. As pointed out already by Fischer-Jørgensen (1954), there often occur double or triple spikes indicating a sequence of interrupted air injections through the articulatory stricture, see Fig. I-A-1*. These multiple spikes could reflect a suction reaction at the articulatory stricture by the Bernoulli pressure, just as in the normal voice source. In voiced velar stops they may occur superimposed on the regular voice source operating in a breathy mode so as to damp out F1. This reduction occurs both before and after the release and is thus not in itself indicative of the instant of release. The double spikes of the [k] burst could also originate from a reaction on the glottis at the release, resulting in a momentary flow reduction. Further investigations are needed to reach a better understanding of these phenomena.
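Locating the candidate spikes in a digitized burst can be sketched as simple peak picking on the waveform. This is an illustration only: `find_spikes`, its relative threshold and its minimum spacing are hypothetical, and, as the paragraph above stresses, finding several spikes does not by itself tell which one marks the release.

```python
def find_spikes(samples, fs_hz, rel_threshold=0.5, min_gap_ms=2.0):
    """Return times (ms from the first sample) of spike-like peaks in a burst.

    A sample counts as a spike if its magnitude is a local maximum and at
    least rel_threshold times the largest magnitude in the segment.
    Peaks closer together than min_gap_ms are merged, keeping the larger one.
    """
    if not samples:
        return []
    mags = [abs(s) for s in samples]
    peak = max(mags)
    if peak == 0.0:
        return []
    threshold = rel_threshold * peak
    min_gap = max(1, int(round(min_gap_ms * fs_hz / 1000.0)))
    spikes = []
    for i, a in enumerate(mags):
        if a < threshold:
            continue
        left = mags[i - 1] if i > 0 else 0.0
        right = mags[i + 1] if i + 1 < len(mags) else 0.0
        if not (a >= left and a > right):
            continue  # not a local maximum
        if spikes and i - spikes[-1] < min_gap:
            if a > mags[spikes[-1]]:
                spikes[-1] = i  # keep the larger of two nearby peaks
        else:
            spikes.append(i)
    return [1000.0 * i / fs_hz for i in spikes]


if __name__ == "__main__":
    fs = 10000.0
    burst = [0.0] * 120
    # Synthetic test signal (not measured data): three clicks of falling size.
    for index, amp in ((10, 1.0), (50, -0.8), (90, 0.6)):
        burst[index] = amp
    print(find_spikes(burst, fs))  # [1.0, 5.0, 9.0]
```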
Another problem of interest is the F1 locus of unvoiced stops. The subglottal impedance shunting the supraglottal impedance in a circuit theory model would account for a substantial increase in F1 and could also introduce traces of subglottal resonances. Because of the low energy level of F1 in the aspiration, it is hard to get reliable measures of an initial F1 just before release. Just after release one observes values of the order of 300-600 Hz depending on the particular vowel, see Figs. I-A-1 to I-A-4. However, F1 of the aspiration is not very important for either perception or synthesis and recognition work.
Acoustic characteristics and synthesis rules
When discussing the stops as a specific ensemble we need not worry about distinctive features in a general sense. We can proceed to discuss the relation of the subset [k][p][t] to that of [g][b][d], and further on investigate the triangular place relations within each subset, e.g. what pattern aspects or cues are typical for each of the members within the subset in relation to each of the other members. We do, of course, find the expected similarities k/g = p/b = t/d etc. underlying the four natural categories which are traditionally referred to as 1) unvoiced/voiced, 2) velars and palatals, 3) labials, and 4) dentals. In this limited material of stressed and isolated CV-syllables the distinction between voiced and unvoiced stops is very clear, as has been discussed in the previous sections.
* See also illustrations of several speakers' [ka] and [ga] in Fant (1957/68).
A synthesis of CV syllables of the type studied here, a stop plus a long vowel, could proceed as follows:
(1) Determine first, if needed with respect to the phrase prosody, a point on the time scale where the vowel shall start. If preceded by a voiced stop, this is the instant of the stop release transient. If preceded by an unvoiced stop, this is the instant of voicing onset after aspiration.
(2) Choose the vowel length after more or less detailed rules, starting from a mean value of 250-350 ms for long vowels according to tempo and degree of emphasis required. Add 30 ms to the vowel if preceded by a voiced stop. The instant of the release transient of an unvoiced stop is placed 80-120 ms ahead of the voicing onset.
(3) An appropriate F-pattern for the whole voiced stop plus vowel sequence is generated. This can be used as an approximation also for the corresponding unvoiced stop if synchronized to have its release transient coincide with a point 30 ms after the release of the unvoiced stop. The F-pattern for the initial 30 ms of the burst is traced by rules for linear extrapolation back in time. Labials before back vowels require separate F-patterns for voiced and unvoiced stops; these can, however, probably be derived from coarticulation rules. A minor correction for the effect of glottal opening on the F-pattern should be added: an open glottis increases F2 and F3 by about 50-100 Hz.
(4) Make the F0 contour synchronous with respect to the F-pattern. For unvoiced stops add an F0 increment in the first 50 ms after voicing onset.
(5) Choose an appropriate dynamic pattern of intensity and spectral distribution of the voice source. Our speaker consistently shifted his voice source spectral balance to a shape with more high-frequency de-emphasis in the later half or third of the vowel. An aspirative final termination of voicing is frequent in the vowel [a:]. Although some of these characteristics vary with speaker, the trend of decreasing vocal effort with time is typical of the sentence-final position.
(6) Apply rules for spectrum and time shaping of release transients and fricative segments. These rules have yet to be worked out on the basis of production theory, Fant (1960), and more quantitatively aimed pattern matchings, as will be discussed later. In general, see Fant and Mártony (1962), the release transient should be synthesized with a DC-step source and a friction segment with an appropriately shaped noise source. The release transient and the friction are both synthesized with the "K-filter", whereas the following aspiration is shaped with the "F-filter".
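The timing and F-pattern parts of steps (1) to (4) can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure of any particular synthesizer: formant tracks are hypothetical piecewise-linear breakpoint lists, the glottal correction is applied as a constant offset (default 75 Hz, the middle of the quoted 50-100 Hz) between release and voicing onset, and the F0 increment of step (4) is assumed to taper linearly to zero over its 50 ms. All function names are invented for this sketch.

```python
from dataclasses import dataclass

VOICED_EXTRA_MS = 30.0    # step (2): added to the vowel after a voiced stop
TEMPLATE_SHIFT_MS = 30.0  # step (3): voiced release sits 30 ms after the unvoiced release
F0_SPAN_MS = 50.0         # step (4): F0 increment lasts 50 ms after voicing onset


@dataclass
class StopTiming:
    release_ms: float      # instant of the release transient
    vowel_start_ms: float  # release (voiced stop) or voicing onset (unvoiced stop)
    vowel_end_ms: float


def plan_timing(vowel_start_ms, voiced, vowel_ms=300.0, aspiration_ms=100.0):
    """Steps (1)-(2). vowel_ms: 250-350 ms by tempo/emphasis; aspiration_ms: 80-120 ms."""
    if voiced:
        # The vowel starts at the release transient and gets 30 ms extra.
        return StopTiming(vowel_start_ms, vowel_start_ms,
                          vowel_start_ms + vowel_ms + VOICED_EXTRA_MS)
    # The release of an unvoiced stop precedes voicing onset by the aspiration.
    return StopTiming(vowel_start_ms - aspiration_ms, vowel_start_ms,
                      vowel_start_ms + vowel_ms)


def interp(points, t):
    """Value of a piecewise-linear track [(time_ms, hz), ...] at time t.

    Before the first breakpoint the first segment is extended linearly
    (linear extrapolation back in time); after the last breakpoint the
    final value is held.
    """
    if t <= points[0][0]:
        (t0, a0), (t1, a1) = points[0], points[1]
    else:
        for (t0, a0), (t1, a1) in zip(points, points[1:]):
            if t <= t1:
                break
        else:
            return points[-1][1]
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)


def unvoiced_formant(template, t_ms, aspiration_ms, glottal_offset_hz=75.0):
    """Step (3): F2 or F3 of an unvoiced stop, t_ms after its release.

    template is the voiced-stop track with time 0 at the voiced release.
    While the glottis is open (before voicing onset) glottal_offset_hz is
    added; the text gives this correction for F2 and F3, not for F1.
    """
    hz = interp(template, t_ms - TEMPLATE_SHIFT_MS)
    if t_ms < aspiration_ms:
        hz += glottal_offset_hz
    return hz


def f0_increment(t_after_voicing_ms, peak_hz):
    """Step (4): extra F0 for an unvoiced stop; the linear taper is an assumption."""
    if 0.0 <= t_after_voicing_ms < F0_SPAN_MS:
        return peak_hz * (1.0 - t_after_voicing_ms / F0_SPAN_MS)
    return 0.0


if __name__ == "__main__":
    # Hypothetical voiced-stop F2 breakpoints, not measured data.
    f2_voiced = [(0.0, 1700.0), (40.0, 2100.0), (200.0, 2150.0)]
    print(plan_timing(500.0, voiced=False))
    for t in (0.0, 30.0, 70.0, 110.0):
        print(t, unvoiced_formant(f2_voiced, t, aspiration_ms=100.0))
```

With these made-up breakpoints the unvoiced burst starts at 1475 Hz (1400 Hz, extrapolated 30 ms back from the template's 1700 Hz, plus the 75 Hz glottal offset) and reaches 2112.5 Hz at 110 ms, after voicing onset, where the offset no longer applies.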
The initial F-pattern is a place correlate
We shall now return to a study of the data on F-patterns and transitions in order to evaluate how distinctive they are in identifying "place" of articulation of the consonant and what additional cues should be taken into consideration.
It is well known and rather obvious that the transitional patterns in the voiced part of a vowel after a heavily aspirated stop pertain to instants in time where the articulators have moved so far away from the consonant that their movements do not retain much distinctiveness. In Table I-A-6 [p], [t] and [k] are compared in terms of F2 and F3 at the voicing boundary. The reduction is especially apparent comparing [t] and [p] before the vowel [a:] and unrounded front vowels. The loss of transitional information within the stop burst is specified by Table I-A-7. The amount by which voiced and unvoiced stops differ in F2 and F3 at the instant of the release transient is shown in Table I-A-7. The earlier discussed differences in articulation of voiced and unvoiced labials before back vowels are apparent. In other combinations the differences are not larger than 300 Hz and generally smaller than 200 Hz.
TABLE I-A-6
F2 and F3 differences at instant of voice onset as place correlates within unvoiced stops
TABLE I-A-7
Extent of F2 and F3 transitions within unvoiced segments (from release to voice onset)
The discriminative power of the second and third formant frequencies F2i and F3i is illustrated in Fig. I-A-11 and Fig. I-A-12. The following general conclusions can be drawn. The main characteristic of dentals compared with labials is the 350-500 Hz higher F3i. Dentals may have higher F2i than labials if compared in context with the same vowel. The palatal [k] and [g] before the unrounded front vowels [i:], [e:] and [ɛ:] comprise a peripherally located subset of higher F3i and also somewhat higher F2i than any dental. The [k] and [g] before the rounded front vowels [y:], [ʉ:] and [ø:] differ from labials and dentals by a somewhat higher [...] only. The velar [k] and [g] before the back vowel [a:] have a lower F3i than any labial plus vowel.
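The first two conclusions suggest a simple decision rule for separating dentals from labials within one vowel context. The sketch below is hypothetical: it needs a labial reference F3i measured for the same vowel, it takes the dental offset to be 425 Hz (the midpoint of the 350-500 Hz range quoted above, an assumption), and it labels a token by whichever expected value its F3i lies closer to. It does not attempt the velar-palatal cases, whose locations depend strongly on the vowel.

```python
DENTAL_F3I_OFFSET_HZ = 425.0  # assumption: midpoint of the quoted 350-500 Hz range


def dental_or_labial(f3i_hz, labial_f3i_same_vowel_hz,
                     dental_offset_hz=DENTAL_F3I_OFFSET_HZ):
    """Label a stop 'dental' or 'labial' from its initial F3 (F3i).

    Compares F3i with two expected values for the same vowel context: the
    labial reference itself and the reference raised by dental_offset_hz.
    Hypothetical rule for illustration; ties go to 'labial'.
    """
    labial_distance = abs(f3i_hz - labial_f3i_same_vowel_hz)
    dental_distance = abs(f3i_hz - (labial_f3i_same_vowel_hz + dental_offset_hz))
    return "dental" if dental_distance < labial_distance else "labial"


if __name__ == "__main__":
    # Hypothetical labial reference of 2300 Hz for one vowel context.
    print(dental_or_labial(2700.0, 2300.0))  # dental
    print(dental_or_labial(2400.0, 2300.0))  # labial
```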
It is interesting to note that the initial F2F3 pattern differentiates unvoiced stops somewhat better than voiced stops, which is fully in line with the previously inferred finding that at the instant of release the unvoiced stops appear to be less coarticulated with the following vowel than the corresponding voiced stops are. This is also apparent from the smaller spread of the unvoiced data with respect to vowel context, as already pointed out in connection with Fig. I-A-7.
The detailed data on the unvoiced-voiced differences in F2i and F3i are given in Table I-A-8. The negative values of F2p-F2b and F3p-F3b are ascribable to the difference in coarticulation, as is typical of [u:], [o:] and [a:]. It should be kept in mind that the glottal shunt contributes to the trend of positive signs of the data, with an average amount of the order of +100 Hz.
Thus, with the exception of the infiltration of [gu:] and [gʉ:] in the labial area in Fig. I-A-10, all dentals are confined to one area of the place plane, all labials are confined to a separate area, and the velar-palatals to a large range of peripheral locations outside these areas. For the corresponding unvoiced stops, Fig. I-A-11, there is no overlapping. The vowel targets are included in Fig. I-A-10 so as to allow a derivation of the direction of CV-transitions.
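Deriving the direction of a CV transition from such a plot amounts to comparing each initial formant value with the corresponding vowel target. A minimal sketch, in which the function names and the 50 Hz tolerance below which a transition is called flat are hypothetical choices:

```python
def transition_direction(initial_hz, target_hz, tolerance_hz=50.0):
    """Return 'rising', 'falling' or 'flat' for one formant of a CV transition."""
    delta = target_hz - initial_hz
    if abs(delta) <= tolerance_hz:
        return "flat"
    return "rising" if delta > 0 else "falling"


def cv_transitions(initial, target, tolerance_hz=50.0):
    """Directions for every formant present in both dicts, e.g. {'F2': ..., 'F3': ...}."""
    return {name: transition_direction(initial[name], target[name], tolerance_hz)
            for name in initial if name in target}


if __name__ == "__main__":
    # Hypothetical initial values and vowel targets, not data from the figures.
    print(cv_transitions({"F2": 1000.0, "F3": 2400.0},
                         {"F2": 1300.0, "F3": 2420.0}))
    # {'F2': 'rising', 'F3': 'flat'}
```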
Spectral energy cues
General feature discussion
An effective approach for testing the relevance of these transitional cues is to look up pairs of consonants in the same vowel context where the F-pattern data are the same or almost the same, and then see what other cues there are to note. This technique was used by Öhman (1966) in his V1CV2 studies. He found that the CV2 part of [ybo] was the same
Fig. I-A-12. Initial F2 and F3 of unvoiced Swedish stops, subject B.L. (F3 and F2 measured at plosion). The vowel targets are indicated in the figure.
An extension of the range of analysis to higher frequencies than 4000 Hz adds to the distinctiveness of these visually defined cues, mainly by displaying the high-frequency components of the [t] and [d] bursts.
The statements above concerning "spectral energy" refer to the first 10-30 ms after the release, which appear to carry the main information on the place of articulation. The transient burst and the first part of a vowel, when appearing within this segment, should be regarded as a single stimulus rather than as a set of independent cues, Fant (1960, p. 217), Stevens (1967). When relating data from real speech to experiments with synthetic speech one should keep this in mind.
As stated already by E. Fischer-Jørgensen (1954): "The listener does not compare explosion with explosion and transition with transition but compares artificial syllables comprising either explosion or transition with natural syllables that always contain both".
When discussing transitions it seems wise to distinguish two categories: 1) those related to the overall tongue body movement within the whole of a previous or a following vowel, and 2) those related to the break of a consonantal obstruction or the movement towards closure. Those belonging to category 1) mainly reflect vowel coarticulation and are less distinctive than those of category 2). A typical example is the falling transition from labial stop to back vowel, see Fig. I-A-4, which reflects the tongue body movements, whereas the labiality cues may be confined to the first 10 ms only and may not be visible in the spectrogram.
Production theory, Fant (1960), provides a basis for explaining the origin of the general characteristics discussed above and is the starting point for the derivation of synthesis strategies. Thus the main formant of the [k][g] sounds derives from the cavity in front of the tongue constriction and is represented by a free pole. The diffuse spectrum of the [p] and [b] release originates from the lack of any front cavity. At release the dispersion effect is pronounced, with pole frequencies rapidly moving in positive direction away from the associated zeros which neutralize the poles before release. The [k][g], on the other hand, have a free pole before release; in the critical segment after release this pole cannot display very rapid movements. The [t] and [d] have a small and narrow front channel behind the source, which is associated with a high-pass sound filtering.
TABLE I-A-9
Burst formant areas of [k] and [g]
TABLE I-A-10
Target values of the subject's formant frequencies towards the end of the vowel
The mean frequency of the [k] and [g] bursts and their F-pattern associations in different vowel contexts have been measured, and the data are presented in Table I-A-9. The data vary over a 2500 Hz range, from 1000 Hz to 3500 Hz. The observed differences with respect to voicing are not very significant in view of the limited data.
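A burst "mean frequency" can be defined in several ways. One common choice, assumed here and not taken from the text (the original values were read from spectrographic data), is the power-weighted centroid of the spectrum in a short window starting at the release. The window defaults to 20 ms, inside the 10-30 ms span identified above as carrying the main place information. A plain rectangular-window DFT keeps the sketch dependency-free; `burst_mean_frequency` is a hypothetical name.

```python
import cmath
import math


def burst_mean_frequency(samples, fs_hz, release_index, window_ms=20.0):
    """Power-weighted mean frequency (spectral centroid) after the release.

    Uses a rectangular window of window_ms starting at release_index and a
    direct DFT over bins 0..N/2.
    """
    n = int(round(window_ms * fs_hz / 1000.0))
    segment = samples[release_index:release_index + n]
    if len(segment) < 2:
        raise ValueError("window too short or release_index out of range")
    size = len(segment)
    weighted = total = 0.0
    for k in range(size // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * m / size)
                    for m, x in enumerate(segment))
        power = abs(coeff) ** 2
        weighted += power * k * fs_hz / size
        total += power
    if total == 0.0:
        raise ValueError("silent segment has no mean frequency")
    return weighted / total


if __name__ == "__main__":
    fs = 16000.0
    # Synthetic test signal (not a measured burst): 10 ms silence, then a 2 kHz tone.
    signal = [0.0] * 160 + [math.sin(2 * math.pi * 2000.0 * m / fs) for m in range(400)]
    print(round(burst_mean_frequency(signal, fs, release_index=160)))  # 2000
```

For real recordings an FFT routine and a tapered window would normally replace the direct DFT; the definition of the mean, not the transform, is the assumption that matters when comparing with the values in Table I-A-9.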
A secondary correlate of the place of articulation for [k] and [g] is the delay of approximately 30 ms from the release transient to the appearance of the formant structure in the following vowel. The F1 transitions after [b], [d] and [g] are not much different, except that the F1 rise tends to be somewhat slower after [g]. The differences in vowel targets conditioned by the particular place of articulation of the consonant could be measured but appear to be too small to be of any appreciable perceptual significance.
The F0 cues also contribute.
Approximate vowel targets for the subject B.L. are shown in Table I-A-10. They pertain to the final part of the vowel and, in the case of close vowels (lowest level of F1), to the diphthongal termination. In [u:] and [ʉ:] this is a lip closure, which accounts for the falling F2 and F3. For [i:] and [y:] the diphthongal element is made with the tongue pressing harder against the palate. This accounts for the rise in F3 at constant lip opening in [y:] and [i:]. A more detailed discussion of Swedish vowels was given by Fant (1969).
Intensity-frequency sections of the transient and burst spectra of Swedish stops have earlier been published by Fant (1959), and corresponding data on Russian stops by Fant (1960). These data support the conclusions above and support the feature frame of Jakobson, Fant, and Halle (1952/67), with [k][g] being compact, [p][b] diffuse and grave, and [t][d] diffuse and acute (nongrave).
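The feature assignments just quoted, together with the voicing split, can be written as a small lookup table from which natural classes are read off. The table encodes only what the paragraph states: "diffuse" is represented as compact=False, and grave/acute is left unspecified (None) for the compact [k][g], as it is in the text. The names `STOPS`, `PLACE_FEATURES` and `natural_class` are illustrative.

```python
# (voicing, place) for each stop of the ensemble
STOPS = {
    "p": ("unvoiced", "labial"), "b": ("voiced", "labial"),
    "t": ("unvoiced", "dental"), "d": ("voiced", "dental"),
    "k": ("unvoiced", "velar-palatal"), "g": ("voiced", "velar-palatal"),
}

# Jakobson-Fant-Halle place features as stated above; None = not specified here.
PLACE_FEATURES = {
    "labial": {"compact": False, "grave": True},
    "dental": {"compact": False, "grave": False},
    "velar-palatal": {"compact": True, "grave": None},
}


def natural_class(voicing=None, **features):
    """Return the stops matching a voicing value and/or place-feature values."""
    result = set()
    for stop, (voice, place) in STOPS.items():
        if voicing is not None and voice != voicing:
            continue
        spec = PLACE_FEATURES[place]
        if all(spec.get(name) == value for name, value in features.items()):
            result.add(stop)
    return result


if __name__ == "__main__":
    print(sorted(natural_class(compact=False)))               # diffuse: b d p t
    print(sorted(natural_class(compact=False, grave=True)))   # diffuse and grave: b p
    print(sorted(natural_class(voicing="unvoiced", grave=False)))  # t
```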
Although Chomsky and Halle (1968) improved the feature system by introducing tongue body features separate from the place of articulation features, they have not been equally successful in defining "place" features that, irrespective of co-occurrence with other features, retain some perceptual invariance or at least similarity. Furthermore, these features are highly disputable even on the level of production control, Fant (1969). Although the feature "anterior" takes over the function of "diffuse" and thus could inherit the same correlates, there is real trouble with the "coronal" feature, which loses its physiological basis when separating dentals from labials. The class of labial consonants is accordingly selected by reference to the negative of a feature referring to activities in muscles which have nothing to do with the lips.
From the perceptual point of view, the feature [+coronal] separating dentals from labials when combined with the feature [+anterior] implies a high versus low frequency emphasis. When the coronal feature is used to differentiate [-anterior] fricatives, e.g. Swedish [ʂ] and [ɕ], with respect to the tip of the tongue being up [+coronal] or down [-coronal], the acoustic effect appears to be the opposite, the [+coronal] (retroflexion) accounting for a lowering of the mean frequency of the spectrum. I cannot find any other spectral characteristics of the "coronal" feature that would be retained in combination with both [+anterior] and [-anterior]. The "coronal" feature would not display this acoustical ambiguity if restricted to the class of [-anterior] consonants.
Stevens' (1967) theory of perceptual invariance conforms with the general statement on stop features above and has elements in common with that of Fant (1960, p. 217) and Jakobson, Fant, and Halle (1952/67). Thus, his treatment of velar sounds is almost the same as my earlier one. His floating reference of spectral energy with respect to the following vowel, being low in labials, is valid for the short (about 10 ms) delabialization segment only and requires that the aspiration is identified with the vowel. I have a feeling that the reference to the vowel is not needed for discriminating [p] and [t]. Stevens' treatment of [g] as acoustically of lower pitch than a retroflex [ɖ] is valid for velar [g] only. In my view it is more natural to oppose velar [g] to palatal [g] pitchwise, whereas the relation of [ɖ] to [g] is basically a matter of spread versus concentrated energy. The [ɖ] should rightly be opposed to [d], the [ɖ] being more "flat" and also less spread than [d]. The role of the feature "distributed" in this connection is not clear.
References
CHOMSKY, N. and HALLE, M. (1968): Sound Pattern of English (New York).
DELATTRE, P., LIBERMAN, A.M., and COOPER, F.S. (1955): "Acoustic Loci and Transitional Cues for Consonants", J. Acoust. Soc. Am. 27, pp. 769-773.
ELERT, C.-C. (1964): Phonological Studies of Quantity in Swedish (thesis, Uppsala).
FANT, G. (1957/68): "Den akustiska fonetikens grunder", Report No. 7, KTH, Speech Transmission Laboratory (Stockholm), new edition.
FANT, G. (1959): "Acoustic Analysis and Synthesis of Speech with Applications to Swedish", Ericsson Technics No. 1, pp. 3-108.
FANT, G. (1960): Acoustic Theory of Speech Production ('s-Gravenhage).
FANT, G. (1968): "Analysis and Synthesis of Speech Processes", in Manual of Phonetics, ed. by B. Malmberg, pp. 173-277 (Amsterdam).
FANT, G. (1969): "Distinctive Features and Phonetic Dimensions", STL-QPSR 2-3/1969, pp. 1-18.
FANT, G. and MÁRTONY, J. (1962): "Speech Synthesis", STL-QPSR 2/1962, pp. 18-24.
FANT, G., LINDBLOM, B., and MÁRTONY, J. (1963): "Spectrograms of Swedish Stops", STL-QPSR 3/1963, p. 1.
FISCHER-JØRGENSEN, E. (1954): "Acoustic Analysis of Stop Consonants", Miscel. Phonetica 2, pp. 42-59.
FUJIMURA, O. (1961): "Bilabial Stop and Nasal Consonants: A Motion Picture Study and its Acoustical Implications", J. of Speech and Hearing Research 4, pp. 233-247.
HOUDE, R.A. (1967): "A Study of Tongue Body Motion During Selected Speech Sounds" (thesis, Univ. of Michigan, Ann Arbor).
JAKOBSON, R., FANT, G., and HALLE, M. (1952/67): "Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates", MIT Acoust. Lab., Techn. Rep. No. 13 (1952); 7th edition publ. by MIT Press (Cambridge, Mass.).