full text - Queen Margaret University

Language Interaction in the Bilingual
Acquisition of Sound Structure:
A longitudinal study of vowel quality, duration
and vocal effort in pre-school children
speaking Scottish English and Russian
Olga B. Gordeeva
PhD Thesis, 2005
QMUC SSRC Theses Online, release date: 31.05.2006
© Olga Gordeeva, 2005
This is an electronic (PDF) version of Olga Gordeeva’s PhD thesis, submitted by the author as a PDF
file, with this cover sheet added. The definitive printed and bound version of the thesis is
available by inter-library loan (including on microfilm or as an electronic scan), but
reference can be made to this version if it is clearly identified as the “QMUC SSRC Theses
Online” version with the appropriate date of release.
LANGUAGE INTERACTION IN THE
BILINGUAL ACQUISITION OF
SOUND STRUCTURE:
A longitudinal study of vowel quality,
duration and vocal effort in pre-school
children speaking Scottish English and
Russian
Olga Gordeeva
A thesis submitted in partial fulfilment of the
requirements for the degree of
Doctor of Philosophy in
Speech and Hearing Sciences
Queen Margaret University College
February 2006
Declaration
I confirm that the thesis submitted is my own work and that appropriate credit has
been given where reference has been made to the work of others.
Olga Gordeeva
15 February, 2006
Publications from the Thesis
Gordeeva, O., Mennen, I. and Scobbie, J.M. (2003). Vowel duration and spectral
balance in Scottish English and Russian. In M.J. Solé, D. Recasens and J. Romero
(Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (pp. 3193–3196).
Barcelona: Universitat Autònoma de Barcelona.
Acknowledgements
Conducting a PhD has been an enjoyable time for me from start to finish. My
deepest appreciation goes to Peter, my husband, who encouraged me to pursue this path.
Throughout this time he supported me in many invaluable ways, most importantly by being
who he is. On the practical side, he wrote the algorithms for noise subtraction and
RMS-power analysis. My bilingual son Maxim has been a great motivation for
this research, and he is just such a cute boy. I dedicate this thesis to him.
I am lucky to have had Dr. Ineke Mennen and Dr. Jim Scobbie as my supervisory
team. Ineke has been encouraging and motivating from the moment we got in touch, and
this was crucial to me. Her extensive knowledge of intonation and of bilingual and second
language acquisition has inspired me and shaped my views in this thesis. Jim has
given me great support throughout, not least through his deeply analytic way of
thinking, and he infected me with his keen interest in, and insights into, sociophonetic
variation. Both of them were always there with a critical piece of advice, and we have
been a fantastic team and friends. Thanks for all this!
This PhD would have been impossible without the kind and enduring
participation of the two lovely bilingual girls and their parents on many occasions in
2002/2003. Claire Withnell was a big help with the Scottish part of the recordings. Many
thanks go to all the children, their parents and the adults who participated in this study.
Thanks to QMUC for its financial contribution during my PhD, and to all
members of staff in Speech and Language Sciences who supported my research. Special
thanks go to Steve Cowen for his technical support, and to Robin Lickley for his help
with statistical analyses. Thanks to Ben Matthews, Suzanne Fuchs, Alan Wrench,
Joanne McCann, Lianne Carroll, Ioulia Grichkovtsova and Natalia Zharkova for nice
chats during coffee breaks and for practical help. Thanks to Alice Turk and Bert
Remijsen for their good advice. Thanks to Michael Jessen for introducing me to his
study.
I am grateful to Prof. Jeannine Vereecken and Prof. Voordeckers at the
Department of Slavic Languages at the University of Ghent (Belgium) for revealing
to me the joys of scientific thinking.
Abstract
This PhD thesis contributes new empirical knowledge to the question of which
paths the bilingual acquisition of sound structure can take in early simultaneous
bilinguals. The issues of language differentiation and interaction are considered in
relation to language input, crosslinguistic structure and longitudinal effects.
Two Russian-Scottish English bilingual subjects aged between 3;4 and 4;5 were recorded
longitudinally. Russian was spoken in their families, and Scottish English in the
community (Edinburgh, UK). The family environments were similar, but one subject
had received substantially more input in Russian than the other. We examined in
detail their production of the prominent syllable-nuclear vowels /i ɪ ʉ/ in Scottish
English and /i u/ in Russian with regard to vowel quality, duration and vocal
effort. Language differentiation and interaction patterns were derived by taking
language mode into account and by statistically comparing the subjects’ crosslinguistic
structures with the speech of monolingual peers (n = 7) and adults (n = 14).
The subjects’ bilingual results revealed both substantial language differentiation and
systematic language interaction patterns. The extent of language differentiation and the
directionality of interaction depended on the amount of language exposure. The
directionality of interaction did not necessarily depend on the markedness of the
crosslinguistic structures, and it could be bidirectional for the same properties.
Longitudinally, language differentiation increased while interaction decreased. The
size of this decrease depended on both language input and the structural complexity of
the languages: the segmental tense/lax contrast and the complex postvocalic conditioning
of vowel duration showed more persistent language interaction effects.
The results confirmed the importance of language input. We showed that in
bilingual phonological development, language interaction should be considered a
normal but non-obligatory process. In addition, some structurally complex processes
that could potentially be explained by ‘markedness’ (applied to isolated segments) may
instead be explained by lexical and phonotactic factors.
Table of Contents
Declaration
Publications from the Thesis
Acknowledgements
Abstract
Table of Contents
List of Tables
List of Figures
List of Equations
List of Abbreviations and Conventions
1 Background
1.1 Introduction
1.2 Important Concepts and Definitions
1.2.1 Bilinguals and Bilingualism
1.2.2 Language Interaction
1.3 Bilingual Language Differentiation and Interaction
1.3.1 What is it about?
1.3.2 Factors Affecting Language Interaction
1.3.2.1 Language Mode and Pragmatic Awareness
1.3.2.2 Language Mixing in the Input
1.3.2.3 Structural differences of the languages in contact
1.3.2.3.1 Why should language structure be important?
1.3.2.3.2 Cross-Language Cue Competition
1.3.2.3.3 Markedness
1.3.2.4 Language dominance
1.3.2.5 Bilingual bootstrapping
1.4 Summary
2 Crosslinguistic Differences in Sound Structure and Their Acquisition
2.1 Differences in Sound Structure between Scottish English and Russian
2.1.1 Introduction
2.1.2 Theoretical Framework for the Research Variables
2.1.2.1 A Short Sketch of the Research Variables
2.1.2.2 ‘Stress-Accent Hypothesis’
2.1.2.3 Stress and Vocal Effort in ‘Stress Accent’ Languages
2.1.2.4 Acoustic Correlates of Vocal Effort in ‘Stress-Accent’ Languages
2.1.2.5 Functional Load
2.1.3 Segmental Differences between Scottish English and Russian
2.1.3.1 Russian vowel system
2.1.3.2 Scottish English vowel system
2.1.3.3 Segmental Differences in the Focus of Investigation
2.1.4 Prosodic Differences between Scottish English and Russian
2.2 Language Interaction in Bilingual Acquisition of Vowel Quality
2.2.1 Monolingual Acquisition
2.2.1.1 Non-Scottish English and Scottish English
2.2.1.2 Russian
2.2.2 Bilingual Acquisition
2.3 Language Interaction in Bilingual Acquisition of Vowel Duration
2.3.1 Monolingual Acquisition
2.3.2 Bilingual Acquisition
2.4 Acquisition of Vocal Effort
2.4.1 Monolingual Acquisition
2.4.2 Bilingual Acquisition
2.5 Summary and Research Questions
3 Methodology
3.1 Introduction
3.2 Subjects
3.2.1 Common Linguistic and Environmental Background
3.2.2 Differences in Linguistic and Environmental Background
3.2.2.1 Subject BS
3.2.2.2 Subject AN
3.3 Control groups
3.3.1 Children
3.3.2 Adults
3.4 Materials
3.4.1 Children
3.4.2 Adults
3.5 Data Collection
3.5.1 Children
3.5.1.1 Recording Equipment and Set up
3.5.1.2 Procedure
3.5.1.3 Games
3.5.2 Adults
3.5.3 Summary of the Elicited Data
3.5.4 Digital Audio Data Formats
3.6 Phonetic and Acoustic Measurements
3.6.1 Overview
3.6.2 Data Annotation
3.6.2.1 Phonetic Labelling
3.6.2.2 Annotation of Timing
3.6.2.3 Annotation of Prominence and Utterance Type
3.6.3 Automatic Acoustic Measurements
3.6.3.1 Steady-State of the Vowel
3.6.3.2 Formant Analysis
3.6.3.3 RMS-Power Analysis
3.6.3.4 Fundamental Frequency Analysis
3.6.4 Data Validation and Normalisation
3.6.4.1 Validation of Phonetic Labels
3.6.4.2 Validation of Estimated Formant Frequencies
3.6.4.2.1 Introduction
3.6.4.2.2 Adults
3.6.4.2.3 Children
3.6.4.3 Normalisation of RMS-Power Measurements
4 Acquisition of Vowel Quality
4.1 Introduction
4.2 Statistical Analysis
4.3 Acquisition of Vowel Quality
4.3.1 Scottish English Monolingual Results
4.3.1.1 Acquisition of close(-mid) unrounded vowels
4.3.1.2 Acquisition of close rounded vowels
4.3.1.3 Summary of results for the SSE monolingual peers
4.3.2 Bilingual Acquisition
4.3.3 Subject AN
4.3.3.1 Acquisition of close unrounded vowels
4.3.3.1.1 Language differentiation
4.3.3.1.2 Longitudinal perspective
4.3.3.2 Acquisition of close rounded vowels
4.3.3.2.1 Language differentiation
4.3.3.2.2 Longitudinal perspective
4.3.3.3 Summary of AN’s results
4.3.4 Subject BS
4.3.4.1 Acquisition of close unrounded vowels
4.3.4.1.1 Language differentiation
4.3.4.1.2 Longitudinal perspective
4.3.4.2 Acquisition of close rounded vowels
4.3.4.2.1 Language differentiation
4.3.4.2.2 Longitudinal results
4.3.4.3 Summary of BS’ Results
5 Acquisition of Vowel Duration
5.1 Introduction
5.2 Data Analysis
5.3 Acquisition of Vowel Duration
5.3.1 A comparison of adult models
5.3.1.1 Vowel /i/
5.3.1.2 Vowel /ɪ/
5.3.1.3 Close rounded vowels
5.3.1.4 Summary of results for monolingual adults
5.3.2 SSE monolingual acquisition
5.3.2.1 Vowel /i/
5.3.2.1.1 Group results
5.3.2.1.2 Individual results
5.3.2.2 Vowel /ɪ/
5.3.2.2.1 Group results
5.3.2.2.2 Individual results
5.3.2.3 Close rounded vowel
5.3.2.3.1 Group results
5.3.2.3.2 Individual results
5.3.2.4 Summary of results for the SSE monolingual children
5.3.3 Bilingual acquisition
5.3.3.1 Subject AN
5.3.3.1.1 SSE /i/
5.3.3.1.2 SSE /ɪ/
5.3.3.1.3 SSE /ʉ/
5.3.3.1.4 MSR/SSE differentiation for /i/
5.3.3.1.5 MSR/SSE differentiation for /u/ and /ʉ/
5.3.3.1.6 Summary of AN’s results
5.3.3.2 Subject BS
5.3.3.2.1 SSE /i/
5.3.3.2.2 SSE /ɪ/
5.3.3.2.3 SSE /ʉ/
5.3.3.2.4 MSR/SSE differentiation for /i/
5.3.3.2.5 MSR/SSE differentiation for /u/ and /ʉ/
5.3.3.2.6 Summary of BS’ results
6 Acquisition of Vocal Effort
6.1 Introduction
6.2 Data Analysis
6.3 Acquisition of Vocal Effort
6.3.1 A comparison of adult models
6.3.1.1 Unrounded vowel /i/
6.3.1.2 Vowel /i/ compared to /ɪ/
6.3.1.3 Rounded vowels
6.3.1.4 Summary of results for monolingual adults
6.3.2 SSE monolingual children
6.3.2.1 Vowel /i/
6.3.2.1.1 Group results
6.3.2.1.2 Individual results
6.3.2.2 Vowel /i/ compared to /ɪ/
6.3.2.2.1 Group results
6.3.2.2.2 Individual results
6.3.2.3 Vowel /ʉ/
6.3.2.3.1 Group results
6.3.2.3.2 Individual results
6.3.2.4 Summary of results for the SSE monolingual children
6.3.3 Bilingual Acquisition
6.3.3.1 Subject AN
6.3.3.1.1 SSE /i/
6.3.3.1.2 SSE /i/ compared to /ɪ/
6.3.3.1.3 SSE /ʉ/
6.3.3.1.4 MSR/SSE differentiation for /i/
6.3.3.1.5 MSR/SSE differentiation for /u/ and /ʉ/
6.3.3.1.6 Summary of AN’s results
6.3.3.2 Subject BS
6.3.3.2.1 SSE /i/
6.3.3.2.2 SSE /i/ compared to /ɪ/
6.3.3.2.3 SSE /ʉ/
6.3.3.2.4 MSR/SSE differentiation for /i/
6.3.3.2.5 MSR/SSE differentiation for /u/ and /ʉ/
6.3.3.2.6 Summary of BS’ results
7 Discussion and Conclusion
7.1 Overview of the main findings
7.1.1 Language differentiation and interaction patterns
7.1.2 Conditioning Factors of Language Differentiation and Interaction
7.1.2.1 The role of language input conditions versus language structure
7.1.2.2 Sound-structural effects
7.1.2.3 Lexicalisation effects
7.1.2.4 Maturation and age effects
7.1.2.5 Other environmental effects
7.1.2.6 Methodological issues
7.1.3 Implications of the bilingual findings
7.1.3.1 Language differentiation/interaction patterns and their mental representation
7.1.3.2 Implications of the findings for the theory and models of language acquisition
7.1.4 Implications of vocal effort findings
7.2 Suggestions for further research
7.3 General Conclusion
References
Appendix A Phonetic ranges of the production of the target /i/ by the SSE monolingual children.
Appendix B Distributions of the three most frequent phonetic labels (per carrier word) for the target // produced by the SSE monolingual children.
Appendix C Duration of the close(-mid) vowels produced by the adult subjects as a function of the following consonant in SSE, MSR and SSBE.
Appendix D Duration of the close(-mid) vowels produced by the adult subjects averaged per language (SSE, MSR and SSBE) and speaker as a function of the following consonant.
Appendix E Individual results of the SSE monolingual children for the duration of the vowel /i/ as a function of the following consonant.
Appendix F Individual results of the SSE monolingual children for the duration of the vowel /ɪ/ as a function of the following consonant.
Appendix G Individual results of the SSE monolingual children for the duration of the vowel /ʉ/ as a function of the following consonant.
Appendix H Duration of the vowel /i/ as a function of the following consonant produced by the bilingual subject AN: longitudinal results for MSR and SSE.
Appendix I Duration of the vowels /ʉ/ and /u/ as a function of the following consonant produced by the bilingual subject AN: longitudinal results for MSR and SSE.
Appendix J Duration of the vowel /i/ as a function of the following consonant produced by the bilingual subject BS: longitudinal results for MSR and SSE.
Appendix K Duration of the vowels /u/ and /ʉ/ as a function of the following consonant produced by the bilingual subject BS: longitudinal results for MSR and SSE.
Appendix L Mean RMS-power around F2 (dB) for the adult subjects averaged per language (SSE, MSR and SSBE) for the vowel /i/ as a function of the following consonant.
Appendix M Mean RMS-power around F2 (dB) for the adult subjects averaged per language (SSE, MSR and SSBE) for the close rounded vowels as a function of the following consonant.
Appendix N Mean RMS-power around F2 (dB) produced by the SSE subjects of different ages for the vowel /i/ as a function of the following consonant.
Appendix O Mean RMS-power around F2 (dB) produced by the SSE subjects of different ages for the vowels /i/ and /ɪ/ across all consonantal contexts.
Appendix P Mean RMS-power around F2 (dB) produced by the SSE subjects of different ages for the vowel /ʉ/ as a function of the following consonant.
Appendix Q Descriptive statistics of SSE/MSR bilingual production of vocal effort for the vowel /i/ as a function of the following consonant based on three acoustic measures A2, A2*a, A2*b (dB) per speaker, language and age.
Appendix R Descriptive statistics of bilingual SSE production of vocal effort for the tense/lax vowels /i/ and /ɪ/ based on three acoustic measures A2, A2*a, A2*b (dB) per speaker and age.
Appendix S Descriptive statistics of SSE/MSR bilingual production of vocal effort for the close rounded vowels as a function of the following consonant based on three acoustic measures A2, A2*a, A2*c (dB) per speaker, language and age.
Appendix T Durational ratios for the postvocalic conditioning of vowel duration for all subjects by language, age and bilinguality.
List of Tables
Table 2-1 Russian vowel phonemes (Bondarko, 1998)................................................41
Table 2-2 Russian vowel allophones (adopted from Bondarko, 1998; Kuznetsov,
1997) ............................................................................................................................41
Table 2-3 Scottish English vowel monophthongs (adopted from Wells, 1982) ...........42
Table 2-4 Comparison between monophthong phonemes between SSE and SSBE
(adapted from Matthews, 2002)...................................................................................42
Table 2-5 Broad differences and similarities between Russian and Scottish English
word-prosodic systems.................................................................................................45
Table 2-6 Broad characterisations, for one representative vowel [], of vowel
duration conditioning effects by various contexts in SSE and SSBE (adapted from
Scobbie et al., 1999a)...................................................................................................48
Table 2-7 Most frequent ‘non-adult-like’ substitutes for SSE target // in child speech
(adapted from Matthews, 2002)...................................................................................53
Table 2-8 Most frequent ‘non-adult-like’ substitutes for SSE target [] (adapted from
Matthews, 2002)...........................................................................................................55
Table 2-9 Frequency of vowel phonemes in 5 subjects (the higher the row, the more
frequent the sound in the table) (adapted from Zharkova, 2002). ...............................57
Table 2-10 A summary of five studies that dealt with bilingual phonological
acquisition of vowel inventories...................................................................................60
Table 2-11 Summary of the 8 research variables for three levels of speech
production, vowel sets and crosslinguistic differences, with cross-references to the
sections discussing these variables. ....................................................75
Table 3-1 Identification codes, age and sex of the children who participated in
experiments; the children are listed by age. ................................................................86
Table 3-2 Native language, age, sex of adult participants. .........................................87
Table 3-3 Elicited target words: orthography and adult target phonetic transcription
per language. ...............................................................................................................89
Table 3-4 Main types of carrier sentences used in the two languages.......................94
Table 3-5 Summary of the number of sessions and the total number of elicited tokens
per child (and age sample) ..........................................................................................95
Table 3-6 Raw acoustic measurements in this study. ..................................................97
Table 3-7 Summary of the fixed frequency bandwidths for three frequency slices. ..107
Table 3-8 A comparison of different acoustic studies of formant frequencies (Hz),
estimated for adult native speakers of SSBE, SSE, MSR and General American......112
Table 4-1 Phonetic ranges of adult target // produced by SSE monolingual children
....................................................................................................................................121
Table 4-2 Frequencies of adult and non-adult like realisations of /i/ and /ɪ/ for SSE
monolingual children (aged 3;4 to 4;9).....................................................................122
Table 4-3 Phonetic range in the realisation of adult target [] by SSE monolingual
children ......................................................................................................................124
Table 4-4 The effect of factor bilinguality of subject AN compared to the SSE
monolingual peers for the production of phonetic variants [i] and [] for the target
//. ...............................................................................................................................126
Table 4-5 AN’s production of phonetic variants [i] and [] for the target /i/ in SSE
compared to MSR (across age). ...................................................................................127
Table 4-6 Distribution of palatalised and non-palatalised consonants in the preceding
context of the vowels [i] and [] for Russian target /i/. ...........................................129
Table 4-7 Longitudinal production of [i] and [] for target /i/ in SSE by AN. ........129
Table 4-8 Longitudinal production of [i] and [] for target /i/ in Russian by AN...130
Table 4-9 The effect of factor bilinguality of the subject AN on the production of
phonetic variants [] and [u] in SSE in comparison to the SSE monolingual
children. .....................................................................................................................131
Table 4-10 Phonetic ranges of the MSR target /u/ and SSE /ʉ/ produced by the
bilingual subject AN...................................................................................................132
Table 4-11 Longitudinal production of phonetic variants [u] and [ʉ] for the MSR
target /u/ by AN..........................................................................................................132
Table 4-12 The effect of carrier words on the proportions of the variants [ʉ] and [u]
for the MSR target /u/ produced by the subject AN. ..................................................133
Table 4-13 The effect of factor bilinguality of the subject BS on the proportion of
phonetic variants [i] and [] produced for the target // in comparison to the SSE
monolingual children. ................................................................................................136
Table 4-14 The effect of language on the phonetic ranges for the target /i/ produced
by BS in SSE compared to MSR language modes across age samples......................137
Table 4-15 Longitudinal production of [i] and [ɪ] for target /ɪ/ in SSE by the subject
BS. ..............................................................................................................................138
Table 4-16 Phonetic ranges for the SSE target // produced by BS in comparison to
the SSE monolingual children (across age)...............................................................139
Table 4-17 Contingency table showing the effect of the factor bilinguality on the
distribution of two most frequent non-adult phonetic targets for SSE // produced by
the subject BS in comparison to SSE monolingual children......................................140
Table 4-18 Phonetic ranges of the MSR adult target /u/ and SSE /ʉ/ produced by the
bilingual subject BS. ..................................................................................................141
Table 4-19 Contingency table showing the effect of language on the realisations of
[ʉ] and [u] for subject BS in SSE compared to MSR language modes across her age
samples.......................................................................................................................141
Table 4-20 Longitudinal production of [u] and [ʉ] for the target /ʉ/ in SSE by the
subject BS...................................................................................................................142
Table 4-21 Longitudinal production of [u] and [ʉ] for the target /u/ in MSR by the
subject BS...................................................................................................................142
Table 4-22 The effect of BS’ age on the use of (non-) palatalised consonants
preceding the MSR target /u/. ....................................................................................143
Table 5-1 Mean duration and standard deviation of the vowel /i/ (ms) for three right
consonantal contexts per language averaged for all the adult speakers...................150
Table 5-2 Mean duration and standard deviation of the vowel // (ms) in three right
consonantal contexts per language (SSE or SSBE) averaged for all the speakers. ..152
Table 5-3 Mean duration and standard deviation of close rounded vowels (ms) as a
function of the following consonant averaged for all the SSE, MSR and SSBE adult
speakers......................................................................................................................154
Table 5-4 Mean duration and standard deviation for the SSE vowel /i/ as a function of
the following consonant in four age groups of the SSE monolingual controls..........158
Table 5-5 Results of Tukey HSD post-hoc tests for the differences between age groups
within SSE monolingual controls...............................................................................159
Table 5-6 Mean duration and standard deviation for the SSE vowel // as a function of
the following consonant for each age group of the SSE monolingual controls.........162
Table 5-7 Results of Tukey HSD post-hoc tests for the age effects for the SSE
monolingual speakers. ...............................................................................................163
Table 5-8 Mean duration and standard deviation for the SSE vowel // as a function
of the following consonant for each age group of the SSE monolingual controls.....166
Table 5-9 Results of Tukey HSD post-hoc tests for the differences in the duration of
// between age groups within SSE monolingual controls. .......................................167
Table 5-10 Number of tokens and duration of the vowel /i/ (ms) as a function of the
following consonant for subject AN in three age samples.........................................171
Table 5-11 Number of tokens and duration of the vowel // as a function of the
following consonant produced by the subject AN in three age samples....................173
Table 5-12 Number of tokens and median duration of the vowel // as a function of
the following consonant for subject AN in three age samples ...................................175
Table 5-13 Median duration and number of tokens of the vowel /i/ as a function of the
following consonant produced by the subject BS in three age samples. ...................185
Table 5-14 Number of tokens and median duration of the vowel // as a function of the
following consonant produced by the subject BS in three age samples. ...................188
Table 5-15 Number of tokens and median duration of the vowel // as a function of
the following consonant for subject BS for three longitudinal moments. ..................190
Table 6-1 Summary of the ANOVA results for adult controls for the vocal effort
measures in the vowel /i/............................................................................................201
Table 6-2 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the tense/lax vowel pair /i/ and /ɪ/ in adult SSE/SSBE speakers........205
Table 6-3 SSE and SSBE adult means and standard deviations for three normalisation
methods of vocal effort for the vowels /i/ versus //. ..................................................206
Table 6-4 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the close rounded vowels in adults. ..................................................208
Table 6-5 Summary of the ANOVA results for the three normalisation methods of
vocal effort of the vowel /i/ in four SSE monolingual age groups. ............................211
Table 6-6 Summary of the ANOVA results for the three normalisation methods of
vocal effort of the vowels /i/ versus // produced by four SSE monolingual age groups.
....................................................................................................................................214
Table 6-7 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the vowel // in four SSE monolingual age groups...........................217
Table 6-8 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the SSE vowel /i/ produced by the bilingual subject AN as compared to
the SSE monolingual peers. .......................................................................................221
Table 6-9 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the SSE vowel /i/and // produced by the bilingual subject AN
compared to the SSE monolingual peers. ..................................................................223
Table 6-10 Summary of the ANOVA results for the three normalisation methods of
vocal effort for the SSE vowel // produced by the bilingual subject AN in comparison
to the SSE monolingual peers. ...................................................................................225
Table 6-11 Summary of the ANOVA results for the normalisation methods of vocal
effort (A2, A2*a, A2*b, dB) for the vowel /i/ as a function of the following consonant
produced by the bilingual subject AN in MSR and SSE.............................................228
Table 6-12 Summary of the ANOVA results for the normalisation methods of vocal
effort (A2, A2*a, A2*c, dB) for the SSE vowel // and MSR /u/ as a function of the
following consonant produced by the bilingual subject AN in MSR and SSE. ..........230
Table 6-13 Summary of the ANOVA results for the normalisation methods of vocal
effort for the SSE vowel /i/ produced by the bilingual subject BS as compared to the
SSE monolingual peers. .............................................................................................236
Table 6-14 Summary of the ANOVA results for vocal effort for the SSE vowel
/i/and // produced by the bilingual subject BS in comparison to SSE monolingual
peers...........................................................................................................................237
Table 6-15 Summary of the ANOVA results for the normalisation methods of vocal
effort (A2, A2*a, A2*b, dB) for the vowel /i/ produced by the bilingual subject BS in
MSR compared to SSE. ..............................................................................................242
Table 6-16 Summary of the ANOVA results for the normalisation methods of vocal
effort (A2, A2*a, A2*c, dB) for the SSE vowel // and MSR /u/ as a function of the
following consonant produced by the bilingual subject BS in MSR and SSE............244
Table 7-1 Patterns of language differentiation and interaction observed for the two
bilingual subjects (BS and AN) in different age samples, for three research variables
and two vowel sets. ....................................................................................................252
List of Figures
Figure 1-1 Visual representation of the language mode continuum (Grosjean, 2001,
p.3). ..............................................................................................................................11
Figure 2-1 Variations in the flow glottogram of a single cycle (left part of the
diagram) when a speaker was instructed to increase phonatory loudness (conditions
1a to 3a from soft to loud). The right part of the diagram represents the acoustic
consequence of this increase in the radiated spectrum (the 2nd and 3rd ticks on the
horizontal axes show frequencies between 2 and 3 kHz) (adapted from Gauffin &
Sundberg, 1989)...........................................................................................................32
Figure 2-2 Acoustic representation of SSE, SSBE and MSR cardinal vowel space
(adapted from Bondarko, 1998; Deterding, 1997; Kuznetsov, 1997; Walker, 1992) .44
Figure 2-3 Acoustic differences in extrinsic vowel duration conditioning (raw
duration in ms) for close vowels between SSE and General American. The solid lines
represent tense close vowels, while the broken lines represent the lax ones (adapted
from House, 1961; Agutter, 1988; McKenna, 1988) ...................................................48
Figure 2-4 Mean duration (ms) of /i/ in SSE and Russian prominent CVC words as a
function of the following consonant, by utterance position (pos1 = medial,
pos2 = final in an utterance with more than one pitch accent, pos3 = final in an
utterance with one pitch accent). .................................................................................49
Figure 2-5 Mean spectral level (dB) in 4 frequency bands in three utterance positions
in Scottish English and in Russian. B1 = mean F1 ± 150 Hz, B2 = mean F2 ± 300 Hz,
B3 = mean F3 ± 300 Hz, B4 = mean F4 ± 300 Hz. ..............................................50
Figure 2-6 Mean duration (ms) for SSE vowel /i/ as a function of the right
consonantal context for three speakers in Matthews (2002). ......................................66
Figure 2-7 Individual means of the differences in intrinsic vowel duration in German
(short and long vowels) in the speech production of bilingual German-Spanish
(broken bars) and monolingual German (solid bars) children. ..................................68
Figure 2-8 Output sound pressure levels (dB) in female 4- and 8-year-olds and adults,
when asked to adjust phonatory loudness for syllable trains /p/ (adapted
from Stathopoulos & Sapienza, 1993)........................................................................73
Figure 2-9 Maximum flow declination rate (L/s/s) in female 4- and 8-year-olds and
adults, when asked to adjust phonatory loudness for syllable trains /p/
(adapted from Stathopoulos & Sapienza, 1993) ........................................................73
Figure 3-1 BS’s language exposure pattern (% per 3 months) throughout the pre-school period, based on nursery attendance hours and 336 waking hours/month......82
Figure 3-2 AN’s language exposure pattern throughout the pre-school period, based
on nursery attendance hours and 336 waking hours/month........................................84
Figure 3-3 Data flow diagram of the encoding process of the acoustic waveform into
acoustic parameters and phonetic labels.....................................................................97
Figure 3-4 Timing marker indicating the end of the voiceless fricative [s] and the
beginning of the following vowel [] in “sieve” (annotated in SAMPA)....................99
Figure 3-5 Timing marker indicating the end of the vowel [] and the beginning of
the devoiced stop [t] in “food” (annotated in SAMPA). ............................................99
Figure 3-6 Timing marker indicating the end of the vowel [ʉ] and the beginning of
the voiceless stop [k] in “cook” (annotated in SAMPA). .........................................100
Figure 3-7 Timing marker indicating the end of the vowel [i] and the beginning of the
voiced fricative [z] in “cheese” (annotated in SAMPA). .........................................101
Figure 3-8 Two timing markers indicating the boundaries between the end of the
vowel [] and the preaspirated whispered transition [] and the following voiceless
fricative [s] in “fish” (annotated in SAMPA). The duration of [] is 142 ms..........101
Figure 3-9 Data flow diagram of the formant analysis process of the acoustic
waveform and annotated timing of vowels.................................................................104
Figure 3-10 Data flow diagram of the RMS-power analysis of the acoustic waveform
in fixed frequency bands. ...........................................................................................106
Figure 3-11 RMS errors (F1 to F3, Hz) for four automatic formant analysis methods
as compared to manual formant measurements from FFT spectra...........................114
Figure 4-1 Phonetic range of variation in the production of the lax vowel /ɪ/ by SSE
monolingual children (plotted by age on the horizontal axis)...................................121
Figure 4-2 Phonetic range of the production of the adult target vowel // by SSE
monolingual children (sorted by age on the horizontal axis). ...................................124
Figure 5-1 Mean duration and standard deviation of the vowel /i/ in the three
languages (SSE, MSR and SSBE) in the contexts before voiced fricatives, voiced stops
and voiceless stops produced by monolingual adults. ...............................................150
Figure 5-2 Mean durations (ms) of the vowel /ɪ/ for all SSE versus SSBE adults in
the contexts before voiced fricatives, voiced stops and voiceless fricatives...............152
Figure 5-3 Mean duration (ms) and standard deviation of the close rounded vowels in
the three languages (SSE, MSR and SSBE) in the contexts before voiced fricatives,
voiced stops and voiceless stops produced by monolingual adults. ..........................154
Figure 5-4 Mean duration of the vowel /i/ (ms) as a function of the following
consonant in four age groups of the SSE monolingual speakers...............................158
Figure 5-5 Individual results of SSE monolingual children on the duration of /i/ as a
function of the following consonant...........................................................................160
Figure 5-6 Mean duration of the vowel // as a function of the following consonant in
4 SSE monolingual age groups ..................................................................................162
Figure 5-7 Individual results of SSE monolingual children on the duration of // as a
function of the following consonant...........................................................................164
Figure 5-8 Mean duration of the vowel // (ms) as a function of the following
consonant in four age groups of SSE monolingual speakers.....................................166
Figure 5-9 Individual results of SSE monolingual children on the duration of // as a
function of the following consonant...........................................................................168
Figure 5-10 Median duration of the vowel /i/ (ms) as a function of the following
consonant for subject AN compared to age-matched SSE monolingual children in
three age samples. .....................................................................................................171
Figure 5-11 Median duration of the vowel // (ms) as a function of the following
consonant produced by the subject AN in comparison to the SSE monolingual peers in
three age samples (plotted from left to right). ...........................................................173
Figure 5-12 Median duration of the vowel // (ms) as a function of the following
consonant for subject AN compared to age-matched SSE monolingual children in
three age samples.......................................................................................................175
Figure 5-13 Mean duration of the vowel /i/ (ms) as a function of the following
consonant produced by the subject AN in MSR and SSE in three longitudinal age
samples (from left to right). .......................................................................................177
Figure 5-14 A comparison of AN’s longitudinal results for the mean duration of /i/
(ms) to that of her mother speaking Russian and of the principal investigator (subject
R3) in child directed speech.......................................................................................178
Figure 5-15 Mean duration of the close rounded vowels (ms) as a function of the
following consonant for the subject AN in MSR and SSE..........................................180
Figure 5-16 A comparison of AN’s longitudinal results for the mean duration of MSR
/u/ (ms) compared to that of her mother speaking Russian, and of the principal
investigator (subject R3) in Russian child directed speech. ......................................180
Figure 5-17 Median duration of the vowel /i/ (ms) as a function of the following
consonant produced by the subject BS compared to the SSE monolingual peers in
three age samples.......................................................................................................185
Figure 5-18 Median duration of the target vowel // (ms) as a function of the following
consonant produced by the subject BS compared to the SSE monolingual peers in
three age samples.......................................................................................................187
Figure 5-19 Median duration of all phonetic realisations of [] (ms) as a function of
the following consonant produced by the subject BS compared to the SSE
monolingual peers in three age samples....................................................................187
Figure 5-20 Median duration of the vowel // (ms) as a function of the following
consonant for subject BS compared to age-matched SSE monolingual children for
three longitudinal moments........................................................................................190
Figure 5-21 Longitudinal results for the mean duration of the vowel /i/ (ms) as a
function of the following consonant produced by the subject BS in MSR and SSE ...193
Figure 5-22 A comparison of BS’ longitudinal results for the mean duration of /i/ in
SSE and MSR to those of her mother speaking Russian, and those of the principal
investigator (subject R3) in child directed MSR. .......................................................193
Figure 5-23 Mean duration of the close rounded vowels (ms) as a function of the
following consonant produced by the subject BS in MSR and SSE in three age
samples.......................................................................................................................195
Figure 5-24 A comparison of BS’ longitudinal results for the mean duration of /u/ and
/ʉ/ (ms) in SSE and MSR to those of her mother speaking Russian, and those of the
principal investigator (subject R3_CDS) in child directed MSR speech...................195
Figure 6-1 Crosslinguistic effect on vocal effort (based on A2*a measure, dB)
produced by adults for the vowel /i/ as a function of the following consonant. ........202
Figure 6-2 Correlation between the measure A2*a (dB) and vowel duration (ms)
for MSR (left panel) and SSE (right panel) adult speakers. ............................203
Figure 6-3 Individual results for SSE and MSR adults for the production of measure
A2*a of vocal effort for the vowel /i/ as a function of the following consonant. .........204
Figure 6-4 Differences in the vocal effort (based on mean A2*b, dB) used to
produce the lax vowel /ɪ/ and the tense vowel /i/ for 5 SSE and 4 SSBE adult speakers....205
Figure 6-5 Crosslinguistic effect on vocal effort (based on mean A2*c, dB) in the adult
production of close rounded vowels as a function of the following consonant. ........208
Figure 6-6 Individual results for SSE and MSR adults for the production of vocal effort
(based on median A2*c, dB) for the close rounded vowels as a function of the following
consonant. ..................................................................................................................209
Figure 6-7 Context-dependent vocal effort pattern (based on mean A2*a, dB) for the
vowel /i/ produced by the SSE adults compared to three groups of children aged 3;4
to 4;9. .........................................................................................................................212
Figure 6-8 Individual SSE child results of vocal effort (based on median A2*a, dB) for
the vowel /i/ as a function of the following consonant. .............................................213
Figure 6-9 Vowel dependent vocal effort (based on mean A2*a, dB) for the vowels /i/
versus // in SSE adults compared to three groups of children aged 3;4 to 4;9. .......215
Figure 6-10 Individual results for SSE children for the vocal effort differences (based
on median A2*b, dB) between the tense/lax vowels /i/ and /ɪ/..................................216
Figure 6-11 Context-dependent vocal effort pattern (based on mean A2*a, dB) for the
vowel // in the SSE adults compared to three groups of children aged 3;4 to 4;9. .218
Figure 6-12 Individual SSE child results of vocal effort (based on median A2*a, dB)
for the vowel // as a function of the following consonant........................................219
Figure 6-13 Vocal effort for the vowel /i/ (based on A2*a, dB) as a function of the
following consonant produced by the subject AN as compared to the SSE monolingual
peers in three age samples.........................................................................................222
Figure 6-14 Vocal effort applied to /i/ and /ɪ/ (based on A2*b, dB across all
consonantal contexts) produced by the bilingual subject AN and by the SSE
monolingual peers of three age groups. ....................................................................224
Figure 6-15 Vocal effort for the vowel // (based on mean A2*a, dB) as a function of
the following consonant produced by AN in comparison to the SSE monolingual peers
in the three age samples.............................................................................................226
Figure 6-16 AN’s crosslinguistic production of vocal effort for the vowel /i/ (based on
mean A2*a, dB) as a function of the following consonant (age is plotted from left to
right). .........................................................................................................................229
Figure 6-17 A comparison of AN’s vocal effort for /i/ in different consonantal
contexts in MSR (based on median A2*a, dB) to that of her mother and experimenter
(R3 in child directed speech). ....................................................................................229
Figure 6-18 AN’s crosslinguistic production of vocal effort for SSE /ʉ/ and MSR /u/
(based on mean A2*c, dB) as a function of the following consonant (age is plotted from
left to right). ...............................................................................................................231
Figure 6-19 A comparison of AN’s vocal effort for /u/ in different consonantal
contexts in MSR (based on median A2*c, dB) to that of her mother (reading) and
experimenter (R3 in spontaneous speech). ................................................................232
Figure 6-20 Vocal effort for the vowel /i/ (based on mean A2*a, dB) as a function of
the following consonant produced by the subject BS in comparison to the age-matched SSE monolingual children in three age samples. ........................................236
Figure 6-21 Vocal effort of the vowels /i/ and /ɪ/ (based on mean A2*c, dB) produced by
the bilingual subject BS compared to the SSE monolingual peers in three age samples
(BS’ target /i/and // are plotted separately from the phonetic labels [i] []). ........238
Figure 6-22 Vocal effort for the vowel // (based on mean A2*a, dB) as a function of
the following consonant produced by the subject BS in comparison to the SSE
monolingual peers in three age samples....................................................................240
Figure 6-23 BS’s crosslinguistic production of vocal effort for the vowel /i/ (based on
mean A2*a, dB) as a function of the following consonant (age is plotted from left to
right). .........................................................................................................................242
Figure 6-24 A comparison of BS’s vocal effort for /i/ in different consonantal contexts
in MSR (based on A2*a, dB) to that of her mother (read speech) and experimenter (R3
spontaneous speech). .................................................................................................243
Figure 6-25 BS’ crosslinguistic production of vocal effort for SSE /ʉ/ and MSR /u/
(based on mean A2*c, dB) as a function of the following consonant (age is plotted from
left to right). ...............................................................................................................245
Figure 6-26 A comparison of BS’s vocal effort for /u/ in different consonantal
contexts in MSR (based on A2*a, dB) to that of her mother (read speech) and
experimenter (R3, in spontaneous speech). ...............................................................245
Figure 7-1 Visual footprint of BS’ and AN’s language differentiation in their two
languages, speech immaturity and the direction of language interaction based on the
results in this study.....................................................................................................251
Figure 7-2 Abstract representation of the longitudinal effect for the bilingual subjects
AN and BS on their bilingual language differentiation based on the number of sound
structure variables involved in total and partial language differentiation across their
two languages. ...........................................................................................................263
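Figure 2-5 reports mean spectral level in bands centred on the formant means (B1 = mean F1 ± 150 Hz; B2 to B4 = mean F2 to F4 ± 300 Hz), and Figure 3-10 and Table 3-7 refer to power analysis in fixed frequency bands. The sketch below illustrates one way such band levels could be computed from a single analysis frame. It is a hypothetical reconstruction, not necessarily the procedure described in Chapter 3: the Hann window, the FFT-based power spectrum, averaging power (rather than dB values) within each band, the power floor, and the function name `band_levels_db` are all assumptions made for illustration.

```python
import numpy as np

# Half-widths (Hz) of the bands around F1..F4, following the caption of Figure 2-5.
DEFAULT_HALF_WIDTHS_HZ = (150, 300, 300, 300)
POWER_FLOOR = 1e-20  # assumed floor that avoids log10(0) in bands with no energy


def band_levels_db(frame, fs, formants_hz, half_widths_hz=DEFAULT_HALF_WIDTHS_HZ):
    """Mean spectral level (dB, arbitrary reference) in bands centred on F1..F4.

    frame: 1-D sequence of samples for one analysis frame
    fs: sampling rate in Hz
    formants_hz: mean formant frequencies (F1, F2, ...) in Hz
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))        # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2        # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)   # centre frequency of each bin
    levels = []
    for centre, half_width in zip(formants_hz, half_widths_hz):
        in_band = (freqs >= centre - half_width) & (freqs <= centre + half_width)
        if not in_band.any():
            raise ValueError(f"no FFT bins within {centre} ± {half_width} Hz")
        mean_power = power[in_band].mean()            # average power, then convert to dB
        levels.append(float(10 * np.log10(max(mean_power, POWER_FLOOR))))
    return levels


# Synthetic frame (not data from this study): strong 500 Hz and weak 1500 Hz components.
fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 1500 * t)
print(band_levels_db(frame, fs, formants_hz=(500, 1500, 2500, 3500)))
```

Averaging power before converting to dB lets the strongest harmonics in a band dominate its level; averaging dB values instead would weight weak bins more heavily, so the choice should match whatever Equations 3-1 and 3-2 specify.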
List of Equations
Equation 3-1
Equation 3-2
Equation 3-3
Equation 3-4
Equation 3-5
Equation 3-6
Equation 3-7
List of Abbreviations and Conventions
1;2.3    convention for a child’s age: years;months.days
BFLA     bilingual first language acquisition
BSLA     bilingual second language acquisition
C        consonant
CCCH     Cross-Language Cue Competition Hypothesis
DTFT     Discrete Time Fourier Transform
F0       fundamental frequency
Fn       n-th formant
FFT      Fast Fourier Transform
L2       second language
MSR      Modern Standard Russian
ns       not significant
RMS      root mean square
SLA      second language acquisition
SSE      Scottish Standard English
SSBE     Southern Standard British English
SVLR     Scottish Vowel Length Rule
V        vowel
VOT      voice onset time
VLF/VF   ratio of the duration of vowels before voiceless fricatives to that before voiced fricatives
VLS/VF   ratio of the duration of vowels before voiceless stops to that before voiced fricatives
VLS/VS   ratio of the duration of vowels before voiceless stops to that before voiced stops
1 Background
1.1 Introduction
This study is built upon the general epistemological assumption that language acquisition is probabilistic rather than deterministic in nature. This idea is well expressed by Mohanan (1992, p. 653) in the metaphor of perfectly symmetrical six-branched snowflakes, which self-organise in interaction with the environment while varying infinitely in their patterns:
“If the growth of form in language formation is analogous to that in snowflake
formation, we must formulate it as a problem of morphogenesis: how does a
grammar arise and develop in an individual through interaction with the linguistic
environment.”
Under this view, the constrained randomness of a snowflake parallels the problem of internalised human grammar, where “no two grammars are identical, even when the input is the same, yet the variability across grammars is severely constrained” (Mohanan, 1992, p.653). Patterns that differ in complexity from those of adults may arise in the child’s grammar, just as more complex innovations are possible in language change. The outcome and the process of language acquisition are thus predictable to some degree, and variability may itself be part of the grammar.
Over years of research, one of the general questions addressed by second language (L2) acquisition studies has been whether there is any functional difference between early and late L2 acquisition. One of the less controversial findings of this research is that the age of onset of acquisition has some predictive power for the level of ultimate linguistic attainment: indeed, the age at which L2 acquisition begins has been reported to be the strongest predictor of the level of ultimate attainment in adulthood (Flege et al., 1995; Birdsong, 2004).
We may therefore expect that, given continuous and systematic exposure to their two languages and positive motivation, bilingual children (like the Russian–Scottish English pre-school subjects in this study) are very likely to sound like native speakers of both languages in adulthood. However, controversial questions remain in simultaneous bilingual acquisition, such as: What does it take for a child to become a fluent speaker in adulthood? How systematic is language interaction in bilingual children’s speech production? And what are the conditioning factors for language differentiation and interaction? Moreover, it is still an open question whether bilingual phonological acquisition differs from monolingual and L2 acquisition. This study sets out to help fill the empirical gap concerning these general questions.
In this introductory chapter we outline the concepts of language interaction and differentiation as they apply to bilingual acquisition of language in general, and of sound structure in particular. The aims of this chapter are to introduce important general concepts in bilingual studies, to review some issues in bilingual language acquisition with a specific focus on language differentiation and interaction, and to clarify the research questions of this study.
In Section 1.2 we introduce general concepts and definitions in bilingual language acquisition, such as ‘bilingual’, ‘bilingualism’ and ‘language mode’. We also outline the typology of bilingual situations relevant to this study. Finally, we discuss the concept of ‘language interaction’ and its relation to other terms such as ‘interference’, ‘transfer’, ‘code-switching’ and ‘language mixing’.
In Section 1.3 we discuss assumptions about the mental representation of a bilingual child’s two languages. Since this thesis addresses the issue of language interaction in bilingual acquisition of sound structure, we first discuss factors known to affect language interaction in general, and then address the forms that language interaction can take in bilingual phonological acquisition.
1.2 Important Concepts and Definitions
1.2.1 Bilinguals and Bilingualism
There is general agreement in the literature that a ‘bilingual’ is a person who regularly uses two languages in his/her daily life (Weinreich, 1953; Grosjean, 1982). Accordingly, ‘bilingualism’ involves “regular use of the two languages” (Grosjean, 1982, p. 230), and is a generic term used to refer to a range of bilingual situations. The term ‘bilinguality’ (or individual bilingualism) is used to refer to “the psychological state of the individual who has access to more than one linguistic code as a means of social communication” (Hamers & Blanc, 2000, p.6).
Since this thesis is concerned with bilingual language acquisition from the early
years of life and studies child speech production, we limit this discussion to terminology
referring to bilingual children rather than adults.
Bilingual language acquisition in pre-school children has been characterised as the result of “early, simultaneous, regular, and continued exposure to more than one language” (de Houwer, 1995, p.222), beginning before the age of two and continuing thereafter. Bilingual speech can be viewed along a wide range of dimensions. One of them is the age of onset of language acquisition. The exact nature of the relationship between age of acquisition and bilingual proficiency in adulthood is controversial. There is no agreement on whether it is simply a correlation, i.e. a younger person is more likely than an older person to attain native-like proficiency in L2 (Flege et al., 1995), or whether there is a ‘critical age’ (Lenneberg, 1967). The critical age hypothesis of language acquisition linked cerebral lateralisation of language functions to a developmental point after which native-like attainment was thought to become less likely. However, different researchers have proposed different ‘critical ages’, ranging from three years of age up to puberty, and the critical age seems to differ across language components (Meisel, 2003).
A number of typological distinctions between bilingual children have been drawn along the dimension of age of onset of bilingual acquisition. Some researchers distinguish children who acquire their two languages simultaneously from those who acquire them one after the other (McLaughlin, 1984; Lyon, 1996; Hamers & Blanc, 2000). For example, Hamers & Blanc (2000, p.28) distinguish between ‘simultaneous’ and ‘consecutive’ bilingual children. The term ‘simultaneous’ refers to early or infant bilinguals acquiring two mother tongues (LA and LB). ‘Consecutive’ bilinguals, on the other hand, first acquire some basis of the mother tongue (L1), and only then start acquiring L2, at some time before the age of 10/11. However, this subdivision leaves unspecified both the upper age limit for a ‘simultaneous’ bilingual and the lower age limit for a ‘consecutive’ one. It is therefore not clear whether a child exposed to two languages from the age of 1;0 should be classed as a simultaneous or a consecutive bilingual. McLaughlin (1984) and Lyon (1996) propose a similar distinction between simultaneous and successive bilinguals, with the difference that the boundary between the two types is set at the age of three.
Other researchers have used the term “Bilingual First Language Acquisition” (Meisel, 1989; de Houwer, 1995). De Houwer (1995, p. 223) distinguishes between Bilingual First Language Acquisition (BFLA) and Bilingual Second Language Acquisition (BSLA). BFLA refers to “the acquisition of two or more languages from birth or at most a month after birth” (de Houwer, 1995, p. 223), while BSLA covers all other cases. However, it is not clear what maturational or empirical reasons motivate this distinction, under which a child first exposed to the second language at four weeks is a BFLA-type bilingual, whereas a child first exposed at five weeks is a functionally different BSLA type.
Meisel (2003) proposes to distinguish between three types of bilingual acquisition:
(1) simultaneous bilingual acquisition, in which the child begins to acquire two or more languages during the first three to five years of life;
(2) child second language acquisition, if the onset of L2 acquisition falls between the ages of 5 and 10;
(3) adult L2 acquisition, after the age of ten.
The bilingual subjects of this study are best regarded as simultaneous bilinguals. Their bilingual acquisition began at the ages of 1;3 (subject BS) and 0;7 (subject AN), when they started to attend the English-speaking nursery.
Obviously, the distinction based on the ‘age of onset’ of language acquisition captures only one of the important dimensions affecting the linguistic output of a bilingual. Language proficiency is another important factor (Hamers & Blanc, 2000). Along this dimension, bilinguals are subdivided into ‘balanced’ and ‘dominant’: balanced bilinguals are equally proficient in both languages, whereas dominant ones are not. We discuss the issue of language dominance further in Section 1.3.2, since it relates directly to the question of language interaction.
Age and conditions of acquisition may also lead to differences in the functioning of the cognitive system, described as ‘subordinate’, ‘compound’ and ‘coordinate’ bilingualism (Weinreich, 1953). In ‘subordinate’ bilingualism, there is one (L1-based) conceptual store associated with meanings for both L1 and L2. In the ‘compound’ type, there is one conceptual store merged from L1 and L2. In the ‘coordinate’ type, translation equivalents are linked to two separate sets of concepts. However, since this cognitive factor depends on the age and context of acquisition (both accounted for in this study), it should not influence the way we view the question of bilingual phonological acquisition.
Finally, bilingualism has societal dimensions. Hamers & Blanc (2000) emphasise the importance of cultural identity and the issue of ‘valorisation’ (the relative status of a language in society). Since this study addresses language-related issues in bilingualism, we need to ensure that these societal factors are comparable across our subjects; these important issues are addressed in Chapter 3, which deals with methodology.
1.2.2 Language Interaction
The term ‘language interaction’ as used in this study covers a broad range of effects occurring during bilingual language acquisition because the two languages may (though need not) influence each other (Paradis & Genesee, 1996). The term reflects empirical findings that the two language systems of young simultaneous bilinguals appear to be functionally separate (Genesee, 1989; Genesee et al., 1995; de Houwer, 1995; Deuchar & Quay, 2000; Petitto, 2001; Keshavarz & Ingram, 2002), but that the systems are not necessarily hermetically sealed from one another (Petersen, 1988; Döpke, 1998; Schlyter, 1993; Müller, 1998; Döpke, 2000; Paradis, 2001; Grosjean, 2001; Kehoe, 2002; Lleó, 2002; Guion, 2003) and can interact for a number of reasons. The factors enabling language interaction are described in Section 1.3.2. In language acquisition, interaction is thought to have a broad range of manifestations, such as ‘transfer’, ‘acceleration’ or ‘delay’ (Paradis & Genesee, 1996). We discuss these manifestations later in this section, after introducing other related terms.
The older terms ‘mixing’, ‘language mixing’ and ‘code mixing’ also cover any “interactions between the bilingual child’s developing systems” (Genesee, 1989). However, they usually refer to the physical co-occurrence of elements from two or more languages within a single utterance (Genesee, 1989). In the past, the presence or absence of language mixing was often taken as evidence for deciding, in an ‘either/or’ fashion, whether bilingual children initially have a single or a dual language system (Volterra & Taeschner, 1978; Redlinger & Park, 1980). By contrast, the term ‘language interaction’ acknowledges that language interaction and differentiation are not mutually exclusive.
‘Code-switching’ (Muysken, 2000) is a more specific term that refers to a rule-governed communication strategy among bilinguals. It involves a complex set of sociolinguistic and linguistic rules, whereby linguistic elements (usually lexical items) from a non-base language are used for pragmatic reasons during communication in the base language. In addition, it may involve adapting the non-base lexeme to the grammatical rules of the base language (Muysken, 2000).
‘Transfer’ is a term originating in Second Language Acquisition (SLA) research. It usually means “the influence resulting from similarities and differences between the target language and any other language that has been previously (and perhaps imperfectly) acquired” (Odlin, 1989, p.27). It thus presupposes that the L2 learner’s L1 has already been acquired. Nevertheless, the term is also used in studies of simultaneous bilingual acquisition.
‘Transfer’ has much in common with ‘interference’. The phenomenon of ‘interference’ was defined in Weinreich’s “Languages in Contact” (1953) as “instances of deviation from the norms of either language, which occur in the speech of bilinguals as a result of their familiarity with more than one language, i.e. as a result of language contact” (Weinreich, 1953, p.1). The word ‘either’ means that both directions are possible, i.e. LA may influence LB and LB may influence LA; ‘deviation’ refers to variation away from a monolingual norm. The difference between ‘transfer’ and ‘interference’ lies in the fact that the former presupposes an established L1 competence, while the latter does not, and is therefore more suitable for describing intermediate stages of acquisition.
Weinreich conceived ‘interference’ as covering all types of “deviation from the norms” of monolinguals. However, the choice of terms such as ‘deviation’ and ‘norms’ may imply that language interference in bilinguals is abnormal and that monolingual speech is ‘the norm’, even though phenomena such as ‘code-switching’ are rule-governed and require intricate bilingual skills. Some researchers therefore introduced more neutral terms, such as ‘translinguistic markers’ (Lüdi, 1987) or ‘transference’ (Clyne, 1967). Despite this criticism, the term ‘interference’ became established in the literature, and is often used interchangeably with ‘transfer’.
Paradis (1993) and Grosjean (2001) narrowed the term ‘interference’ down to non-pragmatic phenomena (e.g. excluding ‘code-switching’). Under this narrower definition, ‘language interference’ is thought to be either:
• incidental in nature, in which case it is called ‘dynamic interference’ (Paradis, 1993; Grosjean, 2001). Dynamic interference is explained as an ‘ephemeral deviation’ caused by the incidental ‘on-line’ influence of the deactivated language, or as “unrepaired slips of the tongue” (de Houwer, 1995, p.248); or
• ‘representational’ (static) interference (Paradis, 2004), which occurs at the point of acquisition and is stored, for the time being, in the mental representation of the wrong language. ‘Static’ does not mean that, once stored, it can no longer change: this representation can change with growing linguistic experience.
One way to tease apart dynamic and representational interference might be to examine its systematicity: for example, persistent interference could be taken as a sign that it is representational.
The effects of language interaction need not be confined to the mere occurrence of linguistic elements of a non-base language in the base language (as in mixing, interference or transfer). In language development these effects are also thought to take the form of either ‘acceleration’ or ‘delay’ (Paradis & Genesee, 1996). ‘Transfer’, as understood by Paradis & Genesee (1996, p.3), means the structural incorporation of a grammatical property from one language into the other. ‘Acceleration’ means earlier acquisition of a certain property (compared to monolinguals) owing to the availability of the other language. ‘Delay’, on the other hand, refers to the “whole rate of acquisition” and manifests itself in a slowdown in the “overall progress in the grammatical development” (1996, p.4). Paradis & Genesee did not find evidence for any of these language interaction processes. They concluded that, for the syntactic properties examined in their French-English bilingual subjects, bilingual development was autonomous rather than interdependent. Thus, this subdivision of the manifestations of language interaction remains tentative, and we see some problems with it.
First, the authors do not explain why ‘acceleration’ should apply to only one grammatical property, whereas ‘delay’ applies to the whole grammatical system. Second, they functionally separate ‘transfer’ from ‘acceleration’ and ‘delay’ in their typology of language interaction, but later in the paper ‘acceleration’ is compared to “transfer of knowledge” (Paradis & Genesee, 1996, p.8), and indeed we see no reason why ‘acceleration’ or ‘delay’ (if evidence were found) could not be a type of ‘transfer’. For example, static interference from LB might persist in LA for some time (superficially looking like ‘delay’) before the property is eventually acquired in a native-like form. Finally, it should be emphasised that any claim of ‘delay’ must be substantiated by evidence that a given property (or perhaps the whole grammatical system) is indeed eventually acquired within the monolingual range, only later; otherwise we are probably dealing with interference rather than delay.
1.3 Bilingual Language Differentiation and Interaction
1.3.1 What is it about?
A large part of the research on simultaneous bilingual acquisition has been devoted to the question of the abstract mental representation of the child’s two languages. In this discussion the presence or absence of language interaction is crucial.
Since at least the 1970s, bilingual language acquisition studies have addressed the question of whether a child acquiring two languages simultaneously develops ‘one or two systems’ from the outset of linguistic experience. Proponents of the ‘unitary system’ hypothesis (Volterra & Taeschner, 1978; Redlinger & Park, 1980) claimed that the presence of language mixing and the apparent lack of translation equivalents in the early speech of bilingual children should be interpreted as a sign of a unitary (single) system at the onset of speech production, reflecting an innate predisposition to acquire one language rather than more than one, and that the two language systems became differentiated only gradually, later in the acquisition process.
Empirical evidence gathered to date strongly supports the view that, rather than going through any initial ‘unitary’ stages of lexical and syntactic development (Volterra & Taeschner, 1978), children growing up bilingually differentiate between their languages from the onset of language production in the second year of life (Genesee, 1989; de Houwer, 1990; Genesee et al., 1995; Gawlitzek-Maiwald & Tracy, 1996; Paradis & Genesee, 1996; Deuchar & Quay, 2000; Petitto, 2001; Khattab, 2002; Keshavarz & Ingram, 2002).
Evidence of language differentiation does not necessarily imply total functional separation of the two systems, even though such claims have been made (Genesee, 1989; de Houwer, 1990; Genesee et al., 1995; Paradis & Genesee, 1996). Indeed, young simultaneous bilinguals seem to produce the majority of linguistic structures within the ranges of their monolingual peers, but there is also ample empirical evidence of language interaction in their speech (Petersen, 1988; Schlyter, 1993; Schnitzer & Krasinski, 1994; Gawlitzek-Maiwald & Tracy, 1996; Döpke, 1998; Döpke, 2000; Khattab, 2000; Paradis, 2000; Paradis, 2001; Kehoe et al., 2001; Kehoe, 2002; Keshavarz & Ingram, 2002; Khattab, 2002; Lleó, 2002). Such language interaction does not consist only of ‘slips of the tongue’, as is sometimes claimed by proponents of total language separation in bilinguals (de Houwer, 1995, p.248); rather, there seems to be systematicity in the way it happens. Instead of keeping their languages totally separate, bilingual children may acquire language-specific structures and gain differentiated control of their languages, with marginal (and variable) LA/LB interaction forming part of a normal developmental path throughout the process. In addition, language differentiation does not necessarily imply an innate differentiation of language systems ‘from the start’, since differentiation may instead be constructed with growing linguistic experience (Deuchar & Quay, 2000; Vihman, 2002; Paradis, 2004).
Moreover, dual (LA separated from LB) and unitary (LA and LB stored together) mental representations may not be the only possible options. For example, the “Subsystems Hypothesis” and the “Activation Threshold Hypothesis” (Paradis, 1993; Paradis, 1998; Paradis, 2004) together form an alternative account of how a bilingual’s two languages can be stored. Jointly, these hypotheses explain why language differentiation and interaction may co-occur, and how this can happen. They are part of the neurolinguistic theory of bilingualism formulated by Paradis (1993; 1998; 2004). Findings from studies of bilingual language differentiation and interaction are claimed to converge with evidence of different recovery patterns in language pathology such as bilingual aphasia. The major tenets of the theory are (1) neurofunctional modularity, with sets of dedicated neural pathways for each module; (2) the distinction between implicit knowledge (automatic procedures such as grammar or phonology) and explicit knowledge (retrievable facts such as metalinguistic knowledge); and (3) a set of hypotheses about language processing.
The “Subsystems Hypothesis” (Paradis, 2004, p. 210) is a neurofunctional proposal
that postulates two independent language subsystems within one linguistic system. The
subsystems are functionally independent in that they can be selectively inhibited or
impaired, but they form part of the same neurofunctional language system. Thus this
hypothesis is compatible with the evidence for language differentiation.
It is also compatible with the evidence for language interaction. The choice of an appropriate language to speak is determined by the cognitive system, not by the linguistic system or its subsystems (Paradis, 2004, p.210). This choice of language, together with the concepts to be expressed, activates the appropriate language networks (subsystems or modules) by lowering their ‘activation threshold’ values, so that the appropriate items, whose thresholds are lower than those of the alternatives, are selected. An item is activated when a “sufficient amount of positive neural impulses have reached the neural substrate” (Paradis, 2004, p.28); the amount required is its ‘activation threshold’, which depends on the frequency and recency of language experience. However, under certain circumstances the activation threshold of an alternative, such as an item from the non-base language, is lower, and that alternative is selected instead. This can be due to input from the cognitive system (e.g. a pragmatic choice to code-switch activates the appropriate language items), or it may be that no appropriate alternative is available, in which case dynamic interference occurs.
Only recently have bilingual child language studies started to address the question of which parts of the language system are prone to language interaction, and why. With a few exceptions, language interaction in phonetics and phonology is largely unexplored, and the existing results and analyses diverge (Paradis, 2000; Paradis, 2001; Khattab, 2002; Kehoe, 2002; Whitworth, 2003; Kehoe, 2004; Keshavarz & Ingram, 2002; Lleó, 2002). The following sections review current knowledge about the forms language interaction takes in different language components, and in phonology specifically. In addition, we discuss the factors that may determine language interaction.
1.3.2 Factors Affecting Language Interaction
1.3.2.1 Language Mode and Pragmatic Awareness
Children acquiring two or more languages from infancy are confronted with pragmatic aspects of speech acts (such as choosing the right language) that their monolingual peers do not usually have to deal with. As Grosjean (1998, p. 132) pointed out, not only do bilinguals adapt their language use accordingly, they also change their communication strategies depending on whether the person they are talking to is bilingual or monolingual (Lanza, 1992). ‘Language mode’ is an empirically based construct (see the overview of evidence in Grosjean, 2001, pp. 8 - 13) that models how this adaptation to the sociolinguistic background of the interlocutor takes place.
Language mode is “a state of activation of the bilingual’s languages and language processing mechanisms at a given point in time” (Grosjean, 2001, p.3); the notion is illustrated in Figure 1-1. The figure represents the activation of the two languages in a bilingual person communicating in base language LA; LB is the bilingual’s non-base language. The state of activation of the non-base LB (the darker the box, the more active the language) varies along the language mode continuum. While the base LA is equally activated in all situations, the non-base LB is least activated in the monolingual language mode (state ‘1’ in the figure) and most activated in the bilingual language mode (state ‘3’).
This empirically based model has important implications. First, the non-base LB is never totally deactivated: Grosjean draws on studies showing that bilinguals’ speech exhibits signs of dynamic interference even in the most monolingual of situations. It should be noted that if we also consider the distinction between ‘dynamic’ and ‘representational’ interference (Paradis, 2004) discussed in Section 1.2.2, which Grosjean does not consider, we would still expect both types of interference to occur in the monolingual language mode, since both are unintended in a pragmatic sense. However, the two types have different implications for the state of activation of the deactivated (non-base) language: the presence of solely representational (as opposed to dynamic) interference could imply total deactivation of the non-base language, since such interference is represented in LA rather than borrowed on-line from LB. This is a purely logical consequence, however, since we consider it most plausible that both types of interference operate at the same time.
Figure 1-1 Visual representation of the language mode continuum (Grosjean, 2001, p.3).
The second implication is that in the monolingual language mode (e.g. when communicating with monolinguals) the amount of ‘code-switching’ should be drastically reduced compared to the bilingual language mode. The remaining effect of incomplete deactivation is dynamic interference (or static interference, if it has affected the representation of LA).
Third, interference can also occur in the bilingual language mode (for example, in communication with other bilinguals), where it is more difficult to separate from code-switching. Bilingual studies that controlled for language mode found differences in the amount of code-switching between monolingual and bilingual modes (syntactic: Lanza, 1992; phonological: Khattab, 2002). For this reason, language mode is accounted for in our methodology (see Chapter 3).
1.3.2.2 Language Mixing in the Input
Accounts of the influence of parental language mixing on bilingual child language development are controversial. Traditionally, the ‘one parent – one language’ principle is considered more advantageous for bilingual language development, while language mixing in a caregiver’s input is viewed as somewhat harmful (see de Houwer, 1995, p. 225 for a review). However, as Deuchar & Quay (2000, p.8) point out, there is so far no evidence of “any ‘type’ of environment as being more or less beneficial for bilingual upbringing”.
One view holds that “one would expect children exposed to frequent and general mixing to mix frequently, since there is no reason for them to know that the languages should be separated” (Genesee, 1989). However, there is evidence that, even though the metalinguistic and pragmatic skills of two-year-olds are not yet fully developed, bilingual children can produce interlocutor-dependent code-switching patterns. For example, Lanza (1992) found that her bilingual Norwegian-English child, despite a general preference for Norwegian, switched more English content words into Norwegian with her Norwegian father, who code-switched himself. By contrast, she switched fewer Norwegian content words into her weaker English in conversation with her English-speaking mother, who did not approve of code-switching. Such evidence thus suggests that bilingual children are well aware of the bilingual context of their upbringing.
Another view, that of Chambers (2002), hypothesises the existence of an ‘innate accent filter’ as part of sociolinguistic (or pragmatic) competence in bilinguals. He called this ‘the Ethan experience’ after a boy, Ethan, born and raised in Toronto in a family of immigrants from Eastern Europe. His parents spoke English with a medium to strong L1 accent, yet Ethan never acquired the parental accent; instead, he sounded like his English-speaking peers. However, Khattab’s (2002; 2004) empirical data on phonological acquisition contradict the ‘accent filter’ hypothesis. She controlled for monolingual and bilingual language modes in three English-Arabic bilinguals (aged 5;0, 7;0 and 10;0). Her data showed that the children did acquire their parents’ accents, but produced them in appropriate sociolinguistic settings, i.e. when communicating in the bilingual language mode (with their parents), while using native-like registers of English to communicate with their English-speaking peers. Khattab’s data suggest that such register switching, or a register continuum, is possible as part of sociolinguistic competence. However, the children did not seem to filter out accents innately, since they acquired their parents’ accents and used them in appropriate situations.
In fact, the findings of Lanza (1992) for syntactic acquisition and of Khattab (2002; 2004) for the phonetic/phonological level of speech converge and point in the same direction. Acquiring parental communication patterns (including code-switching and interference) seems to be part of the sociolinguistic and pragmatic competence of a simultaneous bilingual, and these patterns can be used in appropriate communicative contexts, such as the bilingual language mode.
1.3.2.3 Structural differences of the languages in contact
1.3.2.3.1 Why should language structure be important?
In the context of monolingual language acquisition, it is known that different
linguistic properties may develop at a different pace depending on their complexity. For
example, phonological acquisition studies emphasise the interdependence of complexity
in sound structure, phonological learning processes and the order of acquisition. This
interdependence has been incorporated into various theories, ranging from the universal
‘laws of irreversible solidarity’ (Jakobson, 1941) to the probabilistic and self-organising
‘articulatory naturalness hierarchy’ (Lindblom, 1998) of phonological learning (which
depends on articulatory naturalness, salience in the input, and other lexical forms already
present in the lexicon). The common theme of these otherwise very different accounts of
phonological learning is that learning proceeds from basic to more complex tasks
(Lindblom, 1998). In this most general sense, linguistic and phonological structure matters.
In the context of two distinct languages in contact, language structure matters in
measuring language interaction for an obvious reason: if the structure of some
components of the two languages were identical, there would be nothing to interact
with or to transfer.
In claiming that linguistic structure affects language interaction in simultaneous
bilingual acquisition, researchers have taken at least two extreme views. The ‘minimalist’
view claims that language structure does matter in language interaction, but only in
determining its docking sites. Structure does not determine the direction of interaction;
hence it is not its cause. Such a view is implied in the Language Dominance Hypothesis
(Petersen, 1988), which claims that language dominance is the major source of language
interaction in young bilinguals. As opposed to that, the ‘maximalist’ view claims that
linguistic structure does matter, and the structure itself and its complexity determine the
direction of interaction, hence the structural differences cause language interaction. For
simultaneous bilingual acquisition such a view is taken in the Cross-Language Cue
Competition Hypothesis (Döpke, 1998; Döpke, 2000) and in the Markedness Hypothesis
(Müller, 1998).
The term ‘structural’, in our understanding, follows the definition of ‘phonological
knowledge’ by Docherty et al. (2005), which “embraces all the systematic relationships
between the sound patterns of spoken language and the external environment”. This
definition includes, among other things, socially structured variation, rather than only a structural
subset covering lexical contrasts. It embraces any systematic sound structure property, be
it linguistic or sociolinguistic in nature.
Crosslinguistic differences can be addressed from the point of view of the structure
itself (e.g. phonological or syntactic constituents) and of its distribution in the
language (e.g. intralanguage frequency or frequency of input).
With regard to the structural complexity of linguistic tasks and their acquisition,
much research has been done in Second Language Acquisition (SLA) studies. For
example, the Contrastive Analysis Hypothesis (CAH) (Lado, 1957) made predictions about the
transfer of linguistic habits in adult L2 learners, based on the similarities and differences
between the two phonological systems in contact, and about the degree of difficulty of
learning. L2 categories similar to those of the L1 were considered easy to acquire
(‘positive transfer’), while those that differed were considered difficult to acquire (‘negative
transfer’). In contrast to CAH, Flege’s Speech Learning Model (SLM) (Flege, 2002)
predicts that L2 learners have more difficulty establishing similar (but phonetically
differing) phonological categories than new ones. Evidence for SLM is more solid, since
it is empirically based (that for CAH was drawn from impressionistic observations in
foreign language classes). Both accounts build on the idea that structure is an important
factor in determining transfer (whatever the direction). Such structural differences should
also be important in simultaneous bilingualism, but the direction of language interaction
(if it takes place) might be qualitatively different and require a separate model
(MacWhinney, 1997; MacWhinney, 2004).
Weinreich (1953, p.18) introduced a typology of processes of interference at the
level of sound structure for L2 learners and adult bilinguals, which was derived from
structural differences at phonological and phonetic levels. He distinguished between:
1. ‘Under-differentiation of phonemes’, i.e. a process when “two sounds of the
secondary system whose counterparts are not distinguished in the primary system
are confused”. For example, RP English features a phonemic tense-lax vowel
contrast such as /i/ and /ɪ/, while in Spanish this contrast is absent (only tense
vowels are available). Therefore, a Spanish L2-learner of English must acquire a
subtle vowel quality difference between English tense and lax vowels. Empirical
evidence confirms that presence or absence of tense-lax contrast causes difficulties
in acquisition for L2 learners (Panasyuk et al., 1995; Markus & Bond, 1999;
Escudero, 2000; Guion, 2003; Piske et al., 2002). In terms of taxonomy, such
crosslinguistic differences in structure have been labeled as systemic: i.e.
involving different inventories (Wells, 1982).
Another example of ‘under-differentiation’ is the presence versus absence of
conditioning of vowel duration by the voicing of the following (postvocalic)
consonant, as in English versus French. In Mack’s (1982) study, proficient French
speakers of L2 English produced relatively short vowels (similar to the French
model) before voiced stops, where the monolingual English model requires long
vowels. Unlike native English speakers, the French L2 learners of English used their
‘undifferentiated’ French vowel categories in English irrespective of the voicing of
the following consonant.
2. ‘Over-differentiation of phonemes’ also involves systemic crosslinguistic
differences and occurs when L2-learners impose distinctions available in their L1
in structural places of L2 where they should be absent. Such a systemic difference
is, for example, the presence of intrinsic vowel duration conditioning in L1
Finnish, and its absence in L2 Russian. This difference has been found to affect the
speech production of Finnish students learning L2 Russian: i.e. they superimpose the
Finnish short/long vowel distinction on Russian, where it should not occur at all (de Silva, 1999).
3. ‘Reinterpretation of distinctions’ occurs when an L2-learner distinguishes a set of
phonemes in L2 by some secondary phonetic features which are more important in
his L1. In taxonomic terms, the involved cross-linguistic differences have been
labeled as distributional (or phonotactic): i.e. involving differences in phonotactic
distributions of an element in the system (Wells, 1982). For example, beginning
English learners of L2 Thai interpret the Thai three-way voicing onset distinction
/b p pʰ/ as a two-way distinction /p pʰ/ based on aspiration rather than on voicing
in word-initial position (Pater, 2003), even though both voicing and aspiration are
present in English.
4. ‘Phone substitution’ may happen when two languages feature the same phoneme,
but differ in their phonetic implementation. In taxonomic terms, such
crosslinguistic differences have been labeled as realisational (Wells, 1982). For
example, for the voiceless stop /t/ in Spanish, the place of articulation is dental,
but in English it is predominantly alveolar, thus alveolar articulation of English L2
learners of Spanish is considered to be an interference and together with VOT
differences it constitutes a source of foreign accent in Spanish (González-Bueno,
2002).
It is not only the surface structure that affects the acquisition processes in bilinguals,
but also the distributional characteristics within a language such as frequency of tasks and
input (MacWhinney, 1997; MacWhinney, 2004; Paradis, 2004). For example, de Houwer
(1990) addressed distributional differences between the grammatical tense systems in a
Dutch-English bilingual subject. Both English and Dutch feature very similar simple past
and present perfect tense systems. However, monolingual Dutch children first acquire
present perfect, while monolingual English children first acquire the simple past tense.
In each language, the tense acquired first is the one used more frequently in colloquial speech. The bilingual
child in de Houwer’s (1990) study acquired the syntactic tense properties in the
appropriate language-specific order, despite the crosslinguistic similarities from the point
of view of surface structure.
Bilingual speech production patterns can also be accounted for by looking into
sociolinguistic variation in the language varieties that bilinguals speak. For example, Khattab
(2002) studied speech production patterns of three English-Arabic bilingual children
(aged five, seven and ten). The children were raised in Lebanese families, residing in
Yorkshire (England). One of the aspects addressed by Khattab was production of /l/. From
the standard descriptions of the two acquired language varieties it could be expected that
word-initially the children would acquire a dark /l/ in English and a ‘clear’ /l/ in Arabic.
However, Khattab’s English monolingual control data suggested that the children’s peers,
the peers’ parents and the speakers in the Leeds IVIE corpus also produced clear /l/
word-initially alongside the dark variant. Her bilingual subjects showed similar ranges of
variation in their English. Khattab’s data showed how such structured variation can be
part of a child’s sociolinguistic competence, and the study provided a good example of how
easy it would have been to misinterpret these data as interference from Arabic if
appropriate control data had been lacking.
Differences in the acquisition rates of sound structure might also be connected to
articulatory phonetic complexity. A good example is the bilingual acquisition of Voice
Onset Time (VOT) differences. Kehoe (2004) studied the acquisition of VOT by four
German-Spanish bilingual children aged 1;9 to 2;6. For word-initial voiceless stops, German
features long lag and Spanish short lag. Phonologically voiced stops also involve a
realisational difference: i.e. Spanish features a voicing lead while German features short
lag. In both languages, the patterns of voiced stops are acquired later than those of the
voiceless stops. Furthermore, the monolingual acquisition rates of the VOT in voiced
stops in Spanish seem to be even slower than those in German. One of the explanations
proposed for these acquisition rate differences was that the voicing lead is inherently more
difficult to produce than the short lag. Kehoe tried to test this difference by invoking the
concept of ‘markedness’, since it incorporates such acquisition rate differences (see also
Section 1.3.2.3.3). According to Kehoe, the bilingual results were consistent with three
different patterns of language interaction (Paradis & Genesee, 1996): i.e. delay, transfer
and no interaction. The variability of the results could partly be explained by the
differences in the amount of the bilingual input in different children. However, one
‘balanced’ bilingual showed evidence for language interaction, while another did not.
Thus dominance was not the only conditioning factor. This study showed that differences
in the surface structure of an ‘end product’ are not sufficient to explain the complexity of
the acquisition process, and that acquisition rates should be taken into account.
To summarise, if we look at the structure of languages in contact at the
phonological and phonetic level, the differences may be systemic (involving different
inventories), phonotactic (involving different distributions of the same set or subsets of
structures) or realisational (involving different phonetic implementations of the same
phoneme). Such structural differences can be potential docking sites for language interaction;
however, the distributional characteristics (in terms of frequency) of these structural
differences in the language input are also important factors not to be dismissed. The next sections will
discuss current hypotheses on the direction of language interaction specifically formulated
for simultaneous bilingual acquisition.
1.3.2.3.2 Cross-Language Cue Competition
The Cross-Language Cue Competition Hypothesis (Döpke, 1998; Döpke, 2000)
builds on the Competition Model (Bates & MacWhinney, 1989; MacWhinney, 1997) to
account for how structural ambiguities of the languages in contact determine transfer in
simultaneous bilingual language acquisition.
The Competition Model (Bates & MacWhinney, 1989; MacWhinney, 1997;
MacWhinney, 2004) views first and second language acquisition as a constructive, data-driven process that relies on universals of the cognitive system rather than of the linguistic system.
The basic claim of the Competition Model with regard to first language acquisition is that
structural cues available within a language compete with each other based on their ‘cue
strength’: i.e. cues which are most reliable (i.e. unambiguous) and frequent are acquired
first, while less reliable cues (i.e. more ambiguous or less frequent) are acquired later.
This applies to all linguistic components: phonological, morphosyntactic, lexical, and
semantic.
With regard to SLA the strongest claim of the Competition Model is “all that can
transfer, will transfer” (MacWhinney, 1997), given a potential for a crosslinguistic
conflict. All beginning L2 learners start with a ‘parasitic’ (MacWhinney, 1997) set of
linguistic structures based on their L1. In the context of first language acquisition, ‘cue
strength’ is the factor determining the acquisition process, its relative order and difficulty.
Linguistic cues with the strongest ‘cue strength’ are acquired first. ‘Task frequency’ is an
important factor determining ‘cue strength’. The factor ‘task frequency’ comprises
language internal frequencies of properties, but also environmental frequency (no input
means there is nothing to acquire). MacWhinney notes that in the context of SLA and
simultaneous bilingual language acquisition the factor ‘task frequency’ is of importance,
because if one of the languages is infrequently used, “task frequency could become a
factor determining a general slowdown of acquisition” (MacWhinney, 1997, p.122).
The Cross-Language Cue Competition Hypothesis (Döpke, 1998; Döpke, 2000)
applies the Competition Model (Bates & MacWhinney, 1989; MacWhinney, 1997) to the
simultaneous bilingual acquisition situation. The hypothesis is derived from
the longitudinal data of three German-English children (aged between 2;0 and 5;0)
growing up in Australia in families with German-speaking mothers and English-speaking
fathers. English is spoken between the parents and in the community. The author analysed
language-specific word order in verb phrases with auxiliary verbs and the acquisition of
finiteness markers. For example, in both languages simple sentences feature the SVO
word order. However, in the sentences with auxiliary verb phrases the word order is
different: i.e. in English they retain the SVO word order as in simple sentences (e.g. I find
it versus I will find it), while in German the complement moves from the post-verbal to
the pre-verbal position (e.g. Ich finde es versus Ich werde es finden). Word order is thus
more complex in German than in English. The results showed that the three subjects
produced German sentences with word order like Ich möchte essen das instead of Ich
möchte das essen significantly more often than in other reports for German monolingual
peers before the age of three.
Döpke (1998; 2000) explains the appearance of such non-target structures in the
speech of bilingual subjects by introducing the Cross-language Cue Competition
Hypothesis (CCCH). According to CCCH, structurally ambiguous cues of the languages
in contact (like the above word order difference) present a cognitive challenge to a child.
The appearance of ‘non-target structures’ (i.e. interference or transfer) is caused by the
presence of such structural crosslinguistic differences, and their relative ‘cue strength’.
‘Non-target structures’ from the less complex language (in this case English) appear in
the language containing a more complex structure (German). However, despite the
claimed importance of ‘task frequency’ in the Competition Model (MacWhinney, 1997,
p.122), Döpke (2000) argues that the environmental situation affects the frequency of
non-target structures, but is not its primary cause. It certainly does not determine the
direction of transfer.
If CCCH holds, transfer can be bi-directional for different sets of structural
ambiguities, depending on the surface structure of the languages in contact. However, it is
always unidirectional towards the language with the more ambiguous structure. This
hypothesis could be falsified if an ambiguous (unreliable) cue of one language were
transferred onto another language where there is no ambiguity. This would be the case,
for example, if we observed transfer of the ‘over-differentiation of phonemes’ type
(Weinreich, 1953), which involves systemic crosslinguistic differences (see Section 1.3.2.3.1).
However, since CCCH treats ‘cue strength’ independently of ‘task frequency’ as a possible
factor of language interaction, it is not clear how CCCH can account for the realisational and
distributional differences described in Section 1.3.2.3.1, which involve structures that are
equally ambiguous from the point of view of either language or that differ in their distributions.
1.3.2.3.3 Markedness
Another explanation (similar to ‘cue strength’) of the directionality of language
interaction in simultaneous bilingual acquisition employs the concept of ‘markedness’.
‘Markedness’ offers an explanation with some predictive power as to why certain
linguistic features are basic, frequent and relatively easy to acquire while others are not.
The idea originally referred to phonological privative oppositions (Trubetskoy, 1939,
p.87), with those members of the opposition lacking a distinctive feature being
‘unmarked’ (like [-voice] in the voiceless stop /p/), while the members of the opposition
featuring a distinctive feature are ‘marked’ (like [+voice] in /b/). An additional test of
markedness involved neutralised positions, where the member of the opposition which
appeared in the neutralised position was considered unmarked (like [p] in place of /b/
under word-final devoicing in German or Russian).
However, over the years other interpretations have been given for markedness. One
of them is the distinction in Chomsky’s Universal Grammar between the core (innate
marked and unmarked) rules of language and the peripheral marked rules (Chomsky,
1986). Another interpretation comes from the point of view of linguistic typological
universals with features present in most languages being unmarked and those more
exceptional being marked (Ellis, 1994).
In SLA and bilingual acquisition studies, the concept often encompasses a
combination of the structure of the languages in contact (presence and absence of
features), their versatility (number of rules determining a feature in different contexts) and
frequency within and between the languages in contact (Ellis, 1994).
There is some empirical evidence that markedness may help to explain the direction
of transfer in simultaneous bilingual acquisition. Müller (1998) claims that transfer in
young simultaneous bilinguals can be viewed as a relief strategy that helps them to cope
with structurally ambiguous input in the two languages. According to Müller (1998),
markedness is a key to understanding the ambiguity of input, though, like Döpke (1998;
2000), she views markedness from the point of view of surface structure devoid of the
frequency effects.
Müller (1998) looked at the acquisition of word order in subordinate clauses in
simultaneous bilingual children acquiring German and either English, French or Italian. In
German, the rules of word order in subordinate clauses are marked, since there are many
rules determining it. German presents a child with a more ambiguous (marked) input, with
‘verb-object’ (VO) order being just an option, while the English input is unambiguous
(and unmarked) with regard to this syntactic property. Müller reviewed ten different case
studies of simultaneous bilingual children with regard to the acquisition of this syntactic
ambiguity. She concluded that in all cases transfer was unidirectional from the language
with unmarked syntactic structure (French, Italian or English) into the language
containing an optional marked structure in contexts determined by rule (German).
Importantly, Müller (1998, p.160) claims that this unidirectionality is independent
of the fact of whether German is a “preferred language of the bilinguals or not” [our
italics], in other words whether they are balanced or dominant bilinguals. She interpreted
the differences between monolingual and bilingual children as quantitative rather than
qualitative: similar patterns of errors were found in both groups, but such error patterns
were exceptional in monolingual children and frequent in bilingual children. Müller does
not claim that such transfer is a necessary feature of bilingual language acquisition. Thus,
we can further infer that the markedness of a language structure is a contributing factor in
language interaction, but not its direct source.
The unidirectionality claimed by Müller is very similar to the formulation of CCCH
(Döpke, 1998; Döpke, 2000). In fact, Müller’s hypothesis could be falsified in exactly the
same way as CCCH, discussed in the previous section. The difference between the two
hypotheses lies in the paradigm adopted by the authors, since markedness implies a nativist
paradigm while ‘cue strength’ builds on a connectionist view. Further, the two hypotheses
share common ground in rejecting the importance of language dominance.
Markedness has also been invoked to explain language interaction at the
phonological level of language in simultaneous bilinguals (Paradis, 2001; Kehoe, 2002;
Lleó, 2002; Whitworth, 2003; Kehoe, 2004). We discuss two of these studies (Kehoe,
2002; Whitworth, 2003) in greater detail in Section 2.3.2, since they deal with vowel
duration.
Paradis (2001) examined word-truncation patterns in 17 French-English bilingual
children aged about 29 months growing up in a bilingual community of Montreal,
Canada. She compared bilingual speech production to English and French monolingual
children (n=18 in each group). French features an iambic rhythm (e.g. WWWS). In
English the trochaic pattern is predominant (SWS’W); however, other patterns (WS’WS,
WSWW, SS’WW) are also common. Accordingly, the patterns of syllable omissions in
monolingual children show a trochaic bias (SW) in English and an iambic one (WS) in
French. The results for the bilingual children (Paradis, 2001) showed that they performed
similarly to the two monolingual peer groups, except for the crosslinguistically ambiguous
WS’WS English pattern, which structurally resembles the French WWWS pattern. For
this English pattern the bilingual children showed an iambic (French) bias in the
preservation of syllables, unlike the English-speaking monolingual peers. Thus, despite
the language-specificity of the majority of bilingual realisations, this result clearly
suggested the presence of crosslinguistic effects from French in the bilinguals’ English.
This finding was therefore consistent with both the ‘cue strength’ (Döpke, 1998; Döpke,
2000) and the markedness-driven (Müller, 1998) accounts of the direction of language
interaction proposed for syntax.
However, Paradis (2001) also considered the possibility of language dominance in
her 17 bilingual subjects. She performed post-hoc tests for English and French dominant
bilinguals (determined by amount of exposure). The results showed that the French-dominant group had a significantly stronger tendency to treat the English WS’WS words
like French words than the English-dominant group. Paradis’ (2001) results emphasised
the importance of considering both structural linguistic properties and environmental
factors such as dominance.
1.3.2.4
Language dominance
Language dominance, the notion that “one language is somehow stronger than the
other and affects processing of the other” (Lanza, 2000), has been a controversial issue in
bilingual acquisition studies. The controversy revolves either around whether dominance
has a confounding effect on the child’s developing linguistic system, or around
methodological issues of measuring dominance (is it defined in the language input or
output?).
The fact that language dominance may play a role in bilingual acquisition has been
assumed or at least considered in quite a few handbooks in the field (Hamers & Blanc,
2000; Grosjean, 1982; Meisel, 2003); however, there is surprisingly little empirical
evidence for its operation. Grosjean (1982, p. 189) pointed out that “the main reason for
dominance in one language is that the child has had greater exposure to it and needs it
more to communicate with people in the immediate environment”. In that sense, it seems
reasonable to consider the amount of input and motivation for language use to determine
language dominance. It is also reasonable that (if it matters) language dominance should
somehow be reflected in the mental representation, and that as an effect of this dominance
the language output can contain transferred structures. However, since we know that
transfer of linguistic structures might be influenced by other factors, such as the language
structure itself, measuring the amount of transfer in both languages in the output and
attributing it to dominance alone seems problematic.
For example, the Dominant-Language Hypothesis (Petersen, 1988, p. 486) claims
that for word-internal language mixing, grammatical morphemes (like plural or tense
markers) of the dominant language of a bilingual child may co-occur with the lexical
morphemes of either dominant or non-dominant language, while grammatical morphemes
of the non-dominant language never occur in the dominant language. The hypothesis
predicts a unidirectional transfer from a more dominant into the less dominant language
irrespective of the language structure. The hypothesis is drawn from the data of a Danish-English bilingual child (aged 3;2), who lived in the USA and attended English-speaking
day-care for 30 hours a week, while spending the rest of the week with her Danish-speaking parents. Petersen measured the girl’s proficiency in the two languages in terms
of the occurrence of language mixing in the language output: i.e. in English she had only
two mixed items, while in Danish a considerable amount of mixing from English
occurred. However, the problem with this analysis is that ultimately it does measure the
girl’s language output, but it does not consider any alternative explanations of this
language mixing other than language dominance, such as, for example, the structural
complexity of tense marking in Danish-English verbs and their monolingual acquisition
patterns.
Lanza’s (1992) data support Petersen’s (1988) language dominance hypothesis.
Like Petersen, Lanza interpreted the directionality of mixing in her data as an indication
of language dominance. She claimed that dominance is not a necessary correlate of
simultaneous bilingualism, and that its state is prone to change over time. She measured
morphosyntactic features similar to those studied by Petersen. Lanza’s subject acquired a
language pair typologically similar to that in Petersen’s study, i.e. Norwegian and English,
with the difference that the girl was dominant in Norwegian rather than in English, due to
greater exposure to Norwegian in Norway. Interestingly, Lanza (1992, p. 642) noted that the
patterns of mixing were a mirror image of those in Petersen’s study, which is not surprising
given the mirror-image dominance of the two children. Lanza (2000) noted that the same bilingual girl in fact
also used an English pattern of negation in her dominant Norwegian, and that could be
interpreted in favour of CCCH. Lanza (2000) refined her previous claim (1992) by stating
that the claims of language dominance and CCCH “need not be mutually exclusive”.
To conclude, at both ends of the ‘minimalist’ and ‘maximalist’ approaches to
structural differences in language interaction, there is no conclusive evidence that these
accounts should be looked at in ‘either/or’ fashion or be dismissed. It seems thus
reasonable to test the two accounts simultaneously so that the apparent binary nature of
these proposals could be given a continuous interpretation, should tendencies emerging over a
large number of studies point in this direction.
1.3.2.5 Bilingual bootstrapping
The “Bilingual Bootstrapping Hypothesis” (Gawlitzek-Maiwald & Tracy, 1996)
accounts for syntactic acquisition, and views language mixing in young simultaneous
bilinguals as a relief strategy which involves a temporary use of the child’s expertise in one
domain of LA to solve similar problems in LB. The hypothesis was derived from the
longitudinal data of one bilingual child (aged 2;3 to 4;3) acquiring German and British
English in South Germany.
The term ‘bootstrapping’ generally means that the improvement of one capability
automatically improves any dependent capabilities. Thus, bilingual bootstrapping means
that “something that has been acquired in language A, fulfils a booster function for
language B” (Gawlitzek-Maiwald & Tracy, 1996, p. 903), or at least it serves as “a
temporary pooling of resources”. For example, the subject in their study first acquired the
infinitival constructions in English, with frequent use of want to constructions. At the
same time in German she did not produce such infinitival constructions, however, she did
use mixed utterances like: “Papa du mußt warten for me to dressed” (‘Daddy, you must
wait for me to get dressed’). The subject later acquired the German infinitival
constructions.
This hypothesis is attractive because it puts the process of acquisition in a
developmental perspective. Unfortunately, Gawlitzek-Maiwald & Tracy (1996) do not
explicitly state how the hypothesis can be tested. In our view, one of the predictions of
this hypothesis should be that once a structure is acquired, language interaction involving
this structure should cease, since the child can ‘pool the newly acquired resources’ in the
appropriate language, and no longer needs to rely on those of the other language. Of
course, acquisition is a gradual process, and the difficult question in child language
acquisition is ‘when is something fully acquired?’ In this sense this hypothesis is difficult
to test empirically, since if a structure emerges, but is not yet fully acquired, and there is
still language interaction present for this structure, then the hypothesis holds true.
However, if a structure is fully acquired, but there is language interaction involving this
structure, then the hypothesis should be falsified.
1.4 Summary
In this chapter we introduced the concepts of language interaction and
differentiation. We saw that the current overriding assumption about bilingual
phonological acquisition is that the mental representation of a bilingual’s languages is
differentiated and that additionally both systems can interact to variable degrees.
This assumption has gained consensus among researchers studying phonological
aspects of bilingual language acquisition, even though the empirical foundation for it
mainly comprises studies of non-phonological structures in morphology, syntax and the
lexicon.
We discussed the concept of ‘language interaction’, and its relation to other broadly
used terms such as ‘interference’, ‘transfer’, ‘code-switching’ and ‘language mixing’.
From this discussion, we concluded that the concept of ‘transfer’ or ‘interference’ retains
its validity and usefulness in bilingual language acquisition research. Further, it remains
an open question what forms language interaction can take in phonological simultaneous
bilingual acquisition.
The discussion of the factors affecting language interaction concentrated on the
claimed importance of system-internal factors such as ‘cue strength’ (or ‘markedness’ in
the nativist paradigm), environmental factors such as ‘language dominance’, and the role
of other environmental factors such as “the quality of input” and language mode. We
emphasised that the available empirical evidence does not make it possible to treat the
confounding factors such as linguistic structure or language dominance in an ‘either/or’
fashion. It is necessary to consider these factors simultaneously.
2 Crosslinguistic Differences in Sound Structure and Their Acquisition
2.1 Differences in Sound Structure between Scottish English and Russian
2.1.1 Introduction
The bilingual subjects in this study are acquiring two languages: the Edinburgh
variety of Scottish Standard English (SSE or Scottish English) and the Moscow variety of
Modern Standard Russian (MSR or Russian).
Russian belongs to the East-Slavic branch of the Slavic group, while Scottish English
belongs to the West-Germanic branch of the Germanic group; both belong to the
Indo-European language family. There are large Russian-speaking communities in 30 countries
(including the former USSR, China, Mongolia, Israel and the USA).
Scottish Standard English is usually described as being one end of “a language
continuum [ranging] from Broad Scots to Scottish Standard English” (Corbett et al., 2003,
p.2). Broad Scots, also known as the Scots language, derived from the Anglian variety of
Old English spoken in the 6th century A.D. According to Corbett et al. (2003) the term
‘continuum’ suggests “that there is a shading and overlap of language uses from ‘Broad
Scots’ to ‘Scottish Standard English’”.
The two extremes of the Scots continuum share vocabulary and grammar to a varying
extent, depending (among other factors) on the influence of other standard varieties of
English spoken in the U.K. Scottish Standard English, at the English end of the continuum,
has been strongly influenced by these varieties, with which it overlaps substantially in
written language, while retaining a more Scots phonology.
As might be expected given their common inheritance, Scottish English and Russian
have a number of similarities and a number of differences with regard to their
phonological systems (both at the segmental and suprasegmental levels). As we
mentioned in Section 1.3.2.3, structurally conflicting crosslinguistic differences may
trigger language interaction in simultaneous bilingual language acquisition. Therefore,
this chapter outlines the subset of crosslinguistic differences in sound structure with a
potential for language interaction between SSE and MSR. Having described the
crosslinguistic differences in detail, we shall review previous findings concerning such
differences in monolingual and bilingual language acquisition. Finally, based on the
literature review we shall introduce the research hypotheses for this study.
2.1.2 Theoretical Framework for the Research Variables
2.1.2.1 A Short Sketch of the Research Variables
Scottish English and Russian have similar word-prosodic systems, i.e. the languages
employ prosodic parameters other than pitch (f0) to encode lexical stress (Beckman,
1986), but they differ as follows:
(1) SSE phonologically encodes a tense/lax vowel contrast to some extent, while such a
contrast is absent in Russian. We intend to look at this contrast in three phonetic
dimensions: vowel quality, vowel duration and laryngeal differences.
(2) SSE has the Scottish Vowel Length Rule (SVLR) (Aitken, 1981; Scobbie et al., 1999a;
Scobbie et al., 1999b; Scobbie, 2002), a highly systematic distribution of vowel
duration conditioned by the voicing and manner of articulation of the following
consonant. In SSE, the SVLR applies only to the tense vowels /i/ // /ai/ and not to the
lax // . Such substantial extrinsic conditioning of vowel duration is absent in
Russian (Chen, 1970; Gordeeva et al., 2003), where duration mainly cues
prominence relations, i.e. the presence versus absence of a pitch accent, or of word
stress (Svetozarova, 1998).
(3) The presence of the SVLR in SSE seems to trigger a differential use of acoustic
cues to accentual contrasts (Gordeeva et al., 2003). In prominent positions, a higher
‘spectral balance’1 (an acoustic correlate of vocal effort) is associated with
phonetically short vowels compared to long ones. The higher vocal effort of the
short vowels is initiated at the pulmonic and laryngeal levels, and is reflected in a
laryngeal adjustment towards a more asymmetrical glottal pulse (Gauffin &
Sundberg, 1989; Sluijter & van Heuven, 1996b; Hanson, 1997). In the radiated
spectrum, this glottal asymmetry is reflected in higher energy of the spectral partials
above 1000 Hz (Sluijter & van Heuven, 1996b). We hypothesised (Gordeeva et al.,
2003) that the enhanced spectral balance of short but prominent vowels in SSE
serves as an additional cue to achieve sufficient prominence. The acoustic cue
‘duration’ is functionally loaded: the SVLR conditions a short duration in words
such as ‘sheep’, while prominence requires accentual lengthening. Therefore, the
SVLR interacts with the Scottish English accentual system and dynamically affects
the acoustic cues to prominence. There is no such association in Russian. In
Russian, vocal effort applied during speech production typically results in a
relatively ‘slack articulation’ of stressed vowels (Bondarko, 1998), with spectral
levels similar to those of the SSE long vowels (Gordeeva et al., 2003).
1
In the literature, different terms, namely ‘spectral tilt’ (Campbell, 1995; Hanson, 1997; Sluijter & van Heuven,
1996a), ‘spectral balance’ (Sluijter & van Heuven, 1996b; Sluijter et al., 1997; Jessen, 2002) and ‘spectral
emphasis’ (Traunmüller & Eriksson, 2000; Heldner, 2001; Heldner, 2003), refer to the energy in the spectral
midfrequencies. The terminological differences reflect differences in methodology, i.e. the same laryngeal
effect is inferred from different acoustic measurements. ‘Spectral tilt’ is the ratio of the intensity of the first
harmonic to that of F3 or F2, while ‘spectral emphasis’ measures the overall level (dB) relative to the level of
the signal low-pass filtered at 1.5 times f0. Since we use Sluijter & van Heuven’s (1996b) methodology with
some adaptations, we call the acoustic parameter ‘spectral balance’ throughout the study.
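Item (2) above can be made concrete with a deliberately simplified, categorical sketch. It is not the analysis procedure of this study: it assumes the lengthening environments commonly cited for the SVLR (tense vowels long before voiced fricatives, /r/ and a morpheme boundary, and short elsewhere, including before voiced stops), and the environment labels are hypothetical. Real SVLR durations are gradient rather than binary.

```python
# Hypothetical environment labels for the segment (or boundary) after the vowel.
# Assumption: under the commonly cited form of the SVLR, tense vowels are long
# only in these environments and short everywhere else.
LENGTHENING_ENVIRONMENTS = frozenset({"voiced_fricative", "r", "morpheme_boundary"})


def svlr_length(tense: bool, following: str) -> str:
    """Return a categorical 'long' or 'short' prediction for one vowel token.

    Lax vowels are predicted short in every environment; tense vowels are
    long only in the assumed lengthening environments.
    """
    if tense and following in LENGTHENING_ENVIRONMENTS:
        return "long"
    return "short"


if __name__ == "__main__":
    # Illustrative words with tense vowels; the environment labels are assumptions.
    examples = [
        ("sheep", "voiceless_stop"),
        ("bead", "voiced_stop"),
        ("leave", "voiced_fricative"),
        ("agreed", "morpheme_boundary"),
    ]
    for word, following in examples:
        print(word, svlr_length(tense=True, following=following))
```

Under these assumptions the example prints short for ‘sheep’ and ‘bead’ and long for ‘leave’ and ‘agreed’. The ‘sheep’ case is the short-but-prominent situation that item (3) is concerned with.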
These substantial structural differences are part of the adult language, and of the
language input to bilingual children. We limited the scope of phonemes considered for the
current analysis to a subset of close to close-mid vowels: /i,,/ in Scottish English, and
/i,u/ in Russian, since this subset most clearly exemplifies the above crosslinguistic
ambiguities. Before we proceed with a detailed explanation of these crosslinguistic
structures, we would like to outline the framework within which we view the relation
between these research variables.
2.1.2.2 ‘Stress-Accent Hypothesis’
In our analysis of word-prosodic features (such as the presence of systematic
extrinsic vowel duration conditioning in SSE and its absence in MSR) and their influence
on the prominence relations in the two languages, we follow Beckman’s (1986) ‘stress-accent hypothesis’. The hypothesis states that “stress accent differs phonetically from
non-stress accent in that it uses to a greater extent material other than pitch”.
In Beckman’s view, ‘accent’ is defined as a “system of syntagmatic contrasts for
constructing prosodic patterns” (1986, p.1); it is an organisational rather than a distinctive
phonological feature of the kind found in segmental contrasts. The prosodic patterns
subdivide an utterance into shorter phrases and organise them into larger units. Since such
a prosodic system involves only syntagmatic oppositions (rather than paradigmatic), the
distinctive function becomes secondary for prosodic properties (as opposed to its primacy
in segmental contrasts). According to Trubetskoy (1939, p.35) accent in a language can
serve delimitative, distinctive and culminative functions at the same time. However, in
Beckman’s interpretation (1986), the culminative function subserves either distinctive or
delimitative functions, since accentual systems only participate in syntagmatic contrasts.
‘Stress’ is “a phonologically delimitable type of accent” (1986, p.1), with no
specification of pitch shape in the lexicon, because this varies depending on the pragmatic
meaning. Contrary to Beckman’s interpretation of the relation between stress and
accent, we agree with Sluijter and van Heuven (1996b) that stress
and accent are two distinct dimensions, where “accentuation is used to focus and is
determined by the communicative intentions of the speaker”, while “stress is a structural,
linguistic property of a word that specifies which syllable in the word is the strongest” and
a potential docking site for accent.
In Beckman’s view, phonological categories of accentual systems are not
necessarily phonetically uniform across languages (or even within a language). A
phonological property in one language can differ in phonetic detail from the same
property in another language. The difference in such phonetic detail is a question of extent
rather than an absolute.
2.1.2.3 Stress and Vocal Effort in ‘Stress Accent’ Languages
Traditionally, the phonetic detail of lexical stress in languages like English has been
measured by acoustic cues such as f0, vowel duration, vowel quality, and overall intensity.
Importantly, the set of acoustic measurements used to infer stress depends on our
understanding of what constitutes stress physiologically for a given language (vocal effort
and its levels in speech motor control).
In the past, differences in stressed versus unstressed English syllables were
attributed “to differences in physical effort” (Lehiste, 1977, p. 106). Jones
impressionistically defined ‘stress’ as “the degree of force with which a sound or syllable
is uttered” (1918, p.245). The strong force in his view “gives the objective impression of
loudness” (1918, p.245). Similarly to Jones and Lehiste, Bloomfield (1933, p. 110)
associated English stress with loudness.
However, there was no agreement as to the level of speech production at which this
effort is achieved. Bloomfield (1933, p. 110) defined stress in terms of pulmonic,
laryngeal and supralaryngeal effort: i.e. “more energetic movements, such as pumping
more breath, bringing vocal cords closer together for voicing, and using muscles more
vigorously for oral articulations”. Similarly Jones (1918, p.245) defined the source of the
‘strong force’ as a result of “energetic action of all the articulating organs” that involves
laryngeal and supralaryngeal levels and “a strong push from the chest wall”. However,
Lehiste (1977) defined ‘effort’ on the pulmonic level only: i.e. involving physical activity
of the muscles controlling respiration. In Lehiste’s understanding, the force exerted by the
muscles was to be reflected in the subglottal pressure and ultimately in the compound
effect on f0, vowel duration, vowel quality and overall intensity in the speech.
An important turning point away from associating loudness (and force) with stress in
English came from Fry’s (1955; 1976) series of empirical studies. He investigated the
hierarchy of acoustic cues to English stress in production and perception. Perceptually,
higher f0 was found to be the strongest perceptual correlate of stress (as compared to
unstressed syllables in word pairs like “OBject” versus “obJECT”), followed in cueing
strength by longer duration, greater overall intensity and fuller segmental quality. Overall
intensity was found to be a poor correlate of stress.
There are two problems with Fry’s studies (1955; 1976). The first one concerns his
treatment of f0 as a correlate of stress in the presence of accent. As Sluijter and van
Heuven (1996b) point out, the primacy of pitch in cueing stress emphasised in Fry’s
studies (1955; 1976) “is a major source of the common misunderstanding in the
experimental literature that f0 excursion is a direct acoustic correlate of the feature
‘stress’”, since stress and accent are “distinct (though non-orthogonal) dimensions”
(1996b, p.2471) and should be treated separately. The second problem is his dissociation
of stress from loudness based on manipulations of the overall intensity of the sound. The
latter problem needs more clarification.
Over the years, our understanding of physiological manifestations of lexical stress
has substantially evolved. While a relative amount of ‘effort’ remains an important part of
the physiological definition of stress (for example, Beckman, 1986; Laver, 1994), effort at
the subglottal level only as defined by e.g. Lehiste (1977, p.106) is not sufficient to
explain linguistic stress. In agreement with Jones’ (1918) and Bloomfield’s (1933) views,
in addition to pulmonic effort the laryngeal and supralaryngeal levels need to be taken
into account in explaining stress production mechanisms (Rietveld & van Heuven, 1997).
Empirical evidence supports this broader physiological definition of vocal effort
(Fónagy, 1966). At the pulmonic level the intercostal muscles and the diaphragm control
subglottal pressure by a dynamic balance of expiratory and inspiratory effort (Laver,
1994, p. 513). The subglottal pressure in the lungs produces an airstream that creates a
difference in transglottal pressure at the vocal folds (the Bernoulli effect) and sets them in
vibration.
At the laryngeal level, a speaker can control the cricothyroid muscles
regulating the frequency of vocal fold vibration (f0) to convey the appropriate
communicative intentions through intonation. A speaker is also able to control the shape
of the vibration pattern of the vocal folds, by reciprocal control of the activation of the
abductor and adductor muscle groups (Hirose, 1999, p.128). The shape of the vibration
pattern determines voice quality. For example, a breathy voice quality (often found in
female speakers) is associated with a constant glottal leakage (the vocal folds never come
completely together) and a relatively symmetrical glottal pulse (Hanson, 1997; Ní
Chasaide & Gobl, 1999).
At the supralaryngeal level, the effort in the production of stressed syllables is
characterised by a more careful (and slower) articulation by the tongue body, tongue blade
and lips resulting in a spectral expansion (fuller quality) in the vowel production (Rietveld
& van Heuven, 1997).
Therefore, vocal effort is created at three levels in the vocal tract, and the acoustic
measurements inferring stress should reflect pulmonic, laryngeal and supralaryngeal
contributions to vocal effort.
2.1.2.4 Acoustic Correlates of Vocal Effort in ‘Stress-Accent’ Languages
Fónagy (1966) analysed acoustic spectra of Hungarian stressed and unstressed
vowels. This is how he described the acoustic differences he observed (1966, p.239):
“The greater effort was reflected in different ways. In most cases ... the formants of
the vowels in stressed syllables had higher amplitudes and broader bandwidths.
Especially sharp was the divergence in the higher frequency ranges. In these cases,
the stressed syllables took a higher level and were longer and had a higher pitch. The
saturation of the spectrum indicated stress even when the greater effort was not
indicated by relatively higher sound pressure levels.” [emphasis added – O.G.]
Even though Fónagy treated f0 as a correlate of stress rather than one of accent, he
was probably one of the first phoneticians after Fant (1960) to emphasise the role of
energy in the higher frequency ranges in differentiating stressed from unstressed syllables
(in addition to sound pressure level, vowel duration and quality).
It is known that an increase in subglottal pressure affects the f0, the radiated sound
pressure level (SPL) and ultimately the overall intensity (Finnegan et al., 2000; Gauffin &
Sundberg, 1989). Gauffin and Sundberg (1989) observed that when their subjects (four
singers and two non-singers) were instructed to increase phonatory loudness, this was
always accomplished by an increase in subglottal pressure. When subglottal pressure was
low (1a in Figure 2-1) there was glottal leakage: i.e. transglottal airflow did not reach zero
during the quasi-closed phase (the vocal folds did not come completely together). When
subglottal pressure increased with increasing phonatory loudness (2a in Figure 2-1),
nearly complete glottal closure occurred.
Thus, increases in SPL also affected the laryngeal level. The shape of the pulse
became more asymmetrical (from 1a to 3a, the slope of the dotted line becomes steeper):
i.e. the trailing end of the flow glottogram pulses grew continuously steeper as
subglottal pressure increased. This change of shape was due to the relative increase of the
adduction force resulting in the increase of speed of the closing phase. This closing phase
slope was the steepest in the condition of the highest increased phonatory loudness (3a in
Figure 2-1).
Gauffin and Sundberg’s (1989) study showed that at the level of (inversely filtered)
radiated spectrum (1b to 3b in Figure 2-1), the change in the closing phase due to increased
loudness resulted in a boost of frequencies between 2 and 4kHz (hence midfrequencies).
[Figure 2-1 panels: 1a to 3a, transglottal airflow (arbitrary units) against time (arbitrary units); 1b to 3b, sound pressure level (arbitrary units) against frequency (arbitrary units).]
Figure 2-1 Variations in the flow glottogram of a single cycle (left part of the diagram) when a speaker was
instructed to increase phonatory loudness (conditions 1a to 3a from soft to loud). Right part of the diagram
represents the acoustic consequence of such increase in the radiated spectrum (2nd and 3rd ticks on the
horizontal axes show frequencies between 2 and 3 kHz) (adapted from Gauffin & Sundberg, 1989).
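The link between a steeper closing phase and a midfrequency boost can be illustrated numerically. The sketch below is our own illustration, not Gauffin and Sundberg’s (1989) method: it assumes a Rosenberg-type pulse shape, approximates lip radiation by a first difference of the flow, and compares the 2-4 kHz band level with the 0-0.5 kHz band level for a gentle and a steep closing phase. All parameter values and function names are hypothetical.

```python
import cmath
import math


def rosenberg_pulse(period, t_open, t_close):
    """One period of a Rosenberg-type glottal flow pulse (arbitrary units).

    t_open and t_close are the lengths (in samples) of the opening and closing
    phases; the rest of the period is the closed phase (zero flow).
    """
    flow = []
    for n in range(period):
        if n < t_open:
            flow.append(0.5 * (1.0 - math.cos(math.pi * n / t_open)))
        elif n < t_open + t_close:
            flow.append(math.cos(0.5 * math.pi * (n - t_open) / t_close))
        else:
            flow.append(0.0)
    return flow


def harmonic_powers(period_signal):
    """Power of harmonics 0..N/2 of one period of a periodic signal (direct DFT)."""
    n = len(period_signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(period_signal))) ** 2
        for k in range(n // 2 + 1)
    ]


def mid_to_low_db(flow, fs, low=(0.0, 500.0), mid=(2000.0, 4000.0)):
    """Level of the mid band relative to the low band (DC excluded), in dB.

    Lip radiation is approximated by a circular first difference of the flow.
    """
    radiated = [flow[i] - flow[i - 1] for i in range(len(flow))]
    powers = harmonic_powers(radiated)
    f0 = fs / len(flow)

    def band(lo, hi):
        return sum(p for k, p in enumerate(powers) if k > 0 and lo <= k * f0 < hi)

    return 10.0 * math.log10(band(*mid) / band(*low))


if __name__ == "__main__":
    fs, period = 16000, 128  # f0 = 125 Hz
    gentle = mid_to_low_db(rosenberg_pulse(period, 48, 24), fs)
    steep = mid_to_low_db(rosenberg_pulse(period, 48, 6), fs)
    print(f"gentle closing: {gentle:.1f} dB, steep closing: {steep:.1f} dB")
```

The steeper closing phase yields a higher midfrequency level relative to the low band, in line with the change from 1b to 3b; the absolute dB values depend entirely on the assumed pulse parameters.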
Traunmüller and Eriksson (2000) defined vocal effort as “the quantity that ordinary
speakers vary when they adapt their speech to the demands of an increased or decreased
… distance”. They investigated the acoustic effects of the adjustment of vocal effort as a
consequence of changes in the physical distance (0.3 to 187.5 m) between a speaker
(n=20) and the addressee. They found that the overall SPL in subjects’ production was
affected by vocal effort. However, the overall SPL was an ambiguous cue: when the
distance of the microphone from the subject was variable (uncontrolled), the overall SPL
correlated unreliably with vocal effort. In comparison with the overall
SPL, more reliable information on vocal effort was conveyed by spectral emphasis (a
methodological alternative to ‘spectral balance’, measuring SPL above 1.5 * f0 (Hz)
relative to the overall SPL), which was not affected by the location of the microphone,
age or sex of the speakers. ‘Spectral emphasis’ in Traunmüller and Eriksson’s (2000)
study is an acoustic inference of the same laryngeal adjustments as in Gauffin and
Sundberg’s (1989) study: i.e. the more asymmetrical the glottal pulse, the higher the
energy in the midfrequency ranges of the radiated spectrum.
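A minimal sketch of spectral emphasis as described above (overall level minus the level below 1.5 times f0), assuming an ideal low-pass filter applied in the DFT domain and synthetic harmonic signals. Halving the signal amplitude stands in, very crudely, for moving the microphone further away; it ignores room acoustics and any other effects of a real distance change. The function names and signal parameters are hypothetical, not Traunmüller and Eriksson’s (2000) implementation.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum (bins 0..N/2) of a real signal via a direct DFT."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))) ** 2
        for k in range(n // 2 + 1)
    ]


def harmonic_signal(f0, fs, n, tilt, n_harmonics, gain=1.0):
    """Synthetic periodic signal whose k-th harmonic has amplitude gain * k**-tilt."""
    return [
        gain * sum(k ** -tilt * math.cos(2 * math.pi * k * f0 * i / fs)
                   for k in range(1, n_harmonics + 1))
        for i in range(n)
    ]


def level_db(powers):
    """Level in dB (arbitrary reference) of the summed power."""
    return 10.0 * math.log10(sum(powers))


def spectral_emphasis_db(x, f0, fs):
    """Overall level minus the level of x ideally low-pass filtered at 1.5 * f0."""
    powers = power_spectrum(x)
    df = fs / len(x)
    below = [p for k, p in enumerate(powers) if k * df < 1.5 * f0]
    return level_db(powers) - level_db(below)


if __name__ == "__main__":
    f0, fs, n = 200.0, 8000.0, 400  # 50 ms frame holding exactly 10 periods
    near = harmonic_signal(f0, fs, n, tilt=1.0, n_harmonics=19)
    far = harmonic_signal(f0, fs, n, tilt=1.0, n_harmonics=19, gain=0.5)
    print("overall level change:", level_db(power_spectrum(near)) - level_db(power_spectrum(far)))
    print("emphasis change:", spectral_emphasis_db(near, f0, fs) - spectral_emphasis_db(far, f0, fs))
```

Halving the amplitude lowers the overall level by about 6 dB but leaves spectral emphasis unchanged, which is the property that makes it more robust than overall SPL to an uncontrolled gain such as microphone distance.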
Importantly, adjustments in laryngeal configuration are also found to reliably cue
linguistic properties such as accent and stress. In the light of the failure of Fry’s studies
(1955; 1976) to link stress, overall intensity and loudness, Sluijter et al. (1996b;
1996a; 1997) re-addressed the issue of acoustic cues to stress and accent for Dutch and
American English. They argued that the laryngeal level needs to be taken into account in
studying the acoustic correlates of stress or accent.
In their first study Sluijter et al. (1996b) separated three prosodic conditions:
[+stress][-accent], [+stress][+accent], [-stress][-accent], for syntagmatically comparable
word pairs like “CAnon” versus “kaNON” in Dutch. They examined the acoustic
correlates of stress and accent other than pitch. In addition to the ‘traditional’ acoustic
correlates of stress such as duration, overall intensity and vowel quality, they also
measured ‘spectral balance’. Spectral balance is an acoustic inference of asymmetry of the
glottal pulse. It was measured by comparing spectral levels (dB) in four contiguous
frequency bands B1-B4: 0-0.5, 0.5-1.0, 1.0-2.0, and 2.0-4.0 kHz after normalisation for
vowel quality differences. The relative importance of each of the acoustic parameters was
defined by their statistical ability to discriminate between the three prosodic conditions in
a syntagmatic comparison. With regard to the hierarchy of acoustic cues for Dutch,
Sluijter & van Heuven (1996b) found that duration remains the most effective acoustic
correlate of stress. However, spectral balance (intensity levels in B2 to B4) appeared to be
a reliable correlate of stress (irrespective of accent) close in strength to duration. Overall
intensity and vowel quality were the poorest indicators of stress in Dutch.
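The band-level comparison can be sketched as follows, assuming the four band edges given above and synthetic harmonic signals with a hypothetical spectral slope (`tilt`). The sketch omits the normalisation for vowel quality differences, windowing, and the adaptations made in the present study, so it illustrates the idea of comparing band levels rather than reproducing Sluijter and van Heuven’s (1996b) procedure.

```python
import cmath
import math

# Band edges in Hz, following the description of Sluijter & van Heuven (1996b).
BANDS = [("B1", 0.0, 500.0), ("B2", 500.0, 1000.0),
         ("B3", 1000.0, 2000.0), ("B4", 2000.0, 4000.0)]


def power_spectrum(x):
    """One-sided power spectrum (bins 0..N/2) of a real signal via a direct DFT."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_levels_db(x, fs):
    """Energy level (dB, arbitrary reference) in each of the bands B1-B4."""
    powers = power_spectrum(x)
    df = fs / len(x)
    return {
        name: 10.0 * math.log10(sum(p for k, p in enumerate(powers) if lo <= k * df < hi))
        for name, lo, hi in BANDS
    }


def harmonic_signal(f0, fs, n, tilt, n_harmonics):
    """Hypothetical test signal: k-th harmonic has amplitude k**-tilt."""
    return [
        sum(k ** -tilt * math.cos(2 * math.pi * k * f0 * i / fs)
            for k in range(1, n_harmonics + 1))
        for i in range(n)
    ]


if __name__ == "__main__":
    f0, fs, n = 200.0, 8000.0, 400  # harmonics fall exactly on DFT bins
    for tilt in (1.0, 2.0):  # shallower versus steeper source spectrum
        levels = band_levels_db(harmonic_signal(f0, fs, n, tilt, 19), fs)
        print(tilt, {b: round(levels[b] - levels["B1"], 1) for b in levels})
```

The shallower slope (tilt = 1.0) gives higher B2 to B4 levels relative to B1 than the steeper one (tilt = 2.0), which is the kind of difference spectral balance is meant to capture.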
Sluijter et al. (1996a) further established the hierarchy of acoustic cues for Dutch by
inferring laryngeal configuration using more established voice source measures (such as
open quotient, amplitude of volume velocity, closure rate/skewness of the glottal pulse
and glottal leakage) derived from inversely filtered radiated spectrum (for a discussion see
Ní Chasaide & Gobl, 1999). These additional acoustic inferences of vocal effort revealed
the same pattern as the previous study (Sluijter & van Heuven, 1996b), confirming the
reliability of spectral balance as a measure of glottal pulse asymmetry, and confirming the
results for the hierarchy. The analysis of overall intensity indicated that it is not a reliable
acoustic correlate of stress, even though it reliably cued accent2.
With regard to the hierarchy of the acoustic cues for American English, Sluijter et
al. (1996a) found that duration, glottal parameters (high frequency emphasis and glottal
leakage in B1), and vowel quality reliably cue stress in the [+stress][-accent] condition,
while f0 and overall intensity are unreliable cues. Compared to Dutch, American English
stress patterns had somewhat more influence from vowel quality. The relevance of
intensity measurements in midfrequencies of the radiated spectrum as an acoustic cue to
prominence has been confirmed in other studies (Campbell, 1995; Heldner, 2003).
The perceptual relevance of vocal effort initiated at the laryngeal level as a cue to
prominence has been addressed to some degree, but needs more study. Vocal effort could
potentially be a perceptually relevant cue to stress, since perception of the loudness of a
pure tone depends on its frequency and intensity (Fletcher & Munson, 1933), and the
spectral midfrequency ranges, where the energy levels appear most differentiated for
stressed syllables as compared to unstressed ones, lie within the frequency region of 2 – 5
kHz, in which the human ear is sensitive to smaller changes in intensity levels (Robinson
& Dadson, 1956).
The perceptual primacy of spectral tilt in cueing voice quality has been empirically
found by e.g. Gobl & Ní Chasaide (1999a). With regard to prominence, Campbell (1995)
addressed the issue of the perceptual relevance of spectral tilt at the level of the statistical
ability of this parameter to discriminate between prominent/non-prominent syllables by
means of linear discriminant analysis (LDA), based on spontaneous speech production
data. However, the problem with Campbell’s link of prominence to perception is that LDA
statistics do not necessarily reflect real human perception of prominence and stress; at
least, this claim should be empirically substantiated.
2
Like Traunmüller and Eriksson (2000), Sluijter et al. rigidly controlled the distance of the microphone from the
speaker’s mouth; in more naturalistic recordings, where this distance varies, the reliability of overall intensity
may not hold.
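To make ‘statistical ability to discriminate’ concrete, the sketch below scores a single cue in two ways: Fisher’s discriminant ratio and the resubstitution accuracy of a one-cue LDA rule with equal priors. The cue values and function names are hypothetical, and the sketch is neither Campbell’s (1995) analysis nor a model of perception; it shows only the kind of separability that LDA measures, which is why such statistics need not coincide with listeners’ judgements.

```python
import statistics


def fisher_ratio(a, b):
    """Fisher's discriminant ratio of one cue for two classes:
    (mean_a - mean_b)**2 / (var_a + var_b), using sample variances."""
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / (
        statistics.variance(a) + statistics.variance(b))


def lda_1d_accuracy(a, b):
    """Resubstitution accuracy of a one-cue LDA rule with equal priors.

    With equal priors and a pooled (shared) variance, the LDA decision boundary
    for a single cue is the midpoint between the two class means.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    boundary = (mean_a + mean_b) / 2
    a_is_higher = mean_a > mean_b
    correct = sum((x > boundary) == a_is_higher for x in a)
    correct += sum((x > boundary) != a_is_higher for x in b)
    return correct / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical cue values for prominent (a) and non-prominent (b) syllables.
    separated = ([10, 11, 12, 13], [1, 2, 3, 4])
    overlapping = ([5, 6, 7, 8], [4, 5, 6, 7])
    for label, (a, b) in [("separated", separated), ("overlapping", overlapping)]:
        print(label, round(fisher_ratio(a, b), 2), lda_1d_accuracy(a, b))
```

A cue whose class distributions overlap scores lower on both measures; ranking several cues this way yields a production-based ‘hierarchy’ that still needs perceptual validation.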
Sluijter (1997) addressed the issue of the perceptual relevance of the acoustic cues
to stress for Dutch in an experiment with synthesized polysyllabic nonsense stimuli like
“nana”. Syllable duration, energy in appropriate spectral bands, and overall intensity were
separately manipulated to mimic differences in stress. In addition, there was an extra
condition with and without added reverberating noise. The stimuli were
presented to 24 phonetically trained and 22 phonetically naive Dutch listeners. Results
showed that overall intensity was a minor stress cue under all conditions; that when the
perception of stimuli was hampered by variable reverberating noise, vocal effort
implemented as spectral balance was the strongest perceptual cue (stronger than duration);
and that in the condition with a stable noise background spectral balance and duration
were primary cues close in strength.
Contrary to Sluijter’s (1997) finding for stress, Heldner’s (2001) study on spectral
emphasis as a perceptual cue to prominence in Swedish focal accents found no effect of
manipulated ‘spectral emphasis’ on the perceived strength of focal accents. A potential
explanation for Heldner’s negative results could be that he used already-accented words
as a baseline for the manipulations without manipulating pitch. In the utterance contexts containing focal
accents one would expect pitch to be a primary correlate of prominence, so that the
listeners may have solely attended to the pitch information rather than to any changes in
spectral emphasis. If the pitch information had been made unreliable (e.g. monotonous), the
listeners might have been forced to attend to other potential cues to prominence such as
spectral emphasis. However, we agree with Heldner (2001, p.57) that the perceptual
relevance of spectral emphasis in cueing prominence “at the upper end of the prominence
scale” such as in focal accents does remain to be proven.
To conclude this section, recent empirical studies (Campbell, 1995; Sluijter & van
Heuven, 1996a; Sluijter & van Heuven, 1996b; Sluijter et al., 1997; Traunmüller &
Eriksson, 2000; Jessen, 2002; Remijsen, 2002; Heldner, 2003) strongly underline the
importance of the laryngeal level (in addition to pulmonic) in conveying linguistic
information about stress and prominence in speech production and (somewhat less
strongly) in perception. Overall intensity is unreliable, while selective energy in spectral
midfrequencies measured as spectral balance, emphasis or tilt seems to reliably reflect a
laryngeal contribution to vocal effort, stress and prominence.
These empirical studies confirmed earlier (impressionistic) observations of linguists
and phoneticians (among others Jones, 1918; Bloomfield, 1933; Lehiste, 1977) that stress
and prominence are associated with increased vocal effort or force, and provided a good
solution to the puzzle as to why overall intensity is not a reliable cue to vocal effort and
what a more reliable cue is.
In Section 2.1.3 we address the differences between the acoustic correlates of the
accentual systems of Russian and Scottish English.
2.1.2.5 Functional Load
The theoretical concept that further connects the research variables in this study
(such as duration and spectral balance) is the idea of ‘functional load’, i.e. the notion that
the presence of certain phonological contrasts can influence the relative amount of work
done by other contrasts within a phonological system (Beckman, 1986).
For the acoustic correlates of accentual systems the hypothesis was formulated by
Berinstein (1979) upon her finding that in K’ekchi (a Mayan language spoken in
Guatemala) the presence of phonological vowel length contrast interacts with the use of
duration as a cue to stress, since duration did not serve as a perceptual cue to stress for
K’ekchi speakers. Thus, according to Berinstein, the use of duration as a cue to stress was
precluded due to its function as a cue to phonologically contrastive length in K’ekchi. Other
cues (f0 and intensity) played a greater role in conveying stress.
There are at least two problems with Berinstein’s account of functional load. In
Berinstein’s study, the stress condition was treated in Fry’s (1955) fashion: i.e. the
polysyllabic target words with stress were recorded in prominent positions only. Thus,
stress was always confounded with accent, and f0 could be a correlate of accent
rather than of stress (see Ladd, 1996 for a criticism of this approach). Another problem
with Berinstein’s formulation of functional load is that it is stated in absolute terms: the use of
duration to cue stress is precluded in the presence of a phonological vowel length contrast.
Recent studies (Potisuk et al., 1996; Remijsen, 2002; Taff et al., 2004) provide evidence
that, while the notion of functional load seems justified as such, the functional load
hypothesis needs a more relative interpretation.
For example, Taff et al. (2004) found for Aleut (a member of the Eskimo-Aleut family,
with a 3-vowel system and contrastive phonological vowel length) that, unlike
Berinstein’s findings for K’ekchi, stress in Aleut does increase vowel duration, despite the
presence of lexically contrastive vowel length. The increase in duration in Aleut due to
prominence is of a much lesser extent than, for example, in English, i.e. a language where
phonological length contrasts are less frequent and involve additional differences in vowel
quality. Taff et al. conclude (2004) that Aleut uses duration as a weaker acoustic correlate
of stress than English. Therefore, the functional load on a suprasegmental
acoustic correlate of stress is a matter of degree rather than an absolute.
These different interpretations of the scope of functional load with regard to stress
may at least partly result from the differences in methodology: i.e. Berinstein derived her
absolute interpretation from perceptual experiments, while both Potisuk et al. (1996) and
Remijsen (2002) derived their relative interpretations of the functional load based on
statistical regression from their production data; and Taff et al. (2004) from the raw
production data only. Despite these methodological differences and the fact that speech
perception is known to work more categorically than speech production, it seems that the
idea of functional load in its relative interpretation is more useful, since the relative scalar
interpretation includes the case of preclusion, while the absolute interpretation excludes
scalability. The relative interpretation of functional load is also compatible with the stress-accent
hypothesis, according to which the same phonological property can have a
different phonetic implementation across languages and within a language, so that the
phonetic implementation of stress is a question of extent rather than an absolute.
A way to measure the functional load of accentual contrast within a language is by a
syntagmatic comparison of the phonetic properties of stressed and unstressed syllables in
polysyllabic words, like “INcrease” and “inCREASE”. This method has been traditionally
applied over the years for different languages (for example Fry, 1955; Berinstein, 1979;
Beckman, 1986; Sluijter & van Heuven, 1996b; Potisuk et al., 1996; Remijsen, 2002; Taff
et al., 2004). However, since the phonological properties of (non-)stress accent are not necessarily phonetically uniform within a language, a paradigmatic comparison should also be possible, at least in those parts of the phonological system where the work done by the acoustic cues representing the property is expected to differ because of particular structural contrasts (functional load).
For example, for K’ekchi, Berinstein (1979, p.34) found that, given phonological vowel length and hence the unavailability of duration as a cue to stress (in prominent positions), f0 and peak intensity of stressed syllables are more important perceptual and acoustic cues to stress than duration. Vowel peak intensity was also found to play a secondary role in a paradigmatic contrast, distinguishing short and long vowels in words with the same structure and utterance position. In such words, the peak intensity in
short vowels was on average 1 dB higher than that in the long ones; the subjects’ distance from the microphone was fixed. At first sight, one may doubt the significance of a difference in overall intensity as small as that reported by Berinstein. However, more studies point in the same direction, which suggests that there may be a more salient underlying cause of this difference in peak intensity.
Fónagy (1966) found for Hungarian (a language with lexically contrastive vowel
length) that in the same utterance position and prosodic context short vowels have higher
overall intensities than their long counterparts. Since Hungarian vowel length opposition
also involves vowel quality differences (i.e. long vowels have more tense and close
articulations than the short ones), Lehiste (1977, p.121) interpreted this overall intensity
difference in favour of the existence of intrinsic intensities (i.e. due to vowel quality
differences). While we agree that vowel quality could be a confounding factor, we argue
that, in this paradigmatic comparison of prominent syllables, we should also consider the
concomitant effect of another factor, i.e. greater vocal effort exerted by Hungarian
speakers to achieve sufficient prominence for the phonologically (and phonetically) short
vowels in the prominent positions.
As we discussed in the previous section, differences in vocal effort can be measured by looking at intensities in the midfrequencies. We know that, given stringent control of the speaker’s mouth-to-microphone distance (as in Gauffin & Sundberg, 1989; Sluijter & van Heuven, 1996b; Traunmüller & Eriksson, 2000), SPL is directly proportional to spectral balance. It follows that, given stringent control of the recording settings, significant differences in spectral balance may correspond to much less significant differences in overall intensity (such as the average differences of about 2 dB measured by Fónagy and 1 dB in Berinstein’s study). Thus, the seemingly insignificant differences in overall intensity between short and long vowels in Hungarian and K’ekchi may have resulted from more substantial differences in intensity in the midfrequency range, very much as in the studies of the acoustic cues of stress and accent (Campbell, 1995; Sluijter & van Heuven, 1996b; Heldner, 2003).
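The argument can be illustrated with a deliberately simplified, source-only model. We assume, purely for illustration, a harmonic spectrum with a fixed f0 whose harmonic amplitudes fall off at a constant rate in dB per octave, with no vocal-tract filtering; a flatter slope serves as a crude stand-in for greater vocal effort. The function name, f0, band limits and slope values are hypothetical. Because the overall level is dominated by the strongest low harmonics, flattening the slope changes the level in a 2.5 to 4.5 kHz band far more than the overall level.

```python
import numpy as np


def source_levels(slope_db_per_oct, f0=220.0, fmax=5000.0, band=(2500.0, 4500.0)):
    """Return (overall level, mid-band level) in dB relative to H1.

    Toy model: harmonic n has level slope_db_per_oct * log2(n) dB,
    so H1 sits at 0 dB; no vocal-tract resonances are modelled.
    """
    n = np.arange(1, int(fmax // f0) + 1)         # harmonic numbers up to fmax
    freqs = n * f0
    power = 10 ** (slope_db_per_oct * np.log2(n) / 10)
    overall = 10 * np.log10(power.sum())
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    mid = 10 * np.log10(power[in_band].sum())
    return overall, mid


# Steeper (more breathy) tilt versus flatter (less breathy) tilt.
steep_overall, steep_mid = source_levels(-12.0)
flat_overall, flat_mid = source_levels(-6.0)
print(f"overall level change: {flat_overall - steep_overall:.1f} dB")
print(f"mid-band level change: {flat_mid - steep_mid:.1f} dB")
```

In this toy setting the overall level changes by only a couple of dB while the mid-band level changes by over 20 dB, which mirrors the pattern the argument above relies on; real vowels add formant filtering, so the numbers should not be read as predictions.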
Given that the above paradigmatic differences in overall intensity are systematic, it could be argued, from the standpoint of the stress-accent hypothesis, that the presence of phonological length may not only limit the relative extent to which duration is used as a cue to prominence (possibly to the point of complete preclusion), but may also affect the extent to which other, secondary acoustic cues to stress, such as spectral balance, are employed. Vowel quality cannot be affected in this way, since in prominent positions it is also ‘occupied’ by segmental phonological oppositions, while overall intensity has been found to be too ‘elusive’ (Lehiste, 1977) and unreliable (Traunmüller & Eriksson, 2000) to cue vocal effort.
Jessen (2002) showed that in German, spectral tilt (H1-A2 and H1-A3, following the methodology of Hanson, 1997) was an important discriminant of the tense/lax vowel opposition in prominent syllables. He found that in prominent positions German tense vowels have significantly lower intensities in the frequencies around F2 than the lax ones. This finding also means that the traditional phonetic term ‘tense’ is in fact confusing, as it can stand for a more ‘lax’ configuration in laryngeal terms, and vice versa. Jessen supported his acoustic inferences about laryngeal effort with more direct electroglottographic evidence: the intensity differences in the midfrequencies were indeed a consequence of the asymmetrical glottal pulse.
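Measures such as H1-A2 compare the level of the first harmonic with that of the strongest harmonic near a formant. The sketch below is a simplified, uncorrected version of such a measure (the corrections for formant effects described by Hanson (1997) are omitted); the function name and the harmonic levels are hypothetical. A larger value indicates relatively weaker energy around the formant, which is the pattern Jessen reported around F2 for German tense vowels.

```python
def h1_minus_a(harmonic_db, f0, formant_hz):
    """Uncorrected H1-An: level of H1 minus the strongest harmonic within f0 of a formant.

    harmonic_db[k] is the level (dB) of harmonic k + 1, i.e. at frequency (k + 1) * f0.
    """
    near_formant = [
        level
        for n, level in enumerate(harmonic_db, start=1)
        if abs(n * f0 - formant_hz) <= f0
    ]
    if not near_formant:
        raise ValueError("no harmonic within f0 of the formant")
    return harmonic_db[0] - max(near_formant)


# Hypothetical harmonic levels (dB) for f0 = 200 Hz and a formant near 1200 Hz.
levels = [0.0, -5.0, -10.0, -12.0, -20.0, -8.0, -25.0]
print(h1_minus_a(levels, f0=200.0, formant_hz=1200.0))  # 8.0
```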
Jessen interpreted his evidence in favour of syllable-cut prosody (e.g. Trubetskoy, 1939). According to this theory, the vowel [ɪ] in the word “Mitte” (centre) is ‘cut off’ by the following consonant within the same syllable, while in “Miete” (rent) the vowel [i] is simply followed (but not interrupted) by a consonant belonging to the next syllable. However, syllable structure alone provides only a partial picture of this German contrast, since substantial phonetic differences in vowel quality and vowel duration should also be considered.
For example, Stevens (1998, p.297) pointed out that there are more than just
segmental differences to the English tense/lax contrast. There may also be differences in
the laryngeal configuration involved in addition to vowel quality. The more breathy
laryngeal configuration for the tense vowels reduces spectrum amplitudes in
midfrequencies, whereas a less breathy laryngeal configuration for the lax vowels
enhances the amplitude of mid-frequencies. It is thus possible that the differences measured by Jessen were due to a different voice configuration adopted by the German speakers to mark the tense/lax vowel contrast. Moreover, since German tense vowels are roughly twice as long as lax ones (for an overview see Whitworth, 2003), the difference in spectral balance might also be explained by the differences in vowel duration.
There is an important terminological note to make here. It is common in the voice
source variation literature to view different voice qualities as being on a continuum with
‘modal’ voice source configuration being a neutral midpoint (Ní Chasaide & Gobl, 1999).
Small deviations in adductive tension, medial and longitudinal compression of the vocal
folds from this midpoint are usually described as ‘tense’ (with higher values of these parameters) or ‘lax’ (with lower values) (Ladefoged, 1971). Extreme changes in the same parameters, resulting in perceptually different voice modalities, are accordingly described as ‘creaky’ or ‘breathy’ (Ní Chasaide & Gobl, 1999). Stevens’ (1998) remark mentioned above points to a serious terminological problem: the ‘tense/lax’ distinction in supralaryngeal vowel quality means the opposite at the laryngeal level. To avoid terminological confusion, in this study we shall use ‘more breathy’ for a ‘laxer’ laryngeal configuration and ‘less breathy’ for a ‘tenser’ one, and we limit the terms ‘tense’ and ‘lax’ to the segmental opposition only. This also implies that the ‘neutral’ midpoint of voice modality in this study is ‘breathy’ rather than ‘modal’, which is in fact more applicable to the female and child voices examined here (Hanson, 1997; Kent & Read, 2002).
2.1.3 Segmental Differences between Scottish English and Russian
2.1.3.1 Russian vowel system
Russian has the six vowel phonemes shown in Table 2-1, so its system of phonological vowel oppositions is relatively small.
Table 2-1 Russian vowel phonemes (Bondarko, 1998)
i

u



However, the phonetic vowel space (Table 2-2) is quite crowded, reflecting the
contextual variability of vowels. The variability is mainly a result of the presence of two
features in the Russian phonological system:
Table 2-2 Russian vowel allophones (adapted from Bondarko, 1998; Kuznetsov, 1997)
i


e
u




æ
a

(1) Consonant palatalisation influences the following vowel allophone. The vowel []
appears after palatalised consonants, and in complementary distribution with the main
allophone of phoneme /u/, as in [luk] ‘onion’ versus [lk] ‘hatch’. Similarly, the sound
[e] in stressed syllables is in complementary distribution with the main allophone of
phoneme //, and [æ] with the main allophone of phoneme //.
(2) Like English, Russian features vowel reduction. However, the patterns of reduction
are more complicated in Russian. The sounds [],[],[],[] in Table 2-2 typically appear in
unstressed syllables. Additionally, [] appears in both stressed and unstressed syllables.
The vowel reduction patterns depend jointly on (a) the position of the unstressed syllable relative to the stressed one; (b) the position of the unstressed syllable relative to the word onset; (c) the underlying phoneme of the unstressed vowel; and (d) whether or not the unstressed syllable has an onset. Depending on these four factors, there are three vowel reduction patterns in Russian. For example, // in:
“ostorozhnogo” (‘from the careful one’) [.st.r.n.v]
is reduced to [] in the first pretonic syllable or in a word-initial unstressed syllable without an onset; // is reduced to [] in any other pretonic or post-tonic syllable. The phonemic contrasts between // and // (as well as between /i/ and //) are neutralised in unstressed syllables.
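The positional part of this reduction rule can be written as a small function. This is a simplified sketch under stated assumptions: it models only underlying /o/ after non-palatalised consonants, ignores the other factors listed above, and assumes the standard transcriptions [ɐ] (first pretonic or word-initial onsetless syllable) and [ə] (elsewhere) for the two reduced outcomes. The function name is hypothetical.

```python
def reduce_o(has_onset, stressed):
    """Realisation of underlying /o/ in each syllable of a word.

    has_onset[i] says whether syllable i has an onset consonant;
    stressed is the index of the stressed syllable.
    Only non-palatalised contexts are modelled.
    """
    vowels = []
    for i, onset in enumerate(has_onset):
        if i == stressed:
            vowels.append("o")   # no reduction under stress
        elif i == stressed - 1:
            vowels.append("ɐ")   # first pretonic syllable
        elif i == 0 and not onset:
            vowels.append("ɐ")   # word-initial syllable without an onset
        else:
            vowels.append("ə")   # any other pretonic or post-tonic syllable
    return vowels


# "ostorozhnogo": o.sto.ROZH.no.go, stress on the third syllable (index 2)
print(reduce_o([False, True, True, True, True], stressed=2))
```

For this word the sketch yields ['ɐ', 'ɐ', 'o', 'ə', 'ə']: the onsetless first syllable and the first pretonic syllable share the weaker reduction, and the two post-tonic syllables receive the stronger one.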
2.1.3.2 Scottish English vowel system
With regard to the phonology involved in lexical contrasts, the SSE vowel system is
more crowded than the Russian one, with thirteen vowel phonemes. Ten of them are monophthongs (see Table 2-3), of which // (schwa) appears only in unstressed syllables, and /  / appear only in closed syllables. In addition, there are three diphthongs: /ai a i/.
Table 2-3 Scottish English vowel monophthongs (adapted from Wells, 1982)
i


e


o
/
a
As we have mentioned in Section 2.1.1, the system of SSE monophthongs differs from that of Southern Standard British English (SSBE) and has fewer oppositions, since it retains a Scots phonology. The differences in the monophthongs between the two standard varieties are shown in Table 2-4.
Table 2-4 Comparison of monophthong phonemes in SSE and SSBE (adapted from Matthews, 2002)
Word | SSE | SSBE
foot – goose | // | //, /u/
palm – bath – trap | /a/ | //, //, /a/
lot – thought | // | //, //
It is important to note these cross-varietal differences in the context of the study,
since bilingual and monolingual children in Edinburgh are exposed to different English
varieties through mass media, nurseries and community (see more in Section 3.2.1).
2.1.3.3 Segmental Differences in the Focus of Investigation
Scottish English features a tense/lax contrast between /i/ and //, while such a
contrast is absent in Russian (it has only the phoneme /i/). In the bilingual context, this
constitutes a systemic difference that is a potential ‘docking site’ for language interaction.
We will discuss the bilingual acquisition studies dealing with this particular contrast and
its apparent difficulty in acquisition in Section 2.2.
The SSE tense/lax contrast differs from that of SSBE. In SSBE, the phonological opposition usually involves both a difference in vowel quality (tense/lax) and a phonetic difference in vowel duration (with a lax/tense duration ratio of 0.7 in the same consonantal context). In SSE, however, the tense/lax opposition does not involve an extensive phonetic difference in duration, and it occurs only in the vowels /i /. Both “ship” and “sheep” are short in SSE (Aitken, 1981).
In the bilingual Russian/Scottish English situation, according to the ‘Cross-language
Cue Competition Hypothesis’ (CCCH) (Döpke, 1998; Döpke, 2000) discussed in Section
1.3.2.3, the situation involving the absence of the tense/lax contrast (Russian) should have stronger ‘cue strength’ than the situation involving its presence (SSE). Thus, if we extrapolate the CCCH to the level of speech, for this type of systemic difference we should observe unidirectional language interaction from Russian into Scottish English, but not the other way around. The alternative ‘Dominant Language Hypothesis’ (DLH) (Petersen, 1988) would predict language interaction from the more dominant into the less dominant language, irrespective of their structure.
The second difference relevant to this study concerns the phonetic quality of the close rounded phonemes: a more central [] in SSE and a back [u] in MSR. Crosslinguistically, this constitutes a realisational phonetic difference along the front-back dimension.
In Döpke’s (1998; 2000) terms, structures involved in such a realisational difference
should have a similar ‘cue strength’. CCCH would not predict any language interaction
here. On the other hand, DLH (Petersen, 1988) would predict a unidirectional language
interaction from the more dominant language into the less dominant one.
Let us now consider the crosslinguistic differences between these vowels from an acoustic point of view. Figure 2-2 shows an acoustic representation of the vowels /i/, /u/, // and // in SSE, SSBE and MSR, adapted from four acoustic studies (Bondarko, 1998; Deterding, 1997; Kuznetsov, 1997; Walker, 1992). The measurements are averages for adult female speakers. These vowels mark the extremes of the vowel space.
[Plot: F1 (Hz) against F2 (Hz); series: SSE, MSR, SSBE]
Figure 2-2 Acoustic representation of the SSE, SSBE and MSR cardinal vowel space (adapted from Bondarko, 1998; Deterding, 1997; Kuznetsov, 1997; Walker, 1992).
The vowel [i] seems to be similar in the three languages. The phonetic realisation of
// in the SSE speakers (Walker, 1992) is lower in comparison with the SSBE speakers
(Deterding, 1997). The main allophones of MSR /u/ and SSE // are very different
acoustically, with the Russian vowel being back. Another striking point is that the Russian close back rounded vowel /u/ and the more central SSBE /u/ are transcribed with the same phonetic symbol in the literature. Although several studies (Bauer, 1985; Deterding, 1997; Hawkins & Midgley, 2004; see also Section 3.6.4.2) have reported progressive fronting of RP /u/ over the years, the phonological representation still reflects the phoneme as it was some 40 years ago.
2.1.4 Prosodic Differences between Scottish English and Russian
Despite substantial typological differences in grammar and lexicon, Scottish English
and Russian have quite similar word-prosodic systems.
Both languages have variable syllabic location of lexical stress, so that in both languages stress can create a syntagmatic opposition between polysyllabic words and change their meaning or grammatical properties, for example:
Russian “ZAmok” (a castle) versus “zaMOK” (a lock)
Scottish English “an INcrease” versus “to inCREASE”
Both Scottish English and Russian are ‘stress accent’ languages (Beckman, 1986).
They employ stress (primarily encoded by duration) for syntagmatic contrasts between
words. Pitch conveys the pragmatic meaning of intonation rather than acting as a correlate of stress, although it is aligned with stressed syllables; pitch thus operates on a separate dimension of prominence. Table 2-5 summarises the main similarities and differences between the Russian and SSE word-prosodic systems.
Table 2-5 Broad differences and similarities between Russian and Scottish English word-prosodic systems.
Property | Russian | Scottish English
Vowel reduction in unstressed syllables? | Yes | Yes
Acoustic correlates of stress-accent | 1. Duration; 2. Vowel quality; 3. Spectral balance? | 1. Duration; 2. Spectral balance; 3. Vowel quality; 4. Intensity
Suprasegmental paradigmatic contrasts available? | No | Yes (SVLR)
Intonation | Both languages: pitch movement is not fixed at lexical level; pitch expresses variable pragmatic meaning; changes in pitch are associated with stressed syllables.
Both languages feature vowel reduction. However, the rules of vowel reduction are
more complicated in Russian. We discussed their implementation in Section 2.1.3.1. As a
result, in Russian vowel quality seems to play a relatively more important perceptual role
in distinguishing word stress (Bondarko, 1998; Svetozarova, 1998) compared to English.
In Russian, it is second in strength as an acoustic correlate of word stress (Table 2-5). We
are not aware of any studies addressing the role of spectral balance as a correlate of word stress in Russian. However, Bondarko (1998, p.55) supports the traditional viewpoint of
Russian phoneticians that in Russian the pronunciation of vowels is characterised by a
rather slack articulation in prominent positions. This observation may indicate a
secondary role of the acoustic parameter ‘spectral balance’ (which implies relative
slackness/tenseness of the glottal configuration) as compared to the primary role of vowel
duration and vowel quality.
We have found no studies on the order of importance of acoustic correlates of the
word-prosodic system in Scottish English. However, other varieties, such as General American, have been studied in detail (Sluijter & van Heuven, 1996a; Beckman, 1986;
Fry, 1955). As we discussed in Section 2.1.2.3, American English word stress is encoded
by duration, spectral balance, vowel quality and intensity, with duration and spectral
balance being close in strength (Sluijter & van Heuven, 1996a). The word-stress in
American English is defined at lexical and morphological levels in a way similar to SSE.
Both varieties use largely the same lexicon and grammar, as well as largely the same rules
of syllabification and vowel reduction. We assume, then, that the order of importance of
the acoustic cues to word stress in the SSE word-prosodic system should be similar to that
in American English.
As we discussed in Section 2.1.2.5, the presence of suprasegmental contrasts in a language (such as phonological tone or length) may affect the relative strength of the acoustic correlates of prominence and word stress in a word-prosodic system. Scottish English features the ‘Scottish Vowel Length Rule’ (SVLR) (Aitken, 1981; Scobbie et al., 1999a; Scobbie et al., 1999b). It involves a highly systematic distribution of vowel duration conditioned by the voicing and manner of articulation of the postvocalic consonant. The SVLR applies to the vowels /i/, // and /ai/, and their duration is conditioned by either of two factors (Scobbie, 2002):
• the right consonantal context of the vowel: voiced fricatives and /r/ condition long duration of the vowel, while all other consonants condition short duration;
• the morphological context following the vowel: vowels in word-final open syllables are long, as in “brew”, and they remain long when followed by the morpheme “-ed”, as in “brewed”.
Because morphological conditioning can override consonantal conditioning, SSE has a quasi-phonemic length contrast in a limited number of words, such as “brood” /brud/ (short) and “brewed” /brud/ (long) (Scobbie, 2002).
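The two conditioning factors can be summarised as a rule-based sketch. The segment inventory is an assumption for illustration: the voiced fricatives /v ð z ʒ/ plus /r/ act as 'long' triggers, and ʉ stands for the SSE close central rounded vowel. The function name is hypothetical, and the sketch ignores gradient phonetic detail.

```python
SVLR_VOWELS = {"i", "ʉ", "ai"}
LONG_TRIGGERS = {"v", "ð", "z", "ʒ", "r"}  # voiced fricatives and /r/


def svlr_length(vowel, following=None, morpheme_final=False):
    """Predicted SVLR length ('long' or 'short'), or None if the rule does not apply.

    morpheme_final=True marks a vowel at the end of a morpheme, as in "brew"
    and also "brewed" (brew + -ed); this overrides the consonantal context.
    """
    if vowel not in SVLR_VOWELS:
        return None
    if morpheme_final:
        return "long"
    if following in LONG_TRIGGERS:
        return "long"
    return "short"


print(svlr_length("ʉ", "d"))                          # "brood": short
print(svlr_length("ʉ", "d", morpheme_final=True))     # "brewed": long
print(svlr_length("i", "z"), svlr_length("i", "p"))   # "cheese", "sheep"
```

Treating the morpheme boundary as a flag that is checked first makes the quasi-phonemic "brood"/"brewed" pair fall out directly: the same following /d/ yields different lengths depending on morphological structure.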
Postvocalic consonantal conditioning of vowel duration has been claimed to be a phonetic universal (Chen, 1970): vowels are lengthened automatically before voiced consonants, and a number of mainly physiological explanations have been proposed for this effect (for a review see Lisker, 1974). The SSE SVLR pattern contradicts this automaticity argument, since its application depends strongly on the segments involved. Nor does the argument hold for Russian, since the language has final devoicing. As Keating (1984) put it, in Russian the “duration pattern was apparently determined by underlying values of the voicing features” (p. 123), rather than by any physical voicing of the consonants. We agree with Keating that such vowel duration patterns are not universally predictable, but that instead “each language must specify its own phonetic facts by rule” (Keating, 1984, p. 123). This also means that vowel quantity and quality are interdependent in a language-specific way, and that both monolingual and bilingual speakers have to acquire these patterns. For example, if a child produces the Russian close back rounded [u] instead of the central [] of Scottish English, it is worth considering whether the negligible postvocalic conditioning of vowel duration in Russian also replaces the SVLR.
Table 2-6 sketches the differences in the consonantal conditioning between Scottish
English and other varieties (such as Southern Standard British English or General
American). The table shows the examples of consonantal conditioning for the vowel //,
and the relative length triggered by the following consonants. There are substantial cross-varietal differences in the application of postvocalic conditioning: in SSBE, vowel
duration is mainly conditioned by the voicing of the following consonant, while in SSE
both voicing and manner of articulation (voiced fricatives) trigger longer vowel duration.
Unlike in SSBE, the SVLR conditions short duration in vowels followed by voiced stops
like in “brood” or “seed”. Similar differences as in Table 2-6 apply to the vowel /i/.
Figure 2-3 compares the postvocalic conditioning of vowel duration (ms) for the vowels // and /i/ in SSE with that in General American. In SSE, the SVLR applies to the tense vowel, but not to the same extent to the lax one (Agutter, 1988; McKenna, 1988). The SSE lax vowel has been described as ‘invariably short’ (Aitken, 1981). In General American, there is no such differential implementation: voicing of the following consonant seems to trigger vowel lengthening in all contexts, in both tense and lax vowels (House, 1961).
As we discussed in Section 2.1.2.4, the availability of certain paradigmatic contrasts
in a phonological system (such as length) can affect the system of syntagmatic contrasts
of accentual systems (cf. functional load in Section 2.1.2.5). Since the SVLR in Scottish
English is so different from the voicing effect in other English varieties, it can be expected
that the phonetic detail of the phonological accentual system in SSE might somewhat
differ from either American English or SSBE.
Table 2-6 Broad characterisations, for one representative vowel [], of vowel duration conditioning effects by various contexts in SSE and SSBE (adapted from Scobbie et al., 1999a)
Dialect of English | Duration | _n | _s | _z | _d | _t | _# | _#d
SSE | Longer | | | Bruise | | | Brew | brewed
SSE | Shorter | spoon | Bruce | | brood | brute | |
SSBE (or Gen. Am.) | Longer | spoon | | Bruise | brood | | Brew | brewed
SSBE (or Gen. Am.) | Shorter | | Bruce | | | brute | |
(Columns _n to _t: consonantal context; columns _# and _#d: morphological context.)
[Plot: vowel duration (ms) by right consonantal context (stop -voice, stop +voice, fric -voice, fric +voice); series: SSE tense, SSE lax, GA tense, GA lax]
Figure 2-3 Acoustic differences in extrinsic vowel duration conditioning (raw duration in ms) for close
vowels between SSE and General American. The solid lines represent tense close vowels, while the broken
lines represent the lax ones (adapted from House, 1961; Agutter, 1988; McKenna, 1988)
We can assume with more certainty that the Scottish English word-prosodic system differs from the Russian one, since in Russian, vowel quality is the second most important acoustic cue to stress, and the extent of postvocalic conditioning of vowel duration is small compared to the SSE system (Chen, 1970; Gordeeva et al., 2003). For example, as shown in Figure 2-4, for /i/ the increase in vowel duration conditioned by a change of the following consonant from /t/ to /z/ is 118% in Scottish English and only 18% in Russian (Gordeeva et al., 2003).
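The percentages express lengthening relative to the duration before /t/, so an increase of 118% means that the vowel before /z/ is more than twice as long as before /t/. A minimal sketch of the calculation follows; the durations are hypothetical values chosen only to reproduce the reported percentages, not measured data, and the function name is likewise hypothetical.

```python
def percent_increase(before_t_ms, before_z_ms):
    """Lengthening before /z/ as a percentage of the duration before /t/."""
    return (before_z_ms - before_t_ms) / before_t_ms * 100


# Hypothetical durations (ms), chosen only to illustrate the arithmetic.
sse = percent_increase(100.0, 218.0)
russian = percent_increase(150.0, 177.0)
print(f"SSE: {sse:.0f}%, Russian: {russian:.0f}%")
```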
[Plot: duration (ms) of /i/ by position in utterance (pos1, pos2, pos3); series: Russian fric +v, Russian stop -v, Scottish fric +v, Scottish stop -v]
Figure 2-4 Mean duration (ms) of /i/ in SSE and Russian prominent CVC words as a function of the following consonant, per position in the utterance (pos1 = medial, pos2 = final in an utterance with more than one pitch accent, pos3 = final in an utterance with one pitch accent).
Gordeeva et al. (2003) also addressed crosslinguistic differences in vocal effort
across prominent words in Russian and Scottish English. The study asked whether the short and long SVLR variants of /i/ in prominent words like “sheep” and “cheese” differ in vocal effort (as determined by spectral balance), and whether the SSE pattern differs from the Russian one, given phonologically similar word structure and utterance position. The materials were the same as in this study. The vowel /i/ in the CVC words (e.g. “sheep” versus “cheese”) was followed by either phonologically voiceless stops or voiced fricatives (see Section 3.4.1 for the implications of phonetic differences). The words were compared in multiword utterances in similar prosodic contexts. The Russian words and utterances had a similar structure. The subjects were middle-class female speakers (five Scottish and four Russian), aged between 25 and 45.
Spectral balance was measured in a steady-state portion of /i/ in four fixed
frequency bands around F1 to F4 for each token of /i/. The methodology was addressed in
detail in Gordeeva et al. (2003). The results, shown in Figure 2-5, revealed that in the SSE midfrequency bands (2.5 to 4.5 kHz), the short /i/ had significantly higher RMS power than the long /i/. In Russian, the contextual difference in the spectral balance of /i/ was not significant, but the RMS power means were close to those of the SSE long vowel.
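The band measure plotted in Figure 2-5 (level in each formant band relative to overall intensity) can be approximated as follows. This is a sketch under our own assumptions, not the procedure of Gordeeva et al. (2003): a Hann-windowed FFT of one steady-state frame, with the total power of the frame as the reference and band half-widths taken from the Figure 2-5 caption (F1 ± 150 Hz, F2 to F4 ± 300 Hz). The function name and the synthetic test frame are hypothetical.

```python
import numpy as np


def band_levels_db(frame, fs, formants, halfwidths=(150.0, 300.0, 300.0, 300.0)):
    """Level (dB) of each formant band relative to the total power of the frame.

    formants: mean F1..F4 (Hz); each band is formant +/- half-width,
    following the B1-B4 definitions in the Figure 2-5 caption.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total = power.sum()
    levels = []
    for centre, half in zip(formants, halfwidths):
        in_band = (freqs >= centre - half) & (freqs <= centre + half)
        # Small floor avoids log10(0) for an empty or silent band.
        levels.append(10 * np.log10(power[in_band].sum() / total + 1e-12))
    return levels


# Synthetic frame: a strong 300 Hz component and a weak 2800 Hz component.
fs = 16000
t = np.arange(1600) / fs
frame = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 2800 * t)
print([round(level, 1) for level in band_levels_db(frame, fs, [300, 2800, 3500, 4500])])
```

Because every band is expressed relative to the same frame total, the values are independent of absolute recording level, which is why a measure of this kind is less sensitive to microphone distance than raw intensity.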
Gordeeva et al. (2003) showed that spectral balance is a relatively more important acoustic correlate in the SSE word-prosodic system than in the Russian one: in Scottish English, the application of the SVLR to the vowel /i/ differentially affects the spectral balance of prominent short and long vowels, and this difference results from the greater vocal effort that speakers adopt in short vowels in order to make them sufficiently prominent in the utterance. This context-dependent enhancement of spectral balance exemplifies the functional load of the SVLR on the Scottish word-prosodic system. It supports Beckman’s (1986) dynamic view of an accentual system, in which the phonological categories of accentual systems are not necessarily phonetically uniform across languages or within a language.
[Plot: ratio of spectral level to overall intensity (dB) in bands B1 to B4 at pos 1, pos 2 and pos 3; series: -v plos Scottish, +v fric Scottish, -v plos Russian, +v fric Russian]
Figure 2-5 Mean spectral level (dB) in 4 frequency bands in three utterance positions in Scottish and in
Russian. B1 = mean F1± 150 (Hz), B2 =mean F2 ± 300 (Hz), B3 = mean F3 ± 300 (Hz), B4 = mean F4 ±
300 (Hz).
In Russian, spectral balance played a less important role than in SSE: it did not differ between the postvocalic conditioning contexts, and it was generally as low as that of the SSE long vowel, indicating a rather slack glottal source configuration in the CVC words. This result was not surprising given the small extent of extrinsic vowel duration conditioning in Russian and the relatively great importance of vowel quality as a cue to stress.
2.2 Language Interaction in Bilingual Acquisition of Vowel Quality
2.2.1 Monolingual Acquisition
2.2.1.1 Non-Scottish English and Scottish English
Studies on vowel development in young American English-speaking monolingual
children (Stoel-Gammon & Herrington, 1990) suggest that vowel monophthongs can be
grouped into three categories based on their accuracy rates and the order of acquisition. At
the same time, vowel substitution patterns in earliest normal and disordered acquisition
highlight the more ‘difficult’ phonological categories that children have to acquire. For
American English, Stoel-Gammon and Herrington (1990) report (based on auditory
description) that:
(1) The corner vowels /i,,u/, midback /o/, and central // are acquired relatively early.
These vowels seem to cause the least difficulties for both normally developing and
children with phonological disorders. However, some children with phonological
disorders substitute the tense vowel [i] with the lax counterpart [], while [u] can be
substituted by [o].
(2) The group /æ,,,/ is acquired somewhat later than the first group of vowels.
(3) The front vowels /e,,/ are acquired the latest among these vowels. The target [] is
sometimes substituted with [i] in phonologically disordered child speech.
However, other recent studies on the acquisition of the tense/lax vowel contrast contradict the above findings to some extent. Kehoe and Stoel-Gammon (2001) found, based on auditory transcriptions, that while normally developing English-speaking children (n=14, aged 1;3 to 2;0) systematically substituted [i] for [] and vice versa (at least the youngest children), the number of such realisations was limited and the majority of their productions had adult-like vowel quality. In addition, Otomo and Stoel-Gammon (1992) reported other substitution patterns of [] with []: lowering of // was the most common pattern in their dataset.
In American English, the tense/lax contrast in vowel quality also involves a contrast in phonological length (long/short phonetic duration), with a lax-to-tense duration ratio (in words like “bit” versus “beat”) of .71 (House, 1961). With regard to the interplay of these factors in acquisition, Stoel-Gammon and colleagues (Stoel-Gammon et al., 1995; Buder & Stoel-Gammon, 2002) found that children acquiring such contrasts in American English initially acquire only the differentiation in vowel quality; the durational differences are added later.
To summarise, based on previous research, we can anticipate (at least for children
acquiring American English) that the tense/lax contrast should already be established by the age of 3;0 (i.e. the starting age in our study). Given the substantial cross-varietal differences between American and Scottish English, these findings should be extended to children acquiring SSE only with caution. However,
American English monolingual data show that the tense/lax opposition poses a certain
degree of difficulty in child speech production, while the tense counterparts of the
opposition /i/ and /u/ seem to be relatively easy to acquire.
As opposed to American English, the SSE close rounded vowel // is central (or
even front), rather than back (Wells, 1982; Walker, 1992; Scobbie et al., 1999b), and
unlike American English or RP it does not involve a tense/lax opposition. Contrary to the
acquisition patterns for American English, Matthews (2002) found a very broad range of
‘non-adult-like’ realisations of the SSE vowel //. Matthews (2002) is the only longitudinal
study devoted to the acquisition of vowels in Scottish English children. It deserves close
attention, because it also suggests the range of segmental variation that can be considered
‘native-like’ in the speech of our bilingual subjects.
Matthews’ (2002) study focused on the acquisition of segmental aspects of SSE vowels in Scottish English children (n=7, aged 18 to 36 months) growing up in middle-class families in Edinburgh. Matthews’ analysis concentrated on the qualitative aspects of vowel mastery. He discussed the developmental trends in the children’s vowel production in terms of ‘adult-like’ versus ‘non-adult-like’ realisations. Vowels whose production accuracy (in reaching adult-like targets) in individual sessions lay more than one standard deviation above the mean were labelled ‘easy’, while those more than one standard deviation below it were labelled ‘difficult’ for the children. His conclusions were mainly based on narrow
phonetic transcriptions. Matthews found a broad range of segmental variability in the
vowel production of the SSE children. It ranged from a substantial variation in quality
(such as nasalisation, rhoticity or rounding) to approximant-like and consonantal
realisations.
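The ‘easy’/‘difficult’ criterion can be made concrete with a short sketch. It assumes that ‘above/below one standard deviation’ means more than one sample standard deviation above or below the mean accuracy across all vowels in a session; this is our reading, not a procedural detail reported here, and the accuracy values below are invented for illustration only.

```python
from statistics import mean, stdev


def classify_vowels(accuracy: dict[str, float]) -> dict[str, str]:
    """Label each vowel 'easy', 'difficult' or 'intermediate' for one session.

    Assumption (not taken from Matthews, 2002): the threshold is one sample
    standard deviation above or below the mean accuracy across all vowels.
    """
    scores = list(accuracy.values())
    m, sd = mean(scores), stdev(scores)
    labels = {}
    for vowel, acc in accuracy.items():
        if acc > m + sd:
            labels[vowel] = "easy"
        elif acc < m - sd:
            labels[vowel] = "difficult"
        else:
            labels[vowel] = "intermediate"
    return labels


# Hypothetical proportions of adult-like tokens in one session (invented values).
session = {"i": 0.98, "e": 0.80, "a": 0.78, "o": 0.76, "ɔ": 0.74, "ɛ": 0.72, "ʉ": 0.50}
labels = classify_vowels(session)
print(labels["i"], labels["e"], labels["ʉ"])  # easy intermediate difficult
```

Because the thresholds depend on the spread of accuracies within each session, the same raw accuracy could be labelled differently in different sessions.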
Regarding the close central rounded /ʉ/, he found that this vowel is of the ‘difficult’ type,
accounting for a wide range of ‘non-adult-like’ realisations in child speech. The ‘difficulty’
in acquiring SSE /ʉ/ was surprising, since the vowel belongs to the traditional set of
‘corner vowels’ /a, i, u/, which are often considered in the literature to be acquired first, according
to the “law of irreversible solidarity” formulated by Jakobson (1941). Thus,
Matthews argues that “the primacy of acquisition of the corner vowels [...] does not
necessarily hold true for the case of” SSE (Matthews, 2002, p. 268). On the one hand, we
agree with Matthews’ criticism that Jakobson’s “law of irreversible solidarity” has proven
to be an overgeneralisation of some tendencies in child speech (see e.g. discussion in
Menn & Stoel-Gammon, 1995).
On the other hand, Matthews’ criticism is not necessarily empirically substantiated
by his dataset. Jakobson’s analysis of the corner vowels handles the acquisition of vowel
contrasts in terms of the order of ‘emergence’ (Jakobson, 1941), rather than in terms of the
relative degree of adult-likeness, as measured by Matthews. Thus, it is also possible that
SSE /ʉ/ emerges early while retaining a broad range of non-adult-like realisations for
some time.
The substantial variability in children’s realisations of the adult SSE target /ʉ/,
reported by Matthews, is an important finding. In Table 2-7, we summarise the most
common substitutions for /ʉ/ in his dataset for the latest longitudinal sessions (age 29 to
36 months). From this list, we excluded idiosyncratic realisations occurring only once, as well as
12 rhoticised tokens produced by a single child.
Table 2-7 Most frequent ‘non-adult-like’ substitutes for SSE target /ʉ/ in child speech (adapted from
Matthews, 2002).
Substitute   Nr tokens   %
o            13          23
             11          20
             9           16
             7           13
u            5           9
ø            3           5
             3           5
             3           5
             2           4
As shown in the table, 20% of the non-adult-like realisations of [ʉ] can be attributed to
changes in lip rounding (i.e. unrounding), and 29% to changes in both rounding and
quality (lowering and/or backing or fronting). Similarly
to Stoel-Gammon and Herrington’s (1990) finding for phonologically disordered speech,
the most frequent ‘non-adult-like’ realisation of /ʉ/ in the SSE child data is [o]. The overall
proportion of non-adult-like realisations varied from child to child: for example, Esther
(2;9) made only 28% errors, while Ben (2;8) made 75%.
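The percentages in Table 2-7 can be reproduced from the token counts, assuming they are calculated over the 56 tokens listed in the table (i.e. after the exclusions described above). The sketch below omits the IPA symbols and keeps the counts in table order.

```python
import math

# Token counts of the substitutes in Table 2-7, in table order
# (the IPA symbols are omitted here).
counts = [13, 11, 9, 7, 5, 3, 3, 3, 2]


def percentages(counts: list[int]) -> list[int]:
    """Express each count as a whole-number percentage of the total.

    Halves are rounded up (12.5 -> 13), unlike Python's round(),
    which rounds halves to the nearest even integer.
    """
    total = sum(counts)
    return [math.floor(100 * c / total + 0.5) for c in counts]


total = sum(counts)
pcts = percentages(counts)
print(total, pcts)  # 56 [23, 20, 16, 13, 9, 5, 5, 5, 4]
```

The reproduced values match the table and sum to 100, which is consistent with the percentages being computed over the listed tokens only.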
Interestingly, a number of the realisations also involve the lax vowel [ʊ], which
does not feature as a phoneme in the phonological system of adult SSE speakers. Of
course, this lax realisation of /ʉ/ and its broad phonetic range could simply
be due to speech immaturity. However, the relatively late acquisition
of the adult-like SSE quality of /ʉ/ could also be attributed to cross-varietal influences on
SSE from other British English varieties. Most children in Edinburgh are exposed to non-SSE
varieties indirectly through TV, and, in addition, one quarter of middle-class
children grow up in families with at least one parent from a non-SSE British English
background (Scobbie et al., 1999a). Children attending local nurseries are regularly
exposed to different varieties of British English from either staff members or peers (see
also Section 3.2.1 for discussion). SSE /ʉ/ and SSBE /u/ and /ʊ/ are tightly clustered
in the vowel space (see Figure 2-2), and yet they are cross-varietally distinct. Children’s
extensive exposure to such tightly clustered phonetic variants of SSE and non-SSE
varieties, such as [ʉ] versus [u] and [ʊ] in the same lexical items, may explain the
difficulty they have in acquiring this particular SSE vowel, as well as its non-corner location.
Acquisition of the vowels /i/ and // in Matthews’ dataset was not treated in terms of
a special opposition, since he focused on the ranges and processes for all SSE vowels.
However, from his data we can derive that like in American English the SSE tense vowel
/i/ is acquired early, since it belongs to the ‘easy’ category (i.e. all children produced most
targets [i] with the adult-like accuracy). Among the rare cases of substitutions, in the age
group of 29 to 36 months, the most common realisation for [i] (56% of all substitutions)
was the lax vowel [], indicating that the tense/lax opposition may also cause some
difficulty in monolingual SSE acquisition.
On the other hand, the lax // seemed to belong to the ‘difficult’ category, since the
number of ‘non-adult-like’ realisations was relatively high in all sessions. In the older age
54
group (29 to 36 months), the substitutions (see Table 2-8) mainly involved lowering in
vowel quality rather than raising (in at least 80% of cases), and there was only one case of
substitution by [i]. This is not surprising, since // in the adult ‘Scots continuum’
substantially varies alongside the degree of aperture (increasing F1) ranging from [] to a
more open [] quality (Wells, 1982, v.2 p.404). Walker’s (1992) data show that this is
also true for middle class SSE speakers from Edinburgh. The relative difficulty of
acquisition of SSE // (as compared to /i/) parallels the acquisition pattern reported by
Otomo and Stoel-Gammon (1992) for American English.
Table 2-8 Most frequent ‘non-adult-like’ substitutes for SSE target [ɪ] (adapted from Matthews, 2002)
Substitute   Nr tokens   %
             24          46
             12          24
e            4           8
             4           8
o            2           2
y            1           2
             1           2
ou           2           2
i            1           2
To summarise, Matthews’ (2002) data on the acquisition of SSE vowels confirm
findings for American English on the relative difficulty of acquiring the lax vowel /ɪ/ as
compared to /i/. In SSE, /i/ is occasionally replaced by the lax [ɪ] in the
course of monolingual acquisition (even at the age of 29 to 36 months), indicating a
certain tension in acquiring this specific contrast. Substitutions of /ɪ/, by contrast,
mainly involve lowering. Unlike the corresponding American English vowel, SSE
/ʉ/ is not acquired straightforwardly: at the age of 18 to 36 months, SSE child speech
production exhibits a broad range of variation, with a substantial proportion of ‘non-adult-like’
realisations, indicating relative difficulty in acquiring this vowel.
2.2.1.2 Russian
As discussed in Section 2.1.3, there is a substantial difference between the number
of segmental oppositions involved in Russian and SSE phonology. The Russian phonemic
vowel inventory is substantially expanded at the phonetic level due to the presence of the
palatalised – non-palatalised opposition in consonants, and a more complicated (compared
to English) system of vowel reduction. This section discusses previous findings on these
phonological issues in monolingual acquisition of the Russian sound system.
Shvachkin’s (1948/1948) investigation of the development of phonemic perception
in Russian gave important impetus to the study of other languages in terms of the order of
acquisition of phonological contrasts. The patterns of phonological acquisition suggested
in his study echo, to some extent, the view of the emergence of contrasts in terms of
‘the laws of irreversible solidarity’ in Jakobson’s Kindersprache (Jakobson, 1941).
Shvachkin’s longitudinal analysis of child speech (n=18, aged from 0;10 to 2;0) suggested
a common pattern of emergence of phonological oppositions. The ‘discrimination of
vowels’ (1948, p.123) is the first stage of phonological development. At this stage, // is
initially discriminated from all non-// phonemes; subsequently there emerges an
opposition of /i/ – /u/, // – //, /i/ – //, /u/ – //. The first stage is followed by the stage of
‘discrimination of presence of consonants’, and the further stages involve discrimination
between different consonants.
Even a brief comparison reveals differences between the universal
orders postulated by Jakobson and Shvachkin. In Jakobson’s analysis (Jakobson, 1941),
the low vowel is followed by the emergence of the first consonantal
opposition, between nasal and oral stops, before further vowel contrasts develop, whereas in
Shvachkin’s account the discrimination of vowels precedes the discrimination of consonants.
Nevertheless, both analyses suggest that the vowels /i/ and /u/ emerge early in the process
of Russian child speech acquisition.
In line with findings for English (Menn & Stoel-Gammon, 1995), a recent study of
phonological development in Russian conducted by Zharkova (2002, p.48) (n=7, age 1;3
to 3;2) also puts a question mark against the universal order of acquisition suggested by
Jakobson. Despite some common tendencies (e.g. all children first acquired //, and
// was last), Zharkova reports quite variable orders of emergence of phonological
oppositions in her subjects’ speech (e.g. // – /u/ – // or // – // – /i/).
Table 2-9 summarises the frequencies of vowel phonemes in the speech of five
subjects from Zharkova’s study. It suggests that the order of acquisition is probabilistic
rather than deterministic, as the ranking of the third to fifth most frequent
phonemes differs across the five children. Despite these individual differences, we can assume
that in monolingual Russian development all six vowel phonemes (in Table 2-9) will
have emerged and become established to a certain degree by the age of 3;0.
It is established that palatalisation is perceptually more salient for Russian speakers
than the voicing/voicelessness distinction or even place of articulation changes (labial
versus dental) (Kavitskaya, 2002). The phonemic opposition of palatalised – non-palatalised consonants in Russian profoundly influences the quality of the following
vowels, and this consonantal palatalisation is acquired early in phonological development
(Jakobson, 1941; Shvachkin, 1948; Tsejtlin, 2002). It is also known that one of the
common patterns of Russian child speech production in the first three years of life
involves an extensive substitution of non-palatalised consonants by the palatalised
counterparts (Jakobson, 1941; Zharkova, 2002).
Table 2-9 Frequency of vowel phonemes in five subjects (the higher the row, the more frequent the sound)
(adapted from Zharkova, 2002).
Subject Nr (age)
1 (1;3)
2 (1;9)
3 (2;0)

i
u



i

u


-

i

u


4 (2;0)
5 (3;0)

i

u



i
u



This process of palatalisation ought to affect the quality of the following vowel: e.g.
the back vowel [u] should become more fronted after palatalised consonants. Such fronting
due to speech immaturity in Russian could be confused with language interaction from
Scottish English, where /ʉ/ is phonetically central (or front). In such cases, palatalisation
of the preceding consonant might provide an additional cue for identifying the language.
Regarding this influence of palatalisation on vowel quality in child speech,
Zharkova provided a limited analysis based on formant measurements for //, and
a more comprehensive analysis of the phonemic substitutions produced by her
child subjects. The formant analysis showed that, as children get older, vowels following
palatalised consonants and those following non-palatalised ones become more
differentiated: i.e. the vowel representing phoneme //
following non-palatalised consonants becomes more open and fronted. Her analysis of
phonemic substitutions confirmed that, in the age groups concerned, the substitution of
non-palatalised consonants by palatalised ones is a frequent feature of Russian child
speech.
To summarise, on the basis of previous research we can anticipate that by the age of 3;0
Russian monolingual children will have acquired the system of phonological oppositions involving
the vowels in focus in this study. They should have no difficulty producing an adult-like [i] in
stressed syllables. For [u], phonetically palatalised realisations of preceding
non-palatalised consonants may affect vowel quality, i.e. make the vowel more
fronted. Thus, in child speech the Russian back /u/ may become less different in formant
structure from the Scottish vowel [ʉ].
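To illustrate what ‘less different in formant structure’ could mean quantitatively, the sketch below measures the distance between two vowels in an F1/F2 plane after converting frequencies to the Bark scale. The choice of measure (Traunmüller’s Bark approximation plus a Euclidean F1/F2 distance) is our assumption, and the formant values are invented, not measurements from any study cited here.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960 + f_hz) - 0.53


def bark_distance(v1: tuple[float, float], v2: tuple[float, float]) -> float:
    """Euclidean distance between two (F1, F2) points after conversion to Bark."""
    return math.hypot(hz_to_bark(v1[0]) - hz_to_bark(v2[0]),
                      hz_to_bark(v1[1]) - hz_to_bark(v2[1]))


# Invented (F1, F2) values in Hz, for illustration only.
sse_u_central = (330, 1600)      # a central SSE-like [ʉ]
russian_u_plain = (330, 800)     # [u] after a non-palatalised consonant
russian_u_fronted = (330, 1200)  # [u] after a phonetically palatalised consonant

d_plain = bark_distance(russian_u_plain, sse_u_central)
d_fronted = bark_distance(russian_u_fronted, sse_u_central)
print(round(d_plain, 2), round(d_fronted, 2))
```

With F1 held constant, raising F2 (fronting) shrinks the distance to the central [ʉ], which is the sense in which a fronted Russian /u/ becomes harder to tell apart from the Scottish vowel.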
The variability in the order of segmental acquisition in both languages (MSR and
SSE) discussed in this section is in line with Vihman’s (2002) view of the emergence of
phonology in child language. According to this view, the child’s initial phonological system is not
constructed directly in terms of segments, phonological contrasts
or distinctive features, but is rather based on explicit lexical ‘item learning’ in the second
year of life. This initial speech production base is derived from implicit learning and
motor practice (‘Vocal Motor Schemes’) (Vihman, 2002) in the first year of life, and from
prosodic and segmental patterns available in the input language. The
accumulation of vocabulary in the second year of life provides a further base for inducing
regularities from the available set, and thus helps to form characteristic production
patterns (‘word templates’). Learned items are selected and, if they do not fit, restructured
into these ‘word templates’. Since ‘item learning’ is to some degree incidental (in that it
also depends on the input), this view predicts both variation in individual paths of
development and similarities in the paths of development within a specific language.
2.2.2 Bilingual Acquisition
The question of language interaction in bilingual acquisition has only recently
started to receive some attention from researchers, but studies on language interaction in
phonological development are scant. There are also no studies on early bilingual
phonological development of vowel systems dealing with either Russian or Scottish
English. As discussed in Chapter 1, most studies of phonological development
in early bilinguals have addressed the question of ‘one versus two systems’ (Schnitzer &
Krasinski, 1994; Schnitzer & Krasinski, 1996; Johnson & Lancaster, 1998; Deuchar &
Quay, 2000; Keshavarz & Ingram, 2002), and they have focussed on speech sounds in terms of
their inventories rather than the structural contrasts between them.
Table 2-10 summarises the set-up of five studies (Schnitzer & Krasinski, 1994;
Schnitzer & Krasinski, 1996; Johnson & Lancaster, 1998; Deuchar & Quay, 2000;
Keshavarz & Ingram, 2002) that addressed early bilingual phonological acquisition of
vowel systems. These studies employ similar methodologies in that they are based on
children exposed to the two languages from birth; all analyses are single case studies;
phonological analyses are drawn from diary records or auditory phonetic analysis. All
these studies found that the two vowel systems were differentiated in bilingual child
speech. However, the question of ‘one or two systems from the start’ seems to yield
conflicting results and interpretations.
For example, regarding the question of ‘one versus two’ phonological systems, the
two case studies of Schnitzer and Krasinski (1994; 1996) yield different
outcomes. In these two consecutive studies, Schnitzer and Krasinski addressed segmental
aspects of phonological development of their two children: Fernando (age 1;1 to 3;9) and
Zevio (age 1;6 to 4;6). The children’s father was a native speaker of American English,
and their mother spoke Puerto Rican Spanish. The children grew up in Puerto Rico. The
authors found no evidence in either case for an initial ‘single system’ in the acquisition of
vowels, while for consonants one child seemed to have an initial single system with later
differentiation while the other subject differentiated between the two consonantal systems
from the outset of speech production. The authors explain this discrepancy by individual
differences in ‘avoidance’ strategies employed by the children: i.e. Zevio avoided target
words that he could not pronounce (thus ultimately he produced more target-like forms),
while Fernando attempted them at earlier more immature stages. These two papers present
little evidence for language interaction.
Similarly, Deuchar & Quay (2000) considered the question of one versus two
systems and the role of input in bilingual language acquisition. Their case study is an
exception in that it simultaneously addressed several domains of language acquisition: i.e.
lexicon, syntax and phonology. The subject M (aged 0;10 to 2;3) was raised following the
‘one parent – one language’ approach until around age 1;0. Her mother spoke British
English and her father Cuban Spanish. The family lived in England. After the age of 1;0,
Spanish became the home language while English was spoken outside the home. Up to
the age of 2;0, the girl was exposed to more English than Spanish.
Table 2-10 A summary of five studies that dealt with bilingual phonological acquisition of vowel
inventories.
S&K,94
S&K,96
1
N
J&L1998
1
K&I,2002
1
Age
1;1 to 3;9
1;6 to 4;6
1;2 to 1;11 0;8 to 1;8
Mother speaks
Puerto Rican
Spanish
Puerto Rican
Spanish
Father speaks
American
English
American
English
Bokmål
Norwegian Farsi
Farsi &
Canadian American
English
English
Puerto Rico
Residence
Upbringing situation
Puerto Rico
Canada
UK -- Iran
two languages from birth
1
0;10 to 2;3
British English
(SSBE) & Cuban
Spanish
Cuban Spanish
UK
diary records, audio recordings, auditory phonetic analysis
Method
Questions
single initial system versus two systems from the outset?
Sizes of compared
vowel inventories
Vowel Systems
differentiated?
Language
Interaction
Observed?
One or two
phonological
systems initially?
S&K,94
S&K,96
J&L,98
D&Q,2000
K&I,2002
D&Q2000
1
small versus large
2 large
small versus large
Yes
Yes
yes
Yes
Yes
Yes
two (vowels)
single
(consonants)
No
two (vowels)
two
(consonants)
unclear
Yes
No
neither is
true
Two
Neither
Schnitzer & Krasinski, 1994
Schnitzer & Krasinski, 1996
Johnson & Lancaster, 1998
Deuchar & Quay, 2000
Keshavarz & Ingram, 2002
Deuchar & Quay’s (2000) method involved analysis of diary records and audio-video
recordings. Among other things, the research included an analysis of the broad phonetic
inventory, based on auditory transcription of words that entered the girl’s lexicon. Spanish
has a five-vowel system, whereas RP has twelve monophthongs and eight diphthongs.
Deuchar & Quay (2000) reported that by the age of 1;10 the vowels produced by the girl
reflected those of the input languages and followed the patterns found in monolingual
acquisition. The study explicitly reports neither ranges of variability in vowel production
nor any language interaction. This is interesting, since the differences
between Spanish and RP in terms of contrasts and oppositions in their vowel systems are
similar to those between Russian and Scottish English.
As Vihman (2002) argues, the question of ‘one versus two systems from the start’, put
in such a mutually exclusive way, may simply be the wrong one to ask, since other plausible
questions can be asked as well. Vihman proposes a hypothesis of bilingual phonological
acquisition in which a bilingual child in the pre-linguistic period implicitly develops
considerable distributional knowledge about the languages in the environment, while
explicit lexical learning at the start of speech production allows the child to induce and build
phonological knowledge about the two languages in contact. Thus, there may in fact be no
question of ‘one or two phonological systems’ at all when a child starts to speak; rather, the
systems can be constructed as phonological rules are gradually induced while the lexicon
grows. Indeed, Deuchar & Quay (2000), who tried to address the question of ‘one or two
phonological systems’, came to a similar conclusion: rather than having ‘one or
two systems from the start’, their subject’s bilingual acquisition could be seen as “a
progression from a lack of system in either languages” to the establishment of a vowel
(and VOT) system in English and Spanish. In Deuchar & Quay’s view (2000, p.34), the
vowels acquired by their bilingual subject “reflect those in the input languages”, and
looking at phonetic inventories does not reveal much “about the nature of the system in
terms of contrasts and oppositions”.
Since induced phonological rules may be incidental, in the sense that they depend at least on
the lexicon acquired, the input a child is exposed to (within and across
languages) is very important. Thus, in looking for the sources of language interaction in the
phonological system, we should consider at least the input conditions (such as
different patterns of language exposure) in addition to the structural properties of the languages in
contact.
Guion’s (2003) study of adult bilinguals is relevant to this discussion because it
provides evidence that systemic crosslinguistic differences, such as the relative crowdedness
of the vowel space (the absence or presence of certain vowel contrasts in the languages in contact),
can cause difficulties in the process of bilingual acquisition. Guion (2003) argues that the
success of bilingual acquisition depends on the age of onset of language learning. In this
study, Quichua-Spanish adult bilinguals (n=20) were compared with Spanish monolinguals.
The Quichua-Spanish bilinguals had all acquired Quichua from birth but differed in the age of
onset of Spanish acquisition (from birth to 38;9). Quichua has only three
(rather lax) monophthongs, while Ecuadorian Spanish has five tense vowels.
In the acoustic vowel space, the Quichua vowels are less dispersed than the Spanish ones.
Guion performed formant analysis of vowels with subsequent normalisation for vocal
tract length differences. There were three groups of ‘tightly packed’ vowels with a
potential structural conflict in the vowel space:
(1) Quichua // versus Spanish /i/ and /e/;
(2) Quichua // versus Spanish /u/ and /o/;
(3) Quichua /a/ versus Spanish /a/ (F1 of the Spanish /a/ is phonetically lower than
the Quichua one).
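The normalisation procedure Guion used is not specified here. As one common option, the sketch below applies Lobanov (z-score) normalisation within a single speaker; this is a swapped-in illustration, not necessarily Guion’s method. A simple dispersion index (the mean distance of the vowel-category means from their centroid) then shows what ‘less dispersed’ can mean quantitatively; this index is likewise our assumption, and the formant values are invented.

```python
import math
from statistics import mean, pstdev


def lobanov(tokens):
    """Lobanov (z-score) normalisation of one speaker's vowel tokens.

    tokens: list of (vowel_label, F1_Hz, F2_Hz) from a single speaker.
    Each formant is centred on the speaker's mean and divided by the
    speaker's (population) standard deviation for that formant.
    """
    f1 = [t[1] for t in tokens]
    f2 = [t[2] for t in tokens]
    m1, s1 = mean(f1), pstdev(f1)
    m2, s2 = mean(f2), pstdev(f2)
    return [(v, (a - m1) / s1, (b - m2) / s2) for v, a, b in tokens]


def dispersion(tokens):
    """Mean Euclidean distance of the vowel-category means from their centroid."""
    by_vowel = {}
    for v, x, y in tokens:
        by_vowel.setdefault(v, []).append((x, y))
    cat_means = [(mean(p[0] for p in pts), mean(p[1] for p in pts))
                 for pts in by_vowel.values()]
    cx = mean(m[0] for m in cat_means)
    cy = mean(m[1] for m in cat_means)
    return mean(math.hypot(x - cx, y - cy) for x, y in cat_means)


# Invented F1/F2 values (Hz) for one hypothetical speaker.
raw = [("i", 300, 2300), ("i", 320, 2250), ("a", 750, 1300),
       ("a", 780, 1250), ("u", 320, 800), ("u", 340, 850)]
norm = lobanov(raw)
print(round(dispersion(norm), 2))
```

Normalising within each speaker removes much of the vocal-tract-length difference between speakers, so that dispersion values computed for different speakers (or languages) become more comparable.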
The results indicated that age of onset of acquisition is an important factor in
approaching native-like speech production in the two languages. All the simultaneous and
most of the early bilinguals (onset between 5;0 and 7;0) distinguished between Spanish
and Quichua vowels in speech production. The late bilinguals transferred the lax quality of their native
Quichua vowels into the tense Spanish system. Moreover, the simultaneous
bilinguals were able to acquire ‘more tightly packed’ vowels, while the early and late
bilinguals were able to acquire new vowels but were less successful in “partitioning the
vowel space in the same fine-grained way” (Guion, 2003, p.121) as simultaneous
bilinguals. Since such systemic differences are difficult for adult bilinguals to acquire,
they may also cause difficulties in the course of early bilingual acquisition, before the two
systems develop into their adult form.
Keshavarz & Ingram (2002) addressed the question of ‘one versus two systems’, but
they also reported some variation in the vowel productions of their bilingual subject.
Some phonetic variants showed signs of language interaction. The subject, Arsham (n=1,
age 0;8 to 1;10), was brought up according to the ‘one parent – one language’ principle
(the father spoke American English, while the mother was a Farsi speaker). The amount
of input in the two languages changed as the family moved: from 8 to 14
months Arsham received more input in Farsi, and from 15 to 24 months more input in English.
Farsi has a six-vowel system, while American English has thirteen vowel
monophthongs in addition to three diphthongs.
Keshavarz & Ingram annotated the audio recordings with broad phonetic
transcriptions. The results showed that Arsham’s development of syllable structure was
consistent with the syllabic structure of the target languages. The majority of English
words were monosyllabic, and the majority of Farsi words were polysyllabic. Stress
patterns in both languages were respected: i.e. Farsi has fixed stress on the ultimate
syllable, while English has a variable word stress location. Vowel and consonant
inventories of the child were mainly language-specific. Thus, Keshavarz & Ingram (2002)
concluded that Arsham developed two separate phonologies. However, the authors also
reported (2002, p.265) that the child produced a small number of vowel realisations that at least
suggested the possibility of transfer from English into Farsi: i.e. Arsham produced []
for /u/, [] for /o/ and [] for // in some target Farsi words. Keshavarz & Ingram (2002,
p.265) suggested that this transfer could be viewed as “a sign of shifting dominance” in
Arsham’s language acquisition process.
Interestingly, the direction of transfer in Arsham’s case contradicts the
unidirectional ‘markedness’ hypothesis proposed by Müller (1998) for simultaneous
bilingual acquisition (see Section 1.3.2.3.3 for a discussion of these issues). The transfer is
directed from the more marked English system (/u/ and //, and // and //) into the less marked
six-vowel Farsi system, rather than the other way around, as the hypothesis predicts
(Müller, 1998). Similarly, the direction of transfer in Arsham’s data contradicts the Cross-Language
Cue Competition Hypothesis (Döpke, 1998; Döpke, 2000), since elements of
the more ambiguous English vowel system were introduced into a less ambiguous Farsi
system, and not the other way around.
Further, Kehoe’s (2002) study directly dealt with the sources of language interaction
between different vowel systems of young simultaneous bilinguals. Since the segmental
issues in her study also concerned vowel duration, we shall discuss it in Section 2.3.2.
To conclude, there seems to be a consensus across the majority of studies devoted to
bilingual phonological acquisition of vowel systems that the two systems are acquired
language-specifically by early simultaneous bilinguals. However, handling acquisition of
vowels in terms of pure inventories (rather than a system of meaningful contrasts,
distributions and ranges of variation) may not reveal possible language interaction effects.
It is also not clear from the scant reports of language interaction what determines its
direction: the structure of languages in contact or environmental factors (such as the
amount of input in the two languages) or both.
2.3 Language Interaction in Bilingual Acquisition of Vowel Duration
2.3.1 Monolingual Acquisition
Acquiring a phonological system involves acquiring both
segmental properties and their language-specific timing. A number of
languages (e.g. Scottish English, Hungarian, Swedish or Finnish) feature contrasts such as
postvocalic consonantal conditioning of vowel duration or phonological vowel length,
while others (e.g. Russian, Spanish, Polish, Arabic) do so to a lesser extent, if at all.
Studies of monolingual acquisition suggest that in languages involving such paradigmatic
vowel duration conditioning (intrinsic or extrinsic) the durational contrasts are acquired
relatively early.
Stoel-Gammon et al. (1995) and Buder & Stoel-Gammon (2002) addressed the issue
of acquisition of vowel duration by Swedish and American English children (age 30
months, n=18). Both languages feature intrinsic vowel duration conditioning for the vowels /i/
and /ɪ/ in words like “beat” and “bit”. In English, the two vowels differ in vowel quality
and duration, with a lax-to-tense vowel duration ratio of .71 (House, 1961), while in
Swedish the difference is substantially larger, with a lax-to-tense ratio of .65. In addition,
there are crosslinguistic differences in the implementation of extrinsic conditioning due to
the voicing of the following consonant (in words like “beat” and “bead”): i.e. the
voiceless-to-voiced ratio is .51 in American English, while in Swedish the extent of such
conditioning is negligible.
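As a worked illustration of these ratios, the sketch below divides one mean duration by another. The durations are invented and chosen only to reproduce the cited values: a 200 ms tense vowel with a 142 ms lax counterpart gives the English ratio of .71, whereas a Swedish-like ratio of .65 would correspond to a 130 ms lax vowel, i.e. a larger durational difference.

```python
def duration_ratio(numerator_ms: float, denominator_ms: float) -> float:
    """Ratio of two mean vowel durations in milliseconds.

    lax-to-tense:        mean lax duration / mean tense duration (intrinsic)
    voiceless-to-voiced: mean duration before a voiceless consonant /
                         mean duration before a voiced one (extrinsic)
    A value below 1 means the first vowel is shorter; the lower the
    value, the larger the durational difference.
    """
    if denominator_ms <= 0:
        raise ValueError("denominator duration must be positive")
    return numerator_ms / denominator_ms


# Invented mean durations (ms), chosen only to reproduce the cited ratios.
lax_to_tense = duration_ratio(142, 200)         # e.g. "bit" vs "beat"
voiceless_to_voiced = duration_ratio(102, 200)  # e.g. "beat" vs "bead"
print(round(lax_to_tense, 2), round(voiceless_to_voiced, 2))  # 0.71 0.51
```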
An important finding in these studies (Stoel-Gammon et al., 1995; Buder & Stoel-Gammon,
2002) was that intrinsic and extrinsic vowel duration conditioning seemed to
follow different paths of acquisition in the two languages. In American English, where the
adult model features the tense/lax contrast for some vowels (but not all) and
extrinsic conditioning is substantial, the 30-month-old children had acquired the extrinsic
vowel duration conditioning pattern and the vowel quality difference between tense and lax
vowels, but not the intrinsic difference in duration between them. In contrast, the Swedish children
had acquired the intrinsic vowel duration difference, but not the vowel quality distinction. The Swedish
pattern of acquisition was in line with the importance of intrinsic vowel duration
conditioning in Swedish, where the length contrast is present for all vowels.
The above results for American English are also supported by Stoel-Gammon &
Buder (1999). The child averages (n=20; age 2;0) for extrinsic and intrinsic patterns in
their study conformed to the adult models. Stoel-Gammon & Buder also reported that
both patterns varied considerably across subjects: in 35% of the children, the duration of
lax vowels exceeded that of tense ones (unlike in the adult model in Figure 2-3).
Empirical data on the acquisition of SVLR in Scottish English are rather limited.
Hewlett et al. (1999) studied extrinsic vowel duration patterns in Edinburgh-born children
(n=7; age 6;0 to 9;0). The children had different parental backgrounds: two children
had two SSE-speaking parents, two had one SSE-speaking and one non-SSE-speaking parent, and
the remaining children had two non-SSE-speaking parents from different English
backgrounds. Given the cross-varietal differences in the implementation of
extrinsic vowel duration conditioning (Figure 2-3 and Table 2-5), it could be expected that children
with different parental backgrounds (SSE or non-SSE) would have different extrinsic
vowel duration conditioning patterns. The study concentrated on the differences for the
vowels /i/ and /ʉ/ in CVC words.
The results of Hewlett et al. (1999) showed that all the children had mastered an
extrinsic vowel duration conditioning pattern, and that the parents’ dialectal background
largely determined which pattern it was. Children with one or two SSE-speaking
parents acquired the Scottish pattern, whereas children with two non-SSE English-speaking
parents acquired a pattern similar to the SSBE pattern for the tense vowels
shown in Figure 2-3. Moreover, two subjects had different extrinsic patterns for the
vowels /i/ and /ʉ/: these children mastered the SVLR pattern for /i/, but the non-SSE
pattern for /u/ and lax /ʊ/ (SSE /ʉ/ corresponds to two vowels in SSBE: tense /u/ as in
“food” and lax /ʊ/ as in “foot”). This suggests that additional vowel quality contrasts on
top of durational differences may play a mediating role in the acquisition of vowel
duration.
Matthews (2002) carried out a limited acoustic analysis of vowel duration for three
(out of the seven) children in his study. He subdivided the tokens with syllable nucleus /i/
into ‘long’ and ‘short’ categories. Individual results of the acoustic analyses per child are
presented in Figure 2-6. The ‘long’ category included /i/ followed by voiced fricatives,
while ‘short’ included /i/ followed by voiceless stops.
[Figure 2-6: mean duration (ms) of /i/ in the ‘long’ and ‘short’ contexts for children R (2;6), E (2;8) and B (2;6)]
Figure 2-6 Mean duration (ms) for SSE vowel /i/ as a function of the right consonantal context for three
speakers in Matthews (2002).
Figure 2-6 suggests that two children (R and B) had acquired the SVLR distinction:
the differences between the means go in the expected SVLR direction, with the ‘short’
category shorter than the ‘long’ one, although these differences were not statistically
significant. There are substantial individual differences between the subjects, which might
be due to the small number of tokens in this sample (between three and five per
category).
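The two questions raised by Figure 2-6, whether a difference goes in the SVLR direction and whether it is statistically reliable, can be separated as in the sketch below. The statistical test Matthews used is not described here; Welch’s t statistic is our choice for illustration, and the token durations are invented.

```python
import math
from statistics import mean, variance


def svlr_check(long_ms: list[float], short_ms: list[float]):
    """Compare 'long' and 'short' /i/ tokens for one child.

    Returns the two means, whether the difference goes in the SVLR
    direction (long > short), and Welch's t statistic for the difference.
    """
    m_long, m_short = mean(long_ms), mean(short_ms)
    # Welch's standard error uses each group's own sample variance.
    se = math.sqrt(variance(long_ms) / len(long_ms)
                   + variance(short_ms) / len(short_ms))
    t = (m_long - m_short) / se
    return m_long, m_short, m_long > m_short, t


# Invented durations (ms): four 'long' and three 'short' tokens.
m_long, m_short, svlr_direction, t = svlr_check([310, 250, 420, 280], [240, 300, 200])
print(m_long, round(m_short, 1), svlr_direction, round(t, 2))
```

Here the difference of almost 70 ms goes in the SVLR direction, yet t is only about 1.45, well below the two-tailed 5% critical value of roughly 2.57 for five degrees of freedom. With three to five tokens per category, this is exactly the situation described above.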
Matthews’ data suggest that Scottish children may acquire some SVLR-like pattern by the age of 2;6 to 2;8. However, we cannot take this for granted. Matthews’ data contained no /i/ followed by voiced stops, and this is precisely the context that separates the two patterns: the SVLR predicts short /i/ before voiced stops, whereas the voicing effect of non-Scottish varieties predicts long /i/. We therefore do not know whether the children acquired a Scottish or a non-Scottish extrinsic vowel duration pattern. Nor, given the results of Hewlett et al. (1999), can we assume that the bilingual children in our study will acquire the SVLR pattern, since they have two non-SSE-speaking (Russian) parents and are exposed to all the English varieties spoken in Edinburgh, including the SSE majority (see Section 3.2.2 for further discussion).
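The argument above turns on which right-hand contexts were sampled. As a minimal sketch, assuming that a child's mean /i/ durations (ms) are available per following context, and using a hypothetical threshold `LONG_RATIO = 1.2` for "clearly longer" (taken neither from Matthews, 2002, nor from Hewlett et al., 1999), the function below shows why voiced stops are needed to tell the SVLR from the voicing effect:

```python
from typing import Optional

# Hypothetical threshold (not from the studies cited): a context counts as
# "long" if its mean /i/ duration is at least LONG_RATIO times the mean
# duration before voiceless stops.
LONG_RATIO = 1.2


def duration_pattern(voiceless_stop_ms: float,
                     voiced_fricative_ms: float,
                     voiced_stop_ms: Optional[float] = None) -> str:
    """Classify mean /i/ durations (ms) by following consonantal context."""
    if voiced_fricative_ms < LONG_RATIO * voiceless_stop_ms:
        return "no lengthening"
    if voiced_stop_ms is None:
        # Both the SVLR and the voicing effect predict long /i/ before voiced
        # fricatives and short /i/ before voiceless stops, so these two
        # contexts alone cannot separate the patterns.
        return "undecidable"
    if voiced_stop_ms >= LONG_RATIO * voiceless_stop_ms:
        # Voiced stops pattern with voiced fricatives: voicing effect.
        return "voicing effect"
    # Voiced stops pattern with voiceless stops: SVLR.
    return "SVLR"
```

With invented values, `duration_pattern(150, 250)` returns `"undecidable"`, which is the situation of a data set like Matthews' with only voiced fricatives and voiceless stops; adding a voiced-stop mean of 160 ms yields `"SVLR"`, while 230 ms yields `"voicing effect"`.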
The results for American English, however, support the tentative SSE evidence (Matthews, 2002) that postvocalic conditioning of vowel duration can be acquired by the age of 2;0. Thus, given that intrinsic and postvocalic vowel duration conditioning are of similar extent in the General American and SSE phonological systems (even though they differ in phonetic detail), we can expect that SSE monolingual children:
(1) should generally have no difficulty in distinguishing between the vowel qualities of tense and lax vowels (/i/ and /ɪ/);
(2) will have acquired the SVLR patterns for /i/ and /ʉ/, and will have a duration pattern for the lax vowel /ɪ/ that is differentiated from that of the tense /i/;
(3) will show individual SVLR patterns that differ somewhat from the adult SSE model, with a broad range of variation in their realisation.
2.3.2 Bilingual Acquisition
Only a few studies have addressed the acquisition of contrasts such as intrinsic and extrinsic vowel duration conditioning in early bilingual acquisition.
Kehoe (2002) addressed the question of language interaction between young bilinguals’ phonological systems. She followed up Paradis & Genesee’s (1996) proposal for the bilingual acquisition of syntax that language interaction (if any) may take the form of acceleration, delay or transfer. She examined language interaction between the German and Spanish vowel systems in German-Spanish bilingual children (n=3, aged 1;0 to 3;0). Her study concentrated on the influence of the Spanish 5-vowel system (with no intrinsic vowel duration conditioning) on the German 14-vowel system (with such conditioning) in the speech of bilingual children.
The bilingual children lived in Germany in families with German fathers and
Spanish-speaking mothers. The families mainly followed the ‘one parent – one language’
approach. Control groups included age-matched German (n=3) and Spanish (n=3)
monolinguals. The methodology involved acoustic analysis of vowel duration and
auditory transcriptions of the target vowels in both languages. The results of her study for
the German monosyllabic words are presented in Figure 2-7.
[Figure 2-7: bar chart of duration (ms) + 1 SD for short and long vowels, monolingual (ML) versus bilingual (BL), by subject: S, N, J, B, T, M]
Figure 2-7 Individual means of the differences in intrinsic vowel duration in German (short and long vowels) in the speech production of bilingual German-Spanish (broken bars) and monolingual German (solid bars) children.
At the age of 2;3 to 2;6 the German monolingual children (B, T and M in Figure
2-7) produced significant differences between short and long vowels, even though the
difference was not as substantial as in the adult model. The bilingual children (S, N and J)
did acquire the difference between short and long vowels in the right direction, but the
extent of the difference was much smaller than that of the monolingual children, and was
not statistically significant.
For the Spanish vowels, Kehoe found no systematic differences in the acquisition
pattern between bilinguals and monolinguals. The bilingual children produced marginally more non-Spanish vowels (6%) than the Spanish monolingual children (3%). However, Kehoe does not interpret this as evidence for transfer from German, since most non-Spanish vowels fell within the non-adult ranges of variation of the monolingual children, and were mostly non-German vowels.
The results further showed that bilingual children: (1) had a delayed acquisition of
the German vowel length distinction compared to the German monolingual children; (2)
had no problems acquiring the Spanish 5-vowel system.
As in Müller’s (1998) study of bilingual syntactic acquisition, Kehoe invoked the concept of ‘markedness’ to explain the ‘delayed’ acquisition of German vowel length by the German-Spanish bilingual children. She argued that the marked German vowel system (featuring vowel length and more vowel contrasts) was more difficult to acquire than the unmarked 5-vowel system of Spanish. However, in arguing that the language interaction takes the form of delay, and that it is due to the systemic differences between German and Spanish, Kehoe (2002) did not provide evidence that the children in her study eventually acquired German intrinsic vowel duration conditioning in a manner similar to German monolingual children. In fact, the bilingual patterns in Kehoe’s study also seem to be influenced by the undifferentiated Spanish system (i.e. to undergo transfer).
We disagree with Kehoe’s treatment of the notions ‘transfer’ and ‘delay’, and of their relation, inherited from Paradis & Genesee’s (1996) study (see Section 1.2.2). Delay and transfer can be treated as distinct effects of language interaction only if delay (a developmental notion) is independent of transfer (a static process). For example, two phonological systems, such as the durational features of the Spanish and German of the bilinguals in Kehoe’s (2002) study, can influence each other because the non-base LB is not completely inhibited, or because Spanish durational structures are stored in the German patterns. Compared with monolinguals, such an effect may look like a delay simply because the transfer eventually ceases. The relations between the notions of ‘transfer’, ‘acceleration’ and ‘delay’ in the taxonomy of language interaction effects therefore need clarification.
Whitworth’s (2003) study addressed the issue of acquisition of intrinsic and
extrinsic vowel duration conditioning in early and late German-English bilinguals.
German and English have similar phonological systems, but differ in the phonetic detail
of the implementation. Intrinsic vowel duration conditioning plays a greater role in
German, while extrinsic conditioning plays a greater role in English. The early
simultaneous bilinguals (n=6) in her study were aged 5;0 to 13;2 and lived in West Yorkshire (UK) in families following the ‘one parent – one language’ principle. Extrinsic and intrinsic vowel duration conditioning were measured acoustically and expressed as ratios.
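To make the ratio measures concrete, here is a minimal sketch. It assumes that each ratio is simply the mean duration of one category divided by the mean duration of the other (lax over tense for intrinsic conditioning, voiceless over voiced context for extrinsic conditioning); Whitworth's exact procedure may differ, and the token durations below are invented:

```python
from collections.abc import Sequence
from statistics import mean


def duration_ratio(numerator_ms: Sequence[float],
                   denominator_ms: Sequence[float]) -> float:
    """Ratio of mean vowel durations (assumed definition): e.g. lax/tense
    for intrinsic, or voiceless/voiced context for extrinsic conditioning."""
    if not numerator_ms or not denominator_ms:
        raise ValueError("both token lists must be non-empty")
    return mean(numerator_ms) / mean(denominator_ms)


# Hypothetical token durations (ms), for illustration only.
lax_to_tense = duration_ratio([100, 120], [200, 240])         # 0.5
voiceless_to_voiced = duration_ratio([150, 170], [200, 200])  # 0.8
```

A ratio below 1 means that the numerator category is shorter; the closer the ratio is to 1, the weaker the durational effect. Comparing one child's German and English ratios is the kind of comparison at issue in the discussion that follows.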
There are some problems with interpreting Whitworth’s results on vowel duration. The first is that in a substantial part of the study she averaged the results of a cross-section of six early bilinguals as one group, even though they differed too much in age (5;0 to 13;2) to be treated this way. Since the children differed in age and environment, they naturally produced very different results. For example, Whitworth (2003, p.151) reports that, on average, the bilingual children produced lax-to-tense ratios for German and English that were not language-specific (the German ratio being greater than the English one), compared with the patterns of all ‘fathers’ and all ‘mothers’. In contrast, the individual results (2003, p.157) show that the six children in fact split into two groups: (1) Max (6;2), Anneliese (7;6) and Reuben (10;10), whose German lax-to-tense ratio was greater than the English one, acquired a ratio unlike the adult model; and (2) Leonore (5;0), Rieke (8;2) and Salome (13;2), whose German lax-to-tense ratio was smaller than the English one, acquired a ratio more similar to the adult model. Despite this discrepancy between group and individual results, the subsequent analyses, conclusions and discussion of the bilingual acquisition of the lax-to-tense ratio are based on the averaged group results rather than on the individual ones (Whitworth, 2003, p.180). Another problem is that Whitworth treats German-speaking mothers and English-speaking fathers as groups rather than as individuals when relating them to the speech production of their own children, even though averaging the parents’ results makes them no more informative about each child’s input than an arbitrary sample of speakers from the same dialect area.
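The objection to group averaging can be illustrated with a small sketch. The hypothetical function below takes each child's German and English ratios, reports the group mean of the German-minus-English difference, and lists which children fall on each side. The six children and their values are invented for illustration and are not Whitworth's data:

```python
from statistics import mean


def split_by_direction(ratios: dict[str, tuple[float, float]]
                       ) -> tuple[float, list[str], list[str]]:
    """ratios maps child -> (German ratio, English ratio).

    Returns the group mean of (German - English), the children whose
    German ratio is greater, and those whose German ratio is smaller."""
    diffs = {child: de - en for child, (de, en) in ratios.items()}
    greater = sorted(c for c, d in diffs.items() if d > 0)
    smaller = sorted(c for c, d in diffs.items() if d < 0)
    return mean(list(diffs.values())), greater, smaller


# Invented values for six hypothetical children (not Whitworth's data).
example = {"A": (0.70, 0.60), "B": (0.72, 0.60), "C": (0.75, 0.62),
           "D": (0.58, 0.60), "E": (0.59, 0.62), "F": (0.60, 0.61)}
group_mean, greater, smaller = split_by_direction(example)
```

With these invented values the group mean is positive (the German ratio is greater on average), yet half of the children show the opposite pattern. Reporting only `group_mean` would hide that split.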
Results for extrinsic vowel duration conditioning (voiceless-to-voiced ratio) in
Whitworth’s study showed that the children produced results intermediate between the
two language values (except for the youngest child aged 5;0).
Whitworth argues (2003, p.192) that the intermediate results in the production of
extrinsic conditioning in the bilingual children are “affected by markedness rather than by
language transfer”. In our view, the two phenomena cannot be opposed in an ‘either/or’ fashion: markedness is shaped by the relative systems of the two languages in contact and their distributional characteristics, whereas ‘transfer’ is a process resulting from the relative (in-)dependence of their mental representations. Understood in this way, the two processes can operate simultaneously.
To summarise, there is some evidence that by the age of 5;0 bilingual children acquiring two languages with very different extrinsic and intrinsic vowel duration conditioning may experience relative ‘difficulty’ in acquiring the structurally more complex system of the two. This difficulty may appear as a delay in comparison with monolingual children (Kehoe, 2002). What is not clear is whether this delay resolves (if it exists at all independently of ‘transfer’), in what form it settles, and what factors other than language structure and its relative markedness can influence it.
2.4 Acquisition of Vocal Effort
2.4.1 Monolingual Acquisition
In Section 2.1.2.3 we discussed the association of stress and prominence with vocal effort, and showed that vocal effort results at least from the interaction of the respiratory and laryngeal levels of speech production. In acoustic output, the sound
pressure level (SPL) is controlled by neuromuscular actions of the respiratory system,
while the neuromuscular actions at the laryngeal level control the shape of the glottal
pulse and, thus, affect the slope of the radiated spectrum in midfrequencies.
Physiologically, the speech production of children is not just a scaled-down version
of the adult system. At the respiratory and laryngeal levels, maturational processes take
place throughout childhood. These processes make the child’s speech production system
qualitatively different from the adult’s.
In the respiratory system, children are more dependent on diaphragmatic breathing,
due to anatomical differences in the angle of the ribs, so that children cannot control chest
volume in the same way as adults until approximately the age of seven (see literature
review in Mackenzie Beck, 1997). On the basis of developmental measurements of laryngeal and respiratory function in speech production, Netsell et al. (1994) concluded that pre-school children use substantially greater expiratory, relative to inspiratory, muscle forces than adults do.
At the laryngeal level, apart from the fact that vocal fold length increases as a linear
function of age (Titze, 1994; Mackenzie Beck, 1997), the vocal fold ligament is still
immature at the age of 4;0. It is thinner and does not have the same layered morphology
as that of adults, and its maturation continues into adolescence (Titze, 1994). The thyroarytenoid muscle continues to develop throughout childhood.
Despite these (and other) non-linear physiological differences, children aged between two and six are able to speak as loudly as adults. One reason is the correlation between f0 and intensity: an octave increase in f0 corresponds to an 8-9 dB increase in intensity, and children have a higher f0 than adults (Titze & Sundberg, 1992; Titze, 1994). Children also seem to work harder than adults to achieve adult-like loudness: they reach higher lung pressures and larger lung volume excursions, and they breathe more frequently (Stathopoulos & Sapienza, 1993).
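The f0-intensity relation quoted above can be turned into a rough estimate. This sketch assumes a constant gain per octave (log-linear in f0) and uses 8.5 dB, the midpoint of the 8-9 dB range cited from Titze & Sundberg (1992) and Titze (1994); it restates that rule of thumb and does not model how children regulate loudness:

```python
import math


def level_gain_db(f0_high_hz: float, f0_low_hz: float,
                  db_per_octave: float = 8.5) -> float:
    """Approximate intensity gain (dB) associated with a higher f0,
    assuming a constant gain per octave (8-9 dB; 8.5 used as midpoint)."""
    if f0_high_hz <= 0 or f0_low_hz <= 0:
        raise ValueError("f0 values must be positive")
    octaves = math.log2(f0_high_hz / f0_low_hz)
    return db_per_octave * octaves
```

For example, with hypothetical values of 400 Hz for a child and 200 Hz for an adult (one octave apart), `level_gain_db(400, 200)` gives 8.5 dB.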
With regard to control of vocal effort, Stathopoulos & Sapienza (1993) simultaneously measured aerodynamic, acoustic and kinematic correlates of vocal intensity. They found that when 4-year-old children (n=20) were asked to adjust phonatory loudness from soft to comfortable and then to loud voice, their acoustic output was in many ways similar to that of 8-year-olds and adults, while the kinematic correlates (lung volume, rib cage and abdominal displacement) differed quantitatively and functionally. Nevertheless, their respiratory and laryngeal adjustments still resulted in an increase in sound pressure level, as in adults (see Figure 2-8). The overall higher SPL of the children in Figure 2-8 could result from their smaller vocal tract size (the same force applied over a smaller area results in a higher tracheal pressure than when applied over a larger area).
Importantly for this study, Stathopoulos & Sapienza (1993) showed that when performing the same adjustments in phonatory loudness, 4-year-old children control the rate at which the vocal folds close. This aerodynamic measure is called the ‘maximum flow declination rate’ (MFDR), and it is known to affect the spectral intensities in the midfrequencies of the radiated spectrum (Gauffin & Sundberg, 1989). The children’s control of vocal fold closure rate produced results similar to those of adults (see Figure 2-9): they increased MFDR as phonatory loudness increased. The mean difference in MFDR between boys and girls was small.
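MFDR is the steepest rate at which glottal airflow falls during vocal fold closure, i.e. the most negative slope of the airflow waveform. A minimal sketch, assuming a uniformly sampled glottal airflow signal in L/s is already available (obtaining one, for example by inverse filtering oral airflow, is not shown), estimates it from first differences:

```python
from collections.abc import Sequence


def mfdr(flow_l_per_s: Sequence[float], sample_rate_hz: float) -> float:
    """Maximum flow declination rate (L/s/s) of a glottal airflow signal.

    Estimated as the steepest negative first difference between adjacent
    samples; real analyses usually smooth the signal and average per cycle."""
    if len(flow_l_per_s) < 2:
        raise ValueError("need at least two samples")
    slopes = [(b - a) * sample_rate_hz
              for a, b in zip(flow_l_per_s, flow_l_per_s[1:])]
    # Declination = decreasing flow = negative slope; report its magnitude.
    return max(0.0, -min(slopes))
```

Raw first differences amplify noise, which is why the docstring flags smoothing and per-cycle averaging as steps a real analysis would add.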
Stathopoulos (1995) built on the data from Stathopoulos & Sapienza (1993), addressing age-related variability in acoustic, aerodynamic and respiratory kinematic measurements. Although Stathopoulos (1995) found that children were not consistently more variable than adults across all measurements, she did find that 4-year-olds were significantly more variable than adults on several parameters. These included SPL and MFDR, which are closely related to the overall intensity and spectral balance measured in this study.
[Figure 2-8: chart of sound pressure level (dB) against phonatory loudness level (soft, comfortable, loud) for 4-year-olds, 8-year-olds and adults]
Figure 2-8 Output sound pressure levels (dB) in female 4-year-olds, 8-year-olds and adults asked to adjust phonatory loudness for syllable trains /p/ (adapted from Stathopoulos & Sapienza, 1993)
[Figure 2-9: chart of maximum flow declination rate (L/s/s) against phonatory loudness level (soft, comfortable, loud) for 4-year-olds, 8-year-olds and adults]
Figure 2-9 Maximum flow declination rate (L/s/s) in female 4-year-olds, 8-year-olds and adults asked to adjust phonatory loudness for syllable trains /p/ (adapted from Stathopoulos & Sapienza, 1993)
Traunmüller & Eriksson (2000) compared the relative contributions of acoustic parameters such as sound pressure level, spectral emphasis, f0, F1, F3, duration and pausing to vocal effort. The subjects included adults (n=20) and 7-year-old children (n=8) of both sexes. The results showed that the acoustic output associated with ‘vocal effort’ is the result of a “synergetic process that involves an increase of subglottal pressure, increase of vocal fold tension, and increase of the openness of the vocal tract”. However, the best single predictor of vocal effort (when the distance from the microphone is not fixed) was spectral emphasis, a parameter not affected by the speaker’s age or sex.
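As a sketch of what a spectral emphasis measure computes, the function below assumes the definition as the level difference (dB) between the whole spectrum and the band below 1.5 times f0. This is our reading of Traunmüller & Eriksson's measure and should be checked against the original. The power spectrum (frequency bins and power values) is assumed to be computed beforehand, for example with an FFT:

```python
import math
from collections.abc import Sequence


def spectral_emphasis_db(freqs_hz: Sequence[float], power: Sequence[float],
                         f0_hz: float, cutoff_factor: float = 1.5) -> float:
    """Level of the whole spectrum minus the level below cutoff_factor * f0 (dB).

    Assumed definition; larger values mean relatively more energy above
    the f0 region, i.e. more emphasis on higher frequencies."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    low = sum(p for f, p in zip(freqs_hz, power) if f <= cutoff_factor * f0_hz)
    if low <= 0:
        raise ValueError("no energy below the cutoff")
    return 10 * math.log10(total / low)
```

Because the measure is a ratio of two levels taken from the same recording, it does not depend on overall gain or microphone distance, which fits the finding that it was the best single predictor when distance was not fixed.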
We are not aware of any studies measuring spectral balance as an acoustic correlate of stress or prominence in child speech. However, the findings above confirm that, despite qualitative differences in respiratory and laryngeal control, children are able to perform the same loudness-related linguistic tasks as adults. Thus, if the additional vocal effort used in SSE to mark prominence in short SVLR vowels is linguistically relevant, we would expect monolingual 4-year-old children acquiring SSE to learn this behaviour, while showing considerably more variability than adults (Stathopoulos, 1995).
2.4.2 Bilingual Acquisition
We are not aware of any studies dealing with the acquisition of vocal effort in
bilingual children in acoustic, aerodynamic or kinematic terms. However, given the
discussion on monolingual acquisition, we can hypothesise that if the vocal effort
adjustments are relevant features of Russian and Scottish English sound structure systems,
bilingual children should acquire the crosslinguistic differences in vocal effort along with
other segmental and suprasegmental properties.
2.5 Summary and Research Questions
In this chapter we explained the crosslinguistic differences between Scottish English and Russian that this study considers. Viewed crosslinguistically, the research variables constitute a representative set of structural differences that are frequent in phonology.
The eight research variables in this study are summarised in Table 2-11. For each variable, the table shows the speech production level at which it is assessed, the vowel sets involved and the crosslinguistic difference. It also cross-references the sections in which we discuss each variable, whether in terms of language description or of monolingual and bilingual acquisition.
Table 2-11 Summary of the total of 8 research variables for three levels of speech production, vowel sets, crosslinguistic differences and a cross-reference to Section numbers containing discussion for these variables.

Speech production level | MSR vowels | SSE vowels | Crosslinguistic difference (MSR/SSE) | Discussed in Sections
Vowel quality | /i/ | /i/ versus /ɪ/ | systemic (lack/presence of contrast) | 2.1.3.3 / 2.2.1 / 2.2.2
Vowel quality | /u/ | /ʉ/ | realisational (back/central) | 2.1.3.3 / 2.2.1 / 2.2.2
Postvocalic conditioning of vowel duration | /i/ | /i/ | systemic (minimal/SVLR) | 2.1.4 / 2.3.1 / 2.4.2
Postvocalic conditioning of vowel duration | n/a | /ɪ/ | systemic (none/invariably short) | 2.1.4
Postvocalic conditioning of vowel duration | /u/ | /ʉ/ | systemic (minimal/SVLR) | 2.1.4 / 2.3.1 / 2.4.2
Effect of interaction of vowel duration and prominence on vocal effort | /i/ | /i/ | systemic (unsystematic/systematic) | 2.1.4 / 2.4.1 / 2.4.2
Effect of interaction of vowel duration and prominence on vocal effort | n/a | /i/ versus /ɪ/ | systemic (none/differentiated) | 2.1.2.5
Effect of interaction of vowel duration and prominence on vocal effort | /u/ | /ʉ/ | systemic (unsystematic/systematic) | 2.1.4 / 2.4.1 / 2.4.2
Thus, given the set of research variables (see Table 2-11) and environmental
differences in the language input to our Scottish English – Russian subjects (n=2, aged 3;4
– 4;8), we will address the following questions:
(1) Do bilingual children have differentiated control of their two languages?
(2) Is their SSE ‘native-like’ compared to the monolingual SSE-peers and SSE adults? Is it SSE (or some other ambient language variety) that they acquire?
(3) Is their MSR ‘native-like’ compared to MSR-speaking adults (including mothers)?
(4) Is there any language interaction? If any, what are the patterns?
Additionally, we would like to provide data on the monolingual acquisition patterns of the SSE monolingual peers (n=7) for all of the research variables, since there are only limited accounts of their acquisition in this age group.
The results for each research question will be analysed from a longitudinal developmental perspective for each subject, as well as with regard to the confounding effects of the bilinguals’ language input and crosslinguistic structural differences. Specifically, we will address the contributions of structural, environmental and longitudinal aspects of language input to language differentiation and possible interaction. Given the substantial number of sound structure variables and the different levels of speech production involved in this study, any observed patterns of language interaction should allow us to judge their systematicity and direction fairly reliably.
The analysis should further help us to explain the observed patterns of bilingual
language differentiation and interaction in the light of current views in the area of
bilingual language acquisition studies (discussed in Chapter 1) and from the point of view
of the need for a unified/separate model of phonological acquisition for bilingual and
monolingual acquisition.
3 Methodology
3.1 Introduction
This chapter justifies the methodological choices made in this study. It accounts for
the selection of subjects and controls, the choice of materials, and the procedures for
recording and analysing the data.
As discussed in Chapter 2, this study aims to measure the extent of bilingual
subjects’ differentiation of their two languages, and to identify possible language
interaction patterns for cross-linguistically different aspects of vowel quality, vowel
duration and vocal effort. One way to establish whether a child ‘differentiates’ between the two languages is to measure how close the child’s speech is to the language input they receive, and to identify the extent of their speech immaturity and of language interaction between the two languages.
By referring to the ‘extent’ of this linguistic knowledge, we mean to emphasise the continuous and gradient nature of speech production in general. In order to make categorical inferences from such non-categorical data, we need to create a representative control framework. To quantify the extent of a bilingual’s language command, we can:
- minimise pragmatic code-switching in the speech production of the children (Grosjean, 2001), in order to maximise language separation between the bilingual child’s two languages, and thus to reduce child speech variability (further reasons are discussed in Chapter 1);
- control for the linguistic input from the direct or a closely matching sociolinguistic environment of the child;
- apply the same methodology to all control groups and subjects, rather than rely on reports in the literature that inevitably differ in methodology;
- use a sufficient number of repetitions of the carrier words to capture a representative sample of intra-subject variation and to be able to perform statistical analyses on the collected data.
The sociolinguistic background of the bilingual subjects is discussed in Section
3.2.2. The control groups in this study differed depending on the language mode
(Grosjean, 2001) analysed, and they were defined by the typical individual and social
networks of the subjects. In the SSE monolingual language mode, the speech production of the subjects was compared to that of typically developing SSE monolingual children
(n=7), to SSE adults (n=5), and SSBE adults (n=4). In the Russian monolingual language
mode, the bilinguals’ speech was compared to the speech of their Russian mothers (n=2),
and to that of other adult MSR speakers (n=3). Since Russian monolingual children are
not part of the social environment of our subjects, we did not gather Russian monolingual
child data. However, for the Russian developmental patterns we refer to the available
literature in subsequent chapters. The sociolinguistic background of the control groups is
discussed in Section 3.3.
It is known that when eliciting any structured data from pre-school children,
researchers face qualitatively different methodological problems than when working with
adults. Difficulties arise due to specific aspects of the cognitive and social development of
children, such as, for example: their attention span, their sensitivity to strangers, the
‘observer effect’ (Crystal, 1997), or the (in-)ability to perform certain tasks at different
ages. Consequently, researchers cannot impose the same stringent conditions on the
experimental set-up as for adults. However, when tailoring data elicitation techniques, an
optimal trade-off should be made between the child’s abilities and the feasibility of
subsequent acoustic and statistical analyses. This is discussed in Section 3.5.
A major problem concerns the instrumental measurement of child speech production, for example the difficulty of estimating formant frequencies given the typically high fundamental frequency of child speech (Kent & Read, 2002). In addition, physiological development involves an increase in vocal tract length, which in turn causes an age-specific decrease in formant frequencies. Such developmental changes in acoustic measurements require inter- and intraspeaker normalisation. The acoustic analysis procedures, normalisation and data validation issues are presented in Section 3.6.
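To illustrate what speaker normalisation does, here is a sketch of one widely used option, Lobanov z-score normalisation. It is not necessarily the procedure adopted in Section 3.6. Each formant value is expressed in standard deviations from that speaker's own mean for that formant:

```python
from collections.abc import Sequence
from statistics import mean, stdev


def lobanov(values_hz: Sequence[float]) -> list[float]:
    """z-score one speaker's values for one formant (Lobanov normalisation)."""
    if len(values_hz) < 2:
        raise ValueError("need at least two tokens to estimate spread")
    m = mean(values_hz)
    s = stdev(values_hz)
    if s == 0:
        raise ValueError("zero variance")
    return [(v - m) / s for v in values_hz]
```

Because each speaker is scaled by their own mean and spread, a child's higher formants and an adult's lower formants end up on comparable scales: `lobanov([300, 500, 700])` and `lobanov([600, 1000, 1400])` both give `[-1.0, 0.0, 1.0]`. The trade-off is that the normalised values depend on which vowels were sampled for each speaker.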
3.2 Subjects
3.2.1 Common Linguistic and Environmental Background
This study investigates the speech production patterns of two bilingual Scottish English/Russian children, BS and AN³, aged 3;4 to 4;5. Both subjects are girls. The girls live in quite a similar linguistic and social environment. Both have grown up in Russian-speaking families in which the parents are native speakers of Russian, and both have been acquiring Scottish English in the community. Both girls are firstborn children.
³ All initials used throughout the study have been changed to maintain anonymity.
The socioeconomic background of the families falls into a Middle Class (MC) classification based on the parents’ occupations (AN falls into Group 1 and BS⁴ into Group 2 of the SOC 2000 NS-SEC standard; Bilton et al., 2002).
At the start of the recordings, the two families lived in the centre of Edinburgh
(Scotland). The girls attended the same nursery close to their homes. The manager of the
nursery (in personal communication) described the socioeconomic and language
background of the nursery staff and children who were in daily contact with AN and BS
as follows. The great majority of the children had both parents employed in ‘white collar’ MC jobs in Edinburgh. One child was from a working class family; two children
were from upper-middle class background. Out of a total of 14 staff members, seven were
born and bred in Scotland. One was born and bred in England. Furthermore, there were
two Irish, and four bilingual staff members (2 English/Urdu, 2 Spanish/English). A total
of 32 children were in daily contact with the girls. According to the nursery manager (an
SSE speaker herself), only four of them had a clear SSE accent, while most children had a
less clear mixture of SSE and near-RP accents. There was another Scottish/Russian
bilingual boy attending the nursery, as well as a couple of Scottish/Spanish bilingual
children. The subjects had regular contact with each other. When BS attended the nursery, the two girls often played together, and they were reported to speak English to each other while playing.
The Russian community in Edinburgh lacks any institutional and social networks, such as schools or nurseries. Thus, all contacts between Russian-speaking individuals are established on their personal initiative. The two families had regular contacts with at least three to four other Russian-speaking families living in Scotland⁵. Some of those families also had children. An officer in the Russian consulate in Edinburgh (in personal communication) was not able to provide exact information on the number of Russian native speakers living in Lothian, since not all Russian citizens in Scotland register with the Russian consular services, and not all native Russian speakers are citizens of Russia. However, the officer noted that by June 2003 the number of such registrations was about 140, and estimated the actual number of residents in Lothian to be at least twice, or even three times, as high. The University of Edinburgh and local IT companies attract Russian researchers and IT specialists. This is also the background of our subjects’ parents: all of them have university degrees, and three of them are employed in Scotland in ‘white collar’ jobs.
⁴ Group 1 includes managers and senior officials; Group 2 includes professional occupations.
⁵ All information about the families was gathered in a language background questionnaire that was filled in by the parents upon completion of the recordings.
Another common denominator in the environment of our subjects is the range of language varieties to which they are exposed in Edinburgh: the SSE continuum ranging from Scots to SSE (Aitken, 1981), other English varieties, and other languages. The heterogeneous sociolinguistic background of the nursery staff and children reflects the situation in Edinburgh. It is known that the phonological and phonetic range of SSE in Edinburgh sometimes reaches the near-RP end of the continuum in the speech of Scottish MC speakers (Scobbie et al., 1999a). According to Scotland's Census 2001 for Edinburgh, the Middle Class population (groups 1 and 2, SOC 2000 NS-SEC standard) constitutes 34.09% of the total city population (n=453,430). 12.14% of Edinburgh residents were born in England; this is the largest population group after Scottish-born residents (77.1%). The percentage of residents born in England has increased by roughly one third compared to the 1991 Census, and is greater than in the rest of Scotland (8.08%) or in Glasgow (4.24%). In addition, Scobbie et al. (1999a) report that almost a quarter (23%) of children born in Edinburgh in MC families have at least one English parent. It is well established in the literature that parental language or dialect background influences children’s speech. For Edinburgh specifically, Hewlett et al. (1999) found that children (aged 6 to 9) with two non-Scottish British parents implement extrinsic vowel duration conditioning differently from their peers with two Scottish parents or one Scottish parent: they show more influence from the voicing effect found in non-SSE English varieties.
This raises a question relevant to the sociolinguistic background of our bilingual subjects. Both girls are growing up in Russian-speaking families, and the parents are not preoccupied with correcting the children’s English. The only choice the parents make in this respect is deciding which nursery or school the child should attend. Given that parental choice, the girls have to make sense of the English varieties spoken in their environment by themselves. Since almost a quarter of the children in their environment may have some SSBE influences in their English, we cannot exclude the possibility of finding such influences in the speech of the subjects. We therefore decided to include adult SSBE speakers, as well as one monolingual child with a mixed Scottish/English parental background, as controls. The control groups are discussed in detail in Section 3.3 of this chapter.
3.2.2 Differences in Linguistic and Environmental Background
3.2.2.1 Subject BS
BS's Russian mother was born and grew up in a suburb of Moscow, Russia. BS's
father was born in Ukraine in a Russian-speaking family. He is a Russian/Ukrainian
bilingual. The father’s family moved around throughout his childhood: from Ukraine to
Siberia, back and forth. He spent 13 years in Moscow for studies and work. As a result,
BS’s father speaks a Moscow variety of Russian, with some minor South Russian and
Siberian influences in his speech. Both parents lived in Moscow during their university
studies and afterwards, until they moved to Scotland.
BS was born in Moscow, and moved with her parents to Edinburgh when she was
four months old. During the day, BS spent much of her time with her mother. While living in Edinburgh, the family went on holiday in Scotland, England and Europe. At home, communication was mainly in Russian. Naturally, the family went out into the community every day, where Scottish English was spoken, and had regular contact with English-speaking families. BS watched children’s TV programmes and videos daily in Russian, BBC English and Scottish English (in order of importance). The family was in contact with more than 30 Russians living in the area at least one to three times a week, including several other Russian-speaking families with children. BS’s exposure to English in the nursery is summarised in Figure 3-1. The figure is a very conservative estimate of BS’s exposure to English, since it includes only nursery attendance hours. It does not include the family's personal contacts with English speakers, daily exposure to community English during family outings, or BS's exposure to mass media.
Recordings of BS’s speech production in both languages were made at three age samples: from 3;4 to 3;5, from 3;9 to 3;10, and from 4;4 to 4;5.
BS was enrolled in a local nursery in Edinburgh at the age of 1;3. From 1;3 to 3;0 her attendance varied: one day a week (age 1;3 to 1;10), 30 hours a week (age 1;10 to 2;0) and two days a week (10 hours a week, age 2;6 to 3;0). However, for a period of four months (2;4 to 2;7) she did not attend the nursery at all and stayed at home.
[Figure 3-1: percentage of BS's exposure to English and to Russian (y-axis: %, 0-100) per three-month period from 0;0-0;3 to 4;4-4;5 (x-axis: BS's age).]
Figure 3-1 BS’s language exposure pattern (% per three-month period) throughout the pre-school period, based on nursery attendance hours and 336 waking hours/month.
From the age of 3;0 to 3;5 BS attended the nursery two half-days a week (10 hours in total). This period broadly corresponds to the first age sample recorded with BS. During this period, BS’s mother learned from the nursery staff that BS understood most spoken English and was speaking spontaneously in sentences. For example, she used to tell the nursery staff what she had had for breakfast and where she had gone with her mum the day before.
From the age of 3;6 to 4;0, BS continued to attend the nursery one day a week (5 hours a week). This period broadly corresponds to BS’s second age sample. In this period she also started to attend a local community playgroup four days a week (5 hours in total). Thus, she socialised with English-speaking peers on at least five days a week.
From the age of 4;0 to 5;0, in addition to attending the nursery one day a week (5 hours), BS was enrolled in a local nursery school. This period broadly covers BS’s third age sample. BS attended the nursery school four days a week (10 hours in total), with a two-month break for the summer holidays (3;11 to 4;0).
To summarise, BS was exposed to Russian daily in the family from birth and throughout the pre-school period. Her exposure to community English averaged about 10 hours a week from the beginning of her linguistic experience in Scotland. This exposure broadened substantially from the age of 3;6, when BS started attending the playgroup and, later, the nursery school. All her exposure to the community language was regular. Based on the pattern of BS’s exposure to both languages, she can be classified as a Russian-dominant Russian-Scottish English bilingual.
3.2.2.2 Subject AN
AN was born in Edinburgh. Her Russian parents were born and grew up in Moscow. AN stayed at home in Edinburgh with her mother until the age of 0;7. She was then enrolled full-time in a local nursery (five days a week, 45 hours in total), which she continued to attend throughout the pre-school period. All communication between family members at home was in Russian, and AN’s exposure to Russian and English in Edinburgh remained distributed in these proportions throughout this time. Figure 3-2 summarises the monthly percentages of AN’s exposure to English in the nursery, based on the language background questionnaire completed by her parents. The exposure pattern is based on 336 waking hours a month. The figure includes the time spent on holiday in Russia. It does not, however, include the family's contacts with English-speaking families, or general daily exposure to community English or mass media. Up to the age of 4;5, AN’s overall exposure to English in the nursery amounted to 43%.
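The percentages in Figures 3-1 and 3-2 relate nursery hours to a fixed budget of 336 waking hours per month. The thesis does not state how weekly attendance was converted to monthly hours. The minimal sketch below therefore assumes a four-week month (336 = 12 waking hours × 28 days). The constant WEEKS_PER_MONTH and both function names are hypothetical and are used only for illustration.

```python
# Sketch: share of waking hours spent in the English-speaking nursery.
# Assumption (not stated in the thesis): a "month" is four weeks,
# so 336 waking hours = 12 waking hours/day x 28 days.
WAKING_HOURS_PER_MONTH = 336
WEEKS_PER_MONTH = 4  # hypothetical conversion factor


def english_exposure_percent(nursery_hours_per_week: float) -> float:
    """Percentage of a month's waking hours spent in the nursery."""
    monthly_nursery_hours = nursery_hours_per_week * WEEKS_PER_MONTH
    return 100.0 * monthly_nursery_hours / WAKING_HOURS_PER_MONTH


def period_average(weekly_hours_per_month: list[float]) -> float:
    """Mean exposure over a three-month bin; pass 0 for months without
    nursery attendance (e.g. a stay in Russia)."""
    values = [english_exposure_percent(h) for h in weekly_hours_per_month]
    return sum(values) / len(values)


print(round(english_exposure_percent(10), 1))  # 10 h/week (BS): 11.9
print(round(english_exposure_percent(45), 1))  # 45 h/week (AN): 53.6
print(round(period_average([45, 45, 0]), 1))   # one month away: 35.7
```

Under this assumption, full-time attendance alone corresponds to roughly 54% of waking hours. Months without nursery attendance, such as the period before 0;7 or stays in Russia, count as 0% and would therefore lower the overall average.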
The family had regular visits from relatives in Russia. AN’s Russian grandmother stayed with the family in Scotland for at least six months every year. At the age of 0;6 AN visited Russia for three weeks. At the ages of 2;3 and 3;2 she spent about eight weeks in Moscow on each occasion. While in Russia, she was exposed only to Russian. In Edinburgh the family had regular contact with five or six other Russian-speaking families, with or without children, as well as with English-speaking families. The family went on holiday every year in Scotland and Europe, and during the holidays the family members spoke Russian to each other. AN watched children’s TV programmes and videos daily in BBC English, Russian and Scottish English (in order of importance).
[Figure 3-2: percentage of AN's exposure to English and to Russian (y-axis: %, 0-100) per three-month period from 0;0-0;3 to 4;4-4;5 (x-axis: AN's age).]
Figure 3-2 AN’s language exposure pattern throughout the pre-school period, based on nursery attendance
hours and 336 waking hours/month.
The recordings of AN’s speech production in two languages took place during three
age samples: from the age of 3;7 to 3;8, at the age of 4;2, and from the age of 4;5 to 4;6.
During all this time she was in childcare in an English-speaking environment for 45 hours
a week, and Russian was used in the family home. Based on AN’s exposure pattern to the
two languages, she can be classified as a nearly balanced Russian-Scottish English
bilingual.
3.3 Control groups
3.3.1 Children
Seven SSE monolingual children were selected as controls for the speech
production of the bilingual children. Six children were recruited through staff and students
at QMUC. One child was recruited through personal contacts. For their participation the
children received a gift voucher from a toy store. None of the children had a history of speech or language disorders or any reported hearing problems.
The monolingual children were selected to match the bilingual subjects in age. Age was preferred as a matching criterion over developmental norms such as Mean Length of Utterance (Brown, 1973), phonological profiles (e.g. PACS; Grunwell, 1982) or syntactic profiles (e.g. LARSP), because no such norms are available for children acquiring two languages simultaneously. Monolingual norms may not be representative of normal bilingual language development, and should only be applied to the population from which they were drawn (Crutchley et al., 1997; Stow & Dodd, 2003). Moreover, there are no comparable age norms for Russian. Frequently used developmental measures (see e.g. Müller, 1998; Deuchar & Quay, 2000) such as mean length of utterance, or MLU (Brown, 1973), have not yet been established for Russian child language development (Tsejtlin, 2002). It is also unclear which type of MLU should be used, since both word-based and morpheme-based MLU are problematic for a crosslinguistic comparison between Russian and English. Typologically, Russian is a more inflective language with more derivational and inflectional morphology, while English is a relatively isolating language. For example, the six-word English sentence “The girl will ask the boy” is conveyed in Russian by three words, “Devochka sprosit mal’chika” (Girl ask boy), with all grammatical relationships expressed by inflections. Such typological differences make it difficult to apply either morpheme-based or word-based MLU. Moreover, MLU was designed to measure the development of syntactic and morphological aspects of language, and there is evidence that prosodic development does not necessarily correspond to MLU (Lleó, 2002).
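The mismatch between word-based and morpheme-based counts can be made concrete with a small sketch. Word-based MLU is the total number of words divided by the number of utterances. For the morpheme-based count, the function below simply counts hyphen-separated segments. The Russian segmentation shown is a simplified, hypothetical one chosen for illustration. It is not an analysis from this study.

```python
def mlu_words(utterances: list[str]) -> float:
    """Word-based MLU: total words divided by number of utterances."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    return sum(len(u.split()) for u in utterances) / len(utterances)


def mlu_morphemes(segmented: list[str]) -> float:
    """Morpheme-based MLU for utterances whose morpheme boundaries are
    marked with hyphens (the segmentation must be supplied by hand)."""
    if not segmented:
        raise ValueError("at least one utterance is required")
    total = sum(len(word.split("-")) for u in segmented for word in u.split())
    return total / len(segmented)


english = ["The girl will ask the boy"]
russian = ["Devochka sprosit mal'chika"]
# Hypothetical, simplified segmentation for illustration only.
russian_segmented = ["devochk-a s-pros-it mal'chik-a"]

print(mlu_words(english), mlu_words(russian))                    # 6.0 3.0
print(mlu_morphemes(english), mlu_morphemes(russian_segmented))  # 6.0 7.0
```

On a word basis the same pair of utterances comes out as 6 vs 3, but on this morpheme basis it is 6 vs 7. The direction of the comparison therefore depends on the unit chosen.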
Table 3-1 lists all the children (including the subjects) by age. The child controls are listed in a separate column from the subjects. The letter “C” stands for ‘child control’, and the digit following “C” is the unique number of each control. The digits after the underscore give the child’s age at the end of each age sample. Three monolingual children (C3, C4 and C7) were recorded longitudinally at two age samples. All of the children are first-born except for C5. The children come from Scottish families residing in Edinburgh (C4, C5, C6, C7, C9) or close to the city (C3, C8). All of the children attended nurseries or nursery schools, and in all but one case the parents were Scottish-born. One child, C4, had a Scottish mother and an English father. C4’s speech production therefore served as a control for possible cross-dialectal (SSE-SSBE) influences in the speech of our bilingual subjects.
Two of the monolingual children (C3 and C9) were boys. Boys were accepted as controls alongside girls in order to simplify the control selection procedure. With regard to the effect of gender on the acoustic analyses (formants and f0), age-related differences in vocal tract length are known to be a bigger issue in pre-school children than gender, since the vocal tract grows fast at this age (Kent & Read, 2002). The longitudinal design of this study required normalisation for vocal tract length differences, and gender differences were therefore accounted for by the same procedure.
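The normalisation procedure used in this study is not described in this section. The sketch below only illustrates why a single speaker-specific normalisation can absorb both age-related and gender-related scaling. It uses Lobanov's z-score normalisation, a common speaker-intrinsic method substituted here for illustration, and not necessarily the one applied in the thesis. It also assumes, as a simplification, that two speakers differ by a uniform scaling of their formant frequencies. The formant values are invented for the example.

```python
from statistics import mean, pstdev


def lobanov(values_hz: list[float]) -> list[float]:
    """z-score one speaker's measurements of one formant (e.g. all F2 values)."""
    mu = mean(values_hz)
    sigma = pstdev(values_hz)
    if sigma == 0:
        raise ValueError("need at least two distinct values")
    return [(v - mu) / sigma for v in values_hz]


# Hypothetical F2 values (Hz) for three vowel tokens of one speaker.
speaker_a = [2300.0, 900.0, 2000.0]
# A speaker with a shorter vocal tract, modelled (simplistically) as a
# uniform 20% upward shift of all formant frequencies.
speaker_b = [f * 1.2 for f in speaker_a]

za, zb = lobanov(speaker_a), lobanov(speaker_b)
print(all(abs(x - y) < 1e-9 for x, y in zip(za, zb)))  # True
```

Because the scaling factor cancels in (value - mean) / standard deviation, both speakers end up with identical normalised values. Real vocal tract differences are not perfectly uniform, so this is only an approximation.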
Table 3-1 Identification codes, age and sex of the children who participated in experiments; the children are
listed by age.
Subject    Control    Age          Sex   1st-born   Residence
BS_3;5                3;4 - 3;5    F     yes        Edinburgh
           C3_3;5     3;4 - 3;5    M     yes        Rosyth
AN_3;8                3;7 - 3;8    F     yes        Edinburgh
           C4_3;8     3;8          F     yes        Edinburgh
BS_3;10               3;9 - 3;10   F     yes        Edinburgh
           C3_3;11    3;11         M     yes        Rosyth
           C5_4;0     3;11 - 4;0   F     no         Edinburgh
           C6_4;0     3;11 - 4;0   F     yes        Edinburgh
           C4_4;1     4;1          F     yes        Edinburgh
AN_4;2                4;2          F     yes        Edinburgh
           C7_4;2     4;2          F     yes        Edinburgh
           C8_4;2     4;2          F     yes        Dunbar
BS_4;5                4;4 - 4;5    F     yes        Edinburgh
AN_4;8                4;7 - 4;8    F     yes        Edinburgh
           C7_4;8     4;8          F     yes        Edinburgh
           C9_4;10    4;9 - 4;10   M     yes        Edinburgh
3.3.2 Adults
Adult control groups included five Scottish Standard English (SSE), four Southern Standard British English (SSBE) and five Modern Standard Russian (MSR) speakers. All the adults were of a middle-class social background (group 2 of the SOC 2000 NS-SEC standard) (Bilton et al., 2002). The Russian speakers had all learnt RP-based English at school and during their university studies in Russia. They had been exposed to different English varieties in Edinburgh for periods ranging from four months to four years. All but one adult (E4) were female. The MSR group included the mothers of the two bilingual subjects. Table 3-2 summarises the geographical background, age and sex of the adult participants.
Three of the Russian speakers were born in Moscow. One of the adults (R2) was
born in Volgograd, in Southern Russia. The speech of R2 had negligible dialectal
influences, as is often the case with urban Russian varieties, which are influenced by the
language used on Russian central television and by the high rates of migration in Russia
among the population with university degrees (Avanesov, 1972). All Russian controls
spoke Modern Standard Russian.
Table 3-2 Native language, age, sex of adult participants.
L1     ID   Age   Grew up in      Sex
MSR    R1   26    Moscow          F
       R2   29    Volgograd       F
       R3   32    Tver            F
       R4   31    Moscow          F
       R5   27    Moscow          F
SSE    S1   23    Linlithgow      F
       S2   25    Edinburgh       F
       S3   27    Edinburgh       F
       S4   45    Musselburgh     F
       S5   37    Edinburgh       F
SSBE   E1   31    Surrey          F
       E2   32    Oxford/Ascot    F
       E3   52    Yorkshire       F
       E4   44    Surrey          M
Four of the SSE speakers grew up in either Midlothian (mainly Edinburgh) or West
Lothian (Linlithgow). Three of the SSBE speakers grew up in the south of England. E3 grew up in Yorkshire but spoke SSBE.
3.4 Materials
3.4.1 Children
We compared structurally similar words in both languages that differ enough crosslinguistically to be diagnostic of possible language interaction in bilingual child speech production. The materials consisted of monosyllabic consonant-vowel-consonant (CVC) words. The decision was taken to keep the structure simple, given the constraints that arise from language-specific structural properties:
(1) Russian and English have rather different phonotactic rules for consonant clusters in syllable onsets and codas.
(2) Unlike CVC-type words, polysyllabic words are difficult to compare between English and Russian, because the two languages have very different patterns of vowel reduction (see Section 2.1.3.1), whereby unstressed vowels differ in quality across the languages.
(3) Both languages have variable word-stress location in polysyllabic words, but rather different patterns of word-stress assignment (Trubetskoy, 1939).
(4) Many polysyllabic words in Russian and English contain consonant clusters. For
methodological reasons, there should be no ambiguity in the syllabic structure of target
words, and no doubt as to which syllable the consonant clusters should belong. However,
in the literature, there is a theoretical incompatibility between the accounts of English and
Russian syllable structure. As Kessler and Treiman (1997) point out, of the many theories
of English syllable structure the phonological account of "onset-rhyme" syllable structure
(Fudge, 1969; Selkirk, 1982) "is perhaps most widely accepted". The most widely
accepted account of the Russian syllable is "CV" structure defined in phonetic rather than
phonological terms (Bondarko, 1998). With CVC-structured words there is no theoretical or practical ambiguity in how the word should be decomposed into syllables, since it consists of a single syllable.
Apart from the above advantages of matching CVC-type words across Russian and English, there are important statistical reasons behind the choice of monosyllabic words for this study. In spontaneous English speech, monosyllabic words are the most frequent (Crystal, 1997). The same holds for the vocabulary acquired by English-speaking children in the first two years of life: in the CDI (Dale & Fenson, 1996), 23% of the 659 lexical items have CVC structure.
In Russian, monosyllabic words are probably less frequent than in English, but they typically belong to the most frequently used vocabulary. Moreover, monosyllabic Russian adult targets are as frequent in child speech production as trochaic disyllabic ones, and trochaic disyllabic adult targets are often substituted by monosyllabic templates in child speech (Zharkova, 2002).
Given the above arguments, the materials were chosen to match the following criteria in both languages:
- the consonant preceding the syllable nucleus should be voiceless, so that the onset of phonation can be used to define vowel onset
- the consonant following the syllable nucleus should be a voiced or voiceless stop, or a voiced or voiceless fricative, to trigger language-specific conditioning of vowel duration
- the syllable nucleus should contain one of the vowels [i], [ɪ] or [ʉ] for SSE, or [i] or [u] for MSR (vowel [ɪ] is not featured in MSR)
- preference was given to words belonging to a typical child lexicon, or at least easy for children aged 3;0 to learn in games
- the words had to be suitable for picture naming
- the words should not contain consonant clusters
- the English words should not be true cognates of the Russian ones and vice versa, to avoid any confusion about the language identity of the word.
Some clarification is needed as to what we mean by ‘voiced’ and ‘voiceless’ obstruents. For British English varieties it seems reasonable to assume that the neutralisation of word-final obstruents like /z/ is not complete: devoicing is phonetically gradual and does not neutralise the phonological contrast (Docherty, 1992). For Russian, some phonological accounts consider the contrast between voiced and voiceless obstruents to be completely neutralised word-finally in favour of voicelessness (Bondarko, 1998). Others (Avanesov, 1972) consider voiced and voiceless counterparts to be combinatory variants of the same voiced phonemes. For the neutralisation to be complete word-finally, there should be no phonetic differences between voiced and voiceless counterparts. However, their behaviour across word boundaries in Russian is an area of disagreement: in some contexts the neutralisation can be obligatory and categorical, while in others it is gradient (Padgett, 2005). Since this study deals with spontaneous speech, in which the tokens can appear in various contexts, we adopt the gradual view of final devoicing in Russian. By ‘following voiced consonant’ we therefore mean a phonologically voiced consonant that may show varying degrees of phonetic devoicing, and that may or may not be phonetically neutralised depending on the context.
Following these criteria, we chose the target words listed in Table 3-3.
Table 3-3 Elicited target words: orthography and adult target phonetic transcription per language.
Utterance-final phonetic targets for adults (IPA transcription)
Orthography   SSE       SSBE      Russian⁶   Transliteration (translation)
Sheep         [ˈʃip]    [ˈʃip]
Feet          [ˈfit]    [ˈfit]    [ˈkit]     kit (a whale)
Seed          [ˈsid]    [ˈsid]    [ˈfip]     Fib (proper name)
Cheese        [ˈtʃiz]   [ˈtʃiz]   [ˈtʃiʃ]    chizh (a finch)
Peas          [ˈpiz]    [ˈpiz]
Cook          [ˈkʉk]    [ˈkʊk]    [ˈsuk]     suk (a tree branch)
Put           [ˈpʉt]    [ˈpʊt]    [ˈʃut]     shut (a joker)
Food          [ˈfʉd]    [ˈfud]    [ˈkup]     kub (a cube)
Shoes         [ˈʃʉz]    [ˈʃuz]    [ˈtus]     Tuz (proper name)
Pig           [ˈpɪɡ]    [ˈpɪɡ]
Sieve         [ˈsɪv]    [ˈsɪv]
Fish          [ˈfɪʃ]    [ˈfɪʃ]
6 Since Russian voiced stops and fricatives are fully phonetically devoiced utterance-finally, the table uses the phonetic symbols for their voiceless counterparts rather than devoicing diacritics.
The English carrier words were mainly chosen from the lexical entries of the MacArthur Communicative Development Inventories (CDI) of Lexical Development Norms (Dale & Fenson, 1996). The CDI is based on parental reports of the lexical items acquired by English-speaking monolingual children (aged up to 1;5). For three of the target structures we could not find suitable lexemes in the CDI, so we chose the depictable words "seed", "sieve" and "cook". We anticipated that children aged 3;0 to 5;0 would have no problems acquiring these words (if they had not already done so). The list (Table 3-3) also contains one verb that matches the required structure, "put". We added it to the list after the data were collected, since almost all the children used it regularly during the recording sessions.
It was harder to find matching words for Russian, since no Russian CDI was available at that point. The most frequent English CDI items are lexemes denoting animals, toys, clothes and food items, so the Russian carrier words (Table 3-3) were matched to these categories as far as possible. "Fib" was an invented name for a frog, derived from the word "amphibian" so that it would make sense. "Tuz" is a popular dog’s name in Russian. The girls had no problems remembering these names. The picture for “cube” showed the popular toy Rubik's cube, which the children found easy. The other Russian words, for "a whale", "a finch", "a branch" and "a joker", were already known to the children or quickly learnt by them. Since the depicted objects remained the same in all experiments, there was no confusion about which lexical item a particular picture represented.
3.4.2 Adults
The materials collected from the adults matched those of the children in both
languages. However, they were collected with a different procedure, rather than through games, as described in Section 3.5.2.
3.5 Data Collection
3.5.1 Children
3.5.1.1 Recording Equipment and Set-up
The data were recorded using a Tascam DA-P1 digital audiotape (DAT) recorder. Each subject was recorded using a Beyerdynamic MPC-65 microphone with increased directionality. In formal tests of the available options, this microphone showed the smallest range of signal-to-noise ratios (SNR) across different environments (studio and office). It was therefore the best option for an environment with variable background noise, such as a family flat. Moreover, this microphone was small, and thus less intrusive than bigger microphones.
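The thesis does not describe how SNR was computed in the microphone tests. A common definition compares the RMS amplitude of a speech stretch with that of a noise-only stretch. The minimal sketch below assumes that definition and uses synthetic sample values. The function names are illustrative.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a stretch of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: list[float], noise: list[float]) -> float:
    """SNR in dB: speech RMS relative to the RMS of a noise-only stretch."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


# Synthetic example: speech amplitude 1.0, background noise amplitude 0.1.
speech = [1.0, -1.0] * 50
noise = [0.1, -0.1] * 50
print(round(snr_db(speech, noise), 1))  # 20.0
```

Computing such values for each candidate microphone in several rooms gives the kind of SNR range compared above. Here the speech RMS still contains the background noise, which matters little when the SNR is high.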
During the recordings the microphone was connected to the left channel of the DAT
recorder. It was placed on a flat surface, with its front facing the child. The recording volume settings were kept constant. The microphone was kept as close as possible to the child, at a distance not exceeding one metre.
A notebook computer was used for playing two computer games during the recordings. The notebook processor needed to be cooled regularly by a fan. When the fan was running, the ventilation resonated in the computer body and generated steady narrowband spectral peaks ("formants") in the frequency range of interest. For this reason, the notebook computer was turned off whenever it was not in use. In addition, we added a noise-reduction step to the acoustic analysis procedure (described in Section 3.6.3.2), to ensure that the fan noise peaks did not interfere with vowel formant frequency estimation.
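Section 3.6.3.2 describes the noise-reduction step actually used. The sketch below only illustrates the general idea with basic magnitude spectral subtraction applied to a single frame. It assumes a noise estimate of the same length, taken from an annotated stretch of silence. Practical implementations process overlapping windowed frames and average the noise spectrum over many frames. The function name and the `floor` parameter are hypothetical.

```python
import numpy as np


def spectral_subtract(frame, noise, floor=0.01):
    """Subtract the noise magnitude spectrum from one frame.

    frame, noise: 1-D arrays of equal length; noise comes from a
    stretch of annotated silence. floor: fraction of the original
    magnitude kept so that magnitudes never become negative.
    """
    frame = np.asarray(frame, dtype=float)
    noise = np.asarray(noise, dtype=float)
    spectrum = np.fft.rfft(frame)
    noise_mag = np.abs(np.fft.rfft(noise))
    clean_mag = np.maximum(np.abs(spectrum) - noise_mag, floor * np.abs(spectrum))
    # Re-use the noisy phase, as basic spectral subtraction does.
    clean = clean_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(clean, n=len(frame))


# Synthetic test: a "vowel" component in bin 10 plus a steady fan hum in bin 40.
n = 512
t = np.arange(n)
vowel = np.sin(2 * np.pi * 10 * t / n)
hum = 0.5 * np.sin(2 * np.pi * 40 * t / n + 1.0)
silence = 0.5 * np.sin(2 * np.pi * 40 * t / n + 2.5)  # same hum, different phase

cleaned = spectral_subtract(vowel + hum, silence)
mag = np.abs(np.fft.rfft(cleaned))
print(round(float(mag[10]), 1), round(float(mag[40]), 2))  # 256.0 1.28
```

A steady, narrowband hum of the kind described above has almost the same magnitude spectrum in speech and in silence, so subtraction removes it down to the spectral floor. Broadband, fluctuating noise is handled far less cleanly.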
3.5.1.2 Procedure
All subjects (or a subject's parent) who volunteered for this study signed a consent letter and received an information sheet about the broad purpose of the experiments. Any details that could influence their subsequent language behaviour were omitted.
For the bilingual children, most of the recording sessions in both languages took place in the author's own home. This was a flat with relatively good sound insulation, located away from busy roads, so that outside noise was kept to a minimum. However, for practical reasons the first age sample of AN was recorded in AN's home.
To trigger the monolingual language mode (Grosjean, 2001), discussed in Section 1.3.2.1, the bilingual children played games with two different interlocutors: the author in the Russian sessions, and an SSE native speaker, S1⁷, in the SSE sessions. Whenever possible, the Russian parents were not present at the experiment location (especially in the SSE sessions), to ensure that they would not influence the child's language choice. This worked consistently with AN, but proved more difficult with BS, who often refused to let her mother leave the room. In such cases BS's mother stayed in the room but tried not to interfere with the games.
7 S1 grew up in Linlithgow (West Lothian), and both of her parents are Scottish. As a speech and language therapy student, she was experienced in phonetics and in eliciting data from children.
For the monolingual children, the location of the recordings varied from child to child. Usually the recordings took place in their homes, to ensure the child's comfort and cooperation. For C6, C5 and C8 (in the SSE child control group), the recordings were made in the studios of the Scottish Centre for Research into Speech Disability at QMUC, since this was the easiest arrangement for all parties.
At all times, we ensured that environmental noise at the recording location was kept as low as possible.
3.5.1.3 Games
Depending on the language mode, Russian or SSE, the children played different sets of games. The games were designed to elicit enough repetitions of the target words to make statistical analyses possible, while preserving the spontaneity of the conversation with the experimenter as much as possible. The games were chosen to match the cognitive abilities of children in the target age range. Each game was self-contained, i.e. it included all the language-specific target words. This had the advantage that games could be interchanged depending on the child's mood, both within a session and between sessions.
Each game lasted about 15-20 minutes. Typically, a set of three to four games was played in every session. This arrangement was sufficient to hold the children's attention for about 50 minutes. In each session, we elicited ten to twelve repetitions of each target word for SSE, and fifteen to seventeen for Russian (there were fewer Russian target words). The elicited speech contained a mixture of multi-word and single-word utterances, and spontaneous speech.
The SSE games were:
"The fishing game": The basic fishing game with the magnets was acquired in a local
toy store. Small laminated pictures representing the target words were then attached to
the fish, which were caught with a set of fishing rods, alternately by the experimenter and the child, and collected into cups. The aim was to catch the
most fish.
"Snap": This game is very popular with the children of our target age. The cards are
shuffled and distributed among the players. The piles are put face down in front of
each player. One by one, each player takes the top card off the pile and lays it face up
in another pile between the two players. The players name the pictures as they appear. When two identical pictures appear in sequence on top of the pile, both players vie to be the first to call out "snap!" The one who shouts it first takes the upturned pile. Then the next player lays down another card and the game continues. The player with the most cards wins.
"Picture-pairs": In this game, the aim is to match the hidden pairs of pictures. The
participants turn over two cards, naming what is shown in the pictures. If the cards do not match, they are placed back in the same position. If they match, the player keeps the cards and has another go. The player with the most pairs wins. This activity revolves around remembering where the cards are placed. All children were interested in this game, as long as they managed (or were allowed) to win. All the children were familiar with the game, but the youngest ones (3;2 to 3;4) sometimes had too short an attention span to keep playing.
"Mister Cook's Kitchen": This computer game was especially developed for the
experiments. As many of the SSE target words were related to food, the story was about cooking. The storyline was about three little friends (Pig, Sheep and Fish), who
visited Mister Cook. Mister Cook had plenty of kitchen cupboards, containing food or
non-food items (cheese, sieve, food, peas, shoes and feet). Each cupboard was opened
in turn, and the child could decide what items Mister Cook had to put into the soup he
was making. The children clicked on the screen buttons themselves, if they wanted to.
This game was a good complement to the non-computer games, and was very popular
with the children.
In the monolingual Russian language mode experiments, we played a slightly
different set of games:
"Catch the ladybirds": The basic magnet game was acquired in a local toy store. It
contained a set of colourful ladybirds that were placed on a playing surface ("grass
and flower field"). Participants caught the ladybirds in turn with magnet spiders
hanging from green branches. Small laminated pictures representing the target words
were attached to the bottom of the ladybirds.
"Hide and seek": This computer-game was especially developed for the experiments.
The story is about a puppy named Tuz, who is hiding away from his mum in one of
the four rooms. The child had to look for the puppy in each room. Pictures of the
objects representing the Russian carrier-words were hidden in the rooms behind red
buttons. Children could click on the buttons to see whether they could find the puppy.
In addition, we also played "Snap" and "Picture Pairs" in the same way as in the SSE experiments, but with pictures depicting the Russian target words.
An SSE version of the "Hide and seek" game was also used in the sessions with the SSE monolingual children. In that version, the puppy was called Spud.
3.5.2 Adults
For adults, the CVC carrier words were embedded in two types of carrier sentences covering four prominent positions (referred to as “pos 1 – 4” in Table 3-4). The four positions were chosen to elicit different degrees of phonetic prominence in the vowels. Given the extent of variability that is likely to occur in child speech, introducing such variation in prominence provides a more meaningful basis for comparison than, for example, reading out word lists.
Position 1 covered a phrase-initial pitch accent in an utterance with several pitch accents. Position 2 covered a non-initial pitch accent before a phrase boundary. Position 3 covered a phrase-final pitch accent in an utterance with several pitch accents. Position 4 covered a short full intonation phrase with one pitch accent.
The subjects were recorded on a DAT recorder in a soundproof booth using a condenser boundary microphone with a hemispherical response. The recording volume settings were kept constant. The distance between the subject's mouth and the microphone was 50-60 cm. The subjects were instructed to speak clearly, but no specific instructions were given regarding pitch accent placement. Subjects read the set of sentences containing the target words five times from a computer screen. We controlled the subjects’ speech rate by presenting each sentence at regular time intervals, with short pauses in between. In these studio recordings, we gathered 20 renditions of each target word.
Table 3-4 Main type carrier sentences used in the two languages.
English (orthography):
  It's a [target] (pos 4).
  A [target] (pos 1) is a [target] (pos 2) and nothing but a [target] (pos 3).
Russian (transliteration):
  Eto [target] (pos 4).
  Tot [target] (pos 1) – eto [target] (pos 2), i tol'ko tot [target] (pos 3).
In addition to the studio recordings, we also analysed the child-directed speech of those adult subjects who elicited data from the children during the recording sessions. These subjects (and languages) were S1 (SSE), R3 (SSE and MSR) and E4 (SSBE).
3.5.3 Summary of the Elicited Data
To obtain statistically representative averages, we aimed to collect about 20 repetitions of each target word from each child per language and age sample. Some tokens were excluded during data annotation because of excessive background noise. Table 3-5 summarises the total number of tokens collected (across all target words) per child and age sample. It also shows the number of sessions needed to collect them.
Table 3-5 Summary of the number of sessions and the total number of elicited tokens per child (and age
sample)
Child
BS_3;5
C3_3;5
AN_3;8
C4_3;8
BS_3;10
C3_3;11
C5_4;0
C6_4;0
C4_4;1
AN_4;2
C7_4;2
C8_4;2
BS_4;5
AN_4;8
C7_4;8
C9_4;10
Number of sessions per language
SSE
MSR
5
3
4
1
3
2
2
2
1
1
2
2
3
2
2
2
4
Total Number of Elicited Tokens
SSE
MSR
282
134
314
3
408
216
101
2
299
266
260
250
237
168
2
163
203
215
342
2
2
390
220
442
231
279
244
The number of tokens collected per session varied from child to child and from session to session. The children's interest in the games was highest in the first session and was typically somewhat lower in subsequent sessions. In total, 52 sessions were recorded with the children, and 5664 tokens were collected in both languages.
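The totals reported above can be checked against the cell values of Table 3-5. Because a sum does not depend on which row a value belongs to, the sketch below simply lists the 22 session counts and the 22 token counts that appear in the table, in no particular row order. The 22 cells correspond to 6 bilingual age samples in two languages plus 10 control age samples in SSE only.

```python
# Cell values from Table 3-5, listed in no particular row order.
sessions = [5, 3, 4, 1, 3, 2, 2, 2, 1, 1, 2, 2, 3, 2, 2, 2, 4, 3, 2, 2, 2, 2]
tokens = [282, 134, 314, 408, 216, 101, 299, 266, 260, 250, 237,
          168, 163, 203, 215, 342, 390, 220, 442, 231, 279, 244]

# 6 bilingual samples x 2 languages + 10 control samples x 1 language
expected_cells = 6 * 2 + 10
assert len(sessions) == len(tokens) == expected_cells

total_sessions = sum(sessions)
total_tokens = sum(tokens)
print(total_sessions, total_tokens)             # 52 5664
print(round(total_tokens / total_sessions, 1))  # 108.9 tokens per session
```

On average, a session therefore yielded roughly 109 tokens, although, as noted above, the yield varied considerably between children and sessions.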
As shown in Table 3-5, sufficient data were collected in two sessions for the majority of children. Younger children had three to five sessions per age sample. C4 had only one session for each age sample. AN had only one session for SSE, because a technical fault resulted in the loss of the data from another session. Despite this, the number of tokens elicited from AN was sufficient for statistical analyses.
3.5.4 Digital Audio Data Formats
The original recordings were digitised at a sampling rate of 44100 Hz with 16-bit quantisation. The sampling rates used for analysis were 11025 Hz for adults and 22050 Hz for children (both with 16-bit quantisation). The digital audio files at these sampling rates served as input for manual annotation and automatic acoustic analyses.
3.6 Phonetic and Acoustic Measurements
3.6.1 Overview
In Chapter 2 we introduced the cross-linguistic variables that are the focus of this
study. The aim is to address theoretical questions on how young bilinguals (balanced and
less balanced) cope with sound-structural ambiguities in their languages. The research
variables concern vowel quality, duration, and vocal effort in prominent syllables.
To enable automatic acoustic measurements, vowel duration was manually
annotated. Vowel quality was measured both qualitatively (auditory phonetic analysis)
and quantitatively (automatic acoustic analysis). To measure vocal effort, we used the
measure of spectral balance (Sluijter & van Heuven, 1996b) with modifications for the
purposes of this study. The measure of spectral balance combined intensity analysis
and formant analysis, since intensity had to be measured in specific frequency bands
of the radiated spectrum for a particular vowel token. To enable further normalisation, and
to exclude excessively loud or quiet utterances, we needed to estimate both overall
intensity and fundamental frequency.
The data annotation and analysis procedure consisted of two parts, and the general
process is represented in Figure 3-3. We first annotated the onset and the offset of a target
vowel and surrounding consonants, assigned it a broad phonetic label, and annotated a
piece of typical silence (noise) for a given fragment of speech. Subsequently, from the
annotated duration of the vowel we automatically estimated acoustic parameters, such as
formant frequencies (F1-F3, Hz), formant bandwidths (Hz), fundamental frequency (f0,
Hz), and RMS-power (for the whole vowel spectrum and for three fixed frequency bands
around F1, F2 and F3). The same process was applied to both child and adult speech. The
method of acoustic encoding, however, differed depending on the vocal tract
characteristics of adult and child participants. The overview of the acoustic parameters
used in this study is given in Table 3-6.
Figure 3-3 Data flow diagram of the encoding process of the acoustic waveform into acoustic parameters
and phonetic labels.
Table 3-6 Raw acoustic measurements in this study.
No. | Parameter | Description
1 | F0 | estimate of fundamental frequency (Hz)
2 | F1 | estimate of the centre frequency of F1 (Hz)
3 | F2 | estimate of the centre frequency of F2 (Hz)
4 | F3 | estimate of the centre frequency of F3 (Hz)
5 | OI | overall intensity, measured as RMS-power (dB) from the whole DTFT spectrum in a 23 ms Hamming window
6 | A1 | RMS-power (dB) measured around estimated F1 in a fixed frequency band
7 | A2 | RMS-power (dB) measured around estimated F2 in a fixed frequency band
8 | A3 | RMS-power (dB) measured around estimated F3 in a fixed frequency band
3.6.2 Data Annotation
3.6.2.1
Phonetic Labelling
For each token, the vowel was analysed auditorily and labelled accordingly. All the
manual labelling was done in PRAAT (Boersma & Weenink, 2004).
One of the following broad phonetic symbols was assigned to a vowel:
[i], [], [u], [], [], []. Since all files containing phonetic labels and duration markers
were subjected to automatic processing after the acoustic analyses, all phonetic labelling
was done with a computer-readable version of IPA, the SAM Phonetic Alphabet
(SAMPA) (Wells, 1995).
Tokens were excluded from further acoustic analyses if:
-
an adult was talking at the same time as the child
-
there was too much environmental noise
-
the target word could not be identified
-
it was whispered
-
the vowel was de-accented (cf. Section 3.6.2.3)
-
none of the above vowel symbols could be assigned.
Neighbouring consonants were also phonetically labelled. The transcription of
consonants included some diacritics for narrow phonetic transcription:
-
a palatalisation marker8 was used, since palatalisation is a lexically contrastive
phonological feature in Russian; it also often occurred in SSE child speech
-
a marker for ejective stops was used, since we found that some SSE child
realisations of voiceless stops in syllable coda were made with a glottalic rather
than a pulmonic airstream; the occlusion in such stops often appeared considerably
longer than in more typical realisations made with a pulmonic airstream mechanism
-
a marker for aspiration was used when aspiration appeared in the Russian data,
since this phonetic property is featured in Russian neither phonologically nor
phonetically, and could thus be due to language interaction from SSE.
8
In SAMPA, the phonetic labels do not contain diacritics, but rather unambiguous sequences of symbols,
which we refer to as a “marker”.
3.6.2.2
Annotation of Timing
Duration of the vowel and the surrounding consonants was measured after visual
inspection of the waveform and of the spectrogram of each utterance. In defining
segmental boundaries for each of the CVC segments, we mainly followed the annotation
criteria specified in van Zanten et al. (1991). The criteria concern the shape of the
amplitude envelope of an acoustic waveform. Since the consonants surrounding vowels
were usually realised as voiceless, the amplitude envelope of the waveform changed
dramatically at the CV transition, and such a transition was relatively easy to identify (e.g.
Figure 3-4). The same was often true for the VC transitions when a vowel was followed
by a voiceless obstruent (Figure 3-5).
Figure 3-4 Timing marker indicating the end of the voiceless fricative [s] and the beginning of the
following vowel [] in “sieve” (annotated in SAMPA).
Figure 3-5 Timing marker indicating the end of the vowel [] and the beginning of the devoiced stop [t] in
“food” (annotated in SAMPA).
Figure 3-6 Timing marker indicating the end of the vowel [] and the beginning of the voiceless stop [k] in
“cook” (annotated in SAMPA).
However, in some cases (Figure 3-6) it was more useful to follow spectral cues. For
example, in agreement with the patterns described for British English by Gobl and Ní
Chasaide (1988; 1999b), in vowels preceding voiceless stops the cessation of voicing
could start quite early in the vowel, and the decrease in vowel amplitude was long and
gradual rather than abrupt. In such cases, we placed the boundary at the offset of F2 in
the vowel spectrum.
A boundary between a vowel and a following voiced fricative was usually annotated
at the beginning of visually identifiable friction in the higher frequency partials (Figure
3-7).
Segmentation was further complicated by the preaspiration of voiceless fricatives in
Scottish English (Gordeeva & Scobbie, 2004), which also occurred in the child data in the
SSE sessions. In preaspirated VC realisations, usually in “fish” tokens, the [] sequence
could contain rather long (sometimes up to 400 ms) [h]-sounding or whispery transitions.
Such a transition was annotated as a separate phonetic entity (as in Figure 3-8).
Figure 3-7 Timing marker indicating the end of the vowel [i] and the beginning of the voiced fricative
[z] in “cheese” (annotated in SAMPA).
Figure 3-8 Two timing markers indicating the boundaries between the end of the vowel [] and the
preaspirated whispered transition [] and the following voiceless fricative [s] in “fish” (annotated in
SAMPA). The duration of [] is 142 ms.
3.6.2.3
Annotation of Prominence and Utterance Type
It is well known that segmental and suprasegmental sound properties, such as voice
quality, duration and intensity vary as a function of prominence and pragmatic meaning of
intonation (Lehiste, 1977). Since the target words were elicited in spontaneous interaction,
children produced them in phrases of different length, structure, and intonational meaning.
Therefore, for each token the syllable prominence was analysed and labelled.
Syllables carrying a pitch accent in a broad or narrow focus (Ladd, 1996) were
considered for further analyses, while de-accented syllables were excluded. For each focal
syllable, we annotated position in the utterance as:
-
phrase initial
-
phrase medial
-
phrase final (in phrases with more than one pitch accent)
-
phrase final, single pitch accent
It was beyond the scope of this study to identify the pragmatic meaning of intonation
expressed by fundamental frequency, or to give a phonological transcription (e.g. by
assigning H and L labels to tonal events). However, since the pragmatic meaning of
intonational events affects other suprasegmental sound properties, it was necessary to
broadly classify utterances according to the modalities of illocutionary speech acts
commonly distinguished in the literature, such as non-emphatic statements, yes/no
questions, WH-questions and emphatic statements (see Hirst & Di Christo, 1998). This
enabled us to select appropriate intonational modalities for the statistical analyses of
different subsets of tokens.
3.6.3 Automatic Acoustic Measurements
3.6.3.1
Steady-State of the Vowel
The formant centre frequencies and RMS-power were calculated by averaging the
estimates over all frames in the steady-state part of the vowel. The steady state was
defined as beginning at 15% of the total vowel duration from the vowel onset and ending
at 50% of the total vowel duration. The minimum duration of the steady state was set
to 25 ms. In exceptional cases, when the percentage approach resulted in a steady state of
less than 25 ms, its duration was defined in absolute terms, i.e. as the 25 ms after
the initial 15 ms transition. By performing acoustic analyses in the steady state of the
vowel rather than in the transitions, we excluded parts of vowels with possible short-term
laryngeal influences from the left and right consonantal contexts. By averaging the
RMS-power through the steady part of the vowel, we also levelled out accidental,
perceptually irrelevant short-term fluctuations of the intensity curve, which are due to the
interaction of harmonics and formant frequencies (Ladefoged & McKinney, 1963;
Lehiste, 1977).
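The steady-state rule can be expressed as a small function. The sketch below is illustrative only: the names `steady_state_window` and `mean_over_window` are hypothetical, times are assumed to be in milliseconds, and per-frame estimates are assumed to be averaged when their frame time falls inside the window.

```python
import math


def steady_state_window(onset_ms, offset_ms, start_frac=0.15, end_frac=0.50,
                        min_ms=25.0, transition_ms=15.0):
    """Return (start_ms, end_ms) of the vowel steady state.

    Proportional rule: from 15% to 50% of the vowel duration after onset.
    Fallback when that span is shorter than 25 ms: the 25 ms following an
    initial 15 ms transition (this may run past the offset of a very short vowel).
    """
    duration = offset_ms - onset_ms
    start = onset_ms + start_frac * duration
    end = onset_ms + end_frac * duration
    if end - start < min_ms:
        start = onset_ms + transition_ms
        end = start + min_ms
    return start, end


def mean_over_window(frame_times_ms, values, start_ms, end_ms):
    """Average per-frame estimates whose frame time lies inside the window."""
    selected = [v for t, v in zip(frame_times_ms, values) if start_ms <= t <= end_ms]
    return sum(selected) / len(selected) if selected else math.nan
```

For a 200 ms vowel the window runs from 30 to 100 ms after onset; for a 60 ms vowel the proportional span (21 ms) is too short, so the fallback window of 15 to 40 ms after onset is used. The text does not say how vowels shorter than 40 ms were treated, so the sketch simply lets the fallback extend past the offset.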
3.6.3.2
Formant Analysis
Several difficulties have to be taken into account when estimating formant centre
frequencies at high fundamental frequencies, such as those of child and adult female speech.
First, the estimate is known to be biased towards the harmonic closest to the formant
centre frequency. The greater the distance between harmonics, the greater the bias away
from the centre frequency is likely to be (Traunmüller & Eriksson, 1997).
Secondly, since this bias is proportional to f0 (Traunmüller & Eriksson, 1997), it is
not constant in a prominent syllable nucleus, because in “stress-accent” languages (like
English and Russian) (Beckman, 1986), f0 changes as a function of the pragmatic meaning
of intonation throughout the course of prominent syllables. Therefore, in a syllable
nucleus, the error in formant estimation will fluctuate depending on how far the poles are
from the closest harmonic (both can eventually coincide). This problem can be somewhat
levelled out by averaging the estimated centre frequencies of formants through a steady
state portion of the vowel, taking the measurements at multiple points.
Thirdly, we had to make naturalistic recordings with variable environmental noise
(different homes and computer fan resonance). It is well known that estimation of
formants is prone to environmental noise, especially when such noise is narrow band. The
formant analysis procedure in this study is designed to address these issues as much as
possible.
Figure 3-9 represents the process of formant analysis employed in this study. The
same process applied to both child and adult data, but the specific method of formant
estimation (circle 2 in Figure 3-9) differed for the two types of data based on the best
performance. The issues of determining the best performance are discussed in Section
3.6.4.2.
Figure 3-9 Data flow diagram of the formant analysis process of the acoustic waveform and annotated
timing of vowels.
Step 1 in Figure 3-9 involved noise reduction. The efficacy of this step is discussed
in Section 3.6.4.2.3. The timing of a typical piece of silence (or noise) was labelled for
each utterance during phonetic segmentation. ‘Typical’ means that the piece of silence had
to be more or less steady and reflect the noise level during the utterance. For example,
with computer fan noise, narrow-band noise components could be seen in the spectrum
running throughout the whole utterance. We then performed noise reduction9 on the speech
file by subtracting the spectral magnitude (short-term discrete time Fourier transform) of a
central frame of the annotated silence (noise) from all frames in the speech signal. The
speech signal with subtracted noise thus served as input for further formant analysis (but
not for RMS-power analysis).
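Magnitude spectral subtraction with a single noise frame can be sketched as follows. This is not the algorithm implemented for the thesis (footnote 9), whose framing details are not given; the Hann window, 512-sample frames, 128-sample hop, flooring of negative magnitudes at zero, reuse of the noisy phase and weighted overlap-add resynthesis are all assumptions, and `spectral_subtract` is a hypothetical name.

```python
import numpy as np


def spectral_subtract(signal, noise_start, noise_end, frame_len=512, hop=128):
    """Subtract the magnitude spectrum of one central 'silence' frame from every frame.

    noise_start/noise_end are sample indices of the annotated silence.
    """
    x = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    # Central frame of the annotated silence serves as the noise estimate.
    centre = (noise_start + noise_end) // 2
    n0 = max(0, min(centre - frame_len // 2, len(x) - frame_len))
    noise_mag = np.abs(np.fft.rfft(x[n0:n0 + frame_len] * window))

    out = np.zeros_like(x)
    norm = np.zeros_like(x)
    for start in range(0, len(x) - frame_len + 1, hop):
        spec = np.fft.rfft(x[start:start + frame_len] * window)
        # Subtract noise magnitude, floor at zero, keep the noisy phase.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len)
        # Weighted overlap-add: synthesis window, then normalise by sum of window^2.
        out[start:start + frame_len] += frame * window
        norm[start:start + frame_len] += window ** 2
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out
```

With a truly silent noise frame the signal passes through unchanged (apart from the unwindowed edges), while a stationary narrow-band hum such as fan noise is largely removed, because its magnitude spectrum is nearly identical in every frame.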
In Step 2 (Figure 3-9), formant centre frequencies of the target vowels were
analysed using PRAAT (Boersma & Weenink, 2004). Two different methods were
employed for the formant centre frequency estimation in child and adult speech. For adult
female speech (sampled at 11025 Hz), we estimated centre frequencies based on the 10th
order LPC analysis (autocorrelation, 25 ms analysis width, 10 ms time step, pre-emphasis
of +3 dB per octave from 50 Hz). For child speech, we estimated the centre frequencies
based on the LPC (burg) method as implemented in PRAAT (Press et al., 1992). In this
method, a speech signal is re-sampled at twice the maximal frequency of interest.
Subsequently, formant analysis is applied in a Gaussian window of 51.9 Hz. The
following parameters were employed: 5 ms (time step), 4 (number of poles), 6000 Hz
(maximal frequency of interest), 25 ms (analysis width), 100 Hz (+3 dB per octave
pre-emphasis).
9
The algorithm was kindly implemented by Peter Rutten, a speech signal-processing engineer working in
text-to-speech technologies.
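The general idea behind the autocorrelation LPC formant estimation used for the adult data can be illustrated with a generic sketch. It is not PRAAT's implementation: the Levinson-Durbin recursion and the conversion of polynomial roots to frequencies and bandwidths are the textbook method, while the pre-emphasis coefficient `exp(-2*pi*F/fs)`, the 90 Hz and 600 Hz root-selection thresholds and the helper names are assumptions. The burg variant used for the child data is not shown.

```python
import numpy as np


def levinson_lpc(frame, order):
    """LPC polynomial a (a[0] = 1) by the autocorrelation method + Levinson-Durbin."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants_from_lpc(a, fs, min_hz=90.0, max_bandwidth_hz=600.0):
    """(frequency, bandwidth) pairs in Hz from the LPC roots, sorted by frequency."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_hz) & (bandwidths < max_bandwidth_hz)
    idx = np.argsort(freqs[keep])
    return list(zip(freqs[keep][idx], bandwidths[keep][idx]))


def estimate_formants(frame, fs, order=10, preemph_from_hz=50.0):
    """Hypothetical helper: pre-emphasis, Hamming window, then LPC root solving."""
    alpha = np.exp(-2 * np.pi * preemph_from_hz / fs)  # assumed pre-emphasis coefficient
    x = np.asarray(frame, dtype=float)
    x = np.append(x[0], x[1:] - alpha * x[:-1])
    return formants_from_lpc(levinson_lpc(x * np.hamming(len(x)), order), fs)
```

The closer a root lies to the unit circle, the narrower the resonance; high-f0 speech adds spurious or biased roots, which is why the exclusion heuristic in Step 3 is applied afterwards.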
In Step 3 (Figure 3-9), the extracted formant centre frequencies for F1, F2 and F3
(Hz) were averaged through all the frames in the steady part of the vowel. Since both
autocorrelation and burg LPC make residual estimation errors at high f0, we included a
heuristic in the mean extraction algorithm to exclude spurious formant estimates
(‘candidate errors’). The exclusion criteria (for both children and adults) were established
after extensive examination of typical values and errors in the vowels, and by subsequently
verifying the errors in the FFT spectra. An estimate was excluded under the following
criteria, where vowel-specific criteria refer to the auditorily verified phonetic labels:
-
if the number of estimated poles was less than 3 (rather than 4)
-
if female adult [i] in SSE and MSR had F2 < 1800 Hz (typically F2 > 2500 Hz)
-
if female adult [u] in MSR had F2 > 1800 Hz (typically F2 is low)
-
if child F1 < 250 Hz
-
if child F1 > 1500 Hz (there were no open vowels in the child data in this study)
-
if child [i] in SSE and MSR had F2 < 2000 Hz (typically F2 > 3000 Hz)
-
if child [u] in MSR had F2 > 2000 Hz.
The validation of the formant means calculated after applying this procedure is
addressed in Section 3.6.4.2.
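The exclusion heuristic above can be sketched as a per-frame filter followed by averaging. The names `FormantFrame`, `is_candidate_error` and `steady_state_formant_means` are hypothetical; the thresholds are copied from the list, while applying them frame by frame, reading "number of estimated poles" as the number of formants returned for a frame, and using SAMPA labels "i" and "u" are assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FormantFrame:
    f1: float
    f2: float
    f3: float
    n_poles: int  # number of formants the tracker returned for this frame


def is_candidate_error(fr, label, language, speaker):
    """Apply the exclusion criteria; speaker is 'adult' (female) or 'child'."""
    if fr.n_poles < 3:
        return True
    if speaker == "adult":
        if label == "i" and language in ("SSE", "MSR") and fr.f2 < 1800:
            return True
        if label == "u" and language == "MSR" and fr.f2 > 1800:
            return True
    elif speaker == "child":
        if fr.f1 < 250 or fr.f1 > 1500:
            return True
        if label == "i" and language in ("SSE", "MSR") and fr.f2 < 2000:
            return True
        if label == "u" and language == "MSR" and fr.f2 > 2000:
            return True
    return False


def steady_state_formant_means(frames, label, language, speaker):
    """Mean (F1, F2, F3) over steady-state frames that pass the heuristic, or None."""
    kept = [fr for fr in frames if not is_candidate_error(fr, label, language, speaker)]
    if not kept:
        return None
    return (mean(fr.f1 for fr in kept),
            mean(fr.f2 for fr in kept),
            mean(fr.f3 for fr in kept))
```

Returning `None` when every frame is rejected makes it explicit that such a token yields no formant means, rather than silently averaging spurious values.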
3.6.3.3
RMS-Power Analysis
RMS-power analysis in specific frequency bands of DTFT spectrum served as a
non-normalised basis for the spectral balance measurement. Figure 3-10 represents the
process of RMS-power analysis employed in this study. The same process applied to both
child and adult data.
Figure 3-10 Data flow diagram of the RMS-power analysis of the acoustic waveform in fixed frequency
bands.
In Step 1, the RMS-power (dB) was calculated in three frequency (spectral) bands
around F1, F2 and F3. The bandwidths of the spectral bands (Hz) were fixed as in Table
3-7. According to the acoustic theory of speech production (Fant, 1960), formant
bandwidths are predictable from formant frequency for normal phonation, with a resulting
spectral tilt of -12 dB per octave. The bandwidth for a frequency of interest can be
derived from the following equation (Fant, 1960):
Equation 3-1
$$B_n = \frac{F_n}{2\pi}$$
where $B_n$ is the bandwidth and $F_n$ is the target frequency. For each of the three
frequency slices shown in Table 3-7, the maximal bandwidth was obtained from Equation
3-1 at the upper edge of the slice and then rounded to the nearest hundred to give a fixed
bandwidth. For example, if F2 of a vowel was 2100 Hz, the RMS-power (dB) would then
be calculated between 1801 and 2400 Hz. For each token, the spectral bands did not overlap.
Table 3-7 Summary of the fixed frequency bandwidths for three frequency slices.
Frequency slice (Hz) | Maximal bandwidth (Hz) | Fixed bandwidth (Hz)
1 to 2000 | 318.3098862 | 300
2001 to 4000 | 636.6197724 | 600
4001 to 5500 | 875.352187 | 900
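The fixed bands in Table 3-7 follow from Equation 3-1 evaluated at each slice's upper edge (2000/2π ≈ 318.3 Hz, 4000/2π ≈ 636.6 Hz, 5500/2π ≈ 875.4 Hz) and rounded to the nearest 100 Hz. The sketch below reproduces that and centres the band on a formant estimate; `fixed_bandwidth` and `band_edges` are hypothetical names. The text gives 1801 to 2400 Hz for F2 = 2100 Hz, so the lower edge appears to be exclusive in whole hertz; the sketch simply returns the closed edges.

```python
import math

# Frequency slices (Hz) from Table 3-7.
SLICES = [(1, 2000), (2001, 4000), (4001, 5500)]


def fixed_bandwidth(slice_upper_hz):
    """Equation 3-1 (B = F / 2*pi) at the slice's upper edge, rounded to 100 Hz."""
    return round(slice_upper_hz / (2 * math.pi), -2)


def band_edges(formant_hz):
    """(low, high) edges of the fixed band centred on a formant estimate."""
    for _, high in SLICES:
        if formant_hz <= high:
            half = fixed_bandwidth(high) / 2
            return formant_hz - half, formant_hz + half
    raise ValueError("formant estimate above 5500 Hz")
```

Because the bandwidth is fixed per slice rather than recomputed per token, every token of a given formant region is measured over a band of the same width.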
For a given spectral band, the power (dB) was calculated from the short-term
Discrete Time Fourier Transformed spectrum (stDTFT)10. A Hamming window of 23 ms
was used to extract the speech signal samples. The level (dB) was then defined as 20
times the base-10 logarithm of the RMS value in a frequency band, relative to the maximum
amplitude allowed by 16-bit quantisation (32767).
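The power was expressed as RMS following the equation:
Equation 3-2
$$\mathrm{RMS}(F_x \rightarrow F_y) = 20\log_{10}\left(\frac{1}{32767}\sqrt{\frac{2}{N^{2}}\sum_{n=F_x}^{F_y}\left|F(n)\right|^{2}}\right) \quad \text{(dB)}$$
where: N is the number of samples in a frame;
$F_x$ is the start frequency of the frequency band;
$F_y$ is the end frequency of the frequency band;
F(n) is the short-term Discrete Time Fourier Transformed spectrum of the windowed
speech segment, summed over the spectral components between $F_x$ and $F_y$.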
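A band-limited RMS level of this kind can be computed from a single Hamming-windowed frame. This is a sketch, not the implementation from footnote 10: `band_rms_db` is a hypothetical name, the spectral components in the band are selected by their bin frequencies, and no correction is applied for the energy loss of the Hamming window (none is mentioned in the text). The overall intensity (OI) corresponds to the same calculation over the whole spectrum.

```python
import numpy as np

FULL_SCALE = 32767.0  # maximum amplitude of 16-bit PCM


def band_rms_db(frame, fs, f_lo, f_hi):
    """RMS level (dB re 16-bit full scale) of a Hamming-windowed frame between f_lo and f_hi Hz."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hamming(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    # One-sided Parseval sum: 2/N^2 * sum |F(n)|^2 gives the mean-square value.
    mean_square = 2.0 * np.sum(np.abs(spectrum[in_band]) ** 2) / n ** 2
    return 20.0 * np.log10(np.sqrt(mean_square) / FULL_SCALE)
```

For a pure 1000 Hz tone in a 23 ms frame at 22050 Hz, the level in an 850 to 1150 Hz band is practically equal to the overall level, while a band far from the tone is several tens of decibels lower.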
In Step 2 (Figure 3-10), the RMS power values (dB) were extracted for each of the
three bands, and from the values the means were calculated throughout the steady part of
the vowel. Overall intensity was measured and averaged in the same way as in specific
spectral bands, except that the RMS-power measurements covered all spectrum
frequencies.
3.6.3.4
Fundamental Frequency Analysis
Fundamental frequency, f0 (Hz), was estimated with the speech analysis package
WaveSurfer (Sjölander & Beskow, 2000), using the ESPS (XWaves) pitch analysis
method. Since both child and adult female speech can have high fundamental frequencies,
we employed the same broad parameters for both groups.
10
The algorithm was kindly implemented by Peter Rutten, a speech signal-processing engineer working in
text-to-speech technologies.
The f0 was calculated in frames of 10 ms, with a minimal pitch threshold of 50 Hz,
and a maximum pitch threshold of 1500 Hz. The f0 (Hz) values were then extracted for
three positions in the vowel (with reference to the total vowel duration, rather than to that
of the steady state):
(1) for the frame corresponding to the onset of the vowel duration + 10 ms
(2) for the frame in the middle of the vowel
(3) for the last frame in the vowel.
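Sampling the 10 ms f0 track at the three positions can be sketched as below. The function name is hypothetical; matching each target time to the nearest frame, and treating an f0 value of 0 as an unvoiced frame (returned as `None`), are assumptions about the pitch tracker output rather than details given in the text.

```python
def f0_at_three_points(frame_times_ms, f0_hz, onset_ms, offset_ms, after_onset_ms=10.0):
    """Return f0 at (onset + 10 ms, mid-vowel, last frame in the vowel).

    Positions refer to the whole annotated vowel, not to the steady state.
    An f0 <= 0 (assumed to mark an unvoiced frame) is returned as None.
    """
    def nearest(target_ms):
        idx = min(range(len(frame_times_ms)),
                  key=lambda k: abs(frame_times_ms[k] - target_ms))
        return f0_hz[idx]

    inside = [k for k, t in enumerate(frame_times_ms) if onset_ms <= t <= offset_ms]
    last = f0_hz[inside[-1]] if inside else None
    picks = (nearest(onset_ms + after_onset_ms),
             nearest((onset_ms + offset_ms) / 2.0),
             last)
    return tuple(None if v is None or v <= 0 else v for v in picks)
```

Taking the first value 10 ms after onset keeps it clear of the consonant-vowel boundary, where pitch trackers often return unreliable values.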
3.6.4 Data Validation and Normalisation
3.6.4.1
Validation of Phonetic Labels
We validated the auditory phonetic labelling by testing intra- and intertranscriber
reliability. Both tests were based on 5% of the child data, pseudo-randomly chosen from
the data covering two age samples in both languages and from all the child control data
available at the time of testing. A total of 143 utterances from six subjects were selected.
Both tests followed the labelling criteria specified in Section 3.6.2.1, except that only
vowels, and no surrounding consonants, were annotated. The transcribers were informed of
the target words collected in this study, since the author had also been aware of them
during the data annotation process.
The agreement between labellings in both tests was measured statistically. Cohen's
kappa tests pairwise agreement between transcribers against the degree of agreement
expected by chance, and is therefore considered a better indication of intra- and
intertranscriber reliability than percentage agreement (Fleiss, 1971). Pairwise agreement
with a kappa value greater than 0.7 (K > 0.7) is considered statistically satisfactory.
Since kappa is an overall index of agreement and does not indicate sources of
disagreement, these are reported separately.
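For reference, the standard two-rater Cohen's kappa, K = (p_o - p_e) / (1 - p_e), compares observed agreement p_o with the agreement p_e expected from each transcriber's label frequencies. The sketch below is a generic implementation with a hypothetical function name, not the software used for the reliability tests; the example labels are invented SAMPA-style strings.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two transcriptions of the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each label's marginal proportions, summed.
    expected = sum(count_a[lab] * count_b[lab] for lab in count_a) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined: both used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, two transcriptions of four tokens that differ in one close rounded vowel (["i", "i", "u", "u"] against ["i", "i", "u", "U"]) agree 75% of the time, but kappa is only 0.6 once chance agreement of 0.375 is discounted.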
In the intratranscriber reliability test, we re-labelled the vowels in the selected 143
utterances. The time between the original labelling and the re-labelling ranged from
2 to 8 months. The overall agreement for phonetic labels was above 0.7
(kappa = 0.767; n = 143), which is a statistically satisfactory pairwise agreement. The major
sources of disagreement were the labels for close rounded vowels, specifically the
phonetic labels [u],[],[] realised in place of the adult target // or /u/. While it was not
difficult to decide that a given token belonged to the close rounded vowel space as such, it
was not always obvious which precise label to assign in this variable child vowel space.
In the intertranscriber reliability test, two phonetically trained transcribers other than
the author annotated the same set of 143 utterances. Both transcribers were native
speakers of English: transcriber A of American English, and transcriber B of SSE.
Agreement with transcriber A fell below the 0.7 criterion (K = 0.611, n = 143), although
this value still indicates a relatively high level of agreement. Agreement with transcriber B
was statistically satisfactory (K = 0.767, n = 143). As in the intratranscriber reliability
test, the major source of disagreement was the labelling of close rounded vowels [, u, ].
Since intratranscriber agreement was satisfactory, as was intertranscriber agreement
with transcriber B (a native speaker of SSE), we consider the overall quality of the
auditory phonetic analyses performed in this study satisfactory for further statistical
analyses.
3.6.4.2
Validation of Estimated Formant Frequencies
3.6.4.2.1
Introduction
The method of inferring vocal effort (Sluijter & van Heuven, 1996b) used in this
study relies on accurate estimation of formant frequencies. We used two different
standard methods of formant frequency estimation (LPC autocorrelation for adults and
LPC burg for children), and assessed the performance of the formant analysis for each
subset of data separately.
For adult recordings, the LPC autocorrelation method performed better than the
LPC burg method in estimating the Russian close back vowel [u], and performed similarly
for other vowels. We checked this by selective manual examination of the automatic output
and by visual inspection of the corresponding FFT spectra. Based on this observation, we
chose the autocorrelation method for the adult data.
For child speech, the LPC burg method performed substantially better than LPC
autocorrelation; the performance test is described in Section 3.6.4.2.3 alongside the
validation.
3.6.4.2.2
Adults
Previous studies of vowel formant frequencies in SSBE, SSE and MSR allow us to
judge whether the adult vowel formant frequencies are plausibly estimated in this study.
For this comparison, we selected two studies of Russian vowel formants offering typical
values for one male (Fant, 1960) and one female speaker (Bondarko, 1981). For SSBE,
Wells (1962) studied 25 male subjects, whereas Deterding (1997) made acoustic
measurements of five female SSBE BBC broadcasters from the 1980s. For SSE, Walker
(1992) reported formant frequencies of all vowel monophthongs. She analysed F1 to F3
for five female SSE speakers from Edinburgh. Additionally for SSE, we refer to the data
on Glaswegian female speakers in Scobbie et al. (1999b).
Table 3-8 compares formant estimations (in Hz) between these acoustic studies for
different languages and our own measurements using LPC autocorrelation method
described in Section 3.6.3.2. The General American data and male data are useful as
additional references when female cross-study estimations are in conflict.
For the adult data, we discuss all language-specific discrepancies larger than 150 Hz
between our data and the different sources for female speakers listed in Table 3-8. All the
language-specific female data agree on F1 to F3 for the vowels [i] and []. The
measurements are very similar across sources for all the languages in the table.
Russian formant frequencies seem to have been reasonably estimated, given that there is a
gender difference between Fant (1960) and this study, so that lower formant values can be
expected in Fant’s male data. The difference of 315 Hz in Russian back [u] for female
data is acceptable, since it falls within possible F2 ranges reported from different sources
by Bondarko (1981).
For SSE [], there is a large discrepancy of 401 Hz in F2 between this study and
Walker. For SSBE [u], there is an even larger discrepancy of 610 Hz in F2 between this
study and Deterding. For SSBE [], there is a discrepancy of 152 Hz in F2 between this
study and Deterding. There are methodological differences across the studies. Walker and
Wells both performed manual measurements from FFT spectra, whereas Deterding and this
study both used automatic LPC-based methods, and yet these two show a large discrepancy.
Although different estimation methods are bound to yield somewhat different
measurements for the same tokens, they are unlikely to explain such large differences,
since most sources in Table 3-8 agree more closely for the formants of [i] and [].
It is also possible that the studies in Table 3-8 reported reasonable estimations of F2
for the close rounded vowels, and the big differences are due to diachronic phonetic
changes in close rounded vowel quality. There are several reports (Gimson, 1962; Bauer,
1985) of a rapid diachronic change in the quality of RP /u/ since Wells’s data (1962) were
sampled. A recent study by Hawkins and Midgley (2004) confirms the acoustic fronting of
[u] for a group of 20 male RP speakers recorded in 2001, especially in the younger age
group. Deterding (1997) based his measurements on a corpus of speech from BBC World
Service newsreaders recorded in the 1980s. The estimated mean F2 for /u/ in Bauer’s
(1985) 1982 data was 1704 Hz, which is 369 Hz higher than the mean reported by
Deterding (1997). Our F2 means for SSBE /u/ are 610 Hz higher than Deterding’s (1997).
This suggests that in today’s SSBE the traditionally described close back rounded vowel
has evolved into a more central or even front rounded vowel.
It is less certain whether a similar phonetic change (a 401 Hz difference in F2 of [])
may have happened in SSE, since the time span between the different sources is only 12
years at most. Data in this study and data reported in Scobbie et al. (1999b) are in
agreement, while Walker’s (1992) study disagrees with both. Since our method has shown
no problems estimating low F2 in back vowels, we conclude that our estimation of SSE
formant frequencies is acceptable here as well.
Table 3-8 A comparison of different acoustic studies of formant frequencies (Hz), estimated for adult native
speakers of SSBE, SSE, MSR and General American.
IPA
symbol
SSBE
n/Sex
[i]
Wls
25M
285
2373
3088
356
2098
2696
MSR
Detr
5F
303
2654
3203
384
2174
2962
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1 309 328
F2 939 1437
F3 2320 2674
F1 376 410
F2 950 1310
F3 2440 2697
[]
[]
[u]
[]
SSE
Gord. Bond. Fant Gord. Sc. Wlk. Gord.
3F
1F
1M
4F
F
5F
5F
399.4 300 222 383.4 412 343 376.7
2681 2620 2240 2708 2749 2689 2725.6
3170
*
3140 3220
* 3327 3364.6
*
531 545.1
507.9
* 2255 2110.4
2178
* 3009 3027.9
2976
445 411 393.6
1977 1576 1992.9
* 2727 2818.2
403.9 320 231 405.5
2047 620 730 935.4
2911
*
2230 2889
488.2
1462
2886
Gen.Am.
H&al
48F
437
2761
3372
483
2365
3053
459
1105
2735
519
1225
2827
References for the sources and explanations for the codes used in Table 3-8:
Gord. (this study)
Wls
(Wells, 1962)
Detr
(Deterding, 1997)
Bond. (Bondarko, 1981)
Fant
(Fant, 1960)
Wlk
(Walker, 1992)
Sc.
(Scobbie et al., 1999b)
H & al (Hillenbrand et al., 1995)
*
not available
F
Female
M
Male
3.6.4.2.3
Children
We validated the estimated formant frequencies for child speech by manually re-annotating
vowel formant frequencies for a subset of child speech. The subset included the same 143
pseudo-randomly selected utterances that were used for measuring intra- and
intertranscriber reliability (Section 3.6.4.1). All manual annotation of formants was
performed in PRAAT (Boersma & Weenink, 2004). F1 to F3 frequencies (Hz) were
measured directly from FFT spectra (Hamming window, frequency range 0 to 7000 Hz,
window length of 25 ms, and dynamic range varying from -45 dB to -20 dB depending on
the signal quality) and additionally from the corresponding spectral slices.
The vowel formant values from the manual annotation were then compared with the
automatic output of two formant methods, LPC burg and LPC autocorrelation. The
comparison was made by calculating the RMS error (Hz) for each formant separately. The
input parameters of the LPC burg method were the same as those for children described in
Section 3.6.3.2. The parameters for the autocorrelation method were the same as for adults,
except that for child speech the downsampling frequency was 12000 Hz. Additionally, to
test the efficacy of the noise reduction component of the formant analysis procedure, for
each LPC method we included the outputs both from the acoustic signal with subtracted
noise and from the signal with the original noise level. In total, we thus calculated RMS
errors of formant estimation (Hz) for four methods: (1) LPC burg with subtracted noise;
(2) LPC burg with the original recording noise level; (3) LPC autocorrelation with
subtracted noise; (4) LPC autocorrelation with the original noise level.
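The comparison reduces to an RMS error per formant and per method, plus a relative improvement for the noise-subtraction comparison. The helper names below are hypothetical and the numbers in the usage note are invented for illustration; they are not the thesis results.

```python
import math


def rms_error(manual_hz, automatic_hz):
    """Root-mean-square difference (Hz) between manual and automatic formant values."""
    if len(manual_hz) != len(automatic_hz) or not manual_hz:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(m - a) ** 2 for m, a in zip(manual_hz, automatic_hz)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(error_original, error_reduced):
    """Fractional reduction in RMS error after noise subtraction."""
    return (error_original - error_reduced) / error_original


def best_method(errors_by_method):
    """Name of the method with the smallest RMS error."""
    return min(errors_by_method, key=errors_by_method.get)
```

For instance, an F2 error falling from 200 Hz to 166 Hz after noise subtraction would be a relative improvement of 0.17, i.e. 17%.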
The results of this test are presented in Figure 3-11. In terms of the smallest RMS
error (Hz), the LPC burg method with subtracted noise outperformed the other three
methods for all measured formants (F1 to F3). Compared with the manual annotation, the
autocorrelation method performed much worse than burg, especially for F2 and F3. For
all three formants, applying noise subtraction improved the performance of the automatic
LPC formant tracking compared with the output from the signal with the original noise
levels. The improvement was largest for F2, where both methods (burg and
autocorrelation) improved by 17%. This test supports (1) the use of the LPC (burg)
method for the formant estimation of child speech in our study, since it gives a more
realistic estimate of the formant frequencies than LPC (autocorrelation); and (2) the use of
the noise subtraction procedure to improve the overall formant estimation.
[Figure 3-11 plot: RMS error (Hz, 0 to 900) by formant number (F1, F2, F3) for LPC Burg (reduced noise), LPC Burg (original noise), LPC Autocorrelation (reduced noise) and LPC Autocorrelation (original noise).]
Figure 3-11 RMS errors (F1 to F3, Hz) for four automatic formant analysis methods as compared to manual
formant measurements from FFT spectra.
3.6.4.3
Normalisation of RMS-Power Measurements
In section 3.6.3.3 we described the acoustic method used to measure spectral
balance. However, raw RMS-power measurements around formants cannot be used
without normalisation for other (non-)linguistic effects, such as variation of overall
intensity and formant frequency shifts between different instances of the same vowel.
Differences in the overall intensity of a vowel result from several confounding
speaker-related effects, such as the amount of vocal effort exerted and differences in
vowel quality. They also result from environmental factors such as the distance of the
subject from the microphone, speaking volume, recorder volume settings and environmental
noise. An appropriate normalisation procedure should separate the speaker effects from
the environmental effects. The measures undertaken to reduce environmental noise were
discussed in Section 3.5.1.2.
To normalise for the residual differences in overall intensity, we chose what Jessen (2002) calls an ‘intrinsic normalisation method’. An intrinsic normalisation expresses a relationship between two different acoustic measures from the same token. Extrinsic methods, in contrast, compare a measure in one token to the same measure in some other related token (e.g. overall intensity in the stressed and unstressed syllables of the same word). An extrinsic normalisation method is ill suited to this study: the data were gathered in spontaneous play situations, and the elicited phrase structure varied from utterance to utterance, so no single comparison point could be found for all the utterances.
The first step involved normalisation for differences in overall intensity. The spectral band level of each vowel token was expressed as the ratio of the RMS-power in a specific frequency band to the overall RMS-power of the same token. Since both levels are in dB, this ratio is a difference, as shown in:
Equation 3-3
$$A_i = \mathrm{RMSB}_i - \mathrm{OI} \quad \text{(dB)}$$
where OI is the measured overall RMS-power (dB) averaged over the steady state of the vowel, and RMSB_i is the RMS-power (dB) in frequency band i over the same part of the vowel. This ratio was calculated for each of the formant frequency bands of the vowel and was taken as input for the next normalisation step.
The next step involved normalisation for differences in the vocal tract resonance
frequencies. The differences are a result of intra- or interspeaker variation in
supralaryngeal settings for a given vowel rather than in vocal effort. This problem has
been addressed in studies dealing with acoustic correlates of stress (Sluijter & van
Heuven, 1996b), ‘syllable-cut’ prosody (Jessen, 2002) and voice quality (Hanson, 1997).
According to the acoustic theory of speech production (Fant, 1960), formant
frequencies and intensity levels of the spectrum are interrelated, following the ‘low-pass
filter’ rule:
shift in the frequency Fn of a formant brings about an intensity level change of the
sound, which is mainly confined to frequencies above Fn and amounts to +12 dB for
an increase of one octave in Fn (Fant, 1960, p. 58)
This means that the same speaker can produce two instances of the same vowel with slightly different articulatory settings (e.g. one with a more open articulation). This supralaryngeal difference results in different vocal tract resonances in the radiated spectrum (a higher F1 in our example). It consequently results in somewhat higher intensity levels at frequencies above F1, as specified by the ‘low-pass filter’ rule.
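The coefficient 40 in the correction formula of Equation 3-4 is consistent with this rule. As a worked check, assume an idealised slope of exactly +12 dB per octave of formant shift. The level change can then be rewritten with base-10 logarithms, where F_n is the original and F_n' the shifted formant frequency (notation introduced here for illustration only):

```latex
\Delta L \approx 12\,\log_{2}\!\left(\frac{F_n'}{F_n}\right)
        = \frac{12}{\log_{10} 2}\,\log_{10}\!\left(\frac{F_n'}{F_n}\right)
        \approx 39.9\,\log_{10}\!\left(\frac{F_n'}{F_n}\right)\ \text{dB},
\qquad
F_n' = 2F_n:\quad 40\log_{10} 2 \approx 12.04\ \text{dB}
```

Rounding 39.9 to 40 gives the 40 log10 terms that appear in the correction.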
An appropriate normalisation procedure should thus separate the laryngeal effects from the supralaryngeal ones. To achieve this, we followed the normalisation method for formant frequency shifts described in Jessen (2002), which itself draws on earlier work (Sluijter & van Heuven, 1996b; Hanson, 1997). Each formant measurement (F1 to F3, Hz) for an instance of a vowel phoneme is compared to the formant means for this vowel across speakers. The residual formant frequency difference (Hz) is then transformed into an intensity difference (dB), which is subtracted from the measured RMS-power (dB) around the formant. The method enables comparison between different articulatory realisations of the same phoneme, and between speakers.
However, in this study we had to adapt the normalisation to enable crosslinguistic comparison. Therefore, the formant estimates for each vowel token were compared to the formant means for this vowel phoneme across languages and speakers.
Originally we intended to apply the normalisation for formant frequency shifts to both F2 and F3. However, following the validation of formant estimates in child speech (Section 3.6.4.2.3), we decided to exclude the spectral balance data around F3 as unreliable. Formant shift normalisation at F3 requires subtracting the joint contribution of F1, F2 and F3 (Hanson, 1997). We have seen that the RMS error in the estimation of F2 and F3 was on average 300 Hz for either formant (Figure 3-11), so the cumulative error would be greater if both F2 and F3 entered the correction. We therefore report only the spectral balance measurement around F2, which requires subtracting the joint contribution of F1 and F2 only (Sluijter & van Heuven, 1996b, p. 2478).
We first calculated the correction factors for intensity level differences that are due to formant frequency differences. The mean F1 and F2 frequencies (Hz) were computed for each target phoneme across languages and speakers. They were computed separately for adults and children because of the large differences in their vocal tract length and resonances. Equation 3-4 (Sluijter & van Heuven, 1996b, p. 2478) gives the correction factor for intensity level differences in A2 due to shifts in F1 and F2 relative to the mean F1 and F2.
Equation 3-4
$$\Delta A2_{a} = 40\log_{10}\left(\frac{F1_{n}}{F1}\right) - 40\log_{10}\left(\frac{F2_{n}^{2} - F1_{n}^{2}}{F2^{2} - F1^{2}}\right) \quad \text{(dB)}$$
where F1 and F2 are estimates of the first and second resonance frequencies (Hz) for each vowel token. F1n and F2n (Hz) are the mean frequencies of each formant averaged across speakers (adults and children separately) for each phoneme in each language separately.
The correction factor ΔA2_a (Equation 3-4) was then subtracted from the result of Equation 3-3, as in:
Equation 3-5
$$A2^{*}_{a} = A2 - \Delta A2_{a} \quad \text{(dB)}$$
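Equations 3-3 to 3-5 can be sketched in Python as below. This is a minimal transcription of the formulas as printed. It assumes that the band and overall RMS-power levels (dB) and the formant frequencies (Hz) have already been measured. The function names and example values are hypothetical. Passing differently pooled reference means as `f1_ref` and `f2_ref` gives the other corrected variants of A2.

```python
import math


def band_level(rmsb_db, overall_db):
    """Equation 3-3: A_i = RMSB_i - OI, with both levels in dB."""
    return rmsb_db - overall_db


def delta_a2(f1, f2, f1_ref, f2_ref):
    """Equation 3-4 as printed: correction (dB) for A2, given a token's F1 and F2 (Hz)
    and the reference means F1n and F2n (Hz)."""
    if not (0 < f1 < f2 and 0 < f1_ref < f2_ref):
        raise ValueError("expected 0 < F1 < F2 for both token and reference")
    return (40 * math.log10(f1_ref / f1)
            - 40 * math.log10((f2_ref ** 2 - f1_ref ** 2) / (f2 ** 2 - f1 ** 2)))


def a2_star(a2, f1, f2, f1_ref, f2_ref):
    """Equation 3-5: A2* = A2 - delta A2 (other variants differ only in the reference means)."""
    return a2 - delta_a2(f1, f2, f1_ref, f2_ref)


# Hypothetical token: band-2 level 52 dB, overall level 70 dB.
a2 = band_level(52.0, 70.0)
print(a2_star(a2, 400.0, 2200.0, 400.0, 2200.0))  # formants at the means: no correction -> -18.0
```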
Additionally, to allow comparisons between vowels that differ in quality, we calculated further correction factors with Equation 3-4. For these, F1n and F2n (Hz) were the means of each formant averaged across speakers (adults and children separately) and languages: (1) jointly for the unrounded vowels /i/ and /ɪ/, giving the correction factor ΔA2_b; and (2) jointly for the rounded vowels /ʉ/ and /u/, giving the correction factor ΔA2_c.
The correction factors were then subtracted from the band-specific result of
Equation 3-3 resulting in two additional normalisations:
Equation 3-6
$$A2^{*}_{b} = A2 - \Delta A2_{b} \quad \text{(dB)}$$
Equation 3-7
$$A2^{*}_{c} = A2 - \Delta A2_{c} \quad \text{(dB)}$$
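The reference means F1n and F2n that feed Equation 3-4 differ only in how tokens are pooled. They are pooled per phoneme for ΔA2_a, over /i/ and /ɪ/ for ΔA2_b, and over /ʉ/ and /u/ for ΔA2_c, always separately for children and adults. Below is a minimal sketch that assumes a simple per-token record. The record layout, `POOLS` and `reference_means` are hypothetical. The sketch pools languages together, following the description of the means for A2*b and A2*c. Keying additionally by language would only require adding the language to the dictionary key.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pooled sets following the text: b = unrounded /i/ + /ɪ/, c = rounded /ʉ/ + /u/.
POOLS = {"b": {"i", "ɪ"}, "c": {"ʉ", "u"}}


def reference_means(tokens, scheme):
    """Mean (F1, F2) in Hz per (speaker group, key), pooling all languages.

    tokens: dicts with keys 'group' ('child' or 'adult'), 'phoneme', 'f1', 'f2'.
    scheme: 'a' keys by phoneme; 'b' or 'c' pools the phonemes listed in POOLS.
    """
    pooled = defaultdict(list)
    for tok in tokens:
        if scheme == "a":
            key = tok["phoneme"]
        elif tok["phoneme"] in POOLS[scheme]:
            key = scheme
        else:
            continue  # phoneme outside this pooled set
        pooled[(tok["group"], key)].append((tok["f1"], tok["f2"]))
    return {k: (mean(f1 for f1, _ in v), mean(f2 for _, f2 in v))
            for k, v in pooled.items()}


# Hypothetical tokens (Hz), not data from this study.
tokens = [
    {"group": "child", "phoneme": "i", "f1": 300.0, "f2": 3000.0},
    {"group": "child", "phoneme": "ɪ", "f1": 500.0, "f2": 2600.0},
    {"group": "child", "phoneme": "u", "f1": 400.0, "f2": 1000.0},
    {"group": "adult", "phoneme": "i", "f1": 280.0, "f2": 2300.0},
]
print(reference_means(tokens, "b"))
```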
To summarise, four normalised measures of spectral balance (A2, A2*a, A2*b, A2*c) were used in this study. All of them are normalised for differences in overall intensity, and the normalisation was carried out separately for child and adult speakers. In addition, to allow crosslinguistic comparison of vowels similar in quality, A2*a normalises for supralaryngeal differences in F2 within each of the phonemes /i/, /ɪ/, /u/, // or // for each speaker group (children or adults), across languages. To allow comparison across languages and between vowels that differ in quality, A2*b normalises across the phonemes /ɪ/ and /i/ within each speaker group, across languages. A2*c normalises for the differences across /u/, // and // within each speaker group, across languages.
4 Acquisition of Vowel Quality
4.1 Introduction
This chapter investigates bilingual patterns of vowel quality. The subjects are two bilingual children, AN and BS, acquiring Russian and Scottish English. The two girls differ in their bilingual input conditions: BS is a Russian-dominant bilingual, while AN has had nearly equal amounts of input in the two languages. Both girls had “early, simultaneous, regular, and continued exposure to more than one language” (de Houwer, 1995, p. 222) from before the age of two.
The vowels investigated fall into two categories: (1) the close(-mid) unrounded vowel /i/ in Russian versus SSE /i/ and /ɪ/; and (2) the MSR close rounded vowel /u/ versus the SSE close central rounded vowel /ʉ/. The former group involves a systemic crosslinguistic difference. The latter involves a realizational one, i.e. one that is equally ambiguous from the point of view of either language. The crosslinguistic differences in vowel quality were addressed in Section 2.1.3. The chapter is built around two sets of questions:
(1) Does each of the bilingual children have a differentiated control of vowel quality in their two languages? Does this control change longitudinally?
(2) Is there any language interaction? What are the patterns? Can the direction of interaction be explained by the amount of language input or intralanguage factors such as ‘markedness’ or ‘cue strength’?
The aim of this and the following chapters is to account for bilingual language differentiation and interaction patterns for these research variables. We test these patterns against two views on language interaction in bilingual language acquisition, which were, admittedly, formulated for morpho-syntactic development. The Language Dominance Hypothesis (Petersen, 1988) claims that language dominance determines the direction of language interaction in young simultaneous bilinguals: i.e. transfer is unidirectional towards the less dominant language. By contrast, the Cross-language-Competition Hypothesis (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998) both claim that linguistic structure and its complexity (relative ‘cue strength’ or ‘markedness’) determine the direction of interaction. Under this view, language interaction for a feature is unidirectional towards the language with the more ambiguous (marked) structure.
Furthermore, we provide new SSE monolingual data on the state of acquisition of these vowels at the ages of 3;4 to 4;9.
Monolingual results (adults for Russian; adults and children for SSE) are presented first. The bilingual results for each subject (BS and AN) are then compared to the monolingual data. In particular, the subjects’ phonetic ranges of realisations of adult targets in each vowel category are compared to the ranges of the monolingual controls (both children and adults), in order to tease apart the effects of bilinguality and speech immaturity. Then, where possible, we compare each subject’s ranges in SSE to those in her Russian monolingual language mode to address the question of language differentiation. Longitudinal aspects are examined separately. From all these analyses we attempt to derive answers to the research questions above.
4.2 Statistical Analysis
The phonetic variation between the bilinguals’ two languages, and between bilingual and monolingual children, is compared by means of non-parametric statistics. The phonetic range for each vowel category is defined by the frequencies of the phonetic labels assigned to tokens of an adult target phoneme. Chi-square (χ2) tests are appropriate to determine whether such distributional differences in phonetic realisation are significant (at the 95% confidence level). The tests were performed only for those subsets of data that fulfilled the validity requirements for expected frequencies.
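The chi-square tests in this chapter compare label counts in contingency tables. Below is a minimal stdlib-only Python sketch. It assumes the uncorrected Pearson statistic, and under that assumption it reproduces the χ2 values reported for Tables 4-5, 4-8 and 4-9. The function names are hypothetical. The minimum expected count of 5 in `valid_expected` is a common rule of thumb, not necessarily the exact criterion used in this study.

```python
# Standard chi-square critical values at alpha = .05 for small df.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square(table):
    """Pearson chi-square test of association (no continuity correction).

    table: list of rows of observed counts (r x c).
    Returns (statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one token")
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def valid_expected(expected, minimum=5.0):
    """Rule-of-thumb validity check (assumed): every expected count >= minimum."""
    return all(e >= minimum for row in expected for e in row)


# Table 4-5: rows SSE, MSR; columns [i], [ɪ] for target /i/.
stat, df, exp = chi_square([[378, 7], [198, 22]])
print(round(stat, 3), df, valid_expected(exp), stat > CRITICAL_05[df])  # 20.536 1 True True
```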
4.3 Acquisition of Vowel Quality
4.3.1 Scottish English Monolingual Results
4.3.1.1 Acquisition of close(-mid) unrounded vowels
In Section 2.2.1.1 we reviewed the evidence that the SSE vowel /ɪ/ is acquired later than /i/, and that at the age of 3;0 SSE-speaking children produce the lax vowel with a relatively low accuracy rate compared to other vowels (Matthews, 2002). This section explores monolingual production patterns at ages 3;4 to 4;9.
As we discussed in Section 3.3.1, the control sample of SSE monolingual children in this study consisted of seven children (aged 3;4 to 4;9). Three of the children were recorded twice longitudinally, giving a total of ten control recordings. In the following graphs the data of all children (including the longitudinal cases) are plotted by age.
In this section we explore three questions. First, has the accuracy rate for the SSE vowel /ɪ/ reported in Matthews (2002) improved in our age group of SSE-speaking children? Second, how does the phonetic range of /i/ relate to that of /ɪ/? Third, do the children still differ from adults in their range of phonetic variants?
With regard to the first question, Matthews’ (2002) data show that the total percentage of /ɪ/ produced as adult-like [ɪ] in the longitudinal data of his seven subjects aged 1;10 to 2;10 is only 54.5% (n = 521). The percentages of adult-like [ɪ] for /ɪ/ for each subject at the age of 2;5 to 2;10 ranged from 11 to 93%. The lax vowel belonged to the ‘difficult’ category.¹¹
The phonetic range of // for our SSE child control data is presented in Table 4-1
and in Figure 4-1. The results show that the overall percentage of adult-like realisations of
// in our group has increased to a total of 98.9%. Interestingly the overall 0.9% of all non-
adult-like forms is contributed by tongue raising ([i] for //).
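The phonetic range of a target is the percentage distribution of the labels assigned to its tokens, and the adult-like rate is the share of the adult label. Below is a minimal sketch that assumes the label counts have already been tallied; the function name is hypothetical. Applied to the totals row of Table 4-1, it reproduces the reported percentages.

```python
def phonetic_range(counts):
    """Percentage of tokens per phonetic label for one adult target phoneme.

    counts: mapping from label to number of tokens; percentages rounded to 0.1.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens for this target")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Totals row of Table 4-1 (target /ɪ/, SSE monolingual children).
print(phonetic_range({"[i]": 6, "[ɪ]": 652, "[ʉ]": 1}))
# {'[i]': 0.9, '[ɪ]': 98.9, '[ʉ]': 0.2}
```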
Among all the children, only C3 (aged 3;11) produced an instance of /ɪ/ (1.4%) as [ʉ], with lip rounding and tongue raising. The cases of tongue raising to [i] (0.9%) came from two children, C8 (aged 4;2) and C9 (aged 4;9). C9 was the oldest child in our group. Overall, the results show that segmentally /ɪ/ is acquired by the age of 3;4 in monolingual development. The limited number of non-adult-like realisations is likely to be a sign of residual speech immaturity rather than a systematic developmental property.
As expected, the overall number of non-adult-like realisations for the vowel /i/ is very low (0.5%); the results per child are summarised in Appendix A. Only one non-adult variant was involved: [ɪ] for /i/.
¹¹ Matthews’ (2002) subdivision into “difficult” and “easy” refers to standard deviations in the number of all individual target vowels for a given session per speaker. We cannot use the same criterion for our comparison, since our data involve only a limited subset of the whole vowel space. Here and in the following sections on the rounded vowels, we calculated percentages comparable to our own data for each individual vowel from the raw data set kindly provided to us by Ben Matthews.
[Figure: stacked bars of % tokens per subject (0 to 100%) for the labels [i], [ɪ] and [ʉ]; subjects C3_3;4 to C9_4;9 on the horizontal axis, ordered by age]
Figure 4-1 Phonetic range of variation in the production of the lax vowel /ɪ/ by SSE monolingual children (plotted by age on the horizontal axis)
Table 4-1 Phonetic ranges of adult target /ɪ/ produced by SSE monolingual children (tokens per speaker: N and %)

Speaker    [i]         [ɪ]            [ʉ]         Total
C3_3;4     0 (0.0%)    84 (100.0%)    0 (0.0%)    84 (100.0%)
C7_4;2     0 (0.0%)    46 (100.0%)    0 (0.0%)    46 (100.0%)
C4_3;8     0 (0.0%)    28 (100.0%)    0 (0.0%)    28 (100.0%)
C3_3;11    0 (0.0%)    72 (98.6%)     1 (1.4%)    73 (100.0%)
C6_4;0     0 (0.0%)    56 (100.0%)    0 (0.0%)    56 (100.0%)
C8_4;2     1 (1.1%)    92 (98.9%)     0 (0.0%)    93 (100.0%)
C5_4;0     0 (0.0%)    73 (100.0%)    0 (0.0%)    73 (100.0%)
C4_4;1     0 (0.0%)    45 (100.0%)    0 (0.0%)    45 (100.0%)
C9_4;9     5 (6.2%)    76 (93.8%)     0 (0.0%)    81 (100.0%)
C7_4;8     0 (0.0%)    80 (100.0%)    0 (0.0%)    80 (100.0%)
Total      6 (0.9%)    652 (98.9%)    1 (0.2%)    659 (100.0%)
With regard to the second question, how the proportion of non-adult-like realisations for the target /i/ compares to that for /ɪ/, the answer is summarised in Table 4-2. The low number of non-adult-like forms did not allow us to perform valid chi-square tests.¹² The low overall rates of non-adult-like realisations (0.5% for /i/ versus 0.9% for /ɪ/) show that both vowels are equally well mastered in terms of adult-like production at this stage of monolingual development. There is no difference between them in terms of ‘difficulty’ at this point.
Table 4-2 Frequencies of adult and non-adult like realisations of /i/ and /ɪ/ for SSE monolingual children (aged 3;4 to 4;9)

Target vowel   Adult-like: No   Adult-like: Yes   Total
/i/            4 (0.5%)         780 (99.5%)       784 (100.0%)
/ɪ/            6 (0.9%)         652 (99.1%)       658 (100.0%)
Total          10 (0.7%)        1432 (99.3%)      1442 (100.0%)
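The expected counts show why no valid test was possible for Table 4-2. Assume the common rule of thumb that every expected count in a 2x2 table should be at least 5 (the exact criterion is not spelled out here). Under independence, the expected number of non-adult-like /ɪ/ tokens already falls below that threshold. Here n_r· and n_·c are the row and column totals and N is the total number of tokens:

```latex
E_{rc} = \frac{n_{r\cdot}\,n_{\cdot c}}{N},\qquad
E_{\text{/ɪ/, No}} = \frac{658 \times 10}{1442} \approx 4.56 < 5,\qquad
E_{\text{/i/, No}} = \frac{784 \times 10}{1442} \approx 5.44
```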
With regard to the third question, comparing the children’s phonetic variants with the adults’ in auditory terms, the only difference between the two groups is that some children marginally produced non-adult-like forms (e.g. [i] for /ɪ/ in Table 4-1), whereas the adults produced none.
4.3.1.2 Acquisition of close rounded vowels
Based on the overall numbers of adult-like and non-adult-like realisations of /ʉ/ in Matthews’ (2002) study, the vowel was labeled as ‘difficult’ compared to the rest of the SSE vowels (see also Section 2.2.1.1). This section explores phonetic ranges for this target vowel produced by the monolingual children aged 3;4 to 4;9.
We address two questions: (1) whether the substantial percentage of non-adult-like realisations of /ʉ/ reported in Matthews (2002) for children aged up to 3;0 is reduced in the production of SSE monolingual children aged 3;4 to 4;9; and (2) how the phonetic ranges of /ʉ/ produced by the SSE-speaking children relate to those of the adults in this study.
¹² This remark also applies to the following sections wherever we present contingency tables without accompanying statistical results.
With regard to the first question, Matthews’ (2002) data showed that the percentage of adult-like forms (i.e. produced as [ʉ]) for /ʉ/ in the longitudinal data of his seven subjects aged 1;10 to 2;10 was only 63.8% (n = 537 tokens). Per subject, the percentage of adult-like forms (relative to all /ʉ/ targets) at the age of 2;5 to 2;10 ranged from 25% to 93%. The vowel was therefore classified as ‘difficult’.
Our results for // for the SSE monolingual children aged 3;4 to 4;9 are presented in
Table 4-3 and in Figure 4-2. The overall percentage of adult-like realisations of // by the
children increased to a total of 83% (compared to 63.8% in Matthews’ study). However,
despite an increase of 19 percentage points of adult-like forms compared to Matthews’
data, all subjects in this study still produced a substantial number of non-adult-like
realisations ranging from 1.9 to 35.8%.
The most important contributor to non-adult-like realisations (a total of 11.8% in Table 4-3) is tongue lowering and backing, resulting in a sound like [ʊ]. /ʊ/ does not feature in adult SSE phonology. However, all monolingual children produced [ʊ] to some degree (ranging from 1.9% to 32.5%). Interestingly, 76.8% of the tokens labeled as [ʊ] come from SSE carrier words that have lax /ʊ/ in the SSBE adult model (see Appendix B).
The second most important contributor to non-adult-like realisations of /ʉ/ is the 4.3% of back realisations as [u]. The presence of this variant in the speech of the SSE-speaking children is very important for our crosslinguistic perspective, which involves the Russian back vowel /u/. It means that back realisations as [u] for SSE /ʉ/ in our bilingual children’s speech cannot in themselves be interpreted as a sign of language interaction from Russian. Instead, we need to look at the statistical significance of the differences in phonetic ranges rather than at the mere presence of a variant.
[Figure: stacked bars of % tokens per subject (0 to 100%) for the labels [i], [ɪ], [ʉ], [u] and [ʊ]; subjects C3_3;4 to C9_4;9 on the horizontal axis, ordered by age]
Figure 4-2 Phonetic range of the production of the adult target vowel /ʉ/ by SSE monolingual children (sorted by age on the horizontal axis).
Table 4-3 Phonetic range in the realisation of adult target [ʉ] by SSE monolingual children (tokens per speaker: N and %)

Speaker    [i]         [ɪ]         [ʉ]           [u]           [ʊ]           Total
C3_3;4     0 (0.0%)    0 (0.0%)    51 (66.2%)    1 (1.3%)      25 (32.5%)    77 (100.0%)
C7_4;2     1 (1.7%)    0 (0.0%)    55 (94.8%)    0 (0.0%)      2 (3.4%)      58 (100.0%)
C4_3;8     0 (0.0%)    1 (6.7%)    13 (86.7%)    0 (0.0%)      1 (6.7%)      15 (100.0%)
C3_3;11    0 (0.0%)    1 (2.1%)    46 (95.8%)    0 (0.0%)      1 (2.1%)      48 (100.0%)
C6_4;0     0 (0.0%)    0 (0.0%)    52 (78.8%)    8 (12.1%)     6 (9.1%)      66 (100.0%)
C8_4;2     0 (0.0%)    1 (1.1%)    71 (77.2%)    5 (5.4%)      15 (16.3%)    92 (100.0%)
C5_4;0     0 (0.0%)    0 (0.0%)    43 (68.3%)    11 (17.5%)    9 (14.3%)     63 (100.0%)
C4_4;1     0 (0.0%)    0 (0.0%)    29 (82.9%)    0 (0.0%)      6 (17.1%)     35 (100.0%)
C9_4;9     0 (0.0%)    0 (0.0%)    52 (98.1%)    0 (0.0%)      1 (1.9%)      53 (100.0%)
C7_4;8     0 (0.0%)    1 (1.4%)    67 (95.7%)    0 (0.0%)      2 (2.9%)      70 (100.0%)
Total      1 (0.2%)    4 (0.7%)    479 (83.0%)   25 (4.3%)     68 (11.8%)    577 (100.0%)
4.3.1.3 Summary of results for the SSE monolingual peers
For the vowel /i/, the SSE monolingual children aged 3;4 to 4;9 had no problems producing adult-like vowel quality. Overall, only 0.5% of /i/ targets were realised in a non-adult-like way, all involving lowering to [ɪ]. Only two children contributed these immature realisations. Although this pattern is not systematic, its presence in the monolingual data is worth noting, given that Russian lacks the tense/lax contrast.
For the vowel //, the SSE monolingual children seem to have resolved the
production difficulty reported in Matthews (2002). Immature realisations for this vowel
are infrequent, but nevertheless they do occur. They include fronting and raising to [i] and
lip rounding and raising to []. The occurrence of fronting and raising of // to [i] in
monolingual child speech is again important to notice, given that the same phenomenon is
frequently observed in the speech in L2-learners of languages lacking such a contrast
(Panasyuk et al., 1995; Escudero, 2000; Piske et al., 2002), and it is possible that similar
effect might manifest itself in the course of SSE acquisition in bilingual children speaking
Russian and Scottish English.
For the vowel //, the non-adult-like realisations were still systematic, in that all the
monolingual children produced them. Overall we measured 17% of non-adult-like
variants for //. The non-adult-like production involved in order of importance: lowering
and backing to [] (11.8%), backing to [u] (4.3%), lip unrounding (0.9%).
Among the tokens labeled as [ʊ], 76.8% come from English carrier words that have lax /ʊ/ in the SSBE adult model. This is a very interesting finding, given that one fourth of the MC families in Edinburgh have at least one member speaking a non-SSE English variety (Scobbie et al., 1999a), usually with the /u/-/ʊ/ distinction. Pre-school children in Edinburgh also come into contact with these non-SSE English varieties through local nurseries. The variability of their speech production might therefore also reflect the variability of the cross-varietal input they receive in the community, rather than being attributable only to speech immaturity at the age concerned.
Individual children showed different ranges of variation for each of these processes. Backing to [u] is particularly important for our crosslinguistic perspective, since it involves a variant similar to Russian /u/. Thus, to establish language differentiation in SSE and possible interaction from Russian in our bilingual subjects, we need to compare the proportions within the phonetic ranges (including [u]) rather than rely on the mere presence of [u].
4.3.2 Bilingual Acquisition
4.3.3 Subject AN
4.3.3.1 Acquisition of close unrounded vowels
4.3.3.1.1 Language differentiation
As we discussed in Sections 1.3.2.3.1, 2.1.3.3 and 2.3.2, crosslinguistic differences
in tense/lax vowel opposition, such as its presence in one language and its absence in
another, constitute a relative difficulty for L2-learners depending on the age of onset of
L2-learning (Flege et al., 1995; Guion, 2003), while simultaneous bilinguals ultimately
acquire such contrasts in a native-like fashion (Guion, 2003). We investigate whether AN
acquired this systemic difference.
The first question is whether the absence of the vowel [ɪ] in MSR affected AN’s production of the lax vowel in SSE, in terms of the frequency of the phonetic variants [i] and [ɪ] for the target /ɪ/ compared to the SSE monolingual peers.
Table 4-4 The effect of factor bilinguality of subject AN compared to the SSE monolingual peers for the production of phonetic variants [i] and [ɪ] for the target /ɪ/.

Bilingual?   [i]          [ɪ]            Total
No           6 (0.9%)     659 (99.1%)    665 (100.0%)
Yes          4 (1.3%)     303 (98.7%)    307 (100.0%)
Total        10 (1.0%)    961 (99.0%)    972 (100.0%)
The set-up of the test is presented in Table 4-4. As we discussed in Section 4.3.1.1, [i] for /ɪ/ was rather infrequent in the SSE monolingual data (produced by two children out of seven). It was nevertheless present, at a total of 0.9% of /ɪ/ targets.
AN produced 1.3% [i] for the target /ɪ/. This percentage is comparable to C8’s individual 1.1% and lower than C9’s individual 6.2% in Table 4-1. Therefore, AN acquired the SSE lax vowel in a similar way to the monolingual children, despite the absence of the vowel in Russian. The occurrence of [i] for the target /ɪ/ can be fully attributed to speech immaturity rather than to language interaction from Russian.
The second question is whether the presence of the tense/lax vowel contrast in SSE affected AN’s production of the vowel /i/ in MSR. We assess this in terms of the frequency of the phonetic variants [i] and [ɪ] for the MSR target /i/ versus the SSE /i/.
The comparison is presented in Table 4-5. There was a significant association [χ2=20.536; df=1; p<0.01] between the numbers of the phonetic labels [i] and [ɪ] for target /i/ and language, with [ɪ] for /i/ being more common in MSR (10%) than in SSE (2%).
Table 4-5 AN’s production of phonetic variants [i] and [ɪ] for the target /i/ in SSE compared to MSR (across age).

Language   [i]           [ɪ]          Total
SSE        378 (98.2%)   7 (1.8%)     385 (100.0%)
MSR        198 (90.0%)   22 (10.0%)   220 (100.0%)
Total      576 (95.2%)   29 (4.8%)    605 (100.0%)
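Assuming the uncorrected Pearson statistic, the reported value can be reproduced by hand from Table 4-5 (rows SSE and MSR; columns [i] and [ɪ]). In a 2x2 table every cell deviates from its expected count by the same amount (here |O − E| ≈ 11.45). The cells with small expected counts, i.e. the rare [ɪ] cells, therefore contribute most of the statistic:

```latex
E = \begin{pmatrix} 366.55 & 18.45 \\ 209.45 & 10.55 \end{pmatrix},\qquad
\chi^2 = \sum \frac{(O-E)^2}{E} \approx 0.358 + 7.110 + 0.626 + 12.442 \approx 20.536,\qquad
df = (2-1)(2-1) = 1
```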
As we have already discussed, the vowel quality of /i/ is similar in Russian and Scottish English, while Russian has no lax vowel /ɪ/. The direction of the significant association in this test is therefore very surprising. AN produced a greater proportion of [ɪ] for the target /i/ in MSR than in SSE, whereas we would expect the opposite, and indeed no [ɪ] in Russian at all.
The documented substitutes for the vowel /i/ in Russian children’s speech are [] or [] (Zharkova, 2002, p. 72). To our knowledge there are no maturational accounts of [ɪ] for /i/ in stressed syllables in the speech of Russian monolingual children. The unusual use of [ɪ] for /i/ by AN in Russian confirmed our native-speaker intuition that these cases sounded non-Russian, even for immature child speech.
This affected all three Russian carrier words containing target /i/: /’kit/ (n=3), /ti/ (n=15), /’fib/ (n=4). Thus, we can conclude that the presence of the vowel [ɪ] for the target /i/ in AN’s Russian speech is a sign of language interaction from SSE.
This is a surprising direction of language interaction, because AN introduced a marked, SSE-sounding vowel [ɪ] for the unmarked Russian target /i/. If the influence originated within the crosslinguistic vowel systems, this direction of interaction would contradict both CCCH (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998).
A possible explanation for this language interaction in AN’s /i/ could be the phonotactic influence of palatalisation of the preceding consonant. Suppose the effect on the vowel /i/ in AN’s speech originated in the acquisition of the consonantal system rather than of the vowels. In that case, the argument from the relative markedness of the vowel systems would be irrelevant, and we could not claim that CCCH (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998) are falsified at the level of speech production.
Phonotactically, Russian /i/ is preceded by palatalised consonants, while non-palatalised consonants require a following /ɨ/ for this pair of vowels. We also know that Russian-speaking children sometimes produce [ɨ] for /i/ (Zharkova, 2002, p. 72). Thus, AN could potentially have aimed for [ɨ] but produced [ɪ] for the target /i/, because she had not acquired /ɨ/. Alternatively, she might not yet have acquired palatalisation in the same way as Russian monolingual children, and thus produced non-palatalised consonants followed by [ɪ]. Since we annotated this detail of the consonantal context during phonetic labelling, we can investigate this alternative explanation.
Table 4-6 shows the distribution of palatalised and non-palatalised consonants in the context preceding the vowels [i] and [ɪ] for the Russian target /i/. Overall, 88.6% of the consonants she produced before target /i/ were palatalised (87.9% before the [i] realisations). This is low, considering that Russian monolingual children acquire palatalisation early (Jakobson, 1941; Zharkova, 2002).
Table 4-6 Distribution of palatalised and non-palatalised consonants in the preceding context of the vowels [i] and [ɪ] for Russian target /i/.

Label   Preceding consonant palatalised?
        No           Yes           Total
[i]     24 (12.1%)   174 (87.9%)   198 (100.0%)
[ɪ]     1 (4.5%)     21 (95.5%)    22 (100.0%)
Total   25 (11.4%)   195 (88.6%)   220 (100.0%)
Thus, this marginal lack of palatalisation might be a sign of language interaction from SSE. However, 95.5% (n=21) of the [ɪ] variants are produced after palatalised consonants. This means that the use of the lax vowel is not caused by the phonotactic influence of the preceding consonant, but originates in the vowel systems themselves.
4.3.3.1.2 Longitudinal perspective
Since there was no difference in AN’s production of phonetic variants [i] and [ɪ] for the target /ɪ/ compared to the SSE monolingual peers, we do not expect the longitudinal perspective to reveal any language interaction effects.
Table 4-7 Longitudinal production of [i] and [ɪ] for target /i/ in SSE by AN.

Age     [i]            [ɪ]         Total
3;8     182 (100.0%)   0 (0.0%)    182 (100.0%)
4;2     55 (94.8%)     3 (5.2%)    58 (100.0%)
4;5     140 (99.3%)    1 (0.7%)    141 (100.0%)
Total   377 (99.0%)    4 (1.0%)    381 (100.0%)
Indeed, the percentages in Table 4-7 show that AN’s production of [ɪ] for /i/ in SSE is unsystematic across ages. Overall, her production of target /i/ in SSE is very similar to the SSE monolingual ranges.
AN’s production of the phonetic variants [i] and [ɪ] for the target /i/ in Russian, split by age, is presented in Table 4-8.
Table 4-8 Longitudinal production of [i] and [ɪ] for target /i/ in Russian by AN.

Age     [i]            [ɪ]          Total
3;8     53 (76.8%)     16 (23.2%)   69 (100.0%)
4;2     52 (89.7%)     6 (10.3%)    58 (100.0%)
4;5     93 (100.0%)    0 (0.0%)     93 (100.0%)
Total   198 (90.0%)    22 (10.0%)   220 (100.0%)
The association between the distribution of the phonetic variants [i] and [ɪ] for the target /i/ and age was significant at the 99% level [χ2=23.676; df=2; p<.01]. The percentages in Table 4-8 show that AN’s production of [ɪ] for /i/ in Russian decreases over time: from 23.2% at the age of 3;8 to 10.3% at 4;2 and 0% at 4;5. Thus AN’s production of Russian /i/ becomes more adult-like with increasing age.
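A hand check of this 3x2 test, assuming the uncorrected Pearson statistic, is given below. With three age points and two labels, the degrees of freedom are (3 − 1)(2 − 1) = 2, and the expected [ɪ] counts are 10% (22/220) of each age total. Most of the statistic comes from the excess of [ɪ] at 3;8 and its absence at 4;5:

```latex
E_{[\text{ɪ}]} = \tfrac{22}{220}\,(69,\;58,\;93) = (6.9,\;5.8,\;9.3),\qquad
\chi^2 \approx \underbrace{1.333 + 12.001}_{\text{3;8}}
        + \underbrace{0.001 + 0.007}_{\text{4;2}}
        + \underbrace{1.033 + 9.300}_{\text{4;5}} \approx 23.676
```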
4.3.3.2 Acquisition of close rounded vowels
4.3.3.2.1 Language differentiation
Unlike the tense/lax vowel contrast for the unrounded vowels, the crosslinguistic difference between Russian /u/ and Scottish English /ʉ/ is realizational. In Russian the vowel is back, while in Scottish English it is front or central. Thus the crosslinguistic speech production for this vowel is directly comparable.
We found that the monolingual SSE children did not produce adult-like [ʉ] in all cases at the age concerned. Instead, they produced a range of related variants: adult-like [ʉ], lowering and backing to [ʊ], and backing as far as [u]. Since [u] for the target /ʉ/ occurred in SSE monolingual child speech, we cannot assess bilingual language differentiation by the mere presence of back [u] in bilingual child speech in SSE. We therefore assess the ranges rather than the mere presence of particular phonetic realisations.
The first question is whether the proportion of the phonetic variants [ʉ] and [u] for the SSE target /ʉ/ in AN’s production differs from that of the SSE monolingual peers across age. The set-up of the test is summarised in Table 4-9.
Table 4-9 The effect of factor bilinguality of the subject AN on the production of phonetic variants [ʉ] and [u] in SSE in comparison to the SSE monolingual children.

Bilinguality   [ʉ]           [u]         Total
No             498 (95.0%)   26 (5.0%)   524 (100.0%)
Yes            249 (99.2%)   2 (0.8%)    251 (100.0%)
Total          747 (96.4%)   28 (3.6%)   775 (100.0%)
There was a highly significant association in SSE [χ²=8.454; df=1; p<.01] between the vowel labels [u] and [ʉ] and the factor bilinguality, comparing all the monolingual children (bilinguality = “No” in Table 4-9) with all age samples of AN (bilinguality = “Yes”). Table 4-9 shows that AN in fact produced a higher proportion of adult-like [ʉ] for the SSE target than the SSE monolingual children did. This means that AN produces the SSE vowel [ʉ] language-specifically, that her speech production is more mature than the overall production of her SSE peers, and that the small number of back realisations in her sample cannot be attributed to language interaction from Russian.
The second question is whether the proportions of the phonetic variants for the SSE target /ʉ/ in AN’s production differ from those for the MSR target /u/ across age. For her MSR and SSE production to be language-specific, AN should produce mostly the phonetic variant [ʉ] for the SSE target /ʉ/, and mostly [u] for the MSR target /u/.
Table 4-10 summarises all phonetic variants produced by AN in both languages. To test the question statistically, we derived a 2x2 contingency table from Table 4-10 with two factors, “phonetic label” ([ʉ] and [u]) and “language” (SSE and MSR), and tested whether the proportions of the two phonetic labels differed significantly between the two languages. The difference between the observed and expected frequencies was highly significant [χ²=336.387; df=1; p<.01]. This means that overall (across all longitudinal data), AN distinguished the quality of the close rounded vowel in a language-specific way: 75.3% of her realisations of MSR /u/ were back [u], while 83.3% of her realisations of SSE /ʉ/ were central [ʉ].
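Several tests in this section reduce a larger table of labels to a 2x2 subtable before testing. A minimal sketch of that step for Table 4-10, assuming the counts are stored per language in a dictionary keyed by label; the dictionary layout and the helpers `collapse` and `chi2_2x2` are hypothetical, and labels not needed for the comparison are simply omitted. The 2x2 shortcut formula is algebraically identical to summing (O − E)²/E over the four cells.

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Shortcut form: N * (ad - bc)**2 divided by the product of the two row totals
    and the two column totals.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def collapse(table, labels):
    """Keep only the given label columns, returning one row per language."""
    return [[row[label] for label in labels] for row in table.values()]


# Counts from Table 4-10 (AN); the rare labels other than [i] are left out here.
table_4_10 = {
    "SSE": {"[i]": 2, "[ʉ]": 249, "[u]": 2},
    "MSR": {"[i]": 4, "[ʉ]": 66, "[u]": 238},
}

(a, b), (c, d) = collapse(table_4_10, ["[ʉ]", "[u]"])
print(f"chi2={chi2_2x2(a, b, c, d):.3f}")
```

With these counts the statistic reproduces the reported χ²=336.387; the same function applied to Table 4-19 ([[209, 46], [41, 247]]) gives the reported 249.700.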
Table 4-10 Phonetic ranges of the MSR target /u/ and SSE /ʉ/ produced by the bilingual subject AN
(cells show N and % of tokens per language)
Language   [i]         []          [ʉ]            [u]            [ʊ]           Total
SSE        2 (0.7%)    2 (0.7%)    249 (83.3%)      2 (0.7%)     44 (14.7%)    299 (100.0%)
MSR        4 (1.3%)    1 (0.3%)     66 (20.9%)    238 (75.3%)     7 (2.2%)     316 (100.0%)
Total      6 (1.0%)    3 (0.5%)    315 (51.2%)    240 (39.0%)    51 (8.3%)     615 (100.0%)

4.3.3.2.2 Longitudinal perspective
We did not expect the longitudinal perspective on AN’s production of the SSE target /ʉ/ to reveal any developmental trends or language interaction, since overall we labelled only two instances of the back realisation for this target (see Table 4-10).
AN’s longitudinal production of the MSR target /u/ is presented in Table 4-11. There was no association [χ²=.198; df=2; p=.906] between age and the observed frequencies of the phonetic labels [u] and [ʉ]. Their proportions remained stable across the three age samples, and the majority of AN’s MSR realisations were adult-like [u] at all ages. The longitudinal perspective therefore revealed no developmental trends beyond those seen in the averaged data in Section 4.3.3.2.1.
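Table 4-11 crosses three age samples with two labels, so the test has (3−1)(2−1)=2 degrees of freedom. With two degrees of freedom the chi-square upper tail reduces to an exponential, which gives a quick check that the reported p-value corresponds to df=2 (with df=1, χ²=.198 would give p≈.66 instead):

```latex
\mathrm{df} = (r-1)(c-1) = (3-1)(2-1) = 2
\qquad
P\left(\chi^2_{2} \ge x\right) = e^{-x/2}
\;\Rightarrow\;
p = e^{-0.198/2} = e^{-0.099} \approx 0.906
```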
Table 4-11 Longitudinal production of phonetic variants [u] and [ʉ] for the MSR target /u/ by AN.
(cells show N and % of tokens per age)
Age      [ʉ]           [u]           Total
3;8      25 (22.5%)    86 (77.5%)    111 (100.0%)
3;10     21 (22.3%)    73 (77.7%)     94 (100.0%)
4;5      20 (20.2%)    79 (79.8%)     99 (100.0%)
Total    66 (21.7%)   238 (78.3%)    304 (100.0%)
We wanted to find out whether AN’s overall 21.7% of [ʉ] variants in MSR reflected a developmental trend or an effect of language interaction from the SSE target /ʉ/.
Therefore, we looked at the breakdown of the two most frequent variants, [ʉ] and [u], for the MSR target /u/ per carrier word; the results are summarised in Table 4-12. There was a highly significant association [χ²=202.614; df=3; p<.01] between the carrier word and the observed frequencies of the phonetic labels [ʉ] and [u] in AN’s MSR production. As can be seen in Table 4-12, 92.4% of AN’s [ʉ] vowels were produced in a single Russian carrier word, “shut” (a joker), while the percentage of [u] in the other three carrier words is close to fully adult-like (94.6% to 100%).
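Table 4-12 reports two different percentages for each cell: “% carrier” divides by the number of tokens of that carrier word (a row percentage), while “% label” divides by the number of tokens carrying that label across all words (a column percentage). A minimal sketch, assuming the counts are stored as ([ʉ], [u]) pairs per word; the function name is hypothetical.

```python
def carrier_and_label_percentages(counts):
    """Row ("% carrier") and column ("% label") percentages for a words-by-labels table.

    `counts` maps each carrier word to a tuple of label counts, here ([ʉ], [u]).
    Returns {word: (pct_carrier, pct_label, pct_of_all_tokens)}.
    """
    n_labels = len(next(iter(counts.values())))
    label_totals = [sum(row[j] for row in counts.values()) for j in range(n_labels)]
    grand_total = sum(label_totals)
    result = {}
    for word, row in counts.items():
        word_total = sum(row)
        pct_carrier = [100 * n / word_total for n in row]               # share of this word's tokens
        pct_label = [100 * n / t for n, t in zip(row, label_totals)]    # share of each label's tokens
        result[word] = (pct_carrier, pct_label, 100 * word_total / grand_total)
    return result


# Table 4-12: AN's ([ʉ], [u]) counts for MSR /u/ per carrier word.
table_4_12 = {"tuz": (2, 109), "suk": (3, 53), "kub": (0, 58), "shut": (61, 15)}
for word, (carrier, label, share) in carrier_and_label_percentages(table_4_12).items():
    print(word, [round(x, 1) for x in carrier], [round(x, 1) for x in label], round(share, 1))
```

For “shut” this yields 80.3% as the row percentage of [ʉ] and 92.4% as its column percentage, the figure quoted in the text.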
Table 4-12. The effect of carrier words on the proportions of the variants [ʉ] and [u] for the MSR target /u/ produced by the subject AN.
(% carrier = % of the tokens of that carrier word; % label = % of all tokens with that label)
Carrier                [ʉ]       [u]       Total
tuz      N             2         109       111
         % carrier     1.8%      98.2%     100.0%
         % label       3.0%      46.4%     36.9%
suk      N             3         53        56
         % carrier     5.4%      94.6%     100.0%
         % label       4.5%      22.6%     18.6%
kub      N             0         58        58
         % carrier     0.0%      100.0%    100.0%
         % label       0.0%      24.7%     19.3%
shut     N             61        15        76
         % carrier     80.3%     19.7%     100.0%
         % label       92.4%     6.4%      25.2%
Total    N             66        235       301
         % carrier     21.9%     78.1%     100.0%
         % label       100.0%    100.0%    100.0%
Unlike in English, the Russian postalveolar voiceless fricative // (as in “shut” /ut/) is apical, i.e. it involves the tip of the tongue as the active articulator (Bondarko, 1998), rather than laminal (tongue-blade articulation) as in English (Ladefoged, 1993, p.161). In Russian, laminal articulation is associated with another voiceless postalveolar fricative, the palatalised //. AN’s realisation of // in “shut” typically involved a laminal articulation [] without palatalisation.¹³ Such a realisation is most unlikely for Russian monolingual children. In the course of speech acquisition, Russian monolingual children often realise // as [s] or [t] (Zharkova, 2002), so that typical immature realisations of // involve palatalisation, fronting and de-apicalisation at the same time. Both // and // are acquired later than other coronal consonants; they are among the consonants with a substantial proportion of immature realisations (up to the age of 3;0) compared to other consonants (Zharkova, 2002). No account states that apical // can be realised by children as laminal and palatalised []. Thus, AN’s laminal realisation of // in “shut” is unlikely to be due to speech immaturity of the kind found in Russian monolingual children. AN’s articulation of SSE // as in “sheep” was quite mature compared to that of the monolingual children, so her laminal production of Russian // cannot be explained by difficulty in acquiring postalveolar fricatives as such.

13 Unfortunately, we did not annotate apicality or laminality of the consonants in the transcriptions. However, this statement reflects our general impression of this consonant in AN’s production, and it holds for the five instances of “shut” produced by AN which we rechecked when preparing this section.
We can conclude that AN acquired the Russian back vowel /u/ in a fairly adult-like way, if we look at the carrier words other than “shut” (a joker). The non-adult-like variants [ʉ] in AN’s case were due neither to speech immaturity of the kind found in Russian monolingual children nor to the contact between the vowel systems as such. The language interaction occurred in one lexical item rather than being systematic. It had an effect on the vowel, but may have originated in the phonotactic influence of the preceding consonant. The interaction possibly arose because the subject was not familiar enough with the carrier word, because she had not acquired the Russian apical articulation of the sound, or because this Russian word happens to be a false cognate of the English verb “to shoot”. The laminal articulation of // is also likely to reflect language interaction from SSE rather than speech immaturity. So it appears that the language interaction may apply to the whole word, demonstrating a clear lexicalisation effect, whatever its cause.
4.3.3.3 Summary of AN’s results
The results show that AN produced language-specific vowel quality for all the vowels concerned in this study, and that she differentiated between her two languages.
First of all, AN acquired the segmental quality of the tense/lax contrast between the vowels /i/ and /ɪ/. AN produced only 1.3% of /ɪ/ as [i], and this small proportion is comparable with the average results of the SSE monolingual peers. Her 0.9% is in fact smaller than the individual results of some SSE children.
Secondly, she acquired language-specific control of the close rounded vowels. AN’s production of SSE /ʉ/ was significantly more adult-like than the average results of the SSE peers: among the [ʉ] and [u] tokens, AN produced 99.2% adult-like [ʉ], compared to an average of 95% for the SSE peers. Moreover, her MSR and SSE production of the close rounded vowels was significantly differentiated: among all phonetic variants, she produced 83% [ʉ] for SSE /ʉ/, while in MSR she produced 75.3% back [u] for /u/.
However, in addition to AN’s language differentiation, we also observed two language interaction effects on vowel quality in her speech production. Both effects occurred in AN’s Russian rather than in her SSE.
First, we found a significant proportion of [ɪ] variants (10%) for the Russian target /i/ (as opposed to only 1.8% in her SSE production). The presence of the vowel [ɪ] in AN’s Russian speech cannot be explained by the speech immaturity reported for Russian monolingual children in the literature (Jakobson, 1941; Zharkova, 2002). Nor could it be explained by the phonotactic influence of a lack of palatalisation in the preceding consonant; it was thus due to the vowel system rather than to the acquisition of consonants. [ɪ] for /i/ appeared in all MSR carrier words used in this study. The production of [ɪ] for /i/ in MSR was, thus, an effect of language interaction from the SSE vowel system, and it contradicts the direction of language interaction predicted by both the CCCH (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998) discussed in Sections 1.3.2.3.2-3. The longitudinal perspective revealed that the number of occurrences of [ɪ] for /i/ decreased significantly with age.
The second language interaction effect was also observed in AN’s Russian speech production. It involved the central vowel [ʉ] for the back /u/ in one specific MSR carrier word, “shut” (a joker). In itself, [ʉ] for /u/ in Russian child speech need not be an effect of language interaction, since Russian children are known to acquire palatalised consonants quite early on; in fact, ‘over-palatalisation’ (i.e. the use of palatalised consonants for non-palatalised ones), with a subsequent change in the quality of the following vowel, is a sign of speech immaturity in Russian child speech (Jakobson, 1941; Zharkova, 2002). For the close back rounded vowel /u/, over-palatalisation of the preceding consonant could involve fronting towards a vowel quality similar to that of the SSE vowel /ʉ/. However, 92.4% of all instances of [ʉ] occurred in a single lexical item, “shut” (a joker), rather than being an overall effect. We argued that the effect on the vowel originated in the phonotactic influence of the preceding consonant //, which AN produced with a laminal articulation (typical of SSE) rather than an apical one (typical of MSR). The laminal articulation could be due to language interaction from SSE.
4.3.4 Subject BS
4.3.4.1 Acquisition of close unrounded vowels
4.3.4.1.1 Language differentiation
In this section on BS’ bilingual acquisition of vowel quality, we address the same questions as for AN. Recall that the two bilingual subjects differed primarily in the amount of input they received in the two languages: AN had nearly equal amounts of SSE and MSR input, while BS received substantially more input in Russian than in SSE.
The first question is whether the absence of the vowel [ɪ] in MSR affected BS’ production of the lax vowel in SSE, in terms of the number of instances of the phonetic variants [i] and [ɪ] for the target /ɪ/ compared to the SSE monolingual peers.
Table 4-13 The effect of factor bilinguality of the subject BS on the proportion of phonetic variants [i] and [ɪ] produced for the target /ɪ/ in comparison to the SSE monolingual children.
(cells show N and % of tokens per row)
Bilingual?   [i]            [ɪ]            Total
No             6 (0.9%)     659 (99.1%)    665 (100.0%)
Yes          144 (64.9%)     78 (35.1%)    222 (100.0%)
Total        150 (16.9%)    737 (83.1%)    887 (100.0%)
The set-up of the statistical test is summarised in Table 4-13. For BS, there was a highly significant association [χ²=484.609; df=1; p<.01] between the factor “bilinguality” and the proportions of the phonetic labels [i] and [ɪ] for the SSE target /ɪ/. The table shows that BS (bilingual = “Yes” in Table 4-13) produced only 35.1% adult-like forms, compared to 99.1% for the monolingual children (bilingual = “No”), while she produced [i] for SSE /ɪ/ in 64.9% of cases. Such a highly significant difference in the proportions of phonetic labels can only be accounted for by language interaction from Russian, since the lax vowel /ɪ/ does not occur in Russian.
However, it is worth noting that BS also produced lax [ɪ] in 35.1% of cases overall. This means that she was able to produce the language-specific vowel quality; she simply did not produce it as systematically as the SSE monolingual children. The longitudinal results may be more revealing here.
The second question is whether the presence of the tense/lax vowel contrast in SSE affected BS’ production of the vowel /i/ in MSR, in terms of the number of occurrences of the phonetic variants [i] and [ɪ] for the target /i/, compared with her own production in the SSE monolingual language mode.
Table 4-14 The effect of language on the phonetic ranges for the target /i/ produced by BS in SSE compared to MSR language modes across age samples.
(cells show N and % of tokens per language)
Language   [i]             [ɪ]         Total
SSE        395 (98.3%)     7 (1.7%)    402 (100.0%)
MSR        208 (100.0%)    0 (0.0%)    208 (100.0%)
Total      603 (98.9%)     7 (1.1%)    610 (100.0%)
The set-up of the test is presented in Table 4-14. All of BS’ realisations of /i/ in MSR (100%) and 98.3% in SSE were adult-like [i]. Therefore, the vowel /i/ had an adult-like quality in both languages. This is not surprising, since the vowel is acquired in the second year of life in both languages (Matthews, 2002; Zharkova, 2002). Interestingly, like the SSE monolingual children, BS also produced a small proportion of lax realisations (1.7%) in SSE, comparable to the proportion of [ɪ] for /i/ produced by the SSE monolingual children (see Table 4-1). This is despite the fact that BS’ proportions of the phonetic variants [i] and [ɪ] for the lax vowel /ɪ/ differed from the monolingual results.
4.3.4.1.2 Longitudinal perspective
Since BS’ realisations of the target /i/ were adult-like in 100% of cases in MSR and 98.3% in SSE, we do not need to assess this aspect longitudinally. It is, however, interesting to take a longitudinal perspective on the phonetic ranges of BS’ SSE target /ɪ/, since overall these were significantly different from those of the SSE monolingual peers, with BS over-producing [i] for /ɪ/.
The question we address here is whether the proportions of the phonetic labels [i] and [ɪ] for the SSE target /ɪ/ are associated with the ages sampled for BS in this study (3;4, 3;10 and 4;5). The set-up of the test is presented in Table 4-15.
Table 4-15 Longitudinal production of [i] and [ɪ] for target /ɪ/ in SSE by the subject BS.
(cells show N and % of tokens per age)
Age      [i]            [ɪ]           Total
3;4       52 (67.5%)    25 (32.5%)     77 (100.0%)
3;10      57 (79.2%)    15 (20.8%)     72 (100.0%)
4;5       35 (47.9%)    38 (52.1%)     73 (100.0%)
Total    144 (64.9%)    78 (35.1%)    222 (100.0%)
There was a highly significant association [χ²=15.872; df=2; p<.01] between BS’ age and the observed frequencies of the labels [i] and [ɪ] for the SSE target /ɪ/. As shown in Table 4-15, the proportion of tense realisations of /ɪ/ decreased from 67.5% at the age of 3;4 to 47.9% at the age of 4;5, after a temporary rise to 79.2% at 3;10. Overall, with increasing age BS produced fewer instances of [i] for /ɪ/.
However, even at the age of 4;5, BS’ percentage of [i] for /ɪ/ (47.9%) is still highly significantly different [χ²=71.257; df=1; p<.01] from that of the eldest monolingual children in our control group (3.1% in C7 and C9).
4.3.4.2 Acquisition of close rounded vowels
4.3.4.2.1 Language differentiation
The first question is whether the proportion of the phonetic variants [] and [u] for
the SSE target // in BS’ production differs from that of the SSE monolingual peers
across age. We assess the whole range of phonetic variants of // produced by BS in
comparison to her monolingual peers. The proportions of labels [i], [], [u] and [] for //
are presented in Table 4-16.
Notably 77.7% of BS’ [] for the target // is comparable to the individual ranges of
SSE monolingual children presented in Table 4-3, which vary from 66.2 to 98%. So that
BS’ amount of adult-like [] for // in SSE is language-specific.
Table 4-16 Phonetic ranges for the SSE target /ʉ/ produced by BS in comparison to the SSE monolingual children (across age)
(cells show N and % of tokens per row)
Bilingual?   [i]         [ʉ]            [u]           [ʊ]           Total
No           1 (0.2%)    468 (83.9%)    26 (4.7%)     63 (11.3%)    558 (100.0%)
Yes          3 (1.1%)    209 (77.7%)    46 (17.1%)    11 (4.1%)     269 (100.0%)
Total        4 (0.5%)    677 (81.9%)    72 (8.7%)     74 (8.9%)     827 (100.0%)
However, if we look at the most frequent non-adult-like realisations, [u] and [ʊ], we can see a substantial difference in their distribution. Thus, the difference between BS and the SSE monolingual children may lie in the distribution of non-adult forms rather than in the proportion of adult-like realisations relative to non-adult-like [u]. We therefore depart from the test that we ran for AN, and compare the two most frequent non-adult-like realisations of /ʉ/, [u] and [ʊ], produced by BS and by the SSE monolingual children.
The set-up of the test is presented in Table 4-17. There was a highly significant association [χ²=36.853; df=1; p<.01] between the factor “bilinguality” and the observed frequencies of the phonetic labels [u] and [ʊ]. Among the non-adult-like realisations (excluding the marginal [i]), the monolingual children on average produced 70.8% [ʊ] variants, while BS produced 80.7% back [u]. These data suggest possible language interaction from Russian.
Table 4-17 Contingency table showing the effect of the factor bilinguality on the distribution of the two most frequent non-adult phonetic variants for SSE /ʉ/ produced by the subject BS in comparison to SSE monolingual children.
(cells show N and % of tokens per row)
Bilingual?   [u]           [ʊ]           Total
No           26 (29.2%)    63 (70.8%)     89 (100.0%)
Yes          46 (80.7%)    11 (19.3%)     57 (100.0%)
Total        72 (49.3%)    74 (50.7%)    146 (100.0%)
However, this highly significant result cannot be taken as definitive proof of language interaction from Russian in BS’ case, since the 17.1% of [u] in BS’ ranges is comparable to the results of individual SSE monolingual children (17.5% for C5 and 12.1% for C6 in Table 4-3).
Therefore, we can conclude that BS acquired the SSE close rounded vowel /ʉ/ in a native-like way compared to the SSE monolingual children, if we look at the overall data across the longitudinal samples.
The second question is whether the proportions of the phonetic variants for the SSE target /ʉ/ in BS’ production differ from those for the MSR target /u/ across her age samples. For her MSR and SSE production to be language-specific, BS should produce mostly the phonetic variant [ʉ] for the SSE target /ʉ/, and mostly [u] for the MSR target /u/.
Indeed, Table 4-18 shows that 77.7% of BS’ realisations of SSE /ʉ/ are adult-like central [ʉ], compared to 84.0% back [u] for MSR /u/. The data in Table 4-19 showed a highly significant association [χ²=249.700; df=1; p<.01] between the factor language and the observed frequencies of the phonetic labels [u] and [ʉ]. This means that BS’ production of the rounded vowels /u/ and /ʉ/ was language-specific: she produced a back vowel in Russian and a central vowel in SSE.
Table 4-18 Phonetic ranges of the MSR adult target /u/ and SSE /ʉ/ produced by the bilingual subject BS.
(cells show N and % of tokens per language mode)
Language mode   [i]         [ʉ]            [u]            [ʊ]          Total
SSE             3 (1.1%)    209 (77.7%)     46 (17.1%)    11 (4.1%)    269 (100.0%)
MSR             0 (0.0%)     41 (13.9%)    247 (84.0%)     6 (2.0%)    294 (100.0%)
Total           3 (0.5%)    250 (44.4%)    293 (52.0%)    17 (3.0%)    563 (100.0%)
Table 4-19 Contingency table showing the effect of language on the realisations of [ʉ] and [u] for subject BS in SSE compared to MSR language modes across her age samples.
(cells show N and % of tokens per language mode)
Language mode   [ʉ]            [u]            Total
SSE             209 (82.0%)     46 (18.0%)    255 (100.0%)
MSR              41 (14.2%)    247 (85.8%)    288 (100.0%)
Total           250 (46.0%)    293 (54.0%)    543 (100.0%)

4.3.4.2.2 Longitudinal results
We assess whether the proportions of the phonetic labels [u] and [ʉ] for the SSE target /ʉ/ are associated with the ages sampled for the bilingual subject BS (3;4, 3;10 and 4;5). The set-up of the test is presented in Table 4-20.
There was a highly significant association [χ²=15.210; df=2; p<.01] between BS’ age and the observed frequencies of the phonetic labels [u] and [ʉ] for the SSE target /ʉ/. Table 4-20 shows that the production of the back vowel [u] for the target /ʉ/ decreased from 22.2% at the age of 3;4 to 6.3% at the age of 4;5. Importantly, the 6.3% of [u] at the age of 4;5 falls within the ranges produced by the SSE monolingual children (see Table 4-3).
The longitudinal results for the production of // in the SSE language mode in Table
4-20 also revealed that at the age of 3;4 and 3;10 BS produced accordingly 22.2% and
27.6% of the back variant [u] for // in SSE, somewhat higher than any of the SSE
141
monolingual peers (cf. C5 in Table 4-3). However, by the age of 4;5 her production was
normal.
Table 4-20 Longitudinal production of [u] and [ʉ] for the target /ʉ/ in SSE by the subject BS.
(cells show N and % of tokens per age)
Age      [ʉ]            [u]           Total
3;4       56 (77.8%)    16 (22.2%)     72 (100.0%)
3;10      63 (72.4%)    24 (27.6%)     87 (100.0%)
4;5       90 (93.8%)     6 (6.3%)      96 (100.0%)
Total    209 (82.0%)    46 (18.0%)    255 (100.0%)
Table 4-21 Longitudinal production of [u] and [ʉ] for the target /u/ in MSR by the subject BS.
(cells show N and % of tokens per age)
Age      [ʉ]           [u]            Total
3;4      21 (40.4%)     31 (59.6%)     52 (100.0%)
3;10     19 (15.1%)    107 (84.9%)    126 (100.0%)
4;5       1 (0.9%)     109 (99.1%)    110 (100.0%)
Total    41 (14.2%)    247 (85.8%)    288 (100.0%)
The longitudinal results for the Russian monolingual language mode shown in Table 4-21 revealed a highly significant association [χ²=45.196; df=2; p<.01] between BS’ age and her use of central and back vowels in Russian. With increasing age, BS reduced the number of central vowels in Russian and steadily increased the proportion of back vowels, from 59.6% at the age of 3;4 to 99.1% at the age of 4;5.
In the Russian adult model, /u/ is preceded by a non-palatalised consonant in all the Russian carrier words used in this study. As discussed in Section 2.2.1.1, children’s over-use of palatalisation for non-palatalised adult targets is known to affect the quality of the following vowel (Jakobson, 1941; Zharkova, 2002). For the close rounded vowels, this means that the back vowel is expected to become more fronted. The presence of the central vowel in BS’ Russian could therefore be explained by over-use of palatalisation in the preceding consonant rather than by language interaction from SSE, and we need to investigate this potential consonantal effect on the vowel.
Table 4-22 shows the effect of age on the palatalisation of the consonant preceding the MSR target /u/. There was a highly significant association [χ²=46.139; df=2; p<.01] between BS’ age and the numbers of palatalised and non-palatalised consonants preceding the target /u/. Palatalisation of the preceding consonant decreased developmentally, from 42.3% at the age of 3;4 to 0.9% at the age of 4;5.
Table 4-22 The effect of BS’ age on the use of (non-) palatalised consonants preceding the MSR target /u/.
(cells show N and % of tokens per age)
         Preceding consonant palatalised?
Age      No             Yes           Total
3;4       30 (57.7%)    22 (42.3%)     52 (100.0%)
3;10      90 (71.4%)    36 (28.6%)    126 (100.0%)
4;5      109 (99.1%)     1 (0.9%)     110 (100.0%)
Total    229 (79.5%)    59 (20.5%)    288 (100.0%)
This rate of over-use of palatalisation for adult non-palatalised targets is in line with the developmental data reported in Zharkova (2002, p. 75) for a Russian monolingual girl aged 3;0, for whom 67% of all consonantal processes involved the replacement of non-palatalised consonants by palatalised ones. This means that BS’ production of central [ʉ] for MSR /u/ can be explained by speech immaturity at the ages of 3;4 to 3;10, rather than by language interaction from SSE.
4.3.4.3 Summary of BS’ Results
The results of BS’ acquisition of vowel quality form a mirror image of AN’s patterns of acquisition in many respects.
Being a Russian-dominant bilingual, BS had not acquired the tense/lax contrast in the way the monolingual peers or the more balanced bilingual AN had. Overall, BS produced adult-like [ɪ] for /ɪ/ in only 35.1% of cases, compared to 99.1% for the monolingual peers. Her non-adult-like realisations primarily involved [i] for /ɪ/ (64.9%). This pattern, termed ‘under-differentiation of phonemes’ by Weinreich (1953), is frequently reported in the L2 acquisition literature (Panasyuk et al., 1995; Escudero, 2000; Flege, 2002; Piske et al., 2002). Unlike AN’s language interaction pattern for this set of vowels, the direction of BS’ language interaction did not contradict the direction predicted by either the CCCH (Döpke, 1998; Döpke, 2000) or the Markedness Hypothesis (Müller, 1998); it was also compatible with BS’ input conditions (less exposure to English than AN).
We also showed that the difference between BS and the SSE monolingual peers in the overall production of the vowels /i/ and /ɪ/ did not lie in an inability to produce adult-like [ɪ] for /ɪ/. In fact, we could reverse the argument and say that she did produce [ɪ] in 35.1% of all /ɪ/ cases. In that sense, BS did differentiate between her two languages, but she also showed a considerable amount of language interaction from Russian.
Moreover, the longitudinal perspective showed that BS was in the process of gradually acquiring the tense/lax vowel quality: her proportion of adult-like [ɪ] for /ɪ/ rose overall from 32.5% at the age of 3;4 to 52.1% at the age of 4;5.
It is very interesting to compare BS’ acquisition pattern for the tense/lax contrast with her production of the close rounded vowels /ʉ/ and /u/, which differ realizationally between SSE and MSR. Unlike the systemic tense/lax difference, these vowels were differentiated by BS in a native-like way. BS’ overall proportion of adult-like [ʉ] for the SSE target /ʉ/ fell within the ranges of individual SSE monolingual children: she produced 77.7% adult-like [ʉ] for /ʉ/ in SSE, and 84.0% [u] for /u/ in MSR. The 13.9% of [ʉ] in MSR was not due to language interaction from SSE, but followed the Russian monolingual pattern of ‘over-palatalisation’ reported in the acquisition literature (Trubetskoy, 1939; Zharkova, 2002). The amount of non-adult forms was comparable to the monolingual ranges in SSE.
Despite this language differentiation, we also observed a language interaction pattern in SSE. It concerned BS’ range of phonetic variation for the SSE target /ʉ/, namely the proportion of back [u] tokens for /ʉ/ compared to that of the SSE monolingual peers. With increasing age, BS reduced the percentage of close back rounded [u] in SSE from 22.2% and 27.6% (at the ages of 3;4 and 3;10) to 6.3% at the age of 4;5, a proportion comparable to the SSE monolingual ranges. We can therefore conclude that BS had acquired the crosslinguistic difference between SSE /ʉ/ and MSR /u/ by the age of 4;5.
This finding seems to reinforce the idea that not all ambiguous sound structural properties are equally prone to language interaction. The realizational phonological difference between SSE /ʉ/ and MSR /u/ appears to be less difficult to acquire than the systemic tense/lax difference. However, the acquisition of this realizational difference by the Russian-dominant bilingual BS may be mediated by the phonetic continuum in Russian, which is richer than its number of phonological categories: MSR features palatalisation, which triggers fronting of the main allophone [u] towards a more central vowel similar in quality to the main allophone in SSE.
5 Acquisition of Vowel Duration
5.1 Introduction
This chapter presents results on monolingual and bilingual acquisition of postvocalic
conditioning of vowel duration. In this chapter the term ‘postvocalic conditioning’ refers
to how the consonant following a vowel systematically conditions the duration of the
vowel.
In the literature review on monolingual acquisition (Section 2.3.1) we concluded that, despite evidence suggesting that English-speaking children master postvocalic conditioning between the ages of 3;0 and 5;0, there is little empirical evidence to confirm the acquisition of the Scottish Vowel Length Rule by SSE-speaking children at this age. There is also little empirical evidence on the bilingual acquisition of postvocalic conditioning, whether involving language differentiation or interaction. This chapter aims to shed more light on these issues.
We address the acquisition patterns for three variables concerning vowel duration: (1) postvocalic conditioning of the SSE and MSR vowel /i/; (2) the SSE ‘invariably short’ postvocalic conditioning of the vowel /ɪ/ compared to the differentiated pattern of SSE /i/; and (3) postvocalic conditioning of the SSE vowel /ʉ/ and MSR /u/.
First of all, we compare the crosslinguistic differences in these variables between adult speakers of Russian (n=5), Scottish Standard English (n=5) and Southern Standard British English (n=4). MSR is not included in any tests involving the lax vowel /ɪ/, since the vowel does not occur in Russian. The purpose of testing the adult models is to pinpoint substantial crosslinguistic differences between SSE and MSR in the postvocalic conditioning of vowel duration before testing the bilingual acquisition of these patterns. The SSBE adult model is kept in mind for possible cross-varietal influences in the speech of the bilingual and monolingual children.
Subsequently, we assess the issue of monolingual acquisition of the postvocalic
conditioning of vowel duration by comparing SSE adult speech to the data of the SSE
monolingual children.
To account for the bilingual acquisition patterns, we compare the SSE speech of
each bilingual subject to that of the SSE monolingual peers (n=10, a cross-section of 7
individual cases plus 3 longitudinal cases C3, C4 and C7) to establish similarities and
differences between the two groups. This should allow us to determine any language
interaction patterns in SSE.
In addition, bilingual language differentiation is assessed by comparing each subject’s speech production in SSE to her own speech production in MSR. We perform no direct statistical comparison between the adults and the bilingual subjects, because the two groups differ along several dimensions, including age and bilinguality. However, we do consider the individual patterns of the bilingual subjects in relation to the patterns of the Russian-speaking mother, the Russian-speaking experimenter (Gordeeva) and the adult group results, to assess the native-likeness of the MSR pattern descriptively. The experimenter’s speech should indicate how the change in elicitation mode (spontaneous play with the child, as opposed to the adult recordings, in which utterances were read from a computer screen) may affect adult vowel duration patterns.
5.2 Data Analysis
The variables on the postvocalic conditioning of vowel duration form numerical data
(annotated vowel duration, ms). Labelling procedures have been described in Section
3.6.2.2. During data annotation (see Section 3.6.2), we indicated whether or not a token
carried a pitch accent, in what position it occurred in the utterance and we assigned a
broad intonational modality to each utterance (non-emphatic versus emphatic statement,
yes/no or WH-questions). Information on f0 (Hz) and formant frequencies was also
available.
In order to achieve maximal comparability between adult and child data, we tried to
make the data-sets as uniform as possible by selecting the subsets of tokens which:
- carried a pitch accent (i.e. were not de-accented);
- occurred in phrase-final or phrase-medial position (not phrase-initial), or in
single-word utterances;
- were produced with detectable f0 (i.e. f0 > 0) and a valid formant structure
(see the exclusion criteria in Section 3.6.3.2);
- were produced as non-emphatic statements.
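The four selection criteria above can be expressed as a single filter over annotated tokens. The following is only a minimal sketch in Python: the `Token` record and its field names (`accented`, `phrase_position`, `f0_hz`, `formants_valid`, `modality`) are hypothetical and do not reproduce the actual annotation format described in Section 3.6.2.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical annotated vowel token (field names are illustrative only)."""
    speaker: str
    context: str          # following consonant, e.g. "voiced fricative"
    duration_ms: float
    accented: bool        # True if the token carried a pitch accent
    phrase_position: str  # "initial", "medial", "final" or "single-word"
    f0_hz: float          # 0.0 where no f0 was detected
    formants_valid: bool  # True if the token passed the formant exclusion criteria
    modality: str         # e.g. "statement", "emphatic", "yes/no", "wh"


# Phrase-initial tokens are excluded; single-word utterances are kept.
ALLOWED_POSITIONS = {"medial", "final", "single-word"}


def is_comparable(tok: Token) -> bool:
    """True if the token meets all four selection criteria."""
    return (
        tok.accented
        and tok.phrase_position in ALLOWED_POSITIONS
        and tok.f0_hz > 0
        and tok.formants_valid
        and tok.modality == "statement"
    )


def select_tokens(tokens: list[Token]) -> list[Token]:
    """Keep only the tokens that can be compared across adults and children."""
    return [tok for tok in tokens if is_comparable(tok)]
```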
All durational measurements of child speech are averaged by phonological adult
targets (rather than by phonetic realisations) unless stated otherwise. The averages are mainly based
on median values (where the statistical test used allowed this) for both adults and children, to
achieve greater comparability between the data: the elicitation method differed
(adults read utterances from a computer screen, while children played games), and the
children's speech production was more variable, which affected the distributions of the data.
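The motivation for using medians can be illustrated with a small sketch. Assuming (hypothetically) that tokens are available as `(speaker, context, duration_ms)` tuples, the function below computes one median per speaker and consonantal context, i.e. the kind of per-speaker value entered into the ANOVAs. With the invented child tokens in the example, a single unusually long production pulls the mean up to about 237 ms, while the median stays at 160 ms.

```python
from collections import defaultdict
from statistics import mean, median


def speaker_context_medians(tokens):
    """Median vowel duration per (speaker, following-consonant context).

    `tokens` is an iterable of (speaker, context, duration_ms) tuples
    (a hypothetical input format used only for illustration).
    """
    durations = defaultdict(list)
    for speaker, context, duration_ms in tokens:
        durations[(speaker, context)].append(duration_ms)
    return {key: median(values) for key, values in durations.items()}


# Invented child tokens with one unusually long production.
child_tokens = [("C3", "voiced stop", 150.0),
                ("C3", "voiced stop", 160.0),
                ("C3", "voiced stop", 400.0)]
medians = speaker_context_medians(child_tokens)
print(medians[("C3", "voiced stop")])       # the median ignores the outlier
print(mean(d for _, _, d in child_tokens))  # the mean is pulled upwards
```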
The acquisition of postvocalic conditioning patterns in SSE, MSR and SSBE was
tested by means of Analysis of Variance (ANOVA) with a set-up that varied (mixed
design or multivariate) depending on the variables and the number of subjects. The set-up is
explained for each variable and subject group separately. All reported F-tests were
Greenhouse-Geisser (epsilon) corrected.
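For readers unfamiliar with the correction, the Greenhouse-Geisser epsilon (ε) measures how far the covariance matrix of the within-subject levels departs from sphericity; the degrees of freedom of the F-test are multiplied by ε, with 1/(k-1) ≤ ε ≤ 1 for k levels. The sketch below computes ε from a pooled within-group covariance matrix, as appropriate for a mixed design. It illustrates the textbook formula under these assumptions and is not the routine of the statistics package actually used; the input format (a dictionary of groups, each a list of per-subject rows) is hypothetical.

```python
def pooled_covariance(groups):
    """Pooled within-group covariance matrix of the within-subject levels.

    `groups` maps a between-subject group label to a list of subjects,
    each subject being a list of k scores (one per within-subject level).
    """
    k = len(next(iter(groups.values()))[0])
    cov = [[0.0] * k for _ in range(k)]
    n_subjects = 0
    for rows in groups.values():
        means = [sum(row[j] for row in rows) / len(rows) for j in range(k)]
        for row in rows:
            for i in range(k):
                for j in range(k):
                    cov[i][j] += (row[i] - means[i]) * (row[j] - means[j])
        n_subjects += len(rows)
    df = n_subjects - len(groups)  # pooled degrees of freedom
    return [[value / df for value in line] for line in cov]


def greenhouse_geisser_epsilon(groups):
    """Box/Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    s = pooled_covariance(groups)
    k = len(s)
    row_means = [sum(line) / k for line in s]
    grand = sum(row_means) / k
    # Double-centre: subtract row and column means and add back the grand mean
    # (S is symmetric, so its column means equal its row means).
    c = [[s[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
         for i in range(k)]
    trace = sum(c[i][i] for i in range(k))
    sum_sq = sum(c[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)
```

For example, ε = .546 applied to an F-test with (2, 22) degrees of freedom gives corrected degrees of freedom of about (1.09, 12.0). With only two within-subject levels ε is always 1, so the correction then has no effect.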
The statistical analyses were performed separately for each of the bilingual subjects,
since their language input situations were too different to treat them as a group. The group
of the SSE monolingual children (10 cases) was split up into three age subgroups for each
bilingual child separately to more closely match individual age ranges of the subjects in
the longitudinal samples.
All the individual results of the monolingual children, as well as the comparisons between
bilingual results and individual adults, are analysed descriptively.
5.3 Acquisition of Vowel Duration
5.3.1 A comparison of adult models
5.3.1.1 Vowel /i/
We examine the crosslinguistic differences in the postvocalic conditioning (by
voiceless stop, voiced stop or voiced fricative) of the duration of the vowel /i/ between
MSR, SSE and SSBE.
The median values for the duration (ms) of /i/ for each speaker were entered in a
mixed design ANOVA with “LANGUAGE” (SSE, SSBE, MSR) as a between-subject
factor and the “FOLLOWING CONSONANT” as a within-subject factor. The
“FOLLOWING CONSONANT” factor had three levels: i.e. voiced fricative, voiced stop
and voiceless stop.
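The degrees of freedom reported for this design follow directly from its structure: 14 adults in three language groups (5, 5 and 4 speakers) and three consonantal contexts. The sketch below partitions the total sum of squares of a mixed (split-plot) design with one between-subject and one within-subject factor. It is a minimal textbook version using weighted marginal means, written for illustration only; packages that use Type III sums of squares may give slightly different F-values when group sizes are unequal, and no sphericity correction is applied here.

```python
def mixed_anova(groups):
    """Split-plot ANOVA with one between-subject factor (the group labels) and
    one within-subject factor (the b scores in each subject's row).

    Returns {effect: (F, df_effect, df_error)} plus the sums of squares.
    """
    subjects = [(g, row) for g, rows in groups.items() for row in rows]
    a, n, b = len(groups), len(subjects), len(subjects[0][1])
    grand = sum(sum(row) for _, row in subjects) / (n * b)
    subj_mean = [sum(row) / b for _, row in subjects]
    level_mean = [sum(row[j] for _, row in subjects) / n for j in range(b)]
    group_mean = {g: sum(map(sum, rows)) / (len(rows) * b) for g, rows in groups.items()}
    cell_mean = {g: [sum(r[j] for r in rows) / len(rows) for j in range(b)]
                 for g, rows in groups.items()}

    ss = {
        "between": b * sum(len(rows) * (group_mean[g] - grand) ** 2
                           for g, rows in groups.items()),
        "subjects_within_groups": b * sum((m - group_mean[g]) ** 2
                                          for (g, _), m in zip(subjects, subj_mean)),
        "within": n * sum((m - grand) ** 2 for m in level_mean),
        "interaction": sum(len(rows) * (cell_mean[g][j] - group_mean[g]
                                        - level_mean[j] + grand) ** 2
                           for g, rows in groups.items() for j in range(b)),
        # Within-subject factor x subjects-within-groups: the error term for
        # the within-subject factor and the interaction.
        "residual": sum((row[j] - m - cell_mean[g][j] + group_mean[g]) ** 2
                        for (g, row), m in zip(subjects, subj_mean) for j in range(b)),
    }
    df = {"between": a - 1, "subjects_within_groups": n - a, "within": b - 1,
          "interaction": (a - 1) * (b - 1), "residual": (b - 1) * (n - a)}
    ms = {key: ss[key] / df[key] for key in ss}
    return {
        "between": (ms["between"] / ms["subjects_within_groups"],
                    df["between"], df["subjects_within_groups"]),
        "within": (ms["within"] / ms["residual"], df["within"], df["residual"]),
        "interaction": (ms["interaction"] / ms["residual"],
                        df["interaction"], df["residual"]),
        "ss": ss,
    }
```

With three groups of 5, 5 and 4 subjects and three levels of the within-subject factor, the function returns (2, 11) degrees of freedom for the between-subject factor, (2, 22) for the within-subject factor and (4, 22) for the interaction, which is the pattern reported for /i/ in this section.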
The results showed that there was a highly significant main effect of the following
consonant on the duration of the vowel /i/ [F(2,22)=77.152; p<.01]. There was also a
highly significant main effect of the factor “LANGUAGE” [F(2,11)=41.133; p<.01],
showing that the overall durational means differ between the languages. Tukey HSD
post-hoc tests for the factor “LANGUAGE” showed that all three languages differed
highly significantly from each other (p<.01). In addition, there was a highly significant
interaction between the factors “LANGUAGE” and “FOLLOWING CONSONANT”
[F(4,22)=16.943; p<.01].
The direction of the crosslinguistic differences is plotted in Figure 5-1. The mean
duration and standard deviations for /i/ per consonantal context and language averaged for
all the speakers are given in Table 5-1. Figure 5-1 shows that the direction of the main
effect of the following consonant on the duration of /i/ is parallel in the three languages.
However, as shown by the interaction between the language and following consonants,
the extent of postvocalic conditioning differed significantly depending on the language.
There were significant differences between SSE and SSBE in the implementation of
duration before voiced stops: i.e. this context triggered long duration in SSBE and short
duration in SSE. There were significant crosslinguistic differences between SSE and MSR
in the implementation of duration before voiced fricatives: i.e. this context triggered long
duration in SSE and relatively short duration in MSR. Generally in MSR, the vowels
remained relatively short irrespective of the following consonant.
Appendix C summarises the individual results per speaker and language: mean and
median duration (ms), number of tokens and standard deviation. Appendix D summarises
the results per language in the same format. The voiceless stop/voiced fricative (VLS/VF)
ratio is .5 for SSE, .84 for MSR and .54 for SSBE, and the voiceless stop/voiced stop
(VLS/VS) ratio is .84 for SSE, .89 for MSR and .63 for SSBE (see Appendix T for an
overview of the ratios for all speakers, adults and children).
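The ratios are obtained by dividing the duration in the shortening context (voiceless stop) by that in a lengthening context. As a rough check, the sketch below applies this to the rounded group means of Table 5-1. It gives .51 and .85 for SSE, .84 and .89 for MSR and .54 and .63 for SSBE; the small differences from the SSE values quoted above are presumably due to the reported ratios being derived from the appendix values rather than from rounded group means (an assumption).

```python
def duration_ratio(shortening_ms: float, lengthening_ms: float) -> float:
    """Vowel duration in the shortening context divided by that in a
    lengthening context: values near 1 mean little postvocalic conditioning,
    smaller values mean stronger conditioning."""
    if shortening_ms <= 0 or lengthening_ms <= 0:
        raise ValueError("durations must be positive")
    return shortening_ms / lengthening_ms


# Group means for /i/ (ms) from Table 5-1:
# (voiced fricative, voiced stop, voiceless stop)
TABLE_5_1_MEANS = {"SSE": (209, 125, 106), "MSR": (106, 100, 89), "SSBE": (285, 243, 153)}

for language, (vf, vs, vls) in TABLE_5_1_MEANS.items():
    print(f"{language}: VLS/VF = {duration_ratio(vls, vf):.2f}, "
          f"VLS/VS = {duration_ratio(vls, vs):.2f}")
```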
These results confirm previous reports on the extent of postvocalic conditioning of
vowel duration in SSE (McKenna, 1988; Scobbie et al., 1999a; Scobbie et al., 1999b) and
MSR (Chen, 1970; Gordeeva et al., 2003), and the cross-varietal differences between SSE
and SSBE (Scobbie, 2002).
[Figure 5-1: bar chart of duration (ms) + 1 standard deviation (y-axis) by language (SSE, MSR, SSBE; x-axis), with separate bars for voiced fricatives, voiced stops and voiceless stops]
Figure 5-1 Mean duration and standard deviation of the vowel /i/ in the three languages (SSE, MSR and
SSBE) in the contexts before voiced fricatives, voiced stops and voiceless stops produced by monolingual
adults.
Table 5-1 Mean duration and standard deviation of the vowel /i/ (ms) for three right consonantal contexts
per language averaged for all the adult speakers.
Following consonant | Language | Mean duration (ms) | Std. Deviation | n of subjects
voiced fricative | SSE | 209 | 24 | 5
voiced fricative | MSR | 106 | 20 | 5
voiced fricative | SSBE | 285 | 44 | 4
voiced stop | SSE | 125 | 16 | 5
voiced stop | MSR | 100 | 10 | 5
voiced stop | SSBE | 243 | 54 | 4
voiceless stop | SSE | 106 | 11 | 5
voiceless stop | MSR | 89 | 11 | 5
voiceless stop | SSBE | 153 | 17 | 4
5.3.1.2 Vowel /ɪ/
We examined the crosslinguistic difference in the influence of the right consonantal
context on the duration of /ɪ/ in stressed syllables in similar phrase positions. The set-up of
the ANOVA was the same as in Section 5.3.1.1, except that the between-subject factor
“LANGUAGE” had only two levels (SSE and SSBE), since MSR does not feature /ɪ/, and
the within-subject factor “FOLLOWING CONSONANT” had three levels: i.e. voiceless
fricative, voiced stop, voiced fricative.
The results showed that in both English varieties there was a highly significant main
effect of the “FOLLOWING CONSONANT” on the duration of the vowel /ɪ/
[F(2,14)=15.826; p<.01]. This result means that postvocalic conditioning for /ɪ/ operates
systematically in both SSE and SSBE. There was also a significant main effect of the
factor “LANGUAGE” [F(1,7)=9.772; p<.05], indicating that the overall durational means
differ between the two varieties. Finally, there was a significant interaction between the
factors “LANGUAGE” and “FOLLOWING CONSONANT” [F(2,14)=5.363; p<.05]. This
interaction means that the effect of the following consonant on vowel duration differs
between SSE and SSBE.
Mean duration and standard deviations for // per consonantal context and language
averaged for all the speakers are shown in Table 5-2. The direction of the differences
between the two English varieties is shown in Figure 5-2. Individual results per speaker
are found in Appendix C.
Figure 5-2 shows that SSE and SSBE appear to differ both in the extent and more
clearly in the contexts of the postvocalic conditioning for //. Voiceless fricatives
following the vowel trigger the shortest duration in both varieties. However, in SSE there
is only a slight increase in vowel duration before voiced stops and fricatives compared to
that before voiceless fricatives, whereas in SSBE the increase in both voiced contexts is
substantial.
The voiceless fricative/voiced stop (VLF/VS) ratio is .91 in SSE and .68 in SSBE,
while the corresponding voiceless fricative/voiced fricative (VLF/VF) ratios are .9 and
.69. Looking at the individual variation of the adult speakers (values derived from Appendix
C), both VLF/VF and VLF/VS ratios are consistently smaller than 1 for all four SSBE
speakers. In SSE, however, only the VLF/VF ratio is consistently less than 1 for all
individual speakers, while the VLF/VS ratio varies from .77 to about 1.0, with two speakers
having values slightly greater than 1.
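The check on individual consistency (whether a ratio stays below 1 for every speaker) can be sketched as follows. The speaker labels and median values in the example are invented for illustration only and are not the Appendix C data; the input format is likewise hypothetical.

```python
def ratios_by_speaker(medians, shortening, lengthening):
    """Per-speaker ratio from {speaker: {context: median duration in ms}}."""
    return {spk: ctx[shortening] / ctx[lengthening] for spk, ctx in medians.items()}


def describe(ratios):
    """Return (minimum, maximum, number of speakers with a ratio above 1)."""
    values = list(ratios.values())
    return min(values), max(values), sum(r > 1 for r in values)


# Invented medians (ms) for three speakers, for illustration only.
example = {
    "S1": {"voiceless fricative": 90.0, "voiced stop": 100.0, "voiced fricative": 120.0},
    "S2": {"voiceless fricative": 104.0, "voiced stop": 100.0, "voiced fricative": 115.0},
    "S3": {"voiceless fricative": 77.0, "voiced stop": 100.0, "voiced fricative": 110.0},
}
vlf_vs = ratios_by_speaker(example, "voiceless fricative", "voiced stop")
vlf_vf = ratios_by_speaker(example, "voiceless fricative", "voiced fricative")
print(describe(vlf_vs))
print(describe(vlf_vf))
```

In this invented example the VLF/VS ratios range from .77 to 1.04 with one speaker above 1, while all VLF/VF ratios stay below 1.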
[Figure 5-2: bar chart of mean duration (ms) (y-axis) by following consonant (voiced fricative, voiced stop, voiceless fricative; x-axis), with separate bars for SSE and SSBE]
Figure 5-2 Durational means (ms) of the vowel /ɪ/ in all SSE versus SSBE adults in the contexts before
voiced fricatives, voiced stops and voiceless fricatives.
Table 5-2 Mean duration and standard deviation of the vowel // (ms) in three right consonantal contexts per
language (SSE or SSBE) averaged for all the speakers.
Following
Consonant
voiced fricative
voiced stop
voiceless fricative
Mean
Std.
n of
Language duration
Deviation subjects
SSE
119
13
SSBE
172
37
SSE
105
13
SSBE
177
47
SSE
95
19
SSBE
120
24
5
4
5
4
5
4
The VLF/VS ratio of .91 and the VLF/VF ratio of .9 for SSE in our study are
larger (i.e. closer to 1, indicating weaker conditioning) than the corresponding .87 and .72
ratios derived from McKenna (1988). The data from Agutter (1988) gave similar results to
McKenna’s (1988) study. The length of the utterances analysed in these studies can
plausibly explain the difference in the ratios: both McKenna and Agutter recorded words in
isolation, while we recorded carrier words embedded in sentences. In a longer utterance the
extent of postvocalic conditioning for /ɪ/ is likely to become smaller, owing to differences in
speaking rate and a more spontaneous mode of data elicitation.
Our results confirm previous reports on the relatively small phonetic extent of
postvocalic conditioning in SSE /ɪ/ (Agutter, 1988; McKenna, 1988), and equally
confirm our analysis in Section 2.1.4 of the crosslinguistic difference between SSE and
SSBE, based on the data for General American (House, 1961; Peterson & Lehiste, 1960).
Additionally, the empirical data on the postvocalic conditioning of the duration of /ɪ/ in
McKenna (1988), Agutter (1988) and our own study indicate that, unlike /i/, the lax
vowel /ɪ/ is relatively short before all three consonantal contexts considered, but that
a small yet systematic degree of postvocalic conditioning still applies to the
vowel, depending on the voicing and the manner of articulation of the following
consonant. This means that Aitken’s (1981) definition of the SSE lax vowel as being
‘invariably short’ needs refinement, since phonological ‘invariability’ does not hold at
the phonetic level.
5.3.1.3 Close rounded vowels
We examine the crosslinguistic patterns of the postvocalic conditioning (by voiceless
stop, voiced stop or voiced fricative) of the duration of the close rounded vowels (MSR /u/
and its SSE and SSBE counterparts) in MSR, SSE and SSBE. The set-up of the ANOVA
was the same as for the vowel /i/ in Section 5.3.1.1. We expected the crosslinguistic
differences in the consonantal conditioning of the duration of the close rounded vowels to
be similar to those found for /i/.
The results showed that there was a highly significant main effect of the following
consonant on the duration of the close rounded vowels [F(2,22)=110.025; p<.01]. This
result means that vowel duration differed systematically across the consonantal
contexts. There was a highly significant main effect of the factor
“LANGUAGE” [F(2,11)=29.044; p<.01], showing that the overall durational
means differ between the languages. Tukey HSD post-hoc tests for this factor
showed that all the language pairs differed significantly from each other (p<.05).
There was also a highly significant interaction between the factors “LANGUAGE”
and “FOLLOWING CONSONANT” [F(4,22)=33.959; p<.01], showing that the effect
of the consonantal context on the duration of the close rounded vowels differed between
the languages. The direction of the crosslinguistic differences is shown in Figure 5-3.
[Figure 5-3: bar chart of mean duration + 1 standard deviation (ms) (y-axis) by language (SSE, MSR, SSBE; x-axis), with separate bars for voiced fricatives, voiced stops and voiceless stops]
Figure 5-3 Mean duration (ms) and standard deviation of the close rounded vowels in the three languages
(SSE, MSR and SSBE) in the contexts before voiced fricatives, voiced stops and voiceless stops produced
by monolingual adults.
Table 5-3 Mean duration and standard deviation of close rounded vowels (ms) as a function of the
following consonant averaged for all the SSE, MSR and SSBE adult speakers.
Following consonant | Language | Mean duration (ms) | Std. Deviation | n of subjects
voiced fricative | SSE | 214 | 30 | 5
voiced fricative | MSR | 115 | 18 | 5
voiced fricative | SSBE | 269 | 45 | 4
voiced stop | SSE | 118 | 10 | 5
voiced stop | MSR | 98 | 6 | 5
voiced stop | SSBE | 253 | 52 | 4
voiceless stop | SSE | 108 | 10 | 5
voiceless stop | MSR | 97 | 6 | 5
voiceless stop | SSBE | 122 | 22 | 4
The mean duration and standard deviations for the rounded vowels per consonantal
context and language averaged for all the speakers are shown in Table 5-3. Individual
results per speaker are found in Appendix C.
The results confirm that the crosslinguistic implementation of postvocalic
conditioning for the close rounded vowels shown in Figure 5-3 was very similar to that for /i/
(Figure 5-1). As expected, the duration of close rounded vowels before voiceless stops
was rather short in all three languages. The main crosslinguistic differences in duration
between MSR and SSE occurred before voiced stops and voiced fricatives. As for
/i/, there were clear crosslinguistic differences between SSE and SSBE in the
implementation of vowel duration before voiced stops: this context triggered long
duration in SSBE and short duration in SSE. In Russian, the vowels remained relatively short
irrespective of the following consonant, though there was a slight increase in vowel
duration before voiced fricatives.
For the close rounded vowels, the VLS/VF ratio is .5 for SSE, .84 for MSR and .46
for SSBE. The VLS/VS ratio is .92 for SSE, .99 for MSR and .48 for SSBE. Similarly to
/i/, these results confirm previous reports on the extent of postvocalic conditioning of
vowel duration in SSE (McKenna, 1988; Scobbie et al., 1999a; Scobbie et al., 1999b) and
MSR (Chen, 1970; Gordeeva et al., 2003), and the cross-varietal differences between SSE
and SSBE (Scobbie, 2002).
5.3.1.4 Summary of results for monolingual adults
The results of between-language analysis of variance showed that there were
significant differences in the implementation of postvocalic conditioning of vowel
duration between MSR, SSE and SSBE for all the vowels concerned.
The results for SSE confirm empirical evidence for the operation of the Scottish
Vowel Length Rule in SSE (Agutter, 1988; McKenna, 1988; Scobbie et al., 1999a;
Scobbie et al., 1999b). In agreement with these studies, our data showed that both
/i/ and the close rounded vowel are long before voiced fricatives but short before
voiced and voiceless stops. All adults consistently showed these patterns.
The results for MSR showed only a slight overall increase in vowel duration as a
function of the following consonant for the adults. This result confirms the increase
previously reported by Chen (1970). However, it is important to note that the
context-dependent increase in duration differed between the vowels /i/ and /u/ in our data,
and that the individual speakers varied and deviated from this trend in several instances
(Appendix D). This suggests that postvocalic conditioning is not an obligatory phonetic
property of Russian, and that we should be careful when comparing the Russian results of
bilingual children with the mean results of the Russian adults as a group. The children's
speech production is better compared crosslinguistically and with the individual speech
production of their parents (here, the mother), bearing in mind that the children may still
differ from their parents. The biggest difference in postvocalic conditioning between
Russian and SSE is not necessarily in the pattern, since the two could coincide, but rather
in its extent, especially in the context before voiced fricatives.
The results for SSBE primarily confirm data based on American English (Peterson
& Lehiste, 1960; House, 1961). The vowel /i/ and the close rounded vowels are short
before voiceless stops and long before voiced fricatives and stops; thus the increase in
duration is conditioned purely by the voicing of the following consonant rather than by a
combination of voicing and manner of articulation, as in SSE. In SSBE, vowels are long
before voiced stops, whereas in SSE they are short.
The results for the lax vowel // confirm that SSE and SSBE have a differential
implementation of postvocalic conditioning before voiced fricatives, voiced stops, and
voiceless fricatives. In SSE, the context-dependent increase of duration is very small: the
VLS/VS ratio is .91 and VLS/VF ratio .9. This confirms the previous reports (Aitken,
1981; Scobbie, 2002) that the vowel can be considered as phonologically short regardless
of the following consonant. At the phonetic level, however, our data as well as the data
from Agutter (1988) and McKenna (1988) show context-dependent variability (Aitken,
1981) of the duration of //. The duration of // systematically (though marginally)
increases in the context of voiced stops and voiced fricatives compared to the voiceless
context, and individual SSE adults are consistent in realising this pattern.
156
5.3.2 SSE monolingual acquisition
5.3.2.1 Vowel /i/
5.3.2.1.1 Group results
This section addresses the question whether the SSE monolingual children
participating in this study acquired an SVLR pattern for the vowel /i/ similar
to that of the SSE adults.
The median values for the duration (ms) of /i/ for each monolingual SSE speaker
(15 cases: the 5 adults and the 10 child cases, including the three longitudinal cases) were
entered in a mixed-design ANOVA with “AGE” (adult; child aged 3;4 to 3;11; child aged
4;0 to 4;4; child aged 4;5 to 4;9) as a between-subject factor and the “FOLLOWING
CONSONANT” as a within-subject factor. The factor “FOLLOWING CONSONANT” had
three levels: i.e. voiced fricative, voiced stop and voiceless stop.
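The assignment of child recordings to the three child age bands of the factor “AGE” can be made explicit by converting the years;months notation to months. In the sketch below the helper names are hypothetical; the ten ages are those of the child recordings plotted in Figure 5-5. Binning them gives group sizes of 3, 5 and 2, which matches the n of subjects in Table 5-4.

```python
from collections import Counter


def age_in_months(age: str) -> int:
    """Convert the 'years;months' notation (e.g. '3;11') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


# Child age bands of the AGE factor, with inclusive lower and upper bounds.
AGE_BANDS = [
    ("child 3;4 to 3;11", "3;4", "3;11"),
    ("child 4;0 to 4;4", "4;0", "4;4"),
    ("child 4;5 to 4;9", "4;5", "4;9"),
]


def age_band(age: str) -> str:
    """Return the label of the age band containing `age`."""
    months = age_in_months(age)
    for label, lower, upper in AGE_BANDS:
        if age_in_months(lower) <= months <= age_in_months(upper):
            return label
    raise ValueError(f"age {age} lies outside the child age bands")


# Ages of the ten child recordings plotted in Figure 5-5.
sample_ages = ["3;4", "3;8", "3;11", "4;0", "4;0", "4;1", "4;2", "4;2", "4;8", "4;9"]
print(Counter(age_band(a) for a in sample_ages))
```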
The results showed that there was a highly significant main effect of the following
consonant on the duration of the vowel /i/ [F(2,22)=113.852; p<.01]. The
direction of the main effect was parallel in all age groups, as shown in Figure 5-4. As
expected, the context of voiced fricatives triggered the longest duration of the preceding
vowel, while vowels before voiced and voiceless stops remained relatively short. Together
with its parallel direction across groups, this highly significant main effect indicates that the
SSE monolingual children had acquired the SVLR pattern for the vowel /i/.
Furthermore, there was a highly significant main effect of the factor “AGE”
[F(3,11)=10.169; p<.01] on vowel duration. This means that the absolute differences
in vowel duration between the age groups were systematic. Figure 5-4 shows that the
main difference between the groups came from the overall longer durations in the child
groups compared with the adults.
[Figure 5-4: mean duration of /i/ (ms) (y-axis) by following consonant (voiced fricative, voiced stop, voiceless stop; x-axis) for SSE adults and children aged 3;4 to 3;11, 4;0 to 4;4 and 4;5 to 4;9]
Figure 5-4 Mean duration of the vowel /i/ (ms) as a function of the following consonant in four age groups
of the SSE monolingual speakers.
Table 5-4 Mean duration and standard deviation for the SSE vowel /i/ as a function of the following
consonant in four age groups of the SSE monolingual controls.
Following consonant | SSE age group | Mean duration (ms) | Std. Deviation | n of subjects
voiced fricative | adult | 209 | 24 | 5
voiced fricative | child 3;4 to 3;11 | 322 | 26 | 3
voiced fricative | child 4;0 to 4;4 | 359 | 49 | 5
voiced fricative | child 4;5 to 4;9 | 295 | 51 | 2
voiced fricative | Total | 293 | 73 | 15
voiced stop | adult | 125 | 16 | 5
voiced stop | child 3;4 to 3;11 | 178 | 28 | 3
voiced stop | child 4;0 to 4;4 | 194 | 53 | 5
voiced stop | child 4;5 to 4;9 | 153 | 13 | 2
voiced stop | Total | 162 | 44 | 15
voiceless stop | adult | 106 | 11 | 5
voiceless stop | child 3;4 to 3;11 | 158 | 19 | 3
voiceless stop | child 4;0 to 4;4 | 166 | 54 | 5
voiceless stop | child 4;5 to 4;9 | 132 | 9 | 2
voiceless stop | Total | 140 | 41 | 15
To establish which groups contributed to the significance of “AGE”, we ran Tukey
HSD post-hoc tests for the age effects on the duration of the vowel /i/. The results of the
tests are shown in Table 5-5. The significant effects are marked with an asterisk (*) in the
column “(J) Age”. The post-hoc tests revealed that the age effect was only significant
(p<.05) between adults and the two youngest age groups (3;4 to 3;11 and 4;0 to 4;4).
There was no significant difference between adults and the older children (4;5 to 4;9).
This suggests a developmental effect: by about the age of 4;5, the children's realisation of
the SVLR pattern for the vowel /i/ approaches the adult form.
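The standard errors in Table 5-5 are consistent with the Tukey-Kramer form for unequal group sizes, SE = sqrt(MSE * (1/n_i + 1/n_j)), using the group sizes of Table 5-4 (5 adults and 3, 5 and 2 children). The sketch below backs out an effective error mean square from one row of the table, recomputes the other standard errors and derives the studentized range statistic q. The critical value q(.05; 4 groups, 11 df) of about 4.26 is an assumption taken from a standard studentized range table and should be checked; with it, the sketch reproduces the pattern of asterisks in Table 5-5.

```python
from math import sqrt


def pairwise_se(mse: float, n_i: int, n_j: int) -> float:
    """Standard error of a pairwise mean difference: sqrt(MSE * (1/n_i + 1/n_j))."""
    return sqrt(mse * (1 / n_i + 1 / n_j))


def tukey_kramer_q(mean_diff: float, se: float) -> float:
    """Studentized range statistic |diff| / sqrt(MSE/2 * (1/n_i + 1/n_j)),
    which equals |diff| * sqrt(2) / se for the se defined above."""
    return abs(mean_diff) * sqrt(2) / se


# Effective error mean square implied by the adult vs 4;0-4;4 row of
# Table 5-5 (5 vs 5 subjects, Std. Error 17.51).
MSE = 17.51 ** 2 / (1 / 5 + 1 / 5)
# Assumed critical value q(.05; 4 groups, 11 error df) from a studentized range table.
Q_CRIT = 4.26

# (comparison, mean difference from Table 5-5, n_i, n_j)
COMPARISONS = [
    ("adult vs 3;4-3;11", -72.56, 5, 3),
    ("adult vs 4;0-4;4", -93.28, 5, 5),
    ("adult vs 4;5-4;9", -46.95, 5, 2),
    ("3;4-3;11 vs 4;0-4;4", -20.72, 3, 5),
    ("3;4-3;11 vs 4;5-4;9", 25.61, 3, 2),
    ("4;0-4;4 vs 4;5-4;9", 46.33, 5, 2),
]


def evaluate(comparisons, mse, q_crit):
    """Return (label, SE, q, significant?) for each pairwise comparison."""
    results = []
    for label, diff, n_i, n_j in comparisons:
        se = pairwise_se(mse, n_i, n_j)
        q = tukey_kramer_q(diff, se)
        results.append((label, round(se, 2), round(q, 2), q > q_crit))
    return results


for row in evaluate(COMPARISONS, MSE, Q_CRIT):
    print(row)
```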
Table 5-5 Results of Tukey HSD post-hoc tests for the differences between age groups within SSE
monolingual controls.
(I) Age | (J) Age | Mean Difference (I-J) | Std. Error
adult | child 3;4 to 3;11* | -72.56 | 20.21
adult | child 4;0 to 4;4* | -93.28 | 17.51
adult | child 4;5 to 4;9 | -46.95 | 23.15
child 3;4 to 3;11 | adult* | 72.56 | 20.21
child 3;4 to 3;11 | child 4;0 to 4;4 | -20.72 | 20.21
child 3;4 to 3;11 | child 4;5 to 4;9 | 25.61 | 25.26
child 4;0 to 4;4 | adult* | 93.28 | 17.50
child 4;0 to 4;4 | child 3;4 to 3;11 | 20.72 | 20.21
child 4;0 to 4;4 | child 4;5 to 4;9 | 46.33 | 23.15
child 4;5 to 4;9 | adult | 46.95 | 23.15
child 4;5 to 4;9 | child 3;4 to 3;11 | -25.61 | 25.26
child 4;5 to 4;9 | child 4;0 to 4;4 | -46.33 | 23.15
* The mean difference is significant at the .05 level.
Finally, there was a significant interaction between the factors “AGE” and
“FOLLOWING CONSONANT” [F(6,22)=2.658; p<.05]. The interaction means that the
extent of the age differences in vowel duration depended on the following consonant. Figure
5-4 shows that the largest age differences in vowel duration occurred in the context
before voiced fricatives, where all three child groups had longer durations than the
adults. There were no other significant main effects or interactions.
5.3.2.1.2 Individual results
Since the speech production of the bilingual children (AN and BS) is assessed
individually, it is worthwhile considering the ranges of individual variation of the SSE
monolingual children. Individual results of the monolingual children are plotted in Figure
5-5. Individual descriptive statistics are reported in Appendix E.
Figure 5-5 shows that all seven SSE monolingual children (individual children are
plotted by increasing age on the x-axis) had an SVLR-like pattern, with /i/ before voiced
fricatives about twice as long as before the other consonants. The three monolingual
children who were recorded longitudinally, i.e. C3 (3;4 and 3;11), C7 (4;2 and 4;8) and
C4 (3;8 and 4;1), showed a stable SVLR pattern in both age samples.
[Figure 5-5: median duration (ms) of /i/ (y-axis) before voiced fricatives, voiced stops and voiceless stops for each SSE monolingual child recording (x-axis), ordered by age from 3;4 to 4;9]
Figure 5-5 Individual results of SSE monolingual children on the duration of /i/ as a function of the
following consonant.
In the context before voiced stops, the duration of /i/ was relatively short in most
subjects compared with that before voiced fricatives. However, the relationship between
the median duration of /i/ before voiced stops and that before voiceless stops varied
considerably across the 10 cases: in some cases /i/ was longer before voiced stops, in others
shorter. This supports general claims for SSE that both voiced and voiceless stops condition
short vowels (Aitken, 1981; Scobbie et al., 1999a; Scobbie et al., 1999b; Scobbie, 2002).
There seems to be no developmental pattern in the SVLR with increasing age,
supporting the results in the previous section. The youngest children had already acquired
adult-like postvocalic conditioning of the vowel /i/.
The VLS/VF ratios of the children ranged from .31 to .62. The figures and patterns
of the individual children confirm the significant group results on the acquisition of the
SVLR pattern.
5.3.2.2 Vowel /ɪ/
5.3.2.2.1 Group results
This section addresses the question whether the SSE monolingual children acquired
the phonetically short postvocalic conditioning pattern for the lax vowel /ɪ/ in a way
similar to the SSE adults.
The set-up of the ANOVA was the same as in Section 5.3.2.1.1, except that the
within-subject factor “FOLLOWING CONSONANT” had three levels: voiced fricative,
voiced stop and voiceless fricative.
The results showed that there was a highly significant main effect of the factor
“FOLLOWING CONSONANT” on the duration of the vowel /ɪ/ [F(2,22)=14.05; p<.01].
The mean durations and standard deviations for the four age groups are presented in
Table 5-6. The direction of the effect of the following consonant on vowel duration
was parallel in all age groups (see Figure 5-6). This result indicates that the SSE
monolingual children had acquired the postvocalic conditioning for /ɪ/.
However, there was also a highly significant main effect of the factor “AGE” on the
duration of the vowel across all consonantal contexts [F(3,11)=11.26; p<.01]. This effect
means that absolute differences between the age groups in vowel duration were
systematic. The difference between the age groups is shown in Figure 5-6. Similarly to /i/,
children in all the age groups had higher mean duration values than the adult group.
To establish which groups contributed to the significance of the main effect of the
factor “AGE”, we ran Tukey HSD post-hoc tests for the age effects on the duration of the
vowel /ɪ/. The results are shown in Table 5-7. The significant effects are marked with an
asterisk (*) in the column “(J) Age”. The post-hoc tests revealed that the age effect was
only significant (p<.05) between adults and the two youngest groups, aged 3;4 to 3;11 and
4;0 to 4;4.
[Figure 5-6: mean duration (ms) of /ɪ/ (y-axis) by following consonant (x-axis) for SSE adults and children aged 3;4 to 3;11, 4;0 to 4;4 and 4;5 to 4;9]
Figure 5-6 Mean duration of the vowel /ɪ/ as a function of the following consonant in four SSE monolingual
age groups.
Table 5-6 Mean duration and standard deviation for the SSE vowel // as a function of the following
consonant for each age group of the SSE monolingual controls
Following
consonant
voiced fricative
Mean
Std.
n of
Age
duration (ms) Deviation subjects
Adult
119
13
5
Child 3;4 to 3;11
226
38
3
Child 4;0 to 4;4
192
21
5
Child 4;5 to 4;9
162
20
2
Total
170
47
15
voiced stop
Adult
105
13
5
Child 3;4 to 3;11
191
35
3
Child 4;0 to 4;4
178
59
5
Child 4;5 to 4;9
131
40
2
Total
150
53
15
voiceless fricative Adult
95
19
5
Child 3;4 to 3;11
153
20
3
Child 4;0 to 4;4
130
21
5
Child 4;5 to 4;9
116
24
2
Total
121
29
15
As for /i/, there was no significant difference between adults and the older children
aged 4;5 to 4;9. This suggests that the short postvocalic conditioning for the vowel /ɪ/
settles into an adult-like form by the age of 4;5.
There were no other significant main effects or interactions.
Table 5-7 Results of Tukey HSD post-hoc tests for the age effects for the SSE monolingual speakers.
(I) Age | (J) Age | Mean Difference (I-J) | Std. Error
Adult | child 3;4 to 3;11* | -83.73 | 15.88
Adult | child 4;0 to 4;4* | -60.30 | 13.75
Adult | child 4;5 to 4;9 | -30.28 | 18.20
child 3;4 to 3;11 | adult* | 83.73 | 15.88
child 3;4 to 3;11 | child 4;0 to 4;4 | 23.43 | 15.88
child 3;4 to 3;11 | child 4;5 to 4;9 | 53.46 | 19.85
child 4;0 to 4;4 | adult* | 60.30 | 13.75
child 4;0 to 4;4 | child 3;4 to 3;11 | -23.43 | 15.88
child 4;0 to 4;4 | child 4;5 to 4;9 | 30.03 | 18.20
child 4;5 to 4;9 | Adult | 30.28 | 18.20
child 4;5 to 4;9 | child 3;4 to 3;11 | -53.46 | 19.85
child 4;5 to 4;9 | child 4;0 to 4;4 | -30.03 | 18.20
* The mean difference is significant at the .05 level.
5.3.2.2.2 Individual results
Individual results of the monolingual children are presented in Figure 5-7. The
descriptive statistics are reported in Appendix G.
Figure 5-7 shows that all the SSE monolingual children consistently produced /ɪ/
shorter before voiceless fricatives than before voiced fricatives. This finding parallels the
SSE adult results. Individual VLF/VF ratios range from .49 to .85, a somewhat
broader range than the adult ratios. VLF/VS ratios vary from .52 to 1.1. As for the
SSE adults, the individual child VLF/VS ratios had a broader range than the VLF/VF
ratios.
There are two possible explanations for the broader range of variation in the children.
The first is the difference in the mode of data elicitation between adults and
children: adults read utterances from a computer screen, while children produced
carrier words during games. The second is speech immaturity.
[Figure 5-7: median duration (ms) of /ɪ/ (y-axis) before voiced fricatives, voiced stops and voiceless fricatives for each SSE monolingual child recording (x-axis)]
Figure 5-7 Individual results of SSE monolingual children on the duration of /ɪ/ as a function of the
following consonant.
Both group results and individual results for /i/ and // also confirm that the SSE
monolingual children had acquired differential implementation of postvocalic
conditioning for these two vowels similarly to the SSE adults.
164
5.3.2.3 Close rounded vowel
5.3.2.3.1 Group results
In Section 4.3.1.2 we found a broad range of phonetic variation in the vowel quality
produced by the SSE monolingual children: i.e. their production of the close rounded vowel
was less adult-like than that of /i/, even at the ages of 3;4 to 4;9. This section assesses the
acquisition of postvocalic conditioning of duration (SVLR) for this vowel. We used the same
set-up for the ANOVA as that described in Section 5.3.2.1.1 for SVLR in /i/.
As for /i/, the results show that there was a highly significant main effect of
the following consonant on the duration of the close rounded vowel irrespective of the
other factors [F(1.092,22)=26.896; p<.01]. The direction of the main effect was parallel in
all age groups (see Figure 5-8). In all groups, the context before voiced fricatives triggered
the longest duration, while the vowel before voiced and voiceless stops remained
relatively short. These results are consistent with the acquisition of SVLR for the vowel
/i/ in our data set.
Furthermore, there was a highly significant main effect of the factor “AGE”
[F(3,11)=10.169; p<.01] on vowel duration. This effect means that absolute differences in
vowel duration between the age groups were systematic. Figure 5-8 shows that the main
difference between the groups is contributed by the overall higher duration means in the
child groups as compared to the adult group.
To establish which groups contributed to the main effect of “AGE”, we ran Tukey
HSD post-hoc tests for the age effects. The results of the tests are shown in Table 5-9. The
significant effects are marked with an asterisk (*) in the column “(J) Age”. The post-hoc
tests revealed that, as for /i/, there was no significant difference in overall vowel duration
between adults and the older children aged 4;5 to 4;9. However, there was also no
significant difference between adults and children aged 3;4 to 3;11, while children aged
4;0 to 4;4 differed significantly both from adults and from children aged 4;5 to 4;9. This
may indicate that the significance of the main effect “AGE” is also influenced by the
variability of the individual children contributing to specific age groups, in addition to
developmental trends. The individual results in the next section may help to clarify this
issue.
[Figure 5-8: mean duration (ms) (y-axis) by following consonant (voiced fricative, voiced stop, voiceless stop; x-axis) for SSE adults and children aged 3;4 to 3;11, 4;0 to 4;4 and 4;5 to 4;9]
Figure 5-8 Mean duration of the close rounded vowel (ms) as a function of the following consonant in four age groups
of SSE monolingual speakers.
Table 5-8 Mean duration and standard deviation for the SSE vowel /ʉ/ as a function of the following consonant for each age group of the SSE monolingual controls.

Following consonant  Age                 Mean duration (ms)  Std. deviation  n of subjects
voiced fricative     adult               214                 30              5
                     child 3;4 to 3;11   292                 53              3
                     child 4;0 to 4;4    421                 89              5
                     child 4;5 to 4;9    270                 74              2
                     Total               306                 106             15
voiced stop          adult               118                 10              5
                     child 3;4 to 3;11   213                 40              3
                     child 4;0 to 4;4    235                 135             5
                     child 4;5 to 4;9    129                 32              2
                     Total               178                 93              15
voiceless stop       adult               108                 10              5
                     child 3;4 to 3;11   107                 33              3
                     child 4;0 to 4;4    119                 15              5
                     child 4;5 to 4;9    96                  39              2
                     Total               110                 20              15

There were no other significant main effects or interactions.
There were no other significant main effects or interactions.
The average VLS/VF ratio for the children was .47, similar to the adult ratio of .5. The average VLS/VS ratio for the children was .87, again comparable to the adult ratio of .84.
Table 5-9 Results of Tukey HSD post-hoc tests for the differences in the duration of /ʉ/ between age groups within SSE monolingual controls.

(I) Age             (J) Age              Mean Difference (I-J)  Std. Error
adult               child 3;4 to 3;11     -57.26                21.30
                    child 4;0 to 4;4*    -111.56                18.44
                    child 4;5 to 4;9      -18.06                24.40
child 3;4 to 3;11   adult                  57.26                21.30
                    child 4;0 to 4;4      -54.30                21.30
                    child 4;5 to 4;9       39.20                26.62
child 4;0 to 4;4    adult*                111.56                18.44
                    child 3;4 to 3;11      54.30                21.30
                    child 4;5 to 4;9*      93.50                24.40
child 4;5 to 4;9    adult                  18.06                24.40
                    child 3;4 to 3;11     -39.20                26.62
                    child 4;0 to 4;4*     -93.50                24.40

* The mean difference is significant at the .05 level.
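As a consistency check, the mean differences and standard errors in Table 5-9 can be approximately recovered from the group means and group sizes in Table 5-8. The sketch below makes three assumptions. First, each group's overall mean is the unweighted average of its three context means. Second, the standard error of a difference between two group means takes the form sqrt(MSE·(1/n_i + 1/n_j)). Third, the error mean square MSE ≈ 850 is not reported in the text; it is back-calculated from the standard errors in the table. The sketch does not compute the studentized-range critical value that the Tukey test uses for the significance decision.

```python
import math
from itertools import permutations

# Mean durations (ms) of /ʉ/ from Table 5-8: (voiced fricative, voiced stop, voiceless stop)
CONTEXT_MEANS = {
    "adult": (214, 118, 108),
    "child 3;4 to 3;11": (292, 213, 107),
    "child 4;0 to 4;4": (421, 235, 119),
    "child 4;5 to 4;9": (270, 129, 96),
}
# Number of subjects per group (Table 5-8)
N_SUBJECTS = {
    "adult": 5,
    "child 3;4 to 3;11": 3,
    "child 4;0 to 4;4": 5,
    "child 4;5 to 4;9": 2,
}
# Assumption: error mean square back-calculated from Table 5-9, not reported in the text
MSE = 850.0


def group_mean(group):
    """Unweighted mean over the three consonant contexts (an assumption)."""
    values = CONTEXT_MEANS[group]
    return sum(values) / len(values)


def pairwise_table():
    """Return {(I, J): (mean difference I-J, standard error)} for every ordered pair."""
    table = {}
    for i, j in permutations(CONTEXT_MEANS, 2):
        diff = group_mean(i) - group_mean(j)
        se = math.sqrt(MSE * (1 / N_SUBJECTS[i] + 1 / N_SUBJECTS[j]))
        table[(i, j)] = (diff, se)
    return table


if __name__ == "__main__":
    for (i, j), (diff, se) in pairwise_table().items():
        print(f"{i:18} {j:18} {diff:8.2f} {se:6.2f}")
```

The recovered standard errors match the table to two decimals. The mean differences deviate by at most about 0.3 ms, which presumably reflects rounding of the means reported in Table 5-8.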
5.3.2.3.2 Individual results
Individual results of the monolingual children are shown in Figure 5-9. The
descriptive statistics for each child (and age) are reported in Appendix F.
As shown in Figure 5-9, all monolingual children except C4 (aged 3;8 and 4;1) had an SVLR-like pattern, with /ʉ/ before voiced fricatives having a longer duration than in the other two contexts. In both age samples, C4 produced an SSBE-like postvocalic conditioning pattern rather than SVLR, with /ʉ/ before voiced fricatives and voiced stops being much longer than before voiceless stops. This can be explained by C4’s language background: she was the only child with a mixed parental background (an SSE-speaking mother and an SSBE-speaking father), even though she attended a largely SSE-speaking community nursery.
[Figure 5-9 plot: y-axis median duration (ms), 0 to 600; x-axis SSE monolingual children (speaker_age); series: voiced fricative, voiced stop, voiceless stop]
Figure 5-9 Individual results of SSE monolingual children on the duration of /ʉ/ as a function of the following consonant.
This subject was deliberately included in our monolingual sample, because it was not obvious which variety of English would be preferred by the bilingual children from Russian-speaking families, given that they grow up in the cross-varietally heterogeneous English-speaking community of Edinburgh. Interestingly, this SSBE-like pattern showed up only for C4’s vowel /ʉ/ and not for /i/, which had an SVLR-like pattern (see Figure 5-5). As discussed in Section 2.3.1, similar results were reported in Hewlett et al. (1999), where two children with a non-SSE English parental background acquired an SVLR-like pattern for the vowel /i/ and a non-SVLR English vowel duration pattern (similar to SSBE) for the rounded vowel. According to Hewlett et al. (1999), this suggested that competing influences from the two varieties were at work in the children’s speech production. The acquisition pattern of subject C4 agrees with that reported in Hewlett et al. (1999), with the difference that C4 comes from a mixed parental background (SSE-speaking mother and SSBE-speaking father), whereas both parents of the children in Hewlett et al. (1999) had a non-SSE English background.
The other two monolingual children recorded longitudinally, C3 (3;4 and 3;11) and C7 (4;2 and 4;8), showed stable SVLR patterns in both age samples. The VLS/VF ratios of the children (across all ages) range from .31 to .62, while their VLS/VS ratios range from .52 to 1.19.
5.3.2.4 Summary of results for the SSE monolingual children
The results of the ANOVA comparing different age groups within the SSE monolingual speakers confirmed that the SSE monolingual children aged 3;4 to 4;9 had firmly acquired the SVLR pattern for the close vowels /i/ and /ʉ/. This supports Matthews’ (2002) suggestive evidence that Scottish children may already be in the process of acquiring SVLR at the age of 2;6 to 2;8.
The SVLR pattern for the target /ʉ/ was acquired in an adult-like form, even though the segmental production of the monolingual children still showed broad (non-adult-like) ranges of phonetic variability.
Furthermore, the ANOVA results confirmed that the short postvocalic conditioning pattern for the lax ‘invariably short’ (Aitken, 1981) vowel /ɪ/ is also established at the ages concerned. The results of the child groups for /ɪ/ parallel those of the SSE adults. Since the SSE adult model of postvocalic conditioning differs between the lax vowel and the tense ones, the results for both vowel types in children also mean that the durations of the SVLR tense vowels and the lax vowel are differentiated at the ages concerned.
Concerning age effects, it appears that at the age of 4;5 to 4;9 the SSE monolingual children were getting closer to adult-like production patterns for all the vowels concerned. At no point were there significant differences between this age group and the SSE adults, while some of the younger age groups did differ significantly from the adults.
5.3.3 Bilingual acquisition
5.3.3.1 Subject AN
5.3.3.1.1 SSE /i/
This section addresses the question of whether the bilingual subject AN (who received nearly equal input in SSE and MSR) acquired the SVLR pattern for the vowel /i/ in a similar way to the SSE monolingual children.
The median values for the duration (ms) of /i/ for each monolingual SSE child and for AN were entered in a mixed-design ANOVA with “BILINGUALITY” (yes, no) and “AGE” (3;4 to 3;11, 4;0 to 4;4, 4;5 to 4;9) as between-subject factors and “FOLLOWING CONSONANT” as a within-subject factor. The factor “FOLLOWING CONSONANT” had three levels: voiced fricative, voiced stop and voiceless stop.
The result showed that there was a highly significant main effect of the following
consonant on the duration of the vowel /i/ [F(2,14)=47.019; p<.01] irrespective of age and
bilinguality. The median results for the duration of /i/ and number of tokens for subject
AN for each age sample are presented in Table 5-10. The corresponding values for the
SSE monolingual children (per age) are found in Table 5-4. The direction of the main
effect of the following consonant on the duration of the vowel /i/ was parallel in all age
groups, and it is shown in Figure 5-10.
There were no significant main effects of the factors “AGE” or “BILINGUALITY”,
and no significant interactions.
This result means that AN acquired an SVLR pattern for /i/ in a way similar to her SSE monolingual peers. It also means that AN acquired the majority SSE model of SVLR rather than a non-SSE English (SSBE-like) one, despite their co-occurrence in the community of Edinburgh.
[Figure 5-10 plot: three panels, left to right: SSE child 3;4 to 3;11 vs AN 3;8; SSE child 4;0 to 4;4 vs AN 4;2; SSE child 4;5 to 4;9 vs AN 4;5; y-axis median duration (ms), 0 to 400; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-10 Median duration of the vowel /i/ (ms) as a function of the following consonant for subject AN compared to age-matched SSE monolingual children in three age samples.¹⁴
Table 5-10 Number of tokens and duration of the vowel /i/ (ms) as a function of the following consonant for subject AN in three age samples

Speaker   Following consonant  Median duration (ms)  n of tokens
AN_3;8    voiced fricative     228                   118
          voiced stop          184                   33
          voiceless stop       164                   66
          Total                196                   217
AN_4;2    voiced fricative     271                   20
          voiced stop          168                   12
          voiceless stop       97                    25
          Total                151                   57
AN_4;5    voiced fricative     267                   50
          voiced stop          167                   28
          voiceless stop       128                   49
          Total                181                   127
¹⁴ SSE children’s group values are means of individual children’s median values in this Figure and in all subsequent Figures comparing the SSE of bilingual and monolingual children.
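The two-step aggregation described in note 14 takes the median within each child and then the mean across children. It can be sketched as follows. The child labels and token durations below are invented for illustration only and are not data from this study.

```python
from statistics import mean, median

# Hypothetical token durations (ms) of /i/ before voiced fricatives, per child
tokens_by_child = {
    "child_A": [250, 270, 310, 290],
    "child_B": [200, 220, 240],
    "child_C": [330, 300, 360, 310, 280],
}


def group_value(tokens):
    """Mean of the individual children's median durations."""
    child_medians = [median(durations) for durations in tokens.values()]
    return mean(child_medians)


print(group_value(tokens_by_child))
```

Each child contributes exactly one median. A child with many tokens therefore does not outweigh a child with few, and a single unusually long token has little effect on that child's value.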
Although AN’s production of SVLR for this vowel is not significantly different from that of her peers, it is worth noting that at the youngest age of 3;8 AN’s VLS/VF ratio is .71. This compares with an average of .37 for the SSE children and the highest individual ratio of .62 (C5 aged 4;0). Recall that the VLS/VF ratio of the monolingual children was close to that of the SSE adults, while AN’s ratio at the age of 3;8 is much higher than in the monolingual sample. We shall return to this pattern later in the results.
5.3.3.1.2 SSE /ɪ/
The main question addressed in this section is whether AN acquired the short postvocalic conditioning of the duration of /ɪ/ in a way similar to the SSE monolingual children. The ANOVA had the same design as in Section 5.3.3.1.1, except that the within-subject factor “FOLLOWING CONSONANT” had different levels: voiced fricative, voiced stop and voiceless fricative.
The results showed a significant main effect of the following consonant on the duration of the vowel /ɪ/ [F(2,14)=5.540; p<.05] irrespective of the other factors. The median durations of /ɪ/ and the numbers of tokens for AN are presented in Table 5-11 for each age. The corresponding values for the SSE monolingual children (per age) are found in Table 5-6. The direction of the main effect of the following consonant on the duration of /ɪ/ was similar in all age groups despite AN’s bilinguality (see Figure 5-11). The result showed that AN’s production of postvocalic conditioning for /ɪ/ was similar to that of the monolingual children. This durational pattern for /ɪ/ was different from AN’s SVLR pattern for the vowel /i/ (compare Figure 5-10 and Figure 5-11).
There was no significant main effect of the factors “AGE” or “BILINGUALITY” and no significant interactions. It is interesting to note that, despite the non-significance of the factor “AGE”, AN’s patterns of postvocalic conditioning in Figure 5-11 are somewhat different from the averaged results of the monolingual children at the ages of 3;8 and 4;2. As in the monolingual children, this pattern becomes more SSE-child-like, and thus also more adult-like, at the age of 4;5.
[Figure 5-11 plot: three panels, left to right: AN 3;8 vs SSE child 3;4 to 3;11; AN 4;2 vs SSE child 4;0 to 4;4; AN 4;5 vs SSE child 4;5 to 4;9; y-axis median duration (ms), 0 to 300; x-axis following consonant (voiced fricative, voiced stop, voiceless fricative)]
Figure 5-11 Median duration of the vowel /ɪ/ (ms) as a function of the following consonant produced by the subject AN in comparison to the SSE monolingual peers in three age samples (plotted from left to right).
Table 5-11 Number of tokens and duration of the vowel /ɪ/ as a function of the following consonant produced by the subject AN in three age samples.

AN’s Age  Following consonant  Median duration (ms)  n of tokens
3;8       voiced fricative     180                   29
          voiced stop          241                   37
          voiceless fricative  194                   40
          Total                203                   106
4;2       voiced fricative     121                   9
          voiced stop          182                   12
          voiceless fricative  95                    19
          Total                122                   40
4;5       voiced fricative     188                   33
          voiced stop          144                   27
          voiceless fricative  133                   91
          Total                166                   151
However, we should not forget that the individual results for the monolingual children were also somewhat less consistent for this vowel. In fact, AN’s pattern at the ages of 3;8 and 4;2 is very similar to the pattern of C5 at the age of 4;0 (see Figure 5-7). This might explain why the differences between AN’s production and the averaged results for the monolingual children in Figure 5-11 are not significant.
The result shows that by the age of 3;8 AN produced the short vowel /ɪ/ in a way similar to the SSE monolingual children. It also means that she differentiated between the postvocalic conditioning of the vowels /ɪ/ and /i/.
5.3.3.1.3 SSE /ʉ/
This section investigates whether the bilingual subject AN acquired the SVLR pattern for the vowel /ʉ/ in a similar way to the SSE monolingual children. The design of the ANOVA was the same as for AN’s vowel /i/ in Section 5.3.3.1.1.
The test showed a significant main effect of the following consonant on the duration of the vowel /ʉ/ [F(2,14)=9.03; p<.05] irrespective of age and bilinguality. The descriptive statistics for the subject AN for each age are presented in Table 5-12. The corresponding values for the SSE monolingual children (per age) are found in Table 5-8. The direction of the main effect of the following consonant on the duration of /ʉ/ is parallel in all age groups (see Figure 5-12).
There was no significant main effect of “AGE” or “BILINGUALITY”, and there were no significant interactions. This result means that AN acquired SVLR for /ʉ/ similarly to her SSE peers. AN’s result for /ʉ/ is consistent with her own results for /i/.
The longitudinal results for // revealed a statistically insignificant trend which was
nonetheless comparable to /i/. AN’s realisation of SVLR for // had a rather small
VLS/VF ratio of .72 at the age of 3;8. The ratio is substantially greater than the largest
VLS/VF ratio of .41 among the SSE monolingual peers (C3 aged 3;4). Similarly to /i/,
this smaller extent of SVLR for // mat the youngest age might indicate language
interaction from AN’s Russian vowel duration system.
[Figure 5-12 plot: three panels, left to right: SSE child 3;4 to 3;11 vs AN 3;8; SSE child 4;0 to 4;4 vs AN 4;2; SSE child 4;5 to 4;9 vs AN 4;5; y-axis median duration (ms), 0 to 450; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-12 Median duration of the vowel /ʉ/ (ms) as a function of the following consonant for subject AN compared to age-matched SSE monolingual children in three age samples.
Table 5-12 Number of tokens and median duration of the vowel /ʉ/ as a function of the following consonant for subject AN in three age samples

AN’s Age  Following consonant  Median duration (ms)  n of tokens
3;8       voiced fricative     234                   38
          voiced stop          215                   37
          voiceless stop       168                   47
          Total                205                   122
4;2       voiced fricative     183                   9
          voiced stop          154                   5
          voiceless stop       112                   13
          Total                146                   27
4;5       voiced fricative     325                   27
          voiced stop          204                   31
          voiceless stop       117                   51
          Total                159                   109
5.3.3.1.4 MSR/SSE differentiation for /i/
In Section 5.3.1.1 we showed a substantial crosslinguistic difference in the
postvocalic conditioning of vowel duration for the MSR and SSE adults. The difference
was most obvious in the context before voiced fricatives (short in MSR and long in SSE),
while in the other two consonantal contexts /i/ remained relatively short (see Figure 5-1).
If AN differentiates between her two languages for this variable we would expect to see a
substantial crosslinguistic difference in the vowel duration before voiced fricatives.
To establish this crosslinguistic difference in AN’s speech and any age effects, we entered all of the subject’s renditions of the carrier words with target /i/ in a multivariate ANOVA. We applied the exclusion criteria specified in Section 5.2 to the individual renditions. The ANOVA had mean vowel duration as the dependent variable and three fixed factors: “FOLLOWING CONSONANT” (voiced fricative, voiced stop and voiceless stop), “LANGUAGE” (SSE and MSR) and “AGE” (3;8, 4;2 and 4;5).
The results of the ANOVA showed a highly significant main effect of the factor “FOLLOWING CONSONANT” on the duration of the vowel /i/ [F(2,602)=17.059; p<.01]. The direction of this effect per age and language is shown in Figure 5-13. The descriptive statistics for each consonantal context, language and age are reported in Appendix H. This result paralleled the main effect found for the MSR and SSE adult models in Section 5.3.1.1.
We ran Tukey HSD post hoc tests to determine which of the three consonantal
contexts contributed to the effect of the “FOLLOWING CONSONANT”. The results
revealed a significant difference (p<.05) between the duration of /i/ before voiced
fricatives compared to voiced and voiceless stops. Thus, this result replicated the adult
results: i.e. for both languages there was a parallel direction of postvocalic conditioning
before voiced fricatives compared to the other contexts.
With regard to language differentiation, there was no significant main effect of language or age. However, there was a highly significant interaction between the factors “FOLLOWING CONSONANT” and “LANGUAGE” [F(2,602)=19.165; p<.01]. This interaction suggests that AN implemented the duration of /i/ differently in her two languages, depending on the following consonant. Such an interaction can be expected, given that the adult SSE and MSR models in Figure 5-1 showed a differential implementation of duration before voiced fricatives.
In addition, the ANOVA showed a highly significant three-way interaction between “FOLLOWING CONSONANT”, “LANGUAGE” and “AGE” [F(4,602)=5.231; p<.01]. This interaction can be seen in Figure 5-13. In SSE, AN showed a fairly consistent SVLR pattern irrespective of her age, while AN’s MSR pattern for /i/ differs between the three age samples. In MSR, AN increased the duration of /i/ depending on the following consonant. Nevertheless, the increase between the contexts of voiced and voiceless stops in MSR is inconsistent across the three age samples. There were no other significant main effects or interactions.
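The degrees of freedom reported for this analysis follow from the fully crossed 3 × 2 × 3 design. In the sketch below, each effect's df is the product of (levels − 1) over the factors involved. The number of renditions is inferred from the error df, on the assumption that every cell of FOLLOWING CONSONANT × LANGUAGE × AGE contained at least one rendition, so that error df = N − number of cells. That token count is an inference, not a figure reported in the text, and the function names are illustrative.

```python
from itertools import combinations
from math import prod

# Number of levels of each fixed factor in AN's /i/ ANOVA
LEVELS = {"FOLLOWING CONSONANT": 3, "LANGUAGE": 2, "AGE": 3}


def effect_dfs(levels):
    """df of every main effect and interaction: product of (levels - 1)."""
    names = list(levels)
    dfs = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            dfs[" x ".join(combo)] = prod(levels[name] - 1 for name in combo)
    return dfs


def implied_renditions(levels, error_df):
    """Assuming no empty cells: N = error df + number of cells."""
    return error_df + prod(levels.values())


dfs = effect_dfs(LEVELS)
print(dfs["FOLLOWING CONSONANT"],
      dfs["FOLLOWING CONSONANT x LANGUAGE"],
      dfs["FOLLOWING CONSONANT x LANGUAGE x AGE"])
print(implied_renditions(LEVELS, 602))
```

The effect dfs (2, 2 and 4) agree with the F statistics above, and an error df of 602 implies 620 renditions. The same design with the error df of 516 reported for the close rounded vowels would imply 534 renditions.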
[Figure 5-13 plot: three panels, left to right: AN 3;8, AN 4;2 and AN 4;5, each comparing SSE and MSR; y-axis duration (ms), 0 to 400; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-13 Mean duration of the vowel /i/ (ms) as a function of the following consonant produced by the subject AN in MSR and SSE in three longitudinal age samples (from left to right).
The difference between AN’s two languages is not obvious at the age of 3;8, when the VLS/VF ratios are almost equal (.72 in SSE and .73 in MSR). If we also consider that her VLS/VF ratio in SSE exceeds the maximal monolingual child ratio of .62, the possibility of language interaction from MSR becomes clearer. At no age is AN’s MSR pattern consistent with the SVLR pattern in SSE.
Generally, across age samples, the crosslinguistic difference for AN’s /i/ seems substantial. It is consistent with a systematic pattern of vowel duration conditioning in SSE and a less systematic one in MSR. Recall that in Section 5.3.1.4 we concluded that the adult pattern of postvocalic conditioning in Russian was small in extent and varied across individual speakers, which points to its non-obligatory nature.
Let us consider a possible explanation for the lesser systematicity of AN’s consonantal conditioning of duration in MSR. Figure 5-14 compares AN’s longitudinal results with the pattern of her Russian-speaking mother and that of the Russian-speaking experimenter (in child directed speech).
[Figure 5-14 plot: three panels, left to right: AN 3;8 MSR, AN 4;2 MSR and AN 4;5 MSR, each with MSR mother and R3 CDS; y-axis duration (ms), 0 to 400; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-14 A comparison of AN’s longitudinal results for the mean duration of /i/ (ms) to that of her mother speaking Russian and of the principal investigator (subject R3) in child directed speech.
The figure shows that at the age of 4;2 AN’s contextual increase in the duration of /i/ parallels that of her mother, but not at the other ages. However, the data elicitation differed between AN and her mother. AN’s mother read out utterances from the computer screen, while AN was recorded in a more spontaneous elicitation situation involving structured games. The principal investigator’s speech during the structured games (“R3 CDS” in Figure 5-14, where “CDS” stands for child directed speech) might therefore be more representative of Russian duration patterns in this elicitation situation. A comparison of the patterns of “R3 CDS” and AN’s mother shows overall longer absolute durations for “R3 CDS” than for AN’s mother. Thus, AN’s overall longer duration of /i/ in Russian compared with her mother’s might be due to the differences in the data elicitation procedure.
To conclude, the results of AN’s realisation of postvocalic conditioning for
/i/ suggest that she differentiated between her two languages from the age of 4;2.
5.3.3.1.5 MSR/SSE differentiation for /u/ and /ʉ/
To establish language differentiation and age effects in AN’s production of postvocalic conditioning of SSE /ʉ/ and MSR /u/, we entered all of AN’s individual renditions of the carrier words with the targets /ʉ/ and /u/ in a multivariate ANOVA. The ANOVA had the same design as for /i/ in Section 5.3.3.1.4.
The results showed a highly significant main effect of the factor “FOLLOWING CONSONANT” on the duration of /ʉ/ and /u/ [F(2,516)=57.960; p<.01]. The direction of this effect per age and language is shown in Figure 5-15. The descriptive statistics for the close rounded vowels per consonantal context, language and age are reported in Appendix I. This result paralleled the main effect found for the MSR and SSE adult models in Section 5.3.1.1.
We ran Tukey HSD post hoc tests to determine which of the three consonantal contexts contributed to the main effect of “FOLLOWING CONSONANT”. The results revealed that the main effect was due to a significant difference (p<.05) between the duration of /ʉ/ and /u/ before voiced fricatives and their duration before either voiced or voiceless stops. Thus, as for /i/, AN’s results paralleled those of the SSE and MSR adults, in that both languages showed some degree of postvocalic conditioning before voiced fricatives compared with the other contexts.
Unlike for /i/, there was also a highly significant main effect of the factor “AGE”
[F(2,516)=4.785, p<.01]. Tukey HSD post hoc tests for the age effects showed that there
was a significant difference (p<.05) between the ages of 3;8 and 4;5. The effect can be
seen in Figure 5-15. The crosslinguistic patterns of the postvocalic conditioning appear to
be the opposite between the age of 3;8 and 4;5.
The difference between AN’s longitudinal results and her mother’s pattern is shown in Figure 5-16. As with /i/, AN’s overall higher duration values compared with her mother’s, irrespective of context, can be attributed to the difference in the data elicitation procedure.
[Figure 5-15 plot: three panels, left to right: AN 3;8, AN 4;2 and AN 4;5, each comparing SSE and MSR; y-axis duration (ms), 0 to 450; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-15 Mean duration of the close rounded vowels (ms) as a function of the following consonant for the subject AN in MSR and SSE.
[Figure 5-16 plot: three panels, left to right: AN 3;8 MSR, AN 4;2 MSR and AN 4;5 MSR, each with MSR mother and R3_CDS; y-axis duration (ms), 0 to 450; x-axis following consonant (voiced fricative, voiced stop, voiceless stop)]
Figure 5-16 A comparison of AN’s longitudinal results for the mean duration of MSR /u/ (ms) with those of her mother speaking Russian, and of the principal investigator (subject R3) in Russian child directed speech.
The MSR pattern at AN’s younger age of 3;8 is dissimilar to those of both adult speakers and shows some potential influence from SVLR. At the age of 4;5, AN shows a pattern of postvocalic conditioning of /u/ similar to those of both her mother and her interlocutor “R3 CDS”. Thus, her production of the duration of the close rounded vowels becomes more adult-like at the age of 4;5. Altogether, towards the age of 4;5 AN produced a differentiated pattern for MSR (and SSE) that is close to both adult models.
5.3.3.1.6 Summary of AN’s results
The results of acquisition of postvocalic conditioning of vowel duration by the
bilingual subject AN suggest that overall she differentiated between her two languages in
a way similar to the monolingual speakers.
First of all, AN acquired the postvocalic conditioning of vowel duration for the vowels /i/ and /ʉ/ in a way similar to the SVLR pattern of her SSE monolingual peers. She produced a consistent SVLR pattern for both vowels in all age samples gathered. Neither “bilinguality” nor “age” was significant in any of the comparisons with the SSE peers.
However, despite the non-significance of the main effects of age and bilinguality for both vowels, there was a clear longitudinal trend that is unlikely to be a coincidence. At the age of 3;8, AN produced consonant-dependent vowel durations with VLS/VF ratios exceeding the maximal monolingual ratios. This means that her average increase in duration as a function of the following consonant was smaller than that of the SSE monolingual children at the age of 3;8. As we discussed in Section 2.3.2, the Markedness Hypothesis (Müller, 1998) has been invoked to explain the smaller ratios for intrinsic vowel duration conditioning in the Spanish-German bilingual children reported in Kehoe (2002). Kehoe’s study showed that the bilingual children aged 2;3 to 2;6 produced a much smaller durational difference between short and long vowels than the German monolingual children. AN had a similar pattern. Such a difference in the implementation of SVLR for /i/ and /ʉ/ between AN and her SSE-speaking peers could thus be an effect of language interaction from the unmarked (less ambiguous) Russian system of postvocalic conditioning of vowel duration.
At the age of 4;5, AN produced consonant-dependent duration values for both SVLR vowels in a way very similar to her SSE monolingual peers. We observed a significant developmental trend in the data of the SSE monolingual children, whereby the oldest group aged 4;5 to 4;9 showed no significant differences from the SSE adult data. Given this trend, we can conclude that AN’s SVLR patterns for both vowels also became more like the SSE adult model.
Concerning language differentiation for the two SVLR vowels, generally AN did
differentiate between her two languages. The crosslinguistic differences for both vowel
sets were substantial and consistent. AN produced a systematic pattern of vowel duration
conditioning in SSE in the three age samples and a less systematic one in MSR. It is, thus,
possible that more variable longitudinal patterns produced by AN in MSR were
compatible with a non-obligatory system of postvocalic conditioning in Russian (even
though there were significant trends in the adult data in Section 5.3.1.4).
AN’s MSR data were quite similar to the patterns observed for her mother. Both tended to produce somewhat longer vowels before voiced fricatives than before voiceless stops. However, this was not true at the age of 3;8. Comparing AN’s crosslinguistic patterns for the vowels /ʉ/ and /u/ at the age of 3;8 with those at 4;5, we saw a reversal of the language patterns. This reversal contributed to the significance of the three-way interaction between the factors “FOLLOWING CONSONANT”, “AGE” and “LANGUAGE” in the ANOVA. The direction of the significant interaction was surprising: at the age of 3;8 AN appeared to produce a more SVLR-like pattern in her Russian than in her SSE, and the pattern reversed at the age of 4;5 towards an adult-like model for both languages.
This pattern suggests a bi-directional influence between the MSR and SSE systems of postvocalic conditioning in AN’s speech production at the age of 3;8. However, it is a puzzling finding, because for a native speaker of Russian a transfer of an SVLR-like system into Russian would serve no function. An increased duration of a stressed vowel in Russian would primarily be perceived as a prominence-related event, even though the event could turn out to be pragmatically odd. We used a sufficient number of repetitions in this study to derive the means (see Appendix I), so AN’s reversed crosslinguistic patterns at the ages of 3;8 and 4;5 should be representative of the subject’s speech production in this elicitation mode.
This bi-directional language interaction is not compatible with the directions predicted by the CCCH (Döpke, 1998; Döpke, 2000), the Markedness Hypothesis (Müller, 1998) or the language dominance hypothesis (Petersen, 1988), since all of them predict unidirectional influence. We address these issues further in the discussion chapter.
No language differentiation could be assessed for the postvocalic conditioning of the
lax vowel, since it is not featured in Russian. The result of the comparison of AN’s
production in SSE showed that AN acquired a relatively short conditioning of the lax
vowel // in a way similar to the SSE monolingual children.
This result also means that AN produced two different patterns of postvocalic conditioning of vowel duration in SSE: (1) an SVLR pattern for the vowels /i/ and /ʉ/, and (2) a relatively short conditioning of the lax vowel /ɪ/ irrespective of the following consonant. AN’s acquisition of SVLR in SSE means that she acquired the Scottish variety of English in preference to the SSBE-like varieties co-occurring in Edinburgh.
5.3.3.2 Subject BS
5.3.3.2.1 SSE /i/
We investigate whether the bilingual subject BS acquired the postvocalic
conditioning of the duration of /i/ in a way similar to the SSE monolingual peers, and
whether the pattern had any significant age effect.
The set-up of the ANOVA was the same as for subject AN, described in Section 5.3.3.1.1. However, the between-subject factor “AGE” had three levels (3;4 to 3;8, 3;9 to 4;1 and 4;2 to 4;9). The age levels of the SSE monolingual children matched BS’ ages of 3;4, 3;10 and 4;5.
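Assigning a child's recording age to one of these age levels requires comparing ages written in years;months notation. The minimal sketch below converts each age to total months before testing the inclusive band limits given above. The function names are hypothetical, introduced only for illustration.

```python
# Age levels used for the comparison with BS (inclusive limits), as given in the text
AGE_BANDS = [
    ("3;4 to 3;8", "3;4", "3;8"),
    ("3;9 to 4;1", "3;9", "4;1"),
    ("4;2 to 4;9", "4;2", "4;9"),
]


def to_months(age):
    """Convert an age in years;months notation (e.g. '3;10') to total months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def age_band(age):
    """Return the label of the band containing `age`, or None if it falls outside all bands."""
    m = to_months(age)
    for label, start, end in AGE_BANDS:
        if to_months(start) <= m <= to_months(end):
            return label
    return None


# Ages mentioned in this section for the monolingual children and for BS
for age in ["3;4", "3;8", "3;10", "3;11", "4;0", "4;1", "4;2", "4;5", "4;8", "4;9"]:
    print(age, "->", age_band(age))
```

The conversion to months matters because comparing the ages as strings would place "3;10" before "3;4".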
The results showed that there was a highly significant main effect of the following
consonant on the duration of the vowel /i/ [F(2,14)=27.812; p<.01]. We observed the
same main effect in the comparison of the adults (Figure 5-1) in Section 5.3.1.1. The
descriptive statistics of this test for BS are presented in Table 5-13. The corresponding
values for each of the SSE monolingual children are found in Appendix E.
The test also showed a highly significant interaction between “FOLLOWING CONSONANT” and “BILINGUALITY” [F(2,14)=10.361; p<.01]. There were no other significant effects or interactions. The direction of the main effect of the following consonant on the duration of /i/ and its interaction with BS’ bilinguality are shown in Figure 5-17. The figure shows that, unlike her SSE peers, BS did not produce a sufficiently long duration of /i/ before voiced fricatives. Recall that this is exactly the context and direction in which the crosslinguistic difference between MSR and SSE manifested itself in the speech of the adult controls (Figure 5-1; there we found a significant interaction between “LANGUAGE” and “FOLLOWING CONSONANT”). This result means that BS had not acquired SVLR for /i/ in the same way as her SSE monolingual peers, and that this difference was due to her bilinguality. It seems that BS followed the MSR model of postvocalic conditioning of vowel duration for this vowel, rather than the SSE one. BS’ VLS/VF ratios were .94 (age 3;4), .91 (age 3;10) and .73 (age 4;5). In all age samples the ratios exceeded the maximal ratio of .65 produced by C5 at the age of 4;0.
[Figure: three panels, one per age sample (SSE child 3;4 to 3;8 and BS 3;4; SSE child 3;9 to 4;1 and BS 3;10; SSE child 4;2 to 4;9 and BS 4;5). Y-axis: duration (ms), 0 to 400. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-17 Median duration of the vowel /i/ (ms) as a function of the following consonant produced by the
subject BS compared to the SSE monolingual peers in three age samples.
Table 5-13 Median duration and number of tokens of the vowel /i/ as a function of the following consonant
produced by the subject BS in three age samples.
BS’ age   Following consonant   Median duration (ms)   n of tokens
3;4       voiced fricative      224                    69
3;4       voiced stop           197                    28
3;4       voiceless stop        211                    61
3;4       Total                 211                    158
3;10      voiced fricative      211                    47
3;10      voiced stop           156                    22
3;10      voiceless stop        193                    43
3;10      Total                 182                    112
4;5       voiced fricative      249                    68
4;5       voiced stop           175                    22
4;5       voiceless stop        182                    64
4;5       Total                 206                    154
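The VLS/VF ratios quoted for /i/ can be reproduced from Table 5-13, assuming (as the reported numbers imply) that VLS/VF is the median vowel duration before voiceless stops divided by the median duration before voiced fricatives. The Python sketch below does only this arithmetic and checks each ratio against the SSE peer maximum of .65; the dictionary and function names are illustrative and not part of the study's analysis.

```python
# Median /i/ durations (ms) produced by BS, copied from Table 5-13.
MEDIANS_I = {
    "3;4": {"voiced fricative": 224, "voiced stop": 197, "voiceless stop": 211},
    "3;10": {"voiced fricative": 211, "voiced stop": 156, "voiceless stop": 193},
    "4;5": {"voiced fricative": 249, "voiced stop": 175, "voiceless stop": 182},
}

# Maximal VLS/VF ratio among the SSE monolingual peers (C5 at 4;0), from the text.
SSE_PEER_MAX = 0.65


def vls_vf_ratio(medians: dict) -> float:
    """Median duration before voiceless stops / median before voiced fricatives.

    Values near 1 indicate little lengthening before voiced fricatives;
    smaller values indicate a stronger SVLR-type contrast.
    """
    return medians["voiceless stop"] / medians["voiced fricative"]


ratios = {age: round(vls_vf_ratio(m), 2) for age, m in MEDIANS_I.items()}
for age, ratio in ratios.items():
    print(f"BS {age}: VLS/VF = {ratio:.2f}, above SSE peer maximum: {ratio > SSE_PEER_MAX}")
```

The computed values (.94, .91 and .73) match those given in the text, which supports the assumed definition of the ratio.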
Despite the non-significance of the factor “AGE” and the lack of interactions with
this factor, Figure 5-17 shows that at the age of 4;5 BS produced a more SVLR-like
pattern than at the younger ages. Her VLS/VF ratio of .73 at the age of 4;5 is
comparable to the ratio of .72 produced by AN at the age of 3;8.
It is possible that BS’ MSR system influenced her SSE production.
Such an influence would be in line with both the unmarkedness of the Russian postvocalic
conditioning system and BS’ language exposure pattern. Moreover, the stronger influence of
the Russian model in BS’ case compared to AN’s suggests that the amount of
interaction between a bilingual child’s two languages can be affected by individual
language exposure patterns.
5.3.3.2.2 SSE /ɪ/
We investigate BS’ acquisition of postvocalic conditioning of duration of the SSE
vowel /ɪ/ in comparison to the SSE monolingual peers and whether there is any significant
age effect. The ANOVA had the same design as in Section 5.3.3.1.1, except that the
within-subject factor “FOLLOWING CONSONANT” had different levels: i.e. voiced
fricative, voiced stop and voiceless fricative. The factor “AGE” had three levels: 3;4 to
3;8, 3;9 to 4;1 and 4;2 to 4;9.
The results showed no significant main effects. However, there was a highly
significant interaction [F(2,14)=12.112, p<.01] between the factors “FOLLOWING
CONSONANT” and “BILINGUALITY”. The descriptive statistics for BS are presented
in Table 5-14. The corresponding values for each of the SSE monolingual children are
found in Appendix G. The direction of the differences between BS and the SSE
monolingual children per age is shown in Figure 5-18. The figure shows that BS produced
an opposite postvocalic conditioning pattern compared to the SSE peers: i.e. she realised
longer duration before voiceless fricatives than before voiced fricatives irrespective of her
age.
This interaction shows that similarly to the results for /i/, BS’ production of
postvocalic conditioning of the duration of /ɪ/ is different from the SSE monolingual peer
group.
[Figure: three panels, one per age sample (SSE child 3;4 to 3;8 and BS 3;4; SSE child 3;9 to 4;1 and BS 3;10; SSE child 4;2 to 4;9 and BS 4;5). Y-axis: duration (ms), 0 to 400. X-axis: following consonant (voiced fricative, voiced stop, voiceless fricative).]
Figure 5-18 Median duration of the target vowel /ɪ/ (ms) as a function of the following consonant produced
by the subject BS compared to the SSE monolingual peers in three age samples.
[Figure: three panels, one per age sample (SSE child 3;4 to 3;8 and BS 3;4; SSE child 3;9 to 4;1 and BS 3;10; SSE child 4;2 to 4;9 and BS 4;5). Y-axis: duration (ms), 0 to 400. X-axis: following consonant (voiced fricative, voiced stop, voiceless fricative).]
Figure 5-19 Median duration of all phonetic realisations of [ɪ] (ms) as a function of the following consonant
produced by the subject BS compared to the SSE monolingual peers in three age samples.
Table 5-14 Number of tokens and median duration of the vowel /ɪ/ as a function of the following consonant
produced by the subject BS in three age samples.
BS’ age   Following consonant   Median duration (ms)   n of tokens
3;4       voiced fricative      199                    14
3;4       voiceless fricative   259                    28
3;4       voiced stop           198                    35
3;4       Total                 207                    77
3;10      voiced fricative      183                    6
3;10      voiceless fricative   229                    37
3;10      voiced stop           223                    31
3;10      Total                 234                    74
4;5       voiced fricative      137                    19
4;5       voiceless fricative   212                    37
4;5       voiced stop           211                    22
4;5       Total                 192                    78
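The reversed direction described above for BS’ /ɪ/ can be checked against Table 5-14 by dividing the median duration before voiceless fricatives by the median before voiced fricatives: a ratio above 1 means the vowel was longer before the voiceless fricative. This ratio is not one reported in the study; the Python sketch below is only an illustration of that arithmetic on the tabulated medians, and the variable names are hypothetical.

```python
# Median /ɪ/ durations (ms) produced by BS, from Table 5-14:
# (before voiced fricatives, before voiceless fricatives).
MEDIANS_LAX = {"3;4": (199, 259), "3;10": (183, 229), "4;5": (137, 212)}

# Ratio > 1: the vowel is longer before the voiceless fricative, which is the
# reverse of the direction reported for the SSE monolingual peers.
ratios = {
    age: round(voiceless / voiced, 2)
    for age, (voiced, voiceless) in MEDIANS_LAX.items()
}
for age, ratio in ratios.items():
    print(f"BS {age}: voiceless/voiced fricative = {ratio:.2f}, reversed: {ratio > 1}")
```

At every age the ratio is above 1, consistent with the statement that BS’ pattern was reversed irrespective of her age.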
At this point it is worth considering the segmental aspect of BS’ acquisition of the
vowel /ɪ/ in relation to vowel duration. Recall that, with regard to vowel quality, across
all age samples BS produced only 35% adult-like [ɪ], as opposed to 99.1% for the SSE
peers. Of her non-adult-like realisations, 98.3% involved production of the vowel [i] for the
SSE target /ɪ/. It is therefore worth considering how BS’ phonetically adult-like [ɪ] vowels
compare to those produced by the SSE monolingual peers.
The duration of /ɪ/ in Figure 5-18 refers to all the adult targets produced by BS,
while Figure 5-19 plots only the ones that were auditorily labelled as [ɪ]. The same
ANOVA based on the median duration of vowels phonetically realised as [ɪ] (rather than
all adult targets) produced a near-significant main effect of the factor “FOLLOWING
CONSONANT” [F(2,14)=3.336, p=.065] and a near-significant interaction between
“FOLLOWING CONSONANT” and “BILINGUALITY” [F(2,14)=3.310, p=.067]. Thus, in this
phonetically motivated set-up, the role of BS’ bilinguality decreased and the main effect of
the following consonant increased compared to the analysis of all adult targets, so that BS’
durations for phonetic [ɪ] were less different from those of the monolingual peers. However,
these effects were only near-significant and concerned only 35% of BS’ attempts to produce
the lax vowel /ɪ/, so that overall BS’ /ɪ/ was mature neither at the segmental nor at
the suprasegmental (durational) level.
Once again a possible explanation for this effect in BS speech is her greater
exposure to Russian than to SSE.
5.3.3.2.3 SSE /ʉ/
To establish how BS’ production of postvocalic conditioning of the duration of the
SSE vowel /ʉ/ compares to that of the SSE monolingual peers, and whether there was any
observable age effect, we entered all median values of duration for the SSE target /ʉ/ in
different consonantal contexts in a mixed-design ANOVA. The test had the same set-up as
in Section 5.3.3.2.1.
The results showed that there was a highly significant main effect of the following
consonant on the duration of the vowel /ʉ/ [F(2,14)=6.714; p<.01]. There were no other
significant effects or interactions. The descriptive statistics for the subject BS are
presented in Table 5-15. The values for each of the SSE monolingual children are found
in Appendix F. The direction of the main effect of the following consonant on the
duration of /ʉ/ is shown in Figure 5-20.
Unlike for /i/, the test showed no significant interaction of the factors
“BILINGUALITY” and “FOLLOWING CONSONANT”. In fact, the results paralleled
AN’s test for this vowel. However, longitudinally there was a substantial difference
between the subjects AN and BS (compare Figure 5-15 and Figure 5-20). AN’s VLS/VF
ratio for /ʉ/ decreased with age towards more SSE-adult-like values. For BS, there was
little longitudinal change in the VLS/VF ratio over the ages considered: the
VLS/VF ratio was .89 at the age of 3;4, decreased to .72 at the age of 3;10, and was
.81 at the age of 4;5. In all three age samples, BS’ VLS/VF ratio was
substantially greater than the monolingual upper boundary of .62. It was more similar to
the adult Russian ratio of .85, and it did not change much over the period considered.
[Figure: three panels, one per age sample (SSE child 3;4 to 3;8 and BS 3;4; SSE child 3;9 to 4;1 and BS 3;10; SSE child 4;2 to 4;9 and BS 4;5). Y-axis: duration (ms), 0 to 400. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-20 Median duration of the vowel /ʉ/ (ms) as a function of the following consonant for subject BS
compared to age matched SSE monolingual children for three longitudinal moments.
Table 5-15 Number of tokens and median duration of the vowel /ʉ/ as a function of the following consonant
for subject BS for three longitudinal moments.
BS’ age   Following consonant   Median duration (ms)   n of tokens
3;4       voiced fricative      227                    41
3;4       voiced stop           196                    18
3;4       voiceless stop        203                    13
3;4       Total                 226                    72
3;10      voiced fricative      255                    20
3;10      voiced stop           229                    18
3;10      voiceless stop        185                    33
3;10      Total                 224                    71
4;5       voiced fricative      271                    31
4;5       voiced stop           219                    39
4;5       voiceless stop        219                    21
4;5       Total                 241                    91
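The VLS/VF ratios discussed above for /ʉ/ can be recomputed from Table 5-15, taking VLS/VF as the median duration before voiceless stops divided by the median before voiced fricatives (this reproduces the reported .89 and .81). From the rounded table values the 3;10 ratio comes out as .73 rather than the .72 given in the text, presumably because the tabulated medians are themselves rounded; that explanation is an assumption. The names in the sketch are hypothetical.

```python
# Median /ʉ/ durations (ms) produced by BS, from Table 5-15:
# (before voiced fricatives, before voiceless stops).
MEDIANS_U = {"3;4": (227, 203), "3;10": (255, 185), "4;5": (271, 219)}

SSE_UPPER = 0.62  # monolingual upper boundary quoted in the text
MSR_ADULT = 0.85  # adult Russian VLS/VF ratio quoted in the text

# VLS/VF from the rounded medians; values near 1 mean a weak SVLR-type contrast.
ratios = {age: round(vls / vf, 2) for age, (vf, vls) in MEDIANS_U.items()}
for age, ratio in ratios.items():
    print(
        f"BS {age}: VLS/VF = {ratio:.2f} "
        f"(SSE upper boundary {SSE_UPPER}, MSR adult {MSR_ADULT}); "
        f"above SSE boundary: {ratio > SSE_UPPER}"
    )
```

Whichever rounding is used, every ratio lies above the monolingual upper boundary of .62, as the text states.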
This comparison suggests that even though the ANOVA showed a statistically
significant main effect of the following consonant on the duration of the vowel /ʉ/, and no
effect of bilinguality of BS, the subject was still different from the monolingual children
in the reduced extent of the consonantal conditioning throughout the period considered.
The non-significance of the factor “BILINGUALITY” was most probably due to the
parallel direction of the main effect of the “FOLLOWING CONSONANT” in BS’ case
combined with the relatively low number of the subjects in the test.
Altogether we can conclude that BS’ pattern of postvocalic conditioning for /ʉ/
looked more SVLR-like than the pattern observed for /i/. However, BS’ VLS/VF ratios
underwent little longitudinal change and lay outside the monolingual range throughout
the study. BS’ pattern of postvocalic conditioning of /ʉ/ had a smaller extent (VLS/VF
ratios closer to 1) than that of the SSE monolingual children. A similar
“reduced” SVLR pattern was also observed for AN at the age of 3;8. Both AN’s and BS’
patterns were consistent with the results for German-Spanish bilinguals (Kehoe, 2002)
discussed in Section 2.3.2 in connection with the relative markedness of the two languages
in contact in a bilingual child. However, the variable extent of the difference in vowel
duration conditioning between the two bilingual subjects and the SSE monolingual
children suggests that factors other than relative language structure might also be at
work.
5.3.3.2.4 MSR/SSE differentiation for /i/
To establish whether BS differentiated between her MSR and SSE production of
postvocalic conditioning of duration for the vowel /i/ and whether there was an age effect,
we entered all BS’ individual renditions of the carrier words with target /i/ (after applying
exclusion criteria specified in Section 5.2) in a multivariate ANOVA. The ANOVA had
vowel duration as a dependent variable with three fixed factors: i.e. “FOLLOWING
CONSONANT” (voiced fricative, voiced and voiceless stop), “LANGUAGE” (SSE and
MSR) and “AGE” (3;4, 3;10 and 4;5).
The results showed neither a significant main effect of the factor “FOLLOWING
CONSONANT” on the duration of /i/, nor any interactions with this factor. This result
means that overall BS did not differentiate the postvocalic conditioning between her two
languages. The descriptive statistics for BS’ production of vowel duration per consonantal
context, language and age are reported in Appendix J. The direction of the crosslinguistic
differences is shown in Figure 5-21.
There was a highly significant main effect of the factor “AGE” [F(2,610)=6.655;
p<.01]. There were no other significant main effects or interactions. Since there were no
significant interactions of language and age with the following consonant, which
would indicate language-specific implementation of the postvocalic conditioning, this main
effect of “AGE” is not relevant to the question of language differentiation. Moreover,
Tukey HSD post hoc tests showed that durations at the age of 3;10 differed significantly
[p<.05] from those at the ages of 3;4 and 4;5. Thus, the age effect was not linear longitudinally.
In fact, Figure 5-21 shows that at the ages of 3;4 and 3;10 BS produced quite similar
postvocalic conditioning patterns in SSE and MSR. This observation agrees with the
results of the comparison of BS’ speech production to that of the SSE monolingual peers,
for which we found a highly significant interaction between “FOLLOWING CONSONANT”
and “BILINGUALITY”, due to the relatively shorter duration of BS’ /i/ before voiced
fricatives compared to the SSE monolingual peers.
[Figure: three panels (BS 3;4, BS 3;10, BS 4;5), each comparing BS’ SSE and MSR production. Y-axis: duration (ms), 0 to 450. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-21 Longitudinal results for the mean duration of the vowel /i/ (ms) as a function of the following
consonant produced by the subject BS in MSR and SSE
[Figure: three panels (BS 3;4, BS 3;10, BS 4;5), each showing BS SSE, BS MSR, MSR mother and R3 CDS. Y-axis: duration (ms), 0 to 450. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-22 A comparison of BS’ longitudinal results for the mean duration of /i/ in SSE and MSR to those
of her mother speaking Russian, and those of the principal investigator (subject R3) in child directed MSR.
The results indicated that overall BS did not differentiate between the two languages
according to the adult models of postvocalic conditioning. This pattern of language
interaction agrees with both the unmarkedness of the Russian model and BS’ language
exposure pattern. However, the postvocalic conditioning at the age of 4;5 suggests that
BS’ crosslinguistic production of the patterns started to look more language-specific.
The latter point becomes more obvious if we compare BS’ longitudinal results in
both languages to her mother’s MSR pattern, and to that of the principal investigator in
child directed speech. This comparison is shown in Figure 5-22. Similarly to AN’s pattern
in Section 5.3.3.1.4, the overall higher duration values in BS’ production compared to
her mother’s might be attributed to the difference in the data elicitation procedure. At the
age of 4;5, BS’ MSR production of postvocalic conditioning of /i/ showed a similar
pattern to those of her mother and of the principal investigator (“R3 CDS” in Figure
5-22), while her SSE pattern looks more SVLR-like.
5.3.3.2.5 MSR/SSE differentiation for /u/ and /ʉ/
To establish whether BS differentiated between her MSR and SSE production of the
postvocalic conditioning of duration for the vowels /u/ and /ʉ/ and whether there was an
age effect for this crosslinguistic difference, we entered all BS’ individual renditions of
the carrier words with adult targets /u/ and /ʉ/ (after applying exclusion criteria specified
in Section 5.2) in a multivariate ANOVA. The test had the same design as in Section
5.3.3.2.4.
The results showed a highly significant main effect [F(2,491)=5.992; p<.01] of the
“FOLLOWING CONSONANT” on the duration of these vowels. There were no other
significant main effects or interactions. To determine the direction of the consonantal
effect, we ran Tukey HSD post hoc tests. The tests revealed that the main effect was due
to a significant difference [p<.05] between duration of the vowels before voiced fricatives
compared either to the context before voiced stops or to that before voiceless stops. This
result agrees with the direction of the crosslinguistic differences between the SSE and
MSR adult models. The descriptive statistics of the test are reported in Appendix K. The
direction and the extent of the crosslinguistic differences per language and age are shown
in Figure 5-23.
[Figure: three panels (BS 3;4, BS 3;10, BS 4;5), each comparing BS’ SSE and MSR production. Y-axis: duration (ms), 0 to 450. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-23 Mean duration of the close rounded vowels (ms) as a function of the following consonant
produced by the subject BS in MSR and SSE in three age samples.
[Figure: three panels (BS 3;4, BS 3;10, BS 4;5), each showing BS SSE, BS MSR, MSR mother and R3_CDS. Y-axis: duration (ms), 0 to 450. X-axis: following consonant (voiced fricative, voiced stop, voiceless stop).]
Figure 5-24 A comparison of BS’ longitudinal results for the mean duration of /u/ and /ʉ/ (ms) in SSE and
MSR to those of her mother speaking Russian, and those of the principal investigator (subject R3_CDS) in
child directed MSR speech.
Despite the significance of the factor “FOLLOWING CONSONANT”, the test
showed neither a significant main effect of “LANGUAGE” nor significant interactions
between the factors, although such effects could have been expected given the extent of
the differences between SSE and MSR in the contexts before voiced fricatives. The result
shows that, though BS produced the language-specific direction of the postvocalic
conditioning for the close rounded vowels in both languages, she did not produce a
language-specific extent of it sufficient to achieve language differentiation.
However, BS’ crosslinguistic pattern of postvocalic conditioning emerging at the
age of 4;5 suggests that BS’ speech production became more differentiated and more
language-specific. Recall that a similar trend was found for BS’ /i/ at the age of 4;5 in
Section 5.3.3.2.4. In that sense the two sub-tests agree.
The difference between BS’, her mother’s and the interlocutor’s speech production
is shown in Figure 5-24. As for /i/, the overall higher duration values in BS’ production
compared to her mother’s can be attributed to the difference in the data elicitation
procedure. Apart from the absolute differences between BS’ and her mother’s production,
BS’ MSR production showed a similar pattern of postvocalic conditioning of the close
rounded vowels to both her mother and the interlocutor (“R3 CDS”).
5.3.3.2.6 Summary of BS’ results
In this section we investigated bilingual SSE/MSR patterns of postvocalic
conditioning of vowel duration produced by the subject BS, who by the age of 4;5 had
received substantially more input in Russian than in SSE. We addressed the question of
language differentiation and considered the possibility of language interaction.
The results suggest an overall lack of language differentiation for the postvocalic
vowel duration conditioning patterns. The patterns produced by BS seemed to follow a
Russian model of postvocalic conditioning in SSE irrespective of the vowel, while her
MSR production was similar to that of her mother.
First of all, we addressed BS’ acquisition of the postvocalic conditioning of duration
for the vowels /i/ and /ʉ/ requiring application of SVLR in the SSE adult model. We
compared BS’ production for these vowels to that of the SSE monolingual peers. The
results for both vowels showed that BS’ production of the postvocalic conditioning was
very different from that of the SSE monolingual peers. The bilinguality of BS had a highly
significant effect on the production of SVLR in /i/: i.e. BS produced a reduced extent of
the postvocalic conditioning. The subject produced greater VLS/VF ratios (.73 to .94)
compared to the SSE peers (.65 maximally, .47 on average), and her patterns were more
variable.
For the vowel /ʉ/, the interaction between BS’ bilinguality and SVLR did not play a
statistically significant role, possibly because BS’ patterns were more systematic.
However, as for /i/, BS produced a reduced extent of the postvocalic conditioning of /ʉ/
compared to the monolingual children, again mainly because her vowels before
voiced fricatives were not long enough. Irrespective of BS’ age, her VLS/VF ratios were
nearer to 1 than the maximal individual VLS/VF ratios produced by the SSE monolingual
children, and closer to the Russian adult model (VLS/VF of .85 for either vowel).
The results of the crosslinguistic comparison of BS’ production of postvocalic
conditioning between SSE and MSR showed no significant main effect of “LANGUAGE” and
no significant interactions. This means that BS did not differentiate between her languages
in the age samples considered. For both the unrounded and the rounded close vowels there
was no significant crosslinguistic difference in vowel duration before voiced fricatives
compared to voiceless stops, although such a difference was expected given the differences
between the adult models. At the same time, the postvocalic conditioning patterns for both
vowels in MSR showed similarities to both the production of BS’ mother and that of the
principal investigator. Since BS’ SSE production was different from that of the SSE
monolingual peers and similar to the Russian adult model, we can conclude that BS
followed the Russian model in her SSE production.
The comparison of BS’ production of postvocalic conditioning for the vowel /ɪ/ to
the SSE monolingual peers showed a highly significant interaction of the factors
“FOLLOWING CONSONANT” and “BILINGUALITY”. In fact, the patterns were the
opposite of the averaged monolingual child results: i.e. BS produced longer duration of
vowels before voiceless fricatives compared to all other contexts. Consequently, BS’
production of duration for the SSE vowel /ɪ/ was not SSE-specific.
6 Acquisition of Vocal Effort
6.1 Introduction
This chapter presents data on the acquisition of vocal effort by bilingual and
monolingual pre-school children. In Section 2.4.1 we showed that, despite qualitative
physiological differences in respiratory and laryngeal control between adults and children
(Titze, 1994; Mackenzie Beck, 1997), children are able to control their respiratory and
laryngeal mechanisms sufficiently to achieve fine-grained control of phonatory loudness,
similarly to adults (Stathopoulos & Sapienza, 1993; Stathopoulos, 1995; Traunmüller &
Eriksson, 2000).
In this chapter, we address the acquisition patterns (language differentiation and
interaction) for the phonetic variables involving vocal effort. The variables have been
discussed in detail in Sections 2.1.2 and 2.1.4. Vocal effort has been measured
acoustically as “spectral balance”, as outlined in Section 3.6.3.3. We look at three variables:
(1) vocal effort patterns for the SSE/MSR vowel /i/ compared across three
postvocalic consonantal contexts triggering different vowel length in SSE,
all in prominent positions (see Table 3-3 for the list of carrier words). The
short SSE vowel is produced with more vocal effort than the long one to
achieve sufficient prominence.
To achieve language-specific results in vocal effort, the bilingual subjects
should produce higher spectral balance values (less breathy laryngeal
configuration) for the short vowel (before voiced or voiceless stops) than
for the long vowel (before voiced fricatives) in SSE, and have a variable
pattern in MSR.
(2) vocal effort differences between the SSE tense/lax vowels /i/ and /ɪ/ across
all consonantal contexts in prominent positions (see discussion in Section
2.1.2.5). This potential difference between the tense/lax vowels is
hypothesised to be due to differentiated laryngeal configuration adopted for
the vowels (Stevens, 1998). If confirmed, the involvement of the laryngeal
level should be seen as a separate phonetic dimension differentiating the
tense/lax contrast in addition to vowel quality and duration.
To achieve language-specific results in SSE, the bilingual subjects should
produce substantially higher spectral balance values (less breathy laryngeal
configuration involving more vocal effort) for the lax vowel than for the
tense vowel.
(3) vocal effort patterns of the close rounded vowel /ʉ/ in SSE and /u/ in
MSR compared across the three postvocalic consonantal contexts
triggering SVLR in SSE (see Table 3-3 for the carrier words), all in
prominent positions. The short SSE vowel is produced with more vocal
effort than the long one to achieve sufficient prominence. In Russian, the
difference across the three consonantal contexts might not be systematic.
To achieve language-specific results, the bilingual subjects should produce
patterns very similar to those for /i/.
In order to present data on bilingual acquisition we need to create a reference for the
exact patterns of vocal effort for these variables produced by the appropriate monolingual
control groups.
First of all, we perform a crosslinguistic comparison for the three research variables
between the adult speakers of MSR (n=5), SSE (n=5) and SSBE (n=4). The comparisons
are performed with a set-up similar to that in Chapter 5 on vowel duration. The
SSBE adult set serves as an additional control for possible cross-varietal influences in the
child data.
Secondly, we present data on the SSE monolingual acquisition of vocal effort for
each of the variables by the pre-school children (n=7 plus three longitudinal cases).
Finally, we present the bilingual patterns of vocal effort. The structure of the tests is
similar to that in Chapter 5 on vowel duration. Each bilingual child’s SSE speech is first
compared to that of the SSE monolingual peers. Subsequently, we present a comparison
between the MSR and SSE patterns for each of the subjects. Additionally, we
descriptively compare each subject’s Russian pattern to that of her mother and the
investigator for the reasons outlined in Section 5.1.
6.2 Data Analysis
We present statistical analysis for all the normalisation methods of spectral balance
around F2 to allow assessment of their coherence.
The measures A2, A2*a, A2*b and A2*c (explained in Section 3.6.4.3) are all normalised for
differences in overall intensity. The measures represent the RMS-power (dB) of the
steady state of each vowel, measured around the mean F2 of the vowel in a fixed frequency
band of 600 Hz (see Table 3-7 for the definitions). In addition, A2*a is normalised for
formant frequency shifts within each of the targets /i ɪ u ʉ/ across all speakers and
languages. This measure is suitable for comparing vowels similar in vowel quality within
or between languages (as within /i/). A2*b is normalised for formant frequency shifts across
the targets /i ɪ/, and across all speakers and languages. A2*c is normalised for formant
frequency shifts across the targets /ʉ u/, and across all speakers and languages. The
measures A2*b and A2*c are suitable for comparing vowels differing in vowel quality (such
as /i/ versus /ɪ/, or /u/ versus /ʉ/) within or between languages. Interspeaker
normalisation was applied separately for children and adults for all the measures, since the
two groups of speakers differed in vocal tract size.
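To make the idea of a band-limited power measure around F2 concrete, the sketch below computes the power of a vowel frame within a 600 Hz band centred on F2 and expresses it in dB relative to the frame's total power. This is an illustrative stand-in only: expressing band power relative to total power is an assumed, simplified form of intensity normalisation, the Hann-windowed FFT is our choice, and the formant-shift normalisations behind A2*a, A2*b and A2*c (Table 3-7) are not modelled. The function `band_power_db` is hypothetical.

```python
import numpy as np


def band_power_db(frame, fs, f2, bandwidth=600.0):
    """Power of `frame` in a band of `bandwidth` Hz centred on f2, in dB re total power.

    Hypothetical helper: Hann-windowed FFT, power summed over the bins in
    [f2 - bandwidth/2, f2 + bandwidth/2]. Dividing by the total power removes
    overall level, so scaling the whole frame leaves the result unchanged.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    in_band = np.abs(freqs - f2) <= bandwidth / 2
    return 10.0 * np.log10(power[in_band].sum() / power.sum())


# Toy check: two synthetic frames with the same low-frequency component,
# one with a stronger component near F2 (standing in for more vocal effort).
fs = 16000
t = np.arange(800) / fs  # 50 ms frame
weak_f2 = np.sin(2 * np.pi * 300 * t) + 0.05 * np.sin(2 * np.pi * 2200 * t)
strong_f2 = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 2200 * t)
print(band_power_db(weak_f2, fs, f2=2200), band_power_db(strong_f2, fs, f2=2200))
```

Because the value is relative to total power it is never above 0 dB, and a frame with relatively more energy around F2 scores higher, which is the sense in which higher values index less breathy, higher-effort phonation in this chapter.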
We applied the same data selection criteria as in Section 5.2. The statistical analyses
performed also had a set up similar to that in Chapter 5.
We hypothesised that the short SVLR context requires more vocal effort due to the
conflict between the short SVLR-conditioning and durational lengthening required by the
word-prosodic system. It is thus not an anticipatory effect of the following consonant. For
this reason it would be useful to present the vocal effort measurements in their strength of
association with vowel duration (rather than as a function of the following consonant).
However, exploratory data analysis suggested that it is only sensible to perform tests
based on measures of statistical association for stringent and prosodically homogeneous
subsets of data. The variable and more spontaneous child datasets do not satisfy these
criteria. Therefore, we chose measures of difference (ANOVAs) rather than of
association (correlation analysis). However, we do present a bivariate correlation test in
Section 6.3.1.1, based on one subset of the adult data for the vowel /i/, to exemplify our
hypothesis.
6.3 Acquisition of Vocal Effort
6.3.1 A comparison of adult models
6.3.1.1 Unrounded vowel /i/
We examine the crosslinguistic differences in the implementation of the vocal effort
pattern for the vowel /i/ compared between three postvocalic consonantal contexts
triggering differential patterns of vowel duration in MSR, SSE and SSBE.
The median values of three normalisation methods of RMS-power, i.e. A2, A2*a and
A2*b (dB), of /i/ for each speaker were entered in a mixed-design ANOVA with
“LANGUAGE” (SSE, SSBE, MSR) as a between-subject factor and “FOLLOWING
CONSONANT” as a within-subject factor. The factor “FOLLOWING CONSONANT”
had three levels: voiced fricative, voiced stop and voiceless stop. Since the crosslinguistic
comparison involves the vowel /i/, which has a similar formant structure across the
languages (see Table 3-8), the normalisation method A2*a of RMS-power is most suitable
for this test.
The results of the ANOVA are presented in Table 6-1. There was a significant main
effect of the following consonant on the measures of A2*a and A2*b for the vowel /i/, and
A2 almost reached significance. This effect showed that overall the following consonant
influences vocal effort applied for /i/. The factor “LANGUAGE” showed no significant
main effect. However, there was a significant interaction between the factors
“LANGUAGE” and “FOLLOWING CONSONANT” which showed that the direction of
the contextual effect on vocal effort depends on the language.
Table 6-1 Summary of the ANOVA results for adult controls for the vocal effort measures in the vowel /i/.
Normalisation  Main effect of                 Main effect of   Interaction
method         Following Consonant            Language         Language * Following Consonant
A2             F(1.2,13.98)=4.152; p=.053     ns               F(2.5,13.98)=3.866; p<.05
A2*a           F(1.3,22)=4.152; p<.05         ns               F(2.5,22)=3.866; p<.05
A2*b           F(1.3,22)=6.277; p<.05         ns               F(2.5,22)=6.755; p<.01
The mean values of A2, A2*a and A2*b (dB) and standard deviations for /i/ per
consonantal context and language averaged for all the speakers are summarised in
Appendix L. The direction of the interaction between consonantal context and languages
is shown in Figure 6-1. Russian monolingual speakers showed the opposite contextual
effect compared to both SSE and SSBE. In the two English varieties, the context before
voiced fricatives was produced with lower A2*a values (and accordingly less vocal effort)
compared to that before voiced and voiceless stops. The difference between the two
types of context was on average 5.6 dB in SSE and 4.2 dB in SSBE.
This crosslinguistic difference between MSR and SSE is of importance, because it
shows that bilingual children have to acquire a differentiated control of the underlying
vocal effort applied to the vowel in addition to the durational differences due to the
postvocalic conditioning of vowel duration.
[Figure: values for SSE adult, MSR adult and SSBE adult, each split by following consonant (voiced fricative, voiced stop, voiceless stop). Y-axis: RMS-power around F2 (dB), -45 to 0.]
Figure 6-1 Crosslinguistic effect on vocal effort (based on A2*a measure, dB) produced by adults for the
vowel /i/ as a function of the following consonant.
The statistical results (in Table 6-1) across the three normalisation methods used for
the analysis were somewhat different and yet consistent, since significant effects were
obtained for the same factors and interactions. The intermeasure consistency was
expected, given that the vowel /i/ is not much different in formant structure between the
three languages. Besides, the adult group was quite homogeneous and was recorded in
studio conditions. Therefore, the difference in significance levels between the method A2
(normalising for the intra- and interspeaker differences in overall intensity) and A2*a or
A2*b (normalising for both formant frequency shifts and overall intensity) can be
explained by the intra- and interspeaker variation in vocal tract length and slight
articulatory changes in the production of the vowel /i/.
Figure 6-2 shows the relationship between the A2*a measure of vocal effort (dB, on the
Y-axis) and vowel duration (ms, on the X-axis) in MSR and SSE. In MSR (left panel), there
was a highly significant positive correlation [r=.225, N=486, p<.01] between vocal effort
and vowel duration: the MSR speakers tended to spend more vocal effort on producing
vowels of longer duration. In SSE (right panel), there was a highly significant negative
correlation [r=-.337, N=396, p<.01] between vocal effort and vowel duration, meaning that
the highest vocal effort was spent on producing the short SVLR vowel /i/.
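The significance of coefficients like these can be checked with the usual t-test for Pearson's r, t = r·√((N−2)/(1−r²)) with N−2 degrees of freedom. The Python sketch below is illustrative only. The function names are hypothetical, and the thesis does not state which statistics package or test variant was used. The sketch simply recomputes t from the r and N values reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing H0: population correlation = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Coefficients reported in the text for the MSR and SSE adults.
t_msr = t_for_r(0.225, 486)   # df = 484
t_sse = t_for_r(-0.337, 396)  # df = 394
print(f"MSR: t({486 - 2}) = {t_msr:.2f}")
print(f"SSE: t({396 - 2}) = {t_sse:.2f}")
```

Both t values (about 5.1 and -7.1) exceed the two-tailed critical value for p=.01 at these degrees of freedom (about 2.59), which agrees with the reported p<.01. Note, however, that r=.225 corresponds to only about 5% shared variance (r² ≈ .05).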
Figure 6-2 Correlation between the measure A2*a (dB) and vowel duration (ms) for MSR (left panel)
and SSE (right panel) adult speakers.
Furthermore, the individual results for the SSE and MSR adult speakers in Figure 6-3
show that the SSE speakers consistently produced less vocal effort for long /i/ (before
voiced fricatives) and more effort for the short vowels. In contrast, the Russian speakers
were much less consistent in their patterns (compare R2, R3 and R5). As with vowel
duration, vocal effort is systematic in SSE but much less obviously so in MSR. This is the
main crosslinguistic difference to keep in mind for the bilingual acquisition part of the
study.
Figure 6-3 Individual results for SSE and MSR adults for the production of measure A2*a of vocal effort for
the vowel /i/ as a function of the following consonant.
6.3.1.2 Vowel /i/ compared to /ɪ/
Stevens (1998, p.297) pointed out that the tense/lax contrast involves more than
segmental differences. For non-low vowels there are also differences in laryngeal
configuration, with the consequence that the amplitude of the spectrum above F1 tends to
be higher for the lax American English vowels. The more breathy laryngeal configuration
of tense vowels reduces spectral intensity in the mid-frequencies (meaning that less vocal
effort is spent on producing tense vowels), whereas the less breathy configuration of lax
vowels enhances the intensity of the mid-frequencies. Jessen (2002) found such an acoustic
correlate for the German tense/lax contrast and attributed it to “syllable-cut”
(phonotactically “free” versus “checked”) differences between the vowels. The aim of this
section is to investigate whether the same change in laryngeal configuration (and vocal
effort) applies to the SSE and SSBE tense/lax contrast /i ɪ/. No crosslinguistic comparison
is drawn with Russian, since Russian does not have this contrast. A background comparison
with SSBE is of interest because the tense/lax contrast involves more vowel pairs in SSBE
than in SSE.
The median values of the measures A2, A2*a and A2*b (dB) of /i/ and /ɪ/ for each
speaker were entered in a mixed design ANOVA with “LANGUAGE” (SSE and SSBE)
as a between-subject factor and “TENSE/LAX VOWEL” as a within-subject factor.
Because /i/ and /ɪ/ differ in formant structure, A2*b, which normalises for formant
frequency shifts across the two vowels, was the most suitable measure for this test.
Consonantal context was not taken into account (see the list of carriers in Table 3-3): all
median values were calculated across the consonantal contexts.
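The data reduction described above, with one median per speaker and vowel pooled over consonantal contexts, can be sketched as follows. This is a minimal illustration under stated assumptions. The token tuple layout, the function name and all token values are hypothetical, and "I" stands in for /ɪ/.

```python
from collections import defaultdict
from statistics import median


def speaker_vowel_medians(tokens):
    """Collapse token-level measurements to one median per (speaker, vowel).

    tokens: iterable of (speaker, language, vowel, context, value_db) tuples.
    The consonantal context is deliberately ignored, so each median is
    pooled across contexts.
    Returns {(speaker, language): {vowel: median_db}}.
    """
    groups = defaultdict(list)
    for speaker, language, vowel, _context, value in tokens:
        groups[(speaker, language, vowel)].append(value)
    table = defaultdict(dict)
    for (speaker, language, vowel), values in groups.items():
        table[(speaker, language)][vowel] = median(values)
    return dict(table)


# Hypothetical A2*b token values (dB), for illustration only.
tokens = [
    ("E1", "SSE", "i", "voiced fricative", -30.0),
    ("E1", "SSE", "i", "voiceless stop", -27.0),
    ("E1", "SSE", "i", "voiced stop", -28.0),
    ("E1", "SSE", "I", "voiceless stop", -16.0),
    ("E1", "SSE", "I", "voiced stop", -15.0),
    ("B1", "SSBE", "i", "voiced fricative", -26.0),
    ("B1", "SSBE", "i", "voiceless stop", -25.0),
    ("B1", "SSBE", "I", "voiced stop", -13.0),
]
rows = speaker_vowel_medians(tokens)
for (speaker, language), medians in sorted(rows.items()):
    print(speaker, language, medians["i"], medians["I"])
```

Each resulting row holds one speaker, their language and two vowel medians. It corresponds to one subject in the mixed design: LANGUAGE is the between-subject factor, and the two medians are the levels of the within-subject factor.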
The result of the ANOVA is summarised in Table 6-2. There were no significant main
effects and no significant interactions for the measures A2 and A2*a. However, there was
a highly significant main effect of vowel tenseness/laxness on A2*b, the parameter based
on normalisation for formant frequency shifts across the two vowels. The direction of the
differences between the vowels in the two English varieties is shown in Figure 6-4. There
were no other significant main effects or interactions.
The result confirms Stevens’s (1998, p.297) point above and replicates Jessen’s (2002)
finding of laryngeal correlates of the tense/lax contrast for the German vowels /i/ and /ɪ/.
This means that, apart from the vowel quality differences, adult speakers of SSE and SSBE
produced similar laryngeal changes: they adopted a less breathy laryngeal configuration
(spent more vocal effort) for the lax vowel /ɪ/, as opposed to a more breathy configuration
for the tense vowel /i/.
Table 6-2 Summary of the ANOVA results for the three normalisation methods of vocal effort for the
tense/lax vowel pair /i ɪ/ in adult SSE/SSBE speakers.
Normalisation method | Main effect: Tense/lax vowel | Main effect: Language | Interaction: Language * Tense/lax
A2                   | ns                           | ns                    | ns
A2*a                 | ns                           | ns                    | ns
A2*b                 | F(1,7)=52.335, p<.01         | ns                    | ns
[Figure 6-4 chart: Y-axis RMS-power around F2 (dB), 0 to -40; bars for tense and lax vowels; groups SSE and SSBE]
Figure 6-4 Differences in vocal effort (based on mean A2*b, dB) between the lax vowel /ɪ/ and the
tense vowel /i/ for 5 SSE and 4 SSBE adult speakers.
Table 6-3 SSE and SSBE adult means and standard deviations (dB) for three normalisation methods of vocal
effort for the vowels /i/ versus /ɪ/.
Normalisation method | Vowel | SSE (n=5) Mean (Std. Dev.) | SSBE (n=4) Mean (Std. Dev.) | Total (n=9) Mean (Std. Dev.)
A2                   | /i/   | -24.60 (1.71)              | -23.96 (2.69)               | -24.32 (2.07)
A2                   | /ɪ/   | -23.11 (4.82)              | -19.04 (3.62)               | -21.30 (4.59)
A2*a                 | /i/   | -25.07 (1.85)              | -24.29 (3.55)               | -24.72 (2.57)
A2*a                 | /ɪ/   | -23.00 (4.02)              | -19.15 (2.77)               | -21.29 (3.88)
A2*b                 | /i/   | -28.83 (1.85)              | -26.70 (3.29)               | -27.88 (2.65)
A2*b                 | /ɪ/   | -15.97 (4.30)              | -13.11 (2.39)               | -14.70 (3.70)
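As a worked check on Table 6-3, the Total rows are the n-weighted means of the SSE (n=5) and SSBE (n=4) rows, and the lax-minus-tense difference can be read off for each normalisation method. The Python sketch below only re-uses the table values. The dictionary layout and function name are illustrative.

```python
# Group means (dB) transcribed from Table 6-3: (SSE, SSBE) per method and vowel.
# "I" stands for the lax vowel /ɪ/.
TABLE_6_3 = {
    "A2":   {"i": (-24.60, -23.96), "I": (-23.11, -19.04)},
    "A2*a": {"i": (-25.07, -24.29), "I": (-23.00, -19.15)},
    "A2*b": {"i": (-28.83, -26.70), "I": (-15.97, -13.11)},
}
N_SSE, N_SSBE = 5, 4


def pooled_mean(sse_mean, ssbe_mean):
    """Mean over all nine speakers, weighting each variety by its n."""
    return (N_SSE * sse_mean + N_SSBE * ssbe_mean) / (N_SSE + N_SSBE)


for method, vowels in TABLE_6_3.items():
    tense = pooled_mean(*vowels["i"])
    lax = pooled_mean(*vowels["I"])
    print(f"{method:5s} tense {tense:7.2f}  lax {lax:7.2f}  lax - tense {lax - tense:5.2f} dB")
```

The weighted means reproduce the Total rows to two decimals. The lax-minus-tense difference is about 3 dB under A2 and A2*a, against standard deviations of up to 4.8 dB, but about 13 dB under A2*b. This is consistent with only A2*b showing a significant tense/lax effect in Table 6-2.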
Whether this change of laryngeal configuration is an intrinsic property of the
tense/lax contrast (Stevens, 1998, p.297), is due to language phonotactics and prosody
(‘syllable-cut’) (Jessen, 2002), or has some other cause, monolingual and bilingual children
have to acquire not only the segmental differences between tense and lax vowels but also
the appropriate laryngeal configuration accompanying the contrast.
6.3.1.3 Rounded vowels
We examine crosslinguistic differences in the implementation of vocal effort for
the rounded vowels /u  / in three postvocalic consonantal contexts that trigger different
patterns of vowel duration in MSR, SSE and SSBE. Since the same postvocalic consonantal
conditioning applies to this set of vowels as to /i/, we expect crosslinguistic results
similar to those for /i/.
The setup of the ANOVA was the same as for the vowel /i/ in Section 6.3.1.1. The
measures used were A2, A2*a and A2*c. Since the vowels /u  / differ in formant
structure, the A2*c method is the most relevant in this test.
The results of the ANOVA are presented in Table 6-4. There was a significant main
effect of “FOLLOWING CONSONANT” on A2 and A2*c, and a marginally significant
effect (p=.054) on A2*a. This shows that, overall, the following consonant plays a
significant role in the production of vocal effort for the close rounded vowels. There was
also a highly significant interaction between the factors “LANGUAGE” and “FOLLOWING
CONSONANT” for all three normalisation methods, showing that the direction of the
contextual effect depends on the language (Figure 6-5).
Unlike for /i/, there was also a highly significant main effect of “LANGUAGE” for
the normalisation methods A2 and A2*c (see Table 6-4). This effect is most plausibly due
to the absolute difference in intensity levels between MSR and the other two languages
(see Figure 6-5 and Figure 6-6) and might be a methodological side effect of comparing
vowels that differ crosslinguistically in formant structure (see Table 3-8 comparing
formants). There were no other significant main effects or interactions.
The mean values of A2, A2*a and A2*c (dB) and standard deviations for the close
rounded vowels per consonantal context and language, averaged over all speakers, are
summarised in Appendix M. The direction of the interaction is shown in Figure 6-5. The
figure shows that the crosslinguistic pattern for MSR versus SSE was very similar to that
for /i/: the contextual effect in MSR appears to be the opposite of that in SSE. In SSE, the
speakers spent less effort producing // before voiced fricatives and voiced stops than
before voiceless stops.
There was also a substantial cross-varietal difference between SSE and SSBE in vocal
effort (based on A2*c) between the voiced-fricative and voiceless-stop contexts. This is not
surprising, since in SSE the vowel // in the two contexts differs only in duration and in the
following consonant, while in SSBE there is an additional tense/lax /u ʊ/ vowel contrast.
The overall difference in A2*c between the context before voiced fricatives and that before
voiceless stops is 6.52 dB in SSE (similar to that for SSE /i/), compared with a much larger
22.3 dB in SSBE.
Thus the tense/lax contrast can account for the larger difference in SSBE compared
to SSE. However, since the SSE vowel // involves neither a vowel quality difference nor
“syllable-cut” bounding (the vowel is “free”: it can occur in an open syllable without a
coda), the differences in the adjustment of laryngeal configuration must have some other
source, such as prominence cueing.
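Because the measures are in dB, a difference between two contexts corresponds to a ratio of RMS powers. The Python sketch below converts the two differences reported above to power ratios. It assumes the values are power levels on a 10·log10 scale, as the label “RMS-power” suggests. The function name is hypothetical.

```python
def db_to_power_ratio(delta_db):
    """Convert a level difference in dB (10*log10 scale) to a power ratio."""
    return 10 ** (delta_db / 10)


# Voiced-fricative versus voiceless-stop differences in A2*c reported in the text.
for variety, delta in [("SSE", 6.52), ("SSBE", 22.3)]:
    print(f"{variety}: {delta} dB -> power ratio {db_to_power_ratio(delta):.1f}")
```

Under this assumption, the SSE difference corresponds to roughly a 4.5-fold power ratio and the SSBE difference to a ratio of about 170.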
Table 6-4 Summary of the ANOVA results for the three normalisation methods of vocal effort for the close
rounded vowels in adults.
Normalisation method | Main effect: Following consonant | Main effect: Language | Interaction: Language * Following consonant
A2                   | F(2,22)=4.597; p<.05             | F(2,11)=5.963, p<.01  | F(4,22)=6.245; p<.01
A2*a                 | F(2,22)=3.346; p=.054            | ns                    | F(4,22)=4.542; p<.01
A2*c                 | F(2,22)=15.567; p<.01            | F(2,11)=38.246, p<.01 | F(4,22)=13.739; p<.01
[Figure 6-5 chart: Y-axis RMS-power around F2 (dB), 0 to -45; bars for the voiced fricative, voiced stop and voiceless stop contexts; groups SSE, MSR, SSBE]
Figure 6-5 Crosslinguistic effect on vocal effort (based on mean A2*c, dB) in the adult production of close
rounded vowels as a function of the following consonant.
As for /i/, the individual results for the SSE and MSR adult speakers in Figure 6-6 show
that the SSE speakers consistently produced less vocal effort for the long // (before voiced
fricatives) and more effort for the short ones (before voiceless stops). However, the results
for the context before voiced stops were less consistent than for /i/. In contrast to SSE, the
MSR speakers were much less consistent in their patterns (compare R2, R1 and R5). Once
again, vocal effort is systematic for the SSE vowel // but not for MSR /u/. As for /i/, this is
the main crosslinguistic difference to keep in mind for the bilingual acquisition part of
the study.
Figure 6-6 Individual results for SSE and MSR adults for the production of vocal effort (based on median
A2*c, dB) for the close rounded vowels as a function of the following consonant.
6.3.1.4 Summary of results for monolingual adults
The between-language analysis of variance showed significant differences between
MSR, SSE and SSBE in the systematicity of changes in laryngeal configuration for the
vowel /i/. In the two English varieties, the context before voiced fricatives (long vowel) was
produced with lower A2*a values (and accordingly less vocal effort) than the short vowel
/i/ before voiceless stops. The difference between the two contexts is on average 5.6 dB in
SSE and 4.2 dB in SSBE. In SSE, the context before voiced stops was usually produced with
A2*a values intermediate between those of the other two contexts; in SSBE, it was produced
with values somewhat lower than those before voiced fricatives. All the English-speaking
adults consistently showed these patterns.
A very similar vocal effort pattern was found for the vowel //, which in SSE
features the same SVLR conditioning as for the vowel /i/.
For these two vowels, the average MSR results showed a pattern of vocal effort
opposite to that of SSE. As with the vowel duration patterns, individual MSR speakers varied
and deviated from this trend in several instances for the vowel /i/, and showed substantial
variation for the rounded vowel /u/. This shows that vocal effort in MSR is not connected
to the postvocalic conditioning system and probably serves exclusively to increase
prominence or phonatory loudness. In fact, for MSR /i/ we observed a positive correlation
between vocal effort and vowel duration: increasing duration was associated with
increasing vocal effort, while in SSE the opposite pattern was observed.
As with the crosslinguistic duration patterns, the main difference between Russian
and SSE is that SSE has a fine-grained system of laryngeal contrasts depending on the
duration of the vowel, while MSR seems to lack such a system, since the adults produced
very variable results.
There are some issues that we would like to address in the discussion of these
results. One of them is whether the observed systematicity in vocal effort for /i/ and
/u  / should be attributed to segmental influences of the consonantal contexts (which
admittedly differed in this study) or to the prominence systems and their acoustic
correlates.
With regard to the laryngeal contrast between the tense and lax vowels /i ɪ/, we showed
that adult speakers of both SSE and SSBE produced similar laryngeal changes: they
adopted a less breathy laryngeal configuration (spent more vocal effort) to produce the lax
vowel /ɪ/, as opposed to a more breathy configuration (involving less vocal effort) for the
tense vowel /i/. This result replicates Jessen’s (2002) findings for laryngeal correlates of
the German tense/lax contrast. A similar laryngeal adjustment was found for the SSBE
tense/lax pair /u/ and /ʊ/ in Section 6.3.1.3.
For SSE monolingual and SSE/Russian bilingual acquisition, this means that the
appropriate laryngeal configuration changes should be acquired alongside language-specific
vowel quality differences and the system of duration.
6.3.2 SSE monolingual children
6.3.2.1 Vowel /i/
6.3.2.1.1 Group results
We investigated whether the SSE monolingual children acquired the adult-like fine-grained
differences in vocal effort for the SVLR vowel /i/ before voiced fricatives and voiced and
voiceless stops. In adult speech, long vowels (before voiced fricatives) were produced with
less vocal effort in prominent positions, while the short ones were produced with a
relatively less breathy laryngeal configuration (boosting RMS-power levels in the
mid-frequencies).
The intensity levels are represented by the values A2 (normalised for overall
intensity differences), A2*a (normalised for overall intensity and for formant frequency
shifts within the target /i/ across speakers and languages) and A2*b (normalised for overall
intensity and for formant frequency shifts across the targets /i ɪ/ and across child speakers
and languages).
To address SSE monolingual acquisition (ages 3;4 to 4;9), we entered the median
values of A2, A2*a and A2*b (dB) as dependent variables in a mixed design ANOVA. The
between-subject factor “AGE” had four levels: adult, child aged 3;4 to 3;11, child aged
4;0 to 4;4, and child aged 4;5 to 4;9. The within-subject factor “FOLLOWING
CONSONANT” had three levels: voiced fricative, voiced stop and voiceless stop.
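The within-subject part of this design can be illustrated with a one-way repeated-measures ANOVA on per-speaker medians for the three following-consonant contexts. The Python sketch below is a simplified stand-in, not the analysis actually run in the thesis. It ignores the between-subject factor “AGE” and applies no sphericity correction. The function name and the data values are hypothetical.

```python
from statistics import mean


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: one row per speaker, each holding that speaker's values for the
    k within-subject conditions (here the three following consonants).
    Returns (F, df_condition, df_error).
    """
    n = len(data)
    k = len(data[0])
    grand = mean(v for row in data for v in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Residual (condition x subject) variation serves as the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


# Hypothetical per-speaker median A2*a values (dB) for the contexts
# voiced fricative, voiced stop, voiceless stop (one row per speaker).
example = [
    [-29.0, -28.0, -27.0],
    [-28.0, -27.0, -25.0],
    [-27.0, -26.0, -26.0],
]
f_value, df_cond, df_err = rm_anova_oneway(example)
print(f"F({df_cond},{df_err}) = {f_value:.2f}")
```

In the full mixed design, the between-subject factor and its interaction with the within-subject factor are also partitioned. Statistics software usually reports sphericity-corrected tests as well. The sketch shows only how the subject-by-condition residual acts as the error term for the within-subject effect.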
The results of the ANOVA are presented in Table 6-5. The results showed a highly
significant main effect of the factor “FOLLOWING CONSONANT” irrespective of the
other factors. There were no other significant main effects or interactions. All three
normalisation methods showed the same level of significance. The direction of the main
effect is plotted for the four age groups in Figure 6-7. The descriptive statistics for each of
the groups are reported in Appendix N.
Table 6-5 Summary of the ANOVA results for the three normalisation methods of vocal effort of the vowel
/i/ in four SSE monolingual age groups.
Normalisation method | Main effect: Following consonant | Main effect: Age | Interaction: Age * Following consonant
A2                   | F(2,22)=10.777; p<.01            | ns               | ns
A2*a                 | F(2,22)=18.758; p<.01            | ns               | ns
A2*b                 | F(2,22)=18.757; p<.01            | ns               | ns
[Figure 6-7 chart: Y-axis A2*a (dB) of /i/, -15 to -35; X-axis following consonant (voiced fricative, voiced stop, voiceless stop); lines for adults, child 3;4 to 3;11, child 4;0 to 4;4, child 4;5 to 4;9]
Figure 6-7 Context dependent vocal effort pattern (based on mean A2*a dB) for the vowel /i/ produced by
the SSE adults compared to three groups of children aged 3;4 to 4;9.
The results showed that by the age of 3;4 the SSE monolingual children had acquired
the same fine-grained difference in vocal effort for /i/ across consonantal contexts as the
SSE adults: the short vowel /i/ in prominent positions was produced with a laryngeal
configuration adjusted towards less breathy phonation, resulting from an increase in vocal
effort, while the long vowel before voiced fricatives was produced with a more breathy
laryngeal configuration.
Thus, despite non-linear physiological differences between children and adults, the SSE
children had acquired the adult vocal effort pattern by the age of 3;4.
6.3.2.1.2 Individual results
The children’s individual results for the context-dependent pattern of vocal effort in
/i/ are shown in Figure 6-8, with the children (labelled by subject code and age) along the
X-axis. The patterns differed in extent, but, like the SSE adults in Figure 6-3, the children
produced a very consistent direction of the contextual differences in vocal effort.
[Figure 6-8 chart: SSE children; Y-axis median A2*a (dB), 0 to -40; bars for voiced fricative, voiced stop and voiceless stop; X-axis individual children labelled by subject code and age (3;4 to 4;9)]
Figure 6-8 Individual SSE child results of vocal effort (based on median A2*a, dB) for the vowel /i/ as a
function of the following consonant.
6.3.2.2 Vowel /i/ compared to /ɪ/
6.3.2.2.1 Group results
In Section 6.3.1.2 we showed that the adult speakers of both SSE and SSBE
produced similar laryngeal differences between tense and lax vowels: they adopted a
less breathy laryngeal configuration (spent more vocal effort) for the lax vowel /ɪ/, as
opposed to a more breathy configuration for the tense vowel /i/. This section investigates
whether the SSE monolingual children acquired a similar difference in the vocal effort
patterns for the tense and lax vowels /i/ and /ɪ/.
We report the results for the three normalisation methods A2, A2*a and A2*b. However,
since we compare two vowels that differ in formant structure, the method A2*b is the most
suitable for this test.
We entered the median values of A2, A2*a and A2*b (dB) as dependent variables in a
mixed design ANOVA. The between-subject factor “AGE” had four levels: adult, child
aged 3;4 to 3;11, child aged 4;0 to 4;4, and child aged 4;5 to 4;9. The within-subject
factor “TENSE/LAX VOWEL” had two levels: /i/ and /ɪ/.
The result of the ANOVA is reported in Table 6-6. The test showed a highly
significant main effect of the factor “TENSE/LAX VOWEL”. All the normalisation
methods showed the same level of significance. There was no significant main effect of
“AGE”. The direction of the main effect is plotted for the four age groups in Figure 6-9.
The descriptive statistics for each of the groups are reported in Appendix O.
Table 6-6 Summary of the ANOVA results for the three normalisation methods of vocal effort of the
vowels /i/ versus /ɪ/ produced by four SSE monolingual age groups.
Normalisation method | Main effect: Tense/lax vowel | Main effect: Age | Interaction: Age * Tense/lax vowel
A2                   | F(1,11)=33.778; p<.01        | ns               | F(3,11)=3.717; p<.05
A2*a                 | F(1,11)=47.244; p<.01        | ns               | F(3,11)=5.658; p<.05
A2*b                 | F(1,11)=163.763; p<.01       | ns               | F(3,11)=4.465; p<.05
[Figure 6-9 chart: Y-axis mean A2*b (dB), 0 to -35; X-axis tense/lax vowel (/i/, /ɪ/); lines for adult, child 3;4 to 3;11, child 4;0 to 4;4, child 4;5 to 4;9]
Figure 6-9 Vowel-dependent vocal effort (based on mean A2*b, dB) for the vowels /i/ versus /ɪ/ in SSE
adults compared to three groups of children aged 3;4 to 4;9.
Figure 6-9 shows that the SSE speakers of all age groups produced a very similar
difference in the A2*b measure between tense /i/ and lax /ɪ/. This result means that SSE
speakers of all ages adjusted their laryngeal configuration towards less breathy phonation
for the lax vowel compared to the tense one, and that this difference was highly
significant.
However, there was also a significant interaction (for all three normalisation
methods) between the factors “AGE” and “TENSE/LAX VOWEL”. This interaction
suggests that the size of the tense/lax difference in vocal effort depended on “AGE”. We
consider the individual child-by-child results to determine whether this interaction
reflects a linear age pattern or arises from individual variation among the subjects.
There were no other significant main effects or interactions.
6.3.2.2.2 Individual results
Figure 6-10 plots the individual results of all the SSE children (ordered by age on the
X-axis) for the difference in vocal effort between the tense and lax vowel pair /i/ and /ɪ/.
The individual results show no age-dependent pattern in the difference between the tense
and lax vowels for the cross-section of children concerned: the difference in vocal effort
between the two vowels does not seem to increase (or decrease) as a function of age. The
significant interaction between “AGE” and “TENSE/LAX VOWEL” observed in the
previous section is therefore likely to be due to individual variation among the children
contributing to the three age groups. Nevertheless, as with the adults, the direction of the
tense/lax difference is consistent across the individual results.
Therefore, we can conclude that the SSE monolingual children acquired the
laryngeal distinction between the tense and lax vowels in addition to the differences in
vowel quality and duration: like the adults, they produced a more breathy laryngeal
configuration for tense /i/ and a less breathy configuration for its lax counterpart. This
laryngeal adjustment resulted from applying different vocal effort patterns to the tense and
lax vowels.
[Figure 6-10 chart: SSE children; Y-axis median A2*b (dB), 0 to -40; bars for tense and lax vowels; X-axis individual children labelled by subject code and age (3;4 to 4;9)]
Figure 6-10 Individual results for SSE children for the vocal effort differences (based on median A2*b, dB)
between the tense/lax vowels /i ɪ/.
Vowel //
6.3.2.3
6.3.2.3.1
Group results
We investigated whether the SSE monolingual children acquired adult-like
differences in the vocal effort pattern for the SVLR vowel // in the contexts before
voiced fricatives and voiced and voiceless stops. The expected differences were similar to
those for the vowel /i/: the long vowels (before voiced fricatives) should be produced
with a more breathy laryngeal configuration, and the short ones with a less breathy
configuration.
To address SSE monolingual acquisition, we entered the median values of A2,
A2*a and A2*c (dB) as dependent variables in a mixed design ANOVA. The setup of the test
was the same as in Section 6.3.2.1.1. We report on the three normalisation methods A2,
A2*a and A2*c, but expect A2*a to be the most suitable for this test, since differences are
assessed within a single vowel target.
The results of the ANOVA are presented in Table 6-7. There were no significant
main effects or interactions for the measure of A2. For the measures A2*a and A2*c, the test
showed a highly significant main effect of the factor “FOLLOWING CONSONANT”.
The two normalisation methods showed the same level of significance. There was a
significant main effect of the factor “AGE” for the measure A2*a. The direction of the
main effects for the measure A2*a is plotted per consonantal context for the four age
groups in Figure 6-11. The descriptive statistics for all three normalisation methods for
each of the SSE age groups are reported in Appendix P.
Table 6-7 Summary of the ANOVA results for the three normalisation methods of vocal effort for the
vowel // in four SSE monolingual age groups.
Normalisation method | Main effect: Following consonant | Main effect: Age     | Interaction: Age * Following consonant
A2                   | ns                               | ns                   | ns
A2*a                 | F(2,22)=10.415; p<.01            | F(1,11)=4.231; p<.05 | ns
A2*c                 | F(2,22)=10.415; p<.01            | ns                   | ns
[Figure 6-11 chart: Y-axis mean A2*a (dB), -15 to -35; X-axis following consonant (voiced fricative, voiced stop, voiceless stop); lines for adult, child 3;4 to 3;11, child 4;0 to 4;4, child 4;5 to 4;9]
Figure 6-11 Context dependent vocal effort pattern (based on mean A2*a dB) for the vowel // in the SSE
adults compared to three groups of children aged 3;4 to 4;9.
There were no other significant main effects or interactions.
The direction of the main effect of “FOLLOWING CONSONANT” is similar in all
age groups and fairly consistent with the results for the acquisition of this pattern for the
vowel /i/. The result suggests that the SSE monolingual children acquired a similarly
fine-grained control of laryngeal adjustments, resulting in differentiated RMS-power levels
around F2 of the vowel //: the children adopted a more breathy laryngeal configuration
for the long vowel // before voiced fricatives, and spent more vocal effort on producing
short vowels before voiceless stops.
We ran Tukey HSD post-hoc tests to determine which groups contributed to the
significant main effect of “AGE” for the measure A2*a. The results indicated that the age
differences were non-linear: the only significant difference (p<.05) was between the
adults and the children aged 4;0 to 4;4.
6.3.2.3.2 Individual results
Individual results for the SSE children are presented in Figure 6-12.
[Figure 6-12 chart: SSE children; Y-axis median A2*a (dB), 0 to -35; bars for voiced fricative, voiced stop and voiceless stop; X-axis individual children labelled by subject code and age (3;4 to 4;9)]
Figure 6-12 Individual SSE child results of vocal effort (based on median A2*a, dB) for the vowel // as a
function of the following consonant.
The individual results in Figure 6-12 show somewhat less consistent patterns than
those for the vowel /i/. Subject C3 did not produce an adult-like pattern at either 3;4 or
3;11, and neither did C6 at 4;0. However, all the other subjects did produce an adult-like
pattern. The lower consistency for the vowel // could have several causes. First, in the
SSE children the vowel // was less adult-like in quality than /i/. Second, the children
differed qualitatively from the SSE adults in producing a broader phonetic range of vowel
qualities. Third, we did not apply a joint normalisation for formant frequency shifts across
adults and children. Therefore, part of the difference in consistency (as well as the age
effect) might be due to the methodology used.
However, despite these potential sources of lower consistency and the age effect
observed for the A2*a measure, the child results largely agree with the acquisition pattern
for the vowel /i/ and with the SSE adult results.
6.3.2.4 Summary of results for the SSE monolingual children
We investigated whether the SSE children acquired the same fine-grained context-dependent
differences in vocal effort as the SSE adults for the SVLR vowels /i/ and //, and the
differences in laryngeal configuration between tense/lax /i/ and /ɪ/.
The results showed that despite non-linear physiological differences of respiratory
and laryngeal systems between children and adults, the SSE children produced the
patterns of vocal effort in a way similar to the adults at the age of 3;4 for all vowels
concerned in this study.
They systematically produced the short SVLR vowel /i/ (before voiceless stops)
with a laryngeal configuration adjusted towards less breathy phonation, resulting in
intensity levels 3-4 dB higher than for the long /i/ (before voiced fricatives).
A similar fine-grained phonetic system was acquired for the close rounded vowel //.
However, the children’s individual patterns for // were more variable than those for
the vowel /i/: in three out of ten cases the children did not reproduce the adult pattern,
whereas all of them produced it for /i/.
The children acquired a substantial laryngeal distinction between the tense and lax
vowels in addition to the difference in vowel quality and duration. Like the adults, the SSE
monolingual children produced a more breathy laryngeal configuration for the tense
vowel /i/, and a less breathy configuration (involving more vocal effort) for the lax
counterpart. In the child data, the less breathy configuration was reflected in a substantial
boost of intensity (RMS-power) in the acoustic spectrum of 16 to 27 dB, depending on age,
compared with an average of 13 dB in the adults.
The result is significant because it shows for the first time that pre-school children
also exercise fine-grained speech motor control over vocal effort in a way similar to
adults, in addition to the increases in phonatory loudness shown in previous studies
(Stathopoulos & Sapienza, 1993; Stathopoulos, 1995; Traunmüller & Eriksson, 2000),
and that they use this fine-grained motor control at the laryngeal level of speech
production for linguistic purposes.
6.3.3 Bilingual Acquisition
6.3.3.1 Subject AN
6.3.3.1.1 SSE /i/
We assess whether the bilingual subject AN acquired the fine-grained differences in
vocal effort for the SVLR vowel /i/ before voiced fricatives and voiced and voiceless stops
in a way similar to her SSE monolingual peers.
The setup of the ANOVA was the same as in Section 5.3.3.1.1. The dependent
variables for vocal effort were the median values of the normalisation methods A2, A2*a
and A2*b (dB). Since we assessed differences in vocal effort within the target /i/, A2*a is
the most suitable measure for this test.
The results of the ANOVA are summarised in Table 6-8. The test showed a highly
significant main effect of the “FOLLOWING CONSONANT” irrespective of the other
factors for all three normalisation methods. There were no other significant main effects
or interactions. This result shows that the bilingual subject AN acquired the system of
vocal effort for the vowel /i/ in a way similar to the SSE monolingual peers.
Table 6-8 Summary of the ANOVA results for the three normalisation methods of vocal effort for the SSE
vowel /i/ produced by the bilingual subject AN as compared to the SSE monolingual peers.
Normalisation method | Main effect: Following consonant | Main effect: Age | Main effect: Bilinguality
A2                   | F(2,14)=17.715; p<.01            | ns               | ns
A2*a                 | F(2,14)=18.426; p<.01            | ns               | ns
A2*b                 | F(2,14)=18.425; p<.01            | ns               | ns
[Figure 6-13 chart: three panels, Y-axis vocal effort (A2*a, dB), -10 to -40; X-axis voiced fricative, voiced stop, voiceless stop; panels compare SSE children 3;4 to 3;11 with AN 3;8, SSE children 4;0 to 4;4 with AN 4;2, and SSE children 4;5 to 4;9 with AN 4;5]
Figure 6-13 Vocal effort for the vowel /i/ (based on A2*a, dB) as a function of the following consonant
produced by the subject AN as compared to the SSE monolingual peers in three age samples.[15]
The direction of the main context-dependent effect on vocal effort in /i/ is shown in
Figure 6-13. The descriptive statistics for AN’s production are reported in Appendix Q.
Like the SSE adults and children, AN on average spent less vocal effort on producing the
long /i/ before voiced fricatives than the short ones before voiced and voiceless stops.
The results are consistent with AN’s language differentiation patterns for vowel
quality. There seem to be no language interaction effects for this variable, in contrast to
AN’s vowel duration pattern for this vowel at the age of 3;8. Recall that, for the
postvocalic conditioning of the duration of /i/ at 3;8, AN produced a reduced range of
VLS/VF ratio compared with her monolingual peers. For vocal effort, at the three ages she
produced VLS/VF ratios (based on A2*a, dB) of 3, 7 and 5 dB, similar to the monolingual
peers’ 3, 4 and 4 dB.
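The dB unit implies that these ratios are computed on power levels. Under that assumption, a VLS/VF “ratio” in dB is simply the difference between the two context levels: 10·log10(P_VLS/P_VF) = L_VLS − L_VF. The Python sketch below demonstrates this identity on hypothetical levels and then lists AN’s per-age deviation from the peer values reported above. The function names and the two levels are illustrative.

```python
import math


def ratio_db(power_a, power_b):
    """Ratio of two powers expressed in dB (10 * log10 scale)."""
    return 10 * math.log10(power_a / power_b)


def power_from_db(level_db):
    """Linear power corresponding to a level in dB (re an arbitrary reference)."""
    return 10 ** (level_db / 10)


# Hypothetical context levels (dB) for the voiceless-stop (VLS) and
# voiced-fricative (VF) contexts.
vls_level, vf_level = -22.0, -25.0
vls_vf = ratio_db(power_from_db(vls_level), power_from_db(vf_level))
# The dB ratio equals the plain difference of the two dB levels.
print(round(vls_vf, 6), vls_level - vf_level)

# VLS/VF values (dB) reported in the text for AN and her monolingual peers
# at the three age samples.
an_vls_vf = [3, 7, 5]
peers_vls_vf = [3, 4, 4]
deviation = [a - p for a, p in zip(an_vls_vf, peers_vls_vf)]
print(deviation)
```

AN’s values differ from the peer means by 0 to 3 dB across the three samples. The largest deviation occurs at the second age sample.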
[15] SSE children’s group values are means of individual children’s median values in this Figure and in all subsequent Figures comparing the SSE of bilingual and monolingual children, while the bilingual child’s results are represented by the median value.
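The averaging convention in footnote 15 and the VLS/VF ratios quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: the token values and the helper names `group_value` and `vls_vf_difference` are hypothetical, and the "ratio" is taken to be the level difference in dB (VLS minus VF), so that a positive value means a higher A2*a before voiceless stops.

```python
from statistics import mean, median

# Hypothetical token-level A2*a values (dB) for two monolingual children.
# VF = voiced fricative, VLS = voiceless stop; all numbers are invented.
tokens = {
    "child_1": {"VF": [-28.0, -26.5, -27.0], "VLS": [-24.0, -23.5, -25.0]},
    "child_2": {"VF": [-30.0, -29.0, -31.5], "VLS": [-26.0, -27.5, -26.5]},
}

def group_value(tokens_by_child, context):
    """Group value: mean of the individual children's median values."""
    return mean(median(child[context]) for child in tokens_by_child.values())

def vls_vf_difference(vls_db, vf_db):
    """VLS/VF 'ratio' expressed as a level difference in dB (VLS minus VF)."""
    return vls_db - vf_db

group_vf = group_value(tokens, "VF")    # mean of child medians -27.0 and -30.0
group_vls = group_value(tokens, "VLS")  # mean of child medians -24.0 and -26.5
print(vls_vf_difference(group_vls, group_vf))  # 3.25
```

For the bilingual child, the corresponding value would instead be computed from the medians of her own tokens in each context, without averaging across speakers.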
6.3.3.1.2 SSE /i/ compared to /ɪ/
In Section 6.3.2.2 we showed that, in addition to the vowel quality and duration differences, the SSE monolingual children also acquired a laryngeal contrast specific to the tense/lax vowels /i/ and /ɪ/. The contrast involved producing a less breathy laryngeal configuration for the lax /ɪ/ and a more breathy configuration for the tense /i/.
In this section we assess whether AN acquired this contrast in a way similar to the SSE monolingual peers. The ANOVA set-up was similar to that in Section 6.3.2.2.1, with an additional between-subject factor “BILINGUALITY” with two levels, “bilingual” and “monolingual”. The factor “AGE” had three levels: “3;4 to 3;11”, “4;0 to 4;4” and “4;5 to 4;9”. The normalisation A2*b is most suitable for this test, since it involves a comparison of two vowels that differ in quality.
The results of the ANOVA are summarised in Table 6-9. The descriptive statistics
of AN’s production are reported in Appendix R. The test showed a highly significant
main effect of “TENSE/LAX VOWEL”. There were no other significant main effects.
Table 6-9 Summary of the ANOVA results for the three normalisation methods of vocal effort for the SSE
vowels /i/ and /ɪ/ produced by the bilingual subject AN compared to the SSE monolingual peers.
Normalisation method | Main effects: Tense/lax vowel | Age | Bilinguality | Interaction: Tense/lax vowel * Age
A2 | F(1,7)=55.076, p<.01 | ns | ns | F(2,7)=7.744, p<.05
A2*a | F(1,7)=39.735, p<.01 | ns | ns | ns
A2*b | F(1,7)=122.503, p<.01 | ns | ns | F(2,7)=4.908, p<.05
There was, however, a significant interaction between the factors “TENSE/LAX
VOWEL” and “AGE”. The direction of the main effect and of the interaction is shown in
Figure 6-14. AN produced a consistent tense/lax pattern that was very similar to the
pattern of the SSE monolingual peers in all age samples. AN’s tense-lax ratios were 26.2
dB at the age of 3;8, 9.39 dB at the age of 4;2 and 21.6 dB at the age of 4;5.
The interaction of the laryngeal contrast in the “TENSE/LAX VOWEL” with “AGE” is due to the age sample 4;2, in which the tense-lax difference was smaller. Since this change across ages is not linear in time, it does not seem to be age-related. Rather, it shows that the laryngeal pattern is acquired: it is consistent in direction but can vary in extent. There were no other significant interactions.
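The argument that the Tense/lax * Age interaction is not developmental rests on the three values not changing steadily with age. A minimal Python check using AN’s tense-lax differences reported above is sketched below; reading "not linear in time" as "not monotonic" is an assumption of this sketch, the lax-minus-tense sign convention is assumed, and `is_monotonic` is a hypothetical helper.

```python
# AN's tense-lax differences (A2*b, dB), in age order.
# Assumed sign convention: lax /ɪ/ minus tense /i/, so positive = higher A2*b for the lax vowel.
an_tense_lax = {"3;8": 26.2, "4;2": 9.39, "4;5": 21.6}

def is_monotonic(values):
    """True if the sequence never changes direction (non-decreasing or non-increasing)."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

values = list(an_tense_lax.values())
print(is_monotonic(values))  # False: the difference drops at 4;2 and rises again at 4;5
```

A developmental change would be expected to show up as a steady increase or decrease across the samples; a dip in the middle sample alone does not fit that expectation.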
[Figure 6-14 panels (left to right): SSE child 3;4 to 3;11 and AN 3;8; SSE child 4;0 to 4;4 and AN 4;2; SSE child 4;5 to 4;9 and AN 4;5. x-axis: vowel /i/, /ɪ/; y-axis: vocal effort, A2*b (dB).]
Figure 6-14 Vocal effort applied to /i/ and /ɪ/ (based on A2*b, dB across all consonantal contexts) produced
by the bilingual subject AN and by the SSE monolingual peers of three age groups.
Once again, AN produced a consistent language-specific pattern for SSE
comparable to that of the monolingual peers. The similarity of AN’s speech production in
this test is in line with her acquisition of vowel quality and vowel duration for the
tense/lax contrast. It is also in line with the quantity of SSE input that she received in the
community. This research variable does not seem to show any language interaction
effects.
6.3.3.1.3 SSE /ʉ/
We assess whether AN acquired the patterns of vocal effort for the vowel /ʉ/ before voiced fricatives and voiced and voiceless stops in a way similar to the SSE monolingual peers.
The set-up of the ANOVA was similar to that in Section 5.3.3.1.1. The dependent variables were vocal effort represented by the median values of the methods A2, A2*a and A2*c (dB). Since we assessed only the SSE target /ʉ/ and we already know that AN produced vowel quality ranges similar to the SSE monolingual peers (see Section 4.3.3.2.1), there are no substantial vowel quality changes involved in this comparison (apart from the issue of non-adult-like segmental variability in SSE children). Therefore, the method A2*a is most relevant for this test.
The results of the ANOVA are summarised in Table 6-10. Unlike for /i/, the test revealed no significant main effect of the “FOLLOWING CONSONANT” on the vocal effort pattern for /ʉ/. However, there was a significant main effect of the factor “BILINGUALITY” for the measure A2*a (and a near-significant effect for A2*c). There were no other significant main effects or interactions.
Table 6-10 Summary of the ANOVA results for the three normalisation methods of vocal effort for the SSE
vowel /ʉ/ produced by the bilingual subject AN in comparison to the SSE monolingual peers.
Normalisation method | Main effects: Following consonant | Age | Bilinguality
A2 | ns | ns | ns
A2*a | ns | ns | F(1,7)=5.802; p<.05
A2*c | ns | ns | F(1,7)=4.992; p=.061
The descriptive statistics for AN’s production of vocal effort are summarised in Appendix S. A comparison of AN’s production with that of the monolingual peers is plotted in Figure 6-15. The figure shows that, despite the non-significance of the factor “FOLLOWING CONSONANT”, AN produced an SSE-like pattern at the ages of 3;8 and 4;2, whereby the RMS-power levels of the vowel /ʉ/ are lower in the long vowels before voiced fricatives than in the short ones before voiceless stops. AN’s vocal effort pattern before voiced stops is less consistent, which was also the case for the SSE monolingual children. However, AN’s pattern at the age of 4;5 is unlike the monolingual pattern; in fact, it is the reverse and more closely resembles the MSR adult pattern shown in Figure 6-5.
The result might be due to child speech variability. We already discussed in Section 6.3.2.4 that the SSE vowel /ʉ/ is produced by monolingual (and bilingual) children with a greater range of phonetic variability in vowel quality than the target produced by the adults. There was also less consistency among the SSE monolingual children in producing the vocal effort pattern for /ʉ/ than for /i/. We also do not know the exact effect of greater differences in vowel quality (SSE /ʉ/ versus MSR /u/) on the precision of the normalisation used in this study.
[Figure 6-15 panels (left to right): SSE child 3;4 to 3;11 and AN 3;8; SSE child 4;0 to 4;4 and AN 4;2; SSE child 4;5 to 4;9 and AN 4;5. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*a (dB).]
Figure 6-15 Vocal effort for the vowel /ʉ/ (based on mean A2*a, dB) as a function of the following
consonant produced by AN in comparison to the SSE monolingual peers in the three age samples.
6.3.3.1.4 MSR/SSE differentiation for /i/
In Section 6.3.1.1 we showed that there is a fine-grained crosslinguistic difference in the realisation of vocal effort between Russian and Scottish English: in SSE, prominent syllables containing an extrinsically short vowel /i/ are produced with somewhat higher vocal effort (higher A2*a, dB) than those containing the long vowel before voiced fricatives. Russian seemed to lack such a system of vocal effort, since individual speakers varied more in their patterns than the SSE speakers, and the average effect of the following consonant on vocal effort was variable.
To establish whether AN produced the crosslinguistic difference in vocal effort for the vowel /i/ in different consonantal contexts, and whether there is any age effect for this difference, we entered all AN’s renditions of the words with the target /i/ in a multivariate ANOVA. The ANOVA had A2, A2*a and A2*b (dB) as dependent variables and three fixed factors: “FOLLOWING CONSONANT” (voiced fricative, voiced and voiceless stop), “LANGUAGE” (SSE and MSR) and “AGE” (3;8, 4;2 and 4;5). Since /i/ is crosslinguistically similar in vowel quality, the method A2*a is most relevant for this test. The set-up of the ANOVA required the use of mean values rather than the medians used in the test of the monolingual children.
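In this design each of AN’s renditions contributes a token, and the analysis compares means over the cells of the 3 (following consonant) × 2 (language) × 3 (age) design. The Python sketch below shows only how token-level A2*a values could be grouped into cells and how a consonant × language interaction shows up in the cell means; the records are invented, `cell_means` and `vls_vf_by_language` are hypothetical helpers, and the ANOVA itself would be run in a statistics package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token records: (following consonant, language, age, A2*a in dB).
# VF = voiced fricative, VLS = voiceless stop; values are invented.
tokens = [
    ("VF", "SSE", "3;8", -27.0),
    ("VF", "SSE", "3;8", -26.0),
    ("VLS", "SSE", "3;8", -24.0),
    ("VF", "MSR", "3;8", -26.0),
    ("VLS", "MSR", "3;8", -27.5),
    ("VLS", "MSR", "3;8", -26.5),
]

def cell_means(records):
    """Mean A2*a for each (consonant, language, age) cell."""
    cells = defaultdict(list)
    for consonant, language, age, value in records:
        cells[(consonant, language, age)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}

def vls_vf_by_language(means, age):
    """VLS minus VF cell mean (dB) for each language at one age."""
    return {
        lang: means[("VLS", lang, age)] - means[("VF", lang, age)]
        for lang in ("SSE", "MSR")
    }

means = cell_means(tokens)
print(vls_vf_by_language(means, "3;8"))  # {'SSE': 2.5, 'MSR': -1.0}
```

Differences of opposite sign in the two languages, as in this toy example, are what a Following consonant * Language interaction picks up; parallel differences would leave only main effects.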
The results of the ANOVA are summarised in Table 6-11. The results showed a
highly significant main effect of the factor “AGE”. We ran Tukey HSD posthoc tests for
the measure A2*a to determine which age contributed to this effect. The result showed that
there was a significant (p<.05) difference between AN’s results at the age of 4;2
compared to the ages of 3;8 and 4;5. Therefore, this age effect was not linear in time.
There were no other significant main effects. However, there was a highly significant interaction between the factors “FOLLOWING CONSONANT” and “LANGUAGE”. The interaction means that the effect of the following consonant on vocal effort differed between AN’s two languages, i.e. that she differentiated between SSE and MSR in producing vocal effort for the vowel /i/. There were also highly significant interactions between the factors “FOLLOWING CONSONANT” and “AGE” on the one hand, and “AGE” and “LANGUAGE” on the other.
Both interactions reflect a relative instability of AN’s vocal effort patterns throughout the
age samples. There were no other significant interactions.
Table 6-11 Summary of the ANOVA results for the normalisation methods of vocal effort (A2, A2*a, A2*b,
dB) for the vowel /i/ as a function of the following consonant produced by the bilingual subject AN in MSR
and SSE.
Normalisation method | Main effect: Age | Following consonant * Language | Following consonant * Age | Age * Language
A2 | F(2,602)=14.2, p<.01 | F(2,602)=6.08, p<.01 | F(4,602)=2.8, p<.05 | ns
A2*a | F(2,602)=13.38, p<.01 | F(2,602)=8.51, p<.01 | F(4,602)=4.5, p<.01 | F(2,602)=4.4, p<.01
A2*b | F(2,602)=13.38, p<.01 | F(2,602)=8.51, p<.01 | F(4,602)=4.5, p<.01 | F(2,602)=4.4, p<.01
The direction of the crosslinguistic difference in AN’s production is plotted in
Figure 6-16. Descriptive statistics for this test are reported in Appendix Q. The figure
shows that at the ages of 4;2 and 4;5 AN differentiated between her MSR and SSE vocal
effort in a way quite similar to the crosslinguistic pattern that we reported for the adult
production in Figure 6-1. However, at the age of 3;8 the VLS-VF ratio in SSE was only
1.5 dB.
The significance of the main effect “AGE” could potentially be explained by factors other than age. In particular, AN’s Russian pattern varied more over time than her SSE pattern. Therefore, the significant effect of “AGE” and all interactions involving this factor could be a side effect of the variability of the MSR pattern in relation to the following consonant.
In Figure 6-17, we compare AN’s MSR vocal effort pattern to that of her mother
producing read speech and to that of the experimenter during the games (spontaneous
speech). Despite the absolute differences in intensity levels, AN’s patterns of vocal effort
in different consonantal contexts are quite similar to those of both adults in different
elicitation modes.
We conclude that AN differentiated the vocal effort pattern for the vowel /i/ between SSE and MSR in all three age samples. Based on the median results, she produced a language-specific pattern for SSE comparable to that of the SSE peers. No language interaction effects were observed for this variable.
[Figure 6-16 panels (left to right): SSE 3;8 and MSR 3;8; SSE 4;2 and MSR 4;2; SSE 4;5 and MSR 4;5. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*a (dB).]
Figure 6-16 AN’s crosslinguistic production of vocal effort for the vowel /i/ (based on mean A2*a, dB) as a
function of the following consonant (age is plotted from left to right).
[Figure 6-17 panels (left to right): MSR 3;8, mother MSR and R3 CDS MSR; MSR 4;2, mother MSR and R3 CDS MSR; MSR 4;5, mother MSR and R3 CDS MSR. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*a (dB).]
Figure 6-17 A comparison of AN’s vocal effort for /i/ in different consonantal contexts in MSR (based on
median A2*a, dB) to that of her mother and the experimenter (R3 in child-directed speech).
6.3.3.1.5 MSR/SSE differentiation for /u/ and /ʉ/
To establish whether AN produced a crosslinguistic difference in vocal effort applied to SSE /ʉ/ and MSR /u/ before different consonants, and whether there was any age effect for this crosslinguistic difference, we entered all AN’s individual renditions of the carrier words with the targets /ʉ/ and /u/ in a multivariate ANOVA. The ANOVA had A2, A2*a and A2*c (dB) as dependent variables and three fixed factors: “FOLLOWING CONSONANT” (voiced fricative, voiced and voiceless stop), “LANGUAGE” (SSE and MSR) and “AGE” (3;8, 4;2 and 4;5). Since /ʉ/ and /u/ are crosslinguistically dissimilar in vowel quality, the normalisation method A2*c should be most relevant for this test. The set-up of the ANOVA required the use of mean values for each condition rather than the medians used in the comparison with the SSE monolingual children.
The results of the test are presented in Table 6-12. The test showed highly
significant main effects for the factors “FOLLOWING CONSONANT”, “LANGUAGE”
and “AGE” for the methods A2*a and A2*c. There was also a highly significant interaction
between the factors “FOLLOWING CONSONANT” and “LANGUAGE” for the same
methods. There were no other significant main effects or interactions. The direction of the main effects of the postvocalic conditioning on vocal effort in AN’s crosslinguistic production for /ʉ/ and /u/ is shown in Figure 6-18. The descriptive statistics for AN’s crosslinguistic production of vocal effort are found in Appendix S.
Table 6-12 Summary of the ANOVA results for the normalisation methods of vocal effort (A2, A2*a, A2*c,
dB) for the SSE vowel /ʉ/ and MSR /u/ as a function of the following consonant produced by the bilingual
subject AN in MSR and SSE.
Normalisation method | Main effects: Following consonant | Language | Age | Interaction: Language * Following consonant
A2 | F(2,516)=4.382, p<.05 | F(1,516)=36.467, p<.01 | F(2,516)=3.523, p<.05 | ns
A2*a | F(2,516)=9.510, p<.01 | F(1,516)=48.546, p<.01 | F(2,516)=6.222, p<.01 | F(2,516)=11.557, p<.01
A2*c | F(2,516)=9.458, p<.01 | F(1,516)=44.363, p<.01 | F(2,516)=6.139, p<.01 | F(2,516)=11.649, p<.01
[Figure 6-18 panels (left to right): SSE 3;8 and MSR 3;8; SSE 4;2 and MSR 4;2; SSE 4;5 and MSR 4;5. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*c (dB).]
Figure 6-18 AN’s crosslinguistic production of vocal effort for SSE /ʉ/ and MSR /u/ (based on mean A2*c,
dB) as a function of the following consonant (age is plotted from left to right).
The results showed that overall at ages 3;8 and 4;2 AN differentiated between SSE and MSR in context-dependent vocal effort for the vowels /ʉ/ and /u/. The patterns for these age samples seem to be acquired in the language-specific direction, similar to the adults in Figure 6-5. The significant effect of the factor “LANGUAGE” and the significant interaction between “FOLLOWING CONSONANT” and “LANGUAGE” confirm the pattern in the figure. The vocal effort patterns in the three MSR age samples all differ from one another, which agrees with the lack of a systematic Russian pattern in AN’s production of vocal effort.
However, AN’s crosslinguistic pattern at the age of 4;5 does not seem to show any language differentiation, and seems to follow the Russian pattern (see Figure 6-19). This fact and the variability of the Russian patterns possibly contributed to the significant main effect of the factor “AGE”. We showed in Section 6.3.3.1.3 that AN’s SSE pattern at the age of 4;5 differed from the monolingual results. From a longitudinal perspective, this pattern at the age of 4;5 is difficult to interpret, since at the earlier ages of 3;8 and 4;2 AN did differentiate between the two languages. It could be explained by individual variability in the production of vocal effort for this vowel set, since the pattern also varied between the individual SSE monolingual children. Therefore, we do not consider this age sample as evidence of language interaction from Russian.
Overall, AN’s data for the close rounded vowels at ages 3;8 and 4;2 agree with her language differentiation of vocal effort patterns for the vowel /i/ and for the /i/ versus /ɪ/ contrast.
[Figure 6-19 panels (left to right): MSR 3;8, mother MSR and R3 CDS MSR; MSR 4;2, mother MSR and R3 CDS MSR; MSR 4;5, mother MSR and R3 CDS MSR. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*c (dB).]
Figure 6-19 A comparison of AN’s vocal effort for /u/ in different consonantal contexts in MSR (based on
median A2*c, dB) to that of her mother (reading) and the experimenter (R3 in spontaneous speech).
6.3.3.1.6 Summary of AN’s results
The results for the bilingual subject AN suggest that, overall, she differentiated between her two languages in vocal effort in a way similar to the monolingual speakers. We observed no language interaction effects for this variable.
First of all, AN acquired the vocal effort pattern connected to the interaction of SVLR and prominence for the vowel /i/ in a way similar to the SSE monolingual peers. Like the monolingual children, she produced VLS-VF ratios (based on median A2*a, dB) of 3, 7 and 5 dB in the three longitudinal age samples, comparable to the monolingual values of 3, 4 and 4 dB. At the same time, AN produced significantly different, language-specific patterns of vocal effort for the vowel /i/ in her two languages, in a direction similar to that in the crosslinguistic adult data.
Unlike for the acquisition of SVLR (at age 3;8), no language interaction effects were observed for this variable. There were also no age effects. This means that AN had acquired the vocal effort patterns by the age of 3;8.
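Because A2*a is expressed in decibels, the "VLS-VF ratio" is a level difference rather than a quotient of dB values. Assuming standard decibel levels of the form $L = 10\log_{10}(P/P_{\mathrm{ref}})$ (an assumption of this illustration, not a definition taken from the measurement procedure), the reference cancels:

$$\Delta L_{\mathrm{VLS/VF}} = L_{\mathrm{VLS}} - L_{\mathrm{VF}} = 10\log_{10}\frac{P_{\mathrm{VLS}}}{P_{\mathrm{VF}}}$$

On this reading, AN’s differences of 3 dB and 7 dB would correspond to power ratios of roughly $10^{0.3}\approx 2.0$ and $10^{0.7}\approx 5.0$ (about 1.4 and 2.2 in amplitude), although the normalised measure should be interpreted with caution.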
Secondly, AN acquired a language-specific SSE pattern of laryngeal adjustment for the tense/lax contrast between the vowels /i/ and /ɪ/, whereby she produced significantly higher vocal effort (based on A2*b, dB) for the lax vowel /ɪ/ than for the tense /i/. In physiological terms, the acoustic results indicate that she produced a less breathy phonation for the lax vowel and a more breathy phonation for the tense vowel. Her results closely matched those of the SSE monolingual children: AN’s tense-lax ratios were 26.2 dB at the age of 3;8, 9.39 dB at the age of 4;2 and 21.6 dB at the age of 4;5, compared to 26.5, 16 and 21 dB for the monolingual peers. The results are consistent with AN’s language differentiation patterns for the vowel quality of the tense/lax contrast. Unlike the segmental quality of /i/ and /ɪ/, and like the context-dependent vocal effort for /i/, this suprasegmental variable showed no language interaction effects.
With regard to AN’s acquisition of the context-dependent vocal effort pattern for the close rounded vowels /ʉ/ and /u/, the results showed that at ages 3;8 and 4;2 AN differentiated between her two languages. The patterns in these age samples seem to be acquired in the language-specific direction, similarly to the adults and to the patterns observed for AN’s production of vocal effort for the vowel /i/. The significant effect of the factor “LANGUAGE” and the significant interaction between “FOLLOWING CONSONANT” and “LANGUAGE” confirmed the adult pattern. The vocal effort patterns in the three MSR age samples were different, which agrees with the lack of a systematic Russian pattern in AN’s production of vocal effort. However, AN’s crosslinguistic pattern at the age of 4;5 did not seem to show any language differentiation. It was unlike the overall SSE pattern of the monolingual children and seemed to follow the Russian pattern, although it did fall within the range of individual variation of the SSE monolingual children.
Several considerations arise regarding AN’s vocal effort pattern for the close rounded vowels at the age of 4;5. First, the pattern is difficult to explain from a longitudinal perspective, since at the earlier ages of 3;8 and 4;2 AN did produce a crosslinguistic difference, and her SSE pattern did not differ significantly from that of the monolingual children. Secondly, the SSE monolingual results for the SSE vowel /ʉ/ showed that not all the children produced the pattern (see Figure 6-12): subject C3 did not produce it in either age sample (3;4 and 3;11), and neither did C6 at 4;0. Therefore, the question arises whether this variability reveals a non-obligatory nature of this vocal effort pattern (i.e. is it just a tendency?), or whether it reveals methodological problems in measuring spectral balance in child speech generally, or in normalising for formant frequency shifts between vowels that differ considerably in formant structure in particular. After all, the pattern of vocal effort did seem to be more systematic for the unrounded vowel /i/, which is also crosslinguistically similar in formant structure. We return to these questions to some extent in the discussion chapter. At this point, suffice it to say that, given the uncertainty about these methodological issues, we cannot accept AN’s vocal effort pattern at the age of 4;5 as evidence of language interaction from Russian in SSE.
6.3.3.2 Subject BS
6.3.3.2.1 SSE /i/
We assess whether the bilingual subject BS acquired the system of vocal effort connected to the interaction of SVLR of the vowel /i/ and prominence in a way similar to the age-matched SSE monolingual children.
The set-up of the ANOVA was the same as in Section 6.3.3.1.1 for the bilingual subject AN. Similarly, the dependent variables were vocal effort represented by the median values of the normalisation methods A2, A2*a and A2*b (dB). Since we assessed only the target /i/ and there are no vowel quality changes involved in this comparison, the A2*a measure is most suitable for this test.
The results of the ANOVA are summarised in Table 6-13. The test showed a highly
significant main effect of the factor “FOLLOWING CONSONANT” for all three
normalisation methods similarly to the tests of the SSE monolingual children. There was
also a significant main effect of the factor “BILINGUALITY”, but no significant
interaction between the factors “FOLLOWING CONSONANT” and “BILINGUALITY”
for the measures of A2*a and A2*b. This lack of interaction showed that the direction of the
main effect of the “FOLLOWING CONSONANT” was the same in all age groups
irrespective of the factor “BILINGUALITY”, and there was only a difference in the
absolute level of the RMS-power observed after normalisation for the formant frequency
shifts.
The difference between BS’ and the SSE peers’ production for this variable per age
is shown in Figure 6-20. Descriptive statistics for BS are reported in Appendix Q. There
were no other significant main effects or interactions. The results showed that despite the
differences in absolute RMS-power levels measured between BS and the SSE
monolingual children (accounting for the significant effect of the factor bilinguality), BS
acquired the SSE pattern of vocal effort for the vowel /i/ in a way similar to the SSE
monolingual peers.
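The logic of this result, a main effect of “BILINGUALITY” without a “FOLLOWING CONSONANT” * “BILINGUALITY” interaction, can be illustrated descriptively: the two groups’ context profiles run parallel, shifted by a roughly constant offset. The Python sketch below uses invented group values, and `offsets`, `is_parallel` and the 0.5 dB tolerance are hypothetical; whether a departure from parallelism is meaningful is decided by the ANOVA, not by this check.

```python
# Hypothetical group A2*a values (dB) per following consonant:
# VF = voiced fricative, VS = voiced stop, VLS = voiceless stop.
CONTEXTS = ("VF", "VS", "VLS")
monolingual = {"VF": -28.0, "VS": -26.0, "VLS": -24.0}
bilingual = {"VF": -31.0, "VS": -29.0, "VLS": -27.0}

def offsets(a, b, keys=CONTEXTS):
    """Per-context level difference a - b in dB."""
    return [a[k] - b[k] for k in keys]

def is_parallel(a, b, keys=CONTEXTS, tol=0.5):
    """True if the offset is (nearly) the same in every context, i.e. no interaction."""
    d = offsets(a, b, keys)
    return max(d) - min(d) <= tol

print(offsets(monolingual, bilingual))      # [3.0, 3.0, 3.0]
print(is_parallel(monolingual, bilingual))  # True
```

Here the bilingual profile is simply 3 dB lower in every context: the absolute level differs, but the direction and size of the consonantal effect are the same.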
The results are consistent with BS’ acquisition of the vowel quality for /i/. However, we showed in Section 5.3.3.2.1 that BS did not start to acquire the SVLR pattern for vowel duration until the age of 4;5, when she produced a non-significant SVLR-like difference in the language-specific direction (but not yet of the language-specific extent).
Table 6-13 Summary of the ANOVA results for the normalisation methods of vocal effort for the SSE
vowel /i/ produced by the bilingual subject BS as compared to the SSE monolingual peers.
Normalisation method | Main effects: Following consonant | Bilinguality | Age | Interaction: Following consonant * Bilinguality
A2 | F(2,14)=10.798; p<.01 | F(1,7)=5.845; p<.05 | ns | F(2,14)=7.061; p<.01
A2*a | F(2,14)=19.351; p<.01 | F(1,7)=5.668; p<.05 | ns | ns
A2*b | F(2,14)=19.351; p<.01 | F(1,7)=6.936; p<.05 | ns | ns
[Figure 6-20 panels (left to right): BS 3;4 and SSE child 3;4 to 3;8; BS 3;8 and SSE child 3;9 to 4;1; BS 4;5 and SSE child 4;2 to 4;9. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*a (dB).]
Figure 6-20 Vocal effort for the vowel /i/ (based on mean A2*a, dB) as a function of the following
consonant produced by the subject BS in comparison to the age-matched SSE monolingual children in three
age samples.
Despite that, it seems that at the age of 3;4 BS produced the SSE vocal effort pattern in a way similar to the monolingual peers: she produced a less breathy phonation mode for the vowel before voiceless stops than in the other two contexts. Recall that BS did not produce language-specific patterns of postvocalic consonantal conditioning of duration compared to the monolingual peers. For vocal effort, by contrast, she produced VLS-VF ratios (based on median A2*a, dB) of 4, 3 and 6 dB at the three ages, comparable to the monolingual values of 3, 4 and 4 dB. This suggests that BS acquired the vocal effort pattern for this vowel before the SVLR pattern.
6.3.3.2.2 SSE /i/ compared to /ɪ/
We investigated whether BS acquired the laryngeal configuration for the SSE tense/lax vowels /i/ and /ɪ/ in a way similar to the monolingual peers. The laryngeal contrast involves producing a less breathy laryngeal configuration for the lax vowel /ɪ/ and a more breathy one for the tense vowel /i/. In acoustic terms, it involves higher A2*b levels (dB) for the lax vowel and lower A2*b levels for the tense vowel.
The ANOVA set-up was the same as in Section 6.3.3.1.2, except that the factor “AGE” was changed to match BS’ age samples: “3;4 to 3;11”, “4;0 to 4;4” and “4;5 to 4;9”. We report results for the three normalisation methods A2, A2*a and A2*b (dB). The measure A2*b is most relevant for this test, since it involves a comparison between two vowels that differ in formant structure.
The results of the ANOVA are summarised in Table 6-14. The descriptive statistics
of BS’ production are reported in Appendix R. The test showed a highly significant main
effect of the “TENSE/LAX VOWEL” for the measures of A2 and A2*b (and an almost
significant effect for A2*a). This means that BS acquired the tense/lax pattern of vocal
effort in a way similar to the monolingual peers. However, there was a highly significant
interaction between the factors “TENSE/LAX VOWEL” and “BILINGUALITY”. There
were no other significant main effects or interactions.
Table 6-14 Summary of the ANOVA results for vocal effort for the SSE vowels /i/ and /ɪ/ produced by the
bilingual subject BS in comparison to SSE monolingual peers.
Normalisation method | Main effects: Tense/lax vowel | Age | Bilinguality | Interaction: Tense/lax vowel * Bilinguality
A2 | F(1,7)=25.998, p<.01 | ns | ns | F(2,7)=11.166, p<.05
A2*a | F(1,7)=5.311, p=.055 | ns | ns | F(2,7)=20.532, p<.01
A2*b | F(1,7)=45.575, p<.01 | ns | ns | F(2,7)=6.186, p<.01
The direction of the interaction is shown in Figure 6-21. The figure shows that even
though BS produced a difference in laryngeal configuration between the tense and lax
vowels in the language-specific direction, the difference did not reach the same extent
compared to the monolingual children. This explains the significant interaction between
the factors “TENSE/LAX VOWEL” and “BILINGUALITY”.
BS produced tense-lax ratios for A2*b of between 2 and 6 dB, compared to 17 to 28 dB produced by the monolingual peers. Considering the results of BS’ acquisition of the segmental quality of the lax vowel /ɪ/ (Section 4.3.4.1), this finding for its vocal effort is not surprising, since BS realised only 35% of the adult targets /ɪ/ as [ɪ] and the rest as [i].
The dotted line in Figure 6-21 shows BS’ median A2*b values for the vowels grouped by their auditory labels as phonetically tense [i] or lax [ɪ] (with only 35% of the /ɪ/ targets labelled [ɪ]), while the solid line represents all (100%) phonological targets. Surprisingly, there is little difference in the vocal effort measures between the phonetic labels [i] and [ɪ] and the adult targets /i/ and /ɪ/ produced by BS. In fact, the ANOVA based on the vocal effort measures for the phonetic labels [i] and [ɪ] showed the same levels of significance as in Table 6-14. This shows that BS started to acquire the laryngeal contrast independently of vowel quality, and that the two properties are not necessarily intrinsically bound to each other.
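The comparison between phonological targets and auditory labels amounts to grouping the same tokens in two ways. In this minimal Python sketch the records are invented, "I" stands in for /ɪ/ and [ɪ], and `share_realised_as` and `tense_lax_difference` are hypothetical helpers; the difference is taken as the lax median minus the tense median A2*b, so that a positive value means higher A2*b for the lax vowel.

```python
from collections import Counter
from statistics import median

# Hypothetical tokens: (phonological target, auditory label, A2*b in dB).
# "I" stands in for /ɪ/ and [ɪ]; the values are invented.
records = [
    ("i", "i", -22.0), ("i", "i", -21.0), ("i", "i", -23.0),
    ("I", "I", -18.0), ("I", "i", -19.0), ("I", "i", -17.5),
]

TARGET, LABEL = 0, 1  # which field of a record to group by

def share_realised_as(records, target, label):
    """Percentage of tokens of a phonological target given a particular auditory label."""
    labels = Counter(lab for tgt, lab, _ in records if tgt == target)
    total = sum(labels.values())
    return 100.0 * labels[label] / total if total else 0.0

def tense_lax_difference(records, field):
    """Median A2*b of lax minus tense tokens, grouped by target or by label."""
    lax = median(r[2] for r in records if r[field] == "I")
    tense = median(r[2] for r in records if r[field] == "i")
    return lax - tense

print(round(share_realised_as(records, "I", "I"), 1))  # 33.3
print(tense_lax_difference(records, TARGET))           # 4.0
print(tense_lax_difference(records, LABEL))            # 3.0
```

If the laryngeal difference were simply a by-product of perceived vowel quality, the two groupings would be expected to diverge more; similar values under both groupings are what the finding above describes.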
[Figure 6-21 panels (left to right): SSE child 3;4 to 3;8, BS 3;4 /i/ /ɪ/ and BS 3;4 [i] [ɪ]; SSE child 3;9 to 4;1, BS 3;10 /i/ /ɪ/ and BS 3;10 [i] [ɪ]; SSE child 4;2 to 4;9, BS 4;5 /i/ /ɪ/ and BS 4;5 [i] [ɪ]. x-axis: vowel i, ɪ; y-axis: vocal effort (dB).]
Figure 6-21 Vocal effort of the vowels /i/ and /ɪ/ (based on mean A2*c, dB) produced by the bilingual subject
BS compared to the SSE monolingual peers in three age samples (BS’ targets /i/ and /ɪ/ are plotted separately
from the phonetic labels [i] and [ɪ]).
The reduced extent of BS’ differentiation of vocal effort between the SSE tense and
lax vowels parallels her patterns of postvocalic vowel duration conditioning in SSE, and is
in line with her substantially greater exposure to Russian.
Yet it is interesting to note that BS did produce a systematic difference in laryngeal configuration between the two targets /i/ and /ɪ/, despite the relative lack of differentiation in duration at the age of 3;4. This may indicate that BS was in the process of acquiring the language-specific laryngeal configuration of tense and lax vowels, and that this happened before the acquisition of the postvocalic conditioning of duration.
6.3.3.2.3 SSE /ʉ/
We assess whether BS acquired the system of vocal effort connected to the interaction of SVLR of the vowel /ʉ/ and prominence in a way similar to the SSE monolingual peers.
The set-up of the ANOVA was the same as in Section 6.3.3.1.3, except that the factor “AGE” had levels specifically matched to BS’ age samples: “3;4 to 3;11”, “4;0 to 4;4” and “4;5 to 4;9”.
The test showed no significant main effects or interactions for any of the normalisation methods reflecting vocal effort (A2, A2*a and A2*c, dB). The descriptive statistics for BS’ speech production are reported in Appendix S. The median values of A2*a for /ʉ/ in each of the consonantal contexts are plotted in Figure 6-22. The lack of significance for the factor “FOLLOWING CONSONANT”, which we systematically observed in the tests for the SSE adults, the monolingual children and the subject AN, can be explained by the joint effect of (1) BS’ quite variable longitudinal production (see Figure 6-22), (2) the variability of the monolingual children themselves in producing the vocal effort pattern for /ʉ/, and (3) the relatively small sample of children tested in this study.
[Figure 6-22 panels (left to right): BS 3;4 and SSE child 3;4 to 3;8; BS 3;10 and SSE child 3;9 to 4;1; BS 4;5 and SSE child 4;2 to 4;9. x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: vocal effort, A2*a (dB).]
Figure 6-22 Vocal effort for the vowel /ʉ/ (based on mean A2*a, dB) as a function of the following
consonant produced by the subject BS in comparison to the SSE monolingual peers in three age samples.
The lack of significant effects in this test suggests that BS had not acquired the fine-grained SSE vocal effort pattern for /ʉ/. This is in line with the fact that she had not yet acquired the SVLR pattern for this vowel, and with the relatively high percentage (17%) of phonetically back vowels she produced for SSE /ʉ/ compared to the monolingual children.
However, we should also remember that not all of the monolingual children produced this vocal effort pattern by the age of 4;0, and that generally this vowel was less adult-like in vowel quality than the vowels /i/ or /ɪ/. Thus, the result of this test does not necessarily prove that BS was different in producing vocal effort if we keep in mind the individual results of the monolingual peers.
6.3.3.2.4 MSR/SSE differentiation for /i/
To establish whether BS produced a crosslinguistic difference in vocal effort for the vowel /i/ in SSE and MSR in different consonantal contexts, and whether there was any age effect on this crosslinguistic difference, we entered all BS’ renditions of the words with the target /i/ in a multivariate ANOVA. The ANOVA had A2, A2*a and A2*b (dB) as dependent variables and three fixed factors: “FOLLOWING CONSONANT” (voiced fricative, voiced stop and voiceless stop), “LANGUAGE” (SSE and MSR) and “AGE” (3;4, 3;10 and 4;5). Since the SSE and MSR targets /i/ are similar in formant structure, the normalisation method A2*a is the most suitable for this test. As in AN’s tests, the multivariate ANOVA required the use of mean values.
The results of the tests are summarised in Table 6-15. They showed a highly significant main effect of the factor “AGE” and significant main effects of the factors “LANGUAGE” and “FOLLOWING CONSONANT” for the methods A2*a and A2*b. There were no significant interactions between any of the factors. Since there was no significant interaction between “FOLLOWING CONSONANT” and “LANGUAGE”, BS’ two languages were not as clearly differentiated as in the monolingual adult data and in AN. Even though the factor “LANGUAGE” was significant, the significant main effect of “FOLLOWING CONSONANT”, in the absence of an interaction, shows that the consonantal effect on the vocal effort of /i/ went largely in the same direction in both of BS’ languages.
The direction of the main effects per age and language is plotted in Figure 6-23. The descriptive statistics are presented in Appendix R. The figure shows that BS’ MSR vocal effort had variable patterns across the three age samples, while her SSE vocal effort systematically showed lower A2*a levels for /i/ before voiced fricatives than before voiced and voiceless stops. The crosslinguistic difference became greater at the age of 4;5, which may at least partly account for the highly significant age effect.
Table 6-15 Summary of the ANOVA results for the normalisation methods of vocal effort (A2, A2*a, A2*b, dB) for the vowel /i/ produced by the bilingual subject BS in MSR compared to SSE.

Normalisation   Main effects
method          Following consonant       Age                        Language
A2              ns                        F(1,610)=5.909, p<.05      ns
A2*a            F(2,610)=3.066, p<.05     F(1,610)=10.455, p<.01     F(2,610)=3.313, p<.05
A2*b            F(2,610)=3.066, p<.05     F(1,610)=10.717, p<.01     F(2,610)=3.313, p<.05

[Figure 6-23: three panels (SSE vs MSR at 3;4, 3;10 and 4;5); x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: A2*a, dB (-40 to 0).]
Figure 6-23 BS’s crosslinguistic production of vocal effort for the vowel /i/ (based on mean A2*a, dB) as a function of the following consonant (age is plotted from left to right).
The vocal effort patterns in Figure 6-23 show a longitudinal progression towards greater language differentiation from the age of 3;4 to the age of 4;5, which explains the significant main effect of “AGE”. Like AN, and irrespective of her age, BS seemed to spend less vocal effort on the vowels before voiced fricatives than before voiced and voiceless stops in SSE, while in Russian her patterns varied considerably across the three age samples.
A comparison of BS’ context-specific vocal effort, shown in Figure 6-24, with that of her mother and of the Russian-speaking investigator in child-directed speech shows that at age 4;5 BS’ vocal effort pattern was quite similar to that of the adults.
[Figure 6-24: three panels (BS’ MSR at 3;4, 3;10 and 4;5), each plotted with her mother’s MSR and R3’s child-directed MSR; x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: A2*a, dB (-40 to 0).]
Figure 6-24 A comparison of BS’s vocal effort for /i/ in different consonantal contexts in MSR (based on A2*a, dB) to that of her mother (read speech) and experimenter (R3 spontaneous speech).
Overall, despite the lack of significant interaction between “FOLLOWING
CONSONANT” and “LANGUAGE”, the patterns and results of the tests observed in this
section suggest that BS differentiated to a certain degree between her two languages from
the age of 3;4 for the production of vocal effort based on mean A2*a, and that this
differentiation became more substantial at the age of 4;5.
6.3.3.2.5 MSR/SSE differentiation for /u/ and /ʉ/
To establish whether BS produced a crosslinguistic difference in vocal effort between the SSE vowel /ʉ/ and MSR /u/ in different consonantal contexts, and whether there was any age effect on this crosslinguistic difference, we entered all BS’ individual renditions of the carrier words with the close rounded targets in a multivariate ANOVA. The ANOVA had A2, A2*a and A2*c (dB) as dependent variables and three fixed factors: “FOLLOWING CONSONANT” (voiced fricative, voiced stop and voiceless stop), “LANGUAGE” (SSE and MSR) and “AGE” (3;4, 3;10 and 4;5). Since the SSE /ʉ/ and
MSR /u/ are dissimilar in formant structure, the normalisation method A2*c is most
relevant for this test.
The results of the test are summarised in Table 6-16. The test showed significant main effects of the factors “FOLLOWING CONSONANT”, “LANGUAGE” and “AGE” for the methods A2*a and A2*c. There was a highly significant interaction between the factors “FOLLOWING CONSONANT” and “LANGUAGE”, but only for the method A2, not for A2*a or the more relevant A2*c. The direction of the main effects of postvocalic conditioning on vocal effort in BS’ crosslinguistic production of /ʉ/ and /u/ is shown in Figure 6-25. The descriptive statistics for BS’ crosslinguistic production of vocal effort are reported in Appendix S. The age-specific patterns showed that BS seemed to differentiate between her languages in absolute levels of A2*a, but not in the direction of the effect of the following consonant. This pattern differs from the crosslinguistic patterns in the adult data in Figure 6-5. Moreover, the patterns in both languages varied considerably with BS’ age, so that neither language showed a systematic pattern.
The significant main effect of “LANGUAGE” and the lack of interaction between “LANGUAGE” and “FOLLOWING CONSONANT” support the pattern observed in Figure 6-25: BS seemed to differentiate between the languages in the absolute levels of vocal effort, but not in the direction of the effect of the following consonant. This result contrasts with BS’ language differentiation in vocal effort for the vowel /i/, and parallels the non-differentiation pattern of vocal effort observed for the subject AN for the same vowel set.
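The distinction drawn above between a difference in absolute levels (a main effect of “LANGUAGE”) and a difference in direction (a “LANGUAGE” by “FOLLOWING CONSONANT” interaction) can be illustrated with the additive decomposition of a two-way table of cell means. The sketch below is a simplified, univariate illustration only: the thesis used a multivariate ANOVA on A2, A2*a and A2*c with “AGE” as a third factor, and the cell means here are hypothetical values, not BS’ data. When the two languages differ only by a constant shift, every interaction residual is zero and the whole difference appears as a language effect.

```python
from statistics import mean


def two_way_decomposition(cells):
    """Split a two-way table of cell means into additive effects.

    cells maps (language, context) -> cell mean (e.g. a hypothetical A2*a in dB).
    Returns the grand mean, the language effects, the context effects and the
    interaction residuals (cell mean minus the additive prediction).
    """
    languages = sorted({lang for lang, _ in cells})
    contexts = sorted({ctx for _, ctx in cells})
    grand = mean(cells.values())
    lang_eff = {l: mean(cells[(l, c)] for c in contexts) - grand for l in languages}
    ctx_eff = {c: mean(cells[(l, c)] for l in languages) - grand for c in contexts}
    interaction = {
        (l, c): cells[(l, c)] - (grand + lang_eff[l] + ctx_eff[c])
        for l in languages
        for c in contexts
    }
    return grand, lang_eff, ctx_eff, interaction


if __name__ == "__main__":
    # Hypothetical cell means (dB): MSR is SSE shifted down by 5 dB,
    # so the context profile has the same direction in both languages.
    parallel = {
        ("SSE", "voiced fricative"): -25, ("SSE", "voiced stop"): -20,
        ("SSE", "voiceless stop"): -18,
        ("MSR", "voiced fricative"): -30, ("MSR", "voiced stop"): -25,
        ("MSR", "voiceless stop"): -23,
    }
    grand, lang_eff, ctx_eff, interaction = two_way_decomposition(parallel)
    print(lang_eff)     # level difference between the languages
    print(interaction)  # residuals are zero: no difference in direction
```

In the parallel example the language effects are +2.5 and -2.5 dB and all residuals vanish; if the two languages' context profiles crossed instead, the difference would move into the residuals, which is what a significant interaction detects.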
Table 6-16 Summary of the ANOVA results for the normalisation methods of vocal effort (A2, A2*a, A2*c, dB) for the SSE vowel /ʉ/ and MSR /u/ as a function of the following consonant produced by the bilingual subject BS in MSR and SSE.
Normalisation
Method
Main Effects
Following Consonant
Age
F(2,491)=5.517,
A2
p<.01
ns
A2*a
A2*c
Interactions of Following
Consonant with
Language
Age
Language
F(2,491)=9.348
p<.01
Ns
ns
F(1,491)=132.978,
F(2,491)=5.251, F(2,491)=3.8, p<.01
F(4,491)=2.50
F(1,491)=124.598, 5, p<.05
p<.01
p<.05
ns
p<.01
[Figure 6-25: three panels (SSE vs MSR at 3;4, 3;10 and 4;5); x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: A2*c, dB (-40 to 0).]
Figure 6-25 BS’ crosslinguistic production of vocal effort for SSE /ʉ/ and MSR /u/ (based on mean A2*c, dB) as a function of the following consonant (age is plotted from left to right).
[Figure 6-26: three panels (BS’ MSR at 3;4, 3;10 and 4;5), each plotted with her mother’s MSR and R3’s child-directed MSR; x-axis: following consonant (voiced fricative, voiced stop, voiceless stop); y-axis: A2*a, dB (-40 to 0).]
Figure 6-26 A comparison of BS’s vocal effort for /u/ in different consonantal contexts in MSR (based on A2*a, dB) to that of her mother (read speech) and experimenter (R3, in spontaneous speech).
Figure 6-26 compares BS’ production of context-dependent vocal effort in MSR with her mother’s patterns and with those of the Russian-speaking investigator in a more spontaneous elicitation mode. BS’ pattern is quite dissimilar to that of her mother, but very similar to that of the investigator. This discrepancy suggests again that the Russian pattern of vocal effort may be more variable than the SSE one. Both BS and the investigator produced their utterances in the same elicitation mode, which might explain the similarity of their patterns.
6.3.3.2.6 Summary of BS’ results
The results for the acquisition of crosslinguistic vocal effort patterns by the bilingual subject BS varied depending on the vowel set concerned.
First of all, BS acquired the context-dependent vocal effort pattern for the vowel /i/ in a way similar to the SSE monolingual peers. In the three age samples she produced VLS-VF ratios (based on median A2*a, dB) of 4, 3 and 6 dB, similar to the monolingual children’s 3, 4 and 4 dB. At the same time, BS differentiated between her two languages to a lesser extent, though she produced the SSE and MSR patterns of vocal effort for /i/ in a direction similar to that observed in the crosslinguistic adult data: i.e. a less breathy laryngeal configuration for /i/ before voiceless stops than in the two other contexts. No language interaction effects were observed for this variable. This means that BS produced language-specific vocal effort patterns for the vowel /i/ by the age of 3;4.
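Because A2-based measures are levels in dB, a ‘ratio’ between two contexts is expressed as a difference of dB values. A minimal sketch of the VLS-VF comparison follows, assuming that each token is stored as a (following consonant, level in dB) pair and that the medians per context are subtracted, as the phrase “based on median A2*a” suggests; the function name and the token values are invented for illustration.

```python
from statistics import median


def context_difference_db(tokens, upper="voiceless stop", lower="voiced fricative"):
    """Median level before `upper` minus median level before `lower`, in dB.

    tokens: iterable of (following_consonant, level_db) pairs, e.g. A2*a values.
    A positive result means higher levels before `upper` than before `lower`.
    """
    by_context = {}
    for context, level in tokens:
        by_context.setdefault(context, []).append(level)
    for required in (upper, lower):
        if required not in by_context:
            raise ValueError(f"no tokens for context {required!r}")
    return median(by_context[upper]) - median(by_context[lower])


# Hypothetical A2*a tokens (dB) for one age sample.
sample = [
    ("voiceless stop", -20.0), ("voiceless stop", -22.0), ("voiceless stop", -18.0),
    ("voiced fricative", -24.0), ("voiced fricative", -25.0), ("voiced fricative", -23.0),
    ("voiced stop", -19.0),
]
print(context_difference_db(sample))  # 4.0
```

Medians rather than means keep a single breathy or clipped token from dominating a context with few renditions.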
The fact that BS seemed to have acquired the vocal effort pattern for the vowel /i/ is surprising, given that she did not start differentiating between the crosslinguistic patterns of postvocalic vowel duration conditioning until the age of 4;5, and that she seemed to produce the postvocalic conditioning of vowel duration in SSE according to the Russian model. This result may therefore suggest that BS’ acquisition of the suprasegmental laryngeal contrast in SSE precedes her acquisition of language-specific timing.
For the laryngeal difference between the SSE tense and lax vowels /i/ and /ɪ/, BS produced a difference between the two vowels in the language-specific direction, but the difference was smaller than in the monolingual children. BS produced tense-lax ratios for A2*b of between 2 and 6 dB, compared with 17 to 28 dB produced by the monolingual peers. Considering the results of BS’ acquisition of vowel quality discussed in Section 4.3.4.1, this finding at the laryngeal level of the tense/lax
vowel contrast is not surprising, since BS realised only 35% of adult targets /ɪ/ as [ɪ] and the rest as [i]. This means that BS fully differentiated neither the vowel quality nor the laryngeal configuration accompanying it. Her results showed that she was in the process of acquiring both the segmental and the laryngeal differences between the vowels, but had not yet fully acquired either of them.
The question arises whether the acquisition of segmental vowel quality is a necessary condition for the acquisition of the accompanying laryngeal difference, or whether it is the other way around. Analysis of a subset of BS’ vowels actually produced as [i] and [ɪ] (rather than all targets /i/ and /ɪ/) did not seem to change the picture at the laryngeal level. This suggests that the two levels of BS’ tense/lax vowels, vowel quality and accompanying vocal effort, are acquired separately, and that the acquisition of postvocalic conditioning of duration is yet to start.
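The subset analysis mentioned above amounts to regrouping tokens by their transcribed realisation rather than by their adult target. The sketch below assumes a hypothetical token record with a target phoneme, a transcribed phone and an A2*b value; the field names, function names and numbers are illustrative only and are not taken from the thesis data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    target: str     # adult target phoneme, e.g. "ɪ"
    realised: str   # transcribed realisation, e.g. "i" or "ɪ"
    a2b_db: float   # hypothetical normalised A2 level (dB)


def percent_realised(tokens, target, phone):
    """Percentage of tokens with adult target `target` transcribed as `phone`."""
    relevant = [t for t in tokens if t.target == target]
    if not relevant:
        raise ValueError(f"no tokens with target {target!r}")
    hits = sum(1 for t in relevant if t.realised == phone)
    return 100.0 * hits / len(relevant)


def tense_lax_difference(tokens, by="target"):
    """Mean A2*b of /i/ minus mean A2*b of /ɪ/, grouping by target or by realisation."""
    if by not in ("target", "realised"):
        raise ValueError("by must be 'target' or 'realised'")
    tense = [t.a2b_db for t in tokens if getattr(t, by) == "i"]
    lax = [t.a2b_db for t in tokens if getattr(t, by) == "ɪ"]
    return mean(tense) - mean(lax)


# Hypothetical tokens: 20 targets /ɪ/, 7 of them transcribed as [ɪ].
tokens = (
    [Token("i", "i", -20.0)] * 4
    + [Token("ɪ", "ɪ", -30.0)] * 7
    + [Token("ɪ", "i", -22.0)] * 13
)
print(percent_realised(tokens, "ɪ", "ɪ"))  # 35.0
print(tense_lax_difference(tokens, by="target"),
      tense_lax_difference(tokens, by="realised"))
```

Grouping by realisation separates the laryngeal difference from vowel quality errors; for BS, the text reports that this regrouping did not change the picture at the laryngeal level.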
The fact that BS did not reach the same extent of tense/lax vowel differentiation at the laryngeal level as the SSE monolingual children is reminiscent of the vowel duration patterns observed in Kehoe’s (2002) study, in which German/Spanish bilingual children aged 2;3 to 2;6 produced a significantly smaller durational difference between short and long vowels than German monolingual children.
The third acquisition pattern concerned the consonant-dependent vocal effort for the SSE vowel /ʉ/ and the MSR /u/. The ANOVA showed no language differentiation for this variable, given the lack of a significant interaction between BS’ languages and the factor “FOLLOWING CONSONANT”. Moreover, the comparison of BS’ patterns of vocal effort for /ʉ/ (based on median A2*c, dB) with those of the SSE monolingual children showed no significant effects or interactions, which suggests that BS had not acquired the fine-grained SSE vocal effort pattern. This is in line with the fact that BS had not yet acquired the durational SVLR pattern for this SSE vowel. However, as in AN’s data, interpreting this apparent lack of language differentiation as evidence of language interaction is problematic, since not all the SSE monolingual children seemed to produce the pattern of vocal effort for the vowel /ʉ/.
7 Discussion and Conclusion
7.1 Overview of the main findings
7.1.1 Language differentiation and interaction patterns
The study examined the language differentiation and interaction patterns in the speech of two early simultaneous bilinguals, BS (aged 3;4 to 4;5) and AN (aged 3;8 to 4;5). The bilingual girls were acquiring Russian and Scottish English in Edinburgh in Russian-speaking families with similar sociolinguistic backgrounds. However, the subjects differed in the amount of language input received by the start of recordings: BS (Figure 3-1) had substantially less input in Scottish English than AN (Figure 3-2).
We examined in detail their production of the nuclear vowels of prominent syllables, /i ɪ ʉ/ in Scottish English versus /i u/ in Russian, for one segmental aspect (vowel quality) and two suprasegmental aspects (vowel duration and vocal effort). The set-up of the study was varied to trigger potential language differentiation and interaction effects based on crosslinguistic structure, language input conditions and longitudinal effects. The subjects produced variable degrees of language differentiation and interaction depending on their age, language exposure, crosslinguistic structure and the variable concerned.
We formulated four research questions for this study, namely: (1) Are the languages
differentiated? (2) Is their SSE native-like compared to the SSE-speaking children and
adults? (3) Is their MSR native-like compared to the MSR-speaking adults (including
mothers)? (4) Is there language interaction? (What are the patterns?).
The results of the study are summarised in Table 7-1, which gives “yes/no” answers to the above questions based on a combination of statistical results from three comparisons: (1) of each bilingual child’s speech with that of the SSE monolingual children (based on ANOVAs), (2) with that of the MSR adults (based on descriptive statistics), and (3) of each subject’s two languages with each other (based on ANOVAs).
The table gives the answers to the research questions for a total of eight research variables across different vowel sets and levels of speech production (vowel quality, duration and vocal effort). The results are shown per subject, age sample, research variable and vowel set.
The language differentiation effects can be split into three groups based on Table
7-1: (1) total differentiation, when a subject’s speech production was within the range of
the SSE monolingual peers and MSR adults, and both languages differed from each other
in the expected direction; (2) partial differentiation, when the subject’s languages differed
from each other in the expected direction, but one of the languages differed from the
monolingual controls; (3) lack of differentiation, when neither language differed from the
other in the expected direction, and either one or both languages differed from the
controls. Language interaction (accounted for in Chapters 4-6) appeared in the sound
structures with partial or lacking language differentiation.
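The three groups can be read as a decision rule over the answers in Table 7-1. The function below is a sketch of that rule as stated in the text (the function name is illustrative); combinations the text does not cover, such as undifferentiated languages that are both native-like, or SSE-only variables where differentiation is not applicable, return None rather than a guessed label.

```python
from typing import Optional


def classify_differentiation(differentiated: Optional[bool],
                             sse_native: bool,
                             msr_native: Optional[bool]) -> Optional[str]:
    """Classify one cell of Table 7-1 as 'total', 'partial' or 'lack'.

    None for `differentiated` or `msr_native` means 'not applicable'.
    """
    if differentiated is None or msr_native is None:
        return None  # SSE-only variables fall outside the three groups
    deviations = (not sse_native) + (not msr_native)  # languages outside the control range
    if differentiated and deviations == 0:
        return "total"
    if differentiated and deviations == 1:
        return "partial"
    if not differentiated and deviations >= 1:
        return "lack"
    return None  # combination not covered by the three definitions


# BS at 3;4, vowel quality (Yes / No / Yes in Table 7-1):
print(classify_differentiation(True, False, True))  # partial
```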
For AN, Table 7-1 shows that out of the eight variables, at the age of 3;8 AN lacked language differentiation for the two variables involving postvocalic conditioning of vowel duration: SSE/MSR /i/, and SSE /ʉ/ versus MSR /u/. At the same age, there were four language interaction patterns, two of which were due to partial language differentiation. At the age of 4;2 only one variable, the postvocalic conditioning of duration of SSE /ʉ/ and MSR /u/, lacked language differentiation, while there were three language interaction patterns. At the age of 4;5, AN fully differentiated between her MSR and SSE for all eight variables, and no more language interaction was observed for these variables.
Overall for AN, the language interaction patterns involved only two research variables, vowel quality and duration, of which vowel duration was affected most because language differentiation was lacking (rather than partial). No language interaction effects were found for AN’s vocal effort patterns. She differentiated between her languages for all three variables involving vocal effort, except that no definite answer could be given for her vocal effort pattern of SSE /ʉ/ and MSR /u/ at the age of 4;5 (“?” in the table) due to potential methodological problems (see Section 7.1.2.6).
Table 7-1 also shows, for this subject, who was ‘balanced’ with regard to language exposure, three quite divergent directions of language interaction, even though the extent of this interaction was quite marginal (especially for the vowel quality variables).
The uni-directional interaction from SSE to MSR involved AN’s production of vowel quality. AN introduced into her MSR a lax vowel [ɪ], which does not exist in MSR, splitting the MSR vowel phoneme /i/ into two phones, [i] (90%) and [ɪ] (10%), while acquiring the SSE tense/lax contrast similarly to the monolingual peers. To the best of our knowledge, this is the first report of systematic language interaction involving tense/lax vowels of this
direction (apart from a note of a similar phenomenon in Keshavarz and Ingram (2002),
see section 2.2.2. for discussion).
The uni-directional interaction from MSR to SSE involved postvocalic conditioning
of vowel duration of /i/. The pattern was very similar to the ‘reduced’ pattern (compared
to monolingual peers) of intrinsically short-long vowel duration in German-Spanish
bilinguals in Kehoe’s (2002) study (see discussion in Section 2.3.2).
The bi-directional interaction from SSE to MSR and MSR to SSE involved the postvocalic conditioning of vowel duration of SSE /ʉ/ and MSR /u/. We have not found reports of this type of interaction in studies of early simultaneous bilingual acquisition.
However, similar bi-directional effects were reported on pitch alignment and intonation of
proficient L1 Dutch learners of L2 Greek (Mennen, 2004). There are also studies
reporting bi-directional cross-language effects in VOT (Caramazza et al., 1973; Flege,
1987; Williams, 1980).
Despite the internal divergence of the direction of language interaction within AN’s
speech production, all these patterns are backed up by the literature to some extent, and in
this sense they are coherent.
For the Russian-‘dominant’ subject BS (Table 7-1) there was also a longitudinal trend across the eight research variables. However, she differed substantially from AN in that she showed less language differentiation. She also produced more language interaction effects, and the overall direction of her language interaction was different. Like AN, at the age of 3;4 BS lacked language differentiation for the two variables involving postvocalic conditioning of vowel duration: SSE/MSR /i/, and SSE /ʉ/ versus MSR /u/. At the same age, she produced six language interaction effects (involving partial or lacking differentiation). The language interaction involved all three research variables (vowel quality, duration and vocal effort), of which vowel duration was affected most due to the total lack of language differentiation. At the age of 3;10 the situation changed neither in systematicity nor in extent. At the age of 4;5, BS differentiated between her languages. However, she still produced five language interaction effects due to partial differentiation.
Table 7-1 shows that BS did not show any language interaction effects in MSR: all her interaction effects were uni-directional from MSR to SSE. The effects in SSE were quite extensive in that they affected all three research variables, vowel quality, duration and vocal effort, each to a varying degree.
BS’ results for the acquisition of SVLR for the vowels /i/ and /ɪ/ both differ from and overlap with AN’s results for these vowels. In statistical terms, the difference was in the
greater significance of the factor ‘bilinguality’ in BS’ case, both in comparison to AN and to the SSE peers. The overlap was in the direction of language interaction observed for these vowels: at a younger age both subjects produced a somewhat reduced extent of SVLR (compared to the monolingual peers) between the contexts before voiceless stops and voiced fricatives, i.e. their VLS/VF ratios were greater than either the maximal or the average values of the SSE children. For AN, the reduced extent was only observed at the youngest age of 3;8. For BS, the reduced pattern persisted throughout the three age samples (3;4, 3;10 and 4;5). Once again, the ‘reduced’ SVLR in the speech production of both subjects agrees with Kehoe’s (2002) study, which showed that German/Spanish bilingual children aged 2;3 to 2;6 produced a significantly smaller durational difference between short and long vowels than German monolingual children.
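The ‘reduced’ SVLR criterion can be written out as a comparison of one child’s VLS/VF duration ratio with the ratios of the monolingual peers. This is a sketch under stated assumptions: durations are in milliseconds, the ratio is taken between mean durations (the text does not say here whether means or medians were used), and the function names and numbers are hypothetical. Because vowels are longer before voiced fricatives, the ratio is below 1; a ratio closer to 1 means less lengthening, i.e. a smaller SVLR effect.

```python
from statistics import mean


def vls_vf_ratio(durations_ms):
    """Mean vowel duration before voiceless stops divided by that before voiced fricatives."""
    return mean(durations_ms["voiceless stop"]) / mean(durations_ms["voiced fricative"])


def reduced_svlr(child_ratio, peer_ratios):
    """Return (above the peer maximum, above the peer average) for a child's VLS/VF ratio."""
    return child_ratio > max(peer_ratios), child_ratio > mean(peer_ratios)


child = {"voiceless stop": [100, 110, 120], "voiced fricative": [160, 180, 200]}
peers = [0.50, 0.55, 0.60]  # hypothetical VLS/VF ratios of monolingual children
ratio = vls_vf_ratio(child)
print(round(ratio, 3), reduced_svlr(ratio, peers))  # 0.611 (True, True)
```

Reporting both comparisons mirrors the text’s wording (“greater than either maximal or average values”): exceeding the peer maximum is the stricter criterion.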
Figure 7-1 summarises the empirical findings for the bilingual subjects more abstractly. The two languages of each subject are presented as a cross-section comprising the subject-specific extent of speech immaturity and the language interaction effects (their extent and direction for the same set of variables).
In the following sections we shall address the differences and similarities between
the subjects with regard to different conditioning factors.
[Figure 7-1: two diagrams. AN’s model of SSE/MSR representation: SSE-like and MSR-like sound structures, each with an area of speech immaturity, with interaction from MSR and interaction from SSE. BS’s model of SSE/MSR representation: SSE-like and MSR-like sound structures, each with an area of speech immaturity, with interaction from MSR only.]
Figure 7-1 Visual footprint of BS’ and AN’s language differentiation in their two languages, speech immaturity and the direction of language interaction based on the results in this study.
Table 7-1 Patterns of language differentiation and interaction observed for the two bilingual subjects (BS and AN) in different age samples, for three research variables and two vowel sets.
Each cell gives four answers in the order: languages differentiated? / SSE native-like? / MSR native-like? / pattern of language interaction.

Research variable            BS 3;4    BS 3;10   BS 4;5    AN 3;8    AN 4;2    AN 4;5
Vowel quality
  SSE /i ɪ/ vs MSR /i/       Y N Y 1   Y N Y 1   Y N Y 1   Y Y N 2   Y Y N 2   Y Y Y 0
  SSE /ʉ/ vs MSR /u/         Y N Y 1   Y N Y 1   Y Y Y 0   Y Y N 2   Y Y N 2   Y Y Y 0
Vowel duration
  SSE/MSR /i/                N N Y 1   N N Y 1   Y N Y 1   N N Y 1   Y Y Y 0   Y Y Y 0
  SSE /ʉ/ vs MSR /u/         N N Y 1   N N Y 1   Y N Y 1   N N N 3   N N N 3   Y Y Y 0
  SSE /i ɪ/                  / N / 1   / N / 1   / N / 1   / Y / 0   / Y / 0   / Y / 0
Vocal effort
  SSE/MSR /i/                Y Y Y 0   Y Y Y 0   Y Y Y 0   Y Y Y 0   Y Y Y 0   Y Y Y 0
  SSE /ʉ/ vs MSR /u/         Y Y Y 0   Y Y Y 0   ? ? ? ?   Y Y Y 0   Y Y Y 0   ? ? ? ?
  SSE /i ɪ/                  / N / 1   / N / 1   / N / 1   / Y / 0   / Y / 0   / Y / 0

Abbreviations: Y = yes; N = no; / = not applicable; ? = unable to determine.
Patterns of language interaction:
0 none
1 uni-directional interaction from MSR to SSE
2 uni-directional interaction from SSE to MSR
3 bi-directional interaction from SSE to MSR and from MSR to SSE
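The interaction codes in Table 7-1 appear to follow a simple rule: code 1 when only the SSE native-likeness answer is ‘No’, code 2 when only the MSR answer is ‘No’, code 3 when both are ‘No’, and 0 when both are ‘Yes’. The sketch below assumes that rule (it is inferred from the table, not stated by the author), encodes the native-likeness answers of Table 7-1, and counts interaction effects per age sample so that the counts can be checked against the totals given in the text. All names in the code are illustrative.

```python
from typing import Optional, Tuple, Union

Y, N, NA = True, False, None  # 'Yes', 'No', 'not applicable'
UNKNOWN = "?"                 # 'unable to determine'
Cell = Union[Tuple[bool, Optional[bool]], str]


def interaction_code(cell: Cell) -> Optional[int]:
    """Infer the interaction code from (SSE native-like?, MSR native-like?)."""
    if cell == UNKNOWN:
        return None
    sse_native, msr_native = cell
    sse_deviates = not sse_native
    msr_deviates = msr_native is False  # 'not applicable' counts as no deviation
    return {(False, False): 0, (True, False): 1,
            (False, True): 2, (True, True): 3}[(sse_deviates, msr_deviates)]


# (SSE native-like?, MSR native-like?) per variable, in this order:
# quality /i ɪ/, quality /ʉ u/, duration /i/, duration /ʉ u/, duration /i ɪ/,
# vocal effort /i/, vocal effort /ʉ u/, vocal effort /i ɪ/.
TABLE_7_1 = {
    "BS 3;4":  [(N, Y), (N, Y), (N, Y), (N, Y), (N, NA), (Y, Y), (Y, Y), (N, NA)],
    "BS 3;10": [(N, Y), (N, Y), (N, Y), (N, Y), (N, NA), (Y, Y), (Y, Y), (N, NA)],
    "BS 4;5":  [(N, Y), (Y, Y), (N, Y), (N, Y), (N, NA), (Y, Y), UNKNOWN, (N, NA)],
    "AN 3;8":  [(Y, N), (Y, N), (N, Y), (N, N), (Y, NA), (Y, Y), (Y, Y), (Y, NA)],
    "AN 4;2":  [(Y, N), (Y, N), (Y, Y), (N, N), (Y, NA), (Y, Y), (Y, Y), (Y, NA)],
    "AN 4;5":  [(Y, Y), (Y, Y), (Y, Y), (Y, Y), (Y, NA), (Y, Y), UNKNOWN, (Y, NA)],
}


def count_interaction_effects(cells) -> int:
    """Number of cells with a non-zero, determinable interaction code."""
    return sum(1 for cell in cells if interaction_code(cell))


for sample, cells in TABLE_7_1.items():
    print(sample, count_interaction_effects(cells))
```

With this encoding the counts are 6, 6 and 5 for BS and 4, 3 and 0 for AN, which matches the numbers of interaction effects given in Section 7.1.1, and every code for BS is 0 or 1, i.e. uni-directional from MSR to SSE.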
7.1.2 Conditioning Factors of Language Differentiation and Interaction
7.1.2.1 The role of language input conditions versus language structure
In the introduction to the methods used in studies of Bilingual First Language Acquisition (BFLA), De Houwer (1998, p. 258) questions the use of the term ‘language dominance’ (Petersen, 1988; Lanza, 1992), because it is often equated with another concept, ‘proficiency’, which usually refers to the assessment of adult language skills. De Houwer rightly points out that linking ‘dominance’ to ‘proficiency’ is problematic with regard to immature child speech, and that ‘dominance’ is often assessed using monolingual measures, such as word- or morpheme-based MLU (Brown, 1973), given the lack of a baseline for crosslinguistic comparison. De Houwer further states that “it remains to be considered whether and to what extent the notion of ‘dominance’ is at all needed either as a descriptive or an explanatory concept with regard to very young bilingual children” (1998, p. 258).
Like Lanza (2000), we disagree with the latter statement, for the following reason. Whatever form ‘dominance’ takes in the mental representation of a bilingual, it has an environmental source: it should be shaped by the amount of exposure to the two languages and by the need to “communicate with people in the immediate environment” (Grosjean, 1982, p.189). Obviously there may be as many different language input situations as there are bilingual children. One limitation of the studies doubting the usefulness of the concept of ‘dominance’ (de Houwer, 1990; Döpke, 1998; Döpke, 2000; Müller, 1998) is that they have looked at the acquisition of morphosyntax only. As we have seen, phonological studies that have considered environmental conditioning of ‘dominance’ conclude that the factor may play a role in language differentiation and interaction (e.g. in the production of properties such as VOT or rhythm: Kehoe et al., 2001; Paradis, 2001). This conclusion is supported by the present study.
We defined the potential ‘language dominance’ of the subjects from the amount of exposure to the two languages rather than from output ‘proficiency’. We asked whether the observed patterns of language differentiation and interaction differ along this ‘language exposure’ dimension, and how this relates to the structural properties of the two languages in contact.
In BS’ case, with her substantial exposure to MSR (Figure 3-1), we observed overall uni-directional language interaction from her MSR into SSE (Table 7-1, Figure 7-1). Qualitatively, the language interaction effects were similar to the ‘transfer’ described in L2-acquisition studies. For vowel quality there is abundant evidence that L2-learners “under-differentiate” (Weinreich, 1953) phonological contrasts such as tense/lax vowels if these are absent (or represented by one phoneme) in their L1 (Panasyuk et al., 1995; Markus & Bond, 1999; Escudero, 2000; Guion, 2003; Piske et al., 2002). The pattern of language interaction in BS’ case was similar, since it involved the overuse of the tense vowel [i] for the SSE lax vowel /ɪ/. The direction of language interaction in BS’ case was compatible with the direction predicted by CCCH (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998) for simultaneous bilingual acquisition. According to both hypotheses, interaction is uni-directional, from the structurally simpler into the structurally more complex language (irrespective of ‘dominance’). The pattern of BS’ language interaction also agrees with her individual pattern of language exposure (more Russian than English).
However, the pattern observed for AN for vowel quality in this set of vowels was the opposite of BS’: in AN’s case the interaction was directed from SSE to MSR, where she “over-differentiated” (Weinreich, 1953) the tense/lax contrast, introducing a phonologically irrelevant tense/lax contrast into her Russian. Recall AN’s more ‘balanced’ language exposure pattern (Figure 3-2).
The mirror-image language interaction between the two subjects for the same structural ambiguity therefore cannot be explained in terms of the simplicity or complexity of the sound structures involved, but can rather be explained by the subjects’ different language exposure patterns.
Since both CCCH (Döpke, 1998; Döpke, 2000) and the Markedness Hypothesis (Müller, 1998) predict uni-directional language interaction for one and the same language structure (though it can be bi-directional for different structures), these hypotheses thus find no support in this study at the level of sound structure.
As opposed to the partial language differentiation in BS’ production of vowel quality for SSE /i ɪ/ and MSR /i/, we observed an overall lack of language differentiation for her vowel duration (setting aside a non-significant longitudinal change at the age of 4;5). Her patterns of language differentiation and interaction again resembled the “under-differentiation” effects observed in L2-learners of languages with complex vowel duration conditioning patterns. In the studies of vowel duration conditioning in L1
French learners of L2 English (Mack, 1982) and L1 Russian learners of L2 Latvian (Markus & Bond, 1999), the L2-learners produced ‘intermediate’ values for phonetically or phonologically long and short vowels, given the lack of such a system in their L1, or failed to produce them at all. Similarly, BS’ SSE SVLR targets /i/ were generally close to those in her MSR, i.e. she did not lengthen the vowel before voiced fricatives in the same way as the SSE monolingual children. Consequently, BS did not produce a differentiated postvocalic pattern for the lax vowel /ɪ/ compared to her production of SSE /i/. The language interaction was again clearly directed from her Russian into SSE.
AN’s crosslinguistic patterns of vowel duration conditioning for /i/ and /ɪ/, however, differed from those of BS only quantitatively: while BS did not seem to produce the SVLR pattern at all (at least not in the first two age samples), AN had a ‘reduced’ extent of SVLR for /i/ at the age of 3;8, similar to the patterns reported in Kehoe (2004). Besides, in SSE AN did produce differentiated postvocalic conditioning, with SVLR for /i/ and an invariably short /ɪ/. We can make two further comments regarding these findings. First, the difference in language interaction between the two subjects for this variable and set of vowels is quantitative, since the direction of interaction is the same. It is therefore plausible that the different amounts of language exposure of the two subjects are reflected in the different extents of language differentiation for the SSE vowel duration patterns in their speech production.
Secondly, there is the striking fact that AN’s vowel quality interaction
between the tense/lax vowels was unidirectional from SSE into MSR, while her SVLR vowel
duration pattern for /i/ showed the opposite unidirectional interaction, from MSR into SSE. This
means that in AN’s case, across the two variables of vowel quality and duration, the language
interaction effects were bi-directional within the speech production of the same subject. This
finding is problematic for the Language Dominance Hypothesis (Petersen, 1988) in
simultaneous bilingual acquisition, since it predicts unidirectional language interaction in
the linguistic output of the same individual.
The third language interaction effect, observed in AN’s postvocalic conditioning of
the duration of SSE /ʉ/ and MSR /u/ (Table 7-1), was bi-directional transfer, from
SSE to MSR and from MSR to SSE. Recall that she produced a more SVLR-like pattern in her
Russian, while producing a ‘reduced’ (compared to the monolingual children) SVLR
difference in SSE. This effect was observed at both 3;8 and 4;2, while at
4;5 she produced language-specific patterns.
This means that for the postvocalic vowel duration pattern the language interaction
effects were bi-directional within the same subject and sound structure variable.
This conclusion is problematic for the Language Dominance Hypothesis (Petersen,
1988), for the CCCH (Döpke, 1998; Döpke, 2000), and for the Markedness Hypothesis (Müller,
1998), since all of them predict only unidirectional language interaction for the same
language property. However, such bi-directional interaction is in line with findings of bi-directional transfer in the timing of intonation patterns in proficient L1 Dutch learners of L2
Greek (Mennen, 2004). In this sense, bi-directional interaction is not confined to early
simultaneous bilingual acquisition, and on this basis should not be used as an argument
for a functional distinction between early simultaneous bilinguals and L2-learners. This
finding shows that in more ‘balanced’ bilinguals the direction of interaction can be
‘fuzzy’, and both languages can be affected. Such fine-grained phonetic interaction in
variable speech production may not even be perceptible against the background of less
mature child speech.
To summarise so far: for more ‘balanced’ bilingual children (like AN), there seems to be
no necessary direction of ‘dominance’ at the level of sound structure, since the
language balance can be blurred for some less categorical variables like vowel duration.
The acquired sound structures are generally language-specific (and mainly
differentiated), but this depends on the sound structure in question. In some cases language
interaction can affect either of the languages and can even be bi-directional. Surface-structural ‘markedness’ or ‘cue strength’ does not necessarily determine the direction of
language interaction at the level of sound structure. Our data support Lanza’s suggestion
that the two possible conditioning factors of language interaction, i.e. structural properties
and environmental factors such as language exposure, need not be mutually exclusive
(Lanza, 2000, p. 233).
In that sense language interaction can be considered to be a ‘normal’, but not
obligatory feature of the simultaneous bilingual acquisition of sound structure.
It is important to emphasise that the CCCH (Döpke, 1998; Döpke, 2000) and the
Markedness Hypothesis were formulated on the basis of morphosyntactic acquisition studies,
while we dealt with an altogether different linguistic level, that of sound structure. Likewise, Paradis
& Genesee (1996) provided evidence for autonomous development based on studies
of syntactic structures. However, in two later studies Paradis (2000; 2001) refined the
claim of autonomous development, based on further evidence from French-English
truncation patterns at the level of prosodic structure, which did show language interaction
effects, as in this study. She attributed the differences between the two studies to
methodological issues, such as observation versus experimental manipulation, and to the
differences in the language pairs involved. We used two methods (auditory labelling and
instrumental acoustic analysis), yet most language interaction effects consistently showed
up for the same variables and subjects irrespective of the method used.
One obvious difference between morphosyntactic and sound structures is
that speech production is dual in nature: it embraces both the discrete
‘phonologised’ mental hierarchy of language units and continuous speech motor control,
whereas this dichotomy is absent in the production of morphosyntactic structures. It is
possible that some language interaction effects observed in this study are bound to the
level of sound structure because of this dual nature of speech production. However, this does
not mean that studies of non-speech levels of language should ignore the possibility
of such bi-directional interaction in discrete language properties.
7.1.2.2 Sound-structural effects
In the previous section we showed that, even though some language interaction
effects were compatible with the structural arguments proposed by the CCCH (Döpke, 1998;
Döpke, 2000) and the Markedness Hypothesis (Müller, 1998), structural
factors such as ‘markedness’ or ‘cue strength’ do not predict the direction of language
interaction at the level of sound structure.
In addition to possible problems with the language level involved (morphosyntax
versus sound structure), there is another reason why these structural factors
must be questioned. At the level of sound structure, the problem may lie in determining
markedness for segments in isolation. Segments and their suprasegmental properties (like
postvocalic vowel duration conditioning) are intertwined with other conditioning factors,
such as crosslinguistic differences in the final devoicing of phonologically voiced
obstruents (and the relative markedness of these factors).
For example, BS may have failed to produce the SSE-specific vowel duration pattern
(hence her lack of SVLR in SSE) because she produced complete final
devoicing, phonetically or in a phonologically neutralising way, more often than the
monolingual children (see also Section 3.4.1). Even if it is not neutralising, a
phonetically voiceless final consonant shortens the preceding vowel, the vowel being
by definition phonated. Neutralisation of final voicing might be complete (or more
complete) in Russian than in SSE in utterance-final positions (see e.g. Burton &
Robblee, 1997), since in this respect SSE is similar to American English and other British English varieties
(Docherty, 1992; Smith, 1997). Determining the ‘completeness/gradualness’ of the
neutralisation is not a trivial issue (especially in child speech), since it potentially requires
additional instrumental techniques such as airflow measurements or
electroglottography alongside auditory or acoustic analysis, which we could not
perform within the scope of this thesis. We have already discussed (Section 2.1.2.6)
that SVLR in SSE is conditioned by both voicing and manner of articulation; thus
completeness of devoicing alone would not explain the large durational difference
produced by either subject. However, this argument may apply to bilingual
acquisition studies of postvocalic conditioning dealing with languages that differ in
postvocalic conditioning based on voicing effect, as in German and SSBE (Whitworth,
2003), especially when such studies involve markedness in the discussion (Section 2.3.2).
We have seen in AN’s case that not all language interaction effects observed in the
vowels originate in the vowels themselves (cf. AN’s [] for MSR /u/ in Section 4.3.3.2.2); some may be
due to the phonotactic influence of the preceding consonant. This is another reason why it
can be misleading to study segments in isolation without considering the contextual
effects.
Sound structure did matter in the sense that some research variables (Table 7-1)
seemed to be more prone to language interaction than others, depending both on the level
of speech (vowel quality, duration and vocal effort) and on the vowel set concerned. For
example, at the segmental level, the crosslinguistic systemic difference in the vowel pair
/i ɪ/ showed more language interaction effects than the realisational difference between
SSE /ʉ/ and MSR /u/ for both subjects.
Our assessment of the SSE vowel set /i ɪ/ versus MSR /i/ was more comprehensive
than those of other monolingual, L2 or bilingual acquisition studies (Kehoe & Stoel-Gammon, 2001; Buder & Stoel-Gammon, 2002; Kehoe, 2002; Stoel-Gammon et al.,
1995). It was new in that, in addition to vowel quality
and/or duration, we also assessed the production of laryngeal effects connected to the vocal
effort accompanying these vowels. For this contrast, AN produced all vowel quality,
duration and vocal effort effects in a native-like way in SSE, and the contrast only
affected her vowel quality in Russian. At the same time, Russian-dominant BS did not
fully differentiate the SSE tense/lax vowels in vowel quality, duration, and vocal
effort, although she produced the contrast to some extent at the level of vowel quality and
vocal effort. Therefore, in BS’s case, the lack of the contrast in Russian did seem to affect her
SSE speech production. Both subjects seemed to have less difficulty acquiring the
realisational difference between SSE /ʉ/ and MSR /u/ (Table 7-1), showing a greater extent of
language differentiation for this variable.
From the bilingual point of view, it is possible that a systemic tense/lax contrast like /i ɪ/ is
more difficult to acquire than realisational differences like SSE /ʉ/ versus MSR /u/, not
because it is more complex in surface sound structure, but because the tense/lax contrast
involves more complicated speech motor control at both the supralaryngeal and laryngeal
levels, on top of timing, as opposed to the /ʉ u/ difference (which mainly involves
supralaryngeal differences).
On the other hand, for monolingual SSE acquisition, this study (Section 4.3.1.3)
and Matthews (2002) have shown that the SSE monolingual children’s production of the
lax vowel /ɪ/ was more ‘adult-like’ than that of the vowel /ʉ/, despite the more
complicated speech motor control of the lax vowels compared to /ʉ/. This discrepancy
may not be a matter of acquiring speech motor control of the target sounds at the ages
considered in this study. Both /i ɪ/ had been acquired. The phone [ʉ] had also been acquired
and is produced alongside other less frequent phonetic variants [] and [u] for /ʉ/.
After all, the children are systematically exposed to other non-SSE English varieties in
Edinburgh in addition to SSE. Studies taking a sociolinguistic perspective on
phonological acquisition (Docherty & Foulkes, 1999; Khattab, 2004; Scobbie, 2005) have
convincingly shown that “aspects of variable performance must be learned alongside
reflexes of the system of lexical contrast” (Docherty et al., in press). It is thus possible
that this pattern of variation in /ʉ/ in the SSE monolingual children reflects the cross-varietal variability in Edinburgh, and will be reduced at a later school age
towards a more adult-like range, when the linguistic background of their majority SSE
peers becomes more important through socialisation. As Chevrot et al. (2000,
p. 297) put it: “it is probable that stylistic skills precede stylistic awareness of the social
meaning of variants”, meaning that this type of sociolinguistic cross-varietal variation is
encoded in the mental representation through language input, but the metalinguistic
awareness of the appropriate social meaning of a variety is yet to be acquired and applied.
Another sound-structural effect emerging from our bilingual data is the apparent
discrepancy in language differentiation patterns between the three broad sound-structure
variables in this study. In the results of both subjects, we observed
patterns apparently lacking language differentiation only for vowel duration, while for vowel
quality and vocal effort the languages were either fully or partially differentiated (Table
7-1). This mainly concerned the postvocalic conditioning of vowel duration involving
SVLR in SSE, and a lack of such an extent of conditioning in Russian. As a result, most
language interaction effects (arising from both lacking and partial language
differentiation), both uni- and bi-directional, appeared for this variable.
At the same time, we concluded that their monolingual SSE peers had already
acquired the extrinsic and intrinsic vowel duration patterns. We should note clearly here
that “lack of language differentiation” for this variable is not a categorical
statement, since we dealt with continuous speech production reflecting variability ranges,
in which some of the tokens produced fell within the monolingual production ranges.
Moreover, it is very much an open question how categorical perception of this timing
parameter works in children and adults, in both bilingual and monolingual contexts
(Macken, 1986). Our data do not allow evaluation of this, not least because it is difficult
to separate different (supra-)segmental levels of speech from each other in accent
judgment experiments based on real speech production. We were not aware of the
language interaction in the vowel duration component of AN’s speech during data
annotation, even though it was quite clear to us for BS’s long vowels.
One conclusion arising from the comparison with the SSE monolingual children is that
the language interaction effects in the vowel duration of the two bilingual children
plausibly cannot be accounted for by speech immaturity. Another conclusion is that ‘markedness’
or ‘cue strength’ resulting from the surface structure of vowel duration does not seem to play a
role in the way proposed for the acquisition of morphosyntax (Döpke, 1998; Döpke, 2000;
Müller, 1998), since we have observed bi-directional effects for the same variables.
It is plausible that the bilingual input conditions make the two bilingual subjects
different from their monolingual SSE peers. The crosslinguistic difference
in the input structure of postvocalic vowel duration conditioning may have affected the bilingual
children’s ability to ‘phonologise’ (Keating, 1984) the versatile cross-linguistic input
rules, and affected their categorical perception of these continuous variables and their
acquisition. The bi-directionality of the language interaction in AN’s data does not
support the view that interaction always works in one specific direction, as it did in BS’s case. It only
suggests that, in the case of postvocalic conditioning of duration, the categoricalness of
perception might be somewhat ‘blurred’ by the versatility of bilingual input.
7.1.2.3 Lexicalisation effects
Proponents of the segment-sized basis of phonological acquisition (Wode, 1992, p.
622) have argued that in a bilingual context variation due to crosslinguistic influence
occurs in all targets containing a given segment, and that there is no evidence that
phonological variation in early bilingual acquisition is lexically based.
We have seen a clear example in AN’s data (Section 4.3.3.2.2) that a language
interaction effect can be lexically bound: 92.4% of instances of AN’s [] for MSR /u/
were confined to one MSR carrier word [’ut] (a joker), while the rest of her MSR tokens
with this vowel were quite adult-like. AN had some difficulty in producing this particular
lexical item, either because of the influence of the preceding consonant, which she
consistently produced with a laminal SSE articulation instead of an apical one as in MSR, or
because this lexical item happened to be a false cognate of the English verb “to shoot”. In
either case, this example illustrates that language interaction can be lexically bound, and
that it can be misleading to adopt a strict segmental view of phonological acquisition
without accounting for the other systemic influences on the segments.
This finding thus supports views proposing broader units of phonological
acquisition, which involve “formalization of the strategies that a particular child has
adopted to represent words and classes of phonetically similar words” (Macken, 1986,
p.264), either lexically (Ferguson & Farwell, 1975), or “word template” based (Vihman,
1996; Vihman, 2002).
7.1.2.4 Maturation and age effects
In Section 1.3.2.5 we discussed the Bilingual Bootstrapping Hypothesis (Gawlitzek-Maiwald & Tracy, 1996). The hypothesis views bilingual language acquisition from a
maturational perspective. Under this view, language interaction in the syntactic acquisition of
young simultaneous bilinguals is a relief strategy involving the temporary use of the child’s
expertise in one domain of LA to solve similar problems in LB. One of the falsifiable
predictions of the hypothesis is that language interaction with regard to a structure
should cease once the structure is acquired.
However, two patterns in AN’s data are problematic for the Bilingual Bootstrapping
Hypothesis (Gawlitzek-Maiwald & Tracy, 1996), at least at the level of sound structure.
The patterns were: (1) introducing the SSE lax vowel [ɪ] for Russian /i/, and (2)
producing an SVLR-like durational difference in the otherwise ‘simple’ Russian model of
postvocalic conditioning of vowel duration. If bilingual language interaction is due to the
fact that a property has not yet been acquired, then language interaction regarding a
property in LB should cease once the similar property in LB is acquired. However, in the
case of AN introducing /ɪ/, a phoneme non-existent in Russian, for Russian /i/, it does not seem
to be incomplete acquisition that causes language interaction. We showed that AN acquired and
used both phonemes in SSE in a way similar to that of the SSE monolingual children. In
the light of ‘bilingual bootstrapping’, it should be no problem for AN to produce Russian
/i/ if she can produce both SSE /i/ and /ɪ/ in appropriate contexts in a native-like way,
since Russian and SSE /i/ are similar in vowel quality. It thus appears that at the level of
sound structure, language interaction does not necessarily cease when the language
structures involved in this interaction are acquired.
From this perspective, the acquisition of a single sound-structural property does not
seem to be the only condition for its appropriate use, nor does the degree of its
acquisition explain language interaction. This bilingual pattern concerns the nature of
phonological development in general: “If a child’s business is to construct, within the
boundaries of UG, the simplest account of the input data why does a child impose
additional complexity on the grammar?” (Mohanan, 1992). We showed in Section
4.3.3.1.1 that this (admittedly marginal) pattern involved all carrier words; it did not seem
to have phonotactic explanations from the preceding palatalised consonants, and was
longitudinally coherent (ceasing at the age of 4;5). Therefore, the process was systematic.
Yet it is still possible that the appearance of [ɪ] in AN’s Russian has a phonotactic
explanation in the influence of the following consonant in the carrier /ti/ [ti] (a
finch). For example, a phonetically devoiced // in the syllable coda may have been
‘perceptually assimilated’ to // word-finally. In English, there are only a few infrequent
words ending in /iʃ/ (‘sleesh’, ‘creesh’, ‘sneesh’, ‘quiche’, ‘niche’) and many more ending
in /ɪʃ/, including very frequent ones (such as ‘fish’ and ‘dish’) (Rockey, 1973) that
a child is likely to know. In Russian, monosyllabic words ending in /i/ or /i/ are
relatively infrequent too. Thus the greater frequency of SSE /ɪʃ/ may have
cross-linguistically affected the production of the low-frequency Russian targets ending
in /i/. This possibly explains the appearance of [ɪ] for /i/ in one MSR carrier word out of
three. This example shows the enormous complexity of the task of eliciting data from
children, the multidimensionality of crosslinguistic sound-structural (in-)compatibility,
and the strength of the distributional characteristics of the input language. In that sense,
the pattern of language interaction is compatible with the input to the child rather
than with surface-structural ‘markedness’. This frequency effect suggests that bilingual
acquisition of sound structure is lexically (Ferguson & Farwell, 1975) or “word template”
based (Vihman, 1996; Vihman, 2002), rather than instantiated segmentally.
With regard to age effects, it is further worth noting that most syntactic
acquisition studies claiming autonomous development also studied younger children
(usually 1;5 to 3;0). The two subjects in this study were older (3;4 to 4;5) and thus
had had more time to practise their speech motor routines and phonologise them in both
languages. Despite this, both subjects showed signs of systematic language interaction.
This finding supports the idea that not all language levels are equally prone to
language interaction, with the level of sound structure in general being more prone to it
than some regular morphosyntactic properties (Paradis, 2000) (setting aside, for
example, irregular or infrequent morphosyntactic subtleties), and this should encourage
more research into the acquisition of bilingual speech.
Figure 7-2 Abstract representation of the longitudinal effect for the bilingual subjects AN and BS on their
bilingual language differentiation based on the number of sound structure variables involved in total and
partial language differentiation across their two languages.
We observed systematic longitudinal effects in language differentiation for both
subjects. The longitudinal effect is shown in Figure 7-2.
The language differentiation effects in Figure 7-2 are split into three groups based
on Table 7-1: (1) total differentiation, when subject’s speech production was within the
range of the SSE monolingual peers and MSR adults, and both languages differed from
each other in the expected direction; (2) partial differentiation, when a subject’s languages
differed from each other in the expected direction, but one of the languages differed from
the monolingual controls; (3) lack of differentiation, when the languages did not differ
from each other in the expected direction, and either one or both languages differed from
the controls. The width of each type of differentiation is determined by the number of
variables (out of eight in Table 7-1) showing that type of differentiation. All amounts of
language differentiation are drawn across the two languages. Several tendencies are
apparent from Figure 7-2.
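The three-way classification used for Figure 7-2 can be read as a simple decision rule over three judgements per variable. The Python sketch below is purely illustrative and is not the procedure or the data of this study: the record type VariableResult, its field names, the functions classify and profile, and the toy values are all hypothetical. It assumes that, for each variable and age sample, it has already been decided (a) whether the two languages differ in the expected direction and (b) whether each language falls within the range of its monolingual controls. It also assumes that “one of the languages differed from the monolingual controls” means at least one, and it flags as unclassified the case, not covered by the three categories, in which the languages do not differ but both fall within the control ranges.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VariableResult:
    """Hypothetical record: one sound-structure variable, one child, one age sample."""
    name: str
    differ_expected: bool  # the two languages differ in the expected direction
    sse_in_range: bool     # SSE output within the SSE monolingual peers' range
    msr_in_range: bool     # MSR output within the MSR adult range


def classify(r: VariableResult) -> str:
    """Assign one variable to total, partial or lacking differentiation."""
    # Number of languages falling outside their monolingual control range.
    deviating = [r.sse_in_range, r.msr_in_range].count(False)
    if r.differ_expected and deviating == 0:
        return "total"
    if r.differ_expected:
        # Assumption: "one of the languages differed" read as "at least one".
        return "partial"
    if deviating > 0:
        return "lack"
    # Languages not differentiated, yet both within control ranges:
    # this case is not covered by the three categories.
    return "unclassified"


def profile(results: list[VariableResult]) -> Counter:
    """Count variables per differentiation type, i.e. the band widths for one age sample."""
    return Counter(classify(r) for r in results)


# Toy input with invented values, for illustration only.
toy = [
    VariableResult("variable_a", True, True, True),
    VariableResult("variable_b", True, False, True),
    VariableResult("variable_c", False, False, True),
]
print(profile(toy))
```

Applied to the eight variables of Table 7-1 for each age sample, such counts would correspond to the band widths drawn in Figure 7-2; the toy records above carry no empirical meaning.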
First of all, both subjects show a systematic progression towards more
differentiation with increasing age, which shows up in the amounts of total and partial
differentiation. This suggests that their bilingual speech production for all variables
becomes more and more language-specific, and that the observed amount of language
interaction still reflects a relatively ‘initial’ stage of language acquisition, which may
eventually cease with growing linguistic experience. In this sense their state of bilingual
language acquisition is not necessarily different from L2 acquisition, for which it is
known that ultimate attainment is proportional to the amount of language exposure (Flege
et al., 1995; Birdsong, 2004) with a confounding effect of age of acquisition, and where
‘transfer’ is known to manifest itself most obviously in the initial stages of L2 acquisition.
Secondly, the number of variables lacking differentiation seems to be nearly equal for both
subjects (despite their differences in language exposure), and it does not affect the majority
of their speech production. Recall that the lack of differentiation came mostly from
the vowel duration component. In that sense both girls’ patterns differ from
predictions made for adult L2-learners. For example, the Competition Model (Bates &
MacWhinney, 1989; MacWhinney, 1997) predicts for beginning L2 learners that
everything that can transfer (given ‘cue strength’ differences) will transfer.
Thirdly, the girls do substantially differ in the amount of ‘partial differentiation’
compared to total language differentiation. By the age of 4;5, AN achieved total language
differentiation for all the variables considered, while BS reached a stage where she had no
patterns lacking differentiation. It seems that given their different language exposure
patterns, the parameters of exposure and age co-vary (or perhaps accumulate) and with
increasing age affect the output language differentiation patterns. Once again, the
interdependence of language preference in adulthood, age of onset (amount of exposure in
years) and ultimate attainment is not new in L2-acquisition research. For example, Flege et
al. (1995) assessed the relation between non-native subjects’ age of learning (AOL)
English and the overall degree of perceived foreign accent in their production of English
sentences. The 240 native Italian subjects had begun learning English in Canada between
the ages of 2 and 23, and had lived in Canada for an average of 32 years. Native English-speaking listeners used a continuous scale to rate sentences spoken by the native Italian
subjects and by subjects in an L1 English comparison group. Age of onset accounted for an
average of 59% of the variance in the foreign accent ratings. Language use factors, such as
dominance, accounted for an additional 15% of the variance. Thus, in that study too, the
amount and length of language exposure determined ultimate attainment.
To conclude, in a longitudinal perspective the language differentiation patterns
primarily showed their dependence on the amount of language exposure, and a structural
effect of vowel duration on the lack of differentiation. Language interaction seems to be
part of normal bilingual phonological development, and it may eventually cease with
growing linguistic experience. Maturationally, bilingual acquisition of sound structure
cannot be accounted for in terms of bilingual bootstrapping based on segment-sized
phonology. Some ‘unnecessarily complex’ sound structures in adult terms can potentially
be explained by lexicalisation, frequency and/or phonotactic effects. The data support
hypotheses claiming that children acquire phonology in units of larger size than segments.
7.1.2.5 Other environmental effects
This study was set up to account for cross-varietal influence on the English acquired
by the bilingual subjects. The majority of the Edinburgh population speaks broad Scottish
Standard English varieties. However, there is a substantial proportion of non-SSE English
speakers in Edinburgh, especially in middle-class families. We addressed the population
statistics in more detail in Section 3.2.1.
The design of this study included four SSBE adult speakers and a monolingual child
(C4) from a mixed SSE/SSBE parental background. Their data allowed us to determine
the non-SSE British English patterns for the variables in this study. Despite the presence
of input from non-SSE English varieties in the girls’ nursery and the fact that both of them
were regularly exposed to RP-based mass media, our data show that the subjects acquired
the Scottish English sound structures rather than those of the non-SSE English varieties.
Considering the fact that no English input was provided in the family, the proportion of
the English varieties in bilingual children’s input (with SSE being the majority variety)
seems to determine the English variety acquired.
Additionally, our data for the monolingual subject C4 showed that a child exposed
to two varieties of English (SSE and SSBE) in the parental input acquired an SSE SVLR-like pattern for the vowel /i/, and an SSBE pattern for the vowel //. This result replicated
results observed for two older children with non-SSE English parents growing up in
Edinburgh (Hewlett et al., 1999). Hewlett et al. suggested that the additional
vowel quality difference (SSBE /u/ and /ʊ/, as opposed to the lack of such a contrast for
SSE /ʉ/) can mediate the acquisition of a variety-specific pattern of postvocalic
conditioning (i.e. the SSBE-like voicing effect for /u/ in that case).
While we do consider this as a possible explanation, we additionally suggest this
might also be a result of differing input conditions for the two vowels. For example, it is
possible that an SSBE-speaking parent pays more explicit attention to the child’s
acquisition of the SSBE segmental /u ʊ/ difference, because it is perceptually more salient
and phonologically more relevant for an SSBE-speaking parent than the merely durational difference
involving /i/. This may encourage ‘explicit learning’ (Vihman, 2002), in addition to
incidental learning, of the segmental contrast between /u/ and /ʊ/, and subsequent acquisition of the
SSBE voicing effect rather than of SSE SVLR.
7.1.2.6 Methodological issues
In this study we used two methods: i.e. observation (auditory labelling) of vowel
quality and quantitative instrumental measurement of vowel duration and vocal effort.
Two conclusions arise from the use of these methods. First of all, we observed language
interaction effects for all three sound structure variables despite the methodological
differences. The language interaction effects measured for the subject BS were coherently
in the same direction, and were in agreement with the L2- and bilingualism literature. This
allows us to state that the observed patterns of language differentiation / interaction
(including those of subject AN) are not a methodological artefact.
Secondly, in the instrumental measurements there were three language interaction
patterns that could not easily be detected by observation. The first one involved the
reduced extent of SVLR conditioning in /i ɪ/ produced by both subjects compared to the
monolingual peers (all in the earliest age samples). The second one involved the bi-directional influence between AN’s SSE and MSR systems in the production of the
postvocalic conditioning of duration of /ʉ/ (at the same youngest age). The third one
involved the ‘reduced’ laryngeal difference in vocal effort between the SSE tense and lax
vowels produced by BS compared to the monolingual peers. This means that some
methods of speech analysis are better suited to capturing fine-grained
phonetic detail that would not necessarily be detected by observation.
Measuring vocal effort patterns for different vowel sets produced quite coherent
results across the different subject groups. The analysis of the intrinsic laryngeal contrast
between SSE tense/lax vowels in children and adults also replicated results for German in
Jessen (2002). This shows that the methodology applied was largely sound.
Some problematic issues arose in explaining the non-differentiated vocal effort
patterns for the close rounded vowels in the bilingual data at the age of 4;5. In this age
sample, both subjects produced patterns of vocal effort for the SSE close rounded vowels
that differed from their own production at earlier ages, and from the SSE monolingual
results of both children and adults. The pattern was not explainable in longitudinal terms.
It could not be explained by larger variability in their vowel quality for /ʉ/, since both
subjects showed language differentiation in vowel quality. It is important to note here that,
upon finding this problematic result, we thoroughly cross-checked all the data for
potential data analysis errors at different levels, but found nothing that could explain
it. One plausible explanation would be child speech variability. However, it remains unclear
why this should happen for both subjects, in the same age sample out of
three, and not in the monolingual results. We had two age samples for three SSE
monolingual subjects (Figure 6-12): subjects C3 and C4 produced similar patterns in both
age samples, while subject C7 produced a longitudinally coherent change. Thus, for the
bilingual subjects we decided to discard this vocal effort variable for /ʉ/ at age 4;5 from
further discussion as unreliable (hence the question marks in Table 7-1).
267
7.1.3 Implications of the bilingual findings
7.1.3.1 Language differentiation/interaction patterns and their mental representation
In the review of bilingual issues (Chapter 1) we considered some arguments for and against
‘autonomous or interdependent’ bilingual language acquisition (Paradis & Genesee,
1996), and potential manifestations of interdependence. According to the ‘autonomous
development hypothesis’, bilingual children acquire grammatical systems that do not differ
functionally from those acquired in monolingual development, while the ‘interdependent
development hypothesis’ claims that a bilingual’s language systems develop differentially,
“causing a bilingual child to look different from monolingual children” (Paradis &
Genesee, 1996, p.2). Upon finding language interaction effects at the prosodic level of
speech, J. Paradis (2000; 2001) moved away from this categorical view of
autonomy/interdependence, and shifted the discussion towards ‘degrees of separation’ and
interaction in relation to different subcomponents of grammar, their structure,
and language dominance.
In the light of our results, it seems that the ‘interdependent development
hypothesis’ could indeed be interpreted in such a gradual fashion: i.e. rather than
postulating language interaction as an obligatory property of bilingual language
development, we can say that a bilingual’s language systems develop differentially from
each other and may (though need not) interact.
We have seen that at the level of sound structure, language interaction took place
systematically in both subjects regardless of their language exposure patterns. Both
subjects differentiated between their languages, and showed language interaction to
variable degrees, which depended primarily on their language exposure but also on the
structural characteristics of the contrasts involved. An account of bilingual sound
structure acquisition therefore needs to synthesise these factors. There was no evidence in
this study that language interaction effects operate exclusively in one direction. On the
contrary, it seems that the mental footprint of language balance at the level of sound
structure can produce quite ‘fuzzy’ directions of language interaction, depending on the
language exposure of the bilingual child and on the structures involved. The effects of
language interaction observed in this study are compatible with the unidirectional and
bi-directional effects observed in L2 acquisition. From this point of view, there seems to
be no need for a separate model of language interaction effects in simultaneous bilingual
acquisition of sound structure.
In our view, the gradual interpretation of the ‘interdependent development
hypothesis’ is compatible with the postulates of the ‘neurolinguistic theory of
bilingualism’ (Paradis, 2004; Paradis, 1993; Paradis, 1981). M. Paradis did not
develop the theory specifically for language acquisition, but rather for the typical ‘end
product’ state of adult bilinguals; nevertheless, we see implications for the developmental
perspective. On its own, the ‘subsystems hypothesis’ does not predict language interaction in
bilinguals, but its combination with the ‘activation threshold’ hypothesis makes language
interaction possible.
According to these two hypotheses, for example, representations (such as the
lexicon) and automatic routines involved in the production of (supra-)segmental structures
can be stored within the appropriate modules of the two language subsystems of a single
neurofunctional language system. The subsystems use different neural pathways but are
intertwined with each other. The environmental situation of bilingual children
is usually quite volatile in terms of the amount and quality of language exposure, which
is likely to affect the development of their speech and language skills (both perception and
production). In the process of language acquisition, variable language input conditions
may differentially affect the ‘activation threshold’ level for the components in each of the
two language subsystems, enabling elements of LB to be selected for LA and/or vice
versa if the ‘activation threshold’ is sufficiently lowered. Since the ‘activation threshold’
is “operative in all higher cognitive representations”, and is “not associated with any
particular anatomical area” (Paradis, 2004, p.28), the same mechanism should be
available in all sub-modules of the two linguistic systems, including modules encoding
phonology and speech motor control.
Since we have dealt with continuous prosodic variables (vowel duration and vocal
effort) in bilingual children’s speech production, and the observed language interaction
patterns were systematic (and coherent) across the two subjects, different variables and
methods, we can assume that the bilingual children produced variable patterns of
language interaction unconsciously and automatically. This means that, generally, we were
not dealing with pragmatic ‘code-switching’ (Muysken, 2000; Grosjean, 2001).
The systematicity of language interaction also means that we were not dealing with
occasional ‘unrepaired slips of the tongue’ (de Houwer, 1995). In fact, such systematicity
has been claimed to be a sign of ‘static interference’ (Tomioka, 2002; Paradis, 2004), i.e.
interference that is part of the representation in the non-target language. ‘Static’ in this
sense presupposes that no other representation is available, or that there is no difference
between the representations of the two language structures in the corresponding language
submodules. However, we argue that if two language-specific speech production options
are available, but one of them has a lower ‘activation threshold’ in the non-target language
due to increased environmental exposure to this pattern, we may be dealing not with
‘static’ but with ‘dynamic’ interference (Grosjean, 2001; Paradis, 2004).
In Section 2.3.2 we discussed Kehoe’s (2002) study revealing German-Spanish bilingual
patterns of language interaction in vowel duration in the speech of early
simultaneous bilinguals acquiring one system featuring a length contrast (German) and one
lacking it (Spanish). There, we raised the problem that, given the structure of the
crosslinguistic difference, it is difficult to decide whether a given pattern should be
attributed to ‘transfer’ or to ‘delay’ (Paradis & Genesee, 1996). We argued that attributing
the ‘reduced’ vowel length contrast in the bilingual children’s German to ‘delay’
should be supported by evidence that the difference in short and long vowel durations
relative to German monolingual children eventually disappeared (which was not the
case in Kehoe’s (2002) study). In fact, our data showed that a similarly ‘reduced’ SVLR
conditioning for both /i/ and // compared to the monolingual peers disappeared
over time in AN and diminished in BS. This may suggest a ‘delay’ for this particular
feature, in the narrow sense proposed by Kehoe. However, there were other problems with
the term ‘delay’. As we pointed out, Paradis & Genesee (1996) proposed a general systemic
delay rather than a delay for a single feature, so the two studies specify the term
differently. Moreover, in the speech and language therapy context ‘delay’ implies a
potential disorder, and the term should be avoided for this reason. We therefore maintain
our previous position that this apparently ‘delayed’ pattern is better viewed as a
systematic and normal bilingual language interaction for this particular feature, resulting
from the mutual structural influence of the two languages in contact in a bilingual’s
mental representation.
7.1.3.2 Implications of the findings for the theory and models of language acquisition
Our data on language interaction showed that the patterns observed were similar to
those reported in second language acquisition studies (Weinreich, 1953; Mack, 1982; de
Silva, 1999; Markus & Bond, 1999; Piske et al., 2002; Mennen, 2004). The pattern of
‘over-differentiation’ (Weinreich, 1953) of tense/lax vowels in AN’s Russian can
plausibly be explained by language input factors. As far as these data are concerned, there
is no need for a separate model of language interaction in sound structure for early
simultaneous bilinguals as opposed to L2 learners. However, with regard to applying
concepts like ‘markedness’ to bilingual and general phonological acquisition in child
speech, we question whether the concept can be applied to isolated segments at all
(Jakobson, 1941; Wode, 1992), since we found clear lexicalisation and frequency effects,
as well as phonotactic explanations, for some apparent ‘markedness’-related effects on the
vowels.
One of the important findings of this study is that the amount of language
differentiation (and consequently of language interaction) systematically differs with
changing language exposure conditions. This is confirmed both by comparing the speech
production of two subjects with very different (yet systematic) exposure to the two
languages, and by the longitudinal perspective within each subject. This means that
hypotheses such as the Cross-Language Cue Competition Hypothesis (Döpke, 1998;
Döpke, 2000) and the Markedness Hypothesis (Müller, 1998), which postulate the structural
characteristics of languages as the single primary source of differentiation/interaction,
cannot disregard environmental factors as a cause of this changing differentiation; on the
contrary, they should treat structural and environmental factors as at least two primary,
confounded sources.
In Section 1.3.2.3.2 we discussed the fact that the formulation of Döpke’s CCCH
was based on the Competition Model of monolingual and second language acquisition
(Bates & MacWhinney, 1989; MacWhinney, 1997). Unfortunately, the Competition
Model provided no explanation of the mechanisms of language interaction in simultaneous
bilingual language acquisition. The model emphasised that language acquisition is driven
by input – both in environmental and language-structural terms. Acquisition is driven by
‘cue strength’: the stronger the cue the earlier it is acquired. ‘Cue strength’ is determined
by four dimensions: ‘task frequency’, ‘cue availability’, ‘cue reliability’ and ‘conflict
reliability’ (MacWhinney, 1997, p.122). One important dimension of the model, which could
account for the kind of environmental effects found in our study, is overlooked in Döpke’s
hypothesis (1998; 2000). The factor ‘task frequency’
comprises language internal frequencies of properties, but also environmental frequency
(no input means there is nothing to acquire). MacWhinney notes that in the context of
SLA and simultaneous bilingual language acquisition, the factor ‘task frequency’ may be
of greater importance, because if one of the languages is infrequently used, “task
frequency could become a factor determining a general slowdown of acquisition”
(MacWhinney, 1997, p.122). We therefore suggest that a dimension similar to ‘task
frequency’ should play a more important role in accounts of language interaction effects
in order to explain the environmental effects on language interaction such as in this study,
and similar environmentally based morphosyntactic language interaction effects observed
by Petersen (1988) and Lanza (1992).
7.1.4 Implications of vocal effort findings
There is limited evidence that ‘stress-accent’ languages (Beckman, 1986) with
structural contrasts involving vowel length may implement their accentual systems partly
through acoustic cues other than duration. For example, both Fónagy
(1966) for Hungarian and Berinstein (1979) for K’ekchi reported that vowel peak
intensity played a secondary role in the paradigmatic contrast between short and
long vowels in words with the same structure and utterance position: i.e. in both
languages short vowels had 1-2 dB higher overall intensities than their long counterparts.
Importantly, if proven systematic, such evidence could empirically strengthen the dynamic
view of word-prosodic systems taken in the Stress-Accent Hypothesis (Beckman, 1986),
which claims that the phonological categories of accentual systems are not necessarily
phonetically uniform within a language. Unfortunately, the differences in overall intensity
in both studies (Fónagy, 1966; Berinstein, 1979) were negligibly small, and could be due
to chance.
Alternatively, these studies (Fónagy, 1966; Berinstein, 1979) may simply have measured
a less relevant acoustic cue (overall intensity). More recent empirical studies on the
acoustic correlates of stress and prominence (Sluijter & van Heuven, 1996a; Sluijter &
van Heuven, 1996b; Sluijter et al., 1997; Traunmüller & Eriksson, 2000; Heldner, 2003)
have emphasised the importance of the laryngeal component of vocal effort (in addition to
the pulmonic one) in conveying linguistic information about stress and prominence in speech
production. These studies have shown that overall intensity is an unreliable cue to stress
and prominence, while intensity in the spectral mid-frequencies (‘spectral balance’,
‘spectral emphasis’ or ‘spectral tilt’, measured by different methods in different studies)
seems to reliably reflect the laryngeal contribution to vocal effort, stress and prominence.
Our own data on the differentiated vocal effort patterns accompanying durational SVLR
suggest that the two studies (Fónagy, 1966; Berinstein, 1979) may indeed not have measured
the most relevant acoustic cue.
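To make the contrast between overall intensity and mid-frequency intensity concrete, the following is a minimal sketch of a spectral-balance style measure. It is an illustration under stated assumptions, not the procedure used in this study: the mid band (500-2000 Hz), the Hann window, the frame length and the synthetic harmonic "vowels" are all hypothetical choices made for the example. The measure is the level in the mid band minus the level over the whole spectrum. Because the overall level is dominated by the strongest low harmonics, a flatter source spectrum (greater laryngeal effort) raises the mid-band level relative to it.

```python
import numpy as np


def band_level_db(frame, sr, lo, hi):
    """Power of `frame` between lo and hi Hz, in dB (arbitrary reference)."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    mask = (freqs >= lo) & (freqs < hi)
    return 10.0 * np.log10(np.sum(power[mask]) + 1e-12)


def spectral_balance(frame, sr, mid=(500.0, 2000.0)):
    """Mid-band level minus overall level (dB); flatter spectra give higher values.

    The default band edges are an illustrative assumption, not this study's bands.
    """
    overall = band_level_db(frame, sr, 0.0, sr / 2.0 + 1.0)  # whole 0..Nyquist range
    return band_level_db(frame, sr, *mid) - overall


def harmonic_vowel(sr, f0, tilt_db_per_octave, dur=0.04):
    """Hypothetical test signal: a harmonic complex with a fixed spectral slope."""
    t = np.arange(int(sr * dur)) / sr
    n_harm = int((sr / 2.0 - 1.0) // f0)  # harmonics strictly below Nyquist
    sig = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        # -6 dB/octave gives amplitude ~1/k, -12 dB/octave gives ~1/k**2
        amp = k ** (tilt_db_per_octave / 6.0206)
        sig += amp * np.sin(2.0 * np.pi * k * f0 * t)
    return sig


if __name__ == "__main__":
    sr = 16000
    flatter = spectral_balance(harmonic_vowel(sr, 200.0, -6.0), sr)
    steeper = spectral_balance(harmonic_vowel(sr, 200.0, -12.0), sr)
    print(f"flatter slope: {flatter:.1f} dB, steeper slope: {steeper:.1f} dB")
```

With these assumed settings, the harmonic complex with the shallower slope yields a higher (less negative) balance value than the steeper one, which is the direction expected for greater laryngeal effort. The absolute values depend entirely on the chosen band, window and frame length.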
We have several reasons to think that this differentiated vocal effort pattern
accompanying the Scottish Vowel Length Rule vowels under strong sentence accent
(Section 6.3.1.1) is due to the interaction between the SSE word-prosodic system and the
SVLR, rather than to anticipatory effects of the following consonants.
Accentual lengthening is a known macroprosodic effect; depending on the language, it
affects different parts of the prosodic word (Cambier-Langeveld & Turk, 1999; Turk &
Sawusch, 1999), but its domain usually includes the stressed syllable nucleus (short or
long). If a language conditions vowel length phonologically (as the SVLR does), this should
restrict how much duration remains available for other functions. Specifically, a short
vowel cannot be lengthened indefinitely without crossing into the durational range of
phonologically long vowels, as this would be pragmatically odd (if not lexically
contrastive). Related effects are known from L2 acquisition studies, where L2 learners fail
to achieve language-specific vowel length for long or short vowels (Mack, 1982; de Silva,
1999; Markus & Bond, 1999).
Increasing vocal effort for the short vowels, as in the Scottish SVLR vowels, might
be a strategy to compensate for the load placed on duration by the accentual system of
prominence; it may thus be viewed as an additional word-prosodic means (alongside
duration) of achieving sufficient prominence for the short vowel. We have shown that this
SSE pattern of vocal effort is not accidental, since it is systematic for both vowels /i/ and
//: monolingual children had acquired adult-like performance by age 3;4, and so had the
bilingual children (at least for the unrounded vowels in this study).
However, our claim about the SSE SVLR and vocal effort pattern and its relation to
prominence is subject to a confound: the varying right consonantal context
(voiced fricatives as opposed to other contexts). Yet we have reason to believe that this
pattern is not due to anticipatory effects of the following consonant, such as those
described in the ‘timing’ model of glottal control (Gobl & Ní Chasaide, 1988). In this
model, phonation of the vowel (in voice source parameters) varies as a function of the
voicing and manner of articulation of the following consonant. In English, a breathy
phonation type is only anticipated before voiceless fricatives, and sometimes voiceless
stops (Gobl & Ní Chasaide, 1988; Gobl & Ní Chasaide, 1999b; Ní Chasaide & Gobl,
1999), and not before voiced fricatives, as was the case in this study. This holds for
both short and long SSE vowels /i/ and //. Additionally, the context before voiceless
stops in our data is instead compatible with a less breathy (more tense) mode of vowel
phonation. This apparent contradiction can be explained by the fact that the phonation
changes in this particular short/long context vary as a function of prominence rather than
of the following consonant.
Another argument in favour of word-prosodic system/SVLR interaction comes from
our pilot analysis of the empirical data gathered from the ongoing SVLR project (Scobbie
et al., 1999a; Scobbie et al., 1999b; Scobbie, 2002). We analysed vocal effort in three
adult SSE speakers producing morphophonemically contrastive pairs, such as “rude”
and “rued”, which differ only in vowel length. In this case, the confounding effect of the
following consonant was absent. The preliminary results showed an effect on vocal effort
in a similar direction and extent to that observed in the SVLR data in this study.
We are not aware of similar studies of vocal effort in paradigmatic vowel length contrasts
in relation to stress accent. Such studies could be carried out for languages (such as Aleut,
K’ekchi or Finnish) featuring vowel length with no confounding consonantal effects or
vowel quality differences. Additionally, a syntagmatic comparison of stressed
and unstressed short/long vowels may provide more evidence for our hypothesis.
Given this limitation, it suffices to state, in the context of bilingual acquisition, that
assessing this novel monolingual finding in bilingual children was not a trivial task, since
neither aspect had been addressed before. Whichever argument proves correct for the
laryngeal distinction accompanying the SVLR vowels (effect of prominence or
anticipatory effects), the high systematicity of the data, both in the SSE monolingual and
in the crosslinguistic context, persuaded us to include the vocal effort variables in this
study. Indeed, we showed that both the SSE monolingual children and the bilingual children
acquired the pattern in the age samples concerned.
The monolingual vocal effort results involved another novel finding for the
phonological acquisition of the tense/lax contrast. It has been shown in the literature (so
far to a quite limited extent) for American English and for German (Stevens, 1998; Jessen,
2002) that the tense/lax contrast not only involves phonetic differences in vowel quality and
duration (as it is traditionally treated), but also requires an adjustment in laryngeal
configuration. Similarly, in our SSE data (monolingual and bilingual children, and adults),
the lax vowels were realised with a less breathy glottal source configuration, and the tense
vowels with a more breathy one (i.e. more ‘lax’ in conventional glottal source terms).
This study has thus shown that the segmental tense/lax difference involves at least a
threefold phonetic distinction in vowel quality, duration and laryngeal configuration
(arguably depending on the language), and crosslinguistic studies assessing the phonetic
properties of the tense/lax contrast and its acquisition should not overlook the vocal
effort difference.
7.2 Suggestions for further research
Regarding the bilingual ‘interdependence/autonomy debate’, the phonological, and
more specifically the prosodic, level of speech seems to be systematically prone to language
interaction effects of variable extent, both in proficient L2 learners (Caramazza
et al., 1973; Williams, 1980; Flege, 1987; Mennen, 2004) and in young simultaneous
bilinguals (Kehoe et al., 2001; Paradis, 2001; Lleó, 2002; Kehoe, 2002; Kehoe, 2004 and
this study). Therefore, it remains to be shown whether fully ‘autonomous’ development
of sound structures is possible at all.
One finding of this study for which we have no clear explanation is the apparent
discrepancy in the bilingual acquisition of vowel duration as opposed to both vowel
quality and vocal effort patterns. As far as we are aware, there are no claims in the general
phonological development literature that vowel duration is more difficult to acquire than
other suprasegmental aspects. It may be that the type of crosslinguistic differences in vowel
duration found between Russian and Scottish English is difficult to acquire in the
context of simultaneous bilingual acquisition. However, only a few studies have dealt with
these issues so far (Kehoe, 2002; Whitworth, 2003).
Further, since this study supports the importance of both language exposure patterns and
sound structure differences, it seems reasonable to investigate both of these aspects
further, to gain a better understanding of how the input/structure interface may operate.
We have shown that some of the language interaction patterns in bilingual child
speech can be bi-directional. There seems to be a common ground in the bi-directional
patterns in the speech of proficient L2 learners (Mennen, 2004) and young bilingual
children in this study. However, this option has not yet been seriously considered in
accounts of the sources of language interaction or L2 ‘transfer’ (Bates & MacWhinney, 1989;
Petersen, 1988; Müller, 1998; Döpke, 1998; Döpke, 2000; Flege, 2002).
Monolingual or bilingual studies looking into phonetic aspects of the tense/lax
contrasts and their phonological acquisition should not overlook the importance of
another phonetic contributor, ‘laryngeal configuration’, in addition to vowel quality and
duration.
7.3 General Conclusion
The results from this study offer new insights into the extent of language
differentiation and interaction in bilingual phonological acquisition.
In studies of simultaneous bilingual acquisition there is an apparent consensus that
children acquire their languages as separate entities. However, we showed that there is
evidence that bilingual children’s languages may interact. The systematicity of language
interaction in our data shows that language interaction in early simultaneous acquisition
cannot be dismissed as slips of the tongue, and that the development is not fully
autonomous (even in a bilingual child who is more ‘balanced’ with regard to the language
input). So far these two types of evidence of ‘autonomous’ and ‘interdependent’
development have been largely treated in a mutually exclusive fashion. This study shows
that this cannot be assumed.
We showed that at the level of sound structure, the development of a bilingual’s
languages does not appear to be fully autonomous: language differentiation can be partial,
or even be missing at certain developmental stages. The extent of differentiation varies
depending on the sound structure involved, but importantly it also depends on the amount
of language exposure in both languages. Longitudinal patterns and comparison to
monolingual peers suggest that the bilingual differentiation of sound structures mainly
increases as a function of age (and possibly of accumulated exposure), and
as a function of maturation processes similar to those in monolingual children.
Evidence of language interaction on the level of sound structure production
considered in this study provides some support for a unified model of acquisition. The
processes of language interaction observed in our data are largely in line with the types of
language interaction observed in L2 learners. The directionality of interaction did not
necessarily depend on the relative markedness of the crosslinguistic structures, and in some
cases was bidirectional for the same properties.
We showed that some structurally complex processes, which are potentially
explainable by such concepts as ‘markedness’ (with regard to isolated segments), can –
upon closer investigation – rather be explained by lexical, distributional and phonotactic
conditioning.
References
Agutter, A. (1988). The not-so-Scottish Vowel Length Rule. In Edinburgh Studies in the
English Language, eds. Anderson, J. M. & MacLeod, N., John Donald Publishers,
Edinburgh.
Aitken, A. J. (1981). The Scottish Vowel Length Rule. In So Many People, Longages, and
Tongues, Edinburgh: Middle English Dialect Project, ed. Benskin, M. L., pp. 131-157.
Avanesov, R. I. (1972). Russkoe literaturnoe proiznoshenie, Moscow.
Bates, E. & MacWhinney, B. (1989). Functionalism and the Competition Model. In The
cross-linguistic study of sentence processing, eds. Bates, E. & MacWhinney, B.,
Cambridge University Press, Cambridge.
Bauer, L. (1985). Tracing phonetic change in the received pronunciation of British
English. Journal of Phonetics 13, pp. 61-81.
Beckman, M. E. (1986). Stress and Non-Stress Accent, Foris Publications, Dordrecht.
Berinstein, A. E. (1979). A cross-linguistic study on the contribution of duration to the
perception of stress, UCLA Working Papers in Phonetics ed. UCLA, Los Angeles.
Bilton, T., Bonnett, K., Jones, P., Lawson, T., Skinner, D., Stanworth, M., & Webster, A.
(2002). Introductory Sociology, 4th ed.
Birdsong, D. (2004). Second Language Acquisition and Ultimate Attainment. In
Handbook of Applied Linguistics, eds. Davies, A. & Elder, C., pp. 82-105. Blackwell,
London.
Bloomfield, L. (1933). Language, George Allen & Unwin Ltd, London.
Boersma, P. & Weenink, D. (2004). PRAAT, a system for doing phonetics by computer.
www.praat.org version 4.3.04.
Bondarko, L. V. (1981). Foneticheskoe opisanie yazyka, fonologicheskoe opisanie rechi,
pp. 1-192. Izdatel'stvo Leningradskogo universiteta, Leningrad.
Bondarko, L. V. (1998). Fonetika sovremennogo russkogo yazyka, pp. 1-276. Izdatel'stvo
Sankt Peterburgskogo universiteta, St.-Petersburg.
Brown, R. (1973). A First Language: The early stages, Harvard University Press,
Cambridge, MA.
Buder, E. H. & Stoel-Gammon, C. (2002). American and Swedish children's acquisition
of vowel duration: effects of vowel identity and final stop voicing. Journal of Acoustical
Society of America 111, pp. 1854-1864.
Burton, M. B. & Robblee, K. E. (1997). A phonetic analysis of voicing assimilation in
Russian. Journal of Phonetics 25, pp. 97-114.
Cambier-Langeveld, T. & Turk, A. (1999). A cross-linguistic study of accentual
lengthening: Dutch vs. English. Journal of Phonetics 27, pp. 171-206.
Campbell, W. N. (1995). Loudness, spectral tilt, and perceived prominence in dialogues.
In Proceedings of the XIIIth International Congress of Phonetic Sciences, eds. Elenius, K.
& Branderud, P., pp. 676-679. KTH and Stockholm University, Stockholm.
Caramazza, A., Yeni-Komshian, G., Zurif, E., & Carbone, E. (1973). The acquisition of a
new phonological contrast: the case of stop consonants in French-English bilinguals.
Journal of Acoustical Society of America 54, pp. 421-428.
Chambers, J. (2002). Dynamics of Dialect Convergence. In Investigating Change and
Variation through Dialect Contact, ed. Milroy, L., pp. 117-130.
Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant
environment. Phonetica 22, pp. 129-159.
Chevrot, J.-P., Beaud, L., & Varga, R. (2000). Developmental data on a French
sociolinguistic variable: Post-consonantal word-final /R/. Language Variation and
Change 12, pp. 295-319.
Chomsky, N. (1986). Knowledge of Language: its Nature, Origin and Use, Praeger, New
York.
Clyne, M. (1967). Transference and Triggering, Martinus Nijhoff, The Hague.
Corbett, J., McClure, J. D., & Stuart-Smith, J. (2003). A Brief History of Scots. In The
Edinburgh Companion to Scots, eds. Corbett, J., McClure, J. D., & Stuart-Smith, J., pp. 1-16. Edinburgh University Press, Edinburgh.
Crutchley, A., Conti-Ramsden, G., & Botting, N. (1997). Bilingual children with specific
language impairment and standardised assessments: preliminary findings from a study of
children in language units. International Journal of Bilingualism 6, pp. 117-134.
Crystal, D. (1997). The Cambridge Encyclopedia of Language, Cambridge University
Press, Cambridge.
Dale, P. S. & Fenson, L. (1996). Lexical development norms for young children. Behavior
Research Methods, Instruments, & Computers 28, pp. 125-127.
de Houwer, A. (1995). Bilingual Language Acquisition. In The Handbook of Child
Language, eds. Fletcher, P. & MacWhinney, B., pp. 219-250. Blackwell, Oxford.
de Houwer, A. (1998). By way of introduction: Methods in studies of bilingual first
language acquisition. International Journal of Bilingualism 2, pp. 249-264.
de Houwer, A. (1990). The Acquisition of Two Languages from Birth: a Case Study,
Cambridge University Press, Cambridge.
de Silva, V. (1999). Interference of a Quantity Language in Rhythmic Structure of a
Stress Language. In Proceedings of the 14th International Congress of Phonetic Sciences
pp. 559-562. San Francisco.
Deterding, D. (1997). The formants of monophthong vowels in Standard Southern British
English pronunciation. Journal of the International Phonetic Association 27, pp. 47-55.
Deuchar, M. & Quay, S. (2000). Bilingual Acquisition: Theoretical Implications of a case
study, Oxford University Press, Oxford.
Docherty, G. J. (1992). The timing of voicing in British English Obstruents, Netherlands
Phonetics Archives, 9, Foris, Berlin.
Docherty, G. J. & Foulkes, P. (1999). Derby and Newcastle: instrumental phonetics and
variationist studies. In Urban Voices: Accent Studies in the British Isles, eds. Foulkes, P. &
Docherty, G., pp. 47-71. Arnold, London, UK.
Docherty, G. J., Foulkes, P., Tillotson, J., & Watt, D. (2005). On the scope of
phonological learning: issues arising from socially structured variation. In Labphon 8.
Döpke, S. (1998). Competing language structures: the acquisition of verb placement by
bilingual German-English children. Journal of Child Language 25, pp. 555-584.
Döpke, S. (2000). The Interplay Between Language-Specific Development and
Crosslinguistic Influence. In Cross-linguistic structures in simultaneous language
acquisition, ed. Döpke, S., pp. 79-104. John Benjamins, Amsterdam.
Ellis, R. (1994). The Study of Second Language Acquisition, Oxford University Press,
Oxford.
Escudero, P. (2000). The Perception of English Vowel Contrasts: Acoustic Cue Reliance
in the Development of New Contrasts. New Sounds 2000, the Fourth International
Symposium on the Acquisition of Second-Language Speech.
Fant, G. (1960). Acoustic Theory of Speech Production, 2nd ed., pp. 1-328. Mouton, The
Hague - Paris.
Ferguson, C. A. & Farwell, C. (1975). Words and sounds in early language acquisition:
English initial consonants in the first fifty words. Language 51, pp. 419-439.
Finnegan, E. M., Lushei, E. S., & Hoffman, H. T. (2000). Modulations of respiratory and
laryngeal activity associated with changes in vocal intensity during speech. Journal of
Speech, Language and Hearing Research 43, pp. 934-950.
Flege, J. E. (1987). The production of "new" and "similar" phones in a foreign language:
evidence for the effect of equivalence classification. Journal of Phonetics 15, pp. 47-65.
Flege, J. E. (2002). Interactions between the Native and Second Language Phonetic
Systems. In An Integrated View of Language Development: Papers in Honor of Henning
Wode, eds. Burmeister, P., Piske, T., & Rohde, A., Wissenschaftlicher Verlag, Trier.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995). Factors affecting strength of
perceived foreign accent in a second language. Journal of Acoustical Society of America
97, pp. 3125-3134.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters.
Psychological Bulletin 76, pp. 378-382.
Fletcher, H. F. & Munson, W. A. (1933). Loudness, its definition, measurement and
calculation. Journal of Acoustical Society of America 5, pp. 82-108.
Fónagy, I. (1966). Electro-physiological and acoustic correlates of stress and stress
perception. Journal of Speech and Hearing Research pp. 231-244.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress.
Journal of the Acoustical Society of America 27, pp. 765-768.
Fry, D. B. (1976). Experiments in the perception of stress. In Acoustic Phonetics: a
Course of Basic Readings, ed. Fry, D. B., Cambridge University Press, Cambridge.
Fudge, E. C. (1969). Syllables. In Phonological Theory: The Essential Readings, ed.
Goldsmith, J. A., pp. 370-391. Blackwell, Malden, Massachusetts - Oxford.
Gauffin, J. & Sundberg, J. (1989). Spectral correlates of glottal voice source waveform
characteristics. Journal of Speech and Hearing Research 32, pp. 556-565.
Gawlitzek-Maiwald, I. & Tracy, R. (1996). Bilingual Bootstrapping. Linguistics 34, pp.
901-926.
Genesee, F. (1989). Early bilingual development: one language or two? Journal of Child
Language 16, pp. 161-179.
Genesee, F., Nicoladis, E., & Paradis, J. (1995). Language differentiation in early
bilingual development. Journal of Child Language 22, pp. 611-631.
Gimson, A. C. (1962). An Introduction to the Pronunciation of English, Edward Arnold
(Publishers) Ltd, London, UK.
Gobl, C. & Ní Chasaide, A. (1988). The effects of adjacent voiced/voiceless consonants
on the vowel voice source: a cross-language study. STL-QPSR 2-3.
Gobl, C. & Ní Chasaide, A. (1999b). Voice source variation in the vowel as a function of
consonantal context. In Coarticulation: Theory, Data, Techniques, eds. Hardcastle, W. J.
& Hewlett, N., pp. 122-143. Cambridge University Press.
Gobl, C. & Ní Chasaide, A. (1999a). Perceptual correlates of source parameters in
breathy voice. In Proceedings of the 14th International Congress of Phonetic Sciences pp.
2437-2440. San Francisco.
González-Bueno, M. (2002). Dental versus Alveolar Articulation of L2 Spanish Stops as
Perceived by Native Speakers of Malayalam. In Proceedings of "Linguistics and
Phonetics 2002" (LP2002), eds. Haraguchi, S., Palek, B., & Fujimura, O.,
Charles University Press and Meikai University, Japan.
Gordeeva, O. B., Mennen, I., & Scobbie, J. M. (2003). Vowel Duration and Spectral
Balance in Scottish English and Russian. In Proceedings of the 15th International
Congress of Phonetic Sciences, eds. Solé, M. J., Recasens, D., & Romero, J., pp. 3193-3196. Barcelona.
Gordeeva, O. B. & Scobbie, J. M. (2004). Non-normative preaspiration of voiceless fricatives in
Scottish English. Paper presented at the Colloquium of the British Association of
Academic Phoneticians, University of Cambridge, Cambridge.
Grosjean, F. (1982). Life with Two Languages: An Introduction to Bilingualism, reprint
2002 ed., pp. 1-370. Harvard University Press, Cambridge, Massachusetts, London.
Grosjean, F. (2001). Bilingual's Language Modes. In One Mind, Two Languages:
Bilingual Language Processing, ed. Nicol, J. L., pp. 1-23. Blackwell, Oxford.
Grunwell, P. (1982). Clinical Phonology, 2nd ed. Churchill Livingstone, London.
Guion, S. G. (2003). The Vowel Systems of Quichua-Spanish Bilinguals: Age of
Acquisition Effects on the Mutual Influence of the First and Second Languages.
Phonetica 60, pp. 98-128.
Hamers, J. F. & Blanc, M. H. A. (2000). Bilinguality and Bilingualism, Cambridge
University Press, Cambridge.
Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates.
Journal of the Acoustical Society of America 101, pp. 466-481.
Hawkins, S. & Midgley, J. (2004). Formant frequencies of RP monophthongs in four age-groups of speakers. Journal of the International Phonetic Association.
Heldner, M. (2001). Spectral emphasis as a perceptual cue to prominence. Fonetik 42, pp.
51-57.
Heldner, M. (2003). On the reliability of overall intensity and spectral emphasis as
acoustic correlates of focal accents in Swedish. Journal of Phonetics 31, pp. 39-62.
Hewlett, N., Matthews, B. M., & Scobbie, J. M. (1999). Vowel Duration in Scottish
English Speaking Children. In Proceedings of the 14th International Congress of Phonetic
Sciences, pp. 2157-2160. San Francisco.
Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic Characteristics
of American English Vowels. Journal of the Acoustical Society of America 97, pp. 3099-3111.
Hirose, H. (1999). Investigating the Physiology of Laryngeal Structures. In The Handbook
of Phonetic Sciences, eds. Hardcastle, W. J. & Laver, J., Blackwell,
Oxford/Massachusetts.
Hirst, D. & Di Cristo, A. (1998). A survey of intonation systems. In Intonation Systems: A Survey
of Twenty Languages, eds. Hirst, D. & Di Cristo, A., pp. 1-44. Cambridge University Press, Cambridge, U.K.
House, A. S. (1961). On Vowel Duration in English. Journal of the Acoustical Society of
America 33, pp. 1174-1178.
Jakobson, R. (1941). Child Language, Aphasia and Phonological Universals, reprint in
English 1968 ed. Mouton, The Hague, Paris.
Jessen, M. (2002). Spectral Balance in German and its relevance for syllable cut theory. In
Silbenschnitt und Tonakzente, eds. Auer, P., Gilles, P., & Spiekermann, H., pp. 153-179.
Max Niemeyer, Tübingen.
Johnson, C. E. & Lancaster, P. (1998). The Development of More Than One Phonology:
A Case Study of a Norwegian-English Bilingual Child. International Journal of
Bilingualism 2, pp. 265-300.
Jones, D. (1918). An Outline of English Phonetics, 9th (1972) ed. W. Heffer & Sons,
Cambridge.
Kavitskaya, D. (2002). Perceptual Salience and Palatalization in Russian. Oral Paper at
the Eighth Conference on Laboratory Phonology "Varieties of Phonological Competence".
Keating, P. A. (1984). Phonetic and Phonological Representation of stop consonant
voicing. In Phonetic Linguistics, ed. Fromkin, V., Academic Press, New York.
Kehoe, M. M. (2002). Developing vowel systems as a window to bilingual phonology.
International Journal of Bilingualism 6, pp. 315-334.
Kehoe, M. M. (2004). Voice Onset time in bilingual German-Spanish children.
Bilingualism: Language and Cognition 7, pp. 71-88.
Kehoe, M. M. & Stoel-Gammon, C. (2001). Development of syllable structure in English-speaking children with particular reference to rhymes. Journal of Child Language 28, pp.
393-432.
Kehoe, M. M., Trujillo, C., & Lleó, C. (2001). Phonological acquisition of bilingual
children: An analysis of syllable structure and Voice Onset Time. In Proceedings of the
Colloquium on Structure, Acquisition, and Change of Grammars: Phonological and
Syntactic Aspects, eds. Cantone, K. & Hinzelin, M., pp. 38-54.
Kent, R. D. & Read, C. (2002). Acoustic Analysis of Speech, pp. 1-311. Thomson
Learning, Albany.
Keshavarz, M. H. & Ingram, D. (2002). The early phonological development of a Farsi-English bilingual child. International Journal of Bilingualism 6, pp. 255-269.
Kessler, B. & Treiman, R. (1997). Syllable Structure and the Distribution of Phonemes in
English Syllables. Journal of Memory and Language 37, pp. 295-311.
Khattab, G. (2000). VOT Production in English and Arabic Bilingual and Monolingual
Children. Leeds Working Papers in Linguistics and Phonetics 8, pp. 95-122.
Khattab, G. (2002). Sociolinguistic Competence and the Bilingual's Adoption of Phonetic
Variants: Auditory and Instrumental Data from English-Arabic Bilinguals, Unpublished
Ph.D. Thesis. The University of Leeds, Leeds.
Khattab, G. (2004). Variation in vowel production by English-Arabic bilinguals. Paper at
the 9th Conference of Laboratory Phonology, June 24-26.
Kuznetsov, V. I. (1997). Vokalizm russkoj rechi, Izdatel'stvo Sankt Peterburgskogo
universiteta, St-Petersburg.
Ladd, D. R. (1996). Intonational Phonology, Cambridge University Press, Cambridge.
Ladefoged, P. (1971). Preliminaries to Linguistic Phonetics, University of Chicago Press,
Chicago.
Ladefoged, P. (1993). A Course in Phonetics, pp. 1-300. Harcourt Brace College
Publishers.
Ladefoged, P. & McKinney, N. P. (1963). Loudness, sound pressure, and subglottal
pressure in speech. Journal of the Acoustical Society of America pp. 454-460.
Lado, R. (1957). Linguistics Across Cultures: Applied Linguistics for Language Teachers,
University of Michigan Press, Ann Arbor, Michigan.
Lanza, E. (1992). Can bilingual two-year olds code-switch? Journal of Child Language
19, pp. 633-658.
Lanza, E. (2000). Concluding Remarks: Language Contact -- A Dilemma for the bilingual
Child or for the Linguist? In Cross-linguistic structures in simultaneous language
acquisition, ed. Döpke, S., pp. 227-246. John Benjamins, Amsterdam.
Laver, J. (1994). Principles of Phonetics, Cambridge University Press, Cambridge.
Lehiste, I. (1977). Suprasegmentals, pp. 1-194. The Massachusetts Institute of
Technology, Massachusetts.
Lenneberg, E. H. (1967). Biological Foundations of Language, Wiley, New York.
Lindblom, B. (1998). Systemic constraints and adaptive change in the formation of sound
structure. In Approaches to the Evolution of Language: Social and Cognitive Bases, eds.
Hurford, J. R., Studdert-Kennedy, M., & Knight, C., pp. 242-264. Cambridge University
Press, Cambridge.
Lisker, L. (1974). On "Explaining" Vowel Duration Variation. Glossa: An International
Journal of Linguistics 8, pp. 233-245.
Lleó, C. (2002). The role of markedness in the acquisition of complex prosodic structures
by German-Spanish bilinguals. International Journal of Bilingualism 6, pp. 291-313.
Lüdi, G. (1987). Les marques transcodiques: regards nouveaux sur le bilinguisme. In
Devenir bilingue-parler bilingue. Actes du 2e colloque sur le bilinguisme, Université de
Neuchâtel, 20-22 Septembre, 1984, ed. Lüdi, G., pp. 1-21. Max Niemeyer Verlag,
Tübingen.
Lyon, J. (1996). Becoming Bilingual: Language acquisition in a bilingual community,
Multilingual Matters, Clevedon, England; Philadelphia, PA.
Mack, M. (1982). Voicing-dependent vowel duration in English and French: monolingual
and bilingual production. Journal of the Acoustical Society of America 71, pp. 173-178.
Macken, M. A. (1986). Phonological development: a crosslinguistic perspective. In
Language Acquisition, eds. Fletcher, P. & Garman, M., pp. 251-268. Cambridge
University Press, Cambridge.
Mackenzie Beck, J. (1997). Organic Variation of the Vocal Apparatus. In The Handbook
of Phonetic Sciences, eds. Hardcastle, W. J. & Laver, J., Blackwell,
Oxford/Massachusetts.
MacWhinney, B. (1997). Second Language Acquisition and the Competition Model. In
Tutorials in Bilingualism: Psycholinguistic Perspectives, eds. De Groot, A. M. B. &
Kroll, J. F., pp. 113-144. Lawrence Erlbaum Associates, Mahwah, New Jersey.
MacWhinney, B. (2004). A Unified Model of Language Acquisition. In Handbook of
bilingualism: Psycholinguistic approaches, eds. Kroll, J. & De Groot, A., Oxford
University Press, Oxford.
Markus, D. & Bond, D. (1999). Stress and Length in Learning Latvian. In 14th
International Congress of Phonetic Sciences pp. 563-566. San Francisco.
Matthews, B. M. (2002). On Variability and the Acquisition of Vowels in Normally
Developing Scottish Children (18-36 months), Unpublished Ph.D. thesis. Queen Margaret
University College, Edinburgh.
McKenna, G. (1988). Vowel Duration in the Standard English of Scotland, unpublished
MSc thesis, University of Edinburgh, Edinburgh.
McLaughlin, B. (1984). Second language acquisition in childhood, 2nd ed. Erlbaum,
Hillsdale, NJ.
Meisel, J. (1989). Early differentiation of languages in bilingual children. In Bilingualism
across the life span. Aspects of acquisition, maturity and loss, eds. Hyltenstam, K. &
Obler, L., pp. 13-40. Cambridge University Press, Cambridge.
Meisel, J. (2003). The Bilingual Child. In The Handbook of Bilingualism, eds. Bhatia, T. K.
& Ritchie, W. C., Blackwell Publishing, Oxford (UK) - Cambridge (USA).
Menn, L. & Stoel-Gammon, C. (1995). Phonological Development. In The Handbook of
Child Language, eds. Fletcher, P. & MacWhinney, B., pp. 335-360. Blackwell, Oxford.
Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of
Greek. Journal of Phonetics 32, pp. 543-563.
Mohanan, K. P. (1992). Emergence of Complexity in Phonological Development. In
Phonological Development: Models, Research, Implications, eds. Ferguson, C. A., Menn,
L., & Stoel-Gammon, C., pp. 635-662. Timonium, Maryland.
Müller, N. (1998). Transfer in bilingual first language acquisition. Bilingualism:
Language and Cognition 1, pp. 151-171.
Muysken, P. (2000). Bilingual Speech: A Typology of Code-Mixing, Cambridge University
Press, Cambridge.
Netsell, R., Lotz, W. K., Peters, J. E., & Schulte, L. (1994). Developmental patterns of
Laryngeal and Respiratory Function for Speech Production. Journal of Voice 8, pp. 123-131.
Ní Chasaide, A. & Gobl, C. (1999). Voice Source Variation. In The Handbook of
Phonetic Sciences, eds. Hardcastle, W. J. & Laver, J., pp. 427-461. Blackwell,
Oxford/Massachusetts.
Odlin, T. (1989). Language Transfer, Cambridge University Press, Cambridge.
Otomo, K. & Stoel-Gammon, C. (1992). The acquisition of unrounded vowels in English.
Journal of Speech and Hearing Research 35, pp. 604-616.
Padgett, J. (2005). Russian voicing assimilation, final devoicing, and the problem of [v]
(or, the mouse that squeaked). Natural Language and Linguistic Theory to appear.
Panasyuk, A. Y., Panasyuk, I. V., Gorlovsky, A. L., & Anfimova, O. V. (1995).
Perception of Tense-Lax Vowels and Fortis-Lenis Consonants by Russian Learners of
English. In Proceedings of XIIIth International Congress of Phonetic Sciences, eds.
Elenius, K. & Branderud, P., pp. 566-569. KTH and Stockholm University, Stockholm.
Paradis, J. (2001). Do bilingual two-year olds have separate phonological systems?
International Journal of Bilingualism 5, pp. 19-38.
Paradis, J. (2000). Beyond "One System or Two?": Degrees of Separation Between the
Languages of French-English Bilingual Children. In Cross-linguistic structures in
simultaneous language acquisition, ed. Döpke, S., pp. 175-200. John Benjamins,
Amsterdam.
Paradis, J. & Genesee, F. (1996). Syntactic acquisition in bilingual children: Autonomous
or interdependent? Studies in Second Language Acquisition 18, pp. 1-25.
Paradis, M. (2004). A Neurolinguistic Theory of Bilingualism, John Benjamins Publishing
Company, Amsterdam/Philadelphia.
Paradis, M. (1993). Linguistic, psycholinguistic, and neurolinguistic aspects of
"interference" in bilingual speakers: The Activation Threshold Hypothesis. International
Journal of Psycholinguistics 9, pp. 133-145.
Paradis, M. (1981). Neurolinguistic Organisation of Bilingualism. LACUS Forum 7, pp.
486-494.
Paradis, M. (1998). Aphasia in Bilinguals: How Atypical is it? In Aphasia in Atypical
Populations, eds. Coppens, P., Lebrun, Y., & Basso, A., pp. 35-66. Lawrence Erlbaum
Associates, London.
Pater, J. (2003). The Perceptual Acquisition of Thai Phonology by English Speakers: Task
and Stimulus Effects. Second Language Research 19, pp. 209-223.
Petersen, J. (1988). Word-internal code-switching constraints in a bilingual child's
grammar. Linguistics 26, pp. 479-493.
Peterson, G. E. & Lehiste, I. (1960). Duration of Syllable Nuclei in English. Journal of the
Acoustical Society of America 32, pp. 693-703.
Petitto, L. A. (2001). Bilingual signed and spoken language acquisition from birth:
implications for the mechanisms underlying early bilingual language acquisition. Journal
of Child Language 28, pp. 453-496.
Piske, T., Flege, J. E., & MacKay, I. R. A. (2002). The Production of English Vowels by
Fluent Early and Late Italian-English Bilinguals. Phonetica 59, pp. 49-71.
Potisuk, S., Gandour, J., & Harper, M. P. (1996). Acoustic Correlates of Stress in Thai.
Phonetica 53, pp. 200-220.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical
Recipes in C: the Art of Scientific Computing, 2nd ed. Cambridge University Press,
Cambridge.
Redlinger, W. E. & Park, T.-Z. (1980). Language mixing in young bilinguals. Journal of
Child Language 7, pp. 337-352.
Remijsen, B. (2002). Word-prosodic Systems of Raja Ampat Languages, Universiteit
Leiden Centre of Linguistics, Leiden.
Rietveld, A. C. M. & van Heuven, V. J. (1997). Algemene Fonetiek, pp. 1-420. Dick
Coutinho, Bussum.
Robinson, D. W. & Dadson, R. S. (1956). A redetermination of the equal-loudness
relations for pure tones. British Journal of Applied Physics 7, pp. 166-181.
Rockey, D. (1973). Phonetic lexicon of monosyllabic and some disyllabic words, with
homophones, arranged according to their phonetic structure, Heyden & Son LTD,
London, New York, Rheine.
Schlyter, S. (1993). The weaker language in bilingual Swedish-French children. In
Progression and Regression in Language, eds. Hyltenstam, K. & Viberg, A., Cambridge
University Press, Cambridge.
Schnitzer, M. L. & Krasinski, E. (1994). The development of segmental phonological
production in a bilingual child. Journal of Child Language 21, pp. 585-622.
Schnitzer, M. L. & Krasinski, E. (1996). The development of segmental phonological
production in a bilingual child: a contrasting second case. Journal of Child Language 23,
pp. 547-571.
Scobbie, J. M. (2005). Flexibility in the face of incompatible English VOT systems. In
Papers in Laboratory Phonology 8: Varieties of Phonological Competence, eds.
Goldstein, L. M., Best, C., & Whalen, D.
Scobbie, J. M. (2002). Fuzzy contrasts, fuzzy inventories, fuzzy systems: Thoughts on
quasi-phonemic contrasts, the phonetics/phonology interface and sociolinguistic variation.
Second International Conference of Contrast in Phonology, University of Toronto (oral
paper), Toronto.
Scobbie, J. M., Hewlett, N., & Turk, A. (1999a). Standard English in Edinburgh and
Glasgow: the Scottish Vowel Length Rule revealed. In Urban Voices: Accent Studies in
the British Isles, eds. Foulkes, P. & Docherty, G., pp. 230-245. Arnold, London.
Scobbie, J. M., Turk, A., & Hewlett, N. (1999b). Morphemes, Phonetics and Lexical
Items: The Case of the Scottish Vowel Length Rule. In Proceedings of the 14th
International Congress of Phonetic Sciences pp. 1617-1620. San Francisco.
Selkirk, E. (1982). The Syllable. In Phonological Theory: The Essential Readings, ed.
Goldsmith, J. A., Blackwell, Malden, Massachusetts - Oxford.
Shvachkin, N. K. (1948). The Development of Phonemic Speech Perception in Early
Childhood. In Studies in Child Language Development, eds. Ferguson, C. A. & Slobin, D.
I., pp. 91-127. Holt, Rinehart and Winston, Inc., New York.
Sjölander, K. & Beskow, J. (2000). WaveSurfer - an Open Source Speech Tool. In Proceedings of the
International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China.
Sluijter, A. M. C. & van Heuven, V. J. (1996b). Spectral balance as an acoustic correlate
of linguistic stress. Journal of the Acoustical Society of America 100, pp. 2471-2476.
Sluijter, A. M. C. & van Heuven, V. J. (1996a). Acoustic correlates of linguistic stress and
accent in Dutch and American English. In Proceedings of ICSLP '96, Philadelphia.
Sluijter, A. M. C., van Heuven, V. J., & Pacilly, J. J. A. (1997). Spectral Balance as a cue
in the perception of linguistic stress. Journal of the Acoustical Society of America 101, pp.
503-513.
Smith, C. L. (1997). The devoicing of /z/ in American English: effects of local and
prosodic context. Journal of Phonetics 25, pp. 471-500.
Stevens, K. N. (1998). Acoustic Phonetics, The MIT Press, Cambridge, Massachusetts.
Stoel-Gammon, C. & Buder, E. H. (1999). Vowel Length, Post-Vocalic Voicing and VOT
in the Speech of Two-Year Olds. In Proceedings of the 14th International Congress of
Phonetic Sciences pp. 2485-2488. San Francisco.
Stoel-Gammon, C., Buder, E. H., & Kehoe, M. M. (1995). Acquisition of vowel duration:
a comparison of Swedish and English. Proceedings of the 13th International Congress of
Phonetic Sciences, Stockholm.
Stoel-Gammon, C. & Herrington, P. B. (1990). Vowel systems of normally developing
and phonologically disordered children. Clinical Linguistics and Phonetics 4, pp. 145-160.
Stow, C. & Dodd, B. (2003). Providing an equitable service to bilingual children in the
UK: a review. International Journal of Language and Communication Disorders 38, pp.
351-378.
Stathopoulos, E. T. (1995). Variability revisited: an acoustic, aerodynamic, and
respiratory kinematic comparison of children and adults during speech. Journal of
Phonetics 23, pp. 67-80.
Stathopoulos, E. T. & Sapienza, C. (1993). Respiratory and laryngeal measures of
children during vocal intensity variation. Journal of the Acoustical Society of America 94, pp.
2531-2543.
Svetozarova, N. (1998). Intonation in Russian. In Intonation Systems: A Survey of Twenty Languages, eds. Hirst, D. & Di Cristo, A., pp. 261-274. Cambridge University Press, Cambridge.
Taff, A., Rozelle, L., Cho, T., Ladefoged, P., Dirks, M., & Wegelin, J. (2004). Phonetic
Structures of Aleut. Journal of Phonetics 29, pp. 231-271.
Titze, I. R. (1994). Principles of Voice Production, Prentice-Hall, Englewood Cliffs, N.J.,
USA.
Titze, I. R. & Sundberg, J. (1992). Vocal intensity in speakers and singers. Journal of the
Acoustical Society of America 91, pp. 2936-2946.
Tomioka, N. (2002). A bilingual language production model. Paper presented at the
International Symposium on the Multimodality of Human Communication, University of
Toronto, 5 May.
Traunmüller, H. & Eriksson, A. (1997). A method of measuring formant frequencies at
high fundamental frequencies. Proceedings of EuroSpeech '97 1, pp. 477-480.
Traunmüller, H. & Eriksson, A. (2000). Acoustic effects of variation in vocal effort by
men, women, and children. Journal of the Acoustical Society of America 107, pp. 3438-3451.
Trubetskoy, N. S. (1939). Grundzüge der Phonologie, Moscow, 2000.
Tsejtlin, S. V. (2002). Yazyk i rebenok: lingvistika detskoj rechi, pp. 1-239. Vlados,
Moscow.
Turk, A. & Sawusch, J. R. (1999). The domain of accentual lengthening in American
English. Journal of Phonetics 25, pp. 25-41.
van Zanten, E., Damen, L., & van Houten, E. (1991). The ASSP Speech Database. SPIN/ASSP report 41. Speech Technology Foundation, Utrecht.
Vihman, M. M. (1996). Phonological Development: The Origins of Language in the
Child, Blackwell Publishers, Cambridge, Massachusetts - Oxford.
Vihman, M. M. (2002). Getting started without a system: from phonetics to phonology in
bilingual development. International Journal of Bilingualism 6, pp. 239-254.
Volterra, V. & Taeschner, T. (1978). The acquisition and development of language in
bilingual children. Journal of Child Language 5, pp. 311-326.
Walker, V. (1992). The Formant Frequencies of Scottish Vowels, Unpublished BSc
dissertation. Queen Margaret University College, Edinburgh.
Weinreich, U. (1953). Languages in Contact: Findings and Problems, 9th (1979) ed.
Mouton, The Hague.
Wells, J. (1995). Computer-coding the IPA: a proposed extension of SAMPA.
Wells, J. (1962). A study of the formants of the pure vowels of British English. University
of London, London.
Wells, J. (1982). Accents of English, pp. 1-673. Cambridge University Press, Cambridge.
Whitworth, N. (2003). Bilingual Acquisition of Speech Timing: Aspects of Rhythm
Production by German-English Families, Unpublished Ph.D. thesis. The University of
Leeds, Leeds.
Williams, L. (1980). Phonetic variation as a function of second-language learning. In
Child Phonology: perception, eds. Yeni-Komshian, G., Ferguson, C., & Kavanagh, J., pp.
185-216. Academic Press, New York.
Wode, H. (1992). Categorical Perception and Segmental Coding. In Phonological
Development: Models, Research, Implications, eds. Ferguson, C. A., Menn, L., & Stoel-Gammon, C., pp. 605-631. Timonium, Maryland.
Zharkova, N. N. (2002). Razvitie fonologicheskoj sistemy detskoy rechi
(eksperimental'no-foneticheskoe issledovanie), unpublished M.Sc thesis. St. Petersburg
State University, St. Petersburg.
Appendix A Phonetic ranges of the production of the target
/i/ by the SSE monolingual children.
Tokens per speaker and phonetic label, N (%)
Speaker   [i]            []           Total
C3_3;4    87 (100.0%)    0 (0.0%)     87 (100.0%)
C7_4;2    74 (100.0%)    0 (0.0%)     74 (100.0%)
C4_3;8    36 (100.0%)    0 (0.0%)     36 (100.0%)
C3_3;11   78 (100.0%)    0 (0.0%)     78 (100.0%)
C6_4;0    68 (100.0%)    0 (0.0%)     68 (100.0%)
C8_4;2    122 (100.0%)   0 (0.0%)     122 (100.0%)
C5_4;0    52 (100.0%)    0 (0.0%)     52 (100.0%)
C4_4;1    54 (100.0%)    0 (0.0%)     54 (100.0%)
C9_4;9    105 (99.1%)    1 (0.9%)     106 (100.0%)
C7_4;8    108 (97.3%)    3 (2.7%)     111 (100.0%)
Total     784 (99.5%)    4 (0.5%)     788 (100.0%)
Appendix B Distributions of the three most frequent
phonetic labels (per carrier word) for the target //
produced by the SSE monolingual children.
Tokens per carrier word and label: N (% within carrier; % within label)
Carrier  []                   [u]                 []                  Total
shoes    193 (98.0%; 38.8%)   4 (2.0%; 15.4%)     0 (0.0%; 0.0%)      197 (100.0%; 33.3%)
soup     6 (85.7%; 1.2%)      0 (0.0%; 0.0%)      1 (14.3%; 1.4%)     7 (100.0%; 1.2%)
cook     126 (69.2%; 25.4%)   6 (3.3%; 23.1%)     50 (27.5%; 72.5%)   182 (100.0%; 30.7%)
food     136 (81.9%; 27.4%)   15 (9.0%; 57.7%)    15 (9.0%; 21.7%)    166 (100.0%; 28.0%)
put      28 (93.3%; 5.6%)     0 (0.0%; 0.0%)      2 (6.7%; 2.9%)      30 (100.0%; 5.1%)
foot     6 (85.7%; 1.2%)      1 (14.3%; 3.8%)     0 (0.0%; 0.0%)      7 (100.0%; 1.2%)
took     2 (66.7%; 0.4%)      0 (0.0%; 0.0%)      1 (33.3%; 1.4%)     3 (100.0%; 0.5%)
Total    497 (84.0%; 100.0%)  26 (4.4%; 100.0%)   69 (11.7%; 100.0%)  592 (100.0%; 100.0%)
Appendix C Duration of the close(-mid) vowels produced
by the adult subjects as a function of the following consonant
in SSE, MSR and SSBE.
Language Speaker  Vowel  Following consonant  Median (ms)  Mean (ms)  Std. Dev.  n of tokens
SSE      S2       /i/    fric voice+          218.02       212.59     39.55      28
                         stop voice+          135.89       136.11     13.49      28
                         stop voice-          105.13       105.36     22.20      25
                         Total                138.11       153.06     52.81      81
                  //     fric voice+          112.84       115.18     14.22      12
                         stop voice+          90.10        93.96      16.51      12
                         stop voice-          70.72        71.48      6.88       6
                         Total                97.25        97.95      21.51      30
                  //     fric voice+          203.01       197.68     28.96      28
                         stop voice+          119.19       118.08     14.47      26
                         stop voice-          107.32       110.09     20.83      13
                         Total                130.95       149.80     46.70      67
         S1       /i/    fric voice+          165.29       173.28     25.15      30
                         stop voice+          114.91       114.62     12.72      30
                         stop voice-          99.41        97.97      12.71      30
                         Total                117.27       128.63     36.97      90
                  //     fric voice+          119.23       118.56     15.34      14
                         stop voice+          117.09       115.90     7.70       15
                         stop voice-          99.19        100.65     10.25      13
                         Total                113.41       112.07     13.66      42
                  //     fric voice+          169.96       177.11     27.69      30
                         stop voice+          113.43       114.17     9.56       30
                         stop voice-          104.31       105.15     9.46       15
                         Total                121.27       137.54     37.71      75
         S5       /i/    fric voice+          219.32       223.86     25.85      27
                         stop voice+          115.73       116.80     15.21      28
                         stop voice-          100.36       101.19     9.97       25
                         Total                117.55       148.05     57.74      80
                  //     fric voice+          104.05       106.69     10.22      15
                         stop voice+          105.39       105.63     12.68      15
                         stop voice-          92.87        91.02      12.61      14
                         Total                102.66       101.34     13.62      44
                  //     fric voice+          223.11       225.00     31.79      26
                         stop voice+          106.28       108.81     13.84      29
                         stop voice-          109.51       107.35     11.69      14
                         Total                121.45       152.30     60.99      69
         S4       /i/    fric voice+          216.78       218.43     59.97      21
                         stop voice+          111.92       116.43     22.13      22
                         stop voice-          99.73        99.52      27.91      20
                         Total                119.09       145.06     65.94      63
                  //     fric voice+          118.15       111.77     26.37      9
                         stop voice+          92.15        105.90     32.59      11
                         stop voice-          95.79        100.41     26.93      14
                         Total                102.21       105.20     28.24      34
                  //     fric voice+          249.05       225.09     73.49      28
                         stop voice+          118.54       114.60     25.83      23
                         stop voice-          96.83        104.73     29.09      11
                         Total                134.81       162.75     77.69      62
         S3       /i/    fric voice+          223.97       225.18     22.46      29
                         stop voice+          147.68       148.59     18.01      28
                         stop voice-          124.08       124.00     19.18      25
                         Total                151.69       168.18     47.85      82
                  //     fric voice+          139.25       139.83     13.28      14
                         stop voice+          118.67       119.61     13.69      15
                         stop voice-          112.40       111.29     21.59      14
                         Total                125.77       123.49     20.13      43
                  //     fric voice+          224.93       225.79     24.56      30
                         stop voice+          132.66       135.35     18.73      30
                         stop voice-          124.19       123.24     16.28      14
                         Total                150.54       169.72     51.17      74
MSR      R3       /i/    fric voice+          104.85       106.71     19.03      44
                         stop voice+          99.83        103.71     20.87      45
                         stop voice-          90.86        88.05      13.19      30
                         Total                98.59        100.87     19.87      119
                  /u/    fric voice+          116.16       115.63     18.64      29
                         stop voice+          99.62        104.84     17.46      29
                         stop voice-          106.37       113.98     23.71      15
                         Total                106.37       111.00     19.72      73
         R4       /i/    fric voice+          84.78        88.19      14.51      45
                         stop voice+          93.78        94.32      9.89       39
                         stop voice-          80.72        81.45      12.86      29
                         Total                87.69        88.58      13.49      113
                  /u/    fric voice+          100.52       100.10     14.41      29
                         stop voice+          90.48        89.12      11.49      29
                         stop voice-          95.11        91.47      15.27      15
                         Total                93.02        93.96      14.26      73
         R2       /i/    fric voice+          90.76        92.61      19.69      40
                         stop voice+          90.48        92.19      15.14      39
                         stop voice-          83.91        84.17      9.88       26
                         Total                87.97        90.36      16.29      105
                  /u/    fric voice+          96.95        99.34      12.37      26
                         stop voice+          101.96       102.83     15.57      28
                         stop voice-          95.71        92.63      9.87       10
                         Total                98.34        99.82      13.81      64
         R1       /i/    fric voice+          135.93       134.03     16.35      41
                         stop voice+          116.52       117.10     12.45      20
                         stop voice-          107.46       104.13     13.95      44
                         Total                117.79       118.27     19.86      105
                  /u/    fric voice+          142.10       142.64     20.17      21
                         stop voice+          104.29       103.78     18.12      40
                         stop voice-          97.60        100.48     16.41      40
                         Total                107.61       110.55     24.28      101
         R5       /i/    fric voice+          114.35       116.44     15.75      15
                         stop voice+          97.08        100.59     12.55      15
                         stop voice-          82.15        81.26      16.06      14
                         Total                97.54        99.84      20.46      44
                  /u/    fric voice+          121.63       120.89     15.70      15
                         stop voice+          92.06        99.28      17.65      15
                         stop voice-          89.77        88.19      8.66       30
                         Total                92.92        99.14      18.72      60
SSBE     E2       /i/    fric voice+          246.41       245.96     27.32      30
                         stop voice+          229.88       223.55     22.19      14
                         stop voice-          138.42       134.31     16.18      29
                         Total                214.65       197.31     56.66      73
                  //     fric voice+          146.65       149.06     17.95      15
                         stop voice+          145.81       144.76     15.25      15
                         stop voice-          126.54       125.91     16.19      15
                         Total                142.87       139.91     19.06      45
                  /u/    fric voice+          235.30       234.01     23.96      29
                         stop voice+          214.94       222.02     26.73      15
                         Total                227.73       229.92     25.29      44
                  //     stop voice-          110.52       112.42     15.39      22
                         Total                110.52       112.42     15.39      22
         E1       /i/    fric voice+          250.11       246.64     25.19      30
                         stop voice+          228.31       231.96     19.89      15
                         stop voice-          155.65       151.76     23.96      29
                         Total                217.17       206.48     50.34      74
                  //     fric voice+          143.12       150.55     22.79      15
                         stop voice+          132.71       133.65     22.01      15
                         stop voice-          113.08       118.16     25.91      15
                         Total                133.43       134.12     26.68      45
                  /u/    fric voice+          228.96       233.48     26.19      27
                         stop voice+          221.59       229.00     32.16      14
                         Total                226.61       231.95     28.04      41
                  //     stop voice-          111.68       112.64     20.17      30
                         Total                111.68       112.64     20.17      30
         E3       /i/    fric voice+          336.38       333.81     37.86      30
                         stop voice+          320.51       313.11     32.83      14
                         stop voice-          176.34       174.84     17.98      30
                         Total                288.21       265.45     81.36      74
                  //     fric voice+          224.15       218.79     26.04      15
                         stop voice+          193.57       190.62     24.09      15
                         stop voice-          159.26       163.16     20.30      15
                         Total                193.37       190.86     32.54      45
                  /u/    fric voice+          323.06       318.58     43.82      30
                         stop voice+          327.39       315.46     44.43      15
                         Total                324.11       317.54     43.54      45
                  //     stop voice-          155.56       158.43     18.63      28
                         Total                155.56       158.43     18.63      28
         E5       /i/    fric voice+          289.12       294.37     29.97      30
                         stop voice+          254.49       256.51     13.75      15
                         stop voice-          145.64       143.99     15.12      28
                         Total                253.27       228.91     72.34      73
                  //     fric voice+          172.54       169.96     17.27      15
                         stop voice+          153.78       155.78     16.17      15
                         stop voice-          126.96       128.39     14.45      15
                         Total                152.04       151.37     23.43      45
                  /u/    fric voice+          288.46       295.55     35.65      28
                         stop voice+          247.77       253.99     16.97      15
                         Total                275.63       281.05     36.26      43
                  //     stop voice-          112.09       115.86     14.90      30
                         Total                112.09       115.86     14.90      30
Appendix D Duration of the close(-mid) vowels produced by
the adult subjects averaged per language (SSE, MSR and
SSBE) and speaker as a function of the following consonant.
Language Vowel  Following consonant  Median (ms)  Mean (ms)  Std. Dev.  n of tokens
SSE      /i/    fric voice+          211.20       209.72     40.33      135
                stop voice+          123.60       126.78     21.12      136
                stop voice-          102.72       105.55     20.86      125
                Total                132.79       148.35     53.55      396
         //     fric voice+          118.57       118.84     19.35      64
                stop voice+          108.97       108.96     19.21      68
                stop voice-          96.37        97.96      21.06      61
                Total                108.01       108.76     21.47      193
         //     fric voice+          208.54       209.68     45.34      142
                stop voice+          118.62       118.46     19.15      138
                stop voice-          108.55       110.28     18.81      67
                Total                132.16       154.21     56.54      347
MSR      /i/    fric voice+          102.72       106.00     24.39      185
                stop voice+          97.81        99.95      17.27      158
                stop voice-          89.72        90.29      16.10      143
                Total                97.07        99.41      20.96      486
         /u/    fric voice+          111.18       113.73     22.48      120
                stop voice+          97.77        100.31     17.08      141
                stop voice-          94.51        97.03      17.26      110
                Total                99.78        103.68     20.27      371
SSBE     /i/    fric voice+          271.66       280.20     47.53      120
                stop voice+          248.62       255.87     41.40      58
                stop voice-          149.99       151.49     23.86      116
                Total                227.88       224.61     71.06      294
         //     fric voice+          165.25       172.09     35.21      60
                stop voice+          150.52       156.20     28.87      60
                stop voice-          130.62       133.90     25.95      60
                Total                148.99       154.06     33.94      180
         /u/    fric voice+          265.60       271.26     50.34      114
                stop voice+          245.31       255.56     48.37      59
                Total                255.55       265.90     50.10      173
         //     stop voice-          119.12       125.13     26.13      110
                Total                119.12       125.13     26.13      110
Appendix E Individual results of the SSE monolingual
children for the duration of the vowel /i/ as a function of the
following consonant.
SSE monolingual child  Following consonant  Median duration of /i/ (ms)  n of tokens
C3_3;4                 fric voice+          299.68                       38
                       stop voice+          165.27                       21
                       stop voice-          173.51                       47
                       Total                191.99                       106
C7_4;2                 fric voice+          435.95                       38
                       stop voice+          163.42                       10
                       stop voice-          182.98                       26
                       Total                268.53                       74
C4_3;8                 fric voice+          350.06                       17
                       stop voice+          209.38                       7
                       stop voice-          163.28                       19
                       Total                209.38                       43
C3_3;11                fric voice+          316.29                       32
                       stop voice+          158.36                       16
                       stop voice-          136.17                       34
                       Total                199.37                       82
C6_4;0                 fric voice+          346.64                       34
                       stop voice+          172.37                       18
                       stop voice-          127.72                       41
                       Total                173.86                       93
C8_4;2                 fric voice+          304.05                       56
                       stop voice+          180.09                       25
                       stop voice-          94.60                        41
                       Total                201.99                       122
C5_4;0                 fric voice+          367.13                       33
                       stop voice+          289.11                       15
                       stop voice-          227.93                       27
                       Total                301.91                       75
C4_4;1                 fric voice+          339.46                       27
                       stop voice+          166.81                       8
                       stop voice-          199.17                       22
                       Total                248.44                       57
C9_4;9                 fric voice+          258.44                       48
                       stop voice+          144.53                       13
                       stop voice-          125.69                       42
                       Total                174.01                       103
C7_4;8                 fric voice+          331.08                       39
                       stop voice+          162.33                       20
                       stop voice-          138.90                       49
                       Total                187.27                       108
Appendix F Individual results of the SSE monolingual
children for the duration of the vowel // as a function of the
following consonant.
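SSE child | Following consonant | Median vowel duration (ms) | n of tokens
C3_3;4 | fric voice+ | 334.12 | 27
C3_3;4 | stop voice+ | 210.24 | 15
C3_3;4 | stop voice- | 136.55 | 28
C3_3;4 | Total | 198.75 | 70
C7_4;2 | fric voice+ | 539.71 | 14
C7_4;2 | stop voice+ | 142.39 | 14
C7_4;2 | stop voice- | 130.97 | 20
C7_4;2 | Total | 151.32 | 48
C4_3;8 | fric voice+ | 232.29 | 7
C4_3;8 | stop voice+ | 255.14 | 6
C4_3;8 | stop voice- | 71.77 | 2
C4_3;8 | Total | 247.67 | 15
C3_3;11 | fric voice+ | 310.43 | 16
C3_3;11 | stop voice+ | 174.94 | 13
C3_3;11 | stop voice- | 111.26 | 18
C3_3;11 | Total | 174.94 | 47
C6_4;0 | fric voice+ | 382.75 | 19
C6_4;0 | stop voice+ | 192.70 | 15
C6_4;0 | stop voice- | 114.95 | 27
C6_4;0 | Total | 187.15 | 61
C8_4;2 | fric voice+ | 375.94 | 25
C8_4;2 | stop voice+ | 157.37 | 26
C8_4;2 | stop voice- | 98.54 | 39
C8_4;2 | Total | 150.65 | 90
C5_4;0 | fric voice+ | 483.88 | 21
C5_4;0 | stop voice+ | 211.10 | 21
C5_4;0 | stop voice- | 136.80 | 21
C5_4;0 | Total | 235.53 | 63
C4_4;1 | fric voice+ | 321.89 | 14
C4_4;1 | stop voice+ | 471.61 | 11
C4_4;1 | stop voice- | 115.15 | 10
C4_4;1 | Total | 249.73 | 35
C9_4;9 | fric voice+ | 217.14 | 21
C9_4;9 | stop voice+ | 106.52 | 15
C9_4;9 | stop voice- | 68.42 | 15
C9_4;9 | Total | 136.67 | 51
C7_4;8 | fric voice+ | 322.19 | 21
C7_4;8 | stop voice+ | 151.56 | 23
C7_4;8 | stop voice- | 123.46 | 22
C7_4;8 | Total | 187.25 | 66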
Appendix G Individual results of the SSE monolingual
children for the duration of the vowel // as a function of the
following consonant.
SSE child | Following consonant | Median duration (ms) | n of tokens
C3_3;4 | fric voice+ | 269.28 | 22
C3_3;4 | fric voice- | 139.31 | 26
C3_3;4 | stop voice+ | 228.52 | 25
C3_3;4 | Total | 262.40 | 73
C7_4;2 | fric voice+ | 182.75 | 11
C7_4;2 | fric voice- | 130.46 | 17
C7_4;2 | stop voice+ | 156.42 | 10
C7_4;2 | Total | 179.61 | 38
C4_3;8 | fric voice+ | 206.57 | 7
C4_3;8 | fric voice- | 176.12 | 12
C4_3;8 | stop voice+ | 157.96 | 7
C4_3;8 | Total | 195.87 | 26
C3_3;11 | fric voice+ | 202.19 | 16
C3_3;11 | fric voice- | 142.87 | 41
C3_3;11 | stop voice+ | 186.87 | 15
C3_3;11 | Total | 179.47 | 72
C6_4;0 | fric voice+ | 171.74 | 19
C6_4;0 | fric voice- | 109.38 | 9
C6_4;0 | stop voice+ | 134.47 | 21
C6_4;0 | Total | 153.40 | 49
C8_4;2 | fric voice+ | 219.20 | 30
C8_4;2 | fric voice- | 108.38 | 35
C8_4;2 | stop voice+ | 128.24 | 29
C8_4;2 | Total | 157.16 | 94
C5_4;0 | fric voice+ | 208.81 | 22
C5_4;0 | fric voice- | 112.57 | 23
C5_4;0 | stop voice+ | 270.59 | 24
C5_4;0 | Total | 239.62 | 69
C4_4;1 | fric voice+ | 177.26 | 12
C4_4;1 | fric voice- | 157.64 | 21
C4_4;1 | stop voice+ | 200.09 | 11
C4_4;1 | Total | 183.26 | 44
C9_4;9 | fric voice+ | 148.25 | 26
C9_4;9 | fric voice- | 99.35 | 32
C9_4;9 | stop voice+ | 102.82 | 22
C9_4;9 | Total | 145.66 | 80
C7_4;8 | fric voice+ | 176.57 | 21
C7_4;8 | fric voice- | 132.69 | 32
C7_4;8 | stop voice+ | 159.37 | 21
C7_4;8 | Total | 158.75 | 74
Appendix H Duration of the vowel /i/ as a function of the
following consonant produced by the bilingual subject AN:
longitudinal results for MSR and SSE.
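Following consonant | Language | Age | Mean duration (ms) | Std. Dev. | n of tokens
voiced fricative | SSE | 3;7 | 247.45 | 117.92 | 118
voiced fricative | SSE | 4;2 | 315.22 | 235.80 | 20
voiced fricative | SSE | 4;5 | 287.41 | 127.60 | 50
voiced fricative | SSE | Total | 265.29 | 138.61 | 188
voiced fricative | MSR | 3;7 | 245.35 | 94.60 | 22
voiced fricative | MSR | 4;2 | 183.33 | 88.11 | 23
voiced fricative | MSR | 4;5 | 219.24 | 45.72 | 26
voiced fricative | MSR | Total | 215.70 | 80.60 | 71
voiced stop | SSE | 3;7 | 179.31 | 59.42 | 33
voiced stop | SSE | 4;2 | 169.33 | 49.36 | 12
voiced stop | SSE | 4;5 | 181.42 | 67.71 | 28
voiced stop | SSE | Total | 178.48 | 60.65 | 73
voiced stop | MSR | 3;7 | 248.82 | 128.60 | 27
voiced stop | MSR | 4;2 | 145.42 | 91.08 | 26
voiced stop | MSR | 4;5 | 145.30 | 47.38 | 31
voiced stop | MSR | Total | 178.61 | 104.19 | 84
voiceless stop | SSE | 3;7 | 176.48 | 94.87 | 66
voiceless stop | SSE | 4;2 | 105.43 | 41.21 | 25
voiceless stop | SSE | 4;5 | 143.30 | 71.02 | 49
voiceless stop | SSE | Total | 152.18 | 83.34 | 140
voiceless stop | MSR | 3;7 | 213.38 | 113.44 | 25
voiceless stop | MSR | 4;2 | 296.22 | 454.53 | 12
voiceless stop | MSR | 4;5 | 201.37 | 79.32 | 27
voiceless stop | MSR | Total | 223.85 | 211.73 | 64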
Appendix I Duration of the vowels // and /u/ as a function
of the following consonant produced by the bilingual subject
AN: longitudinal results for MSR and SSE.
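Following consonant | Language | Age | Mean duration (ms) | Std. Dev. | n of tokens
voiced fricative | SSE | 3;7 | 271.33 | 134.86 | 38
voiced fricative | SSE | 4;2 | 240.17 | 127.98 | 11
voiced fricative | SSE | 4;5 | 343.85 | 186.21 | 27
voiced fricative | SSE | Total | 292.58 | 157.34 | 76
voiced fricative | MSR | 3;7 | 307.92 | 136.30 | 25
voiced fricative | MSR | 4;2 | 308.72 | 144.95 | 30
voiced fricative | MSR | 4;5 | 226.71 | 69.42 | 37
voiced fricative | MSR | Total | 275.52 | 122.97 | 92
voiced stop | SSE | 3;7 | 225.76 | 92.48 | 37
voiced stop | SSE | 4;2 | 168.32 | 37.37 | 5
voiced stop | SSE | 4;5 | 222.66 | 94.66 | 31
voiced stop | SSE | Total | 220.51 | 91.07 | 73
voiced stop | MSR | 3;7 | 177.17 | 67.98 | 20
voiced stop | MSR | 4;2 | 109.56 | 50.06 | 19
voiced stop | MSR | 4;5 | 187.54 | 126.47 | 14
voiced stop | MSR | Total | 155.67 | 88.22 | 53
voiceless stop | SSE | 3;7 | 196.75 | 107.05 | 47
voiceless stop | SSE | 4;2 | 109.41 | 46.21 | 13
voiceless stop | SSE | 4;5 | 123.45 | 57.48 | 51
voiceless stop | SSE | Total | 152.84 | 89.30 | 111
voiceless stop | MSR | 3;7 | 198.89 | 70.85 | 48
voiceless stop | MSR | 4;2 | 186.86 | 99.82 | 41
voiceless stop | MSR | 4;5 | 169.83 | 71.03 | 40
voiceless stop | MSR | Total | 186.06 | 81.48 | 129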
Appendix J Duration of the vowel /i/ as a function of the
following consonant produced by the bilingual subject BS:
longitudinal results for MSR and SSE.
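Following consonant | Age | Language | Mean duration (ms) | Std. Dev. | n of tokens
voiced fricative | 3;4 | SSE | 255.18 | 116.91 | 69
voiced fricative | 3;4 | MSR | 239.99 | 71.37 | 14
voiced fricative | 3;4 | Total | 252.62 | 110.34 | 83
voiced fricative | 3;10 | SSE | 232.34 | 110.68 | 47
voiced fricative | 3;10 | MSR | 207.50 | 91.65 | 33
voiced fricative | 3;10 | Total | 222.09 | 103.38 | 80
voiced fricative | 4;5 | SSE | 266.59 | 107.33 | 68
voiced fricative | 4;5 | MSR | 241.87 | 66.94 | 6
voiced fricative | 4;5 | Total | 264.59 | 104.53 | 74
voiced fricative | Total | SSE | 253.56 | 112.05 | 184
voiced fricative | Total | MSR | 219.97 | 84.47 | 53
voiced fricative | Total | Total | 246.05 | 107.26 | 237
voiced stop | 3;4 | SSE | 201.26 | 76.73 | 28
voiced stop | 3;4 | MSR | 232.07 | 94.55 | 16
voiced stop | 3;4 | Total | 212.46 | 83.90 | 44
voiced stop | 3;10 | SSE | 175.48 | 76.02 | 22
voiced stop | 3;10 | MSR | 189.15 | 117.63 | 41
voiced stop | 3;10 | Total | 184.38 | 104.54 | 63
voiced stop | 4;5 | SSE | 189.36 | 92.30 | 22
voiced stop | 4;5 | MSR | 281.79 | 97.85 | 15
voiced stop | 4;5 | Total | 226.83 | 103.97 | 37
voiced stop | Total | SSE | 189.75 | 81.14 | 72
voiced stop | Total | MSR | 217.99 | 113.83 | 72
voiced stop | Total | Total | 203.87 | 99.51 | 144
voiceless stop | 3;4 | SSE | 251.73 | 155.58 | 61
voiceless stop | 3;4 | MSR | 272.35 | 119.85 | 27
voiceless stop | 3;4 | Total | 258.06 | 145.18 | 88
voiceless stop | 3;10 | SSE | 233.43 | 120.85 | 43
voiceless stop | 3;10 | MSR | 170.83 | 73.93 | 28
voiceless stop | 3;10 | Total | 208.74 | 108.72 | 71
voiceless stop | 4;5 | SSE | 221.06 | 143.88 | 64
voiceless stop | 4;5 | MSR | 249.31 | 161.01 | 24
voiceless stop | 4;5 | Total | 228.76 | 148.34 | 88
voiceless stop | Total | SSE | 235.36 | 142.68 | 168
voiceless stop | Total | MSR | 229.37 | 127.74 | 79
voiceless stop | Total | Total | 233.44 | 137.84 | 247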
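For the voiced fricative at age 3;4, the Total values (mean 252.62 ms, Std. Dev. 110.34, n = 83) can be reproduced from the SSE and MSR values by pooling the two groups: an n-weighted mean, and a pooled (n - 1) standard deviation that adds the spread of each group mean around the grand mean to the within-group spread. The Python sketch below performs that check. `pool_groups` is a hypothetical helper, not part of the thesis, and because the inputs are rounded to two decimals, other rows may differ from the reported Total in the last digit.

```python
import math


def pool_groups(groups):
    """Combine per-group (mean, sd, n) summaries into one pooled (mean, sd, n).

    Treats all tokens as a single sample with an (n - 1) denominator:
    within-group sums of squares plus between-group deviations from the grand mean.
    """
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / n_total
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    pooled_sd = math.sqrt((ss_within + ss_between) / (n_total - 1))
    return grand_mean, pooled_sd, n_total


# Voiced fricative, age 3;4: SSE row and MSR row of the table above.
mean, sd, n = pool_groups([(255.18, 116.91, 69), (239.99, 71.37, 14)])
print(f"{mean:.2f} {sd:.2f} {n}")  # prints: 252.62 110.34 83
```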
Appendix K Duration of the vowels /u/ and // as a
function of the following consonant produced by the
bilingual subject BS: longitudinal results for MSR and SSE.
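Following consonant | Age | Language | Mean duration (ms) | Std. Dev. | n of tokens
voiced fricative | 3;4 | SSE | 259.91 | 99.27 | 41
voiced fricative | 3;4 | MSR | 259.43 | 85.61 | 18
voiced fricative | 3;4 | Total | 259.76 | 94.57 | 59
voiced fricative | 3;10 | SSE | 278.58 | 96.44 | 20
voiced fricative | 3;10 | MSR | 257.99 | 127.38 | 35
voiced fricative | 3;10 | Total | 265.47 | 116.57 | 55
voiced fricative | 4;5 | SSE | 343.03 | 220.21 | 31
voiced fricative | 4;5 | MSR | 244.83 | 114.76 | 23
voiced fricative | 4;5 | Total | 301.21 | 187.93 | 54
voiced fricative | Total | SSE | 291.98 | 153.79 | 92
voiced fricative | Total | MSR | 254.35 | 113.67 | 76
voiced fricative | Total | Total | 274.95 | 138.00 | 168
voiced stop | 3;4 | SSE | 252.69 | 158.71 | 18
voiced stop | 3;4 | MSR | 247.19 | 59.98 | 14
voiced stop | 3;4 | Total | 250.29 | 123.81 | 32
voiced stop | 3;10 | SSE | 247.23 | 115.23 | 18
voiced stop | 3;10 | MSR | 203.17 | 132.28 | 40
voiced stop | 3;10 | Total | 216.84 | 127.89 | 58
voiced stop | 4;5 | SSE | 244.07 | 130.94 | 39
voiced stop | 4;5 | MSR | 260.52 | 128.71 | 23
voiced stop | 4;5 | Total | 250.17 | 129.31 | 62
voiced stop | Total | SSE | 246.90 | 132.87 | 75
voiced stop | Total | MSR | 228.31 | 122.90 | 77
voiced stop | Total | Total | 237.48 | 127.83 | 152
voiceless stop | 3;4 | SSE | 237.06 | 112.20 | 13
voiceless stop | 3;4 | MSR | 272.13 | 70.73 | 23
voiceless stop | 3;4 | Total | 259.46 | 88.05 | 36
voiceless stop | 3;10 | SSE | 205.27 | 108.36 | 33
voiceless stop | 3;10 | MSR | 189.60 | 84.03 | 54
voiceless stop | 3;10 | Total | 195.55 | 93.70 | 87
voiceless stop | 4;5 | SSE | 238.04 | 111.87 | 21
voiceless stop | 4;5 | MSR | 225.38 | 73.35 | 45
voiceless stop | 4;5 | Total | 229.41 | 86.76 | 66
voiceless stop | Total | SSE | 221.71 | 109.73 | 67
voiceless stop | Total | MSR | 218.36 | 83.04 | 122
voiceless stop | Total | Total | 219.55 | 93.10 | 189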
Appendix L Mean RMS-power around F2 (dB) for the adult
subjects averaged per language (SSE, MSR and SSBE) for
the vowel /i/ as a function of the following consonant.
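Acoustic measure | Following consonant | Language | Mean (dB) | Std. Dev. | n of subjects
A2 | voiced fricative | SSE | -26.80 | 3.09 | 5
A2 | voiced fricative | MSR | -26.49 | 3.85 | 5
A2 | voiced fricative | SSBE | -23.81 | 3.87 | 4
A2 | voiced fricative | Total | -25.83 | 3.57 | 14
A2 | voiced stop | SSE | -24.52 | 1.24 | 5
A2 | voiced stop | MSR | -28.19 | 5.35 | 5
A2 | voiced stop | SSBE | -25.52 | 1.92 | 4
A2 | voiced stop | Total | -26.12 | 3.59 | 14
A2 | voiceless stop | SSE | -22.37 | 2.76 | 5
A2 | voiceless stop | MSR | -27.98 | 5.30 | 5
A2 | voiceless stop | SSBE | -22.07 | 2.75 | 4
A2 | voiceless stop | Total | -24.29 | 4.57 | 14
A2*a | voiced fricative | SSE | -27.60 | 3.42 | 5
A2*a | voiced fricative | MSR | -26.44 | 5.14 | 5
A2*a | voiced fricative | SSBE | -25.57 | 4.52 | 4
A2*a | voiced fricative | Total | -26.60 | 4.14 | 14
A2*a | voiced stop | SSE | -24.67 | 1.96 | 5
A2*a | voiced stop | MSR | -28.52 | 6.14 | 5
A2*a | voiced stop | SSBE | -25.50 | 2.93 | 4
A2*a | voiced stop | Total | -26.28 | 4.23 | 14
A2*a | voiceless stop | SSE | -22.03 | 3.09 | 5
A2*a | voiceless stop | MSR | -29.04 | 5.68 | 5
A2*a | voiceless stop | SSBE | -21.41 | 3.54 | 4
A2*a | voiceless stop | Total | -24.35 | 5.38 | 14
A2*b | voiced fricative | SSE | -31.36 | 3.42 | 5
A2*b | voiced fricative | MSR | -29.63 | 5.14 | 5
A2*b | voiced fricative | SSBE | -27.97 | 3.97 | 4
A2*b | voiced fricative | Total | -29.77 | 4.16 | 14
A2*b | voiced stop | SSE | -28.43 | 1.96 | 5
A2*b | voiced stop | MSR | -31.71 | 6.14 | 5
A2*b | voiced stop | SSBE | -27.91 | 2.83 | 4
A2*b | voiced stop | Total | -29.45 | 4.21 | 14
A2*b | voiceless stop | SSE | -25.79 | 3.09 | 5
A2*b | voiceless stop | MSR | -32.23 | 5.68 | 5
A2*b | voiceless stop | SSBE | -23.81 | 3.74 | 4
A2*b | voiceless stop | Total | -27.52 | 5.48 | 14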
Appendix M Mean RMS-power around F2 (dB) for the adult
subjects averaged per language (SSE, MSR and SSBE) for
the close rounded vowels as a function of the following
consonant.
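Acoustic measure | Following consonant | Language | Mean (dB) | Std. Dev. | n of subjects
A2 | voiced fricative | SSE | -28.97 | 4.97 | 5
A2 | voiced fricative | MSR | -23.58 | 3.48 | 5
A2 | voiced fricative | SSBE | -27.19 | 2.24 | 4
A2 | voiced fricative | Total | -26.54 | 4.27 | 14
A2 | voiced stop | SSE | -31.89 | 4.00 | 5
A2 | voiced stop | MSR | -25.39 | 4.15 | 5
A2 | voiced stop | SSBE | -23.71 | 2.13 | 4
A2 | voiced stop | Total | -27.23 | 4.97 | 14
A2 | voiceless stop | SSE | -28.22 | 5.85 | 5
A2 | voiceless stop | MSR | -26.48 | 3.61 | 5
A2 | voiceless stop | SSBE | -17.00 | 2.06 | 4
A2 | voiceless stop | Total | -24.39 | 6.30 | 14
A2*a | voiced fricative | SSE | -31.29 | 4.45 | 5
A2*a | voiced fricative | MSR | -23.39 | 4.97 | 5
A2*a | voiced fricative | SSBE | -28.37 | 3.41 | 4
A2*a | voiced fricative | Total | -27.63 | 5.35 | 14
A2*a | voiced stop | SSE | -31.32 | 3.40 | 5
A2*a | voiced stop | MSR | -25.87 | 6.04 | 5
A2*a | voiced stop | SSBE | -23.58 | 3.94 | 4
A2*a | voiced stop | Total | -27.17 | 5.44 | 14
A2*a | voiceless stop | SSE | -24.77 | 5.44 | 5
A2*a | voiceless stop | MSR | -28.42 | 6.42 | 5
A2*a | voiceless stop | SSBE | -18.31 | 1.35 | 4
A2*a | voiceless stop | Total | -24.23 | 6.32 | 14
A2*c | voiced fricative | SSE | -36.43 | 4.45 | 5
A2*c | voiced fricative | MSR | -13.11 | 4.97 | 5
A2*c | voiced fricative | SSBE | -34.46 | 2.42 | 4
A2*c | voiced fricative | Total | -27.54 | 11.84 | 14
A2*c | voiced stop | SSE | -36.46 | 3.40 | 5
A2*c | voiced stop | MSR | -15.59 | 6.04 | 5
A2*c | voiced stop | SSBE | -27.10 | 3.09 | 4
A2*c | voiced stop | Total | -26.33 | 10.05 | 14
A2*c | voiceless stop | SSE | -29.91 | 5.44 | 5
A2*c | voiceless stop | MSR | -18.14 | 6.42 | 5
A2*c | voiceless stop | SSBE | -12.20 | 2.44 | 4
A2*c | voiceless stop | Total | -20.65 | 8.98 | 14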
Appendix N Mean RMS-power around F2 (dB) produced
by the SSE subjects of different ages for the vowel /i/ as a
function of the following consonant.
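Acoustic measure | Following consonant | Age | Mean (dB) | Std. Dev. | n of subjects
A2 | voiced fricative | adult | -26.80 | 3.09 | 5
A2 | voiced fricative | child 3;4 to 3;11 | -30.70 | 2.66 | 3
A2 | voiced fricative | child 4;0 to 4;4 | -29.84 | 4.28 | 5
A2 | voiced fricative | child 4;5 to 4;9 | -27.97 | 4.80 | 2
A2 | voiced fricative | Total | -28.75 | 3.65 | 15
A2 | voiced stop | adult | -24.52 | 1.24 | 5
A2 | voiced stop | child 3;4 to 3;11 | -30.11 | 1.49 | 3
A2 | voiced stop | child 4;0 to 4;4 | -29.63 | 3.32 | 5
A2 | voiced stop | child 4;5 to 4;9 | -27.10 | 5.76 | 2
A2 | voiced stop | Total | -27.68 | 3.54 | 15
A2 | voiceless stop | adult | -22.37 | 2.76 | 5
A2 | voiceless stop | child 3;4 to 3;11 | -28.46 | 3.59 | 3
A2 | voiceless stop | child 4;0 to 4;4 | -27.75 | 3.10 | 5
A2 | voiceless stop | child 4;5 to 4;9 | -26.71 | 5.52 | 2
A2 | voiceless stop | Total | -25.96 | 4.01 | 15
A2*a | voiced fricative | adult | -27.60 | 3.42 | 5
A2*a | voiced fricative | child 3;4 to 3;11 | -28.84 | 3.76 | 3
A2*a | voiced fricative | child 4;0 to 4;4 | -27.64 | 5.77 | 5
A2*a | voiced fricative | child 4;5 to 4;9 | -26.47 | 6.21 | 2
A2*a | voiced fricative | Total | -27.71 | 4.26 | 15
A2*a | voiced stop | adult | -24.67 | 1.96 | 5
A2*a | voiced stop | child 3;4 to 3;11 | -26.50 | 2.87 | 3
A2*a | voiced stop | child 4;0 to 4;4 | -25.91 | 4.77 | 5
A2*a | voiced stop | child 4;5 to 4;9 | -23.57 | 6.97 | 2
A2*a | voiced stop | Total | -25.30 | 3.64 | 15
A2*a | voiceless stop | adult | -22.03 | 3.09 | 5
A2*a | voiceless stop | child 3;4 to 3;11 | -25.82 | 3.60 | 3
A2*a | voiceless stop | child 4;0 to 4;4 | -23.66 | 6.21 | 5
A2*a | voiceless stop | child 4;5 to 4;9 | -22.68 | 7.02 | 2
A2*a | voiceless stop | Total | -23.42 | 4.60 | 15
A2*b | voiced fricative | adult | -31.36 | 3.42 | 5
A2*b | voiced fricative | child 3;4 to 3;11 | -33.59 | 3.76 | 3
A2*b | voiced fricative | child 4;0 to 4;4 | -32.91 | 5.41 | 5
A2*b | voiced fricative | child 4;5 to 4;9 | -31.21 | 6.21 | 2
A2*b | voiced fricative | Total | -32.30 | 4.18 | 15
A2*b | voiced stop | adult | -28.43 | 1.96 | 5
A2*b | voiced stop | child 3;4 to 3;11 | -31.24 | 2.87 | 3
A2*b | voiced stop | child 4;0 to 4;4 | -31.18 | 4.85 | 5
A2*b | voiced stop | child 4;5 to 4;9 | -28.31 | 6.97 | 2
A2*b | voiced stop | Total | -29.89 | 3.82 | 15
A2*b | voiceless stop | adult | -25.79 | 3.09 | 5
A2*b | voiceless stop | child 3;4 to 3;11 | -30.57 | 3.60 | 3
A2*b | voiceless stop | child 4;0 to 4;4 | -28.93 | 6.01 | 5
A2*b | voiceless stop | child 4;5 to 4;9 | -27.42 | 7.02 | 2
A2*b | voiceless stop | Total | -28.01 | 4.68 | 15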
Appendix O Mean RMS-power around F2 (dB) produced
by the SSE subjects of different ages for the vowels /i/ and //
across all consonantal contexts.
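Acoustic measure | Vowel | Age | Mean (dB) | Std. Dev. | n of subjects
A2 | /i/ | adult | -24.60 | 1.71 | 5
A2 | /i/ | child 3;4 to 3;11 | -29.72 | 2.44 | 3
A2 | /i/ | child 4;0 to 4;4 | -29.14 | 4.04 | 5
A2 | /i/ | child 4;5 to 4;9 | -27.27 | 5.21 | 2
A2 | /i/ | Total | -27.49 | 3.65 | 15
A2 | // | adult | -23.11 | 4.82 | 5
A2 | // | child 3;4 to 3;11 | -19.29 | 1.09 | 3
A2 | // | child 4;0 to 4;4 | -23.90 | 2.41 | 5
A2 | // | child 4;5 to 4;9 | -20.07 | 3.96 | 2
A2 | // | Total | -22.21 | 3.66 | 15
A2*a | /i/ | adult | -25.07 | 1.85 | 5
A2*a | /i/ | child 3;4 to 3;11 | -27.08 | 3.14 | 3
A2*a | /i/ | child 4;0 to 4;4 | -25.49 | 5.43 | 5
A2*a | /i/ | child 4;5 to 4;9 | -24.38 | 6.97 | 2
A2*a | /i/ | Total | -25.52 | 3.88 | 15
A2*a | // | adult | -23.00 | 4.02 | 5
A2*a | // | child 3;4 to 3;11 | -8.08 | 6.25 | 3
A2*a | // | child 4;0 to 4;4 | -14.78 | 5.94 | 5
A2*a | // | child 4;5 to 4;9 | -11.85 | 1.83 | 2
A2*a | // | Total | -15.79 | 7.37 | 15
A2*b | /i/ | adult | -28.83 | 1.85 | 5
A2*b | /i/ | child 3;4 to 3;11 | -31.83 | 3.14 | 3
A2*b | /i/ | child 4;0 to 4;4 | -30.76 | 5.26 | 5
A2*b | /i/ | child 4;5 to 4;9 | -29.13 | 6.97 | 2
A2*b | /i/ | Total | -30.11 | 3.91 | 15
A2*b | // | adult | -15.97 | 4.30 | 5
A2*b | // | child 3;4 to 3;11 | -5.32 | 4.83 | 3
A2*b | // | child 4;0 to 4;4 | -14.79 | 5.84 | 5
A2*b | // | child 4;5 to 4;9 | -8.14 | 3.56 | 2
A2*b | // | Total | -12.40 | 6.26 | 15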
Appendix P Mean RMS-power around F2 (dB) produced
by the SSE subjects of different ages for the vowel // as a
function of the following consonant.
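Acoustic measure | Following consonant | Age | Mean (dB) | Std. Dev. | n of subjects
A2 | voiced fricative | adult | -28.97 | 4.97 | 5
A2 | voiced fricative | child 3;4 to 3;11 | -29.03 | 2.81 | 3
A2 | voiced fricative | child 4;0 to 4;4 | -29.76 | 4.55 | 5
A2 | voiced fricative | child 4;5 to 4;9 | -31.96 | 0.13 | 2
A2 | voiced fricative | Total | -29.64 | 3.89 | 15
A2 | voiced stop | adult | -31.89 | 4.00 | 5
A2 | voiced stop | child 3;4 to 3;11 | -27.96 | 2.32 | 3
A2 | voiced stop | child 4;0 to 4;4 | -29.00 | 6.20 | 5
A2 | voiced stop | child 4;5 to 4;9 | -27.72 | 0.90 | 2
A2 | voiced stop | Total | -29.58 | 4.41 | 15
A2 | voiceless stop | adult | -28.22 | 5.85 | 5
A2 | voiceless stop | child 3;4 to 3;11 | -28.35 | 3.36 | 3
A2 | voiceless stop | child 4;0 to 4;4 | -27.67 | 4.00 | 5
A2 | voiceless stop | child 4;5 to 4;9 | -28.09 | 1.71 | 2
A2 | voiceless stop | Total | -28.04 | 4.03 | 15
A2*a | voiced fricative | adult | -31.29 | 4.45 | 5
A2*a | voiced fricative | child 3;4 to 3;11 | -24.20 | 5.82 | 3
A2*a | voiced fricative | child 4;0 to 4;4 | -24.33 | 5.18 | 5
A2*a | voiced fricative | child 4;5 to 4;9 | -26.18 | 2.44 | 2
A2*a | voiced fricative | Total | -26.87 | 5.43 | 15
A2*a | voiced stop | adult | -31.32 | 3.40 | 5
A2*a | voiced stop | child 3;4 to 3;11 | -19.95 | 5.25 | 3
A2*a | voiced stop | child 4;0 to 4;4 | -19.00 | 7.35 | 5
A2*a | voiced stop | child 4;5 to 4;9 | -20.81 | 1.57 | 2
A2*a | voiced stop | Total | -23.54 | 7.46 | 15
A2*a | voiceless stop | adult | -24.77 | 5.44 | 5
A2*a | voiceless stop | child 3;4 to 3;11 | -19.41 | 3.98 | 3
A2*a | voiceless stop | child 4;0 to 4;4 | -17.71 | 5.32 | 5
A2*a | voiceless stop | child 4;5 to 4;9 | -19.76 | 1.98 | 2
A2*a | voiceless stop | Total | -20.68 | 5.36 | 15
A2*b | voiced fricative | adult | -36.43 | 4.45 | 5
A2*b | voiced fricative | child 3;4 to 3;11 | -33.87 | 5.82 | 3
A2*b | voiced fricative | child 4;0 to 4;4 | -34.97 | 6.04 | 5
A2*b | voiced fricative | child 4;5 to 4;9 | -35.86 | 2.44 | 2
A2*b | voiced fricative | Total | -35.36 | 4.73 | 15
A2*b | voiced stop | adult | -36.46 | 3.40 | 5
A2*b | voiced stop | child 3;4 to 3;11 | -29.63 | 5.25 | 3
A2*b | voiced stop | child 4;0 to 4;4 | -29.63 | 8.26 | 5
A2*b | voiced stop | child 4;5 to 4;9 | -30.49 | 1.57 | 2
A2*b | voiced stop | Total | -32.02 | 6.13 | 15
A2*b | voiceless stop | adult | -29.91 | 5.44 | 5
A2*b | voiceless stop | child 3;4 to 3;11 | -29.09 | 3.98 | 3
A2*b | voiceless stop | child 4;0 to 4;4 | -28.35 | 5.58 | 5
A2*b | voiceless stop | child 4;5 to 4;9 | -29.44 | 1.98 | 2
A2*b | voiceless stop | Total | -29.16 | 4.51 | 15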
Appendix Q Descriptive statistics of SSE/MSR bilingual
production of vocal effort for the vowel /i/ as a function of
the following consonant based on three acoustic measures
A2, A2*a, A2*b (dB) per speaker, language and age.
Language | Speaker | Following consonant | A2 Median | A2 Mean | A2 Std. Dev. | A2*a Median | A2*a Mean | A2*a Std. Dev. | A2*b Median | A2*b Mean | A2*b Std. Dev. | n of tokens
SSE | AN_3;7 | fric voice+ | -26.98 | -26.36 | 7.65 | -25.01 | -23.19 | 10.19 | -29.76 | -27.94 | 10.19 | 118
SSE | AN_3;7 | stop voice+ | -27.73 | -27.81 | 6.43 | -24.79 | -25.13 | 7.50 | -29.54 | -29.88 | 7.50 | 33
SSE | AN_3;7 | stop voice- | -23.87 | -24.92 | 7.39 | -22.32 | -22.32 | 9.03 | -27.06 | -27.07 | 9.03 | 66
SSE | BS_3;4 | fric voice+ | -25.61 | -26.61 | 6.94 | -21.85 | -21.63 | 8.22 | -26.59 | -26.38 | 8.22 | 69
SSE | BS_3;4 | stop voice+ | -24.03 | -23.97 | 5.15 | -20.18 | -20.16 | 6.66 | -24.93 | -24.90 | 6.66 | 28
SSE | BS_3;4 | stop voice- | -25.39 | -23.77 | 7.06 | -17.61 | -16.05 | 9.14 | -22.36 | -20.80 | 9.14 | 61
SSE | AN_4;2 | fric voice+ | -23.93 | -23.93 | 4.59 | -22.61 | -22.33 | 4.90 | -27.36 | -27.08 | 4.90 | 20
SSE | AN_4;2 | stop voice+ | -19.81 | -18.94 | 6.97 | -18.78 | -14.45 | 14.27 | -23.53 | -19.20 | 14.27 | 12
SSE | AN_4;2 | stop voice- | -19.68 | -19.92 | 5.22 | -15.56 | -14.12 | 7.94 | -20.31 | -18.87 | 7.94 | 25
SSE | BS_3;10 | fric voice+ | -25.12 | -23.91 | 8.09 | -20.30 | -17.26 | 12.14 | -25.05 | -22.01 | 12.14 | 47
SSE | BS_3;10 | stop voice+ | -23.20 | -22.31 | 8.90 | -18.40 | -18.16 | 11.19 | -23.14 | -22.90 | 11.19 | 22
SSE | BS_3;10 | stop voice- | -22.62 | -22.60 | 7.46 | -17.46 | -15.52 | 10.28 | -22.21 | -20.27 | 10.28 | 43
SSE | AN_4;5 | fric voice+ | -24.03 | -23.88 | 5.38 | -24.03 | -22.92 | 6.99 | -28.78 | -27.66 | 6.99 | 50
SSE | AN_4;5 | stop voice+ | -21.98 | -22.04 | 6.64 | -19.92 | -19.33 | 7.73 | -24.67 | -24.08 | 7.73 | 28
SSE | AN_4;5 | stop voice- | -20.80 | -20.22 | 6.51 | -19.25 | -18.29 | 7.80 | -24.00 | -23.04 | 7.80 | 49
SSE | BS_4;5 | fric voice+ | -26.37 | -25.81 | 7.90 | -24.34 | -22.35 | 8.24 | -29.09 | -27.10 | 8.24 | 68
SSE | BS_4;5 | stop voice+ | -20.58 | -21.94 | 7.14 | -15.06 | -16.05 | 8.25 | -19.81 | -20.80 | 8.25 | 22
SSE | BS_4;5 | stop voice- | -24.77 | -25.30 | 6.73 | -18.19 | -19.21 | 7.28 | -22.94 | -23.95 | 7.28 | 64
MSR | AN_3;7 | fric voice+ | -21.42 | -21.50 | 5.16 | -13.65 | -14.75 | 6.13 | -18.43 | -19.53 | 6.13 | 22
MSR | AN_3;7 | stop voice+ | -26.07 | -27.59 | 6.87 | -26.13 | -23.31 | 8.84 | -30.91 | -28.09 | 8.84 | 27
MSR | AN_3;7 | stop voice- | -25.80 | -24.95 | 6.13 | -22.74 | -22.24 | 7.45 | -27.53 | -27.03 | 7.45 | 25
MSR | BS_3;4 | fric voice+ | -23.70 | -25.06 | 7.11 | -20.55 | -20.33 | 6.58 | -25.33 | -25.11 | 6.58 | 14
MSR | BS_3;4 | stop voice+ | -22.91 | -24.22 | 6.33 | -15.52 | -18.82 | 8.21 | -20.30 | -23.60 | 8.21 | 16
MSR | BS_3;4 | stop voice- | -24.61 | -25.23 | 8.21 | -20.35 | -20.15 | 9.37 | -25.13 | -24.93 | 9.37 | 27
MSR | AN_4;2 | fric voice+ | -19.94 | -20.36 | 7.90 | -17.10 | -16.23 | 12.10 | -21.88 | -21.02 | 12.10 | 23
MSR | AN_4;2 | stop voice+ | -21.46 | -23.04 | 9.85 | -17.20 | -16.49 | 12.76 | -21.99 | -21.28 | 12.76 | 26
MSR | AN_4;2 | stop voice- | -20.54 | -22.54 | 10.71 | -14.96 | -15.29 | 15.21 | -19.75 | -20.07 | 15.21 | 12
MSR | BS_3;10 | fric voice+ | -23.52 | -24.59 | 7.68 | -22.64 | -21.22 | 8.10 | -27.42 | -26.00 | 8.10 | 33
MSR | BS_3;10 | stop voice+ | -26.74 | -26.84 | 8.17 | -20.59 | -21.51 | 8.82 | -25.37 | -26.30 | 8.82 | 41
MSR | BS_3;10 | stop voice- | -24.37 | -23.15 | 6.56 | -21.31 | -18.73 | 8.44 | -26.09 | -23.52 | 8.44 | 28
MSR | AN_4;5 | fric voice+ | -22.77 | -23.92 | 5.64 | -18.48 | -21.02 | 6.19 | -23.27 | -25.80 | 6.19 | 26
MSR | AN_4;5 | stop voice+ | -22.67 | -22.55 | 5.87 | -21.31 | -21.72 | 6.43 | -26.09 | -26.50 | 6.43 | 31
MSR | AN_4;5 | stop voice- | -25.29 | -24.00 | 8.08 | -25.13 | -23.47 | 8.03 | -29.91 | -28.26 | 8.03 | 27
MSR | BS_4;5 | fric voice+ | -27.50 | -29.18 | 4.83 | -22.84 | -25.80 | 8.16 | -27.63 | -30.59 | 8.16 | 6
MSR | BS_4;5 | stop voice+ | -30.14 | -28.72 | 7.26 | -24.15 | -21.98 | 9.24 | -28.93 | -26.77 | 9.24 | 15
MSR | BS_4;5 | stop voice- | -24.93 | -24.86 | 6.84 | -23.42 | -23.03 | 8.69 | -28.20 | -27.81 | 8.69 | 24
Appendix R Descriptive statistics of bilingual SSE
production of vocal effort for the tense/lax vowels /i/ and //
based on three acoustic measures A2, A2*a, A2*b (dB) per
speaker and age.
Speaker | SSE vowel | A2 Median | A2 Mean | A2 Std. Dev. | A2*a Median | A2*a Mean | A2*a Std. Dev. | A2*b Median | A2*b Mean | A2*b Std. Dev. | n of tokens
AN_3;7 | /i/ | -26.23 | -26.14 | 7.43 | -24.44 | -23.22 | 9.48 | -29.19 | -27.97 | 9.48 | 217
AN_3;7 | // | -18.11 | -17.86 | 5.50 | -5.89 | -6.19 | 8.35 | -2.98 | -3.28 | 8.35 | 106
BS_3;4 | /i/ | -25.01 | -25.05 | 6.81 | -19.28 | -19.22 | 8.68 | -24.03 | -23.96 | 8.68 | 158
BS_3;4 | // | -22.07 | -22.49 | 6.53 | -21.90 | -21.70 | 7.54 | -18.99 | -18.79 | 7.54 | 77
AN_4;2 | /i/ | -21.44 | -21.12 | 5.73 | -19.13 | -17.07 | 9.51 | -23.88 | -21.82 | 9.51 | 57
AN_4;2 | // | -20.97 | -21.62 | 8.21 | -17.40 | -17.08 | 9.39 | -14.49 | -14.17 | 9.39 | 40
BS_3;10 | /i/ | -23.90 | -23.06 | 7.95 | -18.82 | -16.75 | 11.17 | -23.57 | -21.50 | 11.17 | 113
BS_3;10 | // | -24.28 | -24.36 | 6.85 | -25.70 | -25.49 | 8.43 | -22.79 | -22.58 | 8.43 | 74
AN_4;5 | /i/ | -22.67 | -22.06 | 6.28 | -20.42 | -20.34 | 7.71 | -25.17 | -25.09 | 7.71 | 127
AN_4;5 | // | -17.67 | -17.36 | 5.63 | -6.45 | -7.18 | 7.37 | -3.54 | -4.27 | 7.37 | 151
BS_4;5 | /i/ | -24.96 | -25.05 | 7.39 | -20.64 | -20.15 | 8.11 | -25.39 | -24.89 | 8.11 | 154
BS_4;5 | // | -26.29 | -26.55 | 6.76 | -25.47 | -25.52 | 9.64 | -22.56 | -22.61 | 9.64 | 78
Appendix S Descriptive statistics of SSE/MSR bilingual
production of vocal effort for the close rounded vowels as a
function of the following consonant based on three acoustic
measures A2, A2*a, A2*c (dB) per speaker, language and
age.
Language | Speaker | Following consonant | A2 Median | A2 Mean | A2 Std. Dev. | A2*a Median | A2*a Mean | A2*a Std. Dev. | A2*c Median | A2*c Mean | A2*c Std. Dev. | n of tokens
SSE | AN_3;7 | fric voice+ | -19.62 | -20.52 | 6.69 | -16.89 | -17.28 | 8.96 | -26.56 | -26.95 | 8.96 | 38
SSE | AN_3;7 | stop voice+ | -22.18 | -22.41 | 5.97 | -16.84 | -15.26 | 6.77 | -26.52 | -24.93 | 6.77 | 37
SSE | AN_3;7 | stop voice- | -21.00 | -21.08 | 5.58 | -12.93 | -13.13 | 6.68 | -22.61 | -22.80 | 6.68 | 47
SSE | BS_3;4 | fric voice+ | -23.39 | -23.97 | 7.31 | -19.46 | -20.10 | 10.63 | -29.13 | -29.78 | 10.63 | 41
SSE | BS_3;4 | stop voice+ | -27.94 | -27.93 | 6.63 | -14.43 | -14.27 | 11.42 | -24.10 | -23.94 | 11.42 | 18
SSE | BS_3;4 | stop voice- | -29.24 | -30.00 | 5.99 | -21.87 | -21.10 | 9.67 | -31.55 | -30.77 | 9.67 | 13
SSE | AN_4;2 | fric voice+ | -19.17 | -19.71 | 4.50 | -12.48 | -13.70 | 6.06 | -22.16 | -23.59 | 5.87 | 11
SSE | AN_4;2 | stop voice+ | -20.88 | -22.17 | 7.56 | -11.97 | -13.17 | 5.53 | -21.65 | -22.85 | 5.53 | 5
SSE | AN_4;2 | stop voice- | -20.75 | -21.40 | 5.74 | -12.50 | -12.44 | 3.89 | -22.18 | -22.12 | 3.89 | 13
SSE | BS_3;10 | fric voice+ | -21.47 | -21.05 | 8.07 | -13.51 | -14.49 | 13.29 | -23.19 | -24.16 | 13.29 | 20
SSE | BS_3;10 | stop voice+ | -30.46 | -31.46 | 6.99 | -13.32 | -18.46 | 10.30 | -23.00 | -28.14 | 10.30 | 18
SSE | BS_3;10 | stop voice- | -27.12 | -27.05 | 6.84 | -15.40 | -15.51 | 8.89 | -25.07 | -25.19 | 8.89 | 33
SSE | AN_4;5 | fric voice+ | -20.85 | -22.24 | 7.80 | -12.43 | -14.58 | 11.48 | -22.11 | -24.26 | 11.48 | 27
SSE | AN_4;5 | stop voice+ | -21.17 | -20.81 | 4.82 | -12.34 | -12.32 | 7.14 | -22.02 | -22.00 | 7.14 | 31
SSE | AN_4;5 | stop voice- | -21.36 | -22.18 | 5.24 | -15.62 | -16.07 | 7.47 | -25.29 | -25.75 | 7.47 | 51
SSE | BS_4;5 | fric voice+ | -24.27 | -25.36 | 8.39 | -21.43 | -21.66 | 8.68 | -31.10 | -31.34 | 8.68 | 31
SSE | BS_4;5 | stop voice+ | -28.19 | -27.80 | 6.55 | -19.96 | -19.31 | 8.13 | -29.64 | -28.98 | 8.13 | 39
SSE | BS_4;5 | stop voice- | -26.90 | -27.29 | 6.20 | -21.51 | -20.99 | 8.23 | -31.18 | -30.66 | 8.23 | 21
MSR | AN_3;7 | fric voice+ | -23.21 | -23.75 | 7.07 | -2.92 | -2.87 | 13.42 | -13.00 | -12.95 | 13.42 | 25
MSR | AN_3;7 | stop voice+ | -25.63 | -24.13 | 10.36 | -7.00 | -2.08 | 16.92 | -17.08 | -12.16 | 16.92 | 20
MSR | AN_3;7 | stop voice- | -33.55 | -31.91 | 6.35 | -23.19 | -18.75 | 13.64 | -33.27 | -28.83 | 13.64 | 48
MSR | BS_3;4 | fric voice+ | -27.61 | -27.95 | 6.54 | -11.75 | -10.89 | 10.20 | -21.83 | -20.97 | 10.20 | 18
MSR | BS_3;4 | stop voice+ | -25.63 | -24.20 | 7.88 | -3.18 | -3.23 | 13.40 | -13.26 | -13.31 | 13.40 | 14
MSR | BS_3;4 | stop voice- | -28.02 | -28.84 | 5.50 | -11.75 | -14.63 | 8.25 | -21.83 | -24.71 | 8.25 | 23
MSR | AN_4;2 | fric voice+ | -22.13 | -23.37 | 9.25 | 3.00 | 0.90 | 12.39 | -7.08 | -9.19 | 12.39 | 30
MSR | AN_4;2 | stop voice+ | -15.62 | -22.03 | 11.46 | 8.31 | 6.14 | 18.73 | -1.77 | -3.94 | 18.73 | 19
MSR | AN_4;2 | stop voice- | -24.77 | -24.45 | 7.68 | -9.79 | -6.08 | 14.08 | -19.87 | -16.16 | 14.08 | 41
MSR | BS_3;10 | fric voice+ | -23.43 | -23.94 | 8.64 | -2.61 | -3.14 | 12.71 | -12.69 | -13.22 | 12.71 | 35
MSR | BS_3;10 | stop voice+ | -23.69 | -24.36 | 8.27 | -3.63 | -3.56 | 14.03 | -13.71 | -13.64 | 14.03 | 40
MSR | BS_3;10 | stop voice- | -24.43 | -25.60 | 7.31 | -5.56 | -6.32 | 11.06 | -15.64 | -16.40 | 11.06 | 54
MSR | AN_4;5 | fric voice+ | -26.28 | -26.94 | 7.89 | -9.20 | -7.63 | 15.04 | -19.28 | -17.71 | 15.04 | 37
MSR | AN_4;5 | stop voice+ | -30.70 | -27.88 | 10.18 | -9.11 | -6.64 | 17.98 | -19.19 | -16.72 | 17.98 | 14
MSR | AN_4;5 | stop voice- | -28.61 | -29.09 | 7.88 | -14.99 | -13.64 | 14.19 | -25.07 | -23.73 | 14.19 | 40
MSR | BS_4;5 | fric voice+ | -26.64 | -25.95 | 7.78 | -7.39 | -5.80 | 11.50 | -17.47 | -15.88 | 11.50 | 23
MSR | BS_4;5 | stop voice+ | -22.34 | -23.18 | 10.15 | -3.02 | 0.72 | 19.46 | -13.10 | -9.36 | 19.46 | 23
MSR | BS_4;5 | stop voice- | -26.94 | -26.19 | 6.37 | -8.26 | -5.56 | 10.89 | -18.34 | -15.64 | 10.89 | 45
Appendix T Durational ratios for the postvocalic
conditioning of vowel duration for all subjects by language,
age and bilinguality.
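VLS = voiceless stop, VS = voiced stop, VF = voiced fricative, VLF = voiceless fricative. Each ratio divides the vowel duration before the first consonant class by the vowel duration before the second. MSR has no // column.

Language | Bilinguality | Subject ID | Ratios for /i/: VLS/VF | Ratios for /i/: VLS/VS | Ratios for /u/: VLS/VF | Ratios for /u/: VLS/VS | Ratios for //: VLF/VF
SSE | monolingual | S2 | 0.48 | 0.77 | 0.53 | 0.90 | 0.62
SSE | monolingual | S1 | 0.60 | 0.87 | 0.61 | 0.92 | 0.90
SSE | monolingual | S5 | 0.46 | 0.87 | 0.49 | 1.03 | 0.82
SSE | monolingual | S4 | 0.46 | 0.89 | 0.39 | 0.82 | 0.80
SSE | monolingual | S3 | 0.55 | 0.84 | 0.55 | 0.94 | 0.86
SSE | monolingual | C3_3;4 | 0.58 | 1.05 | 0.41 | 0.65 | 0.52
SSE | monolingual | C4_3;8 | 0.47 | 0.78 | 0.31 | 0.28 | 0.85
SSE | monolingual | C3_3;11 | 0.43 | 0.86 | 0.36 | 0.64 | 0.71
SSE | monolingual | C6_4;0 | 0.37 | 0.74 | 0.30 | 0.60 | 0.64
SSE | monolingual | C8_4;2 | 0.31 | 0.53 | 0.26 | 0.63 | 0.49
SSE | monolingual | C7_4;2 | 0.42 | 1.12 | 0.24 | 0.92 | 0.71
SSE | monolingual | C5_4;0 | 0.62 | 0.79 | 0.28 | 0.65 | 0.68
SSE | monolingual | C4_4;1 | 0.59 | 1.19 | 0.36 | 0.24 | 0.89
SSE | monolingual | C9_4;9 | 0.49 | 0.87 | 0.32 | 0.64 | 0.67
SSE | monolingual | C7_4;8 | 0.42 | 0.86 | 0.38 | 0.81 | 0.75
SSE | bilingual | AN_3;7 | 0.72 | 0.89 | 0.72 | 0.78 | 1.08
SSE | bilingual | BS_3;4 | 0.94 | 1.07 | 0.89 | 1.03 | 1.30
SSE | bilingual | AN_4;2 | 0.36 | 0.58 | 0.61 | 0.73 | 0.79
SSE | bilingual | BS_3;10 | 0.91 | 1.23 | 0.72 | 0.81 | 1.25
SSE | bilingual | AN_4;5 | 0.48 | 0.77 | 0.36 | 0.57 | 0.71
SSE | bilingual | BS_4;5 | 0.73 | 1.04 | 0.81 | 1.00 | 1.54
MSR | monolingual | R3 | 0.87 | 0.91 | 0.92 | 1.07 |
MSR | monolingual | R4 | 0.95 | 0.86 | 0.95 | 1.05 |
MSR | monolingual | R2 | 0.92 | 0.93 | 0.99 | 0.94 |
MSR | monolingual | R1 | 0.79 | 0.92 | 0.69 | 0.94 |
MSR | monolingual | R5 | 0.72 | 0.85 | 0.74 | 0.98 |
MSR | bilingual | AN_3;7 | 0.73 | 0.72 | 0.63 | 1.09 |
MSR | bilingual | BS_3;4 | 1.08 | 1.12 | 1.06 | 1.09 |
MSR | bilingual | AN_4;2 | 0.61 | 0.76 | 0.67 | 1.73 |
MSR | bilingual | BS_3;10 | 0.85 | 0.98 | 0.83 | 1.01 |
MSR | bilingual | AN_4;5 | 0.90 | 1.38 | 0.71 | 0.89 |
MSR | bilingual | BS_4;5 | 0.75 | 0.76 | 0.94 | 0.90 |
SSBE | monolingual | E2 | 0.56 | 0.60 | 0.47 | 0.51 | 0.69
SSBE | monolingual | E1 | 0.62 | 0.68 | 0.49 | 0.50 | 0.69
SSBE | monolingual | E3 | 0.52 | 0.55 | 0.48 | 0.48 | 0.66
SSBE | monolingual | E4 | 0.46 | 0.72 | 0.39 | 0.45 | 0.74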
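For the monolingual child C3_3;4, the five ratios in this table match quotients of that child's median durations in Appendices E, F and G; for example, 173.51 / 299.68 ≈ 0.58 gives /i/ VLS/VF. The Python sketch below recomputes the C3_3;4 row under that assumption. `duration_ratio` is a hypothetical helper. This section does not show how the adult and bilingual ratios were derived, so they should not be assumed to follow the same recipe.

```python
def duration_ratio(medians, numerator, denominator):
    """Ratio of median vowel durations (ms) in two following-consonant
    contexts, rounded to two decimals as in the table."""
    return round(medians[numerator] / medians[denominator], 2)


# Median durations (ms) for child C3_3;4, copied from Appendices E, F and G.
vowel_e = {"fric voice+": 299.68, "stop voice+": 165.27, "stop voice-": 173.51}  # /i/ (Appendix E)
vowel_f = {"fric voice+": 334.12, "stop voice+": 210.24, "stop voice-": 136.55}  # Appendix F vowel
vowel_g = {"fric voice+": 269.28, "fric voice-": 139.31, "stop voice+": 228.52}  # Appendix G vowel

row = [
    duration_ratio(vowel_e, "stop voice-", "fric voice+"),  # /i/ VLS/VF
    duration_ratio(vowel_e, "stop voice-", "stop voice+"),  # /i/ VLS/VS
    duration_ratio(vowel_f, "stop voice-", "fric voice+"),  # /u/ VLS/VF
    duration_ratio(vowel_f, "stop voice-", "stop voice+"),  # /u/ VLS/VS
    duration_ratio(vowel_g, "fric voice-", "fric voice+"),  # // VLF/VF
]
print(row)  # prints: [0.58, 1.05, 0.41, 0.65, 0.52]
```

A ratio below 1 means the vowel is shorter before the first consonant class than before the second.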