Mitochondrial DNA Analysis of Four Ethnic Groups of Afghanistan

John William Whale
The thesis is submitted in partial fulfilment
of the requirements for the award of Master
of Philosophy of the University of
Portsmouth
January 2012
It is nothing for one to know something
unless another knows you know it.
Persian proverb
Abstract
Mitochondrial DNA is a small genome, 16569 base pairs in length, which is found in
high quantities within mitochondria inside a typical somatic cell. Mitochondrial DNA is
also uniparentally inherited via the maternal line. As such, mitochondrial DNA is passed
largely unchanged from mother to offspring, with the exception of mutational events,
enabling the historical analysis of a population, group of populations or a species.
Mitochondrial DNA analysis examines both the coding and non-coding regions for the
presence or absence of single nucleotide polymorphisms. When a particular collection of
polymorphisms is present, the mitochondrial DNA can be assigned to a genetic group
known as a haplogroup. The identification of polymorphisms within the non-coding
region (D-loop) defines the mitochondrial DNA haplotype. Many haplogroups are
region-specific, in that they are often present among populations of a geographical region,
while the presence of haplogroups from adjacent regions can imply changes to population
structure via gene flow from migratory events. Afghanistan is a landlocked, Central
Asian country which has held a significant strategic position throughout history as a
thoroughfare for ancient trade routes and human migrations. As a consequence,
Afghanistan has a vast diversity of ethnic groups. This study aimed to analyse the
mitochondrial DNA genome to identify the haplogroup composition and distribution
among four ethnic groups of Afghanistan: the Baluch, Hazaras, Pashtuns and Tajiks,
which together account for ~80% of the total Afghani population. Afghanistan is a
previously unstudied population, and this study aimed to determine whether the
haplogroup composition has been influenced by numerous demographic events. The
Baluch, Pashtun and Tajik ethnic groups believe they have ancestry from west Eurasia
and the Middle East, while the Hazaras believe they are of Mongol descent. This study
also aimed to identify whether the mtDNA haplogroups observed support the beliefs
of the Afghani ethnic groups and provide indications of their ancestry. The
mitochondrial DNA analysis illustrates that the Hazara possess a large East Asian
haplogroup contribution of 37.5%, while the Baluch, Pashtuns, and Tajiks possess a
much smaller contribution of less than 14.3%. Meanwhile, the Baluch, Pashtuns and Tajiks
each have a large west Eurasian haplogroup contribution of at least 64.3% while the
Hazaras exhibit a west Eurasian haplogroup frequency of 40%. The Hazaras have the
most diverse collection of haplogroups, with only two haplogroups out of the seventeen
observed overall absent from this ethnic group. The Pashtuns have the greatest HVS-I
sequence diversity as no haplotype is shared within the ethnic group. As a whole, the
Afghani populations exhibit a high gene diversity (>0.98). The Hazara, Pashtun and
Tajik populations are considered to be expanding populations, or have recently
experienced an expansion process based upon mismatch distributions. This is supported
by a star-like phylogeny in a Median-Joining network. Genetic barriers were observed
when analysing Afghani HVS-I with an additional 3923 mitochondrial DNA HVS-I
sequences from 62 populations, separating the Iranian Baluch population from the
Afghani Baluch, and the Afghani populations from the Pakistani, Indian Bhargava,
Chinese and Mongol populations. The same analysis indicates that the Afghani ethnic
groups studied share a greater affinity with west Eurasian and Central Asian
populations than with populations of South Asia or East Asia. The haplogroup
analysis indicates the Baluch, Pashtuns and Tajiks share some ancestral heritage,
while the Hazaras, due to their greater East Asian lineage contribution, may be
descendants of a major East Asian expansion, possibly from the Genghis Khan line, and
have experienced a more recent maternal gene flow. These results illustrate the impact
that historical expansions and migrations have had upon the Afghani population.
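The gene diversity value quoted in the abstract (>0.98) is a haplotype-level statistic. As a minimal sketch, assuming the standard unbiased estimator attributed to Nei, H = n/(n-1) × (1 - Σ p_i²), it could be computed as below. The function name and the haplotype labels are hypothetical illustrations, not data or code from this study.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum of squared frequencies).

    `haplotypes` is a list with one haplotype label per sampled individual.
    This assumes Nei's estimator; the thesis may have used a software default.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical HVS-I haplotype labels, not taken from the thesis.
sample = ["h1", "h2", "h3", "h4", "h4"]
print(round(gene_diversity(sample), 3))
```

A value of 1.0 means every sampled individual carries a different haplotype, which is the situation a within-group statement such as "no haplotype is shared" describes.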
CONTENTS
Chapter One - Introduction; Anthropology and DNA
1.1 Origins of Modern Humans
1.1.1 Early Hominids
1.1.2 Modern Humans
1.2 Mitochondrial DNA (mtDNA)
1.3 Mitochondrial and Y-Chromosome Haplogroup Distribution
1.3.1 Mitochondrial DNA Variation
1.3.2 Y-Chromosome Variation
1.4 Aims
Chapter Two - Afghanistan
2.1 Geography
2.2 Climate
2.3 Population
2.4 Ethnicity and Language
2.5 Migrations
2.6 Refugees
2.7 Afghan Sub-Populations
2.7.1 Pashtuns
2.7.2 Tajiks
2.7.3 Hazara
2.7.4 Uzbeks
2.7.5 Other Ethnic Groups
2.7.6 Aimaqs
2.7.7 Baluch
2.7.8 Turkmens
2.7.9 Nuristanis
2.8 Historical Influence of Afghanistan’s Population
2.8.1 Prehistory
2.8.2 Aryan Migration
2.8.3 Persian Empire
2.8.4 Greek Rule
2.8.5 Yuezhi and the Kushan Empire
2.8.6 Arabs and Islam
2.8.7 Mongol Dynasty
2.8.8 Modern Era
Chapter Three - Materials and Methods
3.1 Materials
3.2 Precautionary Measures
3.3 Sample Collection
3.4 DNA Isolation
3.5 PCR
3.6 Agarose Gel Electrophoresis
3.7 Glycogen Precipitation of DNA (PCR Products)
3.8 Purification of PCR Products
3.9 DNA Extraction of PCR Products from Agarose Gels
3.10 RFLP Analysis
3.11 DNA Sequencing
3.11.1 Haplogroup Identification
3.11.2 Hypervariable Region I
Chapter Four - Results
4.1 PCR Amplifications
4.2 Haplogroup Characterisations using RFLP Analyses
4.3 Haplogroup Characterisations using DNA Sequencing
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
5.1 Phylogeography of Individual Haplogroups
5.1.1 African Haplogroup L3
5.1.2 The Early non-African Lineages
5.1.2.1 Haplogroup M*
5.1.2.2 Haplogroup N*
5.1.2.3 Haplogroup R*
5.1.3 The East Asian Lineages
5.1.3.1 Haplogroup C
5.1.3.2 Haplogroup D
5.1.3.3 Haplogroup G
5.1.3.4 Haplogroup Z
5.1.3.5 Haplogroup A
5.1.3.6 Haplogroup B
5.1.3.7 Haplogroup F
5.1.4 The West Eurasian Lineages
5.1.4.1 Haplogroup X
5.1.4.2 Haplogroup HV*
5.1.4.3 Haplogroup H
5.1.4.4 Haplogroup JT
5.1.4.5 Haplogroup J
5.1.4.6 Haplogroup T
5.2 Discussion
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
6.1 Previous mtDNA Studies on Afghani Populations
6.2 mtDNA HVS-I Region Sequencing
6.2.1 Variable Sites
6.2.2 Haplotype Distribution
6.2.3 Genetic Diversity
6.2.3.1 Gene Diversity
6.2.3.2 Nucleotide Diversity
6.2.3.3 Theta Estimators
6.2.3.4 Mismatch Distribution
6.3 Phylogenetic Network of the Afghani Population
6.4 Mitochondrial DNA Genetic Barriers between Afghans and Other Populations
Chapter Seven - Y-Chromosome Analysis of Afghani Ethnic groups
Article: Haber et al., 2012
Conclusion
References
Appendix I
Appendix II
Appendix III
Declaration
Whilst registered as a candidate for the above degree, I have not been registered for any
other research award. The results and conclusions embodied in this thesis are the work of
the named candidate and have not been submitted for any other academic award.
List of Tables
Chapter One - Introduction; Anthropology and DNA
Table 1.1: Haplogroup frequencies of west Asian haplogroups in India, Central Asia and the Caucasus (Kivisild et al., 1999).
Chapter Two - Afghanistan
Table 2.1: Afghanistan population estimates every five years since 1950 (UNPD, 2009).
Table 2.2: Population growth rate every 5 Years since 1950 based on the Estimated Population of Afghanistan (UNPD, 2009).
Table 2.3: UNHCR statistics of displaced Afghanis as of January 2010 (UNHCR, 2011).
Chapter Three - Materials and Methods
Table 3.1: Volumes and final concentrations of mastermix reagents for polymerase chain reaction amplification of mtDNAs.
Table 3.2: Oligonucleotide sequences, their co-ordinates and the fragment sizes produced following PCR (Torroni et al., 1997).
Table 3.3: The primers used in this study and their co-ordinates and fragment sizes produced following PCR (Palanichamy et al., 2004).
Table 3.4: Thermocycler conditions for the primer pairs as described by Torroni et al. (1997).
Table 3.5: Thermocycler conditions for the Palanichamy et al. (2004) primer pairs.
Table 3.6: Reaction mixes for the restriction digests with and without BSA.
Table 3.7: Co-ordinates and sequences of the forward and reverse oligonucleotides and the fragment size generated for HVS-I analysis.
Chapter Four - Results
Table 4.1: Size of DNA fragments for each haplogroup characterisation from RFLP analysis; a denotes primer pairs from Torroni et al. (1997), b denotes primer pairs from Palanichamy et al. (2004).
Table 4.2: Recognition sequences and cut sites of the enzymes used for the haplogroup assignment of samples. N = any base (A, C, G or T), R = either A or G, W = either A or T, Y = either C or T.
Table 4.3: SNP sites of haplogroups characterised via DNA sequencing.
Table 4.4: Sequencing results for the samples analysed for Haplogroups Q & Z.
Table 4.5: Sequence results for samples examined for the characteristic SNPs of Haplogroups S, W & X.
Table 4.6: Sequencing results of samples analysed for Haplogroups B & F.
Table 4.7: All samples and the Haplogroups to which they belong.
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
Table 5.1: Frequencies of the regional haplogroup lineages in the Afghanistan populations (%).
Table 5.2: Haplogroup frequencies within the Afghanistan populations (bold). For comparison, data from other publications has been included.
Table 5.3: East Asian haplogroup frequencies among the Hazara, Mongolians and Koreans.
Table 5.4: Frequency of haplogroups HV & H among Afghan, Iranian, Iraqi, Turkmen, Uzbek and Pakistani populations.
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
Table 6.1: Frequency and nucleotide positions of transitions, transversions and indels within the HVS-I sequences of the four Afghani ethnic groups.
Table 6.2: General data of the HVS-I polymorphisms among the four Afghani ethnic groups.
Table 6.3: Afghani ethnic group mtDNA HVS-I sequence data.
Table 6.4: Number of haplotypes (h), haplotype diversity (Hd) and nucleotide diversity (π) of the 4 Afghani populations using DnaSP ver. 5.10.
Table 6.5: The number of shared haplotypes between the Afghani ethnic groups in this study.
Table 6.6: Gene diversity of the Afghani populations in this study.
Table 6.7: Mean number of pairwise differences between the Afghani populations.
Table 6.8: AMOVA results of variance within and among the Afghani populations and additional populations.
Table 6.9: Pairwise differences between pairs of populations.
Table 6.10: FST p-values between pairs of populations.
Table 6.11: Estimators of female effective population size based upon the number of pairwise differences (θπ), the number of segregating sites (θS) and the number of observed haplotypes (θk).
Table 6.12: Tajima’s D statistic values for the Afghani populations using the total number of mutations and the total number of segregating sites and their statistical significance.
Table 6.13: Co-ordinate values for the Afghani populations and the additional 62 populations.
List of Figures
Chapter One - Introduction; Anthropology and DNA
Figure 1.1: Map of Kenya; Lake Turkana in the north-northwest region of Kenya.
Figure 1.2: Illustration of the Multi-Regional hypothesis of modern human evolution.
Figure 1.3: Illustration of the Out-of-Africa theory of modern human evolution.
Figure 1.4: Illustration of the Assimilation hypothesis for modern human evolution.
Figure 1.5: Diagrammatic view of mtDNA.
Figure 1.6: Simplified Y-Chromosome Consortium (YCC) haplogroup tree.
Chapter Two - Afghanistan
Figure 2.1: Political map of Afghanistan.
Figure 2.2: 34 provinces of Afghanistan and also its location in Central Asia.
Figure 2.3: The Population density of Afghanistan.
Figure 2.4: Distribution of Afghan ethnic groups.
Figure 2.5: Distribution of language groups spoken in Afghanistan.
Figure 2.6: Indo-European Language Tree illustrating the Centum and Satem branches.
Figure 2.7: (a) The Iranian languages spoken in the Dnieper-Ural region; (b) The Ural-Yenisei region and the eastern Iranian languages spoken and the location of the Central Asian BMAC culture (shaded); (c) The locations of the Afanasevo (shaded) and Andronovo (outlined) cultures of Central Asia and the Iran-India zone in the south.
Figure 2.8: Pashtun people from Afghanistan.
Figure 2.9: Tajik people from Afghanistan.
Figure 2.10: Hazara people from Afghanistan.
Figure 2.11: Uzbek people from Afghanistan.
Figure 2.12: Left: an Aimaq man from Afghanistan, Middle: a Baluch man from Afghanistan, Right: a Turkmen man from Afghanistan.
Chapter Three - Materials and Methods
Figure 3.1: Map of Iran and inset; Khorasan province and the three refugee camps near the cities of Mashad, Bojnurd and Birjand.
Chapter Four - Results
Figure 4.1: Amplification of nine overlapping primer pairs and three internal primer pairs as described in Torroni et al. (1997).
Figure 4.2: Amplification of the fifteen overlapping primer pairs as described in Palanichamy et al. (2004).
Figure 4.3: 2% agarose gel of HpaI restriction digests for Haplogroup L3 characterisation.
Figure 4.4: 2% agarose gel of HpaI restriction digests on Afghan DNAs.
Figure 4.5: 2% agarose gel of AluI digests for Haplogroup M characterisation.
Figure 4.6: 2% agarose gel of primer pair 9 (Palanichamy et al., 2004) PCR products and AluI digests for Haplogroup M assignment.
Figure 4.7: 2% agarose gel of PCR products and restriction digests for analysis of the Haplogroup M characteristic.
Figure 4.8: 2% agarose gel of PCR products (P) and digests (D) with MnlI for Haplogroup N characterisation.
Figure 4.9: 2% agarose gel of Haplogroup N characterisation.
Figure 4.10: 2% agarose gel of PCR products (P) and digest products (D) following incubation with HincII for Haplogroup C characterisation.
Figure 4.11: 2% agarose gel of PCR products and digest products following incubation with HincII for Haplogroup C characterisation.
Figure 4.12: 2% agarose gel of PCR product (P) and cleaved DNA products (D) following incubation with the endonuclease AluI for Haplogroup D characterisation.
Figure 4.13: 2% agarose gel of PCR amplifications (P) and endonuclease digestions of these amplifications (D) with HphI for Haplogroup E classification.
Figure 4.14: 2% agarose gel of PCR products (P) and digested amplifications (D) with the endonuclease HhaI for Haplogroup G assignment.
Figure 4.15: 2.5% agarose gel of DNAs digested with the endonuclease MboII for haplogroup R characterisation.
Figure 4.16: 2.5% agarose gel of amplified DNAs digested with the endonuclease MboII for the assignment of haplogroup R.
Figure 4.17: 2.5% agarose gel of amplified DNAs digested with MboII for characterisation of haplogroup R.
Figure 4.18: 2% agarose gel of amplified DNAs processed by the endonuclease HaeIII for haplogroup A classification.
Figure 4.19: 2% agarose gel of RFLP analysis using HaeII on amplified DNAs for haplogroup I classification.
Figure 4.20: 2% agarose gel of DNAs digested with the endonuclease HaeIII for haplogroup Y assignment.
Figure 4.21: 2.5% agarose gel of DNAs following incubation with the endonuclease MseI for haplogroup HV characterisation.
Figure 4.22: 2.5% agarose gel of DNAs digested with AluI for haplogroup H classification.
Figure 4.23: 2% agarose gel of digested DNAs following incubation with NlaIII for Haplogroup V assignment.
Figure 4.24: 2% agarose gel of digested DNAs following incubation with the endonuclease NlaIII for Haplogroup TJ classification.
Figure 4.25: 2% agarose gel of DNAs digested by the endonuclease BfaI for the characterisation of Haplogroup T.
Figure 4.26: 2% agarose gel of PCR products digested with the endonuclease BstNI for Haplogroup J classification.
Figure 4.27: 2% agarose gel of DNAs following incubation with the endonuclease HinfI for the assignment of Haplogroup Uk-group.
Figure 4.28: Sequence alignment of DNA samples assessed for the SNP at np. 5843 for the classification of Haplogroup Q.
Figure 4.29: Chromatogram of sample 7; arrow indicates the SNP site for Haplogroup Q, where in this case the nucleotide is an adenine.
Figure 4.30: Alignment of DNA sequences assessed for the polymorphism at np. 9090 which characterises Haplogroup Z.
Figure 4.31: Chromatogram of sample 41Z; arrow indicates the polymorphic nucleotide which for this sample is a thymine.
Figure 4.32: Chromatogram of sample 113Z; arrow denotes the SNP site which is a cytosine.
Figure 4.33: Alignment of DNA sequences analysed for the Haplogroup S polymorphism at np. 8404.
Figure 4.34: Chromatogram of sample 8S; arrow indicates the SNP site for haplogroup S for which the nucleotide is a thymine.
Figure 4.35: DNA sequence alignment of samples examined for the polymorphism at np. 11947 which is characteristic of Haplogroup W.
Figure 4.36: Chromatogram of sample 110W; arrow signifies the polymorphic nucleotide that defines Haplogroup W, which here is an adenine.
Figure 4.37: Alignment of DNA sequences which were analysed for a SNP at np. 6371 for Haplogroup X.
Figure 4.38: Chromatogram of sample 15X; arrow identifies the polymorphic nucleotide, which is a thymine.
Figure 4.39: Chromatogram of sample 110X; arrow denotes the polymorphic nucleotide, which in this sequence is a cytosine.
Figure 4.40: DNA sequence alignments of samples assessed for the presence or absence of a 9bp deletion from np. 8281-8289 which, if present, is characteristic of Haplogroup B.
Figure 4.41: Chromatogram of sample 21B; arrow & bracket identify the 9bp sequence whose absence defines Haplogroup B.
Figure 4.42: Chromatogram of sample 35B; arrow denotes the position where the 9bp sequence would be, between the adenine & guanine bases.
Figure 4.43: Sequence alignment of DNA samples assessed for the haplogroup F SNP at np. 6392.
Figure 4.44: Chromatogram of sample 13F; arrow denotes the polymorphic nucleotide at the SNP site.
Figure 4.45: Chromatogram of sample 27F; arrow identifies the SNP site where, in this case, the nucleotide is a thymine.
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
Figure 5.1: An illustration of the mtDNA haplogroup tree; rooted by mtEve, derived haplogroups are connected to their parental haplogroups. Branches flanked by coding-region polymorphisms for subsequent haplogroup determination. Haplogroups colour coded as to their geographical location.
Figure 5.2: Haplogroup L3 frequency map of Hazara (red) and Pashtun (blue) populations from Afghanistan and of neighbouring populations (grey).
Figure 5.3: Frequencies of macrohaplogroup M* in the Hazara (red), Baloch (green) and Pashtun (blue) populations and from neighbouring groups (grey).
Figure 5.4: Haplogroup N* frequencies in the Hazara (red) and Tajiks (yellow) and in surrounding west Eurasian populations.
Figure 5.5: Frequency of haplogroup R* in the Hazara (red), Tajiks (yellow) and Pashtun (blue) and elsewhere in western Eurasia (grey).
Figure 5.6: Frequencies of haplogroup C in Hazara (red) population and neighbouring populations.
Figure 5.7: Haplogroup D frequencies among the Hazara (red), Baloch (green) and Tajik (yellow) populations from Afghanistan and from neighbouring populations (grey).
Figure 5.8: Comparison of haplogroup G frequencies found in the Pashtuns (blue) of Afghanistan and from populations in Central Asia and the Iranian Plateau and the Caucasus region.
Figure 5.9: Haplogroup Z frequency within the Hazara (red) and among other populations in Central & East Asia.
Figure 5.10: Frequency map of haplogroup A within the Hazara (red) and Baloch (green) and from Central Asian and East Asian populations (grey).
Figure 5.11: Frequency of Haplogroup B among the Hazara (red) and Pashtuns (blue) in Afghanistan compared to populations in west Eurasia and East Asia (grey).
Figure 5.12: Alignment of the stretch of non-coding DNA between the COII and tRNAlys genes. Samples 25, 35 and 133 have the 9bp deletion when compared to rCRS and a known haplogroup B sequence.
Figure 5.13: Haplogroup F frequency among the Hazara (red) and Central and East Asia populations such as Turkmens, Kyrgyz, Mongols, Koreans and Chinese (grey).
Figure 5.14: Distribution of haplogroup X throughout western Eurasia and in the Hazara (red) and Baloch (green).
Figure 5.15: Haplogroup HV frequencies among the Hazara (red), Tajiks (yellow), Baloch (green) and Pashtuns (blue) and other west Eurasian populations (grey).
Figure 5.16: Haplogroup H frequency among the Hazaras (red), Tajiks (yellow), Baloch (green) and Pashtuns (blue) and from other regional populations (grey).
Figure 5.17: Haplogroup JT frequencies of the Hazara (red), Tajiks (yellow) and Baloch (green) populations from Afghanistan.
Figure 5.18: Haplogroup J frequency among the Baloch (green) and Pashtuns (blue) and the populations from west Eurasia and Central Asia (grey).
Figure 5.19: Haplogroup T frequency among the Hazara (red) and Baloch (green) populations of Afghanistan compared to other west Eurasian populations (grey) from Iran, Iraq, Turkmenistan, Uzbekistan and Tajikistan.
Figure 5.20: A skeleton version of the maximum parsimony tree of the mtDNA haplogroups present among the Afghani population with relevant colours for different continental and regional lineages. Circle size is proportionate to frequency observed within the population.
Figure 5.21: Skeleton version of the maximum parsimony tree of the fifteen mtDNA haplogroups observed among the Hazara ethnic group.
Figure 5.22: Skeleton version of the maximum parsimony tree of the six mtDNA lineages present among the Tajik population of Afghanistan with colours representative of the different regional lineages.
Figure 5.23: Skeleton version of the maximum parsimony tree of the mtDNA haplogroups present among the Baloch population, circle sizes are proportionate to haplogroup frequency.
Figure 5.24: Skeleton version of the maximum parsimony tree of the eight mtDNA haplogroups observed among the Pashtun ethnic group with different colours representative for the regional lineages. Frequency of haplogroups is represented by circle size.
Figure 5.25: Location of the ethnic groups of Afghanistan.
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
Figure 6.1: Locations of six sub-populations of Uzbekistan with foreign ancestry (Irwin et al., 2009b).
Figure 6.2: Aligned mtDNA HVS-I Sequences compared to the rCRS (top) using DNA Alignment, highlighted cells indicate polymorphic nucleotides.
Figure 6.3: Baluch population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0324)
Figure 6.4: Hazara population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0224)
Figure 6.5: Pashtun population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0121)
Figure 6.6: Tajik population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0270)
Figure 6.7: Median Joining network calculated from the HVS-I sequences of the Afghani population.
Figure 6.8: The first five genetic barriers of the HVS-I sequence data of 66 populations, including the four Afghani groups, compiled using the FST matrix as an input file in Barrier ver. 2.2.
Figure 6.9: The top ten genetic barriers using the FST matrix of the four Afghani ethnic groups and the 62 additional populations in Barrier v.2.2.
Abbreviations
AMH: Anatomically Modern Humans
BMAC: Bactria-Margiana Archaeological Complex
bp: Base Pairs
COI: Cytochrome Oxidase Subunit I
COII: Cytochrome Oxidase Subunit II
COIII: Cytochrome Oxidase Subunit III
CRS: Cambridge Reference Sequence
HVS: Hypervariable Segment
IE: Indo-European
Ky: Thousand Years
Kya: Thousand Years Ago
LGM: Last Glacial Maximum
Mb: Mega bases (1,000,000 bases)
mtDNA: Mitochondrial DNA
np: Nucleotide Position
NRPY or NPY: Non-Recombining Portion of the Y-Chromosome
PAR: Pseudoautosomal Region
PCR: Polymerase Chain Reaction
rCRS: Revised Cambridge Reference Sequence
RFLP: Restriction Fragment Length Polymorphism
SNP: Single Nucleotide Polymorphism
Y-STR: Y-Chromosome STR
YBP: Years Before Present
YCC: Y-Chromosome Consortium
Acknowledgements
Firstly, I would like to thank my supervisor Dr. Maziar Ashrafian Bonab for having the
patience to tolerate me for the duration of this study. I would also like to thank him for
all the support and encouragement he has given me especially during the low periods and
for also sharing his knowledge with me.
I would also like to thank the other members of my supervisory team, Dr. Julian Mitchell
and Dr. Frank Schubert, for the support and advice they have given and also for the many
spontaneous ‘corridor meetings’; your help has also been invaluable.
This project would not have been possible without the financial and organisational
support of IBBS and The Genographic Project, my thanks also go to you.
I would also like to express my gratitude to the technical staff of King Henry Building,
especially Dr. George Zouganelis, Ms Christine Hughes and Mr Christopher Baker. Your
constant availability to listen to my grievances has ensured my sanity throughout the
difficult periods of this study. Additionally, I would like to thank the many other
academic and research staff at the University of Portsmouth for their help and support.
Last, but certainly not least, I would like to thank my friends and family, especially
my parents Christine and Tony, whose patience with me has been tested fully, but who
have always supported me throughout. They have given me the strength and
determination during the tougher days to pick myself up again and continue. I would like
to thank Jonathan Parrott for his support but also for proof-reading some of my thesis
chapters and never ceasing to find a mistake. Thanks! Thank you for all your help.
Chapter One
Anthropology & DNA
1. Anthropology and DNA
1.1 Origins of Modern Humans
1.1.1 Early Hominids
For many years, scientists, especially biologists, and inquisitive humans in general
have asked the questions: who walked the Earth before us? Where do we
come from?
The process of evolution, as recognised by the early evolutionary biologists Charles
Darwin, Alfred Russel Wallace and Thomas Huxley, has helped to answer
these questions. We know that the early hominids diverged from our nearest relatives,
the African apes, some 5-7 million years ago (MYA), based upon fossil identification and
protein comparisons between Asian and African apes and humans (Stoneking, 2008).
Most of the hominid fossils that are dated after 4.2 MYA, but do not comply with the
characteristics of the genus Homo, are grouped into the genus that preceded it,
Australopithecus (Jobling, Hurles & Tyler-Smith, 2004). Fossils belonging to this genus
have only been found in Africa (Jobling, Hurles & Tyler-Smith, 2004). The oldest of
these species is Australopithecus anamensis, found at Lake Turkana (Figure 1.1) and the
nearby sites of Kanapoi and Allia Bay in northern Kenya. The fossil evidence collected
at these sites comprises a mandible, molars and premolars, skull fragments, a tibia and a
humerus (Cartmill & Smith, 2009), and has been dated to the lower Pliocene epoch, 3.9-4.2
MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009).
Another early species was Australopithecus afarensis, which differs from the earlier A.
anamensis in that the males are slightly smaller (Jobling, Hurles & Tyler-Smith, 2004).
Remains of A. afarensis have been found at multiple sites along the Eastern Rift Valley,
from Hadar, Ethiopia, in the north to Laetoli, Tanzania, in the south, dating to 3.0-3.9
MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009).
A contemporaneous species to A. afarensis was Australopithecus bahrelghazali, whose
remains have been found near Koro Toro, central Chad, some 2,500 km west of the
Eastern Rift Valley where the remains of A. afarensis were found (Jobling, Hurles &
Tyler-Smith, 2004). These are dated from 3.0 MYA (Jobling, Hurles & Tyler-Smith,
2004) to 3.5 MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009).
Australopithecus bahrelghazali is morphologically similar to its eastern relatives, and
may even be a variant of A. afarensis (Jobling, Hurles & Tyler-Smith, 2004). The main
feature that differentiates the two species is that the
mandibular symphysis, the fusion ridge joining the left and right halves of the mandible,
is more vertical in A. bahrelghazali (Cartmill & Smith, 2009). The lower premolars also
have three roots instead of the conventional two; however, this feature does not
necessarily allow the two species to be told apart.
Figure 1.1: Map of Kenya; Lake Turkana in the north-northwest region of Kenya.
During the time soon after A. afarensis and A. bahrelghazali, there seems to have been a
transition of hominids between Australopithecus and the later genus Homo. The species
in question is Australopithecus habilis or, as it is also known, Homo habilis, of which some
specimens have been dated to ~2.5 MYA (Jobling, Hurles & Tyler-Smith, 2004). It has
been described as a primitive Homo and an advanced Australopithecus (Cartmill &
Smith, 2009). This hominid is regarded as belonging to the genus Homo largely on the
basis of partial skull and mandible fossils (Jobling, Hurles & Tyler-Smith, 2004).
Homo habilis had a larger brain and smaller ‘cheek-teeth’ (Cartmill & Smith, 2009).
Some fossils of Homo habilis are regarded as having an Australopithecus-like brain and Homo-like face
while other fossils indicate the opposite, leading to the claim that there is a sister species
named Homo rudolfensis (Cartmill & Smith, 2009).
A Homo species about which there is little confusion is Homo erectus. The only issue is that H.
erectus is sometimes used in reference to non-Africans while Homo ergaster is
reserved for African individuals (Jobling, Hurles & Tyler-Smith, 2004), but there is no
significant difference between the two. Here, to avoid confusion, Homo erectus will be
used in reference to both. The oldest H. erectus fossils were found in eastern Africa at
Koobi Fora, near Lake Turkana, and dated to approximately 1.8-1.9 MYA (Jobling, Hurles
& Tyler-Smith, 2004). Some other fossils from this site have been dated to 1.7-1.8 MYA
(Cartmill & Smith, 2009). As well as being initially found in Africa, Homo erectus was
also the earliest hominid found outside Africa and, based on fossil evidence obtained
from Indonesia and China, H. erectus may have been in East Asia as early as 1.8 MYA
(Jobling, Hurles & Tyler-Smith, 2004). These Asian H. erectus had a larger body than
their African counterparts, which may have provided greater tolerance to heat stress
and supported them in their migration, and small populations
of Homo erectus may have survived until as recently as 27 thousand years ago (KYA) (Jobling, Hurles &
Tyler-Smith, 2004). The limb proportions and tooth size of H. erectus are similar to those of
anatomically modern humans (AMH), Homo sapiens sapiens; however, their brain size is
smaller (Jobling, Hurles & Tyler-Smith, 2004).
Later Homo species following the emergence of H. erectus but before AMHs include
Homo heidelbergensis and Homo neanderthalensis. The former are less robust than
Homo erectus but did have larger brains. Fossil evidence, particularly a mandible, of
Homo heidelbergensis has been found across Europe (in Germany, Greece, Italy, Spain
and the United Kingdom) and in Ethiopia, eastern Africa, and has been dated to ~1
MYA (Jobling, Hurles & Tyler-Smith, 2004). Due to their distribution, H.
heidelbergensis must have been a variable species, not too dissimilar to AMHs, as they
were able to inhabit several different regions under different environmental conditions.
However, Homo neanderthalensis, or the Neanderthal as it is more commonly known,
has been identified to have emerged ~250 KYA and become extinct ~27 KYA, just
before the Last Glacial Maximum (LGM). The Neanderthal inhabited regions of Europe
and western Asia with fossils identified in France, Germany, Israel and Iraq. Based upon
skeletal and skull evidence, which show H. neanderthalensis to have large brains and
well-defined brow ridges and also appear robust, Neanderthals may have in fact derived
from H. heidelbergensis (Jobling, Hurles & Tyler-Smith, 2004).
1.1.2 Modern Humans
The evolution from previous hominid species into AMHs is contested; the dispute concerns not which archaic
human species we derive from, but where geographically this occurred, and has produced three hypotheses: (i)
the multi-regional hypothesis, (ii) the replacement hypothesis and (iii) the assimilation
hypothesis. Initially, the conclusions made regarding the evolutionary history of AMHs
focussed on fossil evidence (Schick & Toth, 1993), but they now draw on
archaeological, anthropological and genetic data from modern human populations
(Stringer, 2002; Mellars, 2006). During the early period of AMH history, modern
humans were one of at least three evolving human species: Homo erectus in
Asia, Homo neanderthalensis in Europe/Eurasia and Homo sapiens in Africa (Klein,
2008).
The multi-regional hypothesis (Figure 1.2) or Regional Continuity model (Wolpoff et al.,
1984) suggests that Homo erectus populations migrated out of Africa to the various
regions of the world more than 1 MYA (Nei, 1995) and gradually evolved into AMHs,
providing our current worldwide distribution. For example, Asian H. erectus evolved into
Asian modern humans, African H. erectus evolved into African modern humans etc.
Wolpoff et al. (2000) state that this model does not suggest parallel evolution,
independent multiple origins or the simultaneous appearance of characteristics within
different regions. This hypothesis does propose that the regional characteristics of
modern humans have remained unchanged since the time of their ancestors more than 1
MYA (Nei, 1995) which would seem unlikely. Genetically, there is no evidence of Homo
neanderthalensis mitochondrial DNA (mtDNA) contribution to AMHs (Hodgson &
Disotell, 2008). This is due to the large quantity of polymorphisms seen between
Neanderthal and modern human mtDNA in comparison to any two modern human
mtDNAs. However, Homo erectus X-Chromosome sequence data can be found in
modern humans (Cox et al., 2008), thus providing some genetic support to the multi-regional
hypothesis. This model does not rule out the possibility of different H. erectus
populations breeding with one another; however, it does suggest that the main form of
breeding occurred within isolated H. erectus groups. This hypothesis argued that “each
inhabited region showed a continuous anatomic sequence leading to modern humans, and
those non-African populations exhibited no special African influence” (Stringer, 2002).
Figure 1.2: Illustration of the Multi-Regional hypothesis of modern human evolution (Stoneking, 2008)
The Replacement hypothesis or Out-of-Africa theory (Figure 1.3) is the main alternative
to the multi-regional hypothesis. This theory also identifies an African origin (whereas in
the multi-regional hypothesis this African origin is associated with Homo erectus and
not all AMHs), proposing that modern humans originated from an African H. erectus
population ~100,000-200,000 years ago (Nei, 1995) or ~150 KYA (Forster & Matsumura,
2005) and not from the expansive H. erectus populations outside Africa. Modern humans
would first expand and colonise Africa before migration into the Middle East and
subsequently onwards throughout the world.
Support for this hypothesis intensified with the introduction of genetics, in particular the
use of mtDNA and Y-Chromosome, as these are inherited unilaterally. Cann et al. (1987)
collected 147 mtDNA samples from five population groups around the world (African,
Asian, Australian, New Guinean and European). Restriction Fragment Length
Polymorphism (RFLP) analysis was performed on these samples in an attempt to identify
the level of genetic variation. Upon the construction of a genealogical tree illustrating the
evolutionary relationships between the populations, Cann et al. (1987) concluded that the
most ancestral sequence split the tree into two groups; the first consisting of only African
mtDNAs, and the second containing the mtDNAs of the rest of the world. They also
calculated, using the molecular clock, that the common ancestor for modern humans
lived in eastern Africa 140-280 KYA (Cann et al. 1987).
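The molecular-clock reasoning behind such dates can be sketched in a few lines. This is only an illustration of the arithmetic, assuming a constant substitution rate and no correction for multiple hits at the same site; it is not the calculation performed by Cann et al. (1987). The function names and every numerical input below are hypothetical placeholders.

```python
def calibrate_rate(outgroup_divergence, split_time_myr):
    """Pairwise divergence rate (fraction of sites per million years).

    outgroup_divergence: fraction of sites differing between two species
    split_time_myr: assumed age of their split, in millions of years
    """
    if split_time_myr <= 0:
        raise ValueError("split time must be positive")
    return outgroup_divergence / split_time_myr


def tmrca_years(within_divergence, rate_per_myr):
    """Age in years of the common ancestor of lineages that differ at
    `within_divergence` of their sites, given a pairwise divergence rate."""
    if rate_per_myr <= 0:
        raise ValueError("rate must be positive")
    return within_divergence / rate_per_myr * 1_000_000


# Hypothetical placeholder values, not data from any study cited in the text.
rate = calibrate_rate(outgroup_divergence=0.10, split_time_myr=5.0)
age = tmrca_years(within_divergence=0.004, rate_per_myr=rate)
print(f"rate = {rate:.3f} per Myr, TMRCA = {age:,.0f} years")
```

With these placeholder inputs the estimate is roughly 200,000 years. The point of the sketch is that the date scales directly with the assumed calibration, which is one reason published estimates are reported as ranges.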
The hypervariable regions of mtDNA (HVS-I and HVS-II) from 189 individuals, of
which 121 were of African descent, were sequenced (Vigilant et al., 1991). Chimpanzee
and human mtDNA sequences were used to calibrate the rate of mtDNA evolution
resulting in the dating of the common human ancestor sometime between 166-249 KYA
(Vigilant et al., 1991). Other studies that appear to support the Out-of-Africa
hypothesis also calculate the ancestor of modern humans to have lived near this time: 230-298
KYA (Hasegawa & Horai, 1991; Ruvolo et al., 1993). The subsequent expansion(s) from
Africa are thought to have taken a northern (via the Levant) or a southern route across the
Horn of Africa. Recently, evidence has amassed supporting the latter as an initial single
successful migration (Kivisild et al., 1999; Forster & Matsumura, 2005; Macaulay et al.,
2005; Mellars, 2006; Hudjashov et al., 2007; Chandrasekar et al., 2009; Kumar et al.,
2009). A Levant migration has been recognised, but has either been identified as having
lesser impact (Forster & Matsumura, 2005) or occurring more recently (20-10Kya)
(Winters, 2011). The Strait of Gibraltar has been identified as a third migration point
from northern Africa, ~40-35Kya while Eurasia was still inhabited by Neanderthals
(Winters, 2011). To date, the current genetic and archaeological data available is
generally interpreted to substantiate the single AMH origin in east Africa (Liu et al.,
2006).
Figure 1.3: Illustration of the Out-of-Africa theory of modern human evolution (Stoneking, 2008)
The Out-of-Africa hypothesis has been supported further; Hudjashov et al. (2007)
identified, by analysis of both Y-Chromosome and mtDNA, that Australian Aboriginals
and Melanesians (from the region of Oceania that incorporates the islands around
Australia, including Papua New Guinea) belong to the founder groups (mtDNA lineages
M and N, and Y-Chromosome lineages C and F) that are associated with the initial exit
from Africa that occurred 50-70 KYA. It was also found that Australian Aboriginals
were closely related to the indigenous populations of Papua New Guinea and the rest of
Melanesia, and during one period or another were part of the same settlement group
which has been later separated due to the oceans (Hudjashov et al., 2007).
The Assimilation model (Figure 1.4) is an amalgamation of the two previous theories in
that AMHs “arose through the integration of an important African role with multiregional
views” (Stringer, 2002). The Assimilation model accepts the African origin for modern
humans, but suggests that the role of population migrations and the replacement of
the more archaic species has been overstated, and that the evolution of various H. erectus
populations into AMHs also played a role. For instance, it has recently been observed that the Neanderthal
genome shares a greater affinity with AMHs of Eurasia than those of Africa; between 1% and 4% of
Eurasian genomes derive from the Neanderthal genome (Green et al., 2010). It
was also observed that the Neanderthal genome is as similar to a French individual as it
is to an East Asian (Han Chinese) and a Papuan genome indicating an occurrence of
admixture between AMHs and Neanderthals shortly after the modern human migration
from Africa but prior to the divergence of Europeans, East Asians and Papuans (Green et
al., 2010).
Figure 1.4: Illustration of the Assimilation hypothesis for modern human evolution (Stoneking, 2008)
1.2 Mitochondrial DNA (mtDNA)
Since Watson & Crick (1953) described the structure of DNA as two right-handed helical
chains coiled round the same axis and held together by purine and pyrimidine bases,
scientists, and geneticists in particular, have been preoccupied with its function; once
identified in most cells, it was quickly recognised as an essential molecule. Within a typical
somatic cell, there are many complex processes occurring which are specific for that cell,
for example, the synthesis of a specific protein required for a particular job. The cell
contains many organelles which are required for the essential cellular procedures. The
main centre of the cell is the nucleus, where DNA is stored in the form of chromosomes
and will only leave in the form of mRNA. Nuclear DNA is heavily involved in protein
synthesis and, as a consequence, provides the cell with its identity through the regulation and
expression of genes. For example, pancreatic cells are instructed to synthesise insulin, a
hormone used to regulate blood-sugar levels, while other cells have this gene switched
off. The organelles inside a cell are often involved in protein synthesis, the packaging or
transportation of proteins. The mitochondrion (pl. mitochondria) is a double-membrane bound
organelle, whose inner membrane (cristae) is extensively folded upon itself to maximise
surface area, and is like no other organelle inside the cell; not only does it receive
synthesised proteins, but also synthesises its own proteins from its own genome,
mitochondrial DNA, which is separate from nuclear DNA (Borst, 1977). Mitochondrial
DNA (mtDNA) is located inside the mitochondrion (Jobling, Hurles & Tyler-Smith,
2004; Butler, 2005) within the mitochondrial matrix, and unlike nuclear DNA, is not
involved in the majority if not all cellular processes, but only those which occur inside
the mitochondrion such as oxidative phosphorylation and ATP synthesis. Mitochondria
are widely accepted to have originated from a mutual symbiosis between
an ancestral cell and a bacterium (Anderson et al., 1981; Jobling, Hurles & Tyler-Smith, 2004).
Human mitochondrial DNA (Figure 1.5) is a double-stranded circular molecule and is
16,569 base pairs (bp) in length (Jobling, Hurles & Tyler-Smith, 2004; Butler, 2005;
Ebner et al., 2011).
Figure 1.5: Diagrammatic view of mtDNA.
Mitochondrial DNA is inherited unilaterally via the maternal line (Lightowlers et al.,
1997; Jobling, Hurles & Tyler-Smith, 2004; Butler, 2005) and is present in nearly all
cells. Biparental (additional paternal contribution) inheritance has been observed among
insects such as honeybees and drosophila, mussels, yeast and mice (Meusel & Moritz,
1993; Kvist et al., 2003; Kraytsberg et al., 2004). Among honeybees, as much as 27%
male mtDNA contribution has been observed during the egg stage 12 hours after
oviposition, while the contribution becomes negligible by larval emergence (Meusel &
Moritz, 1993). Paternal inheritance of mtDNA in humans has been observed at the
blastocyst stage of some abnormal embryos (St. John et al., 2000) but overall
contribution has been negligible (~0.7%) (Kraytsberg et al., 2004). The unilateral
inheritance may be attributed to a typical oocyte containing ~100,000 mtDNA genomes,
while spermatocytes only contain ~100 genomes (Chen, X et al., 1995; Jobling, Hurles &
Tyler-Smith, 2004) and also the selective destruction or inactivation of spermatozoon
mitochondria during early embryogenesis (Schwartz & Vissing, 2002). Within cells,
mtDNA is present in high copy numbers; 10^3-10^4 (Lightowlers et al., 1997; Butler,
2005), and most of these copies are identical to one another (Lightowlers et al., 1997).
Cells which require a greater ATP yield, such as muscle or nerve cells, will contain a
greater number of mitochondria and therefore more mtDNAs than those which have a
much lower demand.
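The dilution effect implied by these copy numbers can be made explicit with a short calculation. This is a minimal sketch using only the approximate figures quoted above (~100,000 oocyte genomes and ~100 sperm genomes); the function name is hypothetical, and the calculation deliberately ignores the selective destruction of paternal mitochondria, which would reduce the paternal share further.

```python
def paternal_fraction(oocyte_copies, sperm_copies):
    """Expected paternal share of mtDNA in the zygote from dilution alone."""
    total = oocyte_copies + sperm_copies
    if total <= 0:
        raise ValueError("copy numbers must sum to a positive value")
    return sperm_copies / total


# Approximate copy numbers quoted in the text.
fraction = paternal_fraction(oocyte_copies=100_000, sperm_copies=100)
print(f"paternal share from dilution alone: {fraction:.4%}")
```

On these figures the paternal genomes make up about 0.1% of the mtDNA in the zygote before any active elimination takes place.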
The mitochondrial genome contains 37 genes, coding for 13 proteins, 22 tRNAs and two
rRNAs, which are contiguous and have few non-coding bases between them (Anderson
et al., 1981). The major non-coding region within mtDNA is the D-loop; a 1122bp region
which houses the hypervariable (HVS) regions; HVS-I (np. 16024-16383; classically
16024-16365), HVS-II (np. 57-372; classically 73-340) and HVS-III (np. 438-574)
(Butler, 2005). Since the late 1980s, mtDNA, in conjunction with Y-Chromosome and
autosomal DNA, has been utilised for population genetics as it makes it possible to trace the
evolutionary and historical lineage of a species. Mitochondrial DNA has been used
extensively due to its maternal inheritance via the ova (Giles et al., 1980), lack of
recombination, high abundance per cell and high mutation rate (Olivo et al., 1983;
Merriwether et al., 1991; Elson et al., 2001; Piganeau & Eyre-Walker, 2004; Torroni et
al., 2006; Asari et al., 2007; Behar et al., 2007; Maji et al., 2008). The mitochondrial
genome acquires mutations approximately ten times faster than nuclear DNA (Brown et
al., 1979; Ingman & Gyllensten, 2001; Ebner et al., 2011). This high rate of mutation is
accredited to the absence of protective proteins, such as histones, around the DNA,
exposure to oxidative damage, and a lack of repair mechanisms (Bogenhagen, 1999).
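The hypervariable region boundaries given above lend themselves to a simple lookup. The sketch below assumes the wider (non-classical) nucleotide position ranges quoted from Butler (2005) and the 16,569 bp genome length; the names `HVS_REGIONS` and `hvs_region` are hypothetical and introduced only for illustration.

```python
GENOME_LENGTH = 16569  # base pairs, as stated in the text

# Inclusive nucleotide position (np) ranges quoted in the text (Butler, 2005).
HVS_REGIONS = {
    "HVS-I": (16024, 16383),
    "HVS-II": (57, 372),
    "HVS-III": (438, 574),
}


def hvs_region(position):
    """Return the hypervariable region containing `position`, or None."""
    if not 1 <= position <= GENOME_LENGTH:
        raise ValueError(f"position must lie between 1 and {GENOME_LENGTH}")
    for name, (start, end) in HVS_REGIONS.items():
        if start <= position <= end:
            return name
    return None


for np_site in (16129, 73, 500, 8251):
    print(np_site, hvs_region(np_site))
```

A polymorphism at np 16129 falls in HVS-I, whereas a coding region site such as np 8251 belongs to none of the three regions.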
The original sequenced sample of mtDNA, the Cambridge Reference Sequence (CRS)
(Anderson et al., 1981), was obtained from the placenta of an individual of European
descent and exhibits the typical characteristics of European mtDNA, belonging to
Haplogroup H, Sub-Haplogroup H2 (Achilli et al., 2004). Haplogroups are defined by sets of
slowly mutating markers (Jobling, Hurles & Tyler-Smith, 2004) that tend to be shared by
peoples of the same geographic region. Wallace et al. (1999) found that many individuals
from the same or similar populations or cultural backgrounds shared similar mtDNA
sequences and could be clustered together to form haplogroups. This can be seen
particularly amongst European populations (Hedman et al., 2007; Richard et al., 2007;
Tetzlaff et al., 2007; Zimmerman et al., 2007) which share many of the same
haplogroups. The original sequence was reanalysed by Andrews et al. (1999) as other
investigators (Brown et al., 1992; Howell et al., 1992) had identified differences in the
genomic sequence. In total, the re-analysis identified eighteen errors or rare
polymorphisms within the mtDNA sequence, thus updating the sequence to become the
revised Cambridge Reference Sequence (rCRS).
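Reporting a sample relative to a reference sequence amounts to listing the positions at which the two differ. The sketch below assumes two already-aligned sequences of equal length and handles substitutions only (no insertions or deletions); the function name and the toy sequences are hypothetical and are not taken from the rCRS.

```python
def differences_from_reference(reference, sample, first_position=1):
    """List substitutions in the form '<ref base><position><sample base>'.

    Both sequences must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_base}{position}{sample_base}"
        for position, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=first_position
        )
        if ref_base != sample_base
    ]


# Toy sequences for illustration only.
reference = "ACGTACGT"
sample = "ACGTGCGA"
print(differences_from_reference(reference, sample))
```

For the toy input this yields `['A5G', 'T8A']`: reference base, position, then the base observed in the sample.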
Since the acknowledgement of mtDNA as an essential tool in population genetics, the
genome has been analysed extensively. Earlier analyses of mtDNA utilised a number of
restriction enzymes; AluI, AvaII, BamHI, DdeI, HaeII, HaeIII, HhaI, HincII, HinfI, HpaI,
HpaII/MspI, MboI, RsaI and TaqI. The earlier studies using this method, known as the
14-restriction enzyme method, used the endonuclease HpaII (Torroni et al., 1992; 1993;
1994a) while those undertaken later switched to MspI (Torroni et al., 1994b; 1996; 1997;
1998; 1999; Brown et al., 1998; Kivisild et al., 1999; Macaulay et al., 1999; Kivisild et
al., 2003; Quintana-Murci et al., 2004; Alzualde et al., 2005), however both target the
sequence C:CGG. In many cases when this method was used, additional polymorphic
SNPs were also observed; most commonly AccI at 14465 and 15254, HinfI at 12308,
NlaIII at 4216 and 4577 and MseI at 14766 (Torroni et al., 1996; 1998; 1999; Brown et
al., 1998; Kivisild et al., 1999; Macaulay et al., 1999; Quintana-Murci et al., 2004;
Alzualde et al., 2005). This method has been modified more recently as restriction
enzymes have been used at the sites of diagnostic polymorphisms for haplogroup
identification (Torroni et al., 2001; Al-Zahery et al., 2003; Quintana-Murci et al., 2004;
Tambets et al., 2004; Alzualde et al., 2005; Nasidze et al., 2006; Jin et al., 2009). Quite
often, mtDNA genomes were also partially sequenced, mostly focussing on the
hypervariable regions (HVS-I and II) (Kivisild et al., 1999; 2003; Nasidze & Stoneking
et al., 2001; Torroni et al., 2001; Al-Zahery et al., 2003; Nasidze et al., 2004a; 2004b;
2005a; 2005b; 2006; 2007; Powell et al., 2007; Alshamali et al., 2008; Irwin et al.,
2009a; 2009b; Jin et al., 2009) and, now that DNA sequencing is a little more economical,
whole genome sequencing (Achilli et al., 2004; 2005; Fagundes et al., 2008;
Chandrasekar et al., 2009; Kumar et al., 2009) is a common mtDNA analysis method.
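The principle behind restriction analysis, that a polymorphism can create or abolish an enzyme recognition site, can be illustrated with a plain string search. This is a minimal sketch, assuming the CCGG recognition sequence targeted by HpaII and MspI as stated above; the sequences are toy examples, the function names are hypothetical, and the search treats the sequence as linear, so a site spanning the origin of the circular genome would be missed.

```python
def find_sites(sequence, recognition_site):
    """Return 1-based start positions of every occurrence of the site."""
    sequence = sequence.upper()
    site = recognition_site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        # Move on by one base so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def site_changes(reference, sample, recognition_site):
    """Return (lost, gained) site positions in the sample versus the reference."""
    ref_sites = set(find_sites(reference, recognition_site))
    sample_sites = set(find_sites(sample, recognition_site))
    return sorted(ref_sites - sample_sites), sorted(sample_sites - ref_sites)


# Toy sequences: the sample carries a C-to-T change at position 4.
reference = "ATCCGGTTACCGGA"
sample = "ATCTGGTTACCGGA"
print(find_sites(reference, "CCGG"))
print(site_changes(reference, sample, "CCGG"))
```

In the toy example the reference has sites at positions 3 and 10, and the single substitution removes the site at position 3, which is the kind of site loss that would change the fragment pattern on a gel.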
1.3 Mitochondrial and Y-Chromosome Haplogroup Distribution
1.3.1 Mitochondrial DNA Variation
Human mtDNA genomes differ broadly across the world, with populations of similar
descent or geographical region sharing many of the same characteristics. In some cases,
these characteristics can indicate some historical events of the population including
admixture with other populations or migrations.
The Americas are dominated by five haplogroups (Schurr et al., 1990; Achilli et al.,
2008; Fagundes et al., 2008); A, B and X among native North Americans, and
haplogroups B, C and D among South Americans. Haplogroup A is also abundant among
Central American populations. Haplogroups A-D are also present in Asia, while X is
found in low frequencies outside the Americas; in Europe, it accounts for <5% of the
mtDNA diversity (Fagundes et al., 2008). Haplogroup X arrived in the Americas as part
of a single founding population, which refutes multiple migration theories such as the
Solutrean hypothesis. This is based upon the five founder haplogroups possessing a
coalescence age of ~20KYA (Fagundes et al., 2008), an incidence that would not be
observed had a later peopling event occurred. Additional contesting evidence is that
Amerindian mtDNAs contain rare mutations that can only be found in Asia, thus
indicating the peopling of the Americas occurred via a migration through Asia (Schurr et
al., 1990). The initial separation of American populations from Asian groups ended with
a population bottleneck in Beringia during the LGM ~23-19KYA reducing the female
contribution to ~1,000 individuals. Toward the end of the LGM, the population
experienced an expansion from ~18-15KYA implementing the migration by a southern
route, likely along the western coast of North America, as the ‘opening of the ice-free
corridor is dated no earlier than ~14KYA’ (Fagundes et al., 2008).
There are a number of minor populations among the larger populations within East Asia.
Haplogroups B, F, M7 and R9 tend to be found in abundance within Chinese
populations, and in Hong Kong, these four lineages make up over 50% of the mtDNA
diversity (Irwin et al., 2009a). The Korean population and the subpopulations that
neighbour them (Manchurians, Korean-Chinese and Han (Beijing)) each have high
frequencies of haplogroups D (≥25.0%), M (≥15.0%), F (≥10.1%) and B (≥10.0%) (Jin et
al., 2009). The haplogroup D frequencies in the other populations studied by Jin et al.
(2009) were Vietnamese (18.8%), Mongolian (12.8%) and Thai (7.5%). The Korean
population also presents moderate frequencies of the haplogroups A (8.4%) and G (7.3%),
which are common in northeast Asia and southeast Siberia. Other common lineages from
this region (C, Y & Z) only make up <4% of the Korean mtDNA gene pool. Based upon
mtDNA analysis, Koreans are most genetically similar to populations within their own
geographical region of northeast Asia. The Xibe are another Chinese ethnic group,
originating from north-eastern China but now inhabiting a region of north-western China.
They genetically resemble populations from their indigenous north-eastern region of
China and are most similar to the Manchurian population of the same region (Powell et
al., 2007).
In Europe, there is one haplogroup in particular which dominates the population
landscape; haplogroup H (Mikkelsen et al., 2008). The frequency of this haplogroup
increases further west into Europe. In addition, there are also regional ‘hotspots’
where the frequency is greater than in the surrounding populations – such hotspots include
the Spanish Basques, northern Germany, Denmark, northern France and Great Britain
(Achilli et al., 2004). The average frequency within Europe is 40.5%, often found
between 40-50% (Grignani et al., 2009), 24.6% in the Caucasus region, 18.4% among
Middle Eastern populations and 10.6% within Asian groups (Achilli et al., 2004). There
are at least fifteen subgroups of haplogroup H; two common subgroups are H1 and H3
which exhibit high frequencies in the Iberian peninsula and the surrounding populations
such as the Berbers of Morocco, and both have a coalescent age of ~11KY (Achilli et al.,
2004). Another common European haplogroup is haplogroup U (coalescent age of
~60KY – giving it a date of origin soon after the AMH exit from Africa (Kivisild et al.,
1999; Achilli et al., 2005)), and in particular subgroup U5 (Kivisild et al., 1999). A U5
subclade (U5b1b) was found to be present in a single Yakut individual and one
Fulbe individual; the two individuals differed by two coding region nucleotides and three
control region nucleotides (Achilli et al., 2005), an unusual occurrence since the Yakut
are from Siberia and the Fulbe are from Senegal. Sub-haplogroup U5 (coalescent age of
41.4 ±9.2KY) is often found at low frequencies within Europe and the Berber and
African populations, however U5 is found at a high frequency (~48%) among the Saami
of northern Scandinavia (Achilli et al., 2005), of which some can also be assigned to the
subclade U5b1b. The lineage U5b1b shares similar patterns with the other major
haplogroup found among the Saami; haplogroup V. Together, these two lineages account
for nearly 90% of the Saami mtDNAs (Achilli et al., 2005). The coalescent age for
U5b1b is 8.6 ±2.4KY, not dissimilar to the coalescent age of the popular H
subhaplogroups H1 and H3 and also haplogroup V itself (Achilli et al., 2005). The
identification of the lineage U5b1b links the Berbers (and the African tribes, such as the
Fulbe who are known to have mixed with the Berbers) and Europeans who have
contributed their H1, H3, U5b1b and V lineages to the populations of northern Africa
during the LGM (Achilli et al., 2005).
The populations within the Caucasus region have recently been studied extensively as
this region accommodates a number of populations which speak a variety of languages
from different language families. Nasidze & Stoneking (2001) identified that Caucasian
populations were more similar to their geographical neighbours despite the language
differences than to populations who share a language family but are not geographically
local. A Neighbor-Joining tree also illustrated a close relationship between the
Azerbaijanis, Armenians and Chechenians who are all south Caucasian populations but
who speak an Altaic, Indo-European and North Caucasian language respectively. The
Caucasian populations have also been identified as an intermediate between West Asian
and European populations (Nasidze et al., 2004a) while it was found that two Iranian
groups (Tehran and Isfahan) are close to the Caucasian groups of the Avarians and
Rutulians. Elsewhere, Iranians have been observed to lie within an intermediate position
between Caucasian and East Asian populations (Shepard & Herrera, 2006). The Ossetian
groups are also fairly similar to one another despite being found on both the northern and
southern slopes of the Caucasian mountains, possibly indicating a common origin
(Nasidze et al., 2004b). They speak a language that belongs to the Iranian branch of
languages but are surrounded by Caucasian-speaking groups. The Kurdish groups from
the region; Kurmanji speakers from Georgia and Turkey, Zazaki speakers from Turkey,
and Kurds from eastern Turkey, Iran and Turkmenistan, are genetically similar to one
another despite the linguistic and geographical differences (Nasidze et al., 2005a). The
Kurdish groups are also more similar to West Asian and European populations than to
Caucasian and Central Asian groups (Nasidze et al., 2005a).
The Kalmyks are an ethnic group that resides along the lower Volga River, Russia, and
is believed to have Mongolian ancestry. They have a frequency of the COII-tRNAlys
9bp deletion of ~7% - similar to the frequencies exhibited in the Korean, Mongolian and
Buryat populations (Nasidze et al., 2005b). Since this deletion is at low frequencies
among eastern Europeans, a Mongolian ancestry is supported. The Kalmyks also share
some similarity with local Russian populations indicating a more recent maternal
admixture (Nasidze et al., 2005b).
The Gagauz are a linguistic enclave; a Turkic-speaking group that originated in Turkey
before migrating to their current location in Moldova, surrounded by Indo-European-speaking
populations. The Moldovan population are similar to Europeans while the
Gagauz are an intermediate between Europeans and Caucasians; they are more similar to
Moldovans than to their Turkish ancestry (Nasidze et al., 2007).
Recently, there has been greater focus on Indian and Asian populations due to their rich
anthropological history. It is thought that the AMH expansion from Africa ~85Kya
consisted of a small group of 500-2,000 women, which would explain why only two sub-lineages
(super-haplogroups M and N) have emerged from Africa (Forster & Matsumura,
2005). In India, the most common lineage is haplogroup M; it is ubiquitous and
contributes to >70% of Indian mtDNAs (Chandrasekar et al., 2009). The lineage is also
common among south Indian tribes and Caste populations and accounted for all but three
lineages among the Chenchus (Kivisild et al., 2003). Haplogroup M is also found at high
frequencies among the populations inhabiting the region along the southern coast of
Pakistan and northwest India; 30-55% (Quintana-Murci et al., 2004). Meanwhile, the
frequency of haplogroup M is low or absent west of the Indus Valley and low among
Central Asian populations (<12%) (Quintana-Murci et al., 2004). Haplogroup U was
identified as the second largest contributor (Kivisild et al., 1999) and is also the second
most frequent haplogroup in Europe, however the subhaplogroups differ between the two
regions; subgroup U5 in Europe and U2 in India. The distribution of haplogroup U is
similar to ‘M’ in Asia, focussing more within the Indo-Pakistan region (Quintana-Murci
et al., 2004). The Indian U2 genotype differs from the west Asian U2 in that the latter
also contains a transversion at np 16129 that is absent within the Indian U2 lineage
(Kivisild et al., 1999). The age of the split of the two genotypes is 53 ±4 KY while the
European U5 lineage also has a similar age (Kivisild et al., 1999). Frequencies of west
Asian haplogroups are lower in the Indian population than in Central Asia and the
Caucasus, with the exception of haplogroup W, where the frequency is greater in India
than in both Central Asia and the Caucasus, and the Uk-group, where the frequency is greater
than in Central Asia but not the Caucasus (Table 1.1) (Kivisild et al., 1999).
Table 1.1: Haplogroup frequencies of west Asian haplogroups in India, Central Asia and the Caucasus
(Kivisild et al., 1999).
Haplogroup   India    Caucasus (Armenia & Georgia)   Central Asia
H            1.8%     24.8%                          14%
I            0.7%     1.8%                           1%
J            0.5%     6.7%                           2.5%
K            0.2%     8.2%                           0.5%
T            1.8%     11.8%                          3.5%
Uk-group     13.1%    21.2%                          8%
W            2.2%     0.9%                           1%
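The comparison drawn from Table 1.1 can be checked programmatically. The sketch below transcribes the frequencies from the table and reports the haplogroups for which the Indian frequency exceeds that of another region; the dictionary layout and the function name are hypothetical and serve only as an illustration.

```python
# Frequencies (%) transcribed from Table 1.1 (Kivisild et al., 1999).
TABLE_1_1 = {
    "H": {"India": 1.8, "Caucasus": 24.8, "Central Asia": 14.0},
    "I": {"India": 0.7, "Caucasus": 1.8, "Central Asia": 1.0},
    "J": {"India": 0.5, "Caucasus": 6.7, "Central Asia": 2.5},
    "K": {"India": 0.2, "Caucasus": 8.2, "Central Asia": 0.5},
    "T": {"India": 1.8, "Caucasus": 11.8, "Central Asia": 3.5},
    "Uk-group": {"India": 13.1, "Caucasus": 21.2, "Central Asia": 8.0},
    "W": {"India": 2.2, "Caucasus": 0.9, "Central Asia": 1.0},
}


def regions_exceeded_by_india(table):
    """Map each haplogroup to the regions where India's frequency is higher."""
    exceptions = {}
    for haplogroup, frequencies in table.items():
        exceeded = [
            region
            for region in ("Caucasus", "Central Asia")
            if frequencies["India"] > frequencies[region]
        ]
        if exceeded:
            exceptions[haplogroup] = exceeded
    return exceptions


print(regions_exceeded_by_india(TABLE_1_1))
```

Only the Uk-group (relative to Central Asia) and haplogroup W (relative to both regions) are returned, in agreement with the exceptions described in the text.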
India has been identified as a major region in the peopling of southeast Asia and
Australia as part of the ‘Southern Route’ migration from Africa (Kivisild et al., 1999;
Chandrasekar et al., 2009; Kumar et al., 2009). The southern route migration from Africa
was an essential component of the rapid peopling and settlements of southern Asia and
Australia certainly by ~46Kya (Forster & Matsumura, 2005; Macaulay et al., 2005;
Hudjashov et al., 2007). Chandrasekar et al. (2009) branded India a site of initial
settlement of AMHs following the exodus from Africa, and perhaps it was during this
period that the divergence of haplogroup U occurred. It was also found that populations
from the Andaman Islands and Australians have ancestral maternal roots in India
(Chandrasekar et al., 2009). Some individuals from Central Dravidian and Austro-Asiatic
tribes share two basal synonymous mtDNA polymorphisms within the M42 haplogroup
(G8251A & A9156T) which are specific to Australian Aborigines (Kumar et al., 2009).
The shared mtDNA lineage provides direct genetic evidence that Australia was populated
by AMHs through South Asia via the ‘Southern Route’ (Kumar et al., 2009). Kumar et
al. (2009) suggested an early colonisation of Australia dating to ~60 KYA, which
appears to be consistent with archaeological data.
A study on the Iraqi population (Al-Zahery et al., 2003) identified a similarity with
Iranian and other Middle Eastern populations, but dissimilarity to Arabians. The west
Asian haplogroups (HV, H, V, J, T, K, Uk-group, I, X and W) total 77.9% of the Iraqi
mtDNAs, similar to the frequencies seen in Iran (80.4%) and Syria (75.1%). These
populations have frequencies that resemble European populations (>90%) more than they
do Arabian populations (60.4%) (Al-Zahery et al., 2003). Iraqis can also be grouped with
other west Asian populations (Lebanese, Turkish and Syrian) based upon Y-Chromosome
data. These four populations have a high frequency of haplogroup J
(41.9%-58.3%); other abundant haplogroups include R and E. In total, these three
haplogroups amass a total frequency of 76.4% (Turks), 83.8% (Lebanese), 87.8% (Iraqis)
and 90% (Syrians) (Al-Zahery et al., 2003).
Within Eurasian populations, the sub-Saharan African lineages L1, L2 and L3, are absent
aside from the Makrani of southern Pakistan where they are present in high frequencies –
39.4% (Quintana-Murci et al., 2004). Eastern Eurasian lineages are represented by the
haplogroups A, B, F and N9a (from the macrohaplogroup N) and C, D, G and Z (from
macrohaplogroup M) (Quintana-Murci et al., 2004). The latter are widespread among
northern and eastern Asians and to a lesser extent among Central Asians. The highest
frequencies of these lineages were found among Central Asian populations – Turkmens
(37%) and Uzbeks (31%), however Turkmen Kurds only exhibited a frequency of 9%
(Quintana-Murci et al., 2004), which supports the findings by Nasidze et al. (2005a) that
Kurdish groups are more similar to one another (and west Asians) than to their
geographical neighbours. These lineages are absent or at low frequencies among
populations in the Anatolia and Caucasus region, Iranian plateau and the Indus Valley,
again with the exception of a population from Pakistan: the Hazaras with a frequency of
35% (Quintana-Murci et al., 2004). The western Eurasian lineages (haplogroups HV, T,
J, Uk-group, I, W & X) exhibit a pattern contrary to eastern Eurasian groups; the
populations with greater frequencies are found within Anatolia and the Caucasus and the
Iranian plateau (Quintana-Murci et al., 2004).
In Uzbekistan, populations can be split into two distinct groups; those with Uzbek
ancestry and those that have ancestry from a neighbouring country. Western and eastern
Eurasian haplogroups dominate the populations with Uzbek ancestry (Karakalpakstan,
Khorezm, Qashkadayra, Tashkent and Fergana), with a minor South Asian contribution
(Irwin et al., 2009b). There is also an even smaller African lineage contribution found
within the western-most populations (Karakalpakstan and Khorezm), which is absent
among the other Uzbek groups (Irwin et al., 2009b). The populations with ‘foreign’
ancestry include those of Russian, Kazakhstani, Tajiki, Turkmen and Afghani heritage.
The mtDNA composition for the group with Russian ancestry is completely dominated
by western Eurasian lineages (>90%) while also consisting of a minor South Asian
contribution and an even smaller input of eastern Eurasian lineages. The populations with
Kazakhstani, Tajik and Turkmen heritage all have the majority of their mtDNA
composition made up of eastern and western Eurasian haplogroups with a minor South
Asian contribution, which was very minor within the Turkmen group. Finally, the group
with Afghan ancestry has mtDNA genomes dominated by western Eurasian
haplogroups (~75%) followed by moderate eastern Eurasian lineages and a very small
South Asian contribution (Irwin et al., 2009b).
1.3.2
Y-Chromosome Variation
The Y-Chromosome is ~60Mb in length, is inherited unilaterally via the paternal line
(Jobling, Hurles & Tyler-Smith, 2004) and is a valuable genetic tool by providing the
male-driven demographic history. The chromosome consists of a short arm and a long
arm, and does undergo recombination with the X chromosome. However, more than 90% of the Y-Chromosome does not participate in recombination with conserved regions of the X; this
region is known as the non-recombining portion of the Y-Chromosome (NRPY or NPY).
The regions of the Y-Chromosome which do recombine are known as the
pseudoautosomal regions (PARs), which are located at the tips of both arms. PAR1 is
located at the tip of the short arm and is 2.6 Mb in length, while PAR2 is found at the tip of
the long arm but is much smaller at 0.32 Mb (Jobling, Hurles & Tyler-Smith, 2004).
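As a rough check on the proportions quoted above, the following sketch computes the share of the chromosome lying outside the two pseudoautosomal regions, using only the approximate lengths given in the text (~60 Mb in total, 2.6 Mb for PAR1 and 0.32 Mb for PAR2). The function and variable names are illustrative, and the calculation assumes, as a simplification, that everything outside the PARs is non-recombining.

```python
# Approximate lengths in megabases (Mb), as quoted in the text.
Y_LENGTH_MB = 60.0
PAR1_MB = 2.6
PAR2_MB = 0.32


def non_recombining_fraction(total_mb, par_lengths_mb):
    """Fraction of the chromosome lying outside the pseudoautosomal regions."""
    return (total_mb - sum(par_lengths_mb)) / total_mb


# Simplifying assumption: only PAR1 and PAR2 recombine with the X chromosome.
nry_fraction = non_recombining_fraction(Y_LENGTH_MB, [PAR1_MB, PAR2_MB])
print(f"Non-recombining share of the Y-Chromosome: {nry_fraction:.1%}")
```

Under these figures the two PARs together cover under 3 Mb, so the result is consistent with the statement that more than 90% of the chromosome does not recombine.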
When analysing the variation of the Y-Chromosome, there are two main methods: the bi-allelic method and the multi-allelic method. The bi-allelic method identifies SNPs along
the chromosome, and is used for the assignment of haplogroups. The mutation rate of the
bi-allelic markers is ~10^-8 per generation (Butler, 2005). The haplogroup tree (Figure 1.6)
was developed by the Y-Chromosome Consortium (YCC) (Jobling & Tyler-Smith,
2003). The multi-allelic method enables a greater resolution of the Y-Chromosome, and
generates the haplotypic profile. There are >200 Y-STR markers (Butler, 2005). Unlike
in mitochondrial DNA nomenclature, the most ancestral Y-Chromosome haplogroups are
identified by letters at the beginning of the alphabet: A-M91 and B-M60 for African
haplogroups (instead of L for mtDNA). The most common haplogroup found among
Caucasian Europeans is R1b-P25 (Butler, 2005). Haplogroups L-M20, H-M69 and R2a-M124 are typically found among South Asian populations, particularly among Indians,
while haplogroups R1a1a-M17 and J2-M172 are both west Eurasian lineages (Haber et
al., 2012, also Chapter 7). A common East Asian lineage is haplogroup C3, which is
thought to represent the lineage of Genghis Khan (Zerjal et al., 2003; McElreavey &
Quintana-Murci, 2005).
Figure 1.6: Simplified Y-Chromosome Consortium (YCC) haplogroup tree.
The Y-Chromosome is a valuable genetic tool as it is able to provide the male-driven
demographic history such as migrations and invasions. Y-Chromosome data may differ
slightly from the data observed from mtDNA, particularly because
women were not at the forefront of the invasions and expansions of the various empires
throughout history. Y-Chromosome analysis can also provide data on a population
that mitochondrial DNA cannot, in that it can be used to infer population structure using patrilineal
surnames (Sykes & Irven, 2000; Jobling, 2001) and also language (Forster & Renfrew,
2011). The language spoken by a population can be driven by as little as 10% of the Y-Chromosomes. Some tribes of the Indian subcontinent, such as the Munda, speak
Austroasiatic languages typical of East Asian populations; while mtDNA analysis
predominantly presents South Asian haplogroups, immigrant East Asian haplogroups of
the Y-Chromosome are observed, suggesting that the paternal lineages established the shift in
language spoken (Forster & Renfrew, 2011).
There is an increasing amount of Y-Chromosome data available from populations and
sub-populations around the world. The Korean population, based upon Y-Chromosome
data, appear to share close relationships with both northeast and southeast Asian
populations, while the mtDNA evidence, despite the 30% of the genome that can be
attributed to a south Asian origin, shares a greater similarity with northeast Asian groups
(Jin et al., 2009). The Xibe, a population now residing in north-western China, lie in an
intermediary position between the main cluster of north-western and north-eastern
populations, but are closely related to (in a minor cluster) the Manchurian and Hezhe
groups which are both located in northeast Asia (Powell et al., 2007). This indicates the
Xibe have not lost their north-eastern heritage but have perhaps begun to integrate more
with their local populations.
Caucasian populations are similar to west Asian groups; in particular, the Lebanese
population can be placed among the Caucasian populations based upon Y-Chromosome
pairwise FST values; in addition, the Abkhazian group from west Georgia lies between
Iranian groups (Nasidze et al., 2004a). Meanwhile, religious groups of Lebanon present
Y-haplogroup frequencies that may be attributed to their religious origins; haplogroup J-M172 was most frequent among the Maronites (a division of the Roman Catholic Church),
J-M267 among Muslims and E-M35 among Greek Orthodox (Haber et al., 2010). The
differentiation between the different groups however, has been observed to have been
established before the adoption of the major religions within the region, but the
subsequent religious adoption has reinforced isolation of these groups (Haber et al.,
2010). The Ossetian groups of the Caucasus illustrate a north/south divide; the northern
groups are more similar to one another, while the southern groups, which live along the
southern slopes of the Caucasus Mountains, are more similar to other south Ossetian
groups than they are to each other (Nasidze et al., 2004b). The most frequent haplogroup
among the north Ossetians was haplogroup G, while haplogroup F was common within
the south Ossetians. In addition, haplogroup E was exclusively found within the south
Ossetian Y-Chromosomes (Nasidze et al., 2004b). The mtDNA analysis indicated a
common origin, supported by the groups speaking Iranian-related languages; however, the
Y-Chromosome data suggests that any common paternal origin may have been lost based
upon the haplogroup differences exhibited. Both north and south Ossetians are closely
related to the neighbouring Caucasian groups but not so much to one another (Nasidze et
al., 2004b), indicating greater paternal admixture from these Caucasian groups into the
Ossetian population. The Kurdish populations from the Caucasus region are more closely
related to west Asian and Caucasian groups, unlike the mtDNA data which revealed a
closer affinity with Europeans (Nasidze et al., 2005a). The Kurmanji and Zazaki speakers
from Turkey are very close to west Asian and Iranian groups, while the Kurmanji
speakers from Georgia lie in an intermediate position between two north Caucasian
groups; the north Ossetians from Ardon and Darginians (Nasidze et al., 2005a).
The most frequent haplogroups displayed among the Kalmyks from south-western Russia
are C, C3c, K and P. Haplogroup C can be commonly found within both Central Asian
and Mongolian populations, but is absent among eastern Europeans. It has been touted as
the possible genetic lineage associated with Genghis Khan (McElreavey & Quintana-Murci, 2005). The C3c lineage is found not only among the Mongolian population but also
among the Kazakhs (Nasidze et al., 2005b). Haplogroup K can often be found within the
populations of this region (eastern Europeans, Central and East Asians) while haplogroup
P is absent or present at low frequencies among these groups (Nasidze et al., 2005b). The
common eastern European lineage N3 was found to be present in just one Kalmyk
sample, indicating that despite their geographical position, the Kalmyks have not
intermingled with their eastern European or Caucasian neighbours, which substantiates the
Mongolian ancestral claim (Nasidze et al., 2005b). Within western Russian groups, the
haplogroup with the greatest frequency is R1a1. Other major haplogroups are I and N3;
together, these three lineages contribute 73.8-93.9% of the Y-Chromosomes among the
sub-populations (Fechner et al., 2008). Haplogroup R1a1 is commonly found in Eastern
Europe and the Volga-Ural region; the frequencies are greater toward south-western
Russia and the Caucasus. The R1a1 lineage is found at low frequencies within Western
Europe (Fechner et al., 2008). Haplogroup N3 is often found in Eurasia, Northern and
Eastern Europe, and the Volga-Ural region. Frequencies are greater in the Volga-Ural
region than they are in Eastern Europe (Fechner et al., 2008). Generally, the European
Russian Y-Chromosome composition is most similar to Eastern European and Volga-Ural groups (Fechner et al., 2008).
The Gagauz of Moldova can be grouped with west Asian populations such as the
Lebanese, Syrians, Kurds and Iranians from Isfahan and then the south Caucasian
groups, the Armenians and Azerbaijanis, rather than with Moldovans and Europeans,
which are fairly close to one another, or with Turks and eastern Europeans (Nasidze et
al., 2007). However, they do share a greater genetic relationship with their geographical
neighbours than they do with the populations which share their linguistic heritage
(Nasidze et al., 2007).
The Gilaki and Mazandarani populations from the South Caspian region of Iran are,
according to Y-chromosome analysis, most similar to the Caucasian groups (Azerbaijani and
Armenian populations), followed by west Asians, whose region they now inhabit.
Haplogroup J2 and R1 were both found in high frequencies among both populations, and
both account for >50% of Gilaki and Mazandarani Y-Chromosomes (Nasidze et al.,
2006). Both populations indicate a potential paternal origin in the south Caucasus region
before migration and integration into the South Caspian region. In contrast, the mtDNA
data suggest a greater similarity with west Asians than with Caucasian and European
groups, and the populations therefore resemble their geographical and linguistic neighbours (Nasidze et
al., 2006).
The Afghan population represents great ethnic, linguistic and cultural diversity (Lacau et
al., 2011). A recent study found a Greek contribution to the Pashtun ethnic group, which
lives near the Pakistani border. The Most Recent Common Ancestor (MRCA) between a
Pashtun and three Greek males ‘coincides with the time period’ in which Alexander the
Great invaded and occupied Persia. This genetic link was identified via the E-M78
lineage (Lacau et al., 2011). Twenty-two haplotypes were ascertained among the Afghani
population; eight were found in both northern and southern Afghanistan, while the
remaining fourteen were exclusively found among southern Afghans (Lacau et al., 2011).
The two regions are genetically distinct from one another; one possible explanation is
that the Hindu Kush mountain range serves as a natural barrier between the
populations in northern and southern Afghanistan, preventing admixture between
them (Lacau et al., 2011). Previous studies of the human Y-Chromosome have included
the Pakistani population, which also included the Afghani Baluch, Hazara and Pashtun
populations (Qamar et al., 2002) and also populations of Central Asia (Heyer et al.,
2009). Both studies indicate that the populations share a common ancestry despite
differences in ethnicity. The study of Pakistani populations (Qamar et al., 2002)
identified that all populations exhibited a similar Y-haplogroup diversity which clustered
with South Asian groups, while the study of Turkmenistan, Uzbekistan, Kazakhstan,
Kyrgyzstan, and Tajikistan populations (Heyer et al., 2009) identified greater variation
and diversity within the populations rather than among them (Haber et al., 2012).
Another recent study on the Y-STR data of four Afghani populations (Hazara, Pashtuns,
Tajiks and Uzbeks) identified the presence of 32 Y-Chromosome haplogroups among
these four ethnic groups (Haber et al., 2012). The west Eurasian lineage, haplogroup
R1a1a-M17, was identified at greater frequencies among the Pashtuns (51.02%) and the
Tajiks (30.36%) than among the Uzbeks (17.65%) and Hazaras (6.67%). Meanwhile,
haplogroup C3-M217 exhibits an inverse pattern in that it is more abundant among the
Hazaras (33.33%) and Uzbeks (41.18%) than among the Tajiks (3.57%) and Pashtuns (2.04%)
(Haber et al., 2012). Y-haplogroup C3c has been found in 8% of males across sixteen
populations from northeast China to Uzbekistan and has an MRCA dated to ~1,000 years ago
(95% CI ~700-1,300 years) in Mongolia (Zerjal et al., 2003). This date corresponds with
the expansion of the Mongol dynasty of Genghis Khan.
1.4 Aims
Afghanistan lies in a region of Central Asia that was once a crossroads for the major
trade routes and migrations, and now exhibits a diversity of ethnic groups. The
main aim of this study is to identify the composition and distribution of maternally
inherited haplogroups via mtDNA of four Afghani ethnic groups. We also look to
identify whether each ethnic group’s beliefs about its own origin are supported by the
mtDNA analysis. Additionally, we aim to determine whether the Afghani populations share
similar characteristics with adjacent populations through sequencing of the HVS-I
region, and whether demographic processes have led to the emergence of any of these ethnic
groups.
Chapter Two
Afghanistan
2. Afghanistan
The name Afghanistan is of Indo-Iranian origin, meaning ‘Land of Afghans’; the suffix -istan
originates from Persian, meaning ‘country’, which itself derives from the Indo-Iranian
‘stanam’, meaning place or where one stands, and this word derives from Proto-Indo-European sta-no-, meaning “to stand” (Harper, 2010). The name Afghan was initially
used by the Pashtun ethnic group as a name for themselves, and was first noted in 1030
AD (Harper, 2010). The flag of Afghanistan consists of three vertical stripes (left-right)
of black, red and green. In the centre of the flag lies the national emblem of Afghanistan
which features a mosque, surrounded by sheaves of wheat, and a scroll scribed with the
word ‘Afghanistan’. Below the mosque are numerals for the solar year 1298 (year 1919
on the Gregorian calendar) to highlight Afghanistan’s independence from UK influence
(CIA, 2010); while above is an Arabic inscription of the Shahada (Muslim creed), rays of
the rising sun, and the Takbir, an Arabic expression meaning ‘God is great’ (CIA, 2010).
2.1 Geography
The Islamic Republic of Afghanistan is the 41st largest country in the world, with an area
of 652,230 km² (CIA, 2010) and, by comparison, is slightly larger than France.
Afghanistan is a land-locked country situated in Central Asia, largely known for its
constant involvement in both international and civil conflict. Afghanistan shares land
borders with Iran to the west (~950 km), Turkmenistan (~750 km), Uzbekistan (<150 km)
and Tajikistan (~1,200 km) to the north, Pakistan (~2,500 km) to the south and east, and
a very small border with China (<80 km) in the far north-east (CIA, 2010). Its location
links it to three major cultural and geographical regions: the Indian subcontinent
to its southeast, Central Asia to its north and Iran to its west (Barfield, 2010).
Afghanistan has thirty-four states or provinces (Figure 2.2); these are listed in Table
2.3 with their associated population estimates (Islamic Republic of Afghanistan Central
Statistics Organization (CSO), 2010). The capital and largest city is Kabul, located in
eastern Afghanistan, while other large cities include Kandahar (south), Herat (west),
Mazar-e Sharif (north) and Jalalabad (east of Kabul).
Afghanistan is generally split into three regions; the Central Highlands, the northern
plains and the south-western plateau. The region of the Central Highlands incorporates
the Hindu Kush Mountains and its sub-ranges. The landscape here is rugged with deep
valleys between the high peaks. The northern plains contain the most fertile land in
Afghanistan and as a result are the most agricultural region. Extending from the Iranian
border in the west to the Pamir Mountains, this region covers approximately
100,000 km². The northern plains are a densely populated region (“Afghanistan”, 2011)
approximately 600 metres above sea level. The south-western plateau is “a region of high
plateaus, sandy deserts and semi-deserts” (“Afghanistan”, 2011) covering some 130,000
km², ~900 metres above sea level. It is a semi-arid region that includes the Registan and
Margo deserts as well as the Helmand River.
Figure 2.1: Political map of Afghanistan.
Afghanistan has a varied landscape: dry plains in the north and also the south and
southwest, while the Hindu Kush mountain range stretches across the land from the
northeast toward the southwest, covering most of the country. The highest peak in
Afghanistan, Noshaq (also Nowshak), lies within the Hindu Kush mountains, rising to an
impressive 7,485 metres (~24,500 feet), and is located near the north-eastern border with
Pakistan. Noshaq is the second highest peak within the Hindu Kush mountain range
behind Tirich Mir (7,708 metres/~25,300 feet), which lies within the Chitral region of
north-western Pakistan. The Hindu Kush is a sub-range mountain system belonging to
the Himalayas (Lacau et al., 2011). The Hindu Kush, which stretches for approximately
600 miles, forms the western tip of the Pamir Mountains, Karakorum Mountains, and the
Himalayan mountain range. The height of the Hindu Kush Mountains decreases as they
stretch westward across Afghanistan. In truth, the westernmost mountains of
the Hindu Kush system are not the Hindu Kush Mountains proper, but a number of
smaller ranges which extend out toward Herat (Barfield, 2010). These include the Koh-i
Baba (west of Kabul), Koh-i Hisar (west of the Koh-i Baba), Safed Koh (Paropamisus) and
Siah Koh (both near Herat) and Chalap Dalan (southeast of Herat), all of which are sub-ranges of
the Hindu Kush. The Torkestan Mountain sub-range extends northwest, while the Siah
Koh extends northward and the Malmand and Khakbad southwest.
Figure 2.2: The 34 provinces of Afghanistan and its location in Central Asia (ISAF;
http://www.isaf.nato.int/map-usfora/index.php); inset: Afghanistan’s capital, Kabul, and the
surrounding provinces.
Afghanistan is home to several river systems, all of which rise in the mountains, but
only one ever reaches the ocean: the Kabul River, a tributary of the Indus River. The
Murghab River is not confined within Afghanistan’s borders, flowing from the Koh-i
Hisar into south-eastern Turkmenistan. The Helmand River rises in the Koh-i Baba
and flows north of Registan and through the Dasht-i Margo, before pooling into the Hamun
Lakes. The Khash, Harut, and Farah rivers also join the Helmand River in pooling into
the Hamun Lakes, a group of three lakes (Hamun-e Helmand, Hamun-e Puzak and
Hamun-e Sabari) in eastern Iran and south-western Afghanistan. These lakes are present
seasonally and are salt-rich. Another significant lake is the Ab-i Istada, south of Ghazni,
into which the River Tarnak flows. Some rivers in Afghanistan may only be present
seasonally and often dry out before reaching the basin of another river (Barfield,
2010). Afghanistan’s northern border can be identified by the flow of the Amu Darya
River (formerly the Oxus River) as it runs along the territory line for ~1,000 km,
separating Afghanistan from Tajikistan and Uzbekistan.
The south of Afghanistan is largely desert land, comprising the Dasht-i Margo and the Registan desert.
Dasht-i is the Persian/Pashto word for ‘plain’ or ‘desert’, while Margo translates as
‘death’ or ‘dead’; the Dasht-i Margo is therefore known as the desert of death. The land
in this desert is primarily rocky-clay and sand mounds with salt marshes. The Dasht-i
Margo lies approximately 900 metres above sea level within the Nimruz and Helmand
provinces and spans an area of ~150,000 km². The Registan desert lies within the Helmand
and Kandahar provinces and is primarily a sand-based desert.
2.2 Climate
Afghanistan has a varied climate which, unsurprisingly, differs from one region to
another. Generally, Afghanistan has cold winters and very hot summers (CIA, 2010;
“Weather & Climate in Afghanistan”, 2011; Petrov & Weinbaum, 2011). The mountains
of the northeast have dry, cold winters, while the mountainous region near the Pakistan
border receives some of the wetter weather systems contributed by monsoons on the
Indian subcontinent. It is in this region in eastern Afghanistan where the most rainfall
occurs, largely due to the positions of the mountains; and it is here that Afghanistan’s
only natural forests can be found (Barfield, 2010). In the southwest, daytime
temperatures can reach as high as 35°C, while in Jalalabad temperatures of 49°C have
been recorded (Petrov & Weinbaum, 2011). In the mountains, January temperatures can
be -15°C or below, while -24°C has been recorded in Kabul (Petrov & Weinbaum, 2011).
During the period of June to October, Afghanistan receives very little rainfall, while most
of the country’s precipitation occurs between December and April. Snow falls in the
highlands from December to March. Afghanistan receives an average of 316 mm of
rainfall per annum; on average, September is the driest month of the year while March is
the wettest (“Afghanistan Climate”, n.d.). Kabul has an average relative humidity of
56.4%; February is the most humid month (77%) while August is the least (33%)
(“Afghanistan Climate”, n.d.).
2.3 Population
Afghanistan has never had a completed census of the population; the first attempt in
1979 was interrupted by the Soviet invasion and conflict. As a consequence, obtaining
accurate population data for the country is arduous. A further census was
scheduled for 2008; however, this was postponed for a further two years (Reuters, 2008;
UN, 2008) and is now scheduled to run from 2011 through to 2013, with data to be collected
and supplied one province at a time (UN Statistics Division, 2010). Recent estimates
put the Afghan population at approximately 25 million (CSO, 2010), 28.1 million
(BBC, 2010) or 29.1 million (CIA, 2010; United Nations Population Fund (UNFPA), 2010),
with a forecast of 37 million by 2015 (UN, 2002), and a life expectancy of 44 years for both
men and women (BBC, 2010; CIA, 2010; UNFPA, 2010). Afghanistan is widely known
to be an Islamic nation, with population estimates of Sunni Muslims to be ~80%, Shi’a
Muslims ~19% and all other religions ~1% (CIA, 2010) including Buddhists, Hindus and
Sikhs (Nielson, 2010).
Table 2.1: Afghanistan population estimates every five years since 1950 (UNPD, 2009)
Of the estimated population, 51.79% are male (15,079,000) and 48.21% are
female (14,038,000) (United Nations Population Division (UNPD), 2009) while the
average population density of Afghanistan is 45 people per km² (UNPD, 2009). Based
on the estimated population figures as seen in Table 2.1, the Afghan population has seen
an overall population growth of 27.99% since 1950 (UNPD, 2009) and 2010 will see an
estimated population growth of 3.45% (UNPD, 2009) or 2.47% (CIA, 2010). The
population density of Afghanistan is shown in Figure 2.3.
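The sex shares and the population density quoted above can be re-derived from the figures given in the text. The sketch below is only an arithmetic check: it assumes the land area of 652,230 km² quoted in Section 2.1 (CIA, 2010) is the denominator behind the UNPD density figure, which the source does not state explicitly, and the variable names are illustrative.

```python
# Figures quoted in the text (UNPD, 2009).
MALE_POPULATION = 15_079_000
FEMALE_POPULATION = 14_038_000

# Assumption: the area quoted in Section 2.1 (CIA, 2010) is used for density.
AREA_KM2 = 652_230

total_population = MALE_POPULATION + FEMALE_POPULATION
male_share = MALE_POPULATION / total_population
female_share = FEMALE_POPULATION / total_population
density_per_km2 = total_population / AREA_KM2

print(f"Total population: {total_population:,}")
print(f"Male share:       {male_share:.2%}")
print(f"Female share:     {female_share:.2%}")
print(f"Density:          {density_per_km2:.1f} people per km2")
```

Under that assumption the shares agree with the percentages quoted above, and the density rounds to the stated 45 people per km².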
Table 2.2: Population growth rate every 5 Years since 1950 based on the Estimated Population of
Afghanistan (UNPD, 2009)
Figure 2.3: The Population density of Afghanistan.
2.4 Ethnicity & Language
Afghanistan is an ethnically diverse country with several different peoples inhabiting the
varied landscape (Figure 2.4). Pashtuns inhabit large areas of Afghanistan, mostly in the
south, while Tajiks are common in the north and the Hazara in a more central region.
Estimates of the population show a variety of ethnic groups: Pashtun (42%), Tajik (27%),
Hazara (9%), Uzbek (9%), Aimaq (4%), Turkmen (3%), Baloch (2%) and the remaining
4% representing all other ethnicities (CIA, 2010). Alternatively, Pashtuns make up
approximately 40% of the population; Tajiks ~25%; Hazara, ~20%; Uzbeks, ~5%;
Aimaqs, ~5% and Turkmen <5% (Weinbaum, 2011). Despite the slight difference in
estimates, both sources identify Pashtuns as the major ethnic group in Afghanistan,
followed by the Tajiks, Hazara and Uzbeks. This difference can also be put down to the
lack of official population statistical data which a national census would provide. As a
consequence of ethnic diversity, it is not unexpected to find that there is also a diverse
range of languages spoken in Afghanistan (Figure 2.5).
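To make the comparison between the two sets of estimates concrete, the sketch below tabulates the percentage-point gap for the groups that both sources report with a figure. It uses only the values quoted above; the Weinbaum (2011) values are approximate, Turkmen is omitted because that source gives only an upper bound (<5%), and the dictionary and variable names are illustrative.

```python
# Percentage estimates quoted in the text for groups reported by both sources.
cia_2010 = {"Pashtun": 42, "Tajik": 27, "Hazara": 9, "Uzbek": 9, "Aimaq": 4}
# Weinbaum (2011) figures are approximate ("~") in the text.
weinbaum_2011 = {"Pashtun": 40, "Tajik": 25, "Hazara": 20, "Uzbek": 5, "Aimaq": 5}

# Absolute gap, in percentage points, between the two sources for each group.
gaps = {group: abs(cia_2010[group] - weinbaum_2011[group]) for group in cia_2010}
largest_gap_group = max(gaps, key=gaps.get)

# Rank the groups from largest to smallest share under each source.
cia_ranking = sorted(cia_2010, key=cia_2010.get, reverse=True)
weinbaum_ranking = sorted(weinbaum_2011, key=weinbaum_2011.get, reverse=True)

for group, gap in gaps.items():
    print(f"{group:<8} CIA {cia_2010[group]:>2}%  Weinbaum ~{weinbaum_2011[group]:>2}%  gap {gap:>2}")
print("Largest gap:", largest_gap_group)
```

Both rankings place Pashtuns first and Tajiks second; the largest gap between the two sources is in the Hazara estimate.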
Figure 2.4: Distribution of Afghan ethnic groups.
Although the Pashtuns are the dominant ethnic group in Afghanistan, only ~35% of the
population speak their language (CIA, 2010) while approximately 50% speak Dari, the
Afghan dialect of Persian (CIA, 2010; Weinbaum, 2011). These Indo-European
languages are both official languages of Afghanistan. Other languages spoken include
Turkic languages - 11% (mostly Uzbek and Turkmen), Arabic, Indo-European languages
and other variations of Persian - 4% (CIA, 2010; Lewis, 2009). Ethnic groups in
Afghanistan are not generally defined by the language they speak; bilingualism is
widespread as non-Pashtuns will also speak Pashto, and Pashtuns may speak Dari (or a
variant thereof) (CIA, 2010; Weinbaum, 2011).
Most of the languages spoken in Afghanistan are Proto-Indo-European in origin (Lewis,
2009). Many of the languages spoken around the world are descendants of this language
group such as the Germanic languages (German and English), the Romance languages
(French, Italian, Latin and Spanish), Baltic and Slavic languages and the Indo-Aryan and
Indo-Iranian languages, including Hindi and Farsi (Figure 2.6).
Figure 2.5: Distribution of language groups spoken in Afghanistan (Retrieved from Nielson, 2010).
According to Lewis (2009) there are over thirty Indo-Iranian (sub-branch of Indo-European) languages (not including their dialects) spoken in Afghanistan, most falling
on the Indo-Aryan branch and the others on the Iranian branch. Nuristani and Pashayi
are two languages on the Indo-Aryan branch that stand out, while on the Iranian branch
Kurdish, Balochi, Pashto, Munji, Dari, Aimaq and Hazaragi are notable examples. There
is only one Dravidian language spoken in Afghanistan; Brahui, in Kandahar province of
south-eastern Afghanistan which borders Pakistan (Lewis, 2009). The Dravidian
languages are typically found deep in India and regions east of India. Equally, there is
one Afro-Asiatic language, Arabic, from the Semitic sub-branch, which is
spoken by small communities in northern Afghanistan (Lewis, 2009). The
remaining languages belong to the Altaic group, including Uzbek, Turkmen and Kyrgyz
from the Turkic branch; and Mogholi from the Mongolic branch. Mogholi is spoken by a
small community near Herat (Lewis, 2009).
Figure 2.6: Indo-European Language Tree illustrating the Centum and Satem branches (Short, 2007)
Indo-Iranian languages are heterogeneous (Fortson, 2009) due to their many dialects, but
they stretch across a wide geographical area, often compartmentalised into four regions
(Figure 2.7). These four regions are: the Dnieper-Ural region, the Ural-Yenisei region,
the Central Asian zone and the Greater Iran-India zone (Mallory, 2003).
The Dnieper-Ural region saw the emergence of agricultural communities in the 5th
millennium BC and the later introduction of wheeled vehicles (Mallory, 2003).
Figure 2.7: (a) The Iranian languages spoken in the Dnieper-Ural region; (b) The Ural-Yenisei region and
the eastern Iranian languages spoken and the location of the Central Asian BMAC culture (shaded) (c) The
locations of the Afanasevo (shaded) and Andronovo (outlined) cultures of Central Asia and the Iran-India
zone in the south (Mallory, 2003).
Some eastern Iranian languages can be traced back to this region, such as Ossetic,
Scythian and Sarmatian. The Ural-Yenisei region shared similarities with the Dnieper-Ural
region, both archaeological and cultural. The main culture here was the Andronovo
complex (Mallory, 2003). The Central Asian zone was formed by the
emergence of communities during the Neolithic period. Its inhabitants were also eastern Iranian language
speakers, and other Indo-Iranian traits can be found at Bactria-Margiana
Archaeological Complex (BMAC) sites, such as the apparatus required for pressing haoma,
a leafless vine that produces a milky juice (Mallory, 2003). The Greater Iran-India zone
refers to the territories that were occupied by the Indo-Iranian languages. Civilisations
can be traced to indigenous or near-indigenous origins from the 7th millennium BC,
despite the cultural diversity and the large area they inhabited (Mallory, 2003).
The Indo-Iranian language branch has two main sub-branches; Indic (Indo-Aryan) and
Iranian. An example of an early Iranian branch language is Avestan, while the Indic
counterpart is Sanskrit. The Rig Veda, an ancient set of sacred Hindu scripts and hymns,
was written in Sanskrit (Fortson, 2009; Mallory, 2003). Approximately 8,000 years
before present (YBP), the Elamite civilisation from the Fertile Crescent is thought to
have spoken a Dravidian-family language which spread eastwards to the Indus Valley
and Indian subcontinent concurrently with the agricultural movement (Quintana-Murci
et al., 2004). This language movement provides a rational justification for the presence of
the Brahui language in south-eastern Afghanistan. Later, Andronovo or Srubnaya cultural
nomads migrated into Iran and Afghanistan (~5,000 YBP), and probably brought the
Indo-Iranian language branch which would subsequently displace the use of the
Dravidian languages in Iran and the surrounding region (Quintana-Murci et al., 2004).
2.5 Migrations
Modern Afghanistan is full of tribal and sub-tribal communities; Afghan towns are
centres of trade with pastoral and agricultural products from the more rural zones
exchanged for manufactured goods that are more widely available in the urban towns and
cities (Barfield, 2010). These towns and cities are often inhabited by multiple ethnicities,
providing diverse local communities. Afghans are quite nomadic, particularly those
living in the more remote regions; migrating, often seasonally, in search of work when
opportunities are poor within their own regions. For example, it is not uncommon for
tribe members to migrate in the winter season from the agricultural plains into towns
before moving back prior to the new season. Even the most remote regions have links to
their regional urbanised zones (Barfield, 2010).
2.6 Refugees
Due to the ongoing conflict in Afghanistan, many Afghans do not feel safe residing in the
country and many become refugees or seek asylum in the neighbouring countries.
According to the UNHCR, the UN Refugee Agency, as of January 2010 (Table 2.3),
there were just under 2.9 million Afghan refugees, with 1.7 million in Pakistan and
another 933,500 in Iran. In addition, there are also nearly 300,000 internally displaced
Afghans and more than 30,000 asylum seekers. These figures are the official statistics,
while the actual number of displaced Afghans is likely to be much higher as not all
Afghans will go through the appropriate channels. Since 2002, the UNHCR has helped
4.5 million Afghan refugees reintegrate into Afghanistan via the UNHCR Shelter
programme (UNHCR, 2011).
Table 2.3: UNHCR statistics of displaced Afghans as of January 2010 (UNHCR, 2011).
Type of Displacement                            Number of Displaced Afghans
Refugees                                        2,887,123
Asylum Seekers                                  30,412
Returned Refugees                               57,582
Internally Displaced Persons (IDPs)             297,129
Returned Internally Displaced Persons (IDPs)    7,225
Various                                         0
Total Population of Concern                     3,279,471
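The categories in Table 2.3 can be checked against the reported total with a short calculation. The sketch below only transcribes the table values; the variable names are illustrative and are not part of the UNHCR data.

```python
# Figures transcribed from Table 2.3 (UNHCR, 2011); names are illustrative.
displaced = {
    "Refugees": 2_887_123,
    "Asylum Seekers": 30_412,
    "Returned Refugees": 57_582,
    "Internally Displaced Persons (IDPs)": 297_129,
    "Returned Internally Displaced Persons (IDPs)": 7_225,
    "Various": 0,
}
reported_total = 3_279_471  # "Total Population of Concern" row

# Sum the individual categories and compare with the reported total.
computed_total = sum(displaced.values())
totals_agree = computed_total == reported_total

# Share of the population of concern that is classed as refugees.
refugee_share = displaced["Refugees"] / reported_total

print(f"Computed total: {computed_total:,} (matches table: {totals_agree})")
print(f"Refugee share of total: {refugee_share:.1%}")
```

The categories sum to the reported total, and refugees account for roughly 88% of the population of concern.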
2.7 Afghan Sub-Populations
2.7.1 Pashtuns
The Pashtun peoples are considered to be ethnically Caucasian (“Afghans: Their History
& Culture”, 2002) and are Sunni Muslims. They are located in south-eastern
Afghanistan, but can also be found in north-western Pakistan and north-eastern Iran. They
are an eastern Iranian ethno-linguistic group and, as such, speak Pashto (“Afghans: Their
History & Culture”, 2002), an Indo-European language of the Iranian sub-branch
(Short, 2007). Their traditional homeland lies in an area east, south and southwest of
Kabul (Weinbaum, 2011). They are not confined to one region of Afghanistan, as they
also inhabit northern and western (around and near Herat) regions (Weinbaum, 2011).
Figure 2.8: Pashtun people from Afghanistan
Pashtuns practise a set of traditional cultural values, known as Pashtunwali, ethics which
include “badal; the right to seek revenge, nunawati; the right to seek refuge and live in
peace, melmastya; hospitality and protection to guests, tureh; bravery, sabats;
steadfastness, isteqamat; persistence, imamdari; righteousness, ghayrat; the right to
defend one’s property and honour, and mamus; the right to defend the female family
members”. Some of these traits can probably be identified in the current ongoing conflict
in Afghanistan, particularly as the Taliban are made up of Pashtuns (BBC, 2010). There
are more Pashtuns in Afghanistan than any other ethnic group, approximately 38% of the
total population (“Afghans: Their History & Culture”, 2002), and they have been the
dominant group since the 18th century (Barfield, 2010), perhaps reflected by the fact
that the President, Hamid Karzai, is also a Pashtun. The origins of the Pashtuns are unknown;
however, their existence is probably a consequence of the intermingling of ancient and
subsequent invaders (“Afghans: Their History & Culture”, 2002) that have inhabited the
lands where the Pashtuns now live. The Pashtuns themselves, however, trace their lineage to
Qais (Barfield, 2010). Within the Pashtun ethnic group, there are four main Pashtun
descendant groups (Barfield, 2010): i) the Durrani, who are descendants of Qais’s first
son, found in the south and southwest; ii) the Ghilzais (the largest Pashtun group),
descendants of Qais’s second son, but via his daughter, found in the east; iii) the
Gurghusht, descendants of Qais’s third son; and iv) the Karlanri, who are claimed to be
the descendants of an adopted child of unknown or uncertain origin. These Pashtuns live
along the Afghanistan-Pakistan border, with the majority of the population falling on the
Pakistani side (Barfield, 2010). While there are sub-divisions of the Pashtun ethnicity,
there are also tribal divisions within these, to which families and communities belong.
Pashtuns do not define themselves only by their ethnicity, but also by
speaking Pashto and practising Pashtunwali (Barfield, 2010).
2.7.2 Tajiks
The Tajik ethnic group are the largest of the Dari speaking peoples, inhabiting northern
Afghanistan, across the border from Tajikistan, into regions of the Hindu Kush
Mountains. They mostly inhabit Badakhshan province of north-eastern Afghanistan,
although there are pockets of Tajik populations elsewhere (Weinbaum, 2011), within the
Kabul and Herat regions for instance.
Figure 2.9: Tajik people from Afghanistan
Dari is a form of the Persian/Iranian language, and the Tajiks are generally defined as non-tribal
Persian-speakers (Barfield, 2010; Weinbaum, 2011). They are Caucasian, and are
morphologically similar to Iranians (“Afghans: Their History & Culture”, 2002). The
Tajik population makes up approximately 25% (“Afghans: Their History & Culture”,
2002) to ~30% (Barfield, 2010) of the overall Afghan population and is mostly Sunni
Muslim, while there are some Shi’a Muslims distributed within the remote mountain
populations. The Tajik population mostly resides within the mountain ranges of the
northeast, while there have also been significant populations within Kabul, Herat and
Mazar-e Sharif.
2.7.3 Hazaras
The Hazaras are also a Dari-speaking group (“Afghans: Their History & Culture”, 2002)
and speak a dialect of Persian called Hazaragi (Farr, 2009; Barfield, 2010). Based on
their language and religion (Shi’a Islam), the Hazaras are likely to have been under
Persian/Iranian influence or rule. The name Hazara is believed to derive from the
Persian hezar meaning thousand, perhaps a reference to a Mongol army unit (Farr, 2009).
The Hazaras are of Mongol descent, believed to have arrived in Afghanistan in the 13th
and 14th centuries (“Afghans: Their History & Culture”, 2002), sometime between 1229
and 1447 (Farr, 2009), and still use Mongol words in their modern-day vocabulary
(Farr, 2009).
Figure 2.10: Hazara people from Afghanistan
They “represent the last remnants of the Mongol dynasties that came through
Afghanistan in the early part of the 13th century” (Farr, 2009). Unlike other ethnic
groups in Afghanistan, the Hazaras are all contained within the Afghan borders. They are a
traditionally nomadic group and can be found within the mountains of Central
Afghanistan; their home extends south to Ghazni and west towards Herat, a region
known as Hazarajat (Farr, 2009; Barfield, 2010). Hazarajat, although well-positioned
geographically in Afghanistan, is probably the most remote region due to a combination
of poor communication links and networks and its position within the high
mountains of the Hindu Kush (Weinbaum, 2011). Although this region is where most
Hazaras can be found, many have migrated elsewhere due to a lack of land (Weinbaum,
2011). The Hazaras constitute approximately 19% (“Afghans:
Their History & Culture”, 2002) or 15% (Barfield, 2010) of the Afghan population, an estimated 2-3 million
(Farr, 2009) or 5 million (“Afghans: Their History & Culture”, 2002) people, while many
ethnic leaders suggest the number is closer to 8 million (Farr, 2009). They are believed to
be the descendants of Mongol armies that conquered Iran (Barfield, 2010), perhaps the
descendants of soldiers of Chagatai (a son of Genghis Khan and leader of the region in the early 13th
century (Farr, 2009)) who failed in their attempt to conquer the Indian
subcontinent. In trying to do so, they migrated into the Hindu Kush, but never advanced
from this position. The Hazaras themselves claim to be descendants of Genghis Khan or
a close male relative (McElreavey & Quintana-Murci, 2005). The “presence of the
Y-Chromosomal Haplogroup C within the Hazaran population, and its absence from
neighbouring populations is inferred as the genetic legacy of Genghis Khan”
(McElreavey & Quintana-Murci, 2005). The recent history of the Hazaras saw them
placed at the bottom of the Afghan ethnic hierarchy, targeted for persecution by the
Taliban (Barfield, 2010) and sold as slaves in the cities; however, the Hazara slave trade
saw their population proliferate.
2.7.4 Uzbeks
The Uzbeks, unlike the other ethnic groups mentioned, do not speak an Indo-European
language. They speak Uzbek, a Turkic language of the Altaic group
that is similar to Turkish and completely different from the Iranian languages
(“Afghans: Their History & Culture”, 2002). They are mostly Sunni Muslims (UNHCR,
2003; Barfield, 2010) and are ethnically Turkic (“Afghans: Their History & Culture”,
2002; UNHCR, 2003), descending from nomadic tribes that arrived from Central Asia in
waves (Barfield, 2010).
Figure 2.11: Uzbek people from Afghanistan
They arrived in Afghanistan in the 16th century, settling in the irrigated valleys or loess
steppes, and became farmers (Barfield, 2010). The Uzbeks are the largest of the Altaic
groups (Weinbaum, 2011), with an estimated population of 1 million, approximately 6%
of the total population (“Afghans: Their History & Culture”, 2002). They inhabit the area
of northern Afghanistan across the border from Uzbekistan, south of the Amu Darya
(formerly known as the Oxus) river; when the northern border of Afghanistan was
altered, the Uzbek populations (as well as other Altaic groups) became, by definition,
Afghans (“Afghans: Their History & Culture”, 2002; Barfield, 2010). They mostly inhabit
Balkh province and are generally farmers (Weinbaum, 2011).
2.7.5 Other Ethnic Groups
As well as the ethnic groups already mentioned, there are several others inhabiting the
Afghan lands: the Aimaqs (also Aimak), Baluch (also Beluch), Turkmens and
Nuristanis, to name but a few, which when combined constitute approximately 12% of
the total population (“Afghans: Their History & Culture”, 2002).
Figure 2.12: Left: an Aimaq man from Afghanistan, Middle: a Baluch man from Afghanistan, Right: a
Turkmen man from Afghanistan
2.7.6 Aimaqs
The Aimaqs are tribal Central Asian peoples (“Afghans: Their History & Culture”, 2002)
who speak Persian (Dari). They are Sunni Muslims, believed to be of Turkish descent
(Barfield, 2010; Weinbaum, 2011). There are approximately 500,000 Aimaqs (Barfield,
2010) who have historically inhabited the mountainous region east of Herat (Weinbaum,
2011) and west of Hazarajat (home of the Hazaras), but have also occupied some of the
steppes and desert-lands north and east of Herat (Barfield, 2010).
2.7.7 Baluch
The Baluch are often described as extensions of the Iranian and Pakistani populations
(Barfield, 2010). They inhabit south-western Afghanistan (“Afghans: Their History &
Culture”, 2002) in and around the sparsely populated Kandahar region (Weinbaum,
2011), and speak their own language, Baluchi, which is related to Persian (Barfield, 2010). Many
Baluch also speak Pashto (the Pashtun language) as they live closely with the Pashtuns;
often the distinguishing feature between the Baluch and the Pashtuns is not the spoken
language or descent, but the political allegiance to the Baluch Khans (Barfield, 2010).
They are pastoral nomads, known as smugglers linking Iran and India (Barfield, 2010).
2.7.8 Turkmens
The Turkmens are an Altaic group (like the Uzbeks); the Uzbeks and Turkmens combined
constitute approximately 10% of the population of Afghanistan (Barfield, 2010).
They are Sunni Muslims and are an extended population from Turkmenistan, where the
majority reside (“Afghans: Their History & Culture”, 2002; Barfield, 2010). They inhabit
the north-western region of Afghanistan, close to the Turkmenistan border, mostly in
Balkh province with the Uzbek populations (Weinbaum, 2011). They speak a Turkic
language, Turkmen (“Afghans: Their History & Culture”, 2002; Barfield, 2010), and are
semi-nomadic, although more nomadic than the Uzbeks (Barfield, 2010; Weinbaum, 2011).
2.7.9 Nuristanis
The Nuristanis live in the mountains northeast of Kabul (Barfield, 2010; Weinbaum,
2011), inhabiting isolated valleys within Nuristan province. They are more culturally and
linguistically distinct than other ethnic groups in Afghanistan, having converted to Islam
(as Sunni Muslims) only recently, in the 20th century, following their conquest in 1895 (“Afghans: Their
History & Culture”, 2002; Barfield, 2010). Their languages are unrelated to any others in
Afghanistan, and even differ between individual tribes from separate valleys
(Barfield, 2010).
2.8 Historical Influence on Afghanistan’s Population
Afghanistan has been ruled by various empires throughout history, often controlled by
foreign invaders (Barfield, 2010), and the combination of these empires has left a unique
but complex ethnic, linguistic, tribal and cultural structure (Rasanayagam, 2003).
Afghanistan has been invaded and conquered many times, in all likelihood for its
location and the access it gives to other, more prosperous regions such as India and Central Asia,
and to trade routes such as the Silk Route (Barfield, 2010).
2.8.1 Prehistory
Following the emergence of modern humans from Africa approximately 60,000 YBP, the
first region they would have encountered and settled in was the south-western region of
Eurasia (Quintana-Murci, et al., 2004). The earliest evidence found of human settlement
has been dated to ~30,000 BC (“Afghanistan Online”, 2008; Colorado State
University/Department of Defense (CSU), 2010; Dupree & Dupree, 2011). A sculpted
head found at Aq Kupruk was dated to ~20,000 BC (CSU, 2010). The site of
Aq Kupruk also yielded evidence of Stone Age technology and culture dated at
~10,000 years old (~8,000 BC) (Jacobson, 1979; Bednarik, 2010; Dupree & Dupree,
2011). By the time of the latter, the domestication of plants and animals had commenced
in the foothills of the Hindu Kush (CSU, 2010), making this region of northern
Afghanistan one of the earliest places where this occurred (CSU, 2010). The emergence
of Neolithic settlements between 9,000 and 6,000 BC indicates the progressive expansion of
the knowledge required to cultivate and rear domesticated plants and animals. By the late
4th millennium BC, plants were regularly used for cereals (Jacobson, 1979). There are two
main theories proposed for the spread of agriculture: (i) the immigration of farmers with
the required knowledge and technologies, as proposed by Gordon Childe (1925), and (ii)
the acquisition of cultural traits by communities from passing non-indigenous migrants
(Davison, 2006). Both forms permit the gradual spread of agricultural knowledge.
2.8.2 Aryan Migration
As a consequence of the domestication of animals, in particular the horse ~4,500-4,000
BC in Iberia and the Eurasian Steppe (Jansen, 2002; Anthony, 2007), the ability to travel
and migrate from one region to the next was revolutionised, enabling peoples to travel
more quickly (Mallory, 2003) and expand geographically in different directions (Zvelebil,
1980). One of these migrating peoples was the Aryans, believed to be one of the early
Proto-Indo-European speaking groups (Fortson, 2009). They arrived and settled in
northern Afghanistan sometime around 2,000-1,500 BC (CSU, 2010), while some
continued migrating, heading west to settle in Iran or south into India and
modern-day Pakistan (CSU, 2010). Their arrival and the concurrent demise of
contemporaneous established civilisations have ignited debates as to whether their
movement was a migration or more of an invasion. It is quite possible that, as a result of
their expansive distribution, the Aryans were the main advocates of the Indo-European
languages, promoting their proliferation via demic or cultural diffusion (Kumar, V,
2008) and therefore displacing the indigenous languages.
2.8.3 Persian Empire
Afghanistan has been ruled/governed by the Persians on several occasions, and has been
part of Persia for most of its history. The first were the Medes (7th century-550 BC), a
nomadic tribe from Iran that became independent and due to their greater distribution,
ruled from Afghanistan in the east to Iraq in the west (CSU, 2010). During their reign,
the religion of Zoroastrianism was founded in Balkh. The Medes preceded the
Achaemenid Empire, which united the Iranians. Ruling from ~550 BC to ~330 BC,
the Achaemenids held a vast empire that stretched from Libya, Egypt and Saudi Arabia in the south,
Turkey in the west, and the Balkans and the Black Sea in the north, to Afghanistan and Pakistan in the east.
Afghanistan was split into satrapies: Arachosia (south; Kandahar), Aryana (west; Herat),
Bactria (northern Afghanistan and southern Uzbekistan, Tajikistan and Turkmenistan),
Drangiana (south and southwest; Sistan) and Gandhara (northeast and northern Pakistan)
(CSU, 2010; Dupree & Dupree, 2011). Darius the Great, the Achaemenid ruler, spread
the religion of Zoroastrianism throughout Afghanistan and the Achaemenid Empire
(“Afghans: Their History & Culture”, 2002; CSU, 2010), a religion that is still practised
today, albeit by far fewer individuals.
2.8.4 Greek Rule
The Achaemenid Empire was ended abruptly by the formidable warrior, Alexander the
Great. By 332 BC, he had conquered much of the Persian Empire, forcing the then
Achaemenid ruler Darius III to flee from Persia as Alexander edged east. Darius would
escape to Afghanistan and join with his ally Bessus, the satrap of Bactria.
However, Darius was murdered by Bessus, who then proclaimed himself King of the
empire. After crushing Persia, Alexander invaded Afghanistan in 330 BC and quickly
took Herat while chasing Bessus, who had secured himself within the mountains. Alexander
eventually captured Bessus and gained control of most of the Afghan satrapies
(Dupree & Dupree, 2011). Alexander later attempted an invasion of India, but failed to
conquer it, and returned to Persia. He died a few years later in 323 BC in
Babylon, leaving behind the Greek armies in Afghanistan (CSU, 2010; Dupree &
Dupree, 2011). Alexander’s successor was Seleucus, one of his officers (CSU, 2010),
who took control of Bactria, but ruled from Babylon (Dupree & Dupree, 2011). The
Seleucid Empire developed a Hellenistic culture in Afghanistan. The Greeks would rule
at least some part of Afghanistan for a few hundred years to come. Like Alexander
before him, Seleucus invaded India and also failed, thwarted by the Mauryan Emperor
Chandragupta. The Seleucids offered southern Afghanistan (the Achaemenid satrapies
of Arachosia and Gandhara) to the Mauryans in return for maintaining control north of the Hindu Kush (CSU,
2010; Dupree & Dupree, 2011). The Mauryans introduced Buddhism to Afghanistan and
ruled there from 304 to 180 BC (CSU, 2010).
The next form of Greek rule would be the Graeco-Bactrian Empire from 250-125 BC,
arising from the Seleucid dynasty (CSU, 2010). They established rule in Kabul, in the
meantime forcing the Mauryans into Pakistan. An Iranian dynasty also arose from the
Seleucids; the Parthians, who became independent from the Seleucids taking control of
the Sistan and Kandahar regions (CSU, 2010). Their reign would stretch west to Syria
(CSU, 2010). Later, the Parthians would join with the Scythians and become Indo-Scythians.
2.8.5 Yuezhi & the Kushan Empire
A Central Asian group of nomads, called the Yuezhi, migrated into northern Afghanistan
from western China. The Yuezhi united with other nomadic peoples from Central Asia,
such as the Scythians (“Afghans: Their History & Culture”, 2002; Dupree & Dupree, 2011), and
forced the Greeks south into the Kabul valley. The Yuezhi would occupy Bactria for
approximately 100 years before establishing the Kushan Empire in northern India. The
Kushans expanded trade from China to Europe, using the Silk Route extensively; in
addition, this route also initiated the spread of Buddhism into China (“Afghans: Their
History & Culture”, 2002; Dupree & Dupree, 2011).
2.8.6 Arabs & Islam
During the seventh century, the Arab movement swept through Iran, defeating the
Sassanids at Nehavand in 642, and began entering Afghanistan (“Afghans: Their History
& Culture”, 2002; Dupree & Dupree, 2011) but faced difficulties in their attempts; Herat
was conquered in 652 AD while Kabul was finally taken in 664 (CSU, 2010). The slower
advancement into Afghanistan, and as a consequence the slower conversion to Islam,
was likely due to a combination of the varied and often harsh terrain and the constant
revolt by Afghan tribes (CSU, 2010). At the end of the 8th century, Arabs were governing
the states of Herat, Samarkand, Kashgar and Sistan (CSU, 2010).
2.8.7 Mongol Dynasty
Genghis Khan began the Mongol invasion of Afghanistan in 1219 from the east (CSU,
2010). His invasion transpired only as a result of the Khwarezmian Empire’s extremely
violent refusal of Khan’s proposal of alliance (CSU, 2010). As Genghis Khan swept
through Afghanistan, as a signal of his intent and displeasure, many Afghan cities were
not only demolished but completely depopulated and destroyed (CSU, 2010; Ali, Dupree
& Dupree, 2011). Khan died in 1227 at the age of 65, but the Mongols occupied
Afghanistan for a further 100 years, their kingdom divided into four Khanates; northern
and eastern Afghanistan became part of the Chagadai Khanate, while southern and
western Afghanistan became part of the Ilkhanate (CSU, 2010).
2.8.8 Modern Era
Afghanistan finally gained its independence in the 18th century, led by Mirwais Khan
(CSU, 2010; Ali, Dupree & Dupree, 2011); however, this was not to be the end of the
invasions into Afghanistan. The newly formed Afghanistan would even invade Persia
and control the region for a short period from 1722-1725 (CSU, 2010). The Persians
would come back and invade Afghanistan once more, and again would face revolt from
the Afghan tribes (CSU, 2010); eventually, the Afghans claimed their land back in 1747. In
1805, the Persians attacked Herat, but this time could not find victory (CSU, 2010). Later
during the 19th century, the British would attempt to gain control over Afghanistan. The
first of the Anglo-Afghan wars began in 1839 (“Afghans: Their History & Culture”,
2002; CSU, 2010). The Afghans, on this occasion, defeated Britain in 1842, maintaining
control of their lands (“Afghans: Their History & Culture”, 2002; CSU, 2010). In 1859,
Afghanistan would lose land to the British as they gained Balochistan; consequently, this
made Afghanistan landlocked (CSU, 2010). The northern border was also a point of
interest, this time to the Russians, who coveted a border that was moved southwards. The
second Anglo-Afghan war began in 1878; this time Britain would be more successful,
gaining some of Afghanistan’s eastern states including Kurram, Khyber and Pishin
(CSU, 2010). Britain withdrew from Afghanistan in 1880, but kept the entitlement to
control Afghanistan’s foreign affairs (CSU, 2010). Just 5 years later, Russia would move
their border south, taking Afghan lands north of the Oxus River (CSU, 2010). The third
Anglo-Afghan war occurred in 1919; the British were defeated, relinquishing control
over Afghan foreign affairs, and Afghanistan would be independent again (CSU, 2010;
Ali, Dupree & Dupree, 2011).
In 1979, Afghanistan would be invaded again, this time by the Soviets, who would
eventually be defeated and leave in 1989 (“Afghans: Their History & Culture”, 2002;
CSU, 2010; Dupree, Dupree & Weinbaum, 2011). In the time that followed, Afghanistan
fell into severe civil war; culminating in the emergence of the Taliban, who enforced
their extreme views upon the Afghan population (CSU, 2010). Afghanistan would soon
be invaded again, this time by the USA and its allies in response to the 2001 attacks on
New York and the Pentagon building in Virginia, and the crash of a fourth hijacked plane in
rural Pennsylvania, USA. The Taliban, targeted for their failure to ‘give up’ the
whereabouts of Osama Bin Laden, were soon removed from government and an interim
government was established until some sort of stability could be sustained and democratic
elections to choose a President could take place. The current President of the Islamic
Republic of Afghanistan is Hamid Karzai, who was officially elected into office on
October 9th 2004 following a brief spell as the Chairman of the Interim Administration of
Afghanistan (Office of the President, 2009). The US and its allies still have a strong
military presence in Afghanistan today, attempting to quash the remaining and any
resurgent Taliban enforcers before they leave entirely.
Chapter Three
Materials and Methods
3. Materials & Methods
3.1 Materials
See Appendix 1 for the list of materials used in this project.
3.2 Precautionary Measures
When operating within the laboratory, protective clothing (laboratory coats and
disposable gloves) was worn at all times. Additional measures were also taken when
performing sensitive tasks or handling dangerous/harmful chemicals and equipment, such
as UV irradiation and cleaning of workspaces with absolute ethanol before and after
procedures, performing the most sensitive techniques within a clean cabinet, and the use
of a fume hood for extremely dangerous chemicals, e.g. powdered ethidium bromide. All
procedures were COSHH assessed and performed appropriately.
3.3 Sample Collection
The DNA samples were collected from three refugee camps all located in Khorasan
province (Figure 3.1) in north-eastern Iran, near the cities of Mashhad, Bojnurd and
Birjand. The samples were collected by researchers from Mashhad University of Medical
Sciences; Mashhad, Iran. Ethical consent was provided by all participants and the
research was approved following a full ethical review at the Mashhad University of
Medical Sciences; examples of the forms used can be found in Appendix 2. All
participants had at least three generations of ancestry in their country of birth and had
provided details of their geographical origin. Samples in the form of blood (8.5ml) were
collected using PAXgene Blood DNA Tubes (Qiagen). These tubes contain a reagent
mix that stabilises and preserves the blood (and cells), preventing coagulation.
3.4 DNA Isolation
This procedure was also performed by the researchers from Mashhad University of
Medical Sciences; Mashhad, Iran. DNA was extracted and isolated from the blood
using PAXgene Blood DNA Kits (Qiagen). The blood samples are transferred into
processing tubes already containing lysis buffer, mixed and inverted to lyse the
erythrocytes and white blood cells. Following centrifugation, nuclei and mitochondria are
pelleted, washed and resuspended in a digestion buffer. The protein
contaminants are removed by incubation with a protease. The DNA is precipitated in
isopropanol, washed with 70% ethanol and dried before resuspension in resuspension
buffer (Qiagen, 2009 “PAXgene Blood DNA Kit”). See Appendix 1 for the protocol for
this procedure.
Figure 3.1: Map of Iran (www.geology.com/world/iran-satellite-image.shtml) and inset; Khorasan province
and the refugee camps (circled) near the cities of Mashhad, Bojnurd and Birjand.
3.5 PCR
Polymerase Chain Reaction (PCR) is a process, developed by Kary Mullis in 1984,
which enables the amplification of small quantities of DNA (Bartlett & Stirling, 2003).
Each reaction requires a specific set of oligonucleotides (primers), Taq polymerase,
dNTPs, a magnesium cofactor (MgCl2) and a stabilising buffer. During PCR, the DNA
undergoes a series of cycles, each of which involves (i) denaturation of the DNA, separating one strand from
the other and therefore exposing the nucleotides, (ii) annealing of the
oligonucleotides to the specific sequences of exposed nucleotides which flank the DNA
segment of interest, and (iii) extension of the fragment of DNA using free nucleotides
(dNTPs).
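Because each cycle can at best double the number of target molecules, the yield of a reaction grows exponentially with cycle number. The following is a minimal sketch of that idealised relationship, N = N0 × (1 + E)^n; the function name, the efficiency parameter E and the example values (100 starting copies, 30 cycles) are illustrative assumptions and are not taken from the thermocycler conditions used in this study.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised number of target copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied per cycle (1.0 = perfect
    doubling). Real reactions plateau, so this is an upper-bound sketch only.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical example: 100 template molecules amplified for 30 cycles.
perfect = pcr_copies(100, 30)         # perfect doubling every cycle
realistic = pcr_copies(100, 30, 0.9)  # assumed 90% efficiency per cycle

print(f"Perfect doubling: {perfect:.3e} copies")
print(f"90% efficiency:   {realistic:.3e} copies")
```

Even a modest drop in per-cycle efficiency lowers the final yield several-fold, which is one reason reagent concentrations and cycling conditions are optimised for each primer pair.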
In order to prevent contamination, each PCR reaction was prepared in an area separated
from all other areas where any other procedure would take place. PCRs were set up in a
Captair Bio PCR UV Cabinet. Before any work was carried out, pipettes and relevant
consumables (pipettor tips, microcentrifuge tubes and PCR tubes) were autoclaved and
then exposed to constant UV irradiation for 30 minutes. Once the PCR set-up was
completed and the samples transferred to the thermocycler, the cabinet was emptied and
UV irradiated for 30 minutes again.
Samples were amplified using a twelve primer pair set (Table 3.2) consisting of nine
overlapping primer pairs and three internal primer pairs (Torroni et al., 1997) and a
fifteen primer pair set (Table 3.3) (Kong et al., 2003; Palanichamy et al., 2004) specific
to mtDNA.
The contents of each reaction tube consist of PCR-grade H2O, template mtDNA and a
mastermix. The volumes used and final concentrations within each reaction tube for the
amplification of all primer pairs are found below (Table 3.1).
Table 3.1: Volumes and final concentrations of mastermix reagents for polymerase chain reaction
amplification of mtDNAs.
Reagent                                              Volume Added (µl)    Final Concentration
GeneCraft 10x PCR Buffer BioTherm with 15mM MgCl2    2.5                  1x & 1.5mMΔ
AB dNTP mix 10mM                                     1                    0.4mM
Forward Primer (10µM)                                0.5                  0.2µM
Reverse Primer (10µM)                                0.5                  0.2µM
GeneCraft BioTherm DNA Polymerase (5units/µl)        0.1                  0.5 unit
Total volume (µl)                                    4.6                  -
Δ final concentration of 1.0mM MgCl2 required for application of primer pair 4 (Torroni et al., 1997).
The Taq polymerase used is BioTherm Taq polymerase supplied by GeneCraft at a
concentration of 5 units/µl. The storage buffer for the enzyme contains 10mM
K-Phosphate buffer pH 7.0, 100mM NaCl, 0.5mM EDTA, 1mM DTT, 0.01% Tween 20
and 50% Glycerol (v/v). The reaction buffer supplied to support the BioTherm activity
consists of 160mM (NH4)2SO4, 670mM Tris-HCl pH 8.8 (at 25°C), 15mM MgCl2 and
0.1% Tween 20.
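The final concentrations listed in Table 3.1 follow from the dilution relationship C1 × V1 = C2 × V2. The sketch below reproduces them from the stock concentrations and volumes in the table. It assumes a total reaction volume of 25µl, which is not stated explicitly but is inferred from 2.5µl of 10x buffer giving a 1x final concentration; the function and variable names are illustrative.

```python
# Assumption: 25 µl total reaction volume, inferred from 2.5 µl of 10x buffer
# giving a 1x final concentration (the volume is not stated in the text).
REACTION_VOLUME_UL = 25.0


def final_concentration(stock: float, volume_added_ul: float,
                        total_ul: float = REACTION_VOLUME_UL) -> float:
    """Dilution formula C2 = C1 * V1 / V2; units follow the stock."""
    return stock * volume_added_ul / total_ul


buffer_x = final_concentration(10.0, 2.5)   # 10x buffer        -> 1x
mgcl2_mM = final_concentration(15.0, 2.5)   # 15 mM MgCl2       -> 1.5 mM
dntp_mM = final_concentration(10.0, 1.0)    # 10 mM dNTP mix    -> 0.4 mM
primer_uM = final_concentration(10.0, 0.5)  # 10 µM each primer -> 0.2 µM
polymerase_units = 5.0 * 0.1                # 5 units/µl x 0.1 µl -> 0.5 unit

# Mastermix volume per reaction, and the volume left for water plus template.
mastermix_ul = 2.5 + 1.0 + 0.5 + 0.5 + 0.1
water_plus_template_ul = REACTION_VOLUME_UL - mastermix_ul

print(f"Buffer {buffer_x:g}x, MgCl2 {mgcl2_mM:g} mM, dNTPs {dntp_mM:g} mM")
print(f"Primers {primer_uM:g} µM each, polymerase {polymerase_units:g} unit")
print(f"Mastermix {mastermix_ul:.1f} µl, water + template {water_plus_template_ul:.1f} µl")
```

Under the 25µl assumption every value agrees with Table 3.1, and 20.4µl remains for PCR-grade H2O and template mtDNA.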
Table 3.4: Thermocycler conditions for the primer pairs as described by Torroni et al. (1997)
Table 3.5: Thermocycler conditions for the Palanichamy et al. (2004) primer pairs
3.6 Agarose Gel Electrophoresis
This technique enables the separation of DNA molecules based on their size. The gel is
of a sieve-like nature, enabling the negatively charged DNA fragments to migrate, under an
electrical current, toward the positive electrode (Sambrook, 2001). Smaller fragments of
DNA and RNA will migrate further/faster through the gel than larger fragments.
Prior to its use, the electrophoresis equipment was washed with a detergent, rinsed with
dH2O and then absolute ethanol, before being left to dry. The gel casting plate and
comb(s) were assembled and a water-tight seal made to prevent any leaking of the liquid
gel. The gel itself was prepared by heating in a microwave until the agarose-TBE solution
became molten. Ethidium bromide (10mg/mL) was added to the molten gel and mixed,
avoiding the formation of bubbles, to produce a uniformly stained gel, before being
poured into the gel casting plate. Once set, the water-tight seal was removed and 1x TBE
buffer was poured into the electrophoresis tank until the gel was submerged. The electrodes
were connected to the powerpack and the gel run at 200 volts for ~30 minutes. Following
this, the gel was removed from the electrophoresis tank and placed into a UV
transilluminator for the visualisation of the DNA bands and photographed.
3.7 Glycogen Precipitation of DNA (PCR Products)
One microliter (1µl) of glycogen (Sigma Aldrich) solution (20µg/µl) is added to PCR
products followed by 2-3 volumes of absolute ethanol. The mixture is transferred to a
1.5ml centrifuge tube and incubated at -20°C for a minimum of one hour. Following
incubation, the samples are centrifuged for 20 minutes holding at a temperature of 4°C.
The supernatant is aspirated without disturbing the DNA pellet, and the pellet is then washed with
200µl 70% ethanol before centrifugation at 4°C for a further 5 minutes. The supernatant
is aspirated, again without disturbing the pellet, followed by drying in a vacuum for 6
minutes. The pellet is resuspended in 20µl dH2O.
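The reagent amounts in this precipitation scale with the volume of PCR product, so they can be worked out with a small helper. This is only a sketch: it assumes that the "2-3 volumes" of ethanol are measured relative to the PCR product plus the added glycogen, and the 20µl product volume in the example is hypothetical.

```python
def precipitation_volumes(pcr_product_ul: float,
                          glycogen_ul: float = 1.0,
                          glycogen_ug_per_ul: float = 20.0) -> dict:
    """Glycogen mass and ethanol volume range for one precipitation.

    Assumption: ethanol volumes are relative to product plus glycogen.
    """
    sample_ul = pcr_product_ul + glycogen_ul
    return {
        "glycogen_ug": glycogen_ul * glycogen_ug_per_ul,  # carrier mass added
        "ethanol_min_ul": 2 * sample_ul,                  # 2 volumes
        "ethanol_max_ul": 3 * sample_ul,                  # 3 volumes
    }


# Hypothetical example: a 20 µl PCR product.
volumes = precipitation_volumes(20.0)
print(volumes)
```

For a 20µl product this gives 20µg of glycogen carrier and 42-63µl of absolute ethanol, which fits comfortably in the 1.5ml centrifuge tube used for the incubation.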
3.8 Purification of PCR Products
Some samples were purified not by precipitation but by spin column; the purification
protocol used was obtained from Macherey-Nagel. Two volumes of Buffer NT were
added to the PCR product (i.e. 40µl Buffer NT added to 20µl PCR product) and this
mixture was transferred to a spin column, which sits inside a 2ml collection tube. The sample
was then centrifuged for 1 minute at 11,000 rpm and the flow-through discarded. Six hundred
microliters (600µl) of Buffer NT3 were added to the spin column in order to wash the silica
membrane, and the tube was centrifuged at 11,000 rpm again for 1 minute. The flow-through
was discarded again. The tubes were centrifuged for a further 2 minutes at 11,000 rpm to
remove any remaining Buffer NT3 and dry the silica membrane. The 2ml collection tube
itself was discarded this time and the spin column placed into a new 1.5ml centrifuge
tube. Twenty microliters (20µl) of dH2O were added into the spin column and the tube was again
centrifuged, this time for 1 minute at 11,000 rpm, to elute the DNA. The spin column
was then discarded and the centrifuge tube lid shut.
3.9 DNA Extraction of PCR Products from Agarose Gels
On occasion, some samples needed to be extracted from an agarose gel following
electrophoresis. The Macherey-Nagel DNA extraction protocol was followed to
undertake this task.
The required DNA band(s) were located with the aid of a UV transilluminator, cut from the
agarose gel using a clean scalpel, and each individual sample placed into a separate 1.5ml
centrifuge tube. Two hundred
microliters (200µl) of Buffer NT was added to each centrifuge tube and each tube then placed
into a heatblock set at a constant temperature of 50°C. Every 2½ minutes, the tubes were
removed from the heatblock and vortexed briefly before being returned to incubate further.
The incubation and vortexing of the sample(s) continued until the gel inside the tubes had
completely dissolved. A NucleoSpin® Extract II Column was placed into a 2ml
collection tube. The sample from the centrifuge tube was transferred into the spin column
and centrifuged at 11,000 rpm for 1 minute to bind the DNA to the silica membrane.
The flow-through inside the collection tube was discarded. Seven hundred microliters
(700µl) of Buffer NT3 was then added to the spin column, and the column was centrifuged for
1 minute at 11,000 rpm to wash the silica membrane. The flow-through in the collection tube was
discarded once more. The sample was then centrifuged for 2 minutes at 11,000 rpm to
remove any remaining Buffer NT3 and dry the silica membrane. The collection tube
was discarded this time and the spin column placed into a new 1.5ml centrifuge tube. Twenty
microliters (20µl) of Buffer NE was added to the spin column to elute the DNA, and the sample
was centrifuged again for 1 minute at 11,000 rpm.
3.10 RFLP Analysis
Restriction Fragment Length Polymorphism (RFLP) analysis is a technique used to
determine the sites at which DNA has been cleaved by a restriction endonuclease into
linear form (if plasmid) or into two or more fragments. Here, the hierarchical method has
been used, targeting the haplogroup-defining SNPs with specific restriction
endonucleases (Tambets et al. 2004; Quintana-Murci et al. 2004).
Each digest contains reaction buffer, enzyme and dH2O and in some cases BSA (BstNI,
HaeII, HhaI, HincII, MnlI, MseI and NlaIII) in a reaction volume of 20µl (Table 3.6).
Samples are incubated at 37°C for 2 hours.
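The make-up of a 20µl digest can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the fixed volumes listed in Table 3.6 (1µl enzyme, 12µl purified PCR product, 10x stocks diluted to 1x); the function name `digest_mix` and its parameters are hypothetical and are not part of the protocol.

```python
def digest_mix(with_bsa: bool, total_ul: float = 20.0) -> dict:
    """Volumes (µl) for one restriction digest, following Table 3.6."""
    mix = {
        "10x NE reaction buffer": total_ul / 10,  # 10x stock diluted to 1x
        "enzyme (5 units)": 1.0,
        "purified PCR product": 12.0,
    }
    if with_bsa:
        mix["10x BSA"] = total_ul / 10            # 10x stock diluted to 1x
    # Water makes up whatever volume remains.
    mix["dH2O"] = total_ul - sum(mix.values())
    return mix


without_bsa = digest_mix(with_bsa=False)
with_bsa = digest_mix(with_bsa=True)
```

Under these assumptions the water volume comes out at 5µl without BSA and 3µl with BSA, which agrees with the two mixes tabulated.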
Table 3.6: Reaction mixes for the restriction digests with and without BSA

Without BSA
Reagent                               Volume Added (µl)   Final Concentration
10x NE Reaction Buffer (1, 2, 3, 4)   2                   1x
Enzyme                                1                   5 units
dH2O                                  5                   -
Purified PCR Product                  12                  -
Total Volume                          20                  -

With BSA
Reagent                               Volume Added (µl)   Final Concentration
10x NE Reaction Buffer (1, 2, 3, 4)   2                   1x
Enzyme                                1                   5 units
10x BSA                               2                   1x
Purified PCR Product                  12                  -
dH2O                                  3                   -
Total Volume                          20                  -

3.11 DNA Sequencing
3.11.1 Haplogroup Identification
Some DNA samples could not be assigned by RFLP analysis alone and required
further analysis once the RFLP route had been exhausted. These mtDNAs were amplified and sequenced using
the oligonucleotides L6337/H7406, L8215/H8345, L8215/H8861, L9794/H10356 and
L11718/12361 as described above. Amplified mitochondrial DNA segments were sent
for commercial sequencing (GATC Biotech Ltd, London).
3.11.2 Hypervariable Region I
The mtDNA samples assigned to a haplogroup underwent amplification and sequencing
using oligonucleotides (Table 3.7) of the hypervariable segment I (HVS-I) region in
mitochondrial DNA.
Table 3.7: Co-ordinates and sequences of the forward and reverse oligonucleotides and the fragment size
generated for HVS-I analysis.
Oligonucleotide   Sequence 5’-3’                 Co-ordinates
Forward           TCAAAGCTTACACCAGTCTTGTAAACC    (15908-15926)
Reverse           CCTGAAGTAGGAACCAGATG           (16517-16498)
Fragment Size (bp): 590
Polymerase chain reaction consists of 35 cycles of denaturation at 95°C for thirty
seconds, annealing at 55°C for thirty seconds and elongation at 72°C for one minute,
followed by a final elongation stage at 72°C for five minutes.
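As a quick check on the cycling programme described above, the programmed hold time can be totalled. This is only a sketch: it counts the hold steps stated in the text and ignores ramping between temperatures, which depends on the thermocycler; the constant and function names are illustrative.

```python
CYCLES = 35
STEP_SECONDS = {
    "denaturation (95°C)": 30,
    "annealing (55°C)": 30,
    "elongation (72°C)": 60,
}
FINAL_ELONGATION_SECONDS = 5 * 60  # final 72°C stage


def programmed_hold_seconds() -> int:
    """Total hold time in seconds, excluding temperature ramps."""
    return CYCLES * sum(STEP_SECONDS.values()) + FINAL_ELONGATION_SECONDS


total_minutes = programmed_hold_seconds() / 60
```

The steps sum to 4,500 seconds, i.e. 75 minutes of hold time before ramping is taken into account.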
Once amplified and purified, using the glycogen precipitation method (see section 3.7
above), the mtDNA segments were sent for commercial sequencing (GATC Biotech Ltd,
London).
Chapter Four
Results
4. Results
4.1 PCR Amplifications
All mtDNAs were amplified using a series of primer pairs generating fragments of ~300bp-~2.5Kb
(Figure 4.1) and ~0.92Kb-~1.63Kb (Figure 4.2). The fragment sizes generated following
PCR can be found in the previous chapter (Tables 3.2 & 3.3).
Figure 4.1: Amplification of nine overlapping primer pairs and three internal primer pairs as described in
Torroni et al. (1997). M = DNA Ladder; 2-Log DNA Ladder 0.1-10.0 Kb (New England Biolabs). Lane 2
= primer pair 1, Lane 3 = primer pair 2, Lane 4 = primer pair 3, Lane 5 = primer pair 4, Lane 6 = primer
pair 5, Lane 7 = primer pair 6, Lane 8 = primer pair 7, Lane 9 = primer pair 8, Lane 10 = primer pair 9,
Lane 11 = primer pair 10, Lane 12 = primer pair 11, Lane 13 = primer pair 12.
Figure 4.2: Amplification of the fifteen overlapping primer pairs as described in Palanichamy et al. (2004).
M = DNA Ladder, as described above. Lane 2 = primer pair 1, Lane 3 = primer pair 2, Lane 4 = primer
pair 3, Lane 5 = primer pair 4, Lane 6 = primer pair 5, Lane 7 = primer pair 6, Lane 8 = primer pair 7, Lane
9 = primer pair 8, Lane 10 = primer pair 9, Lane 11 = primer pair 10, Lane 12 = primer pair 11, Lane 13 =
primer pair 12, Lane 14 = primer pair 13, Lane 15 = primer pair 14, Lane 16 = primer pair 15.
Table 4.1: Size of DNA fragments for each haplogroup characterisation from RFLP analysis; a denotes
primer pairs from Torroni et al., (1997), b denotes primer pairs from Palanichamy et al., (2004).
Haplogroup  PCR Amplification  Enzyme  Characteristic SNP np.  Positive Sample  Negative Sample  Additional Fragments (approximate size)
L3          Primer Pair 10a    HpaI    3594                    610bp            487bp & 123bp    N/A
M           Primer Pair 9b     AluI    10398 & 10400           201bp & 165bp    366bp            ~590bp, ~470bp & 80bp
N           Primer Pair 10b    MnlI    10873                   243bp & 58bp     301bp            ~230bp, ~110bp, ~100bp, ~60bp, ~50bp, ~30bp, ~20bp & ~10bp
C           Primer Pair 12b    HincII  13263                   1,229bp          853bp & 375bp    ~100bp & ~50bp
D           Primer Pair 5b     AluI    5178                    594bp            408bp & 186bp    ~130bp, ~80bp, ~60bp, ~50bp, ~40bp & ~30bp
E           Primer Pair 12b    HphI    13626                   239bp            168bp & 71bp     ~650bp, ~270bp & ~220bp
G           Primer Pair 4b     HhaI    4833                    301bp & 284bp    585bp            ~830bp & ~80bp
R           Primer Pair 12b    MboII   12705                   401bp & 19bp     420bp            ~280bp, ~230bp, ~200bp, ~160bp & ~80bp
A           Primer Pair 1b     HaeIII  663                     800bp & 290bp    1,090bp          ~330bp
I           Primer Pair 3a     HaeII   4529                    1,378bp          718bp & 660bp    N/A
Y           Primer Pair 8b     HaeIII  8392                    322bp            181bp & 141bp    ~360bp, ~270bp, ~200bp, ~160bp & ~30bp
HV          Primer Pair 14b    MseI    14766                   228bp            211bp & 17bp     ~670bp, ~450bp, ~60bp, ~40bp, ~10bp & ~5bp
H           Primer Pair 17b    AluI    7028                    188bp            158bp & 30bp     ~420bp, ~370bp, ~170bp & ~20bp
V           Primer Pair 3a     NlaIII  4580                    1,025bp          738bp & 287bp    ~320bp, ~30bp & ~10bp
TJ          Primer Pair 3a     NlaIII  4216                    377bp & 361bp    738bp            ~320bp, ~290bp, ~30bp & ~10bp
T           Primer Pair 3a     BfaI    4917                    279bp & 39bp     318bp            ~240bp, ~200bp, ~170bp, ~130bp, ~110bp, ~90bp, ~60bp, ~50bp & ~15bp
J           Primer Pair 13b    BstNI   13078                   1,018bp          905bp & 113bp    N/A
Uk-group    Primer Pair 11b    HinfI   12308                   316bp & 137bp    453bp            ~550bp, ~220bp, ~80bp & ~10bp
Table 4.2: Recognition sequences and cut sites of the enzymes used for the haplogroup assignment of
samples. N = any base (A, C, G or T), R = either A or G, W = either A or T, Y = either C or T.
Enzyme   Recognition Sequence & Cut Site (arrow)   For Haplogroup Assay
AluI     AGCT                                      M, D, H
BfaI     CTAG                                      T
BstNI    CCWGG                                     J
HaeII    RGCGCY                                    I
HaeIII   GGCC                                      A, Y
HhaI     GCGC                                      G
HincII   GTYRAC                                    C
HinfI    GANTC                                     Uk-group
HpaI     GTTAAC                                    L3
HphI     GGTGA(N)8                                 E
MboII    GAAGA(N)8                                 R
MnlI     CCTC(N)7                                  N
MseI     TTAA                                      HV
NlaIII   CATG                                      V, TJ
4.2 Haplogroup Characterisations using RFLP Analyses
The initial analysis required DNAs to be amplified using primer pair 10 (Torroni et al.,
1997) and incubated with the restriction endonuclease HpaI for the assignment of
Haplogroup L3 (Figures 4.3 & 4.4). Mitochondrial DNAs which are positive for this
haplogroup (and all other downstream haplogroups) will retain the 610bp fragment from
amplification while those which are negative will produce two fragments; one 487bp and
another 123bp. No other fragments are generated from this assay.
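The expected fragment sizes for an assay of this kind can be predicted by locating the recognition sequence in the amplicon and cutting there. The following is a minimal sketch, not the procedure used in this study: the two toy amplicons are artificial sequences built only to reproduce the 610bp and 487bp + 123bp outcomes, the function names are hypothetical, and the `cut_offset` of 3 assumes that HpaI cuts in the middle of its six-base site. Only the given strand is scanned, which is sufficient for palindromic sites such as GTTAAC but not for asymmetric ones such as those of HphI, MboII or MnlI.

```python
import re

# IUPAC codes used in Table 4.2, expanded to regular-expression classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}


def site_positions(amplicon: str, recognition: str) -> list:
    """0-based start positions of every match of a recognition sequence."""
    pattern = "".join(IUPAC[base] for base in recognition)
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", amplicon.upper())]


def fragment_lengths(amplicon: str, recognition: str, cut_offset: int) -> list:
    """Fragment lengths after complete digestion of a linear amplicon.

    cut_offset: bases from the start of the site to the cut (assumed value).
    """
    cuts = sorted({p + cut_offset for p in site_positions(amplicon, recognition)})
    cuts = [c for c in cuts if 0 < c < len(amplicon)]
    edges = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(edges, edges[1:])]


# Toy 610bp amplicons: one carrying the HpaI site, one with the site altered.
cut_amplicon = "C" * 484 + "GTTAAC" + "C" * 120
uncut_amplicon = "C" * 484 + "GCTAAC" + "C" * 120

cleaved = fragment_lengths(cut_amplicon, "GTTAAC", cut_offset=3)
retained = fragment_lengths(uncut_amplicon, "GTTAAC", cut_offset=3)
```

With the site present the toy amplicon yields fragments of 487bp and 123bp; with the site lost the 610bp fragment is retained, mirroring the negative and positive outcomes of the Haplogroup L3 assay.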
Figure 4.3: 2% agarose gel of HpaI restriction digests for Haplogroup L3 characterisation. All samples
retain the 610bp amplified fragment. M = DNA Ladder as previously mentioned.
Figure 4.4: 2% agarose gel of HpaI restriction digests on Afghan DNAs. All samples have the 610bp
fragment, sample ‘PC’ is a positive control and has two fragments, 487bp & 123bp in size; M = DNA
Ladder as previously described.
Following the analysis for Haplogroup L3, samples will be assessed for Haplogroup M
(Figures 4.5-7) using the enzyme AluI. DNAs exhibiting the polymorphisms at nps 10398
and 10400, are positive for Haplogroup M, and will generate fragments of 201bp and
165bp while those that do not will engender a fragment of 366bp. In addition to these
diagnostic fragments, other fragments of ~590bp, ~470bp and ~80bp will also be
generated.
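Reading a digest lane amounts to asking which set of diagnostic bands is present while ignoring the additional fragments. A minimal sketch of that logic is given below; the function `call_digest`, the 5% sizing tolerance and the example band estimates are assumptions made for illustration and do not describe how the gels in this study were scored.

```python
def call_digest(observed_bp, positive_bp, negative_bp, tolerance=0.05):
    """Return 'positive', 'negative' or 'ambiguous' for one digest lane.

    observed_bp: band sizes estimated from the gel (bp).
    positive_bp / negative_bp: diagnostic bands expected for each outcome.
    tolerance: allowed relative sizing error (assumed value).
    """
    def present(expected):
        return any(abs(band - expected) <= tolerance * expected
                   for band in observed_bp)

    has_positive = all(present(band) for band in positive_bp)
    has_negative = all(present(band) for band in negative_bp)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # Neither pattern, or both (e.g. a partial digest): do not call the lane.
    return "ambiguous"


# Haplogroup M assay: 201bp & 165bp if positive, 366bp if negative.
lane_a = call_digest([590, 470, 200, 165, 80], (201, 165), (366,))
lane_b = call_digest([590, 470, 370, 80], (201, 165), (366,))
lane_c = call_digest([590, 470, 366, 201, 165, 80], (201, 165), (366,))
```

A fixed relative tolerance is a simplification. With a 5% window, bands of 401bp and 420bp could not be told apart, so any assay whose diagnostic bands are that close in size needs a finer ladder and a tighter tolerance.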
Figure 4.5: 2% agarose gel of AluI digests for Haplogroup M characterisation. ‘P’ denotes the PCR product
while ‘D’ denotes the restriction digest, M = DNA Ladder as previously described. Samples D1, D3, D4 &
D6 each have a 366bp fragment, samples D2 & D5 have both a 201bp & 165bp fragment. Additional
fragments of ~590bp, ~470bp and 80bp are also present.
Figure 4.6: 2% agarose gel of primer pair 9 (Palanichamy et al., 2004) PCR products and AluI digests for
Haplogroup M assignment. Samples D7, D8, D10, D11 & D12 each have the 366bp fragment while sample
D9 has the 201bp & 165bp fragments. All possess the additional fragments; ~590bp, ~470bp & 80bp in
size.
Figure 4.7: 2% agarose gel of PCR products and restriction digests for analysis of the Haplogroup M
characteristic. M = DNA Ladder. Samples D13, D14 & D15 have the 366bp fragment; samples D16, D17
& D18 have the 201bp & 165bp fragments.
The samples which have been identified as negative for Haplogroup M, by possessing a
366bp fragment, will be examined for the polymorphism at np 10873 which is
characteristic of haplogroup N (Figures 4.8 & 4.9). Samples belonging to Haplogroup N,
or its downstream haplogroups, will produce a fragment of 243bp following digestion
with MnlI, while samples which do not bear the polymorphism will have a 301bp
fragment instead and will be classified as Haplogroup L3. Additional fragments
generated, which do not determine haplogroup designation, from this assay include
~230bp, ~110bp, ~100bp and <100bp fragments of ~60bp, ~50bp, ~30bp, ~20bp and
~10bp.
Figure 4.8: 2% agarose gel of PCR products (P) and digests (D) with MnlI for Haplogroup N
characterisation. PCR products are ~1.2Kb in size, sample D2 consists of a band at 301bp, samples D1, D3
& D4 have the 243bp band. Additional fragments of ~230bp, ~110bp and ~100bp are also visible.
Figure 4.9: 2% agarose gel of Haplogroup N characterisation; PCR products amplified with primer pair 10
(Palanichamy et al., 2004) and digests with MnlI. Samples D5, D7 & D8 have a 243bp
fragment while sample D6 has a band at 301bp instead. Fragments of ~230bp, ~110bp
and ~100bp in size are also observed.
The samples which have been identified as positive for Haplogroup M are
then examined for the characteristics which define the Haplogroups C, D, E and G. The
assay for Haplogroup C (Figures 4.10 & 4.11) requires samples to be analysed with the
endonuclease HincII, which will produce a fragment of ~1.23Kb (1,229bp) if the sample
bears the ‘C’ polymorphism or two fragments of 853bp and 375bp in size if it does not.
Additional fragments include one ~100bp and one ~50bp in size.
Figure 4.10: 2% agarose gel of PCR products (P) and digest products (D) following incubation with HincII
for Haplogroup C characterisation. Sample D1 has bands at 853bp & 375bp, sample D2 has a single band
~1.23Kb (1,229bp) in size; M = DNA ladder. The additional fragment of ~100bp is observed.
Figure 4.11: 2% agarose gel of PCR products and digest products following incubation with HincII for
Haplogroup C characterisation; samples D3 & D6 have a band of ~1.23Kb (1,229bp) and samples D4, D5
& D7 have bands of 853bp & 375bp. An additional band of ~100bp is observed in each digest.
Those samples which did not generate the ~1.23Kb (1,229bp) fragment distinctive of
Haplogroup C are then investigated for the characteristic representative of Haplogroup D
(Figure 4.12) using AluI. Samples will either generate a band of 594bp if positive or bands
of 408bp and 186bp if they are not. The additional fragments of ~130bp, ~80bp, ~60bp,
~50bp, ~40bp and ~30bp are also produced.
Figure 4.12: 2% agarose gel of PCR product (P) and cleaved DNA products (D) following incubation with
the endonuclease AluI for Haplogroup D characterisation. Samples D1 & D7 have a 594bp band; samples
D2, D3, D4, D5 & D6 have bands of 408bp & 186bp. The generic ~130bp band from this assay is also
present in each digest.
The restriction endonuclease HphI is used to classify samples into Haplogroup E (Figure
4.13) based upon the presence or absence of a polymorphism at np 13626. Fragments of
168bp and 71bp in size signify the absence of the characteristic while a 239bp fragment
indicates its presence and the sample belonging to the haplogroup. Additional fragments
of ~650bp, ~270bp and ~220bp are also generated.
Figure 4.13: 2% agarose gel of PCR amplifications (P) and endonuclease digestions of these amplifications
(D) with HphI for Haplogroup E classification. No digests exhibit the characteristic 239bp band, and all
possess bands of 168bp & 71bp. The additional fragments of ~650bp, ~270bp and ~220bp are also
observed in each digest.
All those samples which have not yet been assigned to haplogroups C, D or E, are then
assessed for the feature specific for Haplogroup G (Figure 4.14). The endonuclease HhaI
is used to identify the presence or absence of this characteristic at np. 4833. Positive
samples will have bands of 301bp and 284bp present while negative samples will have a
585bp band instead. This assay also produces non-characteristic fragments of ~830bp
and ~80bp.
Figure 4.14: 2% agarose gel of PCR products (P) and digested amplifications (D) with the endonuclease
HhaI for Haplogroup G assignment. Only sample D5 has bands at 301bp & 284bp while the other digests
have a 585bp band. The additional fragment of ~830bp is present among all digests.
The samples which have been recognised as having the characteristic polymorphism of
Haplogroup N, are now examined for the distinguishing polymorphism for Haplogroup R
(Figures 4.15-17) at np. 12705 using the endonuclease MboII. From the resulting
incubation, samples will either possess a 401bp (positive for Haplogroup R), or a 420bp
sized band (negative). To determine the difference between the two bands, a 20bp DNA
Ladder was used to accurately identify the band sizes. This assay does generate
additional fragments of ~280bp, ~230bp, ~200bp, ~160bp and ~80bp in size.
Figure 4.15: 2.5% agarose gel of DNAs digested with the endonuclease MboII for Haplogroup R
characterisation. ‘L’ denotes the 20bp DNA ladder (20bp-1Kb). Samples 1 & 3 have a 401bp band, sample
2 has a 420bp band. Each digest also possesses the standard ~280bp, ~230bp, ~200bp and ~160bp bands
generated from this assay.
Figure 4.16: 2.5% agarose gel of amplified DNAs digested with the endonuclease MboII for the assignment
of Haplogroup R. L = 20bp DNA ladder. Samples 4, 6, 7 & 8 have a 401bp band, sample 5 has a 420bp
band. The additional ~280bp, ~230bp, ~200bp and ~160bp fragments are also present.
Figure 4.17: 2.5% agarose gel of amplified DNAs digested with MboII for characterisation of Haplogroup
R. Samples 9 & 12 have a 420bp band, while samples 10, 11 & 13 have a band 401bp in size. The
additional bands (~280bp, ~230bp, ~200bp and ~160bp) are also present.
The samples which possessed the 420bp band and are therefore negative for Haplogroup
R are then investigated for the characteristics for Haplogroups A, I and Y. Figure 4.18
illustrates the outcome from samples that were analysed with the endonuclease HaeIII for
the assignment of Haplogroup A, which generates either 800bp and 290bp
bands if the sample is positive, or a ~1.1Kb (1,090bp) band if negative. One extra
fragment, ~330bp in size, is generated from the assay with this restriction enzyme.
Figure 4.18: 2% agarose gel of amplified DNAs processed by the endonuclease HaeIII for Haplogroup A
classification. Samples 1, 7 & 8 have bands 800bp & 290bp in size, while samples 2, 3, 4, 5 & 6 have a
band sized at ~1.1Kb (1,090bp). All samples also possess the standard ~330bp band from this assay; M =
DNA ladder (100bp-10.0Kb).
Haplogroup I (Figure 4.19) is characterised by a polymorphism at np. 4529; following
incubation with HaeII, samples will either generate bands of 718bp and 660bp in size if
they are negative or will retain the amplified fragment of ~1.4Kb (1,378bp) in size if
they are positive. There are no other fragments generated.
Figure 4.19: 2% agarose gel of RFLP analysis using HaeII on amplified DNAs for Haplogroup I
classification. All samples have bands of 718bp & 660bp in size; M = DNA ladder.
Samples are incubated with HaeIII, this time to characterise for Haplogroup Y (Figure
4.20). Samples which exhibit the Haplogroup Y polymorphism will produce a band of
322bp, while samples which do not will generate bands of 181bp and 141bp. The
additional fragments produced from the restriction enzyme activity are sized at ~360bp,
~250bp, ~200bp, ~160bp and ~30bp. A 50bp DNA ladder has been used here to
differentiate between the characteristic bands and other bands that will present.
Figure 4.20: 2% agarose gel of DNAs digested with the endonuclease HaeIII for Haplogroup Y
assignment. All samples have bands of 181bp & 141bp and possess the additional bands of ~360bp,
~250bp, ~200bp and ~160bp in size; DL = 50bp DNA ladder (50bp-650bp).
The samples which have been recognised as being positive for Haplogroup R, are then
analysed for the polymorphism representative of Haplogroup HV (Figure 4.21) at np.
14766 using MseI. Samples that are positive will not be cleaved at the polymorphic site
and will generate a 228bp band, while negative samples will produce a 211bp band
instead. Analogously to the assay for Haplogroup R, a 20bp DNA ladder was used to
accurately determine the band sizes. In addition to these fragments which determine
whether a sample belongs to haplogroup HV or not, bands of ~670bp, ~450bp, ~60bp,
~40bp, ~10bp and 5bp will be generated.
Figure 4.21: 2.5% agarose gel of DNAs following incubation with the endonuclease MseI for Haplogroup
HV characterisation. Samples 1, 2, 3, 4, 5 & 8 have a 211bp band, samples 6, 7, 9 & 10 have a 228bp band;
L = 20bp DNA ladder.
The samples which have been identified as being positive for Haplogroup HV are then
analysed for Haplogroup H and V. The polymorphism at np. 7028 is representative of
Haplogroup H, whereas a polymorphism at np. 4580 is characteristic of Haplogroup V.
Samples were incubated with the endonuclease AluI for Haplogroup H (Figure 4.22)
classification which will generate either a 188bp band if positive or a 158bp band if
negative. In addition to these bands, bands of ~420bp, ~370bp, ~170bp and ~20bp will
also be present regardless of the outcome and consequently the 20bp DNA ladder has
been used again to differentiate between the bands present. If the 158bp band is present,
then samples will be incubated with NlaIII for Haplogroup V assignment (Figure
4.23). As a result, positive samples will engender a ~1Kb (1,025bp) band and bands of
738bp and 287bp will be present for negative samples. The assay for haplogroup V
determination will also generate additional bands of ~320bp, ~30bp and ~10bp in size.
Figure 4.22: 2.5% agarose gel of DNAs digested with AluI for Haplogroup H classification. Samples 1, 2,
& 3 have the 188bp band, while sample 4 has the 158bp band. Additional bands sized at ~420bp, ~370bp
& ~170bp are also present; L = 20bp DNA ladder.
Figure 4.23: 2% agarose gel of digested DNAs following incubation with NlaIII for Haplogroup V
assignment. All samples have bands 738bp & 287bp in size as well as the standard ~320bp band generated
from this assay; M = DNA ladder.
The DNAs that were negative for Haplogroup HV are here examined for Haplogroup TJ.
Following incubation with NlaIII, samples will produce bands of 377bp and 361bp
if the representative polymorphism is present, while the presence of a 738bp band is
indicative of an absence of the polymorphism. The assay with this restriction
endonuclease will engender additional bands of ~320bp, 287bp, ~30bp and ~10bp in
size.
Figure 4.24: 2% agarose gel of digested DNAs following incubation with the endonuclease NlaIII for
Haplogroup TJ classification. Samples 5 & 8 both have bands 377bp & 361bp in size while samples 1, 2, 3,
4, 6 & 7 have a band at 738bp. The additional bands of ~320bp & 287bp are also observed; M = DNA
ladder.
The samples identified as being positive for haplogroup TJ are then assessed for
haplogroups T and J, while those that are negative are analysed for haplogroup Uk-group.
Haplogroup T (Figure 4.25) requires samples to be incubated with BfaI which will
provide bands of 279bp and 39bp if a sample is positive or alternatively if a sample is
negative, a 318bp band will be present. Additional bands sized at ~240bp, ~200bp,
~170bp, ~130bp, ~110bp, ~90bp, ~60bp, ~50bp and 13bp are also generated.
Figure 4.25: 2% agarose gel of DNAs digested by the endonuclease BfaI for the characterisation of
Haplogroup T. Samples 1 & 2 have a band 318bp in size, while sample 3 has one at 279bp. The additional
bands of ~240bp, ~200bp, ~170bp, ~130bp & ~110bp are also visible; DL = 50bp DNA ladder.
The assay for Haplogroup J (Figure 4.26) utilises the endonuclease BstNI at np. 13078.
The enzyme will either cleave the DNA, producing bands of 905bp and 113bp, in the event of a
sample being negative, or the DNA will retain its amplified length of ~1Kb (1,018bp)
if the sample is positive. No other fragments are generated from this assay.
Figure 4.26: 2% agarose gel of PCR products digested with the endonuclease BstNI for Haplogroup J
classification. Sample 1 has a band ~1Kb (1,018bp) in size, while sample 2 has bands of 905bp & 113bp in
size; M = DNA ladder.
The analysis of Haplogroup Uk-group (Figure 4.27) employs HinfI to recognise the
presence or absence of a polymorphism at np. 12308. The presence of the polymorphism
indicates that the sample can be assigned to Haplogroup Uk-group and will engender
bands of 316bp and 137bp, however, an absence of the polymorphism will generate a
453bp sized band and the sample will not be classified as belonging to this haplogroup.
The additional fragments engendered from the endonuclease activity will be ~550bp,
~220bp, ~80bp and ~10bp in size.
Figure 4.27: 2% agarose gel of DNAs following incubation with the endonuclease HinfI for the assignment
of Haplogroup Uk-group. All samples have a band 453bp in size that is indicative of the absence of the
defining mutation for haplogroup Uk-group. In addition to this band, the ~550bp & ~220bp bands are also
observed; DL = 50bp DNA ladder.
4.3 Haplogroup Characterisations using DNA Sequencing
Once all RFLP analyses have been performed, samples still unassigned will be assessed
for further haplogroup-defining SNPs, selected according to their completed RFLP
results. For example, outstanding samples which are positive for Haplogroup M will be
screened for a SNP at np. 5843 (Haplogroup Q) and np. 9090 (Haplogroup Z). Samples
positive for Haplogroup N, but negative for Haplogroup R will be assessed for SNPs at
np. 8404 (Haplogroup S), np. 11947 (Haplogroup W) and np. 6371 (Haplogroup X).
Finally, those which are positive for both Haplogroups N and R will be examined for a 9
bp deletion at nps. 8281-8289 (Haplogroup B) and a SNP at np. 6392 (Haplogroup F).
Any samples that have still not been assigned at this point will be assigned to
Haplogroups M, N and R respectively.
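The hierarchical order of testing described in sections 4.2 and 4.3 can be summarised as a short decision procedure. The sketch below is an illustration only: the function `assign_haplogroup` and the representation of results as a set of positive markers are hypothetical, and the "unassigned" outcome for a sample negative for Haplogroup L3 is an assumption, since such samples fall outside the assays described.

```python
def assign_haplogroup(positive: set) -> str:
    """Walk the testing hierarchy; `positive` holds every marker found positive."""
    def first_of(candidates, default):
        # First candidate (in testing order) that is positive, else the parent.
        return next((h for h in candidates if h in positive), default)

    if "L3" not in positive:
        return "unassigned"  # assumption: not covered by the assays described
    if "M" in positive:
        # RFLP assays for C, D, E and G, then sequencing for Q and Z.
        return first_of(["C", "D", "E", "G", "Q", "Z"], "M")
    if "N" not in positive:
        return "L3"
    if "R" not in positive:
        # RFLP assays for A, I and Y, then sequencing for S, W and X.
        return first_of(["A", "I", "Y", "S", "W", "X"], "N")
    if "HV" in positive:
        return first_of(["H", "V"], "HV")
    if "TJ" in positive:
        return first_of(["T", "J"], "TJ")
    # RFLP assay for Uk-group, then sequencing for B and F.
    return first_of(["Uk-group", "B", "F"], "R")


example = assign_haplogroup({"L3", "N", "R", "HV", "H"})
```

A sample positive only for the Haplogroup M marker is returned as M, and one positive for N and R but for none of their sub-lineages is returned as R, matching the fall-back assignments described above.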
Table 4.3: SNP sites of haplogroups characterised via DNA sequencing
Haplogroup   SNP Site
Q            5843
Z            9090
S            8404
W            11947
X            6371
B            9bp deletion 8281-8289
F            6392
Table 4.4: Sequencing results for the samples analysed for Haplogroups Q & Z.
                Haplogroup Q                             Haplogroup Z
Sample Number   rCRS nucleotide   Nucleotide of sample   rCRS nucleotide   Nucleotide of sample
                at SNP site       at SNP site            at SNP site       at SNP site
7               A5843             A5843                  T9090             T9090
18              A5843             A5843                  T9090             T9090
40              A5843             A5843                  T9090             T9090
41              A5843             A5843                  T9090             T9090
51              A5843             A5843                  T9090             T9090
105             A5843             A5843                  T9090             T9090
113             A5843             A5843                  T9090             C9090
114             A5843             A5843                  T9090             T9090
116             A5843             A5843                  T9090             T9090
191             A5843             A5843                  T9090             T9090
Table 4.5: Sequence results for samples examined for the characteristic SNPs of Haplogroups S, W & X.
                Haplogroup S                             Haplogroup W                             Haplogroup X
Sample Number   rCRS nucleotide   Nucleotide of sample   rCRS nucleotide   Nucleotide of sample   rCRS nucleotide   Nucleotide of sample
                at SNP site       at SNP site            at SNP site       at SNP site            at SNP site       at SNP site
8               T8404             T8404                  A11947            A11947                 C6371             C6371
15              T8404             T8404                  A11947            A11947                 C6371             T6371
28              T8404             T8404                  A11947            A11947                 C6371             T6371
30              T8404             T8404                  A11947            A11947                 C6371             C6371
31              T8404             T8404                  A11947            A11947                 C6371             C6371
99              T8404             T8404                  A11947            A11947                 C6371             T6371
102             T8404             T8404                  A11947            A11947                 C6371             T6371
110             T8404             T8404                  A11947            A11947                 C6371             C6371
117             T8404             T8404                  A11947            A11947                 C6371             C6371
142             T8404             T8404                  A11947            A11947                 C6371             C6371
Table 4.6: Sequencing results of samples analysed for Haplogroups B & F.
                Haplogroup B                                   Haplogroup F
Sample Number   rCRS at               Sequence of sample at    rCRS nucleotide   Nucleotide of sample
                characteristic site   characteristic site      at SNP site       at SNP site
13              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             C6392
20              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
21              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
23              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
24              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
25              A[CCCCCTCTA]G         A[---------]G            -                 -
27              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
35              A[CCCCCTCTA]G         A[---------]G            -                 -
38              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
47              A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
118             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
133             A[CCCCCTCTA]G         A[---------]G            -                 -
134             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
151             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
162             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
168             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
170             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
171             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
173             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
177             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
183             A[CCCCCTCTA]G         A[CCCCCTCTA]G            T6392             T6392
Based upon the results of both RFLP analysis and DNA sequencing, the samples can be
placed into haplogroups (Table 4.7).
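The comparisons reported in Tables 4.4 to 4.6 follow one rule: a sample is called for a haplogroup when its base at the diagnostic site differs from the rCRS, or when the 9bp stretch at nps. 8281-8289 is missing. A minimal sketch of that rule is shown below. The function `sequencing_calls` and its input format are hypothetical, and treating any difference from the rCRS base as diagnostic is a simplifying assumption (the tables only report one alternative base per site).

```python
# Diagnostic sites and rCRS bases as listed in Tables 4.3 to 4.6.
RCRS_BASE = {5843: "A", 9090: "T", 8404: "T", 11947: "A", 6371: "C", 6392: "T"}
SITE_HAPLOGROUP = {5843: "Q", 9090: "Z", 8404: "S", 11947: "W", 6371: "X", 6392: "F"}


def sequencing_calls(sample_bases, region_8281_8289=None):
    """Haplogroups indicated by sequencing results for one sample.

    sample_bases: {nucleotide position: base observed in the sample}.
    region_8281_8289: the nine bases read at nps. 8281-8289, written as
        dashes when deleted (as in Table 4.6); None if not sequenced.
    """
    calls = [SITE_HAPLOGROUP[site]
             for site, base in sorted(sample_bases.items())
             if base != RCRS_BASE[site]]
    if region_8281_8289 is not None and set(region_8281_8289) == {"-"}:
        calls.append("B")  # 9bp deletion present
    return calls


sample_113 = sequencing_calls({5843: "A", 9090: "C"})
sample_15 = sequencing_calls({8404: "T", 11947: "A", 6371: "T"})
sample_13 = sequencing_calls({6392: "C"}, "CCCCCTCTA")
sample_25 = sequencing_calls({}, "---------")
```

Applied to the tabulated results, sample 113 is called for Haplogroup Z, sample 15 for X, sample 13 for F and sample 25 for B, while a sample matching the rCRS at every site returns no call and keeps its RFLP assignment.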
Table 4.7: All samples and the Haplogroups to which they belong.
Sample Number   Haplogroup   Sample Number   Haplogroup
1               A            113             Z
2               D            114             M
5               C            115             D
6               H            116             M
7               M            117             N
8               N            118             R
10              C            119             A
11              TJ           120             C
13              F            121             H
15              X            122             H
18              M            123             TJ
19              C            124             H
20              R            125             H
21              R            128             C
23              R            129             HV
24              R            130             D
25              B            131             T
27              R            133             B
28              X            134             R
30              N            135             C
31              N            136             D
32              H            138             J
33              H            139             HV
34              L3           140             HV
35              B            141             HV
38              R            142             N
39              L3           143             H
40              M            145             HV
41              M            148             J
43              J            149             H
44              HV           151             R
47              R            162             R
48              H            168             R
49              TJ           170             R
50              T            171             R
51              M            172             H
80              HV           173             R
97              T            175             H
98              J            176             TJ
99              X            177             R
100             H            183             R
101             D            186             H
102             X            187             G
103             A            188             H
104             H            189             TJ
105             M            190             H
106             L3           191             M
107             L3           193             D
108             L3           198             HV
109             H            200             D
110             N
Chapter Five
Phylogeographic Analysis of Afghani
Mitochondrial DNAs
5. Phylogeographic Analysis of Afghan mtDNAs
As mentioned in Chapter 1, and as has been extensively reported over the last
two decades, populations which share a geographical region often share the same or
similar genetic traits in the mitochondrial genome, known as haplogroups (Wallace et al.,
1999; Al-Zahery et al., 2003; Hedman et al., 2007; Richard et al., 2007; Tetzlaff et al.,
2007; Zimmerman et al., 2007; Jin et al., 2009). These haplogroups are often
continent-specific and can also provide an indication of historical migrations of modern humans.
For instance, the Amerindian lineages (haplogroups A-D) can also be found among
Asian populations (Schurr et al., 1990). The L haplogroups (L1, L2 & L3) are often
referred to as the African lineages (Kivisild et al., 1999; Quintana-Murci et al., 2004;
Butler, 2005), groups M and U (in particular U2) are typical of South Asia (Kivisild et
al., 1999; Quintana-Murci et al., 2004), haplogroups HV, JT, U, K, N, I, W & X are
common in west Eurasian populations (Kivisild et al., 1999; Quintana-Murci et al., 2004;
Butler, 2005; Nasidze et al., 2006, 2007) and groups A, B, C, D, E, F, G, Z & M (in
particular sub-group M7) are identified as east Eurasian/East Asian lineages (Kivisild et
al., 1999; Quintana-Murci et al., 2004; Butler, 2005; Zlojutro et al., 2008; Irwin et al.,
2009a). Amongst the populations of western Eurasia, such as Armenia, Georgia,
Azerbaijan, Iraq and Iran, the most common haplogroups found in frequencies ≥10% are
HV, H, J, T and U (Kivisild et al., 1999; Al-Zahery et al., 2003; Comas et al., 2004;
Quintana-Murci et al., 2004; Derenko et al., 2007), while in south Asia, haplogroups M
and U are very common in India (Kivisild et al., 1999) and haplogroups H, M and U in
Pakistan (Kivisild et al., 1999b; Quintana-Murci et al., 2004). Among Central Asian
populations such as those in Turkmenistan, Uzbekistan, Tajikistan and Kyrgyzstan,
frequent haplogroups include C, D, HV/H and U (Comas et al., 2004; Quintana-Murci et
al., 2004; Derenko et al., 2007).
Table 5.1: Frequencies of the regional haplogroup lineages in the Afghanistan populations (%).
Population       East Asian   West Eurasian   South Asian   African
                 Lineages     Lineages        Lineages      Lineages
Hazara           37.5         40.0            15.0          7.5
Tajik            10.5         89.5            0.0           0.0
Baloch           13.4         73.4            13.3          0.0
Pashtun          14.3         64.3            7.1           14.3
Afghan (total)   21.8         64.4            8.9           5.0
The mtDNA data identified the presence of 17 different haplogroups, all belonging to
either East Asian (A, B, C, D, F, G and Z), West Eurasian (HV, H, JT, J, T, N and X),
African (L3) or South Asian (M and R) lineages. Approximately 65% of the lineages
found belong to the West Eurasian collection of haplogroups (Table 5.1) indicating a
greater affinity with West Eurasian populations than any other region. This can be
attributed to the significant population pressures applied on the Afghan population by
multiple invasions and conquests of the Afghan lands by western groups.
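The regional frequencies are simple proportions of the haplogroup counts. The sketch below tallies the assignments listed in Table 4.7 (101 samples) and groups them by region. The function name and data layout are hypothetical, and one grouping choice is an assumption of this sketch: Haplogroup R is counted with the West Eurasian lineages here, because that grouping reproduces the "Afghan (total)" row of Table 5.1.

```python
# Haplogroup counts tallied from Table 4.7 (101 samples in total).
COUNTS = {"A": 3, "B": 3, "C": 6, "D": 7, "F": 1, "G": 1, "Z": 1,
          "H": 18, "HV": 8, "TJ": 5, "J": 4, "T": 3, "N": 6, "X": 4,
          "L3": 5, "M": 9, "R": 17}

# Regional grouping; placing R under West Eurasian is an assumption (see above).
REGION = {**{h: "East Asian" for h in ("A", "B", "C", "D", "F", "G", "Z")},
          **{h: "West Eurasian" for h in ("H", "HV", "TJ", "J", "T", "N", "X", "R")},
          "L3": "African",
          "M": "South Asian"}


def regional_frequencies(counts, region_of):
    """Percentage of samples per region, rounded to one decimal place."""
    total = sum(counts.values())
    sums = {}
    for haplogroup, n in counts.items():
        region = region_of[haplogroup]
        sums[region] = sums.get(region, 0) + n
    return {region: round(100 * n / total, 1) for region, n in sums.items()}


frequencies = regional_frequencies(COUNTS, REGION)
```

Under this grouping the proportions come to 21.8% East Asian, 64.4% West Eurasian, 8.9% South Asian and 5.0% African.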
Until recently, the region in which Afghanistan lies had, for many years, held a
significant position as a thoroughfare for trade routes and human migrations and
expansions (Barfield, 2010; Haber et al., 2012). The major migrations and invasions of
Afghanistan by ancient Persians, ancient Greeks, Indians and those more recent by the
Arabs and Mongols are likely a consequence of the desire to control the affluent region
brought by the trade routes (Barfield, 2010). The combination of these multiple
expansions has engendered a varied arrangement of ethnic groups which themselves
accommodate a diverse collection of mtDNA types.
5.1 Phylogeography of Individual Haplogroups
5.1.1 African Haplogroup L3
The first African populations of AMHs, including Mitochondrial Eve, originated
sometime between 100-200Kya (Disotell, 1999), or 192.4Kya (95% Confidence Interval
(CI); 151.6-233.6Ky) (Soares et al., 2009). Haplogroup L3 has an East African
origin (Salas et al., 2002) and is the haplogroup attributed to the migration out of
Africa, dated to <100Kya (Disotell, 1999), ~60-80Kya (Salas et al., 2002) or ~55-70Kya
(Soares et al., 2009). This migration gave rise to the macrohaplogroups M and N (and their
subsequent descendant lineages) that are found in all non-Africans (Torroni et al., 2006;
Gonder et al., 2007). The coalescent age of L3 has been estimated as ~71Kya (95%
CI; 57.1-86.6Ky) (Soares et al., 2009) and 94.3 ±9.9Ky (Gonder et al., 2007). The
coalescent age of a lineage is the estimated time at which the modern lineages coalesce, or
merge, into their ancestral lineage or most recent common ancestor (MRCA). The L
haplogroups, including L3, are usually found among sub-Saharan populations, e.g. these
haplogroups constitute 100% of pygmy populations from the Democratic Republic of
Congo (formerly Zaire) and the Central African Republic and ~67% of three groups from
Senegal (Chen et al., 1995). The L haplogroups have been observed in high frequencies
in north-western Africa, ranging from ~70-~90% in Algeria and Morocco and ~33% in
Egypt (Nasidze et al., 2008). The frequency of these haplogroups decreases further toward
the Near East, from ~10% in Israel, Jordan and Iraq and ~5% in
Syria to an absence in western Iran (Nasidze et al., 2008). L3 is absent from Asian
populations and infrequent among the populations of South Asia and western Eurasia, and
has been found at low frequencies in Galicia (north-western Spain) and Catalonia, 2.5%
and 3% respectively (Alvarez-Iglesias et al., 2009).
The Baloch and Brahui populations of Balochistan in Pakistan both exhibit a frequency
of 2.6% (Figure 5.2) for haplogroup L3, while the Makrani population along the south
coast of Pakistan exhibits a much larger frequency of 27.3% (Quintana-Murci et al.,
2004). The Hazara and Pashtun populations have exhibited lower frequencies than the
Makrani, 7.5% and 14.3% respectively, while this lineage is absent among the Tajik and Baloch
populations of Afghanistan (Table 5.2). This infrequent African lineage is likely to have
been introduced into the Afghani population, as well as some adjacent populations, as a
consequence of the Arab invasion in the 7th century.
5.1.2 The Early non-African Lineages
5.1.2.1 Haplogroup M*
Macrohaplogroup M*, along with its sibling N*, accounts for all non-African mtDNAs.
This haplogroup originated during the human migration from Africa 57-75Kya
(Chandrasekar et al., 2009). Soares et al. (2009) provide an age estimate in South Asia
of 49,400 years (95% CI; 39,000-62,200 years) and in East Asia of 60,600 years (95% CI;
47,300-74,300 years). Other estimates include 55-73Ky for the lineage among African
populations (Chen et al., 1995) and 69.3 ±5.4Ky among Chinese populations (Kong et
al., 2003). Haplogroup M* is considered a South Asian lineage due to its significant
contribution and distribution within the Indian population, >70% (Chandrasekar et al.,
2009) and ~60% (Disotell, 1999), and as such exhibits lower frequencies in Central Asia
and western Eurasia. It is believed to have arrived in the Indian subcontinent via the
Southern Route migration from Africa (Disotell, 1999; Macaulay et al., 2005; Torroni et
al., 2006; Chandrasekar et al., 2009; Kumar et al., 2009). Sub-group M7 is a common
lineage found in East Asian populations such as the Korean-Chinese and the Han (Beijing) of
China, Mongolians and Koreans (Jin et al., 2009) and Japanese (Asari et al., 2007). The
Hazara, Baloch and Pashtun populations of Afghanistan exhibit frequencies of
haplogroup M* of 15%, 13.3% and 7.1% respectively (Figure 5.3).
Frequencies of macrohaplogroup M* have been found ranging from 26%-64% within the
Indian subcontinent (Kivisild et al., 1999b; Quintana-Murci et al., 2004) while the
frequencies found within the Afghan populations seem to resemble frequencies found
elsewhere in Central Asia (Figure 5.3).
5.1.2.2 Haplogroup N*
Haplogroup N* is the second macrohaplogroup to have diverged from the African
lineage L3. Its age has been estimated at 64.6 ±6.8Ky (Kong et al., 2003), 61,900 YBP in
west Eurasia (95% CI; 49,200-75,000 years), 71,200YBP in South Asia (95% CI;
55,800-87,100 years) and in East Asia, 58,200YBP (95% CI; 44,100-72,800) (Soares et
al., 2009). Haplogroup N* is the ancestor of many haplogroups found in Europe, the Middle
East, Asia and the Americas (among the Amerindians). The origin of this lineage
occurred soon after or possibly even during the migration out of Africa, and is typically
considered a southwest Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al.,
2004; Nasidze et al., 2006, 2007). Haplogroup N* is fairly common in western Eurasia
and is also present in Europe. A frequency of 5.3% has been reported in eastern Crete
(Martinez et al., 2008) while the combination of haplogroups N*, I, W and X constitutes
approximately 9% of the Finnish population (Hedman et al., 2007). Haplogroup N* is
also found in the Near East and northeast Africa; ~13% in Egypt, ~10% in Israel, Syria
and Jordan, ~5% in Iraq and ~23-44% among five Iranian populations (Nasidze et al.,
2008).
The Hazara exhibit a frequency of 7.5% and the Tajiks 10.5% of haplogroup N* (Figure
5.4). Elsewhere in Central Asia and western Eurasia, the frequency of N* ranges from
2.3% in the Tajiks of Tajikistan (Derenko et al., 2007) to 20% in the South Caspian
region in Iran (Comas et al., 2004). The greater frequencies appear in the more western
populations rather than in Central or South Asia. Haplogroup M* is abundant in South
Asia; however, haplogroup N* appears to be scarce in the mtDNA landscape, with
frequencies of 2.6% and 2.9% in the Brahui of Baluchistan and Gujarati of north-western
India, 3% in both Pakistani and Makrani populations and 7.7% within the Han
Chinese population (Yang et al., 2011). The frequencies exhibited in Central Asia are
similar to those found within the Hazara and Tajiks, with Uzbeks expressing 7.1%
(Quintana-Murci et al., 2004) and the Turkmen population 10% (Comas et al., 2004).
5.1.2.3 Haplogroup R*
As a descendant of the macrohaplogroup N*, haplogroup R* also diverged soon after the
migration from Africa. Along with macrohaplogroups M* and N*, R* is one of the
‘founder lineages’ for Eurasian settlement ~60-65Kya (Torroni et al., 2006). It has an
estimated age of 59,100 years in west Eurasia (CI; 47,100-74,100 YBP), 66,600 years in
South Asia (CI; 52,600-81,000 YBP) and 54,300 years in East Asia (CI; 41,200-67,800
YBP) (Soares et al., 2009), and 62.3 ±6.3Ky (Kong et al., 2003). Haplogroup R* is a
typical west Eurasian and South Asian lineage, largely due to its early divergence from
N* in this region, and can be characterised by an MboII site gain at np 12704 caused by a
transition at np 12705. Within the Finnish population, haplogroup R* makes up <3%
of the maternal gene pool (Hedman et al., 2007). It has often been recorded in Central
and South Asia and western Eurasia; however, its distribution is not uniform throughout
these regions (Figure 5.5). The Karakalpak present a frequency of 10% (Comas et al.,
2004), while in the Gujarati population of northwest India it appears in 8.8% of mtDNAs
and in 1.8% of Georgians (Quintana-Murci et al., 2004).
Within the Afghan populations, the Pashtuns exhibit 28.6%, Tajiks 15.8% and the Hazara
7.5%. Elsewhere, the greatest frequency was found among the Uzbeks, at 20%,
while in the south Caspian region, haplogroup X was found in 2.4% of Persians and 9.5%
of Mazandarians.
The presence of these three lineages within the Afghani populations and the adjacent
populations from Iran, Central Asia and the Indian Subcontinent may be attributable to
this region being the initial territory where haplogroups M*, N* and R* settled following
the human emergence from Africa. Despite each lineage sharing similar coalescent ages,
haplogroup M* is prominent among South Asian populations, particularly among those in
southern India in Andhra Pradesh where its frequency has been recorded at 64% (Figure
5.3). Haplogroups N* and R* appear to have a similar distribution to each other (Figures
5.4-5.5) with greater frequencies found among Iranian and Central Asian populations.
5.1.3 The East Asian Lineages
5.1.3.1 Haplogroup C
Haplogroup C, a derivative of macrohaplogroup M, is accepted to be an East
Asian/Eurasian lineage (Quintana-Murci et al., 2004) that can also be found among the
indigenous peoples of the Americas. This lineage is dated to 33-44Kya (Chen et al.,
1995) and 28,300 years before present (YBP) (95% CI; 19,400-37,400 YBP) (Soares et
al., 2009). The lineage is generally found at low frequencies among Central Asian and
West Eurasian populations (Quintana-Murci et al., 2004), but is more widespread within
the populations of Siberia (Bermisheva et al., 2002) and has been reported at frequencies
of 15-21.3% among Mongolians (Kivisild et al., 1999b; Derenko et al., 2007; Jin et al.,
2009). It can be characterised by an absence of a HincII site at np 13259. Haplogroup C
was found exclusively among the Hazara population (Figure 5.6).
The frequency of haplogroup C among the Hazara of Afghanistan (15%) is similar to
those found in eastern Asia among the Mongolians (17%) and Buryats (16.6%) (Derenko
et al., 2007), Mongolians (15%) (Kivisild et al., 1999b), Mongolians (21.3%) and Thais
(10%) (Jin et al., 2009). Some Central Asian populations also exhibit strong frequencies,
such as the Bukharian Arabs of Uzbekistan (20%) and Kyrgyz (30%) (Comas et al.,
2004) and the Shugnan of Tajikistan (18.2%) (Quintana-Murci et al., 2004).
5.1.3.2 Haplogroup D
Like haplogroup C, haplogroup D is also regarded as an East Asian lineage
(Quintana-Murci et al., 2004) and is also a descendant of macrohaplogroup M. With an estimated
age of 57.4 ±8.2Ky (Kong et al., 2003) and 48,300 years (95% CI; 35,600-61,400YBP)
(Soares et al., 2009), it is older than haplogroup C. It is characterised by an absence of an
AluI site at np 5176. Absent or at low frequencies within South Asian and west Eurasian
populations, haplogroup D exhibits its greatest frequencies within Central Asia and East
Asia. It has been found within these regions ranging from ~30-40% (Asari et al., 2007;
Derenko et al., 2007; Zlojutro et al., 2007; Jin et al., 2009) while the Han population
from China exhibits a frequency of 24.6% (Yang et al., 2011).
The Baloch, Hazara and Tajiks exhibit frequencies of 6.7%, 10% and 10.5% respectively
for haplogroup D. The frequencies exhibited in the Afghan populations are similar to
those found within Central Asian populations, which display frequencies of 5-20% among
the Uzbek groups (Comas et al., 2004; Quintana-Murci et al., 2004) and 6.8% and 15%
within the Tajiks (Derenko et al., 2007; Comas et al., 2004). The Baloch population
shows a similar frequency to the Brahui of Balochistan, southern
Afghanistan/southwest Pakistan (Figure 5.7).
5.1.3.3 Haplogroup G
With an estimated age of 35,700 years (95% CI; 25,500-46,300 years) (Soares et al.,
2009), haplogroup G is another of the East Asian lineages (Quintana-Murci et al., 2004).
Like the previously mentioned East Asian lineages, frequencies among South Asians and
west Eurasians are low, whereas the lineage is more frequent within Central Asia and
East Asia. Within the populations of East Asia it has often been found to exceed 10% of
the mtDNAs; Koreans – 7.7%, Han (Beijing) – 10%, Vietnamese – 16.7%, Mongolians –
17% (Jin et al., 2009), Han Chinese – 3.3% (Yang et al., 2011), Japanese – 8.8% (Asari
et al., 2007), Mongolians – 10.6% and Buryats – 11.3% (Derenko et al., 2007) and
southern Kazakhs – 20% (Comas et al., 2004). Haplogroup G can be characterised by a
HhaI site gain at np 4831 generated by a transition at np 4833.
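The link between a point transition and a restriction-site gain can be illustrated with a short sketch. HhaI recognises the sequence GCGC. The 12-base fragment below is a made-up toy sequence, not the mtDNA reference sequence, and the helper names are illustrative only. An A-to-G transition at the third base of a GCAC motif creates a GCGC site that begins two positions upstream, which mirrors the relationship between np 4833 and np 4831 described above.

```python
# Toy illustration: a single transition can create a restriction site.
# The sequence is hypothetical and is NOT taken from the mtDNA reference.

HHAI_SITE = "GCGC"  # HhaI recognition sequence

# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of `site`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def apply_transition(seq, pos):
    """Return `seq` with the base at 0-based `pos` replaced by its transition partner."""
    return seq[:pos] + TRANSITIONS[seq[pos]] + seq[pos + 1:]


ancestral = "TTAGCACTTAGA"                # hypothetical fragment, GCAC starts at index 3
derived = apply_transition(ancestral, 5)  # A -> G at index 5

print(find_sites(ancestral, HHAI_SITE))   # [] : no HhaI site
print(find_sites(derived, HHAI_SITE))     # [3] : site gained two bases upstream
```

The same two helpers apply to a site loss: a transition inside an existing recognition sequence removes it from the list returned by `find_sites`.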
Haplogroup G was found exclusively within the Pashtun population, exhibiting a
frequency of 7.1%. Within South Asia, the frequency of this lineage is lower, having
been found in only 1% of Pakistani mtDNAs (Quintana-Murci et al., 2004), while its
distribution in Central Asian groups appears to be more similar to the Pashtuns (Figure
5.8). Haplogroup G was found at 5% within the Karakalpaks (western Uzbekistan) and
Tajiks and 10% among eastern Uzbeks (Comas et al., 2004), 4.6% among Tajiks and
8.2% within the Kalmyks (south-western Russia) (Derenko et al., 2007).
5.1.3.4 Haplogroup Z
Haplogroup Z is the final East Asian lineage (Quintana-Murci et al., 2004) that descends
from haplogroup M* to have been identified within the Afghan population. Soares et al.
(2009) estimated its divergence age as 24,300YBP (95% CI; 15,400-33,600 years).
Haplogroup Z is characterised by the occurrence of a transition at np 9090 within the
ATPase6 gene. The Z lineage shares a common origin with haplogroup C and is fairly
frequent within Central and East Asia (Meinilä et al., 2001), especially among the
indigenous populations of northern and eastern Siberia (Bermisheva et al., 2002; Ingman
& Gyllensten, 2007a), but is uncommon in western Eurasians. The Korean, Thai and
Vietnamese populations express frequencies of 0.6%, 5% and 2.1% respectively (Jin et
al., 2009). Ingman & Gyllensten (2007a) identified that haplogroup Z is also present within
the Volga-Ural region of Russia and, additionally, that the lineage is present at low
frequencies among the Saami, indicating that the Z lineage was introduced into the Saami
population by northern Asian populations via the Volga-Ural region.
The Hazara population again illustrates a similarity with populations of Mongolia as both
share similar frequencies of haplogroup Z (Figure 5.9); 2.5% (Hazara) and 2.1% among
Mongolians (Derenko et al., 2007; Jin et al., 2009). The Tajiks, northeast of Afghanistan,
also exhibit a similar frequency, 2.3%, while populations north of Afghanistan, such as
the Karakalpaks and Kazakhs, show a slightly greater frequency of 5% (Comas et al.,
2004). Like haplogroup C, haplogroup Z was found exclusively among the Hazaras, an
observation that may reflect both the common origin the two haplogroups share
before they coalesce to haplogroup M* and the Hazaras' perceived East Asian and
Mongol heritage.
5.1.3.5 Haplogroup A
Haplogroup A, despite being a descendant of macrohaplogroup N*, is typically an East
Asian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004) and is also one of the
founder lineages of the Americas. Haplogroup A has an estimated age of 25-34Ky (Chen
et al., 1995) and 29,200 years (95% CI; 19,100-39,800 years) (Soares et al., 2009) and is
characterised by a HaeIII site gain at np 663 that is generated by a transition at the same
location. Haplogroup A is usually infrequent or absent among west Eurasians and is
generally found in its greatest frequencies in East Asia, but has also been found at a
frequency of 13.2% in African-descendant tribes of Brazil (Carvalho et al., 2008).
The Hazaras (5%) and the Baloch (6.7%) both exhibit similar frequencies of haplogroup
A (Figure 5.10), which in turn are similar to the surrounding populations. Within Central
Asia, populations such as the Khoremian Arabs of western Uzbekistan present a
frequency of 10%, the Dungan and Uighur populations of Kyrgyzstan display
frequencies of 12.5% and 6.4% respectively and the Tajik population 15% (Comas et al.,
2004). A similar frequency (7.5%) was found within the Han population of China (Yang et
al., 2011). Within South Asia, haplogroup A was found in 1% of the population from the
Uttar Pradesh region of northern India (Kivisild et al., 1999b), while other populations
such as the Hunza Burusho of Balochistan, Turkmen, Turkmen Kurds, north-eastern
Persians, Tajiks and Shugnan group from Tajikistan presented frequencies of 2.3-3.1%
(Quintana-Murci et al., 2004; Derenko et al., 2007). Three studies of Mongolian
populations report frequencies of 3.9%, 13% and 4.3% (Kivisild et al., 1999b;
Derenko et al., 2007; Jin et al., 2009).
5.1.3.6 Haplogroup B
Haplogroup B has an estimated age of 50.8 ±6.6Ky (Kong et al., 2003) and 50,700 years,
CI – 38,100-63,800 YBP (Soares et al., 2009). It is a typical East Asian lineage, and
along with lineages A, C and D, is one of the founder groups of the Amerindians
(Kivisild et al., 1999; Quintana-Murci et al., 2004; Irwin et al., 2009a). The greatest
frequencies of haplogroup B are found in the East Asian populations (Figure 5.11) of
China, Korea and Mongolia, and it is less frequent within Central Asia and western Eurasia.
It was found in 0.9% of Iraqi mtDNAs (Al-Zahery et al., 2003), while Korean
populations reported frequencies of ~15% and ~20% (Jin et al., 2009; Derenko et al.,
2007) and Han Chinese 17.8% (Yang et al., 2011). Haplogroup B is typically
characterised by a 9bp deletion at np 8281-8289, which occurs in a small
section of non-coding DNA between the COII and tRNALys genes (Figure 5.12). The
deletion removes one copy of the repeated CCCCCTCTA sequence
within the mtDNA coding region. It also presents itself among some
African populations that do not belong to haplogroup B. Although
populations from two completely different regions possess the same deletion, the
deletions have been shown to have occurred independently of one another rather than being
inherited from a common source (Soodyall et al., 1996). The Hazara
exhibit a frequency of 2.5% and the Pashtuns 7.1% of haplogroup B. Neighbouring
populations from Central Asia exhibit similar frequencies of this lineage: Turkmen
populations display 2.4% and 5%, Turkmen Kurds exhibit 6.3%, and in Uzbekistan the
frequency is 5% among both the Bukharian Arabs and Uzbeks but 10% among the
Khoremian Uzbeks. The Lurs and north-eastern Persians also exhibit similar
frequencies of 5.9% and 6.1% (Comas et al., 2004; Quintana-Murci et al., 2004).
Meanwhile East Asian populations such as the Mongolians, Chinese and Koreans
exhibit this lineage more often among their populations. In Mongolia, haplogroup B
occurs at a frequency of 8.5%, 9.7% and 15.3% (Jin et al., 2009; Kivisild et al., 1999b;
Derenko et al., 2007). Among two populations from eastern China; the Han (Beijing)
and Korean-Chinese, haplogroup B accounts for 10% and 11.8% of mtDNAs, while it is
found in 15.1% and 20.8% of Koreans (Jin et al., 2009; Derenko et al., 2007).
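The 9bp deletion described above for haplogroup B removes one of two tandem copies of the CCCCCTCTA motif. A minimal sketch of how the copy number could be scored in a sequence is given below. The flanking bases are invented placeholders rather than real mtDNA sequence, and the function name is illustrative, not part of any established tool.

```python
# Score the number of tandem copies of the 9-bp motif in a sequence.
# Flanking bases ("ACGT", "TGCA") are hypothetical placeholders.
MOTIF = "CCCCCTCTA"


def tandem_copies(seq, motif=MOTIF):
    """Return the largest number of back-to-back copies of `motif` found in `seq`."""
    best = 0
    start = seq.find(motif)
    while start != -1:
        copies = 0
        pos = start
        # Walk forward one motif length at a time while the motif repeats.
        while seq.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
        start = seq.find(motif, start + 1)
    return best


undeleted = "ACGT" + MOTIF * 2 + "TGCA"  # two copies: no deletion
deleted = "ACGT" + MOTIF + "TGCA"        # one copy: 9-bp deletion present

for label, seq in (("undeleted", undeleted), ("deleted", deleted)):
    status = "deletion" if tandem_copies(seq) == 1 else "no deletion"
    print(label, tandem_copies(seq), status)
```

Because the same deletion arose independently in some African populations, a single copy of the motif on its own would not be enough to assign a sample to haplogroup B.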
5.1.3.7 Haplogroup F
Haplogroup F is the last of the East Asian lineages (Kivisild et al., 1999; Quintana-Murci et
al., 2004; Irwin et al., 2009a) to be found among the Afghan populations. The age of
haplogroup F has been estimated as 43,400 years, CI – 32,200-55,000 YBP (Soares et
al., 2009) and 60.0 ±9.2Ky (Kong et al., 2003). This lineage is characterised by the
absence of a Tsp509I site at np 6389 that is an outcome from a transition at np 6392.
Haplogroup F is present in larger frequencies within populations from East Asia and
east Central Asia (Figure 5.13). It is rarely reported outside Asia. The Chinese Han
present a frequency of 17.7% (Yang et al., 2011) while the Kalmyks of south-western
Russia present a frequency of 5.5% (Derenko et al., 2007) that is greater than that of
populations immediately east of the Caspian Sea.
Haplogroup F was found exclusively among the Hazara at a frequency of 2.5%. This is
similar to the frequencies found in India (1-2%), and among some Uzbeks and Turkmen
populations (2.4%) (Kivisild et al., 1999b; Quintana-Murci et al., 2004). The lineage
has been found in large frequencies among the Kyrgyz – 15%, Kashmir – 21%, and
Uighur – 25% populations (Kivisild et al., 1999b; Comas et al., 2004), while
neighbouring populations all exhibit frequencies <10%. Large frequencies have also
been reported among Mongolians – 14.9%, and the Han (Beijing) – 22.5%, populations
(Jin et al., 2009).
As has been reported, these lineages are particularly common within East Asian and
North Asian populations. Of these lineages, only haplogroup G has not been found
among the Hazaras, indicating that this Afghani ethnic group has had significant East
Asian genetic influence at one or more periods throughout its
history. This influence may be due to a combination of the Yuezhi invasion
shortly before the 1st century BC, the rule of the Chinese Tang dynasty from 659-751
AD (Wilbur, 1962) and the more recent Mongol expansion early in the 13th century. The
East Asian lineages account for 37.5% of Hazara mtDNAs, while
among the Tajiks, Baloch and Pashtuns the contribution is not greater than 14.3% (Table
5.1). This indicates that the East Asian invasions and migrations have not had as significant
an effect on these ethnic groups, and their acquisition of the East Asian lineages
may also be due to some maternal admixture with the Hazaras.
5.1.4 The West Eurasian Lineages
5.1.4.1 Haplogroup X
Haplogroup X is recognised as a west Eurasian lineage (Kivisild et al., 1999;
Quintana-Murci et al., 2004) with an origin in the Near East and West Eurasia (Reidla et al., 2003;
Shlush et al., 2008), and is also found in the Amerindians. It is estimated to have
diverged 20.4 ±6.5Ky (Richards et al., 1998), with a 95% Credible Region (CR) of
13,700-26,600 YBP in the Near East and 17,000-30,000 YBP in Europe (Richards et al., 2000),
and 31,800 years (CI; 19,700-44,600 YBP) (Soares et al., 2009). It can be characterised
by a transition at np 6371 in the COI gene and consists of two major sub-groups, X1 and
X2 (Reidla et al., 2003). The former sub-group is confined to north and east Africa while
the latter is widespread throughout west Eurasia and is likely to have expanded
peri-LGM or shortly post-LGM as conditions ameliorated (Reidla et al., 2003). The version of
haplogroup X found among the Amerindians is a derivative of sub-group X2 (Reidla et
al., 2003). Haplogroup X is found in Europe, the Near East, Central Asia and among the
Amerindians of the Americas (Reidla et al., 2003). However, frequencies throughout
these regions vary. In Europe, haplogroup X has a frequency of 2.5% in the UK and USA
(Herrnstadt et al., 2002), 0.8% in France, 0.9% in England while the Orkney Islands have
a frequency of 7.2%. Elsewhere in Europe it appears in Spain (4.2%), Greece and Turkey
(both 4.4%). In the Middle East, X appears in Yemen (0.9%), Oman (1.3%), Saudi
Arabia (1.5%), Syria (1.8%), Jordan (2%), Lebanon (5.8%) and Israeli Druze (26.7%)
(Reidla et al., 2003). In Eurasia and Central Asia, haplogroup X was found
at 0.2% in India and 5.5% in Armenia and Georgia (Kivisild et al., 1999), 1% in
Andhra Pradesh (Kivisild et al., 1999b), 8.6% in Georgia (Quintana-Murci et al., 2004),
and 7.6% in Georgia, 2.6% in Armenia, 2.7% in the North Caucasus, 0.2% in India and 0.6%
among Uzbeks (Reidla et al., 2003). The Hazara exhibit a frequency of 7.5% and the
Baloch 6.7% (Figure 5.14) which are most similar to the Kashmir (5%) and Turkmen
Kurds (6.3%) populations. Elsewhere within the region (Iraq, Hunza Burusho, Turkmen
Kurds, Shugnan and Iran), the frequency of haplogroup X is ~2.4%.
5.1.4.2 Haplogroup HV*
Haplogroup HV* is a common west Eurasian and European lineage (Kivisild et al.,
1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007). This lineage possesses an
estimated age of 27,100 years (CI; 19,600-34,800 YBP) (Soares et al., 2009), a CR of
24,300-29,000 YBP in the Near East, while in Europe the CR is 20,700-22,800 YBP
(Richards et al., 2000). Haplogroup HV (and its descendant lineages) accounts for ~45%
of Finnish mtDNAs (Hedman et al., 2007) and, despite being a common Eurasian and
European lineage, is often found in Central Asia. It has been recorded in the Karakalpak
(25%) and the Uighurs (6.3%), and similar frequencies are also exhibited in their
neighbouring populations (Comas et al., 2004). This lineage can be characterised by the
absence of an MseI site at np 14766 that is caused by a transition at the same location.
Haplogroup HV is one of two haplogroups that have been found in all four Afghan
populations. It is present in 2.5% of the Hazara, 6.7% of the Baloch, 14.3% of the
Pashtuns and 15.8% of the Tajiks (Figure 5.15). Similar frequencies were found
among the nearby populations of Pakistan: 10.3% Baloch, 5.3% Brahui, 4% Pakistani
and 6.1% Makrani. In western Eurasia, HV appears in 19.1% of Persians, 24.3% of
Gilaki and 30% of Turkmens (Quintana-Murci et al., 2004). This distribution illustrates a
greater affinity with west Eurasian populations.
5.1.4.3 Haplogroup H
As the most common haplogroup in Europe and the Near East (Kivisild et al., 1999;
Richards et al., 2002; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007), haplogroup H has
been extensively studied due to its high frequencies within European populations. It
has an estimated age of 20.5 ±2.5Ky (Richards et al., 1998), a CR of 23,200-28,400 YBP
in the Near East and 19,200-21,400 YBP in Europe (Richards et al., 2000) and 18,600
years, CI – 14,700-22,600 YBP (Soares et al., 2009). Haplogroup H can be identified by
the absence of an AluI site at np 7025 that is caused by a transition at np 7028. In
Europe, haplogroup H occurs at a frequency of 40-60% (Richards et al., 2000, 2002;
Pereira et al., 2006; Roostalu et al., 2007) with Basque populations exhibiting the
greatest frequencies, ~60% (Richards et al., 2002), while throughout the Near East
the westernmost populations display greater frequencies than those in the east. Within the
Near East and Caucasus, the frequency of haplogroup H is reported to be 25-30%
(Richards et al., 2002), ~20% (Pereira et al., 2006), and 10-30% within the Near East,
Caucasus and Central Asia (Roostalu et al., 2007). It was found at ~18% within the
Egyptian population, ~33-37% in Israel, Syria and Jordan, ~27% in Iraq, and ~10-21%
within Iran (Nasidze et al., 2008). In the UK and USA, haplogroup H is found in 52% of
mtDNAs (Herrnstadt et al., 2002), while in the Spanish regions, Galicia, Cantabria and
Catalonia, it accounts for ~44% and ~39% of the population (Alvarez-Iglesias et al.,
2009).
The distribution of haplogroup H (Figure 5.16) among west Eurasian populations is fairly
uniform, with frequencies no lower than 10%, while in the Indian subcontinent
haplogroup H is all but absent. In Iran, the frequency among the different populations
ranged from 10% in the Kurdish group to 17.6% among the Lurs (Quintana-Murci et al.,
2004), which is similar to the frequencies found among the Hazaras and Pashtuns, 10% and
14.3% respectively. The Central Asian populations of Uzbeks, Turkmen, Tajiks and Shugnan exhibit
frequencies of 21.4%-29.5%, which are more similar to those of the Baloch and Tajik populations,
which exhibit 26.7% and 36.8% respectively.
5.1.4.4 Haplogroup JT
Haplogroup JT has an estimated age of 50,300 years, CI – 38,400-62,500 YBP (Soares et
al., 2009). The JT lineage is the ancestor to haplogroups J and T and can be found among
western Eurasian and Central Asian populations. Haplogroup JT itself is not often
recorded, but its subsequent offspring lineages do present themselves in European, Near
Eastern and Central Asian populations. JT was recorded in 1% of modern Hungarians
(Tömöry et al., 2007) and 9.7% among Yakuts (Zlojutro et al., 2008); however, the
frequency of the latter also included the frequencies of haplogroups J and T and is
therefore not an accurate representation of the JT lineage. Haplogroup JT is characterised
by the gain of an NlaIII site at np 4216 that is generated by a transition at the same location.
Haplogroup JT was found to be present among the Hazaras (2.5%), Tajiks (10.5%) and
Baloch (13.3%) (Figure 5.17) but was either absent or not reported in the neighbouring
populations.
5.1.4.5 Haplogroup J
Derived from JT, haplogroup J is also a typical lineage of west Eurasia (Kivisild et al.,
1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007). This haplogroup is also
present in some European populations. The estimated age for haplogroup J is 32,600
years, CI – 22,400-43,200 YBP (Soares et al., 2009), 28.0 ±4.0Ky (Richards et al., 1998)
and a CR age of 42,400-53,700 YBP in the Near East and 22,000-27,400 YBP in Europe
(Richards et al., 2000). It is frequent within west Eurasian and Central Asian populations
(Figure 5.18) and has been found to be present in ~4% of the Finnish population
(Hedman et al., 2007), while in the Near East, it appears in slightly greater frequencies;
~7% Egypt, ~12% Israel, ~8% Syria, ~6% Jordan, ~13% Iraq and ~10-30% in Iran
(Nasidze et al., 2008). Haplogroup J was found at a very low frequency within India,
0.2% (Kivisild et al., 1999) but in Western Europe, UK and USA populations,
haplogroup J features in 7.6% of mtDNAs (Herrnstadt et al., 2002). Haplogroup J is
characterised by the absence of a BstNI site at np 13704 caused by a transition at np 13708.
The Baloch exhibit a frequency of 13.3% while the Pashtuns display 7.1%. The south
Caspian populations of Iran exhibit frequencies of ~16-24%, while among the populations of
Central Asia (Turkmen, Uzbeks, Tajiks) the frequency of this lineage is not greater than
5% in Tajikistan (5% Tajiks and 4.5% Shugnan), 7.1% in Uzbekistan (7.1% Uzbek, 5%
Karakalpak & Uzbek) and 9.8% in Turkmenistan (9.8% & 5%) (Comas et al., 2004;
Quintana-Murci et al., 2004).
5.1.4.6 Haplogroup T
Often identified as a west Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al.,
2004; Nasidze et al., 2006, 2007), haplogroup T is also found in Europe and has an
estimated age of divergence of 46.5 ±6.0Ky (Richards et al., 1998) and a CR of
41,900-52,900 YBP in the Near East, while in Europe this CR is 33,100-40,200 YBP (Richards et
al., 2000). Soares et al. (2009) calculated an estimated age of 26,800 years, CI –
18,100-35,800 YBP. Haplogroup T is characterised by a BfaI site gain at np 4914 generated by a
transition at np 4917. Near Eastern populations have been shown to exhibit frequencies of
haplogroup T from ~4% to ~16%; these frequencies occurred in Egypt, Israel, Jordan,
Syria, Iraq and Iran (Nasidze et al., 2008). Haplogroup T was also found within the
Finnish population at a frequency of ~6% (Hedman et al., 2007). It is fairly
frequent in western Eurasia, and its frequency decreases toward the east and south of
Asia (Figure 5.19), with the exception of the Dungan population, which presents a
frequency of 18.8% (Comas et al., 2004). Haplogroup T was also reported at a frequency
of 10.6% in the United Kingdom and USA (Herrnstadt et al., 2002).
The Hazara exhibit a frequency of 2.5% and the Baloch 6.7% for haplogroup T. These
frequencies are relatively low in comparison to populations from Iran, but similar to those
of Central Asian populations. This lineage is much more frequent in western Eurasia than it is
in Central Asia. Haplogroup T is found in 1% of Indians in Uttar Pradesh and Andhra
Pradesh (Kivisild et al., 1999b).
The West Eurasian lineages make up the largest share of the mtDNA types found among
each of the Afghani populations in this study, especially among those inhabiting the
lowlands: the Baloch, Pashtuns and Tajiks. The terrain of the Afghan lowlands provides
a greater opportunity for the gene flow of neighbouring or nearby characteristics and/or
lineages into the populations and tribes. These migratory movements may have included
the western invasions and reigns by the Persians and the Greeks. The West Eurasian
haplogroup contribution among the Hazaras (40%) is not as large as among the other
ethnic groups (64.3-89.5%). This may be a consequence of their nomadic lifestyle and
isolation within Afghanistan’s largest physical barrier, the Hindu Kush Mountains.
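The regional contributions quoted above can be recovered by summing the individual haplogroup frequencies reported throughout Section 5.1. The sketch below does this for the four ethnic groups. The assignment of each haplogroup to a region is an assumption of the sketch: in particular, R* is grouped with the West Eurasian lineages because that grouping yields the 40% and 64.3-89.5% West Eurasian contributions quoted above. Totals may differ from Table 5.1 by 0.1 because the input percentages are already rounded.

```python
# Sum per-haplogroup frequencies (%) into regional totals.
# REGION is an assumed grouping for this sketch (R* placed with West Eurasian).
REGION = {
    "A": "East Asian", "B": "East Asian", "C": "East Asian", "D": "East Asian",
    "F": "East Asian", "G": "East Asian", "Z": "East Asian",
    "HV": "West Eurasian", "H": "West Eurasian", "JT": "West Eurasian",
    "J": "West Eurasian", "T": "West Eurasian", "N": "West Eurasian",
    "X": "West Eurasian", "R": "West Eurasian",
    "M": "South Asian",
    "L3": "African",
}

# Frequencies as reported in the text of Section 5.1.
FREQ = {
    "Hazara": {"L3": 7.5, "M": 15.0, "N": 7.5, "R": 7.5, "C": 15.0, "D": 10.0,
               "Z": 2.5, "A": 5.0, "B": 2.5, "F": 2.5, "X": 7.5, "HV": 2.5,
               "H": 10.0, "JT": 2.5, "T": 2.5},
    "Tajik": {"N": 10.5, "R": 15.8, "D": 10.5, "HV": 15.8, "H": 36.8, "JT": 10.5},
    "Baloch": {"M": 13.3, "D": 6.7, "A": 6.7, "X": 6.7, "HV": 6.7, "H": 26.7,
               "JT": 13.3, "J": 13.3, "T": 6.7},
    "Pashtun": {"L3": 14.3, "M": 7.1, "R": 28.6, "G": 7.1, "B": 7.1,
                "HV": 14.3, "H": 14.3, "J": 7.1},
}


def regional_totals(freqs):
    """Return the summed frequency (%) per region, rounded to one decimal place."""
    totals = {"East Asian": 0.0, "West Eurasian": 0.0, "South Asian": 0.0, "African": 0.0}
    for haplogroup, pct in freqs.items():
        totals[REGION[haplogroup]] += pct
    return {region: round(total, 1) for region, total in totals.items()}


for group, freqs in FREQ.items():
    print(group, regional_totals(freqs))
```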
5.2 Discussion
Without an explanation of the distribution of the haplogroups found in Afghanistan, their
presence means very little on its own. In total, seventeen haplogroups were found among
the Afghani population (Figure 5.20) and, of the four Afghani ethnic groups reported, the
Hazara seem the most diverse. Of the seventeen haplogroups identified, fifteen were
present within the Hazara (only haplogroups G and J were absent). The Tajiks are
the least diverse group, as their mtDNA pool contained only six haplogroups, while the Pashtun
and Baluch populations consist of eight and nine different lineages respectively (Figures
5.21-24). Of the seventeen haplogroups observed, only two were reported within every
Afghani ethnic group (the Pashtuns, Tajiks, Hazaras and Baluch): the West Eurasian
haplogroups HV and H. The Hazara have the greatest collection of East
Asian mtDNAs, and of the seven East Asian haplogroups found within the Afghani
population, six were observed among the Hazara: A, B, C, D, F and Z. Additionally,
three of these lineages, C, F and Z, were observed only among the Hazaras.
Skeleton trees were constructed (Figures 5.20-24) illustrating the haplogroups observed
among the whole Afghani population and also among the individual ethnic groups.
Branch lengths connecting haplogroups are not representative of the degree of variation
between them; however, the circle size of each reported haplogroup is proportional to its
observed frequency. For instance, among the entire Afghani population
(Figure 5.20) the largest circle belongs to haplogroup H, which was observed most often
(n=18), while haplogroups F, G and Z were each observed once within the dataset and
therefore have the smallest circles.
Table 5.3: East Asian haplogroup frequencies among the Hazara, Mongolians, Han Chinese and Koreans.
Lineage   Frequency in Population
          Hazara   Mongolians            Han Chinese   Koreans
A         5%       3.9%, 4.3%, 13%       7.5%          6.8%, 8.4%
B         2.5%     8.5%, 9.7%, 15.3%     17.8%         15.1%, 20.8%
C         15%      15%, 17%, 21.3%       5.4%*         1%, 1.7%
D         10%      11%, 12.8%, 19%       24.6%         33.5%, 39.8%
F         2.5%     5.8%, 6.4%, 14.9%     17.7%         4.9%, 10.1%
Z         2.5%     2.1%, 2.1%            5.4%*         0.6%
Sources: Mongolians (Kivisild et al., 1999b, Derenko et al., 2007, Jin et al., 2009); Han Chinese (Yang et al., 2011); Koreans (Derenko et al., 2007, Jin et al., 2009).
* Haplogroup CZ
When compared to East Asian populations (Table 5.3), the Hazaras present similar
frequencies of the East Asian lineages, in particular to those reported from Mongolian
populations. This indicates the Hazara may carry some historical maternal genetic
influence from Mongolia, which supports the Hazara belief in their own
Mongol ancestry.
The mtDNA genepool of the Tajiks, Baloch and Pashtuns contains low frequencies of
East Asian, South Asian and African lineages, which implies a limited contribution via
migrations and/or invasions or gene flow. The Hazaras also exhibit low frequencies of
African and South Asian lineages; however, they present a near 50/50 split of the remaining
mtDNAs between East Asian (37.5%) and West Eurasian (40%) haplogroups.
Haplogroups HV and H constitute 12.5% of the Hazara mtDNAs, a frequency that is
somewhat lower than among the Tajiks, Pashtuns and Baloch and among nearby populations
from Iran, Turkmenistan and Uzbekistan (Quintana-Murci et al., 2004) and Iraq
(Al-Zahery et al., 2003) (Table 5.4).
Table 5.4: Frequency of haplogroups HV & H among Afghan, Iranian, Iraqi, Turkmen, Uzbek and
Pakistani populations.
Region        Population      Frequency of HV   Frequency of H   Frequency of HV & H
Afghanistan   Hazara          2.5%              10.0%            12.5%
Afghanistan   Tajik           15.8%             36.8%            52.6%
Afghanistan   Baloch          6.7%              26.7%            33.4%
Afghanistan   Pashtun         14.3%             14.3%            28.6%
Iran          Persian         19.1%             14.3%            33.4%
Iran          Gilaki          24.3%             13.5%            37.8%
Iran          Mazandrian      9.5%              14.3%            23.8%
Iran          Lurs            11.8%             17.6%            29.4%
Iran          Iranian Kurds   20.0%             10.0%            30.0%
Pakistan      Hazara          4.3%              13.0%            17.3%
Pakistan      Baloch          10.3%             20.5%            30.8%
-             Iraq            10.6%             19.9%            30.5%
-             Turkmen         4.8%              22.0%            26.8%
-             Uzbek           7.2%              21.4%            28.6%
Figure 5.25: Location of the ethnic groups of Afghanistan.
The frequencies of these lineages and the contribution made by the remaining West
Eurasian lineages found among the Tajik, Baloch and Pashtun ethnic groups indicate that their
likely origin is within West Eurasia. The Hazara, however, are less likely to have a
West Eurasian origin, given the strong East Asian mtDNA contribution to
their genepool. The differences between the Hazaras and the three other ethnic groups
reported in this study may be due to barriers which prevent the regular admixture of
the ethnic groups. These barriers may be physical (such as landscape),
religious, cultural or linguistic. The high frequency of the East Asian lineages
persisting within the Hazara population may be due to their habitation of the Hindu
Kush Mountains. This study has identified that the ethnic groups which inhabit the Afghan
lowlands both north and south of the Hindu Kush (Figure 5.25) exhibit significantly
larger contributions of West Eurasian lineages, and that the isolation of the Hazara within the
mountain range inhibits the gene flow of these lineages. In addition, the ethnic groups
practise different forms of Islam; the Hazaras are Shi’a Muslims while the other
ethnic groups are Sunni Muslims. Consequently, it is unlikely that any admixture would
occur between the two denominations of Islam, as Shi’a Muslims have historically been
persecuted and disparaged by Sunni Muslims. Despite this, West Eurasian lineages
contribute 40% of the Hazara mtDNA genepool, and it is
possible that, as a consequence of a history of persecution, the Hazaras may have
encouraged the gene flow of these lineages into their ethnic group. By doing so,
the Hazara may have been able to increase their ability to integrate themselves
into Afghani society. Alternatively, the Hazaras may once have been part of integrated
Afghani society, but have subsequently become isolated as a consequence of their practice
of the alternative Islamic denomination and the assimilation of the Mongol expansion in
the 13th and 14th centuries.
Chapter Six
Mitochondrial DNA Diversity and
Polymorphism in Afghani Populations
6. Mitochondrial DNA Diversity and Polymorphism in Afghani Populations
6.1 Previous mtDNA Studies on Afghani Populations
To date, there have been no previous mtDNA studies on the populations of Afghanistan.
The Central Asian region in which Afghanistan resides has been a major crossroads for
trade routes and migrations throughout history (Haber et al., 2012) but has been studied
very little in comparison to Europe and the numerous emerging studies on East Asian
populations. Some studies of Central Asia have focussed on populations adjacent to the
Afghani populations such as those in Iran, the Near East including Iraq, Jordan, Israel
and Syria, the Caucasus region, South Asia, northern Asia, Turkmenistan, Kyrgyzstan,
southern Kazakhstan, Tajikistan and Uzbekistan (Comas et al., 1998; 2004; Kivisild et
al., 1999; Richards et al., 2000; Al-Zahery et al., 2003; 2011; Palanichamy et al., 2004;
Quintana-Murci et al., 2004; Derenko et al., 2007; Zlojutro et al., 2007; Irwin et al.,
2009b). One study (Irwin et al., 2009b) examined Uzbek sub-populations, including five
of Uzbek ancestry and six with foreign ancestry, one of which has ancestry in
Afghanistan. This population of Afghani Uzbeks is located on Uzbekistan’s eastern
border with Tajikistan (Figure 6.1). This group has been established in Uzbekistan within
the last century and was found to contain a large west Eurasian mtDNA contribution
(Irwin et al., 2009b); however, their ethnicity is unknown, as is the reason for their
presence in Uzbekistan. Their isolation from Afghanistan may be a consequence of the
former Soviet Union’s movement of its border southwards.
The study conducted by Quintana-Murci et al. (2004) examined a number of populations
from Iran, Pakistan and Central Asia. Some populations studied were located on
Pakistan’s western border with Afghanistan, such as the Kalash, Baluch, Brahui and
Hazara, but the study was not expanded to incorporate Afghani populations despite
Afghanistan’s historical standing as mentioned in Chapter 2. When Iranian populations have been
studied in the past (Richards et al., 2000; Comas et al., 2004) their ethnicity has not been
reported, and in a country with the area of Iran, 1,648,195 km² (CIA, 2011), there
are certainly a number of different ethnicities. For instance, Comas et al. (2004) report
findings for an Iranian population in a region of northern Iran near the south of the
Caspian Sea. A number of Iranian ethnic groups live in this region, including the
Mazandaranis and Gilakis, or the population may be from Tehran, Iran’s capital, which is only
~200 km south of the Caspian Sea, a relatively short distance. The data from this study are
also based upon 16-20 samples from twelve populations; to accurately gauge the
mtDNA landscape of these populations, larger sample sizes would be appropriate.
From the published data for the populations studied within this region, we can identify
three distinct groups: i) the Iranian plateau, Turkmenistan, Uzbekistan and Tajikistan,
whose populations consist of a large west Eurasian, a smaller East Asian and a low or absent
South Asian contribution, although the latter three have a greater contribution of East
Asian lineages than the populations of Iran; ii) South Asia, where the mtDNA genepool is
dominated by South Asian lineages, supplemented by a small west Eurasian contribution, with a
near absence of East Asian and African lineages (except among the Makrani, 39.4%); and
iii) Kyrgyz populations, who consist of a large East Asian and a smaller west Eurasian
contribution, with a near absence of South Asian lineages (Comas et al., 2004;
Quintana-Murci et al., 2004).
Despite the absence of any mtDNA data on the Afghani populations, two Y-chromosome
studies have recently emerged (Lacau et al., 2011; Haber et al., 2012).
6.2 mtDNA HVS-I Region Sequencing
6.2.1 Variable Sites
In total, 87 mtDNA HVS-I sequences were obtained across four Afghani ethnic groups.
The sequences include all base pairs from nucleotide positions 16024-16365, numbered
according to Anderson et al. (1981). All sequences were aligned against the rCRS
(Figure 6.2) and the number of different haplotypes was identified for each
sub-population. Among the HVS-I sequences, 80 polymorphic sites were identified (Table
6.1). Table 6.1 also shows the number of substitutions at each nucleotide position and the
position and quantity of both indels and transversions. The majority of transitions
observed involved pyrimidines (241/287), while 18 transversions were also observed.
DnaSP version 5.10 was used for the basic analyses on the HVS-I sequence data. The
polymorphisms identified in this study are defined in relation to the CRS sequence,
which was observed among four individuals (Table 6.1). The greatest number of variable
sites observed within a single HVS-I haplotype, relative to the CRS, was 10 (sample
110_Haz).
Table 6.2 shows the total number of monomorphic and polymorphic sites observed
among the four Afghani ethnic groups, and also the total number of singleton
(non-informative) sites and parsimony informative sites (polymorphic sites that are present at
least twice) within the mtDNA HVS-I sequence. The forensic output and mismatch tables
for the polymorphic sites/haplotypes identified among the four Afghani populations can
be found in Appendix 3.
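The distinction between transitions and transversions used in this section can be illustrated with a minimal Python sketch. It is not part of the DnaSP analysis reported here; the function name `classify_substitution` is hypothetical and is used only for illustration. A substitution between two purines (A, G) or between two pyrimidines (C, T) is a transition, and any purine/pyrimidine exchange is a transversion.

```python
# Illustrative sketch only: classify a single-base substitution.
# The function name is hypothetical and not taken from DnaSP or Arlequin.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a reference/variant base pair."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and variant base are identical")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("C", "T"), ("G", "A"), ("A", "C")]:
        print(ref, alt, classify_substitution(ref, alt))
```

Indels are deliberately left out of this sketch, since they are neither transitions nor transversions and are counted separately in Table 6.1.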
Table 6.1: Frequency and nucleotide positions of transitions, transversions and indels within the HVS-I
sequences of the four Afghani ethnic groups. (Transversions in red bold; indels in bold).
HVS-I       Mutation    HVS-I       Mutation    HVS-I       Mutation    HVS-I       Mutation
Nucleotide  Frequency   Nucleotide  Frequency   Nucleotide  Frequency   Nucleotide  Frequency
Position                Position                Position                Position
16037.G     1           16181.-     1           16239.T     1           16298.C     8
16041.G     1           16182.-     3           16240.G     3           16300.G     1
16051.G     1           16183.C     9           16243.C     1           16304.C     6
16069.T     5           16183.-     2           16248.T     2           16305.T     1
16071.T     5           16184.A     1           16249.C     3           16309.G     1
16086.C     1           16184.T     2           16256.T     3           16311.C     13
16092.C     2           16185.T     1           16257.A     1           16318.T     1
16093.C     4           16186.T     1           16260.T     2           16319.A     8
16111.T     3           16189.C     18          16261.T     7           16325.C     4
16126.C     7           16189.1C    4           16262.T     1           16327.T     7
16129.A     11          16192.T     1           16265.G     2           16335.G     1
16134.T     1           16193.1C    17          16266.T     3           16343.G     1
16136.C     2           16193.2C    2           16270.T     1           16344.T     2
16140.C     2           16201.T     1           16271.C     2           16352.C     2
16145.A     6           16209.C     3           16274.A     3           16353.T     1
16148.T     2           16217.C     3           16278.T     7           16354.T     2
16163.G     1           16222.T     3           16288.C     1           16356.C     7
16172.C     10          16223.T     36          16289.G     1           16357.C     2
16173.T     1           16224.C     1           16290.T     4           16362.C     20
16174.T     1           16227.G     1           16291.T     1           Anderson    4
16175.G     1           16230.G     2           16292.A     3
16176.T     1           16232.A     2           16294.T     6
16180.-     1           16234.T     2           16297.C     3
Table 6.2: General data of the HVS-I polymorphisms among the four Afghani ethnic groups.
Population                                      Baluch        Hazara        Pashtun       Tajik
Selected region                                 16024-16365   16024-16365   16024-16365   16024-16365
Number of sites                                 351           351           351           351
Total number of sites (excluding sites with
gaps or missing data)                           341           338           341           342
Sites with alignment gaps or missing data       10            13            10            9
Invariable (monomorphic) sites                  312           282           306           319
Variable (polymorphic) sites                    29            56            35            23
Total number of mutations (relative to CRS)     29            57            35            23
Singleton variable sites                        12            26            26            17
Parsimony informative sites                     17            30            9             6
Singleton variable sites (two variants)         12            25            26            17
Parsimony informative sites (two variants)      17            30            9             6
Singleton variable sites (three variants)       0             1             0             0
Parsimony informative sites (three variants)    0             0             0             0
Additional mtDNA data of the four Afghani ethnic populations, such as the number of
loci and polymorphic sites, frequency of transitions, transversions and indels and also the
mean number of pairwise differences, is shown in Table 6.3.
Table 6.3: Afghani ethnic group mtDNA HVS-I sequence data.
Population                                  Baluch      Hazara      Pashtun     Tajik
Sample size                                 15          40          14          18
No. of loci                                 351         351         351         351
No. of polymorphic sites                    32          63          38          25
Sum of square frequencies                   0.0756      0.0275      0.0714      0.0741
No. of observed transitions                 27          52          33          22
No. of observed transversions               2           6           2           1
No. of observed substitutions               29          58          35          23
No. of observed indels                      3           7           3           2
No. of observed sites with transitions      27          52          33          22
No. of observed sites with transversions    2           6           2           1
No. of observed sites with substitutions    29          57          35          23
No. of observed sites with indels           3           7           3           2
Nucleotide composition - C                  33.33%      33.48%      33.50%      33.35%
Nucleotide composition - T                  22.29%      22.15%      22.19%      22.20%
Nucleotide composition - A                  33.29%      33.35%      33.15%      33.35%
Nucleotide composition - G                  11.09%      11.02%      11.17%      11.09%
Mean number of pairwise differences         7.295238    7.641026    7.208791    3.980392
                                            ±3.618935   ±3.639090   ±3.595280   ±2.088066
6.2.2 Haplotype Distribution
The number of different haplotypes (h) present within an observed population would be
expected to increase as sample sizes increase, as the opportunity for multiple
polymorphism variations and combinations increases. However, the shorter the DNA
sequence, the more the haplotype diversity would be expected to decrease as sample sizes
increase. The haplotype diversity (Hd) for the four Afghani ethnic groups has been
calculated (Table 6.4) using DnaSP v.5.10. In this study, the Afghani populations are
characterised by high haplotype diversities; the lowest value is found among the Tajiks
(0.9804) and the highest among the Pashtuns (1.00). Here, the Hd values of the Afghani
populations are high (≥0.98) as there are few shared haplotypes among them, which may be
due to the relatively low sample sizes. From all HVS-I sequences, 23 (26.4%) haplotypes
are shared and found in two or more individuals, while 64 haplotypes (73.6%) were found
in only one individual. Haplotypes are shared between the populations (Table 6.5); the
greatest number can be found between samples belonging to Pashtuns and Tajiks (6
haplotypes shared).
Table 6.4: Number of haplotypes (h), haplotype diversity (Hd) and nucleotide diversity (π) of the 4
Afghani populations using DnaSP ver. 5.10
Population   N    h    Hd      S.D.    π*         S.D.*       πJC
Baluch       15   14   0.990   0.028   0.021207   ±0.011795   0.01991
Hazara       40   36   0.994   0.008   0.022148   ±0.011718   0.02009
Pashtun      14   14   1.000   0.027   0.020956   ±0.011732   0.01954
Tajik        18   16   0.980   0.028   0.011571   ±0.006788   0.01109
πJC = Nucleotide Diversity (Jukes & Cantor)
S.D. = Standard Deviation
* = calculated by Arlequin ver. 3.5.1.2
Table 6.5: Number of shared haplotypes between the Afghani ethnic groups in this study
           Baluch   Hazara   Pashtun   Tajik
Baluch              3        2         2
Hazara
Pashtun
Tajik      2        4        6         -
6.2.3 Genetic Diversity
6.2.3.1 Gene Diversity
The test of gene diversity (H) is analogous to the expected heterozygosity of diploid data
and is determined by the probability that from a sample population, two randomly
selected sequences or haplotypes will differ. The estimated gene diversity (Table 6.6) for
the four Afghani populations in this study was calculated using the following equation by
Nei (1987):
\[ H = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right) \]
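Where n is the number of mtDNA HVS-I sequences, k is the number of haplotypes and
p_i is the sample frequency of the i-th haplotype. The standard deviation of the
heterozygosity is the square root of its sampling variance, which is calculated by:
\[ V(H) = \frac{2}{n(n-1)}\left\{ 2(n-2)\left[\sum_{i=1}^{k} p_i^{3} - \left(\sum_{i=1}^{k} p_i^{2}\right)^{2}\right] + \sum_{i=1}^{k} p_i^{2} - \left(\sum_{i=1}^{k} p_i^{2}\right)^{2} \right\} \]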
The gene diversity was estimated using Arlequin ver. 3.5.1.2 (Excoffier & Lischer,
2010). Irwin et al. (2009b) calculated the genetic diversity for the Afghani population in
Uzbekistan as 0.943 based upon both HVS-I and HVS-II sequence data. Compared with
the genetic diversity for the HVS-I sequence data of the Afghani
populations in this study, this implies that the population from Irwin et al. is less
diverse despite the longer mtDNA sequence studied. This may be caused by the
population inside Uzbekistan remaining inside their own ethnic group and not admixing
with the indigenous or other displaced populations, thus causing a bottleneck of
mtDNA haplotypes. Irwin’s study also identified the Afghani population inside
Uzbekistan to have a high random-match probability of 5.5%, indicating a greater number
of shared haplotypes. The Afghan populations in this study exhibit high genetic
diversities (≥0.9804 ±0.0284), which would be expected as a consequence of
Afghanistan’s position as a thoroughfare to nearby regions north, south, east and west.
Table 6.6: Gene diversity of the Afghani populations in this study
Afghani Population   Gene Diversity (H) & S.D.
Baluch               0.9905 ±0.0281
Hazara               0.9974 ±0.0063
Pashtun              1.000 ±0.0270
Tajik                0.9804 ±0.0284
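The gene diversity calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the Arlequin implementation: it assumes Nei's (1987) estimator H = n/(n-1) (1 - Σ p_i²) and the sampling variance V(H) = 2/(n(n-1)) {2(n-2)[Σ p_i³ - (Σ p_i²)²] + Σ p_i² - (Σ p_i²)²}, and the function name `gene_diversity` is hypothetical. The input is a list of haplotype counts; for the Pashtuns, Table 6.4 reports 14 different haplotypes among 14 individuals, so every count is 1.

```python
# Illustrative sketch only: Nei's gene diversity and its standard deviation
# from haplotype counts. The function name is hypothetical.
from math import sqrt


def gene_diversity(counts):
    """Return (H, S.D.) for a list of per-haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [c / n for c in counts]
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p3 = sum(p ** 3 for p in freqs)
    h = n / (n - 1) * (1.0 - sum_p2)
    variance = 2.0 / (n * (n - 1)) * (
        2.0 * (n - 2) * (sum_p3 - sum_p2 ** 2) + sum_p2 - sum_p2 ** 2
    )
    # Guard against tiny negative values caused by floating-point rounding
    return h, sqrt(max(variance, 0.0))


if __name__ == "__main__":
    pashtun_counts = [1] * 14  # 14 haplotypes, each observed once (Table 6.4)
    h, sd = gene_diversity(pashtun_counts)
    print(f"H = {h:.4f}, S.D. = {sd:.4f}")
```

Under these assumptions the Pashtun input gives a gene diversity of 1 with a standard deviation of about 0.027, in line with Table 6.6. The other populations would need their full haplotype counts, which are not listed in this section.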
6.2.3.2 Nucleotide Diversity
The nucleotide diversity is the probability that ‘two randomly selected
homologous nucleotides differ’ (Excoffier & Lischer, 2010), and is derived from the mean
number of pairwise differences (π). The calculation (Tajima, 1983; Nei, 1987) is equivalent
to that of gene diversity, and is equally suitable for both sequence and RFLP data.
Where d_ij represents the number of mutations to have occurred since the divergence of
haplotypes i and j, k is the number of haplotypes, p_i and p_j are the frequencies of
haplotypes i and j, and n is the sample size. There are multiple methods of calculating the
evolutionary distance d_ij: (i) the Jukes and Cantor method, which applies the same mutation
rate to all nucleotides, A, C, G and T (Jukes and Cantor, 1969); (ii) the use of
differing mutation rates for transitions and transversions, as the former are more common;
and (iii) the use of a separate mutation rate for each base polymorphism, which is most
suitable for analysis between species, e.g. humans and other apes (Jobling, Hurles and
Tyler-Smith, 2004). The mean number of haplotypes per sample (k/n) for the
Afghani population in this study was 0.93, ranging from 0.89 among the Tajiks to 1.00
among the Pashtuns. The nucleotide diversity ranges from 0.011571 ±0.006788 to
0.022148 ±0.011718 (Table 6.4), while the mean number of pairwise differences (Table
6.7) ranges from 3.98 ±2.09 among the Tajiks to 7.64 ±3.64 among the Hazaras. The Afghani
population of Uzbekistan (Irwin et al., 2009b) was found to have a π value of 11.3,
which is greater than those found in this study; this may be attributable to the longer
mtDNA sequences used in that study and/or the sample sizes of the four Afghani ethnic
groups in this study.
Table 6.7: Mean number of pairwise differences between the Afghani populations
Population   Mean Number of Pairwise Differences (π)   S.D.
Baluch       7.295238                                  ±3.618935
Hazara       7.641026                                  ±3.639090
Pashtun      7.208791                                  ±3.595280
Tajik        3.980392                                  ±2.088066
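As an illustration of how the mean number of pairwise differences and the nucleotide diversity relate, the following Python sketch averages the number of differing sites over every pair of aligned sequences and then divides by the sequence length. It is a simplified sketch, not the Arlequin procedure: it assumes equal-length aligned sequences, counts raw differences (no Jukes and Cantor correction) and treats a gap character as just another state. The function names and the toy sequences are hypothetical.

```python
# Illustrative sketch only: mean pairwise differences and per-site
# nucleotide diversity for a list of aligned sequences.
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Count the aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_pairwise_differences(seqs):
    """Average number of differences over all unordered pairs of sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    diffs = [pairwise_difference(a, b) for a, b in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by the aligned sequence length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA"]  # hypothetical toy alignment
    print(mean_pairwise_differences(toy), nucleotide_diversity(toy))
```

Averaging over all pairs of sequences is equivalent to weighting the haplotype distances d_ij by the haplotype frequencies p_i and p_j and applying the n/(n-1) sample-size correction.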
The high genetic diversity observed in this study may be explained by the numerous
settlement events by various populations migrating into Afghanistan and thus shaping its
mtDNA landscape. Although Afghanistan houses several distinct ethnic groups, some
admixture has been found among them here, as some haplotypes are shared across the ethnic
boundaries, which may be due to male-driven exogamy. AMOVA (Excoffier et al., 1992) was used
to identify the mtDNA HVS-I sequence variation between the four ethnic populations as
a single group and also as two groups which divided the populations by religion, i.e. by
the denomination of Islam practised: Sunni Muslims (Baluch, Pashtuns
and Tajiks) and Shi’a Muslims (Hazaras). AMOVA uses the variance of gene frequencies
and the number of mutations between molecular haplotypes (Excoffier & Lischer, 2010).
The AMOVA analysis (Table 6.8) of the Afghani populations based upon mtDNA HVS-I
sequence data reported that variation within populations accounted for >98% of the genetic
variance. The AMOVA analysis was then run again to analyse the amount of genetic
variation between the Afghani populations and 62 other populations. These
mtDNA HVS-I sequences were of populations from Europe, Africa, the Near and Middle
East, Central and South Asia and East Asia.
Table 6.8: AMOVA results of variance within and among the Afghani populations and additional
populations.
                   Variation (%)
                   Among Groups   Among Populations,   Within Populations   P-Value
                                  Within Groups
Religious Groups   1.89           -0.07                98.18
All Afghanis       1.30           -                    98.70                0.08113 ±0.00735
66 Populations     7.78           -                    92.22                0.00000 ±0.00000
Table 6.9: Pairwise differences between pairs of populations (1=Baluch, 2=Hazara, 3=Pashtun and
4=Tajik)
    1          2         3         4
1   0.00000
2   0.00804    0.00000
3   -0.00725   0.01179   0.00000
4   0.02113    0.02353   0.00865   0.00000
Table 6.10: FST p-values between pairs of populations (1=Baluch, 2=Hazara, 3=Pashtun and 4=Tajik)
    1                 2                 3                 4
1   *
2   0.15315 ±0.0333   *
3   0.55856 ±0.0485   0.13514 ±0.0389   *
4   0.08108 ±0.0252   0.07207 ±0.0121   0.31532 ±0.0529   *
6.2.3.3 Theta Estimators
Several methods were used to estimate the population parameter θ = 2Nfµ, where Nf is
the female effective population size and µ is the mutation rate. The mtDNA HVS-I
region should exhibit the same mutation rate across the Afghani populations, so that
θ is determined by the effective number of women to have contributed their
mtDNA genome to female offspring over past generations.
Theta π (θπ). This measure estimates the effective size of the female population by using
nucleotide diversity. In a population, this figure represents the number of females it
would take to generate the observed number of pairwise differences within the mtDNA sequence.
Theta S (θS) is based upon the number of segregating (polymorphic) sites (S), assuming
an infinite site model of genome evolution. The number of segregating sites is dependent on
the number of DNA sequences; S would therefore be expected to increase as
the population sample (number of DNA sequences) increases. Genetic diversity can
be estimated from S without this dependency on sample size when mutations
are under no selection pressure and the randomly-mating population is at equilibrium.
θS is estimated using (Watterson, 1975):
\[ \theta_S = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} \]
Here, S represents the total number of segregating sites and n is the number of mtDNA
HVS-I sequences.
Theta k (θk) is calculated from the number of haplotypes within the sample population,
assuming the infinite allele model of genome evolution. The infinite allele model
assumes that each occurring mutation within the DNA sequence has not previously
emerged and instead generates a new allele. θk shows the relationship between the
number of haplotypes observed and the sample size. It was estimated using (Ewens,
1972):
\[ E(k) = \theta \sum_{i=0}^{n-1} \frac{1}{\theta + i} \]
Where k is the number of haplotypes and n the number of DNA sequences. The theta
estimators were calculated (Table 6.11) using Arlequin version 3.5.1.2 (Excoffier &
Lischer, 2010).
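The two count-based estimators can be sketched in Python as follows. This is an illustrative sketch, not the Arlequin code: it assumes Watterson's estimator in its usual form (S divided by the sum of 1/i for i = 1 to n-1) and obtains θk by numerically solving Ewens' expectation E(k) = θ Σ 1/(θ + i), i = 0 to n-1, for θ with a simple bisection. The function names are hypothetical.

```python
# Illustrative sketch only: Watterson's theta(S) and Ewens' theta(k).
# Function names are hypothetical; theta(k) is solved by bisection.


def watterson_theta(s, n):
    """Theta(S): segregating sites divided by the (n-1)th harmonic number."""
    if n < 2:
        raise ValueError("at least two sequences are required")
    return s / sum(1.0 / i for i in range(1, n))


def expected_haplotypes(theta, n):
    """Ewens' expected number of haplotypes in a sample of n sequences."""
    return sum(theta / (theta + i) for i in range(n))


def ewens_theta(k, n, rel_tol=1e-9):
    """Solve expected_haplotypes(theta, n) = k for theta."""
    if not 1 < k < n:
        # With k == n (all haplotypes differ) no finite solution exists
        raise ValueError("theta(k) is only defined for 1 < k < n")
    lo, hi = 1e-9, 1.0
    while expected_haplotypes(hi, n) < k:  # expand the upper bracket
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2.0
        if expected_haplotypes(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Baluch values from Tables 6.2 and 6.4: n = 15, S = 29, k = 14
    print(watterson_theta(29, 15), ewens_theta(14, 15))
```

With the Baluch inputs the sketch should land close to the θS and θk values listed for that population in Table 6.11. The `ValueError` raised when every haplotype is different mirrors the footnote to that table, where θk cannot be deduced for the Pashtuns.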
Table 6.11: Estimators of female effective population size based upon the number of pairwise differences
(θπ), the number of segregating sites (θS) and the number of observed haplotypes (θk).
          Baluch     Hazara      Pashtun    Tajik      Mean       S.D.
n         15         40          14         18         -          -
Theta π   7.29524    7.64103     7.20879    3.98039    6.53136    1.71087
S.D.      4.05764    4.04258     4.03581    2.33504    3.61777    0.85520
Theta S   8.91879    13.40059    11.00583   6.68692    10.00303   2.87061
S.D.      3.56456    4.20073     4.39099    2.64674    3.70076    0.78653
Theta k   95.44546   363.89165   *          65.06110   *          *
S.D.      **         **          *          **         *          *
* cannot be deduced when all haplotypes differ
** 95% confidence interval limits for theta (k).
The calculated θk value of the Hazara was approximately 4-5.5 times greater than the
values for the Baluch and Tajiks. The Baluch, Hazara and Pashtuns, however,
exhibit similar θπ values (7.209-7.641), approximately twice that observed for
the Tajiks (3.980), while the θS values are relatively similar between the four populations,
ranging from 6.687 in the Tajiks to 13.401 among the Hazaras. These population
estimators indicate the Hazaras have the largest effective female population size in their
mtDNA heredity; however, this may be a result of a larger population size than the
Baluch, Pashtuns and Tajiks.
6.2.3.4 Mismatch Distribution
A mismatch distribution is a graphical representation of the number of pairwise differences
between the haplotypes in a population; it can also be applied to both RFLP
and microsatellite data. As well as illustrating the diversity within a population, it also
indicates the population’s demographic history. For instance, a population which is in
equilibrium will exhibit a ragged and multimodal distribution, while for populations which
have recently experienced demographic expansion the distribution will be unimodal and
bell-shaped (Rogers & Harpending, 1992). To determine whether the four
Afghani populations present a multi- or unimodal distribution, a raggedness index (r) was
calculated for each distribution. This statistic is the sum of the squared differences
between neighbouring peaks and is calculated as defined by Harpending (1994):
\[ r = \sum_{i=1}^{d+1} \left(x_i - x_{i-1}\right)^{2} \]
Where d is the greatest number of differences between alleles, and x_i is the relative
frequency of i pairwise differences. Smooth, unimodal distributions, which indicate a
historical population expansion, typically have lower raggedness values (less than 0.03
for sequence data) than multimodal distributions (Jobling, Hurles & Tyler-Smith,
2004).
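A minimal Python sketch of the mismatch distribution and the raggedness index is given below. It is illustrative only and does not reproduce DnaSP: it assumes aligned, equal-length sequences, counts raw differences between every pair, and evaluates r as the sum over i = 1 to d+1 of (x_i - x_(i-1))², with the frequency beyond the largest observed difference taken as zero. The function names and the toy sequences are hypothetical.

```python
# Illustrative sketch only: mismatch distribution and raggedness index.
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of pairs differing at 0, 1, ..., d sites."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(max(counts) + 1)]


def raggedness_index(freqs):
    """Sum of squared differences between neighbouring frequency classes."""
    x = list(freqs) + [0.0]  # x_(d+1) = 0 closes the distribution
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA"]  # hypothetical toy alignment
    dist = mismatch_distribution(toy)
    print(dist, raggedness_index(dist))
```

A distribution with a single smooth peak gives small squared steps between neighbouring classes and hence a low r, whereas a multimodal distribution with sharp peaks gives larger steps and a higher r.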
Figure 6.3: Baluch population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0324)
Figure 6.4: Hazara population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0224)
The mismatch distributions for the Afghani populations, together with the expected curves
under the equilibrium/expansion models and the raggedness indices, were generated using
DnaSP ver. 5.10 (Figures 6.3-6.6). The Hazaras, Pashtuns and Tajiks show the unimodal
curve expected for expanding populations and have raggedness index values <0.03, while
the Baluch population distribution is less smooth and has a raggedness index >0.03
(0.0324).
Figure 6.5: Pashtun population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0121)
Figure 6.6: Tajik population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0270)
Tajima’s D statistic (Tajima, 1989) tests the neutrality of a population and can thereby
provide an indication of demographic processes. When a population is at equilibrium
and is evolving neutrally, the expected D value is zero.
Positive D values can indicate balancing selection, while
negative values can indicate population expansion. Each Afghani population presented a
negative D value (Table 6.12) that was not significant, ranging from -1.796 for the
Hazara population to -1.052 for the Baluch population. This analysis was performed
using DnaSP ver. 5.10.
Table 6.12: Tajima’s D statistic values for the Afghani populations using the total number of mutations
and the total number of segregating sites and their statistical significance.
          Tajima’s D - using total   Statistical    Tajima’s D using total        Statistical
          number of mutations        Significance   number of segregating sites   Significance
Baluch    -1.05171                   P>0.10         -1.05171                      P>0.10
Hazara    -1.79634                   0.10>P>0.05    -1.76290                      0.10>P>0.05
Pashtun   -1.74649                   0.10>P>0.05    -1.74649                      0.10>P>0.05
Tajik     -1.72745                   0.10>P>0.05    -1.72745                      0.10>P>0.05
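The logic of Tajima's D can be illustrated with a short Python sketch that compares the mean number of pairwise differences with the number of segregating sites scaled by the harmonic number, and divides by the standard deviation given by Tajima (1989). This is a sketch under stated assumptions, not the DnaSP implementation: the function name `tajimas_d` is hypothetical, and because DnaSP's handling of alignment gaps and missing data is not reproduced, the output for a given population need not equal the values in Table 6.12.

```python
# Illustrative sketch only: Tajima's D from the sample size (n), the number
# of segregating sites (s) and the mean number of pairwise differences (k).
from math import sqrt


def tajimas_d(n, s, k):
    """Return Tajima's D using the constants defined by Tajima (1989)."""
    if n < 4:
        raise ValueError("the variance term is zero for fewer than four sequences")
    if s < 1:
        raise ValueError("at least one segregating site is required")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two theta estimates, scaled by its standard deviation
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical inputs: 15 sequences, 29 segregating sites, k = 7.3
    print(tajimas_d(15, 29, 7.3))
```

When the mean number of pairwise differences is lower than the value expected from the number of segregating sites, as happens when many variants are rare, D is negative, which is the pattern the text associates with population expansion.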
6.3 Phylogenetic Network of the Afghani Population
A Median Joining network was constructed (Figure 6.7) based upon the mtDNA HVS-I
sequences of the Afghani ethnic groups. This network illustrates relationships between
the different HVS-I haplotypes identified within the Afghani populations, separating the
haplotypes by the polymorphic sites each possesses. For instance, the haplotypes for
samples 113 and 190 are initially identical as they branch away from the shared taxon; however,
beyond the node in the branch the samples diverge, as sample 190 gains one
polymorphism and sample 113 three polymorphisms, each not present in the haplotype of the
other. The Median Joining network, like some other analyses described previously, can
be used to infer demographic processes of the population(s), such as population expansion
when the network presents a star-like phylogeny. A star-like phylogeny occurs when multiple
branches derive from a common taxon. The Median Joining network based upon the
HVS-I sequences of the Afghani populations exhibits a star-like phylogeny, suggesting the
population is under expansion.
6.4 Mitochondrial DNA Genetic Barriers between Afghans and Other
Populations
Anatomically Modern Humans, as a result of our ever-growing population in conjunction
with our expansive distribution, exhibit lower genetic variation when compared to other
apes (Jobling, Hurles & Tyler-Smith, 2004). This can be attributed to the human
population’s placement under certain pressures, such as linguistic, cultural, religious and
geographical pressures, the last being the separation of populations by natural, physical
barriers, e.g. oceans/seas, mountain ranges, lakes and deserts. The Afghani populations in
this study each speak an Indo-European language; the Baluch, Hazara and Tajiks speak a
variant of Persian, Baluchi by the Baluch and Dari by both the Hazara and Tajiks (Farr, 2009;
Barfield, 2010; Weinbaum, 2011), while the Pashtuns speak Pashto (Barfield, 2010).
Using the Barrier program version 2.2 (Manni et al., 2004), the pairwise FST values
derived from HVS-I haplotypes of the Afghani populations and of 3923 HVS-I
sequences from 62 additional populations (whose mtDNA HVS-I sequences had been
obtained from GenBank) were used as input to determine whether any genetic barriers could
be identified between them in relation to their geographical positions. The program will
likely identify an abrupt genetic difference between pairs of populations if a large
geographical gap is present between them, as the likelihood of admixture decreases as the
distance between the populations increases. This highlights the importance of including data
from the intermediate populations to avoid the occurrence of false barriers. The first five
genetic barriers identified using Barrier ver. 2.2 are shown in Figure 6.8 and the first ten
barriers in Figure 6.9.
Figure 6.7: Median Joining network calculated from the HVS-I sequences of the Afghani populations
Table 6.13: Co-ordinate values for the Afghani populations and the additional 62 populations.
(Number = number in reference to Barrier test output)
Number  Population         Longitude  Latitude   Number  Population        Longitude  Latitude
1       AfgBal             62,05      30,28      34      Karelians         33,50      63,44
2       AfgHaz             65,25      34,52      35      Kashmiri          76,51      34,08
3       AfgPas             65,71      31,61      36      KazakhIrw         70,20      41,78
4       AfgTaj             70,57      37,12      37      KikuyuKenya       36,55      01,16
5       Abazinian          39,21      44,04      38      KyrgIrw           71,67      40,99
6       AkhaChina          99,18      23,47      39      LisuYChina        96,57      26,01
7       Albanian           19,48      41,20      40      MokshaSVRiver     44,14      54,13
8       Altai              86,19      50,39      41      MongoBarga        106,53     47,55
9       Armenia            44,51      40,18      42      Morocco           06,83      34,01
10      AzerbRep           48,02      40,26      43      MukriIndia        74,57      15,02
11      Banglad            90,40      23,70      44      NubiaNSudan       30,41      20,07
12      Basque             01,59      42,41      45      Pakistan          73,03      31,40
13      Berber             03,67      32,48      46      RussiaIrw         31,26      58,54
14      Cantonese          110,26     21,12      47      Saami             21,48      65,31
15      Chechenian         45,42      43,19      48      SaliIndia         75,06      18,38
16      Cherkessian        41,44      43,53      49      Sardinians        09,04      40,06
17      ChinaGuang         113,26     23,12      50      Somali            45,02      02,45
18      ChinaMongol        111,74     40,84      51      Spain             03,70      40,41
19      DaiSYChina         101,15     21,55      52      TajIrw            67,38      39,20
20      English            00,44      51,47      53      Tibetian          91,11      29,64
21      Estonian           24,75      59,44      54      Turkey            32,85      39,92
22      Ethiopia           38,74      09,02      55      TurkmenIrw        66,83      37,83
23      Finland            24,93      60,16      56      IndiaReddy        79,16      18,01
24      Georgia            44,79      41,70      57      IndiaChaturvedi   77,56      26,40
25      IndAdia            95,04      28,61      58      IndiaBrahmin      75,15      31,30
26      IndNagan           94,10      25,67      59      IndiaBhargava     80,05      27,34
27      IndNisha           92,90      27,27      60      IranAra           48,40      30,39
28      IndNorthern_Sikh   75,22      31,15      61      IranAZE           48,17      38,05
29      IndPashtuns        80,03      27,34      62      IranBaluch        61,12      28,13
30      IndSouthern_Sril   80,46      07,29      63      IranFars1         51,24      35,41
31      Italy              13,40      41,56      64      IranGilak         49,35      37,16
32      Japan              139,69     35,68      65      IranJonobi        55,54      28,18
33      KabardinianNC      43,31      43,25      66      IranKord          47,01      35,18
The first five barriers (Figure 6.8) are as follows:
1- The first barrier separates the Ethiopian, Kenyan and Somali (numbered 22, 37 and 50
on map) populations of Africa from the North Sudan (44 on map), European, Near East
and West Eurasian, Central, South and East Asian populations.
2- The second barrier separates the Altai, Barga Mongols, Chinese Mongols, Japanese,
Bangladeshi, Tibetan, Indian Nisha, Adia and Naga, Chinese Lisu, Akha and Dai,
Cantonese and Guangdong populations (identified as numbers 8, 41, 18, 32, 11, 53, 27,
25, 26, 39, 6, 19, 14 and 17 on map) from the populations of Central and South Asia.
3- The third barrier isolates the Saami population (47 on map) from all other European
populations such as the Finns and Estonians (23 and 21), Karelians (34), Russians (46)
and English (20).
4- The fourth barrier separates the Kenyan and Somali populations from the Ethiopian
population.
5- The fifth barrier isolates the Moroccan population (numbered 42 on map) from the
Berber population (13) and from the Spanish, Basque, Italian and Sardinian
populations of Europe (51, 12, 31 and 49).
The sixth to tenth genetic barriers in the HVS-I sequence data (Figure 6.9) are as
follows:
6- The sixth barrier isolates the Indian Bhargava population (number 59) in South Asia
from the adjacent Indian Chaturvedi population (57) and the other South Asian
populations.
7- The seventh barrier separates the Iranian Baluch population (62 on map) from all other
populations, including the Iranian Jonobi and Afghani populations (65, 1, 2, 3 and 4).
This barrier therefore separates the Iranian Baluch population from the Afghani Baluch.
8- The eighth barrier isolates the Pakistani population (45) from the neighbouring Central
Asian populations (1, 2, 3, 4, 36, 38, 52 and 55) and from the Kashmiri (35) and Indian
Brahmin (58) populations to the east. Also, in South Asia, the Saliya and Mukri
populations of India (48 and 43) are separated from the Indian Reddy (56) and the Sri
Lankan population (30 on the map).
9- The ninth barrier separates the Altai and Barga Mongols from the East Asian
populations of the second barrier.
10- The tenth barrier separates the Bangladeshi, Tibetan and Indian Naga populations
(numbered 11, 53 and 26 on the map) from the Indian Nisha and Adia (27 and 25) and
the Chinese and Japanese populations.
The barrier analysis shows no significant genetic barrier between the Afghani
populations when analysed with the 62 additional populations. However, a genetic
barrier was identified between the Iranian Baluch population of south-eastern Iran
and the Afghani Baluch of south-western Afghanistan. This barrier may be attributable to
a geographical barrier: the Dasht-e Margo desert in Afghanistan’s south-west or the
Hamun lakes in eastern Iran and south-western Afghanistan. A second barrier
separated the Pakistani population from the Afghani populations to its
west and from the Indian groups to its east. The barrier analysis also shows that, among
the top ten genetic barriers determined from this mtDNA HVS-I sequence collection, no
significant genetic barrier is present between the Afghani populations and the Central Asian
populations.
Chapter Seven
Y-Chromosome Analysis of Afghani
Ethnic Groups
Haber et al., (2012) Afghanistan’s Ethnic
Groups Share a Y-Chromosomal Heritage
Structured by Historical Events.
PLoS One 7:e34288.
PLoS One
Afghanistan’s Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical
Events
Marc Haber1,2, Daniel E. Platt3, Maziar Ashrafian Bonab4, Sonia C. Youhanna1, David F. Soria-Hernanz2,7, Begoña Martínez-Cruz2, Bouchra Douaihy1, Michella Ghassibe-Sabbagh1, Hoshang
Rafatpanah5, Mohsen Ghanbari5, John Whale4, Oleg Balanovsky6, R. Spencer Wells7, David Comas2,
Chris Tyler-Smith8, Pierre A. Zalloua1,9*, The Genographic Consortium¶
1 The Lebanese American University, Chouran, Beirut, Lebanon. 2 Evolutionary Biology Institute, Pompeu Fabra University, Barcelona, Spain. 3
Bioinformatics and Pattern Discovery, IBM T.J. Watson Research Centre, Yorktown Heights, New York, United States of America. 4 Biological
Sciences, School of Biological Sciences, University of Portsmouth, Portsmouth, United Kingdom. 5 Mashhad University of Medical Sciences,
Mashhad, Iran. 6 Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia. 7 The Genographic Project, National
Geographic Society, Washington D.C., United States of America. 8 Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Hinxton,
Cambridgeshire, United Kingdom. 9 Harvard School of Public Health, Harvard University, Boston, Massachusetts, United States of America.
Abstract
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and
later became a crossroad for expanding civilizations and empires. Afghanistan’s location, history, and diverse
ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how
major cultural evolutions and technological developments in human history have influenced modern
population structures. In this study we have analyzed, for the first time, the four major ethnic groups in
present-day Afghanistan: Hazara, Pashtun, Tajik and Uzbek, using 52 binary markers and 19 short tandem
repeats on the non-recombinant segment of the Y-Chromosome. A total of 204 Afghan samples were
investigated along with more than 8,500 samples from surrounding populations important to Afghanistan’s
history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East
Europeans and East Asians. Our results suggest that all current Afghans largely share a heritage derived from
a common unstructured ancestral population that could have emerged during the Neolithic revolution and the
formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started
during the Bronze Age, probably driven by the formation of the first civilizations in the region, increasing
inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.
Citation: Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal
Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288
Editor: Manfred Kayser, Erasmus University Medical Center, The Netherlands
Received November 21, 2011; Accepted February 25, 2012; Published March 28, 2012
Copyright: © 2012 Haber et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study is supported by the Waitt Family Foundation. The funder had no role in study design, data collection and analysis, decision to
publish, or preparation of the manuscript.
Competing Interests: Daniel E Platt is an employee of IBM. With regard to the Genographic Consortium: Janet Ziegle is employed by Applied Biosystems,
and Pandikumar Swamikrishnan, Asif Javed, Laxmi Parida and Ajay K. Royyuru are employed by IBM. There are no patents or products in development or
marketed products to declare. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
* E-mail: [email protected]
¶ Membership of the Genographic Consortium is provided in the Acknowledgments.
Introduction
Afghanistan is a landlocked country at the intersection of Central Asia, South
Asia, and the Middle East that has held a strategic position throughout
history. It was a crossroad of ancient trade routes and human migrations.
The main east-west trade routes passed through its northern and southern
plains, and through its mountain passes before the ascendancy of
waterborne trade between Europe and the Far East.
Paleolithic humans probably inhabited the caves of Afghanistan as long as
50,000 years ago (ya). In northern Afghanistan, flake tools found in Dara
Dadil, Darra Chakhmakh, and elsewhere indicate the probable existence of
Middle Paleolithic industries [1]. Northern Afghanistan also sits in a region of
the development of the earliest agricultural communities, marked by
domestication of the wheat/barley, sheep/goat/cattle complex leading to the
Neolithic revolution (10,000-7,000 ya), later supporting the economy of early
urban Bronze Age civilizations in Central Asia at the Bactria-Margiana
Archaeological Complex (4300-3700 ya) and in India at the Indus Valley (5300-3800 ya) [2]. It has been proposed that the decline of these early civilizations was
accompanied by, or was the result of, the expanding populations from the
Eurasian steppe, reaching the Indian subcontinent in the late Harappan period
[3].
The second and first millennia BCE were also marked by the influx of
Iranian tribes, later ruling Afghanistan as part of the Achaemenid Empire
established by Cyrus the Great (550 BCE) [4]. The military might of the
Achaemenids was destroyed by Alexander the Great, bringing Hellenic
language and culture to the region. During the next several centuries, control
over Afghanistan was contested among the Seleucids, Bactrians, Parthians,
and Indians of the Mauryan dynasty [5]. The first century CE brought a new
invasion of Iranian tribes under the leadership of the Kushan tribes, who
adopted and spread Buddhism. After conquering most of Persia,
Arab armies invaded Afghanistan, spreading Islam. Mongol and Turco-Mongol
expansions brought turmoil to the region, marked by periods of
instability to the Silk Road traffic [4], which was later reduced permanently
with the establishment of European maritime trade systems.
The present population of Afghanistan contains many diverse elements,
the result of large-scale migrations and conquests that influenced its culture
and demography. Pashtuns are the largest ethnic group in Afghanistan,
accounting for about 42 percent of the population, with Tajiks (27%),
Hazaras (9%), Uzbeks (9%), Aimaqs (4%), Turkmen people (3%), Baluch
(2%), and other groups (4%) making up the remainder [6]. In the present
study, eight ethnic groups were examined, with a focus on the largest four
groups:
- The Pashtuns traditionally lived a seminomadic lifestyle; they reside
mainly in southern and eastern Afghanistan and in western Pakistan. They
speak Pashto, which is a member of the Eastern Iranian languages.
- The Tajiks are a Persian-speaking ethnic group closely related to the
Persians of Iran. In Afghanistan, they are the largest Tajik population outside
their homeland to the north in Tajikistan.
- The Hazara population speaks Persian with some Mongolian words. They
believe they are descendants of Genghis Khan's army that invaded during the
thirteenth century.
- The Uzbeks are a Turkic-speaking group that has been living a sedentary
farming lifestyle in northern Afghanistan.
While previous theories about the origin of the Afghans are usually based
on oral traditions or scanty historical information (Table S1), few studies
have explored the genetic structure of the Afghan people, and those that did
were limited either to listing autosomal short tandem repeat (STR)
frequencies [7,8] or to Y-chromosome STR analysis in a single ethnic group [9].
In this study, we present an extensive analysis of the Y-chromosomal
variation in the major ethnic groups of Afghanistan. We provide, for the first
time, deep phylogenetic information on Afghan haplogroup memberships,
and we also analyze 19 Y-chromosomal STRs allowing fine comparisons
across and among populations. We use this information to explore whether
the ethnic groups in Afghanistan reflect different social systems that arose in
a common population or whether cultural differences are founded on already
existing genetic differences. We also seek to understand the genetic
composition of modern Afghans in the context of surrounding populations
as well as other possible source populations, identifying traces of historical
movements that influenced the different ethnic groups, and exploring how
the establishment of the first civilizations in the region affected the present
Afghan genetic diversity.
Materials and Methods
Ethics Statement
All participants recruited and genotyped in the present study had at least
three generations of paternal ancestry in their country of birth, and provided
details of their geographical origin and written consent for this study, which
was approved by the IRB of the Lebanese American University.
Subjects and Comparative Datasets
The modern populations selected for this study were those from regions with
ancient historical importance to Afghanistan through conquest or migration,
including Iranians, Greeks and Indians, in addition to populations with more
recent impacts, such as the Arab expansion in the 7th century and the East
Asian invasions in the 13th and 14th centuries. We have also
included populations from the Pontic-Caspian steppe region, from West
Russia and East Europe, which were possibly involved in the Indo-European
migrations that reached the Iranian plateau and Northern India.
A total of 8,706 samples were used in the analyses including 204 newly
genotyped samples from Afghanistan. The genotyping results and the
subjects' paternal province and their city or village of origin when available
are listed in Table S2. The datasets used include Middle Easterners (2,720
samples) [10,11,12,13,14], Central/South Asians (1,335 samples)
[15,16,17,18], East Asians (1,029 samples) [15,19], Caucasians (1,525 samples)
[20], West Russians (545 samples) [21], Europeans (1,123 samples) [21,22,23,
24,25], and Africans (222 samples) [26,27]. More details on the analyzed
samples are listed in Table S3.
Genotyping
DNA was extracted from blood or buccal swabs using a standard phenol-chloroform protocol. Samples were genotyped using the Applied Biosystems
7900HT Fast Real-Time PCR System with a set of 52 highly informative,
custom Y-chromosomal binary marker assays (Applied Biosystems, Foster
City, CA) from the non-recombining portion of the Y chromosome, which
define 32 different haplogroups. A total of 19 Y-chromosome STR loci were
analyzed for each sample in two multiplexes on an Applied Biosystems
3130xl Genetic Analyzer. The first multiplex contained the standard 17 loci
of the Y-filer™ PCR Amplification kit (Applied Biosystems, Foster City, CA).
The remaining two loci, DYS388 and DYS426, were genotyped in a custom
multiplex. STR alleles were named according to previous recommendations
[28].
Statistical Analyses
Haplogroup Frequencies and Principal Component Analysis. Fisher's
exact tests were performed on haplogroups vs populations to identify which
haplogroups were significantly over- or under-represented in Afghanistan's
ethnic groups. A principal component analysis (PCA) [29] was performed on
relative haplogroup frequencies normalized within populations, centered, and
without variance normalization. Since haplogroup resolution was not uniform
across studies, the haplogroups were reduced to the most informative derived
markers shared across studies.
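As an illustration of the Fisher's exact test step described above, the following minimal Python sketch computes a two-sided p-value for a single 2x2 haplogroup-by-population table. The counts passed to it are hypothetical placeholders, not the study's data (the absolute counts are in Table S4, which is not reproduced here), and the function name `fisher_exact_two_sided` is introduced only for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: two groups of populations; columns: haplogroup present / absent.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x carriers in row 1, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (NOT the study's data): carriers / non-carriers of one
# haplogroup in two pooled groups of populations.
p_value = fisher_exact_two_sided(20, 30, 3, 47)
print(f"two-sided p = {p_value:.3g}")
```

Because many haplogroups are tested against many populations, a multiple-testing correction would normally be considered alongside such p-values; whether one was applied here is not stated in the text.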
Genetic Distances, Multidimensional Scaling and Barrier Analysis.
Non-metric multidimensional scaling (MDS) [30] was performed using ΦST
distances between populations computed by ARLEQUIN [31] on Y-STR loci
DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393,
DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATA
H4.
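The distances used in the study were computed by ARLEQUIN. As a rough illustration of what such a pairwise statistic measures, the sketch below follows the commonly used two-level AMOVA variance decomposition, taking the sum of squared repeat-number differences as the distance between two STR haplotypes. The haplotypes are invented toy data, the function names are hypothetical, and the sketch makes no claim to reproduce ARLEQUIN's implementation or the settings used by the authors.

```python
from itertools import combinations


def sq_dist(h1, h2):
    """Sum of squared repeat-number differences between two STR haplotypes."""
    return sum((x - y) ** 2 for x, y in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: all pairwise distances over the sample size."""
    n = len(haplotypes)
    return sum(sq_dist(a, b) for a, b in combinations(haplotypes, 2)) / n


def phi_st(populations):
    """Two-level AMOVA Phi_ST for a list of populations (lists of haplotypes)."""
    pooled = [h for pop in populations for h in pop]
    n_total, n_pops = len(pooled), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(pooled) - ssd_within
    msd_within = ssd_within / (n_total - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    # Average sample size, corrected for unequal population sizes.
    n0 = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


# Toy two-locus haplotypes (invented for illustration only).
pop_a = [(14, 13), (14, 13), (15, 13)]
pop_b = [(16, 11), (16, 12), (17, 11)]
print(f"Phi_ST = {phi_st([pop_a, pop_b]):.3f}")
```

A matrix of such pairwise values, one per pair of populations, is the kind of input that an MDS or Barrier analysis takes. Small samples can yield negative estimates, which are usually interpreted as zero.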
Monmonier's maximum difference algorithm [32] was implemented using
Barrier [33]. The algorithm enables interpretation of microevolutionary
processes in a geographic context, identifying genetic barriers that can be
visualized on a map.
AMOVA. The significance of population structures created by Barrier was tested
using AMOVA [34], implemented in ARLEQUIN [31]. We also tested
whether geography or Barrier structures better explained the present
distribution of diversity. AMOVA seeks to identify variance within
populations due to drift by comparing variation among groups of similar
populations via a nested analysis of variance. First, populations were grouped
according to their geographic location as follows:
1- Afghanistan: Pashtun, Tajik, Uzbek, Hazara.
2- East Europe: Belarus, West Russia.
3- Caucasus: Avar, Darginian, Lezgi, Abkhazian, Circassian.
4- Middle East and Europe: Greece, Turkey, Lebanon, Syria.
5- Iran: East Azerbaijan, Markazi, Mazandaran, Qazvin, Sistan and Baluchistan.
6- India: North, West, South.
Second, populations were grouped according to the identified barriers:
1- Pashtun, Tajik, North India, West India.
2- Hazara, Uzbek.
3- Caucasus: Avar, Darginian, Lezgi.
4- Caucasus: Circassian, Abkhazian.
5- Iran: East Azerbaijan, Markazi, Mazandaran, Qazvin, Sistan and Baluchistan.
6- Belarus, West Russia.
7- Middle East and Europe: Greece, Turkey, Lebanon, Syria.
Reduced Median Networks. Reduced Median (RM) networks [35] of STR
haplotypes within C-M130, R1a1a-M17, E1b1b1-M35, and B-M60 were
calculated using a reduction threshold of 1, with no STR weighting.
BATWING. We applied BATWING [36] to compute candidate population
splits in the modal tree among regional populations within and around
Afghanistan in order to test whether BARRIER-identified population
separations also showed older splits, exploring multiple combinations of
populations. The Metropolis-Hastings algorithm will tend to select larger
likelihoods for the leading genetic support, assuming all the populations
originally emerged from one population with no gene flow subsequent to
each splitting event. This provides a very specific view in determining genetic
relationships among the populations, which could be compared and
contrasted with other methods, such as MDS or BARRIER [33]. STRs used
were those described under the MDS section above.
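BATWING's sampler is far more elaborate than can be shown here, but the basic Metropolis-Hastings step (propose a change, then accept it with a probability given by the likelihood ratio) can be illustrated with a toy example. The sketch below samples a single parameter from an assumed standard normal target; the target, the step size and the function names are illustrative assumptions and have nothing to do with BATWING's actual genealogy model.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of the toy target: a standard normal."""
    return -0.5 * x * x


def metropolis_hastings(n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for the toy target above."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, likelihood ratio): more likely states
        # are always accepted, less likely ones only sometimes.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


chain = metropolis_hastings(20000)
burned = chain[2000:]  # discard an initial burn-in (equilibration) period
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

The burn-in discarded here corresponds to the equilibration period of a sampler: draws taken before the chain has settled are not representative of the target.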
The mutation rate priors applied to these calculations were those
proposed in Xue et al. [19] based on Zhivotovsky et al.'s rate estimates [37].
There are differences between mutation rates that appear to accumulate over
multiple generations (an "evolutionary rate") versus those that accumulate
from generation to generation (a "genealogical rate") [38], an issue that remains
unresolved. Nevertheless, the topology of the population splits that BATWING
predicts and the relative periods of isolation are proportionately unaffected.
Therefore, the population split trees still serve for comparison with
BARRIER and other methods regardless of the mutation rate. Effective
population sizes tend to scale inversely with the rates, with a slight impact
due to the effective population size prior. Use of the Zhivotovsky rates in
prior publications allows for comparisons with other publications that
applied the same rates.
The data were partitioned into multiple runs (Table S6). The independent
computation of multiple trees with different subsets and groupings of
populations should produce similar population splits and ages of population
divisions among configurations. One caveat is that inclusion of other
populations may provide more support to different candidate modal trees.
Therefore, comparisons among multiple runs provide a consistency check
for convergence and stability: each of the runs must correspond with the
others at the points of their shared topologies. Given agreement between
BATWING runs, a composite tree comprised of these multiple runs, and
connected through shared branches, can be constructed.
The Indian population structures resulted in slower equilibration than
was seen among the other populations. After equilibration, the Indian
populations showed older splits among them than is shown between India as
a whole and the other populations when India is pooled. This older split may
have resulted partly from differences in weights among candidate trees that
the Metropolis-Hastings algorithm samples based on the likelihood ratios
derived from the population configurations that will lead to different modal
trees with different split times.
Alternatively, the older split may have also resulted from violations of the
assumption of isolation after population splitting. These complications led to
the separate treatments of India BATWING runs from the western
populations runs.
Results
Genotyping revealed 32 haplogroups present in Afghanistan's ethnic groups
among our samples. Haplogroups R1a1a-M17, C3-M217, J2-M172, and L-M20 were the most frequent when Afghan ethnic groups were pooled,
together comprising >66% of the chromosomes. Absolute and relative
haplogroup frequencies are tabulated in Table S4.
Haplogroup frequencies across the major ethnic groups revealed large
differences. In particular, frequencies of haplogroup C3-M217, which is
mainly found in East Asia, and haplogroup R1a1a-M17, which is found in
Eurasia, varied substantially among the Afghan groups. C3-M217 was
significantly more frequent (p = 4.55 × 10⁻⁹) in Uzbeks (41.18%) and Hazaras
(33.33%) than it was in Tajiks (3.57%) and Pashtuns (2.04%). On the other
hand, R1a1a-M17 was significantly more frequent (p = 3.00 × 10⁻⁶) in
Pashtuns (51.02%) and Tajiks (30.36%) than in Uzbeks (17.65%) and Hazaras
(6.67%). RM networks of C3-M217 (Figure S1A) and R1a1a-M17 (Figure
S1B) show that when a haplogroup was infrequent in an ethnic group, its
haplotypes existed on branches not shared with other Afghans, suggesting
that the underrepresented haplogroups are not the result of gene flow
between the ethnic groups, but probably of direct assimilation from source
populations.
Haplogroups autochthonous to India [15], namely L-M20, H-M69, and R2a-M124,
were more frequent (p = 0.004) in Pashtuns (20.41%) and Tajiks (19.64%) than
in Uzbeks (5.88%) and Hazaras (5%). E1b1b1-M35 was found in Hazaras
(5%) and Uzbeks (5.88%) but not in Pashtuns and Tajiks. The RM network of
E1b1b1-M35 (Figure S1C) shows that Afghanistan's lineages are correlated
with Middle Easterners and Iranians. We also note the presence of the
African B-M60 only in Hazara, with a relatively recent common founder
ancestor from East Africa as shown in the RM network (Figure S1D).
PCA of the haplogroup frequencies (Figure 1) also shows differences
among Afghans. Although the worldwide populations are mostly clustered
according to geography, Afghan groups appear to show more affinity to non-Afghans than to each other. Pashtun and Hazara in Afghanistan and
Pakistan show affinity to their ethnic groups across borders. The Afghan
Tajiks show equal distance to Central Asia and to Iran/Caucasus/West
Russia. The Afghan Hazara, Afghan Uzbek, and Pakistan Hazara sit between
East Asia and the Middle East/Europe-Caucasus/West Russia cluster.
More details about the structure of the Afghan population appear in the
MDS of the ΦST values (Figure 2B), which shows that the Afghan Pashtun and
Tajik are closer to North and West Indians than to the other Afghans, the Hazara
and Uzbek. This cluster also sits between East Europeans and Iranians, closer
to the Iranians, especially to East Azerbaijan. Furthermore, Barrier
(Figure 2A) shows that Barrier IV splits the Afghan populations, separating
the Hazara and Uzbek from the Pashtun, Tajik and the Indian populations,
creating groups of populations that have less variation within the groups
(2.30%, p<0.001) and more variation among groups (10.48%, p<0.001)
compared to populations grouped by region or country (within groups =
4.95%, p<0.001, among groups = 7.16%, p<0.001) (Table S5).
To explore the time depth in which the above reported structures have
emerged, we employed BATWING to create hypotheses on historical
population splitting and coalescent events,
reflecting dominating genetic ancestral structures identified in BATWING's
modal trees from which the current populations have emerged (Table S6).
The BATWING results showed that most of the regional splits occurred
around 10 kya (95% CI 7,100-15,825) (Figure 3). These splits coincide with
post-LGM expansions that led to the Neolithic agricultural revolution.
During this period Afghans, Iranians, Indians and East Europeans most
likely emerged as distinct unstructured populations. BATWING showed
another wave of splits that started later and may have created the inter-population structures. This second wave of splits started in Afghans 4.7 kya
(95% CI 2,775-7,725), marking the
start of civilization building and displacements, and these splits appear to
have continued to nearly modern times. BATWING results in general
corroborated the geographical splits identified by BARRIER.
Figure 1. PCA derived from Y-chromosomal haplogroup frequencies. The two leading principal components display the
variance. The superimposed biplot shows the contribution of each haplogroup as grey component loading vectors.
doi:10.1371/journal.pone.0034288.g001
Discussion
This study describes for the first time the Y-chromosome diversity of the
main ethnic groups in Afghanistan. We have
explored the genetic composition of modern Afghans and correlated their
genetic diversity with well established historical events and movements of
neighbouring populations. The data strongly suggest that continuous
migrations and movements through Central Asia since at least the Holocene
have created population structures that today are highly correlated with
ethnicity in Afghanistan.
Figure 2. Population genetic structure vs geography. Genetic barriers (A) and MDS plot (B) based on the ΦST distances between
populations derived from Y-STR data. doi:10.1371/journal.pone.0034288.g002
A previous study on Pakistan [39] that included ethnic groups also
present in Afghanistan (Baluch, Hazara, Pashtun) showed that Y-chromosome variation was structured by geography and not by ethnic
affiliation. With the exception of Hazara, all ethnic groups in Pakistan were
shown to have similar Y-chromosome diversity; they clustered with South
Asians and were close to Middle Eastern males. A Y-chromosome study
[40] on populations from Turkmenistan, Uzbekistan, Kazakhstan,
Kyrgyzstan, and Tajikistan found that there is greater diversity among
populations that share the same ethnic group than among the ethnic groups
themselves. These observations support a common genetic ancestry
hypothesis for these populations irrespective of ethnicity. We have also found
substantial differences among the various groups of Afghanistan. The inter-ethnic comparisons, however, could not be tested in this study since
information on tribe and clan affiliation was not available. The high genetic
diversity observed among Afghanistan's groups has also been observed in
other populations of Central Asia [41,42,43,44,45]. It is possibly due to the
strategic location of this region and its unique harsh geography of mountains,
deserts and steppes, which could have facilitated the establishment of social
organizations within
expanding populations, and helped maintain genetic boundaries among
groups that have developed over time into distinct ethnicities.
Figure 3. Composite BATWING population splitting. The composite tree is constructed from data sets described in the text,
based on the results displayed in Table S6, with a pruned leading topology and averaged times. Numbers indicate branch
lengths measured in thousand years. doi:10.1371/journal.pone.0034288.g003
The RM networks of the major common haplogroups show that the flow
of paternal lineages among the various ethnic groups is very limited, which is
consistent with the high level of endogamy practiced by these groups. Similar Y-chromosome results have been previously reported among the Central Asian
ethnic groups [40], but with less pronounced genetic differentiation in
maternal lineages [40], most likely the result of endogamous practices that
were tolerant to assimilation of foreign females.
The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17),
has the highest observed diversity among populations of the Indus Valley
[46]. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where
the mid-Holocene R1a1a7-M458 sublineage is dominant [46]. R1a1a7-M458
was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as
previously thought [47], expansions from the Pontic Steppe [3], bringing the
Indo-European languages to Central Asia and India.
MDS and Barrier analysis have identified a significant affinity between
Pashtun, Tajik, North Indian, and West Indian populations, creating an
Afghan-Indian population structure that excludes the Hazaras, Uzbeks, and
the South Indian Dravidian speakers. In addition, gene flow to Afghanistan
from India marked by Indian lineages, L-M20, H-M69, and R2a-M124, also
seems to mostly involve Pashtuns and Tajiks. This genetic affinity and gene
flow suggests interactions that could have existed since at least the
establishment of the region's first civilizations at the Indus Valley and the
Bactria-Margiana Archaeological Complex.
Furthermore, BATWING results indicate that the Afghan populations
split from Iranians, Indians and East Europeans at about 10.6 kya (95% CI
7,100-15,825), which marks the start of the Neolithic revolution and the
establishment of the farming communities. In addition, Pashtun split first
from the rest of the Afghans around 4.7 kya (95% CI 2,775-7,725), which is a date
marked by the rise of the Bronze Age civilizations of the region. These
dates suggest that the differentiation of the social systems in Afghanistan
could have been driven by the emergence of the first urban civilizations.
However, the dates suggested by BATWING
should be treated with care, since BATWING does not model gene flow and
differential assimilation of incoming migrations. These events could alter the
time of split. However, it was previously shown that topologies and times of
splits in the modal trees generated by BATWING are insensitive to in-migration [13], which leaves BATWING timing results insusceptible to in-migrations and invasions that might be expected to reduce the times of split
[13]. On the other hand, the times of population splits for BATWING's
modal trees are very susceptible to subsequent migration between those
populations. This means that the two major waves of splitting could have
occurred earlier, but since RM networks of the major haplogroups show
limited gene flow between the ethnic groups and since the population
structure suggested by MDS and Barrier correlates populations from the
historically connected [2] Bronze Age sites to Pashtun and Tajik, the
BATWING-suggested splits in Afghan populations at 4.7 kya (95% CI 2,775-7,725) are
very probable. A previous study by Heyer et al. conducted in Central Asia [40]
has also estimated significantly older dates for the emergence of ethnic
groups than what has been historically known. This suggests that the
ethnic groups could have resulted from a fusion of different populations [40]
or that ethnicities were established from an already structured
population(s).
BATWING’s hypotheses model mutations and coalescent events,
reflecting ancestral structures from which the current populations have
emerged. Later expansions into the region would have assimilated the
ancestral population, granting the Afghans distinctive genetics from the
expanding source populations even though they shared general genetic
features. This is evident in the Afghan Hazara and Afghan Uzbek, who have
always been associated with expanding Mongols and Turco-Mongols. Although we
have found that at least a third to a half of their chromosomes are of
East Asian origin, PCA places them between the East Asia and Caucasus/Middle
East/Europe clusters.
Historical expansions and invasions appear to have contributed
differentially to shaping Afghanistan's population structure. We have found
limited genetic evidence of expansions previously thought to have left
specific imprints in current populations.
The E1b1b1-M35 lineages in some Pakistani Pashtun were previously
traced to a Greek origin brought by Alexander's invasions [48]. However,
the RM network of E1b1b1-M35 shows that Afghanistan's lineages are
correlated with those of Middle Easterners and Iranians but not with
populations from the Balkans.
The Islamic invasion in the 7th century CE left an immense cultural
impact on the region, with reports of Arabs settling in Afghanistan and
mixing with the local population [49]. However, the genetic signal of this
expansion is not clearly evident: some Middle Eastern lineages such as
E1b1b1-M35 are present in Afghanistan, but the most prevalent lineage
among Arabs (J1-M267) was found in only one Afghan subject. In addition,
the three Afghans who identified their ethnicity as Arab had lineages
autochthonous to India.
We also note that three Hazara subjects belonged to haplogroup B-M60,
which is very rare outside Africa. The RM network shows that the subjects had a
recent founding ancestor from East Africa, who could have been brought
to Afghanistan through the slave trade. This shows that the genetic ethnic
boundaries have been selectively permeable; however, the history of the rules
of assimilation in this region over time is not yet clearly understood.
Language adoption and spread in Afghanistan also seem to have been a
complex process. The Afghan genetic structure tends to correlate Hazara
and Uzbek, whose languages belong to two different language families. The
language of the Hazara, like those of the Pashtun and Tajik, belongs to the
Indo-Iranian group of the Indo-European family, while the Uzbek language is
in the Turkic family. The form of Turkic
spoken by the Uzbek appears to be a direct descendant of an extinct Turkic
language that was developed in the 15th century CE [50]. It appears that the
predominant genetic component shared by the Uzbek and Hazara split more than
1 ky prior to this date. Therefore, it is possible that language differences in
Afghanistan reflect a more recent cultural shift.
In conclusion, Y-chromosome diversity in Afghanistan reveals major
differences among its ethnic groups. However, we have found that all
Afghans largely share a heritage of a common ancestral population that
emerged during the Neolithic revolution and remained unstructured until
4.7 kya (95% CI 2,775-7,725). The first genetic structures between the
different social systems arose during the Bronze Age, accompanied or
driven by the formation of the first civilizations in the region. Later
migrations and invasions to the region have been differentially assimilated by
the ethnic groups, increasing inter-population genetic differences and giving
the Afghans a unique genetic diversity in Central Asia.
Supporting Information
Figure S1 Reduced median networks. (A) C-M130, (B) R1a1a-M17, (C)
E1b1b1-M35, and (D) B-M60 showing STR haplotype distributions among
populations; area is proportional to haplotype frequency, and color indicates
populations. Connecting lines represent putative phylogenetic relationships
between haplotypes.
(TIF)
Table S1 Suggested origins of the main ethnic groups in Afghanistan.
(DOC)
Table S2 Y-chromosome haplogroups and haplotypes in 204 unrelated
individuals from Afghanistan.
(XLS)
Table S3 Populations selected for this study.
(XLS)
Table S4 Y-chromosome haplogroup frequencies in Afghanistan's ethnic
groups.
(XLS)
Table S5 AMOVA results. Comparing populations grouped according to
their country or region of origin with populations grouped according to
Barrier structures.
(DOC)
Table S6 BATWING topologies and dates with 95% confidence intervals of
population splits derived from multiple combinations of population subsets.
(XLS)
Acknowledgements
We thank the sample donors for taking part in this study. We also thank Dr.
Christopher Thornton and Mr. Brian Johnsrud for their insightful comments.
CTS is supported by The Wellcome Trust. The Genographic Project is
supported by funding from the National Geographic Society, IBM, and the
Waitt Family Foundation. Members of the Genographic Consortium:
Janet S. Ziegle (Applied Biosystems, Foster City, California, United States); Li
Jin & Shilin Li (Fudan University, Shanghai, China); Pandikumar
Swamikrishnan (IBM, Somers, New York, United States); Asif Javed, Laxmi
Parida & Ajay K. Royyuru (IBM, Yorktown Heights, New York, United
States); Lluis Quintana-Murci (Institut Pasteur, Paris, France); R. John
Mitchell (La Trobe University, Melbourne, Victoria, Australia); Syama
Adhikarla, ArunKumar GaneshPrasad, Ramasamy Pitchappan & Arun
Varatharajan Santhakumari (Madurai Kamaraj University, Madurai, Tamil
Nadu, India); Angela Hobbs & Himla Soodyall (National Health Laboratory
Service, Johannesburg, South Africa); Elena Balanovska (Research Centre for
Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia);
Daniela R. Lacerda & Fabricio R. Santos (Universidade Federal de Minas
Gerais, Belo Horizonte, Minas Gerais, Brazil); Pedro Paulo Vieira
(Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil); Jaume
Bertranpetit & Marta Mele (Universitat Pompeu Fabra, Barcelona, Spain);
Christina J. Adler, Alan Cooper, Clio S. I. Der Sarkissian & Wolfgang Haak
(University of Adelaide, South Australia, Australia); Matthew E. Kaplan &
Nirav C. Merchant (University of Arizona, Tucson, Arizona, United States);
Colin Renfrew (University of Cambridge, Cambridge, United Kingdom);
Andrew C. Clarke & Elizabeth A. Matisoo-Smith (University of Otago,
Dunedin, New Zealand); Matthew C. Dulik, Jill B. Gaieski, Amanda C.
Owings, Theodore G. Schurr & Miguel G. Vilar (University of Pennsylvania,
Philadelphia, Pennsylvania, United States).
Author Contributions
Conceived and designed the experiments: MH DP PZ. Performed the
experiments: MH SY BD MGS. Analyzed the data: MH DP MAB DSH
BMC. Contributed reagents/materials/analysis tools: MAB HR MG OB
JW. Wrote the paper: MH PZ. Revised the manuscript: RSW DC CTS.
References
1. Dupree L (1964) Prehistoric Archeological Surveys and Excavations in
Afghanistan: 1959-1960 and 1961-1963. Science 146: 638-640.
2. Dupree L (1980) Afghanistan. Princeton, NJ: Princeton University
Press. 778 p.
3. Gimbutas M (1970) Proto-Indo-European Culture: The Kurgan Culture
during the Fifth, Fourth, and Third Millennia B.C. In: Cardona G,
Hoenigswald M, Senn A, eds. Indo-European and Indo-Europeans:
Papers Presented at the Third Indo-European Conference at the
University of Pennsylvania. Philadelphia, PA: University of
Pennsylvania Press. pp 155-197.
4. Wilber D (1962) Afghanistan: Its people, its society, its culture. New
Haven, CT: Hraf Press.
5. Elizabeth E, Sarkhosh CV (2007) From Persepolis to the Punjab:
exploring ancient Iran, Afghanistan and Pakistan. London: British
Museum Press.
6. Library of Congress. Federal Research Division (2001) Afghanistan: a
country study. Baton Rouge, LA: Claitor's Pub. Division. xlv, 226 p.
7. Berti A, Barni F, Virgili A, Iacovacci G, Franchi C, et al. (2005)
Autosomal STR frequencies in Afghanistan population. J Forensic Sci
50: 1494-1496.
8. Di Cristofaro J, Buhler S, Temori SA, Chiaroni J (2012) Genetic data of
15 STR loci in five populations from Afghanistan. Forensic Sci Int
Genet 6(1): e44-45.
9. Lacau H, Bukhari A, Gayden T, La Salvia J, Regueiro M, et al. (2011)
Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo)
13: 103-108.
10. Alakoc YD, Gokcumen O, Tug A, Gultekin T, Gulec E, et al. (2010)
Y-chromosome and autosomal STR diversity in four proximate
settlements in Central Anatolia. Forensic Sci Int Genet 4: e135-137.
11. Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, et al. (2004)
Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet
114: 127-148.
12. El-Sibai M, Platt DE, Haber M, Xue Y, Youhanna SC, et al. (2009)
Geographical structure of the Y-chromosomal genetic landscape of the
Levant: a coastal-inland contrast. Ann Hum Genet 73: 568-581.
13. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, et al. (2011)
Influences of history, geography, and religion on genetic structure: the
Maronites in Lebanon. Eur J Hum Genet 19: 334-340.
14. Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, et al. (2008)
Identifying genetic traces of historical expansions: Phoenician
footprints in the Mediterranean. Am J Hum Genet 83: 633-642.
15. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, et al.
(2006) Polarity and temporality of high-resolution y-chromosome
distributions in India identify both indigenous and exogenous
expansions and reveal minor genetic influences of Central Asian
pastoralists. Am J Hum Genet 78: 202-221.
16. Yadav B, Raina A, Dogra TD (2011) Haplotype diversity of 17
Y-chromosomal STRs in Saraswat Brahmin Community of North India.
Forensic Sci Int Genet 5: e63-70.
17. Balamurugan K, Suhasini G, Vijaya M, Kanthimathi S, Mullins N, et al.
(2010) Y chromosome STR allelic and haplotype diversity in five ethnic
Tamil populations from Tamil Nadu, India. Leg Med (Tokyo) 12: 265-269.
18. Thangaraj K, Naidu BP, Crivellaro F, Tamang R, Upadhyay S, et al.
(2010) The influence of natural barriers in shaping the genetic structure
of Maharashtra populations. PLoS One 5: e15283.
19. Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, et al. (2006) Male demography
in East Asia: a north-south contrast in human population expansion
times. Genetics 172: 2431-2439.
20. Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, et al. (2011)
Parallel Evolution of Genes and Languages in the Caucasus Region.
Mol Biol Evol doi: 10.1093/molbev/msr126.
21. Roewer L, Willuweit S, Kruger C, Nagy M, Rychkov S, et al. (2008)
Analysis of Y chromosome STR haplotypes in the European part of
Russia reveals high diversities but non-significant genetic distances
between populations. Int J Legal Med 122: 219-223.
22. Bosch E, Calafell F, Gonzalez-Neira A, Flaiz C, Mateu E, et al. (2006)
Paternal and maternal lineages in the Balkans show a homogeneous
landscape over linguistic barriers, except for the isolated Aromuns. Ann
Hum Genet 70: 459-487.
23. Rebala K, Tsybovsky IS, Bogacheva AV, Kotova SA, Mikulich AI, et al.
(2011) Forensic analysis of polymorphism and regional stratification of
Y-chromosomal microsatellites in Belarus. Forensic Sci Int Genet 5:
e17-20.
24. Volgyi A, Zalan A, Szvetnik E, Pamjav H (2009) Hungarian population
data for 11 Y-STR and 49 Y-SNP markers. Forensic Sci Int Genet 3:
e27-28.
25. Kovatsi L, Saunier JL, Irwin JA (2009) Population genetics of
Y-chromosome STRs in a population of Northern Greeks. Forensic Sci
Int Genet 4: e21-22.
26. Batini C, Ferri G, Destro-Bisol G, Brisighelli F, Luiselli D, et al. (2011)
Signatures of the pre-agricultural peopling processes in sub-Saharan
Africa as revealed by the phylogeography of early Y chromosome
lineages. Mol Biol Evol doi: 10.1093/molbev/msr089.
27. Gomes V, Sanchez-Diz P, Amorim A, Carracedo A, Gusmao L (2010)
Digging deeper into East African human Y chromosome lineages. Hum
Genet 127: 603-613.
28. Gusmao L, Butler JM, Carracedo A, Gill P, Kayser M, et al. (2006)
DNA Commission of the International Society of Forensic Genetics
(ISFG): an update of the recommendations on the use of Y-STRs in
forensic analysis. Forensic Sci Int 157: 187-197.
29. Jolliffe I (1986) Principal Component Analysis, Second Edition. New
York, NY: Springer.
30. Kruskal JB (1964) Multidimensional scaling by optimizing goodness of
fit to a nonmetric hypothesis. Psychometrika 29: 1-27.
31. Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): An
integrated software package for population genetics data analysis. Evol
Bioinform Online 1: 47-50.
32. Monmonier M (1973) Maximum-difference barriers: An alternative
numerical regionalization method. Geographical Analysis. pp 245-261.
33. Manni F, Guerard E, Heyer E (2004) Geographic patterns of (genetic,
morphologic, linguistic) variation: How barriers can be detected by
using Monmonier's algorithm. Human Biology 76: 173-190.
34. Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular
variance inferred from metric distances among DNA haplotypes:
application to human mitochondrial DNA restriction data. Genetics
131: 479-491.
35. Bandelt HJ, Forster P, Sykes BC, Richards MB (1995) Mitochondrial
portraits of human populations using median networks. Genetics 141:
743-753.
36. Wilson IJ, Weale ME, Balding DJ (2003) Inferences from DNA data:
population histories, evolutionary processes and forensic match
probabilities. Journal of the Royal Statistical Society A 166, part 2: 155-201.
37. Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, et al.
(2004) The effective mutation rate at Y chromosome short tandem
repeats, with application to human population-divergence time. Am J
Hum Genet 74: 50-61.
38. Zhivotovsky LA, Underhill PA, Feldman MW (2006) Difference
between evolutionarily effective and germ line mutation rate due to
stochastically varying haplogroup size. Mol Biol Evol 23: 2268-2270.
39. Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, et al. (2002)
Y-chromosomal DNA variation in Pakistan. Am J Hum Genet 70:
1107-1124.
40. Heyer E, Balaresque P, Jobling MA, Quintana-Murci L, Chaix R, et al.
(2009) Genetic diversity and the emergence of ethnic groups in Central
Asia. BMC Genet 10: 49.
41. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C (2002)
A genetic landscape reshaped by recent events: Y-chromosomal insights
into central asia. Am J Hum Genet 71: 466-482.
42. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, et al.
(2001) The Eurasian heartland: a continental perspective on
Y-chromosome diversity. Proc Natl Acad Sci USA 98: 10244-10249.
43. Chaix R, Austerlitz F, Khegay T, Jacquesson S, Hammer MF, et al.
(2004) The genetic or mythical ancestry of descent groups: lessons from
the Y chromosome. Am J Hum Genet 75: 1113-1116.
44. Perez-Lezaun A, Calafell F, Comas D, Mateu E, Bosch E, et al. (1999)
Sex-specific migration patterns in Central Asian populations, revealed by
analysis of Y-chromosome short tandem repeats and mtDNA. Am J
Hum Genet 65: 208-219.
45. Martinez-Cruz B, Vitalis R, Segurel L, Austerlitz F, Georges M, et al.
(2011) In the heartland of Eurasia: the multilocus genetic landscape of
Central Asian populations. Eur J Hum Genet 19: 216-223.
46. Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, et al.
(2010) Separating the post-Glacial coancestry of European and Asian Y
chromosomes within haplogroup R1a. Eur J Hum Genet 18: 479-484.
47. Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, et al. (2000)
The genetic legacy of Paleolithic Homo sapiens sapiens in extant
Europeans: a Y chromosome perspective. Science 290: 1155-1159.
48. Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, et al.
(2007) Y-chromosomal evidence for a limited Greek contribution to the
Pathan population of Pakistan. Eur J Hum Genet 15: 121-126.
49. Emadi H (2005) Culture and customs of Afghanistan. Santa Barbara,
CA: Greenwood. 284 p.
50. Johanson L (1998) A History of Turkic. In: Johanson L, Csato E, eds.
The Turkic Languages. London: Routledge.
8. Conclusion
This study aimed to investigate the mtDNA haplogroup
composition and distribution among four ethnic groups of Afghanistan and to identify
whether these have been influenced by the demographic processes imposed upon
Afghanistan throughout its history. The Afghani population has been omitted from the
numerous population genetics studies of the past thirty years, largely due to the near
constant political and civil instability in this time. The results illustrate that the majority
of Afghani mtDNA types identified belong to West Eurasian lineages (64.4%). When the
individual ethnic groups are analysed, the Hazara exhibit a large East Asian mtDNA
contribution, at least 2½ times greater than that of any other ethnic group. This pattern
of East Asian lineage contribution is mirrored when Y-Chromosome lineages are examined
(Haber et al., 2012): the East Asian lineage C3-M217 (a lineage inferred to derive
from Genghis Khan; McElreavey and Quintana-Murci, 2005) was found in 33.33% of
the Hazara, but in <4% of Tajiks and Pashtuns. The presence and strength of the East
Asian lineages observed in both mtDNA and Y-Chromosome data indicate extensive
assimilation of East Asian tribes among the Hazaras, suggesting that their origins may
lie in a source population in eastern Asia. Additionally, the divergence and isolation
of the Hazaras from the rest of
Afghanistan’s ethnic groups may be attributed to a combination of the assimilation with
Mongol or East Asian peoples, the practice of Shia Islam (while their ethnic
contemporaries practice Sunni Islam) and also their residence among the harsh
topography of the Hindu Kush mountain range.
The West Eurasian haplogroups compose the majority of the lineages observed in the
Tajiks, Baluch and Pashtuns (>64%) and 40% among the Hazara. The common West
Eurasian haplogroups HV and H are found in each ethnic group ranging from 52.6% in
the Tajik population to 12.5% in the Hazaras. Likewise, the common West Eurasian
Y-haplogroup R1a1a is found least among the Hazaras and is greatest in the Pashtun
population. This supports the beliefs of the Baluch, Pashtuns and Tajiks in a West
Eurasian origin and may indicate a potential common ancestry of these groups which
have diverged to become their distinct ethnic groups over subsequent generations.
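As an illustration of how per-group haplogroup percentages of the kind quoted above can be tabulated, the following Python sketch counts haplogroup assignments within each group and converts the counts to percentages. It is a minimal sketch only: the function name haplogroup_frequencies, the group labels and the toy assignments are hypothetical and are not data or software from this study.

```python
from collections import Counter, defaultdict


def haplogroup_frequencies(assignments):
    """Map each group to {haplogroup: percentage of that group's samples}.

    `assignments` is an iterable of (group, haplogroup) pairs, one per sample.
    """
    per_group = defaultdict(Counter)
    for group, haplogroup in assignments:
        per_group[group][haplogroup] += 1

    frequencies = {}
    for group, counts in per_group.items():
        total = sum(counts.values())
        frequencies[group] = {
            hg: 100.0 * n / total for hg, n in counts.items()
        }
    return frequencies


# Hypothetical toy assignments (NOT results from this study).
toy_assignments = [
    ("GroupA", "H"), ("GroupA", "H"), ("GroupA", "HV"), ("GroupA", "D"),
    ("GroupB", "H"), ("GroupB", "C"), ("GroupB", "D"), ("GroupB", "D"),
]
toy_frequencies = haplogroup_frequencies(toy_assignments)
```

With real data, each pair would be one sampled individual and the haplogroup assigned to that individual's mtDNA.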
The HVS-I sequence data indicate that the Afghani ethnic groups (excluding the
Baluch) are expanding, based upon their smooth bell-shaped mismatch distributions and
the star-like phylogeny of the Median Joining network. Each ethnic group exhibits
a high level of genetic diversity, as shown by its numerous unique haplotypes. A
genetic barrier has also been identified separating the Iranian Baluch population of
south-eastern Iran from the Afghani Baluch population of south-western Afghanistan.
Additionally, Barrier analysis has highlighted that the Afghani populations have a greater
affinity with West Eurasians (with the exception of the Iranian Baluch population) and
Central Asians than with South or East Asian populations.
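For readers unfamiliar with the mismatch distribution mentioned above, the following Python sketch shows how one can be computed: every pair of aligned sequences is compared and the number of differing sites is tallied. It is a minimal sketch under stated assumptions, not the software used in this study. The function names and the toy sequences are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mismatch_distribution(sequences):
    """Return {number of differences: proportion of sequence pairs}."""
    counts = Counter(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )
    total_pairs = sum(counts.values())
    return {k: counts[k] / total_pairs for k in sorted(counts)}


# Hypothetical toy alignment (NOT HVS-I data from this study).
toy_sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TCGTACGT"]
toy_distribution = mismatch_distribution(toy_sequences)
```

Plotting the proportions against the number of differences gives the curve whose shape is interpreted: a smooth, unimodal curve is conventionally read as a signal of population expansion.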
The populations of Afghanistan possess a unique mtDNA structure that is likely to have
been shaped by the combination of the country’s extreme terrain and its strategic position
in Central Asia, which encouraged numerous migrations, invasions and empire
expansion events.
References
Achilli, A., Rengo, C., Magri, C., Battaglia, V., Olivieri, A., Scozzari, R., Cruciani,
F., Zeviani, M., Briem, E., Carelli, V., Moral, P., Dugoujon, J-M., Roostalu, U.,
Loogväli, E-L., Kivisild, T., Bandelt, H-J., Richards, M., Villems, R.,
Santachiara-Benerecetti, A.S., Semino, O., Torroni, A. (2004) The Molecular Dissection of mtDNA
Haplogroup H Confirms That the Franco-Cantabrian Glacial Refuge Was a Major Source
for the European Gene Pool. American Journal of Human Genetics 75:910-918.
Achilli, A., Rengo, C., Battaglia, V., Pala, M., Olivieri, A., Fornarino, S., Magri, C.,
Scozzari, R., Babudi, N., Santachiara-Benerecetti, A.S., Bandelt, H-J., Semino, O.,
Torroni, A. (2005) Saami and Berbers – An Unexpected Mitochondrial DNA Link.
American Journal of Human Genetics 76:883-886.
Achilli, A., Perego, U.A., Bravi, C.M., Coble, M.D., Kong, Q-P., Woodward, S.R.,
Salas, A., Torroni, A., Bandelt, H-J. (2008) The Phylogeny of the Four Pan-American
MtDNA Haplogroups: Implications for Evolutionary and Disease Studies. PLoS One
3:e1764.
Afghanistan Climate, Temperature, Average Weather History, Rainfall/Precipitation,
Sunshine (n.d.) Retrieved January 10th, 2011, from
http://www.climatetemp.info/afghanistan/
Afghanistan Ethnic Groups Map (2009) Retrieved January 11th, 2011, from
http://www.mapsofworld.com/afghanistan/afghanistan-ethnic-groups-map.html
Afghanistan Online: Chronological History of Afghanistan. (2008) Retrieved
November 9th, 2010, from http://www.afghan-web.com/history/chron/index.html
Afghans: Their History and Culture. (2002) Retrieved November 15th, 2010, from
http://www.cal.org/co/afghan/apeop.html
Alshamali, F., Brandstätter, A., Zimmermann, B., Parson, W. (2008) Mitochondrial
DNA control region variation in Dubai, United Arab Emirates. Forensic Science
International: Genetics 2:e9-10.
Alvarez-Iglesias, V., Mosquera-Miguel, A., Cerezo, M., Quintans, B., Zarrabeitia,
M.T., Cusco, I., Lareu, M.V., Garcia, O., Perez-Jurado, L., Carracedo, A., Salas, A.
(2009) New Population and Phylogenetic Features of the Internal Variation within
Mitochondrial DNA Haplogroup R0. PLoS ONE 4:e5112.
Al-Zahery, N., Semino, O., Benuzzi, G., Magri, C., Passarino, G., Torroni, A.,
Santachiara-Benerecetti, A.S. (2003) Y-Chromosome and mtDNA Polymorphisms in
Iraq, A Crossroad of the Early Human Dispersal and of post-Neolithic Migrations.
Molecular Phylogenetics and Evolution 28:458-472.
Alzualde, A., Izagirre, N., Alonso, S., Alonso, A., de la Rua, C. (2005) Temporal
Mitochondrial DNA Variation in the Basque Country: Influence of Post-Neolithic
Events. Annals of Human Genetics 69:665-679.
Anderson, S., Bankier, S., Barrell, B., de Bruijn, M., Coulson, A., Drouin, J.,
Eperon, I., Nierlich, D., Roe, B., Sanger, F., Schreier, P., Smith, A., Staden, R.,
Young, I. (1981) Sequence and Organization of the Human Mitochondrial Genome.
Nature 290:457-465.
Andrews, R., Kubacka, I., Chinnery, P., Lightowlers, R., Turnbull, D., Howell, N.
(1999) Reanalysis and Revision of the Cambridge Reference Sequence for Human
Mitochondrial DNA. Nature Genetics 23:147.
Anthony, D. (2007) The Horse, The Wheel and Language: How Bronze Age Riders from
the Eurasian Steppes Shaped the Modern World. New Jersey: University Presses of
California, Columbia and Princeton.
Asari, M., Umetsu, K., Adachi, N., Azumi, J., Shimizu, K., Shiono, H. (2007) Utility
of Haplogroup Determination for Forensic mtDNA Analysis in the Japanese Population.
Legal Medicine 9:237-240.
Barfield, T. (2010) Afghanistan: A Cultural and Political History. Princeton, New
Jersey: Princeton University Press.
Bartlett, J., Stirling, D. (2003) A Short History of the Polymerase Chain Reaction.
Methods in Molecular Biology 226:3-6.
BBC (2010) Afghanistan Country Profile. Retrieved November 10th, 2010, from
http://news.bbc.co.uk/1/hi/world/south_asia/country_profiles/1162668.stm
Bednarik, R. (2010) An Overview of Asian Paleoart of the Pleistocene. IFRAO
Symposium – Congress: Pleistocene Art of Asia (Pre-Acts).
Behar, D., Rosset, S., Blue-Smith, J., Balanovsky, O., Tzur, S., Comas, D., Mitchell,
R.J., Quintana-Murci, L., Tyler-Smith, C., Wells, R.S., The Genographic
Consortium. (2007) The Genographic Project Public Participation Mitochondrial DNA
Database. PLoS Genetics 3:1083-1095.
Bermisheva, M.A., Tambets, K., Villems, R., Khusnutdinova, E.K. (2002) Diversity
of Mitochondrial DNA Haplogroups in Ethnic Populations of the Volga-Ural Region.
Molecular Biology 36:802-812.
Berniell-Lee, G., Plaza, S., Bosch, E., Calafell, F., Jourdan, E., Césari, M., Lefranc,
G., Comas, D. (2008) Admixture and Sexual Bias in the Population Settlement of La
Réunion Island (Indian Ocean). American Journal of Physical Anthropology 136:100-107.
Bogenhagen, D. (1999) Repair of mtDNA in Vertebrates. American Journal of Human
Genetics 64:1276-1281.
Borst, P. (1977) Structure and Function of Mitochondrial DNA. Trends in Biochemical
Science 2(2):31-34.
Brown, M., Voljavec, A., Lott, M., Torroni, A., Yang, CC., Wallace, D. (1992)
Mitochondrial DNA Complex I and III Mutations Associated With Leber’s Hereditary
Optic Neuropathy. Genetics 130:163-173.
Brown, M., Hosseini, S., Torroni, A., Bandelt, HJ., Allen, J., Schurr, T., Scozzari,
R., Cruciani, F., Wallace, D. (1998) mtDNA Haplogroup X: An Ancient Link between
Europe/Western Asia and North America? American Journal of Human Genetics
63:1852-1861.
Brown, W., George, M., Wilson, A. (1979) Rapid Evolution of Animal Mitochondrial
DNA. Proceedings of the National Academy of Sciences of the USA 76(4):1967-1971.
Burch, J. (2008) Reuters: Afghan Census Postponed for Two Years - U.N. Retrieved
November 7th, 2010, from http://www.reuters.com/article/idUSISL267420080608
Butler, J. (2005) Forensic DNA Typing: Biology, Technology, and Genetics of STR
Markers. 2nd Edition. London. Elsevier Academic Press.
Cann, R., Stoneking, M., Wilson, A. (1987) Mitochondrial DNA and Evolution. Nature
325:31-36.
Cartmill, M., Smith, F.H. (2009) The Human Lineage. Hoboken, New Jersey. John
Wiley & Sons.
Carvalho, B.M., Bortolini, M.C., dos Santos, S.E.B., Ribeiro-dos-Santos, A.K.C.
(2008) Mitochondrial DNA Mapping of Social-Biological Interactions in Brazilian
Amazonian African-Descendant Populations. Genetics and Molecular Biology 31:12-22.
Chandrasekar, A., Kumar, S., Sreenath, J., Sarkar, B.N., Urade, B.P., Mallick, S.,
Bandopadhyay, S.S., Barua, P., Barik, S.S., Basu, D., Kiran, U., Gangopadhyay, P.,
Sahani, R., Prasad, B.V.R., Gangopadhyay, S., Lakshmi, G.R., Ravuri, R.R.,
Padmaja, K., Venugopal, P.N., Sharma, M-B., Rao, V.R. (2009) Updating Phylogeny
of Mitochondrial DNA Macrohaplogroup M in India: Dispersal of Modern Human in
South Asian Corridor. PLoS ONE 4:e7447.
Chen, Y-S., Torroni, A., Excoffier, L., Santachiara-Benerecetti, A.S., Wallace, D.
(1995) Analysis of mtDNA Variation in African Populations Reveals the Most Ancient
of All Human Continent-Specific Haplogroups. American Journal of Human Genetics
57:133-149.
Chen, X., Prosser, R., Simonetti, S., Sadlock, J., Jagiello, G., Schon, E. (1995)
Rearranged Mitochondrial Genomes Are Present in Human Oocytes. American Journal
of Human Genetics 57:239-247.
Childe, G. (1925) The Dawn of European Civilization. London: Kegan Paul, Trench,
Trubner.
CIA - The World Factbook: Afghanistan (2010) Retrieved November 7th, 2010, from
https://cia.gov/library/publications/the-world-factbook/geos/af.html
CIA - The World Factbook: Iran (2011) Retrieved December 14th, 2011, from
https://www.cia.gov/library/publications/the-world-factbook/geos/ir.html
Colorado State University/Department of Defense (2010) Afghanistan: Cultural
Heritage at a Glance. Retrieved November 15th, 2010, from
http://www.cemml.colostate.edu/cultural/09476/afgh02-01enl.html
Comas, D., Plaza, S., Wells, R.S., Yuldaseva, N., Lao, O., Calafell, F., Bertranpetit,
J. (2004) Admixture, Migrations, and Dispersals in Central Asia: Evidence from
Maternal DNA Lineages. European Journal of Human Genetics 12:495-504.
Cox, M., Mendez, F., Karafet, T., Pilkington, M., Kingan, S., Destro-Bisol, G.,
Strassmann, B., Hammer, M. (2008) Testing for Archaic Hominin Admixture on the X
Chromosome: Model Likelihood for the Modern Human RRM2P4 Region from
Summaries of Genealogical Topology Under the Structured Coalescent. Genetics
178:427-437.
Davison, K., Dolukhanov, P., Sarson, G., Shukurov, A. (2006) The Role of
Waterways in the Spread of the Neolithic. Journal of Archaeological Science 33:641-652.
Derenko, M., Malyarchuk, B., Grzybowski, T., Denisova, G., Dambueva, I.,
Perkova, M., Dorzhu, C., Luzina, F., Lee, H.K., Vanecek, T., Villems, R., Zakharov,
I. (2007) Phylogeographic Analysis of Mitochondrial DNA in Northern Asian
Populations. American Journal of Human Genetics 81:1025-1041.
Disotell, T. (1999) Human Evolution: The Southern Route to Asia. Current Biology
9:R925-R928.
Dupree, L., Dupree, N. (2011) Afghanistan In Encyclopædia Britannica. Retrieved
January 8th, 2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan
Ebner, S., Lang, R., Mueller, E., Eder, W., Oeller, M., Moser, A., Koller, J.,
Paulweber, B., Mayr, J., Sperl, W., Kofler, B. (2011) Mitochondrial Haplogroups,
Control Region Polymorphisms and Malignant Melanoma: A Study in Middle European
Caucasians. PLoS ONE 6:e27192.
Elson, J., Samuels, D., Turnbull, D., Chinnery, P. (2001) Random Intracellular Drift
Explains the Clonal Expansion of Mitochondrial DNA Mutations with Age. American
Journal of Human Genetics 68:802-806.
Ewens, W.J. (1972) The Sampling Theory of Selectively Neutral Alleles. Theoretical
Population Biology 3:87-112.
Excoffier, L., Smouse, P., Quattro, J. (1992) Analysis of Molecular Variance Inferred
from Metric Distances Among DNA Haplotypes: Application to Human Mitochondrial
DNA Restriction Data. Genetics 131:479-491.
Excoffier, L., Lischer, H.E.L. (2010) Arlequin Suite ver 3.5: A New Series of Programs
to Perform Population Genetics Analysis Under Linux and Windows. Molecular Ecology
Resources 10:564-567.
Fagundes, N., Kanitz, R., Eckert, R., Valls, A., Bogo, M., Salzano, F., Smith, D.G.,
Silva Jr., W., Zago, M., Ribeiro-dos-Santos, A., Santos, S., Petzl-Erler, M.L.,
Bonatto, S. (2008) Mitochondrial Population Genomics Supports a Single
Pre-Clovis Origin with a Coastal Route for the Peopling of the Americas. American Journal
of Human Genetics 82:583-592.
Farr, G. (2009) The Hazara of Central Afghanistan. In B. Brower & B. R. Johnston
(Eds), Disappearing Peoples? : Indigenous Groups and Ethnic Minorities in South and
Central Asia (pp154-169). Walnut Creek, CA, USA: Left Coast Press.
Fechner, A., Quinque, D., Rychkov, S., Morozowa, I., Naumova, O., Schneider, Y.,
Willuweit, S., Zhukova, O., Roewer, L., Stoneking, M., Nasidze, I. (2008) Boundaries
and Clines in the West Eurasian Y-Chromosome Landscape: Insights from the European
Part of Russia. American Journal of Physical Anthropology 137:41-47.
Forster, P., Cali, F., Röhl, A., Metspalu, E., D’Anna, R., Mirisola, M., De Leo, G.,
Flugy, A., Salerno, A., Ayala, G., Kouvatsi, A., Villems, R., Romano, V. (2002)
Continental and Subcontinental Distributions of mtDNA Control Region Types.
International Journal of Legal Medicine 116:99-108.
Forster, P., Renfrew, C. (2011) Mother Tongue and Y Chromosomes. Science
333:1390-1391.
Forster, P., Matsumura, S. (2005) Did Early Humans Go North or South? Science
308:965-966.
Fortson, B. (2009) Indo-European Language and Culture, An Introduction. 2nd Edition.
Chichester. John Wiley & Sons Ltd.
Fu, Y., Xie, C., Xu, X., Li, C., Zhang, Q., Zhou, H., Zhu, H. (2009) Ancient DNA
Analysis of Human Remains from the Upper Capital City of Kublai Khan. American
Journal of Physical Anthropology 138:23-29.
Giles, R., Blanc, H., Cann, H., Wallace, D. (1980) Maternal Inheritance of Human
Mitochondrial DNA. Proceedings of the National Academy of Sciences of the USA
77(11):6715-6719.
Gonder, M., Mortensen, H., Reed, F., de Sousa, A., Tishkoff, S. (2007) Whole-mtDNA
Genome Sequence Analysis of Ancient African Lineages. Molecular Biology
and Evolution 24:757-768.
Green, R., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson,
N., Li, H., Zhai, W., His-Yang Fritz, M., Hansen, N.F., Durand, E.Y., Malspinas, AS. Jensen, J.D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano,
H.A., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B.,
Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E.S., Russ, C., Novod, N.,
Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I.,
Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J.,
Rosas, A., Schmitz, R.W., Johnson, P.L.F., Eichler, E.E., Falush, D., Birney, E.,
Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Pääbo, S.
(2010) A Draft Sequence of the Neanderthal Genome. Science 328:710-722.
Grignani, P., Turchi, C., Achilli, A., Peloso, G., Alu, M., Ricci, U., Robino, C.,
Pelotti, S., Carnevali, E., Boschi, I., Tagliabracci, A., Previdere, C. (2009) Multiplex
mtDNA Coding Region SNP Assays for Molecular Dissection of Haplogroups U/K and
J/T. Forensic Science International: Genetics 4:21-25.
Haber, M., Platt, D.E., Badro, D.A., Xue, Y., El-Sibai, M., Ashrafian Bonab, M.,
Youhanna, S.C., Saade, S., Soria-Hernanz, D.F., Royyuru, A., Spencer Wells, R.,
Tyler-Smith, C., Zalloua, P.A., The Genographic Consortium (2010) Influences of
History, Geography, and Religion on Genetic Structure: the Maronites in Lebanon.
European Journal of Human Genetics 19:334-340.
Haber, M., Platt, D.E., Ashrafian Bonab, M., Youhanna, S.C., Soria-Hernanz, D.F.,
Martinez-Cruz, B., Douaihy, B., Ghassibe-Sabbagh, M., Rafatpanah, H., Ghanbari,
M., Whale, J., Balanovsky, O., Spencer Wells, R., Comas, D., Tyler-Smith, C.,
Zalloua, P.A., The Genographic Consortium (2012) Afghanistan’s Ethnic Groups
Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7:e34288.
Harpending, H.C. (1994) Signature of Ancient Population Growth in a Low-Resolution
Mitochondrial DNA Mismatch Distribution. Human Biology 66:591-600.
Harper, D. (2010) Online Etymology Dictionary. Retrieved December 20th, 2010, from
http://www.etymonline.com/index.php?search=Afghanistan&searchmode=none
Hasegawa, M., Horai, S. (1991) Time of the Deepest Root for Polymorphism in Human
Mitochondrial DNA. Journal of Molecular Evolution 32:37-42.
Hedman, M., Brandstätter, A., Pimenoff, V., Sistonen, P., Palo, J., Parson, W.,
Sajantila, A. (2007) Finnish Mitochondrial DNA HVS-I and HVS-II Population Data.
Forensic Science International 172:171-178.
Herrnstadt, C., Elson, J.L., Fahy, E., Preston, G., Turnbull, D.M., Anderson, C.,
Ghosh, S.S., Olefsky, J.M., Beal, M.F., Davis, R.E., Howell, N. (2002) Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences
for the Major African, Asian, and European Haplogroups. American Journal of Human
Genetics 70:1152-1171.
Heyer, E., Balaresque, P., Jobling, M., Quintana-Murci, L., Chaix, R., Segurel, L.,
Aldashev, A., Hegay, T. (2009) Genetic Diversity and the Emergence of Ethnic Groups
in Central Asia. BMC Genetics 10:49.
Hodgson, J., Disotell, T. (2008) No Evidence of a Neanderthal Contribution to Modern
Human Diversity. Genome Biology 9(2):206.
Howell, N., McCullough, D., Kubacka, I., Halvorson, S., Mackey, D. (1992) The
Sequence of Human mtDNA: The Question of Errors versus Polymorphisms. American
Journal of Human Genetics 50:1333-1337.
Hudjashov, G., Kivisild, T., Underhill, P., Endicott, P., Sanchez, J., Lin, A., Shen, P.,
Oefner, P., Renfrew, C., Villems, R., Forster, P. (2007) Revealing the Prehistoric
Settlement of Australia by Y Chromosome and mtDNA Analysis. Proceedings of the
National Academy of Sciences of the USA 104:8726-8730.
Ingman, M., Gyllensten, U. (2001) Analysis of the Complete Human mtDNA Genome:
Methodology and Inferences for Human Evolution. Journal of Heredity 92(6):454-461.
Ingman, M., Gyllensten, U. (2007a) A Recent Genetic Link Between Sami and the
Volga-Ural Region of Russia. European Journal of Human Genetics 15:115-120.
International Security Assistance Force (ISAF) (n.d.) Afghanistan Provinces Map.
Retrieved December 20th, 2010, from http://www.isaf.nato.int/map-usfora/index.php
Irwin, J., Saunier, J., Beh, P., Strouss, K., Painter, C., Parsons, T. (2009a)
Mitochondrial DNA Control Region Variation in a Population Sample from Hong Kong,
China. Forensic Science International: Genetics 3:e119-e125.
Irwin, J., Ikramov, A., Saunier, J., Bodner, M., Amory, S., Röck, A., O’Callaghan,
J., Nuritdinov, A., Atakhodjaev, S., Mukhamedov, R., Parson, W., Parsons, T.
(2009b) The mtDNA Composition of Uzbekistan: A Microcosm of Central Asian
Patterns. International Journal of Legal Medicine 124:195-204.
Islamic Republic of Afghanistan Central Statistics Organization (CSO) (2010)
Afghanistan Statistical Yearbook 2009-2010. Retrieved November 10th, 2010, from
http://www.cso.gov.af/
Islamic Republic of Afghanistan Office of the President; Biography (2009) Retrieved
November 7th, 2010, from http://www.president.gov.af/sroot_eng.aspx?id=166
Jacobson, J. (1979) Recent Developments in South Asian Prehistory and Protohistory.
Annual Review of Anthropology 8:467-502.
Jansen, T., Forster, P., Levine, M., Oelke, H., Hurles, M., Renfrew, C., Weber, J.,
Olek, K. (2002) Mitochondrial DNA and the Origins of the Domestic Horse.
Proceedings of the National Academy of Sciences of the USA 99(16):10905-10910.
Jin, HJ., Tyler-Smith, C., Kim, W. (2009) The Peopling of Korea Revealed by
Analyses of Mitochondrial DNA and Y-Chromosomal Markers. PLoS ONE 4:e4210.
Jobling, M.A. (2001) In the Name of the Father: Surnames and Genetics. Trends in
Genetics 17:353-357.
Jobling, M.A., Tyler-Smith, C. (2003) The Human Y-Chromosome: An Evolutionary
Marker Comes of Age. Nature Reviews Genetics 4:598-612.
Jobling, M.A., Hurles, M.E., Tyler-Smith, C. (2004) Human Evolutionary Genetics:
Origins, Peoples & Disease. Abingdon, UK. Garland Science.
Jukes, T., Cantor, C. (1969) Evolution of Protein Molecules. In H.N. Munro (Ed),
Mammalian Protein Metabolism (pp21-132). New York: Academic Press.
Kivisild, T., Bamshad, M.J., Kaldma, K., Metspalu, M., Metspalu, E., Reidla, M.,
Laos, S., Parik, J., Watkins, W.S., Dixon, M.E., Papiha, S.S., Mastana, S.S., Mir,
M.R., Ferak, V., Villems, R. (1999) Deep Common Ancestry of Indian and Western-Eurasian Mitochondrial DNA Lineages. Current Biology 9:1331-1334.
Kivisild, T., Kaldma, K., Metspalu, M., Parik, J., Papiha, S., Villems, R. (1999b) The
Place of the Indian mtDNA Variants in the Global Network of Maternal Lineages and the
Peopling of the Old World. In R. Deka, S. Papiha, R. Chakraborty (Eds.) Genomic
Diversity: Applications in Human Population Genetics (pp. 135-152). New York:
Kluwer Academic/Plenum Publishers.
Kivisild, T., Rootsi, S., Metspalu, M., Mastana, S., Kaldma, K., Parik, J., Metspalu,
E., Adojaan, M., Tolk, H-V., Stepanov, V., Gölge, M., Usanga, E., Papiha, S.S.,
Cinnioglu, C., King, R., Cavalli-Sforza, L., Underhill, P.A., Villems, R. (2003) The
Genetic Heritage of the Earliest Settlers Persists Both in Indian Tribal and Caste
Populations. American Journal of Human Genetics 72:313-332.
Klein, R.G. (2008) Out of Africa and the Evolution of Human Behaviour. Evolutionary
Anthropology 17:267-281.
Kolman, C., Sambuughin, N., Bermingham, E. (1996) Mitochondrial DNA Analysis
of Mongolian Populations and Implications for the Origin of New World Founders.
Genetics 142:1321-1334.
Kong, Q-P., Yao, Y. G., Sun, C., Bandelt, HJ., Zhu, C. L., Zhang, Y. P. (2003)
Phylogeny of East Asian Mitochondrial DNA Lineages Inferred from Complete
Sequences. American Journal of Human Genetics 73:671-676.
Kraytsberg, Y., Schwartz, M., Brown, T.A., Ebralidse, K., Kunz, W.S., Clayton,
D.A., Vissing, J., Khrapko, K. (2004) Recombination of Human Mitochondrial DNA.
Science 304:981.
Kumar, S., Reddy Ravuri, R., Koneru, P., Urade, BP., Sarkar, BN., Chandrasekar,
A., Rao, VR. (2009) Reconstructing Indian-Australian Phylogenetic Link. BMC
Evolutionary Biology 9:173-177.
Kumar, V., Reddy, AN., Babu, P., Nageswar, T., Thangaraj, K., Reddy, AG., Singh,
L., Reddy, B. (2008) Molecular Genetic Study on the Status of Transitional Groups in
Central India: Cultural Diffusion or Demic Diffusion? International Journal of Human
Genetics 8(1-2):31-39.
Kvist, L., Martens, J., Nazarenko, A.A., Orell, M. (2003) Paternal Leakage of
Mitochondrial DNA in the Great Tit (Parus major). Molecular Biology and Evolution
20(2):243-247.
Lacau, H., Bukhari, A., Gayden, T., La Salvia, J., Regueiro, M., Stojkovic, O.,
Herrera, R. (2011) Y-STR Profiling in Two Afghanistan Populations. Legal Medicine
13(2):103-108.
Lewis, PM. (Ed) (2009) Ethnologue: Languages of the World, Sixteenth Edition. Dallas,
TX, USA: SIL International. Online Version: http://www.ethnologue.com/
Lightowlers, R., Chinnery, P., Turnbull, D., Howell, N. (1997) Mammalian
Mitochondrial Genetics: Heredity, Heteroplasmy and Disease. Trends in Genetics
13:450-455.
Liu, H., Prugnolle, F., Manica, A., Balloux, F. (2006) A Geographically Explicit
Genetic Model of Worldwide Human-Settlement History. American Journal of Human
Genetics 79:230-237.
Macaulay, V., Richards, M., Hickey, E., Vega, E., Cruciani, F., Guida, V., Scozzari,
R., Bonne-Tamir, B., Sykes, B., Torroni, A. (1999) The Emerging Tree of West
Eurasian mtDNAs: A Synthesis of Control-Region Sequences and RFLPs. American
Journal of Human Genetics 64:232-249.
Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D., Meehan, W., Blackburn,
J., Semino, O., Scozzari, R., Cruciani, F., Taha, A., Kassim Shaari, N., Maripa Raja,
J., Ismail, P., Zainuddin, Z., Goodwin, W., Bulbeck, D., Bandelt, H-J.,
Oppenheimer, S., Torroni, A., Richards, M. (2005) Single, Rapid Coastal Settlement
of Asia Revealed by Analysis of Complete Mitochondrial Genomes. Science 308:1034-1036.
Maji, S., Krithika, S., Vasulu, T. (2008) Distribution of Mitochondrial DNA
Macrohaplogroup N in India with Special Reference to Haplogroup R and its Sub-Haplogroup U. International Journal of Human Genetics 8(1-2):85-96.
Mallory, J. (2003) Archaeological Models and Asian Indo-Europeans. In N. Sims-Williams (Ed) Indo-Iranian Languages and Peoples (pp. 19-42). Oxford. British
Academy.
Manni, F., Guérard, E., Heyer, E. (2004) Geographic Patterns of (Genetic, Morphologic,
Linguistic) Variation: How Barriers Can Be Detected by Using Monmonier’s Algorithm.
Human Biology 76:173-190.
Martinez, L., Mirabel, S., Luis, J.R., Herrera, R.J. (2008) Middle Eastern and
European mtDNA Lineages Characterize Populations from Eastern Crete. American
Journal of Physical Anthropology 137:213-223.
McElreavey, K., Quintana-Murci, L. (2005) A Population Genetics Perspective of the
Indus Valley Through Uniparentally Inherited Markers. Annals of Human Biology
32:154-162.
Meinilä, M., Finnilä, S., Majamaa, K. (2001) Evidence for mtDNA Admixture
Between the Finns and the Saami. Human Heredity 52:160-170.
Mellars, P. (2006) Going East: New Genetic and Archaeological Perspectives on the
Modern Human Colonization of Eurasia. Science 313:796-800.
Merriwether, A., Clark, A., Ballinger, S., Schurr, T., Soodyall, H., Jenkins, T.,
Sherry, S., Wallace, D. (1991) The Structure of Human Mitochondrial DNA Variation.
Journal of Molecular Evolution 33:543-555.
Meusel, M., Moritz, R. (1993) Transfer of Paternal Mitochondrial DNA During
Fertilization of Honeybee (Apis mellifera L.) eggs. Current Genetics 24(6):539-543.
Mikkelsen, M., Rockenbauer, E., Sørensen, E., Rasmussen, M., Børsting, C.,
Morling, N. (2008) A Mitochondrial DNA SNP Multiplex Assigning Caucasians into 36
Haplo- and Subhaplogroups. Forensic Science International: Genetics Supplement Series
1:287-289.
Nasidze, I., Stoneking, M. (2001) Mitochondrial DNA Variation and Language
Replacements in the Caucasus. Proceedings of the Royal Society London Biological
Sciences 268:1197-1206.
Nasidze, I., Ling, E.Y.S., Quinque, D., Dupanloup, I., Cordaux, R., Rychkov, S.,
Naumova, O., Zhukova, O., Sarraf-Zadegan, N., Naderi, G.A., Asgary, S., Sardas,
S., Farhud D.D., Sarkisian, T., Asadov, C., Kerimov, A., Stoneking, M. (2004a)
Mitochondrial DNA and Y-Chromosome Variation in the Caucasus. Annals of Human
Genetics 68:205-221.
Nasidze, I., Quinque, D., Dupanloup, I., Rychkov, S., Naumova, O., Zhukova, O.,
Stoneking, M. (2004b) Genetic Evidence Concerning the Origins of South and North
Ossetians. Annals of Human Genetics 68:588-599.
Nasidze, I., Quinque, D., Ozturk, M., Bendukidze, N., Stoneking, M. (2005a) mtDNA
and Y-Chromosome Variation in Kurdish Groups. Annals of Human Genetics 69:401-412.
Nasidze, I., Quinque, D., Dupanloup, I., Cordaux, R., Kokshunova, L., Stoneking,
M. (2005b) Genetic Evidence for the Mongolian Ancestry of Kalmyks. American
Journal of Physical Anthropology 128:846-854.
Nasidze, I., Quinque, D., Rahmani, M., Ali Alemohamad, S., Stoneking, M. (2006)
Concomitant Replacement of Language and mtDNA in South Caspian Populations of
Iran. Current Biology 16:668-673.
Nasidze, I., Quinque, D., Udina, I., Kunizheva, S., Stoneking, M. (2007) The Gagauz,
a Linguistic Enclave, are not a Genetic Isolate. Annals of Human Genetics 71:379-389.
Nasidze, I., Quinque, D., Rahmani, M., Alemohamad, S.A., Stoneking, M. (2008)
Close Genetic Relationship Between Semitic-speaking and Indo-European-speaking
Groups in Iran. Annals of Human Genetics 72:241-252.
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New
York, NY, USA.
Nei, M. (1995) Genetic Support for the Out-of-Africa Theory of Human Evolution.
Proceedings of the National Academy of Sciences of the USA 92:6720-6722.
Nielson, P. (2010) The People and Cultures of Afghanistan. Retrieved January 11th,
2011, from http://www.suite101.com/content/the-people-of-afghanistan-a189542
Olivo, P., van de Walle, M., Laipis, P., Hauswirth, W. (1983) Nucleotide Sequence
Evidence for Rapid Genotypic Shifts in the Bovine Mitochondrial DNA D-loop. Nature
306:400-402.
Palanichamy, M., Sun, C., Agrawal, S., Bandelt, HJ., Kong, QP., Khan, F., Wang,
CY., Chaudhuri, TK., Palla, V., Zhang, YP. (2004) Phylogeny of Mitochondrial DNA
Macrohaplogroup N in India, Based on Complete Sequencing: Implication for the
Peopling of South Asia. American Journal of Human Genetics 75:966-978.
Pereira, L., Richards, M., Goias, A., Alonso, A., Albarran, C., Garcia, O., Behar, D.,
Gölge, M., Hatina, J., Al-Gazali, L., Bradley, D., Macaulay, V., Amorim, A. (2006)
Evaluating the Forensic Informativeness of mtDNA Haplogroup H Sub-Typing on a
Eurasian Scale. Forensic Science International 159:43-50.
Petrov, V., Weinbaum, M. (2011) Afghanistan. In Encyclopædia Britannica. Retrieved
January 8th, 2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan
Piganeau, G., Eyre-Walker, A. (2004) A Reanalysis of the Indirect Evidence for
Recombination in Human Mitochondrial DNA. Heredity 92:282-288.
Powell, G., Yang, H., Tyler-Smith, C., Xue, Y. (2007) The Population History of the
Xibe in Northern China: A Comparison of Autosomal, mtDNA and Y-Chromosomal
Analyses of Migration and Gene Flow. Forensic Science International: Genetics 1:115-119.
Qamar, R., Ayub, Q., Mohyuddin, A., Mazhar, K., Mansoor, A., Zerjal, T., Tyler-Smith, C., Mehdi, Q. (2002) Y-Chromosomal DNA Variation in Pakistan. American
Journal of Human Genetics 70:1107-1124.
Qiagen (2009) PAXgene Blood DNA Kit. Retrieved December 10th, 2010, from
http://www.qiagen.com/products/genomicdnastabilizationpurification/paxgeneblooddnasystem/paxgeneblooddnakit.aspx#Tabs=t1
Quintana-Murci, L., Chaix, R., Wells, S., Behar, D., Sayar, H., Scozzari, R., Rengo,
C., Al-Zahery, N., Semino, O., Santachiara-Benerecetti, S., Coppa, A., Ayub, Q.,
Mohyuddin, A., Tyler-Smith, C., Mehdi, Q., Torroni, A., McElreavey, K. (2004)
Where West Meets East: The Complex mtDNA Landscape of the Southwest and Central
Asian Corridor. American Journal of Human Genetics 74:827-845.
Rasanayagam, A. (2003) Afghanistan: A Modern History. London: I.B. Tauris.
Reidla, M., Kivisild, T., Metspalu, E., Kaldma, K., Tambets, K., Tolk, H-V, Parik,
J., Loogvali, E-L., Derenko, M., Malyarchuk, B., Bermisheva, M., Zhadanov, S.,
Pennarun, E., Gubina, M., Golubenko, M., Damba, L., Fedorova, S., Gusar, V.,
Grechanina, E., Mikerezi, I., Moisan, J-P., Chavantré, A., Khusnutdinova, E.,
Osipova, L., Stepanov, V., Voevoda, M., Achilli, A., Rengo, C., Rickards, O., De
Stefano, G.F., Papiha, S., Beckman, L., Janicijevic, B., Rudan, P., Anagnou, N.,
Michalodimitrakis, C., Koziel, S., Usanga, E., Geberhiwot, T., Herrnstadt, C.,
Howell, N., Torroni, A., Villems, R. (2003) Origin and Diffusion of mtDNA
Haplogroup X. American Journal of Human Genetics 73:1178-1190.
Richard, C., Pennarun, E., Kivisild, T., Tambets, K., Tolk, H-V., Metspalu, E.,
Reidla, M., Chevalier, S., Giraudet, S., Lauc, L., Pericic, M., Rudan, P., Claustres,
M., Journel, H., Dorval, I., Muller, C., Villems, R., Chaventre, A., Moisan, JP.
(2007) An mtDNA Perspective of French Genetic Variation. Annals of Human Biology
34:68-79.
Richards, M., Macaulay, V., Bandelt, H-J., Sykes, B. (1998) Phylogeography of
Mitochondrial DNA in Western Europe. Annals of Human Genetics 62:241-260.
Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C.,
Sellitto, D., Cruciani, F., Kivisild, T., Villems, R., Thomas, M., Rychkov, S.,
Rychkov, O., Rychkov, Y., Gölge, M., Dimitrov, D., Hill, E., Bradley, D., Romano,
V., Cali, F., Vona, G., Demaine, A., Papiha, S., Triantaphyllidis, C., Stefanescu, G.,
Hatina, J., Belledi, M., Di Rienzo, A., Novelletto, A., Oppenheim, A., Norby, S., Al-Zaheri, N., Santachiara-Benerecetti, S., Scozzari, R., Torroni, A., Bandelt, H-J.
(2000) Tracing European Founder Lineages in the Near Eastern mtDNA Pool. American
Journal of Human Genetics 67:1251-1276.
Richards, M., Macaulay, V., Torroni, A., Bandelt, H-J. (2002) In Search of
Geographical Patterns in European Mitochondrial DNA. American Journal of Human
Genetics 71:1168-1174.
Rogers, A.R., Harpending, H. (1992) Population Growth Makes Waves in the
Distribution of Pairwise Genetic Differences. Molecular Biology and Evolution 9:552-569.
Röhl, A., Brinkmann, B., Forster, L., Forster, P. (2001) An Annotated mtDNA
Database. International Journal of Legal Medicine 115:29-39.
Roostalu, U., Kutuev, I., Loogvali, E-L., Metspalu, E., Tambets, K., Reidla, M.,
Khusnutdinova, E.K., Usanga, E., Kivisild, T., Villems, R. (2007) Origin and
Expansion of Haplogroup H, the Dominant Human Mitochondrial DNA Lineage in West
Eurasia: The Near Eastern and Caucasian Perspective. Molecular Biology and Evolution
24:436-448.
Ruvolo, M., Zehr, S., von Dornum, M., Pan, D., Chang, D., Lin, J. (1993)
Mitochondrial COII Sequences and Modern Human Origins. Molecular Biology and
Evolution 10:1115-1135.
Salas, A., Richards, M., De la Fe, T., Lareu, M-V., Sobrino, B., Sanchez-Diaz, P.,
Macaulay, V., Carracedo, A. (2002) The Making of the African mtDNA Landscape.
American Journal of Human Genetics 71:1082-1111.
Schick, K., Toth, N. (1993) Making Silent Stones Speak: Human Evolution and the
Dawn of Technology. New York: Simon & Schuster.
Schurr, T., Ballinger, S., Gan, Y-Y., Hodge, J., Merriwether, D.A., Lawrence, D.,
Knowler, W., Weiss, K., Wallace, D. (1990) Amerindian Mitochondrial DNAs Have
Rare Asian Mutations at High Frequencies, Suggesting They Derived from Four Primary
Maternal Lineages. American Journal of Human Genetics 46:613-623.
Schwartz, M., Vissing, J. (2002) Paternal Inheritance of Mitochondrial DNA. New
England Journal of Medicine 347:576-580.
Shepard, E.M., Herrera, R.J. (2006) Iranian STR Variation at the Fringes of
Biogeographical Demarcation. Forensic Science International 158:140-148.
Shlush, L., Behar, D., Yudkovsky, G., Templeton, A., Hadid, Y., Basis, F., Hammer,
M., Itzkovitz, S., Skorecki, K. (2008) The Druze: A Population Genetic Refugium of
the Near East. PLoS ONE 3:e2105.
Short, D. (2007) Indo-European Languages; Part 1: Centum Languages. Retrieved
August 18th, 2009, from http://www.danshort.com/ie/iecentum.htm
Short, D. (2007) Indo-European Languages; Part 2: Satem Languages. Retrieved
August 18th, 2009, from http://danshort.com/ie/iesatem.htm
Soares, P., Ermini, L., Thompson, N., Mormina, M., Rito, T., Röhl, A., Salas, A.,
Oppenheimer, S., Macaulay, V., Richards, M. (2009) Correcting for Purifying
Selection: An Improved Human Mitochondrial Molecular Clock. American Journal of
Human Genetics 84:740-759.
Soodyall, H., Vigilant, L., Hill, A.V., Stoneking, M., Jenkins, T. (1996) mtDNA
Control-Region Sequence Variation Suggests Multiple Independent Origins of an
“Asian-Specific” 9-bp Deletion in Sub-Saharan Africans. American Journal of Human
Genetics 58:595-608.
St. John, J., Sakkas, D., Dimitriadi, K., Barnes, A., Maclin, V., Ramey, J., Barratt,
C., De Jonge, C. (2000) Failure of Elimination of Paternal Mitochondrial DNA in
Abnormal Embryos. The Lancet 355:200.
Stoneking, M. (2008) Human Origins: The Molecular Perspective. EMBO Reports
9:S46-S50.
Stringer, C. (2002) Modern Human Origins: Progress and Prospects. Philosophical
Transactions, Biological Sciences 357:563-579.
Sykes, B., Irven, C. (2000) Surnames and the Y Chromosome. American Journal of
Human Genetics 66:1417-1419.
Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in Finite Populations.
Genetics 105:437-460.
Tajima, F. (1989) Statistical Method for Testing the Neutral Mutation Hypothesis by
DNA Polymorphism. Genetics 123:585-595.
Tambets, K., Rootsi, S., Kivisild, T., Help, H., Serk, P., Loogvali, EL., Tolk, HV.,
Reidla, M., Metspalu, E., Pliss, L., Balanovsky, O., Pshenichnov, A., Balanovska, E.,
Gubina, M., Zhadanov, S., Osipova, L., Damba, L., Voevoda, M., Kutuev, I.,
Bermisheva, M., Khusnutdinova, E., Gusar, V., Grechanina, E., Parik, J.,
Pennarun, E., Richard, C., Chaventre, A., Moisan, JP., Barac, L., Pericic, M.,
Rudan, P., Terzic, R., Mikerezi, I., Krumina, A., Baumanis, V., Koziel, S., Rickards,
O., De Stefano, GF., Anagnou, N., Pappa, KI., Michalodimitrakis, E., Ferak, V.,
Furedi, S., Komel, R., Beckman, L., Villems, R. (2004) The Western and Eastern
Roots of the Saami – the Story of Genetic “Outliers” Told by Mitochondrial DNA and Y
Chromosomes. American Journal of Human Genetics 74:661-682.
Tetzlaff, S., Brandstatter, A., Wegener, R., Parson, W., Weirich, V. (2007)
Mitochondrial DNA Population Data of HVS-I and HVS-II Sequences from a Northeast
German Sample. Forensic Science International 172:218-224.
Tömöry, G., Csanyi, B., Bogacsi-Szabo, E., Kalmar, T., Czibula, A., Csosz, A.,
Priskin, K., Mende, B., Lango, P., Downes, C.S., Rasko, I. (2007) Comparison of
Maternal Lineage and Biogeographic Analyses of Ancient and Modern Hungarian
Populations. American Journal of Physical Anthropology 134:354-368.
Torroni, A., Schurr, T., Yang, C. C., Szathmary, E., Williams, R., Schanfield, M.,
Troup, G., Knowler, W., Lawrence, D., Weiss, K., Wallace, D. (1992) Native
American Mitochondrial DNA Analysis Indicates That the Amerind and the Nadene
Populations Were Founded by Two Independent Migrations. Genetics 130:153-162.
Torroni, A., Schurr, T., Cabell, M., Brown, M., Neel, J., Larsen, M., Smith, D.,
Vullo, C., Wallace, D. (1993) Asian Affinities and Continental Radiation of the Four
Founding Native American mtDNAs. American Journal of Human Genetics 53:563-590.
Torroni, A., Neel, J., Barrantes, R., Schurr, T., Wallace, D. (1994a) Mitochondrial
DNA “Clock” for the Amerinds and Its Implications for Timing Their Entry into North
America. Proceedings of the National Academy of Sciences of the USA 91:1158-1162.
Torroni, A., Miller, J., Moore, L., Zamudio, S., Zhuang, J., Droma, T., Wallace, D.
(1994b) Mitochondrial DNA Analysis in Tibet: Implications for the Origin of the Tibetan
Population and Its Adaptation to High Altitude. American Journal of Physical
Anthropology 93:189-199.
Torroni, A., Huoponen, K., Francalacci, P., Petrozzi, M., Morelli, L., Scozzari, R.,
Obinu, D., Savontaus, M. L., Wallace, D. (1996) Classification of European mtDNAs
from an Analysis of Three European Populations. Genetics 144:1835-1850.
Torroni, A., Petrozzi, M., D’Urbano, L., Sellitto, D., Zeviani, M., Carrara, F.,
Carducci, C., Leuzzi, V., Carelli, V., Barboni, P., De Negri, A., Scozzari, R. (1997)
Haplotype and Phylogenetic Analyses Suggest That One European-Specific mtDNA
Background Plays a Role in the Expression of Leber Hereditary Optic Neuropathy by
Increasing the Penetrance of Primary Mutations 11778 and 14484. American Journal of
Human Genetics 60:1107-1121.
Torroni, A., Bandelt, HJ., D’Urbano, L., Lahermo, P., Moral, P., Sellitto, D., Rengo,
C., Forster, P., Savontaus, M.L., Bonne-Tamir, B., Scozzari, R. (1998) mtDNA
Analysis Reveals a Major Late Paleolithic Population Expansion from Southwestern to
Northeastern Europe. American Journal of Human Genetics 62:1137-1152.
Torroni, A., Cruciani, F., Rengo, C., Sellitto, D., Lopez-Bigas, N., Rabionet, R.,
Govea, N., Lopez de Munain, A., Sarduy, M., Romero, L., Villamar, M., del
Castillo, I., Moreno, F., Estivill, X., Scozzari, R. (1999) The A1555G Mutation in the
12S rRNA Gene of Human mtDNA: Recurrent Origins and Founder Events in Families
Affected by Sensorineural Deafness. American Journal of Human Genetics 65:1349-1358.
Torroni, A., Bandelt, HJ., Macaulay, V., Richards, M., Cruciani, F., Rengo, C.,
Martinez-Cabrera, V., Villems, R., Kivisild, T., Metspalu, E., Parik, J., Tolk, HV.,
Tambets, K., Forster, P., Karger, B., Francalacci, P., Rudan, P., Janicijevic, B.,
Rickards, O., Savontaus, ML., Huoponen, K., Laitinen, V., Koivumäki, S., Sykes,
B., Hickey, E., Novelletto, A., Moral, P., Sellitto, D., Coppa, A., Al-Zaheri, N.,
Santachiara-Benerecetti, A.S., Semino, O., Scozzari, R. (2001) A Signal, from Human
mtDNA, of Postglacial Recolonization in Europe. American Journal of Human Genetics
69:844-852.
Torroni, A., Achilli, A., Macaulay, V., Richards, M., Bandelt, HJ. (2006) Harvesting
the Fruit of the Human mtDNA Tree. Trends in Genetics 22(6):339-345.
United Nations Population Division (UNPD) (2009) World Population Prospects: The
2008 Revision Population Database. Retrieved January 11th, 2011, from
http://esa.un.org/unpp/
United Nations Population Fund (UNFPA) (2010) From Conflict and Crisis to
Renewal: Generations of Change; Demographic, Social and Economic Indicators.
Retrieved November 10th, 2010, from
http://www.unfpa.org/swp/2010/web/en/indicators.shtml
UNHCR - United Nations Refugee Agency (2003) Assessment for Uzbeks in
Afghanistan. Retrieved November 17th, 2010, from
http://www.unhcr.org/refworld/docid/469f3a521d.html and
http://www.unhcr.org/cgi-bin/texis/vtx/page?page=49e486eb6
UNHCR (2011) 2011 UNHCR Country Operations Profile - Afghanistan. Retrieved
January 6th, 2011, from http://www.unhcr.org/cgi-bin/texis/vtx/page?page=49e486eb6
United Nations Statistics Division (2010) 2010 World Population and Housing Census
Programme. Retrieved November 15th, 2010, from
http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm
Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K., Wilson, A. (1991) African
Populations and the Evolution of Human Mitochondrial DNA. Science 253:1503-1507.
Wallace, D., Brown, M., Lott, M. (1999) Mitochondrial DNA Variation in Human
Evolution and Disease. Gene 238:211-230.
Watson, J., Crick, F. (1953) A Structure for Deoxyribose Nucleic Acid. Nature 171:737-738.
Watterson, G. (1975) On the Number of Segregating Sites in Genetical Models without
Recombination. Theoretical Population Biology 7:256-276.
Weather and Climate in Afghanistan (2011) Retrieved January 10th, 2011, from
http://www.southtravels.com/asia/afghanistan/weather.html
Weinbaum, M. (2011) Afghanistan. In Encyclopædia Britannica. Retrieved January 8th,
2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan
Wilber, D. (1962) Afghanistan: Its People, Its Society, Its Culture. New Haven: Hraf
Press.
Winters, C. (2011) The Gibraltar Out of Africa Exit for Anatomically Modern Humans.
Webmed Central 2(10):WMC002319.
Wolpoff, M., Wu, X., Thorne, A. (1984) Modern Homo sapiens Origins: A General
Theory of Hominid Evolution Involving the Fossil Evidence from East Asia. In F. Smith,
F. Spencer (Eds), The Origins of Modern Humans: A World Survey of the Fossil
Evidence (pp 411-483). New York: A. R. Liss.
Wolpoff, M., Hawks, J., Caspari, R. (2000) Multiregional, Not Multiple Origins.
American Journal of Physical Anthropology 112:129-136.
Yang, Y., Zhang, P., He, Q., Zhu, Y., Yang, X., Lv, R., Chen, J. (2011) A New
Strategy for the Discrimination of Mitochondrial DNA Haplogroups in Han Population.
Journal of Forensic Sciences 56:586-590.
Zerjal, T., Xue, Y., Bertorelle, G., Wells, R. S., Bao, W., Zhu, S., Qamar, R., Ayub,
Q., Mohyuddin, A., Fu, S., Li, P., Yuldasheva, N., Ruzibakiev, R., Xu, J., Shu, Q.,
Du, R., Yang, H., Hurles, M., Robinson, E., Gerelsaikhan, T., Dashnyam, B., Mehdi,
Q., Tyler-Smith, C. (2003) The Genetic Legacy of the Mongols. American Journal of
Human Genetics 72:717-721.
Zimmermann, B., Brandstätter, A., Duftner, N., Niederwieser, D., Spiroski, M.,
Arsov, T., Parson, W. (2007) Mitochondrial Control Region Population Data from
Macedonia. Forensic Science International: Genetics 1:e4-e9.
Zlojutro, M., Tarskaia, L., Sorensen, M., Snodgrass, J.J., Leonard, W., Crawford,
M. (2008) The Origins of the Yakut People: Evidence from Mitochondrial DNA
Diversity. International Journal of Human Genetics 8:119-130.
Zvelebil, M. (1980) The Rise of the Nomads in Central Asia. In A. Sherratt (Ed). The
Cambridge Encyclopedia of Archaeology (pp252-256). New York: Crown.
Appendices
Appendix 1: Materials
Appendix 2: Ethical Consent Forms
Appendix 3: mtDNA HVS-I Sequence Data
Appendix 1: Materials
Equipment
1-10µl pipettor
1-20µl pipettor
10-100µl pipettor
100-1,000µl pipettor
Techne TC-3000 Thermocycler
Applied Biosystems Veriti Thermocycler
Thermo Electron Corporation Px2 Thermal Cycler
Jouan BR4i Centrifuge
ThermoScientific Heraeus Pico 17 Centrifuge
MSE MicroCentaur Centrifuge
Grant Heatblock
Eppendorf Concentrator 5301 Vacuum
Captair Bio PCR UV Cabinet
Priorclave Compact 40 Benchtop Autoclave
Stinol Fridge/Freezer
GeneFlash Syngene Bio Imaging UV Transilluminator
BioRad PowerPac Basic Power Pack
Fisons Whirlimixer Vortex
Mettler PJ400 Scales
NanoDrop Spectrophotometer ND-1000
Stuart Scientific Magnetic Stirrer SM1
Proline Microwave
LEEC Heated Cabinet
Electrophoresis Tank
Consumables
0.5-10µl Pipettor tips
1-200µl Pipettor tips
100-1,000µl Pipettor tips
2ml Microcentrifuge tubes
1.5ml Microcentrifuge tubes
0.5ml Microcentrifuge tubes
0.2ml Domed PCR tubes
Macherey-Nagel NucleoSpin Extract II Kit
Restriction Enzymes
AluI
BfaI
BstNI
HaeII
HaeIII
HhaI
HincII
HinfI
HpaI
HphI
MboII
MnlI
MseI
NlaIII
Solutions
● 10x TBE
54g Tris Base
27.5g Boric Acid
4.65g EDTA
Added to 500ml dH2O.
● 10% Ammonium Persulphate (APS)
1g APS in 10ml dH2O.
● Ethidium Bromide (10mg/mL)
0.2g in 20ml dH2O.
● Glycogen
20mg/ml
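As a quick arithmetic check on the recipes above, the stated masses and volumes can be converted into concentrations. The sketch below is illustrative only: it assumes the list is read as two interleaved columns, so that 1g in 10ml belongs to the APS solution and 0.2g in 20ml to the ethidium bromide stock, and the function names are hypothetical rather than part of the original protocol.

```python
def mg_per_ml(mass_g: float, volume_ml: float) -> float:
    """Concentration in mg/ml for mass_g grams dissolved in volume_ml millilitres."""
    return mass_g * 1000.0 / volume_ml


def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Percent weight/volume: grams of solute per 100 ml of solution."""
    return mass_g / volume_ml * 100.0


def scale_recipe(recipe_g: dict, from_ml: float, to_ml: float) -> dict:
    """Scale every mass in a recipe from one final volume to another."""
    factor = to_ml / from_ml
    return {name: grams * factor for name, grams in recipe_g.items()}


# Masses for 10x TBE as listed, for a 500 ml final volume.
TBE_10X_500ML = {"Tris Base": 54.0, "Boric Acid": 27.5, "EDTA": 4.65}

if __name__ == "__main__":
    print("APS % w/v:", percent_w_v(1.0, 10.0))
    print("Ethidium bromide mg/ml:", mg_per_ml(0.2, 20.0))
    print("10x TBE per litre:", scale_recipe(TBE_10X_500ML, 500.0, 1000.0))
```

Under that reading, both computed values agree with the labels given in the list (10% APS and 10mg/mL ethidium bromide).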
DNA Isolation Protocol
Appendix 2: Ethical Consent Forms
Appendix 3: mtDNA HVS-I Sequence Data
Forensic Format of HVS-I Sequence Haplotypes for the Afghani Ethnic Groups
Baluch:
Sample Number     Polymorphic Sites
43_Afghani_Bal    16126.C 16163.G 16186.T 16189.C 16294.T 16325.C
44_Afghani_Bal    16172.C 16183.C 16189.C 16193.1C
49_Afghani_Bal    16071.T
51_Afghani_Bal    16145.A 16176.T 16223.T 16261.T 16311.C
97_Afghani_Bal    16069.T 16126.C 16145.A 16172.C 16222.T 16261.T 16292.A 16344.T
98_Afghani_Bal    16069.T 16093.C 16126.C 16145.A 16240.G 16261.T
99_Afghani_Bal    16189.C 16189.1C 16193.1C 16223.T 16278.T 16311.C
100_Afghani_Bal   16354.T
101_Afghani_Bal   16093.C 16223.T 16362.C
103_Afghani_Bal   16182.- 16183.C 16189.C 16193.1C 16223.T 16290.T 16319.A 16362.C
104_Afghani_Bal   16256.T 16294.T 16352.C
114_Afghani_Bal   16129.A 16223.T
121_Afghani_Bal   16240.G 16256.T 16294.T 16352.C
122_Afghani_Bal   16300.G 16325.C 16362.C
123_Afghani_Bal   16071.T
Pashtun:
Sample Number     Polymorphic Sites
20_Afghani_Pas    16183.C 16189.C 16193.1C 16249.C 16265.G
25_Afghani_Pas    16140.C 16182.- 16183.C 16189.C 16193.1C 16193.2C 16217.C 16274.A 16335.G
33_Afghani_Pas    16184.T
34_Afghani_Pas    16183.C 16189.C 16193.1C 16223.T 16278.T
38_Afghani_Pas    16136.C 16174.T 16248.T 16266.T 16304.C 16325.C 16356.C
39_Afghani_Pas    16223.T 16289.G
47_Afghani_Pas    16309.G 16318.T 16343.G 16362.C
80_Afghani_Pas    16192.T 16217.C 16357.C
138_Afghani_Pas   16069.T 16126.C 16145.A 16222.T 16261.T
139_Afghani_Pas   16217.C
162_Afghani_Pas   16266.T 16304.C 16311.C 16356.C
186_Afghani_Pas   Anderson
187_Afghani_Pas   16223.T 16227.G 16262.T 16278.T 16294.T 16362.C
191_Afghani_Pas   16129.A 16223.T
Hazara:
Sample Number    Polymorphic Sites
1_Afghani_Haz    16223.T 16290.T 16319.A 16362.C
2_Afghani_Haz    16223.T 16362.C
5_Afghani_Haz    16129.A 16223.T 16298.C 16319.A 16327.T
6_Afghani_Haz    16362.C
7_Afghani_Haz    16223.T 16311.C
8_Afghani_Haz    16111.T 16129.A 16223.T 16257.A 16261.T
10_Afghani_Haz   16223.T 16297.C 16298.C 16327.T 16357.C
11_Afghani_Haz   16071.T
13_Afghani_Haz   16172.C 16183.C 16189.C 16193.1C 16232.A 16249.C 16304.C 16311.C
15_Afghani_Haz   16183.- 16189.C 16193.1C 16223.T 16278.T
18_Afghani_Haz   16129.A 16223.T 16297.C
19_Afghani_Haz   16223.T 16288.C 16298.C 16327.T
28_Afghani_Haz   16189.C 16193.1C 16223.T 16278.T
40_Afghani_Haz   16069.T 16126.C 16145.A 16172.C 16261.T 16292.A 16344.T
41_Afghani_Haz   16069.T 16126.C 16145.A 16172.C 16222.T 16261.T
102_Afghani_Haz  16183.C 16189.C 16193.1C 16223.T 16278.T
105_Afghani_Haz  16051.G 16086.C 16291.T 16305.T 16353.T
106_Afghani_Haz  16129.A 16189.C 16189.1C 16193.1C 16223.T 16248.T 16297.C
107_Afghani_Haz  16093.C 16223.T 16230.G 16234.T 16311.C 16362.C
108_Afghani_Haz  16111.T 16136.C 16223.T 16260.T 16298.C
109_Afghani_Haz  16209.C 16230.G 16256.T
110_Afghani_Haz  16037.G 16041.G 16172.C 16183.- 16189.C 16193.1C 16232.A 16249.C 16304.C 16311.C
113_Afghani_Haz  16185.T 16209.C 16260.T 16298.C
115_Afghani_Haz  16223.T 16294.T 16362.C
116_Afghani_Haz  16223.T 16239.T 16240.C 16274.A 16311.C 16319.A
117_Afghani_Haz  16224.C 16311.C 16362.C
118_Afghani_Haz  16129.A 16175.G 16180.- 16181.- 16189.C 16189.1C 16193.1C 16193.2C 16311.C
119_Afghani_Haz  16189.C 16193.1C 16223.T 16290.T 16319.A 16362.C
120_Afghani_Haz  16129.A 16223.T 16298.C 16319.A 16327.T
124_Afghani_Haz  16270.T
125_Afghani_Haz  16304.C
128_Afghani_Haz  16223.T 16298.C 16327.T
129_Afghani_Haz  16182.- 16183.C 16189.C 16193.1C 16319.A 16362.C
130_Afghani_Haz  16092.C 16129.A 16148.T 16223.T 16271.C 16362.C
131_Afghani_Haz  16126.C 16292.T 16294.T
133_Afghani_Haz  16111.T 16140.C 16183.C 16189.C 16193.1C 16234.T 16243.C
135_Afghani_Haz  16093.C 16129.A 16223.T 16298.C 16327.T
136_Afghani_Haz  16092.C 16129.A 16148.T 16223.T 16271.C 16362.C
151_Afghani_Haz  16356.C
168_Afghani_Haz  16311.C 16356.C 16362.C
Tajik:
Sample Number    Polymorphic Sites
30_Afghani_Taj   16223.T 16290.T 16319.A 16362.C
32_Afghani_Taj   Anderson
134_Afghani_Taj  16189.C 16189.1C 16193.1C 16223.T 16278.T 16311.C
140_Afghani_Taj  16356.C
142_Afghani_Taj  16201.T 16209.C 16223.T 16265.G
143_Afghani_Taj  Anderson
145_Afghani_Taj  16172.C 16184.A
149_Afghani_Taj  16274.A
170_Afghani_Taj  16134.T 16172.C 16356.C
173_Afghani_Taj  16266.T 16304.C 16311.C 16356.C
175_Afghani_Taj  Anderson
176_Afghani_Taj  16071.T 16172.C
188_Afghani_Taj  16327.T
189_Afghani_Taj  16071.T 16362.C
190_Afghani_Taj  16185.T 16354.T
193_Afghani_Taj  16173.T 16223.T 16362.C
198_Afghani_Taj  16325.C
200_Afghani_Taj  16172.C 16223.T 16362.C
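The forensic-format entries above can be read programmatically. The following Python sketch is one possible parser, written under stated assumptions rather than as the method used in this study: a token such as 16126.C is taken to be a substitution to C at position 16126, 16193.1C the first inserted C after position 16193, 16182.- a deletion at position 16182, and "Anderson" a sequence with no differences from the reference. The names Variant and parse_haplotype are hypothetical.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    position: int   # reference position the difference is recorded at
    kind: str       # "substitution", "insertion" or "deletion"
    base: str       # observed base; empty string for a deletion
    ins_index: int  # 1, 2, ... for inserted bases; 0 otherwise


# position, then ".", then an insertion (index + base), a substitution, or "-".
TOKEN = re.compile(r"^(\d+)\.(?:(\d+)([ACGT])|([ACGT])|(-))$")


def parse_haplotype(sites):
    """Parse one 'Polymorphic Sites' entry into a list of Variant records."""
    sites = sites.strip()
    if sites == "Anderson":
        # Assumed to mean: identical to the reference sequence.
        return []
    variants = []
    for token in sites.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position = int(match.group(1))
        if match.group(2):
            variants.append(
                Variant(position, "insertion", match.group(3), int(match.group(2)))
            )
        elif match.group(4):
            variants.append(Variant(position, "substitution", match.group(4), 0))
        else:
            variants.append(Variant(position, "deletion", "", 0))
    return variants


# Entry for sample 25_Afghani_Pas, copied from the Pashtun table.
entry = "16140.C 16182.- 16183.C 16189.C 16193.1C 16193.2C 16217.C 16274.A 16335.G"
for variant in parse_haplotype(entry):
    print(variant)
```

Raising an error on unrecognised tokens is a deliberate choice here, so that a transcription mistake in a site entry is reported rather than silently skipped.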
Mismatch Format of HVS-I Sequence Haplotypes for the Afghani Ethnic Groups
Baluch:
Pashtun:
Hazara:
Tajik: