Copyright © 1984 by the Genetics Society of America
THE NATURE OF NUCLEOTIDE SEQUENCE DIVERGENCE
BETWEEN BARLEY AND MAIZE CHLOROPLAST DNA
GERARD ZURAWSKI,* MICHAEL T. CLEGG† AND ANTHONY H. D. BROWN‡
*DNAX Research Institute, Palo Alto, California 94304; †Department of Botany and Department of Molecular and Population Genetics, University of Georgia, Athens, Georgia 30602; and ‡Division of Plant Industry, CSIRO, Canberra, A.C.T. 2601, Australia
Manuscript received August 1, 1983
Revised copy accepted December 14, 1983
ABSTRACT
Analysis of a 2175-base pair (bp) SmaI-HindIII fragment of barley chloroplast DNA revealed that rbcL (the gene for the large subunit of ribulose 1,5-bisphosphate carboxylase) and atpB (the gene for the β subunit of ATPase) are transcribed divergently and are separated by an untranscribed region of 155-166 bp. The rbcL mRNA has a 320-residue untranslated leader region, whereas the atpB mRNA has a 296- to 309-residue leader region. The sequence of these regions, together with the initial 113 bp of the atpB-coding region and the initial 1279 bp of the rbcL-coding region, is compared with the analogous maize chloroplast DNA sequences. Two classes of nucleotide differences are present, substitutions and insertions/deletions. Nucleotide substitutions show a 1.9-fold bias toward transitions in the rbcL-coding region and a 1.5-fold bias toward transitions in the noncoding region. The level of nucleotide substitutions between the barley and maize sequences is about 0.065/bp. Seventy-one percent of the substitutions in the rbcL-coding region are at the third codon position, and 95% of these are synonymous changes. Insertion/deletion events, which are confined to the noncoding regions, are not randomly distributed in these regions and are often associated with short repeated sequences. The extent of change for the noncoding regions (about 0.093 events/bp) is less than the extent of change at the third codon positions in the rbcL-coding region (about 0.135 events/bp), including insertion/deletion events. Limited sequence analysis of the analogous DNA from a wild line (Hordeum spontaneum) and a primitive Iranian barley (H. vulgare) suggested a low rate of chloroplast DNA evolution. Compared to spinach chloroplast DNA, the barley rbcL-atpB untranslated region is extremely diverged, with only the putative rbcL promoters and ribosome-binding site being extensively conserved.
Genetics 106: 735-749 April, 1984.
PRECISE estimates of the nature and rate of evolution of chloroplast DNA depend on the availability of nucleotide sequence data from defined, cloned DNA fragments. Comparison between the genes encoding the large subunit of ribulose 1,5-bisphosphate carboxylase (rbcL) from spinach (Spinacia oleracea) and maize (Zea mays) (ZURAWSKI et al. 1981) revealed that the sequences differed not only by nucleotide substitutions but by insertion/deletion events. Furthermore, insertion/deletion events were confined entirely to noncoding sequences. The extent of divergence between these sequences (0.160 events/bp in the coding region and about 0.35 events/bp in the noncoding region) was such that the substitutions and insertion/deletion events overlapped, thereby preventing analysis of the contribution of each type of event to sequence divergence. TAKAIWA and SUGIURA (1982) have compared sequence data for the 16S-23S spacer for chloroplast rRNA gene clusters from tobacco (Nicotiana tabacum) and maize. The strong conservation of the two spacer sequences allows the separate resolution of substitution and insertion/deletion events, thereby revealing the presence of short repeated sequences at the sites of many insertion/deletions. In this study we present the sequence of the noncoding region between rbcL and atpB and the coding region for rbcL for barley (Hordeum vulgare) chloroplast DNA.
These sequence data, together with our definition of the regions coding for the 5' ends of rbcL and atpB mRNA, allow a comparison with previously published (MCINTOSH, POULSEN and BOGORAD 1980, as corrected by POULSEN 1981, and KREBBERS et al. 1982) data from another grass, maize. These data permit estimates of the nature and degree of sequence divergence between barley and maize chloroplast DNA, with particular reference to the relative contribution of substitution and insertion/deletion events to chloroplast DNA evolution. The data also allow an analysis of the distribution of these events among nontranscribed, transcribed but noncoding and coding sequences. Finally, we discuss the use of divergence in the noncoding regions to identify functional elements such as ribosome-binding sites and promoters.
MATERIALS AND METHODS
Cloning of barley chloroplast DNA: Barley chloroplasts were prepared by a nonaqueous procedure (BOWMAN and DYER 1982) and the DNA was purified by phenol extraction. Three lines were selected to span the range of genetic diversity in wild and cultivated barley and included two entries from H. vulgare (the cultivar Clipper and a land race, CPI 77169) and one entry from H. spontaneum (CPI 77144) (described in CLEGG, BROWN and WHITFELD 1984). E. coli ECR291 cells were transformed with ligated HindIII-restricted chloroplast and pBR322 DNAs (BOLIVAR et al. 1977) and plated at 200 μg/ml of ampicillin in the tetracycline-sensitive selective medium of MALOY and NUNN (1981). Colonies patched onto replicate plates were screened by colony hybridization (HANAHAN and MESELSON 1980) using a nick-translated (MANIATIS et al. 1976) 1750-bp EcoRI fragment from spinach chloroplast DNA containing sequences coding for atpB and rbcL (ZURAWSKI et al. 1981). In this fashion chloroplast DNA from each line yielded plasmids pHvu H1 (Clipper), pHvu H2 (land race) and pHsp H3 (H. spontaneum), each containing a ~6-kb HindIII fragment inserted into pBR322.
pHvu H1, pHvu H2 and pHsp H3 DNA (1 μg each) were digested with SmaI and HindIII and mixed with 0.5 μg of SmaI- and HindIII-cut pUC8 (VIEIRA and MESSING 1982) DNA. The DNAs were precipitated with ethanol, dried and ligated in a volume of 50 μl. Protocols for ligation and transformation of strain JM101 (VIEIRA and MESSING 1982) have been described previously (ZURAWSKI, BOTTOMLEY and WHITFELD 1982). Plasmid DNA was isolated from ampicillin-resistant colonies with no expression of lacZ (VIEIRA and MESSING 1982) and screened for SmaI-HindIII fragments derived from the three parental plasmids inserted into pUC8. The plasmids were named: pHvu SH13 and pHvu SH14 (bearing, respectively, the atpB and rbcL SmaI-HindIII fragments derived from pHvu H1), pHvu SH21 and pHvu SH26 (bearing, respectively, the atpB and rbcL SmaI-HindIII fragments derived from pHvu H2) and pHsp SSH331 (bearing the rbcL SmaI-SmaI-HindIII fragment from pHsp H3).
Nucleotide sequence and mRNA 5' end determinations: ³²P 5' end-labeled DNA restriction fragments were prepared and sequenced as described by MAXAM and GILBERT (1980) except that the C- and T-specific reactions of RUBIN and SCHMID (1980) were used. Electrophoresis on thin polyacrylamide gels was according to SANGER and COULSON (1978). Reverse transcriptase mapping of chloroplast mRNA 5' ends was as described previously (ZURAWSKI et al. 1981) except that end-labeled DNA fragments and RNA (50 μg) were hybridized in 10 μl of 40 mM Pipes, pH 6.9, 0.5 M NaCl, 1 mM EDTA, 70% formamide for 2 h at 56° prior to reverse transcription.
RESULTS
Sequence analysis and 5' end mapping using the barley chloroplast DNA insert of pHvu SH14: Figure 1 shows the individual sequence runs used to define the structure of the SmaI-HindIII DNA fragment from pHvu SH14. By comparison with previously determined maize sequences (MCINTOSH, POULSEN and BOGORAD 1980, as corrected by POULSEN 1981; KREBBERS et al. 1982) the SmaI and HindIII sites were placed, respectively, 30 bp into the coding region of atpB and 1279 bp into the coding region of rbcL. Figure 2 shows both the barley and maize sequences and includes data for an additional adjacent SmaI-SmaI fragment (sequenced from pHsp SSH331) located from 30 to 110 bp into the atpB-coding region.
The 5' ends of rbcL and atpB mRNA (indicated by arrows in Figure 2) were determined using reverse transcription of total barley leaf RNA primed with 5' end-labeled fragments derived from pHvu SH14. A preliminary experiment using a TaqI-EcoRI fragment (coordinates -53 to -28, rbcL, Figure 2) labeled at the EcoRI site revealed that rbcL mRNA with at least 200 untranslated residues is present in barley. Experiments using an RsaI-DdeI fragment (coordinates -199 to -158, rbcL, Figure 2) labeled at the DdeI site and an HpaI-EcoRI fragment (-273 to -238, rbcL, Figure 2) labeled at the EcoRI site (Figure 3a) defined the 5' end of the transcript at coordinate -320 (rbcL, Figure 2). A HinfI-Sau3A fragment (-195 to -253, atpB, Figure 2) was used to define two major 5' ends (-296 and -297, atpB, Figure 2) and two minor 5' ends (-306 and -309, atpB, Figure 2) for barley atpB mRNA (Figure 3b). Thus, the barley chloroplast DNA rbcL-atpB intergenic region is 155-167 bp.
The total untranslated region between the rbcL- and atpB-coding regions is similar in barley (784 bp) and maize (759 bp, Figure 2). Within the region the sizes of the transcribed but untranslated atpB 5' leaders from barley (296-309 residues) and maize (297) are identical. However, the rbcL mRNA of maize has been determined to have a 63- to 64-residue 5' leader (Figure 2). The possible significance of these differences will be discussed.
FIGURE 1. Map and sequencing strategy for the 2175-bp SmaI-HindIII fragment of barley chloroplast DNA. SmaI, EcoRI, PstI and HindIII restriction sites are indicated. The coordinates are base pairs relative to the first SmaI site. N indicates the amino-terminus of the coding regions (solid bars) for rbcL and atpB. Lines denote transcribed but untranslated DNA; the open bar is the untranscribed intergenic region. The arrows under the map indicate the individual sequencing runs used to establish the sequence presented in Figure 2. Solid arrows are sequencing runs of pHvu SH13 DNA, dotted arrows are pHvu SH26 DNA and dashed arrows are pHsp SSH331 DNA.
FIGURE 2. Comparison between the nucleotide sequence of the barley chloroplast DNA 2175-bp SmaI-HindIII fragment and the analogous maize sequence. In the rbcL- and atpB-coding regions only differences between the barley (upper line) and maize (lower line) (MCINTOSH, POULSEN and BOGORAD 1980, as corrected by POULSEN 1981; KREBBERS et al. 1982) sequences are indicated. Synonymous changes are in lower case. In the noncoding region the sequences have been aligned to yield maximum homology consistent with minimal number of insertion/deletion events. Nucleotide differences are highlighted with asterisks. Arrows indicate the locations of the major 5' ends of the rbcL and atpB mRNAs. Coordinates are relative to the translation start codons which are underlined.
Limited sequence analysis of related barley chloroplast DNAs: To estimate the extent of sequence divergence between chloroplast DNAs from cultivated barley, a land race of barley and the wild progenitor of cultivated barley (H. spontaneum), limited sequence analysis of pHvu SH26 and pHsp SSH331 was undertaken. Figure 1 details the sequence runs which covered 235 and 120 coding nucleotides and 275 and 190 noncoding nucleotides from pHvu SH26 and pHsp SSH331, respectively. No nucleotide changes were present.
FIGURE 3. Reverse transcriptional mapping of the 5' ends of (a) rbcL and (b) atpB mRNAs. Both figures are autoradiographs of 6% polyacrylamide-7 M urea sequencing gels (SANGER and COULSON 1975). The R, G, A, T, C lanes are, respectively, reverse transcription reactions (see MATERIALS AND METHODS) and G, A + G, T, C specific chemical modification reactions (see MATERIALS AND METHODS). a, The substrate for the sequencing reactions is the EcoRI (coordinate -238, rbcL) HhaI (-350, rbcL) fragment 5' end labeled at the EcoRI end. The primer for the reverse transcription reaction is EcoRI (-238, rbcL) HpaI (-273, rbcL) 5' end labeled at the EcoRI site. b, The substrate for the sequencing reactions is the HinfI (-195, atpB) HpaI (-273, rbcL) fragment 5' end labeled at the HinfI end. The primer for the reverse transcription reaction is the HinfI (-195, atpB) Sau3A (-253, atpB) fragment 5' end labeled at the HinfI end. The coordinates in the figure refer to A residues in the vicinity of the 5' end-coding region (Figure 2). In the case of (a), these correspond to T residues in Figure 2.
The nature of divergence between the barley and maize sequences: Inspection of Figure 2 reveals that the barley and maize sequences differ by nucleotide substitution and by insertion/deletion events. Moreover, the insertion/deletion events all occur in the noncoding region, whereas nucleotide substitutions are evenly distributed throughout the sequence. A second impression from the data is that the nucleotide composition differs markedly between coding and noncoding regions. Table 1 shows that in barley the noncoding regions are 31% GC as compared to 43% GC for the rbcL-coding region.
Table 2 classifies nucleotide substitutions into transitions and transversions and further divides the transversion category into substitutions that preserve hydrogen bonding (I) and those that alter hydrogen bonding (II). We calculate the expected frequency of the various substitution events assuming the process is random. Consider a single event, e.g., A ↔ G. The probability that an event (nucleotide substitution) occurs at a site with A is $p_A$. The probability of drawing a particular non-A nucleotide (here G) to replace A is 1/3, assuming equal frequency of introduced nucleotides. Thus, the probability that A is replaced by G is $\tfrac{1}{3} p_A$. Because only two sequences are being compared we do not know whether A replaced G or G replaced A. Thus, the sum of both kinds of events is $\tfrac{1}{3}(p_A + p_G)$, where the nucleotide frequencies ($p_A$, $p_T$, $p_G$, $p_C$) are taken from Table 1. A comparison of the distribution of observed and expected events (Table 2) for the rbcL and noncoding regions shows a significant excess of transition events. The bias favoring transitions for the rbcL-coding region is about 1.9 as compared to about 1.5 for the noncoding region. This transition bias is remarkable only in that it is small compared to that found in studies of animal mitochondrial DNA evolution (AQUADRO and GREENBERG 1983; BROWN, PRAGER and WILSON 1982) and slightly less than that estimated for a multigene family of maize nuclear DNA (BROWN and CLEGG 1983).
TABLE 1
Nucleotide frequency in rbcL, atpB and intergenic regions of barley chloroplast DNA
Nucleotide    rbcL      atpB      Intergenic
A             0.2742    0.2566    0.3532
T             0.2891    0.2655    0.3393
C             0.1922    0.2212    0.1601
G             0.2445    0.2566    0.1474
TABLE 2
Classification of nucleotide substitutions between barley and maize chloroplast DNA
                        rbcL coding             Intergenic region
Type of substitution    Expected    Observed    Expected    Observed
Transitions
  A↔G                   12.80       24          9.75        17
  C↔T                   11.87       23          9.25        12
Transversions I
  A↔T                   13.89       8           13.16       8
  G↔C                   10.77       6           5.84        4
Transversions II
  A↔C                   11.50       7           9.51        14
  T↔G                   13.16       6           9.49        2
χ²                      30.47**                 16.84*
* P < 0.01.
** P < 0.005.
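The expectation described above can be checked numerically. The following is a minimal sketch, assuming the rbcL nucleotide frequencies of Table 1 and the observed rbcL counts of Table 2; the names `FREQ`, `OBSERVED`, `expected_counts` and `chi_square` are illustrative and do not come from the original analysis.

```python
# Nucleotide frequencies in the barley rbcL-coding region (Table 1).
FREQ = {"A": 0.2742, "T": 0.2891, "C": 0.1922, "G": 0.2445}

# Observed barley-maize substitutions in rbcL (Table 2), keyed by unordered pair.
OBSERVED = {
    ("A", "G"): 24, ("C", "T"): 23,  # transitions
    ("A", "T"): 8, ("G", "C"): 6,    # transversions I (hydrogen bonding preserved)
    ("A", "C"): 7, ("T", "G"): 6,    # transversions II (hydrogen bonding altered)
}
TRANSITIONS = [("A", "G"), ("C", "T")]


def expected_counts(freq, observed):
    """Expected count per pair under random substitution: total * (p_x + p_y) / 3."""
    total = sum(observed.values())
    return {pair: total * (freq[pair[0]] + freq[pair[1]]) / 3.0 for pair in observed}


def chi_square(observed, expected):
    """Pearson chi-square summed over the six substitution classes."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


expected = expected_counts(FREQ, OBSERVED)
chi2 = chi_square(OBSERVED, expected)
# Ratio of observed to expected transitions (the "bias" quoted in the text).
bias = sum(OBSERVED[t] for t in TRANSITIONS) / sum(expected[t] for t in TRANSITIONS)

for pair, value in expected.items():
    print(pair, round(value, 2), OBSERVED[pair])
print("chi-square:", round(chi2, 2), "transition bias:", round(bias, 2))
```

Under these assumptions the expected values agree with the rbcL column of Table 2 to within rounding, and the ratio of observed to expected transitions comes out close to the 1.9-fold bias quoted in the text.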
To estimate the average number of base substitutions per site (denoted K), we used the estimator of KIMURA (1981) to correct for the bias in favor of transition events. Thus,
$$K = -\tfrac{1}{4} \ln[(1 - 2P - 2Q)(1 - 2P - 2R)(1 - 2Q - 2R)]$$
where P, Q and R are the relative frequencies of transition, transversion I and transversion II events, respectively. The estimates are K = 0.065 ± 0.025, K = 0.076 ± 0.010 and K = 0.060 ± 0.007 for the atpB region, the noncoding region and the rbcL region, respectively. Evidently, the relative rate of nucleotide substitution is very similar for all regions, despite the rather different functional properties of the coding and noncoding sequences. Table 3 shows that the relative rate of substitution in the rbcL-coding region is approximately six times faster for third codon position sites as compared to second position sites. The rate of synonymous substitution at third position sites entirely accounts for the accelerated rate of substitution in this position. Estimates of K for the noncoding region, although heterogeneous, are on average only half the rate for third position sites in rbcL. Thus, the rate of evolution in the noncoding region is constrained relative to the third position rate for rbcL.
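As an illustration of the estimator, the sketch below applies the formula to the rbcL counts of Table 2 (47 transitions, 14 transversions I, 13 transversions II). It assumes that the number of sites compared is the 1279 bp of rbcL-coding sequence; that site count and the function name `kimura_k` are assumptions made for this example only.

```python
import math


def kimura_k(transitions, transversions_1, transversions_2, sites):
    """Substitutions per site from the three-class formula quoted in the text.

    P, Q and R are the proportions of sites showing a transition, a type I
    transversion and a type II transversion, respectively.
    """
    p = transitions / sites
    q = transversions_1 / sites
    r = transversions_2 / sites
    product = (1 - 2 * p - 2 * q) * (1 - 2 * p - 2 * r) * (1 - 2 * q - 2 * r)
    return -0.25 * math.log(product)


# rbcL counts summed from Table 2; 1279 sites is an assumed comparison length.
k_rbcl = kimura_k(transitions=47, transversions_1=14, transversions_2=13, sites=1279)
uncorrected = (47 + 14 + 13) / 1279

print("K (corrected):", round(k_rbcl, 4))
print("observed proportion of differences:", round(uncorrected, 4))
```

With these inputs the corrected value is close to the K = 0.060 reported for the rbcL region and lies slightly above the uncorrected proportion of differing sites, as expected for a correction that allows for multiple substitutions at a site.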
To consider the distribution of nucleotide substitutions along the sequence, we take as a null hypothesis that the distribution of substitutions is random. If the substitution process is random, then the number of nucleotides separating substitution events (n) follows the geometric distribution $Npq^n$, where N is the total number of substitution events, $p = 1 - q = N/L$, and L is the total number of nucleotides in the sequence (BROWN and CLEGG 1983).
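The null expectation can be made concrete with a short sketch. It assumes that substitution positions are available as integer coordinates; the positions used here are hypothetical, and the values N = 74 and L = 1279 are taken from the rbcL comparison purely for illustration.

```python
def run_lengths(positions):
    """Number of unchanged nucleotides between consecutive substitution sites."""
    ordered = sorted(positions)
    return [later - earlier - 1 for earlier, later in zip(ordered, ordered[1:])]


def expected_run_counts(n_events, seq_length, max_run):
    """Expected number of runs of length n = 0..max_run under N * p * q**n."""
    p = n_events / seq_length
    q = 1.0 - p
    return [n_events * p * q ** n for n in range(max_run + 1)]


# Hypothetical substitution coordinates, for illustration only.
hypothetical_positions = [12, 13, 40, 95, 96, 97, 310]
runs = run_lengths(hypothetical_positions)

# Expected counts for run lengths 0..5 with N = 74 events in L = 1279 sites.
expected = expected_run_counts(n_events=74, seq_length=1279, max_run=5)

print("observed runs:", runs)
print("expected counts:", [round(e, 2) for e in expected])
```

The expected counts decline geometrically with n, so most classes have small expectations when p is small, which is why classes must be pooled before a goodness-of-fit test is applied.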
TABLE 3
Number of base substitutions per site (K) for first, second and third codon positions in rbcL and for transcribed and nontranscribed portions of the intergenic region of barley and maize chloroplast DNA
Codon position      K         σ_K (a)
rbcL
  1st               0.0360    0.0094
  2nd               0.0190    0.0067
  3rd               0.135     0.0188
  K_s (b)           0.116     0.0180
T rbcL (c)          0.1080    0.0195
T atpB (c)          0.0470    0.0132
Nontranscribed      0.0691    0.0202
(a) σ_K denotes the standard error of the estimate of K.
(b) K_s denotes the synonymous fraction of third position substitutions.
(c) T denotes transcribed but not translated.
We apply two different tests of the null hypothesis that n has a geometric distribution. First, we apply a $\chi^2$ goodness-of-fit test in which the distribution of n is divided into classes, subject to the constraint that the expected number of events in each class is three or greater. Second, we apply a variance ratio test in which the expected geometric variance, $\sigma^2 = p/q^2$, is compared to the empirical variance, $S^2$. The quantity $(k - 1)S^2/\sigma^2$ is approximately distributed as $\chi^2$ with k - 1 degrees of freedom (see BROWN and CLEGG 1983). These two tests are not equivalent. In particular, long runs with very low expected values will tend to be consolidated with shorter runs in the goodness-of-fit test. Runs of this kind may, however, substantially increase the variance in the variance ratio test. Thus, we regard the variance ratio test as more powerful for detecting an overdispersed distribution. Conversely, the goodness-of-fit test tends to be more powerful in detecting underdispersed distributions. Table 4 reports the results of these tests applied to the noncoding and coding regions. Interestingly, the distribution of synonymous substitutions in rbcL is nonrandom by the goodness-of-fit criterion (0.01 < probability < 0.025) and approaches significance by the variance ratio criterion (0.05 < probability < 0.10). Substitutions that cause amino acid replacements are nonrandom by the goodness-of-fit criterion (probability ≈ 0.025) but not by the variance ratio criterion. This may be due to the clumped distribution of amino acid replacements. In the noncoding region, only the 5' leader sequence of rbcL departs from a random distribution. This may reflect the relatively long runs between coordinates -200 to -320.
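The class-pooling constraint of the goodness-of-fit test can be sketched as follows. This is one plausible way to consolidate classes, not necessarily the procedure used for Table 4: adjacent run-length classes are merged from the shortest upward until each pooled class has an expected count of at least three, and any short remainder is added to the last class. The counts below are hypothetical.

```python
def pool_classes(observed, expected, minimum=3.0):
    """Merge adjacent classes until each has an expected count >= minimum.

    A leftover tail whose expected count is below the minimum is merged
    into the last pooled class.
    """
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for obs, exp in zip(observed, expected):
        acc_obs += obs
        acc_exp += exp
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_obs > 0 or acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square(observed, expected):
    """Pearson chi-square over the pooled classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed and expected counts for successive run-length classes.
observed_counts = [3, 4, 1, 1, 0]
expected_counts = [4.0, 2.0, 1.5, 1.0, 0.5]

pooled_obs, pooled_exp = pool_classes(observed_counts, expected_counts)
statistic = chi_square(pooled_obs, pooled_exp)
print(pooled_obs, pooled_exp, round(statistic, 2))
```

Pooling of this kind is the reason long runs with very low expected values are consolidated with shorter runs, so that their contribution is less visible in the goodness-of-fit test than in the variance ratio test.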
The other class of sequence differences, insertion/deletion events, departs
from a geometric distribution at the 5% level by the goodness-of-fit criterion.
This departure arises because there is a marked excess of events distributed
within 10 to 20 bp of one another. This clustering may be a reflection of the
preference of insertion/deletion events to occur in relatively AT-rich regions,
since the regions affected are only about 23% GC as compared to 31% for
the remainder of the noncoding sequence.
Our analysis of the relative rate of nucleotide substitution showed that substitution in the noncoding region was taking place more slowly than synonymous substitution in rbcL. However, the total rate of evolution should reflect both nucleotide substitution and insertion/deletion events. This is particularly important because nearly 1/4 of all events in the noncoding regions are insertion/deletions. The total number of events per nucleotide for the noncoding region becomes 0.093, assuming each insertion/deletion to be a single event, still somewhat below the 0.135 third position rate for rbcL.
TABLE 4
Tests for the fit of observed nucleotide runs to a geometric distribution
                   Noncoding region                              rbcL region
                   T atpB (a)   T rbcL (a)   Nontranscribed      Synonymous   Nonsynonymous
Goodness-of-fit    0.91         12.88        1.09                25.34*       16.24*
d.f.               3                         2                   13           7
Variance ratio     5.89         51.96*       10.78               60.96        21.88
d.f.               12           31           11                  46           26
(a) T atpB and T rbcL denote transcribed but nontranslated 5' leader sequences of atpB and rbcL.
* P < 0.05.
A conspicuous feature of the insertion/deletions is that they often involve the creation or destruction of short repeated sequences. Table 5 lists the five insertion/deletion events that belong to this class. TAKAIWA and SUGIURA (1982) have noted a similar phenomenon in their comparison between the 16S-23S spacer region of chloroplast rRNA gene clusters from tobacco and maize. In two other systems, the β-globin gene family (EFSTRATIADIS et al. 1980) and spontaneous mutations in the E. coli lacI gene (FARABAUGH et al. 1978), a similar coincidence of short repeated elements and deletions has been noted.
The use of sequence divergence to identify noncoding sequences of functional importance: Figure 4b shows a dot matrix comparison between the rbcL-atpB intergenic regions of barley and spinach chloroplast DNAs. Unlike the analogous barley-maize comparison (Figure 4a), only limited homology is apparent. The most extensive homology corresponds to sequences surrounding the mature rbcL mRNA 5' end-coding region and the sequence immediately proximal to the rbcL translation initiation site. The sequence of the conserved nucleotides about the mature rbcL mRNA 5' end-coding regions from barley and spinach is:
   -350
b. TTGGGTTGCGCTATACCTATCAAAGAGTATACAATAATGA
s. C TA G
   -300
b. TGGATTTGGTAAATCAAATCCATGGTTTAATAACGAA
s. T CG A TA
The 39 nucleotides prior to, and the 37 nucleotides distal to, the assigned (assuming there is no RNA processing of the 5' end of the RNA) transcription start site for rbcL mRNA (indicated by arrows in Figure 4a) are tightly conserved relative to the entire noncoding region. This conservation is not due to chance, since the analogous region is also conserved in tobacco rbcL (SHINOZAKI and SUGIURA 1982) and in pea rbcL (G. ZURAWSKI, unpublished results). It is most probable that the requirement for specific sequences needed for RNA polymerase binding and initiation restrains changes in this region through evolution. Similarly, the sequence immediately proximal to the rbcL translation initiation codon is conserved between barley and spinach. This sequence:
b. AAGAGTTGTAGGGAGGGACTTatg
s. TGA
is also highly conserved in tobacco (SHINOZAKI and SUGIURA 1982), maize (MCINTOSH, POULSEN and BOGORAD 1980) and pea (G. ZURAWSKI, unpublished results). It is likely that this region serves for ribosome binding, and in fact it contains a sequence complementary (underlined) to the 3' end of 16S ribosomal RNA. KREBBERS et al. (1982) have suggested that the ATG 21 bp prior to the initiator codon may start rbcL in maize. We feel this to be unlikely since barley has an AAG at this position. In contrast to rbcL, the sequence proximal to the atpB translation start site is poorly conserved between barley and spinach (Figure 4b). It is possible that atpB has a class of ribosome-binding site in which the sequence distal to, and including, the ATG is more important to ribosome binding than proximal sequences.
TABLE 5
Small repeats associated with insertion/deletion changes between barley and maize chloroplast DNA
-76 atpB     AAATCAAATC
             AAATC
-92 atpB     ATCCAAAA
             ATCCAAAAATCCAAAA
-220 atpB    AATCAATC
             AATCCA
-362 atpB    AGTTAGGT
             AGGT
-119 rbcL    TATTATATTA
             TATTA
The upper sequence and numbers refer to the barley chloroplast DNA sequence in Figure 2.
* indicates divergence from the repeat unit.
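A dot matrix comparison of the kind shown in Figure 4 can be sketched in a few lines. The example assumes that a "tetranucleotide filter" means a dot is recorded only where four consecutive nucleotides match exactly; the details of the program of NOVOTNY (1982) are not given here, so this is an illustration of the general idea, and the two sequences are arbitrary toy strings.

```python
def dot_matrix(seq_x, seq_y, word=4):
    """Return (i, j) pairs where seq_x[i:i+word] equals seq_y[j:j+word]."""
    # Index every word of seq_y by its start position.
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    # Look up each word of seq_x and record the matching coordinates.
    dots = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            dots.append((i, j))
    return dots


def render(dots, len_x, len_y, word=4):
    """Draw the matrix as text, one row per word start position in seq_y."""
    marked = set(dots)
    rows = []
    for j in range(len_y - word + 1):
        rows.append("".join("*" if (i, j) in marked else "."
                            for i in range(len_x - word + 1)))
    return "\n".join(rows)


# Arbitrary toy sequences, for illustration only.
x = "ACGTTGCAACGT"
y = "TTGCAACG"
dots = dot_matrix(x, y)
print(render(dots, len(x), len(y)))
```

Conserved stretches appear as diagonal runs of dots, while insertion/deletion events shift the diagonal; widely diverged sequences leave only short, scattered diagonals.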
DISCUSSION
The first indication that some chloroplast-encoded genes tended to be conserved in evolution came from studies of the rbcL protein (DORNER, KAHN and WILDMAN 1958). Recent comparisons of complete nucleotide sequence data for the rbcL gene among higher plants (ZURAWSKI et al. 1981) and among cyanobacteria, algae and higher plants (CURTIS and HASELKORN 1983; SHINOZAKI et al. 1983) have dramatically confirmed these early results. Studies of DNA sequence variation among restriction endonuclease digests of total chloroplast DNA suggest that conservative rates of evolution are not confined to the rbcL gene but are characteristic of the entire chloroplast genome (ATCHISON, WHITFELD and BOTTOMLEY 1976; PALMER and ZAMIR 1982; CLEGG, RAWSON and THOMAS 1984). The present work is an attempt to define further the nature of chloroplast evolution by comparing DNA sequence divergence for different functional regions of the chloroplast genome.
Because a reasonable number of evolutionary events (nucleotide substitution or insertion/deletion events) must be observed for statistical analyses, we have defined functional region in broad terms. Thus, our primary focus has been directed toward the evolution of coding (rbcL) vs. noncoding regions. Our initial expectation was that the noncoding region would show a greater rate of genetic change than associated coding regions. In one sense this expectation has been confirmed, because the accumulation of insertion/deletion events has resulted in considerable differentiation between barley and maize sequences. Two points should be emphasized. First, the rate of occurrence of insertion/deletion events is relatively slow (approximately one-quarter of the nucleotide substitution rate). Second, insertion/deletion events occur more frequently in highly AT-rich regions and tend to be associated with small repeats. If we take the synonymous substitution rate for rbcL as an upper bound, on average the noncoding region is evolving at approximately 2/3 the maximum rate. This reduction can partly be ascribed to functional constraints on the sequence imposed by putative promoters and ribosome-binding sites, which account for an estimated 20% of the nontranslated region.
FIGURE 4. Comparison of the nucleotide sequences of the rbcL-atpB intergenic region from (b) barley (x) and maize (y) and (a) barley (x) and spinach (y) chloroplast DNA. The numbers commence at the atpB initiator codon. The sequences were analyzed using the program described by NOVOTNY (1982) using a tetranucleotide filter. The downward arrows indicate atpB mRNA 5' ends. The upward arrows indicate rbcL mRNA 5' ends (the hollow arrow is maize rbcL mRNA).
Insertion/deletion events between barley and maize are excluded from the 5' end-coding regions for rbcL and atpB mRNAs. In the comparison between the barley and spinach intergenic regions, the accumulation of insertion/deletion and base changes has obliterated most of the sequence homology except for the putative rbcL promoter and ribosome-binding site. Thus, as other untranslated regions are sequenced, comparisons with analogous regions from other plants may be a fruitful way to identify sequences of functional importance. Conversely, sequences that tolerate gross changes are unlikely to be functionally significant. It should be noted that a search for the atpB "promoter" by the stated criteria would fail. However, a search for regions with few or no insertion/deletion changes does detect the region immediately proximal to the mature atpB mRNA 5' end-coding region.
The apparent lengths of 5' untranslated leaders for maize and barley rbcL
differ considerably. The region encoding the 5' end of maize rbcL mRNA
contains extensive nucleotide changes and insertion/deletions in barley. In contrast, the region encoding the 5' ends of barley rbcL mRNA is highly conserved
in maize. This suggests that a small number of nucleotide changes have created
the barley rbcL promoter and that maize has evolved a separate rbcL promoter.
Alternatively, if the 5' end of maize rbcL mRNA has been assigned incorrectly,
maize rbcL transcription may start where it starts in barley, spinach, tobacco
and pea chloroplast DNA and then undergo a specific processing event. A
further consideration is that, unlike the other plants mentioned, maize is a C4
plant and shows differential expression of rbcL mRNA in mesophyll and bundle
sheath cells (LINK, COEN and BOGORAD 1978). The possibility of distinct rbcL
promoters in C3 and C4 plants awaits further study.
The data on the rate of chloroplast DNA evolution contrast with similar data
on mitochondrial DNA evolution in primates (AQUADRO and GREENBERG 1983;
BROWN, PRAGER and WILSON 1982), where the rate of synonymous substitution
for primate mitochondrial DNA is approximately 10% per million years. Although
it is difficult to date the separation of plant lineages, STEBBINS (1981;
G. L. STEBBINS, personal communication) places the divergence time for these
species at 50 to 65 million years ago. Thus, to an approximation, it appears
that the synonymous rate for chloroplast DNA evolution among these grass
species is 1.1 nucleotide substitutions/site/10⁹ yr (assuming the two lineages
diverged 50 million years ago). There are no sequence comparisons among
plant nuclear genes in which divergence times can be estimated. In animal
comparisons, KIMURA (1983) estimates the synonymous rate to be about 2 × 10⁻⁹
to 3 × 10⁻⁹ substitutions/site/yr, depending upon the species and sequences compared. Pseudogene substitution rates are approximately 5 × 10⁻⁹/
site/yr (LI 1983) in animal globin comparisons. Although the data at hand
suggest a more conservative rate of evolution for cpDNA sequences, sampling
errors and potential errors in the estimation of divergence times may make
such a conclusion premature. On the other hand, the 100-fold difference in
cpDNA vs. mammalian mtDNA evolutionary rates is very large and establishes that these two molecules evolve at very different rates. The reasons for
this large difference are not known.
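The arithmetic behind these rate comparisons can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes the usual relation in which a per-lineage rate equals pairwise divergence divided by twice the divergence time, and it assumes the primate mtDNA figure of 10% per million years can be read as 10⁻⁷ substitutions/site/yr. The synonymous divergence is not quoted in this passage, so it is back-calculated from the stated rate; the function and variable names are illustrative.

```python
def rate_per_site_per_year(divergence, years_since_split):
    """Per-lineage substitution rate from pairwise divergence.

    Substitutions accumulate along both lineages after the split,
    so the elapsed evolutionary time is twice the divergence time.
    """
    return divergence / (2.0 * years_since_split)


# Values quoted in the text.
cp_rate = 1.1e-9        # cpDNA synonymous rate, substitutions/site/yr
split_years = 50e6      # barley-maize divergence time used for that rate
mt_rate = 0.10 / 1e6    # primate mtDNA: about 10% per million years (assumed per site per year)

# Assumption: synonymous divergence implied by the quoted rate and date.
implied_divergence = cp_rate * 2.0 * split_years

# The same divergence with the older date of 65 million years gives a lower rate.
cp_rate_65 = rate_per_site_per_year(implied_divergence, 65e6)

# Ratio of mtDNA to cpDNA rates, the basis of the "100-fold" statement.
fold_difference = mt_rate / cp_rate

print(f"implied synonymous divergence: {implied_divergence:.2f} per site")
print(f"cpDNA rate at 65 Myr: {cp_rate_65:.2e} per site per yr")
print(f"mtDNA/cpDNA rate ratio: {fold_difference:.0f}")
```

Under these assumptions the ratio comes out close to 90, in line with the roughly 100-fold difference stated above, and moving the divergence date from 50 to 65 million years changes the cpDNA rate by well under a factor of two.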
M.T.C. acknowledges the support of National Science Foundation grant DEB-8118414. The
authors would like to thank BILL BIRKY for helpful suggestions concerning the manuscript.
LITERATURE CITED
AQUADRO, C. F. and B. D. GREENBERG, 1983. Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals. Genetics 103: 287-312.
ATCHISON, B. A., P. R. WHITFELD and W. BOTTOMLEY, 1976. Comparison of chloroplast DNAs by specific fragmentation with EcoRI endonuclease. Mol. Gen. Genet. 148: 263-269.
BOLIVAR, F., R. L. RODRIGUEZ, M. C. BETLACH and H. W. BOYER, 1977. Construction and characterization of new cloning vehicles: ampicillin-resistant derivatives of the plasmid pMB9. Gene 2: 75-93.
BOWMAN, C. M. and T. A. DYER, 1982. Purification and analysis of DNA from wheat chloroplasts isolated in nonaqueous media. Anal. Biochem. 122: 108-118.
BROWN, A. H. D. and M. T. CLEGG, 1983. Analysis of variation in related DNA sequences. In: Statistical Analysis of DNA Sequence Data, Edited by B. S. WEIR. Marcel Dekker, Inc., New York.
BROWN, W. M., E. M. PRAGER and A. C. WILSON, 1982. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J. Mol. Evol. 18: 225-239.
CLEGG, M. T., J. R. Y. RAWSON and K. THOMAS, 1984. Chloroplast DNA variation in pearl millet and related species. Genetics. In press.
CLEGG, M. T., A. H. D. BROWN and P. R. WHITFELD, 1984. Chloroplast DNA diversity in wild and cultivated barley: implications for genetic conservation. Genet. Res. In press.
CURTIS, S. E. and R. HASELKORN, 1983. Isolation and sequence of the gene for the large subunit of ribulose-1,5-bisphosphate carboxylase from the cyanobacterium Anabaena 7120. Proc. Natl. Acad. Sci. USA 80: 1835-1839.
DORNER, R. W., A. KAHN and S. G. WILDMAN, 1958. Proteins of green leaves. VIII. The distribution of fraction I protein in the plant kingdom as detected by precipitin and ultracentrifugal analyses. Biochim. Biophys. Acta 29: 240-245.
EFSTRATIADIS, A., J. W. POSAKONY, T. MANIATIS, R. M. LAWN, C. O'CONNELL, R. A. SPRITZ, J. K. DERIEL, B. G. FORGET, S. M. WEISSMAN, J. L. SLIGHTOM, A. E. BLECHL, O. SMITHIES, F. E. BARALLE, C. C. SHOULDERS and N. J. PROUDFOOT, 1980. The structure and evolution of the human beta-globin family. Cell 21: 653-668.
FARABAUGH, P. J., U. SCHMEISSNER, M. HOFER and J. H. MILLER, 1978. Genetic studies of the lac repressor. VII. On the molecular nature of spontaneous hotspots in the lacI gene of Escherichia coli. J. Mol. Biol. 126: 847-857.
HANAHAN, D. and M. MESELSON, 1980. A protocol for high density screening of plasmids in χ1776. Gene 10: 63-67.
KIMURA, M., 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA 78: 454-458.
KIMURA, M., 1983. The neutral theory of molecular evolution. In: Evolution of Genes and Proteins, Edited by M. NEI and R. K. KOEHN. Sinauer Associates, Sunderland, Massachusetts.
KREBBERS, E. T., I. M. LARRINUA, L. MCINTOSH and L. BOGORAD, 1982. The maize chloroplast genes for the β and ε subunits of the photosynthetic coupling factor CF1 are fused. Nucleic Acids Res. 10: 4985-5002.
LI, W. H., 1983. Evolution of duplicate genes and pseudogenes. In: Evolution of Genes and Proteins, Edited by M. NEI and R. K. KOEHN. Sinauer Associates, Sunderland, Massachusetts.
LINK, G., D. M. COEN and L. BOGORAD, 1978. Differential expression of the gene for the large subunit of ribulose bisphosphate carboxylase of maize leaf cell types. Cell 15: 725-731.
MALOY, S. R. and W. D. NUNN, 1981. Selection for loss of tetracycline resistance by Escherichia coli. J. Bacteriol. 145: 1110-1112.
MANIATIS, T., S. G. KEE, A. EFSTRATIADIS and F. C. KAFATOS, 1976. Amplification and characterization of beta-globin gene synthesized in vitro. Cell 8: 163-182.
MAXAM, A. M. and W. GILBERT, 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65: 499-559.
MCINTOSH, L., C. POULSEN and L. BOGORAD, 1980. Chloroplast gene sequence for the large subunit of ribulose bisphosphate carboxylase of maize. Nature 288: 556-560.
NOVOTNY, J., 1982. Matrix program to analyze primary structure homology. Nucleic Acids Res. 10: 127-131.
PALMER, J. D. and D. ZAMIR, 1982. Chloroplast DNA evolution and phylogenetic relationships in Lycopersicon. Proc. Natl. Acad. Sci. USA 79: 5006-5010.
POULSEN, C., 1981. Comments on the structure and function of the large subunit of the enzyme ribulose bisphosphate carboxylase-oxygenase. Carlsberg Res. Commun. 46: 259-278.
RUBIN, C. M. and C. W. SCHMID, 1980. Pyrimidine-specific chemical reactions useful for DNA sequencing. Nucleic Acids Res. 8: 4613-4618.
SANGER, F. and A. R. COULSON, 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94: 441-448.
SHINOZAKI, K. and M. SUGIURA, 1982. Sequence of the intercistronic region between the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit and the coupling factor β subunit gene. Nucleic Acids Res. 10: 4923-4933.
SHINOZAKI, K., C. YAMADA, N. TAKAHATA and M. SUGIURA, 1983. Molecular cloning and sequence analysis of the cyanobacterial gene for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase. Proc. Natl. Acad. Sci. USA 80: 4050-4054.
STEBBINS, G. L., 1981. Coevolution of grasses and herbivores. Ann. Missouri Bot. Gard. 68: 75-86.
TAKAIWA, F. and M. SUGIURA, 1982. Nucleotide sequence of the 16S-23S spacer region in an rRNA gene cluster from tobacco chloroplast DNA. Nucleic Acids Res. 10: 2665-2676.
VIEIRA, J. and J. MESSING, 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268.
ZURAWSKI, G., W. BOTTOMLEY and P. R. WHITFELD, 1982. Structures of the genes for the β and ε subunits of spinach chloroplast ATPase indicate a dicistronic mRNA and an overlapping translation stop/start signal. Proc. Natl. Acad. Sci. USA 79: 6260-6264.
ZURAWSKI, G., B. PERROT, W. BOTTOMLEY and P. R. WHITFELD, 1981. The structure of the gene for the large subunit of ribulose 1,5-bisphosphate carboxylase from spinach chloroplast DNA. Nucleic Acids Res. 9: 3251-3269.
Corresponding editor: J. E. BOYNTON
