Natural Selection on the gag, pol, and env Genes of Human Immunodeficiency Virus 1 (HIV-1)
Stacy A. Seibert, Carina Y. Howell, Marianne K. Hughes, and Austin L. Hughes
Department of Biology and Institute of Molecular Evolutionary Genetics, Pennsylvania State University
Natural selection on polymorphic protein-coding loci of human immunodeficiency virus 1 (HIV-1), the more geographically widespread of the two viruses causing human acquired immune deficiency syndrome (AIDS), was studied by estimating the rates of nucleotide substitution per site in comparisons among alleles classified in families of related alleles on the basis of a phylogenetic analysis. In the case of gag, pol, and gp41, the rate of synonymous substitution generally exceeded that of nonsynonymous substitution, indicating that these genes are subject to purifying selection. However, in the case of several of the variable (V) regions of the gp120 gene, especially V2 and V3, comparisons within and between families often showed a significantly higher rate of nonsynonymous than of synonymous nucleotide substitution. This pattern of nucleotide substitution indicates that positive Darwinian selection has acted to diversify these regions at the amino acid level. The V regions have been identified as probable epitopes for antibody recognition; therefore, avoidance of such recognition seems likely to be the basis for positive selection on these regions. By contrast, regions of HIV-1 proteins identified as epitopes for T cell recognition show no evidence of positive selection and are often highly conserved at the amino acid level. These results suggest that selection favoring avoidance of T cell recognition has not been a major factor in the history of HIV-1 and thus that avoidance of T cell recognition is not likely to be a major factor in the pathogenesis of AIDS.
Introduction
Human immunodeficiency virus 1 (HIV-1), the more geographically widespread of the two viruses causing human acquired immunodeficiency syndrome (AIDS), is known to exhibit high levels of genetic polymorphism even within a single patient (Hahn et al. 1986; Fisher et al. 1988). Since viruses with RNA genomes are known to have high mutation rates (Holland et al. 1982), one possible explanation for HIV-1 polymorphism is that it is selectively neutral and that the high level of polymorphism is a consequence of the high mutation rate. An alternative hypothesis is that at least some HIV-1 polymorphisms are selectively maintained and that this natural selection arises as a result of host immune defenses against the virus (Simmonds et al. 1990; Phillips et al. 1991; Holmes et al. 1992). So far it has been difficult to decide between these two hypotheses.
Key words: env gene, escape mutants, gag gene, HIV-1, pol gene, positive selection.
Address for correspondence and reprints: Austin L. Hughes, Department of Biology, Mueller Laboratory, Pennsylvania State University, University Park, Pennsylvania 16802; E-mail: austin@hugaus.bio.psu.edu.
Mol. Biol. Evol. 12(5):803-813. 1995.
© 1995 by The University of Chicago. All rights reserved.
0737-4038/95/1205-0009$02.00
A powerful method of discriminating between positive Darwinian selection and neutral polymorphism is to compare the rates of synonymous and nonsynonymous nucleotide substitutions per site (Hughes and Nei 1988, 1989). In the case of positive selection favoring diversity at the amino acid level, the rate of nonsynonymous substitution is found to exceed that of synonymous substitution. Under purifying selection, as occurs in the case of most protein-coding genes, the synonymous rate is higher (Kimura 1977). Simmonds et al. (1990) analyzed sequences of a highly variable region (V3) of the surface protein gp120 from HIV-1 and reported that the ratio of synonymous substitutions per site to nonsynonymous substitutions per site is 0.67, which is consistent with the hypothesis that this polymorphism is maintained by positive selection. However, these authors did not report the results of any statistical tests of the difference between rates of synonymous and nonsynonymous substitution; therefore, they did not rule out the possibility that the observed bias toward nonsynonymous substitution in this region was due to chance. By contrast, an analysis of a cytotoxic T cell (CTL) epitope in the gag protein of the simian immunodeficiency virus SIVmac failed to show statistically significant evidence of positive Darwinian selection, although considerable polymorphism was observed in this region and the occurrence of mutations eliminating CTL recognition was documented (Chen et al. 1992).
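The direction of the comparison described above can be illustrated with a minimal Python sketch. The function name and the labels are illustrative and do not come from the paper; the only value taken from the text is the ratio of 0.67 reported by Simmonds et al. (1990), which is entered here as dS = 0.67 and dN = 1.0 purely to reproduce that ratio. The sketch reports only which rate is larger; as the text notes, a statistical test is still needed before the difference can be attributed to selection.

```python
def selection_signal(d_s, d_n):
    """Return dN/dS and a label for the direction of the difference.

    d_s and d_n are synonymous and nonsynonymous substitutions per site.
    The label describes direction only and is not a significance test.
    """
    if d_s <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = d_n / d_s
    if omega > 1:
        label = "dN > dS: candidate positive selection"
    elif omega < 1:
        label = "dN < dS: consistent with purifying selection"
    else:
        label = "dN = dS: consistent with neutrality"
    return omega, label


# A dS/dN ratio of 0.67 corresponds to dN/dS of about 1.49.
omega, label = selection_signal(d_s=0.67, d_n=1.0)
print(f"dN/dS = {omega:.2f} ({label})")
```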
To study the question of positive selection on HIV-1 genes in more detail, we estimated rates of synonymous and nonsynonymous substitution per site in the gag, pol, and env genes of HIV-1 from published sequences. Because the host’s immune system has been hypothesized to be a source of selection favoring diversity of HIV proteins, we analyzed separately regions reported to be involved in immune recognition. Even when an enhanced rate of nonsynonymous substitution is observed in comparisons of closely related sequences, the same pattern may not be observed when more distantly related sequences are compared (Hughes and Nei 1988, 1989; Tanaka and Nei 1989). Presumably this occurs because selectively favored nonsynonymous substitutions become saturated over time, allowing the rate of synonymous substitution to overtake the rate of nonsynonymous substitution (Hughes and Nei 1988). Therefore, we expected that evidence of positive selection on HIV-1 genes would be most apparent in comparisons of closely related genes. In order to identify such families of genes, we conducted phylogenetic analyses of HIV-1 gag, pol, and env genes.
The vertebrate immune system includes two different systems for molecular recognition that operate in
quite different ways. T cell receptors (present on both
CTL and helper T cells) recognize short peptides derived
from intracellularly processed foreign proteins that are
bound and presented on the cell surface by class I (in
the case of CTL) and class II (in the case of helper T
cells) major histocompatibility complex (MHC) molecules. Immunoglobulins, by contrast, recognize extracellular foreign antigens in their native state. By analyzing putative T cell and immunoglobulin epitopes separately, we obtained information regarding the relative importance of selection by these two components
of immune recognition on HIV proteins.
Methods
Sequences Analyzed
The genomes of HIV-1 and HIV-2 and related lentiviruses contain three major genes, gag, pol, and env; proteins encoded by these genes make up the bulk of the infective virion (Arnold and Arnold 1991). In each case, the initial product of translation is a polyprotein that is then broken down into separate proteins. The gag gene encodes the virion structural proteins p17 (matrix), p24 (capsid), and p7/p9 (nucleocapsid) (figs. 1 and 2A). The pol gene encodes protease, the component proteins of reverse transcriptase (p66, p51, and RNase H), and integrase (figs. 1 and 2B). The env gene encodes the envelope glycoproteins gp120 and gp41 (figs. 1 and 2C). Glycoprotein gp120 recognizes the CD4 receptor and thus enables entry into CD4+ T cells, and gp120 is also the primary target of anti-HIV antibodies (Arnold and Arnold 1991).
Comparison of gp120 sequences has revealed five hypervariable regions (V1-V5; fig. 2C), and prediction of antigens likely to be recognized by host immunoglobulins suggested that these antigens would mainly be found in the hypervariable regions (Modrow et al. 1987). Simmonds et al. (1990) computed rates of synonymous and nonsynonymous nucleotide substitution per site in the V3 and flanking region and reported that the nonsynonymous rate was higher. However, they did not report the results of any statistical test of the difference in rates, and because they studied PCR-amplified sequences in this region, they could not compare the rates of nucleotide substitution in V3 with those in other regions of gp120.
FIG. 1.-HIV virion. Abbreviations: IN, integrase; PR, protease; RNA, genomic RNA; RT, reverse transcriptase. Redrawn with changes from Arnold and Arnold 1991.
Regions of the HIV proteins that are bound by host MHC molecules and presented to T cells, or T cell epitopes (TCE), have been identified experimentally by the method of cellular immunology or by elution and direct sequencing of MHC-bound peptides. The former method generally does not determine the exact peptide bound by the MHC molecule but rather a broader region that presumably contains one or more such peptides. To test the hypothesis that natural selection acting on HIV proteins favors evasion of T cell recognition (Phillips et al. 1991), we compared rates of nucleotide substitution in the remainder of the genes with those in TCE identified (1) by cellular methods (Schrier et al. 1988; Clerici et al. 1989; Wahren et al. 1989; Krowka et al. 1990; De Groot et al. 1991; Johnson et al. 1991; Phillips et al. 1991; Chen et al. 1992) or (2) by elution
of peptides (Huet et al. 1990; Takashi et al. 199 1; Tso-
(A)
p17
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQTGS
p24
EELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKS~~~DTGHSSOVSONY~PIVONI~O~
HOAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPOD
++++++++++++
HP~GPIAPGQ~PRGSDIAGTTSTLQEQIG~TNNPPIPVGEIY~WIILGLNKI~YSPTSILDIRQ
GPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANGGPGHK
p7/p9
ARVLIAEAMSQVTNSATIIMMQRGNFRNQRKIVKCFNCGKEGHIARNCRAPRKKGCWKCGKEGHOMKDCTER
QANFLGKIWPSYKGRPGNFLQSRPEPTAPPFLQSRPEPTAPPEESFRSG~TTTPSOKQEPID~LYPLTSL
RSLFGNDPSSQ
(B) protease
PQITLWQRPLVTIKIGGQLKEALLDTG~DT~EE~LPGR~P~IGGIGGFIK~QYDQIPIEICGH~I
reverse
transcriptase
GT~VGPTPVNIIGRNLLTQIGCTLNF(PISPIETVP~KPG~GPK~QWPLTEEKI~~ICTEMEKE
GKISKIGPENPYNTPVFAIKSTKWRKLVDFRELNKRTDVGDAYF
SVPLDKDFRKYTAFTIPSTNNETPGIRYQYNVLPQGWKGSDLY
VGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFL
+++++++++
VGKLNWASQIYAGIKVKQLCKLLRGTKALTEWPLTEEAELELAENREILKEPVHGMYDPSKDLIAELQKQ
GQGQWTYQIYQEPFKNLKTGKYAKMRGTHTNDVKQLTEAVID
YWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGPLTDTTNQK
TELQAIYL~QDSGLE~IVTDSQYALGIIQAQPDKSES
integrase
DKLVSAGIRKVLIFLDGIDKAAQEEHEKYHSNWRAMA
SDFNLPPWAKEIVASCDKCQLKGEAMHGQVDCSPG
IWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLL~AGRWP~TIHTDNGGNFISTT~CW
WAGIKQEFGIPYNPQSQGWES~NEL~IIGQV~QAEHLKTAVQ~VFIHNFK~GGIGGYSAGERIIDI
IATDIOTKELOKQITKIQNFRVYYRDSRDSRDPLWKGPA~L~GEGAWIQDNSDIKWPR~II~YGKQM
AGDDCVASIRQDED
(C)
gp120
TEKLWVTVYYGVPVWKEATTTLFCASDAKAYDTEVHNVWA
V1
QMHEDIISLWDQSLKPCVKLTPLCVSLKCTDLGNATNTNSSNTNSSSGE~KGEI~CSFNISTSIRGKV
V2
QKEYAFFYKLDIIPIDNDTTSYTLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCTN
V3
VSTVQCTHGIRPWSTQLLLNGSL~EEWIRSANFTDNAKTIIVQLNOS~INCTRPNNNT~SIRIQRGP
++++++++
GRAFVTIGKIGNMRQAHCNIS~~ATLKQIASKLREOFGNNKTIIFKQSSGGDPEIVTHSFNCGGEFFYC
V4
NSTQLFNSTWFNSTWSTEGSNNTEGSDTITLPCRIKOFINMWQEVG~YAPPISGQIRCSSNITGLLLT~
V5
gp41
GGNNNNGSEIFRPGGGDMRDNWRSELYKYKWKIEPLGVAPT~WQ~~IAVGIG~FLGFLG~GS
++++++++++
TMGARSMTLTVQARQLLSGI~~NNLLRAIEAQQHLLQLT~GIKOLQ~ILA~RYLKD~LLGIWGCSG
KLICTTAVPWNASWSNKSLEQIWNMTWMEWDREINNYTSLIHSLIEESQNQQEKNEQELLELLELDKWASL~
FNITNWLWYIKIFIMIVGGLVGLRIVFAVLSIVNRVRQGYRDR
+++++++++++
SIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLLIVTRI~LLGRRG~~KY~LLQYWSQEL~SAVSLLN
ATAIAVAEGTDRVIEWOGACRAIRHIPRRIRQGLERILL
FIG. 2.-Amino acid sequence of component proteins encoded by (A) gag, (B) pol, and (C) env from the LAI isolate of HIV-1 (Wain-Hobson et al. 1985). Vertical lines indicate boundaries between proteins. T cell epitopes identified by cellular methods are underlined; those identified by elution of peptides are marked with +. Variable regions V1-V5 of env gp120 are overlined. Sources for protein boundaries are as follows: (A) p17, p24, and p7/p9 (McCutchan et al. 1992). (B) Protease (Loeb et al. 1989), reverse transcriptase (Jupp et al. 1991), integrase (Vink and Plasterk 1993). (C) gp120 and gp41 (Cheng-Mayer et al. 1990).
FIG. 4.-Phylogenetic tree of pol sequences based on proportion amino acid difference (p). Sequence identification, abbreviations, and tests of internal branch length are as in fig. 3.
mides et al. 199 1; Dai et al. 1992; Johnson et al. 1992,
1993) (fig. 2).
DNA sequences for gag, env, and pol genes of HIV-1, HIV-2, and SIV were collected from the GenBank database. The numbers of sequences used were as follows: 45 gag sequences, 46 gp120 sequences, 35 gp41 sequences, and 33 pol sequences. The sequences are identified by accession number in the phylogenetic trees (figs. 3-6).
FIG. 3.-Phylogenetic tree of gag sequences based on proportion amino acid difference (p). Sequences are identified by their GenBank accession number. Abbreviations are as follows: agm, African green monkey; mac, macaque; stm, stump-tailed macaque; FIV, feline immunodeficiency virus. Tests of the hypothesis that the length of an internal branch is equal to zero: *P < 0.05; **P < 0.01; ***P < 0.001.
Statistical Methods
Sequences were aligned at the amino acid level by the CLUSTAL V program (Higgins et al. 1992) using default settings, and in some cases were corrected by eye to improve the alignment. When a given set of sequences was compared, any codon at which the alignment postulated a gap in any sequence was excluded from each set of pairwise comparisons, so that a comparable data set was used in each comparison. Phylogenetic trees were constructed by the neighbor-joining method (Saitou and Nei 1987) on the basis of the proportion of amino acid difference; amino acid sequences were used because synonymous nucleotide sites were saturated in many comparisons. The statistical significance of internal branches in phylogenetic trees was tested by Rzhetsky and Nei’s (1992) method.
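The distance measure and the gap handling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ software: the function names and the toy alignment are hypothetical, gaps are assumed to be written as "-", and tree construction by neighbor joining is not shown. Columns with a gap in any sequence are removed once for the whole set, so that every pairwise comparison uses the same positions.

```python
from itertools import combinations


def gap_free_columns(alignment):
    """Indices of alignment columns in which no sequence has a gap."""
    length = len(next(iter(alignment.values())))
    return [i for i in range(length)
            if all(seq[i] != "-" for seq in alignment.values())]


def p_distance(seq_a, seq_b, columns):
    """Proportion of the given columns at which two aligned sequences differ."""
    if not columns:
        raise ValueError("no gap-free columns to compare")
    diffs = sum(1 for i in columns if seq_a[i] != seq_b[i])
    return diffs / len(columns)


def pairwise_p(alignment):
    """p-distance for every pair, using the same gap-free columns throughout."""
    cols = gap_free_columns(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b], cols)
            for a, b in combinations(sorted(alignment), 2)}


# Hypothetical toy alignment of three amino acid sequences.
toy = {"seq1": "MGARASVLSG", "seq2": "MGARASILSG", "seq3": "MGA-ASVLTG"}
dist = pairwise_p(toy)
for pair, p in dist.items():
    print(pair, round(p, 3))
```

Excluding gapped columns from the whole set, rather than pair by pair, is what keeps the resulting distances comparable across pairs.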
The phylogenetic analyses were used to identify families of closely related sequences, most of which cor-
FIG. 5.-Phylogenetic tree of gp120 sequences based on proportion amino acid difference (p). All sequences are HIV-1; tests of internal branch length are as in fig. 3.
FIG. 6.-Phylogenetic tree of gp41 sequences based on proportion amino acid difference (p). All sequences are HIV-1; tests of internal branch length are as in fig. 3.
Table 1
Numbers of Synonymous (dS) and Nonsynonymous (dN) Nucleotide Substitutions per 100 Sites (±SE) in Comparisons of HIV-1 gag Alleles
REMAINDER
TCE
ds
ds
Family
Family
Family
Family
Family
Family
Family
Overall
1
,
2
3
,
4 .,
5
..
6
.
.
7
mean
11.2
8.8
3.7
55.7
21.9
10.9
13.2
15.5
+ 2.1
+- 2.1
f 1.1
+ 6.5
f 3.4
k 1.4
+ 2.6
k 4.8
2.9
3.0
1.9
5.8
5.7
2.9
5.1
3.5
+ 0.5***
+_0.6**
+ 0.4
f 0.9***
+ 0.9***
+ 0.4***
f 0.8**
+ 1.2*
15.2
13.8
4.0
36.0
21.6
12.4
17.9
18.4
6
f 2.4
re_2.5
2 1.2
+ 4.4
+ 3.3
-t 1.4
1 3.0
+ 2.2
3.2
6.0
2.1
7.4
9.6
37
6:l
5.3
f 0.5***
+_ 0.9**
f 0.4
+ 0.9***
+ l.l***
f 0 4***
+ 0:9***
+ 0.6***
NOTE.-Families are as in fig. 3. Standard errors of mean dS and dN are computed by Nei and Jin’s (1989) method. Tests of the hypothesis that dS = dN: *P < 0.05; **P < 0.01; ***P < 0.001.
responded
to clusters supported by statistically significant internal branches in the trees. Within these families, we estimated the number of synonymous nucleotide substitutions per site (dS) and the number of nonsynonymous nucleotide substitutions per site (dN) using Nei and Gojobori’s (1986) method. Standard errors of dS and dN for sets of pairwise comparisons were estimated by Nei and Jin’s (1989) method.
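For readers unfamiliar with the Nei and Gojobori (1986) approach, the following Python sketch outlines the calculation for a single pair of aligned coding sequences. It is a simplified illustration, not the program used in this study, and it rests on several assumptions: the standard genetic code, uppercase DNA, changes to stop codons counted as nonsynonymous when sites are counted, and mutational pathways through stop codons skipped. All function names and the two short test sequences are hypothetical. Standard errors (Nei and Jin 1989) are not computed.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered T, C, A, G at each position.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # assumption: pathways through stop codons are skipped
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:
            paths.append((syn, non))
    if not paths:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple substitutions."""
    if p >= 0.75:
        raise ValueError("sites are saturated; the distance is undefined")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) per site for two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = s_diff = n_diff = 0.0
    codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "-" in c1 or "-" in c2:
            continue  # gapped codons are excluded from the comparison
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2.0
        sd, nd = codon_differences(c1, c2)
        s_diff += sd
        n_diff += nd
        codons += 1
    n_sites = 3 * codons - s_sites
    return jukes_cantor(s_diff / s_sites), jukes_cantor(n_diff / n_sites)


# Hypothetical three-codon example: one synonymous and one nonsynonymous change.
d_s, d_n = nei_gojobori("TTAGCTAAA", "TTGGCTAGA")
print(f"dS = {d_s:.3f}, dN = {d_n:.3f}")
```

The saturation check in jukes_cantor reflects the point made in the text that synonymous sites were saturated in many distant comparisons, which is why closely related families were analyzed.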
To test for natural selection acting on gag- and pol-encoded proteins, dS and dN were estimated separately for TCE and for the remainder of the gene (fig. 2A-2B). In the case of env, TCE mentioned in the literature include some that overlap the V3, V4, and V5 regions of gp120 (fig. 2C). We estimated dS and dN for V1-V5 of gp120; for TCE in gp120 other than those overlapping one of the V regions; for the remainder of gp120; for TCE in gp41 (Schrier et al. 1988; Wahren et al. 1989); and for the remainder of gp41 (fig. 2C).
Results
The phylogenetic tree of gag sequences (fig. 3) is rooted by use of a sequence from feline immunodeficiency virus (FIV) as an outgroup. As with previous phylogenetic analyses (Yokoyama 1991), HIV-1 and HIV-2 clustered separately, with HIV-2 closer to genes from simian viruses. The trees for pol (fig. 4), gp120 (fig. 5), and gp41 (fig. 6) were rooted by placing the root in the longest internal branch because in these cases FIV sequences were too distant for very reliable alignment. In the case of pol, HIV-2 sequences clustered more closely with simian sequences (fig. 4). In the case of gag, pol, and gp120, families of closely related sequences were identified for analysis of rates of nucleotide substitution. Each such family constituted a cluster of closely related sequences, and in most cases the cluster was supported by a statistically significant internal branch (figs. 3-5). The same families that were identified for gp120 (designated A-E in fig. 5) were seen on the gp41 tree (fig. 6), except for family E, for which gp41 sequences were not available.
Table 2
Numbers of Synonymous (dS) and Nonsynonymous (dN) Nucleotide Substitutions per 100 Sites (±SE) in Comparisons of pol Polyprotein Gene Alleles
FAMILY 1 (HIV-1)
FAMILY 2 (HIV-2)
FAMILY 3 (SIV)
Protease
Reverse transcriptase:
TCE
Remainder
Integrase:
TCE
Remainder
13.4 ± 3.3
2.1 ± 0.6***
28.2 ±
17.1 ± 5.9
22.2 ± 1.3
11.2 ± 3.1
19.7 ± 2.1
4.6
4.3 ± 0.9***
33.1 ± 6.5
2.9 ± 0.8***
3.1 ± 1.1*
2.6 ± 0.2***
72.1 ± 23.0
46.3 ± 2.7
3.2 ± 1.5**
4.1 ± 0.4***
3.3 ± 3.4
6.6 ± 0.9
0.0 ± 0.0
0.7 ± 0.2***
2.5 ± 0.7**
1.6 ± 0.3***
18.3 ±
30.8 ±
4.3 ± 1.3**
3.6 ± 0.5***
6.6 ± 3.0
4.9 ± 1.2
0.9 ± 0.5
0.8 ± 0.3***
5.1
3.3
NOTE.-Families are as in fig. 4. Standard errors of mean dS and dN are computed by Nei and Jin’s (1989) method. Tests of the hypothesis that dS = dN: *P < 0.05; **P < 0.01; ***P < 0.001.
Numbers of nucleotide substitutions per synonymous site and per nonsynonymous site were estimated within seven families of gag sequences (table 1). Both in regions identified as TCE and in the remainder of the gene, dS exceeded dN, and this difference was statistically significant for all but one family (table 1). This pattern is indicative of purifying selection. In the case of pol, rates of nucleotide substitution were estimated within three families, one from HIV-1, one from HIV-2, and one from SIV (table 2). In each case, dS was greater than dN both in TCE and in the remainder of the protease, reverse transcriptase, and integrase genes, and the difference was statistically significant for all comparisons but two (table 2).
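The tables report each estimate with its standard error, and the table notes refer to tests of the hypothesis that dS = dN without naming the statistic. One common choice, shown here only as an assumption about how such a test could be carried out, is a two-tailed Z test on the difference, treating the two estimates as independent. The function name is hypothetical; the input values are the family 1 and family 3 gag estimates as they appear in table 1 (11.2 ± 2.1 with 2.9 ± 0.5, and 3.7 ± 1.1 with 1.9 ± 0.4).

```python
from math import erfc, sqrt


def z_test(d_s, se_s, d_n, se_n):
    """Two-tailed Z test of dS = dN from two estimates and their standard errors.

    Assumes the two estimates are independent and approximately normal.
    """
    se_diff = sqrt(se_s ** 2 + se_n ** 2)
    if se_diff == 0:
        raise ValueError("both standard errors are zero; the test is undefined")
    z = (d_s - d_n) / se_diff
    p = erfc(abs(z) / sqrt(2))  # two-tailed normal probability
    return z, p


# gag family 1 and family 3 values from table 1 (substitutions per 100 sites).
z1, p1 = z_test(11.2, 2.1, 2.9, 0.5)
z3, p3 = z_test(3.7, 1.1, 1.9, 0.4)
print(f"family 1: z = {z1:.2f}, P = {p1:.4f}")
print(f"family 3: z = {z3:.2f}, P = {p3:.4f}")
```

Under these assumptions the family 1 difference falls below P = 0.001 and the family 3 difference is not significant, which agrees with the asterisks printed for those rows.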
The pattern of nucleotide substitution seen in the case of the env gene differed from that of either pol or gag. In families B, C, and E, dN was significantly greater than dS in the V2 region; and in families B and C, dN was significantly greater than dS in the V3 region (table 3). Family C was remarkable in that it showed dN significantly greater than dS in the V4 region (table 3). When overall means were computed for the five gp120 families, mean dN was significantly greater than mean dS in the case of V2 and V3; in the case of V1, V4, and V5 and the TCE, dS and dN did not differ significantly. In the remainder of gp120, dS was significantly greater than dN (table 3). By contrast, gp41 showed dS significantly greater than dN in both TCE and the remainder of the gene in overall means for families A-D (table 3).
Table 4 shows comparisons of V1-V5, the remainder of gp120, and gp41 among the four families (A-D) for which gp120 and gp41 sequences were available. In three of six pairwise comparisons, dN was significantly greater than dS in V2. Likewise, in two comparisons, dN was significantly greater than dS in V1. However, no such pattern was seen in the case of V3-V5. In the remainder of gp120 and in gp41, dS was greater than dN in all comparisons, and the difference was significant in two cases for gp120 and in five cases for gp41 (table 4). When TCE were compared between families, dS was always greater than dN, although this difference was generally not significant (data not shown); therefore, data for the TCE were not included in table 4.
Recent analysis of self peptides bound by human class I MHC molecules (i.e., peptides derived from self proteins and presented by class I MHC molecules on the surface of noninfected cells) showed that these peptides tend to be derived from proteins that have been highly conserved evolutionarily and from the most conserved regions of these proteins (Hughes and Hughes 1995). Furthermore, these peptides tend to be derived from relatively hydrophobic portions of proteins that are hydrophilic overall (Hughes and Hughes 1995). To test whether HIV-derived peptides bound by host class I MHC molecules have similar characteristics, we compared these peptides with their source proteins (table 5).
Table 4
Number of Synonymous (dS) and Nonsynonymous (dN) Nucleotide Substitutions per 100 Sites (±SE) in Comparisons between Families of HIV env Genes
               FAMILY A vs. FAMILY B          FAMILY A vs. FAMILY C          FAMILY A vs. FAMILY D
               dS            dN               dS            dN               dS            dN
gp120:
  V1           0.0 ± 0.0     34.6 ± 15.2*     18.0 ± 13.4   27.4 ± 6.6       15.8 ± 9.9    29.4 ± 5.9
  V2           1.3 ± 1.1     11.7 ± 3.7**     1.0 ± 1.0     21.6 ± 5.5***    10.8 ± 7.1    13.3 ± 4.0
  V3           19.2 ± 11.4   15.3 ± 5.0       18.9 ± 11.3   15.7 ± 5.3       12.1 ± 6.5    27.9 ± 7.5
  V4           7.8 ± 8.1     12.2 ± 5.6       35.0 ± 27.1   56.9 ± 19.0      51.8 ± 31.3   29.9 ± 14.6
  V5           17.2 ± 13.9   47.5 ± 15.8      21.2 ± 22.3   5.9 ± 3.3        37.9 ± 27.7   32.0 ± 10.1
  Remainder    5.8 ± 1.4     3.9 ± 0.6        11.9 ± 2.0    9.6 ± 0.9        9.2 ± 2.0     4.2 ± 0.7*
gp41           4.8 ± 1.0     2.7 ± 0.4        6.9 ± 1.4     3.1 ± 0.4*       9.8 ± 1.7     2.6 ± 0.4***
NOTE.-Standard errors of mean dS and dN are computed by Nei and Jin’s (1989) method. Tests of the hypothesis that dS = dN: *P < 0.05; **P < 0.01; ***P < 0.001.
So that we would have a measure of evolutionary conservation that was comparable among different genes, we made comparisons between sequences derived from two complete HIV genomes (LAI and ELI) (table 5). Since LAI (accession number K02013) and ELI (K03454) are not closely related (figs. 3-6), this comparison provides an indication of the degree of conservation over a long period of time relative to the deepest branch point of available HIV-1 sequences.
The results showed that the peptides were generally hydrophobic, as measured by the percentage of highly hydrophobic residues, and in most cases more hydrophobic than the remainder of the source protein (table 5). In three of the five peptides examined, there were no nonsynonymous differences between LAI and ELI in the gene region encoding the peptide but a substantial number of nonsynonymous differences outside the peptide (table 5). In the other two cases, dN in the peptide and the remainder of the gene did not differ significantly (table 5). In these cases, the amino acid differences that were observed in the peptides between LAI and ELI were generally conservative ones that did not affect the overall hydrophobicity of the peptide to a great extent (table 5). Therefore, these data suggest that, far from being subject to positive selection favoring diversity, peptides from HIV-1 proteins that are bound by the host class I MHC are generally derived from relatively conserved regions, and some are derived from highly conserved protein regions.
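The hydrophobicity measure used above is simple to reproduce. The sketch below, with a hypothetical function name, computes the percentage of highly hydrophobic residues (% HFO) using the residue set given in the footnote to table 5 (C, F, I, L, M, V, W, Y). The peptide sequences are four of the five listed in table 5; the dictionary keys are only illustrative labels.

```python
HIGHLY_HYDROPHOBIC = set("CFILMVWY")  # residue set from the table 5 footnote


def percent_hfo(peptide):
    """Percentage of residues in a peptide that are highly hydrophobic."""
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    hits = sum(1 for residue in peptide.upper() if residue in HIGHLY_HYDROPHOBIC)
    return 100.0 * hits / len(peptide)


# Peptide sequences as listed in table 5 (LAI isolate).
peptides = {
    "p24": "KRWIILGLNKIV",
    "gp41 73-82": "ERYLKDQQLL",
    "gp41 259-269": "RLRDLLLIVTR",
    "RT 310-318": "ILKEPVHGV",
}
hfo = {name: round(percent_hfo(seq), 1) for name, seq in peptides.items()}
for name, value in hfo.items():
    print(f"{name}: {value}% HFO")
```

The four values obtained this way match the peptide % HFO column of table 5.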
Discussion
Our analyses indicate that gp41, gag, and pol are subject to purifying selection overall. Furthermore, TCE derived from these proteins not only are not subject to positive selection favoring diversity at the amino acid level but actually tend to be derived from portions of the source proteins that are subject to purifying selection and thus are relatively conserved evolutionarily. In the case of gp120, by contrast, positive Darwinian selection has acted to favor diversity at the amino acid level in certain regions.
However, this positive selection has not acted in the same way on all families of gp120 genes. Positive selection was most frequently found on the V2 region. There was evidence of positive selection on this region in the B, C, and E families and in three of six comparisons between families (tables 3 and 4). In addition, there was evidence of positive selection on the V3 region in two families and on the V4 region in one family (table 3). Therefore, it appears that positive selection on gp120 has an episodic or opportunistic character. In other words, members of a given HIV-1 lineage, sharing a common ancestry and perhaps certain characteristics of their environment, can be temporarily subject to a type of selection that does not necessarily occur in the case of other such lineages. Perhaps differences in clinical stage and/or host genotypes may contribute to differences in the selective environment.
It seems likely that the source of positive selection on gp120 is the vertebrate immune system and that diversity at the amino acid level is favored because it reduces or evades immune recognition by the host. The V regions in which positive selection was found are putative immunoglobulin epitopes, and the V3 region has also been implicated as a TCE. Therefore, it seems possible that such selection can be caused by both the T
cell and the immunoglobulin components of the host immune system.
However, there are several lines of evidence suggesting that selection to avoid immunoglobulin recognition has been the predominant mode of positive selection on gp120. First, positive selection was never observed on any TCE except V3, which is also a putative immunoglobulin epitope. Second, in most families, gag and pol TCE showed significant evidence of purifying selection (tables 1-2); significant evidence of purifying selection was also found in the case of TCE in one gp41 family (table 3). Furthermore, three of the five known class I MHC-bound peptides from HIV-1 have been highly conserved over evolutionary time, including one each from pol, gp41, and gp120 (table 5).
Phillips and McMichael (1993) state that “the appearance of sequence variation which alters the ability of cytotoxic T cells to recognize antigens of the virus is good evidence that this form of immunity exerts a selective force.” However, the mere occurrence of such mutations is not in itself evidence that they have been selectively favored. A significantly higher rate of nonsynonymous than of synonymous nucleotide substitution provides much more convincing evidence of positive selection. Alternatively, the hypothesis that an escape mutation has been positively selected can be supported by a study of the viral population within a given host that shows fixation of such a mutation, as recently shown for a hepatitis C virus mutation by Weiner et al. (1995).
Analysis of known class I MHC-bound peptides suggests that these are derived from often conserved, relatively hydrophobic regions (table 5). In this respect, these pathogen-derived peptides resemble self protein-derived peptides (Hughes and Hughes 1995). Hughes and Hughes (1995) suggested that any mechanism that leads to binding of relatively hydrophobic peptides derived from overall relatively hydrophilic proteins will tend to select evolutionarily conserved peptides, because such hydrophobic regions are often functionally important. And any mechanism that leads to binding of predominantly conserved peptides will be to the host’s advantage in that it will minimize the likelihood of escape mutants’ occurrence in parasite populations (Hughes and Hughes 1995).
HIV, however, is able to evade the immune system of its host despite the fact that relatively conserved peptides are presented to CTL by MHC molecules. This suggests that evasion of CTL cannot be the only mechanism involved in HIV’s successful evasion of immune recognition. Indeed, the evidence of strong purifying selection in the case of many TCEs is not consistent with the hypothesis that positive selection favoring avoidance of CTL recognition is a major factor in the pathogenesis of AIDS.
Table 4 (continued)
               FAMILY B vs. FAMILY C          FAMILY B vs. FAMILY D          FAMILY C vs. FAMILY D
               dS            dN               dS            dN               dS            dN
gp120:
  V1           17.1 ± 17.1   27.5 ± 11.0      0.0 ± 0.0     15.2 ± 8.8       8.0 ± 7.5     31.6 ± 8.0*
  V2           0.0 ± 0.0     15.5 ± 4.4***    4.7 ± 4.8     12.6 ± 4.0       7.0 ± 5.0     10.2 ± 3.0
  V3           11.8 ± 8.4    6.1 ± 2.3        11.4 ± 7.4    18.0 ± 5.3       8.0 ± 4.8     22.1 ± 6.1
  V4           21.9 ± 18.6   63.4 ± 20.1      18.0 ± 20.9   54.8 ± 22.0      10.5 ± 15.3   39.0 ± 16.6
  V5           4.2 ± 10.6    40.0 ± 18.7      24.1 ± 19.0   56.3 ± 21.4      34.1 ± 24.6   19.2 ± 11.2
  Remainder    12.3 ± 2.1    9.9 ± 0.9        8.6 ± 1.9     3.9 ± 0.6*       13.5 ± 2.3    10.0 ± 0.9
gp41           9.1 ± 1.4***  3.8 ± 0.5        9.6 ± 1.7     3.5 ± 0.5***     8.3 ± 1.4     3.3 ± 0.5***
Table 5
Percent of Highly Hydrophobic Residues (% HFO) and Nonsynonymous Substitutions per 100 Sites (dN) in HIV-1 Peptides Eluted from MHC Class I Molecules
                                                        % HFO                  dN(b)
GENE   PROTEIN   PEPTIDE(a)                             Peptide   Remainder    Peptide         Remainder
gag    p24       134-142   LAI  KRWIILGLNKIV            58.3      30.6         3.6 ± 3.6       7.2 ± 0.7
                           ELI  -----V------
env    gp120     346-353   LAI  FNCGGEFF                50.0      33.1         0.0 ± 0.0***    17.7 ± 1.4
                           ELI  --------
       gp41      73-82     LAI  ERYLKDQQLL              40.0      40.1         0.0 ± 0.0***    7.9 ± 1.1
                           ELI  ----------
       gp41      259-269   LAI  RLRDLLLIVTR             54.5      40.1         18.3 ± 9.5      7.9 ± 1.1
                           ELI  _____J___AV_
pol    RT        310-318   LAI  ILKEPVHGV               44.4      34.1         0.0 ± 0.0***    2.7 ± 0.5
                           ELI  ---------
a Numbering of residues and % HFO are based on the LAI sequences (GenBank accession number K02013). Differences between LAI and ELI (accession number K03454) are shown. Highly hydrophobic residues are C, F, I, L, M, V, W, Y.
b dN is computed between LAI and ELI sequences. Tests of the hypothesis that dN in the peptide equals that in the remainder of the protein: ***P < 0.001.
Acknowledgments
This research was supported by grants R01-GM34940 and K04-GM00614 from the National Institutes of Health to A.L.H.
LITERATURE CITED
ARNOLD, E., and G. F. ARNOLD. 1991. Human immunodeficiency virus structure: implications for antiviral design. Adv. Virus Res. 39:1-87.
CHEN, Z. W., L. SHEN, M. D. MILLER, S. H. GHIM, A. L. HUGHES, and N. L. LETVIN. 1992. Cytotoxic T lymphocytes do not appear to select for mutations in an immunodominant epitope of simian immunodeficiency virus gag. J. Immunol. 149:4060-4066.
CHENG-MAYER, C., M. QUIROGA, J. W. TUNG, D. DINA, and J. LEVY. 1990. Viral determinants of human immunodeficiency virus type 1 T-cell or macrophage tropism, cytopathogenicity, and CD4 antigen modulation. J. Virol. 64:4390-4398.
CLERICI, M., N. I. STOCKS, R. A. ZAJAC, R. N. BOSWELL, D. C. BERSTEIN, D. L. MANN, G. M. SHEARER, and J. A. BERZOFSKY. 1989. Interleukin-2 production used to detect antigen peptide recognition by T-helper lymphocytes from asymptomatic HIV-positive individuals. Nature 339:383-385.
DAI, L. C., K. WEST, R. LITTUA, K. TAKASHI, and F. A. ENNIS. 1992. Mutation of human immunodeficiency virus type 1 at amino acid 585 on gp41 results in loss of killing by CD8+ A24-restricted cytotoxic T lymphocytes. J. Virol. 66:3151-3154.
DE GROOT, A. S., M. CLERICI, A. HOSMALIN, S. H. HUGHES, D. BARND, C. W. HENDRIX, R. HOUGHTON, G. M. SHEARER, and J. A. BERZOFSKY. 1991. Human immunodeficiency virus reverse transcriptase T helper epitopes identified in mice and humans: correlation with a cytotoxic T cell epitope. J. Infect. Dis. 164:1058-1065.
FISHER, A. G., B. ENSOLI, D. LOONEY, A. ROSE, R. C. GALLO, M. S. SAAG, G. M. SHAW, B. H. HAHN, and F. WONG-STAAL. 1988. Biologically diverse molecular variants within a single HIV-1 isolate. Nature 334:440-444.
HAHN, B., G. M. SHAW, M. E. TAYLOR, R. R. REDFIELD, P. D. MARKHAM, S. Z. SALAHUDDIN, F. WONG-STAAL, R. C. GALLO, E. S. PARKS, and W. D. PARKS. 1986. Genetic variation in HTLV-III/LAV over time in patients with AIDS or at risk for AIDS. Science 232:1548-1553.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1992. Clustal V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8:189-191.
HOLLAND, J., K. SPINDLER, F. HORODYSKI, E. GRABAU, S. NICHOL, and S. VAN DE POL. 1982. Rapid evolution of RNA genomes. Science 215:1577-1585.
HOLMES, E. C., L. Q. ZHANG, P. SIMMONDS, C. A. LUDLAM, and A. J. LEIGH BROWN. 1992. Convergent and divergent sequence evolution in the surface envelope glycoprotein of human immunodeficiency virus type 1 within a single infected patient. Proc. Natl. Acad. Sci. USA 89:4835-4839.
HUET, S., D. F. NIXON, J. B. ROTHBARD, A. TOWNSEND, S. A. ELLIS, and A. J. MCMICHAEL. 1990. Structural homologies between two HLA B27-restricted peptides suggest residues important for interaction with HLA B27. Int. Immunol. 2:311-316.
HUGHES, A. L., and M. K. HUGHES. 1995. Self peptides bound by HLA class I molecules are derived from highly conserved regions of a set of evolutionarily conserved proteins. Immunogenetics 41:257-262.
HUGHES, A. L., and M. NEI. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.
———. 1989. Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA 86:958-962.
JOHNSON, R. D., A. TROCHA, T. M. BUCHANAN, and B. D. WALKER. 1992. Identification of overlapping HLA class I-restricted cytotoxic T cell epitopes in a conserved region of the human immunodeficiency virus type 1 envelope glycoprotein: definition of minimum epitopes and analysis of the effects of sequence variation. J. Exp. Med. 175:961-971.
———. 1993. Recognition of a highly conserved region of human immunodeficiency virus type 1 gp120 by an HLA-Cw4-restricted cytotoxic T-lymphocyte clone. J. Virol. 67:438-445.
JOHNSON, R. D., A. TROCHA, L. YANG, G. P. MAZZARA, D. L. PANICALI, T. M. BUCHANAN, and B. D. WALKER. 1991. HIV-1 gag-specific cytotoxic T lymphocytes recognize multiple highly conserved epitopes: fine specificity of the gag-specific response defined by using unstimulated peripheral blood mononuclear cells and cloned effector cells. J. Immunol. 147:1512-1521.
JUPP, R. A., L. H. PHYLIP, J. S. MILLS, S. F. J. LE GRICE, and J. KAY. 1991. Mutating P2 and P1 residues at cleavage junctions in the HIV-1 pol poly-protein. Effects on hydrolysis by HIV-1 proteinase. FEBS Lett. 283:180-184.