Reconciling the spectrum of Sagittarius A* with a two

letters to nature
Received 7 July; accepted 21 September 1998.
Acknowledgements. We thank SmithKlineBeecham Pharmaceuticals for penem inhibitor; R. M. Sweet
for use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercury
phosphate; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase;
and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Council
of Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia Medical
Research Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship,
N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association.
Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: natalie@byron.biochem.ubc.ca).
errata
Reconciling the spectrum of Sagittarius A* with a two-temperature plasma model
Rohan Mahadevan
Nature 394, 651–653 (1998)
A misleading typographical error was introduced into the second
sentence of the bold introductory paragraph of this Letter: the word
"infrared" should be "inferred".
Deciphering the biology of Mycobacterium tuberculosis from
the complete genome sequence
S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III,
F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles,
N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail,
M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead
& B. G. Barrell
Nature 393, 537–544 (1998)
As a result of an error during film output, Table 1 was published with some symbols missing. The correct version can be found at
http://www.sanger.ac.uk and is reproduced again here (following pages).
Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted:
Rv0129c encodes antigen 85C and not 85C′ as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect.
Immun. 59, 372–382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our
attention.
The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion
instead of 29, as indicated here:
H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
..........::::::::::::::::::::                :::::::::::::::::::::
BCG.......GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST...
Nature © Macmillan Publishers Ltd 1998
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com
article
Deciphering the biology of
Mycobacterium tuberculosis from
the complete genome sequence
S. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*, S. Gas*,
C. E. Barry III†, F. Tekaia‡, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies,
K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh§, J. McLean,
S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter,
K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell
Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
* Unité de Génétique Moléculaire Bactérienne, and ‡ Unité de Génétique Moléculaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
† Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National
Institutes of Health, Hamilton, Montana 59840, USA
§ Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus.
The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been
determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help
the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs,
contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid
content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding
capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of
glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
Despite the availability of effective short-course chemotherapy
(DOTS) and the Bacille Calmette-Guérin (BCG) vaccine, the
tubercle bacillus continues to claim more lives than any other
single infectious agent1. Recent years have seen increased incidence
of tuberculosis in both developing and industrialized countries, the
widespread emergence of drug-resistant strains and a deadly
synergy with the human immunodeficiency virus (HIV). In 1993,
the gravity of the situation led the World Health Organisation (WHO)
to declare tuberculosis a global emergency in an attempt to heighten
public and political awareness. Radical measures are needed now to
prevent the grim predictions of the WHO becoming reality. The
combination of genomics and bioinformatics has the potential to
generate the information and knowledge that will enable the
conception and development of new therapies and interventions
needed to treat this airborne disease and to elucidate the unusual
biology of its aetiological agent, Mycobacterium tuberculosis.
The characteristic features of the tubercle bacillus include its slow
growth, dormancy, complex cell envelope, intracellular pathogenesis and genetic homogeneity2. The generation time of M. tuberculosis, in synthetic medium or infected animals, is typically ~24
hours. This contributes to the chronic nature of the disease, imposes
lengthy treatment regimens and represents a formidable obstacle for
researchers. The state of dormancy in which the bacillus remains
quiescent within infected tissue may reflect metabolic shutdown
resulting from the action of a cell-mediated immune response that
can contain but not eradicate the infection. As immunity wanes,
through ageing or immune suppression, the dormant bacteria
reactivate, causing an outbreak of disease often many decades
after the initial infection3. The molecular basis of dormancy and
reactivation remains obscure but is expected to be genetically
programmed and to involve intracellular signalling pathways.
The cell envelope of M. tuberculosis, a Gram-positive bacterium
with a G + C-rich genome, contains an additional layer beyond the
peptidoglycan that is exceptionally rich in unusual lipids, glycolipids
and polysaccharides4,5. Novel biosynthetic pathways generate
cell-wall components such as mycolic acids, mycocerosic acid,
phenolthiocerol, lipoarabinomannan and arabinogalactan, and
several of these may contribute to mycobacterial longevity, trigger
inflammatory host reactions and act in pathogenesis. Little is
known about the mechanisms involved in life within the macrophage, or the extent and nature of the virulence factors produced by
the bacillus and their contribution to disease.
It is thought that the progenitor of the M. tuberculosis complex,
comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanum
and M. microti, arose from a soil bacterium and that the human
bacillus may have been derived from the bovine form following the
domestication of cattle. The complex lacks interstrain genetic
diversity, and nucleotide changes are very rare6. This is important
in terms of immunity and vaccine development as most of the
proteins will be identical in all strains and therefore antigenic drift
will be restricted. On the basis of the systematic sequence analysis of
26 loci in a large number of independent isolates6, it was concluded
that the genome of M. tuberculosis is either unusually inert or that
the organism is relatively young in evolutionary terms.
Since its isolation in 1905, the H37Rv strain of M. tuberculosis has
found extensive, worldwide application in biomedical research
because it has retained full virulence in animal models of tuberculosis, unlike some clinical isolates; it is also susceptible to drugs and
amenable to genetic manipulation. An integrated map of the 4.4
megabase (Mb) circular chromosome of this slow-growing pathogen had been established previously and ordered libraries of
cosmids and bacterial artificial chromosomes (BACs) were
available7,8.
Organization and sequence of the genome
Sequence analysis. To obtain the contiguous genome sequence, a
combined approach was used that involved the systematic sequence
analysis of selected large-insert clones (cosmids and BACs) as well as
random small-insert clones from a whole-genome shotgun library.
This culminated in a composite sequence of 4,411,529 base pairs
(bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the
second-largest bacterial genome sequence currently available (after
that of Escherichia coli)9. The initiation codon for the dnaA gene, a
hallmark for the origin of replication, oriC, was chosen as the start
point for numbering. The genome is rich in repetitive DNA,
particularly insertion sequences, and in new multigene families
and duplicated housekeeping genes. The G + C content is relatively
constant throughout the genome (Fig. 1) indicating that horizontally transferred pathogenicity islands of atypical base composition
are probably absent. Several regions showing higher than average G
+ C content (Fig. 1) were detected; these correspond to sequences
belonging to a large gene family that includes the polymorphic G +
C-rich sequences (PGRSs).
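
The G + C profile discussed above can be illustrated with a short sliding-window calculation. This is a minimal sketch only: the function names, the window and step sizes and the toy sequence are assumptions made for illustration, and it is not the software used to produce Fig. 1.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start offset, G+C fraction) for every full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical, not taken from the H37Rv genome).
toy = "GGCCGGCCATATGGCCGGCC"
overall = gc_content(toy)
profile = windowed_gc(toy, window=4, step=4)

# Windows whose G+C fraction departs from the overall value stand out,
# which is how regions of atypical base composition would be flagged.
atypical = [start for start, gc in profile if abs(gc - overall) > 0.5]
```

On a real chromosome the window would be far larger (thousands of bases), and the choice of window size determines how sharply local deviations appear.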
Genes for stable RNA. Fifty genes coding for functional RNA
molecules were found. These molecules were the three species
produced by the unique ribosomal RNA operon, the 10Sa RNA
involved in degradation of proteins encoded by abnormal messenger RNA, the RNA component of RNase P, and 45 transfer RNAs.
No 4.5S RNA could be detected. The rrn operon is situated
unusually as it occurs about 1,500 kilobases (kb) from the putative
oriC; most eubacteria have one or more rrn operons near to oriC to
exploit the gene-dosage effect obtained during replication10. This
arrangement may be related to the slow growth of M. tuberculosis.
The genes encoding tRNAs that recognize 43 of the 61 possible sense
codons were distributed throughout the genome and, with one
exception, none of these uses A in the first position of the anticodon,
indicating that extensive wobble occurs during translation. This is
consistent with the high G + C content of the genome and the
consequent bias in codon usage. Three genes encoding tRNAs for
methionine were found; one of these genes (metV) is situated in a
region that may correspond to the terminus of replication (Figs 1,
2). As metV is linked to defective genes for integrase and excisionase,
perhaps it was once part of a phage or similar mobile genetic
element.
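
The link between anticodon usage and wobble can be made concrete with a small sketch. It assumes strict Watson-Crick pairing only, so that a codon ending in U would require an anticodon beginning with A; the helper names are hypothetical. If such anticodons are avoided, as described above, these codons must be read through a non-standard (wobble) pair at the third codon position.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Strict Watson-Crick anticodon (written 5'->3') for an mRNA codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


def needs_a_first(codon: str) -> bool:
    """True if strict pairing would put A in the first anticodon position."""
    return anticodon_for(codon)[0] == "A"


# Enumerate all 64 codons and keep those that would need an A-initial anticodon.
codons = [a + b + c for a in "UCAG" for b in "UCAG" for c in "UCAG"]
a_first_codons = [codon for codon in codons if needs_a_first(codon)]
```

The first anticodon base pairs with the third codon base, so the codons collected in `a_first_codons` are exactly those ending in U.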
Insertion sequences and prophages. Sixteen copies of the promiscuous insertion sequence IS6110 and six copies of the more stable
element IS1081 reside within the genome of H37Rv8. One copy of
IS1081 is truncated. Scrutiny of the genomic sequence led to the
identification of a further 32 different insertion sequence elements,
most of which have not been described previously, and of the 13E12
family of repetitive sequences which exhibit some of the characteristics of mobile genetic elements (Fig. 1). The newly discovered
insertion sequences belong mainly to the IS3 and IS256 families,
although six of them define a new group. There is extensive
similarity between IS1561 and IS1552 with insertion sequence
elements found in Nocardia and Rhodococcus spp., suggesting that
they may be widely disseminated among the actinomycetes.
Most of the insertion sequences in M. tuberculosis H37Rv appear
to have inserted in intergenic or non-coding regions, often near
tRNA genes (Fig. 1). Many are clustered, suggesting the existence of
insertional hot-spots that prevent genes from being inactivated, as
has been described for Rhizobium11. The chromosomal distribution
of the insertion sequences is informative as there appears to have
been a selection against insertions in the quadrant encompassing
oriC and an overrepresentation in the direct repeat region that
contains the prototype IS6110. This bias was also observed experimentally in a transposon mutagenesis study12.
At least two prophages have been detected in the genome
sequence and their presence may explain why M. tuberculosis
shows persistent low-level lysis in culture. Prophages phiRv1 and
phiRv2 are both ~10 kb in length and are similarly organized, and
some of their gene products show marked similarity to those
encoded by certain bacteriophages from Streptomyces and saprophytic mycobacteria. The site of insertion of phiRv1 is intriguing as
it corresponds to part of a repetitive sequence of the 13E12 family
that itself appears to have integrated into the biotin operon. Some
strains of M. tuberculosis have been described as requiring biotin as a
growth supplement, indicating either that phiRv1 has a polar effect
on expression of the distal bio genes or that aberrant excision,
leading to mutation, may occur. During the serial attenuation of M.
bovis that led to the vaccine strain M. bovis BCG, the phiRv1
prophage was lost13. In a systematic study of the genomic diversity
of prophages and insertion sequences (S.V.G. et al., manuscript in
preparation), only IS1532 exhibited significant variability, indicating that most of the prophages and insertion sequences are currently
stable. However, from these combined observations, one can conclude that horizontal transfer of genetic material into the free-living
ancestor of the M. tuberculosis complex probably occurred in nature
before the tubercle bacillus adopted its specialized intracellular
niche.
Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer
circle shows the scale in Mb, with 0 representing the origin of replication. The first
ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue,
others are pink) and the direct repeat region (pink cube); the second ring inwards
shows the coding sequence by strand (clockwise, dark green; anticlockwise, light
green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12
REP family, dark pink; prophage, blue); the fourth ring shows the positions of the
PPE family members (green); the fifth ring shows the PE family members (purple,
excluding PGRS); and the sixth ring shows the positions of the PGRS sequences
(dark red). The histogram (centre) represents G + C content, with <65% G + C in
yellow, and >65% G + C in red. The figure was generated with software from
DNASTAR.

Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the
position and orientation of known genes and coding sequences (CDS). We used
the following functional categories (adapted from ref. 20): lipid metabolism
(black); intermediary metabolism and respiration (yellow); information pathways
(pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange);
proteins of unknown function (light green); insertion sequences and phage-related
functions (blue); stable RNAs (purple); cell wall and cell processes (dark
green); PE and PPE protein families (magenta); virulence, detoxification and
adaptation (white). For additional information about gene functions, refer to
http://www.sanger.ac.uk.
NATURE | VOL 393 | 11 JUNE 1998
Genes encoding proteins. 3,924 open reading frames were identified in the genome (see Methods), accounting for ~91% of the
potential coding capacity (Figs 1, 2). A few of these genes appear to
have in-frame stop codons or frameshift mutations (irrespective of
the source of the DNA sequenced) and may either use frameshifting
during translation or correspond to pseudogenes. Consistent with
the high G + C content of the genome, GTG initiation codons (35%)
are used more frequently than in Bacillus subtilis (9%) and E. coli
(14%), although ATG (61%) is the most common translational
start. There are a few examples of atypical initiation codons, the
most notable being the ATC used by infC, which begins with ATT in
both B. subtilis and E. coli9,14. There is a slight bias in the orientation
of the genes (Fig. 1) with respect to the direction of replication as
~59% are transcribed with the same polarity as replication,
compared with 75% in B. subtilis. In other bacteria, genes transcribed in the same direction as the replication forks are believed to
be expressed more efficiently9,14. Again, the more even distribution
in gene polarity seen in M. tuberculosis may reflect the slow growth
and infrequent replication cycles. Three genes (dnaB, recA and
Rv1461) have been invaded by sequences encoding inteins (protein
introns) and in all three cases their counterparts in M. leprae also
contain inteins, but at different sites15 (S.T.C. et al., unpublished
observations).
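
The figures quoted in this paragraph allow a rough estimate of gene density. The sketch below is back-of-envelope arithmetic only: it assumes that the stated share of coding capacity (about 91%) can be read as a fraction of base pairs, which the text does not say explicitly, and the variable names are illustrative.

```python
GENOME_BP = 4_411_529    # genome length stated in the text
ORF_COUNT = 3_924        # open reading frames stated in the text
CODING_FRACTION = 0.91   # approximate share of coding capacity (assumed per base pair)

# Average spacing between gene starts if genes were evenly distributed.
bp_per_gene = GENOME_BP / ORF_COUNT

# Average open reading frame length implied by the coding fraction.
mean_orf_bp = GENOME_BP * CODING_FRACTION / ORF_COUNT

# Gene density expressed per kilobase.
genes_per_kb = ORF_COUNT / (GENOME_BP / 1000)

print(f"{bp_per_gene:.0f} bp per gene, mean ORF about {mean_orf_bp:.0f} bp, "
      f"{genes_per_kb:.2f} genes per kb")
```

Under these assumptions the genome carries roughly one gene per 1.1 kb, with a mean open reading frame of about 1 kb.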
Protein function, composition and duplication. By using various
database comparisons, we attributed precise functions to ~40% of
the predicted proteins and found some information or similarity for
another 44%. The remaining 16% resembled no known proteins
and may account for specific mycobacterial functions. Examination
of the amino-acid composition of the M. tuberculosis proteome by
correspondence analysis16, and comparison with that of other
microorganisms whose genome sequences are available, revealed a
statistically significant preference for the amino acids Ala, Gly, Pro,
Arg and Trp, which are all encoded by G + C-rich codons, and a
comparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach
also identified two groups of proteins rich in Asn or Gly that belong
to new families, PE and PPE (see below). The fraction of the
proteome that has arisen through gene duplication is similar to
that seen in E. coli or B. subtilis (~51%; refs 9, 14), except that the
level of sequence conservation is considerably higher, indicating
that there may be extensive redundancy or differential production
of the corresponding polypeptides. The apparent lack of divergence
following gene duplication is consistent with the hypothesis that
M. tuberculosis is of recent descent6.
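
The amino-acid bias described above can be illustrated by tallying residues in the two codon classes named in the text. This is a minimal sketch, not the correspondence analysis of ref. 16: the function names are hypothetical, and the example input is the short H37Rv fragment of Rv0746 quoted in the erratum alignment, chosen only because it is available in this text.

```python
from collections import Counter

# One-letter codes for the two classes named in the text.
GC_RICH = set("AGPRW")   # Ala, Gly, Pro, Arg, Trp (G + C-rich codons)
AT_RICH = set("NIKFY")   # Asn, Ile, Lys, Phe, Tyr (A + T-rich codons)


def composition(protein: str) -> dict[str, float]:
    """Return the fraction of each residue in a one-letter protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def class_fractions(protein: str) -> tuple[float, float]:
    """Return (G+C-rich class fraction, A+T-rich class fraction)."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    gc = sum(counts[residue] for residue in GC_RICH) / total
    at = sum(counts[residue] for residue in AT_RICH) / total
    return gc, at


# H37Rv fragment of Rv0746 as printed in the erratum alignment.
fragment = "GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST"
gc_fraction, at_fraction = class_fractions(fragment)
glycine_fraction = composition(fragment)["G"]
```

A single glycine-rich fragment is an extreme case; a proteome-wide figure would require summing the counts over all predicted proteins.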
General metabolism, regulation and drug resistance
Metabolic pathways. From the genome sequence, it is clear that the
tubercle bacillus has the potential to synthesize all the essential
amino acids, vitamins and enzyme co-factors, although some of the
pathways involved may differ from those found in other bacteria. M.
tuberculosis can metabolize a variety of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids2,17. It is apparent from
genome inspection that, in addition to many functions involved in
lipid metabolism, the enzymes necessary for glycolysis, the pentose
phosphate pathway, and the tricarboxylic acid and glyoxylate cycles
are all present. A large number (~200) of oxidoreductases, oxygenases and dehydrogenases is predicted, as well as many oxygenases
containing cytochrome P450, that are similar to fungal proteins
involved in sterol degradation. Under aerobic growth conditions,
ATP will be generated by oxidative phosphorylation from electron
transport chains involving a ubiquinone cytochrome b reductase
complex and cytochrome c oxidase. Components of several anaerobic phosphorylative electron transport chains are also present,
including genes for nitrate reductase (narGHJI), fumarate reductase
(frdABCD) and possibly nitrite reductase (nirBD), as well as a new
reductase (narX) that results from a rearrangement of a homologue
of the narGHJI operon. Two genes encoding haemoglobin-like
proteins, which may protect against oxidative stress or be involved
in oxygen capture, were found. The ability of the bacillus to adapt its
metabolism to environmental change is significant as it not only has
to compete with the lung for oxygen but must also adapt to the
microaerophilic/anaerobic environment at the heart of the
burgeoning granuloma.
Regulation and signal transduction. Given the complexity of the
environmental and metabolic choices facing M. tuberculosis, an
extensive regulatory repertoire was expected. Thirteen putative
sigma factors govern gene expression at the level of transcription
initiation, and more than 100 regulatory proteins are predicted
(Table 1). Unlike B. subtilis and E. coli, in which there are >30 copies
of different two-component regulatory systems14, M. tuberculosis
has only 11 complete pairs of sensor histidine kinases and response
regulators, and a few isolated kinase and regulatory genes. This
relative paucity in environmental signal transduction pathways is
probably offset by the presence of a family of eukaryotic-like serine/
threonine protein kinases (STPKs), which function as part of a
phosphorelay system18. The STPKs probably have two domains: the
well-conserved kinase domain at the amino terminus is predicted to
be connected by a transmembrane segment to the carboxy-terminal
region that may respond to specific stimuli. Several of the predicted
envelope lipoproteins, such as that encoded by lppR (Rv2403), show
extensive similarity to this putative receptor domain of STPKs,
suggesting possible interplay. The STPKs probably function in
signal transduction pathways and may govern important cellular
decisions such as dormancy and cell division, and although their
partners are unknown, candidate genes for phosphoprotein phosphatases have been identified.
Drug resistance. M. tuberculosis is naturally resistant to many
antibiotics, making treatment difficult19. This resistance is due
mainly to the highly hydrophobic cell envelope acting as a permeability barrier4, but many potential resistance determinants are also
encoded in the genome. These include hydrolytic or drug-modifying enzymes such as β-lactamases and aminoglycoside acetyl
transferases, and many potential drug–efflux systems, such as 14
members of the major facilitator family and numerous ABC
transporters. Knowledge of these putative resistance mechanisms
will promote better use of existing drugs and facilitate the conception of new therapies.
[Figure 3 plot. Axes: F1 (55.2%), from -0.30 to 0.30; F2 (22.6%), from -0.3 to 0.1. Points are labelled with organism abbreviations and three-letter amino-acid codes.]
Figure 3 Correspondence analysis of the proteomes from extensively
sequenced organisms as a function of amino-acid composition. Note the
extreme position of M. tuberculosis and the shift in amino-acid preference
reflecting increasing G + C content from left to right. Abbreviations used: Ae,
Aquifex aeolicus; Af, Archaeoglobus fulgidus; Bb, Borrelia burgdorferi; Bs, B.
subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp,
Helicobacter pylori; Mg, Mycoplasma genitalium; Mj, Methanococcus jannaschii;
Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium
thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp.
strain PCC6803. F1 and F2, first and second factorial axes16.
Figure 4 Lipid metabolism. a, Degradation of host-cell lipids is vital in the
intracellular life of M. tuberculosis. Host-cell membranes provide precursors for
many metabolic processes, as well as potential precursors of mycobacterial
cell-wall constituents, through the actions of a broad family of β-oxidative
enzymes encoded by multiple copies in the genome. These enzymes produce acetyl
CoA, which can be converted into many different metabolites and fuel for the
bacteria through the actions of the enzymes of the citric acid cycle and the
glyoxylate shunt of this cycle. b, The genes that synthesize mycolic acids, the
dominant lipid component of the mycobacterial cell wall, include the type I fatty
acid synthase (fas) and a unique type II system which relies on extension of a
precursor bound to an acyl carrier protein to form full-length (~80-carbon)
mycolic acids. The cma genes are responsible for cyclopropanation. c, The genes
that produce phthiocerol dimycocerosate form a large operon and represent type I
(mas) and type II (the pps operon) polyketide synthase systems. Functions are
colour coordinated.

Panel a labels: host membranes; fadD1-36; fadE1-36; echA1-21; fadB2-5; fadA2-6;
β-oxidation (fadA/fadB); accA1/accD1; accA2/accD2; NADH, FADH2; ATP; CO2; citric
acid cycle; glyoxylate shunt; glucose; amino acids; purines, pyrimidines;
porphyrins; cell wall and mycobacterial lipids.

Panel b gene functions (gene maps drawn on a scale of 0 to 8 kb):
Gene     Function
fas      Fatty acid synthase, produces C16-C26 acyl-CoA esters
fabD     Malonyl-CoA:AcpM acyltransferase, AcpM loading
acpM     Acyl carrier protein, meromycolate precursor transport
kasA     Ketoacyl acyl carrier protein synthase, chain elongation
kasB     Ketoacyl acyl carrier protein synthase, chain elongation
accD     Acetyl-CoA carboxylase, malonyl-CoA synthesis
mabA     3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B product
inhA     Enoyl-acyl carrier protein reductase
cmaA1    Cyclopropane mycolic acid synthase 1, distal-position specific
cmaA2    Cyclopropane mycolic acid synthase 2, proximal-position specific

Panel c gene functions (gene map drawn on a scale of 0 to 40 kb):
Gene     Function
mas      Four rounds extension of C16,18 using Me-malonyl CoA
ppsA     Extension of C18 with malonyl CoA, partial reduction
ppsB     Extension with malonyl CoA, partial reduction
ppsC     Extension with malonyl CoA, complete reduction
ppsD     Extension with Me-malonyl CoA, partial reduction
ppsE     Extension with malonyl CoA, partial reduction, decarboxylation
NATURE | VOL 393 | 11 JUNE 1998
Lipid metabolism
Very few organisms produce such a diverse array of lipophilic
molecules as M. tuberculosis. These molecules range from simple
fatty acids such as palmitate and tuberculostearate, through isoprenoids, to very-long-chain, highly complex molecules such as
mycolic acids and the phenolphthiocerol alcohols that esterify with
mycocerosic acid to form the scaffold for attachment of the mycosides. Mycobacteria contain examples of every known lipid and
polyketide biosynthetic system, including enzymes usually found in
mammals and plants as well as the common bacterial systems. The
biosynthetic capacity is overshadowed by the even more remarkable
radiation of degradative, fatty acid oxidation systems and, in total,
there are ~250 distinct enzymes involved in fatty acid metabolism
in M. tuberculosis compared with only 50 in E. coli (ref. 20).
Fatty acid degradation. In vivo-grown mycobacteria have been
suggested to be largely lipolytic, rather than lipogenic, because of
the variety and quantity of lipids available within mammalian cells
and the tubercle (ref. 2; Fig. 4a). The abundance of genes encoding
components of fatty acid oxidation systems found by our genomic
approach supports this proposition, as there are 36 acyl-CoA
synthases and a family of 36 related enzymes that could catalyse
the first step in fatty acid degradation. There are 21 homologous
enzymes belonging to the enoyl-CoA hydratase/isomerase superfamily of enzymes, which rehydrate the nascent product of the acyl-CoA dehydrogenase. The four enzymes that convert the 3-hydroxy
fatty acid into a 3-keto fatty acid appear less numerous, mainly
because they are difficult to distinguish from other members of the
short-chain alcohol dehydrogenase family on the basis of primary
sequence. The five enzymes that complete the cycle by thiolysis of
the β-ketoester, the acetyl-CoA C-acetyltransferases, do indeed
appear to be a more limited family. In addition to this extensive
set of dissociated degradative enzymes, the genome also encodes the
canonical FadA/FadB β-oxidation complex (Rv0859 and Rv0860).
Accessory activities are present for the metabolism of odd-chain and
multiply unsaturated fatty acids.

Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE
protein families. b, Sequence variation between M. tuberculosis H37Rv and M.
bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF)
Rv0746.

[Panel a. Representatives of the PE family: PE domain (~110 aa) followed by a
PGRS region, (GGAGGA)n, 0 to >1,400 aa; PE domain (~110 aa) followed by unique
sequence, 0 to ~500 aa. Representatives of the PPE family: PPE domain (~180 aa)
followed by an MPTR region, (NxGxGNxG)n, ~200 to >3,500 aa; PPE domain (~180 aa)
followed by a region containing the GxxSVPxxW motif, ~200 to ~400 aa; PPE domain
(~180 aa) followed by unique sequence, 0 to ~400 aa.]

[Panel b. PE-PGRS sequence variation in ORF Rv0746. H37Rv contains a 29-aa
segment absent from BCG, and BCG contains a 46-aa segment absent from H37Rv.
Alignment rows are rebuilt from the extracted figure text; gaps are shown as
dashes.]

H37Rv  ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST
BCG    ...GSGAPGGAGGAAGLWGT-----------------------------GGAGGIGGAST

H37Rv  VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
BCG    VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG

H37Rv  LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
BCG    LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG

H37Rv  GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF------------
BCG    GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN

H37Rv  ----------------------------------GGAGGVGGSAGLIGTGGNGGNGGTGA
BCG    ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA

H37Rv  NAGSPGTGGAGGLLLGQNGLNGLP*
BCG    NAGSPGTGGAGGLLLGQNGLNGLP*
Fatty acid biosynthesis. At least two discrete types of enzyme
system, fatty acid synthase (FAS) I and FAS II, are involved in
fatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas)
is a single polypeptide with multiple catalytic activities that generates several shorter CoA esters from acetyl-CoA primers (ref. 5) and
probably creates precursors for elongation by all of the other fatty
acid and polyketide systems. FAS II consists of dissociable enzyme
components which act on a substrate bound to an acyl-carrier
protein (ACP). FAS II is incapable of de novo fatty acid synthesis but
instead elongates palmitoyl-ACP to fatty acids ranging from 24 to
56 carbons in length (refs 17, 21). Several different components of FAS II may
be targets for the important tuberculosis drug isoniazid, including
the enoyl-ACP reductase InhA (ref. 22), the ketoacyl-ACP synthase KasA
and the ACP AcpM (ref. 21). Analysis of the genome shows that there are
only three potential ketoacyl synthases: KasA and KasB are highly
related, and their genes cluster with acpM, whereas KasC is a more
distant homologue of a ketoacyl synthase III system. The number of
ketoacyl synthase and ACP genes indicates that there is a single FAS
II system. Its genetic organization, with two clustered ketoacyl
synthases, resembles that of type II aromatic polyketide biosynthetic
gene clusters, such as those for actinorhodin, tetracycline and
tetracenomycin in Streptomyces species (ref. 23). InhA seems to be the sole
enoyl-ACP reductase and its gene is co-transcribed with a fabG
homologue, which encodes 3-oxoacyl-ACP reductase. Both of these
proteins are probably important in the biosynthesis of mycolic acids.
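As a rough illustration of the chain lengths quoted for FAS II, the number of elongation rounds needed to reach a given product can be estimated. This is a minimal sketch that assumes each round adds two carbons (the usual malonyl-derived unit in fatty acid synthesis, which the text does not state explicitly) and that the primer is the C16 palmitoyl-ACP named above. The function name is hypothetical.

```python
def elongation_rounds(target_carbons: int, primer_carbons: int = 16) -> int:
    """Rounds of two-carbon extension needed to go from the primer to the target.

    ASSUMPTION: every round adds exactly two carbons; this is not stated in the
    text and is used here only for illustration.
    """
    extra = target_carbons - primer_carbons
    if extra < 0 or extra % 2 != 0:
        raise ValueError("target must exceed the primer by an even number of carbons")
    return extra // 2


# Range quoted in the text: products of 24 to 56 carbons from palmitoyl-ACP (C16).
shortest = elongation_rounds(24)
longest = elongation_rounds(56)
print(shortest, longest)
```

Under these assumptions the quoted range corresponds to between 4 and 20 rounds of extension by the same dissociable enzyme set.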
Fatty acids are synthesized from malonyl-CoA and precursors are
generated by the enzymatic carboxylation of acetyl (or propionyl)-CoA by a biotin-dependent carboxylase (Fig. 4b). From study of the
genome we predict that there are three complete carboxylase
systems, each consisting of an α- and a β-subunit, as well as three
β-subunits without an α-counterpart. As a group, all of the
carboxylases seem to be more related to the mammalian homologues than to the corresponding bacterial enzymes. Two of these
carboxylase systems (accA1, accD1 and accA2, accD2) are probably
involved in degradation of odd-numbered fatty acids, as they are
adjacent to genes for other known degradative enzymes. They may
convert propionyl-CoA to succinyl-CoA, which can then be incorporated into the tricarboxylic acid cycle. The synthetic carboxylases
(accA3, accD3, accD4, accD5 and accD6) are more difficult to
understand. The three extra β-subunits might direct carboxylation
to the appropriate precursor or may simply increase the total
amount of carboxylated precursor available if this step were rate-limiting.
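The subunit bookkeeping in this paragraph (three complete systems plus three β-subunits without an α-counterpart) can be checked with a short sketch. The accA1/accD1 and accA2/accD2 pairings are given in the text; pairing accA3 with accD3 by matching numeric suffix is purely an illustrative assumption, since the text does not say which β-subunit partners the third α-subunit.

```python
import re

# Subunit genes named in the text: accA* encode alpha-subunits, accD* beta-subunits.
alpha_genes = ["accA1", "accA2", "accA3"]
beta_genes = ["accD1", "accD2", "accD3", "accD4", "accD5", "accD6"]


def suffix(gene: str) -> str:
    """Return the trailing number of a gene name, e.g. 'accD4' -> '4'."""
    return re.search(r"(\d+)$", gene).group(1)


# ASSUMPTION: an alpha-subunit pairs with the beta-subunit sharing its suffix.
beta_by_suffix = {suffix(gene): gene for gene in beta_genes}
complete_systems = [
    (alpha, beta_by_suffix[suffix(alpha)])
    for alpha in alpha_genes
    if suffix(alpha) in beta_by_suffix
]

# Beta-subunits left without an alpha partner under that pairing rule.
paired_betas = {beta for _, beta in complete_systems}
unpaired_betas = [beta for beta in beta_genes if beta not in paired_betas]

print(complete_systems)
print(unpaired_betas)
```

Whatever the true partner of the third α-subunit, the counts are fixed by the gene list: three α-subunits and six β-subunits leave three β-subunits unpaired.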
Synthesis of the paraffinic backbone of fatty and mycolic acids in
the cell is followed by extensive postsynthetic modifications and
unsaturations, particularly in the case of the mycolic acids (refs 24, 25).
Unsaturation is catalysed either by a FabA-like β-hydroxyacyl-ACP dehydrase, acting with a specific ketoacyl synthase, or by an
aerobic terminal mixed function desaturase that uses both molecular oxygen and NADPH. Inspection of the genome revealed no
obvious candidates for the FabA-like activity. However, three
potential aerobic desaturases (encoded by desA1, desA2 and
desA3) were evident that show little similarity to related vertebrate
or yeast enzymes (which act on CoA esters) but instead resemble
plant desaturases (which use ACP esters). Consequently, the genomic data indicate that unsaturation of the meromycolate chain may
occur while the acyl group is bound to AcpM.
Much of the subsequent structural diversity in mycolic acids is
generated by a family of S-adenosyl-L-methionine-dependent
enzymes, which use the unsaturated meromycolic acid as a substrate
to generate cis and trans cyclopropanes and other mycolates. Six
members of this family have been identified and characterized (ref. 25) and
two clustered, convergently transcribed new genes are evident in the
genome (umaA1 and umaA2). From the functions of the known
family members and the structures of mycolic acids in M. tuberculosis, it is tempting to speculate that these new enzymes may
introduce the trans cyclopropanes into the meromycolate precursor.
In addition to these two methyltransferases, there are two other
unrelated lipid methyltransferases (Ufa1 and Ufa2) that share
homology with cyclopropane fatty acid synthase of E. coli (ref. 25).
Although cyclopropanation seems to be a relatively common
modification of mycolic acids, cyclopropanation of plasma-membrane constituents has not been described in mycobacteria. Tuberculostearic acid is produced by methylation of oleic acid, and may
be synthesized by one of these two enzymes.
Condensation of the fully functionalized and preformed meromycolate chain with a 26-carbon α-branch generates full-length
mycolic acids that must be transported to their final location for
attachment to the cell-wall arabinogalactan. The transfer and
subsequent transesterification are mediated by three well-known
immunogenic proteins of the antigen 85 complex (ref. 26). The genome
encodes a fourth member of this complex, antigen 85C′ (fbpC2,
Rv0129), which is highly related to antigen 85C. Further studies are
needed to show whether the protein possesses mycolyltransferase
activity and to clarify the reason behind the apparent redundancy.
Polyketide synthesis. Mycobacteria synthesize polyketides by several different mechanisms. A modular type I system, similar to that
involved in erythromycin biosynthesis (ref. 23), is encoded by a very large
operon, ppsABCDE, and functions in the production of
phenolphthiocerol (ref. 5). The absence of a second type I polyketide
synthase suggests that the related lipids phthiocerol A and B,
phthiodiolone A and phthiotriol may all be synthesized by the
same system, either from alternative primers or by differential
postsynthetic modification. It is physiologically significant that
the pps gene cluster occurs immediately upstream of mas, which
encodes the multifunctional enzyme mycocerosic acid synthase
(MAS), as their products phthiocerol and mycocerosic acid esterify
to form the very abundant cell-wall-associated molecule phthiocerol dimycocerosate (Fig. 4c).
Members of another large group of polyketide synthase enzymes
are similar to MAS, which also generates the multiply methyl-branched fatty acid components of mycosides and phthiocerol
dimycocerosate, abundant cell-wall-associated molecules (ref. 5). Although
some of these polyketide synthases may extend type I FAS CoA
primers to produce other long-chain methyl-branched fatty acids
such as mycolipenic, mycolipodienic and mycolipanolic acids or the
phthioceranic and hydroxyphthioceranic acids, or may even show
functional overlap (ref. 5), there are many more of these enzymes than
there are known metabolites. Thus there may be new lipid and
polyketide metabolites that are expressed only under certain conditions, such as during infection and disease.
A fourth class of polyketide synthases is related to the plant
enzyme superfamily that includes chalcone and stilbene synthase (ref. 23).
These polyketide synthases are phylogenetically divergent from all
other polyketide and fatty acid synthases and generate unreduced
polyketides that are typically associated with anthocyanin pigments
and flavonoids. The function of these systems, which are often
linked to apparent type I modules, is unknown. An example is the
gene cluster spanning pks10, pks7, pks8 and pks9, which includes two
of the chalcone-synthase-like enzymes and two modules of an
apparent type I system. The unknown metabolites produced by
these enzymes are interesting because of the potent biological
activities of some polyketides such as the immunosuppressor
rapamycin.
Siderophores. Peptides that are not ribosomally synthesized are
made by a process that is mechanistically analogous to polyketide
synthesis (refs 23, 27). These peptides include the structurally related iron-scavenging siderophores, the mycobactins and the exochelins (refs 2, 28),
which are derived from salicylate by the addition of serine (or
threonine), two lysines and various fatty acids and possible polyketide segments. The mbt operon, encoding one apparent salicylate-activating protein, three amino-acid ligases, and a single module of
a type I polyketide synthase, may be responsible for the biosynthesis
of the mycobacterial siderophores. The presence of only one nonribosomal peptide-synthesis system indicates that this pathway may
generate both siderophores and that subsequent modification of a
single ε-amino group of one lysine residue may account for the
different physical properties and function of the siderophores (ref. 28).
Immunological aspects and pathogenicity
Given the scale of the global tuberculosis burden, vaccination is not
only a priority but remains the only realistic public health intervention that is likely to affect both the incidence and the prevalence
of the disease (ref. 29). Several areas of vaccine development are promising,
including DNA vaccination, use of secreted or surface-exposed
proteins as immunogens, recombinant forms of BCG and rational
attenuation of M. tuberculosis (ref. 29). All of these avenues of research will
benefit from the genome sequence as its availability will stimulate
more focused approaches. Genes encoding ~90 lipoproteins were
identified, some of which are enzymes or components of transport
systems, and a similar number of genes encoding preproteins (with
type I signal peptides) that are probably exported by the Sec-dependent pathway. M. tuberculosis seems to have two copies of
secA. The potent T-cell antigen Esat-6 (ref. 30), which is probably
secreted in a Sec-independent manner, is encoded by a member of a
multigene family. Examination of the genetic context reveals several
similarly organized operons that include genes encoding large ATP-hydrolysing membrane proteins that might act as transporters. One
of the surprises of the genome project was the discovery of two
extensive families of novel glycine-rich proteins, which may be of
immunological significance as they are predicted to be abundant
and potentially polymorphic antigens.
The PE and PPE multigene families. About 10% of the coding
capacity of the genome is devoted to two large unrelated families of
acidic, glycine-rich proteins, the PE and PPE families, whose genes
are clustered (Figs 1, 2) and are often based on multiple copies of the
polymorphic repetitive sequences referred to as PGRSs, and major
polymorphic tandem repeats (MPTRs), respectively (refs 31, 32). The names
PE and PPE derive from the motifs Pro–Glu (PE) and Pro–Pro–Glu (PPE)
found near the N terminus in most cases (ref. 33). The 99
members of the PE protein family all have a highly conserved N-terminal domain of ~110 amino-acid residues that is predicted to
have a globular structure, followed by a C-terminal segment that
varies in size, sequence and repeat copy number (Fig. 5). Phylogenetic analysis separated the PE family into several subfamilies. The
largest of these is the highly repetitive PGRS class, which contains 61
members; members of the other subfamilies share very limited
sequence similarity in their C-terminal domains (Fig. 5). The
predicted molecular weights of the PE proteins vary considerably
as a few members contain only the N-terminal domain, whereas
most have C-terminal extensions ranging in size from 100 to 1,400
residues. The PGRS proteins have a high glycine content (up to
50%), which is the result of multiple tandem repetitions of Gly–
Gly–Ala or Gly–Gly–Asn motifs, or variations thereof.
The 68 members of the PPE protein family (Fig. 5) also have a
conserved N-terminal domain that comprises ~180 amino-acid
residues, followed by C-terminal segments that vary markedly in
sequence and length. These proteins fall into at least three groups,
one of which constitutes the MPTR class characterized by the
presence of multiple, tandem copies of the motif Asn–X–Gly–X–
Gly–Asn–X–Gly. The second subgroup contains a characteristic,
well-conserved motif around position 350, whereas the third contains
proteins that are unrelated except for the presence of the common
180-residue PPE domain.
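The repeat motifs described for the PGRS and MPTR classes translate directly into one-letter patterns, which makes it easy to screen a protein sequence for them. The following is a minimal sketch, not the classification procedure used for the genome annotation. The function and constant names are hypothetical, and the test sequence is the start of the H37Rv row printed in Fig. 5b.

```python
import re

# One-letter patterns for the repeat motifs described in the text.
PGRS_TRIPLET = re.compile(r"GG[AN]")   # Gly-Gly-Ala or Gly-Gly-Asn
MPTR_MOTIF = re.compile(r"N.G.GN.G")   # Asn-X-Gly-X-Gly-Asn-X-Gly


def glycine_fraction(protein: str) -> float:
    """Fraction of residues that are glycine."""
    if not protein:
        raise ValueError("empty sequence")
    return protein.count("G") / len(protein)


def count_motif(pattern: re.Pattern, protein: str) -> int:
    """Number of non-overlapping occurrences of a motif pattern."""
    return len(pattern.findall(protein))


# First row of the Rv0746 (H37Rv) sequence as printed in Fig. 5b.
pgrs_excerpt = "GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST"
print(round(glycine_fraction(pgrs_excerpt), 2))
print(count_motif(PGRS_TRIPLET, pgrs_excerpt))
print(count_motif(MPTR_MOTIF, pgrs_excerpt))
```

Counting non-overlapping matches is a simplification; tandem repeats with variant spacers ("or variations thereof" in the text) would need looser patterns.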
The subcellular location of the PE and PPE proteins is unknown
and in only one case, that of a lipase (Rv3097), has a function been
demonstrated. On examination of the protein database from the
extensively sequenced M. leprae (ref. 15), no PGRS- or MPTR-related
polypeptides were detected but a few proteins belonging to the
non-MPTR subgroup of the PPE family were found. These proteins
include one of the major antigens recognized by leprosy patients,
the serine-rich antigen (ref. 34). Although it is too early to attribute
biological functions to the PE and PPE families, it is tempting to
speculate that they could be of immunological importance. Two
interesting possibilities spring to mind. First, they could represent
the principal source of antigenic variation in what is otherwise a
genetically and antigenically homogeneous bacterium. Second,
these glycine-rich proteins might interfere with immune responses
by inhibiting antigen processing.
Several observations and results support the possibility of antigenic variation associated with both the PE and the PPE family
proteins. The PGRS member Rv1759 is a fibronectin-binding
protein of relative molecular mass 55,000 (ref. 35) that elicits a
variable antibody response, indicating either that individuals
mount different immune responses or that this PGRS protein
may vary between strains of M. tuberculosis. The latter possibility
is supported by restriction fragment length polymorphisms for
various PGRS and MPTR sequences in clinical isolates (ref. 33). Direct
support for genetic variation within both the PE and the PPE
families was obtained by comparative DNA sequence analysis (Fig.
5). The gene for the PE–PGRS protein Rv0746 of BCG differs from
that in H37Rv by the deletion of 29 codons and the insertion of 46
codons. Similar variation was seen in the gene for the PPE protein
Rv0442 (data not shown). As these differences were all associated
with repetitive sequences they could have resulted from intergenic
or intragenic recombinational events or, more probably, from
strand slippage during replication (ref. 32). These mechanisms are
known to generate antigenic variability in other bacterial
pathogens (ref. 36).
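The 29-codon deletion and 46-codon insertion reported for Rv0746 can be read off a pairwise alignment by measuring runs of gap characters in each row. The sketch below shows one way to do this. The rows are abbreviated from Fig. 5b (identical stretches between the two variable segments are omitted), the gaps are generated from the lengths of the strain-specific segments, and all names are hypothetical.

```python
from itertools import groupby


def gap_runs(aligned_row: str) -> list:
    """Lengths of consecutive '-' runs in one row of a pairwise alignment."""
    return [len(list(run)) for char, run in groupby(aligned_row) if char == "-"]


# Strain-specific segments as printed in Fig. 5b.
H37RV_ONLY = "GGAGGAGGSSAGGGGAGGAGGAGGWLLGD"                        # gapped in BCG
BCG_ONLY = "GGAGGIGGIGGN" + "ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDT"    # gapped in H37Rv

# Abbreviated alignment rows: shared flanks kept, identical middle blocks omitted.
h37rv_row = ("GSGAPGGAGGAAGLWGT" + H37RV_ONLY + "GGAGGIGGAST"
             + "GNAGWLGF" + "-" * len(BCG_ONLY) + "GGAGGVGG")
bcg_row = ("GSGAPGGAGGAAGLWGT" + "-" * len(H37RV_ONLY) + "GGAGGIGGAST"
           + "GNAGWLGF" + BCG_ONLY + "GGAGGVGG")

print(gap_runs(bcg_row))    # residues present in H37Rv but missing from BCG
print(gap_runs(h37rv_row))  # residues present in BCG but missing from H37Rv
```

Because each alignment column corresponds to one amino acid, a gap run of n columns corresponds to n codons in the gene.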
There are several parallels between the PGRS proteins and the
Epstein–Barr virus nuclear antigens (EBNAs). Members of both
polypeptide families are glycine-rich, contain extensive Gly–Ala
repeats, and exhibit variation in the length of the repeat region
between different isolates. The Gly–Ala repeat region of EBNA1
functions as a cis-acting inhibitor of the ubiquitin/proteasome
antigen-processing pathway that generates peptides presented in
the context of major histocompatibility complex (MHC) class I
molecules (refs 37, 38). MHC class I knockout mice are very susceptible to M.
tuberculosis, underlining the importance of a cytotoxic T-cell
response in protection against disease (refs 3, 39). Given the many potential
effects of the PPE and PE proteins, it is important that further
studies are performed to understand their activity. If extensive
antigenic variability or reduced antigen presentation were indeed
found, this would be significant for vaccine design and for understanding protective immunity in tuberculosis, and might even
explain the varied responses seen in different BCG vaccination
programmes (ref. 40).
Pathogenicity. Despite intensive research efforts, there is little
information about the molecular basis of mycobacterial virulence (ref. 41).
However, this situation should now change as the genome sequence
will accelerate the study of pathogenesis as never before, because
other bacterial factors that may contribute to virulence are becoming apparent. Before the completion of the genome sequence, only
three virulence factors had been described (ref. 41): catalase-peroxidase,
which protects against reactive oxygen species produced by the
phagocyte; mce, which encodes macrophage-colonizing factor (ref. 42);
and a sigma factor gene, sigA (aka rpoV), mutations in which can
lead to attenuation (ref. 41). In addition to these single-gene virulence
factors, the mycobacterial cell wall (ref. 4) is also important in pathology,
but the complex nature of its biosynthesis makes it difficult to
identify critical genes whose inactivation would lead to attenuation.
On inspection of the genome sequence, it was apparent that four
copies of mce were present and that these were all situated in
operons, comprising eight genes, organized in exactly the same
manner. In each case, the genes preceding mce code for integral
membrane proteins, whereas mce and the following five genes are all
predicted to encode proteins with signal sequences or hydrophobic
stretches at the N terminus. These sets of proteins, about which little
is known, may well be secreted or surface-exposed; this is consistent
with the proposed role of Mce in invasion of host cells (ref. 42). Furthermore, a homologue of smpB, which has been implicated in intracellular survival of Salmonella typhimurium, has also been
identified (ref. 43). Among the other secreted proteins identified from
the genome sequence that could act as virulence factors are a
series of phospholipases C, lipases and esterases, which might
attack cellular or vacuolar membranes, as well as several proteases.
One of these phospholipases acts as a contact-dependent haemolysin (N. Stoker, personal communication). The presence of storage
proteins in the bacillus, such as the haemoglobin-like oxygen
captors described above, points to its ability to stockpile essential
growth factors, allowing it to persist in the nutrient-limited environment of the phagosome. In this regard, the ferritin-like proteins,
encoded by bfrA and bfrB, may be important in intracellular survival
as the capacity to acquire enough iron in the vacuole is very
limited.
Methods
Sequence analysis. Initially, ~3.2 Mb of sequence was generated from
cosmids (ref. 8) and the remainder was obtained from selected BAC clones (ref. 7) and
45,000 whole-genome shotgun clones. Sheared fragments (1.4–2.0 kb) from
cosmids and BACs were cloned into M13 vectors, whereas genomic DNA was
cloned in pUC18 to obtain both forward and reverse reads. The PGRS genes
were grossly underrepresented in pUC18 but better covered in the BAC and
cosmid M13 libraries. We used small-insert libraries (ref. 44) to sequence regions
prone to compression or deletion and, in some cases, obtained sequences from
products of the polymerase chain reaction or directly from BACs (ref. 7). All shotgun
sequencing was performed with standard dye terminators to minimize compression problems, whereas finishing reactions used dRhodamine or BigDye
terminators (http://www.sanger.ac.uk). Problem areas were verified by using
dye primers. Thirty differences were found between the genomic shotgun
sequences and the cosmids, twenty of which were due to sequencing errors and
ten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of the
sequence was from areas of single-clone coverage, and ~0.2% was from one
strand with only one sequencing chemistry.
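The quoted rate of 1 error per 320 kb is consistent with dividing the ~3.2 Mb of cosmid-derived sequence by the ten cosmid mutations. The text does not spell out the denominator, so this reading is an assumption, and the short calculation below is only a plausibility check under it. Variable and function names are hypothetical.

```python
# Figures taken from the Methods text.
COSMID_SEQUENCE_BP = 3_200_000   # "~3.2 Mb of sequence was generated from cosmids"
SEQUENCING_ERRORS = 20           # differences attributed to sequencing errors
COSMID_MUTATIONS = 10            # differences attributed to mutations in cosmids


def kb_per_event(total_bp: int, events: int) -> float:
    """Average spacing between events, in kilobases."""
    if events <= 0:
        raise ValueError("events must be positive")
    return total_bp / events / 1000


# ASSUMPTION: both rates are computed over the cosmid-derived sequence only.
print(kb_per_event(COSMID_SEQUENCE_BP, COSMID_MUTATIONS))
print(kb_per_event(COSMID_SEQUENCE_BP, SEQUENCING_ERRORS))
```

On the same assumption, the twenty sequencing errors would correspond to about one per 160 kb.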
Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and a
customized Perl script that merges sequences from different libraries and
generates segments that can be processed by several finishers simultaneously.
Sequence analysis and annotation were managed by DIANA (B.G.B. et al.,
unpublished). Genes encoding proteins were identified by TB-parse (ref. 46) using a
hidden Markov model trained on known M. tuberculosis coding and noncoding regions and translation-initiation signals, with corroboration by positional base preference. Interrogation of the EMBL, TREMBL, SwissProt,
PROSITE (ref. 47) and in-house databases involved BLASTN, BLASTX (ref. 48), DOTTER
(http://www.sanger.ac.uk) and FASTA (ref. 49). tRNA genes were located and identified using tRNAscan and tRNAscan-SE (ref. 50). The complete sequence, a list of
annotated cosmids and linking regions can be found on our website
(http://www.sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/).
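Positional base preference, mentioned above as corroborating evidence for gene calls, refers to the uneven base composition at the three codon positions of a true coding frame. The sketch below computes the G+C fraction at each codon position of a candidate open reading frame. It is an illustration of the general idea only, not the TB-parse implementation, and the function name is hypothetical.

```python
def gc_by_codon_position(orf: str) -> tuple:
    """Fraction of G or C at codon positions 1, 2 and 3 of an in-frame sequence."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3   # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("need at least one complete codon")
    fractions = []
    for position in range(3):
        # Every third base starting at this codon position.
        bases = orf[position:usable:3]
        fractions.append(sum(base in "GC" for base in bases) / len(bases))
    return tuple(fractions)


# Toy in-frame example: codons ATG GCC GCG TAA.
print(gc_by_codon_position("ATGGCCGCGTAA"))
```

In practice the three values would be compared across all six reading frames of a region; a frame whose position-specific composition matches that of known genes supports the prediction made by the hidden Markov model.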
Received 15 April; accepted 8 May 1998.
1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed.
Bloom, B. R.) 2–11 (Am. Soc. Microbiol., Washington DC, 1994).
2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
353–385 (Am. Soc. Microbiol., Washington DC, 1994).
3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
271–284 (Am. Soc. Microbiol., Washington DC, 1994).
4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
271–284 (Am. Soc. Microbiol., Washington DC, 1994).
5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry
and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263–270
(1997).
6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis
complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869–
9874 (1997).
7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for
genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 2221–2229 (1998).
8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacterium
tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132–
3137 (1996).
9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462
(1997).
10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139–160 (1994).
11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394–401
(1997).
12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to
Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 10961–10966 (1997).
13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic
differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 1274–1282
(1996).
14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature
390, 249–256 (1997).
15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res.
7, 802–819 (1997).
16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).
17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 53–94 (Academic,
San Diego, 1982).
18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium
tuberculosis. Microb. Comp. Genomics 2, 63–73 (1997).
19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S–713S
(1995).
20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 2118–2202 (ASM,
Washington, 1996).
21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis β-ketoacyl ACP synthase by isoniazid.
Science 280, 1607–1610 (1998).
22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium
tuberculosis. Science 263, 227–230 (1994).
23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465–
2497 (1997).
24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95–184 (Academic,
London, 1982).
25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and physiological functions. Prog. Lipid
Res. (in the press).
26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis.
Science 276, 1420–1422 (1997).
27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in
nonribosomal peptide synthesis. Chem. Rev. 97, 2651–2673 (1997).
28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a
family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 5189–5193 (1995).
29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon,
G. S.) 631–645 (Marcel Dekker, New York, 1997).
30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization
of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63,
1710–1717 (1995).
31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major
polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology
of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 4157–4165 (1992).
32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS)
present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 87–95 (1995).
33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis
Foundation Symp. 217) 160–172 (Wiley, Chichester, 1998).
34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen from
Mycobacterium leprae. Infect. Immun. 61, 2145–2153 (1993).
35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectin-binding proteins. Infect. Immun. 59, 2712–2718 (1991).
36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422–427
(1992).
37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr
virus nuclear antigen-1. Nature 375, 685–688 (1995).
38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/
proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus
nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 12616–12621 (1997).
39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatibility
complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection.
Proc. Natl Acad. Sci. USA 89, 12013–12017 (1992).
40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
531–557 (Am. Soc. Microbiol., Washington DC, 1994).
41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426–430 (1996).
42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis
DNA fragment associated with entry and survival inside cells. Science 261, 1454–1457 (1993).
43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in
survival within macrophages. Infect. Immun. 62, 1623–1630 (1994).
44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in
genome sequencing. Genome Res. 8, 562–566 (1998).
45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res.
23, 4992–4999 (1995).
46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic
Acids Res. 22, 4768–4778 (1994).
47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25,
217–221 (1997).
48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol.
Biol. 215, 403–410 (1990).
49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. Sci. USA
85, 2444–2448 (1988).
50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in
genomic DNA. Nucleic Acids Res. 25, 955–964 (1997).
Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones,
M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported
by the Wellcome Trust. Additional funding was provided by the Association Française Raoul Follereau,
the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling
research fellowship.
Correspondence and requests for materials should be addressed to B.G.B. or S.T.C.
The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV, accession number AL123456.
Nature © Macmillan Publishers Ltd 1998
NATURE | VOL 393 | 11 JUNE 1998
Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes
I. Small-molecule metabolism
A. Degradation
1. Carbon compounds
Rv0186 bglS β-glucosidase
Rv2202c cbhK carbohydrate kinase
Rv0727c fucA L-fuculose phosphate aldolase
Rv1731 gabD1 succinate-semialdehyde dehydrogenase
Rv0234c gabD2 succinate-semialdehyde dehydrogenase
Rv0501 galE1 UDP-glucose 4-epimerase
Rv0536 galE2 UDP-glucose 4-epimerase
Rv0620 galK galactokinase
Rv0619 galT galactose-1-phosphate uridylyltransferase C-term
Rv0618 galT' galactose-1-phosphate uridylyltransferase N-term
Rv0993 galU UTP-glucose-1-phosphate uridylyltransferase
Rv3696c glpK ATP:glycerol 3-phosphotransferase
Rv3255c manA mannose-6-phosphate isomerase
Rv3441c mrsA phosphoglucomutase or phosphomannomutase
Rv0118c oxcA oxalyl-CoA decarboxylase
Rv3068c pgmA phosphoglucomutase
Rv3257c pmmA phosphomannomutase
Rv3308 pmmB phosphomannomutase
Rv2702 ppgK polyphosphate glucokinase
Rv0408 pta phosphate acetyltransferase
Rv0729 xylB xylulose kinase
Rv1096 - carbohydrate degrading enzyme
2. Amino acids and amines
Rv1905c aao D-amino acid oxidase
Rv2531c adi ornithine/arginine decarboxylase
Rv2780 ald L-alanine dehydrogenase
Rv1538c ansA L-asparaginase
Rv1001 arcA arginine deiminase
Rv0753c mmsA methylmalonate semialdehyde dehydrogenase
Rv0751c mmsB methylmalonate semialdehyde oxidoreductase
Rv1187 rocA pyrroline-5-carboxylate dehydrogenase
Rv2322c rocD1 ornithine aminotransferase
Rv2321c rocD2 ornithine aminotransferase
Rv1848 ureA urease γ subunit
Rv1849 ureB urease β subunit
Rv1850 ureC urease α subunit
Rv1853 ureD urease accessory protein
Rv1851 ureF urease accessory protein
Rv1852 ureG urease accessory protein
Rv2913c - probable D-amino acid aminohydrolase
Rv3551 - possible glutaconate CoA-transferase
3. Fatty acids
Rv2501c accA1 acetyl/propionyl-CoA carboxylase, α subunit
Rv0973c accA2 acetyl/propionyl-CoA carboxylase, α subunit
Rv2502c accD1 acetyl/propionyl-CoA carboxylase, β subunit
Rv0974c accD2 acetyl/propionyl-CoA carboxylase, β subunit
Rv3667 acs acetyl-CoA synthase
Rv3409c choD cholesterol oxidase
Rv0222 echA1 enoyl-CoA hydratase/isomerase superfamily
Rv0456c echA2 enoyl-CoA hydratase/isomerase superfamily
Rv0632c echA3 enoyl-CoA hydratase/isomerase superfamily
Rv0673 echA4 enoyl-CoA hydratase/isomerase superfamily
Rv0675 echA5 enoyl-CoA hydratase/isomerase superfamily
Rv0905 echA6 enoyl-CoA hydratase/isomerase superfamily (aka eccH)
Rv0971c echA7 enoyl-CoA hydratase/isomerase superfamily
Rv1070c echA8 enoyl-CoA hydratase/isomerase superfamily
Rv1071c echA9 enoyl-CoA hydratase/isomerase superfamily
Rv1142c echA10 enoyl-CoA hydratase/isomerase superfamily
Rv1141c echA11 enoyl-CoA hydratase/isomerase superfamily
Rv1472 echA12 enoyl-CoA hydratase/isomerase superfamily
Rv1935c echA13 enoyl-CoA hydratase/isomerase superfamily
Rv2486 echA14 enoyl-CoA hydratase/isomerase superfamily
Rv2679 echA15 enoyl-CoA hydratase/isomerase superfamily
Rv2831 echA16 enoyl-CoA hydratase/isomerase superfamily
Rv3039c echA17 enoyl-CoA hydratase/isomerase superfamily
Rv3373 echA18 enoyl-CoA hydratase/isomerase superfamily, N-term
Rv3374 echA18' enoyl-CoA hydratase/isomerase superfamily, C-term
Rv3516 echA19 enoyl-CoA hydratase/isomerase superfamily
Rv3550 echA20 enoyl-CoA hydratase/isomerase superfamily
Rv3774 echA21 enoyl-CoA hydratase/isomerase superfamily
Rv0859 fadA β oxidation complex, β subunit (acetyl-CoA C-acetyltransferase)
Rv0243 fadA2 acetyl-CoA C-acetyltransferase
Rv1074c fadA3 acetyl-CoA C-acetyltransferase
Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL)
Rv3546 fadA5 acetyl-CoA C-acetyltransferase
Rv3556c fadA6 acetyl-CoA C-acetyltransferase
Rv0860 fadB β oxidation complex, α subunit (multiple activities)
Rv0468 fadB2 3-hydroxyacyl-CoA dehydrogenase
Rv1715 fadB3 3-hydroxyacyl-CoA dehydrogenase
Rv3141 fadB4 3-hydroxyacyl-CoA dehydrogenase
Rv1912c fadB5 3-hydroxyacyl-CoA dehydrogenase
Rv1750c fadD1 acyl-CoA synthase
Rv0270 fadD2 acyl-CoA synthase
Rv3561 fadD3 acyl-CoA synthase
Rv0214 fadD4 acyl-CoA synthase
Rv0166 fadD5 acyl-CoA synthase
Rv1206 fadD6 acyl-CoA synthase
Rv0119 fadD7 acyl-CoA synthase
Rv0551c fadD8 acyl-CoA synthase
Rv2590 fadD9 acyl-CoA synthase
Rv0099 fadD10 acyl-CoA synthase
Rv1550 fadD11 acyl-CoA synthase, N-term
Rv1549 fadD11' acyl-CoA synthase, C-term
Rv1427c fadD12 acyl-CoA synthase
Rv3089 fadD13 acyl-CoA synthase
Rv1058 fadD14 acyl-CoA synthase
Rv2187 fadD15 acyl-CoA synthase
Rv0852 fadD16 acyl-CoA synthase
Rv3506 fadD17 acyl-CoA synthase
Rv3513c fadD18 acyl-CoA synthase
Rv3515c fadD19 acyl-CoA synthase
Rv1185c fadD21 acyl-CoA synthase
Rv2948c fadD22 acyl-CoA synthase
Rv3826 fadD23 acyl-CoA synthase
Rv1529 fadD24 acyl-CoA synthase
Rv1521 fadD25 acyl-CoA synthase
Rv2930 fadD26 acyl-CoA synthase
Rv0275c fadD27 acyl-CoA synthase
Rv2941 fadD28 acyl-CoA synthase
Rv2950c fadD29 acyl-CoA synthase
Rv0404 fadD30 acyl-CoA synthase
Rv1925 fadD31 acyl-CoA synthase
Rv3801c fadD32 acyl-CoA synthase
Rv1345 fadD33 acyl-CoA synthase
Rv0035 fadD34 acyl-CoA synthase
Rv2505c fadD35 acyl-CoA synthase
Rv1193 fadD36 acyl-CoA synthase
Rv0131c fadE1 acyl-CoA dehydrogenase
Rv0154c fadE2 acyl-CoA dehydrogenase
Rv0215c fadE3 acyl-CoA dehydrogenase
Rv0231 fadE4 acyl-CoA dehydrogenase
Rv0244c fadE5 acyl-CoA dehydrogenase
Rv0271c fadE6 acyl-CoA dehydrogenase
Rv0400c fadE7 acyl-CoA dehydrogenase
Rv0672 fadE8 acyl-CoA dehydrogenase (aka aidB)
Rv0752c fadE9 acyl-CoA dehydrogenase
Rv0873 fadE10 acyl-CoA dehydrogenase
Rv0972c fadE12 acyl-CoA dehydrogenase
Rv0975c fadE13 acyl-CoA dehydrogenase
Rv1346 fadE14 acyl-CoA dehydrogenase
Rv1467c fadE15 acyl-CoA dehydrogenase
Rv1679 fadE16 acyl-CoA dehydrogenase
Rv1934c fadE17 acyl-CoA dehydrogenase
Rv1933c fadE18 acyl-CoA dehydrogenase
Rv2500c fadE19 acyl-CoA dehydrogenase (aka mmgC)
Rv2724c fadE20 acyl-CoA dehydrogenase
Rv2789c fadE21 acyl-CoA dehydrogenase
Rv3061c fadE22 acyl-CoA dehydrogenase
Rv3140 fadE23 acyl-CoA dehydrogenase
Rv3139 fadE24 acyl-CoA dehydrogenase
Rv3274c fadE25 acyl-CoA dehydrogenase
Rv3504 fadE26 acyl-CoA dehydrogenase
Rv3505 fadE27 acyl-CoA dehydrogenase
Rv3544c fadE28 acyl-CoA dehydrogenase
Rv3543c fadE29 acyl-CoA dehydrogenase
Rv3560c fadE30 acyl-CoA dehydrogenase
Rv3562 fadE31 acyl-CoA dehydrogenase
Rv3563 fadE32 acyl-CoA dehydrogenase
Rv3564 fadE33 acyl-CoA dehydrogenase
Rv3573c fadE34 acyl-CoA dehydrogenase
Rv3797 fadE35 acyl-CoA dehydrogenase
Rv3761c fadE36 acyl-CoA dehydrogenase
Rv1175c fadH 2,4-dienoyl-CoA reductase
Rv0855 far fatty acyl-CoA racemase
Rv1143 mcr α-methyl acyl-CoA racemase
Rv1492 mutA methylmalonyl-CoA mutase, β subunit
Rv1493 mutB methylmalonyl-CoA mutase, α subunit
Rv2504c scoA 3-oxo acid:CoA transferase, α subunit
Rv2503c scoB 3-oxo acid:CoA transferase, β subunit
Rv1136 - probable carnitine racemase
Rv1683 - possible acyl-CoA synthase
4. Phosphorous compounds
Rv2368c phoH ATP-binding pho regulon component
Rv1095 phoH2 PhoH-like protein
Rv3628 ppa probable inorganic pyrophosphatase
Rv2984 ppk polyphosphate kinase
B. Energy metabolism
1. Glycolysis
Rv1023 eno enolase
Rv0363c fba fructose bisphosphate aldolase
Rv1436 gap glyceraldehyde 3-phosphate dehydrogenase
Rv0489 gpm phosphoglycerate mutase I
Rv3010c pfkA phosphofructokinase I
Rv2029c pfkB phosphofructokinase II
Rv0946c pgi glucose-6-phosphate isomerase
Rv1437 pgk phosphoglycerate kinase
Rv1617 pykA pyruvate kinase
Rv1438 tpi triosephosphate isomerase
Rv2419c - putative phosphoglycerate mutase
Rv3837c - putative phosphoglycerate mutase
2. Pyruvate dehydrogenase
Rv2241 aceE pyruvate dehydrogenase E1 component
Rv3303c lpdA dihydrolipoamide dehydrogenase
Rv2497c pdhA pyruvate dehydrogenase E1 component α subunit
Rv2496c pdhB pyruvate dehydrogenase E1 component β subunit
Rv2495c pdhC dihydrolipoamide acetyltransferase
Rv0462 - probable dihydrolipoamide dehydrogenase
3. TCA cycle
Rv1475c acn aconitate hydratase
Rv0889c citA citrate synthase 2
Rv2498c citE citrate lyase β chain
Rv1098c fum fumarase
Rv1131 gltA1 citrate synthase 3
Rv0896 gltA2 citrate synthase 1
Rv3339c icd1 isocitrate dehydrogenase
Rv0066c icd2 isocitrate dehydrogenase
Rv0794c lpdB dihydrolipoamide dehydrogenase
Rv1240 mdh malate dehydrogenase
Rv2967c pca pyruvate carboxylase
Rv3318 sdhA succinate dehydrogenase A
Rv3319 sdhB succinate dehydrogenase B
Rv3316 sdhC succinate dehydrogenase C subunit
Rv3317 sdhD succinate dehydrogenase D subunit
Rv1248c sucA 2-oxoglutarate dehydrogenase
Rv2215 sucB dihydrolipoamide succinyltransferase
Rv0951 sucC succinyl-CoA synthase β chain
Rv0952 sucD succinyl-CoA synthase α chain
4. Glyoxylate bypass
Rv0467 aceA isocitrate lyase
Rv1915 aceAa isocitrate lyase, α module
Rv1916 aceAb isocitrate lyase, β module
Rv1837c glcB malate synthase
Rv3323c gphA phosphoglycolate phosphatase
5. Pentose phosphate pathway
Rv1445c devB glucose-6-phosphate 1-dehydrogenase
Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram –)
Rv1122 gnd2 6-phosphogluconate dehydrogenase (Gram +)
Rv1446c opcA unknown function, may aid G6PDH
Rv2436 rbsK ribokinase
Rv1408 rpe ribulose-phosphate 3-epimerase
Rv2465c rpi phosphopentose isomerase
Rv1448c tal transaldolase
Rv1449c tkt transketolase
Rv1121 zwf glucose-6-phosphate 1-dehydrogenase
Rv1447c zwf2 glucose-6-phosphate 1-dehydrogenase
6. Respiration
a. aerobic
Rv0527 ccsA cytochrome c-type biogenesis protein
Rv0529 ccsB cytochrome c-type biogenesis protein
Rv1451 ctaB cytochrome c oxidase assembly factor
Rv2200c ctaC cytochrome c oxidase chain II
Rv3043c ctaD cytochrome c oxidase polypeptide I
Rv2193 ctaE cytochrome c oxidase polypeptide III
Rv1542c glbN hemoglobin-like, oxygen carrier
Rv2470 glbO hemoglobin-like, oxygen carrier
Rv2249c glpD1 glycerol-3-phosphate dehydrogenase
Rv3302c glpD2 glycerol-3-phosphate dehydrogenase
Rv0694 lldD1 L-lactate dehydrogenase (cytochrome)
Rv1872c lldD2 L-lactate dehydrogenase
Rv1854c ndh probable NADH dehydrogenase
Rv3145 nuoA NADH dehydrogenase chain A
Rv3146 nuoB NADH dehydrogenase chain B
Rv3147 nuoC NADH dehydrogenase chain C
Rv3148 nuoD NADH dehydrogenase chain D
Rv3149 nuoE NADH dehydrogenase chain E
Rv3150 nuoF NADH dehydrogenase chain F
Rv3151 nuoG NADH dehydrogenase chain G
Rv3152 nuoH NADH dehydrogenase chain H
Rv3153 nuoI NADH dehydrogenase chain I
Rv3154 nuoJ NADH dehydrogenase chain J
Rv3155 nuoK NADH dehydrogenase chain K
Rv3156 nuoL NADH dehydrogenase chain L
Rv3157 nuoM NADH dehydrogenase chain M
Rv3158 nuoN NADH dehydrogenase chain N
Rv2195 qcrA Rieske iron-sulphur component of ubiQ-cytB reductase
Rv2196 qcrB cytochrome b component of ubiQ-cytB reductase
Rv2194 qcrC cytochrome b/c component of ubiQ-cytB reductase
b. anaerobic
Rv2392 cysH 3'-phosphoadenylylsulfate (PAPS) reductase
Rv2899c fdhD affects formate dehydrogenase-N
Rv2900c fdhF molybdopterin-containing oxidoreductase
Rv1552 frdA fumarate reductase flavoprotein subunit
Rv1553 frdB fumarate reductase iron sulphur protein
Rv1554 frdC fumarate reductase 15kD anchor protein
Rv1555 frdD fumarate reductase 13kD anchor protein
Rv1161 narG nitrate reductase α subunit
Rv1162 narH nitrate reductase β chain
Rv1164 narI nitrate reductase γ chain
Rv1163 narJ nitrate reductase δ chain
Rv1736c narX fused nitrate reductase
Rv2391 nirA probable nitrite reductase/sulphite reductase
Rv0252 nirB nitrite reductase flavoprotein
Rv0253 nirD probable nitrite reductase small subunit
c. Electron transport
Rv0409 ackA acetate kinase
Rv1623c appC cytochrome bd-II oxidase subunit I
Rv1622c cydB cytochrome d ubiquinol oxidase subunit II
Rv1620c cydC ABC transporter
Rv1621c cydD ABC transporter
Rv2007c fdxA ferredoxin
Rv3554 fdxB ferredoxin
Rv1177 fdxC ferredoxin 4Fe-4S
Rv3503c fdxD probable ferredoxin
Rv3029c fixA electron transfer flavoprotein β subunit
Rv3028c fixB electron transfer flavoprotein α subunit
Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase
Rv0886 fprB ferredoxin, ferredoxin-NADP reductase
Rv3251c rubA rubredoxin A
Rv3250c rubB rubredoxin B
7. Miscellaneous oxidoreductases and oxygenases 171
8. ATP-proton motive force
Rv1308 atpA ATP synthase α chain
Rv1304 atpB ATP synthase a chain
Rv1311 atpC ATP synthase ε chain
Rv1310 atpD ATP synthase β chain
Rv1305 atpE ATP synthase c chain
Rv1306 atpF ATP synthase b chain
Rv1309 atpG ATP synthase γ chain
Rv1307 atpH ATP synthase δ chain
C. Central intermediary metabolism
1. General
Rv2589 gabT 4-aminobutyrate aminotransferase
Rv3432c gadB glutamate decarboxylase
Rv1832 gcvB glycine decarboxylase
Rv1826 gcvH glycine cleavage system H protein
Rv2211c gcvT T protein of glycine cleavage system
Rv1213 glgC glucose-1-phosphate adenylyltransferase
Rv3842c glpQ1 glycerophosphoryl diester phosphodiesterase
Rv0317c glpQ2 glycerophosphoryl diester phosphodiesterase
Rv3566c nhoA N-hydroxyarylamine o-acetyltransferase
Rv0155 pntAA pyridine transhydrogenase subunit α1
Rv0156 pntAB pyridine transhydrogenase subunit α2
Rv0157 pntB pyridine transhydrogenase subunit β
Rv1127c ppdK similar to pyruvate, phosphate dikinase
2. Gluconeogenesis
Rv0211 pckA phosphoenolpyruvate carboxykinase
Rv0069c sdaA L-serine dehydratase 1
3. Sugar nucleotides
Rv1512 epiA nucleotide sugar epimerase
Rv3784 epiB probable UDP-galactose 4-epimerase
Rv1511 gmdA GDP-mannose 4,6 dehydratase
Rv0334 rmlA glucose-1-phosphate thymidyltransferase
Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase
Rv3464 rmlB dTDP-glucose 4,6-dehydratase
Rv3634c rmlB2 dTDP-glucose 4,6-dehydratase
Rv3468c rmlB3 dTDP-glucose 4,6-dehydratase
Rv3465 rmlC dTDP-4-dehydrorhamnose 3,5-epimerase
Rv3266c rmlD dTDP-4-dehydrorhamnose reductase
Rv0322 udgA UDP-glucose dehydrogenase/GDP-mannose 6-dehydrogenase
Rv3265c wbbL dTDP-rhamnosyl transferase
Rv1525 wbbl2 dTDP-rhamnosyl transferase
Rv3400 - probable β-phosphoglucomutase
4. Amino sugars
Rv3436c glmS glucosamine-fructose-6-phosphate aminotransferase
5. Sulphur metabolism
Rv0711 atsA arylsulfatase
Rv3299c atsB probable arylsulfatase
Rv0663 atsD probable arylsulfatase
Rv3077 atsF probable arylsulfatase
Rv0296c atsG probable arylsulfatase
Rv3796 atsH probable arylsulfatase
Rv1285 cysD ATP:sulphurylase subunit 2
Rv1286 cysN ATP:sulphurylase subunit 1
Rv2131c cysQ homologue of M. leprae cysQ
Rv3248c sahH adenosylhomocysteinase
Rv3283 sseA thiosulfate sulfurtransferase
Rv2291 sseB thiosulfate sulfurtransferase
Rv3118 sseC thiosulfate sulfurtransferase
Rv0814c sseC2 thiosulfate sulfurtransferase
Rv3762c - probable alkyl sulfatase
D. Amino acid biosynthesis
1. Glutamate family
Rv1654 argB acetylglutamate kinase
Rv1652 argC N-acetyl-γ-glutamyl-phosphate reductase
Rv1655 argD acetylornithine aminotransferase
Rv1656 argF ornithine carbamoyltransferase
Rv1658 argG arginosuccinate synthase
Rv1659 argH arginosuccinate lyase
Rv1653 argJ glutamate N-acetyltransferase
Rv2220 glnA1 glutamine synthase class I
Rv2222c glnA2 glutamine synthase class II
Rv1878 glnA3 probable glutamine synthase
Rv2860c glnA4 probable glutamine synthase
Rv2918c glnD uridylyltransferase
Rv2221c glnE glutamate-ammonia-ligase adenyltransferase
Rv3859c gltB ferredoxin-dependent glutamate synthase
Rv3858c gltD small subunit of NADH-dependent glutamate synthase
Rv3704c gshA possible γ-glutamylcysteine synthase
Rv2427c proA γ-glutamyl phosphate reductase
Rv2439c proB glutamate 5-kinase
Rv0500 proC pyrroline-5-carboxylate reductase
2. Aspartate family
Rv3708c asd aspartate semialdehyde dehydrogenase
Rv3709c ask aspartokinase
Rv2201 asnB asparagine synthase B
Rv3565 aspB aspartate aminotransferase
Rv0337c aspC aspartate aminotransferase
Rv2753c dapA dihydrodipicolinate synthase
Rv2773c dapB dihydrodipicolinate reductase
Rv1202 dapE succinyl-diaminopimelate desuccinylase
Rv2141c dapE2 ArgE/DapE/Acy1/Cpg2/yscS family
Rv2726c dapF diaminopimelate epimerase
Rv1293 lysA diaminopimelate decarboxylase
Rv3341 metA homoserine o-acetyltransferase
Rv1079 metB cystathionine γ-synthase
Rv3340 metC cystathionine β-lyase
Rv1133c metE 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase
Rv2124c metH 5-methyltetrahydrofolate-homocysteine methyltransferase
Rv1392 metK S-adenosylmethionine synthase
Rv0391 metZ o-succinylhomoserine sulfhydrylase
Rv1294 thrA homoserine dehydrogenase
Rv1296 thrB homoserine kinase
Rv1295 thrC homoserine synthase
3. Serine family
Rv0815c cysA2 thiosulfate sulfurtransferase
Rv3117 cysA3 thiosulfate sulfurtransferase
Rv2335 cysE serine acetyltransferase
Rv0511 cysG uroporphyrin-III c-methyltransferase
Rv2847c cysG2 multifunctional enzyme, siroheme synthase
Rv2334 cysK cysteine synthase A
Rv1336 cysM cysteine synthase B
Rv1077 cysM2 cystathionine β-synthase
Rv0848 cysM3 putative cysteine synthase
Rv1093 glyA serine hydroxymethyltransferase
Rv0070c glyA2 serine hydroxymethyltransferase
Rv2996c serA D-3-phosphoglycerate dehydrogenase
Rv0505c serB probable phosphoserine phosphatase
Rv3042c serB2 C-term similar to phosphoserine phosphatase
Rv0884c serC phosphoserine aminotransferase
4. Aromatic amino acid family
Rv3227 aroA 3-phosphoshikimate 1-carboxyvinyl transferase
Rv2538c aroB 3-dehydroquinate synthase
Rv2537c aroD 3-dehydroquinate dehydratase
Rv2552c aroE shikimate 5-dehydrogenase
Rv2540c aroF chorismate synthase
Rv2178c aroG DAHP synthase
Rv2539c aroK shikimate kinase I
Rv3838c pheA prephenate dehydratase
Rv1613 trpA tryptophan synthase α chain
Rv1612 trpB tryptophan synthase β chain
Rv1611 trpC indole-3-glycerol phosphate synthase
Rv2192c trpD anthranilate phosphoribosyltransferase
Rv1609 trpE anthranilate synthase component I
Rv2386c trpE2 anthranilate synthase component I
Rv3754 tyrA prephenate dehydrogenase
5. Histidine
Rv1603 hisA phosphoribosylformimino-5-aminoimidazole carboxamide ribonucleotide isomerase
Rv1601 hisB imidazole glycerol-phosphate dehydratase
Rv1600 hisC histidinol-phosphate aminotransferase
Rv3772 hisC2 histidinol-phosphate aminotransferase
Rv1599 hisD histidinol dehydrogenase
Rv1605 hisF imidazole glycerol-phosphate synthase
Rv2121c hisG ATP phosphoribosyltransferase
Rv1602 hisH amidotransferase
Rv2122c hisI phosphoribosyl-AMP cyclohydrolase
Rv1606 hisI2 probable phosphoribosyl-AMP 1,6 cyclohydrolase
Rv0114 - similar to HisB
6. Pyruvate family
Rv3423c alr alanine racemase
7. Branched amino acid family
Rv1559 ilvA threonine deaminase
Rv3003c ilvB acetolactate synthase I large subunit
Rv3470c ilvB2 acetolactate synthase large subunit
Rv3001c ilvC ketol-acid reductoisomerase
Rv0189c ilvD dihydroxy-acid dehydratase
Rv2210c ilvE branched-chain-amino-acid transaminase
Rv1820 ilvG acetolactate synthase II
Rv3002c ilvN acetolactate synthase I small subunit
Rv3509c ilvX probable acetohydroxyacid synthase I large subunit
Rv3710 leuA α-isopropyl malate synthase
Rv2995c leuB 3-isopropylmalate dehydrogenase
Rv2988c leuC 3-isopropylmalate dehydratase large subunit
Rv2987c leuD 3-isopropylmalate dehydratase small subunit
E. Polyamine synthesis
Rv2601 speE spermidine synthase
F. Purines, pyrimidines, nucleosides and nucleotides
1. Purine ribonucleotide biosynthesis
Rv1389 gmk putative guanylate kinase
Rv3396c guaA GMP synthase
Rv1843c guaB1 inosine-5'-monophosphate dehydrogenase
Rv3411c guaB2 inosine-5'-monophosphate dehydrogenase
Rv3410c guaB3 inosine-5'-monophosphate dehydrogenase
Rv1017c prsA ribose-phosphate pyrophosphokinase
Rv0357c purA adenylosuccinate synthase
Rv0777 purB adenylosuccinate lyase
Rv0780 purC phosphoribosylaminoimidazole-succinocarboxamide synthase
Rv0772 purD phosphoribosylamine-glycine ligase
Rv3275c purE phosphoribosylaminoimidazole carboxylase
Rv0808 purF amidophosphoribosyltransferase
Rv0957 purH phosphoribosylaminoimidazolecarboxamide formyltransferase
Rv3276c purK phosphoribosylaminoimidazole carboxylase ATPase subunit
Rv0803 purL phosphoribosylformylglycinamidine synthase II
Rv0809 purM 5'-phosphoribosyl-5-aminoimidazole synthase
Rv0956 purN phosphoribosylglycinamide formyltransferase I
Rv0788 purQ phosphoribosylformylglycinamidine synthase I
Rv0389 purT phosphoribosylglycinamide formyltransferase II
Rv2964 purU formyltetrahydrofolate deformylase
2. Pyrimidine ribonucleotide biosynthesis
Rv1383 carA carbamoyl-phosphate synthase subunit
Rv1384 carB carbamoyl-phosphate synthase subunit
Rv1380 pyrB aspartate carbamoyltransferase
Rv1381 pyrC dihydroorotase
Rv2139 pyrD dihydroorotate dehydrogenase
Rv1385 pyrF orotidine 5'-phosphate decarboxylase
Rv1699 pyrG CTP synthase
Rv2883c pyrH uridylate kinase
Rv0382c umpA probable uridine 5'-monophosphate synthase
3. 2'-deoxyribonucleotide metabolism
Rv0321 dcd deoxycytidine triphosphate deaminase
Rv2697c dut deoxyuridine triphosphatase
Rv0233 nrdB ribonucleoside-diphosphate reductase B2 (eukaryotic-like)
Rv3051c nrdE ribonucleoside diphosphate reductase α chain
Rv1981c nrdF ribonucleotide reductase small subunit
Rv3048c nrdG ribonucleoside-diphosphate small subunit
Rv3053c nrdH glutaredoxin electron transport component of NrdEF system
Rv3052c nrdI NrdI/YgaO/YmaA family
Rv3247c tmk thymidylate kinase
Rv2764c thyA thymidylate synthase
Rv0570 nrdZ ribonucleotide reductase, class II
Rv3752c - probable cytidine/deoxycytidylate deaminase
4. Salvage of nucleosides and nucleotides
Rv3313c add probable adenosine deaminase
Rv2584c apt adenine phosphoribosyltransferases
Rv3315c cdd probable cytidine deaminase
Rv3314c deoA thymidine phosphorylase
Rv0478 deoC deoxyribose-phosphate aldolase
Rv3307 deoD probable purine nucleoside phosphorylase
Rv3624c hpt probable hypoxanthine-guanine phosphoribosyltransferase
Rv3393 iunH probable inosine-uridine preferring nucleoside hydrolase
Rv0535 pnp phosphorylase from Pnp/MtaP family 2
Rv3309c upp uracil phosphoribosyltransferase
5. Miscellaneous nucleoside/nucleotide reactions
Rv0733 adk probable adenylate kinase
Rv2364c bex GTP-binding protein of Era/ThdF family
Rv1712 cmk cytidylate kinase
Rv2344c dgt probable deoxyguanosine triphosphate hydrolase
Rv2404c lepA GTP-binding protein LepA
Rv2727c miaA tRNA δ(2)-isopentenylpyrophosphate transferase
Rv2445c ndkA nucleoside diphosphate kinase
Rv2440c obg Obg GTP-binding protein
Rv2583c relA (p)ppGpp synthase I
G. Biosynthesis of cofactors, prosthetic groups and carriers
1. Biotin
Rv1568 bioA adenosylmethionine-8-amino-7-oxononanoate aminotransferase
Rv1589 bioB biotin synthase
Rv1570 bioD dethiobiotin synthase
Rv1569 bioF 8-amino-7-oxononanoate synthase
Rv0032 bioF2 C-terminal similar to B. subtilis BioF
Rv3279c birA biotin apo-protein ligase
Rv1442 bisC biotin sulfoxide reductase
Rv0089 - possible bioC biotin synthesis gene
2. Folic acid
Rv2763c dfrA dihydrofolate reductase
Rv2447c folC folylpolyglutamate synthase
Rv3356c folD methylenetetrahydrofolate dehydrogenase
Rv3609c folE GTP cyclohydrolase I
Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
Rv3608c folP dihydropteroate synthase
Rv1207 folP2 dihydropteroate synthase
Rv3607c folX may be involved in folate biosynthesis
Rv0013 pabA p-aminobenzoate synthase glutamine amidotransferase
Rv1005c pabB p-aminobenzoate synthase
Rv0812 pabC aminodeoxychorismate lyase
3. Lipoate
Rv2218 lipA lipoate biosynthesis protein A
Rv2217 lipB lipoate biosynthesis protein B
4. Molybdopterin
Rv3109 moaA molybdenum cofactor biosynthesis, protein A
Rv0869c moaA2 molybdenum cofactor biosynthesis, protein A
Rv0438c moaA3 molybdenum cofactor biosynthesis, protein A
Rv3110 moaB molybdenum cofactor biosynthesis, protein B
Rv0984 moaB2 molybdenum cofactor biosynthesis, protein B
Rv3111 moaC molybdenum cofactor biosynthesis, protein C
Rv0864 moaC2 molybdenum cofactor biosynthesis, protein C
Rv3324c moaC3 molybdenum cofactor biosynthesis, protein C
Rv3112 moaD molybdopterin converting factor subunit 1
Rv0868c moaD2 molybdopterin converting factor subunit 1
Rv3119 moaE molybdopterin-converting factor subunit 2
Rv0866 moaE2 molybdopterin-converting factor subunit 2
Rv3322c moaE3 molybdopterin-converting factor subunit 2
Rv0994 moeA molybdopterin biosynthesis
Rv3116 moeB molybdopterin biosynthesis
Rv2338c moeW molybdopterin biosynthesis
Rv1681 moeX weak similarity to E. coli MoaA
Rv1355c moeY weak similarity to E. coli MoeB
Rv3206c moeZ probably involved in molybdopterin biosynthesis
Rv0865 mog molybdopterin biosynthesis
5. Pantothenate
Rv1092c coaA pantothenate kinase
Rv2225 panB 3-methyl-2-oxobutanoate hydroxymethyltransferase
Rv3602c panC pantoate-β-alanine ligase
Rv3601c panD aspartate 1-decarboxylase
6. Pyridoxine
Rv2607 pdxH pyridoxamine 5'-phosphate oxidase
7. Pyridine nucleotide
Rv1594 nadA quinolinate synthase
Rv1595 nadB L-aspartate oxidase
Rv1596 nadC nicotinate-nucleotide pyrophosphatase
8. Thiamine
Rv0423c thiC thiamine synthesis, pyrimidine moiety
Rv0422c thiD phosphomethylpyrimidine kinase
Rv0414c thiE thiamine synthesis, thiazole moiety
Rv0417 thiG thiamine synthesis, thiazole moiety
Rv2977c thiL probable thiamine-monophosphate kinase
9. Riboflavin
Rv1940 ribA GTP cyclohydrolase II
Rv1415 ribA2 probable GTP cyclohydrolase II
Rv1412 ribC riboflavin synthase α chain
Rv2671 ribD probable riboflavin deaminase
Rv2786c ribF riboflavin kinase
Rv1409 ribG riboflavin biosynthesis
Rv1416 ribH riboflavin synthase β chain
Rv3300c - probable deaminase, riboflavin synthesis
10. Thioredoxin, glutaredoxin and mycothiol
Rv0773c ggtA putative γ-glutamyl transpeptidase
Rv2394 ggtB γ-glutamyltranspeptidase precursor
Rv2855 gorA glutathione reductase homologue
Rv0816c thiX equivalent to M. leprae ThiX
Rv1470 trxA thioredoxin
Rv1471 trxB thioredoxin reductase
Rv3913 trxB2 thioredoxin reductase
Rv3914 trxC thioredoxin
11. Menaquinone, PQQ, ubiquinone and other terpenoids
Rv2682c dxs 1-deoxy-D-xylulose 5-phosphate synthase
Rv0562 grcC1 heptaprenyl diphosphate synthase II
Rv0989c grcC2 heptaprenyl diphosphate synthase II
Rv3398c idsA geranylgeranyl pyrophosphate synthase
Rv2173 idsA2 geranylgeranyl pyrophosphate synthase
Rv3383c idsB transfergeranyl, similar geranyl pyrophosphate synthase
Rv0534c menA 4-dihydroxy-2-naphthoate octaprenyltransferase
Rv0548c menB naphthoate synthase
Rv0553 menC o-succinylbenzoate-CoA synthase
Rv0555 menD 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase
Rv0542c menE o-succinylbenzoic acid-CoA ligase
Rv3853 menG S-adenosylmethionine: 2-demethylmenaquinone
Rv3397c phyA phytoene synthase
Rv0693 pqqE coenzyme PQQ synthesis protein E
Rv0558 ubiE ubiquinone/menaquinone biosynthesis methyltransferase
12. Heme and porphyrin
Rv0509 hemA glutamyl-tRNA reductase
Rv0512 hemB δ-aminolevulinic acid dehydratase
Rv0510 hemC porphobilinogen deaminase
Rv2678c hemE uroporphyrinogen decarboxylase
Rv1300 hemK protoporphyrinogen oxidase
Rv0524 hemL glutamate-1-semialdehyde aminotransferase
Rv2388c hemN oxygen-independent coproporphyrinogen III oxidase
Rv2677c hemY' protoporphyrinogen oxidase
Rv1485 hemZ ferrochelatase
13. Cobalamin
Rv2849c cobA cob(I)alamin adenosyltransferase
Rv2848c cobB cobyrinic acid a,c-diamide synthase
Rv2231c cobC aminotransferase
Rv2236c cobD cobinamide synthase
Rv2064 cobG precorrin reductase
Rv2065 cobH precorrin isomerase
Rv2066 cobI CobI-CobJ fusion protein
Rv2070c cobK precorrin reductase
Rv2072c cobL probable methyltransferase
Rv2071c cobM precorrin-3 methylase
Rv2062c cobN cobalt insertion
Rv2208 cobS cobalamin (5'-phosphate) synthase
Rv2207 cobT nicotinate-nucleotide-dimethylbenzimidazole transferase
Rv0254c cobU cobinamide kinase
Rv0255c cobQ cobyric acid synthase
Rv3713 cobQ2 possible cobyric acid synthase
Rv0306 - similar to BluB cobalamin synthesis protein R. capsulatus
14. Iron utilization
Rv1876 bfrA bacterioferritin
Rv3841 bfrB bacterioferritin
Rv3215 entC probable isochorismate synthase
Rv3214 entD weak similarity to many phosphoglycerate mutases
Rv2895c viuB similar to proteins involved in vibriobactin uptake
Rv3525c - similar to ferripyochelin binding protein
H. Lipid biosynthesis
1. Synthesis of fatty and mycolic acids
Rv3285 accA3 acetyl/propionyl CoA carboxylase α subunit
Rv0904c accD3 acetyl/propionyl CoA carboxylase β subunit
Rv3799c accD4 acetyl/propionyl CoA carboxylase β subunit
Rv3280 accD5 acetyl/propionyl CoA carboxylase β subunit
Rv2247 accD6 acetyl/propionyl CoA carboxylase β subunit
Rv2244 acpM acyl carrier protein (meromycolate extension)
Rv2523c acpS CoA:apo-[ACP] pantetheinephosphotransferase
Rv2243 fabD malonyl CoA-[ACP] transacylase
Rv0649 fabD2 malonyl CoA-[ACP] transacylase
Rv1483 fabG1 3-oxoacyl-[ACP] reductase (aka MabA)
Rv1350 fabG2 3-oxoacyl-[ACP] reductase
Rv2002 fabG3 3-oxoacyl-[ACP] reductase
Rv0242c fabG4 3-oxoacyl-[ACP] reductase
Rv2766c fabG5 3-oxoacyl-[ACP] reductase
Rv0533c fabH β-ketoacyl-ACP synthase III
Rv2524c fas fatty acid synthase
Rv1484 inhA enoyl-[ACP] reductase
Rv2245 kasA β-ketoacyl-ACP synthase (meromycolate extension)
Rv2246 kasB β-ketoacyl-ACP synthase (meromycolate extension)
Rv1618 tesB1 thioesterase II
Rv2605c tesB2 thioesterase II
Rv0033 - possible acyl carrier protein
Rv1344 - possible acyl carrier protein
Rv1722 - possible biotin carboxylase
Rv3221c - resembles biotin carboxyl carrier
Rv3472 - possible acyl carrier protein
2. Modification of fatty and mycolic acids
Rv3391 acrA1 fatty acyl-CoA reductase
Rv3392c cmaA1 cyclopropane mycolic acid synthase 1
Rv0503c cmaA2 cyclopropane mycolic acid synthase 2
Rv0824c desA1 acyl-[ACP] desaturase
Rv1094 desA2 acyl-[ACP] desaturase
Rv3229c desA3 acyl-[ACP] desaturase
Rv0645c mmaA1 methoxymycolic acid synthase 1
Rv0644c mmaA2 methoxymycolic acid synthase 2
Rv0643c mmaA3 methoxymycolic acid synthase 3
Rv0642c mmaA4 methoxymycolic acid synthase 4
Rv0447c ufaA1 unknown fatty acid methyltransferase
Rv3538 ufaA2 unknown fatty acid methyltransferase
Rv0469 umaA1 unknown mycolic acid methyltransferase
Rv0470c umaA2 unknown mycolic acid methyltransferase
3. Acyltransferases, mycoloyltransferases and phospholipid synthesis
Rv2289 cdh CDP-diacylglycerol phosphatidylhydrolase
Rv2881c cdsA phosphatidate cytidylyltransferase
Rv3804c fbpA antigen 85A, mycolyltransferase
Rv1886c fbpB antigen 85B, mycolyltransferase
Rv3803c fbpC1 antigen 85C, mycolyltransferase
Rv0129c fbpC2 antigen 85C', mycolyltransferase
Rv0564c gpdA1 glycerol-3-phosphate dehydrogenase
Rv2982c gpdA2 glycerol-3-phosphate dehydrogenase
Rv2612c pgsA CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1822 pgsA2 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv2746c pgsA3 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1551 plsB1 glycerol-3-phosphate acyltransferase
Rv2482c plsB2 glycerol-3-phosphate acyltransferase
Rv0437c psd putative phosphatidylserine decarboxylase
Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase
Rv0045c - possible dihydrolipoamide acetyltransferase
Rv0914c - lipid transfer protein
Rv1543 - probable fatty-acyl CoA reductase
Rv1627c - lipid carrier protein
Rv1814 - possible C-5 sterol desaturase
Rv1867 - similar to acetyl CoA synthase/lipid carriers
Rv2261c - apolipoprotein N-acyltransferase-a
Rv2262c - apolipoprotein N-acyltransferase-b
Rv3523 - lipid carrier protein
Rv3720 - C-term similar to cyclopropane fatty acid synthases
I. Polyketide and non-ribosomal peptide synthesis
Rv2940c mas mycocerosic acid synthase
Rv2384 mbtA mycobactin/exochelin synthesis (salicylate-AMP ligase)
Rv2383c mbtB mycobactin/exochelin synthesis (serine/threonine ligation)
Rv2382c mbtC mycobactin/exochelin synthesis
Rv2381c mbtD mycobactin/exochelin synthesis (polyketide synthase)
Rv2380c mbtE mycobactin/exochelin synthesis (lysine ligation)
Rv2379c mbtF mycobactin/exochelin synthesis (lysine ligation)
Rv2378c mbtG mycobactin/exochelin synthesis (lysine hydroxylase)
Rv2377c mbtH mycobactin/exochelin synthesis
Rv0101 nrp unknown non-ribosomal peptide synthase
Rv1153c omt PKS o-methyltransferase
Rv3824c papA1 PKS-associated protein, unknown function
Rv3820c papA2 PKS-associated protein, unknown function
Rv1182 papA3 PKS-associated protein, unknown function
Rv1528c papA4 PKS-associated protein, unknown function
Rv2939 papA5 PKS-associated protein, unknown function
Rv2946c pks1 polyketide synthase
Rv1660 pks10 polyketide synthase (chalcone synthase-like)
Rv1665 pks11 polyketide synthase (chalcone synthase-like)
Rv2048c pks12 polyketide synthase (erythronolide synthase-like)
Rv3800c pks13 polyketide synthase
Rv1342c pks14 polyketide synthase (chalcone synthase-like)
Rv2947c pks15 polyketide synthase
Rv1013 pks16 polyketide synthase
Rv1663 pks17 polyketide synthase
Rv1372 pks18 polyketide synthase
Rv3825c pks2 polyketide synthase
Rv1180 pks3 polyketide synthase
Rv1181 pks4 polyketide synthase
Rv1527c pks5 polyketide synthase
Rv0405 pks6 polyketide synthase
Rv1661 pks7 polyketide synthase
Rv1662 pks8 polyketide synthase
Rv1664 pks9 polyketide synthase
Nature © Macmillan Publishers Ltd 1998
Rv2931 ppsA phenolphthiocerol synthesis (pksB)
Rv2932 ppsB phenolphthiocerol synthesis (pksC)
Rv2933 ppsC phenolphthiocerol synthesis (pksD)
Rv2934 ppsD phenolphthiocerol synthesis (pksE)
Rv2935 ppsE phenolphthiocerol synthesis (pksF)
Rv2928 tesA thioesterase
Rv1544 probable ketoacyl reductase
J. Broad regulatory functions
1. Repressors/activators
Rv1657 argR arginine repressor
Rv1267c embR regulator of embAB genes (AfsR/DndI/RedD family)
Rv1909c furA ferric uptake regulatory protein
Rv2359 furB ferric uptake regulatory protein
Rv2919c glnB nitrogen regulatory protein
Rv2711 ideR iron dependent repressor, IdeR
Rv2720 lexA LexA, SOS repressor protein
Rv1479 moxR transcriptional regulator, MoxR homologue
Rv3692 moxR2 transcriptional regulator, MoxR homologue
Rv3164c moxR3 transcriptional regulator, MoxR homologue
Rv0212c nadR similar to E. coli NadR
Rv0117 oxyS transcriptional regulator (LysR family)
Rv1379 pyrR regulatory protein pyrimidine biosynthesis
Rv2788 sirR iron-dependent transcriptional repressor
Rv3082c virS putative virulence regulating protein (AraC/XylS family)
Rv3219 whiB1 WhiB transcriptional activator homologue
Rv3260c whiB2 WhiB transcriptional activator homologue
Rv3416 whiB3 WhiB transcriptional activator homologue
Rv3681c whiB4 WhiB transcriptional activator homologue
Rv0023 putative transcriptional regulator
Rv0043c transcriptional regulator (GntR family)
Rv0067c transcriptional regulator (TetR/AcrR family)
Rv0078 transcriptional regulator (TetR/AcrR family)
Rv0081 transcriptional regulator (ArsR family)
Rv0135c putative transcriptional regulator
Rv0144 putative transcriptional regulator
Rv0158 transcriptional regulator (TetR/AcrR family)
Rv0165c transcriptional regulator (GntR family)
Rv0195 transcriptional regulator (LuxR/UhpA family)
Rv0196 transcriptional regulator (TetR/AcrR family)
Rv0232 transcriptional regulator (TetR/AcrR family)
Rv0238 transcriptional regulator (TetR/AcrR family)
Rv0273c putative transcriptional regulator
Rv0302 transcriptional regulator (TetR/AcrR family)
Rv0324 putative transcriptional regulator
Rv0328 transcriptional regulator (TetR/AcrR family)
Rv0348 putative transcriptional regulator
Rv0377 transcriptional regulator (LysR family)
Rv0386 transcriptional regulator (LuxR/UhpA family)
Rv0452 putative transcriptional regulator
Rv0465c transcriptional regulator (PbsX/Xre family)
Rv0472c transcriptional regulator (TetR/AcrR family)
Rv0474 transcriptional regulator (PbsX/Xre family)
Rv0485 transcriptional regulator (ROK family)
Rv0494 transcriptional regulator (GntR family)
Rv0552 putative transcriptional regulator
Rv0576 putative transcriptional regulator
Rv0586 transcriptional regulator (GntR family)
Rv0650 transcriptional regulator (ROK family)
Rv0653c putative transcriptional regulator
Rv0681 transcriptional regulator (TetR/AcrR family)
Rv0691c transcriptional regulator (TetR/AcrR family)
Rv0737 putative transcriptional regulator
Rv0744c putative transcriptional regulator
Rv0792c transcriptional regulator (GntR
family)
Rv0823c - transcriptional regulator (NifR3/Smm1 family)
Rv0827c - transcriptional regulator (ArsR family)
Rv0890c - transcriptional regulator (LuxR/UhpA family)
Rv0891c - putative transcriptional regulator
Rv0894 - putative transcriptional regulator
Rv1019 - transcriptional regulator (TetR/AcrR family)
Rv1049 - transcriptional regulator (MarR family)
Rv1129c - transcriptional regulator (PbsX/Xre family)
Rv1151c - putative transcriptional regulator
Rv1152 - transcriptional regulator (GntR family)
Rv1167c - putative transcriptional regulator
Rv1219c - putative transcriptional regulator
Rv1255c - transcriptional regulator (TetR/AcrR family)
Rv1332 - putative transcriptional regulator
Rv1353c - transcriptional regulator (TetR/AcrR family)
Rv1358 - transcriptional regulator (LuxR/UhpA family)
Rv1359 - putative transcriptional regulator
Rv1395 - transcriptional regulator (AraC/XylS family)
Rv1404 - transcriptional regulator (MarR family)
Rv1423 - putative transcriptional regulator
Rv1460 - putative transcriptional regulator
Rv1474c - transcriptional regulator (TetR/AcrR family)
Rv1534 - transcriptional regulator (TetR/AcrR family)
Rv1556 - putative transcriptional regulator
Rv1674c - putative transcriptional regulator
Rv1675c - putative transcriptional regulator
Rv1719 - transcriptional regulator (IclR family)
Rv1773c - transcriptional regulator (IclR family)
Rv1776c - putative transcriptional regulator
Rv1816 - putative transcriptional regulator
Rv1846c - putative transcriptional regulator
Rv1931c - transcriptional regulator (AraC/XylS family)
Rv1956 - putative transcriptional regulator
Rv1963c - putative transcriptional regulator
Rv1985c - transcriptional regulator (LysR family)
Rv1990c - putative transcriptional regulator
Rv1994c - transcriptional regulator (MerR family)
Rv2017 - putative transcriptional regulator (PbsX/Xre family)
Rv2021c - putative transcriptional regulator
Rv2034 - transcriptional regulator (ArsR family)
Rv2175c - putative transcriptional regulator
Rv2250c - putative transcriptional regulator
Rv2258c - putative transcriptional regulator
Rv2282c - transcriptional regulator (LysR family)
Rv2308 - putative transcriptional regulator
Rv2324 - transcriptional regulator (Lrp/AsnC family)
Rv2358 - transcriptional regulator (ArsR family)
Rv2488c - transcriptional regulator (LuxR/UhpA family)
Rv2506 - transcriptional regulator (TetR/AcrR family)
Rv2621c - putative transcriptional regulator
Rv2640c - transcriptional regulator (ArsR family)
Rv2642 - transcriptional regulator (ArsR family)
Rv2669 - putative transcriptional regulator
Rv2745c - putative transcriptional regulator
Rv2779c - transcriptional regulator (Lrp/AsnC family)
Rv2887 - transcriptional regulator (MarR family)
Rv2912c - transcriptional regulator (TetR/AcrR family)
Rv2989 - transcriptional regulator (IclR family)
Rv3050c - putative transcriptional regulator
Rv3055 - putative transcriptional regulator
Rv3058c - putative transcriptional regulator
Rv3060c - transcriptional regulator (GntR family)
Rv3066 - putative transcriptional regulator
Rv3095 - putative transcriptional regulator
Rv3124 - transcriptional regulator (AfsR/DndI/RedD family)
Rv3160c - putative transcriptional regulator
Rv3167c - putative transcriptional regulator
Rv3173c - transcriptional regulator (TetR/AcrR family)
Rv3183 - putative transcriptional regulator
Rv3208 - transcriptional regulator (TetR/AcrR family)
Rv3249c - transcriptional regulator (TetR/AcrR family)
Rv3291c - transcriptional regulator (Lrp/AsnC family)
Rv3295 - transcriptional regulator (TetR/AcrR family)
Rv3334 - transcriptional regulator (MerR family)
Rv3405c - putative transcriptional regulator
Rv3522 - putative transcriptional regulator
Rv3557c - transcriptional regulator (TetR/AcrR family)
Rv3574 - transcriptional regulator (TetR/AcrR family)
Rv3575c - transcriptional regulator (LacI family)
Rv3583c - putative transcriptional regulator
Rv3676 - transcriptional regulator (Crp/Fnr family)
Rv3678c - transcriptional regulator (LysR family)
Rv3736 - transcriptional regulator (AraC/XylS family)
Rv3744 - transcriptional regulator (ArsR family)
Rv3830c - transcriptional regulator (TetR/AcrR family)
Rv3833 - transcriptional regulator (AraC/XylS family)
Rv3840 - putative transcriptional regulator
Rv3855 - putative transcriptional regulator
2. Two component systems
Rv1028c kdpD
sensor histidine kinase
Rv1027c kdpE
two-component response
regulator
Rv3246c mtrA
two-component response
regulator
Rv3245c mtrB
sensor histidine kinase
Rv0844c narL
two-component response
regulator
Rv0757
phoP
two-component response
regulator
Rv0758
phoR
sensor histidine kinase
Rv0491
regX3
two-component response
regulator
Rv0490
senX3
sensor histidine kinase
Rv0602c tcrA
two-component response
regulator
Rv0260c two-component response
regulator
Rv0600c sensor histidine kinase
Rv0601c sensor histidine kinase
Rv0818
two-component response
regulator
Rv0845
sensor histidine kinase
Rv0902c sensor histidine kinase
Rv0903c two-component response
regulator
Rv0981
two-component response
regulator
Rv0982
sensor histidine kinase
Rv1032c sensor histidine kinase
Rv1033c two-component response
regulator
Rv1626
two-component response
regulator
Rv2027c sensor histidine kinase
Rv2884
two-component response
regulator
Rv3132c sensor histidine kinase
Rv3133c two-component response
regulator
Rv3143
putative sensory transduction
protein
Rv3220c sensor histidine kinase
Rv3764c sensor histidine kinase
Rv3765c two-component response
regulator
3. Serine-threonine protein kinases and phosphoprotein
phosphatases
Rv0015c pknA
serine-threonine protein kinase
Rv0014c pknB
serine-threonine protein kinase
Rv0931c pknD
serine-threonine protein kinase
Rv1743
pknE
serine-threonine protein kinase
Rv1746
pknF
serine-threonine protein kinase
Rv0410c pknG
serine-threonine protein kinase
Rv1266c pknH
serine-threonine protein kinase
Rv2914c pknI
serine-threonine protein kinase
Rv2088
pknJ
serine-threonine protein kinase
Rv3080c pknK
serine-threonine protein kinase
Rv2176 pknL serine-threonine protein kinase, truncated
Rv0018c ppp putative phosphoprotein phosphatase
Rv2234 ptpA low molecular weight protein-tyrosine-phosphatase
Rv0153c - putative protein-tyrosine-phosphatase
II. Macromolecule metabolism
A. Synthesis and modification of macromolecules
1. Ribosomal protein synthesis and modification
Rv3420c rimI
ribosomal protein S18 acetyl
transferase
Rv0995
rimJ
acetylation of 30S S5 subunit
Rv0641
rplA
50S ribosomal protein L1
Rv0704
rplB
50S ribosomal protein L2
Rv0701
rplC
50S ribosomal protein L3
Rv0702
rplD
50S ribosomal protein L4
Rv0716
rplE
50S ribosomal protein L5
Rv0719
rplF
50S ribosomal protein L6
Rv0056
rplI
50S ribosomal protein L9
Rv0651
rplJ
50S ribosomal protein L10
Rv0640
rplK
50S ribosomal protein L11
Rv0652
rplL
50S ribosomal protein L7/L12
Rv3443c rplM
50S ribosomal protein L13
Rv0714
rplN
50S ribosomal protein L14
Rv0723
rplO
50S ribosomal protein L15
Rv0708
rplP
50S ribosomal protein L16
Rv3456c rplQ
50S ribosomal protein L17
Rv0720
rplR
50S ribosomal protein L18
Rv2904c rplS
50S ribosomal protein L19
Rv1643
rplT
50S ribosomal protein L20
Rv2442c rplU
50S ribosomal protein L21
Rv0706
rplV
50S ribosomal protein L22
Rv0703
rplW
50S ribosomal protein L23
Rv0715
rplX
50S ribosomal protein L24
Rv1015c rplY
50S ribosomal protein L25
Rv2441c rpmA
50S ribosomal protein L27
Rv0105c rpmB
50S ribosomal protein L28
Rv2058c rpmB2 50S ribosomal protein L28
Rv0709
rpmC
50S ribosomal protein L29
Rv0722
rpmD
50S ribosomal protein L30
Rv1298
rpmE
50S ribosomal protein L31
Rv2057c rpmG
50S ribosomal protein L33
Rv3924c rpmH
50S ribosomal protein L34
Rv1642
rpmI
50S ribosomal protein L35
Rv3461c rpmJ
50S ribosomal protein L36
Rv1630
rpsA
30S ribosomal protein S1
Rv2890c rpsB
30S ribosomal protein S2
Rv0707
rpsC
30S ribosomal protein S3
Rv3458c rpsD
30S ribosomal protein S4
Rv0721
rpsE
30S ribosomal protein S5
Rv0053
rpsF
30S ribosomal protein S6
Rv0683
rpsG
30S ribosomal protein S7
Rv0718
rpsH
30S ribosomal protein S8
Rv3442c rpsI
30S ribosomal protein S9
Rv0700
rpsJ
30S ribosomal protein S10
Rv3459c rpsK
30S ribosomal protein S11
Rv0682
rpsL
30S ribosomal protein S12
Rv3460c rpsM
30S ribosomal protein S13
Rv0717
rpsN
30S ribosomal protein S14
Rv2056c rpsN2
30S ribosomal protein S14
Rv2785c rpsO
30S ribosomal protein S15
Rv2909c rpsP
30S ribosomal protein S16
Rv0710
rpsQ
30S ribosomal protein S17
Rv0055
rpsR
30S ribosomal protein S18
Rv2055c rpsR2
30S ribosomal protein S18
Rv0705
rpsS
30S ribosomal protein S19
Rv2412
rpsT
30S ribosomal protein S20
Rv3241c member of S30AE ribosomal
protein family
2. Ribosome modification and maturation
Rv1010
ksgA
16S rRNA dimethyltransferase
Rv2838c rbfA
ribosome-binding factor A
Rv2907c rimM
16S rRNA processing protein
3. Aminoacyl tRNA synthases and their modification
Rv2555c alaS
alanyl-tRNA synthase
Rv1292
argS
arginyl-tRNA synthase
Rv2572c aspS
aspartyl-tRNA synthase
Rv3580c cysS
cysteinyl-tRNA synthase
Rv2130c cysS2
cysteinyl-tRNA synthase
Rv1406
fmt
methionyl-tRNA formyltransferase
Rv3011c gatA
glu-tRNA-gln amidotransferase,
subunit B
Rv3009c gatB
glu-tRNA-gln amidotransferase,
subunit A
Rv3012c gatC
glu-tRNA-gln amidotransferase,
subunit C
Rv2992c gltS
glutamyl-tRNA synthase
Rv2357c glyS
glycyl-tRNA synthase
Rv2580c hisS
histidyl-tRNA synthase
Rv1536
ileS
isoleucyl-tRNA synthase
Rv0041
leuS
leucyl-tRNA synthase
Rv3598c lysS
lysyl-tRNA synthase
Rv1640c lysX
C-term lysyl-tRNA synthase
Rv1007c metS
methionyl-tRNA synthase
Rv1649
pheS
phenylalanyl-tRNA synthase α
subunit
Rv1650 pheT phenylalanyl-tRNA synthase β subunit
Rv2845c proS prolyl-tRNA synthase
Rv3834c serS seryl-tRNA synthase
Rv2614c thrS threonyl-tRNA synthase
Rv2906c trmD tRNA (guanine-N1)-methyltransferase
Rv3336c trpS tryptophanyl tRNA synthase
Rv1689 tyrS tyrosyl-tRNA synthase
Rv2448c valS valyl-tRNA synthase
4. Nucleoproteins
Rv1407 fmu similar to Fmu protein
Rv3852 hns HU-histone protein
Rv2986c hupB DNA-binding protein II
Rv1388 mIHF integration host factor
5. DNA replication, repair, recombination and restriction/modification
Rv1317c alkA
DNA-3-methyladenine glycosidase II
Rv2836c dinF
DNA-damage-inducible protein F
Rv1329c dinG
probable ATP-dependent helicase
Rv3056
dinP
DNA-damage-inducible protein
Rv1537
dinX
probable DNA-damage-inducible
protein
Rv0001
dnaA
chromosomal replication initiator
protein
Rv0058
dnaB
DNA helicase (contains intein)
Rv1547
dnaE1
DNA polymerase III, α subunit
Rv3370c dnaE2
DNA polymerase III α chain
Rv2343c dnaG
DNA primase
Rv0002
dnaN
DNA polymerase III, β subunit
Rv3711c dnaQ
DNA polymerase III ε chain
Rv3721c dnaZX
DNA polymerase III, γ (dnaZ) and
τ (dnaX)
Rv2924c fpg
formamidopyrimidine-DNA glycosylase
Rv0006
gyrA
DNA gyrase subunit A
Rv0005
gyrB
DNA gyrase subunit B
Rv2092c helY
probable helicase, Ski2 subfamily
Rv2101
helZ
probable helicase, Snf2/Rad54
family
Rv2756c hsdM
type I restriction/modification system DNA methylase
Rv2755c hsdS'
type I restriction/modification system specificity determinant
Rv3296
lhr
ATP-dependent helicase
Rv3014c ligA
DNA ligase
Rv3062
ligB
DNA ligase
Rv3731
ligC
probable DNA ligase
Rv1020
mfd
transcription-repair coupling factor
Rv2528c mrr
restriction system protein
Rv2985
mutT1
MutT homologue
Rv1160
mutT2
MutT homologue
Rv0413
mutT3
MutT homologue
Rv3589
mutY
probable DNA glycosylase
Rv3297
nei
probable endonuclease VIII
Rv3674c nth
probable endonuclease III
Rv1316c ogt
methylated-DNA-protein-cysteine
methyltransferase
Rv1629
polA
DNA polymerase I
Rv1402
priA
putative primosomal protein n'
(replication factor Y)
Rv3585
radA
probable DNA repair RadA homologue
Rv2737c recA
recombinase (contains intein)
Rv0630c recB
exodeoxyribonuclease V
Rv0631c recC
exodeoxyribonuclease V
Rv0629c recD
exodeoxyribonuclease V
Rv0003
recF
DNA replication and SOS induction
Rv2973c recG
ATP-dependent DNA helicase
Rv1696
recN
recombination and DNA repair
Rv3715c recR
RecBC-Independent process of
DNA repair
Rv2736c recX
regulatory protein for RecA
Rv2593c ruvA
Holliday junction binding protein,
DNA helicase
Rv2592c ruvB
Holliday junction binding protein
Rv2594c ruvC
Holliday junction resolvase, endodeoxyribonuclease
Rv0054
ssb
single strand binding protein
Rv1210
tagA
DNA-3-methyladenine glycosidase I
Rv3646c topA
DNA topoisomerase
Rv2976c ung
uracil-DNA glycosylase
Rv1638
uvrA
excinuclease ABC subunit A
Rv1633
uvrB
excinuclease ABC subunit B
Rv1420
uvrC
excinuclease ABC subunit C
Rv0949
uvrD
DNA-dependent ATPase I and
helicase II
Rv3198c uvrD2
putative UvrD
Rv0427c xthA
exodeoxyribonuclease III
Rv0071
group II intron maturase
Rv0861c probable DNA helicase
Rv0944
possible formamidopyrimidine-DNA glycosylase
Rv1688
probable 3-methylpurine DNA
glycosylase
Rv2090 - partially similar to DNA polymerase I
Rv2191 - similar to both PolC and UvrC proteins
Rv2464c - probable DNA glycosylase, endonuclease VIII
Rv3201c - probable ATP-dependent DNA helicase
Rv3202c - similar to UvrD proteins
Rv3263 - probable DNA methylase
Rv3644c - similar in N-term to DNA polymerase III
6. Protein translation and modification
Rv0429c def polypeptide deformylase
Rv2534c efp elongation factor P
Rv2882c frr ribosome recycling factor
Rv0684 fusA elongation factor G
Rv0120c fusA2 elongation factor G
Rv1080c greA transcription elongation factor G
Rv3462c infA initiation factor IF-1
Rv2839c infB initiation factor IF-2
Rv1641 infC initiation factor IF-3
Rv0009 ppiA peptidyl-prolyl cis-trans isomerase
Rv2582 ppiB peptidyl-prolyl cis-trans isomerase
Rv1299 prfA peptide chain release factor 1
Rv3105c prfB peptide chain release factor 2
Rv2889c tsf elongation factor EF-Ts
Rv0685 tuf elongation factor EF-Tu
7. RNA synthesis, RNA modification and DNA transcription
Rv1253 deaD ATP-dependent DNA/RNA helicase
Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase
Rv2841c nusA transcription termination factor
Rv2533c nusB N-utilization substance protein B
Rv0639 nusG transcription antitermination protein
Rv3907c pcnA polynucleotide polymerase
Rv3232c pvdS alternative sigma factor for siderophore production
Rv3211 rhlE probable ATP-dependent RNA helicase
Rv1297 rho transcription termination factor rho
Rv3457c rpoA α subunit of RNA polymerase
Rv0667 rpoB β subunit of RNA polymerase
Rv0668 rpoC β' subunit of RNA polymerase
Rv1364c rsbU SigB regulation protein
Rv3287c rsbW anti-sigma B factor
Rv2703 sigA RNA polymerase sigma factor (aka MysA, RpoV)
Rv2710 sigB RNA polymerase sigma factor (aka MysB)
Rv2069 sigC ECF subfamily sigma subunit
Rv3414c sigD ECF subfamily sigma subunit
Rv1221 sigE ECF subfamily sigma subunit
Rv3286c sigF ECF subfamily sigma subunit
Rv0182c sigG sigma-70 factors ECF subfamily
Rv3223c sigH ECF subfamily sigma subunit
Rv1189 sigI ECF family sigma factor
Rv3328c sigJ similar to SigI, ECF family
Rv0445c sigK ECF-type sigma factor
Rv0735 sigL sigma-70 factors ECF subfamily
Rv3911 sigM probable sigma factor, similar to SigE
Rv3366 spoU probable rRNA methylase
Rv3455c truA probable pseudouridylate synthase
Rv2793c truB tRNA pseudouridine 55 synthase
Rv1644 tsnR putative 23S rRNA methyltransferase
Rv3649 - ATP-dependent DNA/RNA helicase
8. Polysaccharides (cytoplasmic)
Rv1326c glgB 1,4-α-glucan branching enzyme
Rv1328 glgP probable glycogen phosphorylase
Rv1564c glgX probable glycogen debranching enzyme
Rv1563c glgY putative α-amylase
Rv1562c glgZ maltooligosyltrehalose trehalohydrolase
Rv0126 - probable glycosyl hydrolase
Rv1781c - probable 4-α-glucanotransferase
Rv2471 - probable maltase α-glucosidase
B. Degradation of macromolecules
1. RNA
Rv1014c pth peptidyl-tRNA hydrolase
Rv2925c rnc RNAse III
Rv2444c rne similar at C-term to ribonuclease E
Rv2902c rnhB ribonuclease HII
Rv3923c rnpA ribonuclease P protein component
Rv1340 rphA ribonuclease PH
2. DNA
Rv0670 end endonuclease IV (apurinase)
Rv1108c xseA exonuclease VII large subunit
Rv1107c xseB exonuclease VII small subunit
3. Proteins, peptides and glycopeptides
Rv3305c amiA probable aminohydrolase
Rv3306c amiB probable aminohydrolase
Rv3596c clpC ATP-dependent Clp protease
Rv2461c clpP ATP-dependent Clp protease proteolytic subunit
Rv2460c clpP2 ATP-dependent Clp protease proteolytic subunit
Rv2457c clpX ATP-dependent Clp protease ATP-binding subunit ClpX
Rv2667 clpX' similar to ClpC from M. leprae but shorter
Rv3419c gcp glycoprotease
Rv2725c hflX GTP-binding protein
Rv1223 htrA serine protease
Rv2861c map methionine aminopeptidase
Rv0734 map' probable methionine aminopeptidase
Rv0319 pcp pyrrolidone-carboxylate peptidase
Rv0125 pepA probable serine protease
Rv2213 pepB aminopeptidase A/I
Rv0800 pepC aminopeptidase I
Rv2467 pepD probable aminopeptidase
Rv2089c pepE cytoplasmic peptidase
Rv2535c pepQ cytoplasmic peptidase
Rv2782c pepR protease/peptidase, M16 family (insulinase)
Rv2109c prcA proteasome α-type subunit 1
Rv2110c prcB proteasome β-type subunit 2
Rv0782 ptrBa protease II, α subunit
Rv0781 ptrBb protease II, β subunit
Rv0724 sppA protease IV, signal peptide peptidase
Rv0198c - probable zinc metalloprotease
Rv0457c - probable peptidase
Rv0840c - probable proline iminopeptidase
Rv0983 - probable serine protease
Rv1977 - probable zinc metallopeptidase
Rv3668c - probable alkaline serine protease
Rv3671c - probable serine protease
Rv3883c - probable secreted protease
Rv3886c - protease
4. Polysaccharides, lipopolysaccharides and phospholipids
Rv0062 celA cellulase/endoglucanase
Rv3915 cwlM hydrolase
Rv0315 - probable β-1,3-glucanase
Rv1090 - probable inactivated cellulase/endoglucanase
Rv1327c - probable glycosyl hydrolase, α-amylase family
Rv1333 - probable hydrolase
Rv3463 - probable neuraminidase
Rv3717 - possible N-acetylmuramoyl-L-alanine amidase
5. Esterases and lipases
Rv0220
lipC
probable esterase
Rv1923
lipD
probable esterase
Rv3775
lipE
probable hydrolase
Rv3487c lipF
probable esterase
Rv0646c lipG
probable hydrolase
Rv1399c lipH
probable lipase
Rv1400c lipI
probable lipase
Rv1900c lipJ
probable esterase
Rv2385
lipK
probable acetyl-hydrolase
Rv1497
lipL
esterase
Rv2284
lipM
probable esterase
Rv2970c lipN
probable lipase/esterase
Rv1426c lipO
probable esterase
Rv2463
lipP
probable esterase
Rv2485c lipQ
probable carboxylesterase
Rv3084
lipR
probable acetyl-hydrolase
Rv3176c lipS
probable esterase/lipase
Rv2045c lipT
probable carboxylesterase
Rv1076
lipU
probable esterase
Rv3203
lipV
probable lipase
Rv0217c lipW
probable esterase
Rv2351c plcA
phospholipase C precursor
Rv2350c plcB
phospholipase C precursor
Rv2349c plcC
phospholipase C precursor
Rv1755c plcD
partial CDS for phospholipase C
Rv1104
probable esterase pseudogene
Rv1105
probable esterase pseudogene
6. Aromatic hydrocarbons
Rv3469c mhpE
probable 4-hydroxy-2-oxovalerate
aldolase
Rv0316
probable muconolactone isomerase
Rv0771
probable 4-carboxymuconolactone decarboxylase
Rv0939
probable dehydrase
Rv1723 - 6-aminohexanoate-dimer hydrolase
Rv2715 - 2-hydroxymuconic semialdehyde hydrolase
Rv3530c - probable cis-diol dehydrogenase
Rv3534c - 4-hydroxy-2-oxovalerate aldolase
Rv3536c - aromatic hydrocarbon degradation
C. Cell envelope
1. Lipoproteins (lppA-lpr0) 65
2. Surface polysaccharides, lipopolysaccharides, proteins and antigens
Rv0806c cpsY
probable UDP-glucose-4-epimerase
Rv3811
csp
secreted protein
Rv1677
dsbF
highly similar to C-term Mpt53
Rv3794
embA
involved in arabinogalactan synthesis
Rv3795
embB
involved in arabinogalactan synthesis
Rv3793
embC
involved in arabinogalactan synthesis
Rv3875
esat6
early secretory antigen target
Rv0112
gca
probable GDP-mannose dehydratase
Rv0113
gmhA
phosphoheptose isomerase
Rv2965c kdtB
lipopolysaccharide core biosynthesis protein
Rv2878c mpt53
secreted protein Mpt53
Rv1980c mpt64
secreted immunogenic protein
Mpb64/Mpt64
Rv2875
mpt70
major secreted immunogenic protein Mpt70 precursor
Rv2873
mpt83
surface lipoprotein Mpt83
Rv0899
ompA
member of OmpA family
Rv3810
pirG
cell surface protein precursor (Erp
protein)
Rv3782
rfbE
similar to rhamnosyl transferase
Rv1302
rfe
undecaprenyl-phosphate α-N-acetylglucosaminyltransferase
Rv2145c wag31
antigen 84 (aka wag31)
Rv0431
tuberculin related peptide (AT103)
Rv0954
cell envelope antigen
Rv1514c involved in polysaccharide synthesis
Rv1518
involved in exopolysaccharide
synthesis
Rv1758
partial cutinase
Rv1910c probable secreted protein
Rv1919c weak similarity to pollen antigens
Rv1984c probable secreted protein
Rv1987
probable secreted protein
Rv2223c probable exported protease
Rv2224c probable exported protease
Rv2301
probable cutinase
Rv2345
precursor of probable membrane
protein
Rv2672
putative exported protease
Rv3019c similar to Esat6
Rv3036c probable secreted protein
Rv3449
probable precursor of serine protease
Rv3451
probable cutinase
Rv3452
probable cutinase precursor
Rv3724
probable cutinase precursor
3. Murein sacculus and peptidoglycan
Rv2911 dacB penicillin binding protein
Rv2981c ddlA D-alanine-D-alanine ligase A
Rv3809c glf UDP-galactopyranose mutase
Rv1018c glmU UDP-N-acetylglucosamine pyrophosphorylase
Rv3382c lytB LytB protein homologue
Rv1110 lytB' very similar to LytB
Rv1315 murA UDP-N-acetylglucosamine-1-carboxyvinyltransferase
Rv0482 murB UDP-N-acetylenolpyruvoylglucosamine reductase
Rv2152c murC UDP-N-acetyl-muramate-alanine ligase
Rv2155c murD UDP-N-acetylmuramoylalanine-D-glutamate ligase
Rv2158c murE meso-diaminopimelate-adding enzyme
Rv2157c murF D-alanine:D-alanine-adding enzyme
Rv2153c murG transferase in peptidoglycan synthesis
Rv1338 murI glutamate racemase
Rv2156c murX phospho-N-acetylmuramoyl-pentapeptide transferase
Rv3332 nagA N-acetylglucosamine-6-P-deacetylase
Rv0016c pbpA penicillin-binding protein
Rv2163c pbpB penicillin-binding protein 2
Rv0050 ponA penicillin-binding protein
Rv3682 ponA' class A penicillin binding protein
Rv0017c rodA FtsW/RodA/SpovE family
Rv0907 - probable penicillin binding protein
Rv1367c - probable penicillin binding protein
Rv1730c - probable penicillin binding protein
Rv1922 - probable penicillin binding protein
Rv2864c - probable penicillin binding protein
Rv3330 - probable penicillin binding protein
Rv3627c - probable penicillin binding protein
4. Conserved membrane proteins
Rv0402c mmpL1 conserved large membrane
protein
Rv0507
mmpL2 conserved large membrane
protein
Rv0206c mmpL3 conserved large membrane
protein
Rv0450c mmpL4 conserved large membrane
protein
Rv0676c mmpL5 conserved large membrane
protein
Rv1557
mmpL6 conserved large membrane
protein
Rv2942
mmpL7 conserved large membrane
protein
Rv3823c mmpL8 conserved large membrane
protein
Rv2339
mmpL9 conserved large membrane
protein
Rv1183
mmpL10 conserved large membrane
protein
Rv0202c mmpL11 conserved large membrane
protein
Rv1522c mmpL12 conserved large membrane
protein
Rv0403c mmpS1 conserved small membrane
protein
Rv0506
mmpS2 conserved small membrane
protein
Rv2198c mmpS3 conserved small membrane
protein
Rv0451c mmpS4 conserved small membrane
protein
Rv0677c mmpS5 conserved small membrane
protein
5. Other membrane proteins 211
III. Cell processes
A. Transport/binding proteins
1. Amino acids
Rv2127
ansP
L-asparagine permease
Rv0346c aroP2
probable aromatic amino acid
permease
Rv0917
betP
glycine betaine transport
Rv1704c cycA
transport of D-alanine, D-serine
and glycine
Rv3666c dppA
probable peptide transport system
permease
Rv3665c dppB
probable peptide transport system
permease
Rv3664c dppC
probable peptide transport system
permease
Rv3663c dppD
probable ABC-transporter
Rv0522
gabP
probable 4-amino butyrate transporter
Rv0411c glnH
putative glutamine binding protein
Rv2564
glnQ
probable ATP-binding transport
protein
Rv1280c oppA
probable oligopeptide transport
protein
Rv1283c oppB
oligopeptide transport protein
Rv1282c oppC
oligopeptide transport system permease
Rv1281c oppD
probable peptide transport protein
Rv2320c rocE
arginine/ornithine transporter
Rv3253c probable cationic amino acid
transport
Rv3454
possible proline permease
2. Cations
Rv2920c amt putative ammonium transporter
Rv1607 chaA putative calcium/proton antiporter
Rv1239c corA probable magnesium and cobalt transport protein
Rv0092 ctpA cation-transporting ATPase
Rv0103c ctpB cation transport ATPase
Rv3270 ctpC cation transport ATPase
Rv1469 ctpD probable cadmium-transporting ATPase
Rv0908 ctpE probable cation transport ATPase
Rv1997 ctpF probable cation transport ATPase
Rv1992c ctpG probable cation transport ATPase
Rv0425c ctpH C-terminal region putative cation-transporting ATPase
Rv0107c ctpI probable magnesium transport ATPase
Rv0969 ctpV cation transport ATPase
Rv3044 fecB putative FeIII-dicitrate transporter
Rv0265c fecB2 iron transport protein FeIII dicitrate transporter
Rv1029 kdpA potassium-transporting ATPase A chain
Rv1030 kdpB potassium-transporting ATPase B chain
Rv1031 kdpC potassium-transporting ATPase C chain
Rv3236c kefB probable glutathione-regulated potassium-efflux protein
Rv2877c merT possible mercury resistance transport system
Rv1811 mgtC probable magnesium transport ATPase protein C
Rv0362 mgtE putative magnesium ion transporter
Rv2856 nicT probable nickel transport protein
Rv0924c nramp transmembrane protein belonging to Nramp family
Rv2691 trkA probable potassium uptake protein
Rv2692 trkB probable potassium uptake protein
Rv2287 yjcE probable Na+/H+ exchanger
Rv2723 - probable membrane protein, tellurium resistance
Rv3162c - probable membrane protein
Rv3237c - possible potassium channel protein
Rv3743c - probable cation-transporting ATPase
3. Carbohydrates, organic acids and alcohols
Rv2443
dctA
C4-dicarboxylate transport protein
Rv3476c kgtP
sugar transport protein
Rv1902c nanT
probable sialic acid transporter
Rv1236
sugA
membrane protein probably
involved in sugar transport
Rv1237
sugB
sugar transport protein
Rv1238
sugC
ABC transporter component of
sugar uptake system
Rv3331
sugI
probable sugar transport protein
Rv2835c ugpA
sn-glycerol-3-phosphate
permease
Rv2833c ugpB
sn-glycerol-3-phosphate-binding
periplasmic lipoprotein
Rv2832c ugpC
sn-glycerol-3-phosphate transport
ATP-binding protein
Rv2834c ugpE
sn-glycerol-3-phosphate transport
system protein
Rv2316
uspA
sugar transport protein
Rv2318
uspC
sugar transport protein
Rv2317
uspE
sugar transport protein
Rv1200
probable sugar transporter
Rv2038c probable ABC sugar transporter
Rv2039c probable sugar transporter
Rv2040c probable sugar transporter
Rv2041c probable sugar transporter
4. Anions
Rv2684 arsA probable arsenical pump
Rv2685 arsB probable arsenical pump
Rv3578 arsB2 probable arsenical pump
Rv2643 arsC probable arsenical pump
Rv2397c cysA sulphate transport ATP-binding protein
Rv2399c cysT sulphate transport system permease protein
Rv2398c cysW sulphate transport system permease protein
Rv1857 modA molybdate binding protein
Rv1858 modB transport system permease, molybdate uptake
Rv1859 modC molybdate uptake ABC-transporter
Rv1860 modD precursor of Apa (45/47 kD secreted protein)
Rv2329c narK1 probable nitrite extrusion protein
Rv1737c narK2 nitrite extrusion protein
Rv0261c narK3 nitrite extrusion protein 1
Rv0267 narU similar to nitrite extrusion protein 2
Rv0934 phoS1 PstS component of phosphate uptake
Rv0928 phoS2 PstS component of phosphate uptake
Rv0820 phoT phosphate transport system ABC transporter
Rv3301c phoY1 phosphate transport system regulator
Rv0821c phoY2 phosphate transport system regulator
Rv0545c pitA low-affinity inorganic phosphate transporter
Rv2281 pitB phosphate permease
Rv0930 pstA1 PstA component of phosphate uptake
Rv0936 pstA2 PstA component of phosphate uptake
Rv0933 pstB ABC transport component of phosphate uptake
Rv0935 pstC PstC component of phosphate uptake
Rv0929 pstC2 membrane-bound component of phosphate transport system
Rv0932c pstS PstS component of phosphate uptake
Rv2400c subI sulphate binding precursor
Rv0143c - probable chloride channel
Rv1707 - probable sulphate permease
Rv1739c - possible sulphate transporter
Rv3679 - possible anion transporter
Rv3680 - probable anion transporter
5. Fatty acid transport
Rv2790c ltp1
non-specific lipid transport protein
Rv3540c ltp2
non-specific lipid transport protein
6. Efflux proteins
Rv2936 drrA similar daunorubicin resistance ABC-transporter
Rv2937 drrB similar daunorubicin resistance transmembrane protein
Rv2938 drrC similar daunorubicin resistance transmembrane protein
Rv2846c efpA putative efflux protein
Rv3065 emrE resistance to ethidium bromide
Rv0783c - multidrug resistance protein
Rv0849 - possible quinolone efflux pump
Rv1145 - probable drug transporter
Rv1146 - probable drug transporter
Rv1250 - probable drug efflux protein
Rv1258c - probable multidrug resistance pump
Rv1410c - probable drug efflux protein
Rv1634 - probable drug efflux protein
Rv1819c - probable multidrug resistance pump
Rv2136c - putative bacitracin resistance protein
Rv2209 - probable drug efflux protein
Rv2333c - probable tetracenomycin C resistance protein
Rv2994 - probable fluoroquinolone efflux protein
Rv1877 - probable drug efflux protein
Rv2459 - probable drug efflux protein
B. Chaperones/Heat shock
Rv0384c clpB
heat shock protein
Rv0352
dnaJ
acts with GrpE to stimulate DnaK
ATPase
Rv2373c dnaJ2
DnaJ homologue
Rv0350
dnaK
70 kD heat shock protein, chromosome replication
Rv3417c groEL1 60 kD chaperonin 1
Rv0440
groEL2 60 kD chaperonin 2
Rv3418c groES
10 kD chaperone
Rv0351
grpE
stimulates DnaK ATPase activity
Rv2374c hrcA
heat-inducible transcription
repressor
Rv0251c hsp
possible heat shock protein
Rv0353
hspR
heat shock regulator
Rv2031c hspX
14kD antigen, heat shock protein
Hsp20 family
Rv2299c htpG
heat shock protein Hsp90 family
Rv0563
htpX
probable (transmembrane) heat
shock protein
Rv2701c suhB
putative extragenic suppressor
protein
Rv3269
probable heat shock protein
C. Cell division
Rv3641c fic possible cell division protein
Rv3102c ftsE membrane protein
Rv3610c ftsH inner membrane protein, chaperone
Rv2748c ftsK chromosome partitioning
Rv2151c ftsQ ingrowth of wall at septum
Rv2154c ftsW membrane protein (shape determination)
Rv3101c ftsX membrane protein
Rv2921c ftsY cell division protein FtsY
Rv2150c ftsZ circumferential ring, GTPase
Rv3919c gid glucose inhibited division protein B
Rv3625c mesJ probable cell cycle protein
Rv3917c parA chromosome partitioning; DNA binding
Rv3918c parB possibly involved in chromosome partitioning
Rv2922c smc member of Smc1/Cut3/Cut14 family
Rv0012 - possible cell division protein
Rv0435c - ATPase of AAA-family
Rv2115c - ATPase of AAA-family
Rv3213c - possible role in chromosome segregation
Rv1708 - possible role in chromosome partitioning
D. Protein and peptide secretion
Rv2916c ffh
signal recognition particle protein
Rv2903c lepB
signal peptidase I
Rv1614
lgt
prolipoprotein diacylglyceryl transferase
Rv1539
lspA
lipoprotein signal peptidase
Rv0379
sec
probable transport protein
SecE/Sec61- γ family
Rv3240c secA SecA, preprotein translocase subunit
Rv1821 secA2 SecA, preprotein translocase subunit
Rv2587c secD protein-export membrane protein
Rv0638 secE SecE preprotein translocase
Rv2586c secF protein-export membrane protein
Rv1440 secG protein-export membrane protein SecG
Rv0732 secY SecY subunit of preprotein translocase
Rv2462c tig chaperone protein, similar to trigger factor
Rv2813 - probable general secretion pathway protein
E. Adaptations and atypical conditions
Rv1901
cinA
competence damage protein
Rv3648c cspA
cold shock protein, transcriptional
regulator
Rv0871
cspB
probable cold shock protein
Rv3063
cstA
starvation-induced stress
response protein
Rv3490
otsA
probable α,α-trehalose-phosphate
synthase
Rv2006
otsB
trehalose-6-phosphate phosphatase
Rv3372
otsB2
trehalose-6-phosphate phosphatase
Rv3758c proV
osmoprotection ABC transporter
Rv3757c proW
transport system permease
Rv3759c proX
similar to osmoprotection proteins
Rv3756c proZ
transport system permease
Rv1026
probable pppGpp-5'-phosphohydrolase
F. Detoxification
Rv2428 ahpC alkyl hydroperoxide reductase
Rv2429 ahpD member of AhpC/TSA family
Rv2238c ahpE member of AhpC/TSA family
Rv2521 bcp bacterioferritin comigratory protein
Rv1608c bcpB probable bacterioferritin comigratory protein
Rv3473c bpoA probable non-heme bromoperoxidase
Rv1123c bpoB probable non-heme bromoperoxidase
Rv0554 bpoC probable non-heme bromoperoxidase
Rv3617 ephA probable epoxide hydrolase
Rv1938 ephB probable epoxide hydrolase
Rv1124 ephC probable epoxide hydrolase
Rv2214c ephD probable epoxide hydrolase
Rv3670 ephE probable epoxide hydrolase
Rv0134 ephF probable epoxide hydrolase
Rv3171c hpx probable non-heme haloperoxidase
Rv1908c katG catalase-peroxidase
Rv3846 sodA superoxide dismutase
Rv0432 sodC superoxide dismutase precursor (Cu-Zn)
Rv1932 tpx thiol peroxidase
Rv0634c - putative glyoxylase II
Rv2581c - putative glyoxylase II
Rv3177 - probable non-heme haloperoxidase
IV. Other
A. Virulence
Rv0169 mce1 cell invasion protein
Rv0589 mce2 cell invasion protein
Rv1966 mce3 cell invasion protein
Rv3499c mce4 cell invasion protein
Rv3100c smpB probable small protein b
Rv1694 tlyA cytotoxin/hemolysin homologue
Rv0024 - putative p60 homologue
Rv0167 - part of mce1 operon
Rv0168 - part of mce1 operon
Rv0170 - part of mce1 operon
Rv0171 - part of mce1 operon
Rv0172 - part of mce1 operon
Rv0174 - part of mce1 operon
Rv0587 - part of mce2 operon
Rv0588 - part of mce2 operon
Rv0590 - part of mce2 operon
Rv0591 - part of mce2 operon
Rv0592 - part of mce2 operon
Rv0594 - part of mce2 operon
Rv1085c - possible hemolysin
Rv1477 - putative exported p60 protein homologue
Rv1478 - putative exported p60 protein homologue
Rv1566c - putative exported p60 protein homologue
Rv1964 - part of mce3 operon
Rv1965 - part of mce3 operon
Rv1967 - part of mce3 operon
Rv1968 - part of mce3 operon
Rv1969 - part of mce3 operon
Rv1971 - part of mce3 operon
Rv2190c - putative p60 homologue
Rv3494c - part of mce4 operon
Rv3496c - part of mce4 operon
Rv3497c - part of mce4 operon
Rv3498c - part of mce4 operon
Rv3500c - part of mce4 operon
Rv3501c - part of mce4 operon
Rv3896c - putative p60 homologue
Rv3922c - possible hemolysin
B. IS elements, Repeated sequences, and Phage
1. IS elements
IS6110 16 copies
IS1081 6 copies
Others 37 copies
2. REP13E12 family 7 copies
3. Phage-related functions
Rv2894c xerC integrase/recombinase
Rv1701 xerD integrase/recombinase
Rv1054 integrase-a
Rv1055 integrase-b
Rv1573 phiRV1 phage related protein
Rv1574 phiRV1 phage related protein
Rv1575 phiRV1 phage related protein
Rv1576c phiRV1 phage related protein
Rv1577c phiRV1 possible prohead protease
Rv1578c phiRV1 phage related protein
Rv1579c phiRV1 phage related protein
Rv1580c phiRV1 phage related protein
Rv1581c phiRV1 phage related protein
Rv1582c phiRV1 phage related protein
Rv1583c phiRV1 phage related protein
Rv1584c phiRV1 phage related protein
Rv1585c phiRV1 phage related protein
Rv1586c phiRV1 integrase
Rv2309c integrase
Rv2310 excisionase
Rv2646 phiRV2 integrase
Rv2647 phiRV2 phage related protein
Rv2650c phiRV2 phage related protein
Rv2651c phiRV2 prohead protease
Rv2652c phiRV2 phage related protein
Rv2653c phiRV2 phage related protein
Rv2654c phiRV2 phage related protein
Rv2655c phiRV2 phage related protein
Rv2656c phiRV2 phage related protein
Rv2657c similar to gp36 of mycobacteriophage L5
Rv2658c phiRV2 phage related protein
Rv2659c phiRV2 integrase
Rv2830c similar to phage P1 phd gene
Rv3750c excisionase
Rv3751 putative integrase
C. PE and PPE families
1. PE family
PE subfamily 38 members
PE_PGRS subfamily 61 members
2. PPE family 68 members
D. Antibiotic production and resistance
Rv2068c blaC class A β-lactamase
Rv3290c lat lysine-ε aminotransferase
Rv2043c pncA pyrazinamide resistance/sensitivity
Rv0133 possible puromycin N-acetyltransferase
Rv0262c aminoglycoside 2'-N-acetyltransferase
Rv0802c acetyltransferase
Rv1082 similar to S. lincolnensis lmbE
Rv1170 similar to S. lincolnensis lmbE
Rv1347c possible aminoglycoside 6'-N-acetyltransferase
Rv2036 similar to lincomycin production genes
Rv2303c similar to S. griseus macrotetrolide resistance protein
Rv3225c probable aminoglycoside 3'-phosphotransferases
Rv3700c probable acetyltransferase
Rv3817 probable aminoglycoside 3'-phosphotransferase
E. Bacteriocin-like proteins 3
F. Cytochrome P450 enzymes 22
G. Coenzyme F420-dependent enzymes 3
H. Miscellaneous transferases 61
I. Miscellaneous phosphatases, lyases, and hydrolases 18
J. Cyclases 6
K. Chelatases 2
V. Conserved hypotheticals 912
VI. Unknowns 606
TOTAL 3924