The Plant Journal (2015)
doi: 10.1111/tpj.12865
Full-length transcriptome sequences and splice variants,
obtained by a combination of sequencing platforms applied
to different root tissues of Salvia miltiorrhiza, and tanshinone
biosynthesis
Zhichao Xu1,†, Reuben J. Peters2,†, Jason Weirather3, Hongmei Luo1, Baosheng Liao1, Xin Zhang1, Yingjie Zhu4, Aijia Ji1,
Bing Zhang5, Songnian Hu5, Kin Fai Au3, Jingyuan Song1,* and Shilin Chen1,4,*
1 Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China,
2 Department of Biochemistry, Biophysics & Molecular Biology, Iowa State University, Ames, IA 50011, USA,
3 Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA,
4 Institute of Chinese Materia Medica, Chinese Academy of Chinese Medical Science, Beijing 100700, China, and
5 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
Received 2 March 2015; revised 19 April 2015; accepted 21 April 2015.
*For correspondence (e-mails [email protected]; [email protected]).
† These authors contributed equally to this work.
SUMMARY
Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine,
wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations
of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology,
and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these
efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and
single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to
provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several
involved in isoprenoid/terpenoid metabolism.
Keywords: alternative splicing, next-generation sequencing, Salvia miltiorrhiza, single-molecule real-time
sequencing, tanshinone biosynthesis.
INTRODUCTION
Salvia miltiorrhiza Bunge is considered a model medicinal
plant in traditional Chinese medicine (TCM) research
because of its significant medicinal value, relatively small
genome (approximately 538 Mb), short life cycle, efficient
transgenic system, and uncomplicated tissue culture
requirements (Ma et al., 2012). Termed danshen, S. miltiorrhiza is one of the most commonly used herbs in TCM,
wherein its dried root or rhizome is highly valued. Danshen
is best known for its use in the treatment of cardiovascular
diseases, and exhibits strong antioxidative activity (Dong
et al., 2011), leading to extensive interest in potential use
© 2015 The Authors
The Plant Journal © 2015 John Wiley & Sons Ltd
for modern clinical trials (Qiu, 2007). Indeed, Compound
Danshen Dripping Pills from the Tasly Pharmaceutical
Group Co. Ltd. underwent a phase-II trial in 2010, and are
currently undergoing a phase-III trial for an investigational
new drug (IND) by the Food and Drug Administration
(FDA) (Zhao et al., 2015). The major bioactive constituents
of S. miltiorrhiza are lipophilic diterpenoid pigments and
hydrophilic phenolic acids (Wang et al., 2007). More than
40 lipophilic diterpenoids and 20 hydrophilic phenolic
acids have been isolated and identified from S. miltiorrhiza, including tanshinone I, tanshinone IIA, cryptotanshinone, dihydrotanshinone, salvianolic acid A, salvianolic
acid B, rosmarinic acid, lithospermic acid and dihydroxyphenyllactic acid. Elucidating the biosynthetic pathways
and regulatory mechanisms of the active constituents will
provide a foundation for investigating the use of danshen
in TCM, and the potential production of these natural products as innovative pharmaceutical materials (Kai et al.,
2011).
As a result of the interest in the medicinal properties of
danshen there has been extensive investigation of its transcriptome. An early report used unsequenced cDNAs from
hairy root cultures to construct a microarray, with differential expression correlated with either culture time/development or induction (both of which are associated with
tanshinone accumulation), which was used to highlight
cDNAs for sequencing (Ge and Wu, 2005). The tanshinones
are labdane-related diterpenoids (Peters, 2010), the biosynthesis of which requires a copalyl diphosphate synthase
(CPS) and subsequently acting cyclase related to the kaurene synthases involved in gibberellin phytohormone
metabolism, which is often termed kaurene synthase-like
(KSL). Accordingly, the functional characterization of the
two inducible diterpene synthases found in the microarray
study (SmCPS1 and SmKSL1) led to the identification of
the resulting diterpene olefin precursor to the tanshinones,
miltiradiene. Later, next-generation sequencing (NGS)-based RNA-Seq analysis of similarly induced hairy root
cultures led to the identification and functional characterization of a cytochrome P450 (CYP) involved in tanshinone
biosynthesis, CYP76AH1 (Guo et al., 2013), which carries
out the initial hydroxylation of aromatized miltiradiene to
form ferruginol.
Other transcriptomic studies have been reported, including an untargeted expressed sequence tag (EST) effort
using whole plantlets that yielded partial sequences for
approximately 4000 different unigenes (Yan et al., 2010),
and RNA-seq analyses of the transcriptome from growing
plants (Hua et al., 2011) or induced leaves (Luo et al.,
2014). The short-read sequences generated by NGS generally prevented the assembly of full-length transcripts, however, necessitating additional effort to clone cDNAs of
potential interest (e.g. for SmCPS1, SmKSL1 and
CYP76AH1; Guo et al., 2013), and the reported average
lengths of the isotigs from the previous RNA-Seq investigations are <500 bp. In addition, these previous studies did
not dissect the root finely enough to localize tanshinone
production and accumulation for more informative co-expression studies.
Single-molecule real-time (SMRT) sequencing carried
out on the PACBIO RS (Pacific Biosciences of California, Inc, http://www.pacificbiosciences.com/) provides a third-generation
sequencing platform that is widely used in genome
sequencing because of its long reads (average 4–8 kb;
Chaisson et al., 2014; Chen et al., 2014b). Moreover, recent
studies have addressed the problem of the higher error
rate (up to 15%) observed with SMRT sequencing, by correction with NGS reads (Au et al., 2013) and/or self-correction via circular-consensus (CCS) reads (Li et al., 2014). The
use of SMRT sequencing then offers access to more complete (i.e. full-length) transcriptome data, as has been
recently demonstrated (Au et al., 2013; Sharon et al., 2013;
Chen et al., 2014a). Here we combined NGS and SMRT
sequencing to generate a more complete/full-length S. miltiorrhiza transcriptome. Moreover, this approach was
applied to dissected root samples, enabling a more precise
correlation of co-expression data for the resulting transcriptional data to the periderm, where tanshinones are
produced and accumulated. Accordingly, this study provides a valuable resource for further investigation of tanshinone biosynthesis.
RESULTS
Localization of tanshinone accumulation
It is the rhizome or root of S. miltiorrhiza that is used in
TCM, accounting for the value of hairy root cultures in
studies of this model medicinal herb. The rhizome/root of
S. miltiorrhiza exhibits a characteristic reddish brown
color, stemming from the tanshinones, which are largely
found in the periderm, as can be readily appreciated by
simply peeling or viewing a cross section of this organ
(Figure 1a–c). Phytochemical analysis of the peeled tissues,
roughly corresponding to the periderm, phloem and
xylem, respectively (Figure 1d), demonstrates the localization of the tanshinones to the periderm (e.g. tanshinone
IIA; Figure 1e). These results suggest that tanshinone biosynthesis may be completely carried out in this tissue, providing a potential basis for co-expression analysis.
Combined sequencing approach to the roots of danshen
To identify and differentiate the periderm transcriptome
from that of the rest of the root, two experiments were
undertaken, using either the NGS or the SMRT sequencing
platforms (ILLUMINA; Illumina, Inc, http://www.illumina.com/
and PACBIO, respectively). First, nine mRNA samples from
three different root tissues (periderm, phloem and xylem;
each in triplicate) were subjected to 2 × 100 paired-end
sequencing using the HiSeq 2500 platform, with
489 309 772 reads produced (Table S1). Second, full-length
cDNAs from nine pooled poly(A) RNA samples were normalized and subjected to SMRT sequencing using the
PACBIO RS platform. In total, 1 202 336 raw reads (4.8 billion
bases) were generated by PACBIO RS. After filtering using the
RS_Subreads.1 of PACBIO RS, 796 011 subreads representing
4.3 billion bases were obtained. Next, we performed RS_IsoSeq.1 protocols, which included Classify, Cluster and
Map to the reference genome, to generate CCS data, as
this provides much more accurate sequence information
Figure 1. Morphology and microstructure of the root of Salvia miltiorrhiza.
(a) The roots of S. miltiorrhiza.
(b) The roots were peeled into three parts, which roughly correlate to the periderm, phloem and xylem.
(c) The root tissues under a stereomicroscope; the radius of the root was 0.75 cm.
(d) Paraffin section of the root. Three tissues were clearly identified.
(e) The chromatogram and corresponding histograms indicate the differences in tanshinone IIA levels from the three different tissues.
from reads that pass at least three times through the insert
(Sharon et al., 2013), and obtained 70 761 multipass consensus reads, all generated from the <1 and 1–2 kb
libraries, as it proved to be too difficult to produce consensus CCS reads from the 2–3 and >3 kb libraries, because of
their larger insert lengths. In total, 223 368 full-length reads
were obtained, as indicated by detection of the poly(A), as
well as 5′ and 3′ primer, sequences.
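The full-length criterion described above (both primers plus a poly(A) tail) can be illustrated with a minimal sketch. This is not the RS_IsoSeq.1 Classify implementation: it uses exact string matching, whereas real classifiers tolerate sequencing errors. The primer sequence is the 5′ PCR Primer IIA given in the Experimental Procedures; the function names and the `min_poly_a` and `window` parameters are hypothetical choices made for illustration.

```python
# Primer sequence taken from the Experimental Procedures (5' PCR Primer IIA).
PRIMER = "AAGCAGTGGTATCAACGCAGAGTAC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_full_length(read: str, min_poly_a: int = 20, window: int = 100) -> bool:
    """Hypothetical exact-match test for a full-length cDNA read.

    In sense orientation the read is expected to start with the primer and
    to end with a poly(A) stretch followed by the reverse complement of the
    primer. Reads from the opposite strand are checked after reverse
    complementing. No mismatches are tolerated in this sketch.
    """
    primer_rc = reverse_complement(PRIMER)
    read = read.upper()
    for seq in (read, reverse_complement(read)):
        has_5p = PRIMER in seq[:window]      # 5' primer near the start
        tail = seq[-window:]
        pos = tail.rfind(primer_rc)          # 3' primer near the end
        has_3p = pos != -1
        # poly(A) must lie upstream of the 3' primer
        has_poly_a = has_3p and ("A" * min_poly_a) in tail[:pos]
        if has_5p and has_3p and has_poly_a:
            return True
    return False
```

Under these assumptions, a read lacking either primer or the poly(A) stretch is simply reported as not full length; no attempt is made to recover partial transcripts.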
All of the SMRT subreads were mapped against the
S. miltiorrhiza genome, with 96% of the reads successfully
mapped using BLAT (Kent, 2002; Figure S1). To resolve the
high error rates of the subreads, all 796 011 SMRT subreads were corrected using the approximately 500 million
NGS reads as input data (Figure 2; Au et al., 2012). After
removing the redundant sequences for all SMRT subreads
using CD-HIT-EST (c = 0.90), 160 468 non-redundant reads
were produced, with a mean read length of 2059 bases.
Besides those coding for proteins, 11 046 of these reads
were predicted to be long (more than 200 bases) non-coding RNAs using the coding potential calculator (CPC) for
non-redundant long reads.
Even though the coverage was quite high (approximately 200×), the transcripts assembled from the ILLUMINA
short reads by Trinity largely did not represent full-length
cDNAs. Approximately 61% of the assembled transcripts
from NGS reads were <600 bases, whereas only 4% of the
transcripts from the PACBIO reads were <600 bases (Figure 2). Indeed, the mean full-length read lengths from the
different libraries (<1, 1–2, 2–3, and >3 kb) produced by
SMRT sequencing were 923, 1283, 2026, and 3020 bases,
respectively (Table S1). Nevertheless, from this study it
seems that the use of NGS data to correct the low-quality
SMRT reads may be better than simply relying on CCS
reads.
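The length comparison above rests on two simple summary statistics: the share of transcripts shorter than a cut-off (600 bases here) and, as quoted in the Discussion, the N50. The following sketch shows how both can be computed from a list of sequence lengths. The function names are illustrative and are not taken from any tool used in this study.

```python
def fraction_shorter_than(lengths, cutoff=600):
    """Fraction of sequences whose length is below the cut-off."""
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if 2 * running >= total:
            return n
    raise ValueError("lengths must be non-empty")
```

Applied to the lengths of the Trinity isotigs and of the corrected SMRT reads, these two functions would yield the kind of figures compared in the text, although the exact values depend on the filtering applied beforehand.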
In total, from the NGS data, using a cut-off of
FPKM > 10 (fragments per kilobase of exon model per
million mapped reads), expression from 12 667 distinct
gene loci was detected in the root, with 11 174 expressed
Figure 2. (a) Comparison of transcript length distribution from different sequencing platforms.
(b) Comparison of PACBIO read quality from subreads
and corrected reads.
in the periderm, 11 149 in the phloem and 10 933 in the
xylem. Some genes were uniquely expressed in a single
root tissue: 939 in the periderm, 347 in the phloem and
422 in the xylem. Thus, it is possible to distinguish
between the transcriptomes from each of these root tissues/sections.
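To make the expression cut-off concrete, the sketch below computes FPKM from its standard definition (fragments × 10^9 divided by the product of exon length in bases and the total number of mapped fragments) and then lists genes that pass the cut-off in exactly one tissue. Treating "uniquely expressed" as "FPKM > 10 in one tissue and not in the others" is an assumption made for illustration; the gene identifiers and values in the usage note are invented toy data.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def expressed_genes(fpkm_by_gene, cutoff=10.0):
    """Set of genes whose FPKM exceeds the cut-off."""
    return {gene for gene, value in fpkm_by_gene.items() if value > cutoff}


def tissue_unique(expressed_by_tissue):
    """Genes expressed in one tissue and in none of the others (assumed definition)."""
    unique = {}
    for tissue, genes in expressed_by_tissue.items():
        others = set()
        for other_tissue, other_genes in expressed_by_tissue.items():
            if other_tissue != tissue:
                others |= other_genes
        unique[tissue] = genes - others
    return unique
```

For example, with toy values `{"periderm": {"g1": 50.0, "g2": 12.0}, "phloem": {"g1": 30.0, "g2": 2.0}}`, applying `expressed_genes` to each tissue and passing the result to `tissue_unique` would mark `g2` as periderm-specific.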
Expression analysis indicates co-localization of tanshinone
biosynthesis and accumulation
In order for the periderm-localized expression of SmCPS1
to be relevant to tanshinone biosynthesis there must be
similarly localized production of its substrate, the general
diterpenoid precursor (E,E,E)-geranylgeranyl diphosphate
(GGPP). In turn, this results from the addition of the general isoprenoid precursor isopentenyl diphosphate (IPP) to
allylic diphosphate isoprenyls, beginning with dimethylallyl diphosphate (DMAPP). IPP and DMAPP are double-bond isomers interconverted by IPP isomerase (IPI). IPP
and DMAPP are produced by the 2-C-methyl-D-erythritol 4-phosphate (MEP)-dependent pathway in the plastid,
where diterpenoid biosynthesis is initiated, although IPP
can be imported from the cytosol, where it is produced
by the distinct mevalonate (MVA)-dependent pathway (Zi
et al., 2014). Accordingly, we investigated the root tissue-specific expression of the isogenes encoding the enzymes
that make up both the MEP- and MVA-dependent isoprenoid precursor pathways, as well as potential GGPP synthases (GGPSs) and IPI. Consistent with the localized
production of the tanshinones in the periderm, analysis of
our root tissue-specific transcriptome data set revealed
Figure 3. Heat map depicting the expression profile of isoprenoid and more specifically tanshinone biosynthesis-related genes in the periderm, phloem and
xylem tissues of Salvia miltiorrhiza.
(a) Transcript abundance profiles of enzymatic genes from the MEP pathway.
(b) Transcript abundance profiles of enzymatic genes from the MVA pathway.
(c, d) Differential expression of various diterpene synthases (SmCPSs and SmKSLs), of which SmCPS1 and SmKSL1 together lead to the production of the
known tanshinone precursor miltiradiene.
(e) Differential expression of CYP76AH1, which produces ferruginol.
that not only was SmCPS1 specifically expressed in the
periderm, but so too was at least one isoform of each of the
enzymes that make up the MEP- and MVA-dependent precursor pathways, as well as GGPS and IPI (Figure 3a,b). In
addition, of the 12 SmCPS and nine SmKSL homologs in
the S. miltiorrhiza genome, SmCPS1 and SmKSL1 were
the most highly (and quite specifically) expressed in the
periderm (Table S2). Whereas SmCPS5, SmKSL7 and
SmKSL8 also exhibit somewhat higher expression in the
periderm than other root tissues, these all seemed to be
expressed at significantly lower levels than SmCPS1 and
SmKSL1 (Table S2). Moreover, the SmCPS1 and SmKSL1
expression patterns observed here are consistent with
their role in tanshinone production, and both SmKSL7
and SmKSL8 appear to be pseudogenes (Figure 3c,d).
Even beyond these, CYP76AH1, suggested to play a role
in tanshinone biosynthesis, is also specifically expressed
in the periderm (Figure 3e).
Co-expression analysis for the investigation of tanshinone
biosynthesis
Given the clear periderm-specific expression of the biosynthetic machinery necessary for the production of at least
the initially oxygenated intermediate ferruginol (Figure 3),
and the accumulation of the tanshinones (Figure 1), we
hypothesize that tanshinone biosynthesis occurs entirely in
this root tissue. Accordingly, the remainder of the genes
encoding enzymes involved in tanshinone biosynthesis
might be expected to exhibit a similar periderm-specific
(co-)expression pattern. Beyond that suggested for
CYP76AH1, CYP mono-oxygenases are likely to play additional roles in tanshinone biosynthesis. Consistent with the
expanded nature of the CYP superfamily in plants (Nelson
and Werck-Reichhart, 2011), a total of 457 CYPs were identified from the S. miltiorrhiza genome (Table S3). Among
the CYP genes, 21% (96/457) were expressed in the periderm with an FPKM > 10, with 33 exhibiting periderm-specific expression profiles like that observed for CYP76AH1
(Table S4). To further refine this list, we carried out qRT-PCR analysis of the expression level of these genes in a
wider range of plant organs (flowers, leaves, roots, and
stems), as well as leaves treated with the defense signaling
molecule methyl jasmonate (MeJA). As controls, SmCPS1
and SmKSL1 were also analysed in this manner, verifying
their periderm-specific expression. Notably, sixteen CYPs,
including CYP76AH1, were then identified as being most
specifically expressed in the periderm, and we suggest that
these should be given priority in further investigations of
tanshinone biosynthesis (Figure 4). Moreover, phylogenetic analysis indicated that two of these were also members of the CYP76AH subfamily, CYP76AH3v3
(SMil_00006344) and CYP76AH3 (SMil_00029757), which by
definition share >55% amino acid sequence identity with
CYP76AH1 (Figure S2). Given that CYP76AH4 from Rosmarinus officinalis
(rosemary) exhibits ferruginol synthase activity analogous to that observed with CYP76AH1 (Zi and Peters,
2013), this suggests that the CYP76AH subfamily may have
evolved to play a role in such phenolic diterpenoid biosynthesis in the Lamiaceae plant family more generally.
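One common way to formalize "an expression profile like that of a known pathway gene" is to correlate each candidate's expression across tissues with that of a bait gene such as CYP76AH1. The sketch below uses the Pearson correlation for this purpose. This is an illustrative assumption, not the selection procedure used in this study, and the threshold `min_r` as well as the gene names in the usage note are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed(profiles, bait, min_r=0.9):
    """Genes whose profile correlates with the bait at or above min_r."""
    bait_profile = profiles[bait]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != bait and pearson(profile, bait_profile) >= min_r
    )
```

With toy profiles ordered as (periderm, phloem, xylem), for instance `{"bait": [100, 5, 3], "cand_a": [50, 2, 1], "cand_b": [4, 60, 55]}`, only `cand_a` would be retained, since `cand_b` peaks outside the periderm.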
Given the highly oxidized nature of the tanshinones, it is
possible that other oxygenases (e.g. 2-oxo-glutarate
dependent di-oxygenases, 2ODDs), as well as dehydrogenases (e.g. short-chain alcohol dehydrogenases, SDRs), may
play role(s) in tanshinone production, much as observed in
other plant diterpenoid biosynthesis (Zi et al., 2014).
Accordingly, we carried out similar co-expression analysis
of these enzymatic families as well. The 2ODD superfamily
is also quite expansive in plants (Kawai et al., 2014), with
144 members found in S. miltiorrhiza, 47 of which were
expressed in the roots with FPKM > 10. Of these, 16 were
found to be more highly expressed in the periderm than in
the rest of the root (Table S5); however, upon analysis of a
wider range of plant tissues only one was found to exhibit
a root-high expression profile (2ODD-8; Figure 4). The SDR
superfamily is similarly expansive in plants (Moummou
et al., 2012), with 159 members present in S. miltiorrhiza,
48 of which were expressed in the root with FPKM > 10. Of
these, five were found to be more highly expressed in the
periderm than in the rest of the root, and wider analysis indicated that all five further exhibit a periderm-specific
expression profile (Figure 4; Table S6). The co-expression
pattern exhibited by the one 2ODD and five SDRs may indicate a role for these in tanshinone biosynthesis that warrants further investigation.
Alternatively spliced isoforms
The long reads generated by SMRT sequencing are
expected to offer extensive information about alternative
splicing (Au et al., 2013; Sharon et al., 2013; Chen et al.,
2014a). Consistent with this, analysis of the 60 584 058 Illumina short reads by SPLICEMAP (Au et al., 2010) led to the
detection of only 110 715 junctions that were retained after
nUM filtering with approximately 95% specificity. By contrast, isoform detection and prediction (IDP) analysis of the
1 313 216 sequences generated by SMRT sequencing
detected junctions in 1 109 011 of these long-read data
(84%). Although there are 26 064 loci annotated as multi-exon genes in the S. miltiorrhiza genome, from the NGS
data CUFFLINKS (Trapnell et al., 2012) identified only 10 245
expressed with FPKM > 10 in the root, with 3745 genes of
these directly detected using IDP, a 36% detection rate.
From spliced alignment of the long-read SMRT sequences,
IDP analysis found 16 241 isoforms covering 10 323 multi-exon genes found in this data set, with 6660 exhibiting
FPKM > 10, increasing the sensitivity of isoform identification up to 65% (Figure 5a).
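The percentages in this paragraph follow directly from the reported counts, and can be recomputed as a quick consistency check. The variable names below are ours; the counts are those quoted in the text. Note that the direct detection rate evaluates to roughly 36.6%, which the text reports as 36%.

```python
def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction."""
    return part / whole


# Counts quoted in the text.
smrt_sequences = 1_313_216        # sequences analysed by IDP
smrt_with_junctions = 1_109_011   # of which contain detected junctions
expressed_multi_exon = 10_245     # CUFFLINKS, FPKM > 10 in the root
directly_detected = 3_745         # genes directly detected using IDP
predicted_expressed = 6_660       # IDP-covered genes with FPKM > 10

rates = {
    "long reads with junctions": share(smrt_with_junctions, smrt_sequences),
    "direct detection rate": share(directly_detected, expressed_multi_exon),
    "sensitivity with prediction": share(predicted_expressed, expressed_multi_exon),
}

for label, value in rates.items():
    print(f"{label}: {value:.1%}")
```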
Of the 10 323 multi-exon gene loci expressed in the root,
4165 (40%) exhibited alternatively spliced isoforms, with
Figure 4. Heat map depicting the various CYPs, 2ODDs and SDRs that are co-expressed with the SmCPS1, SmKSL1 and CYP76AH1 known to be involved in tanshinone biosynthesis, in a range of different tissues (periderm, phloem, xylem, root, stem, leaf and flower), as well as control or methyl jasmonate (MeJA)-treated leaves of S. miltiorrhiza (MeJA-0 and MeJA-12, respectively). qRT-PCR analysis was also carried out for the genes in red.
more than two isoforms found for 15% (Figure 5b,c). It
should be noted that 3526 (85%) of these loci exhibit predominant expression of a single isoform, and the alternative isoforms observed for these may simply represent
splicing errors. Nevertheless, our data provide clear evidence for alternative splicing in the S. miltiorrhiza root.
Consistent with this, our combined transcriptome data
demonstrated the expression of genes encoding all the
necessary subunits for spliceosome assembly (Table S7). To
investigate the distribution of the different types of alternative splicing, we further analyzed all of the junctions and
isoforms detected using SPLICEMAP and IDP. A total of 12 264
identified isoforms contained annotated and unannotated
junctions, which represented alternatively spliced isoforms
of known genes. Of these, 21 and 4% resulted from intron
retention and exon skipping events, respectively, whereas
18 and 39% of the junctions were characterized as alternative 5′ and 3′ splice site events, respectively (Figures 5d
and S4).
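The four event types named above can be distinguished by comparing the exon coordinates of an alternative isoform with those of a reference isoform. The following is a simplified sketch of that logic and not the procedure implemented in SPLICEMAP or IDP. Exons are assumed to be half-open `(start, end)` intervals on a single locus, and the labels ES, IR, A5S and A3S are our shorthand for exon skipping, intron retention, and alternative 5′ and 3′ splice sites.

```python
def introns(exons):
    """Gaps between consecutive exons, as half-open (start, end) intervals."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify_events(ref_exons, alt_exons, strand="+"):
    """Return the set of event types seen in alt_exons relative to ref_exons."""
    ref_exons, alt_exons = sorted(ref_exons), sorted(alt_exons)
    ref_introns, alt_introns = introns(ref_exons), introns(alt_exons)
    events = set()

    # Intron retention: a reference intron lies entirely inside an alternative exon.
    for i_start, i_end in ref_introns:
        if any(e_start <= i_start and i_end <= e_end for e_start, e_end in alt_exons):
            events.add("IR")

    for a_start, a_end in alt_introns:
        if (a_start, a_end) in ref_introns:
            continue  # junction shared with the reference
        # Exon skipping: the alternative intron spans a whole reference exon.
        if any(a_start <= e_start and e_end <= a_end for e_start, e_end in ref_exons):
            events.add("ES")
            continue
        same_start = any(a_start == r_start for r_start, _ in ref_introns)
        same_end = any(a_end == r_end for _, r_end in ref_introns)
        # On the + strand the intron start is the 5' (donor) site and the
        # intron end is the 3' (acceptor) site; on the - strand this is reversed.
        if same_start and not same_end:
            events.add("A3S" if strand == "+" else "A5S")
        elif same_end and not same_start:
            events.add("A5S" if strand == "+" else "A3S")
    return events
```

For a reference with exons `[(0, 100), (200, 300), (400, 500)]`, an isoform `[(0, 100), (400, 500)]` would be labeled ES under these assumptions, whereas `[(0, 300), (400, 500)]` would be labeled IR.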
The genes encoding SmCPS1 and the CYPs do not seem
to undergo any significant degree of alternative splicing.
Nevertheless, there may be a role for alternative splicing in
regulating tanshinone biosynthesis and isoprenoid/terpenoid metabolism more generally (Figure S5). First, five differentially spliced isoforms were observed for SmKSL1 (all
with FPKM > 10), only one of which seems likely to encode
a catalytically competent enzyme. In addition, a number of
genes involved in the production of the isoprenoid precursors exhibit alternative splicing, with only one (SmHDR3,
4-hydroxy-3-methylbut-2-enyl diphosphate reductase) from
the MEP pathway, but with four from the MVA pathway:
SmAACT3 (acetyl-CoA C-acetyltransferase), SmHMGR (3-hydroxy-3-methylglutaryl-coenzyme A reductase), SmMK
(mevalonate kinase) and SmPMK (5-phosphomevalonate
kinase). Of particular interest, both SmHDR3 and SmPMK
have multiple isoforms expressed with FPKM > 10, only
one of which can be translated to a catalytically competent
enzyme in each case, such that the regulation of alternative
splicing may play a role in controlling flux through both
the MEP- and MVA-dependent isoprenoid precursor pathways (Table S8).
DISCUSSION
The long-standing and widespread use of danshen in TCM,
along with its continuing translation to modern western
medicine, has led to intense interest in the biosynthesis of
the relevant bioactive components (Qiu, 2007). Much of
this interest has focused on the tanshinone diterpenoids,
which provide the characteristic reddish brown coloring to
the highly valued rhizome, and exhibit potent biological
activity (Dong et al., 2011). For this purpose, a number of
whole-plant and hairy root culture-based transcriptome
studies have been previously reported (Ge and Wu, 2005;
Hua et al., 2011; Luo et al., 2014; Yan et al., 2010); however, these studies were limited by either number and/or
length of the generated sequence information, necessitating further cloning efforts in order to obtain full-length
cDNA sequences for the investigation of potential roles in
tanshinone biosynthesis (Guo et al., 2013). With the identification of the diterpene precursor miltiradiene, many of
the remaining steps in tanshinone biosynthesis are likely
to be catalyzed by CYPs (Zi et al., 2014), the investigation
of which largely relies on synthetic biology approaches
Figure 5. Detection and prediction of the gene isoforms of Salvia miltiorrhiza using IDP.
(a) Venn diagram of isoform detection and prediction. A total of 4035 isoforms and 16 241 isoforms are detected and predicted, respectively.
(b, c) The distribution of alternatively spliced isoforms from each gene locus.
(d) Pie chart of the different types of alternative splicing. ES, exon skipping; IR, intron retention; A3′S, alternative 3′ splice site; A5′S, alternative 5′ splice site.
using genes codon-optimized for recombinant expression
(Kitaoka et al., 2015). This obviously requires accurate and
full-length cDNAs, and can be limited by inaccurate
sequence information, such as those predicted from genome sequences: e.g. as demonstrated by the investigation
of the KS(L) gene family in Ricinus communis (castor bean;
Jackson et al., 2014).
Given our interest in tanshinone biosynthesis, we demonstrate here that not only accumulation (Figure 1) but
also biosynthesis of the tanshinones occurs in the danshen
root periderm (Figure 3). To address the incomplete transcriptome available for S. miltiorrhiza we combined short-read NGS and long-read SMRT sequencing of three distinct root tissues (i.e. the periderm, phloem and xylem),
from which we were able to generate a much more complete transcriptome of the danshen root (Figure 2). The use
of full-length libraries with long SMRT sequencing reads
(SMRT sequencing N50 = 2411 bp) enabled the generation
of full-length transcripts relative to assemblies generated
with ILLUMINA reads only (ILLUMINA assembled isotigs
N50 = 1530 bp; Figure 2a). Nevertheless, a hybrid sequencing approach combining both types of data, specifically
correcting the SMRT reads using ILLUMINA reads, led to
high-quality full-length transcripts, avoiding mis-assemblies of genes and gene families with high sequence identity. Via this accurate hybrid approach, we were able to
generate full-length sequences for a significantly higher
proportion of the enzymatic genes involved in terpenoid
biosynthesis. In particular, although ILLUMINA-based studies
were only able to assemble full-length cDNA sequences for
43% of the relevant enzymatic families on average (e.g. terpene synthases, CYPs, 2ODDs and SDRs), here we were
able to increase this to 73% (Table S10).
Critically, our transcriptome analysis not only includes
the periderm, where tanshinones are produced and stored,
but also alternative root tissues. This then enabled the interrogation of the transcriptomic data to verify periderm-specific expression of both the MEP- and MVA-dependent
isoprenoid precursor pathways, along with IPI and potential
GGPS genes required for the production of diterpenoids, as
well as the SmCPS1, SmKSL1 and CYP76AH1 genes more
specifically involved in tanshinone biosynthesis (Figure 3).
On this basis we then analyzed the extensive CYP, 2ODD
and SDR superfamilies in S. miltiorrhiza, finding that a limited number of each were more highly expressed in the
periderm than the other root tissues investigated here
(Tables S4–S6). Given this more limited number of genes it
was further possible to use qRT-PCR to more generally analyze their expression pattern, with co-expression with
SmCPS1, SmKSL1 and CYP76AH1 suggesting 15 additional
CYPs, one 2ODD, and five SDRs that may play role(s) in tanshinone biosynthesis (Figures 4 and S3).
Previous studies relying on NGS were able to identify
novel introns and splicing variants that altogether indicated
that up to 60% of multi-exon genes underwent alternative
splicing events in different plants (Wang et al., 2009), such
as Arabidopsis thaliana (Filichkin et al., 2010; Marquez
et al., 2012), Glycine max (Shen et al., 2014), Brachypodium
distachyon (Walters et al., 2013) and Oryza sativa (Zhang
et al., 2010). Whereas such NGS short-read data can identify
spliced junctions with the use of SPLICEMAP or TOPHAT (Kim
and Salzberg, 2011), the mostly incomplete nature of the
assembled transcripts largely eliminates the direct identification of distinct isoforms. Combining SMRT long reads
and NGS short reads led to sensitive isoform detection and
prediction, revealing the corresponding alternative splicing
events in the human transcriptome (Au et al., 2013; Sharon
et al., 2013; Chen et al., 2014a). Similarly, our hybrid
sequencing approach has enabled such analysis of the
S. miltiorrhiza transcriptome. In total, 40% of the detected
gene loci were identified as undergoing alternative splicing
in S. miltiorrhiza (Figure 5). It should be noted that for the
majority of these a single isoform was predominant, suggesting that some of the observed alternative isoforms may
not be significant (e.g. the result of splicing errors rather
than a regulated/controlled process; Reddy et al., 2013).
Nevertheless, alternative splicing is clearly observed among
the genes involved in tanshinone biosynthesis (Figure S5),
which may serve as a regulatory mechanism in controlling
such diterpenoid metabolism.
In summary, we localized the metabolism of the bioactive
tanshinone diterpenoids from the model medicinal plant
S. miltiorrhiza to the root periderm, and carried out tissue-differentiated transcriptome analysis of this plant organ
using a combined NGS short-read and SMRT long-read
sequencing approach. This enabled the generation of full-length transcripts as well as providing evidence for periderm-localized tanshinone biosynthesis, which was used to
further identify a subset of 15 CYPs, the co-expression of
which with the already known enzymatic genes SmCPS1,
SmKSL1 and CYP76AH1, suggests a role in such bioactive
diterpenoid metabolism. Moreover, our study provides a
template for investigating secondary metabolism in other
species, paving the way towards synthetic biology
approaches to such natural products.
EXPERIMENTAL PROCEDURES
Plant materials and RNA sample preparation
Three-year-old S. miltiorrhiza (line 99-3) plants were harvested
from an experimental field at the Institute of Medicinal Plant
Development (IMPLAD). Fifteen independent root samples were
collected and divided into three portions. Each portion was
divided into three parts (periderm, phloem and xylem). Nine total
RNA samples (three different root tissues with three repetitions)
were isolated using the RNeasy Plus Mini Kit (#74134; Qiagen,
http://www.qiagen.com). The total RNA was quantified and the
quality was assessed using an Agilent 2100 Bioanalyzer (Agilent,
http://www.agilent.com). Three experiments were conducted.
First, the different organs (root, stem, leaf and flower) were collected, and total RNA was extracted. Eight libraries of four different organs were subjected to 2 × 100 paired-end sequencing
using the ILLUMINA HiSeq 2000 platform. Second, nine libraries of
three root tissues (periderm, phloem and xylem) were subjected
to 2 × 100 paired-end RNA-seq using ILLUMINA HiSeq 2500. Third,
the nine individual samples were pooled to provide 90 µg of total
S. miltiorrhiza RNA. Poly(A) RNA was isolated from the total RNA
using the oligo d(T) magnetic bead binding method and the Poly(A)Purist™ Kit (#AM1916; Ambion, now Life Technologies, http://www.lifetechnologies.com/uk/en/home/brands/ambion.html). Isolated poly(A) RNA was eluted with 20 µl of RNase-free water. All
of the experiments were performed following the protocols
included with the kits.
cDNA synthesis and normalization
Isolated poly(A) RNA was quantified using the Agilent 2100 Bioanalyzer. First-strand cDNA was synthesized using the SMARTer PCR
cDNA Synthesis Kit (#634926; Clontech, http://www.clontech.com).
The tailing activity of SMARTScribe™ Reverse Transcriptase allowed
the same adaptor sequence to be added at the 3′ and 5′ ends of the poly(A) RNA
using CDS Primer IIA [5′-AAGCAGTGGTATCAACGCAGAGTACT(30)N–1N-3′] and SMARTer IIA Oligonucleotide (5′-AAGCAGTGGTATCAACGCAGAGTACXXXXX-3′). Next, second-strand
cDNA synthesis was performed using Phusion High-Fidelity DNA
Polymerase (#M0530; NEB, http://www.neb.com) with 5′ PCR Primer IIA (5′-AAGCAGTGGTATCAACGCAGAGTAC-3′). Preliminary
testing showed that 14 cycles were optimal for
avoiding the over-amplification of small fragments. Purified cDNA
was normalized using the Trimmer-2 cDNA Normalization Kit (#NK003; Evrogen, http://www.evrogen.com). Next, the normalized cDNA was amplified using the 5′
PCR Primer IIA with 18 cycles. Agarose gel-based size selection
was performed using the SYBR Safe DNA Gel Stain and blue light
system to avoid DNA damage. Then, four fractions, containing
fragments >3, 2–3, 1–2, or <1 kb, were collected and purified using
the QIAquick Gel Extraction Kit. The extracted products were
amplified using the 5′ PCR Primer IIA. Importantly, all of the PCR steps
required the selection of optimal cycling conditions to
avoid the over-amplification of small fragments. After amplification, the PCR products were purified using 0.5× AMPure beads
(#A63880; Beckman, http://www.beckmancoulter.com).
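The four gel fractions can be mirrored computationally, for example when checking whether the read lengths from each library match the fraction it was built from. The sketch below is only an illustration: the function name `size_fraction`, the example lengths and the handling of fragments exactly at 1, 2 or 3 kb are assumptions, as the protocol gives only the ranges.

```python
from collections import Counter

def size_fraction(length_bp):
    """Assign a cDNA length (in bp) to one of the four gel fractions.

    Assumption: 1000 and 2000 bp go to the larger fraction and 3000 bp
    stays in the 2-3 kb fraction; the text does not specify boundaries.
    """
    if length_bp < 1000:
        return "<1 kb"
    if length_bp < 2000:
        return "1-2 kb"
    if length_bp <= 3000:
        return "2-3 kb"
    return ">3 kb"

# Hypothetical fragment lengths, for illustration only.
lengths = [450, 1000, 1999, 2500, 3000, 5200]
fraction_counts = Counter(size_fraction(n) for n in lengths)
print(fraction_counts)
```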
Library preparation and PACBIO sequencing
The four normalized cDNA size fractions were used separately to construct four SMRT cell libraries using a DNA Template Prep Kit
(3–10 kb, part #001-540-835; Pacific Biosciences of California, Inc,
http://www.pacificbiosciences.com/). The templates were bound to
SA-DNA polymerase and V2 primers using the DNA/polymerase
Binding Kit 2.0 (part #001-672-551). The complexes of templates
and polymerase were bound to magnetic beads (part #100-125-900) and transferred to a 96-well PCR plate for processing on a PACBIO RS using C2 sequencing reagents. Each library underwent
SMRT sequencing using two SMRT cells. Subreads were filtered
and subjected to CCS using the SMRT Analysis Server 2.2.0 (Pacific Biosciences of California, Inc).
Isoform detection and prediction
The short reads generated with HiSeq 2500 were filtered using the
NGS QC Toolkit. LSC 1.alpha software was used to correct CCS
reads by alignment with filtered NGS short reads. SPLICEMAP 3.3.5.2
(Au et al., 2010) was used to detect exon junctions and novel gene
loci. Within SPLICEMAP, BOWTIE was used to align the short reads to the S. miltiorrhiza genome. IDP 0.1.7 took the error-corrected long
reads from LSC and the junctions from SPLICEMAP as input to detect
and predict the isoforms.
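The idea of combining long reads with short-read junctions can be illustrated with a simple filter that keeps a long-read isoform candidate only if every one of its introns matches a junction detected from the short reads. This is a deliberately simplified sketch and not the algorithm implemented in IDP; the function names, coordinates and data layout are hypothetical.

```python
def introns_from_exons(exons):
    """Return introns as (last base of exon i, first base of exon i+1).

    `exons` is a list of (start, end) tuples sorted by position.
    """
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]

def junction_supported(candidates, junctions):
    """Keep candidates whose introns are all present in `junctions`.

    Single-exon candidates have no introns and are therefore kept.
    """
    return {
        name: exons
        for name, exons in candidates.items()
        if all(intron in junctions for intron in introns_from_exons(exons))
    }

# Hypothetical long-read candidates and short-read junctions.
candidates = {
    "iso1": [(100, 200), (300, 400), (500, 600)],
    "iso2": [(100, 200), (500, 600)],   # exon-skipping variant
    "iso3": [(100, 250), (300, 400)],   # donor site without short-read support
}
junctions = {(200, 300), (400, 500), (200, 500)}

kept = junction_supported(candidates, junctions)
print(sorted(kept))
```

In this toy case the exon-skipping variant is retained because its single junction is supported, whereas the candidate with an unsupported donor site is discarded.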
Phylogenetic analysis
Eight CYPs related to artemisinin biosynthesis in Artemisia annua
were selected from NCBI and then pooled with 34 CYPs before
performing an alignment with MEGA 6 (MEGA, http://www.megasoftware.net/). We then constructed an unrooted phylogenetic tree
from the full-length amino acid sequences using the neighbour-joining method,
with bootstrap support estimated from 1000 replications.
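Bootstrap support of this kind is obtained by resampling alignment columns with replacement and repeating the analysis on each replicate. The sketch below illustrates only that resampling step on a toy alignment: instead of building a full neighbour-joining tree per replicate, as MEGA does, it tracks how often one sequence is closer to a second than to a third. The sequences, names and use of the uncorrected p-distance are assumptions made for illustration.

```python
import random

def p_distance(a, b):
    """Proportion of differing residues, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def grouping_support(alignment, a, b, other, replicates=1000, seed=0):
    """Fraction of replicates in which `a` is closer to `b` than to `other`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        if p_distance(rep[a], rep[b]) < p_distance(rep[a], rep[other]):
            hits += 1
    return hits / replicates

# Hypothetical aligned protein fragments of equal length.
toy = {"cypA": "MKTLLVAGSW", "cypB": "MKTLLVAGSF", "cypC": "MRSILPAGTW"}
support = grouping_support(toy, "cypA", "cypB", "cypC")
print(f"support for cypA+cypB: {support:.2f}")
```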
qRT-PCR analysis
Nine RNA samples were isolated from S. miltiorrhiza: seven from different tissues (periderm,
phloem, xylem, root, stem, leaf and flower) and two from leaves
without or with MeJA treatment (control and MeJA, 12 h).
Reverse transcription was performed with PrimeScript™ Reverse
Transcriptase (TaKaRa, http://www.takara-bio.com). qRT-PCR
primers were designed with PRIMER PREMIER 6 (PREMIER Biosoft,
http://www.premierbiosoft.com/primerdesign/), and their specificity was verified by PCR (Table S9). qRT-PCR analysis was conducted in triplicate using SYBR® Premix Ex Taq™ II (TaKaRa),
with SmActin as the reference gene, on a 7500 Real-Time PCR System
(ABI).
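Triplicate Ct values normalized to a reference gene are commonly converted to relative expression with the 2^-ΔΔCt calculation. The text does not state which quantification method was applied, so the sketch below is an assumption: it presumes roughly 100% amplification efficiency, and the Ct values and the choice of xylem as calibrator are invented for illustration.

```python
from statistics import mean

def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by 2^-ddCt from replicate Ct values.

    ct_target / ct_ref: target and reference gene (e.g. SmActin) in the sample.
    ct_target_cal / ct_ref_cal: the same genes in the calibrator sample.
    """
    delta_sample = mean(ct_target) - mean(ct_ref)
    delta_calibrator = mean(ct_target_cal) - mean(ct_ref_cal)
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical triplicate Ct values: periderm (sample) versus xylem (calibrator).
fold = relative_expression(
    ct_target=[22.0, 22.0, 22.0],
    ct_ref=[18.0, 18.0, 18.0],
    ct_target_cal=[25.0, 25.0, 25.0],
    ct_ref_cal=[18.0, 18.0, 18.0],
)
print(fold)  # dCt = 4 in the sample and 7 in the calibrator, so ddCt = -3
```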
Accession codes
SMRT sequencing data and ILLUMINA HiSeq 2500 data have been
submitted to the Sequence Read Archive (SRA) of the National
Center for Biotechnology Information (NCBI) under accession numbers SRX753381, SRR1640458, SRP028388 and
SRP051564.
ACKNOWLEDGEMENTS
We thank Dr David R. Nelson for helping in the naming of the
CYP450s of Salvia miltiorrhiza. This work was supported by the
National Science-technology Support Plan of China (grant no.
2012BAI29B01) and the Major Scientific and Technological Special
Project for ‘Significant New Drugs Creation’ (grant no.
2014ZX09304307001).
Differential expression analysis
The reads from three root tissues (periderm, phloem and xylem)
and four different organs (root, stem, leaf and flower) were produced in this study. The data from MeJA-treated leaves (200 µM)
were derived from a previous study (Luo et al., 2014). The
expression analysis from ILLUMINA reads of different tissues and
treatment was performed with TOPHAT and CUFFLINKS (Trapnell et al.,
2012).
lncRNAs prediction
Redundant error-corrected CCS reads were filtered
using CD-HIT-EST, which grouped the cDNAs into clusters at a similarity threshold of 0.85 and then removed the redundant sequences. A total of 160 468 non-redundant
sequences were then submitted to the Coding Potential
Calculator (CPC) to predict lncRNAs.
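The redundancy-removal step can be pictured as greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first representative it resembles at or above the threshold or becomes a new representative. The sketch below follows that general scheme but is not CD-HIT-EST itself; it substitutes the `difflib.SequenceMatcher` ratio for the tool's own identity calculation, and the sequences are hypothetical.

```python
from difflib import SequenceMatcher

def greedy_cluster(seqs, threshold=0.85):
    """Cluster sequences greedily, longest first.

    Returns {representative name: [member names]}. Similarity is the
    SequenceMatcher ratio, a stand-in for a proper alignment identity.
    """
    representatives = []  # (name, sequence) of each cluster seed
    clusters = {}
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in ordered:
        for rep_name, rep_seq in representatives:
            matcher = SequenceMatcher(None, rep_seq, seq, autojunk=False)
            if matcher.ratio() >= threshold:
                clusters[rep_name].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            representatives.append((name, seq))
            clusters[name] = [name]
    return clusters

# Hypothetical transcripts: t2 is a truncated copy of t1, t3 is unrelated.
toy_seqs = {
    "t1": "ACGTACGTACGTACGTACGT",
    "t2": "ACGTACGTACGTACGTACG",
    "t3": "TTTTTGGGGGCCCCCAAAAA",
}
clusters = greedy_cluster(toy_seqs)
print(clusters)
```

Keeping only the dictionary keys yields the non-redundant set, which in this toy case contains t1 and t3.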
UPLC analysis of tanshinone IIA content
The detection methods followed the Pharmacopoeia of the People’s Republic of China. All of the periderm, phloem and xylem
samples were ground into a powder, with three repetitions for each
sample, and each weighed sample of ground powder (0.3 g)
was extracted with 50 ml of methanol. After 1 h of heating reflux
extraction, methanol was added to make up the lost weight and restore the
original weight, and the sample was then filtered through a 0.45-µm syringe filter. In addition, the tanshinone IIA standard was
dissolved in methanol at a concentration of 16 mg ml−1. Chromatographic separations were performed using a Waters
XBridge C18 column with a mobile phase of 75% methanol and 25%
H2O in a Waters UPLC system (Waters, http://www.waters.com).
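With an external standard, the tanshinone IIA content of each tissue can be derived from the ratio of sample to standard peak areas, the extraction volume and the mass of powder. The sketch below assumes a single-point calibration with a linear detector response through the origin, which the text does not specify; the function name and the peak areas are hypothetical, while the 50 ml, 0.3 g and 16 mg ml−1 values are those given above.

```python
def content_mg_per_g(area_sample, area_standard, standard_mg_per_ml,
                     extract_volume_ml=50.0, powder_mass_g=0.3):
    """Analyte content (mg per g powder) by single-point external standard.

    Assumes peak area is proportional to concentration.
    """
    sample_mg_per_ml = area_sample / area_standard * standard_mg_per_ml
    return sample_mg_per_ml * extract_volume_ml / powder_mass_g

# Hypothetical peak areas; standard concentration as stated in the text.
periderm_content = content_mg_per_g(
    area_sample=1.2, area_standard=1200.0, standard_mg_per_ml=16.0
)
print(round(periderm_content, 3))
```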
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.
Figure S1. (a) The distribution of PACBIO reads of different length.
(b) Mapping statistics of the PACBIO reads to the Salvia miltiorrhiza
genome with BLAT. (c) A count versus dispersion plot for the different root tissues, generated by applying CUFFLINKS to the ILLUMINA reads.
Figure S2. Phylogenetic tree analysis of candidate CYPs that were
co-expressed with CYP76AH1.
Figure S3. qRT-PCR analysis of the putative copalyl diphosphate
synthase (CPS), kaurene synthase-like (KSL), cytochrome P450s
(CYPs), 2-oxo-glutarate dependent di-oxygenase (2ODDs) and
short-chain alcohol dehydrogenases (SDRs) with putative roles in
tanshinone biosynthesis in different tissues (periderm, phloem,
xylem, root, stem, leaf, flower), and without or with MeJA treatment (MeJA-0 and MeJA-12) in Salvia miltiorrhiza.
Figure S4. Predicted isoforms of existing gene loci with different
alternatively spliced types using the IGV genome browser.
Figure S5. The different alternative splicing isoforms of enzymatic
genes involved in Salvia miltiorrhiza terpenoid biosynthesis.
Table S1. General properties of the reads produced by ILLUMINA
HiSeq 2500 and PACBIO sequencing platforms.
Table S2. Genome-wide identification of diterpenoid synthases in
Salvia miltiorrhiza.
Table S3. Genome-wide identification of CYPs in Salvia miltiorrhiza.
Table S4. Genome-wide identification of candidate CYPs that were
co-expressed with CYP76AH1 in Salvia miltiorrhiza.
Table S5. Genome-wide identification of candidate 2-oxo-glutarate
dependent di-oxygenases (2ODDs) that exhibited periderm-specific expression in Salvia miltiorrhiza.
Table S6. Genome-wide identification of candidate short-chain
alcohol dehydrogenases (SDRs) that exhibited periderm-specific
expression in Salvia miltiorrhiza.
Table S7. Genome-wide identification of spliceosomal proteins in
Arabidopsis thaliana and Salvia miltiorrhiza.
Table S8. The expression pattern of differentially spliced isoforms
of enzymatic genes involved in tanshinone biosynthesis in Salvia miltiorrhiza.
Table S9. The primers used for qPCR analysis.
Table S10. Identified full-length or partial-length genes of the
expressed terpenoid synthases, candidate CYPs, 2ODD and SDRs
from ILLUMINA assembly and hybrid sequencing.
REFERENCES
Au, K.F., Jiang, H., Lin, L., Xing, Y. and Wong, W.H. (2010) Detection of
splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic
Acids Res. 38, 4570–4578.
Au, K.F., Sebastiano, V., Afshar, P.T. et al. (2013) Characterization of the
human ESC transcriptome by hybrid sequencing. Proc. Natl Acad. Sci.
USA, 110, E4821–E4830.
Au, K.F., Underwood, J.G., Lee, L. and Wong, W.H. (2012) Improving PacBio
long read accuracy by short read alignment. PLoS ONE, 7, e46679.
Chaisson, M.J., Huddleston, J., Dennis, M.Y. et al. (2014) Resolving the
complexity of the human genome using single-molecule sequencing.
Nature, 517, 608–611.
Chen, L., Kostadima, M., Martens, J.H. et al. (2014a) Transcriptional diversity during lineage commitment of human blood progenitors. Science,
345, 1251033.
Chen, X., Bracht, J.R., Goldman, A.D. et al. (2014b) The architecture of a
scrambled genome reveals massive levels of genomic rearrangement
during development. Cell, 158, 1187–1198.
Dong, Y., Morris-Natschke, S.L. and Lee, K.H. (2011) Biosynthesis, total syntheses, and antitumor activity of tanshinones and their analogs as potential therapeutic agents. Nat. Prod. Rep. 28, 529–542.
Filichkin, S.A., Priest, H.D., Givan, S.A., Shen, R., Bryant, D.W., Fox, S.E.,
Wong, W.K. and Mockler, T.C. (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58.
Ge, X. and Wu, J. (2005) Tanshinone production and isoprenoid pathways
in Salvia miltiorrhiza hairy roots induced by Ag+ and yeast elicitor. Plant
Sci. 168, 487–491.
Guo, J., Zhou, Y.J., Hillwig, M.L. et al. (2013) CYP76AH1 catalyzes turnover
of miltiradiene in tanshinones biosynthesis and enables heterologous
production of ferruginol in yeasts. Proc. Natl Acad. Sci. USA, 110, 12108–
12113.
Hua, W., Zhang, Y., Song, J., Zhao, L. and Wang, Z. (2011) De novo transcriptome sequencing in Salvia miltiorrhiza to identify genes involved in
the biosynthesis of active ingredients. Genomics, 98, 272–279.
Jackson, A.J., Hershey, D.M., Chesnut, T., Xu, M. and Peters, R.J. (2014)
Biochemical characterization of the castor bean ent-kaurene synthase(-like) family supports quantum chemical view of diterpene cyclization.
Phytochemistry, 103, 13–21.
Kai, G., Xu, H., Zhou, C., Liao, P., Xiao, J., Luo, X., You, L. and Zhang, L.
(2011) Metabolic engineering tanshinone biosynthetic pathway in Salvia
miltiorrhiza hairy root cultures. Metab. Eng. 13, 319–327.
Kawai, Y., Ono, E. and Mizutani, M. (2014) Evolution and diversity of the 2-oxoglutarate-dependent dioxygenase superfamily in plants. Plant J. 78,
328–343.
Kent, W.J. (2002) BLAT–the BLAST-like alignment tool. Genome Res. 12,
656–664.
Kim, D. and Salzberg, S.L. (2011) TopHat-Fusion: an algorithm for discovery
of novel fusion transcripts. Genome Biol. 12, R72.
Kitaoka, N., Lu, X., Yang, B. and Peters, R.J. (2015) The application of synthetic biology to elucidation of plant terpenoid metabolism. Mol. Plant,
8, 6–16.
Li, Q., Li, Y., Song, J. et al. (2014) High-accuracy de novo assembly and
SNP detection of chloroplast genomes using a SMRT circular consensus
sequencing strategy. New Phytol. 204, 1041–1049.
Luo, H., Zhu, Y., Song, J. et al. (2014) Transcriptional data mining of Salvia
miltiorrhiza in response to methyl jasmonate to examine the mechanism
of bioactive compound biosynthesis and regulation. Physiol. Plant 152,
241–255.
Ma, Y., Yuan, L., Wu, B., Li, X., Chen, S. and Lu, S. (2012) Genome-wide
identification and characterization of novel genes involved in terpenoid
biosynthesis in Salvia miltiorrhiza. J. Exp. Bot. 63, 2809–2823.
Marquez, Y., Brown, J.W., Simpson, C., Barta, A. and Kalyna, M. (2012)
Transcriptome survey reveals increased complexity of the alternative
splicing landscape in Arabidopsis. Genome Res. 22, 1184–1195.
Moummou, H., Kallberg, Y., Tonfack, L.B., Persson, B. and van der Rest,
B. (2012) The plant short-chain dehydrogenase (SDR) superfamily:
genome-wide inventory and diversification patterns. BMC Plant Biol.
12, 219.
Nelson, D. and Werck-Reichhart, D. (2011) A P450-centric view of plant evolution. Plant J. 66, 194–211.
Peters, R.J. (2010) Two rings in them all: the labdane-related diterpenoids.
Nat. Prod. Rep. 27, 1521–1530.
Qiu, J. (2007) Traditional medicine: a culture in the balance. Nature, 448,
126–128.
Reddy, A.S., Marquez, Y., Kalyna, M. and Barta, A. (2013) Complexity of the
alternative splicing landscape in plants. Plant Cell, 25, 3657–3683.
Sharon, D., Tilgner, H., Grubert, F. and Snyder, M. (2013) A single-molecule
long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–
1014.
Shen, Y., Zhou, Z., Wang, Z. et al. (2014) Global dissection of alternative
splicing in paleopolyploid soybean. Plant Cell, 26, 996–1008.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel,
H., Salzberg, S.L., Rinn, J.L. and Pachter, L. (2012) Differential gene and
transcript expression analysis of RNA-seq experiments with TopHat and
Cufflinks. Nat. Protoc. 7, 562–578.
Walters, B., Lum, G., Sablok, G. and Min, X.J. (2013) Genome-wide landscape of alternative splicing events in Brachypodium distachyon. DNA
Res. 20, 163–171.
Wang, X., Morris-Natschke, S.L. and Lee, K.H. (2007) New developments in
the chemistry and biology of the bioactive constituents of Tanshen. Med.
Res. Rev. 27, 133–148.
Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool
for transcriptomics. Nat. Rev. Genet. 10, 57–63.
Yan, Y., Wang, Z., Tian, W., Dong, Z. and Spencer, D.F. (2010) Generation
and analysis of expressed sequence tags from the medicinal plant Salvia miltiorrhiza. Sci. China Life Sci. 53, 273–285.
Zhang, G., Guo, G., Hu, X. et al. (2010) Deep RNA sequencing at single
base-pair resolution reveals high complexity of the rice transcriptome.
Genome Res. 20, 646–654.
Zhao, X., Zheng, X., Fan, T.-P., Li, Z., Zhang, Y. and Zheng, J. (2015) A novel
drug discovery strategy inspired by traditional medicine philosophies.
Science, 347, S38–S40.
Zi, J., Mafu, S. and Peters, R.J. (2014) To gibberellins and beyond! Surveying the evolution of (di)terpenoid metabolism. Annu. Rev. Plant Biol. 65,
259–286.
Zi, J. and Peters, R.J. (2013) Characterization of CYP76AH4 clarifies phenolic
diterpenoid biosynthesis in the Lamiaceae. Org. Biomol. Chem. 11, 7650–
7652.