The Plant Journal (2015) doi: 10.1111/tpj.12865

Full-length transcriptome sequences and splice variants, obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza, and tanshinone biosynthesis

Zhichao Xu1,†, Reuben J. Peters2,†, Jason Weirather3, Hongmei Luo1, Baosheng Liao1, Xin Zhang1, Yingjie Zhu4, Aijia Ji1, Bing Zhang5, Songnian Hu5, Kin Fai Au3, Jingyuan Song1,* and Shilin Chen1,4,*

1 Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China,
2 Department of Biochemistry, Biophysics & Molecular Biology, Iowa State University, Ames, IA 50011, USA,
3 Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA,
4 Institute of Chinese Materia Medica, Chinese Academy of Chinese Medical Science, Beijing 100700, China, and
5 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China

Received 2 March 2015; revised 19 April 2015; accepted 21 April 2015.
*For correspondence (e-mails [email protected]; [email protected]).
† These authors contributed equally to this work.

SUMMARY

Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures.
Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.

Keywords: alternative splicing, next-generation sequencing, Salvia miltiorrhiza, single-molecule real-time sequencing, tanshinone biosynthesis.

© 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd

INTRODUCTION

Salvia miltiorrhiza Bunge is considered a model medicinal plant in traditional Chinese medicine (TCM) research because of its significant medicinal value, relatively small genome (approximately 538 Mb), short life cycle, efficient transgenic system, and uncomplicated tissue culture requirements (Ma et al., 2012). Termed danshen, S. miltiorrhiza is one of the most commonly used herbs in TCM, wherein its dried root or rhizome is highly valued. Danshen is best known for its use in the treatment of cardiovascular diseases, and exhibits strong antioxidative activity (Dong et al., 2011), leading to extensive interest in its potential use in modern clinical trials (Qiu, 2007). Indeed, Compound Danshen Dripping Pills from the Tasly Pharmaceutical Group Co. Ltd. underwent a phase-II trial in 2010, and are currently undergoing a phase-III trial as an investigational new drug (IND) with the Food and Drug Administration (FDA) (Zhao et al., 2015). The major bioactive constituents of S. miltiorrhiza are lipophilic diterpenoid pigments and hydrophilic phenolic acids (Wang et al., 2007).
More than 40 lipophilic diterpenoids and 20 hydrophilic phenolic acids have been isolated and identified from S. miltiorrhiza, including tanshinone I, tanshinone IIA, cryptotanshinone, dihydrotanshinone, salvianolic acid A, salvianolic acid B, rosmarinic acid, lithospermic acid and dihydroxyphenyllactic acid. Elucidating the biosynthetic pathways and regulatory mechanisms of the active constituents will provide a foundation for investigating the use of danshen in TCM, and the potential production of these natural products as innovative pharmaceutical materials (Kai et al., 2011).

As a result of the interest in the medicinal properties of danshen there has been extensive investigation of its transcriptome. An early report used unsequenced cDNAs from hairy root cultures to construct a microarray, with differential expression correlated with either culture time/development or induction (both of which are associated with tanshinone accumulation), which was used to highlight cDNAs for sequencing (Ge and Wu, 2005). The tanshinones are labdane-related diterpenoids (Peters, 2010), the biosynthesis of which requires a copalyl diphosphate synthase (CPS) and a subsequently acting cyclase related to the kaurene synthases involved in gibberellin phytohormone metabolism, which is often termed kaurene synthase-like (KSL). Accordingly, the functional characterization of the two inducible diterpene synthases found in the microarray study (SmCPS1 and SmKSL1) led to the identification of the resulting diterpene olefin precursor of the tanshinones, miltiradiene. Later, next-generation sequencing (NGS)-based RNA-Seq analysis of similarly induced hairy root cultures led to the identification and functional characterization of a cytochrome P450 (CYP) involved in tanshinone biosynthesis, CYP76AH1 (Guo et al., 2013), which carries out the initial hydroxylation of aromatized miltiradiene to form ferruginol.
Other transcriptomic studies have been reported, including an untargeted expressed sequence tag (EST) effort using whole plantlets that yielded partial sequences for approximately 4000 different unigenes (Yan et al., 2010), and RNA-Seq analyses of the transcriptome from growing plants (Hua et al., 2011) or induced leaves (Luo et al., 2014). The short-read sequences generated by NGS generally prevented the assembly of full-length transcripts, however, necessitating additional effort to clone cDNAs of potential interest (e.g. for SmCPS1, SmKSL1 and CYP76AH1; Guo et al., 2013), and the reported average lengths of the isotigs from the previous RNA-Seq investigations are <500 bp. In addition, these previous studies did not dissect the root finely enough to localize tanshinone production and accumulation for more informative co-expression studies.

Single-molecule real-time (SMRT) sequencing carried out on the PACBIO RS (Pacific Biosciences of California, Inc, http://www.pacificbiosciences.com/) provides a third-generation sequencing platform that is widely used in genome sequencing because of its long reads (average 4–8 kb; Chaisson et al., 2014; Chen et al., 2014b). Moreover, recent studies have addressed the problem of the higher error rate (up to 15%) observed with SMRT sequencing, by correction with NGS reads (Au et al., 2013) and/or self-correction via circular-consensus (CCS) reads (Li et al., 2014). The use of SMRT sequencing then offers access to more complete (i.e. full-length) transcriptome data, as has been recently demonstrated (Au et al., 2013; Sharon et al., 2013; Chen et al., 2014a).

Here we combined NGS and SMRT sequencing to generate a more complete/full-length S. miltiorrhiza transcriptome. Moreover, this approach was applied to dissected root samples, enabling the resulting co-expression data to be correlated more precisely with the periderm, where tanshinones are produced and accumulated.
Accordingly, this study provides a valuable resource for further investigation of tanshinone biosynthesis.

RESULTS

Localization of tanshinone accumulation

It is the rhizome or root of S. miltiorrhiza that is used in TCM, accounting for the value of hairy root cultures in studies of this model medicinal herb. The rhizome/root of S. miltiorrhiza exhibits a characteristic reddish brown color, stemming from the tanshinones, which are largely found in the periderm, as can be readily appreciated by simply peeling or viewing a cross section of this organ (Figure 1a–c). Phytochemical analysis of the peeled tissues, roughly corresponding to the periderm, phloem and xylem, respectively (Figure 1d), demonstrates the localization of the tanshinones to the periderm (e.g. tanshinone IIA; Figure 1e). These results suggest that tanshinone biosynthesis may be completely carried out in this tissue, providing a potential basis for co-expression analysis.

Combined sequencing approach to the roots of danshen

To identify and differentiate the periderm transcriptome from that of the rest of the root, two experiments were undertaken, using either the NGS or the SMRT sequencing platforms (ILLUMINA; Illumina, Inc, http://www.illumina.com/ and PACBIO, respectively). First, nine mRNA samples from three different root tissues (periderm, phloem and xylem; each in triplicate) were subjected to 2 × 100 paired-end sequencing using the HiSeq 2500 platform, with 489 309 772 reads produced (Table S1). Second, full-length cDNAs from nine pooled poly(A) RNA samples were normalized and subjected to SMRT sequencing using the PACBIO RS platform. In total, 1 202 336 raw reads (4.8 billion bases) were generated by PACBIO RS. After filtering using the RS_Subreads.1 protocol of PACBIO RS, 796 011 subreads representing 4.3 billion bases were obtained.

Figure 1. Morphology and microstructure of the root of Salvia miltiorrhiza. (a) The roots of S. miltiorrhiza. (b) The roots were peeled into three parts, which roughly correlate to the periderm, phloem and xylem. (c) The root tissues under a stereomicroscope; the radius of the root was 0.75 cm. (d) Paraffin section of the root. Three tissues were clearly identified. (e) The chromatogram and corresponding histograms indicate the differences in tanshinone IIA levels from the three different tissues.

Next, we performed the RS_IsoSeq.1 protocols, which included Classify, Cluster and Map to the reference genome, to generate CCS data, as this provides much more accurate sequence information from reads that pass at least three times through the insert (Sharon et al., 2013). This yielded 70 761 multipass consensus reads, all generated from the <1 and 1–2 kb libraries, as it proved too difficult to produce CCS reads from the 2–3 and >3 kb libraries because of their larger insert lengths. In total, 223 368 full-length reads were obtained, as indicated by detection of the poly(A) sequence as well as the 5′ and 3′ primer sequences. All of the SMRT subreads were mapped against the S. miltiorrhiza genome, with 96% of the reads successfully mapped using BLAT (Kent, 2002; Figure S1). To resolve the high error rates of the subreads, all 796 011 SMRT subreads were corrected using the approximately 500 million NGS reads as input data (Figure 2; Au et al., 2012). After removing the redundant sequences for all SMRT subreads using CD-HIT-EST (c = 0.90), 160 468 non-redundant reads were produced, with a mean read length of 2059 bases.
Besides those coding for proteins, 11 046 of these reads were predicted to be long (more than 200 bases) non-coding RNAs using the coding potential calculator (CPC) for non-redundant long reads. Even though the coverage was quite high (approximately 200×), the transcripts assembled from the ILLUMINA short reads by Trinity largely did not represent full-length cDNAs. Approximately 61% of the assembled transcripts from NGS reads were <600 bases, whereas only 4% of the transcripts from the PACBIO reads were <600 bases (Figure 2). Indeed, the mean full-length read lengths from the different libraries (<1, 1–2, 2–3, and >3 kb) produced by SMRT sequencing were 923, 1283, 2026, and 3020 bases, respectively (Table S1). Nevertheless, from this study it seems that the use of NGS data to correct the low-quality SMRT reads may be better than simply relying on CCS reads.

Figure 2. (a) Comparison of transcript length distribution from different sequencing platforms. (b) Comparison of PACBIO read quality from subreads and corrected reads.

In total, from the NGS data, using a cut-off of FPKM > 10 (fragments per kilobase of exon model per million mapped reads), expression from 12 667 distinct gene loci was detected in the root, with 11 174 expressed in the periderm, 11 149 in the phloem and 10 933 in the xylem. Some genes were uniquely expressed in a single root tissue: 939 in the periderm, 347 in the phloem and 422 in the xylem. Thus, it is possible to distinguish between the transcriptomes from each of these root tissues/sections.
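
The FPKM cut-off and the per-tissue counts described above can be illustrated with a minimal Python sketch. It assumes the standard FPKM definition (fragments per kilobase of exon model per million mapped fragments); the gene names and expression values are hypothetical toy data, not values from this study, and the helper function names are introduced here for illustration only.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def expressed_loci(fpkm_by_tissue, cutoff=10.0):
    """Return, for each tissue, the set of genes with FPKM above the cut-off."""
    return {
        tissue: {gene for gene, value in genes.items() if value > cutoff}
        for tissue, genes in fpkm_by_tissue.items()
    }


def tissue_unique(expressed):
    """Return, for each tissue, the genes expressed in that tissue only."""
    unique = {}
    for tissue, genes in expressed.items():
        others = set().union(
            *(g for other, g in expressed.items() if other != tissue)
        )
        unique[tissue] = genes - others
    return unique


# Hypothetical toy values (gene IDs and FPKMs are illustrative only).
toy = {
    "periderm": {"g1": 50.0, "g2": 12.0, "g3": 0.5},
    "phloem": {"g1": 30.0, "g2": 2.0, "g3": 11.0},
    "xylem": {"g1": 20.0, "g2": 1.0, "g3": 3.0, "g4": 25.0},
}

expressed = expressed_loci(toy)
unique = tissue_unique(expressed)
all_expressed = set().union(*expressed.values())

print(len(all_expressed), {t: sorted(g) for t, g in unique.items()})
```

The same set arithmetic, applied to real per-tissue FPKM tables, would give the kind of totals quoted in the text (loci expressed anywhere in the root versus loci unique to one tissue).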

Expression analysis indicates co-localization of tanshinone biosynthesis and accumulation

In order for the periderm-localized expression of SmCPS1 to be relevant to tanshinone biosynthesis there must be similarly localized production of its substrate, the general diterpenoid precursor (E,E,E)-geranylgeranyl diphosphate (GGPP). In turn, this results from the addition of the general isoprenoid precursor isopentenyl diphosphate (IPP) to allylic diphosphate isoprenyls, beginning with dimethylallyl diphosphate (DMAPP). IPP and DMAPP are double-bond isomers interconverted by IPP isomerase (IPI). IPP and DMAPP are produced by the 2-C-methyl-D-erythritol 4-phosphate (MEP)-dependent pathway in the plastid, where diterpenoid biosynthesis is initiated, although IPP can be imported from the cytosol, where it is produced by the distinct mevalonate (MVA)-dependent pathway (Zi et al., 2014). Accordingly, we investigated the root tissue-specific expression of the isogenes encoding the enzymes that make up both the MEP- and MVA-dependent isoprenoid precursor pathways, as well as potential GGPP synthases (GGPSs) and IPI. Consistent with the localized production of the tanshinones in the periderm, analysis of our root tissue-specific transcriptome data set revealed that not only was SmCPS1 specifically expressed in the periderm, but so too was at least one isoform of each of the enzymes that make up the MEP- and MVA-dependent precursor pathways, as well as GGPS and IPI (Figure 3a,b). In addition, of the 12 SmCPS and nine SmKSL homologs in the S. miltiorrhiza genome, SmCPS1 and SmKSL1 were the most highly (and quite specifically) expressed in the periderm (Table S2). Although SmCPS5, SmKSL7 and SmKSL8 also exhibit somewhat higher expression in the periderm than in other root tissues, these all seemed to be expressed at significantly lower levels than SmCPS1 and SmKSL1 (Table S2). Moreover, the SmCPS1 and SmKSL1 expression patterns observed here are consistent with their role in tanshinone production, and both SmKSL7 and SmKSL8 appear to be pseudogenes (Figure 3c,d). Even beyond these, CYP76AH1, suggested to play a role in tanshinone biosynthesis, is also specifically expressed in the periderm (Figure 3e).

Figure 3. Heat map depicting the expression profile of isoprenoid and more specifically tanshinone biosynthesis-related genes in the periderm, phloem and xylem tissues of Salvia miltiorrhiza. (a) Transcript abundance profiles of enzymatic genes from the MEP pathway. (b) Transcript abundance profiles of enzymatic genes from the MVA pathway. (c, d) Differential expression of various diterpene synthases (SmCPSs and SmKSLs), of which SmCPS1 and SmKSL1 together lead to the production of the known tanshinone precursor miltiradiene. (e) Differential expression of CYP76AH1, which produces ferruginol.

Co-expression analysis for the investigation of tanshinone biosynthesis

Given the clear periderm-specific expression of the biosynthetic machinery necessary for the production of at least the initially oxygenated intermediate ferruginol (Figure 3), and the accumulation of the tanshinones (Figure 1), we hypothesize that tanshinone biosynthesis occurs entirely in this root tissue. Accordingly, the remainder of the genes encoding enzymes involved in tanshinone biosynthesis might be expected to exhibit a similar periderm-specific (co-)expression pattern. Beyond that suggested for CYP76AH1, CYP mono-oxygenases are likely to play additional roles in tanshinone biosynthesis.
Consistent with the expanded nature of the CYP superfamily in plants (Nelson and Werck-Reichhart, 2011), a total of 457 CYPs were identified from the S. miltiorrhiza genome (Table S3). Among the CYP genes, 21% (96/457) were expressed in the periderm with an FPKM > 10, with 33 exhibiting periderm-specific expression profiles like that observed for CYP76AH1 (Table S4). To further refine this list, we carried out qRT-PCR analysis of the expression level of these genes in a wider range of plant organs (flowers, leaves, roots, and stems), as well as leaves treated with the defense signaling molecule methyl jasmonate (MeJA). As controls, SmCPS1 and SmKSL1 were also analysed in this manner, verifying their periderm-specific expression. Notably, 16 CYPs, including CYP76AH1, were then identified as being most specifically expressed in the periderm, and we suggest that these should be given priority in further investigations of tanshinone biosynthesis (Figure 4). Moreover, phylogenetic analysis indicated that two of these were also members of the CYP76AH subfamily, CYP76AH3v3 (SMil_00006344) and CYP76AH3 (SMil_00029757), which by definition share >55% amino acid sequence identity with CYP76AH1 (Figure S2). Given that CYP76AH4 from Rosmarinus officinalis (rosemary) exhibits ferruginol synthase activity analogous to that observed with CYP76AH1 (Zi and Peters, 2013), this suggests that the CYP76AH subfamily may have evolved to play a role in such phenolic diterpenoid biosynthesis in the Lamiaceae plant family more generally.

Given the highly oxidized nature of the tanshinones, it is possible that other oxygenases (e.g. 2-oxoglutarate-dependent dioxygenases, 2ODDs), as well as dehydrogenases (e.g. short-chain alcohol dehydrogenases, SDRs), may play role(s) in tanshinone production, much as observed in other plant diterpenoid biosynthesis (Zi et al., 2014). Accordingly, we carried out similar co-expression analysis of these enzymatic families as well.
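
As a rough illustration of the kind of filter described above (expressed in the periderm with FPKM > 10 and more highly expressed there than in the rest of the root), the following Python sketch ranks genes by periderm enrichment. This is not the procedure used in this study: the fold-change threshold, the pseudocount, the function name and the toy gene values are all assumptions introduced for illustration.

```python
def periderm_enriched(fpkm, min_fpkm=10.0, min_fold=2.0, pseudocount=1.0):
    """Rank genes by periderm enrichment over the other root tissues.

    fpkm maps gene -> {"periderm": x, "phloem": y, "xylem": z}.
    min_fold and pseudocount are hypothetical parameters, not from the paper.
    """
    hits = []
    for gene, values in fpkm.items():
        periderm = values["periderm"]
        if periderm <= min_fpkm:
            continue  # not counted as expressed in the periderm
        other = max(values["phloem"], values["xylem"])
        fold = (periderm + pseudocount) / (other + pseudocount)
        if fold >= min_fold:
            hits.append((gene, fold))
    # Most strongly periderm-enriched genes first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy profiles (illustrative only).
toy = {
    "gene_A": {"periderm": 120.0, "phloem": 4.0, "xylem": 2.0},
    "gene_B": {"periderm": 15.0, "phloem": 14.0, "xylem": 16.0},
    "gene_C": {"periderm": 3.0, "phloem": 0.0, "xylem": 0.0},
    "gene_D": {"periderm": 40.0, "phloem": 9.0, "xylem": 19.0},
}

ranked = periderm_enriched(toy)
print([gene for gene, _ in ranked])
```

A shortlist produced this way would still need the wider organ survey (here, qRT-PCR across flowers, leaves, roots and stems) before any gene is treated as a candidate.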
The 2ODD superfamily is also quite expansive in plants (Kawai et al., 2014), with 144 members found in S. miltiorrhiza, 47 of which were expressed in the roots with FPKM > 10. Of these, 16 were found to be more highly expressed in the periderm than in the rest of the root (Table S5); however, upon analysis of a wider range of plant tissues only one was found to exhibit a root-high expression profile (2ODD-8; Figure 4). The SDR superfamily is similarly expansive in plants (Moummou et al., 2012), with 159 members present in S. miltiorrhiza, 48 of which were expressed in the root with FPKM > 10. Of these, five were found to be more highly expressed in the periderm than in the rest of the root, and wider analysis indicated that all five further exhibit a periderm-specific expression profile (Figure 4; Table S6). The co-expression pattern exhibited by the one 2ODD and five SDRs may indicate a role for these in tanshinone biosynthesis that warrants further investigation.

Alternatively spliced isoforms

The long reads generated by SMRT sequencing are expected to offer extensive information about alternative splicing (Au et al., 2013; Sharon et al., 2013; Chen et al., 2014a). Consistent with this, analysis of the 60 584 058 Illumina short reads by SPLICEMAP (Au et al., 2010) led to the detection of only 110 715 junctions that were retained after nUM filtering with approximately 95% specificity. By contrast, isoform detection and prediction (IDP) analysis of the 1 313 216 sequences generated by SMRT sequencing detected junctions in 1 109 011 of these long-read data (84%). Although there are 26 064 loci annotated as multi-exon genes in the S. miltiorrhiza genome, from the NGS data CUFFLINKS (Trapnell et al., 2012) identified only 10 245 expressed with FPKM > 10 in the root, with 3745 of these genes directly detected using IDP, a 36% detection rate.
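
The proportions quoted in the preceding paragraph can be recomputed directly from the stated counts. The short Python sketch below does only that arithmetic; the helper name `pct` and the dictionary labels are illustrative.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts taken from the text above.
long_reads_total = 1_313_216
long_reads_with_junctions = 1_109_011
annotated_multi_exon_loci = 26_064
expressed_multi_exon_loci = 10_245   # CUFFLINKS, FPKM > 10 in the root
idp_detected_genes = 3_745

rates = {
    # about 84%, as reported
    "long reads with junctions": pct(long_reads_with_junctions, long_reads_total),
    # share of annotated multi-exon loci expressed in the root
    "expressed multi-exon loci": pct(expressed_multi_exon_loci, annotated_multi_exon_loci),
    # between 36% and 37%; the text reports 36%
    "IDP-detected genes": pct(idp_detected_genes, expressed_multi_exon_loci),
}

for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
```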
From spliced alignment of the long-read SMRT sequences, IDP analysis found 16 241 isoforms covering 10 323 multi-exon genes found in this data set, with 6660 exhibiting FPKM > 10, increasing the sensitivity of isoform identification up to 65% (Figure 5a). Of the 10 323 multi-exon gene loci expressed in the root, 4165 (40%) exhibited alternatively spliced isoforms, with more than two isoforms found for 15% (Figure 5b,c). It should be noted that 3526 (85%) of these loci exhibit predominant expression of a single isoform, and the alternative isoforms observed for these may simply represent splicing errors. Nevertheless, our data provide clear evidence for alternative splicing in the S. miltiorrhiza root. Consistent with this, our combined transcriptome data demonstrated the expression of genes encoding all the necessary subunits for spliceosome assembly (Table S7).

Figure 4. Heat map depicting the various CYPs, 2ODDs and SDRs that are co-expressed with the SmCPS1, SmKSL1 and CYP76AH1 known to be involved in tanshinone biosynthesis, in a range of different tissues (periderm, phloem, xylem, root, stem, leaf and flower), as well as control or methyl jasmonate (MeJA)-treated leaves of S. miltiorrhiza (MeJA-0 and MeJA-12, respectively). qRT-PCR analysis was also carried out for the genes in red.

To investigate the distribution of the different types of alternative splicing, we further analyzed all of the junctions and isoforms detected using SPLICEMAP and IDP. A total of 12 264 identified isoforms contained annotated and unannotated junctions, which represented alternatively spliced isoforms of known genes.
Of these, 21 and 4% resulted from intron retention and exon skipping events, respectively, whereas 18 and 39% of the junctions were characterized as alternative 5′ and 3′ splice site events, respectively (Figures 5d and S4).

The genes encoding SmCPS1 and the CYPs do not seem to undergo any significant degree of alternative splicing. Nevertheless, there may be a role for alternative splicing in regulating tanshinone biosynthesis and isoprenoid/terpenoid metabolism more generally (Figure S5). First, five differentially spliced isoforms were observed for SmKSL1 (all with FPKM > 10), only one of which seems likely to encode a catalytically competent enzyme. In addition, a number of genes involved in the production of the isoprenoid precursors exhibit alternative splicing, with only one (SmHDR3, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase) from the MEP pathway, but with four from the MVA pathway: SmAACT3 (acetyl-CoA C-acetyltransferase), SmHMGR (3-hydroxy-3-methylglutaryl-coenzyme A reductase), SmMK (mevalonate kinase) and SmPMK (5-phosphomevalonate kinase). Of particular interest, both SmHDR3 and SmPMK have multiple isoforms expressed with FPKM > 10, only one of which can be translated to a catalytically competent enzyme in each case, such that the regulation of alternative splicing may play a role in controlling flux through both the MEP- and MVA-dependent isoprenoid precursor pathways (Table S8).

DISCUSSION

The long-standing and widespread use of danshen in TCM, along with its continuing translation to modern western medicine, has led to intense interest in the biosynthesis of the relevant bioactive components (Qiu, 2007).
Much of this interest has focused on the tanshinone diterpenoids, which provide the characteristic reddish brown coloring to the highly valued rhizome, and exhibit potent biological activity (Dong et al., 2011). For this purpose, a number of whole-plant and hairy root culture-based transcriptome studies have been previously reported (Ge and Wu, 2005; Hua et al., 2011; Luo et al., 2014; Yan et al., 2010); however, these studies were limited by the number and/or length of the generated sequences, necessitating further cloning efforts in order to obtain full-length cDNA sequences for the investigation of potential roles in tanshinone biosynthesis (Guo et al., 2013). With the identification of the diterpene precursor miltiradiene, many of the remaining steps in tanshinone biosynthesis are likely to be catalyzed by CYPs (Zi et al., 2014), the investigation of which largely relies on synthetic biology approaches using genes codon-optimized for recombinant expression (Kitaoka et al., 2015). This obviously requires accurate and full-length cDNAs, and can be limited by inaccurate sequence information, such as that predicted from genome sequences, as demonstrated by the investigation of the KS(L) gene family in Ricinus communis (castor bean; Jackson et al., 2014).

Figure 5. Detection and prediction of the gene isoforms of Salvia miltiorrhiza using IDP. (a) Venn diagram of isoform detection and prediction. A total of 4035 isoforms and 16 241 isoforms are detected and predicted, respectively. (b, c) The distribution of alternatively spliced isoforms from each gene locus. (d) Pie chart of the different alternative splicing types. ES, exon skipping; IR, intron retention; A3′S, alternative 3′ splice site; A5′S, alternative 5′ splice site.

Given our interest in tanshinone biosynthesis, we demonstrate here that not only accumulation (Figure 1) but also biosynthesis of the tanshinones occurs in the danshen root periderm (Figure 3).
To address the incomplete transcriptome available for S. miltiorrhiza we combined short-read NGS and long-read SMRT sequencing of three distinct root tissues (i.e. the periderm, phloem and xylem), from which we were able to generate a much more complete transcriptome of the danshen root (Figure 2). The use of full-length libraries with long SMRT sequencing reads (SMRT sequencing N50 = 2411 bp) enabled the generation of full-length transcripts, in contrast to assemblies generated with ILLUMINA reads only (ILLUMINA assembled isotigs N50 = 1530 bp; Figure 2a). Nevertheless, a hybrid sequencing approach combining both types of data, specifically correcting the SMRT reads using ILLUMINA reads, led to high-quality full-length transcripts, avoiding mis-assemblies of genes and gene families with high sequence identity. Via this accurate hybrid approach, we were able to generate full-length sequences for a significantly higher proportion of the enzymatic genes involved in terpenoid biosynthesis. In particular, although ILLUMINA-based studies were only able to assemble full-length cDNA sequences for 43% of the relevant enzymatic families on average (e.g. terpene synthases, CYPs, 2ODDs and SDRs), here we were able to increase this to 73% (Table S10).

Critically, our transcriptome analysis not only includes the periderm, where tanshinones are produced and stored, but also alternative root tissues. This then enabled the interrogation of the transcriptomic data to verify periderm-specific expression of both the MEP- and MVA-dependent isoprenoid precursor pathways, along with IPI and potential GGPS genes required for the production of diterpenoids, as well as the SmCPS1, SmKSL1 and CYP76AH1 genes more specifically involved in tanshinone biosynthesis (Figure 3).
On this basis we then analyzed the extensive CYP, 2ODD and SDR superfamilies in S. miltiorrhiza, finding that a limited number of each were more highly expressed in the periderm than the other root tissues investigated here (Tables S4–S6). Given this more limited number of genes it was further possible to use qRT-PCR to more generally analyze their expression pattern, with co-expression with SmCPS1, SmKSL1 and CYP76AH1 suggesting 15 additional CYPs, one 2ODD, and five SDRs that may play role(s) in tanshinone biosynthesis (Figures 4 and S3). Previous studies relying on NGS were able to identify novel introns and splicing variants that altogether indicated that up to 60% of multi-exon genes underwent alternative splicing events in different plants (Wang et al., 2009), such as Arabidopsis thaliana (Filichkin et al., 2010; Marquez et al., 2012), Glycine max (Shen et al., 2014), Brachypodium distachyon (Walters et al., 2013) and Oryza sativa (Zhang et al., 2010). Whereas such NGS short-read data can identify spliced junctions with the use of SPLICEMAP or TOPHAT (Kim and Salzberg, 2011), the mostly incomplete nature of the assembled transcripts largely eliminates the direct identification of distinct isoforms. Combining SMRT long reads and NGS short reads led to sensitive isoform detection and prediction, revealing the corresponding alternative splicing events in the human transcriptome (Au et al., 2013; Sharon et al., 2013; Chen et al., 2014a). Similarly, our hybrid sequencing approach has enabled such analysis of the S. miltiorrhiza transcriptome. In total, 40% of the detected gene loci were identified as undergoing alternative splicing in S. miltiorrhiza (Figure 5). It should be noted that for the majority of these a single isoform was predominant, suggesting that some of the observed alternative isoforms may not be significant (e.g. the result of splicing errors rather than a regulated/controlled process; Reddy et al., 2013). 
Nevertheless, alternative splicing is clearly observed among the genes involved in tanshinone biosynthesis (Figure S5), which may serve as a regulatory mechanism in controlling such diterpenoid metabolism.

In summary, we localized the metabolism of the bioactive tanshinone diterpenoids from the model medicinal plant S. miltiorrhiza to the root periderm, and carried out tissue-differentiated transcriptome analysis of this plant organ using a combined NGS short-read and SMRT long-read sequencing approach. This enabled the generation of full-length transcripts as well as providing evidence for periderm-localized tanshinone biosynthesis, which was used to further identify a subset of 15 CYPs, the co-expression of which with the already known enzymatic genes SmCPS1, SmKSL1 and CYP76AH1 suggests a role in such bioactive diterpenoid metabolism. Moreover, our study provides a template for investigating secondary metabolism in other species, paving the way towards synthetic biology approaches to such natural products.

EXPERIMENTAL PROCEDURES

Plant materials and RNA sample preparation

Three-year-old S. miltiorrhiza (line 99-3) plants were harvested from an experimental field at the Institute of Medicinal Plant Development (IMPLAD). Fifteen independent root samples were collected and divided into three portions. Each portion was divided into three parts (periderm, phloem and xylem). Nine total RNA samples (three different root tissues with three repetitions) were isolated using the RNeasy Plus Mini Kit (#74134; Qiagen, http://www.qiagen.com). The total RNA was quantified and the quality was assessed using an Agilent 2100 Bioanalyzer (Agilent, http://www.agilent.com). Three experiments were conducted. First, the different organs (root, stem, leaf and flower) were collected, and total RNA was extracted. Eight libraries of four different organs were subjected to 2 × 100 paired-end sequencing using the ILLUMINA HiSeq 2000 platform.
Second, nine libraries of three root tissues (periderm, phloem and xylem) were subjected to 2 × 100 paired-end RNA-seq using the ILLUMINA HiSeq 2500. Third, the nine individual samples were pooled to provide 90 µg of total S. miltiorrhiza RNA. Poly(A) RNA was isolated from the total RNA using the oligo d(T) magnetic bead binding method and the Poly(A)Purist™ Kit (#AM1916; Ambion, now Life Technologies, http://www.lifetechnologies.com/uk/en/home/brands/ambion.html). Isolated poly(A) RNA was eluted with 20 µl of RNase-free water. All of the experiments were performed following the protocols included with the kits.

cDNA synthesis and normalization

Isolated poly(A) RNA was quantified using the Agilent 2100 Bioanalyzer. First-strand cDNA was synthesized using the SMARTer PCR cDNA Synthesis Kit (#634926; Clontech, http://www.clontech.com). Through tailing and template switching, SMARTScribe™ Reverse Transcriptase placed the same adaptor sequence at the 3′ and 5′ ends of the cDNA derived from the poly(A) RNA, using CDS Primer IIA [5′-AAGCAGTGGTATCAACGCAGAGTACT(30)N–1N-3′] and SMARTer IIA Oligonucleotide (5′-AAGCAGTGGTATCAACGCAGAGTACXXXXX-3′). Next, second-strand cDNA synthesis was performed using Phusion High-Fidelity DNA Polymerase (#M0530; NEB, http://www.neb.com) with 5′ PCR Primer IIA (5′-AAGCAGTGGTATCAACGCAGAGTAC-3′). Preliminary testing showed that 14 cycles was optimal for avoiding the over-amplification of small fragments. Purified cDNA was normalized using the Trimmer-2 cDNA Normalization Kit (#NK003; Evrogen, http://www.evrogen.com). Next, the normalized cDNA was amplified using the 5′ PCR Primer IIA with 18 cycles. Agarose gel-based size selection was performed using SYBR Safe DNA Gel Stain and a blue light system to avoid DNA damage. Then, four fractions, containing fragments >3, 2–3, 1–2, or <1 kb, were collected and purified using the QIAquick Gel Extraction Kit.
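
The primer sequences listed above share a common 25-nt adaptor, which is why a single primer (5′ PCR Primer IIA) can be used for the amplification steps in this procedure. The following Python sketch illustrates that point on a toy molecule. It is only an illustration under stated assumptions: the function names and the toy mRNA sequence are hypothetical, the undisclosed XXXXX bases and the degenerate 3′ anchor of the CDS primer are written as N, and the model ignores everything about the chemistry except base pairing.

```python
# Sequences (5'->3') as printed in the text; N stands in for the
# undisclosed "X" bases and the degenerate anchor (assumption of this sketch).
PCR_PRIMER_IIA = "AAGCAGTGGTATCAACGCAGAGTAC"
CDS_PRIMER_IIA = PCR_PRIMER_IIA + "T" * 30 + "NN"
SMARTER_IIA_OLIGO = PCR_PRIMER_IIA + "NNNNN"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def toy_first_strand(mrna_body: str) -> str:
    """Build a simplified first-strand cDNA (5'->3').

    Layout: adaptor + oligo(dT) stretch, antisense copy of the mRNA body,
    then the complement of the template-switch oligo added at the 3' end.
    The anchor bases are treated as part of the body for simplicity.
    """
    return (
        PCR_PRIMER_IIA
        + "T" * 30
        + reverse_complement(mrna_body)
        + reverse_complement(SMARTER_IIA_OLIGO)
    )


def can_prime(strand: str, primer: str) -> bool:
    """True if the primer can anneal to the 3' end of the strand."""
    return strand.endswith(reverse_complement(primer))


if __name__ == "__main__":
    first = toy_first_strand("ATGGCTTCCAAGGTTCTG")  # hypothetical mRNA body
    second = reverse_complement(first)
    print("primer anneals to first strand: ", can_prime(first, PCR_PRIMER_IIA))
    print("primer anneals to second strand:", can_prime(second, PCR_PRIMER_IIA))
```

Because both strands of the toy cDNA end in the reverse complement of the adaptor, one primer is sufficient to amplify in both directions; the sketch says nothing about cycle numbers or fragment-size bias, which the authors optimized empirically.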
The extracted products were amplified using the 5′ PCR Primer IIA. Importantly, the cycling conditions of every PCR step had to be optimized to avoid the over-amplification of small fragments. After amplification, the PCR products were purified using 0.5× AMPure beads (#A63880; Beckman, http://www.beckmancoulter.com).

Library preparation and PACBIO sequencing

The four normalized cDNA fractions of different sizes were used separately to construct four SMRT cell libraries using a DNA Template Prep Kit (3–10 kb, part #001-540-835; Pacific Biosciences of California, Inc, http://www.pacificbiosciences.com/). The templates were bound to SA-DNA polymerase and V2 primers using the DNA/Polymerase Binding Kit 2.0 (part #001-672-551). The template-polymerase complexes were bound to magnetic beads (part #100-125900) and transferred to a 96-well PCR plate for processing on a PACBIO RS using C2 sequencing reagents. Each library underwent SMRT sequencing using two SMRT cells. Subreads were filtered and subjected to CCS using the SMRT Analysis Server 2.2.0 (Pacific Biosciences of California, Inc).

Isoform detection and prediction

The short reads generated with the HiSeq 2500 were filtered using the NGS QC Toolkit. LSC 1.alpha software was used to correct the CCS reads by alignment with the filtered NGS short reads. SPLICEMAP 3.3.5.2 (Au et al., 2010) was used to detect exon junctions and novel gene loci. BOWTIE was used to align the short reads to the S. miltiorrhiza genome using SPLICEMAP. IDP 0.1.7 took the error-corrected long reads from LSC and the junctions from SPLICEMAP as input to detect and predict the isoforms.

Phylogenetic analysis

Eight CYPs related to artemisinin biosynthesis in Artemisia annua were selected from NCBI and then pooled with 34 CYPs before performing an alignment with MEGA 6 (MEGA, http://www.megasoftware.net/).
We then constructed an unrooted phylogenetic tree from the full-length amino acid sequences using the neighbour-joining clustering method, with bootstrap analysis based on 1000 replications.

qRT-PCR analysis

Nine RNA samples were isolated from different tissues (periderm, phloem, xylem, root, stem, leaf and flower) of S. miltiorrhiza and from leaves treated with MeJA (control and MeJA, 12 h). Reverse transcription was performed with PrimeScript™ Reverse Transcriptase (TaKaRa, http://www.takara-bio.com). qRT-PCR primers were designed with PRIMER PREMIER 6 (PREMIER Biosoft, http://www.premierbiosoft.com/primerdesign/), and their specificity was verified by PCR (Table S9). qRT-PCR analysis was conducted in triplicate using SYBR® Premix Ex Taq™ II (TaKaRa), with SmActin as a reference gene, on a 7500 real-time PCR system (ABI).

Accession codes

SMRT sequencing data and ILLUMINA HiSeq 2500 data have been submitted to the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under accession numbers SRX753381, SRR1640458, SRP028388 and SRP051564.

Differential expression analysis

The reads from three root tissues (periderm, phloem and xylem) and four different organs (root, stem, leaf and flower) were produced in this study. The data from MeJA-treated leaves (200 µM) were derived from a previous study (Luo et al., 2014). Expression analysis of the ILLUMINA reads from the different tissues and treatments was performed with TOPHAT and CUFFLINKS (Trapnell et al., 2012).

ACKNOWLEDGEMENTS

We thank Dr David R. Nelson for helping in the naming of the CYP450s of Salvia miltiorrhiza. This work was supported by the National Science-technology Support Plan of China (grant no. 2012BAI29B01) and the Major Scientific and Technological Special Project for ‘Significant New Drugs Creation’ (grant no. 2014ZX09304307001).

lncRNAs prediction

Redundant reads of the error-corrected CCS reads were filtered using CD-HIT-EST.
CD-HIT-EST grouped the cDNAs into clusters at a similarity threshold of 0.85 and then removed the redundant sequences. A total of 160 468 filtered non-redundant sequences were used as input to the coding potential calculator (CPC) to predict lncRNAs.

UPLC analysis of tanshinone IIA content

The detection methods followed the Pharmacopoeia of the People’s Republic of China. All of the periderm, phloem and xylem samples were ground into a powder, with three repetitions for each sample, and then each sample of weighed ground powder (0.3 g) was extracted with 50 ml of methanol. After 1 h of heating reflux extraction, methanol was added to make up the lost weight and maintain a constant weight, and then the sample was filtered through a 0.45-µm syringe filter. In addition, the tanshinone IIA standard was dissolved in methanol at a concentration of 16 mg ml⁻¹. Chromatographic separations were performed using a Waters XBridge C18 column with a mobile phase of 75% methanol to 25% H2O in a Waters UPLC system (Waters, http://www.waters.com).

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.

Figure S1. (a) The distribution of PACBIO reads of different length. (b) Mapping statistics of the PACBIO reads to the Salvia miltiorrhiza genome with BLAT. (c) A count versus dispersion plot for the different root tissues, generated by applying CUFFLINKS to the ILLUMINA reads.
Figure S2. Phylogenetic tree analysis of candidate CYPs that were co-expressed with CYP76AH1.
Figure S3. qRT-PCR analysis of the putative copalyl diphosphate synthase (CPS), kaurene synthase-like (KSL), cytochrome P450s (CYPs), 2-oxo-glutarate dependent di-oxygenases (2ODDs) and short-chain alcohol dehydrogenases (SDRs) with putative roles in tanshinone biosynthesis in different tissues (periderm, phloem, xylem, root, stem, leaf, flower), and without or with MeJA treatment (MeJA-0 and MeJA-12) in Salvia miltiorrhiza.
Figure S4.
Predicted isoforms of existing gene loci with different types of alternative splicing, viewed using the IGV genome browser.
Figure S5. The different alternative splicing isoforms of enzymatic genes involved in Salvia miltiorrhiza terpenoid biosynthesis.
Table S1. General properties of the reads produced by ILLUMINA HiSeq 2500 and PACBIO sequencing platforms.
Table S2. Genome-wide identification of diterpenoid synthases in Salvia miltiorrhiza.
Table S3. Genome-wide identification of CYPs in Salvia miltiorrhiza.
Table S4. Genome-wide identification of candidate CYPs that were co-expressed with CYP76AH1 in Salvia miltiorrhiza.
Table S5. Genome-wide identification of candidate 2-oxo-glutarate dependent di-oxygenases (2ODDs) that exhibited periderm-specific expression in Salvia miltiorrhiza.
Table S6. Genome-wide identification of candidate short-chain alcohol dehydrogenases (SDRs) that exhibited periderm-specific expression in Salvia miltiorrhiza.
Table S7. Genome-wide identification of spliceosomal proteins in Arabidopsis thaliana and Salvia miltiorrhiza.
Table S8. The expression pattern of differentially spliced isoforms of enzymatic genes of tanshinone biosynthesis in Salvia miltiorrhiza.
Table S9. The primers used for qPCR analysis.
Table S10. Identified full-length or partial-length genes of the expressed terpenoid synthases, candidate CYPs, 2ODD and SDRs from ILLUMINA assembly and hybrid sequencing.

REFERENCES

Au, K.F., Jiang, H., Lin, L., Xing, Y. and Wong, W.H. (2010) Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res. 38, 4570–4578.
Au, K.F., Sebastiano, V., Afshar, P.T. et al. (2013) Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl Acad. Sci. USA, 110, E4821–E4830.
Au, K.F., Underwood, J.G., Lee, L. and Wong, W.H.
(2012) Improving PacBio long read accuracy by short read alignment. PLoS ONE, 7, e46679.
Chaisson, M.J., Huddleston, J., Dennis, M.Y. et al. (2014) Resolving the complexity of the human genome using single-molecule sequencing. Nature, 517, 608–611.
Chen, L., Kostadima, M., Martens, J.H. et al. (2014a) Transcriptional diversity during lineage commitment of human blood progenitors. Science, 345, 1251033.
Chen, X., Bracht, J.R., Goldman, A.D. et al. (2014b) The architecture of a scrambled genome reveals massive levels of genomic rearrangement during development. Cell, 158, 1187–1198.
Dong, Y., Morris-Natschke, S.L. and Lee, K.H. (2011) Biosynthesis, total syntheses, and antitumor activity of tanshinones and their analogs as potential therapeutic agents. Nat. Prod. Rep. 28, 529–542.
Filichkin, S.A., Priest, H.D., Givan, S.A., Shen, R., Bryant, D.W., Fox, S.E., Wong, W.K. and Mockler, T.C. (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58.
Ge, X. and Wu, J. (2005) Tanshinone production and isoprenoid pathways in Salvia miltiorrhiza hairy roots induced by Ag+ and yeast elicitor. Plant Sci. 168, 487–491.
Guo, J., Zhou, Y.J., Hillwig, M.L. et al. (2013) CYP76AH1 catalyzes turnover of miltiradiene in tanshinones biosynthesis and enables heterologous production of ferruginol in yeasts. Proc. Natl Acad. Sci. USA, 110, 12108–12113.
Hua, W., Zhang, Y., Song, J., Zhao, L. and Wang, Z. (2011) De novo transcriptome sequencing in Salvia miltiorrhiza to identify genes involved in the biosynthesis of active ingredients. Genomics, 98, 272–279.
Jackson, A.J., Hershey, D.M., Chesnut, T., Xu, M. and Peters, R.J. (2014) Biochemical characterization of the castor bean ent-kaurene synthase(like) family supports quantum chemical view of diterpene cyclization. Phytochemistry, 103, 13–21.
Kai, G., Xu, H., Zhou, C., Liao, P., Xiao, J., Luo, X., You, L. and Zhang, L.
(2011) Metabolic engineering tanshinone biosynthetic pathway in Salvia miltiorrhiza hairy root cultures. Metab. Eng. 13, 319–327.
Kawai, Y., Ono, E. and Mizutani, M. (2014) Evolution and diversity of the 2-oxoglutarate-dependent dioxygenase superfamily in plants. Plant J. 78, 328–343.
Kent, W.J. (2002) BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664.
Kim, D. and Salzberg, S.L. (2011) TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 12, R72.
Kitaoka, N., Lu, X., Yang, B. and Peters, R.J. (2015) The application of synthetic biology to elucidation of plant terpenoid metabolism. Mol. Plant, 8, 6–16.
Li, Q., Li, Y., Song, J. et al. (2014) High-accuracy de novo assembly and SNP detection of chloroplast genomes using a SMRT circular consensus sequencing strategy. New Phytol. 204, 1041–1049.
Luo, H., Zhu, Y., Song, J. et al. (2014) Transcriptional data mining of Salvia miltiorrhiza in response to methyl jasmonate to examine the mechanism of bioactive compound biosynthesis and regulation. Physiol. Plant. 152, 241–255.
Ma, Y., Yuan, L., Wu, B., Li, X., Chen, S. and Lu, S. (2012) Genome-wide identification and characterization of novel genes involved in terpenoid biosynthesis in Salvia miltiorrhiza. J. Exp. Bot. 63, 2809–2823.
Marquez, Y., Brown, J.W., Simpson, C., Barta, A. and Kalyna, M. (2012) Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 22, 1184–1195.
Moummou, H., Kallberg, Y., Tonfack, L.B., Persson, B. and van der Rest, B. (2012) The plant short-chain dehydrogenase (SDR) superfamily: genome-wide inventory and diversification patterns. BMC Plant Biol. 12, 219.
Nelson, D. and Werck-Reichhart, D. (2011) A P450-centric view of plant evolution. Plant J. 66, 194–211.
Peters, R.J. (2010) Two rings in them all: the labdane-related diterpenoids. Nat. Prod. Rep. 27, 1521–1530.
Qiu, J. (2007) Traditional medicine: a culture in the balance. Nature, 448, 126–128.
Reddy, A.S., Marquez, Y., Kalyna, M. and Barta, A. (2013) Complexity of the alternative splicing landscape in plants. Plant Cell, 25, 3657–3683.
Sharon, D., Tilgner, H., Grubert, F. and Snyder, M. (2013) A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014.
Shen, Y., Zhou, Z., Wang, Z. et al. (2014) Global dissection of alternative splicing in paleopolyploid soybean. Plant Cell, 26, 996–1008.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L. and Pachter, L. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578.
Walters, B., Lum, G., Sablok, G. and Min, X.J. (2013) Genome-wide landscape of alternative splicing events in Brachypodium distachyon. DNA Res. 20, 163–171.
Wang, X., Morris-Natschke, S.L. and Lee, K.H. (2007) New developments in the chemistry and biology of the bioactive constituents of Tanshen. Med. Res. Rev. 27, 133–148.
Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63.
Yan, Y., Wang, Z., Tian, W., Dong, Z. and Spencer, D.F. (2010) Generation and analysis of expressed sequence tags from the medicinal plant Salvia miltiorrhiza. Sci. China Life Sci. 53, 273–285.
Zhang, G., Guo, G., Hu, X. et al. (2010) Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res. 20, 646–654.
Zhao, X., Zheng, X., Fan, T.-P., Li, Z., Zhang, Y. and Zheng, J. (2015) A novel drug discovery strategy inspired by traditional medicine philosophies. Science, 347, S38–S40.
Zi, J., Mafu, S. and Peters, R.J. (2014) To gibberellins and beyond! Surveying the evolution of (di)terpenoid metabolism. Annu. Rev. Plant Biol. 65, 259–286.
Zi, J. and Peters, R.J. (2013) Characterization of CYP76AH4 clarifies phenolic diterpenoid biosynthesis in the Lamiaceae. Org. Biomol. Chem.
11, 7650–7652.