How To… Process Your 454 16S/18S rRNA Amplicons with QIIME
Sequencing Solutions Technical Note, September 2013
Application: Ribosomal RNA sequencing
Products: GS FLX, GS FLX+ and GS Junior Systems
For life science research only. Not for use in diagnostic procedures.

1. OVERVIEW

QIIME (pronounced "chime") is an open source software tool for analyzing raw data from high-throughput sequencing of microbiomes (Caporaso, J. et al., Nature Methods 7, 335-336, 2010). QIIME computes results and generates output from 16S/18S sequencing data and supports taxonomic resolution of a wide range of microbial communities.

454 Sequencing platforms are ideal for sequencing the small subunit (SSU) ribosomal RNA gene (16S or 18S) because of their ability to:

- Multiplex hundreds of samples using sample-specific identifiers (MIDs).
- Produce >700,000 reads per run on the GS FLX+ system and >70,000 reads per run on the GS Junior system.
- Sequence long amplicons, up to 1,100 bp, with software v2.9 and Flow Pattern B (available on GS FLX+ instruments, and in early 2014 for the GS Junior instrument). Currently, the GS Junior instrument sequences up to 550 bp.
- Choose the sequencing depth for optimal taxonomic resolution.

These features have made 454 Sequencing systems the standard Next Generation Sequencing platforms for identifying SSU rRNA since 2008, with over 1,000 peer-reviewed publications.

Here, we give an overview of the steps for setting up a sequencing experiment for 16S or 18S rRNA genes on a 454 Sequencing platform and analyzing the amplicon data with the QIIME software. Questions and comments related to the sequencing run should be directed to your local Roche support team, while QIIME-related questions should be directed to QIIME's user support group at QIIME.org.

2. STARTING WITH QIIME

Install the QIIME software on your computer following the instructions on the QIIME website (www.QIIME.org). QIIME runs on several computer platforms and operating systems.
Typically, users install a local copy of QIIME on a Mac or Linux-based computer, or use the provided Amazon cloud instance.

3. CREATING AMPLICONS WITH QIIME

The most efficient way to create 16S or 18S amplicons from 16S/18S hypervariable regions is to use the primers posted on the QIIME website. These primers, along with PCR amplification protocols, were tested by QIIME developers to create long amplicons that were successfully sequenced on the GS FLX+ instrument and analyzed with QIIME. The protocol details, primer sequences, PCR conditions, and thermocycler settings can be downloaded from the "Resources" section at QIIME.org. The MS Word file primer_ordering_and_resuspension.doc, also available from the "Resources" section, has instructions on ordering primers and preparing the amplicons. Three primer sets are described:

- Primer set 515f - 1119r targets the eukaryotic rRNA gene and yields a roughly 800 bp amplicon that spans the V4 - V6 variable regions of the small subunit rRNA gene. The forward primer, 515f, is barcoded and is a three-domain universal primer that picks up Eukaryotes, Bacteria, and Archaea. The reverse primer, 1119r, targets Eukaryotes but is a poor match to vertebrates.
- Primer set 515f - 1391r consists of universal primers that amplify the ribosomal RNA gene of Bacteria, Eukaryotes, and Archaea. It yields an amplicon roughly 900 bp long for Bacteria and Archaea and 1,100 bp long for Eukaryotes.
- Primer set 515f - EukBr targets Eukaryotes and yields an amplicon roughly 1,200 bp long that spans the V4 - V9 variable regions.

4. TIPS FOR SEQUENCING YOUR 16S/18S SAMPLES

Cleaning the Amplicon Library before Sequencing

Before sequencing your 16S/18S amplicons, it is crucial to clean the amplicon library of short fragments, which would otherwise take up sequencing capacity better spent on more valuable long reads.
For instance, in the sequencing run below (Figure 1), the short peaks below the 300 bp mark could have been easily eliminated with AMPure XP Beads or gel electrophoresis. Short reads waste space that could be used for longer reads.

Figure 1: A 16S sequencing run with primer set 515f - 1391r. The intended product is approximately 900 bases. Primer dimer peaks from the improperly purified library are evident below 300 bases.

Choosing the Right Analysis Pipeline

Figure 2 shows an 18S sequencing run on the GS FLX+ instrument, analyzed with the shotgun pipeline. Similar results would have been obtained with the Long Amplicons 2 pipeline. Both analysis pipelines allow reads to be trimmed at the 3' end if errors in the last 100 bases exceed a certain threshold (approximately Q20, or 1% error). Because sequencing errors tend to increase with read length, the basecalls at the beginning of the reads will generally be much better than Q20.

A total of four analysis pipelines are available: three for long amplicons (Long Amplicons 1, Long Amplicons 2, and Long Amplicons 3) and one for shotgun. General guidelines on the performance of these pipelines are as follows:

- The Long Amplicons 1 pipeline may be overly stringent for most 16S/18S studies, as it rejects too many reads. This pipeline has no read trimming capability, so partially good reads are not trimmed back to eliminate errors but are discarded entirely. This signal processing pipeline was designed for sequencing applications where keeping the entire read is a requirement.
- The Long Amplicons 2 and shotgun pipelines produce reads with similar accuracies. For highly diverse samples, shotgun might be preferred, but for samples with less diversity, Long Amplicons 2 may perform better. The shotgun pipeline takes advantage of read diversity to perform accuracy calculations, so if diversity is low, it may reject many good reads.
- The Long Amplicons 3 pipeline tends to produce the most reads; however, accuracy may be lower. Users should be aware that this pipeline may allow an error rate of ~2% in the last 100 bases of the read.

Figure 2: An 18S sequencing run using primers 515f - EukBr. The read lengths for this amplicon are close to the limit of sequencing possible for the GS FLX+ instrument with Flow Pattern B.

5. CHARACTERIZING YOUR SAMPLE WITH QIIME

Denoising and Read Filtering Tips

For long amplicons (up to 1,100 bp) sequenced with Flow Pattern B (available with v2.9 software or later), please be aware that Flow Pattern B will not work correctly with the flow-based denoiser used in QIIME. Downstream QIIME analysis tools can still be used, but the denoising step should be skipped for Flow Pattern B sequencing runs. If denoising is needed, it is possible to use a non-flow-based denoiser such as Acacia (Bragg, et al., Nature Methods, pp. 425-426, 2012). Recommendations comparing the various denoiser options will be available from the QIIME web site once benchmarking is complete.

Denoising Option

There is a filtering option when denoising is needed. However, the recommendation for filtering reads (using the '-w 50 -g' combination to discard reads of poor quality) is very aggressive and can discard a large number of good quality reads, sometimes more than 50% of the good reads in a run. If too many reads are discarded, we suggest using one of the alternate trimming protocols below, which use quality averages to identify where to trim off the low-accuracy ends of the reads. Below are two examples of filtering that use a mapping file named "HI2KAW301_mapping.txt":

Example 1

The filter examines the quality of the entire read; if the average falls below Q25, it trims the 3' end until the average quality is above Q25.
    split_libraries.py -o denoiser_dir -f HI2KAW301.fna -q HI2KAW301.qual -m HI2KAW301_mapping.txt -b 10 -p -k -e 1 -r -s 25 -l 200

Example 2

The filter scans a sliding window of 50 bases and trims the read where the 50-base average falls below Q25.

    split_libraries.py -o denoiser_dir -f HI2KAW301.fna -q HI2KAW301.qual -m HI2KAW301_mapping.txt -b 10 -p -k -e 1 -r -w 50 -l 200

The denoiser can be used with Flow Pattern A runs generated on either the GS FLX or the GS Junior instrument. However, the 'denoise_wrapper.py' wrapper script used in the QIIME tutorial will return error messages. To circumvent this issue, please run the script manually from the command line as directed on the QIIME denoiser.py page (http://qiime.org/scripts/denoiser.html). In brief, after running split_libraries.py, denoiser.py can be run as below:

    denoiser.py -c -n 8 -f denoiser_dir/seqs.fna -i HI2KAW301.sff.txt -o denoiser_dir/out

6. QIIME ANALYSIS OUTPUT

The end goal of a 16S/18S amplicon study is to characterize the microbial community, by means that can include the following:

- Comparing taxa across samples (Figure 3).

Figure 3: Example from the QIIME tutorial of a taxonomy summary comparing taxa across multiple samples.

- Creating rarefaction curves (Figure 4).

Figure 4: Example from the QIIME tutorial of a rarefaction curve, used to determine the levels of diversity in samples.

- Generating operational taxonomic unit (OTU) heatmaps (Figure 5).

Figure 5: Example from the QIIME tutorial of a heatmap comparing OTUs across samples.

- Generating phylogenetic trees (Figure 6).

Figure 6: Phylogenetic tree from the QIIME tutorial generated from the example dataset.

- Generating OTU networks (Figure 7).
Figure 7: Example from the QIIME tutorial of an OTU network produced using the sample dataset.

Figure 8: Principal Coordinates Analysis (PCoA) 3-dimensional plot using the sample dataset in the QIIME tutorial. 2-D versions are also demonstrated in the tutorial.

7. LEARNING QIIME

To take full advantage of QIIME when performing an amplicon study with the 454 Sequencing system, first follow the tutorial on the QIIME website at http://qiime.org/tutorials/index.html. The tutorial explains each QIIME tool used to compare, contrast, organize, and display data, and walks you through running them.

Published by:
Roche Diagnostics GmbH
Sandhofer Straße 116
68305 Mannheim
Germany

© 2013 Roche Diagnostics. All rights reserved.

Notice to Purchaser
For patent license limitations for individual products please refer to: www.technical-support.roche.com.

Trademarks
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, and GS JUNIOR are trademarks of Roche. All other product names and trademarks are the property of their respective owners.

07107145001 (1) 0913
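Supplementary sketch for Section 5. The two trimming rules described under "Denoising Option" (Example 1 and Example 2), and the Q20 / 1% error relationship mentioned in Section 4, can be illustrated with a few lines of Python. This is only a sketch of the rules as they are worded in this note, not QIIME's implementation: the function names are hypothetical, the behavior for reads shorter than the window is an assumption, and split_libraries.py may handle edge cases differently.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability (Q20 -> 0.01)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def trim_by_mean(quals, min_mean=25):
    """Rule as worded in Example 1: drop bases from the 3' end until the
    average quality of the remaining read is no longer below min_mean.
    Returns the number of bases to keep."""
    end = len(quals)
    total = sum(quals)
    while end > 0 and total / end < min_mean:
        end -= 1
        total -= quals[end]
    return end


def trim_by_window(quals, window=50, min_mean=25):
    """Rule as worded in Example 2: slide a fixed-size window along the read
    and cut at the start of the first window whose average is below min_mean.
    Assumption: reads shorter than the window are left untrimmed.
    Returns the number of bases to keep."""
    if len(quals) < window:
        return len(quals)
    total = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Update the running sum: add the base entering the window,
            # subtract the base leaving it.
            total += quals[start + window - 1] - quals[start - 1]
        if total / window < min_mean:
            return start
    return len(quals)


if __name__ == "__main__":
    # Hypothetical read: 100 bases at Q40 followed by 120 bases at Q10.
    read = [40] * 100 + [10] * 120
    print("keep (whole-read mean):", trim_by_mean(read))
    print("keep (sliding window): ", trim_by_window(read))
    print("error rate at Q20:     ", phred_to_error(20))
    print("Phred score at 2% error:", round(error_to_phred(0.02), 2))
```

On the hypothetical read above, the whole-read rule keeps 200 bases while the sliding-window rule keeps 76, which shows why the two options can give quite different read lengths after filtering. The -l 200 minimum length used in both command lines would then apply to the trimmed reads.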