How To… Process Your 454 16S/18S rRNA Amplicons with QIIME

Sequencing Solutions Technical Note
September 2013
1. OVERVIEW
QIIME (pronounced “chime”) is an open source
software tool for analyzing raw data from high
throughput sequencing of microbiomes (Caporaso, J et
al. Nature Methods 7, 335 – 336, 2010). QIIME
computes results and generates output from 16S/18S
sequencing data and supports taxonomic resolution of a
wide range of microbial communities.
454 Sequencing platforms are ideal for sequencing the
small subunit (SSU) ribosomal RNA gene (16S or 18S)
because of their ability to:
• Multiplex hundreds of samples with sample-specific identifiers through the use of MIDs
• Produce >700,000 reads / run on the GS FLX+ system and >70,000 reads / run on the GS Junior system
• Sequence long amplicons, up to 1,100 bp, with software v2.9 and Flow Pattern B (available on GS FLX+ instruments and, in early 2014, for the GS Junior instrument). Currently, the GS Junior instrument sequences up to 550 bp.
• Choose sequencing depth for an optimal taxonomic resolution

Application: Ribosomal RNA sequencing
Products: GS FLX, GS FLX+ and GS Junior Systems

These features have made 454 Sequencing systems the
standard Next Generation Sequencing platforms for
identifying SSU rRNA since 2008, with over 1,000 peer-reviewed publications.
Here, we show an overview of the steps for setting up a
sequencing experiment for 16S or 18S rRNA genes on a
454 Sequencing platform and analyzing the amplicon data with the QIIME software.
Questions and comments related to the sequencing run should be directed to your local Roche support team, while
QIIME-related questions should be directed to QIIME’s user support group at QIIME.org.
2. STARTING WITH QIIME
Install the QIIME software on your computer following the instructions on the QIIME website (www.QIIME.org).
QIIME supports several operating systems and hardware platforms. Typically, users install a local
copy of QIIME on a Mac or Linux-based computer, or use the provided Amazon cloud instance.
3. CREATING AMPLICONS WITH QIIME
The most efficient way to create 16S or 18S amplicons from 16S/18S hypervariable regions is to use the primers
posted on the QIIME website. These primers, along with PCR amplification protocols, were tested by QIIME
developers to create long amplicons that were successfully sequenced on the GS FLX+ instrument and analyzed with
QIIME. The details of the protocols, primer sequences, PCR conditions, and thermocycler settings can be
downloaded from the ‘Resources’ section at QIIME.org.
The MS Word file primer_ordering_and_resuspension.doc, available via download from the “Resources” section,
has instructions on ordering primers and preparing the amplicons. There are three primer sets described:
• Primer set 515f – 1119r targets the eukaryotic rRNA gene and yields a roughly 800 bp amplicon
that spans the V4 – V6 variable regions of the small subunit rRNA gene. The forward primer, 515f, is
barcoded and is a three-domain universal primer that picks up Eukaryotes, Bacteria, and Archaea. The
reverse primer, 1119r, targets Eukaryotes but is a poor match to vertebrates.
• Primer set 515f – 1391r consists of universal primers that amplify the rRNA gene of Bacteria,
Eukaryotes, and Archaea. It yields an amplicon roughly 900 bp long for Bacteria and Archaea and 1,100 bp
long for Eukaryotes.
• Primer set 515f – EukBr targets Eukaryotes and yields an amplicon roughly 1,200 bp long that
spans the V4 – V9 variable regions.
4. TIPS FOR SEQUENCING YOUR 16S/18S SAMPLES
Cleaning the Amplicon Library before Sequencing
Before sequencing your 16S/18S amplicons, it is crucial to clean the amplicon library of short fragments, which would
otherwise take up sequencing capacity better used for more valuable long reads. For instance, in the sequencing run
below (Figure 1), the short peaks below the 300 bp mark could easily have been eliminated with AMPure XP Beads or
gel electrophoresis.
Figure 1: A 16S sequencing run with primer set 515f - 1391r. The intended product is approximately 900 bases. Primer dimer
peaks from the improperly purified library are evident below 300 bases.
Choosing the Right Analysis Pipeline
Figure 2 shows an 18S sequencing run on the GS FLX+ instrument, analyzed with the shotgun pipeline. Similar
results would have been obtained with the Long Amplicons 2 pipeline. Both analysis pipelines allow reads to be
trimmed at the 3’ ends if errors in the last 100 bases exceed a certain threshold (approximately Q20, or 1% error).
Because sequencing errors tend to increase with read length, the basecalls at the beginning of the reads will generally
be much better than Q20.
A total of four analysis pipelines are available: three for long amplicons (Long Amplicons 1, Long Amplicons 2, and
Long Amplicons 3) and one for shotgun. General guidelines on the performance of these pipelines follow:
• The Long Amplicons 1 pipeline may be overly stringent for most 16S/18S studies because it rejects too many reads.
This pipeline has no read trimming capability, so partially good reads are discarded rather than trimmed back to
eliminate errors. This signal processing pipeline was designed for sequencing applications where keeping the
entire read is a requirement.
• The Long Amplicons 2 and shotgun pipelines produce reads with similar accuracies. For highly diverse samples,
shotgun might be preferred, but for samples with less diversity, Long Amplicons 2 may perform better. The
shotgun pipeline takes advantage of read diversity to perform accuracy calculations, so if diversity is low, it
may reject many good reads.
• The Long Amplicons 3 pipeline tends to produce the most reads, although accuracy may be lower. Users should
be aware that this pipeline may allow an error rate of ~2% in the last 100 bases of the read.
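The thresholds quoted above (approximately Q20, or 1% error, and an error rate of ~2%) can be related through the standard Phred definition, Q = -10 · log10(P). The short Python sketch below is only an illustration of that conversion; the function names are hypothetical and are not part of the 454 or QIIME software.

```python
import math


def phred_to_error(q):
    """Expected per-base error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for a per-base error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print the error rate behind a few common quality thresholds.
for q in (20, 25, 30):
    print(f"Q{q}: {phred_to_error(q):.2%} expected error per base")

# The ~2% error rate mentioned for Long Amplicons 3, expressed as a Q score.
print(f"2% error corresponds to roughly Q{error_to_phred(0.02):.0f}")
```

Under this definition, a 2% error rate sits a few Q units below the Q20 threshold used by the other trimming pipelines.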
Figure 2: An 18S sequencing run using primers 515f – EukBr. The read lengths for this amplicon are close to the limit of
sequencing possible for the GS FLX+ instrument with Flow Pattern B.
5. CHARACTERIZING YOUR SAMPLE WITH QIIME
Denoising and Read Filtering Tips
For long amplicons (up to 1,100 bp) that will be sequenced with Flow Pattern B (available with v2.9 software or
later), please be aware that Flow Pattern B will not work correctly with the flow-based denoiser used in QIIME.
Downstream QIIME analysis tools can still be used with Flow Pattern B, but the denoising step should be skipped
with Flow Pattern B sequencing runs. If denoising is needed, it is possible to use a non-flow based denoiser such as
Acacia (Bragg, et al., Nature Methods, pp. 425–426, 2012). Recommendations comparing the various denoiser
options will be available from the QIIME web site once benchmarking is complete.
Denoising Option
There is a filtering option when denoising is needed. However, the recommended setting for filtering reads (the
'-w 50 -g' combination, which discards reads of poor quality) is very aggressive and can discard a large number of
good quality reads, sometimes more than 50% of the good reads in a run. If too many reads are discarded, we suggest
using one of the alternate trimming protocols below, which use quality averages to identify where to trim off the
low accuracy ends of the reads.
Below are two examples of filtering, using a mapping file named “HI2KAW301_mapping.txt”:
Example 1: The filter examines the quality of the entire read; if the average falls below Q25, it trims the 3’ end
until the quality is above Q25.
split_libraries.py -o denoiser_dir -f HI2KAW301.fna -q HI2KAW301.qual -m HI2KAW301_mapping.txt -b 10 -p -k -e 1 -r -s 25 -l 200
Example 2: The filter scans a sliding window of 50 bases and trims the read where the 50-base average falls below Q25.
split_libraries.py -o denoiser_dir -f HI2KAW301.fna -q HI2KAW301.qual -m HI2KAW301_mapping.txt -b 10 -p -k -e 1 -r -w 50 -l 200
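To make the sliding-window idea in Example 2 concrete, here is a minimal Python sketch of window-based truncation. It is an illustration only, not QIIME's implementation: the function names are hypothetical, and cutting the read at the start of the first failing window is an assumption about the exact trim point.

```python
def truncate_at_bad_window(quals, window=50, min_avg=25):
    """Return how many leading bases to keep.

    Slides a fixed-size window along the per-base quality scores and stops at
    the first window whose mean quality is below min_avg. The read is cut at
    the start of that window (an assumption made for this sketch).
    """
    if len(quals) < window:
        return len(quals)  # too short to evaluate a full window
    window_sum = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Update the running sum: add the new base, drop the old one.
            window_sum += quals[start + window - 1] - quals[start - 1]
        if window_sum / window < min_avg:
            return start
    return len(quals)


def passes_min_length(kept_bases, min_len=200):
    """Mimic a minimum-length check applied after truncation."""
    return kept_bases >= min_len


# A synthetic read: 300 high-quality bases followed by a low-quality tail.
read_quals = [35] * 300 + [10] * 100
kept = truncate_at_bad_window(read_quals)
print(kept, passes_min_length(kept))
```

Because the read is truncated rather than discarded, the high-quality 5’ portion is retained as long as it still meets the minimum length.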
The denoiser can be used with Flow Pattern A runs generated on either the GS FLX or the GS Junior instrument.
However, when following the QIIME tutorial, the 'denoise_wrapper.py' wrapper script will return error messages. To
circumvent this issue, run the script manually from the command line as directed on the QIIME denoiser.py page
(http://qiime.org/scripts/denoiser.html). In brief, after running split_libraries.py, denoiser.py can be run as follows:
denoiser.py -c -n 8 -f denoiser_dir/seqs.fna -i HI2KAW301.sff.txt -o denoiser_dir/out
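For repeated runs, the two commands above can be assembled from a run identifier in a small script. The Python sketch below only rebuilds the command lines shown in this note and prints them. The helper names and the dry-run default are assumptions made for illustration, and executing the commands requires a working QIIME installation.

```python
import subprocess


def split_libraries_cmd(run_id, out_dir="denoiser_dir", window=50, min_len=200):
    """Argument list mirroring the Example 2 split_libraries.py call."""
    return [
        "split_libraries.py", "-o", out_dir,
        "-f", f"{run_id}.fna", "-q", f"{run_id}.qual",
        "-m", f"{run_id}_mapping.txt",
        "-b", "10", "-p", "-k", "-e", "1", "-r",
        "-w", str(window), "-l", str(min_len),
    ]


def denoiser_cmd(run_id, out_dir="denoiser_dir", cpus=8):
    """Argument list mirroring the denoiser.py call shown above."""
    return [
        "denoiser.py", "-c", "-n", str(cpus),
        "-f", f"{out_dir}/seqs.fna",
        "-i", f"{run_id}.sff.txt",
        "-o", f"{out_dir}/out",
    ]


def run_pipeline(run_id, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in (split_libraries_cmd(run_id), denoiser_cmd(run_id)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


# Dry run: shows the commands without needing QIIME on the PATH.
run_pipeline("HI2KAW301")
```

As noted above, the denoising step applies only to Flow Pattern A runs and should be skipped for Flow Pattern B data.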
6. QIIME ANALYSIS OUTPUT
The end goal of a 16S/18S amplicon study is to characterize the microbial community, by means that can
include the following:
Comparing taxa across samples (Figure 3)
Figure 3: Example from the QIIME tutorial of a taxonomy summary comparing taxa across multiple samples
Creating rarefaction curves (Figure 4)
Figure 4: Example from the QIIME tutorial of a rarefaction curve, used to determine the levels of diversity in samples
Generating operational taxonomic unit (OTU) heatmaps (Figure 5).
Figure 5: Example from the QIIME tutorial of a heatmap comparing OTUs across samples
Generating phylogenetic trees (Figure 6).
Figure 6: Phylogenetic tree from the QIIME tutorial generated from the example dataset
Generating OTU networks (Figure 7).
Figure 7: Example from the QIIME tutorial of an OTU network that was produced using the sample dataset
Figure 8: Principal Coordinates Analysis (PCoA) 3-dimensional plot using the sample dataset in the QIIME tutorial. 2-D
versions are also demonstrated in the tutorial.
7. LEARNING QIIME
To take full advantage of QIIME when performing an amplicon study with the 454 Sequencing system, start by
following the tutorial on the QIIME website at http://qiime.org/tutorials/index.html.
The tutorial explains each QIIME tool used to compare, contrast, organize, and display data, and walks you through
running them.
Published by:
Roche Diagnostics GmbH
Sandhofer Straße 116
68305 Mannheim
Germany
© 2013 Roche Diagnostics
All rights reserved.
Notice to Purchaser
For patent license limitations for individual products please refer to: www.technical-support.roche.com.
For life science research only. Not for use in diagnostic procedures.
Trademarks
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, and GS JUNIOR are trademarks of Roche.
All other product names and trademarks are the property of their respective owners.
07107145001 (1) 0913