Data analysis for DNA and RNA NGS

Transcription

Data analysis for DNA and RNA NGS
Illumina Analysis
Solutions
Thomas Patrick Klemm
Sr. Sales Specialist
South APAC
Illumina Singapore
© 2014 Illumina, Inc. All rights reserved.
Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio, Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate,
HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect, ForenSeq, MiSeq, MiSeqDx, MiSeqFGx, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina, SeqMonitor, SureMDA, TruGenome, TruSeq,
TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or
other countries. All other names, logos, and other trademarks are the property of their respective owners.
Agenda
Intro
MiSeq Reporter - MiSeq Automated Analysis Workflows
DesignStudio – Creating Your own Custom Panel
VariantStudio – Targeted Variant Analysis
BaseSpace – Cloud Analysis and Storage
2
Welcome to the Future!
3
NGS changing lives!
4
Saving children’s lives
The technology is ready!
5
Inherited Pediatric Diseases
6
“Baby-Seq” – Starting early
7
“Obama-Seq” – Precision Medicine with FDA
8
Ebola
Illumina collaborating with the BROAD and USAID
9
Our Vision
Innovating for the Future of Human Health
To improve human health by unlocking the power of the genome
10
Seamless End to End Genomics Solution
Illumina Bioinformatics
Simplest NGS Workflow
Most Peer Reviewed Technology
Integrated, Optimized Sample Prep
Broadest Applications
Largest Community of Users
11
Illumina’s Suite of Library Prep Solutions
TruSeq RNA
TruSeq DNA PCR-Free
TruSeq Custom Amplicon
TruSeq Stranded mRNA
TruSeq Nano DNA
TruSight Tumor
TruSeq Stranded Total RNA
Nextera
TruSeq Amplicon Cancer Panel
TruSeq Small RNA
Nextera XT
TruSight Myeloid
TruSeq Targeted RNA Expression
Nextera Mate Pair
Nextera Rapid Capture Exome/Custom
TruSeq RNA Access
TruSeq Synthetic Long Read
TruSight Panels
TruSeq ChIP
12
No matter the input, all libraries end up looking similar
Dual Index Library shown
The aim of the Library Prep step is to obtain nucleic acid fragments with adapters attached on both ends
13
Intuitively Designed Software for Quick Adoption
MiSeq Control Software
MiSeq Reporter
Design Studio
14
For Research Use Only. Not for use in diagnostic procedures
Illumina Experiment Manager
BaseSpace
MiSeq Reporter - Automated Analysis Workflows
15
Analysis Overview
Analysis Type
Software
Outputs
Control Software
MCS/RTA
Images, Intensities and Base Calls
Analysis Software
Alignments, Variant Detection
Visualization
Software
Annotation, Filtering, Reports
16
MiSeq Reporter
Unprecedented Walk-Away Informatics Solution
Automated secondary analysis for key applications:
Resequencing
Amplicon sequencing
16S Metagenomics
de novo assembly
Small RNA
Library QC
Integrated analysis hardware
Output in standard formats:
FASTQ
BAM
VCF
TXT
17
MiSeq Applications Portfolio
Integrated. Optimized. Simplified.
Amplicon
Sequencing
Custom
Amplicon
Targeted
Resequencing
Custom
Enrichment
Small RNA
sequencing
Clone
checking
ChIP-Seq
Library QC
Plasmid
Regulation
RNA-Seq
Resequencing
Small
genome
RNA
sequencing
De novo
sequencing
16S
Metagenomics
18
General Summary Report – All Apps
Low %
High %
19
Clusters
Mismatch
MSR Workflows: DS Workflow
Common Applications:
– TruSight Tumor samples, especially FFPE
– Circulating DNA
Process:
1. Reads are aligned against a unique targeted manifest for each strand (banded Smith-Waterman aligner)
2. Variants are called using the somatic variant caller
3. Forward and reverse strand variants are compared and reconciled in final output
Outputs:
– FASTQ
– BAM
– VCF
– gVCF
20
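The strand comparison in step 3 of the DS workflow can be pictured with a short sketch. This is only an illustration under stated assumptions: the `Variant` tuple, the `reconcile` function, and the rule "keep a call only if both strands support it" are hypothetical, not MiSeq Reporter's documented logic, and the coordinates are example values.

```python
# Hypothetical illustration of strand reconciliation; not MiSeq Reporter's actual algorithm.
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def reconcile(forward, reverse):
    """Split calls into those seen on both strands and those seen on one strand only."""
    fwd, rev = set(forward), set(reverse)
    confirmed = sorted(fwd & rev)      # supported by both strand-specific analyses
    single_strand = sorted(fwd ^ rev)  # present on exactly one strand
    return confirmed, single_strand


# Example values only.
forward_calls = [Variant("chr7", 55259515, "T", "G"), Variant("chr12", 25398284, "C", "T")]
reverse_calls = [Variant("chr7", 55259515, "T", "G")]

confirmed, single_strand = reconcile(forward_calls, reverse_calls)
print("confirmed on both strands:", confirmed)
print("single strand only:", single_strand)
```

Calls seen on one strand only are reported separately here rather than discarded, so a reviewer could still inspect them.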
MSR Workflows: De Novo Assembly
Common applications:
– De novo assembly of small genomes
Process:
1. Uses Velvet assembler to reconstruct small genomes through use of contigs, without
the need for a reference
2. Can compare with known reference if available to generate dot-plot
Outputs:
– FASTA file containing contigs
– Dot plot png
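Once the contig FASTA is written, simple assembly metrics can be computed outside of MSR. The sketch below assumes a plain multi-line FASTA and computes N50, a commonly quoted assembly statistic; whether N50 is among the metrics in the MSR report is an assumption, and the function names are illustrative.

```python
def contig_lengths(fasta_text):
    """Return the length of each record in a FASTA document given as a string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy contigs for illustration only.
example_fasta = ">contig1\nACGTACGT\n>contig2\nACGT\n>contig3\nACG\n>contig4\nAC\n"
lengths = contig_lengths(example_fasta)
print(lengths, n50(lengths))
```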
21
De Novo Assembly Details Report
De novo metrics
Syntenic dot plot (if reference genome supplied)
→ Assembly = Velvet
22
MSR Workflows: Generate FASTQ
Common applications:
– Most flexible intermediate output for any downstream analysis outside of MSR
– Analogous to the “BCL to FASTQ Converter” utility
Process:
1. Reads are assembled from base call files and written to FASTQ
2. FASTQ is ready for additional processing, such as alignment
Outputs:
– FASTQ
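As an example of the "additional processing" FASTQ enables, the sketch below reads records and decodes quality strings. It assumes four lines per record and Phred+33 quality encoding; the function name `read_fastq` is illustrative and not part of any Illumina tool.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Assumption: Phred+33 encoding, so subtract 33 from each character code.
        yield header[1:], sequence, [ord(c) - 33 for c in quality]


# Two made-up reads for illustration.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"
records = list(read_fastq(io.StringIO(example)))
for read_id, seq, quals in records:
    print(read_id, seq, "mean Q =", sum(quals) / len(quals))
```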
23
MSR Workflows: Library QC
Common applications:
– QC analysis of libraries before pooling and running on higher throughput instruments
(HiSeq, NextSeq)
Process:
1. Reads are aligned against reference genomes (BWA)
2. Per sample statistics written to report file
Outputs:
– FASTQ
– BAM
– HTML report
24
Library QC Details Report
Zoom in/out
Table of samples
Scope of view
Coverage and error
Q score
Table of targets
→ Alignment = Burrows-Wheeler Aligner (BWA)
26
MSR Workflows: Enrichment
Common applications:
– Large, targeted panels using pulldown enrichment/capture
– Nextera Rapid Capture Custom, Nextera Rapid Capture Exome (8 rxn x 1 plex)
Process:
1. Reads are aligned against whole genome reference (BWA)
2. Variants are called using the standard variant caller (GATK), or the somatic variant
caller if specified in the sample sheet (particularly useful on cancer samples)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF
27
Enrichment Details Report
Zoom in/out
Scope of view
Coverage and error
Table of samples
Q score
SNP/indel + annotation
Table of targets
Table of variants
→ Alignment = Burrows-Wheeler Aligner (BWA)
→ Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller (option)
28
MSR Workflows: Metagenomics
Common applications:
– Bacteria population analysis based on 16S rRNA amplicons
– Generate taxonomic classification data down to species level
– Integrates seamlessly with Illumina’s 16S Demonstrated Protocol (V3-V4 amplicons)
Process:
1. Reads are classified by sorting against 16S database, GreenGenes (V1-V9 regions)
2. Per sample statistics written to report files and plots
Outputs:
– FASTQ
– BAM
– GUI Plots
– HTML report
29
Metagenomics
Taxonomic Level
Classification
Table of samples
→ Classifier is based on GreenGenes 16S rRNA database
30
MSR Workflows: Amplicon
Common applications:
– Analysis of PCR amplicons fragmented with Nextera tagmentation
Process:
1. Reads are aligned (BWA) against a custom built manifest file from IEM
2. Variant analysis in regions of interest (GATK)
Outputs:
– FASTQ, BAM, VCF, gVCF
31
PCR Amplicon Detail Report
Zoom in/out
Scope of view
Coverage and error
Table of samples
Q score
SNP/indel + annotation
Table of targets
Table of variants
→ Alignment = Burrows-Wheeler Aligner (BWA)
→ Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller or Starling (option)
32
Amplicon Detail Report
Zoom in/out
Scope of view
Coverage and error
Table of samples
Q score
SNP/indel + annotation
Table of targets
Table of variants
→ Alignment = Banded Smith-Waterman
→ Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller or Starling (option)
33
MSR Workflows: Resequencing
Common applications:
– Small genome analysis (~20 Mb or smaller)
Process:
1. Reads are aligned against reference genomes (BWA)
2. Variant analysis in regions of interest (GATK)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF
34
Resequencing Details Report
Zoom in/out
Scope of view
Coverage and error
Table of samples
Q score
SNP/indel + annotation
Table of targets
Table of variants
→ Alignment = Burrows-Wheeler Aligner (BWA)
→ Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller or Starling (option)
35
MSR Workflows: Small RNA
Common applications:
– Small RNA abundance measurements typically important in transcription regulation
– Often important for cancer research
Process:
1. Reads are aligned against databases for mature miRNA (miRBase), small RNA, and
a genomic reference using Bowtie (flexible reference storage)
2. Small RNA hits and relative species abundance are reported
Outputs:
– FASTQ
– BAM
– TXT reports
– Charts
36
Small RNA Summary Page
Cluster Info
Trimmed read
lengths
37
Small RNA Details Report
Distribution of
RNA species
Samples Table
38
Top 10 most
abundant species
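The "relative species abundance" and "top 10 most abundant species" views boil down to normalizing hit counts. A minimal sketch follows, assuming per-species hit counts are already available as a dictionary; the species names and counts are made up, and `top_species` is an illustrative name rather than an MSR function.

```python
from collections import Counter


def top_species(hit_counts, n=10):
    """Return (name, hits, fraction_of_all_hits) for the n most abundant species."""
    total = sum(hit_counts.values())
    if total == 0:
        return []
    return [(name, hits, hits / total) for name, hits in Counter(hit_counts).most_common(n)]


# Made-up counts for illustration.
hits = {"miR-21": 600, "miR-16": 300, "let-7a": 100}
ranking = top_species(hits, n=2)
for name, count, fraction in ranking:
    print(f"{name}\t{count}\t{fraction:.1%}")
```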
MSR Workflows: Targeted RNA
Common applications:
– TruSeq Targeted RNA Expression
Process:
1. Reads are aligned against custom manifest file (banded Smith-Waterman)
2. Reports relative expression of genes and isoforms between several samples
Outputs:
– FASTQ
– BAM
– HTML report
39
Targeted RNA Summary
Samples
Comparison Graph
Comparison Table
40
MSR Workflows: TruSeq Amplicon
Common applications:
– TruSeq Custom Amplicon panels
Overview:
1. Reads are aligned against a custom manifest from DesignStudio (banded Smith-Waterman)
2. Variants called against reference genome (GATK or somatic variant caller)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF
41
DesignStudio – Creating Your Own Custom Panel
42
2.0
Technical training
Illumina’s complete self-service tool for targeted resequencing assay design.
Build custom panels with no need for bioinformatics expertise.
Fully supported and optimized targeted NGS assays:
– TruSeq Custom Amplicon (TSCA)
– Nextera Rapid Capture Custom
– Targeted RNA Expression
43
With a completely new user experience and interface, DesignStudio 2.0
streamlines custom panel design for new and returning customers.
– Create free custom designs before purchasing your panel
– Get started quickly with our new assay selector tool
– Move rapidly from design to order in five easy steps
44
DesignStudio Workflow
Assay Type
45
DesignStudio Workflow
Assay Type
Select a targeted
technology clearly
and intuitively
46
DesignStudio Workflow
Configure Design
47
DesignStudio Workflow
Configure Design
Clear visualization of
configuration options
48
DesignStudio Workflow
Add Targets
49
DesignStudio Workflow
Add Targets
Easier target entry, and design submission
50
DesignStudio Workflow
Submit Design
51
DesignStudio Workflow
Review Design
52
DesignStudio Workflow
Review Design
Emphasized coverage, and easy order
53
DesignStudio Workflow
Order
54
Additional new features, including:
– Dynamic design reports presenting target-based coverage and variant information
– Direct download of custom manifest files for streamlined analysis
– One-click access to the UCSC Genome Browser
– Convenient visualization of DNA target types
Notes:
– All existing customer designs will be automatically migrated to DesignStudio 2.0
55
DesignStudio Dynamic Reports
View coverage and gap details
56
DesignStudio File Download
Direct access to manifest files
57
DesignStudio Link to UCSC Browser
Convenient visualization of designs
58
VariantStudio – Targeted Variant Analysis
59
VariantStudio (also on BaseSpace)
Intuitive analysis and interpretation of genomic data
Variant
Pathogenic
Annotation
• Variant, transcript,
and gene level
• Transcript
consequence
• Functional impact
• Overlap with
functional elements
• Allele frequencies
• Disease association
• Literature searches
60
Filtering
Likely
Unknown
Pathogenic Significance
Likely
Benign
Benign
Interpretation
• Single sample, tumor-normal pairs, and
family-based filtering
• Record variant
classification and
interpretation
• Combine filters and
save as a workflow
• Store information in
Classification
Database to apply to
future samples
Reporting
• Create report
templates
• Generate reports with
variant interpretations
VariantStudio
Small Variant Annotation, Filtering, and Reporting
TopHat
Alignment
TruSeq
Amplicon
Amplicon DS
BWA Enrichment
BWA
WGS
VariantStudio
Isaac
Enrichment
Isaac
WGS
Tumor
Normal
61
Illumina VariantStudio Software Tool
User friendly analysis and interpretation
Intuitive user interface for easy data exploration
Rich annotation from a broad range of sources
Flexible and comprehensive set of filters
62
Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF File
Export annotated/filtered variants
Illumina VariantStudio Desktop Client
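To make the "Import VCF File" step concrete, the sketch below reads the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) from a string. It is a minimal reader for illustration, not how VariantStudio itself imports data; the function name and the example records are made up.

```python
def parse_vcf(text):
    """Return one dict per data line of a VCF document given as a string."""
    variants = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines (##) and the column header (#CHROM)
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        info_fields = {}
        for entry in info.split(";"):
            if entry and entry != ".":
                key, _, value = entry.partition("=")
                info_fields[key] = value if value else True  # flag entries carry no value
        variants.append({
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),
            "qual": None if qual == "." else float(qual),
            "filter": flt,
            "info": info_fields,
        })
    return variants


# Made-up records for illustration.
example_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t12345\t.\tA\tG\t99\tPASS\tDP=120;DB\n"
    "chr2\t67890\trs0000\tC\tT,G\t.\tLowQual\tDP=8\n"
)
variants = parse_vcf(example_vcf)
print(variants[0])
```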
63
Easy-To-Use Software Application
Intuitive user interface for analysis and exploration
Ribbon Menu
Gene View
Filters Pane
Transcripts
Pane
Variants
Table
Filter History
64
Rapid And Complete Annotation
Enrich data with biological context
Provides annotations at:
– Variant level
– Transcript level
– Gene level
Categories of annotations include:
– Transcript consequence (synonymous, frameshift, missense, etc.)
– Functional impact (PolyPhen and SIFT: damaging, benign, etc.)
– Population allele frequencies (1000 Genomes, dbSNP, etc.)
– Disease association (HGMD, OMIM, COSMIC, etc.)
– Scientific literature (PubMed)
65
Annotation Databases
Increasing power to interpret clinical impact
ClinVar
• Aggregates information about variation and its clinical significance
• Info submitted by expert panels, professional societies, testing
laboratories, and curatorial groups
GeneReviews
• Expert-authored disease descriptions focusing on clinically
relevant and actionable information on diagnosis, management,
and counseling of patients with specific inherited conditions
MedGen
• Organizes information about conditions such as clinical features,
related genes, practice guidelines, and ontologies
SNOMED CT
• Standardized clinical health care terminology that provides a
consistent way of capturing, sharing, and aggregating health data
66
Interactive Filtering
Zeroing in on the disease relevant variants
Access commonly applied filters in the user interface
– Variant type
– Variant quality
– Allele frequency
– Functional impact
– Gene association
– Sample comparison
Filter or sort variants based on any columns in the variant table
Combine filters to create workflows
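The idea of combining filters into a workflow can be sketched as an ordered list of predicates applied one after another. Everything here is hypothetical: the field names (`type`, `qual`, `pop_af`), the thresholds, and the helper names are illustrative and do not reflect VariantStudio's internal data model.

```python
def variant_type_is(kind):
    return lambda v: v["type"] == kind


def min_quality(threshold):
    return lambda v: v["qual"] >= threshold


def max_allele_frequency(threshold):
    return lambda v: v["pop_af"] <= threshold


def run_workflow(variants, steps):
    """Apply named filters in order; return survivors and the count left after each step."""
    history = [("input", len(variants))]
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        history.append((name, len(variants)))
    return variants, history


# Made-up variants and thresholds for illustration.
variants = [
    {"id": "v1", "type": "SNV", "qual": 200, "pop_af": 0.001},
    {"id": "v2", "type": "SNV", "qual": 15, "pop_af": 0.002},
    {"id": "v3", "type": "indel", "qual": 300, "pop_af": 0.0},
    {"id": "v4", "type": "SNV", "qual": 180, "pop_af": 0.35},
]
workflow = [
    ("SNVs only", variant_type_is("SNV")),
    ("quality >= 30", min_quality(30)),
    ("allele frequency <= 1%", max_allele_frequency(0.01)),
]
survivors, history = run_workflow(variants, workflow)
print(history)
print([v["id"] for v in survivors])
```

Recording the count after each step mirrors the narrowing from many detected variants to a handful of candidates.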
67
1,000,000s Detected Variants
10,000s Coding Variants
100s Deleterious Variants
Few Causal Variants
Determining Appropriate Filtering Parameters
Filtering parameters are customizable, and have no default setting
There are no “right” or “wrong” settings
Filtering parameters should be tailored to meet:
– Required analytical performance
  • Confidence of variant calls
– Desired data
  • Germline, mosaic, or somatic mutations?
– Assay type
  • FFPE samples? High coverage?
Numbers will be determined by community standards and by assay-specific
data generated during assay validation
– Lab/assay dependent for Laboratory Developed Tests (LDTs)
– Predetermined and fixed for In-Vitro Diagnostic Tests (IVDs)
68
The Filtering Tab
69
Variant Quality Filter
70
Inherited Disease Analysis
Family analysis increases power to identify causative variants
TruSight One + VariantStudio delivers
a sample-to-report workflow
– Broadest coverage of genomic
content (4,813 genes) with known
association to clinical phenotypes
– Superior performance with high
coverage uniformity
– Sequence a family trio in a single
MiSeq run
– Rapid identification and reporting of
causative mutations in VariantStudio
71
Family-based Filtering
Rapid identification of mutations in inherited diseases
Increases power to isolate causative variants by leveraging family information
Analysis support for trio plus multiple affected or unaffected siblings
Identifies variants consistent with a particular inheritance mode, including recessive, dominant, X-linked, and de novo
Reports whether allele in proband is inherited from mother or father
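A trio-based filter can be pictured as a rule over the three genotypes at one site. The sketch below encodes each genotype as the number of alternate alleles (0, 1, or 2) and covers only three simple autosomal patterns; the rules and labels are simplified assumptions for illustration, not VariantStudio's actual inheritance logic.

```python
def classify_trio(child, mother, father):
    """Label one autosomal site from alternate-allele counts (0, 1 or 2) in a trio."""
    if child >= 1 and mother == 0 and father == 0:
        return "de novo"
    if child == 2 and mother == 1 and father == 1:
        return "recessive (homozygous in child, both parents carriers)"
    if child == 1 and (mother >= 1) != (father >= 1):
        return "inherited from mother" if mother >= 1 else "inherited from father"
    return "other"


# (child, mother, father) alternate-allele counts; made-up sites.
for genotypes in [(1, 0, 0), (2, 1, 1), (1, 1, 0), (1, 0, 2), (0, 1, 1)]:
    print(genotypes, "->", classify_trio(*genotypes))
```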
72
Population Based Filtering
Mutations with a high frequency in the
population are unlikely to be pathogenic
– Otherwise most of us would be sick!
Removing high frequency mutations
observed in the 1000 genomes database
helps us ignore common and benign
mutations
This can be done in a geographic specific
manner
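A population-frequency filter reduces to a threshold test on a per-population allele frequency. In this sketch the 1% cutoff, the `pop_af` field, and the population labels are assumptions chosen for illustration; variants with no recorded frequency are kept because their rarity is unknown.

```python
def is_common(variant, population, max_af=0.01):
    """True when the variant's frequency in the chosen population exceeds max_af."""
    af = variant["pop_af"].get(population)
    return af is not None and af > max_af


def remove_common(variants, population="ALL", max_af=0.01):
    """Keep variants that are rare, or unobserved, in the chosen population."""
    return [v for v in variants if not is_common(v, population, max_af)]


# Made-up frequencies; "ALL" and "EAS" are example population labels.
variants = [
    {"id": "v1", "pop_af": {"ALL": 0.20, "EAS": 0.25}},
    {"id": "v2", "pop_af": {"ALL": 0.004, "EAS": 0.03}},
    {"id": "v3", "pop_af": {}},
]
print([v["id"] for v in remove_common(variants, "ALL")])
print([v["id"] for v in remove_common(variants, "EAS")])
```

The second call shows the geography-specific case: v2 is rare overall but common in the chosen population, so it is removed.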
73
Tumor Normal Paired Analysis
Quick identification of somatic mutations
Cross Sample Subtraction tool filters
for variants present in one sample but
not the other
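Conceptually, cross-sample subtraction is a set difference on variant identity. The sketch below keys each variant on chromosome, position, and alleles, which is an assumption; the real tool may match variants differently, and the example records are made up.

```python
def variant_key(v):
    return (v["chrom"], v["pos"], v["ref"], v["alt"])


def cross_sample_subtraction(sample, other):
    """Return variants found in `sample` that are absent from `other`."""
    other_keys = {variant_key(v) for v in other}
    return [v for v in sample if variant_key(v) not in other_keys]


# Made-up calls: one shared germline variant and one tumor-only variant.
tumor = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
    {"chrom": "chr3", "pos": 2500, "ref": "C", "alt": "T"},
]
normal = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
]
somatic_candidates = cross_sample_subtraction(tumor, normal)
print(somatic_candidates)
```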
74
Using Cross Sample Subtraction
Removing germline variants with data from normal tissue
75
Save Filtering Settings for Future Use
Please click “Filter Favorites”, then “Save” to create a new custom filter that can be
used for future analyses
76
What Mutations Remain?
Which genes?
What type of mutation?
Are they somatic? What is the variant frequency?
What are the consequences to the genes?
Are these variants present in annotation databases?
What information is available on these variants?
Have your own information on these variants? Add a custom database!
Take a moment to explore the annotation information and database
links provided. Please collect notes, links, and citations to add to your
classification database and use in your final report
77
Illumina VariantStudio
Intuitive analysis and interpretation
Import
Data
Annotate
Filter
Classify
Report
Insight
78
Variant Classification
Sorting variants according to their impact
Pathogenic
Likely Pathogenic
Unknown Significance
Likely Benign
Benign
Additional Categories…
Classification: The assignment of a variant to a defined category based on
an assessment of its clinical impact
79
Standard Guidelines Can Be Used
http://www.ncbi.nlm.nih.gov/pubmed/18414213
80
Assign Classification
Click on the “…” under the “Classification” header
To include in the
report, assign
Classifications
by clicking on
“…”
81
Customizable Reporting
Summarize significant results in sample report
83
BaseSpace – Cloud Analysis and Storage
84
Illumina Sequencing
Streamlined NGS Solutions
INTEGRATED
SAMPLE
MANAGEMENT
CORE APPS
All major
biological
applications
Workflow • Storage • Analysis • Sharing
3RD PARTY
APPS
A broad and
growing
ecosystem
EASY
SHARING
BaseSpace is
the place
where ideas
foster
ONE CLICK
DELIVERY
No FTP site,
no hard drive
to ship
85
BaseSpace Apps are being used all around the
world (> 10K Analyses per month)
86
APAC App Launches 2015 (1H)
87
BaseSpace Growth – APAC 1H2015
88
What?
BaseSpace is Illumina’s genomic cloud computing environment
BaseSpace
Eliminates need for onsite storage and compute
Web based data management and analysis
Tools for collaboration and sharing
Available for Illumina and non-Illumina customers
Signup via My Illumina (formerly iCom)
89
Why?
BaseSpace is the best place to put your NGS data
BaseSpace already built into MiSeq
Secure and reliable
Simple to use
Reads and qualities
Sample and experiment descriptions
Analysis results
variants
contigs
metagenomes
coverage statistics
miRNA counts
more…
90
How?
BaseSpace is a computer-free NGS analysis tool
Automatic push to cloud
Walk away bioinformatics
Results available anywhere, anytime
Browse the results via web-based
graphical environment
Access to a growing suite of
analysis tools
91
Easy to Use
BaseSpace allows seamless sharing and collaboration
Share results with peers
Make results publicly available
Provide deep access to raw data
Share your challenges with Illumina
technical support.
92
Direct Integration with BaseSpace
MiSeq users have the option to “push” data to BaseSpace
93
MiSeq Pushes Data to BaseSpace
94
Simple to Use
BaseSpace is built into MiSeq
Results available within two hours of run
completion
From anywhere in the world
With near ZERO human interaction
95
MSR vs BaseSpace
MiSeq Reporter
BaseSpace
Automatic data analysis & reporting


Offline analysis


Unlimited licensed users


Seamless data sharing with collaborators

Scalable data storage & archiving

Latest version of bioinformatics tools

Access results anywhere, anytime

96
BaseSpace Dashboard
97
Three App Types (61 Total, May 2015)
Illumina Core Apps
– Developed by Illumina Developers
– Rigorous Software Testing and Documentation
– Supported by Illumina Technical Support
BaseSpace Labs App
– Developed by Illumina Developers
– Lite Testing and Documentation
– Not supported by Illumina Technical Support
3rd Party apps
– Developed by 3rd Parties
– Not supported by Illumina Technical Support
98
18 Illumina Core Apps
16S Metagenomics
TopHat Alignment
BWA Enrichment
BWA WGS
Small RNA
Cufflinks Assembly & DE
RNA Express
VariantStudio
Isaac Enrichment
Broad IGV
TruSeq Amplicon
Amplicon DS
Isaac WGS
Tumor Normal
Long Read Assembly
Long Read Phasing
TruSeq Targeted RNA
MethylSeq
99
12 BaseSpace Labs Apps
FASTQ Toolkit
FastQC
NextBio Annotates RNA-Seq
Prokka Genome Annotation
Velvet de novo Assembly
SRA Submission
Kraken Metagenomics
NextBio Transporter
PicardSpace
SRA Import
VCAT
SRST2
100
31 Third Party Apps
DNA Star
DeepChekHBV, HCV,
HIV
Melanoma
Profiler
MetaPhlAn
iPathwayGuide
EDGC
Annotator
GENIUS
Metagenomics:
Know Now
LoFreq Rare
Variant Caller
MyFLq
OncoMD
Novoalign
Protein Expression
Assembler
Genomatix
Elastic Genome
SPAdes Genome
Pathway System
Browser
Assembler
RNA-Seq
Translator
PathSEQ
Virome
Protein Expression
Workflow
miRNA Analysis
Genome
Profiler
SWATH Atlas
Variant
Interpreter
GeneTalk Variant
Analyzer
101
Tute
Genomics
PEDANT
Protein Expression
Sequence-Analyzer
Analytics
Protein Expression
Extractor
BaseSpace News
http://blog.basespace.illumina.com/
BaseSpace Mount – Use BaseSpace from Linux interface (Pros only!)
Haplotype comparison options via FASTQ Toolkit App
Differential Methylation App
MiSeq Reporter App – Built In MSR with upgraded visualization
WGS v4.0
– New structural variant (SV) caller
– New copy number variant (CNV) caller
– On node annotation for increased performance and stability
– Annotation of minor alleles that correspond to the reference genome (refminor)
– Ploidy Correction for Sex Chromosomes for small variants
– Variant quality score recalibration for small variant calling
– Isaac2 aligner (better performance and performs supplementary alignments)
– Merging of SV/CNV files to a single VCF file
– Bug and stability fixes
– Multi-node analysis (analyze up to 96 samples in parallel with each App launch)
102
RNA-Seq
103
Four Easy Steps to RNA-Seq Results
I. Set up/Run TopHat
II. QC of TopHat results
   1. Filter out challenging samples as needed
III. Set up/Run Cufflinks
   1. Name/Select Control Group
   2. Name/Select Comparison Group
IV. Visualize Group 1: Group 2 GEX correlation
104
Four Easy Steps to RNA-Seq Results
Step II, QC of TopHat results (filter out challenging samples as needed). Panels shown: Insert Length Distribution, Alignment Distribution, Transcript coverage
106
Four Easy Steps to RNA-Seq Results
Step IV, visualize Group 1: Group 2 GEX correlation. Panel shown: correlation heat map and dendrogram
Filter expression levels by log2 ratios (1), significance (2), or gene families/names (3), and export the filtered list in .csv
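The three filters named on this slide (log2 ratio, significance, gene name) and the CSV export can be reproduced on any differential expression table. The sketch below assumes rows with `gene`, `log2_ratio`, and `q_value` fields; those field names, the default cutoffs, and the example values are illustrative and do not describe the app's export format.

```python
import csv
import io


def filter_genes(rows, min_abs_log2=1.0, max_q=0.05, name_prefix=None):
    """Keep rows passing the fold-change, significance and optional gene-name filters."""
    kept = []
    for row in rows:
        if abs(row["log2_ratio"]) < min_abs_log2:
            continue
        if row["q_value"] > max_q:
            continue
        if name_prefix and not row["gene"].startswith(name_prefix):
            continue
        kept.append(row)
    return kept


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["gene", "log2_ratio", "q_value"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up expression results for illustration.
results = [
    {"gene": "ABL1", "log2_ratio": 2.5, "q_value": 0.001},
    {"gene": "GAPDH", "log2_ratio": 0.1, "q_value": 0.9},
    {"gene": "BCR", "log2_ratio": -1.5, "q_value": 0.2},
]
significant = filter_genes(results)
print(to_csv(significant))
```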
108
RNA-Seq for detection of gene fusions
The BCR-ABL fusion is quickly detected in this analysis of a
UHRR stock sample (mixture of CML, breast cancer and
other cancer cell lines)
The BCAS gene fusion, implicated in leukemia and
breast cancers, is detected in the same sample
109
RNA-Seq for detection of critical genomic
signatures
Efficient, automatic detection of cSNPs and indels in RNA-Seq reads
110
RNASeq Time-to-Answer on BaseSpace Cloud
NextSeq High Output Mode – TopHat/Cufflinks
RNA-Seq Experiment            Per-Sample Timing (1, 2)   Read/Sample Specifications
High End Experiment (3)       3 h                        50M PE clusters, 2x75bp, 8 samples/run
Mid-level Experiment          70 min                     25M PE clusters, 2x75bp, 16 samples/run
Low End Typical Experiment    14 min                     10M SE clusters, 1x50bp, 40 samples/run
High End Experiment: total RNA expression profiling, identify alt transcripts, fusion calling and cSNPs
Mid-level Experiment: mRNA expression profiling, identify alt transcripts
Low End Typical Experiment: expression profiling only
Parallel cloud processing means gene expression
profiling is obtained in as little as 4 minutes per sample
1. Does not include bcl upload and demultiplexing/fastq generation times
2. Extrapolated time, since a “per-sample” differential expression time is not possible
3. Single NextSeq run only supports 1 vs 1 differential expression experiment at high end
111
RNAExpress: Fast and accurate Gene Expression Profiling
Feature                           TopHat/Cufflinks   RNA Express
Abundant sequence filtering       Yes                Yes
Sequence alignment                Yes                Yes
Variant calling                   Yes                No
Fusion calling                    Yes                No
Transcript assembly               Yes                No
Gene abundance estimation         Yes                Yes
Transcript abundance estimation   Yes                No
Differential expression           Yes                Yes
112
Exome
113
Push Button Exome Analysis in BaseSpace
Designed for biologists: tailor-made Apps with high usability
Detection of SNPs and small indels
Graphical aggregate and per-sample reports deliver key variant and enrichment metrics
Two App options provide greater research flexibility
– BWA/GATK: Industry-standard method. Available today
– Isaac: Illumina’s fast and accurate alignment and variant calling alternative
Per-sample compute time only 2-5 hours on BaseSpace Cloud (read-depth dependent)
114
Push Button Exome Analysis in BaseSpace
Elegant Apps to address complex workflows
BWA Enrichment v2.1
Isaac Enrichment v2.1
Third-party tools currently needed to extract somatic variants from per-sample variants
115
Push Button Exome Analysis in BaseSpace
Elegant Apps to address complex workflows
BWA Enrichment v2.1
Isaac Enrichment v2.1
A complex workflow encapsulated in a click-and-go interface
Third-party tools currently needed to extract somatic variants from per-sample variants
116
Push Button Exome Analysis in BaseSpace
Elegant Apps to address complex workflows
BWA Enrichment v2.1
Isaac Enrichment v2.1
All Nextera Rapid Capture Exome and TruSight Fixed Content manifests supported
117
Push Button Exome Analysis in BaseSpace
Comprehensive aggregate reports
Aggregate reports enable quick analysis of metrics across all samples
– Access variant files and statistics
– Identify enrichment/off-target rates
– Explore biological context of variants such as SNVs and indels
BWA Enrichment v2.1
Isaac Enrichment v2.1
Exome reports provide a quick, high-level aggregate summary of enrichment
and sequencing statistics across many samples
118
Push Button Exome Analysis in BaseSpace
Comprehensive per-sample reports
Per-sample reports allow you to quickly drill down to detailed statistics
120
Exome Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 (1)
Method/Coverage   Variant type   Specificity (2)   Sensitivity (3)
BWA 98x           SNV            0.994             0.914
Isaac 98x         SNV            0.998             0.857
BWA 111x          SNV            0.994             0.928
Isaac 111x        SNV            0.998             0.879
BWA 116x          SNV            0.995             0.931
Isaac 116x        SNV            0.998             0.883
BWA 98x           Indel          0.929             0.756
Isaac 98x         Indel          0.813             0.790
BWA 111x          Indel          0.938             0.798
Isaac 111x        Indel          0.818             0.812
BWA 116x          Indel          0.937             0.787
Isaac 116x        Indel          0.838             0.826
Choose the industry-standard BWA/GATK method or the faster and
accurate Isaac method according to your specific research needs
1. NA12878 datasets from Platinum Genomes: http://www.illumina.com/platinumgenomes/
2. Mendelian non-conflict rate for the variants called in the trio set
3. Recovery rate of child variants reported in Kidd et al [Nature. 2008 May 1;453(7191):56-64] (~95k SNVs and ~11k indels)
121
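The footnotes describe two rates: a Mendelian non-conflict rate across a trio and a recovery rate against a published call set. The sketch below shows how such rates could be computed in principle. The data layout and function names are assumptions, and this is not the procedure Illumina used to produce the table.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def non_conflict_rate(trio_genotypes):
    """Fraction of sites whose (child, mother, father) genotypes obey Mendelian inheritance."""
    sites = list(trio_genotypes)
    return sum(mendelian_consistent(*site) for site in sites) / len(sites)


def recovery_rate(called, reference_set):
    """Fraction of reference variants that also appear in the call set."""
    reference_set = set(reference_set)
    return len(reference_set & set(called)) / len(reference_set)


# Made-up genotypes: (child, mother, father), each a pair of alleles.
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # conflict: mother cannot supply G
    (("A", "A"), ("A", "G"), ("A", "A")),  # consistent
    (("C", "T"), ("C", "T"), ("C", "C")),  # consistent
]
print(non_conflict_rate(sites))
print(recovery_rate(called={1, 2, 3}, reference_set={2, 3, 4, 5}))
```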
Whole Genome Sequencing (WGS) and
Tumor-Normal WGS
122
WGS with BWA and Isaac
Elegant Apps to address complex workflows
Industry-standard BWA method
Illumina’s own fast and accurate
Isaac method
* If using Isaac Variant Calling, SV information
from Grouper is used during variant calling
123
BWA/ GATK
WGS v1
Isaac
WGS v2.0
WGS References Supported
Human (UCSC HG 19)
Mouse (UCSC MM9)
Rat (UCSC RN5)
Rhodobacter (NCBI 2005-10-07)
E. coli DH10B (NCBI 2008-03-17)
E. coli MG1655 (NCBI 2001-10-15)
S. cerevisiae (UCSC sacCer2)
Drosophila (UCSC version dm3)
Phi X (Illumina)
Arabidopsis thaliana (NCBI 9.1)
B. taurus (Ensembl UMD3.1)
S. aureus NCTC 8325 (NCBI 2006-02-13)
124
BWA/ GATK
WGS
Isaac
WGS
WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 (1)
Method      Total SNV count   Ts/Tv ratio   SNV Het/Hom ratio   SNV novelty rate
Isaac       3,600,181         2.07          1.61                3.88 %
BWA/GATK    3,274,233         2.09          1.50                3.19 %
[Venn diagram: SNVs unique to Isaac, unique to BWA/GATK]
1. 50x NA12878 dataset from Platinum Genomes (http://www.illumina.com/platinumgenomes/)
125
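The Ts/Tv and Het/Hom ratios reported on this slide are simple counts over a call set. A minimal sketch is below, assuming SNVs are given as (ref, alt) pairs and genotypes as VCF-style strings such as `0/1`; the function names and inputs are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Transitions swap purine for purine or pyrimidine for pyrimidine."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    return transitions / transversions


def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous non-reference genotypes."""
    het = hom = 0
    for genotype in genotypes:
        first, second = genotype.replace("|", "/").split("/")
        if first != second:
            het += 1
        elif first != "0":
            hom += 1
    return het / hom


# Made-up calls for illustration.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
genotypes = ["0/1", "0/1", "1/1", "0|1", "0/0"]
print(ts_tv_ratio(snvs), het_hom_ratio(genotypes))
```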
WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 (1)
Method      Total Indel count   Indel Het/Hom ratio   Indel novelty rate
Isaac       602,822             1.90                  7.55 %
BWA/GATK    674,928             1.48                  7.46 %
[Venn diagram: indels unique to Isaac, unique to BWA/GATK]
1. 50x NA12878 dataset from Platinum Genomes (http://www.illumina.com/platinumgenomes/)
126
WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 (1)
SNV Statistics
Method      Specificity   Sensitivity
Isaac       0.999         0.955
BWA/GATK    0.999         0.897
Indel Statistics
Method      Specificity   Sensitivity
Isaac       0.977         0.916
BWA/GATK    0.987         0.950
Choose the industry-standard BWA/GATK method or the faster and
accurate Isaac method according to your specific research needs
1. NA12878 datasets from Platinum Genomes: http://www.illumina.com/platinumgenomes/
2. Mendelian non-conflict rate for the variants called in the trio set
3. Recovery rate of child variants reported in Kidd et al (Nature. 2008 May 1;453(7191):56-64) (~95k SNVs and ~11k indels)
127
WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Compute Times
BWA/ GATK
WGS v1
~30x coverage
Choose the industry-standard BWA/GATK method or the faster and
accurate Isaac method according to your specific research needs
128
Isaac
WGS v2.0
Tumor Normal Analysis
Whole-genome based detection of somatic variants
Tumor Normal
SNV/Indel (Strelka)
Leverages Illumina’s Strelka-based method used by leading academic labs
– Wash U
– BCCA
– Kennedy-Krieger (1)
1. N Engl J Med 2013; 368:1971-1979
129
Graphical Overview of Genome-scale somatic
rearrangements
131
Graphical Overview of Genome-scale somatic
copy number aberrations
Top: ratio of tumor:normal read depth; copy number gains (red), losses (green)
Bottom: Variant allele frequencies in the tumor sample at dbSNP positions where the normal
sample is heterozygous
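The two tracks described above can be sketched numerically. This is a minimal illustration: the normalization by total read count and the gain/loss thresholds (1.2 and 0.8) are assumptions made for the example, not values taken from the BaseSpace app.

```python
def depth_ratio(tumor_depth, normal_depth, tumor_total, normal_total):
    """Tumor:normal read depth ratio for one bin, normalized by library size.

    Returns None when the normal sample has no coverage in the bin.
    """
    if normal_depth == 0:
        return None
    return (tumor_depth / tumor_total) / (normal_depth / normal_total)


def classify_bin(ratio, gain=1.2, loss=0.8):
    """Label a bin as gain, loss, or neutral (thresholds are hypothetical)."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads supporting the alternate allele at one position."""
    if total_reads == 0:
        return None
    return alt_reads / total_reads
```

At a position where the normal sample is heterozygous, a tumor allele frequency that departs clearly from 0.5 supports an allelic imbalance in the same region where the depth ratio shifts.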
133
Somatic Variants in Detail
134
BaseSpace Core Apps for 16S metagenomics
135
BaseSpace Core Apps for 16S metagenomics
16S metagenomics
Tailor-made workflows accessible to the bench biologist
– Click-and-go user interface
– Graphical, interactive display of biological results
Optimized tools for better classification
– High Performance RDP¹: optimized implementation of the RDP naïve
Bayesian classification algorithm, which takes advantage of longer k-mer
lengths and paired-end reads
– Illumina-curated GreenGenes database: an internally curated version of
the GreenGenes database which supports species-level classification
Interactive single sample graphics
– Interactive graphics: interactive sunburst plots allowing the user to drill
down into the community structure of the sample
Multiple sample analysis is enabled
– Aggregate analysis: performs aggregate analysis of multiple samples with
interactive PCoA and hierarchical clustering plots
1. Applied and Environmental Microbiology 2007, vol. 73 no 16 5261-5267
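The idea behind naïve Bayesian k-mer classification can be shown with a toy sketch. This is a simplified illustration with a generic smoothing term, not the RDP implementation or Illumina's optimized version; the function names and the tiny reference set in the test are hypothetical.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """Count, per taxon, how many reference sequences contain each k-mer.

    references: dict mapping taxon name -> list of sequences.
    """
    model = {}
    for taxon, seqs in references.items():
        counts = defaultdict(int)
        for seq in seqs:
            for word in kmers(seq, k):
                counts[word] += 1
        model[taxon] = (counts, len(seqs))
    return model


def classify(read, model, k):
    """Return the taxon with the highest summed log-probability of the read's k-mers."""
    best_taxon, best_score = None, -math.inf
    for taxon, (counts, n_seqs) in model.items():
        # Smoothed probability that a sequence of this taxon contains the k-mer.
        score = sum(
            math.log((counts.get(word, 0) + 0.5) / (n_seqs + 1))
            for word in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon
```

Longer k-mers make each word more specific to a taxon, which is one reason longer word lengths can help separate closely related species.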
136
Four simple steps for 16S metagenomics results
I. Sequence 16S metagenomics samples
II. Set up/Run 16S metagenomics app
1. Select project to save results
2. Select one or more samples
3. Hit continue
III. Visualize individual sample output
IV. Visualize multi-sample output
137
http://supportres.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf
Metagenomic Classification Apps

App: FastQC
Description: Evaluates data quality, Q-values, read lengths, GC content, enriched sequences
Pro: Widely used 3rd party tool
Con: BaseSpace Labs App with only limited ILMN support

App: 16S Metagenomics v1.0
Description: Taxonomic classification of full length or partial 16S cDNA or genomic amplicons using an Illumina-curated GreenGenes taxonomic database
Pro: PCoA & hierarchical clustering dendrogram of multiple samples; interactive Krona charts
Con: Very slow on shotgun metagenomics data; no viral or eukaryotic detection, only bacterial species

App: Kraken Metagenomics
Description: Assigns taxonomic labels for bacterial, archaeal or viral classification to short DNA sequences with high sensitivity and speed using exact alignments of k-mers
Pro: Very fast on shotgun metagenomics data; works on rRNA reads; interactive Krona charts; multi-sample submit; host filtering
Con: No eukaryotic detection; utilizes only MiniKraken, not the full Kraken database

App: MetaPhlAn (Metagenomic Phylogenetic Analysis)
Description: A computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
Pro: MetaPhlAn relies on unique clade-specific marker genes identified from reference genomes
Con: No eukaryotic detection; single sample analysis; no 16S amplicon analysis

App: GENIUS Metagenomics: Know Now
Description: CosmosID's curated genome database and high performance algorithms to provide bacterial identification at the species, subspecies, and/or strain level
Pro: Rapid bacterial identification at the species, subspecies, and/or strain level
Con: Proprietary algorithms; single sample analysis; read1 & read2 of PE data separately analysed; no 16S amplicon analysis
141
Shotgun Metagenomic Tools – NOT 16S!
Kraken Metagenomics
ccb.jhu.edu/software/kraken
Taxonomic analysis of short reads for
bacterial, archaeal and viral classification
MetaPhlAn
– http://huttenhower.sph.harvard.edu/metaphlan
Publicly available alternatives:
MEGAN5
ab.inf.uni-tuebingen.de/software/megan5
Taxonomic, functional, and comparative analyses
MG-RAST
metagenomics.anl.gov
Taxonomic, functional, and comparative analyses; data sharing
Metavir 2
metavir-meb.univ-bpclermont.fr
Viral metagenome comparison; assembled virome analysis
MetaPhase for analyzing Hi-C data from metagenomes
https://github.com/shendurelab/MetaPhase
Reconstructing individual genomes (assembly)
SURPI
http://chiulab.ucsf.edu/surpi/
Pathogen detection from clinical metagenomic samples
142
TruSeq Amplicon BaseSpace Core App
143
TruSeq Amplicon Analysis
Powerful, yet simplified
[Workflow diagram: per-sample reads (FASTQ) and manifest → Alignment (Banded Smith-Waterman) → BAM → Variant Calling (GATK, Isaac, or Somatic VC) → VCF / gVCF → Annotation → Annotated VCF; Metric Generation → biological summary PDF report]
Based on the TruSeq Amplicon workflow in MiSeq Reporter, this app offers the
same pipeline options:
– Banded Smith-Waterman alignment
– Three options for variant calling: GATK, Isaac, and the Somatic Variant Caller
– Two options for annotation: RefSeq and Ensembl
– Illumina VariantStudio allows further downstream filtering and annotation of .vcf files
144
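For readers unfamiliar with banded Smith-Waterman alignment, the following minimal sketch shows the idea: local alignment scores are computed only for cells close to the main diagonal, which saves time when a read is expected to align near a known amplicon position. The scoring values, the linear gap penalty, and the band width are assumptions for illustration; the parameters used by MiSeq Reporter are not given here.

```python
def banded_smith_waterman(a, b, band=3, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, filling only cells with |i - j| <= band.

    Cells outside the band are treated as score 0, i.e. an alignment
    cannot pass through them.
    """
    best = 0
    prev = {}  # column index -> score in the previous row
    for i in range(1, len(a) + 1):
        cur = {}
        first = max(1, i - band)
        last = min(len(b), i + band)
        for j in range(first, last + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev.get(j - 1, 0) + step
            up = prev.get(j, 0) + gap
            left = cur.get(j - 1, 0) + gap
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best
```

With `band=0` only the main diagonal is scored, so no gaps can be opened; widening the band allows indels at the cost of more cells to fill.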
Illumina Amplicon Panel Support
Validated for streamlined analysis
TruSeq Custom Amplicon Panels
Build custom targeted panels in Illumina’s free web portal, DesignStudio
DesignStudio v1.6 (now live) is optimized for better, faster TSCA designs
TruSeq Amplicon Cancer Panel
Detects somatic mutations at low frequencies
Designed to cover important hot spots in 48 genes
TruSight Myeloid Panel
Detects somatic mutations at low frequencies (as low as 5%)
Designed to cover 54 important regions (full genes and exons)
145
Input & Output Files
Input: requires FASTQ data uploaded to BaseSpace (available from any
Illumina sequencer connected to BaseSpace)
– Allows combination of data from multiple sequencing runs
– Supports custom manifests
Output: standard outputs from MiSeq Reporter workflow, + new
reporting functionality for Core Apps
– .bam files to show aligned reads (can view directly in IGV)
– .vcf to report variant calls/.genome.vcf to report all regions assayed (can
view directly in Illumina VariantStudio)
– .pdf and .html reports to summarize results in friendly, graphical ways
– .csv summary files containing run metrics (summary.csv) and amplicon
performance (coverage.csv)
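The downloaded .vcf and .genome.vcf files are plain text and can be inspected with a few lines of code. The sketch below relies only on the fixed columns of the VCF format (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the example records and the filter name `LowQ` are made up for illustration and do not come from an actual app output.

```python
def read_vcf_records(lines):
    """Yield one dict per VCF data line, skipping header lines that start with '#'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield {
            "chrom": fields[0],
            "pos": int(fields[1]),
            "id": fields[2],
            "ref": fields[3],
            "alt": fields[4].split(","),
            "qual": fields[5],
            "filter": fields[6],
            "info": fields[7],
        }


def passing_variants(lines):
    """Keep records that passed all filters and carry a non-reference allele."""
    return [
        rec for rec in read_vcf_records(lines)
        if rec["filter"] == "PASS" and rec["alt"] != ["."]
    ]


# Made-up records, for illustration only.
example_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t101\t.\tC\t.\t60\tPASS\t.",
    "chr1\t200\t.\tT\tC\t10\tLowQ\t.",
]
variants = passing_variants(example_lines)
```

Records with `.` in the ALT column are reference calls, which is how a .genome.vcf can report every assayed position while a regular .vcf lists only variant sites.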
146
Three simple steps for TruSeq Amplicon results
I. Sequence TruSeq Amplicon Samples
II. Set up/Run TruSeq Amplicon app
1. Select project to save results
2. Select one or more samples
3. Confirm settings for Variant Caller and Annotation; Hit continue
III. View reports and download result files
147
Demo if time left…
151
152
Thank you for your attention!
153