Data analysis for DNA and RNA NGS
Illumina Analysis Solutions
Thomas Patrick Klemm
Sr. Sales Specialist South APAC, Illumina Singapore

© 2014 Illumina, Inc. All rights reserved. Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio, Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect, ForenSeq, MiSeq, MiSeqDx, MiSeqFGx, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina, SeqMonitor, SureMDA, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

Agenda
– Intro
– MiSeq Reporter – MiSeq Automated Analysis Workflows
– DesignStudio – Creating Your own Custom Panel
– VariantStudio – Targeted Variant Analysis
– BaseSpace – Cloud Analysis and Storage

Welcome to the Future!

NGS changing lives!

Saving children’s lives
The technology is ready!

Inherited Pediatric Diseases

“Baby-Seq” – Starting early

“Obama-Seq” – Precision Medicine with FDA

Ebola
Illumina collaborating with the BROAD and USAID

Our Vision
Innovating for the Future of Human Health
To improve human health by unlocking the power of the genome

Seamless End to End Genomics Solution
– Illumina Bioinformatics
– Simplest NGS Workflow
– Most Peer Reviewed Technology
– Integrated, Optimized Sample Prep
– Broadest Applications
– Largest Community of Users

Illumina’s Suite of Library Prep Solutions
TruSeq RNA; TruSeq DNA PCR-Free; TruSeq Custom Amplicon; TruSeq Stranded mRNA; TruSeq Nano DNA; TruSight Tumor; TruSeq Stranded Total RNA; Nextera; TruSeq Amplicon Cancer Panel; TruSeq Small RNA; Nextera XT; TruSight Myeloid; TruSeq Targeted RNA Expression; Nextera Mate Pair; Nextera Rapid Capture Exome/Custom; TruSeq RNA Access; TruSeq Synthetic Long Read; TruSight Panels; TruSeq ChIP

No matter the input, all libraries end up looking similar
(Dual Index Library shown)
The aim of the Library Prep step is to obtain nucleic acid fragments with adapters attached on both ends

Intuitively Designed Software for Quick Adoption
– MiSeq Control Software
– MiSeq Reporter
– Design Studio
– Illumina Experiment Manager
– BaseSpace
For Research Use Only. Not for use in diagnostic procedures

MiSeq Reporter - Automated Analysis Workflows

Analysis Overview
Analysis Type            Software    Outputs
Control Software         MCS/RTA     Images, Intensities and Base Calls
Analysis Software                    Alignments, Variant Detection
Visualization Software               Annotation, Filtering, Reports

MiSeq Reporter
Unprecedented Walk-Away Informatics Solution
Automated secondary analysis for key applications:
– Resequencing
– Amplicon sequencing
– 16S Metagenomics
– de novo assembly
– Small RNA
– Library QC
Integrated analysis hardware
Output in standard formats: FASTQ, BAM, VCF, txt

MiSeq Applications Portfolio
Integrated. Optimized. Simplified.
(Diagram labels, in extracted order: Amplicon Sequencing, Custom Amplicon, Targeted Resequencing, Custom Enrichment, Small RNA sequencing, Clone checking, ChIP-Seq, Library QC, Plasmid, Regulation, RNA-Seq, Resequencing, Small genome, RNA sequencing, De novo sequencing, 16S Metagenomics)

General Summary Report – All Apps
(Screenshot: cluster and mismatch charts on a Low % to High % scale)

MSR Workflows: DS Workflow
Common applications:
– TruSight Tumor samples, especially FFPE
– Circulating DNA
Process:
1. Reads are aligned against the unique targeted manifest for each strand (banded Smith-Waterman aligner)
2. Variants are called using the somatic variant caller
3. Forward and reverse strand variants are compared and reconciled in the final output
Outputs:
– FASTQ
– BAM
– VCF
– gVCF

MSR Workflows: De Novo Assembly
Common applications:
– De novo assembly of small genomes
Process:
1. Uses the Velvet assembler to reconstruct small genomes as contigs, without the need for a reference
2. Can compare with a known reference, if available, to generate a dot plot
Outputs:
– FASTA file containing contigs
– Dot plot (PNG)

De Novo Assembly Details Report
– De novo metrics
– Syntenic dot plot (if reference genome supplied)
Assembly = Velvet

MSR Workflows: Generate FASTQ
Common applications:
– Most flexible intermediate output for any downstream analysis outside of MSR
– Analogous to the “BCL to FASTQ Converter” utility
Process:
1. Reads are assembled from base call files and written to FASTQ
2. FASTQ is ready for additional processing, such as alignment
Outputs:
– FASTQ

MSR Workflows: Library QC
Common applications:
– QC analysis of libraries before pooling and running on higher throughput instruments (HiSeq, NextSeq)
Process:
1. Reads are aligned against reference genomes (BWA)
2. Per-sample statistics are written to a report file
Outputs:
– FASTQ
– BAM
– HTML report

Library QC Details Report
Report panels: zoom in/out, scope of view, coverage and error, Q score, table of samples, table of targets
Alignment = Burrows-Wheeler Aligner (BWA)

MSR Workflows: Enrichment
Common applications:
– Large, targeted panels using pulldown enrichment/capture
– Nextera Rapid Capture Custom, Nextera Rapid Capture Exome (8 rxn x 1 plex)
Process:
1. Reads are aligned against the whole genome reference (BWA)
2. Variants are called using the standard variant caller (GATK), or the somatic variant caller if specified in the sample sheet (particularly useful on cancer samples)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF

Enrichment Details Report
Report panels: zoom in/out, scope of view, coverage and error, Q score, table of samples, table of targets, SNP/indel + annotation, table of variants
Alignment = Burrows-Wheeler Aligner (BWA)
Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller (option)

MSR Workflows: Metagenomics
Common applications:
– Bacteria population analysis based on 16S rRNA amplicons
– Generate taxonomic classification data down to species level
– Integrates seamlessly with Illumina’s 16S Demonstrated Protocol (V3-V4 amplicons)
Process:
1. Reads are classified by sorting against a 16S database, GreenGenes (V1-V9 regions)
2. Per-sample statistics are written to report files and plots
Outputs:
– FASTQ
– BAM
– GUI plots
– HTML report

Metagenomics Taxonomic Level Classification
Report panels: table of samples
Classifier is based on the GreenGenes 16S rRNA database

MSR Workflows: Amplicon
Common applications:
– Analysis of PCR amplicons fragmented with Nextera tagmentation
Process:
1. Reads are aligned (BWA) against a custom-built manifest file from IEM
2. Variant analysis in regions of interest (GATK)
Outputs:
– FASTQ, BAM, VCF, gVCF

PCR Amplicon Detail Report
Report panels: zoom in/out, scope of view, coverage and error, Q score, table of samples, table of targets, SNP/indel + annotation, table of variants
Alignment = Burrows-Wheeler Aligner (BWA)
Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller, Starling (options)

Amplicon Detail Report
Report panels: zoom in/out, scope of view, coverage and error, Q score, table of samples, table of targets, SNP/indel + annotation, table of variants
Alignment = Banded Smith-Waterman
Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller, Starling (options)

MSR Workflows: Resequencing
Common applications:
– Small genome analysis (~20 Mb or smaller)
Process:
1. Reads are aligned against reference genomes (BWA)
2. Variant analysis in regions of interest (GATK)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF

Resequencing Details Report
Report panels: zoom in/out, scope of view, coverage and error, Q score, table of samples, table of targets, SNP/indel + annotation, table of variants
Alignment = Burrows-Wheeler Aligner (BWA)
Variant calling = Genome Analysis Toolkit (GATK) (default); somatic variant caller, Starling (options)

MSR Workflows: Small RNA
Common applications:
– Small RNA abundance measurements, typically important in transcription regulation
– Often important for cancer research
Process:
1. Reads are aligned against databases for mature miRNA (miRBase), small RNA, and a genomic reference using Bowtie (flexible reference storage)
2. Small RNA hits and relative species abundance are reported
Outputs:
– FASTQ
– BAM
– TXT reports
– Charts

Small RNA Summary Page
Report panels: cluster info, trimmed read lengths

Small RNA Details Report
Report panels: distribution of RNA species, samples table, top 10 most abundant species

MSR Workflows: Targeted RNA
Common applications:
– TruSeq Targeted RNA Expression
Process:
1. Reads are aligned against a custom manifest file (banded Smith-Waterman)
2. Reports relative expression of genes and isoforms between several samples
Outputs:
– FASTQ
– BAM
– HTML report

Targeted RNA Summary
Report panels: samples, comparison graph, comparison table

MSR Workflows: TruSeq Amplicon
Common applications:
– TruSeq Custom Amplicon panels
Overview:
1. Reads are aligned against a custom manifest from DesignStudio (banded Smith-Waterman)
2. Variants are called against the reference genome (GATK or somatic variant caller)
Outputs:
– FASTQ
– BAM
– VCF
– gVCF

DesignStudio – Creating Your own Custom Panel

DesignStudio 2.0 (Technical training)
Illumina’s complete self-service tool for targeted resequencing assay design. Build custom panels with no need for bioinformatics expertise.
Fully supported and optimized targeted NGS assays:
– TruSeq Custom Amplicon (TSCA)
– Nextera Rapid Capture Custom
– Targeted RNA Expression

DesignStudio 2.0
With a completely new user experience and interface, DesignStudio 2.0 streamlines custom panel design for new and returning customers.
– Create free custom designs before purchasing your panel
– Get started quickly with our new assay selector tool
– Move rapidly from design to order in five easy steps

DesignStudio Workflow
– Assay Type: select a targeted technology clearly and intuitively
– Configure Design: clear visualization of configuration options
– Add Targets: easier target entry and design submission
– Submit Design
– Review Design: emphasized coverage and easy ordering
– Order

DesignStudio 2.0
Additional new features, including:
– Dynamic design reports presenting target-based coverage and variant information
– Direct download of custom manifest files for streamlined analysis
– One-click access to the UCSC Genome Browser
– Convenient visualization of DNA target types
Notes:
– All existing customer designs will be automatically migrated to DesignStudio 2.0

DesignStudio Dynamic Reports
View coverage and gap details

DesignStudio File Download
Direct access to manifest files

DesignStudio Link to UCSC Browser
Convenient visualization of designs

VariantStudio – Targeted Variant Analysis

VariantStudio (also on BaseSpace)
Intuitive analysis and interpretation of genomic data
Variant Annotation
• Variant, transcript, and gene level
• Transcript consequence
• Functional impact
• Overlap with functional elements
• Allele frequencies
• Disease association
• Literature searches
Filtering
• Single sample, tumor-normal pairs, and family-based filtering
• Combine filters and save as a workflow
Interpretation (Pathogenic, Likely Pathogenic, Unknown Significance, Likely Benign, Benign)
• Record variant classification and interpretation
• Store information in Classification Database to apply to future samples
Reporting
• Create report templates
• Generate reports with variant interpretations

VariantStudio
Small Variant Annotation, Filtering, and Reporting
(Diagram: BaseSpace apps feeding VariantStudio: TopHat Alignment, TruSeq Amplicon, Amplicon DS, BWA Enrichment, BWA WGS, Isaac Enrichment, Isaac WGS, Tumor Normal)

Illumina VariantStudio Software Tool
User friendly analysis and interpretation
– Intuitive user interface for easy data exploration
– Rich annotation from a broad range of sources
– Flexible and comprehensive set of filters

Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF File → Illumina VariantStudio Desktop Client → Export annotated/filtered variants

Easy-To-Use Software Application
Intuitive user interface for analysis and exploration
Interface elements: Ribbon Menu, Gene View, Filters Pane, Transcripts Pane, Variants Table, Filter History

Rapid And Complete Annotation
Enrich data with biological context
Provides annotations at:
– Variant level
– Transcript level
– Gene level
Categories of annotations include:
– Transcript consequence (synonymous, frameshift, missense, etc.)
– Functional impact (PolyPhen and SIFT: damaging, benign, etc.)
– Population allele frequencies (1000 Genomes, dbSNP, etc.)
– Disease association (HGMD, OMIM, COSMIC, etc.)
– Scientific literature (PubMed)

Annotation Databases
Increasing power to interpret clinical impact
ClinVar
• Aggregates information about variation and its clinical significance
• Info submitted by expert panels, professional societies, testing laboratories, and curatorial groups
GeneReviews
• Expert-authored disease descriptions focusing on clinically relevant and actionable information on diagnosis, management, and counseling of patients with specific inherited conditions
MedGen
• Organizes information about conditions such as clinical features, related genes, practice guidelines, and ontologies
SNOMED CT
• Standardizes clinical health care terminology, providing a consistent way of capturing, sharing, and aggregating health data

Interactive Filtering
Zeroing in on the disease relevant variants
Access commonly applied filters in the user interface:
– Variant type
– Variant quality
– Allele frequency
– Functional impact
– Gene association
– Sample comparison
Filter or sort variants based on any column in the variant table
Combine filters to create workflows
(Funnel: 1,000,000s of detected variants → 10,000s of coding variants → 100s of deleterious variants → a few causal variants)

Determining Appropriate Filtering Parameters
Filtering parameters are customizable and have no default setting
There are no “right” or “wrong” settings
Filtering parameters should be tailored to meet:
– Required analytical performance: confidence of variant calls
– Desired data: germline, mosaic, or somatic mutations?
– Assay type: FFPE samples? High coverage?
Numbers will be determined by community standards and by assay-specific data generated during assay validation
– Lab/assay dependent for Laboratory Developed Tests (LDTs)
– Predetermined and fixed for In-Vitro Diagnostic Tests (IVDs)

The Filtering Tab

Variant Quality Filter

Inherited Disease Analysis
Family analysis increases power to identify causative variants
TruSight One + VariantStudio delivers a sample-to-report workflow
– Broadest coverage of genomic content (4,813 genes) with known association to clinical phenotypes
– Superior performance with high coverage uniformity
– Sequence a family trio in a single MiSeq run
– Rapid identification and reporting of causative mutations in VariantStudio

Family-based Filtering
Rapid identification of mutations in inherited diseases
– Increases power to isolate causative variants by leveraging family information
– Analysis support for trio plus multiple affected or unaffected siblings
– Identifies variants consistent with a particular inheritance mode, including recessive, dominant, X-linked, and de novo
– Reports whether the allele in the proband is inherited from the mother or the father

Population Based Filtering
Mutations with a high frequency in the population are unlikely to be pathogenic
– Otherwise most of us would be sick!
Removing high-frequency mutations observed in the 1000 Genomes database helps us ignore common and benign mutations
This can be done in a geography-specific manner

Tumor Normal Paired Analysis
Quick identification of somatic mutations
The Cross Sample Subtraction tool filters for variants present in one sample but not the other

Using Cross Sample Subtraction
Removing germline variants with data from normal tissue

Save Filtering Settings for Future Use
Click “Filter Favorites”, then “Save” to create a new custom filter that can be used for future analyses

What Mutations Remain?
Which genes? What type of mutation? Are they somatic? What is the variant frequency? What are the consequences to the genes?
Are these variants present in annotation databases? What information is available on these variants?
Have your own information on these variants? Add a custom database!
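The population-based filter and Cross Sample Subtraction described above can be illustrated with a minimal Python sketch. It is not VariantStudio's implementation: the Variant record, the function names, and the 1% allele-frequency cutoff are assumptions chosen for illustration (the slides stress that filtering parameters have no default setting).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a VariantStudio type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    pop_af: Optional[float] = None  # population allele frequency, if annotated


def site_key(v: Variant) -> Tuple[str, int, str, str]:
    # Two calls describe the same variant if position and alleles agree.
    return (v.chrom, v.pos, v.ref, v.alt)


def filter_common(variants: Iterable[Variant], max_af: float = 0.01) -> List[Variant]:
    # Population-based filter: drop variants seen above max_af in the
    # reference population; keep variants with no frequency annotation.
    # The 0.01 cutoff is an illustrative assumption, not a recommended value.
    return [v for v in variants if v.pop_af is None or v.pop_af <= max_af]


def cross_sample_subtract(tumor: Iterable[Variant], normal: Iterable[Variant]) -> List[Variant]:
    # Keep variants present in the tumor sample but absent from the matched
    # normal sample, which removes shared germline variants.
    normal_keys = {site_key(v) for v in normal}
    return [v for v in tumor if site_key(v) not in normal_keys]


tumor = [
    Variant("chr1", 100, "A", "G", pop_af=0.20),   # common in the population
    Variant("chr1", 200, "C", "T", pop_af=0.001),  # rare, but also in normal
    Variant("chr2", 300, "G", "A"),                # rare and tumor-only
]
normal = [Variant("chr1", 200, "C", "T", pop_af=0.001)]

candidates = cross_sample_subtract(filter_common(tumor), normal)
```

Chaining the two filters mirrors the "combine filters to create workflows" idea: each step narrows the list, and the order does not change the final set.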
Take a moment to explore the annotation information and database links provided. Take notes, links, and citations to add to your classification database and to use in your final report.

Illumina VariantStudio
Intuitive analysis and interpretation
Import Data → Annotate → Filter → Classify → Report → Insight

Variant Classification
Sorting variants according to their impact
Categories: Pathogenic, Likely Pathogenic, Unknown Significance, Likely Benign, Benign, Additional Categories…
Classification: the assignment of a variant to a defined category based on an assessment of its clinical impact

Standard Guidelines Can Be Used
http://www.ncbi.nlm.nih.gov/pubmed/18414213

Assign Classification
Click on the “…” under the “Classification” header
To include a variant in the report, assign a classification by clicking on “…”

Customizable Reporting
Summarize significant results in a sample report

BaseSpace – Cloud Analysis and Storage

Illumina Sequencing
Streamlined NGS Solutions
– INTEGRATED SAMPLE MANAGEMENT: Workflow • Storage • Analysis • Sharing
– CORE APPS: All major biological applications
– 3RD PARTY APPS: A broad and growing ecosystem
– EASY SHARING: BaseSpace is the place where ideas are fostered
– ONE CLICK DELIVERY: No FTP site, no hard drive to ship

BaseSpace Apps are being used all around the world (> 10K analyses per month)

APAC App Launches 2015 (1H)

BaseSpace Growth – APAC 1H2015

What?
BaseSpace is Illumina’s genomic cloud computing environment
– Eliminates need for onsite storage and compute
– Web-based data management and analysis
– Tools for collaboration and sharing
– Available for Illumina and non-Illumina customers
– Sign up via My Illumina (formerly iCom)

Why?
BaseSpace is the best place to put your NGS data
– BaseSpace already built into MiSeq
– Secure and reliable
– Simple to use
Data stored:
– Reads and qualities
– Sample and experiment descriptions
– Analysis results: variants, contigs, metagenomes, coverage statistics, miRNA counts, more…

How?
BaseSpace is a computer-free NGS analysis tool
– Automatic push to cloud
– Walk-away bioinformatics
– Results available anywhere, anytime
– Browse the results via a web-based graphical environment
– Access to a growing suite of analysis tools

Easy to Use
BaseSpace allows seamless sharing and collaboration
– Share results with peers
– Make results publicly available
– Provide deep access to raw data
– Share your challenges with Illumina technical support

Direct Integration with BaseSpace
MiSeq users have the option to “push” data to BaseSpace

MiSeq Pushes Data to BaseSpace

Simple to Use
BaseSpace is built into MiSeq
– Results available within two hours of run completion
– From anywhere in the world
– With near ZERO human interaction

MSR vs BaseSpace
Features compared between MiSeq Reporter and BaseSpace:
– Automatic data analysis & reporting
– Offline analysis
– Unlimited licensed users
– Seamless data sharing with collaborators
– Scalable data storage & archiving
– Latest version of bioinformatics tools
– Access results anywhere, anytime

BaseSpace Dashboard

Three App Types (61 Total, May 2015)
Illumina Core Apps
– Developed by Illumina developers
– Rigorous software testing and documentation
– Supported by Illumina Technical Support
BaseSpace Labs Apps
– Developed by Illumina developers
– Light testing and documentation
– Not supported by Illumina Technical Support
3rd Party Apps
– Developed by 3rd parties
– Not supported by Illumina Technical Support

18 Illumina Core Apps
16S Metagenomics; TopHat Alignment; BWA Enrichment; BWA WGS; Small RNA; Cufflinks Assembly & DE; RNA Express; VariantStudio; Isaac Enrichment; Broad IGV; TruSeq Amplicon; Amplicon DS; Isaac WGS; Tumor Normal; Long Read Assembly; Long Read Phasing; TruSeq Targeted RNA; MethylSeq

12 BaseSpace Labs Apps
FASTQ Toolkit; FastQC; NextBio Annotates RNA-Seq; Prokka Genome Annotation; Velvet de novo Assembly; SRA Submission; Kraken Metagenomics; NextBio Transporter; PicardSpace; SRA Import; VCAT; SRST2

31 Third Party Apps
(App tile labels as extracted; multi-line names are interleaved) DNA Star DeepChekHBV, HCV, HIV Melanoma Profiler MetaPhlAn iPathwayGuide EDGC Annotator GENIUS Metagenomics: Know Now LoFreq Rare Variant Caller MyFLq OncoMD Novoalign Protein Expression Assembler Genomatix Elastic Genome SPAdes Genome Pathway System Browser Assembler RNA-Seq Translator PathSEQ Virome Protein Expression Workflow miRNA Analysis Genome Profiler SWATH Atlas Variant Interpreter GeneTalk Variant Analyzer Tute Genomics PEDANT Protein Expression Sequence-Analyzer Analytics Protein Expression Extractor

BaseSpace News
http://blog.basespace.illumina.com/
– BaseSpace Mount – Use BaseSpace from a Linux interface (Pros only!)
– Haplotype comparison options via FASTQ Toolkit App
– Differential Methylation App
– MiSeq Reporter App – Built-in MSR with upgraded visualization
– WGS v4.0
  – New structural variant (SV) caller
  – New copy number variant (CNV) caller
  – On-node annotation for increased performance and stability
  – Annotation of minor alleles that correspond to the reference genome (refminor)
  – Ploidy correction for sex chromosomes for small variants
  – Variant quality score recalibration for small variant calling
  – Isaac2 aligner (better performance and performs supplementary alignments)
  – Merging of SV/CNV files to a single VCF file
  – Bug and stability fixes
  – Multi-node analysis (analyze up to 96 samples in parallel with each App launch)

RNA-Seq

Four Easy Steps to RNA-Seq Results
I. Set up/Run TopHat
II. QC of TopHat results
   1. Filter out challenging samples as needed
III. Set up/Run Cufflinks
   1. Name/Select Control Group
   2. Name/Select Comparison Group
IV. Visualize Group 1 : Group 2 GEX correlation
(Screenshots for step II: insert length distribution, alignment distribution, transcript coverage)
(Screenshots for step IV: correlation heat map and dendrogram. Filter expression levels by log2 ratios (1), significance (2), or gene families/names (3), and export the filtered list as .csv)

RNA-Seq for detection of gene fusions
The BCR-ABL fusion is quickly detected in this analysis of a UHRR stock sample (mixture of CML, breast cancer and other cancer cell lines)
The BCAS gene fusion, implicated in leukemia and breast cancers, is detected in the same sample

RNA-Seq for detection of critical genomic signatures
Efficient, automatic detection of cSNPs and indels in RNA-Seq reads

RNA-Seq Time-to-Answer on BaseSpace Cloud
NextSeq High Output Mode – TopHat/Cufflinks
RNA-Seq Experiment | Per-Sample Timings [1,2] | Read/Sample Specifications
High-end experiment: total RNA expression profiling, identify alt transcripts, fusion calling and cSNPs [3] | 3 h | 50M PE clusters, 2x75bp, 8 samples/run
Mid-level experiment: mRNA expression profiling, identify alt transcripts | 70 min | 25M PE clusters, 2x75bp, 16 samples/run
Low-end typical experiment: expression profiling only | 14 min | 10M SE clusters, 1x50bp, 40 samples/run
Parallel cloud processing means gene expression profiling is obtained in as little as 4 minutes per sample
1. Does not include bcl upload and demultiplexing/fastq generation times
2. Extrapolated time, since a “per-sample” differential expression time is not possible
3.
Single NextSeq run only supports 1 vs 1 differential expression experiment at high end

RNAExpress: Fast and accurate Gene Expression Profiling
Feature                           TopHat/Cufflinks   RNA Express
Abundant sequence filtering       Yes                Yes
Sequence alignment                Yes                Yes
Variant calling                   Yes                No
Fusion calling                    Yes                No
Transcript assembly               Yes                No
Gene abundance estimation         Yes                Yes
Transcript abundance estimation   Yes                No
Differential expression           Yes                Yes

Exome

Push Button Exome Analysis in BaseSpace
Designed for biologists: tailor-made Apps with high usability
– Detection of SNPs and small indels
– Graphical aggregate and per-sample reports deliver key variant and enrichment metrics
– Two App options provide greater research flexibility
  – BWA/GATK: Industry-standard method. Available today
  – Isaac: Illumina’s fast and accurate alignment and variant calling alternative
– Per-sample compute time only 2-5 hours on BaseSpace Cloud (read-depth dependent)

Push Button Exome Analysis in BaseSpace
Elegant Apps to address complex workflows
Apps: BWA Enrichment v2.1, Isaac Enrichment v2.1
– A complex workflow encapsulated in a click-and-go interface
– Third-party tools currently needed to extract somatic variants from per-sample variants
– All Nextera Rapid Capture Exome and TruSight Fixed Content manifests supported

Push Button Exome Analysis in BaseSpace
Comprehensive aggregate reports
Apps: BWA Enrichment v2.1, Isaac Enrichment v2.1
Aggregate reports enable quick analysis of metrics across all samples
– Access variant files and statistics
– Identify enrichment/off-target rates
– Explore biological context of variants such as SNVs and indels
Exome reports provide a quick, high-level aggregate summary of enrichment and sequencing statistics across many samples

Push Button Exome Analysis in BaseSpace
Comprehensive per-sample reports
Per-sample reports allow you to quickly drill down to detailed statistics

Exome Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 [1]
Method   Coverage   Variant type   Specificity [2]   Sensitivity [3]
BWA      98x        SNV            0.994             0.914
Isaac    98x        SNV            0.998             0.857
BWA      111x       SNV            0.994             0.928
Isaac    111x       SNV            0.998             0.879
BWA      116x       SNV            0.995             0.931
Isaac    116x       SNV            0.998             0.883
BWA      98x        Indel          0.929             0.756
Isaac    98x        Indel          0.813             0.790
BWA      111x       Indel          0.938             0.798
Isaac    111x       Indel          0.818             0.812
BWA      116x       Indel          0.937             0.787
Isaac    116x       Indel          0.838             0.826
Choose the industry-standard BWA/GATK method or the faster and accurate Isaac method according to your specific research needs
1. NA12878 datasets from Platinum Genomes: http://www.illumina.com/platinumgenomes/
2. Mendelian non-conflict rate for the variants called in the trio set
3. Recovery rate of child variants reported in Kidd et al [Nature. 2008 May 1;453(7191):56-64] (~95k SNVs and ~11k indels)

Whole Genome Sequencing (WGS) and Tumor-Normal WGS

WGS with BWA and Isaac
Elegant Apps to address complex workflows
– Industry-standard BWA method (BWA/GATK WGS v1)
– Illumina’s own fast and accurate Isaac method (Isaac WGS v2.0)
* If using Isaac Variant Calling, SV information from Grouper is used during variant calling

WGS References Supported
Apps: BWA/GATK WGS, Isaac WGS
– Human (UCSC hg19)
– Mouse (UCSC mm9)
– Rat (UCSC rn5)
– Rhodobacter (NCBI 2005-10-07)
– E. coli DH10B (NCBI 2008-03-17)
– E. coli MG1655 (NCBI 2001-10-15)
– S. cerevisiae (UCSC sacCer2)
– Drosophila (UCSC version dm3)
– Phi X (Illumina)
– Arabidopsis thaliana (NCBI 9.1)
– B. taurus (Ensembl UMD3.1)
– S. aureus NCTC 8325 (NCBI 2006-02-13)

WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 [1]
Method     Total SNV count   Ts/Tv ratio   SNV Het/Hom ratio   SNV novelty rate
Isaac      3,600,181         2.07          1.61                3.88 %
BWA/GATK   3,274,233         2.09          1.50                3.19 %
(Venn diagram: SNVs unique to Isaac, unique to BWA/GATK)
1. 50x NA12878 dataset from Platinum Genomes (http://www.illumina.com/platinumgenomes/)

WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 [1]
Method     Total indel count   Indel Het/Hom ratio   Indel novelty rate
Isaac      602,822             1.90                  7.55 %
BWA/GATK   674,928             1.48                  7.46 %
(Venn diagram: indels unique to Isaac, unique to BWA/GATK)
1. 50x NA12878 dataset from Platinum Genomes (http://www.illumina.com/platinumgenomes/)

WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Accuracy Comparison on NA12878 [1]
SNV Statistics
Method     Specificity [2]   Sensitivity [3]
Isaac      0.999             0.955
BWA/GATK   0.999             0.897
Indel Statistics
Method     Specificity [2]   Sensitivity [3]
Isaac      0.977             0.916
BWA/GATK   0.987             0.950
Choose the industry-standard BWA/GATK method or the faster and accurate Isaac method according to your specific research needs
1. NA12878 datasets from Platinum Genomes: http://www.illumina.com/platinumgenomes/
2. Mendelian non-conflict rate for the variants called in the trio set
3. Recovery rate of child variants reported in Kidd et al (Nature. 2008 May 1;453(7191):56-64) (~95k SNVs and ~11k indels)

WGS Analysis in BaseSpace
BWA/GATK vs. Isaac Compute Times
(Chart: BWA/GATK WGS v1 vs Isaac WGS v2.0 at ~30x coverage)

Tumor Normal Analysis
Whole-genome based detection of somatic variants
Tumor Normal SNV/Indel (Strelka)
Leverages Illumina’s Strelka-based method used by leading academic labs
– Wash U
– BCCA
– Kennedy-Krieger [1]
1.
N Engl J Med 2013; 368:1971-1979 129 Tumor Normal Analysis Whole-genome based detection of somatic variants Tumor Normal SNV/ Indel (Strelka) Leverages Illumina’s Strelka-based method used by leading academic labs – Wash U – BCCA – Kennedy-Krieger1 1. N Engl J Med 2013; 368:1971-1979 130 Graphical Overview of Genome-scale somatic rearrangements 131 Graphical Overview of Genome-scale somatic rearrangements 132 Graphical Overview of Genome-scale somatic copy number aberrations Top: ration of tumor:normal read depth. Copy (red), copy number gains (red), losses (green) Bottom: Variant allele frequencies in the tumor sample at dbSNP positions where the normal sample is heterozygous 133 Somatic Variants in Detail 134 BaseSpace Core Apps for 16S metagenomics 135 BaseSpace Core Apps for 16S metagenomics 16S metagenomics Tailor-made workflows accessible to the bench biologist – Click-and-go user interface – Graphical, interactive display of biological results Optimized tools for better classification – Hig Performance RDP1 : Optimized implementation of RDP naïve Bayesian classification algorithm which takes advantage of longer k-mer lengths and paired-end reads. – Illumina curated GreenGenes database : An internally curated version of the GreenGenes database to which supports species classifications Interactive single sample graphics – Interactive graphics: Interactive sunburst plots allowing the user to drill down into the community structure of the sample Multiple sample analysis is enabled – Aggregate analysis: Preforms aggregate analysis of multiple samples with interactive PCoA and hierarchical clustering plots. 1. Applied and Environmental Microbiology 2007, vol. 73 no 16 5261-5267 136 Four simple steps for 16S metagenomics results I. Sequence 16S metagnomics samples II. Set up/Run 16S metagenomics app 1. Select project to save results 2. Select one or more samples 3. Hit continue III. Visualize individual sample output IV. 
Visualize multi-sample output

http://supportres.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf

Metagenomic Classification Apps

App: FastQC
Description: Evaluates data quality, Q-values, read lengths, GC content, enriched sequences
Pro: Widely used 3rd party tool
Con: BaseSpace Labs App with only limited ILMN support

App: 16S Metagenomics v1.0
Description: Taxonomic classification of full-length or partial 16S cDNA or genomic amplicons using an Illumina-curated GreenGenes taxonomic database
Pro: PCoA & hierarchical clustering dendrogram of multiple samples; interactive Krona charts
Con: Very slow on shotgun metagenomics data; no viral or eukaryotic detection – only bacterial species

App: Kraken Metagenomics
Description: Assigns taxonomic labels for bacterial, archaeal or viral classification to short DNA sequences with high sensitivity and speed using exact alignments of k-mers
Pro: Very fast on shotgun metagenomics data; works on rRNA reads; interactive Krona charts; multi-sample submit; host filtering
Con: No eukaryotic detection; utilizes only MiniKraken – not full Kraken database

App: MetaPhlAn (Metagenomic Phylogenetic Analysis)
Description: A computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
Pro: MetaPhlAn relies on unique clade-specific marker genes identified from reference genomes
Con: No eukaryotic detection; single-sample analysis; no 16S amplicon analysis

App: GENIUS Metagenomics: Know Now
Description: CosmosID's curated genome database and high-performance algorithms to provide bacterial identification at the species, subspecies, and/or strain level
Pro: Rapid bacterial identification at the species, subspecies, and/or strain level
Con: Proprietary algorithms; single-sample analysis; read 1 and read 2 of PE data analysed separately; no 16S amplicon analysis

Shotgun Metagenomic Tools – NOT 16S!
Kraken Metagenomics – ccb.jhu.edu/software/kraken
   Taxonomic analysis of short reads for bacterial, archaeal and viral classification
MetaPhlAn – http://huttenhower.sph.harvard.edu/metaphlan
Publicly available alternatives:
MEGAN5 – ab.inf.uni-tuebingen.de/software/megan5
   Taxonomic, functional, and comparative analyses
MG-RAST – metagenomics.anl.gov
   Taxonomic, functional, and comparative analyses; data sharing
Metavir 2 – metavir-meb.univ-bpclermont.fr
   Viral metagenome comparison; assembled virome analysis
MetaPhase for analyzing Hi-C data from metagenomes – https://github.com/shendurelab/MetaPhase
   Reconstructing individual genomes (assembly)
SURPI – http://chiulab.ucsf.edu/surpi/
   Pathogen detection from metagenomic clinical samples

TruSeq Amplicon BaseSpace Core App

TruSeq Amplicon Analysis
Powerful, yet simplified
[Workflow diagram labels: Per-Sample Reads (FASTQ); Manifest; Alignment (Banded Smith-Waterman); BAM; Variant Calling (GATK, Isaac, or Somatic VC); VCF; gVCF; Annotation; Annotated VCF; Metric Generation; Biological summary; PDF Report]
Based on the TruSeq Amplicon workflow in MiSeq Reporter, this app offers the same pipeline options:
– Banded Smith-Waterman alignment
– Three options for variant calling: GATK, Isaac, and the Somatic Variant Caller
– Two options for annotation: RefSeq and Ensembl
– Illumina VariantStudio allows further downstream filtering and annotation of .vcf files

Illumina Amplicon Panel Support
Validated for streamlined analysis
TruSeq Custom Amplicon Panels
– Build custom targeted panels in Illumina’s free web portal, DesignStudio
– DesignStudio v1.6 (now live) is optimized for better, faster TSCA designs
TruSeq Amplicon Cancer Panel
– Detects somatic mutations at low frequencies
– Designed to cover important hot spots in 48 genes
TruSight Myeloid Panel
– Detects somatic mutations at low frequencies (as low as 5%)
– Designed to cover 54 important regions (full genes and exons)

Input & Output Files
Input: requires FASTQ data uploaded to BaseSpace (available from any Illumina sequencer connected to BaseSpace)
– Allows combination of data from multiple sequencing runs
– Supports custom manifests
Output: standard outputs from the MiSeq Reporter workflow, plus new reporting functionality for Core Apps
– .bam files to show aligned reads (can view directly in IGV)
– .vcf to report variant calls and .genome.vcf to report all regions assayed (can view directly in Illumina VariantStudio)
– .pdf and .html reports to summarize results in friendly, graphical ways
– .csv summary files containing run metrics (summary.csv) and amplicon performance (coverage.csv)

Three simple steps for TruSeq Amplicon results
I. Sequence TruSeq Amplicon samples
II. Set up/Run TruSeq Amplicon app
   1. Select project to save results
   2. Select one or more samples
   3. Confirm settings for Variant Caller and Annotation; hit continue
III. View reports and download result files

Demo if time left…

Thank you for your attention!