Presentation Slides - Global HIV Vaccine Enterprise
Next generation sequencing in HIV host genetics

Kevin Shianna, PhD
Assistant Professor
Director of Operations, CHGV
Director, Genomic Analysis Facility
Duke University School of Medicine

Overview
• Current state of next generation sequencing
• Future of sequencing
• HIV host genetics sequencing studies

Genome Sequencing Platforms
• Roche – 454
  • Long reads
  • pyrosequencing
  • expensive
• Life Technologies – SOLiD
  • sequencing-by-ligation
• Illumina – HiSeq 2000
  • sequencing-by-synthesis

Illumina HiSeq 2000 – sequencing-by-synthesis

Cost and throughput
[Figure: cost (log scale, 1 down to 1E-08) by year, 2007-2011, plotted against Moore's Law. Annotations: single run 1 Gb (2007); single run 650 Gb (2011).]

Sequencing - Current
- Illumina HiSeq 2000
  - 192 exomes per month
  - 10 genomes per month (>30x coverage)
- cost
  - exome = $1000-1300
  - genome = $4000 *
*cost does not include IT related costs

Sequencing - Future (mid-2012)
- Illumina HiSeq 2000
  - 288-576 exomes per month
  - 20-40 genomes per month (>30x coverage)
- cost
  - exome = $750-900?
  - genome = $2000-3000?
*cost does not include IT related costs

Whole genome vs exome sequencing
- Cost
- Throughput
- IT/Data management
- Structural variation/CNVs

Future of sequencing
Basic idea:
- longer reads (1000-2000 bp)
  - de novo alignment
- no amplification (single molecule)
- no DNA modification
- no fluorescence
- no optical hardware
- no expensive equipment (bench top)

Future of sequencing??
Current developments
- Pacific Biosciences
- Ion Torrent
Future development
- Oxford Nanopore

Pacific Biosciences RS
SMRT – Single Molecule Real-Time Technology
- long read (1000-2000 bp)
- single molecule
- unmodified DNA
- unamplified DNA

Ion Torrent Personal Genome Machine (PGM)
- Semiconductor chip
- no fluorescence or optics
- pH meter: after incorporation of a nucleotide, an H+ is released, changing the pH of the solution

Oxford Nanopore
- protein nanopore
- label-free, single molecule sequencing
- base calling by measuring electrical current
- Exonuclease seq

Sequencing Approach
(Cirulli ET & Goldstein DB. Nature Reviews Genetics, 2010)
1. Identify subjects
2. Sequence (50-100 individuals)
3. Align to reference and call variants
4. Compare to 1000's of sequenced controls
5. Follow-up genotyping in larger cohorts!

Estimation by Read Depth with SNVs (ERDS)
(Mingfu Zhu)
• Calculate average read depth (RD) for each 1-kb window, and correct for GC content
• Use Paired Hidden Markov model to infer the copy numbers for each window, by utilizing both RD information and SNV heterozygosity information
• Can use a Fisher's Exact test to calculate imbalance
[Figure: examples of a homozygous deletion, a heterozygous deletion, and a duplication]

An overall view of the SV-Finder pipeline
(Yujun Han)
[Flowchart; components as labeled on the slide:]
- BWA bam file
- PMR: partially mapped (soft clipped) reads
- UMR: unmapped reads
- Reads with abnormal distance/directions
- Split-Read analysis: small InDels, unknown SVs (all & unique)
- Pair-End analysis: big InDels, inversions, translocations
- Read coverage

Four general types of SVs as viewed in SV-viewer
[Figure, panels A-D. Track labels: PMR, UMR, Transolo. Legends: paired reads with normal distance, paired reads with long distance, paired reads with forward directions, paired reads with reverse directions.]
A) Deletion. B) Insertion. C) Translocation. D) Inversion. Legends and arrows were added manually.

Does sequencing work?
Homozygous variants identified by exome sequencing
[Figure: pedigrees for Family 1 and Family 2]
- 3 homozygous variants shared by siblings AND absent in ~300 sequenced control genomes
- Only one of these is also present in an unrelated affected individual

CHAVI Host Genetics Sequencing Studies
- Resistance to HIV
- Rapid progressors
- Viral controllers
- Viral setpoint
- Broad neutralizers
- B57 modifiers

Hemophilia High Risk Seronegative (HRSN, HESN)
Funded by the Gates Foundation
- Cases: 44 high risk seronegative hemophilia patients, European ancestry
- Genome Controls: 43 low-risk population controls, European ancestry
- HIV+ Controls: 41 HIV+ exomes, European ancestry

44 HRSN cases, 43 genome controls, 574 exome controls
(all analyses used just genome controls unless specified)
[Flowchart with QC steps on its branches; boxes as labeled on the slide:]
- FET, top overall (allelic/rec): top hits, rare in controls
- FET, top up/downstream (allelic/rec): top hits, rare in controls
- Functional variants in 2 or more cases
- Functional variants in 1 or more cases
- FET allelic and recessive models: p<0.05, rare in controls
- FET using 574 exome controls, allelic and recessive models: p<0.02, rare in controls
- Homozygous listings: homozygous in 1 or more cases, rare in controls
- Stop gain variants: in any case and at higher frequency in cases than in controls
- Homozygous in 1 or more cases, heterozygous in 2 or more cases
Functional variants are: stop gain, stop loss, nonsynonymous, essential splice site, frameshift indel, nonframeshift indel
= 1390 variants of interest

Follow-up genotyping
- High risk seronegatives: enriched for protective variants (n=400)
- General population: protective variants at low frequency (n=2000)
- HIV-positive individuals: depleted for protective variants (n=1000)

Variant absent in 859 HIV+ individuals: Homozygous
- 32 variants homozygous in >1 HRSN and not homozygous in any HIV+ individual
- 15 of these variants show depletion of homozygotes in HIV+ samples (given the MAF in HIV-neg controls, we would have expected to see a homozygote in the HIV+ samples)
- 2 of these variants are enriched by FET in the HRSN as compared to the HIV+ and/or to the population controls
- 1 variant shows an association with lower set point
• All are very rare
  - Only in 2, 3 or 4 of the 400 HRSN individuals
• Also very rare in population controls

Variant absent in 859 HIV+ individuals: Heterozygous
- 11 variants heterozygous in >1 HRSN and absent in any form in all HIV+ individuals
- 5 of these variants show depletion of heterozygotes in HIV+ samples (given the MAF in HIV-neg controls, we would have expected to see a heterozygote in the HIV+ samples)
- 3 of these variants are enriched by FET in the HRSN as compared to the HIV+ and to the population controls
• All are very rare
  - Only in 2, 3 or 4 of the 400 HRSN individuals
• Also very rare in population controls

Next steps
- sequence more cases
  - funding by Gates Foundation and CHAVI/NIAID to sequence 200 more HRSN
- functional studies
- collapsing method followed by sequencing genes in 1000's of individuals

Illumina MiSeq
- Sequence 30-40 genes (400 exons)
- 100's-1000's of samples
- as low as $50-75 per sample

David Goldstein

Goldstein Lab
Dongliang Ge, Jianying Li, Hee Shin Kim, Jessica Maia, Curtis Gumbs, Liz Cirulli, Kim Pelak, Mingfu Zhu, Qianqian Zhu, Min He, Darcy McMullin

Shianna Lab
Ryan Campbell, Linda Hong, Melora McCall, Alex McKenzie, Josh Mauro

Hemophilia Project
Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI067854-05
Bill and Melinda Gates Foundation Grant # 157412
Jim Goedert, Jacques Fellay, Dongliang Ge, Kimberly Pelak

Microcephaly
Elizabeth Ruzzo, Bruria Ben-Zeev, Yuki Hitomi, Kimberly Pelak, Doron Lancet, Elon Pras
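
Note on the depletion argument from the follow-up genotyping slides: the statement that, given the MAF in HIV-negative controls, a homozygote (or heterozygote) would have been expected among the HIV+ samples can be illustrated with a short calculation. This is only a minimal sketch and not the presenters' actual analysis. It assumes Hardy-Weinberg proportions and independent individuals, and the MAF value used below is a made-up placeholder. Only the sample size of 859 HIV+ individuals comes from the slides; the function names are hypothetical.

```python
def hwe_genotype_freqs(maf: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg equilibrium (an assumption).

    maf is the minor allele frequency q; the major allele frequency is 1 - q.
    """
    q = maf
    p = 1.0 - q
    return {
        "hom_major": p * p,
        "het": 2.0 * p * q,
        "hom_minor": q * q,
    }


def expected_count(genotype_freq: float, n: int) -> float:
    """Expected number of individuals with a genotype among n individuals."""
    return genotype_freq * n


def prob_zero_observed(genotype_freq: float, n: int) -> float:
    """Probability that none of n independent individuals has the genotype."""
    return (1.0 - genotype_freq) ** n


if __name__ == "__main__":
    n_hiv_pos = 859          # HIV+ sample size quoted in the slides
    maf_controls = 0.05      # hypothetical MAF, for illustration only

    freqs = hwe_genotype_freqs(maf_controls)
    for genotype in ("hom_minor", "het"):
        exp = expected_count(freqs[genotype], n_hiv_pos)
        p0 = prob_zero_observed(freqs[genotype], n_hiv_pos)
        print(f"{genotype}: expected {exp:.2f}, P(none observed) = {p0:.3g}")
```

A small probability of observing no carriers, despite an expected count of one or more, is what the slides describe as depletion. For variants as rare as those reported (2 to 4 carriers in 400 HRSN individuals), the expected counts are small, so a formal test such as the FET used elsewhere in the deck would still be needed.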