THE SEQUENCING TECHNOLOGY (R)EVOLUTION
TIM STAKENBORG, IMEC
MB&C meeting, May 16, 2013
© IMEC 2013

HISTORY OF SEQUENCING
▸ 384 - 322 BC
  - Aristotle told his students that all inheritance comes from the father
▸ 1977 (2 independent methods published in PNAS)
  - Maxam & Gilbert: chemical degradation method
  - Sanger: ddNTP-mediated chain termination !!
▸ 1995 (Fleischmann et al., Science 269: 496)
  - Haemophilus influenzae (first fully sequenced bacterial genome)
▸ 2001 (Science/Nature)
  - First human genome (13 years, 300 million USD)
▸ May 2005 (454 technology)
  - 6 months, >30 million USD

HISTORY OF SEQUENCING
[Figure: transistor count (Moore's law) vs. kilobase pairs sequenced per day, log scale from 1E+00 to 1E+09, plotted against date of introduction (1970 - 2015).
 Sequencing series: manual slab gel; 1st generation (slab gels: ABI373, ABI377; capillary electrophoresis: ABI3700, ABI 3730XL); 2nd generation (sequence by synthesis: Roche 454 Life Sciences, First Solid, 454 Titanium, ABI Solid3, Illumina HiSeq); 3rd generation (Pacific Biosciences SMRT*).
 Processor series: Intel 4004, 8080, 8088, 80286, 80386, 80486, Pentium, AMD K5, Pentium II, Pentium III, AMD K7, Pentium 4, Barton, Atom, AMD K8, Cell, AMD K10, Itanium 2, RV770, G80.]
Note: human genome = ~3 x 10^9 bases

HISTORY OF SEQUENCING
▸ Still many challenges in post-processing of data
▸ Data handling
▸ Computational algorithms

THE FIRST GENERATION

FIRST GENERATION (SANGER)
▸ Cyclic sequencing (amplification) reaction
  - PCR products of different length
  - Last base is fluorescent (different color per base)
  - Separation by size
▸ Pros and Cons
  - Extensive sample prep (-)
  - High cost (-)
  - Low throughput (-)
  - Long read lengths (+)

DRAFT GENOME
▸ 1990: Human Genome Project started
▸ First draft in 2001, over 10 years and $3 billion later
▸ Finished human genome sequence in 2003 (published 2004)
[Images dated February 2001 and April 2011]
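The throughput chart and its note (human genome = ~3 x 10^9 bases) can be turned into a quick back-of-the-envelope calculation. The sketch below only converts a throughput in kilobase pairs per day, the unit on the chart's vertical axis, into days per human genome. The function name and the `coverage` parameter are illustrative assumptions and do not appear in the slides; real projects read each base many times over, so the 1x figures are lower bounds.

```python
# Rough arithmetic only: how long does one human genome take at a given throughput?
HUMAN_GENOME_BASES = 3e9  # from the slide note: human genome = ~3 x 10^9 bases


def days_per_genome(kilobases_per_day, coverage=1.0):
    """Days needed to read one human genome at a given instrument throughput.

    `coverage` (how many times each base is read) is an assumption added here
    for illustration; the slides do not give a value.
    """
    if kilobases_per_day <= 0:
        raise ValueError("throughput must be positive")
    bases_per_day = kilobases_per_day * 1_000
    return coverage * HUMAN_GENOME_BASES / bases_per_day


# Order-of-magnitude points along the chart's vertical axis (kilobase pairs/day).
for kb_per_day in (1e0, 1e3, 1e6, 1e9):
    days = days_per_genome(kb_per_day)
    print(f"{kb_per_day:8.0e} kb/day -> {days:12.3g} days at 1x coverage")
```

Each factor of 1,000 in throughput cuts the time per genome by the same factor, which is why the chart needs a logarithmic axis to show all four generations together.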
THE NUMBER OF GENES
▸ Human genome: ~3 Gbase (3,000,000 kbases)
▸ Average gene size: ~3 kbases, but sizes vary greatly (largest is dystrophin: 2.4 Mbases)
▸ GENE SWEEP (Cold Spring Harbor Lab 2000-2003)
▸ Rules: $1 in 2000, $5 in 2001 and $20 in 2002
▸ 165 bets
▸ Mean 61710
▸ Lowest 25947 (Lee Rowen)
▸ Highest 153478

THE HUMAN GENOME
▸ ~3 Gbase, 24 chromosomes: 1-22, X, Y
▸ 21,500 - 24,000 genes
▸ Only 2% of the genome encodes genes
▸ About 46% of the genome is repetitive sequence
=> THERE IS A LOT OF “GENOMIC DARK MATTER” (or non-coding RNA)

THE HUMAN GENOME
▸ Almost all (99.9%) nucleotide bases are exactly the same in all people; the 0.1% difference amounts to 1 difference per 1,000 base pairs
  - Humans (0.08 - 0.1%)
  - Chimpanzees (0.12 - 0.17%)
  - Drosophila simulans (2%)
  - E. coli (5%)
  - HIV-I (30%)
▸ SNPs (a single base change in more than 1% of humans)
  - Harmless (e.g. change in phenotype)
  - Harmful (e.g. diabetes, cancer, heart disease, Huntington's)
  - Latent (e.g. susceptibility to lung cancer)
Photos from UN photo gallery www.un.org/av/photo

THE SECOND GENERATION (NEXT GEN)

SECOND GENERATION
▸ Sequence by synthesis
  - Step-wise base addition & read-out
  - Washing steps between each step
▸ Pros and Cons
  - Extensive sample prep (-)
  - Relatively costly reagents/run (-)
  - Massively parallel sequencing (+)
  - Relatively short fragments (-)
Examples: Roche 454 GS-FLX, Illumina HiSeq, Applied Biosystems Solid, IonTorrent, etc.

2ND GENERATION: SAMPLE PREP
▸ Extensive sample prep
  - Library generation (generation of fragments with adapters)
  - Clonal amplification
    - e.g. emPCR (e.g. Roche GS-FLX, ABI Solid, etc.)
    - e.g. bridge PCR (e.g. Solexa from Illumina)

FLUORESCENT READ-OUT
▸ e.g. Illumina, ABI Solid (or Helicos on single molecule level)

BASE CALLING: NOISE FACTORS
▸ Phasing noise
  - Leading / Lagging
▸ Fading noise
  - Exponential decay in fluorescent signal
▸ Cycle-dependent change in fluorophore cross-talk
Erlich et al. Nature Methods 5: 679-682 (2008); http://www.cs.utoronto.ca/~brudno/csc2431w10/altacyclic_pres.pdf

PYROSEQUENCING (OPTICAL)
▸ e.g. Roche GS FLX 454
Figure from OMICS Journals (doi:10.4172/jcsb.1000019) and Nature Biotechnology (doi:10.1038/nbt1485)

PYROSEQUENCING (ELECTRICAL)
▸ e.g. IonTorrent (Life Technologies)

PYROSEQUENCING (ELECTRICAL)
▸ Making small sequencing tests available (e.g. DNA Electronics/Roche)

THE THIRD GENERATION (NEXT NEXT-GEN)

THIRD GENERATION
▸ Sequencing (by synthesis)
  - Single molecule sensitivity
  - Read-out during copying
▸ Pros and Cons
  - Potentially long fragments (+)
  - Large cost reduction per run (+)
  - Easier sample prep (+)
  - Enzyme necessary: speed limited (1-3 bases/second/pore)
▸ Examples: Pacific Biosciences, Oxford Nanopore, Visigen (now Life Tech), etc.

REAL-TIME SEQUENCING
▸ Zero-mode waveguides (Pacific Biosciences)
▸ Single Molecule Real-Time (SMRT) sequencing
  - Polymerase is immobilized in 20 zL sized zero-mode waveguides (ZMW)
  - Polymerase cleaves off the fluorescent tags
  - Fluorescent read-out
  - Diffusion time: microseconds
  - Incorporation time: milliseconds
[Figure label: 70 nm]

MINIATURIZING DNA SEQUENCING
Molecular Biology and Cytometry Course, SCK•CEN

COMPARISON OF COMMERCIAL PRODUCTS
▸ Illumina: HiSeq 2000, began shipping in the third quarter of 2012. The instrument produces 2x150-base paired-end reads, which will increase to 2x250. “That will give you around 300 gigabases in approximately 60 hours.”
▸ Roche: GS FLX+ system, coupled with its newest software, produces reads of “up to 1,000 bp and beyond.”
▸ Life Technologies (Ion Torrent): Ion Proton can sequence “a human exome in a few hours.” Proton II is basically a 50x improvement on their first chip (120 Gb), but with a somewhat higher error rate than Illumina.
▸ Pacific Biosciences: PacBio RS. The company's new XL chemistry produces reads averaging 5,000 bases apiece, and about 5% of those exceed 10,000 bases.

FUTURE FOURTH GENERATION

NANOPORE BASED SEQUENCING
▸ e.g. Oxford Nanopore

NANOPORE BASED SEQUENCING
▸ Hybridization assisted sequencing
▸ e.g. Nabsys
  - Short fragments are hybridized to DNA
  - Their distance is measured
  - In parallel for many fragments
▸ e.g. Noblegen
  - Replace bases by barcode
  - Hybridize molecular beacons
  - Unzip DNA fragments in pore
  - Read fluorescent signals

FOURTH GENERATION
▸ Direct read-out of DNA
  - Nanopore based sequencing
  - Electron microscopy
▸ Pros and Cons
  - In principle, simple sample prep
  - Limited or no reagent costs
  - Long read lengths
  - No enzymatic reaction needed
  - Ability to read RNA, DNA modifications, etc.
Examples: imec, IBM, Halcyon

NANOPORE BASED SEQUENCING
Figures from Hao Liu (Biodesign Institute) and http://www.mcb.harvard.edu/branton/index.htm

NANOPORE/NANOSLIT COMBINATION
▸ Controlled translocation through a solid-state nanopore
▸ Electrically induced translocation
▸ Mechanical confinement of a single DNA strand
▸ SERS in a plasmonic nanoslit
▸ Vibrational fingerprinting
▸ Molecular information in the pore

MOLECULAR SPECTROSCOPY BY SERS
▸ The normal Raman effect
▸ Inelastic scattering of light by molecules through the excitation of molecular vibrations
▸ Spectroscopy
▸ Weak process!
▸ Surface Enhanced Raman Scattering
▸ “Hot spots” near metal nanostructures (excitation of plasmons)
▸ Enhancement scales with E^4
▸ Single molecule resolution

SERS NANOSLIT
[Figure labels: λ = 785 nm, Au, H2O, hot spot]
▸ Generating a hot spot using top-down designed plasmonic nanocavities
▸ Large and highly localized field enhancement
▸ Raman enhancement: 10^5 - 10^10 x (to single molecule levels)

NEXT-GENERATION SEQUENCING
Property | 1st generation | 2nd generation | 3rd generation | 4th generation
Basic principle | Sanger sequencing with size separation of amplified fragments | Site-selective amplification followed by iterative base-incorporation, read and wash steps | Enzymatic reaction to continuously integrate and read out bases. True single molecule analysis | Direct read-out of bases (without copying). True single molecule analysis
Sample preparation | Extensive | Extensive | Moderate | Almost none
Speed/base/site | Very low (<<1/sec) | Low (<1/sec) | Moderate (3/sec) | Very fast (~1 ms)
Throughput | Low | Very high | Very high | Very high
Accuracy | High | Low | Low (~80%) | NA
Read length | Long (~1000) | Short (~15-400) | Moderate (~450) | Very long (>1000)
De novo sequencing | Possible | Not possible | Difficult | Easy
Repeat regions | Limited | Highly limited | Limited | No intrinsic limit
DNA/protein derivatives | Indirectly | Indirectly | Indirectly | Yes
Reagent cost | Very high | Very high | High | None

Technology | Generation | On market | Single molecule | Nanopore (NP) / Enzymatic (E) based | Principle | Website
Illumina HiSeq* | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.illumina.com
Roche (FLX Titanium) | 2 | Yes | No | E | Light, sequence by synthesis | www.454.com
Polonator | 2 | Yes | ? | E | Fluorescence/Polony | www.polonator.org
Complete Genomics | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.completegenomics.com
Helicos (TSMS) | 2 | Yes | Yes | E | Fluorescence, sequence by synthesis | www.helicosbio.com
Life Tech (ABI Solid4) | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.appliedbiosystems.com
Life Tech (IonTorrent) | 2 (3) | Yes | No | E | Electrical, sequence by synthesis | www.iontorrent.com
Intelligent Bio | 2 | No | ? | E | Fluorescence, sequence by synthesis | www.intelligentbiosystems.com
GE Global | 2 | No | Yes | E | Fluorescence, sequence by synthesis | http://ge.geglobalresearch.com/blog/sequencing-a-human-sized-genomein-less-than-a-day/
GnuBio | 2 | No | No | E | Microdroplets, sequence by ligation | www.gnubio.com
Genizon Bioscience | 2? | No | No | E | http://www.genizon.com/images/pdfs/Pihlak_Linnarsson_NBT2008.pdf | www.geniozon.com
Light Speed | 2? | No | ? | ? | Light interference, patent: US 2009/0061526 | www.lsgen.com
Mobious Biosystems (Nexus) | 2? | No | No | E? | ? | www.mobious.com
Pacific Biosciences (tSMRT) | 3 | Yes | Yes | E | Fluorescence (SMRT), sequence by synthesis | www.pacificbiosciences.com
Oxford Nanopore | 3 | No | Yes | NP/E | Electrical, enzymatic cutting of DNA | www.nanoporetech.com
Visigen | 3 | No | ? | E | FRET measurement using TIRF | www.visigenbio.com
Cracker | 3 | No | Yes | E | SMRT, read-out on chip | www.crackerbio.com
IBM/Roche nanopore | 4 | No | Yes | NP/- | Electrical, tunneling using NPs | http://www-03.ibm.com/press/us/en/pressrelease/32037.wss
Nabsys | 4 | No | Yes | NP/- | Electrical, hybridization assisted NP sequencing | www.nabsys.com
NobleGen Biosciences | 4 | No | Yes | NP/- | Electrical, fluorescent after hybridization (Meller) | www.noblegenbio.com
Electronic Bio | 3 (4?) | No | Yes | NP/- | Electrical using biological NPs | www.electronicbio.com
Reveo | 4 | No | Yes | -/- | Electrical, tunneling using nano-knives | www.reveo.com
Base4 Innovation | 4 | No | Yes | ?? | Nanopore + optical? | www.base4innovation.co.uk
ZS Genetics | 4? | No | Yes | -/- | Electron microscopy | www.zsgenetics.com
Halcyon Molecular | 4? | No | Yes | -/- | Electron microscopy | www.halcyonmolecular.com

MANY APPLICATIONS & PUBLICATIONS

APPLICATIONS OF NEXT-GEN SEQUENCING
▸ Whole-genome sequencing
▸ Comparative genomics
▸ Genome re-sequencing
▸ Structural variation analysis
▸ Polymorphism discovery
▸ Meta-genomics
▸ Environmental sequencing
▸ Gene expression profiling
▸ Genotyping
▸ Population genetics
▸ Migration studies
▸ Ancestry inference
▸ Relationship inference
▸ Genetic screening
▸ Drug targeting
▸ Forensics

HAS NGS A PROGNOSTIC VALUE?
... and for personal health?
2 virus infections during the test period (common cold and sinus infection)
Diabetes developed during / after the 2nd infection (genetic risk had already been identified from whole genome sequencing)

HAS NGS A PROGNOSTIC VALUE?
▸ Sequencing has gone through a revolution and has become affordable for some applications (e.g. exome sequencing)
▸ Personal genome sequencing is already possible, but the medical interpretation is still difficult
▸ Genome sequencing can predict disease risks
▸ Genome sequencing should be combined with other -omics to monitor disease risk
▸ Integrated analyses are possible, but still need further improvement and understanding
▸ Regulatory information needs to be considered
▸ Every person is unique, and longitudinal follow-up will provide further insight
▸ Longitudinal follow-up: case studies have proven value, but no good biomarkers yet

THANK YOU FOR YOUR ATTENTION
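BACKUP: BASE CALLING NOISE, A TOY MODEL
The phasing and fading noise named on the base-calling slide can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and not the Alta-Cyclic method of Erlich et al.: in every cycle each strand in a cluster is assumed to lag (no base added), stay in step (one base added) or lead (two bases added) with fixed probabilities, and the fluorescent signal is assumed to decay exponentially with cycle number. The function names and all parameter values (`p_lag`, `p_lead`, `fade_per_cycle`) are hypothetical and chosen only for illustration.

```python
import math


def position_distribution(cycles, p_lag, p_lead):
    """Fraction of strands at each offset from the nominal cycle position.

    Offset -1 means one base behind (lagging), +1 one base ahead (leading).
    Assumed model: independent, identical step probabilities in every cycle.
    """
    p_ok = 1.0 - p_lag - p_lead
    dist = {0: 1.0}  # all strands start in phase
    for _ in range(cycles):
        nxt = {}
        for offset, frac in dist.items():
            for step, p in ((-1, p_lag), (0, p_ok), (+1, p_lead)):
                if p > 0.0:
                    nxt[offset + step] = nxt.get(offset + step, 0.0) + frac * p
        dist = nxt
    return dist


def in_phase_signal(cycles, p_lag=0.005, p_lead=0.002, fade_per_cycle=0.01):
    """Relative signal from in-phase strands after `cycles` cycles.

    Combines phasing (fraction of strands still at offset 0) with fading
    (exponential decay of the fluorescent signal). Default values are
    illustrative assumptions, not measured instrument parameters.
    """
    in_phase = position_distribution(cycles, p_lag, p_lead).get(0, 0.0)
    fading = math.exp(-fade_per_cycle * cycles)
    return in_phase * fading


# Relative in-phase signal at a few read lengths (cycles).
for n in (0, 25, 50, 100, 150):
    print(f"cycle {n:3d}: relative in-phase signal {in_phase_signal(n):.3f}")
```

Under these assumptions the usable signal shrinks with every cycle, which is one way to see why second-generation read lengths stay relatively short.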