THE SEQUENCING TECHNOLOGY (R)EVOLUTION

TIM STAKENBORG
IMEC
MB&C meeting
May 16, 2013
© IMEC 2013
HISTORY OF SEQUENCING
▸ 384 - 322 BC
- Aristotle told his students that all inheritance comes from the father
▸ 1977 (2 independent methods published in PNAS)
- Maxam & Gilbert: chemical degradation method
- Sanger: ddNTP-mediated chain termination !!
▸ 1995 (Fleischmann et al., Science 269: 496)
- Haemophilus influenzae (first fully sequenced bacterial genome)
▸ 2001 (Science/Nature)
- First human genome (13 years, ~3 billion USD)
▸ May 2005 (454 technology)
- 6 months, >30 million USD
HISTORY OF SEQUENCING
[Figure: transistor count (Moore's law) vs. sequenced kilobase pairs per day, on a log scale (1E+00 to 1E+09) against date of introduction (1970-2015). Processors shown range from the Intel 4004 to the G80. Sequencers are grouped as 1st generation (slab gels: manual slab gel, ABI373, ABI377; capillary electrophoresis: ABI3700, ABI 3730XL), 2nd generation (sequence by synthesis: Roche 454 Life Sciences, First Solid, 454 Titanium, ABI Solid3, Illumina HiSeq) and 3rd generation (Pacific Biosciences SMRT*).]
Note: human genome = ~3 × 10^9 bases
HISTORY OF SEQUENCING
▸ Still many challenges in post-processing of data
▸ Data handling
▸ Computational algorithms
THE FIRST GENERATION
FIRST GENERATION (SANGER)
▸ Cycle sequencing (amplification) reaction
- PCR products of different lengths
- Last base is fluorescent (different color per base)
- Separation by size
▸ Pros and Cons
- Extensive sample prep (-)
- High cost (-)
- Low throughput (-)
- Long read lengths (+)
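The chain-termination read-out above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a model of a real instrument: the function names and the example strand are hypothetical, and every possible terminated product is generated exactly once instead of being sampled from a reaction.

```python
import random

def terminated_fragments(strand):
    """Return (length, last_base) for every chain-terminated product.

    The last base stands in for the fluorescent ddNTP label.
    """
    return [(i + 1, base) for i, base in enumerate(strand)]

def read_by_size(fragments):
    """Separate fragments by size and read the label of each one in order."""
    return "".join(base for _, base in sorted(fragments))

fragments = terminated_fragments("GATTACA")  # hypothetical example strand
random.shuffle(fragments)                    # products are mixed in the tube
print(read_by_size(fragments))               # GATTACA
```

Sorting by length plays the role of the electrophoresis step: the order of the dye colors along the size axis is the sequence.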
DRAFT GENOME
▸ 1990 Human genome project started
▸ First draft in 2001, over 10 years and $3 billion later
▸ Finished human genome sequence in 2003 (published 2004)
February 2001
April 2011
THE NUMBER OF GENES
▸ Human genome: ~3 Gbase (3,000,000 kbases)
▸ Average gene size: ~3 kbases, but sizes vary greatly (largest is dystrophin: 2.4 Mbases)
▸ GENE SWEEP (Cold Spring Harbor Lab 2000-2003): betting pool on the number of human genes
▸ Rules: $1 in 2000, $5 in 2001 and $20 in 2002
▸ 165 bets
▸ Mean 61710
▸ Lowest 25947 (Lee Rowen)
▸ Highest 153478
THE HUMAN GENOME
▸ ~3 Gbase, 24 chromosomes: 1-22, X, Y
▸ 21,500 - 24,000 genes
▸ only 2% of the genome encodes genes
▸ about 46% of the genome is repetitive sequence
=> THERE IS A LOT OF “GENOMIC DARK MATTER” (or non-coding RNA)
THE HUMAN GENOME
▸ Almost all (99.9%) nucleotide bases are exactly the same in all people (a 0.1% difference, i.e. 1 difference per 1,000 base pairs)
- Humans (0.08 - 0.1%)
- Chimpanzees (0.12 - 0.17%)
- Drosophila simulans (2%)
- E. coli (5%)
- HIV-I (30%)
▸ SNPs (a single base change found in more than 1% of humans)
- Harmless (e.g. change in phenotype)
- Harmful (e.g. diabetes, cancer, heart disease, Huntington’s)
- Latent (e.g. susceptibility to lung cancer)
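As a rough worked example of the percentages above: the sketch below converts a fractional difference into an absolute number of differing positions. It assumes the ~3 Gbase genome size quoted earlier in the deck and treats every difference as a single-base substitution spread evenly over the genome; the function name is hypothetical.

```python
GENOME_SIZE = 3_000_000_000  # bases, ~3 Gbase as quoted in the slides

def expected_differences(fraction_different, genome_size=GENOME_SIZE):
    """Number of differing positions for a given fractional difference."""
    return round(fraction_different * genome_size)

print(expected_differences(0.001))   # 3000000
print(expected_differences(0.0008))  # 2400000
```

Under these assumptions, two people differ at roughly 2.4 to 3 million positions, which gives a feel for the scale of the variation a re-sequencing project has to catalogue.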
Photos from UN photo gallery www.un.org/av/photo
THE SECOND GENERATION
(NEXT GEN)
SECOND GENERATION
▸ Sequence by synthesis
- Step-wise base addition & read-out
- Washing steps between each step
▸ Pros and Cons
- Extensive sample-prep (-)
- Relatively costly reagents per run (-)
- Massively parallel sequencing (+)
- Relatively short fragments (-)
▸ Examples: Roche 454 GS-FLX, Illumina HiSeq, Applied Biosystems SOLiD, Ion Torrent, etc.
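The step-wise addition and read-out cycle can be illustrated with a toy flow-based model, in the style of instruments that offer one nucleotide species per step and record a signal proportional to the number of bases incorporated. This is a hedged sketch: `flowgram`, `bases_from_flows`, the flow order `TACG` and the noise-free signal are assumptions for illustration. Chemistries that add exactly one reversibly terminated base per cycle behave differently.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Return one (nucleotide, incorporated_count) pair per flow step."""
    flows, pos, step = [], 0, 0
    while pos < len(read) and step < max_flows:
        nucleotide = flow_order[step % len(flow_order)]
        run = 0
        # All consecutive matching bases are incorporated in a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))  # a wash step would follow here
        step += 1
    return flows

def bases_from_flows(flows):
    """Reconstruct the read from the per-flow counts."""
    return "".join(nucleotide * run for nucleotide, run in flows)

flows = flowgram("TTAGC")
print(flows)                    # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(bases_from_flows(flows))  # TTAGC
```

The count of 2 in the first flow shows why homopolymer runs are a known weak point of flow-based read-out: the instrument has to tell a signal of 2 from a signal of 3 rather than simply detect presence or absence.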
2ND GENERATION: SAMPLE PREP
▸ Extensive sample prep
- Library generation (generation of fragments with adapters)
- Clonal amplification
- e.g. emPCR (e.g. Roche GS-FLX, ABI Solid, etc)
- e.g. bridge PCR (e.g. Solexa from Illumina)
FLUORESCENT READ-OUT
▸ e.g. Illumina, ABI Solid, (or Helicos on single molecule level)
BASE CALLING: NOISE FACTORS
▸ Phasing noise
- Leading / Lagging
▸ Fading noise
- Exponential decay in fluorescent signal
▸ Cycle-dependent change in fluorophore cross-talk
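The first two noise factors can be illustrated with a small simulation of one cluster. This is a simplified sketch and not the model from the cited paper: the per-cycle lag and lead probabilities, the decay constant and all names are assumptions, and fluorophore cross-talk is left out entirely.

```python
def cycle_signals(seq, p_lag=0.01, p_lead=0.005, decay=0.98):
    """Per-cycle channel intensities for one cluster (toy model)."""
    n = len(seq)
    # dist[k]: fraction of strands that have incorporated k bases so far
    dist = [1.0] + [0.0] * n
    signals = []
    for cycle in range(1, n + 1):
        new = [0.0] * (n + 1)
        for k, frac in enumerate(dist):
            new[k] += frac * p_lag                             # lagging: no incorporation
            new[min(k + 1, n)] += frac * (1 - p_lag - p_lead)  # in phase: one base
            new[min(k + 2, n)] += frac * p_lead                # leading: two bases
        dist = new
        fade = decay ** cycle                                  # exponential signal decay
        channel = dict.fromkeys("ACGT", 0.0)
        for k in range(1, n + 1):
            channel[seq[k - 1]] += fade * dist[k]              # last added base emits
        signals.append(channel)
    return signals

def call_from_signals(signals):
    """Naive base calling: pick the brightest channel in every cycle."""
    return "".join(max(channel, key=channel.get) for channel in signals)

signals = cycle_signals("ACGTTGCA")
print(call_from_signals(signals))  # ACGTTGCA
```

With these small assumed rates the naive caller still recovers the sequence, but the in-phase fraction shrinks every cycle while the total signal fades, which is why read quality drops towards the end of a read.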
Erlich et al. Nature Methods 5: 679-682 (2008); http://www.cs.utoronto.ca/~brudno/csc2431w10/altacyclic_pres.pdf
PYROSEQUENCING (OPTICAL)
▸ e.g. Roche GS FLX 454
Figure from OMICS Journals (doi:10.4172/jcsb.1000019) and Nature Biotechnology (doi:10.1038/nbt1485)
PYROSEQUENCING (ELECTRICAL)
▸ e.g. IonTorrent (Life Technologies)
PYROSEQUENCING (ELECTRICAL)
▸ Making small sequencing tests available (e.g. DNA electronics/Roche)
THE THIRD GENERATION
(NEXT NEXT-GEN)
THIRD GENERATION
▸ Sequencing (by synthesis)
- Single molecule sensitivity
- Read-out during copying
▸ Pros and Cons
- Potentially long fragments (+)
- Large cost reduction per run (+)
- Easier sample prep (+)
- Enzyme necessary: speed limited (1-3 bases/second/pore)
▸ Examples: Pacific Biosciences, Oxford Nanopore, Visigen (now Life Tech), etc.
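To see what an enzyme-limited rate of a few bases per second means in practice, the arithmetic can be sketched as follows. The ~3 Gbase genome size comes from the earlier slides; the figure of 100,000 parallel sites is a hypothetical value chosen only for illustration, and the calculation covers a single pass (1x coverage) with no overhead.

```python
SECONDS_PER_HOUR = 3600
GENOME_BASES = 3e9  # ~3 Gbase human genome, as quoted in the slides

def hours_to_read(n_bases, bases_per_second, n_sites=1):
    """Hours needed to read n_bases once, split evenly over n_sites."""
    return n_bases / (bases_per_second * n_sites) / SECONDS_PER_HOUR

single = hours_to_read(GENOME_BASES, 3)             # one site at 3 bases/s
parallel = hours_to_read(GENOME_BASES, 3, 100_000)  # hypothetical 100,000 sites

print(f"{single / 24 / 365:.1f} years")  # 31.7 years
print(f"{parallel:.1f} hours")           # 2.8 hours
```

The per-site speed is fixed by the enzyme, so throughput in this generation has to come from parallelism.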
REAL-TIME SEQUENCING
▸ Zero mode waveguides (Pacific Biosciences)
▸ Single Molecule Real-Time (SMRT) sequencing
- Polymerase is immobilized in zero-mode waveguides (ZMWs) with a detection volume of about 20 zL
- Polymerase cleaves off the fluorescent tags
- Fluorescent read-out
- Diffusion time: microseconds
- Incorporation time: milliseconds
[Figure: zero-mode waveguide, dimension label 70 nm]
MINIATURIZING DNA SEQUENCING
© IMEC 2013 - Molecular Biology and Cytometry Course - SCK•CEN
COMPARISON OF COMMERCIAL PRODUCTS
▸ Illumina: HiSeq 2000, began shipping in the third quarter of 2012. The instrument produces 2x150-base paired-end reads, which will increase to 2x250. “That will give you around 300 gigabases in approximately 60 hours.”
▸ Roche: GS FLX+ system, coupled with its newest software, produces reads of “up to 1,000 bp and beyond”.
▸ Life Technologies (Ion Torrent): Ion Proton can sequence “a human exome in a few hours”. Proton II is basically a 50x improvement of their first chip (120 Gb), but with a somewhat higher error rate than Illumina.
▸ Pacific Biosciences: PacBio RS; the company’s new XL chemistry produces reads averaging 5,000 bases apiece, with about 5% of those exceeding 10,000 bases.
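The quoted Illumina figure can be put in context with a short calculation. This is a back-of-the-envelope sketch: it uses the 300 gigabases in 60 hours from the quote and the ~3 Gbase human genome size from the earlier slides, and it ignores read quality, duplicates and uneven coverage. The function names are hypothetical.

```python
def gigabases_per_hour(yield_gb, run_hours):
    """Average output rate of a run."""
    return yield_gb / run_hours

def fold_coverage(yield_gb, genome_gb=3.0):
    """Average number of times each base of the genome is read."""
    return yield_gb / genome_gb

print(gigabases_per_hour(300, 60))  # 5.0
print(fold_coverage(300))           # 100.0
```

At roughly 100x raw coverage per run, the output of one run could in principle be split over several human genomes at the coverage depths typically used for re-sequencing.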
FUTURE FOURTH GENERATION
NANOPORE BASED SEQUENCING
▸ e.g. Oxford Nanopore
NANOPORE BASED SEQUENCING
▸ Hybridization-assisted sequencing, e.g. Nabsys
- Short fragments are hybridized to DNA
- Their distance is measured
- In parallel for many fragments
▸ e.g. NobleGen
- Replace bases by barcode
- Hybridize molecular beacons
- Unzip DNA fragments in pore
- Read fluorescent signals
FOURTH GENERATION
▸ Direct read-out of DNA
- Nanopore based sequencing
- Electron microscopy
▸ Pros and Cons
- In principle, simple sample prep
- Limited or no reagent costs
- Long read lengths
- No enzymatic reaction needed
- Ability to read RNA, DNA modifications, etc
▸ Examples: imec, IBM, Halcyon
NANOPORE BASED SEQUENCING
Figures from Hao Liu, (Biodesign Institute) and http://www.mcb.harvard.edu/branton/index.htm
NANOPORE/NANOSLIT COMBINATION
▸ Controlled translocation through a solid-state nanopore
- Electrically induced translocation
- Mechanical confinement of a single DNA strand
▸ SERS in a plasmonic nanoslit
- Vibrational fingerprinting
- Molecular information in the pore
© IMEC 2013 - RESTRICTED
MOLECULAR SPECTROSCOPY BY SERS
▸ The normal Raman effect
- Inelastic scattering of light by molecules through the excitation of molecular vibrations
- Spectroscopy
- Weak process!
▸ Surface Enhanced Raman Scattering
- “Hot spots” near metal nanostructures (excitation of plasmons)
- Enhancement scales with E^4
- Single molecule resolution
SERS NANOSLIT
[Figure: Au nanoslit in H2O, excitation at λ = 785 nm, hot spot indicated]
▸ Generating a hot spot using top-down designed plasmonic nanocavities
▸ Large and highly localized field enhancement
▸ Raman enhancement: 10^5 - 10^10 x (to single molecule levels)
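The quoted Raman enhancement range can be related to the local field with the fourth-power scaling commonly used for SERS, where the Raman enhancement is approximately the fourth power of the local field enhancement. The sketch below simply inverts that approximation; the function name is hypothetical and the result is an order-of-magnitude estimate, not a property of a specific nanoslit design.

```python
def field_enhancement(raman_enhancement):
    """Local field enhancement implied by a Raman enhancement under E^4 scaling."""
    return raman_enhancement ** 0.25

for raman in (1e5, 1e10):
    print(f"Raman x{raman:.0e} -> field x{field_enhancement(raman):.1f}")
# Raman x1e+05 -> field x17.8
# Raman x1e+10 -> field x316.2
```

Under this approximation, a field enhancement between roughly 18 and 316 in the hot spot covers the whole quoted range, which is why a strongly localized field matters so much for reaching single molecule levels.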
NEXT-GENERATION SEQUENCING
Attribute | 1st generation | 2nd generation | 3rd generation | 4th generation
Basic principle | Sanger sequencing with size separation of amplified fragments | Site-selective amplification followed by iterative base-incorporation, read and wash steps | Enzymatic reaction to continuously integrate and read out bases. True single molecule analysis | Direct read-out of bases (without copying). True single molecule analysis
Sample preparation | Extensive | Extensive | Moderate | Almost none
Speed/base/site | Very low (<<1/sec) | Low (<1/sec) | Moderate (3/sec) | Very fast (~1 ms)
Throughput | Low | Very high | Very high | Very high
Accuracy | High | Low | Low (~80%) | NA
Read length | Long (~1000) | Short (~15-400) | Moderate (~450) | Very long (>1000)
De novo sequencing | Possible | Not possible | Difficult | Easy
Repeat regions | Limited | Highly limited | Limited | No intrinsic limit
DNA/protein derivatives | Indirectly | Indirectly | Indirectly | Yes
Reagent cost | Very high | Very high | High | None
Technology | Generation | On market | Single molecule | Nanopore (NP) / Enzymatic (E) based | Principle | Website
Illumina HiSeq* | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.illumina.com
Roche (FLX Titanium) | 2 | Yes | No | E | Light, sequence by synthesis | www.454.com
Polonator | 2 | Yes | ? | E | Fluorescence/Polony | www.polonator.org
Complete Genomics | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.completegenomics.com
Helicos (TSMS) | 2 | Yes | Yes | E | Fluorescent, sequence by synthesis | www.helicosbio.com
Life Tech (ABI Solid4) | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.appliedbiosystems.com
Life Tech (IonTorrent) | 2 (3) | Yes | No | E | Electrical, sequence by synthesis | www.iontorrent.com
Intelligent Bio | 2 | No | ? | E | Fluorescence, sequence by synthesis | www.intelligentbiosystems.com
GE Global | 2 | No | Yes | E | Fluorescence, sequence by synthesis | http://ge.geglobalresearch.com/blog/sequencing-a-human-sized-genomein-less-than-a-day/
GnuBio | 2 | No | No | E | Microdroplets, sequence by ligation | www.gnubio.com
Genizon Bioscience | 2? | No | No | E | http://www.genizon.com/images/pdfs/Pihlak_Linnarsson_NBT2008.pdf | www.geniozon.com
Light Speed | 2? | No | ? | ? | Light interference, patent: US 2009/0061526 | www.lsgen.com
Mobious Biosystems (Nexus) | 2? | No | No | E? | ? | www.mobious.com
Pacific Biosciences (tSMRT) | 3 | Yes | Yes | E | Fluorescence (SMRT), sequence by synthesis | www.pacificbiosciences.com
Oxford Nanopore | 3 | No | Yes | NP/E | Electrical, enzymatic cutting of DNA | www.nanoporetech.com
Visigen | 3 | No | ? | E | FRET measurement using TIRF | www.visigenbio.com
Cracker | 3 | No | Yes | E | SMRT, read-out on chip | www.crackerbio.com
IBM/Roche nanopore | 4 | No | Yes | NP/- | Electrical, tunneling using NPs | http://www-03.ibm.com/press/us/en/pressrelease/32037.wss
Nabsys | 4 | No | Yes | NP/- | Electrical, hybridization assisted NP sequencing | www.nabsys.com
NobleGen Biosciences | 4 | No | Yes | NP/- | Electrical, fluorescent after hybridization (Meller) | www.noblegenbio.com
Electronic Bio | 3 (4?) | No | Yes | NP/- | Electrical using biological NPs | www.electronicbio.com
Reveo | 4 | No | Yes | -/- | Electrical, tunneling using nano-knives | www.reveo.com
Base4 Innovation | 4 | No | Yes | ?? | Nanopore + optical? | www.base4innovation.co.uk
ZS Genetics | 4? | No | Yes | -/- | Electron microscopy | www.zsgenetics.com
Halcyon Molecular | 4? | No | Yes | -/- | Electron microscopy | www.halcyonmolecular.com
MANY APPLICATIONS
& PUBLICATIONS
APPLICATIONS OF NEXT-GEN SEQUENCING
▸ Whole-genome sequencing
▸ Comparative genomics
▸ Genome re-sequencing
▸ Structural variation analysis
▸ Polymorphism discovery
▸ Meta-genomics
▸ Environmental sequencing
▸ Gene expression profiling
▸ Genotyping
▸ Population genetics
▸ Migration studies
▸ Ancestry inference
▸ Relationship inference
▸ Genetic screening
▸ Drug targeting
▸ Forensics
DOES NGS HAVE PROGNOSTIC VALUE?
... and for personal health?
2 virus infections during the test period (common cold and sinus infection)
Diabetes developed during / after the 2nd infection (genetic risk had already been identified from whole genome sequencing)
DOES NGS HAVE PROGNOSTIC VALUE?
▸ Sequencing has gone through a revolution and has become affordable for some applications (e.g. exome sequencing)
▸ Personal genome sequencing is already possible, but the medical interpretation is still difficult
▸ Genome sequencing can predict disease risks
▸ Genome sequencing should be combined with other –omics to monitor disease risk
▸ Integrated analyses are possible, but still need further improvement and understanding
▸ Regulatory information needs to be considered
▸ Every person is unique and longitudinal follow-up will provide further insight
▸ Longitudinal follow-up: case studies have proven value, but no good biomarkers yet
THANK YOU FOR YOUR ATTENTION