Presentation Slides - Global HIV Vaccine Enterprise

Transcription

Next generation sequencing in HIV host genetics
Kevin Shianna, PhD
Assistant Professor
Director of Operations, CHGV
Director, Genomic Analysis Facility
Duke University School of Medicine
Overview
•Current state of next generation sequencing
•Future of sequencing
•HIV host genetics sequencing studies
Genome Sequencing Platforms
• Roche – 454
    • Long reads
    • pyrosequencing
    • expensive
• Life Technologies – SOLiD
    • sequencing‐by‐ligation
• Illumina – HiSeq 2000
    • sequencing‐by‐synthesis
Illumina HiSeq 2000 – sequencing-by-synthesis
Cost and throughput
[Figure: sequencing cost on a log scale (1 down to 1E‐08) by year, 2007 to 2011, compared with Moore’s Law. 2007: single run 1 Gb. 2011: single run 650 Gb.]
Sequencing ‐ Current
‐Illumina HiSeq 2000
‐192 exomes per month
‐10 genomes per month (>30x coverage)
‐cost
‐exome = $1000‐1300
‐genome = $4000*
*cost does not include IT related costs
Sequencing – Future (mid‐2012)
‐Illumina HiSeq 2000
‐288‐576 exomes per month
‐20‐40 genomes per month (>30x coverage)
‐cost
‐exome = $750‐900?
‐genome = $2000‐3000?
*cost does not include IT related costs
Whole genome vs exome sequencing
Cost
Throughput
IT/Data management
Structural variation/CNVs
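The cost side of this comparison can be worked through with the per-sample prices quoted on the "Current" slide. The sketch below is only illustrative: the budget figure and the function name `samples_for_budget` are assumptions, the upper end of the exome price range is used, and IT costs are excluded as on the slides.

```python
def samples_for_budget(budget, cost_per_sample):
    """Number of whole samples a budget covers (IT costs excluded)."""
    return budget // cost_per_sample


BUDGET = 100_000  # hypothetical budget in dollars, not from the slides

exomes = samples_for_budget(BUDGET, 1300)   # exome at the upper quoted price
genomes = samples_for_budget(BUDGET, 4000)  # genome at the quoted price
print(exomes, genomes)  # prints: 76 25
```

Under these assumptions the same budget covers roughly three times as many exomes as genomes, which is the trade-off against the structural variation/CNV information that whole genomes provide.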
Future of sequencing
Basic idea:
longer reads (1000‐2000bp)
‐de novo alignment
no amplification (single molecule)
no DNA modification
no fluorescence
no optical hardware
no expensive equipment (bench top)
Future of sequencing??
Current developments
Pacific Biosciences
Ion Torrent
Future development
Oxford Nanopore
Pacific Biosciences RS
SMRT – Single Molecule Real-Time Technology
-long read (1000-2000 bp)
-single molecule
-unmodified DNA
-unamplified DNA
Ion Torrent Personal Genome Machine (PGM)
-Semiconductor chip
-no fluorescence or optics
-pH meter – after incorporation of a nucleotide, an H+ is released, changing the pH of the solution
Oxford Nanopore
-protein nanopore
-label free, single molecule sequencing
-base calling by measuring electrical current
Exonuclease seq
Sequencing Approach
Cirulli ET & Goldstein DB. Nature Reviews Genetics, 2010
1. Identify subjects
2. Sequence (50‐100 individuals)
3. Align to reference and call variants
4. Compare to 1000’s of sequenced controls
5. Follow‐up genotyping in larger cohorts!
Estimation by Read Depth with SNVs (ERDS)
• Calculate average read depth (RD) for each 1‐kb window, and correct for GC content
• Use Paired Hidden Markov model to infer the copy numbers for each window, by utilizing both RD information and SNV heterozygosity information
• Can use a Fisher’s Exact test to calculate imbalance
[Figure panels: homozygous deletion, heterozygous deletion, duplication]
Mingfu Zhu
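The first ERDS step (average read depth per 1‐kb window, corrected for GC content) can be sketched as follows. This is a minimal illustration, not the ERDS implementation: the slides do not give the correction formula, so the code assumes a common scheme in which each window is rescaled by the overall mean RD divided by the mean RD of windows with similar GC content. The function names, the `bin_width` parameter, and the input layout (a list of per-base depths and a list of per-window GC fractions) are all assumptions.

```python
from collections import defaultdict
from statistics import mean

WINDOW = 1000  # 1-kb windows, as on the slide


def window_depths(per_base_depth, window=WINDOW):
    """Average read depth (RD) for each consecutive full window."""
    n_windows = len(per_base_depth) // window
    return [
        mean(per_base_depth[i * window:(i + 1) * window])
        for i in range(n_windows)
    ]


def gc_correct(depths, gc_fractions, bin_width=0.05):
    """Rescale each window's RD by overall mean RD / mean RD of its GC bin.

    Assumed correction scheme; the slides do not specify ERDS's formula.
    """
    overall = mean(depths)
    by_bin = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        by_bin[int(gc / bin_width)].append(depth)
    bin_mean = {b: mean(v) for b, v in by_bin.items()}
    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        m = bin_mean[int(gc / bin_width)]
        # Leave the window unchanged if its GC bin has no coverage at all.
        corrected.append(depth * overall / m if m > 0 else depth)
    return corrected
```

Dividing a corrected window depth by the overall mean gives a rough copy ratio for that window. The copy number inference itself, which the slide attributes to a paired Hidden Markov model using both RD and SNV heterozygosity, is not shown here.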
An overall view of the SV‐Finder pipeline
BWA bam file
PMR: Partially mapped (soft clipped) reads
Reads with abnormal distance/directions
UMR: Unmapped reads
Split‐Read analysis: Small InDels, unknown SVs
(all & unique)
Pair‐End analysis: Big InDels, Inversions, translocations
Read coverage
Yujun Han
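The read categories named in the pipeline (PMR, UMR, and pairs with abnormal distance/directions) can be told apart from two standard fields of an aligned read. The sketch below is an assumption about how such a split could be done, not SV‐Finder's actual criteria, which the slides do not give. It relies on the SAM conventions that flag bit 0x4 marks an unmapped read, bit 0x1 a paired read, bit 0x2 a pair the aligner considers properly aligned, and that `S` in the CIGAR string marks soft clipping. The function name and category labels are hypothetical.

```python
FLAG_PAIRED = 0x1        # read is part of a pair
FLAG_PROPER_PAIR = 0x2   # aligner judged distance and orientation normal
FLAG_UNMAPPED = 0x4      # read itself is unmapped


def classify_read(flag, cigar):
    """Assign one aligned read to a category used in the pipeline overview."""
    if flag & FLAG_UNMAPPED:
        return "UMR"            # unmapped read
    if "S" in cigar:
        return "PMR"            # partially mapped (soft clipped) read
    if (flag & FLAG_PAIRED) and not (flag & FLAG_PROPER_PAIR):
        return "abnormal_pair"  # abnormal distance or direction
    return "normal"


# Hypothetical reads given as (flag, CIGAR) pairs.
reads = [(77, "*"), (99, "60M40S"), (97, "100M"), (99, "100M")]
labels = [classify_read(flag, cigar) for flag, cigar in reads]
```

Here the proper-pair bit stands in for the distance and direction checks. A real pipeline would more likely compare insert size and mate orientation against the library's own distribution.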
Four general types of SVs as viewed in SV‐viewer
[Figure: four SV‐viewer panels (A to D), each with PMR and UMR tracks and a paired-read track: paired reads with normal distance, with long distance, with forward directions, and with reverse directions.]
A) Deletion. B) Insertion. C) Translocation. D) Inversion. Legends and arrows were added manually.
Does sequencing work?
Homozygous variants identified by exome sequencing
Family 1
3 homozygous variants shared by siblings AND absent in ~300 sequenced control genomes
Family 2
Only one of these is also present in an unrelated affected individual
CHAVI Host Genetics Sequencing Studies
– Resistance to HIV
– Rapid progressors
– Viral controllers
– Viral setpoint
– Broad neutralizers
– B57 modifiers
Hemophilia
High Risk Seronegative (HRSN, HESN)
Cases
44 high risk seronegative hemophilia patients
European ancestry
Funded by the Gates Foundation
Genome Controls
43 low-risk population controls
European ancestry
HIV+ Controls
41 HIV+ exomes
European ancestry
44 HRSN cases
43 genome controls
574 exome controls
(all analyses used just genome controls unless specified)
QC
FET Top overall (allelic/rec)
Top hits, Rare in controls
QC
FET Top up/downstream (allelic/rec)
Top hits, Rare in controls
Functional variants in 2 or more cases
QC
Functional variants in 1 or more cases
QC
QC
QC
FET allelic and recessive models:
p<0.05, Rare in controls
FET using 574 exome controls, allelic and recessive models:
p<0.02, Rare in controls
Homozygous listings:
Homo in 1 or more cases, Rare in controls
Stop gain variants:
In any case and at higher freq in cases than in controls.
Homo in 1 or more cases, het in 2 or more cases
Functional variants are:
stop gain, stop loss, nonsynonymous, essential splice site, frameshift indel, nonframeshift indel
=1390 variants of interest
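The FET filters above rest on Fisher's exact test applied to a 2x2 table of cases versus controls. A minimal sketch of the two-sided test, written with the standard library only, is given here. It assumes the usual reading of the two models: the allelic model counts variant and reference alleles (two per individual), and the recessive model counts homozygous carriers against everyone else. The example counts are hypothetical and are not results from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Allelic model, hypothetical counts: 44 cases give 88 alleles and
# 43 genome controls give 86 alleles; variant seen 6 times in cases only.
p_allelic = fisher_exact_two_sided(6, 82, 0, 86)
passes_filter = p_allelic < 0.05
```

With cohorts this small, only variants that are markedly enriched in cases can pass a p<0.05 threshold, which is consistent with pairing the test with a "rare in controls" requirement.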
Follow‐up genotyping
High risk seronegatives: Enriched for protective variants (n=400)
General population: Protective variants at low frequency (n=2000)
HIV-positive individuals: Depleted for protective variants (n=1000)
Variant absent in 859 HIV+ individuals:
Homozygous
32 variants homo in >1 HRSN and not homo in any HIV+ individual
15 of these variants show depletion of homozygotes in HIV+ samples (Given the MAF in HIV‐neg controls, we would have expected to see a homo in the HIV+ samples)
2 of these variants are enriched by FET in the HRSN as compared to the HIV+ and/or to the population controls
1 variant shows an association with lower set point
• All are very rare
– Only in 2, 3 or 4 of the 400 HRSN individuals
• Also very rare in population controls
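The depletion argument (given the MAF in HIV‐negative controls, a homozygote would have been expected among the HIV+ samples) can be made concrete. The sketch assumes Hardy-Weinberg proportions and independent individuals, neither of which is stated on the slide: with minor allele frequency q, the expected number of homozygotes among n individuals is n·q², and the probability of seeing none is (1 − q²)ⁿ. The MAF used below and the function names are hypothetical.

```python
def expected_homozygotes(maf, n):
    """Expected homozygote count among n individuals under Hardy-Weinberg."""
    return n * maf ** 2


def prob_no_homozygotes(maf, n):
    """Probability that none of n independent individuals is homozygous."""
    return (1 - maf ** 2) ** n


N_HIV_POSITIVE = 859  # HIV+ individuals genotyped, from the slide
MAF = 0.05            # hypothetical minor allele frequency in controls

expected = expected_homozygotes(MAF, N_HIV_POSITIVE)
p_none = prob_no_homozygotes(MAF, N_HIV_POSITIVE)
```

A small `p_none` for a variant with no observed homozygotes is what "depletion of homozygotes" amounts to under these assumptions. The same reasoning applies to heterozygotes with 2q(1 − q) in place of q².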
Variant absent in 859 HIV+ individuals:
Heterozygous
11 variants het in >1 HRSN and absent in any form in all HIV+ individuals
5 of these variants show depletion of heterozygotes in HIV+ samples (Given the MAF in HIV‐neg controls, we would have expected to see a het in the HIV+ samples)
3 of these variants are enriched by FET in the HRSN as compared to the HIV+ and to the population controls
• All are very rare
– Only in 2, 3 or 4 of the 400 HRSN individuals
• Also very rare in population controls
Next steps
‐sequence more cases
‐funding by Gates Foundation and CHAVI/NIAID to sequence 200 more HRSN
‐functional studies
‐collapsing method followed by sequencing genes in 1000’s of individuals
Illumina MiSeq
-Sequence 30-40 genes (400 exons)
-100’s-1000’s of samples
-as low as $50-75 per sample
David Goldstein
Goldstein Lab
Dongliang Ge
Jianying Li
Hee Shin Kim
Jessica Maia
Curtis Gumbs
Liz Cirulli
Kim Pelak
Mingfu Zhu
Qianqian Zhu
Min He
Darcy McMullin
Shianna Lab
Ryan Campbell
Linda Hong
Melora McCall
Alex McKenzie
Josh Mauro
Hemophilia Project
Center for HIV/AIDS Vaccine Immunology (CHAVI)
# U19 AI067854-05
Bill and Melinda Gates Foundation Grant # 157412
Jim Goedert
Jacques Fellay
Dongliang Ge
Kimberly Pelak
Microcephaly
Elizabeth Ruzzo
Bruria Ben-Zeev
Yuki Hitomi
Kimberly Pelak
Doron Lancet
Elon Pras