Debjit Ray, 5.2

Genome-Based prediction of pathogenic
potential of the new “super bug”:
Clostridium difficile
Superbug
Debjit Ray
Sandia National Laboratories
18th April, 2016
Background
Clostridium difficile produces toxin A and toxin B, both of which may produce
diarrhea and inflammation in infected patients.
Clostridium difficile is now the most common cause of infectious diarrhea in
hospitals and long-term care settings.
The good and the evil
Overview of Project
Cultured ~100 C.diff clinical isolates
•  Type 027 “hypervirulent” strains
•  Access to clinical isolates, detailed clinical patient records
•  Cheap and rapid sequencing methods
•  De Novo Assembly
Unbiased discovery of new genome features, accurate SNP
calls, inversions and transpositions are revealed
Multidrug resistance evolution and pathogen emergence
End goal is a set of methods and software tools suitable for
routine clinical use that will support translational research
Assembly Pipeline
Mate Pairs
Illumina
Filtering
MP demux
NxTrim
SPAdes
Paired Ends
Scaffolding
Reference ass.
Scaffold ordering
Bridger
Corrected reads, filtered scaffolds
SPAdes
Trusted contigs
Check for Inconsistent assembly
(Inversion, deletion, translocation)
YES
NO
Map reads back to genome
Gap closure
Final Genome
Annotation Pipeline
Assembled
Genomes
Annotation
“RATT”, “PROKKA”
Gene Finder
“Prodigal”
RNA Genes
“rfind”
Islands
“Islander”
Integrons
“Integral”
Whole Genome
Alignment
“Mugsy”
Gene Families
“HMMER”
Virulence DB
Abx Res DB
Transposases
Integrases
CAS/CRISPR
Custom (Cdiff)
§  Mobile Elements
§  Where are they?
§  Virulence Genes
§  Antibiotic Res.
§  Phylogenetic Tree
(with dates)
Features contributing to pathogenicity
Genomic profile
Patient profile
Clinical predictions
§  Host factors must be controlled for
§  Patient data controls for host state and possibly environment
§  Is 027 severity associated with nursing homes?
§  Virulence may “emerge” but may also attenuate.
rRNA involving Contig breaks
There are 10 different rRNA loci in CD196,
they lead to most of the contig breaks during
de novo assembly
Figure: genome map from 0 to 4 Mb with a poorly assembled region marked.
Numbers in red denote genome (contig) breaks at the rRNA loci (values shown: 51, 28, 5, 9, 11, 5).
Mate Pair Challenge
16S
tRNAs
tRNA Ala 9bp
23S
33bp
5S
tRNAs
9bp
33bp
rRNA operons are ~5 kb, highly conserved, and common sites of recombination
Increase Mate Pair Size to Span rRNA Repeats Reliably
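The reasoning behind the larger mate-pair size can be sketched as a simple check: a mate pair bridges a repeat only if both of its reads land in unique sequence on either side. The function below is a minimal sketch, not the pipeline's actual logic. Only the ~5 kb repeat length comes from the slide; the `min_anchor` value and the example insert sizes are hypothetical.

```python
def spans_repeat(insert_size, repeat_len=5_000, min_anchor=50):
    """Return True if a mate pair of this insert size can bridge the repeat.

    insert_size: outer distance between the two read ends, in bp.
    repeat_len:  repeat length in bp (~5 kb for the rRNA operon, per the slide).
    min_anchor:  hypothetical minimum number of unique bases each read
                 must place outside the repeat.
    """
    return insert_size >= repeat_len + 2 * min_anchor


# Illustrative insert sizes only (not values from the talk).
for size in (3_000, 5_000, 8_000):
    print(size, spans_repeat(size))
```

Under these assumptions an insert size equal to the repeat length is still too short, because neither read would be anchored in unique sequence.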
Gross Genomic Rearrangements
Reference genome CD196
Cd25,27,32,84 :: Inversion :: 855,165 – 1,275,562
Cd29,35 :: Inversion :: 830,748 – 1,049,929
Cd28 :: Inversion :: 440,137 – 612,607
Cd30 :: Translocations :: 2,208,355 – 2,451,812
Cd31 :: 2 inversions :: 1,033,048 – 1,664,720 and 3,266,930 – 3,671,595
Cd39 :: Inversion :: 1,448,058 – 1,926,943
Cd42 :: Translocation :: 3,447,213 – 3,853,885
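To compare the size of the rearranged segments, the records above can be parsed into coordinates relative to CD196. This is a minimal sketch: it assumes the coordinates are 1-based and inclusive (not stated on the slide), reads the first coordinate of the first record as 855,165, and uses an ASCII hyphen as the range separator. The names `RECORDS` and `parse_record` are illustrative.

```python
import re

# Records as listed on the slide (range dash replaced with an ASCII hyphen).
RECORDS = [
    "Cd25,27,32,84 :: Inversion :: 855,165 - 1,275,562",
    "Cd29,35 :: Inversion :: 830,748 - 1,049,929",
    "Cd28 :: Inversion :: 440,137 - 612,607",
    "Cd30 :: Translocations :: 2,208,355 - 2,451,812",
    "Cd31 :: 2 inversions :: 1,033,048 - 1,664,720 and 3,266,930 - 3,671,595",
    "Cd39 :: Inversion :: 1,448,058 - 1,926,943",
    "Cd42 :: Translocation :: 3,447,213 - 3,853,885",
]

# One "start - end" pair, with thousands separators.
INTERVAL = re.compile(r"([\d,]+)\s*-\s*([\d,]+)")


def parse_record(line):
    """Split a record into (isolates, kind, [(start, end, length), ...])."""
    isolates, kind, coords = [part.strip() for part in line.split("::")]
    intervals = []
    for start, end in INTERVAL.findall(coords):
        s = int(start.replace(",", ""))
        e = int(end.replace(",", ""))
        intervals.append((s, e, e - s + 1))  # assumes inclusive coordinates
    return isolates, kind, intervals


for record in RECORDS:
    isolates, kind, intervals = parse_record(record)
    for start, end, length in intervals:
        print(f"{isolates}\t{kind}\t{start}\t{end}\t{length}")
```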
A Ubiquitous CAS-containing
and Two Unique Genomic islands
Genomic islands - Clusters of genes that are acquired by horizontal transfer.
GIs are associated with microbial adaptations and they have had a substantial
impact on bacterial evolution and pathogenicity
Island     Genome      % tRNA Identity   Length   Genes
Island_1   All         100               18,965   Cas5, Cas6, Phage_integrase, SmpB
Island_2   Cd2, Cd17   89                82,810   Phage_integrase
Island_3   Cd7, Cd10   98                21,817   Phage_integrase
CRISPR-based gene manipulation
Natural to bacteria
CRISPR-Cas9 is a groundbreaking technique that enables scientists to make precise,
targeted changes in living cells.
Knock genes out and introduce new genes
Unlike traditional gene-editing methods, it is cheap, easy to use and effective in
almost any organism.
Better and Faster Sequencing
§  Goal:
§  Minimal prep time and labor
§  Efficient and optimal use of sequence reads
§  Specific enzyme design
§  Introduction of site-directed insertions through unstable plasmids into the C. difficile chromosome.
Safe, Simple Culture Method
DNA Extraction
~$1
+/- endonuclease
Petri Dish
Single Pot Library Prep
Avidin Captured (MP)
Flow Through (PE)
Express our own Transposase
~$4-8 per sample
400x multiplexing yields 100x coverage at $10 per sample
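The multiplexing figure can be put in context with the usual coverage arithmetic: mean coverage is total sequenced bases divided by the number of samples times genome size. The sketch below is illustrative only. The 4.1 Mb genome size is an assumed round figure for a C. difficile genome, and the run yield it prints is derived from that assumption rather than taken from the talk.

```python
def coverage_per_sample(run_yield_bp, n_samples, genome_size_bp):
    """Mean coverage per sample when a run is split evenly across samples."""
    return run_yield_bp / (n_samples * genome_size_bp)


def required_yield_bp(n_samples, coverage, genome_size_bp):
    """Total bases a run must produce to reach the target coverage."""
    return n_samples * coverage * genome_size_bp


GENOME_SIZE_BP = 4_100_000  # assumption: approximate C. difficile genome size

# 400 samples at 100x, the figures quoted on the slide.
needed = required_yield_bp(400, 100, GENOME_SIZE_BP)
print(f"Required yield: {needed / 1e9:.0f} Gb")
print(f"Coverage check: {coverage_per_sample(needed, 400, GENOME_SIZE_BP):.0f}x")
```

Even splitting across barcodes is itself an idealization; real runs show per-sample variation, so the result is best read as a lower bound on the yield needed.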
Technology development
§  Hyper-multiplexing
§  Lowering coverage from ~200x to 50x
§  Mixing 2-4 isolates with different genome characteristics in a given
sample barcode to lower library reagent costs
§  Cdiff (28.5%) and Burkholderia (68%)
§  Cdiff (28.5%), Escherichia (51%) and Pseudomonas (66%)
§  Cheap Long Reads
§  Oxford Nanopore produces long (up to >10 kb) but error-prone reads
§  Already shown to be able to resolve adjacent repeats in
assemblies
–  Ashton et al, “MinION nanopore sequencing identifies the
position and structure of a bacterial antibiotic resistance
island”, Nature Biotechnology (Dec 2014)
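The isolate-mixing idea listed under hyper-multiplexing can be illustrated with a toy example. It assumes the percentages on the slide are genomic GC content, which the slide does not state explicitly, and it assigns each sequence to the genome with the closest expected GC. This is not the method used in the project. Short reads have noisy GC, so a real workflow would more likely bin assembled contigs or map reads to references. The names `MIX` and `nearest_genome` are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def nearest_genome(seq, expected_gc):
    """Name of the genome whose expected GC is closest to this sequence."""
    gc = gc_fraction(seq)
    return min(expected_gc, key=lambda name: abs(expected_gc[name] - gc))


# Three-way mix from the slide, read as GC fractions (assumption).
MIX = {"Cdiff": 0.285, "Escherichia": 0.51, "Pseudomonas": 0.66}

for seq in ("ATATATATGC", "ACGTACGTAC", "GCGCGCGCAT"):
    print(seq, round(gc_fraction(seq), 2), nearest_genome(seq, MIX))
```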
Acknowledgements
Funding was provided by the Laboratory Directed Research
and Development program at Sandia National Laboratories.
Collaboration with the Clinical Microbiology Laboratory, UC
Davis Medical Center.
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.