Debjit Ray, 5.2
Genome-Based Prediction of the Pathogenic Potential of the New “Superbug”: Clostridium difficile
Debjit Ray
Sandia National Laboratories
18th April, 2016

Background
• Clostridium difficile produces toxin A and toxin B, both of which can cause diarrhea and inflammation in infected patients.
• Clostridium difficile is now the most common cause of infectious diarrhea in hospitals and long-term care settings.

The Good and the Evil

Overview of Project
• Cultured ~100 C. difficile clinical isolates
• Type 027 “hypervirulent” strains
• Access to clinical isolates and detailed clinical patient records
• Cheap and rapid sequencing methods
• De novo assembly: unbiased discovery of new genome features, accurate SNP calls, and detection of inversions and transpositions
• Multidrug resistance evolution and pathogen emergence
End goal: a set of methods and software tools suitable for routine clinical use that will support translational research.

Assembly Pipeline (flowchart steps)
• Input: Illumina mate pairs and paired ends
• Filtering, mate-pair (MP) demultiplexing, NxTrim
• SPAdes assembly
• Scaffolding: reference assembly, scaffold ordering, Bridger
• SPAdes with corrected reads, filtered scaffolds and trusted contigs
• Check for inconsistent assembly (inversion, deletion, translocation): YES / NO
• Map reads back to genome
• Gap closure
• Final genome

Annotation Pipeline
• Input: assembled genomes
• Annotation: RATT, PROKKA
• Gene finder: Prodigal
• RNA genes: rfind
• Islands: Islander
• Integrons: Integral
• Whole-genome alignment: Mugsy
• Gene families: HMMER (virulence DB, antibiotic resistance DB, transposases, integrases, CAS/CRISPR, custom C. difficile)
Outputs:
• Mobile elements: where are they?
• Virulence genes
• Antibiotic resistance
• Phylogenetic tree (with dates)

Features Contributing to Pathogenicity
• Genomic profile + patient profile → clinical predictions
• Host factors must be controlled for: patient data control for host state and possibly environment
• Is 027 severity associated with nursing homes?
• Virulence may “emerge” but may also attenuate.
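The “check for inconsistent assembly” step of the assembly pipeline can be illustrated with a toy classifier. This is a minimal sketch, not the pipeline's actual method: it assumes the assembly has already been aligned to the CD196 reference and reduced to a hypothetical list of `Block` records (reference start, reference end, orientation) in assembly order, and it applies simple illustrative heuristics for inversions, translocations and deletions. The names `Block`, `find_inconsistencies` and the `min_gap` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical aligned block: a stretch of the assembly matched to the reference."""
    ref_start: int  # 1-based start on the reference (e.g. CD196)
    ref_end: int    # 1-based inclusive end on the reference
    forward: bool   # True if assembly and reference run in the same direction


def _covered(blocks, start, end):
    """Return True if any block overlaps reference positions start..end."""
    return any(b.ref_start <= end and b.ref_end >= start for b in blocks)


def find_inconsistencies(blocks, min_gap=1000):
    """Flag candidate rearrangements; blocks must be listed in assembly order.

    Illustrative heuristics only (assumptions, not the slides' rules):
    - a reverse-orientation block is a candidate inversion;
    - a block starting before the previous block ends is a candidate translocation;
    - >= min_gap reference bases skipped and aligned nowhere is a candidate deletion.
    """
    events = []
    for i, blk in enumerate(blocks):
        if not blk.forward:
            events.append(("inversion", blk.ref_start, blk.ref_end))
        if i == 0:
            continue
        prev = blocks[i - 1]
        gap = blk.ref_start - prev.ref_end - 1
        if blk.ref_start < prev.ref_end:
            events.append(("translocation", blk.ref_start, blk.ref_end))
        elif gap >= min_gap and not _covered(blocks, prev.ref_end + 1, blk.ref_start - 1):
            events.append(("deletion", prev.ref_end + 1, blk.ref_start - 1))
    return events


if __name__ == "__main__":
    # Toy input reusing the Cd28 inversion coordinates from the rearrangements slide;
    # the 4,000,000 end coordinate is an arbitrary placeholder.
    toy = [Block(1, 440_136, True), Block(440_137, 612_607, False), Block(612_608, 4_000_000, True)]
    print(find_inconsistencies(toy))  # [('inversion', 440137, 612607)]
```

The sketch assumes each inverted segment aligns as a single block; an inversion split into several reverse-orientation blocks would also trip the translocation rule, which a real check would need to handle.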
rRNA-Associated Contig Breaks
• There are 10 rRNA loci in CD196, and they cause most of the contig breaks during de novo assembly.
[Figure: contig breaks along the CD196 genome (0–4 Mb); numbers in red denote genome (contig) breaks caused by rRNA loci; a poorly assembled region is marked.]

Mate Pair Challenge
[Figure: rRNA operon layout showing 16S, tRNA genes (including tRNA-Ala), 23S and 5S, with segments labeled 9 bp and 33 bp]
• rRNA operons are ~5 kb and highly conserved
• Common sites of recombination
• Increase mate-pair size to span the ~5 kb rRNA repeats reliably

Gross Genomic Rearrangements
Reference genome: CD196
• Cd25, Cd27, Cd32, Cd84 :: Inversion :: 855,165 – 1,275,562
• Cd29, Cd35 :: Inversion :: 830,748 – 1,049,929
• Cd28 :: Inversion :: 440,137 – 612,607
• Cd30 :: Translocations :: 2,208,355 – 2,451,812
• Cd31 :: 2 inversions :: 1,033,048 – 1,664,720 and 3,266,930 – 3,671,595
• Cd39 :: Inversion :: 1,448,058 – 1,926,943
• Cd42 :: Translocation :: 3,447,213 – 3,853,885

One Ubiquitous CAS-Containing and Two Unique Genomic Islands
Genomic islands (GIs) are clusters of genes acquired by horizontal transfer. GIs are associated with microbial adaptations and have had a substantial impact on bacterial evolution and pathogenicity.
Island :: Genome(s) :: % tRNA identity :: Island length (bp) :: Genes
• Island_1 :: All :: 100 :: 18,965 :: Cas5, Cas6, Phage_integrase, SmpB
• Island_2 :: Cd2, Cd17 :: 89 :: 82,810 :: Phage_integrase
• Island_3 :: Cd7, Cd10 :: 98 :: 21,817 :: Phage_integrase

CRISPR-Based Gene Manipulation
• Natural to bacteria
• CRISPR-Cas9 is a groundbreaking technique that enables scientists to make precise, targeted changes in living cells: knocking genes out and introducing new genes.
• Unlike traditional gene-editing methods, it is cheap, easy to use and effective in almost any organism.

Better and Faster Sequencing
Goals:
• Minimal prep time and labor
• Efficient and optimal use of sequence reads
• Specific enzyme design
• Introduction of site-directed insertions through unstable plasmids into the C. difficile chromosome
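The reasoning behind “increase mate-pair size to span rRNA repeats” can be made concrete with a back-of-the-envelope calculation. A pair can resolve a repeat only if both reads land in unique flanking sequence, so its insert size must be at least the repeat length plus two read lengths. This sketch assumes normally distributed insert sizes; the 100 bp read length and the two library sizes (3 kb and 8 kb, SD 1 kb) are hypothetical values, and only the ~5 kb rRNA repeat length comes from the slides.

```python
import math


def min_spanning_insert(repeat_len, read_len):
    """Smallest insert (outer distance) that puts both reads outside the repeat."""
    return repeat_len + 2 * read_len


def fraction_spanning(mean_insert, sd_insert, repeat_len, read_len):
    """Fraction of pairs long enough to span the repeat, assuming normal insert sizes.

    P(insert >= threshold) = 0.5 * erfc((threshold - mean) / (sd * sqrt(2)))
    """
    threshold = min_spanning_insert(repeat_len, read_len)
    z = (threshold - mean_insert) / (sd_insert * math.sqrt(2))
    return 0.5 * math.erfc(z)


if __name__ == "__main__":
    RRNA_REPEAT = 5_000  # ~5 kb rRNA operon (from the slide)
    READ_LEN = 100       # hypothetical read length
    for mean in (3_000, 8_000):  # hypothetical mate-pair library sizes, SD 1 kb
        frac = fraction_spanning(mean, 1_000, RRNA_REPEAT, READ_LEN)
        # prints 0.014 for the 3 kb library and 0.997 for the 8 kb library
        print(f"{mean} bp library: {frac:.3f} of pairs are long enough to span the repeat")
```

A pair that is long enough must also be positioned across the repeat, so this fraction is an upper bound on the share of pairs that actually bridge any particular rRNA locus.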
Safe, Simple Culture Method and Library Prep
• Culture: Petri dish
• DNA extraction: ~$1
• Single-pot library prep (+/- endonuclease): avidin-captured (MP) or flow-through (PE)
• Express our own transposase
• ~$4–8 per sample
• 400x multiplexing yields 100x coverage at $10 per sample

Technology Development
• Hyper-multiplexing
  • Lowering coverage from ~200x to 50x
  • Mixing 2–4 isolates with different genome characteristics in a given sample barcode to lower library reagent costs, e.g.
    • C. difficile (28.5% GC) and Burkholderia (68% GC)
    • C. difficile (28.5% GC), Escherichia (51% GC) and Pseudomonas (66% GC)
• Cheap long reads
  • Oxford Nanopore produces long (up to >10 kb) but error-prone reads
  • Already shown to resolve adjacent repeats in assemblies: Ashton et al., “MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island”, Nature Biotechnology (Dec 2014)

Acknowledgements
Funding was provided by the Laboratory Directed Research and Development program at Sandia National Laboratories. Collaboration with the Clinical Microbiology Laboratory, UC Davis Medical Center.
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
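The hyper-multiplexing figures can be sanity-checked with simple arithmetic, and the idea of pooling isolates with very different GC content in one barcode can be illustrated with a naive per-read GC classifier. This is a minimal sketch under stated assumptions: the ~4.1 Mb C. difficile genome size is approximate, the slides do not say how pooled reads are separated, and per-read GC binning is a hypothetical stand-in (read mapping or k-mer classification would be more robust). `required_run_yield`, `gc_fraction` and `assign_by_gc` are illustrative names.

```python
def required_run_yield(n_samples, coverage, genome_size):
    """Total sequenced bases needed so every multiplexed sample reaches `coverage`."""
    return n_samples * coverage * genome_size


def gc_fraction(read):
    """GC fraction of a read, ignoring ambiguous bases such as N."""
    read = read.upper()
    acgt = sum(read.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("read contains no A/C/G/T bases")
    return (read.count("G") + read.count("C")) / acgt


def assign_by_gc(read, genome_gc):
    """Assign a read to the pooled genome whose GC content is closest (naive heuristic)."""
    gc = gc_fraction(read)
    return min(genome_gc, key=lambda name: abs(genome_gc[name] - gc))


if __name__ == "__main__":
    # 400 samples x 100x coverage x ~4.1 Mb (approximate genome size, an assumption)
    print(required_run_yield(400, 100, 4_100_000))  # 164000000000 bases, i.e. 164 Gb
    pool = {"C. difficile": 0.285, "Burkholderia": 0.68}  # GC fractions from the slide
    print(assign_by_gc("ATTATAGCAT", pool))  # C. difficile
    print(assign_by_gc("GCGGCCGTAC", pool))  # Burkholderia
```

With short reads, per-read GC scatters widely around the genome average, so a rule like this separates C. difficile (28.5%) from Burkholderia (68%) far more cleanly than Escherichia (51%) from Pseudomonas (66%).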