RNA, the dark matter of biology
Colloquio Interdisciplinare sulla Biologia (Interdisciplinary Colloquium on Biology)
27 January 2016 – Aula 20 – Tor Vergata

RNA, the dark matter of biology: sequence, structure and function
Manuela Helmer-Citterich
Centre for Molecular Bioinformatics, Department of Biology

outline
• why RNA
• development of tools for the analysis of RNA molecules
• a new alphabet integrating sequence and structure information
• a substitution matrix for RNA secondary structure elements
• accurate and faster search of RNA motifs in sequence and structure

introduction

WHY RNA
• DNA: stable, double-stranded, stores genetic information
• proteins: built with the information stored in the DNA, with different structures and functions

RNA
• can carry genetic information
• can build and break molecules
• and much more...

RNA (nc)
• structure is important for function
• primary sequence diverges faster than structure
• homologous RNAs are poorly identified when sequence identity is below 50-60%
• structure information is essential!

KNOWN STRUCTURES
[figures]
AUUCGAUUAGGCCUAA......
free energy according to experimental measurements

development of software tools for the analysis of RNA molecules

tree-based encoding
[figure]

a new alphabet for RNA secondary structure
• Stem: {S1, S2, S3, ..., Sn}
• Loop: {L3, L4, L5, ..., Ln}
• Internal loop: {I2, I3, ..., In}
• Left bulge: {B1}, Right bulge: {B1}
Figure labels (example symbols): a b c l m n / m n [ ]

a new alphabet for RNA secondary structure
UCCAUCCUGGGCAACAGAGCUGGA
((((.((((.....))).).))))
         ooooo            5-residue loop: L5 (o)
      cccoooooccc         3-residue stem: S3 (c)
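A minimal sketch, in Python, of how a dot-bracket string could be converted into a structure alphabet of this kind. It is not the speaker's implementation. The letter mappings in `stem_symbol` and `loop_symbol` are assumptions, chosen only so that S3, S4 and L5 come out as the symbols used in the slides (c, d, o). Internal loops (I2, I3, ...) and multi-branch loops are not modelled: unpaired residues between two stems are simply written `[` on the 5' side and `]` on the 3' side, and anything else unclassified is written `:`. The input is assumed to be a balanced dot-bracket string.

```python
def pair_table(db):
    """Map each paired position to its partner (-1 if unpaired)."""
    stack, table = [], [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def stem_symbol(length):
    # Hypothetical mapping: S1 -> 'a', S2 -> 'b', ..., capped at 'j'.
    return chr(ord("a") + min(length, 10) - 1)


def loop_symbol(length):
    # Hypothetical mapping anchored on the slides' L5 -> 'o', capped at 'z'.
    return chr(min(ord("j") + length, ord("z")))


def encode(db):
    """Encode a dot-bracket string with stem, loop and bulge symbols."""
    n = len(db)
    table = pair_table(db)
    out = [":"] * n  # ':' marks residues this sketch does not classify

    # Stems: maximal runs of stacked pairs (i, j), (i+1, j-1), ...
    for i in range(n):
        j = table[i]
        if j <= i:
            continue  # unpaired, or the closing side of a pair
        if i > 0 and j + 1 < n and table[i - 1] == j + 1:
            continue  # stacks on the previous pair, already handled
        length = 1
        while i + length < j - length and table[i + length] == j - length:
            length += 1
        for k in range(length):
            out[i + k] = out[j - k] = stem_symbol(length)

    # Unpaired runs, classified by the brackets that flank them.
    i = 0
    while i < n:
        if db[i] != ".":
            i += 1
            continue
        start = i
        while i < n and db[i] == ".":
            i += 1
        left = db[start - 1] if start > 0 else ""
        right = db[i] if i < n else ""
        if left == "(" and right == ")":
            sym = loop_symbol(i - start)  # hairpin loop
        elif left == "(" and right == "(":
            sym = "["                     # unpaired on the 5' side
        elif left == ")" and right == ")":
            sym = "]"                     # unpaired on the 3' side
        else:
            sym = ":"                     # external or between helices
        out[start:i] = sym * (i - start)
    return "".join(out)


print(encode("((((.((((.....))).).))))"))  # dddd[acccoooooccc]a]dddd
```

For the example hairpin used in the slides, the sketch yields the same string as the fully annotated slide of the build-up.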
a new alphabet for RNA secondary structure
UCCAUCCUGGGCAACAGAGCUGGA
((((.((((.....))).).))))
      cccoooooccc]        right bulge: B1 (])
dddd[acccoooooccc]a]dddd  1-residue stem: S1 (a), right bulge: B1 (]), left bulge: B1 ([), 4-residue stem: S4 (d)

COMPUTE SUBSTITUTION MATRIX
• retrieve a multiple alignment of homologous RNAs (Rfam)
• extract from Rfam the consensus secondary structure for the family
• convert the MSA into a multiple alignment of secondary structures in the new encoding
• compute the mutation rate for every position in the multiple alignment

Log-odds score
substitution matrix for RNA secondary structure elements
[figure]

SPS (Sum-of-Pairs Score), i.e. the fraction of base pairs aligned as in the reference alignment, is used as a measure of the accuracy of the alignments.
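A minimal sketch of how such a score could be computed for a pairwise alignment. It reads "pairs" as pairs of residues placed in the same alignment column, which is an assumption about the slide's wording. The function names and the `-` gap character are illustrative choices, not part of any tool named in the talk.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue indices that share a column."""
    pairs = set()
    i = j = 0  # ungapped positions in sequence A and sequence B
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def sps(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Both arguments are (row_a, row_b) tuples of gapped strings.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


reference = ("GCAU", "GCAU")
print(sps(("GCA-U", "GC-AU"), reference))  # 0.75
```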
It ranges from 0 (all the base pairs are wrong) to 1 (all the base pairs are correct).

datasets
[figure]

identification of best parameters
[figure]

comparison with other algorithms

Program      Sequence information  Structure information  Approach                             Complexity
needle       Yes                   No                     NW alignment                         O(n^2)
Beagle       Yes                   Yes                    NW + MBR                             O(n^2)
gardenia     Yes                   Yes                    Tree-based                           O(n^4)
RNAStrAT     Yes                   Yes                    Tree-based                           O(n^4)
LocARNA      Yes                   Yes                    Simultaneously folding and aligning  O(n^2(n^2+m^2))
RNAdistance  Yes                   Yes                    Tree-based                           O(n^3)
RNAforester  Yes                   Yes                    Forest-based                         O(|F1|*|F2|*deg(F1)*deg(F2)*(deg(F1)+deg(F2)))

comparison with other algorithms
[figure; label: Beagle]

quality of the alignment
[figures; labels: comparison with other algorithms, methods, Beagle]

SPS
[figures; labels: comparison with other algorithms, Beagle, 65%]

conclusions – part 1
• we defined a new alphabet for describing RNA secondary structure along with its sequence
• this alphabet allowed us to define a substitution matrix describing the allowed variations in the secondary structure of homologous RNAs
• the alphabet and the matrix allow RNAs to be aligned with an accuracy comparable to that of other state-of-the-art methods, but with lower computational complexity (time)
• the alphabet and the matrix are powerful resources for RNA secondary
structure analysis, comparison and classification, motif finding, and phylogeny

...... BUT

use of suboptimal predictions

Move from:

>RNA1
GCUUUUGGGAUGCAUUUUGUGCGGUUAUGUCUGCCUCC
.((((.(((((...............)))).)))))..
>RNA2
ACUUUUGGGAUGCAUUUUGUGCAGAUGUCUAACGAUUGA
.((((.(((.......(((....)))....))).)))).

To:

>RNA1
GCUUUUGGGAUGCAUUUUGUGCGGUUAUGUCUGCCUCC
.((((.(((((...............)))).)))))..
.(((...((......................)))))..
. . .
.(((.(.....((((((...............)....)
>RNA2
ACUUUUGGGAUGCAUUUUGUGCAGAUGUCUAACGAUUG
.(((.(((((((...............)....))))))
.(((((((....................))).))))..
. . .
.((((.(((.......(((....)))....))).)))).

RNAsubopt: 150 structures for each RNA, i.e. 150 × 150 = 22500 total alignments when performing a pairwise alignment.

Are there good alignments among those 22500? Yes, there are... but how do we select them without a reference alignment?

We used the score produced by Beagle.
[figure]

conclusions – part 1.5
• we do not yet know how to take advantage of suboptimal predictions
• experimental data on the RNAs in our datasets are not available, so we cannot prove that we perform better than the other methods

Centre for Molecular Bioinformatics, University of Rome – Tor Vergata
Gabriele Ausiello, Fabrizio Ferrè, Alessio Colantoni, Antonio Palmeri, Claudio Patavino, Eugenio Mattei, Salvo Cirillo, Marco Pietrosanto
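As an illustration of the exhaustive search over suboptimal structures described in the "use of suboptimal predictions" slides, the sketch below scores every pair of candidate structures and keeps the best-scoring one. A toy Needleman-Wunsch score on raw dot-bracket strings stands in for the Beagle score, which is not reproduced here. The scoring parameters and the function names `nw_score` and `best_pair` are assumptions made for this example.

```python
from itertools import product


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of two strings."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_pair(subopt_a, subopt_b, score=nw_score):
    """Score all pairs of candidate structures; return (score, s, t)."""
    return max(
        ((score(s, t), s, t) for s, t in product(subopt_a, subopt_b)),
        key=lambda item: item[0],
    )


# Toy candidate lists standing in for RNAsubopt output.
subopt_a = ["((..))", "(....)", "......"]
subopt_b = ["(....)", ".(..)."]
print(best_pair(subopt_a, subopt_b))  # (6, '(....)', '(....)')
```

With 150 candidates per RNA, the same loop would evaluate 150 × 150 = 22500 pairs, the figure quoted in the slides. Picking the maximum score only selects the pair the scoring function prefers. Whether that pair is also the most accurate alignment is the open question the talk raises.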