RNA, the dark matter of biology

Interdisciplinary Colloquium on Biology, 27 January 2016, Aula 20, Tor Vergata
RNA, the dark matter of biology:
sequence, structure and function
Manuela Helmer-Citterich
Centre for Molecular Bioinformatics
Department of Biology
outline
•  why RNA
•  development of tools for the analysis of RNA molecules
•  a new alphabet integrating sequence and structure information
•  a substitution matrix for RNA secondary structure elements
•  accurate and faster search of RNA motifs in sequence and structure
introduction
WHY RNA
DNA: stable, double-stranded, stores genetic information
proteins: built with the information stored in the DNA, different structures and functions
RNA
can carry genetic information
can build and break molecules
and much more...
RNA (non-coding)
structure is important for function
primary sequence diverges faster than structure
homologous RNAs are poorly identified when sequence identity is below 50-60%: structure information is essential!
KNOWN STRUCTURES
AUUCGAUUAGGCCUAA......
free energy according to experimental measurements
development of software tools for the analysis of RNA molecules
tree-based encoding
a new alphabet for RNA secondary structure
Stem: {S1, S2, S3, ..., Sn}
Loop: {L3, L4, L5, ..., Ln}
Internal loop: {I2, I3, ..., In}
Left bulge: {B1}, right bulge: {B1}
[Figure: example structure whose elements are labelled with alphabet characters: a, b, c, l, m, n, [, ]]
UCCAUCCUGGGCAACAGAGCUGGA
((((.((((.....))).).))))
5 residue loop: L5 (o)
         ooooo
3 residue stem: S3 (c)
      cccoooooccc
right bulge: B1 (])
      cccoooooccc]
1 residue stem: S1 (a)
right bulge: B1 (])
left bulge: B1 ([)
4 residue stem: S4 (d)
dddd[acccoooooccc]a]dddd
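The encoding of this example can be reproduced with a short sketch. It is not the authors' implementation: the slides only show S1 = a, S3 = c, S4 = d, L5 = o and the two bulge characters, so the general letter rules below (stem of length n mapped to the n-th letter from 'a', hairpin loop of length n mapped to the n-th letter after 'j') are assumptions chosen to agree with those visible cases. Internal loops, multiloops and unpaired ends are not modelled and are emitted as '?'. The names `parse_pairs` and `encode` are hypothetical.

```python
def parse_pairs(db):
    """Return partner[i] for each position of a balanced dot-bracket string (-1 if unpaired)."""
    partner = [-1] * len(db)
    stack = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def encode(db):
    """Encode a dot-bracket string with one structure character per residue.

    Assumed mapping (only partly confirmed by the slides):
      stem of n stacked pairs -> n-th letter from 'a'   (S1=a, S3=c, S4=d)
      hairpin loop of n bases -> n-th letter after 'j'  (L5=o)
      unpaired run on the 5' strand -> '[', on the 3' strand -> ']'
    """
    n = len(db)
    partner = parse_pairs(db)
    out = [""] * n

    # Label stems: maximal runs of directly stacked base pairs.
    for i in range(n):
        j = partner[i]
        if j > i and not out[i]:
            length = 1
            while i + length < j - length and partner[i + length] == j - length:
                length += 1
            letter = chr(ord("a") + length - 1)
            for k in range(length):
                out[i + k] = out[j - k] = letter

    # Label unpaired runs according to the brackets that flank them.
    i = 0
    while i < n:
        if db[i] != ".":
            i += 1
            continue
        k = i
        while k < n and db[k] == ".":
            k += 1
        left = db[i - 1] if i > 0 else ""
        right = db[k] if k < n else ""
        if left == "(" and right == ")":
            symbol = chr(ord("j") + (k - i))  # hairpin loop
        elif left == "(" and right == "(":
            symbol = "["                      # left bulge
        elif left == ")" and right == ")":
            symbol = "]"                      # right bulge
        else:
            symbol = "?"                      # not covered by this sketch
        for p in range(i, k):
            out[p] = symbol
        i = k
    return "".join(out)


print(encode("((((.((((.....))).).))))"))
```

Because every residue receives exactly one character, the encoded string has the same length as the sequence, so ordinary string alignment methods can be applied to it directly.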
COMPUTE SUBSTITUTION MATRIX
retrieve a multiple alignment of homologous RNAs (RFAM)
extract from RFAM the consensus secondary structure for the family
convert the MSA into a multiple alignment of secondary structures in the new encoding
compute the mutation rate for every position in the multiple alignment
Log odds score
substitution matrix for RNA secondary structure elements
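The slides name a log odds score but do not give the formula. A minimal sketch of one common way to obtain such scores, in the style of BLOSUM matrices, is shown here as an assumption rather than as the method actually used: count how often two structure characters share an alignment column, and compare that with the frequency expected if the characters were independent. The function name `log_odds_matrix` and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(alignment, gap="-"):
    """BLOSUM-style log2 odds scores from the columns of an encoded alignment.

    alignment: list of equal-length strings (one encoded structure per row).
    Returns a dict mapping sorted symbol pairs (a, b) to their score.
    """
    pair_counts = Counter()
    for column in zip(*alignment):
        symbols = [s for s in column if s != gap]
        for a, b in combinations(symbols, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total_pairs = sum(pair_counts.values())

    # Background frequency of each symbol, derived from the pair counts.
    symbol_counts = Counter()
    for (a, b), count in pair_counts.items():
        symbol_counts[a] += count
        symbol_counts[b] += count
    total_symbols = 2 * total_pairs

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        pa = symbol_counts[a] / total_symbols
        pb = symbol_counts[b] / total_symbols
        expected = pa * pb if a == b else 2 * pa * pb
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy alignment of three encoded structures (illustrative data only).
toy = ["ab", "ab", "ab"]
print(log_odds_matrix(toy))
```

A positive score means the two characters are found in the same column more often than chance would predict, a negative score means less often. Pairs never observed get no entry in this sketch; a real matrix would need pseudocounts or a default penalty for them.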
SPS (Sum of Pairs Score), the fraction of base pairs aligned as in the reference alignment, is used as a measure of the accuracy of the alignments. It ranges from 0 (all the base pairs are wrong) to 1 (all the base pairs are correct).
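As an illustration of the SPS definition, the sketch below scores a pairwise test alignment against a reference. It assumes that "pairs" means pairs of residues placed in the same alignment column, which is the usual reading of the sum-of-pairs score; the helper names are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Set of (index in A, index in B) for residues sharing an alignment column."""
    pairs = set()
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def sps(test, reference):
    """Fraction of the reference's aligned pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


reference = ("AC-GU", "ACAGU")
test = ("ACG-U", "ACAGU")
print(sps(test, reference))  # 3 of the 4 reference pairs are recovered
```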
datasets
identification of best parameters
comparison with other algorithms
Programs | Sequence Information | Structure Information | Approach | Complexity
needle | Yes | No | NW alignment | O(n^2)
Beagle | Yes | Yes | NW + MBR | O(n^2)
gardenia | Yes | Yes | Tree-based | O(n^4)
RNAStrAT | Yes | Yes | Tree-based | O(n^4)
LocARNA | Yes | Yes | Simultaneously folding and aligning | O(n^2(n^2+m^2))
RNAdistance | Yes | Yes | Tree-based | O(n^3)
RNAforester | Yes | Yes | Forest-based | O(|F1|*|F2|*deg(F1)*deg(F2)*(deg(F1)+deg(F2)))
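The O(n^2) entries in the table correspond to Needleman-Wunsch (NW) global alignment. The sketch below runs a textbook NW on two encoded structure strings to show where the quadratic cost comes from. It is not Beagle's code: the linear gap penalty and the toy scoring function (+2 match, -1 mismatch) are placeholders for the substitution matrix, and Beagle's real scoring scheme is not described in the slides.

```python
def needleman_wunsch(a, b, score, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]; filling it costs O(n*m).
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two symbols
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def toy_score(x, y):
    """Placeholder for a substitution-matrix lookup (hypothetical values)."""
    return 2 if x == y else -1


best, row_a, row_b = needleman_wunsch(
    "dddd[acccoooooccc]a]dddd",   # encoded structure from the worked example
    "ddddacccoooooccc]a]dddd",    # same structure without the left bulge
    toy_score,
)
print(best)
print(row_a)
print(row_b)
```

Replacing `toy_score` with a lookup into a structure substitution matrix keeps the same O(n^2) table, which is why an alphabet-based method can be cheaper than the tree-based approaches listed above.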
[Figures: quality of the alignment (SPS) for Beagle compared with the other methods; one plot is annotated with the value 65%]
conclusions – part 1
we defined a new alphabet for describing RNA secondary structure along with its sequence
this alphabet allowed us to define a substitution matrix describing allowed variations in the secondary structure of homologous RNAs
the alphabet and the matrix allow the alignment of RNAs with accuracy comparable to that of other state-of-the-art methods but with lower computational complexity (time)
the alphabet and the matrix are powerful resources for RNA secondary structure analysis, comparison and classification, motif finding, and phylogeny
...... BUT
use of suboptimal predictions
Move from :
>RNA1
GCUUUUGGGAUGCAUUUUGUGCGGUUAUGUCUGCCUCC
.((((.(((((...............)))).)))))..
>RNA2
ACUUUUGGGAUGCAUUUUGUGCAGAUGUCUAACGAUUGA
.((((.(((.......(((....)))....))).)))).
To :
>RNA1
GCUUUUGGGAUGCAUUUUGUGCGGUUAUGUCUGCCUCC
.((((.(((((...............)))).)))))..
.(((...((......................)))))..
.
.
.
.(((.(.....((((((...............)....)
>RNA2
ACUUUUGGGAUGCAUUUUGUGCAGAUGUCUAACGAUUG
.(((.(((((((...............)....))))))
.(((((((....................))).))))..
.
.
.
.((((.(((.......(((....)))....))).)))).
RNAsubopt: 150 structures for each RNA.
150 x 150 = 22500 total alignments when performing one pairwise alignment.
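The count of 22500 follows from pairing every suboptimal structure of one RNA with every suboptimal structure of the other. A minimal sketch of that exhaustive search is given below; the scoring function is a stand-in (the fraction of identical columns is not a real alignment score), and the names `best_pair` and `toy_score` are hypothetical.

```python
from itertools import product


def best_pair(structs_a, structs_b, score):
    """Score every combination of structures and return (score, struct_a, struct_b)."""
    return max(
        ((score(sa, sb), sa, sb) for sa, sb in product(structs_a, structs_b)),
        key=lambda item: item[0],
    )


def toy_score(sa, sb):
    """Placeholder score: number of columns where the two strings agree."""
    return sum(x == y for x, y in zip(sa, sb))


# 150 structures per RNA give 150 * 150 combinations to align.
print(len(list(product(range(150), range(150)))))

print(best_pair(["((..))", "(....)"], ["(....)", "......"], toy_score))
```

In practice `score` would be the score of the alignment of the two encoded structures, so each of the 22500 combinations costs one full alignment.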
Are there good alignments among those 22500?
Yes, there are... but how do we select them without a reference alignment?
We used the score produced by Beagle
conclusions 1.5
we do not yet know how to take advantage of suboptimal predictions
experimental data on the RNAs in our datasets are not available, so we cannot prove that we perform better than the other methods
Gabriele Ausiello
Fabrizio Ferrè
Alessio Colantoni
Antonio Palmeri
Centre for Molecular Bioinformatics
University of Rome – Tor Vergata
Claudio Patavino
Eugenio Mattei
Salvo Cirillo
Marco Pietrosanto