Protein Domains and Remote Homology
Protein Domains and Remote Homology: in the twilight zone of protein sequence analysis.
Master en Bioinformatica, UCM, 2013. Luis Sanchez Pulido, Department of Physiology, Anatomy & Genetics.

Who Am I?
The most important influences on my career are:
2005 PhD. Title: Domain-Oriented Computational Protein Sequence Analysis
Since 2008
A. Valencia
Between 1995 and 2008
M. A. Andrade-Navarro
Visitor at the P. Bork and A. Tramontano labs
Long-Term Fellowship
Chris Ponting, Department of Physiology, Anatomy & Genetics

The Areas of My Expertise: PROTEINS
Initially: structural (homology modeling, mutant interpretation, ...)
● WHAT IF
● Insight II
● MolMol
● Rasmol
● PyMOL
Early on I fell in love with protein sequence analysis: BLAST, HHpred, HMMer, Pfam & SMART

Why do we analyse sequences?
[Figure: proteins with known sequence. Labels: Structure, Function, Both, ?????]
“There is no darkness but ignorance” William Shakespeare

Data Overload!!!
Database growth by year: www.ebi.ac.uk/ena/about/statistics

Protein sequence databases are becoming BIGGER and more complex every day...
● Protein sequence databases are becoming BIGGER every day... (Michael Y. Galperin and Eugene V. Koonin. From complete genome sequence to ‘complete’ understanding? Trends Biotechnol. 28. 2010)
● Protein sequence databases are becoming MORE COMPLEX every day... (Michael Levitt. Nature of the protein universe. PNAS 2009)
● The analysis of the known and predicted context of each protein is becoming more difficult every day... every week a new high-throughput experiment is published (cell localization, interactions, function...)

Thanks to the recognition of homology between proteins, we can TRANSFER INFORMATION, structural and/or functional.
Homologues: two proteins with a common ancestor. ...
Depending on the type of divergence, they can be:
• orthologues: speciation
• paralogues: gene duplication
• xenologues: horizontal transfer

Admiring life's amazing diversity
[Figure: GenBank, dbEST. Copyright Cédric Notredame, 2000, all rights reserved]

Three Generations of Tools in Protein Sequence Analysis
[Diagram: a Sequence (MRTSRGH.....) searched against a Reference Database; Alignment > Profile, drawn as stacked copies of RTNMSDAQQGSWYSDPKREGWFYN]
● First Generation .... 1987: Sequence versus Sequence - BLAST
● Second Generation .... 1997: Profile versus Sequence - PSSMs - PSI-BLAST & HMMer
● Third Generation .... 2005: Profile versus Profile - HHpred

Detection of homologous protein sequences in a Reference Database
Three Generations of Tools in Protein Sequence Analysis
1- Sequence versus Sequence (Blast)
2- Profile versus Sequence - PSSMs (PSI-BLAST & HMMer)
3- Profile versus Profile (HHpred)
Reciprocal!

Why do we analyse sequences?? Because...
Thanks to the recognition of homology between proteins, we can TRANSFER INFORMATION:
• Structural, from HOMOLOGOUS proteins of known structure (X-ray, NMR or EM)
• Functional, from experimentally characterised HOMOLOGOUS proteins or their genomic or proteomic context

Structure is better conserved than sequence!
D'Alfonso G, Tramontano A, Lahm A. Structural conservation in single-domain proteins: implications for homology modeling. J Struct Biol. 134, 246-56. (2001)
A remote homology example: 1NYN and 1P9Q (SBDS family)

Defining Remote Homology
[Figure: comparisons between pairs of sequences with known structure. Axes: Identity (%) against Size. Legend: true positives, true negatives, Rmsd > 3 Å, Rmsd < 3 Å. The twilight zone is marked, with curves from Chothia & Lesk, 1986 and Rost, 1999.]
Rost B. (1999) Twilight zone of protein sequence alignments. Protein Eng. 12:85-94.

INFORMATION TRANSFER: FUNCTION?
These are homologous proteins... Their roles in the cell are very different, but all of them bind GTP.

Key Points in Protein Function Prediction:
* Few functional annotations are derived from experiments; most are automated.
* Remote homology, structural information, chromosomal location, phylogenetic information, expression and molecular interaction data are all being used for function prediction.
* Different methods are better at predicting certain functional aspects.
* Approaches that combine different methods are currently emerging (my favourite: STRING).

The transfer of structural and/or functional information between homologous proteins is a complex task.
How is it done? Divide each task into as many parts as necessary to solve the problem.

Domain Definition
From a structural point of view, domains are structurally compact units, locally independent in function and folding, and usually characterized by a well-defined hydrophobic core.
From a sequence analysis point of view, domains are evolutionarily conserved regions that are present in different protein families of diverse architecture (“Hypothetical Domain”).

REPEATS: at the limits of domain definition
Individual repeats are not structurally independent.
Examples: LRR, HEAT, TPR, PFTA, betal, WD40
Very low structural constraints allow high rates of sequence divergence between repeats, making their detection by sequence similarity VERY VERY difficult.
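The "Alignment > Profile" step and the profile-versus-sequence searches (PSSMs) from the three-generations slide can be illustrated with a small sketch. It is only an illustration of the idea and not how PSI-BLAST or HMMer actually build or score their profiles: the toy alignment, the uniform background frequencies, the pseudocount of 1 and the function names `build_pssm` and `best_window` are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Turn an ungapped-column alignment into a list of per-column
    log-odds scores (one dict per column, residue -> score in bits)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Count residues in this column; gap characters are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the profile along the sequence without gaps and
    return (best_score, start_position)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        best = max(best, (score, start))
    return best


# Hypothetical toy alignment, loosely based on the start of the slide's sequence.
toy_alignment = ["RTNMSD", "RTNLSD", "KTNMSE", "RSNMSD"]
pssm = build_pssm(toy_alignment)
score, start = best_window(pssm, "GGAAKTNMSDPP")
print(f"best window starts at {start} with score {score:.2f} bits")
```

Because the profile stores position-specific preferences, a sequence that matches the family only at the conserved columns can still score well, which is what makes profile searches more sensitive than plain sequence-versus-sequence comparison.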
Protein irregularities that hinder sequence analysis (RB Russell & CP Ponting, 1998)
● Low complexity regions
● Repeats, transmembrane and coiled-coil regions (high mutation rates)
● Fold irregularities, such as circular permutations and insertions
[Figure: N-term / C-term diagrams]

Protein Evolution: the role of domains
Shuffling + Accretion
[Figure: domain architectures. Families labelled: SPP1/SET1C, CGBP, SPEN. Proteins shown: CGBP_HUMAN, Q03012_Yeast, Q9W352_Drome, DATF1_HUMAN (DIDO-1), PHF3_HUMAN, Q9VG78_Fly, YKA5_YEAST, RBMF_HUMAN, Q8IL17_Plasmodium, Q22855_Athaliana. Domains shown: CxxC, PHD, dPHD, TFS2M, SPOC, s_zf, BRK, RRM.]
This increases the functional versatility of proteins.
Levitt M. Nature of the protein universe. Proc Natl Acad Sci U S A. 2009 Jul 7

Goals of Protein Sequence Analysis
• Identify domains at the sequence level, evaluating their conservation and distribution among different protein families.
• Rationalise and interpret sequence similarity in terms of shared function, such as interactions with other molecules or proteins, shared reaction and/or regulation mechanisms, etc.
• Ultimately, propose new hypotheses of common function among diverse families of homologous proteins, for subsequent experimental verification.

METHODS ON DOMAIN ORIENTED SEQUENCE ANALYSIS
Common name, ID, ACC or GI: Reference Database (SRS - EBI, Entrez - NCBI)
Search by comparing:
• Sequence against sequences: BLAST
• Sequences against profiles: Pfam
• Profile against sequences: PSI-BLAST or HMMer
• Profile against profiles: HHpred
[Diagram: Sequence (MRTSRGH.....); Alignment > Profile, drawn as stacked copies of RTNMSDAQQGSWYSDPKREGWFYN]

Three Generations of Tools in Protein Sequence Analysis
[Diagram: Reference Database; Sequence MRTSRGH.....]
● First Generation .... 1987: Sequence versus Sequence - BLAST
[Diagram: Alignment > Profile, drawn as stacked copies of RTNMSDAQQGSWYSDPKREGWFYN]
● Second Generation .... 1997: Profile versus Sequence - PSSMs - PSI-BLAST & HMMer
● Third Generation .... 2005: Profile versus Profile - HHpred

Detection of homologous protein sequences in a Reference Database
Three Generations of Tools in Protein Sequence Analysis
1- Sequence versus Sequence (Blast)
2- Profile versus Sequence - PSSMs (PSI-BLAST & HMMer)
3- Profile versus Profile (HHpred)
Reciprocal!

http://en.wikipedia.org/wiki/BLAST
It is common to use a high value for G, the gap opening cost (around 10-15), and a smaller value for L, the gap extension cost (around 1-2). Why?

Two main characteristics:
• Combining multiple alignment methods
• AND mixing heterogeneous information

Admiring life's amazing diversity
[Figure: GenBank, dbEST. Copyright Cédric Notredame, 2000, all rights reserved]

Domain Oriented Sequence Analysis Flow-Chart
[Flow chart: Sequence; Sequence DataBases (dbEST, GenBank); HMMer; Domain Databases; HHpred; ALIGNMENT; HMMer; DOMAIN; Hypothetical Domain; Biochemical Knowledge; Functional Hypothesis]

“As you will never be sure which are the right problems to work on, most of the time that you spend in the laboratory or at your desk will be wasted. If you want to be creative, then you will have to get used to spending most of your time not being creative, to being becalmed on the ocean of scientific knowledge.” Steven Weinberg

How Do the Pieces of the Functional Assignment Puzzle Fit Together?
David B. Searls (2003) Pharmacophylogenomics: genes, evolution and drug targets. Nature Reviews Drug Discovery. 2, 613-23

Epistemology is the branch of philosophy concerned with the nature and scope of knowledge. It questions what knowledge is, how it is acquired, and the possible extent to which a given subject can be known.

REAL-LIFE EXAMPLES