Sica, Mauricio P.
Proceedings of the VCAB2C by A2B2C is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at http://www.a2b2c.org/Proceedings_A2B2C_2014_Tablet.pdf. Cataloguing date: 10/09/2014. Cover design: Sica, MP. Layout: Sica, MP.

VCAB2C Sponsors

5ta Conferencia Argentina de Bioinformática y Biología Computacional (VCAB2C)
5th Argentinian Conference on Bioinformatics and Computational Biology

Program Committee
Dr. Morten Nielsen (President), Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Denmark; Biotechnological Research Institute, National University of San Martín - San Martín, Buenos Aires, Argentina.
Dr. Gustavo Parisi, Structural Bioinformatics Group (SBG), Department of Science and Technology, National University of Quilmes - Bernal, Buenos Aires, Argentina.
Dr. Mauricio Sica, Bioenergy Laboratory (IEDS-CONICET), Atomic Center Bariloche (CAB) - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Ignacio Sánchez, Protein Physiology Laboratory, Exact and Natural Sciences Faculty, University of Buenos Aires - Buenos Aires, Argentina.
Dr. Patricio Yankilevich, Institute for Research in Biomedicine of Buenos Aires (IBioBA) CONICET, Institute of the Max Planck Society - Buenos Aires, Argentina.
Dr. Juan Morales, Ecotone Laboratory (INIBIOMA-CONICET), National University of Comahue - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Sebastián Bouzat, Atomic Center Bariloche (CAB), National Atomic Energy Commission (CNEA) - San Carlos de Bariloche, Río Negro, Argentina.

Steering Committee
Dr. Mauricio Sica, Bioenergy Laboratory, IEDS, CONICET, Atomic Center Bariloche (CAB) - San Carlos de Bariloche, Río Negro, Argentina.
Dra. Belén Prados, Environmental Sciences Laboratory, National University of Río Negro - Río Negro, Argentina.
Dra. Carolina Bagnato, National University of Río Negro - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Sebastián Bouzat, Atomic Center Bariloche (CAB), National Atomic Energy Commission (CNEA) - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Gabriel Paissan, Department of Computational Mechanics, Atomic Center Bariloche (CAB) - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Ignacio Ponzoni, Laboratory for Research and Development in Scientific Computing (LIDeCC), Department of Computer Science and Engineering, National University of the South - Bahía Blanca, Argentina.
Dra. Cristina Marino Buslje, Structural Bioinformatics Laboratory, Institute of Biochemical Research in Buenos Aires (IIBBA), Leloir Institute Foundation - Buenos Aires, Argentina.
Dr. Gustavo Parisi, Structural Bioinformatics Group (SBG), Department of Science and Technology, National University of Quilmes - Bernal, Buenos Aires, Argentina.

A2B2C Executive Commission
President: Dr. Ignacio Ponzoni
Vice-president: Dr. Gustavo Parisi
Secretary: Dra. Elizabeth Tapia
Treasurer: Dr. Fernán Agüero
Board members: Dra. Cristina Marino Buslje, Dr. Arjen ten Have
Substitute board members: Dr. Marcel Brun, Dra. Silvina Fornasari
Audit: Dr. Ariel Chernomorez

Preliminary Remarks
Our young society arose recently from its founders' need to give impetus to a field neglected in our country, creating a space for cooperation, exchange and identification among its participants. Faced with the task of organizing the 2014 Conference, we set out to continue the process of consolidating the identity of this new association. The small Patagonian city of San Carlos de Bariloche, nestled in a National Park privileged by its natural resources, is considered a scientific and technological hub of excellence. Three national universities, CONICET and CNEA institutes, and technology-based companies such as INVAP and Satellogic are headquartered here.
Bariloche exports nuclear and telecommunications technology worldwide, employing a significant part of its one hundred fifty thousand inhabitants. However, as in other regions of our country and especially in Patagonia, the history of Bariloche is marked by geographical and political isolation. This meeting is therefore one more step toward broadening and integrating the national scientific community and strengthening its international ties.

In recent years, scientific activity in our country has been going through a period of revitalization. Science policies are stabilizing with the consensus of the research community, the general population is revaluing the role of a national science, and the regional geopolitical bloc is opening prospects for autonomous development. In this context, scientific societies are the natural space for researchers to jointly channel concrete actions to consolidate these changes and strengthen national scientific development.

It is our wish that this conference enrich the scientific development of the participants, promote cordial encounters among colleagues and friends, and facilitate the exchange of knowledge and experiences among groups. We hope to awaken the enthusiasm of young researchers, fostering their curiosity and aptitude for exchange with their colleagues. We also hope that these days in Bariloche will be a fruitful, well-rounded experience that helps us reflect on our history and consider the future challenges of this young scientific community.

Organizing Committee VCAB2C 2014

Forewords
Our young association was born recently out of its founders' need to promote a discipline neglected in our country, creating an environment for cooperation, mutual exchange and identification among its members. One of our purposes in organizing the 2014 Conference is to continue consolidating the identity of our new association.
The small Patagonian city of San Carlos de Bariloche, located in a National Park renowned for its natural resources, is considered a center of excellence in science and technology. Three national universities, institutes of CONICET and CNEA, and technology companies such as INVAP and Satellogic are based here. Bariloche exports nuclear and telecommunications technology, employing a significant part of its 150,000 inhabitants. However, as in other regions of our country and particularly in Patagonia, geographical and political isolation has left its mark on the history of Bariloche. Thus, this Conference is a further step in expanding and integrating the national scientific community and strengthening its international links.

In recent years, scientific activity in our country has experienced a period of revitalization. Science policies are gaining stability through the consensus of the research community, the general population recognizes the value of science for our nation, and regional geopolitical alliances open opportunities for autonomous development. In this scenario, scientific societies are a natural means for researchers to channel joint actions to consolidate these changes and reinforce national scientific development.

It is our wish that this conference enrich the scientific development of the participants, promote meetings between colleagues and friends, and facilitate the exchange of knowledge and experiences. We also hope to arouse the enthusiasm of young researchers, enlivening their curiosity and aptitude for sharing and exchanging with their colleagues. Finally, we wish that these days in Bariloche be a fruitful and well-rounded experience for thinking about the history and future challenges of our young scientific community.
Steering Committee VCAB2C 2014

Contents
Program Committee
Steering Committee
Executive Commission
Forewords
Program
Main Lectures
  Tom L Blundell: Proteomes, Structural Biology and Drug Discovery: Visualization, Analysis and Molecular Modeling. Monday 22, 10:30AM.
  Nuria E. Campillo: This thing called Cheminformatics. Monday 22, 4:30PM.
  Manfred Sippl: Buena Vista – a grand view on protein folds and folding. Tuesday 23, 11:00AM.
  Morten Nielsen: Algorithms in bioinformatics: Simple solutions to complex problems. Tuesday 23, 5:30PM.
  Cristina Marino Buslje: Activating Mutations Cluster in the 'Molecular Brake' Regions of Protein Kinases. Implications for Driver Mutation Prediction. Wednesday 24, 12:00PM.
  Francisco Melo Ledermann: Towards a better understanding of the key molecular determinants that mediate protein-DNA recognition. Wednesday 24, 2:00PM.
Lectures
  Ignacio Sanchez: Aminoacid metabolism conflicts with protein diversity. Monday 22, 12:00PM.
  Gustavo E. Vazquez: The problem of Feature Selection in Cheminformatics: How can visual analytics help us? Monday 22, 5:30PM.
  Sebastian Fernández-Alberti: Collective vibrations and key residues associated to conformational selection upon ligand binding. Tuesday 23, 12:00PM.
  Mariano C. González Lebrero: Expanding the boundaries of the quantum-classical simulations using GPUs and electronic dynamics. Tuesday 23, 2:00PM.
  Paolo Marcatili: High-throughput identification of antigens by metatranscriptomics and peptide chip technology. Tuesday 23, 5:30PM.
Oral Sessions
  Session 1: Structure prediction & protein function – Proteomics (Monday 22, 2:00PM)
  Session 2: Sequence analysis – Cheminformatics (Tuesday 23, 9:00AM)
  Session 3: Systems Biology & Networks (Tuesday 23, 3:00PM)
  Session 4: Genomics, functional genomics – Proteomics & functional proteomics (Wednesday 24, 9:30AM)
Poster Session
  Sequence analysis
  Systems Biology and Networks
  Genome Annotation and Organization
  Evolution, phylogenetics and comparative genomics
  Genomics, functional genomics and metagenomics
  Metabolomics and Cheminformatics
  Proteomics and functional proteomics
  Structure prediction and protein function
Index

Conference Program

Monday 22
9:00   Registration
10:00  Opening Ceremony
10:30  Opening Lecture: Sir Tom Blundell
11:30  Coffee break
12:00  Lecture: Ignacio Sánchez
12:45  Lunch time
14:00  Oral session 1: Structure Prediction & Proteomics
16:00  Coffee break
16:30  Main Lecture: Nuria Campillo
17:30  Lecture: Gustavo Vázquez
18:30  Poster Session

Tuesday 23
9:00   Oral session 2: Sequence Analysis & Cheminformatics
10:30  Coffee break
11:00  Main Lecture: Manfred Sippl
12:00  Lecture: Sebastián Fernandez-Alberti
12:45  Lunch time
14:00  Lecture: Mariano González Lebrero
15:00  Oral session 3: Systems Biology
16:00  Coffee break
16:30  Main Lecture: Morten Nielsen
17:30  Lecture: Paolo Marcatili
18:30  Poster Session

Wednesday 24
9:30   Oral session 4: Genomics and Proteomics
11:30  Coffee break
12:00  Main Lecture: Cristina Marino-Buslje
12:45  Lunch time
14:00  Closing Lecture: Francisco Melo
15:00  Poster Prizes
15:30  Closing Ceremony and coffee break

Main Lectures

Proteomes, Structural Biology and Drug Discovery: Visualization, Analysis and Molecular Modeling
Tom L Blundell
Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA
My talk will focus on the importance of understanding the structures of proteins and the analysis of multiprotein
assemblies in order to understand their central roles in cell regulation. I will describe the development of software that allows modelling and visualisation of the proteome of humans and their pathogens. I will discuss the increasing interest in targeting protein-protein interfaces of multiprotein assemblies in the design of chemical tools and therapeutic agents. Evidence is accumulating that such an approach will offer greater opportunities for improving specificity and selectivity than targeting the active sites of proteases, protein kinases and other enzymes involved in post-translational modification. At the same time, however, protein-protein interfaces pose new challenges, particularly because they tend to be less ligandable than active sites.

This thing called Cheminformatics
Nuria E. Campillo
Centro de Investigaciones Biológicas (CIB-CSIC) - Ramiro de Maeztu, 28040-Madrid-Spain
Cheminformatics is the application of computer and information techniques to a range of problems in chemistry. In this talk we will look specifically at the application of cheminformatics to drug development. A brief introduction to cheminformatics will lead us to its applications in two of our ongoing projects. In the first, we use cheminformatic tools to develop a new strategy based on the design of multi-target drugs to treat Alzheimer's disease (AD). This strategy is based on designing chemical compounds capable of interacting with multiple targets known to be involved in aspects of the development of this disease, such as cholinergic deficit and aggregation of the β-amyloid peptide. The targets considered in this project are CB2R (cannabinoid system) and BuChE (cholinergic system). The second project deals with the development of neural network models for the prediction of blood-brain barrier passage and human intestinal absorption.
These models have been published on the EURL ECVAM DB-ALM website as an in silico protocol for use as an alternative method.

Buena Vista – a grand view on protein folds and folding
Manfred Sippl
C.A.M.E. - Center of Applied Molecular Engineering, University of Salzburg - Department of Molecular Biology - Division of Structural Biology & Bioinformatics
With the massive increase in the number of solved protein structures, we begin to see more clearly how new protein folds arise from old templates. Proteins evolve as molecular complexes rather than as single-chain entities. Exploring the manifold phylogenetic and functional relations among molecular complexes requires specific tools for fast retrieval and visualization of structural matches. The problems involved challenge some fundamental issues in bioinformatics. We discuss current challenges and solutions.
References
• Sippl, M.J. & Wiederstein, M., Detection of spatial correlations in protein structures and molecular complexes. Structure 20, 718-728 (2012).
• Wiederstein, M., Gruber, M., Frank, K., Melo, F. & Sippl, M.J., Structure-based characterization of multiprotein complexes. Structure 22(7), 1063-1070 (2014).
• Sippl, M.J., On distance and similarity in fold space. Bioinformatics 24(6), 872-873 (2008).
Web services
Structure Search (TopSearch): https://topsearch.services.came.sbg.ac.at/
Protein Structure Analysis (ProSA): https://prosa.services.came.sbg.ac.at/prosa.php

Algorithms in bioinformatics: Simple solutions to complex problems
Morten Nielsen
Associate Professor, Center for Biological Sequence Analysis, The Technical University of Denmark, Denmark; Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
Data mining and machine learning are two central areas of bioinformatics.
Over the last decades, my group has developed a large panel of machine learning methods suitable for data mining and pattern recognition in biological data. Most of these methods are hybrids of standard machine learning methods, including linear regression, Gibbs sampling and artificial neural networks. Although very simple, these methods have proven highly accurate in identifying patterns in complex biological data. In my presentation, I will describe the background of some of these methods, illustrate their functionality on biological data, and outline areas where I believe they could be complemented by expanding into novel areas of machine learning, such as Deep Learning, or by building novel machine learning hybrids, such as artificial neural network-guided Gibbs clustering.

Activating Mutations Cluster in the 'Molecular Brake' Regions of Protein Kinases. Implications for Driver Mutation Prediction
Cristina Marino Buslje
Bioinformatics Unit, Fundación Instituto Leloir, Capital Federal, Argentina
Mutations leading to activation of proto-oncogenic protein kinases (PKs) are a type of driver mutation crucial for understanding tumorigenesis and relevant as targets for anti-tumor drugs. However, the bioinformatics tools developed so far to distinguish driver mutations, typically based on conservation considerations, systematically fail to predict activating mutations in PKs. Here we present the first comprehensive analysis of the 407 activating mutations described in the literature, which affect 41 PKs. Unexpectedly, we found that these mutations do not associate with conserved positions and do not directly affect ATP-binding or catalytic residues. Instead, they cluster around three segments that have been shown to act, in some PKs, as "molecular brakes" of kinase activity.
This finding led us to hypothesize that an autoinhibitory mechanism mediated by such "brakes" is present in all PKs and that the majority of activating mutations act by releasing it. Our results also demonstrate that activating mutations of PKs constitute a distinct group of drivers and that specific bioinformatics tools are needed to identify them in the numerous cancer sequencing projects currently underway. The clustering in three segments should be the starting point for such tools, a hypothesis that we tested by identifying two somatic mutations in EPHA7 that might be functionally relevant.

Towards a better understanding of the key molecular determinants that mediate protein-DNA recognition
Francisco Melo Ledermann
Faculty of Biological Sciences, Pontificia Universidad Católica de Chile
email address: fmelo at bio.puc.cl
In this talk, a general description of several bioinformatics tools recently developed in our lab to assist the study of protein-DNA interactions will be provided. These include a database of protein-DNA interfaces, knowledge-based potentials to describe protein-DNA interactions, software for the full-atom 3D modeling of duplex DNA and protein-DNA complexes, and a PyMOL plugin to visualize the binding interface of protein-DNA complexes. Additionally, some preliminary results from ongoing research on the validation of these bioinformatics tools and on the analysis of experimental data involving protein-DNA complex structures, protein-DNA binding assays and genomic data will be shown.

Lectures

Aminoacid metabolism conflicts with protein diversity
Ignacio Sanchez
Krick T, Verstraete N, Alonso LG, Shub DA, Ferreiro DU, Shub M, Sánchez IE.
Pab II, 4th floor, Lab QB-9, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires - Buenos Aires, Argentina
The twenty protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a diverse set of protein sequences is necessary to build functional proteomes. Here we present a simple model of a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than one hundred organisms reach comparable solutions to the trade-off through different combinations of proteome cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can be optimally large and diverse.

The problem of Feature Selection in Cheminformatics: How can visual analytics help us?
Gustavo E. Vazquez
Facultad de Ingeniería y Tecnologías - Universidad Católica del Uruguay, Montevideo, Uruguay.
Designing QSAR/QSPR models has traditionally been a complex task, and identifying the descriptors most relevant to the phenomenon under study is a key step in this process. Most feature selection methods used for this step focus on purely statistical associations between descriptors and target properties, leaving chemical knowledge out of the analysis. As a result, the interpretability and generality of the QSAR/QSPR models obtained with these feature selection methods are severely affected.
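To make concrete what a purely statistical selection step looks like, here is a minimal Python sketch. It is not the author's method. It ranks descriptors by the absolute Pearson correlation with the target property and keeps the top k. The descriptor names, values and target are hypothetical illustrative data, and the helper names `pearson` and `rank_descriptors` are our own.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_descriptors(descriptors, target, k):
    """Return the k descriptor names with the largest |r| against the target.

    This is a purely statistical filter: no chemical knowledge enters the ranking.
    """
    scores = {name: abs(pearson(values, target)) for name, values in descriptors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical descriptor columns for five molecules, plus a hypothetical target property.
descriptors = {
    "logP": [1.2, 2.3, 3.1, 4.0, 5.2],
    "mol_weight": [180, 150, 210, 170, 200],
    "n_hbond_donors": [3, 2, 2, 1, 0],
}
target = [0.5, 1.1, 1.4, 2.0, 2.6]

print(rank_descriptors(descriptors, target, k=2))  # → ['logP', 'n_hbond_donors']
```

In this toy data the filter keeps logP and the hydrogen-bond donor count because both track the target closely (the latter negatively). Nothing in the ranking asks whether either descriptor makes chemical sense for the property. That is the gap a chemist-in-the-loop approach such as visual analytics is meant to address.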
An approach that integrates the chemist's expertise into the selection process is therefore needed to increase user confidence in the final set of chosen descriptors and to improve the interpretability of the final model. We will discuss how the discipline of visual analytics can assist the model developer in the feature selection process.

Collective vibrations and key residues associated to conformational selection upon ligand binding
Sebastian Fernández-Alberti
Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
The conformational selection paradigm for receptor-ligand binding establishes that ligand-bound conformations are a subset of the ligand-free conformational space. Therefore, the dynamic fluctuations of the ligand-free conformation should contain information about unbound-to-bound conformational changes in the receptor. Coarse-grained normal mode analysis and molecular dynamics simulations provide the information required to explore these features. First, we present a procedure to identify and characterize dynamically relevant residues responsible for maintaining the conformational multiplicity associated with ligand binding. These key residues can potentially be considered fingerprints of protein function. Furthermore, they can be proposed as promising targets for mutational and functional studies. Next, we present a novel procedure to define and compare the essential dynamics subspaces associated with ligand-bound and ligand-free conformations. This procedure highlights the main similarities and differences between the essential dynamics of each state. Essential dynamics subspaces associated with conformational transitions are also defined. In this way, the extent to which conformational changes upon ligand binding are captured by each conformer-specific essential dynamics can be evaluated. As a test case, we consider the glutaminase interacting protein (GIP), composed of a single PDZ domain.
Both the GIP ligand-free state and the glutaminase L peptide-bound state are analyzed.

Expanding the boundaries of the quantum-classical simulations using GPUs and electronic dynamics
Mariano C. González Lebrero
Instituto de Química y Fisicoquímica Biológicas - Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires - Buenos Aires, Argentina.
Hybrid quantum-classical (QM/MM) simulation tools have proved useful for answering questions in chemistry and biochemistry, as evidenced by the Nobel Prize in Chemistry awarded to the main developers of these techniques. Current QM/MM applications seek to describe the nuclear dynamics while keeping the electronic structure in the ground state. This approach cannot treat situations in which electronic dynamics are relevant, such as interaction with light and derived processes, or electron transfer, among others. In this talk I will present the results of our efforts to expand the range of systems and processes to which these techniques can be applied. In particular, I will focus on the use of GPUs to simulate large systems at an affordable computational cost, and on the recent implementation of electronic and electronic-nuclear dynamics methods based on the Real-Time Time-Dependent Density Functional Theory (RT-TDDFT) scheme.

High-throughput identification of antigens by metatranscriptomics and peptide chip technology
Paolo Marcatili
Technical University of Denmark (DTU), Department of Systems Biology, Lyngby, Denmark
The experimental identification of antigens is a fundamental yet problematic task in vaccine discovery: many pathogens can hardly be cultivated, they might require specific environmental conditions to express their antigenic proteins, and they might present a large number of subdominant antigens and induce a complex polyclonal antibody response in the host organism.
To address these problems, we developed an integrated pipeline to detect simultaneously, in a culture-independent manner, all the potential antigens for the ruminant disease Digital Dermatitis (DD), together with the specific immune response developed by the host cow. We used a metatranscriptomic approach to identify all the genes expressed by the complex assemblage of DD-associated bacteria (mainly belonging to the genus Treponema) and the immunologically relevant genes associated with the polyclonal immune response in more than 30 infected cows. From this extended pool of more than 80,000 bacterial transcripts we identified, using structural and functional bioinformatics prediction tools, the 600 proteins most likely to be antigenic, and subsequently screened them for antibody reactivity using peptide-chip technology. In addition, the repertoire of B- and T-cell receptors expressed by each individual cow in response to the disease has been identified and analysed in order to provide further information for vaccine development, such as MHC specificity and, eventually, the molecular basis of antibody-antigen interaction. This novel integrated approach can be extremely powerful for developing new vaccines and for understanding the complex interplay between pathogens and the host immune system.

Oral Sessions

Structure prediction and protein function
Oral Session – Submission 3
Characterization of binding specificities of Bovine Leucocytes class I molecules: Impacts for rational epitope discovery
Morten Nielsen1,2, Andreas M. Hansen3, Michael Rasmussen3, and Soren Buus3
1-Center for Biological Sequence Analysis, Danish Technical University, Denmark
2-Instituto de Investigaciones Biotecnológicas, UNSAM, San Martín, Buenos Aires, Argentina
3-Laboratory of Experimental Immunology, Faculty of Health Sciences, University of Copenhagen, Denmark
Background.
The binding of peptides to classical major histocompatibility complex (MHC) class I proteins is the single most selective step in antigen presentation [1]. However, the peptide binding specificity of the cattle MHC (bovine leucocyte antigen, BoLA) class I (BoLA-I) molecules remains poorly characterized. We have previously proposed a reverse immunology strategy for effective and rational epitope discovery based on in silico prediction tools combined with experimental peptide-binding data from recombinant bovine MHCs [2]. Our aim here is to extend this approach and improve the performance of the MHC peptide binding prediction methods NetMHC [3, 4] and NetMHCpan [5, 6] by integrating peptide binding affinity data for a limited set of prevalent BoLA class I molecules. This will demonstrate how such an approach can be used, in a highly cost-effective manner, to guide the search for CTL epitopes in cattle. Our strategy was to use a nonameric Positional Scanning Combinatorial Peptide Library (PSCPL) in combination with a high-throughput peptide-MHC-I dissociation assay [7], and to feed these data into peptide binding prediction methods.
Results. Using this strategy, we have characterized 8 BoLA-I molecules. The peptide specificity of the BoLA-I molecules was found to resemble that of human MHC-I molecules, with primary anchors at P2 and P9 and occasional auxiliary P1 and P3 anchors. Seven of the 8 molecules preferred hydrophobic P9 terminal anchor residues, whereas one (BoLA-2*01201 (T2A)) preferred positively charged ones. Anchors at the other positions were more diverse. Two of the characterized binding motifs are shown in Figure 1.
Figure 1: Sequence logo representation of the binding motif of (left) BoLA-HD6 and (right) BoLA-T2A. The sequence logos were generated using the Seq2Logo server [8] from the PSCPL binding data.
We analyzed 9 reported CTL epitopes from T. parva, the causative agent of East Coast fever in cattle, and in 8 cases stable, high-affinity binding was confirmed. Likewise, cross-binding was observed between functionally related MHCs. A set of peptides was tested for binding affinity to the 8 BoLA proteins and used to refine the NetMHC and NetMHCpan predictors.
Table 1: Experimental validation of binding affinity of 5 known BoLA-I restricted epitopes and the alternative minimal epitopes suggested by in silico predictions. Additional amino acids flanking the minimal epitope are underlined.
The inclusion of BoLA-specific peptide binding data led to a significant improvement in prediction accuracy for reported T. parva CTL epitopes. For an extended set of reported CTL epitopes with weak or no predicted binding, the refined prediction methods suggested the presence of nested, truncated minimal epitopes with high predicted binding affinity. The enhanced affinity of the alternative peptides was tested and confirmed experimentally in all cases (Table 1), and in one case the suggested new minimal epitope was validated using tetramer staining (Figure 2).
Figure 2: Validation of the alternative BoLA-B*04101 Tp2 epitope. T cells were stained with anti-bovine CD8 and different BoLA-6*04101 tetramers: (left) unfolded (no peptide), (middle) the longer Tp2 peptide, (right) the truncated optimal peptide. Data taken from [9].
Conclusions. To the best of our knowledge, this is the first study that demonstrates how biochemical peptide binding data combined with immunoinformatics can be effectively used to characterize the peptide binding motifs of BoLA-I molecules, and how such data can be used to boost the performance of MHC-peptide binding prediction methods, empowering rational epitope discovery and aiding the understanding of the T-cell immune response in cattle.
References
1. Yewdell, J.W. and J.R.
Bennink, Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses. Annual Review of Immunology, 1999. 17: p. 51-88.
2. Nene, V., et al., Designing bovine T cell vaccines via reverse immunology. Ticks Tick Borne Dis, 2012. 3(3): p. 188-92.
3. Lundegaard, C., et al., NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Nucleic Acids Res, 2008.
4. Nielsen, M., et al., Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci, 2003. 12(5): p. 1007-17.
5. Nielsen, M., et al., NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE, 2007. 2(8): p. e796.
6. Hoof, I., et al., NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics, 2009. 61(1): p. 1-13.
7. Harndahl, M., et al., Real-time, high-throughput measurements of peptide-MHC-I dissociation using a scintillation proximity assay. J Immunol Methods, 2011. 374(1-2): p. 5-12.
8. Thomsen, M.C. and M. Nielsen, Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res, 2012. 40(Web Server issue): p. W281-7.
9. Svitek, N., et al., Use of "one-pot, mix-and-read" peptide-MHC class I tetramers and predictive algorithms to improve detection of cytotoxic T lymphocyte responses in cattle. Vet Res, 2014. 45(1): p. 50.

Structure prediction and protein function
Oral Session – Submission 6
¹³Cα and ¹³Cβ chemical shift-driven refinement of protein structures
Pedro G. Ramírez
IMASL-CONICET, Universidad Nacional de San Luis, Italia 1556, 5700 - San Luis, Argentina
Background.
X-ray crystallography (XRC) and nuclear magnetic resonance (NMR) spectroscopy are the most powerful and widely used techniques for experimentally determining the three-dimensional structures of biological macromolecules at near-atomic resolution. On the one hand, XRC has no size limitations and provides the most precise atomic detail, but information about the dynamics of the molecule may be limited. On the other hand, NMR spectroscopy outperforms XRC when no protein crystals are available and, in addition, provides information on solution-state dynamics. However, the main drawback of NMR spectroscopy is that it delivers lower-resolution structures [1]. For this reason, validation, the process of evaluating the reliability of three-dimensional atomic models, is critically important for protein structure determination by NMR spectroscopy. Materials and methods. Our group has developed a protein structure validation method called CheShift-2 [2], which calculates the “differences” between observed and calculated chemical shifts for the nuclei of interest (¹³Cα and ¹³Cβ). This validation method indicates where in the protein structure the largest “differences” are found. This allows us to modify the relevant torsional angles, while keeping compatibility with all existing experimental information, so that the agreement between observed and computed chemical shifts is optimized at both the local and the global level. We use a refinement algorithm that identifies the residues that contain flaws and then modifies the torsional angles of the protein structure in a way that tends to reduce these flaws. The information used to identify these residues is obtained from CheShift-2, and the protein structure is perturbed with ROSETTA [3], a software package for the prediction and design of protein structures. Conclusions.
We evaluate our methodology by comparing the root mean square deviation (RMSD) and the global distance test high-accuracy score (GDT-HA) [4] of the refined structures against a high-quality experimental structure of the same protein. Moreover, the physicochemical quality of the results was assessed with validation methods such as PROCHECK [5] and MolProbity [6]. Acknowledgments. This work was supported by PIP-112-2011-0100030 (JAV) from IMASL-CONICET, Argentina, and Project 328402 (JAV) from UNSL, Argentina. The research was conducted using the resources of a local Beowulf-type cluster at IMASL-CONICET. References 1. Krishnan VR, B.: Macromolecular Structure Determination: Comparison of X-ray Crystallography and NMR Spectroscopy. eLS 2012. 2. Martin OA, Vila JA, Scheraga HA: CheShift-2: graphic validation of protein structures. Bioinformatics 2012, 28(11):1538-1539. 3. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O et al: Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2009, 77 Suppl 9:89-99. 4. Zemla A: LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Research 2003, 31(13):3370-3374. 5. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography 1993, 26(2):283-291. 6. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB 3rd, Snoeyink J, Richardson JS et al: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 2007, 35(Web Server issue):W375-383. Structure prediction and protein function Oral Session – Submission 37 Physicochemical Characterization and Phylogenetic Classification of the 2/2 Hemoglobin Family sheds light on their Molecular Functions Juan P.
Bustamante1,3, Leonardo Boechi2, Leandro Radusky3, Darío A. Estrín1, Arjen ten Have4 and Marcelo A. Martí1,3 1 Departamento de Química Inorgánica, Analítica y Química Física, INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina. [email protected] 2 Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina 3 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina 4 Instituto de Investigaciones Biológicas, CONICET, Universidad Nacional de Mar del Plata, Buenos Aires, Argentina. The globin family of heme proteins comprises a large, diverse set of proteins whose function is tightly related to their affinity and reactivity towards small ligands, mainly O2 but also NO, CO and H2S [1]. Globins with high O2 affinity generally function as O2-redox-related enzymes, like the Mycobacterium tuberculosis 2/2 HbN NO dioxygenase; moderate-affinity globins usually act as oxygen carriers, like mammalian myoglobin; and low-O2-affinity globins are mostly NO or CO sensors, like soluble guanylate cyclase. We classified and characterized 1107 protein sequences of the 2/2 Hb family, one of the three major globin subfamilies [2], based on the assumption that a protein's function is determined by its structure and physicochemical properties, which are encoded by its sequence. We combined bioinformatics and structural biology with a phylogenetic reconstruction to describe and assign the key 2/2 Hb features that in turn determine O2 affinity. Our physicochemical model sheds light on the molecular details of O2 affinity and allows us to estimate kinetic constants for 2/2 Hb proteins. The predicted O2 affinities, based on ligand entry and stabilization, are substantiated by the evolutionary relationships shown in the phylogenetic tree.
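As a reminder of how estimated kinetic constants translate into the O2 affinity discussed above, the standard equilibrium relation (not a result of this work) is shown below; $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ denote the association and dissociation rate constants.

```latex
K_{O_2} = \frac{k_{\mathrm{on}}}{k_{\mathrm{off}}}, \qquad
K_{D} = \frac{1}{K_{O_2}} = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
```

Under the usual reading in the globin literature, ligand entry into the distal heme pocket mainly governs $k_{\mathrm{on}}$ and ligand stabilization inside the pocket mainly governs $k_{\mathrm{off}}$, so faster entry or stronger stabilization both raise the affinity $K_{O_2}$.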
The results offer a general and deep understanding of the putative functions of 2/2 Hbs in terms of protein diversity. References 1. Milani M, Pesce A, Nardini M, Ouellet H, Ouellet Y, Dewilde S, Bocedi A, Ascenzi P, Guertin M, Moens L, Friedman JM, Wittenberg JB, Bolognesi M. Structural bases for heme binding and diatomic ligand recognition in truncated hemoglobins. Journal of Inorganic Biochemistry. 2005. 99:97-109. 2. Vuletich DA and Lecomte JTJ. A Phylogenetic and Structural Analysis of Truncated Hemoglobins. Journal of Molecular Evolution. 2006. 62:196-210. Proteomics and functional proteomics Oral Session – Submission 52 On the analysis of vibrations associated with conformational selection upon ligand binding in a PDZ domain protein Marcos Grosso1, Adrián Kalstein1, Adrián Roitberg2 and Sebastián Fernández-Alberti1 1 Quilmes National University, Bernal, Argentina, Roque Saenz Peña 352, B1876BXD 2 Department of Chemistry, University of Florida, Gainesville, Florida 3261 The conformational selection paradigm for receptor-ligand binding establishes that ligand-bound conformations are a subset of the ligand-free conformational space. Therefore, dynamic fluctuations associated with the ligand-free conformation should contain information about unbound-to-bound conformational changes in the receptor. This concept emerged as an alternative to the traditional induced-fit model, based on the hypothesis that ligand binding to the ligand-free conformations induces conformational transitions to the ligand-bound state. Molecular dynamics simulations provide the information required to explore these features. Their use in combination with subsequent essential dynamics analysis (1) allows large concerted conformational rearrangements to be separated from irrelevant fluctuations. We present a novel procedure to define and compare essential dynamics subspaces associated with ligand-bound and ligand-free conformations.
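Essential dynamics analysis (1) is a principal component analysis (PCA) of the atomic-coordinate covariance matrix computed along an MD trajectory. The Python/NumPy sketch below illustrates the two ingredients described above under stated assumptions: it keeps the smallest set of modes that reaches a cumulative-variance cutoff (an assumed, simpler criterion than the one the authors use), and it measures how much of subspace A lies inside subspace B with the normalized sum of squared projections, a common overlap definition assumed here for ζ. The function names are hypothetical.

```python
import numpy as np


def essential_subspace(coords, variance_fraction=0.9):
    """Return the principal modes (as columns) of an MD trajectory.

    coords: array of shape (n_frames, 3 * n_atoms), assumed to be already
    superimposed on a reference structure to remove rigid-body motion.
    The number of modes kept is the smallest one whose cumulative variance
    reaches `variance_fraction` (an assumed selection criterion).
    """
    x = np.asarray(coords, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)   # coordinate covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_modes = int(np.searchsorted(cumulative, variance_fraction)) + 1
    return eigvecs[:, :n_modes]


def subspace_overlap(a, b):
    """Fraction of subspace A contained in subspace B, in [0, 1]:
    (1 / M_A) * sum_i sum_j (a_i . b_j)**2 for orthonormal columns a_i, b_j."""
    return float(np.sum((a.T @ b) ** 2) / a.shape[1])
```

With this definition an overlap of 1 means every mode of A lies within B. If ζ in Table 1 follows a definition of this kind, the values of about 96% for SV against either end state would read as near-complete inclusion of the transition modes.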
Our procedure allows us to highlight the main similarities and differences between the different essential dynamics. Essential dynamics subspaces associated with conformational transitions are also defined. In this way, the extent to which conformational changes upon ligand binding are included in each conformer-specific essential dynamics can be evaluated. As a test case, the glutaminase interacting protein (GIP), composed of a single PDZ domain, is considered. Both the GIP ligand-free state and the glutaminase L peptide-bound state are analyzed. Our findings concerning the relative changes in the flexibility pattern upon binding are in good agreement with previous NMR data.

Table 1: Comparison of ligand-free, ligand-bound and conformational transition essential dynamics subspaces (S(lf), S(lb) and SV, respectively).

Subspace A   S(lb)    SV       SV
Subspace B   S(lf)    S(lf)    S(lb)
M            106      33       33
ζ            69.74%   96.34%   96.29%
nD           92.8     31.8     31.8

Conclusions. We have developed a general and novel procedure to define the size and composition of conformer-specific and conformational-transition essential dynamics. We have also described a procedure to compare essential dynamics subspaces. The procedure is easy to implement and highlights the main similarities and differences between the different essential dynamics. We were able to explore the extent to which conformational changes upon ligand binding are included in each conformer-specific essential dynamics. We believe the method can be applied to a wide variety of cases, such as the analysis of the effects of mutations on dynamics, the design of new drugs that prevent conformational changes upon ligand binding, and the analysis of conformational transitions induced by changes in cofactor oxidation states. MD simulations and PCA of GIP in its ligand-bound and ligand-free conformations were used as a test case.
We found that the sizes of the essential subspaces required to include every PCA mode that participates significantly in any structural change observed during each MD simulation are larger than the number of modes most frequently considered. The analysis of the essential dynamics subspace associated with conformational transitions indicates that, in most cases, mainly the βa-βb hairpin and the β2-β3 loop are involved. Our findings are in good agreement with the previous NMR data analysis performed by Mohanty and co-workers (2). The relative changes in the flexibility pattern upon binding agree with the general trend that, except in the regions of GIP that directly interact with the ligand, the ligand-bound conformation is more flexible than the ligand-free conformation. We observed that the conformational transitions involve more complex geometric distortions than those sampled during the ligand-free MD simulations. The comparison of the ligand-free, ligand-bound and conformational transition essential dynamics subspaces reveals that the ligand-free and ligand-bound MD simulations share almost 70% of their essential dynamics. Moreover, the essential dynamics associated with the conformational transition is almost completely (about 96%) covered by the essential dynamics of each of the ligand-free and ligand-bound states. In this way, the conformational selection model for binding is validated: dynamic fluctuations associated with both conformations account for unbound-to-bound displacements. References 1. A. Amadei, A. B. M. Linssen, and H. J. C. Berendsen, Essential dynamics of proteins, Proteins 17 (1993), 412–425. 2. Zoetewey, D. L., M. Ovee, M. Banerjee, R. Bhaskaran, and S. Mohanty. 2011. Promiscuous binding at the crossroads of numerous cancer pathways: insight from the binding of glutaminase interacting protein with glutaminase L. Biochemistry. 50: 3528–39.
Sequence analysis Oral Session – Submission 25 Evolution of linear motifs within the adenovirus E1A oncoprotein Juliana Glavina1, Lucía B. Chemes2, Rocío Espada1, Ricardo Rodriguez de la Vega3 and Ignacio E. Sánchez1 1 Protein Physiology Laboratory, Departamento de Química Biológica and IQUIBICEN-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires. 2 Protein Structure-Function and Engineering Laboratory, Fundación Instituto Leloir and IIBBA-CONICET. 3 Ecologie, Systématique et Evolution, CNRS, UMR 8079, Orsay, France and Ecologie, Systématique et Evolution, UMR 8079, Université Paris-Sud, Orsay, France Introduction. Many protein-protein interactions are mediated by linear sequence motifs of 5 function-determining residues, which are often found within intrinsically disordered domains [1]. Linear motifs can appear or disappear with only a handful of point mutations and are thought to evolve rapidly. We chose the adenovirus E1A oncoprotein as a model to study sequence conservation and linear motif evolution. The E1A protein is unique to the adenovirus genus Mastadenovirus, which infects mammals. Mastadenovirus types differ in their phenotypic traits, including host, tissue tropism and oncogenic potential. E1A consists of four intrinsically disordered regions, designated Nt, CR1, CR2 and CR4, and one globular region, designated CR3 [2]. We analyzed the variability and evolution of 13 linear motifs in E1A and the relationship between different motif repertoires and virus phenotypes. Methods. We used over 100 E1A sequences from known mastadenovirus types to construct an alignment. We used the information content of each position in the alignment as a measure of conservation. Direct information is a measure used to infer direct co-evolutionary couplings between residue pairs in multiple sequence alignments while minimizing the influence of indirect correlations.
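Per-position information content, used above as the conservation measure, is commonly computed as log2(20) minus the Shannon entropy of the residue frequencies in each alignment column. The sketch below assumes that definition, ignores gaps and non-standard symbols, and applies no small-sample correction; the function names are hypothetical and this is not the authors' pipeline.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def column_information_content(column):
    """Information content (bits) of one alignment column:
    log2(20) minus the Shannon entropy of the residue frequencies.
    Gaps and non-standard symbols are ignored (an assumption)."""
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(20) - entropy


def alignment_conservation(aligned_seqs):
    """Per-position information content for equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    return [column_information_content("".join(s[i] for s in aligned_seqs))
            for i in range(length)]
```

A fully conserved column scores log2(20), about 4.32 bits, and a column with four equally frequent residues scores log2(20) - 2. Direct information, in contrast, requires fitting a global statistical model to the whole alignment and is not sketched here.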
We used direct information to predict residue-residue contacts in the E1A protein. We also studied the variability in the linear motif repertoire across E1A proteins. The motif repertoire was then mapped onto a phylogenetic tree of mastadenoviruses. Finally, we performed hypergeometric association tests on all individual combinations of linear motifs, phenotypic traits and hosts. Figure 1: Linear motifs within the E1A protein, and E1A targets mapped to single or multiple binding sites, as well as unmapped targets. Results. The E1A protein is densely packed with linear motifs, which explains its high number of binding partners (Figure 1). Both the intrinsically disordered regions and the globular CR3 region show a high degree of conservation along their whole length. We found pairs of co-evolving residues within each region as well as across regions, indicating that. The different motifs showed different abundance and distribution patterns: some were highly conserved, while others were present in only a few species. Conclusions. E1A linear motifs evolve rapidly and follow motif-specific trends. As shown by co-evolution and evolutionary analyses, the different motifs and regions of the protein did not evolve independently. A lack of globular structure does not necessarily lead to a lower degree of sequence conservation. Acknowledgments. We acknowledge funding from Agencia Nacional de Promoción Científica y Tecnológica (PICT 2012-2550 to I.E.S.) and Consejo Nacional de Investigaciones Científicas y Técnicas (doctoral fellowship to J.G.). L.B.C. and I.E.S. are CONICET career investigators. References 1. Davey NE, Travé G, Gibson TJ. Trends Biochem Sci (2011) 36: 159-169. 2. Pelka P, Ablack JN, Fonseca GJ, Yousef AF, Mymryk JS. J Virol (2008) 82(15): 7252-63. Sequence analysis Oral Session – Submission 2 On the design of shortened BCH barcodes Laura Angelone1,2,†, Flavio E.
Spetale1,2, Javier Murillo1,2, Joaquin Ezpeleta1, Pilar Bulacio, Elizabeth Tapia 1 CIFASIS-Conicet Institute, Rosario, Argentina 2 Fac. de Cs. Exactas e Ingeniería, Universidad Nacional de Rosario, Argentina †E-mail: [email protected] Abstract. Binary BCH codes have recently been proposed for the design of barcoding systems with high multiplexing capacity, suitable for use in sequencing platforms impaired by mismatch errors. We generalize the design of BCH barcodes by introducing shortened BCH barcodes, a class of barcodes built from binary BCH codes that allows barcode sizes that would otherwise be unavailable. Introduction. The DNA barcoding problem is an instance of a widely studied problem in communication theory, the error-free transmission of discrete patterns in the presence of random noise [1], which leads to the theory of error-correcting codes. Since this was recognized in 2008 [2], few works have considered the systematic design of coding-based barcode systems. Focusing on sequencing platforms impaired by mismatch errors, we generalize the design of BCH barcodes [3] by introducing shortened BCH barcodes. Results. Binary BCH codes of length $n = 2^m - 1$, $m \ge 4$, can be used to construct barcodes of $N = 8, 16, 32, \ldots$ bases [3]. For a given n, multiple error-correction capabilities $t > 1$ are possible. Hence, BCH barcodes can be used to correct at least $b = \lfloor t/2 \rfloor$ base mismatches. To improve the design flexibility of BCH barcodes by allowing intermediate values of N, shortened binary BCH codes can be considered. Shortening a BCH code with parameter $s > 0$ reduces the number of informative bits from k to $k' = k - s$ while preserving the number of redundant bits. Hence, improved error-correction ability at the expense of reduced multiplexing capacity can be expected for shortened BCH barcodes. By means of shortening, BCH barcodes of size $N = (n + 1 - s)/2$ for even s, or $N = (n - s)/2$ for odd s, can be designed.
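The length arithmetic above can be checked directly. The sketch below assumes two bits per base (unshortened codes give N = (n + 1)/2, for example n = 15 gives N = 8) and reasons that a single base mismatch can flip up to two bits, so a code correcting t bit errors is guaranteed to correct floor(t/2) base mismatches. It only computes parameters and does not encode or decode BCH codewords; the function names are hypothetical.

```python
def shortened_barcode_length(n: int, s: int) -> int:
    """Barcode length N (in bases) from a binary BCH code of length
    n = 2**m - 1 shortened by s bits, with two bits per base:
    N = (n + 1 - s) / 2 for even s, N = (n - s) / 2 for odd s."""
    if s % 2 == 0:
        return (n + 1 - s) // 2
    return (n - s) // 2


def informative_bits(k: int, s: int) -> int:
    """Shortening removes s informative bits: k' = k - s."""
    return k - s


def guaranteed_base_mismatches(t: int) -> int:
    """One base mismatch can flip up to two bits, so a code correcting
    t bit errors is guaranteed to correct floor(t / 2) base mismatches."""
    return t // 2


# (n, k, t, s) settings listed in Table 1 of this abstract
for n, k, t, s in [(63, 30, 6, 21), (63, 30, 6, 19), (63, 24, 7, 15),
                   (63, 24, 7, 13), (63, 18, 10, 9)]:
    print(f"(n={n}, k={k}, t={t}, s={s}) -> N={shortened_barcode_length(n, s)}, "
          f"k'={informative_bits(k, s)}, b={guaranteed_base_mismatches(t)}")
```

For the five settings listed in Table 1 this reproduces the barcode lengths N = 21, 22, 24, 25 and 27.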
To recover from sequencing errors, shortened BCH barcodes must first be demapped to the binary domain, where the previously removed bits must be reinserted. As with standard BCH barcodes, shortened BCH barcodes must avoid homopolymer regions [4] and take well-known chemistry constraints into account. Most of these constraints are already handled by Barcrawl [5], a tool for the ab initio design of primer barcodes for pyrosequencing applications. Hence, before deployment, candidate barcodes are passed through an adapted version of the Barcrawl tool. For each barcoding system of size N built from a given error-correcting code of length n, a wide range of error-correction and multiplexing abilities were evaluated. For practical purposes, N was limited to 30 bases, and thus binary BCH codes of length $n \in \{15, 31, 63\}$ and their shortened versions were considered. Barcoding systems were evaluated through their multiplexing capacity M, their barcoding rate B, and their probabilities $p_e$ and $p_u$ of detected and undetected barcode identification errors. For each N, we define M as the maximum number of barcodes compatible with the given sequencing chemistry. Similarly, we define B as the actual fraction of informative quads per barcode, i.e., $B = \log_4(M)/N$. The multiplexing capacity M of BCH barcodes of size N on ideal mismatch sequencing channels depends on the desired $p_e$ and $p_u$ for the given symbol error probability $p_s$ of the corresponding quaternary symmetric channel (QSC) model. To achieve strict control of $p_u$, we looked for BCH barcodes able to satisfy the operational constraint $p_u < 10^{-8}$ at $p_s = 10^{-2}$ for $N \le 27$. We found that this constraint could only be satisfied with shortened binary BCH codes of length n = 63. As shown in Table 1, a broad range of $(M, p_e)$ configurations can be obtained.
N 21 22 24 25 27 M 86 384 pe = 10−5 B (n, k, t, s) 0.168 (63, 30, 6, 21) 0.187 (63, 30, 6, 19) M 73 295 pe = 10−5 B (n, k, t, s) 0.142 0.144 (63, 24, 7, 15) (63, 24, 7, 13) M 72 pe = 10−5 B (n, k, t, s) 0.148 (63, 18, 10, 9)
Table 1: The multiplexing capacity M and the barcoding rate B achieved by BCH barcodes of size N built from shortened binary BCH codes of length n, able to carry k informative bits and to correct at least t binary errors, for different shortening degrees s. BCH barcodes are constrained to meet increasingly stringent $p_e$ settings with $p_u < 10^{-8}$ for $p_s = 10^{-2}$ on a QSC channel model. Acknowledgments. The work of LA, FES, JM, JE, PB and ET was supported by project PICT 2012-2513, SECYT, Argentina. References 1. Calderbank AR (1998) The art of signaling: fifty years of coding theory. IEEE Transactions on Information Theory 44. 2. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods 5: 235–237. 3. Krishnan A, Sweeney M, Vasic J, Galbraith D, Vasic B (2011) Barcodes for DNA sequencing with guaranteed error correction capability. Electronics Letters 47: 236–237. 4. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010: pdb.prot5448. 5. Frank D (2009) Barcrawl and Bartab: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics 10: 362.
Metabolomics and Cheminformatics Oral Session – Submission 31 A multilayer network approach for guiding drug repositioning in neglected diseases Ariel J Berenstein1,2,*, María P Magariños3,1, Ariel Chernomoretz1,2, Fernán Agüero3 1 Laboratorio de Bioinformática, Fundación Instituto Leloir, Buenos Aires, Argentina 2 Departamento de Física, Universidad de Buenos Aires, Buenos Aires, Argentina 3 Laboratorio de Genómica y Bioinformática, Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, San Martín, Buenos Aires, Argentina. Background. Neglected tropical diseases (NTDs) are human infectious diseases that occur in tropical or subtropical regions and are often associated with poverty. Historically, a lack of interest from the pharmaceutical industry has resulted in a lack of drugs to combat most of the pathogens that cause these diseases. Recently, the availability of open chemical information has increased with the advent of public-domain chemical resources and the release of data from high-throughput screening assays. In our laboratory, our goal is to prioritize and identify candidate drug targets and candidate drug-like molecules to foster drug development for these diseases. To this end, we use comparative genomics and chemogenomics approaches. Materials and methods. Chemical datasets, including bioactivity data against pathogen and non-pathogen targets, were obtained from open databases and high-throughput screenings. Using these data, we built a multilayer network with three disjoint sets of vertices, organized in three different layers (Fig. 1A): $1.48 \times 10^{6}$ drugs, $1.67 \times 10^{5}$ proteins across 221 species, and a few key protein features (orthology, Pfam domains, participation in defined metabolic pathways).
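The protein layer of this network is obtained by projecting the protein-entity bipartite graph onto proteins with the method of Zhou et al. [2]. The authors use a modified version that is not detailed in the abstract, so the sketch below shows only the original resource-allocation weighting, in which each protein spreads a unit of resource equally over its entities and each entity redistributes what it receives equally over its proteins. The function name and the 0/1 matrix input are assumptions.

```python
import numpy as np


def zhou_projection(adjacency):
    """Weighted one-mode projection of a bipartite network onto its rows.

    adjacency: (n_proteins, n_entities) 0/1 matrix; A[i, l] = 1 if protein i
    is annotated with entity l (ortholog group, Pfam domain or pathway).
    Returns W with W[i, j] = (1 / k_j) * sum_l A[i, l] * A[j, l] / k_l,
    the resource-allocation weights of Zhou et al. (2007). W is not
    symmetric, and each column of a non-isolated protein sums to 1.
    """
    a = np.asarray(adjacency, dtype=float)
    k_protein = a.sum(axis=1)
    k_entity = a.sum(axis=0)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    inv_entity = np.divide(1.0, k_entity, out=np.zeros_like(k_entity),
                           where=k_entity > 0)
    inv_protein = np.divide(1.0, k_protein, out=np.zeros_like(k_protein),
                            where=k_protein > 0)
    # Scale entity columns by 1/k_l, project, then scale column j by 1/k_j.
    return (a * inv_entity) @ a.T * inv_protein
```

Two proteins receive a non-zero weight exactly when they share at least one entity, which matches the linking rule of the projected network.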
Three classes of target similarity criteria were considered: sharing Pfam domains, clustering in the same ortholog group (OrthoMCL algorithm), and belonging to the same metabolic pathway. Only statistically significant terms (in the context of drug-target predictions) were taken into account. A bipartite projection was made over the protein layer using a modified version of the Zhou method [2] (Fig. 1B). In the resulting monopartite protein-projected network, proteins are linked if and only if they share at least one relevant biological entity. Using this network, we first tackled the problem of prioritizing targets for drug discovery in the absence or scarcity of bioactivity data for an organism of interest: given such an organism, the network yields a global prioritized list of promising targets in the query species. In a second application, we suggest candidate targets for orphan compounds, which have been shown to be active in whole-cell or whole-organism screenings but whose target is currently unknown. In this case, we aim to obtain a reduced prioritization list of target proteins for the orphan molecule. Figure 1: Schematic representation of data and workflow. A: Multilayer representation of drug-target data; the first layer (bottom) contains drugs with any known bioactivity against proteins represented in the second layer. The top plane contains significant biological entities involving proteins of different organisms (orthologs, metabolic pathways and Pfam domains). B: Bipartite projection of the protein-entity layers into a protein-projected network (PP-Layer). In the resulting monopartite protein-projected network, proteins are linked if and only if they share at least one relevant biological entity. Results.
We find that our approach yields statistically significant prioritized lists in both pathogens and model organisms, as evaluated by a tenfold cross-validation procedure. Moreover, we found that our method outperforms traditional sequence-alignment-based approaches such as FASTA. We will discuss a number of interesting targets in pathogens that were prioritized under the assumption that no bioactivity information was available for them. In addition, we found that our approach is especially useful for obtaining reduced prioritization lists of target proteins for orphan query molecules. We assessed this in two ways: 1) in silico, by generating artificially orphaned compounds via a leave-one-out procedure, and 2) in a post-facto validation of the strategy, in which we analyzed a number of suggested targets for compounds that are active against P. falciparum. Overall, our results suggest that it is possible to identify candidate drug targets, either for complete query species or for orphan compounds, even in the absence of species-specific inhibition data. This is particularly important in the case of neglected diseases, as it means we can leverage data from model organisms (or from other tropical diseases) to guide drug repositioning efforts for these important diseases. Acknowledgments. We acknowledge support from CONICET (fellowships and salaries) and from ANPCyT (Agencia Nacional de Promoción Científica y Tecnológica; grant PICT-2010-1479). References 1. Magariños et al. (2012) Nucleic Acids Res 40: D1118-D1127. DOI: 10.1093/nar/gkr1053 2. Zhou et al. Phys Rev E. (2007) 76:046115.
DOI: 10.1103/PhysRevE.76.046115 System Biology and Networks Oral Session – Submission 27 Microarray Meta-analysis and Gene Regulatory Inference of LPS Transactivation of Macrophage 1-α-hydroxylase Romina Martinelli1, Lucas Daurelio2, Luis Esteban1,2 1 Facultad de Ciencias Médicas, Universidad Nacional de Rosario, Santa Fe, Argentina 2 Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET). Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Santa Fe, Argentina E-mail: [email protected] Background. 25-Hydroxyvitamin D can be activated to 1,25-dihydroxyvitamin D3 [1,25(OH)2D3] by the rate-limiting enzyme 1-α-hydroxylase. In cells of the immune system in particular, this enzyme is under the control of immune stimuli. In pathological situations such as tuberculosis, this can lead to a systemic excess of 1,25(OH)2D3 and hypercalcemia. Although there are some studies of LPS transactivation of macrophage 1-α-hydroxylase, all of them focus on the most relevant transcription factors involved, and no systems approach has been used to examine the complex interactions involved in the regulation of this enzyme. Materials and methods. To this end, we used microarray data from human macrophages obtained from GEO (6), using "macrophages" and "LPS" as keywords. The experiments were performed at least in triplicate (see Table 1). The meta-analysis was performed with INMEX (2). For differential expression analysis of the individual datasets, the Benjamini-Hochberg false discovery rate (FDR) was used. Fisher's method was chosen to combine p-values from multiple studies for information integration. Pathway and GO enrichment analyses using the hypergeometric test were performed to obtain functional information. The list of genes significantly enriched in the pathway containing 1-α-hydroxylase was selected.
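INMEX performs these statistics internally. For readers who want to see what each step computes, the sketch below reimplements the three named tests with NumPy and SciPy: Benjamini-Hochberg adjusted p-values, Fisher's combination (X = -2 Σ ln p_i, chi-squared with 2k degrees of freedom), and a one-sided hypergeometric enrichment p-value. It is an independent illustration, not INMEX code, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, hypergeom


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, moving from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom under the joint null."""
    p = np.asarray(pvalues, dtype=float)
    statistic = -2.0 * np.sum(np.log(p))
    return float(chi2.sf(statistic, df=2 * p.size))


def pathway_enrichment_pvalue(n_hits, pathway_size, n_selected, n_genes):
    """Hypergeometric P(X >= n_hits) when n_selected genes are drawn from
    n_genes, of which pathway_size belong to the pathway."""
    return float(hypergeom.sf(n_hits - 1, n_genes, pathway_size, n_selected))
```

For example, two studies that each give p = 0.05 for a gene combine to p ≈ 0.017 under Fisher's method.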
The list was supplemented with the names of regulatory proteins and enzymes detected in the lab as involved in the 1-α-hydroxylase response (4). The final list was loaded into GeneMANIA (7). The resulting network was then deployed in Cytoscape (8) to improve its visualization and curate it. Results and discussion. As expected from the literature, the list of differentially expressed genes was enriched in, among others: Jak-STAT signaling pathway, Transcriptional misregulation in cancer, Chemokine signaling pathway, and Toll-like receptor signaling pathway. Interestingly, the 1-α-hydroxylase gene mapped to the tuberculosis pathway, the clinical association that gave the first insight into the extra-renal activity of this enzyme. The network shows various hubs; MyD88, STAT1α and TIRAP seem to be of interest for 1-α-hydroxylase regulation. We found most of the transcription factors previously described to interact with the 1-α-hydroxylase promoter, namely NFKB1, CREB, STAT1α, C/EBPβ and Jun. All of them possess binding sites in the hydroxylase promoter. It was previously shown by transfection studies and gel shift assays that C/EBPβ (1,4,5) plays a role in 1-α-hydroxylase induction by direct binding to specific recognition sites in the promoter, whereas for STAT1α no such direct effects could be demonstrated. Cross-talk between the JAK-STAT, NF-κB and p38 MAPK pathways should be explored. New functional relationships between other proteins were also detected: C/EBPβ-NFKB and C/EBPβ-Jun. These deserve further exploratory studies for confirmation. References 1. Overbergh L, Stoffels K, Waer M, Verstuyf A, Bouillon R, Mathieu C: Immune Regulation of 25-Hydroxyvitamin D-1α-Hydroxylase in Human Monocytic THP1 Cells: Mechanisms of Interferon-γ-Mediated Induction. The Journal of Clinical Endocrinology and Metabolism 91(9):3566-3574. 2006 2.
Xia J, Fjell C, Mayer M, Pena O, Wishart D, Hancock: INMEX – a web-based tool for integrative meta-analysis of expression data. Nucleic Acids Res, 41, W63-70. 2013 3. Xaus J, Comalada M, Valledor A, Lloberas A, López-Soriano F, Argilés J, Bogdan C, Celada A. LPS induces apoptosis in macrophages mostly through the autocrine production of TNF-α. Blood. June 15, 2000; 95(12). 4. Esteban L, Vidal M, Dusso A: 1-α-Hydroxylase transactivation by γ-interferon in murine macrophages requires enhanced C/EBPβ expression and activation. J Steroid Biochem Mol Biol. 2004 May;89-90(1-5):131-7. 5. Esteban L., Vidal M., Dusso A. LPS transactivation of macrophage: role in local control of immune response. XLI Reunión Anual de la Sociedad Argentina de Investigación en Bioquímica y Biología Molecular (SAIB). X Congress of the Panamerican Association for Biochemistry and Molecular Biology (PABMB). Pinamar, Buenos Aires, Argentina. 3-6/12/2005. Poster. Published in BIOCELL (ISSN: 0327-9545) Vol 29, 2005. 6. Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/ 7. GeneMANIA. http://www.genemania.org/ 8. Cytoscape. http://www.cytoscape.org/ System Biology and Networks Oral Session – Submission 44 Encoding of spatial location and state of motion by the hippocampal region Soledad Gonzalo Cogno1, Emilio Kropff2, Marcelo Montemurro3, Inés Samengo1 1 Statistical and Interdisciplinary Physics Group, Instituto Balseiro and Centro Atómico Bariloche, San Carlos de Bariloche, Argentina, 8400 2 Instituto Leloir, Ciudad Autónoma de Buenos Aires, Argentina, C1405BWE 3 Faculty of Life Sciences, University of Manchester, Manchester, UK, M13 9PT Over the last hundred years, the hippocampus has been one of the most studied structures of the brain. Many experiments have suggested that the rodent hippocampus plays an important role in spatial navigation.
In other mammals (in particular, humans), the encoding of space is believed to be only one of many functions that require storing and retrieving information from memory. Studying the way the hippocampus encodes spatial locations is therefore a gateway to understanding the structures involved in mnemonic functions. The hippocampal region contains several anatomical structures located in the temporal lobe, including the hippocampus proper and the entorhinal cortex. These two areas are known to encode spatial information using two different neural codes. In rodents, the firing rate of pyramidal cells in the hippocampus is strongly correlated with the location of the animal: each cell fires only when the rat is in a specific place. These specific places are called place fields, and such neurons are called place cells (Figure 1A [1]). Paralleling place cells in the hippocampus, grid cells are found in the entorhinal cortex. They have multiple firing fields organized in a hexagonal lattice (Figure 1B [1]). These entorhinal neurons are hence activated whenever the animal's position coincides with any of the vertices of the lattice. Our work focuses on analyzing electrophysiological recordings obtained in awake, behaving animals. In the experiment, a rat runs along a linear track while the kinematic properties of its trajectory (position, velocity and acceleration) are recorded with an optical system. Simultaneously, the mean-field electric potentials of both the entorhinal cortex and the hippocampus are recorded with extracellular electrodes. We find that the electrophysiological signals encode not only the position of the animal but also its velocity and acceleration. Moreover, through an information-theoretic analysis, we find that more information flows from the entorhinal cortex to the hippocampus than in the reverse direction.
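The abstract does not name the information-theoretic measure used for the direction of information flow. One common choice for directed flow between two signals is transfer entropy, and the sketch below is a minimal plug-in estimator under that assumption, for signals already discretized (for example, binned field-potential amplitudes) and with a history of a single sample. The function name is hypothetical, and real analyses usually add longer histories, bias correction and surrogate-data tests.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of the transfer entropy source -> target
    for discrete (e.g. binned) signals, using a history of one sample:
    TE = sum p(y1, y0, x0) * log2[p(y1 | y0, x0) / p(y1 | y0)].
    """
    x, y = list(source), list(target)
    n = min(len(x), len(y)) - 1
    triples = Counter(zip(y[1:n + 1], y[:n], x[:n]))   # (y_t+1, y_t, x_t)
    pairs_past = Counter(zip(y[:n], x[:n]))            # (y_t, x_t)
    pairs_self = Counter(zip(y[1:n + 1], y[:n]))       # (y_t+1, y_t)
    singles = Counter(y[:n])                           # y_t
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_full = count / pairs_past[(y0, x0)]          # p(y1 | y0, x0)
        p_self = pairs_self[(y1, y0)] / singles[y0]    # p(y1 | y0)
        te += (count / n) * math.log2(p_full / p_self)
    return te
```

If one signal copies the other with a one-sample delay, the estimate is close to 1 bit in the driving direction and close to 0 in the reverse direction, which is the kind of asymmetry the abstract reports between entorhinal cortex and hippocampus.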
During this talk, I will discuss how the kinematic state of the animal affects the electrophysiological signals and the information flow between the entorhinal cortex and the hippocampus.
Figure 1: Place cells and grid cells. A. Firing pattern of a place cell in a linear track. The rat's position is depicted in black; the red points indicate the positions at which the cell fired. B. Firing pattern of a grid cell in an open field.
References
1. Buzsáki G, Moser EI: Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature Neuroscience 2013, 16:130-138.
Structure prediction and protein function
Oral Session – Submission 48
A system biology approach to evaluate endometrial maturation in women that developed preeclampsia
Ezequiel Juritz
Structural Bioinformatics Group, National University of Quilmes, Buenos Aires, Argentina.
Background. The native structure of a protein fluctuates among an ensemble of structural conformers connected by a dynamic equilibrium that is defined by the physicochemical parameters of the environment. The conformational changes observed between structural conformers are significant, with an average RMSD of 1.34 Å and a maximum of 7.15 Å (Monzon, Juritz, Fornasari, & Parisi, 2013). As different conformers can bind ligands with different energies, the presence of ligands can shift the equilibrium toward one or a set of specific conformers. In the present work we study how different conformers of the same protein may lead to different outcomes in structure-based computational calculations.
Materials and methods. We studied a total of 41,884 protein-ligand interactions from 5,292 different ligand-binding proteins. These proteins were cross-linked against the CoDNaS database, retrieving 78,113 structural conformers. All available structures of each protein were docked against one or more of its ligands with AutoDock Vina (Trott & Olson, 2010), using 5,277 ligands in total.
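Once docking scores are available for every conformer of each protein-ligand pair, their spread across conformers and its relation to structural distance can be summarized in a few lines. This is a minimal sketch under stated assumptions: the input layout (a dict keyed by hypothetical (protein, ligand) pairs holding one energy per conformer, plus an RMSD matrix), the use of the sample standard deviation, and Pearson correlation as the measure of "relation" are our choices, not details given in the abstract.

```python
import math
from statistics import mean, stdev


def conformer_energy_summary(energies_by_pair, threshold=1.0):
    """Summarize docking energies (kcal/mol) across conformers.

    energies_by_pair maps a (protein, ligand) key to a list with one docking
    energy per conformer. Pairs with fewer than two conformers are skipped,
    because a sample standard deviation needs at least two values.
    Returns (per-pair SDs, mean SD, fraction of pairs with SD > threshold).
    """
    sds = {pair: stdev(values)
           for pair, values in energies_by_pair.items() if len(values) >= 2}
    if not sds:
        raise ValueError("need at least one pair with two or more conformers")
    sd_values = list(sds.values())
    mean_sd = mean(sd_values)
    fraction_above = sum(sd > threshold for sd in sd_values) / len(sd_values)
    return sds, mean_sd, fraction_above


def pairwise_rmsd_vs_energy(energies, rmsd):
    """Pair each conformer pair's RMSD with its absolute energy difference.

    energies: one docking energy per conformer.
    rmsd: square matrix (list of lists) of RMSD values between conformers.
    """
    xs, ys = [], []
    for i in range(len(energies)):
        for j in range(i + 1, len(energies)):
            xs.append(rmsd[i][j])
            ys.append(abs(energies[i] - energies[j]))
    return xs, ys


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    energies = {("P1", "L1"): [-8.0, -6.0, -10.0], ("P2", "L2"): [-6.0, -6.2, -6.4]}
    sds, mean_sd, frac = conformer_energy_summary(energies)
    print(mean_sd, frac)
```

With the real data, the threshold of 1 kcal/mol and the mean SD correspond to the two summary figures reported in the Results.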
The binding energy was estimated for every conformer-ligand interaction. When cross-linking proteins against the CoDNaS database, an average of 44 structures per protein were recruited.
Results and discussion. Significant differences in ligand binding energies were obtained between conformers, with energies ranging from -17.84 to 2.80 kcal/mol. Ten percent of the protein-ligand interactions studied present a standard deviation greater than 1 kcal/mol, while the average standard deviation is 0.46 kcal/mol. We found no relation between the RMSD between conformers and the differences in ligand binding energy, suggesting that local structural rearrangements could affect the thermodynamic landscape of ligand binding. Structure-based computational calculations should therefore consider protein conformational diversity in order to improve their accuracy.
References
1. Monzon A, Juritz EI, Fornasari MS, Parisi GD: CoDNaS: a database of Conformational Diversity in the Native State of proteins. Bioinformatics (Oxford, England) 2013, submitted.
2. Trott O, Olson AJ: AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. Journal of Computational Chemistry 2010, 31(2):455-461.
Genomics, functional genomics and metagenomics
Oral Session – Submission 10
Classification of Bovine Coat Color based on Genotype
Diego Comas1,2, Marco Benalcázar1,2,3, Inti Pagnuco1,2, Pablo Corva4, Gustavo Meschino5, Marcel Brun1, Virginia Ballarin1
1 Digital Image Processing Group, Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
2 Consejo Nacional de Investigaciones Científicas y Técnicas, CONICET, Mar del Plata, Argentina
3 Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT), Ecuador.
4 Facultad de Ciencias Agrarias, Universidad Nacional de Mar del Plata, Balcarce, Argentina.
5 Bioengineering Lab, Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Introduction. Several current research projects focus on the creation of haplotype maps that identify and describe common genetic variations in a species. Studies on haplotype maps are key to understanding how natural selection has produced genomic differences between subspecies or populations. A Single Nucleotide Polymorphism (SNP) is a DNA sequence variation occurring commonly within a population (frequency above 1%) in which a single nucleotide in the genome differs between members of a biological species or between paired chromosomes. SNPs located in coding sequences are likely to alter the biological function of a protein, and therefore to have an effect on the phenotype of an individual. Pattern recognition plays an important role in Genomic Signal Processing (GSP) for the detection, prediction, classification, control and statistical modeling of gene networks. One of the goals of GSP is to provide researchers with new hypotheses about biology, which can be used in systems-based applications and in confirmatory experiments [1]. Here we present an application of GSP that uses pattern recognition techniques to find the subsets of SNPs, from a given set, that best predict the coat color phenotype in cattle. Once identified, these SNPs could be used in additional studies to confirm whether they are related to the underlying signaling mechanism that determines the phenotype under study. Variation in the coat color and spotting patterns of cattle has been extensively studied because there is evidence that animals with light-colored hair coats and darkly pigmented skin are better adapted to tropical conditions with high levels of solar radiation [2, 3].
We selected an initial set of 18 SNPs (features, in the language of pattern recognition) linked to the melanocortin 1 receptor (MC1R) gene on bovine chromosome 18, which is involved in regulating hair color [4].
Materials and methods. We used a dataset D_r composed of n = 285 feature-label pairs, where each data vector is formed by 18 features corresponding to the eighteen selected SNPs, located between base pairs 13,776,888 and 13,778,639, the region of chromosome 18 that contains the MC1R gene [4]. The dataset belongs to the Bovine Genome Assembly version Btau-4.0 [5]. It contains 132 black and 153 red hair color samples, in proportions of 0.46 and 0.54, respectively. The goal of this work is to find the best small subset of features (SNPs) that predicts the cattle coat color with high accuracy. The analysis includes the evaluation of the performance of the classifiers designed on those features. The classification rules used in this work are Pyramidal Multiresolution [6], k-Nearest-Neighbor (kNN) [7], Logistic Regression [8], Linear Discriminant Analysis (LDA) [9] and Support Vector Machines (SVMs) [9]. To evaluate the performance of the classifiers designed on the best subset of features, we use the holdout method for error estimation [8]. We randomly split the dataset D_r into 2 disjoint subsets, D_train and D_test, of sizes 185 and 100, respectively, maintaining the class proportions. Using the training dataset D_train, we test all possible combinations of 2, 3, 4 and 5 features from the original set of 18 features, a total of 12,597 feature subsets (153 + 816 + 3,060 + 8,568). We rank these subsets by the error of each classification rule, estimated with K-fold cross-validation [9] with K = 5.
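The search-and-evaluation protocol above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' code: the classifiers themselves are omitted (each of the five rules would be trained on every fit set and scored on the corresponding validation fold), the function names are hypothetical, and only the class sizes and split sizes come from the abstract. The subset count it enumerates reproduces the 12,597 figure.

```python
import random
from itertools import combinations
from math import comb


def feature_subsets(n_features=18, sizes=(2, 3, 4, 5)):
    """Yield every subset of feature indices with one of the given sizes."""
    for k in sizes:
        yield from combinations(range(n_features), k)


def stratified_holdout(labels, n_test, seed=0):
    """Randomly split indices into (train, test), keeping class proportions.

    Per-class test counts are rounded, so for some inputs the test set can
    differ from n_test by one sample.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        k = round(n_test * len(idx) / len(labels))
        test.extend(idx[:k])
        train.extend(idx[k:])
    return train, test


def kfold_splits(indices, k=5, seed=0):
    """Yield (fit, validation) index lists for K-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, validation


def error_rates(y_true, y_pred, positive):
    """Return (error, FPR, FNR) for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            tp += int(p == positive)
            fn += int(p != positive)
        else:
            fp += int(p == positive)
            tn += int(p != positive)
    error = (fp + fn) / len(y_true)
    return error, fp / (fp + tn), fn / (fn + tp)


if __name__ == "__main__":
    print(sum(comb(18, k) for k in (2, 3, 4, 5)))  # 12597
    print(sum(1 for _ in feature_subsets()))       # 12597
    labels = ["black"] * 132 + ["red"] * 153       # class sizes from the abstract
    train, test = stratified_holdout(labels, n_test=100)
    print(len(train), len(test))                   # 185 100
```

With these class sizes the stratified test set holds 46 black and 54 red samples, which matches the denominators implied by the FPR and FNR values in Table 1.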
Finally, once we find the best subset of features for each classification rule, we use that subset to design a classifier using all 185 samples from D_train. The performance of that classifier is computed as its average error over the 100 held-out samples in D_test.
Results. Table 1 shows the results of the classification of the coat color phenotype, based on genomic data from chromosome 18 in the positions corresponding to the MC1R gene. For each of the five classification rules tested (Pyramidal Multiresolution, Logistic Regression, LDA, kNN and SVM), it lists the SNP identifiers obtained in the feature selection stage and the estimates of the error rate, False Positive Rate (FPR) and False Negative Rate (FNR) on the hold-out dataset D_test. The classification rule with the best performance was LDA, with an error of 21%. Among the SNPs selected as the best predictors of the coat color phenotype, four appear most frequently: 'BTA-161389', 'BTA-42498', 'rs29020085' and 'rs29020087'. The accuracies of the other classification rules are all at least 74% (errors between 22% and 26%). It should be noted that Pyramidal Multiresolution is the only rule that needed only 4 SNPs to reach its best performance.
Method | SNP identifiers | Error | FPR | FNR
Pyramidal Multires. | 'BTA-42498', 'rs29011168', 'rs29020087', 'rs29021759' | 23% | 25.92% | 19.56%
Logistic Regression | 'BTA-161389', 'BTA-21794', 'rs29020085', 'rs29021758', 'rs29011163' | 26% | 33.33% | 17.39%
LDA | 'BTA-161389', 'BTA-42498', 'rs29020087', 'rs29021757', 'rs29020085' | 21% | 24.07% | 17.39%
kNN | 'BTA-161389', 'BTA-42498', 'rs29020086', 'rs29020087', 'rs29020085' | 22% | 24.07% | 19.56%
SVM | 'BTA-161389', 'BTA-21794', 'rs29021758', 'rs29020085', 'rs29011168' | 26% | 31.48% | 19.57%
Table 1: Classification results of the coat color phenotype for the 5 classification rules used. Error rate, False Positive Rate (FPR) and False Negative Rate (FNR) are shown.
Conclusions.
According to the results for the five classification rules tested, the best rule, i.e., the one with the lowest classification error, is LDA, with an error of 21%. Although this error rate is not low, it shows the feasibility of this approach for finding biological markers that predict a given phenotype, in this case coat color. The SNPs identified by this approach can guide future biological tests, which should confirm, or not, the influence of these SNPs on the phenotype. Although the influence of the MC1R gene on the primary determination of coat color is already known, this work shows which SNPs located in this gene are more likely to be related to the variations. It is important to note that this phenotype has been shown biologically to be influenced by other genes as well [10]. Because of this, results could be improved (i.e., error rates decreased) by including SNPs from other genes involved in this phenotype. However, a larger initial set of SNPs would make the feature selection process harder and increase the risk of overfitting when designing the classifiers.
Acknowledgment. Diego Comas, Marco Benalcázar and Inti Pagnuco acknowledge support from the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina.
References
1. Ridder D, Ridder J, Reinders M: Pattern recognition in Bioinformatics. Brief Bioinform 2013, 14:633-647.
2. Finch VA, Western D: Cattle colours in pastoral herds: natural selection or social preference. Ecology 1977, 58:1384.
3. Finch VA, Bennett IL, Holmes CR: Coat colour in cattle: effect on thermal balance, behaviour and growth, and relationship with coat type. Journal of Agricultural Science 1984, 102:141-147.
4. Stella A, Ajmone-Marsan P, Lazzari B, Boettcher P: Identification of Selection Signatures in Cattle Breeds Selected for Dairy Production. Genetics 2010, 185:1451-1461.
5.
The Bovine HapMap Consortium: Genome-Wide Survey of SNP Variation Uncovers the Genetic Structure of Cattle Breeds. Science 2009, 324:528-532.
6. Dougherty ER, Barrera J, Mozelle G, Kim S, Brun M: Multiresolution analysis for optimal binary filters. Journal of Mathematical Imaging and Vision 2001, 14:53-72.
7. Rajini NH: Classification of MRI brain images using k-nearest neighbor and artificial neural network. 2011 International Conference on Recent Trends in Information Technology, Chennai, India, 2011, pp 563-568.
8. Devroye L, Györfi L, Lugosi G: A Probabilistic Theory of Pattern Recognition. Berlin Heidelberg: Springer-Verlag; 1996.
9. Duda R, Hart P, Stork D: Pattern Classification. Wiley-Interscience; 2001.
10. Hanna LLH, Sanders JO, Riley DG, Abbey CA, Gill CA: Identification of a major locus interacting with MC1R and modifying black coat color in an F2 Nellore-Angus population. Genetics Selection Evolution 2014, 46:1-8.
Genomics, functional genomics and metagenomics
Oral Session – Submission 49
Bioprospecting of lignocellulolytic enzymes in enriched consortia of pine and eucalyptus forest soils by metagenomic sequencing
Marina D. Reinert
Instituto de Agrobiotecnología Rosario, Rosario, Santa Fé, Argentina
Background. Second generation biofuels are produced by fermenting sugars extracted from agronomic residues into ethanol. Lignocellulose breakdown is a crucial step needed to obtain free sugar molecules. Currently, the bottleneck for second generation biofuel production is the cost of lignocellulolytic enzymes [1, 2]. Our aim is to use metagenomics-based bioprospecting to find novel lignocellulose-degrading proteins and to produce them in a low-cost system that uses plants as biofactories.
Methods. We took soil samples from a Pinus elliottii forest and a Eucalyptus grandis forest in Concordia, Entre Ríos, in February 2012. Both soils contained decaying wood material.
Samples were then used as inoculum for minimal medium [3] containing only carboxymethyl cellulose (CMC) or sawdust as organic matter. Additionally, we used antibiotics or antifungals to prevent the growth of one type of organism in each case. Cultures were maintained for 30 days, and an aliquot of each culture was taken every 10 days. Genomic DNA was extracted from each sample. Amplicon sequencing of the V4 region of the 16S rRNA gene was then performed on the 454 GS-FLX+ (Roche) platform to evaluate the enrichment of lignocellulose-degrading microorganisms. Whole-metagenome sequencing (454 GS-FLX+) was then performed on the most enriched sample (i.e., the one with the highest proportion of taxa described as lignocellulose degraders and the lowest proportion of commensals). Bioprospecting analysis was then performed with bioinformatics tools. First, we performed de novo assembly using the CAMERA [https://portal.camera.calit2.net/gridsphere/gridsphere] assembler workflow. Then we used the MG-RAST [http://metagenomics.anl.gov/] platform for taxonomic and functional annotation. We extracted coding sequences (CDS) using the FragGeneScan open reading frame (ORF) algorithm. We then ran BLAST against the CAZy database [http://www.cazy.org/] to find lignocellulosic enzyme domains in our CDS dataset. A customized Perl script was used to keep only those glycosyl hydrolase and cellulose-binding domains linked with degrading activities [4]. Finally, we selected only those sequences that were consistent with Pfam [http://pfam.xfam.org/], UniProt [http://www.uniprot.org/] and PRIAM [http://priam.prabi.fr/] annotations, had a proper ORF length and did not show high homology with database enzymes (below 80%).
Results. Metagenomic sequencing produced 718,489 reads with an average length of 421 base pairs (bp), totaling 302,172,049 bp. About 10% (30,458,285 bp) of the total bases were assembled into contigs. The longest contig was 523,078 bp.
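The final filtering step in the Methods was done with a custom Perl script that is not shown. Purely as an illustration of that kind of filter, here is a Python sketch operating on BLAST+ tabular output (-outfmt 6, default 12 columns). Several details are hypothetical: the subject-ID format ("accession|family"), the family set, and the ORF-length cut-off. The 80% ceiling comes from the text, and reading "homology" as BLAST percent identity is our assumption.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def select_candidates(blast_tsv, orf_lengths, families,
                      max_identity=80.0, min_orf_length=100):
    """Return CDS IDs whose best hit belongs to `families`, has percent
    identity below `max_identity`, and whose ORF (same units as
    `orf_lengths`) is at least `min_orf_length` long (hypothetical cut-off;
    the abstract only says "proper ORF length")."""
    selected = []
    for query, hit in best_hits(blast_tsv).items():
        family = hit["sseqid"].rsplit("|", 1)[-1]  # hypothetical "accession|family" IDs
        if family not in families:
            continue
        if hit["pident"] >= max_identity:
            continue
        if orf_lengths.get(query, 0) < min_orf_length:
            continue
        selected.append(query)
    return sorted(selected)
```

The cross-checks against Pfam, UniProt and PRIAM annotations would be additional filters of the same shape.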
We manually selected 39 promising proteins with an average length of 644 bp; Figure 1 and Table 1 summarize their identities and domains.
Figure 1: The pie chart shows the abundance of glycosyl hydrolase and cellulose-binding domains in the selected proteins.
Enzyme | EC number | Count
Acetylxylan esterase | 3.1.1.72 | 4
Alpha-glucuronidase | 3.2.1.139 | 1
Alpha-N-arabinofuranosidase | 3.2.1.55 | 6
Beta-glucosidase | 3.2.1.21 | 12
Endo-1,4-beta-xylanase | 3.2.1.8 | 2
Endoglucanase | 3.2.1.4 | 2
Xylan 1,4-beta-xylosidase | 3.2.1.37 | 11
Feruloyl esterase | 3.1.1.73 | 1
Table 1: Selected enzyme activities with their Enzyme Commission (EC) numbers and the abundance of each.
Conclusions. The enrichment process yielded bacterial consortia containing lignocellulose-degrading microorganisms, as previously seen by 16S rRNA amplicon sequencing. However, only through metagenomic sequencing were we able to determine the sequences of the proteins involved in lignocellulose degradation. Proteins were manually annotated and a subset was selected by applying bioinformatics tools. This procedure resulted in a list of 39 promising enzymes, which will be tested experimentally as candidate components of a degrading cocktail.
Acknowledgments. We thank Lic. Soledad Romero and Lic. Bianca Brun for performing all sequencing runs used in this study.
References
1. Naik SN, Goud VV, Rout PK, Dalai AK: Production of first and second generation biofuels: A comprehensive review. Renew Sustain Energy Rev 2010, 14:578–597.
2. Mtui GYS: Recent advances in pretreatment of lignocellulosic wastes and production of value added products. African Journal of Biotechnology 2009, 8:1398–1415.
3. Crawford D, McCoy E: Cellulases of Thermomonospora fusca and Streptomyces thermodiastaticus. Appl Environ Microbiol 1972, 24:150-152.
4. Allgaier M, Reddy A, Park JI, Ivanova N, D’haeseleer P, Lowry P, Sapra R, Hazen TC, Simmons BA, VanderGheynst JS et al.
Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010, 5:372–380.
Proteomics and functional proteomics
Oral Session – Submission 18
Visualization of genetic and proteomic biodiversity in four maturity stages of tomato fruit ripening
Paula B. Macat1,2∗, Leandro Kovalevski2∗, Marta Quaglino2, Guillermo R. Pratta1,3
1 Consejo Nacional de Investigaciones Científicas y Técnicas
2 Instituto de Investigaciones Teóricos y Aplicados, Escuela de Estadística, Facultad de Ciencias Económicas y Estadística UNR, Rosario, Argentina
3 Cátedra de Genética, Facultad de Ciencias Agrarias UNR, Zavalla, Argentina
* These authors contributed equally to this research.
Background. Tomato (Solanum lycopersicum) is a climacteric fruit whose ripening is characterized by sequential changes in protein expression, resulting in different profiles of polypeptide bands at each maturity stage [1]. However, fruits from different tomato genotypes vary in their ripening [2]. Hence, tomato fruit ripening is a biological process affected by multidimensional sources of variation, i.e., maturity stage, genotype and protein expression. Correspondence analysis (CA) is a multidimensional scaling technique that allows rapid visualization of associations among different sources of variation assessed by dichotomous data [3]. CA has been applied in microarray [4] and protein function [5] studies. The aim of this work was to visualize tomato fruit ripening through a CA that allows measuring the relative contribution of different genotypes, maturity stages and polypeptide bands to the total variation observed during the whole process, as a bioinformatic application at the individual level of biological organization.
Materials and methods.
Fruits from 15 genotypes (five Recombinant Inbred Lines, RILs, and their ten diallel Second Cycle Hybrids, SCHs) were screened by SDS-PAGE for 25 polypeptide bands at 4 maturity stages: Mature Green (MG), Breaker (B), Mature Red attached to the plant (MRa) and Mature Red on shelves (MRs), according to [6]. The resulting 15 × 25 × 4 database was analysed first by univariate analysis of the presence of each band (overall and by stage) and then by multivariate CA at each maturity stage. Finally, an integrative CA was performed on the complete database.
Results. The overall presence of all polypeptide bands across the 4 maturity stages and 15 genotypes was 0.52, with values of 0.46 at MG, 0.55 at B, 0.53 at MRa and 0.54 at MRs. The overall presence of individual bands ranged from 0.05 (nearly absent) to 1 (full presence), these extremes corresponding to two particular polypeptides. For most polypeptide bands, presence varied across maturity stages: some polypeptides were more frequent at later maturity stages, while others were present only in earlier stages. CA revealed a higher variation in protein expression among genotypes at MG and MRs, supporting the hypothesis that a broader genetic diversity should be expected for fruit traits that are less exposed to natural selection pressures [6]. The first two dimensions explained 35% of the total variation at MG, the most variable maturity stage for the analysed polypeptide profiles. Two RILs and two SCHs were clearly differentiated from the rest of the genotypes at this stage, and the polypeptide bands most associated with each of these four genotypes showed completely opposite presence patterns (Figure 1). For the other maturity stages, the first two dimensions explained 37% of the total variation at B, 53% at MRa and 48% at MRs.
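The presence summaries and the per-stage CA described above can be sketched with NumPy. The CA shown is the textbook formulation (singular value decomposition of the standardized residuals of a contingency table). The abstract does not say which software was used, and the data in the usage example are random placeholders with the study's 15 × 25 × 4 shape. Laying the full database out as genotypes × (band, stage) for the integrative CA is our assumption.

```python
import numpy as np


def presence_summary(data):
    """data: 0/1 array shaped (genotypes, bands, stages).

    Returns the overall presence, the presence at each stage, and the
    presence of each band (averaged over genotypes and stages)."""
    data = np.asarray(data, dtype=float)
    return data.mean(), data.mean(axis=(0, 1)), data.mean(axis=(0, 2))


def correspondence_analysis(table):
    """Classical CA of a non-negative table without empty rows or columns.

    Returns the share of total inertia per dimension and the principal
    coordinates of rows (e.g. genotypes) and columns (e.g. bands)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                         # row masses
    c = P.sum(axis=0)                         # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)    # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2
    explained = inertia / inertia.sum()       # descending, sums to 1
    rows = U * sv / np.sqrt(r)[:, None]       # row principal coordinates
    cols = Vt.T * sv / np.sqrt(c)[:, None]    # column principal coordinates
    return explained, rows, cols


if __name__ == "__main__":
    # Hypothetical random 0/1 data with the study's shape (15 x 25 x 4).
    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(15, 25, 4))
    data[0, :, :] = 1  # ensure no empty band columns
    data[:, 0, :] = 1  # ensure no empty genotype rows
    overall, per_stage, per_band = presence_summary(data)
    explained, genotype_xy, band_xy = correspondence_analysis(data[:, :, 0])  # e.g. MG
    print(overall, per_stage)
    print("first two dimensions:", explained[:2].sum())
    explained_all, _, _ = correspondence_analysis(data.reshape(15, -1))  # integrative
```

Because each stage contributes the same number of cells, the overall presence equals the mean of the four per-stage values, which is consistent with the 0.52 figure reported above.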
The most divergent genotypes and their associated polypeptides varied with maturity stage, verifying that ripening is jointly affected by the three sources of variation considered in this report, i.e., it is a multidimensional biological process. Integrative CA identified one hybrid as the most variable individual along ripening, and seven polypeptide bands highly associated with its discrepant performance relative to the other genotypes of the diallel cross.
Figure 1: Position of 25 polypeptide bands (PP) and 15 genotypes (RILs indicated as LN and SCHs as LNx × LNy, N being the number assigned to each RIL by the tomato breeders who obtained them) according to CA at the MG maturity stage.
Conclusions. Visualization of tomato fruit ripening at four maturity stages allowed measuring the relative contribution of genetic and proteomic diversity to this multidimensional biological process. The bioinformatic application at the individual level of organization was efficient for identifying the most variable genotypes and their associated polypeptide bands at each maturity stage and along the complete ripening process.
References
1. Giovannoni JJ: Genetic regulation of fruit development and ripening. The Plant Cell 2004, 16:S160-76.
2. Rodriguez GR, Sequin L, Pratta GR, Zorzoli R, Picardi LA: Protein profiling in F1 and F2 generations of two tomato genotypes differing in ripening time. Biologia Plantarum 2008, 52:548-52.
3. Lebart L, Morineau A, Warwick KM: Multivariate descriptive statistical analysis. Chichester: John Wiley & Sons Ltd; 1984.
4. Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M: Correspondence analysis applied to microarray data. PNAS 2001, 98:10781-86.
5.
Chang JM, Taly JF, Erb I, Sung TY, Hsu WL, Tang CY, Notredame C, Su ECY: Efficient and interpretable prediction of protein functional classes by Correspondence Analysis and Compact Set Relations. PLoS ONE 2013, 8:e75542. doi:10.1371/journal.pone.0075542.
6. Marchionni Basté E, Pereira da Costa JH, Rodríguez GR, Zorzoli R, Pratta GR: Genetic analysis of tomato fruit ripening at polypeptide profiles level through quantitative and multivariate approaches. American Journal of Plant Sciences 2014, 5:1926-35.
Poster Session
Sequence analysis
Poster Session – Submission 9
Physiological, genomic and proteomic evidence supports the high UV resistance profile of Acinetobacter sp. Ver3 isolated from High Altitude Andean Lakes
Daniel Kurth1, Virginia Helena Albarracin1,2, Carolina Belfiore1, Marta Gorriti1, Maria Eugenia Farias1
1 Planta Piloto de Procesos Industriales y Microbiológicos (PROIMI-CONICET), S. M. de Tucumán, 4000, Tucumán, Argentina.
2 Facultad de Ciencias Naturales e Instituto Miguel Lillo, Universidad Nacional de Tucumán, S. M. de Tucumán, 4000, Tucumán, Argentina.
High-Altitude Andean Lakes (HAAL) are a group of scattered shallow lakes and salterns located in the Dry Central Andes region of South America at altitudes above 3,000 m. They are exposed to a unique combination of severe conditions, i.e., high global solar and UV irradiation, hypersalinity, wide daily temperature fluctuations, desiccation, high pH and high concentrations of toxic elements, including arsenic [1]. As HAAL are considered among the most UV-exposed environments on Earth, HAAL microbes can be taken as model systems to study UV-resistance mechanisms in environmental bacteria at various levels of complexity. Acinetobacter sp.
Ver3, a gammaproteobacterium isolated from Laguna Verde (4,400 m), was recently proposed as a model UV-resistant microbe with a highly efficient DNA damage photorepair ability [2] and an efficient catalase machinery [3]. Here we present the genome sequence analysis of this extremophile, together with further experimental evidence supporting the idea that this bacterium is able to cope with more DNA damage than sensitive strains. The genome analysis provided insight into the taxonomic classification of this organism, suggesting that it may be a new species, and allowed the identification of resistance genes related to its harsh environment. Moreover, a “UV-resistome” was defined, encompassing genes related to the repair of UV damage to DNA (such as nucleases and glycosylases from excision repair systems) and genes conferring an enhanced capacity for scavenging the reactive molecular species responsible for oxidative damage (catalases, peroxidases and SODs). In addition, the UV response was studied at the proteomic level, which confirmed the involvement of a specific cytoplasmic catalase, a putative regulator, and proteins associated with amino acid and protein synthesis, among others. However, only a small number of proteins were overexpressed under UV stress, suggesting that the resistance of this bacterium might be due to efficient constitutively expressed systems.
References
1. Farias ME, Poiré DG, Arrouy MJ, Albarracín VH: Modern stromatolite ecosystems at alkaline and hypersaline high-altitude lakes in the Argentinean Puna. In STROMATOLITES: Interaction of Microbes with Sediments. Volume 18. Edited by Tewari V, Seckbach J. Dordrecht: Springer Netherlands; 2011:427–441. [Cellular Origin, Life in Extreme Habitats and Astrobiology]
2. Albarracín VH, Pathak GP, Douki T, Cadet J, Borsarelli CD, Gärtner W, Farias ME: Extremophilic Acinetobacter strains from high-altitude lakes in Argentinean Puna: remarkable UV-B resistance and efficient DNA damage repair.
Orig Life Evol Biosph 2012, 42:201–21.
3. Di Capua C, Bortolotti A, Farías ME, Cortez N: UV-resistant Acinetobacter sp. isolates from Andean wetlands display high catalase activity. FEMS Microbiol Lett 2011, 317:181–9.
Sequence analysis
Poster Session – Submission 12
Analysis of the Uniprot repertoire of amino acid post-translational modifications
Nicolás A. Méndez, Ignacio E. Sánchez
Protein Physiology Laboratory, Departamento de Química Biológica and IQUIBICEN-CONICET, Universidad de Buenos Aires
Background. The standard genetic code only accounts for the 20 most common amino acid residues. However, many amino acids in proteins are modified posttranslationally. Thus, current sequence representations for manual or in silico analysis provide incomplete information. We set out to describe the currently known posttranslational modifications in terms of prevalence, phylogenetic distribution and their relationship with the chemical reactivity of the modified standard amino acid.
Materials and methods. We acquired the Uniprot list of posttranslational modifications (2014-03 release) and transferred it to a MySQL database using Python code. We queried the database to evaluate the distribution of posttranslational modifications in the three domains of life and to count the prevalence of each posttranslational modification. As a proxy for chemical reactivity, we used the estimate of T. Krick et al. [1]. Regression analysis was performed using standard R functions. The p-values for the calculated coefficients of determination were obtained by permutation tests, as implemented in P. Legendre's multRegress R function [2]. Scatterplots were constructed using R. The Venn diagram illustrating the distribution of posttranslational modifications was constructed using the BioVenn software [3].
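The idea behind obtaining a p-value for R² by permutation can be illustrated in Python. This is neither Legendre's multRegress function nor the authors' R code; it is a minimal sketch for a single predictor that shuffles the response, recomputes R² each time, and counts how often the shuffled value reaches the observed one. As example input it uses the reactivity (decay) and total-PTM columns of Table 1.

```python
import numpy as np

# Columns transcribed from Table 1, residues in the order
# A C D E F G H I K L M N P Q R S T V W Y.
DECAY = np.array([1, 30, 9, 5, 4, 1, 14, 2, 8, 2, 13, 10, 3, 8, 4, 6, 6, 2, 12, 7],
                 dtype=float)
TOTAL_PTMS = np.array([17, 105, 22, 22, 10, 47, 20, 15, 54, 11, 18, 29, 17, 14, 24,
                       55, 38, 11, 19, 35], dtype=float)


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


def permutation_test_r2(x, y, n_perm=9999, seed=0):
    """Return (observed R^2, permutation p-value) obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y)
    hits = sum(r_squared(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    # the observed arrangement is counted as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    r2, p = permutation_test_r2(DECAY, TOTAL_PTMS)
    print(f"R^2 = {r2:.3f}, permutation p = {p:.4f}")
```

Counting the observed arrangement as one permutation keeps the p-value strictly positive, a common convention for permutation tests.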
Figure 1: Distribution of posttranslational modifications in the three domains of life, expressed as percentage of total (Bacteria in yellow, Archaea in magenta and Eukarya in grey).
Results. We found 466 unique posttranslational modifications in the Uniprot ontology. Note that glycation, lipidation, disulfide bridges and crosslinks are not included in the Uniprot posttranslational modification ontology and were therefore not considered at this stage of the analysis. We quantified the number of posttranslational modifications for each of the 20 standard amino acids and the number of modifications involving one, two, or three or more residues (Table 1). We also examined the distribution of posttranslational modifications in the three domains of life (Figure 1). Last, we quantified the correlation between the number of posttranslational modifications per standard amino acid and the chemical reactivity of each amino acid.
Decay (1/time) | Residue | Total PTMs | #PTM involving 1 aa | #PTM involving 2 aa | #PTM involving 3+ aa
1 | A | 17 | 12 | 5 | -
30 | C | 105 | 42 | 57 | 6
9 | D | 22 | 15 | 7 | -
5 | E | 22 | 15 | 7 | -
4 | F | 10 | 5 | 5 | -
1 | G | 47 | 18 | 29 | -
14 | H | 20 | 15 | 5 | -
2 | I | 15 | 11 | 4 | -
8 | K | 54 | 33 | 21 | -
2 | L | 11 | 8 | 3 | -
13 | M | 18 | 11 | 5 | 2
10 | N | 29 | 14 | 15 | -
3 | P | 17 | 11 | 6 | -
8 | Q | 14 | 9 | 5 | -
4 | R | 24 | 19 | 5 | -
6 | S | 55 | 31 | 18 | 6
6 | T | 38 | 22 | 16 | -
2 | V | 11 | 7 | 4 | -
12 | W | 19 | 12 | 5 | 2
7 | Y | 35 | 21 | 12 | 2
Table 1: Quantification of posttranslational modifications of standard amino acids. The leftmost column shows the reactivity estimate from [1]; the columns to the right show the total number of PTMs for each standard residue and the number of PTMs involving one, two, or three or more residues.
Conclusions. We propose that the standard amino acid alphabet should be expanded to include the diverse universe of posttranslational modifications. Since including all posttranslational modifications seems impractical, quantitative prevalence data will be needed to decide which posttranslational modifications are most important.
The results are likely to differ among the three domains of life and may be explained in part by the chemical reactivity of the standard amino acids.
References
1. Krick T, Shub DA, Verstraete N, Ferreiro DU, Alonso LG, Shub M, Sanchez IE: Amino acid metabolism conflicts with protein diversity. arXiv:1403.3301 [q-bio.PE] 2014.
2. Legendre P: R language functions. http://adn.biol.umontreal.ca/~numericalecology/Rcode/
3. Hulsen T, de Vlieg J, Alkema W: BioVenn: a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 2008, 9:488.
Sequence analysis
Poster Session – Submission 19
Spatial organization and distribution of linear motifs in the Ankyrin repeat protein family and its binding partners
Nina Verstraete, Ignacio E. Sánchez, Diego U. Ferreiro
Universidad de Buenos Aires, Departamento de Quimica Biologica - IQUIBICEN-CONICET, Laboratorio de Fisiologia de Proteinas.
Background. Interactions between proteins regulate cellular physiology. Many of these interactions involve the recognition of short peptide regions (short linear motifs, SLiMs) that can be characterized by simple sequence patterns and are usually found in intrinsically disordered regions or in loops connecting globular or transmembrane domains. These peptide-domain interactions are typically transient and often involve folding upon binding, challenging the lock-and-key paradigm of protein recognition. Ankyrin-repeat domains are among the most frequently observed protein-protein interaction domains in nature. These domains are composed of tandem arrays of repeated amino acid motifs that cooperatively fold into elongated structures that mediate molecular recognition with high specificity. Many ankyrin-binding sites are either predicted or demonstrated to correspond to extended peptides mimicking SLiMs.
Description.
We present an exhaustive analysis of linear motif identification in Ankyrin proteins and their binding partners. We searched for SLiMs that are enriched or depleted in the Ankyrin protein family and its partners relative to a random exploration of sequence space. We also analyzed the spatial distribution of SLiMs along the protein sequences and describe how particular SLiMs are structurally distributed in Ankyrin-containing proteins.

Conclusions. This computational work presents sequence- and structure-based approaches to analyze linear-motif-mediated protein interactions in the Ankyrin repeat protein family. We discuss how functional constraints can conflict with the folding dynamics of Ankyrin-repeat domains, which in turn modulate the evolution of biological interactions.

Sequence analysis Poster Session – Submission 21
Segmentation of continuous-range random variable sequences using entropic distances
Miguel A. Ré1,2, José L. Martínez1
1 Facultad Regional Córdoba, Universidad Tecnológica Nacional, Maestro López y Cruz Roja Argentina, Ciudad Universitaria, 5010 Córdoba
2 Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende, Ciudad Universitaria, 5010 Córdoba

The Jensen-Shannon divergence (JSD), a symmetrized version of the Kullback-Leibler divergence [1], quantifies the difference between probability distributions. It has been widely applied to the analysis of symbolic sequences by comparing the symbol composition of different subsequences [2]. One main advantage of JSD is that it does not require mapping the symbolic sequence to a numerical sequence, as is necessary, for instance, in spectral or correlation analyses. JSD has been widely employed to detect domain walls in discrete sequences, for instance in the segmentation of genomic sequences [3].
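The segmentation idea above can be made concrete with the length-weighted divergence used by Grosse et al. [2]: for a cut that splits a sequence of length n into two parts with symbol distributions P1 and P2 and lengths n1 and n2, D = H(pi1 P1 + pi2 P2) - pi1 H(P1) - pi2 H(P2), where pi_i = n_i / n and H is the Shannon entropy; the cut that maximizes D is the candidate domain wall. The Python sketch below implements only this single-cut step (no recursive splitting or significance test). For real-valued series it applies a hypothetical discretization that thresholds each value at the global sample mean; this is a stand-in for, not a reproduction of, the mean/variance-based variable proposed in this work.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of symbol counts."""
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def weighted_jsd(left, right):
    """D = H(pi1*P1 + pi2*P2) - pi1*H(P1) - pi2*H(P2), with pi_i = n_i / n."""
    n1, n2 = len(left), len(right)
    pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)
    c1, c2 = Counter(left), Counter(right)
    # Pooled counts correspond to the mixture distribution pi1*P1 + pi2*P2
    return entropy(c1 + c2) - pi1 * entropy(c1) - pi2 * entropy(c2)


def best_split(symbols, min_size=2):
    """Scan all cut points; return (position, D) of the maximal divergence."""
    best_pos, best_d = None, -1.0
    for i in range(min_size, len(symbols) - min_size + 1):
        d = weighted_jsd(symbols[:i], symbols[i:])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def discretize_by_mean(values):
    """Hypothetical discretization: 1 if a value exceeds the global sample mean, else 0."""
    mu = sum(values) / len(values)
    return [1 if v > mu else 0 for v in values]


if __name__ == "__main__":
    print(best_split("AAAAAAAAAATTTTTTTTTT"))  # symbolic sequence
    series = [0.0, 0.2, 0.1, 0.3] * 5 + [2.0, 2.1, 1.9, 2.2] * 5
    print(best_split(discretize_by_mean(series)))  # continuous sequence
```

Weighting each part by its length makes D equal to the weighted sum of the Kullback-Leibler divergences of P1 and P2 from their mixture, which is why it behaves as a symmetrized KL measure.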
JSD has been generalized in different ways, by considering non-extensive entropies [4,5] or higher-order correlations in subsequences through Markov models [6,7]. Although JSD is well defined for continuous distributions, it has been used less extensively for segmenting continuous sequences. Its application is nevertheless of interest in the separation of quantum states or the analysis of polarization images [8,9]. In this communication we present an alternative method for segmenting sequences of continuous random variables, in which a new discrete variable is defined from the sample mean and/or variance. The applicability of the method is assessed on artificially generated continuous sequences.

References
1. Kullback S, Leibler R: On information and sufficiency. Ann. Math. Stat. 1951, 22: 79-86.
2. Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley H: Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys. Rev. E 2002, 65: 041905 1-16. And references therein.
3. Arvey A, Azad R, Raval A, Lawrence J: Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Research 2009, 1-12.
4. Tsallis C: Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52: 479-487.
5. Lamberti P, Majtey A: Non-logarithmic Jensen-Shannon divergence. Physica A 2003, 329: 81-90.
6. Thakur V, Azad R, Ramaswamy R: Markov models of genome segmentation. Phys. Rev. E 2007, 75: 011915 1-10.
7. Ré MA, Azad RK: Generalization of entropy based divergence measures for symbolic sequence analysis. PLoS ONE 2014, 9(4): e93532. doi:10.1371/journal.pone.0093532.
8. Jacques SL, Roman JR, Lee K: Imaging superficial tissues with polarized light. Lasers Surg. Med. 2000, 26: 119-129.
9. Tannous Z, Al-Arashi M, Shah S, Yaroslavsky A: Delineating melanoma using multimodal polarized light imaging.
Lasers Surg. Med. 2009, 41: 10-16.

Sequence analysis Poster Session – Submission 38
Unveiling evolutionary signals in protein-protein interaction interfaces
Elin Teppa1, Diego Javier Zea2, Ariel Berenstein1 and Cristina Marino Buslje1
1 Structural Bioinformatics, Fundación Instituto Leloir
2 Structural Bioinformatics Group, Universidad Nacional de Quilmes

Protein-protein interactions are involved in most cellular processes. Studying protein interactions from an evolutionary perspective is challenging, since it is difficult to distinguish evolutionary constraints due to the preservation of protein structure and function from those that arise from the interaction itself. The description and detection of evolutionary signals in protein-protein interactions is currently a very active field of research. Interface residues take part in intermolecular interactions and are structurally and functionally constrained; they are therefore subject to a selection pressure that could be detected in homologous sequences. However, residue conservation within the interface is far from obvious in many cases, and the signal is usually weak. One reason is that the evolutionary pressure is not homogeneous within an interface (1). The coevolutionary signal between residues has also been explored for detecting interacting residues, with limited success (2). A decomposition of the interface has been proposed in which a core of buried residues is surrounded by a rim of residues whose atoms retain some solvent accessibility (3,4). From a functional point of view, interface core and rim residues contribute differently to the binding energy and are consequently under different selection pressures (5).

Figure 1: Boxplot of conservation (C) and cumulative MI (cMI) scores by protein region: interface core (IC), interface rim (IR), protein core (PC) and protein surface (PS).
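The core/rim decomposition and the coevolution score compared in Figure 1 can be sketched in Python under stated assumptions. Residues are assigned to PC, PS, IC or IR from their relative solvent accessibility (RSA) in the isolated chain and in the complex, using a hypothetical 25% RSA cutoff; MI is the plain mutual information between two alignment columns, without the corrections commonly applied to MI in practice; and cMI is taken here to be the sum of a position's pairwise MI values above a cutoff. None of these thresholds is given in the abstract.

```python
import math
from collections import Counter
from itertools import combinations


def classify_region(rsa_free, rsa_complex, cutoff=0.25, min_loss=0.0):
    """Assign PC, PS, IC or IR from relative solvent accessibility (0-1)
    in the isolated chain (rsa_free) and in the complex (rsa_complex)."""
    if rsa_free - rsa_complex > min_loss:  # accessibility lost on binding: interface
        return "IC" if rsa_complex < cutoff else "IR"
    return "PC" if rsa_complex < cutoff else "PS"


def mutual_information(col_a, col_b):
    """Plain (uncorrected) MI, in nats, between two alignment columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    # p(a,b) * log(p(a,b) / (p(a) p(b))) written with raw counts
    return sum((c / n) * math.log(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())


def cumulative_mi(msa, cutoff):
    """Per-position sum of pairwise MI values above cutoff (assumed cMI definition)."""
    length = len(msa[0])
    cols = [[seq[j] for seq in msa] for j in range(length)]
    cmi = [0.0] * length
    for i, j in combinations(range(length), 2):
        mi = mutual_information(cols[i], cols[j])
        if mi > cutoff:
            cmi[i] += mi
            cmi[j] += mi
    return cmi


if __name__ == "__main__":
    print(classify_region(0.45, 0.05))  # buried on binding: interface core
    print(cumulative_mi(["ACG", "ACT", "DEG", "DET"], cutoff=0.1))
```

In the toy alignment the first two columns covary perfectly and receive cMI equal to ln 2, while the third column varies independently and scores zero.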
Here we present a detailed study of protein-protein interactions using a comprehensive dataset of biological unit complexes (6). We dissected each interacting unit into four regions, protein core (PC), protein surface (PS), interface core (IC) and interface rim (IR), based on the change in solvent accessibility upon complex formation and on the relative solvent accessibility in the complex. The results show no substantial difference between PC and IC, nor between PS and IR, regarding conservation and coevolution. We also found that a coevolution-derived measure, cumulative mutual information (cMI) (7), shows a greater difference between IC and IR than residue conservation does (see Figure 1). Regarding the conservation and coevolution signals of residues involved in different numbers of interfaces, we found that their conservation increases with the number of interacting partners while their cMI score decreases (see Figure 2).

Figure 2: Boxplot of conservation and cMI scores by the number of interfaces in which an interface residue participates.

References
1. Guharoy M, Chakrabarti P. Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics. 2010;11:286.
2. Mintseris J, Weng Z. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):10930-5.
3. Bogan AA, Thorn KS. Anatomy of hot spots in protein interfaces. J Mol Biol. 1998 Jul 3;280(1):1-9.
4. Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol. 1999 Feb 5;285(5):2177-98.
5. Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15447-52.
6. Bickerton GR, Higueruelo AP, Blundell TL.
Comprehensive, atomic-level characterization of structurally characterized protein-protein interactions: the PICCOLO database. BMC Bioinformatics. 2011 Jul 29;12(1):313.
7. Marino Buslje C, Teppa E, Di Doménico T, Delfino JM, Nielsen M. Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol. 2010 Nov 4;6(11):e1000978.

Sequence analysis Poster Session – Submission 40
Tools for the visualization of quality parameters and information of targeted sequencing data
Nathalie B. Vicente1, Gabriela Merino2, Juan M. Sendoya3, Javier Oliver3, Federico Prada1, Elmer Fernández2, Andrea Llera3
1 UADE, 2 BDMG, 3 Leloir

Next-generation sequencing (NGS) falls squarely within the big data paradigm. Easy visualization and integration of the large amounts of information produced by NGS is paramount for interpreting results and for more complex analyses. We have developed simple tools for the user-friendly visualization of sequencing quality parameters and variant calling analyses derived from targeted sequencing projects, particularly for the Ion Torrent platform (Life Technologies). In these experiments, multiplex PCR reactions are designed to amplify specific regions of the genome, and only those regions (amplicons) are subsequently sequenced. The tools were built as standalone Java applications aimed at end users without advanced computing skills. In the present work, these programs were used to analyze the results of a targeted sequencing experiment on human cancer cell lines, yet they can easily be adapted for use in other pathologies and genetic/molecular studies.
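Because only amplicons are sequenced, the basic quality summary behind a coverage heatmap is the read count per amplicon, aggregated per gene for each sample. The original tools are written in Java; the Python sketch below only illustrates that bookkeeping. The four colour classes mirror the poor, moderate, well and exceptionally well categories used in this work, but the read-count boundaries, sample names, amplicon names and gene assignments are hypothetical.

```python
from collections import defaultdict

# Hypothetical upper read-count bounds for the first three colour classes
CLASSES = [(100, "poor"), (500, "moderate"), (2000, "well")]


def classify_depth(reads):
    """Map a read count to a heatmap colour class."""
    for upper, label in CLASSES:
        if reads < upper:
            return label
    return "exceptionally well"


def summarize(amplicon_reads, amplicon_gene):
    """amplicon_reads: sample -> {amplicon: reads}; amplicon_gene: amplicon -> gene.
    Returns (reads per gene per sample, colour class per amplicon per sample)."""
    gene_reads, amplicon_class = {}, {}
    for sample, counts in amplicon_reads.items():
        per_gene = defaultdict(int)
        for amplicon, reads in counts.items():
            per_gene[amplicon_gene[amplicon]] += reads
        gene_reads[sample] = dict(per_gene)
        amplicon_class[sample] = {a: classify_depth(r) for a, r in counts.items()}
    return gene_reads, amplicon_class


if __name__ == "__main__":
    reads = {"S1": {"amp1": 50, "amp2": 700, "amp3": 2500}}  # hypothetical sample
    genes = {"amp1": "KRAS", "amp2": "KRAS", "amp3": "TP53"}  # hypothetical panel
    print(summarize(reads, genes))
```

The two returned dictionaries map directly onto heatmap rows (samples) and columns (genes or amplicons); a per-primer-pool view would only need a second grouping key.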
First, a heatmap was designed to analyze amplicon and gene depth coverage at a glance, allowing a fast examination of the performance of the library construction process, particularly of the number of reads per amplicon, per gene and per primer panel pool. This analysis uses color coding to distinguish easily between poorly, moderately, well and exceptionally well performing samples. Additionally, a Circos plot was used to compare sequencing variants graphically across multiple samples. These circular diagrams can be used for various purposes, such as displaying the distribution of single-nucleotide polymorphisms (SNPs) or multiple pairwise comparisons (for instance, between cancer and normal, experimental and control, or pre-treatment and post-treatment samples). In conclusion, we have developed tools for the easy and user-friendly visualization of sequencing quality parameters and information from targeted sequencing experiments, for use in basic and clinical research.

Sequence analysis Poster Session – Submission 50
Using coevolution to improve protein subfamily classification
Franco L Simonetti1, Martin Banchero1, Ariel J Berenstein2, Ariel Chernomoretz2, Cristina Marino Buslje1
1 Bioinformatics Unit, Fundación Instituto Leloir, Capital Federal, Argentina.
2 Integrative Systems Biology Group, Fundación Instituto Leloir, Capital Federal, Argentina.

Background. The common approach to protein subfamily classification relies on grouping protein sequences according to their degree of similarity. However, there is no single sequence similarity threshold that accurately groups sequences into isofunctional groups. Most methods use protein superfamilies as a starting point for subfamily classification. Superfamilies are defined as sets of homologous proteins in which conserved sequence or structural characteristics can be associated with conserved functional characteristics.
Superfamily members can be highly divergent and catalyze quite different overall reactions. A subfamily is defined as a set of homologous proteins within a superfamily that perform the same function by the same mechanism. Current subfamily classification methods use bottom-up clustering to construct a cluster hierarchy and then cut the hierarchy at the most appropriate locations to obtain a single partition [1, 2]. These methods usually integrate data such as protein sequence similarity, residue conservation within groups and HMM profiles. Moreover, they often predict a large number of subfamilies with few members and limited biological meaning. The goal of this study is to identify subsets of closely functionally related sequences within a given superfamily. Since all proteins within a superfamily share a common ancestor, we hypothesize that functional diversity within superfamilies has arisen through a series of concerted changes that must have left an identifiable coevolutionary signal.

Materials and methods. The challenge is to separate the coevolutionary signals of the different subfamilies and use them in the process of subfamily classification. This information can be used to guide hierarchical clustering. Our approach uses mutual information to calculate covariation [3], together with commonly used clustering methods based on sequence similarity. We defined a selected group of superfamilies from the Structure-Function Linkage Database as our gold standard dataset [4].

Results. Different approaches were considered for integrating mutual information data into sequence clustering. Since mutual information can only be calculated for a group of sequences, a preliminary sequence clustering is performed. Using covariation data alone, our method can cluster groups of sequences from the same subfamily. For a complete clustering solution, it performs almost as well as hierarchical clustering based on sequence similarity.
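For reference, the sequence-similarity baseline against which the method is compared can be sketched as agglomerative (average-linkage) clustering of aligned sequences, stopped at a distance cutoff so that the hierarchy is cut into a single partition. The p-distance, the linkage choice and the cutoff value are assumptions for illustration; this is not the authors' coevolution-based procedure.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(seqs, cutoff):
    """Bottom-up clustering that merges the closest pair of clusters (mean pairwise
    distance) until the closest pair is farther apart than cutoff.
    Returns the partition as lists of sequence indices."""
    d = {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)}

    def dist(c1, c2):
        return sum(d[min(i, j), max(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        if dist(clusters[a], clusters[b]) > cutoff:
            break  # cutting the hierarchy here yields the final partition
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    toy = ["AAAAAAAA", "AAAAAAAT", "CCCCCCCC", "CCCCCCCG"]
    print(average_linkage(toy, cutoff=0.3))
```

The cutoff plays the role of the single similarity threshold that, as noted in the Background, does not in general separate isofunctional groups.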
The next step will be to integrate both methods.

Conclusions. Automated protein classification remains an active topic of research, and state-of-the-art methods are far from predicting biologically meaningful results. Covariation data have never been used before in this context, and further analyses are needed to improve the method.

References
1. Lee DA, Rentzsch R, Orengo C. GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains. Nucleic Acids Res. 2010 Jan;38(3):720-37. doi: 10.1093/nar/gkp1049. Epub 2009 Nov 18.
2. Brown DP, Krishnamurthy N, Sjölander K. Automated protein subfamily identification and classification. PLoS Comput Biol. 2007 Aug;3(8):e160.
3. Buslje CM, Santos J, Delfino JM, Nielsen M. Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics. 2009 May 1;25(9):1125-31. doi: 10.1093/bioinformatics/btp135. Epub 2009 Mar 10.
4. Akiva E, et al., Babbitt PC. The Structure-Function Linkage Database. Nucleic Acids Res. 2014 Jan 1;42:D521-30.

System Biology and Networks Poster Session – Submission 28
Improving Rule-Based Gene Regulatory Network Inference by means of Biclustering
Cristian A. Gallo1, Jessica A. Carballido1, Ignacio Ponzoni1,2
1 Laboratory for Research and Development in Scientific Computing (LIDeCC), DCIC, UNS, Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química, CONICET-UNS, Bahía Blanca, Argentina

Background. Gene regulatory networks (GRNs) play an important role in the progression of life phenomena such as the cell cycle, development, aging, and the progressive and recurrent pathogenesis of complex diseases, among others.
Gene expression time-series data are becoming increasingly available, providing the opportunity to reverse engineer the time-delayed gene regulatory networks that govern most of these molecular processes. In this context, data mining methods are suitable approaches for inferring the relational structure of a GRN [1].

Methods. The aim of the research presented here is to enhance GRNs inferred by association rules from multiple microarray time-series datasets. To this end, a rule-based inference algorithm (GRNCOP2) [2] was combined with a biclustering technique (BiHEA) [3] in order to increase the useful information extracted from the datasets. The association rules establish causal links between two genes, where the semantics and interpretation depend on the input data and on the type of rule inferred. This provides a global view of the relation between each pair of genes, since it considers all the data available in the expression profiles. The biclustering algorithm, on the other hand, can extract co-expression relations (similar or opposite) between genes that may occur only in a subset of the experimental conditions, yielding additional associations from a local view of the data that the main inference algorithm may not capture. To combine both methods, a pairwise analysis is performed to extract association rules from the biclusters obtained from all the datasets, and the best rules are added to the GRN inferred by the rule-based method. The proposed approach was applied to time-series datasets [4, 5] comprising twenty yeast genes that are highly relevant to the study of the cell cycle, and the results were analyzed in terms of the novelty and soundness of the rules provided by the biclustering algorithm.
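To make the notion of a time-lagged association rule concrete, the Python sketch below derives rules of the form "gene a at time t implies gene b at time t + lag" from discretized expression profiles, and scores a rule set against a set of known gene-gene associations together with the accuracy expected from randomly chosen pairs. It is not GRNCOP2 or BiHEA: the mean-threshold discretization, the confidence cutoff and the undirected pair-based scoring are hypothetical simplifications.

```python
from itertools import combinations


def discretize(profile):
    """Hypothetical discretization: +1 above the profile mean, -1 otherwise."""
    mu = sum(profile) / len(profile)
    return [1 if x > mu else -1 for x in profile]


def lagged_rules(expr, lag=1, min_confidence=0.8):
    """expr: gene -> expression values over time (lag >= 1).
    Returns (source, target, kind, confidence) tuples."""
    states = {g: discretize(p) for g, p in expr.items()}
    rules = []
    for a, sa in states.items():
        for b, sb in states.items():
            if a == b:
                continue
            pairs = list(zip(sa[:-lag], sb[lag:]))  # a at t, b at t + lag
            agree = sum(x == y for x, y in pairs) / len(pairs)
            if agree >= min_confidence:
                rules.append((a, b, "activation", agree))
            elif 1 - agree >= min_confidence:
                rules.append((a, b, "inhibition", 1 - agree))
    return rules


def benchmark_accuracy(rules, known_pairs):
    """Fraction of inferred gene pairs (direction ignored) found among known associations."""
    hits = sum(frozenset((a, b)) in known_pairs for a, b, *_ in rules)
    return hits / len(rules)


def random_accuracy(genes, known_pairs):
    """Expected accuracy of a rule drawn uniformly at random among all gene pairs."""
    pairs = [frozenset(p) for p in combinations(genes, 2)]
    return sum(p in known_pairs for p in pairs) / len(pairs)


if __name__ == "__main__":
    expr = {"G1": [1, 9, 1, 9, 1, 9], "G2": [5, 1, 9, 1, 9, 1]}  # toy profiles
    print(lagged_rules(expr))
```

Comparing benchmark_accuracy with random_accuracy is the kind of contrast between inferred rules and random selection that the Results discuss.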
To assess the soundness of the rules, their average accuracy was measured against Yeastnet [6], a freely available database of associations between yeast genes. Figure 1 shows the average accuracy of the rules obtained by the rule-based approach, the biclustering algorithm and the combination of both methods, together with the expected accuracy if the rules were picked at random. Figure 2 shows the network obtained by the rule-based approach alone and the same network enhanced by the rules obtained through the biclustering algorithm. As can be observed, the rule sets inferred by the two algorithms and the combined results achieve high accuracy on the Yeastnet benchmark database, performing above random selection, as expected. Although the rules inferred by the biclustering algorithm are less accurate than those extracted by the rule-based approach, they represent new potential relations that were not discovered by the main inference algorithm, thus enhancing the overall inference capabilities.

Conclusions. In this work, we have introduced an approach that integrates the results of a rule-based method with a biclustering algorithm for the inference of gene regulatory networks. The method was validated with well-known, publicly available gene expression datasets. The results show that the combined approach infers a gene regulatory network with high average accuracy with respect to the Yeastnet database, providing new relations that were not present in the GRN inferred by the rule-based method alone. This shows the importance of combining different approaches in gene regulatory network inference, since they provide alternative views of the data and allow the discovery of significant relations that may not be detectable by any single approach. Further analysis is required to confirm these promising results.

Acknowledgments.
This work is kindly supported by CONICET grant PIP 112-2012-0100471CO and UNS grant PGI 24/N032.

Figure 1: Average Yeastnet accuracy of the rules inferred by the rule-based approach (GRNCOP2), the biclustering algorithm (BiHEA), the combined results and a random selection.

Figure 2: Gene regulatory networks inferred by the algorithms. Red arrows represent gene activation, whereas blue arrows represent gene inhibition. Left: rule-based gene network inferred by GRNCOP2. Right: rule-based gene network enhanced by the new relations inferred by BiHEA. The new rules are shown in light blue and light red.

References
1. Gallo CA, Carballido JA, Ponzoni I: Inference of Gene Regulatory Networks based on Association Rules. In Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Edited by Elloumi M, Zomaya AY. John Wiley & Sons; 2013.
2. Gallo CA, Carballido JA, Ponzoni I: Discovering Time-Lagged Rules from Microarray Data using Gene Profile Classifiers. BMC Bioinformatics 2011, 12:123.
3. Gallo CA, Carballido JA, Ponzoni I: BiHEA: A Hybrid Evolutionary Approach for Microarray Biclustering. Lecture Notes in Bioinformatics, Springer-Verlag 2009, 5676:36-47.
4. Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module Networks: Identifying Regulatory Modules and Their Condition-Specific Regulators from Gene Expression Data. Nature Genetics 2003, 34:166-176.
5. Yeang CH, Jaakkola T: Physical Network Models and Multi-Source Data Integration. Proc Seventh Ann Int'l Conf Research in Computational Molecular Biology 2003, 312-321.
6. Lee I, Li Z, Marcotte EM: An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.
PLoS ONE 2007, 2(10):e988.

System Biology and Networks Poster Session – Submission 30
Photoreceptor Absorption Curves Account for Human Chromatic Discrimination Ability
María da Fonseca, Inés Samengo
Statistical and Interdisciplinary Physics Group, Centro Atómico Bariloche

Photoreceptors constitute the first stage in the processing of color information; many more stages are required before humans can consciously report whether two stimuli are perceived as chromatically equal or not. Therefore, although photoreceptor absorption curves (panel A) are expected to influence the accuracy of conscious discrimination, there is no reason to believe that they should suffice to explain it. However, by means of a simple information-theoretical analysis, we show here that photoreceptor absorption properties predict the wavelength dependence of human color discrimination ability, as tested in behavioral experiments (panel B). The bottleneck in chromatic information processing therefore seems to be determined by photoreceptor absorption characteristics: subsequent encoding stages preserve the wavelength dependence of chromatic discriminability found at the photoreceptor level. Our formalism is easily extended to light beams of arbitrary spectral power distribution, predicting the discrimination ability in the 3-dimensional color space CIE XYZ and in the 2-dimensional space CIE xyY. Finally, we explore the chromatic discrimination ability of subjects with atypical photoreceptor absorption characteristics, as in daltonism or tetrachromatism.

Figure 1: A. Normalized photoreceptor absorption curves for S (blue), M (green) and L (red) cones. B. Discrimination error ∆λ as a function of wavelength λ for eight different subjects [2].

References
1. Stockman A, Brainard DH (2009) Color vision mechanisms. In: OSA Handbook of Optics (Bass M, ed), pp. 11.1-11.104. New York: McGraw-Hill.
2.
Smeulders N, Campbell FW, Andrews PR (1994) The Role of Delineation and Spatial Frequency in the Perception of the Colours of the Spectrum. Vision Res 34:927-936.

System Biology and Networks Poster Session – Submission 42
PaNTex: A novel methodology to assemble Pathway Networks using Text Mining
Julieta S. Dussaut1, Fiorella Cravero2, Ignacio Ponzoni1, Ana G. Maguitman3, Rocío L. Cecchini1
1 Laboratory of Research and Development in Scientific Computing (LIDeCC), Department of Computer Science, Universidad Nacional del Sur - Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química, CONICET - Bahía Blanca, Argentina
3 Artificial Intelligence Research and Development Laboratory (LIDIA), Department of Computer Science, Universidad Nacional del Sur - Bahía Blanca, Argentina

Background. Systems Biology is a discipline that integrates biological knowledge from different sources to study a range of complex biological regulatory systems. In this context, pathways, originally created as graphical representations of well-established knowledge about biological processes, are becoming increasingly important for life science research [1]. However, determining interaction patterns in pathway networks is typically a manual procedure that requires significant contributions from domain experts within the research community. In recent years we have witnessed the emergence of novel data-driven methods aimed at assisting Systems Biology research. In particular, the analysis of information on molecular events contained in very large repositories has led to new approaches for extracting biological interactions from the scientific literature [2]. Literature mining methods can help analyze, integrate and understand not only large collections of data per se, but also the linkages among them, which allow inferences to be made [3, 4].
The rapid publication of new papers makes staying up to date a serious challenge (e.g., the PubMed database contains information on over 23 million articles and continues to grow at a high rate every week). Therefore, text mining methods that aid in the construction and maintenance of pathway knowledge have become relevant tools for biologists to manage this increasing quantity of biological literature. Another crucial issue in text mining applied to bioinformatics is achieving robust testing of the methods, given the lack of large, objectively validated test sets or "gold standards" [5]. The main consequence of these problems is that many inferred pathways do not represent coherent explanations of the reported facts [3], and transforming automatically constructed networks into pathways seems to require substantial additional human effort. For that reason, integrating literature mining algorithms with robust validation strategies for pathway knowledge extraction is an interesting open research field.

Materials and methods. In this work we present a literature mining approach for assisting in the construction of a pathway network. Our proposal is at an early stage of development; for this reason, only the general architecture of the computational strategy and preliminary experiments are reported here. As a starting point, we use the KEGG pathway database to gather a list of pathways for each organism; at this early stage, only human and yeast are considered. Using this list, we search PubMed publications via the Entrez Programming Utilities and look for co-occurrences of pathways in the same publication. The resulting data are stored in an intersection matrix. We also keep track of the number of publications that contain each pathway name, for normalization purposes.
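The co-occurrence bookkeeping described above can be sketched in Python. The function esearch_url only builds a query URL for the E-utilities ESearch endpoint and sends no request; retrieving the KEGG pathway list and the PubMed IDs is left out. The pathway names and IDs in the usage are made up, and Jaccard normalization (shared publications divided by publications mentioning either pathway) is an assumed choice, since the abstract does not specify how the per-pathway counts are used.

```python
from itertools import combinations
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=10000):
    """Build a PubMed ESearch query URL (no request is sent here)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def cooccurrence(pmids_by_pathway):
    """pmids_by_pathway: pathway name -> set of PubMed IDs mentioning it.
    Returns (publications per pathway, raw intersection counts, Jaccard-normalized values)."""
    names = sorted(pmids_by_pathway)
    counts = {n: len(pmids_by_pathway[n]) for n in names}
    raw, norm = {}, {}
    for a, b in combinations(names, 2):
        shared = len(pmids_by_pathway[a] & pmids_by_pathway[b])
        union = len(pmids_by_pathway[a] | pmids_by_pathway[b])
        raw[a, b] = shared
        norm[a, b] = shared / union if union else 0.0
    return counts, raw, norm


if __name__ == "__main__":
    toy = {"Glycolysis": {1, 2, 3}, "TCA cycle": {2, 3, 4}, "Apoptosis": {9}}  # made-up IDs
    print(cooccurrence(toy))
    print(esearch_url('"glycolysis"'))
```

Keeping raw counts and normalized values separately leaves room to swap in another normalization once the comparison against reference coupling data is in place.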
To validate the proposed method, we compare the resulting normalized matrix with the data reported by Alexeyenko & Sonnhammer [6]. A scheme of the designed methodology is shown in Figure 1.

Conclusions. In this work we present the architecture of a text mining approach for extracting associations between pathways from the PubMed literature. We are currently evaluating the results of the method using Homo sapiens data.

Figure 1: Scheme of PaNTex.

Acknowledgments. This work is kindly supported by PGI-UNS (24/N032), PGI-UNS 24/N029, CONICET-PIP 1122009-0100322, CONICET-PIP 11220120100487 and PICT-2011-0149.

References
1. Kamburov A, Pentchev K, Galicka H, Wierling C, Lehrach H, Herwig R: ConsensusPathDB: toward a more complete picture of cell biology. Nucleic Acids Research 2011, 39(Suppl 1):D712-D717.
2. Li C, Liakata M, Rebholz-Schuhmann D: Biological network extraction from scientific literature: state of the art and challenges. Briefings in Bioinformatics 2013, bbt006.
3. Oda K, Kim J-D, Ohta T, Okanohara D, Matsuzaki T, Tateisi Y, Tsujii J: New challenges for text mining: mapping between text and manually curated pathways. BMC Bioinformatics 2008, 9(Suppl 3):S5.
4. Buyko E, Linde J, Priebe S, Hahn U: Towards automatic pathway generation from biological full-text publications. Lecture Notes in Computer Science 2011, 7014:67-79.
5. Maguitman AG, Rechtsteiner A, Verspoor K, Strauss C, Rocha L: Large-scale testing of bibliome informatics using Pfam protein families. Pacific Symposium on Biocomputing 2006: 76-87.
6. Alexeyenko A, Sonnhammer E: Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research 2009, 19: 1107-1116.
System Biology and Networks Poster Session – Submission 45
Effect of plasticity on orientation selectivity in a model of primary visual cortex
Soledad Gonzalo Cogno, Germán Mato
Statistical and Interdisciplinary Physics Group, Instituto Balseiro and Centro Atómico Bariloche, San Carlos de Bariloche, Argentina, 8400

Since its discovery by Hubel and Wiesel in 1959, orientation selectivity has been observed in every mammal in which the response selectivity of primary visual cortex (V1) neurons has been examined. In some animals, such as cats and monkeys, anatomically close V1 neurons have similar preferred orientations, giving rise to maps of orientation preference. However, sharp selectivity is also observed in animals such as mice, squirrels and rats, whose V1 has no orientation map: neurons with different preferred orientations are intermixed, an arrangement called salt-and-pepper organization. This raises the question of the role of intracortical connections, since a purely topographic organization of the connections would not reinforce orientation selectivity as it does in the presence of orientation maps. Recent studies have shown that connections are formed selectively between neurons with similar response properties and are eliminated between visually unresponsive neurons, while the overall connectivity rate is kept constant. However, the effect of this plastic behavior on orientation selectivity is unclear. The present work analyzes the effect of plasticity on orientation selectivity in the salt-and-pepper organization. We simulate a patch of layer 4 composed of two populations of neurons (excitatory and inhibitory) receiving weakly orientation-selective inputs, and we update the excitatory-excitatory connections. The update rule depends on the relative timing of pre- and postsynaptic spikes.
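A spike-timing-dependent update of this kind, together with an orientation selectivity index for reading out tuning, can be sketched as follows. The pair-based exponential window, its amplitudes and time constants, the weight bounds and the vector-sum OSI definition (1 minus the circular variance of the tuning curve) are common textbook choices assumed here; the abstract specifies neither the exact rule nor the OSI formula. Only the initial efficacy of 1 comes from the text.

```python
import cmath
import math


def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair; dt = t_post - t_pre in ms."""
    if dt > 0:   # presynaptic spike precedes postsynaptic spike: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:   # postsynaptic spike precedes presynaptic spike: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=5.0, **params):
    """All-to-all pair-based update of one excitatory-excitatory efficacy,
    clipped to the (hypothetical) range [w_min, w_max]."""
    dw = sum(stdp_dw(t_post - t_pre, **params) for t_pre in pre_spikes for t_post in post_spikes)
    return min(w_max, max(w_min, w + dw))


def osi(angles_deg, rates):
    """Orientation selectivity index as 1 - circular variance (orientation doubles the angle)."""
    vector = sum(r * cmath.exp(2j * math.radians(theta)) for theta, r in zip(angles_deg, rates))
    total = sum(rates)
    return abs(vector) / total if total > 0 else 0.0


if __name__ == "__main__":
    print(apply_stdp(1.0, pre_spikes=[10.0], post_spikes=[15.0]))  # efficacies start at 1
    print(osi([0, 45, 90, 135], [8.0, 3.0, 1.0, 3.0]))
```

With this OSI, a flat tuning curve gives 0 and a response to a single orientation gives 1, so a weak increase in selectivity appears as a small shift of the OSI distribution.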
We find that even though the connections are substantially modified (see Figure 1A), this leads only to a weak increase in selectivity (see Figures 1B and 1C). In future work, we plan to compare this phenomenon with the results for systems with orientation maps.

Figure 1: Excitatory synaptic efficacies and Orientation Selectivity Index (OSI) distributions. A. Distribution of the excitatory synaptic efficacies after the plasticity rule is applied. All efficacies are initialized to 1; after plasticity, synaptic efficacies are stronger on average. B. OSI distribution in the absence of plasticity. C. OSI distribution in the presence of plasticity.

Genome Annotation and Organization Poster Session – Submission 1
Comparative genomics in human parasite flatworms: Echinococcus granulosus s.s. (G1 genotype) and Echinococcus canadensis (G7 genotype)
Lucas L Maldonado1, Juliana Assis2, Flávio Gomes Araújo2, Natalia Macchiaroli, Marcela Cucher, Mara Rosenzvit1, Guilherme Oliveira2 and Laura Kamenetzky1
1 IMPaM, CONICET, Faculty of Medicine, University of Buenos Aires, Argentina
2 Genomics and Computational Biology Group, CPqRR - Oswaldo Cruz Foundation, Belo Horizonte, MG, Brazil.

Background. Echinococcus canadensis is a platyhelminth parasite phylogenetically close to Echinococcus granulosus and Echinococcus multilocularis, members of the class Cestoda that are involved in hydatid infections of humans and animals. In South America, three species of Echinococcus sensu lato have been reported: E. granulosus sensu stricto (G1 and G2 genotypes), E. canadensis (G6 and G7 genotypes) and E. ortleppi (G5 genotype) (Kamenetzky and Cucher, 2014). Only limited genetic information on E. canadensis G7 has been reported so far. In this work we have sequenced the genome of this species.

Methods. High-quality genomic DNA was extracted and two paired-end libraries were sequenced with Illumina technology.
Several assembly pipelines were evaluated. The genome was assembled de novo with Velvet, testing different parameters until the best assembly was obtained. In addition, reads were mapped to the E. multilocularis reference genome (Tsai et al., 2013) with BWA. Genes were annotated with CEGMA and MAKER, using flatworm data for gene-model training.

| Localization in E. multilocularis | # genes |
|---|---|
| Chromosome 1 | 95 |
| Chromosome 2 | 56 |
| Chromosome 3 | 59 |
| Chromosome 4 | 60 |
| Chromosome 5 | 41 |
| Chromosome 6 | 16 |
| Chromosome 7 | 24 |
| Chromosome 8 | 24 |
| Chromosome 9 | 7 |
| Chromosome 10* | 5 |
| Chromosome 11* | 0 |
| Total | 387 |

*Still-unassembled scaffolds of the E. multilocularis reference genome.

Results. Comparative studies revealed high levels of nucleotide identity between E. canadensis G7 and both E. multilocularis and E. granulosus s. s. G1. Almost all contigs have a match in the E. multilocularis genome (Figure 1). Interestingly, the in silico annotation procedure employed in this work identified 86% (387/450) of the highly conserved genes (Table 1). Conclusions. This is the first report of the E. canadensis G7 genome. It was obtained by high-throughput sequencing, providing a broad genomic view of this particular species, which shows important biological and epidemiological features. Knowledge of this new genome will provide information for comparative genomics, allowing prevention and diagnostic tools to be adapted to each epidemiological situation. References Kamenetzky L, Cucher M: Hidatidosis: genotipos de Echinococcus granulosus presentes en Argentina y el mundo [Hydatidosis: Echinococcus granulosus genotypes present in Argentina and worldwide]. In: Temas de Zoonosis VI, chapter 43, pp. 411-421. Asociación Argentina de Zoonosis; 2014. ISBN 978-987-97038-5-4. Tsai IJ, et al.: The genomes of four tapeworm species reveal adaptations to parasitism. Nature.
2013; 496(7443):57-63. Genome Annotation and Organization Poster Session – Submission 8 The human genome data analysis platform Daniel Koile, Maximiliano de Sousa Serro, Diego Wallace, Patricio Yankilevich Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA) - CONICET - Partner Institute of the Max Planck Society, CABA, Buenos Aires, Argentina Background. The health of an individual depends on their DNA as well as on environmental factors. The genome is the blueprint of an individual, and analyzing it together with additional biological information, such as the DNA methylome, the transcriptome, the proteome and the metabolome, will further provide a dynamic assessment of the individual's physiology and health state (1). Personal genome interpretation can be used to identify molecular and genetic variation within the population. This genetic screening information will help us elucidate disease pathways and identify new drug targets. In clinical trials, it can shorten timelines and reduce risks by recruiting participants based on their genetic profiles. Combining trial results with genetic profiles can inform therapeutic development and help identify genetic causes of differences in drug response and side effects. Finally, this human genome analysis platform may help us better understand the genetic basis of diseases, make more accurate diagnoses, better understand prognosis and make better treatment decisions. Materials and methods. The platform we are building consists of a computer cluster, a Next Generation Sequencing (NGS) data analysis pipeline, a set of biological knowledge databases and a platform website. The software pipeline is the key component of the platform and is built from state-of-the-art methods for NGS data analysis. More than 15 public open-source algorithms, developed by research groups at leading institutions and reflecting today's best practices, are used in our pipeline.
This ensures transparent and reproducible data analysis. The pipeline is designed as independent modules that sequentially execute the different genome analysis tasks. The Genome Analysis Toolkit (GATK), developed by the Broad Institute (2), is widely used in our pipeline, complemented by other analysis and visualization tools. Conclusions. This general human genome analysis pipeline provides the basis for participating in biomedical projects that involve patient genetic profiles and for starting collaborations with experimental research groups working on human diseases. Eventually, this basic framework can be customized for further important applications such as cancer diagnosis, non-invasive prenatal testing or newborn screening. In future work we aim to extend the platform to integrate transcriptome and epigenome data into the analysis. References 1. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam H, Chen R, Miriami E, Karczewski K, Hariharan M, Dewey F, et al.: Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 2012, 148(6):1293-1307. 2. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010, 20:1297-1303. Genome Annotation and Organization Poster Session – Submission 13 The plastome of the Yerba Mate tree Jimena Cascales1,2, Mariana Bracco1, Lidia Poggio1,2, Alexandra M Gottlieb1,2 1-Laboratorio de Citogenética y Evolución, Departamento de Ecología, Genética y Evolución, IEGEBA (UBA-CONICET), FCEyN, UBA. Int. Güiraldes 2620, Ciudad Universitaria, Pab. II, 4to piso, Laboratorio 61-62. (C1428EHA), CABA, Argentina. 2-CONICET. Background. The "yerba mate" tree (Ilex paraguariensis) is a perennial species native to subtropical South America.
Its economic value lies in the use of its leaves and twigs to prepare a popular infusion. The custom of drinking "mate" is a legacy of Guaraní culture that is strongly rooted in our society. Several medicinal properties are attributed to its high concentrations of various secondary metabolites, minerals and vitamins. In Argentina, "yerba mate" production is restricted to Misiones and Corrientes because of the crop's climate and soil requirements. Phytochemical studies on this species abound in the literature [1,2]; in contrast, information about its basic genetics is very limited. To contribute to the genetic knowledge of this species, we sequenced its chloroplast genome and analyzed its structure and gene content. Materials and methods. First, intact chloroplasts were isolated from fresh material using the Chloroplast Isolation Kit (Sigma). Plastid DNA was extracted by adapting published protocols [3,4]. The samples were sequenced on a Roche 454 GS FLX+ platform at INDEAR (Rosario, Santa Fe), where a preliminary contig assembly was attempted. We used bioinformatic tools to verify and assemble a definitive plastome. A consensus sequence was obtained with Sequencher v4.1.4 (GeneCodes Corporation), and annotation was carried out with CpGAVAS [5]. Specific PCR primers were designed with Primer3Plus [6] and Primer-BLAST [7] to check the junctions between the large (LSC) and small (SSC) single-copy segments and the two inverted repeats (IRs). Reading frames were adjusted using Ilex cornuta sequences as references, with the NCBI-BLAST [8] algorithms and the MSWAT [9] web server. The number and location of repeats were assessed using REPuter [10]. Plastid microsatellite loci and the corresponding primer pairs were detected using the WebSat [11] server. Results. Sequencing generated 492,515 bp (in 56 contigs from 4 individuals). A consensus sequence of 157.6 kb was obtained for the complete plastome.
It shows the typical quadripartite structure, with an LSC of 87,148 bp, two IRs of 26,076 bp each and an SSC of 18,310 bp. In total, 114 unique genes were detected: 80 protein-coding genes, 30 tRNAs and 4 rRNAs (Table 1). Forty-nine repeats were identified, 27 palindromic and 22 forward. Thirty-five potential mononucleotide and one dinucleotide microsatellite loci were detected; their utility as markers remains to be evaluated.

| Gene cluster | Identification |
|---|---|
| Small subunit of ribosome | rps2, rps3, rps4, rps7, rps8, rps11, rps12°, rps14, rps15, rps16°, rps18, rps19 |
| Large subunit of ribosome | rpl2°, rpl14, rpl16°, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
| RNA polymerase subunits | rpoA, rpoB, rpoC1°, rpoC2 |
| NADH dehydrogenase | ndhA°, ndhB°, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| Photosystem I | psaA, psaB, psaC, psaI, psaJ, ycf3* |
| Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| Cytochrome b/f complex | petA, petB°, petD°, petG, petL, petN |
| ATP synthase | atpA, atpB, atpE, atpF°, atpH, atpI |
| Large subunit of RUBISCO | rbcL |
| Translational initiation factor | infA |
| Maturase | matK |
| Protease | clpP* |
| Envelope membrane protein | cemA |
| Subunit of acetyl-CoA carboxylase | accD |
| c-type cytochrome synthesis gene | ccsA |
| Conserved ORF of unknown function | ycf1, ycf2, ycf4, ycf15 |
| Transfer RNA genes | trnA-UGC°, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC°, trnH-GUG, trnI-CAU, trnI-GAU°, trnK-UUU°, trnL-CAA, trnL-UAA°, trnL-UAG, trnM-CAU, trnfM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC°, trnW-CCA, trnY-GUA |
| Ribosomal RNA genes | rrn4.5, rrn5, rrn16, rrn23 |

Table 1: Genes located in the IR region are shown in bold. ° gene with one intron; * gene with two introns. ORF, open reading frame.

Conclusions.
The data presented herein constitute a novel contribution and a useful information platform that will support the generation of new "yerba mate" varieties, the improvement of the crop's genetic background and the design of original transgenesis experiments. These, in turn, will directly benefit the "yerba mate" industry, one of our most profitable economic activities. References 1. Filip R, Ferraro GE, Bandoni AL, Bracesco N, Nunes E, Gugliucci A, Dellacassa E: Mate (Ilex paraguariensis). In: Imperato F (ed.), Recent Advances in Phytochemistry, 2009. Research Signpost, Kerala, India, pp. 113-131. 2. Heck CI, González De Mejia E: Yerba mate tea (Ilex paraguariensis): A comprehensive review on chemistry, health implications, and technological considerations. J Food Sci 2007, 72:R138-151. 3. Diekmann K, Hodkinson TR, Fricke E, Barth S: An optimized chloroplast DNA extraction protocol for grasses (Poaceae) proves suitable for whole plastid genome sequencing and SNP detection. PLoS ONE 2008, 3(7):e2813. doi:10.1371/journal.pone 4. Shi C, Hu N, Huang H, Gao J, Zhao Y-J, Gao L-Z: An improved chloroplast DNA extraction procedure for whole plastid genome sequencing. PLoS ONE 2012, 7(2):e31468. doi:10.1371/journal.pone.0031468. 5. Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, Guan X: CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics 2012, 13:715. 6. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG: Primer3–new capabilities and interfaces. Nucleic Acids Res 2012, 40(15):e115. 7. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL: Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 2012, 13:134. 8. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410.
9. Cai Z: Comparative Analyses of Land Plant Plastid Genomes. PhD dissertation, The University of Texas at Austin; 2010. 10. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 2001, 29(22):4633-4642. 11. Martins WS, Soares Lucas DC, de Souza Neves KF, Bertioli DJ: WebSat - A web software for microsatellite marker development. Bioinformation 2009, 3(6):282-283. Genome Annotation and Organization Poster Session – Submission 16 Complete genome sequencing of the thermophilic bacterium Thermus sp. 2.9 using an Illumina/pyrosequencing hybrid approach Laura Navas1,3, Maximiliano Ortiz1, Graciela Benintende1, Marcelo Berretta1,3, Rubén Zandomeni1,3, Ariel Amadío2,3. 1-Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de Tecnología Agropecuaria (INTA), Las Cabañas y de Los Reseros, Buenos Aires, Argentina. 2-EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227, Rafaela, Santa Fe. 3-CONICET In this work we compared different approaches to sequencing the genome of a thermophilic bacterium. We isolated the thermophilic strain Thermus sp. 2.9 from a hot spring in Rosario de la Frontera, Salta, Argentina. Thermophilic organisms contain relevant genes with potential biotechnological applications, and there is also interest in studying the mechanisms involved in bacterial adaptation to their extreme natural environment. We used the Roche 454 and Illumina MiSeq platforms to generate unpaired and paired-end reads, respectively. The paired-end library was built using long-jumping-distance technology with a jump length of 8 kb.
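The assemblies in this work are compared by number of contigs and N50, the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that computation; the contig lengths are hypothetical toy values, not data from this genome.

```python
def n50(lengths):
    """Return the N50 (in bp) of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (total 180,000 bp).
contigs = [80_000, 40_000, 30_000, 20_000, 10_000]
print(n50(contigs))  # 40000
```

Fewer contigs with a higher N50, as in the hybrid assembly, means the same sequence is carried by longer, more contiguous pieces.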
The following table summarizes the sequencing and assembly results:

| Platform | # reads | Assembler | # contigs | N50 (bp) |
|---|---|---|---|---|
| Roche 454 | 215,557 | Newbler | 137 | 39,906 |
| Illumina MiSeq | 2,139,062 | MIRA | 323 | 17,661 |
| Roche 454 + Illumina MiSeq | 2,354,619 | MIRA | 131 | 79,216 |

The hybrid assembly with MIRA gave the best result. Scaffolding was performed with BAMBUS using the contigs from the hybrid assembly. Different redundancy values, that is, the minimum number of paired reads required to accept a link between two contigs, were evaluated; the best result was obtained with a minimum of 200 linked reads. In this way, seven scaffolds covering the entire bacterial chromosome were obtained. Using a previously generated optical map of the genome, we ordered and joined these scaffolds, reducing the whole chromosome to a single scaffold. Three other major scaffolds longer than 50 kb were homologous to plasmids reported for the genus, suggesting the presence of one or more plasmids in this strain. Genome annotation was performed with the RAST server. We identified a total of 2,673 CDS, 48 tRNA and 3 rRNA gene-encoding regions. Of these CDS, 1,705 could be associated with enzymes of defined function and were assigned the corresponding EC numbers, while the remaining 968 were classified as hypothetical proteins. Fifty-nine genes encoding proteins with applications in the food industry and bioenergy were selected as candidates for cloning and expression; these proteins are of high interest because of their potential thermostability. Genome Annotation and Organization Poster Session – Submission 17 Sequencing and assembly of Bacillus thuringiensis INTA Fr7-4 genome Laura Navas1,3, Maximiliano Ortiz1, Diego Sauka1,3, Graciela Benintende1, Marcelo Berretta1,3, Rubén Zandomeni1,3, Ariel Amadío2,3.
1-Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de Tecnología Agropecuaria (INTA), Las Cabañas y de Los Reseros, Buenos Aires, Argentina. 2-EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227, Rafaela, Santa Fe. 3-CONICET In recent years, there has been growing interest in sequencing isolates of the Gram-positive bacterium Bacillus thuringiensis (B. thuringiensis) to discover new insecticidal proteins useful for the biocontrol of agricultural pests and mosquitoes. B. thuringiensis INTA Fr7-4 is a native strain isolated from a soil sample in the province of Misiones. We previously reported the complete sequences of three plasmids of this strain and characterized three insecticidal genes of the crystal (cry) protein family. This work reports the sequencing of genomic DNA from B. thuringiensis INTA Fr7-4 and the assembly of the reads to obtain the chromosome sequence. Genomic DNA from B. thuringiensis INTA Fr7-4 was sequenced using an 8 kb long-jumping-distance library in a 2 × 150 bp run on an Illumina MiSeq instrument. After quality clipping, 2,442,414 read pairs (4,884,828 reads in total) with an average length of 129 bp, and 4,962,965 singleton reads averaging 124 nt, were obtained. A de novo assembly was done using Velvet. The longest scaffold was 3.9 Mb long, and a total of 10 scaffolds longer than 10 kb were obtained. A blastn comparison of the scaffolds against the GenBank database showed high identity with the chromosome of Bacillus bombysepticus str. Wang, a closely related species. Using it as a reference genome, we built a map of the B. thuringiensis INTA Fr7-4 chromosome consisting of 5 scaffolds with a total size of 5.2 Mb. The chromosome scaffolds were annotated using the RAST server.
We identified 5,300 CDS and 105 tRNA and rRNA gene-encoding regions. Notably, 187 CDS related to the sporulation process and 103 related to chromosomal DNA replication attracted our attention. We did not find any insecticidal gene in the chromosome of B. thuringiensis INTA Fr7-4. The scaffolds not located within the chromosome belong to plasmid DNA. All previously sequenced plasmids were reconstructed, and a new 259 kb plasmid was identified. This plasmid contains the previously detected insecticidal genes. Evolution, phylogenetics and comparative genomics Poster Session – Submission 4 Archaeal core promoter region information content and its relation with optimal growth temperature. Ariel Aptekmann, Alejandro Nadra IQUIBICEN, Argentina Abstract. We studied the relation between optimal growth temperature (OGT) and information content (IC) in the core promoter region of all archaeal genomes published to date, by calculating the information content of the motif that represents the TATA binding site (TBS). We tested several approaches to predict transcription start sites (TSS) in a given genome and then applied motif prediction software to the regions flanking the TSS. We also constructed a database, compiled from already available published sources, containing the characteristic growth conditions of each strain. Our working hypothesis is that the protein-DNA interface in thermophiles should differ from that in mesophiles; in particular, we propose and test a positive correlation between the information content of binding sites and OGT in archaea. We show that the information content increases with optimal growth temperature, and that this effect cannot be explained solely by increased GC composition. Selective pressure towards binding sites with higher affinity for the protein could be the reason for this correlation.
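The information content of a binding-site motif can be computed from a set of aligned sites. The sketch below sums, over positions, the relative entropy of the observed base frequencies against a background distribution; with a uniform background this reduces to Schneider's Rsequence (2 - H bits per position), here without the small-sample correction. `r_frequency` computes Rfrequency = log2(G/γ), the information needed to locate γ sites among G possible positions. The optional non-uniform `background` is one assumed way to account for genome composition; the toy sites and genome size are hypothetical, and this is not the authors' pipeline.

```python
import math
from collections import Counter

BASES = "ACGT"


def motif_information_content(sites, background=None):
    """Total information content (bits) of aligned binding sites.

    Sums, over positions, the relative entropy of observed base frequencies
    against the background. With a uniform background this equals
    sum(2 - H(l)), i.e. Rsequence without small-sample correction.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    total = 0.0
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        n = sum(counts[b] for b in BASES)
        if n == 0:
            continue  # column with no A/C/G/T (e.g. all gaps)
        for b in BASES:
            f = counts[b] / n
            if f > 0:
                total += f * math.log2(f / background[b])
    return total


def r_frequency(genome_positions, n_sites):
    """Information needed to pick n_sites out of genome_positions: log2(G / gamma)."""
    return math.log2(genome_positions / n_sites)


tata = ["TATA"] * 4  # hypothetical, perfectly conserved toy sites
print(motif_information_content(tata))  # 8.0
print(r_frequency(1024, 4))  # 8.0
gc_rich = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
print(motif_information_content(tata, gc_rich) > 8.0)  # True
```

The last line shows why composition matters: the same AT-rich sites carry more bits relative to a GC-rich background than to a uniform one.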
The established relation Rseq = Rfreq from molecular information theory does not take into account the effect of temperature as a selective pressure that skews the possible binding sites, thereby creating an additional cause for an increase in Rseq that does not apply to Rfreq. Since entropic effects increase with temperature, Shannon entropy effects might as well. Evolution, phylogenetics and comparative genomics Poster Session – Submission 15 Exploring the genetic bases of mammalian unique hearing capacities: an evolutionary approach Francisco Pisciottano, Belén Elgoyhen, Lucía Franchini Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Buenos Aires, Argentina Mammals possess hearing capacities that are unique among animals. These capacities are the consequence of an evolutionary process involving a number of important changes in the inner ear. These changes include the elongation of the papilla that gave rise to the characteristic coiled mammalian cochlea, the special and stable distribution of hair cells along the organ of Corti, and the origin of a unique cell type, the outer hair cell (OHC). This new kind of cell endowed mammals with a novel mechanism of mechanical sound amplification known as somatic electromotility, an active cochlear amplifier process crucial for auditory sensitivity and frequency selectivity. Although these features are well studied and most of them are regarded as evolutionary novelties, products of an adaptive process in the mammalian lineage, little is known about the genetic bases underlying their evolution. Only a few inner ear proteins have previously been the subject of selection analysis [1,2]. Our main objective is to study the evolutionary processes that shaped the genes involved in the evolution of the particular functional capacities of the mammalian inner ear.
To do so, we are assembling an inner ear database that comprises genes from different sources. To build this database, we aim to integrate the information generated by seventeen expression libraries that together contain 86,744 expressed sequence tags (ESTs). For the evolutionary analysis, we apply the branch-site test of positive selection [3], which allows us to identify the genes that fit the model of adaptive evolution and the specific alignment sites that evolved under positive selection in the lineage that gave rise to mammals. Among the seventeen publicly available expression libraries, the RIKEN adult mouse inner ear library [4] is the main rodent library: it contains 22,576 ESTs, estimated to represent more than 4,500 genes, and according to our tests it is one of the most reliable inner ear libraries. A preliminary test on the first 100 ESTs of this library yielded 84 genes. Although only 34 of them could be analyzed with the available information, 11 showed signs of positive selection (P > 0.95), suggesting that a substantial number of inner ear genes may show adaptive evolution along the mammalian branch. We present here the pipeline developed to analyze the information gathered from the expression libraries and the results of the complete analysis of the RIKEN library. References 1. Franchini LF, Elgoyhen AB: Adaptive evolution in mammalian proteins involved in cochlear outer hair cell electromotility. Mol Phylogenet Evol 2006, 41:622-635. 2. Kirwan JD, Bekaert M, Commins JM, Davies KTJ, Rossiter SJ, Teeling EC: A phylomedicine approach to understanding the evolution of auditory sensory perception and disease in mammals. Evol Appl 2013, 6:412-422. 3. Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 2005, 22:2472-2479. 4.
Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, et al.: Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002, 420:563-573. Evolution, phylogenetics and comparative genomics Poster Session – Submission 43 Population genetic structure of the ancestor of the Lager-brewing yeast in Patagonia (Saccharomyces eubayanus) Juan Ignacio Eizaguirre1, David Peris2, Patricio De Los Ríos3, Christian Lopes4, María Eugenia Rodríguez5, Chris Hittinger6, Diego Libkind7 1 Lab. Microbiología Aplicada y Biotecnología, Instituto de Investigación en Biodiversidad y Medioambiente (INIBIOMA), CONICET – UNComahue, Bariloche 2 Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison 3 Lab. Ecología Aplicada y Biodiversidad, Univ. Católica de Chile, Temuco 4 Grupo de Biodiversidad y Biotecnología de Levaduras, Inst. de investigación y desarrollo en Ing. de procesos, Biotecnología y Energías alternativas (PROBIEN), CONICET-UNComahue, Neuquén 5 Grupo de Biodiversidad y Biotecnología de Levaduras, Inst. de investigación y desarrollo en Ing. de procesos, Biotecnología y Energías alternativas (PROBIEN), CONICET-UNComahue, Neuquén 6 Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison 7 Lab. Microbiología Aplicada y Biotecnología, Instituto de Investigación en Biodiversidad y Medioambiente (INIBIOMA), CONICET – UNComahue, Bariloche The discovery and description in Patagonia of a new yeast species, Saccharomyces eubayanus, a parent of the interspecific hybrid S. pastorianus (used worldwide to produce lager beer), opened a very fertile field for research, development and innovation.
This work aims to contribute to the knowledge of the biogeography of S. eubayanus along the Andean Patagonia and of the genetic structure of its populations. To this end, more than 200 isolates of S. eubayanus were obtained from various substrates (soil, bark, leaves and Cyttaria spp.) associated with various tree species of the endemic genus Nothofagus in Argentina and Chile (between latitudes 37° S and 54° S). The isolates were identified by PCR fingerprinting and then, together with lager-brewing strains, characterized by sequencing and analysis of COX2 (mitochondrial, 530 bp) and DCR1 (nuclear, 859 bp). Both genes proved useful for detecting intraspecific variability in the studied species. A database was generated with each isolate's source coordinates, altitude, substrate type, host tree and other ecological parameters such as precipitation, radiation and mean temperatures. Geographical distances between isolates were calculated, and a principal component analysis of ecological traits was performed. The DCR1 results discriminated two main populations with a genetic divergence of ~1%. Population "A" (21 haplotypes), comprising 29% of the strains tested, was located exclusively in northern Patagonia, whereas population "B" (47 haplotypes) consisted of the remaining strains, distributed throughout Patagonia. This is consistent with results obtained for fewer strains using genome-level SNP markers (~10 kb). The COX2 gene did not resolve the two populations, but revealed at least 55 haplotypes. Moreover, COX2 showed that 7 strains appear to be recombinants between S. eubayanus and a second species, Saccharomyces uvarum, which is sympatric with and closely related to S. eubayanus. Phylogenetic networks were generated, allowing a better understanding of the reticulate evolution of S. eubayanus populations.
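A divergence of ~1% between populations corresponds to roughly one differing site per hundred aligned positions. The abstract does not state which distance measure was used; the sketch below assumes an uncorrected p-distance averaged over all between-population pairs of aligned sequences, and also counts distinct haplotypes. The toy sequences are hypothetical, not DCR1 data from this study.

```python
from itertools import product

VALID = set("ACGT")


def p_distance(a, b):
    """Uncorrected p-distance: fraction of compared sites that differ.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x in VALID and y in VALID:
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_distance(pop1, pop2):
    """Mean p-distance over all pairs with one sequence from each population."""
    pairs = list(product(pop1, pop2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def count_haplotypes(seqs):
    """Number of distinct sequences (haplotypes) in a sample."""
    return len(set(seqs))


# Hypothetical aligned fragments.
pop_a = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA"]
pop_b = ["ACGTTCGTAC", "ACGTTCGTAC"]
print(count_haplotypes(pop_a), count_haplotypes(pop_b))  # 2 1
print(round(mean_between_distance(pop_a, pop_b), 3))  # 0.133
```

A model-corrected distance (for example Jukes-Cantor) would be a drop-in replacement for `p_distance`; at ~1% divergence the two give nearly identical values.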
The results showed the existence in Patagonia of two markedly different populations of S. eubayanus, which at the same time exhibit high intra-population genetic heterogeneity. The more abundant and widely distributed population (B) was genetically closest to the lager strains, although none of the isolates showed 100% similarity. In this paper, hypotheses about the environmental and geological factors influencing the population structure of this species are addressed. This yeast species is biotechnologically important for the production of beers with a Patagonian regional identity. Evolution, phylogenetics and comparative genomics Poster Session – Submission 47 Genome sequencing and comparative analysis of two new Enterococcus faecium strains Gabriel Gallina Nizzo1,2, Luis Esteban2, Christian Magni1 1 Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Santa Fe, Argentina. 2 Facultad de Ciencias Médicas, Universidad Nacional de Rosario, Santa Fe, Argentina Background. Enterococci are an ancient genus of microbes that are highly adapted to living in complex environments and surviving harsh conditions. Enterococcus faecium is a leading cause of multidrug-resistant hospital-acquired infections. Moreover, enterococci serve as reservoirs of antibiotic resistance that they spread to other important pathogens. The relevance of studying these bacteria lies in their dual role as commensals or opportunistic pathogens [1]. Recently, we isolated two new variants of E. faecium from cheese, named E. faecium IQ110 and E. faecium GM75. To characterize them, we determined the genome sequences of both isolates by Illumina sequencing.
The short reads were assembled de novo using the SeqMan NGen sequence assembly software. An all-versus-all BLASTN comparison of the resulting contigs was then performed, and contigs shorter than 1,000 bp with more than 99% identity to a sequence already contained in a longer contig were removed. The final assemblies comprised 43 and 152 contigs for E. faecium IQ110 and E. faecium GM75, respectively. The remaining contigs were ordered and oriented with Advanced PipMaker [2] using E. NRRL as the reference genome. Genome annotation was performed with RAST [3] and BASys [4], and genes were manually curated with Artemis. To assess the presence of genomic islands (GEIs), plasmids, viruses, virulence genes, acquired antimicrobial resistance genes and insertion sequences, and to predict pathogenicity, we used IslandViewer, PlasmidFinder, PHAST, VirulenceFinder, ResFinder, ISfinder and PathogenFinder, respectively. A distance matrix was obtained with Gegenees [5] using the published Enterococcus faecium genomes representative of each genome homology group. A phylogenetic network was then constructed with SplitsTree4 [6] to place the new strains in their respective clades. Functional comparison was based on RAST assignments and visualized with the Mauve genome alignment software [7]. Results. The main differences between the two strains in COG categories are in 'Carbohydrate metabolism and transport' and 'Replication and repair'. In the first category, genes related to citrate metabolism are present in E. faecium GM75 and absent in E. faecium IQ110, whereas genes for the utilization of xylose, D-sorbitol, L-sorbose and trehalose are present in IQ110 and absent in GM75, which could indicate a shift in sugar metabolism due to niche adaptation. Nevertheless, E. faecium GM75 has a larger number of genes for sucrose and fructose metabolism, some of them acquired by horizontal gene transfer (HGT). E. faecium GM75 possesses more GEIs than strain IQ110 (16 vs 7). In both strains, two clusters of PTS-related genes are located in GEIs. No virulence factors were found in IQ110. Despite a negative PathogenFinder prediction of pathogenic potential, GM75 carries two virulence factors: the efaAfm adhesin and acm. Three prophage clusters were also found within the GEIs. Another point of medical relevance is antibiotic resistance: both strains carry resistance genes for aminoglycosides and macrolides. IS elements and transposases are the major mobile genetic elements in E. faecium; the two strains share 52 IS elements, while 16 are unique to E. faecium GM75 and 6 to IQ110. The phylogenetic network places E. faecium GM75 in clade B and E. faecium IQ110 in clade A [8]. These findings should advance our understanding of the adaptation of this bacterium to different hosts and of the evolutionary mechanisms involved. References 1. Gilmore MS, Clewell DB, Ike Y, Shankar N, editors. Enterococci: From Commensals to Leading Causes of Drug Resistant Infection [Internet].
Boston: Massachusetts Eye and Ear Infirmary; 2014-. PubMed PMID: 24649511.
2. Elnitski L, Riemer C, Schwartz S, Hardison R, Miller W: PipMaker: a World Wide Web server for genomic sequence alignments. Curr Protoc Bioinformatics 2003, Chapter 10:Unit 10.2.
3. Overbeek R, et al: The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 2014, 42:D206-14.
4. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS: BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 2005, 33:W455-9.
5. Agren J, Sundström A, Håfström T, Segerman B: Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS One 2012, 7:e39107.
6. Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006, 23:254-267.
7. Darling AE, Mau B, Perna NT: progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE 2010, 5:e11147.
8. Palmer KL, et al: Comparative genomics of enterococci: variation in Enterococcus faecalis, clade structure in E. faecium, and defining characteristics of E. gallinarum and E. casseliflavus. mBio 2012, 3:e00318-11.
Genomics, functional genomics and metagenomics Poster Session – Submission 20
pSVMTOCP: a parallel SVM tree algorithm for optimal multi-class partition
Nicolás Ferreyra1, Cristóbal Fresno2,3, María Laura Zingaretti1, Laura Prato1, Diego Arab Cohen2, Elmer Fernández2,3
1-Instituto de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa María, Villa María, Córdoba, Argentina. 2-Universidad Católica de Córdoba, Biosciences Data Mining Group, Córdoba, Córdoba, Argentina. 3-CONICET, Argentina.
Background.
In bioinformatics and many other fields, supervised classification is a common problem, particularly when a diagnostic method based on molecular signatures is needed to classify several levels of a disease. This is usually hard, both because of the number of samples required and because the variables/genes describing the disease have overlapping characteristics. Support Vector Machines (SVM) [2] are among the main tools for multi-class classification, under the well-known One-vs-One (OVO) and One-vs-All (OVA) strategies [3]. Tree-structured SVMs have also been proposed, but they require a prior clustering or segmentation step that can introduce inaccuracies. In addition, all of these strategies are time-consuming when parameter optimization is required. Here we propose a fast and accurate tree-structured classification strategy based on SVM [4], enhanced by the SVMpath algorithm [1]. The new model produces more accurate solutions and reaches more hard-margin solutions than the OVO strategy used for comparison. A larger proportion of hard-margin solutions implies better generalization of the classifier as a diagnostic tool.
Materials and methods. Parallel SVM Tree Optimal Class Partition (pSVMTOCP) is a data-mining tool that builds balanced binary trees (see Figure 1). Each tree is composed of nodes, each of which has two downstream elements; each element can be another node or a leaf (a class). At node i, the data set is split according to its class labels into $l_i = \eta \frac{K_i!}{r!\,(K_i - r)!}$ binary problems, where $K_i$ is the number of classes at node i, $r = \lfloor K_i/2 \rfloor$, and $\eta = 1$ for odd $K_i$ and $\eta = 0.5$ otherwise. Each of these problems is solved by an SVM model whose kernel and/or cost parameters are optimized for that binary problem.
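As a worked check of the partition count (an illustration added here, not part of the original abstract): for even $K$ each balanced split and its mirror image describe the same partition (for classes A, B, C, D the three splits are AB|CD, AC|BD and AD|BC), which is why $\eta = 0.5$ halves the binomial coefficient; for odd $K$ the two sides have different sizes, so nothing is counted twice.

$$
l(K) = \eta\binom{K}{\lfloor K/2 \rfloor},\qquad
l(3) = 1\cdot\binom{3}{1} = 3,\quad
l(4) = 0.5\cdot\binom{4}{2} = 3,\quad
l(6) = 0.5\cdot\binom{6}{3} = 10,\quad
l(8) = 0.5\cdot\binom{8}{4} = 35.
$$

A full binary tree with $K$ leaves also has $K-1$ internal nodes, which is consistent with the denominators in the pSVMTOCP "HM sol." column of Table 1 (2, 5 and 7 nodes for $K$ = 3, 6 and 8).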
Because tuning the parameters of every binary SVM is time-consuming, we apply the SVMpath algorithm, which spans the entire regularization path (all values of the cost parameter) in almost the same time needed to train a single SVM. The best-performing partition is then chosen, the classes on each side of it are passed to the downstream nodes, and the process is repeated. To speed up the process, the algorithm can be executed in parallel, using separate threads for node training and for SVM training inside each node. The method was tested on the Iris, Glass, Breast Tissue, 9Tumors and NCI60 datasets, and its performance was compared against the OVO strategy. Each dataset was divided into a training set (80% of observations) and a test set (20% of observations).
Results. Dataset characteristics and the performance of the OVO and pSVMTOCP methods are presented in Table 1. pSVMTOCP outperforms the OVO method on all datasets, achieving lower error, a higher proportion of hard-margin solutions and much shorter training times; it also uses fewer support vectors on every dataset except Glass (130 vs. 125). Figure 1 shows the tree obtained by pSVMTOCP for the NCI60 dataset.
Figure 1: pSVMTOCP tree associated with the NCI60 dataset.

SVM multi-class strategy: OVO
Dataset     N    Vars  K   C      NSV  HM sol.   %Error  Time
Iris        150  5     3   1      25   1 of 3    3.33    8.2
Glass       213  10    6   21     125  4 of 15   25.58   577.79
B. Tissue   106  10    6   156    55   4 of 15   27.27   28.47
9Tumors     58   71    8   0.021  42   15 of 28  53.84   91.63
NCI60       61   264   8   0.006  47   6 of 28   16.66   275.68

SVM multi-class strategy: pSVMTOCP
Dataset     CMin-CMax     NSV  HM sol.  %Error  Time
Iris        0.65-428.2    9    1 of 2   0       0.83
Glass       0.34-38.32    130  2 of 5   23.25   2.9
B. Tissue   0.74-8428.2   32   2 of 5   18.18   2.2
9Tumors     0.006-0.068   34   7 of 7   30.76   3.1
NCI60       0.002-0.012   39   7 of 7   8.33    4.62

Table 1: Performance of different SVM multi-class strategies (N = rows; Vars = variables; K = classes; C = cost; NSV = number of support vectors; HM sol. = hard-margin solutions; %Error = percentage classification error on the test set; Time = training time in seconds).
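The node-splitting recursion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pSVMTOCP code: the names `n_balanced_partitions`, `balanced_partitions`, `best_partition` and `build_tree` are hypothetical, and `score(left, right)` is an assumed stand-in for training an SVM (for example with SVMpath) on the left-vs-right problem and returning a validation score. Candidate splits at a node are scored concurrently with a thread pool, loosely mirroring the parallel SVM training described in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb


def n_balanced_partitions(k):
    """l = eta * C(k, floor(k/2)); for even k, halving removes mirror duplicates."""
    r = k // 2
    return comb(k, r) if k % 2 else comb(k, r) // 2


def balanced_partitions(classes):
    """Yield every balanced split (left, right) of unique class labels exactly once."""
    classes = tuple(classes)
    k, r = len(classes), len(classes) // 2
    for left in combinations(classes, r):
        # For even k, (left, right) and (right, left) are the same split:
        # keep only the version whose left side holds the first class.
        if k % 2 == 0 and classes[0] not in left:
            continue
        right = tuple(c for c in classes if c not in left)
        yield left, right


def best_partition(classes, score, workers=4):
    """Score all candidate splits in parallel threads and return the best one."""
    parts = list(balanced_partitions(classes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: score(*p), parts))
    return parts[scores.index(max(scores))]  # first best split on ties


def build_tree(classes, score, workers=4):
    """Recursively build a balanced binary tree whose leaves are single classes."""
    classes = tuple(classes)
    if len(classes) == 1:
        return classes[0]  # leaf
    left, right = best_partition(classes, score, workers)
    return (build_tree(left, score, workers), build_tree(right, score, workers))
```

For example, with the toy score `lambda left, right: -abs(sum(left) - sum(right))` on integer labels, `build_tree([1, 2, 3, 4], ...)` returns `((1, 4), (2, 3))`. In a real setting the score would come from cross-validated SVM accuracy, which is where almost all of the run time goes.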
Conclusions. The pSVMTOCP strategy is a robust choice for supervised classification problems. It divides a single problem into several sub-problems, allowing specific parameters to be set for each node; this is an advantage because each sub-problem can be treated independently of the others. pSVMTOCP also makes accurate class predictions for new data and has very short execution times.
References
1. Hastie T: The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research 2004, 5.
2. Abe S: Support Vector Machines for Pattern Classification. Springer; 2005.
3. Rocha A, Goldenstein S: Multiclass from Binary: Expanding One-vs-All, One-vs-One and ECOC-based Approaches. 2013. Available from: http://www.ic.unicamp.br/~siome/papers/Rocha-TNNLS-2013.pdf
4. Arab Cohen D, Fernández EA: SVMTOCP: A binary tree base SVM approach through optimal multi-class binarization. 2012. Available from: http://link.springer.com/chapter/10.1007%2F978-3-642-33275-3_58
Genomics, functional genomics and metagenomics Poster Session – Submission 41
Multi-way for analysis and visualization of OMIC data: maVOD
María Laura Zingaretti1, Johanna Demey-Zambrano2, Jose Luis Vicente-Villardón3, Julio Alejandro Di Rienzo4, Jhonny Rafael Demey5
1 Instituto de Ciencias Básicas y Aplicadas, Instituto de Ciencias Humanas, Universidad Nacional de Villa María, Villa María, Córdoba, Argentina. 2 School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA. 3 Departamento de Estadística, Universidad de Salamanca, Salamanca, Spain. 4 Facultad de Ciencias Agrarias, Universidad Nacional de Córdoba, Córdoba, Argentina. 5 Lab. de Biometría y Estadística, Instituto de Estudios Avanzados, Caracas, Venezuela.
In recent years, the growing availability of public microarray data has gained increasing importance.
The "omics" technologies provide quantitative measurements of hundreds of complex biological variables and make it possible to study simultaneously, across multiple datasets, the expression levels of thousands of genes under the effects of treatments, diseases and developmental stages. This has proven to be a promising approach for analysing and interpreting genome-wide association studies and gene set analyses that help in understanding biological processes. However, statistical methodologies for gene set analysis based on multiple datasets are still at an early stage of development and are mostly based on classical statistical methods, because the joint analysis of the subspaces generated by multiple datasets is not simple. The central problem is finding the best statistical approach to relate gene expression to different experimental conditions or independent groups that were not observed and measured with the same accuracy, precision and level of replication in the experimental design. K-table analysis was developed to handle these problems and to compute a consensus from data matrices that span different scales, dimensions or spaces. This methodology is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations (Abdi et al., 2012). Multi-way for analysis and visualization of OMIC data (maVOD) (Demey and Zingaretti, 2014) is an R package written for this purpose that introduces the following improvements to the method: gene sample variability, multiple comparisons between studies (DGC test) (Di Rienzo et al., 2002), selection of genes with the QR criterion (Demey et al., 2008), and network representation of biological processes using the average projections of the compromise matrix.
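The core k-table idea, weighting tables by their mutual similarity and averaging them into a compromise, can be sketched in plain Python. This is a minimal STATIS-style illustration under stated assumptions, not the maVOD package's R interface: each table (a list of rows over the same observations) is column-centered, its observation-by-observation cross-product matrix is scaled to unit Frobenius norm (one common normalization, assumed here), table weights come from the leading eigenvector of the matrix of RV coefficients, and the compromise is the weighted sum. All function names are hypothetical.

```python
import math


def center(X):
    """Column-center a data table given as a list of rows (observations)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def cross_product(X):
    """Observations-by-observations cross-product matrix W = Xc Xc^T."""
    Xc = center(X)
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Xc] for ri in Xc]


def frob(A, B):
    """Frobenius inner product trace(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def leading_eigenvector(M, iters=200):
    """Power iteration from a positive start vector; the RV matrix is
    nonnegative, so its leading eigenvector has nonnegative entries."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def statis(tables):
    """Return (table weights summing to 1, compromise matrix)."""
    Ws = []
    for X in tables:
        W = cross_product(X)
        s = math.sqrt(frob(W, W))  # scale each table to unit norm
        Ws.append([[x / s for x in row] for row in W])
    # With unit-norm matrices, the RV coefficient is just the Frobenius product.
    rv = [[frob(A, B) for B in Ws] for A in Ws]
    u = leading_eigenvector(rv)
    alpha = [x / sum(u) for x in u]
    n = len(Ws[0])
    compromise = [[sum(a * W[i][j] for a, W in zip(alpha, Ws)) for j in range(n)]
                  for i in range(n)]
    return alpha, compromise
```

Tables that agree with the others receive larger weights, so tables that are atypical relative to the rest contribute less to the compromise; the compromise can then be analyzed like a single PCA input.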
We illustrate the proposed approach using multiple microarray gene expression datasets from the Tomato Functional Genomics Database, comprising eight studies associated with several diseases that affect the tomato crop and differing in time post-infection, plant age, plant tissue, type of experiment and array platform.
References
• Abdi, H., Williams, L.J., Valentin, D., & Bennani-Dosse, M. (2012). STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat, 4, 124-167. doi: 10.1002/wics.198.
• Demey et al. (2008) Bioinformatics, 24(24):2832-2838
• Demey JR, Zingaretti L (2014). maVOD. R package version 3.1.0
• Di Rienzo et al. (2002) JABES, 7(2):129-142
• Lavit C, et al. Computational Statistics and Data Analysis, 18:97–119
Genomics, functional genomics and metagenomics Poster Session – Submission 49
Bioprospecting of lignocellulolytic enzymes in enriched consortia of pine and eucalyptus forest soils by metagenomic sequencing
Marina D. Reinert1, Santiago Revale1, Estefanía Mancini2, María Belén Carbonetto1, Martín P. Vazquez1
1 Instituto de Agrobiotecnología Rosario, Rosario, Santa Fe, Argentina 2 Fundación Instituto Leloir, Buenos Aires, Argentina
Background. Second-generation biofuels are produced by fermenting sugars extracted from agronomic residues into ethanol. Lignocellulose breakdown is a crucial step to obtain free sugar molecules. Currently, the bottleneck for second-generation biofuel production is the cost of lignocellulolytic enzymes [1, 2]. Our aim is to use metagenomics-based bioprospecting to find novel lignocellulose-degrading proteins and to produce them in a low-cost system using plants as biofactories.
Methods. We took soil samples from a Pinus elliottii forest and a Eucalyptus grandis forest in Concordia, Entre Ríos, in February 2012. Both soils contained decaying wood material.
The samples were then used as inoculum for minimal medium [3] containing only carboxymethyl cellulose (CMC) or sawdust as organic matter. Additionally, antibiotics or antifungals were added to inhibit the growth of bacteria or fungi, respectively. Cultures were grown for 30 days, and an aliquot of each culture was taken every 10 days. Genomic DNA was extracted from each sample. Amplicon sequencing of the V4 region of the 16S rRNA gene was then performed on the 454 GS-FLX+ (Roche) platform to evaluate the enrichment of lignocellulose-degrading microorganisms. Whole-genome metagenomic sequencing (454 GS-FLX+) was then performed on the most enriched sample (i.e. the one with the highest proportion of taxa described as lignocellulose degraders and the lowest proportion of commensals). Bioprospecting analysis was then carried out with bioinformatics tools. First, we performed de novo assembly using the CAMERA [https://portal.camera.calit2.net/gridsphere/gridsphere] assembler workflow. Then we used the MG-RAST [http://metagenomics.anl.gov/] platform for taxonomic and functional annotation. We extracted coding sequences (CDS) using the FragGeneScan open reading frame (ORF) prediction algorithm. Next, we ran BLAST against the CAZy database [http://www.cazy.org/] to find lignocellulolytic enzyme domains in our CDS dataset. A custom Perl script was used to keep only those glycosyl hydrolase and cellulose-binding domains linked to degrading activities [4]. Finally, we selected only those sequences that showed consistent Pfam [http://pfam.xfam.org/], UniProt [http://www.uniprot.org/] and PRIAM [http://priam.prabi.fr/] annotations, a proper ORF length, and moderate homology to database enzymes (below 80%).
Results. Metagenomic sequencing produced 718,489 reads with an average length of 421 base pairs (bp), totaling 302,172,049 bp. About 10% of these bases (30,458,285 bp) were assembled into contigs. The longest contig was 523,078 bp.
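The final filtering step can be sketched in Python over BLAST+ tabular output (`-outfmt 6`, default 12-column layout). This is a hypothetical illustration, not the authors' Perl script: `family_ok` is an assumed predicate standing in for the glycosyl hydrolase / cellulose-binding domain selection, `cds_lengths` maps CDS identifiers to their length in nucleotides, and the 300-nt minimum length is an illustrative threshold; only the 80% identity cut-off comes from the text. A CDS is kept when its best hit (highest bit score) passes the family check and falls below the identity cut-off.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv):
    """Return the highest-bitscore hit for each query CDS."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        q = hit["qseqid"]
        if q not in best or float(hit["bitscore"]) > float(best[q]["bitscore"]):
            best[q] = hit
    return best


def candidate_cds(blast_tsv, cds_lengths, family_ok,
                  max_identity=80.0, min_length=300):
    """Keep CDS whose best hit belongs to an accepted family, shares less than
    max_identity % identity, and whose length (nt) is at least min_length."""
    selected = []
    for q, hit in best_hits(blast_tsv).items():
        if (family_ok(hit["sseqid"])
                and float(hit["pident"]) < max_identity
                and cds_lengths.get(q, 0) >= min_length):
            selected.append(q)
    return sorted(selected)
```

Filtering on the best hit rather than on any hit means a CDS is rejected as soon as its closest database match is too similar, which matches the goal of finding novel enzymes.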
We manually selected 39 promising proteins with an average length of 644 bp; Figure 1 and Table 1 summarize their identities and domains.
Figure 1: Pie chart of the abundance of glycosyl hydrolase and cellulose-binding domains in the selected proteins.

Enzyme                        EC number   #
Acetylxylan esterase          3.1.1.72    4
Alpha-glucuronidase           3.2.1.139   1
Alpha-N-arabinofuranosidase   3.2.1.55    6
Beta-glucosidase              3.2.1.21    12
Endo-1,4-beta-xylanase        3.2.1.8     2
Endoglucanase                 3.2.1.4     2
Xylan 1,4-beta-xylosidase     3.2.1.37    11
Feruloyl esterase             3.1.1.73    1

Table 1: Selected enzyme activities, with their Enzyme Commission (EC) numbers and the abundance of each.
Conclusions. The enrichment process yielded bacterial consortia containing lignocellulose-degrading microorganisms, as previously observed by 16S rRNA amplicon sequencing. However, only metagenomic sequencing allowed us to determine the sequences of the proteins involved in lignocellulose degradation. Proteins were manually annotated and a subset was selected using bioinformatics tools. This procedure resulted in a list of 39 promising enzymes, which will be tested experimentally in the laboratory as candidate components of a degrading cocktail.
Acknowledgments. We thank Lic. Soledad Romero and Lic. Bianca Brun for performing all sequencing runs used in this study.
References
1. Naik SN, Goud VV, Rout PK, Dalai AK: Production of first and second generation biofuels: A comprehensive review. Renew Sustain Energy Rev 2010, 14: 578–597.
2. Mtui GYS: Recent advances in pretreatment of lignocellulosic wastes and production of value added products. African Journal of Biotechnology 2009, 8: 1398–1415.
3. Crawford D, McCoy E: Cellulases of Thermomonospora fusca and Streptomyces thermodiastaticus. Appl Environ Microbiol 1972, 24: 150-152.
4. Allgaier M, Reddy A, Park JI, Ivanova N, D'haeseleer P, Lowry P, Sapra R, Hazen TC, Simmons BA, VanderGheynst JS et al.
Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010, 5: 372–380.
Metabolomics and Cheminformatics Poster Session – Submission 46
Interactive Visual Analysis Methodology for Improving Descriptor Selection in QSPR: First Steps
María Jimena Martinez1, Fiorella Cravero2, Gustavo E. Vazquez3, Mónica F. Díaz2, Axel J. Soto4, Ignacio Ponzoni1
1 Laboratory for Research and Development in Scientific Computing (LIDeCC), ICIC, DCIC, UNS, Av. Alem 1250, Bahía Blanca, Argentina 2 Planta Piloto de Ingeniería Química (PLAPIQUI) CONICET-UNS, La Carrindanga km.7, Bahía Blanca, Argentina 3 Facultad de Ingeniería y Tecnologías, Universidad Católica del Uruguay, Montevideo, Uruguay. 4 Faculty of Computer Science, Dalhousie University, Halifax, Canada.
Background. The design of QSAR/QSPR models involves several problems. One of them is selecting the most relevant set of molecular descriptors for the property or activity being modeled. A central question in this task is how to involve the domain expert (e.g. a chemist) so that they can incorporate their knowledge and expertise into the feature selection process [1]. In this context, strategies based on dynamic visual analysis can be useful. The main idea behind visual analytics approaches is to combine the computational power of statistical and machine learning methods with the natural human ability to identify patterns in visualizations. By allowing some form of interaction with the visualizations, users can explore the data, provide feedback to the method, and/or use the tool to reach better-informed decisions. In this work we report our first experiences in designing a methodology that combines statistical methods with interactive visualizations to address the problem of molecular descriptor selection. Methods.
The proposed interactive visual analytics tool is used to explore alternative QSAR models and is organized into four charts (Figure 1): two undirected graphs that represent pairwise associations between descriptors; a bipartite graph that represents the relationships between models and descriptors; and a customized plot area that depicts different relationships between the descriptors and the target property. Relevant characteristics that the visualizations can highlight include redundant descriptors, descriptors that provide discriminative information, descriptors deemed relevant by consensus among alternative models, and descriptors whose values help reduce uncertainty about the value of the target property. In this way, the modeler can analyze the different aspects of QSAR/QSPR model design simultaneously.
Results and Conclusions. The capabilities of our tool were assessed through two case studies. The first concerns prediction for volatile organic compounds (VOCs) [2]; there, the tool was used to select one subset of descriptors from a group of four alternative subsets. The second concerns the prediction of elongation at break for high-molecular-weight polymers [3]; in this scenario, the tool was used to illustrate the case where the analyst wants to modify the automatic selection of descriptors in order to incorporate an experimental parameter into the model. In both cases, the results showed the suitability and convenience of this methodology for selecting, in an exploratory and versatile way, sets of descriptors with desirable characteristics (low cardinality, high interpretability, low redundancy and high statistical performance).
Figure 1: a) In both undirected graphs each node represents a descriptor selected for at least one of the QSAR models. The node color uses a grayscale to indicate the proportion of models in which the descriptor has been selected.
The node sizes and edges can be customized to represent different types of relationships among descriptors. Two main modes were defined: entropy-based and correlation-based. b) This chart is a bipartite graph in which the nodes on the left represent the models and the nodes on the right represent the descriptors of these models; an edge indicates that a descriptor occurs in a model. c) Double-clicking a node in the undirected graphs shows a scatter plot of the values of that descriptor against the corresponding target property values. Additionally, two histograms showing the frequency distributions of the descriptor and target values can be overlaid on the scatter plot.
Acknowledgments. This work is kindly supported by PGI-UNS (24/N032) and PIP112-2009-0100322 (CONICET, National Research Council of Argentina).
References
1. Palomba D, Martínez MJ, Ponzoni I, Díaz MF, Vazquez GE, Soto AJ: QSAR models for predicting log Pliver on volatile organic compounds combining statistical methods and domain knowledge. Molecules 2012, 17: 14937-14953.
2. Abraham MH, Ibrahim A, Acree WE Jr: Air to liver partition coefficients for volatile organic compounds and blood to liver partition coefficients for volatile organic compounds and drugs. Eur J Med Chem 2007, 42: 743-751.
3. Todeschini R, Consonni V, Ballabio D, Mauri A, Cassotti M, Lee S, West A, Cartlidge D: QSPR study of rheological and mechanical properties of chloroprene rubber accelerators. Rubber Chemistry and Technology 2014, 87: 219-238.
Proteomics and functional proteomics Poster Session – Submission 39
Identifying relationships between structure and function of the bacterial metabolic pathway TR–TRX–TPX
Diego S. Vazquez1,*, Javier Iserte2, William A. Agudelo1, Gerardo Ferrer-Sueta3, Bruno Manta3,4, Mariano C.
González Lebrero1, Cristina Marino Buslje2 and Javier Santos1,*
1 IQUIFIB (UBA-CONICET), Departamento de Química Biológica, FFyB, Universidad de Buenos Aires, Argentina. 2 Laboratorio de Bioinformática Estructural, IIBBA-CONICET, Fundación Instituto Leloir, Argentina. 3 Laboratorio de Fisicoquímica Biológica, Instituto de Química Biológica, Facultad de Ciencias, UdelaR, Uruguay. 4 Laboratorio de Biología Redox de Tripanosomas, Institut Pasteur de Montevideo, Uruguay.
Background. Throughout all kingdoms of life, cellular antioxidant defense and redox homeostasis are regulated by the thioredoxin and glutathione systems [1,2], which comprise several TRX-fold proteins such as glutaredoxins, thiol-dependent peroxidases (PRXs) and thioredoxin, among others. From a biophysical viewpoint, our interest in this system is mainly based on (i) the plasticity of thioredoxin (TRX) substrate recognition: TRX has different targets and is reduced in vivo only by the FAD-dependent thioredoxin reductase [1]; (ii) the existence of large conformational changes in the PRX family (helix-coil transitions) that are coupled to catalysis [3] and may affect the catalytic rate; and (iii) electron channeling in TR [4]. These aspects, among others, led us to hypothesize the existence of an evolutionarily conserved interaction network involved in protein-protein contact and substrate recognition, as well as in internal dynamics and thermodynamic stability. To test this, we performed an exhaustive bioinformatic and structural analysis of three well-characterized proteins: thioredoxin reductase (TR), thioredoxin 1 (TRX) and the thiol-dependent peroxidase (TPX, an atypical 2-Cys peroxiredoxin). Methods.
Co-evolutionary relationships between positions in a multiple sequence alignment of 3430 bacterial TR, TRX and TPX sequences were computed with the MISTIC online server (http://mistic.leloir.org.ar [5]). The most promising inter- and intra-protein residue pairs identified by mutual information (MI) were subjected to in silico mutation, molecular dynamics (MD) simulations and principal component analysis (PCA) to study their role in the dynamics and thermodynamics of the proteins. MD simulations were performed with the GPU version of AMBER14 [6]. PCA results were analyzed and post-processed with the cpptraj module of AmberTools13.
Results. Preliminary results from the mutual information analysis suggest the existence of qualitatively different groups of residue pairs: (i) pairs located near the active site, suggesting a role in catalysis; (ii) residues with high accessible surface area, suggesting a role in protein-protein interaction; and (iii) a group located mainly in the protein cores (see Figure 1).
Figure 1: Mapping of the top 10 MI-scored intra-protein residue pairs (red VdW spheres) and inter-protein pairs (purple VdW spheres for TR-TRX and orange for TRX-TPX) on the TR (A), TRX (B) and TPX (C) structures, respectively. The functional cysteines are shown in yellow. The maximum frequency conservation profile, computed with clustering at 62% similarity, is also shown.
Acknowledgments. This work was supported by grants from ANPCyT, UBACyT and CONICET.
References
1. Lu, Holmgren: The thioredoxin antioxidant system, Free Radical Biology and Medicine, 2013, 8;66:75-87.
2. Pillay et al.: The logic of kinetic regulation in the thioredoxin system, BMC Systems Biology, 2011, 5:15.
3. Hall et al.: Structural Changes Common to Catalysis in the Tpx Peroxiredoxin Subfamily. J. Mol. Biol., 2009, 393:867–881.
4.
Williams Jr.: Mechanism and structure of thioredoxin reductase from Escherichia coli. FASEB J., 1995, 13:1267-76.
5. Simonetti et al.: MISTIC: mutual information server to infer coevolution. Nucleic Acids Research, 2013.
6. Case et al.: AMBER 14, University of California, San Francisco.
Proteomics and functional proteomics Poster Session – Submission 53
Conformational diversity of protein functional regions improves the characterization of deleterious mutations
Alexander Monzon1, Emidio Capriotti2 and Gustavo Parisi1
1 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina. 2 Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA.
Introduction. The native state of proteins comprises a wide range of conformations, which are important for their biological function. The conformers that make up the native ensemble show different degrees of biological activity derived from their corresponding structural rearrangements. These conformer activities, mixed in a dynamic equilibrium, together define the structural basis of protein function. The structural changes range from large relative movements of whole domains and changes in the orientation of loops and secondary structure elements to small rotations of residue side chains. Altogether, these movements modulate the transit of ligands (substrates, products, ions, water and allosteric modulators) through pathways connecting the protein surface with its interior. Tunnels, channels, cavities, pockets, grooves, voids and pores are some of the structural features that govern the traffic of ligands into and out of the protein according to the different conformers in the native ensemble.
Conformational diversity could produce variations in the size, width and depth of these functional regions, changing their physicochemical properties and thereby defining the different biological activities observed among conformers.
Materials and results. Because of the biological importance of these regions, we studied whether deleterious-related mutations occur preferentially in them. To this end we collected, from the CoDNaS database, 382 proteins (3095 conformers) with 2394 mutations (1642 disease-related and 752 polymorphic), each protein showing experimentally proven conformational diversity. Tunnels, cavities and pockets were estimated using the Fpocket and MOLE programs. All mutations were mapped onto each conformer of each protein in the dataset to determine the functional region (tunnel, cavity or pocket) to which they belong. We found that deleterious-related mutations occur preferentially in functional regions compared with polymorphic mutations (Fisher test, p-value < 0.005). Since deleterious-related mutations are well known to involve mainly buried residues, we used all buried positions to test whether deleterious-related and polymorphic mutations are differentially associated with the functional regions; a Fisher test showed that, for buried residues, the distributions of the two types of mutation differ (p-value < 0.005). Using the conformational diversity information of each protein, we found that deleterious-related mutations occur at less mobile positions than polymorphic mutations (Kolmogorov-Smirnov p-value < 0.001). This trend is also observed when deleterious-related mutations occurring in any of the functional regions are compared with polymorphic mutations also occurring in functional regions (Kolmogorov-Smirnov p-value < 0.001 and Wilcoxon test p-value < 0.01). Conclusions.
Our results indicate that analysing functional regions such as tunnels, cavities and pockets, together with their conformational diversity, can help to better understand the functional effect of protein mutations.
Structure prediction and protein function Poster Session – Submission 6
13Cα and 13Cβ chemical shift-driven refinement of protein structures
Pedro G. Ramírez, Osvaldo A. Martin and Jorge A. Vila.
IMASL-CONICET, Universidad Nacional de San Luis, Italia 1556, 5700 - San Luis, Argentina
Background. X-ray crystallography (XRC) and nuclear magnetic resonance (NMR) spectroscopy are the most powerful and widely used techniques to experimentally determine the three-dimensional structures of biological macromolecules at near-atomic resolution. XRC has no size limitations and provides the most precise atomic detail, although information about the dynamics of the molecule may be limited. NMR spectroscopy, on the other hand, outperforms XRC when no protein crystals are available and also provides solution-state dynamics. However, the main drawback of NMR spectroscopy is that it delivers lower-resolution structures [1]. Therefore, validation, the process of evaluating the reliability of three-dimensional atomic models, is critically important for protein structure determination by NMR spectroscopy.
Materials and methods. Our group has developed a protein structure validation method called CheShift-2 [2], which computes the differences between observed and calculated chemical shifts for the nuclei of interest (13Cα and 13Cβ). This validation method indicates where in the protein structure the largest differences are found. This allows us to modify the relevant torsional angles, while keeping compatibility with all existing experimental information, so that the agreement between observed and computed chemical shifts is optimized at both local and global levels.
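The flaw-detection step that drives such a refinement can be illustrated with a small Python sketch. It is not CheShift-2: the function names, the input format (per-residue observed and computed 13Cα/13Cβ shifts in ppm) and the 1.5 ppm threshold are assumptions chosen for illustration. Residues are ranked by their largest absolute deviation (local level), and a root-mean-square 13Cα shift deviation summarizes overall agreement (global level).

```python
import math


def flag_residues(observed, computed, threshold=1.5):
    """Rank residues whose 13C-alpha or 13C-beta shift deviation exceeds threshold.

    observed / computed: {residue_id: (ca_ppm, cb_ppm)}; cb_ppm is None for
    glycine, which has no C-beta. Returns [(residue_id, max_deviation), ...]
    sorted from largest to smallest deviation.
    """
    flagged = []
    for res, (ca_obs, cb_obs) in observed.items():
        if res not in computed:
            continue  # no computed shift for this residue
        ca_calc, cb_calc = computed[res]
        devs = [abs(ca_obs - ca_calc)]
        if cb_obs is not None and cb_calc is not None:
            devs.append(abs(cb_obs - cb_calc))
        worst = max(devs)
        if worst > threshold:
            flagged.append((res, worst))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def ca_shift_rms(observed, computed):
    """Global agreement: root-mean-square 13C-alpha shift deviation (ppm)."""
    diffs = [observed[r][0] - computed[r][0] for r in observed if r in computed]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

In a refinement loop, the top-ranked residues would be the candidates whose torsion angles are perturbed (the authors use ROSETTA for that step), after which the shifts are recomputed and the ranking repeated.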
We use a refinement algorithm that identifies the residues containing flaws and then modifies the torsional angles of the protein structure so as to reduce these flaws. The information used to identify these residues is obtained with CheShift-2, and the structure is perturbed with ROSETTA [3], a software package for protein structure prediction and design.
Conclusions. We evaluate our methodology by comparing the root mean square deviation (RMSD) and the global distance test high-accuracy score (GDT-HA) [4] of the set of refined structures against the same protein determined experimentally at high quality. In addition, the physicochemical quality of the results was assessed with validation methods such as PROCHECK [5] and MolProbity [6].
Acknowledgments. This work was supported by PIP-112-2011-0100030 (JAV) from IMASL-CONICET, Argentina, and Project 328402 (JAV) from UNSL, Argentina. The research was conducted using the resources of a local Beowulf-type cluster at IMASL-CONICET.
References
1. Krishnan VR, B.: Macromolecular Structure Determination: Comparison of X-ray Crystallography and NMR Spectroscopy. eLS 2012.
2. Martin OA, Vila JA, Scheraga HA: CheShift-2: graphic validation of protein structures. Bioinformatics 2012, 28(11):1538-1539.
3. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O et al: Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2009, 77 Suppl 9:89-99.
4. Zemla A: LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Research 2003, 31(13):3370-3374.
5. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography 1993, 26(2):283-291.
6.
Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB, 3rd, Snoeyink J, Richardson JS et al: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic acids research 2007, 35(Web Server issue):W375-383. Structure prediction and protein function Poster Session – Submission 7 Effects of the poloxamer’s structure on its interaction with model membranes revealed by molecular dynamics simulations at coarse grain scale Irene Wood1,2 , M. Florencia Martini1,2 , Mónica Pickholz1,2 1-Pharmaceutical Technology Dept, Faculty of Pharmacy and Biochemistry, University of Buenos Aires, Buenos Aires, Argentina, 2-CONICET, Buenos Aires, Argentina. r class are composed by a central hydrophobic Background. The linear triblock co-polymers belonging to the Pluronic○ block of poli(propylene oxide) (PPO) flanked by two identicals hydrophilic blocks of poli(ethylene oxide) (PEO) [1]. These amphiphilic and biocompatible compounds are mainly used for biomedical and pharmaceutical purposes, due to their varied PEO and PPO composition. The poloxamers capability to interact with membranes justifies their applications [2]. Materials and methods. Coarse grained molecular dynamics (MD) simulations have been performed to investigate the interaction between different poloxamers, at their unimer form, with a fully hydrated 1-palmitoyl-2-oleoylphosphatidylcholine (POPC) lipid bilayer, from different initial localizations. Results. We have observed dependence of the unimer behaviors on its structural and physico-chemical features. Most of the studied unimers have shown different conformation depending on the initial condition. For instance, when F127 unimer was set up at the lipid-water interfacial region, adopts a coil structure in which the inner hydrophobic domain (PPO) is surrounded by the outer hydrophilic portion (PEO), which remains in contact with water (Figure 1.A). 
In contrast, when F127 was initially placed in the hydrophobic region of the bilayer, it adopted a transmembrane conformation. The PPO block spread into the membrane tail region, and the PEO chains were solvated by water on both sides of the bilayer (Figure 1.B). Poloxamer L64 behaved differently from F127 and the other studied poloxamers: it adopted a compact structure at the lipid-water interface regardless of the initial conditions.

Figure 1. Snapshots after 1 µs for the F127 systems at different initial conditions: A) interface and B) membrane core. F127 is represented as VDW spheres (PPO in red and PEO in green). POPC (choline in blue, phosphate in magenta, carbonyls in light blue, acyl chains in brown) and water (transparent light blue) are represented as balls and sticks.

Conclusion. Our results help clarify the conditions that determine poloxamer-bilayer interactions. The degree to which certain co-polymers interact with membranes could favor their use as excipients for drug delivery and as indirect inhibitors of transmembrane efflux proteins, whose overexpression is related to multidrug resistance.

Structure prediction and protein function
Poster Session – Submission 11
Frustration and Energetics in the Ankyrin Repeat Protein Fold
R. Gonzalo Parra, Rocío Espada, Nina Verstraete, Diego U. Ferreiro
Protein Physiology Laboratory, Dpto. Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Universidad de Buenos Aires, Argentina.

Background. Natural protein sequences resemble random strings of amino acids. The patterns of a relatively small set of folding architectures can be characterized by long-distance interactions among amino acids. Repeat proteins are composed of tandem copies of similar motifs and can spontaneously organize into symmetric arrangements in space.
Ankyrin repeat proteins form a large family whose members contain tandem copies of a 33-residue motif. They are present in all kingdoms of life and appear to be enriched in eukaryotes and in some specific pathogens. Because they are quasi-one-dimensional, these proteins are a simplified model in which the "sequence-codes-structure-codes-function" paradigm can be evaluated quantitatively.

Description. In previous work we detected repeats structurally using a geometrical approach we developed [1]. Building on that detection, we analyzed the local frustration and energetic patterns of ankyrin repeat proteins to dissect the energetic contributions of the individual repeats, of the repeat array, and of their modifications. We quantified how well the frustration state is conserved at the canonical positions of the ankyrin repeats and for the different contacts in the canonical contact map. Here we describe how frustration patterns are distributed over the structures of this protein family and how they relate to other structural and sequence measures calculated over the dataset.

Results. We analyzed the energetic and frustration patterns in ankyrin repeat protein structures. Natural ankyrin proteins contain three different populations of repeats that differ in their burial interaction energies. These differences are also reflected in distinct sequence signatures and secondary structure composition. These molecules have frustration hotspots at insertions, at binding sites for other partners, and in the terminal repeats. When we quantified the conservation of the frustration state at the canonical positions of the ankyrin repeats, we found that the conserved positions are those where the sequence is also conserved.
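The per-position conservation of the frustration state can be illustrated with a simple sketch. This is a hypothetical example, not necessarily the authors' metric. It assumes that each repeat has already been aligned to the canonical 33-residue positions and that each residue has been assigned one of the usual three local frustration classes. For each position, it reports the most common state and the fraction of repeats that share it.

```python
from collections import Counter

# Three-state classification used in local frustration analysis.
STATES = {"minimal", "neutral", "high"}


def position_conservation(repeats: list[list[str]]) -> list[tuple[str, float]]:
    """Per canonical position: (most common frustration state, fraction of repeats in it).

    `repeats` holds one list per repeat, aligned to the canonical repeat positions,
    with the frustration state of the residue at each position (hypothetical input format).
    """
    if not repeats:
        return []
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("all repeats must be aligned to the same canonical length")
    result = []
    for pos in range(length):
        column = [r[pos] for r in repeats]
        unknown = set(column) - STATES
        if unknown:
            raise ValueError(f"unexpected states at position {pos}: {unknown}")
        state, count = Counter(column).most_common(1)[0]
        result.append((state, count / len(column)))
    return result


# Four toy repeats, three canonical positions each:
repeats = [
    ["minimal", "high", "neutral"],
    ["minimal", "neutral", "neutral"],
    ["minimal", "high", "minimal"],
    ["minimal", "high", "neutral"],
]
print(position_conservation(repeats))
# [('minimal', 1.0), ('high', 0.75), ('neutral', 0.75)]
```

Positions with a fraction close to 1 whose modal state is "minimal" correspond to the pattern discussed in the results: conserved positions that are minimally frustrated.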
Moreover, whenever the frustration state is conserved, it is the minimally frustrated one; that is, "the more consensus an ankyrin protein is, the more foldable it is". The positions whose frustration state is highly conserved at the single-residue level are connected by a network of minimally frustrated interactions. We speculate that, at least in ankyrin repeats, consensus sequences stabilize the overall fold by maximizing the energy gap between the folded and unfolded states. They would do so by establishing a network of minimally frustrated interactions both within and between adjacent repeats. The potential implications of these findings for the dynamical properties of these molecules will also be discussed.

References
1. R. Gonzalo Parra, Rocío Espada, Ignacio E. Sánchez, Manfred J. Sippl, and Diego U. Ferreiro. "Detecting repetitions and periodicities in proteins by tiling the structural space." J. Phys. Chem. B. DOI: 10.1021/jp402105j. Publication Date (Web): 11 Jun 2013.

Structure prediction and protein function
Poster Session – Submission 14
Bioinformatics for Biomolecules learning
Llaraí Carolina Gaviria-González, María Teresa Ortiz-Melo, Josefina Vázquez-Medrano, María del Socorro Sánchez-Correa
Carrera de Biología, Facultad de Estudios Superiores Iztacala, UNAM. Tlalnepantla, Edo. de México C.P. 54090, México.

Background. Teachers in science degree programs generally find it difficult to teach abstract topics. Examples include the spatial structure and behavior of molecules, supramolecular assemblies, and the relationship between the structure and function of biomolecules and why it matters. We consider bioinformatics an excellent tool for improving the teaching of these topics in science programs such as biology.
We therefore introduced a bioinformatics lab manual into the Biomolecules course of the Biology curriculum at the Facultad de Estudios Superiores Iztacala of UNAM. The manual uses bioinformatic tools to help Biology students understand biomolecules and to introduce them to bioinformatics applications.

Materials and methods. We first developed activities using bioinformatics tools for each of the following themes: functional groups, proteins, carbohydrates, lipids, nucleic acids, and others such as secondary metabolites or primer design. The lab manual was then used with students enrolled in Biomolecules courses (the experimental group). Their results were compared with those of another group of students in the same course who did not use the manual (the control group). Both groups took a test with theoretical questions on biomolecule structure and questions on how the students valued the manual. We obtained the average grade for each theme and the overall test average, and applied a chi-square test to the data.

Figure 1: Grades obtained by the control and experimental groups. *Statistically significant differences.

Results. The experimental and control groups did not differ significantly in their average grades on the theoretical questions about functional groups, proteins, carbohydrates or lipids. They did differ significantly on nucleic acids and on the overall grade, and the experimental group appears to obtain higher grades on theoretical questions (see Figure 1). We also included the option "I don't know the answer", which the control group chose more often than the experimental group, as seen in Figure 1.
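The chi-square comparison can be illustrated with a contingency table. For example, one could count how often each group chose "I don't know the answer". The source does not state the exact table that was tested. The counts below are hypothetical, not the study's data. The sketch computes Pearson's chi-square statistic without Yates' correction. It uses 3.841, the standard critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table.

    No continuity (Yates) correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Hypothetical counts: rows = control, experimental; columns = chose "I don't know", did not.
table = [[30, 10], [15, 25]]
stat, dof = chi_square(table)
print(round(stat, 3), dof, stat > CRITICAL_1DF_005)  # 11.429 1 True
```

For tables with larger dimensions, the statistic must be compared with the critical value for the corresponding degrees of freedom, which this sketch does not tabulate.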
Finally, at least 80% of the surveyed students think that these activities may make biomolecules easier to learn, and consider them relevant for their professional development.

Conclusions. Using bioinformatics tools in teaching may contribute to the learning of biomolecules.

Acknowledgments. We thank the Programa de Apoyo a Proyectos para la Innovación y Mejoramiento de la Enseñanza (PAPIME) of the Dirección General de Asuntos del Personal Académico (DGAPA) of UNAM for supporting this project (PAPIME PE206112).

Structure prediction and protein function
Poster Session – Submission 22
Understanding the structure-function relationship of Mycobacterium tuberculosis cyclopropane methyltransferases (CMAs)
Lucas Defelipe, Marcelo A. Marti and Adrian G. Turjanski
Departamento de Química Biológica, Universidad de Buenos Aires

Abstract. A serious concern in Mycobacterium tuberculosis treatment is the emergence of MDR (multidrug-resistant) and XDR (extensively drug-resistant) strains. Choosing relevant new drug targets is a priority in the fight against MDR and XDR TB. We developed TuberQ [1], a protein druggability database that highlights relevant drug targets. It combines structural druggability with gene expression experiments performed in different environments that mimic the stresses TB faces during infection [2,3]. Cyclopropane methyltransferases (CMAs) are highlighted as potential targets.
Mycobacterium tuberculosis CMAs modify mycolic acids by transferring a methyl group to the olefin, which makes them attractive drug targets. This protein family has 9 members (mmaA1-4, cmaA1-2, pcaA, umaA and ufaA). Mycolic acids are long-chain (60-80 carbon atoms) modified fatty acids that are major components of the mycobacterial cell wall [4]. Their modifications modulate properties of the cell wall, such as drug permeability, as well as the immune response of the host [5]. CMAs are methyltransferases with the typical Rossmann fold [6], yet they can not only cyclopropanate the olefin but also introduce keto and methoxy modifications at the same olefin. In the present work we used comparative modelling and molecular dynamics simulations to understand the different reaction mechanisms in this protein family. We also carried out a comparative druggability study of the family to develop a pharmacophore model. This model is intended to aid the search for drug-like compounds able to bind several CMAs (cmaA1, cmaA2, pcaA and umaA), an approach known as polypharmacology.

References
1. Radusky L, Defelipe LA, Lanzarotti E, Luque J, Barril X, et al. (2014) Database (Oxford) 2014: bau035.
2. Sassetti CM, Rubin EJ (2003) Proc Natl Acad Sci U S A 100: 12989–12994.
3. Voskuil MI, Visconti KC, Schoolnik GK (2004) Tuberculosis (Edinb) 84: 218–227.
4. Marrakchi H, Lanéelle M-A, Daffé M (2014) Chem Biol 21: 67–85.
5. Barkan D, Hedhli D, Yan H-G, Huygen K, Glickman MS (2012) Infect Immun 80: 1958–1968.
6. Loenen WAM (2006) Biochem Soc Trans 34: 330–333.
Structure prediction and protein function
Poster Session – Submission 26
Pseudocounts based on BLOSUM frequencies improve contact prediction using mutual information
Diego Javier Zea1,2, Diego Anfossi, Cristina Marino Buslje1, Morten Nielsen3,4
1 Fundación Instituto Leloir, C1405BWE, Capital Federal, Buenos Aires, Argentina; 2 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, B1876BXD, Bernal, Buenos Aires, Argentina; 3 Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B1650HMP, San Martín, Buenos Aires, Argentina; 4 Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, DK2800, Lyngby, Denmark

Background. Mutual information (MI), a measure from information theory, can be computed from a multiple sequence alignment (MSA) of homologous proteins to predict coevolving sites [1]. A major problem for MI calculation is the number of sequences in the alignment. MSAs with few sequences are very common, and they lack observations for many possible amino acid pairs. In a previous study we found that a correction for low counts was useful in these cases [2]. Even so, the method's predictive performance, measured as the prediction of residue contacts, drops for alignments with fewer than 400 clusters of non-redundant sequences (at 62% identity). Here we tested pseudo-frequencies based on the BLOSUM matrix, as described in Altschul et al. [3]. This pseudocount strategy is more realistic and can improve MI performance for protein families with few sequences.

Materials and methods. Method performance is measured as the AUC and AUC0.1 of ROC curves for predicting contacting residues (beta carbons within 8 Å; alpha carbons for glycines). The dataset comprises 150 protein families with 10 to 1000 clusters of sequences at 62% identity.
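The pseudocount mixing used here, P_ab = (α p_ab + β G_ab)/(α + β) with G_ab = Σ_cd p_cd BLOSUM62(a|c) BLOSUM62(b|d), can be illustrated for a single pair of alignment columns. This minimal Python sketch makes several simplifying assumptions. Sequences are unweighted, the alphabet is a toy three-letter one, and a made-up conditional table `Q` stands in for the BLOSUM62 conditional probabilities. The sketch does not reproduce the clustering, sequence weighting or other corrections of the published method. Single-column frequencies are taken as marginals of the mixed pair distribution so that the MI stays consistent.

```python
import math
from collections import Counter


def pair_frequencies(col_i: str, col_j: str) -> dict[tuple[str, str], float]:
    """Observed joint frequencies p_ab of two alignment columns (one residue per sequence)."""
    counts = Counter(zip(col_i, col_j))
    n = len(col_i)
    return {pair: c / n for pair, c in counts.items()}


def pseudo_frequencies(p, q, alphabet):
    """G_ab = sum_cd p_cd * Q(a|c) * Q(b|d), where q[(a, c)] stands for Q(a|c)."""
    return {
        (a, b): sum(p_cd * q[(a, c)] * q[(b, d)] for (c, d), p_cd in p.items())
        for a in alphabet
        for b in alphabet
    }


def mutual_information(col_i, col_j, q, alphabet, alpha, beta):
    """MI from the mixed distribution P_ab = (alpha * p_ab + beta * G_ab) / (alpha + beta)."""
    p = pair_frequencies(col_i, col_j)
    g = pseudo_frequencies(p, q, alphabet)
    joint = {ab: (alpha * p.get(ab, 0.0) + beta * g[ab]) / (alpha + beta) for ab in g}
    # Single-column frequencies taken as marginals of the mixed joint distribution.
    p_i = {a: sum(joint[(a, b)] for b in alphabet) for a in alphabet}
    p_j = {b: sum(joint[(a, b)] for a in alphabet) for b in alphabet}
    return sum(
        v * math.log(v / (p_i[a] * p_j[b])) for (a, b), v in joint.items() if v > 0
    )


# Toy conditional table over a 3-letter alphabet: for each c, Q(.|c) sums to 1. NOT BLOSUM62.
ALPHABET = "ABC"
Q = {(a, c): 0.8 if a == c else 0.1 for a in ALPHABET for c in ALPHABET}

col_i, col_j = "AABBC", "AABBC"  # two perfectly covarying columns; alpha = 5 "clusters"
for beta in (0.0, 10.0):
    print(beta, round(mutual_information(col_i, col_j, Q, ALPHABET, alpha=5, beta=beta), 3))
```

With β = 0 the function returns the plain MI of the observed pairs. Increasing β moves the joint distribution toward the substitution-smoothed estimate. This matters most when α, the number of clusters, is small.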
We used the algorithm described in Marino Buslje et al. [2], in which the pseudocount is fixed at a user-defined value (0.05 is recommended). In the new approach, we use pseudocounts based on BLOSUM frequencies:

$$G_{ab} = \sum_{c,d} p_{cd}\,\mathrm{BLOSUM62}(a \mid c)\,\mathrm{BLOSUM62}(b \mid d)$$

$$P_{ab} = \frac{\alpha\, p_{ab} + \beta\, G_{ab}}{\alpha + \beta}$$

Here p is the observed frequency of a pair, and G is the pseudo-frequency estimated from the observed frequencies p using the BLOSUM62 conditional probabilities. α is the number of clusters in the MSA, and β is an empirical weight assigned to the pseudocount.

Results and Conclusions. We tested a large number of β values between 1 and 550. On the dataset of 150 proteins, the best performance was obtained for β close to 10. In this range, compared with the original method using a fixed pseudocount value, performance increased for alignments with a small number of clusters and remained similar for large, well-populated alignments.

Figure 1

References
1. Martin, LC et al. "Using information theory to search for coevolving residues in proteins." Bioinformatics 21.22 (2005): 4116-4124.
2. Buslje, Cristina Marino et al. "Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information." Bioinformatics 25.9 (2009): 1125-1131.
3. Altschul, Stephen F et al. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research 25.17 (1997): 3389-3402.

Structure prediction and protein function
Poster Session – Submission 29
Network of residues involved in preserving the conformational diversity of a protein
Tadeo E. Saldaño, Gustavo Parisi, and Sebastian Fernández-Alberti
Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD Bernal, Argentina

Background. Protein flexibility and dynamics are commonly associated with protein function.
It is now well established that the functional form of a protein, also known as the native state, is not unique. Pre-existing populations of conformers and dynamic landscapes are central to explaining protein function. We therefore propose to study the dynamically relevant residues that maintain protein dynamism and, through it, protein function. We expect these residues to serve as fingerprints of protein function. Using the conformational diversity database CoDNaS (Conformational Diversity of Native State, http://www.codnas.com.ar/index.php), we propose to identify and characterize the network of residues that preserves the conformational diversity of a protein. The residues predicted to be dynamically important are promising targets for mutational and functional studies.

Materials and methods. We study the dynamics of the different conformers of each protein using normal mode analysis (NMA). Our methods yield the collective motions associated with the low-frequency normal modes for a large variety of proteins. To identify the dynamically relevant networks of residues, we first detect the normal modes that contribute most to a specific structural change between a pair of conformers. For this purpose, we project the vector describing the conformational change onto the basis of normal modes of each conformer and retain the modes with the maximum overlap. We then probe how a point mutation at each residue affects a conformationally relevant normal mode by calculating the response of the springs connected to that residue. These residue-by-residue responses to local perturbations in the elastic network representation of the protein structure will allow us to identify the network of residues that modulates the conformational changes.
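The mode-selection step projects the conformational change onto the normal modes and keeps those with maximum overlap, which reduces to a normalized dot product. This minimal sketch in plain Python assumes that the mode vectors and the coordinates of both conformers are already computed, superposed, and flattened to lists of length 3N. The overlap |Δr·u_k|/(|Δr||u_k|) is the usual definition in the NMA literature. The function names and the toy vectors are hypothetical.

```python
import math


def displacement(conf_a: list[float], conf_b: list[float]) -> list[float]:
    """Conformational change vector between two superposed conformers (flattened 3N coordinates)."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of coordinates")
    return [b - a for a, b in zip(conf_a, conf_b)]


def overlap(delta: list[float], mode: list[float]) -> float:
    """|delta . mode| / (|delta| |mode|): 1 means the mode fully describes the change."""
    dot = sum(d * m for d, m in zip(delta, mode))
    norm = math.sqrt(sum(d * d for d in delta)) * math.sqrt(sum(m * m for m in mode))
    if norm == 0.0:
        raise ValueError("zero-length displacement or mode vector")
    return abs(dot) / norm


def best_modes(delta: list[float], modes: list[list[float]], top_k: int = 1) -> list[int]:
    """Indices of the normal modes with the largest overlap, best first."""
    scores = sorted(((overlap(delta, m), k) for k, m in enumerate(modes)), reverse=True)
    return [k for _, k in scores[:top_k]]


# Two-atom toy system: atom 2 moves 1 angstrom along x between conformers.
conf_a = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
conf_b = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
modes = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],   # unrelated motion
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],   # partially aligned
    [0.0, 0.0, 0.0, -1.0, 0.0, 0.0],  # aligned (the sign of a mode is arbitrary)
]
print(best_modes(displacement(conf_a, conf_b), modes, top_k=2))  # [2, 1]
```

The absolute value in the overlap accounts for the fact that a normal mode and its negative describe the same motion.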
We carried out several analyses to examine the nature of these residue interactions, evaluating the evolutionary conservation, solvent-accessible area and secondary structure of the residue networks.

Results. The dynamically relevant networks of residues that maintain protein dynamism have been identified and characterized. We combine information from methods based on the structural and dynamic properties of proteins with information on their evolutionary conservation. We also explore the correlation between ligand-binding residues and the dynamically important residues predicted by our perturbation approach. We present results on the conformational changes associated with ligand binding.

Structure prediction and protein function
Poster Session – Submission 32
In Silico Optimization of Epidermal Growth Factor Receptor Inhibitors Followed by Experimental Evaluation
Claudio Cavasotto1, Martín Lavecchia1 and José Ignacio Borrell2
1 Instituto de Investigación en Biomedicina de Buenos Aires-CONICET-Partner Institute of the Max Planck Society, Argentina; 2 Institut Químic de Sarrià, Universitat Ramon Llull, Spain

Abstract. The Epidermal Growth Factor Receptor (EGFR) belongs to an extended family of proteins that together control aspects of cell growth and development. Because it is involved in several types of cancer, it is a validated drug discovery target. We started from a dichlorobenzyl pyridopyrimidine lead scaffold and characterized its binding mode in silico by flexible-ligand/flexible-receptor docking. We then built a combinatorial virtual library of analogs. After long molecular dynamics simulations in explicit water, we assessed the binding free energies of the analogs with the molecular mechanics-generalized Born surface area (MM/GB-SA) approach.
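In the common single-trajectory form of MM/GB-SA, the binding free energy is estimated by averaging G_complex − G_receptor − G_ligand over MD snapshots. Each G is a molecular mechanics energy plus a generalized Born polar solvation term and a surface-area nonpolar term. The Python sketch below shows only this bookkeeping. It assumes that per-snapshot energy terms (kcal/mol) have already been computed by an MD package, and it omits the entropy term, which is often neglected when only a relative ranking of analogs is needed. The `Snapshot` class, the analog names and all numbers are hypothetical. The sketch does not describe the authors' protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Snapshot:
    """Energy terms (kcal/mol) of one species in one MD snapshot; hypothetical inputs."""
    e_mm: float  # gas-phase molecular mechanics energy
    g_gb: float  # polar solvation, generalized Born
    g_sa: float  # nonpolar solvation, surface-area term

    def free_energy(self) -> float:
        return self.e_mm + self.g_gb + self.g_sa


def mmgbsa_dg(complex_: list[Snapshot], receptor: list[Snapshot], ligand: list[Snapshot]) -> float:
    """<G_complex - G_receptor - G_ligand> over matching snapshots (entropy term omitted)."""
    if not complex_ or not (len(complex_) == len(receptor) == len(ligand)):
        raise ValueError("need the same, non-zero number of snapshots for each species")
    return mean(
        c.free_energy() - r.free_energy() - lg.free_energy()
        for c, r, lg in zip(complex_, receptor, ligand)
    )


def rank_analogs(dg: dict[str, float]) -> list[str]:
    """Analog names from most to least favorable (most negative) predicted binding free energy."""
    return sorted(dg, key=dg.get)


cpx = [Snapshot(-100.0, -50.0, -5.0), Snapshot(-102.0, -49.0, -5.0)]
rec = [Snapshot(-60.0, -30.0, -3.0)] * 2
lig = [Snapshot(-20.0, -15.0, -1.0)] * 2
print(mmgbsa_dg(cpx, rec, lig))  # -26.5
print(rank_analogs({"analog_1": -26.5, "analog_2": -31.0, "analog_3": -22.4}))
# ['analog_2', 'analog_1', 'analog_3']
```

Ranking the analogs by these averages is the kind of ordering that can then be compared with the experimental inhibition ranking.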
Molecules with better and worse predicted affinities were synthesized and their activity was evaluated experimentally. This yielded a dibromobenzyl-substituted molecule with improved performance, and the experimental results agreed very well with the theoretical predictions. Notably, the ranking of experimental inhibitory activity is consistent with our calculations. This shows that binding free energy evaluation with MM/GB-SA is a valid method for ligand optimization, in a workflow that builds a combinatorial library of analogs in silico and follows it with energy calculation and bioevaluation.

Structure prediction and protein function
Poster Session – Submission 33
WATCLUST: a tool to improve the design of hydrophilic drugs based on protein-water interactions
Elias Daniel Lopez1, Diego Gauto2, Ariel A. Petruk2, Victoria G. Dumas1,2, Juan Pablo Arcon2, Marcelo Adrián Marti1,2, Adrian Gustavo Turjanski1,2
1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA, Buenos Aires, Argentina; 2 INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales de la Universidad de Buenos Aires, CE1428EHA, Ciudad Autónoma de Buenos Aires, Argentina

Background. Water plays an essential role in the structure and function of proteins. Precisely positioned water molecules take part in many enzymatic reaction mechanisms. Solvent reorganization and displacement contribute significantly to the thermodynamics of protein-ligand binding, protein folding and large-scale protein motions. Water chains also take an active part in both proton and electron transfer processes. Water sites (WS) are defined as confined regions adjacent to the protein surface where the probability of finding a water molecule is higher than in the bulk solvent. The strategy used to determine the WS is adapted from our previous works [1,2].
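The definition of a WS, a region where the probability of finding water exceeds that of the bulk solvent, can be turned into a simple test. The Python sketch below counts in how many trajectory frames at least one water oxygen lies within a sphere around a candidate site. It then compares the mean number of waters inside that sphere with the number expected for the same volume of bulk water. The site center, the 1.0 Å radius and the bulk density of about 0.0334 molecules/Å³ are assumptions for illustration. WATCLUST's own clustering and its enthalpy/entropy analysis are not reproduced here.

```python
import math

BULK_DENSITY = 0.0334  # water molecules per cubic angstrom (~1 g/cm^3); assumed value

Point = tuple[float, float, float]


def _inside(p: Point, center: Point, radius: float) -> bool:
    return math.dist(p, center) <= radius


def site_occupancy(frames: list[list[Point]], center: Point, radius: float) -> float:
    """Fraction of frames with at least one water oxygen inside the sphere."""
    hits = sum(1 for waters in frames if any(_inside(w, center, radius) for w in waters))
    return hits / len(frames)


def site_enrichment(frames: list[list[Point]], center: Point, radius: float,
                    bulk_density: float = BULK_DENSITY) -> float:
    """Mean number of waters inside the sphere relative to the bulk-water expectation.

    Values well above 1 mark the region as a candidate water site.
    """
    mean_count = sum(
        sum(1 for w in waters if _inside(w, center, radius)) for waters in frames
    ) / len(frames)
    expected = bulk_density * 4.0 / 3.0 * math.pi * radius ** 3
    return mean_count / expected


# Four toy frames of water-oxygen coordinates (angstroms) around a candidate site at the origin.
frames = [
    [(0.1, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(0.0, 0.2, 0.0)],
    [(3.0, 3.0, 3.0)],
    [(0.0, 0.0, 0.5)],
]
print(site_occupancy(frames, (0.0, 0.0, 0.0), 1.0))  # 0.75
print(site_enrichment(frames, (0.0, 0.0, 0.0), 1.0) > 1.0)  # True
```

In practice, candidate centers would come from clustering water positions over the trajectory rather than being chosen by hand.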
Once determined and characterized, the WS can provide a good thermodynamic description of the free energy of ligand binding. To determine their thermodynamic profile and their potential contribution to ligand binding, we developed WATCLUST, a hydration site analysis program. WATCLUST identifies hydration sites from a molecular dynamics trajectory with explicit water molecules. It estimates the free energy profile of each hydration site by computing the enthalpy and entropy of the water molecule that occupies it throughout the simulation. The results can be displayed in VMD. WATCLUST is therefore an easy, user-friendly analysis and visualization tool for determining WS and their properties, usable by anyone in the structural bioinformatics field. It can also transfer this information directly to AutoDock, one of the most widely used open-source docking programs, to perform WS-biased docking (WSBD) (Figure 1) [3].

Figure 1: A) Screenshot of the dialog used to select the input protein and possible ligand structures (clusters or WS) and the option for defining the "protein selection". B) Example of WS results as displayed in the VMD plugin. The VMD viewer window shows the predicted hydration sites in the protein binding site as small spheres, colored in this example by their ΔG values.

Acknowledgments. EDL is an ANPCyT doctoral fellow. MAM and AGT are CONICET investigators. This work was partly funded by ANPCyT PICT No. 2010-2805.

References
1. Di Lella, S., Martí, M.A., Álvarez, R.M.S., Estrin, D.A., Díaz Ricci, J.C.: Characterization of the galectin-1 carbohydrate recognition domain in terms of solvent occupancy (2007) Journal of Physical Chemistry B, 111 (25), pp. 7360-7366.
2. Gauto, D.F., Di Lella, S., Guardia, C.M.A., Estrin, D.A., Martí, M.A.: Carbohydrate-binding proteins: Dissecting ligand structures through solvent environment occupancy (2009) Journal of Physical Chemistry B, 113 (25), pp. 8717-8724.
3. Gauto DF, Petruk AA, Modenutti CP, Blanco JI, Di Lella S, Martí MA: Solvent structure improves docking prediction in lectin-carbohydrate complexes. Glycobiology. 2013 Feb;23(2):241-58. doi: 10.1093/glycob/cws147

Structure prediction and protein function
Poster Session – Submission 34
Interactions between aromatic rings in protein-drug complexes: a database to survey
Esteban Lanzarotti1, Lucas A. Defelipe1, Leandro Radusky1, Marcelo A. Marti1, Adrian G. Turjanski1
1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA.

Aromatic interactions have been shown to be important both in biological processes and in chemical recognition [1]. It has also been shown that aromatic interactions form clusters of several aromatic rings whose energies are additive [2]. Using a reliable dataset derived from the PDB, we performed a statistical analysis of aromatic interactions in protein-drug complexes, using the planar angle and the distance between rings as interaction descriptors. Interactions between aromatic rings in drugs and the rings of aromatic residues (PHE, TYR, TRP and HIS) are enriched in pi-stacking conformations compared with interactions between two residue rings. In previous work [3], we defined an aromatic cluster as the transitive closure of the relation defined by aromatic interactions, and we studied these groups in residue-residue interactions. We have now extended this aromatic cluster definition to the entire PDB and built a web interface that lets users search for protein-ligand complexes and rank these entries by the relevance of their aromatic clusters.
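Both ideas in this abstract can be illustrated in a few lines: ring-ring geometric descriptors, and aromatic clusters defined as the transitive closure of pairwise interactions. In this plain-Python sketch, each ring is a list of atom coordinates, and the sketch computes the centroid distance and the angle between ring planes. Two rings are treated as interacting when their centroids are closer than a cutoff. The 7.0 Å value is a placeholder assumption, not the authors' criterion. Clusters are the connected components of the resulting interaction graph, found with union-find, which is exactly the transitive closure of the pairwise relation.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def centroid(ring: list[Vec]) -> Vec:
    n = len(ring)
    return tuple(sum(p[k] for p in ring) / n for k in range(3))


def ring_normal(ring: list[Vec]) -> Vec:
    """Plane normal from the first three ring atoms (the ring is assumed planar)."""
    return _cross(_sub(ring[1], ring[0]), _sub(ring[2], ring[0]))


def ring_descriptors(r1: list[Vec], r2: list[Vec]) -> tuple[float, float]:
    """(centroid distance in angstroms, angle between ring planes in degrees, 0 to 90)."""
    dist = math.dist(centroid(r1), centroid(r2))
    n1, n2 = ring_normal(r1), ring_normal(r2)
    cos_angle = abs(_dot(n1, n2)) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    return dist, math.degrees(math.acos(min(1.0, cos_angle)))


def aromatic_clusters(rings: list[list[Vec]], cutoff: float = 7.0) -> list[list[int]]:
    """Transitive closure of 'centroids closer than cutoff', i.e. connected components (union-find)."""
    parent = list(range(len(rings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cents = [centroid(r) for r in rings]
    for i in range(len(rings)):
        for j in range(i + 1, len(rings)):
            if math.dist(cents[i], cents[j]) < cutoff:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(rings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def hexagon(center: Vec, plane: str = "xy", r: float = 1.4) -> list[Vec]:
    """Regular six-membered ring for the toy example."""
    cx, cy, cz = center
    pts = []
    for k in range(6):
        u, v = r * math.cos(math.pi / 3 * k), r * math.sin(math.pi / 3 * k)
        pts.append((cx + u, cy + v, cz) if plane == "xy" else (cx + u, cy, cz + v))
    return pts


rings = [
    hexagon((0.0, 0.0, 0.0)),
    hexagon((0.0, 0.0, 3.5)),    # stacked on ring 0
    hexagon((0.0, 0.0, 9.0)),    # close to ring 1 only: joins via transitivity
    hexagon((30.0, 0.0, 0.0)),   # isolated
]
print(aromatic_clusters(rings))  # [[0, 1, 2], [3]]
```

Ring 2 does not interact directly with ring 0, but it ends up in the same cluster through ring 1. That is the effect of defining clusters as a transitive closure rather than by pairwise contact alone.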
References
1. Salonen LM, Ellermann M, Diederich F. Aromatic rings in chemical and biological recognition: energetics and structures. Angew Chem Int Ed Engl 2011.
2. Tauer TP, Sherrill CD. Beyond the benzene dimer: an investigation of the additivity of pi-pi interactions. J Phys Chem A 2005.
3. Lanzarotti E, Biekofsky RR, Estrin DA, Marti MA, Turjanski AG. Aromatic-aromatic interactions in proteins: beyond the dimer. J Chem Inf Model 2011.

Structure prediction and protein function
Poster Session – Submission 51
Analyzing the active and inactive states of the EGFR kinase domain by comparing the structural properties of pockets and cavities
Marcia Hasenahuer1, Yanina Powazniak2, Guillermo Bramuglia2, Gustavo Parisi1 and María Silvina Fornasari1
1 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Buenos Aires, Argentina; 2 Fundación Investigar-Argenomics, Buenos Aires, Argentina

Background. EGFR (Epidermal Growth Factor Receptor) is one of the main tumor markers in many cancer types [1]. Several single amino acid substitutions (SASs) in this protein occur in different cancers. Most of these SASs are characterized as "activating" because they stabilize the conformer required to drive phosphorylation (the active form). EGFR is a transmembrane protein with extracellular, transmembrane and cytoplasmic regions. The cytoplasmic region comprises a juxtamembrane segment, a Tyr-kinase domain and an intrinsically disordered C-terminal tail (C-tail). Autophosphorylation at different tyrosine sites in the C-tail triggers signals for different cellular pathways involved in cell growth and proliferation [2, and references therein]. Most sites where proteins interact with their ligands and substrates are located in cavities or pockets on the protein surface [3, and references therein].
The goal of this work is to understand which structural and physicochemical characteristics of EGFR kinase pockets distinguish the active and inactive conformations. We also aim to elucidate how SASs in those pockets could trigger unregulated kinase activity. In particular, the analysis covers the effect of previously unreported SASs observed in Argentinean cancer patients.

Methods. We calculated pockets and cavities with fpocket [http://fpocket.sourceforge.net/] for active, inactive, monomeric and dimeric conformers of the human EGFR Tyr-kinase region. Conformer coordinates were taken from the PDB [http://pdb.org/pdb/home/home.do] and CoDNaS [http://www.codnas.com.ar/about.php]. We analyzed the pockets related to the catalytic site of the kinase across conformers, including structures with mutations. The analysis used per-site RMSD and pocket descriptors such as volume, polar and apolar surface area, charge and hydrophobicity, among others. This information was compared with clustering methods from R statistical packages [http://www.r-project.org/]. Finally, we mapped onto the structures the SASs from Argentinean patients and others compiled from the COSMIC database [http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/], to analyze the relationship between SASs and pockets.

Conclusions. The main pockets related to the kinase active site found in this work contain, or are in close contact with, 70% of the 195 cancer-related SAS positions in the EGFR cytoplasmic region. Reorganization of these pockets could favor the binding of the C-tail to be phosphorylated and could affect the affinity for ATP or anticancer drugs. Disease-related SASs could alter the dynamics and shape of pockets, promoting deregulated EGFR activity.
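The co-localization figure in the conclusions is the fraction of SAS positions contained in, or in close contact with, the main pockets. It can be computed as in the following hedged plain-Python illustration. Each residue is represented by a single coordinate (for example, its Cα). The pocket-lining residues are assumed to come from a pocket-detection run such as fpocket. The 6.0 Å contact cutoff and all example coordinates are placeholder assumptions, not the criterion or data used in the study.

```python
import math

Coord = tuple[float, float, float]


def colocalized_fraction(
    sas_positions: list[int],
    pocket_residues: list[int],
    coords: dict[int, Coord],
    cutoff: float = 6.0,
) -> float:
    """Fraction of substituted positions that line a pocket or lie within `cutoff` of a pocket residue.

    coords maps residue number -> one representative coordinate (e.g. C-alpha), in angstroms.
    The 6.0 angstrom default is a placeholder, not the criterion used in the study.
    """
    if not sas_positions:
        return 0.0
    pocket = set(pocket_residues)
    hits = 0
    for pos in sas_positions:
        if pos in pocket or any(math.dist(coords[pos], coords[p]) <= cutoff for p in pocket):
            hits += 1
    return hits / len(sas_positions)


# Toy example: residue 1 lines the pocket; residues 2 and 4 are nearby; residue 3 is far away.
coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (20.0, 0.0, 0.0), 4: (0.0, 5.0, 0.0)}
print(colocalized_fraction([1, 2, 3, 4], [1], coords))  # 0.75
```

Using all atoms of each residue rather than one representative coordinate would give a stricter "close contact" criterion, at the cost of more distance evaluations.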
The co-localization of most cancer-related sites in pockets could improve our understanding of the effects of different EGFR SASs, and this information could be included in the development of predictive computational tools.

References
1. Salomon DS, Brandt R, Ciardiello F, Normanno N. Epidermal growth factor-related peptides and their receptors in human malignancies. Crit Rev Oncol Hematol 1995, 19:183-232.
2. Levitzki A, Gazit A. Tyrosine kinase inhibition: an approach to drug development. Science 1995, 267(5205):1782-8.
3. Gora A, Brezovsky J, Damborsky J. Gates of enzymes. Chemical Reviews 2013, 113(8):5871-5923.

Structure prediction and protein function
Poster Session – Submission 56
Performance analysis of a comparative protein-DNA structure modeling pipeline with MODELLER versus a standard protocol with 3DNA
Ignacio Ibarra, Francisco Melo
Laboratorio de Bioinformática, Pontificia Universidad Católica de Chile

Abstract. Structural information could potentially be used to predict binding events between proteins and DNA. Many groups have addressed this task, with results that vary with the theoretical approximation and/or the testing metrics used. A recurrent methodological step in protein-DNA modeling is to replace the DNA sequence in the same template with each of a set of sequences, using standard software for DNA base replacement. In this work, we tested a new comparative modeling pipeline for protein-DNA complexes based on the MODELLER software suite. We extracted a set of 18 DNA geometrical restraints from a non-redundant set of protein-DNA complex structures. These restraints were used to model and minimize the binding of MarA, a monomeric bacterial transcription factor, to an ensemble of DNA sequences, with the MarA-DNA complex structure as the reference template.
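The evaluation in this work ranks true MarA binding sites against an ensemble of negative DNA sequences by a statistical-potential score. Such a ranking is naturally summarized by the area under the ROC curve. The source does not state which summary metric was used, so this is only an illustrative sketch. It computes the AUC in plain Python as the probability that a randomly chosen binding site scores better than a randomly chosen negative, with ties counting one half. It assumes that lower scores mean more favorable binding, as is usual for energy-like statistical potentials. The scores are hypothetical.

```python
def ranking_auc(positives: list[float], negatives: list[float], lower_is_better: bool = True) -> float:
    """ROC AUC as the probability that a true site outscores a negative (ties count 1/2)."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p == n:
                wins += 0.5
            elif (p < n) if lower_is_better else (p > n):
                wins += 1.0
    return wins / (len(positives) * len(negatives))


# Hypothetical statistical-potential scores (lower = more favorable):
binding_sites = [-10.0, -8.0]
negative_seqs = [-9.0, -5.0, -8.0]
print(ranking_auc(binding_sites, negative_seqs))  # 0.75
```

The pairwise loop is quadratic, which is fine for a few dozen binding sites. For large negative sets, a sort-based rank computation gives the same value more efficiently.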
Thirty-four MarA binding sites were used as a test set to evaluate the performance of our pipeline against a standard protocol based on DNA base replacement with 3DNA. In both approaches, different statistical potentials were used to evaluate and rank DNA binding sites with respect to an ensemble of negative DNA sequences. The results obtained in this work suggest that this new comparative protein-DNA complex structure modeling and evaluation protocol is a promising and general tool for the in silico prediction of protein-DNA binding specificity.
Index
Agudelo,WA, 59
Albarraci,VH, 28
Alonso,LG, 6
Amadío,A, 46, 47
Anfossi,D, 65
Angelone,L, 15
Aptekmann,A, 47
Arab Cohen,D, 51
Arcon,JP, 68
Assis,J, 42
Ballarin,V, 21
Banchero,M, 34
Belfiorem,C, 28
Benalcázar,M, 21
Benintende,B, 47
Benintende,G, 46
Berenstein,A, 32
Berenstein,AJ, 34
Berenstein,JA, 16
Berretta,M, 46, 47
Blundell,TL, 2
Boechi,L, 12
Borrell,JI, 67
Bracco,M, 44
Bramuglia,G, 70
Brun,M, 21
Bulacio,P, 15
Bustamante,J, 12
Buus,S, 9
Campillo,NE, 2
Capriotti,C, 60
Carballido,JA, 35
Carbonetto,MB, 54
Cascales,J, 44
Cecchini,RL, 38
Chemes,LB, 13
Chernomoretz,A, 34
Claudio Cavasotto,C, 67
Comas,D, 21
Corva,P, 21
Cravero,F, 38, 56
Cucher,M, 42
Díaz,MF, 56
da Fonseca,M, 37
Daurelio,L, 18
De Los Ríos,P, 49
de Sousa Serro,M, 43
Defelipe,A, 69
Defelipe,L, 65
Demey,JR, 53
Demey-Zambrano,J, 53
Di Rienzo,JA, 53
Dumas,VG, 68
Dussaut,JS, 38
Eizaguirre,JI, 49
Elgoyhen,B, 48
Espada,R, 13, 62
Esteban,L, 18, 50
Estrín,DA, 12
Ezpeleta,J, 15
Farias,ME, 28
Fernández,E, 33, 51
Fernández-Alberti,S, 6, 12, 67
Ferreiro,DU, 6, 30, 62
Ferrer-Sueta,G, 59
Ferreyra,N, 51
Fornasari,MS, 70
Franchini,L, 48
Fresno,C, 51
Gallo,CA, 35
Gauto,D, 68
Gaviria-González,LM, 63
Germán Mato,G, 41
Glavina,J, 13
Gomes Araújo,F, 42
González Lebrero,MC, 7, 59
Gonzalo Cogno,S, 19, 41
Gorriti,M, 28
Gottlieb,AM, 44
Grosso,M, 12
Hansen,AM, 9
Hasenahuer,M, 70
Hittinger,C, 49
Ibarra,I, 71
Iserte,J, 59
Juritz,E, 20
Kalstein,A, 12
Kamenetzky,L, 42
Koile,D, 43
Kovalevski,L, 24
Krick,T, 6
Kropff,E, 19
Kurth,D, 28
Lanzarotti,E, 69
Lavecchia,M, 67
Libkind,D, 49
Llera,A, 33
Lopes,C, 49
Lopez,ED, 68
Méndez,NA, 28
Macat,PB, 24
Magariños,MP, 16
Magni,C, 50
Maguitman,AG, 38
Maldonado,LL, 42
Mancini,E, 54
Manta,B, 59
Marcatili,P, 7
Marino Buslje,C, 3, 32, 34, 59, 65
Martínez,JL, 31
Marti,MA, 12, 65, 68, 69
Martin,OA, 61
Martinelli,R, 18
Martinez,MJ, 56
Martini,MF, 62
Melo Ledermann,F, 4
Melo,F, 71
Merino,G, 33
Meschino,G, 21
Montemurro,M, 19
Monzon,M, 60
Murillo,J, 15
Nadra,A, 47
Natalia Macchiaroli, 42
Navas,L, 47
Navas,N, 46
Nielsen,M, 3, 9, 65
Nizzo,GG, 50
Oliveira,G, 42
Oliver,J, 33
Ortiz,M, 46, 47
Ortiz-Melo,MT, 63
Pagnuco,I, 21
Parisi,G, 60, 67, 70
Parra,GR, 62
Peris,D, 49
Petruk,AA, 68
Pickholz,M, 62
Pisciottano,F, 48
Poggio,L, 44
Ponzoni,I, 35, 38, 56
Powazniak,Y, 70
Prada,F, 33
Prato,P, 51
Pratta,GR, 24
Quaglino,M, 24
Ré,MA, 31
Radusky,L, 12, 69
Ramírez,PG, 11, 61
Rasmussen,M, 9
Reinert,MD, 23, 54
Revale,S, 54
Rodríguez,ME, 49
Rodriguez de la Vega,RR, 13
Roitberg,A, 12
Rosenzvit,M, 42
Sánchez,IE, 6, 13, 28, 30
Sánchez-Correa,MdS, 63
Saldaño, 67
Samengo,I, 19, 37
Santos,J, 59
Sauka,D, 47
Sendoya,JM, 33
Shub,DA, 6
Shub,M, 6
Simonetti,FL, 34
Sippl,M, 2
Soto,AS, 56
Spetale,FE, 15
Tapia,T, 15
ten Have,A, 12
Teppa,E, 32
Turjanski,AG, 65, 68, 69
Vázquez-Medrano,J, 63
Vazquez,DS, 59
Vazquez,GE, 6, 56
Vazquez,MP, 54
Verstraete,N, 6, 30, 62
Vicente,NB, 33
Vicente-Villardón,JL, 53
Vila,JA, 61
Wallace,D, 43
Wood,I, 62
Yankilevich,P, 43
Zandomeni,R, 46, 47
Zea,DJ, 32, 65
Zingaretti,ML, 51, 53