Sica, Mauricio P.
Proceedings of the VCAB2C by A2B2C is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at http://www.a2b2c.org/Proceedings_A2B2C_2014_Tablet.pdf.
Cataloguing date: 10/09/2014
Cover design:
Sica, MP
Layout:
Sica, MP
VCAB2 C
Sponsors
5ta Conferencia Argentina de Bioinformática y
Biología Computacional (VCAB2C)
5th Argentinian Conference on Bioinformatics and
Computational Biology
Program Committee
Dr. Morten Nielsen (President) Center for Biological Sequence Analysis, Department of Systems Biology, The
Technical University of Denmark - Biotechnological Research Institute, National University of San Martín - San
Martín, Buenos Aires, Argentina.
Dr. Gustavo Parisi Structural Bioinformatics Group (SBG), Department of Science and Technology, National University of Quilmes - Bernal, Buenos Aires, Argentina.
Dr. Mauricio Sica Bioenergy Laboratory (IEDS-CONICET), Atomic Center Bariloche (CAB) - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Ignacio Sánchez Protein Physiology Laboratory, Exact and Natural Sciences Faculty, University of Buenos Aires
- Buenos Aires, Argentina.
Dr. Patricio Yankilevich Institute for Research in Biomedicine of Buenos Aires (IBioBA) CONICET, Institute of
the Max Planck Society - Buenos Aires, Argentina.
Dr. Juan Morales Ecotone Laboratory (INIBIOMA-CONICET), National University of Comahue - San Carlos de
Bariloche, Río Negro, Argentina.
Dr. Sebastián Bouzat Atomic Center Bariloche (CAB), National Atomic Energy Commission (CNEA) - San Carlos
de Bariloche, Río Negro, Argentina.
Steering Committee
Dr. Mauricio Sica Bioenergy Laboratory, IEDS, CONICET, Atomic Center Bariloche (CAB) - San Carlos de Bariloche, Río Negro, Argentina.
Dra. Belén Prados Environmental Sciences Laboratory, National University of Río Negro - Río Negro, Argentina.
Dra. Carolina Bagnato National University of Río Negro - San Carlos de Bariloche, Río Negro, Argentina.
Dr. Sebastián Bouzat Atomic Center Bariloche (CAB), National Atomic Energy Commission (CNEA) - San Carlos
de Bariloche, Río Negro, Argentina.
Dr. Gabriel Paissan Department of Computational Mechanics, Atomic Center Bariloche (CAB) - San Carlos de
Bariloche, Río Negro, Argentina.
Dr. Ignacio Ponzoni Laboratory for Research and Development in Scientific Computing (LIDeCC), Department of
Computer Science and Engineering, National University of the South - Bahía Blanca, Argentina.
Dra. Cristina Marino Buslje Structural Bioinformatics Laboratory, Institute of Biochemical Research in Buenos
Aires (IIBBA), Leloir Institute Foundation - Buenos Aires, Argentina.
Dr. Gustavo Parisi Structural Bioinformatics Group (SBG), Department of Science and Technology, National University of Quilmes - Bernal, Buenos Aires, Argentina.
A2 B2 C Executive Commission
President
Vice-president
Secretary
Treasurer
Board members
Substitute Board members
Audit
Dr. Ignacio Ponzoni
Dr. Gustavo Parisi
Dra. Elizabeth Tapia
Dr. Fernán Agüero
Dra. Cristina Marino Buslje
Dr. Arjen ten Have
Dr. Marcel Brun
Dra. Silvina Fornasari
Dr. Ariel Chernomorez
Preliminary words
Our young society recently arose from its founders' need to give impetus to a neglected field in our country, creating a space for cooperation, exchange and identification among its participants. Faced with the task of organizing the 2014 Conference, we set out to continue the process of consolidating the identity of this new association.
The small Patagonian city of San Carlos de Bariloche, set in a National Park privileged for its natural resources, is considered a scientific and technological hub of excellence. Three national universities, institutes of CONICET and CNEA, and technology-based companies such as INVAP and Satellogic are based here. Bariloche exports nuclear and telecommunications technology all over the world, employing a significant share of its one hundred and fifty thousand inhabitants. However, as in other regions of our country and especially in Patagonia, the history of Bariloche is marked by geographical and political isolation. This meeting is therefore one more step towards broadening and integrating the national scientific community and strengthening its international ties.
In recent years, scientific activity in our country has been going through a period of revitalization. Science policies are stabilizing with the consensus of the research community, the general population is reappraising the role of a national science, and the regional geopolitical bloc is opening up prospects for autonomous development. In this context, scientific societies are the natural space for researchers to jointly channel concrete actions to consolidate these changes and strengthen national scientific development.
It is our wish that this conference enrich the scientific development of the participants, promote cordial encounters between colleagues and friends, and facilitate the exchange of knowledge and experience between groups. We hope to awaken the enthusiasm of young people, fostering their curiosity and aptitude for exchange with their colleagues. But we also hope that these days in Bariloche will be a fruitful, all-round experience that helps us reflect on our history and consider the future challenges of this young scientific community.
Organizing Committee
VCAB2 C 2014
Forewords
Our young association was born recently out of its founders' need to promote a discipline that had been neglected in our country, creating an environment for cooperation, mutual exchange and a shared identity among its members. In organizing the 2014 Conference, one of our aims was to continue consolidating the identity of this new association.
The small Patagonian city of San Carlos de Bariloche, located in a National Park renowned for its natural resources, is considered a center of excellence for science and technology. Three national universities, institutes of CONICET and CNEA, and technology companies such as INVAP and Satellogic have their seats here. Bariloche exports nuclear and telecommunications technology, employing a significant part of its 150 thousand inhabitants. But, as in other regions of our country and particularly in Patagonia, geographical and political isolation has left its mark on the history of Bariloche. Thus, this Conference is a further step in the process of expanding and integrating the national scientific community and strengthening its international links.
In recent years, scientific activity in our country has been experiencing a period of revitalization. Science policies are becoming more stable with the consensus of the research community, the general population recognizes the value of science for our nation, and the geopolitical alliances in the region open opportunities for autonomous development. In this scenario, scientific societies constitute a natural means for researchers to channel joint actions to consolidate these changes and reinforce national scientific development.
It is our wish that this conference enrich the scientific development of the participants, promote meetings between colleagues and friends, and facilitate the exchange of knowledge and experiences. We also hope to arouse the enthusiasm of young researchers, enlivening their curiosity and aptitude for sharing and exchanging with their colleagues. But we also wish that these days in Bariloche be a fruitful and integral experience that helps us reflect on the history and future challenges of our young scientific community.
Steering Committee
VCAB2 C 2014
Contents
Program Committee
Steering Committee
Executive Commission
Forewords
Program
Main Lectures
  Tom L Blundell: Proteomes, Structural Biology and Drug Discovery: Visualization, Analysis and Molecular Modeling. Monday 22, 10:30AM.
  Nuria E. Campillo: This thing called Cheminformatics. Monday 22, 4:30PM.
  Manfred Sippl: Buena Vista – a grand view on protein folds and folding. Tuesday 23, 11:00AM.
  Morten Nielsen: Algorithms in bioinformatics: Simple solutions to complex problems. Tuesday 23, 5:30PM.
  Cristina Marino Buslje: Activating Mutations Cluster in the 'Molecular Brake' Regions of Protein Kinases. Implications for Driver Mutation Prediction. Wednesday 24, 12:00PM.
  Francisco Melo Ledermann: Towards a better understanding of the key molecular determinants that mediate protein-DNA recognition. Wednesday 24, 2:00PM.
Lectures
  Ignacio Sanchez: Amino acid metabolism conflicts with protein diversity. Monday 22, 12:00PM.
  Gustavo E. Vazquez: The problem of Feature Selection in Cheminformatics: How can visual analytics help us? Monday 22, 5:30PM.
  Sebastian Fernández-Alberti: Collective vibrations and key residues associated to conformational selection upon ligand binding. Tuesday 23, 12:00PM.
  Mariano C. González Lebrero: Expanding the boundaries of the quantum-classical simulations using GPUs and electronic dynamics. Tuesday 23, 2:00PM.
  Paolo Marcatili: High-throughput identification of antigens by metatranscriptomics and peptide chip technology. Tuesday 23, 5:30PM.
Oral Sessions
  Session 1: Structure prediction & protein function – Proteomics (Monday 22, 2:00PM)
  Session 2: Sequence analysis – Cheminformatics (Tuesday 23, 9:00AM)
  Session 3: Systems Biology & Networks (Tuesday 23, 3:00PM)
  Session 4: Genomics, functional genomics – Proteomics & functional proteomics (Wednesday 24, 9:30AM)
Poster Session
  Sequence analysis
  System Biology and Networks
  Genome Annotation and Organization
  Evolution, phylogenetics and comparative genomics
  Genomics, functional genomics and metagenomics
  Metabolomics and Cheminformatics
  Proteomics and functional proteomics
  Structure prediction and protein function
Index
Conference Program
Monday 22
9:00   Registration
10:00  Opening Ceremony
10:30  Opening Lecture: Sir Tom Blundell
11:30  Coffee break
12:00  Lecture: Ignacio Sánchez
12:45  Lunch time
14:00  Oral session 1: Structure Prediction & Proteomics
16:00  Coffee break
16:30  Main Lecture: Nuria Campillo
17:30  Lecture: Gustavo Vázquez
18:30  Poster Session
Tuesday 23
9:00   Oral session 2: Sequence Analysis & Cheminformatics
10:30  Coffee break
11:00  Main Lecture: Manfred Sippl
12:00  Lecture: Sebastián Fernandez-Alberti
12:45  Lunch time
14:00  Lecture: Mariano González Lebrero
15:00  Oral session 3: Systems Biology
16:00  Coffee break
16:30  Main Lecture: Morten Nielsen
17:30  Lecture: Paolo Marcatili
18:30  Poster Session
Wednesday 24
9:30   Oral session 4: Genomics and Proteomics
11:30  Coffee break
12:00  Main Lecture: Cristina Marino-Buslje
12:45  Lunch time
14:00  Closing Lecture: Francisco Melo
15:00  Poster Prizes
15:30  Closing Ceremony and coffee break
Main Lectures
Proteomes, Structural Biology and Drug Discovery: Visualization, Analysis and
Molecular Modeling
Tom L Blundell
Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA
My talk will focus on the importance of understanding the structures of proteins and the analysis of multiprotein
assemblies in order to understand their central roles in cell regulation. I will describe the development of software that
allows modelling and visualisation of the proteome of humans and their pathogens. I will discuss the increasing interest
in targeting protein-protein interfaces of multiprotein assemblies in the design of chemical tools and therapeutic agents.
Evidence is accumulating that such an approach will offer greater opportunities in improving specificity and selectivity
compared to targeting active sites of proteases, protein kinases and other enzymes involved in post-translational
modification. However, at the same time they pose new challenges, particularly because the protein-protein interfaces
tend to be less ligandable than active sites.
This thing called Cheminformatics
Nuria E. Campillo
Centro de Investigaciones Biológicas (CIB-CSIC) - Ramiro de Maeztu, 28040-Madrid-Spain
Cheminformatics is the application of computational and information techniques to a range of problems in the field of chemistry. In this talk we will look specifically at the application of cheminformatics in drug development.
A brief introduction to cheminformatics will lead into its application in two of our ongoing projects. In the first, we use cheminformatic tools to develop a new strategy based on the design of multi-targeted drugs to treat Alzheimer's disease (AD). This strategy relies on chemical compounds capable of interacting with multiple targets known to be involved in aspects of the development of this disease, such as cholinergic deficit and aggregation of β-amyloid peptide. The targets considered in this project are CB2R (cannabinoid system) and BuChE (cholinergic system). The second project deals with the development of neural network models for the prediction of blood-brain barrier passage and human intestinal absorption. These models have been published on the EURL ECVAM DB-ALM website as in silico protocols for use as alternative methods.
Buena Vista – a grand view on protein folds and folding.
Manfred Sippl
C.A.M.E. - Center of Applied Molecular Engineering
University of Salzburg - Department of Molecular Biology - Division of Structural Biology & Bioinformatics
With the massive increase in the number of solved protein structures we begin to see more clearly how new protein folds
arise from old templates. Proteins evolve as molecular complexes as opposed to single chain entities. Exploration of
the manifold phylogenetic and functional relations among molecular complexes requires specific tools for fast retrieval
and visualization of structural matches. The problems involved challenge some fundamental issues in bioinformatics.
We discuss current challenges and solutions.
References
• Sippl, M.J. & Wiederstein, M., Detection of spatial correlations in protein structures and molecular complexes. Structure
Vol. 20, pp. 718-728 (2012)
• Wiederstein, M., Gruber, M., Frank, K., Melo, F. & Sippl, M.J., Structure-based characterization of multiprotein complexes. Structure Vol. 22(7), pp. 1063-1070 (2014)
• Sippl, M.J., On distance and similarity in fold space. Bioinformatics Vol. 24 (6) , pp. 872-873 (2008)
Web-services
Structure Search (TopSearch): https://topsearch.services.came.sbg.ac.at/
Protein Structure Analysis (Prosa): https://prosa.services.came.sbg.ac.at/prosa.php
Algorithms in bioinformatics: Simple solutions to complex problems
Morten Nielsen
Associate Professor
Center for Biological Sequence Analysis, The Technical University of Denmark, Denmark
Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
Data mining and machine learning are two central areas of bioinformatics. Over the last decades, my group has
developed a large panel of machine learning methods suitable for data mining and pattern recognition in
biological data. Most of the methods are hybrids of standard machine learning methods including linear regression,
Gibbs sampling, and artificial neural networks.
Although very simple, these methods have proven highly accurate when it comes to identification of patterns in
complex biological data. In my presentation, I will describe the background of some of these methods, illustrate their
functionality on biological data, and outline areas where I believe they could be complemented by expanding into novel
areas of machine learning such as Deep Learning or by making novel machine learning hybrids such as artificial neural
network guided Gibbs Clustering.
Activating Mutations Cluster in the ’Molecular Brake’ Regions of Protein Kinases.
Implications for Driver Mutation Prediction
Cristina Marino Buslje
Bioinformatics Unit, Fundación Instituto Leloir, Capital Federal, Argentina
Mutations leading to activation of proto-oncogenic protein kinases (PKs) are a type of driver crucial for understanding
tumorigenesis and as targets for anti-tumor drugs. However, bioinformatics tools so far developed to differentiate
driver mutations, typically based on conservation considerations, systematically fail to predict activating mutations
in PKs. Here we present the first comprehensive analysis of the 407 activating mutations described in the literature,
which affect 41 PKs. Unexpectedly, we found that these mutations do not associate with conserved positions and
do not directly affect ATP binding or catalytic residues. Instead, they cluster around three segments that have been
demonstrated to act, in some PKs, as "molecular brakes" of the kinase activity. This finding led us to hypothesize
that an autoinhibitory mechanism mediated by such "brakes" is present in all PKs and that the majority of activating
mutations act by releasing it. Our results also demonstrate that activating mutations of PKs constitute a distinct
group of drivers and that specific bioinformatics tools are needed to identify them in the numerous cancer sequencing
projects currently underway. The clustering in three segments should represent the starting point of such tools, a
hypothesis that we tested by identifying two somatic mutations in EPHA7 that might be functionally relevant.
Towards a better understanding of the key molecular determinants that mediate
protein-DNA recognition
Francisco Melo Ledermann
Faculty of Biological Sciences
Pontificia Universidad Católica de Chile
email address: fmelo at bio.puc.cl
In this talk, a general description of several bioinformatics tools recently developed in our lab to assist the study of protein-DNA interactions will be provided. These include a database of protein-DNA interfaces, knowledge-based potentials to describe protein-DNA interactions, software for the full-atom 3D modeling of duplex DNA and protein-DNA complexes, and a PyMOL plugin to visualize the binding interface of protein-DNA complexes. Additionally, some preliminary results obtained from ongoing research on the validation of these bioinformatics tools and on the analysis of experimental data involving protein-DNA complex structures, protein-DNA binding assays and genomic data will be shown.
Lectures
Amino acid metabolism conflicts with protein diversity
Ignacio Sánchez
Krick T, Verstraete N, Alonso LG, Shub DA, Ferreiro DU, Shub M, Sánchez IE.
Pab II, 4th floor, Lab QB-9, Facultad de Ciencias Exactas y Naturales
Universidad de Buenos Aires - Buenos Aires, Argentina
The twenty protein coding amino acids are found in proteomes with different relative abundances. The most abundant
amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine.
Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a
diverse set of protein sequences is necessary to build functional proteomes. Here we present a simple model for
a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing
sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We
found that the data is remarkably well explained when the cost function accounts for amino acid chemical decay.
More than one hundred organisms reach comparable solutions to the trade-off by different combinations of proteome
cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can
get optimally large and diverse.
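One way to picture the cost-diversity trade-off is as a maximum-entropy problem: when sequence entropy is maximized at a fixed mean cost per residue, the amino acid frequencies take the exponential form p_i ∝ exp(-λ·c_i). The sketch below illustrates only that general form. The cost values and λ are invented placeholders, and the function names are hypothetical; none of them are the authors' cost function or fitted parameters.

```python
import math

def maxent_frequencies(costs, lam):
    """Frequencies p_i proportional to exp(-lam * c_i), the distribution that
    maximizes Shannon entropy for a fixed mean cost (standard max-entropy result)."""
    weights = {aa: math.exp(-lam * c) for aa, c in costs.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def entropy_bits(freqs):
    """Shannon entropy of a frequency table, in bits per residue."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)

def mean_cost(freqs, costs):
    """Average cost per residue under the given frequencies."""
    return sum(freqs[aa] * costs[aa] for aa in freqs)

# Hypothetical relative costs for four residues (illustrative values only).
toy_costs = {"L": 1.0, "A": 1.2, "W": 3.0, "C": 3.5}

# lam = 0 ignores cost (uniform, maximal entropy); larger lam trades
# entropy for a lower mean cost.
for lam in (0.0, 1.0):
    p = maxent_frequencies(toy_costs, lam)
    print(lam, round(entropy_bits(p), 3), round(mean_cost(p, toy_costs), 3))
```

Increasing λ lowers both the mean cost and the entropy, which is the trade-off the abstract describes in qualitative terms.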
The problem of Feature Selection in Cheminformatics: How can visual analytics
help us?
Gustavo E. Vazquez
Facultad de Ingeniería y Tecnologías - Universidad Católica del Uruguay, Montevideo, Uruguay.
Traditionally, the design of QSAR/QSPR models is a complex task; the identification of the most relevant descriptors
that describe the phenomena under study constitutes a key step of this process. Most feature selection methods
used for addressing this step are focused on pure statistical associations among descriptors and target properties,
whereas the chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the
QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach
that integrates chemists' expertise into the selection process is needed to increase user confidence in the final set
of chosen descriptors and to improve the interpretability of the final model. We will talk about how the visual analytics
discipline can assist the model developer in the process of feature selection.
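The limitation described above can be illustrated with a toy example of purely statistical selection: descriptors are ranked by the absolute Pearson correlation with the target property, so a chemically meaningless column can come out on top. The descriptor names, the data and the `expert_excluded` mechanism are assumptions made for illustration; this is not the visual analytics method presented in the talk.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def rank_descriptors(descriptors, target, expert_excluded=()):
    """Rank descriptor names by |correlation| with the target, best first.
    Names listed in expert_excluded are dropped before ranking."""
    scored = [(abs(pearson(values, target)), name)
              for name, values in descriptors.items()
              if name not in expert_excluded]
    return [name for _, name in sorted(scored, reverse=True)]

# Invented data: 'batch_id' tracks the target perfectly but carries no chemistry.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_descriptors = {
    "logP": [1.1, 1.9, 3.2, 3.9, 5.1],
    "mol_weight": [5.0, 3.0, 4.0, 1.0, 2.0],
    "batch_id": [1.0, 2.0, 3.0, 4.0, 5.0],
}

print(rank_descriptors(toy_descriptors, target))
print(rank_descriptors(toy_descriptors, target, expert_excluded={"batch_id"}))
```

The second call shows the simplest possible way of letting expert knowledge override the statistics; an interactive tool would expose the same decision visually rather than through a hard-coded set.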
Collective vibrations and key residues associated to conformational selection upon
ligand binding
Sebastian Fernández-Alberti
Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
The conformational selection paradigm for receptor-ligand binding establishes that ligand-bound conformations are a subset of the ligand-free conformational space. Therefore, dynamic fluctuations associated with the ligand-free conformation should contain information about unbound-to-bound conformational changes in the receptor. Coarse-grained Normal Mode Analysis and molecular dynamics simulations provide the required information to explore these features.
First, we present a procedure to identify and characterize dynamically relevant residues responsible for maintaining the conformational multiplicity associated with ligand binding. These key residues can potentially be considered fingerprints of protein function. Furthermore, they can be proposed as promising targets for mutational and functional studies.
Next, we present a novel procedure to define and compare essential dynamics subspaces associated with ligand-bound and ligand-free conformations. Our procedure allows us to emphasize the main similarities and differences between the different essential dynamics. Essential dynamics subspaces associated with conformational transitions are also defined. In this way, the extent to which conformational changes upon ligand binding are included in each conformer-specific essential dynamics can be evaluated. As a test case, the glutaminase interacting protein (GIP), composed of a single PDZ domain, is considered. Both the GIP ligand-free state and the glutaminase L peptide-bound state are analyzed.
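As a rough illustration of what comparing essential dynamics subspaces can involve, the sketch below computes two commonly used quantities: the root mean square inner product (RMSIP) between two sets of orthonormal modes, and the fraction of a conformational displacement captured by a subspace. The abstract does not state which measures the authors use, so both functions, and the four-dimensional toy vectors, are assumptions.

```python
import math

def dot(u, v):
    """Inner product of two vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def rmsip(modes_a, modes_b):
    """Root mean square inner product between two sets of orthonormal modes.
    Identical subspaces give 1.0, mutually orthogonal subspaces give 0.0."""
    total = sum(dot(u, v) ** 2 for u in modes_a for v in modes_b)
    return math.sqrt(total / len(modes_a))

def transition_coverage(modes, displacement):
    """Fraction of the squared displacement that lies inside the subspace
    spanned by the (orthonormal) modes."""
    return sum(dot(m, displacement) ** 2 for m in modes) / dot(displacement, displacement)

# Toy orthonormal modes in a 4-dimensional coordinate space (illustrative only).
free_modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
bound_modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
unbound_to_bound = [1.0, 1.0, 0.0, 0.0]  # hypothetical displacement vector

print(rmsip(free_modes, bound_modes))
print(transition_coverage(free_modes, unbound_to_bound))
print(transition_coverage(bound_modes, unbound_to_bound))
```

In this toy case the ligand-free subspace contains the whole unbound-to-bound displacement while the bound subspace contains half of it, which is the kind of comparison the conformational selection picture motivates.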
Expanding the boundaries of the quantum-classical simulations using GPUs and
electronic dynamics.
Mariano C. González Lebrero
Instituto de Química y Fisicoquímica Biológicas - Facultad de Farmacia y Bioquímica
Universidad de Buenos Aires - Buenos Aires, Argentina.
The use of hybrid quantum-classical (QM/MM) simulation tools has proved useful for answering questions in the field of chemistry and biochemistry. Proof of this is that the main developers of these techniques were awarded the Nobel Prize in Chemistry. Current QM/MM applications seek to describe the nuclear dynamics while keeping the electronic structure in the ground state. This approach does not allow the treatment of situations in which electronic dynamics are relevant, such as interaction with light and derived processes, or electron transfer, among others.
In this talk I will present the results of our efforts to expand the range of systems and processes to which these techniques can be applied. In particular, I will focus on the use of GPUs to simulate large systems at an affordable computational cost, and on the recent implementation of electronic and electronic-nuclear dynamics methods based on the Real-Time Time-Dependent Density Functional Theory (RT-TDDFT) scheme.
High-throughput identification of antigens by metatranscriptomics and peptide
chip technology
Paolo Marcatili
Technical University of Denmark (DTU), Department of Systems Biology
Lyngby, Denmark
The experimental identification of antigens is a fundamental yet problematic task in vaccine discovery: many pathogens
can hardly be cultivated, they might require specific environmental conditions to express their antigenic proteins, they
might present a large number of subdominant antigens and induce a complex polyclonal antibody response in the
host organism. In order to solve these problems we developed an integrated pipeline to detect simultaneously all
the potential antigens for the ruminant disease Digital Dermatitis (DD), together with the specific immune response
developed by the host cow, in a culture-independent manner. We used a metatranscriptomic approach to identify
all the genes expressed by the complex assemblage of DD-associated bacteria (mainly belonging to the Treponema
genus) and the immunologically relevant genes associated with the polyclonal immune response in more than 30
infected cows.
From this extended pool of more than 80,000 bacterial transcripts we identified, using structural and functional
bioinformatics prediction tools, the 600 proteins most likely to be antigenic and subsequently screened those for
antibody reactivity using a peptide-chip technology. On the other hand, the immune repertoire of B- and T-cell
receptors expressed by each individual cow in response to the disease has been identified and analysed in order to
provide further information for the development of the vaccine, such as the MHC specificity and eventually the
molecular basis of antibody-antigen interaction.
This novel integrated approach can be extremely powerful for developing new vaccines and for understanding the complex
interplay between pathogens and the host immune system.
Oral Sessions
Structure prediction and protein function
Oral Session – Submission 3
Characterization of binding specificities of Bovine Leucocytes class I molecules:
Impacts for rational epitope discovery
Morten Nielsen1,2 , Andreas M. Hansen3 , Michael Rasmussen3 , and Soren Buus3
1-Center for Biological Sequence Analysis, Danish Technical University, Denmark
2-Instituto de Investigaciones Biotecnológicas, UNSAM, San Martín, Buenos Aires, Argentina
3-Laboratory of Experimental Immunology, Faculty of Health Sciences, University of Copenhagen, Denmark
Background. The binding of peptides to classical major histocompatibility complex (MHC) class I proteins
is the single most selective step in antigen presentation [1]. However, the peptide binding specificity of the cattle
MHC (bovine leucocyte antigen, BoLA) class I (BoLA-I) molecules remains poorly characterized. We have previously
proposed a reverse immunology strategy for effective and rational epitope discovery based on in silico prediction tools
combined with experimental peptide-binding data from recombinant bovine MHCs [2]. Our aim here is to extend
this approach and improve the performance of the MHC peptide binding prediction methods NetMHC [3, 4] and
NetMHCpan [5, 6] by integrating peptide binding affinity data for a limited set of prevalent BoLA MHC class I
molecules. This will demonstrate how such an approach in a highly cost effective manner can be used to guide the
search for CTL epitopes in cattle. Our strategy was to use a nonameric Positional Scanning Combinatorial Peptide
Library (PSCPL) in combination with a high-throughput peptide-MHC-I dissociation assay, and to feed these data
into peptide binding prediction methods [7].
Results. Using this strategy, we have characterized 8 BoLA-I molecules. The peptide specificity of the BoLA-I
molecules was found to resemble that of human MHC-I molecules, with primary anchors at P2 and P9 and occasional
auxiliary P1 and P3 anchors. Seven of the 8 molecules preferred hydrophobic, whereas one (BoLA-2*01201 (T2A))
preferred positively charged P9 terminal anchor residues. Anchors in the other positions were more diverse. An
example of two characterized binding motifs is shown in figure 1.
Figure 1: Sequence logo representation of the binding motif of (left) BoLA-HD6 and (right) BoLA-T2A. The sequence logos were generated using
the Seq2Logo server [8] from the PSCPL binding data.
We analyzed 9 reported CTL epitopes from T. parva, the causative agent of East Coast fever in cattle, and in
8 cases, stable and high affinity binding was confirmed. Likewise, cross binding was observed between functionally
related MHCs. A set of peptides was tested for binding affinity to the 8 BoLA proteins and used to refine the
predictors NetMHC and NetMHCpan.
Table 1: Experimental validation of binding affinity of 5 known BoLA-I restricted epitopes and the alternative minimal epitopes suggested by in silico
predictions. Additional amino acids flanking the minimal epitope are underlined.
The inclusion of BoLA specific peptide binding data led to a significant improvement in prediction accuracy for
reported T. parva CTL epitopes. For an extended set of reported CTL epitopes with weak or no predicted binding,
these refined prediction methods suggested the presence of nested, truncated minimal epitopes with high predicted binding
affinity. The enhanced affinity of the alternative peptides was tested and in all cases confirmed experimentally (see
table 1), and in one case the suggested new minimal epitope was validated using tetramer staining (see figure 2).
Figure 2: Validation of the alternative BoLA-B*04101 Tp2 epitope. T cells were stained with anti-bovine CD8 and different BoLA-6*04101 tetramers:
(Left) Unfolded (no peptide), (Middle) The longer Tp2 peptide, (Right) The truncated optimal peptide. Data taken from [9].
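The idea of a nested minimal epitope can be sketched with a simple position-specific scoring matrix: every 9-residue window of a longer reported epitope is scored, and the best-scoring window is proposed as the candidate minimal epitope. This is only a toy stand-in. NetMHC and NetMHCpan are neural-network predictors, and the matrix values, the peptide sequence and the function names below are invented for illustration, with anchors placed at P2 and P9 as described in the Results.

```python
def score_nonamer(peptide, matrix, default=0.0):
    """Sum position-specific scores over a 9-residue peptide.
    matrix maps (position, residue) to a score, with positions 1..9."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue peptide")
    return sum(matrix.get((pos, aa), default)
               for pos, aa in enumerate(peptide, start=1))

def best_nested_nonamer(epitope, matrix):
    """Return (score, start_index, nonamer) for the best-scoring 9-mer window."""
    windows = [(score_nonamer(epitope[i:i + 9], matrix), i, epitope[i:i + 9])
               for i in range(len(epitope) - 8)]
    return max(windows, key=lambda w: w[0])

# Hypothetical motif: a P2 anchor and a hydrophobic P9 anchor (invented scores).
toy_matrix = {(2, "Q"): 2.0, (9, "L"): 2.0, (9, "V"): 1.5}

# Invented 11-residue "reported epitope" with one window matching both anchors.
reported = "AAQGGGGGGLK"
print(best_nested_nonamer(reported, toy_matrix))
```

Only the window starting at index 1 places Q at P2 and L at P9, so it is returned as the candidate; in the study itself such candidates were then confirmed experimentally.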
Conclusions. To the best of our knowledge, this is the first study that demonstrates how biochemical peptide
binding data combined with immunoinformatics can be effectively used to characterize the peptide binding motifs of
BoLA-I molecules, and how such data can be used to boost performance of MHC-peptide binding prediction methods,
empowering rational epitope discovery and aiding the understanding of T-cell immune response in cattle.
References
1. Yewdell, J.W. and J.R. Bennink, Immunodominance in major histocompatibility complex class I-restricted T lymphocyte
responses. Annual Review of Immunology, 1999. 17: p. 51-88.
2. Nene, V., et al., Designing bovine T cell vaccines via reverse immunology. Ticks Tick Borne Dis, 2012. 3(3): p. 188-92.
3. Lundegaard, C., et al., NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I
affinities for peptides of length 8-11. Nucleic Acids Res, 2008.
4. Nielsen, M., et al., Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.
Protein Sci, 2003. 12(5): p. 1007-17.
5. Nielsen, M., et al., NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus
protein of known sequence. PLoS ONE, 2007. 2(8): p. e796.
6. Hoof, I., et al., NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics, 2009. 61(1):
p. 1-13.
7. Harndahl, M., et al., Real-time, high-throughput measurements of peptide-MHC-I dissociation using a scintillation proximity assay. J Immunol Methods, 2011. 374(1-2): p. 5-12.
8. Thomsen, M.C. and M. Nielsen, Seq2Logo: a method for construction and visualization of amino acid binding motifs and
sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and
depletion. Nucleic Acids Res, 2012. 40(Web Server issue): p. W281-7.
9. Svitek, N., et al., Use of "one-pot, mix-and-read" peptide-MHC class I tetramers and predictive algorithms to improve
detection of cytotoxic T lymphocyte responses in cattle. Vet Res, 2014. 45(1): p. 50.
Structure prediction and protein function
Oral Session – Submission 6
¹³Cα and ¹³Cβ chemical shift-driven refinement of protein structures
Pedro G. Ramírez
IMASL-CONICET. Universidad Nacional de San Luis, Italia 1556, 5700 - San Luis, Argentina
Background. X-ray crystallography (XRC) and nuclear magnetic resonance (NMR) spectroscopy are the most
powerful and predominant techniques used to experimentally determine the three-dimensional structures of biological
macromolecules at near-atomic resolution. On the one hand, XRC has no size limitations and provides the most precise
atomic detail, although the information it gives about the dynamics of the molecule may be limited. On the other hand,
NMR spectroscopy is preferable to XRC when no protein crystals are available and, in addition, it provides solution-state
dynamics. However, the main drawback of NMR spectroscopy is that it delivers lower-resolution structures
[1]. Because of this, validation, the process of evaluating the reliability of three-dimensional atomic models, becomes
critically important to protein structure determination via NMR spectroscopy.
Materials and methods. Our group has developed a protein structure validation method called CheShift-2 [2],
which allows us to calculate the differences between observed and calculated chemical shifts for the nuclei of
interest (¹³Cα and ¹³Cβ). This validation method indicates where in the protein structure the largest differences
are found. This allows us to modify the relevant torsional angles, while keeping compatibility with all the existing
experimental information, in such a way that the agreement between observed and computed chemical shift values is
optimized at both the local and the global level.
We use a refinement algorithm that identifies the residues that contain flaws and then modifies the protein structure's
torsional angles in a way that tends to diminish these flaws. The information needed to identify these residues is obtained
from CheShift-2, and to perturb the protein structure we use ROSETTA [3], a software package for the prediction and design
of protein structures.
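The residue-identification step described above can be sketched as follows. This is only an illustration of the idea of comparing observed and computed chemical shifts per residue; it is not the CheShift-2 implementation, and the function names, the example shift values and the 2.0 ppm threshold are all assumptions made for the example.

```python
# Illustrative sketch only: NOT the CheShift-2 algorithm.
# Shift values and the 2.0 ppm threshold are hypothetical.
from typing import Dict, List


def shift_differences(observed: Dict[int, float],
                      computed: Dict[int, float]) -> Dict[int, float]:
    """Absolute observed-minus-computed shift difference (ppm) per residue.

    Only residues present in both dictionaries are compared.
    """
    return {res: abs(observed[res] - computed[res])
            for res in observed if res in computed}


def flag_residues(differences: Dict[int, float],
                  threshold_ppm: float = 2.0) -> List[int]:
    """Residue numbers whose difference exceeds the (assumed) threshold."""
    return sorted(res for res, diff in differences.items()
                  if diff > threshold_ppm)


# Hypothetical 13C-alpha shifts (ppm), keyed by residue number.
observed_ca = {1: 56.1, 2: 58.4, 3: 52.0, 4: 61.3}
computed_ca = {1: 55.8, 2: 61.9, 3: 52.4, 4: 61.0}

diffs = shift_differences(observed_ca, computed_ca)
flagged = flag_residues(diffs)
print(flagged)
```

In a refinement loop, the flagged residues would be the ones whose torsional angles are perturbed; how the perturbation is performed with ROSETTA is not specified in the abstract and is not sketched here.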
Conclusions. We evaluate our methodology by comparing the set of refined structures with the same protein experimentally
determined at high quality, using the root mean square deviation (RMSD) and the global distance test high accuracy
score (GDT-HA) [4]. Moreover, the physicochemical quality of the results was assessed with validation
methods such as PROCHECK [5] and MolProbity [6].
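The two comparison scores named above can be illustrated with a minimal sketch. It assumes that the model and the reference are already superimposed and contain the same Cα atoms in the same order; the GDT-HA reported by tools such as LGA searches over many superpositions, which this sketch does not attempt. The coordinates are hypothetical.

```python
# Minimal sketch, assuming pre-superimposed, equally ordered C-alpha coordinates.
# A real GDT-HA calculation (e.g. in LGA) optimizes the superposition per cutoff.
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_distances(model: Sequence[Coord],
                          reference: Sequence[Coord]) -> List[float]:
    """Euclidean distance between equivalent atoms of two structures."""
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation in the units of the coordinates."""
    dists = per_residue_distances(model, reference)
    return math.sqrt(sum(d * d for d in dists) / len(dists))


def gdt_ha(model: Sequence[Coord], reference: Sequence[Coord],
           cutoffs: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Mean percentage of residues within 0.5, 1, 2 and 4 angstroms."""
    dists = per_residue_distances(model, reference)
    fractions = [sum(1 for d in dists if d <= c) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example (angstroms).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.6, 0.0), (2.0, 1.5, 0.0), (3.0, 0.0, 3.0)]

print(round(rmsd(model, reference), 3), gdt_ha(model, reference))
```

Lower RMSD and higher GDT-HA both indicate a refined structure that lies closer to the reference.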
Acknowledgments. This work was supported by PIP-112-2011-0100030 (JAV) from IMASL-CONICET, Argentina,
and Project 328402 (JAV) from UNSL, Argentina. The research was conducted by using the resources of a local
Beowulf-type cluster at the IMASL-CONICET.
References
1. Krishnan VR, B.: Macromolecular Structure Determination: Comparison of X-ray Crystallography and NMR Spectroscopy.
eLS 2012.
2. Martin OA, Vila JA, Scheraga HA: CheShift-2: graphic validation of protein structures. Bioinformatics 2012, 28(11):1538-1539.
3. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O et al: Structure
prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2009, 77 Suppl 9:89-99.
4. Zemla A: LGA: A method for finding 3D similarities in protein structures. Nucleic acids research 2003, 31(13):3370-3374.
5. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of
protein structures. Journal of Applied Crystallography 1993, 26(2):283-291.
6. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB, 3rd, Snoeyink J, Richardson
JS et al: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic acids research
2007, 35(Web Server issue):W375-383.
Structure prediction and protein function
Oral Session – Submission 37
Physicochemical Characterization and Phylogenetic Classification of the 2/2
Hemoglobins Family sheds light on their Molecular Functions
Juan P. Bustamante1,3, Leonardo Boechi2, Leandro Radusky3, Darío A. Estrín1, Arjen ten Have4 and Marcelo A. Martí1,3
1 Departamento de Química Inorgánica, Analítica y Química Física, INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina. [email protected]
2 Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
3 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
4 Instituto de Investigaciones Biológicas, CONICET, Universidad Nacional de Mar del Plata. Buenos Aires, Argentina
The globin family of heme proteins offers a large, diverse set of proteins whose function is tightly related to affinity
and reactivity towards small ligands, mainly O2 but also NO, CO, and H2S [1]. Globins with high O2 affinity generally
function as O2-redox related enzymes, like the Mycobacterium tuberculosis 2/2 HbN NO dioxygenase; moderate-affinity
globins usually act as oxygen carriers, like mammalian myoglobin; and low O2 affinity globins are mostly NO or
CO sensors, like soluble guanylate cyclase. We classified and characterized 1107 protein sequences of the 2/2 Hbs
family, one of the three major globin subfamilies [2], based on the assumption that a protein's function is determined
by its structure and by the physicochemical properties encoded in its sequence. We combined bioinformatics and structural
biology with a phylogenetic reconstruction to describe and assign the key 2/2 Hbs features that in turn determine O2
affinity. Our physicochemical model sheds light on the molecular details of O2 affinity and allows kinetic
constants to be estimated for 2/2 Hbs proteins. The predicted O2 affinities, based on ligand entry and stabilization, are
substantiated by the evolutionary relationships shown in the phylogenetic tree. The results offer a general
understanding of the putative functions of 2/2 Hbs in terms of protein diversity.
References
1. Milani M, Pesce A, Nardini M, Ouellet H, Ouellet Y, Dewilde S, Bocedi A, Ascenzi P, Guertin M, Moens L, Friedman JM,
Wittenberg JB, Bolognesi M. Structural bases for heme binding and diatomic ligand recognition in truncated hemoglobins.
Journal of Inorganic Biochemistry. 2005. 99:97-109.
2. Vuletich DA and Lecomte JTJ. A Phylogenetic and Structural Analysis of Truncated Hemoglobins. Journal of Molecular
Evolution. 2006. 62:196–210.
Proteomics and functional proteomics
Oral Session – Submission 52
On the analysis of vibrations associated to conformational selection upon ligand
binding in a PDZ domain protein
Marcos Grosso1, Adrián Kalstein1, Adrián Roitberg2 and Sebastián Fernández-Alberti1
1 Quilmes National University, Bernal, Argentina, Roque Saenz Peña 352, B1876BXD
2 Department of Chemistry, University of Florida, Gainesville, Florida 3261
The conformational selection paradigm for receptor-ligand binding establishes that ligand-bound conformations are
a subset of the ligand-free conformational space. Therefore, dynamic fluctuations associated with the ligand-free
conformation should contain information about unbound-to-bound conformational changes in the receptor. This
concept emerged as an alternative to the traditional induced-fit model, based on the hypothesis that ligand binding
to the ligand-free conformations induces conformational transitions to the ligand-bound state. Molecular dynamics
simulations provide the required information to explore these features. Their use in combination with subsequent essential
dynamics analysis (1) allows separating large concerted conformational rearrangements from irrelevant fluctuations.
We present a novel procedure to define and compare essential dynamics subspaces associated with ligand-bound and
ligand-free conformations. Our procedure allows us to emphasize the main similarities and differences between the
different essential dynamics. Essential dynamics subspaces associated with conformational transitions are also defined.
In this way, the extent to which conformational changes upon ligand binding are included in each conformer-specific
essential dynamics can be evaluated. As a test case, the glutaminase interacting protein (GIP), composed of
a single PDZ domain, is considered. Both the GIP ligand-free state and the glutaminase L peptide-bound state are analyzed.
Our findings concerning the relative changes in the flexibility pattern upon binding are in good agreement with previous
NMR data.
Subspace A   Subspace B   M     ζ        nD
S(lb)        S(lf)        106   69.74%   92.8
SV           S(lf)        33    96.34%   31.8
SV           S(lb)        33    96.29%   31.8
Table 1: Comparison of ligand-free, ligand-bound, and conformational transition essential dynamics subspaces (S(lf), S(lb), and SV, respectively).
Conclusions. We have developed a general and novel procedure to define the size and composition of conformer-specific
and conformational transition essential dynamics. We have also described a procedure to compare essential dynamics
subspaces. The procedure is easy to implement and emphasizes the main similarities and differences between
the different essential dynamics. We were able to explore the extent to which conformational changes upon
ligand binding are included in each conformer-specific essential dynamics. We consider that the method is suitable
for a large variety of cases, such as the analysis of the effects of mutations on dynamics, the design of
new drugs that prevent conformational changes upon ligand binding, and the analysis of conformational transitions
induced by changes in cofactor oxidation states. MD simulations and PCA of GIP in its ligand-bound and ligand-free
conformations have been considered as a test case. We have found that the sizes of the essential subspaces
required to include every PCA mode that participates significantly in any structural change observed during each of
the MD simulations are larger than the most frequently considered number of modes. The analysis of the essential
dynamics subspace associated with conformational transitions indicates that in most cases mainly the βa-βb hairpin
and the β2-β3 loop are involved. Our findings are in good agreement with the previous NMR data analysis performed
by Mohanty et al. (2). The relative changes in the flexibility pattern upon binding are in agreement with the general
trend that, except in the regions of GIP that directly interact with the ligand, the ligand-bound conformation is more
flexible than the ligand-free conformation. We observed that the conformational transitions involve more complex
geometry distortions than the ones collected during the ligand-free MD simulations. The comparison of essential
dynamics subspaces for the ligand-free, ligand-bound, and conformational transition cases reveals that the ligand-free
and ligand-bound MD simulations share almost 70% of their essential dynamics. In addition, the essential dynamics
associated with the conformational transition is almost completely covered (ζ above 96%, Table 1) by the essential
dynamics of each of the ligand-free and ligand-bound states. In this way, the conformational selection model for binding
is validated. Dynamic fluctuations associated with both conformations account for unbound-to-bound displacements.
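One common way to quantify how much of one essential subspace is contained in another is to project the orthonormal modes of subspace A onto the modes of subspace B and average the squared projections. The sketch below shows that generic measure only; it is an assumption made for illustration and is not claimed to be the ζ or nD defined by the authors, whose exact definitions are not given in this abstract.

```python
# Generic subspace-overlap sketch; NOT necessarily the authors' zeta measure.
# Both subspaces are assumed to be given as orthonormal mode vectors.
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def captured_fraction(subspace_a: Sequence[Vector],
                      subspace_b: Sequence[Vector]) -> float:
    """Fraction of subspace A contained in subspace B.

    Computed as (1 / M_A) * sum_i sum_j (a_i . b_j)^2, which is 1.0 when
    every mode of A lies inside B and 0.0 when A is orthogonal to B.
    """
    if not subspace_a:
        raise ValueError("subspace A must contain at least one mode")
    total = sum(dot(a, b) ** 2 for a in subspace_a for b in subspace_b)
    return total / len(subspace_a)


# Toy three-dimensional example with unit basis vectors as "modes".
e1, e2, e3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

print(captured_fraction([e1, e2], [e1, e3]))  # one of the two modes is shared
print(captured_fraction([e1], [e1, e2]))      # A lies entirely inside B
```

In practice the mode vectors would be the leading PCA eigenvectors of each MD trajectory, and the measure is asymmetric whenever the two subspaces have different sizes.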
References
1. A. Amadei, A. B. M. Linssen, and H. J. C. Berendsen, Essential dynamics of proteins, Proteins 17 (1993), 412–425.
2. Zoetewey, D. L., M. Ovee, M. Banerjee, R. Bhaskaran, and S. Mohanty. 2011. Promiscuous binding at the crossroads of
numerous cancer pathways: insight from the binding of glutaminase interacting protein with glutaminase L. Biochemistry.
50: 3528–39.
Sequence analysis
Oral Session – Submission 25
Evolution of linear motifs within the adenovirus E1A oncoprotein
Juliana Glavina1, Lucía B. Chemes2, Rocío Espada1, Ricardo Rodriguez de la Vega3 and Ignacio E. Sánchez1
1 Protein Physiology Laboratory, Departamento de Química Biológica and IQUIBICEN-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires.
2 Protein Structure-Function and Engineering Laboratory. Fundación Instituto Leloir and IIBBA-CONICET.
3 Ecologie, Systématique et Evolution, CNRS, UMR 8079, Orsay, France and Ecologie, Systématique et Evolution, UMR 8079, Université Paris-Sud, Orsay, France
Introduction. Many protein-protein interactions are mediated by linear sequence motifs of 5 function-determining
residues, which are often found within intrinsically disordered domains [1]. Linear motifs appear or disappear with
only a handful of point mutations and are thought to evolve rapidly. We have chosen the adenovirus E1A oncoprotein
as a model to study sequence conservation and linear motif evolution. The E1A protein is unique to the adenovirus
Genus Mastadenovirus, which infects mammals. Mastadenovirus types differ in their phenotypical traits, including
host, tissue tropisms and oncogenic potential. E1A consists of 4 intrinsically disordered regions, designated Nt, CR1,
CR2 and CR4, and one globular region designated CR3 [2]. We have analyzed the variability and evolution of 13
linear motifs in E1A and the relationship between different motif repertoires and virus phenotypes.
Methods. We used over 100 E1A sequences from known mastadenovirus types to construct an alignment. We
used the information content of each position in the alignment as a measure of conservation. Direct information
is a measure used to infer direct co-evolutionary couplings among residue pairs in multiple sequence alignments
while minimizing the influence of indirect correlations. We used this approach to predict residue-residue contacts
in the E1A protein. We also studied the variability in the linear motif repertoire of different E1A proteins. The
motif repertoire was then superimposed on a phylogenetic tree of mastadenoviruses. Finally, we performed
hypergeometric association tests on all individual combinations of linear motifs, phenotypic traits and hosts.
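Two of the measures mentioned in the Methods can be sketched in a few lines. The first function computes the Shannon information content of one alignment column, assuming a 20-letter amino acid alphabet, ignoring gaps and applying no small-sample correction. The second computes a one-sided hypergeometric p-value of the kind used in association tests. The function names and example inputs are hypothetical, and the authors' exact settings are not stated in the abstract.

```python
# Sketch under stated assumptions: 20-letter alphabet, gaps ignored,
# no small-sample correction, one-sided (enrichment) hypergeometric test.
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of an alignment column: log2(A) - H."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n items are drawn from N, of which K carry the trait."""
    upper = min(K, n)
    favourable = sum(math.comb(K, i) * math.comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / math.comb(N, n)


# A fully conserved column carries log2(20) bits; a 50/50 column one bit less.
print(information_content("AAAA"), information_content("AACC"))

# Hypothetical association: 10 viruses, 3 with a trait, 4 carrying a motif,
# all 3 trait-positive viruses among the motif carriers.
print(hypergeom_upper_tail(3, 3, 4, 10))
```

Applied to every motif and trait pair, the second function yields many p-values, so some multiple-testing correction would normally follow.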
Figure 1: Linear motifs within the E1A protein and E1A targets mapped to single or multiple binding sites and unmapped targets.
Results. The E1A protein is densely packed with linear motifs that explain the high number of binding partners (Figure
1). The intrinsically disordered regions and the globular CR3 region show a high degree of conservation along their
whole length. We found pairs of co-evolving residues within each region as well as across regions, indicating that
the regions did not evolve independently. The different motifs showed different abundance and distribution patterns.
Some were highly conserved and some were present only in a few species.
Conclusions. E1A linear motifs evolve rapidly and follow motif-specific trends. The different motifs and regions of
the protein did not evolve independently, as shown by the co-evolution and evolutionary analyses. A lack of globular
structure does not necessarily lead to a lower degree of sequence conservation.
Acknowledgments. We acknowledge funding from Agencia Nacional de Promoción Científica y Tecnológica (PICT
2012-2550 to I.E.S.) and Consejo Nacional de Investigaciones Científicas y Técnicas (doctoral fellowship to J.G.;
L.B.C. and I.E.S. are CONICET career investigators).
References
1. Davey NE, Trave G, Gibson TJ. Trends Biochem Sci (2011) 36: 159-169.
2. Pelka P, Ablack JN, Fonseca GJ, Yousef AF, Mymryk JS. J Virol (2008) 82(15):7252-63
Sequence analysis
Oral Session – Submission 2
On the design of shortened BCH barcode
Laura Angelone1,2,†, Flavio E. Spetale1,2, Javier Murillo1, Joaquin Ezpeleta1, Pilar Bulacio, Elizabeth Tapia1,2
1 CIFASIS-Conicet Institute, Rosario, Argentina
2 Fac. de Cs. Exactas e Ingeniería, Universidad Nacional de Rosario, Argentina
†E-mail: [email protected]
Abstract. Binary BCH codes have been recently proposed for the design of barcoding systems of high multiplexing
capacity suitable for use in sequencing platforms impaired by mismatch errors. We generalize the design of BCH
barcodes by introducing shortened BCH barcodes, a class of barcodes built from binary BCH codes allowing otherwise
prohibited barcoding sizes.
Introduction. The DNA barcoding problem is an instance of a widely studied problem in Communication
Theory, the error-free transmission of discrete patterns in the presence of random noise [1], a problem which leads
to the theory of error-correcting codes. Since the recognition of this fact in 2008 [2], few works have considered the
systematic design of coding-based barcode systems. Focusing mainly on sequencing platforms impaired by mismatch
errors, we generalize the design of BCH barcodes [3] by introducing shortened BCH barcodes.
Results. Binary BCH codes of size $n = 2^m - 1$, $m \geq 4$, can be used for the construction of barcodes of
$N = 8, 16, 32, \ldots$ bases [3]. For given $n$, multiple $t > 1$ error-correction options are possible. Hence, BCH
barcodes can indeed be used to correct at least $b = 2t$ base mismatches. To improve the design flexibility of BCH barcodes
by allowing intermediate $N$ settings, shortened binary BCH codes can be considered. Shortening BCH codes with
parameter $s > 0$ reduces the number of informative bits from $k$ to $k' = k - s$ while preserving the number of redundant
bits. Hence, improved error correction abilities at the expense of diminished multiplexing capacity can be expected
for shortened BCH barcodes. By means of shortening, BCH barcodes of size $N = (n + 1 - s)/2$ for $s$ even or
$N = (n - s)/2$ for $s$ odd can be designed. To recover from sequencing errors, shortened BCH barcodes must first be
demapped to the binary domain, where the previously removed bits must be reinserted. As with standard BCH barcodes,
shortened BCH barcodes must avoid homopolymer regions [4] and take into account well-known chemistry constraints. Most of
these constraints have already been taken into account in the design of Barcrawl [5], a tool for the ab-initio design of
primer barcodes for pyrosequencing applications. Hence, before their deployment, candidate barcodes are passed through
an adapted version of the Barcrawl tool. For each barcoding system of size $N$ built from a given error-correcting
code of size $n$, a wide range of error correction and multiplexing abilities were evaluated. For practical purposes, $N$
was limited to 30 bases and thus binary BCH codes of size $n \in \{15, 31, 63\}$ and shortened versions of them were
considered. Barcoding systems were evaluated through their multiplexing capacity $M$, their barcoding rate $B$ and
their probabilities $p_e$ and $p_u$ of detected and undetected barcode identification errors. For each $N$, we define $M$ as
the maximum number of barcodes which are compatible with the given sequencing chemistry. Similarly, we define
$B$ as the actual fraction of informative quads per barcode, i.e., $B = \log_4(M)/N$. The multiplexing capacity $M$ of BCH
barcodes of size $N$ on ideal mismatch sequencing channels depends on the desired $p_e$ and $p_u$ for the given $p_s$ of the
corresponding QSC model. To accomplish a strict control of $p_u$, we looked for BCH barcodes able to satisfy the
operational constraint $p_u < 10^{-8}$ at $p_s = 10^{-2}$ for $N \leq 27$. We found that the desired operational constraint
could only be satisfied with shortened binary BCH codes of size $n = 63$. As shown in Table 1, a broad range of $(M, p_e)$
configurations can be obtained.
N    M     B       (n, k, t, s)      p_e
21   86    0.168   (63, 30, 6, 21)   $10^{-5}$
22   384   0.187   (63, 30, 6, 19)   $10^{-5}$
24   73    0.142   (63, 24, 7, 15)   $10^{-5}$
25   295   0.144   (63, 24, 7, 13)   $10^{-5}$
27   72    0.148   (63, 18, 10, 9)   $10^{-5}$
Table 1: The multiplexing capacity M and the barcoding rate B accomplished by BCH barcodes of size N built from shortened binary BCH codes of
size n able to carry k informative bits and to correct at least t binary errors when variable shortening degrees s are used. BCH barcodes are constrained
to accomplish increasingly stringent $p_e$ settings with $p_u < 10^{-8}$ for $p_s = 10^{-2}$ on a QSC channel model.
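The size and rate relations stated in the Results can be checked numerically with a short sketch. It encodes only the two formulas given in the text, the barcode length N as a function of n and s, and the barcoding rate B as a function of M and N; the function names are hypothetical, and no BCH encoding, decoding or chemistry filtering is attempted.

```python
# Sketch of the two formulas stated in the text; function names are hypothetical.
# No BCH encoding/decoding or Barcrawl-style filtering is implemented here.
import math


def barcode_length(n: int, s: int = 0) -> int:
    """Barcode size N (bases) for a binary BCH code of size n shortened by s.

    N = (n + 1 - s) / 2 for s even, N = (n - s) / 2 for s odd.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n is expected to be odd (n = 2**m - 1)")
    if s < 0 or s >= n:
        raise ValueError("s must satisfy 0 <= s < n")
    return (n + 1 - s) // 2 if s % 2 == 0 else (n - s) // 2


def barcoding_rate(M: int, N: int) -> float:
    """Fraction of informative quads per barcode, B = log4(M) / N."""
    if M < 1 or N < 1:
        raise ValueError("M and N must be positive")
    return math.log(M, 4) / N


# Unshortened codes reproduce the N = 8, 16, 32 series quoted in the text.
print([barcode_length(n) for n in (15, 31, 63)])

# Shortening degrees listed in Table 1 for n = 63.
print([barcode_length(63, s) for s in (21, 19, 15, 13, 9)])
```

The second line of output shows how each shortening degree s in Table 1 corresponds to one of the intermediate barcode sizes N that an unshortened BCH code cannot provide.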
Acknowledgments. LA’s, FES’s, JM’s, JE’s, PB’s and ET’s work was supported by project PICT 2012-2513,
SECYT, Argentina.
References
1. Calderbank AR (1998) The art of signaling: fifty years of coding theory. IEEE Transactions on Information Theory 44.
2. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R (2008) Error-correcting barcoded primers for pyrosequencing hundreds
of samples in multiplex. Nat Methods 5: 235–237.
3. Krishnan A, Sweeney M, Vasic J, Galbraith D, Vasic B (2011) Barcodes for DNA sequencing with guaranteed error correction
capability. Electronics Letters 47: 236–237.
4. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing.
Cold Spring Harb Protoc 2010: pdb.prot5448.
5. Frank D (2009) Barcrawl and bartab: software tools for the design and implementation of barcoded primers for highly
multiplexed dna sequencing. BMC Bioinformatics 10: 362.
Metabolomics and Cheminformatics
Oral Session – Submission 31
A multilayer network approach for guiding drug repositioning in neglected diseases
Ariel J Berenstein1,2,*, María P Magariños3,1, Ariel Chernomoretz1,2, Fernán Agüero3
1 Laboratorio de Bioinformática, Fundación Instituto Leloir, Buenos Aires, Argentina
2 Departamento de Física, Universidad de Buenos Aires, Buenos Aires, Argentina
3 Laboratorio de Genómica y Bioinformática, Instituto de Investigaciones Biotecnológicas, Universidad de San Martin, San Martín, Buenos Aires, Argentina.
Background. Neglected tropical diseases (NTDs) are human infectious diseases that occur in tropical or subtropical
regions and are often associated with poverty. Historically, lack of interest from the pharmaceutical industry has resulted
in a lack of drugs to combat the majority of the pathogens that cause these diseases. Recently, the availability of
open chemical information has increased with the advent of public domain chemical resources and the release of data
from high-throughput screening assays. In our laboratory, our goal is to prioritize and identify candidate drug targets
and candidate drug-like molecules to foster drug development for these diseases. For this we use comparative
genomics and chemogenomics approaches.
Materials and methods. Chemical datasets, including bioactivity data against pathogen and non-pathogen targets,
were obtained from open databases and high-throughput screenings. Using these data, we built a multilayer network
considering three disjoint sets of vertices, with 1.48 × 10⁶ drugs and 1.67 × 10⁵ proteins across 221 species and a few key
protein features (orthology, Pfam domains, participation in defined metabolic pathways), organized in three different
layers (Fig. 1A). Three different classes of target similarity criteria were considered: sharing of Pfam domains
present in the same protein, clustering in the same ortholog group (OrthoMCL algorithm), and belonging to the
same metabolic pathway. Only statistically significant terms (in the context of drug-target predictions) were taken into
account. A bipartite projection was made over the protein layer using a modified version of the Zhou method [2]
(Fig. 1B). In the resulting monopartite protein-projected network, proteins are linked if and only if they share at
least one relevant biological entity. Taking advantage of this approach, we first tackled the problem of prioritizing
targets for drug discovery in the absence or scarcity of bioactivity data for an organism of interest. For this, given
an organism of interest, we used the network to obtain a global prioritized list of promising targets in the
query species. In a second application, we suggest candidate targets for orphan compounds, which have been shown
to be active in whole-cell or whole-organism screenings but whose target is currently unknown. In this case, we aim
to obtain a reduced prioritization list of target proteins for the orphan molecule.
Figure 1: Schematic representation of data and workflow. A: Multilayer representation of drug-target data, first layer (bottom) contains drugs with any
known bioactivity over proteins represented in the second layer. Top plane contains significant biological entities involving proteins of different organisms
(orthologs, metabolic pathways and PFAM domains). B. Bipartite projection of protein-entities layers in a protein-projected network (PP-Layer). In
the resulting monopartite protein projected network, proteins are linked if and only if, they share at least one relevant biological entity.
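The projection rule stated in the text, that two proteins are linked if and only if they share at least one relevant biological entity, can be sketched as below. This is a plain unweighted co-occurrence projection that also counts the shared entities; it does not reproduce the weighting of the Zhou method or the authors' modification of it, and the protein and entity identifiers are made up for the example.

```python
# Plain co-occurrence projection of a protein-entity bipartite graph.
# NOT the (modified) Zhou weighting used by the authors; identifiers are made up.
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def project_proteins(protein_entities: Dict[str, Set[str]]
                     ) -> Dict[Tuple[str, str], int]:
    """Link two proteins when they share at least one entity.

    Returns a mapping from an (alphabetically ordered) protein pair to the
    number of entities the two proteins have in common.
    """
    entity_to_proteins: Dict[str, Set[str]] = defaultdict(set)
    for protein, entities in protein_entities.items():
        for entity in entities:
            entity_to_proteins[entity].add(protein)

    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for proteins in entity_to_proteins.values():
        for a, b in combinations(sorted(proteins), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical annotations: Pfam domain, ortholog group and pathway labels.
annotations = {
    "protA": {"pfam:PF00069", "ortho:OG1"},
    "protB": {"pfam:PF00069"},
    "protC": {"ortho:OG1", "path:glycolysis"},
    "protD": {"path:purine"},
}

print(project_proteins(annotations))
```

Here protD remains isolated because it shares no entity with any other protein, which matches the if-and-only-if linking rule.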
Results. We find that our approach allows us to obtain statistically significant prioritized lists in both pathogen and
model organisms, as evaluated by a tenfold cross-validation procedure. Moreover, we found that our method outperforms
traditional sequence-alignment-based approaches like FASTA. We will discuss a number of interesting targets in
pathogen organisms which have been prioritized under the assumption that no bioactivity information was available
for them. On the other hand, we found that our approach is especially useful for obtaining reduced prioritization lists of
target proteins for orphan query molecules. We evaluated this in two ways: 1) in silico, by generating artificially orphaned
compounds via a leave-one-out procedure, and 2) in a post-facto validation of the strategy, in which we analyzed a
number of suggested targets for compounds that are active against P. falciparum. Overall, our results suggest that
it is possible to identify candidate drug targets, either for complete query species or for orphan compounds, even in
the absence of species-specific inhibition data. This is particularly important in the case of neglected diseases, as
it means we can leverage data from model organisms (or from other tropical diseases) to guide drug repositioning
exercises for these important diseases.
Acknowledgments. We acknowledge support from CONICET (fellowships and salaries) and from ANPCyT (Agencia
Nacional de Promoción Científica y Tecnológica, grant PICT-2010-1479).
References
1. Magariños et al. (2012) Nucleic Acids Res 40: D1118-D1127. DOI: 10.1093/nar/gkr1053
2. Zhou et al. Phys Rev E. (2007) 76:046115. DOI: 10.1103/PhysRevE.76.046115
System Biology and Networks
Oral Session – Submission 27
Microarray Metanalysis and Gene Regulatory inference of LPS transactivation of
macrophage 1-α-hydroxylase
Romina Martinelli1, Lucas Daurelio2, Luis Esteban1,2
1 Facultad de Ciencias Médicas, Universidad Nacional de Rosario, Santa Fe, Argentina
2 Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET). Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Santa Fe, Argentina
E-mail: [email protected]
Background. 25-Hydroxyvitamin-D can be activated to 1,25-dihydroxyvitamin-D3 [1,25(OH)2D3] by the rate-limiting
enzyme 1-α-hydroxylase. In cells of the immune system in particular, this enzyme is under the control of immune stimuli.
In pathological situations such as tuberculosis, this can lead to a systemic excess of 1,25(OH)2D3 and hypercalcemia.
Although there are some studies of LPS transactivation of macrophage 1-α-hydroxylase, all of them focus on the
most relevant transcription factors involved, and no systems approach has been used to examine the complex interactions
involved in the regulation of the enzyme.
Materials and methods. To this end, we used microarray data from human macrophages obtained from GEO
(6), using "macrophages" and "LPS" as keywords. The selected experiments were performed at least in triplicate (see Table 1).
The meta-analysis was performed with INMEX (2). For differential expression analysis on individual data
sets, the Benjamini-Hochberg false discovery rate (FDR) was used. Fisher's method was chosen to combine p-values from
multiple studies for information integration. Pathway and GO enrichment analyses using the hypergeometric
test were performed to obtain functional information. The list of genes significantly enriched in a particular pathway
containing 1-α-hydroxylase was selected. This list was supplemented with the names of regulatory proteins and enzymes
that our laboratory had found to be involved in the 1-α-hydroxylase response (4). The final list was loaded into
GeneMANIA (7), and the resulting network was then visualized and curated in Cytoscape (8).
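The two statistical steps named in the Methods, Fisher's method for combining p-values across studies and the Benjamini-Hochberg FDR adjustment, can be written compactly. The sketch below is a stand-alone illustration and is not the INMEX implementation. It relies on the fact that Fisher's statistic follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. The function names and example p-values are hypothetical.

```python
# Stand-alone illustration; NOT the INMEX implementation.
import math
from typing import List, Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom, so
    P(chi2 > X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1] and not be empty")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# One gene measured in three hypothetical studies.
print(fisher_combined_pvalue([0.04, 0.10, 0.20]))

# Four hypothetical per-gene p-values from a single study.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Fisher's method assumes that the combined p-values come from independent studies, which is a reasonable reading of a meta-analysis over separate GEO data sets but is stated here as an assumption.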
Results and discussion. As expected from the literature, the list of differentially expressed genes was
enriched in, among others, the Jak-STAT signaling pathway, transcriptional misregulation in cancer, the chemokine
signaling pathway, and the Toll-like receptor signaling pathway. Interestingly, the 1-α-hydroxylase gene mapped to the
tuberculosis pathway, a clinical association which gave the first insight into the extra-renal activity of this enzyme.
The network shows various hubs; MyD88, STAT1α and TIRAP seem to be of interest for 1-α-hydroxylase regulation. We found
most of the transcription factors previously described to interact with the 1-α-hydroxylase promoter, namely NFKB1, CREB,
STAT1α, C/EBPβ and Jun. All of them possess binding sites in the hydroxylase promoter. It was previously
shown by transfection studies and gel shift assays that C/EBPβ (1,4,5) plays a role in 1-α-hydroxylase induction
by direct binding to specific recognition sites in the promoter, whereas for STAT1α no such direct effects could be
demonstrated. Cross-talk between the JAK-STAT, the NF-kappaB, and the p38 MAPK pathways should be explored.
New functional relationships between other proteins were also detected: C/EBPβ-NFKB and C/EBPβ-Jun. These deserve
further exploratory studies to confirm them.
References
1. Overbergh L, Stoffels K, Mark Waer, Verstuyf A, Bouillon R, Mathieu C: Immune Regulation of 25-Hydroxyvitamin D-1
α-Hydroxylase in Human Monocytic THP1 Cells: Mechanisms of Interferon-γ-Mediated Induction. The Journal of Clinical
Endocrinology and Metabolism 91(9):3566 -3574 .2006
2. Xia J, Fjell C, Mayer M, Pena O, Wishart D, Hancock: INMEX – a web-based tool for integrative meta-analysis of
expression data. Nucleic Acids Res, 41, W63-70. 2013
3. Xaus J, Comalada M, Valledor A, Lloberas A, López-Soriano F, Argilés J, Bogdan C, Celada A. LPS induces apoptosis in
macrophages mostly through the autocrine production of TNF-α. Blood. June 15, 2000; 95 (12)
4. Esteban L, Vidal M, Dusso A: 1-α-Hydroxylase transactivation by γ -interferon in murine macrophages requires enhanced
C/EBPβ expression and activation. J Steroid Biochem Mol Biol. 2004 May;89-90(1-5):131-7.
5. Esteban L., Vidal M., Dusso A. LPS transactivation of microphage: role in local control of immune response. XLI Annual
Meeting of the Sociedad Argentina de Investigación en Bioquímica y Biología Molecular (SAIB). X Congress of the Panamerican
Association for Biochemistry and Molecular Biology (PABMB). Pinamar, Buenos Aires, Argentina. December 3-6, 2005.
Poster. Published in BIOCELL (ISSN: 0327-9545) Vol 29, 2005.
6. Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/
7. GeneMANIA. http://www.genemania.org/
8. Cytoscape. http://www.cytoscape.org/
System Biology and Networks
Oral Session – Submission 44
Encoding of spatial location and state of motion by the hippocampal region
Soledad Gonzalo Cogno1, Emilio Kropff2, Marcelo Montemurro3, Inés Samengo1
1 Statistical and Interdisciplinary Physics Group, Instituto Balseiro and Centro Atómico Bariloche, San Carlos de Bariloche, Argentina, 8400
2 Instituto Leloir, Ciudad Autónoma de Buenos Aires, Argentina, C1405BWE
3 Faculty of Life Sciences, University of Manchester, Manchester, UK, M13 9PT
Over the last hundred years, the hippocampus has been one of the brain’s most studied structures. Many experiments
have suggested the rodent hippocampus plays an important role in spatial navigation. In other mammals (in particular,
humans) the encoding of space is believed to be only one function among many others that require storing and retrieving
information from memory. Studying the way the hippocampus encodes spatial locations, therefore, is a gateway to
understanding the structures involved in mnemonic functions. The hippocampal region contains several anatomical
structures located in the temporal lobe, including the hippocampus per se and the entorhinal cortex. These two areas
are known to encode spatial information using two different neural codes. In rodents, the firing rate of pyramidal
cells in the hippocampus is strongly correlated with the location of the animal: Each cell fires only when the rat is
in a specific place. These specific places are the place fields, and such neurons are called place cells (figure 1A [1]).
Paralleling place cells in the hippocampus, grid cells can be found in the entorhinal cortex. They have multiple firing
fields organized in a hexagonal lattice (figure 1B [1]). Entorhinal neurons are hence activated whenever the animal’s
position coincides with any of the vertices of the lattice.
Our work is focused on analyzing electrophysiological recordings obtained in awake and behaving animals. The
experiment consists of a rat running along a linear track while the kinematic properties of the trajectory (position,
velocity and acceleration) are registered with an optical system. Simultaneously, the mean-field electric potentials
of both the entorhinal cortex and the hippocampus are recorded with extracellular electrodes. We find that the
electrophysiological signals encode not only the position of the animal, but also its velocity and acceleration.
Moreover, through an information-theoretical analysis, we see that more information flows from the entorhinal cortex
to the hippocampus than in the inverse direction. During this talk, I will discuss how the kinematic state of the animal
affects the electrophysiological signals and the information flow between the entorhinal cortex and the hippocampus.
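The abstract does not state which information-theoretic estimator was used to compare the two directions of flow. As a minimal sketch only, assuming transfer entropy with one-step histories on binarized signals (an illustrative choice, not necessarily the authors' method), a directional asymmetry could be estimated as follows. The function name, the variable names and the synthetic signals are all hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target (bits), one-step histories."""
    triples = Counter()   # (target[t+1], target[t], source[t])
    pairs_ts = Counter()  # (target[t], source[t])
    pairs_tt = Counter()  # (target[t+1], target[t])
    singles = Counter()   # target[t]
    n = len(target) - 1
    for t in range(n):
        triples[(target[t + 1], target[t], source[t])] += 1
        pairs_ts[(target[t], source[t])] += 1
        pairs_tt[(target[t + 1], target[t])] += 1
        singles[target[t]] += 1
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_joint = count / n
        p_cond_full = count / pairs_ts[(y0, x0)]        # p(y1 | y0, x0)
        p_cond_self = pairs_tt[(y1, y0)] / singles[y0]  # p(y1 | y0)
        te += p_joint * math.log2(p_cond_full / p_cond_self)
    return te


# Synthetic binary signals (hypothetical): y copies x with a one-step delay,
# so information should flow from x to y but not from y to x.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x->y) = {te_xy:.3f} bits, TE(y->x) = {te_yx:.3f} bits")
```

In this toy case the estimate is large in the driving direction and close to zero in the other. Real recordings would need discretization and bias correction, which this sketch omits.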
Figure 1: Place cells and grid cells. A. Firing pattern of a place cell on a linear track. The rat’s position is depicted in black. The red points indicate the
positions at which the cell fired. B. Firing pattern of a grid cell in an open field.
References
1. György Buzsáki and Edvard I Moser: Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature
Neuroscience 2013, 16:130-138.
Structure prediction and protein function
Oral Session – Submission 48
A system biology approach to evaluate endometrial maturation in women that
developed preeclampsia
Ezequiel Juritz
Structural Bioinformatics Group. National University of Quilmes, Buenos Aires, Argentina.
Background. Native protein structure fluctuates between an ensemble of structural conformers connected by a
dynamic equilibrium that is defined by physicochemical parameters of the environment. The conformational changes
observed between the structural conformers are significant, with an average RMSD of 1.34 Å and a maximum of
7.15 Å (Monzon, Juritz, Fornasari, & Parisi, 2013). As different conformers can bind ligands with different energy,
the presence of ligands can shift the equilibrium toward one or a set of specific conformers. In the present work
we study how different conformers of the same protein may lead to different outcomes when performing structure-based
computational calculations.
Materials and methods. We studied a total of 41,884 protein-ligand interactions from 5,292 different ligand-binding
proteins. These proteins were cross-referenced against the CoDNaS database, retrieving 78,113 structural conformers. All
available structures of each protein were docked against one or more of its ligands using AutoDock Vina (Trott & Olson,
2010), using 5,277 ligands in total. The binding energy was estimated for every conformer-ligand interaction.
When cross-referencing proteins against the CoDNaS database, an average of 44 structures per protein were retrieved.
Results and discussion. Significant differences in ligand binding energies were obtained from different conformers.
The energies range from -17.84 to 2.80 kcal/mol. Ten percent of the protein-ligand interactions studied present a standard
deviation greater than 1 kcal/mol, while the average standard deviation is 0.46 kcal/mol. We found no relation
between the RMSD between conformers and the ligand binding energy differences, suggesting that local structural
rearrangements could impact on the thermodynamic landscape of ligand binding. Structure-based computational
calculations should consider protein conformational diversity in order to improve accuracy.
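The spread statistics quoted in the results can be illustrated with a short sketch. It assumes one docking score per conformer for each protein-ligand pair and uses the sample standard deviation; the abstract does not say which estimator was used. The pair names and energy values are hypothetical and are not data from the study.

```python
import statistics

# Hypothetical docking scores (kcal/mol): one list of conformer energies per
# protein-ligand pair. Illustrative values only.
energies = {
    ("protA", "lig1"): [-7.2, -7.0, -7.4, -6.9],
    ("protB", "lig2"): [-9.1, -6.3, -8.0],
    ("protC", "lig3"): [-5.5, -5.6],
}

# Sample standard deviation of the binding energy across conformers, per pair.
spread = {pair: statistics.stdev(vals) for pair, vals in energies.items()}

# Fraction of pairs whose spread exceeds the 1 kcal/mol threshold, and the
# average spread over all pairs.
threshold = 1.0
fraction_above = sum(sd > threshold for sd in spread.values()) / len(spread)
mean_spread = statistics.mean(list(spread.values()))

for pair, sd in spread.items():
    print(pair, round(sd, 2))
print(round(fraction_above, 2), round(mean_spread, 2))
```

Only one of the three toy pairs has a spread above the threshold, which shows how a modest average can coexist with a subset of strongly conformer-dependent interactions.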
References
1. Monzon, A., Juritz, E. I., Fornasari, M. S., & Parisi, G. D. (2013). CoDNaS: a database of Conformational Diversity in
the Native State of proteins. Bioinformatics (Oxford, England), submitted.
2. Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring
function, efficient optimization and multithreading. Journal of Computational Chemistry, 31(2), 455-461.
Genomics, functional genomics and metagenomics
Oral Session – Submission 10
Classification of Bovine Coat Color based on Genotype
Diego Comas1,2 , Marco Benalcázar1,2,3 , Inti Pagnuco1,2 , Pablo Corva4 , Gustavo Meschino5 , Marcel Brun1 ,
Virginia Ballarin1
1 Digital Image Processing Group, Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
2 Consejo Nacional de Investigaciones Científicas y Técnicas, CONICET, Mar del Plata, Argentina
3 Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT), Ecuador
4 Facultad de Ciencias Agrarias, Universidad Nacional de Mar del Plata, Balcarce, Argentina
5 Bioengineering Lab, Facultad de Ingeniería, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
Introduction. Several current research projects focus on the creation of haplotype maps that identify
and describe common genetic variations in some species. Studies on haplotype maps are key in the understanding
of how natural selection has produced genomic differences between subspecies of a population. A Single Nucleotide
Polymorphism (SNP) is a DNA sequence variation occurring commonly within a population (above 1%) in which a
Single Nucleotide in the genome differs between members of biological species or paired chromosomes. Those which
are located in coding sequences are likely to alter the biological function of a protein, and therefore to have an effect
on the phenotype of an individual. Pattern recognition plays an important role in Genomic Signal Processing (GSP)
for detection, prediction, classification, control, and statistical modeling of gene networks. One of the goals of GSP
is to provide researchers with new hypotheses about biology, which can be used in systems-based applications and
in confirmatory experiments [1]. Here we present an application of GSP, based on the use of pattern
recognition techniques to find the subsets of SNPs, from a given set of SNPs, that best predict the coat color
phenotype in cattle. Once identified, these SNPs could be used in additional studies to confirm whether they are
related to the underlying signaling mechanism that determines the phenotypes under study.
Variation in coat color and spotting patterns of cattle has been extensively studied because there is evidence that
animals with light-colored hair coat and darkly pigmented skin are better adapted under tropical conditions with high
levels of solar radiation [2, 3]. We selected an initial set of 18 SNPs, or features in the language of pattern recognition,
linked to the melanocortin 1 receptor (MC1R) gene on bovine chromosome 18, which is involved in regulating hair
color [4].
Materials and methods. We used a dataset Dr composed of n = 285 feature-label pairs, where each data vector
is formed by 18 features corresponding to the eighteen selected SNPs, located between base pairs 13,776,888 and
13,778,639, which corresponds to the region of chromosome 18 that contains the MC1R gene [4]. The dataset
belongs to the Bovine Genome Assembly version Btau-4.0 [5]. This dataset contains 132 black and 153 red hair
color samples, with proportions of 0.46 and 0.54, respectively. In this context, the goal of this work is to find the best
small subset of features (SNPs) that predicts, with high accuracy, the cattle coat color. The analysis includes the
evaluation of the performance of the classifiers designed based on those features. The classification rules used in this work
are Pyramidal Multiresolution [6], k-Nearest-Neighbor (kNN) [7], Logistic Regression [8], Linear Discriminant Analysis
(LDA) [9], and Support Vector Machines (SVMs) [9]. To evaluate the performance of the designed classifiers, based
on the best subset of features, we use the holdout method for error estimation [8]. We randomly split the dataset Dr
into 2 disjoint subsets Dtrain and Dtest, of size 185 and 100, respectively, maintaining the class proportions. Using
the training dataset Dtrain, we test all possible combinations of 2, 3, 4, and 5 features from the original set of
18 features, for a total of 12,597 feature subsets to check. We rank these subsets by estimating the error of each
classification rule using the K-fold cross-validation method [9] with K = 5. Finally, once we find the best subset of
features for each classification rule, we use that subset to design a classifier using all 185 samples from Dtrain.
The performance of that classifier is computed as its average error over the 100 left-out samples that belong to
Dtest.
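The size of the exhaustive search can be checked directly. The sketch below counts the subsets of 2 to 5 features drawn from 18 and enumerates them as index tuples. The helper name candidate_subsets is hypothetical, and the cross-validated scoring of each subset is not shown.

```python
from itertools import combinations
from math import comb

n_features = 18
subset_sizes = (2, 3, 4, 5)

# Closed-form count: C(18,2) + C(18,3) + C(18,4) + C(18,5)
total = sum(comb(n_features, k) for k in subset_sizes)


def candidate_subsets(n, sizes):
    """Lazily yield every feature-index tuple of the requested sizes."""
    for k in sizes:
        yield from combinations(range(n), k)


enumerated = sum(1 for _ in candidate_subsets(n_features, subset_sizes))
print(total, enumerated)  # 12597 12597
```

Each yielded tuple would then be scored with 5-fold cross-validation on the training set, once per classification rule.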
Results. Table 1 shows the results of the classification of the coat color phenotype, based on genomic data from
chromosome 18 in the positions corresponding to the MC1R gene. For the five classification rules tested (i.e.,
Pyramidal Multiresolution, Logistic Regression, LDA, kNN, and SVM), it displays the SNP identifiers obtained in
the feature selection stage, and estimates of the error rate, False Positive Rate (FPR), and False Negative Rate
(FNR) based on the hold-out dataset Dtest. According to the results presented in Table 1, the classification rule with
the best performance was LDA, with an error of 21%. Among the SNPs selected as best predictors of the coat color
phenotype, four are the most frequent: ‘BTA-161389’, ‘BTA-42498’, ‘rs29020085’, and ‘rs29020087’. The accuracies of the other classification rules are all at or above 74%. It
should be noted that SVM is the only rule that needed only 4 SNPs to reach maximum performance.
Method | SNPs identifiers | Error | FPR | FNR
Pyramidal Multires. | ‘BTA-42498’, ‘rs29011168’, ‘rs29020087’, ‘rs29021759’ | 23% | 25.92% | 19.56%
Logistic Regression | ‘BTA-161389’, ‘BTA-21794’, ‘rs29020085’, ‘rs29021758’, ‘rs29011163’ | 26% | 33.33% | 17.39%
LDA | ‘BTA-161389’, ‘BTA-42498’, ‘rs29020087’, ‘rs29021757’, ‘rs29020085’ | 21% | 24.07% | 17.39%
kNN | ‘BTA-161389’, ‘BTA-42498’, ‘rs29020086’, ‘rs29020087’, ‘rs29020085’ | 22% | 24.07% | 19.56%
SVM | ‘BTA-161389’, ‘BTA-21794’, ‘rs29021758’, ‘rs29020085’, ‘rs29011168’ | 26% | 31.48% | 19.57%
Table 1: Classification results of the coat color phenotype of the 5 classification rules used. Error rate, False Positive Rate (FPR), and False Negative
Rate (FNR) are shown.
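The three columns of Table 1 are tied together by the class sizes of the test set. As a worked check, the sketch below assumes that the 100 hold-out samples keep the 0.46/0.54 class proportions (46 and 54 samples), that FPR is computed on the 54-sample class and FNR on the 46-sample class. These assignments are inferred from the reported percentages and are not stated in the text.

```python
# Assumed test-set composition: 100 samples split 46/54, following the
# 0.46/0.54 class proportions reported for the full dataset.
n_fnr_class = 46  # class on which FNR is assumed to be computed
n_fpr_class = 54  # class on which FPR is assumed to be computed

# FPR and FNR (in percent) reported in Table 1 for the LDA rule.
fpr_pct, fnr_pct = 24.07, 17.39

false_pos = round(fpr_pct / 100 * n_fpr_class)
false_neg = round(fnr_pct / 100 * n_fnr_class)
error_pct = 100 * (false_pos + false_neg) / (n_fpr_class + n_fnr_class)
print(false_pos, false_neg, error_pct)  # 13 8 21.0
```

Under these assumptions the 13 false positives and 8 false negatives add up to the 21% error reported for LDA.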
Conclusions. According to the results for the five classification rules tested, the best rule, i.e., the one with the lowest classification
error, is LDA, with an error of 21%. Although this is not a low error rate, it shows the feasibility of this approach
to search for biological markers that predict a given phenotype, in this case the coat color. The SNPs identified by
this approach can be useful as a guide for future biological tests, which should confirm, or not, the influence of these
SNPs on the phenotype. Although the influence of the MC1R gene in the primary determination of the coat color is
already known, this work shows which SNPs, located in this gene, are more likely to be related to the variations. It is
important to note that it is biologically shown that this phenotype is also influenced by other genes [10]. Because of
this, results could be improved (i.e., decreasing the error rates) by including SNPs from other genes involved in this
phenotype. However, a larger initial set of SNPs would make the feature selection process harder and increase the
potential risk of overfitting when designing the classifiers.
Acknowledgment. Diego Comas, Marco Benalcázar and Inti Pagnuco acknowledge support from Consejo Nacional
de Investigaciones Científicas y Técnicas (CONICET), Argentina
References
1. Ridder D, Ridder J, Reinders M: Pattern recognition in Bioinformatics. Brief Bioinform 2013, 14:633-647
2. Finch VA, Western D: Cattle colours in pastoral herds: natural selection or social preference. Ecology 1977, 58:1384
3. Finch VA, Bennett IL, Holmes CR: Coat colour in cattle: effect on thermal balance, behaviour and growth, and
relationship with coat type. Journal of Agricultural Science 1984, 102:141-147
4. Stella A, Ajmone-Marsan P, Lazzari B, Boettcher P: Identification of Selection Signatures in Cattle Breeds Selected for
Dairy Production. Genetics 2010, 185:1451-1461
5. The Bovine HapMap Consortium: Genome-Wide Survey of SNP Variation Uncovers the Genetic Structure of Cattle
Breeds. Science 2009, 324:528-532
6. Dougherty ER, Barrera J, Mozelle G, Kim S, Brun M: Multiresolution analysis for optimal binary filters. Journal of
Mathematical Imaging and Vision 2001, 14:53-72
7. Rajini NH: Classification of MRI brain images using k-nearest neighbor and artificial neural network, 2011 International
Conference on Recent Trends in Information Technology, Chennai, India, 2011, pp 563- 568
8. Devroye L, Györfi L, Lugosi G: A Probabilistic Theory of Pattern Recognition. Springer-Verlag 1996, Berlin Heidelberg
9. Duda R, Hart P, Stork D: Pattern Classification. Wiley-Interscience 2001
10. Hanna LLH, Sanders JO, Riley DG, Abbey CA, Gill CA: Identification of a major locus interacting with MC1R and
modifying black coat color in an F2 Nellore-Angus population. Genetics Selection Evolution 2014, 46:1-8
Genomics, functional genomics and metagenomics
Oral Session – Submission 49
Bioprospecting of lignocellulolytic enzymes in enriched consortia of pine and
eucalyptus forest soils by metagenomic sequencing
Marina D. Reinert
Instituto de Agrobiotecnología Rosario, Rosario, Santa Fé, Argentina
Background. Second generation biofuels are produced by fermenting sugars extracted from agronomic residues
into ethanol. Lignocellulose breakdown is a crucial step needed to obtain free sugar molecules. Nowadays the bottleneck
for second generation biofuel production is the cost of lignocellulolytic enzymes [1, 2]. Our aim is to use metagenomics-based
bioprospecting to find novel lignocellulose-degrading proteins and to produce them in a low-cost system based
on plants as biofactories.
Methods. We took soil samples from Pinus elliottii and Eucalyptus grandis forest soils in Concordia, Entre Ríos,
in February 2012. Both soils contained decaying wood material. Samples were then used as inoculum for minimal
medium [3] with only carboxymethyl cellulose (CMC) or sawdust as organic matter. Additionally, we used antibiotics or
antifungals to prevent the growth of each type of organism in each case. Cultures were grown for 30 days, and an aliquot
of each culture was taken every 10 days. Genomic DNA was extracted from each sample. Amplicon sequencing of
the V4 region of the 16S rRNA gene was then performed on a 454 GS-FLX+ (Roche) platform in order to evaluate the
enrichment of lignocellulose-degrading microorganisms. Whole-genome metagenomic sequencing (454 GS-FLX+)
was then performed on the most enriched sample (i.e., the one with a high proportion of taxa described as lignocellulose
degraders and fewer commensals). Bioprospecting analysis using bioinformatics tools was then performed. First,
we did de novo assembly using the CAMERA [https://portal.camera.calit2.net/gridsphere/gridsphere] assembler
workflow. Then we used the MG-RAST [http://metagenomics.anl.gov/] platform for taxonomic and functional
annotation. We extracted coding sequences (CDS) using the FragGeneScan open reading frame (ORF) algorithm. We
then ran BLAST against the CAZy database [http://www.cazy.org/] to find lignocellulosic enzyme domains in our CDS
dataset. A customized Perl script was used to keep only those glycosyl hydrolase and cellulose binding domains
linked with degrading activities [4]. Finally, we selected only those sequences that were consistent with Pfam
[http://pfam.xfam.org/], UniProt [http://www.uniprot.org/] and Priam [http://priam.prabi.fr/] annotations, had a proper
ORF length and did not show high homology with database enzymes (below 80%).
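The filtering step can be sketched in a few lines. The authors used a customized Perl script that is not reproduced here; the Python version below is a stand-in written for illustration. It assumes BLAST tabular output with the 12 standard columns, a hypothetical subject ID layout of "accession|family", and family prefixes GH and CBM as the domains of interest. All hit lines are invented.

```python
import csv
import io

# Hypothetical BLAST tabular hits (12 standard columns). The subject ID layout
# "accession|family" is an assumption made for this illustration.
BLAST_TSV = """\
cds_001\tAAA11111.1|GH5\t62.5\t320\t110\t4\t1\t318\t12\t331\t1e-90\t310
cds_002\tBBB22222.1|GT2\t55.0\t280\t120\t3\t5\t284\t20\t299\t1e-60\t220
cds_003\tCCC33333.1|CBM2\t91.3\t150\t13\t0\t1\t150\t1\t150\t1e-80\t290
cds_004\tDDD44444.1|GH43\t47.8\t400\t200\t6\t10\t405\t3\t399\t1e-70\t250
"""

WANTED_PREFIXES = ("GH", "CBM")  # glycosyl hydrolases, carbohydrate-binding modules
MAX_IDENTITY = 80.0              # keep hits below 80% identity, as in the text


def select_hits(handle):
    """Return (query, family, identity) for wanted families below the cutoff."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        query, subject, identity = row[0], row[1], float(row[2])
        family = subject.split("|")[-1]
        if family.startswith(WANTED_PREFIXES) and identity < MAX_IDENTITY:
            kept.append((query, family, identity))
    return kept


hits = select_hits(io.StringIO(BLAST_TSV))
for hit in hits:
    print(hit)
```

Here cds_002 is dropped because its family is not of interest, and cds_003 because it is too similar to a known enzyme. The consistency checks against Pfam, UniProt and Priam would follow as separate steps.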
Results. The metagenomic sequencing produced 718,489 reads, 421 base pairs (bp) long on average, totaling
302,172,049 bp. About 10% (30,458,285 bp) of the total base pairs were assembled into contigs. The longest contig
was 523,078 bp. We manually selected 39 promising proteins with an average length of 644 bp; Figure 1 and Table 1
summarize their identities and domains.
Figure 1: The pie chart shows the abundance of glycosyl hydrolase and cellulose binding domains in the selected proteins.
Enzymes | EC number | #
Acetylxylan esterase | 3.1.1.72 | 4
Alpha-glucuronidase | 3.2.1.139 | 1
Alpha-N-arabinofuranosidase | 3.2.1.55 | 6
Beta-glucosidase | 3.2.1.21 | 12
Endo-1,4-beta-xylanase | 3.2.1.8 | 2
Endoglucanase | 3.2.1.4 | 2
Xylan 1,4-beta-xylosidase | 3.2.1.37 | 11
Feruloyl esterase | 3.1.1.73 | 1
Table 1: Selected enzyme activities, with the Enzyme Commission (EC) number and the abundance (#) of each one.
Conclusions. The enrichment process allowed us to obtain bacterial consortia containing lignocellulose-degrading microorganisms, as previously seen by 16S rRNA amplicon sequencing. However, only by implementing metagenomic sequencing
were we able to determine the sequence identity of the proteins involved in lignocellulose degradation. Proteins were manually
annotated and a subset was selected by applying bioinformatics tools. This procedure resulted in a list of 39 promising
enzymes. These will be tested experimentally in the laboratory as candidate components of a degrading cocktail.
Acknowledgments. We would like to thank Lic. Soledad Romero and Lic. Bianca Brun for performing all sequencing
runs used in this study.
References
1. Naik SN, Goud VV, Rout PK, Dalai AK: Production of first and second generation biofuels: A comprehensive review.
Renew Sustain Energy Rev 2010, 14: 578–597.
2. Mtui GYS: Recent advances in pretreatment of lignocellulosic wastes and production of value added products. African
Journal of Biotechnology 2009, 8: 1398–1415.
3. Crawford D, McCoy E: Cellulases of Thermomonospora fusca and Streptomyces thermodiastaticus. Appl Environ Microbiol
1972, 24: 150-152.
4. Allgaier M, Reddy A, Park JI, Ivanova N, D’haeseleer P, Lowry P, Sapra R, Hazen TC, Simmons BA, VanderGheynst
JS et al. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010,
5: 372–380.
Proteomics and functional proteomics
Oral Session – Submission 18
Visualization of genetic and proteomic biodiversity in four maturity stages of
tomato fruit ripening
Paula B. Macat1,2∗ , Leandro Kovalevski2∗ , Marta Quaglino2 , Guillermo R. Pratta1,3
1-Consejo Nacional de Investigaciones Científicas y Técnicas
2-Instituto de Investigaciones Teóricos y Aplicados, Escuela de Estadística, Facultad de Ciencias Económicas y Estadística UNR,
Rosario, Argentina
3-Cátedra de Genética, Facultad de Ciencias Agrarias UNR, Zavalla, Argentina
* Authors equally contributing to this research
Background. Tomato (Solanum lycopersicum) is a climacteric fruit whose ripening is characterized by sequential
changes in protein expression, resulting in different profiles of polypeptide bands at each maturity stage [1]. However,
fruits from diverse tomato genotypes vary in their ripening [2]. Hence, tomato fruit ripening is a biological process
affected by multidimensional sources of variation, i.e., maturity stage, genotype and protein expression.
Correspondence analysis (CA) is a multidimensional scaling technique allowing a rapid visualization of associations
among different sources of variation assessed by dichotomous data [3]. CA has been applied in microarray [4] and protein
function [5] studies. The aim of this work was to visualize tomato fruit ripening by a CA that allows measuring
the relative contribution of different genotypes, maturity stages and polypeptide bands to the total variation observed
during the whole process, in a bioinformatic application at the individual level of biological organization.
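In CA, the "total variation" that the dimensions share out is the total inertia of the table, i.e. its chi-square statistic divided by the grand total. The sketch below computes only that quantity for a small presence/absence table; extracting the individual dimensions would additionally require a singular value decomposition, which is not shown. The function name and the toy table are hypothetical.

```python
def total_inertia(table):
    """Total inertia of a contingency table: chi-square statistic divided by n."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    inertia = 0.0
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            expected = row_mass[i] * col_mass[j]  # independence model
            inertia += (value / n - expected) ** 2 / expected
    return inertia


# Hypothetical presence/absence data: rows are genotypes, columns are bands.
bands = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(total_inertia(bands))
```

A table whose rows all share the same profile has zero inertia, so there is nothing for CA to display; the further the profiles diverge, the larger the inertia.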
Materials and methods. Fruits from 15 genotypes (five Recombinant Inbred Lines -RIL- and their ten diallel Second
Cycle Hybrids -SCH-) were screened by SDS-PAGE for 25 polypeptide bands at 4 maturity stages: Mature Green
(MG), Breaker (B), Mature Red attached to plant (MRa) and Mature Red in shelves (MRs) according to [6]. A
database of dimensions 15 x 25 x 4 was analysed first by univariate analysis of the presence of each band (overall and by
stage) and second by multivariate CA at each maturity stage. Finally, an integrative CA was performed on the complete
database.
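The univariate step amounts to computing proportions over a 15 x 25 x 4 array of 0/1 scores. A minimal sketch, using randomly generated presence/absence values in place of the gel scores (the data, the seed and the variable names are all hypothetical):

```python
import random

N_GENOTYPES, N_BANDS = 15, 25
STAGES = ("MG", "B", "MRa", "MRs")

# Hypothetical presence/absence scores; real data would come from the gels.
rng = random.Random(1)
presence = {
    stage: [[rng.randint(0, 1) for _ in range(N_BANDS)] for _ in range(N_GENOTYPES)]
    for stage in STAGES
}


def proportion(matrix):
    """Fraction of 1s in a genotype x band matrix."""
    cells = [v for row in matrix for v in row]
    return sum(cells) / len(cells)


# Presence by maturity stage, and overall (stages have equal size, so the
# overall value is the plain mean of the stage values).
by_stage = {stage: proportion(presence[stage]) for stage in STAGES}
overall = sum(by_stage.values()) / len(STAGES)

# Presence of each band across all genotypes and stages.
by_band = [
    sum(presence[s][g][b] for s in STAGES for g in range(N_GENOTYPES))
    / (len(STAGES) * N_GENOTYPES)
    for b in range(N_BANDS)
]
print(by_stage, round(overall, 2), min(by_band), max(by_band))
```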
Results. The overall presence of all polypeptide bands in the 4 maturity stages for the 15 genotypes was 0.52, having
values of 0.46 at MG, 0.55 at B, 0.53 at MRa, and 0.54 at MRs. Minimum and maximum overall presence of each
band varied from 0.05 (nearly absent) to 1 (full presence) for two given polypeptides. For most polypeptide bands,
their presence varied through different maturity stages. Some polypeptides were more frequent at later maturity
stages while others were just present in earlier stages. A higher variation among genotypes for protein expression
was found at MG and MRs by CA, supporting the hypothesis that a broader genetic diversity should be expected for
fruit traits that are less exposed to natural selection pressures [6]. The first two dimensions explained 35% of total
variation at MG, which was the most variable maturity stage for the analyzed polypeptide profiles. Two RIL and two
SCH were clearly differentiated from the rest of the genotypes at this stage, the polypeptide bands most associated with each of
these four genotypes being completely opposite in their presence (Figure 1). Regarding the other maturity stages,
the first two dimensions explained 37% of total variation at B, 53% at MRa and 48% at MRs. The most divergent
genotypes and their associated polypeptides varied according to maturity stage, verifying that
ripening is jointly affected by the three sources of variation considered in this report, i.e., it is a multidimensional
biological process. Integrative CA identified one hybrid as the most variable individual along ripening, and seven
polypeptide bands highly associated to its discrepant performance in relation to the other genotypes of the diallel
crossing.
Figure 1: Position of 25 polypeptide bands (PP) and 15 genotypes (RILs indicated as LN and SCH indicated as LNx x LNy, N being the number
assigned to each RIL by the tomato breeders who obtained them) according to CA at the MG maturity stage.
Conclusions. Visualization of tomato fruit ripening at four maturity stages allowed measuring the relative contribution
of genetic and proteomic diversity to this multidimensional biological process. The bioinformatic application at the
individual level of organization was efficient for identifying the most variable genotypes and their associated polypeptide
bands at each different maturity stage and along the complete ripening.
References
1. Giovannoni JJ: Genetic regulation of fruit development and ripening The Plant Cell 2004, 16: p. S160-76.
2. Rodriguez GR, Sequin L, Pratta GR, Zorzoli R, and Picardi LA: Protein profiling in F1 and F2 generations of two tomato
genotypes differing in ripening time Biologia Plantarum 2008, 52: p. 548-52.
3. Lebart L, Morineau A, and Warwick KM: Multivariate descriptive statistical analysis John Wiley & Sons Ltd, Chichester 1984.
4. Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, and Vingron M: Correspondence analysis applied to microarray
data PNAS 2001, 98: p 10781-86.
5. Chang JM, Taly JF, Erb I, Sung TY, Hsu WL, Tang CY, Notredame C, and Su ECY: Efficient and interpretable prediction of protein functional classes by Correspondence Analysis and Compact Set Relations PLoS ONE 2013, 8: e75542.
doi:10.1371/journal.pone.0075542.
6. Marchionni Basté E, Pereira da Costa JH, Rodríguez GR, Zorzoli R, and Pratta GR: Genetic analysis of tomato fruit
ripening at polypeptide profiles level through quantitative and multivariate approaches American Journal of Plant Sciences
2014, 5: p. 1926-35.
Poster Session
Sequence analysis
Poster Session – Submission 9
Physiological, genomic and proteomic evidence supports the high UV resistance
profile of Acinetobacter sp. Ver3 isolated from High Altitude Andean Lakes
Daniel Kurth1 , Virginia Helena Albarracin1,2 , Carolina Belfiore1 , Marta Gorriti1 , Maria Eugenia Farias1
1 Planta Piloto de Procesos Industriales y Microbiológicos (PROIMI-CONICET), S. M. de Tucumán, 4000, Tucumán, Argentina.
2 Facultad de Ciencias Naturales e Instituto Miguel Lillo, Universidad Nacional de Tucumán, S. M. de Tucumán, 4000, Tucumán,
Argentina.
High-Altitude Andean Lakes (HAAL) are a group of dispersed shallow lakes and salterns, located in the Dry Central
Andes region of South America at altitudes above 3,000 m, and exposed to a unique combination of severe conditions,
i.e., high global solar and UV irradiation, hypersalinity, wide fluctuations in daily temperature, desiccation, high pH, and
high concentrations of toxic elements including arsenic [1]. As the HAAL are considered among the most UV-exposed
environments on Earth, their microbes can be taken as model systems to study UV-resistance mechanisms in
environmental bacteria at various complexity levels. Acinetobacter sp. Ver3, a gammaproteobacterium isolated from
Laguna Verde (4,400 m), was recently proposed as a model UV-resistant microbe with highly efficient DNA damage
photorepair ability [2], as well as an efficient catalase machinery [3]. Here we present the genome sequence
analyses of this extremophile together with further experimental evidence supporting the idea that this bacterium is
able to cope with increased DNA damage compared to sensitive strains. The genome analyses provided insight into the
taxonomic classification of this organism, suggesting that it may be a new species, and allowed the identification of resistance
genes related to the harsh environment. Moreover, a “UV-resistome” was defined, encompassing genes related to
UV-damage repair of DNA (such as nucleases and glycosylases from excision repair systems), and genes conferring
an enhanced capacity for scavenging the reactive molecular species responsible for oxidative damage (catalases,
peroxidases and SODs). In addition, the UV response was also studied at the proteomic level, which confirmed the
involvement of a specific cytoplasmic catalase, a putative regulator, and proteins associated with amino acid and protein
synthesis, among others. However, only a small number of proteins were overexpressed under UV stress, suggesting
that the resistance of this bacterium might be due to efficient constitutively expressed systems.
References
1. Farias ME, Poiré DG, Arrouy MJ, Albarracín VH: Modern stromatolite ecosystems at alkaline and hypersaline high-altitude
lakes in the Argentinean Puna. In STROMATOLITES Interact Microbes with Sediments. Volume 18. Edited by Tewari V,
Seckbach J. Dordrecht: Springer Netherlands; 2011:427–441. [Cellular Origin, Life in Extreme Habitats and Astrobiology]
2. Albarracín VH, Pathak GP, Douki T, Cadet J, Borsarelli CD, Gärtner W, Farias ME: Extremophilic Acinetobacter strains
from high-altitude lakes in Argentinean Puna: remarkable UV-B resistance and efficient DNA damage repair. Orig Life
Evol Biosph 2012, 42:201–21.
3. Di Capua C, Bortolotti A, Farías ME, Cortez N: UV-resistant Acinetobacter sp. isolates from Andean wetlands display
high catalase activity. FEMS Microbiol Lett 2011, 317:181–9.
Sequence analysis
Poster Session – Submission 12
Analysis of the Uniprot repertoire of amino acid post-translational modifications
Nicolás A. Méndez, Ignacio E. Sánchez
Protein Physiology Laboratory, Departamento de Química Biológica and IQUIBICEN-CONICET, Universidad de Buenos Aires
Background. The standard genetic code only accounts for the 20 most common amino acid residues. However,
many amino acids in proteins are modified posttranslationally. Thus, current sequence representations for manual
or in silico analysis provide incomplete information. We set out to describe the currently known posttranslational
modifications in terms of prevalence, phylogenetic distribution and their relationship with the chemical reactivity of
the modified standard amino acid.
Materials and methods. We acquired the Uniprot list of posttranslational modifications (2014-03 release) and
transferred it to a MySQL database using Python code. We queried the database to evaluate the distribution
of posttranslational modifications in the three domains of life and to count the prevalence of each posttranslational
modification. As a proxy for chemical reactivity, we used the estimation of T. Krick et al. [1]. Regression analysis
was performed using R standard functions. The p-values for the calculated coefficients of determination were obtained
by permutation tests as performed by P. Legendre’s multRegress R function [2]. Scatterplots were constructed using
R. The Venn diagram used to illustrate the distribution of posttranslational modifications was constructed using the
BioVenn software [3].
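The idea behind a permutation test for a coefficient of determination can be sketched without R. The code below is a simplified stand-in for illustration, not a port of the multRegress function: it handles a single predictor, shuffles the response, and counts how often the permuted R² reaches the observed one. The function names and the data values are hypothetical.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """P-value for R^2 by shuffling y; the observed value counts as one permutation."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 1
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical values: a reactivity proxy (x) and a PTM count (y) for 8 residues.
reactivity = [1, 2, 3, 4, 5, 6, 7, 8]
ptm_counts = [12, 15, 19, 22, 30, 31, 38, 41]
r2, p = permutation_p_value(reactivity, ptm_counts)
print(round(r2, 3), p)
```

Counting the observed statistic among the permutations keeps the p-value strictly positive, a common convention for this kind of test.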
Figure 1: Distribution of posttranslational modifications in the three domains of life expressed as percentage of total (Bacteria in yellow, Archaea in
magenta and Eukarya in grey).
Results. We found 466 unique posttranslational modifications in the Uniprot ontology. Note that glycation, lipidation,
disulfide bridges and crosslinks are not included in the Uniprot posttranslational modification ontology and were
therefore not considered at this stage of analysis. We quantified the number of posttranslational modifications for
each of the 20 standard amino acids and the number of modifications involving one, two or more residues (Table 1).
We also examined the distribution of posttranslational modifications in the three domains of life (Figure 1). Last,
we quantified the correlation between the number of posttranslational modifications per standard amino acid and the
chemical reactivity of each amino acid.
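Counting modifications per standard amino acid is a single grouped query once the list is in a relational table. The sketch below uses SQLite from the Python standard library in place of the MySQL database described in the methods, with a hypothetical two-column schema and a handful of hand-typed records that were not loaded from the Uniprot release.

```python
import sqlite3

# Illustrative records (modification name, target residue), typed by hand for
# this sketch.
records = [
    ("Phosphoserine", "S"),
    ("O-acetylserine", "S"),
    ("Phosphothreonine", "T"),
    ("N6-acetyllysine", "K"),
    ("N6-methyllysine", "K"),
    ("N6,N6-dimethyllysine", "K"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ptm (name TEXT PRIMARY KEY, residue TEXT NOT NULL)")
conn.executemany("INSERT INTO ptm (name, residue) VALUES (?, ?)", records)

# Number of distinct modifications per target residue, most modified first.
counts = conn.execute(
    "SELECT residue, COUNT(*) AS n FROM ptm GROUP BY residue ORDER BY n DESC, residue"
).fetchall()
conn.close()
print(counts)  # [('K', 3), ('S', 2), ('T', 1)]
```

Modifications that involve two or more residues would need a separate link table so that one modification can be counted under each residue it touches.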
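Decay (1/time) | Residue | Total PTMs | #PTM involving 1 aa | #PTM involving 2 aa | #PTM involving 3+ aa
1 | A | 17 | 12 | 5 | -
30 | C | 105 | 42 | 57 | 6
9 | D | 22 | 15 | 7 | -
5 | E | 22 | 15 | 7 | -
4 | F | 10 | 5 | 5 | -
1 | G | 47 | 18 | 29 | -
14 | H | 20 | 15 | 5 | -
2 | I | 15 | 11 | 4 | -
8 | K | 54 | 33 | 21 | -
2 | L | 11 | 8 | 3 | -
13 | M | 18 | 11 | 5 | 2
10 | N | 29 | 14 | 15 | -
3 | P | 17 | 11 | 6 | -
8 | Q | 14 | 9 | 5 | -
4 | R | 24 | 19 | 5 | -
6 | S | 55 | 31 | 18 | 6
6 | T | 38 | 22 | 16 | -
2 | V | 11 | 7 | 4 | -
12 | W | 19 | 12 | 5 | 2
7 | Y | 35 | 21 | 12 | 2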
Table 1: Quantification of posttranslational modifications of standard amino acids. The leftmost column shows the reactivity estimation from [1]; the
columns to the right show the number of PTMs for a given standard residue, in total and by number of residues involved.
Conclusions. We propose that the standard amino acid alphabet should be expanded to include the diverse universe
of posttranslational modifications. Since including all posttranslational modifications seems impractical, quantitative
prevalence data will be needed to decide which posttranslational modifications are most important. The results are
likely to be different in the three domains of life and may be explained in part by the chemical reactivity of the standard
amino acids.
References
1. Teresa Krick, David A. Shub, Nina Verstraete, Diego U. Ferreiro, Leonardo G. Alonso, Michael Shub, Ignacio E. Sanchez:
"Amino acid metabolism conflicts with protein diversity." arXiv:1403.3301 [q-bio.PE] 2014.
2. P. Legendre: "R-language functions." http://adn.biol.umontreal.ca/~numericalecology/Rcode/
3. Hulsen T, de Vlieg J, Alkema W: "BioVenn - a web application for the comparison and visualization of biological lists using
area-proportional Venn diagrams." BMC Genomics 2008, 9:488.
Sequence analysis
Poster Session – Submission 19
Spatial organization and distribution of linear motifs in the Ankyrin repeat protein
family and its binding partners
Nina Verstraete, Ignacio E. Sánchez, Diego U. Ferreiro
Universidad de Buenos Aires, Departamento de Quimica Biologica - IQUIBICEN- CONICET, Laboratorio de Fisiologia de Proteinas.
Background. Interactions between proteins regulate cellular physiology. Many of these interactions involve the
recognition of short peptidic regions (i.e. short linear motifs, SLiMs) which can be characterized by simple sequence
patterns, usually found in intrinsically disordered regions or in loops connecting globular or transmembrane domains.
These peptide-domain interactions are typically transient and often involve folding upon binding, challenging the
lock-and-key paradigm of protein recognition. Ankyrin-repeat domains are among the most frequently observed
protein-protein interactors in nature. These domains are composed of tandem arrays of recurrent amino acids that
cooperatively fold into elongated structures that mediate molecular recognition with high specificity. Many ankyrin-binding sites are either predicted or demonstrated to correspond to extended peptides mimicking SLiMs.
Description. We present here an exhaustive analysis of linear motif identification in Ankyrin proteins and their binding
partners. We searched for enriched or depleted SLiMs with respect to a random exploration of the sequence-space
in the Ankyrin protein family and their partners. We also analyzed the spatial distribution of SLiMs along the protein
sequences and describe how particular SLiMs are structurally distributed in the Ankyrin-containing proteins.
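The enrichment search described above can be sketched as follows. This is only an illustration under stated assumptions: the motif is written as a regular expression, the null model is a composition-preserving shuffle of each sequence (the authors' "random exploration of the sequence-space" may be defined differently), and the pattern, sequences and function names are hypothetical toy values.

```python
import random
import re


def count_motif(sequences, pattern):
    """Count non-overlapping regex matches of a motif over a set of sequences."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(seq)) for seq in sequences)


def motif_enrichment(sequences, pattern, n_shuffles=200, seed=0):
    """Compare the observed motif count with the mean count in shuffled sequences.

    Returns (observed, expected, ratio); ratio > 1 suggests enrichment and
    ratio < 1 suggests depletion relative to the shuffled null model.
    """
    rng = random.Random(seed)
    observed = count_motif(sequences, pattern)
    null_total = 0
    for _ in range(n_shuffles):
        shuffled = []
        for seq in sequences:
            residues = list(seq)
            rng.shuffle(residues)  # keeps amino acid composition, destroys order
            shuffled.append("".join(residues))
        null_total += count_motif(shuffled, pattern)
    expected = null_total / n_shuffles
    ratio = observed / expected if expected > 0 else float("inf")
    return observed, expected, ratio


if __name__ == "__main__":
    toy_sequences = ["MKPALPTTSGPSLPQ", "GGSAPELPKKT"]  # toy data, not real proteins
    print(motif_enrichment(toy_sequences, r"P.LP"))     # hypothetical motif
```

A positional analysis along the sequence could reuse the same regular expression with `re.finditer` to recover match coordinates.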
Conclusions. This computational work presents sequence- and structure-based approaches to analyze linear motif-mediated protein interactions in the Ankyrin repeat protein family. We discuss how the presence of functional
constraints can conflict with the folding dynamics of Ankyrin-repeat domains, which in turn modulate the evolution of
biological interactions.
Sequence analysis
Poster Session – Submission 21
Segmentation of continuous range random variables sequences using entropic
distances
Miguel A. Ré1,2 , José L. Martínez1
1-Facultad Regional Córdoba, Universidad Tecnológica Nacional, Maestro López y Cruz Roja Argentina, Ciudad Universitaria, 5010
Córdoba 2-Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya de la Torre y Medina Allende,
Ciudad Universitaria, 5010 Córdoba
The Jensen-Shannon divergence (JSD), a symmetrized version of the Kullback-Leibler divergence [1], allows quantifying the
difference between probability distributions. This property has been widely applied to the analysis of symbolic sequences
by comparing the symbol composition of different subsequences [2]. One main advantage of JSD is that it does not
require mapping the symbolic sequence onto a numerical sequence, which is necessary, for instance, in spectral or correlation
analyses.
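As an illustration of the comparison of symbol compositions, the sketch below computes the JSD between two subsequences directly from symbol counts. It assumes weights proportional to the subsequence lengths and base-2 logarithms, which is a common choice in segmentation work but is an assumption here; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution given by symbol counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def jensen_shannon_divergence(left, right):
    """JSD between the symbol compositions of two subsequences.

    JSD = H(mixture) - w_left * H(left) - w_right * H(right),
    with weights proportional to the subsequence lengths (assumption).
    """
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    c_left, c_right = Counter(left), Counter(right)
    return (shannon_entropy(c_left + c_right)
            - (n_left / n) * shannon_entropy(c_left)
            - (n_right / n) * shannon_entropy(c_right))


if __name__ == "__main__":
    print(jensen_shannon_divergence("AACCAACC", "GGTTGGTT"))
    print(jensen_shannon_divergence("AACCAACC", "ACACACAC"))
```

No numerical mapping of the symbols is needed: only their relative frequencies enter the calculation.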
JSD has been widely employed to detect domain walls in discrete sequences; see, for instance, the segmentation of genomic
chains [3]. JSD has been generalized in different ways, by considering non-extensive entropy [4,5] or by considering higher
order correlations in subsequences through Markov models [6,7].
Although JSD is a well-defined quantity for continuous distributions, it has not been so extensively considered in
the segmentation of continuous sequences. Its application is nevertheless of interest in the separation of quantum states or the
analysis of polarization images [8,9]. An alternative method for segmenting sequences of continuous random variables
is presented in this communication. In this proposal a new discrete variable is defined by considering the sample mean
and/or variance. The applicability of the method is assessed by analysing artificially generated continuous sequences.
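The abstract does not specify how the discrete variable is built, so the following is only one possible reading, offered as a sketch: the continuous sequence is cut into non-overlapping windows, each window is replaced by the bin index of its sample mean, and the split point that maximizes the JSD of the resulting symbolic sequence is reported. Window size, number of bins and all function names are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of the symbol composition of a sequence."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def jsd(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n = len(left) + len(right)
    return (entropy(left + right)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def discretize_by_window_mean(values, window, n_bins):
    """Replace each non-overlapping window by the bin index of its sample mean."""
    means = [sum(values[i:i + window]) / window
             for i in range(0, len(values) - window + 1, window)]
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((m - lo) / width), n_bins - 1) for m in means]


def best_cut(symbols):
    """Return (index, JSD) of the split that maximizes the JSD."""
    candidates = ((i, jsd(symbols[:i], symbols[i:]))
                  for i in range(1, len(symbols)))
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    rng = random.Random(1)
    # Artificial sequence: two Gaussian segments with different means.
    signal = ([rng.gauss(0.0, 1.0) for _ in range(200)]
              + [rng.gauss(5.0, 1.0) for _ in range(200)])
    symbols = discretize_by_window_mean(signal, window=10, n_bins=4)
    cut_index, cut_jsd = best_cut(symbols)
    print("estimated change point near sample", cut_index * 10)
```

Replacing the window mean by the window variance (or using both) would follow the same pattern; the resolution of the detected boundary is limited by the window size.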
References
1. Kullback S, Leibler R: On information and sufficiency. Ann. Math. Stat. 1951, 22: 79-86.
2. Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley H: Analysis of symbolic sequences using the
Jensen-Shannon divergence. Phys. Rev. E 2002, 65: 041905 1-16. And references therein.
3. Arvey A, Azad R, Raval A, Lawrence J: Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids
Research 2009, 1-12.
4. Tsallis C: Possible Generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52: 479-487.
5. Lamberti P, Majtey A: Non-logarithmic Jensen-Shannon divergence. Phys. A 2003, 329: 81-90.
6. Thakur V, Azad R, Ramaswamy R: Markov models of genome segmentation. Phys. Rev. E 2007, 75: 011915 1-10.
7. Ré M.A., Azad R.K.: Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis. PLoS ONE
9(4): e93532. doi:10.1371/journal.pone.0093532 (2014).
8. Jacques S. L., Roman J. R. and Lee K.: Imaging Superficial Tissues With Polarized Light. Lasers Surg. Med. 2000, 26:
119-129.
9. Tannous Z., Al-Arashi M., Shah S. and Yaroslavsky A.: Delineating melanoma using multimodal polarized light imaging.
Lasers Surg. Med. 2009, 41: 10-16.
Sequence analysis
Poster Session – Submission 38
Unveiling evolutionary signals in protein-protein interaction interfaces
Elin Teppa1, Diego Javier Zea2, Ariel Berenstein1 and Cristina Marino Buslje1
1 Structural Bioinformatics, Fundación Instituto Leloir
2 Structural Bioinformatics Group, Universidad Nacional de Quilmes
Protein-protein interactions are involved in most cellular processes. The study of protein interactions from an evolutionary perspective is challenging, since it is difficult to distinguish evolutionary constraints due to protein structure
and function preservation from those that arise due to interaction. The description and detection of evolutionary
signals in protein-protein interactions is currently a very active field of research. Interacting residues are involved in
inter-molecular interactions and they are structurally and functionally constrained, and therefore subject to a selection pressure that could be detected in homologous sequences. However, residue conservation within the interface is
far from obvious in many cases and the signal is usually weak. One reason is that the evolutionary pressure is not
homogeneous within an interface (1). Also, the coevolutionary signal between residues has been explored for detecting
interacting residues with limited success (2). A decomposition of the interacting interface has been proposed where
there is a core of buried residues, surrounded by a rim of residues whose atoms remain with some solvent accessibility
(3,4). From a functional point of view, residues of interface core and rim have different contributions to the binding
energy and consequently different selection pressures (5).
Figure 1: Boxplot of Conservation (C) and cumulative MI (cMI) scores by protein regions: Interface Core (IC), Interface Rim (IR), Protein Core (PC)
and Protein Surface (PS).
Here we present a detailed study of protein-protein interactions using a comprehensive dataset of biological unit complexes (6). We dissected each interacting unit into four regions: protein core (PC), protein surface (PS), interacting
core (IC) and interacting rim (IR), based on the delta solvent accessibility upon complex formation and the relative
solvent accessibility in the complex. Results show that there is no substantial difference between PC and IC, or between
PS and IR, regarding conservation and coevolution. We have also found that a coevolution-derived measure (cMI) (7) displays a greater difference between IC and IR than residue conservation does (see Figure 1). Regarding
conservation and coevolution signals on residues involved in different numbers of interfaces, we have found that their
conservation increases with the number of interacting partners while their cMI score decreases (see Figure 2).
Figure 2: Boxplot of conservation and cMI scores by number of interfaces in which an interface residue participates.
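The four-region dissection described in the text can be sketched as a simple decision rule on two quantities per residue: the change in solvent accessibility upon complex formation and the relative solvent accessibility in the complex. The cut-off values and names below are hypothetical, since the abstract does not report the thresholds actually used.

```python
# Hypothetical cut-offs (assumptions, not taken from the abstract).
DELTA_ASA_MIN = 0.0     # accessible area lost on binding must exceed this value
RSA_BURIED_MAX = 0.25   # relative accessibility below this value counts as buried


def classify_residue(asa_free, asa_complex, max_asa):
    """Assign a residue to IC, IR, PC or PS.

    asa_free    -- accessible surface area in the isolated chain
    asa_complex -- accessible surface area in the complex
    max_asa     -- reference maximum accessibility for the residue type
    """
    delta_asa = asa_free - asa_complex
    rsa_complex = asa_complex / max_asa
    at_interface = delta_asa > DELTA_ASA_MIN
    buried = rsa_complex < RSA_BURIED_MAX
    if at_interface:
        return "IC" if buried else "IR"
    return "PC" if buried else "PS"


if __name__ == "__main__":
    # Toy values in arbitrary area units.
    for asa_free, asa_complex in [(80, 5), (80, 60), (5, 5), (70, 70)]:
        print(asa_free, asa_complex, classify_residue(asa_free, asa_complex, 100))
```

Conservation and cMI scores could then be grouped by the returned label to produce boxplots like those in Figure 1.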
References
1. Guharoy M, Chakrabarti P. Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics. 2010;11:286.
2. Mintseris J, Weng Z. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl
Acad Sci U S A. 2005 Aug 2;102(31):10930-5.
3. Bogan AA, Thorn KS. Anatomy of hot spots in protein interfaces. J Mol Biol. 1998 Jul 3;280(1):1-9.
4. Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol. 1999 Feb
5;285(5):2177-98.
5. Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. Proc Natl
Acad Sci U S A. 2005 Oct 25;102(43):15447-52.
6. Bickerton GR, Higueruelo AP, Blundell TL. Comprehensive, atomic-level characterization of structurally characterized
protein-protein interactions: the PICCOLO database. BMC Bioinformatics. 2011 Jul 29;12(1):313.
7. Marino Buslje C, Teppa E, Di Doménico T, Delfino JM, Nielsen M. Networks of High Mutual Information Define the
Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification. PLoS Comput Biol. 2010 Nov
4;6(11):e1000978.
Sequence analysis
Poster Session – Submission 40
Tools for the visualization of quality parameters and information of targeted
sequencing data
Nathalie B. Vicente1, Gabriela Merino2, Juan M. Sendoya3, Javier Oliver3, Federico Prada1, Elmer
Fernández2, Andrea Llera3
1 UADE, 2 BDMG, 3 Leloir
Next generation sequencing (NGS) is immersed in the big data paradigm. An easy visualization and integration of
the large amounts of information produced by NGS is paramount for the interpretation of results and more complex
analyses. We have developed simple tools for the user-friendly visualization of sequencing quality parameters and variant
calling analysis derived from targeted sequencing projects, particularly for the Ion Torrent platform (Life Technologies).
In these experiments, multiplex PCRs are designed to specifically amplify different regions of the genome, and only
those regions (amplicons) are subsequently sequenced. These tools were built as standalone applications developed in Java
and are aimed at end users; they do not require advanced computer skills. In the present work, these programs
were used to analyze the results of a targeted sequencing experiment on human cancer cell lines, yet they can
be easily adapted for use in other pathologies and genetic/molecular studies. Firstly, a heatmap was designed to
analyze amplicon and gene depth coverage at a glance, allowing for a fast examination of the performance of the
library construction process, particularly of the number of reads per amplicon, per gene and per primer panel pool.
Such analysis uses color coding to easily distinguish between poorly, moderately, well and exceptionally well performing
samples. Additionally, a Circos plot was used to graphically compare sequencing variants across multiple samples.
These circular diagrams can be used for various purposes, such as showing the distribution of single-nucleotide polymorphisms
(SNPs) or multiple pair-wise comparisons (for instance, between cancer and normal, experimental and control, or
pre-treatment and post-treatment samples). In conclusion, we have developed tools that can be used for the easy
and user-friendly visualization of sequencing quality parameters and information from targeted sequencing experiments, for
use in basic and clinical research.
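The tools themselves were written in Java and are not reproduced here. As a language-neutral illustration of the colour-coding idea behind the coverage heatmap, the Python sketch below aggregates reads per gene and assigns each amplicon to one of four performance classes. The depth thresholds, identifiers and function names are hypothetical; the abstract does not give the actual cut-offs.

```python
from collections import defaultdict

# Hypothetical upper bounds (reads per amplicon) for each performance class.
THRESHOLDS = [(100, "poor"), (500, "moderate"), (2000, "good")]


def classify_depth(reads):
    """Map a read count to a performance class used for colour coding."""
    for upper_bound, label in THRESHOLDS:
        if reads < upper_bound:
            return label
    return "exceptional"


def summarize_coverage(amplicon_reads, amplicon_gene):
    """Return (reads per gene, class per amplicon).

    amplicon_reads -- dict: amplicon id -> number of reads
    amplicon_gene  -- dict: amplicon id -> gene name
    """
    reads_per_gene = defaultdict(int)
    for amplicon, reads in amplicon_reads.items():
        reads_per_gene[amplicon_gene[amplicon]] += reads
    classes = {amp: classify_depth(reads) for amp, reads in amplicon_reads.items()}
    return dict(reads_per_gene), classes


if __name__ == "__main__":
    reads = {"amp1": 50, "amp2": 600, "amp3": 2500}                # toy counts
    genes = {"amp1": "GENE_A", "amp2": "GENE_A", "amp3": "GENE_B"}  # toy mapping
    print(summarize_coverage(reads, genes))
```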
Sequence analysis
Poster Session – Submission 50
Using coevolution to improve protein subfamily classification
Franco L Simonetti1, Martin Banchero1, Ariel J Berenstein2, Ariel Chernomoretz2, Cristina Marino Buslje1
1 Bioinformatics Unit, Fundación Instituto Leloir, Capital Federal, Argentina.
2 Integrative Systems Biology Group, Fundación Instituto Leloir, Capital Federal, Argentina.
Background. The common approach for protein subfamily classification relies on grouping protein sequences according
to their degree of similarity. However, there is no single sequence similarity threshold for accurately grouping sequences
into isofunctional groups. Most methods rely on protein superfamilies as a starting point for subfamily classification.
Superfamilies are defined as a set of homologous proteins in which conserved sequence or structural characteristics can
be associated with conserved functional characteristics. Superfamily members can be highly divergent and catalyze
quite different overall reactions. A subfamily is defined as a set of homologous proteins within a superfamily that
perform an identical function by the same mechanism.
Current subfamily classification methods use bottom-up clustering to construct a cluster hierarchy, then cut the
hierarchy at the most appropriate locations to obtain a single partitioning [1, 2]. These methods usually integrate
data such as protein sequence similarity, residue conservation within groups and HMM profiles. Moreover, results
usually predict a great number of subfamilies with few members and limited biological meaning.
The goal of this study is to identify subsets of functionally closely related sequences within a given superfamily. Since
all proteins within a superfamily share a common ancestor, we hypothesize that functional diversity within superfamilies
has arisen through a series of concerted changes that must have left an identifiable coevolutionary signal.
Materials and methods. The challenge is to separate the coevolutionary signals of the subfamilies and use
them in the process of subfamily classification. This information can be used to guide a hierarchical clustering.
Our approach uses Mutual Information to calculate covariation [3] and commonly used clustering methods based on
sequence similarity. We have defined a select group of superfamilies from the Structure Function Linkage Database
as our gold standard dataset [4].
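For readers unfamiliar with the covariation measure, the sketch below computes the plain Mutual Information between two columns of a multiple sequence alignment. It is a minimal illustration only: the method in [3] additionally corrects for phylogeny, low counts and sequence redundancy, none of which is reproduced here, and the function name and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(column_i, column_j):
    """Mutual Information (bits) between two alignment columns.

    Each column is a string with one residue per aligned sequence.
    """
    n = len(column_i)
    freq_i = Counter(column_i)
    freq_j = Counter(column_j)
    freq_ij = Counter(zip(column_i, column_j))
    return sum(
        (count / n) * log2((count / n) / ((freq_i[a] / n) * (freq_j[b] / n)))
        for (a, b), count in freq_ij.items()
    )


if __name__ == "__main__":
    # Toy columns: the first pair covaries perfectly, the second is independent.
    print(mutual_information("AAVV", "LLII"))
    print(mutual_information("AVAV", "LLII"))
```

Because the frequencies are estimated from a set of aligned sequences, the measure is only defined for a group, which is why a preliminary clustering is needed before covariation can be evaluated.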
Results. Different approaches were considered for integrating Mutual Information data in sequence clustering. Since
Mutual Information can only be calculated for a group of sequences, a preliminary sequence clustering is performed.
Using solely covariation data, our method can cluster groups of sequences from the same subfamily. For a complete
clustering solution, it performs almost as well as a hierarchical clustering based on sequence similarity. The next
step will be to integrate both methods.
Conclusions. Automated protein classification remains an active topic of research, and state-of-the-art methods are
far from predicting biologically meaningful results. Covariation data has never been used before in this context and
further analyses are needed to improve the method.
References
1. David A Lee, Robert Rentzsch and Christine Orengo. GeMMA: functional subfamily classification within superfamilies
of predicted protein structural domains. Nucleic Acids Res. 2010 Jan;38(3):720-37. doi: 10.1093/nar/gkp1049. Epub
2009 Nov 18.
2. Brown DP, Krishnamurthy N, Sjölander K. Automated protein subfamily identification and classification. PLoS Comput
Biol. 2007 Aug;3(8):e160.
3. Buslje CM, Santos J, Delfino JM, Nielsen M. Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics. 2009 May
1;25(9):1125-31. doi: 10.1093/bioinformatics/btp135. Epub 2009 Mar 10.
4. Eyal Akiva et al, Patricia Babbitt. The Structure-Function Linkage Database. Nucleic Acids Res. 2014 Jan 1;42:D521-30
System Biology and Networks
Poster Session – Submission 28
Improving Rule-Based Gene Regulatory Network Inference by means of
Biclustering
Cristian A. Gallo1, Jessica A. Carballido1, Ignacio Ponzoni1,2
1 Laboratory for Research and Development in Scientific Computing (LIDeCC), DCIC, UNS, Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química, CONICET-UNS, Bahía Blanca, Argentina
Background. Gene regulatory networks (GRNs) play an important role in the progression of life phenomena such as
cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases, among
others. Gene expression time series data are becoming increasingly available, providing the opportunity
to reverse engineer the time-delayed gene regulatory networks that govern the majority of these molecular processes.
In this context, data mining methods constitute suitable approaches for performing the inference of the relational
structures of a GRN [1].
Methods. The aim of the research presented here is to enhance GRN inference based on association rules from
multiple microarray time series datasets given as input. To this end, a rule-based inference algorithm (GRNCOP2)
[2] was combined with a biclustering technique (BiHEA) [3] in order to increase the useful information extracted
from the datasets. The association rules establish causal links between two genes, where the semantics and the
interpretation depend on the input data and on the rule type inferred. This provides a global view of the relation
between each pair of genes, since it considers all the data available on the expression profiles. On the other hand, the
biclustering algorithm can be used to extract co-expression (similar or opposed) relations between genes that may
only occur in a subset of the experimental conditions, extracting additional associations with a local view of the data
that may not be captured by the main inference algorithm. In order to combine both methods, a pair-wise analysis
is performed to extract association rules from the biclusters obtained from all the datasets, adding the best rules
to the GRN inferred by the rule-based method. The proposed approach was applied to time series datasets [4, 5]
composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were analyzed in terms
of the novelty and soundness of the rules provided by the biclustering algorithm. In order to assess the soundness
of the rules, their average accuracy was measured against a freely available database of associations
between yeast genes known as Yeastnet [6]. Figure 1 shows the average accuracy of the rules obtained by the
rule-based approach, the biclustering algorithm and the combination of both methods. It also shows the expected
accuracy if the rules were picked randomly. Figure 2 shows the network obtained by the rule-based approach
alone and the same network enhanced by the rules obtained through the biclustering algorithm. As can be observed,
the sets of rules inferred by the two algorithms and the combined results achieve high accuracy values with respect to the
Yeastnet benchmark database, performing above random selection as expected. Although the rules inferred by
the biclustering algorithm are less accurate than those extracted by the rule-based approach, these rules represent
new potential relations that were not discovered by the main inference algorithm, thus enhancing the overall inference
capabilities.
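The combination and evaluation steps described above can be sketched in a few lines. This is a simplified illustration, not the GRNCOP2 or BiHEA implementation: rules are reduced to unordered gene pairs (an assumption, since the actual rules are directed and typed), accuracy is taken as the fraction of inferred pairs found in a reference set, and gene names and function names are placeholders.

```python
def as_pairs(rules):
    """Reduce rules to unordered gene pairs so (a, b) and (b, a) compare equal."""
    return {frozenset(rule) for rule in rules}


def accuracy(rules, reference):
    """Fraction of inferred gene pairs that appear in the reference associations."""
    inferred = as_pairs(rules)
    if not inferred:
        return 0.0
    return len(inferred & as_pairs(reference)) / len(inferred)


def enhance(rule_based, bicluster_rules):
    """Add to the rule-based network the pairs found only by biclustering.

    Returns (combined pairs, novel pairs contributed by biclustering).
    """
    base = as_pairs(rule_based)
    novel = as_pairs(bicluster_rules) - base
    return base | novel, novel


if __name__ == "__main__":
    # Placeholder gene identifiers; these are not real inferred relations.
    rule_based = [("g1", "g2"), ("g2", "g3")]
    bicluster = [("g3", "g2"), ("g4", "g1")]
    reference = [("g2", "g1"), ("g1", "g4")]
    combined, novel = enhance(rule_based, bicluster)
    print(accuracy(rule_based, reference), accuracy(novel, reference))
    print(len(combined), "pairs in the enhanced network")
```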
Conclusions. In this work, we have introduced an approach to integrate the results of a rule-based method with
a biclustering algorithm for the inference of gene regulatory networks. The method was validated with well known
publicly available gene expression datasets. The results have shown that the combined approach infers a gene
regulatory network with high average accuracy with respect to the Yeastnet database, providing new relations that were
not present in the GRN inferred by the rule-based method alone. This shows the importance of combining different
approaches in the inference of gene regulatory networks, since it provides alternative views of the data and allows the
discovery of significant relations that may not be detectable by a single specific approach. Further analysis is required in
order to confirm these promising results.
Acknowledgments. This work is kindly supported by CONICET grant PIP 112-2012-0100471CO and UNS grant
PGI 24/N032.
Figure 1: Average Yeastnet accuracy of the rules inferred by the rule-based approach (GRNCOP2), the biclustering algorithm (BiHEA), the combined
results and a random selection.
Figure 2: Gene regulatory networks inferred by the algorithms. The red arrows represent gene activation, whereas the blue arrows indicate gene
inhibition. Left: rule-based gene network inferred by GRNCOP2. Right: rule-based gene network enhanced by the new relations inferred by BiHEA.
The new rules are denoted in light blue and light red.
References
1. Gallo, CA, Carballido, JA, Ponzoni, I: Inference of Gene Regulatory Networks based on Association Rules, In Biological
Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Edited by Elloumi M,
Zomaya AY. John Wiley & Sons. 2013.
2. Gallo, CA, Carballido, JA, Ponzoni, I: Discovering Time-Lagged Rules from Microarray Data using Gene Profile Classifiers,
BMC Bioinformatics 2011, (12)123:1-21.
3. Gallo, CA, Carballido, JA, Ponzoni, I: BiHEA: A Hybrid Evolutionary Approach for Microarray Biclustering, Lecture Notes
in Bioinformatics, Springer-Verlag 2009, 5676:36–47.
4. Segal, E, Shapira, M, Regev, A, Pe’er, D, Botstein, D, Koller, D, Friedman, N: Module Networks: Identifying Regulatory
Modules and Their Condition-Specific Regulators from Gene Expression Data, Nature Genetics 2003, 34:166-176.
5. Yeang, CH, Jaakkola, T: Physical Network Models and Multi-Source Data Integration. Proc Seventh Ann Int’l Conf
Research in Computational Molecular Biology 2003, 312-321.
6. Lee I, Li Z, Marcotte EM: An improved, bias-reduced probabilistic functional gene network of baker’s yeast, Saccharomyces
cerevisiae. PLoS ONE 2007, 2(Suppl 10):e988
System Biology and Networks
Poster Session – Submission 30
Photoreceptor Absorption Curves account for human chromatic Discrimination
Ability
María da Fonseca, Inés Samengo
Física Estadística e Interdisciplinaria, Centro Atómico Bariloche
Photoreceptors constitute the first stage in the processing of color information; many more stages are required before
humans can consciously report whether two stimuli are perceived as chromatically equal or not. Therefore, although
photoreceptor absorption curves (panel A) are expected to influence the accuracy of conscious discriminability, there
is no reason to believe that they should suffice to explain it. However, by means of a simple information-theoretical
analysis, here we demonstrate that photoreceptor absorption properties predict the wavelength dependence of human
color discrimination ability, as tested by behavioral experiments (panel B). The bottleneck in chromatic information
processing, therefore, seems to be determined by photoreceptor absorption characteristics. Subsequent encoding
stages preserve the wavelength dependence of chromatic discriminability at the photoreceptor level. Our formalism is easily extended to include light beams of arbitrary spectral power distribution, predicting the discrimination
ability in the 3-dimensional color space CIE XYZ and in the 2-dimensional space CIE xyY. We finally explore the
chromatic discrimination ability of subjects with atypical photoreceptor absorption characteristics, as in daltonism or
tetrachromatism.
Figure 1: A. Normalized photoreceptor absorption curves for S (blue) M (green) and L (red) cones.
B. Discrimination error ∆λ as a function of wavelength λ for eight different subjects [2].
References
1. Stockman A, Brainard DH (2009) Color vision mechanisms. In: OSA Handbook of Optics (Bass M, ed), pp. 11.1-11.104.
New York: McGraw-Hill.
2. Smeulders N, Campbell FW, Andrews PR (1994) The Role of Delineation and Spatial Frequency in the Perception of the
Colours of the Spectrum. Vision Res 34:927-936.
System Biology and Networks
Poster Session – Submission 42
PaNTex: A novel methodology to assemble Pathway Networks using Text Mining
Julieta S. Dussaut1, Fiorella Cravero2, Ignacio Ponzoni1, Ana G. Maguitman3, Rocío L. Cecchini1
1 Laboratory of Research and Development in Scientific Computing (LIDeCC), Department of Computer Science, Universidad Nacional del Sur - Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química, CONICET - Bahía Blanca, Argentina
3 Artificial Intelligence Research and Development Laboratory (LIDIA), Department of Computer Science, Universidad Nacional del
Sur - Bahía Blanca, Argentina
Background. Systems Biology is a discipline that integrates biological knowledge coming from different sources to
study a range of complex biological regulatory systems. In this context, pathways, first created as a graphical
representation of well-established knowledge about biological processes, are becoming increasingly important for life
science research [1]. However, the determination of interaction patterns in pathway networks is typically a manual
procedure which requires significant contributions from domain experts within the research community. During the
past years we have witnessed the emergence of novel data-driven methods aimed at assisting Systems Biology
research. In particular, the analysis of information on molecular events contained in very large repositories has led to
new approaches to extract biological interactions from scientific literature [2]. Literature mining methods can help
analyze, integrate, and understand not only large collections of data per se, but also the linkages amongst them which
allow us to make inferences [3, 4]. The fast publication of new papers makes staying up to date a serious challenge
(e.g. the PubMed database contains information for over 23 million articles and continues to grow at a high weekly rate).
Therefore, text mining methods, which aid in the construction and maintenance of pathway knowledge, have become
relevant tools for biologists to manage this increasing quantity of biological literature. Another crucial issue in text
mining applied to Bioinformatics is achieving robust testing of the methods, given the lack of large, objectively
validated test sets or “gold standards” [5]. The main consequence of these problems is that many inferred pathways
do not represent coherent explanations of the reported facts [3], and transforming the results of automatically
constructed networks into pathways seems to require substantial additional human effort. For that reason, the
integration of literature mining algorithms with robust validation strategies for pathway knowledge extraction is an
interesting open research field.
Materials and methods. In this work we present a literature mining approach for assisting in the construction of
a pathway network. It is important to mention that our proposal is in an initial stage of development. For this
reason, only the general architecture of the computational strategy and preliminary experiments are reported here.
As a starting point of this approach, we use the KEGG pathway database to gather a list of pathways for each
organism; at this initial stage we consider only human and yeast as valid organisms for the method. Using this
list, we search PubMed publications via its Entrez Programming Utilities and look for co-occurrence of pathways in
the same publication. The resulting data are stored in an intersection matrix. We also keep track of the number
of publications that contain each pathway name, for normalization purposes. In order to validate the proposed
method, we compare the resulting normalized matrix with data reported by Alexeyenko & Sonnhammer [6]. A scheme
of the designed methodology is shown in Figure 1 (see next page).
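The co-occurrence counting and normalization steps can be sketched as follows. The example works on an in-memory mapping from publication identifiers to the pathway names found in each publication; retrieval from KEGG and PubMed is deliberately left out. The normalization shown (co-occurrences divided by the number of publications mentioning either pathway, i.e. a Jaccard index) is an assumption, since the abstract does not state which normalization is used, and all names and identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(publication_pathways):
    """Count publications per pathway and per pathway pair.

    publication_pathways -- dict: publication id -> set of pathway names mentioned
    Returns (single counts, pair counts keyed by frozenset of two names).
    """
    single = Counter()
    pairs = Counter()
    for names in publication_pathways.values():
        single.update(names)
        pairs.update(frozenset(pair) for pair in combinations(sorted(names), 2))
    return single, pairs


def normalize(single, pairs):
    """Jaccard-style normalization (assumed): shared / (either pathway mentioned)."""
    return {
        pair: shared / (sum(single[name] for name in pair) - shared)
        for pair, shared in pairs.items()
    }


if __name__ == "__main__":
    # Toy data: identifiers and pathway names are placeholders.
    toy = {
        "pub1": {"pathway A", "pathway B"},
        "pub2": {"pathway A"},
        "pub3": {"pathway A", "pathway B", "pathway C"},
    }
    single, pairs = count_cooccurrences(toy)
    for pair, score in normalize(single, pairs).items():
        print(sorted(pair), round(score, 3))
```

The resulting dictionary plays the role of the normalized intersection matrix and could be compared entry by entry with an external reference network.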
Conclusions. In this work we present the architecture of a text mining approach for the extraction of associations
between pathways from PubMed literature. At present we are evaluating the results of the method using Homo sapiens
data.
Figure 1: Scheme of PaNTex.
Acknowledgments. This work is kindly supported by PGI-UNS (24/N032), PGI-UNS 24/N029, CONICET-PIP 1122009-0100322, CONICET-PIP11220120100487, PICT-2011-0149.
References
1. Kamburov A, Pentchev K, Galicka H, Wierling C, Lehrach H, Herwig, R: ConsensusPathDB: Toward a more complete
picture of cell biology, Nucleic Acids Research, 2011, 39(Suppl.1):D712-D717.
2. Li, C., Liakata, M., & Rebholz-Schuhmann, D. (2013). Biological network extraction from scientific literature: state of
the art and challenges. Briefings in bioinformatics, bbt006.
3. Oda K, Kim J-D, Ohta T, Okanohara D, Matsuzaki T, Tateisi Y, Tsujii J: New challenges for text mining: mapping
between text and manually curated pathways, BMC Bioinformatics, 2008; 9(Suppl 3):S5.
4. Buyko E, Linde J, Priebe S, Hahn U: Towards automatic pathway generation from biological full-text publications, Lecture
Notes in Computer Science, 2011, 7014:67-79.
5. Maguitman AG, Rechtsteiner A, Verspoor K, Strauss C, Rocha L: Large-Scale Testing of Bibliome Informatics Using Pfam
Protein Families. Pacific Symposium on Biocomputing 2006: 76-87.
6. Alexeyenko A. and Sonnhammer E.: Global networks of functional coupling in eukaryotes from comprehensive data
integration. Genome Research, 2009, 19: 1107-1116.
System Biology and Networks
Poster Session – Submission 45
Effect of plasticity on orientation selectivity in a model of primary visual cortex
Soledad Gonzalo Cogno, Germán Mato
Statistical and Interdisciplinary Physics Group, Instituto Balseiro and Centro Atómico Bariloche, San Carlos de Bariloche, Argentina,
8400
Since its discovery by Hubel and Wiesel in 1959, orientation selectivity has been observed in every mammal for
which the neuronal response selectivity of primary visual cortex (V1) has been examined. In some animals, like cat
and monkey, anatomically close V1 neurons have similar preferred orientations, giving rise to maps of orientation
preferences. However, sharp selectivity is also observed in animals such as mice, squirrels and rats, whose V1 has no
orientation map; that is, neurons with different preferred orientations are intermixed. This second scenario
is called salt-and-pepper organization. It calls into question the role of intracortical connections, since a
purely topographical organization of the connections would not reinforce orientation selectivity as
in the case with orientation maps.
Recent studies have shown that connections are formed selectively between neurons with similar response properties,
and connections are eliminated between visually unresponsive neurons; the overall connectivity rate is kept constant.
However, the effect of this plastic behavior on orientation selectivity is unclear. The present work focuses on analyzing
the effect of plasticity on orientation selectivity for the salt-and-pepper organization. We simulate a patch of layer 4
composed of two populations of neurons (excitatory and inhibitory) with weakly orientation-selective inputs and update
the excitatory-excitatory connections. The updating rule depends on the relative timing of the pre- and post-synaptic
spikes. We find that even if the connections are substantially modified (see Figure 1A), this leads only to a weak
increase in selectivity (see Figures 1B and 1C). In future work, we plan to compare this phenomenon with the results
of systems with orientation maps.
Figure 1: Excitatory synaptic efficacies and orientation selectivity index (OSI) distributions. A. Distribution of the excitatory synaptic efficacies after
the plasticity rule is applied. All efficacies are initialized to 1; after plasticity is applied, they become stronger on average. B. OSI distribution in the
absence of plasticity. C. OSI distribution in the presence of plasticity.
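The abstract does not state how the OSI was computed. One widely used definition is one minus the circular variance of the tuning curve, and the sketch below assumes that definition. The function name, the four test orientations and the firing rates are hypothetical illustrations, not values from the study.

```python
import cmath
import math


def orientation_selectivity_index(rates, orientations_deg):
    """OSI = |sum_k r_k * exp(2i * theta_k)| / sum_k r_k.

    This is 1 minus the circular variance; the factor 2 accounts for
    orientation being periodic over 180 degrees. It is an assumed
    definition, since the abstract does not specify one.
    """
    total = sum(rates)
    if total == 0:
        return 0.0  # a silent neuron has no defined preference
    vector = sum(
        r * cmath.exp(2j * math.radians(theta))
        for r, theta in zip(rates, orientations_deg)
    )
    return abs(vector) / total


# Hypothetical tuning curves sampled at four orientations.
orientations = [0, 45, 90, 135]
flat_osi = orientation_selectivity_index([5, 5, 5, 5], orientations)
sharp_osi = orientation_selectivity_index([10, 0, 0, 0], orientations)
print(f"flat tuning OSI:  {flat_osi:.3f}")
print(f"sharp tuning OSI: {sharp_osi:.3f}")
```

Under this definition an untuned neuron scores close to 0 and a neuron that responds to a single orientation scores 1, so a weak increase in selectivity would appear as a small rightward shift of the OSI distribution.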
Genome Annotation and Organization
Poster Session – Submission 1
Comparative genomics in human parasite flatworms: Echinococcus granulosus
s.s. (G1 genotype) and Echinococcus canadensis (G7 genotype)
Lucas L Maldonado1, Juliana Assis2, Flávio Gomes Araújo2, Natalia Macchiaroli, Marcela Cucher, Mara
Rosenzvit1, Guilherme Oliveira2 and Laura Kamenetzky1
1-IMPaM, CONICET, Fac. de Medicina - Univ. de Buenos Aires, Argentina
2-Genomics and Computational Biology Group, CPqRR - Oswaldo Cruz Foundation, Belo Horizonte, MG, Brazil.
Background. Echinococcus canadensis is a platyhelminth parasite closely related phylogenetically to
Echinococcus granulosus and Echinococcus multilocularis, members of the class Cestoda that are involved in hydatid
infections of humans and animals. In South America, three species of Echinococcus sensu lato have been reported: E.
granulosus sensu stricto (G1 and G2 genotypes), E. canadensis (G6 and G7 genotypes) and E. ortleppi (G5 genotype)
(Kamenetzky and Cucher, 2014). Only limited genetic information on E. canadensis G7 has been reported so far. In this
work we sequenced the genome of this species.
Methods. High-quality genomic DNA was extracted and two paired-end libraries were sequenced with Illumina
technology. Several assembly pipelines were evaluated. The genome was assembled de novo with Velvet,
varying the parameters until the best assembly was obtained. Reads were also mapped to the E. multilocularis
reference genome (Tsai et al., 2013) with BWA. Genes were annotated with CEGMA and MAKER, using
flatworm data for gene model training.
Localization in E. multilocularis    # genes
Chromosome 1                         95
Chromosome 2                         56
Chromosome 3                         59
Chromosome 4                         60
Chromosome 5                         41
Chromosome 6                         16
Chromosome 7                         24
Chromosome 8                         24
Chromosome 9                         7
Chromosome 10*                       5
Chromosome 11*                       0
Total                                387
*still unassembled scaffolds of the E. multilocularis reference genome
Results. Comparative studies revealed high levels of nucleotide identity between E. canadensis G7 and E. multilocularis,
as well as E. granulosus s.s. G1. Almost all contigs have a counterpart in the E. multilocularis genome (Figure 1).
Interestingly, the in silico annotation procedure employed in this work allowed us to identify 86% (387/450) of the highly
conserved genes (Table 1).
Conclusions. This is the first report of the E. canadensis G7 genome. It was obtained by high-throughput sequencing,
giving a broad genome view of this species, which shows important biological and epidemiological features.
Knowledge of this new genome should provide information for comparative genomics, allowing prevention
and diagnosis tools to be adapted to each epidemiological situation.
References
Kamenetzky L, Cucher M: Hidatidosis: genotipos de Echinococcus granulosus presentes en Argentina y el mundo.
In: Temas de Zoonosis VI, chapter 43, pp 411-421. Asociación Argentina de Zoonosis, 2014 (500 pages). ISBN 978-987-97038-5-4.
Tsai IJ et al.: The genomes of four tapeworm species reveal adaptations to parasitism. Nature 2013, 496(7443):57-63.
Genome Annotation and Organization
Poster Session – Submission 8
The human genome data analysis platform
Daniel Koile, Maximiliano de Sousa Serro, Diego Wallace, Patricio Yankilevich
Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA) - CONICET - Partner Institute of the Max Planck Society,
CABA, Buenos Aires, Argentina
Background. The health of an individual depends on their DNA as well as on environmental factors. The
genome is the blueprint of an individual, and its analysis together with additional biological information, such as the DNA
methylome, the transcriptome, the proteome, and the metabolome, provides a dynamic assessment of the
physiology and health state of that individual (1). Personal genome interpretation can be used to identify molecular
and genetic variations within the population. This genetic screening information will allow us to elucidate disease
pathways and identify new drug targets. In clinical trials, recruiting participants based on their genetic profile will
shorten trials and reduce their risks. Trial results combined with genetic profiles will
inform therapeutic development and help identify genetic causes of drug response and side effects. Finally, this human
genome analysis platform may help us to better understand the genetic basis of diseases, to make more accurate
diagnoses, to better understand prognosis and to make better treatment decisions.
Materials and methods. The platform we are building consists of a computer cluster, a Next Generation Sequencing
(NGS) data analysis pipeline, a set of biological knowledge databases and a platform website. The software pipeline
is the key component of the platform. It is built from state-of-the-art methods for NGS data analysis: more than 15
public open-source algorithms, developed by research groups from leading institutions, that conform to today's best
practices. This makes the data analysis transparent and reproducible. The pipeline
is designed as independent modules that sequentially execute the different genome analysis tasks. The Genome
Analysis Toolkit (GATK) developed by the Broad Institute (2) is widely used in our pipeline, complementing other
analysis and visualization tools.
Conclusions. This general human genome analysis pipeline gives us the basis to participate in biomedical
projects that include patient genetic profiles and allows us to start collaborations with experimental research groups
working on human diseases. Eventually, this basic framework can be customized to provide further important
applications such as cancer diagnosis, non-invasive prenatal tests or newborn screening. In future work we aim to
extend the platform to integrate transcriptome and epigenome data into the analysis.
References
1. Chen R, Mias R, Li-Pook-Than J, Jiang L, Lam H, Chen R, Miriami E, Karczewski K, Hariharan M, Dewey F, et al.
Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 2012, 148(6): 1293-1307.
2. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M,
DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing
data. Genome Res 2010, 20:1297-1303.
Genome Annotation and Organization
Poster Session – Submission 13
The plastome of the Yerba Mate tree
Jimena Cascales1,2, Mariana Bracco1, Lidia Poggio1,2, Alexandra M Gottlieb1,2
1-Laboratorio de Citogenética y Evolución, Departamento de Ecología, Genética y Evolución, IEGEBA (UBA-CONICET), FCEyN,
UBA. Int. Güiraldes 2620, Ciudad Universitaria, Pab. II, 4to piso, Laboratorio 61-62. (C1428EHA), CABA, Argentina.
2-CONICET. [email protected]
Background. The "yerba mate" tree (Ilex paraguariensis) is a perennial native to subtropical South America. Its
economic value lies in the use of its leaves and twigs to prepare a popular infusion. The custom of drinking
"mate" is a legacy of the Guaraní culture that is strongly rooted in our society. Several medicinal properties are attributed to
its high concentrations of various secondary metabolites, minerals and vitamins. In Argentina the production of "yerba
mate" is restricted to Misiones and Corrientes, owing to the climate and soil requirements of the crop. Phytochemical
studies on this species abound in the literature [1,2]; in contrast, information about its basic genetics is very limited. To
contribute to its genetic knowledge, we undertook the sequencing of the chloroplast genome and analyzed its structure and
gene content.
Materials and methods. First, intact chloroplasts were isolated from fresh material using the Chloroplast Isolation
Kit (Sigma). Plastid DNA was extracted by adapting published protocols [3,4]. The samples were sequenced on a Roche 454
GS FLX+ at INDEAR (Rosario, Santa Fe), where a preliminary contig assembly was attempted. We used
bioinformatic tools to verify and assemble a definitive plastome. A consensus sequence was obtained with Sequencher
v4.1.4 (GeneCodes Corporation), and the annotation was carried out with CpGAVAS [5]. Specific PCR primers were
designed with Primer3Plus [6] and Primer-BLAST [7] to check the junctions between the large (LSC) and small (SSC)
single-copy segments and the two inverted repeats (IRs). The reading frames were adjusted using sequences of
Ilex cornuta as references, with the NCBI-BLAST [8] algorithms and the MSWAT [9] web server. The number and
location of repeats were assessed using REPuter [10]. Plastid microsatellite loci and the corresponding primer pairs
were detected using the WebSat [11] server.
Results. Sequencing generated 492,515 bp (in 56 contigs from 4 individuals). A consensus
sequence of 157.6 kb was obtained for the complete plastome. It shows the typical quadripartite structure, with an
LSC of 87,148 bp, two IRs of 26,076 bp each, and an SSC of 18,310 bp. In total, 114 unique genes were detected: 80
coding sequences, 30 tRNAs and 4 rRNAs (Table 1). Forty-nine repeats were identified, 27 palindromic and 22
forward. Thirty-five potential mononucleotide and one dinucleotide microsatellite loci were detected; their utility as
markers remains to be evaluated.
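The reported region lengths can be cross-checked with a few lines of arithmetic. The sketch below sums the four regions of the quadripartite structure and re-adds the gene and repeat counts quoted in the Results; the variable names are illustrative and do not come from the abstract.

```python
# Region lengths in bp, as reported in the Results paragraph.
LSC_BP = 87_148
IR_BP = 26_076   # each of the two inverted repeats
SSC_BP = 18_310

# Quadripartite plastome: LSC + IRa + SSC + IRb.
plastome_bp = LSC_BP + 2 * IR_BP + SSC_BP

# Gene and repeat tallies quoted in the text.
unique_genes = 80 + 30 + 4   # coding sequences + tRNAs + rRNAs
repeats = 27 + 22            # palindromic + forward

print(f"plastome length: {plastome_bp:,} bp ({plastome_bp / 1000:.1f} kb)")
print(f"unique genes: {unique_genes}, repeats: {repeats}")
```

The regions add up to 157,610 bp, which agrees with a complete plastome of about 157.6 kb, and the tallies match the 114 genes and 49 repeats reported.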
Gene cluster                         Identification
Small subunit of ribosome            rps2, rps3, rps4, rps7, rps8, rps11, rps12°, rps14, rps15, rps16°, rps18, rps19
Large subunit of ribosome            rpl2°, rpl14, rpl16°, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
RNA polymerase subunits              rpoA, rpoB, rpoC1°, rpoC2
NADH dehydrogenase                   ndhA°, ndhB°, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Photosystem I                        psaA, psaB, psaC, psaI, psaJ, ycf3*
Photosystem II                       psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM,
                                     psbN, psbT, psbZ
Cytochrome b/f complex               petA, petB°, petD°, petG, petL, petN
ATP synthase                         atpA, atpB, atpE, atpF°, atpH, atpI
Large subunit of RUBISCO             rbcL
Translational initiation factor      infA
Maturase                             matK
Protease                             clpP*
Envelope membrane protein            cemA
Subunit of acetyl-CoA carboxylase    accD
c-type cytochrome synthesis gene     ccsA
Conserved ORF of unknown function    ycf1, ycf2, ycf4, ycf15
Transfer RNA genes                   trnA-UGC°, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC°,
                                     trnH-GUG, trnI-CAU, trnI-GAU°, trnK-UUU°, trnL-CAA, trnL-UAA°, trnL-UAG,
                                     trnM-CAU, trnfM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU,
                                     trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC°,
                                     trnW-CCA, trnY-GUA
Ribosomal RNA genes                  rrn4.5, rrn5, rrn16, rrn23
Table 1: Genes located in the IR region are shown in bold. °gene with one intron; *gene with two introns. ORF, open reading frame.
Conclusions. The data presented here constitute a novel contribution and a useful information platform that will
support the generation of new "yerba mate" varieties, the improvement of the crop's genetic background, and the
design of original transgenesis experiments. These, in turn, will directly benefit the "yerba mate" industry, one of our
most profitable economic activities.
References
1. Filip R, Ferraro GE, Bandoni AL, Bracesco N, Nunes E, Gugliucci A, Dellacassa E: Mate (Ilex paraguariensis). In: Imperato,
F. (ed) Recent advances in Phytochemistry, 2009. Research Signpost, Kerala, India, pp 113-131.
2. Heck CI, González De Mejia E: Yerba mate tea (Ilex paraguariensis): A comprehensive review on chemistry, health
implications, and technological considerations. J Food Sci 2007, 72:R138-151.
3. Diekmann K, Hodkinson TR, Fricke E, Barth S: An optimized chloroplast DNA extraction protocol for grasses (Poaceae)
proves suitable for whole plastid genome sequencing and SNP detection. PLoS ONE 2008, 3(7): e2813. doi:10.1371/journal.pone
4. Shi C, Hu N, Huang H, Gao J, Zhao Y-J, Gao L-Z: An improved chloroplast DNA extraction procedure for whole plastid
genome sequencing. PLoS ONE 2012, 7(2): e31468. doi:10.1371/journal.pone.0031468.
5. Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, Guan X: CpGAVAS, an integrated web server for the annotation, visualization,
analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics 2012, 13:715.
6. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG: Primer3–new capabilities and
interfaces. Nucleic Acids Res 2012, 40(15):e115.
7. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL: Primer-BLAST: A tool to design target-specific
primers for polymerase chain reaction. BMC Bioinformatics 2012, 13:134.
8. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403-410.
9. Cai Z: Comparative Analyses of Land Plant Plastid Genomes. PhD dissertation, The University of Texas at Austin, 2010.
10. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of
repeat analysis on a genomic scale. Nucleic Acids Res 2001, 29(22):4633-4642.
11. Martins WS, Soares Lucas DC, de Souza Neves KF, Bertioli DJ: WebSat - A web software for microsatellite marker
development. Bioinformation 2009, 3(6):282-283.
Genome Annotation and Organization
Poster Session – Submission 16
Complete genome sequencing of the thermophilic bacterium Thermus sp. 2.9
using an Illumina/pyrosequencing hybrid approach
Laura Navas1,3, Maximiliano Ortiz1, Graciela Benintende1, Marcelo Berretta1,3, Rubén Zandomeni1,3, Ariel
Amadío2,3.
1-Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de Tecnología Agropecuaria (INTA), Las Cabañas y de
Los Reseros, Buenos Aires, Argentina.
2-EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227, Rafaela, Santa Fe
3-CONICET
In this work we studied and compared different approaches to sequencing the genome of a thermophilic
bacterium. We isolated the thermophilic Thermus sp. 2.9 from a hot spring in Rosario de la Frontera, Salta,
Argentina. Thermophilic organisms contain genes with potential biotechnological applications. There is also
interest in studying the mechanisms involved in bacterial adaptation to their extreme natural environment. We used
the Roche 454 and Illumina MiSeq platforms to generate unpaired and paired-end reads, respectively. The paired-end
library was built using long jumping distance technology with a length of 8 kb. The following table summarizes the
results of sequencing and assembly:
Platform                      # reads     Assembler   # contigs   N50
Roche 454                     215,557     Newbler     137         39,906
Illumina MiSeq                2,139,062   MIRA        323         17,661
Roche 454 + Illumina MiSeq    2,354,619   MIRA        131         79,216
Hybrid assembly using MIRA gave the best result. Scaffolding was performed with BAMBUS using the contigs from
the hybrid assembly. Different redundancy thresholds, that is, the minimum number of paired reads required to accept a
link between contigs, were evaluated. The best result was obtained with a minimum of 200 linked reads. In this way, seven scaffolds covering
the entire bacterial chromosome were obtained. Using a previously generated optical map of the genome,
we were able to order and join the scaffolds, reducing the whole chromosome
to a single scaffold. Another three major scaffolds longer than 50 kb were found to be homologous to plasmids reported
for the genus, suggesting the presence of one or more plasmids in this strain. Genome annotation was performed using the
RAST server. We identified a total of 2,673 CDS, 48 tRNA and 3 rRNA gene-encoding regions. We analyzed these
annotated features and found that 1,705 CDS can be associated with enzymes of defined function; the corresponding
EC numbers were assigned to those genes, while 968 CDS were classified as hypothetical proteins. Fifty-nine genes
were selected as candidates for cloning and expression; the encoded proteins have applications in the food industry
and bioenergy and are of high interest because of their potential thermostability.
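The assemblies in the table above are compared by their N50. As a reminder of what that statistic measures, here is a minimal sketch of the standard definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function name and the example contig lengths are hypothetical; they are not the Thermus sp. 2.9 contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, for illustration only.
example_contigs = [100, 80, 60, 40, 20]
print(n50(example_contigs))
```

A higher N50 with fewer contigs, as in the hybrid row of the table, indicates a more contiguous assembly, although N50 alone says nothing about correctness.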
Genome Annotation and Organization
Poster Session – Submission 17
Sequencing and assembly of Bacillus thuringiensis INTA Fr7-4 genome
Laura Navas1,3, Maximiliano Ortiz1, Diego Sauka1,3, Graciela Benintende1, Marcelo Berretta1,3, Rubén
Zandomeni1,3, Ariel Amadío2,3.
1-Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de Tecnología Agropecuaria (INTA), Las Cabañas y
de Los Reseros, Buenos Aires, Argentina. 2-EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227,
Rafaela, Santa Fe. 3-CONICET
In recent years, there has been growing interest in sequencing isolates of the Gram-positive bacterium Bacillus
thuringiensis (B. thuringiensis) to discover new insecticidal proteins useful for biocontrol of agricultural pests and
mosquitoes. B. thuringiensis INTA Fr7-4 is a native strain isolated from a soil sample in the province of Misiones. We
have previously reported the complete sequence of three plasmids of this strain and characterized three insecticidal
genes of the crystal (cry) family of proteins. This work reports the sequencing of genomic DNA from B.
thuringiensis INTA Fr7-4 and the assembly of the reads to obtain the sequence of the chromosome. Genomic
DNA from B. thuringiensis INTA Fr7-4 was sequenced using an 8 kb long jumping distance library and a 2 × 150 bp run
on an Illumina MiSeq instrument. After quality clipping, 2,442,414 read pairs (a total
of 4,884,828 reads) were obtained, with an average length of 129 bp, along with 4,962,965 singleton reads averaging 124 nt
in length. A de novo assembly was done using Velvet. The longest scaffold was 3.9 Mb long, and a total of 10 scaffolds
longer than 10 kb were obtained. Scaffolds were compared with the GenBank database using blastn and showed high
identity with the chromosome of Bacillus bombysepticus str. Wang, a closely related species. Using
it as a reference genome, we were able to build a map of the B. thuringiensis INTA Fr7-4 chromosome consisting of 5
scaffolds, with a total size of 5.2 Mb.
The chromosome scaffolds were annotated using the RAST server. We identified 5,300 CDS and 105
tRNA and rRNA gene-encoding regions. Among the CDS, 187 related to the bacterial sporulation process and 103 related to
chromosomal DNA replication attracted our attention.
We did not find any insecticidal gene in the chromosome of B. thuringiensis INTA Fr7-4. The scaffolds not located
within the chromosome belong to plasmid DNA. All previously sequenced plasmids were reconstructed, and a new
plasmid of 259 kb was identified. This plasmid contains the previously detected insecticidal genes.
Evolution, phylogenetics and comparative genomics
Poster Session – Submission 4
Archaeal core promoter region information content and its relation with optimal
growth temperature.
Ariel Aptekmann, Alejandro Nadra
IQUIBICEN, Argentina
Abstract. We studied the relation between optimal growth temperature (OGT) and information content (IC) in the
core promoter region of all the archaeal genomes published to date, by calculating the information content of the motif
that represents the TATA binding site (TBS). We tested several approaches to predict transcription
start sites (TSS) in a given genome and then ran motif prediction software on the regions flanking each TSS. We also
constructed a database, compiled from published sources, that contains the characteristic
growth conditions for each strain. Our working hypothesis is that the protein-DNA interface in thermophiles should differ
from that of mesophiles; in particular, we propose and test a positive correlation between the information content of
binding sites and OGT in archaea. We show that the information content increases with increasing optimal growth
temperature, and that this effect cannot be explained solely by an increased GC composition. Selective pressure towards
binding sites with higher binding affinity for the protein could be the reason for this correlation. The established
relation R_seq = R_freq from molecular information theory does not take into account the effect of temperature as a selective
pressure that skews the possible binding sites, creating another cause for an increase in R_seq that does not
apply to R_freq. Since entropy effects increase with temperature, Shannon entropy effects might as well.
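The two quantities compared in the abstract can be sketched as follows. This assumes the usual molecular information theory definitions: R_seq is the sum over motif positions of the drop in Shannon entropy relative to a uniform four-letter background, and R_freq = log2(G / gamma) for gamma sites in a genome of G positions. The function names, the uniform background, the omission of a small-sample correction and the toy sites are all simplifying assumptions, not the authors' pipeline.

```python
import math
from collections import Counter


def r_sequence(sites, alphabet="ACGT"):
    """Information content (bits) of aligned binding sites.

    Assumes a uniform background over `alphabet` and applies no
    small-sample correction; both are simplifications.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    max_entropy = math.log2(len(alphabet))
    total_bits = 0.0
    for position in range(width):
        counts = Counter(site[position] for site in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total_bits += max_entropy - entropy
    return total_bits


def r_frequency(genome_positions, n_sites):
    """Bits needed to locate n_sites among genome_positions candidates."""
    return math.log2(genome_positions / n_sites)


# Toy examples with hypothetical sites.
conserved = r_sequence(["TATA", "TATA", "TATA", "TATA"])
variable = r_sequence(["AT", "AA"])
print(conserved, variable, r_frequency(1024, 1))
```

Because the uniform background ignores base composition, a study across genomes with very different GC content, as here, would need a composition-aware background before comparing R_seq values.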
Evolution, phylogenetics and comparative genomics
Poster Session – Submission 15
Exploring the genetic bases of mammalian unique hearing capacities: an
evolutionary approach
Francisco Pisciottano, Belén Elgoyhen, Lucía Franchini
Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Buenos Aires, Argentina
Mammals possess unique hearing capacities among animals. These capacities are the consequence of an evolutionary
process that involved a number of important changes in the inner ear. Among these changes we can highlight the
elongation of the papilla that gave rise to the characteristic coiled mammalian cochlea, the particular and stable distribution
of hair cells all along the organ of Corti and the origin of a unique cell type, the outer hair cell (OHC). This new kind
of cell endowed mammals with a novel mechanical sound amplification mechanism known as somatic electromotility,
an active cochlear amplifier process crucial for auditory sensitivity and frequency selectivity. Although these features
are well studied and most of them are regarded as evolutionary novelties, the product of an adaptive process in the
mammalian lineage, little is known about the genetic bases underlying their evolution. Only a few inner
ear proteins have previously been the subject of selection analysis [1,2].
Our main objective is to study the evolutionary processes that shaped the genes involved in the evolution of the
particular functional capacities of the mammalian inner ear. To do so, we are assembling an inner ear database that
comprises genes from different sources. To build this database we aim to pool the information
generated by seventeen expression libraries that together contain 86,744 expressed sequence tags (ESTs). For the evolutionary
analysis we perform branch-site tests of positive selection [3], which allow us to recognize the genes that fit the
model of adaptive evolution and the specific sites in the alignment that have evolved under positive selection in the
lineage that gave rise to mammals.
Among the seventeen publicly available expression libraries, the RIKEN adult mouse inner ear library [4] is the main rodent
library, containing 22,576 ESTs that would represent more than 4,500 genes, and is one of the most trustworthy of
the inner ear libraries according to our tests. A preliminary test on the first 100 ESTs of this
library yielded 84 genes. Although only 34 of them could be analyzed with the available information, 11
showed signs of positive selection (P>0.95), indicating that a substantial number of inner ear genes
may show adaptive evolution along the mammalian branch. We present here the pipeline developed to analyze the
information gathered from the expression libraries and the results obtained from the complete analysis of the RIKEN
library.
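Branch-site tests such as the one cited in [3] are commonly evaluated as a likelihood ratio test between a null model and an alternative model that allows positive selection on the chosen branch. The sketch below shows only that final comparison, assuming the statistic is referred to a chi-square distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical, and the abstract does not describe how the authors' pipeline performs this step.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p_value) for a one-degree-of-freedom LRT.

    statistic = 2 * (lnL_alt - lnL_null). For a chi-square distribution
    with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene alignment.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-998.0793)
print(f"2*delta(lnL) = {stat:.4f}, p = {p:.4f}")
```

When many genes are tested, as planned for the full RIKEN library, the resulting p-values would normally also need a multiple-testing correction.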
References
1. Franchini LF, Elgoyhen AB: Adaptive evolution in mammalian proteins involved in cochlear outer hair cell electromotility.
Mol Phylogenet Evol 2006, 41:622-635.
2. Kirwan JD, Bekaert M, Commins JM, Davies KTJ, Rossiter SJ, Teeling EC: A phylomedicine approach to understanding
the evolution of auditory sensory perception and disease in mammals. Evol Appl 2013, 6:412-422.
3. Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at
the molecular level. Mol Biol Evol 2005, 22:2472-2479.
4. Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H et al.: Analysis
of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002, 420:563-573.
Evolution, phylogenetics and comparative genomics
Poster Session – Submission 43
Population genetic structure of the ancestor of the Lager-brewing yeast in
Patagonia (Saccharomyces eubayanus)
Juan Ignacio Eizaguirre1, David Peris2, Patricio De Los Ríos3, Christian Lopes4, María Eugenia Rodríguez5,
Chris Hittinger6, Diego Libkind7
1 Lab. Microbiología Aplicada y Biotecnología, Instituto de Investigación en Biodiversidad y Medioambiente (INIBIOMA), CONICET – UNComahue, Bariloche
2 Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison
3 Lab. Ecología Aplicada y Biodiversidad, Univ. Católica de Chile, Temuco
4 Grupo de Biodiversidad y Biotecnología de Levaduras, Inst. de investigación y desarrollo en Ing. de procesos, Biotecnología y Energías alternativas (PROBIEN), CONICET-UNComahue, Neuquén
5 Grupo de Biodiversidad y Biotecnología de Levaduras, Inst. de investigación y desarrollo en Ing. de procesos, Biotecnología y Energías alternativas (PROBIEN), CONICET-UNComahue, Neuquén
6 Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison
7 Lab. Microbiología Aplicada y Biotecnología, Instituto de Investigación en Biodiversidad y Medioambiente (INIBIOMA), CONICET – UNComahue, Bariloche
The discovery and description in Patagonia of a new yeast species, Saccharomyces eubayanus, a parent of the
interspecific hybrid S. pastorianus (used worldwide in the production of lager beer), opened a very fertile field
for research, development and innovation. This work aims to contribute to knowledge of the biogeography of
S. eubayanus along Andean Patagonia and of the genetic structure of its populations.
To this end, more than 200 isolates of S. eubayanus were obtained from various substrates (soil, bark, leaves and
Cyttaria spp.) associated with various tree species of the endemic genus Nothofagus in Argentina and Chile (between
latitudes 37° S and 54° S). The isolates were identified by PCR fingerprinting and then, along with lager-brewing
strains, were characterized by sequencing and analysis of COX2 (mitochondrial, 530 bp) and DCR1 (nuclear, 859 bp).
Both genes proved useful for detecting intraspecific variability in the studied species. A database of each
isolate's source coordinates, altitude, type of substrate, host tree and other ecological parameters such as precipitation,
radiation and mean temperature was generated. The geographical distance between isolates was calculated and a principal
component analysis of ecological traits was performed.
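The abstract does not say how geographical distance was computed from the source coordinates. One common choice is the great-circle distance given by the haversine formula, sketched below under the assumption of a spherical Earth with a mean radius of 6371 km. The function name and the two coordinate pairs are hypothetical and only span the latitude range mentioned in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two hypothetical sampling sites on the same meridian (south latitudes are negative).
north_site = (-37.0, -71.0)
south_site = (-54.0, -71.0)
distance = haversine_km(*north_site, *south_site)
print(f"{distance:.1f} km")
```

A matrix of such pairwise distances can then be compared with pairwise genetic distances, for example to test for isolation by distance.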
The DCR1 results discriminated two main populations with a genetic divergence of ∼1%. Population "A"
(21 haplotypes) comprised 29% of the strains tested and was located exclusively in northern Patagonia, while population "B"
(47 haplotypes) comprised the remaining strains, which were distributed throughout Patagonia.
This is consistent with results obtained with fewer strains using SNP markers (∼10 kb) at the genomic level. With
the COX2 gene the two populations were not evident, but we found at least 55 haplotypes. Moreover, the COX2
gene showed that 7 strains appeared to be recombinants between S. eubayanus and a second species, Saccharomyces
uvarum, which is sympatric with and closely related to S. eubayanus. Phylogenetic network analyses were performed, which
allowed a better understanding of the reticulate evolution of S. eubayanus populations. The results
showed the existence in Patagonia of two markedly different populations of S. eubayanus that at the same time
exhibit high intra-population genetic heterogeneity. The more abundant and widely distributed population (B) was the
closest (genetically) to lager strains, although none of the isolates showed 100% similarity. In this work, hypotheses
about environmental and geological factors influencing the population structure of this species are addressed. This
yeast species has biotechnological importance for the production of beers with Patagonian regional identity.
Evolution, phylogenetics and comparative genomics
Poster Session – Submission 47
Genome sequencing and comparative analysis of two new Enterococcus faecium
strains
Gabriel Gallina Nizzo1,2, Luis Esteban2, Christian Magni1
1 Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad
Nacional de Rosario, Santa Fe, Argentina.
2 Facultad de Ciencias Médicas, Universidad Nacional de Rosario, Santa Fe, Argentina
Background. The enterococci are an ancient genus of microbes that are highly adapted to living in complex environments and surviving harsh conditions. Enterococcus faecium is a leading cause of multidrug-resistant hospital-acquired
infections. Moreover, enterococci serve as reservoirs of antibiotic resistance determinants that they
spread to other important pathogens. The relevance of studies on these bacteria rests on their dual role as commensals
and opportunistic pathogens [1]. Recently, we isolated from cheese two new variants of E. faecium, named E. faecium
IQ110 and E. faecium GM75. To characterize them, we used Illumina sequencing to
determine the genome sequence of both isolates. The short reads were assembled de novo using the SeqMan NGen
sequence assembly software. An all-versus-all BLASTN search was performed on the resulting contigs, and contigs
shorter than 1,000 bp with a similarity higher than 99% to sequences already contained in a longer contig
were deleted. The final assembly resulted in 43 and 152 contigs for E. faecium IQ110 and E. faecium GM75, respectively. The remaining contigs were ordered and oriented with Advanced PipMaker [2] using E. NRRL as the reference
genome. Genome annotation was performed with RAST [3] and BASys [4]. Manual curation of genes was performed
with Artemis. To assess the presence of genomic islands (GEIs), plasmids, viruses, virulence genes, acquired antimicrobial resistance genes and insertion sequences, and to predict pathogenicity, we employed IslandViewer, PlasmidFinder,
PHAST, VirulenceFinder, ResFinder, ISfinder and PathogenFinder, respectively. A distance matrix was obtained with
Gegenees [5] using the Enterococcus faecium genomes representative of the genome homology groups published so far.
Then, a phylogenetic network was constructed using SplitsTree4 [6] in order to place the new strains in
their respective clades. Functional comparison was based on RAST assignments and visualized with the Mauve genome
alignment software [7]. Two clusters of PTS-related genes are located in GEIs in both bacteria. No virulence factors were
found in IQ110. Despite the negative prediction of PathogenFinder (which predicts pathogenic potential), GM75 has
two virulence factors: the efaAfm adhesin and acm. Three prophage clusters were also found in the GEIs. Another
important point from the medical point of view is resistance to antibiotics. Both bacteria contain resistance genes
for aminoglycosides and macrolides. IS elements and transposases are major mobile genetic elements in E. faecium.
The two strains share 52 IS; 16 are unique to E. faecium GM75 and 6 to strain IQ110. The phylogenetic network places
E. faecium GM75 within clade B and E. faecium IQ110 in clade A [8]. These findings should advance our
understanding of the adaptation of this bacterium to different hosts and the evolutionary mechanisms involved.
Results. The main differences in the categories of COG observed between them are in ’Carbohydrate metabolism
and transport’ and ’Replication and repair’. In the first category highlights the presence of genes related to the
metabolism of citrate for E. faecium strain GM75 and the absence of these in the strain E. faecium IQ110 while
genes for utilization xylose, D-sorbitol, L-sorbose and trehalose is in the latter and are absent in E. faecium GM75
which could indicate a shift in sugar metabolism due to a niche adaptation. Despite this we found larger number of
genes for the metabolism of sucrose and fructose in E. faecium GM75 some of them adquired by HGT. E. faecium
GM75 possesses more GEIs than strain IQ110; 16 GEIs vs 7.
Two clusters of PTS related genes are in GEIs in both bacterias. No virulence factors were found in the IQ110.
Despite the negative prediction of PathogenFinder (predicts pathogenic potential), GM75 has two virulence factors:
the efa Afm adhesin and acm. Also in the GEIs three prophage clusters were found.
Other important point for the medical point of view is the resistance to ATB. Both bacterias contain the Resistence
50
Genomics, functional genomics
and metagenomics
ID:20
Poster Session
genes for Aminoglycoside and Macrolide. IS elements and transposases are major mobile genetics elements In E.
faecium. They share 52 IS, 16 are unique of E. faecium GM75 and 6 in the strain IQ110.
The phylogenetic network sets E. faecium GM75 within the clade B and E. faecium IQ110 in clade A [8]. These
findings should advance our understanding of the adaptation of th
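The redundancy filter applied after assembly (discarding contigs shorter than 1,000 bp that are more than 99% similar to a longer contig) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the input layout (contig lengths plus query, subject and percent-identity triples parsed from an all-versus-all BLASTN report) and the example values are assumptions made for illustration, and a real filter would also check how much of the short contig the alignment covers.

```python
def filter_redundant_contigs(lengths, hits, max_len=1000, min_identity=99.0):
    """Return the ids of the contigs to keep.

    lengths: dict mapping contig id -> length in bp
    hits:    iterable of (query id, subject id, percent identity) tuples,
             e.g. parsed from an all-versus-all BLASTN tabular report
    """
    redundant = set()
    for query, subject, identity in hits:
        if query == subject:
            continue  # ignore self-hits
        is_short = lengths[query] < max_len
        subject_is_longer = lengths[subject] > lengths[query]
        if is_short and subject_is_longer and identity > min_identity:
            redundant.add(query)
    return [contig for contig in lengths if contig not in redundant]


# Hypothetical contigs and hits, for illustration only
lengths = {"c1": 25000, "c2": 800, "c3": 950, "c4": 600}
hits = [
    ("c2", "c1", 99.6),   # short and near-identical to a longer contig: dropped
    ("c3", "c1", 97.0),   # short but below the identity threshold: kept
    ("c4", "c2", 99.9),   # short and near-identical to a longer contig: dropped
    ("c1", "c1", 100.0),  # self-hit: ignored
]
kept = filter_redundant_contigs(lengths, hits)
print(kept)
```

The thresholds are exposed as parameters so that the 1,000 bp and 99% cut-offs quoted in the abstract can be varied.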
References
1. Gilmore MS, Clewell DB, Ike Y, Shankar N, editors. Enterococci: From Commensals to Leading Causes of Drug Resistant
Infection [Internet]. Boston: Massachusetts Eye and Ear Infirmary; 2014-. PubMed PMID: 24649511.
2. Elnitski L, Riemer C, Schwartz S, Hardison R, Miller W: PipMaker: a World Wide Web server for genomic sequence
alignments. Curr Protoc Bioinformatics 2003, Chapter 10:Unit 10.2.
3. Overbeek R, et al: The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).
Nucleic Acids Res 2014, 42:D206-14.
4. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS:
BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 2005, 33:W455-9.
5. Agren J, Sundström A, Håfström T, Segerman B: Gegenees: fragmented alignment of multiple genomes for determining
phylogenomic distances and genetic signatures unique for specified target groups. PLoS One 2012, 7:e39107.
6. Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006, 23:254-267.
7. Darling AE, Mau B, Perna NT: progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement.
PLoS ONE 2010, 5:e11147.
8. Palmer KL, et al: Comparative genomics of enterococci: variation in Enterococcus faecalis, clade structure in E. faecium,
and defining characteristics of E. gallinarum and E. casseliflavus. mBio 2012, 3:e00318-11.
Genomics, functional genomics and metagenomics
Poster Session – Submission 20
pSVMTOCP: a parallel SVM tree algorithm for optimal multi-class partition
Nicolás Ferreyra1 , Cristóbal Fresno2,3 , María Laura Zingaretti1 , Laura Prato1 , Diego Arab Cohen2 , Elmer
Fernández2,3
1 Instituto de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa María, Villa María, Córdoba, Argentina.
2 Universidad Católica de Córdoba, Biosciences Data Mining Group, Córdoba, Córdoba, Argentina.
3 CONICET, Argentina.
Background. In bioinformatics and many other fields, supervised classification problems are common, particularly when a diagnostic method based on molecular signatures is required to classify several disease levels. This is usually a hard problem because of both the number of samples required and the overlapping characteristics of the variables/genes describing the disease. One of the main tools used for multi-class classification problems is the Support Vector Machine (SVM) [2] under the well-known One vs One (OVO) and One vs All (OVA) strategies [3]. Tree SVM structures have also been proposed, but they require a prior clustering or segmentation procedure that could introduce inaccuracies. In addition, all of these strategies are time-consuming when parameter optimization is required. Here we propose a fast and accurate tree-structured classification strategy based on SVM [4], enhanced by means of the SVMpath algorithm [1]. The new model produces solutions with higher accuracy and reaches more hard-margin solutions than the OVO strategy. More hard-margin solutions imply better generalization capabilities of the classifier as a diagnostic tool.
Materials and methods. Parallel SVM Tree Optimal Class Partition (pSVMTOCP) is a data-mining tool that creates binary, balanced trees (see Figure 1). Each tree is composed of nodes, each of which has two associated downstream elements. Each of these elements can be another node or a leaf (class). At node $i$, the data set is split, based on its class labels, into $l_i = \eta \, K_i! / (r!\,(K_i - r)!)$ binary problems, where $K_i$ is the number of classes at node $i$, $r = \lfloor K_i/2 \rfloor$, and $\eta$ is 1 for odd $K_i$ and 0.5 otherwise. Each of these problems is solved by an SVM model whose kernel and/or cost parameters are optimized for that binary problem. Since this is a time-consuming task, we apply the SVMpath algorithm, which spans all the parameter values in almost the same time needed to train a single SVM. The best-performing partition is then chosen, the classes on each side of the partition are passed to the downstream nodes, and the process is repeated. To speed up the process, the proposed algorithm can be executed in parallel, using separate threads for node training and for SVM training inside each node. The proposed method was tested on the following datasets: Iris, Glass, Breast Tissue, 9Tumors and NCI60. The performance of pSVMTOCP was compared against the OVO strategy. Each dataset was divided into a training set (80% of observations) and a test set (20% of observations).
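The number of balanced binary problems evaluated at a node follows directly from the binomial coefficient in the formula above. The Python sketch below counts and enumerates those class partitions; it illustrates the combinatorics only, not the pSVMTOCP implementation, and the function names are hypothetical. For an even number of classes each split would otherwise be generated twice (once per side), which is what the factor of 0.5 accounts for.

```python
from itertools import combinations
from math import comb


def n_partitions(k):
    """Number of balanced binary partitions of k classes (the l_i above)."""
    r = k // 2
    # For even k every split appears twice among the C(k, r) subsets
    return comb(k, r) if k % 2 else comb(k, r) // 2


def balanced_partitions(classes):
    """Enumerate the (left, right) class partitions with len(left) == k // 2."""
    classes = sorted(classes)
    k = len(classes)
    r = k // 2
    partitions = []
    for left in combinations(classes, r):
        if k % 2 == 0 and classes[0] not in left:
            continue  # mirror image of a partition that is already listed
        right = tuple(c for c in classes if c not in left)
        partitions.append((left, right))
    return partitions


for k in (3, 6, 8):
    print(k, n_partitions(k), len(balanced_partitions(range(k))))
```

With 8 classes, as in the 9Tumors and NCI60 rows of Table 1, the root node has 35 candidate partitions, each of which requires training one binary SVM.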
Results. Dataset characteristics and performance for the OVO and pSVMTOCP methods are presented in Table 1. pSVMTOCP outperforms OVO on all datasets, achieving lower error, a higher proportion of hard-margin solutions and shorter training time, as well as fewer support vectors in every case except Glass (130 vs 125). Figure 1 shows the pSVMTOCP tree obtained for the NCI60 dataset.
Figure 1: pSVMTOCP associated with NCI60 dataset.
                           OVO                                        pSVMTOCP
Dataset     N    Vars  K   C      NSV  HM sol.   %Error  Time (s)   CMin-CMax     NSV  HM sol.  %Error  Time (s)
Iris        150  5     3   1      25   1 of 3    3.33    8.2        0.65-428.2    9    1 of 2   0       0.83
Glass       213  10    6   21     125  4 of 15   25.58   577.79     0.34-38.32    130  2 of 5   23.25   2.9
B. Tissue   106  10    6   156    55   4 of 15   27.27   28.47      0.74-8428.2   32   2 of 5   18.18   2.2
9Tumors     58   71    8   0.021  42   15 of 28  53.84   91.63      0.006-0.068   34   7 of 7   30.76   3.1
NCI60       61   264   8   0.006  47   6 of 28   16.66   275.68     0.002-0.012   39   7 of 7   8.33    4.62
Table 1: Performance of the SVM multi-class strategies (N = rows; Vars = variables; K = classes; C = cost; NSV = number of support vectors; HM sol. = hard-margin solutions; %Error = percentage of classification error on the test set; Time = training time in seconds).
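As a reading aid for the HM sol. columns, the short sketch below computes how many binary classifiers each strategy trains for K classes. It rests on an assumption that the abstract does not state explicitly: OVO trains one classifier per pair of classes, K(K-1)/2, and a binary tree with K leaves has K-1 internal nodes. The function names are hypothetical.

```python
def ovo_classifiers(k):
    """One binary SVM per pair of classes."""
    return k * (k - 1) // 2


def tree_nodes(k):
    """Internal nodes of a binary tree with k leaves (one SVM per node)."""
    return k - 1


datasets = [("Iris", 3), ("Glass", 6), ("B. Tissue", 6), ("9Tumors", 8), ("NCI60", 8)]
for name, k in datasets:
    print(f"{name}: OVO={ovo_classifiers(k)}, tree={tree_nodes(k)}")
```

These counts match the denominators reported in Table 1 (3, 15 and 28 for OVO; 2, 5 and 7 for pSVMTOCP).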
Conclusions. pSVMTOCP is a robust choice for supervised multi-class classification problems. The strategy divides a single problem into several sub-problems, which allows specific parameters to be set for each node. This is an advantage because each sub-problem can be treated independently of the others. In addition, pSVMTOCP makes accurate class predictions for new data and has very short execution times.
References
1. Hastie, Trevor. The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research
5 (2004).
2. Abe, Shigeo. Support Vector Machines for Pattern Classification. Springer (2005).
3. Rocha, Anderson; Goldenstein, Siome. (2013). Multiclass from Binary: Expanding One-vs-All, One-vs-One and ECOC-based Approaches. Retrieved from: http://www.ic.unicamp.br/~siome/papers/Rocha-TNNLS-2013.pdf
4. Diego Arab Cohen, Elmer Andrés Fernández (2012). SVMTOCP: A binary tree base SVM approach through optimal
multi-class binarization. Retrieved from: http://link.springer.com/chapter/10.1007%2F978-3-642-33275-3_58
Genomics, functional genomics and metagenomics
Poster Session – Submission 41
Multi-way for analysis and visualization of OMIC data: maVOD
María Laura Zingaretti1 , Johanna Demey-Zambrano2 , Jose Luis Vicente-Villardón3 , Julio Alejandro Di Rienzo4 ,
Jhonny Rafael Demey5
1 Instituto de Ciencias Básicas y Aplicadas- Instituto de Ciencias Humanas, Universidad Nacional de Villa María. Villa María, Córdoba, Argentina.
2 School of Medicine and Biomedical Sciences, University at Buffalo. Buffalo, NY, USA.
3 Departamento de Estadística, Universidad de Salamanca. Salamanca, España.
4 Facultad de Ciencias Agrarias, Universidad Nacional de Córdoba. Córdoba, Argentina.
5 Lab. de Biometría y Estadística, Instituto de Estudios Avanzados. Caracas, Venezuela.
In recent years, the growing availability of public microarray data has become increasingly important. The "omics" technologies allow quantitative measurement of hundreds of complex biological variables and have made it possible to study simultaneously, based on multiple datasets, the expression levels of thousands of genes and the effects of treatments, diseases and developmental stages on gene expression. This has turned out to be a promising approach for analysing and interpreting genome-wide association studies and gene set analyses, which are useful for understanding biological processes. However, current statistical methodologies for gene set analysis based on multiple datasets are still at an early stage of development; they are mostly based on classical statistical methods, since the joint analysis of the subspaces generated by multiple datasets is not simple. The problem centres on finding a statistical approach that allows gene expression to be related to different experimental conditions or independent groups that have not been observed and measured with the same accuracy, precision and levels of replication in the experimental design. K-tables analysis was developed to handle these problems and to calculate a consensus from data matrices generated on different scales, dimensions or spaces. This methodology is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations (Abdi et al., 2012). Multi-way for analysis and visualization of OMIC data (maVOD; Demey and Zingaretti, 2014) is an R package written for this purpose that introduces the following improvements to the method: gene-sample variability, multiple comparisons between studies (DGC test) (Di Rienzo, 2002), selection of genes with the QR criterion (Demey et al., 2008) and network representation of biological processes using the average projections of the compromise matrix. We illustrate the proposed approach using multiple microarray gene expression datasets obtained from the Tomato Functional Genomics Database, comprising eight studies associated with several diseases that affect the tomato crop and covering different times post infection, plant ages, plant tissues, types of experiment and array platforms.
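To make the notion of a consensus (compromise) matrix concrete, the following Python sketch computes a STATIS-style compromise from several tables that share the same observations, in the spirit of Abdi et al. (2012). It is a simplified illustration and not the maVOD implementation (which is an R package): the function name, the Frobenius normalisation of each cross-product matrix and the random example tables are all assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def statis_compromise(tables):
    """Return (weights, compromise) for tables sharing the same rows."""
    cross = []
    for table in tables:
        x = np.asarray(table, dtype=float)
        xc = x - x.mean(axis=0)              # centre each variable
        s = xc @ xc.T                        # observations x observations
        cross.append(s / np.linalg.norm(s))  # Frobenius normalisation
    k = len(cross)
    # RV coefficients: inner products between the normalised matrices
    rv = np.array([[np.sum(cross[i] * cross[j]) for j in range(k)]
                   for i in range(k)])
    eigvals, eigvecs = np.linalg.eigh(rv)    # eigenvalues in ascending order
    first = np.abs(eigvecs[:, -1])           # leading eigenvector, sign fixed
    weights = first / first.sum()
    compromise = sum(w * s for w, s in zip(weights, cross))
    return weights, compromise


# Three hypothetical studies measured on the same 6 observations
rng = np.random.default_rng(0)
tables = [rng.normal(size=(6, p)) for p in (4, 5, 3)]
weights, compromise = statis_compromise(tables)
print(weights.round(3))
print(compromise.shape)
```

Tables that agree with the majority receive larger weights, so the compromise is a weighted average dominated by the structure common to the studies; a PCA of the compromise then gives the consensus space onto which each study can be projected.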
References
• Abdi, H., Williams, L.J., Valentin, D., & Bennani-Dosse, M. (2012). STATIS and DISTATIS: optimum multitable
principal component analysis and three way metric multidimensional scaling. WIREs Comput Stat, 4, 124-167. doi:
10.1002/wics.198.
• Demey et al (2008) Bioinformatics, 24(24):2832-2838
• Demey JR, L Zingaretti (2014). maVOD. R package version 3.1.0
• Di Rienzo et al (2002) JABES,7(2):129-142
• Lavit C, et al. Computational Statistics and Data Analysis, 18:97–119
Genomics, functional genomics and metagenomics
Poster Session – Submission 49
Bioprospecting of lignocellulolytic enzymes in enriched consortia of pine and
eucalyptus forest soils by metagenomic sequencing
Marina D. Reinert1 , Santiago Revale1 , Estefanía Mancini2 , María Belén Carbonetto1 , Martín P. Vazquez1
1 Instituto de Agrobiotecnología Rosario, Rosario, Santa Fe, Argentina
2 Fundación Instituto Leloir, Buenos Aires, Argentina
Background. Second-generation biofuels are produced by fermenting sugars extracted from agronomic residues to ethanol. Lignocellulose breakdown is a crucial step needed to obtain free sugar molecules. The current bottleneck in second-generation biofuel production is the cost of lignocellulolytic enzymes [1, 2]. Our aim is to use metagenomics-based bioprospecting to find novel lignocellulose-degrading proteins and to produce them in a low-cost system based on plants as biofactories.
Methods. We took soil samples from a Pinus elliottii forest and a Eucalyptus grandis forest in Concordia, Entre Ríos, in February 2012. Both soils contained decaying wood. The samples were used as inoculum for minimal medium [3] with only carboxymethyl cellulose (CMC) or sawdust as organic matter. Additionally, we used antibiotics or antifungals to prevent the growth of bacteria or fungi, respectively, in each case. The cultures were grown for 30 days, and an aliquot of each culture was taken every 10 days. Genomic DNA was extracted from each sample. Amplicon sequencing of the V4 region of the 16S rRNA gene was then performed on the 454 GS-FLX+ (Roche) platform in order to evaluate the enrichment of lignocellulose-degrading microorganisms. Whole-genome metagenomic sequencing (454 GS-FLX+) was then performed on the most enriched sample (i.e. the one with a high proportion of taxa described as lignocellulose degraders and a low proportion of commensals). Bioprospecting analysis was then performed with bioinformatics tools. First, we did a de novo assembly using the CAMERA [https://portal.camera.calit2.net/gridsphere/gridsphere] assembler workflow. Then we used the MG-RAST [http://metagenomics.anl.gov/] platform for taxonomic and functional annotation. We extracted coding sequences (CDS) using the FragGeneScan open reading frame (ORF) algorithm. We then ran BLAST against the CAZy database [http://www.cazy.org/] to find lignocellulosic enzyme domains in our CDS dataset. A custom Perl script was used to retain only those glycosyl hydrolase and cellulose-binding domains linked with degrading activities [4]. Finally, we selected only those sequences that were consistent with Pfam [http://pfam.xfam.org/], UniProt [http://www.uniprot.org/] and PRIAM [http://priam.prabi.fr/] annotations, had a proper ORF length and did not show high homology with database enzymes (below 80%).
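The final selection step described above (keep hits in the relevant CAZy families, with a suitable ORF length and less than 80% identity to known enzymes) can be sketched in Python. This is an illustration only and not the authors' Perl script: the record layout, the function name, the family prefixes GH and CBM, the minimum ORF length and the example hits are assumptions, since the abstract does not give these details.

```python
def select_candidates(hits, max_identity=80.0, min_orf_len=300,
                      families=("GH", "CBM")):
    """Return the CDS ids that pass the family, novelty and length filters.

    hits: list of dicts with keys 'cds', 'family' (CAZy family of the best
          hit), 'identity' (percent) and 'orf_len' (bp); hypothetical layout.
    """
    selected = []
    for hit in hits:
        in_family = hit["family"].startswith(families)
        is_novel = hit["identity"] < max_identity
        long_enough = hit["orf_len"] >= min_orf_len
        if in_family and is_novel and long_enough:
            selected.append(hit["cds"])
    return selected


# Hypothetical BLAST-versus-CAZy results, for illustration only
hits = [
    {"cds": "cds_001", "family": "GH5",  "identity": 62.0, "orf_len": 900},
    {"cds": "cds_002", "family": "GH3",  "identity": 95.0, "orf_len": 1200},
    {"cds": "cds_003", "family": "CBM2", "identity": 70.0, "orf_len": 150},
    {"cds": "cds_004", "family": "GT2",  "identity": 50.0, "orf_len": 1200},
    {"cds": "cds_005", "family": "CBM6", "identity": 75.0, "orf_len": 450},
]
candidates = select_candidates(hits)
print(candidates)
```

Keeping the thresholds as parameters makes it easy to explore how the size of the candidate list depends on the novelty cut-off.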
Results. Metagenomic sequencing produced 718,489 reads with an average length of 421 base pairs (bp), totaling 302,172,049 bp. About 10% (30,458,285 bp) of the total base pairs were assembled into contigs. The longest contig was 523,078 bp. We manually selected 39 promising proteins with an average length of 644 bp; Figure 1 and Table 1 summarize their identities and domains.
Figure 1: The pie chart shows the abundance of glycosyl hydrolase and cellulose-binding domains in the selected proteins.
Enzyme                        EC number   #
Acetylxylan esterase          3.1.1.72    4
Alpha-glucuronidase           3.2.1.139   1
Alpha-N-arabinofuranosidase   3.2.1.55    6
Beta-glucosidase              3.2.1.21    12
Endo-1,4-beta-xylanase        3.2.1.8     2
Endoglucanase                 3.2.1.4     2
Xylan 1,4-beta-xylosidase     3.2.1.37    11
Feruloyl esterase             3.1.1.73    1
Table 1: Enzyme activities selected, with their Enzyme Commission (EC) numbers and the abundance (#) of each.
Conclusions. The enrichment process allowed us to obtain bacterial consortia containing lignocellulose-degrading microorganisms, as previously seen by 16S rRNA amplicon sequencing. However, only by metagenomic sequencing were we able to determine the sequence identity of the proteins involved in lignocellulose degradation. Proteins were manually annotated and a subset was selected using bioinformatics tools. This procedure resulted in a list of 39 promising enzymes, which will be tested experimentally in the laboratory as candidates for a degrading cocktail.
Acknowledgments. We would like to thank Lic. Soledad Romero and Lic. Bianca Brun for performing all the sequencing runs used in this study.
References
1. Naik SN, Goud VV, Rout PK, Dalai AK: Production of first and second generation biofuels: A comprehensive review.
Renew Sustain Energy Rev 2010, 14: 578–597.
2. Mtui GYS: Recent advances in pretreatment of lignocellulosic wastes and production of value added products. African
Journal of Biotechnology 2009, 8: 1398–1415.
3. Crawford D, McCoy E: Cellulases of Thermomonospora fusca and Streptomyces thermodiastaticus. Appl Environ Microbiol
1972, 24: 150-152.
4. Allgaier M, Reddy A, Park JI, Ivanova N, D’haeseleer P, Lowry P, Sapra R, Hazen TC, Simmons BA, VanderGheynst
JS et al. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010,
5: 372–380.
Metabolomics and Cheminformatics
Poster Session – Submission 46
Interactive Visual Analysis Methodology for Improving Descriptor Selection in
QSPR: First Steps
María Jimena Martinez1 , Fiorella Cravero2 , Gustavo E. Vazquez3 , Mónica F. Díaz2 , Axel J. Soto4 , Ignacio
Ponzoni1
1 Laboratory for Research and Development in Scientific Computing (LIDeCC), ICIC, DCIC, UNS, Av. Alem 1250, Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química (PLAPIQUI) CONICET-UNS, La Carrindanga km.7, Bahía Blanca, Argentina
3 Facultad de Ingeniería y Tecnologías – Universidad Católica del Uruguay, Montevideo, Uruguay.
4 Faculty of Computer Science, Dalhousie University, Halifax, Canada.
Background. The design of QSAR/QSPR models requires dealing with several problems. One of them is the selection of the most relevant set of molecular descriptors for the property or activity to be modeled. A central question in this task is how to involve the domain expert (e.g. a chemist) so that their knowledge and expertise can be incorporated during the feature selection process [1]. In this context, strategies based on dynamic visual analysis can be useful. The main idea behind visual analytics approaches is to merge the computational capacity of statistical and machine learning methods with the natural human ability to identify patterns in visualizations. By allowing some form of interaction with the visualizations, users can explore the data and provide feedback to the method, and/or use the tool to arrive at better-informed decisions. In this work we report our first experiences in the design of a methodology that combines statistical methods with interactive visualizations in order to address the problem of molecular descriptor selection.
Methods. The interactive visual analytics tool proposed is used for exploring alternative QSAR models, and it is
organized in four charts (Figure 1): two undirected graphs that represent pairwise associations between descriptors,
a bipartite graph, which represents the relationship among models and descriptors, and a customized plot area, which
depicts different relationships between the descriptors and the target property. Some relevant characteristics that can
be highlighted by the visualizations are: redundant descriptors, descriptors that provide discriminative information,
relevant descriptors by consensus among alternative models, and descriptors whose knowledge helps decrease the
uncertainty about the value of the target property. In this way, the modeler can analyze the different aspects involved
in the QSAR/QSPR model design simultaneously.
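One of the relationships such a descriptor graph can encode is redundancy between descriptors. The Python sketch below builds the edge list of an undirected graph that links descriptors whose absolute Pearson correlation exceeds a threshold. It is an illustration under stated assumptions and not the tool described here: the function names, the 0.9 threshold, the descriptor names and their values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundancy_edges(descriptors, threshold=0.9):
    """Edges (name1, name2, r) between strongly correlated descriptors."""
    edges = []
    for (name1, v1), (name2, v2) in combinations(descriptors.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= threshold:
            edges.append((name1, name2, round(r, 3)))
    return edges


# Hypothetical descriptor values for five compounds
descriptors = {
    "MW":     [100.0, 150.0, 200.0, 250.0, 300.0],
    "nAtoms": [10.0, 15.0, 21.0, 25.0, 30.0],
    "logP":   [2.0, 0.5, 3.1, 1.2, 2.4],
}
edges = redundancy_edges(descriptors)
print(edges)
```

In an interactive setting the threshold would typically be tied to a slider, so that the analyst can watch clusters of redundant descriptors appear or dissolve.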
Results and Conclusions. The capabilities of our tool were assessed through two case studies. The first concerns property prediction for volatile organic compounds (VOCs) [2]. Here the tool was used to select one subset of descriptors from a group of four alternative subsets. The second concerns the prediction of elongation at break for high molecular weight polymers [3]. In this scenario, the tool was used to illustrate the case where the analyst wants to modify the automatic selection of descriptors in order to incorporate an experimental parameter into the model. In both cases, the results showed the suitability and convenience of this methodology for selecting sets of descriptors with desirable characteristics (low cardinality, high interpretability, low redundancy and high statistical performance) in an exploratory and versatile way.
Figure 1: a) In both undirected graphs, each node represents a descriptor selected for at least one of the QSAR models. The node color uses a grayscale to indicate the proportion of models in which the descriptor has been selected. The node sizes and edges can be customized to represent different types of relationships among descriptors. Two main modes were defined: entropy-based and correlation-based. b) This chart is a bipartite graph, where the nodes on the left represent the models and the nodes on the right represent the descriptors of these models. The edges indicate the occurrence of a descriptor in a model. c) Double-clicking on a node in the undirected graphs shows a scatter plot of the values of this descriptor versus the corresponding target property values. Additionally, two histograms indicating the frequency of the descriptor and target values can be overlaid on the scatter plot.
Acknowledgments. This work is kindly supported by PGI-UNS (24/N032) and PIP112-2009-0100322 (CONICET National Research Council of Argentina).
References
1. Palomba D, Martínez M J, Ponzoni I, Díaz M F, Vazquez G E, Soto A J: QSAR models for predicting log Pliver on volatile
organic compounds combining statistical methods and domain knowledge. Molecules 2012, 17: 14937-14953.
2. Abraham M H, Ibrahim A, Acree W E Jr: Air to liver partition coefficients for volatile organic compounds and blood to
liver partition coefficients for volatile organic compounds and drugs. Eur J Med Chem 2007, 42: 743-751.
3. Todeschini R, Consonni V, Ballabio D, Mauri A, Cassotti M, Lee S, West A, Cartlidge D: QSPR study of rheological and
mechanical properties of chloroprene rubber accelerators. Rubber Chemistry and Technology 2014, 87: 219-238.
Proteomics and functional proteomics
Poster Session – Submission 39
Identifying relationships between structure and function of the bacterial metabolic
pathway TR–TRX–TPX
Diego S. Vazquez1 ,*, Javier Iserte2 , William A. Agudelo1 , Gerardo Ferrer-Sueta3 , Bruno Manta3,4 , Mariano
C. González Lebrero1 , Cristina Marino Buslje2 and Javier Santos1∗
1 IQUIFIB (UBA-CONICET), Departamento de Química Biológica, FFyB, Universidad de Buenos Aires, Argentina.
2 Laboratorio de Bioinformática Estructural, IIBA-CONICET, Fundación Instituto Leloir, Argentina.
3 Laboratorio de Fisicoquímica Biológica, Instituto de Química Biológica, Facultad de Ciencias, UdelaR, Uruguay.
4 Laboratorio de Biología Redox de Tripanosomas, Institut Pasteur de Montevideo, Uruguay.
Background. Across all kingdoms, cellular antioxidant defence and redox homeostasis are regulated by the thioredoxin and glutathione systems [1,2], which comprise several TRX-like fold proteins such as glutaredoxins, thiol-dependent peroxidases (PRXs) and thioredoxins, among others. Our interest in this system, from a biophysical viewpoint, is mainly based on (i) the plasticity of the thioredoxin (TRX) substrate recognition process: TRX has different targets but is reduced only by the FAD-dependent thioredoxin reductase in vivo [1]; (ii) the existence of large conformational changes in the PRX family (helix-coil transitions) that are coupled to catalysis [3] and may affect the catalytic rate; and (iii) electron channelling in TR [4]. These aspects, among others, prompted us to hypothesize the existence of an evolutionarily conserved interaction network involved in protein-protein contact and substrate recognition as well as in internal dynamics and thermodynamic stability. To test this, we performed an exhaustive bioinformatic and structural analysis of three well-characterized proteins: thioredoxin reductase (TR), thioredoxin 1 (TRX) and the thiol-dependent peroxidase (TPX, an atypical 2-Cys peroxiredoxin).
Methods. Co-evolutionary relationships between positions in a multiple sequence alignment containing 3430 sequences of TR, TRX and TPX proteins from the bacterial domain were calculated using the MISTIC online server (http://mistic.leloir.org.ar [5]). The most promising inter- and intra-protein residue pairs identified by mutual information (MI) were subjected to in silico mutation, molecular dynamics (MD) simulations and principal component analysis (PCA) in order to study their role in the dynamics and thermodynamics of the proteins. MD simulations were performed with the AMBER14 GPU package [6]. PCA results were analyzed and post-processed with the cpptraj module of AmberTools13.
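The quantity behind this co-evolution analysis is the mutual information between two alignment columns. The Python sketch below computes the plain (uncorrected) MI in bits; it is meant only to illustrate the definition and does not reproduce the scoring of the MISTIC server, which may apply corrections and normalisations not shown here. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plain mutual information (bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        # p(a, b) / (p(a) * p(b)), written with raw counts
        mi += p_joint * log2(p_joint * n * n / (count_a[a] * count_b[b]))
    return mi


# Toy alignment: columns 0 and 1 co-vary, column 2 varies independently of 0
alignment = ["ADK", "ADR", "CEK", "CER"]
columns = [[seq[i] for seq in alignment] for i in range(3)]
print(mutual_information(columns[0], columns[1]))
print(mutual_information(columns[0], columns[2]))
```

Pairs of positions that change together across the alignment score high, whereas fully conserved or independently varying positions score zero.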
Results. Preliminary results from the mutual information analysis suggest the existence of qualitatively different groups of residue pairs: (i) pairs located near the active site, suggesting a role in catalysis; (ii) residues with high accessible surface area, suggesting a role in protein-protein interaction; and (iii) a group located mainly in the core of the proteins (see Figure 1).
Figure 1: Mapping of the ten highest-scoring (Top 10) MI residue pairs, intra-protein (red VdW spheres) and inter-protein (purple VdW spheres for TR-TRX and orange for TRX-TPX), on the TR (A), TRX (B) and TPX (C) structures, respectively. The functional cysteines are shown in yellow. In addition, the maximum frequency conservation profile, with clustering at 62% similarity, is shown.
Acknowledgments. This work was supported by grants from ANPCyT, UBACyT and CONICET.
References
1. Lu, Holmgren: The thioredoxin antioxidant system, Free Radical Biology and Medicine, 2013, 8;66:75-87.
2. Pillay et al.: The logic of kinetic regulation in the thioredoxin system, BMC Systems Biology, 2011, 5:15.
3. Hall et al.: Structural Changes Common to Catalysis in the Tpx Peroxiredoxin Subfamily. J. Mol. Biol., 2009, 393:867–881.
4. Williams Jr.: Mechanism and structure of thioredoxin reductase from Escherichia coli. FASEB J., 1995, 13:1267-76.
5. Simonetti et al.: MISTIC: mutual information server to infer coevolution. Nucleic Acids Research, 2013
6. Case et al.: AMBER 14, University of California, San Francisco.
Proteomics and functional proteomics
Poster Session – Submission 53
Conformational diversity of protein functional regions improves the
characterization of deleterious mutations.
Alexander Monzon1 , Emidio Capriotti2 and Gustavo Parisi1
1 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina.
2 Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA.
Introduction. The native state of a protein comprises a wide range of conformations, which are important for its biological function. The conformers that make up the native ensemble show different degrees of biological activity derived from their corresponding structural rearrangements. These conformer activities, mingled in a dynamic equilibrium, define as a whole the structural basis of protein function. The structural changes range from large relative movements of complete domains and changes in the orientation of loops and secondary structural elements to small changes in the rotation of residue side chains. Together, these movements modulate the transit of ligands (substrates, products, ions, water and allosteric modulators) through pathways connecting the surface of the protein with its interior. Tunnels, channels, cavities, pockets, grooves, voids and pores are some of the structural features of proteins that define the traffic of ligands into and out of the protein according to the different conformers in the native ensemble. Conformational diversity can produce variations in the size, width and depth of these functional regions, changing their physicochemical properties and thereby defining the differential biological activities observed in the conformers.
Materials and results. Given the biological importance of these regions, we studied whether deleterious (disease-related) mutations occur preferentially in them. To this end we collected 382 proteins (3095 conformers) with 2394 mutations (1642 disease-related and 752 polymorphic), where each protein shows experimentally proven conformational diversity, extracted from the CoDNaS database. Tunnels, cavities and pockets were estimated using the Fpocket and MOLE programs. All the mutations were mapped onto each conformer of each protein in the dataset to determine to which functional region (tunnel, cavity or pocket) they belong. We found that deleterious mutations occur preferentially in functional regions (Fisher test, p-value < 0.005) relative to polymorphic mutations. As it is well established that deleterious mutations mainly involve buried residues, we used all the buried positions to test whether deleterious and polymorphic mutations are differentially associated with the functional regions. Using a Fisher test, we found that for buried residues the distributions of mutations are different (p-value < 0.005). Using the information on the conformational diversity of each protein, we found that positions with deleterious mutations are less mobile than those with polymorphic mutations (Kolmogorov-Smirnov p-value < 0.001). This trend is also observed when the deleterious mutations occurring in any of the functional regions are compared with the polymorphic mutations also occurring in functional regions (Kolmogorov-Smirnov p-value < 0.001 and Wilcoxon test p-value < 0.01).
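The Fisher test used for these comparisons operates on a 2x2 contingency table (mutation type versus location inside or outside a functional region). A minimal two-sided implementation in pure Python is sketched below. The abstract reports only the group totals (1642 disease-related and 752 polymorphic mutations), so the split of each group into inside and outside a functional region used in the example is hypothetical, and the function name is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: disease-related, polymorphic; columns: inside, outside a region.
# Row totals come from the abstract; the inside/outside split is hypothetical.
p_value = fisher_exact_two_sided(900, 742, 330, 422)
print(p_value)
```

A small p-value indicates that the proportion of mutations falling in functional regions differs between the two classes of mutation.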
Conclusions. Our results indicate that the analysis of functional regions such as tunnels, cavities and pockets, together with their conformational diversity, can help to better understand the functional effect of protein mutations.
Structure prediction and protein function
Poster Session – Submission 6
¹³Cα and ¹³Cβ chemical shift-driven refinement of protein structures
Pedro G. Ramírez, Osvaldo A. Martin and Jorge A. Vila.
IMASL-CONICET. Universidad Nacional de San Luis, Italia 1556, 5700 - San Luis, Argentina
Background. X-ray crystallography (XRC) and nuclear magnetic resonance (NMR) spectroscopy are the most
powerful and predominant techniques used to experimentally determine the three-dimensional structures of biological
macromolecules at near-atomic resolution. On the one hand, XRC has no size limitations and provides the most precise
atomic detail, although information about the dynamics of the molecule may be limited. On the other hand, NMR
spectroscopy surpasses XRC in cases where no protein crystals are available and, in addition, it provides solution-state
dynamics. However, the main drawback of NMR spectroscopy is that it delivers lower-resolution structures
[1]. Because of this, validation, the process of evaluating the reliability of three-dimensional atomic models, becomes
critically important to protein structure determination via NMR spectroscopy.
Materials and methods. Our group has developed a protein structure validation method called CheShift-2 [2],
which allows us to calculate the “differences” between observed and calculated chemical shifts for the nuclei of
interest (¹³Cα and ¹³Cβ). This validation method indicates where in the protein structure the largest “differences”
are found. This allows us to modify the relevant torsional angles, while keeping compatibility with all the existing
experimental information, in such a way that the agreement between observed and computed chemical shift values is
optimized at both the local and global level.
We use a refinement algorithm that identifies the residues that contain flaws and then modifies the torsional angles of
the protein structure in a way that tends to diminish these flaws. The information needed to identify these residues is
obtained from CheShift-2, and to perturb the protein structure we use ROSETTA [3], a software package for the
prediction and design of protein structures.
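The residue-selection step described above can be illustrated with a small sketch that ranks residues by the absolute difference between observed and computed shifts. This is not the CheShift-2 implementation: the function name, the threshold and the shift values are assumptions made only for illustration.

```python
def flag_residues(observed, computed, threshold):
    """Return (residue id, |observed - computed|) pairs above a threshold.

    observed, computed: dicts mapping residue id -> chemical shift in ppm
    threshold: cutoff in ppm (hypothetical; not taken from CheShift-2)
    """
    flagged = []
    for res_id in sorted(observed):
        if res_id not in computed:
            continue  # skip residues without a computed value
        diff = abs(observed[res_id] - computed[res_id])
        if diff > threshold:
            flagged.append((res_id, diff))
    # Largest discrepancies first, so a refinement could start there
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged

# Made-up 13C-alpha shifts (ppm) for four residues
obs = {1: 56.2, 2: 58.9, 3: 52.1, 4: 61.0}
calc = {1: 56.0, 2: 55.4, 3: 52.3, 4: 63.1}
worst = flag_residues(obs, calc, threshold=1.5)
print([res for res, _ in worst])
```

The flagged residues would then be the ones whose torsional angles are perturbed, while the remaining residues are left untouched.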
Conclusions. We evaluate our methodology by comparing the root mean square deviation (RMSD) and the global
distance test high accuracy score (GDT-HA) [4] of the group of refined structures against the same protein
experimentally determined at a high-quality level. Moreover, the physicochemical quality of the results was assessed
with validation methods such as PROCHECK [5] and MolProbity [6].
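As a rough illustration of the two comparison measures, the sketch below computes RMSD and a GDT-HA-like score for two coordinate sets that are assumed to be already superposed. The full GDT-HA procedure (as implemented in LGA) searches over many superpositions to maximize the score, which is not attempted here; the coordinates are made up.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples (pre-superposed)."""
    sq = [sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in zip(coords_a, coords_b)]
    return sqrt(sum(sq) / len(sq))

def gdt_ha_fixed(coords_a, coords_b, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """GDT-HA-like score (0-100) for one fixed superposition.

    Averages the fraction of atoms within each distance cutoff (angstroms).
    """
    dists = [sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
             for a, b in zip(coords_a, coords_b)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy C-alpha traces (angstroms); values are illustrative only
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.4, 0.0), (2.0, 1.5, 0.0), (3.0, 3.0, 0.0)]
print(round(rmsd(reference, model), 3), gdt_ha_fixed(reference, model))
```

A lower RMSD and a higher GDT-HA after refinement would both indicate that the refined model moved closer to the reference structure.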
Acknowledgments. This work was supported by PIP-112-2011-0100030 (JAV) from IMASL-CONICET, Argentina,
and Project 328402 (JAV) from UNSL, Argentina. The research was conducted by using the resources of a local
Beowulf-type cluster at the IMASL-CONICET.
References
1. Krishnan VR, B.: Macromolecular Structure Determination: Comparison of X-ray Crystallography and NMR Spectroscopy.
eLS 2012.
2. Martin OA, Vila JA, Scheraga HA: CheShift-2: graphic validation of protein structures. Bioinformatics 2012, 28(11):1538-1539.
3. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O et al: Structure
prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2009, 77 Suppl 9:89-99.
4. Zemla A: LGA: A method for finding 3D similarities in protein structures. Nucleic acids research 2003, 31(13):3370-3374.
5. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of
protein structures. Journal of Applied Crystallography 1993, 26(2):283-291.
6. Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB, 3rd, Snoeyink J, Richardson
JS et al: MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic acids research
2007, 35(Web Server issue):W375-383.
Structure prediction and protein function
Poster Session – Submission 7
Effects of the poloxamer’s structure on its interaction with model membranes
revealed by molecular dynamics simulations at coarse grain scale
Irene Wood1,2 , M. Florencia Martini1,2 , Mónica Pickholz1,2
1-Pharmaceutical Technology Dept, Faculty of Pharmacy and Biochemistry, University of Buenos Aires, Buenos Aires, Argentina,
2-CONICET, Buenos Aires, Argentina.
Background. The linear triblock co-polymers belonging to the Pluronic® class are composed of a central hydrophobic
block of poly(propylene oxide) (PPO) flanked by two identical hydrophilic blocks of poly(ethylene oxide) (PEO) [1].
These amphiphilic and biocompatible compounds are mainly used for biomedical and pharmaceutical purposes, owing
to their varied PEO and PPO composition. The capability of poloxamers to interact with membranes justifies their
applications [2].
Materials and methods. Coarse-grained molecular dynamics (MD) simulations were performed to investigate the
interaction of different poloxamers, in their unimer form, with a fully hydrated
1-palmitoyl-2-oleoyl-phosphatidylcholine (POPC) lipid bilayer, starting from different initial locations.
Results. We observed that the behavior of the unimer depends on its structural and physicochemical features. Most
of the studied unimers showed different conformations depending on the initial condition. For instance, when the F127
unimer was placed at the lipid-water interfacial region, it adopted a coil structure in which the inner hydrophobic domain
(PPO) is surrounded by the outer hydrophilic portion (PEO), which remains in contact with water (Figure 1.A). On
the other hand, when F127 was initially placed in the hydrophobic region of the bilayer, it displayed a trans-membrane
conformation, with the PPO block spread into the membrane tail region and the PEO chains solvated by water on
both sides of the bilayer (Figure 1.B). Furthermore, the poloxamer L64 behaves differently from F127 and the other
studied poloxamers: L64 adopts a compact structure at the lipid-water interface, showing no dependence on initial
conditions.
Figure 1: Snapshots after 1 µs for the F127 systems at different initial conditions: A) interface and B) membrane core.
F127 is represented as VDW spheres (PPO in red and PEO in green). POPC (choline in blue, phosphate in magenta,
carbonyls in light blue, acyl chains in brown) and water (transparent light blue) are represented as balls and sticks.
Conclusion. Our results provide a picture of the conditions that determine poloxamer-bilayer interactions. The degree
of interaction of certain co-polymers with membranes could favor their use as excipients for drug delivery and as indirect
inhibitors of transmembrane efflux proteins, whose over-expression is related to multi-drug resistance.
Structure prediction and protein function
Poster Session – Submission 11
Frustration and Energetics in the Ankyrin Repeat Protein Fold
R. Gonzalo Parra, Rocío Espada, Nina Verstraete, Diego U. Ferreiro
Protein Physiology Laboratory, Dpto. Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN,
Universidad de Buenos Aires, Argentina.
Background. Natural protein sequences resemble random strings of amino acids. Patterns of a relatively small set
of folding architectures can be characterized by long-distance interactions among amino acids. Repeat proteins are
composed of tandem copies of similar motifs and can become spontaneously organized in symmetrical ways in space.
Ankyrin repeat proteins comprise a large number of proteins containing tandem copies of a 33-residue motif.
They are present in all kingdoms of life, and are apparently enriched in eukaryotes and some specific pathogens. Being
quasi one-dimensional, these proteins constitute a simplified model in which the “sequence-codes-structure-codes-function”
paradigm can be quantitatively evaluated.
Description. Building on the structural detection of repeats that we achieved in previous work using a geometrical
approach we developed [1], we analyzed the local frustration and energetic patterns of ankyrin repeat proteins in order
to dissect the energetic contributions of the different repeats, the array of repeats and their modifications. We
quantified the degree of conservation of the frustrated state over the canonical positions of the ankyrin repeats,
as well as for the different contacts present in the canonical contact map. Here we describe how frustration
patterns are distributed on the structures of this protein family and how they relate to other structural and sequence
measures calculated over the dataset.
Results. We analyzed the energetic and frustration patterns in ankyrin repeat protein structures. Natural
ankyrin proteins are composed of three different populations of repeats that differ in their burial interaction energies,
which is also reflected in differential sequence signatures and secondary structure composition. We found that these
molecules have frustration hotspots localized at the insertions and at binding sites for other partners, as well as
in the terminal repeats. When quantifying the degree of conservation of the frustrated states at the level of canonical
positions in the ankyrin repeats, we observed that the positions where the frustration state is conserved correspond to
positions where the sequence is also conserved. Moreover, when the frustrated state is conserved, it corresponds to the
minimally frustrated one, i.e., “the more consensus an ankyrin protein is, the more foldable it is”. The positions that
have high conservation of the frustrated state at the single-residue level are connected by a minimally frustrated
interaction network. We speculate that, at least in ankyrin repeats, consensus sequences stabilize the overall fold by
maximizing the energetic gap between the folded and unfolded states, establishing a network of minimally frustrated
interactions both within and between adjacent repeats. The potential implications of these findings for the dynamical
properties of these molecules will also be discussed.
References
1. R. Gonzalo Parra, Rocío Espada, Ignacio E. Sanchez, Manfred J. Sippl, and Diego U. Ferreiro. “Detecting repetitions and
periodicities in proteins by tiling the structural space.” J. Phys. Chem. B. DOI: 10.1021/jp402105j. Publication Date (Web):
11 Jun 2013.
Structure prediction and protein function
Poster Session – Submission 14
Bioinformatics for Biomolecules learning
Llaraí Carolina Gaviria-González, María Teresa Ortiz-Melo, Josefina Vázquez-Medrano, María del Socorro
Sánchez-Correa
Carrera de Biología, Facultad de Estudios Superiores Iztacala, UNAM. Tlalnepantla, Edo. de México C.P. 54090, México.
Background. Teachers in science degree programs generally find it difficult to teach abstract topics, such
as the spatial structure and behavior of molecules, supramolecular assemblies, or the relationship between structure
and function of biomolecules. We consider Bioinformatics an excellent tool to improve the teaching of these topics in
science degree programs such as biology. We therefore decided to implement a Bioinformatics lab manual in the
Biomolecules course, which is part of the Biology curriculum at Facultad de Estudios Superiores Iztacala of UNAM.
In the manual, the use of bioinformatics tools is intended to facilitate the comprehension of biomolecules and to
introduce Biology students to bioinformatics applications.
Materials and methods. The activities using bioinformatics tools were developed beforehand for each of the
following themes: functional groups, proteins, carbohydrates, lipids, nucleic acids, and some others such as secondary
metabolites or primer design. The lab manual was then used with students enrolled in Biomolecules courses, who
formed the experimental group. The results were then compared between this group and another group of
students in the same course who did not use the manual (the control group), by means of a test that included theoretical
questions about biomolecule structure and appreciation questions about the manual. The average grade for each
theme and the global average of the test were obtained. A chi-square test was performed on the resulting
data.
Figure 1: Grades obtained by the control and experimental groups. * indicates statistically significant differences.
Results. Although there is no statistically significant difference between the groups in the average grade obtained in
theoretical questions on functional groups, proteins, carbohydrates or lipids, there is a statistically significant
difference for nucleic acids and for the global grade. The experimental group tends to obtain higher grades in
theoretical questions (see Figure 1). In addition, we included the option “I don’t know the answer”, which was chosen
more frequently by the control group than by the experimental group, as seen in Figure 1. Finally, at least 80% of the
surveyed students think that these activities may facilitate the learning of biomolecules and consider them relevant for
their professional development.
Conclusions. The use of Bioinformatics tools in teaching may contribute to Biomolecules learning.
Acknowledgments. We thank the Programa de Apoyo a Proyectos para la Innovación y Mejoramiento de la
Enseñanza (PAPIME) of the Dirección General de Asuntos del Personal Académico (DGAPA) of the UNAM for
supporting this project (PAPIME PE206112).
References
1. Carbone A, Gromow M, Kepes F, Westhof F: Folding and self-assembly of biological macromolecules. World
Scientific Publishing Co. Pte. Ltd. Singapore. 2004.
2. Eiden LE: A two-way bioinformatics street. Science 2004, 306: 1437.
3. Martin F, Schloissnig S: Bioinformatics and molecular modeling in glycobiology. Cellular and Molecular Life
Sciences. 2010, 67: 2749-2772.
4. Schwede T, Peitsch M: Computational Structural Biology. Methods and Applications. World Scientific Publishing Co. Pte. Ltd. Singapore. 2008.
Structure prediction and protein function
Poster Session – Submission 22
Understanding Mycobacterium tuberculosis Cyclopropane methyltransferases
(CMAs) structure-function relationship
Lucas Defelipe, Marcelo A. Marti and Adrian G. Turjanski
Departamento de Química Biológica, Universidad de Buenos Aires
Abstract. A serious concern in the treatment of Mycobacterium tuberculosis is the emergence of MDR (multi-drug
resistant) and XDR (extensively drug-resistant) strains. Choosing new relevant drug targets is a priority in the fight
against MDR and XDR TB. We developed TuberQ [1], a protein druggability database that highlights relevant drug
targets based on structural druggability and on gene expression experiments performed in different environments
mimicking the stresses TB faces during infection [2,3]. Cyclopropane methyltransferases (CMAs) emerge as potential
targets. Mycobacterium tuberculosis CMAs are responsible for the modification of mycolic acids by
the transfer of a methyl group to the olefin, which makes CMAs attractive drug targets. This protein family is composed
of 9 proteins (mmaA1-4, cmaA1-2, pcaA, umaA and ufaA). Mycolic acids are long-chain (60-80 carbon atoms)
modified fatty acids that are major components of the mycobacterial cell wall [4]; these modifications modulate
properties of the cell wall (such as drug permeability) and the immune response of the host [5]. Although CMAs are
methyltransferases with the typical Rossmann fold [6], they are able not only to cyclopropanate but also to introduce
keto and methoxy modifications at the same olefin. In the present work we have used comparative modelling and
molecular dynamics simulations to understand the different reaction mechanisms present in this protein family. We
also performed a comparative druggability study of the family to develop a pharmacophore model to aid in the search
for drug-like compounds able to bind to several members of the CMA family (cmaA1, cmaA2, pcaA and umaA), an
approach known as polypharmacology.
References
1. Radusky L, Defelipe LA, Lanzarotti E, Luque J, Barril X, et al. (2014) Database (Oxford) 2014: bau035.
2. Sassetti CM, Rubin EJ (2003) G. Proc Natl Acad Sci U S A 100: 12989–12994.
3. Voskuil MI, Visconti KC, Schoolnik GK (2004) Tuberculosis (Edinb) 84: 218–227.
4. Marrakchi H, Lanéelle M-A, Daffé M (2014) Chem Biol 21: 67–85. Available: Accessed 21 March 2014.
5. Barkan D, Hedhli D, Yan H-G, Huygen K, Glickman MS (2012) Infect Immun 80: 1958–1968.
6. Loenen WAM (2006) Biochem Soc Trans 34: 330–333.
Structure prediction and protein function
Poster Session – Submission 26
Pseudocounts based on BLOSUM frequencies improve contact prediction using
mutual information
Diego Javier Zea1,2 , Diego Anfossi, Cristina Marino Buslje1 , Morten Nielsen3,4
1 Fundación Instituto Leloir, C1405BWE, Capital Federal, Buenos Aires, Argentina
2 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, B1876BXD, Bernal, Buenos Aires, Argentina
3 Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B1650HMP, San Martín, Buenos Aires, Argentina
4 Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, DK2800, Lyngby, Denmark
Background. Mutual information (MI), a measure from information theory, is calculated on a Multiple Sequence
Alignment (MSA) of homologous proteins to predict coevolving sites [1]. A major problem for the MI calculation is the
number of sequences in the alignment. MSAs with a low number of sequences, which are very frequent, will not contain observations
for every possible pair of amino acids. In a previous study, we found that a low-count correction was useful in
those cases [2]. However, the predictive performance of the method, measured on residue contact prediction, decreases
with fewer than 400 clusters of non-redundant sequences (at 62% identity). Here we tested the use of pseudo
frequencies based on the BLOSUM matrix, as described in Altschul et al. [3]. This is a more realistic pseudocount
strategy, which can improve the performance of the MI calculation for protein families with a low number of sequences.
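To make the low-count problem concrete, the sketch below computes plain MI between two alignment columns from raw counts. It is a bare illustration, without the pseudocounts, clustering or other corrections discussed in this abstract, and the column strings are toy data.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """MI (in bits) between two alignment columns given as equal-length strings."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(n_ab * n / (counts_i[a] * counts_j[b]))
    return mi

print(mutual_information("AABB", "CCDD"))  # perfectly covarying toy columns
print(mutual_information("AABB", "CDCD"))  # independent toy columns
```

With only four sequences most of the 400 possible amino acid pairs are never observed, which is the situation that pseudocounts are meant to address.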
Materials and methods. Method performance is measured as the AUC and AUC 0.1 of ROC curves for predicting
residues in contact (beta carbons within 8 angstroms; alpha carbons for glycines). The dataset comprises 150 protein
families with a range of 10 to 1000 clusters of proteins at 62% identity. We used the algorithm described in Marino
Buslje et al. [2], where the pseudocount is fixed at a user-defined value (recommended to be 0.05). In this new
approach, we use pseudocounts based on BLOSUM frequencies.
$$G_{ab} = \sum_{cd} p_{cd} \cdot \mathrm{BLOSUM62}(a \mid c) \cdot \mathrm{BLOSUM62}(b \mid d)$$

$$P_{ab} = \frac{\alpha\, p_{ab} + \beta\, G_{ab}}{\alpha + \beta}$$

where p is the observed frequency of a pair and G is the pseudo frequency, estimated from the BLOSUM62 conditional
probabilities weighted by the observed frequencies p. α is the number of clusters in the MSA and β is an empirical
value that sets the weight of the pseudocount.
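The blending of observed and pseudo frequencies defined above can be sketched in a few lines. The example uses a made-up two-letter alphabet and a made-up conditional probability table in place of the 20x20 BLOSUM62-derived one, so the numbers are illustrative only and the function name is an assumption.

```python
def blend_pair_frequencies(p, cond, alpha, beta):
    """Return (G, P) for observed pair frequencies p.

    p[c][d]    : observed frequency of the pair (c, d)
    cond[a][c] : conditional probability of a given c (stand-in for BLOSUM62(a|c))
    alpha      : number of sequence clusters in the MSA
    beta       : weight given to the pseudo frequencies
    """
    n = len(p)
    # G_ab = sum over c, d of p_cd * cond(a|c) * cond(b|d)
    g = [[sum(p[c][d] * cond[a][c] * cond[b][d]
              for c in range(n) for d in range(n))
          for b in range(n)] for a in range(n)]
    # P_ab = (alpha * p_ab + beta * G_ab) / (alpha + beta)
    blended = [[(alpha * p[a][b] + beta * g[a][b]) / (alpha + beta)
                for b in range(n)] for a in range(n)]
    return g, blended

# Toy two-letter alphabet: only the pair (0, 0) was ever observed
observed = [[1.0, 0.0], [0.0, 0.0]]
toy_cond = [[0.9, 0.2], [0.1, 0.8]]  # each column sums to 1
G, P = blend_pair_frequencies(observed, toy_cond, alpha=10, beta=10)
print(P)
```

Pairs that were never observed receive a small non-zero frequency, and the blended matrix still sums to one because both p and G do.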
Results and Conclusions. After testing a large number of beta values in a range between 1 and 550, the best
performance was obtained for beta close to 10 on the dataset of 150 proteins. In this range, performance increased
for alignments with a small number of clusters, and remained similar for large and well-populated alignments, in
comparison with the original method using a fixed pseudocount value.
Figure 1
References
1. Martin, LC et al. “Using information theory to search for coevolving residues in proteins.” Bioinformatics 21.22 (2005):
4116-4124.
2. Buslje, Cristina Marino et al. “Correction for phylogeny, small number of observations and data redundancy improves the
identification of coevolving amino acid pairs using mutual information.” Bioinformatics 25.9 (2009): 1125-1131.
3. Altschul, Stephen F et al. “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.”
Nucleic acids research 25.17 (1997): 3389-3402.
Structure prediction and protein function
Poster Session – Submission 29
Network of residues involved in preserving the conformational diversity of a protein
Tadeo E. Saldaño, Gustavo Parisi, and Sebastian Fernández-Alberti
Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
Background. Protein flexibility and dynamics are commonly associated with protein function. It is now well
established that the functional form of the protein, also known as the native state, is not unique. Pre-existing
populations of conformers and dynamic landscapes offer a central view to explain protein function. Therefore, we
propose to study the dynamically relevant residues responsible for maintaining the dynamism of a protein and thereby
its function. We expect these residues to serve as fingerprints of protein function. Using the
conformational diversity database (CoDNaS, Conformational Diversity of Native State, http://www.codnas.com.ar/index.php),
we propose to identify and characterize the network of residues involved in preserving the
conformational diversity of a protein. The predicted dynamically important residues are promising targets
for mutational and functional studies.
Materials and methods. The dynamics of the different conformers of each protein are studied using
normal mode analysis (NMA). We use methods that allow us to obtain the collective motions associated with
the low-frequency normal modes for a large variety of proteins. The dynamically relevant networks of residues
responsible for maintaining protein dynamism are identified by first detecting the normal modes that contribute
the most to a specific structural change between a pair of conformers. For this purpose, the vector describing the
conformational change is projected onto the basis of the normal modes of each conformer, and the normal modes that
present the maximum overlap are retained. After that, we probe the effect of point mutations of each residue on a
conformationally relevant normal mode by calculating the response of the springs connected to it. These evaluations
of residue-dependent responses to local perturbations in the elastic network representation of the protein structure
will allow us to identify the network of residues that modulate the conformational changes.
Several surveys have been carried out to examine the nature of residue interactions. We evaluate the evolutionary
conservation, solvent-accessible area and secondary structure of the network of residues.
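The projection step described above is commonly expressed as an overlap, the normalized dot product between a normal mode and the conformational change vector. A minimal sketch follows; the modes and the displacement are toy vectors, the function names are illustrative, and no elastic network model is actually built here.

```python
from math import sqrt

def overlap(mode, displacement):
    """Normalized |dot product| between a normal mode and a displacement vector."""
    dot = sum(m * d for m, d in zip(mode, displacement))
    norm_m = sqrt(sum(m * m for m in mode))
    norm_d = sqrt(sum(d * d for d in displacement))
    return abs(dot) / (norm_m * norm_d)

def best_modes(modes, displacement, top=1):
    """Indices of the modes with the largest overlap, best first."""
    ranked = sorted(range(len(modes)),
                    key=lambda k: overlap(modes[k], displacement),
                    reverse=True)
    return ranked[:top]

# Toy 6-dimensional example (two atoms) with three made-up modes
modes = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, -1, 0],
    [0, 0, 1, 0, 0, 1],
]
change = [0.1, 2.0, 0.0, 0.0, -2.0, 0.0]  # conformer B minus conformer A
print(best_modes(modes, change, top=2))
```

An overlap close to 1 means that a single mode describes the observed structural change almost completely.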
Results. The dynamically relevant networks of residues responsible for maintaining protein dynamism have been
identified and characterized. We combine the information obtained from methods based on structural and dynamic
properties of proteins with information related to their evolutionary conservation. We explore the correlation between
ligand-binding residues and the dynamically important residues predicted by our perturbation approach. Results related
to the conformational changes associated with ligand binding are presented.
Structure prediction and protein function
Poster Session – Submission 32
In Silico Optimization of Epidermal Growth Factor Receptor Inhibitors Followed by
Experimental Evaluation
Claudio Cavasotto1 , Martín Lavecchia1 and José Ignacio Borrell2
1 Instituto de Investigación en Biomedicina de Buenos Aires-CONICET-Partner Institute of the Max Planck Society, Argentina
2 Institut Químic de Sarrià, Universitat Ramon Llull, Spain
Abstract. The Epidermal Growth Factor Receptor (EGFR) is part of an extended family of proteins that together
control aspects of cell growth and development. It is a validated target for drug discovery, since it is involved in
several types of cancer. Starting from a dichlorobenzyl pyridopyrimidine scaffold lead, and following an in silico
flexible-ligand/flexible receptor docking-based characterization of its binding mode, a combinatorial virtual library of
analogs was built, and their binding free energy assessed using the molecular mechanics-generalized Born surface
area (MM/GB-SA) approach, after performing long molecular dynamics simulations in explicit water. Molecules
with better and worse predicted affinity were synthesized and their activity experimentally evaluated, yielding
a dibromobenzyl-substituted molecule with improved performance; experimental results were in excellent agreement
with theoretical predictions. It is also remarkable that the ranking of the experimental inhibition activity is consistent
with our calculations, which shows that binding free energy evaluation using MM/GB-SA is a valid method for
ligand optimization using an in silico-generated combinatorial library of analogs, followed by energy calculation and
bioevaluation.
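The ranking step can be sketched as follows, assuming the common single-trajectory form of the MM/GB-SA estimate, in which the binding free energy is the average over MD frames of G(complex) minus G(receptor) minus G(ligand). The abstract does not state the exact protocol, and the analog names and energies below are invented for illustration.

```python
from statistics import mean

def binding_free_energy(frames):
    """Average of G_complex - G_receptor - G_ligand over MD frames (kcal/mol).

    frames: iterable of (g_complex, g_receptor, g_ligand) tuples, one per frame.
    """
    return mean(gc - gr - gl for gc, gr, gl in frames)

def rank_by_affinity(library):
    """Sort analog names from most to least favorable predicted binding."""
    return sorted(library, key=lambda name: binding_free_energy(library[name]))

# Made-up per-frame energies for two hypothetical analogs
library = {
    "analog_A": [(-5120.0, -4890.0, -200.0), (-5118.0, -4889.0, -201.0)],
    "analog_B": [(-5135.0, -4890.0, -205.0), (-5137.0, -4891.0, -204.0)],
}
print(rank_by_affinity(library))
```

The predicted order can then be compared with the order of the experimentally measured inhibition activities.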
Structure prediction and protein function
Poster Session – Submission 33
WATCLUST: a tool to improve the design of hydrophilic drugs based on
protein-water interactions
Elias Daniel Lopez1, Diego Gauto2, Ariel A. Petruk2, Victoria G. Dumas1,2, Juan Pablo Arcon2, Marcelo Adrián
Marti1,2, Adrian Gustavo Turjanski1,2
1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA, Buenos Aires, Argentina
2 INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales de la Universidad de Buenos Aires, CE1428EHA, Ciudad Autónoma
de Buenos Aires, Argentina
Author to whom correspondence should be sent: [email protected]
Background. Water plays an essential role in the structure and function of proteins. Precisely positioned water
molecules participate in many enzymatic reaction mechanisms; solvent reorganization and displacement is a key
contributor to the thermodynamics and the process of protein-ligand binding, protein folding and large-scale protein
motions; and water chains actively participate in both proton and electron transfer processes. Water sites (WS) are
defined as confined space regions adjacent to the protein surface where the probability of finding a water molecule is
higher than in the bulk solvent. The strategy used to determine the WS is adapted from our previous works [1,2]. Once
determined and characterized, the WS can provide a good thermodynamic description of the free energy of ligand
binding. To elucidate their thermodynamic profile and potential contribution to ligand binding, a hydration site
analysis program, WATCLUST, was developed. WATCLUST identifies hydration sites from a molecular dynamics
simulation trajectory with explicit water molecules. The free energy profile of each hydration site is estimated by
computing the enthalpy and entropy of the water molecule occupying the hydration site throughout the simulation. The
results of the hydration site analysis can be displayed in VMD. WATCLUST thus provides an easy and user-friendly
analysis and visualization tool to determine the WS and their properties. It can be used by anyone in the structural
bioinformatics field, and it also allows this information to be transferred directly to the Autodock program, one of
the most widely used open source docking programs, to perform WS biased docking (WSBD) (Figure 1) [3].
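The definition of a water site given above, a region where water is found more often than in bulk, can be illustrated by a density ratio computed over trajectory frames. This is only a sketch of the idea and not the WATCLUST algorithm: the spherical site, its radius and the toy coordinates are assumptions, and the bulk density is the approximate value for liquid water.

```python
from math import pi

BULK_WATER_DENSITY = 0.0334  # molecules per cubic angstrom, approximate bulk value

def water_finding_probability(site_center, radius, frames):
    """Ratio of the water density observed in a spherical site to the bulk density.

    site_center: (x, y, z) of the candidate site, in angstroms
    radius: sphere radius in angstroms (hypothetical choice)
    frames: list of frames, each a list of water-oxygen (x, y, z) positions
    """
    r2 = radius ** 2
    inside = 0
    for waters in frames:
        for pos in waters:
            if sum((p - c) ** 2 for p, c in zip(pos, site_center)) <= r2:
                inside += 1
    volume = 4.0 / 3.0 * pi * radius ** 3
    observed_density = inside / (len(frames) * volume)
    return observed_density / BULK_WATER_DENSITY

# Toy trajectory: a water oxygen sits near the origin in 3 of 4 frames
toy_frames = [
    [(0.1, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(0.0, 0.2, 0.1), (6.0, 4.0, 5.0)],
    [(3.0, 3.0, 3.0)],
    [(0.2, 0.1, 0.0)],
]
ratio = water_finding_probability((0.0, 0.0, 0.0), 0.6, toy_frames)
print(round(ratio, 1))
```

A ratio well above 1 marks a region that is occupied by water far more often than an equivalent volume of bulk solvent.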
Figure 1: A) Screenshot of the dialog used to select the input protein and possible ligand structures (clusters or WS), and the option for defining the
“protein selection”. B) Example of WS results as displayed in the VMD plugin. The VMD viewer window shows the predicted hydration sites in the
protein binding site. The hydration sites are shown as small spheres, colored in this example according to their ΔG values.
Acknowledgments. EDL is an ANPCyT doctoral fellow. MAM and AGT are CONICET investigators. This work was
partly funded by ANPCyT PICT No. 20102805.
References
1. Di Lella, S., Martí, M.A., Álvarez, R.M.S., Estrin, D.A., Díaz Ricci, J.C.: Characterization of the galectin-1 carbohydrate
recognition domain in terms of solvent occupancy (2007) Journal of Physical Chemistry B, 111 (25), pp. 7360-7366.
2. Gauto, D.F., Di Lella, S., Guardia, C.M.A., Estrin, D.A., Martí, M.A.: Carbohydrate-binding proteins: Dissecting ligand
structures through solvent environment occupancy (2009) Journal of Physical Chemistry B, 113 (25), pp. 8717-8724.
3. Gauto DF, Petruk AA, Modenutti CP, Blanco JI, Di Lella S, Martí MA: Solvent structure improves docking prediction in
lectin-carbohydrate complexes. Glycobiology. 2013 Feb;23(2):241-58. doi: 10.1093/glycob/cws147
Structure prediction and protein function
Poster Session – Submission 34
Interactions between aromatic rings in Protein-drug complexes: a database to
survey
Esteban Lanzarotti1 , Lucas A. Defelipe1 , Leandro Radusky1 , Marcelo A. Marti1 , Adrian G. Turjanski1 .
1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA.
Aromatic interactions have been shown to be important in biological processes, and their chemical characteristics
have been described [1]. It has also been shown that aromatic interactions form clusters that group several aromatic
rings, with an additive energetic nature [2]. Using a reliable dataset based on the PDB, we performed a statistical
analysis of aromatic interactions in the context of protein-drug complexes, using the planar angle and the distance
between rings as interaction descriptors. We found that aromatic interactions between rings in drugs and rings in
aromatic residues of proteins (PHE, TYR, TRP and HIS) are enriched in pi-stacking conformations compared
with aromatic interactions between two rings in residues. In previous work [3], we defined an aromatic cluster
as the transitive closure of the underlying relation defined by aromatic interactions, and studied these groups
in residue-residue interactions. We have now extended this aromatic cluster definition to the entire PDB, building
a web interface that enables the user to search for protein-ligand complexes and provides a way to rank these entries in
terms of the relevance of their aromatic clusters.
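The two descriptors named above, the distance between rings and the planar angle, can be computed from ring atom coordinates as sketched below. The use of ring centroids for the distance, the 0-90 degree folding of the angle and the idealized benzene-like coordinates are assumptions for illustration, not the authors' exact definitions.

```python
from math import acos, cos, degrees, radians, sin, sqrt

def centroid(ring):
    """Geometric center of a list of (x, y, z) atom positions."""
    n = len(ring)
    return tuple(sum(atom[k] for atom in ring) / n for k in range(3))

def ring_normal(ring):
    """Unit normal of the plane through the first three ring atoms."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = ring[0], ring[1], ring[2]
    u = (x1 - x0, y1 - y0, z1 - z0)
    v = (x2 - x0, y2 - y0, z2 - z0)
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def ring_geometry(ring_a, ring_b):
    """Centroid distance (angstroms) and planar angle (0-90 degrees) of two rings."""
    ca, cb = centroid(ring_a), centroid(ring_b)
    distance = sqrt(sum((p - q) ** 2 for p, q in zip(ca, cb)))
    cos_angle = abs(sum(p * q for p, q in zip(ring_normal(ring_a), ring_normal(ring_b))))
    angle = degrees(acos(min(1.0, cos_angle)))
    return distance, angle

# Idealized six-membered ring in the xy plane and a copy stacked 3.5 angstroms above it
hexagon = [(1.39 * cos(radians(60 * k)), 1.39 * sin(radians(60 * k)), 0.0) for k in range(6)]
stacked = [(x, y, z + 3.5) for x, y, z in hexagon]
distance, angle = ring_geometry(hexagon, stacked)
print(round(distance, 2), round(angle, 1))
```

Small planar angles at short centroid distances correspond to stacked arrangements, whereas angles near 90 degrees correspond to T-shaped ones.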
References
1. Salonen LM, Ellermann M, Diederich F. Aromatic rings in chemical and biological recognition: energetics and structures.
Angew Chem Int Ed Engl 2011.
2. Tauer TP, Sherrill CD. Beyond the benzene dimer: an investigation of the additivity of pi-pi interactions. J Phys Chem
A 2005.
3. Lanzarotti E, Biekofsky RR, Estrin DA, Marti MA, Turjanski AG. Aromatic-aromatic interactions in proteins: beyond the
dimer. J Chem Inf Model 2011.
Structure prediction and protein function
Poster Session – Submission 51
Analyzing the active and inactive state of EGFR kinase domain by pockets and
cavities structural properties comparison
Marcia Hasenahuer1 , Yanina Powazniak2 , Guillermo Bramuglia2 , Gustavo Parisi1 and María Silvina Fornasari1
1 Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Buenos Aires, Argentina.
2 Fundación Investigar-Argenomics, Buenos Aires, Argentina
Background. EGFR (Epidermal Growth Factor Receptor) is one of the main tumor markers in many cancer types [1].
Several single amino acid substitutions (SASs) in this protein are present in different cancers. Most of these SASs are
characterized as “activating”, because they stabilize the conformer required to drive phosphorylation (the active
form). EGFR is a trans-membrane protein formed by an extracellular, a trans-membrane and a cytoplasmic region.
The latter comprises a juxtamembrane region, a Tyr-kinase region and a C-terminal intrinsically disordered tail
(C-tail). Autophosphorylation of different C-tail tyrosine sites triggers signals for different cellular pathways involved
in cell growth and proliferation [2, and references therein]. Most interaction sites of proteins with their ligands and
substrates are located in cavities or pockets on the protein surface [3, and references therein]. The goal of this work is
to understand the structural and physicochemical characteristics of EGFR kinase pockets that differentiate the active
and inactive conformations, and to try to elucidate the effect of SASs in those pockets that could trigger the
unregulated kinase activity of this protein. In particular, the analysis includes the effect of previously unreported
SASs observed in Argentinean cancer patients.
Methods. Pockets and cavities were calculated with fPocket [http://fpocket.sourceforge.net/] for active, inactive, monomeric and dimeric conformers of the human EGFR Tyr-kinase region. Conformer
coordinates were taken from the PDB [http://pdb.org/pdb/home/home.do] and CoDNAS [http://www.codnas.com.ar/about.php]. The characteristics of the pockets related to the catalytic site of the kinase were analyzed across the
conformers, including structures with mutations, using per-site RMSD and pocket descriptors such as volume, polar and apolar surface area, charge and
hydrophobicity, among others. Clustering methods from R statistical packages [http://www.r-project.org/] were applied to compare this information. Further, SASs from Argentinean patients
and others compiled from the COSMIC database [http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/] were mapped onto the structures to analyze the relationship between SASs and pockets.
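As an illustration of the per-site RMSD mentioned in the Methods, the following minimal Python sketch computes one RMSD value per residue between two conformers. It assumes the conformers are already superimposed and that coordinates have been parsed into plain dictionaries; the function name `per_site_rmsd`, the data layout and the toy coordinates are hypothetical and are not taken from the authors' workflow.

```python
import math

def per_site_rmsd(conf_a, conf_b):
    """Return {residue_id: RMSD} over the atoms shared by two conformers.

    conf_a, conf_b: dict mapping residue_id -> {atom_name: (x, y, z)}.
    Assumption: both conformers are already structurally superimposed.
    """
    result = {}
    for res_id in conf_a.keys() & conf_b.keys():
        shared_atoms = conf_a[res_id].keys() & conf_b[res_id].keys()
        if not shared_atoms:
            continue  # nothing to compare for this residue
        squared_sum = sum(
            sum((p - q) ** 2
                for p, q in zip(conf_a[res_id][atom], conf_b[res_id][atom]))
            for atom in shared_atoms
        )
        result[res_id] = math.sqrt(squared_sum / len(shared_atoms))
    return result

# Toy coordinates (hypothetical residue numbers, not real PDB data).
active = {10: {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)},
          20: {"CA": (5.0, 5.0, 5.0)}}
inactive = {10: {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)},
            20: {"CA": (5.0, 8.0, 9.0)}}

rmsd = per_site_rmsd(active, inactive)
print(rmsd[10], rmsd[20])
```

Residues with a large per-site RMSD between active and inactive conformers are candidates for the positions that reshape a pocket; the per-residue values could then be combined with the pocket descriptors before clustering.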
Conclusions. The main pockets related to the kinase active site found in this work either contain or are in close
contact with 70% of the 195 positions with cancer-related SASs in the EGFR cytoplasmic region. Reorganization of
pockets could favor the binding of the C-tail to be phosphorylated and could affect the affinity for ATP or anti-cancer
drugs. Disease-related SASs could affect the dynamics and shape of pockets, promoting deregulated EGFR activity.
The co-localization of most cancer-related sites in pockets could be important for understanding the
effects of different EGFR SASs, and this information could be incorporated into predictive computational tools.
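The kind of count behind a figure such as "70% of the 195 positions" can be sketched in a few lines of Python. This is only an illustrative sketch: the function `sas_pocket_coverage`, the contact map and the toy residue numbers are assumptions, and the abstract does not state how "close contact" was defined.

```python
def sas_pocket_coverage(sas_positions, pockets, contacts):
    """Classify SAS positions as inside a pocket or in contact with one.

    sas_positions: set of residue numbers carrying substitutions.
    pockets: dict pocket_name -> set of residue numbers lining the pocket.
    contacts: dict residue number -> set of residues it touches
              (assumed to be precomputed with some distance cutoff).
    Returns (in_pocket, near_pocket, covered_fraction).
    """
    pocket_residues = set().union(*pockets.values())
    in_pocket = {p for p in sas_positions if p in pocket_residues}
    near_pocket = {
        p for p in sas_positions
        if p not in in_pocket and contacts.get(p, set()) & pocket_residues
    }
    covered = in_pocket | near_pocket
    fraction = len(covered) / len(sas_positions) if sas_positions else 0.0
    return in_pocket, near_pocket, fraction

# Toy data (hypothetical residue numbers and pocket names).
sas = {10, 20, 30, 40}
pockets = {"P1": {10, 11, 12}, "P2": {50}}
contacts = {20: {11}, 30: {99}}

in_pocket, near_pocket, fraction = sas_pocket_coverage(sas, pockets, contacts)
print(in_pocket, near_pocket, fraction)
```

In this toy case one position lies inside a pocket and one touches a pocket residue, so half of the four positions are covered.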
References
1. Salomon DS, Brandt R, Ciardiello F, Normanno N. Epidermal growth factor-related peptides and their receptors in
human malignancies. Crit Rev Oncol Hematol 1995, 19:183-232.
2. Levitzki A, Gazit A. Tyrosine kinase inhibition: an approach to drug development. Science 1995, 267(5205):1782-8.
3. Gora A, Brezovsky J, Damborsky J. Gates of enzymes. Chem Rev 2013, 113(8):5871-5923.
Structure prediction and protein function
Poster Session – Submission 56
Performance analysis of a comparative protein-DNA structure modeling pipeline
with MODELLER versus a standard protocol with 3DNA
Ignacio Ibarra, Francisco Melo
Laboratorio de Bioinformática. Pontificia Universidad Católica de Chile
Abstract. Structural information can potentially be applied to predict binding events between proteins and
DNA. Many groups have addressed this task, with variable results that depend on the theoretical approach
and/or the testing metrics used. A recurrent methodological step in protein-DNA modeling is threading
a set of DNA sequences onto the same template using standard software for DNA base replacement. In this work,
we tested a new comparative modeling pipeline for protein-DNA complexes based on the MODELLER software
suite.
A set of 18 DNA geometrical restraints, extracted from a non-redundant set of protein-DNA complex structures, was
used to model and minimize complexes of MarA, a monomeric bacterial transcription factor, bound to an
ensemble of DNA sequences, using the MarA-DNA complex structure as the reference template. A testing set of 34 MarA binding sites
was used to evaluate the performance of our pipeline against a standard protocol based on DNA base
replacement with 3DNA. In both approaches, different statistical potentials were applied to evaluate and rank
the DNA binding sites with respect to an ensemble of negative DNA sequences.
The results obtained in this work suggest that this new comparative protein-DNA complex structure
modeling and evaluation protocol is a promising, general tool for the in silico prediction of protein-DNA binding
specificity.
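One common way to quantify how well a scoring function ranks known binding sites against negative sequences is the area under the ROC curve (AUC). The abstract does not say which metric was used, so the Python sketch below is an assumption for illustration only; it also assumes that lower statistical-potential scores indicate better binding, and all scores shown are invented toy values.

```python
def ranking_auc(positive_scores, negative_scores):
    """AUC for an energy-like score where lower means better.

    Equals the probability that a randomly chosen true binding site
    scores lower than a randomly chosen negative sequence (ties count 0.5).
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos < neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy scores (hypothetical; not results from the abstract).
binding_sites = [-12.0, -9.5, -7.0]
negatives = [-8.0, -6.0, -5.0, -3.0]

auc = ranking_auc(binding_sites, negatives)
print(round(auc, 3))
```

Computing the same quantity for models built with each protocol and each statistical potential would give a like-for-like comparison: an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.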
Index
Agudelo,WA, 59
Albarraci,VH, 28
Alonso,LG, 6
Amadío,A, 46, 47
Anfossi,D, 65
Angelone,L, 15
Aptekmann,A, 47
Arab Cohen,D, 51
Arcon,JP, 68
Assis,J, 42
Ballarin,V, 21
Banchero,M, 34
Belfiorem,C, 28
Benalcázar,M, 21
Benintende,B, 47
Benintende,G, 46
Berenstein,A, 32
Berenstein,AJ, 34
Berenstein,JA, 16
Berretta,M, 46, 47
Blundell,TL, 2
Boechi,L, 12
Borrell,JI, 67
Bracco,M, 44
Bramuglia,G, 70
Brun,M, 21
Bulacio,P, 15
Bustamante,J, 12
Buus,S, 9
Campillo,NE, 2
Capriotti,C, 60
Carballido,JA, 35
Carbonetto,MB, 54
Cascales,J, 44
Cecchini,RL, 38
Chemes,LB, 13
Chernomoretz,A, 34
Cavasotto,C, 67
Comas,D, 21
Corva,P, 21
Cravero,F, 38, 56
Cucher,M, 42
Díaz,MF, 56
da Fonseca,M, 37
Daurelio,L, 18
De Los Ríos,P, 49
de Sousa Serro,M, 43
Defelipe,A, 69
Defelipe,L, 65
Demey,JR, 53
Demey-Zambrano,J, 53
Di Rienzo,JA, 53
Dumas,VG, 68
Dussaut,JS, 38
Eizaguirre,JI, 49
Elgoyhen,B, 48
Espada,R, 13, 62
Esteban,L, 18
Esteban,L, 50
Estrín,DA, 12
Ezpeleta,J, 15
Farias,ME, 28
Fernández,E, 33, 51
Fernández-Alberti,S, 6, 12, 67
Ferreiro,DU, 6, 30, 62
Ferrer-Sueta,G, 59
Ferreyra,N, 51
Fornasari,MS, 70
Franchini,L, 48
Fresno,C, 51
Gallo,CA, 35
Gauto,D, 68
Gaviria-González,LM, 63
Mato,G, 41
Glavina,J, 13
Gomes Araújo,F, 42
González Lebrero,MC, 7, 59
Gonzalo Cogno,S, 19, 41
Gorriti,M, 28
Gottlieb,AM, 44
Grosso,M, 12
Hansen,AM, 9
Hasenahuer,M, 70
Hittinger,C, 49
Ibarra,I, 71
Iserte,J, 59
Juritz,E, 20
Kalstein,A, 12
Kamenetzky,L, 42
Koile,D, 43
Kovalevski,L, 24
Krick,T, 6
Kropff,E, 19
Kurth,D, 28
Lanzarotti,E, 69
Lavecchia,M, 67
Libkind,D, 49
Llera,A, 33
Lopes,C, 49
Lopez,ED, 68
Méndez,NA, 28
Macat,PB, 24
Magariños,MP, 16
Magni,C, 50
Maguitman,AG, 38
Maldonado,LL, 42
Mancini,E, 54
Manta,B, 59
Marcatili,P, 7
Marino Buslje,C, 3, 32, 34, 59, 65
Martínez,JL, 31
Marti,MA, 12, 65, 68, 69
Martin,OA, 61
Martinelli,R, 18
Martinez,MJ, 56
Martini,MF, 62
Melo Ledermann,F, 4
Melo,F, 71
Merino,G, 33
Meschino,G, 21
Montemurro,M, 19
Monzon,M, 60
Murillo,J, 15
Nadra,A, 47
Macchiaroli,N, 42
Navas,L, 47
Navas,N, 46
Nielsen,M, 3, 9, 65
Nizzo,GG, 50
Oliveira,G, 42
Oliver,J, 33
Ortiz,M, 46, 47
Ortiz-Melo,MT, 63
Pagnuco,I, 21
Parisi,G, 60, 67, 70
Parra,GR, 62
Peris,D, 49
Petruk,AA, 68
Pickholz,M, 62
Pisciottano,F, 48
Poggio,L, 44
Ponzoni,I, 35, 38, 56
Powazniak,Y, 70
Prada,F, 33
Prato,P, 51
Pratta,GR, 24
Quaglino,M, 24
Ré,MA, 31
Radusky,L, 12, 69
Ramírez,PG, 11, 61
Rasmussen,M, 9
Reinert,MD, 23, 54
Revale,S, 54
Rodríguez,ME, 49
Rodriguez de la Vega,RR, 13
Roitberg,A, 12
Rosenzvit,M, 42
Sánchez,IE, 6, 13, 28, 30
Sánchez-Correa,MdS, 63
Saldaño, 67
Samengo,I, 19, 37
Santos,J, 59
Sauka,D, 47
Sendoya,JM, 33
Shub,DA, 6
Shub,M, 6
Simonetti,FL, 34
Sippl,M, 2
Soto,AS, 56
Spetale,FE, 15
Tapia,T, 15
ten Have,A, 12
Teppa,E, 32
Turjanski,AG, 65, 68, 69
Vázquez-Medrano,J, 63
Vazquez,DS, 59
Vazquez,GE, 6, 56
Vazquez,MP, 54
Verstraete,N, 6, 30, 62
Vicente,NB, 33
Vicente-Villardón,JL, 53
Vila,JA, 61
Wallace,D, 43
Wood,I, 62
Yankilevich,P, 43
Zandomeni,R, 46, 47
Zea,DJ, 32, 65
Zingaretti,ML, 51, 53