BII2010 - Bioinformatics Institute

For Enquiries:
(A member of A*STAR’s Biomedical Sciences Institutes)
30 Biopolis Street #07-01 Matrix
Singapore 138671
Tel: +65 6478 8298
Fax: +65 6478 9048
Website: www.bii.a-star.edu.sg
Company Registration Number: 19-9702109N
Contents
Director’s Message
Bioinformatics Institute
Scientific Advisory Board
RESEARCH DIVISIONS
Biomolecular Function Discovery Division
Sebastian Maurer-Stroh & Frank Eisenhaber
Georg Schneider
Sharmila Adhikari
Biomolecular Modelling And Design Division
Chandra Verma
Ivana Mihalek
Mallur Srivatsan Madhusudhan
Genome And Gene Expression Data Analysis Division
Vladimir Kuznetsov
Vivek Tanavde
Igor Kurochkin
Imaging Informatics Division
Lee Hwee Kuan
Martin Wasser
IT SCIENTIFIC SERVICES
Software Engineering
Bio-Computing Centre
Adjunct Scientists
Visiting Scientists
Science Outreach Activities
Conferences and Visits
Recreation Club
Administrative Team
BII Location

DIRECTOR’S MESSAGE
It is still premature to speak about life sciences as a theoretical scientific discipline. The extrapolation depth is small due to the fragmentary knowledge in a vast space of the unknown. Incremental accumulation of data as a result of hypothesis-driven experiments and observations is still the major source of new insight. Nevertheless, there are a few increasingly important research areas where the application of quantitative, mathematical concepts has become instrumental for the discovery of new biomolecular mechanisms and for progress in biological theory.
This development has been fuelled by the emergence of high-throughput experimental techniques (such as DNA sequencing, microarray techniques, etc.). As a result, researchers can, for the first time, generate so much data that, essentially, the aim of describing living organisms in their totality has become realistic. Yet, the deluge of data is often without understanding in terms of biomolecular mechanisms that link genome information and phenotypes.
Computational biology has entered a new era characterized by the availability of fully sequenced genomes, as well as increasingly complete gene expression and proteomics datasets that wait for functional interpretation. In more and more instances, hypotheses about biomolecular mechanisms can be derived from the data; thus, computational biology becomes instrumental in generating qualitatively new biological insight.
The Bioinformatics Institute, which was founded by Dr. Gunaretnam Rajagopal in 2001 and has been led by myself since August 2007, is on its way to becoming a notable contributor of biologically relevant results and new, efficient computational biology methods to the world-wide scientific effort in the search for yet unknown biomolecular mechanisms, an effort with the goal of applications in medicine and biotechnology.
For BII, the years 2007-2009 are the time for launching a new research program, for the start of new research teams and for re-equipping BII with new computer systems and wet lab equipment. At present, the Institute carries out research in the areas of
• biomolecular sequence analysis for the prediction of molecular and cellular functions (including the biochemical verification of hypotheses on function)
• biomolecular structure modelling and ligand design
• gene expression profile analysis at the transcript and proteome levels
• automated analysis of microscopic images from cellular systems (imaging informatics)
The Bioinformatics Institute has developed and deployed analytical tools and computational techniques for biology research in house and through close collaborations with experimental and clinical groups within and outside the Biopolis and Singapore. The ANNOTATOR suite as an efficient environment for protein sequence-based function prediction is an example of this. To emphasize, experimental efforts also have a place in BII (i) for the verification of theoretically derived hypotheses (as a complement to interactions with experimental teams in collaborating institutions) as well as (ii) for the generation of datasets that are necessary for the development of theoretical methods. For this purpose, BII researchers heavily rely on co-operations with experimental teams affiliated with other A*STAR institutes and elsewhere in Singapore and the world. BII also has its own biochemical/cell-biological laboratory and a high-end microscopy unit.
The members of our institute are united in making this effort a success and I invite you to join us in this endeavour that will open new frontiers in biology.
Dr. Frank Eisenhaber
Director
Bioinformatics Institute
Matrix Building, Biopolis
Photo by Vivek Tanavde, BII
Reception Counter
Photo by Vivek Tanavde, BII
Bioinformatics Institute
Located in the Biopolis, the Bioinformatics Institute (BII) was set up by the Agency for Science, Technology and Research (A*STAR) in July
2001; it was re-launched in autumn 2007 as a research institute for biomolecular mechanism discovery guided by computational biology
methods.
Bioinformatics is a multi-disciplinary approach combining computational and biological expertise to analyze biological data (both genomic
and clinical), to advance biomedical research and development. Bioinformatics is both a science and an engineering art, concerned with
the application of mathematics, physical/chemical principles and information technology to solve biological problems.
The spectrum of research activities in BII includes bioinformatics method development, experimental work for verification of hypotheses
on gene function and collaborations with other experimental labs for biological data interpretation. Additionally, BII aims to provide
postgraduate training as well as regional resource support in bioinformatics, especially for the institutes of the Biomedical Research Council
(BMRC) of A*STAR.
In the Bioinformatics Institute, there are four methodology-oriented research divisions comprising research groups led by independent
Principal Investigators who focus on specific areas of computational biology. The common denominator is the goal of understanding
biomolecular mechanisms underlying cellular phenomena, which is the basis for a rational understanding of pathogenesis or for planning
biotechnological applications.
Together with the BMRC, A*STAR research institutes and multinational R&D organizations in the Biopolis, the BII is situated in a conducive
environment for exchange of scientific knowledge and friendly interaction that will prompt greater collaborations, and position the Biopolis
as a notable biomedical R&D hub in Asia and in the world.
Scientific Advisory Board
The Director of BII is advised by a Scientific Advisory Board consisting of eminent scientists in the field of bioinformatics/computational
biology and experimental life sciences, with respect to the institute’s research directions, recruitment of staff and international research
collaborations.
The presiding members are:
Prof. Sir Tom Blundell (Chairman)
Chair of School of Biological Sciences
Sir William Dunn Professor of Biochemistry
Department of Biochemistry
University of Cambridge
Prof. Eytan Domany
The Henry J Leir Professorial Chair
Head, Kahn Family Research Center of Systems Biology of the Human Cell
Department of Physics of Complex Systems
Weizmann Institute of Science, Israel
Prof. Michael Levitt
Professor and Chair
Department of Structural Biology
Stanford University School of Medicine, Stanford
Prof. Tom Rapoport
Professor of Cell Biology
Howard Hughes Medical Institute Investigator
Department of Cell Biology
Harvard Medical School
Prof. Jason Swedlow
Wellcome Trust Senior Research Fellow and Reader
Wellcome Trust Centre for Gene Regulation and Expression
College of Life Sciences
University of Dundee
Research Divisions
Biomolecular Function Discovery Division
Principal Investigators
- Frank Eisenhaber
- Sebastian Maurer-Stroh
- Georg Schneider
- Sharmila Adhikari
Genome and Gene Expression Data Analysis Division
Principal Investigators
- Vladimir Kuznetsov
- Vivek Tanavde
- Igor Kurochkin
Biomolecular Modelling and Design Division
Principal Investigators
- Chandra Verma
- Ivana Mihalek
- Mallur Srivatsan Madhusudhan
Imaging Informatics Division
Principal Investigators
- Lee Hwee Kuan
- Martin Wasser
Sebastian MAURER-STROH & Frank EISENHABER
Postdoctoral Fellows:
CHEN Li; Vachiranee LIMVIPHUVADH; Roger LOW Wee Chuang; MA Jianmin; Dimitar KENANOV
Research Associates:
HAN Hao; WONG Wing Cheong; Raphael LEE Tze Chuen; NEO Keng Hwee; Swe Swe THET PAING
Protein Sequence Analysis
Based primarily on protein sequence analysis and the analysis of other sequence-associated data (for example, from functional genomics and proteomics studies), the various aspects of molecular and cellular function (enzymatic activities, posttranslational modifications, cleavage, translocation signals, 3D structures, effects of mutations, pathway relationships, etc.) are predicted. This biological insight can then be used for planning validation experiments in cooperation with collaborators from other institutes or in the division’s own protein biochemical laboratory.
Our group has covered a wide range of projects during the last year, from a proteomic analysis of neural cortical stem cells in dicer knock-out mice, via the identification of an epilepsy candidate gene, to the prediction of amyloid fibre-forming peptides. As a characteristic example of our work, the analysis of the new swine-origin H1N1 influenza virus was among the first scientific works published during the pandemic outbreak. We concluded that, although the virus belongs to a new subtype variety, the mutations tend not to affect the potency of neuraminidase-inhibiting drugs but merely change the antigenic properties of the virus proteins. We have been critically involved in the establishment of an efficient analysis pipeline for viral sequences in Singapore that was later also extended to partners in Mexico from the Instituto Nacional de Medicina Genomica, the leading genome institute studying the virus sequences close to the source of the outbreak. Our immediate collaborators include the local hospitals and Singapore’s Ministry of Health for samples and GIS A*STAR for the sequencing. The particular contribution of our group is the surveillance of the ongoing evolution of the 2009 H1N1 influenza A virus and the effect that mutations could have on the biology of the virus, the severity of infection and the applicability of available antiviral drugs.
Figure 1: 3D model of the 2009 H1N1 neuraminidase. The bound antiviral drug is shown in green. Regions differing from the H5N1 avian flu and the 1918 H1N1 Spanish flu are shown in yellow. Mutations occurring among different patients within the first weeks of the 2009 outbreak appear red.
The publication of Maurer-Stroh et al. “Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites” in Biology Direct (2009) was a highlight since it drew the attention of the scientific and lay public to BII for its scientific work. The finding that drug resistant strains are almost absent from the circulating H1N1 virus population was of general medical importance. This paper has been downloaded by about 11,000 readers since May 2009; thus, it is the most accessed publication on Biology Direct during the last 12 months and it belonged to the 10 most viewed articles in Biomed Central (having more than 200 journals) during May 2009.
Dr Vachiranee Limviphuvadh et al. received the “Best Poster Award” for their poster entitled “Analysis of the molecular mechanisms of known and predicted disease mutations in LGI epilepsy genes” during the 8th International Conference on Bioinformatics held in Biopolis, Singapore from 7 to 11 September 2009.
Recent Publications
1. Van Damme P, Maurer-Stroh S, Plasman K, Van Durme J, Colaert N, Timmerman E, De Bock PJ, Goethals M, Rousseau F, Schymkowitz J, Vandekerckhove J, Gevaert K. Analysis of protein processing by N-terminal proteomics reveals novel species-specific substrate determinants of granzyme B orthologs. Mol Cell Proteomics. 2009 Feb;8(2):258-72.
2. Dhar PK, Thwin CS, Tun K, Tsumoto Y, Maurer-Stroh S, Eisenhaber F, Surana U. Synthesizing non-natural parts from natural genomic template. J Biol Eng. 2009 Feb 3;3:2.
3. Reumers J, Maurer-Stroh S, Schymkowitz J, Rousseau F. Protein sequences encode safeguards against aggregation. Hum Mutat. 2009 Mar;30(3):431-7.
4. Maurer-Stroh S, Ma J, Lee RT, Sirota FL, Eisenhaber F. Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites. Biol Direct. 2009 May 20;4(1):18.
5. Yamamoto Y, Ihara M, Tham C, Low RW, Slade JY, Moss T, Oakley AE, Polvikoski T, Kalaria RN. Neuropathological correlates of temporal pole white matter hyperintensities in CADASIL. Stroke. 2009 Jun;40(6):2004-11.
6. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, Maurer-Stroh S, Wong WC, Schleiffer A, Eisenhaber F, Schneider G. ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W435-40.
7. Wong WC, Cho SY, Quek C. R-POPTVR: A novel reinforcement-based POPTVR fuzzy neural network for pattern classification. IEEE transactions on neural networks, 2009 Jul 5; v20, n11, pp1740-1755.
8. Van Durme J, Maurer-Stroh S, Gallardo R, Wilkinson H, Rousseau F, Schymkowitz J. Accurate prediction of DnaK-peptide binding via homology modelling and experimental data. PLoS Comput Biol. 2009 Aug;5(8):e1000475.
9. Zhao C, Zhang H, Wong WC, Sem X, Han H, Ong SM, Tan YC, Yeap WH, Gan CS, Ng KQ, Koh MB, Kourilsky P, Sze SK, Wong SC. Identification of novel functional differences in monocyte subsets using proteomic and transcriptomic methods. J Proteome Res. 2009 Aug;8(8):4028-38.
10. Eisenhaber F, Kwoh CK, Ng SK, Sung WK, Wong L. Brief overview of bioinformatics activities in Singapore. PLoS Comput Biol. 2009 Sep;5(9):e1000508.
11. Papan C, Chen L. Metabolic fingerprinting reveals developmental regulation of metabolites during early zebrafish embryogenesis. OMICS. 2009 Oct;13(5):397-405.
12. Zhang G, Liu T, Wang Q, Chen L, Lei J, Luo J, Ma G, Su Z. Mass spectrometric detection of marker peptides in tryptic digests of gelatin: A new method to differentiate between bovine and porcine gelatin. Food Hydrocolloids Volume 23, Issue 7, 2009 Oct, Pages 2001-2007.
13. Ranganathan S, Eisenhaber F, Tong JC, Tan TW. Extending Asia Pacific bioinformatics into new realms in the “-omics” era. BMC Genomics. 2009 Dec 3;10 Suppl 3:S1.
14. Carugo O. and Eisenhaber F. (editors) Data Mining Techniques for the Life Sciences. Humana Press and Springer Business Media. New York 2009.
15. Carugo,O., and Eisenhaber,F. 2009. Preface: Electronic databases in life science research. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. v-viii.
16. Eisenhaber,B., and Eisenhaber,F. 2009. Prediction of Posttranslational Modification of Proteins from Their Amino Acid Sequence. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 365-384.
17. Ooi,H.S., Schneider,G., Lim,T.-T., Chan,Y.-L., Eisenhaber,B., and Eisenhaber,F. 2009. Biomolecular Pathway Databases. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 129-144.
18. Ooi,H.S., Schneider,G., Chan,Y.-L., Lim,T.-T., Eisenhaber,B., and Eisenhaber,F. 2009. Databases of Protein–Protein Interactions and Complexes. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 145-160.
19. Schneider,G., Wildpaner,M., Sirota,F.L., Maurer-Stroh,S., Eisenhaber,B., and Eisenhaber,F. 2009. Integrated Tools for Biomolecular Sequence-Based Function Prediction as Exemplified by the ANNOTATOR Software Environment. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 257-268.
20. Sohail A, Wenyu B, Lee RTC, Maurer-Stroh S and Wah IG. F-BAR domain proteins: families and function. Communicative & Integrative Biology, in press.
21. Sirota FL, Ooi HS, Gattermayer T, Schneider G, Eisenhaber F and Maurer-Stroh S. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics, in press.
22. Limviphuvadh V, Chua LL, Eisenhaber F, Adhikari S, Maurer-Stroh S. Is LGI2 the candidate gene for partial epilepsy with pericentral spikes? Journal of Bioinformatics and Computational Biology, in press.
23. Kawase-Koga Y, Low R, Otaegi G, Pollock A, Deng H, Eisenhaber F, Maurer-Stroh S, and Sun T. RNAase III enzyme Dicer maintains signaling pathways for differentiation and survival in mouse cortical neural stem cells. Journal of Cell Science, in press.
24. Maurer-Stroh S, Debulpaep M, Kuemmerer N, Lopez de la Paz M, Martins IC, Reumers J, Copland A, Serpell L, Serrano L, Rousseau F, Schymkowitz JWH. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nature Methods, accepted.
Principal Investigators’ Biographies
Sebastian Maurer-Stroh studied theoretical biochemistry in the group of Peter Schuster at the University of Vienna and wrote his master’s and PhD theses while
working in Frank Eisenhaber’s lab at the Institute of Molecular Pathology (IMP) in Vienna. After a Marie Curie postdoctoral fellowship at the VIB-SWITCH lab in
Brussels, he joined the A*STAR Bioinformatics Institute (BII) in Singapore, where he has headed the Protein Sequence Analysis Group in the Biomolecular
Function Discovery Division since 2007. He has contributed widely used predictors for posttranslational modifications and catalyzed new biomolecular insights
by sequence-based function predictions. (Details: http://www.bii.a-star.edu.sg/research/biography/sebastianms.php)
Frank Eisenhaber’s research interest is focused on the discovery of new biomolecular mechanisms with theoretical and biochemical approaches and the
functional characterization of yet uncharacterized genes and pathways. Frank Eisenhaber is one of the scientists credited with the discovery of the SET domain
methyltransferases, ATGL, kleisins, many new protein domain functions and with the development of accurate prediction tools for posttranslational modifications
and subcellular localizations. He studied mathematics at the Humboldt-University in Berlin and biophysics and medicine at the Pirogov Medical University
in Moscow. He received his PhD from the Engelhardt Institute of Molecular Biology in Moscow. After postdoctoral work at the Institute of Molecular Biology
in Berlin-Buch (1989-1991) and at the EMBL in Heidelberg (1991-1999), he worked as team leader at the Institute of Molecular Pathology (IMP) in Vienna
(1999-2007). Since August 2007, he has been the Director of the Bioinformatics Institute, A*STAR Singapore. (Details: http://www.bii.a-star.edu.sg/research/biography/franke.php)
Georg SCHNEIDER
Postdoctoral Fellow:
Fernanda SIROTA
Research Associates:
OOI Hong Sain; Durga KUCHIBHATLA; Tobias GATTERMAYER; Wilson KWO Chia Yee; Nigel TAN Yeow Lam
ANNOTATOR SOFTWARE DEVELOPMENT
The ANNOTATOR group is developing software tools for sequence analytic tasks. To this end we integrate a large number of publicly available algorithms while at the same time implementing our own heuristics and workflows.
The ever increasing amount of data flowing into biological databases shows no signs of leveling off. Sequencing technology is improving at an unprecedented rate, bringing down the time it takes to decipher entire genomes to a matter of days. Making sense of this data by predicting molecular function is a time-consuming and tedious manual task. The number of new sequence analytic methods constantly being added to the toolbox of the computational biologist requires knowledge about a vast array of different interfaces, execution parameters and input formats.
The ANNOTATOR group is developing an advanced tool for functional characterization of sequences and strives to establish the ANNOTATOR software environment as the de-facto standard in this field. This is achieved by providing a number of public web-services based on the ANNOTATOR technology, with ANNIE (http://annie.bii.a-star.edu.sg) [1] being the most recent addition. The scope of work includes the integration of established algorithms as well as research into novel heuristics for tracing distant evolutionary relationships [2,3]. Due to the complex nature of sequence analytic algorithms, it is necessary to additionally consider aspects of high performance and distributed computing [4].
Since our aim is to develop sequence-analytic tools that can be used by biologists, special emphasis is put on the development of a user-friendly interface. An AJAX-based sequence-visualizer allows for the interactive display of function predictions (see Figure 1), while alternative views highlight different aspects of biological objects. As an example, it is possible to get immediate access to the distribution of predicted structural or functional features of a set of sequences using its associated histogram view.
The three-dimensional structure of proteins provides additional functional information and the ANNOTATOR framework has been enhanced with a number of methods for mapping sequence conservation to structural models. We are currently also working on incorporating algorithms for homology modeling which will seamlessly integrate with heuristics for the collection of protein families.
Sequence analysis requires the application of a wide range of algorithms, the results of which have to be interpreted and used to decide on further steps. This naturally leads to a view of these tasks as workflows, with decisions being made by the application of rules. We are currently extending the ANNOTATOR with the ability to design workflows and capture rules from biologists who are taking sequence analytic decisions. The final implementation will feature an easy-to-use workflow editor that allows bioinformaticians to connect pre-defined sequence-analytic building blocks into higher-level tasks.
Figure 2: A query sequence is projected onto the pathway graph, which can then be interactively navigated.
A large number of external algorithms are plugged into the ANNOTATOR and can be used to analyze sequences [5]. Applicable external algorithms are presented in a way that closely follows the standard procedure for segment-based sequence analysis, which is based on the assumption that proteins are chains of functional units that can be analyzed independently, with the overall function of the protein arising from the synthesis of the functions predicted for each individual module. The procedure first uses algorithms for the detection of non-globular regions, which are segments with a compositional bias or repetitive patterns that often represent linker regions, fibrillar segments, flexible binding sites or points of posttranslational modifications. The subsequent step is to run algorithms for the identification of known globular domains. As a final step, iterative heuristics have to be applied to uncover weak links in sequence space and collect a family of protein sequence segments that contain yet unknown globular domains.
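The three-step, segment-based procedure described above can be illustrated with a minimal Python sketch. It is not the ANNOTATOR implementation: the sliding-window Shannon entropy filter is only a simple stand-in for dedicated compositional-bias detectors, the exact-substring "domain library" stands in for real domain searches, and the window size, entropy threshold, function names and the `HYPO_DOMAIN` motif are all assumptions made for illustration. The third step (iterative family collection) is not implemented; the sketch only reports the segments that would be handed to it.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2):
    """Step 1 (stand-in): flag residues in any window with low entropy."""
    mask = [False] * len(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def mask_to_segments(mask, value: bool):
    """Return half-open (start, end) intervals where mask == value."""
    segments, start = [], None
    for i, flag in enumerate(mask):
        if flag == value and start is None:
            start = i
        elif flag != value and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


def find_known_domains(seq: str, segments, domain_motifs: dict):
    """Step 2 (stand-in): exact motif lookup inside globular segments."""
    hits = []
    for start, end in segments:
        for name, motif in domain_motifs.items():
            pos = seq.find(motif, start, end)
            if pos != -1:
                hits.append((name, pos, pos + len(motif)))
    return hits


def analyze(seq: str, domain_motifs: dict, min_len: int = 8):
    """Run steps 1 and 2; list segments left over for step 3."""
    mask = low_complexity_mask(seq)
    non_globular = mask_to_segments(mask, True)
    globular = mask_to_segments(mask, False)
    known = find_known_domains(seq, globular, domain_motifs)
    # Step 3 is only prepared here: globular segments without a known
    # domain are the candidates for iterative family collection.
    unexplained = [
        (s, e) for s, e in globular
        if e - s >= min_len
        and not any(s <= ks and ke <= e for _, ks, ke in known)
    ]
    return {"non_globular": non_globular,
            "known_domains": known,
            "unexplained": unexplained}


# Hypothetical toy input: two diverse regions flanking a serine-rich linker.
toy_motifs = {"HYPO_DOMAIN": "HIKLMN"}
toy_seq = "ACDEFGHIKLMNPQRSTVWY" + "S" * 20 + "ACDEFGHIKLMNPQRSTVWY"
toy_result = analyze(toy_seq, toy_motifs)
```

In this toy case the serine run is reported as a single non-globular segment and the motif is found once in each flanking region, so nothing is left over for the third step.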
For most sequence or structural features there are a number of
distinct predictors, whose methods are based on different underlying
principles. As an example, several algorithms for the prediction of
protein disorder, which plays an important role in structural and
functional genomics, are available. In order to use these in automated
workflows, it is necessary to identify parameter sets and threshold
selections under which the performance of the predictors becomes
directly comparable. To this end we derived new benchmark sets
and used them to identify settings, in which the different predictors
have the same false positive rate [6].
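One simple way to make predictors with different score scales comparable is to pick, for each predictor, the score threshold that yields the same false positive rate on a shared set of known negatives. The sketch below illustrates that idea only; the text does not say how the settings were actually derived, and the function names, predictor labels and scores are made up for illustration.

```python
def threshold_for_fpr(negative_scores, target_fpr):
    """Return a threshold t such that the fraction of negatives with
    score > t does not exceed target_fpr (a call is 'score > t')."""
    ordered = sorted(negative_scores, reverse=True)
    allowed = int(target_fpr * len(ordered))  # negatives allowed above t
    if allowed >= len(ordered):
        return float("-inf")
    return ordered[allowed]


def false_positive_rate(negative_scores, threshold):
    """Fraction of known negatives that would be called positive."""
    calls = sum(1 for s in negative_scores if s > threshold)
    return calls / len(negative_scores)


def calibrate(predictor_negatives, target_fpr):
    """Map each predictor name to its threshold at the common target FPR."""
    return {name: threshold_for_fpr(scores, target_fpr)
            for name, scores in predictor_negatives.items()}


# Hypothetical scores of two predictors on the same 10 negative examples.
negatives = {
    "predictor_A": [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.70, 0.90],
    "predictor_B": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
thresholds = calibrate(negatives, target_fpr=0.2)
```

With thresholds chosen this way, a downstream workflow can treat a positive call from either predictor as carrying the same specificity, whatever the raw score scale.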
Figure 1: Sequence Analysis of LGI2 with highlighted EPTP domain
Proteins don’t work in isolation and gaining functional insights increasingly depends on understanding their roles within complexes of interacting partners and pathways. However, the relevant data are spread across several major databases, each of them using different models and identifiers. We have created a data-model which allows us to integrate data from many different sources and navigate it in a unified manner. The pathway portion of the viewer has been implemented using an interface inspired by Google Maps (see Figure 2).
Recent Publications
1. Sirota FL, Ooi HS, Gattermayer T, Schneider G, Eisenhaber F, Maurer-Stroh S. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics 11, S15 (2010).
2. Mujezinovic, N., Schneider, G., Wildpaner, M., Mechtler, K. & Eisenhaber, F. Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction. BMC Genomics 11, S13 (2010).
3. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, Maurer-Stroh S, Wong WC, Schleiffer A, Eisenhaber F, Schneider G. ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res 37, W435-440 (2009).
4. Ooi,H.S., Schneider,G., Lim,T.-T., Chan,Y.-L., Eisenhaber,B., and Eisenhaber,F. 2009. Biomolecular Pathway Databases. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 129-144.
5. Ooi,H.S., Schneider,G., Chan,Y.-L., Lim,T.-T., Eisenhaber,B., and Eisenhaber,F. 2009. Databases of Protein–Protein Interactions and Complexes. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 145-160.
6. Schneider,G., Wildpaner,M., Sirota,F.L., Maurer-Stroh,S., Eisenhaber,B., and Eisenhaber,F. 2009. Integrated Tools for Biomolecular Sequence-Based Function Prediction as Exemplified by the ANNOTATOR Software Environment. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 257-268.
7. Neuberger, G., Schneider, G. & Eisenhaber, F. pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model. Biol Direct 2, 1 (2007).
8. Maurer-Stroh S, Koranda M, Benetka W, Schneider G, Sirota FL, Eisenhaber F. Towards complete sets of farnesylated and geranylgeranylated proteins. PLoS Comput Biol 3, e66 (2007).
9. Schneider G, Neuberger G, Wildpaner M, Tian S, Berezovsky I, Eisenhaber F. Application of a sensitive collection heuristic for very large protein families: evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases. BMC Bioinformatics 7, 164 (2006).
Principal Investigator’s Biography
Georg Schneider received his PhD from the University of Vienna, Austria. Prior to joining the Bioinformatics Institute he was leading the software development of
sequence analytic projects at the Institute of Molecular Pathology in Vienna, Austria. (Details: http://www.bii.a-star.edu.sg/research/biography/georgs.php)
With microscopy, the subcellular localization of the non-secreted
mutant variants was studied. COS7 cells were transfected with
expression constructs for each mutation including a C-terminal
GFP-tag based on the pEGFPN3 vector. COS7 cells with the empty
vector were used as controls. We found that LGI1 WT and LGI1
K353E are localized predominantly in the Golgi and partially in the
ER. But the other mutants are more predominant in the ER (Fig.
A). As for LGI2 and its mutants, they show similar localization in
both ER and Golgi with the exception of LGI2 V420E which is
localized mainly in the ER (Fig. B).
8
A poster by the Biomolecuar Function Discovery Division was one
of the 3 recipients of the “Outstanding Poster Award” at the A*STAR
Scientific Conference held in Singapore from 18 to 20 November
2008.
Sharmila ADHIKARI
Postdoctoral Fellows:
Neelagandan KAMARIAH; Subhashri RAMABADRAN;
TOH Yew Kwang; XIN Hongyi
Research Associates:
Left to Right: Dr. Sebastian Maurer-Stroh, Mr. Lim Chuan Poh(Chairman, A*STAR),
Dr. Sharmila Adhikari, Dr. Vachiranee Limviphuvadh
Photo by Bernard Chan, A*STAR
Michaela SAMMER; GUO Fusheng; Nicholas HO Rui Yuan;
LUA Wai Heng; LIEW Lailing; Winnie TAY Yu Ling
Although several genes responsible for some types of epilepsy have
already been identified, clinical description of most epilepsy cases
ends at the phenotypic level. Genetic testing is needed to increase
our understanding of the underlying molecular basis. We aim to
identify mutations in a set of selected genes already known to be
associated with certain types of epilepsy as well as to screen
reasonable new candidates from epilepsy hotspots. Screening for
mutations in epilepsy genes is important to classify genotypically
distinct forms of epilepsy which later can be applied for more
detailed diagnosis. We also aim to elucidate the molecular
mechanisms that trigger the respective epileptic attacks through
cell and molecular biology experiments that could lead to the
development of better or new drugs in the future.
EXPERIMENTAL VERIFICATION
OF PREDICTED MOLECULAR
AND CELLULAR FUNCTIONAL
PROPERTIES OF UNCHARACTERIZED
GENE PRODUCTS
The protein biochemistry/molecular biology group is involved in the
verification of sequence-analytic hypotheses with regard to molecular
and cellular functions of proteins. The targets are selected from
proteome-wide screens or from gene sets provided by collaborators.
This work will lead us to new biological insight and to the discovery
of new biological processes and mechanisms. In the future, the lab
will also generate experimental data that can be used for the
development of new prediction tools. Among the ongoing projects,
there are nuclear shuttling of transcription factors, parasite proteins
of human pathogens, targets in development and differentiation in
Drosophila melanogaster, human disease mutations and a structural
biology target related to GPI lipid anchor biosynthesis.
Partial epilepsy with pericentral spikes (PEPS) is a familial epilepsy with
disease locus mapped to human chromosome region 4p15; yet, the
causative gene is unknown. Analysis was performed to all 52 genes
known to be located in the PEPS disease map locus (4p15). We found
that only 14 of these genes are common to be deleted in patients with
similar epilepsy-related seizure phenotype. Based on functional
characteristics derived from the sequences of the encoded proteins, we
conclude that the gene LGI2 is the most likely candidate to be associated
with PEPS. LGI2 has considerable similarity to LGI1, LGI4 and GPR98
for which mutations have been found to be directly associated with
different forms of epilepsy. We experimentally investigated the effect of
point mutations in LGI1 and LGI2. We observed a reproducible phenotype
in terms of lack of protein secretion (resulting in loss of function) for
both LGI1 and LGI2 if structurally homologous positions are mutated
that are conserved throughout the LGI family and known to cause disease
in LGI1. Hence, we suggest that the underlying cellular disease mechanism
is similar for LGI1 and LGI2 and each of the LGI family members may
be responsible for phenotypically similar, mechanistically related but
genotypically distinct forms of epilepsy.
Recent Publications
1. Limviphuvadh, V., Chua, LL., Eisenhaber, F., Adhikari, S., and Maurer-Stroh, S.
Is LGI2 the candidate gene for partial epilepsy with pericentral
spikes? J Bioinform Comput Biol. (In press).
2. Limviphuvadh, V., Chua, LL., Eisenhaber, F., Maurer-Stroh, S., and
Adhikari, S. Similarity of molecular phenotype between known epilepsy
gene LGI1 and disease candidate gene LGI2. (submitted).
3. Grillari J, Löscher M, Denegri M, Lee K, Fortschegger K, Eisenhaber F,
Ajuh P, Lamond AI, Katinger H, Grillari-Voglauer R. Blom7alpha is a
novel heterogeneous nuclear ribonucleoprotein K homology domain
protein involved in pre-mRNA splicing that interacts with SNEVPrp19-Pso4.
J Biol Chem. 2009 284(42):29193-204.
4. Tse WK, Eisenhaber B, Ho SH, Ng Q, Eisenhaber F, Jiang YJ. Genome-wide
loss-of-function analysis of deubiquitylating enzymes for zebrafish
development. BMC Genomics. 2009 30; 10(1):637.
5. Adhikari S, Bhatia M. H2S-induced pancreatic acinar cell apoptosis is
mediated via JNK and p38 MAP kinase. J Cell Mol Med. 2008
12(4):1374-83.
6. Benetka, W., Mehlmer, N., Maurer-Stroh, S., Sammer, M., Koranda, M.,
Neumuller, R., Betschinger, J., Knoblich, J.A., Teige, M., and Eisenhaber, F.
Experimental testing of predicted myristoylation targets involved in
asymmetric cell division and calcium-dependent signalling. Cell Cycle
2008 7: 3709-3719.
Principal Investigator’s Biography
Sharmila Adhikari was appointed as a Principal Investigator at the Bioinformatics Institute A*STAR Singapore in August 2008. She leads a biochemistry and molecular
biology lab that aims to bridge the gap between theoretical predictions of proteins with unknown functions and their cellular biology, which can subsequently be used
to aid the identification of novel drug targets. She obtained her PhD degree at the National University of Singapore and worked as a postdoctoral research fellow at the
Department of Pharmacology, Yong Loo Lin School of Medicine, NUS. (Details: http://www.bii.a-star.edu.sg/research/biography/sharmilaa.php)
Subcellular localization of A) LGI1 and B) LGI2 GFP-tagged wild type and mutant
protein. COS7 cells were transfected with GFP-fused protein (green) as indicated
and treated with either anti-PDI antibody followed by Alexa 546 (red) to stain the
endoplasmic reticulum or TR Ceramide to detect the Golgi apparatus, and with DAPI
(blue) to stain the nuclei, and then examined by laser confocal fluorescence
microscopy. The fields shown were visualized independently at the appropriate
wavelength for GFP (488 nm) and anti-PDI or TR Ceramide (546 nm), and then
the two images were merged. Original magnification: 63x.
Chandra VERMA
Postdoctoral Fellows:
Amor A. SAN JUAN; Madhumalar AMURUMUGAM; Shubhra
GHOSH DASTIDAR; Gloria FUENTES; Thomas Leonard
JOSEPH; Devanathan RAGHUNATHAN; JAGADEESH M.
Nanjegowda
Research Associates:
LOW Soo Mei; QUAH Soo Tng; SIAU Jia Wei; TAN Yaw Sing
PhD Student:
Suryani LUKMAN
Atomistic Simulations and
Design in Biology
Mechanisms underlying biological processes at a molecular level are
explored through identifying and/or mapping the character of proteins
and their interactions with other proteins, nucleic acids and ligands. The
methods used are computational and combine representations at
various levels from the coarse grained to fully atomistic. The work
builds upon foundations that are rooted in rigorous computational
biochemistry benchmarked extensively against available experimental
data. In particular, simulations are combined with detailed experimental
information through extensive collaborations with experimental
laboratories to provide incisive insights into biology at an atomic
level. The group’s current research focus is on the p53 pathway,
kinases, translation initiation, ATP-synthases, 14-3-3, defensins and
basic structural/computational biophysical chemistry.
The usual arsenal of tools is utilised: construction of models based
on “imagination with a whiff of hand-waving”, homology modelling,
molecular dynamics, free energy, normal modes, reaction paths to
examine shape shiftings in proteins, electrostatics, ligand-protein/
protein-protein dockings including virtual screening (the docking
program also includes the development of novel scoring methods or
modifications of existing ones). On the one hand, the group examines how
native and mutant forms of proteins may (mis)function in their
interactions, while on the other, there is an extensive program that
is directed towards ligand/drug discovery and protein/peptide design
both from a therapeutic as well as a (bio)technological perspective.
The group has extensive links with a variety of experimental labs,
including the group’s own attempts at “wetting their hands” so that
the hypotheses are subject to rigorous testing and validation.
An extensive program investigating the relationship between the
structural and functional aspects of the p53 family has revealed several
insights, most notably of how the underlying dynamics critically controls
interactions, both prior to and after binding. Movies of these processes
show for the first time how intermolecular interactions are orchestrated
through multiple subtle and networked interactions of both partners
and how these interactions are mutually modulated. Indeed the
thermodynamic basis underlying these interactions revealed for the
first time the degeneracy that is inherent in these systems (this work
published in JACS was selected for special mention).
This has led to the design of a unique entropically driven peptide
and more recently of novel small molecules that have the potential
to be exploited as leads for developing therapeutics.
Paper titled “Multiple Peptide Conformations Give Rise to Similar Binding
Affinities: Molecular Simulations on p53-MDM2” was selected by the Journal
of the American Chemical Society (JACS) as one of the best 23 papers
published in the journal in the last two years. Selected best papers
were published in JACS Select #3 on 10 Dec 2008.
Figure 1: The MDM2/MDMX-p53 binding mechanism depends on the stability of the α-helical
motif formed by the residues 19-25 of the transactivation domain of p53 (cyan cartoon) which
displays three hydrophobic side chains F19, W23 and L26 (cyan sticks) appropriately for
optimal interactions with MDM2. Peptides or small molecules such as nutlin can displace this
region of p53 from MDM2, and induce apoptosis by stabilizing p53 in tumour cells. Computer
simulations show how the dynamics of the surfaces modulate each other and how the narrower
binding cleft in MDMX (in green surface) is less amenable to inhibition by nutlin (clash between
the magenta molecule and the green surface is evident) compared to MDM2 (the magenta
nutlin fits very well into the orange surface of MDM2). For the first time, simulations carried
out by the group have shown how the dynamics of the interacting surfaces controls the
interactions and the observed discriminations towards p53 and nutlin by MDM2 and
MDMX.
This highly successful program is in close collaboration with the p53
laboratory of Prof Sir David Lane. More recently the expertise of the group
has led to the development of collaborations with the University of
Edinburgh, Beatson Institute, Ludwig Institute, INSERM. In parallel, the
effort with the Lane lab is also focused on understanding the mechanisms
of the translational initiation cascade and on designing aptamers, peptides and
small molecules to inhibit a key component in this pathway, eIF4E, which
offers opportunities as a major target for therapeutic intervention in several
cancers. We have recently begun detailed crystallographic, biophysical
and simulation based interrogation of the system.
The recent developments and the excitement generated in the p53
field have been outlined in a Nature Reviews Cancer article that
was published recently through the joint efforts of teams from Singapore,
Karolinska & Cambridge University.
A Review titled Awakening
guardian angels: drugging the
p53 pathway by Christopher
J. Brown, Sonia Lain, Chandra
S. Verma, Alan R. Fersht and
David P. Lane, published in
Nature Reviews Cancer 9,
862–873 (1 December 2009)
| doi:10.1038/nrc2763 was
featured on the Nature
Reviews Cancer website
In a related and highly successful effort that involves close
collaboration with the group of Prof Beuerman and Dr Jagadeesh
Mavinahalli at the Singapore Eye Research Institute and researchers
at Nanyang Technological University and National University of
Singapore, the group has successfully been designing and
investigating novel antibiotics based on defensins that appear to
show selective activity for certain bacteria with little or no activity
against human cells. The success of this project, illustrated by the
filing of two patents, has attracted seed money and the attention
of several industries. The virtual screening efforts of the group have
received a recent boost with the successful identification of a set
of molecules that appear to show promise as potential lead compounds
in the development of antibiotics, targeted against bacterial enzymes.
This work is carried out with the Experimental Therapeutics Centre,
A*STAR, which is carrying out the experimental investigations and
synthesis of compounds that are generated from the virtual
screens.
In a major translational effort, the group is engaged with
experimentalists (Dr Scaltriti) and clinicians (Prof Baselga) at the
Vall d’Hebron Hospital in Barcelona, studying the effects of small
molecule and antibody based therapies for breast cancer. A
significant breakthrough has emerged in the understanding of
molecular mechanisms that underlie the observation of synergism
between kinase inhibitors and antibody-based therapy for breast
cancers that are characterized by overexpressed HER2 receptors.
This work is now being extended to understand the molecular basis
underlying the cooperative interactions that are increasingly being
recognized as being of significance for improving the therapeutic
potential of existing and new therapies that target the EGFR (and
related) receptor families in oncology.
In parallel, the group is also involved in establishing detailed
mechanisms that underpin experimental observations in a variety
of systems. An ongoing successful effort is with the laboratory of
Prof Gruber at the Nanyang Technological University in developing
models of the ATPase machinery. Together with detailed structural
characterization of elements of this enzyme, we provide, using
simulations, an understanding of how dynamics modulate the
assembly and function of this complex machine.
We have been investigating mechanisms of signalling among PAK
kinases and 14-3-3 proteins, together with Prof Manser of the
Institute of Medical Biology, A*STAR. Recently, this has led to
the finding that a phosphomimetic mutant commonly used
to study signalling in PAK1, and assumed to be active, is in fact
inactive; simulations validated by experiments have provided
mechanistic details on the origin of this finding. This particular
development is significant because the phosphomimetic mutation
is widely used in kinase studies and insights such as those provided
by our studies can provide rapid screens for experimentalists to
identify systems where these phosphomimetics are indeed
active.
Figure 2: The interactions between the therapeutic antibodies pertuzumab (in yellow
on left) and trastuzumab (in black on right), both bound to their receptor, the extracellular domain of HER2 (in green). Both these antibodies have had success in the
clinic for breast cancer patients.
Recent Publications
1. Lane DP, Cheok CF, Brown CJ, Madhumalar A, Ghadessy FJ, Verma C.
The Mdm2 and p53 genes are conserved in the Arachnids. Cell Cycle
(2010 in press).
2. Brown CJ, Dastidar SG, See HY, Coomber DW, Ortiz-Lombardia M, Verma
C, Lane DP. Rational design and biophysical characterization of
Thioredoxin-based aptamers: insights into peptide grafting. J Mol Biol.
2010 (in press).
3. Lane DP, Cheok CF, Brown C, Madhumalar A, Ghadessy FJ, Verma
C. Mdm2 and p53 are highly conserved from placozoans to man. Cell
Cycle. 2010 9:1-8.
4. Dastidar SG, Madhumalar A, Fuentes G, Lane DP, Verma CS. Forces
mediating protein-protein interactions: a computational study of p53
“approaching” MDM2. Theoret Chem Accnts 2010 125:621-635.
5. Brown CJ, Lain S, Verma CS, Fersht AR, Lane DP. Awakening guardian
angels: drugging the p53 pathway. Nat Rev Cancer 2009 9:862-873.
6. Dastidar SG, Lane DP, Verma CS. Modulation of p53 binding to MDM2:
computational studies reveal important roles of Tyr100. BMC Bioinf
2009 15:S6.
7. Madhumalar A, Lee HJ, Brown CJ, Lane D, Verma C. Design of a novel
MDM2 binding peptide based on the p53 family. Cell Cycle 2009
8:2828-2836.
8. Bai Y, Liu S, Jiang P, Zhou L, Li J, Tang C, Verma C, Mu Y, Beuerman
RW, Pervushin K. Structure-dependent charge density as a determinant
of antimicrobial activity of peptide analogues of defensin. Biochemistry
2009 48:7229-7239.
9. Brown CJ, Verma CS, Walkinshaw MD, Lane DP. Crystallization of eIF4E
complexed with eIF4GI peptide and glycerol reveals distinct structural
differences around the cap-binding site. Cell Cycle 2009 8:1905-1911.
10. Scaltriti M, Verma C, Guzman M, Jimenez J, Parra JL, Pederson K, Smith
DJ, Landolfi S, Ramon y Cajal S, Arribas J, Baselga J. Lapatinib, a HER2
tyrosine kinase inhibitor, induces stabilization and accumulation of
HER2 and potentiates trastuzumab-dependent cell cytotoxicity. Oncogene
2009 28:803-814.
11. Dastidar SG, Lane DP, Verma CS. Multiple peptide conformations give
rise to similar binding affinities: molecular simulations of p53-MDM2.
Jl Amer. Chem Soc 2008 130:13514-13515.
Principal Investigator’s Biography
Chandra Verma carried out his undergraduate studies at IIT, Kanpur after which he studied for his D.Phil in York, UK. Subsequently he joined the York Structural Biology
lab where he remained until 2003 when he moved to the Bioinformatics Institute, Singapore. (Details: http://www.bii.a-star.edu.sg/research/biography/chandra.php)
While in some cases detecting such “conserved” regions in a protein
is not a very challenging task, our task as bioinformaticians is
attending to the cases when the information about the protein from
multiple species is scarce, or when the evolutionary correspondence
is difficult to interpret. This is an active field of research, in which
our group participates.
A Piece of Structure in a Haystack
The first technical hurdle in answering the above questions is
finding similar pieces of structure (as in Figure 2) in a database
the size of the modern Protein Data Bank. Since the onset of the
Information Era, people have assembled and deposited information
about tens of thousands of different protein structures, from all
kingdoms of life, mostly coming from X-ray crystallography and
nuclear magnetic resonance experiments. Sorting through this
volume of data in order to find relevant biological answers is an
interesting problem for a computer-inclined scientist. While computer
science teaches us how to efficiently retrieve a well defined entry
in a large database, biologically interesting answers are often to be
found in a twilight zone of loosely defined (from the computational
perspective) hits. Designing algorithms that balance the conflicting
requirements of efficiency and non-triviality of the search is currently
the focus of independent research in our group.
Ivana MIHALEK
Postdoctoral Fellows:
Kavitha Bharatham; Westley A. Sherman
Research Associates:
ZHANG Zong Hong; Sharon CHEE Min Qi
EVOLUTION OF PROTEIN
STRUCTURE AND FUNCTION
The aim of the Evolution of Protein Structure and Function (EPSF, not
to be confused with Encapsulated PostScript Format) group is to
reverse engineer the function of a protein through studying its
evolution. Bioinformatics is used to get the first inkling of the layout
and mechanism of these biological nanomachines, and computer
simulation to test, to the extent it currently allows, the reasonableness
of the interpretation of bioinformatics data. Ultimately, the goal is
to build a straightforward hypothesis which can then be tested
experimentally. Therefore, serious effort is invested into developing
ways to present the group’s findings in the most useful and compact
way to experimentalist colleagues.
How does evolution chisel out functionally relevant pieces
of a protein?
In evolution, as in any statistical process, anything that can happen
will happen. Compared, however, to the options open to a simple
physical system, “can happen” is a somewhat more elaborate
condition. While the physics of DNA stability may allow for a
mutation, this mutation might severely degrade the stability of the
protein it encodes, which in turn may kill the organism carrying the
mutation. A mutation at a different position might be irrelevant to
the protein stability, but it may adversely impact its interaction with
another protein, thus disrupting a pathway in the hapless
organism.
Keeping that scenario in mind, a comparison of proteins performing
the same function in living and thriving organisms can be done,
identifying the regions in the protein in which mutations, or certain
types of mutations, are conspicuously absent. Since it can reasonably
be assumed that mutations do happen sporadically in those places, as
they do in all underlying positions in the DNA, it is possible that
the carriers were eliminated from the gene pool because the mutation
resulted in some disadvantage for the organism, be it on the
translational, folding, or protein-protein interaction level. Therefore,
these are precisely the regions crucial for the protein function, the
regions we should focus our attention on.
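The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy alignment, the function names, and the use of per-column Shannon entropy as the conservation score are all hypothetical choices made here, not the group's actual method. Columns with zero entropy are the positions where mutations are "conspicuously absent".

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


def conserved_positions(alignment, max_entropy=0.0):
    """Indices of columns whose entropy does not exceed max_entropy."""
    profile = conservation_profile(alignment)
    return [i for i, h in enumerate(profile) if h <= max_entropy]


# Hypothetical toy alignment of the "same" protein from four organisms.
toy_alignment = ["MKHLAG", "MRHLSG", "MKHIAG", "MQHLTG"]
print(conserved_positions(toy_alignment))
```

In this toy case the first, third and last columns never vary, so they are reported as conserved. A real analysis would also need to weight sequences by evolutionary distance and distinguish types of substitution, which this sketch ignores.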
Figure 1: The correlation between the degree of conservation of a protein region
and the impact the mutation has on the organism is most readily observable in
the case of enzymes, small chemical factories that are a very common type of
protein. In the illustration (from an enzyme, called HPPD, from the
tyrosine degradation pathway), the most highly conserved regions (yellow) cannot
be mutated without causing the organism's demise, while the slightly less conserved
regions (red), if mutated, cause health problems of various degrees of severity.
What is the function of a functionally relevant piece
of protein structure?
Even in situations where the conserved regions on the protein
are clearly discernible, their functional role may be difficult to
interpret. Sometimes this task can be relegated to experiment.
For example, in a study of a protein called Ku (from the large group
of telomere-related proteins) the conservation map pointed to several
regions, which seemed quite mysterious. The results were handed
straight away to experimental colleagues, who were able to establish,
through site-directed mutagenesis, that several pathways were
critically affected, distinct pathways intriguingly assorting with
distinct protein regions.
However, it would be desirable to elucidate the role of such regions
in silico, thus focusing the experimental work even more narrowly.
Several options are open to a computational biologist at this point,
depending on the available information about the protein: if the
structure is known, computer simulation might shed some light on
its role in the structural changes the protein undergoes, the small
ligands it binds, and the interactions with other proteins it engages in.
In our group we use existing software as a tool to conduct
computational experiments designed to test our and our collaborators’
hypotheses about protein function.
Similar Structure – Similar Function?
Very similar protein structures often signal a very similar function
irrespective of the level of sequence similarity. The rule is not
infallible, and contrary examples exist both of similar structures
carrying a different function, and of so-called convergent
evolution in which similar functions are performed by different
structures. The latter case can be recognized by a certain degree
of local similarity of the structure. We have recently started pursuing
the former question: How far can we get in our bioinformatical
analysis by comparing pieces of similar structure, one of which has
a known physiological role? In particular, if the two (pieces of)
structure appear to be conformationally rearranged, does that imply
that both actually move in order to perform their job within a cell,
and were just caught at different stages of their action?
We are in the process of developing a set of methods, together with
the accompanying server (http://epsf.bmad.bii.a-star.edu.sg/struct_server.html)
to handle precisely this type of problem. The main
theme of the approach is to deconstruct the protein structure from
its gross structural features, such as the relative orientation of its
elements of secondary structure (helices and strands) down to the
minutiae of its atomic makeup, and use them in that order to retrieve,
and sort by relevance, similar pieces of structure, appearing either
as complete entries in Protein Data Bank, or as a substructure
thereof. The trick, of course, is to explain to the computer how this
is to be done, without losing the underlying ideas to the weak
implementation. This is the point at which we put on our computer
scientist’s hat. Next, biology comes into play: how do we keep the
search general enough to give useful answers to molecular biologists
specializing in different proteins and pathways? How do we use our
findings as a bridge to more of the existing knowledge, and yet
present the results in a way that is parseable both by a human and
by downstream computer applications? As the underlying quantity
of data increases, processing it becomes an ever larger task,
but the reward lies in the promise of giving ever more focused
answers to scientific queries.
Figure 2: Detecting a structure of unknown function (deposited in the Protein Data Bank
under identifier 3dcx; blue) as a substructure of a protein involved in cell adhesion,
called radixin (1gc6; red).
EPSF Group in the Scienceland
The main role of a bioinformatician is sieving through the existing
(already daunting and relentlessly growing) amount of biological
data and finding facts pertinent to the problem at hand. Working
in protein science, we are lucky enough to be able to push that
search a step further through explicit simulation of the physical
systems in question. Immediate goals aside, as computational
biologists we are trying to push forward the point at which the
experiment needs to be invoked. Ultimately, we are biologists, and
consider our contributions to the experimentally testable work our
most important accomplishments.
Recent Publications
1. Wang H, Chumnarnsilpa S, Loonchanta A, Li Q, Kuan YM, Robine S,
Larsson M, Mihalek I, Burtnick LD, Robinson RC. Helix straightening
as an activation mechanism in the gelsolin superfamily of actin
regulatory proteins. J Biol Chem. 2009 Aug 7;284(32):21265-9.
2. Ribes-Zamora A, Mihalek I, Lichtarge O, Bertuch AA. “Distinct faces
of the Ku heterodimer mediate DNA repair and telomeric functions.”
Nat Struct Mol Biol. 2007.
3. Mihalek I, Res I, Lichtarge O. “On itinerant water molecules and
detectability of protein-protein interfaces through comparative
analysis of homologues.” J Mol Biol. 2007 Jun 1;369(2):584-95.
4. Mihalek I, Res I, Lichtarge O. “Evolutionary Trace Report
Maker: a new type of service for comparative analysis of proteins.”
Bioinformatics. 2006 Jan 15;22(2):149-56.
5. Mihalek I, Res I, Lichtarge O. “A Family of Evolution-Entropy
Hybrid Methods for Ranking of Protein Residues by Importance.”
J Mol Biol. 2004;336(5):1265-82.
Principal Investigator’s Biography
Ivana got her undergraduate degree in physics from U. of Zagreb, Croatia (1993), and her PhD in Physics (2000) and MSc (2001) in computer science
from U. of Kentucky, USA. She worked as a postdoctoral fellow in bioinformatics in Baylor College of Medicine, Houston, USA until 2007, when she joined
BII. (Details: http://www.bii.a-star.edu.sg/research/biography/ivanam.php)
Mallur Srivatsan
MADHUSUDHAN
Postdoctoral Fellows:
Debashree BANDYOPADHYAY; NGUYEN Ngoc Minh
Research Associates:
Rowena CHEONG Wai Sim; TAN Kuan Pern
Modeling the 3D Structures
of Proteins and their
Complexes
The broad aim of our group is to develop and apply computational
tools to model the structural biology of molecular interactions in
the living cell. To this end, we combine the laws of physics with
experimental observation and statistics to develop computational
methods in structural biology. The methods are tested, often in
close collaboration with experimental biologists, on particular systems
of interest. Our research results in detailed information about cellular
processes and provides testable hypotheses that can then be verified
experimentally. In particular, we are interested in the following
problems:
1. Improving the accuracy of comparative protein
structure modeling and functional annotation
We are developing methods to accurately align protein sequences
to protein structures. Accurate alignments are key to building
accurate 3D models of proteins. These efforts include using a
structure based environment dependent gap penalty function
[Madhusudhan et al., 2006, Madhusudhan et al., 2009], and
substituting single sequences with their profiles during the alignment
process.
Methods that improve on modeling accuracy are often beneficial in
improving the accuracy of functional annotation of proteins.
Previously, we tested a decision tree based approach to predict the
structural/functional stability of single point mutants [Bajaj et al.,
2007], starting with the crystal structure of the native protein. We
extended the stability predictions to be made using homology models
instead of experimental structures. The stability prediction method
was also improved upon, with additional branches to the decision
tree that incorporated evolutionary information in the form of
sequence conservation. We studied the effect of alignment errors on
decision tree accuracy by using both sequence and structure alignments to
build the models, as well as the effect of using single versus multiple
templates. Our results showed that while small errors
in alignment accuracy did not change the prediction of stability,
the use of multiple templates improved upon the prediction accuracy
of models built using single templates.
2. Aligning the 3D structures of proteins independent
of their topology
We devised a new algorithm, CLICK, to align the 3D structures of
proteins using their Cartesian coordinates, secondary structure
content and residue-wise surface accessible areas. CLICK aligns
pairs of protein structures independent of their topology. This is a
powerful method to investigate structural similarity across protein
folds and protein families. CLICK is effective not only in giving the
optimal alignment between two protein structures but also in
detecting conformational changes, such as domain motions and
rigid body shifts. The method was extensively benchmarked on
several datasets of pairs of structurally similar proteins, both
topologically similar and topologically dissimilar. The method was
also compared with other frequently used structure alignment
algorithms. CLICK performs at the same level of accuracy as these
other methods, if not statistically significantly better. We are now
using CLICK to detect small molecule binding sites on proteins
(Figure 1), and protein-protein interaction interfaces on proteins
(Figure 2). The application of CLICK is not restricted to aligning
protein structures. The algorithm can be readily generalized and
used to align the 3D structures of any two molecules, such as DNA
or RNA.
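The report does not give CLICK's internals, so the following Python is only a hedged illustration of what aligning structures "independent of their topology" can mean. It matches small groups of points between two coordinate sets purely by their internal distances, ignoring the order in which the points appear along the chain. The function name, coordinates, and tolerance are hypothetical, and this brute-force triplet search is a swapped-in toy, not the CLICK algorithm.

```python
import math
from itertools import combinations, permutations


def matching_triplets(coords_a, coords_b, tol=0.5):
    """Find triplets of points in A and B whose three side lengths agree within tol.

    Each match is (tri_a, tri_b): the k-th index of tri_a corresponds to the
    k-th index of tri_b. Chain order plays no role in the comparison.
    """
    matches = []
    for tri_a in combinations(range(len(coords_a)), 3):
        sides_a = [math.dist(coords_a[i], coords_a[j])
                   for i, j in combinations(tri_a, 2)]
        # Try every ordered triplet of B so any point-to-point correspondence is allowed.
        for tri_b in permutations(range(len(coords_b)), 3):
            sides_b = [math.dist(coords_b[i], coords_b[j])
                       for i, j in combinations(tri_b, 2)]
            if all(abs(x - y) <= tol for x, y in zip(sides_a, sides_b)):
                matches.append((tri_a, tri_b))
    return matches


# Hypothetical coordinates: B holds the first three points of A,
# translated by (1, 1, 1) and listed in a different order.
struct_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (10.0, 10.0, 10.0)]
struct_b = [(1.0, 5.0, 1.0), (1.0, 1.0, 1.0), (4.0, 1.0, 1.0)]
print(matching_triplets(struct_a, struct_b))
```

The search recovers the correspondence between the two point sets even though the matching points are listed in a different order in each structure. A practical method would extend such seed matches, add features beyond coordinates, and finish with a least-squares superposition, none of which is attempted here.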
3. Predicting protein-protein interactions
Homology modeling is used to construct the models of target
interacting protein complexes. The method predicts the protein
constituents of the interacting complex and its 3D structure. The
3D structures of these complexes are modeled using the structural
similarity of the target proteins to constituents of known (template)
protein domain-domain interactions [Davis et al., 2006, Pieper et
al., 2005]. The complexes predicted are not restricted to pairs of
proteins. If multi-domain templates are present, multi-component
interacting protein complexes are predicted. The complexes are
assessed using a statistical potential constructed from residue
contacts across known protein domain-domain interfaces. Prediction
scores were calibrated for reliability. On a benchmark set of 100
interactions, the statistical potential accurately predicted interactions
in 97 cases. The method is also capable of distinguishing between
alternate modes of binding (Figure 3). Additional information, such
as functional annotation, and sub-cellular localization can be used
to enhance reliability. We are now developing methods that a) Model
protein interactions without the aid of full-length (entire domain
coverage) templates and b) Model all biological complexes, not
restricted to protein interactions.
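To make the idea of a statistical potential built from residue contacts concrete, here is a minimal Python sketch. It assumes a simple log-odds form, in which observed contact frequencies are compared with those expected from residue composition alone, and it uses a hypothetical toy training set. The function names, pseudocount, and data are illustrative assumptions and do not reproduce the group's potential or its calibration.

```python
import math
from collections import Counter


def build_potential(interface_contacts, pseudocount=1.0):
    """Log-odds score for each unordered residue-type pair (lower is more favourable)."""
    pair_counts = Counter(tuple(sorted(pair)) for pair in interface_contacts)
    residue_counts = Counter(res for pair in interface_contacts for res in pair)
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    types = sorted(residue_counts)
    n_pair_types = len(types) * (len(types) + 1) // 2

    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            # Smoothed observed frequency of the pair across known interfaces.
            observed = (pair_counts[(a, b)] + pseudocount) / (
                total_pairs + pseudocount * n_pair_types)
            # Frequency expected if residues paired at random by composition.
            fa = residue_counts[a] / total_residues
            fb = residue_counts[b] / total_residues
            expected = fa * fb if a == b else 2 * fa * fb
            potential[(a, b)] = -math.log(observed / expected)
    return potential


def score_interface(contacts, potential):
    """Sum the pair scores over all contacts of a candidate interface."""
    return sum(potential[tuple(sorted(pair))] for pair in contacts)


# Hypothetical training contacts taken from "known" interfaces.
known_contacts = ([("LEU", "ILE")] * 6 + [("LYS", "GLU")] * 6
                  + [("LEU", "LYS")] + [("ILE", "GLU")])
potential = build_potential(known_contacts)

native_like = [("LEU", "ILE"), ("LYS", "GLU")]
decoy = [("LEU", "LYS"), ("ILE", "GLU")]
print(score_interface(native_like, potential), score_interface(decoy, potential))
```

With this toy data the native-like interface receives a negative (favourable) score and the decoy a positive one, which is the kind of discrimination between binding modes the text describes. Real potentials are typically distance dependent and derived from many thousands of interfaces.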
Figure 1: The 3D structures of A. fulgidus Rio2 Kinase (PDB code 1zao; magenta)
and purt-encoded glycinamide ribonucleotide transformylase (PDB code 1kj9;
cyan) are superimposed using CLICK. The two proteins are unrelated by protein
fold, function, or sequence. The regions of structural similarity between the two
proteins lie in their ATP binding sites. The inset shows the similarities (dashed
blue lines denote hydrogen bonds) in the interaction between the proteins and
the ATP molecules that are bound to them. The conformation of the bound
ATP is similar in both proteins. These similarities exist despite the aforementioned
differences between the proteins.
Figure 2: On the left is the 3D structure of the protein assembly of two proteins
that are a part of the cellulosome from Clostridium thermocellum (PDB code 1aoh,
chains A and B; grey and purple respectively). The interacting interfaces between
the two proteins are shown in yellow and green. CLICK was used to detect regions
on other proteins that are similar to these interacting interfaces. A region similar to
the interface in 1aohA was found in 1g7kA (right, top), and a region similar to the interface
in 1aohB was found in 1g7kB (right, bottom). The crystal structure of the complex
of 1g7k, a red fluorescent protein from coral, shows similar interface association
to 1aoh. 1aoh and 1g7k belong to different protein families and their overall 3D
fold is different. Though the interaction interface regions are topologically different,
CLICK is able to detect their structural similarity.
Figure 3: Three camelid VHH domains, AMB7 (blue), AMD10 (orange) and AMD9
(green) bind to porcine pancreatic α-amylase (PPA, gray surface) through three
distinct binding modes (PDB codes 1kxt, 1kxv, and 1kxq, respectively). All three
interaction modes were evaluated for each VHH–PPA complex using the interface
statistical potential. The statistical potential is sensitive enough to distinguish
the native binding modes from the non-native modes.
Recent Publications
1. Madhusudhan MS, Webb BW, Marti-Renom MA, Eswar N, Sali A.
Alignment of multiple protein structures based on sequence and
structure features. Protein Eng Des Sel. 2009 22, 569-74.
2. Bajaj K, Madhusudhan MS, Adkar BV, Chakrabarti P, Ramakrishnan C,
Sali A, Varadarajan R. Stereochemical criteria for prediction of the effects
of proline mutations on protein stability. PLoS Comput Biol. 2007;
3(12):e241.
3. Davis FP, Braberg H, Shen MY, Pieper U, Sali A, Madhusudhan MS.
Protein complex compositions predicted by structural similarity. Nucleic
Acids Res. 2006 34(10):2943-52
4. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A,
Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY, Kelly L,
Melo F, Sali A. MODBASE: a database of annotated comparative protein
structure models and associated resources. Nucleic Acids Res.
2006;34(Database issue):D291-5.
5. Madhusudhan MS, Marti-Renom MA, Sanchez R, Sali A. Variable gap
penalty for protein sequence-structure alignment. Protein Eng Des Sel.
2006; 19(3):129-33.
Principal Investigator’s Biography
M. S. Madhusudhan joined the Bioinformatics Institute as a Principal Investigator in 2008. He received a Master’s degree in Physics from the University of
Pune and a PhD in Biophysics from the Molecular Biophysics Unit, Indian Institute of Science. This was followed by post-doctoral work in the lab of Andrej
Sali at the Rockefeller University and University of California, San Francisco. (Details: http://www.bii.a-star.edu.sg/research/biography/madhusudhan.php)
DNA tracts. We predict that such paired sequences can produce
triplex-forming ncRNAs (miRNAs and siRNAs) and thus might be involved
in mechanisms that silence or activate the expression of many essential
genes, and might be used for antigene therapy purposes.
2. Statistical and computational analysis of protein-DNA binding in the mammalian genome
TF-DNA binding loci are explored by analyzing massive datasets generated
by Chromatin Immuno-Precipitation (ChIP)-based high-throughput sequencing technologies. However, these datasets suffer
from a bias in the information about binding loci availability, sample
Vladimir KUZNETSOV
Postdoctoral Fellows:
Oleg GRINCHUK; Efthimios MOTAKIS; Philip PRATHIPATI;
Arsen BATAGOV; KYAW Tun; YENAMANDRA Surya Pavan
Research Associates:
Piroon JENJAROENPUN; OW Ghim Siong; Zack TOH Swee Heng
PhD students:
Thidathip WONGSURAWAT; Mikhail LUKYANOV
COMPUTATIONAL ANALYSIS
OF GENOME COMPLEXITY,
TRANSCRIPTION REGULATION
AND CELLULAR PHENOTYPES
We develop integrative computational and statistical analyses of
massive sequence datasets, matrixes of DNA-transcription factor
interactions, genome architectures, cis-antisense gene pairs, ncRNAs
and genome regulatory signals. We study the functions of these
sequences in transcriptional regulation and gene networks. We
predict and study the genes and genome modules which are
specifically associated with distinct phenotypes of cancerous cells
and which could be essential for cancer patients’ survival.
We also aim to develop novel theoretical and computational
frameworks for functional and structural analyses of the co-regulation of
protein-coding genes and ncRNAs. These frameworks integrate high-throughput
sequencing and expression data sets relating to mechanisms of
transcriptional control of ncRNA processing and to ncRNA functions.
In this context, we consider associations with sense-antisense gene pairing,
G-quadruplexes, triplex-forming oligonucleotide structures and RNA secondary
structures. The in silico predictions are to be
validated through wet-lab experiments.
Comprehensive catalogue for cis-antisense gene pairs and complex
genome architectures
Cis-antisense gene pairs (CASGPs) can transcribe mRNAs from
opposite strands of a given locus. To classify and understand diverse
CASGP phenomena in the human genome, we compiled a genome-wide catalog of CASGPs and integrated these sequences with
microarray, SAGE and miRNA data in our United Sense Antisense
Gene Pairs (USA GP) Database (http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link=about).
We identified up to
9,000 overlapping antisense loci; 4,374 of these CASGPs form
1,759 complex gene architectures. For the first time, we found a strong,
significant overrepresentation of human miRNA genes in loci of
CASGPs. Using USA GP, we found Structural-Functional Modules
of Cis-antisense gene pairs with precursors of microRNAs. We
developed a data-driven model of cross-talk between co-expressed
CASGPs and the DICER1-mediated miRNA pathway in normal
spermatogenesis and found that this cross-talk is switched off in
severe teratozoospermia. We discovered a novel cancer amplicon
(TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199) organized
in a complex sense-antisense architecture (CSAGA) on 17q11.2
and demonstrated its strong and reproducible co-regulatory
transcription pattern in breast cancer tumours (Figure 3). Data
analysis of expression profiles of 410 breast cancer patients revealed
the survival significance of these genes and identified patients with low
and high risk of disease recurrence.
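The definition of a cis-antisense gene pair given above (two genes transcribed from opposite strands of the same locus) can be illustrated with a minimal sketch. The gene names and coordinates are hypothetical, and the simple overlap test is an assumption made for illustration; it is not the procedure used to build the USA GP catalog.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical gene record: chromosome, start/end (1-based, inclusive), strand.
Gene = namedtuple("Gene", "name chrom start end strand")

def is_cis_antisense(a, b):
    """True if two genes overlap on the same chromosome but on opposite strands."""
    return (
        a.chrom == b.chrom
        and a.strand != b.strand
        and a.start <= b.end
        and b.start <= a.end
    )

def find_casgps(genes):
    """Return all cis-antisense pairs as sorted (name, name) tuples."""
    return [
        tuple(sorted((a.name, b.name)))
        for a, b in combinations(genes, 2)
        if is_cis_antisense(a, b)
    ]

# Illustrative coordinates only (not real annotations).
genes = [
    Gene("GENE_A", "chr17", 1000, 5000, "+"),
    Gene("GENE_B", "chr17", 4500, 9000, "-"),    # overlaps GENE_A, opposite strand
    Gene("GENE_C", "chr17", 8000, 12000, "+"),   # overlaps GENE_B, opposite strand
    Gene("GENE_D", "chr17", 11000, 15000, "+"),  # overlaps GENE_C, but same strand
]
pairs = find_casgps(genes)
print(pairs)
```

In this toy input, GENE_B takes part in two pairs. Chains of this kind are one plausible way in which individual pairs could link into larger architectures.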
Figure 3: A. USA GP Model: Integrative database and computational tool for comprehensive
analysis of cis-antisense gene pairing and expression in the human genome.
http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link.
The group received the “Best Paper
Award” at The First International
Conference on BioMedical
Engineering and Informatics, 27-30
May 2008, Sanya, Hainan, China
for a paper titled “Data-driven
Networking Reveals 5-Genes Signature
for Early Detection of Lung Cancer”.
Recent Publications
Figure 1: TTS mapping (http://ggeda.bii.a-star.edu.sg/~piroonj/TTS_mapping/TTS_mapping.php).
incompleteness and diverse sources of technical and biological noise. We
have developed an exploratory mixture probabilistic model for specific and
non-specific transcription factor-DNA (TF-DNA) binding (Figure 2). Within
ChIP-seq data sets, the statistics of specific and non-specific DNA-protein
binding are described by a mixture of two sample-size-dependent skewed functions:
the Kolmogorov-Waring (K-W) function (Kuznetsov, 2003) for specific binding and
an exponential function for non-specific binding (Figure 2). Using available ChIP-seq data,
we estimate (i) specificity and sensitivity of the ChIP-seq binding assays
and (ii) the number of specific but not experimentally validated binding sites
(BSs) in the genomes of cancers and embryonic stem cells. We conclude
that estimation of the binding sensitivity of a TF cannot be technically
resolved by current ChIP-seq, compared to former techniques. Our results
suggest that low- and moderate-avidity TFBSs are highly abundant in the
mouse and other mammalian genomes and can play biologically meaningful
functional roles.
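The idea of separating specific from non-specific binding with a two-component mixture can be sketched as follows. This is only an illustration under stated assumptions: the noise component is a geometric (discrete exponential-type) distribution, and a truncated discrete power law is used as a stand-in for the skewed K-W function, whose exact form is not given here. All parameter values are hypothetical.

```python
def geometric_pmf(k, q):
    """Discrete exponential-type (geometric) probability for counts k = 1, 2, ..."""
    return (1.0 - q) * q ** (k - 1)

def power_law_pmf(k, alpha, k_max):
    """Truncated discrete power law on 1..k_max (stand-in for a skewed K-W-type term)."""
    norm = sum(j ** -alpha for j in range(1, k_max + 1))
    return k ** -alpha / norm

def posterior_specific(k, weight, alpha, q, k_max=1000):
    """Posterior probability that a locus supported by k reads is a specific site."""
    spec = weight * power_law_pmf(k, alpha, k_max)
    noise = (1.0 - weight) * geometric_pmf(k, q)
    return spec / (spec + noise)

# Hypothetical parameters: 20% of loci specific, tail exponent 2, noise decay 0.5.
for k in (2, 10, 30):
    print(k, round(posterior_specific(k, weight=0.2, alpha=2.0, q=0.5), 3))
```

For large read counts the heavy tail of the specific component outlasts the exponential decay of the noise, so the posterior approaches one. Thresholding such a posterior is one simple way to trade off specificity against sensitivity.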
1. Kuznetsov VA, Singh O., Jenjaroenpun P. Statistics of protein-DNA binding
avidity and estimating the total number of binding sites of a transcription factor
in the mammalian genome. BMC Genomics 2010, 11 (Suppl 1):S12 (10 Feb
2010).
2. Kanapin AA, Mulder N, Kuznetsov VA. Projection of gene-protein networks
to functional space of proteome and its application to analysis of organism
complexity. BMC Genomics 2010, 11 (Suppl 1):S4 (10 Feb 2010).
3. Grinchuk OV, Motakis E., Kuznetsov VA. Complex sense-antisense architecture
of TNFAIP1/ POLDIP2 on 17q11.2 represents a novel transcriptional
structural-functional gene module involved in breast cancer progression.
BMC Genomics 2010, 11 (Suppl 2):S9 (10 Feb 2010).
4. Winston Koh, Chen Tian Sheng, Betty Tan, Vladimir Kuznetsov, Lim Sai Kiang,
Vivek Tanavde. MicroRNA Expression Profile of Human Embryonic Stem Cells
Derived Mesenchymal Stem Cells (hES-MSC) by Deep Sequencing Reveals Possible
Role of Let-7 microRNA Family in Downstream Targeting of Hepatic Nuclear Factor
4 Alpha (HNF4A). BMC Genomics 2010, 11 (Suppl 1):S6 (10 Feb 2010).
5. Grinchuk OV, Jenjaroenpun P, Orlov YL, Zhou Jiangtao and Kuznetsov VA,
Integrative analysis of the human cis-antisense gene pairs, miRNAs and
their transcription regulation patterns, Nucleic Acids Res. 2009, Nov 11.
[Epub ahead of print]
6. Kuznetsov VA: Relative avidity, specificity, and sensitivity of transcription
factor-DNA binding in genome-scale experiments. Methods Mol Biol. 2009,
563:15-50.
7. Jenjaroenpun P, Kuznetsov VA. TTS Mapping: Integrative WEB Tool for Analysis of
Triplex Formation Target DNA Sequences, G-quadruplets and Non-protein Coding
Regulatory DNA Elements in the Human Genome, BMC Genomics 2009 Vol. 10
(Suppl 3) :S9, doi: 10.1186/1471-2164-10-S3-S9.
Selected Projects:
1. TTS (Triplex Target DNA Site) mapping WEB tool
and its applications
We have developed the TTS mapping method (Figure 1), which provides
comprehensive visual and analytical tools to help users find TTSs
and their co-localizations with G-quadruplets, transcription factors
(TFs), micro-RNA (miRNA) precursors, CpG islands and other
regulatory DNA elements in human genome regions. Moreover,
applications of this tool motivated us to suggest that some ncRNAs could
provide specific control of transcription by forming natural triplexes and
quadruplexes with genomic DNA. In particular, TTS Mapping reveals that
the ncRNA precursor of mir-483 is formed by a highly complementary and
evolutionarily conserved pair of polypurine- and polypyrimidine-rich
3. Cis-antisense gene pairing (CASGPs) phenomena
8. Motakis E, Ivshina AV. & Kuznetsov VA. Data-driven approach to predict
survival of cancer patients. IEEE Engineering in Medicine and Biology 2009,
28, 58-66.
9. Motakis E and Kuznetsov VA. Genome-scale identification of survival significant
genes and gene pairs, Lecture Notes in Engineering and Computer Science.
(Proc. of World Congress on Engineering and Computer Science, San
Francisco, USA, 20-22 Oct., 2009; Eds: S.I Ao, C. Douglas, W.S. Grundfelt &
J. Burgstone), IAENG, Newswood Limited, vol. I, 41-46, 2009. ISBN: 978-988-17012-6-8.
10. Prathipati P, Ma NL, Manjunatha UH, Bender A. Fishing the target of
antitubercular compounds: in silico target deconvolution model
development and validation. J Proteome Res. 2009, 8, 2788-98.
11. Grinchuk O, Motakis E. and Kuznetsov VA. Identification of complex
sense-antisense gene’s module on 17q11.2 associated with breast cancer
aggressiveness and patient’s survival. In: World Academy of Science,
Engineering and Technology (WASET), (Editor-in-Chief: Cemal Ardil), vol.
58, pp.1046-1056. Venice, Italy; October, 2009. ISSN: 2070-3724.
12. Kashuba E, Yenamandra SP, Darekar SD, Yurchenko M, Kashuba V, Klein
G, Szekely L. (2009) MRPS18-2 protein immortalizes primary rat embryonic
fibroblasts and endows them with stem cell-like properties. Proc Natl
Acad Sci U S A. 2009 Nov 10. [Epub ahead of print] PMID: 19903879.
13. Yenamandra SP, Sompallae R, Klein G, Kashuba E. Comparative analysis
of the Epstein-Barr virus encoded nuclear proteins of EBNA-3 family. Comput
Biol Med. 2009 Nov;39(11):1036-42. Epub 2009 Sep 16. PMID: 19762010.
14. Tun K, Rao RK, Samavedham L, Tanaka H, Dhar P. Rich can get poor:
conversion of hub to non-hub proteins. Syst Synth Biol. 2008 Dec;2(3-4):75-82. Epub 2009 Apr 28.
Principal Investigator’s Biography
Figure 2: TF-DNA binding in a ChIP-seq experiment and its modeling. Our modeling
of the frequency distribution of relative avidity of TF (Nanog)-DNA binding loci in the genome
of mouse embryonic stem cells (ChIP-seq data by Chen et al., 2008).
Vladimir Kuznetsov was appointed Principal Scientist and Head of the Genome and Gene Expression Data Analysis Division at the Bioinformatics Institute (BII), A*STAR in
November 2007. Since 2007, he has held adjunct professor appointments in the Mathematics Department of NUS and in the School of Computing Engineering of NTU,
Singapore. He received a PhD in Biophysics at Moscow State University (Russia) in 1984. In 1992, he received a Doctor of Science degree in Mathematics and Physics at
the Technical Union of the Russian Academy of Sciences (St. Petersburg, Russian Federation). From 1992 to 1998, he established and led the Laboratory of Mathematical Immunobiophysics in the Institute of Chemical Physics (Moscow). In 1995, he was awarded a prestigious one-year scholar grant by the American Cancer Society/International Union
against Cancer and then worked as a research scholar at the Laboratory of Molecular Tumor Biology, Centre for Biological Evaluation, FDA (Bethesda, MD, USA). He later
worked as a scientist at the National Institutes of Health (MD, USA), where he was involved in the NIH Cancer Genome Anatomy Project. He also served as Chief Scientist at
Civilized Software Inc. (Bethesda, MD) and as a Senior M staff at System Research Applications International Inc. (Fairfax, VA, USA). From 2004 to 2007, he developed several
systems biology and computational genomics projects at GIS/A*STAR as a Senior Group Leader. In 1994, he was awarded the P.L. Kapitsa Silver Medal “To the Author of
Scientific Discovery” and elected as a Corresponding Member of the Russian Academy of Natural Sciences. He has published two books and over 100 research papers and reviews.
He is a member of the editorial boards of BMC Biology Direct, BMC Genomics and Journal of Integrative Bioinformatics. (Details: http://www.bii.a-star.edu.sg/research/biography/vladimirk.php)
Vivek TANAVDE
Research Associates:
Betty TAN Bee Tee; LEE Qian Yi; Michelle KWAN Kah Yian
Expression and Signaling
in Mesenchymal and
Hematopoietic Stem Cells
Adult stem cells have the potential to differentiate into a wide variety
of tissue specific cells. These cells can therefore be used to treat
a variety of disorders ranging from myocardial infarction to
osteoporosis. Mesenchymal stromal cells (MSC), the non-hematopoietic
cells found in the marrow, have been used in many
such therapies. Although these cells are already being used clinically,
we know very little about the mechanisms these cells use to
differentiate to different lineages. My group aims to understand
signaling pathways involved in mesenchymal stromal cell
differentiation and how these pathways are regulated. For this
purpose we developed an approach to gather information about
cellular signaling from gene expression data. Using this approach,
we identified 3 pathways critical for MSC growth and differentiation.
We are also using this approach to understand differentiation of
embryonic stem cells. Currently my laboratory is also involved in
understanding the role of micro RNAs (miRNAs) in MSC
differentiation.
Projects:
1. Development of serum-free medium for culturing MSC. This
project was carried out in collaboration with Invitrogen. Using
time-course gene expression analyses of differentiating MSC,
we identified three pathways critical for growth and differentiation
of MSC (Ng et al., Blood 2008). This information was used to
develop STEMPRO® MSC-SFM, the first commercially available
serum-free medium for culturing MSC.
2. Identification of miRNAs secreted by MSC and their role in
indirect regulation of signaling networks. This project was carried
out in collaboration with Sai Kiang Lim from the Institute of
Medical Biology (IMB). Her group had sequenced small RNAs
found intracellularly as well as secreted by MSC in exosomes. We
identified differentially secreted miRNAs from the next generation
sequencing data & showed that the let-7 family of miRNAs
regulated a network of genes with the transcription factor HNF4A
as its hub (Koh et al BMC Genomics 2010). The HNF4A gene
has no predicted miRNA binding sites in its untranslated region.
However, using next-generation sequencing we were able to
predict and experimentally verify that the let-7 family of miRNAs
indirectly regulates expression of HNF4A.
3. Identification of miRNAs important in differentiation of MSC
into bone, cartilage and fat. miRNAs have been shown to be
important regulators of differentiation in embryonic and
hematopoietic stem cells. In this project we aim to identify
differentially expressed miRNAs in fetal limb-derived MSC as
they differentiate into bone, cartilage and fat. The miRNA
expression profile coupled with mRNA expression profile of the
same cells will enable us to identify miRNAs that are critical in
trilineage differentiation of MSC & the genes and signaling
pathways they target (direct as well as indirect targeting) to
achieve this regulation.
4. Identification of translationally regulated genes in embryonic
stem cell differentiation. In this study we were able to successfully
identify translationally regulated genes in differentiating human
embryonic stem cells using microarray. This project is being
carried out in collaboration with Prabha Sampath at Institute
of Medical Biology, A*STAR.
5. Identification of differentially expressed signaling pathways in
Lamin A mutants. Lamin A is a protein that controls movement
of macromolecules across the nuclear membrane. Using
microarray, we identified differentially expressed genes in Lamin
A mutant cells subjected to stress. This enabled us to predict
the signaling pathway responsible for the Lamin A mutant
phenotype. This project is being carried out in collaboration
with Colin Stewart at Institute of Medical Biology, A*STAR.
6. Identification of biomarkers for assessing the response of human
and murine embryonic stem cells to toxic compounds. Embryonic
stem cells have the potential to serve as valuable tools to test
toxicity of different compounds. As models of embryonic stem
cell differentiation develop, our ability to use this information to
screen compounds that affect these differentiation processes
should also improve. In this project we aim to identify biomarkers
for assessing the toxic response of a drug to neuronal differentiation
of MSC. This project is carried out in collaboration with Suzanne
Kadereit, University of Konstanz.
Figure 1: HNF4A is a common hub for networks derived from alignment data and
TargetScan predictions. Gene interaction network in this figure is derived from
the dataset of genes with overlapping regions corresponding to peaks from previous
mapping.
Figure 2: This figure shows the gene interaction network derived from computationally
predicted gene targets from TargetScan. A similar topology was observed for gene
interaction networks in Figures 1 and 2, with HNF4A as a node amongst the
interactions suggesting HNF4A as a possible downstream target for let-7 family
miRNAs.
Recent Publications
1. Winston Koh, Chen Tian Sheng, Betty Tan, Qian Yi Lee, Vladimir Kuznetsov,
Lim Sai Kiang, Vivek Tanavde. (2010) Analysis of Deep sequencing
microRNA expression profile from human embryonic stem cells derived
mesenchymal stem cells reveals possible role of let-7 microRNA family
in downstream targeting of Hepatic Nuclear Factor 4 Alpha. BMC
Genomics (In Press)
2. Lai RC, Arslan F, Tan SS, Tan B, Choo A, Lee MM, Chen TS, Teh BJ,
Eng JK, Sidik H, Tanavde V, Hwang WS, Lee CN, Oakley RM, Pasterkamp
G, de Kleijn DP, Tan KH, Lim SK. (2010) Derivation and characterization
of human fetal MSCs: An alternative cell source for large-scale production
of cardioprotective microparticles. J Mol Cell Cardiol. (In Press)
3. Vivek M. Tanavde, Lailing Liew, Jiahao Lim and Felicia Ng (2009)
Signaling Networks in Mesenchymal Stem Cells. In: Regulatory Networks
in Stem Cells, V.K. Rajasekhar, M.C. Vemuri (eds.), Humana Press.
4. Ng F, Boucher S, Koh S, Sastry K. S., Chase L, Lakshmipathy U, Choong
C, Yang Z, Vemuri M. C, Rao M. S, Tanavde V. (2008) PDGF, TGF-β and
FGF signaling is important for differentiation and growth of mesenchymal
stem cells (MSCs): transcriptional profiling can identify markers and
signaling pathways important in differentiation of MSC into adipogenic,
chondrogenic and osteogenic lineages. Blood. 112(2):295-307.
5. Shah VK, Desai AJ, Vasvani JB, Desai MM, Shah BP, Lall TK, Mashru
MR, Shalia KK, Tanavde V, Desai SS, Jankharia BJ. (2007) Bone marrow
cells for myocardial repair-a new therapeutic concept. Indian Heart J.
59(6):482-90.
6. P. Shetty, K. Bharucha, V. Tanavde (2007) Human umbilical cord blood
serum can replace fetal bovine serum in the culture of mesenchymal
stem cells. Cell Biol International 31:293-298.
Principal Investigator’s Biography
Dr. Vivek Tanavde joined the Bioinformatics Institute, Singapore as a Research Scientist in the Genome & Gene Expression Data Analysis Division in 2006. Prior to
joining BII, he was heading the Hematopoietic Stem Cell Lab at Reliance Life Sciences, Mumbai where his work focused on developing mesenchymal stromal
cell-based therapies for cardiac and neuronal disorders. From 1999 to 2002 he was a postdoctoral fellow with Dr. Curt Civin at the Sidney Kimmel Cancer
Centre, Johns Hopkins University, working on expansion of hematopoietic stem cells from umbilical cord blood. Dr. Tanavde obtained his Ph.D. from the Cancer
Research Institute, Mumbai (1999) in Applied Biology. (Details: http://www.bii.a-star.edu.sg/research/biography/vivek.php)
Igor KUROCHKIN
Postdoctoral Fellows:
Antonis GIANNAKAKIS; Aliaksandr YARMISHYN
Predictive and Functional
Analysis of Long Noncoding
RNAs
Our group is focused on the discovery and functional analysis of
transcripts expressed from the human genome that do not encode
proteins. In particular, we are interested in the cellular roles of the
so-called long noncoding RNAs (lncRNAs), defined as RNAs longer
than 200 nucleotides. This goal is achieved by exploiting
bioinformatics and multiple molecular and biochemical
approaches.
Over the past decade, numerous cDNA cloning and sequencing
projects and genome-tiling array analyses revealed that the
mammalian genomes are almost entirely transcribed, leading to the
generation of tens of thousands of ncRNAs. Diverse ncRNA
species include short miRNAs, piRNAs and much longer ncRNAs
(lncRNAs). Although the involvement of miRNAs in various biological
processes including cellular proliferation and differentiation is
increasingly evident, the function of the much more abundant class of
lncRNAs is largely unknown. The few studies performed so far on the
biological role of lncRNAs suggest extremely diverse mechanisms
of action of this class of molecules. Computational prediction of
lncRNA function thus faces a serious challenge of decoding the
information contained within the sequence of these molecules. The
task is complicated by the fact that RNAs can encode not only
sequence-specific interactions using base-pairing rules but also
assume various secondary and tertiary structures that bind to proteins
important for transcription or epigenetic modifications. We aim to
develop a novel computational framework for functional and structural
analyses of lncRNAs that integrates high-throughput data related
to transcriptional control of lncRNA, secondary structure of these
molecules, evolutionary conservation, and functional annotation of
co-expressed protein-coding RNAs. The predictions are to
be validated through wet-lab experimentation.
Information about when and where lncRNAs are expressed is useful
for probing their function. Thus, understanding of the function
and mechanistic aspects of lncRNA action will be facilitated by
analyses of the dynamic changes in their expression that occur
in the course of development, cell differentiation and response to
environmental stress conditions. At the initial stage of this
project we set out to identify those lncRNAs whose expression is
significantly changed during retinoic acid-induced neuronal
differentiation of the human neuroblastoma cell line SH-SY5Y. Custom-built oligonucleotide microarray profiling revealed that a small
fraction of lncRNAs was highly regulated. A large part of these
transcripts mapped to the intronic regions of the protein-coding
genes. About 21% of the intragenic lncRNAs mapped to the
annotated genes in antisense direction, in line with the previous
reports that over 20% of human transcripts might form sense-antisense pairs. Most of these antisense lncRNAs correlated positively
in their expression pattern with the sense strand of the genes, with
a small minority showing negative correlation.
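The comparison of antisense and sense expression patterns described above can be illustrated with a Pearson correlation over a differentiation time course. This is a minimal sketch: the expression values are hypothetical, and the choice of Pearson correlation is an assumption, since the text does not name the statistic that was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical log-expression values at five time points of differentiation.
sense = [1.0, 1.4, 2.1, 2.9, 3.5]
antisense_a = [0.5, 0.9, 1.6, 2.2, 3.0]  # rises together with the sense transcript
antisense_b = [3.0, 2.4, 1.9, 1.1, 0.6]  # falls while the sense transcript rises

r_a = pearson(sense, antisense_a)
r_b = pearson(sense, antisense_b)
print("positive pair" if r_a > 0 else "negative pair")
print("positive pair" if r_b > 0 else "negative pair")
```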
Figure 1: Expression of a novel intergenic lncRNA associated with HOXD cluster genes is significantly induced during neuronal differentiation of human neuroblastoma cells.
While antisense lncRNAs are expected to regulate the expression
or stability of their sense counterparts, the functional role of the
intergenic lncRNAs remains a mystery. We specifically focus on
this group of transcripts, hoping to discover novel functions performed
by RNA molecules. To address the functionality of
intergenic lncRNAs, we determined their conservation in different
eukaryotic genomes. The analysis revealed that although regulated
intergenic lncRNAs are not conserved at the level of their entire
sequence, most of them contain short patches of highly conserved
sequences. In addition, very high conservation levels are observed
in the promoters of the regulated lncRNAs. Interestingly, retinoic
acid-induced intergenic lncRNAs were often found to be adjacent
to the genes encoding transcription factors. Among them, we
identified a lncRNA located in the homeobox D gene cluster that is
significantly up-regulated during neuronal differentiation (Figure
1). Hox genes are known to play a regulatory role in patterning of
the CNS, as well as in cell specification. A major issue of whether
intergenic lncRNAs can be a driving force behind activation of the
neuronal-specific program is addressed by analyzing the effects of
knockdown and overexpression of selected lncRNAs on the whole-genome expression pattern and cell phenotype.
Recently, integrative bioinformatics and experimental approaches
enabled us to predict several novel proteins with possible roles in
peroxisomal biochemistry and metabolism. The transport of most
proteins into the peroxisomal matrix is mediated by two poorly
defined peptide motifs, PTS1 and PTS2. In our approach, we
combined computational searches of PTS1 and PTS2 motifs in
protein sequence databases with the analysis of co-occurring motifs,
expression patterns, secondary structure properties, orthologues
and variants, literature search and manual curation. This approach
has predicted the long-sought peroxisomal processing protease
encoded by Tysnd1 gene. This protease was demonstrated to localize
to peroxisomes and process enzymes catalyzing all steps of the
peroxisomal β-oxidation pathway of fatty acids, thus suggesting its
involvement in the control of lipid metabolism (Figure 2). The U.S.
Patent Office has granted us a patent for the method of screening
for agents that modulate Tysnd1 levels or activity in cells. The
issuance of this patent represents a step forward in the development
of drugs for treatment of peroxisomal disorders.
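The first step of the approach described above, a computational search for PTS1 and PTS2 motifs, can be sketched with regular expressions. The patterns below are commonly quoted simplified consensus forms and are assumed here for illustration only; they are not the search criteria used in the study, which also drew on co-occurring motifs, expression patterns and manual curation. The sequences and the 40-residue N-terminal window are hypothetical.

```python
import re

# Simplified consensus patterns, assumed for illustration only.
PTS1 = re.compile(r"[SAC][KRH][LM]$")        # C-terminal tripeptide, e.g. SKL
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")  # nonapeptide near the N-terminus

def scan(seq, n_window=40):
    """Return the names of the targeting motifs found in a protein sequence."""
    hits = []
    if PTS1.search(seq):
        hits.append("PTS1")
    if PTS2.search(seq[:n_window]):  # restrict the PTS2 search to the N-terminal region
        hits.append("PTS2")
    return hits

# Hypothetical sequences, not real proteins.
candidates = {
    "protein_1": "MAGTEDQPWSKL",          # ends in SKL
    "protein_2": "MQRLQVVLGHLAAGTEDQPW",  # contains RLQVVLGHL near the start
    "protein_3": "MAGTEDQPWGGA",          # no motif
}
results = {name: scan(seq) for name, seq in candidates.items()}
print(results)
```

Motif matches of this kind are weak evidence on their own, which is consistent with the additional filtering steps listed in the text.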
Figure 2: Computationally predicted peroxisomal protein Tysnd1 is localized to peroxisomes where it processes enzymes involved in the β-oxidation pathway of fatty acids.
Recent Publications
1. Mizuno Y, Kurochkin IV, Herberth M, Okazaki Y, Schönbach C. Predicted mouse
peroxisome-targeted proteins and their actual subcellular locations. BMC
Bioinformatics. 2008 Dec 12;9 Suppl 12:S16.
2. Zhang L, Volinia S, Bonome T, Calin GA, Greshock J, Yang N, Liu CG,
Giannakakis A, et al. Genomic and epigenetic alterations deregulate
microRNA expression in human epithelial ovarian cancer. Proc Natl Acad
Sci U S A. 2008 May 13;105(19):7004-9.
3. Kurochkin IV, Mizuno Y, Konagaya A, Sakaki Y, Schönbach C, Okazaki
Y. Novel peroxisomal protease Tysnd1 processes PTS1- and PTS2-containing enzymes involved in beta-oxidation of fatty acids. EMBO J.
2007 Feb 7;26(3):835-45.
4. FANTOM Consortium; RIKEN Genome Exploration Research Group and
Genome Science Group (Genome Network Project Core Group). The
transcriptional landscape of the mammalian genome. Science. 2005
Sep 2;309(5740):1559-63.
5. Kurochkin IV, Nagashima T, Konagaya A, Schönbach C. Sequence-based
discovery of the human and rodent peroxisomal proteome. Appl
Bioinformatics. 2005;4(2):93-104.
Principal Investigator’s Biography
Igor Kurochkin joined the Bioinformatics Institute in 2009. He received his B.S. from Kiev National University, majoring in biochemistry, and then earned a
Ph.D. in molecular biology from the Institute of Molecular Biology and Genetics in Kiev. After postdoctoral work in the School of Pharmaceutical Sciences at
Toho University, Japan (1990-1993), he joined the Holland laboratory of American Red Cross, MD as a visiting research fellow supported by the International
Fellowship from the Fogarty International Center, NIH (1993-1995). During 1996-2002, he was a research scientist at Chugai Pharmaceutical Co., Ltd. (now
a member of the Roche group). He returned to the academic sector as a research scientist in RIKEN Genomic Sciences Center, Japan (2002-2009). (Details:
http://www.bii.a-star.edu.sg/research/biography/igork.php)
Geometric global image features
In quantitative biology studies such as drug and siRNA screens,
robotic systems automatically acquire thousands of images from
cell assays. Because these images are large in quantity and high in
content, detecting specific patterns (phenotypes) in them requires
accurate and fast computational methods. To this end, we have
developed a geometric global image feature for pattern retrieval on
large bio-image data sets. This feature is derived by applying spectral
graph theory to local feature detectors such as the Scale Invariant
Feature Transform, and is effective on patterns with as few as 20
keypoints. We demonstrate successful pattern detection on synthetic
shape data and fluorescence microscopy images of GFP-Keratin14-expressing human skin cells.
Texture segmentation
Image segmentation is indispensable in many applications. It
facilitates the extraction of useful information for subsequent high-level
image analysis. For instance, in pathological research, digital
microscopy has become increasingly popular since the introduction of
high-throughput tissue microarrays (TMA) into bioimaging
communities. It is therefore crucial to segment each tissue image
into a meaningful partition in an accurate, fast, automated and
robust manner. In particular, textures extracted from the image have
higher discriminant power than intensity. Since a high-dimensional
feature space is usually considered for texture segmentation, the
data is very sparse as for each given pixel the number of relevant
features is usually small. Therefore, a practical approach is needed
to filter out the irrelevant features to make the description of each
segment by its features more compact.
LEE Hwee Kuan
Postdoctoral Fellows:
Ivy LAW Yan Nei; YU Weimiao; Patrick KOH Yang Wei; CELIK Turgay
Research Associate:
YAP Choon Kong
Computer Vision and Pattern
Discovery for Bioimages
Figure 1A: Image of skin cells with a mutant annotated by a polygon.
The group focuses on developing advanced computer vision, machine
learning and mathematical models to elucidate the complex behavior
of biological systems. We analyze images from wide-field, confocal,
and single-plane illumination microscopes, including data sets from
high-throughput image screens.
Quantitative analysis of neural stem cells
Field theoretical image restoration
Microscopy has become a de facto tool for biology. However, it
suffers from a fundamental problem of poor contrast with increasing
depth, as the illuminating light gets attenuated and scattered and
hence cannot penetrate through thick samples. The resulting decay
of light intensity due to attenuation and scattering varies exponentially
across the image. The classical space invariant deconvolution
approaches alone are not suitable for the restoration of uneven
illumination in microscopy images. We developed a novel physics-based field theoretical approach to solve the contrast degradation
problem of light microscopy images.
Our proposed formulation is radically different from all existing
physics-based restoration techniques in that we do not assume a
constant extinction coefficient in the attenuating medium. Moreover,
in our formalism, we make no distinction between the image object
and the attenuating medium. We derived a general set of equations
to handle any geometrical setup in the image acquisition. To use
our method, one only needs to specify details of the light source
and the detection equipment such as a camera.
In our formalism, we assume a volume of interest in which our
biological sample resides. As in most field theories, the volume is divided
into infinitesimal elements. The light intensity in each volume element
is then calculated based on the physical principle of light attenuation
and scattering. This allows us to calculate the amount of light
emitted from each infinitesimal volume element. We then relate
this to the light detected by the CCD/photomultiplier tube. In this
way, from the information collected by the CCD/photomultiplier tube
and the relationship between the amount of light emitted and amount
of light detected, we can restore the image and remove the problem
of light attenuation and scattering. We apply our theory on confocal
microscopy and show using controlled experiments that our restoration
method works.
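The role of a depth-dependent extinction coefficient can be illustrated with a deliberately simplified sketch. It is not the field theoretical method itself: it assumes a single 1-D column of slabs, attenuation only (no scattering), one light path and an extinction profile that is already known. The function names and the slab thickness `dz` are illustrative assumptions.

```python
import math

def transmission(extinction, dz):
    """Fraction of light reaching each slab along one column.

    extinction[k] is the (assumed known) extinction coefficient of slab k,
    dz the slab thickness. Light reaching slab k has crossed slabs 0..k-1.
    """
    fractions = []
    optical_depth = 0.0
    for mu in extinction:
        fractions.append(math.exp(-optical_depth))
        optical_depth += mu * dz
    return fractions

def degrade(true_signal, extinction, dz):
    """Simulate the depth-dependent loss of intensity."""
    return [s * t for s, t in zip(true_signal, transmission(extinction, dz))]

def restore(observed, extinction, dz):
    """Undo the attenuation by dividing out the transmission."""
    return [o / t for o, t in zip(observed, transmission(extinction, dz))]

# A uniform sample seen through a non-uniform medium (hypothetical values).
signal = [1.0, 1.0, 1.0, 1.0]
mu = [0.1, 0.5, 0.2, 0.3]
observed = degrade(signal, mu, dz=1.0)
recovered = restore(observed, mu, dz=1.0)
```

With a constant coefficient the decay is a single exponential in depth; with a varying coefficient, as here, the correction factor differs from slab to slab, which is why a space-invariant correction alone is not sufficient.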
We propose a novel image segmentation model, called the Subspace
Mumford-Shah model, which incorporates subspace clustering
techniques into a Mumford-Shah model to solve texture segmentation
problems. To optimize the objective, our first attempt uses a
supervised procedure to determine several optimal subspaces. These
subspaces are then embedded into a Mumford-Shah objective
function so that each segment of the optimal partition is homogeneous
in its own subspace. The method outperforms standard Mumford-Shah
models since it can separate textures which are poorly separated
in the full feature space. The method is also more robust
and converges faster than existing subspace clustering
methods. Experimental results are presented to confirm the usefulness
of subspace clustering in texture segmentation. To make our work
more practical, our next goal is to develop a fully unsupervised
approach to optimize the objective.
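The benefit of measuring homogeneity in a subspace rather than in the full feature space can be seen in a small sketch. The feature vectors below are hypothetical, and the separation score (distance between class means divided by the summed within-class spreads) is an illustrative stand-in, not the Subspace Mumford-Shah objective.

```python
import math

def mean(vectors, dims):
    """Mean of the selected feature dimensions."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in dims]

def spread(vectors, dims):
    """Root-mean-square distance to the mean within the selected dimensions."""
    m = mean(vectors, dims)
    total = sum(sum((v[d] - mu) ** 2 for d, mu in zip(dims, m)) for v in vectors)
    return math.sqrt(total / len(vectors))

def separation(a, b, dims):
    """Gap between class means relative to the within-class spreads."""
    gap = math.sqrt(sum((x - y) ** 2 for x, y in zip(mean(a, dims), mean(b, dims))))
    return gap / (spread(a, dims) + spread(b, dims))

# Two hypothetical textures: feature 0 discriminates, features 1 and 2 are noise.
texture_a = [(1.0, 5.0, -3.0), (1.2, -4.0, 6.0), (0.8, 2.0, -5.0), (1.0, -3.0, 2.0)]
texture_b = [(2.0, 4.0, 5.0), (2.2, -5.0, -4.0), (1.8, 3.0, 3.0), (2.0, -2.0, -4.0)]

full_space = separation(texture_a, texture_b, (0, 1, 2))
subspace = separation(texture_a, texture_b, (0,))

# A supervised choice of subspace: the single feature with the best score.
best_feature = max(range(3), key=lambda d: separation(texture_a, texture_b, (d,)))
```

In the full space the noisy features dominate the spread and the two textures overlap; restricted to the discriminating feature they are clearly separated.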
Understanding the biology of neural cells is important in designing
treatments for neural diseases. A unique trait of a neural cell
is that it has neurites that connect it to other neural cells. Neurite
outgrowth is a fundamental characteristic of neurons: neurites
eventually form synapses, and the proper functioning of the nervous
system depends on the formation of proper connections.
Figure 1B: Detection of this mutant automatically by our intelligent vision system,
red representing high probability of occurrence of mutants.
Segmentation of semi-transparent objects
We consider the problem of segmenting two overlapping objects whose
intensity level in the intersection is approximately the sum of the
levels of the individual objects. This is a fundamental image processing
task with many real world applications, especially those that involve
measurements of concentration using imaging techniques.
Examples include X-ray images, images of absorbent paper with mouse
scent marks, and microscopy images recording protein expression
levels. Although many applications of such a model can be found,
this problem has received very little, if any, study.
We propose a variant of the Mumford-Shah model for the segmentation
of overlapping objects with additive intensity values. Unlike standard
segmentation models, it not only determines distinct objects in
the image, but also recovers the possibly multiple membership of the
pixels. To accomplish this, some a priori knowledge about the smoothness
of the object boundary is integrated into the model. Additivity is imposed
through a soft constraint, which allows the user to control the degree
of additivity and is more robust than a hard constraint. We also show
analytically that the additivity parameter can be chosen to achieve
certain stability conditions. To solve the optimization problem involving
geometric quantities efficiently, we apply a multi-phase level set method.
Segmentation results on synthetic and real images validate the good
performance of our model.
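The effect of a soft additivity constraint can be sketched for a much simpler setting than the full model: the three regions (object 1 only, object 2 only, overlap) are assumed to be already known, each region is described by one constant intensity, and the boundary smoothness term is ignored. The energy, the symbol `lam` for the additivity parameter and the function name are illustrative assumptions.

```python
def fit_additive(m1, m2, m12, n1, n2, n12, lam):
    """Minimize, over region intensities c1, c2, c12, the energy

        n1*(c1 - m1)**2 + n2*(c2 - m2)**2 + n12*(c12 - m12)**2
        + lam*(c12 - c1 - c2)**2

    where m* are observed region means and n* are pixel counts.
    Setting the gradient to zero gives the closed form below.
    """
    deviation = m12 - m1 - m2                  # how non-additive the data are
    s = 1.0 / n1 + 1.0 / n2 + 1.0 / n12
    residual = deviation / (1.0 + lam * s)     # c12 - c1 - c2 at the optimum
    c1 = m1 + lam * residual / n1
    c2 = m2 + lam * residual / n2
    c12 = m12 - lam * residual / n12
    return c1, c2, c12

# Hypothetical region means: the overlap is slightly brighter than the sum.
free = fit_additive(10.0, 20.0, 33.0, 100, 100, 100, lam=0.0)
soft = fit_additive(10.0, 20.0, 33.0, 100, 100, 100, lam=100.0)
hard = fit_additive(10.0, 20.0, 33.0, 100, 100, 100, lam=1e9)
```

With `lam = 0` the fit simply returns the observed means; as `lam` grows the non-additive residual shrinks towards zero, approaching the hard constraint, while intermediate values tolerate some departure from exact additivity.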
In this study, a high throughput image screening system is developed.
This system includes the microscopy setup as well as advanced
computer vision applications to process the images. Our task in the
computer vision and pattern discovery group is to develop fast and
accurate software to process thousands of images generated through
the high throughput microscopy system.
Our new method combines the level-set and watershed methods in
a specific way to achieve fast and accurate segmentation of the
neural cells. Neural cells have outgrowths and cytoplasm that touch
each other. Many algorithms in the literature cannot separate
cells that touch each other. Our method is designed specifically to
overcome this difficulty.
Our method performs much better than currently available software:
the error rate of our method, validated against a set of about 6000
cells, is 6.5%, while the error rate for METAMORPH on the same set
of data is 25.5%.
Figure 2: Images of a neurosphere.
Figure 3: Nucleus segmentation of the neurosphere.
The paper titled “Level Set Segmentation of Cellular Images based on
Topological Dependence” was awarded the “Mitsubishi Electric Research
Laboratories Best Paper Award” at the 4th International Symposium on
Visual Computing, 1-3 Dec 2008, Las Vegas, USA.
Recent Publications
1. W. Yu, H. K. Lee, S. Hariharan, W. Bu, S. Ahmed, “Evolving Generalized
Voronoi Diagram of Active Contours for Accurate Cellular Image
Segmentation”, Cytometry Part A, 2010 (accepted).
2. Y. N. Law, H. K. Lee, A. M. Yip, “Semi-supervised Subspace Learning
for Mumford-Shah Model Based Texture Segmentation”, Optics Express,
2010 (accepted).
3. H. K. Lee, M. S. Uddin, S. Sankaran, S. Hariharan, S. Ahmed, “A field
theoretical restoration method for images degraded by non-uniform light
attenuation: an application for light microscopy”, Optics Express, 2009;
17(14): 11294-11308.
4. Y. N. Law, H. K. Lee, A. M. Yip, “Segmentation of Semi-transparent
Objects Using a Variant of the Mumford-Shah Model”, Proceedings of
the 2009 International Conference on Image Processing, Computer
Vision & Pattern Recognition, Volume II.
5. Y. N. Law, H. K. Lee, A. M. Yip, “Supervised Texture Segmentation Using
the Subspace Mumford-Shah Model”, Proceedings of the 2009
International Conference on Image Processing, Computer Vision & Pattern
Recognition, Volume II.
6. W. Yu, H. K. Lee, S. Hariharan, S. Sankaran, P. Vallotton, S. Ahmed,
“Segmentation of Neural Stem/Progenitor Cells Nuclei within 3-D
Neurospheres”, Advances in Visual Computing, ISVC 2009, Lecture
Notes in Computer Science, 2009; 5875: 531-543.
7. Q. Ho, W. Yu, H. K. Lee, “Region Graph Spectra as Geometric Global
Image Features”, Advances in Visual Computing, ISVC 2009, Lecture
Notes in Computer Science, 2009; 5875: 253-264.
8. W. Yu, H. K. Lee, S. Hariharan, W. Bu, S. Ahmed, “Detection and
Quantitative Measurement of Neuronal Outgrowth in Fluorescence
Microscopy Images”, Proceedings of the Medical Image Understanding
and Analysis (MIUA) 2009.
Principal Investigator’s Biography
Hwee Kuan obtained his PhD in Theoretical Condensed Matter Physics from Carnegie Mellon University in 2001. He then held a joint postdoctoral position
with Oak Ridge National Laboratory (USA) and the University of Georgia, where he worked on advanced Monte Carlo methods and nano-magnetism. In 2003, with
an award from the Japan Society for the Promotion of Science, Hwee Kuan moved to Tokyo Metropolitan University, where he developed solutions to problems with extremely
long time scales and a reweighting method for nonequilibrium systems. In 2005 he returned home to join the Data Storage Institute, proposing a novel
magnetic recording method using magnetic resonance. In 2006, he joined the Bioinformatics Institute as a Principal Investigator in the Imaging Informatics
Division. (Details: http://www.bii.a-star.edu.sg/research/biography/leehk.php)
Martin WASSER
Postdoctoral Fellows:
Rambabu CHINTA; Janos KRISTON-VIZI; DU Tiehua
Research Associates:
PUAH Wee Choo; Gina PAN Jinghong; TAN Joo Huang;
Rahul KUMAR
Live Cell Imaging and Automation of Image Analysis
The group is interested in studying animal development using 3D
time-lapse microscopy and computer vision. Their principal goal is
to build image analysis systems that can recognize tissues, cells
and organelles in multi-dimensional image data and measure their
static and dynamic properties. The major research activities are
directed at constructing the components of a computational pipeline
and integrating them into semi-automated image analysis systems.
Computational pipelines cover preprocessing, segmentation, feature
extraction, classification and cell tracking. Currently, the efforts are
directed at the phenotypic characterization of two biological processes
in the model system Drosophila melanogaster: (1) cell cycle
progression of embryonic cells and (2) apoptosis and remodeling
of muscle cells during metamorphosis.
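The pipeline stages named above (preprocessing, segmentation, feature extraction, classification and tracking) can be illustrated with a toy example on 1-D intensity profiles. This is only a sketch of how such stages chain together; every function name, threshold and data value is a hypothetical choice and none of it reflects the group's actual software.

```python
def preprocess(signal, background):
    """Subtract a constant background and clip at zero."""
    return [max(v - background, 0.0) for v in signal]

def segment(signal, threshold):
    """Return (start, end) index pairs of runs above threshold (end exclusive)."""
    segments, start = [], None
    for i, v in enumerate(signal):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(signal)))
    return segments

def extract_features(signal, segments):
    """Measure position, size and total intensity of each segment."""
    return [{"centroid": (s + e - 1) / 2.0, "size": e - s, "total": sum(signal[s:e])}
            for s, e in segments]

def classify(features, size_cutoff):
    """Rule-based stand-in for a trained classifier."""
    return ["large" if f["size"] >= size_cutoff else "small" for f in features]

def track(previous, current, max_shift):
    """Link each current object to the nearest previous centroid, if close enough."""
    links = []
    for j, obj in enumerate(current):
        best = min(range(len(previous)),
                   key=lambda i: abs(previous[i]["centroid"] - obj["centroid"]),
                   default=None)
        if best is not None and abs(previous[best]["centroid"] - obj["centroid"]) <= max_shift:
            links.append((best, j))
    return links

def analyze(frame):
    """Run the first three stages on one frame."""
    clean = preprocess(frame, background=1)
    return extract_features(clean, segment(clean, threshold=2))

# Two hypothetical time points in which both objects shift by one pixel.
frame0 = [1, 1, 6, 7, 6, 1, 1, 1, 9, 1]
frame1 = [1, 1, 1, 6, 7, 6, 1, 1, 1, 9]
features0, features1 = analyze(frame0), analyze(frame1)
labels0 = classify(features0, size_cutoff=2)
links = track(features0, features1, max_shift=1.5)
```

Keeping each stage as a separate function with plain inputs and outputs makes it possible to swap in a better segmentation or classification method without touching the rest of the chain.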
In 2009, BII acquired a Zeiss 5 Live high-speed confocal laser scanning
microscope (Figure 1). This instrument will enhance the group’s ability
to produce images of live cells in sufficient quality and quantity to
support algorithm development as well as biological discovery.
Quantitative Microscopy of Cell Cycle Progression in Drosophila Embryogenesis
The study of cells in their natural tissue environment promises to
uncover novel insights into the mechanics and regulation of cell
proliferation that cannot be easily gained from observations of
cultured cells in Petri dishes. Drosophila embryos are a powerful
system in which the dynamics of synchronized nuclear as well as
non-synchronized cell division can be easily monitored by tagging
chromosomes with fluorescent fusion proteins. 3D movies acquired
by time-lapse microscopy are not only pretty to look at but also
provide a rich source of quantitative cellular features, such as DNA
content, which combined with derived categorical features, such
as the cell cycle phase, will be useful in characterizing the function
of known and unknown genes. However, the task of analyzing colossal
amounts of multi-dimensional image data is not trivial. To address
this challenge, we have developed a collection of tools for image
segmentation, feature extraction, tracking, classification, visualization,
annotation, validation of computer vision algorithms and file
conversion. Image measurements rely heavily on the accuracy of
the chosen segmentation algorithm. We have developed a fast 3D
nuclear segmentation method that adapts to inhomogeneous signal
intensities, poor signal to noise ratios and histone-GFP localized to
the cytoplasm (Figure 2). To improve the performance of computer
vision methods and to support biological interpretation, we apply
machine learning for cell cycle phase classification (Figure 3). To
test our approach in the phenotypic characterization of gene function
we applied our image analysis pipeline to 3D live cell movies of
diploid wildtype and haploid mutant embryos. Our analysis has
provided new insights into the function of the maternal haploid
gene and the control of the size of the nucleus.
Figure 2: Automated 3D nuclear segmentation method.
Tissue Destruction and Remodeling in Metamorphosis
The second biological theme is the destruction and remodeling of
tissues during metamorphosis. The group focuses on the muscular
system and uses fluorescence live cell imaging to study the apoptosis
of obsolete larval muscles and the remodeling of persistent larval
muscles into adult muscles.
The structural reorganization of muscles is accompanied by an initially
decreasing and later increasing thickness of the muscle fiber.
Therefore, the study of muscle remodeling dynamics in flies
might evolve into an animal model for muscle atrophy and hypertrophy.
A challenge in studying development by 3D time-lapse microscopy
is that rapid tissue movements can affect visualization and
quantitative analysis. To overcome this problem, a non-rigid stack
registration method was developed. In an Editorial of Cytometry A
(75A:279-281, 2009), this study was highlighted as “a true
masterpiece of cell analysis”.
Figure 3: Machine learning techniques are used for automatic phenotypic classification. Here cell cycle phases are assigned to segmented nuclei.
Recent Publications
1. Ong SM, Zhao Z, Arooz T, Zhao D, Zhang S, Du T, Wasser M, van Noort
D, Yu H (2009). Engineering a scaffold-free 3D tumor model for in
vitro drug penetration studies. Biomaterials, [Epub ahead of print].
2. Rambabu Chinta, Wee Choo Puah, Martin Wasser (2009). 3D segmentation
for the study of Cell Cycle Progression in Live Drosophila Embryos.
International Conference on Biomedical Electronics and Devices, First
International Workshop on Medical Image Analysis and Description for
Diagnosis Systems - MIAD 2009, Porto, Portugal, 14-17 Jan 2009.
3. Du Tiehua and Wasser Martin (2009). 3D Image Stack Reconstruction
in Live Cell Microscopy of Drosophila Muscles and its Validation.
Cytometry A. 2009 Apr, 75(4): 329-43.
4. Wasser Martin, Zalina Bte Osman, Chia, William (2007). “EAST and
Chromator control Muscle Destruction and Remodeling in Drosophila
Metamorphosis”. Developmental Biology, Vol. 2, 380-393.
5. Wasser, Martin and Chia, William (2007). The extrachromosomal EAST
Protein of Drosophila can associate with Polytene Chromosomes and
regulate gene expression. PLoS ONE 2: e412.
Figure 1: BII’s Zeiss 5 Live confocal microscope is used for multi-dimensional
imaging of live cells.
Principal Investigator’s Biography
Martin Wasser obtained his Biology degree from the University of Cologne in Germany in 1993. In 1998, he received his PhD in Molecular and Cell Biology
from the IMCB in Singapore. As a PhD student and postdoc, he conducted extensive research on the role of nuclear architecture and chromatin structure
in cell proliferation and animal development. While doing postdoctoral wet-lab research, he obtained a Master’s degree in Knowledge Engineering from the
Institute of Systems Science. Before joining BII he worked as a research fellow at the Temasek Life Sciences Laboratory in Singapore. Since 2007 he has been
heading a research team at BII, focusing on live-cell imaging and the development of image analysis systems. (Details: http://www.bii.a-star.edu.sg/research/biography/martinw.php)
IT Scientific Services
Bio-Computing Centre
Team Leader:
YONG Tai Pang
Team Members:
Caleb KHOR Ken Swee, Zahari Jeffrey, Johnny LIM Gek Wee,
CHAN Ang Loon, Charlie TAN Chee Khiong,
TOE Chin Siang, HARRON Hanafi and Violet LIN Liling.
CHEOK Leong Poh
Photo by Vivek Tanavde, BII
Software Engineering
Team Leader:
CHEOK Leong Poh
Team Members:
LUA Seow Chin, Mohamed HANIFA, NG Wee Thong,
VOON Kian Loon
The software engineering team is made up of software engineers
who work in close collaboration with scientists from BII and A*STAR’s
research institutes to address their needs for scientific software
solutions. At present, the team has a number of collaboration projects
with BII’s scientific groups, as well as with other A*STAR research
institutes. The majority of the projects undertaken by the software
engineering team are Java web-based applications backed by relational
databases, and they frequently build on other open source projects. The
team’s current focus is on Laboratory Information Management System (LIMS)
related software projects, with Genopolis/BASE and Screensaver as
prominent examples.
Project Highlights
1. Genopolis/BASE
This is a collaboration project with the Singapore Immunology
Network, A*STAR (SIgN). The objective is to assist SIgN in
replacing its existing LIMS software, Genopolis, with BASE,
which can support both Affymetrix and Illumina platforms.
BASE is an open source web-based database solution for
microarray experiments supported by Lund University. Initial
development work completed includes helping SIgN to set up
BASE, migrating data from Genopolis to BASE, and re-implementing
Genopolis analysis features in BASE. The current phase focuses on
simplifying user data entry, storing additional annotation
information, and improving the user interface and usability. A local
production version of BASE was launched in April 2009 and is currently
hosted at BII.
2. Screensaver
This is a collaboration project with the Institute of Molecular
and Cell Biology, A*STAR (IMCB), Harvard Medical School
(HMS) and The Netherlands Cancer Institute (NKI). The objective
is to assist our collaborator from IMCB to set up and customize
Screensaver, an open source LIMS for high-throughput screening
developed by HMS. Work completed to date includes customizing
the Screensaver user interface, tailoring upload/export files
to the IMCB format, and integrating analysis tools into Screensaver,
e.g. Annotator and cellHTS2. The collaboration with HMS and
NKI involves code development, particularly in integrating
Screensaver with cellHTS2. The team is also working together
with the Annotator Group on integrating Screensaver and the
Annotator. A local production version of Screensaver is currently
hosted at BII.
Scientific Computing Team
The role of the Scientific Computing team is to provide technical
support and expertise in the areas of compute, storage and networking,
in a manner which suits the unique needs and requirements of the
various BII scientific divisions. The team’s main focus is on providing
highly customized IT resources on demand, at short notice, while
seeking innovative and elegant architectures and solutions. Our
areas of specialization include:
a. High throughput Linux clusters
b. Distributed, general purpose file systems
c. High volume IP networks for large data transfers
d. Data backup, replication and archival
e. Design, implementation and operation of corporate services
YONG Tai Pang
Photo by Vivek Tanavde, BII
Two major projects were completed in 2009 with a focus on
increasing the scientific computing resources of BII.
1. Construction of new BII Data Centre
In 2009, BII set up its own data center to house its scientific
computing facilities and corporate servers. This data center at Matrix
has a total capacity of 33 racks, comprising high (7 kW) and medium
(3 kW) density racks and network racks. Additionally, 6 racks at 2
kW/rack were provided for in the new BII development room.
Considerable redundancy (N+1) was installed where possible for the
equipment serving the data center, including the UPS
(uninterruptible power supply) and CRAC (Computer Room Air
Conditioning) units, which are critical to the operation of the
data center.
Enterprise Computing System Team
The Enterprise Computing System Team provides infrastructural
support in the areas of desktop computing. Bearing in mind the
unique needs and work culture of scientists, the team has provided
commodity as well as custom built high-end workstations, to meet
the technical needs of our users.
Roles and Responsibilities:
• Translate user requirements into design and technical
specifications, develop and deliver software solutions to meet
scientific objectives.
• Provide expertise in software engineering to assist research
groups in BII and other institutes.
• Work closely with BII’s other IT teams to leverage their IT
infrastructure and services to provide comprehensive IT solutions
to the collaborators.
• Bridge BII with other A*STAR research institutes through
scientific software collaborations.
The IT Services department provides all IT technical services for the
Institute’s research and administrative needs. Such services include
web services and scientific computing, storage and networking
infrastructure support, as detailed below:
3. BioImage Informatics on the WWW
This is an internal collaboration project with the Computer Vision
and Pattern Discovery Group. The objective is to create a web-based
application that integrates several image processing
algorithms. The application uses a Java applet as its web
front-end, allowing users to upload and display images; the
uploaded images are then sent to the server for server-side image processing.
The software is currently in development.
The team also provides ad-hoc server-side compute facilities on
demand, on a project by project basis. For example, a group of
scientists may require an entry-level server to host services including
FTP, MySQL, Apache, or SVN. Thus the team will design, deploy
and maintain the necessary server hardware, storage and networking
based on the specific requirements laid down by our users.
Web Services Team
The Web Services team is mainly responsible for the design and
maintenance of the BII home pages on the Internet and our Intranet
web sites. They are also in charge of developing, upgrading and the
on-going maintenance of BII e-Services such as the room-booking
system and the e-Calendar.
Other services provided by the Web Services team include graphic and
multimedia design, web site design and layout, scanning and photography
services, web publishing support and web and database hosting. It is
important to ensure that the electronic information published by BII is
of high accuracy and quality as the BII home page and its associated
web pages are the first point of contact between the Institute and the public.
Hence, the quality of the content published is fundamental in upholding
the strong reputation and image of the Institute. The team ensures that
the information published electronically is accurate, visually appealing,
clearly presented and complies with the Singapore Government Web
Interface standards set by IDA.
2. Computational Clusters
BII currently owns and operates 2 clusters to meet its needs:
a. Annotator Cluster
This consists of 24 compute nodes, 2 job schedulers and a
pair of high end production and developmental servers. The
cluster was put together and tuned specifically to meet the
unique workflow characteristics of the Annotator project.
b. Cluster for Molecular Dynamics Simulations and Generic
Computation
This consists of 46 compute nodes, each fitted with 8 CPUs
(3 GHz) and 32 GB memory, sharing 14 TB of storage capacity.
24 nodes are on Infiniband interconnect while the rest are
on Gigabit Ethernet.
There are plans to expand the cluster by a batch of latest generation
(Nehalem) compute nodes in 2010.
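For reference, the aggregate capacity of the second cluster follows directly from the per-node figures quoted above. The short calculation below only multiplies those figures out; the variable names are illustrative.

```python
# Per-node figures as stated for the molecular dynamics / generic cluster.
NODES = 46
CPUS_PER_NODE = 8
MEMORY_GB_PER_NODE = 32
INFINIBAND_NODES = 24

total_cpus = NODES * CPUS_PER_NODE                    # 368 CPUs
total_memory_gb = NODES * MEMORY_GB_PER_NODE          # 1472 GB of memory
ethernet_nodes = NODES - INFINIBAND_NODES             # 22 nodes on Gigabit Ethernet
infiniband_cpus = INFINIBAND_NODES * CPUS_PER_NODE    # 192 CPUs on Infiniband
```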
Adjunct Scientists at BII
Prof. Roger Beuerman
Senior Scientific Director,
Singapore Eye Research Institute
Professor, SRP in Neuroscience and
Behavioral Disorders,
Duke-NUS Graduate Medical School
Prof. Roger Beuerman’s team at the Singapore Eye Research Institute
(SERI) has, together with Dr. Chandra Verma’s group at BII, developed novel
antimicrobials against some of the most resistant forms of bacteria and
fungi. The combination of ocular chemo-molecular abilities
with the computational design efforts of Dr. Verma’s group has been
very productive. Over the last 5 years, they have successfully
developed unique molecules which formed the basis of two patents.
They have received around SGD 3M in grants from the Singapore
government in support and have generated considerable commercial
interest. Most recently, they have designed a molecule that has shown
spectacular activity against bacteria from patients that have shown
resistance to commonly used antibiotics. This is opening up a new
frontier in efforts to tackle the growing problem of bacterial resistance,
both in the clinic and, worryingly, more recently outside the clinic.
A promising anti-fungal agent has also recently been identified.
The problem of resistance requires the urgent development of new
antibiotics to prevent it from assuming epidemic proportions, and
the teams of Prof. Beuerman, Dr. Verma and associates at Nanyang
Technological University have formed a synergistic collaboration for
this purpose.
A/Prof. Gerhard Grüber
Associate Professor and Deputy Head,
Division of Structural and Computational Biology,
School of Biological Sciences, Nanyang Technological University
A/Prof. Gerhard Grüber has longstanding experience in structure-function
studies of multi-subunit complexes such as the classes of ATP synthases
(A1AO ATP synthase, F1FO ATP synthases) and hydrolases (V-ATPase,
Helicase and AAA-ATPases). In order to gain insight into the structure
of these macromolecular complexes, techniques such as solution X-ray
scattering, X-ray crystallography, and NMR and fluorescence spectroscopy
are used in his laboratory. In a collaborative project with Dr. Frank
Eisenhaber (BII, A*STAR), the 45 kDa subunit PIG-K of the
glycosylphosphatidylinositol transamidase complex was produced and
purified, and the first low-resolution solution structure of this protein
was determined. Over two successful years of collaboration
with Dr. Chandra Verma (BII, A*STAR), a platform has been established
to integrate the experimental structural data into docking and
molecular dynamics studies in order to provide atomic-level insight into
the structure, dynamics and energetics of the coupling subunits in
the biological motor proteins.
Dr. Birgit Eisenhaber
Research Scientist,
Mass Spectrometry Group,
Experimental Therapeutics Centre,
A*STAR
With a strong background in protein sequence analysis and function
prediction, Dr. Birgit Eisenhaber, currently affiliated with the
Experimental Therapeutics Centre, A*STAR (ETC), provides her
expertise in collaboration projects within ETC and with other A*STAR
units in biomolecular mechanism-focused research. On the one
hand, the link with BII allows her to leverage the bioinformatics
infrastructure, especially the ANNOTATOR suite, in her
research; on the other hand, BII benefits from methodological
developments and from her supervision of interns and newly arrived
staff.
Dr. Lim Yoon Pin
Senior Scientist,
Cancer Science Institute of Singapore
Assistant Professor,
Department of Biological Sciences,
National University of Singapore
Dr. Lim Yoon Pin’s laboratory is interested in the discovery of 1)
novel oncogenes in breast cancer; 2) novel tyrosine kinase
substrates in oncogenic EGFR signaling and 3) biomarkers in gastric
cancer. He has served as an advisor for the gastric cancer
knowledgebase created by BII. In collaboration with BII, an online
interactive biological interaction network (BIN) of EGFR signaling
has also been generated, and it is hosted on BII’s website.
The BIN, which is an important resource for researchers in the field
of EGFR research, is constantly updated as new data are
produced in Dr. Lim’s laboratory.
Visiting Scientists for FY2009
Dr. Nathan Andrew Baker
Associate Professor
Dept. of Biochemistry and Molecular Biophysics
Center for Computational Biology
Washington University
St. Louis, USA
Visit Period: 6 - 8 April 2009
Dr. Gary McMaster
Chief Scientific Officer
Affymetrix, Inc., Fremont, California, USA
Visit Period: 23 April 2009
Prof. Patrice Koehl
Professor, Computer Science
Associate Director of Bioinformatics
Genome Center, University of California, USA
Visit Period: 18 May 2009
Dr. Marc A. Marti-Renom
Head of the Structural Genomics Unit
Bioinformatics & Genomics Department
Prince Felipe Research Center
Valencia, Spain
Visit Period: 12 - 17 July 2009
Prof. Philippe Derreumaux
Director
UPR9080 CNRS, IBPC at CNRS
Professor at University Paris Diderot - Paris 7
Paris, France
Visit Period: 22 July 2009
Dr. M. Michael Gromiha
Senior Research Scientist
Computational Biology Research Center
National Institute of Advanced Industrial Science
and Technology (AIST)
Tokyo, Japan
Visit Period: 4 - 5 October 2009
Dr. Kristian Vlahovicek
Head of the Division of Biology
Bioinformatics Group
Department of Molecular Biology
Division of Biology Faculty of Science
University of Zagreb, Croatia
Visit Period: 22 October 2009
Prof. Alexander Lyubartsev
Professor
Division of Physical Chemistry
Stockholm University, Sweden
Visit Period: 22 October 2009
Prof. Constantino Tsallis
Professor
Brazilian Center for Physics Research
Brazil
Visit Period: 14 - 22 November 2009
Prof. Frederic Rousseau
Group Leader
Flanders Institute for Biotechnology (VIB)
Free University of Brussels
Belgium
Visit Period: 13 - 18 December 2009
Prof. Joost Schymkowitz
Group Leader
Flanders Institute for Biotechnology (VIB)
Free University of Brussels
Belgium
Visit Period: 13 - 18 December 2009
Dr. Remo Rohs
Associate Research Scientist
Columbia University
New York, USA
Visit Period: 21 December 2009
Dr. Vasily V. Kuvichkin
Senior Staff Scientist
Laboratory of Mechanisms of Reception
Institute of Cell Biophysics of the Russian Academy
of Sciences
Moscow, Russia
Visit Period: 15 February - 8 March 2010
Science Outreach Activities
BII’s contributions go beyond scientific research: the Institute helped organize two major events that were
highlights of Science.09. This annual national event, which promotes scientific awareness among the general public, is jointly organized by
the Agency for Science, Technology and Research (A*STAR) and the Singapore Science Centre.
X-periment! Science Carnival
14 – 16 August 2009
Marina Square Central Atrium
BII’s participation in the X-periment! science carnival allowed the audience
to explore “The World of Proteins”, from their building blocks, the amino
acids, to their complex structures and evolutionary relationships. Visitors
were able to build an amino acid and fold a small protein motif in the
palm of their hands. In addition, they interacted with different protein
molecules with the help of a game controller and saw how humans can be
related to flies by playing the evolutionary protein sequence game.
Biopolis Flu Forum
4 September 2009
Auditorium, Matrix @ Biopolis
The Science Centre and the A*STAR Research Institutes took
part in the Biopolis Flu Forum as part of Science.09. At the
Biopolis Flu Forum, a panel of scientists and doctors gathered
to discuss, debate and answer the audience’s general
and controversial questions concerning the 2009 swine flu
outbreak. A wide range of questions, from “what sort of steps
are public health officials taking to safeguard the public?” to
“have news organizations played a responsible role in educating
and explaining events to the public?”, were addressed
during the forum.
Conferences and Visits
The Bioinformatics Institute organised and participated in
international conferences and symposiums, and conducted training
workshops, as follows:
Conferences and Symposiums
• International Conference on Bioinformatics 2009
(InCoB 2009)
7 – 11 September 2009
• UK-Singapore Partners in Science Symposium:
UK-Singapore Symposium on Current Strategies in
Antimicrobial Therapies
16 – 17 March 2009
Organised by the BII, A*STAR, the Singapore Eye Research
Institute (SERI) and the British High Commission Singapore
• School of Biological Sciences, Nanyang Technological
University (SBS)-Bioinformatics Institute Joint Symposium
21 October 2009
Co-organised by SBS, NTU and BII, A*STAR
• UK-Singapore Symposium on p53: The Next 30 Years
25 – 26 November 2009
Organised by A*STAR, BII and the British High Commission
Singapore
• 1st Singapore-Italy Joint Symposium on Biomedical Sciences
10 – 11 December 2009
Organised by the A*STAR BMRC RIs with support from
Regione Lombardia
Training Workshops
• The Advanced Flow Cytometry Workshop
23 – 24 July 2009
Organised by Biopolis Shared Facilities, A*STAR and BII,
A*STAR
• Joint BII - Department of Biological Sciences,
National University of Singapore (DBS) Workshop: Modern Approaches to Biological Problems
3 – 4 August 2009
Organised by DBS, NUS and BII, A*STAR
• Joint School of Computer Engineering, Nanyang Technological
University (SCE)-BII Workshop on Bioinformatics and
Computational Biology
28 September 2009
Organised by SCE, NTU and BII, A*STAR
Visits to Bioinformatics Institute
The Bioinformatics Institute also hosted a number of
delegations, as follows:
• The Hungarian delegation from the 2nd Singapore-Hungarian
Symposium on Biomedical Devices and Computational Sciences
24 April 2009
• The NUS-Zhejiang delegation
6 July 2009
• The Max Planck delegation
29 July 2009
• The Italian delegation from the 1st Singapore-Italy Joint
Symposium on Biomedical Sciences
11 December 2009
Photo by Tobias Gattermayer, BII
Acknowledgements
UK-Singapore Partners in Science Symposium: UK-Singapore Symposium on
Current Strategies in Antimicrobial Therapies, 16 – 17 March 2009
Photo by Vivek Tanavde, BII
Fernanda Sirota played a key role as BII’s representative in both
events. She was a member of the Biopolis Flu Forum working
committee and was assisted by Tobias Gattermayer, Sebastian
Maurer-Stroh and the BII’s web team in some aspects.
For the X-periment! Science Carnival, Fernanda invigorated the
carnival with many innovative ideas. The success of the 3-day
weekend event was achieved with valuable participation and
contributions from colleagues like Aliaksandr Yarmishyn,
Devanathan Raghunathan, Ivana Mihalek, Janos Kriston-Vizi,
Lee Tze Chuen, Lua Wai Heng, Madhumalar Arumugam, Oleg
Grinchuk, Rowena Cheong, Sebastian Maurer-Stroh, Zhang Zong
Hong, Charlie Tan, Violet Lin and Hanafi Harron, whose team
efforts made the event exciting, fun and educational.
Visit by the Italian Delegation on 11 December 2009
Photo by Vivek Tanavde, BII
UK-Singapore Symposium on p53: The Next 30 Years, 25 – 26 November 2009
Photo by British High Commission Singapore
Photos by Fernanda Sirota and Christine Low, BII
RECREATION CLUB
The Bioinformatics Institute Recreation Club (BII Rec Club) is a
voluntary group of staff from various divisions who come
together to organize activities that foster cohesiveness and make
working in our institute fun. We organize team bonding and social
events for staff to interact with one another, building greater rapport
in our workplace. Our work community also
consists of people of different nationalities, so we seek to promote
an appreciation and understanding of various cultures through a
“Festivals of the World” link on our intranet, where staff and students
can share with us the festivals celebrated in their countries.
Christmas Celebration with APSN Centre for Adults
Not forgetting our part in giving back to the community, we visited
the Association for Persons with Special Needs (APSN) Centre for
Adults, a voluntary welfare organization that caters to people
with mild intellectual disability. The centre provides skills training
to people who are intellectually challenged (IQ 50-70), so that its
students can live independent and fulfilling lives in
society. Members of BII brought cheer to the students by
celebrating Christmas with them through games, songs and gifts.
Highlights of Events:
Chek Jawa, Pulau Ubin
A trip to the last kampong (village) of Singapore! It was a great
getaway from the hustle and bustle of the main island of Singapore
to lush nature, fresh air and tranquility. We took a tour of Chek
Jawa, where several ecosystems, with plants and animals that are
fast disappearing from other parts of the world, can be seen. Besides
the nature walk, we played group games, which fostered
interaction and team bonding among BII staff. We also organized
a photography contest, which let us savor the happenings
of the trip. Photography enthusiasts shared their best shots
in our online Best Photo and Best Caption voting contest, in which
every BII staff member was involved.
(Left to Right) Noraini SULAIMAN, Christine LOW, Betty KEE, FONG Chew Peng
ADMINISTRATIVE TEAM
Creative Artwork by Talented Members of APSN
All set to go
Photo by Vivek Tanavde, BII
Best Caption - “If you really
love your bike, you push it”
The Administrative Team supports the institute’s leadership in creating the conditions for scientific work at BII. It also serves as a link to the
BMSI Business Centre (BBC), the centralized administrative body of A*STAR’s biomedical science institutes. Within this setting, the
Administrative Team provides BII scientists with auxiliary services such as administration, procurement, finance and human resource management,
so that they can concentrate on their areas of expertise in their research work.
BII LOCATION
Address
Bioinformatics Institute
30 Biopolis Street
#07-01, Matrix
Singapore 138671
Group photo - “Say Cheese”
Handicrafts for Sale
BII Movie Screening – Kung Fu Panda
We brought the home movie experience to the workplace, topped with great snacks
and, most importantly, wonderful colleagues. It was simply a time
of relaxation as we enjoyed a good laugh over an entertaining movie.
Future Plans
The BII Rec Club seeks to make BII a fun place to work. We hope
that the regular activities we organize provide an avenue for members
of BII to mingle with one another, strengthening working relationships and
fostering deeper friendships.
Committee Members 2009/2010
The BII Rec Club is led by its Chairlady, Betty Tan, and the committee comprises
Janos Kriston-Vizi, Kavitha Bharatnam, Fala Atkha, Aliaksandr Yarmishyn,
Mohamed Hanifa, Zack Toh, Piroon Jenjaroenpun, Lua Wai Heng and
Vachiranee Limvipavadh.
By Car
Visitors who drive should park at B3 (Basement 3)
and follow the signage “To Matrix Lift Lobby” to locate lift D. Take
lift D to level 1 and approach the receptionist for a visitor’s
pass.
By Bus
The following bus services stop along
North Buona Vista Road:
74, 91, 92, 95, 191, 196, 198, 200
By MRT
Board the East-West line heading towards Boon Lay and alight at
Buona Vista MRT Station. From there, you may take the one-north
free shuttle bus service to Biopolis, which operates from 7.30 am to
7.30 pm every Monday to Friday and from 7.30 am to 1.30 pm on
Saturday.
The show is on... ssshhh
Image Source for Back Cover Design: Computer Model of MDM2 Interacting with Inhibitors by Chandra Verma and Team
Map courtesy of JTC Corporation