MetastasisWay: A Curation and Pathway Visualization System
for Metastasis
Hong-Jie Dai³, Nai-Wen Chang¹,², Chu-Hsien Su¹, Ming-Siang Huang¹, Po-Ting Lai⁴,
Wen-Lian Hsu¹
¹ Institute of Information Science, Academia Sinica, Taiwan
² Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taiwan
³ Department of Computer Science and Information Engineering, National Taitung University, Taiwan
⁴ Department of Computer Science, National Tsing-Hua University, Taiwan
Team number: #282
Introduction
Metastasis refers to the spread of a cancer from its primary site to other parts of the body
(secondary sites) while maintaining its malignant growth. The spread of malignancy is
often the major concern of patients and clinicians, as it accounts for over 90% of deaths
among cancer patients. Prediction of metastasis is a highly challenging task due to the dynamic
nature of cancers. Two tumors with the exact same diagnosis may differ in their progression:
one may spread to a secondary site while the other does not. Recently, the increasing awareness of
biological signaling pathways and their role in metastasis has enabled life scientists to
acquire a more comprehensive overview of the metastatic process. Studies have supported
the potential use of gene-specific targeted therapies in treating metastasis, and additional
clinical trials that examine drug-treated patient samples will be needed to validate this
finding. Increased understanding of the roles of genes in the metastatic mechanism can lead
to improved survival of cancer patients through the control of metastasis. However, the
complexity of gene-cancer interactions is the major obstacle to gaining insight into
these relations. In light of this, this work attempts to develop a curation system that can
construct metastatic pathways by integrating the extensive information contained in the large
collection of research papers. The extracted information can be used to help clinicians and
researchers enhance their understanding of metastasis mechanisms, verify drug and gene
interactions, and support clinical decisions.
System Overview
Following our previous success in developing a browser extension to assist biocuration [1],
the curation interface of MetastasisWay will be implemented as a browser extension and the
entire curation process will be directly conducted on PubMed. In contrast to our previous
work, MetastasisWay attempts to recognize a wider range of biomedical concepts including
gene/protein, metastasis, cancer, cytoskeleton, cell line/movement/adhesion, tissue,
microRNA, organ, gene expression and experimental techniques. As shown in Figure 1, the
front-end of MetastasisWay will allow curators to modify the information of all of the
recognized concepts above.
Figure 1. The annotation interface for recognized concepts
Currently, there is no available corpus that contains annotations for all concepts of interest.
Therefore, we will employ a principle-based algorithm (PBA) to identify metastasis-related
terminologies and extract relations among them to construct the pathway. In PBA, we use a
collection of frames to represent linguistic concepts or rules. Each frame is a collection of
slots with relations specified among them. A slot can be a word, phrase, semantic category,
or another frame concept. One can specify position relations, collocation relations,
agreement relations and others among the slots. Unlike normal templates, which involve mostly
left-right relations among their components in a sentence, relations within frames can be
multi-dimensional. For example, one slot can be a variable indicating the topic to which other
slots belong. A frame can be manually created or generated by the supervised pattern
generation algorithm.
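The frame and slot description above can be pictured as a small data structure. The following is a minimal sketch only, not the PBA implementation: the class names `Slot` and `Frame`, the `kind` labels, and the example microRNA frame are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Slot:
    """One slot of a frame: a word/phrase, a semantic category, or another frame."""
    slot_id: int
    kind: str   # hypothetical labels: "word", "category" or "frame"
    value: str  # the word, the category name, or the nested frame's name


@dataclass
class Frame:
    """A collection of slots plus relations among them."""
    name: str
    slots: List[Slot]
    order: List[int]  # position relation: expected left-to-right slot ids
    # Key combinations: sets of slot ids that must be matched for the frame to apply.
    key_combinations: List[Tuple[int, ...]] = field(default_factory=list)

    def is_key_satisfied(self, matched_ids: Iterable[int]) -> bool:
        """True if at least one key combination is fully covered by the matched slots."""
        matched = set(matched_ids)
        return any(set(combo) <= matched for combo in self.key_combinations)


# Hypothetical frame for a microRNA mention (slot contents are illustrative only).
mirna_frame = Frame(
    name="MIRNA_MENTION",
    slots=[
        Slot(1, "category", "SPECIES_PREFIX"),
        Slot(2, "word", "miR"),
        Slot(3, "category", "NUMBER"),
    ],
    order=[1, 2, 3],
    key_combinations=[(2, 3)],
)
```

In this sketch, a sentence that matches slots 2 and 3 satisfies the frame even when slot 1 is absent, which mirrors the idea that some slots are optional while others form a key combination.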
Figure 2. The developed slots and frames for microRNA
To illustrate our partial matching scheme, consider the third simple frame concept
(INSTANCE in Figure 2), which involves 5 slots (HAS-PART in Figure 2) whose occurrences
in a sentence are arranged as 1, 7, 2, 7, 3, 7, 6 from left to right. Suppose we were able to
identify slots 1, 7, 2, 7, 3 in this order in a sentence. The trailing slots 7 and 6 are missing
(deletion), and some words may also occur between two neighboring slots (insertion), such as 1
and 7 or 3 and 7. Furthermore, slot 1 can be matched through the word sense rather than the
words themselves (substitution). Our partial matching scheme allows for insertion, deletion
and substitution. An insertion is given a positive score if it tends to collocate with its
neighboring matched slot; otherwise, it is assigned a negative score. A
deletion can be harmless if the missing slot is not included in a key combination of the frame.
Note that many key combinations can be pre-specified as indices of a frame. Collocations
and bigram statistics can be incorporated to estimate the score of different combinations. A
substitution is given a lower score that depends on the proximity of the two terms on a semantic
tree. After all these scores are determined, we can use an alignment algorithm to compute a
fitness score that indicates how well the frame matches the sentence. The PBA can be applied to a
variety of information extraction tasks, such as concept recognition and relation extraction.
The PBA example described was developed for microRNA recognition, and its precision and
recall on the microRNA corpus [2] are 0.986 and 0.953, respectively.
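The scoring idea behind partial matching can be illustrated with a generic dynamic-programming alignment. This is a sketch under stated assumptions, not the alignment algorithm used in PBA: the function `fitness`, the toy lexicons, and all score values are hypothetical, and the insertion score here depends only on the inserted word rather than on its neighboring matched slot.

```python
from typing import Callable, Optional, Sequence, Set


def fitness(slots: Sequence[str], tokens: Sequence[str],
            slot_score: Callable[[str, str], Optional[float]],
            insertion_score: Callable[[str], float],
            key_slots: Set[int],
            key_deletion_penalty: float = -2.0) -> float:
    """Best alignment score of an ordered slot sequence against sentence tokens."""
    n, m = len(slots), len(tokens)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = best[i][j]
            if cur == neg_inf:
                continue
            if i < n:  # deletion: slot i is missing (free unless it is a key slot)
                penalty = key_deletion_penalty if i in key_slots else 0.0
                best[i + 1][j] = max(best[i + 1][j], cur + penalty)
            if j < m:  # insertion: token j lies between slots
                best[i][j + 1] = max(best[i][j + 1], cur + insertion_score(tokens[j]))
            if i < n and j < m:  # exact match or substitution
                s = slot_score(slots[i], tokens[j])
                if s is not None:
                    best[i + 1][j + 1] = max(best[i + 1][j + 1], cur + s)
    return best[n][m]


# Toy resources (assumptions): exact lexicon, semantically related terms, collocates.
EXACT = {"MIRNA": {"miR-21"}, "REGULATE": {"regulates"}, "GENE": {"PTEN"}}
RELATED = {"REGULATE": {"suppresses"}}
COLLOCATES = {"strongly"}


def slot_score(slot: str, token: str) -> Optional[float]:
    if token in EXACT.get(slot, set()):
        return 2.0  # exact match
    if token in RELATED.get(slot, set()):
        return 1.0  # substitution: lower score for a semantically close term
    return None     # token cannot fill this slot


def insertion_score(token: str) -> float:
    return 0.5 if token in COLLOCATES else -0.5


slots = ["MIRNA", "REGULATE", "GENE"]
full = fitness(slots, ["miR-21", "strongly", "suppresses", "PTEN"],
               slot_score, insertion_score, key_slots={0, 2})
partial = fitness(slots, ["miR-21", "regulates"],
                  slot_score, insertion_score, key_slots={0, 2})
print(full, partial)  # 5.5 2.0
```

The first sentence scores higher because it contains one collocating insertion and one substitution, whereas the second is penalized for deleting a key slot.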
All annotation tasks, including concept recognition and relation extraction, will be
performed on our annotation server, and the relation extraction results will be returned to the
curation interface of MetastasisWay and directly displayed on PubMed. Figure 3 shows an
example of the network constructed from the content of Figure 1. The developed system will
be available at http://btm.tmu.edu.tw/metastasisWay.
Figure 3. The pathway constructed for Figure 1
Browser compatibility
The curation interface of MetastasisWay will support Google Chrome and Mozilla Firefox. The
constructed metastasis database can be accessed from any browser.
Proposed Tasks
Search PubMed using the predefined query terms, e.g. “MeSH terms (Neoplasm Metastasis)”
with the specified query tags, e.g. "EMT [title/abstract] AND TGF beta [title/abstract]" to
generate a list of abstracts related to metastasis.
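As an illustration of the search step, the query string could be assembled and sent to the NCBI E-utilities ESearch endpoint. This is a minimal sketch: the helper names `build_query` and `esearch_url` are hypothetical, and joining the MeSH term and the tagged terms with AND is an assumption about how the predefined terms and tags are combined.

```python
from typing import List
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(mesh_term: str, tagged_terms: List[str]) -> str:
    """Combine one MeSH term with title/abstract terms (assumed to be ANDed)."""
    parts = [f'"{mesh_term}"[MeSH Terms]']
    parts += [f"{term}[Title/Abstract]" for term in tagged_terms]
    return " AND ".join(parts)


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build the ESearch URL; fetching it returns the matching PMIDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


query = build_query("Neoplasm Metastasis", ["EMT", "TGF beta"])
url = esearch_url(query)
print(query)
print(url)
```

The sketch only builds the request; retrieving the URL (for example with `urllib.request.urlopen`) is left out so that the example runs without network access.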
Determine whether each abstract is curatable. If the abstract is a curatable target, the curator
should extract the following information:
1. PMID of the abstract
2. Gene terms and their corresponding gene IDs from Entrez Gene
3. Evidence sentence containing metastasis information
4. Biomedical concepts associated with metastasis within the evidence sentence, including
neoplasm metastasis, cytoskeleton, cell movement, cell adhesion, tissue, body part,
microRNA, cancer, cell line, investigative technologies, gene expression, and other
relevant concept types (e.g. tumor growth, liver pathology, cancer patient survival
condition)
5. Classification of the relations between concepts
6. Pathway construction based on the relations, such as the regulation between gene and
cancer, metastasis and other concepts
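The six items above could be held in one record per curated abstract. The following is a hedged sketch of such a record: the class and field names are hypothetical, the PMID and gene IDs are deliberately left as placeholders, and the concept types and relation labels are illustrative rather than the system's actual vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GeneMention:
    term: str
    entrez_gene_id: Optional[str] = None  # filled in by the curator (item 2)


@dataclass
class CurationRecord:
    pmid: str                                   # item 1
    evidence_sentence: str                      # item 3
    genes: List[GeneMention] = field(default_factory=list)
    # item 4: (mention text, concept type)
    concepts: List[Tuple[str, str]] = field(default_factory=list)
    # items 5 and 6: (source concept, relation label, target concept)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A record is complete once every gene has an ID and evidence is present."""
        return bool(self.pmid and self.evidence_sentence and self.genes
                    and all(g.entrez_gene_id for g in self.genes))


record = CurationRecord(
    pmid="PMID-PLACEHOLDER",  # placeholder, not a real identifier
    evidence_sentence=("CXCR7 also binds to CXCL12 and has been recently found to "
                       "enhance lung and breast primary tumor growth, as well as "
                       "metastasis formation"),
    genes=[GeneMention("CXCR7"), GeneMention("CXCL12")],
    concepts=[("metastasis formation", "neoplasm metastasis"),
              ("primary tumor growth", "tumor growth")],
    relations=[("CXCR7", "binds", "CXCL12")],
)
print(record.is_complete())  # False until gene IDs are supplied
```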
The constructed pathway should involve genes, metastases, and cancers that bear potential
significance. Probable templates for pathway construction are listed as follows:
• Gene → Gene → Metastasis → Body Part/Tissue
  - The order of the concepts in the pathway describes how the biological process
    happened. In this template, the interaction between genes resulted in metastasis
    to certain body parts or tissues.
• Gene → Cancer → Body Part/Tissue
  - The regulation of a certain gene caused cancer metastasis to certain
    body parts or tissues.
• Metastasis → Body Part → Cancer Patient Survival Condition
  - This describes the survival condition of a cancer patient as influenced by
    metastasis to certain body parts.
• Drug/Therapy → Cancer Patient Survival Condition
  - This describes the survival condition of a cancer patient as influenced by the target
    drug or therapy.
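One way to read the templates above is as ordered sequences of concept types that a constructed pathway can be checked against. The sketch below assumes that reading: the `TEMPLATES` encoding and the function `matching_template` are hypothetical and only cover the four templates listed.

```python
from typing import List, Optional, Sequence, Set

# Each template is an ordered list of positions; a position lists the allowed concept types.
TEMPLATES: List[List[Set[str]]] = [
    [{"Gene"}, {"Gene"}, {"Metastasis"}, {"Body Part", "Tissue"}],
    [{"Gene"}, {"Cancer"}, {"Body Part", "Tissue"}],
    [{"Metastasis"}, {"Body Part"}, {"Cancer Patient Survival Condition"}],
    [{"Drug", "Therapy"}, {"Cancer Patient Survival Condition"}],
]


def matching_template(concept_types: Sequence[str]) -> Optional[int]:
    """Return the index of the first template the type sequence fits, else None."""
    for index, template in enumerate(TEMPLATES):
        if len(template) == len(concept_types) and all(
            ctype in allowed for ctype, allowed in zip(concept_types, template)
        ):
            return index
    return None


print(matching_template(["Gene", "Cancer", "Tissue"]))  # 1
print(matching_template(["Gene", "Tissue"]))            # None
```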
Take the sentence "CXCR7 also binds to CXCL12 and has been recently found to enhance
lung and breast primary tumor growth, as well as metastasis formation" shown in Figure 1 as
an example. The curator should construct the pathway presented in Figure 3 using the
Graphviz dot language¹.
¹ http://www.graphviz.org/content/dot-language
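To make the dot-language step concrete, relations read off the example sentence can be serialized as a directed graph. This is an illustrative sketch and not a reproduction of Figure 3: the helper `to_dot`, the choice of nodes, and the edge labels ("binds", "enhances") are assumptions derived only from the wording of the sentence.

```python
from typing import List, Tuple


def to_dot(edges: List[Tuple[str, str, str]], name: str = "pathway") -> str:
    """Serialize (source, relation, target) triples as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for source, relation, target in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


# Relations suggested by the example sentence (labels are illustrative).
edges = [
    ("CXCR7", "binds", "CXCL12"),
    ("CXCR7", "enhances", "primary tumor growth"),
    ("CXCR7", "enhances", "metastasis formation"),
]

dot_source = to_dot(edges)
print(dot_source)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng pathway.dot -o pathway.png`.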
The task will be run both manually and using the MetastasisWay system. We will report the
annotation performance in terms of precision/recall/F-score, and compare the curation time
required with and without text-mining assistance. We may also consider comparing the
constructed network with the data from the Ingenuity Pathway Analysis (IPA) database².
• Manual task: Curators will be given a list of PubMed abstracts for further processing,
  and should submit the information of interest through a web form. Information of interest
  includes the aforementioned biomedical concepts associated with metastasis and the
  relationships between these concepts. Curators have to list all biomedical concepts
  involved in metastasis-related relations for each abstract.
• Using the MetastasisWay system: For each abstract in the given list, curators will be
  presented with a list of extracted biomedical concepts, the evidence sentences and a
  graph of the metastasis pathway generated by the MetastasisWay system. Curators will
  examine the information extracted by our system against the context, analyze any
  differences and offer suggestions for further improvement.
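The precision/recall/F-score comparison mentioned above can be computed by treating the curators' annotations as the gold standard and the system output as predictions. The sketch below uses the standard set-based definitions; the function name `prf` and the example annotations are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def prf(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted annotations against gold annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical annotations: (mention text, concept type) pairs for one abstract.
gold = {("CXCR7", "gene"), ("CXCL12", "gene"),
        ("metastasis formation", "metastasis"), ("lung", "body part")}
system = {("CXCR7", "gene"), ("CXCL12", "gene"),
          ("metastasis formation", "metastasis"), ("breast", "body part")}

precision, recall, f_score = prf(gold, system)
print(precision, recall, f_score)  # 0.75 0.75 0.75
```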
Targeted User Community
The target user community is anyone who hopes to discover the relationships between
genes/gene products and metastasis through text mining. MetastasisWay can suggest potential
metastatic relations based on our text mining techniques, and curators can manually validate
the results based on their domain knowledge through the convenient platform offered by the
BioCreative interactive curation task (IAT). This work will be conducted in collaboration with
candidate curators from Dr. Ueng-Cheng Yang’s lab at the Institute of Biomedical Informatics,
National Yang-Ming University in Taiwan, and any potential curators recruited by IAT.
Proposed Curators
• Dr. Rofe-Amor Obena, Institute of Chemistry, Academia Sinica, Taipei, Taiwan
• Dr. Yan-Hua Huang, Institute of Biomedical Informatics, National Yang-Ming University,
  Taipei, Taiwan
• M.D. Zheng-Da Chen, Institute of Biomedical Informatics, National Yang-Ming
  University, Taipei, Taiwan
• M.D. Ming-Siang Huang, Institute of Information Science, Academia Sinica, Taipei,
  Taiwan
• M.D. Syed-Abdul Shabbir, College of Medical Science and Technology, Taipei Medical
  University, Taipei, Taiwan
System Status
We are now developing the front end of MetastasisWay and wrapping the annotation tools
as web services. The entire system should be available before June 2015.
References
1. Dai HJ, Wu JC, Lin WS, Reyes AJ, Dela Rosa MA, Syed-Abdul S, Tsai RT, Hsu WL:
LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system
combining text mining and expert annotations. Database (Oxford) 2014, 2014.
2. Bagewadi S, Bobić T, Hofmann-Apitius M, Fluck J, Klinger R: Detecting miRNA
Mentions and Relations in Biomedical Literature. F1000Research 2014, 3(205).
² http://www.ingenuity.com/products/ipa