MetastasisWay: A Curation and Pathway Visualization System for Metastasis

Hong-Jie Dai3, Nai-Wen Chang1,2, Chu-Hsien Su1, Ming-Siang Huang1, Po-Ting Lai4, Wen-Lian Hsu1

1 Institute of Information Science, Academia Sinica, Taiwan
2 Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taiwan
3 Department of Computer Science and Information Engineering, National Taitung University, Taiwan
4 Department of Computer Science, National Tsing-Hua University, Taiwan

Team number: #282

Introduction

Metastasis refers to the spread of a cancer from its primary site to other parts of the body (secondary sites) while maintaining its malignant growth. The spread of malignancy is often the major concern of patients and clinicians, as metastasis accounts for over 90% of cancer-related deaths. Prediction of metastasis is a highly challenging task due to the dynamic nature of cancers. Two tumors with exactly the same diagnosis may differ in their progression: one may spread to a secondary site while the other does not.

Recently, the increasing awareness of biological signaling pathways and their role in metastasis has enabled life scientists to acquire a more comprehensive overview of the metastatic process. Studies have supported the potential use of gene-specific targeted therapies in treating metastasis. Additional clinical trials can then validate these findings by examining drug-treated patient samples. An increased understanding of the roles of genes in the metastatic mechanism can lead to improved survival of cancer patients through the control of metastasis. However, the complexity of gene-cancer interactions is the major obstacle to gaining insight into these relations.

In light of this, this work attempts to develop a curation system that can construct metastatic pathways by integrating the extensive information contained in the large collection of research papers.
The extracted information can be used to help clinicians and researchers enhance their understanding of metastasis mechanisms, verify drug and gene interactions, and support clinical decisions.

System Overview

Following our previous success in developing a browser extension to assist biocuration [1], the curation interface of MetastasisWay will be implemented as a browser extension, and the entire curation process will be conducted directly on PubMed. In contrast to our previous work, MetastasisWay attempts to recognize a wider range of biomedical concepts, including gene/protein, metastasis, cancer, cytoskeleton, cell line/movement/adhesion, tissue, microRNA, organ, gene expression and experimental techniques. As shown in Figure 1, the front end of MetastasisWay will allow curators to modify the information of all of the recognized concepts above.

Figure 1. The annotation interface for recognized concepts

Currently, no available corpus contains annotations for all concepts of interest. Therefore, we will employ a principle-based algorithm (PBA) to identify metastasis-related terminologies and extract relations among them to construct the pathway. In PBA, we use a collection of frames to represent linguistic concepts or rules. Each frame is a collection of slots with relations specified among them. A slot can be a word, a phrase, a semantic category, or another frame concept. One can specify position relations, collocation relations, agreement relations and others among the slots. Unlike normal templates, which mostly involve left-right relations among their components in a sentence, relations within frames can be multi-dimensional. For example, one slot can be a variable indicating the topic to which other slots belong. A frame can be manually created or generated by the supervised pattern generation algorithm.
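As a rough illustration of the frame and slot description above, the following Python sketch shows one way such a structure could be represented. It is not the PBA implementation: the class names Slot and Frame, the field names, the helper leaf_values, and the example slot contents ("miR", "NUMBER", "GENE", "regulates") are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Slot:
    slot_id: int
    kind: str  # "word", "phrase", "semantic_category", or "frame"
    value: Union[str, "Frame"]  # a nested Frame when kind == "frame"


@dataclass
class Frame:
    name: str
    slots: List[Slot]
    # Relations among slots as (relation type, slot id, slot id),
    # e.g. position, collocation, or agreement relations.
    relations: List[Tuple[str, int, int]] = field(default_factory=list)


def leaf_values(frame: Frame) -> List[str]:
    """Flatten a frame, expanding nested frames, into its leaf slot values."""
    out: List[str] = []
    for slot in frame.slots:
        if isinstance(slot.value, Frame):
            out.extend(leaf_values(slot.value))
        else:
            out.append(slot.value)
    return out


# Hypothetical frame for a microRNA mention (illustrative content only).
mirna = Frame(
    name="microRNA_mention",
    slots=[
        Slot(1, "word", "miR"),
        Slot(7, "word", "-"),
        Slot(2, "semantic_category", "NUMBER"),
    ],
    relations=[("precedes", 1, 7), ("precedes", 7, 2)],
)

# A frame whose first slot is itself a frame concept.
regulation = Frame(
    name="regulation",
    slots=[
        Slot(1, "frame", mirna),
        Slot(2, "word", "regulates"),
        Slot(3, "semantic_category", "GENE"),
    ],
)

print(leaf_values(regulation))
```

Keeping relations as explicit tuples, rather than relying on slot order alone, is one way to leave room for the non-positional relations mentioned in the text.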
Figure 2. The developed slots and frames for microRNA

To illustrate our partial matching scheme, consider the third simple frame concept (INSTANCE in Figure 2), which involves 5 slots (HAS-PART in Figure 2) whose occurrences in a sentence are arranged as 1, 7, 2, 7, 3, 7, 6 from left to right. Suppose we are able to identify slots 1, 7, 2, 7, 3, in this order, in a sentence. The trailing slots 7 and 6 are missing (deletion), and some words may also occur between two neighboring slots (insertion), such as between 1 and 7 or between 3 and 7. Furthermore, slot 1 can be matched through the word sense rather than the words themselves (substitution).

Our partial matching scheme allows for insertion, deletion and substitution. An insertion is given a positive score if it tends to collocate with its neighboring matched slot; otherwise, it is given a negative score. A deletion can be harmless if the missing slot is not included in the key combination of a frame. Note that many key combinations can be pre-specified as indices of a frame. Collocations and bigram statistics can be incorporated to estimate the score of different combinations. A substitution is given a lower score that depends on the proximity of the two terms in a semantic tree. After all these scores are determined, we can use an alignment algorithm to compute a fitness score and decide how well the frame matches the sentence.

The PBA can be applied to a variety of information extraction tasks, such as concept recognition and relation extraction. The PBA example described here was developed for microRNA recognition, and its precision and recall on the microRNA corpus [2] are 0.986 and 0.953, respectively.

All annotation tasks, including concept recognition and relation extraction, will be performed on our annotation server, and the relation extraction results will be returned to the curation interface of MetastasisWay and displayed directly on PubMed.
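Returning to the partial matching scheme described above, the sketch below shows how an alignment-style dynamic program could combine insertion, deletion and substitution scores into a single fitness score. It is a minimal illustration and not the PBA implementation: the Token class, the frame_fitness function, all numeric score values, and the choice of slots 1, 2 and 3 as the key combination are assumptions, since the text gives no concrete values.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Token:
    text: str
    slot: Optional[int] = None  # id of the slot this token can fill, if any
    by_sense: bool = False      # matched via word sense (substitution)
    collocates: bool = False    # unmatched word that collocates with a neighbor


# Hypothetical scores; the source text does not specify numeric values.
EXACT, SUBSTITUTION = 2.0, 1.0
INSERT_COLLOCATED, INSERT_OTHER = 0.5, -0.5
DELETE_KEY, DELETE_OPTIONAL = -2.0, 0.0


def frame_fitness(frame: List[int], key_slots: Set[int],
                  tokens: List[Token]) -> float:
    """Best alignment score of a slot sequence against a token sequence."""
    n, m = len(frame), len(tokens)
    neg = float("-inf")
    # best[i][j]: best score after consuming i frame slots and j tokens.
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = best[i][j]
            if cur == neg:
                continue
            if i < n:  # deletion: frame slot i has no token in the sentence
                cost = DELETE_KEY if frame[i] in key_slots else DELETE_OPTIONAL
                best[i + 1][j] = max(best[i + 1][j], cur + cost)
            if j < m:  # insertion: token j is not assigned to any slot
                gain = INSERT_COLLOCATED if tokens[j].collocates else INSERT_OTHER
                best[i][j + 1] = max(best[i][j + 1], cur + gain)
            if i < n and j < m and tokens[j].slot == frame[i]:
                # match: exact word match or match through word sense
                gain = SUBSTITUTION if tokens[j].by_sense else EXACT
                best[i + 1][j + 1] = max(best[i + 1][j + 1], cur + gain)
    return best[n][m]


# Slot order of the example frame: 1, 7, 2, 7, 3, 7, 6.
frame = [1, 7, 2, 7, 3, 7, 6]
# Placeholder tokens: slot 1 matched by sense, one inserted word,
# then slots 7, 2, 7, 3; the final slots 7 and 6 are missing.
tokens = [
    Token("tok_a", slot=1, by_sense=True),
    Token("tok_x"),
    Token("tok_b", slot=7),
    Token("tok_c", slot=2),
    Token("tok_d", slot=7),
    Token("tok_e", slot=3),
]
print(frame_fitness(frame, {1, 2, 3}, tokens))
```

In this sketch a deleted slot costs nothing unless it belongs to the key combination, which mirrors the statement that such deletions can be harmless. A threshold on the returned score would then decide whether the frame is accepted for the sentence.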
Figure 3 shows an example of the network constructed from the content of Figure 1. The developed system will be available at http://btm.tmu.edu.tw/metastasisWay.

Figure 3. The pathway constructed for Figure 1

Browser compatibility

The curation interface of MetastasisWay will support Google Chrome and Mozilla Firefox. The constructed metastasis database can be accessed with any browser.

Proposed Tasks

Search PubMed using the predefined query terms, e.g. "MeSH terms (Neoplasm Metastasis)", with the specified query tags, e.g. "EMT [title/abstract] AND TGF beta [title/abstract]", to generate a list of abstracts related to metastasis. Decide whether each abstract is curatable. If the abstract is a curatable target, the curator should extract the following information:

1. PMID of the abstract
2. Gene terms and their corresponding gene IDs from Entrez Gene
3. Evidence sentence containing metastasis information
4. Biomedical concepts associated with metastasis within the evidence sentence, including neoplasm metastasis, cytoskeleton, cell movement, cell adhesion, tissue, body part, microRNA, cancer, cell line, investigative technologies, gene expression, and other relevant concept types (e.g. tumor growth, liver pathology, cancer patient survival condition)
5. Classification of the relations between concepts
6. Pathway construction based on the relations, such as the regulation between gene and cancer, metastasis and other concepts

The constructed pathway should involve genes, metastases, and cancers of potential significance. The order of the concepts in a pathway indicates how the biological process unfolds. Possible templates for pathway construction are listed as follows:

Gene → Gene → Metastasis → Body Part/Tissue
The interaction between genes results in metastasis to certain body parts or tissues.

Gene → Cancer → Body Part/Tissue
The regulation of a certain gene causes a cancer to metastasize to certain body parts or tissues.

Metastasis → Body Part → Cancer Patient Survival Condition
The survival condition of a cancer patient is influenced by metastasis to certain body parts.

Drug/Therapy → Cancer Patient Survival Condition
The survival condition of a cancer patient is influenced by the target drug or therapy.

Take the sentence "CXCR7 also binds to CXCL12 and has been recently found to enhance lung and breast primary tumor growth, as well as metastasis formation" shown in Figure 1 as an example. The curator should construct the pathway presented in Figure 3 using Graphviz's dot language (footnote 1).

1 http://www.graphviz.org/content/dot-language

The task will be run both manually and using the MetastasisWay system. We will report the annotation performance in terms of precision, recall and F-score, and compare the curation time required with and without text mining assistance. We may also consider comparing the constructed network with data from the Ingenuity Pathway Analysis (IPA) database (footnote 2).

Manual task: Curators will be given a list of PubMed abstracts for further processing and should submit the information of interest through a web form. The information of interest includes the aforementioned biomedical concepts associated with metastasis and the relationships between these concepts. Curators have to list all biomedical concepts involved in metastasis-related relations for each abstract.

Using the MetastasisWay system: For each abstract in the given list, curators will be presented with a list of extracted biomedical concepts, the evidence sentences and a graph of the metastasis pathway generated by the MetastasisWay system. Curators will examine the information extracted by our system in context, analyze how it differs from their own judgment, and offer suggestions for further improvement.
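For the CXCR7 example sentence quoted earlier in this section, a pathway in the dot language might be produced along the following lines. This is only a sketch: Figure 3 is not reproduced here, so the edges and their labels are one possible reading of the sentence, and the helper to_dot and the graph name "pathway" are names made up for this example.

```python
from typing import List, Tuple


def to_dot(name: str, edges: List[Tuple[str, str, str]]) -> str:
    """Render (source, target, relation) triples as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst, label in edges:
        lines.append(f'    "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Assumed relations read off the example sentence (not taken from Figure 3).
edges = [
    ("CXCR7", "CXCL12", "binds"),
    ("CXCR7", "primary tumor growth", "enhances"),
    ("CXCR7", "metastasis formation", "enhances"),
]

dot_text = to_dot("pathway", edges)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz dot command, for example dot -Tpng pathway.dot -o pathway.png.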

Targeted User Community

The target user community is anyone who hopes to discover the relationships between genes or gene products and metastasis through text mining. MetastasisWay can offer potential metastatic relations based on our text mining techniques, and curators can manually validate the results using their domain knowledge through the convenient platform offered by the BioCreative interactive curation task (IAT). This work will be carried out in collaboration with candidate curators from Dr. Ueng-Cheng Yang's lab at the Institute of Biomedical Informatics, National Yang-Ming University, Taiwan, and any potential curators recruited by IAT.

Proposed Curators

Dr. Rofe-Amor Obena, Institute of Chemistry, Academia Sinica, Taipei, Taiwan
Dr. Yan-Hua Huang, Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
M.D. Zheng-Da Chen, Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
M.D. Ming-Siang Huang, Institute of Information Science, Academia Sinica, Taipei, Taiwan
M.D. Syed-Abdul Shabbir, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan

System Status

We are now developing the front end of MetastasisWay and wrapping the annotation tools as web services. The entire system should be available before June 2015.

References

1. Dai HJ, Wu JC, Lin WS, Reyes AJ, Dela Rosa MA, Syed-Abdul S, Tsai RT, Hsu WL: LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations. Database (Oxford) 2014, 2014.
2. Bagewadi S, Bobić T, Hofmann-Apitius M, Fluck J, Klinger R: Detecting miRNA Mentions and Relations in Biomedical Literature. F1000Research 2014, 3(205).

2 http://www.ingenuity.com/products/ipa