MedicineAsk: An intelligent search facility for information about medicines
Ricardo João Galamba de Sepúlveda Ferrão
Thesis to obtain the Master of Science Degree in
Information Systems and Computer Engineering
Supervisors:
Prof. Helena Isabel de Jesus Galhardas
Prof. Maria Luísa Torres Ribeiro Marques da Silva Coheur
Examination Committee
Chairperson: Prof. Miguel Nuno Dias Alves Pupo Correia
Supervisor: Prof. Helena Isabel de Jesus Galhardas
Member of the Committee: Prof. Ana Teresa Correia de Freitas
October 2014
Resumo
Quick and easy access to information is very important in the field of medicine. Natural Language interfaces are one of the ways to access this kind of information. MedicineAsk is a software prototype that seeks to answer questions in Portuguese about medicines and active substances. It was designed to be easy to use both by medical staff and by common users. Answers to questions are obtained from information previously extracted from Infarmed's Prontuário Terapêutico (Therapeutic Handbook) and stored in a relational database. This thesis describes the extension of the Natural Language processing module of MedicineAsk. We focus on increasing the number of user questions that can be answered. First, we added machine learning techniques for question classification using Support Vector Machines. Second, support for questions that include anaphora and ellipsis was implemented. Finally, we improved the synonym detector implemented in the previous version of MedicineAsk. We performed a validation of each new addition to MedicineAsk and identified the limitations found, suggesting some solutions. The improved version of the MedicineAsk Natural Language processor answered 17% more questions than the previous version of MedicineAsk, and a further 5% more questions when handling anaphora. This thesis also reports the state of the art of question answering systems in the medical domain, of other types of web applications in the area of medicine, and of medical information retrieval systems.
Keywords: Natural Language, Medicine, Support Vector Machines, Anaphora Resolution
Abstract
Obtaining information quickly and easily is very important in the medical field. Natural Language Interfaces are one way to access this kind of information. MedicineAsk is a prototype that seeks to answer
Portuguese Natural Language questions about medicines and active substances. It was designed to
be easy to use so that questions may be posed by both medical staff and common users. Questions
are answered through information previously extracted from the Infarmed's Therapeutic Handbook and
stored in a relational database. This thesis describes the extension of the Natural Language processing
module of MedicineAsk. We focused on increasing the quantity of answerable user questions. First, we
added machine learning techniques for question classification by using Support Vector Machines. Second, support for questions including anaphora and ellipsis has been implemented. Third, we extended
the synonym detection feature implemented in the previous version of MedicineAsk. We performed a
validation over each of the new MedicineAsk features. Our improved MedicineAsk NLI answered 17%
more questions than the previous version of MedicineAsk, with a further 5% increase when handling
anaphora. We identified current limitations of MedicineAsk and suggested some solutions. This document also shows the state of the art on medical domain question answering systems, on other types of
web-based systems in the area of medicine and on information retrieval systems for medical information.
Keywords: Natural Language, Medicine, Support Vector Machines, Anaphora Resolution
Contents
Resumo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1 Introduction 1
1.1 Web-based system for querying data about medicines in Portuguese . . . . . . . 3
1.2 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Related Work 7
2.1 Medical Question Answering systems . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 General Domain Question Answering systems . . . . . . . . . . . . . . . . . . 10
2.3 Medical Text Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Web-based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Existing Algorithms for Anaphora Resolution . . . . . . . . . . . . . . . . . . . 12
3 Background 15
3.1 The MedicineAsk Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.2 Natural Language Interface . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.3 Validation of the MedicineAsk prototype . . . . . . . . . . . . . . . . . . 21
3.2 LUP: A Language Understanding Platform . . . . . . . . . . . . . . . . . . . . 21
4 Improvements to the Natural Language Interface module 25
4.1 Automatic question classification . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.1 Answering MedicineAsk questions with SVM . . . . . . . . . . . . . . . 26
4.1.2 Adding SVM to MedicineAsk . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Anaphora and Ellipsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.1 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Synonyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5 Validation 35
5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2 Rule-based versus Automatic Question Classification . . . . . . . . . . . . . . 38
5.2.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.2 Discussion and Error Analysis . . . . . . . . . . . . . . . . . . . . . . . 40
5.3 Integrating SVM into the MedicineAsk NLI . . . . . . . . . . . . . . . . . . . . 42
5.4 First Experiment - Combining Question Answering approaches . . . . . . . . . 42
5.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4.2 Discussion and Error Analysis . . . . . . . . . . . . . . . . . . . . . . . 43
5.5 Second Experiment - Increasing the training data . . . . . . . . . . . . . . . . 44
5.5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5.2 Discussion and Error Analysis . . . . . . . . . . . . . . . . . . . . . . . 44
5.6 Anaphora Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.6.2 Discussion and Error Analysis . . . . . . . . . . . . . . . . . . . . . . . 47
5.7 Testing Test Corpus B with anaphora resolution . . . . . . . . . . . . . . . . . 48
5.7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.7.2 Discussion and Error Analysis . . . . . . . . . . . . . . . . . . . . . . . 48
5.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6 Conclusions 51
6.1 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.1 Additional question types . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.2 Additional strategies for question answering technique combinations . . 53
6.2.3 Addressing common user mistakes . . . . . . . . . . . . . . . . . . . . . 53
6.2.4 User evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2.5 Question type anaphora . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2.6 Updates to Information Extraction . . . . . . . . . . . . . . . . . . . . . 55
6.2.7 MedicineAsk on mobile platforms . . . . . . . . . . . . . . . . . . . . . . 56
6.2.8 Analysing Portuguese NL questions with information in other languages . . 56
Bibliography 61
A Questionnaire used to obtain test corpora 63
B Dictionary used to identify named medical entities in a user question (Excerpt) 67
C Training Corpus B Excerpt 69
List of Tables
2.1 Question Analysis steps in MEANS. The input of a step is the output resulting from the previous step. . . . 7
5.1 Number of questions for each question type in the Training Corpus A. . . . 36
5.2 Number of user questions and expected answer type for each scenario for Test Corpus A . . . 36
5.3 Number of questions for each question type in Training Corpus B. . . . 37
5.4 Number of user questions and expected answer type for each scenario for Test Corpus B . . . 37
5.5 Percentage of questions with anaphora correctly classified . . . 47
List of Figures
1.1 Results of searching for “medicines for headache” using the eMedicine website. . . . 2
3.1 Architecture of MedicineAsk Information Extraction module. Image taken from [Mendes, 2011]. . . . 16
3.2 Architecture of MedicineAsk Natural Language Interface module. Image taken from [Mendes, 2011]. . . . 19
3.3 LUP architecture. . . . 22
4.1 Strategy 1. . . . 28
4.2 Strategy 2. . . . 28
4.3 Strategy 3. . . . 29
4.4 Answering a question with no anaphora and storing its information in the Antecedent Storage. . . . 30
4.5 Solving anaphora in a question. Paracetamol is ignored because it is an active substance. . . . 31
5.1 Scenario used to encourage users to use anaphora. . . . 38
5.2 Percentage of correctly classified questions by scenario for common users after improvements. . . . 39
5.3 Percentage of correctly classified questions by scenario for users in the medical field after improvements. . . . 39
5.4 Percentage of correctly classified questions by scenario for all users after improvements. . . . 40
5.5 Percentage of correctly classified questions by scenario for common users, using string similarity techniques . . . 40
5.6 Percentage of correctly classified questions by scenario for users in the medical field, using string similarity techniques . . . 41
5.7 Percentage of correctly classified questions by scenario for common users for the First Experiment. . . . 43
5.8 Percentage of correctly classified questions by scenario for users in the medical field for the First Experiment. . . . 43
5.9 Percentage of correctly classified questions by scenario for all users for the First Experiment. . . . 44
5.10 Percentage of correctly classified questions by scenario for all users for the Second Experiment. . . . 45
5.11 Percentage of correctly classified questions by scenario for all users for the Second Experiment with Anaphora Resolution. . . . 48
Chapter 1
Introduction
Every day, the amount of data available on the Internet, in particular medical information, increases significantly.
Medical staff and common users may have interest in accessing this information. After visiting their
doctors, common users may want to complement the information they received about their diseases
and medication. Medical staff, such as doctors, nurses and medicine students, may want to consult the
information available on the web to clarify doubts, confirm a medication, or stay up to date with the latest
information. Due to its nature, medical information often has to be accessed quickly. For example, a
doctor may need to quickly access information in order to treat an emergency patient. A common user
may have lost the information regarding one of his medicines and thus needs to urgently access the
correct dosage for that medicine. For these reasons, on-line medical information should be available
through an interface that is fast and easy to use by most people.
There is a large amount of medical information of many different types and formats currently available
on-line. This information is contained in either databases of medicines and diseases or collections of
papers containing important medical findings. Currently, to access this on-line information, users must
either do it by hand (i.e., by manually reading a large volume of information and/or navigating through
an index), learn a language to query the data (e.g., learn SQL to query a database with medical information), or use a keyword-based system. Manually navigating through medical information can be a
complex task for a common user because the medical terms used to organize the information are often
too complicated for the common user to understand. Learning a language to query data stored in a
database is also something that cannot be expected from a common user. As for keyword-based systems, keywords are matched without taking into account the connections between them.
For example, if a user searches for “medicines for headache”, the system will retrieve any document referencing headache, even if that term is only mentioned briefly. This makes it difficult for the user to find the information he/she is looking for. To illustrate this issue, Figure 1.1 shows the result of searching for “medicines for headache” on eMedicine1, a website which allows users to search for medical information through keyword-based search. Note that even after filtering the results to show only results related to “Drugs”, there are still 1180 results. The terms used in the results are also not trivial. For a
1 http://emedicine.medscape.com/.
common user, it would not be easy to find the information required among these results.
Figure 1.1: Results of searching for “medicines for headache” using the eMedicine website.
One alternative to access medical information is through a Natural Language Interface (NLI). It has
been shown that users often prefer Natural Language Interfaces over other methods such as keyword-based search [Kaufmann and Bernstein, 2007]. While some NLIs have been developed for the medical
field [Ben Abacha, 2012], they are still relatively new and none is available for the Portuguese language.
This means that a Portuguese user who wants to access medical information available on-line must either use a system in a foreign language, which the user may not be fluent in, or use a traditional method,
such as keyword-based search, like the one available on the Infarmed website2. Among various on-line services, Infarmed provides the Prontuário Terapêutico3 (Therapeutic Handbook), which publishes data about medicines and active substances approved to be sold in the Portuguese market. Henceforth, we refer to this information source as the “Infarmed website”. The Infarmed website contains data about
medicines and active substances, such as their indications, adverse reactions, precautions, dosages
and prices, among others. The user may access the data available on the Infarmed website by navigating a hierarchical index (which works similarly to the index of a book) or by using a keyword-based
search. As previously mentioned, navigating an index requires some knowledge of medical terms, and
keyword-based search can provide incorrect or irrelevant results.
2 http://www.infarmed.pt/portal/page/portal/INFARMED.
3 http://www.infarmed.pt/prontuario/index.php.
1.1 Web-based system for querying data about medicines in Portuguese
MedicineAsk is a question-answering prototype for information about medicines and active substances.
This prototype was developed in the context of two master's theses [Bastos, 2009] [Mendes, 2011].
MedicineAsk intends to solve the previously explained problems of the Infarmed website by providing
an NLI for the Infarmed website. The idea is that, by using an NLI, both common users and medical staff
will be able to access the data on the Infarmed website in an easier and faster way. The MedicineAsk
architecture is divided into two modules: Information Extraction and Natural Language Interface.
The Information Extraction module is responsible for extracting information from the Infarmed website, processing it and inserting it into a relational database.
The Natural Language Interface enables users to access the data on the Infarmed website. Users
can query about specific data regarding active substances and medicines, such as the price of a specific medicine or the indications of an active substance. It is also possible to ask more sophisticated
questions, such as medicines for a certain disease that do not have precautions with a given medical
condition. The second version of MedicineAsk improved on the first one, since it was able to answer a larger
number of questions.
1.2 Proposed solution
The NLI of MedicineAsk still has limitations regarding what a user can ask the system. Questions that
contain anaphora or ellipsis cannot be answered. In other words, if a question makes a reference to a
medical entity mentioned in a previous question, MedicineAsk cannot answer that question.
Furthermore, the previous version of MedicineAsk uses rule-based techniques to answer questions.
While these techniques can achieve good results, they also require a user's question to match a certain
pattern, or to contain certain keywords. Machine learning techniques suffer less from these issues, and
possibly achieve better results than the techniques used by the previous version of MedicineAsk.
Previously, some MedicineAsk features were not finished or fully explored due to time constraints.
Namely, a synonym detection feature was added to MedicineAsk, but because no comprehensive list of synonyms was found at the time, this feature was not fully explored.
1.3 Contributions
This thesis resulted in a paper and poster that were published in the “10th International Conference on
Data Integration in the Life Sciences”. The main contributions of this thesis are:
1) Incorporation of automatic question classification techniques for question answering
The techniques used in the MedicineAsk NLI module are rule-based and keyword spotting. Rule-based
techniques have the advantage of providing very good results if a user poses a question that exactly
matches the patterns specified by the rules. However, it is common for a user to pose a question in a
way that the developers of the rules did not think of. A keyword spotting technique classifies a question
based on the presence of certain keywords. A question about dosages would have the keyword “dose”
for example. This technique relies on dictionaries which contain these keywords. The contents of these
dictionaries are either manually inserted or automatically collected from a given resource. Manually constructed dictionaries are difficult to expand. Dictionaries that were collected by, for example, extracting all the medicine names from a website and inserting them into the dictionary, may contain incorrect keywords. This can lead to questions being wrongly classified.
Machine Learning techniques can be used to minimize the developers' work. These techniques learn how to analyse a question without following strict rules. For this reason, Machine Learning techniques can classify user questions without having to match any developer-made patterns. We integrated Support Vector Machines (SVM) into MedicineAsk and observed how they compare to the previous version of MedicineAsk. We performed tests to compare different combination strategies, such as using only SVM to interpret questions versus combining SVM with rule-based and keyword spotting techniques. The results obtained, reported in Chapter 5, show that SVM does in fact improve the question answering capabilities of MedicineAsk.
Machine learning techniques are integrated into MedicineAsk through the LUP system. LUP is a platform that can be used to run different Natural Language Understanding techniques (including SVMs) and compare the results of those techniques.
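To make the idea concrete, the following is a minimal sketch of SVM-based question classification in the spirit of what LUP provides. It uses scikit-learn rather than LUP, and the example questions, labels and feature choices are illustrative assumptions rather than the actual MedicineAsk training data.

```python
# Minimal sketch of SVM question-type classification (illustrative only; the
# questions, labels and features are assumptions, not the MedicineAsk/LUP code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_questions = [
    "Quais sao as indicacoes do paracetamol?",
    "Quais sao as reaccoes adversas do ibuprofeno?",
    "Qual e a dose de amoxicilina para criancas?",
    "Qual e o preco do Mizollen?",
]
train_labels = ["indications", "adverse_reactions", "dosage", "price"]

# Bag-of-words (TF-IDF) features feeding a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_questions, train_labels)

print(model.predict(["Qual a dose de paracetamol?"]))  # e.g. ['dosage']
```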
2) Anaphora and Ellipsis
The current version of MedicineAsk does not support questions which use anaphora and ellipsis.
An anaphora occurs when there is a reference in a sentence to a previously mentioned entity. For
example, consider that a user inputs the following two questions: “What are the indications of Mizollen?
And the adverse reactions of that medicine?”. In the second question, “of that medicine” refers to
“Mizollen”, but MedicineAsk does not know this and thus will not be able to answer this question. Ellipsis
is a special type of anaphora where the word referencing the previous entity is implicit, as in the pair of questions “What are the indications of paracetamol? What are the adverse reactions?”. In
this work we add support for these types of questions by keeping a short history of questions. In the
case of anaphora, MedicineAsk analyses the history and chooses the most recent entity to answer the
question with anaphora. This part of the work was made portable, meaning it can be used for other
question answering environments that represent questions in a similar way to MedicineAsk. We have
performed tests to determine if anaphora resolution brings improvements to MedicineAsk and found the
results favourable.
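As a rough illustration of this history-based strategy, the sketch below keeps recent entities and substitutes the most recent one for an anaphoric reference. The class and function names, the anaphor patterns and the stored entities are invented for this sketch; it is not the MedicineAsk implementation.

```python
# Minimal sketch of history-based anaphora resolution (illustrative only).
import re
from collections import deque

class AntecedentStorage:
    """Keeps the medical entities mentioned in the most recent questions."""
    def __init__(self, max_history=3):
        self.history = deque(maxlen=max_history)

    def remember(self, entities):
        self.history.appendleft(entities)

    def most_recent(self):
        # Return the most recently mentioned entity, if any.
        for entities in self.history:
            if entities:
                return entities[0]
        return None

ANAPHOR_PATTERN = re.compile(r"\b(desse|dessa|deste|desta) (medicamento|substancia)\b")

def resolve(question, storage):
    """Replace an anaphoric reference with the most recent entity, if one exists."""
    match = ANAPHOR_PATTERN.search(question)
    antecedent = storage.most_recent()
    if match and antecedent:
        return question.replace(match.group(0), f"de {antecedent}")
    return question

storage = AntecedentStorage()
storage.remember(["Mizollen"])
print(resolve("E as reaccoes adversas desse medicamento?", storage))
```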
3) Synonym detection
In the previous version of MedicineAsk, a list of synonyms of medical conditions, stored in the database, was implemented. This feature is useful because websites such as Infarmed often use terms for medical conditions that are not common language and are thus not used by the common user. For example, most people say “febre” to refer to fever in Portuguese, but in the medical domain the term
“pirexia” is often used instead. Without synonyms, users could potentially think no information about “fever” was available in our system, because our system is only aware of “pirexia”.
We enriched the list of synonyms stored in the database with synonyms of medical terms extracted from the Priberam website4, which is an online Portuguese dictionary.
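A minimal sketch of how such a synonym table can be used to normalise a user's term before querying follows; the table and column names are assumptions, not the actual MedicineAsk schema.

```python
# Minimal sketch of synonym normalisation before querying the database
# (illustrative only; table and column names are assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonym (user_term TEXT, medical_term TEXT)")
conn.execute("INSERT INTO synonym VALUES ('febre', 'pirexia')")

def normalise(term):
    """Map a common-language term to the medical term used by Infarmed, if known."""
    row = conn.execute(
        "SELECT medical_term FROM synonym WHERE user_term = ?", (term,)
    ).fetchone()
    return row[0] if row else term

print(normalise("febre"))  # -> pirexia
print(normalise("tosse"))  # -> tosse (no synonym stored)
```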
1.4 Thesis Outline
This document is organized into six chapters. Chapter 2 details the related work, in particular describing other question answering systems in the field of medicine. Chapter 3 explains the previous versions of
MedicineAsk and the LUP system. Chapter 4 describes the improvements made to the NLI module of
MedicineAsk in the scope of this thesis. Chapter 5 describes the validation performed on the new NLI
module of MedicineAsk. Finally, Chapter 6 concludes and discusses future work.
4 http://www.priberam.pt/DLPO/.
Chapter 2
Related Work
This chapter describes different types of systems related to Natural Language and/or medicine. Section 2.1 details a question answering system for the medical domain, which has a goal similar to
MedicineAsk. Section 2.2 details NaLIR, which translates user natural language queries into queries
to a relational database, for the general domain. Section 2.3 describes cTAKES, an information extraction system for the medical domain, with some similarities to the MedicineAsk Information Extraction
module. Section 2.4 lists several online systems in the medical field that are not question answering
systems, but still have similarities to MedicineAsk. Finally, Section 2.5 lists some existing algorithms for
anaphora resolution.
2.1 Medical Question Answering systems
MEANS [Ben Abacha, 2012] is a question-answering system for the medical domain. Its purpose is to
obtain an answer to Natural Language questions based on medical information obtained from publicly
available resources. Table 2.1 illustrates the steps taken to analyse a user question in MEANS. A short
explanation of these steps follows.
Step | Output | Example - What is the best treatment for pneumonia?
1) Question Type Identification | WH or Yes/No Question | WH Question
2) Expected Answer Type Identification | EAT | EAT = Treatment
3) Question Simplification | Simplified question | new Q = ANSWER for pneumonia.
4) Medical Entity Recognition | Identified Medical Entities | ANSWER for <PB>pneumonia</PB>. (PB = Problem)
5) Relations Extraction | Identified Relations | treats(ANSWER, PB), EAT = Treatment
6) SPARQL Query Construction | SPARQL Query | -
Table 2.1: Question Analysis steps in MEANS. The input of a step is the output resulting from the previous step.
In the first step, called Question Type Identification, questions are divided into Yes/No or WH questions. WH questions start with How, What, Which or When and have an answer with more information than a simple Yes or No. The question type is determined by applying a simple set of rules to the input questions (e.g. if the question begins with a How, a What, a Which, or a When then it is a WH question).
The second step, Expected Answer Type Identification (EAT), is necessary when dealing with WH
questions to discover what type of answer will be returned. The EAT is determined using lexical patterns
previously built by hand for each question type. WH questions are matched against these patterns and
any EAT found is saved. The possible answer types were defined by the authors and are MEDICAL PROBLEM, TREATMENT, MEDICAL TEST, SIGN OR SYMPTOM, DRUG, FOOD and PATIENT. Multiple Expected Answer Types for a single question are saved in order to answer multiple-focus questions such as “What are the symptoms and treatments for pneumonia?”. In this particular case, the expected answer types are SIGN OR SYMPTOM and TREATMENT.
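As a rough illustration of how such pattern-based question-type and EAT identification can work, consider the sketch below. The word lists and patterns are assumptions made for this example; they are not the rules actually used by MEANS.

```python
# Rough sketch of rule-based question type and expected-answer-type (EAT)
# identification, in the spirit of MEANS' first two steps (not its actual code).
import re

WH_WORDS = ("how", "what", "which", "when")

EAT_PATTERNS = {
    "Treatment": re.compile(r"\b(treatment|medicine|drug)s? for\b", re.I),
    "Sign or Symptom": re.compile(r"\bsymptoms?\b", re.I),
}

def question_type(question):
    return "WH" if question.lower().startswith(WH_WORDS) else "Yes/No"

def expected_answer_types(question):
    # A question may have several foci, so all matching EATs are kept.
    return [eat for eat, pattern in EAT_PATTERNS.items() if pattern.search(question)]

q = "What are the symptoms and treatments for pneumonia?"
print(question_type(q), expected_answer_types(q))
```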
The third step, Question Simplification, applies a simplification to the question to improve the analysis
of the question. Interrogative pronouns are replaced by the `ANSWER' keyword (e.g., “What is the best
treatment for pneumonia?” becomes “ANSWER for pneumonia”). This `ANSWER' keyword is a special
word that the system will ignore in later steps. The question is also turned into its affirmative form.
“Treatment” is the EAT of the question, identified in the Expected Answer Type Identification step, so it
is no longer needed while analysing the question. The simplification is applied to prevent noise in later
steps. In this example, the word “treatment” would have been identified as a medical entity during the
Medical Entity Recognition step. This extra entity would have caused interference when searching for
relations, since the Relation Extraction step does not need to know the relation between “treatment” and
“pneumonia”.
The fourth step, Medical Entity Recognition, focuses on detecting and classifying medical terms into
seven different categories, which also match the possible Expected Answer Types: PROBLEM, TREATMENT, TEST, SIGN OR SYMPTOM, DRUG, FOOD and PATIENT. To find medical entities, a rule-based method called MetaMap+ and a machine learning method that uses Conditional Random Fields are used.
The MetaMap+ method uses a tool called MetaMap [Aronson, 2001]. MetaMap is an online tool1 to
find and classify concepts in input text by mapping them to concepts from the Unified Medical Language
System2 (UMLS). UMLS is a very large resource for biomedical science. It contains a Metathesaurus
with millions of concepts and concept names, a semantic network of semantic types and relationships,
and a Specialist Lexicon which contains syntactic, morphological and orthographic information. The
authors of MEANS identified some limitations of MetaMap [Ben Abacha and Zweigenbaum, 2011a].
For example, some words were mistakenly treated as medical concepts (such as best, normal, take and
reduce). Another limitation is that MetaMap can propose more than one answer for a single word. To
deal with these issues, MetaMap+ was proposed [Ben Abacha and Zweigenbaum, 2011a].
The machine learning method used for Medical Entity Recognition in MEANS is called BIO-CRF-H.
It uses a Conditional Random Field classifier to classify the concepts. Among others, it uses as features
POS tags from TreeTagger, the semantic category of the word resulting from the MetaMap+ method,
and B-I-O tags that indicate what words of a sentence are the Beginning of a concept (B tag), Inside of
a concept (I tag) or Outside of a concept (O tag). For example, for a Treatment (T) concept in a given
1 http://metamap.nlm.nih.gov/.
2 http://www.nlm.nih.gov/research/umls/.
sentence, the first (and possibly only) word of that concept would be tagged B-T. Any other words
inside that same concept would be tagged I-T. Any words not inside this treatment concept or any
other concepts are tagged O. The BIO-CRF-H approach uses the annotated corpus provided for the
i2b2/VA 20103 challenge [Uzuner et al., 2011] for training.
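To illustrate the B-I-O scheme, the short sketch below labels an invented sentence and groups the tagged tokens back into an entity; the sentence, labels and grouping code are for illustration only and are not part of BIO-CRF-H.

```python
# Illustration of B-I-O labelling for medical entity recognition (example
# sentence and labels are invented; this is not the BIO-CRF-H implementation).
tokens = ["The", "patient", "started", "intravenous", "antibiotic", "therapy", "yesterday"]
labels = ["O",   "O",       "O",       "B-T",        "I-T",        "I-T",      "O"]

# Group consecutive B-/I- tokens back into entities.
entities, current = [], []
for token, label in zip(tokens, labels):
    if label.startswith("B-"):
        if current:
            entities.append(" ".join(current))
        current = [token]
    elif label.startswith("I-") and current:
        current.append(token)
    else:
        if current:
            entities.append(" ".join(current))
        current = []
if current:
    entities.append(" ".join(current))

print(entities)  # ['intravenous antibiotic therapy']
```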
The fifth step is Relation Extraction. A relation identifies the relationship between two medical entities in a sentence and is very important to determine answers to medical questions. Seven relations are extracted: TREATS, COMPLICATES, PREVENTS, CAUSES, DIAGNOSES, DhD (Drug has dose) and PhSS (Problem has signs or symptoms). In the example “Aspirin cures headaches”, we have the
relation TREATS between the concepts “aspirin” and “headache”. The relation extraction step uses a machine learning method with SVM. A rule-based method is used whenever there are not enough examples
in the training corpus to properly train the machine learning method [Ben Abacha and Zweigenbaum,
2011b].
The rule-based method to extract relations uses a set of patterns that were manually built from
analysing abstracts in MEDLINE4 (Medical Literature Analysis and Retrieval System Online). MEDLINE
is a database of medical articles and information about them. The machine learning based method
uses SVM (using LIBSVM) which was trained with the same i2b2/VA 2010 corpus that was used for
the Conditional Random Field method to extract medical entities. For each pair of words, it determines
if it has one of the seven relations previously mentioned (TREATS, COMPLICATES, PREVENTS, CAUSES, DIAGNOSES, DhD and PhSS) or no relation (thus eight possible categories).
Finally, the sixth step, SPARQL Query Construction, constructs a SPARQL query [Ben Abacha and
Zweigenbaum, 2012]. Before MEANS can answer any question, an RDF graph is built through a separate process. A medical corpus is analysed and, similarly to how questions are analysed, medical
entities and relations are extracted from the medical texts. The extracted information is annotated in
RDF and inserted into an online RDF database. This RDF graph includes the medical entities and their
relations. In this step, SPARQL Query Construction, questions are translated into a SPARQL query that
is then posed to the RDF database.
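As a sketch of the kind of SPARQL query such a step might produce for the running example, consider the snippet below; the prefix, predicate and property names are invented for illustration and do not reflect MEANS' actual queries.

```python
# Sketch of a SPARQL query for a treats(ANSWER, pneumonia)-style relation
# (URIs and predicate names are invented for illustration).
def build_sparql(relation, problem):
    """Build a SPARQL query asking which entities hold `relation` with `problem`."""
    return f"""
PREFIX ex: <http://example.org/medical#>
SELECT ?answer WHERE {{
    ?answer ex:{relation} ?p .
    ?p ex:hasName "{problem}" .
}}"""

print(build_sparql("treats", "pneumonia"))
```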
MEANS has some similarities with MedicineAsk. MedicineAsk answers questions about certain
medicines, active substances and what they treat. It can answer questions like “What medicine is good
for pneumonia?” and “What active substances may cause headaches?”. MEANS can answer this kind
of question as well, but it also answers questions outside the scope of MedicineAsk. It can answer
questions such as “Who should get hepatitis A vaccination?” and “What is the best way to distinguish
type 1 and 2 diabetes?”. Furthermore, for MEANS, a set of documents was annotated in RDF format to
build a graph which is then traversed to answer queries using SPARQL. MedicineAsk extracts information from the Infarmed website and stores it in a relational database which is then queried using SQL.
Also, MEANS was made for English while MedicineAsk answers questions in Portuguese. Finally, the
combined use of rule-based methods and machine learning methods to answer medical questions has
some similarities to our goal of using both rule-based and machine learning techniques on MedicineAsk
3 http://www.i2b2.org/NLP/Relations/.
4 http://www.nlm.nih.gov/pubs/factsheets/medline.html.
for the same purpose.
2.2 General Domain Question Answering systems
NaLIR (Natural Language Interface for Relational databases) [Li and Jagadish, 2014] is an interactive
natural language query interface for relational databases. To better understand a user's natural language, NaLIR interacts with the user to solve ambiguities. NaLIR explains to the user how it interprets
a query so that the user better understands any errors made by NaLIR. For any ambiguity, the system
chooses a default solution, but provides the user with a multiple-choice list of other possible solutions.
The user can then resolve this ambiguity, if NaLIR's default option was not the correct one.
When NaLIR explains to the user how it understood the user's query, it must use a representation
that both the user and NaLIR can easily understand. For that purpose, the developers proposed a
data structure called a Query Tree. A query tree is between a linguistic parse tree and an SQL query.
The developers claim users can understand query trees better than SQL queries. They also claim that
NaLIR can almost always translate a user-veried query tree into an SQL statement. NaLIR has three
components. The rst component transforms a user's natural language question into a query tree. The
second component interacts with the user in order to verify and make any necessary corrections to the
query tree. The third component transforms the query tree into an SQL query. NaLIR was implemented
as a stand-alone interface that can work on top of any relational database.
The goal of NaLIR was to enable common users to obtain correct information from a database, using
complex queries without the need of knowing SQL or the schema of the database. A validation with
real users was performed on the system where both the quality of the results and the usability of the
system were measured. The data set used was the data set of Microsoft Academic Search (MAS). The
MAS website has an interface to this data set. NaLIR was tested against this MAS website. Because
the interaction with the user was the “central innovation” in NaLIR, the developers also experimented
with a version of the system which did not interact with the user. Of the 98 queries per system, users
were able to obtain correct answers for 66 queries using NaLIR without user interaction, 88 queries for
NaLIR with user interaction and 56 queries using the MAS website. They reported that users found the
MAS website useful for browsing data but hard to use in complex query scenarios. Users found NaLIR
to be easy to use, giving high levels of satisfaction, but the system proved to be weaker when handling
complex scenarios.
Both NaLIR and MedicineAsk are NLIDBs (natural language interfaces to databases). NaLIR was
designed to work for multiple domains, while MedicineAsk is solely for information about medicines and
active substances. We find the user interaction to be noteworthy, and possibly useful for the future of
MedicineAsk.
2.3 Medical Text Information Extraction
cTAKES [Savova et al., 2010] is a tool that extracts information from free text in electronic medical
records using Natural Language Processing techniques. It is an extensible and modular system. It
uses the Unstructured Information Management Architecture (UIMA)5 and the OpenNLP6 natural language processing toolkit to annotate the extracted information. cTAKES has recently become a top level
Apache project7 .
cTAKES includes several different rule-based and machine learning methods. In cTAKES, several
components are executed in a pipeline, some of them being optional. Its components include sentence
boundary detection, rule-based tokenization, morphologic normalization, POS tagging, shallow parsing
and Named Entity Recognition. For example, to extract information from a document, the text is first split
into sentences and then a tokenizer is used to split the sentences into tokens. These tokens can then
be analysed by other components. More modules have been added over time such as a Semantic Role
Labeller, Coreference resolver, Relation extractor and others.
cTAKES is not a question answering system, but it is related to MedicineAsk. cTAKES extracts
structured information from unstructured data, sharing the goal of the MedicineAsk Information Extraction
module. MEANS, cTAKES and MedicineAsk all use a combination of rule-based and machine learning
methods for their respective purposes.
2.4 Web-based Systems
When practising medicine, doctors and medical staff need to quickly access medical information. Several
web-based systems have been developed to ease the user's search for medical information so that
answers can be found quickly. Some of these systems also have a mobile version, combining the vast
amount of information found on the web with the portability of paper charts, allowing the discovery of
answers without leaving the patient's side.
The systems Epocrates Online8 , eMedicine9 and Drugs.com10 , further detailed in [Mendes, 2011],
are similar medicine and disease reference systems. They offer a web-based user interface through
which users can search a vast amount of medical information by disease and by medicine (by name or
class). Searching medicines by class means the user navigates through several medicine classes and
subclasses until he/she finds the one he/she wants. This functionality is useful in cases where the user
does not know the name of a particular medicine. When searching medicines by name, the user inserts
the name of a medicine and then receives all the relevant available information about that medicine.
The three systems also have a medicine interaction check. Using this feature, a user can insert two
or more medicines and find interactions between them. There are many possible interactions between
5 http://uima.apache.org/.
6 http://opennlp.apache.org/index.html.
7 http://ctakes.apache.org/.
8 https://online.epocrates.com/home.
9 http://emedicine.medscape.com/.
10 http://www.drugs.com/.
medicines, and these interactions can be very dangerous, making this feature very valuable.
Personal Health Records (PHRs) are systems that enable a patient to create a profile where he/she
can directly add information about himself (such as allergies, chronic diseases, family history, etc.),
results from laboratory analysis or even readings from devices that monitor the patient's health like
pedometers, heart rate monitors and even smart phones. Examples of Personal Health Records include
Google Health (now retired), Microsoft HealthVault11 , Dossia12 and Indivo13 .
In the previously mentioned web-based systems, medical staff add large amounts of data about
general medicine that users can later access. In PHRs, the users are the ones who insert data about
themselves. PHRs also include a vast amount of health information that users can access and they
can also have extra services that can be used with the patient's information. For example, a PHR can
indicate all the interactions between the medicines the patient is currently using. It is also possible
to manage doctor appointments and reminders, or to track the progress of a patient's diet. Having all of
a patient's information in a centralized location makes it easier to consult this information, compared to
having it spread through several paper les. Patients can then allow doctors to access this information
so they can make better and more informed decisions.
2.5 Existing Algorithms for Anaphora Resolution
An anaphora occurs when the interpretation of a given expression depends on another expression in
the same context. For example in “John finished the race. He came in second place.”, “He” is called the
“anaphor”, and the expression it refers to, “John”, is called the antecedent. Automatically understanding
that “John came in second place” from analysing this text requires a relationship between two words in
different sentences to be established. Also, if the second sentence is analysed but the first one is not,
then the second sentence will not be understood. The first sentence is required to understand who “he”
is.
An ellipsis is a specific case of anaphora (also called zero anaphora) where there is no anaphor and
the reference is implicit. For example in “They went on a trip and enjoyed themselves”, “they” is omitted
between “and” and “enjoyed”.
Using anaphora is useful to avoid repeating a term very often or to avoid writing a long and/or complex
name many times, referring to it through simpler terms.
There are many different kinds of anaphora that we will not detail in this work such as pronominal
anaphora, noun anaphora, verb anaphora and adverb anaphora. As an example of pronominal anaphora
we have “Mary gave Joe a chocolate. He found it very tasty”. There are also other types, such as
definite description anaphora, where the anaphora contains additional information (e.g. “Michael won
the championship. It was Smith's fourth victory”, the anaphor “Smith” tells us Michael's last name.).
An indirect anaphora requires background knowledge to resolve (e.g. “John knocked the chess board
sending the pieces flying away”, the anaphor is “pieces” and the antecedent is “chess board”, it requires
11 https://www.healthvault.com/.
12 http://www.dossia.com/.
13 http://indivohealth.org/.
the information that chess is played with pieces).
Anaphora resolution consists of mapping the anaphor to the corresponding correct antecedent. This
task is not trivial and even humans can have trouble resolving anaphora. For example in “Michael likes
being tall but his girlfriend doesn't.” it is unclear whether Michael's girlfriend dislikes being tall herself or
dislikes that her boyfriend is tall. For machines, due to the ambiguity of natural language, it is even more
difficult to find the correct mapping between an anaphor and its antecedent.
Anaphora resolution is a topic that has been studied for many years. Hobbs' algorithm [Hobbs, 1978]
is one of the best-known algorithms, and it intends to resolve pronoun anaphora. Hobbs' algorithm
traverses the syntactic tree of the sentence the anaphora is located on, in a particular order, searching
for possible antecedents using a breadth-first strategy. Antecedents that do not match the gender and
number of the anaphor are not considered.
Mitkov's algorithm [Mitkov, 2002] is an algorithm to solve pronominal anaphora. After finding an
anaphora, but before resolving it, the algorithm analyses the current sentence and the two sentences
before the current one, to extract any noun phrases preceding the anaphora. A list of antecedent candidates is built in this step. Afterwards, a filter is applied to this list of candidates, removing any antecedents
that do not match the gender and number of the anaphora, similarly to Hobbs' algorithm. Finally, a list
of “antecedent indicators” is applied to the candidates. These antecedent indicators consist of several
different kinds of heuristics that raise or lower the score of each individual antecedent candidate. For
example, the first antecedent noun phrase (NP) in a sentence receives a score of +2, while any indefinite
NP (an NP that is not specific, such as the article “an” in English) receives a score of -1. Candidates with
a higher score are more likely to be the correct antecedent for the anaphora. In the end, the candidate
with the highest score is picked as the correct answer.
An adaptation of Mitkov's algorithm for Brazilian Portuguese was proposed in [Chaves and Rino,
2008]. It works similarly to Mitkov's algorithm, changing only the antecedent indicators used to score the
antecedent candidates. This version uses five antecedent indicators from Mitkov's algorithm and adds
three new ones. These new antecedent indicators were added so that the algorithm would be better
suited for the Portuguese language. The new antecedent indicators are:
Syntactic Parallelism - A score of +1 is given to the NP that has the same syntactic function as the corresponding anaphor.
Nearest NP - A score of +1 is given to the nearest NP in relation to the anaphor. The authors state the nearest candidate is often the correct antecedent.
Proper Noun - A score of +1 is given to proper nouns. The authors observed that proper noun candidates tended to be chosen as the correct antecedent.
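A toy sketch of this kind of indicator-based scoring follows. The weights mirror the examples given above, but the candidate representation and the example candidates are assumptions; this is not the published algorithm or its Portuguese adaptation.

```python
# Toy sketch of Mitkov-style antecedent scoring (illustrative only).
def score_candidate(candidate):
    score = 0
    if candidate["is_first_np_in_sentence"]:
        score += 2                  # first NP of the sentence: +2
    if candidate["is_indefinite"]:
        score -= 1                  # indefinite NP: -1
    if candidate["is_proper_noun"]:
        score += 1                  # proper noun: +1
    if candidate["distance_in_nps"] == 0:
        score += 1                  # nearest NP to the anaphor: +1
    # Candidates disagreeing in gender/number would be filtered out beforehand.
    return score

candidates = [
    {"text": "a race", "is_first_np_in_sentence": False, "is_indefinite": True,
     "is_proper_noun": False, "distance_in_nps": 0},
    {"text": "John", "is_first_np_in_sentence": True, "is_indefinite": False,
     "is_proper_noun": True, "distance_in_nps": 1},
]
best = max(candidates, key=score_candidate)
print(best["text"])  # John
```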
STRING [Mamede et al., 2012] is a Natural Language Processing chain for the Portuguese Language. It can process input text through basic NLP tasks and it includes modules that enable the resolution of more complex tasks. Among the modules of STRING is an Anaphora Resolution module (ARM
2.0) [Marques, 2013]. The Anaphora Resolution Module uses a hybrid method. It uses a rule-based approach to find each anaphor and its antecedent candidates. Then it selects the most likely antecedent
candidate through a machine learning method that receives as input the anaphors and their respective
list of candidates. The machine learning algorithm used for this task was the Expectation-Maximization
(EM) algorithm [Dempster et al., 1977]. It requires a corpus annotated with anaphoric relations. First, the
anaphors have to be identified. To identify anaphors in a question, ARM 2.0 uses syntactic information
in the question obtained from STRING. It then observes the POS tags of each token and where they are
located in the sentence, following a set of rules to discover each anaphor (e.g. articles not incorporated
in NPs or PPs are classified as anaphors). Second, the previous questions are analysed to build an
antecedent candidate list. A similar method is applied to construct the candidate list (e.g. nouns that
are heads of NPs or PPs can be candidates for antecedents). ARM 2.0 only looks for antecedent candidates in the expression with the anaphor and in a two-sentence window before the anaphor. Third,
the antecedent candidate list is ordered by rank by the machine learning algorithm, from most likely
to least likely. The most likely candidate is chosen as the antecedent of the identified anaphor. Each candidate in the list is an anaphor-antecedent pair. The EM algorithm uses several features to define each anaphor-antecedent pair, namely is_antecedent, which determines whether the antecedent is the correct
one for this anaphor. Other features include the gender and the number of both the anaphor and the
antecedent, the distance in sentences between the anaphor and the candidate, the POS of the anaphor
and the antecedent, and others. The EM algorithm provides the likelihood of an anaphor-candidate pair
being part of a given cluster. The system runs the EM algorithm against two clusters. One cluster represents the candidate being the antecedent for the anaphor, and the other stating the opposite. Using
this method, it is possible to obtain the likelihood of each antecedent being the correct antecedent for
the anaphor, and thus build a ranking to determine the most likely answer.
Comparing the results of the systems detailed above is not possible because each system uses a
different test corpus. Each system also focuses on different types of anaphora and some of them aim
to be automatic while others rely on man-made data. As an example, Hobbs' algorithm achieves a success rate of
91.7% while ARM 2.0 achieves a recall of 58.98%, a precision of 50.30% and an f-measure of 54.30%
[Marques, 2013].
Chapter 3
Background
This chapter describes the systems used for the execution of this thesis. Section 3.1 describes the
previous version of MedicineAsk. Section 3.2 describes the LUP system which is a platform that can be
used to run different Natural Language Understanding techniques (including SVMs) and compare the
results of those techniques.
3.1 The MedicineAsk Prototype
MedicineAsk is a prototype capable of answering Natural Language medical questions about medicines
and active substances. MedicineAsk is divided into two main modules: information extraction and natural
language interface. The information extraction module extracts data from the Infarmed website. This
extracted data is processed and stored in a relational database. Then, a web-interface enables a user
to pose a natural language question. The natural language interface module analyses the question,
translates it into an SQL query and gets an answer from the relational database. The answer is then
delivered to the user via the same web-interface.
This section details each of the MedicineAsk modules.
3.1.1 Information Extraction
The Information Extraction module aims at extracting data from the Infarmed website, processing this
data and storing it in a database so that the Natural Language Interface module can access it to answer
questions posed by the users. Figure 3.1 shows the architecture of this module.
The Information Extraction module is divided into four components: web data extraction, processing of entity references, annotation, and the database. The web data extraction component navigates
through the Infarmed website and extracts data. While some types of data can be directly added to
the database (because they are already structured), other kinds of data require pre-processing. The
processing of entity references component and the annotation component handle this kind of data so
that it can then be inserted in the database.
Figure 3.1: Architecture of MedicineAsk Information Extraction module. Image taken from [Mendes,
2011].
Web data extraction
The data available in the Infarmed website is structured like the index of a book. The data is organized
into several chapters and sub-chapters where each one corresponds to a type or sub-type of active substance. Inside these chapters and sub-chapters is the information about those specific substances. The
extracted information consists of the Infarmed hierarchy structure, i.e. the chapter data, the substance
data and the medication data for each substance.
The MedicineAsk web data extraction component recursively traverses all chapters, sub-chapters
and active substances published on the Infarmed website. It uses XPath1 and XQuery2 expressions
to filter, extract and store the data. In order to keep the Infarmed website's hierarchical structure, the
extracted chapters are represented as folders. For instance, chapter 1.1.1 and chapter 1.1.2 are folders
inside another folder called chapter 1.1.
Chapters and sub-chapters can contain text describing their contents. This text includes non-structured
information about the chapter's substances (historical information, molecular information, etc.) which is
stored inside the chapter's folder in an XML file with the name “chapter name” info.XML. Another kind of
non-structured information concerns indications, precautions, etc., shared by all the substances inside
that chapter. This information is stored in another XML file entitled “chapter name” indicacoes.XML.
Active substances have two different kinds of data associated. The non-structured data consists of
data about indications, precautions, etc. and the structured data contains information about medicines
that contain this substance. Both are kept in XML files, the non-structured data in a file named “active substance name Substancia.XML” and the structured data in a file called “active substance name Medicamento.XML”.
The web data extraction component creates an auxiliary dictionary file that is used by other components of the Information Extraction module. This dictionary contains all the chapter and active substance
1 http://www.w3.org/TR/xpath20/.
2 http://www.w3.org/TR/xquery/.
16
names. This module inserts chapter and substance names into data structures that map them to the
location of the corresponding chapter and substance files stored on disk. The auxiliary dictionary file
created by the web data extraction component is also further enriched by extracting medical conditions
from the “Médicos de Portugal” website3. In the context of the previous version of MedicineAsk, this
website contained approximately 12000 names of medical terms. These medical terms were extracted
through XPath and XQuery expressions.
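As a minimal sketch of this kind of XPath-based extraction, the snippet below parses a small invented HTML fragment; the markup, class names and expressions are assumptions and do not reflect the real Infarmed pages or the original XQuery pipeline.

```python
# Minimal sketch of XPath-based extraction (illustrative only; the HTML
# structure and class names are invented, not the real Infarmed markup).
from lxml import html

page = html.fromstring("""
<html><body>
  <div class="chapter"><h2>1.1 Antibacterianos</h2>
    <ul><li class="substance">Amoxicilina</li>
        <li class="substance">Azitromicina</li></ul>
  </div>
</body></html>""")

# XPath expressions filter the nodes of interest.
chapter_names = page.xpath("//div[@class='chapter']/h2/text()")
substance_names = page.xpath("//li[@class='substance']/text()")

print(chapter_names)    # ['1.1 Antibacterianos']
print(substance_names)  # ['Amoxicilina', 'Azitromicina']
```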
Processing of entity references
In the Infarmed website, some active substances make references to other active substances or chapters. For example, a sub-chapter may contain common data about the active substances of that sub-chapter. If every active substance mentioned in that sub-chapter has, for example, the same adverse
reactions then these adverse reactions are detailed only once on the sub-chapter data, instead of being
detailed for each individual active substance. In these cases, the active substance makes a reference
to where the actual information is. In the previous example each active substance would have, on the
adverse reactions section, a reference to their sub-chapter, so that a user knows what sub-chapter to
access if he wants to know the reactions of that active substance. This kind of situation is called an
“entity reference”. Entity references always come in the format V. “sub-chapter name”.
In the Infarmed website a user can use the site's index to navigate to the section that contains the
required information. As an example, if a user needs to access the information of an active substance
and finds the entity reference “V. (1.1.7)” on the adverse reactions section, then the user can use the
index to navigate to chapter 1.1.7 and read the missing information. However, MedicineAsk is a question
answering system and the user cannot freely navigate to a given chapter. For that reason these entity
references must be solved. The solution is to replace a reference with the missing information. For
example, if we nd the reference “V. (1.1.7)” on the adverse reactions section of an active substance,
MedicineAsk must go to chapter 1.1.7, copy the text on the adverse reactions section of that chapter
and place it on the original active substance, replacing the entity reference that was there.
MedicineAsk performs the reference replacement by scanning all the extracted files and analysing the
text of each active substance using regular expressions with the goal of finding these entity references.
When an entity reference is found, the name of the chapter or substance that has to be visited in order
to find the necessary information has to be extracted from the entity reference. To do this, the dictionary
with all the chapter and substance names created in the web data extraction component is used with a
dictionary-based annotator available in e-txt2db [Simoes, 2009]. When the chapter/substance name is
found, the next step is to nd the le of that chapter/substance on disk, in order to extract the required
information. To do this, the map that was made in the web data extraction component is used. This map
contains all of the chapter/substance names and their respective locations on disk. Those les are then
accessed and the referenced information is extracted and replaced in the original active substance that
contained the entity references.
3 http://medicosdeportugal.saude.sapo.pt/glossario.
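The reference-resolution step described above can be sketched as follows. This is a minimal illustration, not the actual MedicineAsk code: the regular expression, the map of names to file paths and the helper that reads the referenced section are assumptions made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityReferenceResolver {

    // Matches simplified entity references such as: V. (1.1.7) or V. "sub-chapter name"
    private static final Pattern REFERENCE =
            Pattern.compile("V\\.\\s*[(\"]([^)\"]+)[)\"]");

    /**
     * Replaces every entity reference found in a section of an active substance
     * with the corresponding text taken from the referenced chapter file.
     */
    public static String resolve(String sectionText,
                                 Map<String, String> nameToFilePath) {
        Matcher m = REFERENCE.matcher(sectionText);
        StringBuffer resolved = new StringBuffer();
        while (m.find()) {
            String chapterName = m.group(1).trim();
            String filePath = nameToFilePath.get(chapterName);
            // readSectionFromFile is a placeholder for reading the same section
            // (e.g. adverse reactions) from the referenced chapter's XML file.
            String replacement = filePath == null
                    ? m.group(0)                       // keep the reference if unknown
                    : readSectionFromFile(filePath);
            m.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(resolved);
        return resolved.toString();
    }

    private static String readSectionFromFile(String filePath) {
        // In the real system this would parse the chapter XML file; left as a stub here.
        return "<text copied from " + filePath + ">";
    }
}
```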
Annotation
After the Web data extraction and Processing of entity references steps, MedicineAsk can answer questions such as “What are the indications of paracetamol?” but it cannot answer questions like “What medicines treat fever?”. This is because the fields that detail an active substance (e.g., indications, adverse reactions, etc.) often contain free text rather than a list of words (i.e., the text reads “Indications: This medicine should be taken in cases of fever and also in case of headaches” instead of “Indications: fever, headache”). If this type of text were annotated, then questions such as “What medicines treat fever?” could be answered. To annotate the text, a dictionary-based method, a part-of-speech tagger and a regular expression method were used.
The dictionary-based method uses the dictionary with substance and chapter names produced by the web data extraction component. To identify the parts of speech in a sentence, the part-of-speech tagger TreeTagger4 was used. To find medical terms, the developers identified and used patterns of part-of-speech classification. Regular expressions were used for dosage extraction because Infarmed represents this information in a consistent way, with child and adult dosages always separated by identical tags across every active substance.
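A dosage-extraction rule of this kind could look like the sketch below. The exact markers used by Infarmed are not given here, so the labels “Adultos” and “Crianças” and the overall pattern are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DosageExtractor {

    // Hypothetical pattern: adult and child dosages marked by consistent labels.
    private static final Pattern DOSAGE = Pattern.compile(
            "Adulto[s]?:\\s*(?<adult>[^;\\.]+)[;\\.]\\s*Criança[s]?:\\s*(?<child>[^;\\.]+)");

    public static void main(String[] args) {
        String text = "Adultos: 500 mg a cada 8 horas; Crianças: 250 mg a cada 8 horas.";
        Matcher m = DOSAGE.matcher(text);
        if (m.find()) {
            System.out.println("Adult dosage: " + m.group("adult").trim());
            System.out.println("Child dosage: " + m.group("child").trim());
        }
    }
}
```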
Database
The extracted data is then inserted into a relational database. The database stores the following four main types of data: (i) the Infarmed website's hierarchy structure, (ii) the chapter data for each chapter of the Infarmed website, (iii) the data for each active substance, and (iv) the medication data for each active substance. The insertion of the data was performed in two steps. First, the Chapter, ActiveSubstance, Medicine and MarketingForms tables were populated by a Java application traversing the folder hierarchy with all the chapters, inserting any chapters and active substances found, along with the information in the corresponding XML files in the same folder. Second, any remaining tables were populated with the annotated information collected by the annotation component, which was stored in a file (for example, a table for indications to store the annotated indications text, another table for adverse reactions, etc.).
To better answer questions by finding synonyms of the words the user writes, a synonym table was created. This table was populated with only a few hand-made synonyms because no source of medical term synonyms in Portuguese was found at the time.
3.1.2 Natural Language Interface
When a user poses a question in natural language, it is necessary to analyse and process that question
in order to be able to answer it. For that purpose, MedicineAsk includes a Natural Language Interface
(NLI) module. Figure 3.2 shows the architecture of this module. The NLI module consists of three NLP components. The first component, named question type identification, identifies what the question is about (e.g., if it is a question about adverse reactions or about the correct dosage of a medicine). The second component, question decomposition, determines which are the important entities that the question targets (e.g., which active substance we want to know the adverse reactions of). The third component, question translation, aims at translating the natural language question into an SQL query to be posed to the database that stores the Infarmed data. These three components are further described in this section.
4 http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.
Figure 3.2: Architecture of the MedicineAsk Natural Language Interface module. Image taken from [Mendes, 2011].
Question type identification
The output of this component is a question type expression represented as “predicate(parameters)”. For
example, the questions “What are the indications of paracetamol?” and “What are paracetamol's indications?” would both be mapped to the question type expression “Get Indications(ActiveSubstance)”.
MedicineAsk identifies questions through regular expressions and keyword spotting techniques. These techniques are associated with a strict and a free execution mode of this module, respectively.
The strict mode is used first. It uses regular expressions to match the question to one of several different regular expression patterns. If there is a match, the question is associated with the same question type as the type of the pattern. This mode is reliable if the user writes a question according to a pattern, but that does not always happen.
The free mode is activated if the strict mode fails. The free mode uses keyword spotting [Jacquemin, 2001] to determine the question's type. The question is annotated by matching words of the question to words in the dictionaries that were built during the Information Extraction phase. For example, the word “Indications” is annotated with an indicationsTag tag. Other words, such as medical entities, are also annotated. Then, by looking at the tags, the question type can be inferred. A question marked with an indicationsTag will likely be asking for the indications of a given active substance and will thus be represented by the question type expression Get Indications(ActiveSubstance).
If the free mode is used, the user is notified of what the question was mapped into, so that he/she can be sure that the question was not misunderstood.
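The strict mode can be sketched as a lookup over a list of patterns. The two patterns and the question type labels below are illustrative placeholders; the real prototype uses 22 manually built patterns and the “Get Indications(ActiveSubstance)” notation described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class QuestionTypeIdentifier {

    // Hypothetical patterns mapping a question shape to a question type label.
    private static final Map<Pattern, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put(Pattern.compile("(?i)quais as indicações d[oa] (.+)\\?"),
                     "Get_Indications");
        PATTERNS.put(Pattern.compile("(?i)quais as reacções adversas d[oa] (.+)\\?"),
                     "Get_AdverseReactions");
    }

    /** Strict mode: return the question type of the first matching pattern,
     *  or null so that the caller can fall back to the free (keyword spotting) mode. */
    public static String identifyStrict(String question) {
        for (Map.Entry<Pattern, String> entry : PATTERNS.entrySet()) {
            if (entry.getKey().matcher(question.trim()).matches()) {
                return entry.getValue();
            }
        }
        return null; // no pattern matched: use keyword spotting instead
    }
}
```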
Question decomposition
The objective of this component is to find the focus of the question. In the Question type identification component, the question “What are the indications of paracetamol?” led to the question type expression Get Indications(ActiveSubstance). Now, the goal is to find the medical entity “paracetamol” in the question in order to replace ActiveSubstance with “paracetamol”, so that Get Indications(paracetamol) will be the resulting question type expression. This component also has a strict and a free mode.
The strict mode is only used if the strict mode was chosen in the Question type identification phase. In this case, the question was matched to a pattern and so we know the location of the medical entities in the question. If the question matched the pattern “What are the indications of SUBSTANCE?” then the medical entity in the question will be located where “SUBSTANCE” is in the pattern.
The free mode is used if the Question type identification component used the free mode as well. Since the free mode was used, the location of the medical entities is unknown. However, the Question type identification component annotated the user question. The Question decomposition component can then analyse those annotations to know exactly which medical entities are present in the user question. In the question “What are the indications of paracetamol?”, “paracetamol” will be annotated as a medical entity. Using this information the question type expression can be built.
Question translation
The last component poses a query to the database in order to answer the question. Following the
previous example, the Get Indications(paracetamol) question type expression that was produced by the
Question decomposition component is now mapped into the corresponding SQL query. Each question
type expression is mapped to a different SQL query and the parameters (e.g., paracetamol) are added
to the query WHERE clause. The results of the query are sent to the user interface after being converted
into HTML.
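The mapping from a question type expression to an SQL query can be sketched as below. The table and column names (active_substance, indication) are invented for this sketch and may differ from the actual MedicineAsk schema; the point is that the parameter of the expression is bound in the WHERE clause and the result is rendered as HTML.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QuestionTranslator {

    /** Maps the expression Get Indications(<substance>) to an SQL query. */
    public static String getIndications(Connection conn, String substance)
            throws SQLException {
        String sql = "SELECT i.text FROM indication i "
                   + "JOIN active_substance s ON s.id = i.substance_id "
                   + "WHERE s.name = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, substance);   // the parameter goes into the WHERE clause
            try (ResultSet rs = stmt.executeQuery()) {
                StringBuilder html = new StringBuilder("<ul>");
                while (rs.next()) {
                    html.append("<li>").append(rs.getString(1)).append("</li>");
                }
                return html.append("</ul>").toString(); // result converted to HTML
            }
        }
    }
}
```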
User Interface
The user interface is implemented in JSP5 and deployed on a Tomcat6 server. The interface contains a text box where the user can pose the question. The answers are shown below the text box, in text and/or tables depending on the type of question. Several help mechanisms are implemented in MedicineAsk. In case the user mistypes a medicine or an active substance (not uncommon with some complicated words in the medical field), the Soundex7 algorithm is used. This algorithm uses phonetic comparison to search for words in the database that sound similar to the word the user wrote. A user may also remember only part of a medical term, like the first word of a medicine's name.
5 http://java.sun.com/products/jsp/.
6 http://tomcat.apache.org/.
7 http://en.wikipedia.org/wiki/Soundex.
The “LIKE” SQL condition is used in these cases, and thus finds terms in the database that contain the word the user typed with added prefixes or suffixes. Finally, an auto-complete script in jQuery8 is available. It can help users by showing suggestions of possible words as the user types the question, similarly to web search systems like Google. The Soundex and “LIKE” help mechanisms can only be used when answering questions with the rule-based techniques. This is because the rule-based techniques know where the medical entities are: questions are matched to a pattern, and the portion of the question that does not fit the pattern is the entity. The keyword spotting technique does not know the location of a misspelled entity, so it cannot use these help mechanisms.
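The two help mechanisms can be sketched as follows. The Soundex implementation from Apache Commons Codec is used here only as an example (it is designed for ASCII text), and the table and column names in the LIKE query are illustrative rather than the prototype's actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.codec.language.Soundex;

public class EntityHelp {

    private static final Soundex SOUNDEX = new Soundex();

    /** Phonetic match: returns known entity names whose Soundex code equals
     *  the code of the (possibly misspelled) word typed by the user. */
    public static List<String> soundsLike(String typed, List<String> knownEntities) {
        String code = SOUNDEX.encode(typed);
        List<String> candidates = new ArrayList<>();
        for (String entity : knownEntities) {
            if (SOUNDEX.encode(entity).equals(code)) {
                candidates.add(entity);
            }
        }
        return candidates;
    }

    /** Partial match: finds entities containing the typed word via SQL LIKE. */
    public static List<String> containsWord(Connection conn, String typed)
            throws SQLException {
        List<String> matches = new ArrayList<>();
        String sql = "SELECT name FROM medicine WHERE name LIKE ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, "%" + typed + "%");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    matches.add(rs.getString(1));
                }
            }
        }
        return matches;
    }
}
```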
3.1.3 Validation of the MedicineAsk prototype
MedicineAsk was evaluated with real users in order to validate the whole prototype, verifying its usability, whether the answers retrieved were correct, and whether MedicineAsk was preferred over the Infarmed website.
The tests with real users were divided into a developer evaluation and a user evaluation. The goal of the developer evaluation was to have someone experienced with both the Infarmed website and MedicineAsk test both systems, while trying to answer questions as fast as each system allows. To achieve this, the developers of MedicineAsk answered a set of questions using both the Infarmed website and MedicineAsk, since they were familiar with both systems at the time of the MedicineAsk evaluation. They concluded that both the Infarmed website and MedicineAsk could answer any of the test questions, but it was faster to get answers through MedicineAsk.
The user evaluation used real, potential end-users of these applications, including both common users and medical staff. This evaluation collected quantitative and qualitative measures. Quantitative measures consisted of measures such as the number of clicks and the time required to answer a question, while qualitative measures were user satisfaction and ease of use of both systems. A five-point scale was used to qualify the qualitative measures. Tests showed that both quantitatively and qualitatively MedicineAsk outperformed the Infarmed website.
3.2 LUP: A Language Understanding Platform
LUP [dos Reis Mota, 2012] is a platform that enables developers to run different Natural Language
Understanding (NLU) techniques and compare their results. NLU is a subtopic of NLP. It focuses more
on how a machine understands a question, with less concern for what to do with that question. In this
case, NLU involves understanding a user question, but not how to answer it. LUP enables a user to find out which technique and which parameters are best for a given NLU problem. We propose to use it in order to test different NLU techniques that will then be integrated into the NLP module of MedicineAsk.
The NLU techniques that LUP supports are:
Support Vector Machine (SVM) - It uses the LIBSVM9 [Chih-Chung Chang, 2001] library and a one-versus-all strategy, meaning that instead of classifying data into multiple classes at once, it classifies them into “belonging to class X” versus “not belonging to class X”, for each individual class.
8 http://jquery.com/.
9 http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
Cross Language Model (CLM) [Leuski and Traum, 2010] - CLM is a statistical language classifier that maps user questions into answers from a virtual agent. The authors of that technique developed an implementation of CLM called the NPCEditor toolkit. LUP invokes this toolkit in order to use CLM.
A classification algorithm based on string similarity - This technique was developed by the authors of LUP [dos Reis Mota, 2012]. In the training corpus, each training example is associated with a semantic category. This algorithm uses string similarity techniques to compare a user's question to each training example, thus finding the semantic category with the highest similarity score to the user's question. LUP supports the use of three string similarity measures: Jaccard, Overlap and Dice. It is also possible to use two combinations of Jaccard with Overlap, using weighting values.
Mapping the user's input into a representation that a machine can understand can be performed
in several ways. The semantic representations that LUP supports are Categories, Frames or Logical
Forms.
A category is associated with each question type. For example, category 1 could be associated with greetings, while category 2 could be associated with questions regarding the weather, and so on. A frame is a set of attributes that are filled in with different values. The task for a given question consists of finding the correct frame and the correct values to be inserted into the frame's attributes. An example is a frame for controlling vehicles, where an attribute “Action” can have “accelerate” or “use brakes” as possible values and another attribute “Vehicle” can have values such as cars, trains or boats. Finally, logical forms represent questions through simple formulas with N-ary predicates and N arguments. As an example, for the question “Who made Mona Lisa?” the logical form is WhoMade(Mona Lisa).
Figure 3.3: LUP architecture.
The LUP architecture is composed of several modules as represented in Figure 3.3. The Front End
module acts like an interface through which the user supplies information. This information consists of
the training corpora and some configurable parameters, namely which NLU techniques are going to be
used.
The Corpora Parser module is in charge of preprocessing. Preprocessing techniques include stop-word removal, normalization, POS tagging and named entity recognition. Furthermore, a configurable
number of partitions are randomly applied to the corpora with the purpose of generating training and
test sets for later use. The preprocessing techniques to apply and the number of test partitions are
parameters obtained from the Front End module.
The Tester module validates the NLU techniques through cross-validation. It uses the training and test partitions created in the Corpora Parser module. The training partitions are sent to the Classifier Trainer module which returns the classifiers. The Tester module creates log files with information about which NLU techniques failed and why. This information shows each test example that was wrongly classified, along with the name of the wrong category, the correct category, and examples of questions from the training partition that belong to the wrongly predicted category and the correct category.
The Deployer module is used to create a final instance of LUP to be used in a given NLU problem. After using LUP to run and test several techniques simultaneously, developers can choose the single best configuration for their NLU problem. The technique that achieved the best results can then be deployed in an instance of LUP. This instance will use that technique as a classifier that uses all the previously supplied corpora as training data and will evaluate previously unseen examples supplied to it. Because the NLU techniques used can be computationally expensive, the data generated by these techniques can be stored. The authors give the example of storing the models generated by SVM and loading them later instead of computing the necessary algorithms every time.
Chapter 4
Improvements to the Natural Language Interface module
The goal of this thesis is to improve the MedicineAsk NLI module by increasing the number and variety of user questions that the system is able to answer. User questions can be too complex (e.g. contain useless information such as the name of the patient requiring a specific medicine) or too abstract (e.g. “What's the price?”) to be answered by previous versions of MedicineAsk. While it is impossible to cover every possible question a user could pose, in this thesis we seek to cover enough questions so that the users of MedicineAsk can obtain the information they need by posing a Natural Language question in Portuguese.
This chapter is structured as follows: Section 4.1 details a new technique to answer questions in MedicineAsk, through machine learning methods. Section 4.2 describes our approach to handle questions containing anaphora. Finally, Section 4.3 explains how we expanded the synonym detection module of MedicineAsk.
4.1 Automatic question classification
The NLI module of the previous version of the MedicineAsk system uses rule-based and keyword spotting techniques [Galhardas et al., September 2012]. Rule-based methods require a user's question to exactly match a certain pattern. For this reason, these methods usually have low flexibility. Keyword spotting techniques require certain keywords to be present in the question, as well as dictionaries of those keywords to be built. In this thesis, we integrated a machine learning approach for the question type classification and question decomposition steps using Support Vector Machines (SVMs) [Zhang, 2003]. The question type classification step consists of identifying the question type of a question (e.g. if it is a question regarding indications or adverse reactions). The question decomposition step consists of identifying the entities in a question (e.g. if the question is about the indications of paracetamol or the indications of aspirin). We compared several features on SVM and report the obtained results, in order to determine if machine learning brings any improvements to the previous MedicineAsk NLI. In this section we explain the concept of SVM and how it was implemented.
4.1.1 Answering MedicineAsk questions with SVM
SVM is a supervised learning technique which classifies data into one of several different classes. Training data is composed of data instances previously assigned to each class. The training data is analysed by the SVM, which constructs a model that is then used to assign each new data instance to one of the classes. The model is represented as a space and each data instance is a point belonging to this space. The SVM creates a hyperplane to divide the space into two sides. All points on one side of this hyperplane belong to one class while the points on the other side belong to the other class. New data instances will be represented as new points in the space and their class will be determined by which side of the hyperplane they are located on.
For MedicineAsk we want our questions to be classified into one of several different classes. Each class is a specific question type (e.g. a class for questions about indications, another for questions about adverse reactions, etc.). An SVM which uses more than two classes is very complex. Therefore it is best to use a one-versus-all strategy. This type of strategy divides a multi-class classification into smaller classifications of two classes each. Each smaller classification divides a question into the classes “Belongs to class X” or “Does not belong to class X”, for every class we want to evaluate. For example, for the question “What are the adverse reactions of paracetamol?” SVM first classifies it as either “Belongs to class question about indications” or “Does not belong to class question about indications”. Then it classifies it as either “Belongs to class question about adverse reactions” or “Does not belong to class question about adverse reactions”. After the question has been tested for every possible smaller classification problem, the class that showed the highest likelihood score is chosen as the class of the question.
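The one-versus-all decision rule can be sketched as follows. The per-class scorers stand in for the binary SVM models trained through LUP/LIBSVM and are assumptions of this sketch; only the "pick the class with the highest score" logic is being illustrated.

```java
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class OneVersusAll {

    /**
     * One binary scorer per question type, each answering "how likely is it
     * that this question belongs to my class?". The class with the highest
     * score is chosen.
     */
    public static String classify(String question,
                                  Map<String, ToDoubleFunction<String>> scorersByClass) {
        String bestClass = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, ToDoubleFunction<String>> e : scorersByClass.entrySet()) {
            double score = e.getValue().applyAsDouble(question);
            if (score > bestScore) {
                bestScore = score;
                bestClass = e.getKey();
            }
        }
        return bestClass; // e.g. "QT_INDICACOES"
    }
}
```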
We used several different features in SVM. A feature is a type of variable that is used to determine
how the hyperplane is defined. We used unigrams, bigrams and trigrams, which give relevance to each
individual word, pairs of consecutive words, or triples of consecutive words, respectively. We also used
binary unigrams and binary bigrams which work similarly to regular n-grams but only take into account
whether the word is present in the training corpus. We also tested length and word shape. The length
feature takes into account the length of the text being analysed. Word shape is a feature which takes
into account the type of characters in a word, such as numbers, capital letters and symbols like hyphens.
This feature does not work well by itself but can improve the results when paired with other features [Loni
et al., 2011].
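The n-gram and word shape features described above can be illustrated with a small extraction routine. This is a simplified sketch, not the feature extraction actually used by LUP; the tokenisation and the feature string format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureExtractor {

    /** Unigram and bigram features: individual words and pairs of consecutive words. */
    public static List<String> nGrams(String question) {
        String[] tokens = question.toLowerCase().replaceAll("[?.,!]", "").split("\\s+");
        List<String> features = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            features.add("uni=" + tokens[i]);
            if (i + 1 < tokens.length) {
                features.add("bi=" + tokens[i] + "_" + tokens[i + 1]);
            }
        }
        return features;
    }

    /** Word shape feature: maps each character to a class, e.g. "Mizollen" -> "Xxxxxxxx". */
    public static String wordShape(String word) {
        StringBuilder shape = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (Character.isUpperCase(c)) shape.append('X');
            else if (Character.isLowerCase(c)) shape.append('x');
            else if (Character.isDigit(c)) shape.append('d');
            else shape.append('-');
        }
        return shape.toString();
    }
}
```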
In order to understand how SVM solves questions we can compare all three available techniques
of MedicineAsk. The rule-based technique answers questions by matching a question to a pattern.
Each pattern is associated with a question type. Each word that is not part of the pattern is part of the
named entities. The keyword spotting technique analyses the words in the question and, depending
on the type of words present, determines the question type (e.g. if the word indications is present
then it is a question about indications). It uses the same method to discover any named entities in the
question. SVM has each question type mapped to a class, and attempts to classify each question into
one of these classes, thus determining the question type. The named entities are discovered through
a dictionary-based annotator. An excerpt of the dictionary used to identify named entities is shown in
Appendix B.
4.1.2 Adding SVM to MedicineAsk
To answer a question using SVMs, MedicineAsk must use LUP. Using the model corpus and dictionary of
named medical entities, LUP uses SVM to determine the question type and the named entities present
in the question. This information is then sent back to MedicineAsk, which can now build an SQL query
based on the question type and named entities of the user question.
Some question types include more than one entity, such as “Quais são os medicamentos semelhantes a Mizollen que não provocam sonolência?” (“What are the medicines similar to Mizollen that do not cause sleepiness?”), which contains the entities “Mizollen” and “sonolência”. This kind of question needs additional processing in order to know which entity is which. If this processing were not performed, the SQL query would be malformed and it would not be possible to answer the question. In this example, there is a chance the SQL query would be asking for “What are the medicines similar to sleepiness that do not cause Mizollen?”, which would lead to no results. When analysing one of these questions, SVM detects these entities and tags them with the corresponding class. In this case “Mizollen” is tagged as a medicine and “sonolência” is tagged as a medical condition. These tags are then used to know how to build the SQL query correctly. For the example question, since we know “Mizollen” is a medicine and “sonolência” is a medical condition, we can be sure that the question posed to the database will be “What are the medicines similar to Mizollen that do not cause sleepiness?” and not “What are the medicines similar to sleepiness that do not cause Mizollen?”.
We adopted three different strategies for combining question classification techniques in MedicineAsk, in order to determine which method was most effective. These strategies use the currently available NLP techniques in MedicineAsk sequentially - rule-based, keyword spotting and SVM. We considered the following strategies to integrate SVM into MedicineAsk in order to answer questions:
Strategy 1: Rule-based, falling back to keyword spotting if no match is found
Strategy 2: Rule-based, falling back to SVM classication if no match is found
Strategy 3: Rule-based, falling back to keyword spotting if no match is found, and then falling back
to SVM classication if keyword spotting fails as well.
Strategy 1 (see Figure 4.1) was featured in the previous version of MedicineAsk. It tries to answer a question using rules. If no match is found, it falls back on the keyword spotting method. This strategy was already detailed in Chapter 3. Strategy 2 (see Figure 4.2) is similar to Strategy 1: MedicineAsk first tries to match a question using the rule-based method, and if no match is found then it falls back on SVM. Finally, Strategy 3 (see Figure 4.3) attempts to use every available method. If a question does not match any patterns using the rule-based method, then the keyword spotting technique is used. If no question type can be determined with this technique then MedicineAsk attempts to answer it through the SVM method.
Figure 4.1: Strategy 1.
Figure 4.2: Strategy 2.
Figure 4.3: Strategy 3.
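The fallback chain of Strategy 3 can be sketched as follows. The three methods are stubs standing in for the corresponding MedicineAsk components; each is assumed to return null when it cannot determine the question type.

```java
public class QuestionAnsweringStrategy {

    /** Strategy 3: rule-based first, then keyword spotting, then SVM. */
    public static String answer(String question) {
        String result = ruleBased(question);
        if (result == null) {
            result = keywordSpotting(question);
        }
        if (result == null) {
            result = svmClassification(question); // SVM always assigns a class
        }
        return result;
    }

    private static String ruleBased(String question) { return null; }        // stub
    private static String keywordSpotting(String question) { return null; }  // stub
    private static String svmClassification(String question) { return "QT_INDICACOES"; } // stub
}
```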
The goal of using these strategies is to evaluate the difference between answering questions using only a single technique and answering questions using combinations of those techniques. We only considered a few of all the possible combinations of these techniques. The rule-based method should come first as it is the most reliable when the user's question exactly matches a pattern. SVM is the last technique to be used because SVM never fails to assign a class to a question.
We evaluated the three strategies and report the results obtained in Section 5.4.
4.2 Anaphora and Ellipsis
This thesis aims at supporting questions featuring anaphora and ellipsis. Extensive work and research has been performed on these topics, and a solution that resolves every possible case of anaphora and ellipsis has yet to be found, as shown in Section 2.5. For this reason we do not intend to give full support to these types of questions. The goal is to answer some simple questions that contain these special cases and, in the future, to expand these features in order to support an even larger number of questions.
4.2.1 Proposed Solution
Since anaphora resolution is a complex topic, our objective is to resolve some basic anaphora cases
and build a solution that is extendible, so that more cases can be handled in the future.
The systems described in Section 2.5 follow, in general, three steps to resolve anaphora:
Identify anaphora and find the anaphor;
Search for and list possible candidates for antecedents;
Choose an antecedent out of the candidates and use it to replace the anaphor.
In these systems the process of searching for an antecedent is complex. Several questions have to be analysed and any word in those questions is a potential antecedent. However, in MedicineAsk the possible cases of anaphora are more limited because we are dealing with a more restricted environment. Rather than free text, the user only inputs questions. There are two possible types of anaphora that can occur in a user question: medical entity anaphora and question type anaphora.
With medical entity anaphora, the antecedent is a medical entity, such as a medicine or an active substance. For example, in “What are the indications of paracetamol? And what about the adverse reactions of that substance?” the anaphor is “that substance” and the antecedent is “paracetamol”. With question type anaphora, the antecedent is the question type, for example “What are the indications of paracetamol? And those of mizolastina?”, where the anaphor is “those” and the antecedent is “indications”.
What makes anaphora resolution in MedicineAsk easier than in other systems is that both the question type and the named entities are obtained through the process of answering a question. By storing both the question type and the entities every time a question is successfully answered, there is no need to search for possible antecedents because we already possess a list of them. We do not need to re-analyse past questions and examine words such as “indications” to determine that the previous question type was about indications; we only need to consult the information stored beforehand. Choosing the correct antecedent is similar to Hobbs' method, but instead of ignoring antecedents that do not match gender or number, we ignore antecedents that would not make sense in the new question. For example, the question “What are the indications?” can be used for a medicine or an active substance but not for a medical condition (e.g. the question “What are the indications of headache?” does not make sense). For this reason, if every possible antecedent were an entity of the type “medical condition”, this particular example of anaphora would not be solved.
Another difference of MedicineAsk's anaphora resolution when compared to the previous systems
is that MedicineAsk does not need to find the anaphor. This is because we do not want to return the
question with the anaphora resolved. Instead, we only provide the answer to the question. As long
as we know the antecedent we can provide an answer to the user. For example, consider that a user
inputs the question “What are the indications of that active substance?” and we know the question is
referring to “paracetamol”. Our objective is not to turn the user's question into “What are the indications
of paracetamol?” by replacing “of that active substance” with “paracetamol”. Our objective is to simply
return to the user the answer he seeks by displaying the indications of paracetamol.
In this thesis we focus on medical entity anaphora. The strategy to resolve medical entity anaphora is as follows. The first part consists of analysing regular questions with no anaphora. If the question is successfully answered, the question's question type and named entities are stored in a data structure which we will call the Antecedent Storage. Figure 4.4 shows an example of this process for the question “What are the indications of paracetamol?”. Note that the question type “Indications” and the entity “Paracetamol” are stored in the Antecedent Storage.
Figure 4.4: Answering a question with no anaphora and storing its information in the Antecedent Storage.
Afterwards it is possible to analyse questions with medical entity anaphora. If a question is analysed and no entities are found, a case of anaphora is detected. In this case, we send the information of the question with medical entity anaphora to the Anaphora Resolver. The Anaphora Resolver then looks at the information of the question with anaphora and compares it to the information in the Antecedent Storage. If a possible antecedent for the question with anaphora is found then we return that antecedent as the entity, which will be used to answer the question. Figure 4.5 shows an example which continues from the example in Figure 4.4. This time the question is “What are the generics of that medicine?”. MedicineAsk identifies that this question has no medical entities and thus we have a case of medical entity anaphora. The Anaphora Resolver is then in charge of finding a possible antecedent. The question contains the word “medicine”, so we know that we are looking for a previous entity that is also a medicine. The latest entry in the Antecedent Storage was “paracetamol”, which is an active substance, thus “paracetamol” cannot be the antecedent of this question. The second entry is “Mizollen”, which is indeed a medicine, so “Mizollen” can be an antecedent for this question. The entity “Mizollen” is then combined with the question type “Get Generics” and sent to the “Question Translation” step.
Figure 4.5: Solving anaphora in a question. Paracetamol is ignored because it is an active substance.
4.2.2 Implementation
After successfully answering a question, the information about that question is stored in the Antecedent Storage data structure. We store the question's type and any entities found. Only the information of the last two questions is stored; however, this number is configurable. The older a question is, the less weight we give to it when resolving the anaphora, as the chance of it still being relevant decreases over time [Hobbs, 1978].
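The Antecedent Storage and its lookup can be sketched with a small bounded structure. The class below is an illustration under the assumptions stated in the text (a fixed capacity of recent questions, newest first, and a type-compatibility check), not the prototype's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AntecedentStorage {

    /** One stored antecedent: an entity together with its type (medicine, active substance, ...). */
    public static class Antecedent {
        final String entity;
        final String entityType;
        Antecedent(String entity, String entityType) {
            this.entity = entity;
            this.entityType = entityType;
        }
    }

    private final Deque<Antecedent> recent = new ArrayDeque<>();
    private final int capacity;

    public AntecedentStorage(int capacity) {   // MedicineAsk keeps the last 2 questions
        this.capacity = capacity;
    }

    /** Called after every successfully answered question. */
    public void store(String entity, String entityType) {
        recent.addFirst(new Antecedent(entity, entityType));
        while (recent.size() > capacity) {
            recent.removeLast();               // older questions lose relevance
        }
    }

    /** Returns the most recent antecedent whose type is acceptable for the
     *  current question (e.g. "medicine"), or null if none is compatible. */
    public String findAntecedent(String requiredType) {
        for (Antecedent a : recent) {          // newest first
            if (a.entityType.equals(requiredType)) {
                return a.entity;
            }
        }
        return null;
    }
}
```

For the example in Figure 4.5, storing “Mizollen”/medicine and then “paracetamol”/active substance and asking findAntecedent("medicine") would skip “paracetamol” and return “Mizollen”.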
Then, when a new question without medical entities is received, we can use this stored information
to solve an anaphora. Anaphora are solved in different ways depending on the technique being used to
answer the question (rule-based, keyword spotting and machine learning).
With the rule-based methods, the question is matched to a pattern and thus we know the question type. However, in some of these cases, no entities are present in the part of the question that does not match a pattern (e.g. “Quais as indicações do medicamento?” (“What are the indications of the medicine?”) matches a pattern about indications, but “medicamento” is not a recognized entity). This way we identify that no entity is present. Since we know the question type, we can access the Antecedent Storage to find a suitable antecedent for the question.
With the keyword spotting method, because it is searching for keywords, the method cannot determine the question type if no medical entity is present. This is because the entity is a crucial keyword for determining the question type. For this reason, if the keyword spotting method fails, we assume there is an anaphora and take the latest antecedent stored in the Antecedent Storage, appending it to the question. Afterwards, we re-analyse the question with the keyword spotting technique. This time, with an entity present in the question, the method has a higher chance of answering the question. For example, if the keyword spotting technique attempts to analyse the question “What are the indications?” it will fail. Suppose the Antecedent Storage contains only the entity “paracetamol”. In this case this entity is appended to the question, resulting in “What are the indications? paracetamol”. The keyword spotting technique then tries to answer this new question. In this case it will succeed. If this technique fails, it is because the question was not a case of anaphora, but was simply malformed.
The machine learning method can determine a question type without the presence of entities, but without an entity there is a higher failure rate. This is because the type and location of the entities in a question are part of the training data, and thus help (but are not required) with the question's classification. There are two ways to resolve anaphora with SVM. Since SVM returns a question type but no entity, we can use the question type to search the Antecedent Storage for a suitable antecedent. Afterwards we can take the question type and the antecedent and use them to answer the question. However, this method uses the question type that SVM found when no entities were present. As stated before, entities help classify the question correctly. This means that there is a higher chance this question type is incorrect. There is an alternative method to avoid this problem. In this strategy, if SVM fails to find an entity, we append the latest antecedent stored in the Antecedent Storage to the question and re-analyse the question with SVM. This method is slower because it has to answer the question twice, but there is a higher chance that the question will be correctly classified if the entity is present. We tested both methods and the results are shown in Section 5.6.
In either case, the user is alerted to the fact that an anaphora was detected and another entity was
used, showing him/her which entity it was. This is to prevent accidents where a user either forgot or
misspelled the medical entity he wanted to query about, leading the system to think it was dealing with
anaphora. For example if the user asked about the indications of paracetamol and then asked about the
adverse reactions of mizolastina, but misspelled mizolastina, then the user would incorrectly receive the
adverse reactions of paracetamol as an answer.
This implementation was created with the goal of being extensible. The rules used to determine if a given antecedent is compatible with the current question's anaphora are stored in an XML file. By editing this XML file, it is possible to easily add new rules. It is also possible to change all of the rules to fit a different environment. This makes it possible to use this anaphora resolver for other environments, with the only changes necessary being in the XML file. The restriction is that the new environment must be somewhat similar to MedicineAsk. This means that the new system must deal with question answering, and the question's information should be a question type plus a list of detected entities.
4.3 Synonyms
As mentioned in Section 3.1.1, a feature to support synonyms was implemented in the previous version of MedicineAsk. This section describes our efforts to expand this feature in the current version of MedicineAsk.
4.3.1 Motivation
Users can use many different words for different medical entities. The data extracted from the Infarmed
website uses a more formal type of speech. For example “febre” (fever) is known as “pirexia” in the
Infarmed website. If a user asks a question about fever but the database only knows the term pirexia,
then the question will not be answered correctly. Furthermore, users cannot be expected to know the
complex terminology used in medicine, or they may just use a term that is less common for a given
expression. A synonym system can bridge the gap between the terminology of a common user and the
terminology of the medical information.
It would also be ideal if this synonym system were extensible so that support for new synonyms can easily be added.
4.3.2 Implementation
We implemented a small program in Java that extracts synonym information from Priberam1, an online Portuguese dictionary. This synonym information consists of synonyms of various medical terms. The medical terms used to search for synonyms were taken from the dictionary of named medical entities, which is used by MedicineAsk to find entities in questions. Each one of these medical entities was used as a keyword in the Priberam website's keyword-based search. For every result we extracted the synonyms of that medical entity that were found in the Priberam dictionary (if available) and stored them in text documents. This information was then inserted into the MedicineAsk database's synonym table. When a user poses a question, queries to the database take into consideration both the entities found in the question and their respective synonyms.
1 http://www.priberam.pt/DLPO/.
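The extraction program can be sketched with an HTML-scraping library such as jsoup. The URL pattern and the CSS selector ".synonyms a" are assumptions made for this sketch; the actual page structure of the Priberam dictionary would have to be inspected.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SynonymScraper {

    /** Looks up one medical term in the online dictionary and collects the
     *  synonyms listed on the result page (selector is hypothetical). */
    public static List<String> synonymsOf(String term) throws IOException {
        List<String> synonyms = new ArrayList<>();
        Document page = Jsoup.connect("http://www.priberam.pt/DLPO/" + term).get();
        for (Element link : page.select(".synonyms a")) {
            synonyms.add(link.text().trim());
        }
        return synonyms;
    }
}
```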
Chapter 5
Validation
This chapter describes the evaluation of the version of MedicineAsk produced as a result of this thesis,
which aimed to improve the MedicineAsk NLI module. We performed various experiments to test how
each of the new features of MedicineAsk improves upon the previous version. Section 5.1 describes
the initial setup used for the experiments and lists the experiments performed. Sections 5.2 to 5.7 detail
each one of these experiments. Section 5.8 summarises the results from the entire validation.
5.1 Experimental Setup
The patterns that the rule-based method uses to determine question types were manually built from a previously collected set of questions, as described in [Bastos, 2009]. There are 22 regular expression patterns.
To train the SVM we used a training corpus named training corpus A, built from 425 questions collected during the execution of the previous thesis. Part of these questions consists of the questions that were used to create the rule-based patterns. The other part came from an experiment similar to this one, used to test the previous version of MedicineAsk. This second group of questions was used to fine-tune the rule-based patterns during the previous thesis. In training corpus A, questions are divided into 18 question types. Table 5.1 presents the number of questions per question type.
For the named entity recognition task, we used a dictionary built with medical terms extracted from
the Infarmed website and the Médicos de Portugal website. The dictionary includes 17982 terms: 2127
medicines, 1212 active substances and 13228 medical conditions. This is the same dictionary used by
the MedicineAsk keyword spotting technique.
We collected a test set, the questionnaire test set, to compare the rule-based with the SVM approach.
To this end, an on-line questionnaire composed of 9 different scenarios was distributed over the internet,
using Facebook1 . Appendix A shows this questionnaire. Each scenario consists of a description of a
problem that is related to medicines (e.g. “John needs to know the adverse reactions of Efferalgan,
what kind of question should he ask?”). The participants were invited to propose one or more (natural language) questions for each scenario. We collected questions from 61 users: 19 medical professionals and 42 common users.
1 https://www.facebook.com/.
Question Type | Number of Questions | Example Question
QT INDICACOES | 53 | “Quais as indicações do paracetamol?”
QT REAC ADVERSAS | 69 | “Quais as reacções adversas do paracetamol?”
QT PRECAUCOES | 35 | “Quais as precauções do paracetamol?”
QT INTERACCOES | 14 | “Quais os medicamentos que interagem com o paracetamol?”
QT POSOLOGIA | 48 | “Qual a dosagem do paracetamol?”
QT MEDICAMENTO | 15 | “Quais os medicamentos para a asma?”
QT MEDICAMENTOCONTRA | 7 | “Quais os medicamentos contra indicados para a asma?”
QT MEDICAMENTONAO | 6 | “Quais os medicamentos para a asma que não provocam febre?”
QT PRECO BARATO | 15 | “Quais os medicamentos mais baratos do paracetamol?”
QT PRECO | 5 | “Qual o preço do Efferalgan®?”
QT MEDICAMENTOSEMELHANTES NAO | 38 | “Quais os medicamentos semelhantes ao Efferalgan® que não provoque febre?”
QT MEDICAMENTOSEM PRECAUCOES | 54 | “Quais os medicamentos para a asma que não exijam precauções com a febre?”
QT MEDICAMENTOSEMELHANTES SEM PRECAUCAO | 7 | “Quais os medicamentos semelhantes ao Efferalgan® que não exijam precauções com a asma?”
QT MEDICAMENTOSEM INTERACCAO | 19 | “Quais os medicamentos para a asma que não tenham interacções com o paracetamol?”
QT MEDICAMENTODA SUBSTANCIA | 4 | “Quais os medicamentos existentes do paracetamol?”
QT MEDICAMENTOCOMPARTICIPADOS | 1 | “Quais são os medicamentos comparticipados do paracetamol?”
QT MEDICAMENTOGENERICOS | 30 | “Quais os medicamentos genéricos do paracetamol?”
QT MEDICAMENTOINFORMACAO | 5 | “Quais as informações do Efferalgan®?”
Table 5.1: Number of questions for each question type in the Training Corpus A.
For the first experiments we began by creating a subset of the questionnaire test set, which we named test corpus A, using questions from 30 randomly chosen users: 15 medical professionals and 15 common users. For this test set we test the questions of medical professionals and common users separately. This is because we believed the difference in vocabulary used by medical staff and common users would influence the results. Test corpus A included a total of 296 questions divided into 9 scenarios. Table 5.2 shows details for each scenario. The questions were not pre-processed in any way, so any errors or typos present in the questions were not removed.
Scenario | Common user questions | Medical user questions | Total questions | Expected question type
1 | 18 | 17 | 35 | QT INDICACOES
2 | 18 | 15 | 33 | QT REAC ADVERSAS
3 | 16 | 19 | 35 | QT PRECAUCOES
4 | 16 | 19 | 35 | QT POSOLOGIA
5 | 15 | 18 | 33 | QT MEDICAMENTOGENERICOS
6 | 15 | 16 | 31 | QT PRECO BARATO
7 | 15 | 17 | 32 | QT MEDICAMENTOSEM INTERACCAO
8 | 15 | 15 | 30 | QT MEDICAMENTOSEMELHANTES NAO
9 | 15 | 17 | 32 | QT MEDICAMENTOSEM PRECAUCOES
Table 5.2: Number of user questions and expected answer type for each scenario for Test Corpus A.
We also created a second training corpus called training corpus B. This training corpus is the result of fusing training corpus A and test corpus A. By enriching the corpus, it was expected that SVM would be able to answer more questions by using this extra information. Training Corpus B has 886 questions. Once again the corpus questions are divided into 18 question types, but only 9 of the question types were enriched (since test corpus A only uses 9 question types). Table 5.3 shows the number of questions for each question type. Appendix C shows an excerpt of Training Corpus B.
A new test set is required to test training corpus B, because training corpus B is the result of fusing test corpus A and training corpus A. We built a second test set from the remainder of the questionnaire test set, which we will call test corpus B.
Question Type | Number of Questions First Corpus | Number of Questions Second Corpus | Example Question
QT MEDICAMENTOSEM PRECAUCOES | 54 | 81 | “Quais os medicamentos para a asma que não exijam precauções com a febre?”
QT MEDICAMENTOSEM INTERACCAO | 19 | 48 | “Quais os medicamentos para a asma que não tenham interacções com o paracetamol?”
QT MEDICAMENTOGENERICOS | 30 | 60 | “Quais os medicamentos genéricos do paracetamol?”
QT INDICACOES | 53 | 129 | “Quais as indicações do paracetamol?”
QT REAC ADVERSAS | 69 | 168 | “Quais as reacções adversas do paracetamol?”
QT PRECAUCOES | 35 | 122 | “Quais as precauções do paracetamol?”
QT POSOLOGIA | 48 | 112 | “Qual a dosagem do paracetamol?”
QT PRECO BARATO | 15 | 42 | “Quais os medicamentos mais baratos do paracetamol?”
QT MEDICAMENTOSEMELHANTES NAO | 38 | 46 | “Quais os medicamentos semelhantes ao Efferalgan® que não provoque febre?”
Table 5.3: Number of questions for each question type in Training Corpus B.
This test set includes 31 users. Out of these users, only 4 were in the medical field. This low number does not justify making the distinction between common users and medical staff for this test set. Test corpus B included a total of 322 questions divided into 9 scenarios. Table 5.4 shows details for each scenario. The questions were not pre-processed in any way; any errors and typos present in the questions were not removed.
Scenario | Total questions | Expected question type
1 | 40 | QT INDICACOES
2 | 39 | QT REAC ADVERSAS
3 | 39 | QT PRECAUCOES
4 | 40 | QT POSOLOGIA
5 | 38 | QT MEDICAMENTOGENERICOS
6 | 32 | QT PRECO BARATO
7 | 32 | QT MEDICAMENTOSEM INTERACCAO
8 | 31 | QT MEDICAMENTOSEMELHANTES NAO
9 | 31 | QT MEDICAMENTOSEM PRECAUCOES
Table 5.4: Number of user questions and expected answer type for each scenario for Test Corpus B.
Finally we have another subset of the test set called Test Corpus C, used to test anaphora resolution in MedicineAsk. In the same online questionnaire described above, there were other scenarios present in order to test additional question types. We wanted users to pose a question with an anaphora, but we did not want to specifically instruct the users to use anaphora, as we thought that would influence their answers too much. For this reason we created a scenario which only tried to encourage users to use an anaphora. Figure 5.1 shows the scenario used for this purpose. In order to stimulate the usage of anaphora we simulated a previous interaction between the scenario's subject (John) and MedicineAsk. For Test Corpus C we took all the questions we received from all 61 users and filtered out those that explicitly referred to the medical entity. An example of a filtered-out question is “Quanto custo o ácido acetilsalicílico?” (“How much does acetylsalicylic acid cost?”). There were 33 questions left after filtering, such as “Quanto custa?” (“How much is it?”).
Figure 5.1: Scenario used to encourage users to use anaphora.
The following sections detail each experiment performed in this thesis.
Section 5.2 details a preliminary experiment. We tested several different feature sets for SVM to discover if SVM brings any improvements over the previous version of MedicineAsk. In this experiment SVM used Training Corpus A. The questions tested were from Test Corpus A.
Section 5.3 documents the errors that emerged from integrating SVM into MedicineAsk.
Section 5.4 tests different strategies of question answering for MedicineAsk, using all the currently available techniques. In this experiment SVM used Training Corpus A. The questions tested were from Test Corpus A.
Section 5.5 tests the different strategies once more after enriching the training corpus used by SVM. For this experiment SVM uses Training Corpus B. The questions tested were from Test Corpus B.
Section 5.6 tests the effectiveness of anaphora resolution. For this experiment SVM uses Training Corpus B. The questions tested were from Test Corpus C.
Section 5.7 re-runs the experiment described in Section 5.5 with anaphora resolution. This experiment aims at finding out if anaphora resolution improves the results from Section 5.5. For this experiment SVM uses Training Corpus B. The questions tested were from Test Corpus B.
For these experiments, the tests performed with the techniques available from LUP were run by a simple Java program which sent each question to LUP and stored the corresponding answer in a text file. To execute the experiments on the MedicineAsk website, a Java program was created using Selenium2 together with TestNG3. TestNG is a testing framework similar to JUnit. Selenium is a library that allows testing websites directly. Using this program, questions were automatically inserted into MedicineAsk's website and each question's answer was automatically evaluated.
2 http://www.seleniumhq.org/.
3 http://testng.org/doc/index.html.
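A test of this kind could look like the sketch below. The URL and the element ids ("question", "answer"), as well as the expected keyword, are illustrative assumptions; only the general Selenium-plus-TestNG structure is being shown.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class MedicineAskWebTest {

    private WebDriver driver;

    @BeforeClass
    public void openBrowser() {
        driver = new FirefoxDriver();
    }

    @Test
    public void indicationsQuestionIsAnswered() {
        driver.get("http://localhost:8080/medicineask/");
        driver.findElement(By.id("question"))
              .sendKeys("Quais as indicações do paracetamol?");
        driver.findElement(By.id("question")).submit();
        String answer = driver.findElement(By.id("answer")).getText();
        // The expected keyword is only an example of an automatic check.
        Assert.assertTrue(answer.toLowerCase().contains("febre"));
    }

    @AfterClass
    public void closeBrowser() {
        driver.quit();
    }
}
```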
5.2 Rule-based versus Automatic Question Classification
Before applying machine learning techniques directly to MedicineAsk, we first verified whether SVM had the potential to bring any improvements to the rule-based MedicineAsk NLI. The rule-based MedicineAsk NLI is the version from the previous thesis, which uses both rule-based and keyword spotting techniques to answer questions. To do this, we compared both methods against Test Corpus A and observed the percentage of correctly classified questions. We tested a variety of features on SVM to observe if one set of features had any major advantages over the others.
5.2.1 Results
Figures 5.2, 5.3 and 5.4 show the percentage of questions correctly classified, for each scenario (1 to 9) and for the sum of all scenarios (Total). For each scenario, we show the percentage of questions correctly classified for each feature set used by the SVM and for the rule-based NLI. The rule-based MedicineAsk NLI successfully answers a question if the interface returns the correct answer. Additionally, an answer by SVM is successful if both the question type and the named entities of the user question are successfully identified. Figure 5.2 shows the results for common user questions, Figure 5.3 shows the results for users in the medical field and Figure 5.4 shows total results for all users combined. In the figures, the features are as follows: u - unigram, b - bigram, x - word shape, l - length, bu - binary unigrams, bb - binary bigrams. Each feature is explained in Section 4.1.1.
Figure 5.2: Percentage of correctly classified questions by scenario for common users after improvements.
Figure 5.3: Percentage of correctly classified questions by scenario for users in the medical field after improvements.
Figure 5.4: Percentage of correctly classified questions by scenario for all users after improvements.
In addition to SVM, LUP can also classify questions through string similarity techniques. These techniques compare a user's question to each training corpus question. Each question in the training corpus is mapped to a question type, so if a user question is very similar to a question in the training corpus then they will share the same question type. String similarity techniques measure how similar two strings are to one another; in other words, they measure the distance between two strings. Two identical strings have a distance of 0, and certain differences, such as different letters or words between both strings, increase the distance.
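The three measures named above (Jaccard, Overlap and Dice) have standard set-based definitions over the token sets $A$ and $B$ of the two questions. Note that these are usually stated as similarities, equal to 1 for identical token sets; a distance can be obtained as 1 minus the similarity.

\[
\mathrm{Jaccard}(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
\mathrm{Overlap}(A,B) = \frac{|A \cap B|}{\min(|A|,\,|B|)}, \qquad
\mathrm{Dice}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}
\]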
We ran the experiment of this section with each of the available string similarity techniques. Figures 5.5 and 5.6 show the comparison between each string similarity technique, MedicineAsk using rule-based methods, and two different combinations of SVM features. Figure 5.5 shows the results for common user questions and Figure 5.6 shows the results for users in the medical field.
Figure 5.5: Percentage of correctly classified questions by scenario for common users, using string similarity techniques.
5.2.2 Discussion and Error Analysis
Observing the total scores over all scenarios, we conclude that SVM has an advantage over the original rule-based and keyword spotting NLI. This is due to SVM being more flexible than rule-based techniques. SVMs learn how to classify questions by themselves, through flexible algorithms with much higher complexity than what man-made patterns can achieve. Machine learning techniques also do not need to rely heavily on single keywords, and instead take into account every word of every question in the training corpus to understand a question. On the other hand, a user question must match a pattern for the rule-based methods to succeed. Keyword spotting methods require certain keywords to be present in the question, and these keywords must be manually built into a dictionary. If the dictionary is poor then the questions cannot be answered.
Figure 5.6: Percentage of correctly classified questions by scenario for users in the medical field, using string similarity techniques.
The string similarity techniques, namely Dice and Jaccard, also improve on the previous version of
MedicineAsk, but their improvements are not as great as those brought by SVM.
None of the methods is robust to errors made by the user. In some instances, the user misspelled certain words such as medicine names. In other cases, the user posed a question that was not what was expected for that scenario (e.g. asking for adverse reactions when they were supposed to ask for indications). Even if these questions were correctly classified, they were wrong in relation to the scenario. The majority of the cases in which the SVM failed were due to the fact that some words in the user's requests were not present in the corpus. For example, in the question “Quais as doses pediátricas recomendadas da mizolastina?” (“What are the recommended paediatric doses of mizolastina?”) we find that “doses”, “pediátricas” and “recomendadas” are not present in the corpus. Also, some words in the training corpus were more frequently associated with a category different from the correct one, misleading the classifier.
Scenario 9 leads to very long questions and all methods have a great deal of trouble classifying them. Since the questions are very long, users can express themselves in many different ways, which misleads the machine learning methods. The number of questions in the training corpus for this category is also relatively low, as seen in Table 5.1, under “QT MEDICAMENTOSEM INTERACCAO”. We see that in general a greater number of questions were correctly classified for common users than for medical staff. This can be explained by the terminology used by each type of user. Most common users use the entity names provided in the question. Medical staff, having greater medical knowledge, sometimes use other, more complex terms for those entities. For example, in a scenario regarding vitamin A some medical staff used the term retinoids instead. While some of these questions were correctly classified, some of these terms were too complex or returned results different from those expected.
We see that simply using the unigram feature already provides very favourable results. In addition, by using only one feature we decrease the complexity of the classification step, which allows questions to be classified more quickly. For this reason SVM with the unigram feature is the technique we decided to use for the remainder of the experiments.
5.3 Integrating SVM into the MedicineAsk NLI
Observing the preliminary results we determined that machine learning methods have high potential to
improve MedicineAsk. We then integrated this machine learning method in MedicineAsk itself. We ran
the same experiment as detailed in section 5.2, with SVM being trained with Training Corpus A to classify
Test Corpus A. This experiment was performed to observe if the integration had been successful.
The results were for the most part the same, but some issues were detected. While testing SVM separately, a question would be classified as correct if the question type and the named entities present in the question were correctly identified. While integrated with MedicineAsk, however, the information obtained by SVM must still be sent to the database in order for the actual answer to be obtained. The issue was that Scenario 7 was a question regarding a medical condition called “acne nodular”, and in the medical entity dictionaries both “acne” and “nodular” are valid entities. The SQL query building process of MedicineAsk does not expect this many entities in the question, and thus cannot retrieve an answer from the database. By observing the results manually we see that the entities were correctly extracted, but the answer is not actually obtained, and so several questions were not answerable. This caused SVM to fail to answer any questions from Scenario 7. To fix this entity issue, we added a simple piece of code to ignore named entities that are contained in larger named entities. For example, in the previously mentioned case of “acne nodular” the detected entities are “acne nodular”, “acne” and “nodular”. With this code, both “acne” and “nodular” are ignored as entities, and the entity used to answer the question will be “acne nodular”, as intended.
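A minimal sketch of this containment filter is shown below; the function name and the plain-string representation of entities are illustrative, since the real MedicineAsk code works on its own data structures.

# Keep only entities that are not contained in a longer detected entity,
# so "acne nodular" wins over "acne" and "nodular".
def filter_contained_entities(entities):
    kept = []
    for entity in entities:
        contained_in_longer = any(
            entity != other and entity.lower() in other.lower()
            for other in entities
        )
        if not contained_in_longer:
            kept.append(entity)
    return kept

print(filter_contained_entities(["acne nodular", "acne", "nodular"]))  # ['acne nodular']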
5.4 First Experiment - Combining Question Answering approaches
As described in Section 4.1.2 we have three different strategies for combining question answering techniques:
Strategy 1: Rule-based, falling back to keyword spotting if no match is found
Strategy 2: Rule-based, falling back on machine learning if no match is found
Strategy 3: Rule-based, falling back on keyword spotting if no match is found, and then falling back
on machine learning if keyword spotting fails as well.
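As a rough sketch of how these strategies chain the techniques, the code below assumes each technique is a callable that returns None when it fails; the function names are placeholders, not the real MedicineAsk API.

def strategy_3(question, rule_based, keyword_spotting, svm_classify):
    # Strategy 1 stops after keyword spotting; Strategy 2 replaces keyword spotting
    # with the SVM; Strategy 3 chains all three techniques, as sketched here.
    answer = rule_based(question)
    if answer is None:
        answer = keyword_spotting(question)
    if answer is None:
        answer = svm_classify(question)
    return answer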
The following experiments test all these strategies, measuring the percentage of questions correctly
answered from Test Set A. In this experiment SVM is trained with Training Corpus A.
5.4.1 Results
The following figures show the percentage of questions correctly classified for the different integration strategies. An answer is correctly classified if the NLI returns the expected answer through the MedicineAsk website. Figure 5.7 shows the results for common user questions, Figure 5.8 shows the results for users in the medical field and Figure 5.9 shows the total results for all users combined.
Figure 5.7: Percentage of correctly classified questions by scenario for common users for the First Experiment.
Figure 5.8: Percentage of correctly classified questions by scenario for users in the medical field for the First Experiment.
5.4.2 Discussion and Error Analysis
The results and errors of this test were similar to those described in Section 5.2.2, because both the training and test corpora were the same for both experiments.
Once again, one of the primary reasons for failure was errors made by the users. Misspelling words such as medicine names or other important keywords such as dosagens (doses) often led to a question being wrongly classified. There were also instances of users asking questions that were not expected in that scenario (e.g. asking for adverse reactions when he/she was supposed to ask for indications).
Figure 5.9: Percentage of correctly classified questions by scenario for all users for the First Experiment.
SVM was still unable to answer certain queries because the corpus it was using for training was not rich enough. Of the three strategies, Strategy 3 yields the best results. This is because both the keyword spotting and the machine learning techniques are capable of answering questions that the other technique cannot. For this reason, by combining them, a greater number of questions can be answered.
5.5 Second Experiment - Increasing the training data
The purpose of this experiment was to evaluate whether the MedicineAsk NLI would be able to answer more questions if SVM's corpus was enriched. As explained in Section 5.1, we enriched Training Corpus A with Test Corpus A, creating Training Corpus B. We also removed from the named entity dictionary some words which were too generic and conflicted with the question answering process (e.g. contra-indicações (contraindications) and compatível (compatible)).
5.5.1 Results
Since Test Corpus A is now part of the training corpus, we must use a different test corpus. For this experiment we measure the percentage of correctly classified questions from Test Corpus B. An answer is correctly classified if the NLI returns the expected answer through the MedicineAsk website. Figure 5.10 shows the results of this experiment.
5.5.2 Discussion and Error Analysis
Some of the errors from this experiment are the same as the ones seen in the previous experiments and we will not go into too much detail about them. These errors include:
User misspelling a word, namely medical entities (e.g. Para que serve o Effermalgan?, “What is Effermalgan for?”);
User omitting a medical entity (e.g. para que serve este medicamento?, “what is this medicine for?”);
User asking the wrong question (e.g. Quais são as indicações para tomar o Efferalgan?, “What are the indications of Efferalgan?”, when the scenario was about precautions);
Presence of words in the medical entity dictionary that are too common (e.g. doença, “disease”).
Figure 5.10: Percentage of correctly classified questions by scenario for all users for the Second Experiment.
We once again see that the addition of SVM to the answering process of MedicineAsk NLI brings
improvements. Strategy 3 still shows the best results, due to making use of all available techniques to
answer questions.
Strategy 3 is superior to Strategy 2 because the keyword spotting technique can answer some questions that SVM fails to answer. For example, the question about indications Que doenças se trata com Efferalgan? (“What diseases are treated with Efferalgan?”) is classified as a question about interactions by SVM, so Strategy 2 fails to answer it. The keyword spotting technique, however, successfully classifies it as a question about indications, and so Strategy 3 can answer it. On the other hand, any question that cannot be answered by the keyword spotting technique is sent to SVM, which may answer it correctly. This is why Strategy 3 outperforms Strategy 2.
However, Strategy 3 does not beat Strategy 2 in every scenario. This is because the keyword spotting and the machine learning techniques work differently. Strategy 3 fails to answer some questions that Strategy 2 answers successfully. This is caused by the keyword spotting technique answering a question but misinterpreting it. In some cases SVM would have answered the question successfully but, because the keyword spotting technique already answered it, SVM never gets a chance to answer it. For example, the question about precautions Como tomar Efferalgan? (“How to take Efferalgan?”) is classified by the keyword spotting technique as a question about dosages, while SVM classifies it as precautions. Note that this question is ambiguous and that neither of the methods was fully wrong in its analysis, but because the scenario expected the answer to be precautions, only Strategy 2 succeeds. This issue affects scenario 3 the most.
Strategy 2 and Strategy 3 improved the percentage of correctly classified questions over Strategy 1 by 15% and 17% respectively. Both show good results, but Strategy 3 performs better. Another reason to use Strategy 3 is speed. SVM has the downside of being slower at answering questions than the other techniques; because Strategy 3 only resorts to SVM when keyword spotting fails, most questions avoid that slower step. During the experiments detailed above, we observed that answering a question using the keyword spotting technique took, on average, approximately 1 second, while answering with SVM took on average 5 seconds. While times may vary depending on the machine running MedicineAsk, every second matters when dealing with user interfaces, because slow interfaces can decrease user satisfaction. For these reasons we decided to use Strategy 3 as the final strategy in MedicineAsk.
5.6 Anaphora Evaluation
In this section we attempt to measure the improvements brought by the implemented anaphora resolution techniques to MedicineAsk's question answering. We start with a validation of a scenario specifically designed for anaphora. We ran Test Corpus C through Strategies 1, 2 and 3, plus the previous version of MedicineAsk, which does not include anaphora resolution.
As mentioned in Section 4.2.2 there are two ways to handle anaphora with SVM answers. We tested
Strategies 2 and 3 with this alternate anaphora resolution method and dubbed these new strategies as
Strategy 2.5 and Strategy 3.5 respectively. Thus we have:
Previous MedicineAsk - Strategy 1 with no Anaphora Resolution
Strategy 1 - Strategy 1 with Anaphora Resolution
Strategy 2 - Strategy 2 where SVM answers a question once and the resulting question type is paired with the antecedent found for the current anaphora.
Strategy 2.5 - Strategy 2 where SVM answers a question with an anaphora, finds an antecedent, appends it to the original question and re-answers that question.
Strategy 3 - Strategy 3 where SVM answers a question once and the resulting question type is paired with the antecedent found for the current anaphora.
Strategy 3.5 - Strategy 3 where SVM answers a question with an anaphora, finds an antecedent, appends it to the original question and re-answers that question.
For all strategies, if SVM answers a question and no entities are found, an entity is fetched from the anaphora resolver. The difference is that Strategies 2 and 3 use the question type originally found in the first analysis of the question, while Strategies 2.5 and 3.5 append the antecedent entity to the question and have SVM classify the question once again. The idea is that the first method is faster but, because there was no entity in the question, there is a higher chance the classification of the question type will be wrong. The second method re-analyses the question with an entity present, so there is a higher chance the classification will be correct, but SVM must analyse the question twice, which is significantly slower.
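A simplified sketch of the two variants is shown below; classify, extract_entities and the antecedent list stand in for the real MedicineAsk components described in Section 4.2.2.

def answer_with_anaphora(question, classify, extract_entities, antecedents, reanalyse=False):
    question_type = classify(question)
    entities = extract_entities(question)
    if entities:
        return question_type, entities          # no anaphora: answer normally
    if not antecedents:
        return question_type, []                # nothing to resolve the anaphora with
    antecedent = antecedents[-1]                # most recently mentioned entity
    if reanalyse:
        # Strategies 2.5 / 3.5: append the antecedent and classify the question again
        question_type = classify(question + " " + antecedent)
    # Strategies 2 / 3 simply keep the question type from the first classification
    return question_type, [antecedent]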
5.6.1 Results
Table 5.5 shows the results. The percentage of correctly identified anaphora counts the questions where the anaphora was resolved, including those that were ultimately answered incorrectly because the question type was not correct. The percentage of questions correctly answered counts the questions that had both a correct resolution of the anaphora and a correct answer to the question.
Strategy              | Percentage of correctly identified anaphora | Percentage of questions correctly answered
Previous MedicineAsk  | 0%                                          | 0%
Strategy 1            | 67%                                         | 67%
Strategy 2            | 64%                                         | 15%
Strategy 2.5          | 70%                                         | 0%
Strategy 3            | 88%                                         | 79%
Strategy 3.5          | 94%                                         | 79%
Table 5.5: Percentage of questions with anaphora correctly classified
5.6.2 Discussion and Error Analysis
The previous version of MedicineAsk is unable to answer any questions, as there is no anaphora resolution in that version. When resolving anaphora from questions answered by SVM, the anaphora were correctly resolved but the answers were not retrieved properly. The reason for this is that the questions in this test were of the type “QT_PRECO” (questions about the price of medicines), and there are only 5 questions of this type in the training corpus. We find, however, that Strategy 2 can answer more questions correctly than Strategy 2.5. This means that analysing the question without the entity being present provided better results in this case.
Both Strategies 2.5 and 3.5 identify more anaphora than Strategies 2 and 3 respectively, but do not answer a greater number of questions. The reason for this is that a question can lead to no results if it does not make sense. For example, a question such as “What are the medicines that can cure vitamin A?” is not possible. Since SVM in Strategies 2 and 3 analyses the question without any entities present, there is a chance the question type will not be compatible with any of the anaphora antecedents currently stored. If SVM classifies a question as asking about medicines for a given medical condition, but only medicines and active substances are present in the list of possible antecedents, then the anaphora cannot be resolved. On the other hand, Strategies 2.5 and 3.5 take the first antecedent they find and append it to the question. The question is then re-analysed with the entity, and there is a higher chance the question will make sense.
Some of the questions with the incorrect question type in Strategy 3 failed due to certain keywords. The wording used by the user sometimes caused MedicineAsk to incorrectly classify a question (e.g. the presence of the word genérico, “generic”, caused the question to be classified as a question regarding the generics of a medicine rather than the price of the medicine).
The only anaphora that were not identified by any method contained other words which were unintentionally classified as an entity. Since an entity was present, no anaphora could be detected.
5.7 Testing Test Corpus B with anaphora resolution
To demonstrate anaphora resolution in a different way, we performed the same experiment as the one from Section 5.5, but with anaphora resolution enabled. We then compared the results of Strategies 1 and 3 from that experiment with the results of this new run.
5.7.1 Results
Figure 5.11 shows the percentage of questions correctly classified for the questions in Test Corpus B. An answer is correctly classified if the NLI returns the expected answer through the MedicineAsk website.
Figure 5.11: Percentage of correctly classified questions by scenario for all users for the Second Experiment with Anaphora Resolution.
5.7.2 Discussion and Error Analysis
Note that these improvements may not be accurate. This is because the anaphora present in Test Corpus B may not have been intentional. If a user asked a question such as “What are the adverse reactions of the medicine?” because he/she assumed the system would still remember which medicine was being discussed, then it is indeed a case of anaphora and the results are correct. However, some users may have forgotten to type the entity, or simply misspelled it. If this is the case, then resolving the anaphora could lead to mistakes by the system. For example, the user may have asked an initial question about the indications of paracetamol, and then asked a question about the adverse reactions of mizolastina. However, if he/she forgets to type mizolastina, or misspells that word, an anaphora will be detected. In this case MedicineAsk will return the adverse reactions of paracetamol, which is not the answer the user was interested in.
We see that, in total, Strategy 3 with anaphora resolution correctly answered 5% more questions than regular Strategy 3. Most of the questions missing an entity were correctly classified. The only exception occurred when a question was marked with an unintended entity (such as pré-, “pre-”), which was stored as a possible antecedent. Immediately after, a question with no entities was posed, and the unintended entity was used to answer it, which often led to no results.
Anaphora resolution for medical entities brings improvements to MedicineAsk. It can also serve as a tool to prevent spelling errors by users, as long as they are querying the same entity across questions. To avoid errors, a warning is shown telling the user that no entity was detected and that a previous one was used.
Identifying unintended entities in a question with anaphora leads to issues with both anaphora detection (since an entity is present, no anaphora can be detected) and anaphora resolution (because the unintended entity is stored and possibly used in a later question). A thorough cleaning and processing of the named entity dictionary should be performed in the future to avoid these types of errors.
Strategy 3 with anaphora resolution still seems to show the best results, and will be the strategy used
by this version of MedicineAsk. Strategy 3.5 does not improve on the results enough to justify answering
each anaphora question twice using SVM.
5.8 Conclusions
Strategy 3 shows an increase of 17% in question answering during the experiment described in Section
5.5. The experiment in Section 5.6 shows us that the only medical entity anaphora that cannot be
resolved are ones where an unexpected entity was detected. From the experiment described in Section
5.7 we see that Strategy 3 with anaphora resolution answers 5% more questions than regular Strategy
3 and 18% more questions than regular Strategy 1 with anaphora resolution.
The 5% increase in correctly answered questions just from using anaphora resolution shows us that misspelling of medical entities is one of the biggest problems in MedicineAsk. The rule-based method has techniques to handle misspelled terms (as described in Section 3.1.2). However, these techniques only work because the rule-based method knows exactly where the medical entity should be, and thus can obtain the misspelled word and process it in order to try to obtain an answer. The keyword spotting and machine learning techniques do not require entities to be in specific locations, so special techniques would be required to identify the presence of a misspelled word. If, somehow, the misspelled word could be found, then we could apply the same technique that the rule-based method uses to obtain an answer to the question. One possible solution would be to remove some words from a question through stop words, then use string similarity techniques to compare every remaining word to the named entity dictionary.
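A rough sketch of that idea is shown below, using a Jaccard similarity over character bigrams (Dice would work similarly); the stop-word list, threshold and dictionary are illustrative and not part of MedicineAsk.

def bigrams(word):
    word = word.lower()
    return {word[i:i + 2] for i in range(len(word) - 1)}

def jaccard(a, b):
    # Jaccard similarity over character bigrams of two words.
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def recover_misspelled_entity(question, dictionary, stop_words, threshold=0.6):
    # Skip stop words, then keep the dictionary entity most similar to any remaining word.
    best = None
    for word in question.split():
        if word.lower() in stop_words:
            continue
        for entity in dictionary:
            score = jaccard(word, entity)
            if score >= threshold and (best is None or score > best[1]):
                best = (entity, score)
    return best[0] if best else None

stop_words = {"quais", "as", "reaccoes", "adversas", "da"}
print(recover_misspelled_entity("Quais as reaccoes adversas da mizolastyna",
                                ["mizolastina", "paracetamol"], stop_words))  # mizolastina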
Omission of words is another difficult topic to handle. We do not know for sure why a user would not include the medical entity in his/her question, but there are some possibilities:
1. The user forgot to include the entity;
2. The user did not know he/she was supposed to insert an entity in the question;
3. The previous question was about the same entity, so the user assumed the system would still remember which entity was being discussed.
Point 3 is a case of anaphora. Points 1 and 2 should not be handled by the system because they are user errors. The issue is that it is difficult to distinguish between the three points. Our anaphora method assumes there is an anaphora in every one of these situations and handles it accordingly. Our way of handling the issue is to warn the user that we detected an anaphora and for that reason used the previous entity to answer his/her question.
Users asking the wrong question also caused failures in the validation, and this could have a variety of causes. If the user made the mistake because he/she misunderstood the scenario's description, then this problem would not occur in the real world, as users would not be following someone else's scenario. If the user typed the wrong question because he/she does not know the difference between terms like interactions and precautions, then it could be interesting to have a mechanism of “related questions”, where topics deemed similar show up on the user's screen after posing a question.
Another issue commonly found was related to the presence of words in the named entity dictionary that are too common. Addressing it requires the dictionary of named entities to be analysed and processed: each word that is too common and could interfere with question answering should be removed. The dictionary's size makes this task difficult. Another issue is that removing certain words we think are common might actually hurt the system's ability to answer questions.
To fix the issue where keyword spotting does not let SVM answer a question successfully, the ideal solution would be to involve the user. If, while using Strategy 3, an answer is obtained through the keyword spotting technique, a prompt could appear on screen asking the user whether that answer was correct. If the answer was incorrect, then SVM would attempt to answer the question, and possibly be successful.
Chapter 6
Conclusions
This chapter describes the main conclusions obtained from the research and development of this thesis and of the new version of the MedicineAsk NLI module. Section 6.1 presents a summary of this thesis, as well as its contributions. Section 6.2 presents some limitations of our version of MedicineAsk, as well as possible solutions to those limitations. We also mention some ideas on how MedicineAsk could be further improved.
6.1 Summary and Contributions
This thesis presents an improvement to the NLI module of MedicineAsk, a system that answers Portuguese Natural Language questions about active substances and medicines. Data was extracted from the Infarmed website, processed and added to a database. MedicineAsk then allows users to access this information through an NLI. This means users can access information about medicines and active substances using everyday language, without having to learn any new, specific means of communication with the system. This thesis focused on improving the NLI module of MedicineAsk. The main objective was to increase the quantity of user questions that MedicineAsk can answer. We also aimed to test several different configurations and strategies of the question answering techniques, to determine which ones brought better results. The work developed in this thesis resulted in the following contributions:
A state of the art on question answering systems for the medical field as well as information retrieval systems for the medical domain. We also present a state of the art on different web-based systems that provide medical information.
An improvement to the MedicineAsk NLI module, which includes:
– The addition of machine learning techniques to MedicineAsk. We added SVM to the techniques MedicineAsk uses to answer questions. We also tested several different SVM features to determine which one is most useful for our problem.
– Added support for questions with anaphora and ellipsis. Anaphora and ellipsis occur when a previously mentioned entity is referred to without its actual name, for example “What are the indications of Mizollen? And the adverse reactions of that medicine?”. In the second question, “of that medicine” refers to “Mizollen”. This type of question can occur when, for example, a user wants to avoid typing an entity several times in a row. Because questions with anaphora and ellipsis do not contain any medical entities, they could not be answered previously. We keep a short history of questions. In case of anaphora, MedicineAsk analyses this history and chooses a previous entity to answer the present question. This part of the work was made portable and can be used in other question answering environments that represent questions in a similar way to MedicineAsk.
– The extension of the synonym detection implemented in the previous version of MedicineAsk. This feature had been implemented, but a comprehensive list of synonyms had not been found due to time constraints. We collected such a list and inserted it into the database.
A validation of every new addition to this version of MedicineAsk, where we test several different question answering strategies to measure which retrieves the best results. We also identified the current issues of MedicineAsk and possible solutions.
This thesis resulted in a paper and poster that were published in the “10th International Conference
on Data Integration in the Life Sciences”.
6.2 Future Work
We have identified several limitations in the new version of MedicineAsk. In this section we propose some ideas for future solutions to these issues. Sections 6.2.1 to 6.2.5 detail these limitations.
During the research performed for this thesis we also identified some other potentially interesting areas of development for MedicineAsk, as described in Sections 6.2.6 to 6.2.8.
6.2.1 Additional question types
Currently our system answers questions that only cover one topic, for example “What are the indications of paracetamol?”. However, a user may want to pose a more complex question, asking for more than one type of information about a given substance or medicine, such as “What are the indications and adverse reactions of paracetamol?”. Currently, to do this, the user would have to pose two different questions to obtain all the information he/she seeks. To support these types of questions, a question would have to be analysed for more than one question type, and all the identified question types would have to be stored.
This could be achieved by the keyword spotting and SVM techniques. The keyword spotting method would have to be changed to take into account every keyword found in the question. For example, if it finds a keyword for “indications” it should not stop and assume the question is about indications, but should instead further analyse the question in search of other keywords, such as a keyword for “adverse reactions”. The SVM method currently analyses the question and returns the most likely class for that question. To support questions with more than one topic, SVM would need to return all of the classes that reach a given threshold score, which may not be trivial. After storing every question type found in the question, the NLP module would build an SQL query to retrieve all the necessary information from the database. In the previous example it would return both the indications and adverse reactions of paracetamol at once.
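A minimal sketch of keyword spotting extended in this way is shown below; the question type names and keyword lists are illustrative, not the ones actually used by MedicineAsk.

# Collect every question type whose keywords appear in the question,
# instead of stopping at the first match.
QUESTION_TYPE_KEYWORDS = {
    "QT_INDICACOES": ["indicacoes", "indicações", "serve para"],
    "QT_REACCOES_ADVERSAS": ["reaccoes adversas", "reações adversas", "efeitos indesejaveis"],
}

def detect_question_types(question):
    question = question.lower()
    found = []
    for question_type, keywords in QUESTION_TYPE_KEYWORDS.items():
        if any(keyword in question for keyword in keywords):
            found.append(question_type)
    return found

# A combined question yields both types, so a single SQL query can fetch both answers.
print(detect_question_types("Quais as indicações e reações adversas do paracetamol?"))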
Other types of complex questions include questions that contain negation. Detecting negation can be important if, for example, a user wants to know about a medicine that does not cause a given reaction. The current version of MedicineAsk can answer some questions with negation, for example “What medicines cure fever but do not cause sleepiness?”. There are, however, still several cases where the negation will not be detected.
6.2.2 Additional strategies for question answering technique combinations
Several strategies were tested in this version of MedicineAsk, combining rule-based, keyword spotting and SVM techniques in different sequential orders. While these strategies proved to be effective, there are other types of strategies that would be interesting to test. One strategy consists of using a “voting model”: all available techniques provide their own answer for a given question, then all of the answers are judged and the one considered correct is the answer returned by the system. The techniques used to judge which is the best answer would have to be studied further.
Another possibility is a “human in the loop” model. In this model the techniques are still used sequentially, but the user can express whether he/she was satisfied by the system's answer. If the user replies negatively, then the next technique in the sequence is used to attempt to answer the question once again. For example, in Strategy 3, which consists of rule-based, keyword spotting and SVM techniques used sequentially, the keyword spotting technique may return an answer to a question. Afterwards a prompt appears asking if the user is happy with the answer. If the user chooses “No”, then MedicineAsk attempts to answer the question using SVM. This would solve one of the issues identified in Section 5.5.2, where the keyword spotting method would successfully find an answer to the question but the answer was incorrect. Since technically the keyword spotting method did not fail, SVM had no opportunity to answer the question. By involving the user, SVM can attempt to answer a question even if the keyword spotting method initially prevents this. It would also be interesting to use the same mechanism to collect user feedback. This feedback could then be used to automatically learn ways of improving MedicineAsk.
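A sketch of this idea is shown below; confirm stands for the on-screen prompt and each technique is assumed to return None when it fails. All of the names are placeholders, not the real MedicineAsk API.

def strategy_3_with_feedback(question, rule_based, keyword_spotting, svm_classify, confirm):
    # Try each technique in order; only accept an answer the user confirms as helpful.
    for technique in (rule_based, keyword_spotting, svm_classify):
        answer = technique(question)
        if answer is not None and confirm(answer):
            return answer
    return None  # no technique produced an answer the user accepted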
6.2.3 Addressing common user mistakes
The 5% increase in correctly answered questions just from using anaphora resolution, seen in Section 5.7, shows us that misspelling of medical entities is one of the biggest problems in MedicineAsk. The rule-based method has techniques to handle misspelled terms (as described in Section 3.1.2). However, these techniques only work because the rule-based method knows exactly where the medical entity should be, and thus can obtain the misspelled word and process it in order to try to obtain an answer. The keyword spotting and machine learning techniques do not require entities to be in specific locations, so special techniques would be required to identify the presence of a misspelled word. If, somehow, the misspelled word could be found, then we could apply the same technique that the rule-based method uses to obtain an answer to the question. One possible solution would be to remove some words from a question through stop words, then use string similarity techniques to compare every remaining word to the named entity dictionary.
Users asking the wrong question also caused failures in the validation, and this could have a variety of causes. If the user made the mistake because he/she misunderstood the scenario's description, then this problem would not occur in the real world, as users would not be following someone else's scenario. If the user typed the wrong question because he/she does not know the difference between terms like interactions and precautions, then it could be interesting to have a mechanism of “related questions”, where topics deemed similar show up on the user's screen after posing a question.
Another issue commonly found was related to the presence of words in the named entity dictionary that are too common. Addressing it requires the dictionary of named entities to be analysed and processed: each word that is too common and could interfere with question answering should be removed. The dictionary's size makes this task difficult. Another issue is that removing certain words that we think are common might actually hurt the system's ability to answer questions.
6.2.4 User evaluation
Due to time constraints, a user evaluation was not performed on this version of MedicineAsk. This kind of evaluation is important to measure how MedicineAsk compares to the Infarmed website. It is important to know the level of satisfaction of MedicineAsk users and how easily they obtain their answers. Ideally this evaluation would be performed with both common users and medical staff, to identify whether both types of users are equally satisfied with MedicineAsk.
6.2.5 Question type anaphora
We have a question type anaphora if MedicineAsk cannot find a suitable question type for a given question (e.g. “What are the indications of paracetamol? And those of mizolastina?”; the second question has no question type, as it is referencing the first one). This type of anaphora can occur in MedicineAsk, but support for it was not added, because we did not find a suitable solution for this problem.
MedicineAsk's current rule-based methods are not able to answer these questions, as each pattern is based on a question type and there is currently no pattern for “no question type”. Creating these patterns is not trivial because, by not having a question type, these questions give very little information to go on.
The keyword spotting techniques will also not have enough information to answer the question. One possible method is as follows; consider that the previous input question was “What are the indications of paracetamol?”:
1. Analyse a question using the keyword spotting technique (e.g., “And those of mizolastina?”)
2. No question type is found, so assume it is a medical entity anaphora and treat it as one (as detailed in Section 4.2.2). Concatenate the latest entity from the Antecedent Storage to the original question (resulting in “And those of mizolastina? paracetamol”)
3. Re-analyse this new question with the keyword spotting technique
4. No question type is found, so extract the question type of the latest question in the Antecedent Storage (“indications”)
5. Convert the question type into a template question (e.g., “What are the indications of”)
6. Append the original question to this template question (e.g., “What are the indications of And those of mizolastina?”)
7. Re-analyse this new question once again
Through this method, it might be possible for the keyword spotting techniques to answer some of these questions. The issue with this solution is that it might become impossible for the keyword spotting technique to fail, because in the worst case it will simply return the answer to the previous question. This becomes a problem with Strategy 3: since SVM is only used when the keyword spotting technique fails, no questions would ever be answered using SVM. Another possible solution for keyword spotting would involve checking whether a question has keywords related to entities but no keywords for question types. For example, in the question “And those of mizolastina?” the keyword spotting technique would detect a keyword of the type “active substances” due to “mizolastina”. However, no keywords would be found for question types such as “indications” or “adverse reactions”. Using this information it could be possible to determine the existence of a question type anaphora.
Machine learning techniques have the problem of attempting to classify a question even if no question type is present. This can be solved either by creating a class of questions for SVM specifically for cases of anaphora, or by having a threshold score, where if no question type candidate is found with a match above a given score then MedicineAsk assumes the question has no question type. SVM uses a score to determine how likely it is for a question to belong to a class. The idea would be to assume there is no question type if the score given to the chosen class is too low. The issue with a threshold score is finding the correct value. Some questions which were correctly classified, but with a low score, will now be wrongly classified. On the other hand, if the threshold is too low then no anaphora will ever be detected. A class of questions with no question type would consist of a class in the training corpus made up entirely of questions with no question type (e.g., questions such as “And those of SUBSTANCIA_ACTIVA?”). The issue here is the very low quantity of information in each question of the training corpus; it would be difficult to classify a question using this method.
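A sketch of the threshold idea is shown below; how the per-class scores are obtained depends on the SVM library (e.g. LIBSVM decision values or probability estimates), and the threshold value is illustrative.

def classify_or_anaphora(class_scores, threshold=0.2):
    # class_scores: mapping from question type to the SVM score for one question,
    # e.g. {"QT_INDICACOES": 0.9, "QT_PRECO": -0.3}.
    best_type = max(class_scores, key=class_scores.get)
    if class_scores[best_type] < threshold:
        return None  # no confident question type: treat as a question type anaphora
    return best_type

print(classify_or_anaphora({"QT_INDICACOES": 0.05, "QT_PRECO": 0.01}))  # None -> anaphora
print(classify_or_anaphora({"QT_INDICACOES": 0.85, "QT_PRECO": 0.10}))  # QT_INDICACOES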
6.2.6 Updates to Information Extraction
Some years have passed since the last time information was extracted from the Infarmed website and
now its structure has changed. If the MedicineAsk Information Extraction module was used now it is
likely that it would not succeed in extracting any new information from the Infarmed website. It is then
necessary to update the MedicineAsk Information Extraction module. It would also be interesting to have
an automatic information extraction system. This would make it possible to extract information from the
Infarmed website even if the website suffered further changes, without having to alter any code from the
information extractor.
There are also other sources of information that could be explored. For example, it was suggested to us by an acquaintance involved in the medical field that MedicineAsk should be able to distinguish adverse reactions by their different frequency levels. Information leaflets that come with medicines usually list the adverse reactions divided by frequency, such as rare adverse reactions and frequent adverse reactions. It would be useful for a user to know whether the adverse reactions he or she is experiencing are common and expected, or whether they are of a very rare kind that might warrant a visit to the doctor. Unfortunately, the Infarmed website does not distinguish between different degrees of frequency of adverse reactions. While there are some cases where frequency is mentioned, they are few and poorly structured. Due to the small amount of information available, adding support for questions that distinguish the frequency of adverse reactions would have weak results. However, Infarmed also has a public database called Infomed1. This database lists several different kinds of information about medicines, including links to files with scans of the information leaflets that come with each medicine. These information leaflets do distinguish adverse reactions based on how frequent they are. It would be interesting to extract this information and add it to the database, while also adding support for questions regarding the frequency of adverse reactions, and any other available type of information in these information leaflets.
6.2.7 MedicineAsk on mobile platforms
As mentioned in Section 1, medical information must sometimes be accessed quickly. For example, a doctor might need to confirm a diagnosis for an emergency patient, or a common user may be suffering an emergency and need to know more about one of his/her medicines. However, the current MedicineAsk system was made for traditional web browser interfaces. In emergency situations it may not be possible to access a computer in time. A mobile application of MedicineAsk could solve this issue. Making the information on the Infarmed website available through a portable device such as a smartphone would make this information much more accessible.
6.2.8 Analysing Portuguese NL questions with information in other languages
The goal of MedicineAsk is to answer questions in the Portuguese language. To do this we have been using Portuguese resources to answer Portuguese questions. The big disadvantage is that the amount of medical information resources for the Portuguese language is very limited. Other languages, such as English, have a much greater quantity and variety of medical information. There is also much more research on English NLP than on Portuguese NLP.
1 http://www.infarmed.pt/infomed/inicio.php.
There are, however, ways of answering Portuguese questions while using resources in other languages, such as English. There has been research on this topic, such as systems that answer French Natural Language questions with English information [Grau et al., 2007]. In these works two noteworthy strategies are described; we will use the example of answering a Portuguese question with English information. The first strategy consists of translating the Portuguese question posed by the user into English, then analysing it with English NLP techniques and getting an answer using English resources. The advantage is the ability to analyse the question with English NLP techniques (which may be superior). The disadvantage is that the question may be mistranslated and no information will be extracted. The second strategy consists of analysing the Portuguese question with Portuguese NLP techniques and then translating the extracted information (e.g. the question type and medical entities) into English, using that information to query the English resources. The advantage is that the question is analysed in the original language, meaning it will not suffer from syntactic or semantic errors caused by the translation. The disadvantage is that the translated information may lose context, because each expression is translated one by one without the help of the original question. It would also be possible to run both methods in a hybrid fashion and use the one with better results.
Bibliography
A. R. Aronson. Effective mapping of biomedical text to the UMLS metathesaurus: the metamap program.
AMIA Annu Symp Proc, pages 17–21, 2001.
J. P. V. Bastos. Prontuário terapêutico: Medicine.ask (in Portuguese). Master's thesis, Instituto Superior Técnico, 2009.
A. Ben Abacha. Recherche de réponses précises à des questions médicales: le système de questions-réponses MEANS. PhD thesis, Université Paris-Sud, 2012.
A. Ben Abacha and P. Zweigenbaum. A hybrid approach for the extraction of semantic relations from
MEDLINE abstracts. Computational Linguistics and Intelligent Text Processing (CICLing), 6608:139–
150, 2011a.
A. Ben Abacha and P. Zweigenbaum. Medical entity recognition: A comparison of semantic and statistical methods. BioNLP, pages 56–64, 2011b.
A. Ben Abacha and P. Zweigenbaum. Automatic extraction of semantic relations between medical entities: a rule based approach. Journal of Biomedical Semantics, 2011c.
A. Ben Abacha and P. Zweigenbaum. Medical question answering: Translating medical questions into
SPARQL queries. ACM SIGHIT International Health Informatics Symposium (IHI 2012), 2012.
A. R. Chaves and L. H. Rino. The Mitkov Algorithm for Anaphora Resolution in Portuguese. Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language
(PROPOR '08), pages 51–60, 2008.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
D. Damljanovic, M. Agatonovic, and H. Cunningham. Natural language interfaces to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction. Extended Semantic
Web Conference (ESWC2010), June 2010.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977.
P. J. dos Reis Mota. LUP: A language understanding platform. Master's thesis, Instituto Superior
Técnico, 2012.
H. Galhardas, L. Coheur, and V. D. Mendes. Medicine.Ask: a Natural Language search system for
medicine information. INFORUM 2012 - Simpósio de Informática, September 2012.
B. Grau, A. Ligozat, I. Robba, A. Vilnat, M. Bagur, and K. Séjourné. The bilingual system MUSCLEF at QA@CLEF 2006. In Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, and Maximilian Stempfhuber, editors, Evaluation of Multilingual and Multi-modal Information Retrieval, volume 4730 of Lecture Notes in Computer Science, pages 454–462. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74998-1. doi: 10.1007/978-3-540-74999-8_54. URL http://dx.doi.org/10.1007/978-3-540-74999-8_54.
J. R. Hobbs. Resolving pronoun references. Lingua, 44:311–338, 1978.
C. Jacquemin. Spotting and discovering terms through natural language processing. The MIT Press,
2001.
E. Kaufmann and A. Bernstein. How useful are natural language interfaces to the semantic web for
casual end-users? European Semantic Web Conference (ESWC 2007), 2007.
A. Leuski and D. R. Traum. Practical language processing for virtual humans. Innovative Applications of Artificial Intelligence (IAAI-10), pages 1740–1747, 2010.
F. Li and H.V. Jagadish. Constructing an interactive natural language interface for relational databases.
Proceedings of the VLDB Endowment, 8, 2014.
B. Loni, G. van Tulder, P. Wiggers, D. M. J. Tax, and M. Loog. Question classification by weighted combination of lexical, syntactic and semantic features. In Text, Speech and Dialogue. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-23537-5.
N. Mamede, J. Baptista, C. Diniz, and V. Cabarrão. STRING: an hybrid statistical and rule-based Natural Language Processing Chain for Portuguese. PROPOR '12 (Demo Session), Coimbra, Portugal, 2012.
I. Marcelino, G. Dias, J. Casteleiro, and J. Martinez. Semi-controlled construction of the European Portuguese Unified Medical Language System. Workshop on Finding the Hidden Knowledge: Text Mining for Biology and Medicine (FTHK 2008), 2008.
J. S. Marques. Anaphora resolution in Portuguese: An hybrid approach. Master's thesis, Instituto
Superior Técnico, 2013.
V. D. Mendes. Medicine.ask: an extraction and search system for medicine information. Master's thesis,
Instituto Superior Técnico, 2011.
R. Mitkov. Anaphora Resolution. Pearson Prentice Hall, 2002.
G. Savova, J. Masanz, P. Ogren, J. Zheng, S. Sohn, K. Kipper-Schuler, and C. Chute. Mayo clinic clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association (JAMIA), 17:507–513, 2010.
G. F. Simões. e-txt2db: Giving structure to unstructured data. Master's thesis, Instituto Superior Técnico, 2009.
O. Uzuner, B. R. South, S. Shen, and S. L. Duvall. 2010 i2b2/va challenge on concepts, assertions, and
relations in clinical text. Journal of the American Medical Informatics Association (JAMIA), June 16
2011.
D. Zhang and W. S. Lee. Question classification using support vector machines. In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 26–32, 2003.
Appendix A
Questionnaire used to obtain test corpora
This appendix shows the questionnaire that was sent through Facebook to collect questions to test MedicineAsk. It starts with a brief explanation of MedicineAsk, as well as what was expected of the user. It is then followed by 12 scenarios for which the user should propose one or more question formulations to solve the scenario. Scenarios 5 and 6 were not used for any experiments. Scenario 9 was the scenario that encouraged users to use anaphora.
MedicineAsk
O MedicineAsk é um sistema que permite a pesquisa de informação sobre medicamentos em língua natural, isto é, a linguagem que se usa no dia a dia. Por exemplo, para saber quais os medicamentos para a febre, basta perguntar ao MedicineAsk: “Quais os medicamentos para a febre?”.
No quadro de uma tese de mestrado, precisamos da sua ajuda na recolha de diferentes formulações de perguntas sobre medicamentos. Para isso, gostaríamos que lesse os cenários apresentados abaixo e, de seguida, escrevesse uma ou mais perguntas que formularia ao MedicineAsk para resolver cada um dos cenários. As perguntas não necessitam de ser muito complexas, ou de ter muitos detalhes. Estimamos que necessite de 5 minutos para completar o questionário. Agradecemos desde já a sua participação.
Cenário 1
O João encontrou uma caixa de Efferalgan® sem o folheto informativo e gostaria de saber para que serve este medicamento. Sugira uma pergunta a submeter ao MedicineAsk para obter essa informação.
Cenário 2
Também relativamente ao medicamento Efferalgan®, o João gostava de saber que efeitos indesejáveis poderão ocorrer quando tomar esse medicamento. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 3
O João pretende saber quais os cuidados que deve ter antes de tomar o medicamento Efferalgan®. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 4
Foi receitada ao filho do João a substância activa mizolastina. O João pretende saber qual a dosagem indicada da mizolastina para crianças. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 5
O João pretende anotar a dosagem recomendada de Mizollen® e Efferalgan®, num bloco de notas em sua casa. Decide então fazer uma única pergunta para obter as dosagens recomendadas de ambos os medicamentos. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 6
O João quer ter a certeza de que os medicamentos que toma, Mizollen® e Efferalgan®, não têm interacções perigosas entre si. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 7
O João toma paracetamol, comprando marcas como o Panadol®. No entanto, deseja passar a tomar medicamentos genéricos da substância activa paracetamol. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 8
O João pretende saber quais são os medicamentos mais baratos da substância sinvastatina. Qual seria uma possível pergunta a fazer ao sistema, para responder a esta questão?
Cenário 9
O João pretende saber o preço de medicamentos com ácido acetilsalicílico. Sabendo que o João já fez uma pergunta sobre as indicações do ácido acetilsalicílico e já obteve a sua resposta, ambas mostradas em baixo, como formularia a pergunta sobre o preço dos medicamentos com ácido acetilsalicílico?
João: Quais as indicações do ácido acetilsalicílico?
MedicineAsk: Indicações: Dor ligeira a moderada; pirexia. Profilaxia secundária de acidentes cardio e cerebrovasculares isquémicos.
João:
Cenário 10
O João pediu ao seu médico que lhe receitasse medicamentos para a acne nodular que não interajam com a Vitamina A. Qual seria uma possível pergunta que o médico do João podia fazer ao sistema, para responder a esta questão?
Cenário 11
O João pediu ao seu médico para receitar um medicamento semelhante ao Mizollen® que não provoque sonolência como efeito secundário. Qual seria uma possível pergunta que o médico do João podia fazer ao sistema, para responder a esta questão?
Cenário 12
O João perguntou ao seu médico que medicamentos pode tomar para a hipertensão, que não exijam cuidados com a asma. Qual seria uma possível pergunta que o médico do João podia fazer ao sistema, para responder a esta questão?
Appendix B
Dictionary used to identify named
medical entities in a user question
(Excerpt)
This appendix shows an excerpt of the dictionary of named medical entities. Both the keyword spotting technique and SVM use this dictionary to find entities in user questions.
SUBSTANCIA_ACTIVA ALFACALCIDOL
SUBSTANCIA_ACTIVA PIRITIONA ZINCO
SUBSTANCIA_ACTIVA PIMECROLIMUS
SUBSTANCIA_ACTIVA TROPISSETROM
SUBSTANCIA_ACTIVA CIAMEMAZINA
SUBSTANCIA_ACTIVA FLUFENAZINA
SUBSTANCIA_ACTIVA ISONIAZIDA + PIRAZINAMIDA + RIFAMPICINA
SUBSTANCIA_ACTIVA CICLOBENZAPRINA
SUBSTANCIA_ACTIVA AZATIOPRINA
SUBSTANCIA_ACTIVA PRULIFLOXACINA
SUBSTANCIA_ACTIVA DARIFENACINA
SUBSTANCIA_ACTIVA PARACETAMOL
SUBSTANCIA_ACTIVA TERBINAFINA
SUBSTANCIA_ACTIVA NICERGOLINA
SUBSTANCIA_ACTIVA ACIDO ACETILSALICILICO + ACIDO ASCORBICO
SUBSTANCIA_ACTIVA LINCOMICINA
SUBSTANCIA_ACTIVA BETAMETASONA
SUBSTANCIA_ACTIVA MEBENDAZOL
SUBSTANCIA_ACTIVA SITAGLIPTINA
MEDICAMENTO Falcitrim
MEDICAMENTO Malarone
MEDICAMENTO Resochina
MEDICAMENTO Halfan
MEDICAMENTO Plaquinol
MEDICAMENTO Mephaquin Lactab
MEDICAMENTO Flagentyl
MEDICAMENTO Tavegyl
MEDICAMENTO Viternum
MEDICAMENTO Drenoflux
MEDICAMENTO Fenistil
MEDICAMENTO Atarax
MEDICAMENTO Primalan
MEDICAMENTO Tinset
MEDICAMENTO Fenergan
MEDICAMENTO Dinaxil
MEDICAMENTO Actifed
MEDICAMENTO Zyrtec
MEDICAMENTO Azomyr
MEDICAMENTO Aerius
CONDICAO_MEDICA acne nodular
CONDICAO_MEDICA sonolência
CONDICAO_MEDICA sonolencia
CONDICAO_MEDICA sono
CONDICAO_MEDICA hipertensão
CONDICAO_MEDICA hipertensao
CONDICAO_MEDICA hipertensor
CONDICAO_MEDICA tensão arterial
CONDICAO_MEDICA tensao arterial
CONDICAO_MEDICA antihipertensores
CONDICAO_MEDICA asmáticos
CONDICAO_MEDICA asmaticos
CONDICAO_MEDICA anti-hipertensores
Appendix C
Training Corpus B Excerpt
This appendix shows an excerpt of Training Corpus B. This is the training corpus that MedicineAsk currently uses to train SVM.
QT_INDICACOES Quais as indicações da NE_SUBSTANCIA_ACTIVA?
QT_INDICACOES As indicações da NE_SUBSTANCIA_ACTIVA são quais?
QT_INDICACOES O NE_SUBSTANCIA_ACTIVA é indicado para quê?
QT_INDICACOES Para que é indicado o NE_SUBSTANCIA_ACTIVA?
QT_INDICACOES O NE_SUBSTANCIA_ACTIVA é indicado em que casos?
QT_INDICACOES Para que serve NE_MEDICAMENTO?
QT_INDICACOES Qual o objectivo do NE_MEDICAMENTO?
QT_INDICACOES Para que serve o NE_MEDICAMENTO?
QT_INDICACOES Qual e a bula do NE_MEDICAMENTO?
QT_INDICACOES O que e NE_MEDICAMENTO e para que e utilizado?
QT_PRECO_BARATO Quais os medicamentos mais baratos da NE_SUBSTANCIA_ACTIVA?
QT_PRECO_BARATO Quais os medicamentos mais económicos da NE_SUBSTANCIA_ACTIVA?
QT_PRECO_BARATO Quais os medicamentos mais em conta da NE_SUBSTANCIA_ACTIVA?
QT_PRECO_BARATO Quais os medicamentos mais acessíveis da NE_SUBSTANCIA_ACTIVA?
QT_PRECO_BARATO Quais os medicamentos de preço mais baixo da NE_SUBSTANCIA_ACTIVA?
QT_MEDICAMENTO_SEM_PRECAUCOES Quais os medicamentos para o NE_CONDICAO_MEDICA que não exijam precauções com o NE_CONDICAO_MEDICA?
QT_MEDICAMENTO_SEM_PRECAUCOES Quais os medicamentos para o NE_CONDICAO_MEDICA que não tenham precauções com o NE_CONDICAO_MEDICA?
QT_MEDICAMENTO_SEM_PRECAUCOES Quais os medicamentos para o NE_CONDICAO_MEDICA que não exijam cuidados com o NE_CONDICAO_MEDICA?
QT_MEDICAMENTO_SEM_PRECAUCOES Quais os medicamentos que não exijam cuidados com o NE_CONDICAO_MEDICA para o NE_CONDICAO_MEDICA?
QT_MEDICAMENTO_SEM_PRECAUCOES Quais os medicamentos para o NE_CONDICAO_MEDICA que adequados em caso de NE_CONDICAO_MEDICA?
QT_MEDICAMENTO_GENERICOS Que medicamentos de marca generica possuem o NE_MEDICAMENTO como substancia activa?
QT_MEDICAMENTO_GENERICOS Qual o generico que substitui o NE_MEDICAMENTO?
QT_MEDICAMENTO_GENERICOS Quais as marcas de genericos de NE_MEDICAMENTO e que existem?
QT_MEDICAMENTO_GENERICOS Quais os genericos do NE_MEDICAMENTO?
QT_MEDICAMENTO_GENERICOS Quais sao os genericos do NE_MEDICAMENTO?