SaskNet: A Spreading Activation Based Semantic Network

Report on the Current State of Development

Brian Harrington
St. Cross College
[email protected]

Submitted in Partial Completion of the Requirements for Transfer to DPhil Status

Computing Laboratory
Oxford University
September 2006
Contents

1 Introduction                                                    1
  1.1 Format of This Paper                                        2

2 Parsing and Semantic Analysis                                   4
  2.1 Parsing                                                     4
    2.1.1 Choosing a Parser                                       5
    2.1.2 The Clark and Curran Parser                             5
  2.2 Discourse Representation Structures                         7
    2.2.1 CCG2Sem                                                 8

3 Semantic Networks                                              10
  3.1 A Semantic Network Definition                              10
  3.2 The SaskNet Semantic Network                               12
    3.2.1 Network Implementation                                 13
  3.3 Parser Filtering                                           17
    3.3.1 Clark and Curran Parser Filter                         18
    3.3.2 CCG2Sem Filter                                         21

4 Spreading Activation                                           25
  4.1 History of Spreading Activation                            26
  4.2 Spreading Activation in SaskNet                            27
  4.3 Information Integration                                    27
  4.4 Firing Algorithm                                           28
  4.5 Update Algorithm                                           30
  4.6 Cleanup Algorithm                                          34

5 Similar Projects                                               36
  5.1 Manually Constructed Networks                              36
  5.2 Automatically Constructed Networks                         37

6 Potential Uses                                                 39

7 Future Work                                                    41
  7.1 Timeline                                                   42
Chapter 1
Introduction
This paper will discuss the motivation for and development of the SaskNet
(Spreading Activation based Semantic Knowledge NETwork) project. The
goal of the SaskNet project is to develop a system which can automatically extract knowledge from natural language text, and build a large scale
semantic network based on that knowledge.
One of the fundamental ideas behind SaskNet is that more data means better results. To that end, we have focused on creating a system which can
process large and varied text corpora to extract a wide variety of information. SaskNet is designed to be run on virtually any English text, and while
building a system which is tuned to a particular type of information may
have benefits in terms of the quality of information extracted from a particular document, we feel designing SaskNet to be able to use larger and more
varied corpora will prove to be a more beneficial approach.
SaskNet creates a semantic network for a document by translating each
sentence into a network fragment which is then viewed as an update to the
document network. Once a network has been built for a complete document,
it is then used as an update to the larger knowledge network which represents
all the knowledge acquired by the system.
The merge algorithm used for both the sentence and document level updates
merges the smaller update network with the larger existing network. The algorithm uses spreading activation to determine the mappings between nodes
in the two networks. If a node in the update network refers to a semantic
object for which a node in the existing network already exists, the merge
algorithm attempts to map the nodes together.
A large-scale semantic network would be of great use to a wide variety
of projects ranging from traditional information retrieval and question answering systems to machine translation and artificial intelligence. Several
attempts have been made to develop networks similar to SaskNet, but so
far these attempts have met with only limited success. Some projects have
attempted to manually create networks, but this is a very slow and labour
intensive process. Automated network creation has been tried in the past,
but has mostly been abandoned due to the sheer computational difficulty of
the task. However, recent advances in wide coverage parsers, combined with
promising results with terascale corpora make the long sought after goal of
a large scale semantic knowledge network a viable possibility. The SaskNet
project aims to make that goal a reality.
1.1 Format of This Paper
This paper attempts to explain the design of SaskNet starting from theory
and working towards implementation. Each of the sections in Chapters 2-4 begins with an explanation and history of a concept, then proceeds to discuss
how the concept applies to SaskNet, and finally shows the particulars of how
that concept is implemented within the project.
Chapter 2 discusses how text is manipulated before it is input into the
SaskNet system. Essentially this chapter deals with the external programs
which SaskNet has incorporated into its framework. In particular it deals
with parsing and how text is parsed before it is put into the network; and
with Discourse Representation, and how using a tool to turn sentences into
discourse representation structures is beneficial to the SaskNet project.
Chapter 3 deals with the structure of the network itself. Beginning by attempting to reach a common definition for semantic networks, the chapter
then explains the structure of the SaskNet network and how it was implemented for this project.
Chapter 4 discusses spreading activation and its use in SaskNet. The chapter begins with a history of spreading activation, and attempts to provide
motivation for why it is a useful tool for this project. It then explains how
spreading activation is used in the update algorithm for information integration, and why this is such an important step in building the network.
Chapter 5 surveys various attempts to build large scale semantic networks.
Beginning with manually created networks which are often used today despite their limitations, the chapter then proceeds to discuss other projects
which are currently trying to build automated resources similar to SaskNet.
Chapter 6 attempts to provide motivation for the SaskNet project. It discusses which fields could benefit from a large scale semantic knowledge network and why the currently available networks are often insufficient for the
needs of certain projects.
Chapter 7 provides a timeline for future work on the project. It attempts
to give an overview of future work on the project and give a brief discussion
of what will be required in each major phase over the next two years.
Chapter 2
Parsing and Semantic Analysis
Manipulating natural language in its raw form is a very difficult task. Parsers
and semantic analysis tools allow us to work with the content of a document
on a semantic level. This simplifies the process of developing a semantic
network both computationally and algorithmically. To this end, SaskNet
employs a set of software tools to render the plain text into a discourse
representation structure, from which point it can turn the information into
a semantic network fragment with relative ease.
In order for SaskNet to create its semantic network fragment to represent
a sentence, it must know the constituent objects of the sentence and their
relations to one another. This information is very difficult to extract, and
even the best tools available are far from perfect. SaskNet has been designed
to use external tools for parsing and semantic analysis so that as these tools
improve, SaskNet can improve with them. It has also been designed to not
rely too heavily on any one tool so that if a better tool is developed, SaskNet
can use it to achieve the best performance possible.
2.1 Parsing
Before we can work with natural language text, we must first analyse and
manipulate it into a form that can be easily processed. Parsing converts
“plain” natural language text into a data structure which the system can
either use to build a semantic network fragment directly, or can use as the
input to a semantic analysis program.
2.1.1 Choosing a Parser
One of the major strengths of SaskNet lies in its ability to integrate large
amounts of differing information into its network. In order to exploit this
power, it is necessary for all stages of the system to be able to handle a wide
range of inputs, and process those inputs quickly.
The choice of which parser to use is therefore very important to the success
of SaskNet. For the entire system to perform well, the parser must be both
wide coverage and efficient. To this end, many parsers were considered such
as the Charniak Parser [Charniak, 2000], the Collins Parser [Collins, 1999]
and RASP [Briscoe and Carroll, 2002]. Eventually, speed considerations and its relational output made the Clark and Curran Parser [Clark and Curran, 2004] the obvious choice.
2.1.2 The Clark and Curran Parser
The Clark and Curran Parser is a wide coverage statistical parser based
on Combinatory Categorial Grammar (CCG) [Steedman, 2000], written in
C++. CCG is a lexicalised grammar formalism where each word in a sentence is paired with a lexical category, which defines how the word can
combine with adjacent words and word phrases. For example in Figure 2.1
the word likes is given the lexical category (S[dcl]\NP)/NP which means
that if it finds a noun phrase to its right (in this case pizza), it will combine
with it, and the new phrase (likes pizza) will be given the lexical category
S[dcl]\NP (Something which is looking for a noun phrase to its left in order
to become a declarative sentence). The lexical categories of words are combined using a small number of combinatory rules to produce a full derivation
for the sentence.
Figure 2.1: A simple CCG derivation using forward (>) and backward (<)
application.
CCG was designed to capture long range dependencies in syntactic phenomena such as coordination and extraction which are often entirely missed by
other parsers. Most parsers have a set window (number of words or characters) within which they can find dependencies, but dependencies in natural
language text can be arbitrarily far apart. Take for example the sentence
given in (2.1). We can easily move the dependency farther apart by adding
a clause such as in (2.2). It is easy to continue to add clauses to the sentence
to move the initial dependency farther apart as illustrated in (2.3).
The dog that Mary saw.                                          (2.1)

The dog that John said that Mary saw.                           (2.2)

The dog that John said that Ann thought that Mary saw.          (2.3)
As the dependencies move farther apart, most parsers have greater difficulty
in recognising them, and in many cases once they move farther apart than
the parser’s set context window, they cannot be found at all. CCG was
specifically designed to be able to capture dependencies regardless of the
intervening distance in the text, and thus the Clark and Curran parser is
able to extract these dependencies that most other parsers miss.
CCG is a lexicalised grammar, which means it only has a small number of combinatory rules for combining lexical categories; this allows the Clark and Curran Parser to use a supertagger to very efficiently assign lexical categories
to words in much the same way as standard taggers assign part of speech
tags. This results in a parser which is both efficient and robust [Clark and
Curran, 2004].
Aside from its speed, the other major advantage of using the Clark and Curran Parser for SaskNet is its relational output. While the other parsers we
evaluated only output phrase-tree structures, the Clark and Curran Parser
has the ability to output grammatical relations such as subjects and objects
of verbs and nominal modifiers [Bos et al., 2004] which allows for a much
simpler transition from parser output to semantic network.
The Clark and Curran parser also provides a named entity recogniser implemented as a separate program that combines its output with that of
the parser. This program is a sequence tagger which attaches entity tags
to words to label them as belonging to certain categories such as person,
location, organisation, date and monetary amount.
While several considerations caused the Clark and Curran parser to be chosen for use in SaskNet, the system has been constructed in such a way that
it is not heavily dependent on any particular parser or parsing algorithm. In
order to integrate a different parser into SaskNet, it would only be necessary
to design a new filter (see Section 3.3) for that parser.
2.2 Discourse Representation Structures
Discourse Representation Theory (DRT) takes a dynamic perspective of
natural language semantics [van Eijck and Kamp, 1997] where each new
sentence is viewed in terms of its contribution to an existing discourse. A
Discourse Representation Structure (DRS) is a formalised representation of
all of the information available at a given point in a discourse. New sentences
in the discourse are viewed as updates to the structure [Kamp and Reyle,
1993]. DRT was initially designed to solve the problem of unbound anaphora
[van Eijck, 2005], and is particularly useful for establishing links between
pronouns and their antecedents. DRT has expressive power equivalent to first-order logic.
When interpreting a sentence as a DRS a discourse referent (essentially a
free variable) is created whenever an indefinite noun phrase (e.g., a dog,
someone, a car of mine) is encountered. Definite noun phrases (e.g., this
dog, him, my car) are always linked to existing discourse referents. For example[1], when processing the discourse in (2.4), the first sentence creates two discourse referents, x and y, referring to the woman and the room respectively. Then three conditions are created: woman(x), room(y) and entered(x,y). This produces the DRS seen in (2.5).
A woman entered the room. She smiled.                           (2.4)

(x,y)(woman(x),room(y),entered(x,y))                            (2.5)
When interpreted, the second sentence in discourse (2.4) creates the DRS seen in (2.6). However, when it is processed as an update to (2.5), it also produces a link between the variable z, assigned to the pronoun she, and the variable x which represents the antecedent of that pronoun, thus producing the updated DRS (2.7).

(z)(smiled(z))                                                  (2.6)

(x,y,z)(woman(x),room(y),entered(x,y),z=x,smiled(z))            (2.7)
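For readers more used to the graphical presentation of DRT, (2.7) can also be drawn in the standard box notation of Kamp and Reyle, with the discourse referents in the top compartment and the conditions below. The following is a sketch using a plain LaTeX array rather than a dedicated DRS package:

% Sketch of DRS (2.7) in box notation: referents on top, conditions below.
\[
\begin{array}{|l|}
\hline
x \quad y \quad z \\
\hline
\mathit{woman}(x) \\
\mathit{room}(y) \\
\mathit{entered}(x,y) \\
z = x \\
\mathit{smiled}(z) \\
\hline
\end{array}
\]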
Kamp [1981] and Kamp and Reyle [1993] give a full set of rules for creating
DRSs in formal detail, covering constraints on anaphoric linking, nested DRS
structures, and special case DRS linking such as implication and disjunction.
[1] This example has been simplified for clarity as well as to be more directly consistent with the type of DRS used by CCG2Sem (see Section 2.2.1), and thus does not represent all of the details of the DRS formalism as presented in Kamp and Reyle [1993].
2.2.1 CCG2Sem
Discourse representation theory is particularly useful to the construction of
SaskNet because it builds on similar principles of interpreting new information within the context of the current knowledge base (see Section 4.5). In
order to leverage the power of DRT, SaskNet uses the CCG2Sem program
[Bos, 2005].
CCG2Sem is a Prolog program which uses the output of the Clark and
Curran parser (see Section 2.1.2) to construct semantic derivations based on
DRS structures [Bos et al., 2004]. Essentially this program in conjunction
with the Clark and Curran parser allows seamless transition from natural
language text into a DRS, complete with nested structure, entity recognition,
and some limited anaphoric pronoun resolution. Some example output of
the program can be seen in Figure 2.2.
Representing the sentence as a DRS is ideal for SaskNet for several reasons.
The DRS structure very closely mirrors the semantic network structure used
in SaskNet, with discourse referents being roughly equivalent to object nodes
and the semantic relations being analogous to either node labels or relations
(see Section 3.2.1).
Figure 2.2: Example output in Prolog format and pretty print for the sentences Pierre Vinken, 61 years old, will join the board as a nonexecutive
director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch
publishing group.
Chapter 3
Semantic Networks
A semantic network can loosely be defined as any graphical representation
of knowledge using nodes to represent semantic objects and arcs to represent
relationships between objects. Semantic networks have been used in philosophy since at least the 3rd century AD, and computer implementations have been in use for over 45 years [Masterman, 1962]; a wide variety of formalisms have used the name semantic network [Sowa, 1992].
3.1 A Semantic Network Definition
For the purposes of this thesis, we will wish to posit certain requirements for
what we will consider as a “proper” semantic network. Primarily, we will
require that the relations in the network be labelled and directed. This is to
distinguish semantic networks from what we will call associative networks
which connect concepts based simply on the existence of a relationship, without regard to the relationship’s nature or direction (see Figure 3.1). While
associative networks are technically a type of semantic network, and are
quite often used because they can easily be extracted from co-occurrence
statistics [Church and Hanks, 1990], for our purposes their lack of power
and expressiveness will discount them from consideration.
The second requirement we shall impose upon semantic networks is that they
be structurally unambiguous. It should not be possible for two semantically
different ideas or discourses to be encoded in identical network structures.
Thus, even though the semantically different ideas of John using a telescope
to see a man and John seeing a man carrying a telescope can be encoded in
the same English sentence John saw the man with the telescope, when that
sentence is translated into a semantic network, the structure of the network must uniquely identify one of the two interpretations (see Figure 3.2).

Figure 3.1: An example of an associative network. Objects and concepts are linked without distinction for type or direction of link.
Figure 3.2: Semantic network representations of the two parses of “John saw the man with the telescope”: (a) John used the telescope to see the man; (b) John saw the man carrying the telescope.
Semantic networks may still contain lexical ambiguity through having ambiguous words used as labels on nodes and arcs. For example, in Figure 3.3 it is impossible to tell whether bank refers to a financial institution or the edge of a river. It is theoretically possible to remove lexical ambiguity from a semantic network by forcing each node to be assigned to a particular sense of the word(s) in its label; however, word sense disambiguation is a very difficult task and there is no complete solution currently available.

Figure 3.3: A lexically ambiguous network

The third and final requirement we will make for semantic networks is that
they must be able to accommodate the complex structures regularly found
in natural language speech. In particular we will require that the network allow relations between complex concepts which may themselves contain many
concepts and relations. This is to distinguish proper semantic networks from
what we will call atomic networks which only allow simple nodes representing a single concept. These networks can only accommodate a limited type
of information, and thus we will not include them in our definition of semantic networks.
3.2 The SaskNet Semantic Network
The semantic network formalism developed for SaskNet meets all of the
criteria we have set out for consideration as a “proper” semantic network,
and also has a few extra features that make it particularly well suited to the
SaskNet project. We will first explain how the criteria are met, and then
briefly describe the extra features that have been added.
SaskNet trivially meets the first criterion by its design. All relations in
SaskNet are labelled and directed with a defined agent and target node, of
which at least one must be present before a relation can be added to the
network.
The second criterion is taken care of for us by the parser. One of the primary functions of a parser is to select one parse from the list of possible parses for
a natural language sentence. Since no information is discarded in the translation from the parser output to the network creation, we maintain a single
parse and thus are left without any of the original structural ambiguity.
The third criterion is met by the hierarchical structure of the network. This
allows complex concepts and even entire discourses to be treated as single
objects. As we see in Figure 3.9, complex objects can be built up from
smaller objects and their relations. The hierarchical structure is unrestrictive, and thus it is possible for any pair of nodes to have a relation connecting
them, or for a single node to be a constituent of multiple complex nodes.
In Figure 3.9 we can also see the attribute nodes (denoted by ellipses). Any object or relation can have multiple attribute nodes, and attribute nodes can also be complex nodes.

Figure 3.4: A Hierarchical Semantic Network
One additional, but very important feature of SaskNet’s semantic network
is that every link in the network is assigned a value between 0 and 1. This
value represents the confidence of the link. This value can be determined by
various means such as the confidence we have in the source of our information, or the number of different sources which have confirmed a particular
relation. In practice, the value (or weight) of a link is set by the merge algorithm (see Section 4.5). The weight is assigned to the link rather than to the
relations for two reasons: primarily it allows us to assign a different weight
to each direction of a relation, and secondly it allows us to use weighted
links to attach attribute nodes as well as relation nodes.
3.2.1 Network Implementation
SaskNet’s internal semantic network is implemented in Java. It is designed to allow maximum flexibility in node type and hierarchy. A class diagram[1] for the network is given in Figure 3.5.

[1] This is a simplified class diagram and contains only a portion of the total class information. Much of the detail has been left off to increase the saliency of more important features.

Figure 3.5: Class diagram of SaskNet’s semantic network architecture

In this section we will explore the details of each of the classes of the network, explaining the functionality and design decisions of each class and how it
interacts with the other parts of the network.
SemNode
SemNodes come in four distinct types (ObjNode, RelNode, AttNode and ParentNode). Each node type has a distinct activation threshold, but all of
the node types are implemented almost identically. The primary difference
between the node types is the way they are treated by the SemNet class.
• ObjNodes represent atomic semantic objects. They have a special field
called neType which is set if the named entity recogniser (see Section
2.1.2) provided it with a label. For example, if a node represents a
person, its neType field would be set to “per”.
• RelNodes represent relations between objects or concepts. While some
semantic networks simply label their links with the names of relations,
SaskNet uses fully implemented nodes for this purpose, primarily so that relations themselves can have attributes and adjustable firing potentials, and also so that a single relation can have more than one label. All RelNodes have an agent and a target link which provide the direction of the relation; at least one of these links must be instantiated to ensure that all relations have a direction.
• AttNodes represent attributes of an object or concept. They are essentially simplifications of a class of node/relation pairs used to make
the network more intuitive.
• ParentNodes represent complex semantic concepts made up of two or
more nodes. All of the members of the complex concept are labelled as
the concept’s children and each node has a link to any ParentNodes of
which it is a member. ParentNodes are often vacuous parents, which
means that they are unlabelled and provide no additional information
beyond the grouping of their constituent nodes.
All nodes have a unique id assigned to them at the time of their creation
which indicates the document from which they originated. An array of labels
allows many labels to be added to a single node, which is necessary as the
same concept is often referred to in a variety of ways. Each node has arrays of links to the nodes with which it is connected; these are stored based on the type of node linked. Finally, all nodes contain a link to the monitor
processes for their network so that they can easily report their status and
events such as firing.
SemNodes can receive a pulse of activation from another node or from the network; this increases the potential variable. If this causes it to exceed the firingLevel, then the node sends a request to SemFire to be fired (see
Section 4.4 for more details). Nodes can also be merged; merging copies all
of one node’s links and labels to another and then deletes the first node.
Deleting a node sends messages to all connected nodes to delete the appropriate links from both ends so that no “dead” links can exist in the network.
Each node also has a print method based on its node type. This allows
proper output of SemNet’s printNetwork() method.
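As a rough illustration of the pulse-and-fire behaviour described above, consider the following sketch. The field names (potential, firingLevel) follow the description in this section, but the code itself is illustrative rather than taken from the actual SaskNet source.

// Illustrative sketch of a node's pulse/fire behaviour; not the actual SaskNet code.
class SemNodeSketch {
    double potential = 0.0;    // accumulated activation
    double firingLevel = 1.0;  // threshold, which varies by node type
    SemFireSketch semFire;     // shared firing controller for the network

    // Receive a pulse of activation; request to fire once the threshold is exceeded.
    void pulse(double activation) {
        potential += activation;
        if (potential > firingLevel) {
            semFire.requestFire(this);
        }
    }
}

class SemFireSketch {
    void requestFire(SemNodeSketch node) {
        // Queue the request; firing order is decided by the firing algorithm (Section 4.4).
    }
}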
SemLink
SemLinks form the links between nodes. Each link is assigned a strength when it is created; this represents the certainty of the link (i.e., how confident
the system is that this link exists in the real world). This can be increased
or decreased by the network as more information is gained.
SemNet
The SemNet class is the interface into the semantic network. All of the
functionality of the network is available through SemNet’s methods. SemNet
is used to add, remove, retrieve and manipulate nodes. It also indexes the
nodes and contains the update algorithm (see Section 4.5).
SemNet must be able to retrieve nodes based on both their unique ID and
their label. Since the same label may be used in many nodes, this is achieved
with a pair of hashtables. The first hashtable maps a string into a list of the
IDs of all nodes which have that string as one of their labels. The second
hashtable maps an ID to its corresponding node. The combination of these
two hashtables allows SemNet to retrieve nodes based on either their label
or their ID.
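A minimal sketch of this double-index lookup follows; the field and class names are illustrative, and the actual SaskNet implementation may differ.

import java.util.*;

// Sketch of SemNet's two-hashtable node retrieval; names are illustrative.
class SemNetIndexSketch {
    Map<String, List<String>> labelToIds = new HashMap<>(); // label -> IDs of nodes with that label
    Map<String, Object> idToNode = new HashMap<>();         // unique ID -> node object

    // Retrieve all nodes carrying a given label by chaining the two indexes.
    List<Object> getByLabel(String label) {
        List<Object> result = new ArrayList<>();
        for (String id : labelToIds.getOrDefault(label, Collections.<String>emptyList())) {
            result.add(idToNode.get(id));
        }
        return result;
    }
}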
SemNet’s print() method prints the contents of the network in GraphViz
[Gansner and North, 2000] format so that the graph can be visualised for
simple debugging. This is done by calling the print() method of every node
in the network. Each node then prints out its own details and the details of
all of its links.
SemMonitor
SemMonitor receives status reports from every node in the network; this can be used for debugging purposes, but it is also used to track which nodes fired in a given sequence of activation. Every node has a link to the SemMonitor object for the network and is required to notify SemMonitor every time it fires.
SemFire
SemFire is structured similarly to SemMonitor in that every node in the
network contains a link to the single SemFire object. When a node wishes
to fire, it notifies SemFire. SemFire keeps a list of all requests and permits
nodes to fire in an order specified by the firing algorithm (see Section 4.4).
3.3 Parser Filtering
The output of SaskNet’s parsing and analysis tools must be manipulated
in such a way as to turn each sentence’s representation into the form of a
semantic network update which can be used by the update algorithm (see
Section 4.5). The module which performs this data manipulation is called
the parser filter. The parser filter is designed in a modular fashion so that
when one of the parsing and analysis tools changes, the parser filter can be
easily replaced or altered without affecting the rest of SaskNet. Two parser
filters have been developed for SaskNet, the first to filter the output of the
Clark and Curran parser, and the second to filter the output of CCG2Sem.
3.3.1 Clark and Curran Parser Filter
The first filter developed for the system was designed to work with the
Clark and Curran parser’s first-order semantic relations output. The filter is
essentially a set of rules mapping relations output by the parser to network features. For example, as can be seen in Table 3.1, upon encountering the
output vmod(word1, word2), the filter turns the node for word2 into an
attribute for the relational node word1 (if either of the nodes do not exist
they are created; if the node for word1 is not already a relNode it is turned
into one).
Parser Output           Rule
comp(word1, word2)      Merge Node1 and Node2
vmod(word1, word2)      Node2 becomes attNode
                        Node1 becomes relNode
                        Node2 becomes attribute of Node1
                        Parents of Node2 become parents of Node1
ncsubj(word1, word2)    Node1 becomes relNode
                        Subject link of Node1 set to Node2
dobj(word1, word2)      Node1 becomes relNode
                        Object link of Node1 points to Node2

Table 3.1: A sample of the rules used by the Clark and Curran parser filter
Some of the rules require some complexity to ensure that links are preserved
especially between parental nodes during the application of various rules.
There are also a few “ad hoc” rules created to deal properly with phenomena
such as conjunctions and disjunctions. The order in which rules are applied also greatly affects the performance of this filter.
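To make the rule format concrete, the following sketch shows how the vmod rule from Table 3.1 might be applied. The Node class and its fields are assumptions made for the sketch, not the actual SaskNet classes.

import java.util.*;

// Illustrative sketch of applying the vmod(word1, word2) rule from Table 3.1.
class CandCFilterSketch {
    static class Node {
        String label;
        boolean isRelNode, isAttNode;
        List<Node> attributes = new ArrayList<>();
        Set<Node> parents = new HashSet<>();
        Node(String label) { this.label = label; }
    }

    Map<String, Node> nodes = new HashMap<>();

    void applyVmod(String word1, String word2) {
        Node n1 = nodes.computeIfAbsent(word1, Node::new); // create nodes if missing
        Node n2 = nodes.computeIfAbsent(word2, Node::new);
        n1.isRelNode = true;           // word1's node is turned into a relNode
        n2.isAttNode = true;           // word2's node becomes an attNode
        n1.attributes.add(n2);         // Node2 becomes an attribute of Node1
        n1.parents.addAll(n2.parents); // parents of Node2 become parents of Node1
    }
}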
The Clark and Curran Parser Filter is no longer used in SaskNet, but it
is a good example of the type of filter that would need to be created if we
chose to change the parsing and data analysis tools. The first-order semantic
output of the Clark and Curran parser is radically different from the output
of CCG2Sem which is currently used, but with the creation of a simple filter,
it can be fully integrated into SaskNet with little difficulty.
nmod(Vinken_2, Pierre_1)
nmod(years_5, 61_4)
comp(old_6, years_5)
ncsubj(old_6, Vinken_2)
detmod(board_11, the_10)
dobj(join_9, board_11)
nmod(director_15, nonexecutive_14)
detmod(director_15, a_13)
comp(as_12, director_15)
vmod(join_9, as_12)
comp(Nov._16, 29_17)
vmod(join_9, Nov._16)
xcomp(will_8, join_9)
ncsubj(will_8, Vinken_2)
ncsubj(join_9, Vinken_2)
<c> Pierre|NNP|N/N Vinken|NNP|N ,|,|, 61|CD|N/N years|NNS|N old|JJ|(S[adj]\NP)\NP
,|,|, will|MD|(S[dcl]\NP)/(S[b]\NP) join|VB|(S[b]\NP)/NP the|DT|NP[nb]/N
board|NN|N as|IN|((S\NP)\(S\NP))/NP a|DT|NP[nb]/N nonexecutive|JJ|N/N
director|NN|N Nov.|NNP|((S\NP)\(S\NP))/N[num] 29|CD|N[num] .|.|.
nmod(Vinken_2, Mr._1)
nmod(N.V._7, Elsevier_6)
nmod(group_12, publishing_11)
nmod(group_12, Dutch_10)
detmod(group_12, the_9)
conj(,_8, group_12)
conj(,_8, N.V._7)
comp(of_5, group_12)
comp(of_5, N.V._7)
nmod(chairman_4, of_5)
dobj(is_3, chairman_4)
ncsubj(is_3, Vinken_2)
<c> Mr.|NNP|N/N Vinken|NNP|N is|VBZ|(S[dcl]\NP)/NP chairman|NN|N
of|IN|(NP\NP)/NP Elsevier|NNP|N/N N.V.|NNP|N ,|,|, the|DT|NP[nb]/N
Dutch|NNP|N/N publishing|VBG|N/N
group|NN|N .|.|.
Figure 3.6: Sample output from the Clark and Curran Parser for the text “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group”.
Figure 3.7: The Clark and Curran parser filter output for the input given
in Figure 3.6
3.3.2 CCG2Sem Filter
The CCG2Sem filter takes advantage of the recursive nature of CCG2Sem’s Prolog output. The program is written recursively, handling one predicate at a time and continually calling itself on any sub-predicates.
Like the Clark and Curran parser filter, the CCG2Sem filter is essentially a set of rules mapping predicates to network fragments; however, with the output of CCG2Sem, the predicates are nested recursively, so the filter must deal with them recursively. Table 3.2 shows the rules for a number of CCG2Sem’s Prolog predicates.
Prolog Predicate        Rule
drs(A[ ],B)             Create one node for each of the discourse referents in A
                        Recursively call filter on B
prop(x,B)               Recursively call filter on B
                        Set x as the parent node for network fragment created by B
named(x, text, type)    Set x to named entity type type
                        Give node x label text
pred(text, x)           Give node x label text
pred('event', x)        Set x to type relNode
pred(text,[x,y])        Create relNode z with label text
                        Set subject link of z to x
                        Set object link of z to y
pred('agent',[x,y])     Set agent link of y to x
eq(x,y)                 Create relNode z with label is
                        Set subject link of z to x
                        Set object link of z to y
or(A,B)                 Create parentNode x with label or
                        Create unlabeled parentNode y
                        Create unlabeled parentNode z
                        Set x as parent of y and z
                        Recursively call filter on A
                        Set y as the parent node for network fragment created by A
                        Recursively call filter on B
                        Set z as the parent node for network fragment created by B

Table 3.2: A sample of the rules used by the CCG2Sem filter. Capital letters represent Prolog statements, lower case letters represent Prolog variables.
Several of the rules used by the CCG2Sem filter are context sensitive (i.e.,
if a predicate tries to label a node which is in one of its parent nodes, it
is treated as an attribute instead). There are also a number of “special case” rules, such as those shown where the predicate was either ‘agent’ or ‘event’.
The CCG2Sem filter continues calling itself recursively, creating networks
within networks (this results in the hierarchical nature of the network) until
it has processed the entire Prolog DRS structure and we are left with a semantic network which represents all of the information in the discourse.
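The recursive shape of the filter can be sketched as follows. The Pred class is an assumed in-memory representation of a parsed Prolog term, and only two of the rules from Table 3.2 are shown; this is an illustration of the recursion, not the actual implementation.

import java.util.*;

// Simplified sketch of the recursive CCG2Sem filter.
class Ccg2SemFilterSketch {
    static class Pred {
        String functor;                         // e.g. "drs", "named", "pred"
        List<Object> args = new ArrayList<>();  // referents, text, or nested Preds
    }

    void filter(Pred p) {
        if (p.functor.equals("drs")) {
            // Create one node per discourse referent, then recurse on each condition.
            for (Object arg : p.args) {
                if (arg instanceof Pred) {
                    filter((Pred) arg);
                } else {
                    createNode(arg.toString());
                }
            }
        } else if (p.functor.equals("named")) {
            // named(x, text, type): label node x and set its named entity type.
            addLabel(p.args.get(0).toString(), p.args.get(1).toString());
            setNeType(p.args.get(0).toString(), p.args.get(2).toString());
        }
        // ... the remaining rules from Table 3.2 are handled analogously ...
    }

    void createNode(String id) { /* add a node to the network fragment */ }
    void addLabel(String id, String label) { /* attach a label to a node */ }
    void setNeType(String id, String type) { /* record the named entity type */ }
}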
smerge(
drs(
[[1001, 1002]:x0, [1004, 1005]:x1, [1006]:x2, [1010]:x3, [1009]:x4,
[1013]:x5, [1016]:x6],
[ [2001]:named(x0, mr, ttl),
[1002, 2002]:named(x0, vinken, per),
[1001]:named(x0, pierre, per),
[1004]:card(x1, 61, ge),
[1005]:pred(year, [x1]),
[1006]:prop(x2, drs([], [[1006]:pred(old, [x0])])),
[1006]:pred(rel, [x2, x1]),
[]:pred(event, [x2]),
[1011]:pred(board, [x3]),
[1016, 1017]:timex(x6, date([]:’XXXX’, [1016]:’11’, [1017]:’29’)),
[1009]:pred(join, [x4]),
[1009]:pred(agent, [x4, x0]),
[1009]:pred(patient, [x4, x3]),
[1014]:pred(nonexecutive, [x5]),
[1015]:pred(director, [x5]),
[1012]:pred(as, [x4, x5]),
[1016]:pred(rel, [x4, x6]),
[]:pred(event, [x4])]),
drs(
[[2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]:x7,
[2006, 2007]:x8, [2009]:x9, [2011]:x10, [2003]:x11],
[ [2004]:pred(chairman, [x7]),
[2006, 2007]:named(x8, elsevier_nv, loc),
[2005]:pred(of, [x7, x8]),
[2010]:pred(dutch, [x9]),
[2011]:pred(publishing, [x10]),
[]:pred(nn, [x10, x9]),
[2012]:pred(group, [x9]),
[2005]:pred(of, [x7, x9]),
[2003]:prop(x11, drs([], [[2003]:eq(x0, x7)])),
[]:pred(event, [x11])
]))
).
Figure 3.8: Sample output from CCG2Sem for the text “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group”.
Figure 3.9: CCG2Sem filter output for the input given in Figure 3.8
Chapter 4
Spreading Activation
Spreading activation is a common feature in connectionist models of knowledge and reasoning, and in particular is usually connected with the neural
network paradigm. Spreading activation in neural networks is the process
by which activation can spread from one node in the network to all adjacent
nodes in a similar manner to the firing of a neurone in the human brain.
Nodes in a spreading activation neural network receive activation from their
surrounding nodes, and if the total amount of accumulated activation exceeds some threshold, that node then fires, sending its activation to all nodes
to which it is connected. The amount of activation sent between any two
nodes is proportional to the strength of the link between those nodes with
respect to the strength of all other links connected to the firing node. A
simple activation function is given in (4.1).
\[
\text{activation}_{i,j} \;=\; \alpha \cdot \frac{\text{weight}_{i,j}}{\sum_{k=1}^{j-1} \text{weight}_{i,k} \;+\; \sum_{k=j+1}^{n_i} \text{weight}_{i,k}} \tag{4.1}
\]

Symbol Definitions
  α                  Firing variable which fluctuates depending on node types
  activation_{x,y}   Amount of activation sent from node x to node y when node x fires
  weight_{x,y}       Strength of link between node x and node y
  n_x                Total number of nodes connected to node x
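The function in (4.1) is straightforward to compute from a node's link weights. The following sketch is illustrative only; the class, method, and parameter names are not taken from the SaskNet source.

import java.util.List;

// Illustrative sketch of the activation function in (4.1).
class ActivationSketch {
    // Activation sent along link j when a node fires: the firing variable alpha,
    // scaled by link j's weight relative to the total weight of all other links.
    static double activation(double alpha, List<Double> weights, int j) {
        double otherWeights = 0.0;
        for (int k = 0; k < weights.size(); k++) {
            if (k != j) {
                otherWeights += weights.get(k);
            }
        }
        // Assumes the node has at least one other link with non-zero weight.
        return alpha * weights.get(j) / otherWeights;
    }
}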
4.1 History of Spreading Activation
The discovery that human memory is organised semantically and that concepts which are semantically related can excite one another came from the
field of psycholinguistics. Meyer and Schvaneveldt [Meyer and Schvaneveldt,
1971] showed that when participants were asked to classify pairs of words,
having a pair of words which were semantically related increased both the
speed and the accuracy of the classification. They hypothesised that when
one word is retrieved from memory this causes other semantically related
words to be primed and thus retrieval of those words will be facilitated.
The formal theory of spreading activation as we know it can be traced back
to the work of Quillian [Quillian, 1969] who proposed a formal model for
spreading activation in a semantic network. This early theory was little
more than a marker passing method where the connection between any two nodes was found by passing markers to all adjacent nodes until two markers met, similar to a breadth-first search.
It was the work of Collins and Loftus [Collins and Loftus, 1975] that added the main features of what we today consider spreading activation, such as signal attenuation, summation of activation from input nodes, and firing thresholds.
Despite the obvious theoretical advantages of Collins and Loftus’ model, due to computational constraints work carried out under the title of “spreading activation” has very rarely used the full model. Many researchers used a simplified marker passing model [Hirst, 1987], or used a smaller or simplified network, because the manual creation of a semantic network (or at least one that fits our definition from Section 3.1) was too time consuming ([Crestani, 1997], [Preece, 1981]).
The application of spreading activation to information retrieval gained a
great deal of support in the 1980s and early 1990s (Salton and Buckley [1988], Kjeldsen and Cohen [1988]); however, the difficulty of manually creating networks, combined with the computational intractability of automatically creating networks, caused most researchers to abandon this course [Preece, 1981].
Recent improvements in automatic network creation have brought hope for
further research in spreading activation theory, and we hope that SaskNet
will not only utilise spreading activation in its creation, but also be a testbed
for future work in spreading activation research.
4.2 Spreading Activation in SaskNet
The semantic network created for SaskNet has been designed specifically for
use with spreading activation. Each node maintains its own activation level
and threshold, and can independently send activation to all surrounding
nodes. A monitor process controls the activation and records the order and
frequency of node firing.
Each of the various types of nodes (object, relation, parent, attribute, etc.)
can have its own firing threshold and even its own firing algorithm. Each
node type has a global signal attenuation value that controls the percentage
of the activation that a node of this type passes on to each of its neighbours
when firing.
Spreading activation is by nature a parallel process; however, it is implemented sequentially in SaskNet for purely computational reasons. While
future work may allow parallelisation of the algorithm, the current system
has been designed to ensure that the sequential nature of the processing does
not adversely affect the outcome. Two separate implementations of the firing algorithm have been created. The first is a pulsing algorithm where each
node which is prepared to fire at any given stage fires and the activation is
suspended until all nodes have finished firing. This is analogous to having
the nodes fire simultaneously on set pulses of time. The second implementation of the firing algorithm uses a priority queue to allow the nodes with the
greatest amount of activation to fire first (for more detailed information see Section 4.4). The second algorithm is more analogous to the asynchronous
firing of neurones in the human brain, however both implementations have
been fully implemented and the user can choose which firing method they
wish the system to use.
4.3 Information Integration
The power of SaskNet comes from its ability to integrate information from
various sources into a single cohesive representation. This is the main goal
of the update algorithm (see Section 4.5).
Information integration allows a system to take information about a concept
or object from two or more sources, and combine that information in such
a way that new deductions can be made. For example the famous syllogism
below is only possible if we know that men in the major premise and man
in the minor premise are referring to the same class of object (i.e., that we
have resolved the two labels men and man into a single concept).
All men are mortal.
Socrates is a man.
Therefore Socrates is mortal.
To further illustrate the point, consider what happens when such an integration takes place erroneously.
All beetles are insects.
Paul McCartney is a Beatle.
Therefore Paul McCartney is an insect.
This example breaks down because we incorrectly integrated the two premises by aligning beetles and Beatle which, despite being very similar in spelling and identical in pronunciation, are semantically distinct concepts.
Most of the research on information integration has been done in the database
paradigm, using string similarity measurements to align database fields
[Bilenko et al., 2003]. Research done on natural language information integration has mostly centred on document clustering based on attributes
gained from pattern matching [Wan et al., 2005].
One particularly interesting line of research is the work of Guha and Garg
[Guha and Garg, 2004]. They propose a search engine which clusters document results relating to a particular person. The proposed methodology is to create binary first-order logic predicates (e.g., first name(x,Bob),
works for(x,IBM)) which can be treated as attributes for a person, and then
using those attributes to cluster documents about one particular individual.
This amounts to a simplified version of the problem SaskNet attempts to
solve, using a simplified network, and limiting the domain to personal information; the results, however, are promising.
4.4 Firing Algorithm
Ultimately, each node in a neural network should act independently, firing whenever it receives the appropriate amount of activation. This asynchronous communication between nodes is more directly analogous to the
workings of the human brain, and most spreading activation theories assume
a completely asynchronous model.
In practice, it is difficult to have all nodes operating in parallel. SaskNet
attempts to emulate an asynchronous network through its firing algorithm.
Each network has a SemFire object (see Section 3.2.1 for class diagram)
which controls the firing of the nodes in that network.
When a node in the network is prepared to fire, it sends a firing request
to the SemFire object. The SemFire object then holds the request until the
appropriate time before sending a firing permission message to the node
to allow it to fire.
Two separate firing algorithms have been implemented in SaskNet.
Pulse Firing
The pulse firing algorithm emulates a network where all nodes fire simultaneously at a given epoch of time. Each node that is prepared to fire at
a given time fires, and the system waits until all nodes have fired and all
activation levels have been calculated before beginning the next firing round.
To implement this algorithm, the SemFire object retains two lists of requests. The first is the list of firing requests which will be fulfilled on this pulse; we will call this list the pulse list. The second list contains all requests made during the current pulse; we will call this the wait list. The SemFire object fires all of the nodes with requests in the pulse list, removing a request once it has been fulfilled (in this algorithm the order of firing is irrelevant), while placing all firing requests it receives into the wait list. Once the pulse list is empty and all requests from the current pulse have been collected in the wait list, the SemFire object simply moves all requests from the wait list into the pulse list, and is then ready for the next pulse.
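A minimal sketch of this two-list mechanism follows; the names are illustrative, and a Runnable firing action stands in for a node firing.

import java.util.*;

// Sketch of the pulse firing algorithm described above.
class PulseFireSketch {
    private List<Runnable> pulseList = new ArrayList<>(); // requests fulfilled this pulse
    private List<Runnable> waitList = new ArrayList<>();  // requests made during this pulse

    // Nodes call this when their potential exceeds their firing level.
    void requestFire(Runnable fireAction) {
        waitList.add(fireAction);
    }

    // Fire everything queued for this pulse; new requests collect in the wait list.
    void runPulse() {
        for (Runnable fire : pulseList) {
            fire.run(); // firing may cause other nodes to call requestFire()
        }
        pulseList = waitList;          // promote the waiting requests
        waitList = new ArrayList<>();  // and start collecting the next pulse
    }
}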
Priority Firing
The priority firing algorithm emulates a network where the amount of activation received by a node dictates the speed with which the node fires.
Nodes receiving higher amounts of activation will fire faster than nodes
which receive just enough to meet their firing threshold.
To implement this algorithm, the SemFire object retains a priority queue of
requests, where each request is assigned a priority based on the amount of
activation it received over its activation threshold (4.2). The SemFire object
fulfills the highest priority request; if a new request is received while the
first request is being processed, it is added to the queue immediately.
\[
\text{priority} \;=\; \alpha \cdot (\text{activation received} - \text{firing level}) \tag{4.2}
\]

Symbol Definitions
  α   Node type variable
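A sketch of the priority variant using a standard priority queue, with the priority computed as in (4.2); the names are again illustrative.

import java.util.PriorityQueue;

// Sketch of the priority firing algorithm.
class PriorityFireSketch {
    static class Request implements Comparable<Request> {
        final Runnable fireAction;
        final double priority; // alpha * (activation received - firing level), as in (4.2)
        Request(Runnable fireAction, double priority) {
            this.fireAction = fireAction;
            this.priority = priority;
        }
        public int compareTo(Request other) {
            return Double.compare(other.priority, this.priority); // highest priority first
        }
    }

    private final PriorityQueue<Request> queue = new PriorityQueue<>();

    void requestFire(Runnable fireAction, double alpha, double received, double firingLevel) {
        queue.add(new Request(fireAction, alpha * (received - firingLevel)));
    }

    // Repeatedly fire the highest-priority request; firing may enqueue new requests.
    void run() {
        while (!queue.isEmpty()) {
            queue.poll().fireAction.run();
        }
    }
}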
The two firing algorithms would be equivalent if all of the activation in
the network spread equally. However, when a node fires, it sends out a set
amount of activation, and excess activation received above the firing threshold disappears from the network. The effect of this disappearing activation
is that the order in which nodes fire can change the final pattern of activation in the network. It is therefore important that both firing algorithms
be implemented and tested so that a choice can be made based on their
contribution to the performance of the system.
4.5 Update Algorithm
The update algorithm takes a smaller network or network fragment (update
network) and integrates it into a larger network (main network). Essentially the same algorithm is used for updating at the sentence level and at
the document level. When updating at the sentence level, the update network represents the next sentence in the document and the main network
represents all previous sentences in the document. When updating at the
document level, the update network represents the document, and the main
network represents all of the documents that have been processed by the
system.
This algorithm has not yet been implemented, and so the details cannot be
given in this paper. This section will attempt to give a high-level overview of how the algorithm will work by walking through a simple example[1].
For this example, we will use the update network shown in Figure 4.2 being applied to the main network shown in Figure 4.1. All nodes in this example
will be referred to by their ID field.
[1] This example uses a simplified network to avoid unnecessary details. As this algorithm has not yet been implemented, the numbers used are speculative at best and not based on the actual calculations that will be performed. This example should be treated only as an illustration of the algorithm at a very high level.

Figure 4.1: An example main network containing information about United States politics, gardening and violence

Figure 4.2: An example update network created from the sentence “Bush beat Gore to the Whitehouse”

Initially, all object nodes from the update network are matched with any similar nodes from the main network. The nodes are compared on simple
similarity characteristics such as string similarity and named entity type similarity. A similarity score is then calculated for each node pairing, producing
the matrix shown in Table 4.1.
        georgebush   bush   algore   gore   whitehouse
bu?        0.5       0.7       -       -        -
go?         -         -       0.5     0.7       -
wh?         -         -        -       -       0.8

Table 4.1: Similarity Matrix: Initial scoring
As we can see in Table 4.1, the initial scoring is more likely to match bu?
with bush instead of the correct matching with georgebush. This is because
the labels in bu? and bush are identical, which outscores the named entity
type similarity in bu? and georgebush.
Once the initial scoring is completed, the algorithm chooses a node from
the update network (in this case bu?) and uses the scores in its row of the
similarity matrix to set the initial activation level of the nodes in the main
network. In this instance, the algorithm will fire the bush and georgebush
nodes, with the bush node receiving more initial activation.
After the activation has spread through the system, the algorithm checks all
of the nodes in the main network and records the amount of activation they
received. It then uses that score to update the scores for all the non-chosen nodes in the update network. In this case, algore would have received some activation, and whitehouse would have received a fairly large amount of activation. We therefore have increased confidence that those are the correct nodes to use in our mapping, and so we increase their scores accordingly. Likewise, the nodes which received no activation have their scores decreased. The result is shown in Table 4.2.
        georgebush   bush   algore   gore   whitehouse
bu?        0.5       0.7       -       -        -
go?         -         -      0.55     0.5       -
wh?         -         -        -       -       0.9

Table 4.2: Similarity Matrix: After testing the bu? node
Note that in Table 4.2 the first row was not changed. This is the row that
we used in our initial firing, and thus we cannot gain accurate information
about the changes in its status.
The algorithm now chooses another node from the update network and
repeats the process. In this case it chooses wh?. After firing whitehouse,
the georgebush and algore nodes both receive activation; therefore their
scores are updated accordingly, resulting in the matrix shown in Table 4.3.
        georgebush   bush   algore   gore   whitehouse
bu?       0.65       0.5       -       -        -
go?         -         -      0.65     0.3       -
wh?         -         -        -       -       0.9

Table 4.3: Similarity Matrix: After testing the wh? node
The algorithm continues in this manner until all nodes have been fired. Note
that on this iteration, it chooses go?, which results in a much larger effect
than if it had been chosen on the first iteration, because its similarity score
to algore is much higher, resulting in less wasted activation going to the
gore node. After the third iteration, the similarity matrix appears in Table
4.4.
        georgebush   bush   algore   gore   whitehouse
bu?        0.7       0.3       -       -        -
go?         -         -      0.65     0.3       -
wh?         -         -        -       -       0.95

Table 4.4: Similarity Matrix: After testing the go? node
This process repeats until the scores converge or some stopping criterion is
met (either a set number of iterations, or a minimum change in matrix
values). After each iteration the scores in the similarity matrix improve
which in turn increases the effectiveness of the next iteration. This process
must be repeated incrementally so that small amounts of activation can be
used which will not cause small mistakes in the initial scoring to overpower
the rest of the algorithm.
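In outline, the iterative scoring loop might look like the following sketch. Since the algorithm has not yet been implemented, this is purely illustrative and every helper method is a stub.

// High-level sketch of the update algorithm's scoring loop.
class UpdateAlgorithmSketch {
    int nUpdate, nMain;   // number of update and main network nodes
    double[][] score;     // similarity matrix: update nodes x main nodes

    void run(int maxIterations) {
        score = initialSimilarity(); // string and named-entity-type similarity
        for (int iter = 0; iter < maxIterations; iter++) { // or until scores converge
            for (int i = 0; i < nUpdate; i++) {
                seedActivation(score[i]); // fire main nodes, weighted by row i
                spreadActivation();       // let activation settle (Section 4.4)
                for (int j = 0; j < nUpdate; j++) {
                    if (j != i) adjustRow(score[j]); // row i itself is not updated
                }
                resetActivation();
            }
        }
        // After convergence, each update node is merged with its best-scoring main node.
    }

    double[][] initialSimilarity() { return new double[nUpdate][nMain]; }
    void seedActivation(double[] row) { }
    void spreadActivation() { }
    void adjustRow(double[] row) { }
    void resetActivation() { }
}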
Eventually, our example should map wh? to whitehouse, bu? to georgebush and go? to algore, resulting in the updated network shown in Figure 4.3.
Figure 4.3: Network resulting from application of the update algorithm
This is a very simplified example, and many features of the algorithm have
been abstracted away. There may also be additional features which will be
added as the algorithm is developed. However, we believe that this example
demonstrates that the update algorithm is theoretically sound. The next
stage of the research will be to focus on a complete implementation of the
system (see Section 7.1).
4.6 Cleanup Algorithm
It is possible that the update algorithm will not have sufficient information to
justify the merging of two nodes even if those two nodes do in fact represent
the same real world object. It is quite likely that in some cases two nodes
which at first appear to have little or nothing in common will later be shown
to represent the same entity. The cleanup algorithm attempts to resolve
these problems with the system by providing a mechanism for merging nodes
after they have already been placed into the network.
Like the update algorithm, the cleanup algorithm has not yet been implemented, so this section will attempt to explain the algorithm in high-level detail, but will not be able to give details of the implementation.
One of the major advantages of using spreading activation in SaskNet is
that if two nodes represent the same real world entity, eventually they will
become so closely linked that when one of the nodes fires, the other will
surely receive enough activation to fire as well. For example, if in Figure
4.1 from the previous section, we were to add a new node with the label
“President Bush”, eventually that node should become linked to the nodes
with the labels “white house” and “president of the United States” (and
also any new nodes added to the system that relate to both objects).
The cleanup algorithm first chooses a source node from the network, and then uses spreading activation to identify a partner node which may represent the same semantic object. The source node is fired, but the activation is constrained so that it can only travel through one relation. This is equivalent to “pausing” the activation after it has travelled one relation away. At that point all of the objects with activation (except for the source node) are recorded in a list we will call the neighbour list. The activation is
then allowed to proceed through one more relation. The nodes which fire on
this second “pulse” are placed into a list called the potential partners list.
Any nodes which were in the neighbour list are removed from the potential
partners list, and then the potential partners list is sorted by amount of
activation currently held. The node in the potential partners list with the
highest level of activation is our partner node.
The same process is repeated but this time using the partner node as our
initial source of activation. If, after sorting the potential partners list, the source node is among the top nodes, it receives a high similarity score. This is added to a similarity score calculated based on label string similarity and named entity type to produce a final similarity score. If that score is above a cutoff threshold, we have found a match and the two nodes can be mapped
together.
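A sketch of the two-pulse partner search follows; it is illustrative only, as the algorithm is unimplemented, and it represents the network simply as a map from node IDs to weighted neighbour maps. The second stage described above would repeat the same call with the roles of source and partner swapped.

import java.util.*;

// Sketch of the cleanup algorithm's partner search.
class CleanupSketch {
    // Returns the candidate partner for a source node: the most activated node
    // two relations away that is not a direct neighbour of the source.
    String findPartner(String source, Map<String, Map<String, Double>> links) {
        Map<String, Double> neighbours = links.getOrDefault(source, new HashMap<>());
        Map<String, Double> potentials = new HashMap<>();
        // Pulse 1 reaches the neighbours; pulse 2 reaches their neighbours.
        for (Map.Entry<String, Double> n : neighbours.entrySet()) {
            Map<String, Double> second = links.getOrDefault(n.getKey(), new HashMap<>());
            for (Map.Entry<String, Double> nn : second.entrySet()) {
                potentials.merge(nn.getKey(), n.getValue() * nn.getValue(), Double::sum);
            }
        }
        // Exclude the source and its direct neighbours, then take the maximum.
        potentials.remove(source);
        potentials.keySet().removeAll(neighbours.keySet());
        return potentials.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}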
To clarify the intuition behind this algorithm, consider Figure 4.4. We are
attempting to find a node which has similar relations to our source node,
but does not have a direct link to it. Thus when we send activation out
one link away from our source node we should have activated many nodes
closely linked with our partner, but not activated our partner node. When we fire all of the neighbour nodes, we should therefore send a great deal of
activation to our partner node.
The second stage of the algorithm is simply repeating the process with the
source and partner nodes switched so that we do not simply assume a partner
node is semantically similar to our source node if it has strong links to a
great number of nodes. To understand why this is necessary, imagine a scenario where we have a single node with a strong connection to almost every node in a large network. Choosing any node not directly linked to it as a source will likely result in choosing that node as a partner; however, the second stage of the algorithm would produce a very large neighbour list, and is thus unlikely to send much of its activation back to our original source node.
Figure 4.4: Finding a partner node
Chapter 5
Similar Projects
The potential usefulness of a large scale semantic knowledge base is attested to by the number of projects currently underway to build one. In this chapter we will survey several of the larger and more successful efforts to build a network similar to the one the SaskNet project hopes to achieve. There are two broad classes of projects that attempt to build large scale knowledge networks: traditionally, manual creation has been the methodology of choice, but more recently projects using automated creation have begun.
5.1 Manually Constructed Networks
Manual creation of large scale semantic networks is a very labour intensive task. Projects of this nature can easily take decades to complete and require hundreds of contributors. In most cases, however, manual creation ensures a highly reliable network: every entry can be used with confidence because it has been checked by humans.
By far the most widely used knowledge network in development today is WordNet [Fellbaum, 1998]. Begun in 1985 at Princeton University, WordNet organises words into senses, or distinct meanings, which are connected through a discrete number of semantic relations; it contains over 200 000 word senses [Word Net]. WordNet is designed following psycholinguistic theories of human memory, and is mainly focused on formal taxonomies of words. It is primarily a lexicographic resource rather than an attempt at a semantic knowledge network; however, it has been used in many cases to approximate a semantic network (see Chapter 6), and is therefore included in this list.
The Cyc Project [Lenat, 1995] focuses on common knowledge: assertions which are too simple and obvious to be given in dictionaries or other forms of text, but which a native speaker of English can take for granted that his or her audience knows. The Cyc knowledge base is created manually, one assertion at a time, by a team of knowledge engineers, and contains over 2.2 million assertions relating over 250 000 terms [Matuszek et al., 2006].
ConceptNet [Liu and Singh, 2004b] (previously known as OMCSNet) uses a semantic network similar to the network created for SaskNet: nodes are small fragments of English connected by directed relations. The primary differences between ConceptNet and the semantic network formalism used in SaskNet are that the relations in ConceptNet are selected from a set of 20 pre-defined relations, and that ConceptNet contains only definitional data and thus does not require a hierarchical structure. ConceptNet acquires its knowledge from the OpenMind corpus [Singh et al., 2002]. This is particularly interesting because the OpenMind corpus is created by the general public: visitors to a webpage are presented with text such as “A knife is used for ...”, and are then asked to provide text fragments to fill in the rest of the sentence. This has allowed ConceptNet to grow rapidly; it contains over 1.6 million edges connecting more than 300 000 nodes [Liu and Singh, 2004a].
5.2 Automatically Constructed Networks
The labour intensive nature of manually creating a semantic network makes automatic creation of networks an obvious goal for researchers [Crestani, 1997]. However, it is only recently that advances in natural language processing techniques have made automatic creation a possibility. Automatically created semantic networks will naturally be more likely to contain errors than manually created networks; however, for many tasks the great decrease in the time and labour required to build a network, combined with the ability to use larger corpora, will make up for the decrease in accuracy [Dolan et al., 1993].
There have recently been promising results in semi-automated knowledge acquisition. Pantel and Pennacchiotti [2006] describe the Espresso system, which automatically harvests relations from natural language text. Although the system uses semi-supervised learning for each relation, and thus requires human intervention for each new relation type, it is nonetheless very promising that a simple pattern-matching algorithm has been shown to perform well on a web-based corpus [Pantel et al., 2004].
The most promising project currently underway in the field of automatic network construction is MindNet [Dolan et al., 1993]. Started in 1993 at Microsoft Research, MindNet uses a wide-coverage parser to extract pre-defined relations from dictionary definitions. To illustrate the difference in construction time that the automated approach makes: the MindNet network of over 150 000 words connected by over 700 000 relations can be created in a matter of hours on a standard personal computer [Richardson et al., 1998].
Of all the projects listed here, MindNet is the most similar in methodology to SaskNet. However, MindNet uses a more traditional phrase-structure parser and only analyses dictionary definitions, which tend to have much less linguistic variation than newspaper text and are more limited in the type of information they convey. MindNet also uses only a small set of pre-defined relations and is essentially an atomic network (see Section 3.1). SaskNet’s relations are defined by the text itself, and it is capable of handling arbitrarily complex node structures. The largest difference between MindNet and SaskNet is therefore that SaskNet can accommodate a much more diverse range of inputs and can represent a much wider range of information. This will allow SaskNet to use very large document collections to create its network, which should lead to a larger, more diverse and ultimately more useful network.
Chapter 6
Potential Uses
There is a wide variety of projects that could benefit from the successful development of the SaskNet project. In this chapter we mention a few areas and projects in which a large scale semantic knowledge network would be useful.
As we have already mentioned in this paper, spreading activation has been seen as promising both for its applications and for its theoretical basis in human intelligence. Further research on spreading activation in semantic networks could therefore be very beneficial. Unfortunately, this area of research has remained essentially dormant for more than a decade, largely due to computational difficulties and a lack of large scale semantic networks to use as a testbed [Preece, 1981]. We believe that if the SaskNet project is successful, it will not only provide this testbed, but also show that spreading activation is computationally feasible in building a large scale network. The SaskNet project could thus provide both motivation for further research into spreading activation and an accessible platform on which to perform that research.
Many projects have used WordNet to measure the semantic similarity between concepts [Budanitsky and Hirst, 2006], for use in tasks such as query expansion in information retrieval and word sense disambiguation. WordNet is not particularly well suited to this task, since it was created based on formal taxonomies of words rather than semantic relations between concepts. As Hirst and St-Onge point out, “In WordNet, stew and steak are not closely related, but public and professional are.” [Fellbaum, 1998]. SaskNet would be a much better tool for most of these projects because it is actually created as a semantic network, and thus distance between concepts in the SaskNet network is much more directly analogous to semantic similarity than distance in WordNet.
The Cyc and ConceptNet projects have been used with a great deal of success in areas such as question answering [Curtis et al., 2005], word sense disambiguation [Curtis et al., 2006], and predictive text entry [Stocky et al., 2004]. Any of these applications could benefit from SaskNet, either in conjunction with existing networks or possibly as a replacement. Since SaskNet will be constructed automatically, it will grow much more rapidly than either Cyc or ConceptNet, and thus many of the projects which use those networks would benefit from a larger, more easily updated network.
There are clearly many projects currently using other semantic resources that could benefit from SaskNet. Due to its automatic construction, it would be possible to build domain specific networks for particular tasks, or simply to expand the network to a much larger scale than is currently available to any of the above mentioned projects. We believe that while the automated nature of SaskNet will necessarily produce more errors than are likely to occur in any of the other resources, the short construction time, wide coverage and flexible nature of SaskNet will make it useful to current and future projects alike.
Chapter 7
Future Work
The SaskNet project is still under heavy development. The semantic network and underlying spreading activation algorithms have been developed, but the update and cleanup algorithms have not yet been completed; their development is the obvious top priority for the project.
Once a fully functioning system has been developed, we can begin running it on real data to evaluate its performance. There are many parameters that can be adjusted throughout the system to optimise performance, such as the activation levels of various node types and the similarity scoring metrics for the update algorithm. Some form of pronoun resolution will also likely need to be added to the system to complement the pronoun resolution offered by CCG2Sem.
Once SaskNet has a fully developed and optimised network, there are still many possible improvements that could be made. Much of the network design (particularly the SemNode class) aims to minimise the memory requirements of SaskNet. After a fully functioning network has been built, we can weigh the size of the network against its speed, and possibly sacrifice some of the memory saving features to gain an increase in speed.
Fortunately, due to the automated nature of SaskNet, building a network takes hours or days rather than the months or years required for manually created networks. We can therefore easily build sample networks to test various parameters, and even change the design of the network itself, without requiring a great amount of time and effort.
In the immediate future, SaskNet will be run on a single CPU system. It may eventually be desirable to migrate SaskNet to a larger computer cluster, which could increase both its potential size and its speed. As mentioned earlier in the paper, it may also eventually be desirable to re-implement the firing algorithms to be truly parallel.
7.1 Timeline
This section provides estimated completion dates for the major stages of SaskNet’s development over the coming year.
Enhanced pronoun resolution                                    October 2006
Completion of update algorithm                                 December 2006
Completion of cleanup algorithm                                January 2007
Optimisation of activation and update algorithm parameters     February 2007
Building large scale test networks                             March 2007
Adding additional features & optimisations                     June 2007
Bibliography
M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg. Adaptive
name matching in information integration. Intelligent Systems, IEEE, 18:
16 – 23, Sep/Oct 2003.
Johan Bos. Towards wide-coverage semantic interpretation. In Proceedings
of Sixth International Workshop on Computational Semantics IWCS-6,
pages 42–53, 2005.
Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia
Hockenmaier. Wide-coverage semantic representations from a CCG parser.
In Proceedings of the 20th International Conference on Computational
Linguistics (COLING-04), pages 1240–1246, Geneva, Switzerland, 2004.
T. Briscoe and J. Carroll. Robust accurate statistical annotation of general
text. In Proceedings of the 3rd International Conference on Language
Resources and Evaluation, pages 1499–1504, Las Palmas, Gran Canaria,
2002.
Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32:13 – 47, March
2006.
Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of
the First Conference on North American Chapter of the Association for
Computational Linguistics, pages 132–139, San Francisco, CA, USA, 2000.
Morgan Kaufmann Publishers Inc.
Kenneth Ward Church and Patrick Hanks. Word association norms, mutual
information, and lexicography. Comput. Linguist., 16(1):22–29, 1990.
Stephen Clark and James R. Curran. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL ’04), pages 104–111, Barcelona,
Spain, 2004.
Allan M. Collins and Elizabeth F. Loftus. A spreading-activation theory of
semantic processing. Psychological Review, 82(6):407–428, 1975.
Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, 1999.
F. Crestani. Application of spreading activation techniques in information
retrieval. Artificial Intelligence Review, 11(6):453 – 482, Dec 1997.
Jon Curtis, G. Matthews, and D. Baxter. On the effective use of Cyc in
a question answering system. In Papers from the IJCAI Workshop on
Knowledge and Reasoning for Answering Questions, Edinburgh, Scotland,
2005.
Jon Curtis, D. Baxter, and J. Cabral. On the application of the Cyc ontology
to word sense disambiguation. In Proceedings of the Nineteenth International FLAIRS Conference, pages 652 – 657, Melbourne Beach, FL, May
2006.
William B. Dolan, L. Vanderwende, and S. Richardson. Automatically
deriving structured knowledge base from on-line dictionaries. In Proceedings of the Pacific Association for Computational Linguistics, Vancouver,
British Columbia, April 1993.
Christiane Fellbaum, editor. WordNet : An Electronic Lexical Database.
MIT Press, Cambridge, Mass, USA, 1998.
Emden R. Gansner and Stephen C. North. An open graph visualization
system and its applications to software engineering. Software — Practice
and Experience, 30(11):1203 – 1233, 2000.
R. V. Guha and A. Garg. Disambiguating people in search. In 13th World
Wide Web Conference (WWW 2004), New York, USA, 2004.
G. Hirst. Semantic Interpretation and the Resolution of Ambiguity. Studies
in Natural Language Processing. Cambridge University Press, Cambridge,
UK, 1987.
H. Kamp. A theory of truth and semantic representation. In J. Groenendijk
et al., editors, Formal Methods in the Study of Language. Mathematisch
Centrum, 1981.
Hans Kamp and Uwe Reyle. From Discourse to Logic: Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse
Representation Theory. Kluwer Academic, Dordrecht, 1993.
Rick Kjeldsen and Paul R. Cohen. The evolution and performance of the
grant system. Technical report, University of Massachusetts, Amherst,
MA, USA, 1988.
Douglas B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33 – 38, 1995.
H. Liu and P. Singh. ConceptNet: A practical commonsense reasoning tool-kit.
BT Technology Journal, 22:211 – 226, Oct 2004a.
H. Liu and P. Singh. Commonsense reasoning in and over natural language.
In Proceedings of the 8th International Conference on Knowledge-Based
Intelligent Information & Engineering Systems (KES’2004), Wellington,
New Zealand, 2004b.
Margaret Masterman. Semantic message detection for machine translation,
using an interlingua. In Proceedings of the 1961 International Conference
on Machine Translation of Languages and Applied Language Analysis,
pages 438 – 475, London, 1962.
Cynthia Matuszek, J. Cabral, M. Witbrock, and J. DeOliveira. An introduction to the syntax and content of Cyc. In 2006 AAAI Spring Symposium
on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, Stanford,
CA, USA, March 2006.
D.E. Meyer and R.W. Schvaneveldt. Facilitation in recognizing pairs of
words: Evidence of a dependence between retrieval operations. Journal
of Experimental Psychology, 90(2):227–234, 1971.
Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic
patterns for automatically harvesting semantic relations. In Proceedings
of Conference on Computational Linguistics / Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, 2006.
Patrick Pantel, Deepak Ravichandran, and Eduard Hovy. Towards terascale
knowledge acquisition. In Proceedings of Conference on Computational
Linguistics (COLING-04), pages 771 – 777, Geneva, Switzerland, 2004.
S Preece. A Spreading Activation Model for Information Retrieval. PhD
thesis, University of Illinois, Urbana, IL, 1981.
M. Ross Quillian. The teachable language comprehender: A simulation
program and theory of language. Communications of the ACM, 12(8):459
– 476, 1969.
Stephen D. Richardson, William B. Dolan, and Lucy Vanderwende. MindNet: Acquiring and structuring semantic information from text. In Proceedings of COLING ’98, 1998.
G. Salton and C. Buckley. On the use of spreading activation methods in
automatic information retrieval. In SIGIR ’88: Proceedings of the 11th
annual international ACM SIGIR conference on research and development
in information retrieval, pages 147 – 160, New York, NY, USA, 1988. ACM
Press.
Push Singh, Thomas Lin, Erik T. Mueller, Grace Lim, Travell Perkins, and
Wan Li Zhu. Open mind common sense: Knowledge acquisition from the
general public. In Lecture Notes in Computer Science, volume 2519, pages
1223 – 1237. Springer Berlin / Heidelberg, 2002.
John F. Sowa. Semantic networks. In S. C. Shapiro, editor, Encyclopedia of
Artificial Intelligence. Wiley-Interscience, New York, 2nd edition, 1992.
Mark Steedman. The Syntactic Process. The MIT Press, Cambridge, MA.,
2000.
Tom Stocky, Alexander Faaborg, and Henry Lieberman. A commonsense
approach to predictive text entry. In Proceedings of Conference on Human
Factors in Computing Systems, Vienna, Austria, April 2004.
J. van Eijck. Discourse representation theory. In Encyclopedia of Language
and Linguistics. Elsevier Science Ltd, 2nd edition, 2005.
J. van Eijck and H. Kamp. Representing discourse in context. In J. van
Benthem and A. ter Meulen, editors, Handbook of Logic and Language.
MIT Press, Cambridge MA, USA, 1997.
Xiaojun Wan, Jianfeng Gao, Mu Li, and Binggong Ding. Person resolution
in person search results: Webhawk. In CIKM ’05: Proceedings of the
14th ACM international conference on Information and knowledge management, pages 163 – 170, New York, NY, USA, 2005. ACM Press.
Word Net. WNstats - WordNet 2.1 database statistics. Viewed 25 July 2006. <http://wordnet.princeton.edu/>.