SaskNet: A Spreading Activation Based Semantic Network
SaskNet: A Spreading Activation Based Semantic Network
Report on the Current State of Development

Brian Harrington
St. Cross College
[email protected]

Submitted in Partial Completion of the Requirements for Transfer to DPhil Status
Computing Laboratory, Oxford University
September 2006

Contents

1 Introduction
  1.1 Format of This Paper
2 Parsing and Semantic Analysis
  2.1 Parsing
    2.1.1 Choosing a Parser
    2.1.2 The Clark and Curran Parser
  2.2 Discourse Representation Structures
    2.2.1 CCG2Sem
3 Semantic Networks
  3.1 A Semantic Network Definition
  3.2 The SaskNet Semantic Network
    3.2.1 Network Implementation
  3.3 Parser Filtering
    3.3.1 Clark and Curran Parser Filter
    3.3.2 CCG2Sem Filter
4 Spreading Activation
  4.1 History of Spreading Activation
  4.2 Spreading Activation in SaskNet
  4.3 Information Integration
  4.4 Firing Algorithm
  4.5 Update Algorithm
  4.6 Cleanup Algorithm
5 Similar Projects
  5.1 Manually Constructed Networks
  5.2 Automatically Constructed Networks
6 Potential Uses
7 Future Work
  7.1 Timeline

Chapter 1
Introduction

This paper discusses the motivation for and development of the SaskNet (Spreading Activation based Semantic Knowledge NETwork) project. The goal of the SaskNet project is to develop a system which can automatically extract knowledge from natural language text and build a large-scale semantic network based on that knowledge.

One of the fundamental ideas behind SaskNet is that more data means better results. To that end, we have focused on creating a system which can process large and varied text corpora to extract a wide variety of information. SaskNet is designed to be run on virtually any English text. While a system tuned to a particular type of information may extract higher-quality information from a particular document, we feel that designing SaskNet to use larger and more varied corpora will prove to be the more beneficial approach.

SaskNet creates a semantic network for a document by translating each sentence into a network fragment which is then viewed as an update to the document network. Once a network has been built for a complete document, it is in turn used as an update to the larger knowledge network which represents all the knowledge acquired by the system. The merge algorithm used for both the sentence-level and document-level updates merges the smaller update network with the larger existing network. The algorithm uses spreading activation to determine the mappings between nodes in the two networks.
If a node in the update network refers to a semantic object for which a node in the existing network already exists, the merge algorithm attempts to map the nodes together.

A large-scale semantic network would be of great use to a wide variety of projects, ranging from traditional information retrieval and question answering systems to machine translation and artificial intelligence. Several attempts have been made to develop networks similar to SaskNet, but so far these attempts have met with only limited success. Some projects have attempted to create networks manually, but this is a very slow and labour-intensive process. Automated network creation has been tried in the past, but has mostly been abandoned due to the sheer computational difficulty of the task. However, recent advances in wide-coverage parsers, combined with promising results on terascale corpora, make the long-sought goal of a large-scale semantic knowledge network a viable possibility. The SaskNet project aims to make that goal a reality.

1.1 Format of This Paper

This paper attempts to explain the design of SaskNet starting from theory and working towards implementation. Each of the sections in Chapters 2 to 4 begins with an explanation and history of a concept, then proceeds to discuss how the concept applies to SaskNet, and finally shows the particulars of how that concept is implemented within the project.

Chapter 2 discusses how text is manipulated before it is input into the SaskNet system. Essentially this chapter deals with the external programs which SaskNet has incorporated into its framework. In particular it deals with parsing, and how text is parsed before it is put into the network; and with discourse representation, and how a tool that turns sentences into discourse representation structures benefits the SaskNet project.

Chapter 3 deals with the structure of the network itself. Beginning by attempting to reach a common definition for semantic networks, the chapter then explains the structure of the SaskNet network and how it was implemented for this project.

Chapter 4 discusses spreading activation and its use in SaskNet. The chapter begins with a history of spreading activation and attempts to provide motivation for why it is a useful tool for this project. It then explains how spreading activation is used in the update algorithm for information integration, and why this is such an important step in building the network.

Chapter 5 surveys various attempts to build large-scale semantic networks. Beginning with manually created networks, which are often used today despite their limitations, the chapter then proceeds to discuss other projects which are currently trying to build automated resources similar to SaskNet.

Chapter 6 attempts to provide motivation for the SaskNet project. It discusses which fields could benefit from a large-scale semantic knowledge network and why the currently available networks are often insufficient for the needs of certain projects.

Chapter 7 provides a timeline for future work on the project. It attempts to give an overview of future work and a brief discussion of what will be required in each major phase over the next two years.

Chapter 2
Parsing and Semantic Analysis

Manipulating natural language in its raw form is a very difficult task. Parsers and semantic analysis tools allow us to work with the content of a document on a semantic level.
This simplifies the process of developing a semantic network both computationally and algorithmically. To this end, SaskNet employs a set of software tools to render plain text into a discourse representation structure, from which point it can turn the information into a semantic network fragment with relative ease.

In order for SaskNet to create a semantic network fragment to represent a sentence, it must know the constituent objects of the sentence and their relations to one another. This information is very difficult to extract, and even the best tools available are far from perfect. SaskNet has been designed to use external tools for parsing and semantic analysis so that as these tools improve, SaskNet can improve with them. It has also been designed not to rely too heavily on any one tool, so that if a better tool is developed, SaskNet can use it to achieve the best performance possible.

2.1 Parsing

Before we can work with natural language text, we must first analyse and manipulate it into a form that can be easily processed. Parsing converts "plain" natural language text into a data structure which the system can either use to build a semantic network fragment directly, or use as the input to a semantic analysis program.

2.1.1 Choosing a Parser

One of the major strengths of SaskNet lies in its ability to integrate large amounts of differing information into its network. In order to exploit this power, it is necessary for all stages of the system to be able to handle a wide range of inputs, and to process those inputs quickly. The choice of parser is therefore very important to the success of SaskNet: for the entire system to perform well, the parser must be both wide coverage and efficient. To this end, many parsers were considered, such as the Charniak Parser [Charniak, 2000], the Collins Parser [Collins, 1999] and RASP [Briscoe and Carroll, 2002]. Eventually, speed considerations and its relational output made the Clark and Curran Parser [Clark and Curran, 2004] an obvious choice.

2.1.2 The Clark and Curran Parser

The Clark and Curran Parser is a wide-coverage statistical parser based on Combinatory Categorial Grammar (CCG) [Steedman, 2000], written in C++. CCG is a lexicalised grammar formalism where each word in a sentence is paired with a lexical category, which defines how the word can combine with adjacent words and word phrases. For example, in Figure 2.1 the word likes is given the lexical category (S[dcl]\NP)/NP, which means that if it finds a noun phrase to its right (in this case pizza), it will combine with it, and the new phrase (likes pizza) will be given the lexical category S[dcl]\NP (something which is looking for a noun phrase to its left in order to become a declarative sentence). The lexical categories of words are combined using a small number of combinatory rules to produce a full derivation for the sentence.

Figure 2.1: A simple CCG derivation using forward (>) and backward (<) application.

CCG was designed to capture long-range dependencies in syntactic phenomena such as coordination and extraction, which are often entirely missed by other parsers. Most parsers have a set window (number of words or characters) within which they can find dependencies, but dependencies in natural language text can be arbitrarily far apart. Take for example the sentence given in (2.1). We can easily move the dependency farther apart by adding a clause, as in (2.2).
It is easy to continue adding clauses to the sentence to move the initial dependency farther apart, as illustrated in (2.3).

The dog that Mary saw. (2.1)
The dog that John said that Mary saw. (2.2)
The dog that John said that Ann thought that Mary saw. (2.3)

As the dependencies move farther apart, most parsers have greater difficulty in recognising them, and in many cases once they move farther apart than the parser's set context window, they cannot be found at all. CCG was specifically designed to capture dependencies regardless of the intervening distance in the text, and thus the Clark and Curran parser is able to extract dependencies that most other parsers miss.

Because CCG is a lexicalised grammar, it has only a small number of combinatory rules for combining lexical categories; this allows the Clark and Curran Parser to use a supertagger to very efficiently assign lexical categories to words, in much the same way as standard taggers assign part-of-speech tags. This results in a parser which is both efficient and robust [Clark and Curran, 2004].

Aside from its speed, the other major advantage of using the Clark and Curran Parser for SaskNet is its relational output. While the other parsers we evaluated only output phrase-tree structures, the Clark and Curran Parser can output grammatical relations such as subjects and objects of verbs and nominal modifiers [Bos et al., 2004], which allows for a much simpler transition from parser output to semantic network.

The Clark and Curran parser also provides a named entity recogniser, implemented as a separate program that combines its output with that of the parser. This program is a sequence tagger which attaches entity tags to words to label them as belonging to certain categories such as person, location, organisation, date and monetary amount.

While several considerations caused the Clark and Curran parser to be chosen for use in SaskNet, the system has been constructed in such a way that it is not heavily dependent on any particular parser or parsing algorithm. In order to integrate a different parser into SaskNet, it would only be necessary to design a new filter (see Section 3.3) for that parser.

2.2 Discourse Representation Structures

Discourse Representation Theory (DRT) takes a dynamic perspective on natural language semantics [van Eijck and Kamp, 1997], where each new sentence is viewed in terms of its contribution to an existing discourse. A Discourse Representation Structure (DRS) is a formalised representation of all of the information available at a given point in a discourse. New sentences in the discourse are viewed as updates to the structure [Kamp and Reyle, 1993]. DRT was initially designed to solve the problem of unbound anaphora [van Eijck, 2005], and is particularly useful for establishing links between pronouns and their antecedents. DRT has expressive power equivalent to first-order logic.

When interpreting a sentence as a DRS, a discourse referent (essentially a free variable) is created whenever an indefinite noun phrase (e.g., a dog, someone, a car of mine) is encountered. Definite noun phrases (e.g., this dog, him, my car) are always linked to existing discourse referents. For example[1], when processing the discourse in (2.4), the first sentence creates two discourse referents, x and y, referring to the woman and the room respectively. Then three conditions are created: woman(x), room(y) and entered(x,y). This produces the DRS seen in (2.5).

A woman entered the room. She smiled. (2.4)
(x,y)(woman(x), room(y), entered(x,y)) (2.5)

When interpreted, the second sentence in discourse (2.4) creates the DRS seen in (2.6). However, when it is processed as an update to (2.5), it also produces a link between the variable z, assigned to the pronoun she, and the variable x, which represents the antecedent of that pronoun, thus producing the updated DRS in (2.7).

(z)(smiled(z)) (2.6)
(x,y,z)(woman(x), room(y), entered(x,y), z=x, smiled(z)) (2.7)

Kamp [1981] and Kamp and Reyle [1993] give a full set of rules for creating DRSs in formal detail, covering constraints on anaphoric linking, nested DRS structures, and special-case DRS linking such as implication and disjunction.

[1] This example has been simplified for clarity, as well as to be more directly consistent with the type of DRS used by CCG2Sem (see Section 2.2.1), and thus does not represent all of the details of the DRS formalism as presented in Kamp and Reyle [1993].

2.2.1 CCG2Sem

Discourse representation theory is particularly useful to the construction of SaskNet because it builds on similar principles of interpreting new information within the context of the current knowledge base (see Section 4.5). In order to leverage the power of DRT, SaskNet uses the CCG2Sem program [Bos, 2005]. CCG2Sem is a Prolog program which uses the output of the Clark and Curran parser (see Section 2.1.2) to construct semantic derivations based on DRS structures [Bos et al., 2004]. Essentially, this program in conjunction with the Clark and Curran parser allows a seamless transition from natural language text into a DRS, complete with nested structure, entity recognition, and some limited anaphoric pronoun resolution. Example output of the program can be seen in Figure 2.2.

Representing the sentence as a DRS is ideal for SaskNet for several reasons. The DRS structure very closely mirrors the semantic network structure used in SaskNet, with discourse referents being roughly equivalent to object nodes and the semantic relations being analogous to either node labels or relations (see Section 3.2.1).

Figure 2.2: Example output in Prolog format and pretty print for the sentences Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

Chapter 3
Semantic Networks

A semantic network can loosely be defined as any graphical representation of knowledge using nodes to represent semantic objects and arcs to represent relationships between objects. Semantic networks have been used since at least the 3rd century AD in philosophy, computer implementations have been in use for over 45 years [Masterman, 1962], and a wide variety of formalisms have used the name semantic network [Sowa, 1992].

3.1 A Semantic Network Definition

For the purposes of this thesis, we will posit certain requirements for what we will consider a "proper" semantic network. Primarily, we will require that the relations in the network be labelled and directed. This is to distinguish semantic networks from what we will call associative networks, which connect concepts based simply on the existence of a relationship, without regard to the relationship's nature or direction (see Figure 3.1). While associative networks are technically a type of semantic network, and are quite often used because they can easily be extracted from co-occurrence statistics [Church and Hanks, 1990], for our purposes their lack of power and expressiveness discounts them from consideration.

Figure 3.1: An example of an associative network. Objects and concepts are linked without distinction for type or direction of link.
The second requirement we shall impose upon semantic networks is that they be structurally unambiguous. It should not be possible for two semantically different ideas or discourses to be encoded in identical network structures. Thus, even though the semantically different ideas of John using a telescope to see a man and John seeing a man carrying a telescope can be encoded in the same English sentence John saw the man with the telescope, when that sentence is translated into a semantic network, the structure of the network must uniquely identify one of the two interpretations (see Figure 3.2).

Figure 3.2: Semantic network representations of the two parses of "John saw the man with the telescope": (a) John used the telescope to see the man; (b) John saw the man carrying the telescope.

Semantic networks may still contain lexical ambiguity through having ambiguous words used as labels on nodes and arcs. For example, in Figure 3.3 it is impossible to tell whether bank refers to a financial institution or the edge of a river. It is theoretically possible to remove lexical ambiguity from a semantic network by forcing each node to be assigned to a particular sense of the word(s) in its label; however, word sense disambiguation is a very difficult task and there is no complete solution currently available.

Figure 3.3: A lexically ambiguous network ("John went to bank").

The third and final requirement we will make for semantic networks is that they must be able to accommodate the complex structures regularly found in natural language. In particular, we will require that the network allow relations between complex concepts which may themselves contain many concepts and relations. This is to distinguish proper semantic networks from what we will call atomic networks, which only allow simple nodes representing a single concept. These networks can only accommodate a limited type of information, and thus we will not include them in our definition of semantic networks.

3.2 The SaskNet Semantic Network

The semantic network formalism developed for SaskNet meets all of the criteria we have set out for consideration as a "proper" semantic network, and also has a few extra features that make it particularly well suited to the SaskNet project. We will first explain how the criteria are met, and then briefly describe the extra features that have been added.

SaskNet trivially meets the first criterion by its design. All relations in SaskNet are labelled and directed, with a defined agent and target node, of which at least one must be present before a relation can be added to the network.

The second criterion is taken care of for us by the parser. One of the primary functions of a parser is to select one parse from the list of possible parses for a natural language sentence. Since no information is discarded in the translation from the parser output to the network creation, we maintain a single parse and thus are left without any of the original structural ambiguity.

The third criterion is met by the hierarchical structure of the network. This allows complex concepts and even entire discourses to be treated as single objects. As we see in Figure 3.4, complex objects can be built up from smaller objects and their relations.
Figure 3.4: A hierarchical semantic network.

The hierarchical structure is unrestrictive: it is possible for any pair of nodes to have a relation connecting them, or for a single node to be a constituent of multiple complex nodes.

In Figure 3.4 we can also see the attribute nodes (denoted by ellipses). Any object or relation can have multiple attribute nodes, and attribute nodes can themselves be complex nodes.

One additional, but very important, feature of SaskNet's semantic network is that every link in the network is assigned a value between 0 and 1. This value represents the confidence of the link, and can be determined by various means, such as the confidence we have in the source of our information, or the number of different sources which have confirmed a particular relation. In practice, the value (or weight) of a link is set by the merge algorithm (see Section 4.5). The weight is assigned to the link rather than to the relation for two reasons: primarily, it allows us to assign a different weight to each direction of a relation; secondly, it allows us to use weighted links to attach attribute nodes as well as relation nodes.

3.2.1 Network Implementation

SaskNet's internal semantic network is implemented in Java. It is designed to allow maximum flexibility in node type and hierarchy. A class diagram[1] for the network is given in Figure 3.5. In this section we will explore each of the classes of the network, explaining its functionality and design decisions and how it interacts with the other parts of the network.

[1] This is a simplified class diagram and contains only a portion of the total class information. Much of the detail has been left out to increase the saliency of the more important features.

Figure 3.5: Class diagram of SaskNet's semantic network architecture.

SemNode

SemNodes come in four distinct types (ObjNode, RelNode, AttNode and ParentNode), sketched in code after the list below. Each node type has a distinct activation threshold, but all of the node types are implemented almost identically; the primary difference between them is the way they are treated by the SemNet class.

• ObjNodes represent atomic semantic objects. They have a special field called neType which is set if the named entity recogniser (see Section 2.1.2) provided the node with a label. For example, if a node represents a person, its neType field would be set to "per".

• RelNodes represent relations between objects or concepts. While some semantic networks simply label their links with the names of relations, SaskNet uses fully implemented nodes for this purpose, primarily so that relations themselves can have attributes and adjustable firing potentials, and also so that a single relation can have more than one label. All RelNodes have an agent and a target link, which provide the direction of the relation; at least one of these links must be instantiated, ensuring that every relation has a direction.

• AttNodes represent attributes of an object or concept. They are essentially simplifications of a class of node/relation pairs, used to make the network more intuitive.

• ParentNodes represent complex semantic concepts made up of two or more nodes. All of the members of the complex concept are labelled as the concept's children, and each node has a link to any ParentNodes of which it is a member. ParentNodes are often vacuous parents, which means that they are unlabelled and provide no additional information beyond the grouping of their constituent nodes.
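As a concrete illustration, the following is a minimal sketch of how these classes might be laid out. The class and field names (SemNode, ObjNode, RelNode, neType, firingLevel) follow the text above, but the exact signatures are illustrative rather than the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a link between two nodes; its strength is the link's certainty (0 to 1). */
class SemLink {
    SemNode from, to;
    double strength;
}

/** Sketch of the firing controller's interface (see SemFire, later in this section). */
interface SemFire {
    void requestFiring(SemNode node);
}

/** Sketch of the common node base class described above. */
abstract class SemNode {
    final String id;                                 // unique, document-derived ID
    final List<String> labels = new ArrayList<>();   // a node may carry many labels
    final List<SemLink> links = new ArrayList<>();   // links to connected nodes
    double potential = 0.0;                          // accumulated activation
    final double firingLevel;                        // type-specific firing threshold

    SemNode(String id, double firingLevel) {
        this.id = id;
        this.firingLevel = firingLevel;
    }

    /** Receive a pulse of activation; request firing once the threshold is exceeded. */
    void receiveActivation(double amount, SemFire fire) {
        potential += amount;
        if (potential > firingLevel) {
            fire.requestFiring(this);
        }
    }
}

/** Atomic semantic object; may carry a named-entity type such as "per". */
class ObjNode extends SemNode {
    String neType;
    ObjNode(String id, double firingLevel) { super(id, firingLevel); }
}

/** Relation node with directed agent and target links; at least one must be set. */
class RelNode extends SemNode {
    SemLink agent, target;
    RelNode(String id, double firingLevel) { super(id, firingLevel); }
}
```

AttNode and ParentNode would follow the same pattern, with ParentNode additionally holding a list of its child nodes.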
All nodes have a unique id assigned to them at the time of their creation, which indicates the document from which they originated. An array of labels allows many labels to be added to a single node; this is necessary because the same concept is often referred to in a variety of ways. Each node has arrays of links to the nodes with which it is connected, stored based on the type of node linked. Finally, all nodes contain a link to the monitor process for their network so that they can easily report their status and events such as firing.

A SemNode can receive a pulse of activation from another node or from the network; this increases its potential variable. If this causes the potential to exceed the firingLevel, the node sends a request to SemFire to be fired (see Section 4.4 for more details). Nodes can also be merged; merging copies all of one node's links and labels to another and then deletes the first node. Deleting a node sends messages to all connected nodes to delete the appropriate links from both ends, so that no "dead" links can exist in the network. Each node also has a print method based on its node type, which allows proper output from SemNet's printNetwork() method.

SemLink

SemLinks form the links between nodes. Each link is assigned a strength when it is created; this represents the certainty of the link (i.e., how confident the system is that this link exists in the real world), and can be increased or decreased by the network as more information is gained.

SemNet

The SemNet class is the interface into the semantic network. All of the functionality of the network is available through SemNet's methods. SemNet is used to add, remove, retrieve and manipulate nodes. It also indexes the nodes and contains the update algorithm (see Section 4.5).

SemNet must be able to retrieve nodes based on both their unique ID and their label. Since the same label may be used by many nodes, this is achieved with a pair of hashtables. The first hashtable maps a string to a list of the IDs of all nodes which have that string as one of their labels. The second hashtable maps an ID to its corresponding node. The combination of these two hashtables allows SemNet to retrieve nodes based on either their label or their ID, as sketched below.
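A minimal sketch of this two-table index, reusing the SemNode sketch above (the class name NodeIndex and the method names are illustrative, not the project's actual API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of SemNet's two-hashtable node index. */
class NodeIndex {
    // label -> IDs of all nodes carrying that label
    private final Map<String, List<String>> idsByLabel = new HashMap<>();
    // unique ID -> node
    private final Map<String, SemNode> nodesById = new HashMap<>();

    void add(SemNode node) {
        nodesById.put(node.id, node);
        for (String label : node.labels) {
            idsByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(node.id);
        }
    }

    /** Retrieve a node by its unique ID. */
    SemNode byId(String id) {
        return nodesById.get(id);
    }

    /** Retrieve all nodes having the given string as one of their labels. */
    List<SemNode> byLabel(String label) {
        List<SemNode> result = new ArrayList<>();
        for (String id : idsByLabel.getOrDefault(label, new ArrayList<>())) {
            result.add(nodesById.get(id));
        }
        return result;
    }
}
```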
SemNet's print() method prints the contents of the network in GraphViz [Gansner and North, 2000] format so that the graph can be visualised for simple debugging. This is done by calling the print() method of every node in the network; each node then prints out its own details and the details of all of its links.

SemMonitor

SemMonitor receives status reports from every node in the network. This can be used for debugging purposes, but it is also used to track which nodes fired in a given sequence of activation. Every node has a link to the SemMonitor object for the network and is required to notify SemMonitor every time it fires.

SemFire

SemFire is structured similarly to SemMonitor in that every node in the network contains a link to the single SemFire object. When a node wishes to fire, it notifies SemFire. SemFire keeps a list of all requests and permits nodes to fire in an order specified by the firing algorithm (see Section 4.4).

3.3 Parser Filtering

The output of SaskNet's parsing and analysis tools must be manipulated in such a way as to turn each sentence's representation into the form of a semantic network update which can be used by the update algorithm (see Section 4.5). The module which performs this data manipulation is called the parser filter.

The parser filter is designed in a modular fashion, so that when one of the parsing and analysis tools changes, the parser filter can easily be replaced or altered without affecting the rest of SaskNet. Two parser filters have been developed for SaskNet: the first to filter the output of the Clark and Curran parser, and the second to filter the output of CCG2Sem.

3.3.1 Clark and Curran Parser Filter

The first filter developed for the system was designed to work with the Clark and Curran parser's first-order semantic relations output. The filter is essentially a set of rules mapping relations output by the parser to network features. For example, as can be seen in Table 3.1, upon encountering the output vmod(word1, word2), the filter turns the node for word2 into an attribute for the relational node word1 (if either of the nodes does not exist it is created; if the node for word1 is not already a relNode it is turned into one).

Parser Output          Rule
comp(word1, word2)     Merge Node1 and Node2
vmod(word1, word2)     Node2 becomes an attNode; Node1 becomes a relNode;
                       Node2 becomes an attribute of Node1;
                       parents of Node2 become parents of Node1
ncsubj(word1, word2)   Node1 becomes a relNode; subject link of Node1 set to Node2
dobj(word1, word2)     Node1 becomes a relNode; object link of Node1 points to Node2

Table 3.1: A sample of the rules used by the Clark and Curran parser filter.

Some of the rules require some complexity to ensure that links are preserved, especially between parental nodes, during the application of various rules. There are also a few "ad hoc" rules created to deal properly with phenomena such as conjunctions and disjunctions. The order in which rules are applied also greatly affects the performance of this filter.

The Clark and Curran Parser Filter is no longer used in SaskNet, but it is a good example of the type of filter that would need to be created if we chose to change the parsing and data analysis tools. The first-order semantic output of the Clark and Curran parser is radically different from the output of CCG2Sem, which is currently used, but with the creation of a simple filter it can be fully integrated into SaskNet with little difficulty.

nmod(Vinken_2, Pierre_1)
nmod(years_5, 61_4)
comp(old_6, years_5)
ncsubj(old_6, Vinken_2)
detmod(board_11, the_10)
dobj(join_9, board_11)
nmod(director_15, nonexecutive_14)
detmod(director_15, a_13)
comp(as_12, director_15)
vmod(join_9, as_12)
comp(Nov._16, 29_17)
vmod(join_9, Nov._16)
xcomp(will_8, join_9)
ncsubj(will_8, Vinken_2)
ncsubj(join_9, Vinken_2)
<c> Pierre|NNP|N/N Vinken|NNP|N ,|,|, 61|CD|N/N years|NNS|N old|JJ|(S[adj]\NP)\NP ,|,|, will|MD|(S[dcl]\NP)/(S[b]\NP) join|VB|(S[b]\NP)/NP the|DT|NP[nb]/N board|NN|N as|IN|((S\NP)\(S\NP))/NP a|DT|NP[nb]/N nonexecutive|JJ|N/N director|NN|N Nov.|NNP|((S\NP)\(S\NP))/N[num] 29|CD|N[num] .|.|.

nmod(Vinken_2, Mr._1)
nmod(N.V._7, Elsevier_6)
nmod(group_12, publishing_11)
nmod(group_12, Dutch_10)
detmod(group_12, the_9)
conj(,_8, group_12)
conj(,_8, N.V._7)
comp(of_5, group_12)
comp(of_5, N.V._7)
nmod(chairman_4, of_5)
dobj(is_3, chairman_4)
ncsubj(is_3, Vinken_2)
<c> Mr.|NNP|N/N Vinken|NNP|N is|VBZ|(S[dcl]\NP)/NP chairman|NN|N of|IN|(NP\NP)/NP Elsevier|NNP|N/N N.V.|NNP|N ,|,|, the|DT|NP[nb]/N Dutch|NNP|N/N publishing|VBG|N/N group|NN|N .|.|.

Figure 3.6: Sample output from the Clark and Curran Parser for the text "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group".
Figure 3.7: The Clark and Curran parser filter output for the input given in Figure 3.6.

3.3.2 CCG2Sem Filter

The CCG2Sem filter takes advantage of the recursive nature of CCG2Sem's Prolog output. The program is written recursively, handling one predicate at a time and continually calling itself on any sub-predicates. Like the Clark and Curran parser filter, the CCG2Sem filter is essentially a set of rules mapping predicates to network fragments; however, with the output of CCG2Sem the predicates are nested recursively, so the filter must deal with them recursively. Table 3.2 shows the rules for a number of CCG2Sem's Prolog predicates.

Prolog Predicate        Rule
drs(A, B)               Create one node for each of the discourse referents in A;
                        recursively call filter on B
prop(x, B)              Recursively call filter on B; set x as the parent node for
                        the network fragment created by B
named(x, text, type)    Set x to named entity type type; give node x label text
pred(text, x)           Give node x label text
pred('event', x)        Set x to type relNode
pred(text, [x,y])       Create relNode z with label text; set subject link of z to x;
                        set object link of z to y
pred('agent', [x,y])    Set agent link of y to x
eq(x, y)                Create relNode z with label is; set subject link of z to x;
                        set object link of z to y
or(A, B)                Create parentNode x with label or; create unlabelled
                        parentNodes y and z; set x as parent of y and z; recursively
                        call filter on A, with y as the parent node for the fragment
                        created by A; recursively call filter on B, with z as the
                        parent node for the fragment created by B

Table 3.2: A sample of the rules used by the CCG2Sem filter. Capital letters represent Prolog statements; lower-case letters represent Prolog variables.

Several of the rules used by the CCG2Sem filter are context sensitive (e.g., if a predicate tries to label a node which is in one of its parent nodes, it is treated as an attribute instead). There are also a number of "special case" rules, such as those shown where the predicate is either 'agent' or 'event'.

The CCG2Sem filter continues calling itself recursively, creating networks within networks (this results in the hierarchical nature of the network), until it has processed the entire Prolog DRS structure and we are left with a semantic network which represents all of the information in the discourse. A code sketch of this recursion is given at the end of the chapter.

smerge(
  drs(
    [[1001, 1002]:x0, [1004, 1005]:x1, [1006]:x2, [1010]:x3, [1009]:x4, [1013]:x5, [1016]:x6],
    [ [2001]:named(x0, mr, ttl),
      [1002, 2002]:named(x0, vinken, per),
      [1001]:named(x0, pierre, per),
      [1004]:card(x1, 61, ge),
      [1005]:pred(year, [x1]),
      [1006]:prop(x2, drs([], [[1006]:pred(old, [x0])])),
      [1006]:pred(rel, [x2, x1]),
      []:pred(event, [x2]),
      [1011]:pred(board, [x3]),
      [1016, 1017]:timex(x6, date([]:'XXXX', [1016]:'11', [1017]:'29')),
      [1009]:pred(join, [x4]),
      [1009]:pred(agent, [x4, x0]),
      [1009]:pred(patient, [x4, x3]),
      [1014]:pred(nonexecutive, [x5]),
      [1015]:pred(director, [x5]),
      [1012]:pred(as, [x4, x5]),
      [1016]:pred(rel, [x4, x6]),
      []:pred(event, [x4])]),
  drs(
    [[2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]:x7, [2006, 2007]:x8, [2009]:x9, [2011]:x10, [2003]:x11],
    [ [2004]:pred(chairman, [x7]),
      [2006, 2007]:named(x8, elsevier_nv, loc),
      [2005]:pred(of, [x7, x8]),
      [2010]:pred(dutch, [x9]),
      [2011]:pred(publishing, [x10]),
      []:pred(nn, [x10, x9]),
      [2012]:pred(group, [x9]),
      [2005]:pred(of, [x7, x9]),
      [2003]:prop(x11, drs([], [[2003]:eq(x0, x7)])),
      []:pred(event, [x11]) ]))
).

Figure 3.8: Sample output from CCG2Sem for the text "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group".

Figure 3.9: CCG2Sem filter output for the input given in Figure 3.8.
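To make the recursive dispatch of Section 3.3.2 concrete, here is a minimal sketch of how a filter of this kind might walk a CCG2Sem term. The cases shown correspond to the first three rows of Table 3.2; the term representation and all class and method names are illustrative assumptions, not SaskNet's actual code.

```java
import java.util.List;

/** Minimal representation of a Prolog term: a functor plus its arguments. */
record Term(String functor, List<Object> args) {}

/** Illustrative interface to a network fragment under construction. */
interface NetworkFragment {
    void createObjNode(String referent);
    void labelNode(String node, String label, String neType);
    NetworkFragment newChildFragment(String parentNode);
}

/** Sketch of the recursive rule dispatch used by the CCG2Sem filter (Table 3.2). */
class Ccg2SemFilter {
    void apply(Term t, NetworkFragment net) {
        switch (t.functor()) {
            case "drs" -> {
                // drs(A, B): one node per discourse referent, then recurse on conditions
                for (Object referent : (List<?>) t.args().get(0)) {
                    net.createObjNode(referent.toString());
                }
                for (Object condition : (List<?>) t.args().get(1)) {
                    apply((Term) condition, net);   // recursive call on each sub-predicate
                }
            }
            case "prop" -> {
                // prop(x, B): the fragment built from B gets x as its parent node
                NetworkFragment sub = net.newChildFragment(t.args().get(0).toString());
                apply((Term) t.args().get(1), sub);
            }
            case "named" ->
                // named(x, text, type): label node x and record its entity type
                net.labelNode(t.args().get(0).toString(),
                              t.args().get(1).toString(),
                              t.args().get(2).toString());
            // ... the remaining rules of Table 3.2 (pred, eq, or) follow the same pattern
            default -> { /* predicates without a rule are ignored in this sketch */ }
        }
    }
}
```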
Chapter 4
Spreading Activation

Spreading activation is a common feature in connectionist models of knowledge and reasoning, and in particular is usually connected with the neural network paradigm. Spreading activation in neural networks is the process by which activation can spread from one node in the network to all adjacent nodes, in a similar manner to the firing of a neurone in the human brain. Nodes in a spreading activation neural network receive activation from their surrounding nodes, and if the total amount of accumulated activation exceeds some threshold, that node then fires, sending its activation to all nodes to which it is connected. The amount of activation sent between any two nodes is proportional to the strength of the link between those nodes with respect to the strength of all other links connected to the firing node. A simple activation function is given in (4.1).

\[
\mathrm{activation}_{i,j} = \alpha \cdot \frac{\mathrm{weight}_{i,j}}{\sum_{k=1}^{j-1} \mathrm{weight}_{i,k} + \sum_{k=j+1}^{n_i} \mathrm{weight}_{i,k}} \tag{4.1}
\]

Symbol definitions:
α: firing variable which fluctuates depending on node type
activation_{x,y}: amount of activation sent from node x to node y when node x fires
weight_{x,y}: strength of the link between node x and node y
n_x: total number of nodes connected to node x
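As a concrete reading of (4.1): when node i fires, the activation sent along link j is that link's weight divided by the total weight of i's other links, scaled by α. A direct transcription, reusing the SemNode/SemLink sketch from Section 3.2.1:

```java
/** Sketch: activation sent along one link when a node fires, per equation (4.1). */
final class Activation {
    static double toNeighbour(SemNode firing, SemLink link, double alpha) {
        double otherWeights = 0.0;
        for (SemLink l : firing.links) {
            if (l != link) {
                otherWeights += l.strength;   // sum the strengths of all *other* links
            }
        }
        // a node with a single link sends all of its (scaled) activation along it
        return otherWeights == 0.0 ? alpha : alpha * link.strength / otherWeights;
    }
}
```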
4.1 History of Spreading Activation

The discovery that human memory is organised semantically, and that concepts which are semantically related can excite one another, came from the field of psycholinguistics. Meyer and Schvaneveldt [1971] showed that when participants were asked to classify pairs of words, having a pair of words which were semantically related increased both the speed and the accuracy of the classification. They hypothesised that when one word is retrieved from memory, other semantically related words are primed, and retrieval of those words is thus facilitated.

The formal theory of spreading activation as we know it can be traced back to the work of Quillian [1969], who proposed a formal model for spreading activation in a semantic network. This early theory was little more than a marker passing method, where the connection between any two nodes was found by passing markers to all adjacent nodes until two markers met, similar to a breadth-first search. It was the work of Collins and Loftus [1975] that added the main features of what we today consider spreading activation, such as signal attenuation, summation of activation from input nodes, and firing thresholds.

Despite the obvious theoretical advantages of Collins and Loftus' model, computational constraints have meant that much of the work carried out under the title of "spreading activation" has very rarely used the full model. Many researchers used a simplified marker passing model [Hirst, 1987], or used a smaller or simplified network, because the manual creation of a semantic network (or at least one that fits our definition from Section 3.1) was too time consuming [Crestani, 1997; Preece, 1981].

The application of spreading activation to information retrieval gained a great deal of support in the 1980s and early 1990s (Salton and Buckley [1988], Kjeldsen and Cohen [1988]); however, the difficulty of manually creating networks, combined with the computational intractability of automatically creating them, caused most researchers to abandon this course [Preece, 1981]. Recent improvements in automatic network creation have brought hope for further research in spreading activation theory, and we hope that SaskNet will not only utilise spreading activation in its creation, but also be a testbed for future work in spreading activation research.

4.2 Spreading Activation in SaskNet

The semantic network created for SaskNet has been designed specifically for use with spreading activation. Each node maintains its own activation level and threshold, and can independently send activation to all surrounding nodes. A monitor process controls the activation and records the order and frequency of node firing. Each of the various types of nodes (object, relation, parent, attribute, etc.) can have its own firing threshold and even its own firing algorithm. Each node type has a global signal attenuation value that controls the percentage of the activation that a node of this type passes on to each of its neighbours when firing.

Spreading activation is by nature a parallel process; however, it is implemented sequentially in SaskNet for purely computational reasons. While future work may allow parallelisation of the algorithm, the current system has been designed to ensure that the sequential nature of the processing does not adversely affect the outcome.

Two separate implementations of the firing algorithm have been created. The first is a pulsing algorithm, where each node which is prepared to fire at a given stage fires, and the activation is suspended until all nodes have finished firing. This is analogous to having the nodes fire simultaneously on set pulses of time. The second implementation uses a priority queue to allow the nodes with the greatest amount of activation to fire first (for more detailed information see Section 4.4). The second algorithm is more analogous to the asynchronous firing of neurones in the human brain. Both algorithms have been fully implemented, and the user can choose which firing method the system should use.

4.3 Information Integration

The power of SaskNet comes from its ability to integrate information from various sources into a single cohesive representation. This is the main goal of the update algorithm (see Section 4.5). Information integration allows a system to take information about a concept or object from two or more sources, and combine that information in such a way that new deductions can be made. For example, the famous syllogism below is only possible if we know that men in the major premise and man in the minor premise refer to the same class of object (i.e., that we have resolved the two labels men and man into a single concept).

All men are mortal.
Socrates is a man.
Therefore Socrates is mortal.

To further illustrate the point, consider what happens when such an integration takes place erroneously.

All beetles are insects.
Paul McCartney is a Beatle.
Therefore Paul McCartney is an insect.
The reason that this example breaks down is that we incorrectly integrated the two premises by aligning beetles and Beatle, which, despite being very similar in spelling and identical in pronunciation, are semantically distinct concepts.

Most of the research on information integration has been done in the database paradigm, using string similarity measurements to align database fields [Bilenko et al., 2003]. Research on natural language information integration has mostly centred on document clustering based on attributes gained from pattern matching [Wan et al., 2005]. One particularly interesting line of research is the work of Guha and Garg [2004]. They propose a search engine which clusters document results relating to a particular person. The proposed methodology is to create binary first-order logic predicates (e.g., first name(x, Bob), works for(x, IBM)) which can be treated as attributes for a person, and then to use those attributes to cluster documents about one particular individual. This amounts to a simplified version of the problem SaskNet attempts to solve, using a simplified network and limiting the domain to personal information; the results, however, are promising.

4.4 Firing Algorithm

Ultimately, each node in a neural network should act independently, firing whenever it receives the appropriate amount of activation. This asynchronous communication between nodes is more directly analogous to the workings of the human brain, and most spreading activation theories assume a completely asynchronous model. In practice, it is difficult to have all nodes operating in parallel. SaskNet attempts to emulate an asynchronous network through its firing algorithm.

Each network has a SemFire object (see Section 3.2.1 for the class diagram) which controls the firing of the nodes in that network. When a node in the network is prepared to fire, it sends a firing request to the SemFire object. The SemFire object then holds the request until the appropriate time before sending a firing permission message to the node to allow it to fire. Two separate firing algorithms have been implemented in SaskNet.

Pulse Firing

The pulse firing algorithm emulates a network where all nodes fire simultaneously at a given epoch of time. Each node that is prepared to fire at a given time fires, and the system waits until all nodes have fired and all activation levels have been calculated before beginning the next firing round.

To implement this algorithm, the SemFire object retains two lists of requests. The first is the list of firing requests which will be fulfilled on this pulse; we will call this the pulse list. The second list contains all requests made during the current pulse; we will call this the wait list. The SemFire object fires all of the nodes with requests in the pulse list, removing a request once it has been fulfilled (in this algorithm the order of firing is irrelevant), while placing all firing requests it receives into the wait list. Once the pulse list is empty and all requests from the current pulse have been collected in the wait list, the SemFire object moves all requests from the wait list into the pulse list, and is then ready for the next pulse.
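A minimal sketch of this pulse list / wait list mechanism, reusing the SemFire interface from the Section 3.2.1 sketch (the firing step itself is elided):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of the pulse firing algorithm: all nodes ready at a given epoch fire together. */
class PulseSemFire implements SemFire {
    private Deque<SemNode> pulseList = new ArrayDeque<>(); // requests fulfilled this pulse
    private Deque<SemNode> waitList = new ArrayDeque<>();  // requests made during this pulse

    @Override
    public void requestFiring(SemNode node) {
        waitList.add(node);   // new requests always wait for the next pulse
    }

    /** Run pulses until no node is requesting to fire. */
    void run() {
        while (!waitList.isEmpty()) {
            // promote all pending requests to the current pulse
            Deque<SemNode> swap = pulseList;
            pulseList = waitList;
            waitList = swap;
            while (!pulseList.isEmpty()) {
                fire(pulseList.poll());   // firing may add new requests to the wait list
            }
        }
    }

    private void fire(SemNode node) {
        // send activation to each neighbour in proportion to link strength (eq. 4.1)
        // and reset the node's potential; elided here
    }
}
```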
Priority Firing

The priority firing algorithm emulates a network where the amount of activation received by a node dictates the speed with which the node fires. Nodes receiving higher amounts of activation will fire faster than nodes which receive just enough to meet their firing threshold.

To implement this algorithm, the SemFire object retains a priority queue of requests, where each request is assigned a priority based on the amount of activation the node received over its firing threshold, as given in (4.2). The SemFire object fulfils the highest-priority request; if a new request is received while the first request is being processed, it is added to the queue immediately.

\[
\mathrm{priority} = \alpha \cdot (\mathrm{activation\ received} - \mathrm{firing\ level}) \tag{4.2}
\]

where α is a node-type variable.

The two firing algorithms would be equivalent if all of the activation in the network spread equally. However, when a node fires it sends out a set amount of activation, and excess activation received above the firing threshold disappears from the network. The effect of this disappearing activation is that the order in which nodes fire can change the final pattern of activation in the network. It is therefore important that both firing algorithms be implemented and tested, so that a choice can be made based on their contribution to the performance of the system.

4.5 Update Algorithm

The update algorithm takes a smaller network or network fragment (the update network) and integrates it into a larger network (the main network). Essentially the same algorithm is used for updating at the sentence level and at the document level. When updating at the sentence level, the update network represents the next sentence in the document and the main network represents all previous sentences in the document. When updating at the document level, the update network represents the document, and the main network represents all of the documents that have been processed by the system.

This algorithm has not yet been implemented, and so its details cannot be given in this paper. This section will attempt to give a high-level overview of how the algorithm will work by walking through a simple example.[1] For this example, we will use the update network shown in Figure 4.2 being applied to the main network shown in Figure 4.1. All nodes in this example will be referred to by their ID field.

[1] This example uses a simplified network to avoid unnecessary details. As this algorithm has not yet been implemented, the numbers used are speculative at best and not based on the actual calculations that will be performed. This example should be treated only as an illustration of the algorithm at a very high level.

Figure 4.1: An example main network containing information about United States politics, gardening and violence.

Figure 4.2: An example update network created from the sentence "Bush beat Gore to the Whitehouse".

Initially, all object nodes from the update network are matched with any similar nodes from the main network. The nodes are compared on simple similarity characteristics such as string similarity and named entity type similarity. A similarity score is then calculated for each node pairing, producing the matrix shown in Table 4.1.
        georgebush   bush   algore   gore   whitehouse
bu?     0.5          0.7
go?                         0.5      0.7
wh?                                         0.8

Table 4.1: Similarity matrix: initial scoring.

As we can see in Table 4.1, the initial scoring is more likely to match bu? with bush instead of the correct match with georgebush. This is because the labels of bu? and bush are identical, which outscores the named entity type similarity between bu? and georgebush.

Once the initial scoring is completed, the algorithm chooses a node from the update network (in this case bu?) and uses the scores in its row of the similarity matrix to set the initial activation level of the nodes in the main network. In this instance, the algorithm will fire the bush and georgebush nodes, with the bush node receiving more initial activation. After the activation has spread through the system, the algorithm checks all of the nodes in the main network and records the amount of activation they received. It then uses that score to update the scores for all the non-chosen nodes in the update network. In this case, algore would have received some activation, and whitehouse would have received a fairly large amount of activation. We therefore have increased confidence that those are the correct nodes to use in our mapping, and so we increase their scores accordingly. Likewise, the nodes which received no activation have their scores decreased. The result is shown in Table 4.2.

        georgebush   bush   algore   gore   whitehouse
bu?     0.5          0.7
go?                         0.55     0.5
wh?                                         0.9

Table 4.2: Similarity matrix: after testing the bu? node.

Note that in Table 4.2 the first row was not changed. This is the row that we used in our initial firing, and thus we cannot gain accurate information about the changes in its status.

The algorithm now chooses another node from the update network and repeats the process. In this case it chooses wh?. After firing whitehouse, the georgebush and algore nodes both receive activation, and their scores are updated accordingly, resulting in the matrix shown in Table 4.3.

        georgebush   bush   algore   gore   whitehouse
bu?     0.65         0.5
go?                         0.65     0.3
wh?                                         0.9

Table 4.3: Similarity matrix: after testing the wh? node.

The algorithm continues in this manner until all nodes have been fired. Note that on this iteration it chooses go?, which results in a much larger effect than if it had been chosen on the first iteration, because its similarity score to algore is now much higher, resulting in less wasted activation going to the gore node. After the third iteration, the similarity matrix appears as in Table 4.4.

        georgebush   bush   algore   gore   whitehouse
bu?     0.7          0.3
go?                         0.65     0.3
wh?                                         0.95

Table 4.4: Similarity matrix: after testing the go? node.

This process repeats until the scores converge or some stopping criterion is met (either a set number of iterations, or a minimum change in matrix values). After each iteration the scores in the similarity matrix improve, which in turn increases the effectiveness of the next iteration. The process must be repeated incrementally so that small amounts of activation can be used, preventing small mistakes in the initial scoring from overpowering the rest of the algorithm. Eventually, our example should map wh? to whitehouse, bu? to georgebush and go? to algore, resulting in the updated network shown in Figure 4.3.

Figure 4.3: Network resulting from application of the update algorithm.

This is a very simplified example, and many features of the algorithm have been abstracted away. There may also be additional features which will be added as the algorithm is developed. However, we believe that this example demonstrates that the update algorithm is theoretically sound. The next stage of the research will be to focus on a complete implementation of the system (see Section 7.1).
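Since the update algorithm is not yet implemented, the following is a speculative sketch of the fire-and-rescore loop described above; the similarity-matrix representation and the two helper methods are hypothetical placeholders for the actual spreading activation machinery.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Speculative sketch of the update algorithm's fire-and-rescore loop (Section 4.5). */
class UpdateAlgorithm {
    // similarity.get("bu?").get("georgebush") -> 0.5, as in Table 4.1
    final Map<String, Map<String, Double>> similarity = new HashMap<>();

    void run(List<String> updateNodes, int maxIterations, double minChange) {
        double change = Double.MAX_VALUE;
        for (int i = 0; i < maxIterations && change > minChange; i++) {
            change = 0.0;
            for (String chosen : updateNodes) {
                // fire candidate main-network nodes, scaled by the chosen row's scores
                fireWithInitialActivation(similarity.get(chosen));
                // rescore every *other* row from the activation its candidates received
                for (String other : updateNodes) {
                    if (!other.equals(chosen)) {
                        change += rescoreFromReceivedActivation(similarity.get(other));
                    }
                }
            }
        }
        // finally, map each update node to its highest-scoring candidate (omitted)
    }

    /** Hypothetical: spread activation into the main network, scaled by the given row. */
    void fireWithInitialActivation(Map<String, Double> row) { /* ... */ }

    /** Hypothetical: adjust a row's scores from received activation; returns total change. */
    double rescoreFromReceivedActivation(Map<String, Double> row) { return 0.0; }
}
```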
4.6 Cleanup Algorithm

It is possible that the update algorithm will not have sufficient information to justify the merging of two nodes, even if those two nodes do in fact represent the same real-world object. It is quite likely that in some cases two nodes which at first appear to have little or nothing in common will later be shown to represent the same entity. The cleanup algorithm attempts to resolve these problems by providing a mechanism for merging nodes after they have already been placed into the network. Like the update algorithm, the cleanup algorithm has not yet been implemented, so this section will explain the algorithm at a high level but will not give details of the implementation.

One of the major advantages of using spreading activation in SaskNet is that if two nodes represent the same real-world entity, eventually they will become so closely linked that when one of the nodes fires, the other will surely receive enough activation to fire as well. For example, if in Figure 4.1 from the previous section we were to add a new node with the label "President Bush", eventually that node should become linked to the nodes with the labels "White House" and "President of the United States" (and also any new nodes added to the system that relate to both objects).

The cleanup algorithm first chooses a source node from the network, and then uses spreading activation to identify a partner node which may represent the same semantic object. The source node is fired, but the activation is constrained so that it can only travel through one relation. This is equivalent to "pausing" the activation after it has travelled one relation away. At that point, all of the objects with activation (except for the source node) are recorded in a list we will call the neighbour list. The activation is then allowed to proceed through one more relation. The nodes which fire on this second "pulse" are placed into a list called the potential partners list. Any nodes which were in the neighbour list are removed from the potential partners list, and the potential partners list is then sorted by the amount of activation currently held. The node in the potential partners list with the highest level of activation is our partner node.

The same process is repeated, but this time using the partner node as our initial source of activation. If, after sorting the potential partners list, the source node is among the top nodes, it receives a high similarity score. This is added to a similarity score calculated from label string similarity and named entity type to produce a final similarity score. If that score is above a cutoff threshold, we have found a match and the two nodes can be mapped together.
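As the cleanup algorithm is also unimplemented, the following is a speculative sketch of the two-pulse partner search just described; the one-relation spreading step is a hypothetical helper standing in for the constrained activation described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Speculative sketch of the cleanup algorithm's partner search (Section 4.6). */
class CleanupAlgorithm {
    /** Fire one relation out from source, record the neighbours, fire one relation
     *  further, and keep the most activated node that is neither the source nor
     *  one of its direct neighbours. The symmetric check (repeating the search from
     *  the partner node) and the string/entity-type score are applied before merging. */
    SemNode findPartner(SemNode source) {
        List<SemNode> neighbourList = spreadOneRelation(List.of(source));
        List<SemNode> potentialPartners = spreadOneRelation(neighbourList);
        potentialPartners.removeAll(neighbourList);
        potentialPartners.remove(source);
        return potentialPartners.stream()
                .max(Comparator.comparingDouble(n -> n.potential))
                .orElse(null);
    }

    /** Hypothetical: fire the given nodes, let activation travel exactly one
     *  relation, and return the newly activated nodes. */
    List<SemNode> spreadOneRelation(List<SemNode> sources) {
        return new ArrayList<>(); /* ... */
    }
}
```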
To clarify the intuition behind this algorithm, consider Figure 4.4. We are attempting to find a node which has relations similar to our source node's, but which does not have a direct link to it. Thus, when we send activation out one link from our source node, we should activate many nodes closely linked with our partner, but not the partner node itself. When we then fire all of the neighbours, we should send a great deal of activation to our partner node. The second stage of the algorithm simply repeats the process with the source and partner nodes switched, so that we do not assume a partner node is semantically similar to our source node merely because it has strong links to a great number of nodes. To understand why this is necessary, imagine a scenario in which a single node has a strong connection to almost every node in a large network. Choosing any node not directly linked to it as a source will likely result in choosing that node as a partner; however, the second stage of the algorithm would produce a very large neighbour list, and the node is thus unlikely to send much of its activation back to our original source node.

Figure 4.4: Finding a partner node.

Chapter 5
Similar Projects

The potential usefulness of a large-scale semantic knowledge base can be attested to by the number of projects currently underway to build one. In this section we will survey several of the larger and more successful efforts at building a network similar to that which the SaskNet project hopes to achieve. There are two broad classes of projects that attempt to build large-scale knowledge networks: traditionally, manual creation has been the methodology of choice, but more recently projects using automated creation have begun.

5.1 Manually Constructed Networks

Manual creation of large-scale semantic networks is a very labour-intensive task. Projects of this nature can easily take decades to complete and require hundreds of contributors. However, in most cases manual creation ensures that a highly reliable network is created, and every entry in the network can be used with confidence, as it has been tested by humans.

By far the most widely used knowledge network in development today is WordNet [Fellbaum, 1998]. Begun in 1985 at Princeton University, WordNet organises words into senses, or distinct meanings, which are connected through a discrete number of semantic relations, and contains over 200 000 word senses [Word Net]. WordNet is designed following psycholinguistic theories of human memory, and is mainly focused on formal taxonomies of words. It is primarily a lexicographic resource rather than an attempt at a semantic knowledge network; however, it has in many cases been used to approximate a semantic network (see Chapter 6), and is therefore included in this list.

The Cyc Project [Lenat, 1995] focuses on common knowledge: assertions which are too simple and obvious to be given in dictionaries or other forms of text, but that a native speaker of English can take for granted that his/her audience knows. The Cyc Project is manually created one assertion at a time by a team of knowledge engineers, and contains over 2.2 million assertions relating over 250 000 terms [Matuszek et al., 2006].

ConceptNet [Liu and Singh, 2004b] (previously known as OMCSNet) uses a semantic network similar to the network created for SaskNet. Nodes are small fragments of English connected by directed relations.
The primary difference between ConceptNet and the semantic network formalism used in SaskNet is that the relations in ConceptNet are selected from a set of 20 pre-defined relations, and that ConceptNet contains only definitional data and thus does not require a hierarchical structure. ConceptNet uses the OpenMind corpus [Singh et al., 2002] to acquire its knowledge. This is particularly interesting because the OpenMind corpus is created by the general public: visitors to a webpage are presented with text such as “A knife is used for ...”, and are asked to provide text fragments to complete the sentence. This approach has allowed ConceptNet to grow rapidly; it now contains over 1.6 million edges connecting more than 300 000 nodes [Liu and Singh, 2004a].

5.2 Automatically Constructed Networks

The labour intensive nature of manually creating a semantic network makes automatic creation an obvious goal for researchers [Crestani, 1997]. However, it is only recently that advances in natural language processing techniques have made automatic creation a possibility. Semantic networks created automatically will naturally be more likely to contain errors than manually created networks; however, for many tasks the great decrease in the time and labour required to build a network, combined with the ability to use larger corpora, will make up for the decrease in accuracy [Dolan et al., 1993].

There have recently been promising results in semi-automated knowledge acquisition. Pantel and Pennacchiotti [2006] describe the Espresso system, which automatically harvests relations from natural language text. Although the system uses semi-supervised learning for each relation, and thus requires human intervention for each new relation type, it is nonetheless very promising that a simple pattern matching based algorithm has been shown to perform well on a web based corpus [Pantel et al., 2004].

The most promising project currently underway in the field of automatic network construction is MindNet [Dolan et al., 1993]. Started in 1993 at Microsoft Research, MindNet uses a wide coverage parser to extract pre-defined relations from dictionary definitions. To illustrate the difference the automated approach makes, the MindNet network of over 150 000 words connected by over 700 000 relations can be created in a matter of hours on a standard personal computer [Richardson et al., 1998].

Of all the projects listed here, MindNet is the most similar in methodology to SaskNet. However, MindNet uses a more traditional phrase-structure parser and analyses only dictionary definitions, which tend to have much less linguistic variation than newspaper text and are more limited in the type of information they convey. MindNet also uses only a small set of pre-defined relations and is essentially an atomic network (see Section 3.1), whereas SaskNet's relations are defined by the text itself and it is capable of handling arbitrarily complex node structures. The largest difference between MindNet and SaskNet is therefore that SaskNet can accommodate a much more diverse range of inputs and can represent a much wider range of information. This will allow SaskNet to use very large document collections to create its network, which should lead to a larger, more diverse and ultimately more useful network.

Chapter 6

Potential Uses

There is a wide variety of projects that could benefit from the successful development of the SaskNet project.
In this section we mention a few areas and projects in which a large scale semantic knowledge network would be useful.

As we have already mentioned, spreading activation has been seen as promising both for its applications and for its basis in theories of human cognition. Further research on spreading activation in semantic networks could therefore be very beneficial. Unfortunately, this area of research has remained essentially dormant for more than a decade, largely due to computational difficulties and a lack of large scale semantic networks to use as a testbed [Preece, 1981]. We believe that if the SaskNet project is successful, it will not only provide this testbed, but also show that spreading activation is computationally feasible in a large scale network. The SaskNet project could thus provide both motivation for further research into spreading activation and an accessible platform on which to perform that research.

Many projects have used WordNet to measure the semantic similarity between concepts [Budanitsky and Hirst, 2006] for tasks such as query expansion in information retrieval and word sense disambiguation. WordNet is not particularly well suited to this task, since it was created from formal taxonomies of words rather than from semantic relations between concepts. As Hirst and St-Onge point out, “In WordNet, stew and steak are not closely related, but public and professional are.” [Fellbaum, 1998]. SaskNet would be a much better tool for most of these projects because it is actually constructed as a semantic network, and thus distance between concepts in the SaskNet network is much more directly analogous to semantic similarity than distance in WordNet.

The Cyc and ConceptNet projects have been applied with a great deal of success in areas such as question answering [Curtis et al., 2005], word sense disambiguation [Curtis et al., 2006], and predictive text entry [Stocky et al., 2004]. Any of these applications could benefit from SaskNet, either in conjunction with existing networks or possibly as a replacement. Since SaskNet will be constructed automatically, it will grow much more rapidly than either Cyc or ConceptNet, and thus many of the projects which use those networks would benefit from a larger, more easily updated network.

There are clearly many projects currently using other semantic resources that could benefit from SaskNet. Because of its automatic construction, it would be possible to build domain specific networks for particular tasks, or simply to expand the network to a much larger scale than is currently available from any of the above mentioned resources. We believe that while the automated nature of SaskNet will necessarily produce more errors than are likely to occur in the other resources, its short construction time, wide coverage and flexibility will make it useful to current and future projects alike.

Chapter 7

Future Work

The SaskNet project is still under heavy development. The semantic network and underlying spreading activation algorithms have been developed, but the update and cleanup algorithms have not yet been completed; their development is the obvious top priority for the project. Once a fully functioning system has been developed, we can begin running it on real data to evaluate its performance.
There are many parameters throughout the system that can be adjusted to optimise performance, such as the activation levels of the various node types and the similarity scoring metrics for the update algorithm. Some form of pronoun resolution will also likely need to be added to the system to complement the pronoun resolution offered by CCG2Sem.

Once SaskNet has a fully developed and optimised network, there are still many possible improvements that could be made. Much of the network design (particularly the SemNode class) is intended to minimise SaskNet's memory requirements. Once a fully functioning network has been built, we can weigh the size of the network against its speed, and possibly sacrifice some of the memory saving features to gain an increase in speed. Fortunately, due to the automated nature of SaskNet, building a network takes hours or days rather than the months or years required for manually created networks. We can therefore easily build sample networks to test various parameters, and even change the design of the network itself, without requiring a great amount of time and effort.

In the immediate future SaskNet will run on a single CPU system. It may eventually be desirable to migrate SaskNet to a larger computer cluster, which could increase both its potential size and its speed. As mentioned earlier in the paper, it may also eventually be desirable to re-implement the firing algorithms to be truly parallel.

7.1 Timeline

This section provides estimated completion dates for the major stages of SaskNet's development over the coming year.

    Enhanced pronoun resolution                                   October 2006
    Completion of update algorithm                                December 2006
    Completion of cleanup algorithm                               January 2007
    Optimisation of activation and update algorithm parameters    February 2007
    Building large scale test networks                            March 2007
    Adding additional features & optimisations                    June 2007

Bibliography

M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg. Adaptive name matching in information integration. IEEE Intelligent Systems, 18:16–23, Sep/Oct 2003.

Johan Bos. Towards wide-coverage semantic interpretation. In Proceedings of the Sixth International Workshop on Computational Semantics (IWCS-6), pages 42–53, 2005.

Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. Wide-coverage semantic representations from a CCG parser. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), pages 1240–1246, Geneva, Switzerland, 2004.

T. Briscoe and J. Carroll. Robust accurate statistical annotation of general text. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1499–1504, Las Palmas, Gran Canaria, 2002.

Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32:13–47, March 2006.

Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, pages 132–139, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.

Kenneth Ward Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29, 1990.

Stephen Clark and James R. Curran. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL '04), pages 104–111, Barcelona, Spain, 2004.

Allan M. Collins and Elizabeth F. Loftus. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407–428, 1975.
Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, 1999.

F. Crestani. Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6):453–482, December 1997.

Jon Curtis, G. Matthews, and D. Baxter. On the effective use of Cyc in a question answering system. In Papers from the IJCAI Workshop on Knowledge and Reasoning for Answering Questions, Edinburgh, Scotland, 2005.

Jon Curtis, D. Baxter, and J. Cabral. On the application of the Cyc ontology to word sense disambiguation. In Proceedings of the Nineteenth International FLAIRS Conference, pages 652–657, Melbourne Beach, FL, May 2006.

William B. Dolan, L. Vanderwende, and S. Richardson. Automatically deriving structured knowledge base from on-line dictionaries. In Proceedings of the Pacific Association for Computational Linguistics, Vancouver, British Columbia, April 1993.

Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, USA, 1998.

Emden R. Gansner and Stephen C. North. An open graph visualization system and its applications to software engineering. Software: Practice and Experience, 30(11):1203–1233, 2000.

R. V. Guha and A. Garg. Disambiguating people in search. In 13th World Wide Web Conference (WWW 2004), New York, USA, 2004.

G. Hirst. Semantic Interpretation and the Resolution of Ambiguity. Studies in Natural Language Processing. Cambridge University Press, Cambridge, UK, 1987.

H. Kamp. A theory of truth and semantic representation. In J. Groenendijk et al., editors, Formal Methods in the Study of Language. Mathematisch Centrum, 1981.

Hans Kamp and Uwe Reyle. From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic, Dordrecht, 1993.

Rick Kjeldsen and Paul R. Cohen. The evolution and performance of the GRANT system. Technical report, University of Massachusetts, Amherst, MA, USA, 1988.

Douglas B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.

H. Liu and P. Singh. ConceptNet: a practical commonsense reasoning tool-kit. BT Technology Journal, 22:211–226, October 2004a.

H. Liu and P. Singh. Commonsense reasoning in and over natural language. In Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES 2004), Wellington, New Zealand, 2004b.

Margaret Masterman. Semantic message detection for machine translation, using an interlingua. In Proceedings of the 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, pages 438–475, London, 1962.

Cynthia Matuszek, J. Cabral, M. Witbrock, and J. DeOliveira. An introduction to the syntax and content of Cyc. In 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, Stanford, CA, USA, March 2006.

D. E. Meyer and R. W. Schvaneveldt. Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2):227–234, 1971.

Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the Conference on Computational Linguistics / Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, 2006.
Patrick Pantel, Deepak Ravichandran, and Eduard Hovy. Towards terascale knowledge acquisition. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), pages 771–777, Geneva, Switzerland, 2004.

S. Preece. A Spreading Activation Model for Information Retrieval. PhD thesis, University of Illinois, Urbana, IL, 1981.

M. Ross Quillian. The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12(8):459–476, 1969.

Stephen D. Richardson, William B. Dolan, and Lucy Vanderwende. MindNet: Acquiring and structuring semantic information from text. In Proceedings of COLING '98, 1998.

G. Salton and C. Buckley. On the use of spreading activation methods in automatic information retrieval. In SIGIR '88: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 147–160, New York, NY, USA, 1988. ACM Press.

Push Singh, Thomas Lin, Erik T. Mueller, Grace Lim, Travell Perkins, and Wan Li Zhu. Open Mind Common Sense: Knowledge acquisition from the general public. In Lecture Notes in Computer Science, volume 2519, pages 1223–1237. Springer, Berlin/Heidelberg, 2002.

John F. Sowa. Semantic networks. In S. C. Shapiro, editor, Encyclopedia of Artificial Intelligence. Wiley-Interscience, New York, 2nd edition, 1992.

Mark Steedman. The Syntactic Process. The MIT Press, Cambridge, MA, 2000.

Tom Stocky, Alexander Faaborg, and Henry Lieberman. A commonsense approach to predictive text entry. In Proceedings of the Conference on Human Factors in Computing Systems, Vienna, Austria, April 2004.

J. van Eijck. Discourse representation theory. In Encyclopedia of Language and Linguistics. Elsevier Science Ltd, 2nd edition, 2005.

J. van Eijck and H. Kamp. Representing discourse in context. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language. MIT Press, Cambridge, MA, USA, 1997.

Xiaojun Wan, Jianfeng Gao, Mu Li, and Binggong Ding. Person resolution in person search results: WebHawk. In CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 163–170, New York, NY, USA, 2005. ACM Press.

WordNet. WNStats: WordNet 2.1 database statistics, 2006. <http://wordnet.princeton.edu/>. Viewed 25 July 2006.