Taal- en spraaktechnologie (Language and Speech Technology)
Lecture 3
Sophia Katrenko (thanks to R. Navigli and S. P. Ponzetto)
Utrecht University, the Netherlands

Outline
1 Covered so far
2 Today
- Unsupervised Word Sense Disambiguation (WSD)
- Lexical acquisition

Recap
Last time, we discussed WSD resources (WordNet, SemCor, the SemEval competitions) and methods:
- dictionary-based WSD (Lesk, 1986)
- supervised WSD (Gale et al., 1992)
- minimally supervised WSD (Yarowsky, 1995)
- noun categorization

Today we discuss Chapter 19 (Jurafsky), and more precisely
1 unsupervised word sense disambiguation
2 lexical acquisition

WSD methods: an overview
[Figure: an overview of WSD methods. Source: Navigli and Ponzetto, 2010.]

Unsupervised WSD
Most methods we have discussed so far treated WSD as classification, where the number of senses is fixed. Noun categorization already shifted the focus to unsupervised learning: the learning itself was unsupervised, while the evaluation was done as for supervised systems. We now move further into unsupervised learning and discuss clustering (as a mechanism) in more detail.

The sense of a word can never be taken in isolation: the same sense of a word will have similar neighboring words.
- "You shall know a word by the company it keeps." (Firth, 1957)
- "For a large class of cases though not for all in which we employ the word meaning it can be defined thus: the meaning of a word is its use in the language." (Wittgenstein, Philosophical Investigations, 1953)

Unsupervised WSD relies on the observations above:
1 take word occurrences in some (possibly predefined) contexts
2 cluster them
3 assign new words to one of the clusters
The noun categorization task followed only the first two steps (no assignment of new words).
Clustering
- Clustering is a type of unsupervised machine learning which aims at grouping similar objects together.
- There is no a priori output (i.e., no labels).
- A cluster is a collection of objects which are similar (in some way).

Types of clustering
- EXCLUSIVE CLUSTERING: each datum belongs to exactly one cluster; clusters do not overlap.
- OVERLAPPING CLUSTERING: uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
- HIERARCHICAL CLUSTERING: repeatedly joins the two nearest clusters (or splits clusters apart).
- PROBABILISTIC CLUSTERING

Hierarchical clustering is in turn of two types:
- BOTTOM-UP (AGGLOMERATIVE)
- TOP-DOWN (DIVISIVE)

[Figure: hierarchical clustering of Dutch text. Source: van de Cruys (2006)]

[Figure: hierarchical clustering of Dutch dialects. Source: Wieling and Nerbonne (2010), using the Goeman-Taeldeman-Van Reenen project data: 1876 phonetically transcribed items for 613 dialect varieties in the Netherlands and Flanders.]

Now...
- Clustering problems are NP-hard: it is infeasible to try all possible clustering solutions.
- Clustering algorithms therefore look at only a small fraction of all possible partitions of the data.
- Which portions of the search space are considered depends on the kind of algorithm used.

What is a good clustering solution?
- The intra-cluster similarity is high, and the inter-cluster similarity is low.
- The quality of the clusters depends on the definition and the representation of clusters.
- The quality of the clustering depends on the similarity measure.

AGGLOMERATIVE CLUSTERING works as follows:
1 Assign each object to a separate cluster.
2 Evaluate all pair-wise distances between clusters.
3 Construct a distance matrix using the distance values.
4 Look for the pair of clusters with the shortest distance.
5 Remove the pair from the matrix and merge them.
6 Evaluate all distances from this new cluster to all other clusters, and update the matrix.
7 Repeat until the distance matrix is reduced to a single element.
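To make the merge loop concrete, here is a minimal sketch (not from the slides) using SciPy's agglomerative clustering; the toy co-occurrence vectors and word list are invented for illustration.

```python
# A minimal sketch of bottom-up (agglomerative) clustering with SciPy.
# The toy co-occurrence vectors are invented for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

words = ["judge", "robe", "court", "shirt"]
vectors = np.array([
    [300, 75],   # hypothetical (legal, clothes) co-occurrence counts
    [133, 200],
    [250, 10],
    [5, 180],
])

# linkage() implements the merge loop described above: it starts from
# singleton clusters and repeatedly merges the closest pair.
Z = linkage(vectors, method="average", metric="euclidean")

# Cut the resulting dendrogram into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
for w, l in zip(words, labels):
    print(w, l)
```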
K-means algorithm
Partitions n samples (objects) into k clusters. Each cluster c is represented by its centroid:

\[ \mu(c) = \frac{1}{|c|} \sum_{x \in c} x \]

The algorithm converges to stable cluster centroids, i.e. it minimizes the sum of the squared distances to the cluster centers:

\[ E = \sum_{i=1}^{k} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2 \]

The algorithm:
1 INITIALIZATION: select k points in the space represented by the objects that are being clustered (seed points).
2 ASSIGNMENT: assign each object to the cluster with the closest centroid (mean).
3 UPDATE: after all objects have been assigned, recalculate the positions of the k centroids (means).
4 TERMINATION: go back to (2) until the centroids no longer move, i.e. there are no more new assignments.
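A minimal NumPy sketch of these four steps, assuming Euclidean distance and seed points drawn from the data; the toy 2-D points are invented.

```python
# A minimal k-means sketch following the four steps above
# (initialization, assignment, update, termination).
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # 1. INITIALIZATION: pick k data points as seed centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. ASSIGNMENT: each object goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. UPDATE: recompute each centroid as the mean of its cluster
        # (empty clusters are not handled in this sketch).
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # 4. TERMINATION: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```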
K-means: limitations
- sensitive to the initial seed points (the algorithm does not specify how to initialize the means; typically this is done randomly)
- the number of clusters k must be specified in advance (how do we choose the value of k?)
- unable to handle noisy data and outliers
- unable to model the uncertainty in cluster assignment

[Figure: a "good" choice of seeds.]
[Figure: a "bad" choice of seeds.]
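To see the seed sensitivity in practice, one can rerun k-means from different seeds and compare the final objective E; a hedged sketch using scikit-learn with a single random initialization per run (the toy data is invented, so different seeds are not guaranteed to diverge on every run).

```python
# Seed sensitivity: identical data, different seed points, possibly
# different local optima of the objective E. Toy data is invented.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 1], [4, 0], [4, 1], [8, 0], [8, 1]], dtype=float)
for seed in range(4):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    # inertia_ is the k-means objective E (sum of squared distances).
    print(f"seed={seed}  E={km.inertia_:.2f}  labels={km.labels_}")
```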
Back to unsupervised WSD
1 CONTEXT CLUSTERING
- Each occurrence of a target word in a corpus is represented as a context vector.
- Vectors are then clustered into groups, each identifying a sense of the target word.
2 WORD CLUSTERING
- Cluster words which are semantically similar and can thus convey a specific meaning.
3 CO-OCCURRENCE GRAPHS
- Apply graph algorithms to a co-occurrence graph, i.e. a graph connecting pairs of words which co-occur in a syntactic relation, in the same paragraph, or in a larger context.

Context clustering
A first proposal is based on the notion of word space (Schütze, 1992): a vector space whose dimensions are words. The architecture was proposed by Schütze (1998).

So what is a word space? We represent a word with a word vector: a co-occurrence vector which counts the number of times the word co-occurs with other words.

word vector | legal | clothes
judge       |   300 |      75
robe        |   133 |     200

Now, what can we do with all the vectors? We can compute the dot product (inner product) A · B, and measure the magnitudes |A| and |B| (the Euclidean distance from the origin). For two-dimensional vectors A = (x_1, y_1) and B = (x_2, y_2):

\[ A \cdot B = x_1 x_2 + y_1 y_2 \tag{1} \]
\[ |A| = d_{AC} = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2} \tag{2} \]

where C = (x_0, y_0) is the origin.

Word vectors capture the "topical dimensions" of a word. Given the word vector space, the similarity between two words v and w can be measured geometrically, e.g. by cosine similarity:

\[ \mathrm{sim}(v, w) = \frac{v \cdot w}{|v||w|} = \frac{\sum_{i=1}^{m} v_i w_i}{\sqrt{\sum_{i=1}^{m} v_i^2}\,\sqrt{\sum_{i=1}^{m} w_i^2}} \tag{3} \]

Problem: word vectors conflate the senses of a word, so we need to include information from the context.
- Context vector: the centroid (or sum) of the word vectors occurring in the context, weighted according to their discriminating potential.

Finally: sense vectors are derived by clustering context vectors into a predefined number of clusters. A sense is a group of similar contexts.
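A short sketch of the word-space computations above: cosine similarity between word vectors, and a context vector built as the (unweighted) centroid of the word vectors in a context. The toy 2-D vectors extend the judge/robe table and are invented.

```python
# Word-space computations: cosine similarity between word vectors,
# and a context vector as the centroid of the vectors in a context.
# Toy (legal, clothes) co-occurrence vectors are invented.
import numpy as np

word_vectors = {
    "judge": np.array([300.0, 75.0]),
    "robe": np.array([133.0, 200.0]),
    "wear": np.array([20.0, 180.0]),
}

def cosine(v, w):
    # equation (3): sim(v, w) = v.w / (|v||w|)
    return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

print(cosine(word_vectors["judge"], word_vectors["robe"]))

# Context vector for an occurrence context such as "... wear a robe ...":
# the centroid of the word vectors occurring in the context.
context = ["wear", "robe"]
context_vector = np.mean([word_vectors[w] for w in context], axis=0)
print(context_vector)
```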
Word clustering
Lin's approach (1998): extract dependency triples from a text corpus.
John eats a yummy kiwi →
(eat subj John) (John subj-of eat)
(eat obj kiwi) (kiwi obj-of eat)
(kiwi adj-mod yummy) (yummy adj-mod-of kiwi)
(kiwi det a) (a det-of kiwi)

Define a measure of similarity between two words:
- The occurrence of a dependency triple (w, r, w′) can be seen as the co-occurrence of three events: A: a randomly selected word is w; B: a randomly selected dependency type is r; C: a randomly selected word is w′.
- Assume that A and C are conditionally independent given B:

\[ P(A, B, C) = P(B)\,P(A \mid B)\,P(C \mid B) \]

- Compute the information content IC(A, B, C) = −log P(A, B, C) based on this independence assumption.

Use the similarity scores to create a similarity tree:
- Let w_1, ..., w_n be a list of words in descending order of their similarity to a given word w_0.
- Initialize the similarity tree with the single root node w_0.
- For i = 1, ..., n, insert w_i as a child of the node w_j that is most similar to w_i among {w_0, ..., w_{i−1}}.

[Figure: an example of Lin's output.]

Clustering By Committee (Lin and Pantel, 2002)
(1) Parse the entire corpus using a dependency parser.
(2) Represent each word as a feature vector (features express the syntactic contexts in which the word occurs).
(3) Create a similarity matrix S such that S_ij is the similarity between w_i and w_j.
(4) Cluster the words using group-average clustering:
- not all words are clustered at the first iteration
- residue words are clustered at later iterations
(5) Disambiguate:
- Find the cluster centroids: the word committees.
- For non-centroid words, match their pattern features to the committee words' features; matched features are removed from the word representation (to allow new assignments of the same word).

Graph-based methods
Based on the notion of a co-occurrence graph: a graph G = (V, E) where
- V is the set of vertices, i.e. words
- E is the set of edges, typically:
  - simple co-occurrence relations (e.g. within the same sentence or paragraph)
  - syntactic relations between pairs of co-occurring words
Given a target ambiguous word w, a graph is built of the words co-occurring with w.

[Figure: an example of a co-occurrence graph.]

CURVATURE CLUSTERING (Dorow et al., 2005)
- Based on curvature (the clustering coefficient) and on the notion of a triangle: a triple of vertices {v, v′, v″} such that {v, v′}, {v′, v″}, {v″, v} ∈ E.
- Curvature quantifies the ratio of interconnections of a node with its neighbors:

\[ \mathrm{curv}(v) = \frac{\#A}{\#B} \tag{4} \]

where A is the set of triangles including v, and B is the set of possible triangles including v, with \( \#B = \binom{\deg(v)}{2} \).

The algorithm:
1 Build the co-occurrence graph of a target word w.
2 Calculate the curvature of each node in the graph.
3 Remove nodes whose curvature is below a threshold.
4 Each connected component constitutes a meaning (i.e., a sense) of the target word w.
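A hedged sketch of these four steps using NetworkX, whose clustering() function computes exactly the triangle ratio of equation (4); the toy graph is invented and only loosely mimics the jaguar example that follows.

```python
# Curvature clustering sketch: nx.clustering(G, v) is
# #triangles through v / C(deg(v), 2), i.e. equation (4).
# The toy co-occurrence graph is invented for illustration.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("jaguar", "mac"), ("jaguar", "os"), ("jaguar", "tiger"),
    ("jaguar", "feline"), ("mac", "os"), ("mac", "unix"),
    ("os", "unix"), ("tiger", "feline"), ("tiger", "jungle"),
    ("feline", "jungle"),
])

curv = nx.clustering(G)
threshold = 0.5
kept = [v for v, c in curv.items() if c >= threshold]

# After removing low-curvature nodes (here the ambiguous hub "jaguar"),
# each connected component of the remaining graph is one induced sense.
H = G.subgraph(kept)
for comp in nx.connected_components(H):
    print(sorted(comp))
```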
An example, for the target word jaguar:
- Step 1: build the co-occurrence graph of jaguar.
- Step 2: calculate the curvature of each node in the graph.
- Step 3: remove nodes with curvature below a threshold, e.g. below 0.5.
- Step 4: output a meaning for each connected component:
{ ict, os, mac, unix }, { car, engine }, { feline, tiger, jungle }

Word Sense Induction (WSI)
- Actually performs word sense discrimination: it aims to divide the occurrences of a word into a number of classes.
- This makes objective evaluation more difficult if the method is not embedded in an application.
- But WSI and WSD are strictly related → the clusters produced can be used to sense-tag new word occurrences.

Evaluation
How to evaluate WSI?
- Manual evaluation
- Gold standard clustering
- Mapping to an existing sense inventory
- Mapping to an annotated corpus + supervised WSD
- Pseudowords

Manual evaluation
- People are asked to judge the quality of a clustering. (How would you assess a clustering for jaguar?)
- On a sample basis, each evaluator is asked to judge the similarity of pairs of words from the same cluster and from different clusters (without knowing which is which).

Gold standard clustering
Given a gold standard clustering, compare it with the output clustering. Clusters are compared (as in the previous lecture) using:
- PURITY: each cluster is assigned to the class which is most frequent in it; the accuracy of this assignment is then measured by counting the correctly assigned objects and dividing by the total number of objects clustered.
- ENTROPY: measures cluster homogeneity; lower entropy → more homogeneous clusters.
Ideally, purity should be 1 and entropy 0.
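A minimal sketch of purity and entropy for a clustering against gold-standard classes; the toy label assignments are invented.

```python
# Purity and entropy of a clustering against gold-standard classes.
# The toy label assignments are invented for illustration.
import numpy as np
from collections import Counter

def purity(clusters, gold):
    # Assign each cluster to its most frequent gold class, then
    # count the correctly assigned objects.
    correct = 0
    for c in set(clusters):
        members = [g for cl, g in zip(clusters, gold) if cl == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(gold)

def entropy(clusters, gold):
    # Size-weighted average of per-cluster class entropies (lower = better).
    total = 0.0
    for c in set(clusters):
        members = [g for cl, g in zip(clusters, gold) if cl == c]
        probs = np.array(list(Counter(members).values())) / len(members)
        total += len(members) / len(gold) * -(probs * np.log2(probs)).sum()
    return total

clusters = [0, 0, 0, 1, 1, 2, 2, 2]
gold = ["A", "A", "B", "B", "B", "C", "C", "A"]
print(purity(clusters, gold), entropy(clusters, gold))
```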
Evaluation via mapping to an existing sense inventory
- Clusters are mapped to senses of an existing sense inventory (e.g. WordNet).
- Lin and Pantel (2002) automatically map clusters to WordNet synsets. The similarity between a cluster c and a synset s is

\[ \mathrm{SimC}(c, s) = \frac{\sum_{w \in c} \mathrm{SimW}(s, w)}{|c|} \tag{5} \]

- A cluster c is correct if a synset s exists such that SimC(c, s) is at least a fixed threshold.

Evaluation via pseudowords (Schütze, 1992)
Generates new words with artificial ambiguity. First, select two or more monosemous words, e.g. pizza and blog, and take all their occurrences in a corpus:
- Yesterday we ate a pizza at the restaurant.
- Margherita: pizza with mozzarella and tomato.
- I am writing a new post on my blog.
- How many blogs are there on-line?
Then replace them with a pseudoword obtained by joining the monosemous words, e.g. pizzablog:
- Yesterday we ate a pizzablog at the restaurant.
- Margherita: pizzablog with mozzarella and tomato.
- I am writing a new post on my pizzablog.
- How many pizzablogs are there on-line?

Evaluation: things to consider
- hard vs. soft clustering
- baselines:
  - All-in-one: group all words into one big cluster.
  - Random: produce a random set of clusters.

SemEval 2007
- Coarse-grained WSD allows systems to reach over 80% accuracy.
- Lexical sample WSD even reaches almost 90%.

Senseval/SemEval: findings
- The performance variations are quite consistent with the hardness of the tasks (e.g., all-words fine-grained tasks are the most difficult).
- Among supervised systems, instance-based approaches and SVMs perform best.
- The most frequent sense baseline is a real challenge in an all-words WSD setting (though not for lexical sample tasks).
- Knowledge-based methods achieve performance similar to the baseline. However, some of them can provide justifications for their sense choices (e.g. SSI; Navigli and Velardi, 2005).

Lexical acquisition

Similarity measures
A note on measure vs. metric: a metric on a set X is a function d : X × X → ℝ with the following properties:
- d(x, y) ≥ 0
- d(x, y) = 0 iff x = y
- d(x, y) = d(y, x)
- d(x, z) ≤ d(x, y) + d(y, z)

Similarity between two lexical items can be measured in many ways, e.g.
- using distributional information (corpus counts)
- using the WordNet structure

For two binary vectors, viewed as the sets X and Y of contexts in which each word occurs, the most common measures are:

measure              | definition
matching coefficient | |X ∩ Y|
Dice coefficient     | 2|X ∩ Y| / (|X| + |Y|)
Jaccard coefficient  | |X ∩ Y| / |X ∪ Y|
Overlap coefficient  | |X ∩ Y| / min(|X|, |Y|)
cosine               | |X ∩ Y| / √(|X| × |Y|)
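A small sketch of these set-based measures, representing each word by the set of contexts it occurs in; the toy context sets are invented.

```python
# Set-based similarity measures over binary context vectors,
# represented here as Python sets. Toy context sets are invented.
import math

def matching(X, Y): return len(X & Y)
def dice(X, Y):     return 2 * len(X & Y) / (len(X) + len(Y))
def jaccard(X, Y):  return len(X & Y) / len(X | Y)
def overlap(X, Y):  return len(X & Y) / min(len(X), len(Y))
def cosine(X, Y):   return len(X & Y) / math.sqrt(len(X) * len(Y))

judge = {"court", "law", "trial", "sentence"}
robe = {"court", "wear", "silk", "sentence"}
for f in (matching, dice, jaccard, overlap, cosine):
    print(f.__name__, f(judge, robe))
```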
|wi − vi | contextn wn vn (10) i=1 dEuclidean v u n uX = t (wi − vi )2 i=1 Sophia Katrenko Lecture 3 (11) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition Similarity measures If we move to frequency counts: word w v context1 w1 v1 context2 w2 v2 Pn dcosine = qP n wv qiPi n 2 contextn wn vn i=1 i=1 wi Sophia Katrenko ... ... ... Lecture 3 2 i=1 vi (12) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures How to use WordNet to measure relatedness/similarity? The following notions are used: Path between two synsets c1 and c2 , pathlen(c1, c2) (the number of edges in the shortest path in the thesaurus graph between the sense nodes c1 and c2 ) The lowest common subsumer lcs(c1 , c2 ) (the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2 ) Sophia Katrenko Lecture 3 Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures artifact instrumentation implement device tool trap drill net Figure: Part of the WordNet hierarchy Sophia Katrenko Lecture 3 Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures The following notions are used: The probability that a randomly selected word in a corpus is an instance of concept c, P(c) (Resnik, 1995) P w∈words(c) count(w) (13) P(c) = N words(c) = the set of words subsumed by concept c, N = the total number of words in the corpus that are also present in the thesaurus. Information content IC(c) = − log P(c) Sophia Katrenko Lecture 3 (14) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures D EFINITIONS Leacock and Chodorow, 1998 (lch) simpath (c1 , c2 ) = − log pathlen(c1 , c2 ) (15) Resnik measure (Resnik, 1995) (res) simresnik (c1 , c2 ) = − log P(lcs(c1 , c2 )) Sophia Katrenko Lecture 3 (16) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures D EFINITIONS Wu and Palmer, 1998 (wup) simwup (c1 , c2 ) = 2 ∗ dep(lcs(c1 , c2 )) len(c1 , lcs(c1 , c2 )) + len(c2 , lcs(c1 , c2 )) + 2 ∗ dep(lcs(c1 , c2 )) Sophia Katrenko Lecture 3 Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures Lin (1998) has compared two object A and B given their COMMONALITY: the more information A and B have in common, the more similar they are (IC(common(A, B))). DIFFERENCE : the more differences between the information in A and B, the less similar they are (IC(description(A, B)) − IC(common(A, B))). simLin (A, B) = log P(common(A, B)) log P(description(A, B)) Sophia Katrenko Lecture 3 (17) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition WordNet-based measures How to apply it to WordNet? simLin (c1 , c2 ) = 2 log P(lcs(c1 , c2 )) log P(c1 ) + log P(c2 ) (18) Jiang-Conrath distance (Jiang and Conrath, 1997) distJC (c1 , c2 ) = 2 log P(lcs(c1 , c2 )) − (log P(c1 ) + log P(c2 )) Sophia Katrenko Lecture 3 (19) Covered so far Today Unsupervised Word Sense Disambiguation (WSD) Lexical acquisition Measures So, what measure is the best? there is no best measure apriori (similarly as there is no machine learning method that always performs the best so-called No-free lunch theorem). different applications may require different measures to be used. 
Measures
So, what measure is the best?
- There is no a priori best measure, just as there is no machine learning method that always performs best (the so-called no-free-lunch theorem).
- Different applications may require different measures.

L. Lee. Measures of Distributional Similarity. In Proceedings of the 37th ACL, 1999.
- DATA: verb-object co-occurrence pairs from the 1988 Associated Press newswire (the 1000 most frequent nouns).
- Various distributional measures (cosine, Euclidean, others).
- GOAL: improving probability estimation for unseen co-occurrences: "replaced each noun-verb pair (n, v_1) with a noun-verb-verb triple (n, v_1, v_2) such that P(v_2) ≈ P(v_1). The task for the language model under evaluation was to reconstruct which of (n, v_1) and (n, v_2) was the original cooccurrence."

[Figure: results from Lee (1999).]

WordNet measures
S. Katrenko et al. Using Local Alignments for Relation Recognition. In JAIR, 2010.

To summarize (1)
Today, we have looked at:
- clustering methods
- unsupervised WSD methods

To summarize (2)
ToDo: read at home (if you haven't done it yet) Chapter 19 from Jurafsky.