Multi-Prototype Vector Space Models of Word Meaning

Transcription

Authors : Joseph Reisinger & Raymond J. Mooney
REVIEW BY: NITISH GUPTA
ROLL NUMBER : 10461
Introduction
• Automatically judging the degree of semantic similarity between words is
an important task.
• It is useful in Text Classification, Information Retrieval, Textual
Entailment and other language processing tasks.
• The empirical approach to finding semantic similarity between words uses the
Distributional Hypothesis, i.e., that similar words appear in similar contexts.
• Traditionally word types are represented by a single “prototype” vector of
contextual features derived from co-occurrence information.
• Semantic similarity is then measured using some measure of vector distance (e.g., cosine similarity).
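The traditional single-prototype scheme described above can be sketched in a few lines: sum context-word counts over every occurrence of a word into one vector, then compare words by cosine similarity. This is a toy illustration, not the authors' implementation; the vocabulary and contexts below are made up.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def prototype_vector(occurrences, vocab):
    """Collapse all occurrences of a word into a single 'prototype'
    vector of context-word counts (bag-of-words features)."""
    vec = np.zeros(len(vocab))
    for context in occurrences:
        for tok in context:
            if tok in vocab:
                vec[vocab[tok]] += 1.0
    return vec

# Hypothetical contexts for the word "club" (illustration only):
vocab = {"bat": 0, "ball": 1, "members": 2, "meeting": 3}
club_contexts = [["bat", "ball"], ["members", "meeting"]]
club = prototype_vector(club_contexts, vocab)
```

Note how both the sports and the association contexts are merged into the same vector, which is exactly the limitation the multi-prototype model addresses.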
Motivation
• The traditional vector-space models represent a word with a single
“prototype” vector which is independent of context, but the meaning of a
word clearly depends on context.
• A single-prototype vector space model is incapable of handling phenomena like
homonymy and polysemy. It is also incapable of capturing the fact that
word-meaning similarities violate the triangle inequality when viewed at the
level of word types.
E.g., the word club is similar to both bat and association, but its similarity to
each clearly depends on the context in which club is used.
Methodology
• The authors present a new vector-space model that represents a word’s meaning
by a set of distinct “sense-specific” vectors. Therefore each word will be
represented by multiple vectors each of which will be representing different
context in which the word is used.
• For each word w:
Step 1: For each occurrence of w, a vector is computed based on its
context, which consists of a 10-word window around the word.
Step 2: These context vectors are grouped into K clusters using the movMF model
(a mixture of von Mises-Fisher distributions), which models semantic relatedness
using cosine similarity. This yields a set of cluster centroids πₖ(w), k = 1…K,
for each word w.
• The clusters are not assumed to represent the different senses of the word;
rather, the authors rely on clustering to capture meaningful variation in word usage.
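As a rough sketch of Step 2, spherical k-means can stand in for the full movMF model: it is the hard-assignment special case of a mixture of von Mises-Fisher distributions and likewise clusters by cosine similarity. This is a simplification for illustration, not the authors' exact estimation procedure.

```python
import numpy as np

def spherical_kmeans(X, K, iters=20, seed=0):
    """Cluster unit-normalized context vectors by cosine similarity.

    A simplified stand-in for the paper's movMF clustering:
    assignments use cosine similarity, and each centroid is the
    normalized mean of its members (a unit vector).
    """
    rng = np.random.default_rng(seed)
    # Normalize rows so dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centroids from K distinct context vectors.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        # Assign each context vector to its most-similar centroid.
        assign = np.argmax(X @ centroids.T, axis=1)
        for k in range(K):
            members = X[assign == k]
            if len(members):
                m = members.sum(axis=0)
                centroids[k] = m / np.linalg.norm(m)
    return centroids, assign
```

The returned centroids play the role of the sense-specific prototypes πₖ(w) for one word, computed from that word's occurrence contexts.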
Methodology (contd.)
Image showing the methodology of obtaining clusters from different contextual
appearances of the word ‘position’.
• The black star shows the centroid of the vectors as would have been computed
by a single-vector model.
• The different clusters and colored stars show the different sense-specific
prototype vectors pertaining to the different contexts in which the word
‘position’ was used in the corpus.
Measuring Semantic Similarity
• Given two words w and w′, the authors define two noncontextual clustered similarity metrics to
measure the similarity of isolated words:

AvgSim(w, w′) = (1/K²) Σⱼ Σₖ d(πⱼ(w), πₖ(w′))
MaxSim(w, w′) = max over 1 ≤ j, k ≤ K of d(πⱼ(w), πₖ(w′))

where d(·, ·) is cosine similarity.
• In AvgSim, word similarity is computed as the average similarity over all pairs of prototype
vectors of the two words. Since every pair of prototypes contributes to AvgSim, two words
are judged similar if many of their senses are similar.
• In MaxSim, similarity is measured as the maximum over all pairwise prototype similarities.
Since only the closest pair of prototypes contributes to MaxSim, two words are judged
similar even if only one pair of their senses is very close.
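The two metrics above can be sketched directly from their definitions, assuming each word's prototypes are given as a list of NumPy vectors (the function names are mine, not the paper's):

```python
import numpy as np

def cos_sim(u, v):
    """d(., .): cosine similarity between two prototype vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def avg_sim(protos_w, protos_v):
    """AvgSim: average cosine similarity over all prototype pairs."""
    return sum(cos_sim(p, q) for p in protos_w for q in protos_v) / (
        len(protos_w) * len(protos_v))

def max_sim(protos_w, protos_v):
    """MaxSim: similarity of the single closest prototype pair."""
    return max(cos_sim(p, q) for p in protos_w for q in protos_v)
```

With one word's prototypes pointing in two orthogonal directions and the other's in just one, AvgSim averages a perfect match with a zero match while MaxSim keeps only the perfect match, which is exactly the "many senses" vs. "one close sense" distinction above.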
Experimental Evaluation
• The corpora used by the authors include:
• A snapshot of Wikipedia taken on Sept. 29th, 2009, with Wikitext markup removed and
articles with fewer than 100 words discarded.
• The third edition of the English Gigaword Corpus, with articles containing fewer than
100 words removed.
Judging Semantic Similarity
• To evaluate the various models, lexical similarity measurements are first compared to
human similarity judgments from the WordSim-353 dataset.
• Spearman’s rank correlation (ρ) with average human judgments was used to measure the
quality of the various models.
For values of K ∈ [2, 10] on Wikipedia and K > 4 on the Gigaword corpus, the
Spearman correlation is in the range 0.6–0.8.
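Spearman's ρ used above is simply the Pearson correlation of the ranks of the two score lists. A minimal sketch (ignoring ties, which the full statistic handles via average ranks):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes no tied values, for simplicity; x and y are paired
    similarity scores (e.g., model scores vs. human judgments).
    """
    # Double argsort converts values to their ranks (0-based).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.dot(rx, ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))
```

Because it compares ranks rather than raw scores, ρ rewards a model for ordering word pairs the way humans do, regardless of the scale of its similarity values.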
Predicting Near-Synonyms
• Here the multi-prototype model’s ability to determine the word most closely related to a
target word is tested. The top ‘k’ most similar words were computed for each prototype of
each target word.
• For each prototype of each word, one result from the multi-prototype model and one from
a human were shown to another human judge. Quality is measured by how frequently the
multi-prototype method’s result was chosen.
• The results show that the system performs much better on homonymous words than on
polysemous words, but with the right number of clusters the polysemous words also give
good results.
Thank You!!
Questions