Semantic Annotation of Microblog Topics Using Wikipedia Temporal Information

Tuan Tran
2nd International Alexandria Workshop
Hannover, Germany, 2-3 November 2015
Research Goal

What was #sochi about in February 2014?
Understanding meanings of past trending hashtags

Twitter search: #sochi lang:en since:2014-02-01 until:2014-02-28
[Screenshot: sample #Sochi tweets from 27 Feb 2014, e.g. "Each Russian medalist at the Sochi Games rewarded with a Mercedes-Benz", "Pavel Bure could run a #KHL team in #Sochi"]
Research Goal

What was #sochi about in February 2014?
Understanding meanings of past trending hashtags
•  Annotation is a crucial step!
•  Goal: mapping hashtags to Wikipedia pages
[Screenshot repeated: sample #Sochi tweets from 27 Feb 2014]
Research Goal

Semantic annotation for hashtags is not easy:
•  Hashtags are peculiar:
   #sochi vs. 2014_Winter_Olympics
   #jan25 vs. Egypt_Revolution_of_2011
   #mh370 vs. Malaysia_Airlines_Flight_370
   #crimea vs. Ukrainian_crisis
•  Time matters. #germany maps to:
   2014_FIFA_World_Cup (July 2014)
   Greek_government-debt_crisis (July 2015)
   European_migrant_crisis (Sep 2015)
Semantic Annotations in Twitter

Mostly at the tweet level:
•  Tweak the similarities (TAGME, CIKM'10; Liu, ACL'13)
•  Employ Twitter-specific features with human supervision (Meij, WSDM'12; Guo, NAACL'13)
•  Expand context to users (Cassidy, COLING'12; KAURI, KDD'13) or time (Fang, TACL'14; Hua, SIGMOD'15)

Annotating hashtags is limited to general topic models (Ma, CIKM'14; Bansal, ECIR'15)
Main Idea

Align contexts from both sides:
•  Twitter: all constituent tweets of the hashtag, e.g.
   "Hard to believe anyone can do worse than Russia in #Sochi. Brazil seems to be trying pretty hard though! sportingnews.com…"
   "#sochi Sochi 2014: Record number of positive tests - SkySports: q.gs/6nbAA" → 2014_Winter_Olympics
   "#Sochi Sea Port. What a beautiful site! #Russia" → Port_of_Sochi
•  Wikipedia: temporal signals from page views and edit history
Framework Outline

Problem: Given a trending hashtag and its burst time period T, identify the top-k most prominent entities describing the hashtag in T.

Three steps:
1.  Candidate Entities Identification
2.  Entity – Hashtag Similarities
3.  Entity Prominence Ranking
Candidate Entities Identification

Mine from tweet contents via lexical matching.
•  Twitter side: extract n-grams from tweets (n ≤ 5)
•  Wikipedia side: build a lexicon for each entity from anchors of incoming links, redirects, titles, and disambiguation pages
•  Start with sample tweets, expand via links to increase recall
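The lexical-matching step above can be sketched in a few lines of Python (a minimal illustration; the lexicon entries and the tweet are made-up examples, not data from the paper):

```python
def ngrams(tokens, max_n=5):
    """Generate all n-grams of the token list, for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

# Hypothetical lexicon: surface form -> candidate Wikipedia entities
# (in the full system it is built from anchors, redirects, titles,
# and disambiguation pages).
lexicon = {
    "sochi": {"Sochi", "2014_Winter_Olympics"},
    "team usa": {"United_States_at_the_2014_Winter_Olympics"},
}

def candidate_entities(tweet, lexicon, max_n=5):
    """Collect candidate entities via lexical matching of tweet n-grams."""
    tokens = tweet.lower().split()
    candidates = set()
    for gram in ngrams(tokens, max_n):
        candidates |= lexicon.get(gram, set())
    return candidates
```

Matching every n-gram against the lexicon keeps recall high; the POS-based filtering rules mentioned in the backup slides would then prune spurious grams.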
Entity – Hashtag Similarities: Link-based

•  Build upon phrase – entity similarities P(t|m), the "commonness" of mapping a mention m to a page t:

   Commonness(m → t) = count(m → t) / Σ_{t' ∈ W} count(m → t')

   e.g. P(Title | "Chicago")
•  Aggregate the commonness scores linearly to a hashtag – entity similarity

(Slide adapted from Dan Roth: "Wikification and Beyond: The Challenges of Entity and Concept Grounding". Tutorial at ACL 2014)
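The commonness estimate can be sketched as follows (the anchor counts are illustrative numbers, not real Wikipedia statistics):

```python
from collections import Counter

# Hypothetical anchor statistics: counts of mention -> target page links
# harvested from Wikipedia.
link_counts = {
    "chicago": Counter({"Chicago": 900, "Chicago_(2002_film)": 80,
                        "Chicago_Bulls": 20}),
}

def commonness(mention, title):
    """Commonness(m -> t) = count(m -> t) / sum over t' of count(m -> t')."""
    counts = link_counts.get(mention.lower(), Counter())
    total = sum(counts.values())
    return counts[title] / total if total else 0.0
```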
Entity – Hashtag Similarities: Text-based

•  Compare distributions of words over the hashtag's tweets and the entity's texts
•  Consider both the entity's static text and the temporal context of its edits: during the time period T (plus one day to acknowledge possible time lags), compute the difference between consecutive revisions and extract only the added text snippets; these accumulate into the temporal context C_T(e) of entity e
•  The word distribution of the entity is a mixture of the temporal and the general context:

   P̂(w|e) = λ·P̂(w|M_{C_T(e)}) + (1 − λ)·P̂(w|M_{C(e)})

   where M_{C_T(e)} and M_{C(e)} are language models of e's edited text during T and of e's latest text C(e), respectively; P̂(w|M_{C(e)}) can be regarded as the background model
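A minimal sketch of the mixture estimate, using unsmoothed unigram language models; the mixture weight 0.7 and the toy texts are illustrative assumptions, not values from the paper:

```python
from collections import Counter

def lm(text):
    """Maximum-likelihood unigram language model of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mixture_prob(word, temporal_text, static_text, lam=0.7):
    """P(w|e) = lam * P(w|M_CT(e)) + (1 - lam) * P(w|M_C(e)).

    temporal_text: snippets added to e's page during T (temporal context);
    static_text: e's latest article text (background model)."""
    p_t = lm(temporal_text).get(word.lower(), 0.0)
    p_b = lm(static_text).get(word.lower(), 0.0)
    return lam * p_t + (1 - lam) * p_b
```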
Entity – Hashtag Similarities: Collective Attention-based

Compare the temporal correlation of collective attention between the hashtag and the entity:
•  Hashtag side: time series of tweet counts; Wikipedia side: time series of page views
•  Use a metric that is agnostic to the scaling and time lag of time series [1], matching the shapes of the two series via optimal shifting and scaling parameters:

   f_t(e, h) = min_{q,α} ||TS_h − α·d_q(TS_e)|| / ||TS_h||   (4)

   where d_q(TS_e) is the time series derived from TS_e by shifting q time units, and ||·|| is the L2 norm. Equation 4 has a closed-form solution for fixed q.

[1] Jaewon Yang, Jure Leskovec. "Patterns of Temporal Variation in Online Media". WSDM 2011
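The distance can be sketched as below: for each candidate shift q, the optimal scaling α has the closed form (x·y)/(y·y), so we only scan over shifts. The shift range and zero-padding at the series boundary are assumptions of this sketch:

```python
import math

def shift(ts, q):
    """Shift a time series by q units, padding with zeros."""
    n = len(ts)
    if q >= 0:
        return [0.0] * q + ts[:n - q]
    return ts[-q:] + [0.0] * (-q)

def ts_distance(x, y, max_shift=3):
    """Shape distance between time series, agnostic to scaling and lag:
    min over q, alpha of ||x - alpha * shift(y, q)|| / ||x||.
    For fixed q, the optimal alpha is (x . y_q) / (y_q . y_q)."""
    norm_x = math.sqrt(sum(v * v for v in x))
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        alpha = sum(a * b for a, b in zip(x, yq)) / yy if yy else 0.0
        d = math.sqrt(sum((a - alpha * b) ** 2
                          for a, b in zip(x, yq))) / norm_x
        best = min(best, d)
    return best
```

A series that is a scaled and lagged copy of another gets distance 0, regardless of the magnitude of the scaling.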
Entity Prominence Ranking

•  Rank by the unified similarity score:

   f(e, h) = α·f_m(e, h) + β·f_c(e, h) + γ·f_t(e, h)   (1)

   where f_m, f_c, f_t are the mention, context, and temporal similarity measures, respectively, and α + β + γ = 1 so that the ranking scores of entities are normalized between 0 and 1
•  To learn the model ω = (α, β, γ), we need a premise of what makes an entity prominent!
•  The coherence premise [2] is not applicable at the topic level
•  Our algorithm (Section 5) learns the parameters automatically, without human-labeled data

[2] Ratinov et al. "Local and Global Algorithms for Disambiguation to Wikipedia". ACL 2011
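The linear combination and ranking step can be sketched as follows (the similarity values and weights are illustrative, not taken from the paper):

```python
def unified_score(sims, weights):
    """f(e, h) = alpha*f_m + beta*f_c + gamma*f_t with weights summing to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * sims[k] for k in weights)

def rank_entities(candidates, weights, k=5):
    """Rank candidate entities by the unified similarity score."""
    scored = {e: unified_score(s, weights) for e, s in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Illustrative per-measure similarities for three candidates:
candidates = {
    "2014_Winter_Olympics": {"m": 0.8, "c": 0.6, "t": 0.9},
    "Sochi":                {"m": 0.9, "c": 0.4, "t": 0.3},
    "Port_of_Sochi":        {"m": 0.1, "c": 0.2, "t": 0.1},
}
weights = {"m": 0.3, "c": 0.3, "t": 0.4}
```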
Influence Maximization

Observation: Prominent pages are first created / updated with texts, then linked to other pages
•  Reflects the shifting attention of users in social events [3]

Example tweets:
   "#love #Sochi 2014: Russia's ice hockey dream ends as Vladimir Putin watches on …"
   "#sochi Sochi: Team USA takes 3 more medals, tops leaderboard | http://abc7.com"
   "#Sochi bear after #Russia's hockey team eliminated with loss to #Finland"
   "I'm still happy because Finland won. Is that too stupid..? #Hockey #Sochi"
Linked entities: Vladimir_Putin, Sochi, Russia, 2014_Winter_Olympics, Russia_men's_national_ice_hockey_team, Ice_hockey_at_the_2014_Winter_Olympics, Finland, United_States, Ice_hockey, …

[3] Keegan et al. "Hot off the wiki: Dynamics, practices, and structures in Wikipedia…". WikiSym 2011
Influence Maximization

Premise: Rank top-k entities so as to maximize the spreading of influence to all other candidates.
[Same tweet / entity graph example as the previous slide]
Experiments

Data:
•  Collected 500 million tweets over 4 months (Jan-Apr 2014) via the Streaming API
•  Processed and sampled distinct trending hashtags
   o  Several heuristics + clustering methods [4] → 2,444 trending hashtags
   o  3 inspectors chose 30 meaningful trending hashtags

[4] Lehmann et al. "Dynamical Classes of Collective Attention in Twitter". WWW 2012
Experiments

Baselines:
•  Wikiminer (Milne & Witten, CIKM 2008)
•  Tagme (Ferragina et al., IEEE Software 2012)
•  KAURI (Shen et al., KDD 2013)
•  Meij method (Meij et al., WSDM 2012)
•  Individual similarities: M (link), C (text), T (temporal)

Evaluation: 6,965 entity-hashtag pairs judged on a 0-1-2 scale (5 evaluators, inter-agreement 0.6)
Experiments

Table 2: Experimental results on the sampled trending hashtags.

             P@5     P@15    MAP
Tagme        0.284   0.253   0.148
Wikiminer    0.253   0.147   0.096
Meij         0.500   0.670   0.375
Kauri        0.305   0.319   0.162
M            0.453   0.312   0.211
C            0.263   0.245   0.140
T            0.474   0.378   0.291
IPL          0.642   0.495   0.439

•  Non-verbal signal is good in general
Experiments

(Table 2 repeated from the previous slide, with callouts:)
•  Better at top
•  Better when including low-ranked entities
Experiments

Manually classify events (hashtags' peaks) as endogenous / exogenous by checking the tweets' contents.

[Figure 3: Performance (MAP) of Tagme, WM, Meij, Kauri, M, C, T, and IPL for exogenous vs. endogenous events]
Experiments

Influence of the size of the burst time period:
•  Larger window → more noise introduced
Conclusions

•  Semantic annotation at the topic level is difficult
•  Lesson learnt: it can be improved by exploiting temporal information and contexts from both sides (non-verbal evidence is promising)
•  Future directions: improve efficiency and text-based similarities
Thank you!
Questions?
Terminology

Trending hashtag: a hashtag whose daily tweet count time series has peaks [1] with:
•  Variance score > 900
•  Highest peak > 15 x the median of a 2-month window sample

Burst time period: a w-window around one peak

Entities: Wikipedia pages, excluding redirects, disambiguation and list pages
•  Per entity: text, view count per day, edits during T

[1] Lehmann et al. "Dynamical classes of collective attention in Twitter". WWW 2012
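The trending test can be sketched as below; interpreting "variance score" as the plain variance of the daily counts is an assumption of this sketch, and the thresholds are the ones from the slide:

```python
import statistics

def is_trending(daily_counts):
    """Trending test for a daily tweet-count series:
    variance score > 900 and highest peak > 15 x the window median."""
    variance = statistics.pvariance(daily_counts)
    median = statistics.median(daily_counts)
    return variance > 900 and max(daily_counts) > 15 * median
```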
Candidate Entities Identification

Mine from tweet contents via lexical matching.
•  Twitter side: Extract n-grams from tweets (n ≤ 5)
   o  Parse POS tags for tokens, filter patterns using rules
•  Wikipedia side: Build a lexicon (anchors, redirects, titles, disambiguation pages)
•  Practical issues:
   o  Start from sample tweets
   o  Expand to incoming / outgoing linked entities
Influence Maximization-based Learning

•  Learn ω to minimize the loss w.r.t. the influence score r:

   ω = argmin_ω Σ_{e ∈ E(h,k)} L(f(e, h), r(e, h))

   where r(e, h) is the influence score of e and h, E(h, k) is the set of top-k entities with the highest influence scores, and L is the squared error loss, L(x, y) = (x − y)² / 2
•  The influence score is estimated via random walks:

   r_h = τ·B·r_h + (1 − τ)·s_h   (6)

   where B is the influence transition matrix, s_h are the initial influence scores based on the entity prominence model (Step 1 of IPL), and τ is the damping factor
•  r(e, h) and f(e, h) are jointly learnt via a gradient descent method
Entity – Hashtag Similarities: Link-based

Built upon direct similarities of tweets – entities (Meij, WSDM'12; Fang, TACL'14)
•  Based on commonness estimation: the link prior of the entity e given a mention m,

   LP(e|m) = |l_m(e)| / Σ_{m'} |l_{m'}(e)|

   where l_m(e) is the set of links with anchor m that point to e (i.e. |l_m(e)| is the no. of incoming links to entity e with anchor m)
•  Aggregate to the hashtag level, weighted by the mention frequency:

   f_m(e, h) = Σ_m (LP(e|m) · q(m))   (2)

   where q(m) is the frequency of the mention m (no. of times m appears in h's tweets) over all mentions of e in all tweets of h
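The aggregation can be sketched as follows, using the link-prior definition from this slide; the anchor statistics and mention counts are made-up illustrative values:

```python
from collections import Counter

def link_prior(entity, mention, anchor_counts):
    """LP(e|m): share of e's incoming links whose anchor text is m.
    anchor_counts[e] maps anchor text -> number of links to e."""
    counts = anchor_counts.get(entity, Counter())
    total = sum(counts.values())
    return counts[mention] / total if total else 0.0

def mention_similarity(entity, mention_freqs, anchor_counts):
    """f_m(e, h) = sum over mentions m of LP(e|m) * q(m), where q(m) is the
    relative frequency of m among e's mentions in the hashtag's tweets."""
    total = sum(mention_freqs.values())
    return sum(link_prior(entity, m, anchor_counts) * (c / total)
               for m, c in mention_freqs.items())
```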
Influence Graph

•  Nodes are the candidate entities (cf. Section 3.1); an edge (e_i, e_j) indicates that there is a link from e_j's Wikipedia article to e_i's
•  Note that the edges of the influence graph are reversed w.r.t. the direction of links in Wikipedia: a link gives an "influence endorsement" from the destination entity to the source entity
•  Entity relatedness: we assume that an entity endorses more of its influence score to highly related entities than to lower related ones, using the popular relatedness measure suggested by Milne and Witten (2008):

   MW(e1, e2) = 1 − (log(max(|I1|, |I2|)) − log(|I1 ∩ I2|)) / (log(|E|) − log(min(|I1|, |I2|)))

   where I1 and I2 are the sets of entities having links to e1 and e2, respectively, and E is the set of all entities in Wikipedia
•  Normalize to obtain the influence transition matrix:

   b_{i,j} = MW(e_i, e_j) / Σ_{(e_i, e_k) ∈ V} MW(e_i, e_k)   (5)
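The relatedness measure and the row normalization of Eq. 5 can be sketched as below (the in-link sets and the entity count are toy values; returning 0 for disjoint in-link sets is an assumption of this sketch):

```python
import math

def milne_witten(in_links_1, in_links_2, total_entities):
    """Milne-Witten relatedness of two entities from their in-link sets."""
    overlap = len(in_links_1 & in_links_2)
    if overlap == 0:
        return 0.0
    big = max(len(in_links_1), len(in_links_2))
    small = min(len(in_links_1), len(in_links_2))
    return 1 - ((math.log(big) - math.log(overlap))
                / (math.log(total_entities) - math.log(small)))

def transition_matrix(entities, in_links, total_entities):
    """Row-normalize pairwise relatedness into the influence transition
    matrix B: b[i][j] = MW(e_i, e_j) / sum over k of MW(e_i, e_k)."""
    n = len(entities)
    MW = [[0.0 if i == j else
           milne_witten(in_links[entities[i]], in_links[entities[j]],
                        total_entities)
           for j in range(n)] for i in range(n)]
    B = []
    for row in MW:
        total = sum(row)
        B.append([v / total if total else 0.0 for v in row])
    return B
```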
Iterative Influence-Propagation Learning

Algorithm 1: Entity Influence-Prominence Learning
   Input: h, T, D_T(h), B, k, learning rate µ, threshold ε
   Output: ω, top-k most prominent entities
   Initialize: ω := ω(0)
   Calculate f_m, f_c, f_t, f_ω := f_ω(0) using Eqs. 1, 2, 3, 4
   while true do
      f̂_ω := normalize f_ω
      Set s_h := f̂_ω, calculate r_h using Eq. 6
      Sort r_h, get the top-k entities E(h, k)
      if Σ_{e ∈ E(h,k)} L(f_ω(e, h), r(e, h)) < ε then
         Stop
      end
      ω := ω − µ·Σ_{e ∈ E(h,k)} ∇L(f_ω(e, h), r(e, h))
   end
   return ω, E(h, k)
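A strongly simplified single iteration of this loop can be sketched as follows. The damping factor, learning rate, fixed-point iteration count, and the choice to treat the influence scores and the normalization constant as fixed when taking the gradient are all simplifying assumptions of this sketch, not the paper's exact derivation:

```python
def ipl_step(F, weights, B, tau=0.85, mu=0.1, k=2):
    """One simplified IPL iteration: score entities with the current
    weights, normalize, propagate influence via a random walk, then nudge
    the weights to reduce the squared error between similarity and
    influence scores of the top-k entities.
    F[i][j] is the j-th similarity measure of entity i."""
    n, d = len(F), len(weights)
    f = [sum(weights[j] * F[i][j] for j in range(d)) for i in range(n)]
    total = sum(f) or 1.0
    s = [v / total for v in f]              # normalized scores s_h
    r = list(s)
    for _ in range(50):                     # fixed-point iteration of Eq. 6
        r = [tau * sum(B[i][j] * r[j] for j in range(n)) + (1 - tau) * s[i]
             for i in range(n)]
    top = sorted(range(n), key=lambda i: r[i], reverse=True)[:k]
    # gradient of L = (s - r)^2 / 2 w.r.t. the weights, treating r and the
    # normalization constant as fixed (a simplifying assumption)
    grad = [sum((s[i] - r[i]) * F[i][j] / total for i in top)
            for j in range(d)]
    new_w = [w - mu * g for w, g in zip(weights, grad)]
    total_w = sum(new_w) or 1.0
    return [w / total_w for w in new_w]     # keep weights summing to 1
```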
Hashtag Sampling

•  For each peak, calculate the vector (fa, fb, fc) of the portions of tweets before, during, and after the peak time
•  Cluster with EM, choose the 4 most plausible clusters
•  Sample separately from each cluster
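The (fa, fb, fc) feature vector can be sketched as follows; the window width and normalizing over the peak's neighbourhood (rather than the whole series) are assumptions of this sketch:

```python
def peak_profile(daily_counts, peak_day, window=2):
    """(fa, fb, fc): fractions of tweet volume before, during, and after
    the peak day, computed over a +/- window of days around the peak."""
    before = sum(daily_counts[max(0, peak_day - window):peak_day])
    during = daily_counts[peak_day]
    after = sum(daily_counts[peak_day + 1:peak_day + 1 + window])
    total = before + during + after
    return (before / total, during / total, after / total)
```

These three fractions are the per-peak features that the EM clustering operates on.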