influencer - Marketwired

CONTEXTUAL INFLUENCER GRAPHS ON SOCIAL NETWORKS
It’s who you know, not how many!
1. INTRODUCTION
With the rapid growth in acceptance and importance of online social networks such as Twitter, Facebook and
LinkedIn, a very intriguing question has started to arise: who are the key influencers in a given social network or
community? There are billions of conversations happening across social networks, blogs, and forums, and it is
imperative for companies to utilize social media monitoring and analytics such as Heartbeat and MAP, powered
by Sysomos, to listen, measure and engage in conversations that impact their brand or industry.
Social media analytics have borrowed many metrics from traditional marketing such as demographics (gender,
geography) and customer preferences (sentiment). In addition, new metrics that leverage the massive amounts of
available social data such as engagement, followers/friends, and posts have also been introduced. Among these,
the metric to identify influencers, or the most influential users of a social network, is considered to be both extremely
powerful and important.
1.1 Network Influencers
Identifying the key influencers is crucial for companies in order to pinpoint target individuals who can potentially
broadcast and endorse a brand’s message. Engaging these individuals allows control over a brand’s online message
and minimizes the potential for negative sentiment. Careful management of this process can lead to exponential
growth in online mindshare – especially in the case of viral marketing campaigns.
Most past approaches to determining influencers have focused solely on easily
calculable metrics such as number of followers/friends or number of posts.
While the aggregated follower/friends count may approximate the overall social
network, it provides little in the way of understanding the key influencers with
respect to a company or brand. This leads to noisy influencer results and wasted
time sifting through the massive volume of potential users.
Figure 1. Sample Twitter follower network with PageRank scores (Amy 46.1%, Brian 42.3%, Carol 5.6%, Dave 3%, Eddie 3%)
As a motivating example, consider the simplified follower network shown in Figure 1. Amy has the greatest number of
followers and is the most influential user in this network. However, Brian, with only one follower, is more influential than
Carol with two followers, primarily because Brian has a significant portion of Amy’s mindshare. In later sections, we will
present the calculations used to arrive at this unintuitive result and describe why it presents a more accurate view of
online social influence. Hence, in an influence network, what counts is not how many followers you have but who
your followers are.
In this white paper, we describe a context-based influencer method that
overcomes the drawbacks of previous approaches to extracting key influencers
for a given topic in a social network. The user defines the “topic” with a
keyword query, which fixes the context of interest.
Our context-based influencer method focuses on Twitter, one of the largest social networking sites with over 500
million registered users and 340 million tweets per day.
In the remaining sections of this paper we first provide background information about Twitter and describe the challenges
of handling large Twitter follower graphs. Next, we discuss our approach to dynamically rank the top influencers for
a particular topic. Finally, we describe several case studies that show the utility of our method. For each case study,
we compare the influencers our method identified to the influencers identified by other common and currently used
methods. We show the quality improvement specifically in the top 12 influencers achieved by our method.
2. BACKGROUND
2.1 Twitter
Twitter is an online social networking service and micro-blogging platform that has more than 500 million registered users
sending over 340 million messages daily. One of its distinguishing characteristics is that it enables users to follow other
users and receive their tweets. The influencer analytics initially focuses on Twitter, because it is one of the biggest social
networks with publicly available data.
2.2 Social Media Analytics
Current social media analytics sift through sparse and inaccurate data to provide insight into a brand’s social presence. In
this subsection, we discuss several popular social media analytics methods for analyzing data generated through Twitter.
Typical analytics for social media tools include:
• Sentiment Analysis: Determines the overall positive and negative sentiment of a particular topic.
• Trend Measurement: Determines the number of tweets that mention a particular topic and displays the trend over time.
• Word/Buzz clouds: Displays a visual representation of the frequency of words in a given set of text data where larger
font sizes indicate higher frequencies. This can be used to analyze a body of tweets, or a given set of user bio fields.
2.3 Static Influencer or Authority?
Several social media analytics companies (e.g., Klout.com, PeerIndex) claim to provide influencer scores for social
networks. When we dig a little deeper into the implementation of their influencer algorithm, we find in most cases, the
metric is not a true influencer metric, but an algebraic formula incorporating the number of followers and the number
of mentions (tweets, posts). For example, the formula may be a logarithmic normalization of these numbers that
allocates approximately 80% of the weight to the follower counts and the remainder to the number of mentions.
The reason for using an algebraic formula is clear: it can be computed extremely quickly. Moreover, the counts of
followers and mentions are instantly updated in the Twitter user profile and this allows for the results to be recalculated
and reported in real time. The algebraic formula is typically called an Authority metric to distinguish it from a true
influencer metric.
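To make the idea concrete, here is a minimal sketch of what such an algebraic score could look like. It is not the formula used by any vendor named above: the function name `authority_score`, the `log1p` normalization against a maximum, the 0 to 10 scale, and the sample numbers are all assumptions chosen only to illustrate an 80/20 weighting of followers and mentions.

```python
import math


def authority_score(followers, mentions, max_followers, max_mentions,
                    follower_weight=0.8):
    """Hypothetical log-normalized authority score on a 0-10 scale.

    Assumption: each count is normalized by log1p against the largest
    count seen, then blended 80/20 in favor of the follower count.
    """
    def log_norm(value, maximum):
        if maximum <= 0:
            return 0.0
        return math.log1p(value) / math.log1p(maximum)

    blended = (follower_weight * log_norm(followers, max_followers)
               + (1.0 - follower_weight) * log_norm(mentions, max_mentions))
    return round(10 * blended)


# Made-up numbers: a mass-media handle with one on-topic mention versus
# a specialist with few followers who tweets about the topic constantly.
media = authority_score(5_000_000, 1, 5_000_000, 200)
specialist = authority_score(800, 200, 5_000_000, 200)
print(media, specialist)
```

Because only two counters are involved, a score of this kind can be recomputed instantly, but under these assumptions the mass-media handle still outranks the specialist regardless of topic.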
However, there are several significant drawbacks to this approach.
• Context Insensitive: This is a static metric, i.e., it does not vary from topic to topic. For example, irrespective of the
topic of interest, the Twitter handles of mass media outlets like the New York Times or CNN would get the highest
rankings since they have millions of followers.
• High Follower Count Bias: The follower count can significantly affect the user rankings. For example, a well-regarded
specialist in a certain field who has a limited number of followers, all of them also experts, will never
show up in the top 20-100 ranked results because of the low follower count. Effectively, all followers are treated as
having equal weight, which has been shown to be an incorrect assumption in social network analytics research.
Our approach addresses these shortcomings by: a) dynamically calculating influencers with respect to the query topic and
b) accounting for the influence of each individual follower when calculating the influencer metric for a user. The recursive
nature of the influencer relation is the major challenge in implementing influencer identification on a massive scale.
3. INFLUENCERS ON TWITTER
We tested several influencer algorithms that are well regarded in the academic community, including PageRank,
Eigenvector Centrality, Weighted Degree, Betweenness Centrality, Hub, and Authority metrics. We found that
PageRank produces the highest quality results among all the aforementioned influencer algorithms. Thus, we selected
PageRank as the representative influencer algorithm to be compared against the algebraic Authority score currently
used in Sysomos; similar formulaic scores are used in other influencer analytics methods. PageRank[1] is the famously
scalable network influence algorithm popularized by Google co-founder Larry Page.
3.1 A Simplified PageRank Example
The example network in Figure 1 is represented in Table 1. It illustrates how PageRank can significantly differ from the
number of followers.
User Handle   Follower Count   PageRank
Amy           4                46.1%
Brian         1                42.3%
Carol         2                5.6%
Dave          0                3.0%
Eddie         0                3.0%
Table 1. Twitter follower counts and PageRank scores for sample network represented in Figure 1.
Amy is clearly the top influencer, with the greatest number of followers and the highest PageRank score. However, Brian,
with only one follower, has a higher PageRank score than Carol, who has two. This is because Brian’s only
follower is Amy, who has 4 followers and is the most influential person in the network, whereas
Carol’s two followers both have 0 followers and low PageRank scores.
The intuition is obvious: if a few experts consider someone an expert, then she/he is also an expert. Clearly, the
PageRank algorithm gives a better measure of influence than simply counting the number of followers.
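The scores in Table 1 can be approximated with a short power-iteration sketch. The follow relationships below are inferred from the follower counts and the description above (everyone follows Amy, Amy follows only Brian, and Dave and Eddie also follow Carol). The damping factor of 0.85 and the iteration count are assumptions, since the paper does not state the parameters it used.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Power-iteration PageRank; follows[u] lists the handles that u follows."""
    nodes = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by handles that follow nobody is spread over all nodes.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for u in nodes:
            followees = follows.get(u, [])
            for v in followees:
                # A follower splits its own rank evenly among those it follows.
                new_rank[v] += damping * rank[u] / len(followees)
        rank = new_rank
    return rank


# Follow edges inferred from Figure 1 / Table 1 (an assumption).
follows = {
    "Amy": ["Brian"],
    "Brian": ["Amy"],
    "Carol": ["Amy"],
    "Dave": ["Amy", "Carol"],
    "Eddie": ["Amy", "Carol"],
}
scores = pagerank(follows)
for handle, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{handle:6s} {score:.1%}")
```

Under these assumptions the computed scores agree with Table 1 to within rounding: Brian inherits most of Amy's score because he is the only handle Amy follows, while Carol receives only half of the small scores held by Dave and Eddie.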
3.2 Influencer Graph Algorithm
The outline of the algorithm is as follows:
1. For a given user query (topic), use the Sysomos search engine to return all the tweets for the specified time period.
2. From the list of returned tweets, extract the tweet authors (user handles) and find the top authors using the Twitter
Authority score. As mentioned in Section 2.3 above, this static Authority score is independent of the query.
3. From the ordered list of top authority user handles, select the top N (N ≈ 5000) user handles. This number can be
increased, depending on the scalability of the architecture and the desired response time of the system (e.g.,
under 5 seconds).
4. For each user handle in the top N, retrieve its follower list and build the follower network induced by the N
handles. Followers that do not appear in the list of N handles are discarded.
5. Use an open-source library to calculate the PageRank [1] of this interconnected graph of Twitter user handles.
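Steps 2 to 4 of the outline above can be sketched as follows. This is an illustration only: the function name `induced_follower_graph` and the inputs `authors`, `authority`, and `followers_of` are hypothetical stand-ins for data that the search engine and Twitter would provide, and no real API is called.

```python
def induced_follower_graph(authors, authority, followers_of, top_n=5000):
    """Keep the top_n authors by authority and only the follows among them.

    Returns a dict mapping each kept handle to the kept handles it follows.
    """
    # Steps 2-3: rank the distinct tweet authors by the static authority score.
    ranked = sorted(set(authors), key=lambda h: (-authority.get(h, 0), h))
    top = ranked[:top_n]
    keep = set(top)

    # Step 4: discard followers that are not among the top_n handles.
    follows = {handle: [] for handle in top}
    for handle in top:
        for follower in followers_of.get(handle, []):
            if follower in keep and follower != handle:
                follows[follower].append(handle)
    return follows


# Made-up sample data for a single query.
authors = ["amy", "brian", "carol", "amy", "zed"]
authority = {"amy": 10, "brian": 8, "carol": 7, "zed": 2}
followers_of = {
    "amy": ["brian", "carol", "outsider"],
    "brian": ["amy"],
    "carol": [],
    "zed": ["amy"],
}
graph = induced_follower_graph(authors, authority, followers_of, top_n=3)
print(graph)
```

The resulting dictionary is the graph on which step 5 would run PageRank. Restricting the graph to the top N handles bounds the work per query, which is what makes the response-time target in step 3 plausible.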
In the next few sections, we show that the PageRank results are, in fact, better than the Authority score. Quality
assessment is necessarily subjective and requires human interpretation. We present case studies that show the
superiority of our new algorithm in determining the authoritative or influential users.
3.3 Outlier & Noise Reduction
While the PageRank scores were of higher quality than the Authority score, we still saw some effects due to problematic
outliers. For instance, the query referring to a fast food chain’s espresso coffee brand also happened to bring back some
users from the Philippines who are fans of a karaoke bar/cafe of the same name. Because they happen to be a highly
inter-connected group of users, their influencer score is often high enough to rank in the critical top 10 list.
This phenomenon happened quite frequently in our test cases. Although it’s
never wise to infer what the user intended to search for by her query, we can
assume she is not looking for both the fast food restaurant’s coffee brand and
the Filipino karaoke bar, and thus users associated with the karaoke bar are
considered noise.
To accomplish this noise reduction, we experimented with a network community detection algorithm based on
modularity optimization, referred to here as Modularity [2]. Applied to the previous example, this algorithm will likely
identify the users associated with the Filipino bar as a distinct group that is well separated from the other users in the
network, which helps pinpoint and filter such outlier groups from the query results. We found this method effective in
detecting and filtering outliers from the main results the user query had (presumably) intended to find. The method is as
follows:
1. After running the PageRank algorithm described in Section 3.2, we run the Modularity algorithm on the follower network.
2. The Modularity function decomposes the network into X communities or sub-networks (where X ≤ N/2, as a
community must have more than one member).
3. Each node is labeled with the ID of one of the X communities.
4. Taking the communities in order from largest to smallest, once the cumulative node population exceeds 80% of the
total number of nodes in the network, we cut off the remaining smaller communities.
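The 80% cutoff described above can be sketched as follows, assuming the community labels have already been produced by a modularity-based detection step. The function name `filter_small_communities`, the `community_of` mapping, and the choice to keep the community that crosses the threshold are assumptions of this sketch, not details confirmed by the paper.

```python
from collections import Counter


def filter_small_communities(community_of, coverage=0.8):
    """Keep the largest communities until they cover more than `coverage`
    of all nodes, then drop every node in the remaining communities."""
    sizes = Counter(community_of.values())
    total = len(community_of)
    kept, covered = set(), 0
    # most_common() yields communities from largest to smallest.
    for community, size in sizes.most_common():
        kept.add(community)
        covered += size
        if covered > coverage * total:
            break
    return {node: c for node, c in community_of.items() if c in kept}


# Made-up labels: 7 nodes in community 0, 3 in community 1, 2 in community 2.
community_of = {f"user{i}": 0 for i in range(7)}
community_of.update({f"user{i}": 1 for i in range(7, 10)})
community_of.update({f"user{i}": 2 for i in range(10, 12)})

filtered = filter_small_communities(community_of)
print(sorted(set(filtered.values())), len(filtered))
```

In this sample, communities 0 and 1 together cover 10 of 12 nodes, which exceeds 80%, so the two nodes in community 2 are treated as outliers and removed from the ranked results.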
Our experiments show the above-discussed method is effective in filtering out the outliers and enhancing the overall
quality of results.
3.5 Influencer Case Studies
In this section, we show some of the experiments conducted to assess the quality of the influencer results. We used
the sample free-form queries associated with “Fanexpo,” “athletic shoes brand,” and “a special kind of food and drink
brand created by a fast food chain,” and compared the results of our proposed method to the Sysomos Authority
score to assess the quality improvement.
The Sysomos Authority score ranks users on a scale of 0 to 10, with 10 indicating the top users. Typically, the first dozen to
one hundred results are rated 10 and dominate the first few result pages. As in web searches, users will typically review
only the first few pages of results and rarely look past entries rated below 10. Hence, the top 5, top 10, and top 20
results are important for evaluating influencer quality.
3.5.1 Fast Food Chain Case Study
A famous fast food chain created a coffee-style food and drink brand that was well reviewed by
consumers. The brand includes a wide variety of menu items such as coffee, lattes, espressos, and smoothies. The
influencer results for that specific brand are shown in Table 2.
Twitter users         Authority  PageRank  | Twitter users         Authority  PageRank
ordered by PageRank   score                | ordered by Authority  score
McCafe                8          2.255%    | McDonald’s Corp.      10         1.682%
McDonald’s Corp.      10         1.682%    | McDonald’s            10         0.959%
McDonald’s Philly     6          1.478%    | Divine Lee            10         0.558%
Marti                 7          1.236%    | Victor Basa           10         0.558%
McDonald’s SoCal      7          1.174%    | Tyler Fox-Banks       10         0.279%
The Mommy-Files       8          1.164%    | McDonald’s Venezuela  10         0.234%
McDonalds Eastern NE  6          1.091%    | hashtags              10         0.203%
McDonaldsDMV          6          1.017%    | GUYEL                 10         0.136%
Rick Wion             7          1.012%    | The Product Poet      10         0.107%
McDonald’s Canada     9          0.960%    | Mia Farrow            10         0.074%
McDonald’s            10         0.959%    | Maxene Magalona       10         0.065%
McDonalds NYTriState  8          0.916%    | XIAN LIM              10         0.065%
Utah McDonald’s       6          0.913%    | Xeni Jardin           10         0.000%
Me Encanta            6          0.910%    | Manado Kota           10         0.000%
Table 2. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “the food and
drink” brand.
Figure 2. The community graph visualization of the Twitter network for the query “the food and drink” brand.
Several observations for these results:
• PageRank accurately lists the official handle of “the food and drink” brand (McCafe in Table 2) as the top influencer
for the query, while its Authority score is only 8. As a result, it does not appear on the first page of results ranked by
Authority score.
• Many of the fast food chain’s local/regional handles are rated highly by PageRank but have an Authority score lower
than 10.
• PageRank rated the handle associated with the VP of Social Media Engagement of the fast food chain as the ninth
highest although the handle had a low Authority score of 7. This high ranking by PageRank makes sense because as
the VP, he/she is clearly an influencer of the brand on Twitter.
• Notice, there are many inappropriate names in the Authority score list who may have mentioned the brand’s name in
their tweets and have a lot of followers, but they are clearly not influencers.
These observations demonstrate the superiority of the quality of the PageRank influencer results.
The network graph in Figure 2 was created with Gephi [3], an open-source network analysis and visualization
software package. The visualization of the network provides some powerful insights and illustrates the use of
Modularity for outlier reduction.
Several points of interest:
• The main cluster (on the right) contains many of the fast food chain’s corporate and local handles. As expected,
these handles are central to the network and have significant influence.
• There is an outlier cluster (on the left) containing several handles (e.g., “Rian Fellani” and “chubbybunny”), some of
which have a high PageRank. These handles mentioned the brand’s product in their tweets but were referring to a
karaoke bar in the Philippines, so they are clearly not the target users.
• Although the outlier cluster is clearly referring to a different topic, it still contains links to the main cluster.
This is consistent with the “six degrees of separation” theory.
The Modularity algorithm detects this outlier cluster as a separate community from the main cluster.
As described in Section 3.3, the noise reduction algorithm removes the outlier cluster from the final results,
providing superior influencer quality.
3.5.2 Fanexpo Case Study
Fanexpo is an annual convention of comics, sci-fi and fantasy entertainment held in Toronto. The top-ranked
influencers are shown in Table 3.
Twitter users         Authority  PageRank  | Twitter users         Authority  PageRank
ordered by PageRank   score                | ordered by Authority  score
Fan Expo Canada       8          1.241%    | Dark Horse Comics     10         0.749%
C.B. Cebulski         9          0.966%    | Torontoist            10         0.778%
Silver Snail          7          0.822%    | Michael Rooker        10         0.580%
SpaceChannel          8          0.790%    | Amanda Tapping        10         0.563%
Torontoist            10         0.778%    | National Post         10         0.432%
Dark Horse Comics     10         0.749%    | CTV Toronto           10         0.322%
Mark Brooks           8          0.671%    | CBC Top Stories       10         0.310%
Michael Shanks        9          0.661%    | Nathan Fillion        10         0.358%
Katie Cook            8          0.659%    | Brent Spiner          10         0.350%
Kelly Sue DeConnick   8          0.637%    | Jessica Nigri         10         0.338%
Ramon Perez           7          0.632%    | Meg Turney            10         0.132%
Shaun Hatton          7          0.627%    | The Walking Dead      10         0.215%
Fearless Fred         9          0.614%    | Eduardo Benvenuti     10         0.119%
Alice Quinn           7          0.583%    | Randy Pitchford       10         0.118%
Table 3. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “Fanexpo.”
Several interesting observations can be seen when analyzing these results:
• PageRank accurately lists the handle Fan Expo Canada as the top influencer for the query, while the Authority
metric gave it a score of 8.
• The handle ranked second by PageRank, C.B. Cebulski, is a famous writer for Marvel comics and is obviously very
influential in this domain.
• Notice that the above two handles do not appear in the critical first page of user handles ranked by Authority score.
• The next four handles, Silver Snail, SpaceChannel, Torontoist, and Dark Horse Comics, are a comics store
in Toronto, a sci-fi TV channel, a Toronto entertainment blog, and a comics publisher, respectively.
• The Authority score places the general news outlets National Post, CTV Toronto, and CBC Top Stories near the top
of the ranked list, even though they are not specific to this topic.
• The next series of handles ranked by PageRank includes writers for Marvel or DC comics and actors in sci-fi or
fantasy films or TV series. Notice that many of them are assigned Authority scores of less than 10.
Again, these observations re-emphasize the superiority of the quality of the PageRank influencer results.
3.5.3 Athletic Shoes Brand Case Study
An organization to benefit cancer research was founded by a famous athlete who was later caught in a doping scandal.
An equally famous athletic shoes brand that had a partnership with this organization cut ties when the athlete was
implicated in the scandal. The influencer results for the query that combined the athletic shoes brand and the name of
the cancer benefit organization are shown in Table 4.
Twitter users         Authority  PageRank  | Twitter users         Authority  PageRank
ordered by PageRank   score                | ordered by Authority  score
Darren Rovell         10         0.63%     | Darren Rovell         10         0.63%
The Associated Press  10         0.45%     | The Associated Press  10         0.45%
Juliet Macur          8          0.40%     | Nice Kicks            10         0.37%
Deadspin              10         0.37%     | Deadspin              10         0.37%
Nice Kicks            10         0.37%     | NBC Nightly News      10         0.32%
Joseph Weisenthal     9          0.34%     | Jim Roberts           10         0.34%
Jim Roberts           10         0.34%     | Bloomberg News        10         0.34%
Bloomberg News        10         0.34%     | Sports Illustrated    10         0.32%
NBC Nightly News      10         0.32%     | Business Insider      10         0.29%
Sports Illustrated    10         0.32%     | CBSSports.com         10         0.28%
NYT Sports            9          0.29%     | Complex               10         0.26%
Business Insider      10         0.29%     | Cyclingnews.com       10         0.25%
CBSSports.com         10         0.28%     | Fast Company          10         0.20%
Table 4. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “athletic shoes
brand and the cancer benefit organization.”
Several interesting points from Table 4:
• Many of the top influencers with Authority score 10 are sports news handles or sports journalists who wrote
extensively on the athlete’s doping scandal.
• In particular, Juliet Macur is third-ranked by PageRank while her Authority score is 8. She is a New York Times sports
journalist who wrote the book on the doping scandal.
• Joseph Weisenthal is a sports business insider who tweeted about the doping scandal and on the partnership
between the “athletic shoes brand and the cancer benefit organization.”
• While it is difficult to distinguish between the many handles that share an Authority score of 10, the PageRank score
differentiates the relative rank of these influencers.
These observations once again establish the superiority of the quality of the PageRank influencer results.
4. CONCLUSIONS
In this paper, we present a novel algorithm for context-sensitive influencer ranking using network analysis methods
such as PageRank and Modularity. We demonstrate the need for a network-based influencer score, and show that
the PageRank method is both suitable and scalable for this task. We also show that Modularity can help identify and
separate outliers (that can potentially degrade the quality of the results) from the main/intended results. Using several
case studies, we show that our new approach qualitatively outperforms other existing Authority scores prevalent in the
social analytics industry.
REFERENCES
[1] Page, Lawrence; Brin, Sergey; Motwani, Rajeev; and Winograd, Terry (1999). “The PageRank Citation Ranking:
Bringing Order to the Web.” Technical Report, Stanford InfoLab.
[2] Newman, M. E. J. (2006). “Modularity and community structure in networks.” Proceedings of the National
Academy of Sciences USA 103 (23): 8577–8582.
[3] Gephi, an open-source network analysis and visualization software package. www.gephi.org
The Sysomos Intelligence Engine powers Marketwired’s portfolio of products.
©2014 Marketwire L. P. All rights reserved.