influencer - Marketwired
CONTEXTUAL INFLUENCER GRAPHS ON SOCIAL NETWORKS
It’s who you know, not how many!

1. INTRODUCTION

With the rapid growth in acceptance and importance of online social networks such as Twitter, Facebook and LinkedIn, an intriguing question has arisen: who are the key influencers in a given social network or community? Billions of conversations take place across social networks, blogs, and forums, and companies need social media monitoring and analytics tools such as Heartbeat and MAP, powered by Sysomos, to listen to, measure, and engage in the conversations that affect their brand or industry.

Social media analytics have borrowed many metrics from traditional marketing, such as demographics (gender, geography) and customer preferences (sentiment). New metrics that leverage the massive amounts of available social data, such as engagement, followers/friends, and posts, have also been introduced. Among these, the metric that identifies influencers, the most influential users of a social network, is considered both extremely powerful and important.

1.1 Network Influencers

Identifying the key influencers is crucial for companies in order to pinpoint target individuals who can potentially broadcast and endorse a brand’s message. Engaging these individuals gives a company more control over its brand’s online message and reduces the potential for negative sentiment. Careful management of this process can lead to exponential growth in online mindshare, especially in the case of viral marketing campaigns.

Most past approaches to determining influencers have focused solely on easily calculable metrics such as the number of followers/friends or the number of posts.
While the aggregated follower/friend count may approximate a user’s overall reach in the social network, it says little about who the key influencers are with respect to a particular company or brand. This leads to noisy influencer results and wasted time sifting through a massive volume of potential users.

[Figure 1. Sample Twitter follower network with PageRank scores: Amy 46.1%, Brian 42.3%, Carol 5.6%, Dave 3%, Eddie 3%.]

As a motivating example, consider the simplified follower network shown in Figure 1. Amy has the greatest number of followers and is the most influential user in this network. However, Brian, with only one follower, is more influential than Carol, who has two followers, primarily because Brian has a significant portion of Amy’s mindshare. In later sections, we present the calculations used to arrive at this unintuitive result and describe why it gives a more accurate view of online social influence. Hence, in an influence network, it is not how many followers you have but who your followers are that counts.

In this white paper, we describe a context-based influencer method that overcomes the drawbacks of previous approaches to extracting key influencers for a given topic in a social network. The user defines the “topic” with a keyword query, which fixes the context of interest. Our method focuses on Twitter, one of the largest social networking sites, with over 500 million registered users and 340 million tweets per day.

In the remaining sections of this paper, we first provide background information about Twitter and describe the challenges of handling large Twitter follower graphs. Next, we discuss our approach to dynamically rank the top influencers for a particular topic.
Finally, we describe several case studies that show the utility of our method. For each case study, we compare the influencers identified by our method to those identified by other commonly used methods. We show the quality improvement specifically in the top 12 influencers achieved by our method.

2. BACKGROUND

2.1 Twitter

Twitter is an online social networking service and micro-blogging platform with more than 500 million registered users sending over 340 million messages daily. One of its distinguishing characteristics is that it enables users to follow other users and receive their tweets. Our influencer analytics initially focus on Twitter because it is one of the biggest social networks with publicly available data.

2.2 Social Media Analytics

Current social media analytics sift through sparse and inaccurate data to provide insight into a brand’s social presence. In this subsection, we discuss several popular methods for analyzing data generated through Twitter. Typical analytics in social media tools include:

• Sentiment analysis: Determines the overall positive and negative sentiment of a particular topic.
• Trend measurement: Determines the number of tweets that mention a particular topic and displays the trend over time.
• Word/buzz clouds: Display a visual representation of the frequency of words in a given set of text data, where larger font sizes indicate higher frequencies. This can be used to analyze a body of tweets or a given set of user bio fields.

2.3 Static Influencer or Authority?

Several social media analytics companies (e.g., Klout.com, PeerIndex) claim to provide influencer scores for social networks. When we dig a little deeper into the implementation of their influencer algorithms, we find that in most cases the metric is not a true influencer metric but an algebraic formula incorporating the number of followers and the number of mentions (tweets, posts).
For example, the formula may be a logarithmic normalization of these numbers that allocates approximately 80% of the weight to the follower count and the remainder to the number of mentions. The reason for using an algebraic formula is clear: it can be computed extremely quickly. Moreover, the counts of followers and mentions are instantly updated in the Twitter user profile, which allows the results to be recalculated and reported in real time. The algebraic formula is typically called an Authority metric to distinguish it from a true influencer metric.

However, there are several significant drawbacks to this approach.

• Context insensitive: This is a static metric, i.e., it does not vary from topic to topic. For example, irrespective of the topic of interest, the Twitter handles of mass media outlets like the New York Times or CNN would get the highest rankings since they have millions of followers. Therefore, the metric is not context-sensitive.
• High follower count bias: The follower count can significantly affect the user rankings. For example, a well-regarded specialist in a certain field who has a limited number of followers, all of whom are also experts, will never show up in the top 20-100 ranked results because of the low follower count. Effectively, all followers are treated as having equal weight, which has been shown to be an incorrect assumption in social network analytics research.

Our approach addresses these shortcomings by (a) dynamically calculating influencers with respect to the query topic and (b) accounting for the influence of each individual follower when calculating the influencer metric for a user. The recursive nature of the influencer relation is the major challenge in implementing influencer identification on a massive scale.

3. INFLUENCERS ON TWITTER

We tested several influencer algorithms that are well regarded in the academic community, including PageRank, Eigenvector Centrality, Weighted Degree, Betweenness Centrality, Hub, and Authority metrics. We found that PageRank produces the highest quality results among these algorithms. Thus, we selected PageRank as the representative influencer algorithm to be compared against the algebraic Authority score currently used in Sysomos; similar formulaic scores are used in other influencer analytics methods. PageRank [1] is the famously scalable network influence algorithm popularized by Google co-founder Larry Page.

3.1 A Simplified PageRank Example

The example network in Figure 1 is represented in Table 1. It illustrates how PageRank can differ significantly from the number of followers.

User Handle   Follower Count   PageRank
Amy           4                46.1%
Brian         1                42.3%
Carol         2                5.6%
Dave          0                3.0%
Eddie         0                3.0%

Table 1. Twitter follower counts and PageRank scores for the sample network represented in Figure 1.

Amy is clearly the top influencer, with the greatest number of followers and the highest PageRank score. However, Brian, with only one follower, has a higher PageRank score than Carol, who has two followers. This is because Brian’s only follower is Amy, who has four followers and is the most influential person in the network, whereas Carol’s two followers both have low PageRank scores and no followers of their own. The intuition is simple: if a few experts consider someone an expert, then she or he is also an expert. In this example, PageRank gives a better measure of influence than simply counting the number of followers.

3.2 Influencer Graph Algorithm

The outline of the algorithm is as follows:

1. For a given user query (topic), use the Sysomos search engine to return all the tweets for the specified time period.
2. From the list of returned tweets, extract the tweet authors (user handles) and rank them by the Twitter Authority score. As mentioned in Section 2.3, this static Authority score is independent of the query.
3. From the ordered list of top-authority user handles, select the top N handles (N ≈ 5000). This number can be increased, depending on the scalability of the architecture and the desired response time of the system (e.g., under 5 seconds).
4. Build the follower network induced by the N handles by retrieving the follower list for each handle. Followers that do not appear in the list of N handles are discarded.
5. Calculate the PageRank [1] of this interconnected graph of Twitter user handles using an open-source library.

In the next few sections, we show that the PageRank results are, in fact, better than the Authority score. Quality assessment is necessarily subjective and requires human interpretation. We present case studies that show the superiority of our new algorithm in determining the authoritative or influential users.

3.3 Outlier & Noise Reduction

While the PageRank scores were of higher quality than the Authority score, we still saw some effects due to problematic outliers. For instance, the query referring to a fast food chain’s espresso coffee brand also brought back some users from the Philippines who are fans of a karaoke bar/cafe of the same name. Because they are a highly interconnected group of users, their influencer scores are often high enough to rank in the critical top 10 list. This phenomenon occurred quite frequently in our test cases.
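To make the graph-building and PageRank steps of Section 3.2 concrete, the following is a minimal sketch in plain Python rather than the open-source library used for the actual system, which the paper does not name. The function names, the damping factor of 0.85, and the extra handle "Frank" are assumptions made for illustration. The follower lists are the ones implied by Figure 1 and Table 1: everyone else follows Amy, Amy follows Brian, and Dave and Eddie follow Carol.

```python
def induced_follower_graph(followers, handles):
    """Keep only follow relationships between handles in the top-N list.

    `followers` maps a handle to the handles that follow it. The result maps
    each kept handle to the set of handles it follows (its out-links).
    """
    keep = set(handles)
    follows = {h: set() for h in handles}
    for followee, fans in followers.items():
        if followee not in keep:
            continue
        for fan in fans:
            # Followers outside the top-N list are discarded.
            if fan in keep and fan != followee:
                follows[fan].add(followee)
    return follows


def pagerank(follows, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; damping=0.85 is an assumed default."""
    nodes = list(follows)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[v] for v in nodes if not follows[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if follows[v]:
                share = damping * rank[v] / len(follows[v])
                for w in follows[v]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Follower lists implied by Figure 1 and Table 1. "Frank" is a hypothetical
# follower outside the top-N list, added only to show the discard step.
followers = {
    "Amy": ["Brian", "Carol", "Dave", "Eddie"],
    "Brian": ["Amy"],
    "Carol": ["Dave", "Eddie", "Frank"],
    "Dave": [],
    "Eddie": [],
}
handles = ["Amy", "Brian", "Carol", "Dave", "Eddie"]

ranks = pagerank(induced_follower_graph(followers, handles))
for handle in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{handle:6s} {100 * ranks[handle]:.2f}%")
```

Under these assumptions the scores agree with Table 1 to within rounding: roughly 46.2% for Amy, 42.3% for Brian, 5.6% for Carol, and 3.0% each for Dave and Eddie. Brian outranks Carol because his single follower, Amy, passes on all of her rank to him.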
Although it is never wise to infer what the user intended to search for from her query, we can assume she is not looking for both the fast food restaurant’s coffee brand and the Filipino karaoke bar, and thus users associated with the karaoke bar are considered noise.

To accomplish this noise reduction, we experimented with a network community detection algorithm called Modularity [2]. Applied to the previous example, this algorithm is likely to identify the users associated with the Filipino bar as a distinct group that is well separated from the other users in the network, which makes it possible to pinpoint and filter such outlier groups from the query results. We found this method effective in detecting and filtering outliers from the results the user query had (presumably) intended to find. The method is as follows:

1. After the PageRank algorithm listed in Section 3.2, we run the Modularity algorithm on the follower network.
2. The Modularity function decomposes the network into X communities or sub-networks (where X < N/2, as a community must have more than one member).
3. Each node is labeled with the ID of one of the X communities.
4. When the cumulative sum of the node population, taken over the communities from largest to smallest, exceeds 80% of the total number of nodes in the network, we cut off the remaining, relatively smaller communities.

Our experiments show that this method is effective in filtering out the outliers and enhancing the overall quality of results.

3.5 Influencer Case Studies

In this section, we show some of the experiments conducted to assess the quality of the influencer results.
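Before turning to the individual queries, the cutoff rule of the noise-reduction method in Section 3.3 can be sketched as follows. This is an illustration under stated assumptions, not the production implementation: the function name is hypothetical, the community labels are assumed to have been computed already by a modularity-based detector, and the community whose population pushes the running total past 80% is assumed to be kept, with only the communities after it cut off.

```python
from collections import Counter


def filter_small_communities(labels, keep_fraction=0.8):
    """Return the nodes that survive the cumulative-population cutoff.

    `labels` maps each node (user handle) to its community ID.
    """
    sizes = Counter(labels.values())
    total = len(labels)
    kept_communities = set()
    covered = 0
    # Walk the communities from largest to smallest.
    for community, size in sizes.most_common():
        kept_communities.add(community)
        covered += size
        if covered > keep_fraction * total:
            break  # every community after this one is cut off
    return {node for node, c in labels.items() if c in kept_communities}


# Hypothetical labels: 17 handles in the main community (0) and 3 handles in
# a small, well-separated outlier community (1).
labels = {f"main{i}": 0 for i in range(17)}
labels.update({f"outlier{i}": 1 for i in range(3)})

kept = filter_small_communities(labels)
print(len(kept), "of", len(labels), "handles kept")
```

In this toy input the main community alone covers 85% of the nodes, so the three outlier handles are dropped. The 80% threshold is exposed as `keep_fraction` so that it can be tuned for networks in which the outlier cluster is proportionally larger.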
We used sample free-form queries associated with “Fanexpo,” an athletic shoes brand, and a special food and drink brand created by a fast food chain, and compared the results of our proposed method to the Sysomos Authority score to assess the quality improvement. The Sysomos Authority score ranks users on a scale from 0 to 10, with 10 indicating the top users. Typically, the first dozen to one hundred results are rated 10 and dominate the first few result pages. As in web searches, users typically review only the first few pages of results and rarely look past handles rated less than 10. Hence, the top 5, top 10, and top 20 results are important for evaluating influencer quality.

3.5.1 Fast Food Chain Case Study

A famous fast food chain created a coffee-style food and drink brand that received many good reviews from consumers. The brand includes a wide variety of menu items such as coffee, lattes, espressos, and smoothies. The influencer results for that specific brand are shown in Table 2.

Ordered by PageRank                        Ordered by Authority
Twitter User          Authority  PageRank  Twitter User          Authority  PageRank
McCafe                8          2.255%    McDonald’s Corp.      10         1.682%
McDonald’s Corp.      10         1.682%    McDonald’s            10         0.959%
McDonald’s Philly     6          1.478%    Divine Lee            10         0.558%
Marti                 7          1.236%    Victor Basa           10         0.558%
McDonald’s SoCal      7          1.174%    Tyler Fox-Banks       10         0.279%
The Mommy-Files       8          1.164%    McDonald’s Venezuela  10         0.234%
McDonalds Eastern NE  6          1.091%    hashtags              10         0.203%
McDonaldsDMV          6          1.017%    GUYEL                 10         0.136%
Rick Wion             7          1.012%    The Product Poet      10         0.107%
McDonald’s Canada     9          0.960%    Mia Farrow            10         0.074%
McDonald’s            10         0.959%    Maxene Magalona       10         0.065%
McDonalds NYTriState  8          0.916%    XIAN LIM              10         0.065%
Utah McDonald’s       6          0.913%    Xeni Jardin           10         0.000%
Me Encanta            6          0.910%    Manado Kota           10         0.000%

Table 2. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “the food and drink” brand.

[Figure 2. The community graph visualization of the Twitter network for the query “the food and drink” brand.]

Several observations for these results:

• PageRank lists the brand’s official handle, McCafe, as the top influencer for the query, although its Authority score is only 8. As a result, it does not appear on the first page of results ordered by Authority score.
• Many of the fast food chain’s local/regional handles are rated highly by PageRank but have an Authority score lower than 10.
• PageRank rated the handle associated with the chain’s VP of Social Media Engagement as the ninth highest, although the handle has a low Authority score of 7. This high ranking makes sense because the VP is clearly an influencer of the brand on Twitter.
• The Authority-ordered list contains many unrelated handles that may have mentioned the brand’s name in their tweets and have many followers, but are clearly not influencers for this brand.

These observations demonstrate the superior quality of the PageRank influencer results.

The network graph in Figure 2 was created with Gephi [3], an open-source network analysis and visualization software package. The visualization of the network provides some powerful insights and illustrates the use of Modularity for outlier reduction. Several points of interest:

• The main cluster (on the right) contains many of the fast food chain’s corporate and local handles. As expected, these handles are central to the network and have significant influence.
• There is an outlier cluster (on the left) containing several handles (e.g., “Rian Fellani,” “chubbybunny”), some of which have a high PageRank. These handles mentioned the brand’s product name in their tweets but were referring to a karaoke bar in the Philippines; they are clearly not the target users.
• Although the outlier cluster is clearly referring to a different topic, it still contains links to the main cluster. This is consistent with the “six degrees of separation” theory.

The Modularity algorithm detects this outlier cluster as a community separate from the main cluster. As described in Section 3.3, the noise reduction algorithm removes the outlier cluster from the final results, providing superior influencer quality.

3.5.2 Fanexpo Case Study

Fanexpo is an annual convention of comics, sci-fi and fantasy entertainment held in Toronto. The top-ranked influencers are shown in Table 3.

Ordered by PageRank                        Ordered by Authority
Twitter User          Authority  PageRank  Twitter User          Authority  PageRank
Fan Expo Canada       8          1.241%    Dark Horse Comics     10         0.749%
C.B. Cebulski         9          0.966%    Torontoist            10         0.778%
Silver Snail          7          0.822%    Michael Rooker        10         0.580%
SpaceChannel          8          0.790%    Amanda Tapping        10         0.563%
Torontoist            10         0.778%    National Post         10         0.432%
Dark Horse Comics     10         0.749%    CTV Toronto           10         0.322%
Mark Brooks           8          0.671%    CBC Top Stories       10         0.310%
Michael Shanks        9          0.661%    Nathan Fillion        10         0.358%
Katie Cook            8          0.659%    Brent Spiner          10         0.350%
Kelly Sue DeConnick   8          0.637%    Jessica Nigri         10         0.338%
Ramon Perez           7          0.632%    Meg Turney            10         0.132%
Shaun Hatton          7          0.627%    The Walking Dead      10         0.215%
Fearless Fred         9          0.614%    Eduardo Benvenuti     10         0.119%
Alice Quinn           7          0.583%    Randy Pitchford       10         0.118%

Table 3. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “Fanexpo.”

Several interesting observations can be made when analyzing these results:

• PageRank lists the handle Fan Expo Canada as the top influencer for the query, while the Authority metric gave it a score of 8.
• The handle ranked second by PageRank, C.B. Cebulski, is a famous writer for Marvel comics and obviously very influential in this domain.
• Neither of these two handles appears on the critical first page of user handles ranked by Authority score.
• The next four handles, Silver Snail, SpaceChannel, Torontoist, and Dark Horse Comics, are a comics store in Toronto, a sci-fi TV channel, a Toronto entertainment blog, and a comics publisher, respectively.
• The Authority score places the general news outlets National Post, CTV Toronto, and CBC Top Stories high in the ranked list, although they are not appropriate for this topic.
• The next series of handles ranked by PageRank includes writers for Marvel or DC comics and actors in sci-fi or fantasy films or TV series. Many of them are assigned Authority scores of less than 10.

Again, these observations re-emphasize the superior quality of the PageRank influencer results.

3.5.3 Athletic Shoes Brand Case Study

An organization to benefit cancer research was founded by a famous athlete who was later caught in a doping scandal. An equally famous athletic shoes brand that had a partnership with this organization cut ties when the athlete was implicated in the scandal. The influencer results for the query that combined the athletic shoes brand and the name of the cancer benefit organization are shown in Table 4.

Ordered by PageRank                        Ordered by Authority
Twitter User          Authority  PageRank  Twitter User          Authority  PageRank
Darren Rovell         10         0.63%     Darren Rovell         10         0.63%
The Associated Press  10         0.45%     The Associated Press  10         0.45%
Juliet Macur          8          0.40%     Nice Kicks            10         0.37%
Deadspin              10         0.37%     Deadspin              10         0.37%
Nice Kicks            10         0.37%     NBC Nightly News      10         0.32%
Joseph Weisenthal     9          0.34%     Jim Roberts           10         0.34%
Jim Roberts           10         0.34%     Bloomberg News        10         0.34%
Bloomberg News        10         0.34%     Sports Illustrated    10         0.32%
NBC Nightly News      10         0.32%     Business Insider      10         0.29%
Sports Illustrated    10         0.32%     CBSSports.com         10         0.28%
NYT Sports            9          0.29%     Complex               10         0.26%
Business Insider      10         0.29%     Cyclingnews.com       10         0.25%
CBSSports.com         10         0.28%     Fast Company          10         0.20%

Table 4. The top-ranked Twitter handles ordered by PageRank score and Authority score for the query “athletic shoes brand and the cancer benefit organization.”

Several interesting points from Table 4:

• Many of the top influencers with an Authority score of 10 are sports news handles or sports journalists who wrote extensively on the athlete’s doping scandal.
• In particular, Juliet Macur is ranked third by PageRank while her Authority score is 8. She is a New York Times sports journalist who wrote the book on the doping scandal.
• Joseph Weisenthal is a sports business insider who tweeted about the doping scandal and about the partnership between the athletic shoes brand and the cancer benefit organization.
• While it may be difficult to distinguish between all the handles with an Authority score of 10, the PageRank score gives more specificity to the relative rank of the influencers.

These observations once again establish the superior quality of the PageRank influencer results.

4. CONCLUSIONS

In this paper, we present a novel algorithm for context-sensitive influencer ranking using network analysis methods such as PageRank and Modularity. We demonstrate the need for a network-based influencer score and show that the PageRank method is both suitable and scalable for this task. We also show that Modularity can help identify and separate outliers, which can potentially degrade the quality of the results, from the main/intended results. Using several case studies, we show that our new approach qualitatively outperforms the Authority scores prevalent in the social analytics industry.

REFERENCES

[1] Page, Lawrence; Brin, Sergey; Motwani, Rajeev and Winograd, Terry (1999). “The PageRank citation ranking: Bringing order to the Web.”
[2] Newman, M. E. J. (2006). “Modularity and community structure in networks.” Proceedings of the National Academy of Sciences USA 103 (23): 8577-8582.
[3] Gephi, an open-source network analysis and visualization software package. www.gephi.org

The Sysomos Intelligence Engine powers Marketwired’s portfolio of products. ©2014 Marketwire L. P. All rights reserved.