PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia
ISSN: 1983-9456 (Print) ISSN: 2317-0123 (Online)
Editor: Fauze Najib Mattar
Review system: Triple Blind Review
Languages: Portuguese and English
Publisher: ABEP – Associação Brasileira de Empresas de Pesquisa

Statistical Analysis of Users who Chatting about Beer on Twitter¹
(Portuguese title: Análise de Usuários que Conversam sobre Cerveja no Twitter)

Submission: March 28, 2014 - Approval: April 14, 2014

Rodrigo Otávio de Araújo Ribeiro
PhD and Master's degree in Production Engineering from Universidade Federal Fluminense (UFF). Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas (ENCE/IBGE). He has extensive experience in statistical modeling of large databases and is currently Director of Marketing Intelligence at IBOPE DTM. E-mail: [email protected] Professional address: IBOPE DTM - Rua Voluntários da Pátria, nº 89, sala 803 - 22270-000 Botafogo - Rio de Janeiro/RJ - Brasil.

Tarsila Gomes Bello Tavares
Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas (ENCE/IBGE). She is currently Coordinator of Marketing Intelligence at IBOPE DTM. E-mail: [email protected]

Daniel de Oliveira Cohen
Bachelor's degree in Statistics from the State University of Campinas (UNICAMP). He performs statistical analyses such as regression, segmentation and social network analysis on data collected through quantitative surveys. He is currently a Statistician at IBOPE Intelligence. E-mail: [email protected]

¹ This paper was presented at ABEP's 6th Brazilian Market, Opinion and Media Research Congress (held on March 24 and 25, 2014) and won the "Alfredo Carmo" Prize. It was turned into this article by its authors and was submitted to and approved for publication by PMKT.
ABSTRACT

The identification of influential users in social media is a subject that has attracted great interest from companies in recent years. This work evaluates this influence by using graphs to understand the relational structure among users, established through their conversations on Twitter. Exploratory data analysis and text mining techniques were used to draw further conclusions about the subject. The "conversation environment" chosen was Brazilian beer, and the search terms were the major brands active in the domestic market. The evaluation considered a sample of 25 days between December 2013 and January 2014.

KEYWORDS: Beer, Twitter, Social network analysis.

PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Print and ISSN 2317-0123 Online), São Paulo, Brazil, Vol. 14, pp. 174-195, April 2014 - www.revistapmkt.com.br

1. INTRODUCTION

This article aims to identify the most influential Twitter users who posted messages about beer. A sample of 25 days between December 2013 and January 2014 was used, considering only posts made in Portuguese, in Brazil. The content of the conversations was also assessed by applying text mining algorithms.

A descriptive analysis was performed of the general behavior of Twitter users who talk about the subject, aiming to understand the use and impact of the different brands and the users' profiles. Most posts on the subject were made in the afternoon and evening, and the distribution of the number of messages per user is strongly asymmetric: most users posted only a single message during the period. By observing the peaks in the time series of the total number of messages posted, it was possible to evaluate the effect of holidays: user behavior at New Year was very close to that observed at Christmas.

In the semantic evaluation of posts about beer, many topics (themes) within the main subject were identified. This kind of information can help companies target their strategies and continuously monitor consumer behavior. When users post messages about beer, they often mention where, with whom, or when they will drink it, and frequently also mention their preferred brands.

The analysis of user influence in social networks enables various marketing strategies. The most influential users on a particular subject can be contacted by companies to publicize their brands, acting as links between the companies and other end users.
In this work, influence was measured by the number of connections each user had during the study period. On Twitter, users can direct their messages to each other and pass on information shared by any of their connections (retweets). One way to assess a user's degree of influence is therefore to count the connections that pass his or her messages on, or the connections to which he or she directs posts. This paper evaluates both points of view.

The study is structured as follows. After this introduction, the second part presents the main features of the techniques used in the analysis. The third gives a brief explanation of Twitter and its strong growth in Brazil. The fourth contextualizes the domestic beer market, its evolution, trends and key brands. The fifth details the analytical methodology, clarifying the questions the study answers. The sixth describes the data used. The seventh presents the results of the data analysis. The eighth presents the main conclusions and, finally, the limitations and suggestions for further research.

2. THEORETICAL BACKGROUND

2.1 SOCIAL NETWORK ANALYSIS

A social network is determined by a set of actors (or nodes) and pre-established relationships between them (WASSERMAN; FAUST, 1994). Actors can take many forms and represent different kinds of entities, such as individual users, companies and other organizations.
Because of its great flexibility, social network analysis (SNA) can be applied in almost any context. SNA results are generally represented visually as graphs, in which the actors (nodes) are drawn as dots and the relationship between a pair of nodes is drawn as an edge (connection). Connections can be directed when it is important to record which actor was the source of the relationship (WASSERMAN; FAUST, 1994).

According to the same authors, in addition to being displayed visually, a social network can be described by an n × n matrix, where n is the total number of nodes in the network. If there is a relationship from node u to node v, the value 1 is entered in the corresponding cell (u, v) of the matrix. The rows represent the nodes where a relationship starts (actors of origin) and the columns the nodes where it ends (actors of destination). Consequently, an undirected social network always yields a symmetric matrix.

To help understand the relationships between actors, several metrics can be computed, either for the network as a whole or for each node. Among them are:

Degree: the number of edges connected to each node.

PageRank: a spectral measure of popularity defined for directed graphs with non-negative connection weights (PAGE et al., 1998), which can be given by:

$$G = (1-\alpha)\,\bar{D}^{-1}A + \frac{\alpha}{n}\,J$$

Where:
n = total number of nodes in the network (users).
A ∈ {−1, 0, +1}^(n×n) is the adjacency matrix, with A_uv = +1 when user u marked user v as a friend and A_uv = −1 when user u marked user v as a foe. A is sparse, square and asymmetric.
D̄ = absolute diagonal matrix defined by D̄_uu = Σ_v |A_uv|.
J = n × n matrix of ones, and 0 < α < 1 is the teleportation parameter.

With non-negative weights, each row of G sums to one (KUNEGIS; LOMMATZSCH; BAUCKHAGE, 2009), so G is the transition matrix of a random walk on the network. The PageRank vector p is its stationary distribution, that is, the dominant left eigenvector satisfying pᵀG = pᵀ.
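The matrix representation, the degree and the PageRank formula above can be illustrated with a small standard-library Python sketch. It is not the Gephi implementation used in the study, and several choices are assumptions made for illustration: edges point from the user who retweets or mentions to the user who is retweeted or mentioned, so that influence accumulates at the target; α = 0.15 is an illustrative teleportation value; and a node with no outgoing edges (an all-zero row, for which D̄⁻¹ is undefined) receives a uniform row, a common convention that the formula does not specify. The PageRank vector is obtained by power iteration of pᵀ ← pᵀG.

```python
from typing import List, Tuple


def adjacency_matrix(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[int]]:
    """n x n matrix: rows are actors of origin, columns are actors of destination."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    a = [[0] * n for _ in range(n)]
    for src, dst in edges:
        a[index[src]][index[dst]] = 1  # relationship from src to dst
    return a


def degrees(a: List[List[int]]) -> List[int]:
    """Degree of each node: outgoing edges (row sum) plus incoming edges (column sum)."""
    n = len(a)
    return [sum(a[i]) + sum(a[j][i] for j in range(n)) for i in range(n)]


def pagerank(a: List[List[int]], alpha: float = 0.15,
             tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Power iteration p^T <- p^T G, with G = (1 - alpha) D^-1 A + (alpha / n) J."""
    n = len(a)
    g = []
    for row in a:
        d = sum(abs(x) for x in row)  # diagonal entry of D-bar: sum_v |A_uv|
        if d == 0:
            # Assumption: a node without outgoing edges jumps to any node uniformly.
            g.append([1.0 / n] * n)
        else:
            g.append([(1 - alpha) * x / d + alpha / n for x in row])
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Left multiplication: new_p[v] = sum_u p[u] * G[u][v]
        new_p = [sum(p[u] * g[u][v] for u in range(n)) for v in range(n)]
        if sum(abs(x - y) for x, y in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


if __name__ == "__main__":
    # Hypothetical edges (retweeter, original author): three users retweet "frases".
    users = ["frases", "ana", "bia", "caio"]
    edges = [("ana", "frases"), ("bia", "frases"), ("caio", "frases"), ("frases", "ana")]
    A = adjacency_matrix(users, edges)
    print(degrees(A))  # [4, 2, 1, 1]
    ranks = pagerank(A)
    print(users[ranks.index(max(ranks))])  # frases
```

Because this graph is directed, A is not symmetric; for an undirected network one would set both A[u][v] and A[v][u], which gives the symmetric matrix mentioned above.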
The software used in this study was Gephi, a free, open-source tool that allows different forms of editing and customization of the final results. It can be used both to create the graphs and to calculate the analysis metrics.

2.2 TEXT MINING

Text mining is the process of extracting useful information or knowledge from unstructured text documents (BARION; LAGO, 2008). In the context of this study, the technique is applied to identify patterns in the comments and opinions expressed by Twitter users about the Brazilian beer market. Information retrieval techniques are first applied to a set of texts to give them structure; data mining techniques are then applied to these structured data to obtain relevant information, as shown in Figure 1.

FIGURE 1 Text mining process.
Source: BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.

The first step of text mining is indexing, which builds an index structure from the words of the text and makes it possible to search for documents by any of the terms they contain (SALTON; MCGILL, 1983). The main steps of a text mining analysis are (BARION; LAGO, 2008):

Lexical analysis: converts a string into a sequence of words that are candidates for index terms.

Stop-word removal: removes a set of words that appear frequently in texts but have no semantic value, such as prepositions, articles and conjunctions. This phase is extremely important because it reduces the size of the base to be indexed and facilitates mining.
Stemming: reduces each word to its root by removing its variations; for example, "dreaming" is identified by its root "dream".

Selection of index terms: determines which words or roots will be used for indexing. These words are selected according to the weight assigned to them.

Bag of Words (BOW): a matrix in which each distinct term in the collection of documents is indexed. From this indexing, each document can be represented by a 1 × n vector, where n is the total number of terms; each entry of the vector is the number of times the corresponding term appears in the document (SIVIC, 2009).

Determination of weights: the BOW matrix is filled using metrics that weigh the frequency of occurrence of terms in each document and in the whole collection (the set of all documents). The metric most commonly used for this purpose is tf-idf (term frequency-inverse document frequency).

Correlation (similarity) between terms: based on the BOW matrix, the Pearson correlation can be calculated to measure how closely two vectors are related, using the formula (HUANG, 2008):

$$\mathrm{SIM}_P(\vec{t}_a, \vec{t}_b) = \frac{m\sum_{t=1}^{m} w_{t,a}\,w_{t,b} - TF_a\,TF_b}{\sqrt{\left[m\sum_{t=1}^{m} w_{t,a}^{2} - TF_a^{2}\right]\left[m\sum_{t=1}^{m} w_{t,b}^{2} - TF_b^{2}\right]}}, \qquad TF_a = \sum_{t=1}^{m} w_{t,a}$$

Where:
t⃗_a, t⃗_b = vectors created by the BOW for documents a and b.
m = total number of distinct terms in the entire collection of documents.
w_{t,a} = weight (tf-idf) of term t in document a.
w_{t,b} = weight (tf-idf) of term t in document b.

The formula is written for two document vectors; applying it to two columns of the BOW matrix instead (with m then being the number of documents) gives the correlation between two words.

3. TWITTER

Twitter was founded in 2006 by Jack Dorsey, Evan Williams, Biz Stone and Noah Glass, in San Francisco, USA. The service is a social network that allows users to post and read tweets, which are messages of up to 140 characters.
It can be accessed from any web browser or through mobile applications, and in some countries posts can also be made by SMS. The idea spread quickly and gained popularity throughout the world: in 2012 there were more than 500 million registered users posting 340 million tweets per day (LUNDEN, 2012). According to the web traffic ranking site Alexa (<www.alexa.com>), Twitter was one of the ten most accessed websites in the world that year.

When registering, the user chooses an address (username) on the site that is not already in use. From then on, he or she is known to other users by that address preceded by the "@" symbol. Once the account is registered, the user can "follow" and "be followed" by other users, which means that when a user posts something, the message appears directly to his or her followers. By default, tweets are publicly visible, but users can restrict the visibility of their messages to their followers only.

Another possibility is to repost a message already posted by someone else, a practice known as retweeting and identified by the abbreviation RT; the goal in this case is to spread the message further (STRACHAN, 2009). When posting on a specific topic, users can add hashtags to their messages: words or phrases that begin with the # symbol (STRACHAN, 2009). This makes it possible to display only the messages on that topic. When a word, phrase or expression is mentioned simultaneously by a large number of different users, it can be considered a trending topic (CHOWDHURY, 2009). Trending generally occurs when a group of users with a common interest join efforts towards some goal, or when large, popular events are taking place.

4. BEER MARKET IN BRAZIL

Brazil currently has a highly competitive beer market, in which the leading companies are AmBev, Kirin Group (Brasil Kirin) and Grupo Petrópolis.
With a turnover of R$ 63 billion in 2012, the country is the third-largest beer producer and ranks 26th in the international per capita consumption ranking (ECONOMIC VALUE, 2013). The Brazilian market is concentrated in AmBev, Kirin Group and Grupo Petrópolis, which together hold 90% of it. Another important indicator is the evolution of per capita beer consumption in Brazil, in liters per year. Since 2008, beer consumption in Brazil has increased significantly, reaching 66.7 liters per capita in 2012 (Chart 1).

CHART 1 Evolution of beer consumption in Brazil (liters per capita).

Chart 2 shows the market shares of the Brazilian beer market in 2012.

CHART 2 Market shares of the Brazilian beer market in 2012.

Given the relevance of the beer market to the Brazilian economy and its continued growth, we decided to perform this study, monitoring the following brands: Antarctica, Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois and Skol, besides the word "cerveja" (beer) and two of its regional variations: "breja" and "cerva".

5. ANALYTICAL METHODOLOGY

The analytical methodology consists of three steps: first, the analysis of the general behavior and profile of users who post about beer on Twitter; second, a semantic analysis based on text mining techniques and multivariate statistics, to identify the most relevant topics of discussion within the beer environment; and, finally, the evaluation of user influence.

5.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER

In the first analytical step, we assessed the main aggregate metrics over time. The most important were the following:

Number of posts: total number of posts made per time interval.
Number of distinct users: total number of distinct users who posted per time interval.
Average posts per user: number of posts divided by the number of distinct users.
Percentage of posts: proportion of posts classified in each of the existing categories.

The total number of posts makes it possible to evaluate the overall intensity of impacts during the observed period. The average number of posts per user indicates, in general terms, how intensely the subject is disseminated among users: the closer the average is to 1, the lower the intensity. The percentage of posts measures the weight of each category of a given categorical variable in the total of posts considered. Together, these metrics aim to characterize the general behavior of Twitter users regarding beer.

Peaks were identified by visual inspection of the time series of the number of posts, and the same procedure was used to evaluate the curve of posts over the hours of the day. Comparing the number of posts with the average number of posts per user makes it possible to detect changes in the behavior of individual users.
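A minimal Python sketch of the aggregate metrics above, together with the penetration metric (share of posts with RT, @, http or hashtag) described in this section, is shown below. It assumes posts arrive as hypothetical (user, timestamp, text) tuples already filtered to the monitored words. Post types are detected with simple regular expressions, which is an assumption since the article does not describe its detection rules; in particular, the "RT @author:" prefix of a retweet is not counted as a directed (@) message. Flags are not mutually exclusive, so penetration percentages can add up to more than 100%.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Hypothetical input format: (user, timestamp, text), already filtered to the monitored words.
Post = Tuple[str, datetime, str]

# Assumption: a retweet starts with "RT @author"; that quoted author is not a directed message.
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?")


def hourly_metrics(posts: List[Post]) -> Dict[int, Dict[str, float]]:
    """Number of posts, distinct users and average posts per user, by hour of the day."""
    counts: Dict[int, int] = defaultdict(int)
    users: Dict[int, Set[str]] = defaultdict(set)
    for user, ts, _ in posts:
        counts[ts.hour] += 1
        users[ts.hour].add(user)
    return {
        hour: {
            "posts": counts[hour],
            "users": len(users[hour]),
            # The closer this average is to 1, the lower the intensity of dissemination.
            "avg_per_user": counts[hour] / len(users[hour]),
        }
        for hour in sorted(counts)
    }


def post_types(text: str) -> Set[str]:
    """Post-type flags of one tweet; a tweet may carry several flags at once."""
    flags = set()
    m = RT_PREFIX.match(text)
    if m:
        flags.add("RT")
        text = text[m.end():]  # drop the retweet prefix before looking for directed @s
    if re.search(r"@\w+", text):
        flags.add("@")
    if re.search(r"https?://", text):
        flags.add("http")
    if re.search(r"#\w+", text):
        flags.add("#")
    return flags or {"other"}


def penetration(posts: List[Post]) -> Dict[str, float]:
    """Percentage of posts carrying each flag (may sum to more than 100)."""
    totals: Dict[str, int] = defaultdict(int)
    for _, _, text in posts:
        for flag in post_types(text):
            totals[flag] += 1
    return {flag: 100.0 * n / len(posts) for flag, n in totals.items()}
```

The same grouping works for daily series (as used to find the holiday peaks) by replacing `ts.hour` with `ts.date()`.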
This metric often varies widely across time intervals, because specific users tend to post more about particular topics or events.

Twitter also allows specific metrics that capture the different types of behavior of its members; among them is penetration (the proportion of posts with a certain characteristic). The characteristics considered were:

RT: tweets that pass on a message already posted by another user.
@: messages directed to another user.
Http: tweets containing links to websites.
Hashtag (#): group discussion on a specific topic.
Other: tweets that contain none of the above characteristics.

5.2 INFLUENCE ANALYSIS

Influence is analyzed on a network of conversations in which two distinct cases of influence were observed. The first considers retweets: a user's influence is measured by how many other users retweeted his or her posts. The second considers tweets sent directly to other users, measuring influence by the number of directed conversations between users. In this article, both cases and all types of connections between users were considered. In practical terms, however, retweets always have the larger effect, because they occur more frequently.

5.3 SEMANTIC ANALYSIS

The correlation analysis between topics followed this process: first, lexical analysis was performed.
Second, stop-words (words without semantic value) were removed and a stemming algorithm (extraction of roots) was applied. After these steps, the BOW matrix was calculated, with one column per term and one row per document (tweet), weighted by tf-idf (term frequency-inverse document frequency). From this matrix it was possible to obtain the terms most associated with a particular word, with the similarity assessed by Pearson correlation.

Posts were classified into themes by a heuristic based on keywords defined by experts. The keywords were selected as follows:

Step 1: define keywords that characterize a given theme.
Step 2: develop an algorithm to count the keywords defined in Step 1.
Step 3: repeat Steps 1 and 2 until the proportion of posts classified into some theme can be considered satisfactory. In general, at least 50% of the posts must be classified into themes to obtain consistent results.

6. AVAILABLE INFORMATION

The data were extracted with a program developed by IBOPE DTM that connects directly to the Twitter API. Based on the distribution of market share in the Brazilian beer market, it was decided to study only the brands of the most significant companies in the segment: AmBev, Kirin Group and Grupo Petrópolis. We therefore monitored the following brands: Antarctica, Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois and Skol, besides the word "cerveja" (beer) and two of its regional variations: "cerva" and "breja". The data comprise all messages posted during the study period containing these words.

After 25 days of monitoring, 438,507 tweets (posts) related to beer were collected. Since the study focused on Brazil, only posts in Portuguese were considered, leaving 291,043 posts (66.4%) for analysis. The monitoring period, from December 10, 2013 to January 3, 2014, was chosen on the assumption that the year-end holidays, Christmas and New Year, influence the number of posts about beer on Twitter.

7. DATA ANALYSIS

The analysis follows the same structure as the methodology presented above, starting with the general distribution of the posts.

7.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER

The holidays caused considerable variation in the daily number of tweets. Chart 3 shows that the peak days were December 24, 25 and 31 (Christmas Eve, Christmas Day and New Year's Eve), when the number of posts exceeded the daily average for the whole period by more than 4,000.

CHART 3 Distribution of posts about beer on Twitter.

As for the time of day (Chart 4), posts increased sharply from 10 a.m. and stabilized between 3 p.m. and 9 p.m.

CHART 4 Number of posts per hour.

The average number of posts per user peaked at 9 a.m. (Chart 5), but this peak was not large enough to consider the behavior very different from that of other hours of the day.

CHART 5 Average number of posts by Total, Christmas and New Year.

However, when the Christmas and New Year holidays were examined separately, the highest average number of posts per user occurred between 8 and 9 a.m. at Christmas, while at New Year it occurred from 11 p.m. onwards, as shown in Chart 6.

CHART 6 Average number of posts by Total, Christmas and New Year.

Chart 7 shows that 85.5% of the posts about beer do not mention a specific brand. Among the 14.5% of posts that do mention a brand, Skol has the largest share, with 4.3% of the posts, followed by Brahma with 3.5% and Itaipava with 2.2%.

CHART 7 Percentage of posts by search word.

As for the individual metrics (Table 1), the only brand with a significant share of posts containing hashtags (#) was Eisenbahn, with 17.9% of its posts.
The brands with the most posts containing links to websites (http) were Baden Baden (41.4%), Eisenbahn (36.5%), Antarctica (32.4%) and Stella Artois (29.4%). Among messages directed to other users (@), Serramalte stood out with 32.5%, followed by Nova Schin with 23.4%. Finally, the share of messages passed on (RT) was highest for Budweiser, at 26.9%, followed by Antarctica, at 21.1%.

TABLE 1 Number of posts by search word and their individual metrics. The RT, @, HTTP, HASHTAG and OTHERS columns give the penetration by post type (% of the posts for each search word); post types are not mutually exclusive, so a row may sum to more than 100%.

| COMPANY | SEARCH WORD | POSTS | % OF POSTS | % RT | % @ | % HTTP | % HASHTAG | % OTHERS |
|---|---|---|---|---|---|---|---|---|
| - | CERVEJA (beer) | 215,229 | 74.0 | 21.1 | 13.2 | 8.0 | 3.8 | 56.3 |
| - | BREJA | 19,537 | 6.7 | 11.0 | 16.9 | 6.1 | 3.5 | 64.4 |
| - | CERVA | 14,112 | 4.8 | 11.2 | 15.8 | 5.7 | 3.8 | 65.7 |
| AMBEV | ANTARCTICA | 5,781 | 2.0 | 21.1 | 11.3 | 32.4 | 4.5 | 33.7 |
| AMBEV | BOHEMIA | 1,943 | 0.7 | 6.2 | 13.2 | 17.4 | 9.0 | 59.1 |
| AMBEV | BRAHMA | 10,234 | 3.5 | 15.7 | 13.9 | 16.1 | 6.7 | 52.0 |
| AMBEV | BUDWEISER | 3,232 | 1.1 | 26.9 | 8.7 | 12.1 | 9.5 | 50.5 |
| AMBEV | SERRAMALTE | 114 | 0.0 | 4.4 | 32.5 | 13.2 | 12.3 | 45.6 |
| AMBEV | SKOL | 12,632 | 4.3 | 15.4 | 13.5 | 12.9 | 9.0 | 55.7 |
| AMBEV | STELLA ARTOIS | 574 | 0.2 | 9.2 | 7.0 | 29.4 | 6.3 | 52.4 |
| KIRIN | BADEN BADEN | 331 | 0.1 | 3.0 | 16.6 | 41.4 | 6.9 | 38.1 |
| KIRIN | EISENBAHN | 263 | 0.1 | 6.1 | 11.8 | 36.5 | 17.9 | 44.9 |
| KIRIN | NOVA SCHIN | 538 | 0.2 | 15.2 | 23.4 | 9.3 | 3.2 | 50.2 |
| PETRÓPOLIS | ITAIPAVA | 6,523 | 2.2 | 10.2 | 14.2 | 19.5 | 3.7 | 54.9 |
| TOTAL | | 291,043 | 100.0 | 19.1 | 13.6 | 9.2 | 4.2 | 56.5 |

Looking at the users who commented on beer, a single user was responsible for 1,461 posts about beer (Table 2) but had only 153 followers; in other words, only those 153 followers directly saw the information shared.

TABLE 2 Top ten users by number of posts.
| RANK | USERS (TWITTER) | POSTS | FOLLOWERS ON TWITTER |
|---|---|---|---|
| 1 | BEEINNDEX | 1,461 | 153 |
| 2 | SKOL_ | 443 | 107 |
| 3 | DJ_RICARDOO | 348 | 512 |
| 4 | CERVEJA_DUFF | 208 | 155 |
| 5 | RENATORDM | 188 | 514 |
| 6 | ITAIPAVA_ | 185 | 415 |
| 7 | PREDRERO | 162 | 28,107 |
| 8 | MARCIO_SKOL | 157 | 171 |
| 9 | SERRALHERO | 107 | 2,181 |
| 10 | GORONAH | 105 | 769 |
| TOTAL | | 3,364 | |

By contrast, singer Claudia Leitte sent only one post about beer, but this information reached her 7,869,106 followers (Table 3).

TABLE 3 Top ten users with the highest number of followers on Twitter.

| RANK | USERS (TWITTER) | POSTS | FOLLOWERS ON TWITTER |
|---|---|---|---|
| 1 | CLAUDIALEITTE | 1 | 7,869,106 |
| 2 | DANILOGENTILI | 1 | 5,324,329 |
| 3 | SPIDERANDERSON | 1 | 4,226,383 |
| 4 | CLARORONALDO | 1 | 3,625,623 |
| 5 | PRETAGIL | 2 | 3,450,693 |
| 6 | PORTALR7 | 5 | 2,835,528 |
| 7 | VEJA | 2 | 2,825,215 |
| 8 | BGAGLIASSO | 1 | 2,735,376 |
| 9 | G1 | 11 | 2,220,615 |
| 10 | SIGNOSFODAS | 1 | 1,432,674 |
| TOTAL | | 26 | |

To analyze user influence, the 20 users with the highest PageRank were ranked. The user "frasesdebebada", with a PageRank of 0.007 and 365 connections (Table 4), had the greatest influence on the network. Table 4 also includes three users who talked about beer on Twitter and are among the ten users with the most followers (Table 3): SPIDERANDERSON, SIGNOSFODAS and G1.

TABLE 4 Ranking of the 20 users with the highest PageRank.

| RANK | USERS | COMPANY? | DEGREE | PAGERANK |
|---|---|---|---|---|
| 1 | FRASESDEBEBADA | NO | 365 | 0.0070 |
| 2 | IRMA_ZULEIDE | NO | 51 | 0.0033 |
| 3 | SPIDERANDERSON | NO | 40 | 0.0029 |
| 4 | ASTROSLUMINOSOS | YES | 73 | 0.0024 |
| 5 | SIGNOSFODAS | YES | 48 | 0.0021 |
| 6 | FACTBR | YES | 160 | 0.0020 |
| 7 | SOUVODKA | NO | 60 | 0.0018 |
| 8 | SENTOAVARAEMVCS | NO | 32 | 0.0017 |
| 9 | EDUTESTOSTERONA | NO | 98 | 0.0016 |
| 10 | EVERTOUS | NO | 108 | 0.0016 |
| 11 | PIADAMALIGNA | NO | 19 | 0.0015 |
| 12 | G1 | YES | 89 | 0.0014 |
| 13 | RELAXEI | NO | 96 | 0.0013 |
| 14 | MATEUSALIANO | NO | 93 | 0.0012 |
| 15 | LUCASPFVR | NO | 49 | 0.0011 |
| 16 | FELIXPASSIVA | NO | 22 | 0.0010 |
| 17 | B1TCH_MALVADA | NO | 15 | 0.0010 |
| 18 | EUZOERO | NO | 24 | 0.0010 |
| 19 | PREDRERO | YES | 25 | 0.0009 |
| 20 | UMVINGADOR | NO | 12 | 0.0009 |

Among the influential users are Anderson Silva, a famous MMA fighter with more than 4 million followers, and the news site G1 (from Organizações Globo), with more than 2 million followers. Claudia Leitte is the user with the most followers who talked about beer, but her posts were retweeted by people who do not usually talk about beer; for this reason, her position in the influence ranking was not higher. Anderson Silva posted a message thanking his sponsors, including a famous American beer brand, before his fateful fight: "... equipando já pra sair... Aproveito para agradecer a todos os meus parceiros: Budweiser, Burger King..." ("... gearing up to head out now... I take this opportunity to thank all my partners: Budweiser, Burger King...") (<http://t.co/GILAlzRwch>).

The ability to determine the real influence of prominent users reinforces the importance of this type of analysis. Users representing companies are also present among the most influential: even when their tweets are not directed to a specific person, their information resonates with various groups within the network.
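The two kinds of connection used in the influence analysis (retweets and directed messages, Section 5.2) can be extracted from raw tweets as in the Python sketch below. The `(author, text)` input, the lower-casing of usernames and the choice to count distinct users linking to each account are assumptions for illustration; the study itself built and measured the network in Gephi. Only a leading "RT @user" is treated as a retweet, and any other @ in the text is treated as a directed message.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

RT_PREFIX = re.compile(r"^\s*RT\s+@(\w+)")
MENTION = re.compile(r"@(\w+)")

Edge = Tuple[str, str, str]  # (source, target, kind)


def influence_edges(tweets: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Edges point towards the user who receives attention (retweeted or addressed)."""
    edges: Set[Edge] = set()
    for author, text in tweets:
        author = author.lower()
        m = RT_PREFIX.match(text)
        if m:
            # Case 1: the author passed on a post by m.group(1)
            edges.add((author, m.group(1).lower(), "RT"))
            text = text[m.end():]
        for target in MENTION.findall(text):
            # Case 2: the author directed a message to target
            if target.lower() != author:
                edges.add((author, target.lower(), "@"))
    return edges


def distinct_connections(edges: Set[Edge], kind: Optional[str] = None) -> Dict[str, int]:
    """Number of distinct users linking to each user, optionally for one kind of edge."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for src, dst, k in edges:
        if kind is None or k == kind:
            sources[dst].add(src)
    return {user: len(s) for user, s in sources.items()}
```

Because edges are stored in a set, repeated retweets of the same post by the same user count once, so the score reflects how many different users pass a message on rather than how often.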
Figure 2 shows the full network of users who talk about beer on Twitter. Figures 3, 4 and 5 show the networks of the users "frasesdebebada", "Irma_Zuleide" and "SpiderAnderson", respectively. As the most influential user in terms of number of connections, "frasesdebebada" achieved the widest spread of its messages, shown by the intensity of the red color in Figure 3.

FIGURE 2 Full network.

FIGURE 3 Network of user "frasesdebebada".

FIGURE 4 Network of user "Irma_Zuleide".

FIGURE 5 Network of user "SpiderAnderson".

In the semantic analysis, no single word was strongly associated with more than one brand. Therefore, to facilitate visualization, only the ten words most associated with each brand were selected. The brands were chosen according to their volume of posts: Skol accounted for 4.3% of the posts related to beer, Brahma for 3.5%, Itaipava for 2.2% and Antarctica for 2.0%. The word most associated with Skol was "redondo", with a Pearson correlation of 0.21, followed by "beats" and "vire", both with a correlation of 0.16 (Chart 8). These words are related to the brand's marketing campaign.
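Word associations such as "redondo" for Skol can be computed from a tf-idf BOW matrix and the Pearson formula of Section 2.2, as in the Python sketch below. It uses a toy stop-word list, no stemming, and the common tf-idf variant count × log(N/df); all three are assumptions rather than the study's exact configuration, so it will not reproduce the values in Chart 8. Correlations are taken between columns of the matrix (terms), with tweets as the observations.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Assumption: a tiny illustrative stop-word list; a real Portuguese list is much longer.
STOP_WORDS = {"a", "o", "de", "da", "do", "e", "que", "uma", "um", "no", "na", "com"}


def tokenize(text: str) -> List[str]:
    """Lexical analysis plus stop-word removal (stemming omitted in this sketch)."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """BOW matrix: one row per document (tweet), one column per term, tf-idf weights."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))  # number of documents containing t
    n_docs = len(docs)
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation in the computational form given in Section 2.2."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    num = m * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt((m * sum(a * a for a in x) - sx ** 2) *
                    (m * sum(b * b for b in y) - sy ** 2))
    return num / den if den else 0.0  # constant column: correlation undefined, report 0


def top_associated(word: str, vocab: List[str], rows: List[List[float]],
                   k: int = 10) -> List[Tuple[float, str]]:
    """Terms whose tf-idf columns correlate most with the column of `word`."""
    columns = list(zip(*rows))
    target = list(columns[vocab.index(word)])
    scores = [(pearson(target, list(col)), term)
              for term, col in zip(vocab, columns) if term != word]
    return sorted(scores, reverse=True)[:k]
```

For example, with the tweets "Skol redondo na praia", "skol redondo com amigos", "Brahma com amigos" and "cerveja gelada na praia", `top_associated("skol", ...)` ranks "redondo" first, since both words occur in exactly the same tweets.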
What set Brahma apart from the other brands was that the face of the brand, Claudia Leitte, appeared in 6th position among the most associated words, with a correlation of 0.14 (Chart 8). The Antarctica brand, in turn, is more strongly correlated with a soft drink (Guaraná) than with beer specifically (Chart 8), because the same brand name is used for both products.

CHART 8 Top 10 words with highest correlation with brands.

A group of semantics experts selected keywords and grouped them into themes considered central when it comes to beer. In total, 39.2% of the posts could not be classified; these posts generally mention beer but have no relevant content. Chart 9 shows the distribution of the roughly 60% of posts that were classified. Among these, posts concentrated on the Place where the drink was consumed (19.8%), With Whom the person was drinking (13.8%) and the Brands themselves (13.0%).

CHART 9 Proportion of posts by theme.

When analyzing the most discussed themes for each brand studied (Chart 10), it was seen that, among the beers produced by AmBev, Stella Artois had 35% of its posts on the theme Commemorative Dates, unlike the company's other brands, whose posts concentrated on the theme Place. The Kirin Group beers stood out in three different themes: Baden Baden had 44% of its posts in Commemorative Dates, Eisenbahn 32% in Place and Nova Schin 23% in With Whom. Itaipava, from Grupo Petrópolis, had 31% of its posts in Place versus 25% in When.

CHART 10 Percentage of posts by theme by beer brand.

8. CONCLUSION

This article showed that holidays have a strong influence on the number of beer-related posts, with increases of more than 35% over the average number of daily posts. Over the course of the day, posts generally increase in the afternoon and evening, and the most intense posting period was between 23:00 and 2:00. The social network analysis efficiently identified influential users through the quantity and quality of their connections during the period. Several influencers were identified, notably Anderson Silva, who tweeted thanks to his sponsors before his fight, and G1, a communications company. The semantic analysis of posts to identify beer-related themes showed a concentration of posts on the place where the drink was consumed, with whom it was consumed and which brands were consumed. In the Kirin Group, each brand stood out in a different theme: Baden Baden had more posts associated with Commemorative Dates, Eisenbahn with Place and Nova Schin with With Whom.
Itaipava, from Grupo Petrópolis, had a higher incidence of posts on the theme Place.

9. LIMITATIONS AND FUTURE STUDIES

There was no sudden break in the time series of the number of posts. We therefore understand that there was no disconnection from the Twitter API, so the consistency and quality of the information used in this study can be relied upon. In future studies, it would be useful to perform the analysis over a longer historical period in order to understand whether the theme shows seasonal behavior. Another hypothesis under study is the evaluation of the difference between the hours of consumption and of posting.

10. REFERENCES

ALEXA. Available at: <www.alexa.com>. Accessed on: 6 Jan. 2014.
BAEZA-YATES, R.; RIBEIRO NETO, B. Modern information retrieval. Addison-Wesley, 1999.
BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.
BAVELAS, A. A mathematical model for group structure. Applied Anthropology, v. 7, 1948.
CERVBRASIL. A Cerveja – Contribuição econômica, n.d. Available at: <http://www.cervbrasil.org.br/a-cerveja/contribuicao-economica/>. Accessed on: 6 Jan. 2014.
CERVEJAS DO MUNDO. História da cerveja, 2009. Available at: <http://www.cervejasdomundo.com/Brasil.htm>. Accessed on: 6 Jan. 2014.
CHOWDHURY, A. Top Twitter Trends of 2009. Twitter Blog, 15 Dec. 2009. Available at: <https://blog.twitter.com/2009/top-twitter-trends-of-2009>. Accessed on: 3 Feb. 2014.
CORRÊA, A. C. G. Recuperação de documentos baseada em Informação Semântica no Ambiente AMMO. UFSCAR, 2003.
COUTINHO, C. A. T.; QUINTELLA, C. A. S.; PANZANI, M. M. História da Cerveja no Brasil. Portal São Francisco, n.d. Available at: <http://www.portalsaofrancisco.com.br/alfa/historia-dacerveja/historia-da-cerveja-no-brasil.php>. Accessed on: 6 Jan. 2014.
HUANG, A. Similarity Measures for Text Document Clustering. Department of Computer Science, The University of Waikato, 2008.
KUNEGIS, J.; LOMMATZSCH, A.; BAUCKHAGE, C. The Slashdot zoo: mining a social network with negative edges. Track: Social Networks and Web 2.0 / Session: Interactions in Social Communities, 2009.
LIU, B. Web Data Mining: exploring hyperlinks, contents, and usage data. Springer, 2011.
LUNDEN, I. Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta ‘Biggest Tweeting’ City. TechCrunch, 30 Jul. 2012. Available at: <http://techcrunch.com/2012/07/30/analyst-twitter-passed-500m-users-in-june-2012-140m-of-themin-us-jakarta-biggest-tweeting-city/>. Accessed on: 3 Feb. 2014.
MANNING, C. D.; RAGHAVAN, P.; SCHUTZE, H. Scoring, term weighting, and the vector space model: introduction to information retrieval. Stanford, 2008.
MELO, I. D. et al. Análise de Redes Sociais. Universidade Federal da Paraíba, 2013.
MOURA, M. F. Proposta de utilização de mineração de textos para seleção, classificação e qualificação de documentos. Campinas: Embrapa Informática Agropecuária, 2004.
NÚCLEO EDUCACIONAL DE BROGLIE. Produção e consumo de cerveja no Brasil e no mundo, 2013. Available at: <http://www.nucleodebroglie.com/2013/03/producao-e-consumo-de-cervejano-brasil.html>. Accessed on: 6 Jan. 2014.
PAGE, L. et al. The PageRank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
QUEIROZ, D. F. Análise estrutural do setor cervejeiro. FAEC – Departamento de Economia, 2010. Available at: <http://pt.slideshare.net/diegofelinto/monografia-2010-anlise-strutural-do-setorcervejeiro-no-brasil-diego-queiroz>. Accessed on: 6 Jan. 2014.
SALTON, G.; MCGILL, M. J. Introduction to modern information retrieval. Computer Science Series, USA: McGraw-Hill, 1983.
SANTOS, M. A. M. R. Extraindo regras de associação a partir de textos. PUC, 2002.
SILVA, A. (SpiderAnderson) tweets. Available at: <http://t.co/2aBqwULK>. Accessed on: 15 Apr. 2014.
SINDICATO NACIONAL DA INDÚSTRIA DA CERVEJA – SINDICERV. Mercado, n.d. Available at: <http://www.sindicerv.com.br/mercado.php>. Accessed on: 6 Jan. 2014.
SIVIC, J. Efficient visual search of videos cast as text retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 31, n. 4, 2009.
STRACHAN, D. Twitter: how to set up your account. Telegraph, 19 Feb. 2009. Available at: <http://www.telegraph.co.uk/travel/4698589/Twitter-how-to-set-up-your-account.html>. Accessed on: 3 Feb. 2014.
TWITTER. Finding your Twitter short or long code. Available at: <http://help.twitter.com/entries/14226-how-to-find-your-twitter-short-long-code>. Accessed on: 3 Feb. 2014.
VALOR ECONÔMICO. Ritmo de produção de cerveja cai em 2013. 2013. Available at: <http://www.valor.com.br/empresas/3221828/ritmo-de-producao-de-cerveja-cai-em-2013>. Accessed on: 6 Jan. 2014.
WASSERMAN, S.; FAUST, K. Social network analysis: methods and applications. Cambridge: Cambridge University Press, 1994.

Note: Authors are solely responsible for the translation of their articles from Portuguese to English.