PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia
ISSN: 1983-9456 (Print)
ISSN: 2317-0123 (Online)
Editor: Fauze Najib Mattar
Review system: Triple Blind Review
Languages: Portuguese and English
Publisher: ABEP – Associação Brasileira de Empresas de Pesquisa
Statistical Analysis of Users who Chatting about Beer on Twitter 1
Análise de Usuários que Conversam sobre Cerveja no Twitter
Submission: Mar./28/2014 - Approval: Apr./14/2014
Rodrigo Otávio de Araújo Ribeiro
Doctor and Master in Production Engineering from Universidade Federal Fluminense - UFF.
Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE. He has
extensive experience in statistical modeling of large databases. He is currently Director of Marketing
Intelligence at IBOPE DTM.
E-mail: [email protected]
Professional Address: IBOPE DTM - Rua Voluntários da Pátria - nº 89 - sala 803 - 22270-000 Botafogo - Rio de Janeiro/RJ – Brasil.
Tarsila Gomes Bello Tavares
Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE.
She is currently Coordinator of Marketing Intelligence at IBOPE DTM.
E-mail: [email protected]
Daniel de Oliveira Cohen
Bachelor's degree in Statistics from the State University of Campinas - UNICAMP. He performs
statistical analyses such as regression, segmentation and social network analysis on data collected
through quantitative surveys. He is currently a Statistician at IBOPE Intelligence.
E-mail: [email protected]
1
This paper was presented at ABEP’s 6th Brazilian Market, Opinion and Media Research Congress (held on March 24
and 25, 2014) and won the “Alfredo Carmo” Prize. It was turned into this article by its authors and was submitted
to and approved for publication by PMKT.
ABSTRACT
The identification of influential users in social media has attracted great interest from
companies in recent years. This work evaluates this influence by using graphs to understand
the relational structure between users, established through their conversations on Twitter.
Exploratory data analysis and text mining techniques were used to draw complementary
conclusions about the subject. The conversation environment chosen for evaluation was that of
Brazilian beers, and the searches used words related to the major brands active in the domestic
market. The evaluation considered a sample of 25 days between December 2013 and January
2014.
KEYWORDS:
Beer, Twitter, Social network analysis.
RESUMO
A identificação de usuários influentes nas mídias sociais é um assunto que tem gerado grande
interesse por parte das empresas nos últimos anos. Este artigo visa avaliar esta influência por meio
da utilização de grafos para entendimento da estrutura relacional existente entre os usuários,
estabelecida por suas conversas no Twitter. A análise exploratória de dados e as técnicas de
Mineração de Textos foram utilizadas para conclusões complementares acerca do assunto. O
ambiente de conversas escolhido para avaliação foi o das cervejas brasileiras, sendo as buscas
realizadas por palavras relacionadas às principais marcas atuantes no mercado nacional. A avaliação
foi realizada, considerando uma amostra de 25 dias entre os meses de dezembro de 2013 e janeiro
de 2014.
PALAVRAS-CHAVE:
Cerveja, Twitter, Análise de Redes Sociais.
PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Impressa e ISSN 2317-0123 On-line), São Paulo, Brasil,
V. 14, pp. 174-195, Abril, 2014 - www.revistapmkt.com.br
1. INTRODUCTION
This article aims to identify the most influential users on Twitter who posted messages about beer.
A sample of 25 days between December 2013 and January 2014 was used, considering only posts
made in Portuguese, in Brazil.
The content of the conversations was also assessed by applying text mining algorithms. In addition,
a descriptive analysis of the general behavior of Twitter users who talk about the subject was
performed, in order to understand the use and impact of the different brands and the profile of
the users.
Most posts on the subject were made in the afternoon and evening. The distribution of the number
of messages posted per user is strongly asymmetric: the majority of users posted only a single
message during the period. By observing the peaks in the time series of the total number of
messages posted, it was possible to evaluate the effect of holidays: the behavior of users during
the New Year was very close to what was observed at Christmas.
In the semantic evaluation of posts about beer, many topics (themes) within the main subject were
identified. This kind of information can help companies target their strategies and monitor
consumer behavior on an ongoing basis. It was noticed that many users who post messages about
beer mention where, with whom, or even when they will consume it. They often mention their
preferred brands as well.
The analysis of the influence of users in social networks supports the creation of various marketing
strategies. The most influential users on a particular subject can be contacted by companies to
publicize their brands, acting as links between companies and other end users.
In this work, influence was measured by the number of connections that the user had during the
study period. On Twitter, users can direct their messages to each other and pass on information
posted by any of their connections (retweets).
One way to assess the degree of influence of users is to count the connections that pass on their
messages or the connections to which they direct their posts. This paper evaluates both points of
view.
This study is structured as follows: after the introduction, the second part presents the main
features of the techniques used in the analysis. The third gives a brief explanation of Twitter and
its strong growth in Brazil. The fourth part describes the domestic beer market, its evolution, its
trends and its key brands. The fifth part details the analytical methodology applied, clarifying the
questions answered by the study. The sixth part explains the data used. The seventh shows the
results of the data analysis. The eighth presents the main conclusions and, finally, the limitations
and suggestions for new research.
2. THEORETICAL BACKGROUND
2.1 SOCIAL NETWORK ANALYSIS
A social network is determined by a set of actors (or nodes) and pre-established relationships
between them (WASSERMAN; FAUST, 1994). Actors can take many forms and represent different
groups of individuals, such as users, companies and entities. Because of its great flexibility,
Social Network Analysis (SNA) can be applied in almost any context.
Generally, social networks are represented visually by "graphs". In these graphs, the actors or
nodes are represented by dots and the relationship between a pair of nodes is represented by an
edge or connection.
The connections can be directed when it is important to highlight which actor was the source of the
relationship (WASSERMAN; FAUST, 1994). According to the authors, in addition to being
visually displayed, a social network can be described by an n x n matrix, where n is the total
number of nodes in the network.
The existence of a relationship between the pair of nodes u and v, for example, is indicated by the
value 1 in the corresponding cell of the matrix. The matrix is read as follows: the rows represent
the nodes where the relationship originates (actors of origin) and the columns, the nodes where the
relationship ends (actors of destination). Thus, an undirected social network always gives a
symmetric matrix.
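The matrix reading described above can be illustrated with a minimal sketch. The function names, the three-node network and its edges are hypothetical and serve only to show that rows stand for actors of origin, columns for actors of destination, and that an undirected network yields a symmetric matrix.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build an n x n 0/1 matrix: rows are actors of origin, columns are actors of destination."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for origin, destination in edges:
        matrix[index[origin]][index[destination]] = 1
        if not directed:
            # An undirected relationship is recorded in both directions.
            matrix[index[destination]][index[origin]] = 1
    return matrix


def is_symmetric(matrix):
    """Check whether the matrix equals its transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Hypothetical network: u relates to v, and v relates to w.
nodes = ["u", "v", "w"]
edges = [("u", "v"), ("v", "w")]
directed_matrix = adjacency_matrix(nodes, edges, directed=True)
undirected_matrix = adjacency_matrix(nodes, edges, directed=False)
```

In this sketch only the undirected version passes the `is_symmetric` check, because the directed version records each relationship in one cell only.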
To help understand the relationships between the actors, there are metrics that consider either
the network as a whole or each node specifically. Among them are:
• Degree: number of edges connected to each node.
• PageRank: spectral measure of popularity defined for directed graphs with non-negative
connection weights (PAGE et al., 1998), which can be given by:
$$G = (1 - \alpha)\,\bar{D}^{-1} A + \frac{\alpha}{n}\, J_{n \times n}$$
Where:
n = total number of nodes in the network (users).
$A \in \{-1, 0, +1\}^{n \times n}$ is the adjacency matrix with values $A_{uv} = +1$ when user u marked user v as a
friend and $A_{uv} = -1$ when user u marked user v as a foe. A is sparse, square and asymmetric.
$\bar{D}$ = absolute diagonal matrix defined by $\bar{D}_{uu} = \sum_{v} |A_{uv}|$.
$J_{n \times n}$ = matrix full of ones of the specified size, and 0 < α < 1 is the teleportation parameter.
The matrix G is left-stochastic, each row sums to one (KUNEGIS; LOMMATZSCH;
BAUCKHAGE, 2009).
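As a rough illustration of how PageRank scores follow from such a matrix, the sketch below assumes the form G = (1 - α) D̄⁻¹ A + (α/n) J with non-negative entries in A, and approximates the stationary distribution by power iteration. The function name, the default α = 0.15, the uniform handling of nodes without outgoing edges and the three-node example are assumptions made for illustration; the study itself obtained its metrics from Gephi.

```python
def pagerank(adjacency, alpha=0.15, iterations=100):
    """Approximate PageRank by power iteration on G = (1 - alpha) * D^-1 * A + (alpha / n) * J.

    `adjacency` is an n x n list of lists with non-negative weights;
    row u lists the edges leaving node u.
    """
    n = len(adjacency)
    transition = []
    for row in adjacency:
        out_weight = sum(abs(x) for x in row)
        if out_weight == 0:
            # Assumption: a node with no outgoing edges jumps uniformly to any node.
            transition.append([1.0 / n] * n)
        else:
            transition.append([(1 - alpha) * x / out_weight + alpha / n for x in row])
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # One step of p' = p G (row vector times row-stochastic matrix).
        scores = [sum(scores[u] * transition[u][v] for u in range(n)) for v in range(n)]
    return scores


# Hypothetical network: node 0 -> node 1, node 1 -> node 0, node 2 -> node 1.
example = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]
example_scores = pagerank(example)
```

In this example node 1 receives links from both other nodes and ends up with the highest score, while node 2, which nobody links to, keeps only the teleportation share.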
The software used in this study was Gephi, a free tool that allows different forms of editing and
customization of the final results. It can be used to create graphs and to calculate the analysis
metrics.
2.2 TEXT MINING
Text Mining is the process of extracting useful information or knowledge from unstructured text
documents (BARION; LAGO, 2008). In the context of this study, this technique is applied to
identify patterns in the comments and opinions expressed by Twitter users about the Brazilian beer
market.
Information Retrieval techniques are applied to a set of texts in order to give it structure. Data
mining techniques are then applied to these structured data to obtain relevant information, as
shown in Figure 1.
Source: BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.
FIGURE 1
Text Mining Process.
The first step of Text Mining is indexing, which builds an index structure from the words of the
text and makes it possible to search for documents by any of the terms they contain (SALTON;
MCGILL, 1983). Some steps of a Text Mining analysis are (BARION; LAGO, 2008):
• Lexical Analysis: converts a string into a sequence of words that are candidates for index terms.
• Removal of Stop-words: removes a set of words that appear frequently in texts but have no
semantic value, such as prepositions, articles and conjunctions. This phase is extremely
important, because it reduces the base to be indexed and facilitates mining.
• Stemming: removes the variations of words, leaving only the root of each; for example, the word
"dreaming" is reduced to the root "dream".
• Selection of index terms: determines which words or stems will be used as index terms.
These words are selected according to the weight assigned to them.
• Bag of Words - BOW: a matrix in which each distinct term in the collection of documents is
indexed. From this indexing, each document can be represented by a 1 x n vector, where n is
the total number of terms; each entry of this vector is the number of times the corresponding
term appears in the document (SIVIC, 2009).
• Determination of weights: the BOW matrix is filled in using metrics that weigh the frequency
of occurrence of terms in each document and in the total collection (set of all documents). The
metric most commonly used for this purpose is tf-idf (term frequency-inverse document
frequency).
• Correlation (similarity) between terms: based on the BOW matrix, one can calculate the Pearson
correlation between different words, in order to measure how they are related, by the formula
(HUANG, 2008):
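$$SIM_P(\vec{t_a}, \vec{t_b}) = \frac{m \sum_{t=1}^{m} w_{t,a}\, w_{t,b} - \sum_{t=1}^{m} w_{t,a} \sum_{t=1}^{m} w_{t,b}}{\sqrt{\left[m \sum_{t=1}^{m} w_{t,a}^{2} - \left(\sum_{t=1}^{m} w_{t,a}\right)^{2}\right]\left[m \sum_{t=1}^{m} w_{t,b}^{2} - \left(\sum_{t=1}^{m} w_{t,b}\right)^{2}\right]}}$$
Where:
$\vec{t_a}$, $\vec{t_b}$ = vectors created by the BOW.
m = total number of distinct terms in the entire collection of documents.
$w_{t,a}$ = weight (tf-idf) of term t in document a.
$w_{t,b}$ = weight (tf-idf) of term t in document b.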
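The steps listed in this section can be sketched end to end in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the stop-word list is a tiny hypothetical sample, stemming is omitted, tf-idf uses the plain log(N / document frequency) variant, and the four example posts are invented. The Pearson correlation is computed here between two term columns of the weighted matrix.

```python
import math
import re
from collections import Counter

# Illustrative Portuguese stop-words only; a real list would be much longer.
STOP_WORDS = {"a", "o", "de", "com", "e", "na", "no", "um", "uma"}


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    tokenized = [[t for t in tokenize(d) if t not in STOP_WORDS] for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    return vocabulary, [[c[term] for term in vocabulary] for c in counts]


def tf_idf(matrix):
    """Weight each count by log(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / doc_freq[j]) for j in range(n_terms)]
            for row in matrix]


def pearson(x, y):
    """Pearson correlation between two equally long weight vectors."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    denominator = math.sqrt((n * sum(a * a for a in x) - sum_x ** 2)
                            * (n * sum(b * b for b in y) - sum_y ** 2))
    return numerator / denominator if denominator else 0.0


def term_column(vocabulary, weights, term):
    """Extract the weights of one term across all documents."""
    j = vocabulary.index(term)
    return [row[j] for row in weights]


# Invented example posts, for illustration only.
documents = [
    "skol redondo com amigos",
    "skol desce redondo",
    "brahma na praia",
    "cerveja gelada na praia",
]
vocabulary, counts = bag_of_words(documents)
weights = tf_idf(counts)
skol_redondo = pearson(term_column(vocabulary, weights, "skol"),
                       term_column(vocabulary, weights, "redondo"))
skol_praia = pearson(term_column(vocabulary, weights, "skol"),
                     term_column(vocabulary, weights, "praia"))
```

In this toy collection "skol" and "redondo" always occur together, so their correlation is positive, whereas "skol" and "praia" never share a post and correlate negatively.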
3. TWITTER
Twitter was founded in 2006 by partners Jack Dorsey, Evan Williams, Biz Stone and Noah Glass,
in San Francisco, USA. The service is a social network that allows users to post and read tweets,
which are messages of up to 140 characters. It can be accessed directly from any internet browser
or through mobile applications. In some countries, posts can be made by SMS as well. The idea
quickly spread and gained popularity throughout the world: in 2012, there were more than
500 million registered users, who posted 340 million tweets per day (LUNDEN, 2012).
According to the web traffic ranking site Alexa (<www.alexa.com>), Twitter was one of the
ten most accessed pages in the world that year.
When registering, the user chooses an address on the site that is not already in use. From then on,
the user is known to other users by that address preceded by the “@” symbol.
Once the address is set and the account registered, the user can "follow" or "be followed" by other
users. This means that when a user posts something, the message appears directly for the users who
follow that account. By default, tweets are publicly visible. However, users can restrict the viewing
of their messages to their followers only. Another possibility is to repost a message that has
already been posted by someone else, a practice known as retweet and identified by the
abbreviation RT. In this case, the goal is to spread the message (STRACHAN, 2009).
When a post is about a specific topic, users can add hashtags to their messages: phrases or
words that begin with the # symbol (STRACHAN, 2009). Likewise, it is possible to display only
the messages on that specific topic.
When a word, phrase or expression is mentioned simultaneously by a large number of
different users, it can be considered a trending topic (CHOWDHURY, 2009). Trending generally
occurs when a group of users with a common interest join efforts toward some goal or when large
and popular events are happening.
4. BEER MARKET IN BRAZIL
Currently, Brazil has a highly competitive beer market in which companies such as AmBev,
Brasil Kirin and Grupo Petrópolis stand out. With a turnover of R$ 63 billion in 2012, the country
is the third largest beer producer and ranks 26th in the international consumption ranking
(VALOR ECONÔMICO, 2013).
The beer market in Brazil is concentrated in the AmBev, Kirin Group and Grupo Petrópolis
breweries, which together hold 90% of the market. Another important indicator is the evolution of
per capita beer consumption, in liters per year, in Brazil. Since 2008, beer consumption in
Brazil has increased significantly, and in 2012 it reached 66.7 liters per capita (Chart 1).
CHART 1
Evolution of Brazil's consumption of beer (liters per capita).
Chart 2 shows the market shares of the Brazilian beer market in 2012.
CHART 2
Market shares of the Brazilian beer market in 2012.
Due to the relevance of the beer market in the Brazilian economy and its continued growth, we
decided to perform this study, in which the following brands were monitored: Antarctica,
Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella
Artois and Skol, besides the word “cerveja” (beer) and two of its regional variations: “breja” and
“cerva”.
5. ANALYTICAL METHODOLOGY
The analytical methodology consists of three steps: the first is the analysis of the general
behavior and profile of users who post about beer on Twitter; the second is the semantic analysis,
based on text mining techniques and multivariate statistics, to identify the most relevant topics of
discussion within the beer environment; and the third is the evaluation of the influence of users.
5.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER
In the first analytical step, we sought to assess the main metrics of the work, aggregated over
time. The most important were the following:
• Number of posts: measures the total number of posts made per time interval.
• Number of distinct users: measures the total number of distinct users who posted per
time interval.
• Average posts per user: calculated by dividing the number of posts by the number of distinct
users.
• Percentage of posts: proportion of posts classified in each of the existing categories.
The analysis of the total number of posts makes it possible to evaluate the total intensity of the
impacts that occurred during the observed period. The average number of posts per user indicates,
in general terms, how intensely the subject is spread among users: the closer the average is to 1,
the lower the intensity. The percentage of posts evaluates the weight of each category of a given
categorical variable in the total number of posts considered.
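A minimal sketch of how the first three metrics could be computed per time interval is given below. The function name, the input layout of (interval, user) pairs and the sample records are hypothetical.

```python
from collections import defaultdict


def metrics_by_interval(posts):
    """Aggregate (interval, user) pairs into per-interval metrics.

    Returns {interval: (number of posts, number of distinct users, average posts per user)}.
    """
    post_counts = defaultdict(int)
    users = defaultdict(set)
    for interval, user in posts:
        post_counts[interval] += 1
        users[interval].add(user)
    return {
        interval: (post_counts[interval],
                   len(users[interval]),
                   post_counts[interval] / len(users[interval]))
        for interval in post_counts
    }


# Invented records, for illustration only.
sample_posts = [
    ("2013-12-24", "ana"),
    ("2013-12-24", "ana"),
    ("2013-12-24", "bia"),
    ("2013-12-25", "caio"),
]
daily_metrics = metrics_by_interval(sample_posts)
```

An average equal to 1, as on the second day of this example, means that every user posted exactly once in the interval.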
The evaluation of these metrics is aimed at understanding the characteristics of the general
behavior of Twitter users regarding beer. Peaks were identified by visualizing the time series of
the number of posts. The same procedure is used to evaluate the time-of-day curve.
Replacing the number of posts with the average number of posts per user makes it possible to
evaluate changes in the behavior of individual users. There are often large variations in this metric
across time intervals, due to specific users who tend to post more about specific topics or events.
Twitter allows the use of specific metrics that denote the different types of behavior of its members;
among them is the penetration (proportion of posts with a certain characteristic).
The characteristics considered were the following:
• RTs: tweets that pass on a message already posted by another user.
• @: messages directed to another user.
• Http: tweets containing links to websites.
• Hashtag (#): tweets grouped into a discussion on a specific topic.
• Other: tweets that do not have any of the aforementioned characteristics.
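The penetration of each characteristic can be sketched with simple pattern matching. The rules below are assumptions for illustration: a retweet is recognized by a leading "RT", a retweet is not additionally counted as a directed message, and one post may carry more than one characteristic. The function names and sample posts are hypothetical.

```python
import re
from collections import Counter


def post_types(text):
    """Return the set of characteristics found in one post."""
    types = set()
    if re.match(r"\s*RT\b", text):
        types.add("RT")
    elif re.search(r"@\w+", text):
        # Assumption: the "@user" inside a retweet is not counted as a directed message.
        types.add("@")
    if re.search(r"https?://", text):
        types.add("HTTP")
    if re.search(r"#\w+", text):
        types.add("HASHTAG")
    return types or {"OTHER"}


def penetration(posts):
    """Proportion of posts showing each characteristic."""
    totals = Counter()
    for text in posts:
        totals.update(post_types(text))
    return {kind: count / len(posts) for kind, count in totals.items()}


# Invented posts, for illustration only.
sample_texts = [
    "RT @skol_: desce redondo",
    "@ana bora tomar uma breja?",
    "promoção http://example.com #cerveja",
    "cerveja gelada",
]
shares = penetration(sample_texts)
```

Because a post may have several characteristics, the proportions in this sketch can add up to more than 100 percent.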
5.2 INFLUENCE ANALYSIS
The analysis of influence is based on a network of conversations in which two distinct cases of
influence were observed. The first case considers retweets; the second includes the tweets sent
directly to other users.
In the first case, a user's influence was measured by how many other users retweeted his or her
posts. In the second, influence was measured by the number of directed conversations between
users.
In this article, we considered both cases and all kinds of connections between users. In practical
terms, however, retweets always have more impact, because they happen more frequently.
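One possible way to turn posts into such a conversation network is sketched below. The edge direction for retweets (from the original author to the user who retweeted), the "RT @user" pattern, the function names and the sample posts are all assumptions made for illustration; the source does not describe how its network file was built.

```python
import re
from collections import Counter


def conversation_edges(posts):
    """Build directed (source, target) edges from (author, text) pairs."""
    edges = set()
    for author, text in posts:
        retweet = re.match(r"\s*RT @(\w+)", text)
        if retweet:
            # Assumption: the message flows from the original author to the retweeter.
            edges.add((retweet.group(1).lower(), author.lower()))
        else:
            for target in re.findall(r"@(\w+)", text):
                edges.add((author.lower(), target.lower()))
    return edges


def degree(edges):
    """Number of edges connected to each node, regardless of direction."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


# Invented posts, for illustration only.
sample_conversations = [
    ("ana", "RT @frases: cerveja sempre"),
    ("bia", "RT @frases: cerveja sempre"),
    ("caio", "@ana bora tomar uma?"),
]
network = conversation_edges(sample_conversations)
node_degree = degree(network)
```

The resulting edge list could then be loaded into a graph tool to compute degree and PageRank for every user.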
5.3 SEMANTIC ANALYSIS
The correlation analysis between topics followed this process: first, the lexical analysis was
performed. In a second step, stop-words (words without semantic value) were removed, and then
the stemming algorithm (extraction of word roots) was run. After these steps, the BOW matrix was
calculated. In this matrix, each column corresponds to a term and each row to a document (tweet).
The weighting measure used was tf-idf (term frequency-inverse document frequency). Based on
this matrix, it was possible to obtain the words most associated with a particular word. This
similarity was assessed by Pearson correlation.
The classification of posts by theme was generated through a heuristic based on the selection of
keywords defined by experts. The process for evaluating the words to be considered is:
• Step 1: define the keywords that characterize a given theme.
• Step 2: develop an algorithm to count the keywords defined in step 1.
• Step 3: repeat steps 1 and 2 until the proportion of posts classified into some theme can be
considered satisfactory.
Generally, the minimum proportion of posts classified into themes for obtaining consistent results is
50 percent.
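The keyword heuristic can be illustrated as follows. The themes, their keywords and the sample posts are hypothetical placeholders for the expert-defined lists, which the source does not reproduce; the coverage value is the proportion of posts classified into at least one theme, to be compared against the 50 percent guideline.

```python
import re

# Hypothetical themes and keywords, standing in for the expert-defined lists.
THEMES = {
    "where": {"bar", "praia", "casa"},
    "with_whom": {"amigos", "galera", "família"},
    "when": {"hoje", "amanhã", "sábado"},
}


def classify(text, themes):
    """Return the themes whose keywords occur in the post."""
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    return {name for name, keywords in themes.items() if words & keywords}


def coverage(posts, themes):
    """Proportion of posts classified into at least one theme."""
    classified = sum(1 for text in posts if classify(text, themes))
    return classified / len(posts)


# Invented posts, for illustration only.
sample_theme_posts = [
    "cerveja na praia com os amigos",
    "hoje tem breja",
    "skol desce redondo",
    "churrasco em casa",
]
classified_share = coverage(sample_theme_posts, THEMES)
```

If the coverage stayed below the chosen threshold, step 3 would call for revising the keyword lists and running the count again.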
6. AVAILABLE INFORMATION
The extraction of information was done through a program developed by IBOPE DTM that
connects directly to the Twitter API.
Based on the distribution of market share in the Brazilian beer market, it was decided to study only
the brands of the most significant companies in the segment: AmBev, Kirin Group and Grupo
Petrópolis.
Therefore, we carried out the monitoring of the following brands: Antarctica, Baden Baden,
Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois and Skol,
besides the word “cerveja” (beer) and two of its regional variations: “cerva” and “breja”.
The data refer to all messages posted during the study period containing the specified words.
After 25 days of monitoring, 438,507 tweets (posts) related to beer were obtained. However, because
the study focused on Brazil, we only considered posts in Portuguese, and the work started with
291,043 posts (66.4%).
The monitoring period, from December 10, 2013 to January 3, 2014, was chosen based on the
assumption that the end-of-year holidays, Christmas and New Year, influence the number of posts
about beer on Twitter.
7. DATA ANALYSIS
The analysis followed the same structure as the methodology presented. First, the general
distribution of the posts was evaluated.
7.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER
The holidays caused considerable variation in the daily number of tweets posted.
In Chart 3, we can see that the days with peaks in posts were December 24, 25 and 31
(Christmas Eve, Christmas and New Year's Eve), on which there were over 4,000 more posts
than the daily average for the total period.
CHART 3
Distribution of posts about beer on Twitter.
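As a minimal sketch of how such peaks could be flagged numerically rather than visually, the snippet below marks the days whose count exceeds the daily average of the period by more than a chosen margin. The function name, the margin and the counts are illustrative assumptions and are not the study's data.

```python
def peak_days(daily_counts, margin):
    """Return the days whose post count exceeds the period mean by more than `margin`."""
    mean = sum(daily_counts.values()) / len(daily_counts)
    return sorted(day for day, count in daily_counts.items() if count - mean > margin)


# Made-up counts for illustration only (not taken from Chart 3).
example_counts = {"day 1": 10, "day 2": 10, "day 3": 40}
flagged_days = peak_days(example_counts, margin=15)
```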
As for the timing of posts (Chart 4), there was a sharp increase from 10 a.m., with the number of
posts stabilizing between 3 p.m. and 9 p.m.
CHART 4
Amount of posts per hour.
By analyzing the average number of posts per user, it can be seen that the average peaked at
9 a.m. (Chart 5). But this peak was not large enough to consider the behavior very different from
that of other hours of the day.
CHART 5
Average amount of posts by Total, Christmas and New Year.
However, when the Christmas and New Year holidays were examined separately, it was seen that, at
Christmas, the highest average number of posts occurred between 8 and 9 o'clock, while at New
Year it occurred from 11 p.m. onward, as shown in Chart 6.
CHART 6
Average amount of posts by Total, Christmas and New Year.
In Chart 7, we note that 85.5% of the posts about beer do not mention a specific brand.
Among the 14.5% of posts that mention some brand, Skol has the highest share on Twitter, with
4.3% of the posts, followed by Brahma with 3.5% and Itaipava with 2.2%.
CHART 7
Percentage of posts of search words.
As to individual metrics (Table 1), the only brand with a significant share of posts with
hashtags (#) was Eisenbahn, with 17.9% of its posts. The brands whose posts most often
contained links to sites (http) were Baden Baden with 41.4%, Eisenbahn with 36.5%, Antarctica
with 32.4% and Stella Artois with 29.4% of the posts. For directed messages (@), the
Serramalte brand stood out with 32.5%, followed by Nova Schin with 23.4% of the posts. Finally,
the passing on of previously posted messages (RT) was highest for Budweiser, with 26.9%, and
Antarctica, with 21.1% of the posts.
TABLE 1
Number of posts by search word and its individual metrics.
                                               % PENETRATION BY POST TYPE
GROUP       SEARCH WORD     POSTS     %        RT      @       HTTP    HASHTAG  OTHERS
-           CERVEJA (beer)  215,229   74.0%    21.1%   13.2%   8.0%    3.8%     56.3%
-           BREJA           19,537    6.7%     11.0%   16.9%   6.1%    3.5%     64.4%
-           CERVA           14,112    4.8%     11.2%   15.8%   5.7%    3.8%     65.7%
AMBEV       ANTARCTICA      5,781     2.0%     21.1%   11.3%   32.4%   4.5%     33.7%
AMBEV       BOHEMIA         1,943     0.7%     6.2%    13.2%   17.4%   9.0%     59.1%
AMBEV       BRAHMA          10,234    3.5%     15.7%   13.9%   16.1%   6.7%     52.0%
AMBEV       BUDWEISER       3,232     1.1%     26.9%   8.7%    12.1%   9.5%     50.5%
AMBEV       SERRAMALTE      114       0.0%     4.4%    32.5%   13.2%   12.3%    45.6%
AMBEV       SKOL            12,632    4.3%     15.4%   13.5%   12.9%   9.0%     55.7%
AMBEV       STELLA ARTOIS   574       0.2%     9.2%    7.0%    29.4%   6.3%     52.4%
KIRIN       BADEN BADEN     331       0.1%     3.0%    16.6%   41.4%   6.9%     38.1%
KIRIN       EISENBAHN       263       0.1%     6.1%    11.8%   36.5%   17.9%    44.9%
KIRIN       NOVA SCHIN      538       0.2%     15.2%   23.4%   9.3%    3.2%     50.2%
PETRÓPOLIS  ITAIPAVA        6,523     2.2%     10.2%   14.2%   19.5%   3.7%     54.9%
TOTAL       -               291,043   100.0%   19.1%   13.6%   9.2%    4.2%     56.5%
Focusing on users who made some comment about beer, it was possible to see that a single user
was responsible for 1,461 posts about beer (Table 2), but this user has only 153 followers; in other
words, only these 153 followers directly viewed the information posted.
TABLE 2
Top ten users with large number of posts.
RANK   USER (TWITTER)   POSTS   FOLLOWERS ON TWITTER
1      BEEINNDEX        1,461   153
2      SKOL_            443     107
3      DJ_RICARDOO      348     512
4      CERVEJA_DUFF     208     155
5      RENATORDM        188     514
6      ITAIPAVA_        185     415
7      PREDRERO         162     28,107
8      MARCIO_SKOL      157     171
9      SERRALHERO       107     2,181
10     GORONAH          105     769
TOTAL                   3,364
Following this line of reasoning, the singer Claudia Leitte sent only one post about beer, but this
information was seen by her 7,869,106 followers (Table 3).
TABLE 3
Top ten users with the highest number of followers on Twitter.
RANK   USER (TWITTER)   POSTS   FOLLOWERS ON TWITTER
1      CLAUDIALEITTE    1       7,869,106
2      DANILOGENTILI    1       5,324,329
3      SPIDERANDERSON   1       4,226,383
4      CLARORONALDO     1       3,625,623
5      PRETAGIL         2       3,450,693
6      PORTALR7         5       2,835,528
7      VEJA             2       2,825,215
8      BGAGLIASSO       1       2,735,376
9      G1               11      2,220,615
10     SIGNOSFODAS      1       1,432,674
TOTAL                   26
To analyze the influence of users, a ranking of the 20 users with the highest PageRank was built.
The user "frasesdebebada", with a PageRank of 0.007 and 365 connections (Table 4), had the
greatest influence on the network.
Table 4 also shows the presence of two users who talked about beer on Twitter and are among
the top 10 users with the largest number of followers (Table 3).
TABLE 4
Ranking of the 20 users with the highest PageRank.
RANK  USER              COMPANY?  DEGREE  PAGERANK
1     FRASESDEBEBADA    NO        365     0.0070
2     IRMA_ZULEIDE      NO        51      0.0033
3     SPIDERANDERSON    NO        40      0.0029
4     ASTROSLUMINOSOS   YES       73      0.0024
5     SIGNOSFODAS       YES       48      0.0021
6     FACTBR            YES       160     0.0020
7     SOUVODKA          NO        60      0.0018
8     SENTOAVARAEMVCS   NO        32      0.0017
9     EDUTESTOSTERONA   NO        98      0.0016
10    EVERTOUS          NO        108     0.0016
11    PIADAMALIGNA      NO        19      0.0015
12    G1                YES       89      0.0014
13    RELAXEI           NO        96      0.0013
14    MATEUSALIANO      NO        93      0.0012
15    LUCASPFVR         NO        49      0.0011
16    FELIXPASSIVA      NO        22      0.0010
17    B1TCH_MALVADA     NO        15      0.0010
18    EUZOERO           NO        24      0.0010
19    PREDRERO          YES       25      0.0009
20    UMVINGADOR        NO        12      0.0009
Among the influential users are Anderson Silva, a famous MMA fighter, with more than 4
million followers, and the site G1 (from Organizações Globo), with 2 million followers.
Claudia Leitte is the person with the most followers who talked about beer, but her post was
retweeted by people who are not in the habit of talking about beer, and because of this, her
position in the ranking of influencers was not higher.
Anderson Silva posted a message to thank his sponsor, a famous American beer brand, before
his fateful fight: “... equipando já pra sair... Aproveito para agradecer a todos os meus parceiros:
Budweiser, Burger King...” ("... gearing up, about to head out... I take the opportunity to thank all
my partners: Budweiser, Burger King...") (<http://t.co/GILAlzRwch>).
The ability to determine the real influence of prominent users reinforces the importance of this
type of analysis.
Users who represent companies are present among the influential: even if a tweet is not directed
to a particular person, its information resonates with various groups within the network.
In Figure 2, you can see the full network of users who talk about beer on Twitter. Figures 3, 4 and 5
show the networks of users "frasesdebebada", "Irma_Zuleide" and "SpiderAnderson",
respectively.
The "frasesdebebada" user, the most influential in the network in terms of number of connections,
achieved the widest spread of messages, as indicated by the intensity of the red color in Figure 3.
FIGURE 2
Full network.
FIGURE 3
Network of user “frasesdebebada”.
FIGURE 4
Network of user “Irma_Zuleide”.
FIGURE 5
Network of user “SpiderAnderson”.
In the semantic analysis, no single word is strongly associated with more than one brand.
Therefore, to facilitate visualization, we selected only the ten words most associated with each
brand. The brands were chosen according to their volume of posts.
Skol accounted for 4.3% of the posts related to beer, Brahma for 3.5%, Itaipava for 2.2% and
Antarctica for 2.0%.
The word most strongly associated with Skol was “redondo”, with a Pearson correlation of 0.21,
followed by the words “beats” and “vire”, each with a correlation of 0.16 (Chart 8). These words are
related to the brand's marketing campaign.
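A word-brand correlation of this kind can be sketched as follows. This is a minimal illustration,
assuming that each post is reduced to binary indicators (1 if the term occurs in the post, 0
otherwise) and that the Pearson correlation is computed between the indicator of the brand and the
indicator of the word. The example posts are invented, and the exact representation used in the
study may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a term occurs in every post or in none
    return cov / sqrt(var_x * var_y)


def indicator(posts, term):
    """1 if the term occurs as a whitespace-separated token in the post, else 0."""
    return [1 if term in post.lower().split() else 0 for post in posts]


# Invented posts, used only to exercise the functions.
posts = [
    "skol desce redondo",
    "skol beats na festa",
    "brahma gelada no bar",
    "cerveja no churrasco",
    "itaipava com os amigos",
    "desce redondo demais",
]

r_redondo = pearson(indicator(posts, "skol"), indicator(posts, "redondo"))
r_bar = pearson(indicator(posts, "skol"), indicator(posts, "bar"))
print(f"skol x redondo: {r_redondo:.2f}")
print(f"skol x bar: {r_bar:.2f}")
```

With binary indicators the Pearson correlation coincides with the phi coefficient, so values such as
0.21 indicate a modest but positive tendency for the word and the brand to appear in the same posts.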
One feature that set Brahma apart from the other brands was that its poster girl, Claudia Leitte,
appeared in 6th position among the most associated words, with a correlation of 0.14 (Chart 8).
The Antarctica brand correlates more strongly with a soft drink (Guaraná) than with beer
specifically (Chart 8), because the same brand name is used for both products.
CHART 8
Top 10 words with highest correlation with brands.
A group of experts in semantics selected keywords and grouped them into the major themes that arise
when people talk about beer.
A total of 39.2% of posts could not be classified. These posts generally mention beer but carry no
relevant content.
Chart 9 shows the distribution of the roughly 60% of posts that were classified. Posts were
concentrated in the themes Place, where the drink was consumed (19.8%), With Whom the person was
drinking (13.8%) and Brands specifically (13.0%).
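The keyword-based classification described above can be sketched as a simple dictionary lookup. The
theme names follow the article, but the keyword lists, the example posts and the decision to let a
post count toward more than one theme are assumptions made for illustration; the experts' actual
keyword lists are not reproduced here.

```python
from collections import Counter

# Hypothetical keyword lists; the real lists were defined by the semantics experts.
THEME_KEYWORDS = {
    "Place": {"bar", "praia", "casa"},
    "With Whom": {"amigos", "galera", "familia"},
    "Brands": {"skol", "brahma", "itaipava", "antarctica"},
}


def classify(post):
    """Return every theme whose keyword set shares at least one token with the post."""
    tokens = set(post.lower().split())
    return [theme for theme, keywords in THEME_KEYWORDS.items() if tokens & keywords]


def theme_shares(posts):
    """Share of posts per theme; posts matching no theme count as 'Unclassified'."""
    counts = Counter()
    for post in posts:
        themes = classify(post)
        if themes:
            counts.update(themes)
        else:
            counts["Unclassified"] += 1
    return {theme: count / len(posts) for theme, count in counts.items()}


# Invented posts, used only to exercise the functions.
posts = [
    "cerveja no bar com os amigos",
    "skol gelada",
    "quero cerveja",
    "brahma na praia",
]

shares = theme_shares(posts)
for theme, share in shares.items():
    print(f"{theme}: {share:.0%}")
```

Because a post may match several themes in this sketch, the shares can add up to more than 100%; a
single-label scheme would instead need a rule for choosing one theme per post.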
CHART 9
Proportion of posts by theme.
An analysis of the most discussed themes for each brand studied (Chart 10) shows that, among the
beers produced by AmBev, Stella Artois has 35% of its posts in the theme Commemorative Dates,
unlike the other brands of the same company, whose posts concentrate on the theme Place. The beers
of the Kirin Group stood out in three different themes: Baden Baden with 44% of posts in
Commemorative Dates, Eisenbahn with 32% in Place, and Nova Schin with 23% in With Whom.
Itaipava, from Grupo Petrópolis, had 31% of its posts in the theme Place against 25% in the theme
When.
CHART 10
Percentage of posts by theme by beer brands
8. CONCLUSION
This article showed that holidays strongly influence the number of posts related to beer, with
increases of more than 35% over the average number of daily posts. Over the course of the day, posts
generally increase in the afternoon and evening. Posting was most intense between 11 p.m. and
2 a.m.
The social network analysis efficiently identified influential users through the quantity and
quality of their connections during the period. Several influencers were identified; notable among
them are Anderson Silva, who sent a tweet thanking his sponsors before the fight, and G1, a
communications company.
The semantic analysis of posts, which identified themes related to beer, showed a concentration of
posts about the place where the drink was consumed, with whom it was consumed and which brands
were consumed.
In the Kirin Group, each brand had a higher incidence in a different theme: Baden Baden had more
posts associated with Commemorative Dates, Eisenbahn with Place, and Nova Schin with With Whom.
Itaipava, from Grupo Petrópolis, had a higher incidence of posts in the theme Place.
9. LIMITATIONS AND FUTURE STUDIES
The time series of the number of posts showed no sudden break, which suggests that there was no
disconnection from the Twitter API. We can therefore rely on the consistency and quality of the
information used in this study.
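A check of this kind could be sketched as below. This is an illustration under stated assumptions:
the daily counts are invented, and the rule of flagging any day that falls below 20% of the median
daily volume (the hypothetical parameter min_ratio) is not taken from the study.

```python
from statistics import median


def find_breaks(daily_counts, min_ratio=0.2):
    """Return the days whose post count falls below min_ratio times the median count.

    daily_counts maps a day label to the number of posts collected on that day.
    """
    typical = median(daily_counts.values())
    return [day for day, count in sorted(daily_counts.items()) if count < min_ratio * typical]


# Invented series: day 3 simulates a collection outage.
with_outage = {1: 1000, 2: 1100, 3: 50, 4: 950, 5: 1200}
without_outage = {1: 1000, 2: 1100, 3: 900}

breaks_found = find_breaks(with_outage)
no_breaks = find_breaks(without_outage)
print(breaks_found, no_breaks)
```

An empty result, as in the second series, is consistent with uninterrupted collection, although it
does not rule out shorter interruptions within a day.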
Future studies could usefully repeat the analysis over a longer historical period in order to
determine whether the theme shows seasonal behavior.
Another question under study is the difference between the time of consumption and the time of
posting.
10. REFERENCES
ALEXA. Disponível em: <www.alexa.com>. Acessado em: 6 jan. 2014.
BAEZA-YATES, R.; RIBEIRO NETO, B. Modern information retrieval. Addison-Wesley, 1999.
BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia,
2008.
BAVELAS, Alex. A mathematical model for group structure. Applied Anthropology 7, 1948.
CERVBRASIL. A Cerveja – Contribuição econômica, s. d. Disponível em:
<http://www.cervbrasil.org.br/a-cerveja/contribuicao-economica/>. Acessado em: 6 jan. 2014.
CERVEJAS DO MUNDO. História da cerveja, 2009. Disponível em:
<http://www.cervejasdomundo.com/Brasil.htm>. Acessado em: 6 jan. 2014.
CHOWDHURY, A. Top Twitter Trends of 2009. Twitter Blog, 15 dez. 2009. Disponível em:
<https://blog.twitter.com/2009/top-twitter-trends-of-2009>. Acessado em: 3 fev. 2014.
CORRÊA, A. C. G. Recuperação de documentos baseada em Informação Semântica no Ambiente
AMMO. UFSCAR, 2003.
COUTINHO, C. A. T.; QUINTELLA, C. A. S.; PANZANI, M. M. História da Cerveja no Brasil.
Portal São Francisco, s. d. Disponível em:
<http://www.portalsaofrancisco.com.br/alfa/historia-da-cerveja/historia-da-cerveja-no-brasil.php>.
Acessado em: 6 jan. 2014.
HUANG, A. Similarity Measures for Text Document Clustering. Department of Computer Science,
The University of Waikato, 2008.
KUNEGIS, J.; LOMMATZSCH, A.; BAUCKHAGE, C. The Slashdot zoo: mining a social network
with negative edges. Track: Social Networks and Web 2.0 / Session: Interactions in Social
Communities, 2009.
LIU, Bing. Web Data Mining: exploring hyperlinks, contents, and usage data. Springer, 2011.
LUNDEN, I. Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta
‘Biggest Tweeting’ City. TechCrunch, 30 jul. 2012. Disponível em:
<http://techcrunch.com/2012/07/30/analyst-twitter-passed-500m-users-in-june-2012-140m-of-them-in-us-jakarta-biggest-tweeting-city/>.
Acessado em: 3 fev. 2014.
MANNING, C. D.; RAGHAVAN, P.; SCHUTZE, H. Scoring, term weighting, and the vector
space model: introduction to information retrieval. Stanford, 2008.
MELO, I. D. et al. Análise de Redes Sociais. Universidade Federal da Paraíba, 2013.
MOURA, M. F. Proposta de utilização de mineração de textos para seleção, classificação e
qualificação de documentos. Campinas: Embrapa Informática Agropecuária, 2004.
NÚCLEO EDUCACIONAL DE BROGLIE. Produção e consumo de cerveja no Brasil e no mundo,
2013. Disponível em:
<http://www.nucleodebroglie.com/2013/03/producao-e-consumo-de-cerveja-no-brasil.html>.
Acessado em: 6 jan. 2014.
PAGE, L. et al. The PageRank citation ranking: bringing order to the web. Technical report,
Stanford Digital Library Technologies Project, 1998.
QUEIROZ, D. F. Análise estrutural do setor cervejeiro. FAEC – Departamento de Economia, 2010.
Disponível em:
<http://pt.slideshare.net/diegofelinto/monografia-2010-anlise-strutural-do-setor-cervejeiro-no-brasil-diego-queiroz>.
Acessado em: 6 jan. 2014.
SALTON, G.; MCGILL, M. J. Introduction to modern information retrieval. Computer Science
Series, USA: McGraw-Hill, 1983.
SANTOS, M. A. M. R. Extraindo regras de associação a partir de textos. PUC, 2002.
SILVA, Anderson. (SpiderAnderson) tweets. Disponível em: <live http://t.co/2aBqwULK>.
Acessado em: 15 abr. 2014.
SINDICATO NACIONAL DA INDÚSTRIA DA CERVEJA – SINDICERV. Mercado, s. d.
Disponível em: <http://www.sindicerv.com.br/mercado.php>. Acessado em: 6 jan. 2014.
SIVIC, J. Efficient visual search of videos cast as text retrieval. IEEE TRANSACTIONS ON
PATTERN ANALYSIS AND MACHINE INTELLIGENCE, v. 31, n. 4, IEEE, 2009.
STRACHAN, D. Twitter: how to set up your account. Telegraph, 19 fev. 2009. Disponível em:
<http://www.telegraph.co.uk/travel/4698589/Twitter-how-to-set-up-your-account.html>. Acessado
em: 3 fev. 2014.
TWITTER. Finding your Twitter short or long code. Disponível em:
<http://help.twitter.com/entries/14226-how-to-find-your-twitter-short-long-code>. Acessado em: 3
fev. 2014.
VALOR ECONÔMICO. Ritmo de produção de cerveja cai em 2013. 2013. Disponível em:
<http://www.valor.com.br/empresas/3221828/ritmo-de-producao-de-cerveja-cai-em-2013>.
Acessado em: 6 jan. 2014.
WASSERMAN, Stanley; FAUST, Katherine. Social network analysis: methods and applications.
Cambridge: Cambridge University Press, 1994.
Note: Authors are solely responsible for the translation of their articles from Portuguese to English.