BIG DATA STORYTELLING

In a digital galaxy far, far away.
Seventy years ago, scientists started trying to quantify the growth rate of
data volumes, a phenomenon then referred to as the Information Explosion and
today known as Big Data. The definition of Big Data has been widely discussed.
IBM (2013) and SAS (2015), both pioneers in the industry, define Big Data in
terms of three dimensions:
Volume, the scale of data and how it increases exponentially over time.
Velocity, the analysis of streaming data and their interconnection.
Variety, the different forms of data and their origin.
IBM includes an additional fourth term, veracity, which describes the uncertainty of
data and the importance of trust when working with it. SAS includes two further terms, variability and complexity, which describe the variation
and inconsistency of the data flow, and the challenge of tying different data sources together.
We assume that everything that can be connected will be connected in the
future, which will produce massive amounts of unstructured data. Consequently,
Big Data will become the digital equivalent of India’s garbage mountains: it
will just be digital waste if it is not structured and analysed in a productive manner.
The factors above provide an understanding of the Big Data concept, so we ask ourselves not only how Big Data can be analysed and structured, but also how it can be used.
How can Big Data in combination with the Internet of Things (IoT) (Kopetz, 2011)
improve business, leisure and man’s future daily life by enhancing storytelling?
ALEXANDER AROZIN, AYESHA AHSAN, DANIEL LINDSTRÖM, SAMUEL LINDBERG
DALE
STORYTELLING ENHANCED BY BIG DATA
—— Dale’s logotype
If storytelling is the next step for Big Data in business (Narrative Science, 2015), then Big Data can be a major actor in
storytelling. Introducing Dale, a storytelling application that
uses Big Data to create stories tailored to your expectations.
Dale analyses the input from the user, such as age, length of story,
genre and favourite author. Dale uses this input to define keywords
and important parameters, so that it can collect relevant
information from various Big Data sources.
Dale then creates the story you want, at the moment you want it.
WHAT ARE THE COMPONENTS OF DALE?
Dale consists of four major components, each of which performs a task in the process of
creating a story based on the user’s input.
TURING, THE ANALYST
Each input parameter has a set of key factors that
Dale uses to create a story. The length determines
whether the written story will be a short story or a
novel. The genre decides the level of art form in the
story and which genre characteristics are implemented.
The age determines the linguistic level of the story,
that is, whether it is aimed at a child or a grown-up.
The favourite author decides the manner in which the
story is written, so that it imitates that author's
particular style. All these input parameters are used
to create a framework for both the writing process and
the information-gathering process.
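As an illustration of how the four input parameters could be turned into such a framework, here is a minimal Python sketch. The class name `StoryFramework`, the function names, the age cut-off of 13 and the rule that only a "long" story becomes a novel are all assumptions made for this example, not part of Dale's design.

```python
from dataclasses import dataclass


@dataclass
class StoryFramework:
    """Hypothetical container for the factors derived from the user's input."""
    form: str           # "short story" or "novel"
    genre: str          # genre whose characteristics the story should follow
    reading_level: str  # "child" or "adult"
    style_of: str       # author whose manner of writing is imitated


def build_framework(age: int, length: str, genre: str,
                    favourite_author: str) -> StoryFramework:
    # Assumption: only the "long" setting produces a novel.
    form = "novel" if length == "long" else "short story"
    # Assumption: readers under 13 get the child-level language.
    reading_level = "child" if age < 13 else "adult"
    return StoryFramework(form, genre.lower(), reading_level, favourite_author)


def search_keywords(framework: StoryFramework) -> list:
    """Keywords that the information-gathering step could start from."""
    return [framework.genre, framework.style_of, framework.form]


framework = build_framework(8, "short", "Fairy tale", "Astrid Lindgren")
print(framework)
print(search_keywords(framework))
```

The same framework object serves both purposes named above: it constrains the writing (form, level, style) and seeds the search for information.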
In the future, every digital device and database will
be connected through the IoT, which makes it easy for
Dale to collect information and analyse the Big Data
needed. Dale uses the results from the analysis of the
user’s input to find relevant information in the cloud
by cross-referencing the input with literature
databases, online libraries, social media sites and
other web sources. The information found is the
foundation of the story Dale creates in the end.
BOND, THE AGENT
Dale has to search for information in the cloud
since almost no information, except personalized data, will be available locally: the valuable data is stored on the sites of the respective sources. It
would also be too expensive to store non-local data
locally, due to its sheer size. This approach allows resources that would otherwise be spent on data storage
and server maintenance to be reallocated to other,
more necessary areas of Dale, for example to improve Dale’s network stability or computing power, or to
optimize its search and analysis processes.
CERBERUS, THE GATEKEEPER
The data Dale finds in the cloud is structured by
passing it through a framework called Cerberus. Cerberus
acts as a gatekeeper before the actual creation of the
story. It not only structures the data provided but also
analyses factors that help establish whether the source is legitimate.
The factors analysed are dependence, authenticity,
tendency, feasibility and credibility, which are important factors to review when criticising sources.
All data Dale collects is passed through Cerberus to make sure that it is legitimate and to further
ensure the quality of the story created.
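A gatekeeper of this kind could be sketched as a simple filter over scored sources. The scoring scale (0.0 to 1.0, where higher means the source does better on that criterion), the threshold of 0.5 and the function names are assumptions for illustration only. How the five factors would actually be measured is not specified in this chapter.

```python
# The five source-criticism factors named in the text.
FACTORS = ("dependence", "authenticity", "tendency", "feasibility", "credibility")


def accept_source(scores: dict, threshold: float = 0.5) -> bool:
    """Accept a source only if it passes every factor.

    Assumption: each score lies in [0.0, 1.0] and a missing score counts as 0.0.
    """
    return all(scores.get(factor, 0.0) >= threshold for factor in FACTORS)


def gatekeep(items: list, threshold: float = 0.5) -> list:
    """Return only the collected items whose source is accepted."""
    return [item for item in items if accept_source(item["scores"], threshold)]


# Hypothetical collected data with made-up scores.
collected = [
    {"text": "Library record", "scores": dict.fromkeys(FACTORS, 0.9)},
    {"text": "Anonymous forum post",
     "scores": {**dict.fromkeys(FACTORS, 0.8), "credibility": 0.2}},
]
print([item["text"] for item in gatekeep(collected)])
```

Requiring every factor to pass, rather than averaging them, means a single weak point (here, low credibility) is enough to reject a source.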
—— Diagram labels: User, User data, Keywords, Turing, Bond, Big Data Cloud, Collected data, Cerberus, Verne.
VERNE, THE CREATOR
Once Cerberus has accepted or rejected the data,
the accepted data continues to the part of Dale that
creates the story: Verne, an algorithm (explained in the technology section below) based
on the theory of Narrative Intelligence, Big
Data and Natural Language Generation. With
the help of the data collected from the
cloud, Verne writes a story. The story is based on the
user’s input, which has been refined and analysed
to provide an optimal storytelling experience:
stories tailored to the user and what the user
wants, in any given situation.
TECHNOLOGY
How does it work? How can Dale produce such
great stories in the blink of an eye? The answer is
an algorithm that combines the latest advances
within narrative intelligence, Big Data and natural language generation (NLG).
Story generation is one part of narrative intelligence, and the part that is especially interesting for this project. There have been several
attempts in the past to create systems capable
of generating stories. Some have even been
quite successful, like the vintage Talespin
(Meehan, 1977) and the more recent
ProtoPropp (Gervás et al. 2005). Both these systems are built using artificial intelligence methods such as planning or case-based reasoning. A
big drawback of these methods is that the story generation
relies heavily on a domain model that is known a priori. In other words, before the system can generate stories, a human needs to enter a
description of a fictional world, including characters, objects,
places, and the actions that entities can perform
to change the world (Li et al. 2013).
—— The process of how the user’s input becomes a story.
This works, but would it not be better
if a system could generate great stories about any
topic you want? Dale is able to do that with
a method called open story generation, which is
another part of narrative intelligence (Li et al.
2013). Dale uses plot graphs to construct
interesting and coherent narratives. A plot graph
is a set of actions and events together with the connections and relations between them. In previous
work on open story generation and plot
graphs, the actions and events are acquired from
a crowdsourced database. This makes it possible to
generate a story about any topic, as long as someone
has contributed a set of actions and events on the desired topic to the database.
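To make the idea of a plot graph concrete, the following Python sketch stores events as nodes and "must happen before" relations as edges, then walks the graph in a randomized order that respects those relations. It is a simplification under stated assumptions: the class `PlotGraph`, its methods and the example events are invented for illustration, and only precedence relations are modelled, whereas the plot graphs of Li et al. (2013) carry further kinds of relations.

```python
import random


class PlotGraph:
    """Events plus precedence relations (earlier event -> later event)."""

    def __init__(self):
        self.events = set()
        self.before = {}  # event -> set of events that must already have happened

    def add_event(self, event):
        self.events.add(event)
        self.before.setdefault(event, set())

    def add_precedence(self, earlier, later):
        self.add_event(earlier)
        self.add_event(later)
        self.before[later].add(earlier)

    def generate(self, rng):
        """Return one event sequence that respects every precedence relation."""
        told = []
        remaining = set(self.events)
        while remaining:
            # Events whose prerequisites have all been told already.
            ready = sorted(e for e in remaining if self.before[e] <= set(told))
            if not ready:
                raise ValueError("precedence relations contain a cycle")
            choice = rng.choice(ready)  # randomness makes each walk differ
            told.append(choice)
            remaining.remove(choice)
        return told


graph = PlotGraph()
graph.add_precedence("crime is reported", "detective arrives")
graph.add_precedence("detective arrives", "detective questions witness")
graph.add_precedence("detective arrives", "detective finds clue")
graph.add_precedence("detective questions witness", "culprit is arrested")
graph.add_precedence("detective finds clue", "culprit is arrested")
print(graph.generate(random.Random()))
```

In this example the two middle events may appear in either order, so repeated runs can yield different but still coherent sequences.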
DALE’S ALGORITHM
Dale takes this a step further, eliminating the
uncertainty of relying on a single crowdsourced database.
Instead, Dale takes full advantage of the latest developments in methods for processing and analysing Big Data,
creating a unique plot graph for each story. Below is a simplified version of Dale’s algorithm, Verne.
CREATION BY NATURAL LANGUAGE GENERATION
The final step is to translate the plot graph into a written story. Dale takes all the parameters the user entered into account, for example the level of language and the favourite
author, and generates a coherent, well-written story
by using natural language generation (NLG). NLG is a
method that lets a computer generate text that reads like
text written by a human (Reiter 2010).
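The simplest form of NLG is template-based realization, sketched below in Python. The sketch assumes that each plot-graph event is available as a (subject, verb, object) triple and that events arrive in story order. The function `realize_story` and the connectives "Then" and "Finally" are illustrative choices, far simpler than what a system like Dale would need to imitate an author's style.

```python
def realize_story(events):
    """Turn ordered (subject, verb, object) triples into a short text."""
    sentences = []
    last = len(events) - 1
    for index, (subject, verb, obj) in enumerate(events):
        clause = f"{subject} {verb} {obj}"
        if index == 0:
            # Opening sentence: no connective, capitalize the first letter.
            sentence = clause[0].upper() + clause[1:]
        elif index == last:
            sentence = "Finally, " + clause
        else:
            sentence = "Then, " + clause
        sentences.append(sentence + ".")
    return " ".join(sentences)


events = [
    ("the detective", "finds", "a clue"),
    ("the detective", "questions", "the witness"),
    ("the detective", "arrests", "the culprit"),
]
print(realize_story(events))
```

User parameters such as the linguistic level could then select between different template sets, for instance shorter clauses for a child reader.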
—— The application’s smartphone interface.
DETERMINATION OF STORY
The first step in Dale’s story generation process is to determine the type of story the user requested, for example
a crime novel set in Stockholm in the year 2025. Dale
processes and analyses patterns in existing stories
by using part-of-speech tagging (POST) to learn how
similar types of stories are structured. Part-of-speech tagging is, simply put, a process by which a computer identifies
the part of speech to which each word in a text corresponds (Schmid 1994). The main objective of Dale’s POST
process is to identify the subject-verb-object sentence
structure. Using sophisticated language
models, these subject-verb-object structures are translated
into actions and events in a plot graph. This results in
many different plot graphs, which Dale
analyses and merges into one final coherent
plot graph. The merge is done with some level of
randomness to ensure that every story is unique.
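The subject-verb-object step can be illustrated with a deliberately naive Python heuristic. The sketch assumes the sentence has already been tagged with Penn Treebank tags (here written by hand, where a real system would call a tagger), and it simply takes the last noun or pronoun before the first verb as the subject and the first noun after it as the object. The function name `extract_svo` is hypothetical, and real sentences would need proper parsing.

```python
def extract_svo(tagged_sentence):
    """Extract one (subject, verb, object) triple from (word, tag) pairs.

    Assumption: tags follow the Penn Treebank convention, where noun tags
    start with "NN", verb tags start with "VB" and "PRP" marks a pronoun.
    Returns None when no complete triple is found.
    """
    subject = verb = obj = None
    for word, tag in tagged_sentence:
        is_nominal = tag.startswith("NN") or tag == "PRP"
        if is_nominal and verb is None:
            subject = word            # keep the last nominal before the verb
        elif is_nominal and obj is None:
            obj = word                # first nominal after the verb
        elif tag.startswith("VB") and verb is None and subject is not None:
            verb = word               # first verb that follows a subject
    if subject and verb and obj:
        return (subject, verb, obj)
    return None


tagged = [("The", "DT"), ("detective", "NN"), ("questioned", "VBD"),
          ("the", "DT"), ("witness", "NN")]
print(extract_svo(tagged))
```

Each extracted triple is a candidate action or event, and triples collected across many existing stories of the same type would form the raw material for the plot graphs described above.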
COLLECTION OF BIG DATA
Dale now knows what kinds of actions and
events the selected story should contain
and starts searching for appropriate data
to fill the plot graph with, using enormous
datasets collected by all kinds of sensors.
For example, if the protagonist is generated to be a 30-year-old woman working as a police officer, then
Dale starts filtering and analysing datasets about women around the age of 30 who work as police officers. In
this way a fictional character is created and the
story now has a protagonist. The same method is used to
generate all the characters, objects and places required by
the story.
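The filtering step in the police officer example might look like the Python sketch below. The record fields (`gender`, `occupation`, `age`, `hobbies`), the age window of three years and the way the matches are summarized (median age, most common hobby) are all assumptions for illustration. The records themselves are made up.

```python
from collections import Counter
from statistics import median


def build_character(records, gender, occupation, age, age_window=3):
    """Summarize all matching records into one fictional character profile."""
    matches = [
        r for r in records
        if r["gender"] == gender
        and r["occupation"] == occupation
        and abs(r["age"] - age) <= age_window
    ]
    if not matches:
        return None
    hobbies = Counter(h for r in matches for h in r["hobbies"])
    return {
        "gender": gender,
        "occupation": occupation,
        "age": median(r["age"] for r in matches),
        # Most frequent hobby among the matches, if any were recorded.
        "hobby": hobbies.most_common(1)[0][0] if hobbies else None,
    }


# Made-up sample records.
records = [
    {"gender": "female", "occupation": "police officer", "age": 29, "hobbies": ["running"]},
    {"gender": "female", "occupation": "police officer", "age": 30, "hobbies": ["running", "chess"]},
    {"gender": "female", "occupation": "police officer", "age": 32, "hobbies": ["sailing"]},
    {"gender": "female", "occupation": "police officer", "age": 45, "hobbies": ["golf"]},
    {"gender": "male", "occupation": "police officer", "age": 30, "hobbies": ["golf"]},
]
print(build_character(records, "female", "police officer", 30))
```

Because the profile is an aggregate of many records, the resulting character is made up rather than a copy of any single person in the data.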
DALE’S USABILITY
Using Dale is easy and intuitive: it is agile, fast and adapted to everyone through its extensive
personalized settings. Dale lets the user save her favourite
stories or settings, to simplify getting the storytelling experience
she wants. The purpose of Dale is not to replace traditional
storytelling in any art form. Its purpose is to provide users with an alternative experience by offering them a
unique way of acquiring fictional stories of any kind, anytime, anywhere. Dale’s primary target group
is women aged 35 or older, but it can be used by anyone of any
age, gender or occupation.
There are 4 simple steps to go through and voilà, a story
is created.
1. Start up the application and choose the mood of the
story you want by using a slider between fairy
tale and fictional realism.
2. Enter a minimum of 2 keywords that you want the
story to be based on. The more keywords you enter,
the less randomized the story will be.
3. Choose if you want the story to be short, medium
or long.
4. Choose if you want to read the story or listen to the
story.
FREEMIUM
- Free of charge
- Branded content & product placement
- Limited personalization settings
PREMIUM
- Monthly subscription
- Free of advertising
- Extensive personalization settings
- Possibility to have a celebrity narrator
—— Logotypes and differentiation of the two subscription models.
After step 4 the story is automatically generated and
available for you in the form you chose. However, one of
Dale’s greatest features is the ability to switch mode in real time between text and audio. If you initially opted to read
the story and then decide to switch to listening mode, Dale
picks up where you are in the story and the narrator voice
starts reading. This is a great feature if your surroundings suddenly change or if you simply don’t want to read anymore.
REVENUE STREAMS
SUBSCRIPTIONS
Dale is available on every device and operating
system and can be acquired through each platform’s
app store (e.g. Google Play or the App Store).
It is used by subscribing to one of
two subscription models, Dale Freemium and
Dale Premium. The Freemium version is free
of charge but is supported by advertising in the
application interface (for instance, product placement in
the story) and has limited personalization settings. The Premium version, on the
other hand, requires a monthly subscription, is free
of advertising and has a more extensive range
of personalization settings than the Freemium
version. Dale Premium also offers the user the
possibility to have a famous person as the narrating voice, as an exclusive perk.
BRANDED CONTENT AND PRODUCT PLACEMENT
Product placement is an important part of Dale’s
revenue streams. The product placement occurs
within the stories by replacing objects with branded counterparts in both Freemium and Premium
modes. The featured brand is decided by analysing
the user’s digital footprint, to further enhance the
storytelling experience. The brands featured buy
advertising space from Dale as in any regular advertising model, and a brand pays only for each time
its product or brand is featured.
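The pay-per-feature model can be sketched in a few lines of Python: generic objects in the story text are replaced with branded counterparts, and each replacement is counted so that the brand can be billed per placement. The function names, the brand names and the price in cents are hypothetical, and the choice of brand (which the text bases on the user's digital footprint) is taken as given here.

```python
import re
from collections import Counter


def place_products(story, placements):
    """Replace generic objects with brands and count each placement.

    placements maps a generic word to its (hypothetical) branded counterpart.
    """
    counts = Counter()
    for generic, branded in placements.items():
        # Whole-word match so that e.g. "phone" does not hit "headphones".
        pattern = re.compile(rf"\b{re.escape(generic)}\b")
        story, replaced = pattern.subn(branded, story)
        if replaced:
            counts[branded] += replaced
    return story, counts


def invoice(counts, price_cents_per_placement):
    """Amount owed per brand, in cents: the brand pays per featured placement."""
    return {brand: n * price_cents_per_placement for brand, n in counts.items()}


story = "She bought a coffee and checked her phone. The coffee was hot."
placements = {"coffee": "Brewly coffee", "phone": "Examplia phone"}
branded_story, counts = place_products(story, placements)
print(branded_story)
print(invoice(counts, 5))
```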
CELEBRITY NARRATION
Celebrity narration is a premium perk that further
enhances the storytelling experience, and
premium users can choose whether or not to
activate it. Premium members can have
a celebrity of their choice
narrate their story for them, or have several
celebrities voice the different
characters in the story. The purpose of Dale’s
celebrity narration is to enhance the storytelling experience and to serve as a
competitive advantage against industry competitors. Bringing celebrities into Dale’s
celebrity narration system energizes both Dale’s
brand and the celebrity’s personal brand,
which can be seen as an additional selling point.
EXAMPLE STORY
— “Hurry, kids, you’re gonna be late!” shouted Elizabeth as she dropped her kids off at the
school entrance. Every morning it was the same
story... She hated that she was such an eternal
time-optimist, always stressed out. But at least
they had made it, barely.
She thought about how much she missed
reading books. It had an obvious calming effect
on her. Suddenly she realized that she still had
30 minutes to kill before her meeting at the
headquarters downtown, so she set course for
the newly opened coffee shop on the road next
to her office building. Then she remembered that one of her junior colleagues, Michael,
had recommended the storytelling app Dale for her smartphone the day before.
While still in the car, she launched the app
and went through the simple start-up sequence.
She chose her age, language and favourite author, and left the rest as default. Hm, my favourite author? Obviously J.K. Rowling, she thought,
giggling to herself. She had been a fan
of the Harry Potter saga ever since she was a
little girl, back at the beginning of the century.
The setup was done. That was unexpectedly
easy, she thought to herself. Back at the start
screen, Elizabeth read that she was supposed
to type in a few keywords. The day before, her
oldest child had asked her about his homework
regarding the Vietnam War. Unintentionally inspired by this, and by her Harry Potter fascination, she settled for “Vietnam movement” and “wizard”
as keywords. OK, moving on. What is this, she thought to
herself, and read out loud:
— “Short - Medium - Long”.
Oh, I understand, this has to be the length of the story.
Because she only had 30 minutes to spare, she chose the
medium option and opted to listen to the story, since she was
still in the car.
The moment Elizabeth hit the listening button, the advanced algorithms in Dale went to work. In the
blink of an eye, a short novel with a wizard protagonist
and a Vietnam movement context was created. The application
gathered information and data from throughout history
connected to her keywords, as well as recent information
like where she had been to dinner or what she had bought
online, since everything is connected nowadays. Not even a
second had passed before a soft narrator voice started to read
the unique fictional story out in her headphones. Wow, this
is amazing, she thought, smiling, as she parked her car
at the café. After she had found a place to sit down, an automatic
notification appeared on her smartphone asking her if she
wanted to continue by reading the story instead of listening.
She tried it out and, to her delight, the application immediately
understood where she was in the story. She finished
the last pages of the story and headed off to work, thinking
that this was definitely an app she would continue to use.
—— Simplified version of our example story’s plot graph.
CONCLUSION
Dale is a fully automatic story generator, able to create a story about
any topic you can think of. So, is this the future? We think it might
be. The idea of computer-generated stories is not new. As we mentioned above, Talespin was an example of a system capable of generating somewhat coherent stories back in 1977. More
recent examples, like the Scheherazade System (Li et al. 2013) from
Georgia Tech, can also create stories.
All of the story generators mentioned above are,
however, dependent on human input to work as
intended. This is where Dale excels, since it is fully automatic thanks
to its advanced Big Data algorithms and ground-breaking use
of the IoT’s great potential.
There is one big assumption that has to be made when creating
solutions based on Big Data, which is the existence of liberal data privacy laws. Our solution, and the future it can exist in, depends on
this liberal development, which implies that current data laws need
to change radically in the coming years for this future to exist.
REFERENCES
IBM, 2013. Infographic: The Four V’s of Big Data. http://www.ibmbigdatahub.com/infographic/four-vs-big-data [Accessed: December 1, 2015]
SAS, 2015. Big Data: What it is and why it matters. http://www.sas.com/en_us/insights/big-data/what-is-big-data.html [Accessed: December 1, 2015]
Gervás, P. et al., 2005. Story plot generation based on CBR. Knowledge-Based Systems, 18(4-5), pp.235–242. Available at: http://www.sciencedirect.com/science/article/pii/S0950705105000407 [Accessed November 29, 2015].
Meehan, J. R. (1977, August). TALE-SPIN, An Interactive Program that Writes Stories. In IJCAI (Vol. 77, pp. 91-98).
Li, B., Lee-Urban, S., Johnston, G., & Riedl, M. (2013, June). Story Generation with Crowdsourced Plot Graphs. In AAAI.
Mateas, M., & Sengers, P. (1999, November). Narrative intelligence. In Proceedings AAAI Fall Symposium on Narrative Intelligence (pp. 1-10).
Schmid, H. (1994, September). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing (Vol. 12, pp. 44-49).
Narrative Science, 2015. https://www.narrativescience.com/filebin/images/pageBlocks/Storytelling_Last_Mile.pdf [Accessed 1 December, 2015]
Kopetz, H., 2011. Internet of Things. In Real-Time Systems Series (pp 307-323).
Reiter, E. (2010). Natural language generation. The Handbook of Computational Linguistics and Natural Language Processing, 574-598.