BIG DATA STORYTELLING
ALEXANDER AROZIN, AYESHA AHSAN, DANIEL LINDSTRÖM, SAMUEL LINDBERG

In a digital galaxy far, far away. Seventy years ago, scientists started attempting to quantify the growth rate of data volumes, a phenomenon that was then referred to as the Information Explosion and is today known as Big Data. The definition of Big Data has been widely discussed by both IBM (2013) and SAS (2015), pioneers in the industry that both define Big Data in terms of:

Volume, the scale of data and how it increases exponentially over time.
Velocity, the analysis of streaming data and their interconnection.
Variety, the different forms of data and their origin.

IBM includes an additional fourth term, veracity, which describes the uncertainty of data and the importance of trust when working with it. SAS includes an additional fifth and sixth term, variability and complexity, which describe the variation and inconsistency of the data flow and how to tie different data sources together.

We assume that everything that can be connected will be connected in the future, which will produce massive amounts of unstructured data. Consequently, Big Data will become the digital equivalent of India’s garbage mountains: if it is not structured and analysed in a productive manner, it will just be digital waste. The factors above provide an understanding of the Big Data concept, so we ask ourselves not only how Big Data can be analysed and structured, but also how it can be used. How can Big Data, in combination with the Internet of Things (IoT) (Kopetz, 2011), improve business, leisure and people’s future daily lives by enhancing storytelling?

DALE: STORYTELLING ENHANCED BY BIG DATA

Figure: Dale’s logotype.

DALE

If storytelling is the next step for Big Data in business (Narrative Science, 2015), then Big Data can be a big actor in storytelling.
Introducing Dale, a storytelling application that uses Big Data to create stories tailored to your expectations. Dale analyses input from the user, such as age, length of story, genre and favourite author. Dale uses this input to define keywords and important parameters so that it can collect relevant information from various Big Data sources. Dale then creates the story you want, at the moment you want it.

WHAT ARE THE COMPONENTS OF DALE?

Dale consists of four major components that each perform a task in the process of creating a story based on the user’s input.

TURING, THE ANALYST

Each parameter has a set of key factors that Dale uses to create a story. For example, the length determines whether the story will be a short story or a novel; the genre decides the level of art form in the story and supplies the genre characteristics; the age determines the linguistic level of the story, that is, whether it is aimed at a child or a grown-up; and the favourite author decides the manner in which the story is written, so that it imitates that author’s particular style. All these input parameters are used to create a framework for the writing process as well as the information-gathering process.

In the future, every digital device and database will be connected through the IoT, which makes it easy for Dale to collect and analyse the Big Data it needs. Dale uses the results from the analysis of the user’s input to find relevant information in the cloud, by cross-referencing the user’s input with literature databases, online libraries, social media sites and other web sources. The information found is the foundation of the story Dale ultimately creates.
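As a rough illustration of the analysis step described above, the mapping from user input to a writing framework could be sketched as follows. This is only a sketch under stated assumptions: the names `UserInput`, `StoryFramework` and `analyse`, the "medium" length, the "novella" form and the age threshold of 18 are hypothetical and do not come from Dale's design.

```python
from dataclasses import dataclass

# Hypothetical mapping; the text only names "short story" and "novel".
FORM_BY_LENGTH = {"short": "short story", "medium": "novella", "long": "novel"}
ADULT_AGE = 18  # assumed threshold between child and grown-up language


@dataclass(frozen=True)
class UserInput:
    age: int
    length: str            # "short", "medium" or "long"
    genre: str
    favourite_author: str


@dataclass(frozen=True)
class StoryFramework:
    form: str              # short story, novella or novel
    linguistic_level: str  # "child" or "grown-up"
    genre: str
    style_of: str          # author whose manner of writing is imitated
    search_keywords: tuple  # starting point for the information gathering


def analyse(user: UserInput) -> StoryFramework:
    """Turn the raw user input into a framework for writing and searching."""
    if user.length not in FORM_BY_LENGTH:
        raise ValueError(f"unknown length: {user.length!r}")
    level = "grown-up" if user.age >= ADULT_AGE else "child"
    keywords = (user.genre.lower(), user.favourite_author.lower())
    return StoryFramework(
        form=FORM_BY_LENGTH[user.length],
        linguistic_level=level,
        genre=user.genre,
        style_of=user.favourite_author,
        search_keywords=keywords,
    )
```

Calling `analyse(UserInput(age=8, length="short", genre="Fantasy", favourite_author="J.K. Rowling"))` would yield a framework for a short story at a child's linguistic level. The resulting keywords are what a search component would then cross-reference against external sources.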
BOND, THE AGENT

Dale has to search for information in the cloud, since almost no information except personalized data will be available locally: the valuable data is stored at the respective source sites. It would also be too expensive to store non-local data locally because of its sheer size. This approach frees resources that would otherwise be spent on data storage and server maintenance, so that they can be reallocated to more important areas of Dale, for example improving network stability and computing power, or optimizing the search and analysis processes.

CERBERUS, THE GATEKEEPER

The data Dale finds in the cloud is structured by passing it through a framework called Cerberus. Cerberus acts as a gatekeeper before the actual creation of the story. It not only structures the data provided but also analyses factors that establish whether the source is legitimate. The factors analysed are dependence, authenticity, tendency, feasibility and credibility, which are important to review when criticising sources. All data Dale collects is passed through Cerberus to make sure that it is legitimate and to further ensure the quality of the story created.

Figure labels: Collected data, Verne, Cerberus, Bond, Turing, User data, Keywords, Big Data Cloud, User.

VERNE, THE CREATOR

Once Cerberus has accepted or rejected the data, the accepted data continues to the part of Dale that creates the story: Verne, an algorithm (explained in the technology section below) based on the theory of narrative intelligence, Big Data and natural language generation. With the help of the data collected from the cloud, Verne writes a story. The story is based on the user’s input, which has been refined and analysed to provide an optimal storytelling experience: stories tailored to the user and his or her wants, in any given situation.

TECHNOLOGY

How does it work? How can Dale produce such great stories in the blink of an eye?
The answer is an algorithm that combines the latest advances in narrative intelligence, Big Data and natural language generation (NLG). Story generation is one part of narrative intelligence, and the part that is especially interesting for this project.

There have been several attempts in the past to create systems capable of generating stories. Some have been quite successful, such as the vintage Tale-Spin (Meehan, 1977) and the more recent ProtoPropp (Gervás et al. 2005). Both systems are built on artificial intelligence methods such as planning or case-based reasoning. A big drawback of either method is that story generation relies heavily on a domain model that is known a priori. In essence, a human has to enter a description of a fictional world, including characters, objects, places and the actions that entities can perform to change the world, before the system can generate stories (Li et al. 2013). This works, but would it not be really nice if a system could generate great stories on any topic you want?

Figure: The process of how the user’s input becomes a story.

Dale is able to do that with a method called open story generation, which is another part of narrative intelligence (Li et al. 2013). Dale uses plot graphs to construct interesting and coherent narratives. A plot graph is a set of actions and events together with the connections and relations between them. In previous work on open story generation and plot graphs, the actions and events were acquired from a crowdsourced database. This makes it possible to generate a story on any topic, as long as someone has contributed a set of actions and events for that topic to the database.

DALE’S ALGORITHM

Dale takes it to the next level, completely eliminating the uncertainty of relying on only one crowdsourced database.
Instead, Dale takes full advantage of the latest developments in methods for processing and analysing Big Data, creating a unique plot graph for each story. Below is a simplified version of Dale’s algorithm, Verne.

CREATION BY NATURAL LANGUAGE GENERATION

Once the plot graph has been filled with data, it is time to translate it into a written story. Dale takes into account all the parameters the user entered, for example level of language and favourite author, and generates a coherent, well-written story by using natural language generation (NLG). NLG is a method by which a computer generates text that reads like text written by a human (Reiter 2010).

Figure: The application’s smartphone interface.

DETERMINATION OF STORY

The first step in Dale’s story generation process is to determine the type of story the user requested, for example a crime novel set in Stockholm in the year 2025. Dale processes and analyses patterns in existing stories by using part-of-speech tagging (POST) to learn how similar types of stories are structured. Part-of-speech tagging is, simply put, a process by which a computer identifies which part of speech each word in a text corresponds to (Schmid 1994). The main objective of Dale’s POST process is to identify the subject-verb-object sentence structure. Using language models, these subject-verb-object structures are translated into actions and events in a plot graph. This results in many different plot graphs, which Dale analyses and merges into one final coherent plot graph. The merge is done with some level of randomness to ensure that every story is unique.
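The determination step can be illustrated with a deliberately small sketch. It assumes the sentences have already been part-of-speech tagged with Penn Treebank style tags, reduces a plot graph to plain "event A precedes event B" edges, and uses a made-up merge rule (keep edges seen in several stories, keep rare edges at random). The function names and the `keep_rare` parameter are hypothetical, and the language models the text mentions are not modelled here.

```python
import random
from collections import defaultdict


def extract_svo(tagged_sentence):
    """Return a naive (subject, verb, object) triple, or None if none is found.

    Assumes Penn Treebank style tags: nouns start with "NN", verbs with "VB".
    """
    subject = verb = None
    for word, tag in tagged_sentence:
        if verb is None:
            if subject is None and tag.startswith("NN"):
                subject = word
            elif subject is not None and tag.startswith("VB"):
                verb = word
        elif tag.startswith("NN"):
            return (subject, verb, word)
    return None


def build_plot_graph(story):
    """Map each event (an SVO triple) to the set of events that follow it."""
    events = [e for e in map(extract_svo, story) if e is not None]
    graph = defaultdict(set)
    for before, after in zip(events, events[1:]):
        graph[before].add(after)
    return dict(graph)


def merge_plot_graphs(graphs, rng, keep_rare=0.5):
    """Merge several plot graphs into one.

    Edges found in more than one graph are always kept; an edge found in only
    one graph is kept with probability keep_rare, which adds the randomness.
    """
    counts = defaultdict(int)
    for graph in graphs:
        for before, successors in graph.items():
            for after in successors:
                counts[(before, after)] += 1
    merged = defaultdict(set)
    for (before, after), seen in sorted(counts.items()):
        if seen > 1 or rng.random() < keep_rare:
            merged[before].add(after)
    return dict(merged)
```

Passing a seeded generator such as `random.Random(42)` as `rng` makes a merge reproducible, while an unseeded `random.Random()` gives a different graph on each run. The plot graphs of Li et al. (2013) carry more structure than ordering alone, so this should be read as a toy model of the idea.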
COLLECTION OF BIG DATA

Dale now knows what kinds of actions and events the selected story should contain and starts searching for appropriate data to fill the plot graph with, using enormous datasets collected by all kinds of sensors. For example, if the protagonist is generated to be a 30-year-old woman working as a police officer, Dale starts filtering and analysing datasets from all women around the age of 30 who work as police officers. In this way a fictional character is created and the story has a protagonist. The same method is used to generate all the characters, objects and places the story requires.

DALE’S USABILITY

Using Dale is easy and intuitive: it is agile, fast and adapts to everyone through its extensive personalized settings. Dale lets the user save her favourite stories or settings to simplify the storytelling interaction she wants. The purpose of Dale is not to replace traditional storytelling in any art form. Its purpose is to provide users with an alternative experience by offering them a unique way of acquiring fictional stories of any kind, anytime, anywhere. Dale’s primary target group is women aged 35 or older, but it can be used by anyone of any age, gender or occupation.

Figure panel, FREEMIUM: free of charge; branded content and product placement; limited personalization settings.

There are four simple steps to go through and voilà, a story is created.

1. Start the application and choose the mood of the story you want by using a slider between fairy tale and fictional realism.
2. Enter a minimum of two keywords that you want the story to be based on. The more keywords you enter, the less randomized the story will be.
3. Choose whether you want the story to be short, medium or long.
4. Choose whether you want to read the story or listen to it.
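The four steps above amount to a small, validated request. The sketch below shows one way this could be represented. The class name `StoryRequest`, the 0.0 to 1.0 mood scale and the `randomness` formula are assumptions made for illustration; the text only says that more keywords mean a less randomized story.

```python
from dataclasses import dataclass

LENGTHS = ("short", "medium", "long")
MODES = ("read", "listen")


@dataclass(frozen=True)
class StoryRequest:
    mood: float      # step 1: 0.0 = fairy tale, 1.0 = fictional realism (assumed scale)
    keywords: tuple  # step 2: at least two keywords
    length: str      # step 3: "short", "medium" or "long"
    mode: str        # step 4: "read" or "listen"

    def __post_init__(self):
        if not 0.0 <= self.mood <= 1.0:
            raise ValueError("mood must lie between 0.0 and 1.0")
        if len(self.keywords) < 2:
            raise ValueError("enter a minimum of two keywords")
        if self.length not in LENGTHS:
            raise ValueError(f"length must be one of {LENGTHS}")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")

    @property
    def randomness(self) -> float:
        """Hypothetical measure: 1.0 at two keywords, shrinking as more are added."""
        return 2 / len(self.keywords)
```

A request such as `StoryRequest(mood=0.2, keywords=("Vietnam movement", "wizard"), length="medium", mode="listen")` passes validation, whereas a request with a single keyword raises a `ValueError`.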
After step 4 the story is automatically generated and available in the form you chose. One of Dale’s greatest features, however, is the ability to switch mode in real time between text and audio. If you initially opted to read the story and then decide to switch to listening mode, Dale picks up where you are in the story and the narrator voice starts reading. This is a great feature if your surroundings suddenly change or if you simply don’t want to read anymore.

Figure panel, PREMIUM: monthly subscription; free of advertising; extensive personalization settings; possibility to have a celebrity narrator.

Figure: Logotypes and differentiation of the two subscription models.

REVENUE STREAMS

SUBSCRIPTIONS

Dale is available on every device and operating system and can be acquired through the respective market services (e.g. Google Play or the App Store). It is used by subscribing to one of two subscription models, Dale Freemium and Dale Premium. The Freemium version is free of charge but is supported by advertising in the application interface (for instance, product placement in the story) and has limited personalization settings. The Premium version, on the other hand, is a monthly subscription that is free of advertising and offers more extensive personalization settings than the Freemium version. Dale Premium also offers the user the possibility to have a famous person as the narrating voice, as an exclusive perk.

BRANDED CONTENT AND PRODUCT PLACEMENT

Product placement is an important part of Dale’s revenue streams. The product placement occurs within the stories by replacing objects with branded counterparts in both Freemium and Premium modes. The featured brand is decided by analysing the user’s digital footprints, to further enhance the storytelling experience.
The featured brands buy advertising space from Dale as in any regular advertising model, and a brand pays only for each time its product or brand is featured.

CELEBRITY NARRATION

Celebrity narration is a Premium perk that further enhances the storytelling experience, and Premium users can choose whether or not to activate it. Premium members can have a celebrity of their choice narrate their story, or have several celebrities voice the different characters in the story. The purpose of Dale’s celebrity narration is, as mentioned, to enhance the storytelling experience and to serve as a competitive advantage against industry competitors. Bringing celebrities into Dale’s celebrity narration system energizes Dale’s brand as well as the celebrity’s personal brand, which can be seen as an additional selling point.

EXAMPLE STORY

“Hurry, kids, you’re gonna be late!” shouted Elizabeth as she dropped her kids off at the school entrance. Every morning it was the same story... She hated that she was such an eternal time-optimist, always stressed out. But at least they had made it, if only barely. She thought about how much she missed reading books. It had an obvious calming effect on her. Suddenly she realized that she still had 30 minutes to spare before her meeting at the headquarters downtown, so she set course for the newly opened coffee shop on the street next to her office building. Then she remembered that one of her junior colleagues, Michael, had recommended the storytelling app Dale for her smartphone the day before. While still in the car, she launched the app and went through the simple start-up sequence. She chose her age, language and favourite author, and left the rest as default. Hm, my favourite author? Obviously J.K. Rowling, she thought, giggling to herself. She had been a fan of the Harry Potter saga ever since she was a little girl, back at the beginning of the century.
The setup was done. That was unexpectedly easy, she thought to herself. Back at the start screen, Elizabeth read that she was supposed to type in a few keywords. The day before, her oldest child had asked her about his homework on the Vietnam War. Unintentionally inspired by this, and by her Harry Potter fascination, she settled for “Vietnam movement” and “wizard” as keywords. OK, moving on. What is this, she thought to herself, and read out loud: “Short - Medium - Long”. Oh, I understand, this has to be the length of the story. Because she only had 30 minutes to spare, she chose the medium option and opted to listen to the story, since she was still in the car.

The moment Elizabeth hit the listening button, the advanced algorithms in Dale went to work. In the blink of an eye, a short novel with a wizard protagonist and a Vietnam movement context was created. The application gathered historical information and data connected to her keywords, as well as recent information such as where she had been to dinner and what she had bought online, since everything is connected nowadays. Not even a second had passed before a soft narrator voice started to read the unique fictional story in her headphones. Wow, this is amazing, she thought, smiling, as she parked her car at the café.

After she had found a place to sit down, a notification appeared on her smartphone asking whether she wanted to continue by reading the story instead of listening. She tried it out and, to her delight, the application knew exactly where she was in the story. She finished the last pages of the story and headed off to work, thinking that this was definitely an app she would continue to use.
Figure: Simplified version of our example story’s plot graph.

CONCLUSION

Dale is a fully automatic story generator, able to create a story about any topic you can think of. So, is this the future? We think it might be. The idea of computer-generated stories is not new: as mentioned above, Tale-Spin was an example of a system capable of generating somewhat coherent stories back in 1977. More recent examples, such as the Scheherazade system (Li et al. 2013) from Georgia Tech, are also able to create stories. All of the story generators mentioned above, however, depend on human input to work as intended. This is where Dale excels: it is fully automatic thanks to its advanced Big Data algorithms and its ground-breaking use of the IoT’s great potential.

There is one big assumption that has to be made when creating solutions based on Big Data, namely the existence of liberal data privacy laws. Our solution, and the future it can exist in, depends on this liberal development, which implies that current data laws need to change radically in the coming years for this future to exist.

REFERENCES

IBM, 2013. Infographic: The Four V’s of Big Data. http://www.ibmbigdatahub.com/infographic/four-vs-big-data [Accessed: December 1, 2015]
SAS, 2015. Big Data: What it is and why it matters. http://www.sas.com/en_us/insights/big-data/what-is-big-data.html [Accessed: December 1, 2015]
Gervás, P. et al., 2005. Story plot generation based on CBR. Knowledge-Based Systems, 18(4-5), pp. 235–242. Available at: http://www.sciencedirect.com/science/article/pii/S0950705105000407 [Accessed: November 29, 2015]
Meehan, J. R., 1977. TALE-SPIN, An Interactive Program that Writes Stories. In IJCAI (Vol. 77, pp. 91-98).
Li, B., Lee-Urban, S., Johnston, G. & Riedl, M., 2013. Story Generation with Crowdsourced Plot Graphs. In AAAI.
Mateas, M. & Sengers, P., 1999. Narrative intelligence. In Proceedings AAAI Fall Symposium on Narrative Intelligence (pp. 1-10).
Schmid, H., 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing (Vol. 12, pp. 44-49).
Narrative Science, 2015. https://www.narrativescience.com/filebin/images/pageBlocks/Storytelling_Last_Mile.pdf [Accessed: December 1, 2015]
Kopetz, H., 2011. Internet of Things. In Real-Time Systems Series (pp. 307-323).
Reiter, E., 2010. Natural language generation. The Handbook of Computational Linguistics and Natural Language Processing, pp. 574-598.