paper - Business Process Technology group
Master’s Thesis
Business Process Mashups: An Analysis of Mashups and their Value Proposition for Business Process Management
Matthias Kunze
[email protected]
March 31, 2009
Supervisors: Prof. Dr. Mathias Weske, MSc. Hagen Overdick
Hasso Plattner Institute, Potsdam (Germany)
Abstract
Mashups are an exciting new genre of Web applications that has recently gained considerable momentum. While thousands of mashup applications exist and many software vendors promote “Enterprise Mashup” suites, the term “mashup” itself lacks a concise definition, and the usefulness of mashups in specific fields of operation remains unclear. In this work, 29 mashups and mashup tools were studied in a qualitative survey that revealed two general types and eight common characteristics of mashups. This survey forms the basis of a general mashup pattern that specifies mashups by their characteristic behavior of aggregating capabilities on the Web. A reference model complying with this pattern provides the means to understand existing mashups and to design new ones on a conceptual level. In a further step, this knowledge was used to examine the suitability of mashups for business process management. Specific application scenarios were analyzed comprehensively on the basis of the business process life cycle, revealing significant potential for aggregating fragmented process knowledge and for providing lightweight process implementations.
Summary
Mashup denotes a novel genre of Web applications that has attracted considerable attention in recent years. Although thousands of mashups now exist and software vendors even offer “Enterprise Mashups” as standalone solutions, the term “mashup” itself is not uniformly defined, and the benefit of mashups in different fields of application has not been investigated. In this work, 29 mashups and tools for creating mashups were subjected to a qualitative study, which yielded two basic types and eight typical characteristics of mashups. The study forms the basis of a general pattern that specifies mashups by their typical behavior, which results from the aggregation of services available on the Web. A reference model developed on the basis of this pattern makes it possible to understand existing mashups and to develop new ones. Furthermore, the insights gained were applied to examine the suitability of mashups for business process management. An analysis of various application scenarios, developed on the basis of the business process life cycle, showed that mashups offer considerable potential for aggregating fragmented process knowledge and for implementing lightweight processes.
Acknowledgements
The research work presented in this thesis was carried out from November 2008 to March 2009 at the Hasso Plattner Institute at the University of Potsdam. My appreciation goes to Prof. Dr. Mathias Weske for offering me the opportunity to conduct this work under his supervision, as well as to my colleagues at the Business Process Technology group for their ongoing advice and encouragement. I especially thank Hagen Overdick and Gero Decker for many inspiring discussions. Further, I want to thank the members of the Oryx team, who supported me with valuable feedback on my work, as well as Franziska Häger, Jennifer Baldwin, Martin Czuchra, Tilman Giese, and Tobias Vogel for their comments and proofreading.
I, Matthias Kunze, hereby declare that I have written this thesis on my own, that I have declared all sources, and that I have marked all citations appropriately.
March 31, 2009
Contents
1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Thesis Goals and Outline
2 Preliminaries
2.1 Business Process Management
2.2 Mashups and the Evolution of the Web
2.2.1 Web 1.0
2.2.2 Web 2.0
2.2.3 Situational Applications
2.2.4 Mashups
2.3 Remarks
2.3.1 Content Syndication
2.3.2 Representational State Transfer
2.3.3 Same Origin Policy
3 Survey of Mashups and Mashup Tools
3.1 Selection of Samples
3.2 Classification Model
3.3 Synthesis of Survey Results
3.4 Types of Mashups
3.4.1 Organic Mashups
3.4.2 Dashboards
3.5 Common Characteristics of Mashups
3.5.1 User Centric
3.5.2 Small Scale
3.5.3 Open Standards
3.5.4 Software as a Service
3.5.5 Short Time to Market
3.5.6 Aggregation of Heterogeneous Content
3.5.7 Data Centric
3.5.8 Lack of Governance
4 Anatomy of a Mashup
4.1 The Mashup Ecosystem
4.2 Capabilities—Essential Mashup Enablers
4.3 The Mashup Pattern
4.3.1 Ingestion
4.3.2 Augmentation
4.3.3 Publication
4.4 The Mashup Reference Model
4.4.1 Reference Model Components
4.4.2 Organic Mashups and Dashboards
5 Application of Mashups for Business Process Management
5.1 Goals of Business Process Management
5.2 Business Process Life Cycle
5.3 Value Proposition of Mashups for Business Process Management
5.3.1 Design and Analysis
5.3.2 Configuration
5.3.3 Enactment
5.3.4 Evaluation
5.4 Assessment
6 Enabling Collaborative Process Design with Mashups
6.1 Analysis
6.1.1 Process Model from Oryx
6.1.2 Issues from an Issue Tracking System
6.1.3 Documentation from a Wiki
6.2 Design
6.2.1 Mashup Platform
6.2.2 Mashup Architecture
6.3 Realization
7 Conclusion and Outlook
7.1 The Long Tail
7.2 Conclusion
7.3 Future Work
7.3.1 Governance in Mashups
7.3.2 Schema Standardization and Semantic Web
7.3.3 Business Process Management
References
List of Figures
1 HousingMaps
2 Yahoo! Pipes
3 PageFlakes
4 Growth of Mashups
5 Mashup Categories
6 General Architecture of an Organic Mashup
7 General Architecture of a Dashboard
8 The Mashup Ecosystem
9 Overview of the Mashup Pattern
10 End-to-End Mashup Workflow
11 Mashup Reference Model
12 Organic Mashups and Dashboards according to the Mashup Reference Model
13 Life Cycle of a Business Process
14 Usage of Process Model and Process Knowledge throughout the Process Life Cycle
15 Comparison of Business Process and Mashup
16 Example of a Dashboard that Supports Human Activity
17 Architecture of the Mashup Platform
18 Architecture of the Mashup Prototype
19 Demo of the Mashup Prototype
20 Long Tail in the Spectrum of Software Systems
Graphics used in this thesis for explanatory and illustrative purposes may be drawn in notational languages that carry defined semantics. Such diagrams are labeled with the corresponding abbreviations, as follows:
• Business Process Modeling Notation (BPMN) [OMG, 2008]
• Unified Modeling Language (UML) [OMG, 2005]
• Fundamental Modeling Concepts (FMC) [Knöpfel et al., 2006]
1 Introduction
As David Berlind, Executive Editor of the online magazine ZDNet, aptly stated, “mashups are the fastest growing ecosystem on the Web”¹. By this he means that mashups are an exciting new genre of Web applications, “one of the more powerful capabilities coming out of the Web 2.0 wave” [Phifer, 2008]. “Mashup” denotes a new style of composite application built from a set of capabilities, that is, skills or knowledge obtained from disparate sources on the Web. Mashups draw upon these capabilities and create value by providing immediate solutions to transient needs and by generating insight through connecting related information. The capabilities referred to here are generally resources provided online, such as Web sites; thus, mashups are sometimes regarded as Web page aggregators.
The term mashup originates from the pop music scene of the 1990s, when artists blended two or more existing tracks, usually of different genres. Just like these music mashups, which were most prominent in the hip hop scene, Web mashups combine capabilities of different types and sources in new, unanticipated, and innovative ways, using rather simple approaches and techniques for aggregation. Generally, mashups can be built rapidly by moderately skilled developers using scripting languages such as PHP, Ruby, or JavaScript. Mashup tools even enable end users to create mashups: such tools simplify composing an application to the point of visually connecting prefabricated software components that map to external capabilities via drag and drop.
To illustrate the broad variety of mashups, three typical representatives are introduced in the following.
HousingMaps.
The most popular type of mashup is probably the mapping mashup, which visualizes any information related to a geographic position as a location on a map. This allows immediate apprehension of distances and distribution patterns. HousingMaps, depicted in Figure 1, is a typical example of a manually developed mashup and one of the precursors of mashups; it leverages Google Maps to display real estate offers from Craigslist². HousingMaps allows users to search for houses, apartments, and rooms within a specific region, filtering offers by type, price, and other characteristics. To achieve this, it retrieves offers from Craigslist, sorts and filters them, extracts location information, and uses the Google Maps API to show them on a map. At the time of the first release of HousingMaps, neither Craigslist nor Google offered their services for reuse or recombination. In fact, the developer of HousingMaps, Paul Rademacher, was hired by Google to develop the Google Maps API after he had hacked Google Maps to create his mashup.
Figure 1: HousingMaps. A manually built mashup.
¹ http://news.zdnet.com/2422-13569_22-152729.html
² http://craigslist.org
Yahoo! Pipes. Another famous player in the mashup world is Yahoo! Pipes, which provides the ability to consume data from all over the Internet and to reformat and aggregate information to create insight. It is not a mashup itself, but rather a Web application that allows users to create information aggregation mashups through drag and drop, as shown in Figure 2. It uses the pipes-and-filters pattern [Hohpe and Woolf, 2003], in which filters are data operations connected through pipes, i.e., data channels. The simplicity of its user interface, together with a set of predefined operations, enables people with only a basic understanding of feeds and HTML to aggregate information sourced from many places on the Internet. The resulting mashup is deployed entirely on Yahoo!’s infrastructure, hosted and executed on its servers.
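The pipes-and-filters composition that Yahoo! Pipes exposes visually, and the retrieve, filter, sort, and map steps described for HousingMaps, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of either tool: the listing records are invented in-memory samples, no Craigslist, Google Maps, or Yahoo! Pipes interface is called, and the names `Listing`, `pipe`, `fetch_listings`, `filter_by_price`, `sort_by_price`, and `to_map_markers` are hypothetical. Each filter maps a stream of items to a new stream, and a pipe simply chains filters in order.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterable, Iterator

# A filter turns a stream of items into a new stream; a pipe connects filters.
Filter = Callable[[Iterable], Iterable]


@dataclass(frozen=True)
class Listing:
    """Hypothetical real estate offer, standing in for one classified ad."""
    title: str
    price: int  # monthly rent; currency left unspecified
    lat: float
    lon: float


def pipe(*filters: Filter) -> Filter:
    """Chain filters so that each filter's output becomes the next one's input."""
    return lambda items: reduce(lambda stream, f: f(stream), filters, items)


def fetch_listings() -> Iterator[Listing]:
    """Source stub: a real mashup would retrieve and parse a remote feed here."""
    yield Listing("Loft near park", 1800, 37.77, -122.42)
    yield Listing("Studio downtown", 1200, 37.78, -122.41)
    yield Listing("Room in shared flat", 700, 37.76, -122.43)


def filter_by_price(max_price: int) -> Filter:
    """Parameterized filter: keep only offers at or below max_price."""
    return lambda items: (offer for offer in items if offer.price <= max_price)


def sort_by_price(items: Iterable[Listing]) -> list[Listing]:
    """Filter that orders offers from cheapest to most expensive."""
    return sorted(items, key=lambda offer: offer.price)


def to_map_markers(items: Iterable[Listing]) -> list[dict]:
    """Sink: emit marker records that a map widget could render."""
    return [
        {"label": f"{offer.title} ({offer.price})", "position": (offer.lat, offer.lon)}
        for offer in items
    ]


housing_pipe = pipe(filter_by_price(1500), sort_by_price, to_map_markers)

if __name__ == "__main__":
    for marker in housing_pipe(fetch_listings()):
        print(marker)
```

Running the script prints two markers, the room (700) followed by the studio (1200); the loft is filtered out. Because every filter depends only on the shape of the items it receives, a new operation can be inserted into the chain without changing the existing ones, which is what makes composition by drag and drop feasible in such tools.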
The mashup’s result is delivered to the user in different formats, such as HTML lists or RSS feeds. Yahoo! Pipes also allows users to display information items on a map. Microsoft Popfly is another mashup tool that provides similar functionality.
Figure 2: Yahoo! Pipes. Mashup definition with pipes and filters.
PageFlakes. The third example shows a completely different flavor of mashup compared to the two outlined above. PageFlakes, illustrated in Figure 3, is an application that lets users aggregate information by composing modules, so-called flakes. A flake represents specific content obtained from a source on the Internet, such as a calendar, notes, a weather forecast, bookmarks, or RSS feeds. The collection of flakes on a page provides an at-a-glance view of the required information, similar to the dashboard of an automobile; for this reason, PageFlakes is often used as a start page. Such dashboards are a hybrid of mashup and mashup tool: while the actual dashboard instance is a mashup, it also lets users adapt it to their personal needs by offering a set of flakes with which to add content. These kinds of mashups enable virtually anybody to assemble a set of relevant information items and have become extremely popular in recent years, as the diversity of available dashboard mashups shows. Similar mashups are iGoogle and NetVibes.
Figure 3: PageFlakes. A typical dashboard, aggregating several sources with the help of widgets.
1.1 Motivation
Information that is critical for performing tasks and making decisions is often spread across several software systems and transferred inefficiently between those who can provide it and those who need to consume it. Users often resort to ineffective means, such as copying and pasting information, packaging it in spreadsheets or documents, and sending it via email. These processes are manual, error prone, and not scalable. Solutions that meet such needs typically entail custom development.
Due to the immediate and potentially short-lived nature of the need, such development is often very expensive and unprofitable. The capacity of mashups to deliver information from its sources to its consumers, to reformat it, and to aggregate data, creating insight by connecting related data, promises high value within the operations of organizations. Thus, mashups are predicted to have a significant impact on the future IT landscapes of companies, leveraging existing systems to serve the immediate needs of end users through the simple and rapid creation of ad hoc applications. At the time of writing, a widely agreed, crisp definition of mashups does not exist. While this is a considerable drawback when creating solutions based on established standards or best practices, it also opens the field for innovative, as yet unanticipated solutions.
Business process management is an established discipline that joins information technology and business economics at the level of the internal, revenue-creating processes of an organization. The participants involved in business process management are manifold, ranging from strategic positions such as CIO or CTO to developers who implement processes and participants who interact with running process instances. The central knowledge assets of business process management are business process models. They serve as a shared agreement across the different phases of carrying out processes in an organization. Information beyond process models is often kept informally, suffering from the same fragmentation and inefficient communication described above. Bringing together the information that emerges within business process management and the process model with the help of mashups promises a set of key benefits to all participants. Among these benefits is immediate and simple access to information.
Instead of users searching through a multitude of systems and places every time relevant knowledge is required, mashups can aggregate such capabilities and provide fast and efficient access, connecting related pieces of information with each other and with the process model. End users, such as knowledge workers, are empowered to create their own mashups from resources internal and external to the organization to support their activities and decision making.
1.2 Related Work
While mashups have become quite popular in recent years, there is only limited literature that attempts to comprehend mashups and their concepts. The majority of research articles are contributions from software vendors and focus solely on technical questions. An independent and thorough analysis of mashups is still missing. In the past, mashups were only infrequently represented at conferences such as WWW, SIGMOD, ICSOC, and OOPSLA, but they seem to be gaining momentum, e.g., at cutting-edge events such as Mashup Camp³ and Composable Web⁴. Many software vendors have approached the topic by providing their own frameworks or solutions, for example IBM [Keukelaere et al., 2008, Riabov et al., 2008, Simmen et al., 2008], SAP [Gurram et al., 2008, Janiesch et al., 2008], and Microsoft [Isaacs and Manolescu, 2008, Jackson and Wang, 2007]. Furthermore, much discussion about mashups takes place in blogs, along with general discussions about Web 2.0 and Web-oriented Architecture, which seems to be emerging as a new paradigm of service distribution on the Web. Dion Hinchcliffe is a well-known author at ZDNet⁵ who frequently discusses mashups in the context of Web 2.0 and business. The full extent of mashups does not become visible unless one takes these discussions into account, as well as the sheer number of mashups and mashup tools already on offer.
³ http://www.mashupcamp.com/
⁴ http://mashart.org/composableweb2009/
A great resource for getting an impression of the number of existing mashups is ProgrammableWeb⁶. Some attempts have been made to formulate reference models or reference architectures for mashups. However, these works limit their scope to particular aspects of mashups instead of drawing a holistic picture and are thus of limited general usefulness. [Hinchcliffe, 2006] and [Yu et al., 2008] regard mashups as an aggregation of components of one of the following types: data, functionality, or presentation. While this seemed promising, it turned out during the work on this thesis that distinguishing these component types is not feasible for resources and representations on the Web: representations, i.e., hypertext documents, are generally a hybrid of structured data, functionality, and presentation directives. This is discussed in detail in Section 4.2. [Bradley, 2007] claims to provide a reference architecture for enterprise mashups. Unfortunately, this architecture is based on a business view rather than on software engineering principles and suffers from a loss of generality. A similar approach is taken by [López et al., 2008], who neglect the fact that mashups also aggregate functionality. A promising and academically well-founded proposal is provided by [Abiteboul et al., 2008], who develop a generic but highly formal model for mashups. Its practical use is questionable, since mashups emerge from rather pragmatic communities that are less likely to adopt complex, if scientifically well-founded, models. The present work draws upon these models, adopting their strengths while remaining general and simple enough to cover the whole extent of mashups rather than focusing only on particular aspects. Business process management has been a well-established discipline for more than two decades; its terms and concepts are well understood and consolidated.
Most statements about business process management in this thesis are based on lectures of the Business Process Technology group at the Hasso Plattner Institute, under whose supervision this work was conducted. Additional knowledge is mainly drawn from [Weske, 2007] and [van der Aalst et al., 2003, van der Aalst and Weske, 2005]. To my knowledge, no advanced literature yet examines the value proposition of mashups for business process management. [Casati, 2007] is a position paper that introduces the idea of combining the two topics, but it only identifies potential deficiencies of business process management and suggests looking to mashups to solve them. No follow-up work on this position paper was found. Some companies, e.g., The Process Factory⁷ and Serena⁸, also address business processes with mashups, yet they do not provide scientific work for review. Consequently, this thesis provides introductory work examining the value proposition of mashups for business process management.
⁵ http://blogs.zdnet.com/Hinchcliffe
⁶ http://www.programmableweb.com
⁷ http://www.theprocessfactory.com/
⁸ http://www.serena.com/mashups/
1.3 Thesis Goals and Outline
As mentioned previously, there is no commonly agreed definition of the term mashup, of how mashups are constituted, or of which value they provide for specific business opportunities. The literature is concerned with solving technical problems and providing exciting platforms and software suites rather than analyzing the problem in the same way mashups emerged: bottom up. Thus, this thesis assumes no definition of mashups beyond that of a piece of software that aggregates more than one capability made available over the Web. This work attempts to gain a comprehensive understanding of the characteristics of mashups, the environment in which they have evolved, and their inner workings.
The thesis is structured as follows: Section 2 will introduce the background of business process management and mashups, explaining how these topics have emerged over time and which influences led to their current state. It will further provide background on specific topics that are considered fundamental for understanding the remainder of the thesis.
Section 3 will examine mashups from the ground up. For that purpose, several mashups and mashup tools were analyzed for, among other things, their type, purpose, and aggregated capabilities. The results of that survey will be synthesized into a set of common characteristics that support further observations and examinations within the thesis.
Section 4 leverages the results of and experience from the survey to understand the environment in which mashups exist, called the mashup ecosystem. Taking this as a starting point, mashups will be examined at a finer granularity to understand their anatomy, revealing a conceptual pattern common to all mashups. This pattern will be refined into a compositional reference model for mashups. The reference model by no means addresses a specific architecture or technology but serves as a tool for understanding the workings of existing mashups and for designing new ones.
The understanding of mashups gained in this way is essential for studying their application to business processes. Section 5 will discuss business process management by means of the business process life cycle, a model that describes the phases a process passes through, from design to execution to retirement or improvement through redesign. The section will propose opportunities to improve or support each of these phases and conclude with an assessment of the value of mashups for business process management.
In Section 6, the results of the previous sections will be employed in a proof of concept to show the applicability of mashups for business process management as well as the value they create. A mashup was implemented that aggregates several sources of knowledge related to a process with the corresponding process model, displaying documentation, issues, and feature requests in a single, holistic perspective. The section documents its development and outcome, discussing the issues that arose and the experience gained. Finally, Section 7 concludes the work. It uses the long tail of software development, a metaphor occasionally employed in this context, to position mashups in the overall spectrum of software systems and to discuss their potential and value proposition. Further, the essential outcome of this thesis will be reviewed and discussed, addressing ideas for and issues of future work in the same area.
2 Preliminaries
2.1 Business Process Management
In the early days of computers, applications were built directly on top of operating systems that provided little functionality. Such applications comprised all the functionality needed to conduct a task, each implementing every required component on its own, including data storage. This led to anomalies and inconsistencies, because data used in more than one application had to be copied and maintained in each of them. In the course of computing history, operating systems provided increasing functionality, and additional layers of functional abstraction and encapsulation emerged between operating systems and specific applications. One of the most influential systems to provide its functionality to applications built on top of it is the database system. Database systems provide shared data storage for several applications, eliminating the anomalies and inconsistencies of redundantly stored data. Applications that satisfy specific needs were built on top of them, sharing data with other applications.
Soon it became obvious that not only data but also functionality should be reused. Business logic dedicated to a specific domain, such as customer relationship management, proved useful across several departments of an organization. This contributed to the enterprise architectures typical of today’s organizations: domain-specific yet reusable software components, offered as services, are built on top of general-purpose applications such as database systems. The emphasis in creating use-case-specific, or tailor-made, applications shifted from programming to the composition of these services. Until the process orientation trends of the 1990s, which originated in business management, most applications were data driven. Striving for innovation and flexibility, process orientation promised efficient support for aligning business and information technology. The need to structure businesses along revenue-creating processes shifted the focus of information technology toward process orientation and gave rise to the field of business process management (BPM), which eventually combined the benefits of process orientation with the capabilities provided by the evolution of information technology. Business process management is based on the observation that the value an organization creates is the outcome of a number of steps, or activities, performed in a coordinated manner. While such activities and their fulfillment may be implicit in a company, business process management makes them explicit in the form of business processes. A business process comprises “a set of activities that are performed in coordination in an organizational and technical environment” [Weske, 2007]. Business processes are classified into different levels. Organizational business processes are high-level processes that help to understand and realize an organization’s goals, ultimately contributing to its business strategy.
They are realized by a set of operational business processes that coordinate the operational activities of an organization. Operational business processes remain independent of particular technologies and platforms. Within an organization, processes are conducted under the central control of an orchestration agent, while processes of several organizations can interact with one another. Since no agent centrally coordinates these interactions, they are called choreographies. Business process management emphasizes the automatic orchestration of processes. This requires a formal specification and the explicit representation of processes through process models. Process models “act as blueprints for a set of business process instances”, which are “concrete cases of the operational business” of an organization [Weske, 2007]. Several essentially similar notations for process models exist. Formal aspects allow the correctness of process models to be validated, while graphical notations help stakeholders understand these models. Section 5.1 will introduce the business process management goals relevant to this work, and a life cycle model for business processes comprising several phases will be explained subsequently in Section 5.2.
2.2 Mashups and the Evolution of the Web
When asked what a mashup is, many people have a gut feeling and often their very own understanding, but cannot provide a crisp definition. Descriptions range from “ad hoc aggregations of whatever needs to be aggregated” [Hinchcliffe, 2006] to “Frankenstein on the Web” [Hoffman, 2007]. Although the term mashup is still quite fuzzy and often misunderstood, there is common agreement that mashups are an exciting new genre of Web applications that aggregate capabilities from several Web resources via publicly available interfaces [Merrill, 2006]. Mashups and how they work are often explained by analogy to the personal computer.
A personal computer runs an operating system, which separates the concern of controlling hardware components from that of applications by providing many application programming interfaces (APIs) that encapsulate low-level interaction with, for instance, the display, hard drive, and network interfaces. APIs expose a set of higher-level functions and thus make software development much easier: programmers no longer have to worry about the particularities of lower-level functionality. Applications simply use those interfaces, which increases development efficiency dramatically. For Web applications, the operating system of the personal computer is replaced by the Internet: functionality and data are provided online. So-called Web APIs are used by Web applications in much the same way as classic applications use operating system APIs. Many companies expose their capabilities as Web APIs, e.g., Flickr⁹ and Amazon¹⁰, and many non-profit organizations also provide resources that Web applications can consume, such as Wikipedia¹¹. Exposing any content on the Web, even only as a static Web site, can be considered as providing a capability that others can leverage. Mashups are applications that consume several such capabilities and aggregate (or mash) them in new and innovative ways that were not anticipated before. Sometimes the content providers are not even aware that the capabilities they offer are being reused. The analogy between Web APIs and operating system APIs also gave rise to the terms “Web as a Platform” and “Internet Operating System” [O’Reilly, 2005]. To understand how mashups can obtain and aggregate capabilities from distributed sources on the Web, one needs to understand how the Web moved from purely human-oriented document storage to a network of services and machine-consumable capabilities.
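The idea of a mashup consuming several Web capabilities, each through its own interface, can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions rather than a description of any real mashup: both "Web APIs" are hypothetical, their JSON response formats are invented for the example, and canned strings replace HTTP calls so that the sketch runs offline. Each capability is wrapped by a small adapter, much as an application wraps an operating system API, and the mashup joins the adapted data on a shared key (a topic) in a way neither provider anticipated. The names `photos_by_tag`, `summaries_by_topic`, and `mash` are illustrative only.

```python
import json
from typing import Callable

# Hypothetical canned responses standing in for two independent Web APIs.
# A real mashup would obtain these bodies over HTTP instead.
PHOTO_API_RESPONSE = json.dumps({
    "photos": [
        {"tag": "potsdam", "url": "https://photos.example.org/1.jpg"},
        {"tag": "potsdam", "url": "https://photos.example.org/2.jpg"},
        {"tag": "berlin", "url": "https://photos.example.org/3.jpg"},
    ]
})
ENCYCLOPEDIA_API_RESPONSE = json.dumps({
    "articles": [
        {"topic": "potsdam", "summary": "Capital of the German state of Brandenburg."},
    ]
})

# A capability, as seen by the mashup: something that returns a raw response body.
Fetcher = Callable[[], str]


def photos_by_tag(fetch: Fetcher) -> dict[str, list[str]]:
    """Adapter for the photo capability: group photo URLs by tag."""
    grouped: dict[str, list[str]] = {}
    for photo in json.loads(fetch())["photos"]:
        grouped.setdefault(photo["tag"], []).append(photo["url"])
    return grouped


def summaries_by_topic(fetch: Fetcher) -> dict[str, str]:
    """Adapter for the encyclopedia capability: map each topic to its summary."""
    return {a["topic"]: a["summary"] for a in json.loads(fetch())["articles"]}


def mash(photo_fetch: Fetcher, article_fetch: Fetcher) -> list[dict]:
    """Aggregate both capabilities: one entry per topic known to both sources."""
    photos = photos_by_tag(photo_fetch)
    summaries = summaries_by_topic(article_fetch)
    return [
        {"topic": topic, "summary": summaries[topic], "photos": photos[topic]}
        for topic in sorted(photos.keys() & summaries.keys())
    ]


if __name__ == "__main__":
    result = mash(lambda: PHOTO_API_RESPONSE, lambda: ENCYCLOPEDIA_API_RESPONSE)
    print(json.dumps(result, indent=2))
```

Passing the fetch functions in as parameters keeps the aggregation logic independent of transport: replacing the canned lambdas with functions that perform real HTTP requests would not change `mash` itself.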
⁹ http://www.flickr.com/services/api/
¹⁰ http://aws.amazon.com/documentation/
¹¹ http://wikipedia.org

2.2.1 Web 1.0

The Web (or World Wide Web) was developed in 1989 as a project at the particle physics laboratory CERN in Switzerland. Tim Berners-Lee envisioned a distributed hypertext system that allowed scientists, even if they were not computer experts, to easily generate, share, and keep track of content without the need to maintain personal copies [Berners-Lee, 1989]. Berners-Lee states that the “Web’s major goal was to be a shared information space through which people and machines could communicate” [Berners-Lee, 1996]. He realized that such a system needed to be decentralized (through unidirectional links), platform independent, and simple, addressing the needs of humans and machines alike. The latter was established by the central concept of structured hypertext, defined in the Hypertext Markup Language (HTML), which gave plain text semantic meaning through tags. The first HTML draft¹² did not contain any means to record general metadata about the document, except its title. However, it contained a small set of tags that allowed text to be structured logically (through headings and paragraphs) and pieces of text to be marked with specific meanings (such as addresses), supporting his vision of “shaking it, until you make some sense of the tangle” [Berners-Lee, 1989]. Within a few years, scientific interest in the Web grew rapidly, and a universe soon emerged from its first solitary occurrence. By 1993 more than 500 Web servers were online, and it was nearly impossible to keep up with the list of published content. The Web became increasingly complex and demanded a way to search for content. People started to manually assemble indexes of other Web pages, which was inefficient and turned out to be insufficient. At that time, search engines appeared, Lycos being one of the most famous ones [Hoffman, 2007].
Automatic cataloging of content was greatly supported by the introduction of the META tag in HTML, which allowed supplying metadata about the content, such as title, description, keywords, or language. Soon, the Web matured and became a global network for everyone, not just scientists. Enterprises began to value global communication and sought means to establish their business electronically over the Web. Ordinary people collaborated in content creation, providing personal home pages or sites that reveal information about specific topics. The Web developed into a network of resources that provided knowledge and skills, e.g. online telephone directories.
¹² http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html

2.2.2 Web 2.0

The term Web 2.0, originally credited to Dale Dougherty and Craig Cline, yet made popular by Tim O’Reilly’s famous article “What is Web 2.0” [O’Reilly, 2005], does not describe a technological revolution of the Web. It rather describes a changed perception of the Web, an evolution of the people and devices that drive the Web [Amer-Yahia et al., 2008]. This evolution marks a paradigm shift from a Web of publication to a Web of participation. This became visible through the use of new metaphors, including but not limited to tags instead of categories, wikis instead of content management systems, and blogs instead of personal homepages. [Watt, 2007] identifies three core patterns of Web 2.0: service, simplicity, and community. These patterns describe common characteristics among Web 2.0 applications. Following O’Reilly’s article, this list needs to be complemented by another entity that is essential for mashups: data.

Service. New devices, such as smart phones, and new approaches to Web applications, such as AJAX, required new interfaces to existing functionality and data.
The endeavor to serve such a multitude of new devices, now and in the future, led to a reconsideration of system design and eventually to the decomposition of existing software systems into services. Legacy applications were disassembled or wrapped with service adapters to encapsulate functionality into coarse-grained components. This supports decoupling and thus the reuse of capabilities among different types of applications and devices, which promised to unlock new business opportunities. In the course of Web 2.0, many applications moved to such a service model, where applications run in user agents, i.e. Web browsers, and access core functionality as services over the Internet. These services form the APIs of the “Internet Operating System”, making the Web tick as a platform and enabling efficient development of Web applications on top of it.

Data. While functionality can be replicated more or less easily among vendors, data has become one of the most important building blocks of Web 2.0 applications, if not the most important one, since functionality is generally only useful in combination with data. Data ownership and data leadership are key factors for online businesses [O’Reilly, 2005]. However, maintaining and enhancing data is a crucial task. Users are a valuable resource for enhancing data with metadata. They can categorize data items through folksonomies (tags), identify and eliminate duplicates, complement missing information, assess data quality through feedback and reviews, and identify related information through recommendations. This is only possible if the data is made available to users. With the participation of users in content generation, communities create and enhance vast amounts of relevant and high-quality data, more data than a single body could own or even manage.
Examples are the numerous blogs and wikis that have sprouted in recent years, with Wikipedia as the most prominent one, outperforming commercial online encyclopedias in the timeliness, amount, and quality of information.

Simplicity. The Web 2.0 paradigm gained momentum within the last years, and applications became easier to use and to develop. Web applications moved beyond displaying content on static pages to retrieving external information. They are now characterized by an interactive and visually rich user experience, mainly due to the employment of AJAX (cf. [Garrett, 2005]). Web 2.0 is generally driven by communities, and so are applications and the services they are built on. This led to the evolution of open standards that have low entry barriers and thus allow for easy application development. Among these open standards is syndication, which lets users subscribe to streams of uniformly structured content and in turn allows for simple machine consumption. In fact, syndication became the most popular and most widely applied means to consume data via APIs. The Semantic Web denotes the meaningful structuring of data and its annotation with metadata, and has become a relevant academic topic. Concepts of the Semantic Web actively support machine consumption of data, eventually leading to automatic knowledge acquisition systems. Another prominent example of simplicity is the architectural style Representational State Transfer (REST) [Fielding, 2000] for Web applications, which simplified and unified access to resources dramatically.

Community. Web 2.0 has caused a shift in the way users participate in and with the Web, affecting the way users organize, access, and use information. Emerging applications hosted at the providers’ expense motivated a whole generation to participate in content generation, where contributors gain more from the system than they put into it, as described above.
The human urge to communicate, argue about opinions, and share new ideas is a major force driving the Web of documents toward a Web of participation. Small communities further benefit from self-regulation through social mechanisms, where the cohesion of the community prevents abuse and finds ways to handle problems internally. Users who behave inappropriately or provide false or poor information are sanctioned by social means such as negative feedback or even exclusion. The effect of communities maintaining and enhancing information is commonly referred to as “wisdom of crowds” [O’Reilly, 2005]. Beyond data, communities also affect services and simplicity. Communities of developers strive for simple yet powerful solutions. Among a given set of alternatives, this amounts to a form of survival of the fittest, leading to a smaller set of proven and broadly adopted concepts and technologies.

2.2.3 Situational Applications

In his seminal article “Situated Software” [Shirky, 2004], Clay Shirky describes one of the core demands of the fast-changing Internet ecosystem: simple applications that satisfy an immediate need within a specific social context. While flexibility is an invariable goal of software development, the “Web School” focuses on scalability, generality, and completeness to address a large number of users with its applications. However, many situations require a fast solution that is just good enough and addresses only a small group of users. As long as the development of such applications was costly and required IT experts to integrate the necessary resources, such needs were neglected again and again. The evolution of the Web helps to satisfy these needs. Services expose specific pieces of the required expertise, and simplicity allows non-programmers to consume and combine these services.
Data can easily be obtained from freely available resources to create insight across several dimensions of a problem. The knowledge to combine data and functionality to satisfy a specific need is provided by domain experts, the users of the software at hand. [Jhingran, 2006] argues that situational applications have a rather transient existence. They will either outlive their usefulness, when the need expires, or migrate to a more sophisticated solution due to increasing demands. Disposing of situational applications that have outlived their usefulness is not critical, because the cost to create them is less than the value they add.

2.2.4 Mashups

[Clarkin and Holmes, 2007] describe mashups as agile views composed of simple services. Such services, or more generally capabilities, aim to satisfy one specific objective rather than to provide a complete solution suite, and thus stay application independent. Mashups stem from the reuse of existing resources, facilitating rapid, on-demand software development. Specific capabilities allow the mashup developer to create applications that correspond to a specific need in a specific context. The rise of mashups can be seen as a long-standing need that finally met the opportunity to gain momentum. The evolution of the Web let services emerge and communities reuse those capabilities. Mashups simply combine several of these capabilities for their own purposes, solving specific needs through the expertise of others. While one basic value proposition of mashups still remains the satisfaction of transient needs, they have already grown beyond situational applications. HousingMaps, used by thousands of users every day, is not a situational application anymore, nor has it outlived its usefulness. However, it still does not own any of the data it provides, offering just the skill to combine real estate offers with visual mapping capabilities.
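The aggregation performed by a mashup such as HousingMaps can be sketched in a few lines. This is an illustration only, not HousingMaps' actual implementation: the two capabilities are passed in as plain functions, and their signatures (a listing source returning dicts with hypothetical "address" and "price" keys, a geocoder returning coordinates or None) are assumptions standing in for remote Web APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Marker:
    lat: float
    lon: float
    label: str


# Hypothetical capability signatures; in a real mashup these would wrap Web APIs.
ListingSource = Callable[[str], Iterable[dict]]
Geocoder = Callable[[str], Optional[tuple[float, float]]]


def housing_mashup(city: str, listings: ListingSource, geocode: Geocoder) -> list[Marker]:
    """Combine two independent capabilities; the mashup owns neither data set."""
    markers = []
    for offer in listings(city):
        position = geocode(offer["address"])
        if position is None:  # skip offers the mapping capability cannot place
            continue
        lat, lon = position
        markers.append(Marker(lat, lon, f'{offer["address"]}: {offer["price"]}'))
    return markers


# Usage with stub capabilities standing in for remote Web APIs:
def stub_listings(city: str) -> list[dict]:
    return [{"address": "1 Main St", "price": "$1200"},
            {"address": "nowhere", "price": "$900"}]


stub_geocoder = {"1 Main St": (37.77, -122.42)}.get
markers = housing_mashup("springfield", stub_listings, stub_geocoder)
```

The only value the mashup itself adds is the join between the two sources, which mirrors the observation that HousingMaps owns none of the data it presents.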
Figure 4: Growth of Mashups, according to [Yu, 2008]

Figure 4 shows that since their emergence in 2005, mashups have not experienced the often predicted hype or boom, but rather steady growth. The diagram depicts the number of mashups registered at ProgrammableWeb at a certain point in time. On average, 94 new mashups are registered each month [Yu, 2008]. While they are still maturing, mashups are gaining more and more popularity, promising substantial advantages to individuals and enterprises alike. These advantages include reduced cost and improved productivity in application development due to lightweight composition and reuse. Through the aggregation of widespread knowledge, they create value and uncover new insights.

2.3 Remarks

The following sections explain the terms and concepts of some topics referred to in the course of this work. They are essential for understanding the contents of the thesis and form the basis for reasoning within the discussions.

2.3.1 Content Syndication

The term content syndication denotes a concept to structure and publish the content of Web sites and other Web data in an agreed format, independent of any visual layout. The first occurrence of content syndication dates back to 1995, when Apple Computer developed the Meta Content Framework, a proprietary data format to structure the content of websites with the help of metadata. Several vendors followed with their own formats that, due to the rising popularity of XML, leveraged XML as structural markup. In early 1999, Netscape released RDF Site Summary version 0.9, the first version of what is probably the most famous syndication format to date, commonly referred to by its acronym: RSS. The Resource Description Framework (RDF) [W3C, 2004] allows data to be described semantically with the help of XML markup and refers to the concepts of the Semantic Web.
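Because syndication formats structure content independently of its layout, a consumer can map feeds in different formats onto one uniform structure. The following minimal sketch, using Python's standard `xml.etree.ElementTree`, reads items from RSS 2.0 and from its successor Atom (introduced below) into (title, link) pairs. The function name `feed_items` is hypothetical, only titles and links are extracted, and real feeds would additionally require handling of dates, encodings, and the RDF-based RSS variants.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item><title/><link/></item></channel></rss>
        for item in root.iter("item"):
            items.append((item.findtext("title", ""), item.findtext("link", "")))
    elif root.tag == ATOM_NS + "feed":
        # Atom: namespaced <entry> elements; the link target is an href attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link = ""
            for el in entry.findall(ATOM_NS + "link"):
                # An Atom link without a rel attribute counts as rel="alternate".
                if el.get("rel", "alternate") == "alternate":
                    link = el.get("href", "")
                    break
            items.append((entry.findtext(ATOM_NS + "title", ""), link))
    else:
        raise ValueError(f"unsupported feed root element: {root.tag}")
    return items
```

Once both formats yield the same pairs, a mashup can merge several feeds without caring which format each provider chose.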
Until 2002, RSS underwent several changes, mostly simplifications, and was finally released as Really Simple Syndication version 2.0, which no longer leverages RDF. Hereafter, the term RSS refers to RSS 2.0. Since 2003, the copyright of RSS 2.0 has been held by Harvard University, and its development has been frozen ever since, as is also stipulated in the official specification document [RSS, 2007]. It was Sam Ruby who initiated the discussion and development of a successor to RSS that should overcome its deficiencies, being vendor neutral and open to development and extension by everybody. Development of this project, called “Atom”, progressed quickly, and the Internet Engineering Task Force published the Atom Syndication Format in July 2005 [Nottingham and Sayre, 2005], referred to as Atom hereafter. While RSS is still extremely popular and widely used, it is becoming apparent that it will not be developed further, due to the rising popularity of Atom, which has found a highly valuable use in the Atom Publishing Protocol (AtomPub), an application-level protocol for editing and publishing resources on the Web [Gregorio and de hOra, 2007]. The next version of Atom, which has not been announced yet, is supposed to take over RSS and Atom eventually. In order to avoid confusion and loss of generality, the concept of syndication, represented by both formats, RSS and Atom, will be referred to as content syndication hereafter.

2.3.2 Representational State Transfer

This work will refer to Representational State Transfer (REST), an architectural style that was first described by Roy T. Fielding in his PhD thesis [Fielding, 2000]. While many institutions and individuals claim to understand the principles defined in that work, recent discussions revealed confusion and a lack of expertise in that topic¹³.
A short introduction to the primary principles of REST is given below, whereas a thorough introduction to the topic, its background, and best practices is given in [Richardson and Ruby, 2007]. [Overdick, 2007] discusses the term Resource-Oriented Architecture in the context of REST and Web 2.0. The term architectural style denotes a coordinated set of constraints that restrict architectural elements and the relationships among them. However, a style is only an abstraction of those elements and relationships. The instantiation of an architectural style is an architecture constituted by the elements constrained by the style. A system’s implementation is the instantiation of an architecture. REST is an architectural style for interactions within applications distributed over the Web. REST can be considered an approach to reengineering the Web, elaborating the concepts that made it successful and avoiding flaws that turned out to be disadvantageous. Thereby, the style focuses especially on the architectural aspects that are inherent to distributed interactions through hypertext at the scale of the Internet [Fielding et al., 2002]. These aspects are the data elements and connectors that enable interaction between components. Thus, REST embraces the core principles of resources, representations, and a uniform interface. Components are the endpoint applications that eventually communicate with each other, such as the Web server and the Web client application.

Resources. Resources are the principal architectural abstraction of information in REST. According to [Fielding, 2000], a resource can be anything that is worthy of having a name or, more precisely, a unique identity. Thus, a resource can be perceived as a concept. Valid incarnations of such concepts are digital documents, provided services, or even real-world objects such as a person.
Fielding defines a resource as a temporally varying membership function that, at a given point in time, maps to a set of entities or values.
¹³ http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven

Representations. Actions within REST-style architectures are performed through representations, snapshots of a resource’s current or intended state that are transferred between components. This is an important aspect of REST: instead of remotely invoking methods that change the value of a resource, a representation of the resource’s state is transferred to the components that intend to change it. Representations can take any form that can be exchanged in a distributed hypertext system, which is essentially a digital message entity. As [Fielding, 2000] defines, a representation consists of data that constitutes the snapshot of a resource’s state, metadata that describes that data and serves to understand it, e.g. by machines, and occasionally metametadata that describes the metadata, e.g. to verify the integrity of data and metadata.

Uniform Interface. While resource and representation are the central data elements of the architectural style, its connectors are described through an interface definition, which, if appropriately implemented, enables effective and efficient interaction between the components within a distributed hypertext system. Since REST is an abstraction of an architecture, it does not prescribe the methods of an interface, but rather constraints the interface must comply with. REST is often confused with HTTP [Fielding et al., 1999]. HTTP is just one implementation that complies with the uniform interface definition of REST, and it is today’s most common transfer protocol on the Web. The following explains the constraints imposed by the REST uniform interface and their application in HTTP.
Universal Identification of Resources: The identity of resources must comply with one single naming scheme within the hypertext system, and all resources must only be identified through these identities. URI [Berners-Lee et al., 1998] is the naming scheme employed on the Web and in HTTP in particular, including its subset URL [Berners-Lee et al., 2005].

Self-descriptive Messages: Interaction among components in REST is realized by a request-response pattern of transferred messages. Each message, i.e. requests as well as responses, must contain all information that the receiving component requires to understand it. This implies statelessness among requests: a request must not depend on another request and must be understandable without prior knowledge or preceding interaction. HTTP defines the format of each message, comprising mainly a verb, a header, and a body element. The verb identifies the intention of the message, i.e. whether to retrieve or update the state of a resource through a representation. The header contains information about the representation, especially its data format. The representation of the resource’s state itself is contained in the body, which is optional depending on the intention of the message.

Manipulation of Resources through Representations: Resources must not be manipulated except by changing their state through transferring a representation that describes the intended state. The representation of a resource is generally contained in the body of an HTTP message. A resource can be updated by expressing an intention through the corresponding HTTP verb and providing a representation of its new state in the body of the request.

Hypermedia as the Engine of Application State: Since all interactions are stateless, the state of a resource is manifested in its representation. In turn, the state of an application is made up of the state of all involved information entities.
In REST, resources are the key abstraction of information. Thus, application state is communicated in the form of representations. These representations must be hypertext, i.e. contain all information required to advance the state of an application, namely the identities of related resources and information about the messages to transfer. Hence, an application can be carried out starting from one single resource identity; all subsequent interactions that advance its state are communicated through the corresponding representations. HTML [Raggett et al., 1999, Pemberton, 2002], which emerged along with HTTP, uses hyperlinks to address related resources and provides forms as a means to update the state of related resources through a representation that is the serialization of the form’s content. However, applications that conform to REST do not need to use HTML; any hypertext or hypermedia format for representations is valid.

2.3.3 Same Origin Policy

Providing services on unobtrusive platforms such as Web browsers has the significant benefit of reaching many people and lowering barriers to adoption. In the ideal case, applications running in Web browsers do not even require browser plugins such as Flash or Java. While content could be processed completely on a Web server and presented to the user as a static HTML document, this hinders interaction. A more responsive approach for Web applications is AJAX, which, according to [Garrett, 2005], is an interaction paradigm rather than a technology: content and functionality are obtained from the server on demand. However, such Web applications generally suffer from browser security restrictions, namely the same origin policy¹⁴, which prevents obtaining Web resources, i.e. capabilities provided on Web servers, that are located on a different host (more precisely, a different origin, defined by scheme, host, and port) than the original HTML document that constitutes the Web application itself.
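The origin comparison a browser applies can be sketched as follows, assuming the usual definition of an origin as the triple of scheme, host, and port, with the default ports 80 and 443 implied for http and https. The helper names are hypothetical, and real browsers apply further rules (e.g. for `document.domain` or local files) that are omitted here.

```python
from urllib.parse import urlsplit

# Ports implied when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the host name
    port = parts.port or DEFAULT_PORTS.get(scheme, -1)
    return scheme, host, port


def same_origin(page_url: str, resource_url: str) -> bool:
    """True if a script from page_url may read resource_url via an AJAX request
    under the plain same origin policy (no relaxations considered)."""
    return origin(page_url) == origin(resource_url)
```

For a mashup served from `http://example.org`, a request to any other host, scheme, or port fails this check, which is exactly the situation that the workarounds described in this section address.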
The general understanding of mashups is that they aggregate capabilities from different sources, that is, different origins. Thus, the same origin policy places a particular burden on mashups. While a solution in the form of a controlled relaxation of that policy is likely to be implemented in future releases of Web browsers, crafty developers have already found ways to circumvent this restriction.

The first workaround is called On-Demand JavaScript¹⁵, or JSON with Padding¹⁶ (JSONP), and relies on an exception to the same origin policy. This exception allows the retrieval of related documents of a page, such as images, style sheets, and JavaScript files, from other hosts by adding corresponding links to the page’s document object model (DOM). The JSONP approach dynamically manipulates the DOM of a page, adds links that refer to remote script locations, and loads JavaScript files on demand from any origin. Information can be transferred by this approach if the data is formatted as JSON [Crockford, 2006] and encapsulated into a function call that is executed when the file is loaded. The drawback of this method is that it only allows the retrieval of information using the browser’s routine to load files from a remote location. Any more sophisticated communication with the server, such as updating resources, would need to be encoded in the requesting URI. This would violate the specification of HTTP [Fielding et al., 1999], because scripts are retrieved via GET, a method intended for retrieval only. As a consequence, this approach is generally useful for loading related functionality on demand, or data that is accessed in a read-only manner.

The second workaround is the establishment of an AJAX proxy on the same Web server the Web application is loaded from. Such a proxy receives an AJAX request from the client application that encapsulates a request to a remote resource. The encapsulated request is unpacked and forwarded to its original destination through HTTP communication carried out on the server side.
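The server-side halves of both workarounds can be sketched in Python. This is a simplified illustration under stated assumptions, not a production implementation. For JSONP, the callback name is assumed to arrive from the client (commonly as a query parameter) and is validated so that it cannot inject arbitrary script; the browser-side part, inserting a script element, is not shown. For the proxy, the envelope format (`method`, `url`, `headers`, `body`), the class name `AjaxProxy`, and the host allowlist are hypothetical additions. The forwarding function is injectable so the proxy can be exercised without network access.

```python
import json
import re
import urllib.request
from urllib.parse import urlsplit

# --- Workaround 1: JSONP (server side of the exchange) ---
# Accept only JavaScript identifiers, optionally dotted, as callback names.
_CALLBACK = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(callback: str, payload) -> str:
    """Wrap JSON data in a function call that the browser runs when the script loads."""
    if not _CALLBACK.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


# --- Workaround 2: an AJAX proxy on the application's own origin ---
ALLOWED_HOSTS = {"api.example.org"}  # hypothetical: relay only to known hosts


def send_with_urllib(method, url, headers, body):
    """Forward one request server-side, where the same origin policy does not apply."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()


class AjaxProxy:
    def __init__(self, send=send_with_urllib):
        self.send = send  # injectable, e.g. a stub for offline testing
        self.cache = {}   # url -> (status, body) of earlier GET responses

    def handle(self, envelope: str):
        """Unpack a client envelope and relay the encapsulated request."""
        request = json.loads(envelope)
        method = request.get("method", "GET").upper()
        url = request["url"]
        if urlsplit(url).hostname not in ALLOWED_HOSTS:
            return 403, b"destination not allowed"
        if method == "GET" and url in self.cache:
            return self.cache[url]
        body = request.get("body")
        data = body.encode("utf-8") if body is not None else None
        result = self.send(method, url, request.get("headers", {}), data)
        if method == "GET":
            self.cache[url] = result
        else:
            self.cache.pop(url, None)  # a write may change the cached state
        return result
```

The sketch reflects the trade-off discussed here: JSONP needs no server of its own but is limited to GET, whereas the proxy can relay any HTTP verb and cache responses, at the cost of operating and trusting an intermediary.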
Compared to the JSONP approach, proxies allow relaying requests of any type, i.e. any HTTP verb. This method further enables establishing caches between the Web application and the original remote content provider, thus reducing network traffic and load on the latter. Unfortunately, this solution requires Web application providers to establish such a proxy, and users to trust the application provider to handle transmitted data with appropriate responsibility.
¹⁴ http://en.wikipedia.org/wiki/Same_origin_policy
¹⁵ http://ajaxpatterns.org/On-Demand_Javascript
¹⁶ http://ajaxian.com/archives/jsonp-json-with-padding

3 Survey of Mashups and Mashup Tools

Due to the community-driven emergence of mashups, which sprout everywhere as a result of network effects rather than being dominated by a few bodies, exact definitions of the term mashup are open to debate. It is even questionable whether mashup can be regarded as a term defining a class of applications, or whether it is rather an idiom that captures a phenomenon of developments that are related to some extent. Hence, the question at hand is not “Which properties separate mashups from other Web applications?”, but rather “Which properties do mashups share?”. This chapter attempts to answer the latter question by studying 29 mashups and mashup tools, aiming to give an overview of the dimensions of the mashup universe. Although the focus of this survey is mashups, mashup tools were considered, too. Many tools have emerged recently to give end users the ability to create mashups without any programming effort and to speed up mashup development [Yu et al., 2008]. Following predictions of the remarkable business value of mashups, such as in [Hof, 2005], so-called enterprise mashups have gained more attention and have influenced the mashup universe from a business perspective.
As a consequence, the survey regarded manually assembled mashups as well as those created with a variety of mashup tools, whereas the assisting facilities of these tools were not of primary interest. Related work, mainly considering tools, has been done by [Hoyer and Fischer, 2008, IBM Corporation, 2008].

3.1 Selection of Samples

In order to obtain an independent population of sample mashups, two sources were consulted. Refer to the appendix for a detailed list of the samples as well as the evaluation of the examined properties.

ProgrammableWeb. ProgrammableWeb is the most prominent online information source concerning mashups. Established in 2006 by John Musser, ProgrammableWeb keeps track of a large variety of mashups and Web APIs, providing detailed information, examples, and links to their origins. Furthermore, ProgrammableWeb offers information about the overall development of mashups, providing a wealth of statistics. It is also the first address for anyone looking for advice on implementing a mashup, searching for appropriate APIs, or conducting a survey among mashups. Thus, it was selected to contribute to the present survey.

Figure 5: Mashup Categories: distribution of tags among registered mashups, from http://www.programmableweb.com/mashups (Mapping 37%, Photo 11%, Shopping 10%, Search 8%, Video 8%, Travel 7%, Social 5%, News 5%, Music 5%, Messaging 4%)

Since ProgrammableWeb lists more than 3600 mashups at the time of writing, with three more mashups added each day on average, a relatively small yet well-distributed subset had to be selected: for each of the five most popular tag categories, the four most popular mashups that combine more than one source were chosen, excluding duplicates. The popularity of tag categories is determined by the number of mashups tagged accordingly (cf. Figure 5). The popularity of mashups within a category is given by the ProgrammableWeb search API.
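The selection rule can be expressed as a short function. This is a sketch under stated assumptions: the three input structures (tags per mashup, popularity-ranked mashup lists per category, and the number of sources each mashup combines) are hypothetical stand-ins for data obtained from ProgrammableWeb, and “excluding duplicates” is interpreted here as skipping a mashup already chosen for an earlier category and taking the next candidate instead.

```python
from collections import Counter


def select_samples(tags_per_mashup: dict[str, list[str]],
                   ranked: dict[str, list[str]],
                   source_count: dict[str, int],
                   n_categories: int = 5,
                   per_category: int = 4) -> list[str]:
    """Pick the most popular multi-source mashups from the most popular categories."""
    # Category popularity = number of mashups carrying that tag.
    tag_counts = Counter(tag for tags in tags_per_mashup.values() for tag in tags)
    top_categories = [tag for tag, _ in tag_counts.most_common(n_categories)]

    chosen: list[str] = []
    for category in top_categories:
        picked = 0
        # ranked[category] is assumed to be ordered by popularity, best first.
        for mashup in ranked.get(category, []):
            if picked == per_category:
                break
            if source_count.get(mashup, 0) < 2 or mashup in chosen:
                continue  # single-source mashups and duplicates are skipped
            chosen.append(mashup)
            picked += 1
    return chosen
```

With the defaults, the function yields at most 5 × 4 = 20 candidates; mashups that later turn out to be unavailable would still have to be removed by hand.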
This set was retrieved using ProgrammableWeb’s own Web API¹⁷. Unfortunately, some of the listed mashups were no longer available, resulting in a list of 18 mashups and mashup tools. These are: 2008 US Electoral Map, Adactio Elsewhere, Afrous, Albumart, Baebo, Flash Earth, Forbes List of World’s 100 Most Powerful Celebrities, Gaiagi Driver, PageFlakes, Sad Statements, sampa, SecretPrices, Sporting Sights, TuneGlue°, Twitter Top News Trends, Vdiddy, Weather Bonk, and Wiinearby.net.

Market Overview of Enterprise Mashup Tools. As it turned out, the list of mashups extracted from ProgrammableWeb contained only non-commercial mashups and only a few mashup tools. However, as preliminary work for this thesis indicated, mashups have been considerably influenced by applications labeled Enterprise Mashups and the corresponding tools. A recent survey of enterprise mashup tools was conducted by [Hoyer and Fischer, 2008]. While that survey focused on specific tool aspects and cannot be used directly for the present survey, its list of reviewed mashup tools was reused here. After eliminating those tools that were not available for inspection, the following tools were reviewed: Dapper Factory, Google Mashup Editor, IBM Mashup Center, Intel Mash Maker, Jackbe Presto, Microsoft Popfly, NetVibes, SAP Enterprise Mashup Platform, Serena Mashup Suite, Yahoo! Pipes, and iGoogle.
¹⁷ http://api.programmableweb.com/

3.2 Classification Model

The survey of mashups and mashup tools presented here is a rather exploratory examination of each mashup contained in the lists above. Since there is no agreed definition of mashups, it was quite difficult to specify properties to assess each mashup. Thus, the review process consisted of several passes. The first pass served to gain an overview of mashups and their properties and to compile a list of evaluation criteria, which is presented below. In the second pass, mashups were examined for those criteria in detail.
A preliminary classification of mashup type was assembled but refined later, resulting in two classes for that property. In the third review pass, all mashups and mashup tools were classified according to this differentiation.

name: The name of the mashup or mashup tool as referred to hereafter.
url: The URL of the mashup or, if it is not publicly available, of a site that gives further information about the mashup. The latter is the case for most mashup tools.
category: A list of tags that categorize mashups according to their essence. The specific purpose of a mashup tool is not determined until a user creates a mashup with it. Thus, mashup tools are categorized as “tool” here.
description: A short summary of the mashup’s purpose, including additional information on how it works, and further information if necessary.
mashup type: Determines whether the mashup or tool-created mashup is of the type organic mashup or dashboard. Refer to Section 3.4 for more information.
input: Specifies information required from the user, such as search terms, product names, etc.
output: Specifies the primary information that results from the mashup, as well as the form it is presented in.
alternative output: Specifies alternative forms of output, if provided.
capabilities: Lists the capabilities or capability types that are aggregated by the mashup, or by the mashup created with a tool, respectively.
aggregation location: Describes where the different resources are aggregated, i.e. whether this happens on a Web server that hosts and executes mashup logic, in the client’s Web browser, or on both.
aggregation type: Describes how aggregation happens. In some mashups, no real aggregation happens beyond the simultaneous display of data within a page, such as in several lists.
technology: Describes the technology used to access and aggregate resources.
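To make the criteria concrete, one survey entry can be encoded as a record. This is a hypothetical encoding in Python, not the format used in the appendix: field names mirror the criteria listed above, the enumerations contain only the values named in the text, and optional fields allow for cases where information could not be obtained.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MashupType(Enum):
    ORGANIC = "organic mashup"
    DASHBOARD = "dashboard"


class AggregationLocation(Enum):
    SERVER = "server"
    CLIENT = "client"
    BOTH = "both"


@dataclass
class SurveyEntry:
    name: str
    url: str
    category: list[str]
    description: str
    mashup_type: MashupType
    input: list[str] = field(default_factory=list)
    output: str = ""
    alternative_output: Optional[str] = None
    capabilities: list[str] = field(default_factory=list)
    aggregation_location: Optional[AggregationLocation] = None  # None: not determinable
    aggregation_type: str = ""
    technology: list[str] = field(default_factory=list)

    @property
    def is_tool(self) -> bool:
        # Mashup tools are categorized as "tool", since users determine their purpose.
        return "tool" in self.category
```

A usage example with invented values: `SurveyEntry(name="Example Tool", url="http://example.org", category=["tool"], description="hypothetical entry", mashup_type=MashupType.DASHBOARD).is_tool` evaluates to `True`.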
Due to their technical diversity, mashups were examined by reverse engineering, analyzing network traffic, and decomposing parts of the application, depending on the specific case. A completely objective and standardized analysis of each mashup was not feasible. In some cases, information could not be obtained because the application logic was hidden. Nevertheless, I believe that this survey yields valuable insights and a relevant overview. The sample is not large enough to support a sound statistical analysis, in particular one that would be valid for all conceivable mashups. However, it serves well to synthesize and verify a set of commonalities among mashups in general. These are presented in the following.

3.3 Synthesis of Survey Results

The survey supports the initial assumption that mashups are a phenomenon or genre of Web applications rather than a specific technology or architecture. The spectrum of mashups is tremendous. While mashup tools and platforms such as Yahoo! Pipes or iGoogle are perceived to be more popular, mashups that are programmed by hand far outnumber existing tools. Yet, because each of them applies only to a specific situation, they are less well known.

Mashups embrace the concept of being served on demand. A mashup is merely a description of how to combine capabilities upon its instantiation; its explicit representation is a model. Instantiating this representation puts the mashup application into an execution context that allows it to access capabilities and interact with the user. Yahoo! Pipes even provides a visual representation of the created mashup models. The term mashup is generally used interchangeably for the genre, the application representation, and an instance.

3.4 Types of Mashups

During the survey of mashups and mashup tools, it became apparent that end user mashups can be partitioned into two basic types.
These types differ mainly in their architecture and the resulting user interface. This partitioning is ascribed to the different application scenarios of mashups. At the time of writing, related work shows no common agreement on mashup types. Thus, the terms organic mashup and dashboard are introduced here to aid understanding in the course of this thesis.

Early mashups were created manually to serve a specific, situational purpose of a small group of users. Such a purpose was defined by a use case rather than by the needs of an individual person. Because their manual creation involves at least some expertise in computer science, e.g., programming, these mashups will be referred to as organic mashups hereafter.

After the first wave of adoption, enterprises started to engage in mashup development. However, instead of serving a situational use case, enterprises aimed to serve the needs of individuals, i.e., customers, by offering them aggregate sites similar to portals. Such aggregate sites allow users to dynamically assemble any piece of information they are interested in, for instance, visualized key performance indicators of their business. Hence the metaphor of a dashboard: an aggregate panel of gauges showing these performance indicators. In the following, this type of mashup will be called a dashboard.

The distinction between organic mashups and dashboards is not sharp but rather continuous. Single mashup instances may transition between the two types, embracing characteristics of both. One such example is Weather Bonk, an organic mashup that is not customizable by end users but still leverages the widget layout of dashboards.

3.4.1 Organic Mashups

Organic mashups can be regarded as the origin of mashups in general. The first mashups, such as HousingMaps, combined disparate sources of different formats and assembled their output through programming.
At that time, Web APIs were not common, and authors had to dissect code and resource access mechanisms on their own to obtain capabilities. The results were mashups serving particular use cases that were beneficial for a specific group of users. For HousingMaps, the use case is to find a place to live close to a given location, visualized on a map. Mashups of this type are traditionally information centric, which is why they are often referred to as data mashups. [Jhingran, 2006, Merrill, 2006, Simmen et al., 2008, Hoffman, 2007] even consider only data mashups.

Figure 6: General Architecture of an Organic Mashup (FMC) [diagram elements: developer, user, mashup definition, mashup application, fixed set of capabilities]

The term organic refers to the way these mashups are created: a finite set of resources is aggregated in sophisticated ways, which requires a basic understanding of how to aggregate information streams as well as statistics skills, typical skills of a programmer. As shown in Figure 6, the overall architecture of such a mashup is rather static: the set of aggregated capabilities, the logic to combine them, and the presentation logic required for visualization are fixed. Despite this static architecture, organic mashups can provide dynamic, real-time insights by accessing capabilities on demand and thereby obtaining freshly updated content. Organic mashups are usually developed by few people compared to the number of their users. Nevertheless, developers quite often belong to the group of users themselves.

With the release of Yahoo! Pipes in early 2007, the manual implementation of organic mashups received tool support. More tool vendors entered the market, Afrous and Microsoft Popfly, to name a few. These tools allow aggregating streams of information with advanced operations and presenting them by means of rich visualization. Such tools often promise that mashups can be created easily without any IT skills.
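Whether written by hand or assembled with a tool such as Yahoo! Pipes, an organic mashup combines a fixed set of capabilities with fixed logic, yet fetches fresh content on every request. The Python sketch below illustrates this static architecture with a hypothetical, HousingMaps-like example; it is not HousingMaps’ actual implementation. The functions fetch_listings and geocode are stubs standing in for a listings source and a geocoding service, and filtering by the haversine great-circle distance is an assumed way of realizing “close to a given location”.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def fetch_listings() -> List[Dict[str, str]]:
    # Hypothetical stub for capability 1 (a housing listings source); a real
    # organic mashup would request this content from a remote service.
    return [
        {"title": "2-room flat", "address": "address A"},
        {"title": "Studio", "address": "address B"},
    ]


def geocode(address: str) -> LatLon:
    # Hypothetical stub for capability 2 (a geocoding service), made-up coordinates.
    table = {"address A": (52.3934, 13.1311), "address B": (52.5200, 13.4050)}
    return table[address]


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def listings_near(center: LatLon, radius_km: float) -> List[Dict[str, object]]:
    """Fixed combination logic: join listings with coordinates, keep nearby ones."""
    result = []
    for listing in fetch_listings():              # capability 1, accessed on demand
        position = geocode(listing["address"])    # capability 2, accessed on demand
        if haversine_km(center, position) <= radius_km:
            result.append({**listing, "position": position})
    return result
```

The set of capabilities and the join logic are hard-wired, matching Figure 6; only the content flowing through them changes between requests.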
Contrary to what mashup tools promise, aggregating information streams with operators is, from personal experience, a rather advanced task that requires a basic understanding of the capabilities’ data formats and of aggregation algorithms.

3.4.2 Dashboards

As mentioned above, dashboards gained popularity largely when enterprises started to look into mashup development. Mashups promised a new way of reusing existing services through an appealing interface, facilitating the aggregation and combination of content without involving the IT department. Soon, the term Enterprise Mashup was coined for dashboards that mashed up corporate performance indicators and applications, similar to portals [Gurram et al., 2008].

Portals are driven by corporate entities, e.g., the IT department. That is, they are designed and deployed centrally and are alike for every user. Some portals allow for customization but still retain their basic content and layout. The content of portals generally needs to be deployed to the portal back end [Bellas, 2004]. Dashboards, on the other hand, are driven by end users and allow them to select, combine, and aggregate capabilities freely, according to their individual demands. Due to the visual assembly of content and their similarity to portals, dashboards are sometimes referred to as aggregate sites.

The key enabler that lets dashboards aggregate content so simply is the widget. The term widget is derived from window and gadget and denotes a graphical user interface component that encapsulates a closed set of functionality and visualization with its own independent life cycle. Widgets are essentially small applications that can be composed into one large application, which provides means for interaction and coordination among them. Through the metaphor of widgets, mashups typically unify design-time and run-time support.
Widgets can be rearranged within the dashboard at any time, and some widgets themselves allow users to browse widget repositories or develop new widgets, as in SAP EMAP [Gurram et al., 2008]. Dashboards are hybrids of a tool that allows users to create and customize their personal site and a platform that hosts and executes the created mashups.

Figure 7: General Architecture of a Dashboard (FMC) [diagram elements: user, dashboard, dashboard configuration, widgets, dynamic set of capabilities]

This integration approach makes them valuable for corporate users and individuals alike, since there is no need to set up or host any artifacts on the client’s computer. Figure 7 shows the abstract architecture of dashboards. Usually, the user and the developer, i.e., the person who assembles the site, are the same. Programming is only required for the creation of widgets, which can simply be organic mashups wrapped in a container that complies with the dashboard platform. Widgets access their respective Web capabilities on their own. In contrast to organic mashups, dashboard tools allow virtually anyone to assemble their own mashup without technical expertise. Rich interfaces allow for drag-and-drop and simple layout of widgets.

Lately, companies have begun to research possibilities of enhancing interaction between widgets, so that widgets can react dynamically to state changes of other widgets through message passing and act in coordination. Implementing such communication channels is a highly relevant topic, because the same origin policy of browsers (cf. Section 2.3.3) restricts communication across widgets of different domains.
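Message-based coordination among widgets is often realized as a publish/subscribe channel. The Python sketch below shows the idea within a single process only; the classes and the topic name city.selected are hypothetical, and the sketch deliberately leaves out the cross-domain restrictions of the same origin policy that browser-based dashboards have to work around.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class MessageBus:
    """Hypothetical publish/subscribe channel shared by the widgets of one dashboard."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        # Deliver the message to every widget that subscribed to the topic.
        for handler in list(self._subscribers[topic]):
            handler(payload)


class Widget:
    """A widget with its own state, connected to the dashboard's bus."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name = name
        self.bus = bus
        self.state: Any = None


class CityPicker(Widget):
    def select(self, city: str) -> None:
        self.state = city
        self.bus.publish("city.selected", city)  # announce the state change


class WeatherWidget(Widget):
    def __init__(self, name: str, bus: MessageBus) -> None:
        super().__init__(name, bus)
        bus.subscribe("city.selected", self.on_city)

    def on_city(self, city: str) -> None:
        # A real widget would now query its own Web capability for this city.
        self.state = f"weather for {city}"


bus = MessageBus()
picker = CityPicker("picker", bus)
weather = WeatherWidget("weather", bus)
picker.select("Potsdam")
```

Widgets never call each other directly; they only know the topic, which keeps each widget’s life cycle independent as described above.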
A discussion of cross-domain communication between widgets is beyond the scope of this work; detailed information and current solution approaches are provided in [Abiteboul et al., 2008, Gurram et al., 2008, Jackson and Wang, 2007, Keukelaere et al., 2008, López et al., 2008].

3.5 Common Characteristics of Mashups

The review of the mashups and mashup tools listed above yielded the following qualities that are characteristic of mashups. Not all mashups exhibit each of the listed properties, and while these properties may not provide a conclusive definition of mashups, they indicate whether an application is a mashup. The findings of the survey are largely of a technical nature, in keeping with the scope of this thesis. [Novak and Voigt, 2006] provide a survey of organic mashups that focuses more on social aspects and content topics. The findings will also contribute to describing a general model for mashups and lay the groundwork for examining mashups in the context of business process management in Section 5.

3.5.1 User Centric

Mashups are applications for humans, typically satisfying the needs of individuals or narrow user groups. These needs are usually specific and not defined by strategic business requirements. Often, mashups are even created by those who use them: end users [Crupi and Warner, 2008a].

All reviewed mashups feature visually rich user interfaces that are not only graphically appealing but also facilitate the exploration of aggregated content. One of the most popular metaphors is a map on which data items are located according to their respective positions, as in Wiinearby.net and Weather Bonk. TuneGlue° depicts relationships between pieces of music as a network of connected dots, where the distance between two dots reflects how closely the respective songs resemble each other.
Rich visualizations of data are much easier for humans to consume and understand, because of our ability to visually identify patterns such as distributions, relationships, and shapes. Mashup tools usually allow for a high degree of personalization. Dashboards in particular, such as PageFlakes, which aggregate visual components within a single page, let end users decide which information they want to see.

3.5.2 Small Scale

Typically, mashups deal with relatively few data sources and small sets of data compared to traditional data integration approaches. Among the organic mashups surveyed, the maximum number of combined sources is seven. For mashup tools, this number is up to the user but will typically be limited as well. This observation is consistent with the “seven phenomenon”, which states that humans can hold roughly seven elements of a given set in short-term memory due to their cognitive capacities. Since mashups provide insight across disparate sources of information, this limit appears to hold for them, too.

The amount of output information of a mashup is also limited, in many cases to fewer than 30 items (the typical length of a syndication feed) or to what fits into the visualization container. Again, this limitation helps users gain an overview of relevant information. Data aggregations beyond these limits are better handled by traditional data integration approaches, such as database systems. [Clarkin and Holmes, 2007] advise against using mashups for complicated aggregations over normalized and fragmented data at all.

3.5.3 Open Standards

Mashups build on technologies and best practices that evolved with Web 2.0, sometimes referred to as Web-oriented Architecture (WOA) [Hinchcliffe, 2008]. An important aspect of these so-called open standards is their wide adoption and acceptance among Web developers. They evolved bottom up in a survival-of-the-fittest manner, reengineering the principles of the Web (cf.
REST [Fielding et al., 2002]), rather than being imposed by corporate governance. This has a considerable impact on simplicity, application-neutral reuse of existing content, and low entry barriers. [Clarkin and Holmes, 2007] and [Jhingran, 2006] emphasize that developments seeking wide adoption must leverage information standardization, i.e., agree on global schemas and metadata.

In many cases, information is provided in the form of content syndication, where the specific request is expressed via a URI configured with a corresponding query. OpenSearch [A9.com, Inc., 2007] is an extension for syndication formats such as Atom and RSS that defines how a search query URI is created as well as the syntax and semantics of the returned search results document. Since mashups use Web APIs as building blocks, they form exactly what Tim O’Reilly describes as applications in the “Internet Operating System” [O’Reilly, 2005], leveraging the Web as a platform.

The popularity of mapping mashups (cf. Section 1) calls for open standards for geographical data. One such standard is the Keyhole Markup Language [OGC, 2008], which not only specifies the definition of geographic locations but also allows attaching arbitrary information to these locations.

3.5.4 Software as a Service

Mashups are Software as a Service, accessing sources and making themselves accessible on the Web. Installing mashups as standalone applications makes little sense, because they are intended to satisfy particular, situational needs. Furthermore, since mashups consume resources accessible on the Web, users need to be online anyway. As a matter of fact, all surveyed mashups, organic mashups and those created by tools alike, run in a browser. Several, for instance Sad Statements or Twitter Top News Trends, additionally provide their content as syndication feeds to be consumed by other applications, including other mashups.
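Offering a mashup’s output as a syndication feed requires little code. The Python sketch below serializes aggregated items as a minimal RSS 2.0 document with the standard library’s xml.etree.ElementTree; the function name to_rss and all titles and URLs are hypothetical. Another mashup could parse the result with the same module, which is how one mashup becomes a capability for the next.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def to_rss(title: str, link: str, description: str,
           items: Iterable[Tuple[str, str]]) -> str:
    """Serialize (title, link) pairs as a minimal RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required channel elements of RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


# Hypothetical mashup output exposed as a feed.
feed = to_rss("Example mashup", "http://example.com/mashup",
              "Aggregated content of a hypothetical mashup",
              [("First entry", "http://example.com/mashup/1"),
               ("Second entry", "http://example.com/mashup/2")])
```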
Providing mashups as Software as a Service has further benefits, as it is the key enabler for wide adoption. No sophisticated infrastructure is required to access them, only a browser. Accessibility from everywhere also establishes the basis for easy sharing and collaboration (cf. Section 2.2.2).

3.5.5 Short Time to Market

Mashups provide solutions for a situational context, often needed only for a limited time. [Clarkin and Holmes, 2007] argue that mashups focusing on a specific need must be created in a short time, measured in hours or days rather than weeks or months. Mashups reuse content that is available in uncomplicated ways, e.g., in simple formats and without obstructive security restrictions. This content is combined with just a bit of “glue”, avoiding any work that has already been done elsewhere. Such lightweight software models benefit from the “building on the shoulders of giants” effect [Hinchcliffe, 2007], which directly shortens development times.

Tools such as Yahoo! Pipes or PageFlakes, which enable end users to create mashups easily via drag-and-drop, most visibly reduce development time and effort. A flexible framework can leverage existing resources, provide a set of connectors and operators to compose them, and open mashup development to those who can define the specific needs best: the users of the application, i.e., domain experts, who are often not programmers.

A particular quality of mashups is being “good enough”: an application that fulfills a specific need should not include more features than the problem itself requires. This shortens development times and lowers the cost of application development [Shirky, 2004].

3.5.6 Aggregation of Heterogeneous Content

Mashups explicitly aggregate heterogeneous content, that is, knowledge and skill, from disparate, unrelated sources in a non-invasive manner that retains the sources’ original purpose.
This includes, but is not limited to, enacting Web services remotely, loading functionality from external sources and executing it within the mashup application, obtaining data from widespread sources, and employing skills for presentation and visualization.

Sad Statements combines Twitter posts with Flickr photos, using the Yahoo! Term Extraction API18 Web service to extract relevant terms from Twitter posts. These terms are used to find matching images on Flickr. The resulting pair, a Twitter post and the matching Flickr images, is aggregated into a single sad statement, delivered to the user as a static page. Wiinearby.net combines information about Wii console offers on different retail platforms, such as eBay, Amazon, or Craigslist, with the mapping capabilities of Google. In contrast to the preceding example, where aggregation is carried out on a Web server, Wiinearby.net executes the mapping functionality entirely within the delivered Web page, on the client side.

Many mashups acquire content automatically in real time, i.e., they aggregate data that is up to date. This is important for giving insight into information that changes over time, and it forms the basis for decision support. Freshness matters [Clarkin and Holmes, 2007]. It also encourages a more lightweight architecture: instead of storing obtained content, aggregation is performed on data obtained live. [Bradley, 2007] even holds that mashups have no native data storage or content repository at all.

3.5.7 Data Centric

While aggregated capabilities are not limited to data, the main concern of all reviewed mashups is comprehensive data presentation, providing insight and understanding. According to [Novak and Voigt, 2006], most of the APIs used are simply syndication feeds that provide data.
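Sad Statements, described in Section 3.5.6, illustrates how aggregated content can be enhanced by an external service. The Python sketch below mimics its flow under stated assumptions: fetch_posts and find_photos are hypothetical stubs for the Twitter and Flickr capabilities, and extract_terms is a naive stop-word filter, not the algorithm of the Yahoo! Term Extraction service. Nothing is stored; each call aggregates whatever the capabilities return at that moment.

```python
from typing import List, Tuple

STOP_WORDS = {"a", "an", "the", "is", "my", "i", "am", "so", "and", "of", "to"}


def fetch_posts() -> List[str]:
    # Hypothetical stub for the Twitter capability: the latest posts.
    return ["My umbrella broke in the rain", "Missed the last train"]


def extract_terms(text: str) -> List[str]:
    # Naive stand-in for a term extraction service (assumption, not Yahoo!'s method).
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def find_photos(terms: List[str]) -> List[str]:
    # Hypothetical stub for a photo search: a fixed index from tags to photo IDs.
    index = {"rain": ["photo-17"], "umbrella": ["photo-3"], "train": ["photo-8"]}
    return [photo for term in terms for photo in index.get(term, [])]


def sad_statements() -> List[Tuple[str, List[str]]]:
    """Pair each post with the photos matching its terms; nothing is stored."""
    return [(post, find_photos(extract_terms(post))) for post in fetch_posts()]
```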
All surveyed mashups aggregate content; only some employ external services to enhance their data. Moreover, most of the mashups consume content read-only. Some widgets available for dashboards, for instance in iGoogle, allow data to be manipulated. This observation suggests that mashups mainly focus on information aggregation and enhancement. The reason is simplicity. Manipulating resources generally requires some kind of authentication and authorization, which is difficult for mashups, because users may not want to hand their credentials over to a mashup developer, and current browser implementations have limitations concerning authorization across several domains (cf. Section 2.3.3). These obstacles hinder the easy aggregation of services that manipulate data. iGoogle is an exception: in order to access one’s personal page, one has to be logged in with a Google account, which automatically gives access to the multitude of mashable resources Google offers to its users. Ongoing developments in shared identities, such as OpenID [OIDF, 2007], and future browser development are expected to circumvent these issues, making mashups more data interactive.

18 http://developer.yahoo.com/search/content/V1/termExtraction.html

3.5.8 Lack of Governance

The term governance refers to decisions that define expectations, grant power, and verify performance. For mashups, this relates to accessing capabilities in a courteous way, granting access to trusted capabilities through authentication and authorization, and ensuring qualities such as the availability and reliability of the mashup and the accuracy of content. Most of the surveyed mashups consume capabilities that are provided freely, often by commercial providers such as Yahoo! (e.g., Flickr), Google (e.g., Google Maps), or Microsoft (e.g., Microsoft Virtual Earth), and largely neglect governance.
Given the demands for simplicity, such as lightweight composition, uncomplicated open standards, and spontaneous selection of capability sources, governance gets in the way of rapidly creating situational solutions.

The largest deficiency is found in distributed identity and authorization management. Data access, in particular, is specific to user identities, which in many cases are disparate user accounts. If a mashup accesses several protected resources, it may be required to provide different authentication credentials separately to each of these services. Such limitations hinder the development of mashups that access personal information. Classic enterprise software solves this with single sign-on, which is not feasible for mashups because the accessed capabilities are generally not under the control of the mashup provider. OpenID and OAuth [OAuth Core Workgroup, 2005] provide means for federated authentication and authorization under the assumption of one globally unique identity per user. While this approach is promising, it is impeded by the fact that users typically have at least one identity, i.e., user account, for each of the different capabilities that would be accessed and mashed up.

One solution to this problem is to provide mashups under the control of trustworthy corporate entities and to offer certified access under terms of data privacy. However, such trust relationships do not generally apply to the broad universe of organic mashups. Another problem of distributed identities is the same origin policy of Web browsers (cf. Section 2.3.3), which prevents scripts from accessing capabilities that originate from a domain other than that of the originally loaded document, i.e., from accessing disparate capabilities.
Future versions of Web browsers are likely to address this issue natively, and approaches exist to circumvent these restrictions for mashups, as in [Isaacs and Manolescu, 2008, Jackson and Wang, 2007, Keukelaere et al., 2008].

Courteous access to capabilities refers to the goal of not harming the capabilities that are aggregated under the covers of a mashup. Such harm can happen in many ways: a popular mashup may increase the access frequency of a capability beyond its capacity and render it unavailable. Flash crowds are a common phenomenon that occurs when Web sources are advertised publicly. Further, erroneous or carelessly crafted mashups may cause denial of service of capabilities that were not intended to be reused in mashups. Capability providers attempt to address these issues by providing their content as separate resources via Web APIs, which incorporate caching and provide open standards interfaces, as mentioned previously. [Phifer, 2008] advances the view that mashup developers should take responsibility for not harming capabilities. This can be done by employing caches, monitoring, and trust management in the mashup infrastructure itself. These measures also contribute to the quality of mashups, for instance reliability, response times, and availability, even when capabilities are not reachable.

A certain level of quality of the obtained information, in terms of completeness, accuracy, and timeliness, is difficult to achieve, since capability providers are unlikely to offer any service level agreements for resources that they publish freely. [Alba et al., 2008] discuss data accuracy further. In general, it is the mashup developer who is responsible for ensuring that the provided content is accurate with respect to the mashup’s purpose. Overall, too much governance can do mashups a disservice.
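A cache in the mashup infrastructure, as suggested by [Phifer, 2008], can both reduce the load a mashup places on a capability and keep the mashup available when the capability is unreachable. The Python sketch below shows one possible design, assuming a fixed time-to-live per entry and serving a stale copy when a request fails; the class name, its parameters, and the injectable clock are hypothetical choices that make the behavior easy to check.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CourteousCache:
    """Hypothetical TTL cache in front of a capability, with stale fallback on failure."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch    # function that actually requests the capability
        self.ttl = ttl        # seconds during which a cached copy counts as fresh
        self.clock = clock
        self.calls = 0        # number of requests sent to the capability
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self.clock()
        entry: Optional[Tuple[float, str]] = self._entries.get(url)
        if entry and now - entry[0] < self.ttl:
            return entry[1]                 # fresh copy: spare the capability
        try:
            self.calls += 1
            body = self.fetch(url)
        except OSError:                     # includes ConnectionError
            if entry:
                return entry[1]             # unreachable: serve the stale copy
            raise
        self._entries[url] = (now, body)
        return body


# Usage with a simulated clock and a hypothetical capability that can fail.
now = [0.0]
down = [False]


def fake_capability(url: str) -> str:
    if down[0]:
        raise ConnectionError("capability unreachable")
    return f"representation of {url} at t={now[0]}"


cache = CourteousCache(fake_capability, ttl=60.0, clock=lambda: now[0])
first = cache.get("http://example.com/feed")   # fetched from the capability
now[0] = 30.0
second = cache.get("http://example.com/feed")  # still fresh, no request sent
now[0] = 90.0
down[0] = True
third = cache.get("http://example.com/feed")   # request fails, stale copy served
```

Serving stale content trades some freshness for availability; a mashup whose purpose depends on up-to-date data might prefer to signal the outage instead.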
Governance is largely absent from current mashups because it hinders the aggregation of capabilities. Thus, governance must address the demands of each particular mashup and provide just enough control to satisfy those needs.

4 Anatomy of a Mashup

This chapter builds on the knowledge gained from the mashup survey and investigates concepts that shape the current perception of mashups. It begins at a distance, viewing mashups in their environment. The mashup ecosystem puts the term mashup into a holistic perspective, explaining the process of mashup creation and usage. The term ecosystem also expresses a balanced system resulting from the evolution of mashups. Subsequently, the mashup and its inner workings will be approached, identifying a typical pattern of activities that is common to mashups in general. A formal representation of this pattern will yield a reference model that allows developers to examine and understand existing mashups, and also to design new ones on a conceptual level.

In order to put the conclusions drawn here into a particular context and avoid confusion, the discussion of mashups will be limited to the following properties in the remainder of this chapter, unless stated otherwise. While these restrictions apply to all mashups reviewed in the previous survey (cf. Section 3), they do not constrain a mashup definition in general.

Mashups are software systems that aggregate knowledge (information) or skills (functionality) from two or more sources providing capabilities that differ in their purpose. They consume and provide capabilities in human and/or machine consumable formats over the Web. The additional value created by mashups results from aggregating the essence of the capabilities, not from sophisticated operations defined by the mashup and executed on top of the capabilities.
Mashups further provide continuous views on dynamic capabilities obtained in real time, rather than working as transformation tools that take static data as input and generate statically used output.

4.1 The Mashup Ecosystem

The term ecosystem originates from biology, where it denotes the community of interacting organisms and their physical environment, and has been adopted in many fields of academia and industry. In accordance with this definition, a mashup ecosystem describes the mashup, the entities interacting with mashups, and their physical environment. Entities are to be understood as roles. Figure 8 depicts the entities of the mashup ecosystem, namely capability provider, mashup, and mashup consumer, and the way they interact with each other.

Figure 8: The Mashup Ecosystem, depicting the entities mashup consumer, mashup, and capability provider (FMC) [diagram elements: mashup site, mashup specification, mashup application, execution context, user agent, user, expertise provider, capability]

Capability Providers offer some kind of expertise, generally distinguished into skill, i.e., functionality, and knowledge, i.e., information. The entity that possesses domain expertise may not intend to maintain the technical exposure of the corresponding capability19 on the Web. However, mashups create value by composing these very capabilities. An entity may act as an intermediary and provide a capability on behalf of the expertise provider. Unless stated otherwise, expertise provider and technical provider are considered the same entity hereafter, referred to as capability provider.

Mashups play the principal role in the mashup ecosystem, comprising the mashup site, which stores a mashup’s specification, and the mashup application instances with their execution context (cf. Section 3.3).
The mashup site is a centralized storage that allows the mashup specification to be delivered over the Web, offering it as Software as a Service (cf. Section 3.5.4). The execution context of the mashup may differ from the mashup site [Merrill, 2006]. Many organic mashups are executed centrally at the mashup site, providing aggregated content to the client, whereas dashboards are typically delivered as specifications to, and executed entirely within, the user’s Web browser. Mashup execution may also span both locations.

19 Capabilities are modeled as active entities, according to Section 4.2.

Mashup Consumers define the needs that mashups are meant to satisfy. They represent the target group a mashup serves, and it is often the users who create the mashup in the first place (cf. Section 3.5.1); developers are most likely end users themselves. In general, end users are represented to the mashup by an application that makes mashups human consumable: the user agent, primarily a browser. In turn, mashups may themselves consume other mashups, which then act as capability providers.

In a nutshell, these entities interact as follows.
Capability provision: Some entities offer capabilities. The capabilities are accessible through an interface that is understood by the mashup application and can be discovered and explored by mashup developers.
Mashup creation: After identifying their needs, developers select the capabilities that offer beneficial expertise and create the mashup by programming, using tools, or both20. The mashup is stored as a description of its task: the mashup specification.
Mashup execution: When a mashup is requested to fulfill its purpose, a mashup application instance is initialized. It accesses the specified capabilities, obtains the offered expertise, be it knowledge or skill, combines it according to its specification, and returns the mashup result to the consumer.
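The three interactions can be illustrated with a minimal model in which the mashup specification is plain data (which capabilities to access and how to combine them) and executing it accesses the capabilities only when a result is requested. All names in the Python sketch below, including the capability registry, are hypothetical and serve only to separate specification from execution.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Capability provision: providers register capabilities under a name.
# Here a capability is modeled as a callable returning its current content.
CAPABILITIES: Dict[str, Callable[[], Any]] = {}


def provide(name: str, capability: Callable[[], Any]) -> None:
    CAPABILITIES[name] = capability


@dataclass(frozen=True)
class MashupSpecification:
    """Mashup creation: a stored description of the mashup's task."""
    capability_names: Tuple[str, ...]
    combine: Callable[..., Any]


def execute(spec: MashupSpecification) -> Any:
    """Mashup execution: instantiate the specification when it is requested."""
    # Capabilities are accessed only now, so each result reflects current content.
    contents = [CAPABILITIES[name]() for name in spec.capability_names]
    return spec.combine(*contents)


# Usage with two hypothetical capabilities; "news" changes between requests.
_requests = {"news": 0}


def headlines() -> list:
    _requests["news"] += 1
    return [f"headline {_requests['news']}"]


provide("news", headlines)
provide("weather", lambda: "sunny")
spec = MashupSpecification(("news", "weather"),
                           lambda news, weather: {"news": news, "weather": weather})
first_result = execute(spec)
second_result = execute(spec)
```

The same specification yields different results on each execution, because content is obtained on demand rather than stored with the specification.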
The environment that encompasses mashups is the Web in its entirety, including private networks. The latter are usually corporate networks that shield valuable and confidential information from unauthorized access from outside. That mashups live on the Web is considered one of their beneficial aspects: as outlined in Section 3.5.4, mashups are characteristically Software as a Service and do not need any installation on the client side, which makes them accessible from everywhere.

According to [Berners-Lee, 1996], interaction on the Web follows a uniform interface that is implemented by various protocols, in particular HTTP [Fielding et al., 1999], today’s most common protocol for accessing resources on the Web. This environment is considerably influenced by the effects of Web 2.0, including the growing popularity of the REST architectural style described in Section 2.3.2, as the following sections will emphasize. However, since there is no strict definition of mashups, they may and do employ patterns that do not comply with the constraints REST imposes.

20 Mashup creation in its various forms is depicted abstractly through write access to the mashup specification in Figure 8.

4.2 Capabilities—Essential Mashup Enablers

[Hinchcliffe, 2006] aptly states that mashups aggregate “whatever needs to be aggregated”. The continuum of capability types is wide and most often refers to websites and content syndication feeds. Without loss of generality, this continuum can be summarized as the set of capabilities that are exposed within the environment of the mashup ecosystem. This is supported by the observations of the survey, in particular in Section 3.5.6. Capabilities are accessible via a rather simple interface: a request message sent to an Internet address is answered with the transmission of a hypertext document that represents the capability’s expertise.
This characterization of a capability matches the definition of a resource in REST [Fielding, 2000]: “the intended conceptual target of a hypertext reference”. The reason is rather simple. REST is not only a desirable architectural style for large-scale, distributed applications; it also describes the very architectural properties that have become the foundation of the modern Web architecture and contributed largely to its technical success [Fielding et al., 2002]. Therefore, most existing capabilities, such as Web sites, are implicitly resources.

Resource Orientation. Resources in the context of REST are generic by nature and thus retain application independence. This promotes reuse at the granularity of the resource’s very expertise, rather than of any domain logic built on top. By offering access under these assumptions, resources share their state with external entities. This state may be exposed for consumption only, or be offered for manipulation by consumers. Exposing capabilities in the context of resource orientation adds value beyond the costs involved, offering low entry barriers, Internet-size scaling, and instant deployment [Overdick, 2007]. This hypothesis is backed by the many RESTful APIs published by commercial and non-commercial resource providers. According to ProgrammableWeb, 66% of the more than 1150 APIs registered at the time of writing are based on the REST style.²¹

One of the most important elements of resource orientation in distributed hypertext systems, if not the most important one, is the exchange of the resource’s state through a representation, a semi-structured hypertext document [Fielding et al., 2002]. Instead of manipulating data through one of the various incarnations of remote function calls, the information set representing a resource’s state is moved to the processor (cf. Section 2.3.2).
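The transfer of representations, as opposed to remote function calls, can be illustrated with a minimal Python sketch. It models a resource that offers only a uniform interface of two methods, `get` and `put`, which exchange JSON documents. The `Resource` class, the example URI, and the choice of JSON are assumptions for illustration; a real resource would be reached over HTTP.

```python
import json


class Resource:
    """Hypothetical in-memory resource with a uniform interface.

    Consumers never call domain-specific operations; they only transfer
    representations (here: JSON documents) with get() and put().
    """

    def __init__(self, uri, state):
        self.uri = uri
        self._state = dict(state)

    def get(self):
        # Serialize the state into a representation that also links
        # back to the resource itself.
        doc = dict(self._state, links={"self": self.uri})
        return json.dumps(doc, sort_keys=True)

    def put(self, representation):
        # Replace the state with the complete representation sent back.
        doc = json.loads(representation)
        doc.pop("links", None)
        self._state = doc


# The consumer moves the state to itself, processes it locally,
# and transfers a modified representation back.
offer = Resource("http://example.org/offers/42", {"title": "Bike", "price": 100})
local = json.loads(offer.get())
local["price"] = 90
offer.put(json.dumps(local))
```

The consumer does not need an operation such as a hypothetical `lower_price()`. Any processing happens on the transferred state, and this keeps the resource itself application independent.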
Hypertext, and its superset hypermedia, allows transmitting semantically structured data along with references to related resources. Thus, representations contain arbitrarily structured content that may or may not hide information about the private state of the resource. Representations can further include directives to process information, such as JavaScript, and directives to render the content, such as Cascading Style Sheets (CSS). Representations therefore comprise data, logic, and presentation.

21 http://www.programmableweb.com/apis

While most of the consumed capabilities center on data delivery (cf. Sections 3.5.7 and 3.5.8), resources also offer behavior beyond delivering data. Yahoo! Pipes, for instance, offers a module that extracts location information from free text.²² Other services even allow changing the state of resources. The intended behavior is expressed via a set of exposed methods²³ that allow updating the resource’s state (cf. Section 2.3.2).

Web APIs. Capabilities on the Web are often referred to as Web APIs. Like the term mashup, Web API lacks a crisp definition. A Web API can be considered to express a particular behavior by providing a Web-oriented programming interface. Thus, Web API refers to one resource, or a set of related resources, that exposes behavior and representations as described above, with the explicit intention of being used as a service interface by other applications. Some capabilities are offered through intermediaries, because the owner of the domain expertise is not willing to expose that expertise directly. Very often, such an intermediary employs Web APIs to provide the expertise, explicitly offering it for reuse. Within the last two years, many organizations operating on the Web have started to offer Web APIs as an explicit alternative for consuming their expertise in applications, rather than by screen-scraping their Web sites.
Consequently, Web APIs became an effective means to manage governance among capabilities (cf. Section 3.5.8). A common and widely adopted open standard for Web APIs is the Atom Publishing Protocol (AtomPub). AtomPub leverages the Atom Syndication Format to represent collections of data items in semi-structured documents. It applies an interaction protocol that enables humans and machines to create and update content. By the very definition of the protocol, AtomPub interaction obeys the constraints of REST almost purely, and it has thus received much attention. Google’s Data APIs²⁴, for instance, are based on and extend AtomPub.

22 http://pipes.yahoo.com/pipes/docs?doc=location
23 HTTP refers to these methods as verbs [Fielding et al., 1999].
24 http://code.google.com/apis/gdata/

Web APIs are not only located at the level of the resource’s interface. More sophisticated APIs may expose functionality that is executed locally to the mashup, providing skill and knowledge by accessing further resources under the hood. The Google Maps API is an example of this. It is implemented in JavaScript and offers functionality at the program code level. This functionality provides the means to draw maps, place location pointers on them, and enrich these pointers with related information. The API’s implementation autonomously accesses and loads geographic data and images from Google’s servers.

4.3 The Mashup Pattern

Due to their bottom-up evolution, mashups naturally share few commonalities beyond the high-level characteristics derived from the survey in Section 3.5. As rapid implementations of specific needs, mashups are remarkably manifold in their architecture and in the way they consume capabilities. However, the number of reviewed mashups allows deriving a pattern that consists of three activities taking place within a mashup to aggregate disparate capabilities and create value.
Figure 9 illustrates this pattern and places it in the context of the mashup ecosystem described above.

Figure 9: Overview of the Mashup Pattern (BPMN). Participants: Capability Provider, Mashup, Mashup Consumer. Activities: Send Request, Display Response, Ingest, Request Expertise, Provide Expertise, Normalize Response, Augment, Publish.

The three activities are ingestion, augmentation, and publication. The ingestion phase accesses capabilities and prepares them for further processing. The capabilities are combined and aggregated during augmentation, and the resulting content is repackaged and delivered to the client in the publication phase. Within the diagram, the main activities are ordered by causal dependency. As the further discussion will reveal, the actual ordering may vary between mashups. Some ingestion may happen before, during, and even after augmentation, depending on the specific requirements and implementation of a mashup.

4.3.1 Ingestion

Ingestion is the act of harvesting heterogeneous capabilities that are spread across networks and encapsulating them to facilitate their use by a single application: the mashup. Ingestion thereby serves two purposes.

On the one hand, ingestion acts as a connector to remote capabilities. As explained previously, the majority of capabilities are explicitly or implicitly offered as resources in terms of REST. However, some capabilities are not exposed by the same means. Databases, legacy systems, and proprietary documents such as spreadsheets, to name a few, are valuable capabilities, especially in a corporate setting. Such capabilities may be exposed as part of a general effort to service-enable knowledge assets within organizations, but it is just as likely that they are not. Specific ingestion connectors that access the corresponding systems and obtain the capabilities are thus required. If capabilities are Web APIs, the corresponding ingestion component must understand and obey the interaction protocol of these APIs.
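Ingestion connectors for capabilities that are not exposed as Web resources can be sketched with Python’s standard library. The class names `SqlConnector` and `CsvConnector`, their shared `fetch()` interface, and the sample data are all hypothetical. The sketch only shows how a legacy database and a spreadsheet can become available to a mashup through one uniform call.

```python
import csv
import io
import sqlite3


class SqlConnector:
    """Hypothetical connector for a database that is not a Web resource."""

    def __init__(self, connection, query):
        self.connection, self.query = connection, query

    def fetch(self):
        cursor = self.connection.execute(self.query)
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


class CsvConnector:
    """Hypothetical connector for a spreadsheet exported as CSV text."""

    def __init__(self, text):
        self.text = text

    def fetch(self):
        return [dict(row) for row in csv.DictReader(io.StringIO(self.text))]


# A stand-in legacy database and a spreadsheet, both reachable through fetch().
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, city TEXT)")
db.execute("INSERT INTO customers VALUES ('ACME', 'Potsdam')")
connectors = {
    "crm": SqlConnector(db, "SELECT name, city FROM customers"),
    "sales": CsvConnector("name,revenue\nACME,1200\n"),
}
records = {name: c.fetch() for name, c in connectors.items()}
```

The CSV connector still yields the revenue as the string `"1200"`. Converting such values into an agreed format is the second purpose of ingestion, normalization.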
RESTful APIs greatly increase reusability beyond anticipated use cases by providing a uniform yet versatile interface. As Section 3.5.8 indicates, resources that allow the manipulation of shared state are rather rare. The reason is that browsers lack adequate capabilities for secure cross-domain communication (cf. Section 2.3.3).

On the other hand, ingestion serves as a primer for the representations obtained from the capability providers. In most cases, the relevant asset of a representation is data. Capabilities and access methods are broadly diverse, and data is delivered in an equally broad variety of data types and formats (cf. Section 3.5.6). Some content is not provided in formats primarily intended for machine consumption, for example human-oriented websites, emails, and spreadsheets. Such content requires more sophisticated information extraction systems, discussed further in [Kayed and Shalaan, 2006]. One of the most controversial methods to extract accurate information is screen scraping, i.e. using algorithms and templates to extract content from semi-structured documents intended to be rendered in a Web browser. This approach is prone to faults due to ever-changing site structures, yet [Alba et al., 2008] underline its significance in present scenarios. Formats designed for machine consumption promise much better performance, yet their adoption is still lagging. Examples are syndication formats such as Atom and RSS, and semantically structured documents, e.g. in RDF. RDF in particular requires considerable knowledge of the format and its semantics to be used effectively. After these resources have been successfully unlocked, the exchanged information must be normalized into an agreed format that supports the operations of the augmentation phase.
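Normalization of syndication formats into an agreed format can be illustrated with a minimal sketch using Python’s `xml.etree.ElementTree`. The agreed format here, a list of `{'title', 'link'}` dictionaries, is an assumption. Only a small subset of RSS 2.0 and Atom 1.0 is handled, and the sample feeds are made up.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def atom_link(entry):
    # RFC 4287: a link without a rel attribute is an 'alternate' link.
    for link in entry.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def normalize(document):
    """Map an RSS 2.0 or Atom 1.0 document onto the agreed item format:
    a list of {'title': ..., 'link': ...} dictionaries."""
    root = ET.fromstring(document)
    if root.tag == "rss":
        return [{"title": item.findtext("title", ""),
                 "link": item.findtext("link", "")}
                for item in root.iter("item")]
    if root.tag == ATOM + "feed":
        return [{"title": entry.findtext(ATOM + "title", ""),
                 "link": atom_link(entry)}
                for entry in root.iter(ATOM + "entry")]
    raise ValueError(f"unsupported format: {root.tag}")


rss = ("<rss version='2.0'><channel><item><title>Offer</title>"
       "<link>http://shop.example/1</link></item></channel></rss>")
atom = ("<feed xmlns='http://www.w3.org/2005/Atom'><entry><title>Post</title>"
        "<link rel='edit' href='http://blog.example/edit/2'/>"
        "<link href='http://blog.example/2'/></entry></feed>")
items = normalize(rss) + normalize(atom)
```

After normalization, augmentation operations can treat both sources alike, without knowing which format each item originally came in.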
Functionality may need to be wrapped to comply with specific interfaces. It may also need to be analyzed and disarmed to prevent malicious code from harming users or other capabilities, as in [Isaacs and Manolescu, 2008].

Because potentially mashable capabilities must be discovered and explored in the first place, mashup tools often include catalogues or repositories that let users and developers browse through a set of capabilities. In most cases, the tools already provide ready-made ingestion components for them. Some tools even offer facilities to import capabilities intended for other tools or mashups [Gurram et al., 2008]. ProgrammableWeb provides a freely available, well-documented catalogue of over 1000 Web APIs that are offered on the Web and consumed by diverse mashups.

4.3.2 Augmentation

Augmentation denotes the application of competence to a set of capabilities, which comprise knowledge and skill, with the objective of value co-creation. Augmentation aggregates the capabilities that were obtained and normalized during the ingestion phase. Here, aggregation stands universally for any means of combining these capabilities in meaningful ways to create value while retaining their essence. The competence, often referred to as the “glue” that puts the pieces together, is provided by the domain expert. As outlined in Section 2.2.3, this is the mashup developer and user, who may in fact be the same person. Mashup applications stand out due to the relatively small amount of work involved in creating them, that is, in automating the application of competence. Here, “relatively small” compares the effort with that of traditional integration of data and functionality.

Augmentation is the main focus of the literature concerning mashups.
Some academic papers focus on the interaction of software components, for instance [Gurram et al., 2008, López et al., 2008], whereas others consider data aggregation the central matter, for example [Morbidoni et al., 2007, Riabov et al., 2008, Simmen et al., 2008]. In general, this allows distinguishing two main strategies of capability augmentation, as suggested in [Abiteboul et al., 2008]: augmentation through the interaction of components, and augmentation through chaining a set of operations. Both approaches leverage components that may provide information, functionality, or both. For both strategies, it is irrelevant where functionality is provided. It may be provided locally, i.e. as an executable component within the execution environment of a mashup, or remotely, i.e. as a service provided through a resource.

The first approach considers a set of interacting components that are autonomous to some extent. Each component provides certain expertise, some of them by encapsulating external capabilities. Augmentation happens through interaction among these components, which connects data from one component with that of another and thereby creates value. This usually happens in response to user interaction, after the mashup application has been delivered to the user and initialized. This type of augmentation, often called assembly or wiring, is generally used in dashboards, where widgets interact as a result of user interaction with the dashboard application [Hoyer and Fischer, 2008]. For such interaction, the execution context of the mashup application must provide communication channels that enable interaction among the components. The architectural design pattern most often used to enable interaction without closely coupling components is the publisher-subscriber pattern [Yu et al., 2008]. Due to its interactive behavior, this type of augmentation is suitable for exploring information sets and discovering insights among them.
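Wiring widgets through the publisher-subscriber pattern can be sketched in Python as follows. The `EventBus`, the three widget classes, and the topic name `"city.selected"` are hypothetical and do not belong to any dashboard product. The sketch shows that a user selection in one widget updates the others, although no widget holds a reference to another.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish-subscribe channel: widgets only share topic names,
    never direct references to each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the event to every subscriber of the topic.
        for callback in list(self._subscribers[topic]):
            callback(payload)


class ContactListWidget:
    # Publishes a contact's city when the user selects the contact.
    def __init__(self, bus, contacts):
        self.bus, self.contacts = bus, contacts

    def select(self, name):
        self.bus.publish("city.selected", self.contacts[name])


class MapWidget:
    # Re-centers its map on whatever city is published.
    def __init__(self, bus):
        self.center = None
        bus.subscribe("city.selected", self.show)

    def show(self, city):
        self.center = city


class WeatherWidget:
    # Shows the forecast location for whatever city is published.
    def __init__(self, bus):
        self.location = None
        bus.subscribe("city.selected", self.update)

    def update(self, city):
        self.location = city


bus = EventBus()
map_widget, weather_widget = MapWidget(bus), WeatherWidget(bus)
contacts = ContactListWidget(bus, {"Alice": "Potsdam", "Bob": "Berlin"})
contacts.select("Bob")  # simulated user interaction in the dashboard
```

Widgets can be added to or removed from the dashboard without changing the others, which is why this loose coupling suits dynamic widget sets.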
The second type of augmentation, chaining a set of operations, is more interesting from an academic point of view and is thus much better covered by scientific work. In contrast to the first approach, aggregation happens before content is delivered to the user. Much research has focused solely on the integration of several data capabilities, creating new information by combining separate yet related information. Many different methods have been developed to achieve this goal. The most prominent is the pipes-and-filters pattern [Hohpe and Woolf, 2003], in which operational filters are chained to create a complex process of data transformation. This pattern has been applied by [Simmen et al., 2008], as well as in Yahoo! Pipes and Microsoft Popfly. More formal work includes query languages to aggregate data capabilities [Jarrar and Dikaiakos, 2008, Tatemura et al., 2007]. [Riabov et al., 2008] describe automatic data composition through the definition of goals. The details of the particular aggregation approaches are well covered in existing work. They are thus beyond the scope of this thesis, which aims at a holistic picture of mashups.

In essence, a general metaphor can be deduced from these particular approaches. Augmentation is an ordered set of operations performed in coordination, i.e. an execution plan [López et al., 2008], which operates on collections of semi-structured information items, generally syndication feeds. These operations comprise the “glue”, i.e. basic functionality derived from relational algebra, as well as functionality exposed through resources, locally or remotely as described above.

An important issue in the aggregation of hypertext is the significance of links, which express application logic in a broader sense (cf. Section 2.3.2). On the Web, hypertext is the engine of application state. Links identify related information, e.g. directives to render or to process information, as well as control paths that advance application state.
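The execution-plan metaphor can be sketched in Python as a list of filters composed with `functools.reduce`. Each filter is a basic relational operation (selection, projection, equi-join) applied to collections of feed-like items. The operator names and the sample data are hypothetical. This is a generic illustration of pipes and filters, not the specific approach of [Simmen et al., 2008], Yahoo! Pipes, or Popfly. The projection deliberately keeps the links of both sources, so that the origin of each piece of information remains traceable.

```python
from functools import reduce


def select(predicate):
    # Selection: keep only the items that satisfy the predicate.
    return lambda items: [i for i in items if predicate(i)]


def project(*fields):
    # Projection: keep only the named fields of every item.
    return lambda items: [{f: i[f] for f in fields if f in i} for i in items]


def join(other, key):
    # Equi-join: merge each item with the items of `other` sharing `key`;
    # fields of the incoming item win on conflicts.
    index = {}
    for o in other:
        index.setdefault(o[key], []).append(o)
    return lambda items: [{**o, **i} for i in items for o in index.get(i[key], [])]


def run_plan(plan, items):
    # Pipes and filters: every step consumes the output of its predecessor.
    return reduce(lambda data, step: step(data), plan, items)


events = [
    {"city": "Potsdam", "title": "BPM Day", "link": "http://events.example/1"},
    {"city": "Berlin", "title": "Web Meetup", "link": "http://events.example/2"},
]
weather = [
    {"city": "Potsdam", "forecast": "sunny",
     "weather_link": "http://weather.example/potsdam"},
]
plan = [
    join(weather, "city"),
    select(lambda i: i["forecast"] == "sunny"),
    project("title", "forecast", "link", "weather_link"),  # keep both links
]
result = run_plan(plan, events)
```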
Because links carry application logic, the aggregation of hypertext must consider their meaning and ensure their integrity. Carrying links along during the aggregation of documents increases data quality and trust, since the end user can learn about the origin of the information.

4.3.3 Publication

Publication is the act of transforming augmentation results into meaningful data formats and delivering them to the recipient of the mashup. The result of the augmentation phase may consist of artifacts of arbitrary type. Examples are the compilation of a set of algorithms, enhanced collections of data items, or data sets along with processing and display directives. These results must be transformed into a specific representation. This representation must be understood by the recipient, the mashup consumer. It must also be serialized for transport to the client via a transport protocol, typically HTTP (cf. Section 3.5.4). The transformation of augmentation artifacts into a representation follows one of three options, discussed in detail in [Fielding et al., 2002]: (1) render the data into a fixed-format image, thus hiding internal data, (2) encapsulate the content with processing and rendering instructions, or (3) send raw data along with metadata expressing the data type and format. All three options allow delivering information as well as functionality. For instance, option (1) applied to functionality yields a compiled and executable software component, such as a Java browser applet. In general, publication produces a hybrid representation of all three options. This is hypertext that contains or refers to data, logic, and presentation directives, similar to the representations obtained from capabilities (cf. Section 4.2). The actual representation provided by publication depends on the demands of the mashup consumer.
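A publication step that adapts its representation to the consumer can be sketched as simplified HTTP content negotiation. The function names are hypothetical, and the `Accept` parsing is deliberately minimal. It honors q-values but ignores partial wildcards such as `text/*` and all other parameters. Only options (2) and (3) from the list above are shown; rendering a fixed-format image is omitted.

```python
import json
from html import escape


def parse_accept(header):
    # Simplified Accept parsing: media ranges ordered by q-value (default 1.0);
    # ties keep header order and ranges with q=0 are dropped.
    ranked = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if q > 0:
            ranked.append((q, media))
    return [media for _q, media in sorted(ranked, key=lambda r: -r[0])]


def publish_representation(items, accept):
    """Return (content_type, body) for the first acceptable media type."""
    for media in parse_accept(accept):
        if media in ("application/json", "*/*"):
            # Option (3): raw data, typed by its media type.
            return "application/json", json.dumps(items)
        if media == "text/html":
            # Option (2): content wrapped in rendering instructions.
            rows = "".join(
                f'<li><a href="{escape(i["link"])}">{escape(i["title"])}</a></li>'
                for i in items)
            return "text/html", f"<ul>{rows}</ul>"
    raise ValueError("406 Not Acceptable")
```

The same augmented items can thus be served as HTML to a browser and as JSON to another mashup that reuses them as a capability.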
While the mashup consumer is a human in most cases, [Ankolekar et al., 2007] advocate that mashups, as consumers of reusable capabilities, should offer their own capabilities for machine consumption as well. Thus, a mashup may provide different types of publication to address different users and devices. This reuse of reused capabilities establishes a circular sequence of the activities that make up the mashup pattern. [IBM Corporation, 2008] labels this cyclic workflow “End-to-End Mashup Workflow”, depicted in Figure 10.

Figure 10: End-to-End Mashup Workflow. Ingest: discover and mix capabilities; Augment: assemble application, explore different combinations and insights; Publish: share value and offer for reuse.

4.4 The Mashup Reference Model

The term reference model denotes an abstract framework for understanding the significant entities within a certain environment and the relationships among them. Reference models do not make any assumptions about technology, platforms, or architectures. The increasing popularity of mashups and the rising interest of companies led several institutions to propose reference architectures for mashups, among them [Bradley, 2007, López et al., 2008]. However, these proposals approach mashups on a technical level, although the different mashups share no particular technical denominator. Such attempts limit the variety and flexibility of mashups, and thus their potential. [Abiteboul et al., 2008] provide an abstract model that formally defines mashups on the basis of relations describing the interfaces of mashup components, so-called mashlets. This model retains the potential of the mashup variety, but it is too formal, comprising a complicated mashup specification language. As argued throughout this thesis, mashups grew bottom-up. They gained momentum through their high degree of simplicity and the freedom to discover new, unanticipated, ad-hoc combinations of capabilities.
Thus, a reference model needs to address that freedom and be limited to a logical level rather than an architectural or technical one. The reference model presented here picks up the component orientation of mashups presented in [Abiteboul et al., 2008] and relaxes it. It also overcomes the shortcomings of earlier approaches to define a reference architecture, such as [Bradley, 2007, López et al., 2008]. For this purpose, the model advances the observed mashup pattern and provides a structural composition approach that helps in understanding the processes taking place inside a mashup. The model allows the logical decomposition of existing mashups into the parts identified by the mashup pattern. It is also a first step towards designing new mashups on a conceptual level, and it contributes directly to the development of the proof of concept presented in Section 6.

Figure 11: Mashup Reference Model (UML), set within the REST style. Elements: Resource, Mashup, Mashup Component, Ingestion Component, Augmentation Component, Publication Component, Representation; associations: consume, deliver.

In line with the observations made in Section 4.2, the REST architectural style is chosen as the reference system. This keeps the model accurate within a specific context without loss of generality. Consequently, a mashup is considered a resource that accesses capabilities in the form of representations and provides its own state as a representation. The reference model and the REST context are depicted in Figure 11. The model is a metamodel: a concrete mashup model is an instance of the mashup reference model. The reference model does not necessarily describe architectural structures of mashups, but rather logical ones. In particular, it by no means enforces an architectural separation of components. According to the reference model, a mashup consists of a set of components.
These components are classified according to their role within the mashup pattern: ingestion, augmentation, or publication. Each component provides knowledge or skill, i.e. information or functionality to process information. The components are connected through delivery-consumption associations that represent data flow channels between them. The filled triangle (▶) denotes the direction in which data flows along a channel. While there is no general restriction, these channels typically follow a request-response pattern: a component requests data from a preceding component and consumes the corresponding response. Control flow is thus established implicitly through data flow.

This kind of interaction covers both strategies of augmentation discussed in Section 4.3.2. It allows semi-autonomous components to interact in arbitrary ways as a result of external events. It also serves to describe processes that aggregate capabilities in preparation for content delivery to the user. In the latter case, an instance of the reference model can be perceived as a process model that describes the process of value co-creation inside a mashup. The following section discusses each component and its relations to the other component types in detail.

4.4.1 Reference Model Components

Ingestion. Ingestion components are connectors to access resources. In general, one can assume one ingestion component per consumed capability. If the corresponding capability provides data, ingestion components consume this information and normalize it. Otherwise, they encapsulate the functionality provided by a capability, so that it does not matter whether the functionality is executed remotely or retrieved and executed locally. Ingestion components only consume representations of capabilities; they do not receive data flow from other mashup components.
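The request-response channels of the reference model can be illustrated with a small Python sketch. One class is defined per component role, and a publication component pulls data from upstream components only when it is requested itself. The class names mirror the component roles, but the code is a hypothetical illustration, since the reference model is explicitly logical rather than architectural. The `calls` list records when capabilities are accessed, which shows that control flow follows the data flow.

```python
class Component:
    """Base of all mashup components: a component delivers data only when
    a downstream component requests it (request-response)."""

    def request(self):
        raise NotImplementedError


class Ingestion(Component):
    # Consumes the representation of one capability and normalizes it;
    # it never receives data flow from other mashup components.
    def __init__(self, fetch, normalize):
        self.fetch, self.normalize = fetch, normalize

    def request(self):
        return self.normalize(self.fetch())


class Augmentation(Component):
    # Accepts any number of incoming channels from ingestion or
    # augmentation components and applies one operation to their data.
    def __init__(self, operation, *upstream):
        self.operation, self.upstream = operation, upstream

    def request(self):
        return self.operation(*(u.request() for u in self.upstream))


class Publication(Component):
    # User endpoint: turns augmented data into a representation.
    def __init__(self, serialize, upstream):
        self.serialize, self.upstream = serialize, upstream

    def request(self):
        return self.serialize(self.upstream.request())


calls = []  # records when capabilities are actually accessed


def capability(name, text):
    # Hypothetical capability provider returning a comma-separated list.
    def fetch():
        calls.append(name)
        return text
    return fetch


def parse(text):
    return [int(x) for x in text.split(",")]


prices = Ingestion(capability("shop-a", "3,1"), parse)
offers = Ingestion(capability("shop-b", "2"), parse)
merged = Augmentation(lambda a, b: sorted(a + b), prices, offers)
page = Publication(lambda xs: " < ".join(map(str, xs)), merged)

before = list(calls)     # nothing has been accessed yet
result = page.request()  # the request propagates upstream
```

No capability is accessed until the publication component is requested. The request then travels upstream, and the data flows back downstream through the same channels.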
However, ingestion components do not necessarily run before any augmentation happens. They may also be requested to access functionality from external resources as part of the augmentation phase. Governance mechanisms are generally implemented by ingestion components. Since one ingestion component accesses one particular resource, it knows the interaction protocols as well as any authentication and authorization measures. REST explicitly promotes the presence of transparent intermediaries [Fielding et al., 2002]. Ingestion components may therefore also employ caches to reduce network traffic and shorten response times, thus providing the courteous access discussed in Section 3.5.8.

Augmentation. Components that operate under the control of augmentation are generally richly structured. Depending on the granularity of composition, each operation applied to data could be expressed as a separate augmentation component. Accordingly, augmentation components consume data delivered by ingestion components or by other augmentation components. The number of incoming channels of an augmentation component is not restricted, which allows effective aggregation of information retrieved through disparate ingestion components. Augmentation components can also incorporate logic that affects the control flow within the augmentation phase. Augmentation thus comprises a network of components that provides the value co-creation of a mashup and contains all required domain knowledge. Augmentation components may be executed at the mashup site, locally at the client, or be spread among client and server. The execution context of a mashup then spans physical and network boundaries. As mentioned earlier, ingestion components may be invoked during augmentation, e.g. to retrieve related information or to load functionality on demand.

Publication. Publication comprises the transformation steps required to deliver aggregated value to the user, i.e.
creating a form of representation that can be rendered and displayed by the user agent. In most cases, this involves constructing semi-structured or structured documents according to a data format. Publication components generally consume data from augmentation components, but they may request capabilities through ingestion components as well, e.g. if a capability provides means for data transformation. An example is the Google Charts API²⁵, which provides a service to easily create a diagram from a statistical data set.

25 http://code.google.com/apis/chart/

Since publication components are considered the user endpoint of mashups, they may implement a set of governance mechanisms as well. This includes, but is not limited to, handling the authentication and authorization of users who access a mashup, or employing caches that reduce traffic and resource consumption.

4.4.2 Organic Mashups and Dashboards

The mashup reference model further allows relating the two types of mashups introduced in Section 3.4, organic mashups and dashboards, to each other. An organic mashup can be considered an arbitrary composition of mashup components following the model described above. This is the reason for their name: they are developed organically. Dashboards, in contrast, can be considered mashups that combine a set of other mashups. Figure 12 depicts this circular relationship, which was already pointed out in Section 4.3.3.

Figure 12: Organic Mashups and Dashboards according to the Mashup Reference Model (UML). Elements: Mashup, Organic Mashup, Dashboard, Widget.

Dashboards consist of a dynamic set of widgets, and each widget interacts with the user on its own. This suggests that widgets may be constructed logically from the same components as mashups in general.
Widgets incorporate their own publication components, employ ingestion components to access remote capabilities, and may perform operations in between. However, unlike mashups as defined here, widgets do not necessarily aggregate more than one capability. They may instead provide an alternative interface to a single capability. Nevertheless, they still conform to the mashup reference model in a relaxed way.

5 Application of Mashups for Business Process Management

Business process management traditionally addresses two things. The first is the operations, and their relationships, that form a value-creating process inside an organization. The second is the systems that provide the capacity to conduct these activities and processes. These processes and their representations, process models, are central knowledge assets of an organization, since they comprise the expertise to create value out of a set of resources. Due to their importance to the organization, business processes are carefully crafted and highly governed. As a result, designing, implementing, and executing a business process requires much effort.

The observations and conclusions of the preceding chapters have shown typical characteristics of mashups (cf. Section 3.5) and their value in organizational environments (cf. Section 3.4). This chapter analyzes to what extent mashups can contribute to business process management. A successful application of mashups has to address the main goals of business process management, which are therefore recapitulated first. The business process life cycle is a useful tool for understanding the objectives and concepts of business process management. It is also well suited for scoping potential mashup scenarios here. Therefore, this life cycle is explained before the value proposition of mashups for business process management is examined in different settings.

Like the earlier discussions and explanations, this chapter does not assume a specific technical infrastructure for its examinations.
There is an apparent and growing trend within organizations to enable software systems for Web access. However, examining such an infrastructure would exceed the scope of this work, which focuses on the conceptual application of mashups for business process management. A more concrete insight into the technical realization of a mashup is given in the next chapter, which provides a proof of concept for the observations made.

5.1 Goals of Business Process Management

To understand the potential of mashups for business process management, the main goals of business process management are summarized first. In order to contribute to business process management, mashups need to address these goals in a constructive way.

Understand the Operations of an Organization. The most important goal of business process management is to gain insight into the operations of an organization and their relations [Weske, 2007]. It answers the question of how an organization works in a holistic way, yet in enough detail to address the operational business. The core concept of this understanding is an explicit representation of the process through a model. Such a representation is only of significant use if it creates common understanding among stakeholders and allows them to review and improve the represented process. Semantically rich and syntactically strict representations support formal model checking, i.e. verification of the soundness of process models. Graphical representations have proven very usable in stakeholder reviews, because visual representations facilitate understanding.

Implement Processes in an Enterprise Environment. Processes need to be implemented within an organizational and technical environment once existing processes have been understood and captured in models, or once processes have been redesigned or newly established.
That means they need to be integrated into, and executed in, a controlled context that reduces proneness to errors and provides effective handling of exceptions. Existing systems may be rather information-centric. Integrating them through reuse reduces the gap between concrete business processes and their realization in software systems. Implementation into the organizational environment, that is, the efficient and effective coordination of resources, is as important as implementation into an IT systems landscape. Process instances interact with systems and with human participants. In an organizational setting, persons are represented by their competences and positions, the characteristics that primarily define a person’s activities within a process. Increasing the efficiency of a process also requires increasing the efficiency of participant interactions.

Establish Flexibility. Besides the explicit establishment of processes and their realization, the key operational goal of business process management is flexibility. The ability to adapt to changing influential factors is required along several dimensions. In today’s rapidly changing market dynamics, affected processes must be adjusted quickly to prevent a loss of market share. Improvements to the process itself are equally important. Furthermore, systems managing business processes must be able to adapt to changing technical conditions, such as changes in the organization’s IT landscape, without affecting the business process itself.

5.2 Business Process Life Cycle

The business process life cycle, presented in [Weske, 2007], constitutes the backbone of business process management. It comprises four phases that form a cycle: design and analysis, configuration, enactment, and evaluation.
Process stakeholders are related to these phases according to their specific tasks and responsibilities [Decker, 2008], as shown in Figure 13. Each phase and its function are described briefly below. Sometimes an additional activity is considered in the context of the business process life cycle: administration, which comprises the storage, management, and efficient retrieval of the numerous information artifacts related to the different phases and stakeholders of the life cycle [Weske, 2007].

Figure 13: Life Cycle of a Business Process within Business Process Management, including involved stakeholder roles (BPMN)

Design and Analysis. The design and analysis phase is typically considered the first phase of the business process life cycle, because it comprises the activities that lead to an explicit representation of a process: the process model. This representation forms the central knowledge asset for all other phases of the life cycle. Design also incorporates process redesign to overcome issues of existing processes, to improve their performance, or to adapt to changed business demands. Analysis encompasses means for proving the correctness of the process at the level of the process model, including validation, verification, and simulation. Validation in particular, i.e. ensuring that the process complies with the business goals it addresses, is conducted collaboratively, typically in reviews that include all involved stakeholders [Weske, 2007].

Configuration. After initial design or redesign and successful analysis, a process needs to be put into action. This happens during the configuration phase, which includes all actions required to implement the business process within its technical and organizational environment.
Configuration draws on a wide spectrum of IT involvement. Sometimes IT systems are not involved at all: business processes comprise the activities required to create some kind of value, and by setting up guidelines, policies, and rules, they can be realized within a relatively small group of people who agree to comply with these procedures. Processes can also be implemented as hardwired applications that execute activities in a static way. This may be valuable for highly repetitive processes that do not change over time, e.g. because they are prescribed by legal regulations. In such cases, standard software can perform these processes and needs little maintenance in practice. Business process management, however, focuses on efficient process operation within the organization’s IT landscape while retaining the flexibility to adapt and improve processes. Thus, it is desirable to implement processes in a generic framework, typically a business process management system. Such a system enacts processes, defined by process models, in a controlled manner and a coordinated environment, reducing the effort of configuration to a minimum, e.g. to the integration with existing software systems and the communication with users.

Enactment. Once configuration is complete, the business process is ready to be put to work. The enactment phase comprises the actual run time of process instances, that is, the phase of value creation. Enactment means that a process instance is initiated upon an event originating from the process’ environment and is then executed under the active control of the business process management system, according to the execution constraints specified by the process model and the configuration directives set up in the previous phase.
The activities performed as operational steps of process instances are generally divided into two classes, according to whether they involve interaction with human users. While system activities interact only with the software systems of the process environment, human activities depend on a human user fulfilling a set of tasks. Process instances that are completely detached from user involvement are also called system workflows. This phase also includes the collection of data that arises during the enactment of process instances, including, but not limited to, data objects, events, and exceptions. This information is kept in so-called process execution logs for later analysis of the process.

Evaluation. Evaluation comprises the analysis and processing of the information gained throughout the enactment of process instances. Business activity monitoring gives insight into the current state of process instances and the process landscape. Process execution logs provide information about execution performance and potential issues of process instances. This allows business processes and the execution environment to be appraised against target values of process performance indicators. The process manager uses this information to examine and initiate process improvements if the targets were not met or the process lacks efficiency or effectiveness.

One subject within the evaluation phase is business process mining, which is used to discover existing but not explicitly represented business processes from observations of the activities of a real process [van der Aalst, 2002]. By its very nature, business process mining is unlikely to benefit from mashups; it is a topic of data analysis and data mining. Thus, it is not considered further here.
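The enactment concepts described above (instantiation upon an external event, the distinction between system and human activities, and execution logs) can be illustrated with a minimal sketch. It does not reflect how any particular business process management system works: the process model is assumed to be strictly sequential, all names (`ProcessModel`, `start_instance`, and so on) are hypothetical, and an instance simply pauses at the first human activity, which is offered on a work list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Kind(Enum):
    SYSTEM = "system"  # interacts only with software systems
    HUMAN = "human"    # requires a task to be fulfilled by a human user


@dataclass
class Activity:
    name: str
    kind: Kind
    # Implementation of a system activity; None for human activities.
    run: Optional[Callable[[Dict], None]] = None


@dataclass
class ProcessModel:
    name: str
    activities: List[Activity]  # simplification: a strictly sequential model

    def is_system_workflow(self) -> bool:
        """A system workflow involves no human activity at all."""
        return all(a.kind is Kind.SYSTEM for a in self.activities)


@dataclass
class ProcessInstance:
    model: ProcessModel
    data: Dict
    log: List[tuple] = field(default_factory=list)     # execution log
    worklist: List[str] = field(default_factory=list)  # tasks offered to humans


def start_instance(model: ProcessModel, event: Dict) -> ProcessInstance:
    """Instantiate `model` upon an event from its environment and enact it.

    System activities run immediately; enactment pauses at the first human
    activity, which is offered on the work list (resumption is omitted here).
    """
    inst = ProcessInstance(model, dict(event))
    inst.log.append(("instance_started", model.name))
    for act in model.activities:
        if act.kind is Kind.HUMAN:
            inst.worklist.append(act.name)
            inst.log.append(("task_offered", act.name))
            break
        try:
            act.run(inst.data)
        except Exception as exc:
            # Exceptions are recorded in the execution log for later evaluation.
            inst.log.append(("exception", act.name, str(exc)))
            break
        inst.log.append(("activity_completed", act.name))
    return inst


# Usage: a hypothetical order process with one system and one human activity.
order_process = ProcessModel("order", [
    Activity("check stock", Kind.SYSTEM, lambda d: d.update(in_stock=True)),
    Activity("approve order", Kind.HUMAN),
])
instance = start_instance(order_process, {"order_id": 7})
print(instance.worklist, instance.log)
```

The execution log built here is exactly the kind of data the evaluation phase consumes.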
5.3 Value Proposition of Mashups for Business Process Management

Business process management is a mature and proven discipline within computer science that has been shaped by a few strong parties: companies and academia. There is a clear understanding and a concise definition of the goals and concepts of business process management. Mashups, in contrast, are quite young and still in an adolescent phase that is heavily shaped by decentralized development through individuals, non-profit communities, and, recently, also companies. Unlike business process management, mashups lack a crisp definition and are rather perceived as a phenomenon (cf. Section 3). As a consequence, little work has been done to combine these two fields on a conceptual level; past work has mostly addressed technical problems, as described in Section 1.2.

While business process management considers process models the central knowledge assets, many information artifacts exist at different levels of abstraction related to processes, process instances, and process models. These include, among others, execution logs, documentation, and information about the organizational and technical environment. Much information comes from process stakeholders, each of whom has particular skills, knowledge, and experience. These numerous information artifacts are likely kept in different places, for instance in knowledge management systems such as wikis and process repositories, which facilitate the efficient storage, organization, and retrieval of such information. According to the discussion in Section 2.2.2, such systems can benefit substantially from the Web 2.0 patterns: access to capabilities is greatly simplified, functionality encapsulated in services is more likely to be reused, and data can be enhanced through the “wisdom of crowds” [O’Reilly, 2005].
The latter leverages the knowledge of individual process stakeholders to complete, improve, and link process-related information. Technologies and systems that emerged with Web 2.0, such as wikis or issue trackers, facilitate the collection of this information and allow relating items to each other. Feedback, tagging, and rating enable stakeholders to maintain and enhance existing information. Mashups offer the capability to combine these information islands and to provide insight into the required information. Several applications of mashups on top of this information are presented below, organized by the phase of the business process life cycle to which they contribute value.

5.3.1 Design and Analysis

Among the stakeholders involved in process design, the process designer is at the center of this phase, coordinating requirements from other stakeholders, such as knowledge workers, enterprise architects, and the chief process officer. Quite often, the process designer also holds the role of the process manager, owing to this central role and a profound understanding of the process they are responsible for. In many cases, process managers have knowledge of business operations that goes beyond existing documentation.

The central activity of this phase is process modeling. Processes are usually modeled with graphical notations, such as the Business Process Modeling Notation (BPMN [OMG, 2008]) or the Event-driven Process Chain (EPC [Scheer et al., 1992]). Many more exist that share essentially similar characteristics: they provide easily understandable graphical expressions along with formal semantics that allow for automated model verification.
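Formal semantics are what make such automated checks possible. As a minimal illustration, the sketch below treats a process model as a hypothetical directed graph of node identifiers and reports every node that does not lie on a path from the start event to the end event, i.e. nodes that can never be reached or that never lead to completion. This is only a necessary structural condition in the spirit of workflow nets, not a full soundness verification, which additionally examines the model’s behavior (e.g. deadlocks).

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # node id -> ids of its successors


def reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for succ in graph.get(queue.popleft(), []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def reverse(graph: Graph) -> Graph:
    """The same graph with every edge pointing backwards."""
    rev: Graph = {}
    for src, succs in graph.items():
        for dst in succs:
            rev.setdefault(dst, []).append(src)
    return rev


def off_path_nodes(graph: Graph, start: str, end: str) -> Set[str]:
    """Nodes not on any path from `start` to `end`: unreachable or dead ends."""
    nodes = set(graph) | {d for succs in graph.values() for d in succs}
    on_path = reachable(graph, start) & reachable(reverse(graph), end)
    return nodes - on_path


# Hypothetical model: "archive" can never be reached, "escalate" never finishes.
model = {
    "start": ["check"],
    "check": ["ship", "escalate"],
    "ship": ["end"],
    "archive": ["end"],
}
print(sorted(off_path_nodes(model, "start", "end")))  # ['archive', 'escalate']
```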
Figure 14: Usage of Process Model and Process Knowledge throughout the Process Life Cycle (BPMN)

Figure 14 shows data dependencies within the business process life cycle on an abstract level: data that emerges from the different phases of the life cycle, generally labeled process knowledge, and the process model itself depend on each other. Presenting process knowledge to process designers allows them to consider all relevant information items, issues, constraints, requests, and potential quality requirements holistically during process design.

In a given scenario, a process model is assembled from a set of activities that are kept in a process repository. These activities have specific semantics, and information about them is kept in different systems. Knowledge about such activities includes informally captured process documentation and advice stored in different collaborative tools. An issue tracking system stores problems or situations that arose within other phases of the process’ life cycle. Examples of the latter are setup difficulties during configuration, operational obstructions and problems during enactment, and performance issues such as unsatisfying throughput or high failure rates. The information items are likely fragmented, since they arise from several stakeholders in different life cycle phases. It is desirable to integrate these phases by feeding the gained knowledge and experience directly into design and redesign.

A potential mashup aggregates arbitrary process-related data originating from any phase of the life cycle. The result is displayed directly in the modeling environment, attached to the model to indicate which items belong to which activity or model element.
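The aggregation at the core of such a mashup can be sketched as grouping information items from several sources by the model element they refer to, so that the modeling environment can attach each group to the corresponding shape. The item structure (dicts with `source`, `element`, and `title` keys) and the element identifiers are assumptions for illustration; a real mashup would first ingest the items from the respective systems, e.g. a wiki and an issue tracker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_by_element(*sources: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Group information items from several sources by model element id.

    Each item is assumed to carry an 'element' key naming the activity or
    other model element it refers to.
    """
    grouped: Dict[str, List[Dict]] = defaultdict(list)
    for source in sources:
        for item in source:
            grouped[item["element"]].append(item)
    return dict(grouped)


# Hypothetical items as they might be ingested from a wiki and an issue tracker.
wiki_pages = [
    {"source": "wiki", "element": "task-check-order", "title": "Order checking guideline"},
]
issues = [
    {"source": "issues", "element": "task-check-order", "title": "High failure rate"},
    {"source": "issues", "element": "task-ship", "title": "Setup problem with carrier interface"},
]

overlay = aggregate_by_element(wiki_pages, issues)
for element, items in overlay.items():
    print(element, [i["title"] for i in items])
```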
Such a process visualization tool could be offered to all involved stakeholders, encouraging them to collaborate and contribute. Capturing and enhancing data in knowledge management systems enables the process designer to consider all these influencing factors, which eventually leads to better process design and improvement. A prototypical implementation of this mashup is described in Section 6 and depicted in Figure 19. In an enhanced process modeling environment, process model verification and simulation could be supported by a visual step-through mechanism for the process model that highlights the currently enacted activities of a process instance. A corresponding mashup would need to access the different knowledge management systems and interact with the process execution engine.

5.3.2 Configuration

Assuming the use of a business process management system, the configuration of a process is largely reduced to enhancing the process model with technical information that specifies its interaction with its environment. That means providing software artifacts that realize activities and connecting the process to the interfaces of systems in the organization’s IT landscape. The responsible stakeholder role is the process implementer, supported by a group of developers and enterprise architects. Thus, embedding a process in its environment essentially means composing an application, the process, from a set of services, the systems’ interfaced functionality.

Mashups are composite applications, too; advocates of mashups even describe them as service compositions. [Spohrer et al., 2008] define a service as “the application of competence to the benefit of another”, which embraces the steps of proposal, agreement, and realization. A service is thus a process of value co-creation between provider and beneficiary. In the mashup ecosystem (cf. Section 4.1), the capability provider is the provider of knowledge or skill, and the mashup is the beneficiary.
The provider proposes its capability by making it available on the Web; agreement is reached upon consumption of that capability, through a request and the corresponding response containing a representation (cf. Section 4.2); and the application of the mashup logic realizes the creation of value. Service systems are not restricted to a specific number of providers and beneficiaries, which supports the perception of mashups as compositions of a set of services.

For example, a simple inventory replenishment process would access a customer relationship management system and obtain the latest orders of a customer. Based on that information, the process would query the supply chain management system and issue a replenishment request to fulfill the customer’s order. Figure 15 compares the implementation of this process as a workflow (left) with its implementation as a mashup (right), revealing a remarkable resemblance. In a process execution engine, the behavior of specific activities needs to be described through software artifacts that access an organization’s services. Flexibility is achieved by the automatic interpretation of the process model, which results in process instances. The same holds for mashups: an augmentation component would contain the process logic and access an organization’s services, potentially Web-enabled, through ingestion components (cf. Section 4.4), and provide the process’ outcome through a publication component to humans or other applications. In contrast to providing activity-specific implementations, the augmentation component would express the process logic and compose standardized ingestion components. A mashup can thus be considered a manifestation of a process (cf. Section 4.3.2): the “glue” that combines the activities is a directive of the work that needs to be performed. Yahoo! Pipes (cf.
Figure 2) and Microsoft Popfly give a good impression of how a data-centric process could be implemented as a mashup in practice, leveraging the pipes-and-filters pattern: a process is defined by a set of activities, the filters, which are chained to each other through pipes.

Figure 15: Comparison of the Configuration of a Business Process (left, BPMN) and a Mashup (right, UML)

Due to their characteristics of little governance, small scale, and situational scope (cf. Section 3.5), mashups may not satisfy the requirements of enacting processes under high performance and governance demands. However, they may be of value for processes that need to be enacted within a service-oriented environment and have a low degree of repetition, such as prototypes or interim solutions. Compared to classic software service systems, mashups focus more on data aggregation from a set of disparate sources than on functional composition. Thus, mashups may be the preferred choice for processes that focus on data retrieval across systems or, as [Crupi and Warner, 2008b] describe it, for providing a face on top of the services of an organization and establishing a layer that connects users with a service-oriented architecture. This is especially favorable due to the user centricity of mashups, which makes services visible and accessible to human end users in situations with lower governance demands [IBM Corporation, 2008].

The idea of leveraging capabilities of any type offered on the Web is likely to influence business process management in the future. With the rise of Web 2.0, Web applications have become exceptionally popular, and companies seek to offer legacy functionality through Web APIs. Mashups may play a considerable role in that development, as outlined in Section 7.3.3.
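The replenishment example of Figure 15 can be sketched as a small pipes-and-filters mashup in the style described above: ingestion, augmentation, and publication steps are plain functions chained by a `pipe` helper. Everything here is hypothetical: the CRM and SCM data are in-memory stand-ins for the resources behind crm.uri and scm.uri, and the rule that only uncovered quantities trigger a replenishment request is an assumption added for the example.

```python
from functools import reduce
from typing import Callable, Dict, List

Step = Callable[[object], object]

# Hypothetical in-memory stand-ins for the resources behind crm.uri and scm.uri.
CRM_ORDERS = {"c42": [{"item": "widget", "qty": 5}, {"item": "gadget", "qty": 2}]}
SCM_STOCK = {"widget": 3, "gadget": 10}


def pipe(*steps: Step) -> Step:
    """Chain filters so that each one's output is the next one's input."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Ingestion: latest orders of a customer, then current stock per ordered item.
def ingest_orders(customer_id: str) -> List[Dict]:
    return CRM_ORDERS.get(customer_id, [])


def ingest_stock(orders: List[Dict]) -> List[Dict]:
    return [{**o, "stock": SCM_STOCK.get(o["item"], 0)} for o in orders]


# Augmentation: the process logic, i.e. request the quantities not in stock.
def shortages(orders: List[Dict]) -> List[Dict]:
    return [{"item": o["item"], "qty": o["qty"] - o["stock"]}
            for o in orders if o["qty"] > o["stock"]]


# Publication: render the outcome for a human or another application.
def publish(requests: List[Dict]) -> str:
    return "\n".join(f"replenish {r['qty']} x {r['item']}" for r in requests)


replenishment = pipe(ingest_orders, ingest_stock, shortages, publish)
print(replenishment("c42"))  # replenish 2 x widget
```

Replacing a data source only requires swapping one ingestion step, while the composition itself stays unchanged.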
5.3.3 Enactment

As outlined above, processes as well as activities can be divided into system workflows or activities, which do not involve any actions conducted by humans, and human-involved workflows or activities. System activities are generally performed by sophisticated software artifacts developed in the configuration phase; some are provided through service interfaces of the organization’s IT landscape. There may also be activities that take into account resources residing outside of the organization’s systems landscape. While processes themselves do not cross organizational boundaries, mashups provide conceptual and technological means to access external resources in a virtually unrestricted set of data formats, to aggregate information, and to make sense of it, feeding the result directly into the further process. In practice, this could be realized by delivering an activity as a subprocess. Corresponding to the implementation of processes as mashups discussed previously, a subprocess could be carried out as a mashup that aggregates information or functionality (cf. Section 4.2), for instance by leveraging geographic location services to extract a precise location from a free-text description, as the Yahoo! Pipes Location Builder Module does.

Apart from system activities, mashups show significant potential to support and improve human-involved activities. Such activities rely on process participants, i.e. knowledge workers, who are able to perform tasks and make decisions that cannot be automated. Thus, human activities require user interfaces that provide process participants not only with information directly related to the activity, such as its input and output, but also with information about related resources where necessary. It is the process participant who has the knowledge and experience to decide which information needs to be consulted.
User interfaces traditionally comprise work lists containing a set of task items to be conducted, each item consisting of a form that expects data input from the user. Process participants work on these work items as they appear in their work list. Splitting a process into small, self-contained activities results in process fragmentation, which dates back to the early days of manufacturing, where dividing a process into small activities executed by a specialized workforce was very efficient. Knowledge workers, however, differ from manufacturing workers in that they have the expertise and experience to control the whole case; for them, fragmentation of work is counterproductive. For instance, case management in the financial or insurance industry involves considering data from many sources, such as customer relationship management, accounting, or collections. Often, employees are left to obtain this information on their own, without coherent user interface support. This results in alarming workplace setups, in which knowledge workers run a multitude of applications in parallel, with a high risk of errors and inefficient case management.

To address this weakness, part of the business process management discipline has shifted its focus from flow processing to data-driven processing for such cases. Case handling supports complex activities that need to be handled by humans: “the focus is on the case as a whole rather than on individual work-items distributed over work-lists” [van der Aalst et al., 2003], resulting in relaxed ordering of activities and more flexible approaches to completing cases. Case handling itself deals with the formal specification of data dependencies and the resulting processes, which exhibit high variability. Its details are beyond the scope of this thesis; further work on the topic is presented in [van der Aalst and Weske, 2005, Weske, 2007].
Mashups offer substantial improvement for human interaction, since they aggregate potentially related information from arbitrary sources. They can provide a complete picture of a complex case, including all information related to the process as well as information that may be important for the case but was not considered part of the process’ resources during process design. What is desired is a solution that aggregates a set of pre-selected information, including an interface to handle the case, yet remains extensible by the individual user.

Figure 16: Example of a Dashboard that Supports Human Activity

Dashboards provide exactly this form of data aggregation and flexibility. For each identified case, a basic set of widgets can be assembled based on the specification of the case. Such a basic configuration would comprise a widget to interact with the relevant data items of the case, as well as related information assets that offer decision support, e.g. internal and external performance figures. The process model itself could be included in such a dashboard to visualize the current state of a process, giving guidance to the process participant along with information about related activities. Knowledge workers would be empowered to create and customize such a supportive environment themselves, by extending the set of widgets with further information they consider relevant. An example of such a dashboard is sketched in Figure 16. Besides providing means to handle the case, the dashboard contains several widgets that summarize information about a customer retrieved from disparate systems.
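The coordination of such widgets can be sketched with a simple publish/subscribe bus: selecting a case in one widget broadcasts the customer identifier, and every subscribed widget looks up its own data, so that the knowledge worker does not have to copy the identifier between applications. The `Dashboard` and `Widget` classes and the in-memory CRM, accounting, and collections data are hypothetical; real widgets would ingest this data from the respective systems.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Dashboard:
    """Minimal event bus coordinating the widgets of one dashboard."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value: str) -> None:
        for handler in self._subscribers[topic]:
            handler(value)


class Widget:
    """A widget showing data about the selected customer from one system."""

    def __init__(self, title: str, lookup: Callable[[str], str]) -> None:
        self.title = title
        self.lookup = lookup
        self.content = ""

    def on_customer_selected(self, customer_id: str) -> None:
        self.content = self.lookup(customer_id)


# Hypothetical back-end data standing in for CRM, accounting, and collections.
crm = {"c42": "ACME Corp., Potsdam"}
accounting = {"c42": "3 invoices paid on time"}
collections = {"c42": "no open claims"}

board = Dashboard()
widgets = [
    Widget("Customer details", lambda c: crm.get(c, "unknown")),
    Widget("Payment history", lambda c: accounting.get(c, "n/a")),
    Widget("Collections", lambda c: collections.get(c, "n/a")),
]
for w in widgets:
    board.subscribe("customer", w.on_customer_selected)

# Selecting a case in the work-list widget updates all other widgets at once.
board.publish("customer", "c42")
for w in widgets:
    print(f"{w.title}: {w.content}")
```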
Expressive metaphors, colors, and gauges make the given information understandable at a glance. The remarkable value mashups provide here is an aggregation of real-time data assembled by domain experts themselves, without burdening the IT department with developing these tools. Effective coordination of the widgets can further reduce the workload significantly, e.g. by eliminating the need to copy data from one place to another, allowing the knowledge worker to focus on the case itself.

5.3.4 Evaluation

Evaluation uses information collected during the enactment phase to assess the performance of process models and of the technical and organizational execution environment, based on measurements taken from process instances. Such assessments can generally be distinguished by the time of assessment relative to the time of data acquisition.

Live monitoring of processes takes the state of currently enacted process instances into account. Mashups can aggregate this data and display it in relation to the process model. Visually combining model elements, such as activities and events, with information about active process instances gives insight into how many instances are currently executing the same activity, which process participants hold which tasks, which resources are involved, and which customers are being served. Single process instances could be inspected and diagnosed for potential issues.

The second type of process quality assessment uses historical data about processes to derive their overall performance in the setting created during the configuration phase. Performance measures, such as the time spent executing specific activities, can be analyzed statistically. Again, visualizing the gained information in close relation to the corresponding process model facilitates the identification of bottlenecks or resource scarcities. Single process instances can be traced back based on the information retrieved from process execution logs.
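Both kinds of assessment can be sketched on a hypothetical execution log in which each entry records an instance id, an activity, and start and end times in minutes, with the end time set to `None` while the activity is still running. Live monitoring counts the instances currently executing each activity; historical analysis computes the mean duration per activity and reports the slowest one as a bottleneck candidate. The log format is an assumption, not that of a particular system.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (instance id, activity, start, end); end is None while still running.
LogEntry = Tuple[str, str, float, Optional[float]]


def running_per_activity(log: List[LogEntry]) -> Counter:
    """Live monitoring: number of instances currently executing each activity."""
    return Counter(act for _, act, _, end in log if end is None)


def mean_duration(log: List[LogEntry]) -> Dict[str, float]:
    """Historical analysis: mean duration of completed executions per activity."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for _, act, start, end in log:
        if end is not None:
            durations[act].append(end - start)
    return {act: mean(ds) for act, ds in durations.items()}


def bottleneck(log: List[LogEntry]) -> str:
    """The activity with the highest mean duration, a candidate for redesign."""
    stats = mean_duration(log)
    return max(stats, key=stats.get)


log = [
    ("i1", "check", 0, 5), ("i1", "ship", 5, 35),
    ("i2", "check", 10, 17), ("i2", "ship", 17, None),
    ("i3", "check", 20, None),
]
print(running_per_activity(log), mean_duration(log), bottleneck(log))
```

In a mashup, these figures would be attached to the corresponding activities of the process model rather than printed.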
The knowledge gained in this phase can be used to compare a current process model and its configuration with earlier and later versions, and to assist in improving the process model. A possible mashup that combines the approaches described above would deliver immediate reports about the current state and performance of specific processes in an organization. This would allow comparing currently running processes with historic measures, making the manual assembly of process information and reports unnecessary. Such mashups can give valuable insights into recent process improvements and thus feed directly into future process design and redesign in a subsequent phase.

5.4 Assessment

Business process management is a well-governed discipline built on strict and formal foundations, whereas mashups are comparatively lax as a result of ungoverned evolution among a broad base of individuals. Compared to business process management, mashups address only small-scale processing, and their interaction is largely reduced to capability retrieval, transformation, and combination, whereas business processes are likely to involve highly complex interaction with systems that share their resources’ state with the enacted process instances. Business process management is centered on the interaction with systems and the choreography of process instances across organizational boundaries, with a strong emphasis on their control flow. Nevertheless, the observations and considerations above lead to the conclusion that mashups offer considerable value in contributing to the goals of business process management.

Understanding the operations of an organization is supported in those phases of the business process life cycle in which mashups accumulate information from different sources and provide insight and overview.
In the design and analysis phase, knowledge that is spread and fragmented across stakeholders is recombined to create holistic views and an increased understanding of a process and related information. This leads to better process design and eventually supports the improvement of processes. The evaluation phase benefits from the ability of mashups to aggregate any data that arose during the enactment of process instances and to provide real-time reporting about process state and process performance. Since the use cases and requirements for such reports are virtually unlimited, the user-centric ad-hoc aggregation of “whatever needs to be aggregated” [Hinchcliffe, 2006] can satisfy specific needs.

Implementation of processes in an enterprise environment is largely supported during the configuration and enactment of business processes. In the configuration phase, mashups can assume the position of a small-scale process execution engine; that is, processes can be implemented as mashups, since mashups can conceptually be perceived as aggregations of services. Business process management demands strong governance along the business process life cycle, which makes supporting processes with a low degree of repetition inefficient. Such rarely repeated processes embody situational needs and thus may be supported more efficiently by mashups. If a rapid solution is needed, the supported process is relatively simple, and governance or efficiency of its enactment are not key performance measures, mashups are likely to be beneficial. Thus, mashups can be well suited for process prototypes or interim solutions. During enactment, mashups can represent activities that are rather data-centric and interact with capabilities external to the process environment or with humans. As decision and information support for knowledge workers, dashboards can substantially improve efficiency, providing overview and insight in case handling.
The dynamic character of dashboards also contributes to flexible adaptation to changes in the business process.

Flexibility is greatly supported in all phases, since mashups are ad-hoc aggregations. They constitute just-in-time solutions that require little work to set up and to host in an organizational and technical environment. Changes in that environment can easily be integrated into mashups, owing to their ability to provide lightweight and rapid solutions to specific needs.

The considerations in this chapter lead to the conclusion that the key benefit of mashups in business process management lies in the aggregation of process-related data and its visual presentation based on the corresponding process models. This enables process stakeholders themselves to create new perspectives on process models that include information relevant to particular needs. Mashups leverage existing yet fragmented information and provide users who have a stake in business processes with an overview of and insight into that information. In addition, mashups provide a large set of technical means to access resources of any type and make use of them. These technical foundations for leveraging the Internet Operating System suggest future benefits for business process management.

The example scenarios given outline the potential application of mashups and show that the possibilities are by no means exhausted. In many cases, the customers of business process management have very specific needs, based on their actual enterprise IT architecture and the processes they employ. The wide adaptability of mashups shows that they can satisfy a broad diversity of needs. While not all use cases can be identified in advance, mashups provide the means to address and satisfy these needs.
6 Enabling Collaborative Process Design with Mashups

This chapter documents a prototypical mashup application that picks up the observations made and conclusions drawn in Sections 4 and 5 and demonstrates their practical realization and applicability. The mashup obtains and aggregates fragmented knowledge about a process throughout the business process life cycle and makes that information accessible through a visually rich and easily understandable user interface.

As outlined in the previous chapter, the central knowledge asset of business process management is the process model. The Business Process Technology group at the Hasso Plattner Institute hosts an open source software project called Oryx Editor²⁶. Oryx is a browser-based model editor capable of designing many types of diagrams, currently focusing on the support of process models in BPMN [OMG, 2008]. In addition to the browser-based model editor, Oryx comprises a model repository that allows users to store and manage models, transform models into different formats, and perform diverse operations on them. The modeling capabilities of Oryx and its free access as Software as a Service make it a suitable provider of process models, so it is leveraged for the mashup prototype.

In contrast to the previous chapters, this chapter is rather technical, since it describes the actual implementation of a mashup and a mashup framework. Understanding of and experience in Web application development are advantageous.

6.1 Analysis

Section 5.3.1 concluded that numerous information artifacts are produced within the different phases of the process life cycle, originating from several stakeholders. These information artifacts, in relation to processes also denoted as process knowledge, are likely kept in different places within an organization.
The utilization of Web 2.0 and open source information systems, such as wikis, facilitates collecting information out of the process stakeholders’ minds and is already well established, according to [Bughin et al., 2008]. The wisdom of crowds (cf. Section 2.2.2) allows individuals to enhance and correlate the fragmented knowledge, making it more valuable. Issue tracking systems are another promising source of information. Such systems allow relating deficiency reports or feature requests to specific artifacts of a product and recording them in a managed system. Issues such as incompatibilities with existing systems during the configuration phase of a business process, necessary improvements that become apparent during process enactment, or unsatisfactory performance figures emerging from process evaluation are potential requirements that may be tracked in issue tracking systems in the context of business process management.

[26] http://oryx-editor.org

Based on that assumption, a concrete scenario was derived, already introduced on a more abstract level in Section 5.3.1: during the first phase of the process life cycle, design and analysis, all relevant information that may influence the tasks within this phase must be considered. Searching and inspecting the fragmented process knowledge manually is a rather daunting task, so an automatic solution that aggregates these information artifacts and presents them in a holistic way is desirable.

The present prototype aims at connecting stakeholders by aggregating and presenting shared knowledge, thus improving collaboration among them and offering holistic insight into processes. Process models serve as a vehicle for managing and visualizing information related to processes and process elements. This leads to the support of the following process-relevant tasks through the mashup prototype.
Requirements Management: Requirements in the form of requests or deficiency reports can be related to processes and process elements.
Documentation: Any document that is available through a URI, i.e. on the Web (which may be a closed corporate network), can be related to processes and process elements. Wikis are considered the primary source of documentation.
Process Modeling: Process modelers are provided with all relevant information through the aggregation of process model, requirements, and documentation. This includes the redesign and improvement of process models as a result of evaluation.
Review: Collaborative workshops to explore and review processes are supported by visually attaching information items to the process model.

The mashup is provided as a Web application and is thus universally accessible. The following capabilities were identified to be aggregated by the prototype. As already described in Section 3.5.8, mashups suffer from the same origin policy imposed by Web browsers. This can be overcome by different means, namely an AJAX proxy or JSONP; the problem and both workarounds are discussed in detail in Section 2.3.3. The description of each capability therefore also includes a short discussion of the strategy to obtain information from it, giving proposals for its ingestion.

6.1.1 Process Model from Oryx

Oryx not only provides means to model process diagrams but also stores them and offers them in different formats, such as RDF, PNG, or SVG. Models are identified by URIs. An extension to the process repository of Oryx on the server side further allows delivering models formatted in JSON and wrapped in a function call, to be loaded via JSONP. The modeling component of the Oryx editor requires the ability to display SVG inline in the page’s DOM and to handle valid XML, which not every Web browser provides.
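JSONP, named above as one workaround for the same origin policy, can be illustrated with a short Python sketch of its server side. The response is JavaScript, not plain JSON: the data is wrapped in a call to a function chosen by the client, so the browser can load it through a `<script>` element, which the same origin policy does not restrict. The function `jsonp_response` and its callback-name check are illustrative assumptions. They are not the interface of the Oryx repository.

```python
import json
import re

# Accept only plain (optionally dotted) JavaScript identifiers such as
# "handleModel" or "MOVI.load", so a request cannot inject arbitrary script.
_CALLBACK = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(data: dict, callback: str) -> str:
    """Wrap a JSON-serializable object in a call to `callback` (JSONP).

    The browser includes the returned text via a <script> element; when the
    script runs, the callback defined on the mashup page receives the data.
    """
    if not _CALLBACK.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    payload = json.dumps(data, separators=(",", ":"))
    return f"{callback}({payload});"


if __name__ == "__main__":
    # Hypothetical, heavily simplified model; Oryx's real JSON format is richer.
    model = {"resourceId": "oryx-canvas123", "childShapes": []}
    print(jsonp_response(model, "handleModel"))
    # handleModel({"resourceId":"oryx-canvas123","childShapes":[]});
```

The callback name is validated on the server because the response is executed as script on the mashup page. An unchecked name would let anyone who can craft the request URL run code in that page.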
Because inline SVG and strict XML handling are not available in every browser, a mashup API was developed as part of the Oryx project. It is technically independent of the modeling environment and allows viewing models in all browsers without any extensions. This API is called MOVI[27]. MOVI retrieves a picture of a model diagram in the PNG format as well as a JSON-formatted representation of the model via JSONP, and creates virtual process model elements entirely within the page’s DOM. This makes Oryx’ process models displayable in virtually all popular browsers. MOVI’s features further allow the user to interact with the process model, providing means to highlight model elements and display arbitrary content. With these features, MOVI provides an ideal starting point for visually aggregating information with a process model and its elements, while remaining accessible through most browsers. Therefore, it forms the functional basis of the mashup and of its publication component at the same time.

[27] MOVI is an acronym that stands for MOdel VIewer and can be found at: https://bpt.hpi.uni-potsdam.de/Oryx/MOVI

6.1.2 Issues from an Issue Tracking System

Issue tracking systems are commonly used in software development to track issues that arise in relation to development artifacts, i.e. software system components. Issues are captured in the form of tickets that describe either an incident or a request and are directed to the persons responsible for solving the issue. Such tickets document the resolution progress over time and thus make design decisions traceable. Issue tracking systems support the management of tickets within a project by providing means to prioritize tickets and describe their severity and affected artifacts. Such a system could evidently also be employed to keep track of issues that arise with processes or activities in all phases of the business process life cycle.
These can be requirements and requests during process design as well as incidents during process configuration and enactment. The present case assumes exactly this scenario, leveraging the Trac[28] open source issue tracking system to document issues with processes and process activities. Besides its issue tracking capabilities, Trac provides several tools for lightweight, web-based software project management. For issue tickets, it offers a powerful search function that takes an arbitrary set of conditions on ticket data as input and returns a configurable set of ticket-related information as its result. It also provides search results as a content syndication feed via RSS, but in that case the returned information is limited to title, description, author, and date. To retrieve further information, the ticket itself has to be inspected. This problem is described by [Alba et al., 2008]: a Web API is provided in the form of a content syndication feed, but the human interface, i.e. the website, exposes more accurate data. Thus, screen scraping of the website was chosen to gather information about related tickets.

Tickets are related to process elements through two ticket data fields: component and keywords. The component of a ticket defines the artifact where the issue occurs; it maps to the process, identified via a URI in Oryx. Each process model element is identified through a resource-id that is mapped to one of several keywords of a ticket. This allows relating one ticket to several elements as well as several tickets to the same element. Entering this data at the time of ticket creation is relatively simple, and the mashup supports it by explicitly presenting this data to the user.

6.1.3 Documentation from a Wiki

A wiki is a hypertext-based content management system.
Compared to traditional content management systems, wikis do not explicitly distinguish authors from readers but rather serve as a central place to collaborate in content creation. This arguably epitomizes the Web 2.0 paradigm of participation. Wikis have gained increasing significance as support for documentation in organizations, because they enable collaboration and provide an easy-to-use interface. Wiki interfaces have a remarkably low entry barrier, opening them to a wide spectrum of users, even those who are not technically savvy. According to [Bughin et al., 2008], wikis are already quite established tools within companies.

[28] http://trac.edgewall.org/

Process models created with the Oryx editor allow attaching URIs to model elements, referencing external sources of information. The mashup retrieves such information to enrich the process model with documentation. This information is likely kept in a wiki, but it may also be stored in other systems that provide access to content via a URI. For the prototype, process documentation is stored within a page of a TWiki[29] installation and accessed by obtaining the HTML representation of that page.

6.2 Design

Depending on the capabilities aggregated, the prototype described here uses a hybrid of both approaches to overcome the same origin policy and consume capabilities from remote resources (cf. Section 2.3.3). MOVI leverages JSONP, which allows it to run completely independently of any server component. However, issues from the issue tracker and documentation from the wiki pages must be accessed via a proxy server component that is invoked through an AJAX request from the client. Consequently, the mashup application itself consists of two tiers, illustrated in Figures 17 and 18: a client tier that allows for interaction with the user and a server tier that provides access to remote resources through a proxy.
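The proxy half of this hybrid design can be sketched in Python. This is an illustration, not the prototype's code, which runs as JavaScript filters inside CouchDB (Section 6.2.1). The class `CapabilityProxy`, its host allowlist, and the injectable `fetch` function are hypothetical choices made so that the example can be tested without network access.

```python
from typing import Callable, Set
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch_with_urllib(url: str) -> str:
    """Default fetcher: download a remote resource and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class CapabilityProxy:
    """Server-side proxy that fetches remote capabilities for the browser client.

    The client sends an AJAX request to the mashup's own origin, naming the
    remote URL; the proxy retrieves it, which the same origin policy would not
    allow the browser to do directly. The host allowlist (an assumption of this
    sketch) keeps the proxy from being abused as an open relay.
    """

    def __init__(self, allowed_hosts: Set[str],
                 fetch: Callable[[str], str] = fetch_with_urllib) -> None:
        self.allowed_hosts = allowed_hosts
        self.fetch = fetch

    def handle(self, target_url: str) -> str:
        parts = urlsplit(target_url)
        if parts.scheme not in ("http", "https"):
            raise PermissionError(f"unsupported scheme: {parts.scheme!r}")
        if parts.hostname not in self.allowed_hosts:
            raise PermissionError(f"host not allowed: {parts.hostname!r}")
        return self.fetch(target_url)


if __name__ == "__main__":
    # A stub fetcher stands in for the network; the host name is hypothetical.
    proxy = CapabilityProxy({"wiki.example.org"},
                            fetch=lambda url: f"<html><body>{url}</body></html>")
    print(proxy.handle("http://wiki.example.org/Main/OrderProcess"))
```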
Following the conclusions about courteous access to resources drawn in Section 3.5.8, this server tier can host a cache that shields capability providers from high traffic loads. Advancing this idea, i.e. performing operations on the obtained capabilities, such as data filtering and transformation, and caching the results, suggests elaborating a minimalist mashup platform, discussed in the following.

6.2.1 Mashup Platform

As the previous considerations showed, a platform that supports the development, deployment, and execution of mashups in a hybrid scenario for accessing remote capabilities is expedient. Figure 17 depicts an architectural diagram of this platform. The platform allows for the execution of operations on a Web server. These operations are encapsulated in components called filters, following the pipes-and-filters pattern that has repeatedly shown up in the context of mashups (cf. Section 4.3.2). The design is based on the idea of composing filters according to the types introduced in Section 4.3. Filters for ingestion, augmentation, and publication resemble each other on a technical level, and it should make no difference whether they are executed on a server or a client.

[29] http://www.twiki.org/

Figure 17: Architecture of the Mashup Platform (FMC) [diagram: Mashup Client Tier; Mashup Server Tier with Filter Delivery and Execution Engine, Filter Storage, Filter Result Cache, and Executed Filter; remote Capabilities; tiers connected via HTTP]

Thus, the filter delivery and execution engine of the mashup platform either loads filter code through HTTP and transmits it to the client, or executes this code on the server and delivers the filter results. The results of each filter can be cached on the server. This allows for faster responses, provided the cached data is fresh enough, and reduces the impact on capability providers.
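The pipes-and-filters composition and the filter result cache can be sketched together in Python. This is only an illustration under stated assumptions. The prototype implements filters in JavaScript executed by CouchDB. The names `pipe` and `FilterResultCache`, the in-memory dictionary, and the age-based freshness rule (`max_age` in seconds) are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

Filter = Callable[[Any], Any]


def pipe(*filters: Filter) -> Filter:
    """Compose filters left to right: each filter's output feeds the next one."""
    def run(data: Any) -> Any:
        for step in filters:
            data = step(data)
        return data
    return run


class FilterResultCache:
    """Reuse a filter's result while it is at most `max_age` seconds old.

    The clock is injectable so that freshness can be tested without waiting.
    """

    def __init__(self, max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age
        self.clock = clock
        self._entries: Dict[Tuple[str, Hashable], Tuple[float, Any]] = {}

    def run(self, name: str, step: Filter, argument: Hashable) -> Any:
        key = (name, argument)
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] <= self.max_age:
            return cached[1]  # fresh snapshot: the capability provider is not contacted
        result = step(argument)
        self._entries[key] = (now, result)
        return result


if __name__ == "__main__":
    # Hypothetical ingestion and normalization filters.
    def ingest(uri: str) -> str:
        return f"  tickets for {uri}  "

    def normalize(text: str) -> str:
        return text.strip()

    cache = FilterResultCache(max_age=300)
    print(cache.run("tickets", pipe(ingest, normalize), "http://oryx.example.org/model/42"))
```

Keying the cache on the filter name and its argument means that a composed filter, such as the ingestion and normalization pipeline above, is cached as a single unit. Caching each stage separately would be an equally valid design.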
Filters are stored on the server, and the platform allows executing them in a controlled environment: the execution context that is created upon a client’s request (cf. Section 4.1). Filters can be invoked through HTTP requests or by other filters that are already running. JavaScript [ECMA, 1999] was considered the only suitable programming language for implementing filters, because it is universally supported by Web browsers. Thus, a Web server capable of running JavaScript was needed as well. Suitable JavaScript engines included Aptana Jaxer[30], Mozilla Rhino[31], and Mozilla Spidermonkey[32]. One of the potential Web server candidates that featured one of these engines was CouchDB[33], a document-oriented database system that stores schema-less data in the form of JSON objects.

[30] http://www.aptana.com/jaxer
[31] http://www.mozilla.org/rhino/
[32] http://www.mozilla.org/js/spidermonkey/
[33] http://couchdb.apache.org

Among other things, CouchDB offers the following features that contribute to the mashup platform: (a) it stores and serves static files, (b) it stores structured or unstructured documents, and (c) it executes JavaScript code in Spidermonkey through an interface corresponding to the REST architectural style (cf. Section 2.3.2), making it a Web application platform. These features led to the design and implementation of the minimalist mashup platform described above. Feature (a) is used to store the mashup application, related files, and the filter components. Feature (c) is leveraged to execute filters within CouchDB, accessing remote capabilities that would otherwise be inaccessible to the client; it also supports composing several filters at runtime. Finally, (b) enables caching filter results by storing them as documents, if this is requested at the time of filter invocation.
Such caches provide temporary snapshots of filter results that are returned while they are fresh, instead of running the filter repeatedly.

6.2.2 Mashup Architecture

Figure 18 illustrates the architecture of the mashup prototype and its components that run within the mashup platform. As the diagram shows, the prototype consists of several components, which are executed either on the server or on the client, as described above. Each component corresponds to an activity of the mashup pattern discussed in Section 4.3. The architecture itself is an instantiation of the mashup reference model described in Section 4.4: each component consumes data and delivers it to its succeeding component, except for the publication component, which delivers a representation to the user. These components and their functions are explained briefly, grouped by the mashup pattern activity they belong to.

Ingestion. The MOVI API supplies its own ingestion layer. This layer is executed completely in the browser and implements JSONP to load functionality and data on demand. Besides loading the model in the form of a JSON representation and a picture for display, MOVI offers the process model as a data structure that allows inspecting the model and processing it accordingly.

The content of wiki pages is formatted in HTML and must be obtained through the server-side ingestion component, which the browser application requests via AJAX. The wiki ingestion component loads the HTML representation of a given URI. If this URI specifies a fragment (identified by the # character), the content is reduced to the DOM node identified by that fragment. Otherwise, it is reduced to the content of the HTML body.
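The fragment-based reduction performed by the wiki ingestion component can be sketched with Python's standard `html.parser` module. The sketch rests on several assumptions:

- The prototype's filter was written in JavaScript; this is a Python illustration.
- The names `reduce_wiki_page` and `_InnerHtmlExtractor` are hypothetical.
- Falling back to the `<body>` content when the fragment identifier does not exist is a choice made for this example.
- The wiki is assumed to emit well-nested markup.

A tolerant HTML parser is used instead of an XML parser, because wiki output is not guaranteed to be well-formed XML.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urlsplit

# Elements that never have an end tag in HTML, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class _InnerHtmlExtractor(HTMLParser):
    """Collect the inner HTML of the element with a given id, or of <body>.

    Assumes well-nested markup: omitted end tags (e.g. an unclosed <p>) would
    confuse the depth counter.
    """

    def __init__(self, element_id: Optional[str]) -> None:
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; verbatim
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the captured element
        self.found = False
        self.parts: List[str] = []

    def _is_target(self, tag: str, attrs) -> bool:
        if self.element_id is None:
            return tag == "body"
        return dict(attrs).get("id") == self.element_id

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif not self.found and tag not in VOID_ELEMENTS and self._is_target(tag, attrs):
            self.found = True
            self.depth = 1  # capture the children of this element, not the tag itself

    def handle_startendtag(self, tag, attrs):
        if self.depth > 0:  # self-closing tags like <br/> never change the depth
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_ELEMENTS:
            return
        self.depth -= 1
        if self.depth > 0:  # depth 0 means the captured element itself just closed
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def _extract(html: str, element_id: Optional[str]) -> _InnerHtmlExtractor:
    parser = _InnerHtmlExtractor(element_id)
    parser.feed(html)
    parser.close()
    return parser


def reduce_wiki_page(html: str, uri: str) -> str:
    """Reduce a wiki page to the node named by the URI fragment, else to <body>."""
    fragment = urlsplit(uri).fragment or None
    parser = _extract(html, fragment)
    if fragment is not None and not parser.found:
        parser = _extract(html, None)  # assumption: fall back to <body> for unknown ids
    return "".join(parser.parts)
```

For a URI such as `http://wiki.example.org/Main/Order#approve`, the function returns only the children of the element with `id="approve"`. Without a fragment, it returns the children of `<body>`.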
The wiki ingestion component finally returns its result as a string containing an HTML fragment that can be displayed anywhere in the mashup.

Figure 18: Architecture of the Mashup Prototype (FMC) [diagram: Mashup Browser Tier with MOVI (Publication, Oryx Ingestion) and Augmentation; Mashup Server Tier with Wiki Ingestion and Trac Ingestion; backends Wiki, Trac, and Oryx Backend]

The aforementioned flaw in the RSS-based Web API of Trac (cf. Section 6.1.2) makes it necessary to employ screen scraping to search and retrieve issues related to a specific process model. This is realized through a set of regular expressions and iteration over the HTML source of the search page. An XML parser could not be applied, because a ticket’s description may contain HTML that does not conform to XML rules and would eventually break the parser. Again, the information cannot be accessed via JSONP and thus needs to be retrieved through an AJAX request forwarded by a proxy. The ingestion filter located on the mashup platform performs the screen scraping and transforms the issues into a JSON collection. This data transformation is exactly the data normalization mentioned in Section 4.3.1. The task consumes time as well as processing resources, both on the Trac server, for searching related tickets in a database, and in the mashup ingestion component, for screen scraping and data normalization. High load is intercepted by a cache that temporarily stores the scraping results, offering courteous access to capabilities, as discussed above and in Section 3.5.8.

Augmentation. Aggregating the process model, related issues from Trac, and documentation from wiki pages is implemented by iterating over the process model elements accessible via the MOVI API. Each model in Oryx is identified by a URI. This very URI is used within one of the issue properties (component) to relate the issue to a model.
Another property (keywords) is used to relate an issue to a specific model element by means of that element’s identifier. The model URI is used to search for the issues of a model (described above). Issues and model elements, in contrast, are connected by comparing the element identifiers stored in entities of both types. Model elements can contain links to resources on the Internet (refuri), which are considered documentation capabilities. Thus, each model element can refer to its own documentation, which the wiki ingestion component then loads from the specified link.

Publication. The publication layer of the mashup prototype leverages the mapping metaphor, i.e. displaying landmarks close to their geographic location on a map. For the present mashup, however, the geographic map is replaced with a process map, namely the process model. Landmarks are replaced by issue and documentation information displayed close to the corresponding model element. Augmentation and publication are conceptually distinct, but publication actions are conducted concurrently with augmentation. All information is presented on the picture of the process model supplied by MOVI. After model elements, issues, and documentation have been connected during augmentation, the publication layer creates annotations that summarize information about issues. A green, orange, or yellow circle connected to a model element represents the maximum severity and the overall number of issues of that element. Clicking on a model element displays a tooltip next to the element that lists all its issues and documentation. An additional list shows all issues of the model, including those that are not related to a specific element. If an issue in that list is related to an element, clicking on it highlights this element on the process model picture and shows the tooltip.
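The matching described above, together with the per-element summary behind the coloured circles, can be sketched in Python. The sketch follows the fields named in the text: component holds the model URI, and keywords holds element resource-ids. The `Ticket` and `Annotation` types are hypothetical. So are the three-level severity scale and its assignment to green, yellow, and orange, since the thesis does not state which colour marks which severity.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical severity scale and colours; the text does not define the mapping.
SEVERITY_RANK = {"minor": 0, "major": 1, "critical": 2}
SEVERITY_COLOR = {"minor": "green", "major": "yellow", "critical": "orange"}


@dataclass
class Ticket:
    number: int
    summary: str
    severity: str
    component: str  # Oryx model URI
    keywords: str   # resource-ids of model elements, space- or comma-separated


@dataclass
class Annotation:
    count: int = 0
    max_severity: str = "minor"
    tickets: List[Ticket] = field(default_factory=list)

    @property
    def color(self) -> str:
        return SEVERITY_COLOR[self.max_severity]


def annotate(model_uri: str, element_ids: List[str],
             tickets: List[Ticket]) -> Tuple[Dict[str, Annotation], List[Ticket]]:
    """Relate tickets to model elements and summarize them per element.

    Returns the per-element annotations and the list of all tickets of the
    model, including those not related to any element.
    """
    known = set(element_ids)
    annotations: Dict[str, Annotation] = {}
    model_tickets: List[Ticket] = []
    for ticket in tickets:
        if ticket.component != model_uri:
            continue  # the ticket belongs to another model
        model_tickets.append(ticket)
        referenced = set(re.split(r"[\s,]+", ticket.keywords.strip())) & known
        for resource_id in referenced:
            note = annotations.setdefault(resource_id, Annotation())
            note.count += 1
            note.tickets.append(ticket)
            if SEVERITY_RANK.get(ticket.severity, 0) > SEVERITY_RANK[note.max_severity]:
                note.max_severity = ticket.severity
    return annotations, model_tickets
```

A publication layer could then iterate over the returned annotations, draw one circle per element using `color` and `count`, and fill the element's tooltip from `tickets`.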
All links contained in the representations of the capabilities, i.e. the model URI, documentation or wiki page URIs, and issue URIs, are kept valid. They are incorporated into the mashup’s presentation, offering the capacity to navigate to the original capabilities, make changes, or get more information. Thus, the mashup also works as a portal to comprehensive collaboration, guiding users through the information system landscape of an organization.

6.3 Realization

The resulting application is depicted in Figure 19. The screenshot of the mashup shows the annotations displaying the highest severity and overall number of issues per model element, as well as the tooltip containing the list of issues for the currently selected model element. The list of all issues is located on the right.

Figure 19: Demo of the Mashup Prototype

According to the mashup types identified in Section 3.4, the prototype can be considered an organic mashup. It was developed manually, because no existing tool provided adequate support for aggregating the very specific capabilities needed for the scenario. Ingestion required sophisticated means, because the information systems did not provide elaborate Web APIs to access content. Yet this does demonstrate the ubiquitous availability of capabilities on the Web. A standardized and established data format and schema for Web APIs would be desirable, as pointed out in Section 3.5.3. Capability providers attempt to satisfy this desire by providing information by means of content syndication.

The prototype also features the common characteristics of mashups synthesized in Section 3.5. The application addresses the specific needs of process designers who require a brief yet complete overview of relevant information tailored to a process model. It is provided as Software as a Service, offered on the Web and running in a Web browser.
The mashup aggregates heterogeneous content. This comprises MOVI functionality to display and interact with the picture representation of a process model, HTML representations of related documentation, and a remote search capability to obtain HTML representations of process-related issues. Although MOVI provides comprehensive functional features, it focuses on the presentation of the process model and, in combination with the mashup application, is rather data centric. The overall scale of aggregated information is relatively small, determined by the size of the model and the number of issues.

The implementation of the mashup exposed two very typical qualities of mashups. The first is the lack of governance (cf. Section 3.5.8). This topic has been discussed frequently in this work and had considerable influence on the prototype. Since no means for federated login were considered adequate, all information aggregated by the mashup has to be freely available. This turned out to be acceptable, because the mashup did not need authorization to modify any obtained information, and read-only access rights were sufficient. The second quality is the short time to market. Excluding the design and implementation of the mashup platform and the extensions to MOVI, i.e. annotations for issues and tooltip visualizations, implementing the mashup itself took less than a week.

7 Conclusion and Outlook

The last chapter of this thesis summarizes and concludes the observations and findings of this work. It also gives prospects on future topics that are considered relevant in the context of mashups and business process management. First, however, an assessment of mashups is provided that relates this new and exciting genre of Web applications to other software systems. The assessment uses the long tail, a metaphor frequently cited in connection with mashups [Hoyer et al., 2008].

7.1 The Long Tail

Figure 20 depicts the continuum of software that is employed in enterprises.
Located at the left end of this spectrum is software that addresses strategic goals and thus the core business of an organization. Applications and systems of this type are used by many people and driven by few. Building these systems involves a well-engineered development process and is a comparatively long-term project. These are large-scale systems that support many people and manage large amounts of data and functionality. Due to their nature, these systems have strong demands for governance, such as security, reliability, availability, and performance, that by far exceed the demands for flexibility. This is where large-scale, complex, and expensive IT systems are located. Among these are customer relationship management, enterprise resource planning, and supply chain management, as well as service-oriented architectures that provide an implementation platform for business processes.

On the right end of the spectrum are more opportunistic applications that focus on the day-to-day problems of users and satisfy the needs of individuals. Such applications do not handle large-scale data, have no or very low demands on governance and quality, and rather emphasize rapid, lightweight development and flexibility. A typical example is the “spreadsheet keynote” that uses some tables of a spreadsheet to derive statistics and present some diagrams. Due to their simplistic nature, such applications are easy to build for technically savvy people, usually by assembling existing components. They are driven by many, but provided to few.

The spectrum is not discrete; there is no specified point where the first type ends and the second type starts. Systems and applications may not even have a fixed position on the spectrum. Maturing prototypes, for instance, start as early and rapidly assembled mockups on the right side and move increasingly to the left.
Systems that are located along the curve typically feature specific characteristics. The further right an application is located on the curve, the less it involves the IT department and formal development methods, which lowers its development costs. On the other hand, the value of a single application is highest on the left side of the spectrum, which comes with a high risk in developing such a system; the opposite is the case at the right end. The area below the curve represents the value of systems and applications to the organization that possesses them, either as the driver of or as support for the organization’s activities.

Figure 20: Long Tail in the Spectrum of Software Systems [plot: y-axis #users / cost / value per application; x-axis # of applications; head on the left, tail on the right]

The spectrum depicts the long tail, a term coined by Chris Anderson [Anderson, 2004]. He published an article describing an entirely new economic model for the media and entertainment industry. With the evolution of online distribution, many influential factors of traditional distribution disappeared. Among them are limited resources such as storage space, sparse demand within geographical regions, and the production costs of the media that carried the content, such as CDs and DVDs. It has become profitable to sell not only recent mainstream hits but also alternatives and misses, because the cost of providing them became extremely low. The area below the curve represents the revenue made by selling content. Hits, i.e. few works that are consumed by many, are located on the left. Misses, i.e. many works that appeal to only a few, are located on the right. The sheer amount of less demanded products outweighs the few hits and can even generate more revenue [Anderson, 2004]. This narrow shape that promises so much value is the long tail.
Similar to goods retailers, IT departments did not sufficiently address applications that would not generate high demand. The development cost of applications with low demand and a potentially short lifetime was higher than the revenue these applications could provide. The evolution of Web 2.0 supplied individuals with the tools to create situation-specific and visually appealing applications, drawing from virtually unlimited resources of data and knowledge on the Web, e.g. through wikis and blogs. This is where mashups emerged: niche products that became profitable and made it possible to extend the spectrum of corporate applications towards the long tail. Mashups offer lightweight solutions to existing problems that were previously either neglected or addressed inappropriately in time and quality. They make it easy for domain experts to build working software that addresses their particular needs. This flexibility allows addressing immediate needs and provides realistic options to create cheaper solutions for an organization. Due to their low cost, the development risk is low, too.

Along the long tail, one can even attempt to differentiate mashups by their type. Organic mashups are typically less customizable, since they address the needs of a narrow group of users rather than those of individuals. This suggests locating them fairly far to the left within the long tail. Dashboards, on the other hand, are extremely customizable; in fact, it is the individual user who decides which capabilities are aggregated. The ratio of developers to users is practically one, which places dashboards at the very right end of the long tail. Mashups are not confined to the long tail. As already mentioned in Section 2.2.3, if demand increases, they may be shifted more and more to the left while still remaining mashups.
HousingMaps is a very successful example. It not only gained substantial popularity but also motivated Google to actually open its mapping capabilities and provide them as a Web API. On the other hand, demand may decrease, and mashup applications may outlive their usefulness and be disposed of. This does not impede their creation, since they were cheap to create in the first place.

The long tail is a valuable metaphor for comparing software systems. It shows that mashups are applications that emphasize the satisfaction of immediate needs, rather than providing complete software suites for holistic business use cases. However, mashups can outgrow the state of a situational application and evolve into a more general application that serves many users, moving along the spectrum towards the left end. The evolution of mashups will largely be influenced by the future demand for situational applications and by their movement along the long tail.

7.2 Conclusion

The work presented in this thesis thoroughly examined mashups to understand their basic concepts and value. The insight gained formed the basis of an analysis of how mashups can be applied in business process management and which value they offer in different scenarios.

Mashups arose from situational needs to combine capabilities on the Internet, reminiscent of recycling existing resources rather than creating new applications from scratch. Although mashups became popular only recently, they are perceived as a phenomenon of assembling applications out of pieces spread over the Internet. Thus, mashups are a genre of applications rather than a specific architecture or technology, enjoying broad interest yet lacking a concise definition. Because mashups are still so young, such a definition would probably do a disservice to their future evolution and impose inapt boundaries on their flourishing.
After a short introduction to the history of mashups and business process management (cf. Section 2), the first part of this thesis was dedicated to building an advanced understanding of the mashup genre. This genre comprises a broad variety of manually built and tool-supported mashups, as well as the tools themselves. Based on a survey of several successful mashup applications and tools, Section 3 identified two main types of mashup applications, organic mashups and dashboards. It also identified a set of common properties that are characteristic of mashups: user centricity, small scale, open standards, software as a service, short time to market, aggregation of heterogeneous content, data centricity, and lack of governance.

The observations and conclusions gained in that survey and the synthesis of its results fed into the elaboration of an operational pattern common to mashups in general, explained in Section 4.3. This pattern describes the conceptual workflow of a mashup, consisting of activities of three types: ingestion, augmentation, and publication. The pattern blends with the environmental setting of mashups, described by the mashup ecosystem in Section 4.1. To support mashup design and reengineering according to this pattern, an elementary reference model was derived that captures mashup operations in entities and discusses the relationships among them, presented in Section 4.4. All mashups reviewed in the survey (cf. Section 3.1) fit into this reference model.

The acquired insight into mashups explains their general value proposition: fostering innovation by unlocking and recombining capabilities in new and unanticipated ways. Through radical application composition of existing pieces obtained from external sources, mashups increase agility and reduce development costs. Immediate needs can be satisfied with relatively small effort and by reusing the value of existing assets.
Aggregating data spread among several systems allows related information to be connected and business insights to be uncovered quickly. The second part of this thesis was devoted to examining this rather general value proposition for its applicability in business process management, with respect to the main goals of the latter (cf. Section 5.1). Using the business process life cycle, Section 5.3 elaborated how well mashups can support business process management tasks and illustrated this with several potential mashup solutions. Among these are process modeling support that aggregates knowledge emerging from other stages of the business process life cycle, and dashboards that support process participants in the context of case handling and decision support. This examination showed that mashups offer the most value in aggregating process knowledge that is fragmented across several places and presenting it visually, providing insight into and understanding of process-related information. Section 6 demonstrated the value that mashups offer to business process management by means of a prototypical mashup application. Drawing on the general knowledge about mashups gained in the first part of this thesis, the proof of concept provides process modeling support for one of the potential scenarios elaborated in Section 5.3.1: the application aggregates model-related information, such as documentation and requirements, with the process model itself. While careful and exhaustive tests of the prototype’s usability and real-world effect were not feasible within the scope of this work, feedback on the proof of concept was invariably positive and encourages future research in this direction. In organizations that operate large-scale information and process systems, it is critical to bridge the interaction between these systems and their users.
Missing communication between developers and knowledge workers results in a lack of insight and weak support for exploring and presenting information. Mashups can be perceived as the “last mile” of software applications, carrying information from centralized, high-capacity systems to the many diverse end points where this information is ultimately used. Since mashups are a fairly new trend in software development, their future is still uncertain. At first glance, the well-established world of highly governed information and process management systems in business process management seems to contradict the revolutionary, agile, and fuzzy approach of mashups. However, I believe that these two software concepts can coexist and even complement each other, with mashups putting a face on an organization’s IT landscape that matches the needs of the users of its systems.

7.3 Future Work

Mashups can largely be considered a discipline created by autodidacts; owing to their recent popularity, they open a new field for software engineering. As pointed out in Section 1.2, software companies have adopted the topic quite ambitiously and have provided many technical proposals for mashups and mashup tools. Still, many questions remain open and many obstacles unsolved; some of them are covered in the following.

7.3.1 Governance in Mashups

Introduced in Section 3.5.8 and raised several times throughout this work, governance is currently one of the most pressing issues for mashups. For mashups to be adopted more widely, the lack of governance must be addressed. Successful mashups will require trustworthy solutions for accessing private or corporate data and for ensuring its responsible use. Most mashups still do not support logging in to remote sources.
Ongoing work is being conducted by OpenID [OIDF, 2007] and OAuth [OAuth Core Workgroup, 2005], but a convenient and simple solution that resolves these issues has yet to be developed before users and enterprises will put their trust in mashups. [Phifer, 2008] aptly discusses the need for governance as well as the need for a certain degree of freedom: “End-User Mashups Demand Governance (But Not Too Much Governance)”. Along with access to trusted sources goes the retrieval and execution of remote functionality in a trusted environment. The mashup prototype presented in Section 6 obtains functionality (MOVI) from a server and executes it locally. The problem with this approach is that executing code from untrusted capabilities in a trusted environment may lead to security breaches and the disclosure of confidential information, e.g., through identity spoofing. [Isaacs and Manolescu, 2008] propose a promising approach to disarming untrusted code by running it in an artificial sandbox that isolates the code from its native execution environment. Future improvements in governance, addressing security as well as a certain level of quality among Web APIs, will significantly influence the evolution of mashups, their adoption in industry, and their eventual success.

7.3.2 Schema Standardization and Semantic Web

[Ankolekar et al., 2007] argue that Web 2.0 and the Semantic Web are two distinct yet complementary approaches to the future evolution of the Web. To succeed, each must draw on the other’s strengths, which applies especially to the field of mashups. As mentioned in Section 3.5.3, data standardization has a considerable impact on the simplicity and reusability of content and functionality on the Web. During the design and realization of the mashup prototype, it became apparent that each capability required a carefully crafted ingestion component tailored to the particular document structure of its representations.
While content syndication provides adequate means for application-neutral data serialization, it lacks semantic expressiveness, a deficiency that could be remedied by applying the Resource Description Framework (RDF) [W3C, 2004]. RDF provides a well-defined and expressive data model while retaining application neutrality and allowing for custom extensions. Because capabilities, mashups, and consumers would share a standardized vocabulary, data transformation during ingestion and publication could be performed automatically. Augmentation could be expressed via declarations in RDF or implemented as custom functions that leverage the well-defined document structure [Morbidoni et al., 2007]. The ability to chain such transformations transitively supports the idea of building mashups simply by plugging together existing pieces with compatible interfaces. RDF has a standardized XML serialization (RDF/XML), which makes it well suited for structural and especially semantic transformations of, and operations on, other XML-based documents. XML-based exchange formats for business process models are currently under research and development, the XML Process Definition Language (XPDL) [WfMC, 2008] being one of them. Such formats are designed to express not only the structure and visualization of a diagram but also the semantics of the model, so that it can be enacted in workflow management systems. Together with the capabilities of RDF, this would allow a tighter and more meaningful integration of process models in mashups while decoupling mashup applications from specific or proprietary data formats. The drawback that comes with RDF’s comprehensive, application-independent expressiveness is that applying its features effectively requires a certain level of knowledge and experience, which raises the entry barrier for developers and users. Its suitability for mashups is therefore debatable.
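The following Python sketch illustrates, under explicit assumptions, how a shared vocabulary could decouple mashup steps from feed structure: an ingestion step maps a hypothetical Atom feed onto Dublin Core terms, an augmentation step merges such graphs, and a publication step emits N-Triples. The choice of Dublin Core and N-Triples, the sample feed, and all function names are assumptions made for this example; triples are plain Python tuples rather than objects of an RDF library.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation
DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms vocabulary

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, literal value)

# Hypothetical feed, standing in for one capability's syndicated output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Process documentation</title>
  <entry>
    <id>urn:example:doc:1</id>
    <title>Check invoice</title>
    <updated>2009-03-01T12:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:doc:2</id>
    <title>Approve payment</title>
  </entry>
</feed>"""


def atom_to_triples(feed_xml: str) -> List[Triple]:
    """Ingestion: map Atom entries onto a shared vocabulary (Dublin Core terms)."""
    root = ET.fromstring(feed_xml)
    triples: List[Triple] = []
    for entry in root.findall(f"{ATOM}entry"):
        subject = entry.findtext(f"{ATOM}id")
        if subject is None:
            continue  # without an identifier, the entry cannot become an RDF subject
        title = entry.findtext(f"{ATOM}title")
        if title is not None:
            triples.append((subject, DCTERMS + "title", title))
        updated = entry.findtext(f"{ATOM}updated")
        if updated is not None:
            triples.append((subject, DCTERMS + "modified", updated))
    return triples


def merge_graphs(*graphs: Iterable[Triple]) -> List[Triple]:
    """Augmentation: merging RDF graphs without blank nodes is a set union of triples."""
    return sorted(set().union(*graphs))


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Publication: serialize as N-Triples, with plain literals and minimal escaping."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)
```

Once entries from different feeds are expressed in the same vocabulary, merging and publishing them no longer depends on the structure of the original documents; only the ingestion step remains specific to each source.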
Future research on the use of RDF in mashups needs to weigh the value gained against the increased complexity across different application scenarios.

7.3.3 Business Process Management

To my knowledge, this thesis is the first academic work to examine the value proposition of mashups for business process management. It demonstrated the general value that mashups offer as well as the practical feasibility of realizing this value in business process management. In particular, it demonstrated how process-related information can be aggregated with process models to gain insight into fragmented process knowledge in the design phase, and to monitor or analyze process instances during evaluation. Further work needs to take these approaches a step further and assess their benefits through a thorough analysis in productive operation. Additionally, the other scenarios pointed out in Section 5.3 need to be elaborated. Recent research has attempted to combine the benefits of REST and distributed hypermedia systems with business process management. As outlined in Section 5.3.2, capabilities, and Web APIs in particular, are likely to influence future application design, also in the context of business process management. The efforts of organizations to offer capabilities on the Web may lead to business process management systems that run entirely on top of the “Internet Operating System” [O’Reilly, 2005]. Building on recent research in case handling and its effects on business process design, the automatic generation of mashups is conceivable, as indicated in Section 5.3.3. Such a generator would take a case description as input and derive a tentative dashboard configuration containing the information and controls needed to complete the case.
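Purely as an illustration of this envisioned generation step, the following Python sketch derives a tentative dashboard configuration from a case description. The case model (data objects and decisions), the widget types (`view`, `form`, `choice`), and all class and function names are hypothetical assumptions made for this example; they are not a specification of how such a generator must work.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataObject:
    name: str
    editable: bool = False  # may the knowledge worker change this data?


@dataclass
class Decision:
    name: str
    options: List[str]


@dataclass
class CaseDescription:
    name: str
    data_objects: List[DataObject] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)


def derive_dashboard(case: CaseDescription) -> Dict[str, Any]:
    """Derive a tentative dashboard: one widget per data object, one control per decision."""
    widgets: List[Dict[str, Any]] = []
    for obj in case.data_objects:
        # Editable data becomes a form, read-only data a view.
        widgets.append({"type": "form" if obj.editable else "view", "source": obj.name})
    for decision in case.decisions:
        widgets.append({"type": "choice", "label": decision.name,
                        "options": list(decision.options)})
    return {"title": case.name, "widgets": widgets}


if __name__ == "__main__":
    claim = CaseDescription(
        name="Insurance claim",
        data_objects=[DataObject("Claim form"), DataObject("Assessment", editable=True)],
        decisions=[Decision("Settle claim?", ["approve", "reject"])],
    )
    for widget in derive_dashboard(claim)["widgets"]:
        print(widget)
```

The widget types are deliberately coarse: the output is meant as a starting configuration rather than a finished user interface.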
Subsequently, user interface specialists, or even the knowledge workers concerned, who are the experts for particular cases, could adapt these dashboards to their individual needs. Such a solution would need to provide a mashup infrastructure and prefabricated components that access the organization’s information systems, process the retrieved data, and offer the results to users in various forms, as suggested by the mashup reference model in Section 4.4.

References

[A9.com, Inc., 2007] A9.com, Inc. (2007). OpenSearch 1.1 Specification. http://www.opensearch.org/Specifications/OpenSearch/1.1.
[Abiteboul et al., 2008] Abiteboul, S., Greenshpan, O., and Milo, T. (2008). Modeling the Mashup Space. In WIDM ’08: Proceeding of the 10th ACM workshop on Web information and data management, pages 87–94, New York, NY, USA. ACM.
[Alba et al., 2008] Alba, A., Bhagwan, V., and Grandison, T. (2008). Accessing the Deep Web: When Good Ideas Go Bad. In OOPSLA Companion ’08: Companion to the 23rd ACM SIGPLAN conference on Object oriented programming systems languages and applications, pages 815–818, New York, NY, USA. ACM.
[Amer-Yahia et al., 2008] Amer-Yahia, S., Markl, V., Halevy, A., Doan, A., Alonso, G., Kossmann, D., and Weikum, G. (2008). Databases and Web 2.0 panel at VLDB 2007. In Proceedings of SIGMOD 2008, volume 37 of Lecture Notes in Computer Science, pages 49–52, Vancouver, Canada. ACM.
[Anderson, 2004] Anderson, C. (2004). The Long Tail. http://www.wired.com/wired/archive/12.10/tail.html.
[Ankolekar et al., 2007] Ankolekar, A., Krötzsch, M., Tran, T., and Vrandecic, D. (2007). The Two Cultures: Mashing up Web 2.0 and the Semantic Web. In WWW ’07: Proceedings of the 16th international conference on World Wide Web, pages 825–834, New York, NY, USA. ACM.
[Bellas, 2004] Bellas, F. (2004). Standards for Second-Generation Portals. IEEE Internet Computing, 8(2):54–60.
[Berners-Lee, 1989] Berners-Lee, T. (1989). Information Management: A Proposal. http://www.w3.org/History/1989/proposal.html.
[Berners-Lee, 1996] Berners-Lee, T. (1996). WWW: Past, Present, and Future. Computer, 29(10):69–77.
[Berners-Lee et al., 1998] Berners-Lee, T., Fielding, R., and Masinter, L. (1998). Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396 (Draft Standard). Obsoleted by RFC 3986, updated by RFC 2732.
[Berners-Lee et al., 2005] Berners-Lee, T., Fielding, R., and Masinter, L. (2005). Uniform Resource Identifier (URI): Generic Syntax. RFC 3986 (Standard).
[Berners-Lee et al., 1994] Berners-Lee, T., Masinter, L., and McCahill, M. (1994). Uniform Resource Locators (URL). RFC 1738 (Proposed Standard). Obsoleted by RFCs 4248, 4266, updated by RFCs 1808, 2368, 2396, 3986.
[Bradley, 2007] Bradley, A. (2007). Reference Architecture for Enterprise ’Mashups’. Technical report, Gartner Research.
[Bughin et al., 2008] Bughin, J., Manyika, J., and Miller, A. (2008). Building the Web 2.0 Enterprise: McKinsey Global Survey Results. The McKinsey Quarterly.
[Casati, 2007] Casati, F. (2007). Business Process Mashups? Process Management and the Web Growing Together. In WETICE ’07: Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, page 5, Washington, DC, USA. IEEE Computer Society.
[Clarkin and Holmes, 2007] Clarkin, L. and Holmes, J. (2007). Enterprise Mashups. The Architecture Journal, 13:24–28.
[Crockford, 2006] Crockford, D. (2006). The application/json Media Type for JavaScript Object Notation (JSON). RFC 4627 (Informational).
[Crupi and Warner, 2008a] Crupi, J. and Warner, C. (2008a). Enterprise Mashups Part I: Bringing SOA to the People. SOA Magazine, 18.
[Crupi and Warner, 2008b] Crupi, J. and Warner, C. (2008b). Enterprise Mashups Part II: Why SOA Architects Should Care. SOA Magazine, 21.
[Decker, 2008] Decker, G. (2008). BPM Offensive Berlin: BPMN Stakeholder. http://bpmb.de/index.php/BPMN-Stakeholder.
[ECMA, 1999] ECMA (1999). ECMAScript Language Specification (JavaScript), 3rd Edition. http://www.ecma-international.org/publications/standards/Ecma-262.htm.
[Fielding et al., 1999] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999). Hypertext Transfer Protocol – HTTP/1.1. RFC 2616 (Draft Standard). Updated by RFC 2817.
[Fielding, 2000] Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine.
[Fielding et al., 2002] Fielding, R. T. and Taylor, R. N. (2002). Principled Design of the Modern Web Architecture. ACM Transactions on Internet Technology, 2:115–150.
[Garrett, 2005] Garrett, J. J. (2005). Ajax: A New Approach to Web Applications. http://www.adaptivepath.com/ideas/essays/archives/000385.php.
[Gregorio and de hOra, 2007] Gregorio, J. and de hOra, B. (2007). The Atom Publishing Protocol. RFC 5023 (Proposed Standard).
[Gurram et al., 2008] Gurram, R., Mo, B., and Güldemeister, R. (2008). A Web based Mashup Platform for Enterprise 2.0. Technical report, SAP Labs, LLC.
[Hinchcliffe, 2006] Hinchcliffe, D. (2006). Is IBM making enterprise mashups respectable? http://blogs.zdnet.com/Hinchcliffe/?p=49.
[Hinchcliffe, 2007] Hinchcliffe, D. (2007). Mashups: The next major new software development model? http://blogs.zdnet.com/Hinchcliffe/?p=106.
[Hinchcliffe, 2008] Hinchcliffe, D. (2008). The WOA story emerges as better outcomes for SOA. http://blogs.zdnet.com/Hinchcliffe/?p=213.
[Hof, 2005] Hof, R. (2005). Mix, Match, And Mutate: "Mash-ups" – homespun combinations of mainstream services – are altering the Net. http://www.businessweek.com/magazine/content/05_30/b3944108_mz063.htm.
[Hoffman, 2007] Hoffman, B. (2007). Ajax Security. Addison-Wesley Professional, Reading.
[Hohpe and Woolf, 2003] Hohpe, G. and Woolf, B. (2003). Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[Hoyer and Fischer, 2008] Hoyer, V. and Fischer, M. (2008). Market Overview of Enterprise Mashup Tools. In Bouguettaya, A., Krüger, I., and Margaria, T., editors, ICSOC, volume 5364 of Lecture Notes in Computer Science, pages 708–721.
[Hoyer et al., 2008] Hoyer, V., Stanoevska-Slabeva, K., Janner, T., and Schroth, C. (2008). Enterprise Mashups: Design Principles towards the Long Tail of User Needs. In SCC ’08: Proceedings of the 2008 IEEE International Conference on Services Computing, pages 601–602, Washington, DC, USA. IEEE Computer Society.
[IBM Corporation, 2008] IBM Corporation (2008). Why Mashups Matter. In 28th DNUG Conference, June 2008.
[Isaacs and Manolescu, 2008] Isaacs, S. and Manolescu, D. (2008). Microsoft Live Labs: Web Sandbox. In PDC2008: Professional Developers Conference.
[Jackson and Wang, 2007] Jackson, C. and Wang, H. J. (2007). Subspace: Secure Cross-domain Communication for Web Mashups. In WWW ’07: Proceedings of the 16th international conference on World Wide Web, pages 611–620, New York, NY, USA. ACM.
[Janiesch et al., 2008] Janiesch, C., Fleischmann, K., and Dreiling, A. (2008). Extending Services Delivery with Lightweight Composition. In WISE ’08: Proceedings of the 2008 international workshops on Web Information Systems Engineering, pages 162–171, Berlin, Heidelberg. Springer-Verlag.
[Jarrar and Dikaiakos, 2008] Jarrar, M. and Dikaiakos, M. D. (2008). MashQL: a query-by-diagram topping SPARQL. In ONISW ’08: Proceeding of the 2nd international workshop on Ontologies and Information Systems for the Semantic Web, pages 89–96, New York, NY, USA. ACM.
[Jhingran, 2006] Jhingran, A. (2006). Enterprise Information Mashups: Integrating Information, Simply. In VLDB’2006: Proceedings of the 32nd international conference on Very large data bases, pages 3–4. VLDB Endowment.
[Kayed and Shalaan, 2006] Kayed, M., Shalaan, K. F., Chang, C.-H., and Girgis, M. R. (2006). A Survey of Web Information Extraction Systems. IEEE Trans. on Knowl. and Data Eng., 18(10):1411–1428.
[Keukelaere et al., 2008] Keukelaere, F. D., Bhola, S., Steiner, M., Chari, S., and Yoshihama, S. (2008). SMash: Secure Component Model for Cross-Domain Mashups on Unmodified Browsers. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 535–544, New York, NY, USA. ACM.
[Knöpfel et al., 2006] Knöpfel, A., Gröne, B., and Tabeling, P. (2006). Fundamental Modeling Concepts: Effective Communication of IT Systems. John Wiley & Sons.
[López et al., 2008] López, J., Pan, A., Ballas, F., and Montoto, P. (2008). Towards a Reference Architecture for Enterprise Mashups. In Actas del Taller de Trabajo ZOCO’08/JISBD. Integración de Aplicaciones Web: XIII Jornadas de Ingeniería del Software y Bases de Datos, Gijón, October 7–10, 2008, pages 67–76.
[Merrill, 2006] Merrill, D. (2006). Mashups: The new breed of Web app. http://www.ibm.com/developerworks/xml/library/x-mashups.html.
[Morbidoni et al., 2007] Morbidoni, C., Polleres, A., Tummarello, G., and Phuoc, D. L. (2007). Semantic Web Pipes. Technical Report DERI-TR-2007-11-07, DERI Galway, IDA Business Park, Lower Dangan, Galway, Ireland.
[Nottingham and Sayre, 2005] Nottingham, M. and Sayre, R. (2005). The Atom Syndication Format. RFC 4287 (Proposed Standard).
[Novak and Voigt, 2006] Novak, J. and Voigt, B. (2006). Mashing-up Mashups: From Collaborative Mapping to Community Innovation Toolkits. In MCIS 2006: Mediterranean Conference on Information Systems, Venice, October 5–8, 2006.
[OAuth Core Workgroup, 2005] OAuth Core Workgroup (2005). OAuth Core, Version 1.0. http://oauth.net/core/1.0/.
[OGC, 2008] OGC (2008). KML, Version 2.2. http://www.opengeospatial.org/standards/kml/.
[OIDF, 2007] OIDF (2007). OpenID Specifications. http://openid.net/developers/specs/.
[OMG, 2005] OMG (2005). UML Specification, Version 2.0. http://www.omg.org/spec/UML/.
[OMG, 2008] OMG (2008). Business Process Modeling Notation Specification, Version 1.1. http://www.bpmn.org/.
[O’Reilly, 2005] O’Reilly, T. (2005). What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software. http://www.oreilly.de/artikel/web20.html.
[Overdick, 2007] Overdick, H. (2007). The Resource-Oriented Architecture. In IEEE Congress on Services, pages 340–347.
[Pemberton, 2002] Pemberton, S. (2002). XHTML 1.0: The Extensible HyperText Markup Language (Second Edition). http://www.w3.org/TR/2002/REC-xhtml1-20020801.
[Phifer, 2008] Phifer, G. (2008). End-User Mashups Demand Governance (But Not Too Much Governance). Technical report, Gartner Research.
[Raggett et al., 1999] Raggett, D., Le Hors, A., and Jacobs, I. (1999). HTML Specification, Version 4.01. http://www.w3.org/TR/1999/REC-html401-19991224.
[Riabov et al., 2008] Riabov, A. V., Boillet, E., Feblowitz, M. D., Liu, Z., and Ranganathan, A. (2008). Wishful Search: Interactive Composition of Data Mashups. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 775–784, New York, NY, USA. ACM.
[Richardson and Ruby, 2007] Richardson, L. and Ruby, S. (2007). RESTful Web Services. O’Reilly.
[RSS, 2007] RSS (2007). RSS 2.0 Specification. http://www.rssboard.org/rss-specification/.
[Scheer et al., 1992] Scheer, A.-W., Nüttgens, M., and Keller, G. (1992). Semantische Prozeßmodellierung auf der Grundlage Ereignisgesteuerter Prozeßketten. Technical Report 89, Institut für Wirtschaftsinformatik, Universität des Saarlandes.
[Shirky, 2004] Shirky, C. (2004). Situated Software. http://www.shirky.com/writings/situated_software.html.
[Simmen et al., 2008] Simmen, D. E., Altinel, M., Markl, V., Padmanabhan, S., and Singh, A. (2008). Damia: Data Mashups for Intranet Applications. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1171–1182, New York, NY, USA. ACM.
[Spohrer et al., 2008] Spohrer, J., Vargo, S. L., Caswell, N., and Maglio, P. P. (2008). The Service System Is the Basic Abstraction of Service Science. In HICSS ’08: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, page 104, Washington, DC, USA. IEEE Computer Society.
[Tatemura et al., 2007] Tatemura, J., Sawires, A., Po, O., Chen, S., Candan, K. S., Agrawal, D., and Goveas, M. (2007). Mashup Feeds: Continuous Queries over Web Services. In SIGMOD ’07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1128–1130, New York, NY, USA. ACM.
[van der Aalst, 2002] van der Aalst, W. M. P. (2002). Making Work Flow: On the Application of Petri Nets to Business Process Management. In ICATPN ’02: Proceedings of the 23rd International Conference on Applications and Theory of Petri Nets, pages 1–22, London, UK. Springer-Verlag.
[van der Aalst et al., 2003] van der Aalst, W. M. P., ter Hofstede, A. H. M., and Weske, M. (2003). Business Process Management: A Survey. In Business Process Management, pages 1–12.
[van der Aalst and Weske, 2005] van der Aalst, W. M. P. and Weske, M. (2005). Case Handling: A New Paradigm for Business Process Support. Data Knowl. Eng., 53(2):129–162.
[W3C, 2004] W3C (2004). Resource Description Framework (RDF) Specifications. http://www.w3.org/RDF/.
[Watt, 2007] Watt, S. (2007). Mashups – The evolution of the SOA, Part 1: Web 2.0 and foundational concepts. http://www.ibm.com/developerworks/webservices/library/ws-soa-mashups/index.html.
[Weske, 2007] Weske, M. (2007). Business Process Management: Concepts, Languages, Architectures. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[WfMC, 2008] WfMC (2008). Process Definition Interface – XML Process Definition Language, Version 2.1a.
[Yu et al., 2008] Yu, J., Benatallah, B., Casati, F., and Daniel, F. (2008). Understanding Mashup Development. IEEE Internet Computing, 12(5):44–52.
[Yu, 2008] Yu, S. (2008). Innovation in the Programmable Web: Characterizing the Mashup Ecosystem. In Mashups’08 ICSOC.