
Business Process Technology group
Master’s Thesis
Business Process Mashups
An Analysis of Mashups and their Value Proposition
for Business Process Management
Matthias Kunze
[email protected]
March 31, 2009
Supervisors: Prof. Dr. Mathias Weske, MSc. Hagen Overdick
Hasso Plattner Institute, Potsdam (Germany)
Abstract
Mashups are an exciting new genre of Web applications that has gained considerable momentum recently. While thousands of mashup applications exist and many software vendors promote “Enterprise Mashup” suites, the term “mashup” itself lacks a concise definition and the usefulness of mashups in specific fields of operation remains unclear.
In this work, 29 mashups and tools to create mashups have been studied in a qualitative survey that revealed two general types and eight common characteristics of mashups. This survey forms the basis of a general mashup pattern that specifies mashups by their characteristic behavior of aggregating capabilities on the Web. A reference model complying with this pattern provides the means to understand existing mashups and to design new ones on a conceptual level. In a further step, this knowledge has been used to examine the suitability of mashups in the field of business process management. Specific application scenarios were analyzed comprehensively on the basis of the business process life cycle, revealing significant potential in aggregating fragmented process knowledge and in providing lightweight process implementations.
Zusammenfassung
Mashup denotes a new genre of Web applications that has attracted considerable attention in recent years. Although thousands of mashups now exist and software vendors even offer “Enterprise Mashups” as stand-alone solutions, the term “mashup” itself is not uniformly defined and the usefulness of mashups in different fields of application has not been examined.
In this work, 29 mashups and tools for creating mashups were subjected to a qualitative study, which revealed two general types and eight typical characteristics of mashups. The study forms the basis of a general pattern that specifies mashups by their typical behavior, which results from the aggregation of capabilities available on the Web. A reference model developed on the basis of this pattern makes it possible to understand existing mashups and to design new ones. Furthermore, the insights gained were applied to examine the suitability of mashups for business process management. An analysis of several application scenarios, developed on the basis of the business process life cycle, showed that mashups offer significant potential for aggregating fragmented process knowledge and for implementing lightweight processes.
Acknowledgements
The research work presented in this thesis has been carried out from November 2008
to March 2009 at the Hasso Plattner Institute at the University of Potsdam.
My appreciation goes to Prof. Dr. Mathias Weske for offering me the opportunity to conduct this work under his supervision, as well as to the colleagues at the Business Process Technology group for their ongoing advice and encouragement. I especially thank Hagen Overdick and Gero Decker for many inspiring discussions. Further, I want to thank the members of the Oryx team, who supported me with valuable feedback on my work, as well as Franziska Häger, Jennifer Baldwin, Martin Czuchra, Tilman Giese, and Tobias Vogel for their comments and proofreading.
I, Matthias Kunze, hereby declare that I have written this thesis on my own, that I have acknowledged all sources, and that I have marked citations appropriately.
March 31, 2009
Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Thesis Goals and Outline

2 Preliminaries
  2.1 Business Process Management
  2.2 Mashups and the Evolution of the Web
    2.2.1 Web 1.0
    2.2.2 Web 2.0
    2.2.3 Situational Applications
    2.2.4 Mashups
  2.3 Remarks
    2.3.1 Content Syndication
    2.3.2 Representational State Transfer
    2.3.3 Same Origin Policy

3 Survey of Mashups and Mashup Tools
  3.1 Selection of Samples
  3.2 Classification Model
  3.3 Synthesis of Survey Results
  3.4 Types of Mashups
    3.4.1 Organic Mashups
    3.4.2 Dashboards
  3.5 Common Characteristics of Mashups
    3.5.1 User Centric
    3.5.2 Small Scale
    3.5.3 Open Standards
    3.5.4 Software as a Service
    3.5.5 Short Time to Market
    3.5.6 Aggregation of Heterogeneous Content
    3.5.7 Data Centric
    3.5.8 Lack of Governance

4 Anatomy of a Mashup
  4.1 The Mashup Ecosystem
  4.2 Capabilities—Essential Mashup Enablers
  4.3 The Mashup Pattern
    4.3.1 Ingestion
    4.3.2 Augmentation
    4.3.3 Publication
  4.4 The Mashup Reference Model
    4.4.1 Reference Model Components
    4.4.2 Organic Mashups and Dashboards

5 Application of Mashups for Business Process Management
  5.1 Goals of Business Process Management
  5.2 Business Process Life Cycle
  5.3 Value Proposition of Mashups for Business Process Management
    5.3.1 Design and Analysis
    5.3.2 Configuration
    5.3.3 Enactment
    5.3.4 Evaluation
  5.4 Assessment

6 Enabling Collaborative Process Design with Mashups
  6.1 Analysis
    6.1.1 Process Model from Oryx
    6.1.2 Issues from an Issue Tracking System
    6.1.3 Documentation from a Wiki
  6.2 Design
    6.2.1 Mashup Platform
    6.2.2 Mashup Architecture
  6.3 Realization

7 Conclusion and Outlook
  7.1 The Long Tail
  7.2 Conclusion
  7.3 Future Work
    7.3.1 Governance in Mashups
    7.3.2 Schema Standardization and Semantic Web
    7.3.3 Business Process Management

References
List of Figures

1 HousingMaps
2 Yahoo! Pipes
3 PageFlakes
4 Growth of Mashups
5 Mashup Categories
6 General Architecture of an Organic Mashup
7 General Architecture of a Dashboard
8 The Mashup Ecosystem
9 Overview of the Mashup Pattern
10 End-to-End Mashup Workflow
11 Mashup Reference Model
12 Organic Mashups and Dashboards according to the Mashup Reference Model
13 Life Cycle of a Business Process
14 Usage of Process Model and Process Knowledge throughout the Process Life Cycle
15 Comparison of Business Process and Mashup
16 Example of a Dashboard that Supports Human Activity
17 Architecture of the Mashup Platform
18 Architecture of the Mashup Prototype
19 Demo of the Mashup Prototype
20 Long Tail in the Spectrum of Software Systems
Graphics used within this thesis for explanatory and illustrative purposes may be composed by means of notational languages that express semantics. Diagrams will be labeled with the corresponding abbreviations, as follows.
• Business Process Modeling Notation (BPMN) [OMG, 2008]
• Unified Modeling Language (UML) [OMG, 2005]
• Fundamental Modeling Concepts (FMC) [Knöpfel et al., 2006]
1 Introduction
As David Berlind, Executive Editor of the online magazine ZDNet, aptly stated, “mashups are the fastest growing ecosystem on the Web” (http://news.zdnet.com/2422-13569_22-152729.html). What he means by this is that mashups are an exciting new genre of Web applications, “one of the more powerful capabilities coming out of the Web 2.0 wave” [Phifer, 2008].
“Mashup” denotes a new style of composite application that combines a set of capabilities, such as skills or knowledge, obtained from disparate sources on the Web. Mashups draw upon these capabilities and create value by providing immediate solutions to transient needs and insight through connecting related information. The capabilities referred to here are generally resources that are provided online, such as Web sites. Thus, mashups are sometimes regarded as Web page aggregators.
The term mashup originates from the pop music scene of the 1990s, when artists blended existing song tracks, usually of different genres. Just like these music mashups, which have been most apparent in the hip-hop scene, Web mashups combine capabilities of different types and sources in new, unanticipated, and innovative ways, using rather simple approaches and techniques for aggregation. Generally, mashups can be built rapidly by developers of intermediate skill using scripting languages such as PHP, Ruby, or JavaScript. Through the provision of mashup tools, even end users are given the ability to create mashups. Such tools simplify the process of composing an application to the degree of visually connecting pre-manufactured software components that map to external capabilities via drag and drop.
To emphasize the broad variety of mashups, three typical representatives will be
introduced in the following.
HousingMaps. The most popular type of mashup is most likely the mapping mashup, which visualizes any information related to a geographic position as a location on a map. This allows immediate apprehension of distances and distribution patterns. HousingMaps, a typical example of a manually developed mashup, depicted in Figure 1, has been one of the precursors of mashups, leveraging Google Maps to display real estate offers from Craigslist (http://craigslist.org).
HousingMaps allows users to search for houses, apartments, and rooms within a specific region, filtering offers by type, price, and other characteristics.
Figure 1: HousingMaps. A manually built mashup.
To achieve this, it retrieves offers from Craigslist, sorts and filters them, extracts location information, and uses the Google Maps API to show them on a map. When HousingMaps was first released, neither Craigslist nor Google offered their services for reuse or recombination. In fact, the developer of HousingMaps, Paul Rademacher, was hired by Google to develop the Google Maps API after he had leveraged Google Maps by hacking them to create his mashup.
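To illustrate the flow just described, the following TypeScript sketch outlines how such a mapping mashup could combine the two capabilities. It is a minimal illustration, not HousingMaps' actual code: the feed URL, the Offer shape, and the showOnMap helper are hypothetical placeholders, and a real implementation would use a mapping API such as Google Maps for rendering.

```typescript
// Minimal sketch of a mapping mashup in the style of HousingMaps.
// Assumptions: a hypothetical JSON feed of offers at OFFERS_URL and a
// hypothetical showOnMap() helper that wraps a mapping API.

interface Offer {
  title: string;
  price: number;
  lat: number;
  lng: number;
}

const OFFERS_URL = "https://example.org/offers.json"; // hypothetical source

// Hypothetical rendering helper; a real mashup would create map markers here.
declare function showOnMap(offers: Offer[]): void;

async function loadOffers(maxPrice: number): Promise<Offer[]> {
  const response = await fetch(OFFERS_URL);      // ingest the external capability
  const offers: Offer[] = await response.json();
  return offers
    .filter((offer) => offer.price <= maxPrice)  // filter by the user's criteria
    .sort((a, b) => a.price - b.price);          // sort for presentation
}

loadOffers(1500).then(showOnMap);
```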
Yahoo! Pipes. Another famous player in the mashup world is Yahoo! Pipes, which provides the ability to consume data from all over the Internet and to reformat and aggregate information to create insight. It is not a mashup itself, but rather a Web application that allows users to create information aggregation mashups through drag and drop, as shown in Figure 2. It uses the pipes-and-filters pattern [Hohpe and Woolf, 2003], where filters are data operations, connected through pipes, i.e. data channels.
The simplicity of its user interface enables people with a basic understanding of feeds and HTML, as well as of a small set of operations, to aggregate information sourced from many places on the Internet. The resulting mashup is completely deployed on Yahoo!'s infrastructure, hosted and executed on their servers. The mashup's result is delivered to the user in different formats, such as HTML lists or RSS feeds. Yahoo! Pipes also allows users to display information items on a map. Microsoft Popfly is another mashup tool that provides similar functionality.
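To make the pipes-and-filters idea concrete, the following TypeScript sketch connects a few filter functions through a simple pipe combinator. It is a minimal illustration of the pattern only, under the assumption of already fetched lists of feed items; it does not reflect the internals of Yahoo! Pipes.

```typescript
// Minimal pipes-and-filters sketch: filters are data operations, pipes connect them.
// The FeedItem shape and the input sources are assumptions for illustration.

interface FeedItem {
  title: string;
  link: string;
  published: Date;
}

type Filter = (items: FeedItem[]) => FeedItem[];

// A pipe simply chains filters, feeding the output of one into the next.
const pipe = (...filters: Filter[]): Filter =>
  (items) => filters.reduce((current, filter) => filter(current), items);

// Example filters, i.e. data operations on feed items.
const keyword = (word: string): Filter =>
  (items) => items.filter((item) => item.title.toLowerCase().includes(word));

const newestFirst: Filter = (items) =>
  [...items].sort((a, b) => b.published.getTime() - a.published.getTime());

const take = (n: number): Filter => (items) => items.slice(0, n);

// Sources are assumed to have been fetched elsewhere, e.g. via content syndication.
declare const sourceA: FeedItem[];
declare const sourceB: FeedItem[];

// Connect the filters through a pipe and run the aggregated items through it.
const aggregate = pipe(keyword("mashup"), newestFirst, take(10));
console.log(aggregate([...sourceA, ...sourceB]).map((item) => item.title));
```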
Figure 2: Yahoo! Pipes. Mashup definition with pipes and filters.
PageFlakes. The third example shows a completely different flavor of mashups compared to the two outlined above. PageFlakes, illustrated in Figure 3, is an application that lets users aggregate information through the composition of modules, so-called flakes. A flake represents specific content obtained from a source on the Internet, such as a calendar, notes, a weather forecast, bookmarks, or RSS feeds. The collection of flakes on a page provides a quick overview of required information, similar to the dashboard of an automobile. Therefore, PageFlakes is often used as a start page.
Such dashboards are a hybrid of mashup and mashup tool. While the actual dashboard instance is a mashup, it at the same time allows users to adapt the mashup to their personal needs by offering a set of flakes to add content. These kinds of mashups enable virtually anybody to assemble a set of relevant information items and have become extremely popular in recent years, as the diversity of available dashboard mashups shows. Similar mashups are iGoogle and NetVibes.
Figure 3: PageFlakes. A typical dashboard, aggregating several sources with the help
of widgets.
1.1 Motivation
Information that is critical to perform tasks and make decisions is often spread across
several software systems and transferred inefficiently between those who can provide
and those who need to consume data. Users often resort to ineffective means, such as copy-and-pasting information, packaging it in spreadsheets or documents, and sending it via email. These processes are manual, error-prone, and not scalable.
Solutions that meet such needs typically entail custom development. Due to the
immediate and potentially short-lived need, this is often very expensive and not
profitable.
The capacity of mashups to deliver information from its sources to its consumers and to reformat and aggregate data, creating insight by connecting related data, promises high value for the operations of organizations. Thus, mashups are predicted to gain significant impact on the future IT landscapes of companies, leveraging existing systems to serve immediate needs of end users by means of simple and rapid creation of ad-hoc applications. At the time of writing this thesis, a widely agreed, crisp definition of mashups is still nonexistent. While this has considerable drawbacks in terms of creating solutions based on established standards or best practices, it also opens the field for innovative, yet unanticipated solutions.
Business process management is an established discipline that joins information technology and business economics at the level of an organization's internal, revenue-creating processes. The participants involved in business process management are manifold, ranging from strategic positions such as the CIO or CTO to process-implementing developers and participants that interact with a running process instance. Central knowledge assets of business process management are business process models. They serve as an agreement across the different phases of carrying out processes in an organization. Information beyond process models is often kept informally, suffering from the same fragmentation and inefficient communication described above.
Bringing together information that emerges within business process management
and the process model with the help of mashups promises a set of key benefits to all
participants. Among these benefits is immediate and simple access to information.
Instead of searching through a multitude of systems and places to identify relevant
knowledge every time it is required, mashups can aggregate such capabilities and
provide fast and efficient access, connecting related pieces of information with each other and with
the process model. End users, such as knowledge workers, are empowered to create
their own mashups out of resources internal and external to the organization to
support their activities and decision making.
1.2 Related Work
While mashups have become quite popular in recent years, there is only limited literature that attempts to comprehend mashups and their concepts. The majority of research articles are contributions from software vendors and focus solely on technical questions. An independent and thorough analysis of mashups is still missing. Mashups were only infrequently represented at conferences such as WWW, SIGMOD, ICSOC, and OOPSLA in the past, but they seem to be gaining increasing momentum, e.g. at cutting-edge conferences such as Mashup Camp (http://www.mashupcamp.com/) and Composable Web (http://mashart.org/composableweb2009/).
Many software vendors have approached the topic by providing their own frameworks
or solutions, for example IBM with [Keukelaere et al., 2008, Riabov et al., 2008,
Simmen et al., 2008], SAP with [Gurram et al., 2008, Janiesch et al., 2008], and Microsoft with [Isaacs and Manolescu, 2008, Jackson and Wang, 2007]. Furthermore,
a lot of discussion about mashups is going on in blogs, along with general discussions about Web 2.0 and Web-oriented Architecture that seems to emerge as a new
paradigm of service distribution on the Web. Dion Hinchcliffe is a well-known author at ZDNet (http://blogs.zdnet.com/Hinchcliffe) who frequently discusses mashups in the context of Web 2.0 and business.
The whole extent of mashups does not become visible unless one takes these discussions into account, as well as the magnitude of mashups and mashup tools that are already offered. A great resource to get an impression of the quantity of existing mashups is ProgrammableWeb (http://www.programmableweb.com).
Some attempts have been made to formulate reference models or reference architectures for mashups. However, these works limited their scope to particular aspects of mashups instead of drawing a holistic picture and are thus of limited general usefulness. [Hinchcliffe, 2006] and [Yu et al., 2008] approach mashups as an aggregation
of components of one of the following types: data, functionality, or presentation.
While this seemed to be a good idea, it turned out during the work of this thesis
that resorting to distinguishing these component types is not feasible considering resources and representations on the Web: Representations, i.e. hypertext documents,
are generally a hybrid of structured data, functionality, and presentation directives.
This is discussed in detail in Section 4.2. [Bradley, 2007] claims to create a reference
architecture for enterprise mashups. Unfortunately, this architecture is based on a
business view rather than on software engineering principles and suffers from a loss of
generality. A similar approach is given by [López et al., 2008], who neglect the characteristic of mashups to also aggregate functionality. A promising suggestion that is
also academically well founded is provided by [Abiteboul et al., 2008], who develop
a generic but very formal model for mashups. This is of questionable practical use,
since mashups emerge from communities that are rather pragmatic, and less likely
to adopt complex yet scientifically well founded models. The present work tries to
draw upon these models, adapting their strengths yet remaining general and simple
enough to cover the whole extent of mashups rather than only focusing on particular
aspects.
Business process management has been a well-established science for more than
two decades. Terms and concepts are well understood and consolidated. Most of
the statements made about business process management in this thesis relate to lectures at the Business Process Technology group at the Hasso Plattner Institute under
whose supervision this work was conducted. Additional knowledge is mainly obtained
from [Weske, 2007] and [van der Aalst et al., 2003, van der Aalst and Weske, 2005].
At this time, I am not aware of any advanced literature that examines the value
proposition of mashups for business process management. [Casati, 2007] provides a
position paper that introduces the idea of combining the two different topics, but
only identifies potential deficiencies of business process management and suggests looking at mashups to solve them. Follow-up work on this position paper was not found. Some companies, e.g. The Process Factory (http://www.theprocessfactory.com/) or Serena (http://www.serena.com/mashups/), are also addressing business processes with mashups, yet they do not provide scientific work for review.
Consequently, this thesis provides introductory work examining the value proposition
of mashups for business process management.
1.3 Thesis Goals and Outline
As mentioned previously, there is no common agreement on a definition of the term mashup, on how mashups are constituted, or on the value they provide to specific business opportunities. The literature is concerned with solving technical problems and providing exciting platforms and software suites rather than analyzing the problem in the same way mashups emerged: bottom up. Thus, the thesis does not assume any definition of a mashup beyond that of a piece of software that aggregates more than one capability made available over the Web. This work attempts to gain a comprehensive understanding of the characteristics of mashups, the environment in which they have evolved, and their inner workings.
The thesis is structured as follows: Section 2 will introduce the background of business process management and mashups, explaining how these topics have emerged over time and which influences led to their current state. It will further provide related information on distinct topics that are fundamental to understanding the remainder of the thesis.
Section 3 will examine mashups from the ground up. For that purpose, several mashups and mashup tools have been analyzed, among others, for their type, purpose, and aggregated capabilities. The results of that survey will be synthesized and a set of common characteristics derived that supports further observations and examinations within the thesis.
Section 4 leverages results and experience ensuing from the survey to understand the environment that mashups exist in, called the mashup ecosystem. Taking this as a starting point, mashups will be examined at a finer degree of granularity in order to understand their anatomy, which will reveal a conceptual pattern common to all mashups. This pattern will be refined to derive a compositional reference model for mashups. The reference model by no means addresses a specific architecture or
technology but serves as a tool to understand the workings of existing mashups and
the design of new ones.
The gained understanding of mashups is essential to study their application to business processes. Section 5 will discuss business process management by means of the
business process life cycle, a model that describes the phases a process passes through
from design to execution to retirement or improvement through redesign. The section
will propose opportunities to improve or support each of these phases and conclude
with an assessment of the value of mashups for business process management.
The results of the previous sections will be employed in Section 6 to show the applicability of mashups to business process management, as well as the value created, in a proof of concept. An actual mashup has been implemented that aggregates several sources of knowledge related to a process with the corresponding process model. This mashup comprehensively displays documentation, issues, and feature requests in a single, holistic perspective. Its development and outcome will be documented in that section, discussing issues that arose and experience gained.
Finally, the work will be concluded in Section 7. The long tail of software development, a metaphor occasionally used in that context, positions mashups in the overall spectrum of software systems and frames a discussion of their potential and value proposition. Further, the essential outcomes of this thesis will be reviewed and discussed, addressing ideas for and issues of future work in the same area.
2 Preliminaries
2.1 Business Process Management
In the early days of computers, applications were built directly on top of operating
systems that provided little functionality. Such applications comprised all functionality to conduct a task, each implementing every required component on its own,
including data storage. This led to anomalies and inconsistencies, because data used
in more than one application had to be copied and maintained among them. In
the course of computer history, operating systems provided increasing functionality,
and additional layers of functional abstraction and encapsulation emerged between
operating systems and specific applications. One of the most influential applications that provided its functionality to applications built on top is the database
system. Database systems provide shared data storage across several applications, eliminating the anomalies and inconsistencies of redundantly stored data. Applications that satisfy specific needs were built on top, sharing data with other applications.
Soon, it became obvious that not only data, but also functionality should be reused. Business logic dedicated solely to a specific domain, such as customer relationship management, proved to be useful across several departments of an organization. This contributed to the typical enterprise architectures that are present in today's organizations. Domain-specific, yet reusable software components, offered as services, are built on top of general-purpose applications such as database systems. The emphasis in creating use-case-specific, or tailor-made, applications shifted from programming to the composition of these services.
Until the process orientation trends of the 1990s, which originated in business management, most applications were data-driven. Striving for innovation and flexibility, process orientation promised efficient support for aligning business and information technology. The need to structure businesses along revenue-creating processes put the focus on process orientation in information technology and inherently constituted the field of business process management (BPM), eventually combining the benefits of process orientation with the capacities provided by the evolution of information technology.
Business process management is based on the observation that the value an organization creates is the outcome of a number of steps, or activities, performed in a
coordinated manner. While such activities and their fulfillment may be implicit in a
company, business process management makes them explicit in the form of business
processes. A business process comprises “a set of activities that are performed in
coordination in an organizational and technical environment” [Weske, 2007]. Business processes are classified at different levels. Organizational business processes
are high level processes that help to understand and realize an organization’s goals,
ultimately contributing to the organization’s business strategy. They are realized by
a set of operational business processes that coordinate the operational activities of
an organization. Operational business processes remain independent of particular
technology and platforms.
Processes are conducted within an organization, centrally controlled by an orchestration agent, while several processes can interact across organizations. Since there is no
agent that centrally coordinates these interactions, they are called choreographies.
Business Process Management emphasizes the automatic orchestration of processes.
This requires formal specification and the explicit representation of processes through
process models. A process model “acts as blueprints for a set of business process instances”, which are “concrete cases of the operational business” of an organization
[Weske, 2007]. Several notations for process models exist that are essentially similar.
Formal aspects allow for the validation of the correctness of process models, while graphical notations are essential for stakeholders to understand these models.
Section 5.1 will introduce the goals of business process management relevant for this work. A life cycle model for business processes that comprises several phases will be explained subsequently.
2.2 Mashups and the Evolution of the Web
Asked what a mashup is, many people already have a gut feeling and often their very own understanding, but cannot provide a crisp definition. Descriptions range from “ad hoc aggregations of whatever needs to be aggregated” [Hinchcliffe, 2006] to “Frankenstein on the Web” [Hoffman, 2007]. While the term mashup is still quite fuzzy and often misunderstood, common agreement exists that mashups are an exciting new genre of Web applications that aggregate capabilities from several Web resources via publicly available interfaces [Merrill, 2006]. What mashups are and how they work is often explained by an analogy to the personal computer.
A personal computer runs an operating system, which separates the concerns of controlling hardware components from those of applications by providing many application programming interfaces (APIs) that encapsulate low-level interaction with, for instance, the display, hard drive, and network interfaces. APIs expose a set of higher-level functions and thus make software development much easier. Programmers no longer have to worry about the particularities of lower-level functionality. Applications simply use those interfaces, which increases development efficiency dramatically.
For Web applications, the operating system of a personal computer is replaced by the Internet. Functionality and data are provided online. So-called Web APIs are used by Web applications in a similar way as classic applications use operating system APIs. Many companies expose their capabilities as Web APIs, e.g. Flickr (http://www.flickr.com/services/api/) and Amazon (http://aws.amazon.com/documentation/), and many non-profit organizations also provide resources that are consumable by Web applications, such as Wikipedia (http://wikipedia.org). Exposing any content to the Web, even only as a static Web site, can be considered providing some kind of capability that can be leveraged by others.
Mashups are applications that consume several such capabilities and aggregate, or mash, them in new and innovative ways that were not anticipated before. Sometimes the content providers are not even aware of the reuse of the capabilities they offer. The notion of mashups using Web APIs, in analogy to operating system APIs, also coined the terms “Web as a Platform” and “Internet Operating System” [O'Reilly, 2005]. To understand how mashups can obtain and aggregate capabilities from distributed sources on the Web, one needs to understand how the Web moved from pure human-oriented document storage to a network of services and machine-consumable capabilities.
2.2.1 Web 1.0
The Web (or World Wide Web) was developed in 1989 as a project at the particle physics laboratory of CERN in Switzerland. Tim Berners-Lee envisioned a
distributed hypertext system that allowed scientists, even if they were not computer experts, to easily generate, share, and keep track of content without the need
to maintain personal copies [Berners-Lee, 1989]. Berners-Lee states that the “Web’s
major goal was to be a shared information space through which people and machines
could communicate” [Berners-Lee, 1996]. He realized that such a system needed to
be decentralized (through unidirectional links), platform independent, and simple,
addressing the needs of humans and machines alike. The latter was established by the central concept of structured hypertext, defined in the Hypertext Markup Language (HTML), which gave plain text semantic meaning through tags. The first
HTML draft (http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html) did not contain any means to record general metadata about the document, except its title. However, it contained a small set of tags that allowed text to be structured logically (through headings and paragraphs) and pieces to be marked with specific meanings (such as addresses), supporting his vision of “shaking it, until you make some sense of the tangle” [Berners-Lee, 1989].
Within a few years, scientific interest in the Web grew rapidly and a universe was soon to emerge from its first solitary occurrence. By 1993, more than 500 Web servers were online and it was nearly impossible to keep up with the list of published content. The Web became increasingly complex and demanded a way to search for content. People started to manually assemble indexes of other Web pages, which was inefficient and turned out to be insufficient. At that time, search engines appeared, Lycos being one of the most famous [Hoffman, 2007]. Automatic cataloging of content was greatly supported by the introduction of the META tag in HTML, which allowed supplying metadata about the content, such as title, description, keywords, or language.
Soon, the Web matured and became a global network for everyone, not just scientists. Enterprises began to value global communication and sought means to establish their business electronically over the Web. Ordinary people collaborated in content creation, providing personal home pages or sites that reveal information about specific topics. The Web developed into a network of resources that provide knowledge and skills, e.g. online telephone directories.
2.2.2 Web 2.0
The term Web 2.0, originally credited to Dale Dougherty and Craig Cline, yet made popular by the famous article “What is Web 2.0” by Tim O'Reilly [O'Reilly, 2005], does not describe a technological revolution of the Web. It rather describes a changed perception of the Web, an evolution of the people and devices that drive the Web [Amer-Yahia et al., 2008]. This evolution demarcates a paradigm shift from a Web of publication to a Web of participation. This became visible through the usage of new metaphors, including but not limited to tags instead of categories, wikis instead of content management systems, and blogs instead of personal homepages.
[Watt, 2007] identifies three core patterns of Web 2.0: service, simplicity, and community. These patterns describe common characteristics among Web 2.0 applications.
Corresponding to O’Reilly’s article, this list needs to be complemented by another
entity that is essential for mashups: data.
Service. New devices, such as smart phones, and new approaches to Web applications, such as AJAX, required new interfaces to existing functionality and data. The endeavor to serve such a multitude of new devices, now and in the future, led to a reconsideration of system design and eventually to the decomposition of existing software systems into services. Legacy applications were disassembled or wrapped with service adapters to encapsulate functionality into coarse-grained components. This supports decoupling and thus the reuse of capabilities among different types of applications and devices, which promised to unlock new business opportunities.
In the course of Web 2.0, many applications moved to such a service model, where applications run in user agents, i.e. Web browsers, and access core functionality as services over the Internet. These services form the APIs of the “Internet Operating System”, making the Web tick as a platform and enabling efficient development of Web applications on top.
Data. While functionality can be more or less easily replicated among vendors, data has become one of, if not the most important, building blocks of Web 2.0 applications, since functionality is generally only useful in combination with data. Data ownership and data leadership are key factors for online businesses [O'Reilly, 2005]. However, maintaining and enhancing data is a crucial task. Users are valuable resources for enhancing data with metadata. They can categorize data items through folksonomies (tags), identify and eliminate duplicates, complement missing information, assess data quality through feedback and reviews, and identify related information through recommendations. This is only possible if the data is made available to users.
With the participation of users in content generation, communities create and enhance large amounts of relevant and high-quality data, more data than a single body could own or even manage. Examples are the numerous blogs and wikis that have sprouted in recent years, with Wikipedia as the most prominent example, outperforming commercial online encyclopedias in timeliness, amount, and quality of information.
Simplicity. The Web 2.0 paradigm gained momentum in recent years and applications became easier to use and to develop. Web applications moved beyond displaying content on static pages to retrieving external information. They are now
characterized by an interactive and visually rich user experience, mainly due to the employment of AJAX (cf. [Garrett, 2005]). Web 2.0 is generally driven by communities. So are applications and the services they are built on. This led to the evolution of open standards that have low entry barriers and thus allow for easy application development.
Among these open standards is syndication, which lets users subscribe to streams of uniformly structured content, which in turn allows for simple machine consumption. In fact, syndication became the most popular and widely applied means to consume data via APIs. The Semantic Web denotes the meaningful structuring of data and its annotation with metadata, and has become a relevant academic topic. Concepts of the Semantic Web actively support machine consumption of data, eventually leading to automatic knowledge acquisition systems. Another prominent example of the employment of simplicity is the famous architectural style Representational State Transfer (REST) [Fielding, 2000] for Web applications, which simplified and unified access to resources dramatically.
Community. Web 2.0 has caused a shift in the way users participate in and with the Web, affecting the way users organize, access, and use information. Emerging applications, hosted at the providers' cost, motivated a whole generation to participate in content generation, where contributors gain more from the system than they put into it, as described above. The human urge to communicate, argue about opinions, and share new ideas is a major force driving the Web of documents into a Web of participation. Small communities further benefit from effects of self-control through social aspects, where the cohesion of the community prevents abuse and finds ways to handle problems internally. Users that behave inappropriately or provide false or poor information are punished via social means such as negative feedback or even exclusion.
The effect of communities maintaining and enhancing information is commonly referred to as the “wisdom of crowds” [O'Reilly, 2005]. Beyond data, communities also affect services and simplicity. Communities of developers strive for simple yet powerful solutions. Among a given set of alternatives, this amounts to a form of survival of the fittest, leading to a smaller set of proven and broadly adopted concepts and technologies.
2.2.3 Situational Applications
With his seminal article “Situated Software” [Shirky, 2004], Clay Shirky describes one of the core demands of the fast-changing Internet ecosystem: simple applications that satisfy an immediate need within a specific social context. While flexibility is an invariable goal of software development, the “Web School” focuses on scalability, generality, and completeness in order to address a large number of users with its applications. However, many situations exist that require a fast solution that is just good enough and addresses only a small group of users. As long as the development of such applications was costly and required IT experts to integrate the necessary resources, such needs were neglected again and again.
The evolution of the Web feeds into satisfying these needs. Services expose specific pieces of the required expertise; simplicity allows non-programmers to consume and combine these services. Data can easily be obtained from freely available resources to create insight across several dimensions of a problem. The knowledge to combine data and functionality to satisfy a specific need is provided by domain experts, the users of the software at hand.
[Jhingran, 2006] argues that situational applications have a rather transient existence. They will either outlive their usefulness, when the need expires, or migrate to a more sophisticated solution due to increasing demands. Disposing of situational applications that have outlived their usefulness is not critical because the cost to create them is less than the value they add.
2.2.4 Mashups
[Clarkin and Holmes, 2007] describe mashups as agile views composed of simple services. Such services, or more generally capabilities, aim to satisfy one specific objective rather than providing a complete solution suite and thus stay application-independent. Mashups stem from the reuse of existing resources, facilitating rapid, on-demand software development. Specific capabilities allow the mashup developer to create applications that correspond to a specific need in a specific context.
The rise of mashups can be seen as a long-existing need that finally met the opportunity to gain momentum. The evolution of the Web let services emerge and communities reuse those capabilities. Mashups simply combine several of these capabilities for their own good, solving specific needs through the expertise of others. While one basic value proposition of mashups still remains the satisfaction of
transient needs, they have already grown beyond situational applications. HousingMaps, being used by thousands of users every day, is not a situational application anymore, nor did it outlive its usefulness. However, it still does not own any of the data it provides, offering just the skill to combine real estate offers with visual mapping capabilities.
Figure 4: Growth of Mashups, according to [Yu, 2008]
Figure 4 shows that mashups did not experience the often predicted hype or boom since they emerged in 2005, but rather steady growth. The diagram depicts the number of mashups registered at ProgrammableWeb at a given point in time. On average, 94 new mashups are registered each month [Yu, 2008]. While they are still maturing, mashups are gaining more and more popularity, promising substantial advantages to individuals and enterprises alike. These advantages include reduced cost and improved productivity in application development due to lightweight composition and reuse. Through the aggregation of widespread knowledge, they create value and uncover new insights.
2.3 Remarks
The following sections explain the terms and concepts of some topics referred to in the course of this work. They are essential for the understanding of the thesis contents and form the basis for reasoning within discussions.
2.3.1 Content Syndication
The term content syndication denotes a concept to structure and publish the content of Web sites and other Web data in an agreed format, independent of any visual layout.
The first occurrence of content syndication dates back to 1995, when Apple Computer developed the Meta Content Framework for content syndication, a proprietary data format to structure the content of Web sites with the help of metadata. Several vendors followed with their own formats that, due to the rising popularity of XML, leveraged it as structural markup. In early 1999, Netscape released RDF Site Summary version 0.9, the first version of probably the most famous syndication format yet, commonly referred to by its acronym: RSS. The Resource Description Framework (RDF) [W3C, 2004] allows data to be described semantically with the help of XML markup and relates to the concepts of the Semantic Web. Until 2002, RSS underwent several changes, mostly simplifications, and was finally released as Really Simple Syndication version 2.0, which does not leverage RDF anymore. Hereafter, the term RSS refers to RSS 2.0.
As of 2003, the copyright of RSS 2.0 has been owned by Harvard University and its development has been frozen ever since, as is also stipulated in the official specification document [RSS, 2007]. It was Sam Ruby who initiated the discussion and development of a successor to RSS that should overcome its deficiencies, being open to development and extension by everybody and vendor-neutral. Development of the so-called project “Atom” went fast, and the Internet Engineering Task Force published the Atom Syndication Format in July 2005 [Nottingham and Sayre, 2005], hereafter referred to as Atom.
While RSS is still extremely popular and widely used, it becomes apparent that it will not be developed further, due to the rising popularity of Atom, which found a highly valuable use in the Atom Publishing Protocol (AtomPub), an application-level protocol for editing and publishing resources on the Web [Gregorio and de hOra, 2007]. The next version of Atom, which has not been announced yet, is expected to supersede both RSS and the current Atom eventually. In order to avoid confusion and loss of generality, the concept of syndication, represented by both formats, RSS and Atom, will hereafter be referred to as content syndication.
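To give an impression of how a mashup might consume a syndicated feed, the following TypeScript sketch fetches an RSS 2.0 document and extracts its items. It is a minimal sketch only, assuming a browser environment with fetch and DOMParser and an arbitrary, cross-origin-accessible feed URL; it handles neither Atom nor the many RSS dialects.

```typescript
// Minimal sketch: consuming an RSS 2.0 feed in the browser.
// Assumptions: fetch and DOMParser are available, and FEED_URL points to a
// feed that permits cross-origin access (otherwise a proxy is needed, see
// Section 2.3.3).

interface SyndicatedItem {
  title: string;
  link: string;
}

const FEED_URL = "https://example.org/feed.rss"; // hypothetical feed

async function readFeed(url: string): Promise<SyndicatedItem[]> {
  const xml = await (await fetch(url)).text();
  const doc = new DOMParser().parseFromString(xml, "text/xml");
  // In RSS 2.0, entries are <item> elements below <channel>.
  return Array.from(doc.querySelectorAll("channel > item")).map((item) => ({
    title: item.querySelector("title")?.textContent ?? "",
    link: item.querySelector("link")?.textContent ?? "",
  }));
}

readFeed(FEED_URL).then((items) =>
  items.forEach((item) => console.log(item.title, item.link))
);
```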
2.3.2 Representational State Transfer
This work will refer to Representational State Transfer (REST), an architectural style that was first described by Roy T. Fielding in his PhD thesis [Fielding, 2000]. While many institutions and individuals claim to understand the principles defined in that work, recent discussions have revealed confusion and a lack of expertise in the topic (cf. http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). A short introduction to the primary principles of REST will be given below; a good introduction to the topic, its background, and best practices is given in [Richardson and Ruby, 2007]. [Overdick, 2007] discusses the term Resource-Oriented Architecture in the context of REST and Web 2.0.
The term architectural style denotes a coordinated set of constraints that restrict
architectural elements and the relationships among them. However, a style is only an
abstraction of those elements and relationships. The instantiation of an architectural
style is an architecture constituted by the elements constrained by the style. A
system’s implementation is the instantiation of an architecture.
REST is an architectural style that refers to interactions within applications distributed over the Web. REST can be considered an approach to reengineering the Web, elaborating the concepts that made it successful and evading flaws that turned out to be a disadvantage. Thereby, the style focuses especially on the architectural aspects that are inherent to distributed interactions through hypertext at the scale of the Internet [Fielding et al., 2002]. These aspects are data elements and connectors that enable interaction between components. Thus, REST embraces the core principles of resources, representations, and a uniform interface. Components are the endpoint applications that eventually communicate with each other, such as the Web server and the Web client application.
Resources. Resources are the principal architectural abstraction of information in
REST. Referring to [Fielding, 2000], a resource can be anything that is worthy of
having a name, or to be more precise, a unique identity. Thus, a resource can be
perceived as a concept. Valid incarnations of such concepts are digital documents,
provided services, or even real-world objects such as a person. Fielding comprehends a resource as a temporal membership function that maps to a set of entities or values at a certain point in time.
Representations. Actions conducted within REST-style architectures are performed through representations, snapshots of a resource's current or intended state that are transferred between components. This is an important aspect of REST: instead of
transferred between components. This is an important aspect of REST: Instead of
invoking methods remotely that change the value of a resource, a representation of
the resource’s state is transferred to the components that intend to change it. Representations can be manifested in any form that can be exchanged in a distributed
hypertext system, which is essentially a digital message entity. As [Fielding, 2000]
defines, a representation consists of data that constitutes the snapshot of a resource’s
state, metadata that describes that data and serves to understand it, e.g. by machines, and occasionally metametadata that describes the metadata, e.g. to verify
the integrity of data and metadata.
Uniform Interface. While resources and representations are the central data elements of the architectural style, its connectors are described through an interface definition which, if appropriately implemented, enables effective and efficient interaction between the components within a distributed hypertext system. Since REST is an abstraction of an architecture, it does not state the methods of an interface, but rather constraints the interface must comply with.
REST is often mistakenly equated with HTTP [Fielding et al., 1999]. HTTP is just one implementation that complies with the uniform interface definition of REST and is today's most common transport protocol on the Web. The following explains the constraints imposed by the REST uniform interface and their application in HTTP.
Universal Identification of Resources: The identity of resources must comply with
one single naming scheme within the hypertext system, and all resources must
only be identified through these identities.
URI [Berners-Lee et al., 1998] is the naming scheme employed on the Web and
in HTTP in particular, including its subset URL [Berners-Lee et al., 2005].
Self-descriptive Messages: Interaction among components in REST is realized by a request-response pattern of transferred messages. Each message, i.e. each request as well as each response, must contain all information that is required by the receiving component to understand that message. This implies statelessness among requests: one request must not depend on another request and must be understood without prior knowledge or preceding interaction.
HTTP defines the format of each message, which mainly comprises a verb, a header, and a body element. The verb identifies the intention of the message, i.e. whether to retrieve or update the state of a resource through a representation. The header contains information about the representation, especially its data format. The representation of the resource's state itself is contained in the body, which is optional depending on the intention of the message.
Manipulation of Resources through Representations: Resources must not be manipulated except by changing their state through the transfer of a representation that describes their intended state.
The representation of a resource is generally contained in the body of an HTTP message. A resource can be updated by expressing an intention through the corresponding HTTP verb and providing a representation of its new state in the body of the request.
Hypermedia as the Engine of Application State: Since all interactions are stateless, the state of a resource is manifested in its representation. In turn, the state of an application is made up of the state of all involved information entities. In REST, resources are the key abstraction of information. Thus, application state is communicated in the form of representations. These representations must be hypertext, i.e. contain all the information required to advance the state of an application, namely the identities of related resources and information about the messages to transfer. In this way, an application can be carried out starting with one single resource identity. All subsequent interactions to advance the state are communicated through the corresponding representations.
HTML [Raggett et al., 1999, Pemberton, 2002], which emerged along with HTTP, uses hyperlinks to address related resources and provides forms as a means to update the state of related resources through a representation that is the serialization of the form's content. However, applications that conform to REST do not need to use HTML; any hypertext or hypermedia format for representations is valid.
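To illustrate the uniform interface constraints just listed, the following TypeScript sketch retrieves the representation of a resource, follows a link contained in it, and updates the resource by transferring a new representation. It is a minimal sketch under assumptions of my own: the URL, the JSON representation format, and the link field are hypothetical, and only the HTTP verbs GET and PUT are shown.

```typescript
// Minimal sketch of REST-style interaction over HTTP.
// Assumptions: a hypothetical resource at ORDER_URL whose JSON representation
// carries its state and a hyperlink ("customer") to a related resource.

const ORDER_URL = "https://example.org/orders/42"; // hypothetical resource identity

async function demo(): Promise<void> {
  // Retrieve a representation of the resource's current state (GET).
  const response = await fetch(ORDER_URL, { headers: { Accept: "application/json" } });
  const order = await response.json();

  // Hypermedia as the engine of application state: follow a link found in the
  // representation instead of constructing URLs out of band.
  if (order.customer) {
    const customer = await (await fetch(order.customer)).json();
    console.log("ordered by", customer.name);
  }

  // Manipulate the resource through a representation of its intended state (PUT).
  await fetch(ORDER_URL, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...order, status: "shipped" }),
  });
}

demo();
```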
2.3.3 Same Origin Policy
Providing services on unobtrusive platforms such as Web browsers has the significant benefit of reaching many people and lowering barriers to adoption. In the ideal case, applications running in Web browsers do not even require browser plug-ins such as Flash or Java. While content could be processed completely on a Web server and presented to the user as a static HTML document, this hinders interaction. A more responsive way for Web applications is provided under the term AJAX, which is an interaction paradigm rather than a technology, according to [Garrett, 2005]: content and functionality are obtained on demand from the server. However, such Web applications generally suffer from browser security restrictions, namely the same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy), which denies access to Web resources, i.e. capabilities provided on Web servers, that are located on a different host than the original HTML document that constitutes the Web application itself.
The general understanding of mashups is that they aggregate capabilities from different sources, that is, different origins. Thus, the same origin policy places a particular burden on mashups. While a solution in the form of a controlled relaxation of that policy is likely to be implemented in future releases of Web browsers, crafty developers have already found ways to circumvent this restriction.
The first workaround is called On-Demand JavaScript15, or JSON with Padding16 (JSONP), and relies on an exception to the same origin policy. This exception
allows the retrieval of related documents of a page, such as images, style sheets, and
JavaScript files, from other hosts by adding according links to the page’s document
object model (DOM). The JSONP approach dynamically manipulates the DOM of
a page, adds links that refer to remote script locations, and loads JavaScript files
on demand from any origin. Information can be transferred by this approach if the data is formatted as JSON [Crockford, 2006] and encapsulated into a function call that is executed when the file is loaded. The drawback of this method is that it only allows the retrieval of information using the browser's routine to load files from a remote location; any more sophisticated communication with the server, such as updating resources, would need to be encoded in the request URI, which would violate the specification of HTTP [Fielding et al., 1999]. As a consequence, this approach is
generally useful to load related functionality on demand or data that is accessed in
a read-only manner.
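The following minimal sketch illustrates the JSONP technique described above. It is written in TypeScript for the browser; the remote feed URL and the convention of a "callback" query parameter are hypothetical, though they mirror common practice of JSONP-enabled APIs.

```typescript
function loadViaJsonp(baseUrl: string, onData: (data: unknown) => void): void {
  const script = document.createElement("script");
  const callbackName = "jsonpCallback_" + Date.now();

  // Register a globally reachable callback; the dynamically loaded script invokes it
  // with the payload, e.g. jsonpCallback_...({"items": [...]}).
  (window as any)[callbackName] = (data: unknown) => {
    onData(data);
    delete (window as any)[callbackName];
    script.remove();
  };

  // Adding a script element to the page's DOM is exempt from the same origin policy,
  // so the browser loads and executes the file from the foreign host.
  script.src = `${baseUrl}?callback=${encodeURIComponent(callbackName)}`;
  document.head.appendChild(script);
}

// Hypothetical usage: the provider wraps its JSON data in a call to the named callback.
loadViaJsonp("https://api.example.org/feed", (data) => console.log(data));
```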
The second workaround is the establishment of an AJAX proxy on the same Web server the Web application is loaded from. Such a proxy receives an AJAX request from the client application that encapsulates a request to a remote resource. The encapsulated request is unpacked and forwarded to its original destination via HTTP communication carried out on the server side. Compared to the JSONP approach, proxies allow relaying requests of any type, i.e. with any HTTP verb. This method
further enables establishing caches between the Web application and the original
remote content provider, thus reducing network traffic and load on the latter. Unfortunately, this solution requires Web application providers to establish such a proxy
and users to trust the application provider to handle transmitted data with the appropriate responsibility.
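A minimal sketch of such an AJAX proxy is given below, assuming a Node.js runtime with the built-in http module and the global fetch API. The convention of passing the remote URI in a "target" query parameter is a hypothetical simplification, and for brevity only requests without a body are relayed.

```typescript
import { createServer } from "node:http";

const proxy = createServer(async (req, res) => {
  // The client encapsulates the remote URI in a query parameter of the proxy request.
  const requestUrl = new URL(req.url ?? "/", "http://localhost");
  const target = requestUrl.searchParams.get("target");
  if (!target) {
    res.writeHead(400);
    res.end("missing target parameter");
    return;
  }

  // Unpack and forward the request to its original destination with the original verb.
  // A cache could be consulted here before contacting the remote capability provider.
  const upstream = await fetch(target, { method: req.method });
  const body = await upstream.text();

  res.writeHead(upstream.status, {
    "content-type": upstream.headers.get("content-type") ?? "text/plain",
  });
  res.end(body);
});

// Because the proxy shares the origin of the mashup page, the browser permits the call.
proxy.listen(8080);
```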
14 http://en.wikipedia.org/wiki/Same_origin_policy
15 http://ajaxpatterns.org/On-Demand_Javascript
16 http://ajaxian.com/archives/jsonp-json-with-padding
3 Survey of Mashups and Mashup Tools
Due to the community-driven emergence of mashups, sprouting everywhere as a result of network effects rather than being dominated by a few bodies, exact definitions of the term mashup are open to debate. It is even questionable whether mashup can be regarded as a term defining a class of applications or whether it is rather an idiom that captures a phenomenon of developments that are related to some extent. Hence, the question at hand is not “What properties separate mashups from other Web applications?”, but rather “Which properties do mashups share?”.
This chapter attempts to answer the latter question by studying 29 mashups and mashup tools, aspiring to give an overview of the dimensions of the mashup universe. Although the focus of this survey is mashups, mashup tools were considered, too. Many tools have emerged recently to give end users the ability to create mashups without any programming effort and to speed up mashup development [Yu et al., 2008]. Following predictions of the remarkable business value of mashups, such as in [Hof, 2005], so-called enterprise mashups have gained more attention and have influenced the mashup universe from a business perspective. As a consequence, the survey regarded manually assembled mashups as well as those created by a variety of mashup tools, although the assisting facilities of these tools were not of primary interest. Related work, mainly considering tools, has been done by
[Hoyer and Fischer, 2008, IBM Corporation, 2008].
3.1 Selection of Samples
In order to obtain an independent population of sample mashups, two sources were
consulted. Refer to the appendix for a detailed list of the samples as well as the
evaluation of the examined properties.
ProgrammableWeb. ProgrammableWeb is the most prominent online information source concerning mashups. Established in 2006 by John Musser, ProgrammableWeb keeps track of a large variety of mashups and Web APIs, providing detailed information, examples, and links to their origins. Furthermore, ProgrammableWeb offers information about the overall development of mashups, providing a multitude of statistics. It is also the first address if one is looking for advice on implementing a mashup, searching for appropriate APIs, or conducting a survey of mashups. Thus, it was selected to contribute to the present survey.
Figure 5: Mashup Categories: Distribution of tags among registered mashups, from http://www.programmableweb.com/mashups (Mapping 37%, Photo 11%, Shopping 10%, Search 8%, Video 8%, Travel 7%, Social 5%, News 5%, Music 5%, Messaging 4%)
Since ProgrammableWeb lists more than 3600 mashups at the time of writing, with three more mashups added per day on average, a relatively small yet well-distributed subset had to be selected:
For each of the five most popular tag categories, the four most popular mashups that combine more than one source were chosen, excluding
duplicates. Popularity of tag categories is determined by the number of
mashups tagged accordingly (cf. Figure 5). Popularity of mashups within
a category is given by the ProgrammableWeb search API.
This set was retrieved using ProgrammableWeb's own Web API17. Unfortunately, some of the listed mashups were no longer available, resulting in a list of 18 mashups and mashup tools. These are: 2008 US Electoral Map, Adactio Elsewhere, Afrous,
Albumart, Baebo, Flash Earth, Forbes List of World’s 100 Most Powerful Celebrities,
Gaiagi Driver, PageFlakes, Sad Statements, sampa, SecretPrices, Sporting Sights,
TuneGlue°, Twitter Top News Trends, Vdiddy, Weather Bonk, and Wiinearby.net.
Market Overview of Enterprise Mashup Tools. As it turned out, the list of mashups extracted from ProgrammableWeb contained only non-commercial mashups,
and only a few mashup tools. However, as preliminary work for this thesis indicated, mashups have been considerably influenced by applications labeled Enterprise
17 http://api.programmableweb.com/
Mashups and the corresponding tools. A recent survey of enterprise mashup tools was conducted by [Hoyer and Fischer, 2008].
While that survey focused on specific tool aspects and cannot be used directly for the present survey, its list of reviewed mashup tools was reused here. After eliminating those tools that were not available for inspection, the following tools were reviewed:
Dapper Factory, Google Mashup Editor, IBM Mashup Center, Intel Mash Maker,
Jackbe Presto, Microsoft Popfly, NetVibes, SAP Enterprise Mashup Platform, Serena
Mashup Suite, Yahoo! Pipes, and iGoogle.
3.2 Classification Model
The survey of mashups and mashup tools here is a rather exploratory examination of each mashup contained in the lists above. Since there is no established definition of mashups, it was quite difficult to specify properties to assess each mashup. Thus, the review process consisted of several passes, the first to gain an overview of mashups and their properties and to compile a list of evaluation criteria. This list is presented below; a schematic sketch of the resulting classification record follows the list. In the second pass, mashups were examined against those criteria in detail. A preliminary classification of mashup types was assembled, but refined later, resulting in two classes for that property. In the third review pass, all mashups and mashup tools were classified according to this differentiation.
name: The name of the mashup or mashup tool as referred to hereafter.
url: The URL of the mashup or, if it is not publicly available, of a site that gives
further information about the mashup. The latter is the case for most mashup
tools.
category: A list of tags that categorize mashups according to their essence. The
specific purpose of mashup tools is not unveiled until a user creates a mashup
with them. Thus, mashup tools are categorized as “tool” here.
description: A short summary of the mashup's purpose, including information on how it works and further details if necessary.
mashup type: Determines whether the mashup or tool-created mashup is of the type organic mashup or dashboard. Refer to Section 3.4 for more information.
input: Specifies information required from the user, such as search terms, product
names, etc.
output: Specifies the primary information that results from the mashup, as well as
the form it is presented in.
alternative output: Specifies alternative forms of output, if provided.
capabilities: Lists the capabilities or capability types that are aggregated by the
mashup, or the mashup created by a tool, respectively.
aggregation location: Describes where the different resources are aggregated, i.e.
whether this happens on a Web server that hosts and executes mashup logic,
on the client’s Web browser, or on both.
aggregation type: Describes how aggregation happens. In some cases, no real aggregation happens beyond the simultaneous display of data within a page, for instance in several separate lists.
technology: Describes the technology used to access and aggregate resources.
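The classification model can be summarized as a schematic record structure, sketched below in TypeScript. The field names mirror the criteria above; the enumerated value sets are illustrative assumptions rather than part of the survey itself.

```typescript
// Schematic record for one surveyed sample; value sets are illustrative only.
type MashupType = "organic mashup" | "dashboard";
type AggregationLocation = "server" | "client" | "both";

interface SurveyRecord {
  name: string;                    // name of the mashup or mashup tool
  url: string;                     // the mashup itself or an informational site
  category: string[];              // tags; mashup tools are tagged "tool"
  description: string;             // short summary of the mashup's purpose
  mashupType: MashupType;          // cf. Section 3.4
  input: string;                   // information required from the user
  output: string;                  // primary result and its form of presentation
  alternativeOutput?: string;      // alternative forms of output, if provided
  capabilities: string[];          // aggregated capabilities or capability types
  aggregationLocation: AggregationLocation;
  aggregationType: string;         // how aggregation happens
  technology: string;              // technology used to access and aggregate resources
}
```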
Due to their technical diversity, mashups were examined through reengineering, analyzing network traffic, and decomposing parts of the application, depending on the specific case. A completely objective and standardized analysis of each mashup was not feasible. In some cases, information could not be obtained because application logic was hidden. Nevertheless, I believe that this survey yields valuable insights and a relevant overview.
The set of samples is not considered large enough to support a profound statistical analysis, in particular one whose validity would extend to all imaginable mashups. However, it
These are presented in the following.
3.3 Synthesis of Survey Results
The survey supports the initial assumption of mashups being a phenomenon or genre
of Web applications, rather than a specific technology or architecture. The spectrum
of mashups is tremendous and while mashup tools and platforms such as Yahoo!
Pipes or iGoogle are perceived to be more popular, mashups that are programmed
by hand outweigh existing tools by far. Yet, due to their limited applicability to a specific situation, they are less widely known.
Mashups embrace the concept of being served on demand. A mashup is just a description of how to combine capabilities upon its instantiation; its explicit representation is a model. The instantiation of this representation puts the mashup application into an execution context that allows accessing capabilities and interacting with the user. Yahoo! Pipes even provides a visual representation of the created mashup models. The term mashup is generally used to refer to the genre, the application
representation, or an instance at the same time.
3.4 Types of Mashups
During the survey of mashups and mashup tools it became apparent that end-user mashups can be partitioned into two basic types. These types differ mainly in their architecture and the resulting user interface. This partitioning is ascribed to different application scenarios of mashups. There is no common agreement on mashup types in related work at the time of writing this thesis. Thus, the terms organic mashup and dashboard are introduced here to improve understanding in the course of this thesis.
Early mashups were created manually to serve a specific, situational purpose of a small group of users. Such a purpose was defined by a use case rather than the needs of an individual person. Due to their manual creation, which involves at least some expertise in computer science, e.g. programming, these mashups will hereafter be referred to as organic mashups.
After the first wave of adoption, enterprises started to engage in mashup development. However, instead of serving a situational use case, enterprises aimed to serve
the needs of individuals—customers—offering them aggregate sites that were similar to portals. Such aggregate sites allow users to dynamically assemble any piece of information they are interested in, for instance, visualized key performance indicators of their business. Hence the metaphor of an aggregate panel of gauges showing these performance indicators. In the following, this type of mashup will be
called a dashboard.
The distinction between organic mashups and dashboards is not a strict one; it is rather a continuum. Single mashup instances may transition between the two types, embracing
characteristics of both. One such example is WeatherBonk, an organic mashup that is
not customizable by end users. Still, it leverages the widget layout of dashboards.
3.4.1 Organic Mashups
Organic mashups can be regarded as the origin of mashups in general. The first
mashups, such as HousingMaps, combined disparate sources of different formats and assembled their output through programming. At that time Web APIs were not common, and authors had to disassemble code and resource access mechanisms on their own to obtain capabilities. The result were mashups serving particular use cases that were beneficial for a specific group of users. For HousingMaps the use case is to find a place to live within a geographic proximity, visualized on a map. This type of mashup is classically more information centric, which is why such mashups are often referred to as data mashups. [Jhingran, 2006, Merrill, 2006, Simmen et al., 2008,
Hoffman, 2007] even account only for data mashups in their considerations.
Figure 6: General Architecture of an Organic Mashup (FMC): developer, user, mashup definition, mashup application, and a fixed set of capabilities
The term organic refers to the way these mashups are created: A finite set of resources is aggregated in sophisticated ways, which requires a basic understanding of how to aggregate information streams and statistical skills—typical skills of a programmer. As shown in Figure 6, the overall architecture of such a mashup is rather static: the set of aggregated capabilities, the logic to combine them, and the presentation logic required for visualization are fixed. Regardless of the static architecture, organic mashups can provide dynamic, real-time insights by accessing capabilities on demand, obtaining freshly updated content. Organic mashups are usually developed by a few people compared to the number of users. Nevertheless, the developers themselves quite often belong to the group of users.
With the release of Yahoo! Pipes in early 2007, the manual implementation of organic mashups received tool support. More tool vendors entered the market: Afrous and Microsoft Popfly, to name a few. These tools allow aggregating streams of information with advanced operations and presenting them by means of rich visualizations. Such tools often promise that mashups can be created easily without any IT skills. From personal experience, however, aggregating information streams with operators is a rather advanced task that requires a basic understanding of the capabilities' data formats and
aggregation algorithms.
3.4.2 Dashboards
As mentioned above, dashboards largely gained popularity when enterprises started
to look into mashup development. Mashups promised a new way of reusing existing
services through an appealing interface, facilitating aggregation and combination
of content without involvement of the IT department. Soon, the term Enterprise
Mashup was coined for dashboards that mashed corporate performance indicators
and applications, similar to portals [Gurram et al., 2008]. Portals are driven by corporate entities, e.g. the IT department. That means they are designed and deployed centrally and are the same for every user. Some portals allow for customization, but still retain their basic content and layout. The content of portals generally needs
to be deployed to the portal backend [Bellas, 2004]. Dashboards, on the other hand,
are driven by end users and allow them to select, combine, and aggregate capabilities
freely, corresponding to the individual's very demands. Due to the visual assembly of content and their similarity to portals, dashboards are sometimes referred to as aggregate sites.
The key driver of dashboards allowing such simple content aggregation is the widget. The term widget is derived from window and gadget and denotes a graphical
user interface component that embraces a closed set of functionality and visualization in its own independent life cycle. Widgets are essentially small applications
that can be composed into one large application, providing means for interaction
between and coordination among them. Through the metaphor of widgets, mashups
typically unify design-time and run-time support. Widgets can be rearranged within the dashboard at any time, and there may be widgets that themselves allow for browsing widget repositories or developing new widgets, as in SAP EMAP [Gurram et al., 2008].
Dashboards are hybrids of a tool that allows users to create and customize a personal site and a platform that hosts and executes the created mashups. This integration
Figure 7: General Architecture of a Dashboard (FMC): user, dashboard with dashboard configuration, widgets, and a dynamic set of capabilities
approach makes them valuable for corporate users and individuals as well, since there
is no need to set up or host any artifacts on the client’s computer. Figure 7 shows
the abstract architecture of dashboards. Usually the user and the developer, i.e. the person who assembles the site, are the same. Programming is only required for the creation of widgets, which can simply be organic mashups wrapped within a container that
complies with the dashboard platform. Widgets access their respective Web capabilities on their own. In contrast to organic mashups, dashboard tools allow virtually
anyone to assemble their own mashup without technical expertise. Rich interfaces
allow for drag-and-drop and simple layout of widgets.
Lately, companies have begun to research possibilities of enhancing interaction between widgets. Thus, widgets can dynamically react to state changes of other widgets through message-based communication and act in shared coordination. The implementation of such communication channels is a highly relevant topic, because the same origin policy of browsers (cf. Section 2.3.3) restricts communication across widgets of different domains. A discussion of this particular problem is beyond the scope of the current work; detailed information and current solution approaches are provided in [Abiteboul et al., 2008, Gurram et al., 2008, Jackson and Wang, 2007, Keukelaere et al., 2008, López et al., 2008].
3.5 Common Characteristics of Mashups
The review of the mashups and mashup tools listed above yielded the synthesis of the following qualities that are characteristic of mashups. Not all mashups embrace
each of the listed properties, and while they may not provide a conclusive definition of mashups, they give an indication of whether an application is a mashup. The findings of the survey are largely of a technical nature, owing to the scope of this thesis. [Novak and Voigt, 2006] provide a survey of organic mashups that focuses more on social aspects and content topics of mashups.
The findings will also contribute to describing a general model for mashups and lay
the groundwork to examine mashups in the context of business process management
in Section 5.
3.5.1 User Centric
Mashups are applications for humans, typically satisfying needs of individuals or
narrow user groups. These needs are usually specific and not defined by strategic
business requirements. Often, mashups are even created by those who leverage them:
end users [Crupi and Warner, 2008a].
All reviewed mashups feature visually rich user interfaces that are not only graphically appealing, but also facilitate the exploration of aggregated content. One of the
most popular metaphors used is a map on which data items are located according to their respective positions, as in Wiinearby.net and Weather Bonk. TuneGlue° depicts relationships between pieces of music as a network of connected dots, where the distance between dots represents the degree of resemblance between the particular songs. Rich visualization of data is much easier for humans to consume and understand, because of our ability to visually identify patterns: distributions, relationships, and shapes.
Mashup tools usually allow for a high degree of personalization. Dashboards in particular, such as PageFlakes, which aggregate visual components within a single page, let end users decide which information they want to see.
3.5.2 Small Scale
Typically, mashups deal with relatively few data sources and small sets of data,
compared to traditional data integration approaches. The mashups surveyed indicate that the maximum number of combined sources for organic mashups is seven. For mashup tools this number is up to the user, but it will typically be limited as well. This observation complies well with the “seven phenomenon”, which states that humans can recognize up to seven elements of a given set due to their cognitive
capacities. Since mashups provide insight across disparate sources of information, this limit seems to hold for them, too. The amount of output information of a mashup is also limited: in many cases to fewer than 30 items—the typical length of a syndication feed—or to the size of the visualization container. Again, this limitation helps users gain an overview of the relevant information.
Data aggregations beyond these limits are better solved by traditional data integration approaches, such as database systems. [Clarkin and Holmes, 2007] suggest
not to solve complicated aggregations among normalized and fragmented data with
mashups at all.
3.5.3 Open Standards
Mashups build on technologies and best practices that evolved with Web 2.0, sometimes referred to as Web-oriented Architecture (WOA) [Hinchcliffe, 2008]. An important property of these so-called open standards is their wide adoption and acceptance among Web developers. They evolved bottom up in a survival-of-the-fittest manner, reengineering the principles of the Web (cf. REST [Fielding et al., 2002]), rather than being imposed by corporate governance. This has considerable impact on simplicity, application-neutral reuse of existing content, and low entry barriers. [Clarkin and Holmes, 2007] and [Jhingran, 2006] emphasize that successful developments that seek wide adoption must leverage information standardization, i.e. agree on global schemas and metadata.
In many cases, information is provided in the form of content syndication, where the specific request is expressed via a URI that is configured with the corresponding query. OpenSearch [A9.com, Inc., 2007] is an extension for Atom that defines the creation of a search-query URI as well as the syntax and semantics of the returned search results document. Since mashups use Web APIs as building blocks, they form exactly what Tim O'Reilly describes as applications in the “Internet Operating System” [O'Reilly, 2005], leveraging the Web as a platform. The popularity of mapping mashups (cf. Section 1) demands open standards for geographical data. One such standard is the Keyhole Markup Language [OGC, 2008], which not only specifies the definition of geographic locations, but also allows the attachment of any information related to these locations.
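As an illustration of such syndication-based access, the following TypeScript sketch configures a search-query URI from an OpenSearch-style URL template. The template and host are hypothetical, while the {searchTerms} and {startPage?} placeholders are those defined by the OpenSearch description format.

```typescript
// Build a concrete query URI from an OpenSearch-style URL template.
function buildSearchUri(template: string, terms: string, startPage = 1): string {
  return template
    .replace("{searchTerms}", encodeURIComponent(terms))
    .replace("{startPage?}", String(startPage));
}

// Hypothetical template; the response would be an Atom document with search result metadata.
const template = "https://search.example.org/atom?q={searchTerms}&page={startPage?}";
const uri = buildSearchUri(template, "business process mashups");
console.log(uri);
```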
3.5.4 Software as a Service
Mashups are Software as a Service, accessing sources and making themselves accessible on the Web. A standalone installation of mashups is more than questionable, because of their underlying intention to satisfy particular, situational needs. Furthermore, since mashups consume resources accessible on the Web, users need to be online anyway.
As a matter of fact, all surveyed mashups—organic mashups and those created by
tools—run in a browser. Several, for instance Sad Statements or Twitter Top News
Trends, provide their content additionally as syndication feeds to be consumed by
other applications, including other mashups. Providing mashups as Software as a
Service has further benefits, as it is the key enabler for wide adoption. A sophisticated infrastructure to access them is not required, only a browser. Accessibility
from everywhere also establishes the basis for easy sharing and collaboration (cf.
Section 2.2.2).
3.5.5 Short Time to Market
Mashups provide solutions for a situational context, often limited in the duration of
application. [Clarkin and Holmes, 2007] argue that mashups that focus on a specific
need must be created in a short time, measured in hours or days rather than weeks
or months.
Mashups reuse content that is available in uncomplicated ways, e.g. in simple formats and without hindering security restrictions. These contents are combined with just a bit of “glue”, avoiding any work that has already been done elsewhere. Such lightweight software models benefit from the “building on the shoulders of giants” effect [Hinchcliffe, 2007], which translates directly into shortened development times. Tools, e.g. Yahoo! Pipes or PageFlakes, which enable end users to easily create mashups via drag-and-drop, can reduce development time and effort most visibly. A flexible framework can leverage existing resources, provide a set of connectors and operators to compose them, and open mashup development to those who can define the specific needs best: the users of the application—domain experts, who are often not
programmers.
A particular quality of mashups is “good enough”, which means that building an
application that fulfills a specific need should not include more features than the
problem itself requires. This shortens development times and lowers the cost of
application development. [Shirky, 2004]
3.5.6 Aggregation of Heterogeneous Content
Mashups explicitly aggregate heterogeneous content, that is knowledge and skill,
from disparate, unrelated sources in a non-invasive manner, retaining their original
purpose. This includes, but is not limited to, enacting Web services remotely, loading
functionality from external sources and executing it within the mashup application,
obtaining data from widespread sources, and employing skills for presentation and
visualization.
Sad Statements combines Twitter posts with Flickr photos using the Yahoo! Term
Extraction API18 Web service to extract relevant terms of Twitter posts. These
terms are used to find matching images on Flickr. The resulting pair—the Twitter post and the corresponding Flickr images—is aggregated into a single sad statement, delivered to the user as a static page. Wiinearby.net combines information about Wii console offers on different retail platforms, such as eBay, Amazon, or Craigslist, with mapping capabilities of Google. In contrast to the preceding example, where the aggregation is carried out on a Web server, Wiinearby.net executes mapping functionality completely in the context of the provided Web page, on the client side.
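The following sketch outlines, in TypeScript, the server-side aggregation style of the Sad Statements example: posts are enriched with matching photos via a term-extraction service. All three endpoints are hypothetical placeholders and do not reproduce the actual Twitter, Yahoo!, or Flickr APIs.

```typescript
interface Post { text: string; }
interface Statement { post: Post; imageUrls: string[]; }

// Small helper for JSON-based capability access.
async function fetchJson<T>(uri: string): Promise<T> {
  const res = await fetch(uri);
  return (await res.json()) as T;
}

async function buildStatements(): Promise<Statement[]> {
  const posts = await fetchJson<Post[]>("https://posts.example.org/recent");

  const statements: Statement[] = [];
  for (const post of posts) {
    // Skill capability: extract relevant terms from free text.
    const terms = await fetchJson<string[]>(
      "https://terms.example.org/extract?text=" + encodeURIComponent(post.text));
    // Knowledge capability: find images matching those terms.
    const imageUrls = await fetchJson<string[]>(
      "https://photos.example.org/search?q=" + encodeURIComponent(terms.join(" ")));
    statements.push({ post, imageUrls });
  }
  return statements; // delivered to the user, e.g. rendered as a static page
}
```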
Many mashups automatically acquire content in real time, i.e. the data to be aggregated is up to date. This is important to give insight into a set of information that changes over time, and it is the basis for decision support. Freshness matters [Clarkin and Holmes, 2007]. It also encourages a more lightweight architecture: instead of storing obtained content, aggregation is done based on data obtained live. [Bradley, 2007] even believes that mashups do not have any native data storage or content repository at all.
3.5.7 Data Centric
While aggregated capabilities are not limited to data, the main concern of all reviewed mashups is comprehensive data presentation, providing insight and understanding. According to [Novak and Voigt, 2006], most of the used APIs are simply
syndication feeds, providing data. All surveyed mashups aggregate content; only
18 http://developer.yahoo.com/search/content/V1/termExtraction.html
some employ external services to enhance their data. Moreover, most of the mashups consume content read-only. Some widgets available for dashboards, for instance
iGoogle, allow manipulation of data. This observation suggests that mashups mainly
focus on information aggregation and enhancement.
The reason for this is simplicity. Manipulating resources generally requires some kind of authorization and authentication, which is difficult for mashups, because users may not want to surrender their credentials into the hands of a mashup developer, and current browser implementations have limitations concerning authorization across several domains (cf. Section 2.3.3). These obstacles hinder the easy aggregation of services that manipulate data. iGoogle is an exception, because in order to access one's personal page, one has to be logged in with a Google account, which automatically gives access to the multitude of mashable resources Google offers for its users. Ongoing development in the area of shared identities, such as OpenID [OIDF, 2007], and future browser developments will help to overcome these issues, making mashups more interactive with respect to data.
3.5.8 Lack of Governance
The term governance refers to decisions that define expectations, grant power, and
verify performance. Considering mashups, this relates to accessing capabilities in a
courteous way, granting access to trusted capabilities through authentication and authorization, and ensuring qualities, such as availability and reliability of the mashup
and accuracy of content. Most of the surveyed mashups consume capabilities that are
provided freely, often through commercial providers, such as Yahoo! (e.g. Flickr),
Google (e.g. Google Maps), or Microsoft (e.g. Microsoft Virtual Earth), largely neglecting governance. Given the demand for simplicity, i.e. lightweight composition, uncomplicated open standards, and the spontaneous selection of capability sources, governance gets in the way of rapidly creating situational solutions.
The largest deficiency is found in distributed identity and authorization management. Data access, in particular, is specific to user identities, which are disparate
user accounts in many cases. If a mashup accesses several confined resources, it may
be required to provide different authentication credentials separately to each of these
services. Such limitations hinder the development of mashups that access personal
information. Classic enterprise software solves this with single-sign-on, which is not
feasible for mashups because the accessed capabilities are generally not under the
control of the mashup provider. OpenID and OAuth [OAuth Core Workgroup, 2005]
provide means for federated authorization under the assumption of one globally
unique identity for one user. While this approach promises value, it is inhibited
by the fact that users typically have at least one authorization identity, i.e. user
account, for each of the different capabilities that would be accessed and mashed.
One solution to the aforementioned problem is to provide mashups under the control of trustworthy corporate entities and to provide certified access under terms of data privacy. However, such trusted relationships do not generally apply to the broad universe of organic mashups.
Another problem related to distributed identities is the same origin policy of Web browsers (cf. Section 2.3.3), which denies access to capabilities originating from a domain other than that of the originally loaded document, i.e. to disparate capabilities. Future versions of Web browsers are likely to address this issue natively, and approaches exist to circumvent these restrictions for mashups, as in [Isaacs and Manolescu, 2008, Jackson and Wang, 2007, Keukelaere et al., 2008].
Courteous access to capabilities refers to the goal not to harm the capabilities that
are aggregated under the covers of a mashup. Such harm can happen in many ways:
a popular mashup may result in highly increased access frequency of a capability
beyond its capacities and render it unavailable. Flash crowds are a common phenomenon that appears when Web sources are advertised publicly. Further, erroneous or carelessly crafted mashups may cause denial of service of capabilities that were not intended to be reused in mashups. Capability providers attempt to address
these issues by providing their content as separate resources via Web APIs. These
incorporate caching and provide open standards interfaces, as mentioned previously.
[Phifer, 2008] advances the view that mashup developers should be in charge of avoiding harm to capabilities. This can be done by employing caches, monitoring, and trust management in the mashup infrastructure itself. These measures also
contribute to the quality of mashups, for instance reliability, response times, and
availability, even if capabilities may not be reachable.
A certain level of quality of the obtained information, in terms of completeness, accuracy, and timeliness, is difficult to achieve, since capability providers are unlikely to provide any service level agreements for resources that they publish freely. [Alba et al., 2008] discuss data accuracy further. In general, it is the mashup developer who is responsible for ensuring that the provided content is accurate with respect to the mashup's purpose.
In general, it is observable that too much governance can result in a disservice to mashups. The reason for the lack of governance in current mashups can be found in the fact that governance hinders the aggregation of capabilities. Thus, governance must address the
demands of each particular mashup and provide just enough control to satisfy those
needs.
4 Anatomy of a Mashup
This chapter picks up the knowledge gained in the mashup survey and investigates concepts that shape the current perception of mashups. It begins at a distance, viewing mashups in their environment. The mashup ecosystem puts the term mashup into a holistic perspective, explaining the process of mashup creation and usage. The term ecosystem also expresses a balanced system resulting from the evolution of mashups. Subsequently, the mashup and its inner workings will be approached, identifying a typical pattern of activities that is common to mashups in general. A formal representation of this pattern will yield a reference model that allows developers to examine and understand existing mashups, but also to design new ones on a conceptual level.
In order to put the conclusions drawn here into a particular context and avoid confusion, the discussion of mashups will be limited to the following properties in the
remainder of this chapter, unless stated otherwise. While these restrictions apply to
all mashups reviewed in the previous survey (cf. Section 3), they do not constrain a
mashup definition in general.
Mashups are software systems that aggregate knowledge (information) or
skills (functionality) from two or more sources providing capabilities that
are different in their purpose. They consume and provide capabilities in
human and/or machine consumable formats over the Web.
The additional value created by mashups results from the aggregation of
the essence of the capabilities, not from sophisticated operations defined
through the mashup that are executed on top of the capabilities.
Mashups further provide continuous views on dynamic capabilities obtained in real time rather than working as transformation tools that take
static data as input and generate a statically used output.
4.1 The Mashup Ecosystem
The term ecosystem originates from the field of biology, delineated as the community
of interacting organisms and their physical environment, and has been adopted in
many fields of academia and industry. In accordance with this definition, a mashup
ecosystem describes the mashup, the entities interacting with mashups, and their
physical environment. Entities need to be understood as roles. Figure 8 depicts
the entities of the mashup ecosystem, capability provider, mashup, and mashup
consumer, and the way they interact with each other.
Figure 8: The Mashup Ecosystem, depicting the entities mashup consumer, mashup, and capability provider (FMC)
Capability Providers offer some kind of expertise, generally distinguished into skill, i.e. functionality, and knowledge, i.e. information. The entity that encompasses the domain expertise may not have the intention to maintain the technical exposition of the corresponding capability19 on the Web. However, mashups create value by composing
these very capabilities. An entity may act as intermediary and provide a capability
on behalf of the expertise provider. Unless stated otherwise, expertise provider and
technical provider are considered the same entity hereafter, referred to as capability
provider.
Mashups play the principal role in the mashup ecosystem, comprising the mashup site, which stores a mashup's specification, and the mashup application instances with their execution context (cf. Section 3.3). The mashup site is a centralized storage
that allows the delivery of the mashup specification over the Web, offering it as
Software as a Service (cf. Section 3.5.4). The execution context of the mashup may
differ from the mashup site [Merrill, 2006]. Many organic mashups are executed
centrally at the mashup site providing aggregated content to the client, whereas
19 Capabilities are modeled as active entities, according to Section 4.2.
dashboards are typically delivered as specifications to and executed completely within
the user’s Web browser. Mashup execution may also span both locations.
Mashup Consumers define the needs mashups are determined to satisfy. They represent the target group a mashup serves, and it is often the users who create the mashup in the first place (cf. Section 3.5.1)—developers are most likely end users themselves.
In general, end users are represented to the mashup by an application that makes
mashups human consumable: the user agent, primarily a browser. In turn, mashups
themselves may consume other mashups, which then act as capability providers.
In a nutshell, these entities interact as follows. Capability provision: Some entities
offer capabilities. The capabilities are accessible through an interface that is understood by the mashup application and can be discovered and explored by mashup
developers. Mashup creation: After identifying their needs, developers select the
capabilities that offer beneficial expertise and create the mashup, by programming,
using tools, or both20 . The mashup is stored as a description of its task: the mashup
specification. Mashup execution: At the time a mashup is requested to fulfill its
purpose, a mashup application instance is initialized and accesses the specified capabilities, obtains the offered expertise, be it knowledge or skill, combines them
according to its specification, and returns the mashup result to the consumer.
The environment that encompasses mashups is the Web in its entirety, including
private content delivery networks. The latter are usually corporate networks that isolate valuable and confidential information from unwarranted access from outside the network. This is considered one of the beneficial aspects of mashups. Section 3.5.4 already outlined that mashups are characteristically Software as a Service and do not need any installation on the client side, which makes them accessible from everywhere. According to [Berners-Lee, 1996], interaction on the Web happens in compliance with a uniform interface, implemented by various transport protocols. This
is in particular HTTP [Fielding et al., 1999] as today’s most common protocol to
access resources on the Web.
This environment is considerably influenced by effects of Web 2.0, including the
growing popularity of the REST architectural style, described in Section 2.3.2. The
following sections will emphasize that further. However, since there is no strict
definition of mashups, they may and do employ patterns that do not comply with
the constraints REST imposes.
20 Mashup creation in its various forms is depicted abstractly through write access to the mashup specification in Figure 8.
4.2 Capabilities—Essential Mashup Enablers
[Hinchcliffe, 2006] aptly states that mashups aggregate “whatever needs to be aggregated”. The continuum of capability types is wide and most often refers to websites
and content syndication feeds. Without loss of generality, one can summarize this
continuum as the set of capabilities that are exposed within the mashup ecosystem
environment. This is supported by the observations of the survey, in particular in
Section 3.5.6.
Capabilities are accessible via a rather simple interface: A request message sent to an Internet address is answered with the transmission of a hypertext document representing the capability's expertise. This characterization of a capability matches the definition of a resource in REST [Fielding, 2000]: “the intended conceptual target of a hypertext reference”. The reason for that is rather simple: REST does not only denote a desirable architectural style for large-scale, distributed applications; it also describes the very architectural properties that have become the foundation of the modern Web architecture and contributed largely to its technical success [Fielding et al., 2002]. Therefore, most existing capabilities, such as Web sites, are
resources implicitly.
Resource Orientation. Resources in the context of REST are generic by nature and thus retain application independence, promoting reuse at the granularity of the resource's very expertise, not of any domain logic built on top. Offering access under these assumptions, resources share their state with external entities. This state may be exposed for consumption only, or be offered for manipulation by consumers. Exposing capabilities in the context of resource orientation adds value beyond the costs involved, offering low entry barriers, Internet-size scaling, and instant deployment [Overdick, 2007]—a hypothesis that is backed by the many RESTful APIs published by commercial and noncommercial resource providers: According to ProgrammableWeb, 66% of the more than 1150 APIs registered at the time of writing are based on the REST style.21
One element of resource orientation in distributed hypertext systems, if not the most important one, is the exchange of a resource's state through a representation, that is, a semi-structured hypertext document [Fielding et al., 2002]. Instead of manipulating data through one of various incarnations of remote function calls, the information set representing a resource's state is moved to the processor (cf. Section 2.3.2). Hypertext and its superset hypermedia allow submitting semantically structured data
21 http://www.programmableweb.com/apis
along with references to related resources. Thus, representations contain arbitrarily
structured content that may, or may not, hide information about the private state of
the resource. Representations can further include directives to process information,
such as JavaScript, and directives to render the content, such as Cascading Style
Sheets (CSS). By that, representations comprise data, logic, and presentation.
While most of the consumed capabilities are centered on data delivery (cf. Sections 3.5.7 and 3.5.8), resources also offer behavior beyond delivering data. Yahoo!
Pipes, for instance, offers a module to extract location information from free text.22
Other services even allow changing the state of resources. The intended behavior is
expressed via a set of exposed methods23 that allow updating the resource’s state
(cf. Section 2.3.2).
Web APIs. Capabilities on the Web are often referred to as Web APIs. Similar to the term mashup, the term Web API lacks a crisp definition. Web APIs can be considered to express a particular behavior through providing a Web-oriented programming interface. Thus, Web API refers to one resource or a set of related resources, exposing behavior and representation as described above, with the explicit intention to be used as a service interface by other applications.
Some capabilities are offered through intermediaries, because the owner of domain
expertise is not willing to expose that expertise. Very often, this intermediary employs Web APIs to provide that expertise, explicitly offering it for reuse. Within the
last two years, many organizations operating on the Web started to offer Web APIs
as an explicit alternative to consume their expertise in applications, rather than by
screen-scraping their Web sites. Consequently, Web APIs became effective means to
manage governance among capabilities (cf. Section 3.5.8).
A common and widely adopted open standard for Web APIs is the Atom Publishing
Protocol. AtomPub leverages the Atom Syndication Format to represent collections
of data items in semi-structured documents, and applies an interaction protocol
enabling humans and machines to create and update content. AtomPub interaction obeys the constraints of REST almost purely by the definition of the protocol and has thus received much attention. Google's Data APIs24, for instance, are based on and
extend AtomPub.
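The following sketch illustrates an AtomPub interaction in TypeScript using the standard fetch API: a new member is created by POSTing an Atom entry to a collection. The collection URI is hypothetical; the media type and the 201/Location behavior follow the protocol definition (RFC 5023).

```typescript
// Create a member resource in an AtomPub collection and return its URI, if any.
async function createEntry(collectionUri: string, title: string, content: string): Promise<string | null> {
  const entry = `<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>${title}</title>
  <content type="text">${content}</content>
</entry>`;

  const res = await fetch(collectionUri, {
    method: "POST",
    headers: { "Content-Type": "application/atom+xml;type=entry" },
    body: entry,
  });

  // On success the server responds with 201 Created and the URI of the new member
  // resource in the Location header; that resource can then be read or updated via GET/PUT.
  return res.status === 201 ? res.headers.get("Location") : null;
}
```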
22 http://pipes.yahoo.com/pipes/docs?doc=location
23 HTTP refers to these methods as verbs. [Fielding et al., 1999]
24 http://code.google.com/apis/gdata/
Web APIs are not only located at the level of the resource's interface. More sophisticated APIs may be exposed through functionality that is executed locally to the mashup, providing skill and knowledge by accessing further resources under the hood. The Google Maps API is an example of this. It is an API implemented in JavaScript that offers functionality at the program-code level. This functionality provides means to draw maps, position location markers on them, and enrich these with related information. The API's implementation autonomously accesses and loads geographic data and images from Google's servers.
4.3 The Mashup Pattern
Due to their bottom-up evolution, mashups have a naturally small denominator of commonalities beyond the high-level characteristics derived from the survey in Section 3.5. Being rapid implementations of specific needs, mashups are remarkably manifold in their architecture and the way they consume capabilities. However, the quantity of reviewed mashups allows deriving a pattern that consists of three activities taking place within a mashup to aggregate disparate capabilities and create value. Figure 9 illustrates this pattern and puts it into the context of the mashup ecosystem, described above.
Figure 9: Overview of the Mashup Pattern (BPMN); pools: Capability Provider, Mashup, Mashup Consumer
The three activities are ingestion, augmentation, and publication. The ingestion
phase accesses capabilities and prepares them for further processing. The capabilities
are combined and aggregated during augmentation, and the resulting content is
repackaged to be delivered to the client in the publication phase. The main activities
are ordered by causal dependency within the diagram. As further discussions will
reveal, the actual ordering may vary within particular mashups. Some ingestion
will happen before, during, and even after augmentation, depending on the specific
requirements and implementation of mashups.
4.3.1 Ingestion
Ingestion is the act of harvesting heterogeneous capabilities that are spread in networks and encapsulating them to facilitate their usage by a single application—the
mashup. By that, ingestion serves two purposes.
On the one hand, ingestion acts as a connector to remote capabilities. As explained previously, the majority of capabilities are explicitly or implicitly offered as resources in the terms of REST. However, capabilities exist that are not exposed by the same means. To name a few, databases, legacy systems, and proprietary documents, e.g. spreadsheets, are valuable capabilities, especially in a corporate setting. While such capabilities may also be exposed in a general effort of service-enabling knowledge assets within organizations, it is just as likely that they are not. Specific ingestion connectors to access the corresponding systems and obtain the capabilities are thus required. If capabilities are Web APIs, the corresponding ingestion component must understand and obey the interaction protocol of these APIs. RESTful APIs greatly increase reusability beyond anticipated use cases by providing a uniform yet versatile interface.
As Section 3.5.8 indicates, resources that allow the manipulation of shared state are rather rare, due to the absence of adequate secure cross-domain communication
capabilities of browsers (cf. Section 2.3.3).
On the other hand, ingestion serves as a primer for representations obtained from the
capability providers. In most cases, the relevant asset of representations is data. In
addition to the broad diversity of capabilities and access methods, data is delivered
in an equally broad variety of data types and formats (cf. Section 3.5.6). Content not provided in formats that address machine consumption in the first place, such as human-oriented websites, emails, and spreadsheets, requires more sophisticated information extraction systems, further discussed in [Kayed and Shalaan, 2006]. One of the most controversial methods to extract accurate information is screen scraping, i.e. using algorithms and templates to extract content from semi-structured documents intended primarily to be rendered in a Web browser. While this approach is prone to faults due to ever-changing site structures, [Alba et al., 2008] underline its
significance in present scenarios. Formats that are designed for machine consumption, such as syndication formats like Atom and RSS, or even semantically structured documents, e.g. via RDF, promise much better performance, yet their adoption is still lagging. RDF in particular requires considerable knowledge of the format and its semantics to be utilized effectively.
After successfully unlocking these resources, the exchanged information must be normalized into an agreed format that sustains the operations in the augmentation phase. Functionality may need to be wrapped to comply with specific interfaces, or be analyzed and disarmed in order to prevent malicious code from harming users or other capabilities, as in [Isaacs and Manolescu, 2008].
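A minimal sketch of this normalization step is given below in TypeScript. The two source item shapes are simplified stand-ins for RSS- and Atom-style items; the agreed internal format is an assumption for illustration.

```typescript
// Simplified stand-ins for differently shaped feed items.
interface RssishItem  { title: string; link: string; pubDate: string; }
interface AtomishItem { title: string; id: string; updated: string; }

// The agreed internal format that all augmentation operations rely on.
interface NormalizedItem { title: string; uri: string; timestamp: Date; }

function fromRss(item: RssishItem): NormalizedItem {
  return { title: item.title, uri: item.link, timestamp: new Date(item.pubDate) };
}

function fromAtom(item: AtomishItem): NormalizedItem {
  return { title: item.title, uri: item.id, timestamp: new Date(item.updated) };
}
```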
Due to the need to discover and explore potential mashable capabilities in the first
place, mashup tools often include catalogues or repositories that let users and developers browse through a set of capabilities. In most cases, the tools already provide ready-made ingestion components for them. Some tools even offer facilities to import capabilities intended for other tools or mashups [Gurram et al., 2008]. ProgrammableWeb provides a freely available, well-documented catalogue of over 1000 Web APIs that are offered on the Web and consumed by diverse mashups.
4.3.2 Augmentation
Augmentation denotes the application of competence among a set of capabilities, including knowledge and skill, with the objective of value co-creation.
Augmentation aggregates the capabilities that were obtained and normalized during
the ingestion phase. Here, aggregation stands universally for any means to combine
these capabilities in meaningful ways to create value, while retaining their essence.
The competence, often referred to as the “glue” to put the pieces together, is provided
by the domain expert. As outlined in Section 2.2.3, this is the mashup developer and the user, who may in fact be the same person. Mashup applications stand out due to the relatively small amount of work involved in creating them, that is, in automating the
application of competence. In that context, “relatively small” means the amount of
effort compared to traditional integration of data and functionality.
Augmentation is the main focus of the literature concerning mashups. Some academic papers put their main focus on the interaction of software components, for instance [Gurram et al., 2008, López et al., 2008], whereas others regard data aggregation as the central matter, for example [Morbidoni et al., 2007, Riabov et al., 2008,
Simmen et al., 2008]. In general, this allows for the distinction of two main strategies of capability augmentation, as suggested in [Abiteboul et al., 2008]: augmentation through interaction of components and augmentation through chaining a set
of operations. Both approaches leverage components that may provide information,
functionality, or both. For the two augmentation strategies, it is of no relevance
whether functionality is provided locally, i.e. as an executable component within the execution environment of a mashup, or remotely, i.e. as a service provided through
a resource.
The first approach considers a set of interacting components that are autonomous
to some extent. Each component provides certain expertise, some of them by encapsulating external capabilities. Augmentation happens through interaction among
these components, connecting data from one component with that of another and
creating value. This usually happens in response to user interaction, after the mashup application has been delivered to the user and initialized. This type
of augmentation, often called assembly or wiring, is generally used in dashboards,
where widgets interact as a result of user interaction with the dashboard application [Hoyer and Fischer, 2008]. For such interaction, the execution context of the
mashup application must provide communication channels that enable interaction
among the components. The most often used architectural design pattern to enable
interaction without closely coupling components is the publisher-subscriber pattern
[Yu et al., 2008]. Due to its interactive behavior, this type of aggregation is well suited to exploring information sets and discovering insights among them.
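The following TypeScript sketch illustrates such wiring through a minimal publisher-subscriber hub provided by the execution context; the topic name and the interacting list and map widgets are illustrative assumptions.

```typescript
type Handler = (payload: unknown) => void;

// Minimal publisher-subscriber hub offered by the dashboard's execution context.
class EventHub {
  private subscribers = new Map<string, Handler[]>();

  subscribe(topic: string, handler: Handler): void {
    const handlers = this.subscribers.get(topic) ?? [];
    handlers.push(handler);
    this.subscribers.set(topic, handlers);
  }

  publish(topic: string, payload: unknown): void {
    for (const handler of this.subscribers.get(topic) ?? []) {
      handler(payload);
    }
  }
}

// Example wiring: selecting an item in a list widget re-centers a map widget.
const hub = new EventHub();
hub.subscribe("item-selected", (payload) => {
  const { lat, lon } = payload as { lat: number; lon: number };
  console.log(`map widget centers on ${lat}, ${lon}`);
});
hub.publish("item-selected", { lat: 52.39, lon: 13.06 });
```

The hub decouples the widgets: neither needs to know the other, which is precisely why the publisher-subscriber pattern is favored for this type of augmentation.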
The second type of aggregation is more interesting from an academic point of view
and thus much better covered by scientific work. In contrast to the first approach,
aggregation happens before content is delivered to the user. Much research has been
conducted that focused on the sole integration of several data capabilities, creating
new information through combining separate yet related information. A lot of different methods have been developed to achieve this goal, the most prominent being the pipes-and-filters pattern [Hohpe and Woolf, 2003], where operational filters are chained to create a complex process of data transformation. This has been applied by [Simmen et al., 2008], and in Yahoo! Pipes and Microsoft Popfly. More formal work comprises query languages to aggregate data capabilities [Jarrar and Dikaiakos, 2008, Tatemura et al., 2007]. [Riabov et al., 2008] describe automatic data composition through the definition of goals. Details of the particular aggregation approaches are well covered in existing work and thus beyond the scope of this thesis, which rather aims at a holistic picture of mashups. In essence, a general metaphor
can be deduced from these particular approaches: an ordered set of operations that
are performed in coordination—an execution plan [López et al., 2008]—operating on
collections of semi-structured information items that are generally syndication feeds.
These operations comprise the “glue”, i.e. basic functionality derived from relational
algebra, and functionality exposed through resources, locally or remotely as described
above.
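The execution-plan metaphor can be sketched as follows in TypeScript: an ordered chain of operations applied to a collection of semi-structured feed items, in the spirit of pipes-and-filters. The item shape and the concrete operators are illustrative only.

```typescript
interface Item { title: string; source: string; published: Date; }
type Operation = (items: Item[]) => Item[];

// "Glue" operators in the spirit of relational algebra: selection, merge, sort.
const selectRecent = (since: Date): Operation =>
  (items) => items.filter((i) => i.published >= since);

const mergeSources = (other: Item[]): Operation =>
  (items) => [...items, ...other];

const sortByDate: Operation =
  (items) => [...items].sort((a, b) => b.published.getTime() - a.published.getTime());

// The execution plan applies the operations in coordination.
function runPlan(input: Item[], plan: Operation[]): Item[] {
  return plan.reduce((items, op) => op(items), input);
}

// Hypothetical usage: combine two ingested feeds and keep the newest entries first.
// const result = runPlan(feedA, [mergeSources(feedB), selectRecent(new Date("2009-01-01")), sortByDate]);
```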
An important issue, in the aggregation of hypertext, is the significance of links that
express application logic in a broader sense (cf. Section 2.3.2). On the Web, hypertext is the engine of application state: Links identify related information, e.g.
directives to render or to process information, and control paths to advance application state. Aggregation of hypertext must, therefore, consider the meaning of links
and ensure their integrity. Carrying links along during the aggregation of documents increases data quality and trust, since the end user can obtain knowledge about the origin of information.
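A minimal sketch, assuming a simplified item structure, of how origin links can be carried along when items from several sources are merged, so that provenance remains visible in the aggregated result:

```typescript
// Items keep the link to the resource they were ingested from, so that
// the aggregated result can still point back to its origin.
interface SourcedItem {
  title: string;
  sourceUrl: string; // origin link, preserved through every aggregation step
}

function mergeSources(...sources: SourcedItem[][]): SourcedItem[] {
  // Aggregation may reorder or filter items, but never strips sourceUrl.
  return sources.flat().sort((a, b) => a.title.localeCompare(b.title));
}

const merged = mergeSources(
  [{ title: "Order 17 shipped", sourceUrl: "http://crm.example.org/orders/17" }],
  [{ title: "Stock level low", sourceUrl: "http://scm.example.org/items/4711" }],
);
merged.forEach((item) => console.log(`${item.title} (source: ${item.sourceUrl})`));
```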
4.3.3 Publication
Publication is the act of transforming augmentation results into meaningful data
formats and delivering them to the recipient of the mashup.
While the result of the augmentation phase may be artifacts of arbitrary type, for
instance the compilation of a set of algorithms, enhanced collections of data items,
or data sets along with processing and displaying directives, these results must be
transformed into a specific representation. This representation must be understood
by the recipient, the mashup consumer, and serialized to be transported to the client
via a transport protocol, that is, typically HTTP (cf. Section 3.5.4).
The transformation of augmentation artifacts into a representation follows any of
three options, discussed in detail in [Fielding et al., 2002]: (1) render into a fixed-format image, thus hiding internal data, (2) encapsulate content with processing
and rendering instructions, and (3) send raw data along with a set of metadata
expressing the data type and format. All three options allow delivering information
as well as functionality. For instance, (1) applied to functionality yields a compiled
and executable software component, such as a Java browser applet.
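As an illustration of the third option, a publication component might serialize the augmented data and label it with metadata describing its type and format; the following sketch uses Node's built-in HTTP module and invented data purely for illustration.

```typescript
import { createServer } from "node:http";

// Option (3): deliver raw data together with metadata that states its
// type and format, leaving rendering entirely to the consumer.
const augmentedResult = { process: "replenishment", openIssues: 3 };

createServer((req, res) => {
  const body = JSON.stringify(augmentedResult);
  res.writeHead(200, {
    "Content-Type": "application/json; charset=utf-8", // data type and format
    "Content-Length": Buffer.byteLength(body),
  });
  res.end(body);
}).listen(8080);
```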
In general, publication produces a hybrid representation of all three options, hypertext containing or referring to data, logic, and presentation directives, similar to
the representations obtained from capabilities (cf. Section 4.2). The actual representation that is provided by publication depends on the demands of the mashup
consumer. While this is a human in most cases, [Ankolekar et al., 2007] do advocate
that mashups, as consumers of reusable capabilities, should offer their capabilities
for machine consumption as well. Thus, a mashup may provide different types of publication to address different users and devices.

Figure 10: End-to-End Mashup Workflow — a cycle of Ingest (discover and mix capabilities), Augment (assemble application, explore different combinations and insights), and Publish (share value and offer for reuse)
This reuse of already reused capabilities establishes a circular sequence of the activities that
comprise the mashup pattern. [IBM Corporation, 2008] labels this cyclic workflow
“End-to-End Mashup Workflow”, depicted in Figure 10.
4.4 The Mashup Reference Model
The term reference model denotes an abstract framework that allows understanding significant entities and relationships among them, within a certain environment.
Reference models do not make any assumptions about technology, platforms or architectures.
The increasing popularity of mashups and the rising interest of companies led several institutions to propose reference architectures for mashups, among them [Bradley, 2007, López et al., 2008]. However, these proposals approach mashups on a technical level, although there is no common technical denominator among the different mashups. Such attempts limit the variety and flexibility, and thus the
potential of mashups. [Abiteboul et al., 2008] provide an abstract model to formally
define mashups on the basis of relations describing interfaces of mashup components,
so-called mashlets. While this model retains the potential of the mashup variety, it is
too formal, comprising a complicated mashup specification language. As this thesis has argued so far, mashups grew bottom-up and gained momentum through their high degree of simplicity and the freedom to discover new and unanticipated, ad-hoc combinations of capabilities. Thus, a reference model needs to address that freedom and
be limited to a logical level rather than an architectural or technical one.
The reference model presented here picks up the component orientation of mashups presented in [Abiteboul et al., 2008], relaxes it, and overcomes the shortcomings
of earlier approaches to define a reference architecture, such as in [Bradley, 2007,
López et al., 2008]. For this purpose, the presented model advances the observed
mashup pattern and provides a structural composition approach that facilitates the
understanding of the processes taking place inside a mashup. This not only allows the logical decomposition of existing mashups into the parts identified by the mashup pattern, but is also a first step towards designing new mashups on a
conceptual level. This will directly contribute to the development of the proof of
concept, provided in Section 6.
Figure 11: Mashup Reference Model (UML) — within the REST style, a mashup is a resource that consumes representations of other resources through ingestion components, processes them in augmentation components, and delivers its own representation through a publication component
In correspondence with the observations made in Section 4.2, the REST architectural
style is chosen as a reference system that allows remaining accurate within a specific context without loss of generality. Consequently, a mashup is considered a resource
that accesses capabilities in the form of representations and provides its state as
representation itself. The reference model and the REST context are depicted in
Figure 11.
The model is a metamodel—a concrete mashup model is an instance of the mashup
reference model. The reference model does not necessarily describe architectural structures of mashups, but rather logical ones. In particular, architectural separation of components is by no means enforced. According to the reference model, a mashup
is comprised of a set of components. These components are classified according to
their role within the mashup pattern: ingestion, augmentation, or publication. Each
component provides knowledge or skill, i.e. information or functionality to process
information.
The components are connected through delivery-consumption associations that represent data flow channels between the components. The arrow symbol denotes the direction of data flowing along the channels. While there is no general restriction,
these channels work following a request-response pattern, i.e. a component requests
data from a preceding component and consumes the corresponding response. Thus,
control flow is established implicitly through data flow.
This kind of interaction comprises both strategies of augmentation, discussed in Section 4.3.2. It allows semi-autonomous components to interact as a result of external
events in arbitrary ways and serves to describe processes that aggregate capabilities in preparation for content delivery to the user. In the latter case, an incarnation of the
reference model can be perceived as a process model, describing the process of value
co-creation inside a mashup. The following section discusses each component and its
relations to other component types in detail.
4.4.1 Reference Model Components
Ingestion. Ingestion components are connectors to access resources. In general, one can assume one ingestion component per consumed capability. Ingestion components consume information and normalize data if the corresponding capability provides data. Otherwise, they encapsulate the functionality provided by a capability such that it does not matter whether functionality is executed remotely or retrieved and
executed locally.
Ingestion components only consume representations of capabilities; they are not receivers of data flow from other mashup components. However, they do not necessarily perform before any augmentation happens; they may be requested to access functionality from external resources as part of the augmentation phase.
Governance mechanisms are generally implemented by ingestion components. Since
one ingestion component accesses one particular resource, it knows interaction protocols and potential authentication and authorization measures. As REST explicitly
promotes the presence of transparent intermediaries [Fielding et al., 2002], ingestion
components may also employ caches to reduce network traffic and shorten response
times, thus providing courteous access discussed in Section 3.5.8.
Augmentation. Components that perform augmentation are generally richly structured. Based on the granularity of composition, one could
express each operation that is applied to data as a single augmentation component.
Thus, augmentation components consume data that is delivered from ingestion components or other augmentation components. No restrictions exist on the number of
incoming channels of an augmentation component. This allows for effective aggregation of information retrieved through disparate ingestion components. Augmentation
components can also incorporate logic that affects the control flow within the augmentation phase. By these means, augmentation comprises a network of components
that provide the value co-creation of a mashup, containing all domain knowledge that
is required.
Augmentation components may be executed on the mashup's server side, locally on the client side, or be spread across client and server, spanning the execution context of a
mashup across physical and network boundaries. As mentioned earlier, ingestion
components may be invoked during augmentation, e.g. to retrieve related information or load functionality on demand.
Publication. Publication comprises the transformation steps that are required to
deliver aggregated value to the user, i.e. creating a form of representation that can
be rendered and displayed by the user agent. This incorporates the construction of
semi-structured or structured documents according to a data format, in most cases.
While publication components generally consume data from augmentation components, they may request capabilities through ingestion components as well, e.g. if
a capability provides means for data transformation. An example is the Google Charts API (http://code.google.com/apis/chart/), which provides a service to easily create a diagram from a statistical data set.
Since publication components are considered the user endpoint of mashups, they may
implement a set of governance mechanisms as well. This includes, but is not limited
to, handling the authentication and authorization of users accessing a mashup, or employing caches that can reduce traffic and resource consumption.
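To make this logical structure concrete, the following TypeScript sketch expresses the three component roles and their delivery-consumption channels as simple interfaces; the names and signatures are illustrative assumptions, not a normative API.

```typescript
// Logical roles of the mashup reference model, expressed as interfaces.
// Channels follow a request-response pattern: a component requests data
// from its predecessor and consumes the response, so control flow is
// established implicitly through data flow.

interface IngestionComponent<T> {
  // Connector to exactly one resource; normalizes the obtained representation.
  fetch(): Promise<T>;
}

interface AugmentationComponent<I, O> {
  // May consume any number of incoming channels and combine them.
  augment(inputs: I[]): O;
}

interface PublicationComponent<T> {
  // Transforms the augmented result into a representation for the consumer.
  publish(result: T): string;
}

// A trivial mashup instance wired from one component of each role.
async function runMashup<I, O>(
  ingestors: IngestionComponent<I>[],
  augmentor: AugmentationComponent<I, O>,
  publisher: PublicationComponent<O>,
): Promise<string> {
  const inputs = await Promise.all(ingestors.map((i) => i.fetch()));
  return publisher.publish(augmentor.augment(inputs));
}
```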
4.4.2 Organic Mashups and Dashboards
The mashup reference model further allows putting the two types of mashups—
organic mashups and dashboards, introduced in Section 3.4—into relation to each
other. An organic mashup can be considered as an arbitrary composition of mashup
components following the model described above. That is the reason for their name: they are developed organically. Dashboards, in contrast, can be considered
as mashups that combine a set of other mashups. Figure 12 depicts that circular
relationship, already pointed out in Section 4.3.3.
Figure 12: Organic Mashups and Dashboards according to the Mashup Reference Model (UML) — a mashup is either an organic mashup or a dashboard; a dashboard is composed of widgets, which are themselves mashups
Dashboards consist of a dynamic set of widgets, and each widget interacts with the user under its own responsibility. This suggests that widgets may be constructed logically from the same components as mashups in general. They incorporate their own
publication components, employ ingestion components to access remote capabilities,
and may perform operations in between. However, compared to mashups as defined
here, widgets do not necessarily aggregate more than one capability but provide an
alternative interface to a single capability. Nevertheless, they still conform to the
mashup reference model in a relaxed way.
5 Application of Mashups for Business Process
Management
Business process management traditionally addresses operations and their relationships that form a value-creating process inside an organization, as well as systems that
provide the capacity to conduct these activities and processes. These processes and
their representations, process models, are central assets of knowledge within an organization, since they comprise the expertise to create value out of a set of resources.
Due to their importance to the organization, business processes are carefully crafted
and highly governed, which requires much effort to design, implement, and execute
a business process.
The observations and conclusions of the preceding chapters have shown typical characteristics of mashups (cf. Section 3.5) and their value in organizational environments
(cf. Section 3.4). This chapter analyzes to what extent mashups can contribute to
business process management. Successful application of mashups has to address the
main goals of business process management, which are, therefore, recapitulated first.
The business process life cycle is a useful tool to understand the objectives and concepts of business process management and is well suited for scoping potential mashup
scenarios here. Therefore, this life cycle is explained before the value proposition of
mashups for business process management is examined in different settings.
Like earlier discussions and explanations, this chapter will not address a specific
technical infrastructure for its examinations. While an apparent trend of enabling
software systems for Web access has been growing within organizations recently, it
would exceed the scope of this work, which rather focuses on a conceptual application of mashups for business process management. A more concrete insight into the technical realization of a mashup will be given in the next chapter, which provides proof of the observations made.
5.1 Goals of Business Process Management
To understand the potential of mashups for business process management, a summary of the main goals of the latter is given. In order to contribute to business
process management, mashups need to address these goals in a constructive way.
Understand the Operations of an Organization. The most important goal of business process management is to gain insight and understanding of the operations of
an organization and their relations [Weske, 2007], answering the question of how an
organization works in a holistic way, yet detailed enough to address the operational
business. The core concept of this understanding is an explicit representation of the
process through a model. Such a representation is only of significant use, if it creates
common understanding among stakeholders and allows them to review and improve
the represented process. Semantically rich and syntactically strict representations
support formal model checking, i.e. verification of the soundness of process models.
Graphical representations have proven very useful in stakeholder reviews, because
visual representations facilitate understanding.
Implement Processes in an Enterprise Environment. If existing processes are
understood and captured in models, redesigned, or new processes have been established,
they need to be implemented within an organizational and technical environment.
That means, they need to be integrated into and executed in a controlled context that
reduces proneness to errors and provides effective handling of exceptions. Integration
with existing systems, which may rather be information centric systems, through
reuse reduces the gap between concrete business processes and their realization in
software systems. As important as the implementation into an IT systems landscape
is the implementation into the organizational environment, that is, efficient and
effective coordination of resources. Process instances interact with systems and with
human participants. In an organizational setting, persons are represented by their
competences and positions—characteristics that primarily define a person’s activities
within a process. Increasing the efficiency of a process also requires increasing the
efficiency of participant interactions.
Establish Flexibility. Besides explicit establishment of processes and their realization, the key operational goal of business process management is flexibility. The
ability to adapt to changing influential factors is required along several dimensions. In today's rapidly changing market dynamics, it is essential to adjust affected processes within a short time to prevent loss of market share. Equally important are improvements to the process itself. Systems managing business processes must further be able to adapt to changing technical situations, such as changes in the organization's IT landscape, without affecting the business process itself.
5.2 Business Process Life Cycle
The business process life cycle, presented in [Weske, 2007], constitutes the backbone of business process management, comprising four phases that form a cyclic
relationship among each other: design and analysis, configuration, enactment, and
evaluation. Process stakeholders are related to these phases based on their specific tasks and responsibilities [Decker, 2008], as shown in Figure 13.
Each phase and its specific function are described briefly below.
Sometimes, an additional activity is considered in the context of the business process
life cycle: Administration comprises the storage, management and efficient retrieval
of numerous information artifacts related to different phases and stakeholders of the
life cycle. [Weske, 2007]
Figure 13: Life Cycle of a Business Process within Business Process Management, including involved stakeholder roles (BPMN) — the phases Design and Analysis, Configuration, Enactment, and Evaluation form a cycle, with the roles Process Designer, Enterprise Architect, Process Implementer/Developer, Process Participant, and Process Manager attached to them
Design and Analysis. The design and analysis phase is typically considered the first
phase of the business process life cycle, because it comprises the activities that lead
to an explicit representation of a process: the process model. This representation
forms the central knowledge asset for all other phases of the life cycle. Design
also incorporates process redesign to overcome issues of existing processes, to improve
their performance, or to adapt to changed business demands. Analysis encompasses
means for proving the process' correctness at the level of the process model, including
validation, verification, and simulation. Especially validation—reaching compliance
with the business goals a process focuses on—is conducted in a collaborative manner,
typically in reviews that include all involved stakeholders [Weske, 2007].
Configuration. After initial design or redesign, and successfully passing analysis, a
process needs to be put into action. This happens during the configuration phase that
includes all actions required to implement the business process within its technical
and organizational environment.
Configuration draws from a wide spectrum of IT involvement. Sometimes, IT systems
are not involved at all. Business processes comprise the activities that are required
to create some kind of value. By setting up guidelines, policies, and rules, business
processes can be realized among a relatively small group of humans that agree to
comply with these procedures. Processes can also be implemented as hardwired
applications that execute activities in a static way. This may be valuable for highly
repetitive processes that do not change over time, e.g. due to legal regulations. In
such cases, standard software can perform these processes that do not need much
maintenance in practice.
Business process management, however, focuses on realizing efficient process operation within the organization's IT landscape while retaining the flexibility to adapt and improve
processes. Thus, it is desired to implement processes in a generic framework, typically a business process management system. This allows enacting processes, defined
by process models, in a controlled manner and a coordinated environment, reducing the effort of configuration, e.g. the integration with existing software systems and communication with users, to a minimum.
Enactment. Following its completed configuration, the business process is ready to
be unleashed to perform its obligation. The enactment phase comprises the actual
run time of process instances, that is, the phase of value creation. Enactment means that a process instance is initiated upon an event originating from the process' environment and is then executed under the active control of the
business process management system according to the execution constraints specified
by the process model and the configuration directives set up in the previous phase.
The activities performed as operational steps of process instances are generally differentiated into two classes according to the involvement of interaction with human
users. While system activities only interact with software system parts of the
process environment, human activities depend on the fulfillment of a set of tasks by
a human user. Process instances that are completely detached from user involvement
are also called system workflows.
Part of this phase is the collection of data that arises during the enactment of process
instances including, but not limited to, data objects, events, and exceptions. This
information is kept in so-called process execution logs for further analysis of the
process.
Evaluation. Evaluation comprises the analysis and processing of the very information gained throughout the enactment of process instances. Business activity
monitoring gives insight into the current state of process instances and the process
landscape. Process execution logs provide information about execution performance
and potential issues of process instances. This allows appraising business processes
and the execution environment against set targets of process performance indicators.
The process manager uses this information to examine and invoke process improvements if the targets were not met or the process lacks efficiency or effectiveness.
One subject within the evaluation phase is business process mining. This is used to
discover existing but not explicitly represented business processes from observations
gained through the activities of a real process [van der Aalst, 2002]. Due to its very
nature, business process mining is unlikely to benefit from mashups and is rather a topic addressed by data analysis and data mining. Thus, it is not further considered
here.
5.3 Value Proposition of Mashups for Business Process Management
Business process management is a mature and proven discipline within the field of
computer science that has been influenced by a few strong bodies: companies and academics. There is a clear understanding and a concise definition of the goals and concepts of business process management. Mashups, in contrast, are quite young and in an adolescent phase that is heavily impacted by decentralized development through individuals, non-profit communities, and recently also companies. Compared to business process management, no crisp definition of mashups exists and they are rather perceived as a phenomenon (cf. Section 3). As a consequence, little work has been done to combine these two fields on a conceptual level, in contrast to the mostly technical problems that were addressed in the past, as described in Section 1.2.
While business process management considers process models the central assets of
knowledge, many information artifacts exist at different levels of abstraction related
to processes, process instances, and process models. These include, among others,
execution logs, documentation, and information about the organizational and technical environment. Much information comes from process stakeholders, each of them
having particular skills, knowledge, and experience.
The numerous information artifacts are likely kept in different places, for instance
knowledge management systems such as wikis and process repositories. They facilitate the efficient storage, organization, and retrieval of these kinds of information.
According to the discussion in Section 2.2.2, such systems can substantially benefit
from the Web 2.0 patterns: access to capabilities is largely simplified, functionality encapsulated in services is more likely to be reused, and data can be enhanced
through the “wisdom of crowds” [O’Reilly, 2005]. The latter leverages the knowledge
of individual process stakeholders to complete, improve, and link process related information. Using technologies and systems that emerged with the Web 2.0, such as
wikis or issue trackers, facilitates the collection of this information and allows putting
it into relation. Feedback, tagging, and rating enable stakeholders to maintain and
enhance existing information.
Mashups offer the capability to combine these information islands and provide insight into the required information. Several applications of mashups on top of this information will be presented below, corresponding to the phases of the business process life cycle for which they offer value.
5.3.1 Design and Analysis
Among the stakeholders involved in process design, the process designer stands at the center of this phase, coordinating requirements from other stakeholders, such as knowledge workers, enterprise architects, and the chief process officer. The process designer quite often also holds the role of the process manager, due to their central role and their profound understanding of the process they are responsible for.
In many cases, process managers have knowledge of business operations that goes
beyond existing documentation.
The central activity conducted within this phase is process modeling. The process is
usually modeled with the help of graphical notations, such as the Business Process
Modeling Notation (BPMN [OMG, 2008]) or the Event-driven Process Chain (EPC
[Scheer et al., 1992]). Many more exist that essentially share similar characteristics: They provide easily understandable, graphical expressions along with formal
semantics that allow for automated model verification.
Figure 14: Usage of Process Model and Process Knowledge throughout the Process Life Cycle (BPMN) — the process model and the process knowledge are shared data objects used across Design and Analysis, Configuration, Enactment, and Evaluation
Figure 14 shows data dependencies within the business process life cycle on an abstract level: Data that emerges from different phases of the life cycle, generally labeled process knowledge, and the process model itself mutually depend on each other. Presenting
process knowledge to process designers allows them to consider all relevant information items, issues, constraints, requests, and potential quality requirements in a
holistic way during process design.
In a given scenario, a process model is assembled from a set of activities that are kept in a process repository. These activities have specific semantics, and information about them is kept in different systems. Knowledge about such activities includes informally captured process documentation and advice stored in different collaborative tools. An issue tracking system stores problems or situations that arose within other phases of the process' life cycle. Examples for the latter are setup difficulties during configuration, operational obstructions and problems during enactment, and performance issues such as unsatisfactory throughput or high failure rates. The information
items are likely fragmented, since they arise from several stakeholders in different life
cycle phases. It is desirable to integrate these phases by fueling the gained knowledge
and experience directly into design and redesign.
A potential mashup aggregates arbitrary data related to the process originating from
any phase of the life cycle. The result is displayed directly in the modeling environment, attached to the model to point out which items belong to which activity
or model element. Such a process visualization tool could be offered to involved
stakeholders, encouraging them to collaborate and contribute. Manifesting and
enhancing data in knowledge management systems gives the process designer the
ability to consider all these influential factors, which eventually leads to better process design and improvement. A prototypical implementation of this mashup is
described in Section 6 and depicted in Figure 19.
In an enhanced process modeling environment, process model verification and simulation could be supported through implementing a visual step-through mechanism
for the process model, highlighting currently enacted activities of a process instance.
A corresponding mashup would need to access the different knowledge management
systems and to interact with the process execution engine.
5.3.2 Configuration
Assuming the utilization of a business process management system, the configuration of a process is largely reduced to enhancing the process model with technical
information that specifies its interaction with its environment. That means providing software artifacts that realize activities and connecting the process to interfaces
of systems of the organization’s IT landscape. The responsible process stakeholder
role is the process implementer supported by a group of developers and enterprise
architects. Thus, embedding a process in its environment is essentially composing
an application—the process—of a set of services—the systems’ interfaced functionality.
Mashups are composite applications, too. Advocates of mashups even describe them
as service compositions. [Spohrer et al., 2008] define a service as “the application of
competence to the benefit of another”, which embraces the steps of proposal, agreement,
and realization. A service is the process of value co-creation between provider and
beneficiary. In the mashup ecosystem (cf. Section 4.1), the capability provider is
the provider of knowledge or skill, and the mashup the beneficiary. The provider
proposes its capability by making it available on the Web; agreement is reached upon consumption of that capability, through a request and the corresponding response containing a representation (cf. Section 4.2); and application of the mashup logic
realizes the creation of value. Service systems are not restricted to a specific number
of providers and beneficiaries. This supports the perception of mashups composing
a set of services.
For example, a simple inventory replenishment process would access a customer relationship management system and obtain the latest orders of a customer. Based
on that information, the process would query the supply chain management system and invoke a replenishment request to fulfill the customer's order. Figure 15
compares this process implementation as workflow (left) with its implementation
as a mashup (right), revealing a remarkable amount of resemblance. In a process
execution engine, the behavior of specific activities would need to be described through
software artifacts that access an organization’s services. Flexibility is achieved by
the automatic interpretation of the process model resulting in process instances.
The same holds for mashups: An augmentation component would contain the process logic and access an organization’s services, potentially Web enabled, through
ingestion components (cf. Section 4.4), providing the process’ outcome through a
publication component to humans or other applications. In contrast to providing activity-specific implementations, the augmentation component would express the
process logic and compose standardized ingestion components.
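A hedged sketch of this replenishment example, with invented URIs, payload structures, and a hypothetical fetchJson helper, may illustrate how the augmentation component carries the process logic while ingestion components encapsulate access to the CRM and SCM systems.

```typescript
// Hypothetical helper; any HTTP client that returns JSON would do.
async function fetchJson<T>(uri: string): Promise<T> {
  const response = await fetch(uri);
  return (await response.json()) as T;
}

interface Order { customer: string; item: string; quantity: number; }
interface Replenishment { item: string; quantity: number; confirmed: boolean; }

// Ingestion components: one per consumed capability.
const ingestOrders = () =>
  fetchJson<Order[]>("http://crm.example.org/orders/latest");
const requestReplenishment = (item: string, quantity: number) =>
  fetchJson<Replenishment>(
    `http://scm.example.org/replenish?item=${encodeURIComponent(item)}&quantity=${quantity}`,
  );

// Augmentation component: carries the process logic itself.
async function replenishmentProcess(): Promise<Replenishment[]> {
  const orders = await ingestOrders();                           // "Task 1"
  return Promise.all(
    orders.map((o) => requestReplenishment(o.item, o.quantity)), // "Task 2"
  );
}

// Publication component: render the outcome for a human or another application.
replenishmentProcess().then((results) =>
  console.log(`${results.filter((r) => r.confirmed).length} replenishments confirmed`),
);
```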
A mashup can thus be considered as a manifestation of a process (cf. Section 4.3.2)—
the “glue” that combines the activities is a directive of the work that needs to be
performed. Yahoo! Pipes (cf. Figure 2) and Microsoft Popfly give a good impression of how a data-centric process could be implemented as a mashup in practice, leveraging
the pipes-and-filter pattern. A process is defined by a set of activities—filters—that
are chained to each other through pipes.
Figure 15: Comparison of the Configuration of a Business Process (left, BPMN) and a Mashup (right, UML) — left: Task 1 and Task 2 bound to the CRM and SCM systems; right: ingestion components for crm.uri and scm.uri wired to an augmentation component and a publication component
Due to their characteristics of little governance, small scale, and situational scope
(cf. Section 3.5), mashups may not satisfy the needs of enacting processes under high performance and governance demands. However, they may be of value for processes that need to be enacted within a service-oriented environment and have a low
degree of repetition, such as prototypes or interim solutions. Compared to classic
software service systems, mashups are more focused on data aggregation among a
set of disparate sources than on functional composition. Thus, mashups may be the
preferred choice for processes that focus on data retrieval among systems, or as
[Crupi and Warner, 2008b] describe it, to provide a face on top of the services of
an organization and establish a layer that connects users with a service-oriented architecture. This is especially favorable due to mashups’ user centricity that makes
services visible and accessible to human end users in less governance-eager situations
[IBM Corporation, 2008].
The idea of leveraging capabilities of any type offered on the Web is likely to influence
business process management in the future. With the rise of Web 2.0, Web applications became exceptionally popular and companies seek to offer legacy functionality
through Web APIs. Mashups may play a considerable role in that development, as
outlined in Section 7.3.3.
5.3.3 Enactment
As outlined above, processes as well as activities can be differentiated between system
workflows or activities, which do not involve any actions conducted by humans, and
human involved workflows or activities.
Generally, system activities are performed by sophisticated software artifacts developed in the configuration phase; some are provided through service interfaces of the organization's IT landscape. There may also exist activities that take resources into account which reside outside of the organization's systems landscape.
While processes themselves do not cross organizational boundaries, mashups provide
conceptual and technological means to access external resources through a virtually
unrestricted set of data formats, aggregate information, and make sense of it, fueling the results directly into the further process.
This could be realized in practice by delivering an activity as a subprocess. Corresponding to the implementation of processes as mashups, discussed previously, a
subprocess could be carried out as a mashup, aggregating information or functionality (cf. Section 4.2), such as leveraging geographic location services to extract a
precise location out of a free text description, for instance the Yahoo! Pipes Location
Builder Module.
Besides system activities, mashups suggest significant potential to support and improve human-involved activities. Such activities rely on process participants—
knowledge workers—that are able to perform tasks and make decisions that cannot
be performed automatically. Thus, human activities require user interfaces that provide the process participants not only with information that is directly related to the activity, such as input and output information; further information about related resources may be necessary as well. It is the process participant who has the required knowledge and experience to know which information needs to be referred to. User interfaces
traditionally comprise work lists that contain a set of task items that need to be
conducted, each item consisting of a form that expects data input from the user.
Process participants perform these work-items as they appear in their work list. The
splitting of a process into small, self-contained activities results in process fragmentation that dates back to the early days of manufacturing where the fragmentation
of a process into small activities and their execution by a specialized workforce was
very efficient.
Knowledge workers, however, are different from manufacturing workers in that they
have the expertise and experience to control the whole case; for them, fragmentation of work is counterproductive. For instance, case management in the financial or insurance
industry involves considering data from many sources, such as customer relationship
management, accounting, or collections. Often, employees are left alone to obtain this
information without coherent user interface support. This results in alarming work
place setups, where knowledge workers have a multitude of applications running in parallel, sustaining a high risk of errors and inefficient case management.
To account for this weakness, part of the business process management discipline
shifted its focus from flow processing to data-driven processing for such cases. Case
handling supports complex activities that need to be handled by humans: “the focus
is on the case as a whole rather than on individual work-items distributed over
work-lists” [van der Aalst et al., 2003], resulting in relaxed ordering of activities and
more flexible completion approaches of the cases. Case handling itself deals with the
formal specification of data dependencies and resulting processes that disclose high
variability. Its details are beyond the scope of this thesis; further work on that topic is provided in [van der Aalst and Weske, 2005, Weske, 2007].
Mashups offer substantial improvement for human interaction, in view of the fact
that they aggregate potentially related information from arbitrary sources. They
can provide a complete picture of a complex case, including all information related
to the process as well as information that may be important for the case, but not
considered part of the process’ resources during process design. A solution that
aggregates a set of pre-considered information, including an interface to handle the case, yet one that is extensible by the individual user, is desired.

Figure 16: Example of a Dashboard that Supports Human Activity — a worklist and data form alongside widgets for the customer payment history (Accounting), the customer rating (Collections), customer details (CRM System), and a stock website
Dashboards provide exactly that form of data aggregation and flexibility. For each
identified case, a basic set of widgets can be assembled, based on the specification
of a case. Such a basic configuration would comprise a widget to interact with the
relevant data items of the case, as well as related information assets that offer decision
support, e.g. internal and external performance figures. The process model itself
could be included in such a dashboard to visualize the current state of a process,
giving advice to the process participant and information about related activities.
Knowledge workers would be empowered to create and customize such a supportive
environment themselves, by extending the set of widgets with further information
that they may consider relevant. An example of such a dashboard is sketched in
Figure 16. Besides providing means to handle the case, the dashboard provides
several widgets that summarize information about a customer retrieved from disparate systems. Using expressive metaphors, colors, and gauges provides an understanding of the given information at a glance.
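Such a basic widget set could, for instance, be captured in a declarative configuration as sketched below; the widget types, capability URIs, and the case chosen are invented for illustration.

```typescript
// Declarative description of a basic widget set for one case type.
// A knowledge worker could extend this list with further widgets.
interface WidgetConfig {
  type: "worklist" | "table" | "gauge" | "webpage";
  title: string;
  capabilityUri: string; // resource the widget's ingestion component consumes
}

const claimHandlingDashboard: WidgetConfig[] = [
  { type: "worklist", title: "Case tasks", capabilityUri: "http://bpms.example.org/worklist" },
  { type: "table", title: "Payment history", capabilityUri: "http://accounting.example.org/payments" },
  { type: "gauge", title: "Customer rating", capabilityUri: "http://collections.example.org/rating" },
  { type: "webpage", title: "Customer details", capabilityUri: "http://crm.example.org/customers/42" },
];

console.log(`dashboard with ${claimHandlingDashboard.length} widgets configured`);
```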
The remarkable value mashups provide is an aggregation of real-time data combined by the domain expert without putting the burden on the IT department to develop these tools. Effective coordination of these widgets can further reduce workload significantly, for instance by eliminating the copying of data from one place to another, allowing the knowledge
worker to focus on the case itself.
5.3.4 Evaluation
Evaluation uses information collected during the enactment phase to assess the performance of process models and the technical and organizational execution environment based on measurements taken from process instances. Such assessment can
generally be distinguished according to the time of assessment relative to the time of
data acquisition.
Live monitoring of processes takes the state of currently enacted process instances
into account. Mashups can aggregate that data and display it corresponding to the
process model. Visually combining model elements, such as activities and events,
with information about active process instances gives insight into how many instances are currently executing the same activity, which process participants hold
which tasks, which resources are involved, and which customers are being served.
Single process instances could be inspected and diagnosed for potential issues.
The second type of process quality assessment uses historical data about processes to
derive their overall performance in the setting created during the configuration phase.
Performance measures such as time spent to execute specific activities can be analyzed statistically. Again, visualization of the gained information in close relation
to the corresponding process model facilitates the identification of bottlenecks or
resource scarcities. Single process instances can be traced back based on the information retrieved from process execution logs. The knowledge gained in this phase
can be used to compare a current process model and its configuration to earlier and
future versions and assist in improving the process model.
A possible mashup that provides a hybrid of the approaches described above would
provide immediate reports about the current state and performance of specific processes in an organization. This would allow comparing currently running processes with historic measures, making it unnecessary to manually assemble process information and create a report. Such mashups can give valuable insights into recent process
improvements and thus, fuel directly into future process design and redesign in a
subsequent phase.
5.4 Assessment
Business process management is a well-governed discipline built on strict and formal foundations, whereas mashups are comparably lax as a result of ungoverned evolution
among a broad base of individuals. Compared to business process management,
mashups address only small scale processing and interaction is largely reduced to
capability retrieval, transformation, and combination, whereas business processes are
likely to have a high complexity of interaction with systems that share their resources’
state with the enacted process instances. Business process management is centered
on the interaction with systems and the choreography of process instances across
organizational boundaries with a strong emphasis on their control flow. However, the
observations and considerations above lead to the conclusion that mashups propose
considerable value in contributing to the goals of business process management.
Understanding the operations of an organization is supported and established in
those phases of the business process life cycle where mashups accumulate information
from different sources and provide insight and overview. In the design and analysis
phase, knowledge that is spread and fragmented across stakeholders is recombined to
create holistic views and increased understanding of a process and related information. This leads to better process design and eventually supports the improvement
of processes. The evaluation phase benefits from the ability of mashups to aggregate any data that arose during the enactment of process instances and provide real
time reporting about process state and process performance. Since the use cases and
requirements on such reports are virtually unlimited, the user-centric ad-hoc aggregation of “whatever needs to be aggregated” [Hinchcliffe, 2006] can satisfy specific
needs.
Implementation of processes in an enterprise environment is largely supported
during configuration and enactment of business processes. In the configuration phase,
mashups can assume the position of a small-scale process execution engine. This
means that processes can be implemented as mashups due to their conceptual perception of aggregating services. Business process management demands strong governance along the business process life cycle, which renders support for processes
with a low degree of repetition inefficient. Such low-repetition processes embody situational needs and thus may be more efficiently supported by mashups. If a rapid
solution is needed, the supported process is relatively simple, and governance or efficiency of its enactment are not key performance measures, mashups are likely to
be beneficial. Thus, mashups can be perfect solutions for process prototypes or interim solutions. During enactment, mashups can represent activities that are rather
data-centric and interact with capabilities external to the process environment or with humans. Dashboards provide a huge improvement in efficiency as decision and information support for interaction with knowledge workers, providing overview of and insight into case handling. The dynamic character of dashboards also contributes to
flexible adaptation to changes in the business process.
Flexibility is greatly supported in all phases since mashups are ad-hoc aggregations. They constitute just-in-time solutions that do not require much work to be set
up and to be hosted in an organizational and technical environment. Changes in that
environment can be easily integrated into mashups, due to their ability to provide
lightweight and rapid solutions to specific needs.
The considerations in this chapter lead to the conclusion that the key benefit of
mashups in business process management can be seen in the aggregation of data
related to a process and its visual presentation that is based on the corresponding
process models. This enables process stakeholders themselves to create new perspectives on process models that include information relevant to particular needs.
Mashups leverage existing yet fragmented information and provide users that have
a stake in business processes with overview of and insight into that information. On
the other hand, mashups provide a large set of technical accomplishments to access resources of any type and make use of them. These technical foundations of leveraging
the Internet Operating System suggest benefit to business process management in
the future.
The example scenarios given outline the potential application of mashups and show that
the possibilities are by no means exhausted. In many cases the customers of business process management have very specific needs, based on their actual enterprise
IT architecture and the processes they employ. The wide adaptability of mashups
shows that they can satisfy a broad diversity of needs. While not all use cases can
be identified in advance, mashups provide the means to address and satisfy these
needs.
6 Enabling Collaborative Process Design with Mashups
This chapter comprises the documentation of a prototypical mashup application
that picks up the observations made and conclusions drawn in Sections 4 and 5 and demonstrates their practical realization and applicability. The provided mashup obtains and
aggregates fragmented knowledge about a process throughout the business process
life cycle and makes that information accessible through a visually rich and easily
understandable user interface.
As outlined in the previous chapter, the central knowledge asset of business process
management is the process model. The Business Process Technology group at the
Hasso Plattner Institute hosts an open source software project, called Oryx Editor (http://oryx-editor.org).
Oryx is a browser-based model editor, capable of designing many types of diagrams,
yet currently focusing on the support of process models via BPMN [OMG, 2008].
Additional to the browser-based model editor, Oryx comprises a model repository
that allows users to store and manage models, transform models into different formats, and perform diverse operations on them. Oryx’ modeling capabilities and free
access as Software as a Service make it a suitable provider for process models; it will thus be leveraged for the mashup prototype.
This chapter is, in contrast to the previous ones, rather technical, since it describes
the actual implementation of a mashup and a mashup framework. Understanding of
and experience in the development of Web applications are advantageous.
6.1 Analysis
Section 5.3.1 concluded that numerous information artifacts are yielded within the
different phases of the process life cycle originating from several stakeholders. These
information artifacts, in relation to processes also denoted as process knowledge,
are likely kept in different places within an organization. The utilization of Web
2.0 and open source information systems, such as wikis, facilitates the collection of
information out of the process stakeholders’ minds and is already well established,
according to [Bughin et al., 2008]. The wisdom of crowds (cf. Section 2.2.2) allows
individuals to enhance and correlate the fragmented knowledge and makes it more
valuable. Another promising source of information is issue tracking systems. Such
systems allow relating deficiency reports or feature requests to specific artifacts of
a product, and record them in a managed system. Issues such as incompatibilities
with existing systems during the configuration phase of a business process, necessary
improvements that become apparent within process enactment, or unsatisfactory
performance numbers emerging from process evaluation are potential requirements
that may be tracked in issue tracking systems in the context of business process
management.
Based on that assumption, a concrete scenario was derived, already introduced on
a more abstract level in Section 5.3.1: During the first phase of the process life
cycle—design and analysis—all relevant information that may have an influence on
the tasks within this phase must be considered. It is a rather daunting task to search
and inspect the fragmented process knowledge manually, and an automatic solution
that aggregates these information artifacts and provides them in a holistic way is
desired. The present prototype aims at connecting stakeholders through aggregating
and presenting shared knowledge, thus improving collaboration among them and
offering holistic insight into processes. Process models are leveraged as a vehicle for
the management and visualization of information related to processes and process
elements. This leads to the support of the following process relevant tasks through
the mashup prototype.
Requirements Management: Requirements in the form of requests or deficiency reports
can be related to processes and process elements.
Documentation: Any document that is made available through a URI, i.e. on the
Web (which may be a closed corporate network), can be related to processes and
process elements. Wikis are considered the primary source for documentation.
Process Modeling: Process modelers are provided with all relevant information by
the aggregation of process model, requirements, and documentation. This
includes redesign and improvement of process models as a result of evaluation.
Review: Collaborative workshops to explore and review processes are supported by
visually attaching information items to the process model. The mashup is
provided as a Web application and thus universally accessible.
The following capabilities were identified to be aggregated by the prototype. As
already described in Section 3.5.8, mashups suffer from the same origin policy imposed by Web browsers. This can be overcome by the employment of different means,
namely an AJAX proxy or JSONP. The problem and both workaround approaches
are discussed in detail in Section 2.3.3. The description of each capability will also
include a short discussion of the strategy to obtain information from it, giving proposals for its ingestion.
6.1.1 Process Model from Oryx
Oryx not only provides means to model process diagrams but also stores them
and offers them in different formats, such as RDF, PNG, or SVG. Models are identified by URIs. An extension to the process repository of Oryx, located on the
server side, further allows delivering models formatted in JSON and encapsulated in
a function call, to be loaded via JSONP.
The modeling component of the Oryx editor requires the ability to display SVG inline in the page's DOM and to handle valid XML, which is not provided by every Web browser.
Thus, a mashup API was developed as part of the Oryx project that is technically independent of the modeling environment and allows viewing models in all browsers
without any extensions. This API is called MOVI (an acronym for MOdel VIewer; https://bpt.hpi.uni-potsdam.de/Oryx/MOVI). MOVI retrieves a picture of a
model diagram in the PNG format as well as a JSON-formatted representation of
the model via JSONP and creates virtual process model elements completely as part
of the page’s DOM. This makes Oryx’ process models displayable in virtually all
popular browsers. MOVI’s features further allow for interaction of the user with the
process model, providing means to highlight model elements, and display arbitrary
content.
By these features, MOVI provides the ideal starting point to visually aggregate information with a process model and its elements while remaining accessible through most browsers. Therefore, it will form the functional basis of the mashup and, at the same time, its publication component.
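The JSONP mechanism used here can be sketched for a browser context as follows; the callback parameter name, the URL scheme, and the model URI are assumptions for illustration and do not reproduce the actual Oryx or MOVI interfaces.

```typescript
// Load a JSON representation of a process model via JSONP: the repository
// wraps the JSON in a call to the function named in the callback parameter,
// which sidesteps the same origin policy for read access.
function loadModelViaJsonp(modelUri: string, onLoaded: (model: unknown) => void): void {
  const callbackName = `modelLoaded_${Date.now()}`;
  const script = document.createElement("script");
  (window as any)[callbackName] = (model: unknown) => {
    delete (window as any)[callbackName]; // clean up the temporary global
    script.remove();
    onLoaded(model);
  };
  script.src = `${modelUri}/json?jsonp=${callbackName}`; // assumed URL scheme
  document.head.appendChild(script);
}

// Usage: react once the model representation arrives.
loadModelViaJsonp("http://oryx.example.org/models/123", (model) => {
  console.log("model loaded", model);
});
```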
6.1.2 Issues from an Issue Tracking System
Issue tracking systems are a common way in software development to track issues
that arise in relation to any development artifacts, i.e. software system components.
Issues are captured in the form of tickets that describe either an incident or a request and are directed to responsible persons to solve the issue. Such tickets allow
documentation of the resolution progress over time and thus, traceability of design
decisions. Issue tracking systems support the management of tickets within a project
through providing means to prioritize tickets and describe their severity and affected
artifacts. It is quite obvious that such a system could also be employed to keep track
of issues that arise with processes or activities in all phases of the business process
life cycle. This can be requirements and requests during process design as well as
incidents during process configuration and enactment.
The present case assumes exactly this scenario, leveraging the Trac (http://trac.edgewall.org/) open source issue tracking system to document issues with processes and process activities. Besides
its issue tracking capabilities, Trac provides several tools for lightweight, web-based
software project management. For issue tickets it offers powerful search functionality that takes an arbitrary set of conditions, according to ticket data, as input
and provides a configurable set of ticket-related information as a result. It also provides search results in the form of content syndication via RSS, but in that case the
returned information is limited to title, description, author and date. In order to
retrieve further information, the ticket itself needs to be inspected. This problem
is described by [Alba et al., 2008]: A Web API is provided in the form of a content syndication feed, but the information exposed through a human interface—the
website—provides more accurate data. Thus, screen scraping of the website was
chosen to gather information about related tickets.
Tickets are related to process elements through two ticket data fields: component
and keyword. The component of a ticket defines the artifact where it occurs; this
maps to the process identified via a URI in Oryx. Each process model element is
identified through a resource-id that is mapped to one of several keywords of a ticket.
This allows relating one ticket to several elements as well as several tickets to the
same element. Entering this data at the time of ticket creation is relatively simple
and supported by the mashup through explicitly presenting this data to the user.
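The mapping of tickets to model elements via the component and keyword fields could roughly be expressed as follows; the ticket structure shown is an assumption, since the actual fields are obtained by scraping the Trac website.

```typescript
// Simplified ticket as extracted from the issue tracker's pages.
interface Ticket {
  id: number;
  summary: string;
  component: string;  // maps to the process model URI in Oryx
  keywords: string[]; // each keyword maps to a model element's resource-id
}

// Group tickets by the model element they refer to, for one process model.
function ticketsByElement(tickets: Ticket[], modelUri: string): Map<string, Ticket[]> {
  const index = new Map<string, Ticket[]>();
  for (const ticket of tickets.filter((t) => t.component === modelUri)) {
    for (const resourceId of ticket.keywords) {
      const bucket = index.get(resourceId) ?? [];
      bucket.push(ticket);
      index.set(resourceId, bucket);
    }
  }
  return index;
}
```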
6.1.3 Documentation from a Wiki
A wiki is a hypertext-based content management system. Compared to traditional content management systems, wikis do not explicitly distinguish authors from readers, but rather serve as a central place to collaborate in content creation. Arguably, this epitomizes the Web 2.0 paradigm of participation more than anything else. Wikis gained
increasing significance as support for documentation in organizations, because they
enable collaboration and provide an easy to use interface. Wiki interfaces have a
remarkably low entry barrier opening it to a wide spectrum of users, even those that
are not technically savvy. According to [Bughin et al., 2008], wikis are already quite
established tools within companies.
28 http://trac.edgewall.org/
Process models created with the Oryx editor allow attaching URIs to model elements,
referencing external sources of information. Such information will be retrieved by
the mashup to enrich the process model with documentation. This information
is likely kept in a wiki, but it may also be stored in other systems that provide
access to content via a URI. For the prototype, process documentation will be stored
within a page of a TWiki29 installation and accessed by obtaining the HTML representation of that page.
6.2 Design
According to the capabilities aggregated, the prototype described here uses a hybrid
of both approaches to overcome the same origin policy and consume capabilities from
remote resources (cf. Section 2.3.3). MOVI leverages JSONP, which allows it to run completely independently of any server component. However, issues from the issue
tracker and documentation from the wiki pages must be accessed by a proxy server
component that is invoked through an AJAX request from the client. Consequently,
the mashup application itself consists of two tiers, illustrated in Figures 17 and 18:
a client tier that allows for interaction with the user and a server tier that provides
access to remote resources through a proxy. Corresponding to the conclusions about
courteous access of resources drawn in Section 3.5.8, this server tier can be used to employ a cache that shields capability providers from high traffic loads. Advancing this idea—performing operations on the obtained capabilities, such as data filtering and transformation, and caching those results—suggests elaborating on a minimalist mashup platform, discussed in the following.
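To illustrate the two access paths just described, a minimal client-side sketch is given below; the callback parameter name and the proxy URL are assumptions, not part of the actual MOVI or proxy interfaces.

    // JSONP path: MOVI functionality and model data are loaded via an injected script
    // tag, so no server component of the mashup is involved.
    function loadViaJsonp(url, callbackName, callback) {
        window[callbackName] = callback;                 // the remote script invokes this function
        var script = document.createElement("script");
        script.src = url + "?callback=" + callbackName;  // hypothetical callback parameter
        document.getElementsByTagName("head")[0].appendChild(script);
    }

    // Proxy path: Trac and wiki content are fetched through the server tier via AJAX.
    function loadViaProxy(remoteUrl, callback) {
        var request = new XMLHttpRequest();
        request.open("GET", "/proxy?url=" + encodeURIComponent(remoteUrl), true);  // hypothetical proxy resource
        request.onreadystatechange = function () {
            if (request.readyState === 4 && request.status === 200) {
                callback(request.responseText);
            }
        };
        request.send(null);
    }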
6.2.1 Mashup Platform
As the previous considerations showed, a platform that supports development, deployment, and execution of mashups in a hybrid scenario to access remote capabilities is expedient. Figure 17 depicts an architectural diagram of this platform. The platform allows for the execution of operations on a Web server. These operations are
encapsulated in components called filters, according to the pipes-and-filter pattern
that has repeatedly shown up in the context of mashups (cf. Section 4.3.2).
29 http://www.twiki.org/

Figure 17: Architecture of the Mashup Platform (FMC)

The design is based on the idea of composing filters according to the types introduced in Section 4.3. Filters for ingestion, augmentation, and publication bear resemblance among each other on a technical level, and it should make no difference whether they are executed on a server or on a client. Thus, the filter delivery and execution engine
of the mashup platform either loads filter code through HTTP and transmits it to
the client or executes this code on the server and delivers the filter results. The
results of each filter can be cached on the server, which allows for faster responses,
if cached data is fresh enough, and reduction of the impact on capability providers.
Filters are stored on the server and the platform allows the execution of these filters
in a controlled environment: the execution context that is created upon a client’s
request (cf. Section 4.1). Filters can be invoked through HTTP requests or by other
filters that are already running.
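A minimal sketch of the pipes-and-filters idea underlying the platform is given below; the filter names and signatures are hypothetical and only indicate how ingestion, augmentation, and publication filters could be chained, regardless of whether they run on the client or on the server.

    // Compose filters into a pipeline: each filter consumes the result of its predecessor.
    function composeFilters(filters) {
        return function (input) {
            var result = input;
            for (var i = 0; i < filters.length; i++) {
                result = filters[i](result);
            }
            return result;
        };
    }

    // Hypothetical filters, named after the three activity types of the mashup pattern.
    function ingestIssues(searchUrl) { return [];      /* fetch and normalize remote content */ }
    function augmentWithModel(issues) { return issues; /* connect issues to model elements   */ }
    function publishAnnotations(items) { return items; /* render annotations on the model    */ }

    var pipeline = composeFilters([ingestIssues, augmentWithModel, publishAnnotations]);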
The only programming language that has been considered suitable for the implementation of filters is JavaScript [ECMA, 1999], because it is universally supported by Web browsers. Thus, a Web server was needed that can run JavaScript as well. Among suitable JavaScript engines were Aptana Jaxer30, Mozilla Rhino31, and Mozilla Spidermonkey32. One of the potential Web server candidates that featured one of the above JavaScript engines was CouchDB33, a document-oriented database system that stores schema-less data in the form of JSON objects.

30 http://www.aptana.com/jaxer
31 http://www.mozilla.org/rhino/
32 http://www.mozilla.org/js/spidermonkey/
33 http://couchdb.apache.org

Among others, CouchDB offers the following features that contribute to the mashup platform: (a)
store and serve static files, (b) store structured or unstructured documents, and (c)
execute JavaScript code in Spidermonkey through an interface corresponding to the
REST architectural style (cf. Section 2.3.2) making it a Web application platform.
These features led to the design and implementation of the minimalist mashup platform described above, using (a) to store the mashup application, related files, and
the filter components. Feature (c) is leveraged to execute filters within CouchDB, accessing remote capabilities that would otherwise be inaccessible to the client; the composition of several filters at runtime is supported by that feature, too. Finally, feature (b) enables
caching of filter results by storing them as documents, if this is requested at the time
of filter invocation. Such caches provide temporal snapshots of filter results to be
returned while they are fresh, instead of running the filter repeatedly.
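The caching behavior could be sketched as follows against CouchDB's HTTP document interface (GET and PUT on a document URL); the database name, document layout, freshness check, and the use of synchronous requests are simplifying assumptions for illustration.

    // Return a cached filter result if it is fresh enough, otherwise run the filter and
    // store its result as a CouchDB document under the filter's identifier.
    function cachedFilterResult(filterId, maxAgeMs, runFilter) {
        var url = "/mashup_cache/" + encodeURIComponent(filterId);  // hypothetical cache database
        var get = new XMLHttpRequest();
        get.open("GET", url, false);
        get.send(null);
        var revision = null;
        if (get.status === 200) {
            var doc = JSON.parse(get.responseText);
            revision = doc._rev;                                    // required to update an existing document
            if (new Date().getTime() - doc.cachedAt < maxAgeMs) {
                return doc.result;                                  // temporal snapshot is still fresh
            }
        }
        var result = runFilter();                                   // access the remote capability
        var put = new XMLHttpRequest();
        put.open("PUT", url, false);
        put.setRequestHeader("Content-Type", "application/json");
        put.send(JSON.stringify({ _rev: revision || undefined, cachedAt: new Date().getTime(), result: result }));
        return result;
    }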
6.2.2 Mashup Architecture
Figure 18 illustrates the architecture of the mashup prototype and its components
that run within the mashup platform. As the diagram shows, the prototype consists of several components, which are executed either on the
server or the client, as described above. Each component refers to an activity of the
mashup pattern, discussed in Section 4.3. The architecture itself is an instantiation
of the mashup reference model, described in Section 4.4: Each component consumes
data, and delivers it to its succeeding component, except the publication component
that delivers a representation to the user. These components and their functions are
explained briefly, according to the mashup pattern activity they belong to.
Ingestion. The MOVI API supplies its own ingestion layer executed completely
in the browser, implementing JSONP to load functionality and data on demand.
Besides loading the model in the form of a JSON representation and a picture for display, MOVI offers the process model as a data structure that allows for inspection and corresponding processing of the process model.
The content of wiki pages is formatted in HTML and needs to be obtained through
the server-side ingestion component that is requested by the browser application
through an AJAX request. The wiki ingestion component loads the HTML representation of a given URI. If this URI specifies a fragment (identified through the #
character), the content will be reduced to the DOM node identified by that fragment.
Otherwise it will be reduced to the content of the HTML body. Finally, the result of the ingestion component is returned as a string containing an HTML fragment that can be displayed somewhere in the mashup.

Figure 18: Architecture of the Mashup Prototype (FMC)
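As an illustration of the fragment handling just described, a deliberately simplistic, regular-expression-based sketch is given below; the function name and the assumption that the fragment identifies an element by its id attribute are not taken from the actual implementation.

    // Reduce a fetched HTML page to the element named by the URI fragment, or to the
    // content of the HTML body if no fragment is given or the element cannot be found.
    function reduceToFragment(html, uri) {
        var hashIndex = uri.indexOf("#");
        if (hashIndex >= 0) {
            var fragmentId = uri.substring(hashIndex + 1);
            // Naive pattern: an element whose id attribute equals the fragment (no nesting support).
            var pattern = new RegExp('<([a-zA-Z0-9]+)[^>]*\\bid="' + fragmentId + '"[^>]*>([\\s\\S]*?)</\\1>');
            var match = html.match(pattern);
            if (match) {
                return match[2];
            }
        }
        var body = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
        return body ? body[1] : html;
    }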
The aforementioned flaw in the RSS-based Web API of Trac (cf. Section 6.1.2) entails the employment of screen scraping to search and retrieve issues that are related
to a specific process model. This is realized through a set of regular expressions and
iteration over the search website’s HTML source. Application of an XML parser was
not possible, because the description of a ticket may contain HTML, which may not conform to XML rules and eventually break the parser. Again, the information cannot be accessed via JSONP and thus needs to be retrieved through an
AJAX request forwarded by a proxy. The ingestion filter located on the mashup platform performs the screen scraping and transforms the issues into a JSON collection.
This data transformation is the very data normalization mentioned in Section 4.3.1.
This task consumes some time, as well as processing resources both on the Trac
server, for searching related tickets in a database, and the mashup ingestion component, for screen scraping and data normalization. High load is intercepted through
the employment of a cache that temporarily stores the scraping results, offering courteous access to capabilities, as discussed above and in Section 3.5.8.
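A sketch of the scraping and normalization step is shown below; the row pattern and the selected ticket attributes are assumptions about the structure of Trac's search result page rather than its actual markup.

    // Scrape ticket rows from the search result page and normalize them into JSON objects.
    function scrapeTickets(html) {
        // Assumed layout: one table row per ticket with id, summary, severity, and keywords cells.
        var rowPattern = /<tr[^>]*>\s*<td[^>]*>#(\d+)<\/td>\s*<td[^>]*>([\s\S]*?)<\/td>\s*<td[^>]*>([\s\S]*?)<\/td>\s*<td[^>]*>([\s\S]*?)<\/td>\s*<\/tr>/g;
        var tickets = [];
        var match;
        while ((match = rowPattern.exec(html)) !== null) {
            tickets.push({
                id: parseInt(match[1], 10),
                summary: match[2],
                severity: match[3],
                keywords: match[4].split(/[\s,]+/)   // element resource-ids, as described in Section 6.1.2
            });
        }
        return tickets;
    }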
Augmentation. Aggregating the process model, related issues from Trac, and documentation from wiki pages is implemented through iteration over the process model
elements accessible via the MOVI API. Each model in Oryx is identified by a URI.
This very URI is used within one of the issue properties (component) to relate the
issue to a model. Another property (keywords) is used to relate an issue to a specific
model element by the use of the identifier of that element. While the model URI is
used to search for issues of a model (described above), issues and model elements are
connected by comparing the element identifiers stored in entities of both types.
Model elements can contain links to resources on the Internet (refuri), which are considered documentation capabilities. Thus, each model element can refer to its own
documentation that is then loaded through the specified link by the wiki ingestion
component.
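The matching described above might look like the following sketch; the property names resourceId and refuri as well as the keyword layout are assumptions standing in for the actual MOVI API and the normalized ticket data.

    // Connect issues to model elements by comparing element identifiers with ticket keywords,
    // and remember the documentation link (refuri) of each element for the wiki ingestion.
    function augmentModel(modelElements, tickets) {
        return modelElements.map(function (element) {
            var relatedIssues = tickets.filter(function (ticket) {
                return ticket.keywords.indexOf(element.resourceId) >= 0;
            });
            return {
                element: element,
                issues: relatedIssues,
                documentationUri: element.refuri || null   // loaded later by the wiki ingestion component
            };
        });
    }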
Publication. The publication of the mashup prototype leverages the mapping metaphor, i.e. displaying landmarks in close proximity to their geographic locations on a map. For the present mashup, however, the geographic map is replaced with
a process map—the process model—and landmarks are exchanged with issue and
documentation information displayed close to their according model element.
While augmentation and publication are distinct in their conceptual nature, publication actions are conducted concurrently with augmentation. All information is presented on the picture of the process model, supplied by MOVI. After model elements, issues, and documentation are connected during augmentation, the publication layer creates annotations that comprise information about issues. A green, orange, or yellow circle that is connected to a model element represents the maximum severity and overall number of issues of that element. To discover more information, a tooltip that contains the list of all issues and documentation is displayed upon clicking on a model element. This tooltip is displayed close to the original model.
An additional list shows all issues for that model, including those that are not related
to a specific element. If an issue of that list is related to an element, clicking on it
highlights this element on the process model picture and shows the tooltip.
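The derivation of an annotation from the issues of an element could be sketched as follows; the severity values and their mapping to the three circle colors are assumptions, since the concrete thresholds are not specified here.

    // Determine the circle color and issue count for one model element's annotation.
    function annotationFor(issues) {
        var rank = { minor: 1, major: 2, critical: 3 };   // assumed severity ordering
        var maxRank = 0;
        for (var i = 0; i < issues.length; i++) {
            var r = rank[issues[i].severity] || 0;
            if (r > maxRank) { maxRank = r; }
        }
        // Assumed mapping of the maximum severity to the three colors mentioned above.
        var color = maxRank >= 3 ? "yellow" : (maxRank === 2 ? "orange" : "green");
        return { color: color, issueCount: issues.length };
    }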
All links contained in the representations of the capabilities, i.e. the model URI, documentation/wiki page URIs, and issue URIs, are kept valid. They are incorporated into the mashup's presentation, offering the capacity to navigate to the original
capabilities, make changes, or get more information. Thus, the mashup works also
as a portal to comprehensive collaboration, guiding users through the information
system landscape of an organization.
6.3 Realization
The resulting application is depicted in Figure 19. The screenshot of the mashup
shows the annotations displaying the highest severity and overall number of issues
per model element as well as the tooltip containing the list of issues for the currently
selected model element. The list of all issues is located on the right.
Figure 19: Demo of the Mashup Prototype
According to the mashup types identified in Section 3.4, the prototype can be considered an organic mashup. It was developed manually, because no existing tools
provided adequate support to aggregate the very specific capabilities needed for the
scenario. Ingestion required sophisticated means, because the information systems
did not provide very elaborate Web APIs to access content. Yet this demonstrates the ubiquitous availability of capabilities on the Web. A standardized and
established data format and schema for Web APIs would be desirable, as pointed
out in Section 3.5.3. Capability providers attempt to satisfy this desire by providing
information by means of content syndication.
The prototype also features the common characteristics of mashups that were synthesized in Section 3.5. The application addresses the specific needs of process designers
that need a brief yet complete overview of relevant information tailored to a process
model and is provided as Software as a Service, offered on the Web and running in a Web browser. The mashup aggregates heterogeneous content: MOVI
functionality to display and interact with the picture representation of a process
model, HTML representations of related documentation, and remote search capability to obtain HTML representations of process related issues. Although MOVI
provides comprehensive functional features, it focuses on the presentation of the
process model and is, in combination with the mashup application, rather data centric. The overall scale of aggregated information is relatively small, determined by
the size of the model and the number of issues.
The implementation of the mashup exposed two very typical qualities of mashups.
The first is lack of governance (cf. Section 3.5.8). This topic has been frequently
discussed in this work, and had considerable influence on the prototype. Since no
means for federated login were considered adequate, all information aggregated by
the mashup has to be freely available. This turned out to be acceptable, because
the mashup did not require any authorization to modify any obtained information
and read-only access rights were sufficient. The second quality is the short time
to market. Excluding the design and implementation of the mashup platform and
extensions to MOVI, i.e. annotations for issues and tooltip visualizations, it took
less than a week to implement the mashup itself.
7 Conclusion and Outlook
The last chapter of this thesis summarizes and concludes the observations and findings of the work and gives an outlook on future topics that are considered relevant in the context of mashups and business process management. However, an
assessment of mashups is provided first that puts this new and exciting genre of Web
applications into relation with other software systems: The long tail is a frequently
cited metaphor in connection with mashups [Hoyer et al., 2008].
7.1 The Long Tail
Figure 20 depicts the continuum of software that is employed in enterprises. Located
at the left end of this spectrum is software that addresses strategic goals and thus the
core business of an organization. Applications and systems at that end of the continuum are used by many persons and driven by few. Building these systems involves a well engineered development process and is a comparatively long term project. This is because they are large scale systems, supporting many people and managing large amounts of data and functionality. Due to their nature, these systems have
strong demands for governance, such as security, reliability, availability, and performance that exceed the demands for flexibility by far. This is where large scale and
complex, expensive IT systems are located. Among these are customer relationship
management, enterprise resource planning, and supply chain management, as well as
service-oriented architectures that provide an implementation platform for business
processes.
On the right end of the spectrum are more opportunistic applications that focus on
day to day problems of users, and satisfy the needs of individuals. Such applications
do not handle large scale data and have no or very low demands on governance and
quality and rather emphasize rapid, lightweight development and flexibility. A typical example is the “spreadsheet keynote” that uses some tables of a spreadsheet to
derive statistics and present some diagrams. Due to their simplistic nature, such applications are easy to build by technically-savvy people, usually through assembling
existing components. They are driven by many, but provided to few.
The spectrum is not discrete; there is no specified point where the first type ends and the second type begins. Systems and applications are not even fixed in place on the spectrum. Maturing prototypes, for instance, move from early and rapidly assembled mockups, located on the right side, increasingly to the left.

Figure 20: Long Tail in the Spectrum of Software Systems (vertical axis: #users / cost / value per application; horizontal axis: # of applications, from head to tail)

Systems that are located along the curve typically feature specific characteristics: the further right an application moves on the curve, the less involvement of the IT department and formal development methods it embraces, thus lowering costs for
their development. On the other hand, the value of the single application is highest
on the left side of the spectrum which comes along with high risk to develop such a
system—the opposite is the case at the right end. The area below the curve represents the value of systems and applications to the organization that possesses them,
either as the driver of or as support for the organization’s activities.
The spectrum depicts the long tail, a term that was coined by Chris Anderson
[Anderson, 2004]. He published an article that described an entirely new economic
model for the media and entertainment industry: With the evolution of online distribution many influential factors of traditional distribution disappeared. Among them
are limitation of resources, such as storage space, sparse demand within geographical
regions, and production costs for mediums that carried the content, such as CDs and
DVDs. It has become profitable not only to sell recent mainstream hits but also alternatives and misses, because the cost to provide them became extremely low. The
area below the curve represents the revenue that is made through selling content,
hits are located on the left, few works that are consumed by many, and misses on the
right, many works that appeal to only a few. The sheer amount of the less demanded
products outweighs the few hits and can even create more revenue [Anderson, 2004].
This narrow shape that promises so much value is the long tail.
Similar to goods retailers, IT departments did not sufficiently address applications
that would not generate a high demand, because the development cost of applications
with low demand and a potentially short life time was higher than the revenue these
applications could provide. The evolution of Web 2.0 supplied individuals with the
tools to create situation specific and visually appealing applications drawing from
virtually unlimited resources of data and knowledge on the Web, e.g. through wikis
and blogs. This is where mashups emerged: niche products that became profitable
and made it possible to extend the spectrum of corporate applications towards the long tail.
Mashups offer lightweight solutions to existing problems that were either neglected
or addressed inappropriately in time and quality, before. They make it easy for
domain experts to build working software that addresses their particular needs. This
flexibility allows addressing immediate needs and provides realistic options to create
cheaper solutions for an organization. Due to their low cost, the associated risks are low, too.
Along the long tail, one can even attempt to differentiate mashups by their type.
Organic mashups are typically less customizable, since they address the needs of a
narrow group of users rather than those of individuals, which suggests locating them
fairly far to the left on the long tail. Dashboards, on the other hand, are extremely customizable;
in fact it is the individual user who decides which capabilities are aggregated. The
ratio of developers to users is practically one, locating dashboards at the very right
end of the long tail.
Mashups are not stuck to the long tail. As already mentioned in Section 2.2.3, if demand increases, they may shift more and more to the left,
while still remaining mashups. HousingMaps is a very successful example that did
not only gain substantial popularity, but was also the motive for Google to actually open their mapping capabilities and provide them as a Web API. On the other hand,
demand may decrease, and mashup applications outlive their usefulness and are disposed of. This does not impede their creation, since they were cheap to create in
the first place.
The long tail is a valuable metaphor to compare software systems. It shows that
mashups are applications that emphasize the satisfaction of immediate needs, rather
than providing complete software suites for holistic business use cases. However,
mashups can outgrow the state of a situational application and evolve into a more general
application that serves many users, moving along the spectrum in the direction of
the left end. The evolution of mashups will be largely influenced by the demand for
situational applications in the future and their evolution along the long tail.
7.2 Conclusion
The work presented in this thesis thoroughly elaborated mashups to understand
their basic concepts and value. The insight gained formed the basis of an analysis of the ways in which mashups can be applied in business process management and the value they propose in different scenarios.
Mashups arose from situational needs to combine capabilities on the Internet, reminiscent of recycling existing resources, rather than creating new applications from
scratch. Although mashups became increasingly popular just recently, they are perceived as a phenomenon of assembling applications out of pieces spread over the
Internet. Thus, mashups are a genre of applications, rather than a specific architecture or technology, enjoying broad interest, yet lacking concise definition. Due to
their relatively young existence, such a definition would probably do a disservice to
their future evolution and impose inapt boundaries on the flourishing of mashups.
After a short introduction into the history of mashups and business process management (cf. Section 2), the first part of this thesis has been dedicated to building an advanced
understanding of the mashup genre that comprises a broad variety of many manually
built and tool-supported mashups, as well as tools themselves. Based on a survey
among several successful mashup applications and tools, Section 3 identified two
main types of mashup applications, organic mashups and dashboards, along with a
set of common properties that are characteristic for mashups: user centricity, small
scale, open standards, software as a service, short time to market, aggregation of
heterogeneous content, data centricity, and lack of governance.
The observations and conclusions gained in that survey and the synthesis of its results fed into the elaboration of an operational pattern common to mashups in
general, explained in Section 4.3. This pattern describes the conceptual workflow of
a mashup that consists of activities corresponding to one of three types: ingestion,
augmentation, and publication. The pattern blends with the environmental setting
of mashups, described by the mashup ecosystem, in Section 4.1. In order to support
mashup design and reengineering according to this pattern, an elementary reference
model was derived that captures mashup operations in entities and discusses relationships among those, presented in Section 4.4. All mashups that were reviewed in
the survey (cf. Section 3.1) fit into this reference model. The acquired insight into
mashups explains their general value proposition, fostering innovation through unlocking and recombining capabilities in new and unanticipated ways. Through radical
application composition of existing pieces obtained from external sources, mashups
increase agility and reduce development costs: Immediate needs can be satisfied
with relatively small effort and by reusing the value of existing assets. Aggregation
of data, spread among several systems, allows connecting related information and
enables to quickly uncover business insights.
The second part of this thesis is devoted to examining this rather general value proposition for its particular applicability in business process management with respect
to the main goals of the latter (cf. Section 5.1). By means of the business process
life cycle, the suitability of mashups to support tasks as part of business process
management has been elaborated and explained by several potential mashup solutions in Section 5.3. Among these are process modeling support through aggregating
knowledge that emerges from other stages of the business process life cycle, and the
support of process participants through dashboards in the context of case handling
and decision support. As a result of this examination, mashups suggest most value in
aggregating process knowledge that is fragmented among several places, and presenting it visually, providing insight and understanding of process related information.
Section 6 proved the value mashups propose to business process management by
means of a prototypical mashup application. Drawing from the general knowledge
about mashups gained in the first part of this thesis, the proof of concept provides process modeling support addressing one potential scenario elaborated in Section 5.3.1: The application aggregates model related information, documentation
and requirements, with the process model itself. While careful and exhaustive tests
addressing usability and real world effect of the prototype were not feasible within
the scope of this work, feedback to the proof of concept was invariably positive and
suggests future research in this direction.
In organizations that embrace large scale information and process systems it is critical
to bridge interaction between these systems and their users. Missing communication
between developers and knowledge workers results in a lack of insight and weakened
support for information exploration and presentation. Mashups can be perceived as
the “last mile” of software applications carrying information from centralized, high
capacity systems to many diverse end-points where this information is ultimately
used. Since mashups are a fairly new trend in software development, their future
is still uncertain. At a first glance, the well established world of highly governed
information and process management systems of business process management seems
to contradict the revolutionary, agile, and fuzzy approach of mashups. However,
I believe that those two software concepts can coexist and even complement each
other, with mashups putting a face on the IT landscape of organizations according
to the needs of the users of these systems.
7.3 Future Work
Mashups can largely be considered a discipline created by autodidacts and, due to their recent popularity, open a new field for software engineering. As pointed out
in Section 1.2, software companies have been adopting the topic quite ambitiously
and provided a lot of technical proposals for mashups and mashup tools. Still, many
questions remain open and obstacles unresolved. Some of them are covered in the
following.
7.3.1 Governance in Mashups
Introduced in Section 3.5.8 and raised several times within this work, governance
is one of the most pressing issues of mashups, currently. In order to increase
the application of mashups, the lack of governance must be addressed. Successful mashups will demand trustworthy solutions to access private or corporate data and ensure responsible usage. Most mashups still do not support login to remote sources. Ongoing work is conducted by the OpenID [OIDF, 2007] and OAuth [OAuth Core Workgroup, 2005] initiatives, but a pleasant and simple solution that resolves these issues is yet to be developed before users and enterprises will put their trust in
mashups. [Phifer, 2008] aptly discusses the need for governance as well as the need
for a certain degree of freedom: “Mashups Demand Governance (But Not Too Much
Governance)”.
Along with access to trusted sources goes the retrieval and execution of remote functionality in a trusted environment. The mashup prototype, presented in Section 6,
obtains functionality (MOVI) from a server and executes it locally. The problem
with this approach is that executing code from untrusted capabilities in a trustworthy environment may lead to security breaches and disclosure of confidential
information, e.g. through identity spoofing. [Isaacs and Manolescu, 2008] propose a
promising approach to disarm untrusted code, by running it in an artificial sandbox
isolating the code from its native execution environment.
Future improvements of governance considering security as well as ensuring a certain
level of quality among Web APIs will have significant influence on the evolution of
mashups, their adoption in the industry, and their eventual success.
7.3.2 Schema Standardization and Semantic Web
[Ankolekar et al., 2007] argue that Web 2.0 and the Semantic Web are two distinct
yet complementary approaches for the future evolution of the Web. In order to
succeed, each one must draw from the other's strengths. This applies especially to the field
of mashups.
As mentioned in Section 3.5.3, data standardization has considerable impact on
simplicity and reusability of content and functionality on the Web. During
design and realization of the mashup prototype, it became apparent that the different capabilities required carefully crafted ingestion components that adjusted to
the particular document structure of the representations. While content syndication
provides adequate means to application neutral data serialization, it lacks semantic
expressiveness—a deficiency that could be complemented, and thus solved, by the application of the Resource Description Framework (RDF) [W3C, 2004]. While RDF
provides a well defined set of expressive markup, it retains application neutrality
and allows for custom extension. Data transformation could be performed automatically (during ingestion and publication), due to the standardized vocabulary used by
capabilities, mashups, and consumers. Augmentation could be expressed via declarations in RDF, or implemented as custom functions that leverages the well-defined
document structure [Morbidoni et al., 2007]. Its ability to transitively conduct data
transformations supports the assumption of simply building mashups by plugging together
existing pieces that comprise compatible interfaces.
RDF is XML-based and perfectly suitable to perform structural and especially semantic transformations and operations on other XML-based documents. XML-based
exchange formats for business process models are under current research and development, the XML Process Definition Language [WfMC, 2008] being one of them.
Such formats are designed to not only express diagram structure and visualization,
but also to comprise the semantics of the model and to enable its enactment within workflow
management systems. This and the capabilities of RDF would allow for tighter and
more meaningful integration of process models in mashups, while decoupling mashup
applications from specific or proprietary data formats.
The drawback of RDF that comes with its comprehensive and application independent expressiveness is that it requires a certain level of knowledge and experience to
apply its features effectively. Unfortunately, this raises the entry barrier for developers and users. Its suitability for mashups is therefore debatable. Future research on
that topic needs to compare the gained value with the increased complexity while
considering different application scenarios.
7.3.3 Business Process Management
To my knowledge, this thesis is the first academic work that examined the value
proposition of mashups for business process management. The work demonstrated
the general value mashups offer as well as the practical feasibility to realize this
value for business process management. In particular, it demonstrated how process
related information can be aggregated with process models to gain insight about
fragmented process knowledge in the design phase and monitor or analyze process
instances during evaluation.
Further work in that topic needs to take these particular approaches a step further
and examine their advantage by a thorough analysis of productive operation. Additionally, more work needs to be conducted elaborating on other scenarios pointed
out in Section 5.3.
Recently, research work has been conducted that attempts to combine the benefits of
REST and distributed hypermedia systems with business process management. As
outlined in Section 5.3.2, capabilities and Web APIs, in particular, are likely to influence future application design, also in the context of business process management.
The efforts of organizations to offer capabilities on the Web may lead to business
process management systems that are executed completely on top of the “Internet
Operating System” [O’Reilly, 2005].
Based on recent research in the field of case handling and its effects on business
process design, the automatic generation of mashups can be imagined, as indicated in
Section 5.3.3. Such generation would take a case description as input and derive
a tentative dashboard configuration that contained the information and controls
needed to complete a case. Subsequently, user interface specialists or even the affected knowledge workers—the experts for particular cases—could adapt these dashboards according to their individual needs. Such a solution would need to provide
a mashup infrastructure and pre-manufactured components that access the organization's information systems, process the obtained information, and offer results in various forms to the users,
corresponding to the suggestions of the mashup reference model in Section 4.4.
References
[A9.com, Inc., 2007] A9.com, Inc. (2007). Open Search 1.1 Specification. http:
//www.opensearch.org/Specifications/OpenSearch/1.1.
[Abiteboul et al., 2008] Abiteboul, S., Greenshpan, O., and Milo, T. (2008). Modeling the Mashup Space. In WIDM ’08: Proceeding of the 10th ACM workshop on
Web information and data management, pages 87–94, New York, NY, USA. ACM.
[Alba et al., 2008] Alba, A., Bhagwan, V., and Grandison, T. (2008). Accessing the
Deep Web: When Good Ideas Go Bad. In OOPSLA Companion ’08: Companion
to the 23rd ACM SIGPLAN conference on Object oriented programming systems
languages and applications, pages 815–818, New York, NY, USA. ACM.
[Amer-Yahia et al., 2008] Amer-Yahia, S., Markl, V., Halevy, A., Doan, A., Alonso,
G., Kossmann, D., and Weikum, G. (2008). Databases and Web 2.0 panel at
VLDB 2007. In Proceedings of SIGMOD 2008, volume 37 of Lecture Notes in
Computer Science, pages 49–52, Vancouver, Canada. ACM.
[Anderson, 2004] Anderson, C. (2004). The Long Tail. http://www.wired.com/
wired/archive/12.10/tail.html.
[Ankolekar et al., 2007] Ankolekar, A., Krötzsch, M., Tran, T., and Vrandecic, D.
(2007). The Two Cultures: Mashing up Web 2.0 and the Semantic Web. In
WWW ’07: Proceedings of the 16th international conference on World Wide Web,
pages 825–834, New York, NY, USA. ACM.
[Bellas, 2004] Bellas, F. (2004). Standards for Second-Generation Portals. IEEE
Internet Computing, 8(2):54–60.
[Berners-Lee, 1989] Berners-Lee, T. (1989). Information Management: A Proposal.
http://www.w3.org/History/1989/proposal.html.
[Berners-Lee, 1996] Berners-Lee, T. (1996). WWW: Past, present, and future. Computer, 29(10):69–77.
[Berners-Lee et al., 1998] Berners-Lee, T., Fielding, R., and Masinter, L. (1998).
Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396 (Draft Standard).
Obsoleted by RFC 3986, updated by RFC 2732.
[Berners-Lee et al., 2005] Berners-Lee, T., Fielding, R., and Masinter, L. (2005).
Uniform Resource Identifier (URI): Generic Syntax. RFC 3986 (Standard).
[Berners-Lee et al., 1994] Berners-Lee, T., Masinter, L., and McCahill, M. (1994).
Uniform Resource Locators (URL). RFC 1738 (Proposed Standard). Obsoleted
by RFCs 4248, 4266, updated by RFCs 1808, 2368, 2396, 3986.
[Bradley, 2007] Bradley, A. (2007). Reference Architecture for Enterprise ’Mashups’.
Technical report, Gartner Research.
[Bughin et al., 2008] Bughin, J., Manyika, J., and Miller, A. (2008). Building the
Web 2.0 Enterprise: McKinsey Global Survey Results. The McKinsey Quarterly.
[Casati, 2007] Casati, F. (2007). Business Process Mashups? Process Management
and the Web Growing Together. In WETICE ’07: Proceedings of the 16th IEEE
International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, page 5, Washington, DC, USA. IEEE Computer Society.
[Clarkin and Holmes, 2007] Clarkin, L. and Holmes, J. (2007). Enterprise Mashups.
The Architecture Journal, 13:24–28.
[Crockford, 2006] Crockford, D. (2006). The application/json Media Type for
JavaScript Object Notation (JSON). RFC 4627 (Informational).
[Crupi and Warner, 2008a] Crupi, J. and Warner, C. (2008a). Enterprise Mashups
Part I: Bringing SOA to the People. SOA Magazine, 18.
[Crupi and Warner, 2008b] Crupi, J. and Warner, C. (2008b). Enterprise Mashups
Part II: Why SOA Architects Should Care. SOA Magazine, 21.
[Decker, 2008] Decker, G. (2008). BPM Offensive Berlin: BPMN Stakeholder. http:
//bpmb.de/index.php/BPMN-Stakeholder.
[ECMA, 1999] ECMA (1999). ECMAScript Language Specification (JavaScript),
3rd Edition. http://www.ecma-international.org/publications/standards/
Ecma-262.htm.
[Fielding et al., 1999] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
Leach, P., and Berners-Lee, T. (1999). Hypertext Transfer Protocol – HTTP/1.1.
RFC 2616 (Draft Standard). Updated by RFC 2817.
[Fielding, 2000] Fielding, R. T. (2000). Architectural Styles and the Design of
Network-based Software Architectures. PhD thesis, University of California, Irvine.
[Fielding et al., 2002] Fielding, R. T. and Taylor, R. N. (2002). Principled Design of the Modern Web Architecture. ACM Transactions on Internet
Technology, 2:115–150.
[Garrett, 2005] Garrett, J. J. (2005). Ajax: A New Approach to Web Applications.
http://www.adaptivepath.com/ideas/essays/archives/000385.php.
[Gregorio and de hOra, 2007] Gregorio, J. and de hOra, B. (2007). The Atom Publishing Protocol. RFC 5023 (Proposed Standard).
[Gurram et al., 2008] Gurram, R., Mo, B., and Güldemeister, R. (2008). A Web
based Mashup Platform for Enterprise 2.0. Technical report, SAP Labs, LLC.
[Hinchcliffe, 2006] Hinchcliffe, D. (2006). Is IBM making enterprise mashups respectable? http://blogs.zdnet.com/Hinchcliffe/?p=49.
[Hinchcliffe, 2007] Hinchcliffe, D. (2007). Mashups: The next major new software
development model? http://blogs.zdnet.com/Hinchcliffe/?p=106.
[Hinchcliffe, 2008] Hinchcliffe, D. (2008). The WOA story emerges as better outcomes for SOA. http://blogs.zdnet.com/Hinchcliffe/?p=213.
[Hof, 2005] Hof, R. (2005). Mix, Match, And Mutate: "Mash-ups" – homespun combinations of mainstream services – are altering the Net. http://www.
businessweek.com/magazine/content/05_30/b3944108_mz063.htm.
[Hoffman, 2007] Hoffman, B. (2007). Ajax Security. Addison-Wesley Professional,
Reading.
[Hohpe and Woolf, 2003] Hohpe, G. and Woolf, B. (2003). Enterprise Integration
Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[Hoyer and Fischer, 2008] Hoyer, V. and Fischer, M. (2008). Market Overview of
Enterprise Mashup Tools. In Bouguettaya, A., Krüger, I., and Margaria, T.,
editors, ICSOC, volume 5364 of Lecture Notes in Computer Science, pages 708–
721.
[Hoyer et al., 2008] Hoyer, V., Stanoevska-Slabeva, K., Janner, T., and Schroth, C.
(2008). Enterprise Mashups: Design Principles towards the Long Tail of User
Needs. In SCC ’08: Proceedings of the 2008 IEEE International Conference on
Services Computing, pages 601–602, Washington, DC, USA. IEEE Computer Society.
[IBM Corporation, 2008] IBM Corporation (2008). Why Mashups Matter. In 28th
DNUG Conference, June 2008.
[Isaacs and Manolescu, 2008] Isaacs, S. and Manolescu, D. (2008). Microsoft Live
Labs: Web Sandbox. In PDC2008: Professional Developers Conference.
[Jackson and Wang, 2007] Jackson, C. and Wang, H. J. (2007). Subspace: Secure
Cross-domain Communication for Web Mashups. In WWW ’07: Proceedings of
the 16th international conference on World Wide Web, pages 611–620, New York,
NY, USA. ACM.
[Janiesch et al., 2008] Janiesch, C., Fleischmann, K., and Dreiling, A. (2008). Extending Services Delivery with Lightweight Composition. In WISE ’08: Proceedings of the 2008 international workshops on Web Information Systems Engineering,
pages 162–171, Berlin, Heidelberg. Springer-Verlag.
[Jarrar and Dikaiakos, 2008] Jarrar, M. and Dikaiakos, M. D. (2008). MashQL: a
query-by-diagram topping SPARQL. In ONISW ’08: Proceeding of the 2nd international workshop on Ontologies and Information Systems for the Semantic Web,
pages 89–96, New York, NY, USA. ACM.
[Jhingran, 2006] Jhingran, A. (2006). Enterprise Information Mashups: Integrating Information, Simply. In VLDB’2006: Proceedings of the 32nd international
conference on Very large data bases, pages 3–4. VLDB Endowment.
[Kayed and Shalaan, 2006] Kayed, M. and Shalaan, K. F. (2006). A Survey of
Web Information Extraction Systems. IEEE Trans. on Knowl. and Data Eng.,
18(10):1411–1428.
[Keukelaere et al., 2008] Keukelaere, F. D., Bhola, S., Steiner, M., Chari, S., and
Yoshihama, S. (2008). SMash: Secure Component Model for Cross-Domain Mashups on Unmodified Browsers. In WWW ’08: Proceeding of the 17th international
conference on World Wide Web, pages 535–544, New York, NY, USA. ACM.
[Knöpfel et al., 2006] Knöpfel, A., Gröne, B., and Tabeling, P. (2006). Fundamental
Modeling Concepts: Effective Communication of IT Systems. John Wiley & Sons.
[López et al., 2008] López, J., Pan, A., Ballas, F., and Montoto, P. (2008). Towards
a Reference Architecture for Enterprise Mashups. In Actas del Taller de Trabajo
ZOCO’08/JISBD. Integración de Aplicaciones Web: XIII Jornadas de Ingeniería
del Software y Bases de Datos. Gijón, 7 al 10 de Octubre de 2008, pages 67–76.
[Merrill, 2006] Merrill, D. (2006). Mashups: The new breed of Web app. http:
//www.ibm.com/developerworks/xml/library/x-mashups.html.
[Morbidoni et al., 2007] Morbidoni, C., Polleres, A., Tummarello, G., and Phuoc,
D. L. (2007). Semantic Web Pipes. Technical Report DERI-TR-2007-11-07, DERI
Galway, IDA Business Park, Lower Dangan, Galway, Ireland.
[Nottingham and Sayre, 2005] Nottingham, M. and Sayre, R. (2005). The Atom
Syndication Format. RFC 4287 (Proposed Standard).
[Novak and Voigt, 2006] Novak, J. and Voigt, B. (2006). Mashing-up Mashups:
From Collaborative Mapping to Community Innovation Toolkits. In MCIS 06 - Mediterranean Conference on Information Systems, Venice, October 05-08, 2006.
[OAuth Core Workgroup, 2005] OAuth Core Workgroup (2005). OAuth Core, Version 1.0. http://oauth.net/core/1.0/.
[OGC, 2008] OGC (2008). KML, Version 2.2. http://www.opengeospatial.org/
standards/kml/.
[OIDF, 2007] OIDF (2007). OpenID Specifications. http://openid.net/developers/specs/.
[OMG, 2005] OMG (2005). UML Specification, Version 2.0. http://www.omg.org/
spec/UML/.
[OMG, 2008] OMG (2008). Business Process Modelling Notation Specification, Version 1.1. http://www.bpmn.org/.
[O’Reilly, 2005] O’Reilly, T. (2005). What Is Web 2.0? Design Patterns and Business
Models for the Next Generation of Software. http://www.oreilly.de/artikel/
web20.html.
[Overdick, 2007] Overdick, H. (2007). The Resource-Oriented Architecture. Services,
IEEE Congress on, pages 340–347.
[Pemberton, 2002] Pemberton, S. (2002). XHTML 1.0: The Extensible HyperText Markup Language (Second Edition). http://www.w3.org/TR/2002/
REC-xhtml1-20020801.
[Phifer, 2008] Phifer, G. (2008). End-User Mashups Demand Governance (But Not
Too Much Governance). Technical report, Gartner Research.
[Raggett et al., 1999] Raggett, D., Le Hors, A., and Jacobs, I. (1999). HTML Specification, Version 4.01. http://www.w3.org/TR/1999/REC-html401-19991224.
[Riabov et al., 2008] Riabov, A. V., Boillet, E., Feblowitz, M. D., Liu, Z., and Ranganathan, A. (2008). Wishful Search: Interactive Composition of Data Mashups.
In WWW ’08: Proceeding of the 17th international conference on World Wide
Web, pages 775–784, New York, NY, USA. ACM.
[Richardson and Ruby, 2007] Richardson, L. and Ruby, S. (2007). RESTful Web
Services. O’Reilly.
[RSS, 2007] RSS (2007). RSS 2.0 Specification. http://www.rssboard.org/rss-specification/.
[Scheer et al., 1992] Scheer, A.-W., Nüttgens, M., and Keller, G. (1992). Semantische Prozeßmodellierung auf der Grundlage Ereignisgesteuerter Prozeßketten.
Technical Report 89, Institut für Wirtschaftsinformatik, Universität des Saarlandes.
[Shirky, 2004] Shirky, C. (2004). Situated Software. http://www.shirky.com/writings/situated_software.html.
[Simmen et al., 2008] Simmen, D. E., Altinel, M., Markl, V., Padmanabhan, S., and
Singh, A. (2008). Damia: Data Mashups for Intranet Applications. In SIGMOD
’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1171–1182, New York, NY, USA. ACM.
[Spohrer et al., 2008] Spohrer, J., Vargo, S. L., Caswell, N., and Maglio, P. P. (2008).
The Service System Is the Basic Abstraction of Service Science. In HICSS ’08:
Proceedings of the Proceedings of the 41st Annual Hawaii International Conference
on System Sciences, page 104, Washington, DC, USA. IEEE Computer Society.
[Tatemura et al., 2007] Tatemura, J., Sawires, A., Po, O., Chen, S., Candan, K. S.,
Agrawal, D., and Goveas, M. (2007). Mashup Feeds: Continuous Queries over Web
Services. In SIGMOD ’07: Proceedings of the 2007 ACM SIGMOD international
conference on Management of data, pages 1128–1130, New York, NY, USA. ACM.
[van der Aalst, 2002] van der Aalst, W. M. P. (2002). Making Work Flow: On the
Application of Petri Nets to Business Process Management. In ICATPN ’02:
Proceedings of the 23rd International Conference on Applications and Theory of
Petri Nets, pages 1–22, London, UK. Springer-Verlag.
[van der Aalst et al., 2003] van der Aalst, W. M. P., ter Hofstede, A. H. M., and
Weske, M. (2003). Business Process Management: A Survey. In Business Process
Management, pages 1–12.
[van der Aalst and Weske, 2005] van der Aalst, W. M. P. and Weske, M. (2005).
Case Handling: A New Paradigm for Business Process Support. Data Knowl.
Eng., 53(2):129–162.
[W3C, 2004] W3C (2004). Resource Description Framework (RDF) Specifications.
http://www.w3.org/RDF/.
[Watt, 2007] Watt, S. (2007). Mashups – The evolution of the SOA, Part 1:
Web 2.0 and foundational concepts. http://www.ibm.com/developerworks/
webservices/library/ws-soa-mashups/index.html.
[Weske, 2007] Weske, M. (2007). Business Process Management: Concepts, Languages, Architectures. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[WfMC, 2008] WfMC (2008). Process Definition Interface – XML Process Definition
Language, Version 2.1a.
[Yu et al., 2008] Yu, J., Benatallah, B., Casati, F., and Daniel, F. (2008). Understanding Mashup Development. IEEE Internet Computing, 12(5):44–52.
[Yu, 2008] Yu, S. (2008). Innovation in the Programmable Web: Characterizing the
Mashup Ecosystem. In Mashups’08 ICSOC.