October 2015, Volume 2, Number 4 (pp. 257–270)
http://www.jcomsec.org
Journal of Computing and Security
Web Service Matchmaking based on Functional Similarity in Service Cloud
Kourosh Moradyan a, Omid Bushehrian a,∗
a Department of Computer Eng. and IT, Shiraz University of Technology, Shiraz, Iran.
∗ Corresponding author. Email addresses: [email protected] (K. Moradyan), [email protected] (O. Bushehrian)
ARTICLE INFO
Article history:
Received: 29 May 2015
Revised: 13 November 2015
Accepted: 02 March 2016
Published Online: 26 August 2016
Keywords: Web Service, Web Service Matchmaking, Cloud Computing, Ontology, Semantic Web Service, OWL-S, WSMO
ABSTRACT
Nowadays, the increasing use of the web as a means to accomplish daily tasks by calling
web services makes web services more and more significant. Users make a
query on the Internet to find the required web service based on their needs.
Cloud computing, due to its design and abundance of resources, has become an
ideal choice for web service providers to publish their services backed by cloud
servers. The cloud can eliminate problems such as web service availability and
security. On the other hand, obtaining the most relevant web service depends
on the accuracy of the user's request and on the mechanism used to match the request.
Due to the recent shutdown of public UDDI registries, most web service
matchmaking mechanisms are based on web service description (WSDL) files
which are published on the owners' websites. Semantic web services use OWL-S
and WSMO instead of WSDL to describe services in a way that software agents
are able to find appropriate services automatically. However, the high cost and
effort needed to formally define web services makes this method impractical.
In this paper we have proposed an ontology which formally models the user’s
query for web services in the service cloud by considering both functional and
syntactical dimensions. The stepwise matchmaking method of web services
based on the user’s query is also presented. To show the precision of the
proposed method, a set of experiments on a cluster of 3738 real web service
WSDL documents has been performed.
© 2015 JComSec. All rights reserved.
1 Introduction
Service-Oriented Computing (SOC) is an emerging
paradigm whose ultimate goal is to enable the development of distributed applications in heterogeneous
environments. Services, which can be defined as pieces
of functionality provided by an external provider, are the
basic building blocks. They are identified by an addressable interface describing their capabilities to the
outer world. An application developed according to
SOC paradigm includes a combination of service discovery, selection and invocation [1]. Cloud computing,
due to its design, makes the use of cloud-based web
services trustworthy, and hence service providers are eager to publish their services using cloud servers. The cloud
guarantees availability, reliability, and quality of service [2]. A web service is described by means of the
Web Services Description Language (WSDL), and this
description is published in a public Universal Description Discovery and Integration (UDDI) registry. Web
service communication is carried out through a form
of XML messaging, such as Simple Object Access
Protocol (SOAP) request or response. But major service providers such as Google, Amazon and Yahoo no
longer publish their services in public UDDI registries;
instead they publish them through their own websites.
This trend forces users to discover web services using
a search engine model. In [3] Al-Masri et al. showed
that 53% of the registered services of UDDI Business
Registry (UBR) are invalid, whereas 92% of services
found by search engines are valid and active. Search
engines process a user’s query to get the results back.
So it is essential for the user to be aware of the correct keywords in order to retrieve the most relevant
services that match the service request [4].
Clouds provide a variety of services following the Infrastructure as a Service (IaaS), Platform as
a Service (PaaS), and Software as a Service (SaaS)
paradigms [5]. The advances in the SaaS paradigm have
caused a rapid growth in the number of clouds and
their services in recent years [6]. Characteristics of SaaS such as a high availability confidence
of 99.99%, better scalability, cost saving, less maintenance, and API integration have attracted service
providers to migrate to the cloud [5]. The daily increase in
the number of web services has made web service
discovery a new challenge in the cloud [6].
Discovery is the process of matchmaking between
user’s query and advertised service descriptions [7]. In
this paper we have proposed an ontology (Figure 1)
which formally models the user’s query for web services in the service cloud considering both functional
and syntactical dimensions. Preparing the user's request
using the query ontology allows users to verify the
request logically before handing it to a time-consuming
matchmaking algorithm. Because requests based on the query ontology are expressive, there may be situations in which no service can satisfy the request, and such requests can be detected early.
Based on this query language, a matchmaking method
is also proposed and its accuracy is examined.
The rest of this paper is organized as follows. In
Section 2 we briefly investigate the related works.
Section 3 gives information about structure of WSDL
documents. Section 4 explains our proposed query
ontology. In this section we design an ontology based
on information needed for requesting a service. In
Section 5 we introduce our proposed approach. In
Section 6 we explain the way to extract features from
WSDL documents. In Section 7 the relation between
user’s query and web services is established. Section 8
discusses the experiments results. And in Section 9
conclusions and future works will be discussed.
2 Related Works
Due to the importance of web services and the benefits that can be achieved from them, in recent years
the web service technology has gained much attention. One of the most challenging problems is the
discovery of web services based on their capabilities.
Semantic web services are described using OWL-S [8] or the Web Service Modeling
Ontology (WSMO) [9]. Describing web services using
such languages enables software agents to automatically discover, select, and use services as components
in a composite that shapes an application. Semantic
web services use ontologies to find matches between
user requirements and service capabilities [10]. However, due to the high cost and effort needed to semantically describe web services, this approach has not
been very widely applied. In fact non-semantic web
services are more popular because they are supported
by industry and also by development tools [11]. Instead of ontologies,
non-semantic web service discovery approaches use
information retrieval techniques [12].
In [13] matching of documents is carried out based
on weighted keyword similarity, which is a textual matching approach. They weight each element in the WSDL document
according to the significance of that element in describing the desired service. Wu et al. in [14] proposed a
system for improving web service discovery based on
clustering and tagging text. In [15][16][17] matching
is performed based on the semantics of schemas; these approaches try to find matches between two schemas using linguistic analysis.
Many studies have been carried out to overcome the
drawbacks of discovery techniques based on UDDI. In
order to avoid the bottleneck, studies on distributed
service discovery in peer-to-peer or grid environments
have been carried out in [18]. He et al. in [19]
proposed a peer-to-peer-based decentralized service
discovery approach that uses data distribution and
lookup capabilities of Chord in order to discover services. They improved data availability by distributing
descriptions of functionally alike web services to different successor nodes. Their approach, named Chord4S,
also supports QoS-aware service discovery. Dasgupta
et al. in [20] proposed a multi-agent based distributed
platform for efficient semantic service discovery. They
also have addressed the scalability problem of discovery approaches.
The abundance of Web services with the same functionality on the Internet, together with daily additions,
causes longer response times
for matchmaking algorithms. In order to overcome
this problem, some approaches consider non-functional aspects of web services to filter and select
more accurate services. Kritikos et al. in [21] proposed
[Figure 1 depicts the query ontology: its classes include QueryFormat, Goal, Filter (FunctionalFilter and NonFunctionalFilter), QoS (Throughput, ResponseTime, Reliability, Availability), Transactional (Atomic, Compensable, Retriable, Cancelable, RollbackCost), Syntactical (Signature, InputParameter, OutputParameter, ReturnType), Keyword, Service, and URI, connected by relations such as has, isFilteredBy, belongsTo, hasName, hasType, hasValue, and hasSignature.]
Figure 1. The Proposed Query Ontology
three techniques that manage the space of offered services
to improve the overall matchmaking time. Suraci et al.
in [22] proposed a distributed workload control algorithm to manage the requests. They concentrated on
minimizing the overall latencies that requester agents
would face.
Service-based applications may need to replace
services that are no longer available or that fail to meet
certain requirements during the execution of the application. Zisman et al. in [23] proposed a framework that
supports runtime service discovery.
Cassar et al. in [24] presented a method that, using probabilistic machine learning techniques, extracts hidden features from semantic web
service description documents such as OWL-S and
WSMO. They used these features to build a model
that represents different types of service descriptions
as vectors, which enables heterogeneous service descriptions to be represented, compared, or discovered.
The majority of web services exist without explicitly
associated semantic descriptions, which leads to their not
being considered relevant to a specific user's request
during web service discovery. Paliwal et al. in [25] tried
to categorize web service descriptions semantically
and also to enhance the semantics of the user's request.
Semantic web services, as the most revolutionary
technology, promise to enable machine-to-machine
interaction on the web and to automate the discovery
and selection of the most suitable web services in
building service-based applications. However, most
web service descriptions use conventional annotations, e.g., WSDL, which lack semantic
annotations. Farrag et al. in [26] proposed a mapping
algorithm that facilitates the redefinition of conventional
web service annotations using semantic annotations
such as OWL-S.
Sangers et al. in [27] proposed a semantic web service discovery framework in order to find services using
natural language processing techniques. They used
part-of-speech tagging, lemmatization, and word sense
disambiguation techniques to match keywords from
the user’s request with the web service descriptions
given in WSMO. Klusch et al. in [28] proposed a hybrid semantic web service matchmaker, called OWLS-MX, that improves logic-based semantic matching of
OWL-S services by leveraging non-logic-based information retrieval techniques. In [29] the analysis of
experimental evaluations of the performance of OWLS-MX is summarized and an improved matchmaker is
implemented.
In some QoS-based approaches to web service discovery and selection, it is assumed that the QoS data
expressed by service providers are trustworthy and effective. However, the values of QoS attributes may be
unreliable. El-Kafrawy et al. in [30] proposed a web
service discovery and selection model that considers
trust factor when calculating web service reputation.
Lin et al. in [31] proposed a mechanism in which web
service discovery is achieved in two phases based on
QoS and collaborative filtering. Their mechanism verifies QoS values stated by service providers. Suchithra
et al. in [32] used a third party between the provider and
the consumer to check the integrity of QoS values and
calculate some implicit metrics. In [33] web service
discovery is carried out by considering four similarity
assessments to match services. They used WordNet
and HowNet to calculate the distance between two
words in a concept hierarchy.
3 WSDL Document Structure
WSDL is an XML-based language for describing non-semantic web services. It provides descriptions of
the operations that a web service can perform as well
as its location. Inside the root element of a WSDL file,
services are defined using six major components [34] (a short parsing sketch follows the list):
• <types> describes a container for defining data
types using some type system such as XML
Schema Definition (XSD).
• <message> represents an abstract data that is
transmitted. A message consists of logical parts,
each of which is associated with a definition within
some type system.
• <portType> is a set of abstract operations. Each
operation refers to an input message and/or output message.
• <binding> specifies concrete protocol and data
format specifications for the operations and messages defined by a particular portType.
• <port> specifies an address for a binding, thus
defines a single communication endpoint.
• <service> aggregates a set of related ports.
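As a minimal illustration of this structure (our own sketch, not part of the paper's toolchain; the file name service.wsdl is an assumption), the following Java snippet parses a WSDL 1.1 document with the standard javax.xml DOM API and lists the names of its service, portType, and message components:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WsdlComponents {
    // Namespace of WSDL 1.1 elements such as <definitions>, <service>, <portType>, and <message>.
    private static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);                 // required to resolve the WSDL namespace
        Document doc = factory.newDocumentBuilder().parse("service.wsdl");  // hypothetical file

        printNames(doc, "service");    // <service> aggregates a set of related ports
        printNames(doc, "portType");   // <portType> groups the abstract operations
        printNames(doc, "message");    // <message> describes the data being transmitted
    }

    private static void printNames(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(WSDL_NS, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            System.out.println(localName + ": " + element.getAttribute("name"));
        }
    }
}
```

The binding, port, and types components could be listed in the same way.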
4 Query Ontology
We proposed an ontology (Figure 1) to formally define
the user query for service matchmaking. The ontology
structure states that every query is a workflow (composition) of one or more tasks that the user requires.
Each task in the query has a name and a description that describes its function using natural language
comments. Those comments include some essential
keywords, as well as parameters that are categorized
into input parameters and output parameters. For
every task parameter, its type should be mentioned.
The parameter name is also considered because it may contain useful, relevant words. We also include
Quality of Service (QoS) criteria as an option that
the user can specify. We created a tool for creating
queries based on the ontology. The user's query is stored
as RDF statements for further processing.
As an example, consider a user who requests a task
to check weather information for a city; he/she prepares the request using the query creation tool. The request
is for a web service capable of retrieving weather
information for a city that the user enters as a string,
and it is stored as RDF statements to be processed
next. In order to transform the user query into RDF
statements, we use the query ontology axioms to prepare
instance statements by creating individuals from ontology resources and then establishing relationships
between individuals using the predicates predefined in the ontology. In the definition of the ontology concepts we set constraints which allow us to check newly added
statements against them. We use textual data in a request as literal
values of statements. After transformation, in the request
processing phase, reasoning over the RDF statements is performed
and the consistency among the statements is checked. If
no inconsistency is found, the user's request is
logically correct and is ready to be delivered to a web
service discovery system. We use the Pellet reasoner in
Jena to check the consistency of the user request: we first
load the query ontology into a model, then add the RDF
statements created from the user request to it, and reasoning
is done over the model, which contains the query ontology schema as well as the newly added statements from
the user request. In Figure 2 a fragment of the RDF file
containing the user request is presented, and the steps
for preparing the request are illustrated in Figure 3 (a code sketch of this validation step follows).
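The following Java sketch illustrates this preparation and validation step under our own assumptions: the ontology file name, the namespace, and the individual and property names mirror Figure 1 but are hypothetical, and Jena's built-in OWL reasoner stands in for the Pellet reasoner used in the paper.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.reasoner.ValidityReport;
import org.apache.jena.vocabulary.RDF;

public class QueryValidation {
    public static void main(String[] args) {
        String ns = "http://example.org/query-ontology#";   // hypothetical namespace

        // Load the query ontology schema (the file name is an assumption).
        Model schema = ModelFactory.createDefaultModel();
        schema.read("query-ontology.owl");

        // Build the RDF statements of the user request (the weather task of the example).
        Model request = ModelFactory.createDefaultModel();
        Resource task = request.createResource(ns + "WeatherTask");
        task.addProperty(RDF.type, request.createResource(ns + "Goal"));
        task.addProperty(request.createProperty(ns, "hasName"), "GetWeather");
        Resource input = request.createResource(ns + "CityParameter");
        input.addProperty(RDF.type, request.createResource(ns + "InputParameter"));
        input.addProperty(request.createProperty(ns, "hasType"), "string");
        task.addProperty(request.createProperty(ns, "hasSignature"), input);

        // Reason over the ontology schema plus the request statements and check consistency.
        // The paper uses the Pellet reasoner; Jena's built-in OWL reasoner is used here as a stand-in.
        Reasoner reasoner = ReasonerRegistry.getOWLReasoner().bindSchema(schema);
        InfModel inferred = ModelFactory.createInfModel(reasoner, request);
        ValidityReport report = inferred.validate();
        System.out.println(report.isValid() ? "request is consistent" : "request rejected");
    }
}
```

If the validity report contains no inconsistency, the request is passed on to the discovery system; otherwise it is rejected before the matchmaking phase starts.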
5 Matchmaking Approach
The matchmaking process is carried out in three phases.
First, functional matching is performed; then the syntactical matching of
services with the same functional capabilities is done;
and at the end, QoS matching of services is considered,
i.e., finding services with the required QoS capabilities
according to the user's expectations. Figure 4 shows
the sequence of matchmaking process. We use the information available in WSDL document to perform
matchmaking. We parse the WSDL document to retrieve features that describe the function of the web
service semantically. Features of interest are service
name, operation names, operation documentations,
parameter names and types, and service documentation. We use root element, service element, types
element, message element, and portType element to
extract our required features. These are the features
used to compare user’s query with web services.
Figure 2. A Fragment of the User Request
[Figure 3 shows the steps for preparing the user request: the request is mapped to RDF statements using the predefined statements of the Query Ontology, reasoning is performed, and the request is either rejected or, once verified, passed to the discovery system.]
Figure 3. Steps for Preparing the User Request
Figure 4. The Sequence of Matchmaking Process
In fact, the logic of our matchmaking algorithm is
based on the reasoning that when a user describes his
required task, he uses some keywords to explain it.
Obviously the keywords are nouns and verbs of the
sentences (just content words, not function words)
and we try to find synonym matches for these keywords among the keywords extracted from documents
used to describe web services. In case the services are
not annotated with documentations, only service, parameter and operation names are used to extract the
keywords.

[Figure 5 shows the architecture of the proposed matchmaker: the Request Parser processes the requested tasks (task names, descriptions, and input and output parameters) against the Query Ontology, the WSDL Parser extracts the corresponding features (operation names, descriptions, and parameters) from the WSDL documents, and the Matchmaker compares the two sides with the help of WordNet to produce the response.]
Figure 5. Architecture of the Proposed Matchmaker
By comparing the user’s query content with the
extracted features semantically, services with the required functionality are matched to the user’s query.
Next we compare matched services to the user’s query
according to the operation signatures. To match the
signatures, we consider the number and the types of
the input and output parameters for the request and
the service. Finally we match the QoS criteria issued
in the request to the QoS parameters in the service. So
the most relevant web services are returned. Figure 5
shows the proposed matchmaking method.
6 Extracting Features

In order to match the query and the services, we need
to extract information from the user query and the WSDL
documents. The required information is: the task names,
task descriptions, input and output parameter names
and types, and the QoS values specified in the request,
as well as the service name, service documentation, operation names, operation documentations,
and the parameter names and types of the input and output parameters of every operation in the
WSDL documents. We process this information to
retrieve the required features.

A. Processing user query

A user query is a collection of RDF statements that
is queried using SPARQL. To do so we use the Jena
API (https://jena.apache.org/) and issue queries to extract information
from the user request. The features that we need are:
(1) User's description for the task. We process this
description using the OpenNLP API as follows (a sketch of this pipeline appears after this list):
a) Extracting sentences. We split the text into sentences.
b) Tokenizing every sentence.
c) Tagging every token.
d) Extracting keywords using tags. We extract
just the words that are tagged as noun or verb,
so function words are not retrieved.
e) Preprocessing keywords. We preprocess keywords to remove the synonym keywords from
the list of obtained keywords.
(2) Task names. Every task name is composed of
some meaningful words that state the function
of the task.
(3) Signature of the required tasks. This is the information about the name, the number and the
types of the input and the output parameters
per task.
(4) Quality of Service parameters. We use these
parameters from the user request to compare
them with the corresponding values which were
calculated previously for every service.
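The following Java sketch illustrates steps a) to d) of this pipeline with the Apache OpenNLP API; the pre-trained model file names and the sample description are assumptions, and the synonym preprocessing of step e) is omitted.

```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class KeywordExtractor {

    // Extracts the content words (nouns and verbs) of a task description.
    public static List<String> extractKeywords(String description) throws Exception {
        // Pre-trained English models; the file names follow the usual OpenNLP naming but are assumptions.
        SentenceDetectorME sentenceDetector =
                new SentenceDetectorME(new SentenceModel(new FileInputStream("en-sent.bin")));
        TokenizerME tokenizer =
                new TokenizerME(new TokenizerModel(new FileInputStream("en-token.bin")));
        POSTaggerME tagger =
                new POSTaggerME(new POSModel(new FileInputStream("en-pos-maxent.bin")));

        List<String> keywords = new ArrayList<>();
        for (String sentence : sentenceDetector.sentDetect(description)) {   // a) sentence splitting
            String[] tokens = tokenizer.tokenize(sentence);                  // b) tokenization
            String[] tags = tagger.tag(tokens);                              // c) POS tagging
            for (int i = 0; i < tokens.length; i++) {
                // d) keep only nouns (NN*) and verbs (VB*); function words are dropped
                if (tags[i].startsWith("NN") || tags[i].startsWith("VB")) {
                    keywords.add(tokens[i].toLowerCase());
                }
            }
        }
        return keywords;   // e) synonym de-duplication would follow; it is omitted here
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractKeywords("Retrieves weather information for a given city."));
    }
}
```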
B. Parsing WSDL documents

We need some essential features in the WSDL documents that explain the function of the corresponding web service: the service name, service
documentation, operation names, operation documentations, and parameter names. We also extract operation signatures in order to compare them to the user's
request. In order to parse a WSDL document we use the
Membrane SOA Model (http://www.membrane-soa.org/soa-model/), a Java API for WSDL
and XML Schema. Parsing is as follows:
(1) Extracting and processing web service documentation. Web service documentations in a WSDL
document may be repeated and may also contain
some useless information such as URLs and contact
numbers. In order to remove this information we
use regular expressions in Java, writing patterns that recognize such fragments inside
the text, and then remove them. Then we process the description as
a) Extracting sentences.
b) Tokenizing every sentence.
c) Tagging every token.
d) Extracting keywords using tags. We extract
just the words that are tagged as noun or
verb, so function words are not retrieved.
e) Preprocessing keywords. Like the keywords obtained from the request, we
also preprocess these keywords to remove the
synonym keywords from the list of obtained
keywords.
(2) Extracting and processing the service name, operation names, and parameter names. Service names,
operation names, and parameter names in WSDL
documents follow the Camel Case naming convention, so we use the Camel Case writing pattern to split names into their formative words (a minimal sketch of this splitting step appears after this list).
The extracted words may contain non-letter
characters at the beginning or at the end of the
word, so we check them and delete such extra characters. Processed words are then checked
to be meaningful English words; words
that are special names are ignored and the others are added to a list to be compared to the user's
keywords semantically. In order to check whether a
word is meaningful or not we use WordNet: we
use the JAWS API (http://lyle.smu.edu/~tspell/jaws/) to determine the synonym set
of a given word, and if the returned synonym set is
empty, the given word is not an
English word. Keywords gathered from operation names and parameter names are added to
a separate vector to be compared to the keywords
gathered from the task names and their parameters
in the user request.
Service names try to describe the overall function of a service, so we add the keywords
extracted from the service name to the vector of keywords extracted from the web service documentation.
(3) Extracting and processing operation documentation. Every operation may have a documentation
that describes its function. We extract this documentation and process it in the same way as in
step (1). We collect the keywords from this documentation and from the web service documentation in
a list to be processed next.
(4) Extracting signatures of operations. We extract
the types of the input parameters as well as the
output parameters to compare with the user query's
signature(s).
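A minimal Java sketch of the splitting step described in item (2) is shown below; the isEnglishWord check is a hypothetical stand-in for the WordNet lookup through JAWS, using a tiny fixed vocabulary instead of real synonym sets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NameSplitter {

    // Splits a Camel Case identifier such as "GetWeatherByCity" into its formative words.
    public static List<String> splitCamelCase(String name) {
        // Remove non-letter characters at the beginning or end of the identifier.
        String cleaned = name.replaceAll("^[^A-Za-z]+|[^A-Za-z]+$", "");
        List<String> words = new ArrayList<>();
        // Split on lower-to-upper transitions and on acronym boundaries ("SMSService" -> "SMS", "Service").
        for (String word : cleaned.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    // Hypothetical stand-in for the WordNet check: in the described approach a word is kept
    // only if JAWS returns a non-empty synonym set for it.
    public static boolean isEnglishWord(String word) {
        return Set.of("add", "subtract", "multiply", "divide", "weather", "city", "numbers").contains(word);
    }

    public static void main(String[] args) {
        for (String word : splitCamelCase("calcAddNumbers")) {
            System.out.println(word + (isEnglishWord(word) ? " (kept)" : " (discarded)"));
        }
    }
}
```

For the operation name "calcAddNumbers" this sketch keeps "add" and "numbers" and discards "calc", mirroring the behaviour discussed in the experiments section.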
7 Established Relations
Assume that the user's query consists of $n$ tasks $T = \{t_1, t_2, \cdots, t_n\}$. The similarity measure $\mathit{Similarity}(t_i, s_j)$ between task $t_i$ from the user request and service $s_j$ is as follows:

$$\mathit{Similarity}(t_i, s_j) = \Phi_1\, S(W_{t_i}, W_j) + \Phi_2\, \mathit{Sim}_{name}(t_i, s_j) + \Phi_3\, \mathit{Match}_{signature}(t_i, s_j) + \Phi_4\, \mathit{Match}_{QoS}(t_i, s_j) \quad (1)$$
In Equation (1), $\Phi_1 = \Phi_2 = \Phi_3 = \Phi_4 = 0.25$ are the weights of the assessment methods, and $W_{t_i}$ and $W_j$ are the keywords gathered from the user query documentation of task $t_i$ and the keywords of the embedded documentations and the service name of service $s_j$, respectively. $S(W_{t_i}, W_j)$ computes the similarity between the two lists of gathered keywords and is calculated as follows:
$$S(W_{t_i}, W_j) = \frac{|W_{t_i} \cap W_j|}{|W_{t_i}|} \quad (2)$$
In Equation (2), $|W_{t_i} \cap W_j|$ is the cardinality of the set of common keywords of $W_{t_i}$ and $W_j$ that are semantically the same, and it is calculated as follows: if $W_{t_i} = \{w_{i1}, w_{i2}, \cdots, w_{in}\}$ contains $n$ keywords and $W_j = \{w_{j1}, w_{j2}, \cdots, w_{jm}\}$ contains $m$ keywords, and we define $CK$ as the set of common keywords and $\mathit{Synset}(x)$ as the set of synonyms of keyword $x$, then the relationship in Equation (3) holds:

$$CK \equiv \{\, w \in (W_{t_i} \cap W_j) \mid w \in W_{t_i} \wedge \exists x \in W_j : w \in \mathit{Synset}(x) \,\} \quad (3)$$
In Equation (1), $\mathit{Sim}_{name}(t_i, s_j)$ computes the similarity between the name of task $t_i$ and the operation names of service $s_j$, together with their parameter names. We calculate this similarity using Equation (4):

$$\mathit{Sim}_{name}(t_i, s_j) = \max_{k=1 \cdots n} S(TN_i, O_{jk}) \quad (4)$$

where $TN_i$ is a list containing the name of task $t_i$ and its parameters split into meaningful words, and $O_{jk}$ is a list containing the name of the $k$-th operation in service $s_j$ and its parameters split into meaningful words.
In order to match the signature of the user query task $t_i$ with an operation signature in $s_j$ we use $\mathit{Match}_{signature}(t_i, s_j)$ in Equation (1). We check signatures according to the number and types of the input and output parameters. Equation (5) shows how signatures are matched:

$$\mathit{Match}_{signature}(t_i, s_j) = \max_{k=1 \cdots n} \mathit{matching}(t_i, o_{jk}) \quad (5)$$

where $o_{jk}$ is the $k$-th operation in service $s_j$ and $t_i$ is the user's required task, and $\mathit{matching}(t_i, o_{jk})$ is calculated as follows:

$$\mathit{matching}(t_i, o_{jk}) = \frac{M(t_i, o_{jk})}{\mathit{avg}(E(t_i), E(o_{jk}))} \quad (6)$$

where $M(t_i, o_{jk})$ is the number of matched types between $t_i$ and $o_{jk}$, and $E(t_i)$ and $E(o_{jk})$ are the total numbers of types in $t_i$ and $o_{jk}$, respectively.
Quality of Service (QoS) refers to non-functional
attributes of web services. Main attributes are as
follows [35]:
• Availability: the quality aspect of a service concerning its presence and readiness for immediate use. Larger values indicate higher availability.
• Response Time: measured as the time a service takes to respond to requests.
• Throughput: defined as the rate at which a service is able to process requests.
• Reliability: a quality aspect of a web service defined as the probability that the web service can respond to its users without failure.
• Security: comprises the existence of authentication mechanisms offered by a service.
In Equation (1), we calculate $\mathit{Match}_{QoS}(t_i, s_j)$ using the following formula:

$$\mathit{Match}_{QoS}(t_i, s_j) = \sum_{i=1}^{n} \omega_i\, \mathit{sim}_{A_i}(t_i, s_j) \quad (7)$$

In Equation (7), $n$ is the number of considered QoS attributes, $A_i$ is the $i$-th attribute, and $\omega_i = \frac{1}{n}$ is the weight of the given attribute. The measure $\mathit{sim}_{A_i}(t_i, s_j)$ calculates the similarity between attribute $A_i$ in task $t_i$ and service $s_j$ and, for all attributes, is defined as follows [33]:

$$\mathit{sim}_{A_i}(t_i, s_j) = \frac{1 - |t_i.A_i - s_j.A_i|}{2 - t_i.A_i - s_j.A_i} \quad (8)$$
For example, if we only consider the response time and throughput attributes, then:

$$\mathit{Match}_{QoS}(t_i, s_j) = \sum_{i=1}^{2} \omega_i\, \mathit{sim}_{A_i}(t_i, s_j) = 0.5 \times \mathit{sim}_{responseTime}(t_i, s_j) + 0.5 \times \mathit{sim}_{throughput}(t_i, s_j) \quad (9)$$

in which:

$$\mathit{sim}_{responseTime}(t_i, s_j) = \frac{1 - |t_i.responseTime - s_j.responseTime|}{2 - t_i.responseTime - s_j.responseTime} \quad (10)$$

and

$$\mathit{sim}_{throughput}(t_i, s_j) = \frac{1 - |t_i.throughput - s_j.throughput|}{2 - t_i.throughput - s_j.throughput} \quad (11)$$
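To make these formulas concrete, the following Java sketch (our own illustration; the keyword lists, synonym sets, type counts, and QoS values in the example are assumptions, not values taken from the benchmark) implements Equations (2), (6), and (8):

```java
import java.util.List;
import java.util.Set;

public class SimilarityMeasures {

    // Equation (2): the fraction of request keywords that have a semantic (synonym) match
    // among the service keywords; serviceSynsets holds Synset(x) for every service keyword x.
    public static double keywordSimilarity(List<String> requestKeywords, List<Set<String>> serviceSynsets) {
        if (requestKeywords.isEmpty()) {
            return 0.0;
        }
        long common = requestKeywords.stream()
                .filter(w -> serviceSynsets.stream().anyMatch(synset -> synset.contains(w)))
                .count();
        return (double) common / requestKeywords.size();
    }

    // Equation (6): matched parameter types divided by the average number of types
    // in the requested task and in the candidate operation.
    public static double signatureMatching(int matchedTypes, int taskTypes, int operationTypes) {
        double average = (taskTypes + operationTypes) / 2.0;
        return average == 0 ? 0.0 : matchedTypes / average;
    }

    // Equation (8): similarity of one QoS attribute value between the task and the service.
    public static double qosAttributeSimilarity(double taskValue, double serviceValue) {
        double denominator = 2 - taskValue - serviceValue;
        return denominator == 0 ? 1.0 : (1 - Math.abs(taskValue - serviceValue)) / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical values inspired by the "adds two numbers" case study below.
        double s = keywordSimilarity(List.of("adds", "numbers"),
                List.of(Set.of("add", "adds", "sum"), Set.of("number", "numbers")));
        System.out.println("S = " + s
                + ", matching = " + signatureMatching(3, 3, 3)
                + ", simQoS = " + qosAttributeSimilarity(0.2, 0.3));
    }
}
```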
A. Case study
Consider a user request for a task to
calculate the summation of two numbers using a method
called sum that takes two float arguments as input
and returns a float, with the description "adds
two numbers." and some expected QoS. In order to
calculate $\mathit{Similarity}(t_i, s_j)$, first the content words,
here "adds" and "numbers", are extracted from the description. These words are
matched against the content words previously extracted from the web service documentations,
and the amount of semantic matching is calculated as
$S(W_{t_i}, W_j)$. Then the word "sum", as a task name,
is processed and compared semantically to every operation name in the given web service; this
comparison yields $\mathit{Sim}_{name}(t_i, s_j)$. At the end, the
amount of matching between the task signature and the
operations' signatures ($\mathit{Match}_{signature}(t_i, s_j)$) and
also $\mathit{Match}_{QoS}(t_i, s_j)$ are calculated by comparing
the types of the input and the output parameters
and matching the QoS attributes. By assigning the same
weight to every similarity measure, the total similarity $\mathit{Similarity}(t_i, s_j)$ of every web service to the user
request is calculated.
8 Experiments and Results
Our experiments are based on a benchmark of 3738
real (online) web services collected from all over the
world [36]. In this benchmark the response time and throughput QoS attributes have been calculated.
We manually classified web services into calculator,
email validation, SMS, and weather categories. We
have done two series of experiments.
Our experiments are based on sixty five manually classified WSDL documents from the mentioned
benchmark shown in Table 1. We identified service descriptions from four categories: calculator (twenty two
services), email validation (six services), SMS (twenty
five services), and weather (twelve services). In this
section a set of experiments is reported for our matchmaker, which is implemented with four similarity assessment
methods. The sum of the values obtained from
Equation (2) and Equation (4) gives the amount
of functional similarity between the request and the services. The third method, Equation (5), calculates the
amount of signature matching between the request
and the services. Finally, Equation (7) filters the proposed
services to match the user request more accurately.
To evaluate the effectiveness of our proposed matchmaker we used the precision and recall metrics. "Precision is the proportion of retrieved documents that are relevant, and recall is the proportion of relevant documents that are retrieved" [33]. We used the following equations:

$$\mathit{Recall} = \frac{|\mathit{true\ positive}|}{|\mathit{true\ positive}| + |\mathit{false\ negative}|} \quad (12)$$

$$\mathit{Precision} = \frac{|\mathit{true\ positive}|}{|\mathit{true\ positive}| + |\mathit{false\ positive}|} \quad (13)$$

Figure 6 shows the results of our experiments. We ran different queries over the four categories to obtain these results. The web services were given similarity values with respect to the request. In our experiments we considered services whose similarity to the request was greater than 30%. This threshold was obtained empirically as it fitted our experiments well.

We compared our approach to a similar one. We implemented the approach in [37] and performed our experiments on the sixty-five WSDL documents mentioned above. The experiments showed that our approach has better performance.

In [37], the proposed approach focuses on finding web services that have exactly the same name and description, while web services with the same function but described using different synonym words receive less importance. They also used hierarchical structures between words, which can relate words with irrelevant functional semantics; for example, words like "get" and "add" should not be compared.

In our experiments we can see that in some cases the similarity values are rather low. In our opinion, the reason may be faulty information in the WSDL documents. For example, an operation named "calcAdd" is split into "calc" and "Add", and because "calc" is meaningless in English, it is deleted. As another example, in a WSDL document that has operations named "add", "sub", "mult", and "divide", the operation names "sub" and "mult" are discarded because they are meaningless. By finding the best synonym words and replacing the abbreviated words with them, as a solution to these problems, the results would be improved.

We also compared our approach to goDiscovery [11], a non-semantic web service discovery approach based on WSDL which uses statistical methods and indexing techniques to improve the precision and response time of the discovery process. The same benchmark as ours has been used to evaluate goDiscovery.

In order to compare our work we use the F-score measure, which evaluates the overall system accuracy. The F-score is a weighted average of Precision (P) and Recall (R) and is calculated as follows [11]:

$$F\mathit{score} = 2 \cdot \frac{P \cdot R}{P + R} \quad (14)$$
Figure 7 shows the F-score for our approach, goDiscovery [11], and the approach in [37]. The results show that
our approach outperforms the other approaches in terms of overall accuracy. In our approach we
focus on matching services semantically with the same
functionality, and we consider both exact and functionally similar results to the request. In [37], the focus is
on finding exact matches with the request, so some of the potential results are not considered.
goDiscovery is not a semantic-based approach but a
statistical and indexing approach that is based on the
information retrieved from the current document, which it uses to build its model.
Table 1. Manually Classified Web Services (Category and WSDL URI)

Calculator:
http://hs19.iccs.bas.bg/work/nusoap/nusoap.0.7.2/lib/addexample2.php?wsdl
http://www.vejlebib.dk/webservices/mathservice.asmx?WSDL
http://www.fasil.dk/CalculationService.asmx?WSDL
http://mathertel.de/AJAXEngine/S01_AsyncSamples/CalcService.asmx?WSDL
http://fhrg.first.fraunhofer.de:8080/linuxtoolbox/services/Calculator?wsdl
http://www.tou.it/WebServices/MathService.asmx?WSDL
http://www.html2xml.nl/Services/Calculator/Version1/Calculator.asmx?WSDL
http://www.pggmlevensloopcalculator.nl/CalculatorService.asmx?WSDL
http://wiki.vanguardsw.com/bin/ws.dsb?wsdl/Finance/Accounting/House%20Payment%20Calculator
http://aspalliance.com/quickstart/aspplus/samples/services/MathService/VB/MathService.asmx?wsdl
http://www.deeptraining.com/webservices/mathservice.asmx?WSDL
http://websrv.cs.fsu.edu/~engelen/calc.wsdl
http://www.learnasp.com/quickstart/aspplus/samples/services/MathService/VB/MathService.asmx?WSDL
http://www.development.ccs.neu.edu/home/gali/project/math.asmx?WSDL
http://www.cazella.com/math.asmx?WSDL
http://www.etecnologia.net/samples/webservices/mathservice.asmx?WSDL
http://www.fitnesswizard.com/Services/CalculationService.asmx?WSDL
http://www.huberstyle.com/slxws/calc.asmx?wsdl
http://reliablesoftware.net/add.asmx?WSDL
http://www.ecs.syr.edu/faculty/fawcett/handouts/CSE686/code/calcWebService/Calc.asmx?WSDL
http://soatest.parasoft.com/axis/calculator.wsdl
http://wiki.vanguardsw.com/bin/ws.dsb?wsdl/Operations/Inventory%20Management/EOQ%20Calculator

Email Validation:
http://webservices.data-8.co.uk/EmailValidation.asmx?WSDL
http://ws.serviceobjects.com/ev2/emailvalidation2.asmx?WSDL
http://ws.serviceobjects.com/ev/EmailValidate.asmx?WSDL
http://ws2.serviceobjects.net/ev/EmailValidate.asmx?wsdl
http://ws2.serviceobjects.net/ev2/emailvalidation2.asmx?WSDL
http://wsext.ncm.com/emailvalidation.asmx?WSDL

SMS:
http://sms.utnet.cn/smsservice2/v2/smssend2.asmx?WSDL
http://sms.utnet.cn/SmsService2/SmsSend.asmx?WSDL
http://www.barnaland.is/dev/sms.asmx?WSDL
http://info-messenger.de/sms.asmx?wsdl
http://www.mediestrategi.dk/services/sms_server.php?wsdl
http://sms.cellcom.co.il/SmsGate/SmsGate.asmx?wsdl
http://sms.cellcom.co.il/SmsGate/SmsGate2.asmx?WSDL
http://wap.icellcom.co.il/SmsGate/SmsGate2.asmx?wsdl
http://wap.icellcom.co.il/SmsGate/SmsGate.asmx?WSDL
http://usm.unicell.co.il/sms.asmx?WSDL
http://extra.vianett.no/webservices/vianett_sms_service/SMS_Service.asmx?WSDL
http://avalon.bitsyu.net/GamaSMS/SMSService.asmx?WSDL
http://www.smsdome.com/portalvbvs/services/smsgateway.asmx?wsdl
http://www.infocountry.co.za/za/default/webservice/SMSService.asmx?WSDL
http://chat.siberalem.com/SMS.asmx?wsdl
http://ws.cdyne.com/SmsWS/SMS.asmx?wsdl
http://ser02.2sms.com/WebServices/1.0/SMSService.asmx?WSDL
http://www.2sms.com/WebServices/1.1/SMSService.asmx?WSDL
http://2sms.com/WebServices/1.4/SMSService.asmx?WSDL
http://www.jajah.com/api/SMSService.asmx?wsdl
http://xms.admad.mobi/SMS.asmx?WSDL
http://www.geekymonkey.com/sms/smsservice.asmx?wsdl
http://sms.mobileart.com/sms_receive.wsdl
http://sms.yakoon.com/sms.asmx?WSDL
http://203.113.172.33/mssgateway/SMS_WS.asmx?WSDL

Weather:
http://netpub.cstudies.ubc.ca/dotnet/WebServices/WeatherService.asmx?wsdl
http://www.thinkpage.cn/weather/WeatherService.asmx?WSDL
http://www.premis.cz/PremisWS/WeatherWS.asmx?WSDL
http://www.webservicex.net/WeatherForecast.asmx?wsdl
http://ws.cdyne.com/WeatherWS/Weather.asmx?wsdl
http://oscar.snapps.com/portal.nsf/WeatherService?WSDL
http://asyncpostback.com/WeatherService.asmx?WSDL
http://wsrp.bea.com/portal/boulder/weather.wsdl
http://lostsprings.com/weather/WeatherService.asmx?WSDL
http://www.riptrails.com/webservices/Weather.asmx?WSDL
http://weather.shellware.com/weather.asmx?wsdl
http://www.deeptraining.com/webservices/weather.asmx?wsdl
[Figure 6 plots the precision and recall (in percent) of our approach and of the approach in [37] for the Weather, Calculator, Email Validation, and SMS service categories.]
Figure 6. Results of the First Experiment
[Figure 7 compares the average F-score of the three approaches: 90.66 for the approach in [37], 91.81 for goDiscovery, and 93.18 for our approach.]
Figure 7. Comparison of Overall Accuracy of Three Approaches
9 Conclusion and Future Work
In this paper we make complementary contributions
to web service matchmaking. We design an ontology
to formulate the structure of a user's query. To do so, we
propose a Query Ontology that guides users to
prepare a structured request with the full information
needed for matchmaking. It also verifies the user
request logically before delivering it to the time-consuming matchmaking
process, so requests with no answer are discarded before processing,
which helps matchmaking algorithms to process only
appropriate requests and save time. We also propose
a semantic-based matchmaking algorithm based on
WSDL documents to discover services based on their
functionalities. Service matchmaking categorizes web
services into four categories and for each category a
similarity assessment method is given. Experimental
results show that our approach has an improved precision in retrieving services in comparison to previous
approaches.
It is hard to push programmers to use a formal specification for each web service they write. There are
problems such as respecting the Camel Case writing convention used by programmers, the use of abbreviated words in
defining web services, and considering relationships between words which belong to a specific domain. In the
future we plan to overcome such problems and obstacles
in our matchmaking method to get better results.
References

[1] Debajyoti Mukhopadhyay and Archana Chougule. A survey on web service discovery approaches. In Advances in Computer Science, Engineering & Applications, pages 1001–1012. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-30157-5_99.
[2] Khalid Elgazzar, Hossam S. Hassanein, and Patrick Martin. DaaS: Cloud-based mobile web service discovery. Pervasive and Mobile Computing, 13:67–84, 2014. URL http://www.sciencedirect.com/science/article/pii/S1574119213001405.
[3] Eyhab Al-Masri and Qusay H. Mahmoud. Investigating web services on the world wide web. In Proceedings of the 17th International Conference on World Wide Web, pages 795–804. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1367605.
[4] Khalid Elgazzar, Ahmed E. Hassan, and Patrick Martin. Clustering WSDL documents to bootstrap the discovery of web services. pages 147–154. IEEE, 2010. ISBN 978-1-4244-8146-0. doi: 10.1109/ICWS.2010.31.
[5] K. Chandrasekaran. Essentials of Cloud Computing. CRC Press, 2014. ISBN 978-1-4822-0543-5.
[6] G. Khan, S. Sengupta, and A. Sarkar. WSRM: A relational model for web service discovery in enterprise cloud bus (ECB). In 2014 3rd International Conference on Eco-friendly Computing and Communication Systems (ICECCS), pages 217–222, 2014. doi: 10.1109/Eco-friendly.2014.51.
[7] G. Sambasivam, V. Ravisankar, T. Vengattaraman, R. Baskaran, and P. Dhavachelvan. A normalized approach for service discovery. Procedia Computer Science, 46:876–883, 2015. doi: 10.1016/j.procs.2015.02.157.
[8] David Martin, Mark Burstein, Jerry Hobbs, Ora Lassila, Drew McDermott, Sheila McIlraith, Srini Narayanan, Massimo Paolucci, Bijan Parsia, Terry Payne, et al. OWL-S: Semantic markup for web services (2004). Latest version available from http://www.ai.sri.com/daml/services/owl-s/1.2, 2009.
[9] Holger Lausen, Axel Polleres, and Dumitru Roman. Web Service Modeling Ontology (WSMO). W3C Member Submission, 3, 2005.
[10] Tom Narock, Victoria Yoon, and Sal March. A provenance-based approach to semantic web service description and discovery. Decision Support Systems, 64:90–99, 2014. doi: 10.1016/j.dss.2014.04.007.
[11] Yehia Elshater, Khalid Elgazzar, and Patrick Martin. goDiscovery: Web service discovery made efficient. pages 711–716. IEEE, 2015. ISBN 978-1-4673-7272-5. doi: 10.1109/ICWS.2015.99.
[12] John Garofalakis, Yannis Panagis, Evangelos Sakkopoulos, and Athanasios Tsakalidis. Contemporary web service discovery mechanisms. J. Web Eng., 5(3):265–290, 2006.
[13] Benjamin Kanagwa and Agnes F. N. Lumaala. Discovery of services based on WSDL tag level combination of distance measures. International Journal of Computing and ICT Research, 5, 2011. URL http://ijcir.mak.ac.ug/specialissue2011/article3.pdf.
[14] Jian Wu, Liang Chen, Yanan Xie, and Zibin Zheng. Titan: A system for effective web service discovery. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 441–444. ACM, 2012. URL http://dl.acm.org/citation.cfm?id=2188069.
[15] Hong-Hai Do and Erhard Rahm. COMA: A system for flexible combination of schema matching approaches. In Proceedings of the 28th International Conference on Very Large Data Bases, pages 610–621. VLDB Endowment, 2002. URL http://dl.acm.org/citation.cfm?id=1287422.
[16] AnHai Doan, Pedro Domingos, and Alon Y. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In ACM SIGMOD Record, volume 30, pages 509–520. ACM, 2001. URL http://dl.acm.org/citation.cfm?id=375731.
[17] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350, 2001. URL http://link.springer.com/article/10.1007/s007780100057.
[18] Xia Wang and Wolfgang A. Halang. Discovery and Selection of Semantic Web Services, volume 453 of Studies in Computational Intelligence. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-33937-0. URL http://link.springer.com/10.1007/978-3-642-33938-7.
[19] Qiang He, Jun Yan, Yun Yang, Ryszard Kowalczyk, and Hai Jin. A decentralized service discovery approach on peer-to-peer networks. IEEE Transactions on Services Computing, 6(1):64–75, 2013. doi: 10.1109/TSC.2011.31.
[20] Sourish Dasgupta, Anoop Aroor, Feichen Shen, and Yugyung Lee. SMARTSPACE: Multiagent based distributed platform for semantic service discovery. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(7):805–821, 2014. doi: 10.1109/TSMC.2013.2281582.
[21] Kyriakos Kritikos and Dimitris Plexousakis. Novel optimal and scalable nonfunctional service matchmaking techniques. IEEE Transactions on Services Computing, 7(4):614–627, 2014. doi: 10.1109/TSC.2013.11.
[22] Vincenzo Suraci, Claudio Gori Giorgi, Stefano Battilotti, and Francisco Facchinei. Distributed workload control for federated service discovery. Procedia Computer Science, 56:233–241, 2015. doi: 10.1016/j.procs.2015.07.221.
[23] Andrea Zisman, George Spanoudakis, James Dooley, and Igor Siveroni. Proactive and reactive runtime service discovery: A framework and its evaluation. IEEE Transactions on Software Engineering, 39(7):954–974, 2013. doi: 10.1109/TSE.2012.84.
[24] Gilbert Cassar, Payam Barnaghi, and Klaus Moessner. Probabilistic matchmaking methods for automated service discovery. IEEE Transactions on Services Computing, 7(4):654–666, 2014. doi: 10.1109/TSC.2013.28.
[25] A. V. Paliwal, B. Shafiq, J. Vaidya, Hui Xiong, and N. Adam. Semantics-based automated service discovery. IEEE Transactions on Services Computing, 5(2):260–275, 2012. doi: 10.1109/TSC.2011.19.
[26] Tamer Ahmed Farrag, Ahmed Ibrahim Saleh, and Hesham Arafat Ali. Toward SWSs discovery: Mapping from WSDL to OWL-S based on ontology search and standardization engine. IEEE Transactions on Knowledge and Data Engineering, 25(5):1135–1147, 2013. doi: 10.1109/TKDE.2012.25.
[27] Jordy Sangers, Flavius Frasincar, Frederik Hogenboom, and Vadim Chepegin. Semantic web service discovery using natural language processing techniques. Expert Systems with Applications, 40(11):4660–4671, 2013. doi: 10.1016/j.eswa.2013.02.011.
[28] Matthias Klusch, Benedikt Fries, and Katia Sycara. OWLS-MX: A hybrid semantic web service matchmaker for OWL-S services. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2):121–133, 2009. URL http://www.sciencedirect.com/science/article/pii/S1570826808000838.
[29] Matthias Klusch, Patrick Kapahnke, and Benedikt Fries. Hybrid semantic web service retrieval: A case study with OWLS-MX. In Semantic Computing, 2008 IEEE International Conference on, pages 323–330. IEEE, 2008.
[30] Passent El-Kafrawy, Emad Elabd, and Hanaa Fathi. A trustworthy reputation approach for web service discovery. Procedia Computer Science, 65:572–581, 2015. doi: 10.1016/j.procs.2015.09.001.
[31] Szu-Yin Lin, Chin-Hui Lai, Chih-Heng Wu, and Chi-Chun Lo. A trustworthy QoS-based collaborative filtering approach for web service discovery. Journal of Systems and Software, 93:217–228, 2014. doi: 10.1016/j.jss.2014.01.036.
[32] M. Suchithra and M. Ramakrishnan. Efficient discovery and ranking of web services using non-functional QoS requirements for smart grid applications. Procedia Technology, 21:82–87, 2015. doi: 10.1016/j.protcy.2015.10.013.
[33] Jian Wu and Zhaohui Wu. Similarity-based web service matchmaking. In Services Computing, 2005 IEEE International Conference on, volume 1, pages 287–294. IEEE, 2005.
[34] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web Services Description Language (WSDL) 1.1, 2001.
[35] Daniel Menasce. QoS issues in web services. IEEE Internet Computing, 6(6):72–75, 2002.
[36] Yilei Zhang, Zibin Zheng, and Michael R. Lyu. WSExpress: A QoS-aware search engine for web services. In Web Services (ICWS), 2010 IEEE International Conference on, pages 91–98. IEEE, 2010.
[37] Yiqiao Wang and Eleni Stroulia. Semantic structure matching for assessing web-service similarity. In Service-Oriented Computing – ICSOC 2003, pages 194–207. Springer, 2003. URL http://link.springer.com/chapter/10.1007/978-3-540-24593-3_14.
Kourosh Moradyan received his Associate
Degree in computer software from University
of Kurdistan, Sanandaj, IRAN and his B.Sc.
degree in software engineering from Zagros
Institute of Higher Education, Kermanshah,
IRAN. He also received his M.Sc. degree in
software engineering from Shiraz University
of Technology, Shiraz, IRAN. His research
interests include Semantic Web, and Web
Service Discovery.
Omid Bushehrian received the BSc in software engineering from AmirKabir University
of Technology (Tehran Polytechnics), IRAN,
the MSc and Ph.D degrees in software engineering from the University of Science and
Technology, IRAN. He is currently a faculty
member at Shiraz University of Technology,
IRAN. His main research interests are cloud
computing and software quality.