Print this article - Journal of Computing and Security
Transcription
Print this article - Journal of Computing and Security
October 2015, Volume 2, Number 4 (pp. 257–270) http://www.jcomsec.org Journal of Computing and Security Web Service Matchmaking based on Functional Similarity in Service Cloud Kourosh Moradyan a , Omid Bushehrian a,∗ a Department of Computer Eng. and IT, Shiraz University of Technology, Shiraz, Iran. ARTICLE I N F O. Article history: Received: 29 May 2015 Revised: 13 November 2015 Accepted: 02 March 2016 Published Online: 26 August 2016 Keywords: Web Service, Web Service Matchmaking, Cloud Computing, Ontology, Semantic Web Service, OWL-S, WSMO ABSTRACT Nowadays increasing use of web as a means to accomplish daily tasks by calling web services, makes web services more and more significant. Users make a query on the Internet to find the required web service based on their needs. Cloud computing, due to its design and abundance of resources has become an ideal choice for web service providers to publish their services backed by cloud servers. The cloud can eliminate problems like web service availability and security. On the other side, obtaining the most relevant web service depends on user’s request accuracy and the mechanism used to match the request. Due to recent shutting down of public UDDI registries, most of web service matchmaking mechanisms are based on web service description files (WSDL) which are published on the owners’ websites. Semantic web services use OWL-S and WSMO instead of WSDL to describe services in a way that software agents are able to find appropriate services automatically. However, the high cost and effort needed to formally define web services makes this method impractical. In this paper we have proposed an ontology which formally models the user’s query for web services in the service cloud by considering both functional and syntactical dimensions. The stepwise matchmaking method of web services based on the user’s query is also presented. To show the precision of the proposed method, a set of experiments on a cluster of 3738 real web service WSDL documents has been performed. c 2015 JComSec. All rights reserved. 1 Introduction Service-Oriented Computing (SOC) is an emerging paradigm in which the ultimate goal is to realize development of distributed applications in heterogeneous environments. Services that can be defined as a piece of functionality done by an external provider are the basic building blocks. They are identified by an ad∗ Corresponding author. Email addresses: [email protected] (K. Moradyan), [email protected] (O. Bushehrian) c 2015 JComSec. All rights reserved. ISSN: 2322-4460 dressable interface describing their capabilities to the outer world. An application developed according to SOC paradigm includes a combination of service discovery, selection and invocation [1]. Cloud computing, due to its design, makes the use of cloud based web services trustable and hence service providers are eager to publish their services using cloud servers. Cloud guarantees availability, reliability, and quality of service [2]. A web service is described by means of the web Services Description Language (WSDL), and this description is published in a public Universal Description Discovery and Integration (UDDI) registry. Web 258 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian service communication is carried out through a form of XML messaging, such as Simple Object Access Protocol (SOAP) request or response. But major service providers such as Google, Amazon and Yahoo no longer publish their services in public UDDI registries; instead they publish them through their own websites. This trend forces users to discover web services using a search engine model. In [3] Al-Masri et al. showed that 53% of the registered services of UDDI Business Registry (UBR) are invalid, whereas 92% of services found by search engines are valid and active. Search engines process a user’s query to get the results back. So it is essential for the user to be aware of the correct keywords in order to retrieve the most relevant services that match the service request [4]. Cloud services provide a variety of services such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) paradigms [5]. The advances in SaaS paradigm has caused a rapid growth in the number of clouds and their services in recent years [6]. Some of the characteristics of SaaS such as high availability confidence of 99.99%, better scalability, cost saving, less maintenance and API integration have attracted service providers to migrate to cloud [5]. Daily increase of the number of web services has raised web service discovery as a new challenge in cloud [6]. Discovery is the process of matchmaking between user’s query and advertised service descriptions [7]. In this paper we have proposed an ontology (Figure 1) which formally models the user’s query for web services in the service cloud considering both functional and syntactical dimensions. Preparing user’s request using query ontology allows the users to verify the requests logically before giving it to a time consuming matchmaking algorithm. Due to informative form of requests based on query ontology there may be situations in which no service can satisfy the request. Based on this query language, a matchmaking method is also proposed and its accuracy is examined. The rest of this paper is organized as follows. In Section 2 we briefly investigate the related works. Section 3 gives information about structure of WSDL documents. Section 4 explains our proposed query ontology. In this section we design an ontology based on information needed for requesting a service. In Section 5 we introduce our proposed approach. In Section 6 we explain the way to extract features from WSDL documents. In Section 7 the relation between user’s query and web services is established. Section 8 discusses the experiments results. And in Section 9 conclusions and future works will be discussed. 2 Related Works Due to the importance of web services and the benefits that can be achieved from them, in recent years the web service technology has gained much attention. One of the most challenging problems is the discovery of web services based on their capabilities. Semantic web services are described using web ontology languages (OWL-S) [8] or Web Service Modeling Ontology (WSMO) [9]. Describing web services using such languages enables software agents to automatically discover, select, and use services as components in a composite that shapes an application. Semantic web services use ontologies to find matches between user requirements and service capabilities [10]. However, due to the high cost and effort needed to semantically describe web services, this approach has not been very widely applied. In fact non-semantic web services are more popular because they are supported by industry and also development tools [11]. Instead, non-semantic web services discovery approaches use information retrieval techniques [12]. In [13] matching of documents is carried out based on weighted keyword similarity that is a textual matching. They weigh each element in the WSDL document according to significance of that element in describing the desired service. Wu et al. in [14] proposed a system for improving web service discovery based on clustering and tagging text. In [15][16][17] matching is performed based on the semantics of schemas. They tried to find match between two schemas which is a linguistic analysis. Many studies have been carried out to overcome the drawbacks of discovery techniques based on UDDI. In order to avoid the bottleneck, studies on distributed service discovery in peer-to-peer or grid environments have been carried out in [18]. Qiang et al. in [19] proposed a peer-to-peer-based decentralized service discovery approach that uses data distribution and lookup capabilities of Chord in order to discover services. They improved data availability by distributing descriptions of functionally alike web services to different successor nodes. Their approach, named Chord4S, also supports QoS-aware service discovery. Dasgupta et al. in [20] proposed a multi-agent based distributed platform for efficient semantic service discovery. They also have addressed the scalability problem of discovery approaches. Abundance of Web services with the same functionality over the Internet and the daily additions has become a problem that causes longer response time for matchmaking algorithms. In order to overcome this problem some of the approaches consider nonfunctional aspects of web services to filter and select more accurate services. Kritikos et al. in [21] proposed 259 October 2015, Volume 2, Number 4 (pp. 257–270) Parameter Throughput Subclass of QoS NoInputParam... Subclass of Subclass of ResponseTime Subclass of InputParameter Relational Subclass of Subclass of Subclass of Reliability Literal RDF 1 Literal XML OutputParameter 1 1 Availability Syntactical Subclass of hasType Literal hasValue RollbackCost Subclass of Subclass of Subclass of hasName hasSignature ReturnType Subclass of isCompos... Transactional 1 Thing 0 .. * has QueryFormat 0 .. * FunctionalFilter Subclass of isFilteredBy Subclass of Subclass of URI Goal Subclass of 1 Subclass of Subclass of SubclassSubclass of of belongsTo Subclass of NonFunctionalF... Subclass of Service DataSourceLo... Subclass of SQLServer Filter Signature Compensable Keyword Retriable Atomic Cancelable Figure 1. The Proposed Query Ontology three techniques that manages space of offered services to improve the overall matchmaking time. Suraci et al. in [22] proposed a distributed workload control algorithm to manage the requests. They concentrated on minimizing the overall latencies that requester agents would face. Service-based applications may need to replace the services that are no longer available or fail to meet certain requirements during the execution of application. Zisman et al. in [23] proposed a framework that supports runtime service discovery. Cassar et al. in [24] presented a method, that using probabilistic machine learning techniques, is capable of extracting hidden features from semantic web service description documents such as OWL-S and WSMO. They used these features to build a model that represents different types of service descriptions in a vector which enables heterogeneous service descriptions to be represented, compared or discovered. The majority of web services exist without explicit associated semantic descriptions that leads to not being considered as relevant to specific user’s request during web service discovery. Paliwal et al. in [25] tried to categorize web service descriptions semantically and also enhance semantics of user’s request. Semantic web services, as the most revolutionary technology, promises to enable machine to machine interaction on the web and to automate the discovery and selection of the most suitable web services in building service based applications. But most of the web service descriptions are using conventional web service annotations, e.g. WSDL that lacks semantic annotations. Farrag et al. in [26] proposed a mapping algorithm that facilitates redefinition of conventional web service annotations using semantic annotations such as OWL-S. Sangers et al. in [27] proposed a semantic web service discovery framework in order to find services using natural language processing techniques. They used part-of-speech tagging, lemmatization, and word sense disambiguation techniques to match keywords from the user’s request with the web service descriptions given in WSMO. Klusch et al. in [28] proposed a hybrid semantic web service matchmaker, called OWLSMX, that improves logic-based semantic matching of OWL-S services by leveraging non-logic-based information retrieval techniques. In [29] the analysis of experimental evaluations of performance of OWLSMX is summarized and an improved matchmaker is implemented. In some QoS-based approaches in web service dis- 260 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian covery and selection, it is assumed that the QoS data expressed by service providers are trustworthy and effective. However, the values of QoS attributes may be unreliable. El-Kafrawy et al. in [30] proposed a web service discovery and selection model that considers trust factor when calculating web service reputation. Lin et al. in [31] proposed a mechanism in which web service discovery is achieved in two phases based on QoS and collaborative filtering. Their mechanism verifies QoS values stated by service providers. Sunchithra et al. in [32] used a third party between provider and consumer to check for integrity of QoS values and calculate some implicit metrics. In [33] web service discovery is carried out by considering four similarity assessments to match services. They used WordNet and HowNet to calculate the distance between two words in a hierarchy concept. 3 WSDL Document Structure WSDL is a language based on XML to describe nonsemantic web services. It provides descriptions about the operations that a web service can perform as well as its location. Inside the root element of a WSDL file, services are defined using six major components [34]: • <types> describes a container for defining data types using some type system such as XML Schema Definition (XSD). • <message> represents an abstract data that is transmitted. A message consists of logical parts, each of which is associated with a definition within some type system. • <portType> is a set of abstract operations. Each operation refers to an input message and/or output message. • <binding> specifies concrete protocol and data format specifications for the operations and messages defined by a particular portType. • <port> specifies an address for a binding, thus defines a single communication endpoint. • <service> aggregates a set of related ports. 4 Query Ontology We proposed an ontology (Figure 1) to formally define the user query for service matchmaking. The ontology structure states that every query is a workflow (composition) of one or more tasks that the user requires. Each task in the query has a name and a description that describes its function using natural language comments. Those comments include some essential keywords, as well as parameters that are categorized into input parameters and output parameters. For every task parameter, its type should be mentioned. Parameter name is also considered because it may contain useful words which are relevant. We also consider Quality of Service (QoS) criteria as an option that the user can consider. We created a tool for creating queries based on the ontology. User’s query is stored as RDF statements for further processing. As an example, consider the user requests for a task to check weather information for a city, so he/she prepares his request using query creating tool. Request is a web service that is capable of retrieving weather information for a city that the user enters as a string. Request is stored as RDF statements to be processed next. In order to transform the user query to RDF statements we use query ontology axioms to prepare instance statements by creating individuals from ontology resources and then establishing relationships between individuals using predefined predicates in ontology. In definition of concepts in ontology we set constraints which allow us to check them on newly added statements. We use textual data in a request as literal values of statements. After transformation, in request process phase reasoning over RDF statements is done and the consistency among statements is checked. If no inconsistency is found, it means user’s request is logically correct and is ready to be delivered to a web service discovery system. We use Pellet reasoner in Jena to check the consistency of user request. We first load the query ontology in a model and then add RDF statements created from user request to it. Reasoning is done over the model which contains query ontology schema and also newly added statements from the user request. In Figure 2 a fragment of RDF file containing user information is presented. The steps for preparing the request are illustrated in Figure 3. 5 Matchmaking Approach Matchmaking process is carried out in three phases. First functional matching is done that matches services functionally, then the syntactical matching of services with the same functional capabilities is done and at the end QoS matching of services is considered, i.e. to find services with the required QoS capabilities according to the user expectations. Figure 4 shows the sequence of matchmaking process. We use the information available in WSDL document to perform matchmaking. We parse the WSDL document to retrieve features that describe the function of the web service semantically. Features of interest are service name, operation names, operation documentations, parameter names and types, and service documentation. We use root element, service element, types element, message element, and portType element to extract our required features. These are the features used to compare user’s query with web services. October 2015, Volume 2, Number 4 (pp. 257–270) Figure 2. A Fragment of the User Request Predefined Statements Mapping Statements Query Ontology Reject Reasoning verified Discovery System Figure 3. Steps for Preparing the User Request Figure 4. The Sequence of Matchmaking Process In fact the logic of our matchmaking algorithm is based on this reasoning that when user describes his required task, he uses some keywords to explain it. Obviously the keywords are nouns and verbs of the sentences (just content words, not function words) and we try to find synonym matches for these keywords among the keywords extracted from documents used to describe web services. In case the services are not annotated with documentations, only service, parameter and operation names are used to extract the 261 262 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian Requested Tasks Query Ontology WSDL Documents ? WSDL Parser Request Parser WordNet Service 1 Service 2 Description ……. Task 2Description Task 1 TaskTask name2 …... Task name DescriptionTask n Task name Description Input Task name Description Parameter Input1 Description Parameter …... Input1 Parameter Parameter …...kInput1 Parameter Parameter …...k 1 Output Parameter …...k Parameter Output Parameter k Parameter Output Parameter Output Parameter …... Task name Service n Task 2Description TaskTask namen…... Task name Description name Taskn 2Description Input TaskTask name…... Description Task Task name InputDescription Operation 1 name Description Input TaskTask Task namen…... Parameter InputDescription 1 Task name InputDescription Parameter …... InputDescription 1 Input Task name Operation Task name n Parameter InputDescription 1 Parameter Parameter …...k 1 InputDescription Operation name Description Parameter …... Input 1 Input Output Parameter …...k Parameter InputDescription 1 Parameter Parameter …...k 1 Input Output Parameter k Parameter …... InputDescription 1 Output Parameter …... k Parameter Input1 Parameter Output Parameter Parameter …...k 1 Output Parameter k Parameter …... Input1 Parameter Output Output Parameter …...k Parameter Output Parameter Parameter …... k 1 Parameter Output Parameter k Parameter Output Output Parameter …...k Parameter Output Parameter Output Parameter m Parameter Output Parameter Output Parameter Parameter Output Parameter Matchmaker Response Figure 5. Architecture of the Proposed Matchmaker keywords. By comparing the user’s query content with the extracted features semantically, services with the required functionality are matched to the user’s query. Next we compare matched services to the user’s query according to the operation signatures. To match the signatures, we consider the number and the types of the input and output parameters for the request and the service. Finally we match the QoS criteria issued in the request to the QoS parameters in the service. So the most relevant web services are returned. Figure 5 shows the proposed matchmaking method. 6 tations, parameter names and parameter types of the input and output parameters for every operation in WSDL documents. We process this information to retrieve the required features. A. Processing user query A user query is a collection of RDF statements that is queried using SPARQL. To do so we use the Jena API 1 and we make some query to extract information from the user request. Features that we need are: (1) User’s description for the task. We process this description using OpenNLP API. Processing description is as follows: a) Extracting sentences. We split text into sentences. b) Tokenizing every sentence. c) Tagging every token. d) Extracting keywords using tags. We extract just words that are tagged as noun or verb, so function words are not retrieved. Extracting Features In order to match the query and the services, we need to extract information from the user query and WSDL documents. The required information are: task names, task descriptions, Input and Output parameter names and types from the user query, and the QoS values specified in the request as well as service name, service documentation, operation names, operation documen1 https://jena.apache.org/ October 2015, Volume 2, Number 4 (pp. 257–270) e) Preprocessing keywords. We preprocess keywords to remove the synonym keywords from the list of obtained keywords. (2) Task names. Every task name is comprised of some meaningful words that states the function of the task. (3) Signature of the required tasks. This is the information about the name, the number and the types of the input and the output parameters per task. (4) Quality of Service parameters. We use these parameters from the user request to compare them with the corresponding values which were calculated previously for every service. ters from them. Processed words are checked to be meaningful in English and so the words that are special names are ignored and the others are added to a list to be compared to user’s keywords semantically. In order to check that a word is meaningful or not we use WordNet. We use JAWS 3 API to determine the synonym set of a given word. If the returned synonym set is empty, it means that the given word is not an English word. Keywords gathered from operation names and parameter names are added to a separate vector to be compared to keywords gathered from task names and their parameters from user request. Service names try to describe the overall functions of services, so we add the keywords that are extracted from the service name to the vector of keywords extracted from web service documentation. (3) Extracting and processing operation documentation. Every operation may have a documentation that describes its function. We extract this documentation and do the same work as presented in step 1. We collect the keywords from this documentation and web service documentation in a list to be processed next. (4) Extracting signatures of operations. We extract the types of the input parameters as well as the output parameters to compare with user query’s signature(s). B. Parsing WSDL documents We need some essential features in WSDL documents that explain the function of the corresponding web service. Features like service name, service documentation, operation names, operation documentations and parameter names. We also extract operation signatures in order to compare to the user’s request. In order to parse WSDL document we use Membrane SOA Model 2 that is a Java API for WSDL and XML Schema. Parsing is as follows: (1) Extracting and processing Web service documentation. Web service documentations in a WSDL document may be repeated and also may contain some useless information like URL and contact number. In order to remove this information we use Regular Expressions in Java by accommodating their patterns to recognize them inside the text and then removing them. Then we process the description as a) Extracting sentences. b) Tokenizing every sentence. c) Tagging every token. d) Extracting keywords using tags. We extract just the words that are tagged as noun or verb, so function words are not retrieved. e) Preprocessing keywords. Like preprocessing the keywords obtained from the request we also preprocess these keywords to remove the synonym keywords from the list of obtained keywords. (2) Extracting and processing service name, operation names and parameter names. Service name, operation names and parameter names in WSDL documents follow the Camel Case naming convention, so we use the Camel Case writing pattern to split names into their formative words. These extracted words may contain non-letter characters at the beginning or at the end of the word, so we check them to delete extra charac2 http://www.membrane-soa.org/soa-model/ 7 Established Realations Assume that the user’s query consists of n tasks T = {t1 , t2 , · · · , tn }. The similarity measure Similarity(ti , sj ) between task ti from the user request and service sj is as follows: Similarity(ti , sj ) = Φ1 S(Wti , Wj ) + Φ2 Simname (ti , sj ) + Φ3 M atchsignature (ti , sj ) + Φ4 M atchQoS (ti , sj ) (1) In Equation (1), Φ1 = Φ2 = Φ3 = Φ4 = 0.25 are weights of the assessment methods and Wti and Wj are keywords gathered from the user query documentation of task ti and the keywords of the embedded documentations and the service name in service sj respectively. S (Wti , Wj ) computes the similarity between two list of gathered keywords and is calculated as follows: |Wti ∩ Wj | S (Wti , Wj ) = (2) |Wti | 3 http://lyle.smu.edu/~tspell/jaws/ 263 264 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian In Equation (2), |Wti ∩Wj | is the cardinality of the set of common keywords in Wti and Wj that semantically are the same and is calculated as follows: If Wti = {wi1 , wi2 , · · · , win } contains n keywords and Wj = {wj1 , wj2 , · · · , wjm } contains m keywords, then if we define CK as the set of common keywords and Synset(x) as the set of synonyms of keyword x, then the relationship defined in 3 is established: CK ≡ {w ∈ (Wti ∩ Wj )|w ∈ Wti ∧ ∃x ∈ Wj : w ∈ Synset(x)} (3) In Equation (1), Simname (ti , sj ) computes the similarity between the names of task ti and the operation names of service sj and their parameter names. We calculate this similarity using Equation (4) as follows: Simname (ti , sj ) = max (S(T N i , Ojk )) k=1···n (4) In which, T N i is a list containing the name of task ti and its parameters split into meaningful words, and Ojk is a list of the name of k th operation in service sj and its parameters split into meaningful words. In order to match the signature of user query task ti with an operation signature in sj we use M atchsignature (ti , sj ) in Equation (1). We check signatures according to the number and types of input parameters and output parameters. In Equation (5) we show how to match signatures as follows: M atchsignature (ti , sj ) = max (matching(ti , ojk )) k=1···n (5) In which, Ojk is kth operation in service sj and ti is the user’s required task. And matching(ti , ojk ) is calculated as follows: M (ti , ojk ) matching(ti , ojk ) = (6) avg(E(ti ), E(ojk )) Where, M (ti , ojk ) is the number of matched types between ti and ojk and E (ti ) , E(ojk ) are the total number of types in ti and ojk respectively. Quality of Service (QoS) refers to non-functional attributes of web services. Main attributes are as follows [35]: • Availability : it is the quality aspect of service which concerns about presence and being ready for immediate use of web service. Larger values indicates higher availability. • Response Time: it is measured as the time a service takes to respond to requests. • Throughput: it is defined as the rate at which a service is able to process requests. • Reliability : it is a quality aspect of a web service defined as the probability that the web service can respond to its users without failure. • Security : comprises the existence and authentication mechanisms offered by a service. In Equation (1), we calculate M atchQoS (ti , sj ) using the following formula: M atchQoS (ti , sj ) = n X ωi simAi (ti , sj ) (7) i=1 In Equation (7), n equals to the number of considered QoS attributes, Ai is the ith attribute and ωi = n1 is the weight of given attribute. Method simAi (ti , sj ) calculates the amount of similarity between attribute Ai in task ti and service sj and for all of attributes is defined as follows [33]: simAi (ti , sj ) = 1 − |ti .Ai − sj .Ai | 2 − ti .Ai − sj .Ai (8) For example if we only consider response time and throughput attributes then: M atchQoS (ti , sj ) = 2 X ωi simAi (ti , sj ) i=1 = 0.5 × simresponseT ime (ti , sj ) + 0.5 × simthroughput (ti , sj ) (9) In which: simresponseT ime (ti , sj ) 1 − |ti .responseT ime − sj .responseT ime| = (10) 2 − ti .responseT ime − sj .responseT ime And simthroughput (ti , sj ) 1 − |ti .throughput − sj .throughput| = 2 − ti .throughput − sj .throughput (11) A. Case stady Consider user requests for a task in a service to calculate summation of two numbers using a method called sum that gets two float arguments as input and a float as result, with a description as “adds two numbers.” and some QoS expected. In order to calculate Similarity(ti , sj ) first the content words are extracted from the description which includes “adds” and “numbers” are extracted. These words are used to be matched against the content words previously extracted from web service documentations. The amount of semantic matching is calculated as S (Wti , Wj ). Then the word “sum” as a task name is processed and is compared semantically to every operation name in the given web service. This value calculates Simname (ti , sj ). At the end, the amount of matching between task signature and operation’s signatures (M atchsignature (ti , sj )) and also M atchQoS (ti , sj ) are calculated by comparing the types of the input and the output parameters 265 October 2015, Volume 2, Number 4 (pp. 257–270) and matching QoS attributes. By assigning the same weight to every similarity measure, the total similarity, Similarity(ti , sj ), for every web service to user request is calculated. 8 Experiments and Results Our experiments are based on a benchmark of 3738 real (online) web services collected from all over the world [36]. In this benchmark the QoS of response time and throughput attributes have been calculated. We manually classified web services into calculator, email validation, SMS, and weather categories. We have done two series of experiments. Our experiments are based on sixty five manually classified WSDL documents from the mentioned benchmark shown in Table 1. We identified service descriptions from four categories: calculator (twenty two services), email validation (six services), SMS (twenty five services), and weather (twelve services). In this section a set of experiments are reported: our matchmaker implemented with four similarity assessment methods. Summation of the values gathered from Equation (2) and Equation (4) calculates the amount of functional similarity between the request and services. The third method in Equation (5) calculates the amount of signature matching between the request and services. And Equation (7) filters the proposed services to match the user request more accurately. To evaluate the effectiveness of our proposed matchmaker we used precision and recall metrics. “Precision is the proportion of retrieved documents that are relevant, and recall is the proportion of relevant documents that are retrieved” [33]. We used the following Equations: above. Experiments showed our approach has better performance. In [37], the proposed approach focuses on finding web services that have exactly the same name and description and web services with the same function but described using different synonym words have less importance. Also they have used hierarchical structures between words that relate them with irrelevant functional semantics. For example words like “get” and “add” shouldn’t be compared. In our experiments we can see that in some cases, similarity values are rather low. In our opinion, the reason may be some faulty information in WSDL documents, for example an operation named “calcAdd” is split to “calc” and “Add” and because “calc” is meaningless in English, it is deleted. Or for another example in a WSDL document that have operations named “add”, “sub”, “mult”, and “divide”, the operation names “sub” and “mult” are discarded because they are meaningless. By finding the best synonym words and replacing them with the abbreviated words, as a solution to these problems, the results will be improved. We also compared our approach to goDiscovery [11], a non-semantic web service discovery approach based on WSDL which uses statistical methods and indexing techniques to improve precision and response time of discovery process. The same benchmark as ours has been used to evaluate goDiscovery. In order to compare our work we use F score measure that is a measure to evaluate the overall system accuracy. F score is a weighted average of Precision (P) and Recall (R) and is calculated as follows [11]: F score = 2 ∗ Recall = |true positive| |true positive| + |f alse negative| P recision = (12) |true positive| |true positive| + |f alse positive| (13) Figure 6 shows the results of our experiments. We ran different queries over four categories to obtain these results. The web services were given similarity values to the request. In our experiments we considered services whose similarities to the request were bigger than 30%. This threshold was obtained empirically as it fitted our experiments well. We compared our approach to a similar one. We implemented the approach in [37] and performed our experiments on sixty five WSDL documents, mentioned P ∗R P +R (14) Figure 7 shows the F score for our approach, goDiscovery [11] and approach in [37]. Results show that our approach outperforms the other approaches according to the overall accuracy. In our approach we focus on matching services semantically with the same functionality. We consider both exact and functionally alike results to the request. In [37], the focus is on finding the exact matches with the request. Therefore some of the potential results are not considered. goDiscovery is not a semantic-based approach but a statistical and indexing approach that is based on the retrieved information of current document. It uses them to build its model. 266 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian Table 1. Manually Classified Web Services Category WSDL URI http://hs19.iccs.bas.bg/work/nusoap/nusoap.0.7.2/lib/addexample2.php?wsdl http://www.vejlebib.dk/webservices/mathservice.asmx?WSDL http://www.fasil.dk/CalculationService.asmx?WSDL http://mathertel.de/AJAXEngine/S01_AsyncSamples/CalcService.asmx?WSDL http://fhrg.first.fraunhofer.de:8080/linuxtoolbox/services/Calculator?wsdl http://www.tou.it/WebServices/MathService.asmx?WSDL http://www.html2xml.nl/Services/Calculator/Version1/Calculator.asmx?WSDL http://www.pggmlevensloopcalculator.nl/CalculatorService.asmx?WSDL http://wiki.vanguardsw.com/bin/ws.dsb?wsdl/Finance/Accounting/House%20Payment%20Calculator http://aspalliance.com/quickstart/aspplus/samples/services/MathService/VB/MathService.asmx?wsdl http://www.deeptraining.com/webservices/mathservice.asmx?WSDL Calculator http://websrv.cs.fsu.edu/~engelen/calc.wsdl http://www.learnasp.com/quickstart/aspplus/samples/services/MathService/VB/MathService.asmx?WSDL http://www.development.ccs.neu.edu/home/gali/project/math.asmx?WSDL http://www.cazella.com/math.asmx?WSDL http://www.etecnologia.net/samples/webservices/mathservice.asmx?WSDL http://www.fitnesswizard.com/Services/CalculationService.asmx?WSDL http://www.huberstyle.com/slxws/calc.asmx?wsdl http://reliablesoftware.net/add.asmx?WSDL http://www.ecs.syr.edu/faculty/fawcett/handouts/CSE686/code/calcWebService/Calc.asmx?WSDL http://soatest.parasoft.com/axis/calculator.wsdl http://wiki.vanguardsw.com/bin/ws.dsb?wsdl/Operations/Inventory%20Management/EOQ%20Calculator http://webservices.data-8.co.uk/EmailValidation.asmx?WSDL http://ws.serviceobjects.com/ev2/emailvalidation2.asmx?WSDL Email http://ws.serviceobjects.com/ev/EmailValidate.asmx?WSDL Validation http://ws2.serviceobjects.net/ev/EmailValidate.asmx?wsdl http://ws2.serviceobjects.net/ev2/emailvalidation2.asmx?WSDL http://wsext.ncm.com/emailvalidation.asmx?WSDL http://sms.utnet.cn/smsservice2/v2/smssend2.asmx?WSDL http://sms.utnet.cn/SmsService2/SmsSend.asmx?WSDL http://www.barnaland.is/dev/sms.asmx?WSDL http://info-messenger.de/sms.asmx?wsdl http://www.mediestrategi.dk/services/sms_server.php?wsdl http://sms.cellcom.co.il/SmsGate/SmsGate.asmx?wsdl http://sms.cellcom.co.il/SmsGate/SmsGate2.asmx?WSDL http://wap.icellcom.co.il/SmsGate/SmsGate2.asmx?wsdl http://wap.icellcom.co.il/SmsGate/SmsGate.asmx?WSDL http://usm.unicell.co.il/sms.asmx?WSDL http://extra.vianett.no/webservices/vianett_sms_service/SMS_Service.asmx?WSDL http://avalon.bitsyu.net/GamaSMS/SMSService.asmx?WSDL SMS http://www.smsdome.com/portalvbvs/services/smsgateway.asmx?wsdl http://www.infocountry.co.za/za/default/webservice/SMSService.asmx?WSDL http://chat.siberalem.com/SMS.asmx?wsdl http://ws.cdyne.com/SmsWS/SMS.asmx?wsdl http://ser02.2sms.com/WebServices/1.0/SMSService.asmx?WSDL http://www.2sms.com/WebServices/1.1/SMSService.asmx?WSDL http://2sms.com/WebServices/1.4/SMSService.asmx?WSDL http://www.jajah.com/api/SMSService.asmx?wsdl http://xms.admad.mobi/SMS.asmx?WSDL http://www.geekymonkey.com/sms/smsservice.asmx?wsdl http://sms.mobileart.com/sms_receive.wsdl http://sms.yakoon.com/sms.asmx?WSDL http://203.113.172.33/mssgateway/SMS_WS.asmx?WSDL http://netpub.cstudies.ubc.ca/dotnet/WebServices/WeatherService.asmx?wsdl http://www.thinkpage.cn/weather/WeatherService.asmx?WSDL http://www.premis.cz/PremisWS/WeatherWS.asmx?WSDL http://www.webservicex.net/WeatherForecast.asmx?wsdl http://ws.cdyne.com/WeatherWS/Weather.asmx?wsdl http://oscar.snapps.com/portal.nsf/WeatherService?WSDL Weather http://asyncpostback.com/WeatherService.asmx?WSDL http://wsrp.bea.com/portal/boulder/weather.wsdl http://lostsprings.com/weather/WeatherService.asmx?WSDL http://www.riptrails.com/webservices/Weather.asmx?WSDL http://weather.shellware.com/weather.asmx?wsdl http://www.deeptraining.com/webservices/weather.asmx?wsdl 267 October 2015, Volume 2, Number 4 (pp. 257–270) Recall Precision Recall [37] Precision [37] 100 90 80 Percent 70 60 50 40 30 20 10 0 Weather Calculator Email Validation SMS Service Category Figure 6. Results of the First Experiment 100 95 Average F_score 90 90.66 91.81 approach in [37] goDiscovery 93.18 85 80 75 70 65 60 our approach Approach Figure 7. Comparison of Overall Accuracy of Three Approaches 9 Conclusion and Future Work In this paper we make complementary contributions to web service matchmaking. We design an ontology to formulate a user’s query structure. To do so, we propose a Query Ontology that persuades users to prepare a structured request with full information needed to do matchmaking. It also verifies the user request logically before delivering it to matchmaking process which is a time consuming process. So requests with no answer are discarded before processing, which helps matchmaking algorithms to process only appropriate requests and save time. We also propose a semantic-based matchmaking algorithm based on WSDL documents to discover services based on their functionalities. Service matchmaking categorizes web services into four categories and for each category a similarity assessment method is given. Experimental results show that our approach has an improved precision in retrieving services in comparison to previous approaches. It is hard to push programmers to use a formal specification for each web service they write. There are problems like respecting Camel Case writing convention used by programmers, using abbreviated words in defining web services, and considering relationship between words which belong to a specific domain. In the future we plan to conquer such problems and obstacles in our matchmaking method to get better results. References [1] Debajyoti Chougule. Mukhopadhyay and Archana A survey on web service discov- 268 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian [2] [3] [4] [5] [6] [7] [8] [9] [10] ery approaches. In Advances in Computer Science, Engineering & Applications, pages 1001–1012. Springer, 2012. URL http: //link.springer.com/chapter/10.1007/ 978-3-642-30157-5_99. Khalid Elgazzar, Hossam S. Hassanein, and Patrick Martin. Daas: Cloud-based mobile web service discovery. Pervasive and Mobile Computing, 13:67–84, 2014. URL http://www.sciencedirect.com/science/ article/pii/S1574119213001405. Eyhab Al-Masri and Qusay H. Mahmoud. Investigating web services on the world wide web. In Proceedings of the 17th international conference on World Wide Web, pages 795– 804. ACM, 2008. URL http://dl.acm.org/ citation.cfm?id=1367605. Khalid Elgazzar, Ahmed E. Hassan, and Patrick Martin. Clustering wsdl documents to bootstrap the discovery of web services. pages 147–154. IEEE, 2010. ISBN 9781-4244-8146-0. doi: 10.1109/ICWS.2010.31. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=5552792. K. Chandrasekaran. Essentials of Cloud Computing. CRC Press, 2014. ISBN 978-1-4822-0543-5. G. Khan, S. Sengupta, and A. Sarkar. Wsrm: A relational model for web service discovery in enterprise cloud bus (ecb). In 2014 3rd International Conference on Eco-friendly Computing and Communication Systems (ICECCS), pages 217– 222, 2014. doi: 10.1109/Eco-friendly.2014.51. G. Sambasivam, V. Ravisankar, T. Vengattaraman, R. Baskaran, and P. Dhavachelvan. A normalized approach for service discovery. Procedia Computer Science, 46:876–883, 2015. ISSN 18770509. doi: 10.1016/j.procs.2015.02. 157. URL http://linkinghub.elsevier.com/ retrieve/pii/S1877050915002215. David Martin, Mark Burstein, Jerry Hobbs, Ora Lassila, Drew McDermott, Sheila McIlraith, Srini Narayanan, Massimo Paolucci, Bijan Parsia, Terry Payne, et al. Owl-s: Semantic markup for web services (2004). Latest version available from: http://www. ai. sri. com/daml/services/owl-s/1.2, 2009. Holger Lausen, Axel Polleres, and Dumitru Roman. Web service modeling ontology (wsmo). W3C Member Submission, 3, 2005. Tom Narock, Victoria Yoon, and Sal March. A provenance-based approach to semantic web service description and discovery. Decision Support Systems, 64:90–99, 2014. ISSN 01679236. doi: 10.1016/j.dss.2014.04.007. URL http: //linkinghub.elsevier.com/retrieve/pii/ S0167923614001286. [11] Yehia Elshater, Khalid Elgazzar, and Patrick Martin. godiscovery: Web service discovery made efficient. pages 711–716. IEEE, 2015. ISBN 978-1-4673-7272-5. doi: 10.1109/ICWS.2015.99. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=7195634. [12] John Garofalakis, Yannis Panagis, Evangelos Sakkopoulos, and Athanasios Tsakalidis. Contemporary web service discovery mechanisms. J. Web Eng., 5(3):265–290, 2006. URL http://www.researchgate.net/ profile/Yannis_Panagis/publication/ 220538231_Contemporary_Web_ Service_Discovery_Mechanisms/links/ 0fcfd50c04ce0afae6000000.pdf. [13] Benjamin Kanagwa and Agnes F. N. Lumaala. Discovery of services based on wsdl tag level combination of distance measures. International Journal of Computing and ICT Research, 5, 2011. URL http://ijcir.mak.ac.ug/ specialissue2011/article3.pdf. [14] Jian Wu, Liang Chen, Yanan Xie, and Zibin Zheng. Titan: a system for effective web service discovery. In Proceedings of the 21st international conference companion on World Wide Web, pages 441–444. ACM, 2012. URL http: //dl.acm.org/citation.cfm?id=2188069. [15] Hong-Hai Do and Erhard Rahm. Coma: a system for flexible combination of schema matching approaches. In Proceedings of the 28th international conference on Very Large Data Bases, pages 610–621. VLDB Endowment, 2002. URL http: //dl.acm.org/citation.cfm?id=1287422. [16] AnHai Doan, Pedro Domingos, and Alon Y. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In ACM Sigmod Record, volume 30, pages 509– 520. ACM, 2001. URL http://dl.acm.org/ citation.cfm?id=375731. [17] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic schema matching. the VLDB Journal, 10(4):334– 350, 2001. URL http://link.springer.com/ article/10.1007/s007780100057. [18] Xia Wang and Wolfgang A. Halang. Discovery and Selection of Semantic Web Services, volume 453 of Studies in Computational Intelligence. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-339370. URL http://link.springer.com/10.1007/ 978-3-642-33938-7. [19] Qiang He, Jun Yan, Yun Yang, Ryszard Kowalczyk, and Hai Jin. A decentralized service discovery approach on peer-to-peer networks. IEEE Transactions on Services Computing, 6(1):64–75, 2013. ISSN 1939-1374. doi: 10.1109/TSC.2011.31. October 2015, Volume 2, Number 4 (pp. 257–270) [20] [21] [22] [23] [24] [25] [26] URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=5928313. Sourish Dasgupta, Anoop Aroor, Feichen Shen, and Yugyung Lee. Smartspace: Multiagent based distributed platform for semantic service discovery. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(7):805–821, 2014. ISSN 2168-2216, 2168-2232. doi: 10.1109/TSMC.2013.2281582. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=6616615. Kyriakos Kritikos and Dimitris Plexousakis. Novel optimal and scalable nonfunctional service matchmaking techniques. IEEE Transactions on Services Computing, 7(4):614–627, 2014. ISSN 1939-1374. doi: 10.1109/TSC.2013.11. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=6464254. Vincenzo Suraci, Claudio Gori Giorgi, Stefano Battilotti, and Francisco Facchinei. Distributed workload control for federated service discovery. Procedia Computer Science, 56:233–241, 2015. ISSN 18770509. doi: 10.1016/j.procs.2015.07. 221. URL http://linkinghub.elsevier.com/ retrieve/pii/S1877050915017020. Andrea Zisman, George Spanoudakis, James Dooley, and Igor Siveroni. Proactive and reactive runtime service discovery: A framework and its evaluation. IEEE Transactions on Software Engineering, 39(7):954–974, 2013. ISSN 0098-5589, 1939-3520. doi: 10.1109/TSE.2012.84. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=6375710. Gilbert Cassar, Payam Barnaghi, and Klaus Moessner. Probabilistic matchmaking methods for automated service discovery. IEEE Transactions on Services Computing, 7(4):654–666, 2014. ISSN 1939-1374. doi: 10.1109/TSC.2013.28. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=6515118. A. V. Paliwal, B. Shafiq, J. Vaidya, Hui Xiong, and N. Adam. Semantics-based automated service discovery. IEEE Transactions on Services Computing, 5(2):260–275, 2012. ISSN 1939-1374. doi: 10.1109/TSC.2011.19. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=5744076. Tamer Ahmed Farrag, Ahmed Ibrahim Saleh, and Hesham Arafat Ali. Toward swss discovery: Mapping from wsdl to owl-s based on ontology search and standardization engine. IEEE Transactions on Knowledge and Data Engineering, 25(5):1135–1147, 2013. ISSN 1041-4347. doi: 10.1109/TKDE.2012.25. URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper.htm?arnumber=6152104. [27] Jordy Sangers, Flavius Frasincar, Frederik Hogenboom, and Vadim Chepegin. Semantic web service discovery using natural language processing techniques. Expert Systems with Applications, 40(11):4660–4671, 2013. ISSN 09574174. doi: 10.1016/j.eswa.2013.02. 011. URL http://linkinghub.elsevier.com/ retrieve/pii/S0957417413001279. [28] Matthias Klusch, Benedikt Fries, and Katia Sycara. Owls-mx: A hybrid semantic web service matchmaker for owl-s services. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2):121–133, 2009. URL http://www.sciencedirect.com/ science/article/pii/S1570826808000838. [29] Matthias Klusch, Patrick Kapahnke, and Benedikt Fries. Hybrid semantic web service retrieval: A case study with owls-mx. In Semantic Computing, 2008 IEEE International Conference on, pages 323–330. IEEE, 2008. [30] Passent El-Kafrawy, Emad Elabd, and Hanaa Fathi. A trustworthy reputation approach for web service discovery. Procedia Computer Science, 65:572–581, 2015. ISSN 18770509. doi: 10.1016/j.procs.2015.09.001. URL http: //linkinghub.elsevier.com/retrieve/pii/ S1877050915028318. [31] Szu-Yin Lin, Chin-Hui Lai, Chih-Heng Wu, and Chi-Chun Lo. A trustworthy qos-based collaborative filtering approach for web service discovery. Journal of Systems and Software, 93:217–228, 2014. ISSN 01641212. doi: 10.1016/j.jss.2014.01. 036. URL http://linkinghub.elsevier.com/ retrieve/pii/S0164121214000442. [32] M. Suchithra and M. Ramakrishnan. Efficient discovery and ranking of web services using nonfunctional qos requirements for smart grid applications. Procedia Technology, 21:82–87, 2015. ISSN 22120173. doi: 10.1016/j.protcy.2015.10. 013. URL http://linkinghub.elsevier.com/ retrieve/pii/S2212017315002418. [33] Jian Wu and Zhaohui Wu. Similarity-based web service matchmaking. In Services Computing, 2005 IEEE International Conference on, volume 1, pages 287–294. IEEE, 2005. [34] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web services description language (wsdl) 1.1, 2001. [35] Daniel Menasce. Qos issues in web services. Internet Computing, IEEE, 6(6):72–75, 2002. [36] Yilei Zhang, Zibin Zheng, and Michael R. Lyu. Wsexpress: A qos-aware search engine for web services. In Web Services (ICWS), 2010 IEEE International Conference on, pages 91–98. IEEE, 2010. [37] Yiqiao Wang and Eleni Stroulia. Semantic 269 270 Web Service Matchmaking based on Functional Similarity in Service Cloud — K. Moradyan and O. Bushehrian structure matching for assessing web-service similarity. In Service-Oriented ComputingICSOC 2003, pages 194–207. Springer, 2003. URL http://link.springer.com/chapter/ 10.1007/978-3-540-24593-3_14. Kourosh Moradyan received his Associate Degree in computer software from University of Kurdistan, Sanandaj, IRAN and his B.Sc. degree in software engineering from Zagros Institute of Higher Education, Kermanshah, IRAN. He also received his M.Sc. degree in software engineering from Shiraz University of Technology, Shiraz, IRAN. His research interests include Semantic Web, and Web Service Discovery. Omid Bushehrian received the BSc in software engineering from AmirKabir University of Technology (Tehran Polytechnics), IRAN, the MSc and Ph.D degrees in software engineering from the University of Science and Technology, IRAN. He is currently a faculty member at Shiraz university of Technology, IRAN. His main research interest is cloud computing and software quality.