Ontology based semantic data validation
Centre for Knowledge Analytics and Ontological Engineering
http://www.kanoe.org
(World Bank/TEQIP II Funded)

MOEKA - Memoranda in Ontological Engineering and Knowledge Analytics
MOEKA.2014.4

Ontology-Based Semantic Data Validation
Raksha P. S.

PES UNIVERSITY
(Established under Karnataka Act No. 16 of 2013)
Ring Road, Banashankari III Stage, Bangalore-560 085, India

Publication of this Technical Report and the research work presented here was supported in part by the World Bank/Government of India research grant under the TEQIP programme (subcomponent 1.2.1) to the Centre for Knowledge Analytics and Ontological Engineering (KAnOE), http://kanoe.org, at PES University (formerly PES Institute of Technology), Bangalore, India.

TABLE OF CONTENTS
Abstract
1. Introduction
2. Literature survey
2.1 Description logics as ontology languages for the semantic web
2.2 An OWL DL reasoner - Pellet
2.3 Tableaux algorithm for description logic
2.4 HermiT - hypertableau algorithm for reasoning in description logic
3. System design
3.1 Basic design
4. Detailed design
4.1 System architecture
4.2 Flow chart
4.3 Database schema
5. Implementation
5.1 Sources of conflict
5.2 Creating an instance
5.3 Deleting an instance
5.4 Adding or editing property value
5.4.1 Adding or editing datatype property
5.4.2 Adding or editing object property
5.5 Deleting a property value
5.5.1 Deleting datatype property value
5.5.2 Deleting object property value
6. Software testing
6.1 Test cases and results
7. User manual
8. Conclusion and future work
8.1 Conclusion
8.2 Future work
References

LIST OF FIGURES
Figure 1 Basic design 1
Figure 2 Basic design 2
Figure 3 System architecture
Figure 4 Process flow chart
Figure 5 User input flow chart
Figure 6 Flow chart for editing instance
Figure 7 Creating an instance
Figure 8 Deleting an instance
Figure 9 Adding or editing datatype property value
Figure 10 Adding and editing object property value
Figure 11 Deleting datatype property value
Figure 12 Deleting object property value
Figure 13 Input an ontology file
Figure 14 Options for a class
Figure 15 Details of new instance
Figure 16 Cannot create 2 instances with same name
Figure 17 Edit an instance
Figure 18 Add property value
Figure 19 Add property value successful
Figure 20 Domain violation
Figure 21 Range violation
Figure 22 Maximum cardinality violation
Figure 23 Edit property value
Figure 24 Edit property value with existing values
Figure 25 Edit property value range violation
Figure 26 Edit property value successful
Figure 27 Edit property value no values for property
Figure 28 Edit property value has value violation
Figure 29 Delete property value
Figure 30 Delete property value selecting a value
Figure 31 Delete property value successful
Figure 32 Delete property value some values from violation
Figure 33 Delete property value exact cardinality violation
Figure 34 Delete property value min cardinality violation
Figure 35 Delete an instance
Figure 36 Delete an instance successfully
Figure 37 Delete an instance no instance
Figure 38 Delete an instance minimum cardinality violation
Figure 39 Delete an instance some values from violation

LIST OF TABLES
Table 1 Concept
Table 2 Property
Table 3 Instance
Table 4 Domain
Table 5 Ranges
Table 6 Sub_class_of
Table 7 Sub_property_of
Table 8 Instance_property_value
Table 9 Concept_property_constraint
Table 10 Unit test for creating an instance
Table 11 Unit test for deleting an instance
Table 12 Unit test for adding a property for an instance
Table 13 Unit test for detecting violation while editing property value of an instance
Table 14 Unit test for detecting violation while deleting a property value of an instance
Table 15 Unit test for editing property value of an instance
Table 16 Unit test for deleting a property value of an instance
Table 17 Performance test for 10000 instance property values
Table 18 Performance test to compare with different reasoners

ABSTRACT
With the increasing impact of the internet on day-to-day life, validation of user data has become a major challenge. Although many reasonable methods are available for syntactic validation today, the available semantic validation methods still have many drawbacks and inefficiencies. Semantic validation refers to checking whether data conforms to a specified ontology. Using the Web Ontology Language (OWL) standard to represent the ontology makes the process of semantic validation more structured and efficient.

In this project, we address the issue of semantic data validation using ontological techniques. We develop an ontological tool that can be applied to any domain. The tool performs reasoning on the given ontology and stores the inferences in a database. As soon as the user enters data, the tool checks whether the data is valid against the given ontology by using these pre-computed inferences. Presently available reasoners are inefficient for this purpose, since they are designed to compute the deductive closure of a set of facts or rules. This project has designed and implemented an efficient reasoning algorithm for the specific task of semantic data validation.
CHAPTER 1
INTRODUCTION

The Internet is the fastest growing source of information today. Huge amounts of data reside on the Internet and continue to increase every day. It is very important to validate the data being added to the Internet in order to maintain the integrity and originality of the data. Validating such data poses a major challenge.

Validation of data can be either syntactic or semantic. Syntactic validation is superficial and refers to the process of verifying whether data conforms to the rules of syntax. Many methods are available for syntactic validation, most of which perform well, whereas only a few semantic validators are available. Semantic validation refers to the process of verifying that the data elements are logically valid. For example, consider a form field value that represents a date. This date value must be written according to a certain format to be syntactically valid. In order to be interpreted by the application, the date value also has to follow certain rules (for example, the date should lie in the future, or later than a particular date) to be semantically valid.

Description logic denotes a family of knowledge representation formalisms that model the application domain by defining the relevant concepts of the domain and then using these concepts to specify properties of objects and individuals occurring in the domain. A semantic data model is a software model in which the data is organized in such a way that it can be interpreted meaningfully without human intervention. The semantic web represents the web of semantic data that can be processed by machines as well as human users. To make sure that both machines and human users have a common understanding of terms, it needs ontologies in which these terms are specified precisely, and which thus establish a joint terminology between them.
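Returning to the date example above, the difference between the two kinds of validation can be sketched in a few lines. This is only an illustration: the YYYY-MM-DD format, the "must lie in the future" rule, and the function names are assumptions chosen for the example, not part of the tool described in this report.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed form-field format for this example


def is_syntactically_valid(text: str) -> bool:
    """Syntactic check: the text must parse in the assumed format."""
    try:
        datetime.strptime(text, DATE_FORMAT)
        return True
    except ValueError:
        return False


def is_semantically_valid(text: str, today: date) -> bool:
    """Semantic check (assumed rule): the date must lie strictly after `today`."""
    if not is_syntactically_valid(text):
        return False
    return datetime.strptime(text, DATE_FORMAT).date() > today


reference_day = date(2014, 1, 1)
wrong_format = is_syntactically_valid("25/12/2013")
past_date_syntax = is_syntactically_valid("2013-12-25")
past_date_semantics = is_semantically_valid("2013-12-25", reference_day)
future_date_semantics = is_semantically_valid("2014-06-01", reference_day)
```

A value such as "2013-12-25" passes the syntactic check but fails the semantic one, which is exactly the gap that semantic validation is meant to close.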
An ontology is a rigorous and exhaustive organization of a knowledge domain. Description logic provides a formal framework for the Web Ontology Language (OWL), which has been proposed as a standard. Using OWL to represent the ontology makes the process of semantic validation more organized and efficient.

Any application developed for the semantic web requires a sound and complete reasoning capability to function properly. A reasoner is a piece of software that can infer logical consequences from a set of asserted facts or axioms. There are known, effective reasoning algorithms for description logic. Some of the reasoners available today implement variants of the tableaux reasoning algorithm, where the basic idea is to compute the deductive closure of a set of facts and rules. These sets of validation rules and facts for a specific ontology are stored in the reasoner and help it infer the logical consequences. These existing reasoners are highly inefficient because they are not dedicated to a specific task. A reasoner designed for a specific task can be highly efficient and suitable for the process of semantic data validation.

KAnOE, PESIT 2013-14

In this project, we have developed a domain-independent ontological validation tool that performs the specific task of semantic data validation by implementing an efficient reasoning algorithm. The tool uses semantics and ontological techniques to implement the reasoning algorithm, which is specifically designed to perform data validation with respect to the given ontology. When an ontology file is given as input, the tool performs reasoning on the ontology and stores the inferences in the database. As soon as the user enters new data in a particular field, the tool computes, within a short time, whether the data is semantically valid with respect to the ontology.
If the data is not valid, the tool notifies the user with a suitable diagnostic message.

CHAPTER 2
LITERATURE SURVEY

2.1 Description logics as ontology languages for the semantic web:
The Semantic Web aims to build machine-understandable Web resources, whose information can then be shared and processed both by automated tools, such as search engines, and by human users. This sharing of information between different agents requires semantic mark-up, i.e., annotating Web resources with terms that describe their content. To make sure that different agents have a common understanding of these terms, one needs ontologies in which these terms are specified precisely, and which thus establish a shared terminology between the agents.

The use of ontologies in this context requires a well-designed, well-defined, and Web-compatible ontology language with supporting reasoning tools. The syntax of this language should be both intuitive to human users and compatible with existing Web standards. Its semantics should be formally specified and its expressive power should be adequate.

Reasoning is an important consideration in the design of an ontology. It can be employed in different development phases. During ontology development, it can be used to test whether concepts are consistent and to derive implied relations. In particular, one usually wants to compute the concept hierarchy. Interoperability and integration of different ontologies is also an important issue. Reasoning may also be used when the ontology is deployed, i.e., when a Web page is already annotated with its concepts. High quality ontologies are crucial for the Semantic Web, and their construction, integration, and evolution greatly depend on the availability of a well-defined semantics and powerful reasoning tools. Since description logics provide both, they are ideal candidates for ontology languages.
Description logics (DLs) are a family of knowledge representation languages that can be used to represent the knowledge of an application domain in a way that is both structured and formally well-specified. Regarding an ontology language for the Semantic Web, there was a joint US/EU initiative for a W3C ontology standard, which was for historical reasons called DAML+OIL [1]. This language has a syntax based on RDF Schema [2], and it is based on common ontological primitives from Frame Languages (which support human understandability). Its semantics can be defined by a translation into an expressive description logic known as SHIQ [3], and the developers have tried to find a good compromise between expressiveness and the complexity of reasoning. Although reasoning in SHIQ is decidable, it has a rather high worst-case complexity. Nevertheless, there is an optimized SHIQ reasoner (FaCT) [4] available, which behaves quite well in practice.

Some of the features of SHIQ make the DL expressive enough to be used as an ontology language. Firstly, SHIQ provides number restrictions that are more expressive than those of other DL variants. Secondly, SHIQ allows the formulation of complex terminological axioms. Thirdly, SHIQ also allows for inverse roles, transitive roles, and subroles. FaCT is based on the tableaux reasoning algorithm, which computes the deductive closure of the axioms, and this makes it inefficient for a specific task such as data validation.

In the paper called Description Logics as Ontology Languages for the Semantic Web [5], the authors describe what description logics are and what they can do for the semantic web. They also argue that, without the last decade of basic research in this area, description logics could not play such an important role in this domain.

2.2 An OWL DL Reasoner - Pellet:
Reasoning capability is of crucial importance to many applications developed for the semantic web.
Description logics provide sound and complete reasoning algorithms that can effectively handle the DL fragment of the Web Ontology Language (OWL) [6]. However, existing DL reasoners, most notably FaCT [4], are quite efficient but do not meet some important requirements. In general, a Semantic Web reasoner should handle individuals (provide ABox reasoning), should not make the Unique Name Assumption, should support entailment checks, should answer conjunctive ABox queries, and should work with XML Schema datatypes. Pellet [7] was developed to address these issues.

Pellet has many good features, which make it a good choice for various light-weight situations. It performs ontology analysis and repair by incorporating a number of heuristics to detect “DLizable” OWL Full ontologies and repair them. Pellet provides support for entailment in the semantic web. It also provides support for query answering using the “rolling-up” technique [7].

A knowledge base is a set of axioms and assertions, written using a specific language. The terminology, or TBox, of the knowledge base consists of the set of axioms that define new concepts. The world description, assertional knowledge, or ABox of the knowledge base consists of the set of assertions. The TBox expresses intensional knowledge, which is typically stable, whereas the ABox captures extensional knowledge, which changes as the world evolves. Pellet is based on the tableaux algorithm developed for expressive DLs. In the Pellet system, an OWL ontology is parsed into RDF triples, which in turn are converted into assertions and axioms in the knowledge base, while Pellet validates the ontology. Pellet stores the axioms about classes in the TBox component and the assertions about individuals in the ABox component.
The tableau reasoner uses the standard tableau rules and includes various standard optimizations such as dependency-directed backjumping, semantic branching, and early blocking strategies [7]. Datatype reasoning for the built-in and derived primitive XML Schema datatypes is also supported.

In the paper called Pellet: An OWL DL reasoner [7], the authors expose the capabilities of Pellet through a Java API, a command-line interface, and a web form. They use these for classification, class satisfiability testing, querying, and species validation and repair. In the paper called Pellet: A Practical OWL-DL Reasoner [8], the authors specify the architecture and various features of Pellet. While evaluating the performance of Pellet, the authors found that Pellet is not very efficient for classification. They also found that Pellet requires very little time for consistency checking. Pellet performs better than any other reasoner in query answering; on the other hand, it is not so efficient in TBox reasoning tasks.

2.3 Tableaux algorithm for Description Logic:
The realization of a sizeable Semantic Web calls for computing with very large ontologies. Recently, there has been growing attention on utilizing distributed or parallel computing schemes to speed up reasoning with semantic web ontologies. However, the rule-based inference scheme used by these methods is incomplete by its nature, and it is hard to extend to more expressive ontologies. Parallelizing the tableau algorithm exploits both its non-deterministic nature (it generates different possible tableaux) and the independence between tableau branches, and is very effective.

The basic idea of the tableau algorithm is to check concept satisfiability with respect to a knowledge base by constructing a common model of the concept and the knowledge base. A tableau algorithm for a specific DL language contains the following main elements [9]:
1. A completion graph or a tableau that represents a model of the DL language.
2. A set of tableau expansion rules to construct a complete and consistent completion graph.
3. A set of blocking rules to detect infinite cyclic models and ensure termination.
4. A set of clash conditions to detect logical contradictions.

For the description logic ALC, given an ALC TBox T and an ALC concept C, the tableau algorithm tries to construct a common model for both T and C, thereby checking the satisfiability of C with respect to T. If one such model is found, C is satisfiable; otherwise C is unsatisfiable. Before the reasoning process starts, the concepts in T and C should be transformed into Negation Normal Form (NNF), i.e., a form where negation occurs only in front of atomic concepts. Reasoning with respect to a TBox T can be reduced to reasoning with respect to an empty TBox with the internalization technique.

For an ALC knowledge base, a completion graph or a tableau T = <V, E, L> is a tree, where V is the node set, E is the edge set, and L is a function that assigns labels to each node and edge. Each node x in the tree represents an individual in the domain of the model, and the label L(x) contains all concepts of which x is an instance. Each edge <x, y> represents a set of role instances in the model, and the label L(<x, y>) contains the names of those roles. Given a concept C and a TBox T, the tableau is a tree expanded from an initial root node x0 with the help of expansion rules. To ensure termination, a node can be blocked with the subset blocking strategy. No expansion rule is applied to a blocked node. In fact, a blocked node prevents the cyclic application of tableau expansion rules, and hence represents infinitely many similar individuals in the model. An ALC tableau contains a clash if {C, ¬C} ⊆ L(x) for some node x and concept C. A tableau is consistent if it contains no clash, and is complete if no expansion rule can be applied.
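The NNF transformation and the clash condition described above can be sketched as follows. This is a minimal illustration, not the code of any reasoner discussed here: the tuple encoding of ALC concepts (`"atom"`, `"not"`, `"and"`, `"or"`, `"some"`, `"all"`) and the function names are assumptions made for the example.

```python
# Assumed encoding of ALC concepts as nested tuples:
#   ("atom", "A"), ("not", C), ("and", C, D), ("or", C, D),
#   ("some", "r", C) for ∃r.C and ("all", "r", C) for ∀r.C


def nnf(concept):
    """Push negation inward until it only occurs in front of atomic concepts."""
    op = concept[0]
    if op == "atom":
        return concept
    if op in ("and", "or"):
        return (op, nnf(concept[1]), nnf(concept[2]))
    if op in ("some", "all"):
        return (op, concept[1], nnf(concept[2]))
    if op != "not":
        raise ValueError(f"unknown constructor: {op}")
    inner = concept[1]
    inner_op = inner[0]
    if inner_op == "atom":
        return concept                      # negated atom is already in NNF
    if inner_op == "not":
        return nnf(inner[1])                # double negation
    if inner_op == "and":                   # De Morgan
        return ("or", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if inner_op == "or":                    # De Morgan
        return ("and", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if inner_op == "some":                  # ¬∃r.C becomes ∀r.¬C
        return ("all", inner[1], nnf(("not", inner[2])))
    if inner_op == "all":                   # ¬∀r.C becomes ∃r.¬C
        return ("some", inner[1], nnf(("not", inner[2])))
    raise ValueError(f"unknown constructor: {inner_op}")


def has_clash(label):
    """A node label L(x) clashes if it contains both C and ¬C for some C."""
    return any(("not", c) in label for c in label)


A, B = ("atom", "A"), ("atom", "B")
normalized = nnf(("not", ("and", A, ("some", "r", B))))
clashing = has_clash({A, ("not", A), B})
clash_free = has_clash({A, B})
```

Here `normalized` corresponds to ¬A ⊔ ∀r.¬B, the NNF of ¬(A ⊓ ∃r.B).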
The given concept is satisfiable if and only if the algorithm finds a consistent and complete tableau. Note that this method is non-deterministic and generates different possible tableaux. Once a chosen search path leads to a clash, the algorithm needs to backtrack to the tableau state before the choice and try the remaining choices.

In the paper called A Distributed Tableau Algorithm for the ALC Description Logic [9], the authors describe the parallelization of the ALC tableau algorithm. They take advantage of several properties of an ALC tableau that can be used in a parallel implementation of the algorithm. First, non-deterministic choices can be handled independently. If multiple choices exist at a node, each choice may be handled by an individual process node. Second, different branches of the tableau tree, in fact each node of the tree, can be expanded independently. They show that the proposed parallel algorithm can be realized using the MapReduce framework [10] by representing the tableau as key-value pairs.

2.4 HermiT - Hypertableau algorithm for reasoning in description logic:
HermiT [11] is a description logic reasoning system based on an architecture that addresses two sources of complexity in tableaux reasoners. First, there are often a great number of different possible constructions which might be models. Second, the models built by tableaux reasoners can be extremely large. HermiT implements a “hypertableau” calculus which greatly reduces the number of possible models that must be considered (down to only a single possibility for a significant subset of ontologies). HermiT also incorporates the “anywhere blocking” strategy, which limits the sizes of the models that are constructed.
Finally, HermiT makes use of a novel and highly efficient approach to handling nominals in the presence of number restrictions and inverse roles; this allows ontology authors to make much freer use of nominals than has been possible to date. This combination of fundamental algorithmic improvements also enables a range of additional optimizations.

An OWL ontology O can be divided into three parts: the property axioms, the class axioms, and the facts. These correspond to the RBox R, TBox T, and ABox A of a description logic knowledge base. To show that a knowledge base K = (R, T, A) is satisfiable, a tableau algorithm constructs a derivation, that is, a sequence of ABoxes obtained by applying inference rules. The algorithm terminates either when no inference rule is applicable any more or when there is a contradiction. The knowledge base K is unsatisfiable if and only if all choices fail to construct a model. Handling disjunctions through reasoning by case is often called or-branching. Tableau algorithms are usually free to choose the order in which they process the assertions in an ABox, which makes the model construction non-deterministic. In addition, a TBox axiom B ⊑ C is normally handled by deriving the disjunction ¬B ⊔ C for every individual, which is a major source of or-branching.

Various absorption optimizations have been developed to address this problem. The basic absorption algorithm tries to rewrite TBox axioms into the form B ⊑ C where B is an atomic concept. Then, instead of deriving ¬B ⊔ C for each individual in an ABox, C(s) is derived only if the ABox contains B(s); thus, the absorbed axioms can be applied in a “more deterministic” way. However, it is often unclear in advance which combinations of transformation and absorption techniques will yield the best results; absorption algorithms are, therefore, typically guided primarily by heuristics and may not eliminate all non-determinism.
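The effect of basic absorption can be illustrated with a small sketch. It assumes a simplified setting that is not taken from HermiT or any other reasoner: the ABox is a dictionary from individual names to sets of atomic concept names, each absorbed axiom B ⊑ C is a pair of atomic names, and all individual and concept names are hypothetical.

```python
def apply_absorbed_axioms(abox, absorbed):
    """Apply absorbed axioms B ⊑ C deterministically.

    abox:     dict mapping an individual to the set of atomic concepts asserted for it
    absorbed: list of (B, C) pairs, each standing for an axiom B ⊑ C with B atomic

    C(s) is added only when B(s) is already present, so no disjunction ¬B ⊔ C
    is ever introduced. The loop repeats until nothing new can be derived.
    """
    changed = True
    while changed:
        changed = False
        for b, c in absorbed:
            for concepts in abox.values():
                if b in concepts and c not in concepts:
                    concepts.add(c)
                    changed = True
    return abox


# Hypothetical data: two axioms and two individuals.
axioms = [("Student", "Person"), ("Person", "Agent")]
abox = {"alice": {"Student"}, "r1": {"Reasoner"}}
result = apply_absorbed_axioms(abox, axioms)
```

Without absorption, each of the two axioms would contribute a disjunction for each of the two individuals, giving four choice points; with absorption only `alice` is touched and no choice point is created.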
HermiT's hypertableau algorithm generalizes these absorption optimizations by rewriting description logic axioms into a form which allows standard absorption, role absorption, and binary absorption to be performed simultaneously, as well as allowing additional types of “absorption” impossible in standard tableau calculi.

Standard tableau algorithms only allow individuals to be blocked by their ancestors; this is called ancestor blocking. It causes the derivation for a knowledge base K to terminate but can result in an exponentially large construction. HermiT extends this blocking strategy such that an individual can be blocked by (almost) any other individual. For the same K, this improved anywhere blocking approach results in the construction of a considerably smaller model. Anywhere blocking can reduce the size of generated models by an exponential factor, and this substantially improves real-world performance on many difficult and complex ontologies.

Although anywhere blocking can often prevent the creation of multiple copies of identical individuals, it is not uncommon for tableau procedures to produce models containing a great many very similar individuals. If an expression ∃R.C occurs in different parts of a partially constructed model, then multiple individuals labeled with C will be created, and if the structures surrounding these new individuals differ in any way, then one will not block the other. HermiT takes advantage of this observation through individual reuse: when it expands an existential ∃R.C, it first attempts to reuse some existing individual labeled with C to construct a model, and only if this model construction fails does it introduce a new individual. This approach allows HermiT to consider non-tree-shaped models, and drastically reduces the size of models produced for ontologies which describe complex structures, such as ontologies of anatomy.
In the paper called HermiT: A Highly-Efficient OWL Reasoner [11], the authors describe the various features of the HermiT OWL reasoner and how it overcomes the problems of the tableau method using the hypertableau method. They also evaluate the performance of HermiT in comparison with other reasoners. Their tests show that HermiT is usually much faster than other reasoners when classifying complex ontologies, and that it is able to classify a number of ontologies which no other reasoner has been able to handle.

After doing a literature survey on various reasoners, we identified that most of the reasoners today implement variants of the tableaux reasoning algorithm and are highly inefficient. There are two main drawbacks with all these reasoners. First, the tableaux reasoners are highly non-deterministic, since there are a great number of different possible constructions. Second, the models developed by the tableaux reasoners are extremely large even for relatively small ontologies. On the other hand, we also identified that semantic data validation is a major challenge today and that none of the reasoners is dedicated to this specific task. Hence we have developed an efficient ontology-based semantic data validator which overcomes the drawbacks of existing reasoners and is designed to perform the specific task of semantic data validation.

CHAPTER 3
SYSTEM DESIGN

3.1 BASIC DESIGN:
Figure 1: Basic design 1 (diagram blocks: ONTOLOGY, TOOL, USER INTERFACE)

Figure 1 shows the first part of the basic design of the ontology-based semantic data validating tool. When an ontology is given as input to the tool, it performs reasoning and produces a user interface in which the user can enter data. The user interface consists of various data fields derived from the given ontology.
Figure 2: Basic design 2 (diagram labels: User enters data, USER INTERFACE, TOOL, Computes validity, VALID, INVALID)

Figure 2 shows the second part of the basic design of the tool. As soon as the user enters data into any data field of the user interface, the tool performs reasoning and computes whether the data is valid according to the given ontology. If the data is valid, the user can proceed to enter data into other fields.

CHAPTER 4
DETAILED DESIGN

4.1 SYSTEM ARCHITECTURE:
Figure 3: System Architecture (diagram labels: ONTOLOGY, Reasoning, LOGICAL INFERENCES, Store, Data Base, USER DATA, VALIDATOR, VALID, INVALID)

Figure 3 shows the system architecture of the tool. When the ontology is given, reasoning is done on it and all possible logical inferences are calculated. These inferences are stored in the database for future use. Once the user enters data into any field, logical inferences specific to that data field are calculated. Using both sets of inferences, the tool determines whether the data entered by the user is valid or invalid.

4.2 FLOW CHART:
The flow chart of the whole process of semantic validation is shown in the diagram below:

Figure 4: Process flow chart (steps: START; INPUT OWL FILE; PARSE THE INPUT FILE, STORE THE ONTOLOGY IN THE DATABASE; GENERATE USER INTERFACE; INPUT USER DATA; VALIDATE THE USER DATA AGAINST THE ONTOLOGY; Continue or STOP)

The process starts when the ontology file is given as input. Reasoning is done on the axioms and facts available in the OWL file, and all the possible inferences are stored in the database for further use. Based on these inferences, a user interface is generated where the user can enter data. As soon as the user enters data, it is validated against the inferences stored in the database. As long as the user enters data, the tool keeps validating it.
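The idea of validating against pre-computed inferences can be sketched as follows. The report does not list here which inferences are stored, so this example assumes one plausible case: the transitive closure of the subclass relation is computed once, and a later range check becomes a simple lookup. The class names and function names are hypothetical.

```python
def subclass_closure(direct):
    """Compute all (descendant, ancestor) pairs from directly asserted pairs.

    direct: set of (child, parent) pairs as asserted in the ontology.
    This is done once, when the ontology is loaded.
    """
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_valid_range(value_class, range_class, closure):
    """Range check at data-entry time: a lookup, with no reasoning involved."""
    return value_class == range_class or (value_class, range_class) in closure


# Hypothetical ontology fragment.
asserted = {("GraduateStudent", "Student"), ("Student", "Person")}
stored_inferences = subclass_closure(asserted)

accepted = is_valid_range("GraduateStudent", "Person", stored_inferences)
rejected = is_valid_range("Person", "Student", stored_inferences)
```

The cost of the closure is paid once per ontology, while each user entry only costs a set membership test, which is the trade-off the architecture above relies on.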
The flow chart of the process of validating user input instantly is shown below:

Figure 5: User input flow chart (labels: START; INPUT USER DATA; IF USER CHOOSES A CLASS; CREATE INSTANCE; DELETE INSTANCE; EDIT INSTANCE (connector A); INPUT DETAILS OF THE INSTANCE; IF THE INSTANCE DOES NOT EXIST; IF DELETING INSTANCE DOES NOT CREATE ANY CONFLICT; CREATE NEW INSTANCE; DELETE THE INSTANCE; NOTIFY USER; MORE USER DATA; STOP)

Figure 5 shows how the tool validates the user data. As soon as the user chooses a class, the tool offers three options: creating, editing, or deleting an instance of the class. If the user chooses to create a new instance, the tool reads the details of the new instance, and if the instance does not exist, a new instance is created. If the instance already exists, the tool notifies the user. If the user chooses to delete an instance, the tool checks whether deleting this instance would create any conflict. If no conflicts arise, the tool deletes the instance; otherwise it notifies the user.

The flow chart of how the tool handles editing an instance is shown below:

Figure 6: Flow chart for editing instance (labels: connector A; IF USER CHOOSES AN OPTION; ADD PROPERTY VALUE; EDIT PROPERTY VALUE; DELETE PROPERTY VALUE; INPUT DETAILS OF PROPERTY AND VALUE; INPUT DETAILS OF PROPERTY, VALUE AND NEW VALUE; IF THIS UPDATE CREATES ANY FURTHER CONFLICT; PERFORM UPDATE; NOTIFY USER; STOP)

Figure 6 shows how the tool handles editing an instance. For every instance, the user can choose to add, edit, or delete a property value of the instance. If the user chooses to add or delete a property value, the tool reads the property and value details from the user. If the user chooses to edit a property value, the tool reads the property, the current value, and the new value from the user.
With these values the tool checks whether the update creates any further conflict. If it does not, the tool performs the update; otherwise it notifies the user.

4.3 DATABASE SCHEMA:

All the logical inferences computed from the elements of the ontology are stored in the database. The database contains 9 tables. These tables and their interdependencies are described below.

CONCEPT: Holds information about all classes in the ontology.
ATTRIBUTE | TYPE    | DESCRIPTION
CID       | INTEGER | Unique ID for each class. Primary key of the table.
URI       | STRING  | Uniform Resource Identifier of the class.
Name      | STRING  | Name of the class.
Table 1: Concept

PROPERTY: Holds information about all properties in the ontology.
ATTRIBUTE   | TYPE    | DESCRIPTION
Property_ID | INTEGER | Unique ID for each property. Primary key of the table.
URI         | STRING  | Uniform Resource Identifier of the property.
Name        | STRING  | Name of the property.
Relation    | BOOLEAN | Indicates whether the property is a datatype or an object property.
Table 2: Property

INSTANCE: Holds information about all instances in the ontology.
ATTRIBUTE        | TYPE    | DESCRIPTION
Instance_type_ID | INTEGER | Unique ID for each instance. Primary key of the table.
CID              | INTEGER | Class of the instance. Foreign key referring to the CID in the concept table.
URI              | STRING  | Uniform Resource Identifier of the instance.
Name             | STRING  | Name of the instance.
Table 3: Instance

DOMAIN: Holds information about the domain classes of all properties.
ATTRIBUTE   | TYPE    | DESCRIPTION
domain_ID   | INTEGER | ID of the domain class. Primary key of the table. Foreign key referring to the CID in the concept table.
Property_ID | INTEGER | ID of the property. Primary key of the table. Foreign key referring to the property_ID in the property table.
Table 4: Domain

RANGES: Holds information about the range of all properties.
ATTRIBUTE   | TYPE    | DESCRIPTION
property_ID | INTEGER | ID of the property. Primary key of the table. Foreign key referring to the property_ID in the property table.
Data_range  | STRING  | Range value of a datatype property.
range_ID    | INTEGER | ID of the range class of an object property. Foreign key referring to the CID in the concept table.
Table 5: Ranges

SUB_CLASS_OF: Holds information about the subclasses of the classes in the ontology.
ATTRIBUTE | TYPE    | DESCRIPTION
parent_ID | INTEGER | ID of the parent class. Primary key of the table. Foreign key referring to the CID in the concept table.
CID       | INTEGER | ID of the child class. Primary key of the table. Foreign key referring to the CID in the concept table.
Table 6: Sub_Class_Of

SUB_PROPERTY_OF: Holds information about the sub-properties of the properties in the ontology.
ATTRIBUTE   | TYPE    | DESCRIPTION
property_ID | INTEGER | ID of the child property. Primary key of the table. Foreign key referring to the property_ID in the property table.
Parent_ID   | INTEGER | ID of the parent property. Primary key of the table. Foreign key referring to the property_ID in the property table.
Table 7: Sub_Property_Of

INSTANCE_PROPERTY_VALUE: Holds information about all property values of the instances in the ontology.
ATTRIBUTE        | TYPE    | DESCRIPTION
instance_type_ID | INTEGER | ID of the instance. Primary key of the table. Foreign key referring to the instance_type_ID in the instance table.
property_ID      | INTEGER | ID of the property. Primary key of the table. Foreign key referring to the property_ID in the property table.
Literal          | STRING  | Value of the datatype property.
Value            | INTEGER | ID of the instance that is the value of an object property. Foreign key referring to the instance_type_ID in the instance table.
Table 8: Instance_Property_Value

CONCEPT_PROPERTY_CONSTRAINT: Holds information about all constraints on the classes and properties in the ontology.
ATTRIBUTE             | TYPE    | DESCRIPTION
Concept_constraint_ID | INTEGER | Unique ID for each constraint. Primary key of the table.
CID                   | INTEGER | ID of the class. Foreign key referring to the CID in the concept table.
Property_id           | INTEGER | ID of the property. Foreign key referring to the property_ID in the property table.
Range_ID              | INTEGER | ID of the range class of an object property. Foreign key referring to the CID in the concept table.
Exact_val_data        | STRING  | Exact value of the datatype property.
Exact_val_object      | INTEGER | Exact instance of the object property. Foreign key referring to the instance_type_ID in the instance table.
Some_val_data         | STRING  | Values of the some values from range of the datatype property.
Some_val_object       | INTEGER | ID of the range class of the some values from constraint. Foreign key referring to the CID in the concept table.
All_val_data          | STRING  | Values of the all values from range of the datatype property.
All_val_object        | INTEGER | ID of the range class of the all values from constraint. Foreign key referring to the CID in the concept table.
Min_cardinality       | INTEGER | Minimum cardinality value of the constraint.
Max_cardinality       | INTEGER | Maximum cardinality value of the constraint.
Exact_cardinality     | INTEGER | Exact cardinality value of the constraint.
Table 9: Concept_Property_Constraint

CHAPTER 5
IMPLEMENTATION

5.1 SOURCES OF CONFLICT:

Any ontology consists of classes and of properties that relate those classes. The user cannot alter the classes and properties; the user can alter only the instances and the property values of those instances. Thus the main sources of conflict that the tool handles are:
1. Creating an instance.
2. Deleting an instance.
3. Adding or editing a property value.
4. Deleting a property value.
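Every check described in this chapter is answered from the tables of Section 4.3. As a rough sketch only, the following Python snippet builds four of those tables in an in-memory SQLite database and counts the values an instance holds for a property, which is the quantity the cardinality checks compare against. The use of SQLite, the column types, the encoding of the Relation flag, the helper names `open_store` and `count_values`, and the sample rows are all assumptions made for illustration; the report does not name the database engine or its queries.

```python
import sqlite3

# Hypothetical subset of the Section 4.3 schema; SQLite is an assumption.
SCHEMA = """
CREATE TABLE concept (
    cid  INTEGER PRIMARY KEY,
    uri  TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY,
    uri         TEXT NOT NULL,
    name        TEXT NOT NULL,
    relation    INTEGER NOT NULL  -- assumed encoding: 1 = object, 0 = datatype
);
CREATE TABLE instance (
    instance_type_id INTEGER PRIMARY KEY,
    cid              INTEGER NOT NULL REFERENCES concept(cid),
    uri              TEXT NOT NULL,
    name             TEXT NOT NULL
);
-- No composite primary key here, so that one instance can hold
-- several values for the same property (an assumption of this sketch).
CREATE TABLE instance_property_value (
    instance_type_id INTEGER NOT NULL REFERENCES instance(instance_type_id),
    property_id      INTEGER NOT NULL REFERENCES property(property_id),
    literal          TEXT,     -- value of a datatype property
    value            INTEGER REFERENCES instance(instance_type_id)
);
"""


def open_store():
    """Create an empty in-memory store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def count_values(conn, instance_id, property_id):
    """Number of values the instance currently has for the property."""
    row = conn.execute(
        "SELECT COUNT(*) FROM instance_property_value "
        "WHERE instance_type_id = ? AND property_id = ?",
        (instance_id, property_id),
    ).fetchone()
    return row[0]


# Illustrative data only: one class, one datatype property, one instance.
conn = open_store()
conn.execute("INSERT INTO concept VALUES (1, 'http://example.org/o#Person', 'Person')")
conn.execute(
    "INSERT INTO property VALUES (1, 'http://example.org/o#hasCitizenship', 'hasCitizenship', 0)"
)
conn.execute("INSERT INTO instance VALUES (1, 1, 'http://example.org/o#P1', 'P1')")
conn.executemany(
    "INSERT INTO instance_property_value (instance_type_id, property_id, literal) "
    "VALUES (?, ?, ?)",
    [(1, 1, "US"), (1, 1, "INDIAN")],
)
print(count_values(conn, 1, 1))  # prints 2
```

Keeping the count behind a single parameterized query means each check in the following sections reduces to one lookup plus a comparison.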
5.2 CREATING AN INSTANCE:

When the user tries to create an instance of a class, the tool has to check whether there are any constraints on creating an empty instance. If there are, the tool should inform the user about the constraint. Consider the example shown in Figure 7. The user tries to create an instance of the class 'CAR' without an engine, and there is a constraint on the class 'CAR' that no car may be defined without an engine. In such cases the tool should identify the constraint and warn the user.

[Figure 7: Creating an instance. Labels: CAR, hasEngine, ENGINES, *DieselEngine, someValuesFrom.]

Pseudo code for handling an instance creation is written below:

    When an empty instance of a particular class is created
        if the class has any constraint
            notify user
        else
            continue

5.3 DELETING AN INSTANCE:

When the user tries to delete an instance, the deletion may violate a minimum cardinality constraint, or it may affect another instance that is related to the deleted one. Consider the example shown in Figure 8, where a company 'A' has a supplier 'S1' who supplies the product with identity number '14'. When the user tries to delete the supplier 'S1', the product with identity '14' is affected as well. The tool should identify such cascade effects of deleting an instance and warn the user.

[Figure 8: Deleting an instance. Labels: "A", hasSupplier, "S1", suppliesProductWithID, "12".]

Pseudo code for handling an instance deletion is written below:

    When an instance is being deleted
        for all the properties that the instance is in the range of:
            if min-cardinality = no_of_instances in property_range
                notify user - delete not possible
            else if (in any property the instance is involved in violating a constraint)
                notify user - should the other instance also be deleted?
                if yes
                    delete all instances
            else
                delete the instance

5.4 ADDING OR EDITING PROPERTY VALUE:

OWL distinguishes two main types of properties: datatype properties and object properties. The difference between them is that the value of a datatype property is a literal, whereas the value of an object property is an instance. The user can add or edit any datatype or object property value. When the user tries to do so, the tool has to check all the constraints that may be violated by adding or editing the property value.

5.4.1 ADDING OR EDITING DATATYPE PROPERTY:

When the user tries to add or edit a datatype property value, the tool has to check for domain, range, exact value, maximum cardinality, some values from and functional property violations. Consider the example shown in Figure 9, where a person 'RAM' has 2 values for the datatype property 'hasCitizenship'. There is a constraint on the property that its maximum cardinality is '2'. When the user tries to add a third value to this property, the tool has to restrict the user.

[Figure 9: Adding or editing datatype property value. Labels: RAM hasCitizenship "US"; RAM hasCitizenship "INDIAN"; RAM hasCitizenship "KOREAN"; maxCardinality=2.]

There are many other constraints that the tool has to check for a datatype property. Pseudo code for handling all datatype property constraints is written below:

1. Domain constraint violation:
    if Domain_value is not instance of DomainClasses
        notify user - Domain not suitable
    else
        continue

2. All values from constraint violation:
    if property_value is not a value in specified_values
        notify user - value out of range
    else
        continue

3. Datatype range constraint violation:
    if property_value does not belong to one of datatype_range
        notify user - value not proper datatype
    else
        continue

4. Enumerated class range constraint violation:
    if property_value is not one of the values in enumerated_range
        notify user - value out of range
    else
        continue

5. Has exact value constraint violation:
    if property_value is not SameAs specified_value
        notify user - value not valid
    else
        continue

6. Maximum cardinality constraint violation:
    if max-cardinality = no_of_values in property_range
        notify user - cannot add value
    else
        continue

7. Some values from constraint violation (only for edit property value):
    input the new_value
    if new_value is not one of the values in range
        if (no_of_remaining_property_values in range) >= 1
            edit value
        else
            notify user - edit not possible
    else
        edit value

8. Functional property constraint violation:
    if no_of_values in property_range = 1
        notify user - add not possible
    else
        continue

5.4.2 ADDING OR EDITING OBJECT PROPERTY:

When the user tries to add or edit an object property value, the tool has to check for domain, range, exact value, maximum cardinality, some values from, inverse of a property, symmetric property, asymmetric property, functional property, inverse functional property and irreflexive property violations. Consider the example shown in Figure 10, where a person 'RAHUL' has wife 'ANJALI'. The property 'hasWife' is a functional property, which means that every person can have at most one wife. When the user tries to add that 'RAHUL' has wife 'TINA', this violates the functional property constraint, so the tool has to prevent the user from doing so.

[Figure 10: Adding and editing object property value. Labels: RAHUL hasWife ANJALI; RAHUL hasWife TINA; functional property.]

There are many other constraints that the tool has to check for an object property.
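The 'hasWife' and 'hasCitizenship' examples can be sketched in a few lines of Python. The function below only illustrates the functional property and maximum cardinality checks for an add operation; its name, its arguments and the in-memory list of existing values are assumptions of this sketch, whereas the tool reads these counts from its database. The sketch compares with `>=` rather than `=` so that it also rejects an add when the stored data already exceeds the limit.

```python
from typing import Optional, Sequence


def check_add(existing_values: Sequence[str],
              max_cardinality: Optional[int] = None,
              functional: bool = False) -> Optional[str]:
    """Return a violation message, or None if one more value may be added."""
    count = len(existing_values)
    # Functional property: a subject may hold at most one value.
    if functional and count >= 1:
        return "functional property violation: add not possible"
    # Maximum cardinality: the limit has already been reached.
    if max_cardinality is not None and count >= max_cardinality:
        return "maximum cardinality violation: cannot add value"
    return None


# RAHUL already has the wife ANJALI and hasWife is functional.
print(check_add(["ANJALI"], functional=True))
# RAM already has two citizenships and maxCardinality is 2.
print(check_add(["US", "INDIAN"], max_cardinality=2))
# One existing value with maxCardinality 2: the add is allowed.
print(check_add(["US"], max_cardinality=2))
```

Returning a message instead of raising an exception mirrors the "notify user" step of the pseudo code: the caller decides how to present the violation.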
Pseudo code for handling all object property constraints is written below:

1. Domain constraint violation:
    if Domain_value is not instance of DomainClasses
        notify user - Domain not suitable
    else
        continue

2. All values from constraint violation:
    if property_value is not an instance of range
        notify user - value out of range
    else
        continue

3. Range constraint violation:
    if property_value is not instance of range
        notify user - value out of range
    else
        continue

4. Has exact value constraint violation:
    if property_value is not specified_value
        notify user - value not valid
    else
        continue

5. Maximum cardinality constraint violation:
    if max-cardinality = no_of_instances in property_range
        notify user - cannot add value
    else
        continue

6. Some values from constraint violation:
    input the new_value
    if new_value is not one of the instances in range
        if (no_of_remaining_property_instances in range) >= 1
            edit value
        else
            notify user - edit not possible
    else
        edit value

7. Inverse of a property constraint violation:
    input new_value
    get list of all other constraints that property and inverse_property have
    if new_value satisfies all other_constraints
        add/edit <domain_value property new_value> and <new_value inverse_property domain_value>
    else
        notify user - adding/editing not possible

8. Symmetric property constraint violation:
    input new_value
    get list of all other constraints the property has
    if new_value satisfies all other_constraints
        add/edit <domain_value property new_value> and <new_value property domain_value>
    else
        notify user - adding/editing not possible

9. Asymmetric property constraint violation:
    input new_value
    if <new_value property domain_value> exists
        notify user - add not possible
    else
        continue

10. Functional property constraint violation:
    if no_of_values in property_range = 1
        notify user - add/edit not possible
    else
        continue

11. Inverse functional property constraint violation:
    input new_value
    if <any_individual property new_value> exists
        notify user - value already exists
    else
        continue

12. Irreflexive property constraint violation:
    if domain_value == range_value
        notify user - add/edit not possible
    else
        continue

5.5 DELETING A PROPERTY VALUE:

When the user tries to delete a datatype or object property value, the tool checks for the constraint violations that the deletion may cause.

5.5.1 DELETING DATATYPE PROPERTY VALUE:

To handle deletion of a datatype property value, the tool should check for minimum cardinality and some values from constraint violations. Consider the example shown in Figure 11, where a polygon has 3 sides 'S1', 'S2' and 'S3', and the minimum cardinality for the datatype property 'hasSide' is 3. When the user tries to delete a side of the polygon, the tool has to restrict the user.

[Figure 11: Deleting datatype property value. Labels: POLYGON hasSide "S1"; POLYGON hasSide "S2"; POLYGON hasSide "S3"; minCardinality=3.]

Pseudo code for handling the constraints on deleting a datatype property value is written below:

1. Some values from constraint violation:
    if no_of_values in property_range = 1
        notify user - delete not possible
    else if (no_of_remaining_property_values in range) >= 1
        delete value
    else
        notify user - delete not possible

2. Minimum cardinality constraint violation:
    if min-cardinality = no_of_values in property_range
        notify user - delete not possible
    else
        delete the value

5.5.2 DELETING OBJECT PROPERTY VALUE:

To handle deletion of an object property value, the tool should check for minimum cardinality, some values from, inverse of a property, symmetric property and transitive property constraint violations. Consider the example shown in Figure 12, where 'MEGHA' has roommate 'ANJALI'. The object property 'hasRoommate' is a symmetric property, so we can infer that 'ANJALI' also has roommate 'MEGHA'. When the user tries to delete 'MEGHA' hasRoommate 'ANJALI', the tool should ask the user whether the property value should be deleted from the other side as well.

[Figure 12: Deleting object property value. Labels: MEGHA hasRoommate ANJALI; ANJALI hasRoommate MEGHA; symmetric.]

Pseudo code for handling the constraints on deleting an object property value is written below:

1. Some values from constraint violation:
    if no_of_instances in range = 1
        notify user - delete not possible
    else if (no_of_remaining_property_instances in range) >= 1
        delete value
    else
        notify user - delete not possible

2. Minimum cardinality constraint violation:
    if min-cardinality = no_of_instances in property_range
        notify user - delete not possible
    else
        delete the value

3. Inverse of a property constraint violation:
    if any constraint in property and inverse_property is violated
        notify user - delete not possible
    else
        notify user - should both properties be deleted?
        if yes
            continue
        else
            don't delete

4. Symmetric property constraint violation:
    if any constraint in <domain_value property range_value> and <range_value property domain_value> is violated
        notify user - delete not possible
    else
        notify user - should both triples be deleted?
        if yes
            continue
        else
            don't delete

5. Transitive property constraint violation:
    if any constraint in property and inferred property is violated
        notify user - delete not possible
    else
        notify user - should inferred property also be deleted?
        if yes
            continue
        else
            don't delete

CHAPTER 6
SOFTWARE TESTING

The ontology-based semantic data validation tool that we developed was tested with various test cases and the results were evaluated.
Details of these test cases and their results are given below.

6.1 Test Cases and Results:

The first unit test checked whether a new instance is created successfully. The test results are shown in Table 10:

UNIT TEST CASE ID | Test case 1
DESCRIPTION       | To test if a new instance is being created
INPUT             | Class name, new instance name and instance URI
EXPECTED OUTPUT   | Creation of new instance
ACTUAL OUTPUT     | New instance created
REMARKS           | Test passed
Table 10: Unit test for creating an instance

The next unit test checked whether an existing instance is deleted successfully. The test results are shown in Table 11:

UNIT TEST CASE ID | Test case 2
DESCRIPTION       | To test if an existing instance is being deleted
INPUT             | Class name and instance name
EXPECTED OUTPUT   | Deletion of the instance
ACTUAL OUTPUT     | Instance being deleted if no conflicts arise because of the deletion
REMARKS           | Test passed
Table 11: Unit test for deleting an instance

The next unit test checked whether a property value of an instance is added successfully. The test results are shown in Table 12:

UNIT TEST CASE ID | Test case 3
DESCRIPTION       | To test if a property value for an instance is being added
INPUT             | Class name, instance name, property name and value
EXPECTED OUTPUT   | Addition of property value for an instance
ACTUAL OUTPUT     | Instance property value being added if this does not violate any constraint
REMARKS           | Test passed
Table 12: Unit test for adding a property for an instance

The next unit test checked whether the tool detects the violation when a property value of an instance is edited.
The test results are shown in Table 13:

UNIT TEST CASE ID | Test case 4
DESCRIPTION       | To test if a property value for an instance is being edited
INPUT             | Class name, instance name, property name, value and new value
EXPECTED OUTPUT   | Detection of the violation
ACTUAL OUTPUT     | Violation is being detected and alerted
REMARKS           | Test passed
Table 13: Unit test for detecting violation while editing property value of an instance

The next unit test checked whether the tool detects the violation when a property value of an instance is deleted. The test results are shown in Table 14:

UNIT TEST CASE ID | Test case 5
DESCRIPTION       | To test if a property value for an instance is being deleted
INPUT             | Class name, instance name, property name and value
EXPECTED OUTPUT   | Detection of violation
ACTUAL OUTPUT     | Violation is being detected and alerted
REMARKS           | Test passed
Table 14: Unit test for detecting violation while deleting a property value of an instance

The next unit test checked whether a property value of an instance is edited successfully. The test results are shown in Table 15:

UNIT TEST CASE ID | Test case 4
DESCRIPTION       | To test if a property value for an instance is being edited
INPUT             | Class name, instance name, property name, value and new value
EXPECTED OUTPUT   | Editing of property value for an instance
ACTUAL OUTPUT     | Instance property value being edited if this does not violate any constraint
REMARKS           | Test passed
Table 15: Unit test for editing property value of an instance

The next unit test checked whether a property value of an instance is deleted successfully.
The test results are shown in Table 16:

UNIT TEST CASE ID | Test case 5
DESCRIPTION       | To test if a property value for an instance is being deleted
INPUT             | Class name, instance name, property name and value
EXPECTED OUTPUT   | Deletion of property value for an instance
ACTUAL OUTPUT     | Instance property value being deleted if this does not violate any constraint
REMARKS           | Test passed
Table 16: Unit test for deleting a property value of an instance

The first performance test checked whether the tool operates with the same speed and efficiency when the number of instance property values is increased to 10,000. These values were obtained by randomly generating 25 instances for each class in the given ontology and validating all possible instance property values from these instances. The results of the test are shown in Table 17.

PERFORMANCE TEST CASE ID                       | Test case 6
DESCRIPTION                                    | To test performance with 10,000 instance property values
INPUT                                          | 10,000 instance property values
EXPECTED OUTPUT                                | Tool performance with same speed and efficiency as before
TIME TAKEN FOR 10 INSTANCE PROPERTY VALUES     | 2.9 milliseconds
TIME TAKEN FOR 10,000 INSTANCE PROPERTY VALUES | 3.9 milliseconds
REMARKS                                        | Tool working efficiently even for large ontologies
Table 17: Performance test for 10000 instance property values

The next performance test compared the efficiency of the tool with that of other reasoners such as HERMIT and PELLET. The same ontology was given as input to each tool and the time taken by each was measured. The results of the test are shown in Table 18.
PERFORMANCE TEST CASE ID                 | Test case 7
DESCRIPTION                              | To test performance with other reasoners
INPUT                                    | Same ontology to all reasoners
EXPECTED OUTPUT                          | Tool that we developed having better performance than other reasoners
TIME TAKEN BY REASONER "HERMIT"          | 250 milliseconds
TIME TAKEN BY REASONER "PELLET"          | 317.18 milliseconds
TIME TAKEN BY REASONER THAT WE DEVELOPED | 3.9 milliseconds
REMARKS                                  | Tool developed by us is highly efficient compared to pellet and hermit
Table 18: Performance test to compare with different reasoners

CHAPTER 7
USER MANUAL

In this chapter we present the results obtained for various inputs from the ontology-based semantic data validation tool that we have developed. These results are shown in the form of screenshots.

Figure 13: Input an ontology file.

Figure 13 shows the homepage of the data validation tool. The whole process of data validation starts when the user gives an OWL file as input. Once the user specifies the path of the OWL file, the validator starts interpreting the ontology. If the specified file is not a valid OWL file, the validator notifies the user.

Figure 14: Options for a class.

If the input file is not an OWL file, or the path specified by the user is not valid, the tool alerts the user that the input file was not found. Once a valid OWL file is given as input, the tool extracts all the classes present in the ontology and lists them for the user to choose from. When the user selects a class, the tool provides 3 options (create, edit or delete an instance). In Figure 14 the user has selected the class called "Beach". The user should have some prior knowledge of the domain and should select the class that he or she wants to manipulate.
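As an illustration of the class-listing step, the sketch below parses an ontology serialized as RDF/XML with Python's standard library and collects the named classes. It is an assumption of this sketch that the input is RDF/XML and that classes are declared with `owl:Class` elements carrying an `rdf:about` attribute; the report does not say how the tool parses the file, and the function name `list_classes` and the sample ontology are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def list_classes(owl_text: str) -> List[str]:
    """Return the sorted local names of the named owl:Class elements."""
    try:
        root = ET.fromstring(owl_text)
    except ET.ParseError as exc:
        raise ValueError("not a valid OWL (RDF/XML) document") from exc
    # An RDF/XML ontology has rdf:RDF as its root element.
    if root.tag != f"{{{RDF}}}RDF":
        raise ValueError("not a valid OWL (RDF/XML) document")
    names = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        if about:  # skip anonymous classes such as restrictions
            names.add(about.rsplit("#", 1)[-1])
    return sorted(names)


# Hypothetical two-class ontology used only to exercise the function.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/travel#Beach"/>
  <owl:Class rdf:about="http://example.org/travel#Activity"/>
</rdf:RDF>"""

print(list_classes(SAMPLE))  # prints ['Activity', 'Beach']
```

Raising `ValueError` for unparsable input corresponds to the step in which the validator notifies the user that the file is not a valid OWL file.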
Figure 15: Details of new instance.

When the user chooses to create an instance, the tool provides a form for entering the details of the new instance. In Figure 15 the user has chosen to create a new instance of the class "Beach" and has submitted the name and URI of the instance, and the instance has been created successfully. Whenever a new instance name and URI are provided, the tool checks the data stored in the database and compares the name of the new instance with the names of all existing instances. If an instance with the same name exists, the tool detects this and notifies the user. Otherwise the new instance is created successfully.

Figure 16: Cannot create 2 instances with same name.

When the user tries to create an instance whose name already exists in the ontology, the tool alerts the user that the instance is already in the ontology. In Figure 16 the user has given an existing instance name, so the tool has asked the user to provide another name. This is done because an instance name should be unique in the ontology; two instances with the same name would make the ontology inconsistent.

Figure 17: Edit an instance.

When the user chooses to edit an instance of a class, the tool lists all the instances of the class present in the ontology. When the user selects an instance, the tool provides 3 options to choose from.
The user can add, edit or delete a property value of the instance. In Figure 17 the tool has listed all the instances of the class "Beach", which are "CurrawongBeach", "BondiBeach" and "KovalamBeach", and the user has chosen the instance "KovalamBeach". As soon as an instance is selected, the tool provides the three options. In Figure 18 the user has chosen to add a new property value.

Figure 18: Add property value.

When the user chooses to add a property value to an instance, the tool provides a form for entering the property and the value. In Figure 18 the user has chosen to add a property value for the instance "KovalamBeach" of the class "Beach". To make the choice easy, the tool lists all the properties available in the ontology. The tool stores all properties in the database as soon as the ontology is given as input, and it fetches them from there instantly when the user has to select a property.

Figure 19: Add property value successful.

In Figure 19 the user has provided the instance property value "KovalamBeach" "hasActivity" "paraGliding". As soon as data is entered in the "value" field, the tool validates the instance property value against the existing data. If adding this instance property value violates any restriction, the tool notifies the user about the violation. The tool offers the submit option only if the value creates no conflict and is valid according to the ontology. When the user submits the data, the instance property value is created successfully.

Figure 20: Domain violation.
In Figure 20 the user tries to add a property value for the instance "KovalamBeach" of the class "Beach". The new instance property value, "KovalamBeach" "hasRating" "OneStarRating", violates a domain constraint. The property "hasRating" has a restriction that its domain value should be an instance of the class "Accommodation", but "KovalamBeach" is not in the domain of the property. The tool detects this domain violation and notifies the user.

Figure 21: Range violation.

In Figure 21 the user tries to add a property value for the instance "KovalamBeach" of the class "Beach". The new instance property value, "KovalamBeach" "hasActivity" "BlackThunder", violates the range constraint. The property "hasActivity" has a restriction that its value should be an instance of the class "Activity", but "BlackThunder" is not an instance of the class "Activity". The tool detects that this value violates the range constraint and notifies the user.

Figure 22: Maximum cardinality violation.

In Figure 22 the user tries to add a property value for the instance "KovalamBeach" of the class "Beach". The new instance property value, "KovalamBeach" "hasActivity" "Trecking", violates the maximum cardinality constraint for the instance and the property. The class "Beach" has a restriction that any of its instances may have at most two values for the property "hasActivity". The instance "KovalamBeach" already has two values in the ontology, so adding another value would violate the maximum cardinality constraint. The tool detects such violations and notifies the user.

Figure 23: Edit property value.

When the user chooses to edit an instance of a class, the tool lists all the instances of the class present in the ontology.
When the user selects an instance, the tool provides 3 options to choose from: the user can add, edit or delete a property value of the instance. In Figure 23 the tool has listed all the instances of the class "Beach", which are "CurrawongBeach", "BondiBeach" and "KovalamBeach", and the user has chosen the instance "KovalamBeach". As soon as an instance is selected, the tool provides the three options. In Figure 24 the user has chosen to edit an existing property value.

Figure 24: Edit property value with existing values.

In Figure 24 the user has chosen to edit a property value of the instance "KovalamBeach" of the class "Beach", and the tool provides a form for entering the property value to be edited and the new value. As soon as the user chooses to edit a property value, the tool lists all the properties present in the ontology. These are stored in the database when the input ontology is loaded. As soon as the user selects the property "hasActivity", the tool lists all the values existing for that property. The user can choose only from these values to edit.

Figure 25: Edit property value range violation.

In Figure 25 the user has chosen to edit a property value of the instance "KovalamBeach" of the class "Beach", and the tool provides a form for entering the property value to be edited and the new value. The new instance property value, "KovalamBeach" "hasActivity" "BlackThunder", violates the range of the property "hasActivity". Thus the user cannot replace the instance property value "KovalamBeach" "hasActivity" "ParaGliding" with it. The tool detects such violations and notifies the user.

Figure 26: Edit property value successful.
In Figure 26 the user has chosen to replace the instance property value "KovalamBeach" "hasActivity" "ParaGliding" with the new instance property value "KovalamBeach" "hasActivity" "Trecking". The tool checks whether the new value violates any constraint. Since it does not, the tool edits the property value successfully.

Figure 27: Edit property value no values for property.

In Figure 27 the user has chosen to edit a property value of the instance "KovalamBeach" of the class "Beach". When the user selects a property of the instance, the tool lists all the values existing for the instance property. In Figure 27 the user has selected the property "hasRating". The tool detects that no values exist for this property to edit, so it alerts the user to select another property.

Figure 28: Edit property value has value violation.

In Figure 28 the user tries to edit a property value of the instance "FourSeasons" of the class "LuxuryHotel". The user has chosen to replace the instance property value "FourSeasons" "hasRating" "ThreeStarRating" with "FourSeasons" "hasRating" "OneStarRating". There is a restriction in the ontology that any instance of the class "LuxuryHotel" must have the value "ThreeStarRating" for the property "hasRating". The tool detects that the new value violates this has value constraint and alerts the user.

Figure 29: Delete property value.

When the user chooses to edit an instance of a class, the tool lists all the instances of the class present in the ontology. When the user selects an instance, the tool provides 3 options to choose from.
The user can add, edit or delete a property value of the instance. In Figure 29 the tool has listed all the instances of the class "Beach", which are "CurrawongBeach", "BondiBeach" and "KovalamBeach", and the user has chosen the instance "KovalamBeach". As soon as an instance is selected, the tool provides the three options. In Figure 30 the user has chosen to delete an existing property value.

Figure 30: Delete property value selecting a value.

In Figure 30 the user has chosen to delete a property value of the instance "KovalamBeach" of the class "Beach". As soon as the user chooses to delete a property value, the tool lists all the properties present in the ontology. These are stored in the database when the input ontology is loaded. As soon as the user selects a property, the tool instantly provides all the existing values to select from.

Figure 31: Delete property value successful.

In Figure 31 the user has chosen to delete a property value of the instance "KovalamBeach" of the class "Beach". The user has selected the instance property value "KovalamBeach" "hasActivity" "Trecking" for deletion. Since deleting the selected instance property value does not violate any constraint, the tool deletes it successfully.

Figure 32: Delete property value some values from violation.

In Figure 32 the user has chosen to delete a property value of the instance "KovalamBeach" of the class "Beach". There is a restriction on the class "Beach" and the property "hasActivity" that the property should have at least one value from the class "Activity".
Deleting the instance property value “KovalamBeach” “hasActivity” “kayaking” would violate the someValuesFrom constraint on “hasActivity”, since it is the only value that is an instance of the class “Activity”. The tool detects that deleting the value violates the someValuesFrom constraint and alerts the user.

Figure 33: Delete property value exact cardinality violation.

In Figure 33 the user has chosen to delete a property value for the instance “BlackThunder” of the class “BackpackersDestination”. The ontology has a restriction that an instance of the class “BackpackersDestination” must have exactly one value for the property “hasAccommodation”. Deleting the instance property value “BlackThunder” “hasAccommodation” “Pearl” would leave the instance with no value for the property. The tool detects that deleting this value violates the exact cardinality constraint and alerts the user.

Figure 34: Delete property value min cardinality violation.

In Figure 34 the user has chosen to delete a property value for the instance “Rover” of the class “FamilyDestination”. The ontology has a restriction that any instance of the class “FamilyDestination” must have at least one value for the property “hasAccommodation”. Deleting the instance property value “Rover” “hasAccommodation” would violate this constraint. The tool detects that deleting this value violates the minimum cardinality constraint and alerts the user.

Figure 35: Delete an instance.

When the user chooses to delete an instance of a class, the tool lists all the instances of the class present in the ontology. In Figure 35 the user has selected to delete an instance of the class “Beach”, so the tool has listed all the instances of that class, which are “CurrawongBeach”, “BondiBeach” and “KovalamBeach”. The user has chosen the instance “KovalamBeach”.
As soon as an instance is selected, the tool shows the details of the instance to be deleted.

Figure 36: Delete an instance successfully.

In Figure 36 the user has chosen to delete the instance “KovalamBeach” of the class “Beach”. Whenever an instance of a class is deleted, the tool checks whether the deletion violates any constraint. The tool deletes the instance “KovalamBeach” successfully, since its deletion does not violate any constraint.

Figure 37: Delete an instance no instance.

In Figure 37 the user has chosen to delete an instance of the class “Sports”, but no instances of the selected class exist in the ontology. The tool detects this and alerts the user.

Figure 38: Delete an instance minimum cardinality violation.

In Figure 38 the user has chosen to delete the instance “Pearl” of the class “BudgetAccommodation”. The tool detects that deleting this instance would violate the minimum cardinality constraint that applies to the instance property value “Rover” “hasAccommodation” “Pearl”. The tool alerts the user about the violation.

Figure 39: Delete an instance some values from violation.

In Figure 39 the user has chosen to delete the instance “DollMuseum” of the class “Museums”. The tool detects that this deletion would violate the someValuesFrom constraint that applies to the instance property value “Roaster” “hasActivity” “DollMuseum”. The tool alerts the user about the violation.

CHAPTER 8
CONCLUSION AND FUTURE WORK

8.1 CONCLUSION:

Today the internet is one of the fastest growing sources of information, and a huge amount of data resides on it.
In this project we identified the validation of data being added to the internet as a major problem. Data can be validated either syntactically or semantically. Many syntactic validators are available today and they are very efficient. Only a few semantic validators are available today and they are highly inefficient. Most of these semantic validators use the tableaux reasoning algorithm for validation. This algorithm computes the deductive closure of all the given facts or axioms.

In this project we considered the drawbacks of tableaux reasoners and the inadequacy of semantic data validators, and we developed an ontological tool that is domain independent and performs the specific task of validating data semantically. This tool does not compute the deductive closure of the facts; instead it computes only the set of inferences on the facts in the ontology that are needed for data validation. These computed inferences are stored in the database for further use. Whenever needed, some of the inferences are used to validate the data as soon as the user enters it.

Finally, we conclude that this approach addresses both of the drawbacks of the tableaux algorithm by eliminating the calculation of the deductive closure of the axioms. The model thus developed is highly efficient even for large ontologies. The performance of this tool is significantly higher than that of tableaux reasoners.

8.2 FUTURE WORK:

We can extend the current project by designing and developing new features such as:

1. This tool validates data against the core OWL functionalities; it can be extended to all the functionalities of OWL and OWL2.
2. This tool can be integrated with other open source tools for further enhancements.

REFERENCES

[1] Horrocks, Ian. "DAML+OIL: A Description Logic for the Semantic Web." IEEE Data Eng. Bull. 25.1 (2002): 4-9.
[2] Brickley, Dan, and Ramanathan V. Guha. "RDF Vocabulary Description Language 1.0: RDF Schema." (2004).
[3] Horrocks, Ian, Ulrike Sattler, and Stephan Tobies. "Reasoning with Individuals for the Description Logic SHIQ." Automated Deduction - CADE-17. Springer Berlin Heidelberg, 2000. 482-496.
[4] Horrocks, Ian. "The FaCT System." Automated Reasoning with Analytic Tableaux and Related Methods. Springer Berlin Heidelberg, 1998. 307-312.
[5] Baader, Franz, Ian Horrocks, and Ulrike Sattler. "Description Logics as Ontology Languages for the Semantic Web." Mechanizing Mathematical Reasoning. Springer Berlin Heidelberg, 2005. 228-248.
[6] McGuinness, Deborah L., and Frank Van Harmelen. "OWL Web Ontology Language Overview." W3C recommendation 10.10 (2004): 2004.
[7] Sirin, Evren, et al. "Pellet: A Practical OWL-DL Reasoner." Web Semantics: Science, Services and Agents on the World Wide Web 5.2 (2007): 51-53.
[8] Sirin, Evren, et al. "Pellet: A Practical OWL-DL Reasoner." Web Semantics: Science, Services and Agents on the World Wide Web 5.2 (2007): 51-53.
[9] Bao, Jie, Dave Braines, and David Mott. "A Distributed Tableau Algorithm for the ALC Description Logic."
[10] Mutharaju, Raghava, Frederick Maier, and Pascal Hitzler. "A MapReduce Algorithm for EL+." 23rd International Workshop on Description Logics DL2010. 2010.
[11] Shearer, Rob, Boris Motik, and Ian Horrocks. "HermiT: A Highly-Efficient OWL Reasoner." OWLED. Vol. 432. 2008.