The Use of Linked Data Approach For An Alternative Web Guide For Potential University Applicants

Noel Doherty
Computing
Session 2011

The candidate confirms that the work submitted is their own and that appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student)

Summary

This project is concerned with producing a case study: creating a useful web application which retrieves Web-based material from linked data sources and presents to the user an automatically generated web guide of Leeds. Included in this report is a background review of similar and related systems in this area, a literature review of materials for potential developers, and an in-depth evaluation of the effectiveness of the technologies in the area.

Acknowledgements

I would like to thank my supervisor, Lydia Lau, for her help during the whole project process; her help has kept me on task many times. I would also like to take the chance to thank my family; without their support I doubt I would have ever finished second year.

Contents

0.0.1 Project Aim
0.0.2 Objectives
0.0.3 Minimum Requirements
0.0.4 Challenges
0.1 Project Outline
    0.1.1 Introduction
    0.1.2 Schedule
0.2 Background Research
    0.2.1 The Web Of Data
        0.2.1.1 The Semantic Web
        0.2.1.2 Linked Data
        0.2.1.3 The Linked Data Cloud
        0.2.1.4 Summary
    0.2.2 Developmental Tools and Appropriate Technologies
        0.2.2.1 AJAX
        0.2.2.2 RDF (Resource Description Framework)
        0.2.2.3 SPARQL
        0.2.2.4 XHTML
    0.2.3 A review of selected sites which currently use Linked Data
        0.2.3.1 Police.uk
        0.2.3.2 bbc.co.uk
        0.2.3.3 GovYou.co.uk - Your Freedom, Your Ideas, Gov You
        0.2.3.4 New York Times - Who Went Where
    0.2.4 Linked Data and Content Management Systems
0.3 Case Study
    0.3.1 Requirements for Alternative Guide
    0.3.2 Data Issues
    0.3.3 Methodology
    0.3.4 Implementation
    0.3.5 Stage 1: First Iteration
        0.3.5.1 Design
        0.3.5.2 SPARQL
        0.3.5.3 XHTML
        0.3.5.4 JavaScript
    0.3.6 Stage 2: Second Iteration
        0.3.6.1 Design
        0.3.6.2 SPARQL
        0.3.6.3 XHTML
        0.3.6.4 Testing
0.4 Evaluation
    0.4.1 Against Project Objectives
    0.4.2 Against Requirements
    0.4.3 From A Potential User
    0.4.4 Evaluation of the effectiveness of technologies
0.5 Discussion
    0.5.1 Future Extensions
    0.5.2 Conclusion
Bibliography
A Personal Reflection

0.0.1 Project Aim

The aim of this project is to create an alternative web guide for prospective Leeds University and Leeds Metropolitan students while exploring the Linked Data platform. An alternative guide would offer a different look at the information students may be interested in, less about the studious aspects of university life, as opposed to the Taught Student Guide [21] which Leeds University publishes every year and the Leeds City Guide student section, which amounts to a list of bars [13]. The core aim of the project is to create a usable resource of information which is kept up to date automatically using the linked data resources currently on the World Wide Web, meaning the site would stay relevant over the course of time.
0.0.2 Objectives

The objectives listed below are achievable steps towards completing the aim of the project. The minimum objectives are as follows:

• To learn about the technical platform of linked data and apply that knowledge.
• To survey related work in the field of linked data.
• To document and gather the requirements of a site which uses linked data.
• To design and develop an alternative guide to Leeds.
• To evaluate the project in terms of the aims set down.

0.0.3 Minimum Requirements

The minimum requirements are:

• Use a Linked Data source to form the basis of the site.
• Embed retrieved Linked Data on the site dynamically.
• Create an interactive map of Leeds with key points of interest to a student, combining geolocation information and retrieved descriptions of places.
• Develop a website which natively runs on today's top three browsers: Mozilla Firefox, Google Chrome, and Windows Internet Explorer.

Enhancements that could be implemented are as follows:

• Use the BBC's Linked Data resource to add, where applicable, news articles relating to the content the user is currently viewing.
• Link in a social media site as a data source. This would involve connecting the data retrieved from Linked Data with a Twitter feed or Facebook page, so that live updates from, for example, clubs and bars would be displayed on the site.
• Create a jobs section which uses URIs for jobs which can be added on the site. Jobs would be submitted and tagged by the submitter, and searchable from the site.

Enhancements to the project are about improving the relevancy of the data, and creating a site where the information available is recent and therefore potentially more relevant.
0.0.4 Challenges

In creating a linked data application a developer is instantly faced with the following challenges:

• Finding data that works and is relevant
• The multitude of standards within ontologies
• Accessing the data

0.1 Project Outline

0.1.1 Introduction

This project aims to create a linked data application while also providing an evaluation of the linked data platform, using the case study of building an application to act as a web guide.

0.1.2 Schedule

This section contains a table which breaks down the time allotted to the full project. The four milestones on the project plan are four key points in the life of the project and refer to the following:

1. Milestone One: requirements gathering and background research have been completed.
2. Milestone Two: a first prototype which meets the minimum requirements has been created.
3. Milestone Three: a second and final prototype which should exceed the minimum requirements.
4. Milestone Four: the evaluation and final write-up have been finished.

 #   Task                                                        Dates                      Milestone
---  ----------------------------------------------------------  -------------------------  -----------
 1   Problem definition, aims, requirements and objectives       7th-13th February
 2   Preliminary investigation: investigate Linked Data, the     14th-21st February
     Semantic Web, SPARQL and approaches to linked data
 3   Research on data sources and features: investigate where    22nd February-2nd March
     data will be coming from (e.g. DBpedia, data.gov) and
     what features the site could use
 4   Mid-term report: produce report for submission on           3rd-8th March              Milestone 1
     progress so far
 5   Design: design the site and the SPARQL queries to be        9th-16th March
     used to retrieve information
 6   1st prototype: develop the first iteration of the           17th-27th March
     website, including a single data source being
     retrieved; this prototype should meet the minimum
     requirements
 7   Testing of prototype: test the prototype, ensuring data     28th March-1st April       Milestone 2
     is correctly being retrieved
 8   Progress meeting                                            1st April
 9   2nd prototype: final site creation; at this point           2nd-13th April             Milestone 3
     extensions will have been implemented
10   Testing of 2nd prototype: test the site, ensuring it        14th-17th April
     all works
11   Evaluation of system: evaluate whether the site has         18th-22nd April
     provided useful information and met the requirements
12   Full write-up                                               23rd April-9th May         Milestone 4

0.2 Background Research

0.2.1 The Web Of Data

0.2.1.1 The Semantic Web

The Semantic Web (SW) is a Web that includes documents, or portions of documents, describing explicit relationships between things and containing semantic information intended for automated processing by machines [11]. In essence the SW seeks to add meaning to the information that is present on the World Wide Web (WWW), with the aim of creating machine-readable metadata. The SW will not come as a new product; it will be built pragmatically. Like the Internet, the SW will be decentralized [3], with data sets existing independently and links connecting them. In contrast to the WWW, the current way of accessing information, the SW will contain structured data. This structure will be built by creating ontologies, and these ontologies will provide the relationships. An ontology is a specification of a concept which can be shared [12]. Ontologies would serve as the vocabulary [16] of the SW, with user-created agents using these vocabularies to navigate the data structures on the SW. An ontology, itself a document or file on the SW, would contain entities and the relations between them, with a taxonomy that defines what makes up the entities and the relations between the objects [3].

The concept of an agent has been discussed within computing before [4] [26]: agents are systems that are in an environment which they can perceive, are capable of unsupervised action within that environment, and carry out tasks with some objectives in mind [17]. Agents on the SW fall into two main categories: agents that retrieve and connect data, and agents which act as personal agents on the web.
An agent that retrieves data would be set parameters and would roam the SW using the vocabularies provided by ontologies. These agents would collect web content, process it for their user, and be able to prove the validity of the data by backtracking to sources. A personal agent as described in [3] could organise and rearrange appointments by interacting with private ontologies and users' timetables to automate the best times for a user's appointments considering their schedule.

The challenges faced by the Semantic Web are twofold: vastness and vagueness [1]. The vastness of the web is clear when you look at the millions of results returned by search engines; converting all of these pages into a semantic form would require millions of man-hours. Therefore convincing people to change their practices to ones in line with the semantic ethos is part of the vastness problem. Vagueness is a problem in the sense that trying to nail down what something really is in essence is difficult [18], and creating an automated agent to tag meaning to data is more difficult still.

0.2.1.2 Linked Data

The term Linked Data (LD) refers to a set of best practices for publishing and connecting structured data on the Web [15]. Linked data is built on the pre-existing web, using HTTP, URIs (uniform resource identifiers) and XML to publish structured data which is referential. Tim Berners-Lee cites four key aspects of Linked Data, as follows:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
There are certain perceived limitations in the linked data methodology. If anyone can create a Linked Data resource, then anyone can publish anything, just as is the case at the moment with pages on the web; if automatic agents are searching the web to create linked data mashups then this could be an issue. Berners-Lee talks about possible solutions to this with a star rating of the data [2]. However, I personally see it as a non-issue: the web is built on referential integrity, and choosing your data source is up to you as the designer of a system, which negates the problems of poor or incorrect data. The more Linked Data available on the Web, the more interesting applications people can develop [14]; it is an exciting area in computer science at the moment.

0.2.1.3 The Linked Data Cloud

A key group providing information and connections in the linked data community is the LOD (Linking Open Data) community. The LOD group have created an image to show a snapshot of the current Linked Data 'cloud' (fig. 1); each node on the figure is a linked data resource. To appear on the cloud image a dataset must contain at least 1000 triples; this is to exclude sites which consist simply of a FOAF profile. The LOD cloud is only a cloud in name: the nodes on the graph are self-contained data sets. The connections between the nodes are links made using the sameAs property, which defines the subject as the same as the object, thus connecting them. This is one of the expressions of simple facts that are covered in the section on RDF below.

Figure 1: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

DBpedia is the centre of the cloud (fig. 1); it is a linked data version of the Wikipedia website. DBpedia uses a combination of ontologies to describe the data contained in the articles.
Using a combination of an internal DBpedia ontology (fig. 2), which covers 272 classes containing approximately 1,300 different properties, with others such as FOAF, DBpedia attempts to give semantic meaning to the elements of data.

0.2.1.4 Summary

The background research suggests that overall Linked Data is an evolution of the Semantic Web idea; it is the next logical step. When describing the SW, agents and ontologies were discussed, but in the discussion of linked data you can see the ontologies and agents actually being created. The visionaries of Semantic Web ideas are moving the SW into today's Web, using RDF as the standard of choice.

The growth of Linked Data, however, appears not to match the hype it has received. With people like Tim Berners-Lee attempting to drive the project forward, it was assumed that the Linked Data cloud would grow quickly; however, looking at the numbers of datasets in the cloud, in 2007 there were 12 Linked Data sets, this number grew to 203 over the course of two years, and it has not increased since. It would appear that new datasets have stopped being released. This could be due to the difficulty of converting existing sites.

Figure 2: The DBpedia Ontology

Overall, Linked Data is attempting to build the Semantic Web on HTTP and URIs, but it does not make the web smarter. The Semantic Web won't solve the problem of which RDF graphs to look at; at the moment a human has to choose them, and bad choices result in bad results. I believe that factors like this and the difficulties in converting to RDF are putting users off creating Linked Data-aware applications and Linked Data sets. Once these difficulties have been overcome, I foresee Linked Data and the SW moving towards the ultimate goal of enabling computers to do more useful work and of developing systems that can support trusted interactions over the network [30].
0.2.2 Developmental Tools and Appropriate Technologies

The technologies that are at the core of linked data and the linked data approach are discussed in this section. These tools and methods do not exist solely to facilitate the creation of LD web applications; I will therefore exclude discussion of features that are not applicable to LD or the SW.

0.2.2.1 AJAX

AJAX is shorthand for Asynchronous JavaScript and XML. AJAX in itself isn't anything new; rather, it is a collection of methods designed around the 'art of exchanging' [10] data with a server to update a web page without reloading the whole page. AJAX is key for the project, as its core ideas will assist in creating a dynamically embedded site in line with the objectives of the project.

A core part of AJAX is the XMLHttpRequest object. This object, which is supported by all of the browsers mentioned in section 0.0.3, is used for asynchronous data retrieval from a server. Web applications using linked data resources hosted on the Web will use XMLHttpRequest to retrieve data from SPARQL endpoints.

The Document Object Model (DOM) is part of the AJAX family of conventions. It is useful for developing linked data based web applications because it allows access to the elements in the object model. Using elements with no semantic meaning, such as DIV and SPAN, linked data developers can create web applications without creating the content in advance, filling the elements with content dynamically on load. Examples of this can be seen on BBC News and BBC Sport, which use these for live news and sport updates. JavaScript is the last element, binding the AJAX methods together.

There are challenges in creating AJAX applications, though not essentially of a technical nature. The challenge with AJAX applications is having the vision to create something that is more than has been seen on the web before: to forget the limitations of HTML and move forward with a much wider pool of options and possibilities.
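The XMLHttpRequest pattern described above can be sketched as follows. This is a minimal illustration, not the project's code: the endpoint address and query are examples, and the helper name buildSparqlUrl is my own.

```javascript
// Sketch: building a request URL for a SPARQL endpoint. Endpoints
// conventionally accept the query text in the "query" parameter;
// "format=json" asks for a JSON result set. (Endpoint and query here
// are illustrative; buildSparqlUrl is a hypothetical helper.)
function buildSparqlUrl(endpoint, query) {
  return endpoint +
    "?query=" + encodeURIComponent(query) +
    "&format=json";
}

var url = buildSparqlUrl(
  "http://dbpedia.org/sparql",
  "SELECT ?name WHERE { ?uri foaf:name ?name } LIMIT 10"
);

// In the browser the URL would then be fetched asynchronously:
//   var xhr = new XMLHttpRequest();
//   xhr.open("GET", url, true);
//   xhr.onreadystatechange = function () {
//     if (xhr.readyState === 4 && xhr.status === 200) {
//       var results = JSON.parse(xhr.responseText);
//       // ... fill page divisions without a full reload
//     }
//   };
//   xhr.send();
```

Encoding the query with encodeURIComponent matters here: SPARQL text is full of characters (?, {, spaces) that would otherwise break the request URL.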
0.2.2.2 RDF (Resource Description Framework)

The Resource Description Framework is a data model. It can provide a conceptual description of the data being represented; for example, the tag "person" could be used to identify that a document is about a person [9]. The base standards for RDF are XML and URIs: URIs are used to identify what the data is about, and XML is the syntax of RDF. The W3C lists the following as the key concepts of the RDF format [19]:

• Graph data model
• URI-based vocabulary
• Data types
• Literals
• XML serialization syntax
• Expression of simple facts
• Entailment

Within RDF the triple is a key part of the makeup, made of three parts: the subject, the predicate and the object. The triple forms the basis for the query language, covered in the next section, and for the RDF expression of simple facts. Simple facts are represented in RDF by connecting the subject and object of a triple using the predicate, or property [19]; this creates links within datasets, so people can be connected to addresses or ideas using these expressions. RDF graphs are themselves just multiple RDF triples, with the subjects and objects making up the nodes of the graph.

In recent years work has been done to apply the RDF model to social media [29]. Social networks being just big connected datasets means that the connections between people can be mapped to an RDF triple graph; explorations of friendship networks, social interests and counter-terrorism can then be made. RDF ultimately is a way to create a system of machine-readable, processable identifiers for subject, object and predicate without confusion [22]. Its key characteristic is its ability to attribute meaning to data.

0.2.2.3 SPARQL

SPARQL ("sparkle") is a query language for RDF [24]; the name is a recursive acronym for SPARQL Protocol and RDF Query Language. It is one of the key components in Semantic Web technology, and it is quickly becoming the de facto standard for queries on the Semantic Web [5].
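Since RDF statements and SPARQL patterns share the same subject-predicate-object shape, a triple can be pictured as a plain record. The following is a sketch with made-up URIs; the helper objectsOf is illustrative, not a standard API.

```javascript
// Sketch: RDF triples as plain records, and a tiny graph as a list of
// them. All URIs here are made up for illustration.
var triples = [
  { s: "http://example.org/person/alice",
    p: "http://xmlns.com/foaf/0.1/name",
    o: "Alice" },
  { s: "http://example.org/person/alice",
    p: "http://xmlns.com/foaf/0.1/knows",
    o: "http://example.org/person/bob" }
];

// A simple fact is the predicate linking subject to object; matching on
// subject and predicate recovers all such facts from the graph.
function objectsOf(graph, subject, predicate) {
  return graph
    .filter(function (t) { return t.s === subject && t.p === predicate; })
    .map(function (t) { return t.o; });
}
```

A SPARQL triple pattern is essentially this same matching operation, with variables standing in for the positions left unbound.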
SPARQL queries are built using triples of subject, object and predicate. SPARQL is very similar in its syntax to SQL, with the terms SELECT and WHERE performing the same role. The SELECT query form returns variable bindings: a "Query Solution Object" [6] made of solution objects which correspond to returned results. An example is shown below. Given the data:

    <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "Linked Data Approach" .
    <http://example.org/book/book2> <http://purl.org/dc/elements/1.1/title> "The Semantic Web - A Primer" .
    <http://example.org/book/book3> <http://purl.org/dc/elements/1.1/title> "The Wizard Of Oz" .

This simple data set describes three books with the titles "Linked Data Approach", "The Semantic Web - A Primer" and "The Wizard Of Oz". The query:

    SELECT ?title ?uri
    WHERE
    {
      ?uri <http://purl.org/dc/elements/1.1/title> ?title .
    }

selects the title and URI of each book in the dataset. On the data above it has the following solution:

    Uri                              Title
    http://example.org/book/book1    Linked Data Approach
    http://example.org/book/book2    The Semantic Web - A Primer
    http://example.org/book/book3    The Wizard Of Oz

The other two forms a query can take are ASK and CONSTRUCT. ASK returns a true or false answer to a query, while CONSTRUCT forms the result set into valid RDF. SPARQL queries run on the web against live data sets are run against a SPARQL endpoint: a conformant SPARQL protocol service defined in the 2005 SPROT specification [7]. Tools such as VizQuer currently exist which can be used to generate SPARQL queries to run against SPARQL endpoints. In the project, SPARQL queries will be used to retrieve data from linked data sites, and the results are returned in JSON format [24].
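When a SELECT query is sent to an endpoint asking for JSON, the solutions come back in the SPARQL Query Results JSON Format: head.vars names the selected variables and results.bindings holds one object per solution. The sketch below flattens such a response and renders it as a well-formed XHTML fragment for dynamic embedding; the sample response mirrors the book data in this section, and rows, escapeXml and toList are illustrative helper names of my own.

```javascript
// Sample response shaped like the SPARQL Query Results JSON Format,
// mirroring the book example (values illustrative).
var response = {
  head: { vars: ["uri", "title"] },
  results: { bindings: [
    { uri:   { type: "uri",     value: "http://example.org/book/book1" },
      title: { type: "literal", value: "Linked Data Approach" } },
    { uri:   { type: "uri",     value: "http://example.org/book/book2" },
      title: { type: "literal", value: "The Semantic Web - A Primer" } }
  ] }
};

// Flatten each binding into a plain { variable: value } row.
function rows(result) {
  return result.results.bindings.map(function (b) {
    var row = {};
    result.head.vars.forEach(function (v) {
      if (b[v]) { row[v] = b[v].value; }
    });
    return row;
  });
}

// Escape the characters XML reserves, keeping the fragment well formed.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;")
                  .replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;");
}

// Render the rows as an XHTML list ready to drop into a div on load.
function toList(rs) {
  return "<ul>" + rs.map(function (r) {
    return "<li>" + escapeXml(r.title) + "</li>";
  }).join("") + "</ul>";
}
```

In the browser the fragment would then be assigned to a semantically neutral element, e.g. document.getElementById("results").innerHTML = toList(rows(response));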
0.2.2.4 XHTML

XHTML stands for eXtensible HyperText Markup Language. It is a combination of XML and HTML, differing from HTML in its more defined structure; its well-formed nature allows for true backwards and forwards compatibility between browsers. The current standard for XHTML is 1.1, released in November 2010. The benefits for a developer using XHTML are as follows:

• XHTML is easy to maintain. The strict nature of how XHTML must be formed makes it easy to spot errors, and with the strict XHTML checker on the W3C website developers can find out where they have gone wrong simply by validating against it.
• XHTML can take advantage of XML functionality. XHTML uses the Document Object Model [23], which defines the standard way of accessing elements of the page. XHTML's DOM model allows access to elements on the page and to the attributes of those elements.

0.2.3 A review of selected sites which currently use Linked Data

This section looks at web applications which already exist on the WWW and use LD as a basis for the service being offered. Like a piece of work published in a journal, web applications created by companies and the government provide future LD users with an idea of how to approach their own solutions. In these reviews we will look at how sites retrieve the data they are using, how they have chosen to display that data, and how they have augmented it with data from other sources.

0.2.3.1 Police.uk

Police.uk is a site published in January 2011, created for the police force. It uses linked data from data.gov combined with Google Maps to give detailed breakdowns of crimes by postcode. To find your area you simply enter your postcode on the front page and it takes you to a page relating to the crime statistics in that postcode (figure 4). Police.uk is built on AJAX principles: the page is made up of div elements which are fed with information from jQuery scripts. The data set used is the police Crime and Local Neighbourhood data.
This data set contains statistics broken down by area and level of crime. Crime is split into six categories: anti-social behaviour, robbery, burglary, vehicle crime, violent crime, and other crimes. These are then placed on a street map, at first just showing the numbers of the different types of crime in an area on a Google map. Zooming into the map shows a detailed breakdown of where the crimes took place, by road (figure 5). To handle this traversing of information, a jQuery script running an XMLHttpRequest retrieves the information for all of the crimes around the area originally searched for when that area is first cached (figure 3); this list, returned in JSON format, can then be added to the map if the user zooms in on a location which contains any crimes in the list.

In addition to the crime data, police.uk uses social media, through Twitter and YouTube, to augment the retrieved data. Two jQuery scripts are used to retrieve this information, which, like the crime data, is then loaded into a 'Twitter/YouTube' page division. Twitter and YouTube are not linked data sources; in this case the retrieved tweets and videos provide insight into the police's daily work. This site is an example of what a mashup between linked data and social media can achieve. Applications like this, built on linked data, are indicators of the potential look and feel of what the Semantic Web could offer us in the new era of governmental transparency. Social media sources at this time are not in a linked data RDF format; however, if current trends continue there is potential for that to change, and if it did, the modular nature of the police website would make it an easy transition.

0.2.3.2 bbc.co.uk

The BBC have had a website since 1994; in the past the BBC used to maintain all of its sites by hand.
The BBC ran into a problem with the vast number of pages it was having to maintain: thousands of web pages under its domain, maintained primarily as independent sites, led to sites going down and not coming back up because the original creator wasn't around any more. A new solution was needed. The BBC chose to adopt a linked data approach to creating its mini-sites, by creating an RDF database containing the information on the BBC. Creating a mini-site on /programmes is now a case of defining the information [28]; then, as long as the core database is maintained, the dependent sites will also be maintained. For example, Outcasts is a new show on the BBC (figure 6). Currently the show is available on iPlayer; the links to the episodes are updated when the show is uploaded to iPlayer and the most recent link is taken. The smart part is that this is done automatically, and when the episodes are taken down from iPlayer the site will reflect that.

The BBC /music site continues the linked data approach, taking it further than /programmes. /music uses information from DBpedia (figure 7) and MusicBrainz (figure 8). By pulling biographies, links and information from resources that are actively maintained and updated by a peer-reviewed, active community under Creative Commons licences, the BBC can be fairly sure that the links and information provided are accurate and up to date.

To summarize, the BBC have taken a linked data approach to creating their sites; this will save them time updating each page, make their sites more relevant and in the end save them money, which for an organization funded by public money can only be a good thing.
Figure 3: Retrieving Crime Data
Figure 4: Screenshot of police website
Figure 5: Screenshot of zoomed-in map
Figure 6: The Outcasts Minisite
Figure 7: An artist biography pulled from DBpedia

0.2.3.3 GovYou.co.uk - Your Freedom, Your Ideas, Gov You

GovYou is a linked data site created to continue the work of the original governmental 'Your Freedom' website, which was launched in 2010 by Nick Clegg at the start of the coalition government. The original Your Freedom site involved users submitting ideas directly to the government. The site was later closed, with the submitted ideas being turned into a linked data resource. The GovYou site, http://www.govyou.co.uk/, is a continuation of the progress made by the government. It is a tagged, searchable archive of the 14,000 submitted ideas, with an option to add new ideas. As discussed in section 0.2.1.2, GovYou uses HTTP URIs which refer to the idea at that URI; for example, the top idea on the site is 'Abolish Control Orders', which has the URL http://www.govyou.co.uk/abolish-control-orders/

The GovYou site is a good example of the use of linked data to collect and collate masses of ideas and form them into tagged, indexable archives, with the optimistic aim of generally improving the society we live in.

0.2.3.4 New York Times - Who Went Where

The New York Times' Who Went Where site is a linked data site which uses the New York Times' own linked data set and DBpedia. The New York Times' data set contains thousands of subject headings which are mapped to DBpedia, Freebase and GeoNames [27]. The site is built using XHTML and jQuery to create a web-based search application. The alumni-in-the-news site is a fairly simple example of what can be achieved with their own dataset. Using DBpedia it retrieves the names of all the colleges and universities in the world, with the query below:

    SELECT ?uri ?name
    WHERE {
      ?uri rdf:type dbpedia-owl:University .
      ?uri foaf:name ?name
    }
    LIMIT 1000 OFFSET offset

This query, one of many which make up the site, is made from two triples. The first triple selects the URI of every object of type University in the DBpedia dataset; the second takes every such URI and retrieves its foaf:name. The site then queries DBpedia for the NYT identifiers of all the alumni of that institution, which are then used to query the NYT's own search API. For example, searching for Leeds University results in a page on Jack Straw (figure 9), the only notable alumnus from Leeds to have ever appeared in the New York Times, with links to all the NYT articles relating to him in ascending chronological order.

This site is a perfect example of the 'linked' part of linked data: simple applications created using referential data, resulting in an informative and useful search engine. The real power of LD comes in the form of composite queries; building applications using different datasets, and creating queries from the results of one dataset, appears to be core to the linked data approach.

Figure 8: Artist links and information pulled from MusicBrainz
Figure 9: The page returned for University of Leeds

0.2.4 Linked Data and Content Management Systems

There is a connection between LD and content management systems (CMS). At the technical level, a CMS is designed to enable groups of people to share and edit data. In the same sense, many of the operators in the Linked Data Cloud are running CMS-type services where the user base is allowed to edit and add new information to sites. Currently there is an issue within the CMS industry with reconciling the new semantic values of data, coupled with metadata [20], against the current CMSs that drive many text- and multimedia-content-driven sites.
Solutions to this issue have begun to appear [8]; in this paper the team create a system which 'enables the exposure of site content as linked data', taking a Drupal site and converting it essentially into an RDF format. The implication of this is that current CMS systems could potentially be retrofitted to an RDF linked data format, with SPARQL endpoints created. The benefit of conversion to the RDF linked data standard is that developers would then be able to dynamically load data into their sites. With over one in every hundred websites in the world using Drupal as a backend CMS, the use of the linked data standard is set to rise. The more data available on the Web in this standard format, the more options for future web agents or linked data based content systems; this is a call echoed by Tim Berners-Lee in talks in 2009.

0.3 Case Study

Having researched linked data, the standards that govern its use, and the sites that offer linked data repositories, the evaluation of the linked data approach continues with a case study: the creation of a city guide for prospective Leeds students. This guide would use linked data standards to create a repository of information on Leeds the city.

0.3.1 Requirements for Alternative Guide

Creating a guide for potential students of Leeds' universities is something I can relate to, remembering my own time looking at potential universities. I decided to build on my personal reckoning of what a potential applicant would need through a series of informal discussions with current students and students currently looking at universities. These would be guided by myself to keep the discussion on the possibilities of a guide.

In total four discussion groups were held, with a total of 11 people, 4 of whom were current applicants. Over the course of these discussions, only two main points came across in all four discussion groups.
These were that they wanted to get a feel for Leeds by looking at the guide, and that they wanted to navigate through Leeds in the guide and 'discover' it. To 'discover' Leeds and get a feel for it were too vague to be requirements for a site, so when these points were raised during the discussions I inquired what these ideas meant to the people in the group. This developed the feelings into requirements: having a map showing the location of what they were reading about, with clear information and links connecting different places together. Once the discussions had been completed and the notes analyzed, it was clear that this guide could be created as a linked data mashup application with the following non-functional requirements:

• Clear information about Leeds
• Simple navigation between pages

Once the general requirements had been gathered, functional requirements were needed. Using the information gathered in the discussions in conjunction with the background research, these functional requirements emerged:

• Retrieval of linked data from an online dataset.
• Methods to dynamically embed linked data into an XHTML form.
• A map showing key locations, with information on each location.

0.3.2 Data Issues

From my evaluation of linked data I believe the current trend in linked data sites is that the idea for a site is conceived by looking at the LD resources in the LOD cloud and working from there. Choosing to create a web guide after researching the technology and resources means that many of the more specialized linked data resources are simply not relevant to the data required for a web guide. With this taken into account, DBpedia was chosen as the resource for the first iteration; this would provide the 'bones' to build on.
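The second functional requirement, dynamically embedding retrieved linked data, ultimately comes down to turning result rows into markup. A minimal sketch is shown below; the helper name and the sample row are illustrative, not taken from the project code:

```javascript
// Illustrative helper: turn rows of {name, uri} pairs, as might be
// extracted from a SPARQL result set, into an HTML list of links.
// The function name and example data are hypothetical.
function renderLinks(rows) {
  return rows.map(function (row) {
    return '<li><a href="' + row.uri + '">' + row.name + '</a></li>';
  }).join('');
}

var html = renderLinks([
  { name: 'Roundhay Park', uri: 'http://dbpedia.org/resource/Roundhay_Park' }
]);
// html: '<li><a href="http://dbpedia.org/resource/Roundhay_Park">Roundhay Park</a></li>'
```

Keeping markup generation in a pure function like this also makes the embedding step testable independently of any endpoint.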
0.3.3 Methodology

With this case study seeking to explore the potential applications of linked data, the methodology of the implementation should enhance this exploratory nature. With this in mind, for the software engineering part of the case study I used an evolutionary development methodology with some elements of rapid development.

0.3.4 Implementation

0.3.5 Stage 1: First Iteration

0.3.5.1 Design

Once the requirements of the application had been constructed, I had a clear vision of what the application was attempting to achieve. The first iteration would be an XHTML page augmented with JavaScript, which would use jQuery to query the DBpedia SPARQL endpoint; on receiving the data, the XHTML page is updated. Using an evolutionary development approach, a stage one iteration was required. This would form the core of the application; the design broke into interconnected parts, broadly the SPARQL queries, the JavaScript methods and the XHTML page. In this section I will detail the creation of these parts.

0.3.5.2 SPARQL

The data retrieved from the SPARQL queries was core in powering the web application; the first iteration used DBpedia to retrieve data. To build the web guide it would be key to gather data on places in Leeds, so that, as brought up in the discussion sessions, users could explore the places in and around Leeds, learning about the city as they explored. As covered in section 0.2.2.3, SPARQL queries are in general made up of triple patterns. Designing the SPARQL queries required me either to select data from a specific resource or to find items connected with Leeds. In line with the evolutionary development approach, the design of the queries was itself an evolution: starting with selecting the abstract for the resource Leeds, I then built from there.
To extract the abstract I needed to build a query that used the central resource in DBpedia for Leeds (http://dbpedia.org/resource/Leeds) as the subject of the query, the ontology property for the abstract as the predicate, and a variable as the object:

SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Leeds> <http://dbpedia.org/ontology/abstract> ?abstract
}

This query returned all the abstracts held for Leeds; as part of the rapid evolutionary development, building on this query, a filter was added to restrict the results to the English ones:

FILTER langMatches(lang(?abstract), 'en')

With this basic query completed, retrieving all the places in Leeds was the next stage:

PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX db: <http://dbpedia.org/resource/>
PREFIX dpowl: <http://dbpedia.org/ontology/>

SELECT ?name WHERE {
  ?uri dpowl:location db:Leeds .
  ?uri foaf:name ?name .
}

This query was built using two triple patterns. The first finds the URI of every resource that has its location set as Leeds; the second then retrieves the foaf:name of each URI.

0.3.5.3 XHTML

The design of the XHTML page was the simplest part of the process; at this stage the page the application lives on simply had to contain divs that could be added to by a jQuery script.

</head>
<body>
  <h1>An Alternative Web Guide</h1>
  <div id="links"></div>
  <div id="abst">
    <img id="new"><img id="theImg" width="400" height="300" align="right" border="0">
  </div>
</body>
</html>

Each div provides an inclusive way of grouping areas of content; each acts as a container for data retrieved from the SPARQL queries.
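Before detailing the scripts, it helps to know the shape of what they consume. The endpoint replies in the W3C SPARQL query results JSON format [6]: variable names under head.vars, and one object per row under results.bindings, where each cell is a {type, value} pair. A small helper to flatten rows into plain objects can be sketched as follows; the helper and sample data are illustrative, while the guide's own scripts walk data.results.bindings directly:

```javascript
// Flatten the W3C SPARQL-results-in-JSON shape into plain row objects.
// Illustrative helper; the project scripts iterate the bindings directly.
function flattenBindings(response) {
  return response.results.bindings.map(function (binding) {
    var row = {};
    Object.keys(binding).forEach(function (v) {
      row[v] = binding[v].value;
    });
    return row;
  });
}

// Abbreviated example of the format a SPARQL endpoint returns:
var sample = {
  head: { vars: ['uri', 'name'] },
  results: { bindings: [
    { uri:  { type: 'uri',     value: 'http://dbpedia.org/resource/Leeds_Town_Hall' },
      name: { type: 'literal', value: 'Leeds Town Hall' } }
  ] }
};

var rows = flattenBindings(sample);
// rows[0].name is 'Leeds Town Hall'
```

Dropping the type annotations early like this keeps the rendering code free of repeated `.value` lookups.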
0.3.5.4 JavaScript

In the design of the JavaScript elements of the site, the first objective was to create a set of methods for the retrieval and manipulation of data. During the investigation jQuery appeared to be used extensively; I concluded that this was due to its simplification of the AJAX methods, so jQuery would be used for the development of the JavaScript elements of the alternative guide. To access the linked data once the SPARQL queries were designed, I created a function using jQuery's AJAX request to send the query to the endpoint:

function addlinks(div_id, sparql)
{
  // The DBpedia URL is spliced with the SPARQL query to create the request
  var dbpediaUrl = "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=" +
      escape(sparql) + "&format=json";

  // The jQuery .ajax method is used to run the query against the DBpedia set
  $.ajax({
    // The returned data type set here will be JSON
    dataType: 'jsonp',
    jsonp: 'callback',
    url: dbpediaUrl, // set above

    success: function(data) {
      // For each binding in the results the script adds a link to that
      // place in the div_id that has been given
      $.each(data.results.bindings, function(entryIndex, entry) {
        var txt = "'" + entry.name.value + "'";
        var content = document.getElementById(div_id).innerHTML;
        document.getElementById(div_id).innerHTML = content +
            "<li><a href='#' onClick=\"updateinfo('" + entry.uri.value + "')\">" + txt + "</a></li>";
      });
    }
  });
}

This function, given a SPARQL query and a div identifier, updates and rewrites the XHTML of the div; this is a key part of the AJAX idea. The new XHTML contains links which run the next script to retrieve the updated information.
The function was designed to propagate just the first set of links to places, so that a user could see places of interest and click them; on clicking, the updateinfo script would be run:

function updateinfo(URI)
{
  var sparql = "SELECT ?abstract ?name WHERE { " +
      "<" + URI + "> <http://dbpedia.org/ontology/abstract> ?abstract . " +
      "<" + URI + "> rdfs:label ?name . " +
      "FILTER langMatches(lang(?abstract), 'en') }";

  var dbpediaUrl = "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=" +
      escape(sparql) + "&format=json";

  $.ajax({
    dataType: 'jsonp',
    jsonp: 'callback',
    url: dbpediaUrl,
    success: function(data) {
      $.each(data.results.bindings, function(entryIndex, entry) {
        document.getElementById("abst").innerHTML =
            "<h2>" + entry.name.value + "</h2><p>" + entry.abstract.value + "</p>";
      });
    }
  });
}

On click, this script uses the URI of the resource the user clicked on to update the page with the abstract, and adds a heading with the name. Because of the way the AJAX methods work, each time a user clicks, the page is updated without reloading. There is a delay between the click and the update due to the sending and receiving of data.

0.3.6 Stage 2: Second Iteration

0.3.6.1 Design

Once the first iteration was complete, the application was retrieving place data from DBpedia. This data was then dynamically formatted into a list of links which, on click, would retrieve the abstract from the selected object's URI. This first experiment was somewhat of a success; however, the application was just a list of place names and abstracts. The application needed to evolve.
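Both addlinks and updateinfo build the same endpoint URL inline. A shared helper could remove that duplication; the sketch below is an illustrative refactoring rather than the project's code, and uses encodeURIComponent in place of the now deprecated escape():

```javascript
// Build a DBpedia SPARQL request URL for a given query string.
// Illustrative refactoring; the project code inlines this in each function.
function dbpediaQueryUrl(sparql) {
  return 'http://dbpedia.org/sparql' +
      '?default-graph-uri=' + encodeURIComponent('http://dbpedia.org') +
      '&query=' + encodeURIComponent(sparql) +
      '&format=json';
}

var url = dbpediaQueryUrl('SELECT ?s WHERE { ?s ?p ?o } LIMIT 1');
// url begins 'http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=...'
```

encodeURIComponent also escapes characters such as '+' and '/' that escape() leaves alone, which matters in query strings where a literal '+' would otherwise be decoded as a space by the server.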
Building on the methods created in the first iteration was key to the second, so the new SPARQL queries would be altered versions of the previous queries, selecting different data objects and retrieving more data.

0.3.6.2 SPARQL

The second iteration's SPARQL queries had to retrieve not just more information but better, more relevant information. I worked under the assumption that subject categories on the topic of Leeds, containing connections to the places and things in Leeds, would provide a better basis for creating a guide. Rather than a list of places, we would now have a list of subjects which could be delved into. The query selects the URIs and names of the categories sitting under the category whose skos:prefLabel is Leeds; the powerful nature of linked data structures can be seen here, using the 'fact' that categories of information on Leeds always sit under the category with the prefLabel Leeds:

SELECT ?uri ?name WHERE {
  ?float skos:prefLabel 'Leeds'@en .
  ?uri skos:broader ?float .
  ?uri rdfs:label ?name .
}

This query returns the result set:

uri                                                                     name
http://dbpedia.org/resource/Category:Parks_and_commons_in_Leeds         "Parks and commons in Leeds"@en
http://dbpedia.org/resource/Category:Companies_based_in_Leeds           "Companies based in Leeds"@en
http://dbpedia.org/resource/Category:Bishops_of_Ripon_and_Leeds         "Bishops of Ripon and Leeds"@en
http://dbpedia.org/resource/Category:History_of_Leeds                   "History of Leeds"@en
http://dbpedia.org/resource/Category:Politics_of_Leeds                  "Politics of Leeds"@en
http://dbpedia.org/resource/Category:Sport_in_Leeds                     "Sport in Leeds"@en
http://dbpedia.org/resource/Category:People_from_Leeds                  "People from Leeds (district)"@en
http://dbpedia.org/resource/Category:Transport_in_Leeds                 "Transport in Leeds"@en
http://dbpedia.org/resource/Category:Leeds_media                        "Leeds media"@en
http://dbpedia.org/resource/Category:Geography_of_Leeds                 "Geography of Leeds"@en
http://dbpedia.org/resource/Category:Culture_of_Leeds                   "Culture of Leeds"@en
http://dbpedia.org/resource/Category:Leeds_City_Region                  "Leeds City Region"@en
http://dbpedia.org/resource/Category:Music_from_Leeds                   "Music from Leeds"@en
http://dbpedia.org/resource/Category:Visitor_attractions_in_Leeds       "Visitor attractions in Leeds"@en
http://dbpedia.org/resource/Category:Buildings_and_structures_in_Leeds  "Buildings and structures in Leeds"@en
http://dbpedia.org/resource/Category:Education_in_Leeds                 "Education in Leeds"@en
http://dbpedia.org/resource/Category:Local_government_in_Leeds          "Local government in Leeds"@en

With a new, improved query for getting the basis of the data, the lack of information on screen was the next key issue. To solve this problem I opted to improve on the iteration one query for retrieving the abstract, adding in the additional data elements needed. These would be an image and some geolocation points to plot on the map:

SELECT ?abstract ?imageuri ?lat ?long ?name WHERE {
  <uri> <http://dbpedia.org/ontology/abstract> ?abstract .
  <uri> rdfs:label ?name .
  <uri> foaf:depiction ?imageuri .
  <uri> geo:lat ?lat .
  <uri> geo:long ?long .
  FILTER langMatches(lang(?abstract), 'en')
}

The image is retrieved using foaf:depiction; geo:lat and geo:long are, unsurprisingly, the latitude and longitude of the location. With these new SPARQL queries created, new divs would be needed to hold the new information.

0.3.6.3 XHTML

With the decision to select data in a different way, two divs were now required for links: one to show the categories of subjects and one to show the subjects of each category. Keeping the links div, a places div was added to accommodate the extra data:

<div id="places"></div>

Creating a map using the Google Maps API requires a div to put the map in; as with the other div elements in the guide, this one has no meaning alone and acts as a space to dynamically add the map to:

<div id="map" style="width: 550px; height: 450px"></div>

The final element added in iteration two was an image element:

<img id="theImg" width="400" height="300" align="right" border="0">

The image element is updated by the revised updateinfo function, which sets the src attribute of the img element to match the content being displayed at the time.

JavaScript

With the functions built in iteration one, the second iteration built on the updateinfo and addlinks functions. The addlinks function remained unchanged, with the new SPARQL query being run and the list of categories loaded into the links div. The updateinfo function's success callback was altered to process the new data:

success: function(data) {
  $.each(data.results.bindings, function(entryIndex, entry) {
    var newImg = new Image();
    newImg.src = entry.imageuri.value;
    document.getElementById("abst").innerHTML =
        "<h2>" + entry.name.value + "</h2>" +
        "<img src='" + newImg.src + "' width='400' height='300'>" +
        "<li>Lat: " + entry.lat.value + "</li><li>Long: " + entry.long.value + "</li>" +
        "<p>" + entry.abstract.value + "</p>";
    updatemap(entry.lat.value, entry.long.value, entry.name.value, entry.abstract.value);
  });
}

The function now updates the heading and abstract as before, and sets the img element's src to the foaf:depiction, which is the URL of an image of the resource being accessed. Finally, the success callback updates the map to show the location of the current content:

function updatemap(lat, long, name, info)
{
  var latlng = new google.maps.LatLng(lat, long);
  var myOptions = {
    zoom: 12,
    center: latlng,
    mapTypeId: google.maps.MapTypeId.ROADMAP
  };
  var map = new google.maps.Map(document.getElementById("map"), myOptions);
  var marker = new google.maps.Marker({
    position: latlng,
    map: map,
    title: name
  });
}

The updatemap function uses the Google Maps API; it takes four arguments: latitude, longitude, name and information. These arguments are used to add a marker to the map showing the location of the current content.

0.3.6.4 Testing

To test the final prototype I had to make sure that it ran correctly in the three most used browsers on the Web today. The effect of browsers on web development has been discussed in the past [25]; the way current browsers are made means that having a web page which is strict XHTML is no guarantee that the page will behave the same in each browser. Therefore I looked at the top three browsers: Mozilla Firefox, Google Chrome and Microsoft Internet Explorer.
Browser             Page Loads   Shows Picture   Retrieves Extra Data   Map Updates
Chrome              Yes          Yes             Yes                    Yes
Firefox             Yes          Yes             Yes                    Yes
Internet Explorer   Yes          Yes             Yes                    Yes

Testing of today's most widely used browsers

As the table shows, all browsers were functionally working. The differences came in the behavior of displaying the information on the site: Internet Explorer placed the image over the first part of the abstract; Firefox placed the image above the text with no text being obscured; and Chrome also placed the abstract data below the image. The reason for the different behaviors is that a style sheet wasn't used to properly control the div elements. Without a cascading style sheet controlling the divs, the browsers apply their own logic in ordering the elements; this is fine in Chrome and Firefox, although the site remains very plain, but in Internet Explorer, for some unknown reason, the output is wrong.

0.4 Evaluation

In this section the evaluation is split into four subsections. Three concern the development of the web guide: evaluation against the objectives, against the requirements, and from a user's viewpoint. The final subsection is an evaluation of the linked data approach, taking a look at the effectiveness of the technologies as they stand.

0.4.1 Against Project Objectives

The project objectives were set down in section 0.0.2 at the beginning of the project process. They were made up of four key points which, if accomplished, would indicate that the project had been a success. Each point is evaluated below:

– To learn about the technical platform of linked data and apply that knowledge - This objective was completed in two distinct ways. Learning about the linked data platform was a central part of the background research section; looking at the standards in place and the technologies that power linked data provided an insightful look into the platform. The creation of the web guide was the application of that knowledge.
– To survey related work in the field of linked data - As part of my research I looked at other websites and web-based applications which used linked data to power their sites. This survey provided an insight into how people are going about creating linked data based applications, while also serving to further my understanding of the platform.

– To document and gather the requirements of a site which uses linked data - In the creation of the web guide, requirements were gathered for the linked data site; this is documented in section 0.3.1.

– To design and develop an alternative guide to Leeds - The case study of the alternative guide documents the design and development in section 0.3.

0.4.2 Against Requirements

As one of the objectives was to design and develop an alternative guide to Leeds, a set of requirements was set down; these requirements form the basis for judging whether the guide achieved its goals:

– Use a Linked Data source to form the basis of the site - In the creation of the alternative web guide the linked data source DBpedia was used to form the basis of the site. In the second iteration the web guide selected all the subjects bordering on Leeds the subject; these subject headings contained information pertaining to different aspects of Leeds. This was an attempt to give a full guide to Leeds the city. The approach failed as much as it succeeded: the returned results were useful up to a point, the problem being that they were not all of the same type, and the different ontologies proved awkward to handle. Nevertheless, the web guide did retrieve data from a linked data source.

– Embed retrieved Linked Data on the site dynamically - Using the AJAX methods I was able to complete this goal. The XHTML page was divided into parts which the JavaScript updated dynamically.

– Create an interactive map of Leeds with key points of interest to a student, combining geolocation information and retrieved descriptions of places.
- In iteration two a SPARQL query was created to retrieve the geolocation of each place in Leeds, and each point was then added to a map. This was done using the Google Maps API; the map was interactive in that a user could move about it and see some information about each point.

– Develop a website which natively runs on today's top three browsers: Mozilla Firefox, Google Chrome, and Microsoft Internet Explorer - As shown in the testing after iteration two, with some minor tweaks the site ran in each listed browser.

0.4.3 From A Potential User

To fully evaluate the guide, the viewpoint of a user must be considered. To do this we will look at the experience of a user using the system to find out about the city of Leeds: I will anticipate the actions of a new user of the system and discuss the pitfalls and potential highlights.

The first thing a user of the guide finds is that it appears like a normal web page: links on one side, pictures and headings. On clicking links the user will find more links appearing on the subject that was clicked. Investigating those new links will likely lead the user to discover that the page updates with new information automatically. The way the product is designed, the ease of use comes from its simplicity: any user can open the page and just click through links, discovering more information on Leeds.

While using it, however, the user will most likely start to realize the page is far from instant: each time a link is clicked there is a wait while the next data set is retrieved. This is the first real pitfall in the linked data approach: unless you cache your data in one load at the start of a session, the user has to load bits of data at a time, which results in many small waits.

Due to the design of the SPARQL queries, some of the objects returned from the categories do not fit the pattern used to retrieve the image, abstract, name and coordinates.
Approximately 90% of the links return new information when clicked, but the result may be missing an image if it is a concept, or coordinates if it is not a place. This isn't a major problem in essence, but from a user's perspective it is a large flaw in the design.

The web guide suffers from one grave flaw in that users have no control over any of the site except for the links in the sidebars. As much as the page is dynamically embedded, the experience for a user isn't dynamic because of that; the opposite, in fact, is true: clicking through pages of automatically retrieved and dynamically displayed data does not have the same appeal as a well-crafted site. Overall, though, it does feel like you explore Leeds, clicking through the different parts of the extracted data; the map provides a point of reference so new visitors to the city can connect the dots as they move from one data object to the next. The guide's design is basic; this would be a major pitfall if the product were going to market, but as the product is only a prototype, more features would be needed to improve the user experience.

0.4.4 Evaluation of the effectiveness of technologies

Creating the web guide was part of my investigation into the linked data approach. I wanted to look both at the practical problems of using linked data in its current state and at what can be created in a short space of time using the current standards in web technology. In this subsection I will address the issues I have found with the linked data platform and evaluate the effectiveness of the technologies.

With SPARQL being the standard for querying RDF, it is central to any evaluation of the linked data approach. As touched on in section 0.4.3, the main problem that appeared with the SPARQL query language isn't in the semantics of the language; it lies in the retrieval of the data. When querying a SPARQL endpoint, the application sends an XMLHttpRequest to the endpoint, which then responds with the results in a selected format.
This creates a 'lag' between users opening the site and the data being displayed; this was seen in the evaluation of the New York Times web mashup, which covered its loading times with a waiting message. The problem can be avoided by keeping a copy of the dataset locally, but this is not practical if multiple large sets are being used. The delay also seems to grow as more sets of data are used.

Repetition of data is another problem that needs to be addressed; even within DBpedia alone there is a lack of consistency in the data. This becomes increasingly obvious as one explores the datasets: for example, on DBpedia many resources use foaf:name and dbprop:name, and in my experience both values are the same when both are present, but often only one or the other was. This created a problem in selecting the right property for a given piece of information, and could be solved by removing the repetition, having only one ontology that describes names.

The AJAX methods were the most effective existing part of what I perceive the linked data approach to be. Using JavaScript to retrieve data and manipulate DOM objects provides a developer with a clear route for collecting and dynamically displaying data. The combination of JavaScript and XHTML creates options that were not available to HTML developers; AJAX methods provide a perfect framework for the creation of mashups.

During my background research I came to the conclusion that the growth rate of linked data resources had slowed down and that the rate of creation of new datasets was slowing. However, when building the application I was overwhelmed by the number of datasets in use, and in this lies the problem.
At the moment, with only 203 nodes in the LOD cloud, choosing DBpedia as the source for retrieving data was an obvious choice: a central node with good links to other sites. But if there were 15,000 sets, without some logical agent traversing the links between data it is hard to see how you could deal with the amount of data you could possibly use; and with calls for all data to be in an RDF format there is potential for exponential growth. With linked data being a new standard, this may be analogous to complaining that an increase in web pages made it hard to find a good web page, and the inception of new sets could conversely spark the promised wave of new linked data applications. The linked data approach is about creating new services or resources from data that is being shared; it is also about trying to add something to the data being retrieved, to make it more than the sum of its parts. The technologies to achieve this goal exist today; however, the issues discussed above show that using them together is not yet straightforward.

0.5 Discussion

0.5.1 Future Extensions

There is an extensive list of extensions that could be made to extend and refine the web guide. In this section I look at the extensions that could plausibly have been added to the web guide application, had additional time been available, to improve its functionality. Additions to the site should build on the data resources that have already been retrieved. Several of the more practical potential enhancements are listed below:

– Creating a linked data jobs set would be a massive addition to the guide; many students looking at universities would like to look at the available jobs. Creating an RDF repository would also be another good way of expanding my understanding of linked data.

– Addition of more linked data sources, meshing them into the current structure of the site as an experiment to see whether more data equals a better guide.
Instead of building queries from terms retrieved from DBpedia, it would be interesting to create composite queries that used DBpedia's sameAs links to follow the links in the LOD cloud to new data.

– An area for extension can be found where there is a crossover between linked data and (social) media. I would look to extend the web application by creating some kind of media-aware agent which, when given data, could use those terms to retrieve media based on them. For example, as on police.uk, this could be in the form of YouTube videos, or it could query RSS feeds for news articles.

– To extend the guide I would like to attempt to combat loading times by applying some logic to the retrieval of data. Caching the next potential clicks when a user has selected a category could reduce perceived waiting times.

– At the end of the second iteration the web guide still looked very basic and left it to the browser to decide how to display the page. To improve on this I would have liked to implement a style sheet which did more than just divide the divs into basic sections, and also to use a JavaScript API to implement the link sections in more visually appealing ways.

0.5.2 Conclusion

Linked data is a re-envisioning of the semantic web in less flowery terms; it is a real way to make a semantic web future possible. The noise surrounding this area has grown in recent years. The linked data applications being built now by developers are starting to evolve from web pages into real, interactive, useful applications, with some of them being of real use to people. When people helped after the earthquake in Haiti, using the power of linked data to update the map, which was then shared with rescue workers on the ground, a real sense of what linked data could accomplish was shown. Linked data is as much about people putting data onto the web as it is about the applications made from the data.
As the LD 'cloud' continues to grow, the potential for new applications grows with it. The linked data approach to delivering data lies in the RDF format: metadata with more value, because it is fully integrated, which makes it easier to link data from different providers.

The goal of this project has been to produce an alternative guide to Leeds using the linked data approach while exploring the linked data platform. The evaluation that has been performed, although limited, indicates that this has been achieved. The exploration of the linked data concepts has been the real challenge in the process. It seems clear that linked data is here; it is being used by people to create a new type of web-aware application.

Bibliography

[1] G. Antoniou and F. Van Harmelen. A Semantic Web Primer. The MIT Press, 2004.
[2] T. Berners-Lee. Linked data. International Journal on Semantic Web and Information Systems, 4(2), 2006.
[3] T. Berners-Lee, J. Hendler, O. Lassila, et al. The semantic web. Scientific American, 284(5):28–37, 2001.
[4] J.P. Bigus and J. Bigus. Constructing Intelligent Agents Using Java. 2001.
[5] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. International Journal on Semantic Web and Information Systems, 5(3):1–22, 2009.
[6] K.G. Clark, L. Feigenbaum, and E. Torres. Serializing SPARQL query results in JSON. W3C Note, 2007.
[7] K.G. Clark, L. Feigenbaum, and E. Torres. SPARQL protocol for RDF. W3C Working Draft, 14, 2005.
[8] S. Corlosquet, R. Delbru, T. Clark, A. Polleres, and S. Decker. Produce and consume linked data with Drupal! The Semantic Web - ISWC 2009, pages 763–778, 2009.
[9] DBpedia. http://dbpedia.org/page/Mark_Zuckerberg, Feb 2011.
[10] J.J. Garrett et al. Ajax: A new approach to web applications. February 18, 2005.
[11] Semantic Web Agreement Group. What is the semantic web. http://swag.webns.net/whatIsSW, May 2001.
[12] T.R. Gruber et al. A translation approach to portable ontology specifications.
Knowledge Acquisition, 5(2):199–220, 1993.
[13] Leeds City Guide. Leeds student guide. http://www.leeds-city-guide.com/students, February 2011.
[14] M. Hausenblas. Exploiting linked data to build web applications. Internet Computing, IEEE, 13(4):68–73, 2009.
[15] Tom Heath. Linked data project FAQ. http://linkeddata.org/faq, August 2010.
[16] J. Heflin and J. Hendler. A portrait of the Semantic Web in action. Intelligent Systems, IEEE, 16(2):54–59, 2001.
[17] J. Hendler, O. Lassila, and T. Berners-Lee. The Semantic Web – a new form of web content that is meaningful to computers will unleash a revolution of possibilities. Scientific American Special Online Issue, 2002.
[18] I. Horrocks. Ontologies and the semantic web. Communications of the ACM, 51(12):58–67, 2008.
[19] G. Klyne and J.J. Carroll. Resource Description Framework (RDF): Concepts and abstract syntax. W3C Recommendation, 2004.
[20] G.B. Laleci, G. Aluc, A. Dogac, A. Sinaci, O. Kilic, and F. Tuncer. A semantic backend for content management systems. Knowledge-Based Systems, 23(8):832–843, 2010.
[21] University of Leeds. Leeds University taught student guide. http://www.leeds.ac.uk/qmeu/tsg/, February 2011.
[22] F. Manola, E. Miller, and B. McBride. RDF primer. W3C Recommendation, 10, 2004.
[23] S. Pemberton et al. XHTML 1.0: The Extensible HyperText Markup Language. W3C Recommendation, pages 1–11, 2000.
[24] E. Prud'hommeaux, A. Seaborne, et al. SPARQL query language for RDF. W3C Working Draft, 4, 2006.
[25] C. Queinnec. The influence of browsers on evaluators or, continuations to program web servers. In ACM SIGPLAN Notices, volume 35, pages 23–33. ACM, 2000.
[26] S. Russell and P. Norvig. Intelligent agents. Artificial Intelligence: A Modern Approach, pages 32–54, 2003.
[27] Evan Sandhaus and Rob Larson. More tags released to the linked data cloud. http://open.blogs.nytimes.com/2010/01/13/more-tags-released-to-the-linked-data-cloud/, January 2010.
[28] M. Smethurst.
How we make websites. http://www.bbc.co.uk/blogs/radiolabs/2009/01/how_we_make_websites.shtml, 2009.
[29] A.K. Thushar and P.S. Thilagam. An RDF approach for discovering the relevant semantic associations in a social network. In Advanced Computing and Communications, 2008. ADCOM 2008. 16th International Conference on, pages 214–220. IEEE, 2008.
[30] W3C. Semantic web. http://www.w3.org/standards/semanticweb/, January 2010.

Appendix A

Personal Reflection

This project was something I had looked forward to starting for a long time: the last step in finishing at Leeds. I found the whole process overwhelming at some points, with so many things that had to be done with the one goal of creating this report. Choosing to look at linked data was one of the best choices I made in my university time. Learning about a side of computing that is still establishing itself has been interesting and fun, as has looking at some of the quirky things people have done with it, the linked data Pokémon site being a great example. Coming to the end of my project I came to the realization that I was about to have to get a job, and in my breaks from work I started looking at potential positions. If, like me, you haven't looked at the job market, you should have a look before you start your project. Choosing a project that fitted with your ideal career path seems like a wise idea in hindsight, but honestly I didn't. During this project I feel I have developed a new skill set which will be valuable in the workplace; critically examining your own work is a skill that is important in scientific processes and workplaces alike. Starting with an open mind is essential to creating the right piece of work. At the start of the project, before I had enough knowledge of the linked data platform and the datasets, I made assumptions about what was going to be possible and what I could achieve.
In some respects this spurred me on to work harder and learn more, but my advice would be to keep an open mind, try to set down what you want to achieve at an early stage, and establish its plausibility. With this project not directly building on any past modules, I was covering lots of ground for the first time. This meant learning to use JavaScript, XHTML, DOM manipulation and some SPARQL in a short amount of time. While I was ultimately up to the task, the pressure of learning something entirely new while running up against deadlines is a stressful experience. At one point, designing the SPARQL queries, I felt like crying; this wasn't going to help, so I pushed on, and once I had understood the concepts rather than merely read about them it became a much more pleasurable endeavour. Being a reflection, I should look at what I have personally gained during the process of writing this report. I have learnt that I can produce a substantial document and that I shouldn't fear taking on large tasks like this: they only start as large, and the more you do, the smaller they become. One thing I would do differently is to make draft chapters while the project is ongoing; as you can see in my time plan, all the writing was left until the end. Now that it is finished, it is clear that this was not the right strategy. Keeping notes was helpful during the process, but unless they are organized they are practically useless under pressure. It seems wise to finish with a warning to future students about project timings: it feels like you have forever, but in reality you have just the right amount of time to do a good piece of work. Starting the project late has meant that playing catch-up has been the pattern of the course. Make sure not to whittle the time away, and really take the chance to do a great piece of work.