The Use of Linked Data Approach For An Alternative Web Guide For Potential University Applicants

Noel Doherty
Computing
Session 2011
The candidate confirms that the work submitted is their own and the appropriate
credit has been given where reference has been made to the work of others.
I understand that failure to attribute material which is obtained from another source
may be considered as plagiarism.
(Signature of student)
Summary
This project is concerned with producing a case study: creating a useful web application which retrieves Web-based material from Linked Data sources and presents to the user an automatically generated web guide to the city of Leeds. Included in this report is a background review of similar and related systems in this area, in addition to a literature review of materials for potential developers, and an in-depth evaluation of the effectiveness of the technologies in the area.
Acknowledgements
I would like to thank my supervisor, Lydia Lau, for her help during the whole project process; her guidance has kept me on task many times.
I would also like to take this chance to thank my family; without their support I doubt I would have ever finished second year.
Contents

0.0.1 Project Aim
0.0.2 Objectives
0.0.3 Minimum Requirements
0.0.4 Challenges
0.1 Project Outline
    0.1.1 Introduction
    0.1.2 Schedule
0.2 Background Research
    0.2.1 The Web Of Data
        0.2.1.1 The Semantic Web
        0.2.1.2 Linked Data
        0.2.1.3 The Linked Data Cloud
        0.2.1.4 Summary
    0.2.2 Developmental Tools and Appropriate Technologies
        0.2.2.1 AJAX
        0.2.2.2 RDF (Resource Description Framework)
        0.2.2.3 SPARQL
        0.2.2.4 XHTML
    0.2.3 A review of selected sites which currently use Linked Data
        0.2.3.1 Police.uk
        0.2.3.2 bbc.co.uk
        0.2.3.3 GovYou.co.uk - Your Freedom, Your Ideas, Gov You
        0.2.3.4 New York Times - Who Went Where
    0.2.4 Linked Data and Content Management Systems
0.3 Case Study
    0.3.1 Requirements for Alternative Guide
    0.3.2 Data Issues
    0.3.3 Methodology
    0.3.4 Implementation
    0.3.5 Stage 1: First Iteration
        0.3.5.1 Design
        0.3.5.2 SPARQL
        0.3.5.3 XHTML
        0.3.5.4 JavaScript
    0.3.6 Stage 2: Second Iteration
        0.3.6.1 Design
        0.3.6.2 SPARQL
        0.3.6.3 XHTML
        0.3.6.4 Testing
0.4 Evaluation
    0.4.1 Against Project Objectives
    0.4.2 Against Requirements
    0.4.3 From A Potential User
    0.4.4 Evaluation of the effectiveness of technologies
0.5 Discussion
    0.5.1 Future Extensions
    0.5.2 Conclusion
Bibliography
A Personal Reflection
0.0.1 Project Aim
The aim of this project is to create an alternative web guide for prospective Leeds University and Leeds Metropolitan students while exploring the Linked Data platform. An alternative guide would take an alternative look at the information students may be interested in, less about the studious aspects of university life, as opposed to the Taught Student Guide [21] which Leeds University publishes every year and the Leeds City Guide student section, which amounts to a list of bars [13]. A core aim of the project is to create a usable resource of information which is kept up to date automatically using the Linked Data resources currently on the World Wide Web, meaning the site would stay relevant over the course of time.
0.0.2 Objectives

The objectives listed below are achievable steps towards completing the aim of the project.
The minimum objectives are as follows:
• To learn about the technical platform of Linked Data and apply that knowledge.
• To survey related work in the field of linked data.
• To document and gather the requirements of a site which uses linked data.
• To design and develop an alternative guide to Leeds.
• To evaluate the project in terms of the aims set down.
0.0.3 Minimum Requirements
The minimum requirements are:
• Use a Linked Data source to form the basis of the site.
• Embed retrieved Linked Data on the site dynamically.
• Create an interactive map of Leeds with key points of interest to a student, combining geolocation information and retrieved descriptions of places.
• Develop a website which natively runs on today's top three browsers: Mozilla Firefox, Google Chrome, and Windows Internet Explorer.
Enhancements that could be implemented are as follows:
• Use the BBC's Linked Data resource to add, where applicable, news articles relating to the current content the user is viewing.
• Link in a social media site as a data source. This would involve connecting the data retrieved from Linked Data with a Twitter feed or Facebook page, so that live updates from, for example, clubs and bars would be displayed on the site.
• Create a jobs section which uses URIs for jobs which can be added on the site. Jobs would be submitted and tagged by the submitter, and searchable from the site.
Enhancements to the project are about improving the relevance of the data, creating a site where the information available is recent and therefore potentially more useful.
0.0.4 Challenges

In creating a Linked Data application a developer is immediately faced with the following challenges:
• Finding data that is relevant and works
• The multitude of standards within ontologies
• Accessing the data
0.1 Project Outline

0.1.1 Introduction

This project aims to create a Linked Data application while also providing an evaluation of the Linked Data platform, using the case study of building an application to act as a web guide.
0.1.2 Schedule

This section contains a table which breaks down the time allotted to the full project.
The four milestones on the project plan are four key points in the life of the project
and refer to the following:
1. Milestone One: Requirements gathering and background research have been completed.
2. Milestone Two: 1st prototype which meets the minimum requirements has been created.
3. Milestone Three: 2nd and final prototype which should exceed the minimum requirements has been created.
4. Milestone Four: Evaluation and final write-up will have been finished.
No. | Information | Dates | Milestones
1 | Problem definition, aims, requirements and objectives | 7th - 13th February |
2 | Preliminary Investigation - Investigate Linked Data, the Semantic Web, SPARQL and approaches to Linked Data | 14th - 21st February |
3 | Research on Data Sources and features - Investigate where data will be coming from, e.g. DBpedia, data.gov, and what features the site could use | 22nd February - 2nd March |
4 | Mid Term Report - Produce report for submission on progress so far | 3rd - 8th March | Milestone 1
5 | Design - Design site and SPARQL queries to be used to retrieve information | 9th - 16th March |
6 | 1st Prototype - Develop 1st iteration of website; this would include a single data source being retrieved. This prototype should meet minimum requirements | 17th - 27th March |
7 | Testing of Prototype - Test prototype, ensure data is correctly being retrieved | 28th March - 1st April | Milestone 2
8 | Progress Meeting | 1st April |
9 | 2nd Prototype - Final site creation. At this point extensions will have been implemented | 2nd - 13th April | Milestone 3
10 | Testing of 2nd prototype - Test site, ensuring it all works | 14th - 17th April |
11 | Evaluation of system - Evaluate whether site has provided useful information and met requirements | 18th - 22nd April |
12 | Full write up | 23rd April - 9th May | Milestone 4
0.2 Background Research

0.2.1 The Web Of Data

0.2.1.1 The Semantic Web
The Semantic Web (SW) is a Web that includes documents, or portions of documents, describing explicit relationships between things and containing semantic information intended for automated processing by machines [11]. In essence the SW seeks to add meaning to the information that is present on the World Wide Web (WWW), with the aim of creating machine-readable metadata.
The SW will not come as a new product; it will be built pragmatically. Like the Internet, the SW will be decentralized [3], with data sets existing independently and links connecting them. In contrast to the WWW, the current way of accessing information, the SW will contain structured data. This structure will be built by creating ontologies, and these ontologies will provide the relationships.
An ontology is a specification of a concept which can be shared [12]. Ontologies would serve as the vocabulary [16] of the SW, with user-created agents using these vocabularies to navigate the data structures on the SW. An ontology, itself a document or file on the SW, would contain entities and the relations between them, with a taxonomy that defines what makes up the entities and the relations between the objects [3].
The concept of an agent has been discussed within computing before [4] [26]: agents are systems situated in an environment which they can perceive, capable of unsupervised and uncontrolled action within that environment, and they carry out tasks with some objectives in mind [17].
Agents on the SW fall into two main categories: agents that retrieve and connect data, and agents which act as personal agents on the web. An agent that retrieves data would be set parameters and would roam the SW using the vocabularies provided by ontologies. These agents would collect web content, then process it for their user, and would be able to prove the validity of the data by backtracking to sources. A personal agent as described in [3] could organise and rearrange appointments by interacting with private ontologies and users' timetables to automate the best times for a user's appointments considering their schedule.
The challenges faced by the Semantic Web are twofold: vastness and vagueness [1]. The vastness of the web is clear when you look at the millions of results returned by search engines; converting all of these pages into a semantic form would require millions of man hours. Therefore it would seem that convincing people to change their practices to ones in line with the semantic ethos is part of the vastness problem. Vagueness is a problem in the sense that trying to nail down what something really is in essence is difficult [18], and creating an automated agent to tag meaning to data is more difficult again.
0.2.1.2 Linked Data

The term Linked Data (LD) refers to a set of best practices for publishing and connecting structured data on the Web [15]. Linked Data is built on the pre-existing web, using HTTP, URIs (Uniform Resource Identifiers) and XML to publish structured data which is referential.
Tim Berners-Lee cites four key aspects of Linked Data; they are as follows:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards
(RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
There are certain perceived limitations in the Linked Data methodology. If anyone can create a Linked Data resource then anyone can publish anything, as is currently the case with pages on the web; if automatic agents are searching the web to create Linked Data mash-ups then this could be an issue. Berners-Lee discusses possible solutions to this, such as a star rating of the data [2]. I personally see it as a non-issue, however: the web is built on referential integrity, and choosing your data source is up to you as the designer of a system, which negates the problem of poor or incorrect data.
The more Linked Data is available on the Web, the more interesting applications people can develop [14]; it is an exciting area of computer science at the moment.
0.2.1.3 The Linked Data Cloud

A key group providing information and connections in the Linked Data community is the LOD (Linking Open Data) community. The LOD group have created an image showing a snapshot of the current Linked Data 'cloud' (fig. 1); each node on the figure is a Linked Data resource. To appear on the cloud image a dataset must contain at least 1000 triples; this is to exclude sites which simply consist of a FOAF profile.
The LOD cloud is only a cloud in name; nodes on the graph are self-contained data sets. The connections between the nodes are links made using the sameAs property, which defines the subject as the same as the object, thus connecting them. This is one of the expressions of simple facts covered in section 0.2.2.2.
Figure 1: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
DBpedia is at the centre of the cloud (fig. 1); it is a Linked Data version of the Wikipedia website. DBpedia uses a combination of ontologies to describe the data contained in the articles. Using the internal DBpedia ontology (fig. 2), which covers 272 classes containing approximately 1,300 different properties, in combination with others such as FOAF, DBpedia attempts to give semantic meaning to the elements of data.
0.2.1.4 Summary

The background research suggests that overall Linked Data is an evolution of the Semantic Web idea; it is the next logical step. When describing the SW, agents and ontologies were discussed, but in the discussion of Linked Data you can see the ontologies and agents actually being created. The visionaries of Semantic Web ideas are moving the SW into today's Web, using RDF as the standard of choice.
The growth of Linked Data, however, appears not to match the hype it has received. With people like Tim Berners-Lee attempting to drive the project forward it was assumed that the Linked Data cloud would grow quickly; however, looking at the number of datasets in the cloud, in 2007 there were 12 Linked Data sets, this number grew to 203 over the course of two years, and it has not increased since. It would appear that new datasets have stopped being released. This could be due to the difficulty of converting current sites.

Figure 2: The DBpedia Ontology
Overall, Linked Data is attempting to build the Semantic Web on HTTP and URIs, but it does not make the web smarter. The Semantic Web will not solve the problem of which RDF graphs to look at; at the moment a human has to choose them, and bad choices result in bad results. I believe that factors like this, and the difficulties of converting to RDF, are putting users off creating Linked Data aware applications and Linked Data sets. Once these difficulties have been overcome I foresee Linked Data and the SW moving towards the ultimate goal of enabling computers to do more useful work and developing systems that can support trusted interactions over the network [30].
0.2.2 Developmental Tools and Appropriate Technologies
The technologies that are at the core of Linked Data and the Linked Data approach are discussed in this section. These tools and methods do not exist solely to facilitate the creation of LD web applications; I will therefore exclude discussion of features that are not applicable to LD and/or the SW.
0.2.2.1 AJAX
AJAX is shorthand for Asynchronous JavaScript and XML. AJAX in itself isn't anything new, but rather a collection of methods designed around the 'art of exchanging' [10] data with a server to update a web page without reloading the whole page. AJAX is key for the project, as its core ideas will assist in creating a dynamically embedded site in line with the objectives of the project.
A core part of AJAX is the XMLHttpRequest object. This object, supported by all of the browsers mentioned in section 0.0.3, is used for asynchronous data retrieval from a server. Web applications using Linked Data resources hosted on the Web will use XMLHttpRequest to retrieve data from SPARQL endpoints.
The Document Object Model (DOM) is part of the AJAX family of conventions. It is useful for developing Linked Data based web applications because it allows access to the elements in the object model. Using elements with no semantic meaning, such as DIV and SPAN, Linked Data developers can create web applications without creating the content in advance, filling the elements with content dynamically on load. Examples of this can be seen on the BBC News and BBC Sport sites, which use these for live news and sport updates.
JavaScript is the final element, binding the AJAX methods together.
There are challenges in creating AJAX applications, though not essentially of a technical nature. The challenge is having the vision to create something that is more than has been seen on the web before: to forget the limitations of HTML and move forward with a much wider pool of options and possibilities.
0.2.2.2 RDF (Resource Description Framework)
The Resource Description Framework is a data model. It can provide a conceptual description of the data being represented; for example the tag person could be used to identify that a document is about a person [9]. The base standards for RDF are XML and URIs: URIs are used to identify what the data is about, and XML is the syntax of RDF.
The W3C lists the following as the key concepts of the RDF format [19]:
• Graph data model
• URI-based vocabulary
• Data types
• Literals
• XML serialization syntax
• Expression of simple facts
• Entailment
Within RDF the triple is a key part of the makeup; it is made of three parts: a subject, a predicate and an object. The triple forms the basis for the query language covered in the next section, and for the RDF expression of simple facts. Simple facts are represented in RDF by connecting the subject and object of a triple using the predicate, or property [19]; this creates links within datasets, so that people can be connected to addresses or ideas using these expressions. RDF graphs are themselves just multiple RDF triples, with the subjects and objects making up the nodes of the graph. A small example is sketched below.
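As a minimal sketch (the example.org URIs are invented for illustration), the simple fact "Alice lives in Leeds" becomes one triple whose object links into the DBpedia dataset:

<http://example.org/person/alice>
    <http://example.org/property/livesIn>
    <http://dbpedia.org/resource/Leeds> .

The subject and predicate here are local names, while the object is a DBpedia URI; this is exactly how expressions of simple facts create links between datasets.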
In recent years work has been done to apply the RDF model to social media [29]; social networks being just big connected datasets means that the connections between people can be mapped to an RDF triple graph. Explorations of friendship networks, social interests and counter-terrorism can then be made.
RDF ultimately is a way to create a system of machine-readable, machine-processable identifiers for subject, object and predicate without confusion [22]. Its key characteristic is its ability to attribute meaning to data.
0.2.2.3 SPARQL
SPARQL ("sparkle") is a query language for RDF [24]; the name SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language. It is one of the key components in Semantic Web technology, and it is quickly becoming the de facto standard for queries on the Semantic Web [5]. SPARQL queries are built using triples of subject, object and predicate.
SPARQL is very similar in its syntax to SQL, with the terms SELECT and WHERE performing the same role. The SELECT query form returns a variable bindings result; this is a "Query Solution Object" [6] made of solution objects which correspond to returned results. An example is shown below.
Given the data:

<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "Linked Data Approach" .
<http://example.org/book/book2> <http://purl.org/dc/elements/1.1/title> "The Semantic Web - A Primer" .
<http://example.org/book/book3> <http://purl.org/dc/elements/1.1/title> "The Wizard Of Oz" .
This simple data set describes three books, with the titles "Linked Data Approach", "The Semantic Web - A Primer" and "The Wizard Of Oz".
The query:

SELECT ?title ?uri
WHERE
{
  ?uri <http://purl.org/dc/elements/1.1/title> ?title .
}
This query selects the title and URI of each book in the dataset. Run on the data above, it has the following solution:
Uri | Title
http://example.org/book/book1 | Linked Data Approach
http://example.org/book/book2 | The Semantic Web - A Primer
http://example.org/book/book3 | The Wizard Of Oz
The other two forms a query can take are ASK and CONSTRUCT. ASK returns a true or false answer to a query, while CONSTRUCT forms the result set into a valid RDF format.
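As a brief sketch against the book data above (illustrative queries, not taken from any of the sites reviewed here; the bookTitle predicate is invented for the example):

# ASK: is any book titled "The Wizard Of Oz"? Returns true or false.
ASK {
  ?uri <http://purl.org/dc/elements/1.1/title> "The Wizard Of Oz" .
}

# CONSTRUCT: emit a new RDF graph restating each title
# under an invented predicate.
CONSTRUCT {
  ?uri <http://example.org/property/bookTitle> ?title .
}
WHERE {
  ?uri <http://purl.org/dc/elements/1.1/title> ?title .
}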
SPARQL queries run on the web against live data sets are run against a SPARQL endpoint; this is a conformant SPARQL protocol service defined in the 2005 SPROT specification [7]. Currently there exist tools such as VizQuer which can be used to generate SPARQL queries that can be run against SPARQL endpoints.
In this project SPARQL queries will be used to retrieve data from Linked Data sites; the results are returned in JSON format [24].
0.2.2.4 XHTML

XHTML stands for eXtensible HyperText Markup Language. It is a combination of XML and HTML, differing from HTML in its more defined structure; its well-formed nature allows for true backwards and forwards compatibility between browsers. The current standard for XHTML is 1.1, released in November 2010. The benefits for a developer using XHTML are as follows:
• XHTML is easy to maintain - The strict nature of how XHTML must be formed makes it easy to spot errors, and with the strict XHTML checker on the W3C website developers can find out where they have gone wrong simply by checking with the W3C.
• XHTML can take advantage of XML functionality
XHTML uses the Document Object Model [23], which defines the standard way of accessing elements of the page. XHTML's DOM allows access to elements on the page and to the attributes of those elements, as sketched below.
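A minimal sketch of this kind of DOM access follows; the id "abst" matches the placeholder div used later in the guide, while the class name is invented for the example.

// Minimal sketch: look up an element by its id and rewrite its
// content and an attribute through the DOM.
var target = document.getElementById("abst");
target.innerHTML = "<h2>Leeds</h2><p>Abstract text goes here.</p>";
target.setAttribute("class", "content-panel"); // illustrative attribute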
0.2.3 A review of selected sites which currently use Linked Data
This section looks at web applications which already exist on the WWW and use LD as a basis for the service being offered. Like a piece of work published in a journal, web applications created by companies and the government provide future LD users with an idea of how to approach their own solutions. In these reviews we will look at how sites retrieve the data they are using, how they have chosen to display that data, and how they have augmented the data with data from other sources.
0.2.3.1 Police.uk
Police.uk is a site published in January 2011, created for the police force. It uses Linked Data from data.gov combined with Google Maps to give you detailed breakdowns of crimes by postcode. To find your area you simply enter your postcode on the front page and it takes you to a page relating to the crime statistics in that postcode (figure 4). Police.uk is built on AJAX principles: the page is made up of div elements which are fed with information by jQuery scripts.
The data set is the police crime and local neighbourhood data. This data set contains statistics broken down by area and level of crime. The level of crime is split into six categories: anti-social behaviour, robbery, burglary, vehicle crime, violent crime, and other crimes. These are then placed on a street map, at first just showing the numbers of the different types of crime in an area on a Google map. Zooming into the map shows a detailed breakdown of where the crimes took place, by road (figure 5). To handle this traversal of information, a jQuery script running an XMLHttpRequest retrieves the information for all crimes around the area originally searched for when that area is first cached (figure 3); this list, returned in JSON format, can then be added to the map if the user zooms in on a location which contains any crimes in the list.
In addition to the crime data, police.uk uses social media, through Twitter and YouTube, to augment the retrieved data. Two jQuery scripts were used to retrieve this information which, like the crime data, is then updated into a 'Twitter/YouTube' page division. Twitter and YouTube are not Linked Data sources; in this case the retrieved 'tweets' and videos provide insight into the police's daily work.
This site is an example of what a mash-up between Linked Data and social media can achieve. Applications like this, built on Linked Data, are indicators of the potential look and feel of what the Semantic Web could offer us in the new era of governmental transparency. Social media sources at this time are not in a Linked Data RDF format; however, if current trends are followed there is potential for that to change, and due to the modular nature of the police website it would be an easy transition if the switch was made.
0.2.3.2 bbc.co.uk
The BBC have had a website since 1994; in the past the BBC used to maintain all of their sites by hand. The BBC ran into a problem with the vast number of pages they were having to maintain: thousands of web pages under their domain, maintained primarily as independent sites, led to sites going down and not coming back up because the original creator was no longer around. A new solution was needed.
The BBC chose to adopt a Linked Data approach to creating their mini-sites, by creating an RDF database containing the information on the BBC. Creating a mini-site on /programmes is now a case of defining the information [28]; then, as long as the core database is maintained, the dependent sites will also be maintained. For example, Outcasts is a new show on the BBC (figure 6). The show is currently available on iPlayer; the links to the episodes are updated when the show is uploaded to iPlayer and the most recent link is taken. The smart part is that this is done automatically, and when the episodes are taken down from iPlayer the site will reflect that.
The BBC /music site continues the Linked Data approach, taking it further than /programmes. The /music site uses information from DBpedia (figure 7) and MusicBrainz (figure 8). By pulling the biography, links and information from resources that are actively maintained and updated by a peer-reviewed, active community under Creative Commons licenses, the BBC can be fairly sure that the links and information provided are accurate and up to date.
To summarize, the BBC have taken a Linked Data approach to creating their sites. This will save them time updating each page, make their sites more relevant, and in the end save them money, which for an organization based on public money can only be a good thing.
Figure 3: Retrieving Crime Data
Figure 4: Screenshot of the police website
Figure 5: Screenshot of the zoomed-in map
Figure 6: The Outcasts mini-site
Figure 7: An artist biography pulled from DBpedia

0.2.3.3 GovYou.co.uk - Your Freedom, Your Ideas, Gov You

GovYou.co.uk is a Linked Data site created to continue the work of the original governmental 'Your Freedom' website, which was launched in 2010 by Nick Clegg at the start of the coalition government. The original Your Freedom site involved users submitting ideas directly to the government. This site was later closed, with the submitted ideas being turned into a Linked Data resource.
The GovYou site (http://www.govyou.co.uk/) is a continuation of the progress made by the government. It is a tagged, searchable archive of the 14,000 submitted ideas, with an option to add new ideas. As discussed in section 0.2.1.2, GovYou uses HTTP URIs which refer to the idea at that URI; for example, the top idea on the site is 'Abolish Control Orders', which has the URL http://www.govyou.co.uk/abolish-control-orders/
The GovYou site is a good example of the use of Linked Data to attempt to collect and collate masses of ideas and form them into tagged, indexable archives, with the optimistic aim of generally improving the society we live in.
0.2.3.4 New York Times - Who Went Where
The New York Times' Who Went Where site is a Linked Data site which uses the New York Times' own Linked Data set and DBpedia. The New York Times' data set contains thousands of subject headings which are mapped to DBpedia, Freebase and GeoNames [27]. The site is built using XHTML and jQuery to create a web-based search application.
The alumni-in-the-news site is a fairly simple example of what can be achieved with their own dataset. Using DBpedia it retrieves the names of all the colleges and universities in the world, with the query below:
SELECT ?uri ?name WHERE {
  ?uri rdf:type dbpedia-owl:University .
  ?uri foaf:name ?name
} LIMIT 1000 OFFSET offset
This query, one of many which make up the site, is made from two triple patterns. The first triple pattern selects the URI of every university object in the DBpedia dataset; the second takes each URI and retrieves its foaf:name. The site then queries DBpedia for the NYT identifiers of all the alumni of that institution, which are then used to query the NYT's own search API. For example, searching for Leeds University results in a page on Jack Straw (figure 9), the only notable alumnus from Leeds to have ever appeared in the New York Times, with links to all the NYT articles relating to him in ascending chronological order.
This site is a perfect example of the Linked part of Linked Data: simple applications created using referential data, resulting in an informative and useful search engine. The real power of LD comes in the form of composite queries; building applications using different datasets, and creating queries with the results of one dataset, appears to be core to the Linked Data approach.
Figure 8: Artist links and information pulled from MusicBrainz
Figure 9: The page returned for the University of Leeds
0.2.4 Linked Data and Content Management Systems
There is a connection between LD and Content Management Systems (CMS). On the technical level a CMS is designed to enable groups of people to share and edit data. In the same sense, many of the operators in the Linked Data Cloud are running CMS-type services where the user base is allowed to edit and add new information to sites.
Currently there is an issue within the CMS industry with reconciling the new semantic values of data, coupled with metadata [20], against the current CMSs that drive many text and multimedia content driven sites. Solutions to this issue have been appearing [8]: in this paper the team create a system which 'enables the exposure of site content as linked data', where they take a Drupal site and essentially convert it into an RDF format. The implication of this is that current CMS systems could potentially be retrofitted to an RDF Linked Data format with SPARQL endpoints created. The benefit of converting to the RDF Linked Data standard is that developers would then be able to dynamically load data into their sites.
With over one in every hundred websites in the world using Drupal as a backend CMS, the use of the Linked Data standard is set to rise. The more data available on the Web in this standard format, the more options there are for future web agents or Linked Data based content systems, a call echoed by Tim Berners-Lee in talks in 2009.
0.3 Case Study
After researching Linked Data, the standards that govern its use, and the sites that offer Linked Data repositories, the evaluation of the Linked Data approach continues with a case study creating a city guide for prospective Leeds students. This guide would use Linked Data standards to create a repository of information on the city of Leeds.
0.3.1 Requirements for Alternative Guide
Creating a guide for potential students of the Leeds universities is something I can relate to, remembering my own time looking at potential universities. I decided to build on my personal reckoning of what a potential applicant needs through a series of informal discussions with current students and with students currently looking at universities. These would be guided by myself to keep the discussion on the possibilities of a guide.
In total four discussion groups were held, with a total of 11 people, 4 of whom were current applicants. Over the course of these discussions only two main points came across in all four groups: they wanted to get a feel for Leeds by looking at the guide, and they wanted to navigate through Leeds in the guide and 'discover' Leeds.
To 'discover' Leeds and get a feel for it were too vague to be requirements for a site, so when these points were raised during the discussions I inquired what these ideas meant to the people in the group. This developed the feelings into the requirements of having a map showing the location of what they were reading about, with clear information and links connecting different places together.
Once the discussions had been completed and the notes analyzed, it was clear that this guide could be created as a Linked Data mash-up application with the following non-functional requirements:
• Clear Information about Leeds
• Simple Navigation between pages
Once the general requirements had been captured, functional requirements were needed. Using the information gathered in the discussions in conjunction with the background research, these functional requirements emerged:
• Retrieval of Linked data from an online dataset.
• Methods to dynamically embed Linked data into an XHTML form.
• A map showing key locations, with information on each location.
0.3.2 Data Issues
From my evaluation of Linked Data I believe the current trend in Linked Data sites is that the inception of the idea for a site is based on looking at the LD resources in the LOD cloud and working from there. Having chosen to create a web guide after researching the technology and resources, many of the more specialized Linked Data resources are instantly of no relevance to the data required for a web guide. With this taken into account, DBpedia was chosen as the resource for the first iteration; this would provide the 'bones' to build on.
0.3.3 Methodology

With this case study seeking to explore the potential applications of Linked Data, the methodology of the implementation should enhance this exploratory nature. With this in mind, for the software engineering part of the case study I will be using an evolutionary development methodology with some elements of rapid development.
0.3.4 Implementation

0.3.5 Stage 1: First Iteration

0.3.5.1 Design
Once the requirements of the application had been constructed, I had a clear vision of what the application was attempting to achieve. The first iteration would be an XHTML page augmented with JavaScript which would use jQuery to query the DBpedia SPARQL endpoint; on receiving the data the XHTML page is updated.
Using an evolutionary development approach, a stage one iteration was required. This would form the core of the application; the design broke into interconnected parts, broadly the design of the SPARQL queries, the JavaScript methods and the XHTML page. In this section I will detail the creation of these parts.
0.3.5.2 SPARQL
The data retrieved from the SPARQL queries was core to powering the web application; the first iteration would use DBpedia to retrieve data. To build the web guide it would be key to gather data on places in Leeds so that, as brought up in the discussion sessions, users could explore the places in and around Leeds, learning about Leeds as they explored.
As covered in section 0.2.2.3, SPARQL queries in general are made up of triple patterns. Designing the SPARQL queries would require me to select data from a specific source, or to find items which are connected with Leeds. In line with the evolutionary development approach, the design of the queries was an evolution: starting with selecting the abstract for the resource Leeds, I then built from there.
To extract the abstract I would need to build a query that used the central resource in DBpedia for Leeds (http://dbpedia.org/resource/Leeds) as the subject of the query, the ontology property for the abstract as the predicate, and a variable for the abstract object.

SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Leeds>
    <http://dbpedia.org/ontology/abstract> ?abstract
}
This query returned all the abstracts held for Leeds. As part of the rapid evolutionary development, building on this query, a filter was added to restrict the results to the English ones:

FILTER langMatches(lang(?abstract), 'en')
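Combined, the evolved query is a straightforward merge of the two snippets above:

SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Leeds>
    <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER langMatches(lang(?abstract), 'en')
}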
With this basic query completed, retrieving all the places in Leeds was the next stage:

PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX db: <http://dbpedia.org/resource/>
PREFIX dpowl: <http://dbpedia.org/ontology/>

SELECT ?name WHERE {
  ?uri dpowl:location db:Leeds .
  ?uri foaf:name ?name .
}
This query was built using two triple patterns. The first triple pattern finds the URI of every location that has its location set as Leeds; the second triple pattern then retrieves the foaf:name of each URI.
0.3.5.3 XHTML
The design of the XHTML page was the simplest part of the process; at this stage the web page the application sits on simply had to contain DIVs that could be added to using a jQuery script.
</head>

<body>

<h1>An Alternative Web Guide</h1>

<div id="links"></div>
<div id="abst"> <img id="new"><img id="theImg" width="400" height="300"
  align="right" border="0"> </div>

</body>
</html>
Each div provides an inclusive way of grouping areas of content; each div acts as a container for data retrieved from the SPARQL query.
0.3.5.4 JavaScript
In the design of the JavaScript elements of the site, the first objective was to create a set of methods for the retrieval and manipulation of data. During the investigation jQuery appeared to be used extensively; I concluded that this was due to its simplification of the AJAX methods, so jQuery would be used for the development of the JavaScript elements of the alternative guide.
To access the Linked Data once the SPARQL queries were designed, I created a function using jQuery for the AJAX request that sends the query to the endpoint:
function addlinks(div_id, sparql)
{
    // The DBpedia URL is spliced with the SPARQL query to create the request
    var dbpediaUrl =
        "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=" +
        escape(sparql) +
        "&format=json";

    // The jQuery .ajax method is used to run the query against the DBpedia set
    $.ajax({
        // The returned data type set here will be JSON
        dataType: 'jsonp',
        jsonp: 'callback',
        url: dbpediaUrl, // set above

        success: function (data) {
            // For each binding in the results the script adds a link to
            // that place in the div_id that has been given
            $.each(data.results.bindings, function (entryIndex, entry) {
                var txt = "'" + entry.name.value + "'";
                var content = document.getElementById(div_id).innerHTML;
                document.getElementById(div_id).innerHTML = content +
                    "<li><a href='#' onClick=\"updateinfo('" +
                    entry.uri.value + "')\">" + txt + "</a></li>";
            });
        }
    });
}
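For illustration (this wiring is implied rather than shown in the listings), the function would be invoked on page load with the places query and the id of the links div:

// Illustrative usage: run the places query into the "links" div on load.
// placesQuery would hold the SPARQL string built in section 0.3.5.2.
$(document).ready(function () {
    addlinks("links", placesQuery);
});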
This function, given a SPARQL query and a div identifier, updates and rewrites the XHTML of the DIV; this is a key part of the AJAX ideas. The new XHTML contains links which run the next script to retrieve the updated information. It was designed to propagate just the first set of links to places, so that a user could see places of interest and click them; on clicking, the updateinfo script would be run:
function updateinfo(URI)
{
    var sparql =
        "SELECT ?abstract ?name WHERE { " +
        "<" + URI + "> " +
        "<http://dbpedia.org/ontology/abstract> ?abstract . " +
        "<" + URI + "> rdfs:label ?name . " +
        "FILTER langMatches(lang(?abstract), 'en') }";

    var dbpediaUrl =
        "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=" +
        escape(sparql) +
        "&format=json";

    $.ajax({
        dataType: 'jsonp',
        jsonp: 'callback',
        url: dbpediaUrl,

        success: function (data) {
            $.each(data.results.bindings, function (entryIndex, entry) {
                // Replace the contents of the abstract div with the
                // resource's label as a heading and its abstract text
                document.getElementById("abst").innerHTML =
                    "<h2>" + entry.name.value + "</h2>" +
                    "<p>" + entry.abstract.value + "</p>";
            });
        }
    });
}
This script, on click, uses the URI of the resource the user clicked on to update the page with the abstract, and adds a heading with the name. Due to the way the AJAX methods work, each time a user clicks the page is updated without reloading. There is a delay between the click and the update due to the sending and receiving of data.
0.3.6 Stage 2: Second Iteration

0.3.6.1 Design
Once the first iteration was complete, the application was retrieving place data from DBpedia. This data was then dynamically formatted into a list of links that, on click, would retrieve the abstract from the selected object's URI. This first experiment was somewhat of a success; however, the application was just a list of place names and abstracts. The application needed to evolve.
Building on the methods created in the first iteration was key to the second iteration; therefore the new SPARQL queries would be altered versions of the previous queries, selecting different data objects and retrieving more data.
0.3.6.2 SPARQL
The second iteration SPARQL queries had to retrieve more information, but also better, more relevant information. I worked under the assumption that subjects on the topic of Leeds, containing connections to the places and things in Leeds, would provide a better basis for creating a guide. Rather than a list of places, we would now have a list of subjects which could be delved into.
The query below selects the URIs and names where the skos:prefLabel is Leeds. The powerful nature of Linked Data structures can be seen here, using the 'fact' that categories of information on Leeds always have the prefLabel Leeds:
SELECT ?uri ?name WHERE {
  ?float skos:prefLabel 'Leeds'@en .
  ?uri skos:broader ?float .
  ?uri rdfs:label ?name .
}
This query returns the result set:
uri | name
http://dbpedia.org/resource/Category:Parks_and_commons_in_Leeds | "Parks and commons in Leeds"@en
http://dbpedia.org/resource/Category:Companies_based_in_Leeds | "Companies based in Leeds"@en
http://dbpedia.org/resource/Category:Bishops_of_Ripon_and_Leeds | "Bishops of Ripon and Leeds"@en
http://dbpedia.org/resource/Category:History_of_Leeds | "History of Leeds"@en
http://dbpedia.org/resource/Category:Politics_of_Leeds | "Politics of Leeds"@en
http://dbpedia.org/resource/Category:Sport_in_Leeds | "Sport in Leeds"@en
http://dbpedia.org/resource/Category:People_from_Leeds | "People from Leeds (district)"@en
http://dbpedia.org/resource/Category:Transport_in_Leeds | "Transport in Leeds"@en
http://dbpedia.org/resource/Category:Leeds_media | "Leeds media"@en
http://dbpedia.org/resource/Category:Geography_of_Leeds | "Geography of Leeds"@en
http://dbpedia.org/resource/Category:Culture_of_Leeds | "Culture of Leeds"@en
http://dbpedia.org/resource/Category:Leeds_City_Region | "Leeds City Region"@en
http://dbpedia.org/resource/Category:Music_from_Leeds | "Music from Leeds"@en
http://dbpedia.org/resource/Category:Visitor_attractions_in_Leeds | "Visitor attractions in Leeds"@en
http://dbpedia.org/resource/Category:Buildings_and_structures_in_Leeds | "Buildings and structures in Leeds"@en
http://dbpedia.org/resource/Category:Education_in_Leeds | "Education in Leeds"@en
http://dbpedia.org/resource/Category:Local_government_in_Leeds | "Local government in Leeds"@en
With a new, improved query for getting the basis of the data, improving on the lack of information on screen was the next key issue. To solve this problem I opted to improve the iteration one query for retrieving the abstract, adding in the additional data elements needed. These would be an image and some geolocation points to plot on the map.
SELECT ?abstract ?imageuri ?lat ?long ?name WHERE {
  <uri> <http://dbpedia.org/ontology/abstract> ?abstract .
  <uri> rdfs:label ?name .
  <uri> foaf:depiction ?imageuri .
  <uri> geo:lat ?lat .
  <uri> geo:long ?long .
  FILTER langMatches(lang(?abstract), 'en') }
The image is retrieved using foaf:depiction; geo:lat and geo:long are, unsurprisingly, the latitude and longitude of the location. With these new SPARQL queries created, new divs would be needed to hold the new information.
0.3.6.3 XHTML
With the decision to select data in a different way, two DIVs were now required for links: one to show the categories of subjects and one to show the subjects of the categories. Keeping the links div, a places DIV was added to accommodate the extra data:

<div id="places"></div>
To create a map using the Google Maps API requires a div to put the map in; as with the other DIV elements in the guide this one has no meaning alone and acts as a space to dynamically add the map to:

<div id="map" style="width: 550px; height: 450px"></div>
The final element added in iteration two was an image element:

<img id="theImg" width="400" height="300" align="right" border="0">
The image element will be updated by the revised updateinfo function, which will set the src attribute of the img element to match the content being displayed at the time.
JavaScript
With the functions built in iteration one, the second iteration built on the updateinfo and addlinks functions. The addlinks function remained unchanged, with the new SPARQL query being run and the list of categories updated into the links div.
The updateinfo success function was altered to process the new data:
function (data) {
    $.each(data.results.bindings, function (entryIndex, entry) {
        var newImg = new Image();
        newImg.src = entry.imageuri.value;
        // Rewrite the abstract div with the heading, image,
        // coordinates and abstract text
        document.getElementById("abst").innerHTML =
            "<h2>" + entry.name.value + "</h2>" +
            "<img src=" + newImg.src + " width='400' height='300'>" +
            "<li>Lat: " + entry.lat.value + "</li>" +
            "<li>Long: " + entry.long.value + "</li>" +
            "<p>" + entry.abstract.value + "</p>";
        updatemap(entry.lat.value, entry.long.value,
                  entry.name.value, entry.abstract.value);
    });
}
Now the function updates the heading and abstract as before, and sets the img element's src to the foaf:depiction, which is a URL of the image of the resource being accessed. Finally the success function updates the map to show the location of the current content:
function updatemap(lat, long, name, info)
{
    var latlng = new google.maps.LatLng(lat, long);
    var myOptions = {
        zoom: 12,
        center: latlng,
        mapTypeId: google.maps.MapTypeId.ROADMAP
    };
    var map = new google.maps.Map(document.getElementById("map"), myOptions);
    var marker = new google.maps.Marker({
        position: latlng,
        map: map,
        title: name });
}
The updatemap function uses the Google Maps API; it takes four arguments: latitude, longitude, name and information. These arguments are then used to add a marker to the map showing the location of the current content.
0.3.6.4 Testing
To test the final prototype I had to make sure that it ran correctly in the three most used browsers on the Web today. The effect of browsers on web development has been discussed in the past [25]; the current way browsers are made means that having a web page which is strict XHTML isn't a guarantee that the page will load with the same behaviour in each browser. Therefore I will take a look at the top three browsers: Mozilla Firefox, Google Chrome and Microsoft Internet Explorer.
Browser | Page Loads | Shows Picture | Retrieves Extra Data | Map Updates
Chrome | Yes | Yes | Yes | Yes
Firefox | Yes | Yes | Yes | Yes
Internet Explorer | Yes | Yes | Yes | Yes

Testing of today's most widely used browsers
As the table shows, all browsers were functionally working. The differences came in the behaviour of displaying the information on the site: Internet Explorer placed the image over the first part of the abstract; Firefox placed the image above the text with no text being obscured; and Chrome also placed the abstract data below the image. The reason for the different behaviours is that a style sheet wasn't used to properly control the div elements. Without a cascading style sheet controlling the DIV elements the browsers apply their own logic in ordering the elements. This is fine in Chrome and Firefox, although the site remains very plain; in Internet Explorer, however, for some unknown reason the output is wrong.
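A minimal sketch of the kind of style sheet rules that would have constrained the layout is shown below; these rules are illustrative and were not part of the prototype.

/* Illustrative: float the image and constrain the abstract div so all
   three browsers order the elements in the same way. */
#theImg { float: right; margin-left: 10px; }
#abst { width: 600px; overflow: hidden; }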
0.4 Evaluation
In this section I have split the evaluation into four subsections. Three sections regard the development of the web guide: the evaluation against objectives, against requirements, and from a user's viewpoint. The final section is an evaluation of the Linked Data approach, taking a look at the effectiveness of the technologies as they stand.
0.4.1 Against Project Objectives
The project objectives were set down in section 0.0.2 at the beginning of the project process. They were made up of four key points which, if accomplished, would indicate that the project had been a success. Each point is evaluated below:
– To learn about the technical platform of Linked Data and apply that knowledge - This objective was completed in two distinct ways. Learning about the Linked Data platform was a central part of the background research section; looking at the standards in place and the technologies that power Linked Data provided an insightful look into the platform. The creation of the web guide was the application of that knowledge.
– To survey related work in the field of Linked Data - As part of my research I looked at other websites and web-based applications which used Linked Data to power their sites. This survey provided an insight into how people are going about creating Linked Data based applications, while also serving to further my understanding of the platform.
– To document and gather the requirements of a site which uses Linked Data - In the creation of the web guide, requirements were created for the Linked Data site; this is documented in section 0.3.1.
– To design and develop an alternative guide to Leeds - The case study of the alternative guide documented the design and development in section 0.3.
0.4.2 Against Requirements
As one of the objectives was to design and develop an alternative guide to Leeds, a set of requirements was set down; these requirements form the basis for judging whether the guide achieved its goals:
– Use a Linked Data source to form the basis of the site. - In the creation of the alternative web guide the Linked Data source DBpedia was used to form the basis of the site. In the second iteration the web guide selected all the subjects connected to the subject Leeds via skos:broader; these subject headings contained information pertaining to different aspects of Leeds. This was an attempt to give a full guide of the city of Leeds. The approach failed as much as it succeeded: the returned results were useful up to a point, the problem being that the returned results were not all of the same type, and the different ontologies proved awkward to handle. However, the web guide did retrieve data from a Linked Data source.
– Embed retrieved Linked Data on the site dynamically. - Using the AJAX
methods I was able to complete this goal. The XHTML page was divided
into parts which the JavaScript updated dynamically.
– Create an interactive map of Leeds with key points of interest to a student, combining geolocation information and retrieved descriptions of places. - In iteration two a SPARQL query was created to retrieve the geolocation of each place in Leeds; each point was then added to a map. This was done using the Google Maps API. The map was interactive in that a user could move about it and see some information about each point.
– Develop a website which natively runs on today's top three browsers: Mozilla Firefox, Google Chrome, and Windows Internet Explorer. - As shown in the testing after iteration two, with some minor tweaks the site ran in each listed browser.
0.4.3 From A Potential User
To fully evaluate the guide the viewpoint of a user must be considered. To do this we will look at the experience of a user using the system to find out about the city of Leeds: I will anticipate the actions of a new user of the system, and discuss the potential pitfalls and highlights.
The first thing a user of the guide finds is that it appears like a normal web page: links on one side, pictures and headings. On clicking links the user will find more links appearing on the subject that was clicked; investigating those new links will likely lead the user to discover that the page updates with new information automatically. The way the product is designed, its ease of use comes from its simplicity: any user can open the page and just click through links, discovering more information on Leeds. While using it, however, the user will most likely start to realize that the page is far from instantly loading; each time a link is clicked there is a wait time while the next data set is retrieved. This is the first real pitfall of the Linked Data approach: unless you cache your data in one load at the start of a session, the user has to load bits of data at a time, which results in many small waits.
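A minimal sketch of such session caching is shown below; the helper names are hypothetical, standing in for the fetch-and-render logic of the updateinfo function.

// Illustrative sketch (not from the prototype): a simple in-memory
// cache so each resource URI is fetched from the endpoint only once
// per session.
var resultCache = {};

function cachedUpdateinfo(URI) {
    if (resultCache[URI]) {
        renderEntry(resultCache[URI]); // hypothetical render helper
        return;
    }
    fetchFromEndpoint(URI, function (entry) { // hypothetical fetch helper
        resultCache[URI] = entry;              // store for later clicks
        renderEntry(entry);
    });
}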
Due to the design of the SPARQL queries, some of the objects returned from the categories do not fit the pattern used to retrieve the image, abstract, name and coordinates. Approximately 90% of the links return new information if clicked, but may be missing an image if the resource is a concept, or coordinates if it is not a place. This essentially isn't a major problem, but from a user's perspective it is a large flaw in the design.
The web guide suffers from one grave flaw in that the users have no control of any of the site except for the links in the side bars. As much as the page is dynamically embedded, the experience for a user isn't dynamic because of that; the opposite in fact is true: clicking through pages of automatically retrieved and dynamically displayed data does not have the same appeal as a well-crafted site.
Overall it does feel like you explore Leeds clicking through the different parts of the extracted data; the map provides a point of reference so new visitors to the city can connect the dots as they move from one data object to the next. The guide's design is basic, which would be a major pitfall if the product were going to market; as the product is only a prototype, more features would be needed to improve the user experience.
0.4.4
Evaluation of the effectiveness of technologies
Creating the web guide was part of my investigation into the Linked data
approach. I wanted to look both at the practical problems of using Linked
data in its current state and at what can be created in a short space of time
using the current standards in web technology. In this subsection I will
address the issues I have found with the Linked data platform and evaluate
the effectiveness of the technologies.
As the standard for querying RDF, SPARQL is central to any evaluation of the
linked data approach. As touched on in section 0.4.3, the main problem that
appeared with the SPARQL query language is not in the semantics of the
language; it lies in the retrieval of the data. When querying a SPARQL
endpoint, the application sends an XMLHttpRequest to the endpoint, which then
responds with the results in a selected format. This creates a 'lag' between
a user opening the site and the data being displayed, as was seen in the
evaluation of the N.Y. Times web mashup, which covered its loading times with
a waiting message. The problem can be avoided by keeping a copy of the data
set locally, but this is not practical if multiple large sets are being used,
and the delay only grows as more sets of data are added.
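
The N.Y. Times tactic of masking the wait is simple to reproduce. A minimal
sketch, assuming the page contains a status element with id "status":

    // Show a waiting message while the endpoint responds (element id assumed).
    function queryWithIndicator(url, onDone) {
      var status = document.getElementById("status");
      status.innerHTML = "Loading data...";  // visible for the whole round trip
      var request = new XMLHttpRequest();
      request.open("GET", url, true);
      request.onreadystatechange = function () {
        if (request.readyState === 4) {
          status.innerHTML = "";             // clear the message on completion
          if (request.status === 200) {
            onDone(JSON.parse(request.responseText));
          }
        }
      };
      request.send(null);
    }

This does not shorten the wait, but it signals to the user that the page has
not stalled.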
Repetition of data is another problem that needs to be addressed; even within
DBpedia alone the data lacks consistency. This becomes increasingly obvious
as one explores the data sets: for example, on DBpedia many resources use
both foaf:name and dbprop:name. In my experience these two values are the
same when both are present, but often only one or the other is. This made it
difficult to select the right property for a given piece of information, and
could be solved by removing the repetition and having only one ontology that
describes names. In the meantime a query can work around the duplication, as
sketched below.
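
A practical workaround is to request both properties optionally and fall back
from one to the other when displaying the result. The sketch below assumes
both properties may be present on the endpoint; the resource URI is only an
example.

    // Sketch: ask for both name properties and use whichever is bound.
    var nameQuery =
      "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
      "PREFIX dbprop: <http://dbpedia.org/property/> " +
      "SELECT ?foafName ?propName WHERE { " +
      "  OPTIONAL { <http://dbpedia.org/resource/Leeds> foaf:name ?foafName } " +
      "  OPTIONAL { <http://dbpedia.org/resource/Leeds> dbprop:name ?propName } " +
      "}";

    // Prefer foaf:name, fall back to dbprop:name, and cope with neither.
    function chooseName(binding) {
      if (binding.foafName) return binding.foafName.value;
      if (binding.propName) return binding.propName.value;
      return "(unnamed resource)";
    }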
The AJAX methods were the most effective existing part of what I perceive the
Linked data approach to be. Using JavaScript to retrieve data and manipulate
DOM objects gives a developer a clear route to collecting and dynamically
displaying data. The combination of JavaScript and XHTML creates options that
were not available to the plain HTML developer, and AJAX methods provide a
perfect framework for the creation of mashups.
During my background research I came to the conclusion that the growth rate
of linked data resources had slowed and that new datasets were being created
less quickly. However, when building the application I was overwhelmed by the
number of datasets in use, and in this lies the problem. At the moment, with
only 203 nodes in the LOD cloud, choosing DBpedia as the source for
retrieving data was an obvious choice: a central node with good links to
other sites. But if there were 15,000 sets, without some logical agent
traversing the links between the data it is hard to see how you could deal
with the amount of data you could possibly use, and with calls for all data
to be published in RDF format there is potential for exponential growth.
Linked data being a new standard, this may be analogous to complaining that
an increase in web pages made it hard to find a good web page, and the
inception of new sets could conversely spark the promised wave of new linked
data applications.
The Linked data approach is about creating new services or resources from
data that is being shared; it is also about trying to add something to the
data being retrieved to make it more than the sum of its parts. The
technologies exist at the moment to achieve this goal; however, as the issues
above show, retrieval lag, inconsistent ontologies, and patchy data still
stand between these technologies and effortless application building.
0.5
Discussion
0.5.1
Future Extensions
There is an extensive list of extensions that could be made to extend and
refine the web guide. In this section I will look at the extensions that
could plausibly have been added to the application, had additional time been
available, to improve its functionality. Additions to the site should build
on the data resources that have already been retrieved. Several of the more
practical potential enhancements are listed below:
– Creating a Linked Data jobs set would be a major addition to the guide;
many students looking at universities would like to see the jobs available
locally. Creating such an RDF repository would also be a good way of
expanding my understanding of Linked Data.
– Adding more Linked Data sources, meshing them into the current structure
of the site as an experiment to see whether more data makes a better guide.
Instead of building queries from terms retrieved from DBpedia, it would be
interesting to create composite queries that use DBpedia's sameAs links to
follow the LOD cloud to new data; a sketch of such a query follows this list.
– An area for extension can be found where linked data and (social) media
cross over. I would look to extend the web application by creating some kind
of media-aware agent which, when given data, could use its terms to retrieve
related media. As on police.uk, this could take the form of YouTube videos,
or the agent could query RSS feeds for news articles.
– To extend the guide I would like to attempt to combat loading times by
applying some logic to the retrieval of data. Caching the next potential
clicks once a user has selected a category could reduce perceived waiting
times; a sketch of this idea also follows this list.
– At the end of the second iteration the web guide still looked very basic
and left it to the browser to decide how to display the page. To improve on
this I would have liked to implement a style sheet which did more than just
divide the divs into basic sections, and also to use a JavaScript library to
present the link sections in more visually appealing ways.
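
For the composite-query idea above, the sameAs links can be retrieved with a
query as small as the sketch below, sent to the endpoint in the same way as
the earlier examples. The query is a plausible sketch rather than tested
project code; each returned URI is a door into another dataset in the LOD
cloud.

    // Sketch: find equivalent resources for Leeds in other LOD datasets.
    var sameAsQuery =
      "PREFIX owl: <http://www.w3.org/2002/07/owl#> " +
      "SELECT ?other WHERE { " +
      "  <http://dbpedia.org/resource/Leeds> owl:sameAs ?other . " +
      "}";
    // Each ?other URI (a Geonames resource, for instance) could then be
    // dereferenced or queried at its own endpoint to pull in new data.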
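
The caching extension could be sketched as follows. fetchCategory is a
hypothetical helper standing in for the SPARQL retrieval; it is assumed to
call its callback with the term and the parsed results once the endpoint
responds.

    // Sketch of click-ahead caching. fetchCategory(term, callback) is an
    // invented helper assumed to invoke callback(term, results) when done.
    var cache = {};

    // Warm the cache for every link currently visible to the user.
    function prefetchLinks(visibleLinks) {
      for (var i = 0; i < visibleLinks.length; i++) {
        var term = visibleLinks[i];
        if (!cache[term]) {
          fetchCategory(term, function (key, results) {
            cache[key] = results;      // stored for a possible future click
          });
        }
      }
    }

    // On a click, serve from the cache when possible; otherwise fetch.
    function onLinkClicked(term, render) {
      if (cache[term]) {
        render(cache[term]);           // no wait: data was prefetched
      } else {
        fetchCategory(term, function (key, results) {
          cache[key] = results;
          render(results);
        });
      }
    }

Even prefetching only the first screen of links would convert many of the
small waits into instantaneous page updates.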
0.5.2
Conclusion
Linked data is a re-envisioning of the semantic web in less flowery terms:
a real way to make a semantic web future possible. The noise surrounding
this area has grown steadily in recent years. The linked applications now
being built by developers are starting to evolve from web pages into
genuinely interactive, useful applications, some of real use to people. When
people responded to the earthquake in Haiti by using the power of linked
data to update a map, which was then shared with rescue workers on the
ground, a real sense of what Linked data could accomplish was shown.
Linked data is as much about people putting data onto the web as it is about
the applications made from that data. As the LD 'cloud' continues to grow,
the potential for new applications grows with it. The Linked data approach
to delivering data lies in the RDF format: metadata with more value because
it is fully integrated, which makes it easier to link data from different
providers.
The goal of this project has been to produce an alternative guide to Leeds
using the linked data approach while exploring the linked data platform. The
evaluation that has been performed, although limited, indicates that this
has been achieved. The exploration of the Linked data concepts has been the
real challenge in the process. It seems clear that Linked Data is here, and
that it is being used to create a new type of web-aware application.
Bibliography
[1] G. Antoniou and F. Van Harmelen. A semantic web primer. The MIT Press,
2004.
[2] T. Berners-Lee. Linked data. International Journal on Semantic Web and
Information Systems, 4(2), 2006.
[3] T. Berners-Lee, J. Hendler, O. Lassila, et al. The semantic web. Scientific
american, 284(5):28–37, 2001.
[4] J.P. Bigus and J. Bigus. Constructing intelligent agents using java. 2001.
[5] C. Bizer, T. Heath, and T. Berners-Lee. Linked data-the story so far. International Journal on Semantic Web and Information Systems, 5(3):1–22, 2009.
[6] K.G. Clark, L. Feigenbaum, and L. Feigenbaum. Serializing sparql query
results in json. W3C Note, 2007.
[7] K.G. Clark, L. Feigenbaum, and E. Torres. SPARQL protocol for RDF. W3C
working draft, 14, 2005.
[8] S. Corlosquet, R. Delbru, T. Clark, A. Polleres, and S. Decker. Produce and
Consume Linked Data with Drupal! The Semantic Web-ISWC 2009, pages
763–778, 2009.
[9] DBpedia. http://dbpedia.org/page/Mark_Zuckerberg, February 2011.
[10] J.J. Garrett et al. Ajax: A new approach to web applications. February,
18:2005, 2005.
[11] Semantic Web Agreement Group. What is the semantic web http://swag.
webns.net/whatIsSW, May 2001.
[12] T.R. Gruber et al. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220, 1993.
[13] Leeds City Guide. Leeds student guide http://www.leeds-city-guide.com/
students, February 2011.
[14] M. Hausenblas. Exploiting linked data to build web applications. Internet
Computing, IEEE, 13(4):68–73, 2009.
[15] Tom Heath. Linked data project faq http://linkeddata.org/faq, Aug 2010.
[16] J. Heflin and J. Hendler. A portrait of the Semantic Web in action. Intelligent
Systems, IEEE, 16(2):54–59, 2001.
[17] J. Hendler, O. Lassila, and T. Berners-Lee. The Semantic Web - A new form
of web content that is meaningful to computers will unleash a revolution of
new possibilities. Scientific American Special Online Issue, 2002.
[18] I. Horrocks. Ontologies and the semantic web. Communications of the
ACM, 51(12):58–67, 2008.
[19] G. Klyne and J.J. Carroll. Resource description framework (RDF): Concepts
and abstract syntax. W3C Recommendation, 2004.
[20] GB Laleci, G. Aluc, A. Dogac, A. Sinaci, O. Kilic, and F. Tuncer. A semantic
backend for content management systems. Knowledge-Based Systems,
23(8):832–843, 2010.
[21] University of Leeds. Leeds university taught student guide http://www.leeds.ac.
uk/qmeu/tsg/, February 2011.
[22] F. Manola, E. Miller, and B. McBride. RDF primer. W3C recommendation,
10, 2004.
[23] S. Pemberton et al. XHTML 1.0: The Extensible HyperText Markup Language.
W3C Recommendations, pages 1–11, 2000.
[24] E. Prud'hommeaux, A. Seaborne, et al. SPARQL query language for RDF.
W3C working draft, 4, 2006.
[25] C. Queinnec. The influence of browsers on evaluators or, continuations to
program web servers. In ACM SIGPLAN Notices, volume 35, pages 23–33.
ACM, 2000.
[26] S. Russell and P. Norvig. Intelligent agents. Artificial intelligence: A modern
approach, pages 32–54, 2003.
[27] Evan Sandhaus and Rob Larson. More tags released to the linked data
cloud http://open.blogs.nytimes.com/2010/01/13/
more-tags-released-to-the-linked-data-cloud/, January 2010.
[28] M. Smethurst.
How we make websites http://www.bbc.co.uk/blogs/
radiolabs/2009/01/how_we_make_websites.shtml, 2009.
[29] AK Thushar and P.S. Thilagam. An rdf approach for discovering the relevant
semantic associations in a social network. In Advanced Computing and
Communications, 2008. ADCOM 2008. 16th International Conference on,
pages 214–220. IEEE, 2008.
[30] W3C. Semantic web http://www.w3.org/standards/semanticweb/, January
2010.
Appendix A
Personal Reflection
This project was something I had looked forward to starting for a long time,
the last step in finishing at Leeds. I found the whole process overwhelming
at some points: so many things had to be done, all with the one goal of
creating this report. Choosing to look at Linked data was one of the best
choices I made in my university time. Learning about an establishing side of
computing has been interesting, and it has been fun looking at some of the
quirky things people have done with it, the linked data Pokémon site being a
great example of quirky.
Coming to the end of my project I realized I was about to have to get a job,
and in my breaks from work I started looking at potential jobs. If, like me,
you have not looked at the job market, you should have a look before you
start your project. Choosing a project that fits with your ideal career path
seems a wise idea in hindsight, but honestly I didn't. During this project I
feel I have developed a new skill set that will be valuable in the workplace;
looking critically at your own work is a skill that is important in
scientific processes and workplaces alike.
Starting with an open mind is essential to creating the right piece of work.
At the start of the project, before I had enough knowledge of the Linked data
platform and the datasets, I made assumptions about what was going to be
possible and what I could achieve. In some respects this spurred me on to
work harder and learn more, but my advice would be to keep an open mind, set
down what you want to achieve at an early stage, and establish its
plausibility.
With this project not directly building on any past modules, I was covering
a lot of ground for the first time. This meant learning to use JavaScript,
XHTML, DOM manipulation, and some SPARQL in a short amount of time. While I
was ultimately up to the task, the pressure of learning something entirely
new while running against deadlines is a stressful experience. At one point,
designing the SPARQL queries, I felt like crying; that was not going to help,
so I pushed on, and once I had understood the concepts rather than just read
about them it became a much more pleasurable endeavour.
As this is a reflection, I should look at what I have personally gained from
the process of producing this report. I have learnt that I can produce a
substantial document and that I should not fear taking on large tasks like
this: they only start out large, and the more you do the smaller they become.
One thing I would do differently is to make draft chapters while the project
is ongoing; as you can see in my time plan, all the writing was left until
the end. Now it is finished, it is clear that this was not the right strategy.
Keeping notes was helpful during the process, but unless they are organized
they are practically useless under pressure.
It seems wise to finish with a warning to future students about project
timings: it feels like you have forever, but it really works out that you
have just the right amount of time to do a good piece of work. Starting the
project late has, I feel, meant that playing catch-up has been the pattern
throughout. Make sure not to whittle the time away, and really take the
chance to do a great piece of work.