How to Build an RDF Based Wiki
Walter Christian Kammergruber

Fortgeschrittenenpraktikum (advanced practical course)
Institut für Informatik
Lehr- und Forschungseinheit für Programmierung und Softwaretechnik

Assigned by: Professor Dr. Martin Wirsing
Advisor: Axel Rausmayer
Submission date: May 2006
Summary
It is a common fallacy to regard a wiki as merely a collection of text documents. A wiki page
is more than just ASCII text: on the one hand, there is implicit data that is mixed with
descriptive text and also stored elsewhere. This leads to consistency problems. On the other
hand, there is metadata about wiki pages that cannot be managed satisfactorily with existing
wiki approaches. We show an approach in which the data and metadata are stored and handled
in an RDF database. This avoids duplication and makes different views of the data possible.
Because RDF is a widely supported standard, external data sources can be incorporated into
the RDF database. We also present a new approach to wiki syntax, a language named WITL.
WITL does not use a 'search and replace' style to render the text; instead, a syntax tree
defined by an LL(k) grammar is built and evaluated to generate the desired output format.
Abstract
It is a common fallacy to see a Wiki as just a collection of text documents. It is a network of
information. A wiki page is more than just ASCII text: On the one hand there is a lot of implicit
data tangled with descriptive text that is often a duplication of other data stored elsewhere. This
duplication leads to consistency problems. On the other hand, there is meta-data about Wiki
pages (such as their name or author) that currently cannot be properly managed. We show an
approach in which this data and metadata are stored and managed by an RDF database. This
prevents duplication and allows us to publish different views on the same data. Additionally,
because RDF is a widely supported standard, one can also add data from external sources
to the database.
We also show a new Wiki syntax approach, a language called WITL. In WITL we do not use a
search-and-replace mechanism for rendering the text written in Wiki style; instead we build an abstract
syntax tree defined by an LL(k) grammar and walk this tree to generate the desired output format.
Contents
1 Introduction
    1.1 What Is Yet Another Wiki Good For?
    1.2 About RDF
        1.2.1 RDF as a Graph
        1.2.2 RDF as Triples
        1.2.3 Further Advantages
2 WITL: A Syntax for the Wiki
    2.1 Wiki Markup
    2.2 WITL Syntax
        2.2.1 WITL Syntax explained by Illustrative Examples
    2.3 Using ANTLR for Parsing WITL Code
        2.3.1 The Lexer
        2.3.2 The Parser
    2.4 Rendering with WITL
        2.4.1 Evaluating the Abstract Syntax Tree
        2.4.2 Defining Functions
3 Wikked Architecture
    3.1 Three Layer Architecture
    3.2 Components
4 Scenario: A Bookmark Collection Web Site
    4.1 Motivation
    4.2 The Wikked Three Layer Model in Practice
        4.2.1 Daniel the Data Manager
        4.2.2 Alice the Author
        4.2.3 Wayland the Web Designer
5 Related Work
6 Future Research
7 Conclusion
1 Introduction
1.1 What Is Yet Another Wiki Good For?
When you write an entry in a Wiki, you just type some lines of text in some Wiki syntax. You
may include links to other entries, books, web pages, images, articles, newspapers and so on. But
you do not have a handy way to access general data like your bookmark collection or your favorite
songs. You can simply quote them, but what happens when your taste in music changes? You
have to update all your entries. It would therefore be better to keep the information separate from
the text. As an example, we show how you can include your bookmark collection in your Wiki. In
general, you can see our Wiki as a kind of RDF¹ browser. Everything is encoded in RDF and
the Wiki is just a tool to make the information visible.
Because RDF is a popular format for managing data structures, you can also include
or import data from external RDF-based sources. Examples are the Open
Directory Project created by the Netscape Communications Corporation [9] and its subprojects
for music or restaurants. There is also an increasing number of RDF tools, e.g. for
extracting information from web pages. RDF, the foundation of the Semantic Web, is a very
promising approach for modeling data structures, with a growing number of applications.
The primary goal of our wiki, called Wikked, is that it can display any data encoded in RDF.
The point of using RDF for managing information is the ability to state relations
between resources in a practical way. For example, when an entry links to another one, you can
express this fact with a statement such as “entry A links to entry B”. Fig. 1 shows a graph
representation of such statements, using an address book entry as the example. You can access
these pieces of information easily by formulating queries, e.g. in a query language like
RDQL [15]. These queries are an elegant way to get the information you want.
Storing and querying metadata can become quite complex. There are some handmade solutions
for attaching metadata to Wiki entries, but all of them seem to be makeshift. RDF is mature
and many tools are available for it. By expressing the information in RDF we try
to see everything as a node in a network. You can connect or “label” everything with anything.
An entry has a creator, and the creator has linked nodes or properties like names, addresses, phone
numbers and so on².
The Wiki syntax is another big issue for Wikis. Most Wikis use regular expressions
to search for and replace patterns, e.g. *bold* stands for bold, and they use blocks with
macros for extra functionality.
In WITL, we distinguish only between functions and text. A Wiki document then consists of a sequence
of text and functions. A function may contain nested functions and nested text, so when the
document is parsed, the result is an abstract syntax tree. To preserve the idea behind traditional Wiki
syntax, namely fast-paced editing, we integrated syntactic sugar, which we call wiki
markup, e.g. **bold** for bold. This syntactic sugar is replaced by WITL constructs
before the actual parsing of the Wiki document takes place. Thanks to the wiki markup, users with
no experience in programming languages can use Wikked as well. They can simply pick out the few
functions they want to use and ignore the additional functionality.
On the one hand, WITL is used as a language in addition to the wiki markup. On the other
hand, WITL can also be used as a template language for designing the web pages of the wiki itself.
It provides dynamic and static ways to include the constructs that ‘normal’ programming languages use, but
it is designed for working with heterogeneous data, especially text. Because of this approach,
Wikked does not depend on template engines such as Velocity from the Apache Jakarta project
[1], JavaServer Pages [16] or StringTemplate by Terence Parr [12]. The templates for the pages
in Wikked can also be edited in Wikked, because both the templates and the entries are pages
in Wikked. Because wiki markup is preprocessed, templates can be written
quickly and without the need to write much HTML.

¹ RDF stands for Resource Description Framework. For details see section 1.2.
² You may have a look at the vCard MIME Directory Profile from the Network Working Group [4] for more examples.

[Figure 1: RDF graph. A blank node has an edge labeled
http://www.w3.org/2000/10/swap/pim/contact#fullName to the literal "John Doe" and an edge labeled
http://www.w3.org/1999/02/22-rdf-syntax-ns#type to the node
http://www.w3.org/2000/10/swap/pim/contact#Person.]
Figure 1: This small RDF graph encodes part of an address book entry. The entry itself is the
blank node in the top left corner; it contains the full name of the contact and has the
RDF type http://www.w3.org/2000/10/swap/pim/contact#Person. Both edges are
drawn with a URI label, but they are actually labeled with a node. Therefore, the label could
also be a blank node.
1.2 About RDF
For our purposes, we view RDF as a universal data structure that stores graph-based information.
Its main virtues are its simplicity and its ability to integrate external data.
1.2.1 RDF as a Graph
Similar to usual graph definitions, an RDF graph consists of
• Nodes: There are two kinds of nodes.
– Resources: these are the main building blocks of the graph. If a node should be publicly
addressable, it gets a URI as an identifier. If not, it remains anonymous and is called a
blank node.
– Literals: Resources can also be viewed as potentially containing URIs. In order to add
arbitrary data to RDF, one attaches a literal to a resource. Literals can only exist in
that role, as the target of an edge. They contain a text string.
• Edges: edges are binary and directed, connecting a source and a target node. Every edge
has a node as its type; a label, if you will.
Fig. 1 shows how these basic constructs are used to build an RDF graph. Note how URIs are
used as a convenient public namespace for identifying nodes. But they can also be viewed as
references to entities on the Web (including the local file system). Constructing URIs as a code
for non-Web entities, one can also put references to real-world “objects” into an RDF graph. For
example, creating a resource with the URI mailto:[email protected] and attaching other nodes to it
can mean³ that we are describing the real-world person John Doe.

³ Naturally, this is a matter of semantics, of how a group of people agrees on interpreting a particular RDF graph.
1.2.2 RDF as Triples
Internally, RDF is usually stored as a set of triples. Each triple is an edge and has the components
subject, predicate and object. The names of the components already hint at the fact that with
each triple we are making an assertion. That is why triples are also sometimes called statements.
The subject is the edge source and must be a resource. The predicate is the edge label and also
has to be a resource. The object is the edge target and can either be a resource or a literal. A
consequence of RDF only knowing about edges is that there cannot be single nodes. In practice,
this is not a problem, because RDF is so fine-grained that single resources are rarely enough for
expressing anything meaningful. The following shows how the RDF triples of the example in Fig. 1
are expressed as plain text. Note that in this format, blank nodes also need an identifier (which is
internal only).
_:1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/10/swap/pim/contact#Person .
_:1 http://www.w3.org/2000/10/swap/pim/contact#fullName
"John Doe" .
1.2.3 Further Advantages
Fortunately, the basics of RDF are simple. Further possibilities such as reification⁴ of statements,
schema languages, inferencing etc. are outlined in the RDF Primer [8]. Many aspects of RDF,
such as merging of graphs and multi-dimensionality, make it very useful for software engineering
applications [14].
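
The triple view from section 1.2.2 is easy to model in code. The following Python sketch is not part of Wikked; it stores the two statements of Fig. 1 as (subject, predicate, object) tuples and answers simple pattern queries. The helper `match` and the use of `None` as a wildcard are assumptions made for illustration, and the sketch does not distinguish literals from resources. A real system would use an RDF store and a query language such as RDQL.

```python
# Illustrative sketch only: triples as plain tuples, None acts as a wildcard.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"

# The two statements of Fig. 1; "_:1" is the internal id of the blank node.
triples = {
    ("_:1", RDF_TYPE, CONTACT + "Person"),
    ("_:1", CONTACT + "fullName", "John Doe"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples whose components equal the given non-None values."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which nodes are Persons, and what is the full name of the first one?
persons = [s for (s, _, _) in match(triples, p=RDF_TYPE, o=CONTACT + "Person")]
names = [o for (_, _, o) in match(triples, s=persons[0], p=CONTACT + "fullName")]
print(persons, names)
```

Because every statement has the same three-part shape, merging data from another source amounts to a set union of two triple sets.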

⁴ Reifying a statement means making it available as a resource. Without reification, one can only annotate edge
types, but not the instances of a type.

2 WITL: A Syntax for the Wiki
Finding the right syntax for a Wiki is a very important issue because the users of the wiki have to
use it intensively. You can either resort to an existing rendering engine or create a new one.
We tried to use Radeox [5] as a rendering engine, but we quickly reached its limitations. Adding
extra functionality beyond the wiki markup and macros led to awkward constructs and unclean
code design. For example, links to other Wiki pages are written in Radeox with brackets, e.g.
[someTitle] creates a link to a Wiki page with the title “someTitle”. We wanted to identify Wiki
pages by means other than their title⁵. Identifiers should be unique and independent of the title.
An identifier that is independent of the title also has the advantage that it stays untouched when
the title changes. We wanted to distinguish between both sorts of links, i.e. wiki links to
unique ids and links to titles that refer either to a set of entries or to one explicit entry. In Radeox,
this differentiation would only be possible with makeshift solutions. Because we were unhappy
with existing solutions, we finally decided to create a new and simple syntax with formal semantics.
Two syntactic concepts are involved: we differentiate between wiki markup and our
wiki language called WITL. The syntax of both languages is described in the following sections.
2.1 Wiki Markup
“A Wiki website is a hypertext on steroids.” (Lars Aronsson [2])
Making the writing of wiki entries as easy as possible is a key goal of wiki markup, especially because typical
HTML source code makes the actual text content very hard to read and edit for most users⁶.
Promoting plain-text editing with a few simple conventions for structure and style is therefore
advisable.
With that in mind, we introduced a simple wiki markup. Wiki markup parsing is a separate
text-to-text transformation that is performed before the actual WITL parsing. The wiki markup
is compatible with WITL, because WITL has very few special symbols. Fig. 2 shows the
wiki markup we use.
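
The text-to-text preprocessing step can be sketched in a few lines of Python. This is not the Wikked preprocessor, only a minimal illustration under assumptions: the function name `markup_to_witl` is hypothetical, and of the WITL function names used as targets only `h1` and `b` appear in this text, while `h2` to `h4`, `i` and `tt` are guesses. Lists, tables, definition lists and quoted paragraphs are left out.

```python
import re

# Hypothetical mapping from inline wiki markup to WITL function applications.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"{b:\1}"),   # **bold**
    (re.compile(r"~~(.+?)~~"), r"{i:\1}"),       # ~~italic~~ (name assumed)
    (re.compile(r"''(.+?)''"), r"{tt:\1}"),      # ''teletype'' (name assumed)
]
# A heading line is wrapped in the same number of '=' signs on both sides.
HEADING = re.compile(r"^(={1,4})\s*(.+?)\s*\1$")


def markup_to_witl(text):
    """Translate a few wiki markup conventions into WITL source text."""
    out = []
    for line in text.splitlines():
        if line.startswith("%"):            # comment line: produces no output
            continue
        m = HEADING.match(line)
        if m:
            level = 5 - len(m.group(1))     # '====' is level 1, '=' is level 4
            line = "{h%d:%s}" % (level, m.group(2))
        else:
            for pattern, replacement in INLINE_RULES:
                line = pattern.sub(replacement, line)
        out.append(line)
    return "\n".join(out)


print(markup_to_witl("==== Title Level 1 ====\nsome **bold** text"))
```

Since the output is ordinary WITL source, the WITL parser does not need to know that wiki markup exists.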
2.2 WITL Syntax
Our syntax emerged from some aesthetic and practical considerations. We started by putting
a link in curly brackets, e.g. {http://www.ifi.lmu.de} or {ftp://ftp.leo.org}. Because
a URI (see RFC 3986 [3] for details) is defined as a scheme name followed
by a colon and other characters, it is straightforward to see the scheme as a function name and the
characters after the colon as an argument. You can consider this construct as a function with the
signature function(String schemeName, String arg), e.g. {http://www.ifi.lmu.de} stands
for function("http", "//www.ifi.lmu.de").
In most cases a function needs not just one but several arguments, and therefore we defined
a separator between the arguments. We chose “^”, because “^” is not allowed in a URI, is rarely
used in text and is not needed for the wiki markup.
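
The reading of a URI as "function name, colon, arguments" can be illustrated with a small Python helper. The name `split_funapp` is hypothetical, and the sketch ignores nesting, escaping and verbatim blocks, which the real grammar handles.

```python
def split_funapp(source):
    """Split '{name:arg1^arg2}' into (name, [args]); no nesting, no escapes."""
    if not (source.startswith("{") and source.endswith("}")):
        raise ValueError("not a function application: %r" % source)
    inner = source[1:-1]
    # Only the first colon separates the function name from the arguments.
    name, colon, rest = inner.partition(":")
    args = rest.split("^") if colon else []
    return name, args


print(split_funapp("{http://www.ifi.lmu.de}"))
print(split_funapp("{wikked.link:http://page1.com^page one}"))
```

Splitting at the first colon only is what lets a complete URL such as http://page1.com appear unescaped as an argument.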
WITL also has blocks of unparsed text. Unparsed blocks are convenient for writing
longer passages in which reserved characters may occur and it is undesirable to escape every single
character. These verbatim text blocks are surrounded by triple square brackets, e.g. [[[ some
unparsed text ]]].
Abbreviated abstract syntax:

    seq          ::= (node)* .
    node         ::= (plain | verbatim | funapp) .
    funapp       ::= '{' ( (functionName (':' seq ( '^' seq )* )? )
                         | ('*' (.)* '*')
                         | ('!' (.)* )
                         | ('?' (.)* ) )? '}' .
    functionName ::= ([a-z]|[A-Z]|'_'|'$') ([a-z]|[A-Z]|'_'|[0-9]|'$'|'+'|'-'|'.')* .
    plain        ::= all characters except '{' and '}' .
    verbatim     ::= "[[[" (.)* "]]]" .

⁵ The title of an entry is not a good identifier. Just consider a title like “Hub”: there might be many
entries with the same title.
⁶ This fact is nothing new but worth mentioning.

Markup                                        Result
--------------------------------------------  --------------------------------------------
==== Title Level 1 ====                       Title Level 1
=== Title Level 2 ===                         Title Level 2
== Title Level 3 ==                           Title Level 3
= Title Level 4 =                             Title Level 4

**bold**, ~~italic~~, ''teletype''            bold, italic, teletype

tabs lead to quoted paragraphs
% comments start with a percent sign

for http:                                     for http:
{http://www.ifi.lmu.de} creates a link to     http://www.ifi.lmu.de creates a link to
"http://www.ifi.lmu.de"                       "http://www.ifi.lmu.de"

- indent opens sublist                        indent opens sublist and this text is in the
  and this text is in the same item           same item
+ numbered sublist at same level              1. numbered sublist at same level

Paragraphs are separated by blank lines       Paragraphs are separated by blank lines

This is a new paragraph .                     This is a new paragraph .

: Definition list                             Definition list
A list with terms                             A list with terms
: Start term with colon                       Start term with colon

|| Table Heading | Table Heading |            Table Heading    Table Heading
| one | two |                                 one              two
| three | four |                              three            four

{wikked.link:http://page1.com^page one}       page one creates a link to "http://page1.com" but
creates a link to "http://page1.com" but      displays "page one"
displays "page one"

The second argument can be left out:          The second argument can be left out:
{wikked.link:http://page1.com^page one}       page one creates a link to "http://page1.com"
creates a link to "http://page1.com"

Figure 2: The wiki markup used in Wikked.
2.2.1 WITL Syntax explained by Illustrative Examples
Most of the text is considered “plain text”, except for the following constructs:
• {h1:Heading on level one}: a Wikked function; most HTML commands are defined
• {http://foo.com}, {ftp:www.leo.org}: standard URLs fit naturally into the syntax scheme as links
• [[[verbatim text no functions are evaluated]]]: “verbatim” or unescaped text can be
used to insert HTML (e.g. in web page templates) or source code (e.g. Java code for
documentation)
• {* Comments *}: produce no output
Variables:
• {?var} gets a value (unevaluated!)
• {:var} gets an evaluated value
• {!var^value} sets a value
Attributes:
• {?var^attrib} gets a value (unevaluated!)
• {:var} gets an evaluated value
• {!var^value} sets an attribute to a value
{?}, {:} and {!} functions are syntactic sugar and could be expressed by {get}, {getEval} and
{set}.
2.3 Using ANTLR for Parsing WITL Code
For parsing WITL code we use ANTLR [11], primarily developed by Terence Parr as a parser
generator. On its web site it is described the following way:
“ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language
tool that provides a framework for constructing recognizers, compilers, and translators
from grammatical descriptions containing Java, C#, Python, or C++ actions. ANTLR
is popular because it is easy to understand, powerful, flexible, generates human-readable
output, and comes with complete source. ANTLR provides excellent support for tree
construction, tree walking, and translation. There are currently over 5000 ANTLR
source downloads a month.” (from: http://www.antlr.org/about.html visited on
03/16/2006)
ANTLR creates three main parts for language recognition:
• A Lexer that divides the input into tokens.
• A Parser that evaluates a token stream according to specified rules.
• A TreeWalker that evaluates a syntax tree.
For our purpose, we only use the lexer and the parser. We do not define a TreeWalker, because
we have our own specialized implementation of a syntax tree and do not use the default interfaces
supplied with ANTLR. Even if we used the ANTLR interfaces for syntax trees, a TreeWalker
would be too inflexible for the evaluation.
2.3.1 The Lexer
In this section we present a shortened version of the lexer grammar we use. The lexer splits
the WITL code into tokens that can be processed by the parser on a higher level of
abstraction. ANTLR uses a syntax similar to EBNF. Elements and characteristics of the target
language (here Java) can be included.
Listing 1: Lexer definition

    class WitlLexer extends Lexer;

    /* The opening tag for all functions such as
     * {<functionName>:, {!, {?, {: or specialities {} and
     * {<functionName> (no colon) */
    OTAG : ( options { generateAmbigWarnings=false; } :
          '{' '}' { $setType(EMPTY_FUNCTION); }
        | "{:"!
        | ( // uses syntactic predicate to distinguish
            // between '{' <functionName> ':'
            // and '{' <functionName> '}'
            ( '{'! (COMNAME) ':'! ) => '{'! (COMNAME) ':'!
          | '{'! COMNAME '}'! { $setType(ZERO_ARG_FUNCTION); } )
        | "{!"!   // for "{set:"
        | "{?"!   // for "{get:"
        ) ;

    /* Closing tag for functions. */
    CTAG : '}' ;

    /* {* . *} matches everything in between */
    COMMENTTAG : "{*"! ( . )* "*}"! ;

    /* [[[ . ]]] matches everything in between
     * triple square brackets */
    VERBATIM : "[[["! ( . )* "]]]"! ;

    /* Used to separate between arguments in
     * a function, e.g. {fun:arg1^arg2} */
    ARGSEP : "^" ;

    /* everything except special signs */
    TEXT : ( { LA(2) != ']' || LA(3) != ']' }? ']'   // LA = look ahead
           | { LA(2) != '[' || LA(3) != '[' }? '['
           | CONTENT
           | ESC )+ ;

    /* Escaped signs {, }, :, ^, [, ] with leading '/' */
    protected ESC : "/"! ( '{' | '}' | ':' | '^' | ']' | '[' ) ;

    /* Allowed plain content */
    protected CONTENT : ~( '{' | '}' | '\n' | '\r' | '^' | ']' | '[' ) ;

    protected COMNAME : ( 'a'..'z' | 'A'..'Z' | '_' | '$' )
        ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' | '+' | '-' | '.' )* ;

    WS : ( ' ' | '\t' | '\f' ) ;
    NL : ( '\n' | "\r\n" | '\r' ) ;

Listing 1 shows the source code of the lexer. We assume that the comments in the source code
describe most token classes sufficiently. Some tricky parts, however, are worth mentioning
explicitly:
• Functions are split into an opening tag (the OTAG rule) and a closing tag. To distinguish between
'{'<functionName>':' and '{'<functionName>'}', ANTLR first has to check whether it
finds an opening curly bracket followed by a functionName, followed by a colon. If it does
not find a colon, it has to fall back and look for a closing curly bracket, i.e. it has to
determine whether the function has zero arguments or one or more.
• The TEXT token definition uses a lookahead of two and three characters to find out whether a [ or ]
belongs to a VERBATIM token.
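
The two tricky parts can be imitated with ordinary regular expressions. The Python sketch below is a simplified stand-in for the generated ANTLR lexer, not a port of it: the token names follow Listing 1 and Listing 2, while `TOKEN_SPEC` and `tokenize` are names made up for this illustration. Unlike the ANTLR rules, it keeps the delimiters in the token text and folds whitespace and newlines into TEXT. The ordered alternatives play the role of the syntactic predicate, and the negative lookaheads play the role of LA(2) and LA(3).

```python
import re

NAME = r"[A-Za-z_$][A-Za-z0-9_$+\-.]*"

# Alternatives are tried in order, so '{name:' is attempted before '{name}'.
TOKEN_SPEC = [
    ("COMMENTTAG", r"\{\*.*?\*\}"),
    ("VERBATIM", r"\[\[\[.*?\]\]\]"),
    ("EMPTY_FUNCTION", r"\{\}"),
    ("OTAG", r"\{" + NAME + r":|\{:"),
    ("ZERO_ARG_FUNCTION", r"\{" + NAME + r"\}"),
    ("SET_FUNCTION", r"\{!"),
    ("GET_FUNCTION", r"\{\?"),
    ("CTAG", r"\}"),
    ("ARGSEP", r"\^"),
    # Escaped sign, or a bracket that does not start/end a triple, or content.
    ("TEXT", r"(?:/[{}:^\[\]]|\[(?!\[\[)|\](?!\]\])|[^{}^\[\]])+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC), re.DOTALL)


def tokenize(source):
    """Return a list of (token class, text) pairs for a WITL source string."""
    pos, tokens = 0, []
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        kind, value = m.lastgroup, m.group()
        if kind == "TEXT":
            value = re.sub(r"/([{}:^\[\]])", r"\1", value)  # drop escape slashes
        tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize("{b:some /{text/}} and {br}"))
```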
2.3.2 The Parser
The parser code in ANTLR is analogous to the lexer code. It is very similar to EBNF but has
features needed for processing with Java. Listing 2 shows a very simplified excerpt of the parser
code.
Listing 2: Parser definition

    class WitlParser extends Parser;

    /* Returns the root of the WITL document. */
    witl returns [Sequence seq] : ( node )* ;

    /* general node */
    private node returns [WitlExpression node] :
        ( plain | function | getCommand | setCommand | comment ) ;

    /* function node {functionName: witl text},
     * {functionName}, {} */
    private function returns [WitlExpression com] :
        ( OTAG witl ( ARGSEP witl )* CTAG
        | EMPTY_FUNCTION | ZERO_ARG_FUNCTION ) ;

    /* function node {? witl text} for Getter */
    private getCommand returns [WitlExpression com] :
        GET_FUNCTION witl ( ARGSEP witl )* CTAG ;

    /* function node {! witl text} for Setter */
    private setCommand returns [WitlExpression com] :
        SET_FUNCTION witl ( ARGSEP witl )* CTAG ;

    /* comment node {* my comment *} */
    private comment returns [WitlExpression node] : COMMENTTAG ;

    /* plain text or verbatim text */
    private plain returns [WitlExpression node] : text | verbatim ;

    private text returns [String text] : TEXT | NL | WS ;
    private verbatim returns [String text] : VERBATIM ;

Some remarks:
• Tokens delivered by the lexer are written in uppercase letters, e.g. VERBATIM.
• Parser rules are written in lowercase or mixed case and may have a return value of any Java class.
WitlExpression and Sequence are two examples of Java classes.
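
To show what kind of tree such a parser produces, here is a hand-written recursive-descent sketch in Python. It follows the abbreviated abstract syntax (seq, funapp, plain, verbatim) rather than the ANTLR rules and works directly on characters. The tuple-based tree and the names `parse`, `_seq` and `_funapp` are assumptions of this sketch; Wikked itself uses Java classes such as WitlExpression and Sequence. Escapes and the {?, {! and {: shortcuts are omitted.

```python
import re

NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$+\-.]*")


def parse(source):
    """Parse WITL text into a ('seq', [...]) tree; raise SyntaxError on bad input."""
    tree, pos = _seq(source, 0)
    if pos != len(source):
        raise SyntaxError("unexpected %r at offset %d" % (source[pos], pos))
    return tree


def _seq(src, pos):
    """seq ::= (plain | verbatim | funapp)*, ending at '^', '}' or end of input."""
    nodes = []
    while pos < len(src) and src[pos] not in "}^":
        if src.startswith("[[[", pos):                 # verbatim block
            end = src.find("]]]", pos + 3)
            if end < 0:
                raise SyntaxError("unterminated verbatim block")
            nodes.append(("plain", src[pos + 3:end]))
            pos = end + 3
        elif src.startswith("{*", pos):                # comment: yields no node
            end = src.find("*}", pos + 2)
            if end < 0:
                raise SyntaxError("unterminated comment")
            pos = end + 2
        elif src[pos] == "{":
            node, pos = _funapp(src, pos)
            nodes.append(node)
        else:                                          # run of plain text
            start = pos
            while (pos < len(src) and src[pos] not in "{}^"
                   and not src.startswith("[[[", pos)):
                pos += 1
            nodes.append(("plain", src[start:pos]))
    return ("seq", nodes), pos


def _funapp(src, pos):
    """funapp ::= '{' functionName (':' seq ('^' seq)*)? '}'."""
    m = NAME.match(src, pos + 1)
    if m is None:
        raise SyntaxError("function name expected at offset %d" % (pos + 1))
    name, pos, args = m.group(), m.end(), []
    if pos < len(src) and src[pos] == ":":
        while True:
            arg, pos = _seq(src, pos + 1)              # skip ':' or '^'
            args.append(arg)
            if pos >= len(src) or src[pos] != "^":
                break
    if pos >= len(src) or src[pos] != "}":
        raise SyntaxError("'}' expected at offset %d" % pos)
    return ("funapp", name, args), pos + 1


print(parse("{def:{b:content}^[[[<b>]]]{var:content}[[[</b>]]]}"))
```

Every argument of a function is itself a sequence, which is what makes arbitrary nesting of functions and text possible.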
2.4 Rendering with WITL
2.4.1 Evaluating the Abstract Syntax Tree
Building a syntax tree is not an end in itself: you want to display some text in a certain
target language. In the case of our Wiki this is mostly HTML, but LaTeX is also conceivable. A syntax tree
by itself is just a structure, independent of the grammar that produced it. Extracting
the information expressed by the tree is an important issue. The evaluation of markup functions
is very easy and can be done by implementing some hard-coded functions, but WITL has more to
offer. You can also define your own functions in the Wiki text, as shown in Fig. 3. The evaluation
then becomes more complex. We define a general evaluation strategy that can handle these cases.
A tree is evaluated by evaluating every one of its nodes. The result is again a
tree, but very often it now consists only of text nodes. The final evaluation step is to flatten the
tree via pre-order traversal to produce HTML output.
Below, we give a more formal definition of evaluation.
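EBNF name    Term signature
seq          $\mathit{seq}\langle \mathit{head}, \mathit{tail} \rangle$
funapp       $\mathit{funapp}\langle \mathit{varname}, \mathit{arg} \rangle$
plain        $\mathit{plain}\langle \mathit{text} \rangle$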
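The evaluation function has the signature
\[ \mathit{eval} : \mathit{Tree} \times \mathit{Text} \to \mathit{Tree} \times \mathit{Text} \]
The type Tree is defined as
\[
\begin{array}{rcl}
\mathit{Tree} & = & \mathit{funapp}\langle \mathit{Varname}, \mathit{Tree} \rangle \\
 & | & \mathit{plain}\langle \mathit{Text} \rangle \\
 & | & \mathit{seq}\langle \mathit{Tree}, \mathit{Sequence} \rangle
\end{array}
\]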
In the definition of eval, we use the type Text for storing variable bindings. Lookup is done via
function application, where the instance bindings is viewed as having the signature
\[ \mathit{bindings} : \mathit{Varname} \to \mathit{Tree} \]
When adding new bindings, we view bindings as a sequence of (varname, tree) pairs to which the “.” operator
prepends a pair. Lookup always starts with the first element and returns the first
match if one is found.
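The evaluation of each possible alternative of a tree is defined as follows.
• Sequence of Nodes:
\[
\begin{aligned}
\mathit{eval}(\mathit{seq}\langle \mathit{head}, \mathit{tail} \rangle, \mathit{bindings}) &= \mathit{seq}\langle \mathit{result}, \mathit{eval}(\mathit{tail}, \mathit{bindings}') \rangle \\
&\quad \text{where } (\mathit{result}, \mathit{bindings}') = \mathit{eval}(\mathit{head}, \mathit{bindings}) \\
\mathit{eval}(\mathit{seq}\langle\rangle, \mathit{bindings}) &= \mathit{seq}\langle\rangle
\end{aligned}
\]
• Function Application: Without loss of generality, we define the function application just for
one argument. Note that the function definition is stored as a function application whose
functor part is called fun.
\[
\begin{aligned}
\mathit{eval}(\mathit{funapp}\langle \mathit{funname}, \mathit{arg} \rangle, \mathit{bindings}) &= \mathit{eval}(\mathit{body}, (\mathit{param}, \mathit{arg}) \,.\, \mathit{bindings}) \\
&\quad \text{where } \mathit{funapp}\langle \mathit{fun}, \mathit{param}, \mathit{body} \rangle = \mathit{bindings}(\mathit{funname})
\end{aligned}
\]
• Plain Text:
\[ \mathit{eval}(\mathit{plain}\langle \mathit{text} \rangle, \mathit{bindings}) = (\mathit{plain}\langle \mathit{text} \rangle, \mathit{bindings}) \]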
Except for the evaluation of sequences, this evaluation strategy is typical for functional languages
with lazy evaluation.
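
The evaluation rules can be mirrored in a short Python sketch. It is an illustration under stated assumptions, not the Wikked evaluator: trees are modeled as tuples, bindings as a list of (name, tree) pairs searched from the front, and a function definition is stored as a ("fun", param, body) triple, following the note on the functor fun. The treatment of var as a built-in and the decision to drop a parameter binding when a call returns are additions of this sketch.

```python
def lookup(bindings, name):
    """Return the first tree bound to name; bindings is a list of pairs."""
    for key, tree in bindings:
        if key == name:
            return tree
    raise KeyError(name)


def flatten(tree):
    """Pre-order traversal of an evaluated tree, producing the output text."""
    if tree[0] == "plain":
        return tree[1]
    if tree[0] == "seq":
        return "".join(flatten(node) for node in tree[1])
    raise ValueError("unevaluated function application: %r" % (tree[1],))


def evaluate(tree, bindings):
    """eval : Tree x Bindings -> Tree x Bindings, for one-argument functions."""
    kind = tree[0]
    if kind == "plain":
        return tree, bindings
    if kind == "seq":
        results = []
        for node in tree[1]:
            result, bindings = evaluate(node, bindings)
            results.append(result)
        return ("seq", results), bindings
    if kind == "funapp":
        _, name, arg = tree
        if name == "var":                       # built-in: fetch and evaluate
            return evaluate(lookup(bindings, flatten(arg)), bindings)
        _, param, body = lookup(bindings, name)  # stored as ("fun", param, body)
        # Lazy: the argument is bound unevaluated, by prepending a pair.
        result, _ = evaluate(body, [(param, arg)] + bindings)
        return result, bindings
    raise ValueError("unknown node kind: %r" % (kind,))


# Hypothetical example data: a bold function and a small document.
bold = ("fun", "content", ("seq", [
    ("plain", "<b>"), ("funapp", "var", ("plain", "content")), ("plain", "</b>")]))
document = ("seq", [
    ("plain", "Some "), ("funapp", "b", ("plain", "bold")), ("plain", " text")])

result, _ = evaluate(document, [("b", bold)])
print(flatten(result))
```

Because new pairs are prepended and lookup returns the first match, an inner binding shadows an outer one with the same name.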
2.4.2 Defining Functions
The primary goal behind using WITL as a language for both entries and templates is that you
can define your own functions in the text in addition to using hard-coded functions. This makes
WITL both convenient and flexible. We demonstrate the mechanism with a short example.
To define your own bold function, you can use code like this:

    {def:{b:content}^
    [[[<b>]]]
    {var:content}
    [[[</b>]]]
    }

The function def is a predefined function and has two arguments:
1. The function to define: {b:content}
The function has the name "b" and one argument with the variable name "content".
2. The body of the function as the second argument:
There are HTML tags in triple brackets and, between them, a predefined function
var that tells the interpreter to fetch and evaluate its argument.
When a function application such as {b:someBold} is evaluated, the function definition of b
is looked up first and the variable content is bound to "someBold". Then the definition of function
b is evaluated. This definition contains an embedded function var. The function var tells the
interpreter to look up its argument "content" in the variable bindings and to insert the result into the
text at the position where var occurred. As a result, {b:someBold} is replaced by "<b> someBold </b>".
Fig. 3 shows the syntax tree for the example above. Round nodes are functions such as
{b:content}. The only empty round node is a sequence. It is a container for the nodes in
the second argument of def. The square nodes represent plain text. In addition to def,
eval and var, WITL also provides further Lisp-inspired functions such as {list:} and {for:}.
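
The two steps described above, defining b and then calling it, can be traced with a small Python sketch. The tuple-based tree, the helper names `define` and `call`, and the ("fun", param, body) binding format are assumptions made for illustration; the sketch only handles plain text and var inside a function body and strips the whitespace of the example.

```python
def define(def_node, bindings):
    """Evaluate {def:{name:param}^body}: prepend a function binding, emit no text."""
    kind, functor, (signature, body) = def_node
    assert kind == "funapp" and functor == "def"
    _, name, (param_node,) = signature          # e.g. ("funapp", "b", [("plain", "content")])
    return [(name, ("fun", param_node[1], body))] + bindings


def call(bindings, name, arg_text):
    """Expand one call such as {b:someBold} using the first matching binding."""
    _, param, body = next(tree for key, tree in bindings if key == name)
    local = {param: arg_text}                   # the parameter is bound to the argument
    parts = []
    for node in body:
        if node[0] == "plain":
            parts.append(node[1])
        else:                                   # ("funapp", "var", [("plain", varname)])
            parts.append(local[node[2][0][1]])
    return "".join(parts)


# Tree of Fig. 3: def has two arguments, the signature {b:content} and the body.
bold_definition = ("funapp", "def", [
    ("funapp", "b", [("plain", "content")]),
    [("plain", "<b>"), ("funapp", "var", [("plain", "content")]), ("plain", "</b>")],
])

bindings = define(bold_definition, [])
print(call(bindings, "b", "someBold"))   # <b>someBold</b>
```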
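[Figure 3: syntax tree. The root function def has two children: the function b with the text node
"content", and an empty sequence node that contains the text node "<b>", the function var with the
text node "content", and the text node "</b>".]
Figure 3: This syntax tree is a graphical representation of the definition of a bold function. Round
nodes are functions, the empty round node is a sequence of nodes and the square nodes
are plain text.

3 Wikked Architecture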
3.1 Three Layer Architecture
In order to reduce complexity, we created an architecture with three layers, as shown in Fig. 4.
This is just an abstraction that gives a bird’s-eye view of the system.
The data layer maintains most of the data to which the two layers above can refer. Thus
it is a kind of service layer, which is responsible for creating, updating and deleting data where
possible.
The presentation layer is the layer in the middle. This is the level where entries are situated.
From the author’s point of view, an entry is the traditional Wiki page. It is the actual Wiki text,
extended by the possibility of adding other information such as the language used or the date of creation.
When writing an entry, an author can access the underlying data by including a query or simply by
referring to metadata for this entry in the Wiki text.
The meta presentation layer is the layer on top. It is used for designing the web pages.
Templates are defined there to create a common look and feel. The layer is also used to create a certain view of
an entry: one person might be interested in the text, title and date of creation of an entry
without caring about the language or other extra information. Somebody else, on the other hand,
might be interested only in the title and creator, because he wants to create an overview of the entries
that a friend of his has written. This top level makes it possible to choose which information one
wants to have displayed. The idea is to apply a template to an entry; by switching to another
template, the view can be changed while the entry stays the same. This also works the other way
round: you can keep the template and change the entry. There may also be entries that do not
have a Wiki text but only other properties. For example, there might be an entry for a person, and it is
desired to display some information about that person. This can easily be done by using a
template that functions as a form which is filled with the data concerning the person.
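
The idea of switching views by switching templates can be illustrated in Python. In Wikked the templates are WITL pages and the entry properties come from the RDF database; the sketch below replaces both with stand-ins (Python's string.Template and a dictionary), and the entry data and template names are invented for this example.

```python
from string import Template

# Hypothetical entry: in Wikked these properties would be RDF statements.
entry = {
    "title": "Why Wikis?",
    "creator": "Alice",
    "created": "2006-03-16",
    "language": "en",
    "text": "Wikis are rapidly used as collaborative software.",
}

# Two views on the same entry; each template picks the properties it needs.
FULL_VIEW = Template("<h1>$title</h1><p>$text</p><small>$created</small>")
OVERVIEW = Template("<li>$title ($creator)</li>")


def render(template, properties):
    """Fill a template with the properties of an entry; unused ones are ignored."""
    return template.substitute(properties)


print(render(FULL_VIEW, entry))
print(render(OVERVIEW, entry))
```

Keeping the template and passing a different dictionary corresponds to the opposite direction mentioned above: the same view applied to another entry.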
[[
X HT [ <! DO
ML
C
1. 0 TY PE
"h
Tr a h tml
tr a tt p:/
n si
ns
/w
<h
ti o PU B LI
xml tml ition ww.w3
nal
C
a l.
.
ns =
/ /E "- //
dt d or g/ T
N"
W3 C
<ti "http
"
R/x
>
//D
tl
:/ /
Wi
ht m
TD
w ww
l1/
< /t k ke d e>
. w3
D TD
rel
<l i itl e
.
/x h
o rg
=
t ml
/19
</ h "s ty l n k hr >
{
B
e
19 9/
{ em : Jo hn
]] ] <b od y e ad > e sh ee f ="..
x
h
:24
t ml
t"
/
>
.0 2 Sm hi t
">
ty p m y st y
.20
h
le.
--e="
05 } }
{b -h1text css"
{u r:} : Wh/
ac
" /
tss
- -- l:
i
s a>
- -- li :
wi k
li:
on
}
i g
ano e ite
o od
the
m
Wi
f or
r i
{ i k is
?
te m
{ li : co ll * *a r
bo r nk: h oba e* *
r
w
ati
t
ve_ tp:// tive idly
s
u
sof
e
twa n.wik oftwa sed a
r
s
re}
i
} , p ed ia e
i.e
.
. t o rg /w
o w
i
o r k ki/ Co
l la
in
meta
presentation
layer
presentation
layer
data layer
Figure 4: A three layer architecture. At the top: the meta presentation layer, which builds the
framework for the Wiki, i.e. the web page templates. In the middle: the presentation layer, in
which authors write their entries. At the bottom: the data layer for managing information.
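The idea of switching views by switching templates can be illustrated with a minimal sketch. It assumes, purely for illustration, that an entry is a flat mapping from property names to values and that a template is just a list of the properties it displays. The sample values and the property dc:language are hypothetical, and real Wikked templates are written in WITL, not in Python.

```python
# Hypothetical entry: a flat mapping from property names to values.
entry = {
    "dc:title": "Java links",
    "dc:creator": "Alice",
    "dc:date": "2006-05-01",      # made-up sample value
    "dc:language": "en",          # hypothetical extra property
    "dc:source": "These are my Java links.",
}

# Two hypothetical templates; each one lists the properties it displays.
full_view = ["dc:title", "dc:date", "dc:source"]
overview = ["dc:title", "dc:creator"]


def apply_template(template, entry):
    """Return the (property, value) pairs selected by the template, in template order."""
    return [(prop, entry[prop]) for prop in template if prop in entry]


if __name__ == "__main__":
    # Same entry, two different views.
    print(apply_template(full_view, entry))
    print(apply_template(overview, entry))
```

The entry is never modified; only the template decides what is shown, which is the separation the meta presentation layer is meant to provide.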
3.2 Components
This section gives a component-oriented view of Wikked's architecture. Fig. 5 shows the
interaction of the components involved in Wikked:
Figure 5: The components involved in the Wikked wiki. At the bottom is the RDF database used by
Hyena to maintain the data needed for Wikked, i.e. the wiki pages and other exploited data.
Wikked itself runs as a servlet in a servlet container and uses Hyena as a service to retrieve
the desired information. The rendering engine parses the WITL source code, including wiki
markup, and then generates the HTML output.
• Hyena:
Hyena is used as a service to manage the RDF data. This includes the wiki pages, which are
themselves RDF nodes.
• Wikked:
Wikked runs as a servlet in a servlet container. It receives requests from clients and returns
the corresponding HTML pages. The HTML pages are generated from WITL source code
by the rendering engine.
• Rendering engine:
The rendering engine includes a preprocessor for translating wiki markup to WITL. WITL
is then translated to HTML.
This architecture can be seen as an implementation of the MVC pattern (footnote 7). The Wikked
servlet acts as the controller, which translates interactions with the view into actions to be
performed by the model. The model is the RDF graph managed by Hyena. Finally, the rendering
engine is the view: it renders the contents of the model, i.e. it generates HTML output by
applying a template to a wiki page.
7 Also known as the Model 2 architecture.
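The division of roles described above can be sketched in a few lines. This is only an illustration under loud assumptions: the class and function names are hypothetical, a plain dictionary stands in for the RDF graph managed by Hyena, and a trivial HTML formatter stands in for the WITL rendering engine. The real Wikked is a Java servlet.

```python
import html


class Model:
    """Stand-in for the RDF graph managed by Hyena (here: a plain dict)."""

    def __init__(self):
        self._pages = {}

    def put(self, name, wiki_text):
        self._pages[name] = wiki_text

    def get(self, name):
        return self._pages.get(name)


def render(title, wiki_text):
    """Stand-in for the rendering engine (the view): text in, HTML out."""
    return "<h1>" + html.escape(title) + "</h1><p>" + html.escape(wiki_text) + "</p>"


class Controller:
    """Stand-in for the Wikked servlet: maps a request to model and view calls."""

    def __init__(self, model):
        self.model = model

    def handle(self, action, name, text=None):
        if action == "save":
            self.model.put(name, text)
        page = self.model.get(name)
        if page is None:
            return "<p>No such page</p>"
        return render(name, page)


if __name__ == "__main__":
    controller = Controller(Model())
    print(controller.handle("save", "Home", "Hello <world>"))
```

The controller never produces HTML itself and the view never touches the store, which mirrors the separation between servlet, rendering engine and Hyena.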
4 Scenario: A Bookmark Collection Web Site
4.1 Motivation
In Wiki documents you often want to refer to a data source instead of simply quoting it. In cases
like “list all my Java books” or “list my ten favorite songs” it would be nice to have a mechanism
to query your database for this information. Your database in this case can be a text document,
an Excel file or, as the name implies, some kind of relational database. With such a mechanism
you no longer have to copy and paste the text, as in the simplest case. This is especially
interesting when data are subject to change: you may buy some new Java books, or your taste in
music may vary. You may also want to apply somewhat more advanced query conditions, e.g. “list my
top ten Britpop CDs from 1998 until 2006”. With your data managed by the RDF database, you can
easily refer to it with query functions.
Why did we choose an example with bookmarks? We suspect that nearly everybody has a huge
collection of bookmarks but, when searching for something, asks Google instead. When you
browse the web, you can also find a lot of web pages that consist only of bookmarks. Nowadays
there are tools such as del.icio.us (footnote 8), but the bookmark example is very simple and easy
to understand. Hence we demonstrate our concept with a use case in which three friends create a
web site for bookmark collections.
4.2 The Wikked Three Layer Model in Practice
The work of implementing the Wikked layer model can be split among three persons. Let us
call them Daniel, the data manager; Alice, an author who presents her bookmarks; and Wayland,
the designer, who creates nice templates for the web pages. Each of them has a well-defined
function: Daniel has to create a database and an API for accessing it, Alice has to write
articles, and Wayland has to design templates.
4.2.1 Daniel the Data Manager
Daniel’s work is to import some bookmarks into the RDF database. A bookmark in our case
consists of a title and a URI and belongs to one or more categories. Fig. 6 shows a visual
representation of a bookmark for java.sun.com. The round nodes represent resources and the oblong
nodes literal values. There are also two blank nodes, called anonymous resources. The leftmost
one stands for the concept of “Sun’s Java home page”, the other one for the category with the
title “Java”. In this very simple example there are only three categories: News, Java and Wikis.
Each category contains ten bookmarks, all with the same structure as the example in Fig. 6.
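To make the bookmark structure concrete, the following sketch writes the example of Fig. 6 as plain (subject, predicate, object) triples and lists all bookmarks of one category. It is an illustration only: the blank node labels "_:b1" and "_:c1" and the helper functions are hypothetical, and Wikked itself stores the data in Hyena's RDF database and queries it with RDQL rather than with Python code.

```python
# Triples (subject, predicate, object). "_:b1" and "_:c1" are hypothetical
# labels for the two anonymous (blank) nodes; the hg: names follow Fig. 6.
triples = [
    ("_:b1", "rdf:type", "hg:bookmark"),
    ("_:b1", "hg:title", "Java Technology"),
    ("_:b1", "hg:source", "http://java.sun.com/"),
    ("_:b1", "hg:category", "_:c1"),
    ("_:c1", "rdf:type", "hg:category"),
    ("_:c1", "hg:title", "Java"),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def bookmarks_in(category_title):
    """Return (title, source) rows for every bookmark in the named category."""
    categories = {
        s for s, p, o in triples
        if p == "hg:title" and o == category_title
        and "hg:category" in objects(s, "rdf:type")
    }
    rows = []
    for s, p, o in triples:
        if p == "hg:category" and o in categories:
            rows.append((objects(s, "hg:title")[0], objects(s, "hg:source")[0]))
    return rows


if __name__ == "__main__":
    print(bookmarks_in("Java"))
```

Because the category is a node of its own rather than a string attached to the bookmark, renaming the category changes a single triple and every bookmark follows.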
4.2.2 Alice the Author
Writing readable and informative entries is Alice’s job. In our example she writes only three
entries: one with her Java links, one with her News links and one with her Wiki links. Because she
wants to quote them, all of her bookmarks have been imported into the database by Daniel
beforehand (footnote 9). Her entry for Java may look like this:
These are my ~~Java~~ links:
{queryTable:{book:cat:Java}}
8 http://del.icio.us/
9 She had sent them to Daniel by email earlier.
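[Figure 6 graph: an anonymous bookmark node with hg:title “Java Technology”, hg:source http://java.sun.com/ and rdf:type hg:bookmark, linked via hg:category to an anonymous category node with hg:title “Java” and rdf:type hg:category.]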
Figure 6: A graph representation of an entry in a bookmark collection. To make the graph more
readable, the namespaces are abbreviated. This bookmark has a title: “Java Technology”,
a source: “http://java.sun.com/”, a type: “hg:bookmark” and a category linked to a node
with title: “Java” and type: “hg:category”.
This entry seems to be a little short, but for demonstration purposes it is sufficient. It shows the
main aspects of our Wiki syntax:
• function orientation: A function begins with an opening curly brace followed by the function
name, a colon, one or more arguments separated by ^ and finally a closing curly brace. In the
example there are two functions. The first one has the function name “queryTable” with
“{book:cat:Java}” as its single argument. The second one is the argument of the first one.
It has the function name “book” and the argument “cat:Java”. “queryTable” takes a query
in RDQL as its argument and returns a table with the results. Here “book” is a predefined
query function. It simply returns a query in RDQL according to its argument. In this case
it returns a query that lists title and source for all bookmarks whose category is “Java”.
• nested functions: A function can also occur inside another function, which makes recursive
calls possible. When the document is parsed, the structure of the text is translated to a tree,
as described in the previous section about syntax.
• syntactic sugar: To keep the advantages of common Wiki syntax, we added some syntactic
sugar, e.g. ~~italic~~ is translated to {i:italic} and stands for italic text. For some more
examples refer to section 2.1. This is very useful for jotting down notes in plain ASCII text
and still having them rendered in HTML.
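The three aspects above can be illustrated with a small hand-written parser. This is a sketch under stated assumptions, not the WITL grammar: the actual parser is defined by an LL(k) grammar, whereas the code below only handles functions of the form {name:argument} with free text and nested functions inside the argument, and it ignores argument separators and the [[[ ... ]]] verbatim blocks. All Python names are hypothetical.

```python
import re

# Start of a function: "{", a name, ":" (whitespace around the name is allowed).
_FUNC_START = re.compile(r"\{\s*(\w+)\s*:")


def desugar(text):
    """Rewrite the ~~italic~~ shorthand into the function form {i:italic}."""
    return re.sub(r"~~(.+?)~~", r"{i:\1}", text)


def parse(text):
    """Parse text into a tree: a list of strings and (name, children) tuples."""
    nodes, pos = _parse_nodes(text, 0)
    if pos != len(text):
        raise ValueError("unexpected '}' at position %d" % pos)
    return nodes


def _parse_nodes(text, pos):
    """Collect nodes until a closing brace or the end of the text is reached."""
    nodes, buf = [], []
    while pos < len(text) and text[pos] != "}":
        if text[pos] == "{":
            match = _FUNC_START.match(text, pos)
            if match is None:
                raise ValueError("malformed function at position %d" % pos)
            if buf:
                nodes.append("".join(buf))
                buf = []
            children, pos = _parse_nodes(text, match.end())
            if pos >= len(text):
                raise ValueError("missing closing '}'")
            nodes.append((match.group(1), children))
            pos += 1  # skip the closing brace
        else:
            buf.append(text[pos])
            pos += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, pos


if __name__ == "__main__":
    source = "These are my ~~Java~~ links: {queryTable:{book:cat:Java}}"
    print(parse(desugar(source)))
```

For Alice's entry the result is a list containing plain text, an "i" node, and a "queryTable" node whose only child is the nested "book" node. An evaluator can then walk this tree from the leaves upward instead of performing search and replace on the text.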
4.2.3 Wayland the Web Designer
Finally, Wayland, an experienced web designer, has to create some templates for the entries Alice
wrote. As already mentioned, he can use WITL for this task as well.
He therefore writes a master template:
{call:{header}}
{call:{body}}
{call:{footer}}
This template includes several nested templates by means of the function call:<templatekey> (footnote 10):
• header: A header template that contains some tags with formal definitions needed for layout
reasons and to make the whole page valid HTML.
• body: The definition of how to display a single entry. There is a sidebar on the left that
holds the author, the date of creation and the keywords.
• footer: In this example the footer created by Wayland only prints the closing tag of
the HTML document. When the web site of the three becomes more sophisticated, it could
include, for example, copyright information.
The code for the header:
[[[<html>
<head>
<title>]]]{:dc:title}[[[</title>
<style type="text/css">
<!-- some CSS -->
</style>
</head>]]]
The code for the body:
[[[<body>
<div id="container">
<!-- left sidebar, some infos -->
<div id="sidebar-left">]]]
{b:Author: {:dc:creator}}
[[[<br />]]]
{b:created on: {:dc:date}}
{b:subjects: {:dc:subject}}
[[[</div>
<!-- centered contents -->
<div id="content">
<div class="entry">]]]
{h1:Topic: {:dc:title}}
[[[<!-- the wikiText -->]]]
{:dc:source}
[[[</div>
</div>
</div>
</body>]]]
The code for the footer:
[[[
<!-- some other text -->
</html>]]]
10 In most cases the key is a URL.
Figure 7: A screen shot of an entry displayed by the Wiki.
Fig. 8 shows schematically how the templates are evaluated. At the top is the content of the
main page template. Each sub-template is included and evaluated by the call function. Then the
functions contained in the “called” templates are evaluated as well, until the desired output
is obtained: a web page coded in HTML. Fig. 7 shows a screen shot of the result.
Page Template:
{call: header}
{call: body}
{call: footer}

Body Template:
<div class="entry">
{b:someBold}
{renderGet:dc:source}

Entry Content:
these are my ~~java~~ links:
{queryTable:{book:cat:News}}

HTML Output:
<h1> Topic: Java</h1>
these are my <i>Java</i>-links:
<br />
<table><tr><th>title</th>
<th>source</th>
</tr>
<tr>
<td>Radeox :: start</td>
<td>http://radeox.org/</td>
</tr>
...
Figure 8: An extract of how a template (Page Template) is evaluated. First the body (Body
Template) is looked up in the template list and then evaluated. In the body there is a
function renderGet that fetches the property dc:source of the entry and tells the
evaluator to render it. The text of the property (Entry Content) contains embedded
functions whose evaluation is shown here, for simplicity, as a single step. The entry
is then rendered to HTML (HTML Output).
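The evaluation order shown in Fig. 8 can be imitated with a short sketch. Note the deliberate simplification: Wikked evaluates a syntax tree, whereas this sketch repeatedly substitutes the innermost function in the text until none is left, which is shorter to write down but is not the author's method. The template texts, the entry values and the Python names are hypothetical; only the function names call, renderGet, b, i and h1 are taken from the text.

```python
import re

# Matches an innermost function: "{name:argument}" without nested braces.
FUNCTION = re.compile(r"\{(\w+):([^{}]*)\}")

# Hypothetical, heavily shortened templates.
templates = {
    "page": "{call: header}{call: body}{call: footer}",
    "header": "<html><body>",
    "body": '<div class="entry">{h1:Topic: {renderGet:dc:title}}'
            "{renderGet:dc:source}</div>",
    "footer": "</body></html>",
}

# Hypothetical entry; dc:source holds wiki text with an embedded function.
entry = {"dc:title": "Java", "dc:source": "these are my {i:Java} links"}


def apply(name, arg):
    """Evaluate one function whose argument contains no further functions."""
    arg = arg.strip()
    if name == "call":
        return templates[arg]      # include a sub-template
    if name == "renderGet":
        return entry[arg]          # fetch a property; it is evaluated later
    if name in ("b", "i", "h1"):
        return "<%s>%s</%s>" % (name, arg, name)
    raise ValueError("unknown function: " + name)


def evaluate(text):
    """Substitute innermost functions until the text contains none."""
    while True:
        text, count = FUNCTION.subn(lambda m: apply(m.group(1), m.group(2)), text)
        if count == 0:
            return text


if __name__ == "__main__":
    print(evaluate(templates["page"]))
```

A template that calls itself would make this loop run forever; a real evaluator needs a guard against such cycles.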
5 Related Work
We are not aware of any RDF-based Wikis. As an example of a Wiki with meta tags we can
mention SnipSnap. SnipSnap [6] uses so-called labels to attach data to an entry, for example
a label for the creator of the entry, a person. A person can in turn have several labels, such as
"name: Smith" of a type like "NameLabel". A label has a type, a name and a value. This approach
is rather clever, but its disadvantage is that it does not use an open standard. It is worth
mentioning that SnipSnap supports exporting an entry to RDF.
Omnigator [10] provides a web interface for navigating topic maps. It supports importing RDF
models and browsing them. But Omnigator is not a Wiki.
SEAL [7] is a framework for developing ontology-based portal applications. It differs from our
work in these main points: it is not a Wiki, it concentrates on semantics, which WITL does not,
and it is focused on portal applications. What it has in common with WITL is that both can
provide views (which SEAL calls lenses) on RDF data.
6 Future Research
• Blogs:
Our metadata support makes this a very straightforward enhancement. The first step is
to add meta-data about the creation date. The second step is to display multiple small
wiki pages on one web page and to sort them chronologically. Finally, one can add further
convenience functions such as partitioning the blog entries into pages, a calendar wiki, an
RSS feed etc.
• Integration into the Hyena RDF editing infrastructure:
There are two ways in which Hyena [13] will benefit Wikked:
– RDF and Wiki syntax editing with a graphical user interface (GUI):
One frontend for Hyena is implemented as a collection of plugins for the integrated
development environment Eclipse. Having this frontend edit Wikked’s data will provide
a nice alternative to purely Web-based editing.
– Remote publishing:
Hyena provides an infrastructure of distributed servers that can publish and subscribe
data among each other. One can therefore start one editor as a Hyena engine (with a
GUI frontend) and Wikked as another (without a user interface). Publishing from the
editor to Wikked uses a push protocol provided by Hyena.
• Lightweight publishing for software engineers:
We are currently extending Hyena with a set of tools that allows software engineers to manage development-related information (such as bug-tracking lists, documentation and source
code) with Hyena, in one integrated RDF database. Using Wikked as a presentation layer
for that database allows them to effectively publish and edit that information on the Web.
For example, one can write a wiki page that documents a certain aspect of the system and
includes code samples that are retrieved via RDQL queries.
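The blog steps named in the first item of this list (attach a creation date, sort chronologically, partition into pages) amount to very little logic, as the following sketch suggests. It is not part of Wikked: the entry titles and dates are made-up sample data, the function name is hypothetical, and showing the newest entry first is an assumption borrowed from common blog practice.

```python
from datetime import date

# Made-up sample entries, each with a creation date as meta-data.
entries = [
    {"title": "Wiki links", "created": date(2006, 3, 1)},
    {"title": "Java links", "created": date(2006, 5, 2)},
    {"title": "News links", "created": date(2006, 4, 15)},
]


def blog_pages(entries, per_page):
    """Sort entries newest first and partition them into pages of per_page items."""
    ordered = sorted(entries, key=lambda e: e["created"], reverse=True)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


if __name__ == "__main__":
    for number, page in enumerate(blog_pages(entries, 2), start=1):
        print(number, [e["title"] for e in page])
```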
7 Conclusion
In this paper, we presented an RDF based Wiki. For this purpose, we designed a scripting language
called WITL. Templates for Wiki pages use the same language as entries in the Wiki. Both entries
and templates are also described and maintained with RDF. Furthermore, templates can be edited
like a normal Wiki page. This ensures a high degree of flexibility for the look and feel of the
Wiki. To keep the advantages of ordinary Wikis, traditional Wiki markup is also supported. We
developed a three layer architecture for browsing RDF data and showed how the involved components
interact with each other. As a proof of concept we presented a scenario in which a web site for
maintaining bookmark collections is developed.
References
[1] Apache Jakarta Project. Velocity. http://jakarta.apache.org/velocity/.
[2] L. Aronsson. Operation of a large scale, general purpose wiki website. Elpub 2002. Technology
Interactions., 2002.
[3] T. Berners-Lee, R. T. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic
Syntax. http://www.ietf.org/rfc/rfc3986.txt, January 2005.
[4] F. Dawson and T. Howes. vCard MIME directory profile. ftp://ftp.isi.edu/in-notes/rfc2426.txt,
September 1998.
[5] Fraunhofer FIRST. Radeox. http://radeox.org/.
[6] M. L. Jugel and S. J. Schmidt. SnipSnap. http://snipsnap.org/.
[7] A. Maedche, S. Staab, N. Stojanovic, R. Studer, and Y. Sure. Seal - a framework for developing
semantic web portals. In BNCOD 18: Proceedings of the 18th British National Conference on
Databases, pages 1–22, London, UK, 2001. Springer-Verlag.
[8] F. Manola and E. Miller. RDF Primer, W3C Recommendation. http://www.w3.org/TR/rdf-primer/, 2004.
[9] Netscape Communications Corporation. Open Directory Project. http://www.dmoz.org/.
[10] Ontopia. Omnigator. http://www.ontopia.net/omnigator/.
[11] T. Parr. Antlr parser generator. http://antlr.org/.
[12] T. Parr. StringTemplate: Java Template Engine. http://www.stringtemplate.org/.
[13] A. Rauschmayer. Hyena: A Semantic Web Enabled Editor for Software Engineers. Submitted
for publication.
[14] A. Rauschmayer and P. Renner. Knowledge-Representation-Based Software Engineering.
Technical Report 0407, Ludwig-Maximilians-Universität München, Institut für Informatik,
May 2004.
[15] A. Seaborne. RDQL - a query language for RDF. http://www.w3.org/Submission/RDQL/.
[16] Sun Microsystems. JSR-000152 JavaServer Pages 2.0 Specification - Final Release.
http://jcp.org/aboutJava/communityprocess/final/jsr152/.
List of Figures
1 RDF Example: Address Book Entry
2 Wiki Markup
3 Abstract Syntax Tree for Bold Function
4 Three Layer Architecture for Wikked
5 Wikked Components
6 A graph representation for an entry in a bookmark collection.
7 A screen shot of an entry displayed by the Wiki.
8 Template processing.