Example: wissen.de

Transcription

Software Architecture VO/KU (707.023/707.024)
Roman Kern
KMI, TU Graz
December 17, 2012
Roman Kern (KMI, TU Graz)
Example: wissen.de
December 17, 2012
Outline
1 Introduction
2 Software Architecture Overview
3 Software Architecture Styles
4 Project Management
Introduction
Wissen.de - Web Site of the Year 2012
Has just been elected Web Site of the Year 2012
Winner in the category: Education
Both the best and the most popular web-site
http://www.websitedesjahres.de/
Wissen.de - The Web Site
Wissen.de - Host
Wissen.de is a web-site hosted by wissenmedia
Wissenmedia is owned by Bertelsmann SE & Co. KGaA
Wissenmedia owns brands: Brockhaus, Bertelsmann, WAHRIG,
CHRONIK, JollyBooks
The brand Brockhaus is over 200 years old and is known to 93% of
people (in Germany)
Wissen.de - Scope
Wissen.de is a free service
Content is added and curated by editors
Does not follow the Wikipedia model
Free content is not taken from Brockhaus
wissen.de articles differ from printed articles
In their style and their life-cycle
Wissen.de - Motivation
The project started out as an innovation platform:
Be innovative in terms of business models
Wissen.de is just a single portal to a complex system
→ Another example is a cooperation with a set-top box manufacturer
Be innovative in terms of technologies
→ Try out new functionality
Wissen.de - Software Architecture
In terms of software architecture, this puts an emphasis on specific
quality attributes:
Flexibility
quickly try out new features
Evolvability
add new features without interfering with existing infrastructure
Scalability
need to manage millions of articles (more than the German Wikipedia)
need to serve many users
Wissen.de - Software Architecture
Focus on specific quality attributes has implications on others:
Configurability
Needs to be high as well
Testability
Suffers, as the system is changing at a high pace
Wissen.de - Software Architecture
More importantly, the architecture needs to be flexible
And foresee possible directions
Typically use YAGNI (“You ain’t gonna need it”) as a guideline
Complexity
The system has a high level of complexity
→ very hard for new developers in the project
Wissen.de - Software Architecture
High flexibility is achieved by
Loose coupling
Individual components do not depend on other components
Generic interfaces and protocols
Thus components can be easily swapped out and replaced
But this has an impact on:
Performance
System needs to be as generic as possible
→ no option to fine-tune algorithms
Wissen.de - Backend
Wissen.de is only one of multiple web sites
The whole infrastructure contains many sub-systems and components
Another part is the interface to the other systems (e.g. editor
systems)
It is embedded into an existing landscape of tools → integrability
Wissen.de - Team
Developed by separate teams
Teams are from different companies
Know-Center, Key-Tec, EDELWEISS72,
wissenmedia, arvato, Nionex...
Teams are geographically dispersed
Graz, Munich, Gütersloh
Wissen.de - Overview
Wissen.de - Detail
Will focus on the backend part only
It is run on multiple (virtual) machines
Used by multiple components
Main tasks: Store articles, index articles, present articles
Articles
Articles are stored as XML
Combination of data and meta-data
Meta-data are title, date, category, ...
Data is XML, not restricted to a single format
Links between articles
Software Architecture Overview
Main Architecture
Web service as main interface
→ Client-server architecture
Main architecture: n-tier style
Typical example for a heterogeneous architecture style
n-Tier Architecture
Conceptual architecture: 3-tier
Database layer
Application logic layer
Presentation layer
Architecture: 3-layer applications
n-Tier Architecture
Implementation architecture: 2-tier
Framework library
Presentation libraries
E.g. web-service library, command line library
Database Layer
Object-relational mapping (ORM)
No direct interaction with the relational database
Schema can be derived from the business objects
Example of ORM
@Entity
@Table(name = "ARTIFACT")
@NamedQueries({ @NamedQuery(name = "artifactById", query =
    "SELECT x FROM ARTIFACT x WHERE x.ARTIFACT_ID = :artifactIdParam AND ...")
})
@XmlJavaTypeAdapter(XmlArticle.Adapter.class)
public class Artifact implements Serializable {
    @EmbeddedId
    private ArtifactPK id;
    @ManyToOne
    @MapsId
    @JoinColumn(name = "BOOK_ID", referencedColumnName = "BOOK_ID")
    private Book book;
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "CONTENT_ID", referencedColumnName = "CONTENT...")
    private Content content;
Presentation Layer
XSLT scripts to transform the output into the target media
Not only articles are transformed
E.g. search results, error messages
Different output target media
E.g. mobile version, version for set-top boxes, product-specific
renderings
Example of XSLT
<xsl:stylesheet version="2.0" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <link type="text/css" rel="stylesheet" href="{$url-prefix}..." />
      </head>
      <body>
        <div class="artikel-inhalt">
          <xsl:copy-of select="$textbaustein-header" />
          <!-- The actual article is rendered here -->
          <xsl:apply-templates />
          <!-- Table of contents for chunked articles -->
          <xsl:if test="not(/*[1]/ws:kontext/ws:ws-intern) and /...">
            <xsl:call-template name="chunks-table" />
          </xsl:if>
          <xsl:copy-of select="$textbaustein-footer" />
        </div>
      </body>
    </html>
Web Service Interface
Multiple interfaces, for different use cases (e.g. read-only access,
administrative access, ...)
Stateless
Hybrid of REST and RPC style service
Output is either XML or JSON
Software Architecture Styles
Information Extraction - Preprocessing
Task: Transform an XML document into a textual representation
Three stages:
Input XML
→ transformed into XHTML
→ transformed into plain text
Information Extraction - Preprocessing
Style: Pipeline
Batch-sequential, the next filter starts once the previous has finished
The output of the previous filter is the input to the next
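The batch-sequential pipeline described above can be sketched as a chain of functions. This is only an illustration under stated assumptions: the class name PreprocessingPipeline, the element names article and para, and the string-based stand-ins for the real XML-to-XHTML and XHTML-to-text transformations are hypothetical and not taken from the project.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a batch-sequential pipeline: each filter runs to
// completion and its complete output becomes the input of the next filter.
final class PreprocessingPipeline {

    // Stage 1 (stand-in): map the article XML to XHTML by renaming elements.
    static String xmlToXhtml(String xml) {
        return xml.replace("<article>", "<div>")
                  .replace("</article>", "</div>")
                  .replace("<para>", "<p>")
                  .replace("</para>", "</p>");
    }

    // Stage 2 (stand-in): strip all tags and normalise whitespace.
    static String xhtmlToText(String xhtml) {
        return xhtml.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    // The filters in the order in which they are applied.
    static final List<UnaryOperator<String>> FILTERS =
            List.of(PreprocessingPipeline::xmlToXhtml, PreprocessingPipeline::xhtmlToText);

    static String run(String input) {
        String current = input;
        for (UnaryOperator<String> filter : FILTERS) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(run(
                "<article><para>Graz is a city.</para><para>It is in Austria.</para></article>"));
    }
}
```

Because every stage has the same signature, a stage can be swapped or a new one appended without touching the others.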
Pipes and filters
Figure: Pipe and filters style
Information Extraction - Execute
Task: Extract information out of text
Multiple sub-tasks:
Split the text into sentences
Split a sentence into tokens (words)
Mark certain words as stop-words (should be ignored)
Assign word groups to individual tokens
Detect named entities (e.g. person names)
Information Extraction - Execute
Realisation: Pipeline with shared repository
First the text is filled into a special data-structure
Each filter (sentence chunker, stop-word detection, ...) modifies the
data-structure
Using so-called annotations
Each annotation is a span (start, end) with additional features
Caveat: filters depend on the output of preceding filters
Example of Information Extraction Pipeline
public List<ExtractedInformationAnnotation> process(String text) throws ... {
    AnnotatedDocument doc = new DefaultDocument();
    doc.setText(text);
    for (Annotator annotator : annotators) {
        annotator.annotate(doc);
    }
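To make the idea of a shared, annotated document more concrete, here is a self-contained sketch (Java 16 or later). It is not the project's implementation: the types Annotation, Document and Annotator below are simplified, hypothetical stand-ins, and the sentence splitter, tokenizer and stop-word list are deliberately naive. It shows how each filter adds spans to the shared structure and how later filters depend on earlier ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a shared document that annotators enrich one after another.
final class AnnotationSketch {

    // An annotation is a typed span [start, end) over the document text.
    record Annotation(String type, int start, int end) {}

    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }

        // Returns a snapshot, so annotators may add annotations while iterating.
        List<Annotation> ofType(String type) {
            return annotations.stream().filter(a -> a.type().equals(type)).toList();
        }
        String covered(Annotation a) { return text.substring(a.start(), a.end()); }
    }

    interface Annotator { void annotate(Document doc); }

    // Naive sentence splitter: a sentence ends at '.', '!' or '?'.
    static final Annotator SENTENCES = doc -> {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            char c = doc.text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                doc.annotations.add(new Annotation("sentence", start, i + 1));
                start = i + 1;
                while (start < doc.text.length() && doc.text.charAt(start) == ' ') start++;
            }
        }
        if (start < doc.text.length()) {
            doc.annotations.add(new Annotation("sentence", start, doc.text.length()));
        }
    };

    // Tokenizer: depends on the sentence annotations written by the previous filter.
    static final Annotator TOKENS = doc -> {
        for (Annotation s : doc.ofType("sentence")) {
            int i = s.start();
            while (i < s.end()) {
                if (Character.isLetterOrDigit(doc.text.charAt(i))) {
                    int j = i;
                    while (j < s.end() && Character.isLetterOrDigit(doc.text.charAt(j))) j++;
                    doc.annotations.add(new Annotation("token", i, j));
                    i = j;
                } else {
                    i++;
                }
            }
        }
    };

    static final Set<String> STOP_WORDS = Set.of("is", "a", "the", "in");

    // Stop-word marker: depends on the token annotations.
    static final Annotator STOP_WORD_MARKER = doc -> {
        for (Annotation t : doc.ofType("token")) {
            if (STOP_WORDS.contains(doc.covered(t).toLowerCase())) {
                doc.annotations.add(new Annotation("stop-word", t.start(), t.end()));
            }
        }
    };

    static Document process(String text) {
        Document doc = new Document(text);
        for (Annotator annotator : List.of(SENTENCES, TOKENS, STOP_WORD_MARKER)) {
            annotator.annotate(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = process("Graz is a city. It is in Austria.");
        for (Annotation a : doc.annotations) {
            System.out.println(a.type() + ": " + doc.covered(a));
        }
    }
}
```

Reordering the list of annotators would break the sketch, which is exactly the caveat about filters depending on the output of preceding filters.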
Event Framework
Components can register to listen for events
Components can trigger events
Typically all events should be handled asynchronously (the sender is
not blocked)
Architectural style: publish-and-subscribe
Notification Architectures
Figure: Notification architecture
Example of an Event Listener
eventListener = new EventListener() {
    @Override
    public void onEvent(Event event, Task task) {
        if (event instanceof MediaAddedEvent) {
            isDirty = true;
        }
    }
};
eventManager.registerEventListener(eventListener);
// somewhere else
eventManager.fireEventAsync(new MediaAddedEvent(name));
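The slides do not show how the event manager itself works. One possible minimal publish-and-subscribe implementation is sketched below (Java 16 or later). The method names registerEventListener and fireEventAsync follow the snippet above, but everything else is an assumption: the listener interface is simplified to a single Event parameter, events are delivered through a single-threaded executor, and the shutdown helper exists only so that the example terminates.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an in-process publish-and-subscribe event manager.
final class EventSketch {

    interface Event {}
    record MediaAddedEvent(String name) implements Event {}
    interface EventListener { void onEvent(Event event); }

    static final class EventManager {
        private final List<EventListener> listeners = new CopyOnWriteArrayList<>();
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        void registerEventListener(EventListener listener) {
            listeners.add(listener);
        }

        // Returns immediately; listeners are notified on the executor thread,
        // so the sender is not blocked.
        void fireEventAsync(Event event) {
            executor.execute(() -> listeners.forEach(l -> l.onEvent(event)));
        }

        // Stops accepting events and waits for pending deliveries to finish.
        boolean shutdown(long timeoutMillis) {
            executor.shutdown();
            try {
                return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        EventManager manager = new EventManager();
        AtomicInteger mediaAdded = new AtomicInteger();
        manager.registerEventListener(e -> {
            if (e instanceof MediaAddedEvent) mediaAdded.incrementAndGet();
        });
        manager.fireEventAsync(new MediaAddedEvent("logo.png"));
        manager.shutdown(1000);
        System.out.println("media events seen: " + mediaAdded.get());
    }
}
```

The sender only knows the event manager, never the listeners, which is what keeps the components decoupled.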
Cluster Communication
Need to scale out (horizontally) to cope with the demand
Add redundancy to increase the availability
→ instead of a single machine, have a cluster of machines
Works transparently with the event framework
Cluster Communication
Dynamically detect all cluster members on start-up (discovery)
Communication is based on either broadcast/multicasts (UDP) or
direct communication (TCP)
All cluster nodes need to know each other
Architectural style: peer-to-peer
Peer to peer
Cluster Communication - Synchronous
Only asynchronous communication facilities
Create synchronous communication via callbacks
Each synchronous message contains a unique ID and is sent
asynchronously
Once the message has been processed by the remote node, a
notification is sent back, passing the ID
Processing can then continue on the sender side
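The callback scheme above can be sketched with a map of pending calls keyed by message ID. This is an illustration only: the class SyncOverAsync, its methods, and the BiConsumer standing in for the asynchronous cluster transport are hypothetical, and the remote node is simulated in-process on another thread.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical sketch: a blocking call built on top of asynchronous messaging.
final class SyncOverAsync {

    // Stand-in for the asynchronous transport: delivers (id, payload) to the remote node.
    private final BiConsumer<String, String> transport;
    // Calls that still wait for a notification, keyed by message ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    SyncOverAsync(BiConsumer<String, String> transport) {
        this.transport = transport;
    }

    // Sends the message asynchronously, then blocks until the notification
    // carrying the same ID arrives or the timeout expires.
    String sendAndWait(String payload, long timeoutMillis) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        try {
            transport.accept(id, payload);
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("no notification for message " + id, e);
        } finally {
            pending.remove(id);
        }
    }

    // Called by the communication layer when the remote node reports completion.
    void onNotification(String id, String result) {
        CompletableFuture<String> reply = pending.get(id);
        if (reply != null) {
            reply.complete(result);
        }
    }

    public static void main(String[] args) {
        SyncOverAsync[] node = new SyncOverAsync[1];
        // Simulated remote node: processes on another thread and notifies back with the ID.
        node[0] = new SyncOverAsync((id, payload) ->
                CompletableFuture.runAsync(() -> node[0].onNotification(id, payload.toUpperCase())));
        System.out.println(node[0].sendAndWait("reindex", 1000));
    }
}
```

A timeout is included in the sketch because a lost notification would otherwise block the sender forever.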
Indexing
The search index needs to be updated once articles have been
changed
The component responsible for updating the content of articles fires an
event as soon as an article has changed
The index component listens for these events
→ Decoupling of components, as one component does not know the
other components
Disadvantage: no direct control of the process flow, hard to track the
progress of operations
Request Tracking
Track long running operations
For example: batch import of articles, which might take hours
Idea: collect all information regarding an operation in one place,
called task
Store this information in the database
Notify user once the operation is done
Request Tracking - Task
Task consists of
ID: Unique ID of the task
Status: running, finished
Result: success, failed, cancelled
Messages: List of messages for the user
Attributes: Track the progress (→ progress bar in the UI)
Properties: Store internal state information
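The fields listed above map naturally onto a small data class. The following sketch is an assumption about how such a task could look, not the project's class: the names Task, Status and Result, the attribute keys "done" and "total", and the helper methods are all hypothetical, and persistence in the database is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the task record used for request tracking.
final class TaskSketch {

    enum Status { RUNNING, FINISHED }
    enum Result { SUCCESS, FAILED, CANCELLED }

    static final class Task {
        final String id = UUID.randomUUID().toString();           // unique ID of the task
        Status status = Status.RUNNING;
        Result result;                                             // null while the task is running
        final List<String> messages = new ArrayList<>();           // messages for the user
        final Map<String, Integer> attributes = new HashMap<>();   // progress information
        final Map<String, String> properties = new HashMap<>();    // internal state

        void reportProgress(int done, int total) {
            attributes.put("done", done);
            attributes.put("total", total);
        }

        // Value a progress bar in the UI could display.
        int percentDone() {
            int total = attributes.getOrDefault("total", 0);
            int done = attributes.getOrDefault("done", 0);
            return total == 0 ? 0 : 100 * done / total;
        }

        void finish(Result outcome, String message) {
            status = Status.FINISHED;
            result = outcome;
            messages.add(message);
        }
    }

    public static void main(String[] args) {
        Task importTask = new Task();
        importTask.reportProgress(250, 1000);
        System.out.println(importTask.percentDone() + "% done");
        importTask.finish(Result.SUCCESS, "Imported 1000 articles");
        System.out.println(importTask.status + " / " + importTask.result);
    }
}
```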
Request Tracking - Task
A single task might span multiple machines
Synchronisation via the database
Administration console lists all tasks
Helps to detect the root of problems
Logging
Common logging infrastructure
Logging is also collected in the tasks
Logging output also contains the task-id
Log output is collected in files
Log files are rotated
Error Handling
Each layer produces its own type system of errors
The presentation layer is responsible for reporting the error to the user
For each error a unique ID is generated
The ID is reported to the user and logged
Thus no internal state is reported to the outside
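A minimal sketch of this error-ID scheme, assuming java.util.logging as the logging backend (Java 16 or later for the record): the class ErrorReporting, the record UserError and the wording of the user message are hypothetical. The full exception only reaches the log, while the user sees just the ID.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: report an error to the user by ID only.
final class ErrorReporting {

    private static final Logger LOG = Logger.getLogger(ErrorReporting.class.getName());

    // What the presentation layer hands to the user: an ID, no internal details.
    record UserError(String errorId, String message) {}

    static UserError report(Throwable cause) {
        String errorId = UUID.randomUUID().toString();
        // Full details, including the stack trace, only go to the log.
        LOG.log(Level.SEVERE, "error " + errorId, cause);
        return new UserError(errorId, "An internal error occurred (ID " + errorId + ").");
    }

    public static void main(String[] args) {
        UserError error = report(new IllegalStateException("connection to database lost"));
        System.out.println(error.message());
    }
}
```

With the ID from a user report, the matching log entry can be found by a simple text search.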
Monitoring
Monitor the current state of the system
Web-based tool to monitor the state
Current resource consumption, e.g. memory used
List of recent error logs
Support of administrative/analysis operations
Runtime Performance
Improve performance by use of caching
Caches need to be in-sync across the multiple machines
Therefore all changes need to be reported to all machines
The event framework propagates these changes to all nodes and
components
Changes in the file-system need to be detected as well
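How change events keep per-node caches in sync can be sketched as follows. All names are hypothetical: ChangeBus is a synchronous, in-process stand-in for the event framework and cluster communication, NodeCache represents the cache on one machine, and a shared map plays the role of the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: caches on several nodes invalidated by change events.
final class CacheSketch {

    // Stand-in for the event framework: delivers changed article IDs to all nodes.
    static final class ChangeBus {
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }
        void publishChange(String articleId) { subscribers.forEach(s -> s.accept(articleId)); }
    }

    // One cache per node; an entry is dropped when a change event for it arrives.
    static final class NodeCache {
        private final Map<String, String> entries = new ConcurrentHashMap<>();
        private final Function<String, String> loader;

        NodeCache(Function<String, String> loader, ChangeBus bus) {
            this.loader = loader;
            bus.subscribe(id -> entries.remove(id));
        }

        // Loads the article on a cache miss, otherwise returns the cached copy.
        String get(String articleId) {
            return entries.computeIfAbsent(articleId, loader);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>(Map.of("42", "v1"));
        ChangeBus bus = new ChangeBus();
        NodeCache nodeA = new NodeCache(store::get, bus);
        NodeCache nodeB = new NodeCache(store::get, bus);
        nodeA.get("42");
        nodeB.get("42");          // both nodes now cache the old version
        store.put("42", "v2");    // the article changes
        bus.publishChange("42");  // every node drops its stale entry
        System.out.println(nodeB.get("42"));
    }
}
```

Without the publishChange call, nodeB would keep serving the stale version, which is why every change has to reach every machine.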
Improve Flexibility
Improve flexibility by increasing configurability
Level of configurability rises with the power of the configuration
language
Highest level if the configuration itself is some sort of programming
language
→ Interpreter architectural style
Project Management
Support Infrastructure
Version control system
Bug/Issue tracking system
Continuous integration system
Documentation system
Rollout
Development system
Virtual machine with all tools installed
Staging system
Replica of the production system
Production system
Only versions that have been tested on the staging system are
deployed on the production system
Only a few people are allowed to deploy on the production system
Project Management
Agile project development
Short cycles, working software
Project communication via periodic conference calls
Additionally via e-mail and the issue tracker
Questions?