Example: wissen.de
Transcription
Example: wissen.de
Example: wissen.de Software Architecture VO/KU (707.023/707.024) Roman Kern KMI, TU Graz December 17, 2012 Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 1 / 58 Outline 1 Introduction 2 Software Architecture Overview 3 Software Architecture Styles 4 Project Management Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 2 / 58 Introduction Section Introduction Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 3 / 58 Introduction Wissen.de - Web Site of the Year 2012 Has just been elected Web Site of the Year 2012 Winner in the category: Education Both the best and the most popular web-site http://www.websitedesjahres.de/ Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 4 / 58 Introduction Wissen.de - The Web Site Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 5 / 58 Introduction Wissen.de - The Web Site Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 6 / 58 Introduction Wissen.de - Host Wissen.de is a web-site hosted by wissenmedia Wissenmedia is owned by Bertelsmann SE & Co. KGaA Wissenmedia owns brands: Brockhaus, Bertelsmann, WAHRIG, CHRONIK, JollyBooks The brand Brockhaus is over 200 years old and is known by 93% people (in Germany) Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 7 / 58 Introduction Wissen.de - Scope Wissen.de is a free service Content is added and curated by editors Does not follow the Wikipedia model Free content is not taken from Brockhaus wissen.de articles differ from printed articles In their style and their life-cycle Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 8 / 58 Introduction Wissen.de - Motivation The project started out as an innovation platform: Be innovative in terms of business models Wissen.de is just a single portal to a complex system → Another example is a cooperation with a set-top box manufacturer Be innovative in terms of technologies → Try out new functionality Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 9 / 58 Introduction Wissen.de - Software Architecture In terms of software architecture this puts an emphasis an specific quality attributes: Flexibility quickly try out new features Evolvability add new features without interfering with existing infrastructure Scalability need to manage millions of articles (more than the German Wikipedia) need to serve many users Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 10 / 58 Introduction Wissen.de - Software Architecture Focus on specific quality attributes has implications on others: Configurability Need to be high as well Testability Suffers, as the system is changing at a high pace Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 11 / 58 Introduction Wissen.de - Software Architecture More importantly, the architecture needs to be flexible And foresee possible directions Typically use YAGNI (“You ain’t gonna need it”) - as a guideline Complexity The system has a high level of complexity → very hard for new developers in the project Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 12 / 58 Introduction Wissen.de - Software Architecture High flexibility is achieved by Loose coupling Individual components do not depend on other components Generic interfaces and protocols Thus components can be easily swapped out and replaced But this have an impact on: Performance System needs to be as generic as possible → no option to fine-tune algorithms Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 13 / 58 Introduction Wissen.de - Backend Wissen.de is only one of multiple web sites The whole infrastructure contains many sub-systems and components Another part is the interface to the other systems (e.g. editor systems) It is embedded into an existing landscape of tools → integrability Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 14 / 58 Introduction Wissen.de - Team Developed by separate teams Teams are from different companies Know-Center, Key-Tec, EDELWEISS72, wissenmedia, arvato, Nionex... Teams are geographically dispersed Graz, Munich, Gütersloh Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 15 / 58 Introduction Wissen.de - Overview Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 16 / 58 Introduction Wissen.de - Detail Will focus on the backend part only It is run on multiple (virtual) machines Used by multiple components Main tasks: Store articles, index articles, present articles Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 17 / 58 Introduction Articles Articles are stored as XML Combination of data and meta-data Meta-data are title, date, category, ... Data is XML, not restricted to a single format Links between articles Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 18 / 58 Software Architecture Overview Section Software Architecture Overview Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 19 / 58 Software Architecture Overview Main Architecture Web service as main interface → Client-server architecture Main architecture: n-tier style Typical example for a heterogeneous architecture style Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 20 / 58 Software Architecture Overview n-Tier Architecture Conceptual architecture: 3-tier Database layer Application logic layer Presentation layer Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 21 / 58 Software Architecture Overview Architecture: 3-layer applications Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 22 / 58 Software Architecture Overview n-Tier Architecture Implementation architecture: 2-tier Framework library Presentation libraries E.g. web-service library, command line library Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 23 / 58 Software Architecture Overview Database Layer Object-relational mapping (ORM) No direct interaction with the relational database Schema can be derived from the business objects Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 24 / 58 Software Architecture Overview Example of ORM @Entity @Table ( name = ”ARTIFACT” ) @NamedQueries ( { @NamedQuery ( name = ” a r t i f a c t B y I d ” , q u e r y = ”SELECT x FROM ARTIFACT x WHERE x . ARTIFACT ID = : a r t i f a c t I d P a r a m AND }) @XmlJavaTypeAdapter ( X m l A r t i c l e . A d a p t e r . c l a s s ) p u b l i c c l a s s A r t i f a c t implements S e r i a l i z a b l e { @EmbeddedId private ArtifactPK id ; @ManyToOne @MapsId @JoinColumn ( name = ”BOOK ID” , r e f e r e n c e d C o l u m n N a m e = ”BOOK ID” ) p r i v a t e Book book ; @ManyToOne ( f e t c h = F e t c h T y p e . LAZY) @JoinColumn ( name = ”CONTENT ID” , r e f e r e n c e d C o l u m n N a m e = ”CONTENT p r i v a t e Content content ; Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 25 / 58 Software Architecture Overview Presentation Layer XSLT scripts to transform the output into the target media Not only articles are transformed E.g. search results, error messages Different output target media E.g. mobile version, version for set-top boxes, product specific renderings Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 26 / 58 Software Architecture Overview Presentation Layer Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 27 / 58 Software Architecture Overview Presentation Layer Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 28 / 58 Software Architecture Overview Example of XSLT < x s l : s t y l e s h e e t v e r s i o n=” 2 . 0 ” xmlns=” h t t p : / /www . w3 . o r g /1999/ x h t m l ” xmlns : x s l=” h t t p : / /www . w3 . o r g /1999/ XSL/ T r a n s f o r m ” x p a t h −d e f a u l t −namespace=” h t t p : / /www . w3 . o r g /1999/ x h t m l ”> < x s l : t e m p l a t e match=” / ”> <html xmlns=” h t t p : / /www . w3 . o r g /1999/ x h t m l ”> <head> < l i n k t y p e=” t e x t / c s s ” r e l=” s t y l e s h e e t ” h r e f=” { $ u r l − p r e f i x } </ head> <body> <d i v c l a s s=” a r t i k e l −i n h a l t ”> < x s l : copy−o f s e l e c t=” $ t e x t b a u s t e i n −h e a d e r ” /> <!−− H i e r w i r d d e r e i g e n t l i c h e < x s l : a p p l y −t e m p l a t e s /> A r t i k e l g e r e n d e r t −−> <!−− I n h a l t s u e b e r s i c h t f u e r ge−c h u n k t e A r t i k e l −−> < x s l : i f t e s t=” n o t ( / ∗ [ 1 ] / ws : k o n t e x t / ws : ws−i n t e r n ) and / < x s l : c a l l −t e m p l a t e name=” chunks −t a b l e ” /> </ x s l : i f > < x s l : copy−o f s e l e c t=” $ t e x t b a u s t e i n −f o o t e r ” /> </ d i v> </ body> Roman (KMI, TU Graz) Example: wissen.de December 17, 2012 29 / 58 </Kern html> Software Architecture Overview Web Service Interface Multiple interfaces, for different use cases (e.g. read-only access, administrative access, ...) Stateless Hybrid of REST and RPC style service Output is either XML or JSON Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 30 / 58 Software Architecture Styles Section Software Architecture Styles Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 31 / 58 Software Architecture Styles Information Extraction - Preprocessing Task: Transform an XML into a textual representation Three stages: Input XML → transformed into XHTML → transformed into plain text Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 32 / 58 Software Architecture Styles Information Extraction - Preprocessing Style: Pipeline Batch-sequential, the next filter starts once the previous has finished The output of the previous filter is the input to the next Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 33 / 58 Software Architecture Styles Pipes and filters Figure: Pipe and filters style Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 34 / 58 Software Architecture Styles Information Extraction - Execute Task: Extract information out of text Multiple sub-tasks: Split the text into sentences Split a sentence into token (words) Mark certain words as stop-words (should be ignored) Assign word groups to individual tokens Detect named entities (E.g. person names) Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 35 / 58 Software Architecture Styles Information Extraction - Execute Realisation: Pipeline with shared repository First the text is filled into a special data-structure Each filter (sentence chunker, stop-word detection, ...) modifies the data-structure Using so called annotations Each annotation is a span (start, end) with addition features Caveat: filters depend on the output of preceding filters Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 36 / 58 Software Architecture Styles Example of Information Extraction Pipeline p u b l i c L i s t <E x t r a c t e d I n f o r m a t i o n A n n o t a t i o n > p r o c e s s ( S t r i n g t e x t ) t AnnotatedDocument doc = new D e f a u l t D o c u m e n t ( ) ; doc . s e t T e x t ( t e x t ) ; f o r ( Annotator annotator : annotators ) { a n n o t a t o r . a n n o t a t e ( doc ) ; } Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 37 / 58 Software Architecture Styles Event Framework Components can register to listen for events Components can trigger events Typically all events should be handled asynchronously (the sender is not blocked) Architectural style: publish-and-subscribe Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 38 / 58 Software Architecture Styles Notification Architectures Figure: Notification architecture Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 39 / 58 Software Architecture Styles Example of an Event Listener e v e n t L i s t e n e r = new E v e n t L i s t e n e r ( ) { @Override p u b l i c v o i d o n E v e n t ( E v e n t e v e n t , Task t a s k ) { i f ( e v e n t i n s t a n c e o f MediaAddedEvent ) { i s D i r t y = true ; } } }; eventManager . r e g i s t e r E v e n t L i s t e n e r ( e v e n t L i s t e n e r ) ; // somewhere e l s e e v e n t M a n a g e r . f i r e E v e n t A s y n c ( new MediaAddedEvent ( name ) ) ; Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 40 / 58 Software Architecture Styles Cluster Communication Need to scale out (horizontally) to cope with the demand Add redundancy to increase the availability → instead of a single machine, have a cluster of machines Works transparently with the event framework Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 41 / 58 Software Architecture Styles Cluster Communication Dynamically detect all cluster members on start-up (discovery) Communication is based on either broadcast/multicasts (UDP) or direct communication (TCP) All cluster nodes need to know each other Architectural style: peer-to-peer Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 42 / 58 Software Architecture Styles Peer to peer Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 43 / 58 Software Architecture Styles Cluster Communication - Synchronous Only asynchronous communication facilities Create synchronous communication via callbacks Each synchronous message contains a unique id and sent asynchronously Once the message has been processed by the remote note, a notification is sent back passing the id Processing then can be continued at the sender side Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 44 / 58 Software Architecture Styles Indexing The search index needs to be updated once articles have been changed The component responsible to update the content of articles fires an event as soon as an article has changed The index components listens for these events → Decoupling of components, as one component does not know the other components Disadvantage: no direct control of the process flow, hard to track the progress of operations Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 45 / 58 Software Architecture Styles Request Tracking Track long running operations For example: batch import of articles, which might take hours Idea: collect all information regarding an operation in one place, called task Store this information in the database Notify user once the operation is done Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 46 / 58 Software Architecture Styles Request Tracking - Task Task consists of ID: Unique ID of the task Status: running, finished Result: success, failed, cancelled Messages: List of messages for the user Attributes: Track the progress (→ progress bar in the UI) Properties: Store internal state information Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 47 / 58 Software Architecture Styles Request Tracking - Task A single task might spawn multiple machines Synchronisation via the database Administration console list all tasks Helps to detect the root of problems Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 48 / 58 Software Architecture Styles Logging Common logging infrastructure Logging is also collected in the tasks Logging output also contains the task-id Log output is collected in files Log files are rotated Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 49 / 58 Software Architecture Styles Error Handling Each layer produces its own type system of errors The presentation layer is responsible to report the error to the user For each error an unique ID is generated The ID is reported to the user and logged Thus no internal state is reported to the outside Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 50 / 58 Software Architecture Styles Monitoring Monitor the current state of the system Web-based tool to monitor the state Current resource consumption, e.g. memory used List of recent error logs Support of administrative/analysis operations Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 51 / 58 Software Architecture Styles Monitoring Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 52 / 58 Software Architecture Styles Runtime Performance Improve performance by use of caching Caches need to be in-sync across the multiple machines Therefore all changes need to be reported to all machines The event framework propagates these changes to all nodes and components Changes in the file-system need to be detected as well Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 53 / 58 Software Architecture Styles Improve Flexibility Improve flexibility by increase of configurability Level of configurability rises with the power of the configuration language Highest level if the configuration itself is some sort of programming language → Interpreter architectural style Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 54 / 58 Project Management Section Project Management Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 55 / 58 Project Management Support Infrastructure Version control system Bug/Issue tracking system Continuous integration system Documentation system Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 56 / 58 Project Management Rollout Development system Virtual machine with all tools installed Staging system Replica of the production system Production system Only versions are deployed on the production system, which have been tested on the staging system Only a few people are allowed to deploy on the production system Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 57 / 58 Project Management Project Management Agile project development Short cycles, working software Project communication via periodic conference calls Additionally e-mail and via issue tracker Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 58 / 58 Project Management Section Questions? Roman Kern (KMI, TU Graz) Example: wissen.de December 17, 2012 59 / 58